
© 2013 Royal Statistical Society 0035–9254/14/63559

Appl. Statist. (2014) 63, Part 4, pp. 559–578

Two-stage adaptive randomization for delayed response in clinical trials

Jiajing Xu and Guosheng Yin

University of Hong Kong, Hong Kong

[Received May 2012. Final revision August 2013]

Address for correspondence: Guosheng Yin, Department of Statistics and Actuarial Science, University of Hong Kong, Pokfulam Road, Hong Kong. E-mail: [email protected]

Summary. Despite the widespread use of equal randomization in clinical trials, response-adaptive randomization has attracted considerable interest. There is typically a prerun of equal randomization before the implementation of response-adaptive randomization, although it is often not clear how many subjects are needed in this prephase, and in practice the number of patients in the equal randomization stage is often arbitrary. Another concern that is associated with real-time response-adaptive randomization is that trial conduct often requires patients' responses to be immediately available after the treatment, whereas clinical responses may take a relatively long period of time to exhibit. To resolve these two issues, we propose a two-stage procedure to achieve a balance between power and response, which is equipped with a likelihood ratio test before skewing the allocation probability towards a better treatment. Furthermore, we develop a non-parametric fractional model and a parametric survival design with an optimal allocation scheme to tackle the common problem caused by delayed response. We evaluate the operating characteristics of the two-stage designs through extensive simulation studies and illustrate them with a human immunodeficiency virus clinical trial. Numerical results show that the methods proposed satisfactorily resolve the arbitrary size of the equal randomization phase and the delayed response problem in response-adaptive randomization.

Keywords: Censoring; Delayed response; Exponential distribution; Kaplan–Meier estimator; Likelihood ratio test; Redistribution to the right; Response-adaptive randomization

1. Introduction

During recent decades, various response-based adaptive randomization (AR) methods have been developed with a primary goal of treating more patients in superior treatment arms. The fundamental idea underlying AR is to assign more patients to better treatments by skewing the allocation probability on the basis of the information that is accumulated in the trial. Pioneering work, including Thompson (1933), Robbins (1952), Feldman (1962) and Zelen (1969), has demonstrated the value of AR so that patients would be treated more effectively in the trial.

Rosenberger and Lachin (2002) outlined two major classes of AR procedures. One is design-driven AR, in which the allocation rule is built on an intuitive perception. For example, the urn model is a commonly used procedure in this class, which originated from the play-the-winner rule (Zelen, 1969). The play-the-winner rule determines each treatment assignment on the basis of the outcome of the previous patient. If a success response is observed for the previous patient, the next patient will be assigned to the same treatment; otherwise the treatment is switched to the alternative treatment. To avoid the deterministic treatment allocation, Wei and Durham (1978) proposed a randomized play-the-winner rule, which gives a higher randomization probability to the treatment that has produced a success response. The randomized play-the-winner rule can be represented as an urn model, where patients are assigned to a treatment corresponding to a randomly selected ball from the urn, while putting the same type of balls back into the urn to reward success. The urn model can also be used to characterize the drop-the-loser rule (Ivanova et al., 2000), which takes out balls when treatment failure is observed. The other class of AR methods is target driven, which is established on an optimization criterion. Target-driven AR leads to an optimal allocation ratio by optimizing a criterion function. For example, we can calculate the optimal allocation ratio by minimizing the variance (equivalently, maximizing power), or by minimizing the expected number of non-responders in a trial. More specifically, in a two-arm trial with binary end points, let p1 and p2 denote the response rates of treatments 1 and 2 respectively. By minimizing the variance of the difference between the estimates of p1 and p2, the allocation ratio between arm 1 and arm 2 is $\sqrt{p_1(1-p_1)}/\sqrt{p_2(1-p_2)}$, which is known as Neyman's allocation (Yin, 2012). In contrast, by minimizing the expected number of non-responders while fixing the variance, the allocation ratio becomes $\sqrt{p_1}/\sqrt{p_2}$ (Rosenberger et al., 2001). For continuous data, let μ1 and μ2 denote the means of two normal distributions, and let σ1² and σ2² denote the corresponding variances. In this case, Neyman's allocation ratio is σ1/σ2, which minimizes the variance. For the case where a smaller response is preferred, Zhang and Rosenberger (2005) proposed an optimal allocation ratio of $\sigma_1\sqrt{\mu_2}/(\sigma_2\sqrt{\mu_1})$ by minimizing the total expected response from all patients. Under the same set-up, Biswas and Mandal (2004) considered the use of a probit transformation and minimizing

$$n_1\,\Phi\!\left(\frac{\mu_1-c}{\sigma_1}\right)+n_2\,\Phi\!\left(\frac{\mu_2-c}{\sigma_2}\right),$$

where Φ(·) is the cumulative distribution function of the standard normal distribution, and c is a threshold (a response larger than c is considered undesirable). This results in an allocation ratio of

$$\frac{\sigma_1\sqrt{\Phi\{(\mu_2-c)/\sigma_2\}}}{\sigma_2\sqrt{\Phi\{(\mu_1-c)/\sigma_1\}}}.$$
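For illustration, the sketch below evaluates these binary-outcome allocation targets and converts a target ratio into the probability of assigning the next patient to arm 1. The function names and the plugged-in estimates are our own illustrative choices rather than part of any cited design.

```python
import numpy as np
from scipy.stats import norm

def neyman_ratio(p1, p2):
    """Neyman allocation n1/n2 = sqrt{p1(1-p1)}/sqrt{p2(1-p2)}:
    minimizes var(p1_hat - p2_hat) for a fixed total sample size."""
    return np.sqrt(p1 * (1 - p1)) / np.sqrt(p2 * (1 - p2))

def rsihr_ratio(p1, p2):
    """Rosenberger et al. (2001) allocation n1/n2 = sqrt(p1)/sqrt(p2):
    minimizes the expected number of non-responders for a fixed variance."""
    return np.sqrt(p1 / p2)

def probit_ratio(mu1, sigma1, mu2, sigma2, c):
    """Biswas and Mandal (2004) allocation for continuous outcomes where a
    response larger than the threshold c is undesirable."""
    return (sigma1 * np.sqrt(norm.cdf((mu2 - c) / sigma2))
            / (sigma2 * np.sqrt(norm.cdf((mu1 - c) / sigma1))))

def allocation_probability(ratio):
    """Convert an allocation ratio n1/n2 into the probability of assigning
    the next patient to arm 1."""
    return ratio / (1 + ratio)

if __name__ == "__main__":
    p1_hat, p2_hat = 0.4, 0.2          # hypothetical current response-rate estimates
    print(allocation_probability(neyman_ratio(p1_hat, p2_hat)))
    print(allocation_probability(rsihr_ratio(p1_hat, p2_hat)))
```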

By taking both the optimal target and the available allocation result into consideration, the doubly biased coin design modifies the optimal AR procedure through a family of two-variable functions, which maps the target and the currently observed composition into the allocation ratio for new patients (Eisele, 1994; Eisele and Woodroofe, 1995; Hu and Zhang, 2004). Since the optimal allocation procedure involves unknown design parameters, the allocation ratio is obtained by substituting the unknown parameters with the current parameter estimates at the randomization time, which is known as the sequential maximum likelihood procedure. Berry and Eick (1995) compared four different AR procedures including a two-arm bandit problem and a robust Bayesian method. Melfi et al. (2001) developed an optimal proportion for patient allocation and studied its asymptotic properties. Karrison et al. (2003) studied group sequential methods in response-adaptive randomized clinical trials. Cheng and Berry (2007) investigated constrained optimal adaptive randomization designs by using backward induction. For a systematic review and theory on AR methods, see Hu and Rosenberger (2006).

Although the AR procedures can generally allocate more patients to better treatments, some concerns cast doubt on their practical implementation. One major issue is the instability of the estimators at the beginning of a trial due to sparse information. Although adding a prephase of equal randomization (ER) may help to facilitate the initiation of trial conduct, it is often not clear how many subjects are needed in this ER stage. The size of ER is limited by the total sample size and also depends on the response rates of the treatments under comparison. Intuitively speaking, we may start AR as soon as the estimators become stable after a sufficient number of patients have been enrolled. Moreover, if the treatment difference is large, we hope to start AR sooner by allocating a smaller proportion of patients in the ER stage. Unfortunately, in practice there is no standard guideline on how to find an appropriate switching point from ER to AR.

The other concern about AR is caused by the potential response delay, which may hamper the real-time implementation of AR. Most of the existing AR schemes require that the end point be observed quickly after the treatment, such that, by the time of the next treatment assignment, all the currently treated patients have complete information on their response outcomes. Nevertheless, delayed responses are commonly encountered in clinical trials. For instance, in a phase II oncology trial, the possible primary end points include more than 50% shrinkage of a solid tumour compared with the baseline measurement, and partial or complete response (a 30% decrease in the sum of the longest diameters of target lesions, or disappearance of all target lesions). These end points often exhibit long after the treatments have been administered and thus might not be available at the time of randomization. If such delayed information is not taken into consideration, the estimated response rate could be biased, which then leads to incorrect calculation of the randomization probability for AR.

Fig. 1 shows a common scenario of delayed response, in which patients are expected to respond in a (0, τ) window, instead of immediately after treatment. The accrual rate is one cohort every a weeks (τ = 6a), and the cohort size is 4. Under this set-up, when a new cohort of patients is ready to start treatment, some of the currently treated patients still have not completed assessment and thus their response outcomes are not available. Such inevitable missing response data would leave the trial conduct in a dilemma: either the allocation of the next cohort of patients is based solely on the observed efficacy outcomes, or the accrual of patients is suspended until the full follow-up for all patients who are currently being treated in the trial. However, neither of these two strategies appears to be a sensible solution.

Fig. 1. Illustration of censored outcomes with τ = 6a (every a weeks, a cohort of four patients enters the trial); horizontal segments show each patient's follow-up period, ending in either an observed event or a censored observation.

For delayed response, it is natural to consider the survival end point, i.e. the time to the event of interest in a trial. For studies involving survival data, Flehinger and Louis (1971) considered exponential survival end points in non-randomized response-adaptive designs, in which they incorporated a sequential likelihood ratio test in the allocation scheme. Louis (1977) discussed sequential allocation when comparing two exponential survival curves. Yao and Wei (1996) suggested a multistage design for failure time data based on the play-the-winner rule. Hardwick et al. (2006) proposed an optimal algorithm for a two-arm bandit model by assuming an exponential distribution for the delayed response time. Zhang and Rosenberger (2007) proposed a parametric AR procedure under exponential and Weibull distributions by minimizing the number of non-responders. Bandyopadhyay et al. (2009) studied a Bayesian two-stage design for AR under the exponential distribution, in which the sample size is re-estimated at the end of stage 1 to achieve certain design requirements. More recently, Sverdlov et al. (2011) developed three optimal allocation schemes in multiarm survival trials for estimation, hypothesis testing and treatment comparison.

Our research is motivated by a randomized clinical trial (known as the P1060 study) comparing nevirapine with ritonavir-boosted lopinavir, in combination with the standard treatments zidovudine and lamivudine, in children aged 2–36 months with human immunodeficiency virus (HIV) (Violari et al., 2012). To examine rigorously the performance of the two combinations of drugs, the P1060 trial was conducted in six African countries and India. Before this study, nevirapine was almost the only option for HIV-infected infants, which raised the chance of nevirapine resistance after the mothers' exposure to the drug (for preventing maternal-to-child HIV transmission). In the study of maternal-to-child HIV transmission, one of the most important trials conducted by the AIDS Clinical Trial Group (ACTG) is the ACTG 076 trial, which examined whether zidovudine could reduce HIV transmission from infected mothers to their infants (Connor et al., 1994). The original trial design was based on fixed balanced randomization, which assigned about a half of the women to the placebo group. This ER design unfortunately resulted in a large number of HIV positive babies, who otherwise could have been saved if their mothers had been assigned to the zidovudine group. As a result, it has been recommended that the interim results in the ACTG 076 trial should have been used for adaptive randomization to reduce the number of patients in the placebo group (Yao and Wei, 1996). Similarly, for the P1060 trial, we explored the possibility of using the optimal AR methods with a goal of reducing the number of treatment failures in the trial. The primary end point of the P1060 trial was virologic failure or discontinuation of treatment by study week 24. However, with such a long evaluation window, missing or censored data might arise if new participants entered the trial before the completion of the 24-week follow-up for currently treated patients.

To study the influence of delayed response, Rosenberger and Seshaiyer (1997) considered urn models through simulation studies. Hu et al. (2008) provided the operating characteristics of the doubly biased coin design by discarding the delayed responses, and they showed that ignoring the delayed outcomes still results in more patients being randomized to the better performing treatment. To utilize the available data in a trial fully, we develop an extension of the response-based AR design by naturally incorporating censored data. We model the underlying time to response through the Kaplan–Meier estimator by redistributing the mass of each censored observation to the right (Kaplan and Meier, 1958; Efron, 1967). Within the evaluation period (0, τ), if a patient has responded, then the efficacy outcome Y takes a value of 1; if a patient has not responded by the time of decision making for randomization, we observe a censored outcome. By redistributing the mass of each censored observation to the right, we obtain a fraction of 1 as the contribution of the censored observation, which depends on the patient's actual follow-up time. In addition, before the initiation of AR, we implement an ER stage to allocate patients with equal probabilities and at the same time continuously update a likelihood ratio test statistic. Once the test statistic exceeds a certain threshold, we switch patient allocation from ER to AR. From then on, all the remaining patients in the enrolment are allocated through the AR procedure. Therefore, if there is indeed a superior treatment, we can protect the statistical power while adaptively randomizing more patients to the better treatment. We also consider modelling the time to the event of interest directly and propose a new optimal allocation scheme. In contrast with that developed by Zhang and Rosenberger (2007), the optimal allocation proposed enjoys the invariance property, i.e. the optimal allocation ratio is invariant to parameter transformations, which turns out to be an important property; otherwise, the allocation could be completely opposite to the desired target.

The remainder of the paper is organized as follows. In Section 2, we introduce a new adaptive allocation scheme based on the likelihood ratio test to determine the sample size of the ER stage, which strikes a trade-off between power and AR. In Section 3, we propose the non-parametric fractional scheme to address the issue of delayed response. In Section 4, we study the parametric model with the survival end point and derive a new optimal allocation ratio for AR. In Section 5, we present simulation studies to examine the operating characteristics of the new designs, as well as conducting a sensitivity analysis to explore various parametric modelling structures. We also illustrate the proposed designs with a paediatric HIV clinical trial. We conclude with a brief discussion in Section 6.

The program that was used to analyse the data can be obtained from

http://wileyonlinelibrary.com/journal/rss-datasets

2. Two-stage response-adaptive randomization

In a typical clinical trial for treatment comparison, the scientific goal is to identify the superior treatment as soon as possible, whereas the ethical concern calls for treating every patient effectively. Often, ER provides the most efficient way to achieve the desired statistical power (Azriel et al., 2012). However, response-based AR may help to treat each patient in the trial more effectively. In fact, before the implementation of AR, a prerun of ER is typically used to stabilize the parameter estimates. However, it is not clear how long this prerun of ER should be and, in general, the chosen prerun sample size is arbitrary without any statistical justification. To gain more insight into this issue, we propose a two-stage response-based AR scheme.

For ease of exposition, we consider a two-arm clinical trial with binary end points. Patients enter the trial sequentially over time, and each patient is assigned to one of the two treatments on arrival. Let Y1i denote the outcome of patient i in treatment arm 1, so Y1i ∼ Bernoulli(p1), for i = 1, ..., n1; similarly, Y2i ∼ Bernoulli(p2), for i = 1, ..., n2, where p1 and p2 are the corresponding response rates of the two treatments. The typical null and alternative hypotheses are formulated as

$$H_0: p_1 = p_2 \quad \text{versus} \quad H_1: p_1 \neq p_2.$$

The trial starts with ER and continuously makes decisions on whether to switch to AR as more data are collected. Suppose that we have enrolled m patients in each arm thus far, and we denote the observed data in arm 1 and arm 2 by y1i and y2i, i = 1, ..., m. The likelihood ratio test statistic can be written as


$$T_m = -2\log\left\{\frac{\max_{H_0:\, p_1=p_2=p}\; p^{\sum_{i=1}^{m}(y_{1i}+y_{2i})}(1-p)^{\sum_{i=1}^{m}(2-y_{1i}-y_{2i})}}{\max_{p_1,p_2}\; p_1^{\sum_{i=1}^{m}y_{1i}}(1-p_1)^{\sum_{i=1}^{m}(1-y_{1i})}\,p_2^{\sum_{i=1}^{m}y_{2i}}(1-p_2)^{\sum_{i=1}^{m}(1-y_{2i})}}\right\}. \tag{1}$$

Under the null hypothesis, the likelihood ratio test statistic asymptotically follows a χ²-distribution with 1 degree of freedom, i.e. $T_m \sim \chi^2_{(1)}$. In practice, we first obtain the maximum likelihood estimators of the response rates of the two treatments under the null and alternative hypotheses. We then compute the observed value of $T_m$ by plugging the maximum likelihood estimators of p1 and p2 into equation (1), and the rejection region is defined as $T_m > \chi^2_{(1)}(1-\tilde\alpha)$, where α̃ is the prespecified level of significance for the switching rule and $\chi^2_{(1)}(1-\tilde\alpha)$ is the 100(1 − α̃)th percentile of the $\chi^2_{(1)}$-distribution.

The two-stage AR trial proceeds as follows.

(a) In stage 1, the trial begins with ER and continuously updates the likelihood ratio test statistic after enrolling every new patient. If $T_m < \chi^2_{(1)}(1-\tilde\alpha)$, ER remains in force; otherwise, the trial proceeds to stage 2.
(b) In stage 2, we start to implement response-based AR for each patient on the basis of an optimal allocation ratio, e.g. using $\sqrt{p_1}/\sqrt{p_2}$ as the allocation ratio to minimize the number of non-responders, where p1 and p2 are replaced by their respective maximum likelihood estimators (a code sketch of this rule is given below).

The asymptotic χ²-distribution of the likelihood ratio statistic Tm may not be accurate when the sample size is small or the data are unequally distributed among treatments. At the beginning of a trial with sparse data, Fisher's exact test could be more accurate, as it is the exact likelihood ratio test (Cox and Hinkley, 1974). The choice of α̃ is somewhat subjective; it can be interpreted as a threshold level for switching from ER to AR. Via the sensitivity analysis on α̃ in Section 5.2, we show that the value of α̃ affects the sample size in the ER stage, nE. If the treatment difference is large, nE would be small, so the trial moves to AR quickly; and, if the treatment difference is small, nE would be large, as ER and AR are not much different, so it would take a longer time before switching to AR. Therefore, by controlling α̃, the two-stage design can automatically adapt to the real situation, and thus it reduces a certain amount of arbitrariness. In contrast, if we fixed the sample size nE in the prephase, it would not be adjustable to the treatment difference.

3. Non-parametric fractional model for delayed response

Patient response may not be immediately ascertainable after treatment; for example, it often takes a relatively long period of time to observe 50% tumour shrinkage. The missingness or censoring of responses poses immense difficulties when applying response-based AR during the trial conduct. One possibility is simply to discard the missing response data and to compute the patient allocation ratio solely on the basis of the observed data. However, this strategy is not efficient and often leads to biased parameter estimation. Instead, if we view the efficacy end point as an event of interest, we can model the time to efficacy by using the Kaplan–Meier estimator of the survival function and fractionalize the censored observations on the basis of patients' exposure times in the trial. More specifically, we follow each patient from the time of trial participation to τ, where τ is a prespecified assessment period. If a drug-related efficacy event occurs, it is expected to occur within the observation window [0, τ].

$$Y=\begin{cases}0 & \text{if the subject does not respond within } [0,\tau],\\ 1 & \text{if the subject responds within } [0,\tau].\end{cases} \tag{2}$$


Therefore, if we denote the survival function of the response time T as S(t) = Pr(T > t), then Y ∼ Bernoulli{1 − S(τ)}. Fig. 1 shows that, if the efficacy assessment period τ is longer than the accrual window a, say τ = 6a, by the time that a new cohort is ready for treatment, some of the patients in the trial may be only partially followed and their efficacy outcomes have not yet been observed. This would cause difficulty in calculating the allocation ratio, which determines the probability of assigning the next patient to each arm.

Without loss of generality, we consider patient i in treatment arm 1; the results for arm 2 can be derived similarly. Let T1i denote the time to efficacy, and let u1i (u1i ≤ τ) denote the actual follow-up time for subject i in arm 1. The patient's response is censored if he or she has not responded (u1i < T1i) and also has not been fully followed up to τ (u1i < τ). If there is no response up to time τ, then y1i = 0; if we observe an efficacy event before τ, then y1i = 1; and, if we observe a censored event before τ, i.e. efficacy has not occurred yet, we can obtain a fraction of 1 as the contribution of the censored observation to the response probability. In particular, for a censored observation, one part of the point mass is assigned to the censored point and the other part can be assigned anywhere that is larger than τ. Only the weight that is assigned to the censored time point within (0, τ) is counted towards the response probability, and that assigned beyond τ is disregarded. If subject i is censored by the decision-making time u1i, we take the fractional contribution to be

$$\Pr(T_{1i}<\tau \mid T_{1i}>u_{1i})=\frac{\Pr(u_{1i}<T_{1i}<\tau)}{\Pr(T_{1i}>u_{1i})}. \tag{3}$$

The weight or the fraction in equation (3) has a meaningful interpretation as the conditional probability of experiencing efficacy within (u1i, τ) given that the efficacy event has not occurred yet in (0, u1i]. Therefore, we can define a fractional contribution for a censored efficacy outcome as

$$\hat{y}_{1i}=\frac{\hat{S}_1(u_{1i})-\hat{S}_1(\tau)}{\hat{S}_1(u_{1i})}, \tag{4}$$

where $\hat{S}_1(\cdot)$ is the Kaplan–Meier estimator based on all the data observed in treatment arm 1. It is worth noting that, if no efficacy is observed within (0, τ), then y1i = 0 by the definition in equation (2), and it also holds that $\hat{y}_{1i} = 0$ by equation (4), i.e. the actual observed efficacy outcome y1i and the fractional efficacy outcome $\hat{y}_{1i}$ coincide. In other words, subjects who have not responded by time τ (i.e. y1i = 0) can be considered as censored at τ, and they still contribute to the risk set in the Kaplan–Meier estimator. It is possible that the outcome of interest may never occur for some patients in the trial, particularly if a patient's disease status has changed (or progressed) and the original study end point is not applicable to this patient any more. In this case, the patient should be excluded from the study instead of being treated as a censored observation at τ. In general, patients' arrival times are independent of the times to efficacy owing to staggered entry of patients into the trial, which automatically satisfies the usual independent censoring assumption.

During the trial conduct, we can estimate the response rate p1 on the basis of a new response variable r1i, where r1i = 1 for all the patients who have responded to treatment 1, r1i = 0 for those who have not responded by time τ and $r_{1i} = \hat{y}_{1i}$ for those who are censored but still contribute some fraction of 1 to the response probability. Suppose that n1 patients have been treated in arm 1; then the estimate of the corresponding response rate is given by

$$\hat{p}_1=\sum_{i=1}^{n_1} r_{1i}\Big/ n_1,$$


where

$$r_{1i}=\begin{cases}0 & \text{if patient } i \text{ does not respond},\\ 1 & \text{if patient } i \text{ has responded},\\ \hat{y}_{1i} & \text{if the response of patient } i \text{ is censored}.\end{cases}$$
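To make the fractional scheme concrete, here is a small self-contained sketch in plain numpy (no survival library); `follow_up` holds the observed response time for responders and the current exposure time for everyone else, and all names are illustrative rather than part of the design.

```python
import numpy as np

def km_survival(times, events):
    """Kaplan-Meier estimator; returns the step function S_hat(t) as a callable.
    times: response or censoring times; events: 1 = response observed."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    grid, surv_vals = [0.0], [1.0]
    surv, at_risk = 1.0, len(times)
    for t in np.unique(times):                          # increasing distinct times
        d = int(np.sum((times == t) & (events == 1)))   # responses at time t
        if d > 0:
            surv *= 1.0 - d / at_risk
            grid.append(float(t))
            surv_vals.append(surv)
        at_risk -= int(np.sum(times == t))              # drop events and censorings
    grid, surv_vals = np.asarray(grid), np.asarray(surv_vals)
    return lambda t: surv_vals[np.searchsorted(grid, t, side="right") - 1]

def fractional_outcomes(follow_up, responded, tau):
    """r_i of Section 3: 1 for responders, 0 for patients fully followed to tau
    without response, and {S(u) - S(tau)}/S(u) for patients censored at u < tau."""
    follow_up, responded = np.asarray(follow_up, float), np.asarray(responded, int)
    S = km_survival(np.minimum(follow_up, tau), responded)
    censored = (responded == 0) & (follow_up < tau)
    frac = (S(follow_up) - S(tau)) / np.maximum(S(follow_up), 1e-12)
    return np.where(censored, frac, responded.astype(float))

# hypothetical arm-1 data: three responders, one non-responder followed to tau = 12,
# and two patients censored after 4 and 8 weeks of follow-up
r1 = fractional_outcomes([3, 5, 9, 12, 4, 8], [1, 1, 1, 0, 0, 0], tau=12)
p1_hat = r1.mean()            # estimated response rate based on the r_i
```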

4. Parametric survival model with time-to-event end point

It is possible to use the time-to-event end point directly for AR in a survival trial. Zhang and Rosenberger (2007) developed an optimal allocation scheme under the assumption of parametric survival models. Let T denote the survival time; under an exponential model the survival function of T is given by

$$S_j(t)=\exp(-\lambda_j t)=\exp(-t/\theta_j), \qquad j=1,2,$$

where λj is the constant hazard rate for treatment arm j, and θj = 1/λj is the mean survival time.

If we compare the two groups with respect to their mean survival times θ1 and θ2, the hypotheses are formulated as

$$H_0: \theta_1=\theta_2 \quad \text{versus} \quad H_1: \theta_1\neq\theta_2.$$

Suppose that the total sample size of the trial is n = n1 + n2 and the allocation ratio between arm 1 and arm 2 is r = n1/n2. We can construct a Wald test statistic,

$$Z_\theta=\frac{\hat{\theta}_1-\hat{\theta}_2}{\sqrt{\hat{\theta}_1^2\Big/\sum_{i=1}^{n_1}\Delta_{1i}+\hat{\theta}_2^2\Big/\sum_{i=1}^{n_2}\Delta_{2i}}},$$

where Δ1i and Δ2i are the censoring indicators in group 1 and group 2 respectively. Denote δ1 = E(Δ1i) and δ2 = E(Δ2i); the variance of $\hat{\theta}_1-\hat{\theta}_2$ is given by

$$\mathrm{var}(\hat{\theta}_1-\hat{\theta}_2)=\frac{\theta_1^2}{n_1\delta_1}+\frac{\theta_2^2}{n_2\delta_2}. \tag{5}$$

In contrast, if we compare the hazard rates λ1 and λ2 between the two groups, we formulate the hypotheses as

$$H_0: \lambda_1=\lambda_2 \quad \text{versus} \quad H_1: \lambda_1\neq\lambda_2.$$

We can construct another Wald test statistic,

$$Z_\lambda=\frac{\hat{\lambda}_1-\hat{\lambda}_2}{\sqrt{\hat{\lambda}_1^2\Big/\sum_{i=1}^{n_1}\Delta_{1i}+\hat{\lambda}_2^2\Big/\sum_{i=1}^{n_2}\Delta_{2i}}},$$

and the corresponding variance is given by

$$\mathrm{var}(\hat{\lambda}_1-\hat{\lambda}_2)=\frac{\lambda_1^2}{n_1\delta_1}+\frac{\lambda_2^2}{n_2\delta_2}. \tag{6}$$

As in the survival analysis context, we take the patient response to be a bad event, e.g. failure or death. Zhang and Rosenberger (2007) obtained the optimal allocation ratio by minimizing the total expected hazard $n_1\theta_1^{-1}+n_2\theta_2^{-1}$ with respect to r, subject to fixing the variance in equation (5) as a constant:

$$r_{\mathrm{ZR},\theta}=\sqrt{\theta_1^3\delta_2}\big/\sqrt{\theta_2^3\delta_1}.$$


In contrast, if we use a different parameterization with λ and minimize the same objective function $n_1\lambda_1+n_2\lambda_2$ subject to fixing the variance in equation (6), the optimal allocation ratio changes to

$$r_{\mathrm{ZR},\lambda}=\sqrt{\lambda_1\delta_2}\big/\sqrt{\lambda_2\delta_1}.$$

The above two optimal allocation ratios are derived under the same parametric survival model, but by using different parameterizations: the first optimization is based on the mean survival time θ, whereas the second is based on the hazard λ, with θ = 1/λ. Through such a simple one-to-one parameter transformation, the optimization procedure results in two different allocation ratios $r_{\mathrm{ZR},\theta}$ and $r_{\mathrm{ZR},\lambda}$, which in fact would lead to opposite trends of allocation, i.e. one allocation ratio ($r_{\mathrm{ZR},\theta}$) assigns more patients to the better arm, whereas the other ($r_{\mathrm{ZR},\lambda}$) results in more patients being treated in the worse arm. Therefore, the method of Zhang and Rosenberger (2007) does not have the invariance property in terms of a one-to-one transformation of the model parameter.

In contrast, if the patient response is a good event, as we have discussed in Section 3, then the sooner patients experience the event, the better. For example, the sooner that leukaemia patients achieve remission after treatment, the more likely they would be to survive (Estey et al., 2000). For this case, we adopt a more meaningful optimality criterion: to minimize the total number of patients who have not responded within the assessment window (0, τ), subject to fixing the variance of the difference of the estimated survival probabilities at τ. Let K be a generic symbol for a constant. We derive the optimal allocation ratio by minimizing $n_1 S_1(\tau,\lambda_1)+n_2 S_2(\tau,\lambda_2)$ subject to fixing $\mathrm{var}\{\hat{S}_1(\tau,\hat{\lambda}_1)-\hat{S}_2(\tau,\hat{\lambda}_2)\}=K$.

Suppose that the time to response follows an exponential distribution with survival function $S_j(t,\lambda_j)=\exp(-\lambda_j t)$, $j=1,2$. We can calculate the variance via the delta method,

$$\mathrm{var}\{\hat{S}_1(\tau,\hat{\lambda}_1)-\hat{S}_2(\tau,\hat{\lambda}_2)\}\approx\frac{\lambda_1^2}{n_1\delta_1}\tau^2\exp(-2\lambda_1\tau)+\frac{\lambda_2^2}{n_2\delta_2}\tau^2\exp(-2\lambda_2\tau), \tag{7}$$

which leads to the variance constraint

$$\frac{\lambda_1^2}{n_1\delta_1}\exp(-2\lambda_1\tau)+\frac{\lambda_2^2}{n_2\delta_2}\exp(-2\lambda_2\tau)=K.$$

As a result, our optimal allocation ratio is given by

$$r_{\mathrm{New},S}=\frac{\lambda_1\sqrt{\delta_2\exp(-\lambda_1\tau)}}{\lambda_2\sqrt{\delta_1\exp(-\lambda_2\tau)}}.$$
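For completeness, the delta-method step behind equation (7) can be spelled out as follows; the per-arm variance var(λ̂j) ≈ λj²/(nj δj) used here is the same quantity that appears in the Wald statistic Zλ.

```latex
% Delta-method step behind equation (7): a sketch.
\[
\hat S_j(\tau,\hat\lambda_j) = \exp(-\hat\lambda_j \tau), \qquad
\frac{\partial S_j(\tau,\lambda_j)}{\partial \lambda_j} = -\tau \exp(-\lambda_j \tau),
\]
\[
\operatorname{var}\{\hat S_j(\tau,\hat\lambda_j)\}
  \;\approx\; \{\tau \exp(-\lambda_j\tau)\}^2 \, \operatorname{var}(\hat\lambda_j)
  \;=\; \frac{\lambda_j^2}{n_j \delta_j}\, \tau^2 \exp(-2\lambda_j\tau),
  \qquad j = 1, 2.
\]
```

Summing the two independent arms gives the right-hand side of equation (7); the common factor τ² is then absorbed into the constant K, which is why it does not appear in the variance constraint above.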

As shown in Appendix A, when the sample size is sufficiently large and both p1 and p2 are small, the new optimal allocation ratio can be approximated by that of the binary case, i.e. $r_{\mathrm{New},S}\approx\sqrt{p_1}/\sqrt{p_2}$. Such an approximation is generally accurate when both p1 and p2 are smaller than 0.5.

Furthermore, the optimal allocation ratio $r_{\mathrm{New},S}$ is invariant under any one-to-one transformation of the model parameter. Such an invariance property originates from the fact that the variance of the survival function does not change under a simple parameter transformation g(·), i.e. $\mathrm{var}[\hat{S}_j\{\tau; g(\hat{\lambda}_j)\}]=\mathrm{var}\{\hat{S}_j(\tau;\hat{\lambda}_j)\}$, $j=1,2$. By contrast, if we take g(x) = 1/x, then $\mathrm{var}(\hat{\theta}_j)=\mathrm{var}\{g(\hat{\lambda}_j)\}\neq\mathrm{var}(\hat{\lambda}_j)$, $j=1,2$, which results in different optimal allocation ratios: $r_{\mathrm{ZR},\theta}$ versus $r_{\mathrm{ZR},\lambda}$.
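The following sketch evaluates $r_{\mathrm{New},S}$, $r_{\mathrm{ZR},\theta}$ and $r_{\mathrm{ZR},\lambda}$ for an exponential scenario; treating δj as the probability of observing the event under administrative censoring at τ, and the particular hazards chosen, are illustrative assumptions only.

```python
import numpy as np

def event_prob(lam, tau):
    """Assumed delta_j: probability of observing the event when follow-up is
    administratively censored at tau (illustrative choice)."""
    return 1 - np.exp(-lam * tau)

def r_new_s(lam1, lam2, tau):
    """Proposed ratio r_{New,S}: minimizes n1*S1(tau) + n2*S2(tau)."""
    d1, d2 = event_prob(lam1, tau), event_prob(lam2, tau)
    return (lam1 * np.sqrt(d2 * np.exp(-lam1 * tau))
            / (lam2 * np.sqrt(d1 * np.exp(-lam2 * tau))))

def r_zr_theta(lam1, lam2, tau):
    """Zhang-Rosenberger ratio based on mean survival times theta_j = 1/lambda_j."""
    th1, th2 = 1 / lam1, 1 / lam2
    d1, d2 = event_prob(lam1, tau), event_prob(lam2, tau)
    return np.sqrt(th1 ** 3 * d2) / np.sqrt(th2 ** 3 * d1)

def r_zr_lambda(lam1, lam2, tau):
    """Zhang-Rosenberger ratio based on hazards lambda_j."""
    d1, d2 = event_prob(lam1, tau), event_prob(lam2, tau)
    return np.sqrt(lam1 * d2) / np.sqrt(lam2 * d1)

if __name__ == "__main__":
    lam1, lam2, tau = 1 / 12, 1 / 6, 12     # hypothetical: arm 1 has the smaller hazard
    for name, r in [("r_New,S", r_new_s(lam1, lam2, tau)),
                    ("r_ZR,theta", r_zr_theta(lam1, lam2, tau)),
                    ("r_ZR,lambda", r_zr_lambda(lam1, lam2, tau))]:
        print(f"{name}: n1/n2 = {r:.2f}")
    # the two ZR ratios fall on opposite sides of 1 for the same scenario
```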

5. Numerical studies

5.1. Simulations
We investigated the operating characteristics of the proposed two-stage fractional AR design with efficacy end points through simulation studies. For simplicity, we considered a two-arm trial with binary outcomes. Traditional AR typically incorporates a prephase of ER until we observe at least one response in each treatment arm. Similarly, the two-stage AR design proposed also involves both ER and AR stages, and employs sequential likelihood ratio tests to examine whether there is sufficient information to support a switch from ER to AR. Once AR takes place, the allocation ratio of AR is set to be $\sqrt{p_1}/\sqrt{p_2}$ so as to minimize the expected number of non-responders, where p1 and p2 are the response rates of the two treatments, estimated by using the sequential maximum likelihood procedure. The assessment period for efficacy was τ = 12 weeks, and the accrual time interval between two consecutive cohorts was a = 1 week, i.e. every week a new cohort (four patients) would enter the trial. The sample size was calculated on the basis of the prespecified type I and type II error rates, α = 0.1 and β = 0.2, for a two-sided test. The total sample size was n = 132 under the alternative hypothesis with p1 = 0.2 and p2 = 0.4, and n = 146 with p1 = 0.3 and p2 = 0.5. We fixed the threshold level α̃ = 0.3 and assumed that the time to response in treatment j followed a Weibull distribution with survival function $S_j(t;\lambda_j,\gamma_j)=\exp\{-(\lambda_j t)^{\gamma_j}\}$, $j=1,2$. We fixed γ1 = 1 and γ2 = 5, and then computed λj according to

$$1-\exp\{-(\lambda_j\tau)^{\gamma_j}\}=p_j, \qquad j=1,2,$$

such that a patient may respond with probability pj by the end of the evaluation time τ. The trial would be terminated after the last cohort of patients had been fully followed for τ weeks. As shown in Fig. 2, the cumulative distribution function of the time to efficacy evaluated at τ is p1 = 0.4 and p2 = 0.2 for arm 1 and arm 2 respectively. Although arm 1 doubles the response rate of arm 2 at time τ, the response probability of arm 2 clearly surpasses that of arm 1 before week 10.

Fig. 2. Cumulative distribution functions of Weibull distributions with the response probability at time τ being 0.4 for arm 1 and 0.2 for arm 2: arm 1, T ∼ Weibull(γ = 0.5, λ = 0.0729); arm 2, T ∼ Weibull(γ = 1, λ = 0.0186)
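The scale parameters used in the simulation follow directly from the constraint 1 − exp{−(λj τ)^γj} = pj; a short sketch of this calculation (function and variable names are ours):

```python
import numpy as np

def weibull_scale(p, gamma, tau):
    """Solve 1 - exp{-(lambda*tau)^gamma} = p for lambda, so that a patient
    responds by the end of the assessment window tau with probability p."""
    return (-np.log(1 - p)) ** (1 / gamma) / tau

# e.g. an exponential arm (gamma = 1) with a 20% response rate by week 12
print(weibull_scale(0.2, 1, 12))   # about 0.0186, the arm 2 value quoted in Fig. 2
```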


Generally speaking, we need to observe all the efficacy data of the treated patients completely to implement AR. The complete-data AR procedure follows each subject until the occurrence of response or the end of the assessment period before randomizing each new patient. In contrast, the fractional design utilizes the scheme of redistribution to the right for censored data, so that each patient is immediately randomized on arrival. We also considered the observed data AR procedure, which is based on the observed efficacy data only, while treating censored patients (who have neither responded nor been fully followed yet) as non-responders. Table 1 shows 10 different scenarios, and each is based on 10000 simulated trials. Under each scenario, the first row shows the simulation results for the two-stage observed data AR procedure, and the other two rows correspond to the two-stage complete-data design and the two-stage fractional design. In particular, we present the percentage of patients who are allocated to arm 1, the corresponding standard deviation, the total number of patients who have responded, the statistical power, the duration of the trial (in weeks) and the number of patients allocated in the ER stage (denoted as nE).

The first five scenarios in Table 1 correspond to trials with a total sample size of n = 132 and a fixed response rate for treatment 2, p2 = 0.2. In scenario 1, the two treatments have the same response rate p1 = p2 = 0.2, which indicates that the null hypothesis is the truth, and all three designs maintained the type I error rate at α = 0.1. The fractional and complete-data designs performed similarly in terms of the allocation ratio, whereas the observed data AR design falsely assigned more patients to arm 2 because a much higher response rate in arm 2 was observed at the beginning of the follow-up (before week 10, as shown in Fig. 2). The number of patients who were allocated by ER (nE) in this scenario is the largest, as the two-stage designs would automatically extend the ER stage when there is no treatment difference. In scenario 2, the difference in the two response rates is very small, i.e. p1 − p2 = 0.1, and the number of patients who were allocated in the ER stage decreased slightly compared with that in scenario 1. Scenario 3 corresponds to the alternative hypothesis, which thus has the target power of 80% under all three designs. In terms of the allocation ratio, there is not much difference between the fractional and the complete-data designs, whereas both of them are better than the observed data design. As the difference between the two response rates increases in scenarios 4 and 5, the sample size of ER becomes smaller because fewer patients are needed to detect a larger difference. More importantly, the fractional AR design increased the number of responders by more than four patients over the observed data design. Comparing the duration of the trial between the proposed fractional design and the complete-data design, the trial time was dramatically reduced from 370 weeks to 53 weeks. Scenarios 6–10 in Table 1 explore a different sample size, n = 146, with fixed p2 = 0.3. The conclusions are similar to those drawn from scenarios 1–5: the proposed two-stage fractional design can immensely shorten the duration of the trial without sacrificing much of the AR performance. Moreover, the two-stage design automatically adjusts the length of the ER stage on the basis of the difference between the two treatments and satisfactorily resolves the issue of the arbitrary number of patients in the ER stage before the implementation of AR.

To compare the proposed optimal allocation scheme with that of Zhang and Rosenberger (2007), we now consider the event of interest to be a treatment failure (or death), as in the usual survival analysis. To derive the optimal allocation ratio, we minimize the expected number of failures by time τ, $n_1\{1-S_1(\tau,\theta_1)\}+n_2\{1-S_2(\tau,\theta_2)\}$, subject to fixing $\mathrm{var}\{\hat{S}_1(\tau,\hat{\theta}_1)-\hat{S}_2(\tau,\hat{\theta}_2)\}=K$. Under the exponential distribution, the optimal allocation ratio is given by

$$r_{\mathrm{New},1-S}=\frac{\sqrt{\theta_2^2\,\delta_2\exp(-2\tau/\theta_1)\{1-\exp(-\tau/\theta_2)\}}}{\sqrt{\theta_1^2\,\delta_1\exp(-2\tau/\theta_2)\{1-\exp(-\tau/\theta_1)\}}}.$$


Table 1. Simulation study comparing the two-stage observed data, complete-data and fractional AR designs under Weibull distributions

Two-stage design   Allocation arm 1 (%)†   Allocation standard deviation‡   Number of responders   Statistical power (%)   Trial duration§   nE§§

p2 = 0.2 and n = 132
Scenario 1, p1 = 0.2
  Observed     47.4   0.06   26.5    10.1    52.7   56.9
  Complete     50.0   0.06   26.4    10.5   362.1   53.9
  Fractional   50.3   0.07   26.4     9.8    52.7   45.2
Scenario 2, p1 = 0.3
  Observed     50.8   0.06   33.1    37.7    53.2   52.4
  Complete     54.4   0.06   33.6    38.2   368.6   40.8
  Fractional   54.5   0.07   33.7    37.6    53.2   37.0
Scenario 3, p1 = 0.4
  Observed     53.5   0.06   40.5    81.1    53.4   44.5
  Complete     57.8   0.06   41.6    80.5   370.8   27.5
  Fractional   57.5   0.07   41.6    80.9    53.4   30.1
Scenario 4, p1 = 0.6
  Observed     57.4   0.06   56.7    99.9    53.6   30.8
  Complete     62.5   0.06   59.3    99.9   372.7   16.1
  Fractional   61.8   0.06   59.0    99.9    53.6   22.1
Scenario 5, p1 = 0.8
  Observed     60.7   0.06   74.4   100.0    53.5   22.4
  Complete     65.7   0.05   78.4   100.0   370.0   12.5
  Fractional   64.8   0.06   77.7   100.0    53.5   17.7

p2 = 0.3 and n = 146
Scenario 6, p1 = 0.3
  Observed     46.7   0.06   43.8    10.1    57.3   42.0
  Complete     49.9   0.06   43.9    10.2   418.5   45.8
  Fractional   50.3   0.06   43.9     9.9    57.3   33.7
Scenario 7, p1 = 0.4
  Observed     49.4   0.05   51.1    36.1    57.6   40.4
  Complete     53.5   0.06   51.6    35.4   421.7   36.3
  Fractional   53.5   0.06   51.7    35.0    57.6   29.7
Scenario 8, p1 = 0.5
  Observed     51.6   0.05   58.9    80.4    57.8   36.4
  Complete     56.1   0.05   60.2    80.7   423.2   23.6
  Fractional   56.0   0.06   60.2    79.9    57.8   25.3
Scenario 9, p1 = 0.7
  Observed     55.2   0.05   76.0   100.0    57.8   26.8
  Complete     60.2   0.05   79.0    99.9   423.1   13.0
  Fractional   59.7   0.06   78.6   100.0    57.8   19.4
Scenario 10, p1 = 0.9
  Observed     58.3   0.05   94.9   100.0    57.7   19.2
  Complete     63.2   0.05   99.2   100.0   416.7    9.3
  Fractional   62.4   0.06   98.6   100.0    57.7   15.6

†'Allocation arm 1 (%)' corresponds to the allocation percentage to treatment arm 1.
‡'Allocation standard deviation' corresponds to the allocation standard deviation for treatment arm 1.
§'Trial duration' is the total duration of the trial (in weeks).
§§'nE' denotes the number of patients in the ER stage.

We can also minimize the total expected hazard $n_1\theta_1^{-1}+n_2\theta_2^{-1}$, subject to fixing $\mathrm{var}\{\hat{S}_1(\tau,\hat{\theta}_1)-\hat{S}_2(\tau,\hat{\theta}_2)\}=K$; then the optimal allocation ratio is given by

$$r_{\mathrm{New},h}=\frac{\sqrt{\theta_2\delta_2\exp(-2\tau/\theta_1)}}{\sqrt{\theta_1\delta_1\exp(-2\tau/\theta_2)}}.$$
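In the same spirit as the earlier sketch, these two ratios can be evaluated as below; the choice of δj (events observed under administrative censoring at τ only) is again an illustrative assumption rather than the censoring pattern of the simulated trials.

```python
import numpy as np

def r_new_1_minus_s(th1, th2, d1, d2, tau):
    """r_{New,1-S}: minimizes the expected number of failures by time tau."""
    num = th2 ** 2 * d2 * np.exp(-2 * tau / th1) * (1 - np.exp(-tau / th2))
    den = th1 ** 2 * d1 * np.exp(-2 * tau / th2) * (1 - np.exp(-tau / th1))
    return np.sqrt(num / den)

def r_new_h(th1, th2, d1, d2, tau):
    """r_{New,h}: minimizes the total expected hazard n1/theta1 + n2/theta2."""
    return (np.sqrt(th2 * d2 * np.exp(-2 * tau / th1))
            / np.sqrt(th1 * d1 * np.exp(-2 * tau / th2)))

# illustrative values in the spirit of Table 2, scenario 2: theta1 = 9, theta2 = 6, tau = 12
d1, d2 = 1 - np.exp(-12 / 9), 1 - np.exp(-12 / 6)   # assumed event probabilities
print(r_new_h(9, 6, d1, d2, 12), r_new_1_minus_s(9, 6, d1, d2, 12))
```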


In Table 2, we compare the performance of four different optimal allocation ratios, the proposed rNew,h and rNew,1−S, and rZR,θ and rZR,λ of Zhang and Rosenberger (2007), with the fixed balanced design. The sample size was calculated under the type I and type II error rates α = 0.1 and β = 0.2 by using a two-sided log-rank test. Assume that a cohort of four patients enters the study every month and the minimal follow-up period was τ = 12 months. The total sample size was n = 172 under the alternative hypothesis with the median survival rates θ1 = 9 and θ2 = 6 months; and n = 180 under the alternative hypothesis of θ1 = 12 and θ2 = 8 months. In addition to the allocation ratio and power, we also present the total hazard, and the half-year and 1-year survival rates. Clearly, rNew,h and rZR,θ have comparable performance, whereas rNew,1−S tends to be more conservative in allocating patients to a better treatment. All the AR designs have a correct allocation direction towards the better treatment, except for that with rZR,λ, which falsely assigns more patients to the inferior treatment arm. This in turn demonstrates the importance of the invariance property, which makes all allocations consistent in the right direction regardless of parameterization. Both the half-year and 1-year survival rates by using rNew,h are the highest, as the allocation ratio rNew,h assigns patients more aggressively to the better treatment arm.

5.2. Sensitivity analysis
Table 3 compares the performance of the proposed optimal allocation ratio rNew,S under the parametric survival design with the non-parametric fractional design under various survival distributions. For the binary data case, the treatment response rates were p1 = 0.4 and p2 = 0.2, and to achieve 80% power the total sample size was n = 132. In scenarios 1–6, patients' response times were generated from Weibull distributions, where the shape parameter γj was varied to examine the sensitivity of the model, and the scale parameter λj was determined by solving Sj(τ; λj, γj) = 1 − pj, for j = 1, 2. In the two-stage parametric survival design, the likelihood ratio statistic was computed for testing the difference between the two hazards, i.e. H0: λ1 = λ2 versus H1: λ1 ≠ λ2, under the exponential model. In scenarios 7–10, the times to efficacy were simulated from log-normal distributions. For each configuration, we simulated 10000 trials.

In scenario 1, the parametric survival design tends to be more conservative in terms of the allocation ratio than the non-parametric fractional design, whereas the standard deviation of the allocation ratio under the parametric design almost tripled that of the non-parametric design. Scenarios 2 and 3 correspond to the cases with shape parameter γ = 0.2 and γ = 0.5, and scenario 4 matches the true exponential model, which is the assumed parametric model for the optimal allocation scheme. As the parameter γ increases, the variance of the corresponding Weibull distribution decreases, and thus the standard deviation of the allocation ratio in the parametric survival design decreases. The fractional and survival designs performed similarly in all aspects except for the ER sample size, since the hypotheses tested were different in the two designs, i.e. the former with H0: p1 = p2 and the latter with H0: λ1 = λ2. Under the log-normal distributions (scenarios 7–10), the standard deviation of the allocation ratio in the parametric survival design increases as the variance σ² becomes larger. In all the scenarios, both designs achieve approximately 80% power and the number of responders is around 41. In conclusion, both the non-parametric fractional design and the parametric survival design are reasonably stable and robust under different survival distributions.

The two-stage design proposed relies on the value of α̃, which acts as a threshold for switching from ER to AR. In general, α̃ should be greater than the trial's type I error rate α. Increasing the value of α̃, on average, would reduce the sample size in the ER stage. When α̃ = 1, the two-stage design contains only a single stage: AR would be implemented from the start of the trial conduct without the ER stage. We investigated the effect of various values of α̃ on the sample size in the ER stage, nE. Table 4 shows that a larger value of α̃ leads to a smaller number of patients in the ER stage.


Table 2. Simulation study comparing four optimal allocation ratios rNew,h, rNew,1−S, rZR,θ and rZR,λ, and the balanced design by using the survival end point

Two-stage design   Allocation arm 1 (%)   Allocation standard deviation   Statistical power (%)   Total hazard   Half-year survival (%)   1-year survival (%)

θ2 = 6 and n = 172
Scenario 1, θ1 = 6
  rNew,h      50.0   0.10    10.7   28.7   36.8   13.5
  rNew,1−S    50.9   0.04     8.7   28.7   36.8   13.5
  rZR,θ       50.0   0.07     9.0   28.7   36.7   13.5
  rZR,λ       49.9   0.04     9.2   28.7   36.8   13.6
  Balanced    50.0   0.00     9.1   28.7   36.8   13.5
Scenario 2, θ1 = 9
  rNew,h      65.4   0.09    79.8   22.4   46.3   21.9
  rNew,1−S    54.1   0.04    80.2   23.5   44.7   20.5
  rZR,θ       61.5   0.07    79.6   22.8   45.7   21.4
  rZR,λ       47.6   0.04    80.5   24.1   43.7   19.6
  Balanced    50.0   0.00    80.9   23.9   44.1   19.9
Scenario 3, θ1 = 12
  rNew,h      72.5   0.07    99.3   18.3   54.0   30.3
  rNew,1−S    56.2   0.04    99.6   20.6   50.2   26.6
  rZR,θ       68.9   0.06    99.4   18.8   53.2   29.5
  rZR,λ       45.9   0.04    99.5   22.1   47.7   24.2
  Balanced    50.0   0.00    99.6   21.5   48.7   25.2
Scenario 4, θ1 = 15
  rNew,h      76.2   0.05   100.0   15.6   59.8   37.4
  rNew,1−S    57.6   0.04   100.0   18.8   54.2   31.6
  rZR,θ       73.6   0.05   100.0   16.0   59.0   36.6
  rZR,λ       44.9   0.04   100.0   20.9   50.3   27.6
  Balanced    50.0   0.00   100.0   20.1   51.9   29.2

θ2 = 8 and n = 180
Scenario 5, θ1 = 8
  rNew,h      50.1   0.09     9.2   22.5   47.2   22.3
  rNew,1−S    51.5   0.04     8.9   22.5   47.2   22.3
  rZR,θ       50.1   0.07     9.5   22.5   47.2   22.3
  rZR,λ       50.0   0.04     8.2   22.5   47.2   22.3
  Balanced    50.0   0.00     8.7   22.5   47.2   22.3
Scenario 6, θ1 = 12
  rNew,h      63.5   0.08    79.3   17.7   55.7   31.5
  rNew,1−S    54.6   0.04    80.0   18.4   54.5   30.2
  rZR,θ       62.0   0.07    80.1   17.8   55.6   31.3
  rZR,λ       47.7   0.04    80.8   18.9   53.6   29.2
  Balanced    50.0   0.00    80.6   18.8   53.9   29.5
Scenario 7, θ1 = 16
  rNew,h      70.5   0.07    99.5   14.6   62.4   39.8
  rNew,1−S    56.7   0.04    99.6   16.1   59.4   36.4
  rZR,θ       69.6   0.06    99.5   14.7   62.2   39.7
  rZR,λ       46.4   0.04    99.6   17.3   57.2   33.8
  Balanced    50.0   0.00    99.5   16.9   58.0   34.8
Scenario 8, θ1 = 20
  rNew,h      74.5   0.06   100.0   12.4   67.2   46.5
  rNew,1−S    58.2   0.04   100.0   14.6   62.8   41.2
  rZR,θ       74.3   0.05   100.0   12.5   67.1   46.4
  rZR,λ       45.5   0.04   100.0   16.4   59.3   37.1
  Balanced    50.0   0.00   100.0   15.8   60.6   38.6


Table 3. Simulation study comparing the two-stage non-parametric fractional and the parametric survival designs under various survival distributions

Two-stage design   Allocation arm 1 (%)   Allocation standard deviation   Number of responders   Statistical power (%)   Trial duration   nE

Scenario 1, Weibull distribution with γ = 0.1
  Fractional   57.8   0.06   41.7   81.0   53.4   26.7
  Survival     55.0   0.18   43.3   81.6   53.1   24.7
Scenario 2, Weibull distribution with γ = 0.2
  Fractional   57.6   0.06   41.6   80.6   53.4   27.2
  Survival     56.3   0.16   41.7   79.7   53.4   26.5
Scenario 3, Weibull distribution with γ = 0.5
  Fractional   57.4   0.06   41.6   81.2   53.4   26.3
  Survival     57.1   0.12   41.5   80.1   53.4   31.1
Scenario 4, Weibull distribution with γ = 1 (exponential distribution)
  Fractional   57.3   0.07   41.5   81.6   53.4   27.9
  Survival     57.1   0.08   41.4   81.3   53.4   37.8
Scenario 5, Weibull distribution with γ = 5
  Fractional   56.8   0.07   41.3   81.5   53.4   34.4
  Survival     55.6   0.06   41.1   80.4   53.4   55.3
Scenario 6, Weibull distribution with γ = 10
  Fractional   56.6   0.07   41.3   80.6   53.4   36.2
  Survival     55.2   0.06   41.0   81.0   53.4   59.2
Scenario 7, log-normal distribution with σ² = 1
  Fractional   57.1   0.07   41.4   80.4   53.4   29.5
  Survival     57.3   0.06   41.4   80.8   53.4   42.4
Scenario 8, log-normal distribution with σ² = 2
  Fractional   57.3   0.07   41.5   81.6   53.4   28.1
  Survival     57.7   0.07   41.7   81.0   53.4   37.8
Scenario 9, log-normal distribution with σ² = 5
  Fractional   57.5   0.06   41.6   81.3   53.4   26.8
  Survival     57.5   0.10   41.5   80.7   53.4   32.5
Scenario 10, log-normal distribution with σ² = 10
  Fractional   57.5   0.06   41.7   80.6   53.4   26.4
  Survival     57.3   0.12   41.6   80.2   53.4   30.3

We also explored Fisher's exact test, which tends to be more conservative than the asymptotic χ²-approximation and thus requires a relatively larger sample size for detecting the treatment difference.

5.3. Trial example
We illustrate the proposed AR methods with the randomized clinical trial P1060 in HIV-infected children (Violari et al., 2012). The objective of the P1060 trial was to compare treatment performance between ritonavir-boosted lopinavir and nevirapine, in addition to zidovudine and lamivudine. The primary end point of the study was treatment failure by 24 weeks and thus the evaluation window for the treatment outcome was 24 weeks. Fig. 3 shows the Kaplan–Meier curves for the ritonavir-boosted lopinavir and nevirapine treatment groups stratified by age. Although the trial followed HIV-infected children until treatment failure (time-to-event data), the final analysis was based on the binary outcome truncated at week 24.

The P1060 trial enrolled a total of 288 patients between November 23rd, 2006, and March 19th, 2010 (the enrolment period was 168 weeks). An additional follow-up of 24 weeks was taken for the last enrolled patient, so the trial ended on October 27th, 2010.


Table 4. Sensitivity analysis of the parameter α̃ in the two-stage design with the asymptotic χ²-test and Fisher's exact test

α̃     Allocation arm 1 (%)   Allocation standard deviation   Number of responders   Statistical power (%)   nE

p1 = 0.4, p2 = 0.2, n = 132
Asymptotic χ²-test
  0.10   56.78   0.06   41.3   81.16   51.8
  0.20   57.57   0.06   41.7   81.92   35.2
  0.30   57.90   0.06   41.7   81.42   27.5
  0.40   58.01   0.06   41.7   81.79   22.7
  0.50   58.11   0.06   41.7   81.41   17.8
Fisher's exact test
  0.10   56.05   0.06   41.2   81.90   63.1
  0.20   56.92   0.06   41.4   81.36   47.7
  0.30   57.32   0.06   41.4   81.57   38.8
  0.40   57.59   0.06   41.5   81.74   31.9
  0.50   57.78   0.06   41.7   81.07   27.4

p1 = 0.5, p2 = 0.3, n = 146
Asymptotic χ²-test
  0.10   55.20   0.06   59.8   79.52   53.5
  0.20   55.86   0.05   60.1   79.27   32.9
  0.30   55.97   0.05   60.1   79.41   24.2
  0.40   56.21   0.05   60.2   79.42   19.0
  0.50   56.19   0.05   60.2   80.04   13.7
Fisher's exact test
  0.10   54.75   0.05   59.7   80.53   66.7
  0.20   55.26   0.05   60.0   79.06   49.8
  0.30   55.67   0.05   60.1   79.90   38.2
  0.40   55.89   0.05   60.1   80.03   30.9
  0.50   56.01   0.05   60.0   80.52   25.0

Eligible children were originally stratified by age into three strata: from 2 to under 6, from 6 to under 12 and 12–36 months. Since there were only four children in the first stratum, the first two strata were merged in the analysis. ER was used to assign patients to the two treatment arms: 147 patients in the nevirapine group and 140 patients in the ritonavir-boosted lopinavir group. One child in the ritonavir-boosted lopinavir group never started therapy and therefore was excluded from the trial. The goal was to compare the percentage of children who reached the primary end point (virologic failure or discontinuation of the study for any reason) by 24 weeks. The result showed that the nevirapine group had a significantly higher failure rate (40.8%) compared with the ritonavir-boosted lopinavir group (19.3%). Similarly to the ACTG 076 trial, the finding of the P1060 trial was decisive, and the interim analysis could have been used to randomize children adaptively, so that more children would be assigned to the ritonavir-boosted lopinavir group.

To gain more insight into our fractional and survival AR methods in real applications, weexplored the possibility of redesigning the trial by using the methods proposed. To imitate thereal trial conduct with stratified groups, we first generated the stratum indicator according tothe size of each stratum and then simulated failure times in each treatment arm on the basis ofthe Kaplan–Meier curves shown in Fig. 3. The assessment period for each patient was τ = 24


Fig. 3. Kaplan–Meier curves in the HIV P1060 trial, stratified by age group ((a) less than 12 months; (b) 12 months and older): nevirapine-based therapy versus ritonavir-boosted lopinavir.

Fig. 4. Probability density functions of the numbers of events over 10000 simulations in the HIV P1060 trial: two-stage fractional design, two-stage parametric design and fixed balanced design.

The assessment period for each patient was τ = 24 weeks; if the failure time exceeded τ, the response was defined as a 'success'. To gain initial information on the two treatments and also to stabilize the estimation, we started by assigning 15 children to each treatment arm and followed them for 24 weeks. After that, two patients entered the trial at the beginning of each week in sequential order. This setting ensured that the total accrual period was 168 weeks, which exactly matched that of the original trial. We replicated the process 10000 times and recorded the number of children who were assigned to each arm and the total number of events in each simulated trial.
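To make the simulation scheme concrete, the following sketch (purely illustrative, not the authors' code) mimics the accrual pattern described above: 15 children per arm followed for 24 weeks, then two patients entering per week until week 168. Exponential failure times and a simple square-root allocation rule stand in for sampling from the Kaplan–Meier curves and for the fractional or survival allocation schemes, so the numbers that it produces are only indicative.

```python
# Illustrative sketch (not the authors' code) of the redesign simulation:
# 30 children are split equally between the arms and followed for 24 weeks,
# after which two patients enter per week until week 168.  Exponential
# failure times stand in for draws from the Kaplan-Meier curves, and a
# simple square-root rule stands in for the fractional/survival allocation.
import numpy as np

TAU, ACCRUAL, BURN_IN, PER_WEEK = 24, 168, 15, 2

def simulate_trial(rate_a, rate_b, rng):
    """Return (patients in arm A, total failures) for one simulated trial."""
    rates = (rate_a, rate_b)
    arm, entry, fail_time = [], [], []
    for a in (0, 1):                      # equal randomization burn-in
        for _ in range(BURN_IN):
            arm.append(a); entry.append(0.0)
            fail_time.append(rng.exponential(1.0 / rates[a]))
    for week in range(TAU, ACCRUAL):      # adaptive stage
        # Success = no failure within TAU weeks, counted only for patients
        # whose full assessment window has already elapsed.
        succ, tot = [0.5, 0.5], [1.0, 1.0]          # mild prior avoids 0/0
        for a, e, t in zip(arm, entry, fail_time):
            if e + TAU <= week:
                tot[a] += 1.0
                succ[a] += float(t > TAU)
        pa, pb = succ[0] / tot[0], succ[1] / tot[1]
        prob_a = np.sqrt(pa) / (np.sqrt(pa) + np.sqrt(pb))
        for _ in range(PER_WEEK):
            a = 0 if rng.random() < prob_a else 1
            arm.append(a); entry.append(float(week))
            fail_time.append(rng.exponential(1.0 / rates[a]))
    failures = sum(t <= TAU for t in fail_time)
    return arm.count(0), failures

rng = np.random.default_rng(2014)
runs = [simulate_trial(rate_a=0.010, rate_b=0.022, rng=rng) for _ in range(500)]
print("mean allocation to arm A:", np.mean([r[0] for r in runs]))
print("mean number of failures :", np.mean([r[1] for r in runs]))
```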


Table 5. Application of the proposed two-stage designs to the HIV P1060 trial

Two-stage     Allocation    Allocation standard    Number of    Stratum 1     Stratum 2     nE
design        arm 1 (%)     deviation              events       power (%)     power (%)

Fractional    46.9          0.03                   84.4         80.5          98.5          44.2
Survival      46.4          0.03                   84.1         79.7          98.5          47.6
Balanced      50.2          0.00                   86.5         80.4          98.5          —

Fig. 4 shows that the distributions of the numbers of failure events under the proposed two-stage fractional and parametric survival designs are clearly shifted to the left compared with that under the fixed balanced design. This indicates that AR can reduce the number of failures among trial participants, as more children would be assigned to the better treatment arm of ritonavir-boosted lopinavir.

Table 5 summarizes the results corresponding to the proposed two-stage fractional design, the two-stage parametric survival design and the conventional fixed balanced design. Statistical power was calculated under the prespecified type I error rate α = 0.1 on the basis of a one-sided Wald test for each stratum over 10000 simulated trials. On average, two children could have been saved by using our proposed AR methods rather than the balanced design. Although two children may not appear to be a substantial gain against a sample size of 287 children, saving two children's lives would still be a real benefit. Therefore, from an ethical perspective, both the fractional design and the parametric survival design should have been recommended over conventional fixed balanced randomization. In terms of the allocation ratio and its standard deviation, the two proposed designs are comparable. Our AR designs did not need to postpone treatment assignment until all outcomes had been fully observed, so the trial could be finished in a reasonable period of time.
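As a point of reference, a minimal sketch of the one-sided Wald test for two proportions underlying the power calculation is given below; the counts are hypothetical and only roughly reflect the overall success (no failure by week 24) rates reported above.

```python
# Minimal sketch of a one-sided Wald test for two proportions, as used for
# the per-stratum power calculation; the counts below are hypothetical.
from math import sqrt
from scipy.stats import norm

def wald_one_sided(x1, n1, x2, n2, alpha=0.10):
    """Reject H0: p1 <= p2 in favour of H1: p1 > p2 at one-sided level alpha."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se > norm.ppf(1 - alpha)

# Arm 1 = ritonavir-boosted lopinavir, arm 2 = nevirapine (illustrative counts).
print(wald_one_sided(x1=113, n1=140, x2=87, n2=147))   # True: reject H0
```

The simulated power in Table 5 would then correspond to the proportion of the 10000 replicated trials in which this test rejects within a given stratum.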

6. Discussion

We have proposed a two-stage fractional design to meet the practical needs of response-based AR when

(a) the number of patients in the ER stage is not clearly defined, and
(b) patients' responses cannot be observed sufficiently quickly.

In the new design, unobserved efficacy outcomes are naturally treated as censored data, and their fractional point masses are used to make decisions on treatment assignment. The fractional scheme is intuitive and is motivated by the self-consistency property of the Kaplan–Meier survival estimator. The fractional efficacy outcome can be interpreted as the conditional probability of experiencing the efficacy event during the remaining assessment period, given that the subject has not yet responded. The non-parametric fractional design is robust because it utilizes the Kaplan–Meier estimator without imposing any parametric modelling assumption. Meanwhile, we have also derived a new optimal allocation ratio under the exponential survival model. Compared with the optimal allocation of Zhang and Rosenberger (2007), the new target satisfies the invariance property: under a one-to-one reparameterization, the allocation ratio remains the same.
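The sketch below illustrates how such a fractional outcome could be computed from a Kaplan–Meier fit; the follow-up data, the censoring time c and the helper functions are all hypothetical, and the product-limit estimator is coded directly here rather than taken from the paper.

```python
# Illustrative computation of a fractional outcome from a Kaplan-Meier fit;
# the data and the censoring time c are hypothetical.
import numpy as np

def km_survival(times, events, t):
    """Product-limit (Kaplan-Meier) estimate of S(t); events: 1 = efficacy."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    s = 1.0
    for u in np.unique(times[events == 1]):
        if u > t:
            break
        at_risk = np.sum(times >= u)
        d = np.sum((times == u) & (events == 1))
        s *= 1.0 - d / at_risk
    return s

def fractional_outcome(times, events, c, tau):
    """P(efficacy in (c, tau] | no efficacy by c), via the KM estimator."""
    s_c, s_tau = km_survival(times, events, c), km_survival(times, events, tau)
    return (s_c - s_tau) / s_c if s_c > 0 else 1.0

# A patient censored at week 8 within a 24-week assessment window.
times  = [3, 5, 8, 10, 12, 15, 20, 24, 24, 24]
events = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(fractional_outcome(times, events, c=8, tau=24))   # about 0.46
```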

In addition, the two-stage AR procedure successfully resolves the issue of how many patients should be allocated by using ER before the implementation of AR. It serves as a safeguard to protect patients from being falsely assigned to an inferior arm with a high probability at


the beginning of the randomization procedure. It is also worth emphasizing that the two-stage procedure does not involve any multiple-testing issue, because the test statistic is used only for decision making on whether to switch to AR, but not for hypothesis testing in the usual sense.
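For intuition, a schematic version of such a switching decision is sketched below; the chi-squared test, the working level alpha_switch and the interim counts are illustrative stand-ins and do not reproduce the exact likelihood ratio rule of the design.

```python
# Schematic switching rule (not the paper's exact likelihood ratio rule):
# after the ER stage, an asymptotic chi-squared test compares the two arms,
# and the trial switches to AR only if the statistic exceeds the critical
# value at a working level alpha_switch (the parameter varied in Table 4).
from scipy.stats import chi2, chi2_contingency

def switch_to_ar(successes, failures, alpha_switch=0.3):
    """Return True if interim data justify switching from ER to AR."""
    res = chi2_contingency([successes, failures], correction=False)
    stat, dof = res[0], res[2]
    return stat > chi2.ppf(1 - alpha_switch, dof)

# Hypothetical interim counts: 18/30 successes in arm 1 versus 10/30 in arm 2.
print(switch_to_ar(successes=[18, 10], failures=[12, 20]))
```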

Acknowledgements

We thank the two referees, the Associate Editor and the Joint Editor for their very insightful comments that substantially improved this paper. This research was partially supported by grant 784010 from the Research Grants Council of Hong Kong.

Appendix A: Allocation ratio approximation between binary and survival end points

Under the exponential model for the time to efficacy, the survival function evaluated at τ is given by

\[ S(\tau, \lambda_j) = \exp(-\lambda_j \tau) = 1 - p_j, \qquad j = 1, 2, \]

where p_j is the response probability of treatment j evaluated at time τ. During the trial, the censored data include patients who have been fully followed up to time τ but have not experienced the event, and those who have been censored at the decision-making time before τ. Since the number of the second type of censored observation decreases towards 0 as the trial proceeds, the expected censoring indicator δ_j can be approximated by p_j when the sample size becomes sufficiently large. Thus, we have

\[ r_{\mathrm{New},S} = \frac{\lambda_1 \sqrt{\delta_2 \exp(-\lambda_1 \tau)}}{\lambda_2 \sqrt{\delta_1 \exp(-\lambda_2 \tau)}} \approx \frac{\log(1-p_1)\,\sqrt{(1-p_1)/p_1}}{\log(1-p_2)\,\sqrt{(1-p_2)/p_2}}. \]

Define \( h(x) = \{\log(1-x)\}^2 (1-x)/x \) for \( 0 < x < 1 \); then, by the Taylor series expansion of h(x) around the point 0+, we have

\[ h(x) = x - x^3/12 + O(x^4). \]

Therefore, the optimal allocation ratio rNew,S can be written as

\[ r_{\mathrm{New},S} = \frac{\sqrt{h(p_1)}}{\sqrt{h(p_2)}} = \frac{\sqrt{p_1 - p_1^3/12 + O(p_1^4)}}{\sqrt{p_2 - p_2^3/12 + O(p_2^4)}} \approx \frac{\sqrt{p_1}}{\sqrt{p_2}}, \]

which indicates that the time-to-event model reduces to the binary case approximately.
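A quick numerical check (with illustrative values of p1 and p2) confirms that the exact ratio based on h(·) stays close to √(p1/p2) for small to moderate response probabilities:

```python
# Quick numerical check of the approximation above: the exact allocation
# ratio based on h(.) stays close to sqrt(p1/p2); p1, p2 are illustrative.
import math

def h(x):
    return (math.log(1.0 - x) ** 2) * (1.0 - x) / x

for p1, p2 in [(0.10, 0.05), (0.20, 0.10), (0.40, 0.20)]:
    exact = math.sqrt(h(p1)) / math.sqrt(h(p2))
    approx = math.sqrt(p1 / p2)
    print(f"p1={p1:.2f}, p2={p2:.2f}: exact={exact:.4f}, approx={approx:.4f}")
```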

References

Azriel, D., Mandel, M. and Rinott, Y. (2012) Optimal allocation to maximize the power of two-sample tests for binary response. Biometrika, 99, 101–113.

Bandyopadhyay, U., Biswas, A. and Bhattacharya, R. (2009) A Bayesian adaptive design for two-stage clinical trials with survival data. Liftim. Data Anal., 15, 468–492.

Berry, D. A. and Eick, S. G. (1995) Adaptive assignment versus balanced randomization in clinical trials: a decision analysis. Statist. Med., 14, 231–246.

Biswas, A. and Mandal, S. (2004) Optimal adaptive designs in phase III clinical trials for continuous responses with covariates. In mODa 7: Advances in Model-oriented Design and Analysis (eds A. Di Bucchianico, H. Läuter and H. P. Wynn), pp. 51–58. Heidelberg: Physica.

Cheng, Y. and Berry, D. (2007) Optimal adaptive randomized designs for clinical trials. Biometrika, 94, 673–689.

Connor, E. M., Sperling, R. S., Gelber, R., Kiseley, P., Scott, G., O'Sullivan, M. J., VanDyke, R., Bey, M., Shearer, W., Jacobson, R., Jimenez, E., O'Neil, E., Bazin, B., Delfraissy, J., Culname, M., Coombs, R., Elkins, M., Moye, J., Stratton, P., Balsley, J. and the Pediatric AIDS Clinical Trials Group Protocol 076 Study Group (1994) Reduction of maternal-infant transmission of human immunodeficiency virus type 1 with zidovudine treatment. New Engl. J. Med., 331, 1173–1180.

Cox, D. R. and Hinkley, D. V. (1974) Theoretical Statistics. London: Chapman and Hall.

Efron, B. (1967) The two sample problem with censored data. In Proc. 5th Berkeley Symp. Mathematical Statistics and Probability (eds L. Le Cam and J. Neyman), vol. 4, pp. 831–853. Berkeley: University of California Press.

Eisele, J. R. (1994) The doubly adaptive biased coin design for sequential clinical trials. J. Statist. Planng Inf., 38, 249–261.

Eisele, J. R. and Woodroofe, M. B. (1995) Central limit theorems for doubly adaptive biased coin designs. Ann. Statist., 23, 234–254.

Estey, E., Shen, Y. and Thall, P. (2000) Effect of time to complete remission on subsequent survival and disease-free survival time in AML, RAEB-t, and RAEB. Blood, 95, 72–77.

Feldman, D. (1962) Contributions to the "two-armed bandit" problem. Ann. Math. Statist., 33, 847–856.

Flehinger, B. and Louis, T. (1971) Sequential treatment allocation in clinical trials. Biometrika, 58, 419–426.

Hardwick, J., Oehmke, R. and Stout, Q. F. (2006) New adaptive designs for delayed response models. J. Statist. Planng Inf., 136, 1940–1955.

Hu, F. and Rosenberger, W. F. (2006) The Theory of Response-adaptive Randomization in Clinical Trials. Hoboken: Wiley.

Hu, F. and Zhang, L.-X. (2004) Asymptotic properties of doubly adaptive biased coin designs for multitreatment clinical trials. Ann. Statist., 32, 268–301.

Hu, F., Zhang, L.-X., Cheung, S. H. and Chan, W. S. (2008) Doubly adaptive biased coin designs with delayed responses. Can. J. Statist., 36, 541–559.

Ivanova, A., Rosenberger, W. F., Durham, S. D. and Flournoy, N. (2000) A birth and death urn for randomized clinical trials: asymptotic methods. Sankhya B, 62, 104–118.

Kaplan, E. L. and Meier, P. (1958) Nonparametric estimation from incomplete observations. J. Am. Statist. Ass., 53, 457–481.

Karrison, T., Huo, D. and Chappell, R. (2003) A group sequential, response-adaptive design for randomized clinical trials. Contr. Clin. Trials, 24, 506–522.

Louis, T. (1977) Sequential allocation in clinical trials comparing two exponential survival curves. Biometrics, 33, 627–634.

Melfi, V., Page, C. and Geraldes, M. (2001) An adaptive randomized design with application to estimation. Can. J. Statist., 29, 107–116.

Robbins, H. (1952) Some aspects of the sequential design of experiments. Bull. Am. Math. Soc., 58, 527–535.

Rosenberger, W. F. and Lachin, J. M. (2002) Randomization in Clinical Trials: Theory and Practice. New York: Wiley.

Rosenberger, W. F. and Seshaiyer, P. (1997) Adaptive survival trials. J. Biopharm. Statist., 7, 617–624.

Rosenberger, W. F., Stallard, N., Ivanova, A., Harper, C. N. and Ricks, M. L. (2001) Optimal adaptive designs for binary response trials. Biometrics, 57, 909–913.

Sverdlov, O., Tymofyeyev, Y. and Wong, W. (2011) Optimal response-adaptive randomized designs for multi-armed survival trials. Statist. Med., 30, 2890–2910.

Thompson, W. R. (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25, 285–294.

Violari, A., Lindsey, J. C., Hughes, M. D., Mujuru, H. A., Barlow-Mosha, L., Kamthunzi, P., Chi, B. H., Cotton, M. F., Moultrie, H., Khadse, S., Schimana, W., Bobat, R., Purdue, L., Eshleman, S. H., Abrams, E. J., Millar, L., Petzold, E., Mofenson, L. M., Jean-Philippe, P. and Palumbo, P. (2012) Nevirapine versus ritonavir-boosted lopinavir for HIV-infected children. New Engl. J. Med., 366, 2380–2389.

Wei, L. J. and Durham, S. (1978) The randomized play-the-winner rule in medical trials. J. Am. Statist. Ass., 73, 840–843.

Yao, Q. and Wei, L. (1996) Play the winner for phase II/III clinical trials. Statist. Med., 15, 2413–2423.

Yin, G. (2012) Clinical Trial Design: Bayesian and Frequentist Adaptive Methods. Hoboken: Wiley.

Zelen, M. (1969) Play the winner rule and the controlled clinical trial. J. Am. Statist. Ass., 64, 131–146.

Zhang, L. and Rosenberger, W. (2005) Response-adaptive randomization for clinical trials with continuous outcomes. Biometrics, 62, 562–569.

Zhang, L. and Rosenberger, W. F. (2007) Response-adaptive randomization for survival trials: the parametric approach. Appl. Statist., 56, 153–165.