Motivation for Utility-Based Method

Using a 3+3 or similar algorithm based on binary toxicity is like handing over a newborn baby to an idiot.
The CRM was a great idea 22 years ago, but we can do much better now. (Many extensions of CRM exist.)
Both efficacy and toxicity matter for the patient. Any reasonable statistical method should use both.
Ordinal efficacy and toxicity carry much more information than binary simplifications, and also reflect how clinicians actually think.
Utilities underlie all medical decision-making. They are natural tools for statistical algorithms.

A Phase I-II Radiation Therapy Trial

Diffuse intrinsic pontine gliomas (DIPGs)
Very aggressive brain tumors
No treatment with substantive anti-disease activity exists
Radiation Therapy (RT) is the standard treatment, mainly palliative
RT dose-toxicity and dose-efficacy profiles are not understood

The Trial:
Children (median age = 5 years) with DIPGs
Three RT levels x = 1, 2, 3 (biologically equivalent doses, in Gy, given serially per a fractionation schedule)
Toxicity: $Y_1$ = {Low, Moderate, High, Severe}, defined in terms of fatigue, nausea/vomiting, headache, skin toxicities, blindness, brain edema or necrosis.
Efficacy: $Y_2$ = number of improvements in (i) Clinical Symptoms, (ii) Radiographic Appearance of the Tumor, (iii) Quality of Life, so $Y_2$ = 0, 1, 2, or 3.
$(Y_1, Y_2)$ scored at day 42

Elicited Joint Outcome Utilities = $U(Y_1, Y_2)$
(from the two co-PIs, A. Mahajan and H. Fontanilla)

1) If the 4 efficacy levels are equally likely: the mean utilities are (81.75, 52.5, 17.5, 5.5) for $Y_1$ = (Low, Mod, High, Sev).
2) If Toxicity Severity = {Low, Moderate} is acceptable, but {High, Severe} is not acceptable: use DLT = {High, Severe} for the 3+3 or CRM.
3) But U(0, Moderate) = U(3, High) = 25: scoring these two outcomes as No DLT and DLT makes no sense!
Using binary DLT is just plain wrong.
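The point about binary DLT can be made concrete with a small sketch. It is illustrative only: the severity ordering, the DLT set {High, Severe}, and the two utilities of 25 come from the slides above, while the function and variable names are hypothetical.

```python
# Illustrative only: names are hypothetical; the DLT set and the two
# utilities of 25 are the ones quoted in the slides.
TOX_LEVELS = ["Low", "Moderate", "High", "Severe"]
DLT_SET = {"High", "Severe"}


def is_dlt(toxicity: str) -> bool:
    """Collapse ordinal toxicity to the binary DLT indicator used by 3+3 or CRM."""
    if toxicity not in TOX_LEVELS:
        raise ValueError(f"unknown toxicity level: {toxicity}")
    return toxicity in DLT_SET


# The two (efficacy score, toxicity severity) outcomes given the same utility.
elicited = {(0, "Moderate"): 25, (3, "High"): 25}

for (eff, tox), utility in elicited.items():
    label = "DLT" if is_dlt(tox) else "No DLT"
    print(f"efficacy={eff}, toxicity={tox}: utility={utility}, binary label={label}")
```

The two outcomes receive the same elicited utility but opposite binary labels, which is the information a DLT-based design throws away.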
[Table: elicited utilities $U(Y_1, Y_2)$, by $Y_1$ = Toxicity Severity (Low, Moderate, High, Severe) and $Y_2$ = Efficacy Score]

Utility Based Dose Finding

1) Goal: Given doses $d_1 < \dots < d_k$ of a new agent, based on Efficacy and Toxicity, find a dose having highest utility.
2) The principal investigator chooses a starting dose using clinical judgment, and possibly animal or in vitro data.
3) Based on elicited U(Efficacy, Toxicity), select doses for successive cohorts to maximize posterior mean utility, u(d, data).
4) Restrict selections to acceptable doses. If all doses are unacceptable, stop the trial.
5) Randomize among doses with u(d, data) close to the maximum, to avoid getting stuck at a suboptimal dose.

Probability Model

$Y_1$ = efficacy index in $\{0, 1, \dots, m_1\}$
$Y_2$ = toxicity index in $\{0, 1, \dots, m_2\}$
($m_1 = m_2 = 3$ in the RT trial)
$x$ = dose index = $1, 2, \dots, J$

The marginal probability $\pi_{k,y}(x, \theta) = \Pr(Y_k = y \mid x, \theta)$ is defined by the conditional probabilities
$$\lambda_{k,y}(x, \theta) = \Pr(Y_k \ge y \mid Y_k \ge y-1, x, \theta)$$
through
$$\pi_{k,y}(x, \theta) = \{1 - \lambda_{k,y+1}(x, \theta)\}\, \lambda_{k,1}(x, \theta)\, \lambda_{k,2}(x, \theta) \cdots \lambda_{k,y}(x, \theta) \quad \text{for } y = 1, \dots, m_k - 1,$$
$$\pi_{k,m_k}(x, \theta) = \lambda_{k,1}(x, \theta)\, \lambda_{k,2}(x, \theta) \cdots \lambda_{k,m_k}(x, \theta) \quad \text{for } y = m_k,$$
so that $\pi_{k,0}(x, \theta) = 1 - \lambda_{k,1}(x, \theta)$.
$$\text{logit}\{\lambda_{k,y}(x, \theta)\} = \begin{cases} \theta_{k,y} & \text{for } x = 1, \\ \theta_{k,y} + \theta_{k,y,2} + \dots + \theta_{k,y,x} & \text{for } x > 1, \end{cases}$$
with all $\theta_{k,y,x} > 0$.
Saturated Marginals: $\dim(\pi_k) = \dim(\theta_k)$

Model Dimensions

Assume a bivariate Gaussian copula to get $\pi(y_1, y_2 \mid x, \theta)$.
$\theta = (\theta_1, \theta_2, \rho)$ has dimension $p = J(m_1 + m_2) + 1$.
$J$ = # dose levels, $(m_1 + 1)$ = # Eff levels, $(m_2 + 1)$ = # Tox levels

1) The algorithm for computing a prior works well.
2) Implementing MCMC to compute posteriors works well.
3) Design performance across many scenarios is good.

Additional Remarks:
1) Ordinal outcomes carry a lot of information.
2) For J > 5 doses, a more parsimonious model is needed.
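The marginal model described in the Probability Model slide can be sketched numerically for a single outcome. This is a minimal sketch, not the authors' code. It assumes the conditional probabilities are Pr(Y >= y | Y >= y-1) on the logit scale, with an intercept at the lowest dose plus positive increments for each higher dose. All function names and parameter values are hypothetical, and the Gaussian copula step is not included.

```python
import math


def expit(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def continuation_prob(theta_y, x):
    """Pr(Y >= y | Y >= y-1) at dose index x (1-based) for one level y.

    theta_y = [intercept, increment for dose 2, ..., increment for dose J].
    Assumption: increments are positive, so the probability rises with dose.
    """
    if any(t <= 0 for t in theta_y[1:]):
        raise ValueError("dose increments must be positive")
    return expit(sum(theta_y[:x]))


def marginal_probs(theta, x):
    """Return [Pr(Y=0), ..., Pr(Y=m)] at dose x; theta[y-1] belongs to level y."""
    lam = [continuation_prob(theta_y, x) for theta_y in theta]
    probs = []
    at_least = 1.0  # Pr(Y >= y), starting from y = 0
    for lam_next in lam:
        probs.append(at_least * (1.0 - lam_next))  # Pr(Y = y)
        at_least *= lam_next                       # Pr(Y >= y + 1)
    probs.append(at_least)                         # Pr(Y = m)
    return probs


def saturated_dim(J, m1, m2):
    """p = J(m1 + m2) + 1: J parameters per level per outcome, plus the copula parameter."""
    return J * (m1 + m2) + 1


# Hypothetical parameter values for one outcome with m = 3 levels and J = 3 doses.
theta_example = [[0.5, 0.4, 0.4], [-0.5, 0.3, 0.3], [-1.5, 0.5, 0.5]]
for dose in (1, 2, 3):
    print(dose, [round(p, 3) for p in marginal_probs(theta_example, dose)])
print("p for the RT trial:", saturated_dim(3, 3, 3))
```

Because every increment is positive, Pr(Y >= y) increases with dose for each y, which is the ordering the positivity constraint is meant to enforce.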
[Table: model dimension $p$ for several combinations of $J$ and $(m_1, m_2)$]

Mean Utilities

The mean utility of dose $x$ given parameters $\theta$ is
$$u(x, \theta) = \sum_{y_1=0}^{m_1} \sum_{y_2=0}^{m_2} U(y_1, y_2)\, \pi(y_1, y_2 \mid x, \theta)$$
The posterior mean utility of dose $x$ given data $D_n$ is
$$u(x, D_n) = \int u(x, \theta)\, p(\theta \mid D_n)\, d\theta$$

Dose Acceptability Criteria

1) Given toxicity severity $y^*$, fixed upper limit $\pi_1^*$, and upper probability cut-off $p_U$, a dose $x$ is unacceptably toxic if
$$\Pr\{\Pr(\text{toxicity} \ge y^* \mid x, \theta) > \pi_1^* \mid D_n\} > p_U$$
2) Let $\{\epsilon_n\}$ be a non-increasing sequence of utility increments. Given data $D_n$, the set of $\epsilon_n$-optimal doses satisfies
$$|u(x, D_n) - u(x^{opt}, D_n)| < \epsilon_n$$
3) A dose $x$ is unlikely to be best if [criterion missing]

Adaptive Randomization Criteria

Let G = {good outcomes}, e.g. $G = \{Y : U(Y) > 25\}$.
Denote the posterior mean probability of an outcome with good utility by
$$\delta_G(x, D_n) = E\{\Pr(Y \in G \mid x, \theta) \mid D_n\}$$
Randomize patients to acceptable doses with probability proportional to $\delta_G(x, D_n)$.
Or use the posterior mean utility of $x$ in place of $\delta_G(x, D_n)$.

Establishing Priors

[Table: elicited prior mean outcome probabilities for the RT trial]

Computing Prior Hyperparameters (24 elicited probabilities, p = 19)
1) Apply nonlinear least squares or a (new) sampling-based algorithm to establish the 19 prior means.
2) Calibrate hyper-variances to ensure a small prior effective sample size (ESS).

Designing the Radiation Therapy Trial

For the RT trial, $m_1 = m_2 = 3$, so there are 4 x 4 - 1 = 15 joint probabilities for each of J = 3 doses, and p = 3(3+3) + 1 = 19.
The marginal probabilities are $\pi_{k,y}(x, \theta) = \Pr(Y_k = y \mid x, \theta)$ for $k = 1, 2$, $y = 0, 1, 2, 3$ and $x = 1, 2, 3$, with the logit link
$$\text{logit}\,\Pr(Y_k \ge y \mid Y_k \ge y-1, x, \theta) = \theta_{k,y} + \theta_{k,y,2} + \dots + \theta_{k,y,x}$$
Approximating each prior($\pi_{k,x,y}$) as a beta(a, b), with prior ESS approximately a + b, the ESS values were 0.31 to 0.70, with mean [value missing].
Dose $x$ is unacceptably toxic if
$$\Pr\{\Pr(\text{High or Severe toxicity} \mid x, \theta) > 0.10 \mid D_n\} > p_U$$
I.e. a 10% limit was imposed on Pr(High or Severe toxicity).
Anticipated accrual rate = 6 to 10 patients/year
N = 30 maximum
Treat the first 3 patients at x = 1, then adapt, but do not skip dose level x = 2 when escalating at the start.
AR applied with lower utility cut-off U = 25 for good Y
Increment for utility close to max: $\epsilon_n$ = 20 for n = 4 to 15, $\epsilon_n$ = 15 for n = 16 to 30.
Posteriors computed using MCMC with Gibbs sampling.
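The decision rules above (posterior mean utility, the toxicity safety rule, closeness to the best utility, and adaptive randomization) can be strung together in a short sketch. This is a sketch under stated assumptions, not the trial software. Posterior draws are taken as given, the "unlikely to be best" criterion is omitted, the best dose is taken among safe doses, and randomization probabilities are assumed proportional to the posterior mean probability of a good outcome. All names and all numbers in the toy example are hypothetical.

```python
def mean_utility(U, prob):
    """u(x, theta): sum of U[t][e] * prob[t][e] over toxicity t and efficacy e."""
    return sum(U[t][e] * prob[t][e] for t in range(len(U)) for e in range(len(U[0])))


def posterior_mean_utility(U, draws):
    """Average u(x, theta) over posterior draws of the joint probability table."""
    return sum(mean_utility(U, p) for p in draws) / len(draws)


def prob_too_toxic(draws, y_star, limit):
    """Posterior probability that Pr(toxicity >= y_star) exceeds the limit."""
    hits = sum(1 for p in draws if sum(sum(row) for row in p[y_star:]) > limit)
    return hits / len(draws)


def good_outcome_prob(U, draws, cutoff):
    """Posterior mean probability of an outcome with utility above the cutoff."""
    total = 0.0
    for p in draws:
        total += sum(p[t][e] for t in range(len(U)) for e in range(len(U[0]))
                     if U[t][e] > cutoff)
    return total / len(draws)


def ar_probabilities(U, draws_by_dose, y_star, limit, p_upper, eps, cutoff):
    """Randomization probabilities over safe doses with near-maximal utility."""
    u = {x: posterior_mean_utility(U, d) for x, d in draws_by_dose.items()}
    safe = [x for x, d in draws_by_dose.items()
            if prob_too_toxic(d, y_star, limit) <= p_upper]
    if not safe:
        return {}  # all doses unacceptable: stop the trial
    u_best = max(u[x] for x in safe)
    admissible = [x for x in safe if u_best - u[x] < eps]
    delta = {x: good_outcome_prob(U, draws_by_dose[x], cutoff) for x in admissible}
    total = sum(delta.values())
    if total == 0:
        return {x: 1.0 / len(admissible) for x in admissible}
    return {x: delta[x] / total for x in admissible}


# Toy example (hypothetical numbers): 2 toxicity levels x 2 efficacy levels.
U_toy = [[40, 100],   # no toxicity: no response, response
         [0, 60]]     # toxicity:    no response, response
draws_by_dose = {
    1: [[[0.5, 0.3], [0.1, 0.1]], [[0.4, 0.4], [0.1, 0.1]]],
    2: [[[0.2, 0.5], [0.1, 0.2]], [[0.2, 0.4], [0.2, 0.2]]],
    3: [[[0.1, 0.2], [0.3, 0.4]], [[0.1, 0.2], [0.3, 0.4]]],
}
probs = ar_probabilities(U_toy, draws_by_dose, y_star=1, limit=0.35,
                         p_upper=0.8, eps=20, cutoff=50)
print(probs)
```

In the toy example dose 3 is excluded by the safety rule, and doses 1 and 2 are both within the utility increment of the best safe dose, so patients are randomized between them with more weight on dose 2.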
Evaluating the Method in the RT Trial

A simulation scenario consists of fixed marginal outcome probabilities, using Gaussian copula parameter $\rho$ = .10.
To evaluate selection: the statistic $R_{select}$
To evaluate patient treatment: the statistic $R_{treat}$

[Tables: Simulation Scenarios, including one with Pr(High or Severe Tox) = .50 for all doses]
[Tables: Operating Characteristics of the RT Trial Design, including the scenario with Pr(High or Severe Tox) = .50 for all doses]
[Table: OCs of a 5-dose version of the RT Trial Design, with maximum N = 40, including the scenario with Pr(High or Severe Tox) = .50 for all doses]

Evaluating Admissibility Criteria and AR

To evaluate the admissibility criteria and the use of AR, we examined 4 alternative versions of the method:
Method 1: Admissible = {safe, not unlikely to be best, $\epsilon_n$-close posterior mean utility}, with AR
Method 2: Admissible = {safe, $\epsilon_n$-close posterior mean utility}, with AR
Method 3: Admissible = {safe}, with AR
Method 4: Admissible = {safe}, no AR (pick the optimal dose)

[Figure: $R_{select}$ for the 4 Methods]
The No AR (Greedy) Algorithm gets stuck and is a disaster in Scenarios 2 and 7.
AR is more consistent across all scenarios.

[Figure: $R_{treat}$ for the 4 Methods]
AR with two additional admissibility criteria improves $R_{treat}$ compared to AR with only a safe dose requirement.

More Parsimonious Probability Models

Proportional Odds (PO) Model
For each outcome k = 1, 2,
$$\text{link}\{\Pr(Y_k \ge y \mid x, \theta)\} = \eta_{k,y,x} = \alpha_{k,y} + \beta_k x, \quad \text{with } \alpha_{k,y} \text{ decreasing in } y,$$
is the PO model when link = logit.
Under a Gaussian copula, the joint model has $p = m_1 + m_2 + 3$ parameters.
Elaboration: $\eta_{k,y,x} = \alpha_{k,y} + \beta_{k,y}\, x$ with $\eta_{k,y,x}$ decreasing in $y$ for all $x$, and $p = 2m_1 + 2m_2 + 1$.
E.g.
$m_1 = m_2 = 3$ and $J = 3$ in the RT trial:
Saturated model: $p = J(m_1 + m_2) + 1 = 19$
Elaborated PO model: p = 13
PO model: p = 9

[Figure: $R_{select}$ for Proportional Odds vs Saturated Model]
[Figure: $R_{treat}$ for Proportional Odds vs Saturated Model]

A Phase I-II Trial of a BRAF Inhibitor

A BRAF inhibitor is given with a fixed dose of tumor infiltrating lymphocytes (TILs) for advanced melanoma
Three doses: {320, 640, 960} mg b.i.d.
N = 36 patients, accrual rate 1 to 2 per month
Two binary outcomes:
Toxicity $Y_1$ = I[Grade 3 or 4 non-hematologic NCI toxicity occurring within 4 weeks]
Efficacy $Y_2$ = I[> 50% reduction in tumor by week 8]
Dose $x$ is unacceptably
1) toxic if $\Pr\{\pi_{T,x} > .25 \mid \text{data}\} > .80$
2) inefficacious if $\Pr\{\pi_{R,x} < [\text{limit missing}] \mid \text{data}\} > .80$

[Tables: Elicited Utilities by Response (No, Yes) and Toxicity (No, Yes); Elicited Prior Means of Response and Toxicity by Dose in mg b.i.d.]

[Tables: Operating Characteristics of the TIL Design, Scenarios 1 to 4]
Scenario 4: all doses too toxic, with $p_{tox}$ = .50; Pr(select none) = .95

Conclusions

1. Using Efficacy and utilities is vastly superior to ignoring efficacy.
2. Randomizing among doses with posterior mean utility close to the maximum is an insurance policy against disaster in cases where the greedy algorithm gets stuck at an inferior dose.
3. The safety rule works well.
4. The additional admissibility rules give incremental improvements in $R_{treat}$.
5. Saturated model versus parsimonious PO model: sometimes better, sometimes worse.
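As a supplement to the comparison of the saturated and PO models, the following minimal sketch computes PO marginal probabilities for one outcome and the parameter counts quoted for the RT trial. It rests on assumptions: the linear term is taken to be an intercept per level plus a common slope times the dose index, under a logit link, and the counts $m_1 + m_2 + 3$ and $2m_1 + 2m_2 + 1$ are the readings that reproduce p = 9 and p = 13. Function names and parameter values are hypothetical.

```python
import math


def expit(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def po_marginal_probs(alpha, beta, x):
    """Return [Pr(Y=0), ..., Pr(Y=m)] under logit Pr(Y >= y) = alpha[y-1] + beta * x.

    alpha must be strictly decreasing in y so that Pr(Y >= y) decreases in y.
    """
    if any(later >= earlier for earlier, later in zip(alpha, alpha[1:])):
        raise ValueError("alpha must be strictly decreasing in y")
    # Pr(Y >= y) for y = 0, 1, ..., m, m + 1
    at_least = [1.0] + [expit(a + beta * x) for a in alpha] + [0.0]
    return [at_least[y] - at_least[y + 1] for y in range(len(alpha) + 1)]


def po_dim(m1, m2):
    """Intercepts (m1 + m2), one slope per outcome (2), one copula parameter (1)."""
    return m1 + m2 + 3


def elaborated_po_dim(m1, m2):
    """Intercepts and level-specific slopes (2*m1 + 2*m2), plus the copula parameter."""
    return 2 * m1 + 2 * m2 + 1


# Hypothetical parameter values for one outcome with m = 3 levels.
alpha_example = [0.5, -0.5, -1.5]
beta_example = 0.4
for dose in (1, 2, 3):
    print(dose, [round(p, 3) for p in po_marginal_probs(alpha_example, beta_example, dose)])
print(po_dim(3, 3), elaborated_po_dim(3, 3))
```

With a single slope per outcome, dose shifts every cumulative logit by the same amount, which is the restriction that makes the PO model so much smaller than the saturated one.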