Predictive Analysis of Clinical Trials
Richard Simon, D.Sc.
Chief, Biometric Research Branch
National Cancer Institute
http://linus.nci.nih.gov/brb
Although the randomized clinical trial remains of fundamental importance for 21st century genomics-based medicine, some of the conventional wisdom about how to design and analyze RCTs requires re-examination
In most positive phase III clinical trials comparing a new treatment to control, most of the patients treated with the new treatment do not benefit.
Adjuvant breast cancer:
70% long-term disease-free survival on control
80% disease-free survival on new treatment
70% of patients don’t need the new treatment
Of the remaining 30%, only 1/3 benefit
Treat 10 to benefit 1
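The arithmetic behind "treat 10 to benefit 1" can be checked with a short sketch. The function name and variable names are illustrative, not from the slides; the two survival rates are the ones quoted above.

```python
def number_needed_to_treat(control_rate, treatment_rate):
    """Patients treated per additional patient who benefits."""
    absolute_benefit = treatment_rate - control_rate
    if absolute_benefit <= 0:
        raise ValueError("treatment rate must exceed control rate")
    return 1.0 / absolute_benefit


control_dfs = 0.70    # disease-free survival on control (from the slide)
treatment_dfs = 0.80  # disease-free survival on new treatment (from the slide)

nnt = number_needed_to_treat(control_dfs, treatment_dfs)
# Among the 30% who relapse on control, the share rescued by the new treatment.
share_rescued = (treatment_dfs - control_dfs) / (1 - control_dfs)

print(round(nnt), round(share_rescued, 2))
```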
For most broad-eligibility clinical trials in oncology, the primary ITT analysis has very low statistical power for the alternative hypotheses that are realistic to expect for molecularly targeted drugs, given the heterogeneity of histologic diagnoses
Cancers of a primary site and histologic type often represent a heterogeneous group of molecular diseases which vary fundamentally with regard to
the oncogenic mutations that cause them
their responsiveness to specific drugs
The standard approach to designing phase III clinical trials is based on two assumptions
Qualitative treatment by subset interactions are unlikely
“Costs” of over-treatment are less than “costs” of under-treatment
These assumptions were derived for studies of inexpensive treatments like aspirin for diseases that were presumed to be biologically homogeneous
Oncology therapeutics development is now focused on molecularly targeted drugs that can only be expected to be effective in a restricted set of patients whose tumors are driven by the molecular targets
Most new cancer drugs are very expensive
Standard Clinical Trial Approach
Has led to widespread over-treatment of patients with drugs from which few benefit
Is neither scientifically well founded nor economically sustainable for future cancer therapeutics
Keys to developing effective drugs in oncology
The target of the drug must be central to disease invasion and progression for some sub-type of cases
Drug should be selective for the target so that it can be administered at a concentration that totally shuts down the de-regulated pathway
Need a test that identifies the patients who have disease driven by de-regulation of the target
Co-development of a new drug and test increases the complexity of development and presents new challenges for companies and regulators
To avoid unnecessary roadblocks to progress, oncologists and statisticians must
Discard the components of the “aspirin” paradigms that are not science based
Avoid treating all problems as hypothesis testing problems
Traditional Subset Analysis
In the past often studied as un-focused post-hoc analyses
Multiple tests, no control of type I error
To protect type I error, require that the overall treatment effect is significant
Based on implicit assumption that qualitative interactions are unlikely, only evaluate treatment within a subset if there were a significant treatment by subset interaction
Neither current practices of subset analysis nor current practices of ignoring subset analysis are effective for evaluating treatments in biologically heterogeneous diseases
How can we develop new drugs in a manner more consistent with modern tumor biology and obtain reliable information about what regimens work for what kinds of patients?
When the Biology is Clear
Develop a classifier that identifies the patients likely (or unlikely) to benefit from the new drug
Develop an analytically validated test
Measures what it should accurately and reproducibly
Design a focused clinical trial to evaluate effectiveness of the new treatment in test + patients
Companion Test
Single gene or protein measurement
ER protein expression
BCR-ABL translocation
HER2 amplification
EGFR mutation
KRAS mutation
V600E mutation
ALK translocation
CD20 expression
Index or classifier that summarizes expression levels of multiple genes
Using phase II data, develop predictor of response to new drug
Targeted (Enrichment) Design
Develop predictor of response to new drug
Patient predicted responsive: randomize to new drug or control
Patient predicted non-responsive: off study
Evaluating the Efficiency of Targeted Design
Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006
Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.
Relative efficiency of targeted design depends on
proportion of patients test positive
specificity of treatment effect for test positive patients
When less than half of patients are test positive and the drug has minimal benefit for test negative patients, the targeted design requires dramatically fewer randomized patients than the standard design in which the marker is not used
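A back-of-envelope sketch of why the targeted design needs fewer randomized patients. It is not the exact calculation from the cited papers. It assumes a perfectly accurate test, equal outcome variance in both subsets, and a sample size inversely proportional to the squared treatment effect. Function and parameter names are illustrative.

```python
def randomized_ratio(prop_positive, effect_pos, effect_neg):
    """Approximate (standard design n) / (targeted design n) for randomized patients.

    The standard design sees the average effect over test + and test - patients;
    the targeted design sees the test + effect only.
    """
    overall_effect = prop_positive * effect_pos + (1 - prop_positive) * effect_neg
    if effect_pos <= 0 or overall_effect <= 0:
        raise ValueError("effects must be positive for the ratio to be meaningful")
    return (effect_pos / overall_effect) ** 2


def screened_ratio(prop_positive, effect_pos, effect_neg):
    """Same ratio, but counting patients screened for the targeted design."""
    return randomized_ratio(prop_positive, effect_pos, effect_neg) * prop_positive


# 25% test positive, no benefit for test negative patients.
no_benefit_in_negatives = randomized_ratio(0.25, 1.0, 0.0)
# 25% test positive, test negative patients get half the benefit.
half_benefit_in_negatives = randomized_ratio(0.25, 1.0, 0.5)
print(no_benefit_in_negatives, half_benefit_in_negatives)
```

Under these assumptions the advantage shrinks quickly as the treatment effect becomes less specific to test positive patients.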
Intended use of the companion test is identifying patients who have disease subtype for which the drug is proven effective
The Biology is Often Not So Clear
Cancer biology is complex and it is not always possible to have the right single predictive classifier identified with an appropriate cut-point by the time the phase 3 trial of a new drug is ready to start accrual
Has the sponsor selected the right biomarker?
Can we adequately pre-define a cut-point to make the biomarker binary?
What if we have several candidate markers? E.g. Is de-regulation of the pathway best measured by protein expression of a receptor or amplification or mutation of the corresponding gene?
Does the sponsor have to establish effectiveness of the drug for every mutation of a specified gene?
“Stratification Design” / “Interaction Design” / Can of Worms Design
Develop predictor of response to new Rx
Predicted responsive to new Rx: new Rx or control
Predicted non-responsive to new Rx: new Rx or control
Can of Worms Design
Invites poor statistical analysis based on standard paradigm
Requiring that overall analysis be significant before evaluating test + subset
Requiring that a significant interaction be demonstrated before evaluating test + subset
Requiring that the randomization be stratified by the marker before evaluating test + subset
Ethically problematic in some cases
What is Important
Study-wise type I error be protected
Trial be sized to have adequate power for the comparisons of a-priori interest
E.g. overall analysis at .01 level, analysis in test + subset at .04 level
The marker be measured on all (or almost all) patients using an analytically validated test
Sample size and futility analysis to protect test - patients
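A minimal Monte Carlo sketch of why splitting the significance level (.01 overall, .04 in the test + subset) protects the study-wise type I error at .05. It assumes two-sided tests on standard normal statistics and, as an illustration only, 30% test positive patients; none of these settings come from the slides.

```python
import math
import random
from statistics import NormalDist


def simulate_studywise_error(prop_positive=0.3, alpha_overall=0.01,
                             alpha_subset=0.04, n_sim=100_000, seed=1):
    """Fraction of null trials rejecting either the overall or the subset test."""
    rng = random.Random(seed)
    cut_overall = NormalDist().inv_cdf(1 - alpha_overall / 2)
    cut_subset = NormalDist().inv_cdf(1 - alpha_subset / 2)
    rejections = 0
    for _ in range(n_sim):
        z_pos = rng.gauss(0, 1)  # statistic in test + patients (no true effect)
        z_neg = rng.gauss(0, 1)  # statistic in test - patients (no true effect)
        # Overall statistic pools both subsets, so it is correlated with z_pos.
        z_all = (math.sqrt(prop_positive) * z_pos
                 + math.sqrt(1 - prop_positive) * z_neg)
        if abs(z_all) > cut_overall or abs(z_subset := z_pos) > cut_subset:
            rejections += 1
    return rejections / n_sim


studywise_error = simulate_studywise_error()
print(studywise_error)
```

Because the two tests are correlated, the simulated error rate should fall a little below the .05 bound rather than on it.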
Principle
If a drug is found safe and effective in a defined patient population, approval should not depend on finding the drug ineffective in some other population
δ+ = treatment effect in test + patients
δ− = treatment effect in test - patients
Two-point priors for δ+ and δ− with values {0, δ*}
Pr[δ+ = δ− = 0] = p00
Pr[δ− = 0 | δ+ = δ*] = r1
Pr[δ+ = 0 | δ− = δ*] = r2
Strong confidence in test: Small r2 and large r1
Weak confidence in test: Small r2 and small r1
p00 selected to control type I error rates
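The three quantities p00, r1 and r2 determine the full joint prior over the four combinations of (δ+, δ−). The sketch below solves for it; the key names p10, p01 and p11 are labels introduced here for illustration, not notation from the slides.

```python
def joint_prior(p00, r1, r2):
    """Joint prior over (delta+, delta-) in {0, delta*} x {0, delta*}.

    p10 = Pr[delta+ = delta*, delta- = 0]
    p01 = Pr[delta+ = 0,      delta- = delta*]
    p11 = Pr[delta+ = delta*, delta- = delta*]
    """
    if not (0 <= p00 < 1 and 0 <= r1 < 1 and 0 <= r2 < 1):
        raise ValueError("need 0 <= p00, r1, r2 < 1")
    odds1 = r1 / (1 - r1)  # p10 / p11, from r1 = p10 / (p10 + p11)
    odds2 = r2 / (1 - r2)  # p01 / p11, from r2 = p01 / (p01 + p11)
    p11 = (1 - p00) / (1 + odds1 + odds2)
    return {"p00": p00, "p10": odds1 * p11, "p01": odds2 * p11, "p11": p11}


# Hypothetical "strong confidence in test" setting: small r2, large r1.
strong = joint_prior(p00=0.5, r1=0.8, r2=0.1)
print(strong)
```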
Biomarker Adaptive Threshold Design
Randomized trial of E vs C
Single candidate predictive biomarker score B
No threshold for biomarker determined
Candidate thresholds b1, …, bK
Eligibility not restricted by biomarker
lk = likelihood under hypothesis that treatment effect is 0 unless B ≥ bk
l* = max{l1, l2, ..., lK}
Test strong null hypothesis: approximate null distribution of l* by permuting treatment labels, re-calculating {lk} and l*, and repeating for 1000 permutations
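The permutation procedure can be sketched as follows. To stay self-contained, this sketch swaps the likelihood-based statistic for a simpler one: the absolute standardized difference in mean outcome between arms among patients with B ≥ bk, on a continuous outcome instead of survival times. The data are simulated and all names are illustrative.

```python
import math
import random


def subset_z(outcomes, treated, biomarker, threshold):
    """Absolute standardized E vs C mean difference among patients with B >= threshold."""
    e = [y for y, z, b in zip(outcomes, treated, biomarker) if b >= threshold and z == 1]
    c = [y for y, z, b in zip(outcomes, treated, biomarker) if b >= threshold and z == 0]
    if len(e) < 2 or len(c) < 2:
        return 0.0
    mean_e, mean_c = sum(e) / len(e), sum(c) / len(c)
    var_e = sum((y - mean_e) ** 2 for y in e) / (len(e) - 1)
    var_c = sum((y - mean_c) ** 2 for y in c) / (len(c) - 1)
    se = math.sqrt(var_e / len(e) + var_c / len(c))
    return abs(mean_e - mean_c) / se if se > 0 else 0.0


def max_statistic(outcomes, treated, biomarker, thresholds):
    """Stand-in for l*: the largest statistic over the candidate thresholds."""
    return max(subset_z(outcomes, treated, biomarker, b) for b in thresholds)


def permutation_p_value(outcomes, treated, biomarker, thresholds, n_perm=1000, seed=0):
    """Null distribution of the maximum obtained by permuting treatment labels."""
    rng = random.Random(seed)
    observed = max_statistic(outcomes, treated, biomarker, thresholds)
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_statistic(outcomes, labels, biomarker, thresholds) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Simulated trial: benefit only when the biomarker is at least 0.5 (an assumption).
rng = random.Random(42)
biomarker = [rng.random() for _ in range(200)]
treated = [i % 2 for i in range(200)]
outcomes = [rng.gauss(0, 1) + 1.5 * z * (b >= 0.5) for z, b in zip(treated, biomarker)]
p_value = permutation_p_value(outcomes, treated, biomarker, [0.1, 0.3, 0.5, 0.7, 0.9])
print(p_value)
```

Taking the maximum inside each permutation is what accounts for having searched over all K candidate thresholds.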
$$\log\left(\frac{h(t, x, z)}{h_0(t)}\right) = \beta x + \gamma z\, I(x \ge b)$$
z = 0, 1 treatment indicator
x = biomarker value
I = indicator function
Compute Bootstrap Confidence Intervals for Threshold b
Select i'th bootstrap sample of n patients by sampling with replacement the survival times and biomarker values of patients in dataset
L(b, β, γ) = log partial likelihood of bootstrap sample
L*(b) = maximum value of L(b, β, γ) for fixed value of b, i.e. profile likelihood
Compute L*(b) for a grid of b values and calculate b_i = argmax_b L*(b)
Repeat for 1000 bootstrap samples
Tabulate for all x ∈ [0,1]:
$$F(x) = \frac{\#\{b_i \le x\}}{1000}$$
The confidence interval for the cut-point can be used to inform treatment decisions for future patients
For patient with marker value x
Pr[patient benefits from new rx] =Pr[x≥b] ≈F(x)
F(x) =Cumulative distribution of b from bootstrap
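A minimal sketch of the bootstrap for the cut-point and of F(x). It replaces the Cox partial likelihood with a simplified Gaussian working model, y = μ + γ z I(x ≥ b) + error, for which maximizing the profile likelihood over b is the same as minimizing the residual sum of squares. The data are simulated and all names are illustrative.

```python
import random


def profile_rss(rows, b):
    """Residual sum of squares after fitting mu and gamma for a fixed cut-point b."""
    benefit = [y for y, z, x in rows if z == 1 and x >= b]
    rest = [y for y, z, x in rows if not (z == 1 and x >= b)]
    rss = 0.0
    for group in (benefit, rest):
        if group:
            m = sum(group) / len(group)
            rss += sum((y - m) ** 2 for y in group)
    return rss


def estimate_cut_point(rows, grid):
    """Grid value maximizing the profile likelihood (equivalently, minimizing RSS)."""
    return min(grid, key=lambda b: profile_rss(rows, b))


def bootstrap_cut_points(rows, grid, n_boot=1000, seed=0):
    """Re-estimate the cut-point on samples of patients drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    return [estimate_cut_point([rows[rng.randrange(n)] for _ in range(n)], grid)
            for _ in range(n_boot)]


def benefit_probability(x, cut_points):
    """F(x): share of bootstrap cut-points at or below marker value x."""
    return sum(b <= x for b in cut_points) / len(cut_points)


# Simulated rows of (outcome, treatment, biomarker); true cut-point 0.5 (an assumption).
rng = random.Random(7)
rows = []
for i in range(200):
    x, z = rng.random(), i % 2
    rows.append((rng.gauss(0, 1) + 1.5 * z * (x >= 0.5), z, x))

grid = [k / 10 for k in range(1, 10)]
cut_points = bootstrap_cut_points(rows, grid, n_boot=200)
print([benefit_probability(x, cut_points) for x in (0.2, 0.5, 0.8)])
```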
Key Points
It can be beneficial not to define a cut-point for the biomarker prior to conducting the phase III clinical trial
The phase II database may be inadequate with regard to number of cases, lack of control group, different endpoint
The only thing that stands in the way of a more informative phase III trial is the aspirin paradigm that the ITT analysis of the eligible population is required to serve as a basis for approval
Learn (the right target population) and Confirm
Adaptive Signature Design
Randomized trial comparing E to C
End of trial analysis:
• Compare E to C for all patients at reduced significance level p0 (e.g. 0.01)
• If overall H0 is rejected, then claim effectiveness of E for eligible patients
• Otherwise:
  • Using only an unbiasedly selected subset of patients of pre-specified size (e.g. 1/3) to be used as a training set T, develop a binary “indication classifier” M of whether a patient is likely to benefit from E relative to C
  • The classifier may use a single marker selected from candidates or multiple markers
  • The classifier classifies patients into only 2 subsets: those predicted to benefit from E and those not predicted to benefit from E
  • Apply the classifier M to classify patients in the validation set V = D − T
  • Compare E vs C in the subset of V who are predicted to benefit from E using a threshold of significance of 0.05 − p0
Can be used with any set of candidate predictive variables, not just high dimensional genomic measurements
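The end-of-trial analysis can be sketched as a short program. This is a simplified illustration under stated assumptions, not the published procedure: the outcome is continuous with higher values better, arms are compared with a normal-approximation test, and the "classifier" is a single marker cut-point chosen on the training set. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def compare_arms(rows):
    """Return (mean difference E - C, two-sided p-value) for rows of (y, z, marker)."""
    e = [y for y, z, _ in rows if z == 1]
    c = [y for y, z, _ in rows if z == 0]
    if len(e) < 2 or len(c) < 2:
        return 0.0, 1.0
    mean_e, mean_c = sum(e) / len(e), sum(c) / len(c)
    var_e = sum((y - mean_e) ** 2 for y in e) / (len(e) - 1)
    var_c = sum((y - mean_c) ** 2 for y in c) / (len(c) - 1)
    se = math.sqrt(var_e / len(e) + var_c / len(c))
    if se == 0:
        return mean_e - mean_c, 1.0
    z_stat = (mean_e - mean_c) / se
    return mean_e - mean_c, 2 * (1 - NormalDist().cdf(abs(z_stat)))


def train_cut_point(training, candidate_cuts):
    """Pick the cut whose marker-high subset shows the largest apparent benefit in T."""
    return max(candidate_cuts,
               key=lambda cut: compare_arms([r for r in training if r[2] >= cut])[0])


def adaptive_signature_analysis(data, p0=0.01, train_fraction=1 / 3,
                                candidate_cuts=(0.25, 0.5, 0.75), seed=0):
    diff, p_overall = compare_arms(data)
    if diff > 0 and p_overall <= p0:
        return "overall"
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # unbiased selection of the training set
    n_train = round(len(shuffled) * train_fraction)
    training, validation = shuffled[:n_train], shuffled[n_train:]
    cut = train_cut_point(training, candidate_cuts)
    # The classifier is applied only to validation patients, never to T.
    positives = [r for r in validation if r[2] >= cut]
    diff, p_subset = compare_arms(positives)
    if diff > 0 and p_subset <= 0.05 - p0:
        return "classifier-positive subset"
    return "no claim"


rng = random.Random(3)
demo = [(rng.gauss(0, 1) + 1.0 * (i % 2) * (m >= 0.75), i % 2, m)
        for i, m in enumerate(rng.random() for _ in range(300))]
print(adaptive_signature_analysis(demo))
```

Keeping the training and validation sets disjoint is what lets the subset test run at the remaining 0.05 − p0 level without further correction.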
The indication classifier is not a binary classifier of whether a patient has good prognosis or poor prognosis
It is a “two sample classifier” of whether the prognosis of a patient on E is better than the prognosis of the patient on C
The indication classifier can be a binary classifier that maps the vector of candidate covariates into {E,C} indicating which treatment is predicted superior for that patient
The classifier need not use all the covariates, but variable selection must be determined using only the training set
Variable selection may be based on selecting variables with apparent interactions with treatment, with cut-off for variable selection determined by cross-validation within training set for optimal classification
The indication classifier can be a probabilistic classifier
$$\log\left(\frac{\lambda(t, x, z)}{\lambda_0(t)}\right) = \alpha z + \beta x_1 + z \gamma x_2$$
z = 0, 1 treatment indicator
x1 = score summarizing contribution of multiple variables selected based on their univariate prognostic effect
x2 = score summarizing contribution of multiple variables selected based on their univariate interaction with treatment
Fit model to obtain estimates α, β, γ
$$\log\left(\frac{\lambda(t, x, z=1)}{\lambda_0(t)}\right) = \alpha + \beta x_1 + \gamma x_2$$
$$\log\left(\frac{\lambda(t, x, z=0)}{\lambda_0(t)}\right) = 0 + \beta x_1 + 0$$
E preferred for patient if α + γx2 ≤ Δ
Δ based on the minimally clinically significant treatment effect
For each patient in validation set, compute the score x2 corresponding to the patient's covariate vector x
Classify the patient as indication + if α + γx2 ≤ Δ
Compute Kaplan-Meier curves of treatments for indication + patients in validation set
Compute log-rank tests comparing Kaplan-Meier curves in indication + set
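The classification step reduces to one comparison per patient: the estimated log hazard ratio of E versus C for that patient, α + γx2, against Δ. A minimal sketch follows; the coefficient values, the scores, and the choice Δ = log(0.8) are hypothetical numbers chosen only to illustrate the rule.

```python
import math


def indication_positive(alpha, gamma, x2, delta):
    """True when the estimated log hazard ratio of E vs C is at most delta."""
    return alpha + gamma * x2 <= delta


alpha_hat = 0.2        # hypothetical fitted treatment main effect
gamma_hat = -0.8       # hypothetical fitted treatment-by-score interaction
delta = math.log(0.8)  # hypothetical: require at least a 20% hazard reduction

validation_scores = [0.0, 0.5, 1.0]  # hypothetical x2 values for three patients
labels = [indication_positive(alpha_hat, gamma_hat, x2, delta)
          for x2 in validation_scores]
print(labels)
```

The prognostic term βx1 appears in both arms and cancels, which is why only α and γx2 enter the rule.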
Key Idea
Replace multiple significance testing by development of one indication classifier and obtain unbiased estimates of the properties of that classifier if used on future patients
This approach can also be used to identify the subset of patients who don’t benefit from E in cases where E is superior to C overall
Key Ideas
• Replace multiple significance testing by development of one indication classifier
• Control study-wise type I error for significance test of
• Overall average treatment effect
• Treatment effect in classifier positive patients
• Test of treatment effect in classifier positive patients does not depend on significance of overall test nor on significant interaction
• Obtain unbiased or conservative estimate of the treatment effect of future classifier positive patients
Figure 1: Overall analysis. The value of the log-rank statistic is 2.9 and the corresponding p-value is 0.09. The new treatment thus shows no benefit overall at the 0.05 level.
Figure 2: Cross-validated survival curves for patients predicted to benefit from the new treatment. Log-rank statistic = 10.0, permutation p-value is .002
Figure 3: Survival curves for cases predicted not to benefit from the new treatment. The value of the log-rank statistic is 0.54.
Closing
• Developments in biotechnology and tumor genomics provide important opportunities to develop improved drugs and companion diagnostics
• The established molecular heterogeneity of human cancer requires the use of new approaches to the development and evaluation of therapeutics
• Maximizing opportunities to develop improved treatments and providing useful labeling information create major challenges for industry and for regulators
• To help achieve these objectives, statisticians must re-examine the standard paradigms and avoid seeing all problems as hypothesis testing problems
Acknowledgements
• Boris Freidlin
• Wenyu Jiang
• Stella Karuri
• Aboubakar Maitournam
• Shigeyuki Matsui