New Approaches to Clinical Trial Design

Development of New Drugs & Predictive Biomarkers

Richard Simon, D.Sc.
Chief, Biometric Research Branch

National Cancer Institute

http://linus.nci.nih.gov

• PowerPoint presentations
• Reprints
• BRB-ArrayTools software
• BRB-ArrayTools Data Archive
• Sample Size Planning
  – Targeted Clinical Trials
  – Phase II Trials
  – Developing Predictive Classifiers

“Biomarkers”

• Surrogate endpoints
  – Of treatment effect
  – Of patient benefit

• Prognostic marker
  – Pre-treatment measurement correlated with long-term outcome

• Predictive classifier
  – A measurement made before treatment to predict whether a particular treatment is likely to be beneficial

Surrogate Endpoints

• It is extremely difficult to properly validate a biomarker as a surrogate for clinical benefit.
  – It requires a series of randomized trials with both the candidate biomarker and clinical outcome measured

• Biomarkers can be useful in phase I/II studies as indicators of treatment effect and need not be validated as surrogates for clinical benefit.

Predictive Classifiers

• Most cancer treatments benefit only a minority of patients to whom they are administered
  – Particularly true for molecularly targeted drugs

• Being able to predict which patients are likely to benefit would
  – Save patients from unnecessary toxicity, and enhance their chance of receiving a drug that helps them
  – Help control medical costs
  – Improve the efficiency of clinical drug development

“If new refrigerators hurt 7% of customers and failed to work for another one-third of them, customers would expect refunds.”

BJ Evans, DA Flockhart, EM Meslin Nature Med 10:1289, 2004

Oncology Needs Predictive Markers not Prognostic Factors

Pusztai et al. The Oncologist 8:252-8, 2003

• 939 articles on “prognostic markers” or “prognostic factors” in breast cancer in past 20 years

• ASCO guidelines only recommend routine testing for ER, PR and HER-2 in breast cancer

• “With the exception of ER or progesterone receptor expression and HER-2 gene amplification, there are no clinically useful molecular predictors of response to any form of anticancer therapy.”

Oncology Needs Predictive Markers not Prognostic Factors

• Many prognostic factor studies use a convenience sample of patients for whom tissue is available. Generally the patients are too heterogeneous to support therapeutically relevant conclusions

• Targeted clinical trials can be much more efficient than untargeted clinical trials, if we know who to target

• In new drug development, the role of a predictive classifier is to select a target population for treatment
  – The focus should be on evaluating the new drug in a population defined by a predictive classifier, not on “validating” the classifier

Developmental Strategy (I)

• Develop a diagnostic classifier that identifies the patients likely to benefit from the new drug

• Develop a reproducible assay for the classifier

• Use the diagnostic to restrict eligibility to a prospectively planned evaluation of the new drug

• Demonstrate that the new drug is effective in the prospectively defined set of patients determined by the diagnostic

[Diagram: Using phase II data, develop a predictor of response to the new drug. Patients predicted responsive are randomized to new drug vs. control; patients predicted non-responsive go off study.]

Applicability of Design I

• Primarily for settings where there is a substantial biological basis for restricting development to classifier-positive patients
  – e.g., HER2 expression with Herceptin

• With substantial biological basis for the classifier, it may be ethically unacceptable to expose classifier negative patients to the new drug

Evaluating the Efficiency of Strategy (I)

• Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004.

• Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

• Efficiency relative to a trial of unselected patients depends on the proportion of patients who test positive and on the effectiveness of the drug (compared to control) for test-negative patients

• When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients

No Treatment Benefit for Assay-Negative Patients

n_std / n_targeted

Proportion Assay Positive   Randomized   Screened
0.75                        1.78         1.33
0.5                         4            2
0.25                        16           4

Treatment Benefit for Assay-Negative Patients Half that of Assay-Positive Patients

n_std / n_targeted

Proportion Assay Positive   Randomized   Screened
0.75                        1.31         0.98
0.5                         1.78         0.89
0.25                        2.56         0.64
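
The ratios in the two tables above are consistent with a simple dilution argument: in an unselected trial the average treatment effect is the prevalence-weighted mix of the assay-positive and assay-negative effects, and the required sample size scales as one over the squared effect. The sketch below is a minimal illustration under that assumption, not the exact method of the cited papers; the function and parameter names are hypothetical.

```python
def randomized_ratio(prop_positive, relative_benefit_negative):
    """Approximate n_std / n_targeted for randomized patients.

    Assumption: sample size is proportional to 1 / effect**2, and the
    untargeted trial sees the prevalence-weighted average effect.
    relative_benefit_negative is the assay-negative effect expressed as a
    fraction of the assay-positive effect (0.0 = no benefit, 0.5 = half).
    """
    diluted = prop_positive + (1.0 - prop_positive) * relative_benefit_negative
    return (1.0 / diluted) ** 2


def screened_ratio(prop_positive, relative_benefit_negative):
    """Approximate ratio in terms of patients screened.

    A targeted trial must screen n_targeted / prop_positive patients in
    order to randomize n_targeted assay-positive patients.
    """
    return prop_positive * randomized_ratio(prop_positive, relative_benefit_negative)


if __name__ == "__main__":
    for rel in (0.0, 0.5):
        for prop in (0.75, 0.5, 0.25):
            print(rel, prop,
                  round(randomized_ratio(prop, rel), 2),
                  round(screened_ratio(prop, rel), 2))
```

A screened ratio below 1 (as in the second table) means the targeted design randomizes fewer patients but has to screen more patients than the untargeted trial would enroll.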

Trastuzumab

• Metastatic breast cancer

• 234 randomized patients per arm

• 90% power for 13.5% improvement in 1-year survival over 67% baseline at 2-sided .05 level

• If benefit were limited to the 25% assay-positive patients, overall improvement in survival would have been 3.375%
  – 4025 patients/arm would have been required

• If assay-negative patients benefited half as much, 627 patients per arm would have been required
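
The arithmetic behind these figures can be sketched as follows. The slide does not say which sample size formula was used; the sketch assumes a two-sided comparison of two proportions with a continuity correction (the Fleiss form), which is my assumption rather than something stated in the source. Under that assumption the results should land close to the 4025 and 627 patients per arm quoted above, and a little above 234 for the undiluted 13.5% improvement. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def patients_per_arm(p_control, p_treated, alpha=0.05, power=0.90):
    """Per-arm sample size for comparing two proportions (two-sided test).

    Assumption: normal-approximation formula with a continuity correction.
    """
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    delta = abs(p_treated - p_control)
    p_bar = (p_control + p_treated) / 2.0
    uncorrected = (
        z_a * sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_b * sqrt(p_control * (1.0 - p_control) + p_treated * (1.0 - p_treated))
    ) ** 2 / delta ** 2
    # Continuity correction applied to the uncorrected sample size.
    return uncorrected / 4.0 * (1.0 + sqrt(1.0 + 4.0 / (uncorrected * delta))) ** 2


def diluted_improvement(improvement_positive, prop_positive, relative_benefit_negative):
    """Overall improvement when assay-negative patients receive only a
    fraction of the assay-positive benefit."""
    return improvement_positive * (
        prop_positive + (1.0 - prop_positive) * relative_benefit_negative
    )


if __name__ == "__main__":
    baseline, gain = 0.67, 0.135
    for rel in (0.0, 0.5):
        overall = diluted_improvement(gain, 0.25, rel)
        n = patients_per_arm(baseline, baseline + overall)
        print(f"negative benefit x{rel}: improvement {overall:.5f}, about {n:.0f} per arm")
```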

Interactive Software for Evaluating a Targeted Design

• http://linus.nci.nih.gov

Developmental Strategy (II)

[Diagram: Develop predictor of response to new Rx. Patients predicted responsive to new Rx and patients predicted non-responsive to new Rx are each randomized to new Rx vs. control.]

Developmental Strategy (II)

• Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan.

• Compare the new drug to the control overall for all patients ignoring the classifier.
  – If p_overall ≤ 0.04 claim effectiveness for the eligible population as a whole

• Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients
  – If p_subset ≤ 0.01 claim effectiveness for the classifier + patients.
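
The analysis plan above amounts to a two-step decision rule. Because the two significance levels sum to 0.05, the chance of any false claim under the null is bounded by 0.05. A minimal sketch, with a hypothetical function name and return strings:

```python
def strategy_ii_claim(p_overall, p_subset, alpha_overall=0.04, alpha_subset=0.01):
    """Pre-specified analysis for Strategy (II).

    p_overall: p-value for new drug vs. control in all eligible patients.
    p_subset:  p-value for the same comparison in classifier-positive patients.
    The subset test is consulted only if the overall test fails.
    """
    if p_overall <= alpha_overall:
        return "effective in all eligible patients"
    if p_subset <= alpha_subset:
        return "effective in classifier-positive patients"
    return "no claim"


if __name__ == "__main__":
    print(strategy_ii_claim(0.03, 0.20))
    print(strategy_ii_claim(0.20, 0.005))
    print(strategy_ii_claim(0.20, 0.03))
```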

Developmental Strategy (IIb)

• Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan.

• Compare the new drug to the control for classifier positive patients
  – If p+ > 0.05 make no claim of effectiveness
  – If p+ ≤ 0.05 claim effectiveness for the classifier positive patients and
    • Continue accrual of classifier negative patients and eventually test treatment effect at 0.05 level
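
Variant (IIb) tests the classifier-positive patients first and looks at classifier-negative patients only after a positive result. A minimal sketch of that sequential rule, with hypothetical names; `p_negative` is left as `None` while classifier-negative accrual is still ongoing:

```python
def strategy_iib_claims(p_positive, p_negative=None, alpha=0.05):
    """Sequential analysis for Strategy (IIb).

    Returns the list of patient groups for which effectiveness is claimed.
    The classifier-negative test is reached only if the classifier-positive
    test is significant.
    """
    if p_positive > alpha:
        return []
    claims = ["classifier-positive"]
    if p_negative is not None and p_negative <= alpha:
        claims.append("classifier-negative")
    return claims


if __name__ == "__main__":
    print(strategy_iib_claims(0.20))
    print(strategy_iib_claims(0.01))
    print(strategy_iib_claims(0.01, 0.03))
```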

Key Features of Design (II)

• The purpose of the RCT is to evaluate treatment T vs C overall and for the pre-defined subset; not to re-evaluate the components of the classifier, or to modify or refine the classifier

The Roadmap

1. Develop a completely specified genomic classifier of the patients likely to benefit from a new drug

2. Establish reproducibility of measurement of the classifier

3. Use the completely specified classifier to design and analyze a new clinical trial to evaluate effectiveness of the new treatment with a pre-defined analysis plan.

Guiding Principle

• The data used to develop the classifier must be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier
  – Developmental studies are exploratory
  – Studies on which treatment effectiveness claims are to be based should be definitive studies that test a treatment hypothesis in a patient population completely pre-specified by the classifier

Use of Archived Samples

• From a non-targeted “negative” clinical trial to develop a binary classifier of a subset thought to benefit from treatment

• Test that subset hypothesis in a separate clinical trial
  – Prospective targeted type (I) trial
  – Using archived specimens from a second previously conducted clinical trial

Development of Genomic Classifiers

• Single gene or protein based on knowledge of therapeutic target

• Single gene or protein culled from set of candidate genes identified based on imperfect knowledge of therapeutic target

• Empirically determined based on correlating gene expression to patient outcome after treatment

Use of DNA Microarray Expression Profiling

• For settings where you don’t know how to identify the patients likely to be responsive to the new treatment based on its mechanism of action

• Only pre-treatment specimens are needed

Development of Genomic Classifiers

• During phase II development or

• After failed phase III trial using archived specimens.

• Adaptively during early portion of phase III trial.

Development of Empirical Gene Expression Based Classifier

• 20-30 phase II responders are needed to compare to non-responders in order to develop signature for predicting response
  – Dobbin KK, Simon RM. Sample size planning for developing classifiers using high dimensional DNA microarray data, Biostatistics 8:101-117, 2007

Adaptive Signature Design

An adaptive design for generating and prospectively testing a gene expression signature for sensitive patients

Boris Freidlin and Richard Simon
Clinical Cancer Research 11:7872-8, 2005

Adaptive Signature Design: End of Trial Analysis

• Compare E to C for all patients at significance level 0.04
  – If overall H0 is rejected, then claim effectiveness of E for eligible patients

• Otherwise:
  – Using only the first half of patients accrued during the trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment E compared to control C
  – Compare E to C for patients accrued in second stage who are predicted responsive to E based on classifier
    • Perform test at significance level 0.01
    • If H0 is rejected, claim effectiveness of E for subset defined by classifier
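
The end-of-trial analysis can be sketched as a skeleton in which the statistical test and the classifier-building step are supplied by the caller. Both `p_value` and `develop_classifier` are hypothetical placeholders, not functions from any library or from the cited paper. The sketch only shows how the data are split, so that the classifier is developed on the first half and tested on the second.

```python
from typing import Callable, Sequence, TypeVar

Patient = TypeVar("Patient")


def adaptive_signature_analysis(
    patients: Sequence[Patient],
    p_value: Callable[[Sequence[Patient]], float],
    develop_classifier: Callable[[Sequence[Patient]], Callable[[Patient], bool]],
    alpha_overall: float = 0.04,
    alpha_subset: float = 0.01,
) -> str:
    """patients must be listed in order of accrual.

    p_value(group) returns the p-value for E vs. C within the group.
    develop_classifier(group) returns a function that predicts whether a
    patient is sensitive to E.
    """
    # Step 1: overall comparison in all patients.
    if p_value(patients) <= alpha_overall:
        return "effective in all eligible patients"

    # Step 2: develop the classifier on the first half only.
    half = len(patients) // 2
    first_half, second_half = patients[:half], patients[half:]
    predicts_sensitive = develop_classifier(first_half)

    # Step 3: test E vs. C among second-half patients predicted sensitive.
    sensitive = [p for p in second_half if predicts_sensitive(p)]
    if sensitive and p_value(sensitive) <= alpha_subset:
        return "effective in classifier-defined subset"
    return "no claim"
```

Keeping the classifier development and the subset test on disjoint halves is what lets the subset p-value be read at face value.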

Biomarker Adaptive Threshold Design

Wenyu Jiang, Boris Freidlin & Richard Simon

JNCI 99:1036-43, 2007

Biomarker Adaptive Threshold Design

• Randomized phase III trial comparing new treatment E to control C

• Survival or DFS endpoint

Biomarker Adaptive Threshold Design

• Have identified a predictive index B thought to be predictive of patients likely to benefit from E relative to C

• Eligibility not restricted by biomarker

• No threshold for biomarker determined

• S(b) = log likelihood ratio statistic for treatment versus control comparison in subset of patients with B ≥ b

• Compute S(b) for all possible threshold values

• Determine T = max{S(b)}

• Compute null distribution of T by permuting treatment labels
  – Permute the labels of which patients are in which treatment group
  – Re-analyze to determine T for permuted data
  – Repeat for 10,000 permutations

• If the data value of T is significant at 0.05 level using the permutation null distribution of T, then reject null hypothesis that E is ineffective

• Compute point and interval estimates of the threshold b
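
The permutation procedure above can be sketched in a few lines. To stay self-contained, the sketch swaps in a simpler statistic than the one on the slide: instead of a log likelihood ratio statistic from a survival model, S(b) here is a squared standardized difference in mean outcome for an uncensored numeric endpoint. The `min_per_arm` guard and all function names are also assumptions made for illustration.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _var(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def subset_stat(outcomes, treated, min_per_arm=10):
    """Stand-in for S(b): squared standardized difference in mean outcome
    between treated and control patients (not a survival-model statistic)."""
    e = [y for y, t in zip(outcomes, treated) if t]
    c = [y for y, t in zip(outcomes, treated) if not t]
    if len(e) < max(min_per_arm, 2) or len(c) < max(min_per_arm, 2):
        return 0.0
    variance = _var(e) / len(e) + _var(c) / len(c)
    if variance == 0:
        return 0.0
    return (_mean(e) - _mean(c)) ** 2 / variance


def max_stat(biomarker, outcomes, treated, thresholds, min_per_arm=10):
    """T = max over thresholds b of S(b), using patients with B >= b."""
    best = 0.0
    for cut in thresholds:
        keep = [i for i, b in enumerate(biomarker) if b >= cut]
        stat = subset_stat([outcomes[i] for i in keep],
                           [treated[i] for i in keep], min_per_arm)
        best = max(best, stat)
    return best


def adaptive_threshold_test(biomarker, outcomes, treated,
                            n_perm=1000, seed=0, min_per_arm=10):
    """Return (observed T, permutation p-value)."""
    rng = random.Random(seed)
    thresholds = sorted(set(biomarker))
    observed = max_stat(biomarker, outcomes, treated, thresholds, min_per_arm)
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute which patients are in which arm
        if max_stat(biomarker, outcomes, labels, thresholds, min_per_arm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the maximum over thresholds is recomputed for every permutation, the p-value already accounts for having searched over all candidate cut-points.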

Validation of Predictive Classifiers for Use with Available Treatments

• Should establish that the classifier is reproducibly measurable and has clinical utility

Developmental vs Validation Studies

• Developmental studies should select patients sufficiently homogeneous for addressing a therapeutically relevant question

• Developmental studies should develop a completely specified classifier

• Developmental studies should provide an unbiased estimate of predictive accuracy

Limitations to Developmental Studies

• Sample handling and assay conduct are performed under controlled conditions that do not incorporate real world sources of variability

• Small study size limits precision of estimates of predictive accuracy

• Predictive accuracy may not reflect clinical utility

Types of Clinical Utility

• Identify patients whose prognosis is sufficiently good without cytotoxic chemotherapy

• Identify patients who are likely or unlikely to benefit from a specific therapy

Prognosis Good Without Chemotherapy

• Develop prognostic classifier for patients not receiving cytotoxic chemotherapy

• Identify patients for whom
  – Current practice standards imply chemotherapy
  – Classifier indicates very good prognosis without chemotherapy

• Withhold chemotherapy to test predictions

Prospectively Planned Validation Using Archived Materials

Oncotype-Dx

• Fully specified classifier developed using data from NSABP B20 applied prospectively to frozen specimens from NSABP B14 patients who received Tamoxifen for 5 years

• Good risk patients had very good relapse-free survival

B-14 Results: Relapse-Free Survival

[Figure: relapse-free survival (0% to 100%) versus time (0 to 16 yrs) for three recurrence score groups: Low Risk (RS < 18), 338 pts; Intermediate Risk (RS 18-30), 149 pts; High Risk (RS ≥ 31), 181 pts; p<0.0001. Paik et al, SABCS 2003]

Prospective Validation: US Intergroup Study

• OncotypeDx risk score <15
  – Tam alone

• OncotypeDx risk score >30
  – Tam + Chemo

• OncotypeDx risk score 15-30
  – Randomize to Tam vs Tam + Chemo

• Measure classifier for all patients and randomize only those for whom classifier determined therapy differs from standard of care
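
The assignment rule above can be written as a small function. This is only a sketch of the cut-offs as stated on this slide; the function name and return format are hypothetical, and the 1:1 randomization is an assumption.

```python
import random


def assign_therapy(risk_score, rng=random):
    """Return (arm, was_randomized) for a given OncotypeDx risk score.

    Scores below 15 get Tam alone, scores above 30 get Tam + Chemo, and
    scores from 15 to 30 inclusive are randomized between the two arms.
    rng is any object with a choice() method (the random module by default).
    """
    if risk_score < 15:
        return "Tam alone", False
    if risk_score > 30:
        return "Tam + Chemo", False
    return rng.choice(["Tam alone", "Tam + Chemo"]), True


if __name__ == "__main__":
    rng = random.Random(0)
    for score in (8, 15, 22, 30, 41):
        print(score, assign_therapy(score, rng))
```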

Types of Clinical Utility

• Identify patients whose prognosis is sufficiently good without cytotoxic chemotherapy
  – Identify patients whose prognosis is so good on standard therapy S that they do not need additional treatment T

• Identify patients who are likely or unlikely to benefit from a specific systemic therapy

Developmental Strategy (II): EGFR Inhibitor in NSCLC

[Diagram: Develop predictor of response to new Rx. Patients predicted responsive to new Rx and patients predicted non-responsive to new Rx are each randomized to new Rx vs. control.]

Validation Study for Identifying Patients Who Do Not Benefit from Standard Adjuvant Regimen

• Standard adjuvant treatment S

• Classifier based on previous data for identifying patients who do not benefit from S relative to previous control C

• RCT of S vs C for patients predicted not to benefit from S
  – It’s difficult to go back

• Alternatively, RCT comparing S vs new treatment T for patients predicted not to benefit from S

BRB-ArrayTools

• Contains analysis tools that I have selected as valid and useful

• Targeted to biomedical scientists with analysis wizard and numerous help screens

• Imports data from all platforms and major databases

• Extensive built-in gene annotation and linkage to gene annotation websites

• Extensive gene-set enrichment tools for integrating gene expression with pathways, transcription factor targets, microRNA targets, protein domains and other biological information

Predictive Classifiers in BRB-ArrayTools

• Classifiers
  – Diagonal linear discriminant
  – Compound covariate
  – Bayesian compound covariate
  – Support vector machine with inner product kernel
  – K-nearest neighbor
  – Nearest centroid
  – Shrunken centroid (PAM)
  – Random forest
  – Tree of binary classifiers for k-classes

• Survival risk-group
  – Supervised principal components

• Feature selection options
  – Univariate t/F statistic
  – Hierarchical variance option
  – Restricted by fold effect
  – Univariate classification power
  – Recursive feature elimination
  – Top-scoring pairs

• Validation methods
  – Split-sample
  – LOOCV
  – Repeated k-fold CV
  – .632+ bootstrap

BRB-ArrayTools

• Publicly available for non-commercial use
  – http://linus.nci.nih.gov/brb

Conclusions

• New technology makes it increasingly feasible to identify which patients are likely or unlikely to benefit from a specified treatment

• Targeting treatment can greatly improve the therapeutic ratio of benefit to adverse effects
  – Smaller clinical trials needed
  – Treated patients benefit
  – Economic benefit

Conclusions

• Some of the conventional wisdom about how to develop predictive classifiers and how to use them in clinical trial design is flawed

• Prospectively specified analysis plans for phase III studies are essential to achieve reliable results
  – Biomarker analysis does not mean exploratory analysis except in developmental studies

Conclusions

• Achieving the potential of new technology requires paradigm changes in “correlative science.”

• Effective interdisciplinary research requires increased emphasis on cross education of laboratory, clinical and statistical/computational scientists

Acknowledgements

• Kevin Dobbin

• Boris Freidlin

• Wenyu Jiang

• Aboubakar Maitournam
