Predictive Analysis of Clinical Trials

Richard Simon, D.Sc.
Chief, Biometric Research Branch
National Cancer Institute

http://linus.nci.nih.gov/brb

Although the randomized clinical trial remains of fundamental importance for 21st-century genomics-based medicine, some of the conventional wisdom about how to design and analyze RCTs requires re-examination

In most positive phase III clinical trials comparing a new treatment to a control, most of the patients treated with the new treatment do not benefit. Example, adjuvant breast cancer:

70% long-term disease-free survival on control

80% disease-free survival on new treatment. 70% of patients don’t need the new treatment. Of the remaining 30%, only one third benefit

Treat 10 to benefit 1
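The “treat 10 to benefit 1” arithmetic can be checked in a few lines. This is a minimal sketch that assumes, as the slide does implicitly, that every patient who would be disease-free on control would also be disease-free on the new treatment; the function name is hypothetical.

```python
def benefit_arithmetic(dfs_control, dfs_new):
    """Return (fraction not needing the new treatment,
    fraction of the remaining patients who benefit, number needed to treat).

    Assumes patients disease-free on control stay disease-free on the new
    treatment, so the whole absolute gain comes from control failures.
    """
    absolute_gain = dfs_new - dfs_control          # e.g. 0.80 - 0.70 = 0.10
    not_needed = dfs_control                       # would do well on control anyway
    benefit_among_rest = absolute_gain / (1 - dfs_control)
    nnt = 1 / absolute_gain                        # number needed to treat
    return not_needed, benefit_among_rest, nnt


not_needed, frac_benefit, nnt = benefit_arithmetic(0.70, 0.80)
print(f"{not_needed:.0%} don't need it; {frac_benefit:.2f} of the rest benefit; NNT = {nnt:.0f}")
```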

For most broad-eligibility clinical trials in oncology, the primary intention-to-treat (ITT) analysis has very low statistical power for the alternative hypotheses that are realistic to expect for molecularly targeted drugs, given that histologic diagnoses are molecularly heterogeneous
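To make the power loss concrete, the sketch below uses a simple normal-approximation power calculation. It assumes a continuous outcome with unit standard deviation in both arms, a drug that helps only test + patients (effect 0.4 SD, hypothetical), and a hypothetical trial of 100 patients per arm; the point is only how the ITT effect is diluted as the test + proportion shrinks.

```python
from statistics import NormalDist


def itt_power(n_per_arm, delta_pos, prop_pos, delta_neg=0.0, alpha=0.05):
    """Approximate power of a two-sided z-test on the ITT population.

    The outcome is assumed normal with unit SD in both arms, and the ITT
    effect is the mixture prop_pos * delta_pos + (1 - prop_pos) * delta_neg.
    """
    z = NormalDist()
    itt_effect = prop_pos * delta_pos + (1 - prop_pos) * delta_neg
    se = (2 / n_per_arm) ** 0.5                    # SE of a difference in means, unit SD
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(itt_effect) / se - z_crit)    # ignores the negligible opposite tail


for prop in (1.0, 0.5, 0.3):
    print(prop, round(itt_power(100, 0.4, prop), 3))
```

With these assumptions, power drops from roughly 80% when every patient is test + to the low teens when only 30% are.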


Cancers of a primary site and histologic type often represent a heterogeneous group of molecular diseases which vary fundamentally with regard to:

the oncogenic mutations that cause them

their responsiveness to specific drugs

The standard approach to designing phase III clinical trials is based on two assumptions:

Qualitative treatment by subset interactions are unlikely

“Costs” of over-treatment are less than “costs” of under-treatment

These assumptions were derived for studies of inexpensive treatments like aspirin for diseases that were presumed to be biologically homogeneous


Oncology therapeutics development is now focused on molecularly targeted drugs that can only be expected to be effective in a restricted set of patients whose tumors are driven by the molecular targets

Most new cancer drugs are very expensive

Standard Clinical Trial Approach

Has led to widespread over-treatment of patients with drugs from which few benefit

Is neither scientifically well founded nor economically sustainable for future cancer therapeutics

Keys to developing effective drugs in oncology

The target of the drug must be central to disease invasion and progression for some sub-type of cases

The drug should be selective for the target so that it can be administered at a concentration that totally shuts down the de-regulated pathway

A test is needed that identifies the patients whose disease is driven by de-regulation of the target

Co-development of a new drug and test increases the complexity of development and presents new challenges for companies and regulators

To avoid unnecessary roadblocks to progress, oncologists and statisticians must:

Discard the components of the “aspirin” paradigm that are not science-based

Avoid treating all problems as hypothesis-testing problems


Traditional Subset Analysis

In the past, often performed as unfocused post-hoc analyses: multiple tests, no control of type I error

To protect the type I error, required that the overall treatment effect be significant

Based on the implicit assumption that qualitative interactions are unlikely, treatment was evaluated within a subset only if there was a significant treatment-by-subset interaction
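Under the simplifying assumption that the subset tests are independent and each is performed at level 0.05, the chance of at least one false-positive subset finding grows quickly with the number of subsets examined. A short illustration:

```python
def familywise_error(alpha, n_tests):
    """P(at least one false-positive finding) for independent tests, all nulls true."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

With 10 independent subsets, for example, the chance of at least one spurious “significant” subset is about 40%.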

Neither current practices of subset analysis nor current practices of ignoring subset analysis are effective for evaluating treatments in biologically heterogeneous diseases

How can we develop new drugs in a manner more consistent with modern tumor biology and obtain reliable information about what regimens work for what kinds of patients?

When the Biology is Clear

Develop a classifier that identifies the patients likely (or unlikely) to benefit from the new drug

Develop an analytically validated test: one that measures what it should, accurately and reproducibly

Design a focused clinical trial to evaluate effectiveness of the new treatment in test + patients

Companion Test

Single gene or protein measurement, e.g. ER protein expression, BCR-ABL translocation, HER2 amplification, EGFR mutation, KRAS mutation, BRAF V600E mutation, ALK translocation, CD20 expression

Index or classifier that summarizes expression levels of multiple genes

Using phase II data, develop a predictor of response to the new drug

Targeted (Enrichment) Design

Develop a predictor of response to the new drug. Patients predicted responsive are randomized to the new drug or control; patients predicted non-responsive are off study

Evaluating the Efficiency of Targeted Design

Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006

Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

Relative efficiency of targeted design depends on:

proportion of patients test positive

specificity of treatment effect for test positive patients

When less than half of patients are test positive and the drug has minimal benefit for test negative patients, the targeted design requires dramatically fewer randomized patients than the standard design in which the marker is not used
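A rough sense of the gain comes from noting that the standard design has to detect a diluted effect, and sample size scales with one over the squared effect. The sketch below is a simplified approximation, not the formulas of the papers cited above: it assumes a normal outcome with the same variance in test + and test − patients, the same significance level and power in both designs, and it ignores the cost of screening test − patients. The function name is hypothetical.

```python
def randomized_patient_ratio(prop_pos, delta_pos, delta_neg=0.0):
    """Approximate n_standard / n_targeted of randomized patients (simplified).

    The standard design estimates prop_pos*delta_pos + (1-prop_pos)*delta_neg,
    the targeted design estimates delta_pos; n scales as 1 / effect**2.
    """
    diluted = prop_pos * delta_pos + (1 - prop_pos) * delta_neg
    return (delta_pos / diluted) ** 2


for p in (0.25, 0.5, 0.75):
    print(p, round(randomized_patient_ratio(p, 1.0), 2))
```

Under these assumptions, with 25% of patients test positive and no benefit for test − patients, the standard design needs about 16 times as many randomized patients; if the drug works equally in both groups, the ratio is 1.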

The intended use of the companion test is to identify patients who have the disease subtype for which the drug is proven effective

The Biology is Often Not So Clear

Cancer biology is complex and it is not always possible to have the right single predictive classifier identified with an appropriate cut-point by the time the phase 3 trial of a new drug is ready to start accrual

Has the sponsor selected the right biomarker?

Can we adequately pre-define a cut-point to make the biomarker binary?

What if we have several candidate markers? E.g., is de-regulation of the pathway best measured by protein expression of a receptor or by amplification or mutation of the corresponding gene?

Does the sponsor have to establish effectiveness of the drug for every mutation of a specified gene?


“Stratification Design” / “Interaction Design”

Can of Worms Design

Develop a predictor of response to the new Rx. Patients predicted responsive and patients predicted non-responsive to the new Rx are each randomized to the new Rx or control

The “Can of Worms” design invites poor statistical analysis based on the standard paradigm:

Requiring that the overall analysis be significant before evaluating the test + subset

Requiring that a significant interaction be demonstrated before evaluating the test + subset

Requiring that the randomization be stratified by the marker before evaluating the test + subset

Ethically problematic in some cases


What is Important

Study-wise type I error be protected

Trial be sized to have adequate power for the comparisons of a priori interest, e.g. overall analysis at the .01 level and analysis in the test + subset at the .04 level

The marker be measured on all (or almost all) patients using an analytically validated test

Sample size and futility analysis to protect test − patients
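For a time-to-event endpoint, one common way to size each of the two comparisons is Schoenfeld's approximation for the required number of events with 1:1 randomization. In this sketch the one-sided tests, the 90% power and the hazard ratios are assumptions chosen for illustration; only the .01/.04 split of the study-wise level comes from the slide.

```python
from math import log
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha, power=0.90):
    """Events needed for a one-sided log-rank test with 1:1 randomization (Schoenfeld)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2


# Study-wise level split as on the slide: .01 overall, .04 in the test + subset.
ALPHA_OVERALL, ALPHA_SUBSET = 0.01, 0.04

# Hypothetical target effects: modest overall, larger in test + patients.
print("overall events:", round(schoenfeld_events(0.80, ALPHA_OVERALL)))
print("test + events:", round(schoenfeld_events(0.65, ALPHA_SUBSET)))
```

The trial would need to accrue enough patients to reach whichever of the two event targets is binding.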


Principle

If a drug is found safe and effective in a defined patient population, approval should not depend on finding the drug ineffective in some other population

δ+ =treatment effect in test + patients

δ− = treatment effect in test - patients

Two-point priors for δ+ and δ− with values {0,δ*}

Pr[δ+ =δ− =0] =p00

Pr[δ− =0 |δ+ =δ*] =r1

Pr[δ+ =0 |δ− =δ*] =r2

Strong confidence in test: Small r2 and large r1

Weak confidence in test: Small r2 and small r1

p00 is selected to control the type I error rates
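Once p00, r1 and r2 are fixed, the two-point prior determines the joint prior probability of all four (δ+, δ−) combinations. A small sketch that solves for them directly from the definitions above (it assumes r1 < 1 and r2 < 1; the example values are hypothetical):

```python
def joint_prior(p00, r1, r2):
    """Joint prior over (delta+ = d*, delta- = d*) implied by p00, r1, r2.

    r1 = Pr[delta- = 0 | delta+ = d*], r2 = Pr[delta+ = 0 | delta- = d*].
    Keys are (delta+ nonzero, delta- nonzero).
    """
    odds1 = r1 / (1 - r1)   # Pr[d+=d*, d-=0] / Pr[d+=d*, d-=d*]
    odds2 = r2 / (1 - r2)   # Pr[d+=0, d-=d*] / Pr[d+=d*, d-=d*]
    both = (1 - p00) / (1 + odds1 + odds2)
    return {
        (False, False): p00,
        (True, False): both * odds1,
        (False, True): both * odds2,
        (True, True): both,
    }


# Hypothetical "strong confidence in test" setting: large r1, small r2.
prior = joint_prior(p00=0.5, r1=0.8, r2=0.05)
for state, prob in prior.items():
    print(state, round(prob, 3))
```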

Biomarker Adaptive Threshold Design

Randomized trial of E vs C

Single candidate predictive biomarker score B

No threshold for biomarker determined

Candidate thresholds b1, …, bK

Eligibility not restricted by biomarker

lk = likelihood under the hypothesis that the treatment effect is 0 unless B ≥ bk

l* = max{l1, l2, ..., lK}

Test the strong null hypothesis: approximate the null distribution of l* by permuting the treatment labels, re-calculating {lk} and l*, and repeating for 1000 permutations

$$\log\left(\frac{h(t,x,z)}{h_0(t)}\right) = \beta x + \gamma z\, I(x \ge b)$$

z = 0, 1 treatment indicator; x = biomarker value; I = indicator function
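A minimal sketch of the permutation test follows. To stay self-contained it swaps in a normal outcome and uses the squared two-sample z-statistic among patients with B ≥ bk as the per-threshold statistic in place of the Cox partial likelihood lk; the candidate thresholds and the simulated data are hypothetical. Including b1 = 0 makes the overall comparison one of the candidates.

```python
import numpy as np


def threshold_statistics(y, z, b, thresholds):
    """Per-threshold statistic: squared z for E (z=1) vs C (z=0) among B >= b_k."""
    stats = np.zeros(len(thresholds))
    for k, bk in enumerate(thresholds):
        sel = b >= bk
        ye, yc = y[sel & (z == 1)], y[sel & (z == 0)]
        if len(ye) < 2 or len(yc) < 2:
            continue  # too few patients above this cut-point; statistic stays 0
        se = np.sqrt(ye.var(ddof=1) / len(ye) + yc.var(ddof=1) / len(yc))
        stats[k] = ((ye.mean() - yc.mean()) / se) ** 2
    return stats


def adaptive_threshold_test(y, z, b, thresholds, n_perm=1000, seed=0):
    """Return (l_star, permutation p-value) for the strong null hypothesis."""
    rng = np.random.default_rng(seed)
    l_star = threshold_statistics(y, z, b, thresholds).max()
    # Permute treatment labels, recompute every threshold statistic, keep the max.
    null_max = np.array([
        threshold_statistics(y, rng.permutation(z), b, thresholds).max()
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null_max >= l_star)) / (1 + n_perm)
    return l_star, p_value


# Simulated illustration: E helps only patients with B >= 0.6.
rng = np.random.default_rng(1)
n = 400
b = rng.uniform(0, 1, n)
z = rng.integers(0, 2, n)
y = rng.normal(0, 1, n) + 1.0 * z * (b >= 0.6)
print(adaptive_threshold_test(y, z, b, thresholds=[0.0, 0.25, 0.5, 0.75]))
```

Because the maximum over thresholds is recomputed for every permutation, the multiplicity of trying K cut-points is accounted for in the p-value.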

Compute Bootstrap Confidence Intervals for Threshold b

Select the i'th bootstrap sample of n patients by sampling with replacement the survival times and biomarker values of patients in the dataset

L(b, β, γ) = log partial likelihood of the bootstrap sample

L*(b) = maximum value of L(b, β, γ) for a fixed value of b, i.e. the profile likelihood

Compute L*(b) for a grid of b values and calculate

$$b_i = \arg\max_b L^*(b)$$

Repeat for 1000 bootstrap samples

Tabulate, for all x ∈ [0, 1],

$$F(x) = \frac{\#\{b_i \le x\}}{1000}$$

The confidence interval for the cut-point can be used to inform treatment decisions for future patients

For a patient with marker value x:

Pr[patient benefits from new Rx] = Pr[x ≥ b] ≈ F(x)

F(x) = cumulative distribution of b from the bootstrap
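The bootstrap procedure and the resulting F(x) can be sketched as below. The sketch replaces the Cox model with a Gaussian linear model y ~ 1 + x + z·I(x ≥ b) (higher y is better), for which maximizing the profile likelihood over b is equivalent to minimizing the residual sum of squares; the grid, the simulated data and the reduced number of bootstrap samples are assumptions for illustration.

```python
import numpy as np


def profile_argmax_threshold(y, x, z, grid):
    """Grid cut-point b maximizing the profile likelihood of y ~ 1 + x + z*I(x >= b).

    For a Gaussian linear model this is the b with the smallest residual sum of squares.
    """
    best_b, best_rss = grid[0], np.inf
    for b in grid:
        design = np.column_stack([np.ones_like(x), x, z * (x >= b)])
        coef = np.linalg.lstsq(design, y, rcond=None)[0]
        rss = np.sum((y - design @ coef) ** 2)
        if rss < best_rss:
            best_b, best_rss = b, rss
    return best_b


def bootstrap_threshold_cdf(y, x, z, grid, n_boot=1000, seed=0):
    """Return F(v) = fraction of bootstrap cut-point estimates b_i <= v."""
    rng = np.random.default_rng(seed)
    n = len(y)
    b_hat = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample patients with replacement
        b_hat[i] = profile_argmax_threshold(y[idx], x[idx], z[idx], grid)
    return lambda v: float(np.mean(b_hat <= v))


rng = np.random.default_rng(2)
n = 400
x = rng.uniform(0, 1, n)                          # biomarker scaled to [0, 1]
z = rng.integers(0, 2, n)
y = 0.5 * x + 1.5 * z * (x >= 0.5) + rng.normal(0, 1, n)   # true cut-point 0.5
F = bootstrap_threshold_cdf(y, x, z, grid=np.linspace(0.05, 0.95, 19), n_boot=200)
for marker in (0.3, 0.5, 0.7):
    print(marker, F(marker))   # approximate Pr[patient with this marker value benefits]
```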

Key Points

It can be beneficial not to define a cut-point for the biomarker prior to conducting the phase III clinical trial

The phase II database may be inadequate with regard to number of cases, lack of a control group, or a different endpoint

The only thing that stands in the way of a more informative phase III trial is the aspirin paradigm that the ITT analysis of the eligible population is required to serve as the basis for approval

Learn (the right target population) and Confirm

Adaptive Signature Design


Randomized trial comparing E to C

Can be used with any set of candidate predictive variables, not just high-dimensional genomic measurements

End-of-trial analysis: compare E to C for all patients at a reduced significance level p0 (e.g. 0.01). If the overall H0 is rejected, then claim effectiveness of E for eligible patients

Otherwise:

Using only an unbiasedly selected subset of patients of pre-specified size (e.g. 1/3) to be used as a training set T, develop a binary “indication classifier” M of whether a patient is likely to benefit from E relative to C. The classifier may use a single marker selected from candidates or multiple markers

The classifier classifies patients into only 2 subsets: those predicted to benefit from E and those not predicted to benefit from E

Apply the classifier M to classify patients in the validation set V = D − T

Compare E vs C in the subset of V who are predicted to benefit from E using a threshold of significance of 0.05 − p0

The indication classifier is not a binary classifier of whether a patient has good prognosis or poor prognosis

It is a “two sample classifier” of whether the prognosis of a patient on E is better than the prognosis of the patient on C

The indication classifier can be a binary classifier that maps the vector of candidate covariates into {E,C} indicating which treatment is predicted superior for that patient

The classifier need not use all the covariates, but variable selection must be determined using only the training set. Variable selection may be based on selecting variables with apparent interactions with treatment, with the cut-off for variable selection determined by cross-validation within the training set for optimal classification

The indication classifier can be a probabilistic classifier
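The end-of-trial analysis can be sketched as follows. This is not the classifier of the original work: it assumes a continuous outcome (higher is better), one-sided z-tests, and a deliberately simple hypothetical indication classifier that picks the covariate with the strongest treatment-by-covariate interaction in the training set T and predicts benefit where the estimated E − C effect is positive. All function names are hypothetical.

```python
import math
import numpy as np


def one_sided_p(y, z):
    """One-sided p-value (E better than C) from a two-sample z-test on means."""
    ye, yc = y[z == 1], y[z == 0]
    se = math.sqrt(ye.var(ddof=1) / len(ye) + yc.var(ddof=1) / len(yc))
    stat = (ye.mean() - yc.mean()) / se
    return 0.5 * math.erfc(stat / math.sqrt(2))


def fit_indication_classifier(y, z, X):
    """Hypothetical classifier: keep the covariate with the largest interaction t-statistic,
    then predict benefit where the estimated E - C effect is positive."""
    best = None
    for j in range(X.shape[1]):
        D = np.column_stack([np.ones_like(y), z, X[:, j], z * X[:, j]])
        coef = np.linalg.lstsq(D, y, rcond=None)[0]
        resid = y - D @ coef
        sigma2 = resid @ resid / (len(y) - D.shape[1])
        se_int = math.sqrt(sigma2 * np.linalg.inv(D.T @ D)[3, 3])
        t_int = abs(coef[3]) / se_int
        if best is None or t_int > best[0]:
            best = (t_int, j, coef[1], coef[3])
    _, j, main, inter = best
    return lambda X_new: main + inter * X_new[:, j] > 0


def adaptive_signature_design(y, z, X, p0=0.01, alpha=0.05, train_frac=1/3, seed=0):
    if one_sided_p(y, z) <= p0:                    # overall test at reduced level p0
        return "effective overall"
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                  # random split into T and V
    n_train = round(train_frac * len(y))
    train, valid = idx[:n_train], idx[n_train:]
    M = fit_indication_classifier(y[train], z[train], X[train])
    pos = valid[M(X[valid])]                       # classifier-positive patients in V
    if min((z[pos] == 1).sum(), (z[pos] == 0).sum()) < 2:
        return "no claim"
    if one_sided_p(y[pos], z[pos]) <= alpha - p0:  # test at 0.05 - p0
        return "effective in classifier-positive patients"
    return "no claim"


# Simulated illustration: E helps only patients with X[:, 2] > 0.5 and harms the rest.
rng = np.random.default_rng(3)
n = 1200
X = rng.normal(size=(n, 5))
z = rng.integers(0, 2, n)
y = rng.normal(size=n) + z * (X[:, 2] - 0.5)
print(adaptive_signature_design(y, z, X))
```

Because the classifier is developed only on T and tested only on V, the test in the classifier-positive subset does not reuse the data that selected it.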


$$\log\left(\frac{\lambda(t,x,z)}{\lambda_0(t)}\right) = \alpha z + \beta x_1 + \gamma z x_2$$

z = 0, 1 treatment indicator

x1 = score summarizing the contribution of multiple variables selected based on their univariate prognostic effect

x2 = score summarizing the contribution of multiple variables selected based on their univariate interaction with treatment

Fit the model to obtain estimates α, β, γ

$$\log\left(\frac{\lambda(t,x,z=1)}{\lambda_0(t)}\right) = \alpha + \beta x_1 + \gamma x_2$$

$$\log\left(\frac{\lambda(t,x,z=0)}{\lambda_0(t)}\right) = 0 + \beta x_1 + 0$$

E preferred for patient if α + γx2 ≤ Δ

Δ based on the minimally clinically significant treatment effect

For each patient in the validation set, compute the score x2 corresponding to the patient's covariate vector x

Classify the patient as indication + if α+γx2 ≤Δ
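The rule follows from the model: the E-vs-C log hazard ratio for a patient is (α + βx1 + γx2) − βx1 = α + γx2, so x1 drops out. A minimal sketch of the rule applied to already-fitted estimates; the estimates and the choice Δ = log(0.8) are hypothetical.

```python
import math
import numpy as np


def indication_positive(alpha_hat, gamma_hat, x2, delta):
    """True where E is preferred: estimated E-vs-C log hazard ratio alpha + gamma*x2 <= delta."""
    log_hr = alpha_hat + gamma_hat * np.asarray(x2, dtype=float)
    return log_hr <= delta


# Hypothetical fitted values; delta = log(0.8) stands in for a minimally
# clinically significant 20% reduction in hazard.
flags = indication_positive(alpha_hat=0.1, gamma_hat=-0.5, x2=[0.0, 1.0, 2.0],
                            delta=math.log(0.8))
print(flags)
```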

Compute Kaplan-Meier curves of treatments for indication + patients in validation set

Compute the log-rank test comparing the Kaplan-Meier curves in the indication + set
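For reference, a compact sketch of the Kaplan-Meier estimator and the unstratified two-group log-rank chi-square statistic, with tied event times handled through the usual hypergeometric variance; in practice an established survival package would be used.

```python
import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier survival at each distinct event time (event=1 death, 0 censored)."""
    time, event = np.asarray(time, dtype=float), np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        s *= 1 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


def logrank_statistic(time, event, group):
    """Log-rank chi-square statistic (1 df) comparing group 1 with group 0."""
    time, event, group = (np.asarray(a) for a in (time, event, group))
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += d1 - n1 * d / n              # observed minus expected in group 1
        if n > 1:
            var += n1 * (n - n1) * d * (n - d) / (n ** 2 * (n - 1))
    return o_minus_e ** 2 / var


time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 1, 1, 1, 1]
group = [1, 1, 1, 0, 0, 0]
print(kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1]))
print(logrank_statistic(time, event, group))
```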

Key Idea

Replace multiple significance testing by development of one indication classifier and obtain unbiased estimates of the properties of that classifier if used on future patients

This approach can also be used to identify the subset of patients who don’t benefit from E in cases where E is superior to C overall

Key Ideas

• Replace multiple significance testing by development of one indication classifier

• Control study-wise type I error for the significance tests of:

• Overall average treatment effect

• Treatment effect in classifier positive patients

• Test of treatment effect in classifier positive patients does not depend on significance of overall test nor on significant interaction

• Obtain unbiased or conservative estimate of the treatment effect of future classifier positive patients

Figure 1: Overall analysis. The value of the log-rank statistic is 2.9 and the corresponding p-value is 0.09. The new treatment thus shows no statistically significant benefit overall at the 0.05 level.

Figure 2: Cross-validated survival curves for patients predicted to benefit from the new treatment. Log-rank statistic = 10.0, permutation p-value = .002

Figure 3: Survival curves for cases predicted not to benefit from the new treatment. The value of the log-rank statistic is 0.54.
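The p-value of 0.09 in Figure 1 follows from referring the log-rank statistic to a chi-square distribution with 1 degree of freedom. A short check; note that Figure 2 reports a permutation p-value because its curves are cross-validated, so the nominal chi-square p-value computed here for 10.0 is shown only for comparison.

```python
import math


def chisq1_p_value(statistic):
    """Upper-tail p-value of a 1-df chi-square statistic, e.g. a two-group log-rank test."""
    # P(chi2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))
    return math.erfc(math.sqrt(statistic / 2))


for stat in (2.9, 10.0, 0.54):
    print(stat, round(chisq1_p_value(stat), 4))
```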

Closing

• Developments in biotechnology and tumor genomics provide important opportunities to develop improved drugs and companion diagnostics

• The established molecular heterogeneity of human cancer requires the use of new approaches to the development and evaluation of therapeutics


• Maximizing opportunities to develop improved treatments and providing useful labeling information creates major challenges for industry and for regulators

• To help achieve these objectives, statisticians must re-examine the standard paradigms and limit their tendency to treat all problems as hypothesis-testing problems

Acknowledgements

• Boris Freidlin

• Wenyu Jiang

• Stella Karuri

• Aboubakar Maitournam

• Shigeyuki Matsui
