Interim Analysis in Clinical Trials
Professor Bikas K Sinha [ISI, Kolkata]
Courtesy: Dr Gajendra Viswakarma
Visiting Scientist
Indian Statistical Institute
Tezpur Centre
e-mail: [email protected]
2
What is a clinical trial?
A test of a new intervention or treatment on people for detecting
-Tolerability
-Safety
-Efficacy
A Clinical trial is defined as a prospective study comparing the effect and value of intervention (s) against a control in human beings.
3
Types of clinical trials
Superiority
Non-inferiority
Equivalence
It can be a Phase I, Phase II or Phase III Trial
4
Diagrammatical Presentation of Clinical Trials
[Diagram: axis of treatment difference running from "Control better" through 0 to "Test better", with regions labelled equivalence, non-inferior and superior.]
5
Clinical Trial Stages
Phase I: Clinical Pharmacology and Toxicity
Objective: To determine a safe drug dose for further studies of therapeutic efficacy of the drug
Design: Dose-escalation to establish a maximum tolerated dose (MTD) for a new drug
Subjects: 1-10 normal volunteers or patients with disease
6
Clinical Trial Stages
Phase II: Initial Clinical Investigation for Treatment Effect
A fairly small-scale investigation
Objective: To get preliminary information on effectiveness and safety of the drug
Design: Often single arm (no control group)
Subjects: 100-500 patients with disease (or depends on Therapeutic Area [TA])
7
Clinical Trial Stages
Phase III: Full-Scale Evaluation of the Treatment (comparative clinical trial): a planned experiment on human subjects. To some people the term “Clinical trial” is synonymous with such a full-scale Phase III trial.
A Phase III trial is the most rigorous and extensive type of scientific clinical investigation of a new treatment.
Objective: To compare efficacy of the new treatment with the standard regimen
Design: Randomized controlled
Subjects: Patients with disease (number depends on the Phase II trial)
8
Clinical Trial Stages
Phase IV: Post-Marketing
After the research program leading to a drug being approved for marketing, there remain substantial inquiries still to be undertaken as regards monitoring for adverse effects and additional large-scale, long-term studies of morbidity and mortality.
Objective: To get more information (long-term side effects)
Design: no control group
Subjects: Patients with disease using the treatment
10
… So What is Different?
Ethics: Experiments involving human subjects bring up new ethical issues
Bias: Experiments on intelligent subjects require new measures of control
We will also study the additional considerations in clinical trials
to address the above requirements.
11
Interim Analysis
Analysis comparing intervention groups at any time before the formal completion of the trial, usually before recruitment is complete.
Often used with "stopping rules" so that a trial can be stopped if participants are being put at risk unnecessarily.
Timing and frequency of interim analyses should be specified in the protocol.
12
Interim Analyses
Interim analysis is a tool to protect the welfare of subjects
By stopping enrollment/treatment as soon as a drug is determined to be harmful
By stopping enrollment as soon as a drug is determined to be highly beneficial
By stopping trials which will yield little additional useful information (or which have negligible chance of demonstrating efficacy if fully enrolled, given results to date)
The associated statistical methods are generally referred to as group sequential methods
13
Flowchart of the Study
[Flowchart: screening (15 days to 4 weeks), then enrolment at Visit 1 into Control, T1 or T2 (test, safe dose determined); Visits 2 to 5 at 4-week intervals during the treatment period, with end of treatment at Visit 5; treatment-free follow-up at Visits 6 and 7, again 4 weeks apart.]
Required sample size of the study is 330 (110 subjects per arm)
14
Disposition Table: Ongoing Study
                              Drug C   Drug T1   Drug T2   Total
Patients screened                                          129
Screening failures                                         23
Patients randomized           36       36        34        106
Study incomplete + ongoing    9+5      8+5       10+3      28+12
Completed Visits 5+           22       23        21        66
15
Mean PASI Change at Visits in Different Treatment Groups
[Figure: line plot of mean PASI (y-axis, 0 to 16) against visit (x-axis, V1 to V5) for Drug A, Drug B and Drug C.]
16
Some Examples of Why a Trial May Be Terminated
Treatments found to be convincingly different
Treatments found to be convincingly not different
Side effects or toxicities are too severe
Data quality is poor
Accrual is slow
Definitive information becomes available from an outside source, making the trial unnecessary or unethical
Scientific question is no longer important
Adherence to treatment is unacceptably low
Resources to perform study are lost or diminished
Study integrity has been undermined by fraud or misconduct
17
Opposing Pressures in Interim Analyses
To Terminate:
minimize size of trial
minimize number of patients on inferior arm
costs and economics
timeliness of results
To Continue:
increase precision
reduce errors
increase power
increase ability to look at subgroups
gather information on secondary endpoints
18
The pitfalls of interim analyses
RCTs [Randomized Clinical Trials] with interim analysis
1. Calculate sample size
2. Carry out the clinical trial
3. Employ statistical test of efficacy at pre-planned stages in the interim until sample size has been reached*
*One treatment declared significantly better than the other if we get a p-value less than 5%.....
19
Statistical Considerations in Interim Analyses
Consider a safety/efficacy study (phase II)
“At this point in time, is there statistical evidence that….”
The treatment will not be as efficacious as we would hope/need it to be?
The treatment is clearly dangerous/unsafe?
The treatment is very efficacious and we should proceed to a comparative trial?
20
Statistical Considerations in Interim Analyses
Consider a comparative study (phase III)
“At this point in time, is there statistical evidence that….”
One arm is clearly more effective than the other?
One arm is clearly dangerous/unsafe?
The two treatments have such similar responses that there is no possibility that we will see a significant difference by the end of the trial?
21
Statistical Considerations in Interim Analyses
We use interim statistical analyses to determine the answers to these questions.
It is a tricky business:
interim analyses involve relatively few data points
inferences can be inexact
we increase the chance of errors
if interim results are conveyed to investigators, a bias may be introduced
in general, we look for strong evidence in one or another direction
22
Example: ECMO trial
Extra-corporeal membrane oxygenation (ECMO) versus standard treatment for newborn infants with persistent pulmonary hypertension.
N = 39 infants enrolled in study
Trial terminated after interim analysis
4/10 deaths in standard therapy arm
0/9 deaths in ECMO arm
p = 0.054 (one-sided)
Questions:
Is this result sufficient evidence on which to change routine practice?
Is the evidence in favor of ECMO very strong?
23
Example: ISIS trial
The Second International Study of Infarct Survival (ISIS-2)
Five-week study of streptokinase versus placebo based on 17,187 patients with myocardial infarction.
Trial continued until
12% death rate in placebo group
9.2% death rate in streptokinase group
p < 0.000001
Issues:
strong evidence in favor of streptokinase was available early on
impact would be greater with better precision on death rate, which would not be possible if the trial stopped early
earlier trials of streptokinase had similar results, yet little impact
24
Statistical Approaches for Interim Analysis
Three main philosophic approaches
Frequentist approach:
  Multiple Looks
  Group Sequential Designs
    Stopping Boundaries
    Alpha Spending Functions
  Two Stage Designs
Likelihood approach
Bayesian approach
All differ in their approaches
Frequentist (Multiple Looks) is most commonly seen (but not necessarily the best!)
25
An Example of “Multiple Looks”
RCT (Randomized Clinical Trial with Trt A vs Trt B): Required Sample Size: 200
TRT A: 100
TRT B: 100
26
An Example of “Multiple Looks”
Four interim looks (50, 100, 150, and 200)
TRT A: 100
TRT B: 100
1st interim look: P = 0.028
27
An Example of “Multiple Looks”
Four interim looks (50, 100, 150, and 200)
TRT A: 100
TRT B: 100
2nd interim look: P = 0.38
28
An Example of “Multiple Looks”
Four interim looks (50, 100, 150, and 200)
TRT A: 100
TRT B: 100
P-values at the four looks: P = 0.028, P = 0.38, P = 0.62, P = 1.00
29
An Example of “Multiple Looks”
Consider planning a comparative trial in which two treatments are being compared for efficacy (response rate).
H0: p2 = p1
H1: p2 > p1
A standard design says that for 80% power and with alpha of 0.05, you need about 100 patients per arm, based on the assumption p2 = 0.50, p1 = 0.30, which results in 0.20 for the difference.
So what happens if we find p < 0.05 before all patients are enrolled?
Why can’t we look at the data a few times in the middle of the trial and conclude that one treatment is better if we see p < 0.05?
30
The plots show simulated data where p1 = 0.40 and p2 = 0.50.
In our trial, designed to detect a difference between 0.30 and 0.50, we would not expect to conclude that there is evidence for a difference.
However, if we look after every 4 patients, we get the scenario where we would stop at 96 patients and conclude that there is a significant difference.
[Figure: two panels plotting the risk ratio and the p-value against the number of patients (0 to 200).]
31
If we look after every 10 patients, we get the scenario where we would not stop until all 200 patients were observed and would conclude that there is not a significant difference (p =0.40)
[Figure: two panels plotting the risk ratio and the p-value against the number of patients (50 to 200).]
32
If we look after every 40 patients, we get the scenario where we would not stop either.
If we wait until the END of the trial (N = 200), then we estimate p1 to be 0.45 and p2 to be 0.52. The p-value for testing that there is a significant difference is 0.40.
[Figure: two panels plotting the risk ratio and the p-value against the number of patients (50 to 200).]
33
Would we have messed up if we looked early on?
Every time we look at the data and consider stopping, we introduce the chance of falsely rejecting the null hypothesis.
In other words, every time we look at the data, we have the chance of a type 1 error.
If we look at the data multiple times, and we use alpha of 0.05 as our criterion for significance, then we have a 5% chance of stopping each time.
Under the true null hypothesis and just 2 looks at the data, we “approximate” the error rates as:
Probability stop at first look: 0.05
Probability stop at second look: 0.95*0.05 = 0.0475
Total probability of stopping is 0.0975
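The approximation above treats the two looks as independent. In practice the second look reuses the first-half data, so the two test statistics are correlated and the overall error is somewhat lower, roughly 8% for two equally spaced looks. The following is a minimal simulation sketch of this, assuming normally distributed test statistics and equal-sized blocks of data; the function name and its parameters are illustrative, not part of the original slides.

```python
import random

def overall_type1_error(n_looks=2, z_crit=1.96, n_sim=200_000, seed=1):
    """Estimate P(reject at any look) under H0 for equally spaced looks."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        running_sum = 0.0
        for k in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)     # independent z for the new block of data
            z_cumulative = running_sum / k ** 0.5  # z based on all data seen so far
            if abs(z_cumulative) > z_crit:
                rejections += 1
                break                              # the trial stops at the first rejection
    return rejections / n_sim

naive = 0.05 + 0.95 * 0.05        # treats the two looks as independent
estimate = overall_type1_error()  # accounts for the shared first-half data
print(f"naive: {naive:.4f}, simulated: {estimate:.4f}")
```

Raising `n_looks` shows how the overall error keeps growing as more unadjusted looks are taken.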
Effect of Sample Size on a True Proportion
p^ +/- 2 sqrt{p^(1-p^)/n} serve as both-sided limits to TRUE p

n \ p^    0.20        0.30        0.40        0.50        0.60
10        0, .45      0, .60      .1, .7      .18, .82    .3, .9
20        .02, .38    .1, .5      .18, .62    .28, .72    .38, .82
30        .05, .35                                        .42, .78
40        .07, .33                                        .45, .75
50        .09, .31                                        .46, .74
100       .12, .28                                        .50, .70
200       .15, .25                                        .53, .67
300       .16, .24                                        .54, .66
Effect of Sample Size on a True Proportion
p^ +/- 2 sqrt{p^(1-p^)/n} serve as both-sided limits for TRUE p

n \ p^    0.20
400       .16, .24
500       .17, .23
1000      .175, .225
1500      .18, .22
2000      .182, .218
3000      .185, .215
4000      .19, .21
5000      .19, .21
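The entries in these tables can be reproduced from the stated formula. The sketch below is a small helper for that purpose; the function name `approx_limits` is an illustrative choice, and the limits are clipped to the interval [0, 1] as the tables appear to do for small n.

```python
from math import sqrt

def approx_limits(p_hat, n):
    """Return p_hat +/- 2*sqrt(p_hat*(1-p_hat)/n), clipped to [0, 1]."""
    half_width = 2 * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Limits for an observed proportion of 0.20 at increasing sample sizes.
for n in (10, 100, 400, 1000, 5000):
    low, high = approx_limits(0.20, n)
    print(f"n = {n:5d}: ({low:.3f}, {high:.3f})")
```

The width of the interval shrinks with the square root of n, which is why interim estimates based on a fraction of the data are so much less precise than the final ones.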
35
Illustrative Examples : Interim Analysis
Example 1. It is desired to carry out an experiment to examine the superiority, or otherwise, of a therapeutic drug over a standard drug, with 5% level and 90% power for detection of a 10% difference in the proportions ‘cured’.
‘C’ : Standard Drug
‘T’ : Therapeutic Drug
H_0 : P_C - P_T = 0
H_1 : P_C ≠ P_T
Size = 0.05, Power = 0.90 for delta = P_T – P_C = 0.10.
It is a both-sided test.
36
Determination of Sample Size for Full Analysis
37
Two-sided test: alpha = 0.05; Z_{alpha/2} = 1.96
Power = 0.90; beta = 0.10, Z_{beta} = 1.282, delta = 0.10
N = 2 (Z_{alpha/2} + Z_{beta})^2 pbar(1-pbar) / delta^2
Assume pbar = 0.35 [suggestive cure rate]
N = 2(1.96 + 1.282)^2 (0.35)(0.65)/(0.10)^2
= 21.021128 x 22.75
= 478.23, rounded up to 480
Conclusion: Each arm involves 480 subjects.
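The sample-size arithmetic above can be checked with a few lines of code. This is a minimal sketch of the same formula, using the standard library's normal quantiles instead of the rounded values 1.96 and 1.282; the function name `n_per_arm_binary` is hypothetical.

```python
from statistics import NormalDist

def n_per_arm_binary(p_bar, delta, alpha=0.05, power=0.90):
    """Per-arm sample size for a both-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.282 for power = 0.90
    return 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2

n_exact = n_per_arm_binary(p_bar=0.35, delta=0.10)
print(round(n_exact, 1))
```

The result is about 478 per arm, in line with the hand calculation; rounding up to 480 also makes the arm size divisible into four equal blocks.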
Full Experiment vs. Interim Analysis
For Full Experiment : Needed 480 subjects in each ‘arm’.
At the end of the entire experiment, suppose we observe :
‘C’ : # cured = 156 out of 480, i.e., 32.5%
‘T’ : # cured = 190 out of 480, i.e., 39.6%
Therefore, p^_C = 0.325 and p^_T = 0.396.
Hence, pbar = [p^_C + p^_T]/2 = 0.3605.
Finally, we compute the value of z given by
38
Full Analysis
Z_obs. = [p^_C – p^_T]/sqrt[pbar(1-pbar)2/N]
= [.325 - .396]/sqrt[.36 x .64 x 2/480]
= -[.071]/sqrt[0.00096]
= -2.29
In absolute value, z_obs. is computed as 2.29 which is more than the ‘critical’ value of z given by 1.96 [for a both-sided test with size 5%]. Hence, we conclude that the Null Hypothesis is ‘not tenable’, given the experimental outputs.
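As a check on the full-analysis computation, the z-statistic can be recomputed from the raw counts. This sketch uses the same formula as the slide (equal arm sizes, pbar as the average of the two observed proportions); the function name is an illustrative assumption.

```python
from math import sqrt

def two_proportion_z(cured_c, cured_t, n_per_arm):
    """z = (p_C - p_T) / sqrt(pbar * (1 - pbar) * 2 / n) for equal arm sizes."""
    p_c = cured_c / n_per_arm
    p_t = cured_t / n_per_arm
    p_bar = (p_c + p_t) / 2
    se = sqrt(p_bar * (1 - p_bar) * 2 / n_per_arm)
    return (p_c - p_t) / se

z_full = two_proportion_z(156, 190, 480)
print(round(z_full, 2))
```

The value comes out at about -2.29, which exceeds 1.96 in absolute value.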
39
Interim Analysis : 2 ‘Looks’
First Look : use 50% of the data
2nd Look : at the end, if continued after the 1st.
Q. What is the size of the test at the 1st look? Also, what is the size at the 2nd look so that on the whole the size is 5%?
Ans. If we use 5% for the size at each of the 1st and 2nd looks, then the over-all size becomes 8%. Hence both can NOT be taken at 5%. Use a nominal size well below 5% at the 1st look, and choose the nominal size at the 2nd look so that the over-all size is 5%.
40
Interim Analysis : 2 Looks
Defining Equation :
alpha = P[ Z_I > z* ] + P[ Z_I < z*, Z_{I,II} > z** ]
(for the both-sided test, read Z as |Z|)
where Z_I and Z_II are each based on 50% of the data, in two identical and independent segments, so that their distributions are identical. Further, Z_{I,II} = [Z_I + Z_II]/sqrt(2) is based on the combined evidence of I & II, and hence Z_I and Z_{I,II} are dependent.
Choices of z* and z** : intricate formulae.
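The defining equation has no simple closed form, but for given cut-offs it can be evaluated numerically. The sketch below does this for the both-sided case (using |Z|) by conditioning on the first-look statistic and integrating with the trapezoidal rule. The function name, the step count and the use of the trapezoidal rule are illustrative choices, not the method used to derive published boundaries.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_look_size(z1_cut, z2_cut, steps=4000):
    """Overall both-sided size for 2 equally spaced looks with cut-offs z1_cut, z2_cut."""
    # Probability of rejecting at the first look.
    size = 2 * (1 - STD_NORMAL.cdf(z1_cut))
    # Continue past the first look (|Z_I| < z1_cut), then reject at the second:
    # given Z_I = x, Z_{I,II} = (x + Z_II)/sqrt(2) with Z_II ~ N(0, 1).
    bound = sqrt(2) * z2_cut
    h = 2 * z1_cut / steps
    total = 0.0
    for i in range(steps + 1):
        x = -z1_cut + i * h
        reject_second = (1 - STD_NORMAL.cdf(bound - x)) + STD_NORMAL.cdf(-bound - x)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal rule end points
        total += weight * STD_NORMAL.pdf(x) * reject_second
    return size + h * total

print(round(two_look_size(1.96, 1.96), 3))
```

With 1.96 at both looks the overall size is about 0.083, the "8%" quoted for two unadjusted looks. Other pairs (z*, z**) can be passed in to see how far a proposed rule is from an overall size of 5%.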
41
Interim Analysis : 2 Looks
Z-computation: z_I obs. is to be based on the 50% of data up to the 1st look for each of ‘C’ and ‘T’.
Data : C (90/240) & T (120/240) & n = 240.
p^_C = 90/240 = 0.375; p^_T = 120/240 = 0.50
pbar = (0.375 + 0.50)/2 = 0.4375.
z_I obs. = [p^_C – p^_T]/sqrt[pbar(1-pbar)2/n]
= -[0.125]/sqrt{.4375 x .5625 x 2/240}
= -(0.125)/sqrt{0.002050}
= -2.76 implies ???
42
Interim Analysis : 2 Looks
43
Suggested cut-off points : Adopted for 2 Looks

z_c     Haybittle-Peto   Pocock   O’Brien-Fleming
z*      3.0              2.46     3.5
z**     2.0              2.46     2.0

z_I obs. in absolute value = 2.76
Conclusion? Reject H_0 : suggested by Pocock’s Rule. Continue : suggested by the other two.
Finally, z = -2.29 suggests acceptance of H_0 only by Pocock’s rule.
Interim Analysis : 4 Looks
Cut-off points : Suggested Rules

z_c     Haybittle-Peto   Pocock   O’Brien-Fleming
z*      3.0              2.42     4.00
z**     3.0              2.42     2.83
z***    3.0              2.42     2.32
z****   2.0              2.42     2.00

* : 1st look; ** : 2nd look; *** : 3rd look and **** : last [4th] look
44
Interim Analysis : 4 Looks
Details of data sets :
C : 48/120; 42/120; 30/120; 36/120 … Total 156/480
T : 54/120; 66/120; 32/120; 38/120 … Total 190/480
Progressive proportions for ‘C’ :
48/120 = 0.40; (48+42)/240 = 0.375; (48+42+30)/360 = 0.333; 156/480 = 0.325
Progressive proportions for ‘T’ :
54/120 = 0.45; (54+66)/240 = 0.50; (54+66+32)/360 = 0.422; 190/480 = 0.396
Interim Analysis : 4 Looks
Progressive computations of pbar
1st Look : pbar = (0.40 + 0.45)/2 = 0.425
2nd Look : pbar = (0.375 + 0.50)/2 = 0.4375
3rd Look : pbar = (0.333 + 0.422)/2 = 0.3775
4th Look : pbar = (0.325 + 0.396)/2 = 0.3605
46
Interim Analysis : 4 Looks
Progressive Computations of z-statistic
Generic Formula : z-obs. for ‘Look # i’ is the ratio of
(a) [p^_C(i) – p^_T(i)] for the i-th Look
(b) sqrt[pbar(i)(1-pbar(i))2/n(i)]
where pbar(i) corresponds to Look # i and ‘n(i)’ is the size of each arm at Look # i, for each i = 1, 2, 3, 4.
Note : n(1)=120; n(2)=240; n(3)=360, n(4)=480
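The generic formula can be applied to all four looks in one pass. The following sketch accumulates the cured counts block by block (the counts are those listed in the details of the data sets) and evaluates the z-statistic at each look; the function and variable names are illustrative.

```python
from math import sqrt

# Cured counts per block of 120 subjects per arm, as listed in the data details.
cured_c = [48, 42, 30, 36]
cured_t = [54, 66, 32, 38]
block_size = 120

def progressive_z(cured_c, cured_t, block_size):
    """z-statistic at each look, using all data accumulated up to that look."""
    z_values = []
    total_c = total_t = 0
    for look, (c, t) in enumerate(zip(cured_c, cured_t), start=1):
        total_c += c
        total_t += t
        n = look * block_size                # n(i): subjects per arm so far
        p_c, p_t = total_c / n, total_t / n  # progressive proportions
        p_bar = (p_c + p_t) / 2
        z_values.append((p_c - p_t) / sqrt(p_bar * (1 - p_bar) * 2 / n))
    return z_values

for look, z in enumerate(progressive_z(cured_c, cured_t, block_size), start=1):
    print(f"Look {look}: z = {z:.2f}")
```

Each look reuses all earlier data, which is exactly why the successive z-values are dependent and the cut-offs must be adjusted.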
Interim Analysis : 1st Look
z_(Look I) obs. = [p^_C – p^_T]/sqrt[pbar(1-pbar)2/n*]
= [0.40 - 0.45]/sqrt{.425 x .575 x 2/120}
= -(0.05)/sqrt{0.004073}
= -0.7835
Conclusion : All Rules are suggestive of Continuation to 2nd Look
48
Interim Analysis : 2nd Look
z_(Look II) obs. = [p^_C – p^_T]/sqrt[pbar(1-pbar)2/n**]
= [0.375 - 0.50]/sqrt{.4375 x .5625 x 2/240}
= -(0.125)/sqrt{0.002050}
= -2.76
Conclusion : Reject H_0 by Pocock’s Rule. However, continue to 3rd Look according to the other two rules.
49
Interim Analysis : 3rd Look and Last Look
z_(Look III) obs. = [p^_C – p^_T]/sqrt[pbar(1-pbar)2/n***]
with pbar = (0.333 + 0.422)/2 = 0.3775
= [0.333 - 0.422]/sqrt{.3775 x .6225 x 2/360}
= -(0.089)/sqrt{0.001306}
= -2.46
Conclusion : Reject H_0 by Pocock & OBF Rules but Continue by H-P Rule
Last Look : z_obs. = -2.29
Accept H_0 by Pocock’s Rule only
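The look-by-look decisions can be summarized by asking, for each rule, at which look |z| first exceeds its cut-off. The sketch below uses the 4-look cut-offs from the table of suggested rules and the z-values of the four looks rounded to two decimals (the third-look value is taken as -2.46, which is what the generic formula gives with pbar = 0.3775). The dictionary and function names are hypothetical.

```python
# Cut-off points for 4 looks, copied from the table of suggested rules.
CUTOFFS = {
    "Haybittle-Peto": [3.0, 3.0, 3.0, 2.0],
    "Pocock": [2.42, 2.42, 2.42, 2.42],
    "O'Brien-Fleming": [4.00, 2.83, 2.32, 2.00],
}

def first_rejection(z_values, cutoffs):
    """Return the 1-based look at which |z| first exceeds its cut-off, else None."""
    for look, (z, cut) in enumerate(zip(z_values, cutoffs), start=1):
        if abs(z) > cut:
            return look
    return None

z_by_look = [-0.78, -2.76, -2.46, -2.29]
stopping_look = {rule: first_rejection(z_by_look, cuts) for rule, cuts in CUTOFFS.items()}
print(stopping_look)
```

Under these inputs Pocock's rule stops at the 2nd look, O'Brien-Fleming at the 3rd, and Haybittle-Peto only at the final look, matching the conclusions drawn look by look. Note that once a rule has stopped the trial, later looks are never reached under that rule.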
50
Data Analysis : Interpretations
Relative Merits of Decision Rules :
Pocock’s Rule : Maintains uniformity in critical values, so apparently ‘conservative’ at the start, and slowly turns ‘liberal’!
Other Rules : Liberal at the start and conservative at the end.
All Rules have to maintain the ‘averaging principle’ to meet alpha at the end.
No Rule can be strict/liberal all through the Looks.
51
Interim Analysis : Example 2
Continuous data : Testing for equality of mean effects of two treatments : ‘C’ & ‘T’.
As before, we have Null and Alt. Hypotheses, a specified value of DELTA = Mean of T – Mean of C, and a specified power, say 90%, to detect this.
Taking size equal to 5%, we solve for the sample size in each arm.
This is routine computation and we take sample size N = 525 in each arm.
Full Analysis : Sample Size Computation
Assume normal distribution with sigma = 5.
Two-sided test: alpha = 0.05; Z_{alpha/2} = 1.96
Power = 0.90; beta = 0.10, Z_{beta} = 1.282
delta = 0.20 times sigma = 20% of sigma = 1.0
N = 2 (Z_{alpha/2} + Z_{beta})^2 x sigma^2 / delta^2
= 2(1.96 + 1.282)^2 / 0.04 = 525 [approx.]
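For continuous data the same calculation depends only on the ratio delta/sigma. A minimal sketch, again with standard-library normal quantiles and a hypothetical function name:

```python
from statistics import NormalDist

def n_per_arm_continuous(sigma, delta, alpha=0.05, power=0.90):
    """Per-arm sample size for a both-sided comparison of two means, known sigma."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

n_total = n_per_arm_continuous(sigma=5.0, delta=1.0)
per_look = round(n_total) // 5  # observations per arm added at each of 5 equal looks
print(round(n_total), per_look)
```

With sigma = 5 and delta = 1 this gives about 525 per arm, which splits evenly into five blocks of 105.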
We can think of 5 Looks altogether, at equal steps, each with approx. 105 observations per arm.
Interim Analysis…Example contd.
Details of data sets : (mean, sample size)
C : (30.5, 105); (31.8, 105); (29.7, 105); (30.2, 105); (31.3, 105)
T : (31.7, 105); (32.0, 105); (30.8, 105); (33.7, 105); (32.8, 105)
Progressive sample means for ‘C’ :
30.5, 31.15, 30.67, 30.55, 30.70
Progressive sample means for ‘T’ :
31.7, 31.85, 31.50, 32.05, 32.20
Interim Analysis : Example contd.
Progressive Computations of z-statistic
Generic Formula : z-obs. for ‘Look # i’ is the ratio of
(a) [mean_C(i) – mean_T(i)] for the i-th Look
(b) sigma x sqrt[2/n(i)]
where mean_C(i) and mean_T(i) are the progressive sample means of the two arms up to Look # i, and ‘n(i)’ is the size of each arm at Look # i, for each i = 1, 2, 3, 4, 5.
Note : n(1)=105; n(2)=210; n(3)=315, n(4)=420 and n(5) = 525.
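The generic formula for continuous data can be sketched in the same way as for proportions. The code below computes the progressive means directly from the block means listed in the details of the data sets and then forms the z-statistic at each look; sigma = 5 is treated as known, and all names are illustrative.

```python
from math import sqrt

SIGMA = 5.0   # assumed known standard deviation
BLOCK = 105   # observations per arm added at each look
means_c = [30.5, 31.8, 29.7, 30.2, 31.3]  # block means for 'C'
means_t = [31.7, 32.0, 30.8, 33.7, 32.8]  # block means for 'T'

def progressive_z(means_c, means_t, block, sigma):
    """z-statistic at each look from the progressive (cumulative) sample means."""
    z_values = []
    sum_c = sum_t = 0.0
    for look, (m_c, m_t) in enumerate(zip(means_c, means_t), start=1):
        sum_c += m_c
        sum_t += m_t
        n = look * block                    # n(i): subjects per arm so far
        diff = sum_c / look - sum_t / look  # equal block sizes, so a plain average
        z_values.append(diff / (sigma * sqrt(2 / n)))
    return z_values

for look, z in enumerate(progressive_z(means_c, means_t, BLOCK, SIGMA), start=1):
    print(f"Look {look}: z = {z:.2f}")
```

Because the blocks are of equal size, the progressive mean at each look is simply the average of the block means seen so far.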
Interim Analysis : Example contd.
Cut-off points : Suggested Rules

z_c      Haybittle-Peto   Pocock   O’Brien-Fleming
z*       3.0              2.60     4.56
z**      3.0              2.60     3.23
z***     3.0              2.60     2.63
z****    3.0              2.60     2.28
z*****   2.0              2.60     2.00

* : 1st look; ** : 2nd look; *** : 3rd look; **** : 4th look & ***** : last [5th] look
Interim Analysis…Example contd.
z_(Look I) obs. = [mean_C – mean_T]/(sigma x sqrt[2/n*])
= -[1.2]/(5 x sqrt{2/105})
= -1.74
Conclusion : Continue to 2nd Look
Interim Analysis : Example contd.
z_(Look II) obs. = [mean_C – mean_T]/(sigma x sqrt[2/n**])
= -[0.7]/(5 x sqrt{2/210})
= -1.43
Conclusion : Continue to 3rd Look
Interim Analysis : Example contd.
z_(Look III) obs. = [mean_C – mean_T]/(sigma x sqrt[2/n***])
with mean_C = (30.5 + 31.8 + 29.7)/3 = 30.67 and mean_T = (31.7 + 32.0 + 30.8)/3 = 31.50
= -[0.833]/(5 x sqrt{2/315})
= -2.09
Conclusion : Continue to 4th Look
Interim Analysis : Example contd.
z_(Look IV) obs. = [mean_C – mean_T]/(sigma x sqrt[2/n****])
with mean_C = 30.55 and mean_T = (31.7 + 32.0 + 30.8 + 33.7)/4 = 32.05
= -[1.50]/(5 x sqrt{2/420})
= -4.35
Conclusion : Stop and Reject H_0. Strong evidence against H_0, and yet 105 observations per arm are left to be studied. What if the experiment was continued till the end anyway?