Overview of Monitoring Clinical Trials Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington August 5, 2006


Topics

• Goals / Disclaimers
• Clinical trial setting / Study design
• Stopping rules
• Sequential inference
• Adaptive designs

Goals / Disclaimers

Goals

Demystification

• All we are doing is statistics
  – Planning a study
  – Gathering data
  – Analyzing it

Sequential Studies

• All we are doing is statistics
  – Planning a study
    • Added dimension of considering time required
  – Gathering data
    • Sequential sampling allows early termination
  – Analyzing it
    • The same old inferential techniques
    • The same old statistics
    • But new sampling distribution

Distinctions without Differences

• Sequential sampling plans
  – Group sequential stopping rules
  – Error spending functions
  – Conditional / predictive power
• Statistical treatment of hypotheses
  – Superiority / Inferiority / Futility
  – Two-sided tests / bioequivalence

Perpetual Motion Machines

• Discerning the hyperbole in much of the recent statistical literature
  – “Self-designing” clinical trials
  – Other adaptive clinical trial designs
    • (But in my criticisms, I do use a more restricted definition than some)

Key Issue

You better think (think) about what you’re trying to do…
-Aretha Franklin

Evaluation of Trial Design

• When planning an expensive (in time or money) clinical trial, there is no substitute for planning
  – Will your clinical trial satisfy (as much as possible) the various collaborating disciplines with respect to the primary endpoint?
• The major part of this question can be answered at the beginning of the clinical trial

Goals / Disclaimers

Disclaimers

Conflict of Interest

• Commercially available software for sequential clinical trials
  – S+SeqTrial (Emerson)
  – PEST (Whitehead)
  – EaSt (Mehta)
  – ?SAS

Personal Defects

• Physical
• Personality
  – In the pejorative sense of the words
• Then:
  – A bent twig
• Now:
  – Old and male
  – University professor

A More Descriptive Title

• All my talks
  The Use of Statistics to Answer Scientific Questions
  (Confessions of a Former Statistician)

A More Descriptive Title

• All my talks
  The Use of Statistics to Answer Scientific Questions
  (Confessions of a Former Statistician)
• Group sequential talks
  The Use of Statistics to Answer Scientific Questions Ethically and Efficiently
  (Confessions of a Former Statistician)

Science vs Statistics

• Recognizing the difference between
  – The parameter space
    • What is the true scientific relationship?
  – The sample space
    • What data did you gather?

Science vs Statistics

• Recognizing the difference between
  – The true scientific relationship
    • Summary measures of the effect
      – Means, medians, geometric means, proportions…
  – The precision with which you know the true effect
    • Power
    • P values, posterior probabilities

Reporting Inference

• At the end of the study analyze the data
• Report three measures (four numbers)
  – Point estimate
  – Interval estimate
  – Quantification of confidence / belief in hypotheses

Reporting Frequentist Inference

• Three measures (four numbers)
  – Consider whether the observed data might reasonably be expected to be obtained under particular hypotheses
    • Point estimate: minimal bias? MSE?
    • Confidence interval: all hypotheses for which the data might reasonably be observed
    • P value: probability such extreme data would have been obtained under the null hypothesis
      – Binary decision: Reject or do not reject the null according to whether the P value is low
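
As a rough illustration of the three frequentist measures (four numbers), the sketch below assumes an approximately normal sampling distribution for the estimate. The function name `frequentist_report` and the inputs (an estimate of 5.0 with standard error 2.0) are hypothetical and are not taken from the talk.

```python
from statistics import NormalDist


def frequentist_report(estimate, std_error, null_value=0.0, level=0.95):
    """Point estimate, confidence interval, and two-sided P value.

    Assumes the estimate is approximately normally distributed
    (an assumption made for this sketch only).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.96 for a 95% interval
    ci = (estimate - crit * std_error, estimate + crit * std_error)
    z_stat = (estimate - null_value) / std_error
    p_value = 2 * (1 - z.cdf(abs(z_stat)))  # probability of data this extreme under the null
    return estimate, ci, p_value


# Hypothetical inputs, chosen only for illustration.
est, (lower, upper), p = frequentist_report(estimate=5.0, std_error=2.0)
reject_null = p < 0.05  # binary decision at the same level as the interval
print(f"estimate={est:.2f}  95% CI=({lower:.2f}, {upper:.2f})  P={p:.4f}  reject={reject_null}")
```

The binary decision is derived from the P value, so it adds no number beyond the four already reported.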

Reporting Bayesian Inference

• Three measures (four numbers)
  – Consider the probability distribution of the parameter conditional on the observed data
    • Point estimate: Posterior mean, median, mode
    • Credible interval: The “central” 95% of the posterior distribution
    • Posterior probability: probability of a particular hypothesis conditional on the data
      – Binary decision: Reject or do not reject the null according to whether the posterior probability is low
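
The Bayesian counterparts can be sketched in the same way. This example assumes a normal prior combined with an approximately normal estimate (a conjugate model chosen here for simplicity, not one prescribed by the talk). The function name `bayesian_report`, the prior, and the data values are all hypothetical.

```python
from statistics import NormalDist


def bayesian_report(estimate, std_error, prior_mean=0.0, prior_sd=10.0, level=0.95):
    """Posterior mean, central credible interval, and P(theta <= 0 | data).

    Assumes a normal prior and a normal likelihood with known standard error,
    so the posterior is normal and its mean, median, and mode coincide.
    """
    prior_precision = 1 / prior_sd**2
    data_precision = 1 / std_error**2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    posterior = NormalDist(post_mean, post_var**0.5)
    tail = (1 - level) / 2
    interval = (posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail))
    prob_no_benefit = posterior.cdf(0.0)  # posterior probability that theta <= 0
    return post_mean, interval, prob_no_benefit


# Hypothetical inputs, chosen only for illustration.
post_mean, (cr_lower, cr_upper), prob_no_benefit = bayesian_report(5.0, 2.0)
print(f"posterior mean={post_mean:.3f}  95% CrI=({cr_lower:.3f}, {cr_upper:.3f})  "
      f"P(theta<=0 | data)={prob_no_benefit:.4f}")
```

With a diffuse prior such as this one, the posterior summaries sit close to the frequentist estimate and interval; a more informative prior would pull them toward the prior mean.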

Parallels Between Tests, CIs

• If the null hypothesis is not in the CI, reject the null
  • (Using same level of confidence)
• Relative advantages
  – Test only requires sampling distn under null
  – CI requires sampling distn under alternatives
  – CI provides interpretation when null is not rejected

Scientific Information

– “Rejection” uses a single level of significance
  • Different settings might demand different criteria
– P value communicates statistical evidence, not scientific importance
– Only confidence interval allows you to interpret failure to reject the null:
  • Distinguish between
    – Inadequate precision (sample size)
    – Strong evidence for null

Hypothetical Example

• Clinical trials of treatments for hypertension
  – Screening trials for four candidate drugs
    • Measure of treatment effect is the difference in average SBP at the end of six months treatment
    • Drugs may differ in
      – Treatment effect (goal is to find best)
      – Variability of blood pressure
    • Clinical trials may differ in conditions
      – Sample size, etc.

Reporting P values

Study   P value
A       0.1974
B       0.1974
C       0.0099
D       0.0099

Point Estimates

Study   SBP Diff
A       27.16
B       0.27
C       27.16
D       0.27

Point Estimates

Study   SBP Diff   P value
A       27.16      0.1974
B       0.27       0.1974
C       27.16      0.0099
D       0.27       0.0099

Confidence Intervals

Study   SBP Diff   95% CI           P value
A       27.16      -14.14, 68.46    0.1974
B       0.27       -0.14, 0.68      0.1974
C       27.16      6.51, 47.81      0.0099
D       0.27       0.06, 0.47       0.0099
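
The table's columns are mutually consistent, which can be checked with a small sketch. It assumes the intervals are symmetric normal-theory intervals (estimate ± 1.96 standard errors) and the P values are two-sided; neither is stated on the slide. The helper name `p_from_ci` is hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96

# (point estimate, CI lower, CI upper, reported P value), copied from the table
studies = {
    "A": (27.16, -14.14, 68.46, 0.1974),
    "B": (0.27, -0.14, 0.68, 0.1974),
    "C": (27.16, 6.51, 47.81, 0.0099),
    "D": (0.27, 0.06, 0.47, 0.0099),
}


def p_from_ci(estimate, lower, upper):
    """Back out the standard error from a 95% CI and return the two-sided P value."""
    std_error = (upper - lower) / (2 * Z975)
    z = estimate / std_error
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


recomputed = {name: p_from_ci(est, lo, hi) for name, (est, lo, hi, _) in studies.items()}
for name, (est, lo, hi, reported) in studies.items():
    print(f"Study {name}: reported P={reported:.4f}  recomputed P={recomputed[name]:.4f}")
```

Small discrepancies are expected because the table entries are rounded to two decimals.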

Interpreting Nonsignificance

• Studies A and B are both “nonsignificant”
  – Only study B ruled out clinically important differences
  – The results of study A might reasonably have been obtained if the treatment truly lowered SBP by as much as 68 mm Hg

Interpreting Significance

• Studies C and D are both statistically significant results
  – Only study C demonstrated clinically important differences
  – The results of study D are only frequently obtained if the treatment truly lowered SBP by 0.47 mm Hg or less

Bottom Line

• If ink is not in short supply, there is no reason not to give point estimates, CI, and P value
• If ink is in short supply, the confidence interval provides most information
  – (but sometimes a confidence interval cannot be easily obtained, because the sampling distribution is unknown under the null)

But: Impact of “Three over n”

• The sample size is also important
  – The pure statistical fantasy
    • The P value and CI account for the sample size
  – The scientific reality
    • We need to be able to judge what proportion of the population might have been missed in our sample
      – There might be “outliers” in the population
      – If they are not in our sample, we will not have correctly estimated the variability of our estimates
    • The “Three over n” rule provides some guidance
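
The rule can be illustrated with a short sketch. If a subgroup makes up a proportion p of the population, the chance that a sample of n independent observations contains none of its members is (1 - p)^n. Setting that chance to 0.05 gives an upper 95% bound on p of roughly 3/n, because -ln(0.05) is about 3. The function names are hypothetical, and n = 20 and n = 80 are used purely as illustrative sample sizes.

```python
def missed_proportion_exact(n, confidence=0.95):
    """Largest proportion p for which (1 - p)**n is still at least 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)


def missed_proportion_three_over_n(n):
    """The "three over n" approximation to the same upper bound."""
    return 3 / n


for n in (20, 80):
    exact = missed_proportion_exact(n)
    approx = missed_proportion_three_over_n(n)
    print(f"n={n}: exact bound={exact:.4f}  three over n={approx:.4f}")
```

The approximation is slightly conservative and gets closer to the exact bound as n grows.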

Full Report of Analysis

Study   n    SBP Diff   95% CI           P value
A       20   27.16      -14.14, 68.46    0.1974
B       20   0.27       -0.14, 0.68      0.1974
C       80   27.16      6.51, 47.81      0.0099
D       80   0.27       0.06, 0.47       0.0099

Interpreting a “Negative Study”

• This then highlights issues related to the interpretation of a study in which no statistically significant difference between groups was found
  – We have to consider the “differential diagnosis” of possible situations in which we might observe nonsignificance

General approach

• Refined scientific question
  – We ask whether the distribution of some response variable differs across groups
    • E.g., looking for an association between smoking and blood pressure by comparing distribution of SBP between smokers and nonsmokers
  – We base our decisions on a scientifically appropriate summary measure $\theta$
    • E.g., difference of means, ratio of medians, …

Interpreting a “Negative Study”

• Possible explanations for no statistically significant difference in the summary measure $\theta$
  • There is no true difference in the distribution of response across groups
  • There is a difference in the distribution of response across groups, but the value of $\theta$ is the same for both groups
    – (i.e., the distributions differ in some other way)
  • There is a difference in the value of $\theta$ between the groups, but our study was not precise enough
    – A “type II error” from low “statistical power”

Interpreting a “Positive Study”

• Analogous interpretations when we do find a statistically significant difference in the summary measure $\theta$
  • There is a true difference in the value of $\theta$
  • There is no true difference in $\theta$, but we were unlucky and observed spuriously high or low results
    – Random chance leading to a “type I error”
      » The p value tells us how unlucky we would have had to have been
    – (Used a statistic that allows other differences in the distn to be misinterpreted as a difference in $\theta$
      » E.g., different variances causing significant t test)

Bottom Line

• I place greatest emphasis on estimation rather than hypothesis testing
• When doing testing, I take more of a decision theoretic view
  – I argue this is more in keeping with the scientific method
    • (Ask me about The Scientist Game)
• All these principles carry over to sequential testing

Group Sequential Methods

• Scientific measures
  – Decision theoretic
    • Statement of hypotheses
    • Criteria for early stopping based on point and interval estimates
  – Proper inference for estimation
• Purely statistical measures
  – Focus on type I errors
  – Error spending functions
  – Conditional / predictive power

Clinical Trial Setting

Clinical Trials

• Experimentation in human volunteers
  – Investigates a new treatment/preventive agent
    • Safety:
      » Are there adverse effects that clearly outweigh any potential benefit?
    • Efficacy:
      » Can the treatment alter the disease process in a beneficial way?
    • Effectiveness:
      » Would adoption of the treatment as a standard affect morbidity / mortality in the population?

Clinical Trial Design

• Finding an approach that best addresses the often competing goals: Science, Ethics, Efficiency
  – Basic scientists: focus on mechanisms
  – Clinical scientists: focus on overall patient health
  – Ethical: focus on patients on trial, future patients
  – Economic: focus on profits and/or costs
  – Governmental: focus on validity of marketing claims
  – Statistical: focus on questions answered precisely
  – Operational: focus on feasibility of mounting trial

Statistical Planning

• Satisfy collaborators as much as possible
  – Discriminate between relevant scientific hypotheses
    • Scientific and statistical credibility
  – Protect economic interests of sponsor
    • Efficient designs
    • Economically important estimates
  – Protect interests of patients on trial
    • Stop if unsafe or unethical
    • Stop when credible decision can be made
  – Promote rapid discovery of new beneficial treatments

Refine Scientific Hypotheses

– Target population
  • Inclusion, exclusion, important subgroups
– Intervention
  • Dose, administration (intention to treat)
– Measurement of outcome(s)
  • Efficacy/effectiveness, toxicity
– Statistical hypotheses in terms of some summary measure of outcome distribution
  • Mean, geometric mean, median, odds, hazard, etc.
– Criteria for statistical credibility
  • Frequentist (type I, II errors) or Bayesian

Statistics to Address Variability

• At the end of the study:
  – Frequentist and/or Bayesian data analysis to assess the credibility of clinical trial results
    • Estimate of the treatment effect
      – Single best estimate
      – Precision of estimates
    • Decision for or against hypotheses
      – Binary decision
      – Quantification of strength of evidence

Statistical Sampling Plan

• Ethical and efficiency concerns are addressed through sequential sampling
  – During the conduct of the study, data are analyzed at periodic intervals and reviewed by the DMC
  – Using interim estimates of treatment effect
    • Decide whether to continue the trial
    • If continuing, decide on any modifications to
      – scientific / statistical hypotheses and/or
      – sampling scheme

Sample Size Determination

• Based on sampling plan, statistical analysis plan, and estimates of variability, compute
  – Sample size that discriminates hypotheses with desired power, or
  – Hypothesis that is discriminated from null with desired power when sample size is as specified, or
  – Power to detect the specific alternative when sample size is as specified

Sample Size Computation

Standardized level $\alpha$ test ($n = 1$): $\delta_{\alpha\beta}$ detected with power $\beta$

Level of significance $\alpha$ when $\theta = \theta_0$

Design alternative $\theta = \theta_1$

Variability $V$ within 1 sampling unit

Required sampling units:

$$n = \frac{\delta_{\alpha\beta}^2 \, V}{(\theta_1 - \theta_0)^2}$$

(Fixed sample test: $\delta_{\alpha\beta} = z_{1-\alpha/2} + z_{\beta}$)
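The sample size relation on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relation is read as n = δ²V / (θ1 − θ0)² with δ = z(1−α/2) + z(β), where β denotes the power, and the function and argument names are hypothetical rather than taken from the slides.

```python
from statistics import NormalDist


def fixed_sample_size(theta0, theta1, V, alpha=0.05, power=0.90):
    """Sampling units needed by a fixed sample two-sided level-alpha test.

    Assumed reading of the slide: n = delta^2 * V / (theta1 - theta0)^2,
    with delta = z_{1 - alpha/2} + z_{power}. All names are illustrative.
    """
    z = NormalDist().inv_cdf              # standard normal quantile function
    delta = z(1 - alpha / 2) + z(power)   # standardized detectable alternative
    return delta ** 2 * V / (theta1 - theta0) ** 2


# Example: unit variance, alternative one standard deviation from the null
n = fixed_sample_size(theta0=0.0, theta1=1.0, V=1.0)
print(round(n, 2))
```

Halving the distance between the null and the design alternative quadruples the required number of sampling units, which is why the choice of design alternative dominates the calculation.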

When Sample Size Constrained

• Often (usually?) logistical constraints impose a maximal sample size
  – Compute power to detect specified alternative

$$\text{Find } \beta \text{ such that } \delta_{\alpha\beta} = \frac{\theta_1 - \theta_0}{\sqrt{V / n}}$$

  – Compute alternative detected with high power

$$\text{Find } \theta_1 \text{ such that } \theta_1 = \theta_0 + \delta_{\alpha\beta} \sqrt{\frac{V}{n}}$$
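When the sample size is fixed by logistics, the same relation can be rearranged either way. The sketch below assumes the fixed sample form δ = z(1−α/2) + z(β) and ignores the negligible probability of rejecting in the wrong tail; the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def detectable_alternative(theta0, n, V, alpha=0.05, power=0.90):
    """Alternative theta1 detected with the given power when n is fixed."""
    delta = _STD.inv_cdf(1 - alpha / 2) + _STD.inv_cdf(power)
    return theta0 + delta * sqrt(V / n)


def power_at(theta0, theta1, n, V, alpha=0.05):
    """Power to detect theta1 when n is fixed (upper tail only)."""
    delta = (theta1 - theta0) * sqrt(n / V)   # standardized alternative
    return _STD.cdf(delta - _STD.inv_cdf(1 - alpha / 2))


theta1 = detectable_alternative(theta0=0.0, n=100, V=1.0)
print(round(theta1, 4), round(power_at(0.0, theta1, n=100, V=1.0), 4))
```

The two functions are inverses of each other, so feeding the detectable alternative back into `power_at` recovers the requested power.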

Stopping Rules

Monitoring a Trial

Need for Monitoring a Trial

• Ethical concerns
  – Patients already on trial
    • Avoid harm; maintain informed consent
  – Patients not on trial
    • Facilitate rapid introduction of treatments
• Efficiency concerns
  – Minimize costs
    • Number of patients accrued, followed
    • Calendar time

Working Example

• Fixed sample two-sided tests
  – Test of a two-sided alternative ($\theta_+ > \theta_0 > \theta_-$)
    • Upper Alternative: $H_+: \theta \ge \theta_+$ (superiority)
    • Null: $H_0: \theta = \theta_0$ (equivalence)
    • Lower Alternative: $H_-: \theta \le \theta_-$ (inferiority)
  – Decisions:
    • Reject $H_0$, $H_-$ (for $H_+$) $\iff T \ge c_U$
    • Reject $H_+$, $H_-$ (for $H_0$) $\iff c_L \le T \le c_U$
    • Reject $H_+$, $H_0$ (for $H_-$) $\iff T \le c_L$

Sample Path for a Statistic

[Figure]

Fixed Sample Methods Wrong

• Simulated trials under null stop too often

Simulated Trials (Pocock)

• Three equally spaced level .05 analyses

Pattern of        Proportion significant
significance      1st       2nd       3rd       Ever
1st only          .03046    -         -         .03046
1st, 2nd          .00807    .00807    -         .00807
1st, 3rd          .00317    -         .00317    .00317
1st, 2nd, 3rd     .00868    .00868    .00868    .00868
2nd only          -         .01921    -         .01921
2nd, 3rd          -         .01426    .01426    .01426
3rd only          -         -         .02445    .02445
Any pattern       .05038    .05022    .05056    .10830
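A table of this kind can be approximated by simulation. The sketch below is not the simulation behind the slide; it assumes normally distributed data under the null, three equally sized groups, and a two-sided fixed sample critical value of 1.96 at every analysis. The function name, seed, and number of trials are arbitrary choices.

```python
import random
from math import sqrt


def simulate_repeated_tests(n_trials=20000, n_looks=3, crit=1.96, seed=2006):
    """Estimate how often naive level .05 tests are significant under the null.

    Returns (proportion significant at each analysis, proportion ever significant).
    """
    rng = random.Random(seed)
    sig_at = [0] * n_looks        # trials significant at each analysis
    ever = 0                      # trials significant at one or more analyses
    for _ in range(n_trials):
        s = 0.0
        any_sig = False
        for k in range(1, n_looks + 1):
            s += rng.gauss(0.0, 1.0)   # standardized increment from the k-th group
            z = s / sqrt(k)            # normalized statistic on accumulated data
            if abs(z) >= crit:
                sig_at[k - 1] += 1
                any_sig = True
        ever += any_sig
    return [c / n_trials for c in sig_at], ever / n_trials


per_look, ever = simulate_repeated_tests()
print(per_look, ever)
```

Each analysis taken alone is significant in about 5% of null trials, but the proportion significant at one or more analyses is roughly double that, which is the inflation the "Ever" column displays.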

Pocock Level 0.05

• Three equally spaced level .022 analyses

Pattern of        Proportion significant
significance      1st       2nd       3rd       Ever
1st only          .01520    -         -         .01520
1st, 2nd          .00321    .00321    -         .00321
1st, 3rd          .00113    -         .00113    .00113
1st, 2nd, 3rd     .00280    .00280    .00280    .00280
2nd only          -         .01001    -         .01001
2nd, 3rd          -         .00614    .00614    .00614
3rd only          -         -         .01250    .01250
Any pattern       .02234    .02216    .02257    .05099

Unequally Spaced Analyses

• Level .022 analyses at 10%, 20%, 100% of data

Pattern of        Proportion significant
significance      1st       2nd       3rd       Ever
1st only          .01509    -         -         .01509
1st, 2nd          .00521    .00521    -         .00521
1st, 3rd          .00068    -         .00068    .00068
1st, 2nd, 3rd     .00069    .00069    .00069    .00069
2nd only          -         .01473    -         .01473
2nd, 3rd          -         .00165    .00165    .00165
3rd only          -         -         .01855    .01855
Any pattern       .02167    .02228    .02157    .05660

Varying Critical Values (OBF)

• Level 0.10 O’Brien-Fleming (1979); equally spaced tests at .003, .036, .087

Pattern of        Proportion significant
significance      1st       2nd       3rd       Ever
1st only          .00082    -         -         .00082
1st, 2nd          .00036    .00036    -         .00036
1st, 3rd          .00037    -         .00037    .00037
1st, 2nd, 3rd     .00127    .00127    .00127    .00127
2nd only          -         .01164    -         .01164
2nd, 3rd          -         .02306    .02306    .02306
3rd only          -         -         .06223    .06223
Any pattern       .00282    .03633    .08693    .09975

Error Spending: Pocock 0.05

Pattern of        Proportion significant
significance      1st       2nd       3rd       Ever
1st only          .01520    -         -         .01520
1st, 2nd          .00321    .00321    -         .00321
1st, 3rd          .00113    -         .00113    .00113
1st, 2nd, 3rd     .00280    .00280    .00280    .00280
2nd only          -         .01001    -         .01001
2nd, 3rd          -         .00614    .00614    .00614
3rd only          -         -         .01250    .01250
Any pattern       .02234    .02216    .02257    .05099
Incremental error .02234    .01615    .01250    -
Cumulative error  .02234    .03849    .05099    -
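The incremental and cumulative error rows follow from the pattern rows by simple bookkeeping: a trial that would stop at its first significant analysis spends its error there. The sketch below reproduces that arithmetic from the tabulated proportions; the data structure and function name are illustrative assumptions.

```python
# Proportions from the Pocock level 0.05 table on this slide. Each key lists
# the analyses (1-based) at which the fixed sample test was significant.
patterns = {
    (1,): 0.01520,
    (1, 2): 0.00321,
    (1, 3): 0.00113,
    (1, 2, 3): 0.00280,
    (2,): 0.01001,
    (2, 3): 0.00614,
    (3,): 0.01250,
}


def error_spent(patterns, n_analyses=3):
    """Return (incremental, cumulative) type I error spent at each analysis."""
    incremental = [0.0] * n_analyses
    for pattern, prob in patterns.items():
        first = min(pattern)            # the trial stops at the first significant analysis
        incremental[first - 1] += prob
    cumulative = []
    total = 0.0
    for inc in incremental:
        total += inc
        cumulative.append(total)
    return incremental, cumulative


incremental, cumulative = error_spent(patterns)
print([round(x, 5) for x in incremental])
print([round(x, 5) for x in cumulative])
```

The incremental error at the second analysis is the sum of the "2nd only" and "2nd, 3rd" rows, since trials already significant at the first analysis have stopped.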

Stopping Rules

Definitions

Question

• Under what conditions should we stop the study early?

Scientific Reasons

• Safety
• Efficacy
• Harm
• Approximate equivalence
• Futility

Statistical Criteria

• Extreme estimates of treatment effect
• Statistical significance (Frequentist)
  – At final analysis: Curtailment
  – Based on experimentwise error
    • Group sequential rule
    • Error spending function
• Statistical credibility (Bayesian)
• Probability of achieving statistical significance / credibility at final analysis
  – Condition on current data and presumed treatment effect

Sequential Sampling Issues

– Design stage
  • Choosing sampling plan which satisfies desired operating characteristics
    – E.g., type I error, power, sample size requirements
– Monitoring stage
  • Flexible implementation to account for assumptions made at design stage
    – E.g., adjust sample size to account for observed variance
– Analysis stage
  • Providing inference based on true sampling distribution of test statistics

Prespecified Stopping Plans

• Prior to collection of data, specify
  – Scientific and statistical hypotheses of interest
  – Statistical criteria for credible evidence
  – Rule for determining maximal statistical information
    • E.g., fix power, maximal sample size, or study time
  – Randomization scheme
  – Rule for determining schedule of analyses
    • E.g., according to sample size, statistical information, or calendar time
  – Rule for determining conditions for early stopping
    • E.g., boundary shape function for stopping rule

Sampling Plan: General Approach

– Perform analyses when sample sizes are $N_1, \ldots, N_J$
  • Can be randomly determined
– At each analysis choose stopping boundaries
  • $a_j < b_j < c_j < d_j$
– Compute test statistic $T_j = T(X_1, \ldots, X_{N_j})$
  • Stop if $T_j < a_j$ (extremely low)
  • Stop if $b_j < T_j < c_j$ (approximate equivalence)
  • Stop if $T_j > d_j$ (extremely high)
  • Otherwise continue (maybe adaptive modification of analysis schedule, sample size, etc.)
    – Boundaries for modification of sampling plan
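The decision at a single analysis can be written as a small function. This is a minimal sketch of the four-boundary rule as stated on this slide; the function name and the returned labels are hypothetical, and any adaptive modification on continuation is left out.

```python
def monitoring_decision(T_j, a_j, b_j, c_j, d_j):
    """Apply the four-boundary stopping rule at one analysis.

    Stops for an extremely low statistic (T_j < a_j), for approximate
    equivalence (b_j < T_j < c_j), or for an extremely high statistic
    (T_j > d_j); otherwise the trial continues.
    """
    if not (a_j < b_j < c_j < d_j):
        raise ValueError("boundaries must satisfy a_j < b_j < c_j < d_j")
    if T_j < a_j:
        return "stop: extremely low"
    if T_j > d_j:
        return "stop: extremely high"
    if b_j < T_j < c_j:
        return "stop: approximate equivalence"
    return "continue"


# Example with arbitrary boundaries on the Z scale
for t in (-3.1, -1.5, 0.1, 1.7, 2.9):
    print(t, monitoring_decision(t, a_j=-2.5, b_j=-0.5, c_j=0.5, d_j=2.5))
```

The continuation region is the union of the two intervals from a_j to b_j and from c_j to d_j, so a one-sided or early-efficacy-only design arises by pushing some boundaries together or out to infinity.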

Boundary Scales

• Choices for test statistic $T_j$
  – Sum of observations
  – Point estimate of treatment effect
  – Normalized (Z) statistic
  – Fixed sample P value
  – Error spending function
  – Bayesian posterior probability
  – Conditional probability
  – Predictive probability

Correspondence Among Scales

• Choices for test statistic $T_j$
  – All of those choices for test statistics can be shown to be transformations of each other
  – Hence, a stopping rule for one test statistic is easily transformed to a stopping rule for a different test statistic
  – We regard these statistics as representing different scales for expressing the boundaries

Boundary Scales: Notation

• One sample inference about means
  – Generalizable to most other commonly used models

Probability model: $X_1, X_2, \ldots \ \text{iid} \ (\mu, \sigma^2)$

Null hypothesis: $H_0: \mu = \mu_0$

Analyses after $N_1, \ldots, N_J = N$

Data at $j$th analysis: $x_1, \ldots, x_{N_j}$

Distributional assumptions: in absence of a stopping rule, $\bar{X}_{N_j} \sim N\left(\mu, \sigma^2 / N_j\right)$

Partial Sum Scale

$$s_j = \sum_{i=1}^{N_j} x_i$$

• Uses:
  – Cumulative number of events
    • Boundary for 1 sample test of proportion
  – Convenient when computing density

MLE Scale

$$\bar{x}_j = \frac{1}{N_j} \sum_{i=1}^{N_j} x_i = \frac{s_j}{N_j}$$

• Uses:
  – Natural (crude) estimate of treatment effect

Normalized (Z) Statistic Scale

$$z_j = \sqrt{N_j} \, \frac{\bar{x}_j - \mu_0}{\sigma}$$

• Uses:
  – Commonly computed in analysis routines

Fixed Sample P Value Scale

$$p_j = 1 - \Phi(z_j) = 1 - \int_{-\infty}^{z_j} \frac{1}{\sqrt{2\pi}} \, e^{-u^2/2} \, du$$

• Uses:
  – Commonly computed in analysis routine
  – Robust to use with other distributions for estimates of treatment effect
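The first four scales are deterministic transformations of one another once the sample size, null mean, and standard deviation are known. The sketch below assumes the one sample normal model with known σ and an upper one-sided fixed sample P value; the function name and dictionary keys are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def scales_from_partial_sum(s_j, N_j, mu0, sigma):
    """Express one interim result on the partial sum, MLE, Z and P value scales."""
    xbar_j = s_j / N_j                          # MLE scale: sample mean
    z_j = sqrt(N_j) * (xbar_j - mu0) / sigma    # normalized (Z) statistic
    p_j = 1 - NormalDist().cdf(z_j)             # fixed sample upper one-sided P value
    return {"partial_sum": s_j, "mle": xbar_j, "z": z_j, "p_value": p_j}


result = scales_from_partial_sum(s_j=19.6, N_j=100, mu0=0.0, sigma=1.0)
print(result)
```

Because each map is monotone, a boundary stated on any one of these scales determines the boundary on the others.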

Error Spending Scale

$$E_{d_j} = \frac{1}{\alpha_U} \left\{ \Pr\left[ S_j \ge s_j, \ \bigcap_{k=1}^{j-1} S_k \in (a_k, b_k) \cup (c_k, d_k); \ \mu_0 \right] + \sum_{i=1}^{j-1} \Pr\left[ S_i \ge d_i, \ \bigcap_{k=1}^{i-1} S_k \in (a_k, b_k) \cup (c_k, d_k); \ \mu_0 \right] \right\}$$

• Uses:
  – Implementation of stopping rules with flexible determination of number and timing of analyses

Bayesian Posterior Scale

Prior distribution: $\mu \sim N(\zeta, \tau^2)$

$$B_j(\mu^*) = \Pr\left(\mu \ge \mu^* \mid X_1, \ldots, X_{N_j}\right) = 1 - \Phi\left( \frac{\mu^* \left(N_j \tau^2 + \sigma^2\right) - N_j \tau^2 \bar{x}_j - \sigma^2 \zeta}{\sigma \tau \sqrt{N_j \tau^2 + \sigma^2}} \right)$$

• Uses:
  – Bayesian inference (unaffected by stopping)
  – Posterior probability of hypotheses
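A posterior probability on this scale can be computed directly from the conjugate normal model. The sketch assumes a normal prior with mean ζ and variance τ² for the mean, known sampling variance σ², and the posterior probability that the mean is at least a threshold μ*; the function and argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def posterior_prob_at_least(mu_star, xbar_j, N_j, sigma, zeta, tau):
    """Pr(mu >= mu_star | data) under a N(zeta, tau^2) prior and known sigma."""
    precision_term = N_j * tau ** 2 + sigma ** 2
    # (mu_star - posterior mean) / posterior sd, written over a common denominator
    numerator = mu_star * precision_term - N_j * tau ** 2 * xbar_j - sigma ** 2 * zeta
    denominator = sigma * tau * sqrt(precision_term)
    return 1 - NormalDist().cdf(numerator / denominator)


# Example: 50 observations with mean 0.3, prior centered at 0 with unit variance
print(posterior_prob_at_least(mu_star=0.0, xbar_j=0.3, N_j=50,
                              sigma=1.0, zeta=0.0, tau=1.0))
```

The posterior depends on the data only through the sample mean and sample size, so the same number results whether or not a stopping rule was in force, which is the sense in which Bayesian inference is unaffected by stopping.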

Conditional Power Scale

Threshold at final analysis: $t_{\bar{X}_J}$

Hypothesized value of mean: $\mu^*$

$$C_j(t_{\bar{X}_J}, \mu^*) = \Pr\left(\bar{X}_J \ge t_{\bar{X}_J} \mid \bar{X}_j = \bar{x}_j; \ \mu^*\right) = 1 - \Phi\left( \frac{N_J \left(t_{\bar{X}_J} - \mu^*\right) - N_j \left(\bar{x}_j - \mu^*\right)}{\sigma \sqrt{N_J - N_j}} \right)$$

• Uses:
  – Conditional power
  – Probability of significant result at final analysis conditional on data so far (and hypothesis)
  – Futility of continuing under specific hypothesis

Conditional Power Scale (MLE)

Threshold at final analysis: $t_{\bar{X}_J}$

Hypothesized value of mean: $\mu^* = \bar{x}_j$

$$C_j(t_{\bar{X}_J}, \bar{x}_j) = \Pr\left(\bar{X}_J \ge t_{\bar{X}_J} \mid \bar{X}_j = \bar{x}_j; \ \mu = \bar{x}_j\right) = 1 - \Phi\left( \frac{N_J \left(t_{\bar{X}_J} - \bar{x}_j\right)}{\sigma \sqrt{N_J - N_j}} \right)$$

• Uses:
  – Conditional power
  – Futility of continuing under specific hypothesis
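Conditional power can be sketched for the one sample normal model as follows. The code assumes known σ, an upper threshold `t` on the final sample mean, and independent future observations; passing no hypothesized mean uses the current estimate, which corresponds to the MLE version. The function and argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power(t, xbar_j, N_j, N_J, sigma, mu_star=None):
    """Pr(final mean >= t | interim mean xbar_j) when the true mean is mu_star.

    If mu_star is None, the interim estimate xbar_j is used as the
    hypothesized mean (the MLE version of conditional power).
    """
    if not 0 < N_j < N_J:
        raise ValueError("need 0 < N_j < N_J")
    if mu_star is None:
        mu_star = xbar_j
    numerator = N_J * (t - mu_star) - N_j * (xbar_j - mu_star)
    return 1 - NormalDist().cdf(numerator / (sigma * sqrt(N_J - N_j)))


# Same interim data, three different hypothesized means
for mu in (0.0, None, 0.4):
    print(mu, conditional_power(t=0.2, xbar_j=0.2, N_j=50, N_J=100,
                                sigma=1.0, mu_star=mu))
```

The spread of the three results for identical data shows how strongly conditional power depends on the presumed treatment effect.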

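Predictive Power Scale

Threshold at final analysis: $t_{\bar{X}_J}$

Prior distribution: $\mu \sim N(\zeta, \tau^2)$

$$H_j(t_{\bar{X}_J}) = \int \Pr\left(\bar{X}_J \ge t_{\bar{X}_J} \mid \bar{X}_j = \bar{x}_j, \mu\right) p\left(\mu \mid \bar{X}_j = \bar{x}_j\right) d\mu = 1 - \Phi\left( \frac{\left(N_j \tau^2 + \sigma^2\right) N_J \left(t_{\bar{X}_J} - \bar{x}_j\right) + \left(N_J - N_j\right) \sigma^2 \left(\bar{x}_j - \zeta\right)}{\sqrt{\left(N_J - N_j\right)\left(N_j \tau^2 + \sigma^2\right)\left(N_J \tau^2 + \sigma^2\right) \sigma^2}} \right)$$

• Uses:
  – Futility of continuing study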

Predictive Power (Flat Prior)

Threshold at final analysis: $t_{\bar{X}_J}$

Prior distribution: $\mu \sim N(\zeta, \tau^2 = \infty)$

$$H_j(t_{\bar{X}_J}) = \int \Pr\left(\bar{X}_J \ge t_{\bar{X}_J} \mid \bar{X}_j = \bar{x}_j, \mu\right) p\left(\mu \mid \bar{X}_j = \bar{x}_j\right) d\mu = 1 - \Phi\left( \frac{\sqrt{N_J N_j} \left(t_{\bar{X}_J} - \bar{x}_j\right)}{\sigma \sqrt{N_J - N_j}} \right)$$

• Uses:
  – Futility of continuing study
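Predictive power averages conditional power over the posterior distribution of the mean. The sketch below assumes the one sample normal model with known σ and a normal prior with mean ζ and variance τ², and treats `tau=None` as the flat prior limit; the closed forms are derived from that conjugate model, and all names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def predictive_power(t, xbar_j, N_j, N_J, sigma, zeta=0.0, tau=None):
    """Pr(final mean >= t | interim data), averaging over the posterior of the mean.

    tau=None is treated as a flat prior (tau^2 -> infinity), in which case
    the prior mean zeta has no influence.
    """
    if not 0 < N_j < N_J:
        raise ValueError("need 0 < N_j < N_J")
    remaining = N_J - N_j
    if tau is None:
        arg = sqrt(N_j * N_J) * (t - xbar_j) / (sigma * sqrt(remaining))
    else:
        d_j = N_j * tau ** 2 + sigma ** 2
        d_J = N_J * tau ** 2 + sigma ** 2
        numerator = d_j * N_J * (t - xbar_j) + remaining * sigma ** 2 * (xbar_j - zeta)
        arg = numerator / sqrt(remaining * d_j * d_J * sigma ** 2)
    return 1 - NormalDist().cdf(arg)


print(predictive_power(t=0.25, xbar_j=0.2, N_j=50, N_J=100, sigma=1.0))
print(predictive_power(t=0.25, xbar_j=0.2, N_j=50, N_J=100, sigma=1.0,
                       zeta=0.0, tau=0.1))
```

A very diffuse prior gives essentially the flat prior answer, while an informative prior centered at the null pulls the predictive power down.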

Boundary Scales

• Stopping rule for one test statistic is easily transformed to a rule for another statistic
• “Group sequential stopping rules”
  – Sum of observations
  – Point estimate of treatment effect
  – Normalized (Z) statistic
  – Fixed sample P value
  – Error spending function
• Bayesian posterior probability
• Stochastic Curtailment
  – Conditional probability
  – Predictive probability


Which Scale: My View

• Statistically
  – It doesn’t really matter
• Scientifically
  – You see what a difference it makes

Spectrum of Stopping Rules

[Figure]
– Down columns: Early stopping vs no early stopping
– Across rows: One-sided vs two-sided decisions

Unified Family: Spectrum of Boundary Shapes

• All of the rules depicted have the same type I error and power to detect the design alternative

Error Spending Functions

• My view: Poorly understood even by the researchers who advocate them
  – There is no such thing as THE Pocock or O’Brien-Fleming error spending function
    • Depends on type I or type II error
    • Depends on number of analyses
    • Depends on spacing of analyses

OBF, Pocock Error Spending

[Figure]

Stochastic Curtailment

• Stopping boundaries chosen based on predicting future data
  – My objections will be discussed later

Major Issue

• Frequentist operating characteristics are based on the sampling distribution
  – Stopping rules do affect the sampling distribution of the usual statistics
    • MLEs are not normally distributed
    • Z scores are not standard normal under the null
      – (1.96 is irrelevant)
    • The null distribution of fixed sample P values is not uniform
      – (They are not true P values)

Sampling Distribution of MLE

[Figure]

Sampling Distribution of MLE

[Figure]

Sampling Distributions

[Figure]

Sequential Sampling: The Price

• It is only through full knowledge of the sampling plan that we can assess the full complement of frequentist operating characteristics
  – In order to obtain inference with maximal precision and minimal bias, the sampling plan must be well quantified
  – (Note that adaptive designs using ancillary statistics pose no special problems if we condition on those ancillary statistics.)

Familiarity and Contempt

• For any known stopping rule, however, we can compute the correct sampling distribution with specialized software
  – From the computed sampling distributions we then compute
    • Bias adjusted estimates
    • Correct (adjusted) confidence intervals
    • Correct (adjusted) P values
  – Candidate designs can then be compared with respect to their operating characteristics

Inferential Methods

• Just extensions of methods that also work in fixed samples
  – But in fixed samples, many methods converge on the same estimate, unlike in sequential designs

Point Estimates

– Bias adjusted (Whitehead, 1986)
  • Assume you observed the mean of the sampling distribution
– Median unbiased (Whitehead, 1983)
  • Assume you observed the median of the sampling distribution
– Truncation adapted UMVUE (Emerson & Fleming, 1990)
– (MLE is the naïve estimator: Biased and high MSE)

Interval Estimates

• Quantile unbiased estimates
  – Assume you observed the 2.5th or 97.5th percentile
• Orderings of the outcome space
  – Analysis time or Stagewise
    • Tend toward wider CI, but do not need entire sampling distribution
  – Sample mean
    • Tend toward narrower CI
  – Likelihood ratio
    • Tend toward narrower CI, but less implemented

P values

• Orderings of the outcome space
  – Analysis time ordering
    • Lower probability of low p-values
    • Insensitive to late occurring treatment effects
  – Sample mean
    • High probability of lower p-values
  – Likelihood ratio
    • Highest probability of low p-values

Evaluation of Designs

Evaluation of Designs

• Process of choosing a trial design
  – Define candidate design
    • Usually constrain two operating characteristics
      – Type I error, power at design alternative
      – Type I error, maximal sample size
  – Evaluate other operating characteristics
    • Different criteria of interest to different investigators
  – Modify design
  – Iterate

Which Operating Characteristics

• The same regardless of the type of stopping rule
  – Frequentist power curve
    • Type I error (null) and power (design alternative)
  – Sample size requirements
    • Maximum, average, median, other quantiles
    • Stopping probabilities
  – Inference at study termination (at each boundary)
    • Frequentist or Bayesian (under spectrum of priors)
  – (Futility measures
    • Conditional power, predictive power)

At Design Stage

• In particular, at design stage we can know
  – Conditions under which trial will continue at each analysis
    • Estimates
      » (Range of estimates leading to continuation)
    • Inference
      » (Credibility of results if trial is stopped)
    • Conditional and predictive power
  – Tradeoffs between early stopping and loss in unconditional power

Case Study

• Randomized, placebo controlled Phase III study of antibody to endotoxin
• Intervention: Single administration
• Endpoint: Difference in 28 day mortality rates
  – Placebo arm: estimate 30% mortality
  – Treatment arm: hope for 23% mortality
• Analysis: Large sample test of binomial proportions
  – Frequentist based inference
  – Type I error: one-sided 0.025
  – Power: 90% to detect θ < -0.07
  – Point estimate with low bias, MSE; 95% CI
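As a rough check on the case study numbers, the sketch below approximates the power of a fixed-sample, one-sided test of two proportions. It assumes 1:1 randomization, a total of 1700 subjects (the reference size quoted on later slides), and the unpooled variance under the alternative. The function name and these conventions are illustrative assumptions, not the deck's own computation.

```python
from math import sqrt
from statistics import NormalDist


def fixed_sample_power(n_total, p_control, p_treat, alpha_one_sided=0.025):
    """Approximate power of a one-sided large-sample test of two proportions.

    Assumes 1:1 randomization and uses the unpooled variance under the
    alternative for both the critical value and the power (a simplification).
    """
    n_per_arm = n_total / 2
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treat * (1 - p_treat) / n_per_arm)
    z_crit = NormalDist().inv_cdf(1 - alpha_one_sided)
    effect = p_control - p_treat  # absolute mortality reduction, here 0.07
    return NormalDist().cdf(effect / se - z_crit)


power_1700 = fixed_sample_power(1700, 0.30, 0.23)
print(f"Approximate power with N = 1700: {power_1700:.3f}")
```

Under these assumptions the power comes out a little above 0.90, in line with the stated design goal. A different variance convention (for example, pooled under the null) would shift the figure slightly.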

Boundaries and Power Curves

• O’Brien-Fleming and Pocock boundary shape functions with J = 4 analyses, maintaining power

Impact of Interim Analyses

• Required increased maximal sample size in order to maintain power
  – Maximal sample size with 4 analyses
    • O’Brien-Fleming: N = 1773 (4.3% increase)
    • Pocock: N = 2340 (37.6% increase)
  – Need to consider
    • Average sample size
    • Probability of continuing past 1700 subjects
    • Conditions under which continue past 1700 subjects
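The quoted percentage increases are relative to a fixed-sample size of 1700, which is taken here as an assumption from the "continuing past 1700 subjects" bullets. A short check of the arithmetic:

```python
FIXED_N = 1700  # assumed fixed-sample reference size (the 1700 quoted on the slide)
max_n = {"O'Brien-Fleming": 1773, "Pocock": 2340}

# Percent increase in maximal sample size relative to the fixed-sample design
increase_pct = {name: 100 * (n / FIXED_N - 1) for name, n in max_n.items()}

for name, pct in increase_pct.items():
    print(f"{name}: {pct:.1f}% increase")
```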

ASN, 75th %tile of Sample Size

• O’Brien-Fleming and Pocock boundary shape functions; J = 4 analyses, maintaining power

Inference (Same Max N)

      O'Brien-Fleming                             Pocock
   N     MLE  Bias Adj  95% CI            P val      MLE  Bias Adj  95% CI            P val
              Estimate                                    Estimate
Efficacy
 425  -0.171    -0.163  (-0.224, -0.087)  0.000   -0.099    -0.089  (-0.152, -0.015)  0.010
 850  -0.086    -0.080  (-0.130, -0.025)  0.002   -0.070    -0.065  (-0.114, -0.004)  0.018
1275  -0.057    -0.054  (-0.096, -0.007)  0.012   -0.057    -0.055  (-0.101, -0.001)  0.023
1700  -0.043    -0.043  (-0.086, 0.000)   0.025   -0.050    -0.050  (-0.099, 0.000)   0.025
Futility
 425   0.086     0.077  (0.001, 0.139)    0.977    0.000    -0.010  (-0.084, 0.053)   0.371
 850   0.000    -0.006  (-0.061, 0.044)   0.401   -0.029    -0.035  (-0.095, 0.014)   0.078
1275  -0.029    -0.031  (-0.079, 0.010)   0.067   -0.042    -0.044  (-0.098, 0.002)   0.029
1700  -0.043    -0.043  (-0.086, 0.000)   0.025   -0.050    -0.050  (-0.099, 0.000)   0.025
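The MLE columns of the inference table appear to follow the usual boundary shapes on the scale of the estimated difference. On that scale an O'Brien-Fleming boundary is proportional to 1/(information fraction) and a Pocock boundary to 1/sqrt(information fraction), and the futility boundaries look like mirror images around twice the final critical value. The sketch below rebuilds the boundaries from the final critical values alone. The parameterization and function names are assumptions for illustration, and agreement with the table is only to within rounding.

```python
N_MAX = 1700
ANALYSES = [425, 850, 1275, 1700]


def efficacy_boundaries(final_crit, shape_power):
    """Estimate-scale boundary final_crit / fraction**shape_power at each analysis.

    shape_power = 1.0 mimics the O'Brien-Fleming shape and 0.5 the Pocock
    shape on the scale of the estimated difference (assumed parameterization).
    """
    return [final_crit / (n / N_MAX) ** shape_power for n in ANALYSES]


def symmetric_futility(efficacy, final_crit):
    """Futility boundary taken as the reflection of the efficacy boundary."""
    return [2 * final_crit - b for b in efficacy]


obf_eff = efficacy_boundaries(-0.043, 1.0)
pocock_eff = efficacy_boundaries(-0.050, 0.5)
obf_fut = symmetric_futility(obf_eff, -0.043)
pocock_fut = symmetric_futility(pocock_eff, -0.050)

for n, row in zip(ANALYSES, zip(obf_eff, obf_fut, pocock_eff, pocock_fut)):
    print(n, " ".join(f"{x:7.3f}" for x in row))
```

Because the final critical values are taken from the rounded table entries, the reconstructed early boundaries can differ from the tabulated ones in the third decimal place.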

At Design Stage: Example

• With O’Brien-Fleming boundaries having 90% power to detect a 7% absolute decrease in mortality
  – Maximum sample size of 1700
  – Continue past 1275 if crude difference in 28 day mortality is between -2.9% and -5.7%
  – If we just barely stop for efficacy after 425 patients we will report
    • Estimated difference in mortality: -16.3%
    • 95% confidence interval: -8.7% to -22.4%
    • One-sided lower P < 0.0001

Efficiency / Unconditional Power

• Futility: Tradeoffs between early stopping and loss of power

[Table: Boundaries / Loss of Power / Avg Sample Size; rows not recovered]

Stochastic Curtailment

• Boundaries transformed to conditional or predictive power
  – Key issue: Computations are based on assumptions about the true treatment effect
• Conditional power
  – “Design”: based on hypotheses
  – “Estimate”: based on current estimates
• Predictive power
  – “Prior assumptions”

Conditional/Predictive Power

      Symmetric O’Brien-Fleming                   O’Brien-Fleming Efficacy, P=0.8 Futility
              Conditional Power Predictive Power          Conditional Power Predictive Power
   N     MLE  Design  Estimate  Sponsor  Noninf      MLE  Design  Estimate  Sponsor  Noninf
Efficacy (rejects 0.00)                           Efficacy (rejects 0.00)
 425  -0.171   0.500     0.000    0.002   0.000   -0.170   0.500     0.000    0.002   0.000
 850  -0.085   0.500     0.002    0.015   0.023   -0.085   0.500     0.002    0.015   0.023
1275  -0.057   0.500     0.091    0.077   0.124   -0.057   0.500     0.093    0.077   0.126
Futility (rejects -0.0855)                        Futility (rejects -0.0866)
 425   0.085   0.500     0.000    0.077   0.000    0.047   0.719     0.000    0.222   0.008
 850   0.000   0.500     0.002    0.143   0.023   -0.010   0.648     0.015    0.247   0.063
1275  -0.028   0.500     0.091    0.241   0.124   -0.031   0.592     0.142    0.312   0.177
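The "Design" and "Estimate" conditional power entries for the symmetric O'Brien-Fleming futility boundary can be approximated with the usual independent-increments normal model. The final estimate is treated as a weighted average of the interim estimate and the estimate from the remaining data. The sketch below assumes a final critical value of -0.043 on the estimate scale and a final standard error implied by a one-sided 0.025 test. The function name is hypothetical, and this is one reading of how the table was computed, not the deck's own code.

```python
from math import sqrt
from statistics import NormalDist

FINAL_CRIT = -0.043  # final critical value on the estimate scale (assumed from the slides)
SE_FINAL = abs(FINAL_CRIT) / NormalDist().inv_cdf(0.975)  # implied standard error at N = 1700


def prob_final_rejects_null(interim_est, info_frac, theta):
    """P(final estimate < FINAL_CRIT | interim estimate), with effect theta in the remaining data."""
    mean_final = info_frac * interim_est + (1 - info_frac) * theta
    sd_final = sqrt(1 - info_frac) * SE_FINAL
    return NormalDist().cdf((FINAL_CRIT - mean_final) / sd_final)


# Futility boundary at N = 425 (estimate 0.085, information fraction 0.25)
cp_design = prob_final_rejects_null(0.085, 0.25, theta=-0.0855)  # "Design": alternative being rejected
cp_estimate = prob_final_rejects_null(0.085, 0.25, theta=0.085)  # "Estimate": current estimate

# Futility boundary at N = 1275 (estimate -0.028, information fraction 0.75)
cp_estimate_1275 = prob_final_rejects_null(-0.028, 0.75, theta=-0.028)

print(f"{cp_design:.3f} {cp_estimate:.3f} {cp_estimate_1275:.3f}")
```

With rounded inputs the "Design" value lands near 0.5 and the "Estimate" values near the tabulated 0.000 and 0.091. The same boundary therefore maps to very different conditional powers depending on the assumed treatment effect.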

So What?

• Why not use stochastic curtailment?
  – Choice of thresholds poorly understood
    • Do not correspond to unconditional power
  – Inefficient designs result

Key Issues

• Very different probabilities based on assumptions about the true treatment effect
  – Extremely conservative O’Brien-Fleming boundaries correspond to conditional power of 50% (!) under alternative rejected by the boundary
  – Resolution of apparent paradox: if the alternative were true, there is less than .003 probability of stopping for futility at the first analysis

Apples with Apples

• Can compare a group sequential rule to a fixed sample test providing
  – Same maximal sample size (N = 1700)
  – Same (worst case) average sample size (N = 1336)
  – Same power under the alternative (N = 1598)
• Consider probability of “discordant decisions”
  – Conditional probability (conditional power)
  – Unconditional probability (power)

Ordering of the Outcome Space

• Choosing a threshold based on conditional power can lead to nonsensical orderings based on unconditional power
  – Decisions based on 19% conditional power may be more conservative than decisions based on 8% conditional power
  – Can result in substantial inefficiency (loss of power)

Further Comments

• Neither conditional power nor predictive power has good foundational motivation
  – Frequentists should use Neyman-Pearson paradigm and consider optimal unconditional power across alternatives
    • And conditional/predictive power is not a good indicator of loss of unconditional power
  – Bayesians should use posterior distributions for decisions

Implementation

• Methods have been described for flexible implementation of stopping rules when number and timing of analyses are random
  – Error spending function
  – Constrained boundaries
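To make "error spending function" concrete, the sketch below evaluates two widely used Lan-DeMets spending functions: an O'Brien-Fleming-type function and a Pocock-type function. The slide does not say which spending functions are meant, so these are illustrative assumptions. Each returns the cumulative type I error allowed by information fraction t, which lets boundaries be computed at whatever analysis times actually occur.

```python
from math import e, log, sqrt
from statistics import NormalDist


def obf_type_spending(t, alpha):
    """Lan-DeMets O'Brien-Fleming-type cumulative error spent at information fraction t."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(t)))


def pocock_type_spending(t, alpha):
    """Lan-DeMets Pocock-type cumulative error spent at information fraction t."""
    return alpha * log(1 + (e - 1) * t)


ALPHA = 0.025  # one-sided level used in the case study
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t={t:.2f}  OBF-type={obf_type_spending(t, ALPHA):.5f}  "
          f"Pocock-type={pocock_type_spending(t, ALPHA):.5f}")
```

Both functions spend the full level at t = 1. The O'Brien-Fleming-type function spends very little early, which mirrors the conservative early boundaries discussed on the earlier slides.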

Adaptive Sampling Plans

Sequential Sampling Strategies

• Two broad categories of sequential sampling
  – Prespecified stopping guidelines
  – Adaptive procedures

Adaptive Sampling Plans

• At each interim analysis, possibly modify
  – Scientific and statistical hypotheses of interest
  – Statistical criteria for credible evidence
  – Maximal statistical information
  – Randomization ratios
  – Schedule of analyses
  – Conditions for early stopping

Adaptive Sampling: Examples

• Prespecified on the scale of statistical information
  – E.g., Modify sample size to account for estimated information (variance or baseline rates)
    • No effect on type I error IF
      – Estimated information independent of estimate of treatment effect
        » Proportional hazards,
        » Normal data, and/or
        » Carefully phrased alternatives
      – And willing to use conditional inference
        » Carefully phrased alternatives

Estimate Alternative

• If maximal sample size is maintained, the study discriminates between null hypothesis and an alternative measured in units of statistical information

$$ n = \frac{\delta_1^2 V}{(\theta_1 - \theta_0)^2} \quad\Longrightarrow\quad \frac{(\theta_1 - \theta_0)^2}{V} = \frac{\delta_1^2}{n} $$

Estimate Sample Size

• If statistical power is maintained, the study sample size is measured in units of statistical information

$$ \frac{(\theta_1 - \theta_0)^2}{V} = \frac{\delta_1^2}{n} \quad\Longrightarrow\quad \frac{n}{V} = \frac{\delta_1^2}{(\theta_1 - \theta_0)^2} $$
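As an illustration of sample size measured in units of statistical information, the sketch below uses the relation n = δ₁²V / (θ₁ − θ₀)² and shows how the required total sample size moves when the variance term V is re-estimated from baseline rates. It assumes a fixed-sample test with δ₁ = z(1 − α) + z(power), 1:1 randomization, and a binomial V. The lower event rates (25% and 18%) are hypothetical values chosen only for the example.

```python
from math import ceil
from statistics import NormalDist


def binomial_v(p_control, p_treat):
    """V such that Var(estimate) = V / n for total sample size n with 1:1 randomization."""
    return 2 * (p_control * (1 - p_control) + p_treat * (1 - p_treat))


def required_n(v, theta1, theta0=0.0, alpha_one_sided=0.025, power=0.90):
    """Total sample size n = delta1^2 * V / (theta1 - theta0)^2 for a fixed-sample test."""
    nd = NormalDist()
    delta1 = nd.inv_cdf(1 - alpha_one_sided) + nd.inv_cdf(power)  # assumed standardized alternative
    return ceil(delta1 ** 2 * v / (theta1 - theta0) ** 2)


n_design = required_n(binomial_v(0.30, 0.23), theta1=-0.07)      # design-stage rates from the case study
n_lower_rate = required_n(binomial_v(0.25, 0.18), theta1=-0.07)  # hypothetical lower event rates

print(n_design, n_lower_rate)
```

The same target difference needs fewer subjects when the event rates (and hence V) are lower. This is the sense in which holding power fixed makes the sample size a function of the estimated information. The design-stage figure need not match the deck's 1700 exactly, since the variance convention used there is not stated.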

Adaptive Sampling: Examples

  – E.g., Proschan & Hunsberger (1995)
    • Modify ultimate sample size based on conditional power
      – Computed under current best estimate (if high enough)
    • Make adjustment to inference to maintain Type I error

Incremental Statistics

• Statistic at the j-th analysis a weighted average of data accrued between analyses

$$ \hat{\theta}_j = \frac{\sum_{k=1}^{j} N_k^* \, \hat{\theta}_k^*}{N_j}, \qquad Z_j = \frac{\sum_{k=1}^{j} \sqrt{N_k^*} \, Z_k^*}{\sqrt{N_j}} . $$
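A small sketch of the weighted-average construction, assuming the cumulative estimate weights each increment by its sample size N*_k and the cumulative Z statistic weights each incremental Z*_k by sqrt(N*_k), with N_j the sum of the increments. The function names and the example numbers are illustrative only.

```python
from math import sqrt


def cumulative_estimate(increment_sizes, increment_estimates):
    """Sample-size-weighted average of the incremental estimates."""
    n_total = sum(increment_sizes)
    return sum(n * est for n, est in zip(increment_sizes, increment_estimates)) / n_total


def cumulative_z(increment_sizes, increment_z):
    """Cumulative Z statistic built from the incremental Z statistics."""
    n_total = sum(increment_sizes)
    return sum(sqrt(n) * z for n, z in zip(increment_sizes, increment_z)) / sqrt(n_total)


# Two equal increments of 425 subjects with hypothetical incremental results
theta_hat_2 = cumulative_estimate([425, 425], [-0.10, -0.06])
z_2 = cumulative_z([425, 425], [-2.0, -1.0])
print(theta_hat_2, z_2)
```

In an adaptive design the later increment sizes may depend on earlier results, which is why the slides that follow treat the distribution of each increment conditionally on its sample size.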

Conditional Distribution

$$ \hat{\theta}_j^* \mid N_j^* \sim N\!\left(\theta, \frac{V}{N_j^*}\right) $$

$$ Z_j^* \mid N_j^* \sim N\!\left(\frac{\theta - \theta_0}{\sqrt{V / N_j^*}}, \; 1\right) $$

$$ P_j^* \mid N_j^* \overset{H_0}{\sim} U(0, 1) . $$

Unconditional Distribution

$$ \Pr\left(Z_j^* \le z\right) = \sum_{n=0}^{\infty} \Pr\left(Z_j^* \le z \mid N_j^* = n\right) \Pr\left(N_j^* = n\right) . $$

Two Stage Design

• Proschan & Hunsberger consider worst case
  – At first stage, choose sample size of second stage
    • N2 = N2(Z1) to maximize type I error
  – At second stage, reject if Z2 > a2
• Worst case type I error of two stage design

$$ \alpha_{worst} = 1 - \Phi(a_2) + \frac{\exp\left(-a_2^2 / 2\right)}{4} , $$

  – Can be more than two times the nominal
    • a2 = 1.96 gives type I error of 0.0616
    • (Compare to Bonferroni results)
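The quoted worst-case figure can be checked numerically, assuming the worst-case type I error takes the form 1 − Φ(a2) + exp(−a2²/2)/4. That form reproduces the 0.0616 on the slide. The Bonferroni comparison below is the simple bound of two looks at the nominal one-sided level.

```python
from math import exp
from statistics import NormalDist


def worst_case_type1(a2):
    """Worst-case type I error when the second-stage size is chosen adversarially (assumed form)."""
    return 1 - NormalDist().cdf(a2) + exp(-a2 ** 2 / 2) / 4


alpha_nominal = 1 - NormalDist().cdf(1.96)  # about 0.025
alpha_worst = worst_case_type1(1.96)
bonferroni_bound = 2 * alpha_nominal        # two looks, each at the nominal level

print(f"nominal={alpha_nominal:.4f}  worst case={alpha_worst:.4f}  Bonferroni={bonferroni_bound:.4f}")
```

The worst case exceeds both twice the nominal level and the two-look Bonferroni bound, which is the point of the comparison on the slide.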

Better Approaches

• Proschan and Hunsberger describe adaptations using restricted procedures to maintain experimentwise type I error
  – Must prespecify a conditional error function which would maintain type I error
    • Then find appropriate a2 for second stage based on N2 which can be chosen arbitrarily
  – But still have loss of power

Motivation for Adaptive Designs

• Scientific and statistical hypotheses of interest
  – Modify target population, intervention, measurement of outcome, alternative hypotheses of interest
  – Possible justification
    • Changing conditions in medical environment
      – Approval/withdrawal of competing/ancillary treatments
      – Diagnostic procedures
    • New knowledge from other trials about similar treatments
    • Evidence from ongoing trial
      – Toxicity profile (therapeutic index)
      – Subgroup effects

Motivation for Adaptive Designs

• Modification of other design parameters may have great impact on the hypotheses considered
  – Statistical criteria for credible evidence
  – Maximal statistical information
  – Randomization ratios
  – Schedule of analyses
  – Conditions for early stopping

Cost of Planning Not to Plan

• Major issues with use of adaptive designs
  – What do we truly gain?
    • Can proper evaluation of trial designs obviate need?
  – What can we lose?
    • Efficiency? (and how should it be measured?)
    • Scientific inference?
      – Science vs Statistics vs Game theory
      – Definition of scientific/statistical hypotheses
      – Quantifying precision of inference

Prespecified Modification Rules

• Adaptive sampling plans exact a price in statistical efficiency
  – Tsiatis & Mehta (2002)
    • A classic prespecified group sequential stopping rule can be found that is more efficient than a given adaptive design
  – Shi & Emerson (2003)
    • Fisher’s test statistic in the self-designing trial provides markedly less precise inference than that based on the MLE
      – To compute the sampling distribution of the latter, the sampling plan must be known

Conditional/Predictive Power

• Additional issues with maintaining conditional or predictive power
  – Modification of sample size may allow precise knowledge of interim treatment effect
    • Interim estimates may cause change in study population
      – Time trends due to investigators gaining or losing enthusiasm
    • In extreme cases, potential for unblinding of individual patients
      – Effect of outliers on test statistics

Final Comments

• Adaptive designs versus prespecified stopping rules
  – Adaptive designs come at a price of efficiency and (sometimes) scientific interpretation
  – With adequate tools for careful evaluation of designs, there is little need for adaptive designs

Bottom Line

You better think (think) about what you’re trying to do…

- Aretha Franklin

References

• Available on www.emersonstatistics.com
  – Frequentist evaluation
  – Bayesian evaluation (Stat Med)
  – On the use of stochastic curtailment
  – Issues in the use of adaptive designs (Stat Med)
  – Group sequential P values under non-proportional hazards (Biometrics)
  – Implementation of group sequential rules using constrained boundaries (Biometrics)