Statistical Power and Sample Size Calculations
Drug Development Statistics & Data Management
July 2014
Cathryn Lewis
Professor of Genetic Epidemiology & Statistics
Department of Medical & Molecular Genetics
King’s College London
With thanks to Irene Rebollo Mesa and Frühling Rijsdijk
Outline
Power and Sample size 2
1. Concepts of power
2. Power and types of error
3. Software to calculate power
4. Power for continuous outcome
5. Power for proportion, success/failure
6. Quiz!
Planning a Study
Question: What are the study endpoints?
Types of Endpoints:
• Binary clinical outcome: Death from disease.
• Quantitative: Creatinine, cholesterol levels, QOL.
• Time to Event: Time to graft failure, time to death, time to recovery
Good Qualities:
-Clinically meaningful
-Practical and feasible to measure
-Occur frequently enough throughout the duration of the trial
Planning a Study
Question: What is the expected prevalence of the outcome (discrete) or variability of the outcome (continuous)?
• Based on previous studies, a pilot study or a hospital/NHS report.
• Variability and prevalence are vital for power.
• Both are best at intermediate levels.
Question: What is the expected difference between groups
• in proportion of events (if discrete), or
• in mean measure (if continuous)?
• Based on previous studies or a pilot study
• Alternatively, the minimum clinically relevant difference
• The larger the difference, the higher the power
Design: What is your Hypothesis?
1. Superiority
Objective: To determine whether there is evidence of a statistical difference in the comparison of interest between two Tx regimes:
A: Tx of interest
B: Placebo or active control Tx
H0: The two Txs have equal effect with respect to the mean response: μA = μB
H1: The two Txs are different with respect to the mean response: μA ≠ μB
Statistical Power
Power
• Definition: The probability of correctly rejecting the null hypothesis when it is false, i.e. the expected proportion of samples in which we decide correctly against the null hypothesis
• It depends on:
1. Size of the (treatment) effect in the population
2. The significance level α at which we reject the null (e.g. 0.05)
3. Sample size (N)
4. Design of the study: parallel or crossover etc.
5. Endpoint measurement (categorical, ordinal, continuous)
6. The expected dropout rate
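The definition of power as a proportion of samples can be checked by simulation. The following is a minimal sketch, assuming a two-group parallel design, a normally distributed continuous endpoint with known standard deviation, and a two-sided z-test. The function name and the example numbers are illustrative and do not come from the slides.

```python
import random
from statistics import NormalDist, fmean

def simulated_power(effect, sd, n_per_group, alpha=0.05, n_trials=2000, seed=1):
    """Proportion of simulated trials in which H0 (no difference) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided rejection threshold
    se = sd * (2 / n_per_group) ** 0.5             # SE of the difference in means
    rejections = 0
    for _ in range(n_trials):
        treated = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

# Hypothetical inputs: true difference 5, SD 10, 64 patients per group
print(simulated_power(effect=5, sd=10, n_per_group=64))
# With no true effect, the rejection rate estimates the Type I error rate
print(simulated_power(effect=0, sd=10, n_per_group=64))
```

Under these assumptions the first estimate should land near 0.8 and the second near the chosen significance level.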
Power primer
• We summarise results of a trial in a statistical analysis with a test statistic (e.g. chi-squared, Z score)
• The test statistic provides a measure of support for a certain hypothesis
• We pre-determine a threshold on the test statistic at which to reject the null hypothesis
• YES or NO decision-making: significance testing
• This inevitably leads to two types of mistake:
  – false positive (YES instead of NO) (Type I)
  – false negative (NO instead of YES) (Type II)
[Figure: Standard case. Sampling distributions of the test statistic T if H0 were true and if HA were true, with the rejection threshold at alpha = 0.05. Power = 1 − β.]
              Rejection of H0                           Non-rejection of H0
H0 true       Type I error = α (significance level)     Correct decision
HA true       Power = 1 − Type II error = 1 − β         Type II error = β
Hypothesis testing
• Null hypothesis : no effect
• A ‘significant’ result means that we can reject the null hypothesis
• A ‘non-significant’ result means that we cannot reject the null hypothesis
Statistical significance
• The ‘p-value’
• The probability, if the null were in fact true, of a result at least as extreme as the one observed
• Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)
              Rejection of H0                  Non-rejection of H0
H0 true       Type I error at rate α           Nonsignificant result (1 − α)
HA true       Significant result (1 − β)       Type II error at rate β
[Figure: Increased effect size, alpha = 0.05. Sampling distributions of the test statistic T if H0 were true and if HA were true. Power (1 − β) increases.]
[Figure: More conservative α, alpha = 0.01. Sampling distributions of the test statistic T if H0 were true and if HA were true. Power (1 − β) decreases.]
[Figure: Less conservative α, alpha = 0.1. Sampling distributions of the test statistic if H0 were true and if HA were true. Power (1 − β) increases.]
[Figure: Reduced variation, alpha = 0.05. Sampling distributions of the test statistic T if H0 were true and if HA were true. Power (1 − β) increases.]
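The four scenarios in these figures can be reproduced numerically. The sketch below assumes a two-sided z-test comparing two group means with known, common standard deviation; the baseline values (difference 5, SD 10, 50 per group) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def power_two_sample(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Mean of the test statistic when HA is true
    shift = effect / (sd * (2 / n_per_group) ** 0.5)
    # Probability that the statistic falls beyond either threshold under HA
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

scenarios = {
    "standard case":         dict(effect=5, sd=10, n_per_group=50, alpha=0.05),
    "increased effect size": dict(effect=7, sd=10, n_per_group=50, alpha=0.05),
    "more conservative α":   dict(effect=5, sd=10, n_per_group=50, alpha=0.01),
    "less conservative α":   dict(effect=5, sd=10, n_per_group=50, alpha=0.10),
    "reduced variation":     dict(effect=5, sd=8,  n_per_group=50, alpha=0.05),
}
for name, kwargs in scenarios.items():
    print(f"{name:24s} power = {power_two_sample(**kwargs):.3f}")
```

Relative to the standard case, power rises with a larger effect, a less conservative α or reduced variation, and falls with a more conservative α.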
Determining Sample Size
We need:
– Acceptable type I error rate (α),
  • usually 0.05, or 0.025 if one-sided
– A meaningful difference in the response: the smallest Tx effect clinically worth detecting / that we wish to detect
– The desired power (1 − β) to detect this difference, min. 80%
– Ratio of allocation to the groups (equal sample sizes?)
– Whether to use a one-sided or two-sided test
In addition,
– The variability common to the two populations, for a continuous endpoint
– The response (event) rate of the control group, for a binary endpoint
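For a continuous endpoint with equal group sizes and a two-sided test, these ingredients are commonly combined in the normal-approximation formula below. The symbols Δ (smallest difference worth detecting) and σ (common standard deviation) are notation assumed here, and the slides do not state which formula the recommended software uses.

```latex
n_{\text{per group}} \;=\; \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\Delta^{2}}
```

Here $z_{q}$ is the $q$-th quantile of the standard normal distribution, so a smaller Δ, a larger σ, a smaller α or a higher power all increase the required sample size.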
Calculating power using software or Web
-PRISM StatMate ($50)
-G*Power 3 (Free)
-Statistical software: SPSS, SAS, Stata, R
-PS Power and Sample size Calculation (free) (Windows)
-Web: Google “Statistical Power Calculation”
  -Russell V. Lenth: http://www.stat.uiowa.edu/~rlenth/Power/
  -David Schoenfeld: http://hedwig.mgh.harvard.edu/sample_size/size.html
-Perform the calculation with two methods and check that the answers are similar
Determining Sample Size: Continuous outcome
• Two Anti-Hypertensives:
  – Testing for superiority
• Endpoint: Difference in Diastolic BP
  – Continuous variable
• Relevant parameters
  – Difference in Diastolic BP between drugs: 2 mm Hg
  – Standard deviation of Diastolic BP in each group: 10 mm Hg
  – Significance level: 0.05
  – Required power: 0.8
  – Assume equal sized groups
• Calculate sample size required
Result: 393 patients in each group
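The figure of 393 per group can be reproduced with a short calculation. This is a sketch assuming a two-sided test and the usual normal approximation for comparing two means; the function name is hypothetical, and the software used on the slides may apply a slightly different (for example t-based) method.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Patients per group to detect a mean difference `delta` (equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * (sd * (z_alpha + z_beta) / delta) ** 2
    return ceil(n)                                 # round up to whole patients

# Anti-hypertensive example: difference 2 mm Hg, SD 10 mm Hg
print(n_per_group_means(delta=2, sd=10))  # 393
```

Because the difference enters as a square, halving the detectable difference roughly quadruples the required sample size.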
[Figure: Continuous outcome. Power, by difference between two groups.]
Determining Sample Size: Discrete Example
• APT070 perfusion vs. cold storage of kidney
  • Testing for superiority
• Endpoint: Delayed Graft Function after transplantation
  • Proportion of patients experiencing delayed graft function
• Relevant parameters
  • Baseline prevalence: 35%
  • Minimum clinically significant difference: 10%
  • p1 = 0.35, p2 = 0.25 [proportion with delayed graft function in each group]
  • Significance level = 0.05
  • Power = 80%
• Calculate sample size required
Result: 349 patients in each group
The calculator at http://hedwig.mgh.harvard.edu/sample_size/size.html reports:
With 349 patients on treatment A and 349 patients on treatment B there will be an 80% chance of detecting a significant difference at a two-sided 0.05 significance level. This assumes that the response rate of treatment A is 0.35 and the response rate of treatment B is 0.25.
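The figure of 349 per group is matched by the standard two-proportion formula with a continuity correction. The sketch below is one way to reproduce it; that the calculators apply exactly this formula is an assumption, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Patients per group to detect p1 vs p2 with a two-sided test (equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                      # pooled proportion under H0
    diff = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Continuity correction for the chi-squared / Fisher-type test
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)

# Delayed graft function example: 35% vs 25%
print(n_per_group_proportions(0.35, 0.25))                    # 349
print(n_per_group_proportions(0.35, 0.25, continuity=False))  # 329
```

The uncorrected answer is somewhat smaller, which is one reason two tools can give similar but not identical sample sizes.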
[Figure: Discrete outcome.]
How to use power calculations
• Use power prospectively for planning future studies
  – Determine an appropriate sample size
  – Evaluate a planned study: will it yield useful information?
• Put science before statistics.
  – Use effect sizes that are clinically relevant
  – Don’t get distracted by statistical considerations
• Perform a pilot study
  – Helps establish procedures, understand and protect against the unexpected
  – Gives variance estimates needed in determining sample size
Design: What is your Hypothesis?
2. Equivalence
Objective: To demonstrate that two treatments have no clinically meaningful difference
H0: The two Txs’ effects are different with respect to the mean response: μA − μB ≤ −d or μA − μB ≥ d
H1: The two Txs are equal with respect to the mean response: −d < μA − μB < d
d = largest difference clinically acceptable
Design: What is your Hypothesis?
3. Non-Inferiority
Objective: To demonstrate that a given treatment is not clinically inferior to another
H0: A given Tx is inferior with respect to the mean response: μA − μB ≤ −d
H1: A given Tx is non-inferior with respect to the mean response: μA − μB > −d
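One common way to see how the three designs differ is to compare a confidence interval for μA − μB with zero and with the margin d. The sketch below is illustrative only: it assumes a normal approximation with a known standard error, a one-sided level of 0.025 (a 95% interval), and that a larger mean response is better. The function and argument names are hypothetical.

```python
from statistics import NormalDist

def trial_conclusions(diff, se, margin, alpha_one_sided=0.025):
    """diff = estimated mean(A) - mean(B); se = its standard error; margin = d > 0."""
    z = NormalDist().inv_cdf(1 - alpha_one_sided)
    lower, upper = diff - z * se, diff + z * se   # interval for muA - muB
    return {
        # Superiority design: reject muA = muB if the interval excludes 0
        "difference": lower > 0 or upper < 0,
        # Equivalence: the whole interval lies inside (-d, d)
        "equivalence": -margin < lower and upper < margin,
        # Non-inferiority: the lower limit lies above -d
        "non_inferiority": lower > -margin,
    }

print(trial_conclusions(diff=0.5, se=1.0, margin=3.0))
print(trial_conclusions(diff=-4.0, se=1.0, margin=3.0))
```

In the first call the interval sits inside (−d, d), so equivalence and non-inferiority are both concluded without a significant difference; in the second, A is significantly worse than B and non-inferiority fails.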
QUIZ
Assume 80% power, α = 0.05, two-sided.
How many subjects? Which study needs the largest sample size?
For each question, answer: (x) more with A, (y) more with B, (z) the same.

     Endpoint        Study A                      Study B
1.   Mortality       20% vs 10%                   20% vs 15%
2.   Mortality       20% vs 10%                   40% vs 30%
3.   Diastolic BP    80 vs 85 mmHg, St. dev 10    90 vs 95 mmHg, St. dev 10
4.   Diastolic BP    80 vs 85 mmHg, St. dev 10    80 vs 85 mmHg, St. dev 8
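ANSWERS
1. B: A smaller difference needs more subjects
2. B: Bigger effect size in A (a twofold difference in mortality). The smaller the effect, the larger the sample size needed to detect it
3. Same: The difference and the standard deviation are the same in both studies; the mean level does not matter
4. A: A bigger standard deviation needs more subjects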
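The quiz answers can be checked with the usual normal-approximation formulas. This is a sketch assuming equal group sizes, 80% power, two-sided α = 0.05, and no continuity correction for the proportions; the helper names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

# z for two-sided alpha = 0.05 plus z for 80% power
Z = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)

def n_props(p1, p2):
    """Per-group n for two proportions (unpooled variance, no correction)."""
    return ceil(Z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

def n_means(m1, m2, sd):
    """Per-group n for two means with common standard deviation sd."""
    return ceil(2 * (sd * Z / (m1 - m2)) ** 2)

quiz = [
    (n_props(0.20, 0.10), n_props(0.20, 0.15)),   # 1. mortality
    (n_props(0.20, 0.10), n_props(0.40, 0.30)),   # 2. mortality
    (n_means(80, 85, 10), n_means(90, 95, 10)),   # 3. diastolic BP
    (n_means(80, 85, 10), n_means(80, 85, 8)),    # 4. diastolic BP
]
answers = ["A" if a > b else "B" if b > a else "Same" for a, b in quiz]
print(answers)  # ['B', 'B', 'Same', 'A']
```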