Sample Size
Robert F. Woolson, Ph.D.
Department of Biostatistics, Bioinformatics & Epidemiology
Joint Curriculum
NIH Study Section
• Significance
• Approach
• Innovation
• Environment
• Investigators
Approach
• Feasibility
• Study Design: Controls, Interventions
• Study Size: Sample Size, Power
• Data Analysis
Sample Size
# of Animals
# of Measurement Sites/Animal
# of Replications
Sample Size
• What #s are proposed?
• Are the #s adequate?
• Is there a compelling rationale for the adequacy?
• Do we need more?
• Can we answer the questions with fewer?
Sample Size
A simple question to ask; the answer may involve:
• Assumptions
• Pilot data
• Simplification of the overall aims to a single question
Simplification
• What Is The Question?
• What Is The Primary Outcome Variable?
• What Is The Principal Hypothesis?
Pilot Data
• Relationship To Question
• Relationship To Primary Variable
• Relationship To Hypothesis
Sample Size/Power Freeware on the Web:
• http://www.stat.uiowa.edu/~rlenth/Power/
• http://hedwig.mgh.harvard.edu/sample_size/size.html
• http://www.bio.ri.ccf.org/power.html
• http://www.dartmouth.edu/~chance/
Sample Size
Purchase Software:
• http://www.powerandprecision.com/
• Nquery: www.statsolusa.com
Animal Studies
• Differences usually large
• Variability usually small
• Small sample sizes
• Many groups
• Repeated measures
Sample Size (# Animals Required)
Excerpts from the MUSC Vertebrate Animal Review Application Form:
“A power analysis or other statistical justification is required where appropriate. Where the number of animals required is dictated by other than statistical considerations… justify the number… on this basis.”
Sample Size: Ethical Issues in Animal Studies
Ethical issues:
• A study that is too large implies some animals are needlessly sacrificed
• A study that is too small implies a potential for misleading conclusions and unnecessary experimentation
Mann MD, Crouse DA, Prentice ED. Appropriate animal numbers in biomedical research in light of animal welfare considerations. Laboratory Animal Science, 1991, 41:
Ethical Issues (Continued)
Human studies: the same rationale holds for studies that are too large or too small.
Sample Size: Specifying the Hypothesis
Specifying the hypothesis:
• difference from control?
• differences among groups over time?
• differences among groups at a particular point in time?
A “non-hypothesis”:
• Animals in Group A will do better than animals in Group B
Sample Size: Specifying the Hypothesis
Ho: Mean blood pressure on drug A = mean blood pressure on drug B measured six hours after start of treatment.
Ha: Mean blood pressure on drug A < mean blood pressure on drug B measured six hours after start of treatment.
Example (SHR: spontaneously hypertensive rats)
Animal blood pressures measured at baseline
Animals randomly assigned to placebo or minoxidil
Animals measured 6 hours post treatment
Changes from baseline calculated for each animal
Example (Continued)
• Placebo changes thought to be centered at 0
• Expect minoxidil to lower blood pressure, we think by 10 mm Hg
• Blood pressure changes have a standard deviation of 5 mm Hg
If the two blood-pressure-change distributions:
1. have the same standard deviation (say 5 mm Hg), and
2. have different means, μ1 (−10) and μ2 (0),
then their distributions might look like the following:
Example (Continued)
How many animals/group are needed to have 90% power to detect the 10 mm Hg mean difference?
How would this sample size change if the standard deviation were 10 mm Hg rather than 5 mm Hg?
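Both questions on this slide can be checked with the standard normal-approximation formula, n per group = 2σ²(Zα + Zβ)²/Δ². A minimal sketch (`n_per_group` is an illustrative helper, not from the slides; a two-sided α = 0.05 is assumed):

```python
from math import ceil
from statistics import NormalDist  # standard library (Python 3.8+)

def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Illustrative normal-approximation sample size per group for a
    two-sample comparison of means (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 1.28 for 90% power
    return ceil(2 * sigma**2 * (z_alpha + z_beta)**2 / delta**2)

print(n_per_group(delta=10, sigma=5))   # -> 6 animals/group
print(n_per_group(delta=10, sigma=10))  # -> 22 animals/group
```

Doubling the standard deviation roughly quadruples the required sample size, since σ enters the formula as σ².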
Example (Continued)
Suppose we change the endpoint to a binary one: did the animal achieve a reduction in blood pressure of 10 mm Hg or more?
Then 50% of those on minoxidil would be expected to have a reduction of 10 or more.
About 2.5% of those on placebo would have a reduction of 10 or more.
Example (Continued)
How many animals/group are required to have 90% power to detect 50% vs. 2.5%?
Why the difference in sample sizes for the same experiment? Comment on:
• Assumptions
• Endpoint
• Specific hypothesis
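For the binary-endpoint version of this question (50% vs. 2.5% responders), the standard two-proportion normal-approximation formula applies. A sketch (`n_per_group_binary` is an illustrative helper; a two-sided α = 0.05 and a pooled variance under Ho are assumed):

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard library (Python 3.8+)

def n_per_group_binary(p1, p2, alpha=0.05, power=0.90):
    """Illustrative normal-approximation sample size per group for
    comparing two proportions (two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under Ho
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

print(n_per_group_binary(0.50, 0.025))  # -> 16 animals/group
```

Dichotomizing the continuous blood-pressure change discards information, so the binary endpoint needs more animals per group than the continuous one; that is part of the answer to “why the difference.”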
Sample Size: Distribution of Response
Nominal/binary (Binomial)
• dead, alive
Ordinal (Non-parametric)
• inflammation (mild, moderate, severe)
Continuous (Normal*)
• blood pressure
*may require transformation
Sample Size: Distribution of Response
Binomial:
• N is a function of the probability of response in controls and the probability of response in treated animals
Normal:
• N is a function of the difference in means and the standard deviation
Sample Size: One-Sample or Two-Sample Test
One sample:
• Change from baseline in one group
• Comparison to a standard (historical controls)
Two sample:
• Two independent study groups
Sample Size: One- or Two-Sided Test
One-sided test:
• Ha: Δ > 0
• Ha: Δ < 0
Two-sided test:
• Ha: Δ ≠ 0
(where Δ is the difference between group means)
Sample Size: Choosing α
α = probability of a Type I error: the probability of rejecting Ho when Ho is true; the significance level, usually 0.01 or 0.05.
• “Calling an innocent person guilty”
• “Concluding two groups are different when they are not”
Sample Size: Choosing α (Continued)
• Multiple testing can lead to errors.
• For pre-specified hypotheses, adjustment may not be needed.
• If all pairwise comparisons are of interest, adjust (α / # of tests).
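The (α / # of tests) rule is the Bonferroni correction. For example, with 4 treatment groups (an assumed, illustrative number) and all pairwise comparisons:

```python
from math import comb

alpha = 0.05
k = 4                       # illustrative number of treatment groups
n_tests = comb(k, 2)        # all pairwise comparisons among k groups
adjusted = alpha / n_tests  # Bonferroni-adjusted per-test significance level
print(n_tests, round(adjusted, 4))  # -> 6 0.0083
```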
Sample Size: Choosing β
β = probability of a Type II error: the probability of failing to reject Ho when a true difference exists.
• “Calling a guilty person innocent”
• “Missing a true difference”
Power = 1 − β. Large clinical trials use power of 0.9 or 0.95; animal studies usually use 0.8 (80% power).
Sample Size: Power
Concluding that groups do not differ when power is low is risky: a true difference may have been missed.
80% power implies a 20% chance of missing a true difference.
40% power implies a 60% chance of missing a true difference.
Sample Size: Calculation
Calculate N:
• specify the difference to be detected
• specify the variability (continuous data)
OR calculate the detectable difference:
• specify N
• specify the variability (continuous) or the control %
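The second calculation, solving for the detectable difference at a fixed N, can be sketched by inverting the usual two-sample formula (`detectable_difference` is a hypothetical helper; a two-sided α = 0.05 and 80% power are assumed):

```python
from math import sqrt
from statistics import NormalDist  # standard library (Python 3.8+)

def detectable_difference(n, sigma, alpha=0.05, power=0.80):
    """Illustrative smallest mean difference detectable with n animals
    per group (normal approximation, two-sided two-sample test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) * sigma * sqrt(2 / n)

# With 10 animals/group and SD = 5 mm Hg (illustrative values):
print(round(detectable_difference(n=10, sigma=5), 2))  # -> 6.26 mm Hg
```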
Sample Size: Putting it all together
Continuous (Normal) Distribution
Need all but one of: α, β, σ², Δ, N
• Zα = 1.96 (2-sided, α = 0.05)
• Zβ = 1.645 (always one-sided; β = 0.05, i.e., 95% power)
• Δ = difference between the means
• σ² = pooled variance

n (per group) = 2σ²(Zα + Zβ)² / Δ²
[Figure: Power (0.0 to 1.0) as a function of the difference (P1 − P2), for α = 0.05, a one-sided test, N per group = 100, P1 = 0.5]
[Figure: Power (0 to 1) as a function of sample size per group (2 to 90), for α = 0.05, a one-sided test, P1 = 0.5, P2 = 0.3]
Nquery Advisor
• About $700
• Many more options than many other programs
• Available in the student room in our department
Nquery Advisor
Under “File” choose “New”. Choices:
• means
• proportions
• agreement
• survival (time to event)
• regression
• # of groups (1, 2, >2)
• testing, confidence intervals, equivalence
Examples
• Continuous response
• Binary response
Sample Size: More Than One Primary Response
Compute N for each primary response and use the largest sample size.
Sample Size: Food for Thought
• Is the detectable difference biologically meaningful?
• Is the sample size too small to be believable?
• N = 5 is a common “rule of thumb,” but is it valid for the experiment being planned?
Sample Size: Misunderstandings
• “The larger the difference, the smaller the sample size” ignores the contribution of variability.
• Failing to report power for a negative study:
• calculate it based on the hypothesized difference and the observed variability
Sample Size: Keeping It Small
Study a continuous rather than a binary outcome (if variability does not increase):
• change in tumor size instead of recurrence
Study a surrogate outcome where the effect is large:
• cholesterol reduction rather than mortality
Examples Of Surrogate Outcome Measures?
• Bone density
• Quality of life
• Patency
• Pain relief
• Functional status
• Cholesterol
Sample Size: Keeping It Small
Decrease variability:
• change from baseline or analysis of covariance
• training
• equipment
• choice of animal model
Sample Size: Keeping It Small
α = 0.05, 2-sided test; β = 0.2 (power = 0.8, 80%)
Difference between the two means = 1
• Standard deviation = 2: N = 64/group
• Standard deviation = 1: N = 17/group
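As a check, plugging these values into the normal-approximation formula n = 2σ²(Zα + Zβ)²/Δ² gives 63 and 16 per group; the slide's 64 and 17 are the slightly larger values from t-distribution-based calculations, as software such as nQuery reports:

```python
from math import ceil
from statistics import NormalDist  # standard library (Python 3.8+)

z_a = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05
z_b = NormalDist().inv_cdf(0.80)   # 80% power
# n per group = 2 * sigma^2 * (z_a + z_b)^2 / delta^2, with delta = 1
n = {sigma: ceil(2 * sigma**2 * (z_a + z_b)**2) for sigma in (2, 1)}
print(n)  # -> {2: 63, 1: 16} (normal approximation; t-based values are 64 and 17)
```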
Sample Size Estimation
• Parameters are estimates
• Estimates of relative effectiveness are based on other populations
• Effectiveness is often overstated
• Patients in trials do better
• Mathematical models are assumed
• Compromise between available resources and objectives
Sample Size: Pilot Studies
• No information on variability
• No information on efficacy
• Use an effect size from similar studies, or gather pilot data for estimation
Additional Comments
Pilot Studies
Complication rate:
• P = 1 − (1 − r)^N, where r = complication rate and N = sample size
• If the desired P and N are known, solve for r
• If the desired r and P are known, solve for N
Example to Work
Want to have 90% probability of detecting at least one complication, given a 25% complication rate. What N do you need?
You are studying 25 people and want an 80% probability of detecting at least one complication. What complication rate would yield this probability?
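Both exercises follow from rearranging P = 1 − (1 − r)^N. A sketch with hypothetical helper names:

```python
from math import ceil, log

def n_needed(P, r):
    """Smallest N giving probability >= P of observing at least one
    complication when the true complication rate is r."""
    return ceil(log(1 - P) / log(1 - r))

def rate_detectable(P, N):
    """Complication rate r giving probability P of observing at least
    one complication among N subjects."""
    return 1 - (1 - P) ** (1 / N)

print(n_needed(0.90, 0.25))                 # -> 9 subjects
print(round(rate_detectable(0.80, 25), 3))  # -> 0.062 (about a 6% rate)
```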
Pilot Studies
• Use a larger α (>0.05, e.g., 0.15 or 0.2) to compute the sample size
• If the null hypothesis is rejected, test it in a future study
• Underlying concept: futility; ensure the new treatment is not worse than the standard
Pilot Studies
Can reformulate the hypothesis:
• Ho: new treatment = placebo
• Ha: new treatment < placebo
• Continue to a larger study if Ho is not rejected
Avoid Data Driven Comparisons
[Figure: outcomes (y-axis 0-80) for Group 1 and Group 2 plotted at months 1-4, with one time point marked “Test here”]
Randomization: Bias Due to Order of Observations
• Learning effect
• Change in laboratory techniques
• Different litters
• Carry-over effects
• underestimate of carry-over
• two treatments in the same animal: give A & B; can only test the effect of B after A
Randomization: Order Effects Continued
System fatigue:
• a rabbit heart’s ability to function after two different treatments
Randomization: Order Effects Continued
Seasonal variability:
• All rats male, same weight, same age; media temperature and other incubation conditions identical; housed in identical conditions
• Outcome: unstimulated renin release from kidneys (in vitro), sampled at 30 minutes
• Outcome: metastasis; winter 16% (n=767) vs. summer 8% (n=142); logistic regression p<0.03 for season