Sample Size Robert F. Woolson, Ph.D. Department of Biostatistics, Bioinformatics & Epidemiology Joint Curriculum


Sample Size

Robert F. Woolson, Ph.D.
Department of Biostatistics, Bioinformatics & Epidemiology
Joint Curriculum


NIH Study Section

• Significance
• Approach
• Innovation
• Environment
• Investigators


Approach

• Feasibility
• Study Design: Controls, Interventions
• Study Size: Sample Size, Power
• Data Analysis


Sample Size

• # of Animals
• # of Measurement Sites/Animal
• # of Replications


Sample Size

• What #s are Proposed?
• Adequacy of #s?
• Compelling Rationale for Adequacy?
• Do We Need More?
• Can We Answer Questions With Fewer?


Sample Size

• Simple Question to Ask
• Answer May Involve:
  • Assumptions
  • Pilot Data
  • Simplification of Overall Aims to a Single Question


Simplification

• What Is The Question?
• What Is The Primary Outcome Variable?
• What Is The Principal Hypothesis?


Pilot Data

• Relationship To Question
• Relationship To Primary Variable
• Relationship To Hypothesis


Sample Size/Power Freeware on Web:

• http://www.stat.uiowa.edu/~rlenth/Power/
• http://hedwig.mgh.harvard.edu/sample_size/size.html
• http://www.bio.ri.ccf.org/power.html
• http://www.dartmouth.edu/~chance/


Sample Size

• Purchase Software
  • http://www.powerandprecision.com/
  • Nquery: www.statsolusa.com


Animal Studies

• Differences usually large
• Variability usually small
• Small sample sizes
• Many groups
• Repeated measures


Sample Size (# Animals Required)

Excerpts from the MUSC Vertebrate Animal Review Application Form:

“A power analysis or other statistical justification is required where appropriate. Where the number of animals required is dictated by other than statistical considerations… justify the number… on this basis.”


Sample Size: Ethical Issues in Animal Studies

• Ethical Issues
  • Study too large implies some animals needlessly sacrificed
  • Study too small implies potential for misleading conclusions, unnecessary experimentation

Mann MD, Crouse DA, Prentice ED. Appropriate animal numbers in biomedical research in light of animal welfare considerations. Laboratory Animal Science, 1991, 41:


Ethical Issues Cont.

Human studies: the same rationale holds for studies that are too large or too small.


Sample Size: Specifying the Hypothesis

• Specifying the hypothesis
  • difference from control?
  • differences among groups over time?
  • differences among groups at a particular point in time?
• A “non-hypothesis”
  • Animals in Group A will do better than animals in Group B


Sample Size: Specifying the Hypothesis

Ho: Mean blood pressure on drug A = mean blood pressure on drug B measured six hours after start of treatment.

Ha: Mean blood pressure on drug A < mean blood pressure on drug B measured six hours after start of treatment.


Example (SHR)

Animal blood pressures measured at baseline

Animals randomly assigned to placebo or minoxidil

Animals measured 6 hours post treatment

Changes from baseline calculated for each animal


Example (Continued)

Placebo changes thought to be centered at 0

Expect minoxidil to lower blood pressure, we think by 10 mm Hg

Blood pressure changes have a standard deviation of 5 mm Hg


If two blood pressure change distributions
1. have the same standard deviation (say 5 mm Hg), and
2. have different means μ1 (−10) and μ2 (0),
then their distributions might look like the following:


Example (Continued)

How many animals/group needed to have 90 % power to detect the 10 mm Hg mean difference?

How would this sample size change if the standard deviation is 10 mm Hg rather than 5 mm Hg?
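One way to approach these two questions is the normal-approximation formula for comparing two means, n = 2σ²(Zα + Zβ)²/Δ² per group. The sketch below assumes a one-sided test at α = 0.05 (the example states a one-sided Ha but gives no α, so that value is an assumption) and uses a hypothetical helper name, `n_per_group_means`. Software that works from the t distribution will typically report slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.90, two_sided=False):
    """Normal-approximation sample size per group for comparing two means.

    delta: difference in means to detect; sd: common standard deviation.
    alpha = 0.05 one-sided is an assumption, not a value taken from the slides.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    # Round up: a fractional animal is not possible.
    return ceil(2 * sd**2 * (z_alpha + z_beta) ** 2 / delta**2)


n_sd5 = n_per_group_means(delta=10, sd=5)    # standard deviation 5 mm Hg
n_sd10 = n_per_group_means(delta=10, sd=10)  # standard deviation 10 mm Hg
print(n_sd5, n_sd10)
```

Under these assumptions the approximation gives about 5 animals per group when the standard deviation is 5 mm Hg and about 18 when it is 10 mm Hg: doubling the standard deviation multiplies the unrounded sample size by four.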


Example (Continued)

Suppose we change the endpoint to a binary one: did the animal achieve a reduction in blood pressure of 10 mm Hg or more?

Then 50% of those on minoxidil would be expected to have a reduction of 10 or more.

About 2.5% of those on placebo would have a reduction of 10 or more.
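The two percentages follow from the normal model already assumed for the changes (means of −10 and 0, standard deviation 5). A minimal check, assuming the changes are exactly normally distributed:

```python
from statistics import NormalDist

SD = 5.0            # assumed standard deviation of the change (mm Hg)
THRESHOLD = -10.0   # a change of -10 mm Hg or lower counts as a responder

# Probability that an animal's change is at or below the threshold.
p_minoxidil = NormalDist(mu=-10.0, sigma=SD).cdf(THRESHOLD)
p_placebo = NormalDist(mu=0.0, sigma=SD).cdf(THRESHOLD)
print(round(p_minoxidil, 3), round(p_placebo, 3))
```

The minoxidil probability is exactly one half because the threshold sits at the group mean. The placebo threshold is two standard deviations below its mean, which gives roughly 2.3%; the "about 2.5%" figure corresponds to the usual two-standard-deviation rule of thumb.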


Example (Continued)

How many animals/group required to have 90% power to detect the 50% vs. 2.5%?

Why the difference in sample sizes for the same experiment? Comment on:
• Assumptions
• Endpoint
• Specific hypothesis
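A rough answer for the binary endpoint can be sketched with a common normal-approximation formula for two proportions. The one-sided α = 0.05 is an assumption, the helper name `n_per_group_proportions` is hypothetical, and with a proportion as small as 2.5% the approximation is crude, so an exact method may give a somewhat different number.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.90):
    """One-sided normal-approximation sample size per group for two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # pooled proportion under the null hypothesis
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


n_binary = n_per_group_proportions(0.50, 0.025)
print(n_binary)
```

This comes to about 13 animals per group. Under the same one-sided α = 0.05 assumption, the corresponding normal-approximation calculation for the continuous endpoint (Δ = 10, σ = 5) gives about 5 per group, so dichotomizing the response discards information and raises the required sample size.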


Sample Size: Distribution of Response

• Nominal/binary (Binomial)
  • dead, alive
• Ordinal (Non-parametric)
  • inflammation (mild, moderate, severe)
• Continuous (Normal*)
  • blood pressure

* may require transformation


Sample Size: Distribution of Response

• Binomial
  • N is a function of probability of response in control and probability of response in treated animals
• Normal
  • N is a function of difference in means and standard deviation


Sample Size: One Sample or 2-sample Test

• One sample
  • Change from baseline in one group
  • Comparison to standard (historical controls)
• Two sample
  • Two independent study groups


Sample Size: One or Two sided test

• One sided test:
  • Ha: a > 0
  • Ha: a < 0
• Two sided test
  • Ha: a ≠ 0


Sample Size: Choosing α

• α = probability of Type I error
• probability of rejecting Ho when Ho is true
• significance level, usually 0.01 or 0.05
• “calling an innocent person guilty”
• “concluding two groups are different when they are not”


Sample Size: Choosing α

• Multiple testing can lead to errors.
• Pre-specified hypotheses: may not need to adjust.
• If all pairwise comparisons are of interest, adjust (α / # tests).
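Dividing α by the number of tests is the Bonferroni adjustment. As a small illustration, assuming a hypothetical study with four groups in which every pair is compared:

```python
from math import comb


def bonferroni_alpha(n_groups, overall_alpha=0.05):
    """Return (number of pairwise tests, per-test alpha) for all pairwise comparisons."""
    n_tests = comb(n_groups, 2)  # k groups give k(k-1)/2 pairs
    return n_tests, overall_alpha / n_tests


# Four groups is an illustrative choice, not a value from the slides.
n_tests, alpha_per_test = bonferroni_alpha(4)
print(n_tests, round(alpha_per_test, 4))
```

With four groups there are six pairwise tests, so each would be run at about 0.0083 rather than 0.05. The smaller per-test α in turn increases the sample size needed for a given power.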


Sample Size: Choosing β

• β = probability of Type II error; probability of failing to reject Ho when a true difference exists.
• “Calling guilty person innocent”
• “Missing a true difference”
• Power = 1 − β
• Large clinical trials use power of 0.9 or 0.95; animal studies usually use 0.8 (80% power).


Sample Size: Power

Concluding groups do not differ when power is low is risky. True difference may have been missed.

80% power implies a 20% chance of missing a true difference.

40% power implies a 60% chance of missing a true difference.


Sample Size: Calculation

• Calculate N
  • specify difference to be detected
  • specify variability (continuous data)
• OR calculate detectable difference:
  • specify N
  • specify variability (continuous) or control %


Sample Size: Putting it all together

• Continuous (Normal) Distribution
• Need all but one: α, β, σ², Δ, N
  • Zα = 1.96 (2 sided, 0.05)
  • Zβ = 1.645 (always one-sided, 0.05, 95% power)
• Δ = difference between means
• σ² = pooled variance

$$n = \frac{2\sigma^2 (Z_\alpha + Z_\beta)^2}{\Delta^2} \quad \text{(n per group)}$$
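Because any one of the quantities can be solved for once the others are fixed, the same normal-approximation relationship can be rearranged to give power or the detectable difference for a fixed n. The sketch below uses hypothetical function names and illustrative inputs (Δ = 10, σ = 5, n = 5 per group, one-sided α = 0.05); none of these choices is prescribed by the slides.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def _z_alpha(alpha, two_sided):
    return Z.inv_cdf(1 - alpha / 2) if two_sided else Z.inv_cdf(1 - alpha)


def power_two_means(delta, sd, n, alpha=0.05, two_sided=False):
    """Approximate power with n per group (ignores the far tail if two-sided)."""
    se = sd * sqrt(2 / n)  # standard error of the difference in means
    return Z.cdf(abs(delta) / se - _z_alpha(alpha, two_sided))


def detectable_difference(sd, n, alpha=0.05, power=0.90, two_sided=False):
    """Smallest difference in means detectable with the stated power."""
    return (_z_alpha(alpha, two_sided) + Z.inv_cdf(power)) * sd * sqrt(2 / n)


power_n5 = power_two_means(delta=10, sd=5, n=5)
delta_n5 = detectable_difference(sd=5, n=5)
print(round(power_n5, 3), round(delta_n5, 2))
```

With these inputs the approximation gives power of roughly 0.93 and a detectable difference of a little over 9 mm Hg at 90% power.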


[Figure: power (vertical axis, 0.0 to 1.0) versus difference (P1 − P2); α = 0.05, one-sided test, N per group = 100, P1 = 0.5]


[Figure: power (vertical axis, 0 to 1) versus sample size (horizontal axis, 2 to 90); α = 0.05, one-sided test, P1 = 0.5, P2 = 0.3]
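A curve of power against sample size for P1 = 0.5 and P2 = 0.3 can be approximated with the normal-approximation power formula for two proportions. This is a sketch only: the function name is hypothetical, the sample sizes are treated as per-group values (an assumption), and the original plot may have been produced by a different method.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate one-sided power for comparing two proportions, n per group."""
    z_alpha = Z.inv_cdf(1 - alpha)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)           # SE under Ho
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)     # SE under Ha
    return Z.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


# Sample sizes taken from the horizontal axis of the figure.
sample_sizes = [2, 5, 10, 15, 20, 25, 30, 35, 40, 45,
                50, 55, 60, 65, 70, 75, 80, 90]
curve = {n: power_two_proportions(0.5, 0.3, n) for n in sample_sizes}
for n, pw in curve.items():
    print(f"n = {n:2d} per group: power = {pw:.2f}")
```

Under this approximation power rises steadily with n and is still below 0.9 at 90 per group, which illustrates how slowly power accumulates for a 20-percentage-point difference in proportions.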


Nquery Advisor

• About $700
• Many more options than many other programs
• Available in student room in our department


Nquery Advisor

• Under “file” choose “New”
• Choices
  • means
  • proportions
  • agreement
  • survival (time to event)
  • regression
• # groups (1, 2, >2)
• testing, confidence intervals, equivalence


Examples

• Continuous response
• Binary response



Sample Size: More Than One Primary Response

Compute the sample size required for each primary response and use the largest.


Sample Size: Food for Thought

Is detectable difference biologically meaningful?

Is sample size too small to be believable?

N = 5 is a common “rule of thumb,” but is it valid for the experiment being planned?


Sample Size: Misunderstandings

• “Larger the difference, smaller the sample size” ignores contribution of variability
• Failing to report power for negative study
  • calculate based on hypothesized difference and observed variability


Sample Size: Keeping It Small

• Study continuous rather than binary outcome (if variability does not increase)
  • change in tumor size instead of recurrence
• Study surrogate outcome where effect is large
  • cholesterol reduction rather than mortality


Examples Of Surrogate Outcome Measures?

• Bone density
• Quality of life
• Patency
• Pain relief
• Functional Status
• Cholesterol


Sample Size: Keeping It Small

• Decrease variability
  • Change from baseline or analysis of covariance
  • training
  • equipment
  • choice of animal model


Sample Size: Keeping It Small

• α = 0.05, 2-sided test
• β = 0.2; power = 0.8 (80%)
• Difference between two means = 1
• Standard deviation = 2; N = 64/group
• Standard deviation = 1; N = 17/group
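These two sample sizes can be checked approximately with the normal-approximation formula for two means, taking Z = 1.960 for two-sided α = 0.05 and Z = 0.842 for 80% power:

$$n \approx \frac{2\sigma^2 (Z_{\alpha/2} + Z_\beta)^2}{\Delta^2} = \frac{2\,(2)^2\,(1.960 + 0.842)^2}{1^2} \approx 62.8 \quad (\sigma = 2)$$

$$n \approx \frac{2\,(1)^2\,(1.960 + 0.842)^2}{1^2} \approx 15.7 \quad (\sigma = 1)$$

Rounded up, the approximation gives 63 and 16 per group. The quoted 64 and 17 are one higher, which is what a calculation based on the t distribution would be expected to produce; that explanation is an assumption, since the slides do not say how the numbers were obtained. Either way, halving the standard deviation cuts the required sample size to roughly a quarter.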


Sample Size Estimation

• Parameters are estimates
• Estimate of relative effectiveness based on other populations
• Effectiveness overstated
• Patients in trials do better
• Assuming mathematical models
• Compromise between available resources/objectives


Sample Size: Pilot Studies

• No information on variability
• No information on efficacy
• Use effect size from similar studies or gather pilot data for estimation



Additional Comments


Pilot Studies

• Complication rate
  • P = 1 − (1 − r)^N, where P = probability of observing at least one complication, r = complication rate, N = sample size
  • If know desired P and N, can solve for r
  • If know desired r and P, can solve for N


Example to Work

Want to have 90% probability of detecting at least one complication, given a 25% complication rate. What N do you need?

You are studying 25 people and want 80% probability of detecting at least one complication. What is the complication rate that would yield this probability?
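Both exercises follow from rearranging P = 1 − (1 − r)^N. The sketch below solves for N and for r; the function names are hypothetical, and independence between subjects is assumed.

```python
from math import ceil, log


def n_for_detection(r, p):
    """Smallest N with at least probability p of seeing one or more complications."""
    # From p = 1 - (1 - r)**N, N = log(1 - p) / log(1 - r), rounded up.
    return ceil(log(1 - p) / log(1 - r))


def rate_for_detection(n, p):
    """Complication rate r at which N subjects give probability p of at least one event."""
    return 1 - (1 - p) ** (1 / n)


n_needed = n_for_detection(r=0.25, p=0.90)
r_needed = rate_for_detection(n=25, p=0.80)
print(n_needed, round(r_needed, 4))
```

For the first exercise this gives N = 9, since 8 subjects yield a probability just under 90%. For the second, the complication rate is about 6.2%.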


Pilot Studies

• Use larger alpha (>0.05, e.g. 0.15 or 0.2) to compute sample size
  • If reject null hypothesis, will test in future study
• Underlying concept: futility; ensure new treatment not worse than standard.


Pilot Studies

• Can reformulate hypothesis
  • Ho: new treatment = placebo
  • Ha: new treatment < placebo
  • Continue to larger study if fail to reject Ho.


Avoid Data Driven Comparisons

[Figure: Group 1 and Group 2 plotted over months 1 to 4 (vertical axis 0 to 80), with one time point marked “Test here”]


Randomization: Bias Due to Order of Observations

• Learning effect
• Change in laboratory techniques
• Different litters
• Carry-over effects
  • underestimate carry-over
  • two treatments, same animal: give A & B; can only test effect of B after A
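Random assignment is the usual guard against these order effects, because it breaks any link between the order in which animals arrive or are measured and the treatment they receive. A minimal sketch, assuming ten animals identified by number and two treatments labelled A and B (all hypothetical):

```python
import random


def randomize_assignment(animal_ids, treatments=("A", "B"), seed=None):
    """Shuffle the animals, then deal treatments in turn so groups stay balanced."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    ids = list(animal_ids)
    rng.shuffle(ids)
    return {animal: treatments[i % len(treatments)] for i, animal in enumerate(ids)}


# The seed value is arbitrary; record it so the allocation can be audited.
assignment = randomize_assignment(range(1, 11), seed=2024)
for animal in sorted(assignment):
    print(animal, assignment[animal])
```

The same idea applies to the order of measurement: shuffling the processing order within each session keeps learning effects or changes in laboratory technique from lining up with one treatment group.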


Randomization: Order Effects Continued

• System fatigue
  • rabbit heart’s ability to function after two different treatments


Randomization: Order Effects Continued

• Seasonal variability
  • All rats male, same weight, same age, media temperature and other incubation conditions identical, housed in identical conditions
  • Outcome: unstimulated renin release from kidneys (in vitro), samples at 30 minutes
  • Outcome: metastasis; winter 16% (n=767), summer 8% (n=142); logistic regression p<0.03 for season