SAMPLE SIZE CALCULATION

Melchor V.G. Frias, IV
Clinical Epidemiology Unit
Angelo King Medical Research Center
De La Salle Health Sciences Institute


Page 2: Sample Size Doc Frias

Learning Objectives:

At the end of this session, learners should be able to:

1. Explain the concept and importance of sample size,

2. Explain and apply the concept of hypothesis testing,

3. Apply sample size formulas for descriptive and analytic studies,

4. Identify the requirements for sample size calculation,

5. Apply OpenEpi/Epi Info for sample size calculation for cross-sectional, cohort, case-control and experimental studies.

Page 3: Sample Size Doc Frias

How many subjects are to be included in the sample?

SAMPLE SIZE CALCULATION
Why calculate?

• for planning purposes
• for the "power" of the study (a study with low power has little chance of detecting a statistically significant difference)
• for meaningful results (with a small sample, a non-significant result fails to establish that the intervention has no appreciable effect)

Page 4: Sample Size Doc Frias

How do we calculate sample size?

♦ Using formulas
♦ Using tables of sample sizes
♦ Using statistical calculators (StatCalc of Epi Info, OpenEpi)

Page 5: Sample Size Doc Frias

Sample size calculation

Things to know:
• type of study: descriptive or analytic?
• proportions or means?
• usual values?
• amount of deviation from the true value?
• clinically important difference?
• confidence level?
• power?
• one-tailed or two-tailed hypotheses?

Page 6: Sample Size Doc Frias

Hypotheses testing

The first thing to do when given a claim is to write the claim mathematically (if possible), and decide whether the given claim is the null or the alternative hypothesis.

Page 7: Sample Size Doc Frias

Hypotheses testing

If the given claim contains equality, or a statement of no change from the given or accepted condition, then it is the null hypothesis; otherwise, if it represents change, it is the alternative hypothesis.

Page 8: Sample Size Doc Frias

Hypotheses testing

• hypothesis -- a statement about the population
• null hypothesis (H0) -- equality
• alternative hypothesis (Ha) --
    two-tailed -- not equal
    one-tailed -- one is greater than the other

Page 9: Sample Size Doc Frias

Hypotheses testing

"He's dead, Jim," said Dr. McCoy to Captain Kirk.

Page 10: Sample Size Doc Frias

Hypotheses testing

Mr. Spock, as the science officer, is put in charge of statistically determining the correctness of Bones' statement and deciding the fate of the crew member (to vaporize or to try to revive).

Page 11: Sample Size Doc Frias

Hypotheses testing

• His first step is to arrive at the hypothesis to be tested.
• Does the statement represent a change from the previous condition?

Yes, there is change; thus it is the alternative hypothesis, H1.

No, there is no change; therefore it is the null hypothesis, H0.

Page 12: Sample Size Doc Frias

Hypotheses testing

The correct answer is that there is change. Dead represents a change from the accepted state of alive. The null hypothesis always represents no change. Therefore, the hypotheses are:

H0 : Patient is alive.

H1 : Patient is not alive (dead).

Page 13: Sample Size Doc Frias

Hypotheses testing

Possible states of nature (based on H0)

Patient is alive (H0 true - H1 false)

Patient is dead (H0 false - H1 true)

Page 14: Sample Size Doc Frias

Hypotheses testing

Decisions are something that you have control over. You may make a correct decision or an incorrect decision. Whether your decision is correct or in error depends on the state of nature.

Page 15: Sample Size Doc Frias

Hypotheses testing

Possible decisions (based on H0) / conclusions (based on the claim)

Reject H0 / "Sufficient evidence to say patient is dead"

Fail to reject H0 / "Insufficient evidence to say patient is dead"

Page 16: Sample Size Doc Frias

Hypotheses testing

There are four possibilities that can occur, based on the two possible states of nature and the two decisions which we can make.

Page 17: Sample Size Doc Frias

Hypotheses testing

Statisticians never accept the null hypothesis; we only fail to reject it. In other words, we will say that it isn't true, or that we don't have enough evidence to say that it isn't true, but we will never say that it is true, because someone else might come along with another sample which shows that it isn't, and we don't want to be wrong.

Page 18: Sample Size Doc Frias

Hypotheses testing - Statistically speaking:

State of Nature

Decision | H0 True | H0 False
Reject H0 | Patient is alive, sufficient evidence of death | Patient is dead, sufficient evidence of death
Fail to reject H0 | Patient is alive, insufficient evidence of death | Patient is dead, insufficient evidence of death

Page 19: Sample Size Doc Frias

Hypotheses testing - In English (or Klingon?)

State of Nature

Decision | H0 True | H0 False
Reject H0 | Vaporize a live person | Vaporize a dead person
Fail to reject H0 | Try to revive a live person | Try to revive a dead person

Page 20: Sample Size Doc Frias

Hypotheses testing - Were you right?

State of Nature

Decision | H0 True | H0 False
Reject H0 | Type I error (alpha) | Correct assessment
Fail to reject H0 | Correct assessment | Type II error (beta)


Page 22: Sample Size Doc Frias

Hypotheses testing

Which of the two errors is more serious? Type I or Type II?

State of Nature

Decision | H0 True | H0 False
Reject H0 | Patient is alive, sufficient evidence of death: vaporize a live person | Correct assessment
Fail to reject H0 | Correct assessment | Patient is dead, insufficient evidence of death: revive a dead person

Page 23: Sample Size Doc Frias

Hypotheses testing

Disease actually present

Diagnosis | No | Yes
Disease present | Mis-diagnosis | Correct diagnosis
Disease absent | Correct diagnosis | Missed diagnosis

Page 24: Sample Size Doc Frias

Hypotheses testing

Assumption of innocence

Judgment | True | False
Pronounced guilty | Serious error in judgment | Correct judgment
Pronounced not guilty | Correct judgment | Error in judgment

Page 25: Sample Size Doc Frias

Hypotheses testing

Since Type I is (usually) the more serious error, that is the one we concentrate on.

We usually pick alpha to be very small (0.05, 0.01).

Note: alpha is not a Type I error. Alpha is the probability of committing a Type I error. Likewise, beta is the probability of committing a Type II error.
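The reading of alpha as a probability can be checked by simulation. The sketch below is not part of the original slides: it assumes a two-tailed z-test with a known standard deviation of 1, and the function name `type_i_error_rate`, the sample size, the number of trials and the seed are illustrative choices only.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=4000, seed=1):
    """Estimate how often a true H0 (mean = 0) is rejected by a two-tailed z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 is true (mean 0, sd 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * (n ** 0.5)  # z statistic with known sd = 1
        if abs(z) > z_crit:
            rejections += 1  # H0 rejected although it is true: a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

Because H0 is true in every simulated sample, each rejection is a Type I error, and the observed rate should settle near the chosen alpha.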

Page 26: Sample Size Doc Frias

Hypotheses testing

Conclusions

Conclusions are sentence answers which include whether there is enough evidence or not (based on the decision), the level of significance, and whether the original claim is supported or rejected.

Page 27: Sample Size Doc Frias

Hypotheses testing

Conclusions

Conclusions are based on the original claim, which may be the null or the alternative hypothesis. The decisions are always based on the null hypothesis.

Page 28: Sample Size Doc Frias

Hypotheses testing - Conclusions

Decision | Original claim is H0 ("REJECT") | Original claim is H1 ("SUPPORT")
Reject H0 ("SUFFICIENT") | There is sufficient evidence at the alpha level of significance to reject the claim that (insert original claim here) | There is sufficient evidence at the alpha level of significance to support the claim that (insert original claim here)
Fail to reject H0 ("INSUFFICIENT") | There is insufficient evidence at the alpha level of significance to reject the claim that (insert original claim here) | There is insufficient evidence at the alpha level of significance to support the claim that (insert original claim here)
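The wording rules in the conclusions table can be written as a small lookup. This is only an illustrative sketch; the function name `conclusion` and its parameters are hypothetical and do not come from the slides.

```python
def conclusion(claim_is_null, reject_h0, claim, alpha=0.05):
    """Build the conclusion sentence from the original claim and the decision."""
    # The decision on H0 determines the amount of evidence.
    evidence = "sufficient" if reject_h0 else "insufficient"
    # A claim that is H0 can only be rejected; a claim that is H1 can only be supported.
    verb = "reject" if claim_is_null else "support"
    return (f"There is {evidence} evidence at the {alpha} level of "
            f"significance to {verb} the claim that {claim}.")

print(conclusion(claim_is_null=False, reject_h0=True, claim="the patient is dead"))
```

Note that the decision fixes the word "sufficient" or "insufficient", while the kind of claim fixes the verb, which mirrors the rows and columns of the table.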

Page 29: Sample Size Doc Frias

Definitions

Null Hypothesis (H0)
• Statement of zero or no change.
• If the original claim includes equality (<=, =, or >=), it is the null hypothesis.
• If the original claim does not include equality (<, not equal, >), then the null hypothesis is the complement of the original claim.
• The null hypothesis always includes the equal sign.
• The decision is based on the null hypothesis.

Page 30: Sample Size Doc Frias

Definitions

Alternative Hypothesis (H1 or Ha)
• Statement which is true if the null hypothesis is false.
• The type of test (left, right, or two-tail) is based on the alternative hypothesis.

Page 31: Sample Size Doc Frias

Definitions

One-Tailed (Sided) Test

Page 32: Sample Size Doc Frias

Definitions

Two-Tailed (Sided) Test

Page 33: Sample Size Doc Frias

Definitions

Type I error
• Rejecting the null hypothesis when it is true (saying false when true).
• Usually the more serious error.

Type II error
• Failing to reject the null hypothesis when it is false (saying true when false).

Page 34: Sample Size Doc Frias

Definitions

alpha (α) - probability of committing a Type I error
1-α - the confidence level

beta (β) - probability of committing a Type II error
1-β - power of the study; ability to detect a true difference

Page 35: Sample Size Doc Frias

Definitions

Significance level (alpha)
• The probability of rejecting the null hypothesis when it is true.
• alpha = 0.05 and alpha = 0.01 are common.
• If no level of significance is given, use alpha = 0.05.
• The level of significance is the complement of the level of confidence in estimation.

Page 36: Sample Size Doc Frias

Confidence level, Power

Usual Values:

α = 0.05, 1-α (confidence level) = 0.95

β = 0.20, 1-β (power) = 0.80

Page 37: Sample Size Doc Frias

Confidence level, Power

The easiest ways to increase power are to:

• increase the sample size
• increase the desired difference (or effect size)
• relax the significance level, i.e. use a larger alpha (e.g. 10% instead of 5%)
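The effect of each of these three levers can be illustrated numerically. The following is a minimal sketch under stated assumptions, not a method taken from the slides: it uses the normal approximation for a two-tailed comparison of two means with equal group sizes and a common standard deviation, and the function name `approx_power` and the input values are illustrative.

```python
from statistics import NormalDist

def approx_power(n, delta, s, alpha=0.05):
    """Approximate power of a two-tailed comparison of two means, n subjects per group."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    standard_error = s * (2 / n) ** 0.5  # SE of the difference between two group means
    # The very small contribution of the opposite tail is ignored.
    return norm.cdf(delta / standard_error - z_alpha)

base = approx_power(n=50, delta=5, s=10)
print(f"baseline:          {base:.2f}")
print(f"larger sample:     {approx_power(n=100, delta=5, s=10):.2f}")
print(f"larger difference: {approx_power(n=50, delta=8, s=10):.2f}")
print(f"alpha = 10%:       {approx_power(n=50, delta=5, s=10, alpha=0.10):.2f}")
```

Each of the three changes raises the power above the baseline; only the first leaves both the research question and the Type I error risk untouched.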

Page 38: Sample Size Doc Frias

Definitions

Decision
• A statement based upon the null hypothesis.
• It is either "reject the null hypothesis" or "fail to reject the null hypothesis".
• We will never accept the null hypothesis.

Page 39: Sample Size Doc Frias

Definitions

Conclusion
• A statement which indicates the level of evidence (sufficient or insufficient), at what level of significance, and whether the original claim is rejected (null) or supported (alternative).

Page 40: Sample Size Doc Frias

How do we calculate sample size?

A.J. Dobson's formula (SIMPLE RANDOM SAMPLE)

• descriptive studies
    population proportion
    population mean
• analytic studies
    comparing two proportions
    comparing two means

Page 41: Sample Size Doc Frias

Sample size for descriptive studies

1. Estimation of a population proportion

$$ n = \frac{p\,(100 - p)}{\Delta^{2}}\; f(1-\alpha) $$

where
n = computed sample size
p = estimate of the proportion (in percent)
Δ = the desired half-width of the confidence interval (in percentage points), i.e. the estimate is to fall within ±Δ of the true value
1-α = confidence level

Page 42: Sample Size Doc Frias

Sample size for descriptive studies

1. Estimation of a population proportion

Table 1 Values for f(1-α) for various confidence levels 100(1-α)%

(1-α) | 0.8 | 0.9 | 0.95 | 0.99
f(1-α)* | 1.642 | 2.706 | 3.842 | 6.635

* f(1-α) is the square of the upper α/2 point of the std. Normal distribution
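The entries of Table 1 can be reproduced from the footnote's definition. The sketch below uses Python's standard `statistics.NormalDist`; the function name `f_confidence` is an illustrative choice, and the results may differ from the table in the third decimal because the table appears to square rounded z values (for example 1.96).

```python
from statistics import NormalDist

def f_confidence(confidence):
    """Square of the upper alpha/2 point of the standard normal distribution."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point
    return z ** 2

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(f_confidence(level), 3))
```

This makes it possible to obtain f for confidence levels that the table does not list.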


Page 45: Sample Size Doc Frias

Sample size for descriptive studies

1. Estimation of a population proportion

A researcher wants to estimate the smoking prevalence among high school students. What is the sample size if the smoking prevalence is expected to be 15%, and a 95% confidence interval of ±4% (11-19%) will be used?

$$ n = \frac{p\,(100 - p)}{\Delta^{2}}\; f(1-\alpha) $$

$$ n = \frac{15\,(100 - 15)}{4^{2}} \times 3.842 \approx 306 $$
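The arithmetic of this worked example can be reproduced in a few lines. This is a minimal sketch of the slide's formula; the function name `n_for_proportion` is illustrative, and the value 3.842 is read from Table 1 for 95% confidence.

```python
def n_for_proportion(p, delta, f):
    """n = p(100 - p) f / delta^2, with p and delta expressed in percent."""
    return p * (100 - p) * f / delta ** 2

n = n_for_proportion(p=15, delta=4, f=3.842)
print(round(n))  # the slide reports 306
```

The unrounded value is slightly above 306, so a more conservative convention of always rounding up would give 307.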

Page 46: Sample Size Doc Frias

Sample size for descriptive studies

2. Estimation of a population mean

$$ n = \frac{s^{2}}{\Delta^{2}}\; f(1-\alpha) $$

where
n = computed sample size
s = estimate of the standard deviation of the observations
Δ = the desired half-width of the confidence interval, i.e. the estimate is to fall within ±Δ of the true value
1-α = confidence level


Page 50: Sample Size Doc Frias

Sample size for descriptive studies

2. Estimation of a population mean

A researcher wants to estimate the mean serum cholesterol level (mg/100ml) in a group of men. How many men should be included if he wants to be 90% confident that the estimate of the mean will fall within 10 mg/100ml of the true value, and the standard deviation is estimated to be 40 mg/100ml?

$$ n = \frac{s^{2}}{\Delta^{2}}\; f(1-\alpha) $$

$$ n = \frac{40^{2}}{10^{2}} \times 2.706 \approx 43 $$
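The same calculation for a mean can be sketched as follows; the function name `n_for_mean` is illustrative, and 2.706 is the Table 1 value for 90% confidence.

```python
def n_for_mean(s, delta, f):
    """n = s^2 f / delta^2, with s and delta in the units of the measurement."""
    return s ** 2 * f / delta ** 2

n = n_for_mean(s=40, delta=10, f=2.706)
print(round(n))  # the slide reports 43
```

Halving the allowed deviation to 5 mg/100ml would multiply the required sample size by four, since delta enters the formula squared.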

Page 51: Sample Size Doc Frias

Sample size for analytic studies

1. Hypothesis testing between two proportions

$$ n = \frac{p_1(100 - p_1) + p_2(100 - p_2)}{(p_1 - p_2)^{2}}\; f(\alpha,\beta) $$

where
n = computed sample size (in each group)
p1, p2 = estimates of the proportion in each group (in percent)
1-α = confidence level
1-β = power of the test

Page 52: Sample Size Doc Frias

Sample size for analytic studies

1. Hypothesis testing between two proportions

Table 2 Values for f(α,β)*

Power, 1-β | one-tailed, α = 0.05 | one-tailed, α = 0.01 | two-tailed, α = 0.05 | two-tailed, α = 0.01
0.5 | 2.71 | 5.41 | 3.84 | 6.63
0.8 | 6.18 | 10.04 | 7.85 | 11.68
0.9 | 8.56 | 13.02 | 10.51 | 14.88

* f(α,β) is the square of the sum of the upper β point and the upper α point (for a one-tailed test) or the upper α/2 point (for a two-tailed test) of the std. Normal distribution
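Table 2 can likewise be regenerated from its footnote. The sketch below uses the standard-library `statistics.NormalDist`; the function name `f_alpha_beta` and its `two_tailed` flag are illustrative, not names from the slides.

```python
from statistics import NormalDist

def f_alpha_beta(alpha, power, two_tailed=True):
    """Square of the sum of the upper alpha (or alpha/2) and upper beta normal points."""
    norm = NormalDist()
    tail = alpha / 2 if two_tailed else alpha
    z_alpha = norm.inv_cdf(1 - tail)
    z_beta = norm.inv_cdf(power)  # upper beta point, since beta = 1 - power
    return (z_alpha + z_beta) ** 2

for power in (0.5, 0.8, 0.9):
    row = [f_alpha_beta(a, power, two_tailed=t)
           for t in (False, True) for a in (0.05, 0.01)]
    print(power, [round(value, 2) for value in row])
```

Each printed row follows the column order of Table 2: one-tailed 0.05, one-tailed 0.01, two-tailed 0.05, two-tailed 0.01.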


Page 55: Sample Size Doc Frias

Sample size for analytic studies

1. Hypothesis testing between two proportions

A new antibiotic is to be compared to a standard drug with respect to the cure rate of urinary tract infection. The new drug will be considered better than the standard drug if it shows a 5% difference from the standard cure rate of 80%. How many patients are needed if the investigator wants 90% power and 95% confidence?

$$ n = \frac{p_1(100 - p_1) + p_2(100 - p_2)}{(p_1 - p_2)^{2}}\; f(\alpha,\beta) $$

With f(α,β) = 8.56 (one-tailed test, α = 0.05, power = 0.90):

$$ n = \frac{80\,(100 - 80) + 85\,(100 - 85)}{(80 - 85)^{2}} \times 8.56 \approx 984 $$
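The two-proportion example can be checked with a short function. This is a sketch of the slide's formula; the function name `n_two_proportions` is illustrative, and 8.56 is the Table 2 value for a one-tailed test at α = 0.05 with 90% power.

```python
def n_two_proportions(p1, p2, f):
    """n per group = [p1(100 - p1) + p2(100 - p2)] f / (p1 - p2)^2, p in percent."""
    return (p1 * (100 - p1) + p2 * (100 - p2)) * f / (p1 - p2) ** 2

n = n_two_proportions(p1=80, p2=85, f=8.56)
print(round(n))  # the slide reports 984
```

Because the difference between the proportions is squared in the denominator, a small clinically important difference such as 5% drives the sample size up sharply.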

Page 56: Sample Size Doc Frias

Sample size for analytic studies

2. Hypothesis testing between two means

$$ n = \frac{2 s^{2}}{\delta^{2}}\; f(\alpha,\beta) $$

where
n = computed sample size (in each group)
s = estimate of the standard deviation of the observations, assuming it is the same for each group
δ = the true difference between the means
1-α = confidence level
1-β = power


Page 60: Sample Size Doc Frias

Sample size for analytic studies

2. Hypothesis testing between two means

To determine whether an antihypertensive therapy can reduce the average blood pressure of some group by 5 mmHg when the standard deviation is 10 mmHg, how many patients are needed for a two-tailed test at the 5% significance level and a power of 90%?

$$ n = \frac{2 s^{2}}{\delta^{2}}\; f(\alpha,\beta) $$

$$ n = \frac{2\,(10)^{2}}{5^{2}} \times 10.51 \approx 84 $$
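The blood pressure example reduces to the following calculation. The function name `n_two_means` is illustrative; 10.51 is the Table 2 value for a two-tailed test at α = 0.05 with 90% power.

```python
def n_two_means(s, delta, f):
    """n per group = 2 s^2 f / delta^2, assuming a common standard deviation s."""
    return 2 * s ** 2 * f / delta ** 2

n = n_two_means(s=10, delta=5, f=10.51)
print(round(n))  # the slide reports 84
```

Only the ratio of the standard deviation to the difference matters here, so any pair with s/δ = 2 gives the same answer.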

Page 61: Sample Size Doc Frias

Sample size calculation using EPI-Info6

http://www.cdc.gov/epiinfo/Epi6/ei6.htm

STATCALC program

Page 62: Sample Size Doc Frias

Sample size for analytic studies

2. Hypothesis testing between two means

Two antianemia treatment groups are to be compared in terms of the outcome hemoglobin level. What is the sample size needed if the expected mean hemoglobin level after treatment for group A is 132.86 with a standard deviation of 15.34, and the mean hemoglobin level for group B is 127.44 with a standard deviation of 18.23?
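The slide formula for two means assumes a common standard deviation, whereas this example gives a different one for each group. One common generalisation for equal group sizes replaces 2s² with the sum of the two variances. The sketch below uses that generalisation and assumes a two-tailed test at α = 0.05 with 80% power (f = 7.85 from Table 2); the slide states neither value, the function name is illustrative, and a calculator such as STATCALC or OpenEpi may give a somewhat different figure.

```python
import math

def n_two_means_unequal_sd(mean1, sd1, mean2, sd2, f):
    """n per group = (sd1^2 + sd2^2) f / (mean1 - mean2)^2, equal group sizes."""
    delta = mean1 - mean2  # expected difference between the group means
    return (sd1 ** 2 + sd2 ** 2) * f / delta ** 2

# f = 7.85 is an assumption: two-tailed alpha = 0.05 and 80% power.
n = n_two_means_unequal_sd(132.86, 15.34, 127.44, 18.23, f=7.85)
print(math.ceil(n), "subjects per group (rounded up)")
```

When both standard deviations are equal, the expression reduces to the slide's 2s² form.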

Page 63: Sample Size Doc Frias

http://www.openepi.com/Menu/OpenEpiMenu.htm

Page 65: Sample Size Doc Frias

Sample size for analytic studies

Case Control Study

Research question: Is there an association between receiving HRT and development of breast CA among women in Dasmarinas, Cavite?

Odds of exposure among diseased = 175/75 = 2.3
Odds of exposure among non-diseased = 25/225 = 0.11

Odds Ratio = 21
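The odds ratio above, together with the percentage of exposure among controls that a sample size calculator asks for, follows directly from the four counts. A minimal sketch, with illustrative function and parameter names:

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

or_estimate = odds_ratio(175, 75, 25, 225)
pct_exposed_controls = 100 * 25 / (25 + 225)
print(round(or_estimate, 1))   # 21.0
print(pct_exposed_controls)    # 10.0
```

The counts are those shown on the slide: 175 exposed and 75 unexposed among the diseased, 25 exposed and 225 unexposed among the non-diseased.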

Page 66: Sample Size Doc Frias

You need to have an estimate of the percentage of exposure among the controls and either the odds ratio or the percentage of exposure among cases

Page 67: Sample Size Doc Frias

Sample size for analytic studies

Cohort Study

Research question: Is Hib vaccine associated with the development of leukemia among children in Dasmarinas, Cavite?

Incidence of disease among exposed = 150/500 = 0.3
Incidence of disease among unexposed = 400/500 = 0.8

Relative Risk = 0.375
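The relative risk is the ratio of the two incidences. A minimal sketch with illustrative names, using the counts from the slide:

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence among the exposed divided by incidence among the unexposed."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return incidence_exposed / incidence_unexposed

rr = relative_risk(150, 500, 400, 500)
print(round(rr, 3))  # 0.375
```

The incidence among the unexposed (here 80%) is the baseline percentage that the sample size calculation for a cohort study starts from.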

Page 68: Sample Size Doc Frias

You need to know the percentage of outcome among the unexposed, and either an OR, RR or the percentage of the outcome among the exposed.


Page 70: Sample Size Doc Frias

Calculate sample size: RCT

Example: Efficacy of flubendazole compared to mebendazole in the treatment of trichuriasis among pediatric patients.

Objective: To compare resolution of trichuriasis in pediatric patients given flubendazole and in those given mebendazole.

Flubendazole group (Exposed): (+) resolution or (-) resolution; expected (+) resolution = 75%

Mebendazole group (Unexposed): (+) resolution or (-) resolution; expected (+) resolution = 50%

Page 72: Sample Size Doc Frias

50% with resolution in mebendazole group

75% with resolution in flubendazole group
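As a rough cross-check on a calculator result, the two-proportion formula from earlier in the deck can be applied to these expected resolution rates. The sketch assumes a two-tailed test at α = 0.05 with 80% power (f = 7.85 from Table 2); the slides do not state these settings for this example, the function name is illustrative, and OpenEpi or Epi Info may report different numbers because they can use other formulas or corrections.

```python
import math

def n_two_proportions(p1, p2, f):
    """n per group = [p1(100 - p1) + p2(100 - p2)] f / (p1 - p2)^2, p in percent."""
    return (p1 * (100 - p1) + p2 * (100 - p2)) * f / (p1 - p2) ** 2

# Assumed settings: two-tailed alpha = 0.05, power = 80%, hence f = 7.85.
n = n_two_proportions(p1=75, p2=50, f=7.85)
print(math.ceil(n), "patients per group (rounded up)")
```

The large expected difference (75% versus 50%) is why this trial needs far fewer patients per group than the antibiotic example with a 5% difference.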

Page 74: Sample Size Doc Frias

General comments on estimation of sample size

Compute the sample size as early as possible during the design phase, to estimate the resources required and the feasibility of the study.

The rarer the condition being investigated, the larger the sample size, all other things being equal.

Complex data analysis generally requires larger samples than simple analysis.

In general, longitudinal studies require a larger sample size than case-control and cross sectional studies.

Page 75: Sample Size Doc Frias

General comments on estimation of sample size

The higher the level of accuracy and precision desired for the resulting estimates, the larger the sample size necessary.

When more than 1 item or outcome are to be studied, sample sizes are estimated separately for each item. The final sample size will be a compromise between the largest n and the resources to conduct the study.

Page 76: Sample Size Doc Frias

Summary

Explained the concept and importance of sample size,

Explained and applied the concept of hypothesis testing,

Applied sample size formulas for descriptive and analytic studies,

Identified the requirements for sample size calculation,

Introduced OpenEpi/Epi Info for application in sample size calculation for cross-sectional, cohort, case-control and experimental studies.

Page 77: Sample Size Doc Frias

Summary

Statistical inference allows us to generalize sample results to the target population.

Sample size is based on:
• the research objectives/design
• sample estimates and variability from previous studies
• power and level of confidence
• operational constraints (time, resources)