Sampling, Statistics, Sample Size, Power


Page 1: Sampling, Statistics and Sample Size

Sampling, Statistics, Sample Size, Power

Page 2: Sampling, Statistics and Sample Size

Course Overview

1. What is evaluation?

2. Measuring impacts (outcomes, indicators)

3. Why randomize?

4. How to randomize?

5. Sampling and sample size

6. Threats and Analysis

7. Cost-Effectiveness Analysis

8. Project from Start to Finish

Page 3: Sampling, Statistics and Sample Size

Our Goal in This Lecture: From Sample to Population

1. To understand how samples and populations are related

1. Population: all people who meet certain criteria. Ex: the population of all 3rd graders in India who take a certain exam

2. Sample: a subset of the population. Ex: 1000 3rd graders in India who take a certain exam

We want the sample to tell us something about the overall population

Specifically, we want a sample from the treatment and a sample from the control to tell us something about the true effect size of an intervention in a population

2. To build intuition for setting the optimal sample size for your study

This will help us confidently detect a difference between treatment and control

Page 4: Sampling, Statistics and Sample Size

Lecture Outline

1. Basic Statistics Terms

2. Sampling variation

3. Law of large numbers

4. Central limit theorem

5. Hypothesis testing

6. Statistical inference

7. Power

Page 5: Sampling, Statistics and Sample Size

Lesson 1: Basic Statistics

To understand how to interpret data, we need to understand three basic concepts: What is a distribution? What’s an average result? What is a standard deviation?

Page 6: Sampling, Statistics and Sample Size

What is a Distribution?

A distribution graph or table shows each possible outcome and the frequency with which we observe that outcome

A probability distribution is the same as a distribution, but converts frequency to probability

Page 7: Sampling, Statistics and Sample Size

Baseline Test Scores

[Figure: histogram of baseline test scores; x-axis: test scores (0 to 100), y-axis: frequency (0 to 500)]

Page 8: Sampling, Statistics and Sample Size

What’s the Average Result?

What is the “expected result” (i.e. the average)?

Expected Result=the sum of all possible values each multiplied by the probability of its occurrence
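The definition above can be sketched in a few lines of Python. The distribution below is hypothetical (the scores and probabilities are made up so that the mean matches the 26 used in the slides), and `expected_value` is an illustrative helper name.

```python
# Hypothetical probability distribution of test scores: {score: probability}.
# These values are illustrative only and are not taken from the lecture data.
score_probabilities = {10: 0.2, 20: 0.3, 30: 0.3, 45: 0.2}


def expected_value(distribution):
    """Sum of every possible value multiplied by the probability of its occurrence."""
    total_probability = sum(distribution.values())
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in distribution.items())


mean_score = expected_value(score_probabilities)
print(mean_score)
```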

Page 9: Sampling, Statistics and Sample Size

[Figure: histogram of baseline test scores with the mean marked; x-axis: test scores (0 to 100), y-axis: frequency (0 to 500)]

Mean = 26

Page 10: Sampling, Statistics and Sample Size

[Figure: population distribution with the population mean marked]

Mean = 26

Page 11: Sampling, Statistics and Sample Size

What’s a Standard Deviation?

Standard deviation: Measure of dispersion in the population

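It is a weighted average distance from the mean that gives more weight to the points furthest from the mean (distances are squared before they are averaged, and the square root is taken at the end)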
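As a rough illustration of the standard deviation, the sketch below computes it for a hypothetical probability distribution (made-up scores, not the lecture data). Squaring the distances is what gives the far-away points extra weight.

```python
import math

# Hypothetical probability distribution of test scores: {score: probability}.
score_probabilities = {10: 0.2, 20: 0.3, 30: 0.3, 45: 0.2}


def mean(distribution):
    """Probability-weighted average of the possible values."""
    return sum(value * prob for value, prob in distribution.items())


def standard_deviation(distribution):
    """Square root of the probability-weighted average squared distance to the mean."""
    mu = mean(distribution)
    variance = sum(prob * (value - mu) ** 2 for value, prob in distribution.items())
    return math.sqrt(variance)


sd = standard_deviation(score_probabilities)
print(round(sd, 2))
```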

Page 12: Sampling, Statistics and Sample Size

Standard Deviation = 20

[Figure: histogram of test scores with the mean (26) marked and a band of 1 standard deviation shown; x-axis: test scores (0 to 100), y-axis: frequency]

Page 13: Sampling, Statistics and Sample Size

Lecture Outline

1. Basic Statistics Terms

2. Sampling variation

3. Law of large numbers

4. Central limit theorem

5. Hypothesis testing

6. Statistical inference

7. Power

Page 14: Sampling, Statistics and Sample Size

Our Goal in This Lecture: From Sample to Population

1. To understand how samples and populations are related

1. Population: all people who meet certain criteria. Ex: the population of all 3rd graders in India who take a certain exam

2. Sample: a subset of the population. Ex: 1000 3rd graders in India who take a certain exam

We want the sample to tell us something about the overall population

Specifically, we want a sample from the treatment and a sample from the control to tell us something about the true effect size of an intervention in a population

2. To build intuition for setting the optimal sample size for your study

This will help us confidently detect a difference between treatment and control

Page 15: Sampling, Statistics and Sample Size

Sampling Variation: Example

We want to know the average test score of grade 3 children in Springfield

How many children would we need to sample to get an accurate picture of the average test score?

Page 16: Sampling, Statistics and Sample Size

Population: Test Scores of all 3rd Graders

[Figure: histogram of the population of test scores]

Page 17: Sampling, Statistics and Sample Size

Mean of Population is 26 (true mean)

[Figure: population distribution with the population mean marked]

Page 18: Sampling, Statistics and Sample Size

Pick a Sample of 20 Students: Plot Frequency

[Figure: population distribution and population mean, with a sample of 20 students and its sample mean overlaid]

Page 19: Sampling, Statistics and Sample Size

Zooming in on Sample of 20 Students

[Figure: close-up of the sample of 20 students, showing the sample mean against the population mean]

Page 20: Sampling, Statistics and Sample Size

Pick a Different Sample of 20 Students

[Figure: a different sample of 20 students, showing the sample mean against the population mean]

Page 21: Sampling, Statistics and Sample Size

Another Sample of 20 Students

[Figure: another sample of 20 students, showing the sample mean against the population mean]

Page 22: Sampling, Statistics and Sample Size

Sampling Variation: Definition

Sampling variation is the variation we get between different estimates (e.g. mean of test scores) due to the fact that we do not test everyone but only a sample

Sampling variation depends on:

• The variation in test scores in the underlying population

• The number of people we sample
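A small simulation can make both factors concrete. This is only a sketch: the two populations are synthetic (generated here, not the lecture's data), and `sample_means` is an illustrative helper.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Two hypothetical populations of 10,000 test scores on a 0-100 scale:
# one widely spread (skewed toward low scores) and one tightly bunched around 26.
wide_population = [min(100, int(rng.expovariate(1 / 26))) for _ in range(10_000)]
narrow_population = [min(100, max(0, round(rng.gauss(26, 5)))) for _ in range(10_000)]


def sample_means(population, sample_size, draws, rng):
    """Draw many random samples and return the mean of each one."""
    return [statistics.fmean(rng.sample(population, sample_size)) for _ in range(draws)]


# Sampling variation = how much the sample mean moves from one sample to the next.
spread_wide_20 = statistics.stdev(sample_means(wide_population, 20, 500, rng))
spread_wide_100 = statistics.stdev(sample_means(wide_population, 100, 500, rng))
spread_narrow_20 = statistics.stdev(sample_means(narrow_population, 20, 500, rng))

print("wide population, n=20:  ", round(spread_wide_20, 2))
print("wide population, n=100: ", round(spread_wide_100, 2))
print("narrow population, n=20:", round(spread_narrow_20, 2))
```

The spread of the sample means should shrink both when the population itself is less variable and when each sample is larger.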

Page 23: Sampling, Statistics and Sample Size

What if our Population, Instead of Looking Like This…

[Figure: population distribution with the population mean marked]

Page 24: Sampling, Statistics and Sample Size

…Looked Like This

[Figure: a second population distribution with the population mean marked]

Page 25: Sampling, Statistics and Sample Size

Standard Deviation: Population I

Measure of dispersion in the population

[Figure: population I with the population mean marked and 1 standard deviation shown on either side]

Page 26: Sampling, Statistics and Sample Size

Standard Deviation: Population II

[Figure: population II with the population mean marked and 1 standard deviation shown on either side]

Page 27: Sampling, Statistics and Sample Size

Different Samples of 20 Give Similar Estimates

[Figure: sample of 20 students, showing the sample mean against the population mean]

Page 28: Sampling, Statistics and Sample Size

Different Samples of 20 Give Similar Estimates

[Figure: a second sample of 20 students, showing the sample mean against the population mean]

Page 29: Sampling, Statistics and Sample Size

Different Samples of 20 Give Similar Estimates

[Figure: a third sample of 20 students, showing the sample mean against the population mean]

Page 30: Sampling, Statistics and Sample Size

Lecture Outline

1. Basic Statistics Terms

2. Sampling variation

3. Law of large numbers

4. Central limit theorem

5. Hypothesis testing

6. Statistical inference

7. Power

Page 31: Sampling, Statistics and Sample Size

[Figure: histogram of the population of test scores]

Page 32: Sampling, Statistics and Sample Size

Pick a Sample of 20 Students: Plot Frequency

[Figure: population distribution and population mean, with a sample of 20 students and its sample mean overlaid]

Page 33: Sampling, Statistics and Sample Size

Zooming in on Sample of 20 Students

[Figure: close-up of the sample of 20 students, showing the sample mean against the population mean]

Page 34: Sampling, Statistics and Sample Size

Pick a Different Sample of 20 Students

[Figure: a different sample of 20 students, showing the sample mean against the population mean]

Page 35: Sampling, Statistics and Sample Size

Another Sample of 20 Students

[Figure: another sample of 20 students, showing the sample mean against the population mean]

Page 36: Sampling, Statistics and Sample Size

Let's Pick a Sample of 50 Students

[Figure: sample of 50 students, showing the sample mean against the population mean]

Page 37: Sampling, Statistics and Sample Size

A Different Sample of 50 Students

[Figure: a different sample of 50 students, showing the sample mean against the population mean]

Page 38: Sampling, Statistics and Sample Size

A Third Sample of 50 Students

[Figure: a third sample of 50 students, showing the sample mean against the population mean]

Page 39: Sampling, Statistics and Sample Size

Let's Pick a Sample of 100 Students

[Figure: sample of 100 students, showing the sample mean against the population mean]

Page 40: Sampling, Statistics and Sample Size

Let's Pick a Different 100 Students

[Figure: a different sample of 100 students, showing the sample mean against the population mean]

Page 41: Sampling, Statistics and Sample Size

Let's Pick a Different 100 Students: What Do We Notice?

[Figure: a third sample of 100 students, showing the sample mean against the population mean]

Page 42: Sampling, Statistics and Sample Size

Law of Large Numbers

The more students you sample (so long as it is randomized), the closer most averages are to the true average (the distribution gets “tighter”)

When we conduct an experiment, we can feel confident that on average, our treatment and control groups would have the same average outcomes in the absence of the intervention
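The second point can be checked with a short simulation: randomly split students into two groups, apply no intervention, and see how far apart the two group means are. This is a sketch on a synthetic population; `average_gap` and the group sizes are illustrative choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 test scores (0-100), skewed toward low scores.
population = [min(100, int(rng.expovariate(1 / 26))) for _ in range(10_000)]


def average_gap(population, group_size, repetitions, rng):
    """Average absolute difference between the means of two randomly assigned
    groups when neither group receives any intervention."""
    gaps = []
    for _ in range(repetitions):
        # rng.sample returns the drawn students in random order,
        # so slicing the list in half is itself a random assignment.
        drawn = rng.sample(population, 2 * group_size)
        treatment, control = drawn[:group_size], drawn[group_size:]
        gaps.append(abs(statistics.fmean(treatment) - statistics.fmean(control)))
    return statistics.fmean(gaps)


gaps_by_size = {n: average_gap(population, n, 300, rng) for n in (20, 100, 1000)}
for n, gap in gaps_by_size.items():
    print(f"group size {n:>4}: typical gap between group means = {gap:.2f}")
```

With larger groups the typical gap shrinks toward zero, which is why a randomized comparison group becomes a better stand-in for the treatment group as the sample grows.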

Page 43: Sampling, Statistics and Sample Size

Lecture Outline

1. Basic Statistics Terms

2. Sampling variation

3. Law of large numbers

4. Central limit theorem

5. Hypothesis testing

6. Statistical inference

7. Power

Page 44: Sampling, Statistics and Sample Size

Central Limit Theorem

If we take many samples and estimate the mean many times, the frequency plot of our estimates (the sampling distribution) will resemble the normal distribution

This is true even if the underlying population distribution is not normal

Page 45: Sampling, Statistics and Sample Size

Population of Test Scores is not Normal

[Figure: histogram of the population of test scores, which is not bell shaped]

Page 46: Sampling, Statistics and Sample Size

Take the Mean of One Sample

[Figure: population distribution and population mean, with one sample and its sample mean overlaid]

Page 47: Sampling, Statistics and Sample Size

Plot That One Mean

[Figure: the sample mean from that one sample plotted against the population mean]

Page 48: Sampling, Statistics and Sample Size

Take Another Sample and Plot that Mean

[Figure: a second sample and its sample mean added to the plot of sample means]

Page 49: Sampling, Statistics and Sample Size

Repeat Many Times

[Figure: further samples drawn, with each sample mean added to the plot]

Page 50: Sampling, Statistics and Sample Size

Repeat Many Times

[Figure: further samples drawn, with each sample mean added to the plot]

Page 51: Sampling, Statistics and Sample Size

Repeat Many Times

[Figure: histogram of sample means building up as more samples are drawn]

Page 52: Sampling, Statistics and Sample Size

Repeat Many Times

[Figure: histogram of sample means building up as more samples are drawn]

Page 53: Sampling, Statistics and Sample Size

Repeat Many Times

[Figure: histogram of sample means building up as more samples are drawn]

Page 54: Sampling, Statistics and Sample Size

Repeat Many Times

[Figure: histogram of sample means building up as more samples are drawn]

Page 55: Sampling, Statistics and Sample Size

Distribution of Sample Means

[Figure: histogram of the sample means from all samples drawn]

Page 56: Sampling, Statistics and Sample Size

Normal Distribution

Page 57: Sampling, Statistics and Sample Size

Central Limit Theorem

The more samples you take, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution)

This result is INDEPENDENT of the underlying distribution

The mean of the distribution of the means will be the same as the mean of the population

The standard deviation of the sampling distribution will be the standard error (SE)
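These three claims can be checked numerically. The sketch below uses a synthetic, clearly non-normal population and compares the spread of the sample means with the usual formula SE = σ/√n (a standard result, not stated on the slide). The sample size of 100 and the 2,000 draws are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical, strongly skewed (non-normal) population of test scores.
population = [min(100, int(rng.expovariate(1 / 26))) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 100  # size of each sample
means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

mean_of_means = statistics.fmean(means)   # should sit close to the population mean
sd_of_means = statistics.stdev(means)     # the empirical standard error
predicted_se = pop_sd / math.sqrt(n)      # textbook formula: sigma / sqrt(n)

# For a normal distribution about 95% of values lie within 1.96 SE of the mean.
share_within = sum(abs(m - pop_mean) <= 1.96 * predicted_se for m in means) / len(means)

print(f"population mean {pop_mean:.2f}, mean of sample means {mean_of_means:.2f}")
print(f"sd of sample means {sd_of_means:.2f}, sigma/sqrt(n) {predicted_se:.2f}")
print(f"share of sample means within 1.96 SE: {share_within:.3f}")
```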

Page 58: Sampling, Statistics and Sample Size

Central Limit Theorem

The central limit theorem is crucial for statistical inference

Even if the underlying distribution is not normal, IF THE SAMPLE SIZE IS LARGE ENOUGH, we can treat the sampling distribution of the mean as being normally distributed

Page 59: Sampling, Statistics and Sample Size

THE Basic Questions in Statistics

How big does your sample need to be?

Why is this the ultimate question?

• How confident can you be in your results?

We need it to be large enough that both the law of large numbers and the central limit theorem can be applied

We need it to be large enough that we could detect a difference in the outcome of interest between the treatment and control samples

Page 60: Sampling, Statistics and Sample Size

Samples vs Populations

We have two different populations: treatment and comparison

We only see the samples: sample from the treatment population and sample from the comparison population

We will want to know if the populations are different from each other

We will compare sample means of treatment and comparison

We must take into account that different samples will give us different means (sampling variation)

Page 61: Sampling, Statistics and Sample Size

One Experiment, 2 Samples, 2 Means

[Figure: comparison and treatment sample distributions, each with its mean marked]

Page 62: Sampling, Statistics and Sample Size

Difference Between the Sample Means

[Figure: comparison mean and treatment mean; the gap between them is the estimated effect]

Page 63: Sampling, Statistics and Sample Size

What if we Ran a Second Experiment?

[Figure: comparison mean and treatment mean from a second experiment; the gap between them is the estimated effect]

Page 64: Sampling, Statistics and Sample Size

Many Experiments Give Distribution of Estimates

[Figure: histogram of estimated effects across experiments; x-axis: difference (-3 to 10), y-axis: frequency (0 to 100)]

Page 65: Sampling, Statistics and Sample Size

Many Experiments Give Distribution of Estimates

[Figure: histogram of estimated effects across experiments; x-axis: difference (-3 to 10), y-axis: frequency (0 to 100)]

Page 66: Sampling, Statistics and Sample Size

Many Experiments Give Distribution of Estimates

[Figure: histogram of estimated effects across experiments; x-axis: difference (-3 to 10), y-axis: frequency (0 to 100)]

Page 67: Sampling, Statistics and Sample Size

Many Experiments Give Distribution of Estimates

[Figure: histogram of estimated effects across experiments; x-axis: difference (-3 to 10), y-axis: frequency (0 to 100)]

Page 68: Sampling, Statistics and Sample Size

Many Experiments Give Distribution of Estimates

[Figure: histogram of estimated effects across experiments; x-axis: difference (-3 to 10), y-axis: frequency (0 to 100)]

Page 69: Sampling, Statistics and Sample Size

Many Experiments Give Distribution of Estimates

[Figure: histogram of estimated effects across experiments; x-axis: difference (-3 to 10), y-axis: frequency (0 to 100)]

Page 70: Sampling, Statistics and Sample Size

[Figure: histogram of estimated effects across experiments; x-axis: difference (-3 to 10), y-axis: frequency (0 to 100)]

What Does This Remind You Of?
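The build-up in the preceding slides can be imitated with a simulation that reruns the same experiment many times and collects the estimated effect from each run. Everything numeric here is an assumption for illustration: a true effect of 3 points, normally distributed scores with mean 26 and standard deviation 20, and 200 students per arm.

```python
import random
import statistics
from collections import Counter

rng = random.Random(3)  # fixed seed so the run is repeatable

TRUE_EFFECT = 3.0  # hypothetical true effect, in test-score points


def run_experiment(n_per_arm, rng):
    """Simulate one experiment and return treatment mean minus comparison mean."""
    comparison = [rng.gauss(26, 20) for _ in range(n_per_arm)]
    treatment = [rng.gauss(26 + TRUE_EFFECT, 20) for _ in range(n_per_arm)]
    return statistics.fmean(treatment) - statistics.fmean(comparison)


estimates = [run_experiment(200, rng) for _ in range(1_000)]

# Crude text histogram: one '#' per 5 experiments, bucketed to the nearest point.
counts = Counter(round(estimate) for estimate in estimates)
for value in sorted(counts):
    print(f"{value:>3} {'#' * (counts[value] // 5)}")

print("average estimate:", round(statistics.fmean(estimates), 2))
```

Individual estimates scatter around the true effect, and some land near or below zero even though the true effect is positive.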

Page 71: Sampling, Statistics and Sample Size

Hypothesis Testing

When we do impact evaluations we compare means from two different groups (the treatment and comparison groups)

Null hypothesis: the two means are the same and any observed difference is due to chance

• H0: treatment effect = 0

Research hypothesis: the true means are different from each other

• H1: treatment effect ≠ 0

Other possible tests

• H2: treatment effect > 0
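One common way to run such a test is a large-sample z-test on the difference in means, sketched below. It relies on the normal approximation from the central limit theorem; the data are simulated, the effect of 6 points is hypothetical, and `difference_in_means_test` is an illustrative name rather than a library function.

```python
import math
import random
import statistics
from statistics import NormalDist


def difference_in_means_test(treatment, comparison):
    """Large-sample z-test of H0: treatment effect = 0.

    Returns the estimated effect, its standard error, the z statistic,
    the two-sided p-value (H1: effect != 0) and the one-sided p-value (H2: effect > 0).
    """
    estimate = statistics.fmean(treatment) - statistics.fmean(comparison)
    standard_error = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(comparison) / len(comparison)
    )
    z = estimate / standard_error
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    p_one_sided = 1 - NormalDist().cdf(z)
    return estimate, standard_error, z, p_two_sided, p_one_sided


rng = random.Random(4)  # fixed seed so the run is repeatable
comparison = [rng.gauss(26, 20) for _ in range(1_000)]   # simulated comparison scores
treatment = [rng.gauss(32, 20) for _ in range(1_000)]    # hypothetical effect of 6 points

estimate, se, z, p_two, p_one = difference_in_means_test(treatment, comparison)
print(f"estimate {estimate:.2f}, SE {se:.2f}, z {z:.2f}")
print(f"two-sided p-value {p_two:.4f}, one-sided p-value {p_one:.4f}")
```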

Page 72: Sampling, Statistics and Sample Size

Distribution of Estimates if True Effect is Zero

Page 73: Sampling, Statistics and Sample Size

Distributions Under Two Alternatives

Page 74: Sampling, Statistics and Sample Size

We Don’t See These Distributions, Just our Estimate

Page 75: Sampling, Statistics and Sample Size

Is Our Estimate Consistent With the True Effect Being β*?

Page 76: Sampling, Statistics and Sample Size

If True Effect is β*, we Would Get our Estimate with Frequency A

Page 77: Sampling, Statistics and Sample Size

Is it also Consistent with the True Effect Being 0?

Page 78: Sampling, Statistics and Sample Size

If True Effect is 0, we Would Get our Estimate with Frequency A’

Page 79: Sampling, Statistics and Sample Size

Q: Which is More Likely, True Effect=β* or True Effect=0?

Page 80: Sampling, Statistics and Sample Size

A is Bigger than A’ so True Effect=β* is More Likely than True Effect=0

Page 81: Sampling, Statistics and Sample Size

But Can we Rule Out that True Effect=0?

Page 82: Sampling, Statistics and Sample Size

Is A’ so Small That True Effect=0 is Unlikely?

Page 83: Sampling, Statistics and Sample Size

Probability of an estimate at least as large as ours if true effect=0 is the area to the right of A’ over the total area under the H0 curve

Page 84: Sampling, Statistics and Sample Size

Critical Value

There is always a chance the true effect is zero, however large our estimated effect

Recollect that, traditionally, if the probability that we would get our estimate (or a larger one) if the true effect were 0 is less than 5%, we say we can reject that the true effect is zero

Definition: the critical value is the value of the estimated effect which exactly corresponds to the significance level

If we are testing whether the effect is bigger than 0 at the 95% level, the critical value is the level of the estimate where exactly 95% of the area under the H0 curve lies to the left

Our estimate is significant at 95% if it is further out in the tail than the critical value
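Under a normal approximation, the critical value is just a quantile of the H0 curve. The sketch below assumes the estimate's standard error is known; the standard error of 2.0 and the estimate of 4.5 are hypothetical numbers, and `critical_value` is an illustrative helper.

```python
from statistics import NormalDist


def critical_value(standard_error, significance=0.05, one_sided=True):
    """Estimate size beyond which we reject H0 (true effect = 0).

    The H0 curve is modelled as a normal distribution centred on 0 whose
    standard deviation is the standard error of the estimate.
    """
    tail = significance if one_sided else significance / 2
    return NormalDist(mu=0.0, sigma=standard_error).inv_cdf(1 - tail)


standard_error = 2.0   # hypothetical standard error of the estimated effect
estimate = 4.5         # hypothetical estimated effect

cv_one_sided = critical_value(standard_error)                    # 95% of H0 area to the left
cv_two_sided = critical_value(standard_error, one_sided=False)   # 97.5% of H0 area to the left
is_significant = estimate > cv_one_sided

print(f"one-sided critical value: {cv_one_sided:.2f}")
print(f"two-sided critical value: {cv_two_sided:.2f}")
print("estimate is significant at the 95% level (one-sided):", is_significant)
```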

Page 85: Sampling, Statistics and Sample Size

95% Critical Value for True Effect>0

Page 86: Sampling, Statistics and Sample Size

In this Case our Estimate is > Critical Value So….

Page 87: Sampling, Statistics and Sample Size

…..We Can Reject that True Effect=0 with 95% Confidence

Page 88: Sampling, Statistics and Sample Size

What if the True Effect=β*?

Page 89: Sampling, Statistics and Sample Size

How Often Would we get Estimates that we Could Not Distinguish from 0? (if true effect=β*)

Page 90: Sampling, Statistics and Sample Size

How Often Would we get Estimates that we Could Distinguish from 0? (if true effect=β*)

Page 91: Sampling, Statistics and Sample Size

Chance of Getting Estimates we can Distinguish from 0 is the Area Under Hβ* that is Above the Critical Value for H0

Page 92: Sampling, Statistics and Sample Size

Proportion of Area Under Hβ* that is Above the Critical Value is Power
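That area can be computed directly if both curves are approximated as normal with the same standard error. This is a sketch for a one-sided test; the effect sizes and standard errors passed in below are hypothetical.

```python
from statistics import NormalDist


def power_one_sided(true_effect, standard_error, significance=0.05):
    """Share of the H-beta* curve lying above the critical value of the H0 curve."""
    null_curve = NormalDist(mu=0.0, sigma=standard_error)
    alternative_curve = NormalDist(mu=true_effect, sigma=standard_error)
    critical_value = null_curve.inv_cdf(1 - significance)
    return 1 - alternative_curve.cdf(critical_value)


# Hypothetical scenarios showing how power responds to each ingredient.
base = power_one_sided(true_effect=4.0, standard_error=2.0)
bigger_effect = power_one_sided(true_effect=6.0, standard_error=2.0)
noisier = power_one_sided(true_effect=4.0, standard_error=3.0)
looser_test = power_one_sided(true_effect=4.0, standard_error=2.0, significance=0.10)

print(f"base case:               {base:.2f}")
print(f"larger true effect:      {bigger_effect:.2f}")
print(f"larger standard error:   {noisier:.2f}")
print(f"10% significance level:  {looser_test:.2f}")
```

A larger effect or a looser significance level raises power, while a noisier estimate (more population variance or a smaller sample) lowers it.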

Page 93: Sampling, Statistics and Sample Size

Recap Hypothesis Testing: Power

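Underlying truth: effective (H0 false) or no effect (H0 true)

Statistical test: significant (reject H0) or not significant (fail to reject H0)

Significant (reject H0)

• Effective (H0 false): true positive, probability = (1 – κ)

• No effect (H0 true): false positive, Type I error, probability = α

Not significant (fail to reject H0)

• Effective (H0 false): false zero, Type II error (low power), probability = κ

• No effect (H0 true): true zero, probability = (1 – α)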

Page 94: Sampling, Statistics and Sample Size

Definition of Power

Power: if there is a measurable effect of our intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis)

Reduce Type II error: failing to reject the null hypothesis (concluding there is no difference) when indeed the null hypothesis is false

Traditionally, we aim for 80% power. Some people aim for 90% power

Page 95: Sampling, Statistics and Sample Size

More Overlap Between H0 Curve and Hβ* Curve, the Lower the Power. Q: What Affects Overlap?

Page 96: Sampling, Statistics and Sample Size

Larger Hypothesized Effect, Further Apart the Curves, Higher the Power

Page 97: Sampling, Statistics and Sample Size

Greater Variance in Population, Increases Spread of Possible Estimates, Reduces Power

Page 98: Sampling, Statistics and Sample Size

Power Also Depends on the Critical Value, i.e. the Level of Significance we are Looking For…

Page 99: Sampling, Statistics and Sample Size

10% Significance Gives Higher Power than 5% Significance

Page 100: Sampling, Statistics and Sample Size

Why Does Significance Change Power?

Q: what trade off are we making when we change the significance level and increase power?

Remember: 10% significance means we’ll make Type I (false positive) errors 10% of the time

So moving from 5% to 10% significance means we get more power but at the cost of more false positives

It’s like widening the gap between the goal posts and saying “now we have a higher chance of getting a goal”

Page 101: Sampling, Statistics and Sample Size

Allocation Ratio and Power

Definition of allocation ratio: the fraction of the total sample that is allocated to the treatment group

Usually, for a given sample size, power is maximized when half the sample is allocated to treatment and half to control

Page 102: Sampling, Statistics and Sample Size

Why Does Equal Allocation Maximize Power?

Treatment effect is the difference between two means (mean of treatment and control)

Adding sample to treatment group increases accuracy of treatment mean, same for control

But diminishing returns to adding sample size

If treatment group is much bigger than control group, the marginal person adds little to accuracy of treatment group mean, but more to the control group mean

Thus we improve accuracy of the estimated difference when we have equal numbers in treatment and control groups
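The argument can be checked with the formula for the standard error of a difference in means. The sketch assumes the outcome has the same standard deviation in both groups; the values 20 and 1,000 are hypothetical.

```python
import math


def se_of_difference(sigma, total_n, share_treated):
    """Standard error of (treatment mean - control mean) for a given allocation.

    Assumes the outcome has the same standard deviation, sigma, in both groups.
    """
    n_treatment = total_n * share_treated
    n_control = total_n * (1 - share_treated)
    return math.sqrt(sigma ** 2 / n_treatment + sigma ** 2 / n_control)


sigma, total_n = 20, 1_000   # hypothetical outcome sd and total sample size
se_by_share = {share: se_of_difference(sigma, total_n, share)
               for share in (0.1, 0.3, 0.5, 0.7, 0.9)}

for share, se in se_by_share.items():
    print(f"share in treatment {share:.1f}: SE of estimated effect = {se:.3f}")
```

The standard error is smallest at a share of 0.5 and rises symmetrically as the split becomes more unequal, which is the diminishing-returns argument in numbers.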

Page 103: Sampling, Statistics and Sample Size

Summary of Power Factors

Hypothesized effect size

• Q: A larger effect size makes power increase/decrease?

Variance

• Q: greater residual variance makes power increase/decrease?

Sample size

• Q: Larger sample size makes power increase/decrease?

Critical value

• Q: A looser critical value makes power increase/decrease?

Unequal allocation ratio

• Q: an unequal allocation ratio makes power increase/decrease?

Page 104: Sampling, Statistics and Sample Size

Power Equation: MDE

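$$\text{EffectSize} = \left(t_{(1-\kappa)} + t_{\alpha}\right) \times \sqrt{\frac{1}{P(1-P)}} \times \sqrt{\frac{\sigma^{2}}{N}}$$

EffectSize: effect size (minimum detectable effect)

σ²: variance

N: sample size

t_α: critical value for the significance level

t_(1−κ): critical value for the power

P: proportion in treatment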
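A minimal sketch of the minimum detectable effect (MDE) calculation follows. It uses the standard form of the equation, MDE = (t_(1−κ) + t_α) × √(1/(P(1−P))) × √(σ²/N), with two simplifying assumptions: the t values are replaced by normal quantiles (reasonable for large samples) and the test is two-sided by default. The inputs are hypothetical.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(sigma, n, share_treated=0.5,
                              significance=0.05, power=0.80, two_sided=True):
    """Smallest true effect detectable with the given power and significance level.

    sigma: standard deviation of the outcome; n: total sample size;
    share_treated: proportion of the sample in treatment (P).
    Normal quantiles stand in for the t values (large-sample approximation).
    """
    z = NormalDist()
    tail = significance / 2 if two_sided else significance
    t_alpha = z.inv_cdf(1 - tail)   # critical value for the significance level
    t_power = z.inv_cdf(power)      # critical value for the power, power = 1 - kappa
    return ((t_power + t_alpha)
            * math.sqrt(1 / (share_treated * (1 - share_treated)))
            * math.sqrt(sigma ** 2 / n))


mde_1000 = minimum_detectable_effect(sigma=20, n=1_000)
mde_4000 = minimum_detectable_effect(sigma=20, n=4_000)
mde_unequal = minimum_detectable_effect(sigma=20, n=1_000, share_treated=0.2)

print(f"N = 1000, equal split:    MDE = {mde_1000:.2f}")
print(f"N = 4000, equal split:    MDE = {mde_4000:.2f}")
print(f"N = 1000, 20% treated:    MDE = {mde_unequal:.2f}")
```

Quadrupling the sample halves the MDE, because the sample size enters under a square root.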

Page 105: Sampling, Statistics and Sample Size

Clustered RCT Experiments

Cluster randomized trials are experiments in which social units or clusters rather than individuals are randomly allocated to intervention groups

The unit of randomization (e.g. the village) is broader than the unit of analysis (e.g. farmers)

That is: randomize at the village level, but use farmer-level surveys as our unit of analysis


Page 106: Sampling, Statistics and Sample Size

Clustered Design: Intuition

We want to know how much rice the average farmer in Sierra Leone grew last year

Method 1: Randomly select 9,000 farmers from around the country

Method 2: Randomly select 9,000 farmers from one district


Page 107: Sampling, Statistics and Sample Size

Clustered Design: Intuition II

Some parts of the country may grow more rice than others in general; what if one district had a drought? Or a flood?

• i.e. we worry both about long term correlations and correlations of shocks within groups

Method 1 gives most accurate estimate

Method 2 much cheaper so for given budget could sample more farmers

What combination of 1 and 2 gives the highest power for given budget constraint?

Depends on the level of intracluster correlation, ρ (rho)


Page 108: Sampling, Statistics and Sample Size

Low Intracluster Correlation

[Figure: variation in the population, the clusters, and the sampled clusters]

Page 109: Sampling, Statistics and Sample Size

HIGH Intracluster Correlation

Page 110: Sampling, Statistics and Sample Size

Intracluster Correlation

Total variance can be divided into within cluster variance and between cluster variance

When variance within clusters is small and the variance between clusters is large, the intra cluster correlation is high (previous slide)

Definition of intracluster correlation (ICC): the proportion of total variation explained by between cluster variance

• Note, when within cluster variance is high relative to between cluster variance, the intracluster correlation is low: people in the same cluster are not much more alike than people in different clusters
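Using the standard definition, the ICC is the between cluster share of total variance. The sketch below computes it from two variance components; the numbers are hypothetical and chosen only to show the high and low cases.

```python
def intracluster_correlation(between_cluster_variance, within_cluster_variance):
    """ICC (rho): share of total variance that lies between clusters."""
    total_variance = between_cluster_variance + within_cluster_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return between_cluster_variance / total_variance


# Hypothetical variance components.
# Clusters differ a lot from each other, members of a cluster are alike: high ICC.
high_icc = intracluster_correlation(between_cluster_variance=9.0, within_cluster_variance=1.0)
# Clusters look alike, most variation is inside each cluster: low ICC.
low_icc = intracluster_correlation(between_cluster_variance=1.0, within_cluster_variance=9.0)

print(f"high ICC case: rho = {high_icc:.2f}")
print(f"low ICC case:  rho = {low_icc:.2f}")
```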

Page 111: Sampling, Statistics and Sample Size

HIGH Intracluster Correlation

Page 112: Sampling, Statistics and Sample Size

Low Intracluster Correlation

Page 113: Sampling, Statistics and Sample Size

Power with clustering

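$$\text{EffectSize} = \left(t_{(1-\kappa)} + t_{\alpha}\right) \times \sqrt{\frac{1}{P(1-P)}} \times \sqrt{1 + \rho(m-1)} \times \sqrt{\frac{\sigma^{2}}{N}}$$

EffectSize: effect size (minimum detectable effect)

σ²: variance

N: sample size

t_α: critical value for the significance level

t_(1−κ): critical value for the power

P: proportion in treatment

ρ: ICC

m: average cluster size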
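The clustered calculation can be sketched by multiplying the unclustered MDE by the design-effect term √(1 + ρ(m − 1)). As before, normal quantiles stand in for the t values and the test is two-sided; the ICC of 0.1, the outcome standard deviation of 20 and the cluster counts are hypothetical.

```python
import math
from statistics import NormalDist


def clustered_mde(sigma, n_clusters, cluster_size, icc,
                  share_treated=0.5, significance=0.05, power=0.80):
    """Minimum detectable effect when randomizing clusters of equal size.

    Multiplies the unclustered formula by sqrt(1 + icc * (cluster_size - 1)).
    Normal quantiles approximate the t values; the test is two-sided.
    """
    z = NormalDist()
    t_alpha = z.inv_cdf(1 - significance / 2)
    t_power = z.inv_cdf(power)
    total_n = n_clusters * cluster_size
    design_effect = 1 + icc * (cluster_size - 1)
    return ((t_power + t_alpha)
            * math.sqrt(1 / (share_treated * (1 - share_treated)))
            * math.sqrt(design_effect)
            * math.sqrt(sigma ** 2 / total_n))


# Hypothetical designs with outcome sd = 20 and ICC = 0.1.
baseline = clustered_mde(sigma=20, n_clusters=50, cluster_size=20, icc=0.1)
more_people = clustered_mde(sigma=20, n_clusters=50, cluster_size=200, icc=0.1)
more_clusters = clustered_mde(sigma=20, n_clusters=100, cluster_size=20, icc=0.1)

print(f"50 clusters x 20 people:   MDE = {baseline:.2f}")
print(f"50 clusters x 200 people:  MDE = {more_people:.2f}")
print(f"100 clusters x 20 people:  MDE = {more_clusters:.2f}")
```

In this example, doubling the number of clusters helps more than a tenfold increase in the number of people per cluster.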

Page 114: Sampling, Statistics and Sample Size

Clustered RCTs vs. Clustered Sampling

Must cluster at the level at which you randomize

• Many reasons to randomize at group level

Could randomize by farmer group, village, district

If we randomize one district to T and one to C, we have too little power however many farmers we interview

• Can never distinguish treatment effect from possible district wide shocks

If we randomize at the individual level we don’t need to worry about within village correlation or village level shocks, as they affect both T and C

Page 115: Sampling, Statistics and Sample Size

Bottom Line for Clustering

If experimental design is clustered, we now need to consider ρ when choosing a sample size (as well as the other factors)

Must cluster at level of randomization

It is extremely important to randomize an adequate number of groups

Often the number of individuals within groups matters less than the total number of groups

Page 116: Sampling, Statistics and Sample Size

COMMON TRADEOFFS AND RULES OF THUMB

Page 117: Sampling, Statistics and Sample Size

Common Tradeoffs

Answer one question really well? Or many questions with less accuracy?

Large sample size with possible attrition? Or small sample size that we track very closely?

Few clusters with many observations? Or many clusters with few observations?

How do we allocate our sample to each group?

Page 118: Sampling, Statistics and Sample Size

Rules of Thumb

A larger sample is needed to detect differences between two variants of a program than between the program and the comparison group.

For a given sample size, the highest power is achieved when half the sample is allocated to treatment and half to comparison.

The more measurements are taken, the higher the power. In particular, if there is a baseline and endline rather than just an endline, you have more power.

The lower the compliance, the lower the power. The higher the attrition, the lower the power.

For a given sample size, we have less power if randomization is at the group level than at the individual level.