Evaluation Methods

Why Randomize?

Page 1: Evaluation Methods

Why Randomize?

Page 2: Evaluation Methods

Course Overview

1. What is evaluation?

2. Measuring impacts (outcomes, indicators)

3. Why randomize?

4. How to randomize?

5. Sampling and sample size

6. Threats and Analysis

7. Cost-Effectiveness Analysis

8. Project from Start to Finish

Page 3: Evaluation Methods

What is the most convincing argument you have heard against RCTs?

A. Too expensive

B. Takes too long

C. Unethical

D. Too difficult to design/implement

E. Not externally valid (Not generalizable)

F. Can tell us whether there is impact, and the magnitude of that impact, but not why or how (it is a black box)


Page 4: Evaluation Methods

Impact: What is it?

A. Positive

B. Negative

C. No impact

D. Don’t Know

[Figure: Primary Outcome plotted over Time, with the Intervention marked]

Page 5: Evaluation Methods

Impact: What is it?

[Figure: Primary Outcome over Time; after the Intervention, the gap between the observed outcome and the Counterfactual is the Impact]

Page 6: Evaluation Methods

Impact: What is it?

A. Positive

B. Negative

C. No impact

D. Don’t Know

[Figure: Primary Outcome over Time, showing the Intervention and the Counterfactual]

Page 7: Evaluation Methods

Impact: What is it?

[Figure: Primary Outcome over Time; the Impact is the gap between the observed outcome and the Counterfactual after the Intervention]


Page 9: Evaluation Methods

Impact is defined as a comparison between:

The outcome some time after the program has been introduced

The outcome at that same point in time had the program not been introduced

The latter is known as the "Counterfactual"

How to Measure Impact?

Page 10: Evaluation Methods

Counterfactual

The counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e., had they not participated in the program).

Problem: The counterfactual cannot be observed.

Solution: We need to "mimic" or construct the counterfactual.

Page 11: Evaluation Methods

IMPACT EVALUATION METHODS

Page 12: Evaluation Methods

Impact Evaluation Methods

1. Randomized Experiments

Also known as:

Random Assignment Studies

Randomized Field Trials

Social Experiments

Randomized Controlled Trials (RCTs)

Randomized Controlled Experiments

Page 13: Evaluation Methods

Impact Evaluation Methods

2. Non- or Quasi-Experimental Methods

Pre-Post

Simple Difference

Differences-in-Differences

Multivariate Regression

Statistical Matching

Interrupted Time Series

Instrumental Variables

Regression Discontinuity

Page 14: Evaluation Methods

WHAT IS A RANDOMIZED EXPERIMENT?

Page 15: Evaluation Methods

The Basics

Start with the simple case:

Take a sample of program applicants

• Randomly assign them to either:

• Treatment Group – is offered the treatment

• Control Group – not allowed to receive the treatment (during the evaluation period)
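The assignment step above can be sketched in a few lines; the function name and the 50/50 split are illustrative choices, not prescribed by the slides:

```python
import random

def random_assignment(applicants, seed=0):
    """Split a sample of program applicants into treatment and control.
    Illustrative sketch: a fixed seed and an even split are assumed."""
    rng = random.Random(seed)
    shuffled = list(applicants)   # copy so the input is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment = shuffled[:half]   # offered the treatment
    control = shuffled[half:]     # not offered it during the evaluation
    return treatment, control

treatment, control = random_assignment(range(100))
print(len(treatment), len(control))  # 50 50
```

Because assignment is a coin flip rather than a choice, nothing about the applicants themselves determines which group they land in.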

Page 16: Evaluation Methods

Key Advantage

Because members of the two groups (treatment and control) do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the program rather than to other factors.

Page 17: Evaluation Methods

WHY RANDOMIZE?

Page 18: Evaluation Methods

Example: Pratham’s Balsakhi Program

Case 2: Remedial Education in India
Evaluating the Balsakhi Program

Incorporating random assignment into the program

Page 19: Evaluation Methods

What was the Problem?

Many children in 3rd and 4th standard were not even at the 1st standard level of competency

Class sizes were large

Social distance between teacher and many of the students was large

Page 20: Evaluation Methods

Context and Partner

124 Municipal Schools in Vadodara (Western India)

2002 & 2003: Two academic years

~ 17,000 children

Partner: Pratham ("Every child in school and learning well")

Works with most states in India, reaching millions of children

Page 21: Evaluation Methods

Proposed Solution

Hire local women (Balsakhis)

From the community

Train them to teach remedial competencies

• Basic literacy, numeracy

Identify lowest performing 3rd and 4th standard students

• Take these students out of class (2 hours/day)

• Balsakhi teaches them basic competencies

Page 22: Evaluation Methods

Possible Outcomes

Pros:

• Reduced social distance

• Reduced class size

• Teaching at the appropriate level

• Improved learning for lower-performing students

• Improved learning for higher-performers

Cons:

• Less qualified instructors

• Teacher resentment

• Reduced interaction with higher-performing peers

• Increased gap in learning

• Reduced test scores for all kids

What is the Impact?

Page 23: Evaluation Methods

J-PAL Conducts a Test at the End

Balsakhi students score an average of 51%

What can we conclude?

Page 24: Evaluation Methods

1. Pre-post (Before vs. After)

Look at the average change in test scores over the school year for the Balsakhi children

Average change in the outcome of interest before and after the program

Page 25: Evaluation Methods

Method 1: Pre vs. Post (Before vs. After)

[Figure: Average test scores of Balsakhi students, rising from 24.80 at the start of the program to 51.22 at the end]

Average post-test score for children with a Balsakhi: 51.22

Average pretest score for children with a Balsakhi: 24.80

Difference: 26.42

Page 26: Evaluation Methods

Pre-Post

Limitations of the method

• No comparison group; doesn't take the time trend into account

What else can we do to estimate impact?

Page 27: Evaluation Methods

Method 2: Simple Difference

Divide the population into two groups:

• One group enrolled in the Balsakhi program (Treatment)

• One group not enrolled in the Balsakhi program (Control)

Compare the test scores of these two groups at the end of the program.

Measure the difference between program participants and non-participants after the program is completed.

Page 28: Evaluation Methods

Method 2: Simple Difference

[Figure: Average test scores at the end of the program. Enrolled: 51.22; not enrolled: 56.27]

Average score for children with a Balsakhi: 51.22

Average score for children without a Balsakhi: 56.27

Difference: -5.05

QUESTION: Under what conditions can the difference of -5.05 be interpreted as the impact of the Balsakhi program?

Page 29: Evaluation Methods

Method 3: Difference-in-difference

Divide the population into two groups:

• One group enrolled in the Balsakhi program (Treatment)

• One group not enrolled in the Balsakhi program (Control)

Compare the change in test scores between Treatment and Control, i.e., the difference in differences in test scores.

Put differently: compare the difference in test scores at post-test with the difference in test scores at pretest.

Measure the improvement (change) over time of participants relative to the improvement (change) over time of non-participants.

Page 30: Evaluation Methods

Method 3: Difference-in-difference

[Figure: Average test scores at the start and end of the program. Enrolled in Balsakhi program: 24.80 to 51.22; not enrolled: 36.67 to 56.27]

Average score for children with a Balsakhi: 24.80 (pretest), 51.22 (post-test), 26.42 (difference)

Page 31: Evaluation Methods

Method 3: Difference-in-differences

What would have happened without Balsakhi?

[Figure: Treatment-group test scores rising 26.42 points from 2002 to 2003 (axis 0 to 75)]

Page 32: Evaluation Methods

Method 3: Difference-in-differences

Average score for children with a Balsakhi: 24.80 (pretest), 51.22 (post-test), 26.42 (difference)

Average score for children without a Balsakhi: 36.67 (pretest), 56.27 (post-test), 19.60 (difference)

Page 33: Evaluation Methods

Method 3: Difference-in-differences

What would have happened without Balsakhi?

26.42 - 19.60 = 6.82 points?

[Figure: Test scores of both groups from 2002 to 2003 (axis 0 to 75), with the 6.82-point gap marked]

Page 34: Evaluation Methods

Method 3: Difference-in-Differences

QUESTION: Under what conditions can 6.82 be interpreted as the impact of the Balsakhi program?

Issues:

• Failure of the "parallel trends assumption", i.e., the effect of time on the two groups is not the same

Average score for children with a Balsakhi: 24.80 (pretest), 51.22 (post-test), 26.42 (difference)

Average score for children without a Balsakhi: 36.67 (pretest), 56.27 (post-test), 19.60 (difference)

Difference-in-differences: 6.82
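The difference-in-differences arithmetic above is just two subtractions; a minimal sketch using the slides' own figures:

```python
# Slide figures: treatment 24.80 -> 51.22, control 36.67 -> 56.27
def diff_in_diff(t_pre, t_post, c_pre, c_post):
    """Change over time for participants minus change for non-participants."""
    return (t_post - t_pre) - (c_post - c_pre)

impact = diff_in_diff(24.80, 51.22, 36.67, 56.27)
print(round(impact, 2))  # 6.82
```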

Page 35: Evaluation Methods

Method 4: Regression Analysis

Divide the population into two groups:

• One group enrolled in the Balsakhi program

• One group not enrolled in the Balsakhi program

Compare the test scores of these two groups at the start and at the end of the program.

Control for additional variables like gender and class size:

Post-test score = b0 + b1 (Balsakhi) + b2 (gender) + b3 (class size) + error
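A regression of this form can be sketched with ordinary least squares. The simulated data, variable names, and the true effect of 2 points below are all invented for illustration; they do not reproduce the Balsakhi estimate of 1.92:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
balsakhi = rng.integers(0, 2, n)        # 1 = enrolled (simulated)
girl = rng.integers(0, 2, n)            # control variable (simulated)
class_size = rng.normal(40.0, 5.0, n)   # control variable (simulated)

# Simulated post-test scores with a true program effect of 2 points
post = 30 + 2.0 * balsakhi + 1.5 * girl - 0.1 * class_size + rng.normal(0, 5, n)

# OLS: post = b0 + b1*balsakhi + b2*girl + b3*class_size
X = np.column_stack([np.ones(n), balsakhi, girl, class_size])
coef, *_ = np.linalg.lstsq(X, post, rcond=None)
print(round(coef[1], 2))  # b1: the estimated program effect
```

Because enrollment here is randomly simulated, the estimate recovers the true effect; with real, non-random enrollment, b1 also absorbs any omitted differences between enrolled and non-enrolled children.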

Page 36: Evaluation Methods

Method 4: Regression Analysis

[Scatter plot: post-test scores (0 to 70) against income, with separate linear fits for students with a Balsakhi (post_tot_B) and without (post_tot_noB)]

QUESTION: Under what conditions can the coefficient of 1.92 be interpreted as the impact of the Balsakhi program?

Page 37: Evaluation Methods

Impact of Balsakhi Program

Method and Impact Estimate:

(1) Pre-post: 26.42*

(2) Simple Difference: -5.05*

(3) Difference-in-Difference: 6.82*

(4) Regression with controls: 1.92

* Significant at the 5% level

Page 38: Evaluation Methods

Constructing the Counterfactual

The counterfactual is often constructed by selecting a group not affected by the program.

Non-randomized: Argue that a certain excluded group mimics the counterfactual.

Randomized: Use random assignment of the program to create a control group which mimics the counterfactual.

Page 39: Evaluation Methods

Randomized Evaluations

Individuals, villages, or districts are randomly selected to receive the treatment, while the others serve as a comparison.

[Diagram: Village 1 (Treatment Group) = Village 2 (Comparison Group)]

The groups are statistically identical before the program, so any difference at the endline can be attributed to the program.

The two groups continue to be identical except for the treatment. Later, compare outcomes (e.g., health, test scores) between the two groups. Any differences between the groups can be attributed to the program.

Page 40: Evaluation Methods

Basic Set-up of a Randomized Evaluation

[Diagram: Total Population → Target Population → Evaluation Sample (the remainder is not in the evaluation) → Random Assignment → Treatment Group and Control Group]

Page 41: Evaluation Methods

Random Sampling and Random Assignment

Randomly sample from the area of interest

Page 42: Evaluation Methods

Random Sampling and Random Assignment

Randomly sample from the area of interest

Randomly assign to treatment and control

Randomly sample from both treatment and control

Page 43: Evaluation Methods

Randomization Design

Population = all schools in case villages

Target population: weakest students in all of these schools

Stratify on three criteria:

• Pre-test scores

• Gender

• Language

Give 50% of them the Balsakhi program
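A stratified design like the one above can be sketched as follows; the student records and strata values are hypothetical stand-ins for pre-test score bands, gender, and language:

```python
import random
from collections import defaultdict

def stratified_assignment(students, strata_keys, seed=0):
    """Randomize within strata so that 50% of each stratum is treated.
    `students` is a list of dicts with an "id" field; `strata_keys`
    name the stratification variables. Illustrative sketch."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in students:
        strata[tuple(s[k] for k in strata_keys)].append(s)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)        # coin flip within the stratum
        half = len(members) // 2
        for s in members[:half]:
            assignment[s["id"]] = "treatment"
        for s in members[half:]:
            assignment[s["id"]] = "control"
    return assignment

# Hypothetical roster: 60 students over 3 score bands x 2 genders x 1 language
students = [{"id": i, "score_band": i % 3, "gender": i % 2, "language": 0}
            for i in range(60)]
groups = stratified_assignment(students, ["score_band", "gender", "language"])
```

Stratifying guarantees the treatment and control groups are balanced on these three criteria by construction, instead of only in expectation.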

Page 44: Evaluation Methods

Impact of Balsakhi - Summary

Method Impact Estimate

(1) Pre-post 26.42*

(2) Simple Difference -5.05*

(3) Difference-in-Difference 6.82*

(4) Regression 1.92

*: Statistically significant at the 5% level

Page 45: Evaluation Methods

Which of these methods do you think is closest to the truth?

A. Pre-post

B. Simple difference

C. Difference-in-Difference

D. Regression

E. Don’t know

Method Impact Estimate

(1) Pre-post 26.42*

(2) Simple Difference -5.05*

(3) Difference-in-Difference 6.82*

(4) Regression 1.92

*: Statistically significant at the 5% level

Page 46: Evaluation Methods

Impact of Balsakhi - Summary

Method Impact Estimate

(1) Pre-post 26.42*

(2) Simple Difference -5.05*

(3) Difference-in-Difference 6.82*

(4) Regression 1.92

(5) Randomized Experiment 5.87*

*: Statistically significant at the 5% level

Page 47: Evaluation Methods

Example #2 - Pratham’s Read India Program

*: Statistically significant at the 5% level

Method Impact

(1) Pre-Post 0.60*

(2) Simple Difference -0.90*

(3) Difference-in-Differences 0.31*

(4) Regression 0.06

Page 48: Evaluation Methods

Which of these methods do you think is closest to the truth?

A. Pre-post

B. Simple difference

C. Difference-in-Difference

D. Regression

E. Don’t know

*: Statistically significant at the 5% level

Method Impact

(1) Pre-Post 0.60*

(2) Simple Difference -0.90*

(3) Difference-in-Differences

0.31*

(4) Regression 0.06

A. B. C. D. E.

0% 0% 0%0%0%

Page 49: Evaluation Methods

Example #2 – Pratham’s Read India Program

Method Impact

(1) Pre-Post 0.60*

(2) Simple Difference -0.90*

(3) Difference-in-Differences 0.31*

(4) Regression 0.06

(5) Randomized Experiment 0.88*

*: Statistically significant at the 5% level

Page 50: Evaluation Methods

Summary of Methods

Method: Pre-Post
Comparison: Program participants before the program
Works only if: Nothing else was affecting the outcome

Method: Simple Difference
Comparison: Individuals who did not participate (data collected after program)
Works only if: Non-participants are exactly equal to participants

Method: Differences-in-Differences
Comparison: Same as above, plus data collected before and after
Works only if: The two groups have exactly the same trajectory over time

Method: Regression
Comparison: Same as above, plus additional "explanatory" variables
Works only if: Omitted variables do not affect the results

Method: Randomized Evaluation
Comparison: Participants randomly assigned to a control group
Works only if: The two groups are statistically identical on observed and unobserved characteristics

Page 51: Evaluation Methods

Conditions Required

Method: Pre-Post
Comparison group: Program participants before the program
Works if: The program was the only factor influencing any changes in the measured outcome over time

Method: Simple Difference
Comparison group: Individuals who did not participate (data collected after program)
Works if: Non-participants are identical to participants except for program participation, and were equally likely to enter the program before it started

Method: Differences-in-Differences
Comparison group: Same as above, plus data collected before and after
Works if: Had the program not existed, the two groups would have had identical trajectories over this period

Method: Multivariate Regression
Comparison group: Same as above, plus additional "explanatory" variables
Works if: Omitted variables (not measured or not observed) do not bias the results, because they are either uncorrelated with the outcome or do not differ between participants and non-participants

Method: Propensity Score Matching
Comparison group: Non-participants whose mix of characteristics predicts that they would be as likely to participate as participants
Works if: Same as above

Method: Randomized Evaluation
Comparison group: Participants randomly assigned to a control group
Works if: Randomization "works", i.e., the two groups are statistically identical on observed and unobserved characteristics

Page 52: Evaluation Methods

Other Methods

There are more sophisticated non-experimental methods to estimate program impacts:

• Regression

• Matching

• Instrumental Variables

• Regression Discontinuity

These methods rely on being able to “mimic” the counterfactual under certain assumptions

Problem: Assumptions are not testable

Page 53: Evaluation Methods

Conclusions: Why Randomize?

There are many ways to estimate a program's impact.

This course argues in favor of one: randomized experiments.

• Conceptual argument: If properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program.

• Empirical argument: Different methods can generate different impact estimates.

Page 54: Evaluation Methods

Key Steps in Conducting an Experiment

1. Design the study carefully

2. Randomly assign people to treatment or control

3. Collect baseline data

4. Verify that assignment looks random

5. Monitor process so that integrity of experiment is not compromised

6. Collect follow-up data for both the treatment and control groups

7. Estimate program impacts by comparing mean outcomes of treatment group vs. mean outcomes of control group.

8. Assess whether program impacts are statistically significant and practically significant.
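Steps 7 and 8 can be sketched as a difference in means with a Welch t-statistic; the simulated scores and the 6-point effect below are invented for illustration:

```python
import math
import random

def impact_estimate(treatment, control):
    """Step 7: difference in mean outcomes between treatment and control.
    Step 8: Welch t-statistic as a rough significance check (|t| > ~2)."""
    mt = sum(treatment) / len(treatment)
    mc = sum(control) / len(control)
    vt = sum((x - mt) ** 2 for x in treatment) / (len(treatment) - 1)
    vc = sum((x - mc) ** 2 for x in control) / (len(control) - 1)
    se = math.sqrt(vt / len(treatment) + vc / len(control))
    return mt - mc, (mt - mc) / se

# Simulated endline scores: control centered at 50, treatment at 56
rng = random.Random(1)
control = [rng.gauss(50, 10) for _ in range(200)]
treatment = [rng.gauss(56, 10) for _ in range(200)]
diff, t = impact_estimate(treatment, control)
```

Statistical significance (a large |t|) and practical significance (a diff large enough to matter for policy) are separate judgments, as step 8 notes.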

Page 55: Evaluation Methods

THANK YOU