An introduction to Impact Evaluation Markus Goldstein AFTPM & DECRG


Page 1: An introduction to Impact Evaluation

An introduction to Impact Evaluation

Markus Goldstein

AFTPM & DECRG

Page 2: An introduction to Impact Evaluation

Knowledge is the most democratic source of power

-Alvin Toffler

Page 3: An introduction to Impact Evaluation

a world in which there are two types of people

1. Those who know

2. Those who know that they don’t know

Page 4: An introduction to Impact Evaluation

So how can we know?

• Monitoring

• Evaluation

• Impact evaluation

Page 5: An introduction to Impact Evaluation

Outline

• Monitoring and impact evaluation

• Why do impact evaluation

• Why we need a comparison group

• Methods for constructing the comparison group

• Microfinance example of why it matters

• When to do an impact evaluation

Page 6: An introduction to Impact Evaluation

Monitoring and evaluation

• Monitoring: collection, analysis and use of data on indicators at different levels (inputs, outputs, outcomes)

• Evaluation: focus on processes and understanding why indicators are moving the way they are

Page 7: An introduction to Impact Evaluation

Monitoring - levels

• INPUTS: financial and physical resources (e.g. spending on primary health care)

• OUTPUTS: goods and services generated (e.g. number of nurses, availability of medicine)

• OUTCOMES: access, usage and satisfaction of users (e.g. number of children vaccinated, percentage within 5 km of a health center)

• IMPACT: effect on living standards (e.g. infant and child mortality, prevalence of specific diseases)

Page 8: An introduction to Impact Evaluation

Monitoring and causality

• INPUTS → OUTPUTS: the gov’t/program production function

• OUTPUTS → OUTCOMES: users meet service delivery

• OUTCOMES → IMPACTS: program impacts are confounded by local, national and global effects, hence the difficulty of showing causality

Page 9: An introduction to Impact Evaluation

Impact evaluation

• It goes by many names (e.g. Rossi et al. call it impact assessment), so know the concept, not just the label.

• Impact is the difference between outcomes with the program and without it

• The goal of impact evaluation is to measure this difference in a way that can attribute the difference to the program, and only the program
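In standard potential-outcomes notation (a common formalization, not on the slide itself):

```latex
% Y_i(1): outcome for unit i with the program; Y_i(0): outcome without it.
\text{impact}_i = Y_i(1) - Y_i(0),
\qquad
\text{average impact} = \mathbb{E}\left[\, Y(1) - Y(0) \,\right]
```

The attribution problem is that we never observe both $Y_i(1)$ and $Y_i(0)$ for the same unit.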

Page 10: An introduction to Impact Evaluation

Why it matters

• We want to know if the program had an impact and the average size of that impact

– Understand if policies work
  • Justification for the program (big $$)
  • Scale up or not: did it work?
  • Compare different policy options within a program
  • Meta-analyses: learning from others

– (with cost data) understand the net benefits of the program

– Understand the distribution of gains and losses

Page 11: An introduction to Impact Evaluation

What we need

The difference in outcomes with the program versus without the program – for the same unit of analysis (e.g. individual)

• Problem: individuals only have one existence

• Hence we have a missing-counterfactual problem, which is a problem of missing data

Page 12: An introduction to Impact Evaluation

Thinking about the counterfactual

• Why not compare individuals before and after (the reflexive comparison)?

– The rest of the world moves on, and you are not sure what was caused by the program and what by the rest of the world

• We need a control/comparison group that will allow us to attribute any change in the “treatment” group to the program (causality)

Page 13: An introduction to Impact Evaluation

Comparison group issues

• Two central problems:

– Programs are targeted: program areas will differ in observable and unobservable ways precisely because the program intended this

– Individual participation is (usually) voluntary: participants will differ from non-participants in observable and unobservable ways

• Hence, a comparison of participants and an arbitrary group of non-participants can lead to heavily biased results

Page 14: An introduction to Impact Evaluation

Example: providing fertilizer to farmers

• The intervention: provide fertilizer to farmers in a poor region of a country (call it region A)

– The program targets poor areas
– Farmers have to enroll at the local extension office to receive the fertilizer
– The program starts in 2002 and ends in 2004; we have data on yields for farmers in the poor region and another region (region B) for both years

• We observe that the farmers we provide fertilizer to have a decrease in yields from 2002 to 2004

Page 15: An introduction to Impact Evaluation

Did the program not work?

• Further study reveals there was a national drought, and everyone’s yields went down (failure of the reflexive comparison)

• We compare the farmers in the program region to those in another region. We find that our “treatment” farmers have a larger decline than those in region B. Did the program have a negative impact?

– Not necessarily (program placement)
  • Farmers in region B have better quality soil (unobservable)
  • Farmers in region B have more irrigation, which is key in this drought year (observable)

Page 16: An introduction to Impact Evaluation

OK, so let’s compare the farmers in region A

• We compare “treatment” farmers with their neighbors. We think the soil is roughly the same.

• Say we observe that treatment farmers’ yields decline by less than comparison farmers’. Did the program work?

– Not necessarily. Farmers who went to register with the program may have more ability, and thus could manage the drought better than their neighbors, while the fertilizer was irrelevant. (individual unobservables)

• Say we observe no difference between the two groups. Did the program not work?

– Not necessarily. What little rain there was caused the fertilizer to run off onto the neighbors’ fields. (spillover/contamination)

Page 17: An introduction to Impact Evaluation

The comparison group

• In the end, with these naïve comparisons, we cannot tell if the program had an impact

We need a comparison group that is as similar as possible, in observable and unobservable dimensions, to those receiving the program, and that will not receive spillover benefits.

Page 18: An introduction to Impact Evaluation

How to construct a comparison group – building the counterfactual

1. Randomization

2. Matching

3. Difference-in-Difference

4. Instrumental variables

5. Regression discontinuity

Page 19: An introduction to Impact Evaluation

1. Randomization

• Individuals/communities/firms are randomly assigned into participation

• Counterfactual: the randomized-out group

• Advantages:

– Often called the “gold standard”: by design, selection bias is zero on average and the mean impact is revealed
– Perceived as a fair process of allocation with limited resources

• Disadvantages:

– Ethical issues, political constraints
– Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance)
– Unable to estimate entry effects
– External validity (generalizability): usually run as a controlled experiment on a small-scale pilot; difficult to extrapolate the results to a larger population
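To see why the randomized-out group works as a counterfactual, here is a minimal simulated sketch (not from the slides; the population, the effect size of 2.0, and all other numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated farmers: yields depend on unobserved ability plus noise.
ability = rng.normal(0.0, 1.0, n)
fertilizer = rng.random(n) < 0.5          # random assignment into the program
true_effect = 2.0
yields = 10 + 3 * ability + true_effect * fertilizer + rng.normal(0.0, 1.0, n)

# Because assignment is random, ability is balanced across groups on average,
# so the simple difference in means estimates the impact.
impact_hat = yields[fertilizer].mean() - yields[~fertilizer].mean()
print(round(impact_hat, 2))
```

With random assignment the selection bias averages to zero, so the difference in means recovers something close to the true effect of 2.0.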

Page 20: An introduction to Impact Evaluation

Randomization in our example…

• Simple answer: randomize farmers within a community to receive fertilizer...

• Potential problems?

– Run-off (contamination), so control for this
– Take-up (what question are we answering?)

Page 21: An introduction to Impact Evaluation

2. Matching

• Match participants with non-participants from a larger survey

• Counterfactual: the matched comparison group

• Each program participant is paired with one or more non-participants that are similar based on observable characteristics

• Assumes that, conditional on the set of observables, there is no selection bias based on unobserved heterogeneity

• When the set of variables to match on is large, often match on a summary statistic: the probability of participation as a function of the observables (the propensity score)
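A minimal simulated sketch of the matching idea (not from the slides). Here participation depends only on one observable, land size, so the identification assumption holds by construction; with many covariates you would first collapse them into a propensity score:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Participation depends only on an OBSERVABLE: bigger farms enroll more often.
land = rng.uniform(1.0, 10.0, n)
participates = rng.random(n) < 1 / (1 + np.exp(-(land - 5)))
true_effect = 2.0
yields = 4 + 0.5 * land + true_effect * participates + rng.normal(0.0, 1.0, n)

t_idx = np.flatnonzero(participates)
c_idx = np.flatnonzero(~participates)

# Naive comparison: biased upward, because participants farm bigger land.
naive = yields[t_idx].mean() - yields[c_idx].mean()

# One-to-one nearest-neighbor matching on the observable.
matches = c_idx[np.abs(land[c_idx][None, :] - land[t_idx][:, None]).argmin(axis=1)]
matched = (yields[t_idx] - yields[matches]).mean()
print(round(naive, 2), round(matched, 2))
```

The matched estimate is close to the true effect only because nothing unobserved drives participation here; if ability also mattered, matching on land alone would stay biased.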

Page 22: An introduction to Impact Evaluation

2. Matching

• Advantages:

– Does not require randomization, nor a baseline (pre-intervention data)

• Disadvantages:

– Strong identification assumptions
– Requires very good quality data: need to control for all factors that influence program placement
– Requires a significantly large sample size to generate the comparison group

Page 23: An introduction to Impact Evaluation

Matching in our example…

• Using statistical techniques, we match a group of non-participants with participants using variables like gender, household size, education, experience, land size, rainfall (to control for the drought), and irrigation (as many observable characteristics not affected by the fertilizer as possible)

Page 24: An introduction to Impact Evaluation

Matching in our example…2 scenarios

– Scenario 1: We show up afterwards, so we can only match (within region) those who got fertilizer with those who did not. Problem?

• Problem: selection on expected gains and/or ability (unobservable)

– Scenario 2: The program is allocated based on historical crop choice and land size. We show up afterwards and match those eligible in region A with those in region B. Problem?

• Problems: the same issues of individual unobservables, but lessened because we compare the eligible to the potentially eligible
• But now unobservables differ across regions

Page 25: An introduction to Impact Evaluation

An extension of matching: pipeline comparisons

• Idea: compare those just about to get an intervention with those getting it now

• Assumption: the stopping point of the intervention does not separate two fundamentally different populations

• Example: extending irrigation networks

Page 26: An introduction to Impact Evaluation

3. Difference-in-differences

• Observations over time: compare observed changes in the outcomes for a sample of participants and non-participants

• Identification assumption: the selection bias is time-invariant (“parallel trends” in the absence of the program)

• Counterfactual: changes over time for the non-participants

• Constraint: requires at least two cross-sections of data, pre-program and post-program, on participants and non-participants

– Need to think about the evaluation ex ante, before the program

• Can in principle be combined with matching to adjust for pre-treatment differences that affect the growth rate
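A simulated sketch of the diff-in-diff logic using the fertilizer example (all numbers invented): region A starts from worse soil (a time-invariant gap) and a drought hits both regions (a common shock); differencing twice removes both.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Region A (treated) has worse soil (time-invariant bias); a drought at t=1
# hits both regions equally (common shock).
region_a = rng.random(n) < 0.5
soil = np.where(region_a, -1.0, 1.0) + rng.normal(0.0, 1.0, n)
drought = -3.0
true_effect = 2.0

y_before = 10 + soil + rng.normal(0.0, 1.0, n)
y_after = 10 + soil + drought + true_effect * region_a + rng.normal(0.0, 1.0, n)

gap_before = y_before[region_a].mean() - y_before[~region_a].mean()
gap_after = y_after[region_a].mean() - y_after[~region_a].mean()
did = gap_after - gap_before   # double difference removes soil gap and drought
print(round(did, 2))
```

The estimator works here only because the soil gap is the same in both periods; if region A had also been on a different trend, the parallel-trends assumption would fail.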

Page 27: An introduction to Impact Evaluation

Implementing differences in differences in our example…

• Some arbitrary comparison group

• Matched diff-in-diff

• Randomized diff-in-diff

• These are listed in order from most problems to fewest; keep this in mind as we look at them graphically

Page 28: An introduction to Impact Evaluation

As long as the bias is additive and time-invariant, diff-in-diff will work ….

[Figure: outcomes for the treatment group (Y1) and comparison group (Y0) between t=0 and t=1; the comparison group’s change projects a counterfactual Y1*, and the impact is the gap between observed Y1 and Y1*.]

Page 29: An introduction to Impact Evaluation

What if the observed changes over time differ for reasons other than the program?

[Figure: the same diagram when something else shifts the groups’ trends differently; the gap between observed Y1 and the projected Y1* no longer measures the impact.]

Page 30: An introduction to Impact Evaluation

4. Instrumental variables

• Identify a variable that affects participation in the program, but not outcomes conditional on participation (the exclusion restriction)

• Counterfactual: the causal effect is identified out of the exogenous variation of the instrument

• Advantages:

– Does not require the exogeneity assumption of matching

• Disadvantages:

– The estimated effect is local: IV identifies the effect of the program only for the sub-population of those induced to take up the program by the instrument
– Therefore different instruments identify different parameters, and you can end up with different magnitudes of the estimated effects
– The validity of the instrument can be questioned, but cannot be tested
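A simulated sketch of the IV (Wald) estimator, assuming a randomly assigned outreach instrument as in the fertilizer example; all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Outreach (the instrument) is random; take-up depends on outreach AND on
# unobserved ability, so comparing participants to non-participants is biased.
ability = rng.normal(0.0, 1.0, n)
outreach = rng.random(n) < 0.5
p_takeup = 1 / (1 + np.exp(-(ability + 2 * outreach - 1)))
takeup = rng.random(n) < p_takeup
true_effect = 2.0
yields = 5 + 2 * ability + true_effect * takeup + rng.normal(0.0, 1.0, n)

# Naive participant vs. non-participant comparison: biased by ability.
naive = yields[takeup].mean() - yields[~takeup].mean()

# Wald/IV estimator: outcome difference by instrument, scaled by the
# difference in take-up rates the instrument induces.
wald = (yields[outreach].mean() - yields[~outreach].mean()) / (
    takeup[outreach].mean() - takeup[~outreach].mean()
)
print(round(naive, 2), round(wald, 2))
```

With a constant effect the Wald ratio recovers it; with heterogeneous effects it would recover only the effect for those whose take-up the instrument changed (the “local” caveat above).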

Page 31: An introduction to Impact Evaluation

IV in our example

• It turns out that outreach was done randomly, so the timing of farmers’ intake into the program is essentially random

• We can use this as an instrument

• Problems?

– Is it really random? (roads, etc.)

Page 32: An introduction to Impact Evaluation

5. Regression discontinuity design

• Exploit the rule generating assignment into a program given to individuals only above a given threshold

– Assumes a discontinuity in participation but not in counterfactual outcomes

• Counterfactual: individuals just below the cut-off who did not participate

• Advantages:

– Identification is built into the program design
– Delivers the marginal gains from the program around the eligibility cut-off point; important for program expansion

• Disadvantages:

– The threshold has to be applied in practice, and individuals should not be able to manipulate the score used in the program to become eligible
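A simulated sketch of the RDD comparison, assuming a land-size eligibility cut-off as in the fertilizer example. The window comparison below is deliberately crude (real applications fit local regressions on each side of the threshold), and all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Eligibility rule: farms at or below 5 hectares get fertilizer.
land = rng.uniform(0.0, 10.0, n)
treated = land <= 5.0
true_effect = 2.0
yields = 3 + 0.8 * land + true_effect * treated + rng.normal(0.0, 1.0, n)

# Compare units just below vs. just above the cut-off. A simple mean
# comparison in a narrow window carries a small slope bias, which local
# regressions on each side would remove.
h = 0.25                                    # bandwidth around the threshold
just_below = (land > 5 - h) & (land <= 5)   # treated side
just_above = (land > 5) & (land < 5 + h)    # comparison side
rdd = yields[just_below].mean() - yields[just_above].mean()
print(round(rdd, 2))
```

This recovers the effect only at the cut-off, which is the “local effect” problem flagged on the next slide.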

Page 33: An introduction to Impact Evaluation

[Figure 1: Kernel densities of discriminant scores and threshold points, by region (regions 3, 4, 5, 6, 12, 27 and 28). Example from Buddelmeyer and Skoufias, 2005.]

Page 34: An introduction to Impact Evaluation

RDD in our example…

• Back to the eligibility criteria: land size and crop history

• We use those right below the cut-off and compare them with those right above...

• Problems:

– How well enforced was the rule?
– Can the rule be manipulated?
– The effect is local

Page 35: An introduction to Impact Evaluation

What difference do unobservables make: Microfinance in Thailand

• 2 NGOs in north-east Thailand

• Village banks making loans of 1,500-7,500 baht (up to about US$300)

• Borrowers (women) form peer groups, which guarantee individual borrowing

• What would we expect impacts to be?

Page 36: An introduction to Impact Evaluation

Comparison group issues in this case:

• Program placement: villages which are selected for the program are different in observable and unobservable ways

• Individual self-selection: households which choose to participate in the program are different in observable and unobservable ways (e.g. entrepreneurship)

• Design solution: allow membership but no loans at first

Page 37: An introduction to Impact Evaluation

| Outcome | FE model | Non-FE model | Naïve model | Super naïve |
| --- | --- | --- | --- | --- |
| Women’s land value | 42.5 (93.3) | 87.5 (65.3) | 121** (54.6) | 6,916*** (1,974) |
| Women’s self-emp. sales | -10.7 (504) | 174 (364) | 542* (296) | 545* (295) |
| Women’s ag. sales | 76.5 (101) | 162 (73.9) | 101* (59.5) | 113* (59.9) |
| Unobserved village char. | X | | | |
| Observed village char. | X | X | | |
| Member obs. & unobs. char. | X | X | | |
| Member land 5 years ago | X | X | X | |

Results from Coleman (JDE 1999); standard errors in parentheses.

Page 38: An introduction to Impact Evaluation

Prioritizing for Impact Evaluation

• It is not cheap, relative to monitoring

• Possible prioritization criteria:

1. We don’t know if the policy is effective (e.g. conditional cash transfers)
2. Politics (e.g. the Argentina workfare program)
3. It’s a lot of money

• Note that 2 and 3 are variants of not “knowing” in this context

Page 39: An introduction to Impact Evaluation

Summing up: methods

• No clear “gold standard” in reality – do what works best in the context

• Watch for unobservables, but don’t forget observables

• Be flexible, be creative – use the context

• IE requires good monitoring, and monitoring will help you understand the effect size

Page 40: An introduction to Impact Evaluation

Human knowledge and human power meet in one; for where the cause is not known the effect cannot be produced.

-Francis Bacon

Page 41: An introduction to Impact Evaluation

Thank you

Page 42: An introduction to Impact Evaluation

Impact Evaluation CN Template

1. What is the main question we want to answer?

2. What are the indicators we will use to capture this?

3. How will we set up the evaluation (evaluation method, strategy)?

4. What will be our source of data?

5. Who will be responsible for what?

Page 43: An introduction to Impact Evaluation

Impact Evaluation CN Template

6. What is the work plan/time line?

– Consider important policy milestones

7. How will we pay for it?

8. What are the plans for dissemination?

Page 44: An introduction to Impact Evaluation

Figure 1: World Bank impact evaluations, by year and status

[Figure: bar chart of the number of impact evaluations, split into Bank and non-Bank projects, for three periods: before 2004, after 2004, and currently ongoing.]

Page 45: An introduction to Impact Evaluation

* Includes impact evaluations of World Bank projects and impact evaluations funded by the World Bank (activities with multiple papers are counted once); impact evaluations with at least a methodological design

Portfolio of Impact Evaluations*

By region: AFR 47, EAP 17, ECA 2, LAC 26, MENA 2, SAR 27

Page 46: An introduction to Impact Evaluation

* Includes impact evaluations of World Bank projects and impact evaluations funded by the World Bank (activities with multiple papers are counted once); impact evaluations with at least a methodological design

Portfolio of Impact Evaluations*

By theme: CCT and other Social Protection 12; Health, Nutrition & Population 22; Urban Upgrading 8; Youth Programs 2; Other Infrastructure 14; Agriculture & Environment 4; Other 3; Education, ECD & Training 31; Private Sector Development & Microfinance 9; CDD/Social Funds 16

Page 47: An introduction to Impact Evaluation

Status of Ongoing Impact Evaluations

[Figure: bar chart of ongoing impact evaluations by status: under discussion (62), evaluation designed (62), baseline data collected (55), follow-up data collected (8), analysis in progress (20).]