An Introduction to Impact Evaluation in Observational (Non-Experimental) Settings
Alexis Diamond, Development Impact Department



DESCRIPTION

Presentation at "Impact Evaluation for Financial Inclusion" (January 2013). CGAP and the UK Department for International Development (DFID) convened over 70 funders, practitioners, and researchers for a workshop on impact evaluation for financial inclusion in January 2013. Co-hosted by DFID in London, the workshop gave participants an opportunity to engage with leading researchers on the latest impact evaluation methods and to discuss other items on the impact evaluation agenda.


Page 1: Alexis Diamond  - quasi experiments

An Introduction to Impact Evaluation

in Observational (Non-Experimental) Settings

Alexis Diamond

Development Impact Department

Page 2: Alexis Diamond  - quasi experiments

2

Goals for this Presentation

• To explain key differences between randomized experiments

(RCTs) and observational studies

• To briefly sketch some of the most important methods of

causal inference in observational studies, showing how they

might be applied to answer questions in access to finance

projects, and offering practical guidance on:

Matching (an estimator, or a tool for designing observational studies?)

Differences-in-differences

Encouragement design (Instrumental variable, or “IV” regression)

Regression discontinuity design

Synthetic control methods

Page 3: Alexis Diamond  - quasi experiments

3

Basic concepts

• Observational study: comparison of treated and control

groups in which the objective is to estimate cause and effect

relationships, without the benefit of random assignment.

Observational studies are also known as quasi-experiments or

natural experiments

• In a randomized experiment, random chance forms

comparison groups (treatment and control), making groups

comparable in terms of both measurable characteristics and

characteristics that cannot be measured.

• Generally, if assumptions are met, causal conclusions follow—

but generally only in randomized experiments do we KNOW

assumptions are met; otherwise, assumptions aren’t testable.

Page 5: Alexis Diamond  - quasi experiments

5

Selection bias: "perfect implementation"

i Yi(observed) Treatment Status

1 5 Treatment

2 6 Treatment

3 4 Treatment

4 4 Control

5 2 Control

6 6 Control

A microfinance project is reporting the ex-post impact

indicator $/day for participants and non-participants…

Page 6: Alexis Diamond  - quasi experiments

6

Selection bias: "perfect implementation"

i Yi(observed) Treatment Status

1 5 Treatment

2 6 Treatment

3 4 Treatment

4 4 Control

5 2 Control

6 6 Control

A microfinance project is reporting the ex-post impact

indicator $/day for participants and non-participants…

Average for the treatment group: $5/day

Page 7: Alexis Diamond  - quasi experiments

7

Selection bias: "perfect implementation"

i Yi(observed) Treatment Status

1 5 Treatment

2 6 Treatment

3 4 Treatment

4 4 Control

5 2 Control

6 6 Control

A microfinance project is reporting the ex-post impact

indicator $/day for participants and non-participants…

Average for the treatment group: $5/day

Average for the control group: $4/day

Page 8: Alexis Diamond  - quasi experiments

8

Selection bias: "perfect implementation"

i Yi(observed) Treatment Status

1 5 Treatment

2 6 Treatment

3 4 Treatment

4 4 Control

5 2 Control

6 6 Control

A microfinance project is reporting the ex-post impact

indicator $/day for participants and non-participants…

Difference = +$1/day:

Average for the treatment group: $5/day

Average for the control group: $4/day

Page 9: Alexis Diamond  - quasi experiments

9

Selection bias: "perfect implementation"

How should one think about that result, +$1/day?

Does it mean the project has positive impact?

Page 10: Alexis Diamond  - quasi experiments

10

Selection bias: "perfect implementation"

i Yi Yi(1) Yi(0) Treatment Status Yi(1) – Yi(0)

1 5 5 ? Treatment ?

2 6 6 ? Treatment ?

3 4 4 ? Treatment ?

4 4 ? 4 Control ?

5 2 ? 2 Control ?

6 6 ? 6 Control ?

How should one think about that result, +$1/day?

Does it mean the project has positive impact?

Impact on whom?

Page 11: Alexis Diamond  - quasi experiments

11

Selection bias: "perfect implementation"

i Yi Yi(1) Yi(0) Treatment Status Yi(1) – Yi(0)

1 5 5 2 Treatment +3

2 6 6 3 Treatment +3

3 4 4 1 Treatment +3

4 4 3 4 Control -1

5 2 1 2 Control -1

6 6 5 6 Control -1

Avg Treatment Effect for Treated (ATT) = 3

Avg Treatment Effect for Control (ATC) = -1

Avg Treatment Effect (ATE) = average over all six units = ½(+3) + ½(−1) = +1

That simple $1/day difference we identified earlier = ATT + BIAS (here, +$3 + (−$2) = +$1)
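
For concreteness, the arithmetic on this slide can be reproduced in a few lines. The following is a minimal sketch in Python (pandas assumed available); both potential outcomes are "known" here only because this is a teaching example, never in a real study.

```python
import pandas as pd

# Hypothetical potential outcomes from the table above (teaching example only:
# in reality we never observe both Y(1) and Y(0) for the same unit).
df = pd.DataFrame({
    "y1":      [5, 6, 4, 3, 1, 5],   # outcome if treated
    "y0":      [2, 3, 1, 4, 2, 6],   # outcome if not treated
    "treated": [1, 1, 1, 0, 0, 0],
})
df["effect"] = df["y1"] - df["y0"]

att = df.loc[df.treated == 1, "effect"].mean()   # +3
atc = df.loc[df.treated == 0, "effect"].mean()   # -1
ate = df["effect"].mean()                        # +1

# The naive comparison only ever sees the observed outcome:
y_obs = df["y1"].where(df.treated == 1, df["y0"])
naive = y_obs[df.treated == 1].mean() - y_obs[df.treated == 0].mean()   # +1

bias = naive - att            # -2: the naive difference equals ATT + bias
print(att, atc, ate, naive, bias)
```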

Page 12: Alexis Diamond  - quasi experiments

12

Selection bias: Ignore it at your peril

i Yi Yi(1) Yi(0) Treatment Status Yi(1) – Yi(0)

1 5 5 ? Treatment ?

2 6 6 ? Treatment ?

3 4 4 ? Treatment ?

4 4 ? 4 Control ?

5 2 ? 2 Control ?

6 6 ? 6 Control ?

Identifying impacts requires identifying Y(1) and Y(0) for the same units

BIAS can be positive/negative, big/small, observed/hidden…

Page 13: Alexis Diamond  - quasi experiments

13

Observational studies: Are they credible? Yes, but…

A judgment-free method for dealing with problems of

sample selection bias is the Holy Grail of the evaluation

literature, but this search reflects more the aspirations of

researchers than any plausible reality…

—Rajeev Dehejia, "Practical Propensity Score Matching"

• Some have tried to set up tests for observational methods:

e.g., “Can method X (matching, regression, IV, etc.) recover

the true (experimental) benchmark?”

• Such efforts have generally failed to conclusively validate

observational studies.

Page 14: Alexis Diamond  - quasi experiments

14

Observational studies: Are they credible? Yes, but…

History abounds with examples where causality has

ultimately found general acceptance without any

experimental evidence…

The evidence of a causal effect of smoking on lung

cancer is now generally accepted, without any direct

experimental evidence to support it…

At the same time, the long road toward general

acceptance of the causal interpretation …shows the

difficulties in gaining acceptance for causal claims

without randomization.

—Guido Imbens, "Better LATE Than Nothing"

Page 15: Alexis Diamond  - quasi experiments

15

Why bother with observational studies?

• Studies that start as perfect RCTs often end as broken RCTs,

not “gold-standard” RCTs. These broken RCTs may be better

than many observational studies, but there is no bright line

distinguishing broken RCTs from observational studies.

• Standard RCTs cannot address many important policy issues

(e.g., macroeconomic questions, or cases with general

equilibrium effects more broadly)

• Other issues are difficult to address with RCTs, setting up a

trade-off between rigor and relevance. What’s better—the

RCT in a lab setting, or the equivalent observational study?

• RCTs are often more expensive, time-consuming, and fragile

than alternatives—can be high risk and not always strategic.

Page 16: Alexis Diamond  - quasi experiments

16

More advantages of observational studies

• Sometimes you can use pre-existing data, which has time and

cost advantages (though there are clear trade-offs)

o Typical out-of-pocket time/cost of a World Bank RCT: > 1 year & $500K

o Occasionally they can be done cheaply and easily, especially in a place

like India (there are examples where it costs < $50,000)

o With administrative data, observational studies may have no (or trivial)

out-of-pocket costs, and be completed in days or weeks.

• Sometimes you want to apply observational methods to

experimental data

• Good for hypothesis-generation

• Avoids RCT’s ethical considerations

Page 17: Alexis Diamond  - quasi experiments

17

Methodology #1: Matching

You: My clients enjoy big impacts from our bank’s financing

Critic: Compared to whom? Where’s the control group?

You: Ok, I’ll go find one—and then you’ll see!

Page 18: Alexis Diamond  - quasi experiments

18

Methodology #1: Matching

You: My clients enjoy big impacts from our bank’s financing

Critic: Compared to whom? Where’s the control group?

You: Ok, I’ll go find one—and then you’ll see!

E.g., Boonperm/Haughton, “Thailand Village Fund” (2009)

[Figure: One treated unit and three candidate controls (Controls 1–3) plotted in covariate space, X1: Education vs. X2: Age.]

Page 19: Alexis Diamond  - quasi experiments

19

Methodology #1: Matching

[Figure: The same treated unit and Controls 1–3 plotted in (X1: Education, X2: Age) space, before and after rescaling X1 (Education multiplied by 2). The relative distances from the treated unit to the controls change with the rescaling, so which control is "nearest" depends on the scale/metric chosen.]

You: My clients enjoy big impacts from our bank’s financing

Critic: Compared to whom? Where’s the control group?

You: Ok, I’ll go find one—and then you’ll see!

E.g., Boonperm/Haughton, “Thailand Village Fund” (2009)
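
To make the rescaling point concrete, here is a small Python/NumPy illustration (the covariate values are invented for illustration, not taken from the slide): the control that is "nearest" to the treated unit can change when one covariate is rescaled.

```python
import numpy as np

# Invented covariates [education (years), age (years)] for one treated unit
# and three candidate controls.
treated  = np.array([12.0, 30.0])
controls = np.array([[14.0, 31.0],    # Control 1
                     [12.0, 40.0],    # Control 2
                     [13.0, 33.0]])   # Control 3

def nearest(t, C):
    """Index (0-based) of the control closest to t in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(C - t, axis=1)))

print("Original scale: nearest is Control", nearest(treated, controls) + 1)   # Control 1

# Rescale education (multiplied by 2, as in the slide): the match flips.
scale = np.array([2.0, 1.0])
print("Education x 2:  nearest is Control",
      nearest(treated * scale, controls * scale) + 1)                         # Control 3
```

This is one reason matching methods pay so much attention to the distance metric (e.g., Mahalanobis distance or propensity scores) rather than raw Euclidean distance on unscaled covariates.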

Page 20: Alexis Diamond  - quasi experiments

20

Matching: Points to consider

• Matching is (unfortunately) as much art as science, and there

are more methodological varieties of matching than there are

flavors of ice cream

• Widespread agreement that matching is, at a minimum, a

useful pre-processing step to reduce model dependence.

Unfortunately, no consensus on balance tests/diagnostics (see the sketch after this list).

• Hugely important benefit of matching is that it is performed

“blind to the answer”—comparing favorably with regression

• Matching helps with selection bias due to observed variables

(confounders)—it does not help with unobserved confounders.

For the latter, one can (and should) do sensitivity analysis.
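
As a deliberately simplified sketch of matching as pre-processing (one of many possible variants): estimate a propensity score, match each treated unit to its nearest control, and check covariate balance before ever looking at outcomes. The function and column names below are placeholders, and scikit-learn is assumed available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ps_match(df, treat_col, covariates):
    """1:1 nearest-neighbour matching on an estimated propensity score,
    with replacement. Returns positional indices of treated units and
    of their matched controls."""
    X = df[covariates].to_numpy()
    t = df[treat_col].to_numpy()
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    treated  = np.where(t == 1)[0]
    controls = np.where(t == 0)[0]
    dist = np.abs(ps[treated][:, None] - ps[controls][None, :])
    return treated, controls[dist.argmin(axis=1)]

def smd(x_t, x_c):
    """Standardized mean difference: one common (not universal) balance diagnostic."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd
```

Note that the outcome never appears in either function: balance is assessed, and the design frozen, while still "blind to the answer."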

Page 21: Alexis Diamond  - quasi experiments

21

Methodology #2: Differences-in-Differences (D-i-D)

You: My clients enjoy big impacts from our bank’s financing

Critic: Compared to whom? Where’s the control group?

You: Ok, I’ll go find one—and then you’ll see!

Critic: Too many unobservables. It’s a waste of time.

You: Well, can you assume my control group’s growth rate

(e.g., near zero) is a good proxy for the treatment

group’s counterfactual growth rate (without the loan)?

D-i-D: subtract one before/after difference from the other

Addresses observed confounders (under regression assumptions) and

unobserved time-invariant confounders, provided any shocks over the period

are common to the treatment and control groups (parallel trends). See Kondo’s work in the Philippines (ADB).
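
A minimal sketch of the estimator in Python (statsmodels), assuming a long panel with hypothetical columns y (outcome), treated, post, and unit_id. The interaction coefficient from the regression is numerically the same as the "difference of before/after differences" computed from the four group means, but it comes with a standard error.

```python
import statsmodels.formula.api as smf

def did_by_means(df):
    """(after - before) for the treated group, minus (after - before) for controls."""
    m = df.groupby(["treated", "post"])["y"].mean()
    return (m.loc[(1, 1)] - m.loc[(1, 0)]) - (m.loc[(0, 1)] - m.loc[(0, 0)])

def did_by_ols(df):
    """Same estimate via regression; clustering by unit is a common (not the only) choice."""
    fit = smf.ols("y ~ treated * post", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["unit_id"]})
    return fit.params["treated:post"], fit.bse["treated:post"]
```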

Page 22: Alexis Diamond  - quasi experiments

22

Diffs-in-Diffs: Points to consider

[Figure: Difference-in-differences diagram. Income (vertical axis) before and after treatment (horizontal axis) for the treated and control groups. Circles are observed; the square (counterfactual) is unobserved and imputed by shifting the treated group's pre-treatment level by the control group's before/after change, holding the pre-treatment difference fixed. The gap between the observed treated outcome and this counterfactual is the estimated ATET.]

Page 23: Alexis Diamond  - quasi experiments

23

Diffs-in-Diffs: Points to consider

• If matching is implausible, why would D-i-D be plausible?

Does the parallel trend assumption seem easier to believe?

• The parallel trend assumption must hold over the time

period, implying that the composition of the two groups should remain

constant over time.

• D-i-D benefits from “placebo tests” run pre-treatment
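
One common pre-treatment placebo check, sketched below under the same hypothetical column names as the D-i-D example above: re-estimate the model on pre-treatment periods only, pretending treatment began at an earlier date. A sizeable "effect" in that exercise is a warning sign about parallel trends; a near-zero estimate is reassuring, though never proof.

```python
import statsmodels.formula.api as smf

def placebo_did(df, true_start, fake_cutoff, time_col="year"):
    """Placebo D-i-D on pre-treatment data only, with a fake treatment date
    fake_cutoff (< true_start). Column names are illustrative placeholders."""
    pre = df[df[time_col] < true_start].copy()
    pre["post"] = (pre[time_col] >= fake_cutoff).astype(int)
    fit = smf.ols("y ~ treated * post", data=pre).fit()
    return fit.params["treated:post"], fit.pvalues["treated:post"]
```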

Page 24: Alexis Diamond  - quasi experiments

24

Methodology #3: Encouragement Design

You: Well, can you assume my control group’s growth rate

(e.g., near zero) is a good proxy for the treatment

group’s counterfactual growth rate (without the loan)?

Critic: No, also not credible.

You: OK, how about a natural experiment?

Our FI established additional info kiosks in 100 villages

to encourage loan take-up—these villages were not

chosen at random, but it was "practically" random.

The encouragement (“instrument”, assumed “as good as

random”) has an effect (for some) on probability of finance.

The method leverages this “exogenous” variation to overcome

potential bias from both observed and unobserved confounders.
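
With a binary encouragement (e.g., "village got an info kiosk"), the simplest estimator is the Wald/IV ratio sketched below: the effect of encouragement on the outcome divided by its effect on take-up. With covariates one would normally move to two-stage least squares (a dedicated IV routine), and the whole exercise stands or falls on the assumptions listed on the next slide.

```python
import numpy as np

def wald_late(y, d, z):
    """Wald/IV estimate of the local average treatment effect (LATE).
    y: outcome, d: take-up of finance (0/1), z: encouragement (0/1).
    Only meaningful if z is as-good-as-random, affects y only through d,
    and never discourages take-up (no defiers)."""
    y, d, z = map(np.asarray, (y, d, z))
    itt_y = y[z == 1].mean() - y[z == 0].mean()   # "reduced form"
    itt_d = d[z == 1].mean() - d[z == 0].mean()   # "first stage" (effect on take-up)
    return itt_y / itt_d
```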

Page 25: Alexis Diamond  - quasi experiments

25

Encouragement Design: Points to consider

• Encouragement design requires strong assumptions:

o Encouragement must really be random or almost random, and must

have no direct effect on impacts (only an indirect effect via treatment)

o The encouragement must NEVER discourage take-up (no defiers)

o Causal estimates are restricted to “compliers” only (those whose take-up is induced by the encouragement)

o Also, for credible results, the encouragement had better be effective (a strong first stage)

• Strange quirk: different answers, from different models, can

all be “correct” because complier populations may differ

• Once popular, this approach is now viewed more skeptically in observational work

• Again, sensitivity tests are available and should be run

Page 26: Alexis Diamond  - quasi experiments

26

Methodology #4: Regression Discontinuity Design

You: OK, how about a natural experiment?

Our FI established additional info kiosks in 100 villages

to encourage loan take-up—these villages were not

chosen at random, but it was "practically" random.

Critic: I don’t buy it. Rollout was in fact strategic, not random.

You: Ok, I’ll try again. This bank always provides extra lines

of credit at great terms to customers with credit scores

above a certain threshold. Let’s compare results for

customers just above and below the threshold.

Treatment assumed as good as random at the threshold if the

discontinuity is sharp. RDD addresses observed and unobserved

confounders. What question will the RDD design above answer?
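
A stripped-down sharp-RD sketch (Python/statsmodels): keep observations within a bandwidth of the credit-score cutoff, fit separate linear trends on each side, and read off the jump at the threshold. The column names, cutoff, and bandwidth are illustrative placeholders; in practice one would vary the bandwidth and might use specialized tooling (e.g., the rdrobust packages) for bandwidth selection and robust inference.

```python
import statsmodels.formula.api as smf

def sharp_rdd(df, running="credit_score", outcome="y",
              cutoff=650.0, bandwidth=25.0):
    """Local linear sharp-RD estimate: the coefficient on `above` is the
    estimated jump in the outcome at the threshold (placeholder names/values)."""
    d = df[(df[running] - cutoff).abs() <= bandwidth].copy()
    d["x"] = d[running] - cutoff          # centre the running variable at the cutoff
    d["above"] = (d["x"] >= 0).astype(int)
    fit = smf.ols(f"{outcome} ~ above * x", data=d).fit()
    return fit.params["above"], fit.bse["above"]
```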

Page 27: Alexis Diamond  - quasi experiments

27

Regression Discontinuity Design: Points to consider

• Generally considered a very strong design: US Dept of

Education classifies it in the same category as RCT

• Only informative for those at the discontinuity threshold

• No “gaming” the threshold allowed (ideally, the threshold is

unknown to the subjects, or outside subjects’ control)

• Relatively low statistical power, requiring much larger

sample sizes than RCTs or other observational methods.

• Watch out for contamination by other treatments at the same

discontinuity

• Sensitivity tests available to probe plausibility of assumptions

Page 28: Alexis Diamond  - quasi experiments

28

Methodology #5: Synthetic control method

Critic: I don’t buy it. It must’ve been strategic, not random.

You: Ok, I’ll try again. This bank always offers extra lines of

credit at great terms to customers with credit scores

above a certain threshold. Let’s compare results for

customers just above and below the threshold.

Critic: I’m not interested in only a narrow set of borrowers.

You: Last try. How about we do an in-depth case-study of a

greenfield microfinance institution, asking about the

social welfare impact on the neighboring community?

The synthetic control method allows inference for a single treated unit.

This approach addresses observed and unobserved confounders.
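
To give a sense of the mechanics, here is a stripped-down sketch (Python/SciPy) of the core optimization: choose non-negative control weights that sum to one so that the weighted average of controls tracks the treated unit's pre-treatment outcome path; the post-treatment gap between the treated unit and this "synthetic" unit is the estimated impact. The full method (Abadie, Diamond, and Hainmueller) also matches on covariates and chooses variable weights, which this sketch omits.

```python
import numpy as np
from scipy.optimize import minimize

def synth_weights(y_treated_pre, Y_controls_pre):
    """Non-negative weights (summing to 1) over control units that best
    reproduce the treated unit's pre-treatment outcome path.
    y_treated_pre: (T_pre,) array; Y_controls_pre: (T_pre, J) array."""
    _, J = Y_controls_pre.shape
    loss = lambda w: np.sum((y_treated_pre - Y_controls_pre @ w) ** 2)
    res = minimize(loss, x0=np.full(J, 1.0 / J), method="SLSQP",
                   bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x

# Estimated post-period impact path: y_treated_post - Y_controls_post @ weights
```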

Page 29: Alexis Diamond  - quasi experiments

29

Methodology #5: Synthetic control method

[Figure: Estimating Average Impact on Household Consumption in a Single Village. Per capita expenditures (vertical axis) by year, 1995–2010, for the treated district (Kabil) and its synthetic control district.]

Page 30: Alexis Diamond  - quasi experiments

30

Synthetic controls: Points to consider

• Only method allowing for rigorous quantitative causal

inference for a single treated unit

• Enormous growth in popularity over the last five years

• Particularly well-suited to case-studies exploring program

impacts at village/city/state/country level

• Requires time-series data and many control units

• Placebo tests are available to assess plausibility of critical

assumptions
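
One standard placebo-in-space exercise, sketched below (reusing synth_weights from the earlier sketch, with hypothetical array inputs): treat each control unit as if it had been treated, re-fit its synthetic control, and compare post-period gaps. A causal reading is more credible when the actually-treated unit's gap looks extreme relative to the placebo gaps.

```python
import numpy as np

def in_space_placebos(Y_pre, Y_post):
    """Y_pre, Y_post: (T, N) outcome arrays (pre- and post-period, N units).
    For each unit j, fit a synthetic control from the other units and record
    its post-period gap. Compare the treated unit's gap with the rest."""
    gaps = []
    for j in range(Y_pre.shape[1]):
        donors = [k for k in range(Y_pre.shape[1]) if k != j]
        w = synth_weights(Y_pre[:, j], Y_pre[:, donors])
        gaps.append(Y_post[:, j] - Y_post[:, donors] @ w)
    return np.array(gaps)   # row j = post-period gap path for unit j
```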

Page 31: Alexis Diamond  - quasi experiments

31

Elaborate theories, multiple tests

• Creating/testing elaborate theories is particularly helpful for indirectly testing for hidden biases (unconfoundedness).

When asked what can be done in observational studies to clarify the step from association to causation, Fisher replied: "Make your theories elaborate." (Cochran)

This is sage advice, but often misunderstood. Fisher didn't mean you should make your theories and explanations complicated.

He meant: when constructing a causal hypothesis, envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each holds.

Page 32: Alexis Diamond  - quasi experiments

32

Final thoughts

Page 33: Alexis Diamond  - quasi experiments

33

Final thoughts

• Ex-ante, be clear as to standard of evidence (going to depend upon the

purpose of your inquiry, and who your audience is)

Page 34: Alexis Diamond  - quasi experiments

34

Final thoughts

• Ex-ante, be clear as to standard of evidence (going to depend upon the

purpose of your inquiry, and who your audience is)

• Also ex-ante, be clear re treatment, covariates, units, and assumptions.

Page 35: Alexis Diamond  - quasi experiments

35

Final thoughts

• Ex-ante, be clear as to standard of evidence (going to depend upon the

purpose of your inquiry, and who your audience is)

• Also ex-ante, be clear re treatment, covariates, units, and assumptions.

• Try to adjust for (eliminate) differences in observed characteristics

while remaining blind to the answer.

Page 36: Alexis Diamond  - quasi experiments

36

Final thoughts

• Ex-ante, be clear as to standard of evidence (going to depend upon the

purpose of your inquiry, and who your audience is)

• Also ex-ante, be clear re treatment, covariates, units, and assumptions.

• Try to adjust for (eliminate) differences in observed characteristics

while remaining blind to the answer.

• Run diagnostics/sensitivity tests for unobserved (hidden) bias

Page 37: Alexis Diamond  - quasi experiments

37

Final thoughts

• Ex-ante, be clear as to standard of evidence (going to depend upon the

purpose of your inquiry, and who your audience is)

• Also ex-ante, be clear re treatment, covariates, units, and assumptions.

• Try to adjust for (eliminate) differences in observed characteristics

while remaining blind to the answer.

• Run diagnostics/sensitivity tests for unobserved (hidden) bias

• Devise/test multiple "elaborate theories". Invest in learning about the

substantive problem to be solved, and be skeptical of your own results.