Regression Discontinuity 10/13/08




What is R.D.?

• Regression--the econometric/statistical tool social scientists use to analyze multivariate correlations

$Y_i = \alpha + X_{1i}\beta_1 + X_{2i}\beta_2 + e_i$

Where Y is some sort of dependent variable, alpha’s a constant, the X’s are a bunch of independent variables, the beta’s are coefficients, and the e is the error term.
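To make the equation concrete, here is a minimal sketch that fits it by least squares on made-up data. The helper name `fit_ols`, the sample size, and the "true" coefficients (1.0, 2.0, -0.5) are all assumptions for illustration, not anything from a real study.

```python
import numpy as np

def fit_ols(y, *xs):
    """Least-squares fit of y on a constant plus the given regressors.

    Returns the coefficients in the order [alpha, beta_1, beta_2, ...].
    """
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
e = rng.normal(size=n)               # error term, independent of the X's by construction
y = 1.0 + 2.0 * x1 - 0.5 * x2 + e    # made-up "truth"

alpha, beta1, beta2 = fit_ols(y, x1, x2)
print(alpha, beta1, beta2)           # estimates should land close to 1.0, 2.0, -0.5
```

The estimates come out close to the truth here only because the error term was generated independently of the X's.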


Discontinuity

Some sort of arbitrary jump/change thanks to a quirk in law or nature.

We’re interested in the ones that make very similar people get very dissimilar results.


Discontinuity Examples

• PSAT/NMSQT

– Basically the top 16,000 test-takers get a scholarship.

– A small difference in test score can mean a discontinuous jump in scholarship amount.


Discontinuity Examples

• School Class Size

– Maimonides’ Rule--No more than 40 kids in a class in Israel.

– 40 kids enrolled means one class of 40. 41 kids means two classes with 20 and 21.

(Angrist & Lavy, QJE 1999)
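The rule is just arithmetic, and a quick sketch shows where the jumps fall. The function name `maimonides_class_size` is made up for illustration, and the sketch assumes a school always opens the fewest classes the cap allows and splits kids evenly.

```python
def maimonides_class_size(enrollment, cap=40):
    """Average class size when a cohort is split into the fewest classes of at most `cap` kids."""
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes

for kids in (39, 40, 41, 80, 81):
    print(kids, maimonides_class_size(kids))
# 39 39.0
# 40 40.0
# 41 20.5
# 80 40.0
# 81 27.0
```

One extra kid at 40 or 80 cuts the average class size sharply, and that is the discontinuity.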


Discontinuity Examples

• Union Elections

– If employees want to unionize, the NLRB holds an election. 50% means the employer doesn’t have to recognize the union, and 50% + 1 means the employer is required to “bargain in good faith” with the union.

(DiNardo & Lee, QJE 2004)


Discontinuity Examples

• U.S. House Elections

– Incumbency advantage. If you’re first past the post in the previous election, even by just one vote, you get a huge advantage in the next election.

(David Lee, Journal of Econometrics 2007)


Discontinuity Examples

• Air Pollution and Home Values

– The Clean Air Act’s National Ambient Air Quality Standards say that if the geometric mean concentration of 5 pollutant particulates is 75 micrograms per cubic meter or greater, a county is classified as “non-attainment” and is subject to much more stringent regulation.

(Ken Chay, Michael Greenstone, JPE 2005)


Combine the “R” and the “D”

Run a regression based on a situation where you’ve got a discontinuity.

Treat above-the-cutoff and below-the-cutoff like the treatment and control groups from a randomization.


Why are we doing this?

Why do we have to look for quirks like this?

Can’t we just control for whatever we want using OLS or some other line-fitting tool?

Just get a bunch of people’s salaries and PSAT scores. PSAT’s are X, income is Y, run a regression in SPSS/Stata, or heck, even Excel, and we have causal inference, right? Higher test scores cause people to earn more later in life.

$Y_i = \alpha + X_{1i}\beta_1 + X_{2i}\beta_2 + e_i$


No.

The statistical methods we use are based on a lot of assumptions. Importantly, the error term (which is really full of things we can’t measure, the unobservables) is supposed to be uncorrelated with the X’s and normally distributed.

In reality, those conditions probably haven’t been met in any of the previous situations.

For example, class size is probably correlated with some type of neighborhood quality.

Please turn to your neighbor and discuss what is probably wrong with each of the previous 5 examples (PSAT, class size, union elections, House elections, air pollution).



• Higher PSAT kids might have higher ability.

• Crowded classrooms might be in poorer schools.

• Unionized workers might work for certain types of firms.

• Incumbent politicians might be better. They won before, didn’t they?

• Pollution might be correlated with economic growth, which could increase home values.
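Here is a small simulation of the first bullet, as a sketch only. Everything in it is made up: an unobservable `ability` drives both the PSAT score and later income, and the score itself has zero causal effect. The naive regression still finds a big positive "effect" (about 1.5 in expectation with these particular numbers, since cov(income, psat) / var(psat) = 3 / 2).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
ability = rng.normal(size=n)                  # the unobservable
psat = ability + rng.normal(size=n)           # score partly reflects ability
income = 3.0 * ability + rng.normal(size=n)   # the score itself has NO causal effect

def slope(y, x):
    """OLS slope from regressing y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive = slope(income, psat)
print(naive)   # clearly positive, even though the true effect of psat is 0
```

The error term here contains `ability`, which is correlated with the X, so the coefficient picks up ability rather than any effect of the score.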


Controlling for everything?

Focus on the Israeli schools for a second.

We can try and control for neighborhood poverty level.

Does that solve the problem?

No.

If neighborhood poverty level is correlated with the X of interest (class size), why would you think it’s safe to assume that the unobservables aren’t correlated? Have you really magically controlled for every single thing that’s correlated with the X of interest? Probably not.

So let’s find a bandwidth in which these things are uncorrelated.


A Bandwidth of Randomness

Test scores aren’t random, and neither is class size, nor air pollution.

But is a kid in the 94.9th percentile really that different from the 95th percentile kid?

Is a school with 40 kids that different from a school with 41?

Right around the cutoff, there’s a good chance things are random.


No Sorting - Observables

But don’t take my word for it. Look at the averages of the observables in your below cutoff group, and the averages of the observables in the above cutoff group. Are they the same? Hopefully, but maybe not.

Do people know about this cutoff? Are they doing some endogenous sorting? When deciding where to live, did good moms look for schools where their kids would be the 41st kid? Did certain types of polluters look for counties where they’d be below the cutoff?

These things can be checked to some degree--look at the average observables above and below the cutoff.
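A rough sketch of that check, on simulated data. The helper `balance_gap`, the covariate `parent_income`, the cutoff of 50, and the bandwidths are all hypothetical. The covariate rises smoothly with the score, so the two groups look very different over the whole range but nearly identical within a point of the cutoff.

```python
import numpy as np

def balance_gap(running, covariate, cutoff, bandwidth):
    """Mean of a covariate above the cutoff minus its mean below, using only
    observations within `bandwidth` of the cutoff. Also returns a rough
    standard error for that difference."""
    running = np.asarray(running, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    near = np.abs(running - cutoff) <= bandwidth
    above = covariate[near & (running >= cutoff)]
    below = covariate[near & (running < cutoff)]
    diff = above.mean() - below.mean()
    se = np.sqrt(above.var(ddof=1) / len(above) + below.var(ddof=1) / len(below))
    return diff, se

rng = np.random.default_rng(2)
n = 50_000
score = rng.uniform(0, 100, n)
parent_income = 0.5 * score + rng.normal(0, 10, n)   # made-up observable

wide_diff, wide_se = balance_gap(score, parent_income, cutoff=50, bandwidth=50)
narrow_diff, narrow_se = balance_gap(score, parent_income, cutoff=50, bandwidth=1)
print(wide_diff, wide_se)       # large gap: the groups differ a lot overall
print(narrow_diff, narrow_se)   # small gap relative to its standard error
```

The narrow window buys balance at the cost of a much smaller sample, so its standard error is larger.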


No Sorting - Clumping

In addition to checking the observables on either side of the cutoff, we should check the density of the distribution. Is it unusually low/high right around the cutoff?

If there’s some abnormally large portion of people right around the cutoff, it’s quite possible that you don’t have random assignment.
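A crude sketch of a clumping check: count how many observations fall just below versus just above the cutoff and ask whether the split is far from 50/50. This is a stand-in for a formal density test, not a replacement for one. The helper `clumping_z`, the window width, and the simulated cheating are all made up, and the 50/50 benchmark assumes the density is roughly flat inside the window.

```python
import numpy as np

def clumping_z(running, cutoff, width):
    """z-score for (count just below) minus (count just above) the cutoff.

    With no sorting and a locally flat density, each observation in the
    window is about equally likely to land on either side, so the count
    difference has variance equal to the number of observations in the window.
    """
    running = np.asarray(running, dtype=float)
    below = np.sum((running >= cutoff - width) & (running < cutoff))
    above = np.sum((running >= cutoff) & (running < cutoff + width))
    return (below - above) / np.sqrt(below + above)

rng = np.random.default_rng(4)
honest = rng.uniform(0, 100, 20_000)

# Made-up manipulation: half of those scoring just above 50 get nudged below it.
nudged = honest.copy()
cheaters = (honest >= 50) & (honest < 52) & (rng.random(honest.size) < 0.5)
nudged[cheaters] -= 2.0

z_honest = clumping_z(honest, cutoff=50, width=2)
z_nudged = clumping_z(nudged, cutoff=50, width=2)
print(z_honest, z_nudged)   # the nudged data should show a far larger z-score
```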


No Sorting - Clumping

Dude, you’re totally cheating. Please stop.

Emily Conover & Adriana Camacho “Manipulation of Social Program Eligibility”


GSP--Multiple Analyses

“Incentives to Learn,” Ted Miguel, Michael Kremer, Rebecca Thornton

Girls Scholarship Program, Busia Kenya.

Randomize holding a scholarship competition across schools in Busia and Teso districts.

Treatment: If a girl finishes in the top 15% in her district on the end-of-year exam, she wins a two-year scholarship.

Randomization Analysis: Does attending a school with the competition make you work harder/improve schooling outcomes?

RD Analysis: Does winning the award improve schooling outcomes?


P-900 in Chile

“The Central Role of Noise in Evaluating Interventions That Use Test Scores to Rank Schools” Kenneth Y. Chay, Patrick J. McEwan, Miguel Urquiola, AER 2005

Mean Reversion: Sophomore Slump, SI Cover Curse, Heisman Trophy Curse, Madden Curse, and the same thing in the opposite direction.
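Mean reversion is easy to fake up in a simulation. In this sketch (all numbers invented, and no program exists anywhere in the data), each school's score is a persistent quality plus one-off noise. Pick the schools with the lowest scores in the first year and they "improve" in the later year all by themselves.

```python
import numpy as np

rng = np.random.default_rng(3)
n_schools = 5_000
quality = rng.normal(50, 5, n_schools)                # persistent school quality
score_1988 = quality + rng.normal(0, 5, n_schools)    # quality plus one-off noise
score_1990 = quality + rng.normal(0, 5, n_schools)    # fresh noise, no program at all

picked = score_1988 < np.percentile(score_1988, 20)   # "treat" the lowest 20% in 1988
gain_picked = (score_1990 - score_1988)[picked].mean()
gain_rest = (score_1990 - score_1988)[~picked].mean()
print(gain_picked, gain_rest)   # picked schools gain; the rest slip back a bit
```

Schools land in the bottom group partly because of bad luck, and bad luck doesn't repeat on schedule.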


THIS IS THE MOST AMAZING THING EVER!

HOLY CRAP! Look at the educational outcomes of treatment schools in 1990, compared to those same schools in 1988, before the program. AMAZING! FANTABULOUS!


Oh, wait.

Hmm. That’s kind of disappointing.


So how do we actually do this?

1. Draw two pretty pictures

   1. Eligibility criterion (test score, income, or whatever) vs. Program Enrollment

   2. Eligibility criterion vs. Outcome

[Figure 1: Participation in PANES and eligibility. x-axis: standardized SES, -.02 to .02; y-axis: 0 to 1.]

[Figure 2: Political support for the government and program eligibility. x-axis: standardized SES, -.02 to .02; y-axis: .5 to .9.]
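Pictures like these are usually built from bin averages: chop the eligibility criterion into narrow bins that never straddle the cutoff, then plot the mean of enrollment (or the outcome) in each bin. A sketch on simulated data follows. The helper `binned_means`, the enrollment rates, and the choice that units below the cutoff are the eligible ones are assumptions for illustration, not the actual data behind the figures.

```python
import numpy as np

def binned_means(running, y, cutoff, bin_width, n_bins):
    """Mean of y in equal-width bins of the running variable, n_bins on each
    side of the cutoff. Returns (bin_centers, means); empty bins are skipped."""
    running = np.asarray(running, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = cutoff + bin_width * np.arange(-n_bins, n_bins + 1)
    centers, means = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (running >= lo) & (running < hi)
        if in_bin.any():
            centers.append((lo + hi) / 2)
            means.append(y[in_bin].mean())
    return np.array(centers), np.array(means)

rng = np.random.default_rng(8)
n = 10_000
ses = rng.uniform(-0.02, 0.02, n)
eligible = ses < 0                     # assumption: eligible below the cutoff
enrolled = np.where(eligible, rng.random(n) < 0.95, rng.random(n) < 0.02).astype(float)

centers, means = binned_means(ses, enrolled, cutoff=0.0, bin_width=0.005, n_bins=4)
for c, m in zip(centers, means):
    print(f"{c:+.4f}  {m:.2f}")        # enrollment share drops sharply at 0
```

Feed `centers` and `means` to whatever plotting tool you like; the same function works for the outcome picture.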


So how do we actually do this?

2. Run a simple regression.

(Yes, this is basically all we ever do, and the stats programs we use can run the calculation in almost any situation, but before we do it, it’s necessary to make sure the situation is appropriate and draw the graphs so that we can have confidence that our estimates are actually causal.)

Outcome as a function of test score (or whatever), with a binary (1 if yes, 0 if no) variable for program enrollment.
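As a sketch of that regression on made-up data: the helper `rd_simple`, the cutoff of 50, and the true jump of 4 are all assumptions for illustration, and the setup assumes everyone at or above the cutoff enrolls and nobody below does.

```python
import numpy as np

def rd_simple(score, enrolled, outcome):
    """OLS of outcome on a constant, the score, and a 0/1 enrollment dummy.
    Returns the coefficient on the dummy, i.e. the estimated jump."""
    X = np.column_stack([np.ones(len(score)), score, enrolled])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[2]

rng = np.random.default_rng(5)
n = 10_000
score = rng.uniform(0, 100, n)
enrolled = (score >= 50).astype(float)   # sharp cutoff at 50 (made up)
outcome = 10 + 0.2 * score + 4.0 * enrolled + rng.normal(0, 3, n)

jump = rd_simple(score, enrolled, outcome)
print(jump)   # should be close to the made-up true jump of 4
```

This works here because the simulated outcome really is linear in the score.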


Is it really that simple?

Don’t be silly.

You could totally have a situation where the outcome is some sort of quadratic or cubic or nth-order polynomial function of the test score. Try controlling for that. This is going to depend on the situation and is somewhat arbitrary.


Wait, “somewhat arbitrary?”

Yeah, lame, I know. Arbitrary’s what we’re trying to avoid. But two things aren’t universally clear:

1. How wide a bandwidth around the cutoff are we looking at?

We’re really only confident in our estimate for people that are close to the cutoff. This is a LOCAL AVERAGE TREATMENT EFFECT. We can confidently say that a school right around the cutoff would improve average test scores by X if they received the treatment, but we’re not so confident that already awesome schools would get the same benefit.


Wait, “somewhat arbitrary?”

2. Without the program, what shape would the function naturally have?

What sort of function do we throw in to control for the fact that even if there was no National Merit Semifinalist scholarship, smarter kids are likely to earn more later in life?

The solution: SHOW YOUR WORK
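One way to show your work is to report the estimate for a whole grid of bandwidths and polynomial orders instead of one hand-picked specification. A sketch on simulated data follows. The helper `rd_estimate`, the grid values, and the made-up truth (a cubic curve plus a jump of 4 at a cutoff of 50) are assumptions for illustration.

```python
import numpy as np

def rd_estimate(score, outcome, cutoff, bandwidth, order):
    """Estimated jump at the cutoff: OLS within `bandwidth` of the cutoff, with a
    treatment dummy and a separate polynomial of degree `order` on each side."""
    score = np.asarray(score, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(score - cutoff) <= bandwidth
    x = score[keep] - cutoff              # center so the dummy measures the jump at the cutoff
    d = (x >= 0).astype(float)
    cols = [np.ones_like(x), d]
    for p in range(1, order + 1):
        cols += [x**p, d * x**p]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), outcome[keep], rcond=None)
    return coef[1]

rng = np.random.default_rng(6)
n = 20_000
score = rng.uniform(0, 100, n)
x = score - 50
treated = (score >= 50).astype(float)
# Made-up truth: a curved relationship plus a jump of 4 at the cutoff.
outcome = 10 + 0.2 * x + 0.00004 * x**3 + 4.0 * treated + rng.normal(0, 3, n)

grid = {(bw, order): rd_estimate(score, outcome, 50, bw, order)
        for bw in (5, 10, 25, 50) for order in (1, 2, 3)}
for (bw, order), est in grid.items():
    print(f"bandwidth={bw:>2}  order={order}  jump={est:.2f}")
```

In this made-up data, a straight line fit over the whole range understates the jump, while narrower bandwidths or a cubic get close to 4, at the price of noisier estimates. If the answer swings wildly across the grid, say so.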


You’re Such a Phony.

In addition to showing your work, another good robustness check is to test for the effects of non-existent programs.
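A sketch of that placebo check on simulated data: estimate the "jump" at cutoffs where no program exists and confirm you get roughly nothing. The helper `rd_jump`, the real cutoff of 50, the fake cutoffs of 30 and 70, and the bandwidth are all made up. The fake windows are chosen so they never reach the real cutoff.

```python
import numpy as np

def rd_jump(score, outcome, cutoff, bandwidth):
    """Local linear estimate of the jump at `cutoff`, with separate slopes on each side."""
    score = np.asarray(score, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = np.abs(score - cutoff) <= bandwidth
    x = score[keep] - cutoff
    d = (x >= 0).astype(float)
    X = np.column_stack([np.ones_like(x), d, x, d * x])
    coef, *_ = np.linalg.lstsq(X, outcome[keep], rcond=None)
    return coef[1]

rng = np.random.default_rng(7)
n = 20_000
score = rng.uniform(0, 100, n)
# Made-up truth: the only jump (size 4) is at the real cutoff of 50.
outcome = 10 + 0.2 * score + 4.0 * (score >= 50) + rng.normal(0, 3, n)

real = rd_jump(score, outcome, cutoff=50, bandwidth=10)
fakes = {c: rd_jump(score, outcome, cutoff=c, bandwidth=10) for c in (30, 70)}
print(real)    # close to 4
print(fakes)   # both close to 0
```

If a fake cutoff produces an "effect" as big as the real one, the real one is probably an artifact of the specification.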


You’re Such a Phony.


Conclusion

• Find a threshold

• Look at people just above and just below

• Make sure there’s no sorting

• It’s only a local effect


In Your Groups

• Do we have a threshold?

• Are people sorting?

• It’s a local effect--is that what we want?