Michael Lechner: Difference-in-Difference Estimation. Paper related to this lecture: M. Lechner (2010): Difference-in-Difference Estimation, Foundations and Trends in Econometrics, 4 (survey plus).


Michael Lechner

Difference-in-Difference Estimation

Paper related to this lecture: M. Lechner (2010): Difference-in-Difference Estimation, Foundations and Trends in Econometrics, 4 (survey plus).


Introduction (1)

> Another, more explicit way to remove time-constant 'confounders'

> Very popular method, at least in its regression versions

> Panel data are not needed; repeated cross-sections are sufficient

> Can also easily be used with grouped data, typically at the regional unit × time level (see Angrist and Pischke, 2009, section 5.2)

> Frequently used to analyse changes in a law when there is a group not affected by these changes, but it can be applied to other settings as well

> Key identifying assumption: potential outcomes of groups with different levels of D are subject to the same time trends (common trend assumption)


Introduction (2)

> Might be a reasonable strategy if biases of before-after estimator or matching estimators show some stability

> Idea: Compare outcomes before and after (difference 1) treatment for treated and untreated (difference 2)

> Attractive when it is impossible to control for confounders and there is no (other) IV

> Prices to pay

• Some functional form dependence cannot be removed

• Works only if pre-treatment information is available

• Only ATET is identified


Introduction (3)

> Particularly attractive when panel data are not available or usable, because with panel data controlling for pre-treatment outcomes is a very attractive alternative

• could be done with matching if panel data are available

> Has been used for 60+ years in economics (and other sciences)

> Other names (used outside economics)

• untreated control group design with independent pretest and posttest samples

• control group design with pretest and posttest

• …


DiD: Basic ideas (1)

> 2 (4 for repeated cross-sections) different groups available (binary D)

• 1) group with D=1 in period t and D=0 in t-1

• 2) group with D=0 in period t and D=0 in t-1

• Observations in periods t and t-1 need not come from the same unit

> Key identifying assumption:

• Groups 1) and 2) share the same time trend of the potential outcomes for D=0 (Y0)

> This idea can be exploited to identify the ATET in the following way:

• Add the change in Y of group 2) (non-treated) from t-1 to t to the mean of Y of group 1) in period t-1 (no treatment) to infer what Y would have been in period t if group 1) had not been subject to a change of D from 0 to 1
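The construction described above can be written compactly. As a sketch (the subscript denotes the period, the superscript the potential treatment state; covariates are omitted here for brevity, which is a simplification made for this illustration), the counterfactual mean of group 1) and the resulting ATET are:

```latex
\begin{align*}
E(Y_t^0 \mid D=1) &= E(Y_{t-1} \mid D=1) + \big[ E(Y_t \mid D=0) - E(Y_{t-1} \mid D=0) \big], \\
ATET_t &= E(Y_t \mid D=1) - E(Y_t^0 \mid D=1) \\
       &= \big[ E(Y_t \mid D=1) - E(Y_{t-1} \mid D=1) \big]
        - \big[ E(Y_t \mid D=0) - E(Y_{t-1} \mid D=0) \big].
\end{align*}
```

The first line is where the common trend assumption enters: the change observed for the non-treated group stands in for the unobserved change that group 1) would have experienced without treatment.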


DiD: Basic ideas (2)

Figure 5.2.1: Causal effects in the differences-in-differences model

Source: Card & Krueger (AER, 1994)


Textbook-type discussions of DiD

> Meyer (1995): "Natural and Quasi-Experiments in Economics", Journal of Business & Economic Statistics, 13, 151-161.

> Imbens, Wooldridge (2009): "Recent Developments in the Econometrics of Program Evaluation", JEL, 47:1, 5-86.

> Angrist, Pischke (2009), Mostly Harmless Econometrics, Princeton: Princeton University Press, chapter 5.

> Angrist and Krueger (1999), Handbook of Labor Economics, 3A.

> Heckman, Lalonde, & Smith (1999), Handbook of Labor Economics, 3A.


DiD: Example-2, Card & Krueger (AER, 1994)

> Goal: Understand employment effects of minimum wages

> Basic ideas of the CK study

• Fast food restaurants have a high share of minimum-wage workers

• Comparing employment levels before and after a minimum wage change may be confounded by the business cycle

• Comparing states with low and high minimum wages may be confounded by other characteristics of the local economy

• Find a state that raised its minimum wage (New Jersey) and compare the development of employment to a state that did not raise the minimum wage and is subject to the same business cycle (Pennsylvania)


DiD: Example, Card & Krueger (AER, 1994) (2)

Table 5.2.1: Average employment per store before and after the New Jersey minimum wage increase

Source: Card & Krueger (AER, 1994), standard errors in parentheses

| Variable | PA (i) | NJ (ii) | Difference, NJ-PA (iii) |
|---|---|---|---|
| 1. FTE employment before | 23.33 (1.35) | 20.44 (0.51) | -2.89 (1.44) |
| 2. FTE employment after min wage rise in NJ | 21.17 (0.94) | 21.03 (0.52) | -0.14 (1.07) |
| 3. Change in mean FTE employment | -2.16 (1.25) | 0.59 (0.54) | 2.76 (1.36) |
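The DiD arithmetic behind Table 5.2.1 can be reproduced in a few lines. The following is a minimal sketch that uses only the rounded cell means printed in the table; the function name `did` and the dictionary layout are illustrative choices, not part of the original study. Because the inputs are rounded, the result is about 2.75 rather than the 2.76 reported in the table.

```python
# Cell means of FTE employment as printed (rounded) in Table 5.2.1.
means = {
    ("PA", "before"): 23.33,
    ("PA", "after"): 21.17,
    ("NJ", "before"): 20.44,
    ("NJ", "after"): 21.03,
}

def did(means, treated="NJ", control="PA"):
    """Change in the treated group minus change in the control group."""
    change_treated = means[(treated, "after")] - means[(treated, "before")]
    change_control = means[(control, "after")] - means[(control, "before")]
    return change_treated - change_control

estimate = did(means)

# Counterfactual NJ mean after the increase under the common trend assumption:
# NJ's pre-period mean plus the change observed in PA.
counterfactual_nj = means[("NJ", "before")] + (
    means[("PA", "after")] - means[("PA", "before")]
)

print(round(estimate, 2), round(counterfactual_nj, 2))
```

The estimate equals the observed NJ mean after the increase minus the counterfactual mean, which is the same comparison as row 3, column (iii) of the table.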


DiD: The role of covariates (X)

> We need those exogenous variables as control variables that lead to differential time trends (remove confounding of the common trend assumption)

> Including further control variables in addition to this required set (with the property that the common trend assumption still holds) has positive and negative aspects:

• On the positive side, it could help to detect effect heterogeneity that may be of substantive interest to the researcher

• On the negative side, it might lead to support problems


History of DiD (1): 1st paper (epidemiology)

> Snow, John (1855), On the Mode of Communication of Cholera, 2nd ed., London: John Churchill.

Summarized according to Angrist and Pischke, p. 227: "Snow wanted to establish that Cholera is transmitted by contaminated drinking water (as opposed to "bad air", the prevailing theory at the time). To show this, Snow compared changes in death rates from cholera in districts serviced by two water companies, the Southwark and Vauxhall Company, and the Lambeth Company. In 1849 both companies obtained their water supply from the dirty Thames in central London. In 1852, however, the Lambeth Company moved its water works upriver to an area relatively free of sewage. Death rates in districts supplied by Lambeth fell sharply in comparison to the change in death rates in districts supplied by Southwark and Vauxhall."

> Interesting to read!


History of DiD (2): Very early application (psychology)

Exploits a change in state laws: very relevant today!

• Rose, A.M. (1952): "Needed Research on the Mediation of Labour Disputes", Personnel Psychology, 5, 187-200.


History of DiD (3): 1st application in economics

> Lester, Richard A. (1946): "Shortcomings of marginal analysis for the wage-employment problems", American Economic Review, 36, 63-82.

p. 76: "The same pressure of minimum wages had similar results in the wood furniture industry. Between October, 1937, and February, 1941, the South-North wage differential was reduced about 7 % for 72 identical wood furniture plants, with the establishment of a statutory minimum of 25 cents in October, 1938, and 30 cents in October, 1939, and the setting of minima from 32 to 40 cents in the principal industries competing with Southern furniture manufacturers for labor. Not only did employment for the industry as a whole increase the most in firms with the lowest average hourly earnings in 1937, where the statutory minima obviously had the greatest direct and immediate effect; but employment in the Southern plants increased 26 per cent, whereas it decreased slightly in competing Northern firms during the period (October, 1937 to February, 1941); and, within the South, employment expanded more than twice as fast in the lower-wage firms whose wages were increased 10% as it did in the higher-wage firms where the increase in wages was less than 2%. Various factors were, of course, responsible for employment results so contrary to the presuppositions of conventional marginalism in such industries as men's cotton clothing and wood furniture. …"


History of DiD (4): Very early econ. app's

> Simon, Julian L. (1966): "The Price Elasticity of Liquor in the U.S. and a Simple Method of Determination", Econometrica, 34-1, 193-205.

> Ashenfelter, Orley (1978): "Estimating the Effect of Training Programs on Earnings", The Review of Economics and Statistics, 60/1, 47-57. The source of the famous Ashenfelter's dip.

> Cook, Philip J., and George Tauchen (1982): "The Effect of Liquor Taxes on Heavy Drinking", The Bell Journal of Economics, 13/2, 379-390.


Important recent applications in economics (1)

> Ashenfelter, Orley, and David Card (1985): "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs", The Review of Economics and Statistics, 67/ 4, 648-660.

> Heckman, James J., and Richard Robb (1986): "Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes", in Howard Wainer (ed.), Drawing Inferences from Self-Selected Samples, 63-113.

> Heckman, J.J., and V.J. Hotz (1989): "Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training", Journal of the American Statistical Association, 84, 862-880. (famous pre-programme test)

> Card, David (1990): "The Impact of the Mariel Boatlift on the Miami Labor Market", Industrial and Labor Relations Review, 43, 245-257.

> Card, David and Alan B. Krueger (1994): Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania, The American Economic Review, 84, 772-793

> Heckman, James, Hidehiko Ichimura, Jeffrey Smith, & Petra Todd (1998): "Characterizing Selection Bias Using Experimental Data", Econometrica, 66, 1017-1098. (matched DiD)


Important recent applications in economics (2)

> Meyer, Bruce D., W. Kip Viscusi, David L. Durbin (1995): "Workers' Compensation and Injury Duration: Evidence from a Natural Experiment", The American Economic Review, 85/3, 322-340.

> Waldfogel, Jane (1998): "The Family Gap for Young Women in the United States and Britain: Can Maternity Leave Make a Difference", Journal of Labor Economics, 16, 505-545.

> Blundell, Richard, Alan Duncan, and Costas Meghir (1998): "Estimating Labor Supply Responses Using Tax Reforms", Econometrica, 66/4, 827-861.

> Acemoglu, Daron, and Joshua D. Angrist (2001): "Consequences of Employment Protection? The Case of the Americans with Disabilities Act", Journal of Political Economy, 109, 915-957.

> Besley, Timothy, and Robin Burgess (2004): "Can Labor Regulation Hinder Economic Performance? Evidence From India," The Quarterly Journal of Economics, 91-134.

> Blundell, Richard, Costas Meghir, Monica Costa Dias, and John van Reenen (2004): "Evaluating the Employment Impact of a Mandatory Job Search Program", Journal of the European Economic Association, 2, 569-606.


Notation and definitions

> Basic version: 2 periods, 2 treatments

$$\theta_t = ATET_t = E(Y_t^1 - Y_t^0 \mid D=1) = E\big[\underbrace{E(Y_t^1 - Y_t^0 \mid X=x, D=1)}_{\theta_t(x)} \,\big|\, D=1\big] = E[\theta_t(x) \mid D=1].$$

$$\theta_t(x) = ATET_t(x) = E(Y_t^1 - Y_t^0 \mid X=x, D=1).$$

Use either indexing ($Y_1^1$) or conditioning ($E(Y^1 \mid T=1)$)


Identifying assumptions

> SUTVA:

$$Y_t = dY_t^1 + (1-d)Y_t^0.$$

> Exogeneity of control variables:

$$X^1 = X^0 = X, \quad \forall x \in \chi.$$

> Common support:

$$P\big(TD = 1 \mid X=x, (T,D) \in \{(t,d),(1,1)\}\big) < 1, \quad \forall d,t \in \{0,1\} \text{ with } (t,d) \neq (1,1); \; \forall x \in \chi.$$

> NEPT:

$$\theta_0(x) = 0, \quad \forall x \in \chi.$$

> Common trend or bias stability assumption (next slide)

> Note: These assumptions imply exogeneity of pre-treatment outcomes, which implies exogeneity of the timing of the intervention


Identifying assumptions (2)

> Common trend assumption

$$E(Y_1^0 \mid X=x, D=1) - E(Y_0^0 \mid X=x, D=1) = E(Y_1^0 \mid X=x, D=0) - E(Y_0^0 \mid X=x, D=0) = E(Y_1^0 \mid X=x) - E(Y_0^0 \mid X=x), \quad \forall x \in \chi. \qquad (CT)$$

> Bias stability assumption

$$\underbrace{E(Y_1^0 \mid X=x, D=1) - E(Y_1^0 \mid X=x, D=0)}_{Bias(1,x)} = \underbrace{E(Y_0^0 \mid X=x, D=1) - E(Y_0^0 \mid X=x, D=0)}_{Bias(0,x)}, \quad \forall x \in \chi.$$

> Both assumptions are equivalent (proof at home!)
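The equivalence of the common trend and bias stability assumptions can be sketched in one step. Writing $m_t(x,d) = E(Y_t^0 \mid X=x, D=d)$ as shorthand (a symbol introduced here only for this illustration), the first equality of the common trend assumption is a rearrangement of bias stability:

```latex
\begin{align*}
\underbrace{m_1(x,1) - m_0(x,1) = m_1(x,0) - m_0(x,0)}_{\text{common trend}}
\;\Longleftrightarrow\;
\underbrace{m_1(x,1) - m_1(x,0) = m_0(x,1) - m_0(x,0)}_{\text{bias stability}} .
\end{align*}
```

Each statement follows from the other by adding $m_0(x,1) - m_1(x,0)$ to both sides (or subtracting it again).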


A weakness of DiD: Functional form dependence of identifying assumptions (1)

> Can the assumptions about the trend of the outcome variable be contradictory depending on how the outcome is measured?

> 2 Examples

• Target is ATET

• Case 1: Heteroscedastic mean zero log-normal potential outcome

$$\ln Y_t^0 \mid X=x, D=d \sim N(0,\, 2d+2t)$$

• Case 2: Homoscedastic; mean group and time dependent; log-normal

$$\ln Y_t^0 \mid X=x, D=d \sim N(d+t,\, 2)$$

• Two functions:

$$g(Y^0) = \ln Y^0, \qquad g^*(Y^0) = Y^0$$


The weakness: Functional form dependence of identifying assumptions (2)

> Does the common trend assumption hold for g(.) and g*(.)?

> Remember:

$$\ln Z \sim N(\mu, \sigma^2) \;\Rightarrow\; E(Z) = e^{\mu + 0.5\sigma^2}, \quad Var(Z) = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}.$$

> Case 1:

$$\ln Y_t^0 \mid X=x, D=d \sim N(0,\, 2d+2t)$$

$$\underbrace{E(\ln Y_1^0 \mid X=x, D=1)}_{=0} - \underbrace{E(\ln Y_0^0 \mid X=x, D=1)}_{=0} = 0$$

$$\underbrace{E(\ln Y_1^0 \mid X=x, D=0)}_{=0} - \underbrace{E(\ln Y_0^0 \mid X=x, D=0)}_{=0} = 0.$$

$$\underbrace{E(Y_1^0 \mid X=x, D=1)}_{=e^2} - \underbrace{E(Y_0^0 \mid X=x, D=1)}_{=e^1} = e(e-1)$$

$$\underbrace{E(Y_1^0 \mid X=x, D=0)}_{=e^1} - \underbrace{E(Y_0^0 \mid X=x, D=0)}_{=e^0} = e - 1.$$

> Common trend holds for the log, but not for the level


The weakness: Functional form dependence of identifying assumptions (3)

> Case 2:

$$\ln Y_t^0 \mid X=x, D=d \sim N(d+t,\, 2)$$

$$\underbrace{E(\ln Y_1^0 \mid X=x, D=1)}_{=2} - \underbrace{E(\ln Y_0^0 \mid X=x, D=1)}_{=1} = 1$$

$$\underbrace{E(\ln Y_1^0 \mid X=x, D=0)}_{=1} - \underbrace{E(\ln Y_0^0 \mid X=x, D=0)}_{=0} = 1.$$

$$\underbrace{E(Y_1^0 \mid X=x, D=1)}_{=e^3} - \underbrace{E(Y_0^0 \mid X=x, D=1)}_{=e^2} = e^2(e-1)$$

$$\underbrace{E(Y_1^0 \mid X=x, D=0)}_{=e^2} - \underbrace{E(Y_0^0 \mid X=x, D=0)}_{=e^1} = e(e-1).$$

> Again, common trend holds for the log, but not for the level

> Remember:

$$\ln Z \sim N(\mu, \sigma^2) \;\Rightarrow\; E(Z) = e^{\mu + 0.5\sigma^2}, \quad Var(Z) = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}.$$
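The two log-normal cases can be checked numerically. The sketch below assumes only the log-normal mean formula E(Z) = exp(mu + 0.5 sigma^2) and the two distributions stated in the examples; the helper names (`mean_log`, `mean_level`, `trend`) are made up for this illustration. For each case it computes the change in the mean from t=0 to t=1 separately for D=1 and D=0, once for the log and once for the level.

```python
import math

def mean_log(mu, var):
    """E[ln Y] when ln Y ~ N(mu, var)."""
    return mu

def mean_level(mu, var):
    """E[Y] when ln Y ~ N(mu, var), i.e. exp(mu + var / 2)."""
    return math.exp(mu + 0.5 * var)

# (mu, var) of ln Y_t^0 given D=d in the two cases discussed above.
cases = {
    "case 1": lambda d, t: (0.0, 2.0 * d + 2.0 * t),
    "case 2": lambda d, t: (float(d + t), 2.0),
}

def trend(params, mean_fn, d):
    """Change in the mean from t=0 to t=1 for group d."""
    return mean_fn(*params(d, 1)) - mean_fn(*params(d, 0))

results = {}
for name, params in cases.items():
    for label, mean_fn in (("log", mean_log), ("level", mean_level)):
        # Pair of trends: (treated group D=1, non-treated group D=0).
        results[(name, label)] = (trend(params, mean_fn, 1), trend(params, mean_fn, 0))
        print(name, label, results[(name, label)])
```

In both cases the two trends coincide for the log outcome and differ for the level outcome, so a common trend assumption that is correct for one transformation is violated for the other.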


The big weakness: Functional form dependence of identifying assumptions (4)

> Another view on the functional form dependence

• The appropriate functional form should follow from the parameter the researcher is interested in

− e.g. is the interest more in effects in terms of differential rates or in differences of the levels?

• However, even when a sensible functional form can be derived from the parameter of interest, the problem remains why the CT or CB assumptions should be plausible for that particular choice of g(.), while they are most likely violated for other choices of g(.).


The role of covariates (in the identifying ass.): Problematic confounders

> We need to condition on those exogenous variables that lead to differential trends

• remove confounding of common trend assumption

> Including further control variables (that have the property that the common trend assumption still holds conditional on this extended set of covariates) has positive and negative aspects

• On the positive side, it could help to detect effect heterogeneity that may be of substantive interest to the researcher

• On the negative side, it might lead to support problems


The role of covariates: Unproblematic confounders (1)

> The effect of unobservables on the CT assumption (iterated expectations):

$$CT: \; E_{U \mid D=1} E(Y_1^0 \mid D=1, U=u) - E_{U \mid D=1} E(Y_0^0 \mid D=1, U=u) = E_{U \mid D=0} E(Y_1^0 \mid D=0, U=u) - E_{U \mid D=0} E(Y_0^0 \mid D=0, U=u).$$

> U is unobserved, so CT is required to hold at the aggregate level

> If CT holds conditional on (unobserved) U, it holds at the aggregate level if the distribution of U does not depend on D

> In those cases there is no need to observe (and correct for) U, because U is not common-trend-confounding


The role of covariates: Unproblematic confounders (2)

> If the distribution of the unobservable U does depend on D, additional assumptions are needed

> Assumption: Effect of U on Y is constant over time (and separable) so that it cancels within each difference (fixed effect model). E.g.:

$$E_{U \mid T=1, D=1}\, E(Y_t^1 \mid D=d, U=u) = E(Y_t^1 \mid D=d) + E_{U \mid D=1} f^d(U)$$

> $f^d(U)$ does not depend on time (and thus not on the time of the treatment (!))

> In this case, we can allow for selection into treatment based on an unobservable U that also influences potential outcomes


The regression formulation (1)

> Linear model still most popular. It can be justified by:

$$E(Y_t^1 \mid X=x, D=d) = \alpha^1 + t\delta^1 + d\gamma^1 + x\beta^1 + tx\lambda^1 + dx\pi^1;$$

$$E(Y_t^0 \mid X=x, D=d) = \alpha^0 + t\delta^0 + d\gamma^0 + x\beta^0 + tx\lambda^0 + dx\pi^0.$$

> Flexible, but no interactions between time and treatment group are allowed: they would lead to a violation of the CT (this is the instrument!)


The regression formulation (2)

$$E(Y_t^1 \mid X=x, D=d) = \alpha^1 + t\delta^1 + d\gamma^1 + x\beta^1 + tx\lambda^1 + dx\pi^1;$$

$$E(Y_t^0 \mid X=x, D=d) = \alpha^0 + t\delta^0 + d\gamma^0 + x\beta^0 + tx\lambda^0 + dx\pi^0.$$

> Common trend holds:

$$E(Y_1^1 \mid X=x, D=1) - E(Y_0^1 \mid X=x, D=1) = \alpha^1 + \delta^1 + \gamma^1 + x\beta^1 + x\lambda^1 + x\pi^1 - \alpha^1 - \gamma^1 - x\beta^1 - x\pi^1 = \delta^1 + x\lambda^1;$$

$$E(Y_1^1 \mid X=x, D=0) - E(Y_0^1 \mid X=x, D=0) = \alpha^1 + \delta^1 + x\beta^1 + x\lambda^1 - \alpha^1 - x\beta^1 = \delta^1 + x\lambda^1.$$

(The same steps with superscript 0 give $\delta^0 + x\lambda^0$ for both groups.)

> Differential trends for the different potential outcomes are no problem, if the trends are the same for the two groups defined by treatment status


The regression formulation (3)

> Starting from the specifications for the potential outcomes, the effects can be expressed in terms of their regression coefficients

$$E(Y_t^1 \mid X=x, D=d) = \alpha^1 + t\delta^1 + d\gamma^1 + x\beta^1 + tx\lambda^1 + dx\pi^1;$$

$$E(Y_t^0 \mid X=x, D=d) = \alpha^0 + t\delta^0 + d\gamma^0 + x\beta^0 + tx\lambda^0 + dx\pi^0.$$

$$\theta_1(x) = \big[ E(Y_1^1 \mid X=x, D=1) - E(Y_0^1 \mid X=x, D=1) \big] - \big[ E(Y_1^0 \mid X=x, D=0) - E(Y_0^0 \mid X=x, D=0) \big]$$

$$= \underbrace{\big[ (\alpha^1 + \delta^1 + \gamma^1 + x\beta^1 + x\lambda^1 + x\pi^1) - (\alpha^1 + \gamma^1 + x\beta^1 + x\pi^1) \big]}_{\delta^1 + x\lambda^1} - \underbrace{\big[ (\alpha^0 + \delta^0 + x\beta^0 + x\lambda^0) - (\alpha^0 + x\beta^0) \big]}_{\delta^0 + x\lambda^0}$$

$$= \underbrace{(\delta^1 - \delta^0)}_{\delta} + x\underbrace{(\lambda^1 - \lambda^0)}_{\lambda} = \delta + x\lambda.$$


The regression formulation (4)

> Derivation of the regression model for the observed outcome variable Y

$$E[Y_t \mid D=d, X=x] = E[dY_t^1 + (1-d)Y_t^0 \mid D=d, X=x] = \dots = \alpha^0 + t\delta^0 + x\beta^0 + tx\lambda^0 + d\alpha + dt\delta + dtx\lambda + dx\pi.$$

> From these derivations, we see that a regression with group and time dummies plus the various interactions identifies the effects

> In such a regression, the coefficients of the interaction terms of time and treatment group capture the effects
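The link between the regression and the DiD of group means can be illustrated on simulated data. The following is a minimal sketch without covariates X (a simplification made here), with made-up parameter values and a small pure-Python least-squares helper `ols` written for this example; it is not the estimator code of any particular package. In this saturated 2x2 design the coefficient on the interaction d·t coincides with the difference-in-differences of the four cell means.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

random.seed(0)
TRUE_EFFECT = 2.0  # hypothetical ATET used to generate the data

rows, y = [], []
for _ in range(4000):
    d = random.randint(0, 1)  # treatment group indicator
    t = random.randint(0, 1)  # period indicator
    # Non-treatment outcome: group gap of 1.5 and a common time trend of 0.5.
    y0 = 1.0 + 0.5 * t + 1.5 * d + random.gauss(0.0, 1.0)
    y.append(y0 + TRUE_EFFECT * d * t)
    rows.append([1.0, float(t), float(d), float(d * t)])

coef = ols(rows, y)  # [constant, time, group, group x time]

def cell_mean(d, t):
    vals = [yi for r, yi in zip(rows, y) if r[2] == d and r[1] == t]
    return sum(vals) / len(vals)

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(coef[3], did)
```

With covariates and their interactions added to the design matrix, the same helper could be reused, but the equality with simple cell means no longer holds exactly.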


The regression formulation (5)

> It is rather common (but not necessarily good) practice to assume that the coefficient $\lambda$ is zero, implying that the interaction of group and time with the control variables disappears

$$E[Y_t \mid D=d, X=x] = \alpha^0 + t\delta^0 + x\beta^0 + tx\lambda^0 + d\alpha + dt\delta + dtx\lambda + dx\pi$$


The regression formulation (6): Disadvantages

$$E[Y_t \mid D=d, X=x] = \alpha^0 + t\delta^0 + x\beta^0 + tx\lambda^0 + d\alpha + dt\delta + dtx\lambda + dx\pi$$

> (i) The effect heterogeneity that is allowed for

• If the heterogeneity is related to group membership (D) or the control variables (X) and is not captured by the interactions, then OLS is inconsistent and asymptotically biased

> (ii) The way in which control variables are included

• Control variables are included linearly, so common trends are assumed conditional on the linear index, which is more restrictive than assuming common trends conditional on X

> (iii) The possibility of arriving at estimates that are not plausible

• If the support of Y is limited (e.g., Y is binary), then the predicted expected outcome may not respect this restriction


Discrete and limited dependent variables and nonlinear regression: Simple versions do not work

> There are many types of outcome variables for which it is common practice to use nonlinear models instead of linear ones, because they provide a better approximation of the statistical properties of such random variables (probit etc.)

> Specifying the index function with interactions as in the linear regression is usually not compatible with a common trend assumption

> To see this incompatibility, use the following standard specification:


Discrete and limited dependent variables (2)

> First specify a standard model for the potential outcomes:

$$E(Y_t^1 \mid X=x, D=d) = G(\alpha + t\delta^1 + d\gamma + x\beta + tx\lambda^1 + dx\pi);$$

$$E(Y_t^0 \mid X=x, D=d) = G(\alpha + t\delta^0 + d\gamma + x\beta + tx\lambda^0 + dx\pi); \quad \forall d \in \{0,1\}, \forall x \in \chi.$$

> This model leads to the following regression for E(Y|X):

$$E(Y_t \mid X=x, D=d) = E\big(dY_t^1 + (1-d)Y_t^0 \mid X=x, D=d\big)$$

$$= d\,E(Y_t^1 \mid X=x, D=d) + (1-d)\,E(Y_t^0 \mid X=x, D=d)$$

$$= G\big(d(\alpha + t\delta^1 + \gamma + x\beta + tx\lambda^1 + x\pi) + (1-d)(\alpha + t\delta^0 + x\beta + tx\lambda^0)\big) = \dots$$

$$= G(\alpha + t\delta^0 + d\gamma + x\beta + tx\lambda^0 + dx\pi + dt\delta + dtx\lambda); \quad \forall d \in \{0,1\}, \forall x \in \chi.$$


Discrete and limited dependent variables (3)

> This general model does not fulfill the CT assumption

$$E(Y_1^d \mid X=x, D=1) - E(Y_0^d \mid X=x, D=1) = G\big(\underbrace{\alpha + \gamma}_{\tilde\alpha} + \delta^d + x(\underbrace{\beta + \pi}_{\tilde\beta} + \lambda^d)\big) - G\big(\underbrace{\alpha + \gamma}_{\tilde\alpha} + x(\underbrace{\beta + \pi}_{\tilde\beta})\big);$$

$$E(Y_1^d \mid X=x, D=0) - E(Y_0^d \mid X=x, D=0) = G\big(\alpha + \delta^d + x(\beta + \lambda^d)\big) - G(\alpha + x\beta); \quad \forall d \in \{0,1\}.$$

> CT only respected if there are no group specific effects $(\gamma = \pi = 0)$

> Not an attractive assumption! (removes main rationale for DiD)
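The failure of the common trend assumption in the index model can be seen in a small numerical sketch. It assumes a probit link for G(.) and a scalar covariate, and all parameter values are made up for illustration; the function name `trend` is likewise hypothetical. It compares the time trend of the expected non-treatment outcome for D=1 and D=0, once with group specific effects and once without.

```python
from statistics import NormalDist

G = NormalDist().cdf  # standard normal cdf, used as the example link G(.)

def trend(d, alpha, delta, gamma, beta, lam, pi, x):
    """Change in E(Y_t^0 | X=x, D=d) from t=0 to t=1 under the index model."""
    index_t0 = alpha + gamma * d + x * (beta + pi * d)
    index_t1 = index_t0 + delta + x * lam
    return G(index_t1) - G(index_t0)

# Hypothetical parameter values.
params = dict(alpha=-0.5, delta=0.4, beta=0.3, lam=0.1, x=1.0)

# Trends for (D=1, D=0) with and without group specific effects gamma and pi.
with_group_effects = [trend(d, gamma=0.8, pi=0.2, **params) for d in (1, 0)]
without_group_effects = [trend(d, gamma=0.0, pi=0.0, **params) for d in (1, 0)]

print(with_group_effects, without_group_effects)
```

The same shift in the index translates into different changes in the expected outcome when the two groups start from different points on the nonlinear function G(.), which is why the trends only coincide when the group specific terms vanish.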


Discrete and limited dependent variables (4)

> (Ad-hoc) Alternatives

• Use a linear model anyway

• Use a semi-parametric approach

• Use parametric models (e.g. a probit for a binary dependent variable) to approximate $E(Y_t \mid X=x, D=d)$ in the four groups

− Problem: We cannot recover the exact functional specifications of the mean potential outcomes, which makes it harder to understand the restrictions from functional form assumptions

With a probit approximation $\Phi(x\varphi_t^d)$ in each of the four groups and $N_1 = \sum_{i=1}^{N} d_i t_i$:

$$\widehat{ATET} = \frac{1}{N_1}\sum_{i=1}^{N} d_i t_i \Big\{ y_i - \Phi(x_i \hat\varphi_0^1) - \big[ \Phi(x_i \hat\varphi_1^0) - \Phi(x_i \hat\varphi_0^0) \big] \Big\}$$


Discrete and limited dependent variables (5)

> Problem is however more fundamental

• For Y with bounded support, the common trend assumption never holds!

> Example

• Assume that a binary variable for a particular group of nontreated in the post-treatment period has mean 0.9.

• The gap between the treated and nontreated groups prior to treatment was 0.2 in favour of the treated.

• Adjusting for common trends would lead to an expected nontreatment outcome of 1.1.

• Thus, the common trend assumption must be violated.


Discrete and limited dependent variables (6)

> Alternative (in the spirit of Blundell and Costa Dias, 2009)

• Assume common trends for some latent model (Y*)

$$E(Y_t^0 \mid X=x, D=d) = H\big[E(Y_t^{0*} \mid X=x, D=d)\big]; \quad \forall d,t \in \{0,1\}, \forall x \in \chi.$$

• H(.) is assumed to be strictly monotone and invertible

$$E(Y_t^{0*} \mid X=x, D=d) = H^{-1}\big[E(Y_t^0 \mid X=x, D=d)\big]; \quad \forall d,t \in \{0,1\}, \forall x \in \chi.$$

• H(.) is a classical link function like in the probit, logit, tobit models


Discrete and limited dependent variables (7)

> Assume that common trends hold at the level of the latent dependent variable

$$E(Y_1^{0*} \mid X=x, D=1) - E(Y_0^{0*} \mid X=x, D=1) = E(Y_1^{0*} \mid X=x, D=0) - E(Y_0^{0*} \mid X=x, D=0) = E(Y_1^{0*} \mid X=x) - E(Y_0^{0*} \mid X=x), \quad \forall x \in \chi. \qquad (CT^*)$$

> Whether this assumption is plausible or not depends

• on the particular parameterization of the model, which involves the H(.) function

• on whether a substantive meaning can be given to the latent outcome variable, like a utility or an earnings potential, which can then be used as the basis for judging the credibility of this assumption


Discrete and limited dependent variables (10)

> Example: Probit specification

$$E(Y_t \mid X=x, D=d) = \Phi(x\varphi_t^{d}).$$

> The missing counterfactual is then given by:

$$
\begin{aligned}
E(Y_1^{0} \mid X=x, D=1)
  &= \Phi\big\{ \Phi^{-1}[\Phi(x\varphi_1^{0})] - \Phi^{-1}[\Phi(x\varphi_0^{0})] + \Phi^{-1}[\Phi(x\varphi_0^{1})] \big\} \\
  &= \Phi(x\varphi_1^{0} - x\varphi_0^{0} + x\varphi_0^{1}).
\end{aligned}
$$

> Based on consistent estimates of the coefficients, we can estimate the ATET by

$$\widehat{ATET}_1 = \frac{1}{N_1} \sum_{i=1}^{N} d_i t_i \big[ y_i - \Phi(x_i\hat\varphi_1^{0} - x_i\hat\varphi_0^{0} + x_i\hat\varphi_0^{1}) \big], \qquad N_1 = \sum_{i=1}^{N} d_i t_i.$$
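The ATET estimator for the probit case can be sketched in a few lines of Python. This is only an illustration: the coefficient values, the toy sample and the names `phi_hat`, `counterfactual_prob` and `atet_probit` are hypothetical, and the coefficients are taken as given rather than estimated (in practice they would come from probit fits in the three cells where nontreated outcomes are observed).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf is Phi


def index(x, phi):
    """Linear index x'phi for one observation (both are plain lists)."""
    return sum(xj * pj for xj, pj in zip(x, phi))


def counterfactual_prob(x, phi):
    """Phi(x phi_1^0 - x phi_0^0 + x phi_0^1): expected nontreated outcome of
    a treated unit in period 1 under common trends in the latent index.
    phi maps (d, t) -> coefficient list for group d in period t."""
    latent = index(x, phi[(0, 1)]) - index(x, phi[(0, 0)]) + index(x, phi[(1, 0)])
    return _STD_NORMAL.cdf(latent)


def atet_probit(y, d, t, X, phi):
    """Mean of y_i minus the counterfactual probability over the treated
    observations in the post-treatment period (d_i = t_i = 1)."""
    gaps = [yi - counterfactual_prob(xi, phi)
            for yi, di, ti, xi in zip(y, d, t, X) if di == 1 and ti == 1]
    return sum(gaps) / len(gaps)


# Hypothetical coefficient estimates (intercept, slope) per (d, t) cell.
phi_hat = {(0, 0): [0.0, 0.5], (0, 1): [0.2, 0.5], (1, 0): [0.3, 0.5]}

# Toy sample: x = (1, covariate); only rows with d = t = 1 enter the average.
y = [1, 0, 1, 1]
d = [1, 1, 1, 0]
t = [1, 1, 1, 1]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, -1.0], [1.0, 0.0]]

print(atet_probit(y, d, t, X, phi_hat))
```

The sketch uses only the standard library (`statistics.NormalDist`) so that the link function and the averaging step are visible without a regression package.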

Discrete and limited dependent variables (11)

> This expression is different from the one given above that was

also based on the probit model

> Although both examples are based on a probit model, the

former assumes common trends for the expected potential

outcomes, whereas the latter assumes common trends for a

nonlinear transformation of the expected outcomes.

> As DiD is functional form dependent, such transformations

matter and lead to different results!
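A small numerical sketch makes the functional form dependence concrete. The three cell means below are hypothetical, chosen so that the pre-treatment gap is 0.2 and the linear counterfactual for a binary outcome exceeds one, while the counterfactual based on common trends in the probit index stays inside the unit interval.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}

# Hypothetical cell means of a binary outcome (shares with Y = 1).
p_treated_pre = 0.9
p_control_pre = 0.7
p_control_post = 0.9

# Common trends in the expected outcomes themselves (linear DiD).
cf_linear = p_treated_pre + (p_control_post - p_control_pre)

# Common trends in the probit index: apply Phi^{-1} to each mean,
# take the difference-in-difference there, and map back with Phi.
cf_latent = N.cdf(
    N.inv_cdf(p_treated_pre) + N.inv_cdf(p_control_post) - N.inv_cdf(p_control_pre)
)

print(f"linear counterfactual: {cf_linear:.3f}")
print(f"latent counterfactual: {cf_latent:.3f}")
```

Both numbers use the same three observed means; they differ only in the scale on which the common trend is imposed.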

More than 2 periods

> Assume that all assumptions hold in all periods

> More post-treatment periods

• discover dynamics of the effects

> More pre-treatment periods

• make identification more credible with specific pre-post-period

pairs

• increase efficiency by averaging over pre-treatment periods

• placebo experiments possible to increase credibility of

identification
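The three uses of additional periods can be sketched with group-by-period means. The numbers and the period labels (-2, -1 before treatment; 0, 1 after) are hypothetical, and the helper `did` is just the two-period estimator applied to a chosen pair of periods.

```python
# Hypothetical group-by-period mean outcomes; treatment starts in period 0.
# Periods -2 and -1 are pre-treatment, periods 0 and 1 are post-treatment.
means = {
    "treated": {-2: 1.0, -1: 1.5, 0: 3.0, 1: 4.0},
    "control": {-2: 0.5, -1: 1.0, 0: 1.5, 1: 2.0},
}


def did(pre, post):
    """Two-period DiD of group means between period `pre` and period `post`."""
    change_treated = means["treated"][post] - means["treated"][pre]
    change_control = means["control"][post] - means["control"][pre]
    return change_treated - change_control


# Dynamics: one effect estimate per post-treatment period (base period -1).
effects = {post: did(-1, post) for post in (0, 1)}

# Placebo: a pseudo-effect between two pre-treatment periods.
# Under common trends (and no anticipation) it should be close to zero.
placebo = did(-2, -1)

# Averaging over pre-treatment base periods for the period-0 effect.
effect_avg_base = sum(did(pre, 0) for pre in (-2, -1)) / 2

print(effects, placebo, effect_avg_base)
```

A placebo estimate far from zero would cast doubt on the common trend assumption for the pre-post pair actually used.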

The relation of DiD to matching (1)

> Matching:

$$E(Y_1^{d} \mid X=x, D=1) = E(Y_1^{d} \mid X=x, D=0)$$

> DiD:

$$E(Y_1^{0} - Y_0^{0} \mid X=x, D=1) = E(Y_1^{0} - Y_0^{0} \mid X=x, D=0)$$

> DiD looks even more similar when assumptions are strengthened to independence:

$$(Y_1^{0} - Y_0^{0}) \perp D \mid X \quad \text{(DiD)} \qquad\qquad Y_1^{0} \perp D \mid X \quad \text{(matching)}$$

The relation of DiD to matching (2)

> DiD and matching assumptions are not nested (!!)

• DiD allows some selection on unobservables

• Matching makes no assumptions about the pre-treatment

periods and g(.)

• DiD identifies average effects, whereas matching identifies the counterfactual distribution

Panel data (1)

> Panel data is not necessary for DiD estimation, all results presented so far are also valid for repeated cross-sections

> One consequence of basing the estimator on individual differences over time (as is possible with panel data) is that all influences of time constant confounding factors that are additively separable from the remaining part of the conditional expectations of the potential outcomes disappear

• Angrist & Pischke (2009) show that adding individual fixed effects instead of group dummies leads to numerically the same results (in the linear model)

• But see our current working paper for the case of attrition …
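A related and simpler point can be checked by hand: in a balanced panel without attrition, DiD computed from the four group-by-period means coincides with DiD computed from individual differences over time. The sketch below uses a hypothetical five-unit panel; it illustrates this equality of the two mean-based computations and is not a reproduction of the regression argument in Angrist & Pischke (2009).

```python
# Hypothetical balanced panel: (unit id, treated?, outcome in t=0, outcome in t=1).
panel = [
    (1, 1, 2.0, 5.0),
    (2, 1, 3.0, 5.5),
    (3, 0, 1.0, 2.0),
    (4, 0, 4.0, 5.5),
    (5, 0, 2.5, 3.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# (a) DiD from the four group-by-period means
#     (this version also works with repeated cross-sections).
treated_change = (mean(y1 for _, d, _, y1 in panel if d == 1)
                  - mean(y0 for _, d, y0, _ in panel if d == 1))
control_change = (mean(y1 for _, d, _, y1 in panel if d == 0)
                  - mean(y0 for _, d, y0, _ in panel if d == 0))
did_means = treated_change - control_change

# (b) DiD from individual differences over time (requires panel data).
did_diffs = (mean(y1 - y0 for _, d, y0, y1 in panel if d == 1)
             - mean(y1 - y0 for _, d, y0, y1 in panel if d == 0))

print(did_means, did_diffs)
```

With attrition the sets of units observed in the two periods differ, so the two computations need no longer agree.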

Panel data (2)

> Panel data allow matching on pre-treatment outcomes instead

of DiD

• DiD methods allow for time constant confounding unobservables

while requiring common trends

• Matching does not require common trends but assumes that

conditional on pre-treatment outcomes confounding

unobservables are irrelevant

Panel data (3)

> Note that a matching type DiD estimator that controls for lagged

outcome and a 'normal' matching estimator with lagged outcomes as

controls identify the same parameter

• same is of course true for parametric models

> Therefore, the question is between a matching estimator with lagged

outcomes and DiD estimator NOT controlling for lagged outcomes

> This common trend assumption and this matching-type assumption

impose different identifying restrictions on the data which are not nested

and must be rationalized based on substantive knowledge about the

selection process, i.e. only one of them can be true (see IW09 and AP09).
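The two competing estimators can be put side by side on a toy panel. The data are hypothetical, and exact matching on a discrete lagged outcome stands in for whatever matching estimator would be used in practice. The point is only that the two procedures use the lagged outcome differently and can give different answers on the same data.

```python
# Hypothetical panel rows: (treated?, lagged outcome y0, outcome y1).
rows = [
    (1, 2.0, 4.0), (1, 2.0, 5.0), (1, 1.0, 3.0),
    (0, 2.0, 3.5), (0, 1.0, 1.5), (0, 1.0, 2.5), (0, 0.0, 1.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


treated = [(y0, y1) for d, y0, y1 in rows if d == 1]
controls = [(y0, y1) for d, y0, y1 in rows if d == 0]

# DiD NOT controlling for lagged outcomes: compare average changes over time.
did = (mean(y1 - y0 for y0, y1 in treated)
       - mean(y1 - y0 for y0, y1 in controls))


def matched_control_mean(y0_treated):
    """Mean period-1 outcome of controls with the same lagged outcome.
    Assumes every treated unit has at least one such control."""
    return mean(y1 for y0, y1 in controls if y0 == y0_treated)


# Matching on the lagged outcome: compare levels in period 1
# within cells defined by y0, averaged over the treated.
matching = mean(y1 - matched_control_mean(y0) for y0, y1 in treated)

print(did, matching)
```

Which of the two numbers is credible depends on which identifying assumption fits the selection process; the data alone cannot decide this.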

Panel data (4)

> The advantage of the DiD method is that it allows for time constant confounding unobservables while requiring common trends

> Matching does not require common trends but assumes that conditional on pre-treatment outcomes confounding unobservables are irrelevant.

> One may argue that conditioning on the past outcome variables already controls for that part of the unobservables that manifested themselves in the lagged outcome variables (and thus favour the standard matching estimator)

Panel data (5)

> Thus, Imbens and Wooldridge's (2009, p. 70) conclusion about the

usefulness of DiD in panel data compared to matching is negative: "As a

practical matter, the DID approach appears less attractive than the

unconfoundedness-based approach in the context of panel data. It is

difficult to see how making treated and control units comparable on

lagged outcomes will make the causal interpretation of their difference

less credible, as suggested by the DID assumptions."

> But: A recent paper by Chabé-Ferret (2010) gives several examples in

which a DiD strategy leads to a consistent estimator while matching

conditional on past outcomes may be biased. He also shows calibrations

based on real data suggesting that this bias may not be small.

Inference problems (1)

> Recently, there has been a renewed discussion about how to conduct

inference in a DiD setting.

> These papers are concerned with potential correlations over time as well

as within the groups (in particular when using panel data).

> Period/group specific randomness (group x time specific individual random effects) with a finite number of periods / groups:

• No consistent estimator can exist, because within group averaging cannot

eliminate such variability

• The only way to reduce this type of uncertainty is to have more periods and

more (non-treatment) groups
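A small simulation sketch illustrates why within-group averaging does not help. It assumes a two-groups-two-periods design with a true effect of zero, independent normal group x time shocks with standard deviation `sd_shock`, and independent individual noise with standard deviation `sd_noise`; all names and parameter values are hypothetical. Under these assumptions the variance of the DiD estimate is 4·sd_shock² + 4·sd_noise²/n, so it converges to 4·sd_shock² rather than to zero as the cell size n grows.

```python
import random
from statistics import pvariance


def simulate_did(n_per_cell, sd_shock, sd_noise, rng):
    """One DiD estimate in a 2x2 design with true effect 0.
    Every group-by-period cell receives a shock shared by all its members."""
    cell_mean = {}
    for d in (0, 1):
        for t in (0, 1):
            shock = rng.gauss(0.0, sd_shock)  # group x time random effect
            ys = [shock + rng.gauss(0.0, sd_noise) for _ in range(n_per_cell)]
            cell_mean[(d, t)] = sum(ys) / n_per_cell
    return ((cell_mean[(1, 1)] - cell_mean[(1, 0)])
            - (cell_mean[(0, 1)] - cell_mean[(0, 0)]))


def did_variance(n_per_cell, reps=300, sd_shock=1.0, sd_noise=1.0, seed=0):
    """Monte Carlo variance of the DiD estimate across `reps` replications."""
    rng = random.Random(seed)
    draws = [simulate_did(n_per_cell, sd_shock, sd_noise, rng) for _ in range(reps)]
    return pvariance(draws)


var_small = did_variance(n_per_cell=10)
var_large = did_variance(n_per_cell=400)
print(var_small, var_large)
```

Increasing the cell size from 10 to 400 removes almost all individual noise, yet the simulated variance stays near the level implied by the four cell shocks. Only more periods or more groups add independent draws of those shocks.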

Conclusions

> Heavily used for a long time inside and outside economics.

> Identification depends on plausibility of common trend assumption & g(.)

> Matching and reweighting methods can be fruitfully employed to obtain more robust results with less dependence on arbitrary homogeneity & functional form assumptions

> Depending on the structure of uncertainty, inference may be tricky to impossible in the two-periods-two-groups case.

> Having many periods and many groups (of non-treated) seems to be important for

• (i) more precise estimation

• (ii) testing for the common trend assumption

• (iii) more reliable inference

Thank you for your attention!

Michael Lechner, SEW - St. Gallen