72
PUBL0050: Advanced Quantitative Methods Lecture 5: Panel Data and Difference-in-Differences Jack Blumenau 1 / 52

PUBL0050:AdvancedQuantitativeMethods Lecture5 ...๐‘–๐‘”๐‘ก= + 1 ๐‘”+ 2๐‘‡๐‘–+ 3( ๐‘”โ‹… ๐‘‡๐‘–) + ๐œ€๐‘–๐‘”๐‘ก โ€ข Aggregateddata: ๐‘”๐‘ก = + 1 ๐‘” + 2 ๐‘‡ ๐‘– +

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

PUBL0050: Advanced Quantitative Methods

Lecture 5: Panel Data and Difference-in-Differences

Jack Blumenau

1 / 52

Thank you for the feedback!

1. Question: Can we have drop-in hours rather than office hours?โ€ข Response: Speak to Constanza!

2. Question: Can we have more advice on merging/cleaning data in R?โ€ข Response: I will add some material to the website on how toapproach this.

2 / 52

Thank you for the feedback!

3. Question: Can you provide a crib sheet of R functions?โ€ข Response: Anything comprehensive will be difficult, but I will try toadd an overview of the most important functions that we use to thewebsite.

4. Question: Can we have more applied examples of methods?โ€ข Response: I will make sure that there are more applications perlecture, even if these are just links to papers.

2 / 52

Thank you for the feedback!

5. Question: Can all lecture notes be put online now?โ€ข Response: From reading week, I will upload all slides for the futureweeks, though these are subject to change.

6. Question: Can you provide more help with finding data?โ€ข Response: This is a little difficult given the variety of data that youwill all use. I will add some resources of some commonly useddatabases to the website but please do not take this as acomprehensive list! Data sourcing/collection is part of theevaluation.

2 / 52

Thank you for the feedback!

7. Question: What is going to happen with the strike?โ€ข Response: I wish I knew.โ€ข Real response 1: If the strike goes ahead, I will post the videos of lastyearโ€™s lectures online so that you can watch those.

โ€ข Real response 2: We are still trying to work on contingency plans forthe two classes that would be affected by the strike.

2 / 52

Course outline

1. Potential Outcomes and Causal Inference2. Randomized Experiments3. Selection on Observables (I)4. Selection on Observables (II)5. Panel Data and Difference-in-Differences6. Synthetic Control7. Instrumental Variables (I)8. Instrumental Variables (II)9. Regression Discontinuity Designs10. Overview and Review

3 / 52

Was it โ€œthe Sun wot won itโ€?

1992 1997

4 / 52

Was it โ€œthe Sun wot won itโ€?

1992 1997

5 / 52

Was it โ€œthe Sun wot won itโ€?

Does the media influence vote choice?

โ€ข Randomized experiment?

โ€ข Hard to persuade newspapers to randomly endorse politicalcandidates

โ€ข Hard to randomly allocate citizens to read certain newspapers

โ€ข Selection on observables?

โ€ข The types of individual who read certain newspapers (i.e. The Sun)are likely different in many ways from those who read othernewspapers

โ€ข Difference-in-differences

โ€ข Collect data on vote choice before & after change in endorsementโ€ข Did people who read The Sun change their vote choice more thanpeople who did not read The Sun?

6 / 52

Lecture Outline

Difference-in-differences

Regression DD

Threats to Validity

Multiple periods

Conclusion

7 / 52

Difference-in-differences

Difference-in-differences setup

DefinitionTwo groups:

โ€ข ๐ท๐‘– = 1 Treated units

โ€ข ๐ท๐‘– = 0 Control units

Two periods:

โ€ข ๐‘‡๐‘– = 0 Pre-Treatment period

โ€ข ๐‘‡๐‘– = 1 Post-Treatment period

Potential outcome ๐‘Œ๐‘‘๐‘–(๐‘ก)

โ€ข ๐‘Œ1๐‘–(๐‘ก) outcome of unit ๐‘– in period ๐‘ก when treated (at ๐‘‡๐‘– = 1)โ€ข ๐‘Œ0๐‘–(๐‘ก) outcome of unit ๐‘– in period ๐‘ก when control (at ๐‘‡๐‘– = 1)

8 / 52

Difference-in-differences setup

DefinitionCausal effect for unit ๐‘– at time ๐‘ก is

โ€ข ๐œ๐‘–(๐‘ก) = ๐‘Œ1๐‘–(๐‘ก) โˆ’ ๐‘Œ0๐‘–(๐‘ก)

For a given unit, in a given time period, the observed outcome ๐‘Œ๐‘–(๐‘ก) is:

โ€ข ๐‘Œ๐‘–(๐‘ก) = ๐‘Œ1๐‘–(๐‘ก) โ‹… ๐ท๐‘–(๐‘ก) + ๐‘Œ0๐‘–(๐‘ก) โ‹… (1 โˆ’ ๐ท๐‘–(๐‘ก))โ€ข If treatment occurs only after ๐‘ก = 0 we have:

๐‘Œ๐‘–(1) = ๐‘Œ0๐‘–(1) โ‹… (1 โˆ’ ๐ท๐‘–) + ๐‘Œ1๐‘–(1) โ‹… ๐ท๐‘–

โ†’ Fundamental problem of causal inference.

Estimand (ATT)๐œATT = ๐ธ[๐‘Œ1๐‘–(1) โˆ’ ๐‘Œ0๐‘–(1)|๐ท๐‘– = 1]

ProblemMissing potential outcome: ๐ธ[๐‘Œ0๐‘–(1)|๐ท = 1], ie. what is the average post-periodoutcome for the treated in the absence of the treatment?

9 / 52

Illustration

Treated vs. control in post-treatment period

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

โ—

โ—

โ—

โ—

Problem: Missing potential outcome: ๐ธ[๐‘Œ0๐‘–(1)|๐ท๐‘– = 1]

10 / 52

Illustration

Strategy: Treated vs. control in post-treatment period

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Yi(1)|Di=0]

E[Yi(1)|Di=1]

DIGM

โ—

โ—

โ—

โ—

Assumption: No selection bias.

10 / 52

Illustration

Strategy: Before vs. after for treatment units

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

Change over time in treatment group

โ—

โ—

โ—

โ—

Assumption: No effect of time independent of treatment.

10 / 52

Illustration

Treated vs. control in post-treatment period

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

Change over time in control group

โ—

โ—

โ—

โ—

Treated vs. control in post-treatment period

10 / 52

Illustration

Strategy: Difference-in-differences (1)

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Yi(0)|Di=1]

E[Y0i(1)|Di=1]

โ—

โ—

โ—

โ—

โ—

โ—

Assumed change over time in treatment group

Assumption: Trend over time is the same for treatment and control.

10 / 52

Illustration

Strategy: Difference-in-differences (1)

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Y0i(1)|Di=1]

โ—

โ—

โ—

โ—

โ—

โ—

Assumed change over time in treatment group

Difference in differences

Assumption: Trend over time is the same for treatment and control.

10 / 52

Illustration

Strategy: Difference-in-differences (2)

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Yi(0)|Di=0]

E[Yi(0)|Di=1]

Selection bias in preโˆ’treatment period

โ—

โ—

โ—

โ—

Treated vs. control in post-treatment period

10 / 52

Illustration

Strategy: Difference-in-differences (2)

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Yi(1)|Di=0]

E[Y0i(1)|Di=1]

Assumed selection bias in postโˆ’treatment period

โ—

โ—

โ—

โ—

โ—

โ—

Assumption: Selection bias is stable over time.

10 / 52

Illustration

Strategy: Difference-in-differences (2)

t = 0 t=1

E[Yi(0)|Di=0]

E[Yi(1)|Di=0]

E[Yi(0)|Di=1]

E[Yi(1)|Di=1]

E[Yi(1)|Di=0]

E[Yi(1)|Di=1]

E[Y0i(1)|Di=1]

Assumed selection bias in postโˆ’treatment period

โ—

โ—

โ—

โ—

โ—

โ—

Difference in differences

Assumption: Selection bias is stable over time.

10 / 52

Identification with Difference-in-Differences

Two ways of stating the identifying assumption:

โ€ข Parellel trends

โ€ข If treated units did not receive the treatment, they would havefollowed the same trend as the control units

โ€ข No time-varying confounders

โ€ข Omitted variables related both to treatment and outcome must befixed over time

11 / 52

Identification with Difference-in-Differences

Identification Assumption๐ธ[๐‘Œ0๐‘–(1) โˆ’ ๐‘Œ0๐‘–(0)|๐ท๐‘– = 1] = ๐ธ[๐‘Œ0๐‘–(1) โˆ’ ๐‘Œ0๐‘–(0)|๐ท๐‘– = 0] (parallel trends)

Identification ResultGiven parallel trends, ๐œATT is identified as:

๐ธ[๐‘Œ1๐‘–(1) โˆ’ ๐‘Œ0๐‘–(1)|๐ท = 1] = {๐ธ[๐‘Œ๐‘–(1)|๐ท๐‘– = 1] โˆ’ ๐ธ[๐‘Œ๐‘–(1)|๐ท๐‘– = 0]}

โˆ’ {๐ธ[๐‘Œ๐‘–(0)|๐ท๐‘– = 1] โˆ’ ๐ธ[๐‘Œ๐‘–(0)|๐ท = 0]}

In other words:

๐œATT = {Difference in means in post-treatment period}

โˆ’ {Difference in means in pre-treatment period}

12 / 52

Difference-in-Differences: Estimators

Estimand (ATET)๐ธ[๐‘Œ1(1) โˆ’ ๐‘Œ0(1)|๐ท = 1]

Estimator (Sample Means: Panel)

{ 1๐‘1

โˆ‘๐ท๐‘–=1

๐‘Œ๐‘–(1) โˆ’ 1๐‘0

โˆ‘๐ท๐‘–=0

๐‘Œ๐‘–(1)}โˆ’{ 1๐‘1

โˆ‘๐ท๐‘–=1

๐‘Œ๐‘–(0) โˆ’ 1๐‘0

โˆ‘๐ท๐‘–=0

๐‘Œ๐‘–(0)}

= { 1๐‘1

โˆ‘๐ท๐‘–=1

{๐‘Œ๐‘–(1) โˆ’ ๐‘Œ๐‘–(0)} โˆ’ 1๐‘0

โˆ‘๐ท๐‘–=0

{๐‘Œ๐‘–(1) โˆ’ ๐‘Œ๐‘–(0)}} ,

where ๐‘1 is the number of treated individual and ๐‘0 is the number ofnon-treated individuals.

13 / 52

Example

Persuasive Power of the News MediaDid the change in support for the Labour Party by the Sun newspaperincrease the number of people voting Labour? Ladd and Lenz (2009)use the British Election Panel Survey, which includes information onwhether individuals voted for Labour in 1992, whether they votedLabour in 1997, and which newspapers they read.

โ€ข Outcome (๐‘Œ , voted_lab): 1 if individual ๐‘– voted Labour at time ๐‘ก

โ€ข Treatment (๐ท, reads_sun): 1 if individual ๐‘– read the Sun (in 1992)

โ€ข Time (๐‘‡ , year): Election year (1992 or 1997)

โ€ข ๐‘ = 1593, ๐‘1 = 211, ๐‘0 = 1382

โ€ข Note that this is panel data (repeated observations on the sameindividuals over time)

14 / 52

Example

Typically useful to store data in โ€˜longโ€™ format (here, 2 rows per unit):str(sun)

## 'data.frame': 3186 obs. of 4 variables:## $ reads_sun: int 0 0 0 0 0 0 0 0 0 1 ...## $ voted_lab: int 1 1 0 1 1 1 1 1 1 1 ...## $ year : num 1992 1992 1992 1992 1992 ...## $ id : int 1 2 3 4 5 6 7 8 9 10 ...

table(sun$reads_sun, sun$year)

#### 1992 1997## 0 1382 1382## 1 211 211

Question: Which observations are โ€˜treatedโ€™?

Answer: Sun readers in 1997.15 / 52

Example

# Untreated, pre-treatmenty_d0_t0 <- mean(sun$voted_lab[sun$reads_sun == 0 & sun$year == 1992])y_d0_t0

## [1] 0.3227207# Treated, pre-treatmenty_d1_t0 <- mean(sun$voted_lab[sun$reads_sun == 1 & sun$year == 1992])y_d1_t0

## [1] 0.3886256# Untreated, post-treatmenty_d0_t1 <- mean(sun$voted_lab[sun$reads_sun == 0 & sun$year == 1997])y_d0_t1

## [1] 0.4305355# Treated, post-treatmenty_d1_t1 <- mean(sun$voted_lab[sun$reads_sun == 1 & sun$year == 1997])y_d1_t1

## [1] 0.582938416 / 52

Example

0.30

0.35

0.40

0.45

0.50

0.55

0.60

% v

otin

g La

bour

1992 1997

Reads the SunDoesn't read the SunAssumed counterfactual trend

ATT

17 / 52

Example

# Parallel trend calculation(y_d1_t1 - y_d1_t0) - (y_d0_t1 - y_d0_t0)

## [1] 0.08649803

# Stable selection bias calculation(y_d1_t1 - y_d0_t1) - (y_d1_t0 - y_d0_t0)

## [1] 0.08649803

Implication: The change in endorsement caused Labour support toincrease by 8.6 percentage points amongst readers of The Sun.

18 / 52

Regression DD

Difference-in-Differences: Regression Estimator

Estimator (Regression 1)Alternatively, the same estimator can be obtained using regression techniques.

๐‘Œ๐‘– = ๐›ผ + ๐›ฝ1 โ‹… ๐ท๐‘– + ๐›ฝ2 โ‹… ๐‘‡๐‘– + ๐›ฟ โ‹… (๐ท๐‘– โ‹… ๐‘‡๐‘–) + ๐œ€,where ๐ธ[๐œ€|๐ท๐‘–, ๐‘‡๐‘–] = 0. Then, it is easy to show that

๐ธ[๐‘Œ๐‘–|๐ท๐‘–, ๐‘‡๐‘–] ๐‘‡๐‘– = 0 ๐‘‡๐‘– = 1 After - Before๐ท๐‘– = 0 ๐›ผ ๐›ผ + ๐›ฝ2 ๐›ฝ2๐ท๐‘– = 1 ๐›ผ + ๐›ฝ1 ๐›ผ + ๐›ฝ1 + ๐›ฝ2 + ๐›ฟ ๐›ฝ2 + ๐›ฟTreated - Control ๐›ฝ1 ๐›ฝ1 + ๐›ฟ ๐›ฟ

Thus, the difference-in-differences estimate is given by:

๐œATT = (๐›ฝ2 + ๐›ฟ) โˆ’ ๐›ฝ2 = ๐›ฟ

Equivalently:๐œATT = (๐›ฝ1 + ๐›ฟ) โˆ’ ๐›ฝ1 = ๐›ฟ

19 / 52

Example: Regression

dd_mod <- lm(voted_lab ~ reads_sun * as.factor(year),data = sun)

voted_lab

reads_sun 0.066โˆ—

(0.036)as.factor(year)1997 0.108โˆ—โˆ—โˆ—

(0.018)reads_sun:as.factor(year)1997 0.086โˆ—

(0.050)Constant 0.323โˆ—โˆ—โˆ—

(0.013)

Observations 3,186R2 0.022

โ€ข ๐›ผ = Labour support amongstnon-Sun readers, 1992

โ€ข ๐›ฝ1 = difference between Sunand non-Sun readers, 1992

โ€ข ๐›ฝ1 + ๐›ฟ = difference betweenSun and non-Sun readers, 1997

โ€ข ๐›ฟ = ATT

20 / 52

Example: Regression

dd_mod <- lm(voted_lab ~ reads_sun * as.factor(year),data = sun)

voted_lab

reads_sun 0.066โˆ—

(0.036)as.factor(year)1997 0.108โˆ—โˆ—โˆ—

(0.018)reads_sun:as.factor(year)1997 0.086โˆ—

(0.050)Constant 0.323โˆ—โˆ—โˆ—

(0.013)

Observations 3,186R2 0.022

โ€ข ๐›ผ = Labour support amongstnon-Sun readers, 1992

โ€ข ๐›ฝ2 = 1992 to 1997 difference,amongst non-Sun readers

โ€ข ๐›ฝ2 + ๐›ฟ = 1992 to 1997difference, amongst Sunreaders

โ€ข ๐›ฟ = ATT

20 / 52

Cross-sectional regression estimator

A nice feature diff-in-diff is it does not require panel data, i.e. repeatedobservations of the same units. Can also use repeated cross-sections:

โ€ข ๐‘Œ๐‘–๐‘”๐‘ก where unit ๐‘– is only measured at one ๐‘ก

โ€ข Units fall into treatment based on groups ๐‘”

โ€ข Particularly useful as many โ€˜treatmentsโ€™ vary at some aggregatelevel:

โ€ข e.g. Law changes at the state/region level

21 / 52

Cross-sectional regression estimator

Two options:

โ€ข Individual-level data:

๐‘Œ๐‘–๐‘”๐‘ก = ๐›ผ + ๐›ฝ1๐ท๐‘” + ๐›ฝ2๐‘‡๐‘– + ๐›ฝ3(๐ท๐‘” โ‹… ๐‘‡๐‘–) + ๐œ€๐‘–๐‘”๐‘ก

โ€ข Aggregated data:

๐‘Œ๐‘”๐‘ก = ๐›ผ + ๐›ฝ1๐ท๐‘” + ๐›ฝ2๐‘‡๐‘– + ๐›ฝ3(๐ท๐‘” โ‹… ๐‘‡๐‘–) + ๐œ€๐‘”๐‘ก

Both approaches will give the same result, as the treatment only varies atthe group level (so long as the aggregated version is weighted by cellsize).

22 / 52

Regression estimator advantages

1. Easy to calculate standard errors (though be careful aboutclustering)

2. We can control for other variables

โ€ข Individual-level data, group-level treatment: controlling forindividual covariates may increase precision

โ€ข Time-varying covariates at the group-level may strengthen theparallel trends assumption, but beware of post-treatment bias

3. Simple to extend to multiple groups/periods (more on this later)

4. Can use multi-valued (not just binary) treatments

23 / 52

Difference-in-Differences: First Differences Estimator

Estimator (Regression 2)With panel data we can use regression with first differences:

ฮ”๐‘Œ๐‘– = ๐›ผ + ๐›ฟ โ‹… ๐ท๐‘– + ๐‘‹โ€ฒ๐‘–๐›ฝ + ๐‘ข,

where ฮ”๐‘Œ๐‘– = ๐‘Œ๐‘–(1) โˆ’ ๐‘Œ๐‘–(0), and ๐‘ข = ฮ”๐œ€.

โ€ข With two periods gives identical result as other regressions

24 / 52

Example: First-difference regression

1 row per unit:head(sun_diff)

## reads_sun voted_lab_92 voted_lab_97## 1 0 1 1## 2 0 1 0## 3 0 0 0## 4 0 1 1## 5 0 1 1## 6 0 1 1

sun_diff$diff <- sun_diff$voted_lab_97 - sun_diff$voted_lab_92head(sun_diff)

## reads_sun voted_lab_92 voted_lab_97 diff## 1 0 1 1 0## 2 0 1 0 -1## 3 0 0 0 0## 4 0 1 1 0## 5 0 1 1 0## 6 0 1 1 0

25 / 52

Example: First-difference regression

first_diff_mod <- lm(diff ~ reads_sun, data = sun_diff)

Table 1: First difference model

diff

reads_sun 0.086โˆ—โˆ—โˆ—

(0.030)Constant 0.108โˆ—โˆ—โˆ—

(0.011)

Observations 1,593R2 0.005

26 / 52

Break

27 / 52

Threats to Validity

Non-parallel trends

Critical identification assumption: treatment units have similar trends tocontrol units in the absence of treatment.

Question: Why is this assumption untestable?

Answer: because of the FPOCI โ†’ we cannot observe potential outcomeunder the control condition for treated units in the post-treatmentperiod.

28 / 52

Potential violations of parallel trends

โ€ข โ€œAshenfelterโ€™s Dipโ€

โ€ข Participants in worker training programs may experience decreasedearnings before they enter the program (why are they participating?)

โ€ข If wages revert to the mean, comparing wages of participants andnon-participants leads to an upwardly biased estimate

โ€ข Targeting

โ€ข Policymakers may target units who are most improving

29 / 52

Assessing (non-)parallel trends

What can we do?

โ€ข One treatment/control group

โ€ข Plot results and look at trends in periods before the treatmentโ€ข Is the parallel trends assumption plausible?

โ€ข Multiple treatment/control comparisons

โ€ข Estimate treatment effects at different time points (i.e. placebo tests)โ†’ All estimated treatment effects before the treatment should be 0.

โ€ข Include unit-specific time trends โ†’ โ€˜relaxโ€™ parallel trendsassumption

30 / 52

โ€œGoodโ€ parallel trends example%

vot

ing

Labo

ur

1983 1987 1992 1997

Reads the SunDoesn't read the Sun

31 / 52

โ€œBadโ€ parallel trends example%

vot

ing

Labo

ur

1983 1987 1992 1997

Reads the SunDoesn't read the Sun

32 / 52

Parallel trends in Ladd and Lenz

33 / 52

Parallel and non-parallel trends

Parallel trends Treatment effect

Preโˆ’treatment period Postโˆ’treatment period

Nonโˆ’parallel trends No treatment effect

Preโˆ’treatment period Postโˆ’treatment period

Nonโˆ’parallel trends Treatment effect

Preโˆ’treatment period Postโˆ’treatment period

34 / 52

Multiple periods

Difference-in-Differences: Fixed-effect estimator

Estimator (Fixed-effect regression)We can generalise to multiple groups/time periods using unit and periodfixed-effects (โ€˜two-wayโ€™ fixed-effect model):

๐‘Œ๐‘–๐‘ก = ๐›พ๐‘– + ๐›ผ๐‘ก + ๐›ฟ๐ท๐‘–๐‘ก + ๐œ€๐‘–๐‘ก

โ€ข ๐›พ๐‘– is a fixed-effect for groups (dummy for each group)

โ€ข ๐›ผ๐‘ก if a fixed-effect for time periods (dummy for each time period)

โ€ข ๐›ฟ is the diff-in-diff estimate based on ๐ท๐‘–๐‘ก, which is 1 for treatedunit-period observations, and 0 otherwise

Very flexible:

โ€ข can replace ๐ท๐‘–๐‘ก with almost any type of treatment (not only binary)โ€ข can extend easily to multiple periods (i.e. more than 2)โ€ข can have different units treated at different times

35 / 52

Example: Fixed-effect regression

sun$treat <- sun$reads_sun == 1 & sun$year == 1997

fe_model <- lm(voted_lab ~ treat + as.factor(id) + as.factor(year),data = sun)

Table 2: Fixed-effect model

voted_lab

treat 0.086โˆ—โˆ—โˆ—

(0.030)

Observations 3,186R2 0.826

36 / 52

Why does fixed-effect regression estimate the diff-in-diff?

โ€ข Unit/group FEs mean that we are only using within group variationin Y to calculate the effect of D

โ€ข Removes any omitted variable bias that is constant over time

โ€ข Time FEs means that we remove the effect of any changes to theoutcome variable that affect all units at the same time

โ€ข ๐›ฟ โ†’ ๐œATTNote that unit dummies lead to smaller standard errors on our treatmenteffect. Why not always use unit dummies?

โ€ข Fine in panel data, as we have same units at several points in timeโ€ข Not possible with repeated cross-section when we do not have thesame units in each time period

37 / 52

Standard errors for the regression difference in differences

โ€ข Many papers using a DD strategy use data from many periods

โ€ข Treatments typically vary at the group level, while outcomesnormally measured at the individual level

โ€ข E.g. Minimum wage increases (state-level) and employment data(firm-level) in Cark and Krueger

โ€ข Will not bias treatment effect estimates, but will cause problems forvariance estimation when errors are serially correlated

โ€ข Implication: traditional standard errors will tend to be too small.

38 / 52

Standard errors for the regression difference in differences

Solution (Bertrand et al (2004))Use cluster-robust standard errors where clusters are defined at thelevel of the treatment. If the number of groups is:

โ€ข โ€ฆlarge (โช† 30), use vcovCL in sandwichโ€ข โ€ฆsmall (โช… 30), use block-bootstrap

39 / 52

Example: Multiperiod diff-in-diff

Female role-models in politicsWhen women are promoted to high-office, do they motivate otherwomen to participate more in policymaking? Blumenau (2019) examinesthe participation of female MPs in parliamentary debates between 1997and 2018 as a proxy for policymaking involvement. If high-profilewomen act as role-models, when they are promoted they shouldencourage other women to participate more.

โ€ข Outcome (๐‘Œ , prop_words): Proportion of words spoken by women in debate ๐‘‘ atin ministry ๐‘š at time ๐‘ก

โ€ข Treatment (๐ท, female_minister): 1 if cabinet minister of ministry ๐‘š at time ๐‘ก isfemale

โ€ข Unit of analysis: Debates (clustered within about 30 ministries)

โ€ข ๐‘ โ‰ˆ 15,000 debates

โ€ข Time measured at the month level (i.e. February 2012, etc)

40 / 52

Female role-models โ€“ treatment variable

41 / 52

Female role-models โ€“ raw data

42 / 52

Female role-models โ€“ model specification

๐‘ƒ๐‘Ÿ๐‘œ๐‘๐‘Š๐‘œ๐‘Ÿ๐‘‘๐‘ ๐‘Š๐‘œ๐‘š๐‘’๐‘›๐‘‘(๐‘š๐‘ก) = ๐›พ๐‘š +๐›ผ๐‘ก +๐›ฟ โˆ—๐น๐‘’๐‘š๐‘Ž๐‘™๐‘’๐‘€๐‘–๐‘›๐‘–๐‘ ๐‘ก๐‘’๐‘Ÿ๐‘š๐‘ก +๐œ€๐‘‘(๐‘š๐‘ก)

โ€ข ๐›พ๐‘š โ†’ ministry fixed effect

โ€ข Controls for unobserved ministry characteristics that are timeinvariant

โ€ข ๐›ผ๐‘ก โ†’ time (month-year) fixed effect

โ€ข Controls for changes in participation over time that are common toall ministries

โ€ข ๐›ฟ โ†’ average effect of switching from a male to a female minister based onwithin-ministry variation, amongst ministries that see a change in ministergender (i.e. ๐œATT)

43 / 52

R example

fe_mod <- lm(prop.women.words ~ minister.gender +as.factor(ministry) +

as.factor(yearmon),data= speeches)

screenreg(list(fe_mod),omit.coef = c(โ€ministry|yearmonโ€),digits = 3)

44 / 52

R example

#### ===============================## Model 1## -------------------------------## (Intercept) 0.039## (0.044)## minister.genderF 0.041 ***## (0.005)## -------------------------------## R^2 0.106## Adj. R^2 0.091## Num. obs. 14320## RMSE 0.218## ===============================## *** p < 0.001, ** p < 0.01, * p < 0.05

44 / 52

R example (clustered errors)

fe_mod_clustered <- coeftest(fe_mod,vcov = vcovCL, cluster = ~ ministry)

screenreg(list(fe_mod, fe_mod_clustered),omit.coef = c(โ€ministry|yearmonโ€),digits = 3)

45 / 52

R example (clustered errors)

#### ===========================================## Model 1 Model 2## -------------------------------------------## (Intercept) 0.039 0.039## (0.044) (0.048)## minister.genderF 0.041 *** 0.041 ***## (0.005) (0.011)## -------------------------------------------## R^2 0.106## Adj. R^2 0.091## Num. obs. 14320## RMSE 0.218## ===========================================## *** p < 0.001, ** p < 0.01, * p < 0.05

45 / 52

Female role-models โ€“ full results

Interpretation: The appointment of a female minister increasesparticipation of other women byโ‰ˆ 20% relative to male minister baseline

46 / 52

Female role-models โ€“ parallel trends

Hard to provide a visual inspection of the parallel trends assumption astreatment switches on and off over time in different ministries.

Nevertheless, we are still assuming that treated/control ministries wouldhave evolved identically over time in absense of treatment.

One way forward, test for ๐‘š โ€œlagsโ€ and ๐‘ž โ€œleadsโ€ of the treatment:

๐‘ฆ๐‘‘(๐‘š๐‘ก) = ๐›ผ0 +๐‘š

โˆ‘๐‘—=โˆ’๐‘ž

๐›ฟ๐‘— โˆ—๐น๐‘’๐‘š๐‘Ž๐‘™๐‘’๐‘€๐‘–๐‘›๐‘–๐‘ ๐‘ก๐‘’๐‘Ÿ๐‘š(๐‘ก+๐‘—) +๐›พ๐‘š0 +๐›ผ๐‘ก +๐œ€๐‘‘(๐‘š๐‘ก)

โ€ข ๐›ฟโˆ’1, ๐›ฟโˆ’2, ..., ๐›ฟโˆ’๐‘ž are the โ€œleadโ€ effects which should all be 0

โ€ข ๐›ฟ1, ๐›ฟ2, ..., ๐›ฟ๐‘š are the โ€œlaggedโ€ effects which may take differentvalues

Intuition: What is the DD estimate in the Sun example using only datafrom 1992 and 1996? 47 / 52

Female role-models โ€“ parallel trends

48 / 52

Conclusion

Data requirements for Diff-in-diff

Data structure:

โ€ข Panel data or repeated cross-sectionโ€ข Single or multiple treatmentsโ€ข Continuous or binary treatmentsโ€ข Works both at individual/aggregate level

Does this require more data?

โ€ข Adding a time dimension can increase the amount of data you needโ€ข No need to control for extensive covariates (so long as they are fixedwithin units over time) which might mean decreased data collection

49 / 52

Examples of diff-in-diff designs

1. Card & Krueger, 1994โ€ข RQ: Do increases in the minimum wage reduce employment?โ€ข Outcome: Employment growth in fast-food restaurantsโ€ข Treatment: Increased minimum wage in New Jersey; no change inPennsylvania

โ€ข Time: Before/after minimum wage changed

2. Dinas et al., 2019โ€ข RQ: What is the effect of refugee arrivals on support for the far right?โ€ข Outcome: Municipal support for far right partyโ€ข Treatment: Refugee arrivals in Greek islandsโ€ข Time: Elections before/after refugee crisis

50 / 52

Examples of diff-in-diff designs

3. Bechtel & Heinmueller, 2011โ€ข RQ: What is the effect of good policy on government support?โ€ข Outcome: Support for the German SPD in parliamentaryconstituencies

โ€ข Treatment: Flooded German regions close to the River Elbeโ€ข Time: Elections before/after 2002, when the Elbe flooded

4. Hainmueller & Hangartner, 2019โ€ข RQ: What is the effect of direct democracy on immigrantassimilation?

โ€ข Outcome: Naturalization rate of immigrants in Swiss municipalitiesโ€ข Treatment: Whether municipality decides on naturalisation requestsvia expert or citizen councils

โ€ข Time: Decisions before/after legal changes to decision making inmunicipalities

50 / 52

Conclusion

The DD design allows for a comparison over time in the treatment group,controlling for concurrent time trends using a control group.

DD requires data on multiple units in multiple periods, but can beapplied to panel data or repeated cross-sectional data.

DD is very widely used, as it is a powerful conditioning strategy thatdoesnโ€™t require endless lists of covariates to strengthen the identifyingassumption.

The identification assumption โ€“ that treatment and control units wouldfollow parallel trends in the absense of treatment โ€“ should beinvestigated with every application!

51 / 52

After reading week

Synthetic control method

52 / 52