
# 1 Quantitative Methods II Dummy Variables & Interaction Effects Edmund Malesky, Ph.D., UCSD



The Homogeneity Assumption

OLS assumes all cases in your data are comparable

x’s are a sample drawn from a single population

But we may analyze distinct groups of cases together in one analysis

Mean value of y may differ by group


Qualitative Variables

These group effects remain as part of the error term

If groups differ in their distribution of x’s, then we get a correlation between the X variables and the error term

This violates the assumption $\operatorname{cov}(x_i, u_i) = E(u) = 0$: omitted variable bias!


Testing for Differences Across Groups (p. 249-252)

The Chow Test, e.g. testing for a difference between males and females on …

$SSR_1$ = males only; $SSR_2$ = females only; $SSR_{ur} = SSR_1 + SSR_2$; $SSR_P = SSR_r$ = pooling across both groups

$$F = \frac{[SSR_P - (SSR_1 + SSR_2)]}{SSR_1 + SSR_2} \cdot \frac{[n - 2(k+1)]}{k+1}$$

The Chow Test:

1. Is only valid under homoskedasticity (the error variance for the two groups must be equal).

2. The null hypothesis is that there is no difference at all between the two groups, in either the intercept or the slopes.

3. This may be too restrictive. In these cases, we should use dummy variables and dummy interactions, which allow us to predict different slopes and intercepts for the two groups.
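The F statistic above can be computed directly from three OLS fits. The following is a minimal sketch, assuming NumPy is available; the data are simulated and the helper names `ssr` and `chow_f` are hypothetical, not taken from the slides. Here `k` counts slope coefficients only, so each group regression has `k + 1` parameters.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(ssr_pooled, ssr_1, ssr_2, n, k):
    """Chow F statistic; k is the number of slopes (constant excluded)."""
    ssr_ur = ssr_1 + ssr_2
    return ((ssr_pooled - ssr_ur) / ssr_ur) * ((n - 2 * (k + 1)) / (k + 1))

# Simulated data (hypothetical): group 2 has a different intercept and slope.
rng = np.random.default_rng(0)
n_per_group = 100
x1 = rng.normal(size=n_per_group)
x2 = rng.normal(size=n_per_group)
y1 = 1.0 + 2.0 * x1 + rng.normal(size=n_per_group)
y2 = 3.0 + 0.5 * x2 + rng.normal(size=n_per_group)

X1 = np.column_stack([np.ones(n_per_group), x1])
X2 = np.column_stack([np.ones(n_per_group), x2])
X_pooled = np.vstack([X1, X2])
y_pooled = np.concatenate([y1, y2])

f_stat = chow_f(ssr(X_pooled, y_pooled), ssr(X1, y1), ssr(X2, y2),
                n=2 * n_per_group, k=1)
print(f"Chow F = {f_stat:.2f}")
```

The statistic would then be compared with an F distribution with `k + 1` and `n - 2(k + 1)` degrees of freedom.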


Example: Democracy & Tariffs

Here we see that democracies have lower tariffs

Here we see that states in Regional Trading Arrangements (RTA’s) have lower tariffs

[Figure: percent tariffs by regime type (Dictator, Oligarch, Anocracy, Democracy). Left panel: pooled data. Right panel: RTA, No RTA, and pooled data.]

But if democracies are more likely to be in RTAs, then pooling RTA and non-RTA states biases the coefficient


Solution: The Qualitative Variable

Measure this group difference (RTA vs. non-RTA) and specify it as an x

This eliminates bias

But we have no numerical scale to measure RTAs

Create a categorical variable that captures this group difference
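A small simulation can illustrate why specifying the group difference as an x removes the bias. This is a sketch under assumed values: the data-generating process, the coefficients (50, -5, -10), and the helper `ols` are all hypothetical, chosen only so that RTA membership is correlated with democracy.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical process: more democratic states are more likely to be in an
# RTA, and RTA members have lower tariffs.
dem = rng.uniform(0, 3, size=n)                       # democracy score
rta = (rng.uniform(size=n) < dem / 3).astype(float)   # RTA dummy (0/1)
tariff = 50 - 5 * dem - 10 * rta + rng.normal(scale=2, size=n)

def ols(columns, y):
    """OLS coefficients for y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_pooled = ols([dem], tariff)       # RTA left in the error term
b_dummy = ols([dem, rta], tariff)   # RTA specified as an x

print("DEM slope, RTA omitted: ", round(b_pooled[1], 2))
print("DEM slope, RTA included:", round(b_dummy[1], 2))
```

With RTA omitted, the democracy slope absorbs part of the RTA effect and is pushed well below the assumed value of -5; with the dummy included it should land close to -5.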


The Qualitative “Dummy”

Create a variable that equals 1 when a case is part of a group, 0 otherwise

This variable creates a new intercept for the cases in the group marked by the dummy

Specifically, how would we interpret:

$$\mathit{TARIFF} = \beta_0 + \beta_1 \mathit{DEM} + \beta_2 \mathit{RTA} + u$$


Democracy and Tariff Barriers

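[Figure: percent tariffs by regime type (Dictator, Oligarch, Anocracy, Democracy), with separate lines for RTA and No RTA]

$$\mathit{TARIFF} = \hat\beta_0 + \hat\beta_1 \mathit{DEM} + \hat\beta_2 \mathit{RTA} + \hat u$$

$$\hat\beta_0 = 50 \quad\text{and}\quad \hat\beta_1 = -5 \quad\text{and}\quad \hat\beta_2 = -10$$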

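Graphical Depiction of a Dummy

[Figure: y plotted against x1 (could be continuous, categorical, or dichotomous), showing two parallel lines with common slope $\hat\beta_1$]

$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$ if $x_2 = 1$ (intercept $\hat\beta_0 + \hat\beta_2$)

$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$ if $x_2 = 0$, i.e. $\hat y = \hat\beta_0 + \hat\beta_1 x_1$ (intercept $\hat\beta_0$)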


Multiple Category Dummies

Dummy variables are a very flexible way to assess categorical differences in the mean of y

We can use dummies even for concepts with multiple categories

Imagine we want to capture the impact of global region on tariffs

Regions: Americas, Europe, Asia, Africa


Warning!

Do not fall into the dummy variable trap!

The trap is entering both values of a dummy variable in the same regression. Together with the constant, these variables are linearly dependent, so one will drop out.


Multiple Category Dummies

Create 4 separate dummy variables - 1 for each region

Include all except one of these dummies in the equation

If you include all 4 dummies you get perfect collinearity with the constant. The fourth dummy will drop out.

Americas+Europe+Asia+Africa=1
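The collinearity can be checked numerically by comparing the rank of the design matrix with its number of columns. A minimal sketch, assuming NumPy and a made-up sample of 20 cases spread evenly over the four regions:

```python
import numpy as np

names = ["Americas", "Europe", "Asia", "Africa"]
regions = np.array(names * 5)  # hypothetical sample: 5 cases per region

# One 0/1 dummy column per region.
dummies = np.column_stack([(regions == r).astype(float) for r in names])
const = np.ones(len(regions))

X_trap = np.column_stack([const, dummies])        # constant + all 4 dummies
X_ok = np.column_stack([const, dummies[:, 1:]])   # Americas excluded

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(X_trap.shape[1], rank_trap)  # 5 columns, rank 4: perfectly collinear
print(X_ok.shape[1], rank_ok)      # 4 columns, rank 4: estimable
```

Because the four dummies sum to the constant column, the five-column matrix has only four independent columns, which is why software drops one of them.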


Interpreting Multi-Category Dummies

Each coefficient compares the mean for that group to the mean in the excluded category

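Thus if:

$$\mathit{TARIFF} = \hat\beta_0 + \hat\beta_1 \mathit{DEM} + \hat\beta_2 \mathit{EUR} + \hat\beta_3 \mathit{ASIA} + \hat\beta_4 \mathit{AFR} + \hat u$$

$\hat\beta_2$-$\hat\beta_4$ compare the mean tariff in each region to the mean in the Americas, holding DEM constant

Mean in Americas is $\hat\beta_0$ (when DEM = 0)

An alternative strategy is to drop the constant and run all dummies, as discussed last week.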
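The comparison-to-baseline reading can be verified on a toy dataset. This sketch leaves DEM out for simplicity, so each dummy coefficient is exactly a difference in group means; the tariff values are invented for illustration.

```python
import numpy as np

# Hypothetical tariff observations, grouped by region.
tariffs = {
    "Americas": [20.0, 22.0, 24.0],   # mean 22 (excluded category)
    "Europe":   [10.0, 12.0, 14.0],   # mean 12
    "Asia":     [15.0, 17.0, 19.0],   # mean 17
    "Africa":   [30.0, 28.0, 32.0],   # mean 30
}
y = np.concatenate([np.array(v) for v in tariffs.values()])
region = np.repeat(list(tariffs), [len(v) for v in tariffs.values()])

# Constant plus dummies for every region except the Americas.
X = np.column_stack([
    np.ones(len(y)),
    (region == "Europe").astype(float),
    (region == "Asia").astype(float),
    (region == "Africa").astype(float),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))  # constant, then Europe, Asia, Africa vs. Americas
```

The constant recovers the Americas mean (22), and the three dummy coefficients are the gaps between each region's mean and that baseline (-10, -5, and 8).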


Dumb Dummies

Dummy variables are easy, flexible ways to measure categorical concepts

They CAN be just labels for ignorance

Try to use dummies to capture theoretical constructs, not empirical observations

If possible, measure the theoretical construct more directly


Interaction Effects

Dummy variables specify new intercepts

Other slope coefficients in the equation do not change

OLS assumes that the slopes of continuous variables are constant across all cases

What if slopes are different for different groups in our sample?


Interaction Effects: An Example

What if the effect of democracy on tariffs depends on whether the state is in an RTA?

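$$\mathit{TARIFF} = \hat\beta_0 + \hat\beta_1 \mathit{DEM} + \hat\beta_2 \mathit{RTA} + \hat u$$

$$\hat\beta_1 = \hat\alpha_0 + \hat\alpha_1 \mathit{RTA}$$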


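Interaction Effects: An Illustration

(Notice that democracy has been converted to a dummy as well for illustration purposes)

[Figure: percent tariffs for Non-Dem and Democracy, with separate lines for RTA and No RTA]

$$\mathit{TARIFF} = \hat\beta_0 + \hat\beta_1 \mathit{DEM} + \hat\beta_2 \mathit{RTA} + \hat u$$

$$\hat\beta_1 = -5 \ \text{if } \mathit{RTA} = 0, \qquad \hat\beta_1 = -6 \ \text{if } \mathit{RTA} = 1$$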


How Do We Estimate This Set of Relationships?

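We begin with:

$$\mathit{TARIFF} = \hat\beta_0 + \hat\beta_1 \mathit{DEM} + \hat\beta_2 \mathit{RTA} + \hat u$$

$$\hat\beta_1 = \hat\alpha_0 + \hat\alpha_1 \mathit{RTA}$$

Substituting for $\hat\beta_1$, we get:

$$\mathit{TARIFF} = \hat\beta_0 + (\hat\alpha_0 + \hat\alpha_1 \mathit{RTA})\,\mathit{DEM} + \hat\beta_2 \mathit{RTA} + \hat u$$

$$\mathit{TARIFF} = \hat\beta_0 + \hat\alpha_0 \mathit{DEM} + \hat\alpha_1 (\mathit{DEM} \times \mathit{RTA}) + \hat\beta_2 \mathit{RTA} + \hat u$$

In STATA, $\hat\alpha_0$, $\hat\alpha_1$ and $\hat\beta_2$ will appear as regular coefficients: $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$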
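In practice, the substituted equation is estimated by adding the product DEM × RTA as an ordinary regressor. The slides use STATA; the following is a rough Python equivalent, a sketch on simulated data with assumed true coefficients (30, -5, -1, -10), not the author's own code.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical 0/1 indicators for democracy and RTA membership.
dem = rng.integers(0, 2, size=n).astype(float)
rta = rng.integers(0, 2, size=n).astype(float)

# Assumed data-generating process with an interaction between DEM and RTA.
tariff = 30 - 5 * dem - 1 * dem * rta - 10 * rta + rng.normal(scale=1.0, size=n)

# Columns: constant, DEM, DEM*RTA, RTA (the product is just another regressor).
X = np.column_stack([np.ones(n), dem, dem * rta, rta])
b0, b1, b2, b3 = np.linalg.lstsq(X, tariff, rcond=None)[0]

slope_dem_no_rta = b1        # slope of DEM when RTA = 0
slope_dem_rta = b1 + b2      # slope of DEM when RTA = 1

print(round(slope_dem_no_rta, 2), round(slope_dem_rta, 2))
```

The coefficient on the product column is the amount by which RTA membership shifts the DEM slope, so the two slopes are read off as `b1` and `b1 + b2`.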


What Do These Coefficients Mean?

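$$\mathit{TARIFF} = \hat\beta_0 + \hat\alpha_0 \mathit{DEM} + \hat\alpha_1 (\mathit{DEM} \times \mathit{RTA}) + \hat\beta_2 \mathit{RTA} + \hat u$$

$\hat\beta_0$ is the intercept for DEM when RTA=0

$\hat\beta_0 + \hat\beta_2$ is the new intercept for DEM when RTA=1

$\hat\alpha_0$ is the slope of DEM when RTA=0

$\hat\alpha_1$ is the impact of RTA on the coefficient for DEM

So if RTA=1, the slope of DEM is $\hat\alpha_0 + \hat\alpha_1$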


Interpreting the Interaction

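Recall that:

$$\mathit{TARIFF} = \hat\beta_0 + \hat\beta_1 \mathit{DEM} + \hat\beta_2 \mathit{RTA} + \hat u$$

$$\hat\beta_1 = \hat\alpha_0 + \hat\alpha_1 \mathit{RTA}$$

$$\mathit{TARIFF} = \hat\beta_0 + (\hat\alpha_0 + \hat\alpha_1 \mathit{RTA})\,\mathit{DEM} + \hat\beta_2 \mathit{RTA} + \hat u$$

$$\mathit{TARIFF} = \hat\beta_0 + \hat\alpha_0 \mathit{DEM} + \hat\alpha_1 (\mathit{DEM} \times \mathit{RTA}) + \hat\beta_2 \mathit{RTA} + \hat u$$

RTA is a dummy variable taking on the values 0 or 1

Thus if RTA=0, then $\hat\beta_1 = \hat\alpha_0$

But if RTA=1, then $\hat\beta_1 = \hat\alpha_0 + \hat\alpha_1$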


An Illustration of the Coefficients

Imagine we estimate:

$$\mathit{TARIFF} = 30 - 5(\mathit{DEM}) - 1(\mathit{DEM} \times \mathit{RTA}) - 10(\mathit{RTA})$$

[Figure: percent tariffs for Non-Dem and Democracy, with separate lines for RTA and No RTA]


Substantive Effects of Dummy Interactions

| | No RTA | RTA |
|---|---|---|
| Non-Democracy | $\hat\beta_0 = 30$ | $\hat\beta_0 + \hat\beta_3 = 20$ |
| Democracy | $\hat\beta_0 + \hat\beta_1 = 25$ | $\hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3 = 14$ |
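The four cells follow mechanically from plugging 0/1 values into the estimated equation. A short sketch, with the coefficient signs inferred from the cell values (30, 20, 25, 14) and the function name `predicted_tariff` chosen here for illustration:

```python
# Coefficients in the STATA ordering used on this slide:
# b1 on DEM, b2 on DEM*RTA, b3 on RTA. Signs are inferred from the table cells.
b0, b1, b2, b3 = 30.0, -5.0, -1.0, -10.0

def predicted_tariff(dem, rta):
    """Predicted tariff for 0/1 values of the DEM and RTA dummies."""
    return b0 + b1 * dem + b2 * dem * rta + b3 * rta

cells = {(dem, rta): predicted_tariff(dem, rta)
         for dem in (0, 1) for rta in (0, 1)}

for (dem, rta), value in cells.items():
    print(f"DEM={dem}, RTA={rta}: {value}")
```

Reading across a row gives the effect of RTA membership for that regime type (-10 for non-democracies, -11 for democracies); the difference between those two effects is the interaction coefficient.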


Interactions with Continuous Variables

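The exact same logic about interactions applies if $\hat\beta_1$ depends on a continuous variable

$$y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat u$$

$$\hat\beta_1 = \hat\alpha_0 + \hat\alpha_1 x_2$$

$\hat\alpha_0$ is the impact of $x_1$ when $x_2 = 0$

$\hat\alpha_1$ is the change in $\hat\beta_1$ for each one unit increase in $x_2$

$\hat\beta_2$ is the impact of $x_2$ when $x_1 = 0$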
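To make these interpretations concrete, the marginal effect of $x_1$ can be written out by substituting the slope equation into the main equation and differentiating. This is a sketch that labels the two parameters of the slope equation $\hat\alpha_0$ and $\hat\alpha_1$ (an assumed labelling); the numerical values are hypothetical.

```latex
\begin{align*}
\hat{y} &= \hat\beta_0 + (\hat\alpha_0 + \hat\alpha_1 x_2)\,x_1 + \hat\beta_2 x_2 \\
\frac{\partial \hat{y}}{\partial x_1} &= \hat\alpha_0 + \hat\alpha_1 x_2 \\
\text{e.g. } \hat\alpha_0 = -2,\ \hat\alpha_1 = -1:\quad
\left.\frac{\partial \hat{y}}{\partial x_1}\right|_{x_2 = 0} &= -2, \qquad
\left.\frac{\partial \hat{y}}{\partial x_1}\right|_{x_2 = 4} = -6
\end{align*}
```

So the effect of $x_1$ is not a single number: it has to be reported at chosen values of $x_2$.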


Example: Democracy, Tariffs & Unemployment

$$\mathit{TARIFF} = 28 - 2(\mathit{DEM}) - 1(\mathit{DEM} \times \mathit{UNEMP}) + 5(\mathit{UNEMP})$$

[Figure: predicted tariff rate by Democracy 1-4 (Dictator, Oligarch, Anocrat, Democracy), with separate lines for Unemployment = 0, 2, 4, and 6]

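Graphical Depiction of a Dummy/Continuous Interaction

[Figure: y plotted against x1 (could be continuous, categorical, or dichotomous), showing two lines with different intercepts and different slopes]

$\hat y = \hat\beta_0 + \hat\alpha_0 x_1 + \hat\alpha_1 (x_1 \times x_2) + \hat\beta_3 x_2$ if $x_2 = 1$: intercept $\hat\beta_0 + \hat\beta_3$, slope $\hat\beta_1 = \hat\alpha_0 + \hat\alpha_1$

$\hat y = \hat\beta_0 + \hat\alpha_0 x_1 + \hat\alpha_1 (x_1 \times x_2) + \hat\beta_3 x_2$ if $x_2 = 0$: intercept $\hat\beta_0$, slope $\hat\beta_1 = \hat\alpha_0$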


What if a Variable Interacts with Itself?

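What if $\hat\beta_1$ depends on the value of $x_1$?

$$y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat u$$

$$\hat\beta_1 = \hat\alpha_0 + \hat\alpha_1 x_1$$

Then we substitute in as before:

$$y = \hat\beta_0 + (\hat\alpha_0 + \hat\alpha_1 x_1)\,x_1 + \hat\beta_2 x_2 + \hat u$$

$$y = \hat\beta_0 + \hat\alpha_0 x_1 + \hat\alpha_1 x_1^2 + \hat\beta_2 x_2 + \hat u$$

Curvilinear (Quadratic) effect is a type of interaction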
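A quadratic term is estimated the same way as any other interaction: square the variable and add it as a regressor. The sketch below uses simulated data with assumed coefficients and omits $x_2$ for brevity. Note that the change in predicted y for a small change in $x_1$ is the derivative, which carries a factor of 2 on the squared term.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Hypothetical inverted-U relationship that peaks at x1 = 6.
x1 = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x1 - 0.25 * x1**2 + rng.normal(scale=0.5, size=n)

# Columns: constant, x1, x1 squared.
X = np.column_stack([np.ones(n), x1, x1**2])
b0, a0, a1 = np.linalg.lstsq(X, y, rcond=None)[0]

def marginal_effect(x):
    """Derivative of predicted y with respect to x1, evaluated at x."""
    return a0 + 2 * a1 * x

turning_point = -a0 / (2 * a1)   # where the marginal effect crosses zero

print(round(marginal_effect(0.0), 2), round(marginal_effect(10.0), 2))
print(round(turning_point, 2))
```

Here the effect of $x_1$ is positive at low values and negative at high values, which is exactly the pattern a single linear slope cannot capture.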


More Complex Interactions

We can use this method to specify the functional form of $\hat\beta_1$ in any way we choose

Simply substitute the function in for $\hat\beta_1$, multiply out the terms and estimate

Only limitations are theories of interaction and levels of collinearity


Examples of interaction effects

from my own research


Governance and Economic Welfare

Figure 4: PCI Performance and Economic Welfare

[Figure: 04 GDP per capita (in millions of constant 1994 VND) plotted against Structural Endowments (Infrastructure, Human Capital, Proximity to Markets), with separate lines for Low PCI and High PCI]

“The Governance Premium”: better governed (high PCI) provinces are able to generate higher living standards from the same level of development


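Predicted Number of Loans by Legal Status among Vietnamese Private Firms

| Registered at DPI | Land Use Rights Certificate: None | Land Use Rights Certificate: Partial | Land Use Rights Certificate: Full |
|---|---|---|---|
| No | 0.83 | 0.99 | 1.2 |
| Yes | 2.73 | 3.27 | 3.98 |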


Predicted Probability of Provincial Division in Vietnam

(By State Sector Output with Number of Cabinet Officials)

[Figure: predicted probability of provincial division (0.4 to 0.8) plotted against State Contribution to Provincial Output (0 to 1), with separate lines for No Cabinet Members, 1 Cabinet Member, and 2+ Cabinet Members. Contribution of covariates at 75th percentile.]
