Panel Data Models - WordPress.com

Panel Data Models

Justin Raymond S. Eloriaga

2021


Chapter Summary

We give an overview of the slew of panel data models and how we can account for unobserved heterogeneity. We start by defining what unobserved heterogeneity is, then we derive the first differences model. We then derive the fixed effects model and discuss the concept of an LSDV. We move to discussing the random effects model and the subsequent model comparison tests we can perform on panel data.


Let’s Take a Walk

Question

When was the last time you visited a park? Walked in a park? Took your pet out for a walk? Exercised in one?

Answer

We’re living in a pandemic, so quite a long time ago. But even before the pandemic, hardly ever, since there aren’t a lot in Metro Manila.



Setting the Scene

What do you observe in the capitals and large cities of most developed countries (e.g. London, New York, Atlanta, San Francisco)?

Well, numerous sources have pointed out the large greenspaces present in some of these cities. New York has Central Park, and London has many parks.

But this isn’t true for most cities around the world (case in point, Metro Manila).







Greenspaces and Income?

Research Question

Does the presence of larger greenspaces increase the average income in cities around the world?

There are many cities around the world, and the average income can vary through time. Hence, conceivably, we can deal with a panel dataset in this particular research.

A panel data set has both a cross-sectional dimension $i$ and a time series dimension $t$.

We can formulate the model in the form below for all $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$.

$$\text{AveIncome}_{it} = \beta_0 + \beta_1 \text{Greenspace}_{it} + v_t + \alpha_i + u_{it}$$










Why the additional terms?


The additional terms are essentially components of the error term. We include them because we are dealing with panel data and need to account for differences along two dimensions.

$v_t$ is a time-dependent term. If the dataset contains cities, this is an ’error’ that depends on the general trend common to all cities rather than on any particular city.

$\alpha_i$ is a space-dependent term. It captures differences across cities that may explain the variation in AveIncome. Note that these things don’t vary through time (e.g. geography, climate, education, race, etc.).




















Let’s suppress these slightly

In most panel data studies, the time-dependent element is either incorporated through dummy variables or largely ignored. This is because heterogeneity across time is not really evident in most of the literature.

To account for it using dummy variables, we use the form below (let us assume there are $T$ years under study), where $\delta_{kt}$ equals 1 if period $t$ is year $k$ and 0 otherwise.

$$\text{AveIncome}_{it} = \beta_0 + \beta_1 \text{Greenspace}_{it} + \phi_1 \delta_{2t} + \phi_2 \delta_{3t} + \dots + \phi_{T-1} \delta_{Tt} + \alpha_i + u_{it}$$

It is easy to do this when we have only a few years, but it becomes unwieldy when we have many time periods.























Defining $\eta_{it}$

From the equation above, let us define $\eta_{it}$ as the sum of $\alpha_i$ and $u_{it}$. That is,

$$\eta_{it} = \alpha_i + u_{it}$$

We refer to $\eta_{it}$ as the composite error term. By composite, we mean that it is composed of two components: a true error $u_{it}$ and another error-like term $\alpha_i$. To understand what this means, we must first understand why OLS is more often than not inadequate when we estimate panel data.





Inadequacy of OLS

For OLS to yield a consistent and unbiased estimator, it must be that $\mathrm{cov}(\eta_{it}, x_{it}) = 0$ for all $i$. But when we think deeper, we see that this is likely not the case.

Consider our example of income and green spaces.

$$\mathrm{cov}(\eta_{it}, x_{it}) = \mathrm{cov}(\alpha_i + u_{it}, \text{Greenspace}_{it}) = \mathrm{cov}(\alpha_i, \text{Greenspace}_{it}) + \mathrm{cov}(u_{it}, \text{Greenspace}_{it})$$

If $\mathrm{cov}(u_{it}, \text{Greenspace}_{it}) = 0$ holds as usual, the second piece causes no issue. But we still have the $\alpha_i$ term, and that is a problem.

Can we really believe that $\mathrm{cov}(\alpha_i, \text{Greenspace}_{it}) = 0$?










On $\alpha$’s and $x$’s

More often than not, $\mathrm{cov}(\alpha_i, \text{Greenspace}_{it}) \neq 0$. In our example, this is because there are many space-dependent factors, such as geography, climate, and weather, that are related to the amount of green space in a city.

For example, Dubai and Manila are both heavily developed cities. But conceivably, it is much easier to have a green space in Manila, given its climate, location, geography and general weather, than in an arid city like Dubai.










Unobserved Heterogeneity

We refer to $\alpha_i$ as the unobserved heterogeneity, and this is something that panel data can shed considerable light on.

We don’t observe data points for most of these unobserved factors. These factors just so happen to be generally constant throughout time.

It is heterogeneous since it varies across space (i.e. it varies across cities in our example).

In the presence of unobserved heterogeneity that is correlated with the regressors, OLS will be both biased and inconsistent. Hence, we must find a way to eliminate it. This is exactly what the more formal panel data models try to do.
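To make the bias concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the data are simulated rather than real city data, the true slope is set to 2, and the city effect is built to be correlated with greenspace. Under those assumptions, pooled OLS should overshoot the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 5          # hypothetical panel: 500 cities, 5 years
beta1 = 2.0            # true effect, chosen for the simulation

alpha = rng.normal(size=N)                           # city effect alpha_i
# Greenspace depends on alpha_i, so cov(alpha_i, x_it) != 0 by construction
x = 0.8 * alpha[:, None] + rng.normal(size=(N, T))
u = rng.normal(size=(N, T))                          # idiosyncratic error u_it
y = 1.0 + beta1 * x + alpha[:, None] + u


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x, y = x.ravel(), y.ravel()
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


pooled_slope = ols_slope(x, y)
print(f"true beta1 = {beta1}, pooled OLS slope = {pooled_slope:.3f}")
```

In this setup the pooled slope picks up part of the city effect, so it lands above 2 no matter how many cities are added.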


























Building the First Differences Model

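Our goal is to eliminate that $\alpha_i$ term. One way to do that is to take the first difference. Recall the model we have so far.

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \phi_1 \delta_{2t} + \phi_2 \delta_{3t} + \dots + \phi_{T-1} \delta_{Tt} + \alpha_i + u_{i,t}$$

The first difference is simply the change in a variable from one period to the next. For the average income, that is

$$\Delta \text{AveIncome}_{i,t} = \text{AveIncome}_{i,t} - \text{AveIncome}_{i,t-1}$$

Applying this to our model above, we get

$$\Delta \text{AveIncome}_{i,t} = (\beta_0 - \beta_0) + \beta_1 \Delta \text{Greenspace}_{i,t} + \phi_1 \Delta\delta_{2t} + \phi_2 \Delta\delta_{3t} + \dots + \phi_{T-1} \Delta\delta_{Tt} + (\alpha_i - \alpha_i) + \Delta u_{i,t}$$

Notice that the $\beta_0$ and $\alpha_i$ cancel out!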
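The differencing step can be sketched in a few lines. This is an illustration on simulated data, not the author's code: the panel, the true slope of 2 and the omission of time dummies are all assumptions made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 500, 5          # hypothetical panel dimensions
beta1 = 2.0            # true effect, chosen for the simulation

alpha = rng.normal(size=N)                           # correlated with x below
x = 0.8 * alpha[:, None] + rng.normal(size=(N, T))
y = 1.0 + beta1 * x + alpha[:, None] + rng.normal(size=(N, T))

# First differences within each city: z_it - z_i,t-1 (one period is lost)
dx = np.diff(x, axis=1)
dy = np.diff(y, axis=1)

# beta0 and alpha_i have cancelled, so regress dy on dx without an intercept
fd_slope = float((dx * dy).sum() / (dx ** 2).sum())
print(f"first-difference slope = {fd_slope:.3f}")
```

Even though the city effect is correlated with greenspace here, the differenced regression should recover a slope close to 2.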











Implications of the First Difference Model

Under the assumption that $\mathrm{cov}(\Delta \text{Greenspace}_{i,t}, \Delta u_{i,t}) = 0$, OLS on the differenced equation should be consistent and unbiased.

In essence, the first differences model is just regular OLS applied to the differenced data.

For it to work properly, the regressors need some variation across time. Otherwise, the $\Delta$ terms will be close to zero, and anything that is constant through time is differenced out entirely.

















Let’s Try Another Way

Say we have our model in the form below, but we assume that the time term $v_t$ is implicitly accounted for.

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \alpha_i + u_{i,t}$$

Remember that for OLS to be unbiased and consistent, it must be that $\mathrm{cov}(\eta_{it}, x_{it}) = 0$. But since $\eta_{it}$ contains $\alpha_i$, and $\alpha_i$ is the problem, we need to take it out. We do this now by deriving the time-averaged equation. For example,

$$\overline{\text{AveIncome}}_i = \frac{1}{T} \sum_{t=1}^{T} \text{AveIncome}_{i,t}$$








Time Averaged Equation

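If we do that for every term in the equation, we get

$$\overline{\text{AveIncome}}_i = \beta_0 + \beta_1 \overline{\text{Greenspace}}_i + \alpha_i + \bar{u}_i$$

Note that $\bar{\alpha}_i = \frac{1}{T} \sum_{t=1}^{T} \alpha_i = \frac{1}{T} \cdot T \cdot \alpha_i = \alpha_i$. This is because $\alpha_i$ does not vary through time. The same holds for the constant $\beta_0$.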








Fixed Effects Estimator

To derive what we call the fixed effects estimator, we take the difference between the original model and the time-averaged equation.

$$\text{AveIncome}_{i,t} - \overline{\text{AveIncome}}_i = \beta_1 (\text{Greenspace}_{i,t} - \overline{\text{Greenspace}}_i) + (\alpha_i - \alpha_i) + (u_{i,t} - \bar{u}_i)$$

We can rewrite this more simply in terms of tilde terms, where each tilde term is a time-demeaned value.

$$\widetilde{\text{AveIncome}}_{i,t} = \beta_1 \widetilde{\text{Greenspace}}_{i,t} + \tilde{u}_{i,t}$$

The caveat is that we remove anything that is constant through time, so we cannot evaluate the impact of time-constant variables on the dependent variable.
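The time-demeaning transformation can be sketched as follows. The panel is simulated and the true slope of 2 is an assumption for illustration; the point is only to show the within transformation and the resulting slope.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 500, 5          # hypothetical panel dimensions
beta1 = 2.0            # true effect, chosen for the simulation

alpha = rng.normal(size=N)
x = 0.8 * alpha[:, None] + rng.normal(size=(N, T))   # correlated with alpha_i
y = 1.0 + beta1 * x + alpha[:, None] + rng.normal(size=(N, T))

# Subtract each city's own time average (the "tilde" variables)
x_tilde = x - x.mean(axis=1, keepdims=True)
y_tilde = y - y.mean(axis=1, keepdims=True)

# OLS on the demeaned data; no intercept is needed because both have mean zero
fe_slope = float((x_tilde * y_tilde).sum() / (x_tilde ** 2).sum())
print(f"fixed effects (within) slope = {fe_slope:.3f}")
```

Any column that is constant within a city would become all zeros after this step, which is why time-constant regressors drop out of FE.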










Least Squares Dummy Variables

One way to operationalize the fixed effects framework is through the use of dummy variables.

For example, consider our original model with an implied $v_t$:

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \alpha_i + u_{i,t}$$

Suppose, for example, that we have 3 cities. Then the $\alpha_i$ can be represented by city dummy variables, meaning that each city has a different intercept. Note that we only include two dummies so as not to fall into the dummy variable trap (DVT).

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \rho_1 d_2 + \rho_2 d_3 + u_{i,t}$$

By including the dummies, we are explicitly saying that there is a different intercept for each city.


































LSDV and the Fixed Effects

Empirically, $\hat\beta_{dv} \to \beta$ (suggesting OLS is consistent and unbiased) if we satisfy the three assumptions below.

$\mathrm{cov}(x_{i,t}, u_{i,t}) = 0$ (Weak Exogeneity)

$\mathrm{cov}(u_i, u_j) = 0, \ \forall\, i \neq j$ (No Serial Correlation)

$\mathrm{var}(u_{i,t}) = \sigma^2$ (Homoscedastic Errors)

In essence, the way we operationalize the regular fixed effects model is just through these dummy variables. Therefore, $\beta^*_{dv} = \beta^*_{FE}$.

The estimator is the same for LSDV and FE, and its space and time components may be modified through the LSDV variants.

However, if $N$ or $T$ is large, we would end up with a great many dummy variables.
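The equality of the LSDV and FE slopes can be checked numerically. The sketch below uses a small simulated panel (4 hypothetical cities, 6 periods) and builds the dummy columns by hand; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 4, 6                                   # hypothetical: 4 cities, 6 years
alpha = np.array([0.0, 1.0, -2.0, 3.0])       # city intercept shifts
x = alpha[:, None] + rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

# LSDV: intercept, x, and N-1 city dummies (city 1 is the base category)
city = np.repeat(np.arange(N), T)             # matches row-major ravel of x, y
dummies = (city[:, None] == np.arange(1, N)).astype(float)
X = np.column_stack([np.ones(N * T), x.ravel(), dummies])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
lsdv_slope = coef[1]

# FE: OLS on time-demeaned data
x_tilde = x - x.mean(axis=1, keepdims=True)
y_tilde = y - y.mean(axis=1, keepdims=True)
fe_slope = (x_tilde * y_tilde).sum() / (x_tilde ** 2).sum()

print(lsdv_slope, fe_slope)                   # the two slopes coincide
```

The agreement is exact up to floating-point error, not just approximate, because the dummies absorb exactly the city means that demeaning removes.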






































LSDV1

The first variant of the LSDV (LSDV1) is essentially what we have so far. That is, a space-varying and time-invariant specification.

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \rho_1 d_2 + \rho_2 d_3 + \dots + \rho_{N-1} d_N + u_{i,t}$$

Therefore, in a more general equation, LSDV1 can be formulated as

LSDV1

$$Y_{i,t} = \beta_0 + \beta X_{i,t} + \sum_{j=2}^{N} \rho_{j-1} d_j + u_{i,t}$$




LSDV2

The second variant of the LSDV (LSDV2) is essentially the reverse of LSDV1. That is, we have a space-invariant but time-varying specification.

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \phi_1 m_2 + \phi_2 m_3 + \dots + \phi_{T-1} m_T + u_{i,t}$$

LSDV2

$$Y_{i,t} = \beta_0 + \beta X_{i,t} + \sum_{s=2}^{T} \phi_{s-1} m_s + u_{i,t}$$

Note that this variant is less commonly used than the other variants.





LSDV3

The third variant of the LSDV (LSDV3) is a combination of LSDV1 and LSDV2. That is, both time- and space-varying.

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \phi_1 m_2 + \phi_2 m_3 + \dots + \phi_{T-1} m_T + \rho_1 d_2 + \rho_2 d_3 + \dots + \rho_{N-1} d_N + u_{i,t}$$

LSDV3

$$Y_{i,t} = \beta_0 + \beta X_{i,t} + \sum_{j=2}^{N} \rho_{j-1} d_j + \sum_{s=2}^{T} \phi_{s-1} m_s + u_{i,t}$$




Between FE (LSDVs) and the Pooled OLS

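A key question to ask is which LSDV to use, and whether an FE model is indeed better than regular (pooled) OLS. To compare between models, we use a Wald test of linear restrictions. To do this, we compute an F-statistic formulated as below, where $R$ denotes the restricted model, $UR$ the unrestricted model, and $df$ the residual degrees of freedom.

Wald’s Test of Linear Restrictions

$$F = \frac{(RSS_R - RSS_{UR}) / (df_R - df_{UR})}{RSS_{UR} / df_{UR}}$$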
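As a sketch of how this F-statistic might be computed when comparing pooled OLS (restricted) against LSDV1 (unrestricted), consider the simulated example below. The data, the helper `rss` and the panel dimensions are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 20, 10                                  # hypothetical panel dimensions
alpha = rng.normal(scale=2.0, size=N)          # sizeable city effects
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


ones = np.ones(N * T)
city = np.repeat(np.arange(N), T)
dummies = (city[:, None] == np.arange(1, N)).astype(float)

X_r = np.column_stack([ones, x.ravel()])             # restricted: pooled OLS
X_ur = np.column_stack([ones, x.ravel(), dummies])   # unrestricted: LSDV1

rss_r, rss_ur = rss(X_r, y.ravel()), rss(X_ur, y.ravel())
df_r = N * T - X_r.shape[1]                    # residual degrees of freedom
df_ur = N * T - X_ur.shape[1]

F = ((rss_r - rss_ur) / (df_r - df_ur)) / (rss_ur / df_ur)
print(f"F = {F:.2f} with ({df_r - df_ur}, {df_ur}) degrees of freedom")
```

The restrictions being tested are that all $N-1$ dummy coefficients are zero, so the numerator degrees of freedom equal $N-1$. A large F relative to the critical value favors the LSDV specification.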








Differences between LSDVs and FEs

The LSDV model and the FE model do differ operationally in some regards. While they are obviously different in equation form (i.e. time-demeaned vs. adding dummy variables), these differences tell us what the pros and cons of each model specification are.


On the $R^2$ of Fixed Effects and LSDVs

(FE) $\widetilde{\text{AveIncome}}_{i,t} = \beta_1 \widetilde{\text{Greenspace}}_{i,t} + \tilde{u}_{i,t}$

Suppose, for example, we have an $R^2$ equal to 0.80. How do we interpret this?

Answer: The $R^2$ is the share of the variation in average income across time that is explained by the model. In other words, it measures how well our model explains deviations in average income away from its time mean.

(LSDV, say 1) $\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \rho_1 d_2 + \rho_2 d_3 + u_{i,t}$

On average, we usually get a higher $R^2$ value here. But it isn’t really a good indicator when we have a lot of variables, because $R^2$ never decreases as more variables are added.
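The gap between the two $R^2$ values can be illustrated with simulated data. The sketch relies on the fact that FE and LSDV produce the same residuals, so only the total sum of squares in the denominator differs; the panel and its parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 50, 6                                   # hypothetical panel dimensions
alpha = rng.normal(scale=3.0, size=N)          # large differences across cities
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

x_tilde = x - x.mean(axis=1, keepdims=True)
y_tilde = y - y.mean(axis=1, keepdims=True)
b_fe = (x_tilde * y_tilde).sum() / (x_tilde ** 2).sum()
rss = ((y_tilde - b_fe * x_tilde) ** 2).sum()  # shared by FE and LSDV

# FE: variation of y around each city's own time mean
r2_within = 1.0 - rss / (y_tilde ** 2).sum()
# LSDV: variation of y around the grand mean (dummies explain city levels)
r2_lsdv = 1.0 - rss / ((y - y.mean()) ** 2).sum()

print(f"within R2 = {r2_within:.3f}, LSDV R2 = {r2_lsdv:.3f}")
```

The LSDV figure is at least as large because the dummies get credit for explaining level differences across cities, which the within $R^2$ ignores.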





































Why is FE or FD superior to (Pooled) OLS?

Consider this simple econometric model below

$$\text{AveIncome}_{i,t} = \beta\, \text{YrsOfExp}_{i,t} + \alpha_i + u_{i,t}$$

We expect that $\beta > 0$.

However, if we just ran a simple pooled OLS (i.e. lump the data together and estimate it as if it were one big cross-section), we might come up with a nonsensical result.




















Misleading Inference of Pooled OLS

Notice that the derived regression line is negatively sloped (i.e. $\beta < 0$), which is nonsensical (for the most part) in our theory.




Digging Deeper

One of the reasons why a regular OLS might have failed is that it failed to account for the space and time dimensions both present in a panel dataset. See, for example, the same scatterplot but now labeled. What can you observe?





Superiority of the FE/FD

FE/FD treat observations from different cities as different.

Because we removed the unobserved heterogeneity $\alpha_i$ in both these models, we can disregard the differences in the average level of income across cities (e.g. Manila standard of living vs. New York standard of living). When you compare many different data points at varying levels, you may come to misleading conclusions when you look at them as a whole.

Rather, the differences in the average level of income are due to city-specific differences which don’t really change through time.

















What FD tries to do (graphically)

FD tries to fit a line between these pairs of observations. Notice that it outputs $\beta > 0$, which is, to us, more reasonable.








What FE tries to do (graphically)

FE is similar in that it tries to find a midpoint of all the observations for a particular cross-section $i$ and tries to connect these points as neatly as possible. Note: when we have just two years (2015 and 2020), $\hat\beta_{FE} = \hat\beta_{FD}$.
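The two-period equivalence can be verified numerically. The sketch below assumes a simulated panel with $T = 2$, no time dummies, and a first-difference regression without an intercept; under those assumptions the two slopes agree exactly.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200                                        # hypothetical number of cities
alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, 2))   # two periods per city
y = 2.0 * x + alpha[:, None] + rng.normal(size=(N, 2))

# First differences: one differenced observation per city
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
fd_slope = (dx @ dy) / (dx @ dx)

# Fixed effects: deviations from each city's two-period mean
x_tilde = x - x.mean(axis=1, keepdims=True)
y_tilde = y - y.mean(axis=1, keepdims=True)
fe_slope = (x_tilde * y_tilde).sum() / (x_tilde ** 2).sum()

print(fd_slope, fe_slope)
```

With two periods, each demeaned value is just plus or minus half the first difference, which is why the ratio is unchanged.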








What if $\mathrm{cov}(\alpha_i, X_{i,t}) = 0$?

Our discussion so far has hinged upon the disturbance caused by the presence of unobserved heterogeneity. We know that, most likely, $\mathrm{cov}(\alpha_i, X_{i,t}) \neq 0$.

In situations like those, we typically use Fixed Effects or First Differences because of the "endogeneity" issue between the unobserved heterogeneity $\alpha_i$ and the independent variables.

However, we do not necessarily need to use Fixed Effects or First Differences when $\mathrm{cov}(\alpha_i, X_{i,t}) = 0$.

















Why wouldn’t we have to deal with $\alpha_i$?

Sometimes, the panel model we have specified could have the following characteristics.

The specification controlled for all factors in determining the dependent variable.

It could be that $\alpha_i$ is just very small (or very insignificant).

So can we just use OLS (i.e. $\hat\beta_{OLS}$)?

Absolutely. You can even use FE/FD because all of the estimators will be consistent.





































Caveats of using OLS/FE/FD when $\mathrm{cov}(\alpha_i, X_{i,t}) = 0$

While you may use these methods, it turns out that FE/FD are too extreme in this case.

In the case of FD, you throw away one period.

In the case of FE, it is too extreme in that you estimate things unnecessarily.

In the case of (pooled) OLS, even if $\mathrm{cov}(\alpha_i, X_{i,t}) = 0$, the errors may still be serially correlated with one another.

Proof: We know that $\eta_{i,t} = \alpha_i + u_{i,t}$.

Therefore, $\mathrm{cov}(\eta_{i,t}, \eta_{i,t-1}) = \mathrm{cov}(\alpha_i + u_{i,t}, \alpha_i + u_{i,t-1})$.

Even if $\mathrm{cov}(\alpha_i, u_{i,t}) = 0$ and the $u_{i,t}$ are themselves serially uncorrelated, you would still be left with $\mathrm{cov}(\alpha_i, \alpha_i) = \mathrm{var}(\alpha_i) = \sigma^2_\alpha > 0$.

Hence, there will be serially correlated errors to some extent.
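The proof can be illustrated by simulation. The sketch assumes unit variances for both error components and independent draws, so the covariance between consecutive composite errors should be close to $\sigma^2_\alpha = 1$ while the idiosyncratic errors stay uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 20000, 2                                # many cities, two periods
sigma_alpha, sigma_u = 1.0, 1.0                # assumed standard deviations

alpha = rng.normal(scale=sigma_alpha, size=N)
u = rng.normal(scale=sigma_u, size=(N, T))     # serially uncorrelated by design
eta = alpha[:, None] + u                       # composite error eta_it

# Covariance between eta_it and eta_i,t-1 across cities
cov_eta = np.cov(eta[:, 1], eta[:, 0])[0, 1]
# For comparison: the idiosyncratic part alone shows no serial correlation
corr_u = np.corrcoef(u[:, 1], u[:, 0])[0, 1]

print(f"cov(eta_t, eta_t-1) = {cov_eta:.3f} (sigma_alpha^2 = {sigma_alpha**2})")
print(f"corr(u_t, u_t-1)    = {corr_u:.3f}")
```

The serial correlation comes entirely from the shared $\alpha_i$, which is the component random effects is designed to handle.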


















































































Random Effects Model

To alleviate the serially correlated errors, we need to use some form of feasible generalized least squares. In the context of panel data, that would be the Random Effects Model.

Consider the model below:

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \alpha_i + u_{i,t}, \qquad \eta_{i,t} = \alpha_i + u_{i,t}$$

As we had mentioned, if $\mathrm{cov}(\alpha_i, x_{i,t}) = 0$, then FD/OLS/FE would not be the most efficient model to use. Instead, the random effects model would serve best.

We introduce a parameter $\lambda$, which is a partial de-meaning factor.

Random Effects

$$y_{it} - \lambda \bar{y}_i = \beta_0 (1 - \lambda) + \beta (x_{i,t} - \lambda \bar{x}_i) + \eta_{i,t} - \lambda \bar{\eta}_i$$











Operationalizing Random Effects

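$$\text{AveIncome}_{i,t} - \lambda \overline{\text{AveIncome}}_i = \beta_0 (1 - \lambda) + \beta_1 (\text{Greenspace}_{i,t} - \lambda \overline{\text{Greenspace}}_i) + \eta_{i,t} - \lambda \bar{\eta}_i$$

Notice the following:

If $\lambda = 0$, then we are left with the original equation. This implies that $\hat\beta_{RE} = \hat\beta_{POLS}$.

If $\lambda = 1$, then this reduces to the fixed effects model. Ergo, $\hat\beta_{RE} = \hat\beta_{FE}$.

However, when $\mathrm{cov}(\alpha_i, x_{i,t}) = 0$, it is most often the case that $0 < \lambda < 1$.

So what exactly is $\lambda$ (formulaically)?

$$\lambda = 1 - \sqrt{\frac{\sigma^2_u}{\sigma^2_u + T \sigma^2_\alpha}}$$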











Hold up! What are $\sigma^2_u$ and $\sigma^2_\alpha$?

Answer: These are the variances of the $u_{i,t}$ and $\alpha_i$ error terms, respectively.

If $\lambda = 0$, this is only possible if $\sigma^2_\alpha = 0$, so that the ratio under the square root equals 1. Therefore, RE is equivalent to pooled OLS. In essence, the effect $\alpha_i$ is unimportant, and the serial correlation issue disappears with it.

$\lambda \to 1$ only if $T\sigma^2_\alpha \to \infty$ relative to $\sigma^2_u$. If $\sigma^2_\alpha \to \infty$, we want to get rid of the confounding $\alpha_i$ as much as possible, which is what FE does.

In essence, RE is a "quasi-time-demeaned" model since we do not fully demean (as in FE), only partially.

CAVEAT: While this is true theoretically, the big problem is that we don’t know $\sigma^2_u$ and $\sigma^2_\alpha$. We can only estimate them. Hence, estimating these yields an estimate of $\lambda$, namely $\hat\lambda$.
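The behavior of $\lambda$ at its limits can be checked directly from the formula. The helper name `re_lambda` and the example variances below are illustrative assumptions, not estimates from any dataset.

```python
import math


def re_lambda(sigma2_u: float, sigma2_alpha: float, T: int) -> float:
    """Partial demeaning factor: 1 - sqrt(s2_u / (s2_u + T * s2_alpha)).

    Assumes the two variances are non-negative and not both zero.
    """
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + T * sigma2_alpha))


no_city_effect = re_lambda(sigma2_u=1.0, sigma2_alpha=0.0, T=5)   # pooled OLS case
balanced = re_lambda(sigma2_u=1.0, sigma2_alpha=1.0, T=3)         # in between
long_panel = re_lambda(sigma2_u=1.0, sigma2_alpha=1.0, T=9999)    # close to FE

print(no_city_effect, balanced, long_panel)
```

With no city-level variance the factor is zero, and it approaches one as $T\sigma^2_\alpha$ grows relative to $\sigma^2_u$.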










So how is Random Effects Done?

Random effects generally revolves around the estimation of $\lambda$.

Step 1: We use fixed effects/pooled OLS residuals to estimate the variance components, and hence $\hat\lambda$.

Step 2: We use $\hat\lambda$ to estimate the original random effects equation. In estimating, we just use pooled OLS on this transformed system.

This two-step process is essentially the random effects model.
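The two steps can be sketched on simulated data. The way the variance components are estimated here (within residuals for $\sigma^2_u$, a between regression on city means for $\sigma^2_\alpha$) is one common choice and is an assumption of this sketch; the slides do not specify which estimator is used.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T = 500, 5                                  # hypothetical panel dimensions
beta0, beta1 = 1.0, 2.0                        # true values for the simulation
alpha = rng.normal(size=N)                     # independent of x: RE is valid
x = rng.normal(size=(N, T))
y = beta0 + beta1 * x + alpha[:, None] + rng.normal(size=(N, T))

x_bar = x.mean(axis=1, keepdims=True)
y_bar = y.mean(axis=1, keepdims=True)

# Step 1a: within (FE) residuals give an estimate of var(u)
xt, yt = x - x_bar, y - y_bar
b_fe = (xt * yt).sum() / (xt ** 2).sum()
sigma2_u = ((yt - b_fe * xt) ** 2).sum() / (N * (T - 1) - 1)

# Step 1b: regression on city means; its error variance is var(alpha) + var(u)/T
xb, yb = x_bar.ravel(), y_bar.ravel()
xbc, ybc = xb - xb.mean(), yb - yb.mean()
b_be = (xbc @ ybc) / (xbc @ xbc)
sigma2_between = ((ybc - b_be * xbc) ** 2).sum() / (N - 2)
sigma2_alpha = max(sigma2_between - sigma2_u / T, 0.0)

lam = 1.0 - np.sqrt(sigma2_u / (sigma2_u + T * sigma2_alpha))

# Step 2: pooled OLS on the quasi-demeaned data; the intercept column is (1 - lam)
y_star = (y - lam * y_bar).ravel()
x_star = (x - lam * x_bar).ravel()
X = np.column_stack([np.full(N * T, 1.0 - lam), x_star])
(b0_re, b1_re), *_ = np.linalg.lstsq(X, y_star, rcond=None)

print(f"lambda_hat = {lam:.3f}, beta0_RE = {b0_re:.3f}, beta1_RE = {b1_re:.3f}")
```

Because the city effect is drawn independently of the regressor in this simulation, the RE slope should land near the true value of 2, with $\hat\lambda$ strictly between 0 and 1.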


























Breusch-Pagan Test for Comparing Random Effects and Pooled OLS

How do we determine whether RE or pooled OLS is better? We use the Breusch-Pagan test (not to be confused with the other BP test used for heteroscedasticity).

In essence, the test procedure revolves around $H_0: \lambda = 0$ and $H_a: \lambda \neq 0$.

As we mentioned before, if $\lambda = 0$, then the RE just collapses to the pooled OLS. Hence, we test the restriction that $\lambda = 0$.

















Accounting for Time Constant Variables in Random Effects

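Say you had the model below

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \beta_2 \text{Climate}_i + \eta_{i,t}$$

To operationalize the RE, we partially time-demean this using our estimated $\lambda$.

$$\text{AveIncome}_{i,t} - \lambda \overline{\text{AveIncome}}_i = \beta_0 (1 - \lambda) + \beta_1 (\text{Greenspace}_{i,t} - \lambda \overline{\text{Greenspace}}_i) + \beta_2 (\text{Climate}_i - \lambda \overline{\text{Climate}}_i) + \eta_{i,t} - \lambda \bar{\eta}_i$$

But $\overline{\text{Climate}}_i = \text{Climate}_i$ because it is fixed through time, so the climate term becomes $\beta_2 (1 - \lambda) \text{Climate}_i$. We know that in general $0 < \lambda < 1$. Hence, this term will not disappear as it does in fixed effects. Therefore, RE gives us the ability to account for time-constant variables (which would otherwise have been dropped in FE).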
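A tiny numerical illustration of this point follows. The climate values and the value $\lambda = 0.6$ are made up for the example.

```python
import numpy as np

T = 4
# Hypothetical time-constant variable for two cities (same value every period)
climate = np.array([[30.0] * T,
                    [18.0] * T])
climate_bar = climate.mean(axis=1, keepdims=True)


def quasi_demean(z, z_bar, lam):
    """Partial demeaning z - lam * z_bar; lam = 1 gives the FE transformation."""
    return z - lam * z_bar


fe_version = quasi_demean(climate, climate_bar, lam=1.0)   # fully demeaned
re_version = quasi_demean(climate, climate_bar, lam=0.6)   # partially demeaned

print(fe_version[0], re_version[0])
```

Under full demeaning the column is all zeros and cannot be used as a regressor, while under partial demeaning it is scaled by $(1 - \lambda)$ and still varies across cities.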





























Comparing Random Effects and Fixed Effects

Consider the model below

$$y_{i,t} = \beta_0 + \beta_1 x_{i,t} + \alpha_i + u_{i,t}$$

We know that if $\mathrm{cov}(\alpha_i, x_{i,t}) = 0$, both $\hat\beta_{RE}$ and $\hat\beta_{FE}$ will be consistent, but $se(\hat\beta_{RE}) < se(\hat\beta_{FE})$. Therefore, it would be best to use random effects.

The converse is true if $\mathrm{cov}(\alpha_i, x_{i,t}) \neq 0$, since FE remains consistent but RE does not.

Essentially, we want to test the hypothesis that $\mathrm{cov}(\alpha_i, x_{i,t}) = 0$ against $\mathrm{cov}(\alpha_i, x_{i,t}) \neq 0$, and this is exactly what the Hausman test does.









Operationalizing the Hausman Test

We propose the test statistic $W'$.

Hausman Test Statistic

$$W' = \frac{(\hat\beta^*_{FE} - \hat\beta^*_{RE})^2}{\mathrm{var}(\hat\beta_{FE}) - \mathrm{var}(\hat\beta_{RE})} \sim \chi^2_1$$

The test statistic follows a chi-squared distribution with one degree of freedom.

If the null hypothesis is true ($H_0: \mathrm{cov}(\alpha_i, x_{i,t}) = 0$), then RE and FE are both consistent, but RE is more efficient.

If the null hypothesis is false ($H_a: \mathrm{cov}(\alpha_i, x_{i,t}) \neq 0$), then RE is inconsistent; therefore FE would be the better model.
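For a single coefficient, the statistic can be computed as below. The function name and the input numbers are hypothetical, chosen only to show the mechanics; 3.841 is the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
def hausman_scalar(b_fe: float, b_re: float, var_fe: float, var_re: float) -> float:
    """Hausman statistic for one coefficient: (b_FE - b_RE)^2 / (var_FE - var_RE)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # Can happen in finite samples; the statistic is not usable in that case
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    return (b_fe - b_re) ** 2 / diff_var


CHI2_1_CRIT_5PCT = 3.841   # 5% critical value, chi-squared with 1 df

# Hypothetical estimates, not taken from any real regression
w = hausman_scalar(b_fe=2.0, b_re=1.5, var_fe=0.05, var_re=0.03)
reject_h0 = w > CHI2_1_CRIT_5PCT

print(f"W' = {w:.2f}, reject H0 (prefer FE): {reject_h0}")
```

With these made-up inputs the statistic is well above the critical value, so the null of no correlation would be rejected and FE preferred.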









