Panel Data Models

Justin Raymond S. Eloriaga

2021

Justin Raymond S. Eloriaga Panel Data Models 2021 1 / 41


Chapter Summary

We give an overview of the main panel data models and how they account for unobserved heterogeneity. We start by defining what unobserved heterogeneity is, then derive the first differences model. We then derive the fixed effects model and discuss the concept of a least squares dummy variable (LSDV) estimator. Finally, we discuss the random effects model and the model comparison tests we can perform on panel data.


Let’s Take a Walk

Question

When was the last time you visited a park? Walked in a park? Took your pet out for a walk? Exercised in one?

Answer

We’re living in a pandemic, so quite a long time ago. But even before the pandemic, hardly ever, since there aren’t a lot of parks in Metro Manila.


Setting the Scene

What do you observe about the capitals and large cities of most developed countries (e.g. London, New York, Atlanta, San Francisco)?

Well, numerous sources have pointed out the large green spaces present in some of these cities. New York has Central Park, and London has many parks.

But this isn’t true for most cities around the world (case in point, Metro Manila).


Greenspaces and Income?

Research Question

Does the presence of larger green spaces increase the average income in cities around the world?

There are many cities around the world, and the average income in each can vary through time. Hence we can conceivably work with a panel dataset in this research.

A panel dataset has both a cross-sectional dimension $i$ and a time-series dimension $t$.

We can formulate the model in the form below for all $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$.

$$\text{AveIncome}_{it} = \beta_0 + \beta_1 \text{Greenspace}_{it} + v_t + \alpha_i + u_{it}$$
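To make the two dimensions concrete, here is a minimal sketch that simulates a small panel in "long" format, one row per city-year pair. Everything in it is an assumption made for illustration: the sizes `N` and `T`, the coefficient values, and the distributions of the simulated terms are not taken from any real data.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 4                          # hypothetical: 5 cities, 4 years

# Long format: one row per (i, t) pair.
city = np.repeat(np.arange(N), T)    # cross-sectional index i
year = np.tile(np.arange(T), N)      # time index t

beta0, beta1 = 10.0, 2.0             # illustrative coefficients
alpha = rng.normal(size=N)           # alpha_i: one draw per city
v = rng.normal(size=T)               # v_t: one draw per year
greenspace = rng.uniform(0, 5, size=N * T)
u = rng.normal(scale=0.1, size=N * T)  # idiosyncratic error u_it

# AveIncome_it = beta0 + beta1 * Greenspace_it + v_t + alpha_i + u_it
ave_income = beta0 + beta1 * greenspace + v[year] + alpha[city] + u
```

Indexing `alpha[city]` and `v[year]` is what makes $\alpha_i$ constant within a city and $v_t$ constant within a year.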


Why the additional terms?

$$\text{AveIncome}_{it} = \beta_0 + \beta_1 \text{Greenspace}_{it} + v_t + \alpha_i + u_{it}$$

The additional terms are essentially forms of an error term. They are included because we are dealing with panel data and need to account for differences along two dimensions.

$v_t$ is a time-dependent term. In a dataset of cities, it captures the part of the ’error’ that follows the general trend in a given period and is common to all cities, rather than anything specific to one city.

$\alpha_i$ is a space-dependent term. It captures differences across cities that may help explain the variation in AveIncome. Note that these factors are constant, or nearly so, through time (e.g. geography, climate, education, race).


Let’s suppress these slightly

In most panel data studies, the time-dependent element is either incorporated through dummy variables or largely ignored. This is because heterogeneity across time is not really evident in most of the literature.

To account for it using dummy variables, we write the model in the form below (let us assume there are $T$ years under study).

$$\text{AveIncome}_{it} = \beta_0 + \beta_1 \text{Greenspace}_{it} + \phi_1 \delta 2_t + \phi_2 \delta 3_t + \dots + \phi_{T-1} \delta T_t + \alpha_i + u_{it}$$

Here $\delta s_t$ is a dummy equal to 1 when $t = s$ and 0 otherwise, with the first year serving as the base period.

It is easy to do this when we have only a few years, but it becomes cumbersome when we have many time observations.
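The dummy-variable construction above can be sketched in a few lines. This is only an illustration: the sizes `N` and `T` and the array names are assumptions, and the first year is dropped as the base period to match the $T - 1$ dummies in the equation.

```python
import numpy as np

N, T = 3, 4                                # hypothetical: 3 cities, 4 years
year = np.tile(np.arange(1, T + 1), N)     # t = 1..T repeated for each city

# One column per period s = 2..T; period 1 is the omitted base category.
periods = np.arange(2, T + 1)
time_dummies = (year[:, None] == periods[None, :]).astype(float)
```

Each row of `time_dummies` has at most one entry equal to 1, and rows belonging to the base year are all zeros. Broadcasting builds all $T - 1$ columns at once, so the construction itself does not get harder with many periods, although the number of coefficients to estimate still grows with $T$.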


Defining $\eta_{it}$

$$\text{AveIncome}_{it} = \beta_0 + \beta_1 \text{Greenspace}_{it} + \phi_1 \delta 2_t + \phi_2 \delta 3_t + \dots + \phi_{T-1} \delta T_t + \alpha_i + u_{it}$$

From the equation above, let us define $\eta_{it}$ as the sum of $\alpha_i$ and $u_{it}$. That is

$$\eta_{it} = \alpha_i + u_{it}$$

We refer to $\eta_{it}$ as the composite error term. By composite, we mean that it is made up of two components: a true error $u_{it}$ and another, city-specific error $\alpha_i$. To understand what this means, we must first understand why OLS is more often than not inadequate when we estimate panel data.


Inadequacy of OLS

For OLS to yield a consistent and unbiased estimator, it must be that $\mathrm{cov}(\eta_{it}, x_{it}) = 0$ for all $i$ and $t$. But when we think deeper, we see that this is likely not the case.

Consider our example of income and green spaces.

$$\mathrm{cov}(\eta_{it}, x_{it}) = \mathrm{cov}(\alpha_i + u_{it}, \text{Greenspace}_{it}) = \mathrm{cov}(\alpha_i, \text{Greenspace}_{it}) + \mathrm{cov}(u_{it}, \text{Greenspace}_{it})$$

If $\mathrm{cov}(u_{it}, \text{Greenspace}_{it}) = 0$ holds as usual, the second piece causes no issue. But we also have the $\alpha_i$ term, and that is a problem.

Do we really believe that $\mathrm{cov}(\alpha_i, \text{Greenspace}_{it}) \overset{?}{=} 0$
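A small simulation can show what is at stake when $\alpha_i$ is correlated with the regressor. The sketch below is built entirely on assumptions: the data-generating process (green space equal to $\alpha_i$ plus noise, unit variances, $\beta_1 = 2$) is invented for illustration, and pooled OLS is run with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 2000, 5                       # hypothetical panel dimensions
beta0, beta1 = 1.0, 2.0              # illustrative true coefficients

alpha = rng.normal(size=N)                        # unobserved alpha_i
# Assumed design: the regressor is correlated with alpha_i by construction.
x = alpha[:, None] + rng.normal(size=(N, T))      # Greenspace_it
u = rng.normal(size=(N, T))                       # idiosyncratic error
y = beta0 + beta1 * x + alpha[:, None] + u        # AveIncome_it

# Pooled OLS: stack all (i, t) rows and ignore the panel structure.
X = np.column_stack([np.ones(N * T), x.ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
pooled_slope = coef[1]
```

Under these assumed variances, $\mathrm{cov}(\alpha_i, x_{it}) = 1$ and $\mathrm{var}(x_{it}) = 2$, so the pooled slope should settle near $\beta_1 + 1/2 = 2.5$ rather than the true value of 2, no matter how many cities are added.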


On α’s and x ’s

More often than not, $\mathrm{cov}(\alpha_i, \text{Greenspace}_{it}) \neq 0$. In our example, this is because many space-dependent factors, such as geography, climate, and weather, are related to the amount of green space in a city.

For example, Dubai and Manila are both heavily developed cities. But conceivably, it is much easier to have a green space in Manila, given its climate, location, geography, and general weather, than in an arid city like Dubai.


Unobserved Heterogeneity

We refer to $\alpha_i$ as the unobserved heterogeneity, and this is something that panel data can shed considerable light on.

It is unobserved because we typically have no data on most of these factors. They also happen to be generally constant through time.

It is heterogeneous because it varies across space (i.e. across cities in our example).

In the presence of unobserved heterogeneity that is correlated with the regressors, OLS will be both biased and inconsistent. Hence, we must find a way to eliminate it, and this is exactly what the more formal panel data models try to do.


Building the First Differences Model

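Our goal is to eliminate the $\alpha_i$ term. One way to do that is to take first differences. Recall the model we have so far.

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \phi_1 \delta 2_t + \phi_2 \delta 3_t + \dots + \phi_{T-1} \delta T_t + \alpha_i + u_{i,t}$$

The first difference of average income is simply its change from the previous period. That is

$$\Delta \text{AveIncome}_{i,t} = \text{AveIncome}_{i,t} - \text{AveIncome}_{i,t-1}$$

Applying this to every term of the model above, we get

$$\Delta \text{AveIncome}_{i,t} = (\beta_0 - \beta_0) + \beta_1 \Delta \text{Greenspace}_{i,t} + \phi_1 \Delta\delta 2_t + \phi_2 \Delta\delta 3_t + \dots + \phi_{T-1} \Delta\delta T_t + (\alpha_i - \alpha_i) + \Delta u_{i,t}$$

Notice that $\beta_0$ and $\alpha_i$ cancel out!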
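The differencing step can be sketched numerically. The simulation below is an assumption-laden illustration, not the deck's own example: it uses an invented data-generating process in which green space is correlated with $\alpha_i$, sets $\beta_1 = 2$, and leaves out the time dummies to keep the code short.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 5                       # hypothetical panel dimensions
beta0, beta1 = 1.0, 2.0              # illustrative true coefficients

alpha = rng.normal(size=N)                        # unobserved alpha_i
x = alpha[:, None] + rng.normal(size=(N, T))      # regressor correlated with alpha_i
u = rng.normal(size=(N, T))
y = beta0 + beta1 * x + alpha[:, None] + u

# First differences along the time axis: z_{i,t} - z_{i,t-1}.
dy = np.diff(y, axis=1)              # shape (N, T - 1)
dx = np.diff(x, axis=1)

# OLS on the differenced data, without an intercept because beta0 cancels.
fd_slope = (dx.ravel() @ dy.ravel()) / (dx.ravel() @ dx.ravel())
```

Because `np.diff` subtracts consecutive columns within each row, the constant $\beta_0$ and the city-specific $\alpha_i$ never enter `dy`, and under these assumptions `fd_slope` should land close to the true $\beta_1$. Note that one period per city is lost to differencing.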


Implications of the First Difference Model

Under the assumption that $\mathrm{cov}(\Delta \text{Greenspace}_{i,t}, \Delta u_{i,t}) = 0$, OLS should be consistent and unbiased.

In essence, the first differences model is just regular OLS applied to the differenced data.

For it to work properly, the regressors need some variation across time. Otherwise, the $\Delta$ terms will be close to zero, and anything that is constant through time is taken out of the model altogether.
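The last point can be seen with a toy array. The `coastal` dummy below is a hypothetical time-constant regressor invented for this sketch; it does not appear in the original example.

```python
import numpy as np

# Hypothetical time-constant regressor: 2 cities observed over 3 years.
coastal = np.array([[1.0, 1.0, 1.0],
                    [0.0, 0.0, 0.0]])

# Differencing over time leaves nothing to estimate from.
d_coastal = np.diff(coastal, axis=1)
```

Every entry of `d_coastal` is zero, so a regression on the differenced data cannot recover a coefficient for such a variable.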


Let’s Try Another Way

Say we have our model in the form below, where we assume that the time term $v_t$ is implicitly accounted for.

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \alpha_i + u_{i,t}$$

Remember that for OLS to be unbiased and consistent, it must be that $\mathrm{cov}(\eta_{it}, x_{it}) = 0$. Since $\eta_{it}$ contains $\alpha_i$, and $\alpha_i$ is the source of the problem, we need to take it out. We do this now by deriving the time-averaged equation. For example, the time average of income for city $i$ is

$$\overline{\text{AveIncome}}_i = \frac{1}{T} \sum_{t=1}^{T} \text{AveIncome}_{i,t}$$


Time Averaged Equation

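If we take the time average of every term in the equation, we get

$$\overline{\text{AveIncome}}_i = \beta_0 + \beta_1 \overline{\text{Greenspace}}_i + \alpha_i + \bar{u}_i$$

Note that $\bar{\alpha}_i = \frac{1}{T} \sum_{t=1}^{T} \alpha_i = \frac{1}{T} \cdot T \cdot \alpha_i = \alpha_i$. This is because $\alpha_i$ does not vary through time. The same argument applies to the constant $\beta_0$.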
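As a quick numerical check of the averaging step, the sketch below computes per-city time averages on a small simulated array. The dimensions and the random draws are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 3, 4                               # hypothetical: 3 cities, 4 years

income = rng.normal(size=(N, T))          # rows are cities, columns are years
alpha = rng.normal(size=N)                # alpha_i, one value per city

# Time average for each city: (1/T) * sum over t.
income_bar = income.mean(axis=1)

# Spread alpha_i across all T periods, then average it back.
alpha_panel = np.repeat(alpha[:, None], T, axis=1)
alpha_bar = alpha_panel.mean(axis=1)
```

Here `alpha_bar` reproduces `alpha` up to floating-point rounding, which is the statement $\bar{\alpha}_i = \alpha_i$ in array form.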


Fixed Effects Estimator

To derive what we call the fixed effects estimator, we take the difference between the original model and the time-averaged equation. The intercept $\beta_0$ is constant, so it cancels along with $\alpha_i$.

\[ \text{AveIncome}_{i,t} - \overline{\text{AveIncome}}_i = \beta_1 (\text{Greenspace}_{i,t} - \overline{\text{Greenspace}}_i) + (\alpha_i - \alpha_i) + (u_{i,t} - \bar{u}_i) \]

We can rewrite this more simply using tilde terms, where each tilde term is a time-demeaned value.

\[ \widetilde{\text{AveIncome}}_{i,t} = \beta_1 \widetilde{\text{Greenspace}}_{i,t} + \tilde{u}_{i,t} \]

The caveat is that this transformation removes anything that is constant over time, so we cannot evaluate the impact of time-invariant variables on the dependent variable.
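The demeaning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `within_slope` and the two-city data set are hypothetical, and the data are generated without noise as income = 2 * greenspace + alpha_i so that the recovered slope is easy to check.

```python
from collections import defaultdict


def within_slope(entity, x, y):
    """Fixed effects (within) slope for a single regressor.

    Each observation is demeaned by its entity's time average, then the
    demeaned y is regressed on the demeaned x without an intercept.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # entity -> [sum x, sum y, T_i]
    for e, xi, yi in zip(entity, x, y):
        sums[e][0] += xi
        sums[e][1] += yi
        sums[e][2] += 1
    sxy = sxx = 0.0
    for e, xi, yi in zip(entity, x, y):
        x_bar = sums[e][0] / sums[e][2]
        y_bar = sums[e][1] / sums[e][2]
        sxy += (xi - x_bar) * (yi - y_bar)
        sxx += (xi - x_bar) ** 2
    return sxy / sxx


# Hypothetical data: two cities, three periods, generated as
# income = 2 * greenspace + alpha_i with alpha_A = 10 and alpha_B = -5.
city = ["A", "A", "A", "B", "B", "B"]
greenspace = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
income = [12.0, 14.0, 16.0, 3.0, 7.0, 11.0]

beta_fe = within_slope(city, greenspace, income)
print(beta_fe)  # the slope used to generate the data, 2.0
```

Because the city-specific levels (10 and -5) drop out after demeaning, the slope is recovered even though the two cities sit at very different income levels.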


Least Squares Dummy Variables

One way to operationalize the fixed effects framework is through the use of dummy variables.

For example, consider our original model with an implied $v_t$

\[ \text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \alpha_i + u_{i,t} \]

Suppose, for example, that we have 3 cities. Then $\alpha_i$ can be represented by dummy variables, meaning that each city has a different intercept. Note that we include only two dummies so as not to fall into the dummy variable trap (DVT).

\[ \text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \rho_1 d_2 + \rho_2 d_3 + u_{i,t} \]

Writing the model this way makes explicit that there is a different intercept for each city.


LSDV and the Fixed Effects

Empirically, $\hat{\beta}_{dv} \to \beta$ (suggesting OLS is consistent and unbiased) if we satisfy the three assumptions below

$cov(x_{i,t}, u_{i,t}) = 0$ (Weak Exogeneity)

$cov(u_i, u_j) = 0, \ \forall \ i \neq j$ (No Serial Correlation)

$var(u_{i,t}) = \sigma^2$ (Homoscedastic Errors)

In essence, the way we operationalize the regular fixed effects model is just through these dummy variables. Therefore, $\beta^*_{dv} = \beta^*_{FE}$

The estimator is the same for LSDV and FE, and it may be modified in its space and time components through the LSDV variants.

However, if N or T is large, then we would end up with a very large number of dummy variables.
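The equality between the LSDV and fixed effects slopes can be checked numerically. The sketch below is an illustration rather than a reference implementation: the helpers `ols` and `within_slope` and the three-city data set are hypothetical, and the normal equations are solved with a small hand-written Gauss-Jordan routine so that only the Python standard library is needed.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def within_slope(entity, x, y):
    """Slope from regressing time-demeaned y on time-demeaned x."""
    groups = {}
    for e, xi, yi in zip(entity, x, y):
        groups.setdefault(e, []).append((xi, yi))
    sxy = sxx = 0.0
    for obs in groups.values():
        x_bar = sum(xi for xi, _ in obs) / len(obs)
        y_bar = sum(yi for _, yi in obs) / len(obs)
        sxy += sum((xi - x_bar) * (yi - y_bar) for xi, yi in obs)
        sxx += sum((xi - x_bar) ** 2 for xi, _ in obs)
    return sxy / sxx


# Hypothetical panel: 3 cities observed over 3 periods.
city = [1, 1, 1, 2, 2, 2, 3, 3, 3]
greenspace = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 2.0, 3.0, 7.0]
income = [10.5, 12.0, 15.8, 6.1, 9.7, 12.2, 20.3, 21.9, 30.1]

# LSDV design: intercept, Greenspace, d2, d3 (city 1 is the base category).
X = [[1.0, g, float(c == 2), float(c == 3)] for c, g in zip(city, greenspace)]
beta_dv = ols(X, income)[1]
beta_fe = within_slope(city, greenspace, income)
print(beta_dv, beta_fe)  # the two slopes agree up to rounding error
```

With N cities the LSDV design needs N - 1 dummy columns, which is why the demeaned form is usually cheaper to compute when N is large.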


LSDV1

The first variant of the LSDV (LSDV1) is essentially what we have so far. That is, a space varying and time invariant specification.

\[ \text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \rho_1 d_2 + \rho_2 d_3 + \dots + \rho_{N-1} d_N + u_{i,t} \]

Therefore, in a more general equation, LSDV1 can be formulated as

LSDV1

\[ Y_{i,t} = \beta_0 + \beta_{i,t} \sum_{t=1}^{T} \sum_{i=1}^{N} X_{i,t} + \rho_i \sum_{i=1}^{N} d_i + u_{i,t} \]


LSDV2

The second variant of the LSDV (LSDV2) is essentially the reverse of LSDV1. That is, we have a space invariant but time varying specification.

\[ \text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \phi_1 m_2 + \phi_2 m_3 + \dots + \phi_{T-1} m_T + u_{i,t} \]

Therefore, in a more general equation, LSDV2 can be formulated as

LSDV2

\[ Y_{i,t} = \beta_0 + \beta_{i,t} \sum_{t=1}^{T} \sum_{i=1}^{N} X_{i,t} + \phi_t \sum_{t=1}^{T} m_t + u_{i,t} \]

Note that this variant is less commonly used than other variants.


LSDV3

The third variant of the LSDV (LSDV3) is a combination of LSDV1 and LSDV2. That is, it is both time and space varying.

\[ \text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \phi_1 m_2 + \phi_2 m_3 + \dots + \phi_{T-1} m_T + \rho_1 d_2 + \rho_2 d_3 + \dots + \rho_{N-1} d_N + u_{i,t} \]

Therefore, in a more general equation, LSDV3 can be formulated as

LSDV3

\[ Y_{i,t} = \beta_0 + \beta_{i,t} \sum_{t=1}^{T} \sum_{i=1}^{N} X_{i,t} + \rho_i \sum_{i=1}^{N} d_i + \phi_t \sum_{t=1}^{T} m_t + u_{i,t} \]


Between FE (LSDVs) and the Pooled OLS

A key question to ask is which LSDV to use and whether an FE model is indeed better than a regular OLS. To compare between models, we use a Wald’s Test of Linear Restrictions. To do this, we compute an F-statistic formulated as below.

Wald’s Test of Linear Restrictions

\[ F = \frac{(RSS_R - RSS_{UR}) / (df_R - df_{UR})}{RSS_{UR} / df_{UR}} \]
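As a rough sketch, the F-statistic above can be computed directly from the two residual sums of squares. The function name `wald_f` and all the numbers are hypothetical, and the sketch assumes that $df_R$ and $df_{UR}$ are residual degrees of freedom (observations minus estimated parameters), which the slide does not state explicitly.

```python
def wald_f(rss_r, rss_ur, df_r, df_ur):
    """F-statistic comparing a restricted model (e.g. pooled OLS) with an
    unrestricted one (e.g. an LSDV). df_* are residual degrees of freedom."""
    numerator = (rss_r - rss_ur) / (df_r - df_ur)
    denominator = rss_ur / df_ur
    return numerator / denominator


# Hypothetical numbers: 3 cities with 33 observations each (99 in total).
# Pooled OLS estimates 2 parameters, leaving 97 residual df.
# LSDV1 adds 2 city dummies, leaving 95 residual df.
f_stat = wald_f(rss_r=100.0, rss_ur=60.0, df_r=97, df_ur=95)
print(round(f_stat, 2))  # 31.67
```

A large value relative to the F distribution with $(df_R - df_{UR}, df_{UR})$ degrees of freedom favors the unrestricted specification.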


Differences between LSDVs and FEs

The LSDV model and the FE model do differ operationally in some regards. While they are obviously different in equation form (i.e. time demeaning vs. adding dummy variables), these differences tell us what the pros and cons are of each model specification.


On the $R^2$ of Fixed Effects and LSDVs

(FE) \[ \widetilde{\text{AveIncome}}_{i,t} = \beta_1 \widetilde{\text{Greenspace}}_{i,t} + \tilde{u}_{i,t} \]

Suppose, for example, that we have an $R^2$ equal to 0.80. How will we interpret this?

Answer: The $R^2$ is the share of the variation in average income across time that is explained by the model. In other words, it tells us how well our model explains deviations in average income away from its time mean.

(LSDV, say 1) \[ \text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \rho_1 d_2 + \rho_2 d_3 + u_{i,t} \]

On average, we usually get a higher $R^2$ value here. But it is not a good indicator when we have a lot of variables, because $R^2$ never decreases as more variables are added.
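The two $R^2$ values can be compared on a small example. The sketch below is hedged: the helper `demean_by_group` and the data are hypothetical, and it relies on the standard result that the LSDV residuals coincide with the residuals of the demeaned regression, so only the total sum of squares in the denominator differs.

```python
def demean_by_group(group, values):
    """Subtract each group's mean from its own observations."""
    totals, counts = {}, {}
    for g, v in zip(group, values):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for g, v in zip(group, values)]


# Hypothetical panel: 3 cities observed over 3 periods.
city = [1, 1, 1, 2, 2, 2, 3, 3, 3]
greenspace = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 2.0, 3.0, 7.0]
income = [10.5, 12.0, 15.8, 6.1, 9.7, 12.2, 20.3, 21.9, 30.1]

x_t = demean_by_group(city, greenspace)
y_t = demean_by_group(city, income)

# Within slope and the residual sum of squares shared by FE and LSDV.
beta = sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)
rss = sum((b - beta * a) ** 2 for a, b in zip(x_t, y_t))

# FE R^2: variation around each city's own time mean.
tss_within = sum(b * b for b in y_t)
r2_within = 1 - rss / tss_within

# LSDV R^2: variation around the grand mean, which also includes the
# between-city differences absorbed by the dummies.
y_bar = sum(income) / len(income)
tss_total = sum((v - y_bar) ** 2 for v in income)
r2_lsdv = 1 - rss / tss_total

print(r2_within, r2_lsdv)
```

Since the total variation is at least as large as the within variation, the LSDV $R^2$ is never smaller than the FE $R^2$ for the same data.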


Why is FE or FD superior to (Pooled) OLS?

Consider the simple econometric model below

\[ \text{AveIncome}_{i,t} = \beta \, \text{YrsOfExp}_{i,t} + \alpha_i + u_{i,t} \]

We clearly expect that $\beta > 0$.

However, if we just ran a simple Pooled OLS (i.e. lump the data together and estimate it as if it were one big cross-section), we might come up with a nonsensical result.


Misleading Inference of Pooled OLS

Notice that the derived regression line is negatively sloped (i.e. $\beta < 0$), which is nonsensical (for the most part) in our theory.
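A sign reversal of this kind is easy to reproduce with made-up numbers. In the sketch below, all data and helper names are hypothetical: within every city, income rises by one unit per year of experience, but cities with more experienced workers happen to have lower income levels.

```python
def slope(x, y):
    """OLS slope of y on x with an intercept."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(group, values):
    """Subtract each group's mean from its own observations."""
    totals, counts = {}, {}
    for g, v in zip(group, values):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for g, v in zip(group, values)]


# Hypothetical panel: 3 cities, 2 periods each.
city = ["A", "A", "B", "B", "C", "C"]
yrs_of_exp = [1.0, 2.0, 5.0, 6.0, 9.0, 10.0]
ave_income = [10.0, 11.0, 5.0, 6.0, 0.0, 1.0]

# Pooled OLS ignores the city structure entirely.
beta_pooled = slope(yrs_of_exp, ave_income)

# The within estimator compares each city only with itself.
beta_within = slope(
    demean_by_group(city, yrs_of_exp), demean_by_group(city, ave_income)
)

print(beta_pooled)  # negative
print(beta_within)  # positive
```

The pooled line is driven by the differences in levels between cities, while the within slope reflects how income moves with experience inside each city.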


Digging Deeper

One of the reasons why a regular OLS might have failed is that it does not account for the space and time dimensions both present in a panel dataset. See, for example, the same scatterplot but now labeled. What can you observe?


Superiority of the FE/FD

FE and FD treat observations from different cities as different.

Because both models remove the unobserved heterogeneity $\alpha_i$, we can disregard differences in the average level of income across cities (e.g. the Manila standard of living vs. the New York standard of living). When you compare many data points at varying levels, looking at them as a whole may lead to misleading conclusions.

Rather, the differences in the average level of income are attributed to city-specific differences which do not really change through time.


What FD tries to do (graphically)

FD tries to fit a line between these pairs of observations. Notice that it outputs $\beta > 0$, which is, to us, more reasonable.


What FE tries to do (graphically)

FE is similar in that it finds a midpoint of all the observations for a particular cross section $i$ and tries to connect these points as neatly as possible. Note: when we have just two years (2015 and 2020), then $\hat{\beta}_{FE} = \hat{\beta}_{FD}$.
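The two-period equivalence can be verified on a toy panel. This is a sketch under assumptions: the data are hypothetical, and the first-difference regression is run without an intercept, which matches a fixed effects model that has no time dummies.

```python
# Hypothetical panel: 3 cities observed in 2015 and 2020.
x_2015 = [1.0, 4.0, 2.0]
x_2020 = [3.0, 5.0, 6.0]
y_2015 = [10.0, 3.0, 20.0]
y_2020 = [15.0, 4.5, 27.0]

# First differences: regress the change in y on the change in x (no intercept).
dx = [b - a for a, b in zip(x_2015, x_2020)]
dy = [b - a for a, b in zip(y_2015, y_2020)]
beta_fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Fixed effects: demean each city's two observations, then pool them.
sxy = sxx = 0.0
for x0, x1, y0, y1 in zip(x_2015, x_2020, y_2015, y_2020):
    x_bar, y_bar = (x0 + x1) / 2, (y0 + y1) / 2
    for xt, yt in ((x0, y0), (x1, y1)):
        sxy += (xt - x_bar) * (yt - y_bar)
        sxx += (xt - x_bar) ** 2
beta_fe = sxy / sxx

print(beta_fd, beta_fe)  # identical up to rounding error
```

With T = 2, each demeaned value is just plus or minus half of the first difference, so the factors of one half cancel in the ratio. With more than two periods the two estimators generally differ.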


What if $cov(\alpha_i, X_{i,t}) = 0$?

Our discussion so far has hinged upon the disturbance caused by the presence of unobserved heterogeneity. We know that, most likely, $cov(\alpha_i, X_{i,t}) \neq 0$.

In situations like those, we typically use Fixed Effects or First Differences because of the "endogeneity" issue between the unobserved heterogeneity $\alpha_i$ and the independent variables.

However, we do not necessarily need to use Fixed Effects or First Differences when $cov(\alpha_i, X_{i,t}) = 0$.


Why wouldn’t we have to deal with αi?

Sometimes, the panel model we have specified could have the following characteristics.

The specification controls for all factors that determine the dependent variable.

It could be that $\alpha_i$ is just very small (or very insignificant).

So can we just use OLS (i.e. $\hat{\beta}_{OLS}$)?

Absolutely. You can even use FE/FD because all of the estimators will be consistent.
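A small simulation can illustrate this point. Everything in the sketch is an assumption made for illustration: the seed, the sample sizes, the true slope of 2, and the standard normal draws. The key feature is that $\alpha_i$ is drawn independently of the regressor, so $cov(\alpha_i, X_{i,t}) = 0$ holds by construction.

```python
import random

random.seed(0)
N, T, true_beta = 2000, 5, 2.0

ids, xs, ys = [], [], []
for i in range(N):
    alpha = random.gauss(0, 1)  # unit effect, drawn independently of x
    for _ in range(T):
        x = random.gauss(0, 1)
        u = random.gauss(0, 1)
        ids.append(i)
        xs.append(x)
        ys.append(true_beta * x + alpha + u)


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(group, values):
    """Subtract each group's mean from its own observations."""
    totals, counts = {}, {}
    for g, v in zip(group, values):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for g, v in zip(group, values)]


beta_pooled = slope(xs, ys)
beta_fe = slope(demean_by_group(ids, xs), demean_by_group(ids, ys))
print(beta_pooled, beta_fe)  # both should land close to the true slope of 2
```

If `alpha` were instead built to depend on `x`, the pooled estimate would drift away from the true slope while the fixed effects estimate would not.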

Caveats of using OLS/FE/FD when $\operatorname{cov}(\alpha_i, X_{i,t}) = 0$

While you may use these methods, it turns out that FE/FD are too extreme in this case.

In the case of FD, you throw away one period.

In the case of FE, it is too extreme in that you estimate things unnecessarily (a separate effect for every unit).

In the case of (pooled) OLS, even if $\operatorname{cov}(\alpha_i, X_{i,t}) = 0$, the errors may still be serially correlated with one another.

Proof: We know that $\eta_{i,t} = \alpha_i + u_{i,t}$.

Therefore, $\operatorname{cov}(\eta_{i,t}, \eta_{i,t-1}) = \operatorname{cov}(\alpha_i + u_{i,t}, \alpha_i + u_{i,t-1})$.

Even if $\operatorname{cov}(\alpha_i, u_{i,t}) = 0$ and the $u_{i,t}$ are uncorrelated over time, you would still be left with $\operatorname{cov}(\alpha_i, \alpha_i) = \operatorname{var}(\alpha_i) = \sigma^2_\alpha > 0$.

Hence, the errors will be serially correlated to some extent.
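The proof can be checked with a small simulation. The sketch below is only an illustration under assumed values: the function name, the normal distributions, and the variances are made up for the example, and the $u_{i,t}$ are drawn independently over time. It draws the composite error $\eta_{i,t} = \alpha_i + u_{i,t}$ for two adjacent periods and computes the sample covariance, which should sit near $\sigma^2_\alpha$.

```python
import random

def simulate_error_covariance(n_units=20000, sigma_alpha=1.0, sigma_u=1.0, seed=0):
    """Sample covariance between eta_{i,t} and eta_{i,t-1} across units.

    All distributions and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    eta_now, eta_lag = [], []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, sigma_alpha)               # unit effect, constant over time
        eta_lag.append(alpha + rng.gauss(0.0, sigma_u))   # composite error in period t-1
        eta_now.append(alpha + rng.gauss(0.0, sigma_u))   # composite error in period t
    mean_now = sum(eta_now) / n_units
    mean_lag = sum(eta_lag) / n_units
    cross = sum((a - mean_now) * (b - mean_lag) for a, b in zip(eta_now, eta_lag))
    return cross / (n_units - 1)

# With var(alpha_i) = 1 the covariance should be close to 1;
# with var(alpha_i) = 0 it should be close to 0.
cov_with_effect = simulate_error_covariance(sigma_alpha=1.0)
cov_without_effect = simulate_error_covariance(sigma_alpha=0.0)
```

The only source of correlation across the two periods is the shared $\alpha_i$, which is why the covariance vanishes once its variance is set to zero.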

Random Effects Model

To alleviate the serially correlated errors, we need to use some form of feasible generalized least squares. In the context of panel data, that would be the Random Effects Model.

Consider the model below:

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \alpha_i + u_{i,t}, \qquad \eta_{i,t} = \alpha_i + u_{i,t}$$

As we had mentioned, if $\operatorname{cov}(\alpha_i, x_{i,t}) = 0$, then FD/OLS/FE would not be the most efficient model to use. Instead, the random effects model would serve best.

We introduce some parameter $\lambda$, which is a partial de-meaning factor.

Random Effects

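$$y_{i,t} - \lambda \bar{y}_i = \beta_0 (1 - \lambda) + \beta (x_{i,t} - \lambda \bar{x}_i) + (\eta_{i,t} - \lambda \bar{\eta}_i)$$

Here $\bar{y}_i$, $\bar{x}_i$ and $\bar{\eta}_i$ denote the averages over time for unit $i$.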

Operationalizing Random Effects

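$$\text{AveIncome}_{i,t} - \lambda \overline{\text{AveIncome}}_i = \beta_0 (1 - \lambda) + \beta_1 (\text{Greenspace}_{i,t} - \lambda \overline{\text{Greenspace}}_i) + (\eta_{i,t} - \lambda \bar{\eta}_i)$$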

Notice the following:

If $\lambda = 0$, we are left with the original equation. This implies that $\hat{\beta}_{RE} = \hat{\beta}_{POLS}$.

If $\lambda = 1$, then this reduces to the fixed effects model. Ergo, $\hat{\beta}_{RE} = \hat{\beta}_{FE}$.

However, when $\operatorname{cov}(\alpha_i, x_{i,t}) = 0$, it is most often the case that $0 < \lambda < 1$.

So what exactly is $\lambda$ (formulaically)?

$$\lambda = 1 - \sqrt{\frac{\sigma^2_u}{\sigma^2_u + T\sigma^2_\alpha}}$$
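To get a feel for the formula, the sketch below evaluates $\lambda$ at a few made-up variance values. The function name `partial_demeaning_factor` and the inputs are assumptions for illustration only, not estimates from any data.

```python
import math

def partial_demeaning_factor(sigma2_u, sigma2_alpha, n_periods):
    """lambda = 1 - sqrt(sigma2_u / (sigma2_u + T * sigma2_alpha))."""
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + n_periods * sigma2_alpha))

# Illustrative (made-up) variance values.
lam_no_effect = partial_demeaning_factor(1.0, 0.0, 5)       # sigma2_alpha = 0 gives 0.0
lam_interior = partial_demeaning_factor(1.0, 1.0, 3)        # 1 - sqrt(1/4) gives 0.5
lam_large_effect = partial_demeaning_factor(1.0, 1e12, 5)   # T * sigma2_alpha huge, close to 1
```

The three cases line up with pooled OLS ($\lambda = 0$), partial demeaning ($0 < \lambda < 1$), and the fixed effects limit ($\lambda$ approaching 1).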

Hold up! What are $\sigma^2_u$ and $\sigma^2_\alpha$?

Answer: These are the variances of the $u_{i,t}$ and $\alpha_i$ error terms, respectively.

If $\lambda = 0$, this is only possible if $\sigma^2_\alpha = 0$, since the ratio under the square root must then equal 1. Therefore, RE is equivalent to Pooled OLS. In essence, the effect $\alpha_i$ is unimportant and no longer induces serial correlation in the errors.

$\lambda \to 1$ as $T\sigma^2_\alpha \to \infty$ (relative to $\sigma^2_u$). If $\sigma^2_\alpha \to \infty$, we want to get rid of the confounding $\alpha_i$ as much as possible, which is done through FE.

In essence, RE is a "quasi-time-demeaned" model since we do not fully demean (like in FE), only partially.

CAVEAT: While theoretically this is true, the big problem is that we do not know $\sigma^2_u$ and $\sigma^2_\alpha$. We can only estimate them. Hence, estimating these will yield some estimate of $\lambda$, namely $\hat{\lambda}$.

So how is Random Effects Done?

Random effects generally revolves around the estimation of $\lambda$.

Step 1: We use fixed effects/pooled OLS to estimate $\lambda$. Their residuals give estimates of $\sigma^2_u$ and $\sigma^2_\alpha$, which are plugged into the formula for $\lambda$ to obtain $\hat{\lambda}$.

Step 2: We use $\hat{\lambda}$ to estimate the random effects equation. In estimating, we just use pooled OLS on this transformed system.

This two-step process is essentially the random effects model.
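The two steps can be sketched for a balanced panel with a single regressor. This is a minimal illustration, not the only way to do it: the variance-component estimator used here (within residuals for $\sigma^2_u$, pooled residual variance minus $\hat{\sigma}^2_u$ for $\sigma^2_\alpha$) is one simple choice among several, and the function names and the simulated data are assumptions made for the example.

```python
import math
import random

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, resid

def random_effects(x, y):
    """x, y: lists of per-unit lists (balanced panel).

    Returns (beta0_hat, beta1_hat, lambda_hat). Assumes sigma2_u > 0.
    """
    n_units, n_periods = len(x), len(x[0])
    x_bar = [sum(row) / n_periods for row in x]
    y_bar = [sum(row) / n_periods for row in y]

    # Step 1a: the within (FE) regression residuals estimate sigma2_u.
    x_within = [v - x_bar[i] for i, row in enumerate(x) for v in row]
    y_within = [v - y_bar[i] for i, row in enumerate(y) for v in row]
    _, _, fe_resid = ols_simple(x_within, y_within)
    sigma2_u = sum(e * e for e in fe_resid) / (n_units * (n_periods - 1) - 1)

    # Step 1b: the pooled OLS residual variance estimates sigma2_alpha + sigma2_u.
    x_flat = [v for row in x for v in row]
    y_flat = [v for row in y for v in row]
    _, _, pooled_resid = ols_simple(x_flat, y_flat)
    sigma2_eta = sum(e * e for e in pooled_resid) / (n_units * n_periods - 2)
    sigma2_alpha = max(sigma2_eta - sigma2_u, 0.0)

    lambda_hat = 1.0 - math.sqrt(sigma2_u / (sigma2_u + n_periods * sigma2_alpha))

    # Step 2: pooled OLS on the quasi-demeaned data.
    x_star = [v - lambda_hat * x_bar[i] for i, row in enumerate(x) for v in row]
    y_star = [v - lambda_hat * y_bar[i] for i, row in enumerate(y) for v in row]
    intercept_star, beta1_hat, _ = ols_simple(x_star, y_star)
    beta0_hat = intercept_star / (1.0 - lambda_hat)  # intercept_star estimates beta0 * (1 - lambda)
    return beta0_hat, beta1_hat, lambda_hat

def simulate_panel(n_units=500, n_periods=5, beta0=1.0, beta1=2.0, seed=1):
    """Made-up data with alpha_i independent of x_{i,t} (the RE assumption)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        ys = [beta0 + beta1 * v + alpha + rng.gauss(0.0, 1.0) for v in xs]
        x.append(xs)
        y.append(ys)
    return x, y

x_panel, y_panel = simulate_panel()
beta0_hat, beta1_hat, lambda_hat = random_effects(x_panel, y_panel)
```

With these simulated values the true $\lambda$ is $1 - \sqrt{1/6} \approx 0.59$, so the estimate should land strictly between the pooled OLS and fixed effects extremes.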

Breusch-Pagan Test for Comparing Random Effects and Pooled OLS

How do we determine whether RE or Pooled OLS is better? We use the Breusch-Pagan Test (not to be confused with the other BP test used for heteroscedasticity).

In essence, the test procedure revolves around $H_0: \lambda = 0$ and $H_a: \lambda \neq 0$.

As we mentioned before, if $\lambda = 0$, then RE just collapses to Pooled OLS. Hence, we test the restriction that $\lambda = 0$.
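A rough sketch of the test for a balanced panel follows. It uses the standard Lagrange multiplier form of the Breusch-Pagan statistic, which tests whether the variance of $\alpha_i$ is zero (by the formula for $\lambda$, the same restriction as $\lambda = 0$) and is compared against a chi-squared distribution with one degree of freedom. The function names are hypothetical, and for brevity the example feeds in simulated composite errors as a stand-in for pooled OLS residuals.

```python
import math
import random

def breusch_pagan_lm(resid):
    """LM statistic for H0: var(alpha_i) = 0 in a balanced panel.

    resid: list of per-unit lists of pooled OLS residuals.
    Returns (lm, p_value); lm is chi-squared(1) under H0.
    """
    n_units, n_periods = len(resid), len(resid[0])
    sum_of_squared_unit_sums = sum(sum(row) ** 2 for row in resid)
    sum_of_squares = sum(e * e for row in resid for e in row)
    ratio = sum_of_squared_unit_sums / sum_of_squares
    lm = n_units * n_periods / (2.0 * (n_periods - 1)) * (ratio - 1.0) ** 2
    p_value = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-squared(1)
    return lm, p_value

def composite_errors(sigma_alpha, n_units=200, n_periods=5, seed=2):
    """Made-up errors eta = alpha + u, standing in for pooled OLS residuals."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, sigma_alpha)
        rows.append([alpha + rng.gauss(0.0, 1.0) for _ in range(n_periods)])
    return rows

lm_effect, p_effect = breusch_pagan_lm(composite_errors(sigma_alpha=1.0))
lm_null, p_null = breusch_pagan_lm(composite_errors(sigma_alpha=0.0))
```

A large statistic (small p-value) rejects $H_0$ and points toward RE over Pooled OLS. When the unit effect has no variance, the statistic should typically be small.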

Accounting for Time Constant Variables in Random Effects

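Say you had the model below:

$$\text{AveIncome}_{i,t} = \beta_0 + \beta_1 \text{Greenspace}_{i,t} + \beta_2 \text{Climate}_i + \eta_{i,t}$$

To operationalize the RE, we partially time-demean this using our estimated $\lambda$.

$$\text{AveIncome}_{i,t} - \lambda \overline{\text{AveIncome}}_i = \beta_0 (1 - \lambda) + \beta_1 (\text{Greenspace}_{i,t} - \lambda \overline{\text{Greenspace}}_i) + \beta_2 (\text{Climate}_i - \lambda \overline{\text{Climate}}_i) + (\eta_{i,t} - \lambda \bar{\eta}_i)$$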

But $\overline{\text{Climate}}_i = \text{Climate}_i$ because it is fixed through time, so the last regressor becomes $(1 - \lambda)\text{Climate}_i$. We know that in general $0 < \lambda < 1$. Hence, this term will not disappear like in fixed effects. Therefore, RE gives us the ability to account for time-constant variables (which would otherwise have been dropped in FE).
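A tiny numerical illustration of this point: the helper `quasi_demean` and the climate values below are made up for the example. Full demeaning ($\lambda = 1$) wipes out a time-constant variable, while partial demeaning leaves a scaled copy of it.

```python
def quasi_demean(values, lam):
    """Subtract lam times the time average from each period's value."""
    mean = sum(values) / len(values)
    return [v - lam * mean for v in values]

# A time-constant variable for one unit over four periods (made-up value).
climate = [3.0, 3.0, 3.0, 3.0]

fe_transformed = quasi_demean(climate, 1.0)   # fixed effects: every entry is 0
re_transformed = quasi_demean(climate, 0.6)   # random effects: about (1 - 0.6) * 3 = 1.2
```

Because the transformed variable is nonzero under RE and differs across units with different climates, $\beta_2$ can still be estimated.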

Comparing Random Effects and Fixed Effects

Consider the model below:

$$y_{i,t} = \beta_0 + \beta_1 x_{i,t} + \alpha_i + u_{i,t}$$

We know that if $\operatorname{cov}(\alpha_i, x_{i,t}) = 0$, both $\hat{\beta}_{RE}$ and $\hat{\beta}_{FE}$ will be consistent, but $se(\hat{\beta}_{RE}) < se(\hat{\beta}_{FE})$. Therefore, it would be best to use random effects.

The opposite recommendation holds if $\operatorname{cov}(\alpha_i, x_{i,t}) \neq 0$, since FE remains consistent but RE does not.

Essentially, we want to test the hypothesis that $\operatorname{cov}(\alpha_i, x_{i,t}) = 0$ against $\operatorname{cov}(\alpha_i, x_{i,t}) \neq 0$, and this is exactly what the Hausman Test does.
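The trade-off can be seen in a simulation. The sketch below is a simplified illustration: it uses a fixed, made-up $\lambda = 0.5$ as a stand-in for an estimated RE factor, $\lambda = 1$ for FE, and generates data where $\alpha_i$ either is or is not correlated with $x_{i,t}$. All names and parameter values are assumptions for the example.

```python
import random

def quasi_demeaned_slope(x, y, lam):
    """Slope from OLS on data transformed as v - lam * (unit time average)."""
    n_periods = len(x[0])
    xs, ys = [], []
    for x_row, y_row in zip(x, y):
        x_bar = sum(x_row) / n_periods
        y_bar = sum(y_row) / n_periods
        xs.extend(v - lam * x_bar for v in x_row)
        ys.extend(v - lam * y_bar for v in y_row)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def simulate(corr, n_units=2000, n_periods=4, beta1=2.0, seed=3):
    """x_{i,t} = corr * alpha_i + noise, so corr = 0 means cov(alpha, x) = 0."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)
        x_row = [corr * alpha + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y_row = [beta1 * v + alpha + rng.gauss(0.0, 1.0) for v in x_row]
        x.append(x_row)
        y.append(y_row)
    return x, y

x0, y0 = simulate(corr=0.0)   # cov(alpha, x) = 0
x1, y1 = simulate(corr=0.8)   # cov(alpha, x) != 0

fe_uncorrelated = quasi_demeaned_slope(x0, y0, 1.0)
re_uncorrelated = quasi_demeaned_slope(x0, y0, 0.5)
fe_correlated = quasi_demeaned_slope(x1, y1, 1.0)
re_correlated = quasi_demeaned_slope(x1, y1, 0.5)
```

When the covariance is zero, both slopes should be close to the true value of 2. When it is not, only the fully demeaned (FE) slope stays near 2, while the partially demeaned slope drifts upward because part of $\alpha_i$ remains in the transformed error.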

Operationalizing the Hausman Test

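We propose the test statistic $W'$.

Hausman Test Statistic

$$W' = \frac{(\hat{\beta}_{FE} - \hat{\beta}_{RE})^2}{\operatorname{var}(\hat{\beta}_{FE}) - \operatorname{var}(\hat{\beta}_{RE})} \sim \chi^2_1$$

Under the null hypothesis, the test statistic follows a chi-squared distribution with one degree of freedom.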

If the null hypothesis is true ($H_0: \operatorname{cov}(\alpha_i, x_{i,t}) = 0$), then RE and FE are both consistent, but RE is more efficient.

If the null hypothesis is false ($H_a: \operatorname{cov}(\alpha_i, x_{i,t}) \neq 0$), then RE is inconsistent, and therefore FE would be the better model.
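For a single coefficient the statistic is easy to compute by hand. The sketch below assumes the scalar form, in which the squared difference between the two estimates is divided by the difference in their variances. The function name and the example estimates are hypothetical numbers chosen only for illustration.

```python
import math

def hausman_test(beta_fe, beta_re, var_fe, var_re):
    """Scalar Hausman statistic and its chi-squared(1) p-value."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # In finite samples the variance difference can turn out non-positive,
        # in which case this simple form of the statistic is not usable.
        raise ValueError("var_fe must exceed var_re for this form of the test")
    w = (beta_fe - beta_re) ** 2 / diff_var
    p_value = math.erfc(math.sqrt(w / 2.0))  # upper tail of chi-squared(1)
    return w, p_value

# Hypothetical estimates, not taken from any real regression.
w_stat, p_val = hausman_test(beta_fe=0.50, beta_re=0.42, var_fe=0.0025, var_re=0.0016)
```

Here the statistic is about 7.1, which exceeds the 5% critical value of 3.84 for one degree of freedom, so with these made-up numbers we would reject $H_0$ and prefer FE.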


If the null hypothesis is false (Ha : cov(αi , xi ,t) 6= 0), then RE isinternally inconsistent therefore FE would be a better model.

Justin Raymond S. Eloriaga Panel Data Models 2021 41 / 41