DYNAMIC PANEL DATA ANALYSIS USING STATA 11.0

By: Mahyudin Ahmad, UiTM Perlis

Dynamic Panel Data Analysis – iLQAM, UiTM Shah Alam, 12-13 Dec 2013.

Outline

1. Revisiting the endogeneity issue

2. Static model: IV estimation (recap)

3. Dynamic panel data model

   3.1 Difference GMM

   3.2 System GMM

4. Diagnostic tests: Sargan/Hansen and autocorrelation tests

5. Hands-on with Stata

Main references:

1. Cameron & Trivedi (2005), Verbeek (2008), Dr Nor Azam's notes.

2. https://www.iser.essex.ac.uk/files/teaching/spudney/ec968/downloads/Lecture%20Notes/bergen%20notes.pdf

3. How to do xtabond2: http://ideas.repec.org/p/cgd/wpaper/103.html


1. Endogeneity

Recall: if $\mathrm{Cor}(x_{it}, v_i) \neq 0$, we have a problem of endogeneity.

The instrumental variable (IV) technique is used to overcome the problem of endogeneity.

2SLS estimation: we need to find external exogenous instruments that satisfy two requirements:

- correlated with the endogenous variable,

- uncorrelated with the error term of the original model.


2. IV estimation

The command for IV estimation:

ivregress: linear regression of depvar on varlist1 and varlist2, using varlist_iv (along with varlist1) as instruments for varlist2.

Diagnostic tests, executed after the ivregress command:

- estat endogenous (tests whether the instrumented regressors are in fact endogenous)

- estat overid (tests of overidentifying restrictions)

- estat firststage (first-stage regression statistics)
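As a minimal sketch of this workflow, the commands below use the hsng2 example dataset available through webuse. The dataset and the variable roles are assumptions made for illustration and are not part of the workshop materials: rent is the dependent variable, pcturban is treated as exogenous, hsngval as endogenous, and faminc plus region indicators as instruments.

```stata
* Load an example dataset (assumption: internet access for webuse)
webuse hsng2, clear

* 2SLS: rent on pcturban (exogenous) and hsngval (endogenous),
* instrumenting hsngval with faminc and region indicators
ivregress 2sls rent pcturban (hsngval = faminc i.region)

* Post-estimation diagnostics, run after ivregress
* H0: the instrumented regressor is exogenous
estat endogenous
* H0: the overidentifying restrictions are valid
estat overid
* First-stage fit and instrument-strength statistics
estat firststage
```

With more instruments than endogenous regressors, as here, estat overid has something to test; in an exactly identified model it does not.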


2. IV estimation

The commands for IV estimation with panel data:

xtivreg: for panel-data models in which some of the right-hand-side covariates are endogenous. These estimators are two-stage least-squares generalizations of simple panel-data estimators for exogenous variables.

Options:

- be: 2SLS between estimator.

- fe: 2SLS within estimator.

- re: 2SLS random-effects estimator.

- fd: 2SLS first-differenced estimator.

xthtaylor: for panel-data random-effects models in which some of the covariates are correlated with the unobserved individual-level random effect. We are not going to cover this in our workshop. You may, however, read the Stata help and try the example given there.
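A short sketch of xtivreg, assuming the nlswork example dataset from webuse (not part of the workshop files). The choice of tenure as the endogenous regressor and of union and south as instruments is purely illustrative.

```stata
* Example panel of young women's wages (assumption: internet access for webuse)
webuse nlswork, clear
xtset idcode year

* 2SLS within (fixed-effects) estimator:
* tenure is treated as endogenous, instrumented by union and south
xtivreg ln_wage age c.age#c.age not_smsa (tenure = union south), fe

* Same specification with the 2SLS random-effects estimator
xtivreg ln_wage age c.age#c.age not_smsa (tenure = union south), re
```

Swapping the final option between be, fe, re and fd selects the estimator, so the same specification can be compared across them.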


3. Dynamic panel data model

Why is it called dynamic?

Because a lagged dependent variable is included among the regressors of the equation to be estimated.

Example: economic growth analysis

- Real income/GDP or real GDP growth as depvar

- Lagged income/GDP is included for convergence analysis

Caselli et al. (1996) and Bond et al. (2001) show that Generalized Method of Moments (GMM) dynamic panel estimation can correct for unobserved country heterogeneity, omitted variable bias, measurement error, and the endogeneity problems that frequently arise in growth estimation.


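3. Panel data dynamic model

Consider a simple case:

$$y_{it} = \alpha\, y_{i,t-1} + \varepsilon_{it}, \quad t = 2, \ldots, T \qquad (1)$$

and

$$\varepsilon_{it} = v_i + u_{it}, \quad t = 2, \ldots, T$$

and we assume the following about the error term for $t = 2, \ldots, T$:

$$E[v_i] = 0, \quad E[u_{it}] = 0, \quad E[v_i u_{it}] = 0, \quad E[v_i^2] = \sigma_v^2,$$

$$E[u_{it} u_{is}] = 0 \quad \text{for } t \neq s,$$

and finally the initial condition for the dynamic model:

$$E[u_{it}\, y_{i1}] = 0 \quad \text{for } t = 2, \ldots, T.$$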



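The implication of having a lagged dependent variable, as in Equation (1):

The pooled OLS, fixed-effects and random-effects estimators all become inconsistent!

How?

OLS

- Linear estimation of $y_{it}$ on $y_{i,t-1}$.

- The error term $(v_i + u_{it})$ is then correlated with the regressor $y_{i,t-1}$ via $v_i$.

- How? Lag Equation (1) by one period to get

$$y_{i,t-1} = \alpha\, y_{i,t-2} + \varepsilon_{i,t-1}$$

  and it is obvious that $y_{i,t-1}$ is correlated with the $v_i$ in $\varepsilon_{i,t-1}$. The same $v_i$ appears in $\varepsilon_{it}$, so the regressor and the error term are correlated.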
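The size and sign of this correlation can be worked out under one extra assumption that is not stated in the slides: $|\alpha| < 1$ and the process has been running long enough to be stationary. Repeated substitution of Equation (1) then gives the following sketch.

```latex
\begin{align*}
y_{i,t-1} &= \frac{v_i}{1-\alpha} + \sum_{j=0}^{\infty} \alpha^{j} u_{i,t-1-j} \\
\operatorname{Cov}(y_{i,t-1}, \varepsilon_{it})
  &= \operatorname{Cov}(y_{i,t-1}, v_i) + \operatorname{Cov}(y_{i,t-1}, u_{it})
   = \frac{\sigma_v^2}{1-\alpha} + 0 > 0
\end{align*}
```

The second covariance is zero because $u_{it}$ is serially uncorrelated and $y_{i,t-1}$ depends only on shocks dated $t-1$ and earlier. Since the covariance is positive, pooled OLS tends to overstate $\alpha$ whenever $\sigma_v^2 > 0$.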



Fixed effects (within estimator)

- The within estimator regresses $(y_{it} - \bar{y}_i)$ on $(y_{i,t-1} - \bar{y}_{i,-1})$; the error term is $(u_{it} - \bar{u}_i)$.

- It is an inconsistent estimator, since $y_{i,t-1}$ (and hence $\bar{y}_{i,-1}$) is correlated with $\bar{u}_i$ by construction.

- Recall that $\bar{y}_{i,-1} = \sum_{t=2}^{T} y_{i,t-1}/(T-1)$ and $\bar{u}_i = \sum_{t=2}^{T} u_{it}/(T-1)$.

- Consistency requires $\bar{u}_i$ to become very small relative to $u_{it}$, which is possible only when $T \to \infty$. This occurs in long panels but not in short panels (Nickell, 1981).

Random effects estimator

- It is also inconsistent, because it is a linear combination of the within and between estimators.
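To give a feel for the magnitude of the within-estimator bias, here is a commonly quoted large-$T$ approximation attributed to Nickell (1981). It is offered as a rough guide only: it assumes $N \to \infty$ with $T$ fixed, and the exact expression in the original paper is more involved.

```latex
\[
\operatorname*{plim}_{N \to \infty} \left( \hat{\alpha}_{FE} - \alpha \right)
  \approx -\frac{1+\alpha}{T-1}
\]
% Worked arithmetic with alpha = 0.5:
%   T = 10:  -(1.5)/9  = -0.167
%   T = 50:  -(1.5)/49 = -0.031
```

The bias is negative for $\alpha > 0$ and shrinks only at rate $1/T$, which is why it matters in the short panels typical of growth studies.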


3.1: Difference GMM

Difference GMM transforms Equation (1) into a differenced equation:

$$(y_{it} - y_{i,t-1}) = \alpha\,(y_{i,t-1} - y_{i,t-2}) + (u_{it} - u_{i,t-1}), \quad t = 3, \ldots, T \qquad (2)$$

- The time-invariant fixed effect $v_i$ has now disappeared.

- The OLS estimator of Equation (2) is still inconsistent, as $y_{i,t-1}$ in the regressor $(y_{i,t-1} - y_{i,t-2})$ is correlated with $u_{i,t-1}$ in the error term $(u_{it} - u_{i,t-1})$.

- Anderson and Hsiao (1981): estimate Equation (2) by IV, using an earlier lag of $y$ in levels, i.e. $y_{i,t-2}$, as the instrument for $(y_{i,t-1} - y_{i,t-2})$.

- It is a valid instrument, since $y_{i,t-2}$ is not correlated with $(u_{it} - u_{i,t-1})$, assuming the errors $u_{it}$ are serially uncorrelated, e.g. $E(u_{i,t-1} u_{i,t-2}) = 0$.

- $y_{i,t-2}$ is a good instrument since it is correlated with $(y_{i,t-1} - y_{i,t-2})$.
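The Anderson-Hsiao approach can be sketched with ivregress and Stata's time-series operators. The sketch assumes the abdata example dataset (variables id, year and n, the log of employment), loaded here with webuse; treating n as $y$ is an illustrative choice.

```stata
* Arellano-Bond example data (assumption: internet access for webuse)
webuse abdata, clear
xtset id year

* Anderson-Hsiao IV on the differenced equation (2):
*   D.n  = n_t     - n_{t-1}   (dependent variable)
*   LD.n = n_{t-1} - n_{t-2}   (endogenous regressor)
*   L2.n = n_{t-2}             (level used as the instrument)
* noconstant because Equation (2) has no intercept
ivregress 2sls D.n (LD.n = L2.n), noconstant
```

This uses a single instrument, so the model is exactly identified and there is no overidentification test to run.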


3.1: Difference GMM

More efficient estimation is, however, possible using additional lags of the dependent variable as instruments.

For example, both $y_{i,t-2}$ and $y_{i,t-3}$ can serve as instruments for $(y_{i,t-1} - y_{i,t-2})$.

The model is then overidentified (the number of instruments is greater than the number of instrumented variables), so estimation should be by 2SLS or panel GMM.

Note that the number of available instruments is largest for the time periods $t$ closest to the final (most recent) period $T$:

- In period 3, only $y_{i1}$ is available as an instrument in the equation for $\Delta y_{i3}$.

- In period 4, both $y_{i1}$ and $y_{i2}$ are available as instruments in the equation for $\Delta y_{i4}$.

- In period 5, $y_{i1}$, $y_{i2}$ and $y_{i3}$ are available as instruments in the equation for $\Delta y_{i5}$.

and so on.
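The counting pattern above can be made explicit. As a worked illustration, assuming a balanced panel observed for $t = 1, \ldots, T$ and all available lags used, the equation for period $t$ has $t-2$ instruments, so the total is:

```latex
\[
\sum_{t=3}^{T} (t-2) \;=\; 1 + 2 + \cdots + (T-2) \;=\; \frac{(T-1)(T-2)}{2}
\]
% Example: T = 5  gives 1 + 2 + 3 = 6 instruments
%          T = 10 gives (9)(8)/2  = 36 instruments
```

The instrument count therefore grows quadratically in $T$, which is worth keeping in mind alongside the rule of thumb on the number of instruments in the diagnostic tests section.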


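3.1: Difference GMM

Use many lags, replacing missing values with zero.

Generate a separate instrument for each lag and each time period instrumented.

IV-style (a single instrument column; "." marks a missing value, rows are $t = 1, \ldots, T$):

$$\begin{pmatrix} . \\ . \\ y_{i1} \\ \vdots \\ y_{i,T-2} \end{pmatrix}$$

GMM-style (one column per lag and time period, missing values replaced by zero, rows are $t = 1, 2, 3, 4, 5, \ldots$):

$$\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
y_{i1} & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & y_{i2} & y_{i1} & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & y_{i3} & y_{i2} & y_{i1} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$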


3.1: Difference GMM

Holtz-Eakin et al. (1988) and Arellano and Bond (1991) propose panel GMM estimators, in which the instruments must satisfy the following moment restrictions for $\alpha$ to be estimated efficiently:

$$E[\Delta u_{it}\, y_{is}] = 0 \quad \text{for } t = 3, \ldots, T \text{ and } s = 1, \ldots, t-2 \qquad (3)$$

where $\Delta u_{it} = u_{it} - u_{i,t-1}$.

Assuming no serial correlation in the error term $u_{it}$, lagged levels of the variables (that are differenced in the equation to be estimated) make valid instruments, as $y_{is}$ is not correlated with $(u_{it} - u_{i,t-1})$.

Hence the name Arellano-Bond estimator (or difference GMM).
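A minimal difference GMM sketch for the simple model in Equation (1), assuming the abdata example dataset with n (log employment) standing in for $y$. xtabond2 is a user-written command that has to be installed once; the official xtabond command is shown for comparison.

```stata
* One-time installation of the user-written command (needs internet access):
* ssc install xtabond2

webuse abdata, clear
xtset id year

* Difference GMM with xtabond2:
*   gmmstyle(L.n) builds GMM-style instruments from lags of L.n,
*   i.e. levels of n dated t-2 and earlier for the differenced equation
*   noleveleq drops the levels equation, leaving difference GMM only
xtabond2 n L.n, gmmstyle(L.n) noleveleq robust

* Official Stata counterpart: Arellano-Bond estimator with one lag of n
xtabond n, lags(1) vce(robust)
```

The two commands need not give identical output, since their defaults differ in places.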


3.1: Difference GMM

If our model contains other explanatory variables $x_{it}$ as regressors:

$$y_{it} = \alpha\, y_{i,t-1} + \beta\, x_{it} + \varepsilon_{it}, \quad t = 2, \ldots, T \qquad (4)$$

The differenced equation will be

$$(y_{it} - y_{i,t-1}) = \alpha\,(y_{i,t-1} - y_{i,t-2}) + \beta\,(x_{it} - x_{i,t-1}) + (u_{it} - u_{i,t-1}), \quad t = 3, \ldots, T \qquad (5)$$

Thus the lagged level $x_{i,t-2}$ makes a valid instrument for $(x_{it} - x_{i,t-1})$, since it is not correlated with $(u_{it} - u_{i,t-1})$.

Rule of thumb: for a level variable to be a valid instrument for a differenced variable, it must be lagged by at least 2 periods.
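The rule of thumb can be imposed directly in xtabond2 through the laglimits() suboption. This is a sketch under assumptions: it uses the abdata example dataset, and w (log wage) and k (log capital) play the role of $x_{it}$ purely for illustration.

```stata
* ssc install xtabond2    (run once; needs internet access)
webuse abdata, clear
xtset id year

* Difference GMM with additional regressors w and k.
* laglimits(2 .) restricts the instruments for the differenced equation
* to levels dated t-2 and earlier, for n as well as for w and k
xtabond2 n L.n w k, gmmstyle(n w k, laglimits(2 .)) noleveleq robust
```

Writing gmmstyle(n, laglimits(2 .)) is equivalent to gmmstyle(L.n) with the default lag range, so the dependent variable is treated the same way as in the simple model.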


3.1: Difference GMM

The moment conditions now also include:

$$E[\Delta u_{it}\, x_{is}] = 0 \quad \text{for } t = 3, \ldots, T \text{ and } s = 1, \ldots, t-2 \qquad (6)$$

An additional assumption is also required for the explanatory variables $x$: they are assumed to be weakly exogenous. In other words, the explanatory variables must be orthogonal to future realizations of the error term.


3.1: Difference GMM

The Arellano-Bond estimator has, however, been shown to have conceptual and statistical shortcomings.

Alonso-Borrego and Arellano (1999) and Blundell and Bond (1998) point out that when explanatory variables are persistent over time, lagged levels of these variables make weak instruments for the regression in differences.

Instrument weakness in turn affects both the asymptotic and the small-sample performance of the difference estimator. Asymptotically, the variance of the coefficients rises; in small samples, Monte Carlo experiments show that weak instruments can produce biased coefficients.


3.2: System GMM

A more efficient dynamic panel GMM technique is proposed by Arellano and Bover (1995), using the following moment conditions:

$$E[\varepsilon_{it}\, \Delta y_{is}] = 0 \quad \text{for } s \le t-1 \qquad (7)$$

which is equal to

$$E[(v_i + u_{it})\, \Delta y_{is}] = 0 \quad \text{for } s \le t-1$$

The above moment conditions imply that we estimate Equation (1) in levels (not in differences), and instrument the endogenous $y_{i,t-1}$ in the model with lagged differences of $y$, i.e. $\Delta y_{is}$, $s \le t-1$.

The estimator proposed by Arellano and Bover (1995) that uses this moment condition is therefore called the Arellano-Bover estimator.


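3.2: System GMM

For $\Delta y_{is}$ to be a valid instrument, it must be uncorrelated with the original composite error term $\varepsilon_{it} = v_i + u_{it}$ in the level equation to be estimated:

$$E[(v_i + u_{it})\, \Delta y_{is}] = 0 \quad \text{for } s \le t-1$$

This holds notwithstanding the possible correlation between the regressor (in levels) and the time-invariant factor $v_i$, because the regressors are assumed to fulfill the following stationarity property:

$$E[y_{i,t+p}\, v_i] = E[y_{i,t+s}\, v_i] \quad \text{for all } p \text{ and } s.$$

Under this property, $E[\Delta y_{is}\, v_i] = E[y_{is}\, v_i] - E[y_{i,s-1}\, v_i] = 0$, so differencing removes the correlation with $v_i$.

Recall that we earlier assumed $E[v_i] = 0$ and $E[v_i^2] = \sigma_v^2$, i.e. $v_i$ has zero mean and constant variance.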


3.2: System GMM

Blundell and Bond (1998) combine estimation in differences (the same technique as in difference GMM) with the estimation in levels proposed by Arellano and Bover (1995).

The resulting estimator is called the system GMM estimator.

The system GMM estimator must therefore fulfill moment conditions (3) and (7),

$$E[\varepsilon_{it}\, \Delta y_{is}] = 0 \quad \text{for } s \le t-1 \qquad (7)$$

or equivalently $E[(v_i + u_{it})\, \Delta y_{is}] = 0$ for $s \le t-1$, and the stationarity property.



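3.2: System GMM

If our model contains other explanatory variables $x_{it}$ as regressors, as in Equation (4), i.e.

$$y_{it} = \alpha\, y_{i,t-1} + \beta\, x_{it} + \varepsilon_{it}, \quad t = 2, \ldots, T \qquad (4)$$

the instruments in the system GMM estimator (level and differenced regressions) need to fulfill the following moment conditions, in addition to (3):

$$E[\Delta u_{it}\, x_{is}] = 0 \quad \text{for } t = 3, \ldots, T \text{ and } s = 1, \ldots, t-2 \qquad (6)$$

$$E[\varepsilon_{it}\, \Delta y_{is}] = 0 \quad \text{for all } s \le t-1 \qquad (7)$$

$$E[\varepsilon_{it}\, \Delta x_{is}] = 0 \quad \text{for all } s \le t-1 \qquad (8)$$

and the following stationarity properties:

$$E[y_{i,t+p}\, v_i] = E[y_{i,t+s}\, v_i] \quad \text{for all } p \text{ and } s,$$

$$E[x_{i,t+p}\, v_i] = E[x_{i,t+s}\, v_i] \quad \text{for all } p \text{ and } s.$$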
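A minimal system GMM sketch with xtabond2, again assuming the abdata example dataset with w and k standing in for $x_{it}$. System GMM is what xtabond2 estimates when the noleveleq option is left out.

```stata
* ssc install xtabond2    (run once; needs internet access)
webuse abdata, clear
xtset id year

* System GMM (no noleveleq option):
*   differenced equation: instrumented by levels dated t-2 and earlier
*   levels equation:      instrumented by lagged first differences
xtabond2 n L.n w k, gmmstyle(n w k, laglimits(2 .)) robust
```

The output reports the Arellano-Bond AR(1) and AR(2) tests and the Sargan and Hansen overidentification tests below the coefficient table, together with the number of instruments in the header.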



4. Diagnostic tests

The consistency of the GMM estimator depends on the validity of the instruments.

As suggested by Arellano and Bond (1991), Arellano and Bover (1995), and Blundell and Bond (1998), two specification tests are used: the Sargan/Hansen test and the serial correlation tests (AR(1) and AR(2)).

Sargan/Hansen test of over-identifying restrictions: tests the overall validity of the instruments.

- The null hypothesis is that all instruments, as a group, are exogenous. A higher (insignificant) p-value is therefore better.

- Rule of thumb: number of instruments ≤ number of panel units.
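One way to keep the instrument count below the number of panel units is the collapse suboption of xtabond2, which is not discussed in the slides and is shown here only as an illustrative sketch on the abdata example dataset. It creates one instrument per lag rather than one per lag and time period.

```stata
* ssc install xtabond2    (run once; needs internet access)
webuse abdata, clear
xtset id year

* Full GMM-style instrument set
xtabond2 n L.n w k, gmmstyle(n w k, laglimits(2 .)) robust

* Collapsed instrument set: fewer instruments, same moment idea
xtabond2 n L.n w k, gmmstyle(n w k, laglimits(2 .) collapse) robust
```

Comparing the "Number of instruments" and "Number of groups" lines in the two output headers shows whether each specification respects the rule of thumb.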


4. Diagnostic tests

The serial correlation tests examine the null hypothesis that the error term of the differenced equation is not serially correlated at first order (AR(1)) and second order (AR(2)). For the AR(2) test we again need a high p-value.

By construction, the differenced error term is likely to be serially correlated at first order even if the original error is not: $\Delta u_{it} = u_{it} - u_{i,t-1}$ and $\Delta u_{i,t-1} = u_{i,t-1} - u_{i,t-2}$ both contain $u_{i,t-1}$. A significant AR(1) test is therefore expected.

The AR(2) test is the most important, since it detects autocorrelation in levels: $\Delta u_{it} = u_{it} - u_{i,t-1}$ and $\Delta u_{i,t-2} = u_{i,t-2} - u_{i,t-3}$ share no common term, so they are correlated only if the $u_{it}$ themselves are serially correlated.

While most studies that employ GMM dynamic estimation report the test for first-order serial correlation, some do not.
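The logic of the two tests can be checked with a short calculation. It assumes, beyond what the slides state explicitly, that the $u_{it}$ are serially uncorrelated with a common variance, written here as $\sigma_u^2$ (a symbol introduced for this illustration).

```latex
\begin{align*}
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-1})
  &= \operatorname{Cov}(u_{it} - u_{i,t-1},\; u_{i,t-1} - u_{i,t-2})
   = -\operatorname{Var}(u_{i,t-1}) = -\sigma_u^2 \\
\operatorname{Corr}(\Delta u_{it}, \Delta u_{i,t-1})
  &= \frac{-\sigma_u^2}{2\sigma_u^2} = -\tfrac{1}{2} \\
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-2})
  &= \operatorname{Cov}(u_{it} - u_{i,t-1},\; u_{i,t-2} - u_{i,t-3}) = 0
\end{align*}
```

So negative first-order correlation in the differenced errors is expected under the null of no serial correlation in levels, while any second-order correlation points to a violation of that assumption.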


Hands-on with Stata

The user-written xtabond2 command is preferable (see reference 3).

Refer to the abdata dataset and the do-file sent to participants.
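For readers without the workshop do-file, the following is a hedged starting point rather than a reproduction of it. It assumes the abdata example dataset from webuse (variables n, w, k and the year dummies yr1976 to yr1984) and an illustrative specification.

```stata
* One-time installation of the user-written command (needs internet access):
* ssc install xtabond2

webuse abdata, clear
xtset id year

* Inspect the panel structure and the main variables
xtdescribe
summarize n w k

* System GMM: lags of n, w and k as GMM-style instruments,
* year dummies as IV-style instruments for the levels equation only
xtabond2 n L.n L(0/1).(w k) yr*, gmmstyle(L.(n w k)) ///
    ivstyle(yr*, equation(level)) robust small
```

After running it, check the AR(2) and Hansen p-values and compare the number of instruments with the number of groups, as described in the diagnostic tests section.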
