Likelihood Based Inference for Dynamic Panel Data Modelsminiahn/archive/ahn_thomas_2006.pdf · seminar at Rice University, the Eighth Annual Texas Econometrics Camp, the University

Likelihood Based Inference for Dynamic Panel Data Models

Seung C. Ahna

Arizona State University Gareth M. Thomas Quantitative Micro Softwareb

This Version: September, 2006 Abstract This paper considers maximum likelihood (ML) based inferences for dynamic panel data models. We focus on the analysis of the panel data with a large number (N) of cross-sectional units and a small number (T) of repeated time-series observations for each cross-sectional unit. We examine several different ML estimators and their asymptotic and finite-sample properties. Our major finding is that when data follow unit-root processes without or with drifts, the ML estimators have singular information matrices. This is a case of Sargan (1983) in which the first order condition for identification fails, but parameters are identified. The ML estimators are consistent, but they have nonstandard asymptotic distributions and their convergence rates are lower than N1/2. In addition, the sizes of usual Wald statistics based on the estimators are distorted even asymptotically, and they reject the unit-root hypothesis too often. However, following Rotnitzky, Cox, Bottai and Robins (2000), we show that likelihood ratio (LR) tests for unit root follow mixtures of chi-square distributions. Our Monte Carlo experiments show that the LR tests with the p-values from the mixed distributions are much better sized than the Wald tests, although they tend to slightly over-reject the unit root hypothesis in small samples. It is also shown that the LR tests for unit roots have good finite-sample power properties.

JEL Classification Codes: C23, C40 Keywords: dynamic panel data, maximum likelihood, singular information matrix. Acknowledgment The first author gratefully acknowledges the financial support of the College of Business and Dean's Council of 100 at Arizona State University, the Economic Club of Phoenix, and the alumni of the College of Business. We would like to thank participants in the econometrics seminar at Rice University, the Eighth Annual Texas Econometrics Camp, the University of California at San Diego, Pennsylvania State University, the University of California at Los Angeles, and the 2004 Far Eastern Econometric Society Meeting.

a Corresponding Author: Seung C. Ahn, Department of Economics, W.P. Carey School of Business, Arizona State University, Tempe, AZ 85287; email: [email protected]. b The views and research contained in this paper represent those of the individual authors, and not of Quantitative Micro Software.

1

1. Introduction Panel data models assume that individual cross-section units have different intercept terms due to

unobservable heterogeneity. Two different models are used to control the unobservable

heterogeneity. One is the random effects (RE) model in which the individual-specific intercept

terms, or individual effects, are treated as random variables. The other is the fixed effects (FE)

model in which the effects are treated as parameters. The FE model is more general than the

RE model in that it requires weaker distributional assumptions about the effects. One difficulty

in estimating the fixed effects models however is that the number of parameters increases with N.

A traditional treatment for this so-called “incidental parameters problem” (Neyman and Scott,

1948) is the within estimator, i.e., least squares on data transformed into deviations from

individual means, which is also a ML estimator. For models with strictly exogenous regressors,

the within estimator is consistent. Unfortunately, however, when T is small, the within

estimator is inconsistent for the dynamic models that use lagged dependent variables as

regressors (Nickell, 1981). One way to avoid this problem is to use the random effects ML

estimator that treats the effects as time invariant random variables (Hsiao, 1986). Instead,

generalized method of moments (GMM) estimators have been widely used to analyze the FE

dynamic panel models (e.g., Arellano and Bond, 1991; Arellano and Bover, 1995; Ahn and

Schmidt, 1995, 1997; and Blundell and Bond, 1998). An important reason for the popularity of

GMM is that it provides consistent estimators under quite general FE assumptions.

Recently, research interests in the ML estimation of dynamic panel data models have

been revived. For example, Lancaster (2002), and Hsiao, Pesaran and Tahmiscioglu (2002,

HPT) have developed alternative ML estimators for FE dynamic models.1 Kruiniger (2002)

provides the general conditions under which the ML estimators of Lancaster and HPT are

consistent. These studies consider the models in which white noise error terms are

homoskedastic over time. Extending these studies, Alvarez and Arellano (2004) examine the

RE and the two FE ML estimators when white-noise error terms are heteroskedastic over time.

One reason for the recent revival of the ML approach may be that the panel data GMM

estimators often have poor finite-sample properties (e.g., Bond and Windmeijer, 2002).

Dynamic panel data models imply a large number of moment conditions. The GMM estimators

1 Hahn and Kuersteiner (2002) also consider the ML estimator for the FE dynamic model with large N and T. They find the ML estimator, which is the within estimator, is consistent, but it is asymptotically biased. Hahn and Kuersteiner provide a biased-corrected ML estimator.

imposing all of the available moment conditions appear to generate biased statistical inferences,

especially when T is large. Thus, an important research agenda would be to develop the

alternative GMM estimators that use a smaller number of moment conditions, but without

substantial loss of asymptotic efficiency. Ahn and Schmidt (1995) show that when data are

normally distributed, an efficient GMM estimator is asymptotically identical to the ML estimator

constructed on the same first and second order moment conditions. Given that ML estimators

are the GMM estimators based on exact identifying moment conditions, the ML-based approach

may be a viable alternative to the popular GMM approach.

This paper considers the asymptotic and finite-sample properties of the RE ML estimator

and the two FE ML estimators of Lancaster (2002) and HPT when white noise error terms are

homoskedastic over time. For the cases where data are not persistent, Kruiniger (2002) has

studied the relative asymptotic efficiency of these three estimators. This paper readdresses this

relative efficiency issue in a more intuitive way. In addition, we examine the asymptotic and

finite-sample properties of the ML estimators when data are persistent. Alvarez and Arellano

(2004) also provide an elegant comparison of the efficiency of the estimators when the white

noise error terms are heteroskedastic. The methods they use are similar to ours in many

respects. But we have obtained our results independently from theirs.

Our major finding in this paper is that when cross-sectional data follow unit root

processes, the information matrices of the RE and the two FE ML estimators become singular.

However, these estimators still can identify the parameters to be estimated. This is a case

similar to the case of Sargan (1983) in which the first order condition for identification fails, but

the parameters are identified. Rotnitzky, Cox, Bottai and Robins (2000, RCBR) analyze the

general asymptotic properties of ML estimators for such cases. Following their approach, we

derive the asymptotic distribution and convergence rate of the RE and HPT ML estimators.

Their asymptotic distributions are not normal and their convergence rates are N1/4, not N1/2. In

addition, usual Wald-type tests for unit roots generate biased inferences. In contrast, likelihood

ratio (LR) test statistics for unit root follow mixed 2χ distributions that can be easily simulated

by Monte Carlo experiments. We find that the LR tests with the p-values from the mixed

distributions are much better sized than other Wald-type tests.

We also consider a dynamic panel model with heterogeneous trends. For the model

with large N and T, Moon and Phillips (2004) propose a GMM estimator that is obtained treating

2

individual trends as incidental parameters (Neyman and Scott, 1948). The convergence rate of

the estimator is N1/6 when data follow unit roots with heterogeneous drifts. We consider the ML

estimation of the model under the alternative assumption that the individual trends are random.

For fixed T, we find that the convergence rate of the alternative ML estimator is N1/4.

This paper is organized as follows. In Section 2, we briefly compare the RE, HPT and

Lancaster estimators and examine their relative efficiency. Section 3 investigates the

asymptotic properties of these estimators when data follow unit-root processes. Section 4

reports our Monte Carlo experiment results. Some concluding remarks follow in Section 5.

2. ML Estimation The foundation of this paper is the simple dynamic model

3

). , 1 (it i t i ity yδ α ε−= + + (1)

Here i = 1, 2, ..., N denotes cross-section unit (individual) and t = 1, 2, ..., T denotes time. Our

parameter of interest is δ . We initially assume that δ is within a unit circle, i.e., | | 1δ < . We

will relax this assumption later. The composite error ( )i itα ε+ contains a time invariant

individual effect iα and random noise itε . The initial observed value of for individual i isy 0iy .

We assume that the random error vector 1 2( , ,..., )i i i iTε ε ε ε ′= is uncorrelated with 0iy and iα , and

that the iε , 0iy , and iα are cross-sectionally independent. We assume that the error terms itε are

serially uncorrelated. For the maximum likelihood estimation of model (1), we need to make

distributional assumptions about the 0iy and iα . We assume that all of the iε , 0iy , and iα are

normally distributed:

(2)

2, 10

2, 1

1 1 1

00~ 0 , 0

0 0 0

o o

o

y y Ti

i y

i T T T T

yN

I

α

α α

σ σ

α σ σε ν

×

×

× × ×

⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎜⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠

,T ⎟⎟

where var( )itν ε= . Here, we assume that the 0iy and iα have zero means. This assumption is

just for convenience. We can allow nonzero means without altering our main results. Since

we make an explicit normality assumption about the individual effects iα , we shall refer to (2) as

the random-effect (RE) assumption.

In this paper, we do not consider the more general models that contain exogenous

regressors. However, our results can be easily extended to such models. The ML estimation

of model (1) has been considered by Anderson and Hsiao (1981), Bhargava and Sargan (1983),

and Hsiao (1986). We can easily derive the log-likelihood function, viewing (1) as a recursive

simultaneous equations model (treating 0iy , 1iy , ... , iTy as endogenous variables).

The log-likelihood function for model (1) depends on the five parameters, δ , ,oy ασ , 2oyσ ,

2ασ andν . However, we find a convenient reparameterization by which the parameter 2

oyσ

can be orthogonalized to other parameters. Using this method, the parameter of our interest δ

can be estimated independently from 2oyσ . In addition, this reparameterization will facilitate our

comparisons of the RE ML estimator with the two alternative fixed-effects (FE) ML estimators

developed by Hsiao, Pesaran and Tahmiscioglu (2002, HPT), and Lancaster (2002).

We define

4

p y 0( 1)i i iδ α≡ − + 0 0( | )i i iE p y y; ψ= ; ( )0 1i i iu p y iψ ε= − + ,

where 2,( 1) /

o oy yαψ δ σ= − + σ

i

.2 With this notation, model (1) can be written in the following

alternative form:

01

( 1) 1 , 1

0,

0ii

T ii i

yy uyy

ψ δε− × −

Δ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞= + +⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ΔΔ⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠ Δ

(3)

where ,, 1it it i ty y y −Δ = − , 1it it i tε ε ε −Δ = − , and

2( , ..., )i i iTy y y ′Δ = Δ Δ ; , 1 1 , 1( , ..., )i i i Ty y y− − ′Δ = Δ Δ ; 2( , ..., )i i iTε ε ε ′Δ = Δ Δ .

Under the RE assumption, the variance matrix of the error vector ( , )i iu ε ′ ′Δ is given:

1

1 1

1( )i

Ti T T

u cVar

c Bω

ν ω νε

−

− −

T′+ −⎛ ⎞ ⎛≡ Ω ≡⎜ ⎟ ⎜Δ −⎝ ⎠⎝ ⎠

⎞⎟ , (4)

where 2 2 20 ,var( ) / ( / ) /

o oi i y yp y α αω ψ ν σ σ σ ν= − = − 1Tc, − is a (T-1)×1 vector whose first entry

equals one while other entries equal zero, and 1 ,[T TB B− ]jh= is a (T-1)×(T-1) matrix such that

, , and all other entries equal zero. We can show that 1, 2T jjB − = 1, , 1 1, 1, 1T j j T j jB B− + − −= = −

2 If the exogenous regressors, say xi1, ... , xiT, were included in the model, we may construct the likelihood function assuming 0 1( | , ,..., )i i i iTE y x xα is linear in the conditional variables (Wooldridge, 2000).

(1 ( 1)11 1

( 1) 1 1 1

0 0 1[ ( )] 1

0T

TT T TT

T kB k

ωξ

× −−− )T −

− × − −

⎛ ⎞ ⎛ ⎞ ′Ω = +⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠

, (5)

where 1 (( 1) / , ( 2) / ,...,1/ )Tk T T T T T− ′= − − , and det[ ( )] 1T T Tξ ω ω= Ω = + .3

In the reparameterized model (3), the parameter vector to be estimated jointly is given by

θ = ( , , , )δ ν ω ψ ′ . The initial observation 0iy is uncorrelated with the error vector ( , )i iu ε ′ ′Δ .

So, we can construct a likelihood function treating 0iy as predetermined. This is so because

under our normality assumption,

0 0

2 21 2 0 1 0 0( , ,..., , | , ) ( ,..., | , ) ( | )i i i iT i y i i iT i i i yf y y y y f y y y f yθ σ θΔ Δ Δ = Δ Δ σ .

From now on, we use notation fi(•) to denote a density function for the i’th observation; and

( )i θ to denote the log of fi(•). Under the RE assumption, and using (4) and (5), we can easily

derive the log-density function of 1 2( , ,... )i i iTy y y ′Δ Δ Δ conditional on 0iy :4

( ) ( )

( )

, 1 2 0

1, 1 1 , 1

2

1 0 1 , 1

( ) ( , , ..., | , )1 1ln(2 ) ln( ) ln( )

2 2 2 2

( ) ( ) .2

RE i i i i iT i

T i i T i

i i T i iT

y y y yT T y y B y y

T y y k y y

θ θ

π ν ξ δ δν

ψ δνξ

−− − −

− −

≡ Δ Δ Δ

′= − − − − Δ − Δ Δ − Δ

′− Δ − + Δ − Δ

i (6)

Thus, maximizing the likelihood function 1 , ( )Ni RE i θ=Σ , we can obtain the RE ML estimator of θ .

In order to derive the log-density function (6), we have assumed that the initial values

0iy are normally distributed. However, it may be worth noting that the normality of 0iy is not

required for (6). The log-density function is valid as long as the iε and iα are normal

conditional on 0iy and the conditional mean of iα is linear in 0iy .

The function (6) provides a foundation by which the RE ML estimator of δ can be

compared to the HPT and Lancaster estimators. HPT propose to estimate δ based on the

following differenced model:5

3 See HPT for the derivation of det[ ( )]T ωΩ . The parameter ω in their paper is equivalent to ( 1)ω + in this paper. 4 For the cases in which the yi0 and αi do have non-zero expectations, the term 1( i iy y 0 )ψΔ − will be replaced by

1 0( i iy y )aψΔ − Δ − , where a is a constant.

5

5 HPT in fact include an intercept term a for the Δyi1 equation. But the interceptor term equals zero under our zero-mean assumptions.

6

,HPT i

1

(7) 1

, 1

0,i

ii i

y uyy

δε−

Δ ⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟⎜ ⎟ ⎜ ⎟ΔΔ Δ⎝ ⎠ ⎝ ⎠⎝ ⎠

where,

,,

1 1

1( )HPT i HPT T

HPT T HPTi T T

u cVar

c Bω

ν ω νε

−

− −

′+⎛ ⎞ ⎛≡ Ω =⎜ ⎟ ⎜Δ ⎝ ⎠⎝ ⎠

⎞⎟

)

. (8)

If we assume the normality of the error vector ,( ,HPT i iu ε ′ ′Δ , the model (7) leads to the following

log-density function:

( )

1, , , 1

21 1 , 1

,

1 1( ) ln(2 ) ln( ) ln( ) ( ) ( )2 2 2 2

1 ( ) ,2

HPT i HPT HPT T i i T i i

i T i iHPT T

T T y y B y y

y k y y

1 , 1θ π ν ξ δ δν

δνξ

−− − −

− −

′≡ − − − − Δ − Δ Δ − Δ

′− Δ + Δ − Δ (9)

where ( , , )HPT HPTθ δ ν ω ′= and , ,det( ) 1HPT T HPT T HTPTξ ω= Ω = + . Observe that this log-density

function and the corresponding log-likelihood function depend on only three parameters, while

the RE log-density function (6) depends upon four parameters.

While the function (9) is almost identical to (6), a critical difference between the two

functions is that the former does not depend on the initial value 0iy . That is, (9) is the

unconditional log-density of 1iyΔ , ..., iTyΔ . Note that under the RE assumption,

( ) , 0HPT i i iu y u0

2 2var( ) / /HPT i ypψ= + , ω = ν ψ σ ν ω= + . (10)

The HPT ML method treats the 0iy as unobservables. This treatment does not lead to

inconsistent estimators. But it will lead to inefficiency under the RE assumption. The HPT

estimation does not exploit the information about δ that is contained in the level data 0iy . It

only exploits the information contained in differenced data. In contrast, the RE estimator

utilizes the information about δ contained in level data (through the non-zero correlation with

1iyΔ and 0iy ). Of course, both the estimators become asymptotically equivalent if 0ψ = .

An intriguing question might be whether the HPT estimator is a FE treatment in the sense

that its efficiency does not require any distributional assumption about the iα . Observe that the

HPT estimator requires the normality of , 0( 1) 1HPT i i i iu yδ α ε= − + + . This essentially requires

the normality of 0iy , but not necessarily the normality of iα under some circumstances as we see

below.

Suppose that the ity follow a stationary process. Specifically, assume that

0 0 , 1; ; , 1,..., .i i i it i it it i t ity q y q q q t Tη η δ ε−= + = + = + = (11)

This process implies that , 1 (1 )it i t i ity yδ δ η ε−= + − + , where (1 ) iδ η− equals iα in our notation.

Under this assumption, 0( 1) ( 1)i i i 0ip y qδ α δ= − + = − . Thus, as long as 0iy (or ) is normal

conditional on

0iq

iα , ,HPT iu remains normal. When the stationary condition does not hold, the

normality of ,HPT iu requires the normality of iα . This means that the HPT estimator is indeed a

FE treatment when data follow the stationary process (11). However, when data are not

stationary, the HPT estimator may not be viewed as a real FE treatment because it requires a

distributional assumption about the effects iα .

As we mentioned earlier, the RE estimator does not require the normality of 0iy , but it

requires the normality of iα conditional on 0iy to secure the normality of the error term in (3),

( 0( | )i i i iu E y ) 1iα α= − + ε . Observe that even if the data are stationary, the error term

cannot be normal unless the effect

iu

iα is normal. Thus, the RE ML estimator cannot be viewed

as a FE effect treatment, in the sense that its efficiency requires the normality of iα . However,

it is important to note that the consistency of the RE ML estimator does not require the normality

of the effects. Both the RE and HPT ML estimators remain consistent even if iα or iε are non-

normal. We now turn to the Lancaster estimator which has a Bayesian flavor. We derived model

(3) by first-differencing model (1). Instead, if we difference out 0iy from ity , model (1) reduces

to

0 0 , 1 1i i T iy y p iδ ε−Δ = Δ + + ,

where 0( 1)i ip y iδ α= − + , 1T is a T×1 vector of ones, 0 1 0 2 0( , ,..., )i i i i i iT i0y y y y y y y ′Δ = − − − and

0 , 1 1 0 2 0 , 1 0(0, , , ..., )i i i i i i Ty y y y y y y− i− ′Δ = − − − . Lancaster treats the ip as the unobservable

effects instead of the iα . This treatment is similar to that of HPT in that both do not exploit the

information contained in the level of 0iy .

The density of 0 iyΔ conditional on δ ,ν and ip equals

7

, 0 0 , 1 0 0 , 1/ 2 / 2

1, 1 1 , 1

/ 2 / 22

1 1 , 1

1 1( , , ) exp ( 1 ) ( 1 )(2 ) 2

1 ( ) ( )1 2exp ,

(2 ) ( ( ) )2

FE i i i i i T i i i TT T

i i T i i

T T

i T i i i

f p y y p y y p

y y B y y

T y k y y p

δ ν δ δπ ν ν

δ δν

π ν δν

− −

−− − −

− −

⎛ ⎞′= − Δ − Δ − Δ − Δ −⎜ ⎟⎝ ⎠

⎛ ⎞′− Δ − Δ Δ − Δ⎜ ⎟= ⎜ ⎟

⎜ ⎟′− Δ + Δ − Δ −⎜ ⎟⎝ ⎠

(12)

where the second equality is shown in the appendix. Observe that 0 1t

it i s isy y y=− = Σ Δ . Thus,

the second equality of (12) implies that the density , ( , , )FE i if pδ ν can be also viewed as a density

of the first-differenced ity ’s conditional on δ ,ν and ip .

Using the method of Cox and Reid (1987), Lancaster reparameterizes the effects ip

defining exp( ( ))i ip p b δ= − , where

1 11( ) [( ) / ]T

tb T T t t tδ δ− −== Σ − . (13)

This reparameterization is chosen so that the new fixed effects ip are information orthogonal to

( , )δ ν ′ , in the sense that

2

,2 1

ln( )0

( , )FE i

i

fE

pδ ν ×

⎛ ⎞∂=⎜ ⎟′∂ ∂⎝ ⎠

.

With this reparameterization and assuming uninformative uniform priors to the ip , we can

integrate them out from ,FE if . If we do so, we can show that the conditional density of the

differenced yit’s, 1 2( , , ..., )i i iTy y y ′Δ Δ Δ , conditional on ( , )δ ν ′ , is given

1, , 1 1 , 1( 1) / 2

1 1( , ) exp( ( ))exp ( ) ( )2Lan i i i T i iTf b y y B y yδ ν δ δ

ν ν−

− − −−⎛ ⎞′∝ − Δ − Δ Δ⎜ ⎟⎝ ⎠

δ− Δ ,

which leads to the log-density function

1, , 1

( 1) 1( , ) ( ) ln( ) ( ) ( )2 2Lan i i i T i i

Tb y y B 1 , 1y yδ ν δ ν δν

−− − −

− ′= − − Δ − Δ Δ − Δ δ . (14)

Observe that Lancaster’s ML depends on only two parameters. It does not depend on

the variance νω of the composite error term in model (3). Notice that the error term

contains the projection error component of the effect

iu iu

iα (the error in the population regression of

iα on 0iy ). Thus, the fact that (14) does not depend on ω seems to imply that the Lancaster

estimator is indeed a fixed-effect treatment. However, it is not without costs. That is, while

8

the Lancaster estimator does not require any distributional assumption about the effect iα , it

loses the usual ML properties as we see below.6

An interesting property of the Lancaster estimator, ˆ ˆ( , )Lan Lanδ ν ′ , is that the asymptotic

covariance matrix of the Lancaster estimator is not of the inverted Hessian form. Define

2

, ,, ,( , ) ; ( , )

( , ) ( , ) ( , ) ( , )Lan i Lan i Lan i

Lan i Lan iH Bδ ν δ ν ,

δ ν δ ν δ ν δ ν

′∂ ∂⎛ ⎞⎛= = ⎜ ⎟⎜′ ′∂ ∂ ∂ ∂⎝ ⎠⎝

∂ ⎞⎟⎠

.

Then, it can be shown7:

( ) (( )1 12 1 , , ,

ˆ0 , ( ( , ) ( ( , )) ( ( , )

ˆLan o

d Lan i o o Lan i o o Lan i o oLan o

N N E H E B E Hδ δ δ ν δ ν δ νν ν

− −

×

⎛ ⎞−→ − −⎜ ⎟

−⎝ ⎠) ,

where “→d” means “converges in distribution,” and ( , )o oδ ν ′ denotes the true value of

( , )δ ν ′ . The reason why the asymptotic covariance matrix of the Lancaster estimator is not of

the inverted Hessian form is that , 1( ) ( ,... ) 1.Lan i Lan i iTf d y yθ Δ Δ ≠∫ That is, , (Lan i Lanf )θ is not a

proper density. Thus, the Lancaster estimator is not really a ML estimator.8 This problem

arises from the orthogonal reparameterization used by Lancaster.

Kruiniger (2002) shows that the Lancaster estimator of δ is inefficient compared to the

HPT estimator of δ when T > 2, while the asymptotic variance matrices of the two estimators are

the same when T = 2.9 We can obtain the same result in a more intuitive way.

Proposition 1: Suppose that the prior density of ip , ( | , )i i HPTf p ω ν , is given (0, )HPTN νω .

i

6 An intriguing question would be whether the Lancaster and/or HPT estimators are consistent under broader circumstances than the RE estimator is. Consider the two conditions: (i) is finite; and (ii) 1 2limN i ip N p−

→∞ Σ < ∞1

0lim ( , ) ( , )N i io i ip N y yα α−→∞ ′Σ is finite. Obviously, condition (ii) is stronger than (i). Kruiniger (2002) finds

that a necessary condition for consistency of the Lancaster estimator is (i). It can be shown that the same condition is required for the consistency of the HPT estimator. As long as the condition holds and the errors itε are i.i.d. and uncorrelated with , both the Lancaster and HPT estimators are consistent, even if data are not normal. In contrast, it can be shown that the consistency of the RE estimator requires the stronger restriction (ii). Thus, it is true that that the Lancaster and/or the HPT estimator are consistent under more general conditions. However, there are few realistic cases in which condition (i) holds while (ii) is violated. Condition (ii) would be an acceptable assumption for most of the panel studies (at least the studies with short panels). If (ii) holds, all of the three estimators are consistent, and thus the distinction between RE and FE becomes unimportant.

ip

7 See Kruiniger (2002). 8 Lancaster also acknowledges this point. 9 However, even when T = 2, , ( )Lan i Lanf θ is not a proper density function.

9

Then,

. , ,( , , ) ( | , ) ( )FE i i i i HPT i HPT i HPTf p f p dp fδ ν ω ν θ∞

−∞=∫

All of the proofs are in the appendix. While a rigorous proof of Proposition 1 is given in the

appendix, the result of the proposition is quite intuitive given the law of iterative expectations.

The proposition implies that if , ( , , )FE i if pδ ν were integrated with the normal prior

( | , )i i HPTf p ω ν , the Lancaster estimator becomes equivalent to the HPT estimator. This

explains why the HPT estimator should be more efficient than the Lancaster estimator under our

RE assumption or the assumptions justifying the HPT ML method. Under the RE assumption,

the prior ( | , )i i HPTf p ω ν is informative.

3. Maximum Likelihood When Data Are Random Walks In this section, we investigate the asymptotic distribution of the three estimators discussed above

when data contain unit roots with and without drifts. A number of studies have proposed

different test methods for unit roots. Examples are, among many, Levin, Lin and Chu (2004),

Im, Pesaran and Shin (2003), and, Moon and Phillips (2004), and Moon, Perron and Phillips

(2005). These studies consider the cases of large N and T. Alternative unit-root tests for the

data with fixed T have been studied by Breitung and Meyer (1994), and Harris and Tzavalis

(1999). By investigating the asymptotic distributions of the ML estimators, we derive the

distributions of the Likelihood-Ratio (LR) statistics for testing unit roots in data with fixed T.

3.1. Random Walks without Drifts In this subsection, we consider the asymptotic distribution of the random effects ML estimator

when data follow unit root processes. We begin by considering the cases in which the yit are

random walks without drifts: , 1it i t ity y ε−= + , for t = 1, 2, ... , T. We can easily see that this unit

root process is equivalent to the following three parametric restrictions on model (3):

*: ( , , , ) (1, ,0,0)UNDo o o o o oH *θ δ ν ω ψ ν θ′ ′= = ≡ (15)

where the superscripted “UND” refers to “unit-root with no drift”, the subscripted “o” indicates

the true value of the corresponding parameter, and *ν is unrestricted.

10

For notational convenience, we use , ,RE i θ and , ,RE i θθ to denote the score vector and the

Hessian matrix of the log-density , respectively. We will use the same rule to denote the

derivatives of the log-density function with respect to individual parameters: for example,

,RE i

, , , /RE i RE iδ δ= ∂ ∂ and 2 2, , , /RE i RE iδδ δ= ∂ ∂ .

In usual maximum likelihood theory, the information matrices of log-density (in our case,

( , ,( (RE i oE θθ ))θ− ) are assumed to be nonsingular. However, when data are generated with the

restrictions (15), this is no longer the case. We state this finding formally.

11

Proposition 2: Under , both ofUNDoH , , *( (RE iE θθ ))θ− and , , * , , *( ( ) ( )RE i RE iE θ θ )θ θ ′ are singular.

The proposition implies that the information matrix of the RE log-density function (6) is

singular. This proposition also applies to the HPT and Lancaster estimation. It means that the

usual asymptotic theory of ML estimation does not apply to the RE, HPT and Lancaster ML

estimation when data follow random walks without drift. Singular information matrices often

imply model non-identification. However, as Sargan (1983) finds from the analysis of the

models linear in variables and nonlinear in parameters, a singular information matrix (first-order

condition for lack of identification) is not a necessary condition of non-identification. Similarly

to his cases, Proposition 2 does not imply that the RE model (3) is not identifiable when *oθ θ= .

If the RE model were not identifiable, it should be the case that *θ is not a unique maximizer of

,( ( )RE iE )θ when *oθ θ= . But this is not the case. To see why, consider a simple case of T =

2 at *oθ θ= . If we concentrate ,( ( )RE iE )θ by maximizing it with respect to the nuisance

parameter vector ( , , )ν ω ψ ′ given δ , we can obtain:

( )2

2, *

1 1 1 5 1( ) ln ln 2 ( , ),2 2 2 2 2 2

cRE iE g v Tδδ δ

⎛ ⎞ ⎛ ⎞= − + − − + +⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠

δ

where the superscript “c” means “concentrated”, and *( , )g Tν is some function of the variance

parameter ν* and T. Then, it can be show that at 1δ = ,

. , , , , , , , ,( ( )) ( ( )) ( ( )) 0; ( ( )) 3c c c cRE i RE i RE i RE iE E E Eδ δδ δδδ δδδδδ δ δ δ= = = = −

Thus, although the expected value of the second derivative of , ( )cRE i δ equals zero, 1δ = is still a

local maximum point. Moreover, it can be shown that

( )3

, , 2 2

2( 1)( ) .( 2) 1 ( 1)

cRE iE δ

δδδ δ

−= −

⎡ ⎤− + +⎣ ⎦

Thus, ( , ( )cRE iE )δ is always increasing when δ < 1 and decreasing when δ > 1. This indicates

that 1δ = is the global maximum point of ( ), ( )cRE iE δ when *oθ θ= . This shows that *θ is the

global maximum point of ,( ( )RE iE )θ when *oθ θ= . We can generalize this result to the cases

with general T. Stated formally:

Proposition 3: Under , UNDoH *θ is a unique global maximizer of ,( ( )RE iE ).θ Thus, the RE

ML estimator, ˆ ˆ ˆ ˆ ˆ( , , , )RE RE RE RE REθ δ ν ω ψ ′= , is a consistent estimator under . UNDoH

This proposition applies to the HPT estimator. However, somewhat surprisingly, it does

not apply to the Lancaster ML estimation. When data are random walks, the point 1δ = is an

inflexion point of the expectation of Lancaster log-likelihood function.

Proposition 4: Under , UNDoH *θ does not maximize ( ), ( , )Lan iE δ ν . In fact, *θ is an inflexion

point.

This proposition is shown by investigating the concentrated expected log-likelihood

function ( ), ( )cLan iE δ constructed similarly to ( ), ( )c

RE iE δ . For example, when T = 2, we

obtain:

2

,2

( ) ( 1) 02( 1)

cLan iE

δ δδ δ

⎛ ⎞∂ −= ≥⎜ ⎟∂ +⎝ ⎠

.

Thus, ( ), ( )cLan iE δ is always increasing except when 1δ = .

Proposition 4 implies that the estimator of δ which directly maximizes the log-likelihood

function 1 , ( , )Ni Lan i δ ν=Σ may not be consistent when data contain a unit root. In fact, in

unreported simulations with data containing unit roots, we often failed to locate the maximizing

12

values of ( , )δ ν ′ , although, when located, they were close to one. In contrast, Given

that 1δ = is the unique root of ( ),. ( ) /cLan iE δ δ∂ ∂ = 0 when *oθ θ= , the GMM estimator using the

score vector of 1 , ( , )Ni Lan i δ ν=Σ as moment functions would be consistent. However, the GMM

estimator would not be normal even asymptotically. It can be shown that the expectation of the

Hessian matrix of 1 , ( , )Ni Lan i δ ν=Σ is singular under . In this paper we will not further

explore the asymptotic distribution of the Lancaster estimator, although it would be an

interesting research agenda. We focus on the asymptotic distributions of the RE and HPT

estimators. Given that the Lancaster estimator does not use a proper density, it could be viewed

as a GMM estimator or a pseudo ML estimator in the sense of Gouriéroux, Monfort and Trognon

(1984). The approach of Rotnitzky, Cox, Bottai and Robins (2000), which we use below to

derive the asymptotic distributions of the RE and HPT ML estimator, is for the ML estimators.

It would be a valuable research agenda to investigate how their approach can be generalized to

GMM or pseudo ML estimators.

UNDoH

Proposition 4 is a somewhat counterintuitive result in that the expected value of the HPT

log-likelihood function has a global maximum at 1δ = when *oθ θ= . Recall that when 1oδ <

and T = 2, the Lancaster and HPT estimators are asymptotically equivalent. Proposition 4

implies that this equality breaks down when data are random walks ( *oθ θ= ).

Sargan (1983) shows, for the models linear in variables but nonlinear in parameters, that

when information matrices are singular, ML estimators are not asymptotically normal, although

they may be consistent. Similar to his case, Proposition 3 warrants that the RE and HPT

estimators are consistent, but their asymptotic distributions are not normal when *oθ θ= as we

show below. While the asymptotic distributions of the two RE and HPT estimators are similar

to those of the ML estimators Sargan considered, his results do not directly apply to the dynamic

panel data model (3). The model Sargan (1983) has analyzed subsumes linear simultaneous

equation models without variance-covariance restrictions on error terms. The dynamic panel

model (3) can be viewed as a linear simultaneous equations model if we treat 1iyΔ , ... , iTyΔ as

endogenous regressors. However, the model (3) is not a special case of Sargan’s model because

it imposes variance-covariance restrictions on the error terms.

Fortunately, Rotnitzky, Cox, Bottai and Robins (2000, RCBR) derive the asymptotic

13

distributions of the ML estimators of more general models when their information matrices are

singular. We can derive the asymptotic distributions of the RE and HPT estimators using their

method. Here, we will focus on the RE estimator only. All of the results we obtain below also

apply to the HPT estimator.

RCBR study the cases in which the derivatives of log-density functions are linearly

dependent (so that Hessian matrices become singular). For such cases, they derive the

asymptotic distributions of ML estimators and likelihood-ratio (LR) test statistics. To use their

method, we need to show that the derivatives of are linearly dependent at,RE i *θ θ= . Stated

formally:

Proposition 5: , , * , , * * , , *( ) ( ) ( ) 0RE i RE i RE iδ ω νθ θ ν θ− + =

))

.

To get tractable asymptotic results, we now need to reparameterize the model.

Following RCBR, we define the following new parameter vector:

( ) (1

, , * , , * , , * , , *

* *

0( 1

( ) ( ) ( ) ( )

0( 1)

( 1) ,1 ( 1)0

r

rr

r RE i RE i RE i RE i

r

E Eϕ ϕ ϕ δ

δ δν ν

θ δω ω θ θ θ θψ ψ

δ δν ν ν ν δ

δω ω δψ ψ

−

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟≡ = + −⎜ ⎟ ⎜ ⎟ ⎡ ⎤⎜ ⎟′⎣ ⎦⎝ ⎠⎜ ⎟ ⎜ ⎟

⎝ ⎠⎝ ⎠⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟− − −⎜ ⎟ ⎜ ⎟ ⎜ ⎟= + − =⎜ ⎟ ⎜ ⎟ ⎜ ⎟+ −⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠

where ( , , )ϕ ν ω ψ ′= , and E(•) is computed assuming . This parameterization is chosen to

secure that when

UNDoH

*θ θ= , *rθ θ= . The reparameterization requires the knowledge of the true

variance of itε , *ν . However, this problem can be resolved by replacing *ν by rν .

The above reparameterization means that we retain δ andψ in , ( )RE i θ , but treat ω and

ν as the functions of δ and the new parameters rω and rν :

( 1); ( 1)r r r rω ω δ ν ν ν δ ν= − − = + − = δ . (16)

14

Let ( , )r rθ δ ϕ′ ′= , ( , ,r r r )ϕ ν ω ψ′ = and , ,( ) ( , ( , ), ( , ), ).RE i r RE i r rθ δ ν δ ν ω δ ω ψ= Under this

reparameterization, ,r o *θ θ= whenever *oθ θ= . In addition, the reparameterization is designed

to have

. , , * , , * , , * * , , *( ) ( ) ( ) ( ) 0RE i RE i RE i RE iδ δ ω νθ θ θ ν θ= − + =

This reparameterization is necessary because the RCBR approach is basically for the cases where

a first derivative of a log-likelihood function equals (not asymptotically, but exactly) zero. The

asymptotic distributions of the ML estimators of ω andν are different from those of the ML

estimators of rω and rν , although the former can be derived from the latter (see RCBR).

However, the distribution of the ML estimator of δ can be directly obtained from that of the ML

estimator from the reparameterized model.

Define:

( ),1 2i i is s s ′= ; ; ,1 , , *( ) / 2i RE is δδ θ= , ,,2 *( )rRE iis ϕ θ= ;

; ( ) [ ], , 1,i i ijE s s i j′ϒ = = ϒ = 2 1 [ ]ij−ϒ = ϒ .

To use the RCBR approach, we need to check (i) whether or not , , *( )RE i δδ θ equals zero or is

linearly related to , , *( )RE i δ θ and , and (ii) whether or not,2is , , *( )RE i δδδ θ is a linear combination

of . We find that onlyis , , *( )RE i δ θ equals zero, not , , *( )RE i δδ θ , and that , , *( )RE i δδ θ is not a linear

combination of , , *( )RE i δ θ and . We also find that,2is , , *( )RE i δδδ θ is not a linear combination

of *( )is θ . Based on this finding, we obtain the following result:

Proposition 6: Let 1 2( , )Z Z Z ′= denote a mean-zero normal vector with 1( )Var Z −= ϒ , where

1Z is a scalar and 2Z is a 3×1 vector. Let ˆ ˆ ˆ( , )r rθ δ ϕ ′ ′= be the ML estimator that maximizes

1 , ( )Ni RE i θ=Σ . Assume that holds; that is,UND

oH ( , , ) (1,1,0)o o oδ ω ψ ′ ′= . Then,

(17) 1/ 21/ 41

1 121 11 11/ 22 12*

ˆ 0( 1)( 1) 1( 0) 1( 0),( )ˆ( )

B

dr

ZN Z ZZ ZZN

δϕ ϕ −

⎛ ⎞ ⎛ ⎞− ⎛ ⎞−→ > +⎜ ⎟ ⎜ ⎟ ⎜ ⎟− ϒ ϒ− ⎝ ⎠⎝ ⎠⎝ ⎠

<

where * *( ,1,0)ϕ ν ′= , “→d” means “converges in distribution,” and B is a Bernoulli random

variable with success probability equal to half and independent of Z. The Likelihood Ratio

15

(LR) statistic for is a mixture of and with the mixing probability equal to one

half. The LR test for the hypothesis that

UNDoH 2 (2)χ 2 (3)χ

1oδ = is a mixture of and zero with the mixing

probability equal to one half.

2 (1)χ10

Several comments follow from Proposition 6. First, the ML estimator r̂θ is not normal

even asymptotically. Since ˆRE

ˆδ δ= , the above theorem indicates that the convergence rate of

ˆREδ is N1/4. The result also implies that the ML-based t-test for the hypothesis of 1oδ = would

not be properly sized.11 Second, the probability that the ML estimator ˆREδ is equal to the true

value of δ converges to half as . This implies that the asymptotic distribution of the ML

estimator

N → ∞

ˆREδ will be spiked at 1δ = when *oθ θ= . Third, the two LR test statistics from the

reparameterized log-likelihood , ( )RE i rθ are the same as the LR test statistics directly obtained

from the original log-likelihood , ( )RE i θ . Thus, using the LR statistics for testing , we do

not need the parameterization (16). Fourth, we can think of the LR statistic for testing the joint

hypothesis that

UNDoH

1oδ = and 0oω = . It can be shown that when *oθ θ= , this test statistic follows

a mixture of and .2 (1)χ 2 (2)χ 12

Finally, we can obtain the similar results for the HPT estimator. The unit-root

hypothesis (15) implies that 1oδ = and , 0HPT oω = . The HPT-based LR test for this joint

hypothesis follows a mixture of and . 2 (2)χ 2 (1)χ

10 If the means of the and0iy iα are non-zero, we need to include an intercept term, say a, in the log-likelihood function replacing 1 0( )i iy yψΔ − by 1 0( i iy y a)ψΔ − − . Then, the unit-root hypothesis (15) implies a = 0 in addition to the restrictions in θ*. For this case, the LR statistic for testing all of the restrictions implied by (15) is a mixture of and . 2 (3)χ 2 (4)χ11 We may consider an alternative Wald-type test statistic, 4 11ˆ( 1) /REN δ − ϒ . Proposition 6 implies that under

, this statistic follows a mixture of and zero. UNDoH 2 (1)χ

16

12 These LR tests are two-tail tests. When T is fixed, the RE likelihood function is well defined in the neighborhood of 1δ = . Thus the RE ML estimator is not subject to the boundary problem raised by Andrews (1999), and the hypothesis of 1oδ = can be tested against the two-tail alternative hypothesis of 0 1δ ≠ . This justifies use of the LR tests. However, some one-tail alternatives of the LR tests would be more powerful since oδ is unlikely to be greater than one, although we do not investigate them here.

3.2. Random Walks with Homogenous Drifts We now consider the ML estimation of the dynamic panel data model with homogeneous trends,

and the LR test for the hypothesis of random walk with homogeneous drifts. Specifically, we

consider the following model:

17

) , 1 (it i t i ity y tδ β α ε−= + + + . (18)

We assume that 0( , , )i i iy α ε ′ ′ satisfies the RE assumption (2), but we now allow the initial

values 0iy and the effect iα to have nonzero means. Assume that 0 1 2( | )i i iE y y 0α γ γ= + .

Then, similarly to what we have done from (1) to (3), we can transform (18) into

01

( 1) 1 ( 1) 1 , 1 1

1 0 0,

0 0 1ii i

T T i Ti i

yy uyy a

ψ δεβ− × − × − −

Δ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ΔΔ Δ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠

+ (19)

where 1a γ β= + , 2( 1)ψ δ= − + γ 1iand 0( ( | ))i i i iu E yα α ε= − + . The individual log-density

function for this model (conditional on 0iy ) is given:

( ) ( )

( )

,

1, 1 1 1 , 1 1

2

1 0 1 , 1 1

1( ) ln(2 ) ln( ) ln( )2 2 21 1 1

2

( ) ( 1 ) ,2

D DRE i T

i i T T i i T

i i T i i TT

T T

y y B y y

T y y a k y y

φ π ν ξ

δ β δ βν

ψ δ βνξ

−− − − − −

− − −

= − − −

′− Δ − Δ − Δ − Δ −

′− Δ − − + Δ − Δ −

(20)

where ( , , , , , )D aφ δ ν ω ψ β ′= , and 1T Tξ ω= + , as before.

Suppose that data are generated by the following trend-stationary process:

it i ity gt qη= + + ; , 1it i t itq qδ ε−= + , 0,...,t T= , (21)

where 1~ (0 , )i T TIN ν× i,ε η and are also normally distributed, and all of0iq iε , iη and are

mutually independent. Assume that

0iq

0 1 2( | )i i iE y y 0η τ τ= + . This data generation process is

the same as the process (18), if we set:

(1 )i iα δ η= − ; 2(1 )ψ δ τ= − ; 1(1 )a gδ τ= − + ; (1 )gβ δ= − .

However, there exists an important difference between (18) and (21). In the former, we do not

assume that the initial values 0iy are distributed around the effect iα . Thus, we do not impose

any restriction on , 0cov( , )oy iyα iσ α= . Without any restriction on ,oy ασ , the parameter vector

φ D is orthogonal to the variance (0

2yσ ) of the initial values 0iy . Thus, the estimation of Dφ

based on the conditional log-density (20) is equivalent to the joint estimation of Dφ and0

2yσ

based on the unconditional log-density of 0 1( , , ..., ) .i i iTy y y ′Δ Δ However, this is not the case for

(21). The process (21) implies that the 0iy are distributed around /(1 )i iη α= − δ , and that

2, ,(1 ) (1 )

o oy yα η ησ δ σ δ σ= − = − . But this restriction implies that0

2 2(1 ) / yηψ δ σ σ= − , and

0

2 2 2 2 2 2(1 ) ( ) /( ) (1 ) /oy yη η oyω δ σ σ σ σ ν δ ψ ψσ ν= − − = − − . (22)

Observe that ω does not contain any free parameter givenψ , δ ,ν and0

2yσ . This means that any

knowledge of0

2yσ would help to obtain a more efficient estimator of Dφ . This implies that the

estimation of Dφ based on the conditional log-density (20) is not equivalent to the joint

estimation of Dφ and0

2yσ based on the unconditional log-density of 0 1( , , ..., ) .i i iTy y y ′Δ Δ The ML

estimators based on the unconditional density will be more efficient. Of course, these

unconditional ML estimators will be inconsistent if condition (22) is violated.

We now turn to the cases where data are random walks with homogeneous drifts; that is,

1δ = in (21). We can show that the unconditional ML estimators computed with the restriction

(22) are consistent and asymptotically normal.13 Thus, the usual ML theory applies. However,

the conditional ML estimators without the restriction have a different story. The hypothesis of

random walk with a homogeneous drift implies the following restrictions on (20):

* * *: (1, ,0,0, ,0)D D Do oH φ φ ν a ′= ≡ , (23)

where “D” refers to “(homogeneous) drifts”, and *ν and are unrestricted. When this

hypothesis holds, we obtain essentially the same results as Proposition 5. Stated formally:

*a

Proposition 7: . , , * * , , * , , * * , , *( ) ( ) ( ) ( ) 0D D D D D D D DRE i RE i RE i RE iaδ β ω νφ φ φ ν φ− − + =

Proposition 7 implies that the Hessian matrix (not just the information matrix) of the RE

ML estimator of Dφ is singular under DoH . In order to derive the asymptotic distribution of the

ML estimator, we can use the following reparameterization: Define ( , , , , , )Dr r r a rφ δ ν ω ψ β ′= ,

where

1813 The restricted ML estimators are also consistent and asymptotically normal for the models without drifts.

( , )r rvν ν ν δ δ= = ; ( , ) ( 1)r rω ω δ ω δ= − − ; ( , , ) ( 1)r ra aβ β β δ β δ= = − − . (24)

Then, ,0 *D Drφ φ= whenever *

D Doφ φ= . Let

, , ,( ) ( , ( , ), ( , ), , , ( , , ))D D DRE i r RE i r r ra aφ δ ν ν δ ω ω δ ψ β β δ=

so that

. , , * , , * * , , * , , * * , , *( ) ( ) ( ) ( ) ( ) 0D D D D D D D D D DRE i RE i RE i RE i RE iaδ δ β ω νφ φ φ φ ν φ= − − + =

)

With this parameterization, we redefine:

( , , , ,r r r raϕ ν ω ψ β ′= ;

; ; ,1 ,2( , )i i is s s′ ′= ,1 , , *( ) / 2D Di RE is δδ φ= ,2 , , *( )

r

D Di RE is ϕ φ= ; , *( | )D D

i r oVar s φ φϒ = = .

Similarly to Proposition 6, let 1 2( , )Z Z Z ′= denote a 6×1 mean-zero random vector with

, where1( ) ( )Var Z −= ϒ 1Z is a scalar. Then, the ML estimator of Drφ , ˆ ˆ ˆ( , )D

r rφ δ ϕ ′ ′= , has the

same asymptotic distribution as (17). Accordingly, the LR statistic for testing DoH is a mixture

of and , , where B is a Bernoulli random variable with

success probability equal to one half. In addition, under , the LR statistic for testing

2 (3)χ 2 (4)χ 2 (4) (1 ) (3)B Bχ + − 2χ

UDoH

1oδ = is . Similar to the cases of unit-roots without drifts, these LR statistics can be

computed using the original log-likelihood . The reparameterization (24) is not

required.

2 (1)Bχ

1 ,N Di RE=Σ i

)

3.3. Random Walks with Heterogeneous Drifts In this section, we examine the ML estimation of a dynamic panel data model with

heterogeneous trends, and the corresponding LR test for the hypothesis of random walk with

heterogeneous drifts. Under this model the data follow the following process:

, 1 (it i t i i ity y tδ β α ε−= + + + , (25)

As before, the error terms itε are assumed to be normal and i.i.d. over both time and individuals.

The error terms are not correlated with any of iα , iβ and 0iy . The identification of the model

requires that . Assume that3T ≥ ( )i iα β+ and iβ are normal conditional on the initial value 0iy

with conditional means linear in 0,iy and the conditional variance matrix, ,Wν where

19

jhW ω⎡= ⎣ ⎤⎦ (j, h = 1, 2) is a 2×2 matrix.

Moon and Phillips (2004) have proposed a GMM estimator for model (25). In their

study, T is large and the unobservables iα and iβ are nuisance parameters. An interesting

property of the GMM estimator is that its convergence rate is N1/6. In this subsection, we

consider an alternative random-trends treatment.

Define ,2, 1it it i ty y y −Δ = Δ − Δ 2 2 2

3( ,... )i iy y yiT ′Δ = Δ Δ , and 2 2 2, 1 2 , 1( ,... )i i iy y y− −T ′Δ = Δ Δ .

We also define 1 0( ( | ))i i i i i iu E y 1iα β α β= + − + + ε 2i, and 2 0( ( | ))i i i iu E yβ β ε= − + Δ

1

2

i

i

i

u

. Using

this notation, we can transform the model (25) into:

1 1 1 0

2 2 2 0 1 22 2

, 1

i i

i i i

i i

y a yy a y y uy y

ψψ δδ ε−

Δ +⎛ ⎞⎛ ⎞ ⎛⎜ ⎟⎜ ⎟ ⎜Δ = + + Δ +⎜ ⎟⎜ ⎟ ⎜

⎜ ⎟ ⎜⎜ ⎟Δ Δ⎝ ⎠ ⎝⎝ ⎠

⎞⎟⎟⎟Δ ⎠

. (26)

Define:

1 0 0 0 ... 01 1 0 0 ... 0

1 2 1 0 ... 00 1 2 1 ... 0: : : : :0 0 0 0 ... 1

HDT

T T

L

×

⎛ ⎞⎜ ⎟−⎜ ⎟⎜ ⎟−′ = ⎜ ⎟−⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

; , 2

( 2) 20TT

IJ

− ×

⎛ ⎞= ⎜ ⎟

⎝ ⎠

such that . Then, the variance matrix of the error vector

in (26),

21 1 2( ,..., ) ( , , )HD

T i iT i i iL y y y y y′ ′Δ Δ = Δ Δ Δ ′ ′

21 2( , , )i i iu u ε ′ ′Δ , is of the form:

( )( )HD HD HDT T T TW L L J WJν ν ′

T′Ω = + .

With this variance matrix, the log-density of 21 2( , ,i i iy y y )′ ′Δ Δ Δ conditional on 0iy is given:

,

1 1 1 1 1 11

2 2 2 0 1 2 2 2 0 12 2 2 2

, 1 , 1

1 1( ) ln(2 ) ln( ) ln{det[ ( )]}2 2 2

1 [ ( )]2

( ) ( )

HD HD HDRE i T

i io i ioHD

i i i T i i

i i i i

T W

y a y y a y,iy a y y W y a y y

y y y y

φ π ν

ψ ψψ δ ψ δ

νδ δ

−

− −

= − − − Ω

′⎛ ⎞ ⎛ ⎞Δ − − Δ − −⎜ ⎟ ⎜ ⎟− Δ − − − Δ Ω Δ − − − Δ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟Δ − Δ Δ − Δ⎝ ⎠ ⎝ ⎠

(27)

where 11 12 22 1 2 1 2( , , , , , , , , )HD a aφ δ ν ω ω ω ψ ψ ′= . An alternative representation of the log likelihood

20

function, which is computationally more convenient, is given in the appendix.

Suppose now that the data follow the unit-root process with heterogeneous drifts:

it i ity β εΔ = + . This process implies the following five restrictions on HDφ :

* 11,* 1,* 1,* *: (1, , , 0, 0, , 0, , 0)HD HD Ho oH Daφ ν ω ψ φ′= ≡ , (28)

where “HD” refers to “heterogeneous drifts,”, and *ν , 11,*ω , 1,*ψ , and are unrestricted (1,*a 1oδ = ,

12, 22, 0o oω ω= = , and 2, 2, 0o oaψ = = ). Similar to the cases of unit-roots without drifts and with

homogeneous drifts, we can show that under HDoH , the information matrix of the log-density (24)

is singular. Stated formally:

Proposition 8: At *HD HDφ φ= ,

2 2 11 12, , 1 , , 1 , , , , 11 , , 11 , ,( 1)HD HD HD HD HD HD

RE i RE i a RE i RE i RE i RE iaδ ψ ν ωψ ν ω ω− − + − + − 0.ω =

Proposition 8 implies that similarly to Proposition 6, we can derive the asymptotic

distribution of the ML estimator of HDφ . The LR statistic for testing HDoH is a mixture of

and , . The ML estimator of

2(5)χ

2 (4)χ 2 (5) (1 ) (4)B Bχ + − 2χ δ is N1/2-consistent. This

convergence rate is contrasted to N1/6 of the GMM estimator of Moon and Phillips (2004). We

are unable to identify the source of their different convergence rates. Indeed, their convergence

rates are obtained under different settings: large N and fixed T for the ML estimator; and both

large N and T for the GMM estimator. The two estimators could be better compared by

investigating the asymptotic distribution of the ML estimator as T tends to infinity. We will

leave this comparison to a future study.

4. Monte Carlo Experiments In this section we consider the finite-sample properties of the RE and HPT ML estimators. We

here consider the RE and HPT ML estimators only.14 Alvarez and Arellano (2004) compare the

efficiency gains of the RE estimator over the HPT and Lancaster estimators through calculations

of population asymptotic variances. Thus, our results partly supplement theirs. We also

2114 Some simulation results on the Lancaster estimator are available upon request from the authors.

investigate the size and power properties of the t-test and the Likelihood Ratio (LR) test based on

the RE and HPT estimators. For our experiments, we consider three cases: the ML estimation

without trend, with homogeneous trends, and with heterogeneous trends.

4.1. ML Estimation without Trend

22

2

In this subsection, we consider the finite-sample properties of the RE and HPT estimators

computed with the assumption of no trend. The foundation of our Monte Carlo experiments

here is the stationary data generating process (11), where ~ (0, )i N ηη σ , and 20 0~ (0, )i qq N σ

~ (0, )i Nε ν 2σ = 2. We set ,2η qoσ = 2, 1ν = , and T = 5. We try four different values, 0.5, 0.8,

0.9 and 1, for oδ . Observe that when 1o = , the itδ y are random walks without drifts. For each

trial, we use 10,000 iterations. We consider two cases, N = 500 and 100, to examine both the

large-sample and small-sample properties of the estimators. The true value of ω must be non-

negative. We did not have to impose this restriction in our simulations because we always

obtained positive estimates for ω .

Table 1 reports the bias and MSE (mean square error) of the RE ML estimator of δ . We

also report the finite-sample size and power properties of the t-tests based on the RE ML

estimation. For all true values of δ ( oδ ), the biases of the ML estimator are small, even when N

= 100. MSE generally increases with oδ when 0.9oδ ≤ , but it slightly decreases as oδ increases

from 0.9 to one. When oδ is small, the t-test is properly sized. However, as oδ increases, the

test tends to over-reject correct hypotheses. Not surprisingly, the degree of the over-rejections

of the t test is inversely related to the sample size.

The power of the t-test to reject correctly the unit-root hypothesis increases with N. But,

when N = 100, the power of the t-test is generally low especially when 0.8 1oδ≤ < . When

1oδ = , the t test rejects the correct unit-root hypothesis too much. This trend is not dependent

on the sample size. This result is consistent with Proposition 6. The HPT estimator has

similar properties as the RE estimator.

Figures 1-2 show the finite-sample distributions of the sampling errors of the RE

estimator (that is, ( RE o )δ δ− ). Figure 1 is for the case with N = 500 and Figure 2 is for the case

with N = 100. In either figure, the distribution of the RE estimator becomes wider as δo

increases. Up to 0.9oδ = , the estimator is roughly normally distributed. Then, when 1oδ = ,

the estimator is no longer normally distributed. Its distribution has a hump near 1δ = as

Proposition 6 suggests.

Figures 3 compares the sampling error distributions of the RE and HPT ML estimators

when N = 100. Somewhat surprisingly, the two estimators are similarly distributed. At our

choice of parameter values, the efficiency gains of the RE estimator over the HPT estimator are

not substantial. However, we find from unreported experiments that the efficiency gain of the

RE estimator increases with the conditional variance of 0iy given the individual effects iη ( 20qσ

in our set-up).

We now examine the finite-sample size and power property of the LR test for the

hypothesis of unit root without drifts. The parametric restrictions we test are given

( , , ) (1,0,0)o o oδ ω ψ ′ = ′ . Table 2 reports the rejection rates of the LR test based on three different

distributional assumptions, and their mixture. For sensitivity analysis, we also

report the results we obtain with many different combinations of the values of

2 (2),χ 2 (3)χ2ησ , 2

qoσ , andν .

For our base choice of , , and2 2ησ = 2 2qoσ = 1ν = , when N = 500 and the normal size is 5%, the

rejection rates of the LR test with the , and their mixed distributions are 3.80%,

8.54%, and 5.40%, respectively. Thus, the LR test with the mixed distribution is reasonably

well sized. Even when N = 100, the LR test with the mixed

2 (3)χ 2 (2)χ

2χ distribution performs well. It

tends to over-reject the unit root hypothesis, but the size of this distortion not substantial, clearly

smaller than that of the t-test reported in Table 1.

Table 2 also reports the finite-sample power property of the LR test to reject the unit-root

hypothesis. The power of the LR test depends on the sample size (N), the true value of δ ( oδ ),

and the relative sizes of the conditional variance of the initial values of (y 20qσ ) and the variance

of the random noises (ν ). The power is positively related to the sample size and the

conditional variance of the initial values of (conditional on the unobservable effectsy iη ), while

it is negatively related to the variance of random noises and the true value of δ . To be more

specific, consider the case of N = 500. When oδ = 0.9, the power of the test is 100% or very

23

close to it for all of the cases reported in Table 2. Even the true value of δ is closer to one

( 0.9o 5δ = ), the power remains reasonably high, when the conditional variance is relatively

larger than the variance of the random noises. If we compare our base case of 2 2( , , )qoησ σ ν =

(2,2,1) with the cases of 2 2( , , )qoησ σ ν = (2,1,1) and 2 2( , , )qoησ σ ν = (2,2,4), we can see that the

power of the LR test is negatively related to the ratio ofν and 2qoσ . The power is somewhat low

when the ratio is large, especially when 1% of significance level is used for the test. But, the

power of the LR test using 10% of significance level is greater than 89% for all of the cases

reported in Table 2.

We now consider the power of the LR test when N = 100. Clearly, the power is lower

when N = 100 than when N = 500, especially when the ratio ofν and 2qoσ is large and the true of

δ is close to one. But, when the ratio ofν and 2qoσ is small, the power of the test is reasonably

high, even if the true value of δ is very close to one (see the case of 2 2( , , )qoησ σ ν = (2,4,1)). For

dynamic models, we can expect that the ratio ofν and 2qoσ should be low because the initial

values of is the weighted sum of the past random noises. The ratio would be even lower

when the true value of

y

δ is close to one. Thus, we can guess that for quite general cases, the LR

test would have high power even in small samples.

Table 3 reports the finite-sample performances of the LR test for unit root based on the

HPT estimator. Comparing Tables 2 and 3, we can see that the LR test based on the HPT

estimator is generally better sized than the test based on the RE estimator, but only marginally.

In terms of power, the test based on the RE estimator is superior. In particular, the power gain

by using the RE estimator instead of the HPT estimator is substantially large when the variance

of the initial values 0iy ( 2qoσ ) is large compared to the variance of random noises itε (ν ).

We also consider the finite-sample properties of the LR test for the single hypothesis of

1oδ = . Tables 4 and 5 reports the size and power of the LR test based on the RE and HPT

estimators, respectively. Comparing Tables 2 and 4 (and/or Tables 3 and 5), we can see that the

LR tests for the multiple restrictions ( , , ) (1,0,0)o o oδ ω ψ = are generally better sized than the LR

tests for the single restriction of 1oδ = . The former tests also have much higher power. Thus,

it is desirable to test the multiple restrictions implied by unit-root process jointly instead of 24

testing a single restriction of unit root.

The general findings from our first Monte Carlo experiments can be summarized as

follows. First, when the true value of δ is near one, the RE and HPT estimators do not follow

usual normal distributions even if the sample size is large. This finding is consistent with our

result in Proposition 6. Second, when the data follow the random walks without drifts, the t-

tests based on the ML estimators reject correct unit-root hypothesis too often. In contrast, both

the LR tests based on the RE and HPT estimators perform well. Even when N is small, they are

sized properly and have good power to reject the unit root hypothesis. Thus, use of the LR tests

would be a useful alternative way to test unit roots. Third, we find that in terms of the power of

the LR test, use of the RE estimator instead of the HPT estimator is desirable especially when the

initial values have a large variance. Finally, the LR test for the single restriction of 1oδ = has

low power. Use of the LR test for the multiple restrictions implied by the unit-root hypothesis

is recommended.

4.2. ML Estimation with Homogenous Trends We now consider the LR test for unit root based on the RE estimation with homogeneous trends.

For simulations, data are generated following (21). The parametric restrictions we test are

given : ( , , , ) (1,0,0,0)o o o o oH δ ω ψ β ′ = ′ . Table 6 reports the rejection rates of the LR test based

on three different distributional assumptions, , and their mixture. Again as with

Table 2, we also report the results we obtain with different combinations of the values of

2 (3)χ 2 (4)χ2ησ ,

2qoσ , andν . For all simulations, the trend parameter g in (21) is fixed at 0.5. The choice of

different values of g does not alter our simulation outcomes reported in Table 6. The results

reported in Table 6 are very comparable to those in Table 2. The introduction of homogeneous

drift does not lead to a large change in the size or power of the LR test. For our base choice of

, , and2 2ησ = 2 2qoσ = 1ν = , when N = 500 and the normal size is 5%, the rejection rates of the

LR test with the , , and their mixed distributions are 3.91%, 7.67%, 5.11%. Thus,

the LR test with the mixed distribution is again reasonably well sized. In addition, the test has a

power to reject unit root even when

2 (4)χ 2 (3)χ

oδ is close to one. As with the no-drift simulations when N

= 100, the LR test with the mixed 2χ distribution performs well. It tends to slightly over-reject

25

the unit root hypothesis but retains a good power property even when N is small, especially when

the variance of the initial values 0iy is large.

4.3. Random Walks with Heterogeneous Drifts In this subsection, we examine the finite-sample properties of the RE ML estimator computed

under the assumption of heterogeneous trends. The data generating process we use is given:

26

it i i ity g t qη= + + , 1it i t itq q; δ ε− + , 0,...,t T= , =

where 1~ (0 , )i T TIN ν× i,ε η and are also normally distributed with zero means, and all of 0iq

iε , iη , and are mutually independent. The mean of the normally distributed trends g0iq ig i is

fixed at 0.5 while their variance is set at three different values, 0, 0.5, and 1. The data are

generated with ,2 2ησ = 2 2,qoσ = 1ν = , our default case from the previous two experiments. As

before we generate data with various values of oδ . Note that when 1oδ = and 2gσ > 0, the data

are random walks with heterogeneous drifts. When 1oδ = and 2gσ = 0, the data are random

walks with homogeneous drifts. As with the pervious experiments we use 10,000 iterations and

use two sample size values: N = 100 and N = 500.

Table 7 reports the bias and MSE of the RE ML estimator of δ . The table also reports

the finite-sample size and power of the t-tests based on the RE ML estimation. Although the

bias and MSE of the estimates of δ are generally small, both increase with oδ and decrease with

N. As with the simulations in section 4.1, when 0.5oδ = the t-test is both properly sized and

powered for all values of 2gσ . Also similarly to the previous simulations, as oδ increases the t-

test starts to over-reject the true hypothesis and under-reject the incorrect hypothesis, although

the degradation is less when N = 500 than when N = 100.

Table 8 reports the finite-sample size and power of the LR test for the multiple

restrictions . Observe that trends become homogeneous

when . Since we use

12, 22, 2, 2,( , , , , ) (1,0,0,0,0)o o o o oaδ ω ω ψ ′ = ′

2 0gσ = 2 2( , , )qoησ σ ν = (2,2,1) to generate the results in Table 8, the data

generating processes are the same for the case of 2 0gσ = in Table 8 and the case of 2 2( , , )qoησ σ ν

= (2,2,1) in Table 6. Comparing these two cases, we can see that the size and power properties

of the LR test based on the ML estimation with heterogeneous trends are quite comparable to

those of the test based on the ML estimation with homogeneous trends. Similarly to the cases

of homogeneous trends, mixed χ2 distribution gives rejection rates close to 1%, 5% and 10% at

the 1%, 5%, 10% of normal sizes when 1oδ = . The test with distribution under-rejects

the correct hypothesis while the test with distribution over-rejects it. The test with mixed

2 (5)χ

2 (4)χ

2χ tends to over-reject the unit root hypothesis, but in much smaller scales than the t-test does as

reported in Table 7. The power of the LR test reported in Table 8 is also comparable to that of

the test based on the ML estimation with homogeneous trends. It appears that the power of the

LR test allowing heterogeneous trends is not sensitive to the degree of the heterogeneity in trends.

Finally, similarly to the cases with random walk without drifts, Table 9, with Table 8,

shows that the LR test for the single restriction of 1oδ = has low power. With or without drifts,

use of the LR test for the multiple restrictions implied by the unit-root hypothesis is

recommended.

5. Concluding Remark This paper has considered the asymptotic and finite-sample properties of a random effects ML

estimator and the two fixed-effects ML estimators of Lancaster (2002) and Hsiao, Pesaran and

Tahmiscioglu (2002, HPT). When data are stationary, the random effects ML estimator is

asymptotically more efficient than the other two fixed effects ML estimators when both the

individual effects and the initial observations are normal. We also consider the asymptotic

distributions of the random effects and HPT ML estimators when data contain unit roots. The

two estimators, as well as the Lancaster estimator, are non-normal. This distortion results

because the information matrices of the estimators are singular when data contain unit root.

Thus, the t-tests for unit root are inappropriate. Consistent with the prediction of Rotnitzky,

Cox, Bottai and Robins (2000), the Likelihood-Ratio tests for unit root with the p-values from

the mixed chi-square distributions perform much better than the t-tests. They also have good

power properties even if the number of observations is small.

Our results depend on the assumption of homoskedastic random errors ( itε ). Alvarez

and Arellano (2004) have shown that when the errors are heteroskedastic or autocorrelated over

time, the ML estimator computed incorporating such heteroskedasticity and autocorrelations is

generally consistent and asymptotically normal even if data may follow random walks.

27

28

However, Thomas (2005) found that the finite-sample distribution of the ML estimator

considerably deviates from the normal distribution, if data follow unit root processes and the

error variances change only slowly over time. In addition, there are some special cases in

which the information matrix of the ML estimator becomes singular. An example is the case in

which only the variance of the random error at the last time period is different from the variances

at other time periods. Another example is the case in which the random errors follow a MA(1)

process with the MA coefficient equal to one (Thomas, 2005). We are currently working on

such special cases.

Our results also convey a message to the studies of Monte Carlo experiments on dynamic

panel data models. The homoskedasticity assumption is not an innocuous simplifying

assumption for the models. The asymptotic and finite-sample properties of the estimators

developed for the models could crucially depend on whether or not the homoskedasticity

assumption holds.

Appendix

29

P T − Proof of Equation (12): Define 11 1T T T′= T T TQ I P, − ; and =

1

( 1)

1 1 0 ... 0 00 1 1 ... 0 0: : : : :0 0 0 ... 1 00 0 0 ... 1 1

T

T T

D −

− ×

−⎛ ⎞⎜ ⎟−⎜ ⎟

′ ⎜ ⎟=⎜ ⎟⎜ ⎟⎜ ⎟−⎝ ⎠

,

where 1T is a (T×1) vector of ones. Observe that 1 1( ,..., )T i iTD y y y− i′ ′ = Δ . Notice that 11T TD −′ =

, Thus, by Rao (1973, p. 77), we have( 1) 10 T − ×1

1 1T T T TQ D B D−1− − −′= . Also, observe that

10 1 0 1( )T

i t it i i Ty T y y y k y−= −′Δ ≡ Σ − = Δ + Δ1 i . Similary, 0 , 1 1 , 1i T iy k y− −′Δ = Δ − . Using these results, we

obtain

0 0 , 1 0 0 , 1

0 0 , 1 0 0 , 1

0 0 , 1 0 0 , 1

2, 1 , 1 0 0 , 1

1, 1 1

( 1 ) ( 1 )( 1 ) ( 1 )

( 1 ) ( 1 )

( ) ( ) ( )

( ) (

i i i T i i i T

i i i T T i i i T

i i i T T i i i T

i i T i i i i i

i i T

y y p y y py y p Q y y p

y y p P y y p

y y Q y y T y y p

y y B y

δ δδ δ

δ δ

δ δ δ

δ

− −

− −

− −

− − −

−− −

′Δ − Δ − Δ − Δ −

′= Δ − Δ − Δ − Δ −

′+ Δ − Δ − Δ − Δ −

′= − − + Δ − Δ −

′= Δ − Δ Δ 2, 1 1 1 , 1) ( ( ) )i i i T i i i .y T y k y y pδ δ− − −′− Δ + Δ + Δ − Δ −

Proof of Proposition 1: A straight algebra shows

( ) ( )

( ) ( )

,

1 2, 1 1 , 1/ 2 / 2

21/ 2 1/ 2 1/ 2

1, 1 1 , 1/ 2 / 2

1/ 2 1/ 2

( , , ) ( | , )

1 1exp ( )(2 ) 2 2

1 1exp(2 ) ( ) 2

1 1exp(2 ) 2

1(2 )

FE i i i HPT

i i T i i i iT T

iHPT HPT

i i T i iT T

f p f p

Ty y B y y p gv

p

y y B y yv

v

δ ν ω ν

δ δπ ν

π ν ω νω

δ δπ

π

−− − −

−− − −

⎡ ⎤′= − Δ − Δ Δ − Δ −⎢ ⎥⎣ ⎦⎛ ⎞

× −⎜ ⎟⎝ ⎠

⎡ ⎤′= − Δ − Δ Δ − Δ⎢ ⎥⎣ ⎦

×

−

2 21/ 2

1exp .( ) 2 2 2i i i i

HPT HPT

T T Tp p p g gω ν νω ν ν

2i

⎡ ⎤− − + −⎢ ⎥

⎣ ⎦

where 1 1 , 1(i i T i ig y k y y )δ−′= Δ + Δ − Δ − . We can also show:

2 2 21/ 2 1/ 2 1/ 2

21/ 2, ,1/ 2

, 1/ 2 1/ 2 1/ 2,

2

,

1 1exp(2 ) ( ) 2 2 2

( )( ) exp

(2 ) ( ) 2

exp .2

i i i i iHPT HPT

HPT T HPT T HPTHPT T i i

HPT HPT HPT T

iHPT T

T T Tp p p g gv

Tp gv

T g

π ω ν νω ν ν

ξ ξ ωξπ ω νω ξ

νξ

−

⎡ ⎤− − + −⎢ ⎥

⎣ ⎦⎡ ⎤⎛ ⎞⎢ ⎥= − ⎜ ⎟⎜ ⎟−⎢ ⎥⎝ ⎠⎣ ⎦

⎡ ⎤× −⎢ ⎥

⎢ ⎥⎣ ⎦

Thus,

,

1 2, 1 1 , 1/ 2 / 2 1/ 2

, ,

21/ 2, , 2

1/ 2 1/ 2 1/ 2,

( , , ) ( | , )

1 1exp ( ) ( )(2 ) ( ) 2 2

( )exp

(2 ) ( ) 2

FE i i i i HPT i

i i T i i t iT THPT T HPT T

HPT T HPT T HPTi i

HPT HPT HPT T

f p f p dp

Ty y B y y g

Tp g

δ ν ω ν

δ δπ ν ξ ν νξ

ξ ξ ωπ ν ω νω ξ

∞

−∞

−− − −

∞

−∞

⎡ ⎤′= − Δ − Δ Δ − Δ −⎢ ⎥

⎣ ⎦⎡ ⎤⎛ ⎞⎢ ⎥× − −⎜ ⎟⎢ ⎥⎝ ⎠⎣ ⎦

∫

1 2, 1 1 , 1/ 2 / 2 1/ 2

, ,

,

1 1exp ( ) ( )(2 ) ( ) 2 2

( | ).

i

i i T i i t iT THPT T HPT T

HPT i i HPT

dr

Ty y B y y g

f r

δ δπ ν ξ ν νξ

θ

−− − −

⎡ ⎤′= − Δ − Δ Δ − Δ −⎢ ⎥

⎣ ⎦=

∫

The following lemmas are useful to prove the propositions in Section 3. The first two

lemmas provide alternative forms of ( )T ωΩ and 1[ ( )]T ω −Ω .

Lemma A.1: Let ( )s ωΩ be the s×s (s = 2, ..., T) matrix of the form of ( )T ωΩ in Section 2. Let:

1 0 0 ... 0 01 1 0 ... 0 0

0 1 1 ... 0 0: : : : :0 0 0 ... 1 00 0 0 ... 1 1

s

s s

L

×

⎛ ⎞⎜ ⎟−⎜ ⎟⎜ ⎟−

′ = ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

−⎝ ⎠

;

1000:0

s

s s

c

×

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟

= ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

.

Then, ( )s s s s sL L c cω ω′ ′Ω = + .

Lemma A.2: 1 1[ ( )] ( )1T T T

TL L k kT T T

ωωω

− − ′′Ω = −+

, where 1(1, )T Tk k −′ ′ ′= .

30

31

1TLemma A.3: 1 11 1 1 1( )T T T TB D D Tm m− −

− − − −′ ′= − − , where 11 (1,2,..., 1)Tm T T−

− = − and 1TD − is the

square matrix of the first (T-1) columns of 1TD −′ . In addition, ( )11 ( 1)( 1) /Ttrace B T T−

− = − + 6

)

;

and . ( )1 1 ( 1)(2 1) /(6T Ttrace k k T T T− −′ = − −

Lemma A.4: The first-order and second-order derivatives of , ( )RE i θ are given:

1, , , 1 1 , 1 , 1 1

1 ( ) (RE i i T i i i T iT

Ty B y y y k dδ , )δ ψ δν νξ

−− − − − −′ ′= Δ Δ − Δ + Δ ;

2

2, , 2 ( ( , ))

2 2RE i iT T

T T dω ψ δξ νξ

= − + ;

( ) ( ) ( )21, , , 1 1 , 12 2

1 ( , ) ;2 2 2RE i i i T i i i

T

T Ty y B y y dν δ δν ν ν ξ

−− − −

′= − + Δ − Δ Δ − Δ + ψ δ

, , 0 ( , )RE i i iT

T y dψ ψ δνξ

= ;

1, , , 1 1 , 1 , 1 1 1 , 1

1RE i i T i i T T i

T

Ty B y y k k yδδ ν νξ−

− − − − − − −′ ′= − Δ Δ − Δ Δ′ ;

1, , , 1 1 , 1 , 1 12 2

1 ( ) (RE i i T i i i T iT

Ty B y y y k dδν , )δ ψ δν ν ξ

−− − − − −′ ′= − Δ Δ − Δ − Δ ;

2

, , , 1 12 ( , )RE i i T iT

T y k dδω ψ δνξ − −′= − Δ ; , , , 1 1 0RE i i T i

T

T y k yδψ νξ − −′= − Δ ;

( )21, , , 1 1 , 12 3 3

1 ( ) ( ) (2RE i i i T i i i

T

T Ty y B y y dνν , )δ δ ψν ν ν ξ

−− − −′= − Δ − Δ Δ − Δ − δ ;

( )2

2, , 2 2 ( , )

2RE i iT

T dνω ψ δν ξ

= − ; , , 02 ( , )RE i i iT

T d yνψ ψ δν ξ

= − ;

( )2 3

2, , 2 3 ( , )

2RE i iT T

T T dωω ψ δξ νξ

= − ; 2

, , 02 ( , )RE i i iT

T d yλψ ψ δνξ

= − ; 2, , 0( )RE i i

T

T yψψ νξ= − ,

where 1 0 1 ,( , ) ( )i i i T id y y k y 1iyψ δ ψ δ− −′= Δ − + Δ − Δ .

Lemma A.5: Under the RE assumption,

( )

( )0

1 2 2 1, 1 1 , 1 1 1 1 1 1

11 1 1, 1

( ) ( )

( ) ( )

i T i y T T o T T o T

T T o T o T o

E y B y c H B H c

trace B H H

ψ σ δ δ

ν δ δ

− −− − − − − − − −

−− − − −

′ ′ ′Δ Δ =

′+ × Ω; (A.1)

( )

( )0

2 2, 1 1 1 , 1 1 1 1 1 1 1

1 1 1, 1 1

( ) ( )

( ) ( )

i T T i y T T o T T T o T

o T T o T o T o T

E y k k y c H k k H c

trace k H H k

ψ σ δ δ

ν δ

− − − − − − − − − −

δ− − − −

′ ′ ′ ′ ′Δ Δ =

′′+ × Ω −

; (A.2)

( )1, 1 1 ( )i T i o oE y B bε ν δ−− −′ ′Δ Δ = − ; (A.3)

( )( ) ,, 1 1 1 0 1 , 1( ) o T o

i T i i T i i o oE y k y y k y y bT

( )ν ξ

ψ δ− − − −′ ′Δ Δ − + Δ − Δ = δ ′

1) o

; (A.4)

1, 1 1 , 1( ) ( ) (i i o T i i oE y y B y y Tδ δ−− − −′⎡ Δ − Δ Δ − Δ = −⎣ ν⎤⎦ ; (A.5)

2 ,1 0 1 , 1[( ( )) ] T o o

i o i T i i oE y y k y yT

ξ νψ δ− −′Δ − + Δ − Δ = , (A.6)

where ,0 1T oTξ ω= + , 1, 1( )T o T oω− −Ω = Ω , ( )b δ is defined in (13), ( ) /b db dδ δ′ = , and

13 4 5

2 3 4( 1) ( 1)

1 0 0 ... 01 0 ... 0

( ) : : : :... 0... 1

TT T T

T T TT T

Hδ

δδ δ δδ δ δ

−− − −

− − −− × −

⎛ ⎞⎜ ⎟⎜ ⎟

′ ⎜ ⎟=⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

.

Proof: Define , 1 2 , 1( , , ..., )i T i i i Th u ε ε− −= Δ Δ , so that , 1( )i T o T oVar h ,ν −= Ω , and

, 1 1 1 0 1 , 1( ) ( )i T o T o i T o iy H c y H h Tδ ψ δ− − − − −′ ′Δ = + .

Then,

( )( )

( )( )0

0

0

1, 1 1 , 1

2 2 1 11 1 1 1 1 , 1 1 1 1 , 1

2 2 1 11 1 1 1 1 1 1 1 , 1 , 1

2 21 1 1

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( )

i T i

y T T o T T o T i T T o T T o i T

y T T o T T o T T o T T o i T i T

y T T o T

E y B y

c H B H c E h H B H h

c H B H c tr E H B H h h

c H B

ψ σ δ δ δ δ

ψ σ δ δ δ δ

ψ σ δ

−− − −

− −− − − − − − − − − −

− −− − − − − − − − − −

− − −

′Δ Δ

′ ′ ′= +

′ ′ ′= +

′= ( )1 11 1 1 1 1, 1( ) ( ) ( ) .T o T o T T o T o T oH c tr B H Hδ ν δ δ− −

− − − − − −′′ + × Ω

′

′

(A.2)-(A.6) can be obtained by the similar method.

Lemma A.6: Under the RE assumption,

32

( )

0

0 0

2

, ,

2,

, , 2

2, , ,

2 2

, ,

( )0 (

0 02 2

( ) ,( ) 0

2 2

( ) 0 0

o yoo

T o o T o

o T oRE i o

o

T o o T o T o

y yo

o T o o T o

TTbB A b

T T

ETb T T

T Tb

θθ

ψ σδ δξ ν ξ

ν νξθ

δξ ν ξ ξ

ψσ σδ

ν ξ ν ξ

⎛ ⎞′′+⎜ ⎟

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

− = ⎜ ⎟′⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟′⎜ ⎟⎝ ⎠

)

(A.7)

where,

( ) ( )11 1 1,0 1 1 1 1, 1 1

,

( ) ( ) ( ) ( )T T o T T o T T o T o T o TT o

TA tr B H H trace k H H kδ δ δ δξ

−− − − − − − − − −

′ ′′= Ω + Ω ;

0

2 2 1 2 21 1 1 1 1 0 1 1 1 1 1

,0 0

1 ( ) ( ) ( ) ( )y T T o T T o T T T o T T T o To T

T1B c H B H c c H k k H c

v vψ σ δ δ ψ σ δ δ

ξ−

− − − − − − − − − − −′′ ′ ′ ′= + ′ .

Proof of Proposition 2: When * *(1, , 0, 0)oθ θ ν ′= = , Lemma A.1 and the definition of 1( )TH δ−

in Lemma 5 imply that . This is so because1 1 1(1) (0) (1)T T TH H− − −′Ω 1TI −= 1T −1 1(0)T TL L− −′Ω =

and . In addition, in (A.7), B = 0 because11(1) ( )T TH L −

− −′= 1 0oψ = . Using Lemma A.3 and the

fact that , 1T oξ = when *oθ θ= , we can show A = T(T-1)/2 = Tb(1)′. Substituting these results,

0oψ = , and , 1T oξ = , into (A.7), we have:

( )

0

2

2, , *

2

(1) 0 (1) 0

0 02 2

( ) ,(1) 02 2

0 0 0

o o

RE i

o

y

o

Tb TbT T

T TE Tb

T

θθ

ν ν

θν

σν

′ ′⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜

− = ⎜ ′⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

⎟⎟ (A.8)

which has a zero determinant. The likelihood theory indicates ( ), * , *( ) ( ) 0RE i RE iE H Bθ θ+ = , at

*oθ θ= . Thus, ( ), *( )RE iE B θ must be also singular.

Proof of Proposition 3: We first derive ,( ( )RE iE )θ under . We can show that UNDoH

33

( )( ) 0

2 2 2 2* * *

( 1)(2 1) ( 1)( 1) ( 1)(2 1)( , ) 2 ;6 3 6i y

T T T T T TE d vT T T

ψ δ ν ν δ σ ψ δ+ + − + − −= − + +

( ) ( )1 2*1( ) ( ) ( 1)( 1) 2( 1)( 2) ( 1)( 1) ,

6i T iE B T T T T T Tνε δ ε δ δ δ−−′Δ Δ = − + − − − + − +

where 1 1( , ) ( )i i io T i id y y k y y , 1ψ δ ψ − −′= Δ − + Δ − Δ δ and , 1( )i i iy yε δ − δΔ = Δ − Δ . Thus, we have

( )

( )

0

,

2*

*2 2 2

1( , , , ) ln( ) ln( ) ln2 2 21 ( 1)( 1) 2( 1)( 2) ( 1)( 1)

12( 1)(2 1) 4( 1)( 1)1 ,

( 1)(2 1)12

RE i T

yT

T TE

T T T T T T

T T T TT T

δ ν ω ψ π ν ξ

ν δ δν

δνσ ψ δνξ

= − − −

− − + − − − + − +

+ + − − +⎛ ⎞− ⎜ ⎟⎜ ⎟+ + − −⎝ ⎠

where 1T Tξ ω= + . We now concentrate out ψ , ω , and ν from ,( ( , , , )RE iE )δ ν ω ψ . It is

obvious that 0ψ = maximizes ,( ( , , , )RE iE )δ ν ω ψ . Thus we have the concentrated value of

,( ( , , , )RE iE )δ ν ω ψ :

( )

( )

( )

,

2*

2*

1( , , ) ln( ) ln( ) ln2 2 21 ( 1)( 1) 2( 1)( 2) ( 1)( 1)

121 ( 1)(2 1) 4( 1)( 1) ( 1)(2 1) .

12

cRE i T

T

T TE

T T T T T T

E T T T T T T

δ ν ω π ν ξ

ν δ δνν δ δ

νξ

= − − −

− − + − − − + − +

− + + − − + + − −

Now, solve the first-order condition with respect to Tξ (instead of ω ):

( ), *2 2

( 1)(2 1) 4( 1)( 1)( , , ) 1 1 02 12 ( 1)(2 1)

cRE i

T T

T T T TET T

δδ ν ω νξ ξ νξ δ

+ + − − +∂ ⎛ ⎞= − + =⎜ ⎟∂ + − −⎝ ⎠

.

Then we obtain:

( )2*1 ( 1)(2 1) 4( 1)( 1) ( 1)(2 1)6T T T T T T T .νξ δ δ

ν= + + − − + + − −

Thus,

34

( )

( )

*,

2

2*

1 1 1( , ) ln( ) ln( ) ln ln2 2 2 2 6

1 ( 1)(2 1) ( 1)( 1) ( 1)(2 1)ln 22 6 3 61 ( 1)( 1) 2( 1)( 2) ( 1)( 1) .

12

cRE i

T TE

T T T T T T

T T T T T T

νδ ν π νν

δ δ

ν δ δν

⎛ ⎞ ⎛ ⎞= − − − − ⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠+ + − + − −⎛ ⎞− − +⎜ ⎟

⎝ ⎠

− − + − − − + − +

Now, solving

( ) ( ), 2*

2

( , ) 1 1 ( 1)( 1) 2( 1)( 2) ( 1)( 1) 02 12

cRE iE T T T T T T T

δ ν ν δ δν ν ν

∂ −= − + − + − − − + − + =

∂,

we have:

2*

1 ( 1)( 1) ( 1)( 2) ( 1)( 1)21 6 6 6

T T T T T TT

ν ν δ− + − − − +⎛ ⎞= − +⎜ ⎟− ⎝ ⎠δ

)

.

Substituting this solution to ,( ( , )cRE iE δ ν yields

( ), *

2

2

1 1 1 1( ) ln( ) ln( ) ln ln2 2 2 1 2 6

1 ( 1)( 1) ( 1)( 2) ( 1)( 1)ln 22 6 6 6

1 ( 1)(2 1) ( 1)( 1) ( 1)(2 1)ln 2 .2 6 3 6

cRE i

T T TET

T T T T T T T

T T T T T T

δ π ν

δ δ

δ δ

− ⎛ ⎞ ⎛ ⎞= − − − −⎜ ⎟ ⎜ ⎟−⎝ ⎠ ⎝ ⎠− − + − − − +⎛ ⎞− − +⎜ ⎟

⎝ ⎠+ + − + − −⎛ ⎞− − +⎜ ⎟

⎝ ⎠

Then, a little algebra shows:

( ) ( ) ( ) ( )3

,2 2

( ) 1 1 2 1

2( 1) 3 2 3( 1)(2 1) ( 1)2 1 2 1 1 2

cRE iE T T T

T TT TT T T T

δ δδ

δ δ

∂ − + −= −

∂ ⎡ ⎤ ⎡+ −⎧ ⎫ ⎧ ⎫− − + + − +⎢ ⎥ ⎢⎨ ⎬ ⎨ ⎬− − + +⎩ ⎭ ⎩ ⎭⎢ ⎥ ⎢⎣ ⎦ ⎣

T ⎤−⎥⎥⎦

.

Since the denominator of ,( ( )) /cRE iE δ δ∂ ∂ is positive for any T ≥ 2, it is positive for 1δ < and

negative for 1δ = . The derivative equals zero only at 1δ = . This indicates that 1δ = is the

global maximum point.

Proof of Proposition 4: It can be shown that when *oθ θ= ,

( ),

2*

1( , ) ( ) ln( )2

( 1)( 1) ( 1)( 2) ( 1)( 1) .2 6 3 6

Lan iTE b

T T T T T T

δ ν δ ν

ν δ δν

−= −

− + − − − +⎛ ⎞− − +⎜ ⎟⎝ ⎠

Without loss of generality we set * 1ν = . Then, the first-order maximization condition with

respect toν yields

21 ( 1)( 1) ( 1)( 2) ( 1)( 1)1 6 3 6

T T T T T TT

ν δ δ− + − − − +⎛ ⎞= − +⎜ ⎟− ⎝ ⎠.

Substituting this into the expected value of ,( ( ,Lan iE ))δ ν , we can get 35

( ),

2

1 1 1( ) ( ) ln2 1 2

1 ( 1)( 1) ( 1)( 2) ( 1)( 1)ln .2 6 3 6

cLan i

T TE bT

T T T T T T T

δ δ

δ δ

− −⎛ ⎞= − −⎜ ⎟−⎝ ⎠− − + − − − +⎛ ⎞− − +⎜ ⎟

⎝ ⎠

If we differentiate the concentrated likelihood function, we yield

( ) ( ) ( ) ( )( )

( ) ( ) ( ),

2

( ) 1 1 2( )

1 2 2

cLan iE T T T T

bT T T T

δ δδ

δ δ∂ − + − −

′= −∂ + − − 1δ + +

;

( ) ( ) ( )( )( )( ) ( )( )

( ) ( )( )( ) ( )( )

22,

22

( ) 2 1 1 2( )

2 4 1

1 1.

2 4 1

cLan iE T T

bT T

T TT T

δ δ δδ

δ δ δ δ

δ δ δ

∂ − − + +′′= −

∂ − + + + +

− ++

− + + + +

( ) ( ) ( )( )( )( ) ( )( )

( ) ( ) ( )( )( ) ( ) ( )( )

33,

33

22

( ) 8 1 1 2( )

2 4 1

6 1 1 1 2.

1 2 2 1

cLan iE T T

bT T

T T T

T T T

δ δ δδ

δ δ δ δ

δ δ

δ δ

∂ − − + +′′′= +

∂ − + + + +

− + − + +−

+ − − + +

It can be shown that when 1δ = ,

1 ( 1)( 2) ( 1)( 2)( 3(1) ; (1) ; (1)2 6 12

T T T T Tb b b− − − − −′ ′′ ′′′= = =)T − .

Using these results, we can easily show that at 1δ = ,

( ) ( )2

, ,2

(1) (1)0,

c cLan i Lan iE Eδ δ

∂ ∂= =

∂ ∂

but,

( ) ( ) ( ) ( ) ( ) ( )3

,3

(1) 1 2 3 1 10

12 2

cLan iE T T T T Tδ

∂ − − − − += −

∂≠ .

Thus, 1δ = is an inflexion point of ( ), ( )cLan iE δ .

Proof of Proposition 5: At *θ θ= , using the alternative representation of 1[ ( )]T ω −Ω

given in Lemma A.2, we can have

36

1 11, , * 2

, 1 , 1

12 2

1( ) ( )2 2

1 1( ) ,2 2 2 2

i iRE i T T

i i i i

i T T T T i i i i

y yT L Ly y y y

T TrL L L L r rr

ν θν ν

ν ν ν ν

−

− −

−

′Δ Δ⎛ ⎞ ⎛ ⎞′= − + ⎜ ⎟ ⎜ ⎟Δ − Δ Δ − Δ⎝ ⎠ ⎝ ⎠

′ ′ ′ ′= − + = − + Σ

where and1( , ..., )i i iTr y y ′= Δ Δ *ν ν= Also, we have:

( ) ( )

2 2 22, , * 1 1 , 1

2

( ) ( )2 2 2 2

1 1 1 .2 2 2 2

RE i i T i i T T i

i T T T T i i T T i

T T T Ty k y y k L r

T T TrL k k L r r r

ω θν ν

ν ν

− −′′ ′= − + Δ + Δ − Δ = − +

′′ ′ ′ ′= − + = − +

Thus,

1, , * , , * 2 1

1 1( ) ( ) ( 1 1 ) .2

T sRE i RE i i T T T i s t it isv r I r

vω νθ θν

−= =′ ′− = − − = Σ Σ Δy yΔ

Now,

1

1 1 21 1, , * 2 1

, 1 , 11

1

001 1( ) ( ) .

: :

i

i i i T sRE i T T s t it is

i i iTt it iT

yy y y

L L y yy y y

y y

δ θν ν

− −= =

− −−

=

′ Δ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟′ ′Δ Δ Δ⎛ ⎞ ⎛ ⎞ ⎜ ⎟ ⎜ ⎟′= = =⎜ ⎟ ⎜ ⎟Δ Δ − Δ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎜ ⎟ ⎜ ⎟Σ Δ Δ⎝ ⎠ ⎝ ⎠

1ν

Σ Σ Δ Δ

Thus, , , * , , * * , , *( ) ( ) ( ) 0.RE i RE i RE iδ ω νθ θ ν θ− + =

Proof of Proposition 6: We first check whether (i) whether , , *( )RE i δδ θ equals zero or is

linearly related to , , *( )RE i δ θ and . Note that ,2is

, , * , , *( ) ( )rRE i RE iν νθ θ= ; , , * , , *( ) ( )

rRE i RE iω ωθ θ= ; , , * , , *( ) ( )RE i RE iψ ψθ θ= .

Since , ,( , , , ) ( , ( , ), ( , ), )RE i r r RE i r rδ ν ω ψ δ ν ν δ ω ω δ ψ= , we have:

, , , , , , , ,RE i RE i RE i r RE iδ δ ω νν= − + ;

( )( ) (

, , , , , , , , , , , , , ,

, , , , , ,

2, , , , , , , , , , , ,2 2

RE i RE i RE i r RE i RE i RE i r RE i

r RE i RE i r RE i

RE i RE i RE i r RE i RE i r RE i

δδ δδ δω δν ωδ ωω ων

νδ νω νν

) .δδ δω ωω δν ων

ν ν

ν ν

ν ν

= − + − + −

+ − +

= − + + − + νν

Then, with a little algebra and Lemma A.3, we can show that at *rθ θ= ,

37

( )( )

, , * , , * , , * , , *

2* , , * , * * , , *

1, 1 1 , 1 , 1 1 1 , 1

* *2 3

, 1 1 1 1 , 1 1* *

( ) ( ) 2 ( ) ( )

2 ( ) ( ) ( )

1

2 ( ( ))

RE i RE i RE i RE i

RE i RE RE i

i T i i T T i

i T i T i i i T

Ty B y y k k y

T Ty k y k y y y k

δδ δδ δω ωω

δν ων νν

θ θ θ θ

ν θ θ ν θ

ν ν

ν ν

−− − − − − − −

− − − −

= − +

+ − +

′ ′ ′− Δ Δ − Δ Δ

=′ ′ ′+ Δ Δ + Δ − Δ − Δ +( )

( )

21 , 1

12

, 1 , 1 , 1 1* * *

( )

( 1)2

1 2 ( ) ( )2

i i

i T T i i T iT iT

y y

T T

T T Ty D D y y m y yν ν ν

− −

−

− − − −

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟

Δ − Δ⎜ ⎟⎝ ⎠

++

⎛ ⎞ +′′ ′= − Δ Δ − Δ Δ − Δ +⎜ ⎟⎝ ⎠

( 1) ,T

which is neither zero, nor a linear combination of , , *( )RE i δ θ and . ,2is

We now check (ii) whether or not , , *( )RE i δδδ θ is a linear function of . We can show is

( )

( ) ( ), , , , ,, , , , ,

2 3,. , , , , , , , , , , ,

3 3

3 2 3

RE i RE i RE RE i RE i

r RE i RE i RE i r RE i RE i r RE i

δδδ δδδ δδω ωωω ωωω

δδν δωων ωων δνν ωνν νννν ν

= − + −

+ − + + − + ( ).ν

But, at *θ θ= ,

; , , *( ) 0RE i δδδ θ =2

, , , 1 1 1 , 1*

RE i i T T iT y k k yδδω ν − − − −′ ′= Δ Δ ;

( )3

, , * , 1 1 1 , 1*

2( ) ( )RE i i T i T i iT y k y k y yωωδ θν − − −′ ′= Δ Δ + Δ − Δ ;

4

3 2, , * 1 1 , 1

*

3( ) ( ( ))RE i i T i iTT y k y yωωω θν − −′= − + Δ + Δ − Δ ;

, , * , , **

1( ) ( )RE i RE iδδν δδθ θν

= − ; , , * , , **

1( ) ( )RE i RE iδων δωθ θν

= − ;

2

, , * , , ** *

1( ) ( )2RE i RE iT

ωων ωωθ θν ν

= − + ; , , * , , **

2( ) ( )RE i RE iδνν δνθ θν

= − ;

, , * , , **

2( ) ( )RE i RE iννω νωθ θν

= − ; , , * , , * 3* *

3( ) ( ) .2RE i RE iT

ννν ννθ θν ν

= − +

Using these results, we can show:

38

3 2 2

, , * , 1 1 1 , 1*

2 22

1 , 1 , , ** *

2 3 3( )2

6 3( ) ( ) 3 (

RE i i T T i

T i iT iT RE i

T T T T y m m y

T Tm y y y

δδδ

δδ

θν

),θν ν

− − − −

− −

+ + ′ ′= − Δ Δ

′− Δ Δ − Δ −

which is neither zero, nor a linear combination of . For example, when T = 2: is

( )2 2, , * 1 1 2 2 , , *

* * *

3 12 12( ) 15 ( ) 3 ( )RE i i i i i RE iy y y yδδδ δδθ θν ν ν

= − Δ − Δ Δ − Δ − ;

( )2 2, , * 1 1 2 2

* * *

1 2 2( ) ( ) 3RE i i i i iy y y yδδ θν ν ν

⎛ ⎞= − Δ − Δ Δ − Δ +⎜ ⎟

⎝ ⎠;

( )2 2, , * , , * 1 22

* *

1 1( ) ( ) ( ) ( )2rRE i RE i i iy yν νθ θ

ν ν= = − + Δ + Δ

2, , * , , * 1 2

*

1( ) ( ) 1 ( )2rRE i RE i i iy yω ωθ θν

= = − + Δ + Δ ;

( ), , * , , * 0 1 2*

1( ) ( )rRE i RE i i i iy y yψ ψθ θ

ν= = Δ + Δ .

All of these derivatives are linearly independent.

We now consider the asymptotic distribution of the ML estimator of rθ . Let

1 ,( ) ( ),NN r i RE i rL θ θ== Σ where ( , ) .r rθ δ ϕ′ ′= For convenience, redefine rϕ as ( , , )r r rϕ ω ψ ν ′=

instead of ( , , )r rν ω ψ ′ . Correspondingly, we also reorder and,2is 2Z . Partition 2Z into

, where2 21 22( , )Z Z Z ′= 21Z is a 2×1 vector and 22Z is a scalar. We also partition andϒ 1−ϒ ,

accordingly:

11 21,1 22,1

11 2121,1 21,21 22,21

21 2222,1 22,21 22,22

⎛ ⎞′ ′ϒ ϒ ϒ⎜ ⎟⎛ ⎞′ϒ ϒ ⎜ ⎟′ϒ = = ϒ ϒ ϒ⎜ ⎟⎜ ⎟ ⎜ ⎟ϒ ϒ⎝ ⎠ ϒ ϒ ϒ⎜ ⎟⎝ ⎠

;

11 21,1 22,1

11 211 21,1 21,21 22,21

21 2222,22

22,1 22,21

( ) ( )−

⎛ ⎞′ ′ϒ ϒ ϒ⎜ ⎟⎛ ⎞′ϒ ϒ ⎜ ⎟′ϒ = = ϒ ϒ ϒ⎜ ⎟⎜ ⎟ ⎜ ⎟ϒ ϒ⎝ ⎠ ϒ ϒ ϒ⎜ ⎟⎝ ⎠

.

Given the conditions (i)-(ii) are satisfied, Theorem 3 of RCBR implies that in the bounded

neighborhood of *θ , the log-likelihood function can be approximated by:

39

2 2 21

* * * *2

2 211 1 1 1

* * *2

( 1) ( 1) ( 1)1( ) ( ) (1)2

( 1) ( 1) ( 1)1( ) ( ) ( ) (1)2

N r N pr r r

N pr r r

ZL L N N o

Z

ZL N N o

Z

δ δ δθ θ

ϕ ϕ ϕ ϕ ϕ ϕ

δ δ δθ

ϕ ϕ ϕ ϕ ϕ ϕ− − − −

′ ′⎛ ⎞ ⎛ ⎞ ⎛ ⎞− − −⎛ ⎞= + ϒ − ϒ +⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟− − −⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠

′ ′⎛ ⎞ ⎛ ⎞ ⎛ ⎞− −⎛ ⎞= + ϒ − ϒ +⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟− −⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠

2

*

−−

(A.9)

Suppose that . Then, it is straightforward to show that the ML estimator of1 0Z > rθ equals

2

1

*2

ˆ( 1)(1)

ˆ( )p

r

ZNo

ZN

δ

ϕ ϕ

⎛ ⎞− ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟− ⎝ ⎠⎝ ⎠

.

That is,

1/ 2

1

*2

ˆ( 1) ( 1) (1).ˆ( )

B

p

r

N Z oZN

δ

ϕ ϕ

⎛ ⎞ ⎛ ⎞− −⎜ ⎟ = +⎜ ⎟⎜ ⎟− ⎝ ⎠⎝ ⎠

(A.10)

This is so because the solution for 1/ 2 ˆ( 1N δ )− has two solutions and both solutions have equal

chances. Now, suppose . Then, we do not have an interior solution for 1 0Z <

2ˆ( 1)N δ − > 0 . The corner solution for 2ˆ( 1)N δ − equals zero. For this case, we have:

1/ 2

21 11 1*2 1

ˆ 0( 1)(1).

( )ˆ( )p

r

No

Z ZN

δ

ϕ ϕ−

⎛ ⎞− ⎛ ⎞⎜ ⎟ = ⎜ ⎟⎜ ⎟ − ϒ ϒ⎝ ⎠−⎝ ⎠

+

2Z

]

(A.11)

Thus, we obtain (17).

We now derive the asymptotic distributions of LR test statistics. Substituting (A.10) and

(A.11) into (A.9), we have:

, (A.12) 1 1 * ** 1 2 2 22

ˆ2[ ( ) ( )] 1( 0) ( ) 1( 0)N r NL L Z Z Z Z Zθ θ − − ′′− = > × ϒ + < × ϒ

where 1(•) is the index function, 22 21 11 1 12 122 [ ( )− −ϒ = ϒ − ϒ ϒ ϒ and * 21 11

2 2 ( ) 11Z Z Z−= − ϒ ϒ .

A tedious algebra shows:

1 1 11 1 * *

1 1 2 22 2

1 ** 1 ** * 11 11 1 21 21,21 21 22 22,22 22

( ) Z ( )

Z ( ) ( ) '( ) ,

Z Z Z Z Z**Z Z Z Z

− − −

− −

′ ′ϒ = ϒ + ϒ

′ ′= Φ + Φ + Φ Z− (A.13)

where, ,1111Φ = ϒ 21,21 21,1 11 1 21,1

21,21 ( )− ′Φ = ϒ − ϒ ϒ ϒ , ** 21,1 11 121 21 1( )Z Z Z−= − ϒ ϒ ,

( )1 22,111 21,1

22,22 22,1 22,2122,22 21,1 21,21 22,21

− ⎛ ⎞′⎛ ⎞′ ϒϒ ϒ ⎜ ⎟Φ = ϒ − ϒ ϒ ⎜ ⎟⎜ ⎟ ⎜ ⎟′ϒ ϒ ϒ⎝ ⎠ ⎝ ⎠;

40

( )1

11 21,11** 22,1 22,21

22 22 21,1 21,2121

ZZ Z

Z

−⎛ ⎞′ ⎛ϒ ϒ= − ϒ ϒ ⎜ ⎟ ⎜⎜ ⎟ϒ ϒ ⎝ ⎠⎝ ⎠

⎞⎟ .

Notice that 1Z , **21Z and **

22Z are uncorrelated, and

; ; . (A.14) 1 11~ (0, )Z N Φ **21 21,21~ (0, )Z N ϒ

−

**22 22,22~ (0, )Z N ϒ

Using (A.13), we can rewrite (A.12) as:

(A.15) 1 ** 1 ** ** 1 **

* 1 1 11 1 21 21,21 21 22 22,22 22

** 1 ** ** 1 **1 21 21,21 21 22 22,22 22

ˆ2[ ( ) ( )] 1( 0) ( ( ) ( ) ( ) )

1( 0) ( ( ) ( ) ).

N r NL L Z Z Z Z Z Z Z

Z Z Z Z Z

θ θ − −

− −

′ ′ ′− = > × Φ + Φ + Φ

′ ′+ < × Φ + Φ

Let (1,1,0, )rθ ν ′= be the restricted ML estimator of rθ which maximizes (A.9) with the

restrictions, 1δ = and 0ω ψ= = . It can be shown that:

*** 22( ) (pN Z oν ν− = + 1) .

Substituting this solution into (A.9), we can have

1

* 2 1 21 2 1 2 1

* 22

** 1 **22 22,22 22

0 0 02[ ( ) ( )] 2 0 0 0 (1)

* *

( ) (1)

N r N p

p

ZL L N Z N o

Z

Z Z o

θ θν ν ν ν ν ν

× × ×

−

′ ′⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟− = ϒ − ϒ +⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟− − −⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠

′= Φ +

(A.16)

Using (A.14)-(A.16), we can show that when ,r o *θ θ= , the LR test statistic for testing the joint

hypotheses of 1oδ = , , 0r oω = , and 0oψ = follows a mixed Chi-square distribution:

1 ** 11 1 11 1 21 21,21 21

** 1 **1 21 21,21 21

2 2

ˆ2[ ( ) ( )] 1( 0) ( ( ) ( ) )

1( 0) ( ( ) ) (1)

(3) (1 ) (2).

N r N r

p

d

L L Z Z Z Z Z

Z Z Z o

B B

θ θ

χ χ

− −

−

′ ′− = > × Φ + Φ

′+ < × Φ +

→ × + − ×

**

Finally, consider the LR statistic for testing the hypothesis that 1oδ = . Let (1, )r rθ ϕ ′ ′=

be restricted ML estimator that maximize (A.9) with the restriction 1oδ = . We can show that

** 2( )r pN Zϕ ϕ− = + (1)o .

Substituting this solution into (A.9) yields:

41

1

*2* * *

* * ** 1 ** ** 1 **2 22 2 21 21,21 21 22 22,22 22

0 0 02[ ( ) ( )] 2 (1)

(1) ( ) ( ) (1).

rN N pr r r

p p

ZL L N N o

Z

Z Z o Z Z Z Z o

θ θϕ ϕ ϕ ϕ ϕ ϕ

− −

′ ′⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞− = ϒ − ϒ +⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟− − −⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠

′ ′ ′= ϒ + = Φ + Φ +

(A.17)

Then, using (A.14), (A.15) and (A.17), we have:

1 21 1 11 1

ˆ2[ ( ) ( )] 1( 0) ( ) (1) (1).rN r N p dL L Z Z Z o Bθ θ χ−′− = > × Φ + →

This completes our proof.

Proof of Proposition 7: It is easy to see that ,, , * * , , *( ( ) ( ))D D D DRE i RE iaδ βφ φ− , , *( )D D

RE i ν φ , and

, , *( )D DRE i ω φ are the same as , , *( )RE i δ θ , , , *( )RE i ν θ , and , , *( )RE i ω θ with the ityΔ replaced by

ity aΔ − . Thus, we can obtain the result by the same method used for Propositions 5.

Proof of Proposition 8: Define

1 0 0 0 ... 01 1 0 0 ... 0

1 2 1 0 ... 00 1 2 1 ... 0: : : : :0 0 0 0 ... 1

HDT

T T

L

×

⎛ ⎞⎜ ⎟−⎜ ⎟⎜ ⎟−′ = ⎜ ⎟−⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

; ; , 2

( 2) 20TT

IJ

− ×

⎛ ⎞= ⎜ ⎟

⎝ ⎠

1 01 11 2: :1 1

TG

T

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟=⎜ ⎟⎜ ⎟⎜ ⎟−⎝ ⎠

We can easily show that and1( )HDT TG L J−′= ( )HD HD HD

T T T TW L L J WJ′T′Ω = + . Let

11 12

12 22

( 1)2

( 1) ( 1)(2 1)2 6

T T

T TTg gG G

g g T T T T T

−⎛ ⎞⎜ ⎟⎛ ⎞ ′≡ = ⎜ ⎟⎜ ⎟ − − −⎝ ⎠ ⎜ ⎟⎜ ⎟⎝ ⎠

.

Then, we have:

( )

11

11 1 1

2

1 1 1 1 12

( )

( ) [ ( ) ] ( )

( ) ( ) ( ) [ ] ( ) .

HD HD HDT T T T T

HD HD HD HD HD HD HD HDT T T T T T T T T T T T

HD HD HD HDT T T T T T T T

L L J WJ

L L L L J W J L L J W I J L L

L L L G W G G W I G L

−−

−1− − −

− − − − −

⎡ ⎤′ ′Ω = +⎣ ⎦

′ ′ ′ ′′ ′= − +

′ ′′ ′= − +

−

Thus, the log-likelihood function can be written:

42

,

1 1 1 0 1 1 1 01

2 2 2 0 1 2 2 2 0 12 2 2 2

, 1 , 1

1 1ln(2 ) ln( ) ln[det( )]2 2 2

1 ( )2

HD HD HDRE i T T T T

i i i iHD HD

i i i T T T T i i

i i i i

T L L J WJ

y a y y a yy a y y L L J WJ y a y y

y y y y

π ν

ψ ψ

iψ δ ψν

δ δ

−

− −

′ ′= − − − +

′Δ − − Δ − −⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟′ ′− Δ − − − Δ + Δ − − − Δ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟Δ − Δ Δ − Δ⎝ ⎠ ⎝ ⎠

δ

Define 1,* 1,* 0it it id y a yψ= Δ − − . By a tedious but straightforward algebra, we can show at

*HD HDφ φ= :

1 111, , 1 1 , 1 1 1

11 11

1 ( ( ) )( 1)

HD T t T TRE i t s is i t t it t ity d T t y d

gδω

ν ν ω− −

= = + = == Σ Σ Δ − Σ − Δ Σ+

;

2

11, , 2 1

11 11

1 (( 1)( 1) 2

HD T T1)RE i a t it t it

w T Tt d dgν ν ω= =

−= Σ − − Σ

+;

2

11, , 2 0 1 0

11 11

1 (( 1)( 1) 2

HD T T1)RE i t it i t it i

w T Tt d y d ygψ ν ν ω= =

−= Σ − − Σ

+,

where *ν ν= and 11 11,*ω ω= . Thus, we have:

2 2

1 111, , 1 , , 1 , , 1 1 , , 1 1 1

11 11

1 ( ( ) )(( 1)

HD HD HD T t T TRE i RE i a RE i t s i s i t t it t ita d d T

gδ ψωψ

ν ν ω− −

= = + = =− − = Σ Σ − Σ − Σ+

)t d d .

We also can show:

2 211, , 1 12 2

11 11

1 ( )2 2 2 ( 1)

HD T TRE i t it t it

T d dgνω

ν ν ν ω= == − + Σ − Σ+

;

11

2 211 11 111 12, ,

11 11 11 11 11 11

1 1( ) ( )2 1 2 ( 1) 2 ( 1)

HD T Tt it t itRE i

g g d dg g gω

ωω ν ω ν ω= == − − Σ + Σ

+ + +;

12

2 112 12 111 12, ,

11 11 11 11 11 11

1( ) ( )(( 1) ( 1) ( 1)

HD T T Tt it t it t i tRE i

g g d dg g gω

ωω ν ω ν ω

−1 , 1 )td= = == − − Σ + Σ Σ

+ + + + .

But,

43

( ) ( ) ( )

( ) ( )

11, , , , 11 11 , ,

212 111 1 1 11 11 11 1 1

211 11 11 111

111 1 1

11 1

( 1)

( ) ( 1)212 ( 1) 2 ( 1)( )

( )(

HD HD HDRE i RE i RE i

T TT T tt it t it tt it t s is it

Tt it

T Tt it t it

g

T t d d g dd d dg gd

T t d dg

δ ν ων ω

ω ων ν ω

ω

ν ω

−−= == = =

=

−= =

+ − +

Σ − Σ − Σ⎡ ⎤Σ + Σ Σ= − +⎢ ⎥ + +− Σ⎢ ⎥⎣ ⎦

Σ − Σ= −

Tit

ν ω=

( )

( ) ( ) ( )

( ) ( ) ( )

2

11 11 1

1 11 112

11 1 1 1 11 11 1

11 11 11 112

11 1 11 1 1 11 11 1

11 11 11 11

( 1)1) 2 ( 1)

( 1) ( 1) ( 1)( 1) 2 ( 1)

( 1) ( 1) ( 1)( 1) ( 1)

Tt it

T T T Tt it t it t it t it

T T Tt it t it t it t

g dg

T d t d d g dg g

T d t d d gg g

ω

ν ω

ω ω

ν ω ν ω

ω ω ω

ν ω ν ω

=

= = = =

= = = =

− Σ+

+ +

− Σ − Σ − Σ − Σ= − +

+ +

− Σ Σ − Σ − Σ= − + +

+ +( )

( ) ( ) ( )

2

11 11

211 111 1 1

11 11

2 ( 1)

( 1) ( 1) ,2 ( 1) ( 1)

Tit

T T Tt it t it t it

dg

T d t d dT T

ν ω

ω ων ω ν ω= = =

+

−= − Σ + Σ − Σ

+ +

where the last equality results from g11 = T. Now observe that:

( ) ( ) (11 12

2

1 1, , , ,11 11

( 1) 1( 1) ( 1)2 ( 1) ( 1)

HD HD T T Tt it t it t itRE i RE i

TT dT Tω ω ν ω ν ω= = = )1d t d−

− − = Σ − Σ Σ −+ +

.

Thus, we have:

11 11 12

11 12

, , , , 11 11, , , , , ,

, , , , 11 11, , , ,

( 1) [( 1)

( 1) 0.

HD HD HD HD HDRE i RE i RE i RE i RE i

HD HD HD HDRE i RE i RE i RE i

T Tδ ν ω ω

δ ν ω ω

ν ω ω

ν ω ω

+ − + + − −

= + − + − =

]ω

An alternative representation of the log likelihood function (27) is given:

, 2

2 2 1 2 2, 1 2 , 1

1 1 1 0 2 22 , 1

2 2 2 0 1

1 1 1 01

2 2 2

1 1 1ln(2 ) ln[det( )] ln( ) ln[det( )]2 2 2 2

1 ( ) ( ) ( )2

1 ( )2

HD HDRE i T

HDi i T i i

i i HDT i i

i i i

i i

i

TB

y y B y y

y a yK y y

y a y y

y a yy a

π ν

δ δν

ψδ

ψ δν

ψψ

−

−− − −

− −

−

= − − − − Ξ

′− Δ − Δ Δ − Δ

′Δ − −⎛ ⎞⎛ ⎞ ′+ + Δ − Δ⎜ ⎟⎜ ⎟Δ − − − Δ⎝ ⎠⎝ ⎠Δ − −

×ΞΔ − −

2 22 ,

0 1

( )HDT i i

i i

K y yy y

δδ − −

⎛ ⎞⎛ ⎞ ′+ Δ − Δ⎜ ⎟⎜ ⎟− Δ⎝ ⎠⎝ ⎠1 ,

where,

44

;

( )

2

2 2

1 30 10 0: :0 0

HDT

T x

C −

−

−⎛ ⎞⎜ ⎟−⎜ ⎟⎜ ⎟=⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

( ) ( )

2

2 2

6 4 1 0 ... 04 6 4 1 ... 0

1 4 6 4 ... 0: : : : :0 0 0 0 ... 40 0 0 ... 6

HDT

T x T

B −

− −

−⎛ ⎞⎜ ⎟− −⎜ ⎟⎜ ⎟− −

= ⎜ ⎟⎜ ⎟⎜ ⎟−⎜ ⎟⎜ ⎟⎝ ⎠

; 12 2( ) ( 2 )HD HD HD

T T TK B C−− −= −

2

;

11, 12, 11 12 12 2

12, 22, 12 22

1 1( ) ( ) ( )

1 2T T HD HD HD

T T TT T

C B Cξ ξ ω ωξ ξ ω ω

−− − −

+ −⎛ ⎞ ⎛ ⎞ ′Ξ ≡ = −⎜ ⎟ ⎜ ⎟− +⎝ ⎠⎝ ⎠.

Our unreported simulations show that it is much easer to minimize the above log likelihood

function with respect to ,11 ,12 ,22 1 2 1 2( , , , , , , , , )T T T a aδ ν ξ ξ ξ ψ ψ ′ than to minimize the equivalent log

likelihood function (27) with respect to . 11 12 13 1 2 1 2( , , , , , , , , )HD a aφ δ ν ω ω ω ψ ψ ′=

45

46

REFERENCES

Ahn, S.C., and P. Schmidt, 1995, Efficient estimation of models for dynamic panel data, Journal

of Econometrics, 68, 5-27. Ahn, S.C. and P. Schmidt, 1997, Efficient estimation of dynamic panel data models: alternative

assumptions and simplified assumptions, Journal of Econometrics 76, 309-321. Alvarez, A. and M. Arellano, 2004, Robust likelihood estimation of dynamic panel data models,

mimeo, CEMFI, Spain. Anderson, T.W. and C. Hsiao, 1981, Estimation of dynamic models with error components,

Journal of the American Statistical Association 76, 598-606. Andrews, D., 1999, Estimation when a parameter is on a boundary, Econometrica 67, 1341-1384. Arellano, M. and S. Bond, 1991, Tests of specification for panel data: Monte Carlo evidence and

an application to employment equations, Review of Economic Studies 58, 277-297. Arellano, M. and O. Bover, 1995, Another look at the instrumental variables estimation of error-

component models, Journal of Econometrics 68, 29-51. Bhargava, A. and J.D. Sargan, 1983, Estimating dynamic random effects models from panel data

covering short time periods, Econometrica 51, 1635-1660. Blundell, R. and S. Bond, 1998, Initial conditions and moment restrictions in dynamic panel data

models, Journal of Econometrics 87, 115-143. Bond, S. and F. Windmeijer, 2002, Finite sample inference for GMM estimators in linear panel

data models, IFS, mimeo. Breitung, J. and W. Meyer, 1994, Testing for unit roots using panel data; are wages on different

bargaining levels cointegrated, Applied Economics 26, 353-361. Cox, D.R. and N. Reid, 1987, Parameter orthogonally and approximate conditional inference

(with discussion), Journal of the Royal Statistical Society, B, 49 (1), 1-39. Gouriéroux, C., A. Monfort and A. Trognon, 1984, Pseudo maximum likelihood methods: Theory,

Econometrica 52, 681-700. Hahn, J. and G. Kuersteiner, 2002, Asymptotically unbiased inference for a dynamic panel data

model with fixed effects when both n and T are large, Econometrica 70, 1639-1657. Harris, R. and E. Tzavalis, 1999, Inference for unit roots in dynamic panels where the time

dimension is fixed, Journal of Econometrics 91, 2001-226.

47

Hsiao, C., 1986, Analysis of Panel Data, Cambridge, University Press, Cambridge, UK. Hsiao, C., M.H. Pesaran and A.K. Tahmiscioglu, 2002, Maximum likelihood estimation of fixed

effects dynamic panel data models covering short time period, Journal of Econometrics 109, 107-150.

Im, K., H. Pesaran and Y. Shin, 2003, Testing for unit roots in heterogeneous panels, Journal of

Econometrics 115, 53-74. Kruiniger, H., 2002, On the estimation of panel regression models with fixed effects, Queen

Mary, University of London, mimeo. Lancaster, T., 2002, Orthogonal parameters and panel data, Review of Economic Studies 69,

647-666. Levin, A., F. Lin and C. Chu, 2002, Unit root tests in panel data: asymptotic and finite-sample

properties, Journal of Econometrics 108, 1-24. Moon, H. and P.C.B. Phillips, 2004, GMM estimation of autoregressive roots near unity with

panel data, Econometrica 72, 467-522. Moon, H., P. Phillips and B. Perron, 2005, Incidental trends and the power of panel unit root tests,

mimeo, University of Southern California. Neyman, J. and E. Scott, 1948, Consistent estimates based on partially consistent observations,

Econometrica 16, 1-32. Nickell, S., 1981, Biases in dynamic models with fixed effects, Econometrica 49, 1399-1416. Rao, C.R., 1973, Linear statistical inference and its applications, the 2nd eds., John Wiley & Sons

Inc, New York, USA. Rotnitzky, A., D.R. Cox, M. Bottai and J. Robins, 2000, Likelihood-based inference with

singular information matrix, Bernoulli 6 (2), 243-284. Sagan, J.D., 1983, Identification and lack of identification, Econometrica 51, 1605-1634. Thomas, G., 2005, Maximum likelihood based estimation of dynamic panel data Models, Ph.D.

dissertation, Arizona State University. Wooldridge, J.M., 2000, A framework for estimating dynamic, unobserved effects panel data

models with possible feedback to future explanatory variables, Economics Letters, 68, 245-250.

Table 1

The Biases in the RE and HPT ML Estimators and the Size and Power Properties of the T-Tests:

ML Estimation without Time Trends

Rejections of Ho: δ = δο Rejections of Ho: δ = 1 Mean Bias MSE 1% 5% 10% 1% 5% 10%

RE N = 500

δο = 0.5 0.5000 0.0000 0.0006 1.06 5.14 9.83 100 100 100 δο = 0.8 0.8056 0.0056 0.0032 3.93 7.71 12.14 82.90 90.95 93.66 δο = 0.9 0.9053 0.0053 0.0057 9.22 16.67 22.13 21.80 32.83 39.38 δο = 1 0.9983 -0.0017 0.0048 9.49 16.38 21.22 9.49 16.38 21.22 N = 100 δο = 0.5 0.5006 0.0006 0.0033 0.92 3.95 8.45 99.98 99.98 99.98 δο = 0.8 0.8157 0.0157 0.0135 8.66 14.24 18.69 32.00 46.65 55.04 δο = 0.9 0.8963 -0.0037 0.0144 9.58 16.77 22.22 13.71 22.72 28.34 δο = 1 0.9960 -0.0040 0.0112 9.20 15.96 21.41 9.20 15.96 21.41

HPT N = 500

δο = 0.5 0.5001 0.0001 0.0007 1.11 4.96 9.60 100 100 100 δο = 0.8 0.8220 0.0220 0.0076 9.75 13.89 17.48 70.20 77.86 80.34 δο = 0.9 0.9016 0.0016 0.0057 5.79 12.41 17.84 21.83 31.08 36.41 δο = 1 0.9983 -0.0017 0.0044 8.80 15.05 19.76 8.80 15.05 19.76 N = 100 δο = 0.5 0.5026 0.0026 0.0046 1.12 4.02 8.46 99.90 99.92 99.99 δο = 0.8 0.8234 0.0234 0.0160 6.10 11.87 16.41 30.47 42.78 49.66 δο = 0.9 0.8848 -0.0052 0.0125 5.53 11.28 16.02 13.29 21.76 26.74 δο = 1 0.9968 -0.0032 0.0099 8.02 14.33 19.08 8.02 14.33 19.08 Notes: Table 1 reports the means, biases and mean square errors of the random effects and Hsaio, Pesaran and Tahmiscioglu (HPT) estimators from 10,000 simulations with the number of observations equal to N and the true value of δ equal to δo along with the rejection rates of the t-test with significance levels of 1%, 5% and 10%. The variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and the variances, ση2 = 2, σqo

2 = 2, and ν = 1, respectively. T = 5 is used. The data are generated without trends and with zero means. The ML

estimators are estimated without trends and intercept terms.

Figure 1

The Sample Error Distributions of the RE ML Estimator without Trend When N = 500: The true values of δo vary from 0.5 to 1.

0

100

200

300

400

500

600

-0.2 -0.16 -0.12 -0.08 -0.04 0 0.04 0.08 0.12 0.16 0.2

0.5 0.8 0.9 1 Notes: Figure 1 shows the finite sample distribution of the sampling errors of the RE Estimator for values of

oδ varying from 0.5 to 1. The variables, ηi, qi0, and εit, are generated from the normal

distributions with zero means and the variances, ση2 = 2, σqo2 = 2, and ν = 1, respectively. N=500 and T = 5 is used. The data are generated without trends and with zero means. The ML estimators are

estimated without trends and intercept terms

Figure 2 The Sampling Error Distribution of the RE estimator without Trend When N = 100:

The true values of δo vary from 0.5 to 1.

0

100

200

300

400

500

600

-0.36 -0.3 -0.24 -0.18 -0.12 -0.06 0 0.06 0.12 0.18 0.24 0.3 0.36

0.5 0.8 0.9 1 Notes: Figure 2 shows the finite sample distribution of the sampling errors of the RE Estimator for values of δo varying from 0.5 to 1. The variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and the variances, ση2 = 2, σqo

2 = 2, and ν = 1, respectively. N=100 and T = 5 is used. The data are generated without trends and with zero means. The ML estimators are

estimated without trends and intercept terms

Figure 3

The Sampling Error Distributions of the RE and HPT estimators when N = 100

0

100

200

300

400

500

600

-0.16 -0.12 -0.08 -0.04 0 0.04 0.08 0.12 0. 16

0

50

100

150

200

250

300

350

-0.4 -0.3 -0.2 -0. 1 0 0. 1 0.2 0.3

Panel A: δo=0.5 Panel B: δo=0.8

0

50

100

150

200

250

300

350

400

450

-0.4 -0.3 -0. 2 -0.1 0 0. 1 0.2 0.3

0

100

200

300

400

500

600

700

800

-0. 4 -0.32 -0. 24 -0. 16 -0.08 0 0.08 0.16 0.24 0.32

Panel C: δo=0.9 Panel D: δo=1

Notes: Figure 3 shows the finite sample distribution of the sampling errors of the RE Estimator and the HPT Estimator for various values of δ0. The variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and the variances, ση2 = 2, σqo

2 = 2, and ν = 1, respectively. N=100 and T = 5 is used. The data are generated without trends and with zero means. The ML

estimators are estimated without trends and intercept terms

Table 2

The Size and Power of the Likelihood Ratio Test Based on the RE ML Estimation without Trend:

Testing the Joint Hypothesis of Unit Root without Drift, Ho: 1oδ = and 0o oω ψ= = .

δο = 1 δο

= 0.95 δο = 0.9

Significance 1% 5% 10% 1% 5% 10% 1% 5% 10%

ση2=2, σq02=2, ν=1

N=500 χ2 (3) 0.73 3.80 7.67 91.23 97.68 99.05 100 100 100 χ2 (2) 2.01 8.54 16.30 95.72 99.20 99.67 100 100 100 Mix χ2 (3), χ2 (2) 1.06 5.40 10.48 93.17 98.44 99.41 100 100 100 N=100 χ2 (3) 0.93 4.06 8.41 14.45 33.81 46.17 67.21 86.17 92.41 χ2 (2) 2.19 9.31 17.05 24.72 48.23 61.38 79.18 93.21 96.66 Mix χ2 (3), χ2 (2) 1.24 6.09 11.44 17.79 39.61 52.77 72.22 89.52 94.48

ση2=2, σq02=4, ν=1

N=500 χ2 (3) 0.77 3.53 7.45 100 100 100 100 100 100 χ2 (2) 1.92 8.62 16.60 100 100 100 100 100 100 Mix χ2 (3), χ2 (2) 1.17 5.24 10.89 100 100 100 100 100 100 N=100 χ2 (3) 0.91 4.18 8.23 77.51 91.48 95.51 99.99 99.99 100 χ2 (2) 2.16 9.09 17.09 86.60 96.11 96.02 99.99 100 100 Mix χ2 (3), χ2 (2) 1.45 6.06 12.57 81.34 93.61 96.99 99.99 100 100

ση2=2, σq02=1, ν=1


ση2=2, σq02=2, ν=2


ση2=2, σq02=2, ν=4

N=500 χ2 (3) 0.73 3.80 7.67 65.14 83.55 90.16 99.98 100 100 χ2 (2) 2.01 8.54 16.30 76.54 91.05 95.36 100 100 100 Mix χ2 (3), χ2 (2) 1.06 5.40 10.48 69.63 87.13 92.90 99.99 100 100 N=100 χ2 (3) 0.93 4.06 8.41 6.96 20.64 31.19 43.89 68.47 78.51 χ2 (2) 2.19 9.31 17.05 13.71 33.12 45.87 58.38 80.14 88.57 Mix χ2 (3), χ2 (2) 1.24 6.09 11.44 9.04 25.56 37.20 49.53 73.34 83.21

Notes: Table 2 reports the size and power of the Likelihood Ration ratio test based on the random effects ML estimation with the p-values from χ2(3), χ2 (2), and a mixture of χ2 (3) and χ2 (2) over 10,000 simulations with the number of observations equal to N and the true value of δ equal to δo for varying values of ση2, σq0

2, and ν. The variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and, the variances, ση2, σqo

2, and ν, respectively. The data are generated without trends and with zero means. The ML estimators are estimated without trends and intercept terms.

Table 3 The Size and Power of the Likelihood Ratio Test

Based on the HPT ML Estimation without Trend: Testing the Joint Hypothesis of Unit Root without Drifts, Ho: 1oδ = and ,o 0HPTω = .

δο

= 1 δο = 0.95 δο

= 0.9 Significance 1% 5% 10% 1% 5% 10% 1% 5% 10%

ση2=2, σq02=2, ν=1

N=500 χ2 (2) 0.63 3.21 6.65 42.38 66.32 76.93 88.02 96.83 98.41 χ2 (1) 2.23 9.89 18.10 60.70 82.16 89.34 95.71 99.04 99.67 Mix χ2 (1), χ2 (2) 0.95 5.03 10.07 48.86 73.04 82.25 91.50 97.86 99.08 N=100 χ2 (2) 0.64 3.57 6.98 4.84 15.37 24.08 14.10 32.67 44.40 χ2 (1) 2.36 10.32 19.00 12.13 30.26 42.60 27.71 52.38 65.11 Mix χ2 (1), χ2 (2) 1.02 5.21 10.53 6.96 20.23 30.58 18.15 39.73 52.78

ση2=2, σq02=4, ν=1

N=500 χ2 (2) 0.63 3.21 6.65 2.69 9.14 16.25 15.74 35.10 46.86 χ2 (1) 2.23 9.89 18.10 7.15 21.75 33.25 30.34 54.60 67.88 Mix χ2 (1), χ2 (2) 0.95 5.03 10.07 3.96 13.09 22.07 20.66 41.92 54.99 N=100 χ2 (2) 0.64 3.57 6.98 1.14 4.99 9.26 2.87 9.78 16.70 χ2 (1) 2.36 10.32 19.00 3.78 12.79 22.59 7.46 22.42 33.90 Mix χ2 (1), χ2 (2) 1.02 5.21 10.53 1.75 7.33 13.14 4.06 13.65 22.73

ση2=2, σq02=1, ν=1

N=500 χ2 (2) 0.63 3.21 6.65 61.86 82.07 89.07 99.91 99.99 100 χ2 (1) 2.23 9.89 18.10 78.39 92.29 96.13 99.97 100 100 Mix χ2 (1), χ2 (2) 0.95 5.03 10.07 68.24 86.54 92.45 99.94 100 100 N=100 χ2 (2) 0.64 3.57 6.98 7.22 20.56 30.40 37.69 62.31 73.95 χ2 (1) 2.36 10.32 19.00 17.05 37.26 50.55 57.00 80.04 88.41 Mix χ2 (1), χ2 (2) 1.02 5.21 10.53 9.88 26.36 37.72 44.76 69.23 80.26

ση2=2, σq02=2, ν=2

N=500 χ2 (2) 0.63 3.21 6.65 61.85 82.07 89.07 99.91 99.99 100 χ2 (1) 2.23 9.89 18.10 78.39 92.29 96.13 99.97 100 100 Mix χ2 (1), χ2 (2) 0.95 5.03 10.07 68.24 86.54 92.45 99.94 100 100 N=100 χ2 (2) 0.64 3.57 6.98 7.22 20.56 30.40 37.69 62.31 73.95 χ2 (1) 2.36 10.32 19.00 17.05 37.26 50.55 57.00 80.04 88.41 Mix χ2 (1), χ2 (2) 1.02 5.21 10.53 9.88 26.36 37.72 44.76 69.23 80.26

ση2=2, σq02=2, ν=4

N=500 χ2 (2) 0.63 3.21 6.65 66.59 85.19 91.31 99.97 100 100 χ2 (1) 2.23 9.89 18.10 81.96 94.09 97.09 100 100 100 Mix χ2 (1), χ2 (2) 0.95 5.03 10.07 72.86 89.05 94.14 99.99 100 100 N=100 χ2 (2) 0.64 3.57 6.98 8.09 22.35 31.90 46.23 70.12 80.62 χ2 (1) 2.36 10.31 18.99 18.20 39.40 52.73 65.17 85.80 92.09 Mix χ2 (1), χ2 (2) 1.02 5.21 10.52 10.97 28.03 39.70 53.45 76.58 86.00

Notes: Table 3 reports the size and power of the likelihood ratio test based on the HPT ML estimation with the p-values from χ2 (2), χ2 (1), and a mixture of χ2 (2) and χ2 (1) over 10,000 simulations with the number of observations equal to N and the true value of δ equal to δo. The variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and the variances, ση2, σq0

2, and ν, respectively. The data are generated without trends and with zero means. The ML estimators are estimated without trends and intercept terms.

Table 4 The Size and Power of the Likelihood Ratio Test Based on the RE ML Estimation without Trend:

Testing the single Hypothesis of Unit Root, Ho: 1oδ = .

δο

= 1 δο = 0.95 δο

= 0.9 Significance 1% 5% 10% 1% 5% 10% 1% 5% 10%

ση2=2, σq02=2, ν=1

N=500 χ2 (1) 0.47 3.01 5.75 0.71 3.92 8.81 4.25 16.11 28.07 Mix χ2 (1), χ2 (0) 1.04 5.75 11.56 1.28 8.81 18.78 7.36 28.07 45.05 N=100 χ2 (1) 0.75 3.05 6.15 0.78 3.60 7.27 1.15 5.95 11.98 Mix χ2 (1), χ2 (0) 1.46 6.15 12.76 1.44 7.27 15.42 2.26 11.98 24.15

ση2=2, σq02=4, ν=1

N=500 χ2 (1) 0.51 2.92 5.78 2.32 9.77 17.39 33.13 57.32 68.83 Mix χ2 (1), χ2 (0) 1.05 5.78 11.63 4.22 17.39 30.43 42.89 68.83 80.21 N=100 χ2 (1) 0.73 3.02 6.06 1.18 5.67 11.19 4.77 16.13 25.87 Mix χ2 (1), χ2 (0) 1.45 6.06 12.57 2.42 11.19 22.29 8.25 25.87 41.81

ση2=2, σq02=1, ν=1

N=500 χ2 (1) 0.51 2.96 5.64 0.45 2.66 6.04 0.89 7.16 15.99 Mix χ2 (1), χ2 (0) 1.03 5.64 10.90 1.05 6.04 13.88 2.12 15.99 33.95 N=100 χ2 (1) 0.77 3.03 6.09 0.67 3.03 6.29 0.64 3.88 8.18 Mix χ2 (1), χ2 (0) 1.41 6.09 12.80 1.35 6.29 13.77 1.28 8.18 18.60

ση2=2, σq02=2, ν=2

N=500 χ2 (1) 0.47 3.01 5.74 0.48 2.78 6.72 1.46 10.08 20.47 Mix χ2 (1), χ2 (0) 1.04 5.74 11.55 1.02 6.72 15.05 3.35 20.47 39.11 N=100 χ2 (1) 0.75 3.05 6.15 0.73 3.12 6.41 0.69 4.36 9.15 Mix χ2 (1), χ2 (0) 1.46 6.15 12.76 1.34 6.41 14.17 1.48 9.15 20.41

ση2=2, σq02=2, ν=4

N=500 χ2 (1) 0.47 3.01 5.74 0.46 2.61 5.98 0.93 8.02 18.39 Mix χ2 (1), χ2 (0) 1.04 5.74 11.54 0.99 5.98 13.91 2.51 18.39 37.15 N=100 χ2 (1) 0.75 3.05 6.15 0.67 3.06 6.21 0.59 3.83 8.65 Mix χ2 (1), χ2 (0) 1.46 6.15 12.76 1.37 6.21 13.71 1.31 8.65 19.55

Notes: Table 4 reports the size and power of the likelihood ratio test based on the random effects ML estimation with the p-values from χ2 (1), and a mixture of χ2 (1) and χ2 (0) over 10,000 simulations with the number of observations equal to N and the true value of δ equal to δo. For each simulation, the variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and the variances, ση2, σq0

2, and ν, respectively. The data are generated without trends and with zero means. The ML estimators are computed without trends and intercept terms.


Based on the HPT ML Estimation without Trend: Testing the single Hypothesis of Unit Root, Ho: 1oδ = .

δο

= 1 δο = 0.95 δο

= 0.9 Significance 1% 5% 10% 1% 5% 1% 1% 5% 10%

ση2=2, =2, ν=1 2

qoσ

N=500 χ2 (1) 0.43 2.67 5.04 0.43 2.51 5.43 0.76 3.62 7.79 Mix χ2 (1), χ2 (0) 0.95 5.04 10.26 0.87 5.43 11.38 1.47 7.79 17.31 N=100 χ2 (1) 0.62 2.60 5.21 0.57 2.69 5.42 0.60 3.12 6.41 Mix χ2 (1), χ2 (0) 1.13 5.21 10.60 1.12 5.42 11.83 1.12 6.41 13.89

ση2=2, =4, ν=1 2

qoσ

N=500 χ2 (1) 0.43 2.67 5.04 0.79 3.54 6.36 6.54 18.31 27.92 Mix χ2 (1), χ2 (0) 0.95 5.04 10.26 1.39 6.36 12.69 10.52 27.92 42.00 N=100 χ2 (1) 0.62 2.60 5.21 0.70 2.82 5.90 1.71 6.81 11.91 Mix χ2 (1), χ2 (0) 1.13 5.21 10.60 1.17 5.90 11.86 3.22 11.91 21.23

ση2=2, =1, ν=1 2

qoσ

N=500 χ2 (1) 0.43 2.67 5.04 0.39 2.49 5.40 0.61 4.93 12.39 Mix χ2 (1), χ2 (0) 0.95 5.04 10.26 0.90 5.40 12.10 1.40 12.39 29.07 N=100 χ2 (1) 0.62 2.60 5.21 0.57 2.65 5.40 0.51 3.29 7.26 Mix χ2 (1), χ2 (0) 1.13 5.21 10.60 1.14 5.40 12.09 1.12 7.26 17.14

ση2=2, =2, ν=2 2

qoσ

N=500 χ2 (1) 0.43 2.67 5.04 0.39 2.49 5.40 0.61 4.93 12.39 Mix χ2 (1), χ2 (0) 0.95 5.04 10.26 0.90 5.40 12.08 1.40 12.39 29.07 N=100 χ2 (1) 0.62 2.60 5.21 0.57 2.65 5.40 0.53 3.22 6.67 Mix χ2 (1), χ2 (0) 1.13 5.21 10.60 1.14 5.40 12.09 1.09 6.67 16.10

ση2=2, =2, ν=4 2

qoσ

N=500 χ2 (1) 0.43 2.67 5.04 0.41 2.41 5.39 0.61 4.93 12.39 Mix χ2 (1), χ2 (0) 0.95 5.04 10.26 0.88 5.39 12.35 1.40 12.39 29.07 N=100 χ2 (1) 0.62 2.59 5.20 0.58 2.64 5.53 0.51 3.29 7.26 Mix χ2 (1), χ2 (0) 1.13 5.20 10.59 1.16 5.53 12.13 1.12 7.26 17.14

Notes: Table 5 reports the size and power of the likelihood ratio test based on the HPT ML estimation with the p-values from χ2 (1), and a mixture of χ2 (1) and χ2 (0) over 10,000 simulations with the number of observations equal to N and the true value of δ equal to δo. For each simulation, the variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and the variances, ση2, σq0

2, and ν, respectively. The data are generated without trends and with zero means. The ML estimators are computed without trends and intercept terms.

Table 6 The Size and Power of the Likelihood Ratio Test Based on

the RE ML Estimation with Homogenous Trends: Testing the Joint Hypothesis of Unit Root with Homogenous Drifts,

Ho: 1oδ = , and 0o o oω ψ β= = = .

δο = 1 δο

= 0.95 δο = 0.9

Significance 1% 5% 10% 1% 5% 10% 1% 5% 10%

ση2=2, σq02=2, ν=1


ση2=2, σq02=4, ν=1

N=500 χ2 (4) 0.64 3.80 7.78 100 100 100 100 100 100 χ2 (3) 1.80 7.63 14.33 100 100 100 100 100 100 Mix χ2 (3), χ2 (4) 1.00 5.06 10.30 100 100 100 100 100 100 N=100 χ2 (4) 0.93 4.74 9.33 72.55 88.84 93.84 99.99 100 100 χ2 (3) 2.32 9.21 16.34 81.82 93.77 96.98 99.99 100 100 Mix χ2 (3), χ2 (4) 1.31 6.45 11.98 76.45 91.07 95.39 99.99 100 100

ση2=2, σq02=1, ν=1


ση2=2, σq02=2, ν=2


ση2=2, σq02=2, ν=4

N=500 χ2 (4) 0.64 3.84 7.70 59.78 80.33 87.95 100 100 100 χ2 (3) 1.66 7.58 14.41 71.11 87.81 93.11 100 100 100 Mix χ2 (3), χ2 (4) 0.99 5.00 10.33 64.59 83.91 90.82 100 100 100 N=100 χ2 (4) 0.88 4.52 8.83 6.58 19.06 29.68 40.53 65.03 76.28 χ2 (3) 2.05 8.67 15.79 11.73 29.35 42.07 52.72 75.95 85.10 Mix χ2 (3), χ2 (4) 1.23 6.18 11.61 8.44 22.86 35.02 45.26 69.73 80.22 Notes: Table 6 reports the size and power of the likelihood ratio test based on the random effects ML estimation with the p-values from χ2 (4), χ2(3) and a mixture of χ2 (4) and χ2 (3) over 10,000 simulations with the number of observations equal to N and the true value of δ equal to δo. The variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and the variances, ση2, σq0

2, and ν, respectively. The trend parameter g is fixed at 0.5. The ML estimators are computed allowing a common time trend and a nonzero intercept.

Table 7 The Biases in the RE ML Estimators

and the Size and Power Properties of the T-Tests: ML Estimation with Heterogeneous Trends

Rejections of Ho: δ = δο Rejections of Ho: δ = 1 Mean Bias MSE 1% 5% 10% 1% 5% 10%

2gσ = 0 N = 500

δο = 0.5 0.5000 0.0000 0.0002 0.96 5.13 10.18 100 100 100 δο = 0.8 0.8003 0.0003 0.0013 0.99 5.67 10.28 99.78 99.91 100 δο = 0.9 0.8895 -0.0105 0.0199 7.82 14.23 19.14 14.40 22.06 28.05 δο = 1 0.9842 -0.0158 0.0122 16.23 21.85 28.43 16.23 21.85 28.43 N = 100 δο = 0.5 0.5001 0.0001 0.0004 0.96 5.24 10.24 100 100 100 δο = 0.8 0.8037 0.0037 0.0073 1.07 5.55 10.03 51.67 69.33 78.93 δο = 0.9 0.8612 0.0388 0.0423 11.34 19.35 25.03 18.65 26.14 29.85 δο = 1 0.9701 0.0299 0.0503 16.14 24.42 29.68 16.14 24.42 29.68

2gσ = 0.5 N = 500

δο = 0.5 0.4999 -0.0001 0.0004 1.06 5.11 10.22 100 100 100 δο = 0.8 0.8003 0.0003 0.0032 1.09 5.83 12.01 98.32 99.25 99.45 δο = 0.9 0.8962 -0.0028 0.0265 10.02 16.81 23.05 17.47 25.21 31.27 δο = 1 0.9920 -0.0080 0.0247 14.12 20.95 26.78 14.12 20.95 26.78 N = 100 δο = 0.5 0.4993 -0.0007 0.0005 1.11 5.34 10.56 100 100 100 δο = 0.8 0.8003 0.0003 0.0032 1.25 6.01 11.43 54.22 70.12 80.45 δο = 0.9 0.8862 -0.0128 0.0265 12.45 18.23 26.52 19.52 28.53 35.52 δο = 1 0.9920 -0.0080 0.0247 16.53 25.32 30.29 16.53 25.32 30.29

2gσ = 1 N = 500

δο = 0.5 0.4998 -0.0002 0.0001 1.04 5.09 10.46 100 100 100 δο = 0.8 0.8002 0.0002 0.0014 2.42 7.01 12.69 97.35 97.99 98.10 δο = 0.9 0.9142 0.0142 0.0188 8.78 16.04 22.01 19.05 29.68 34.44 δο = 1 0.9978 -0.0022 0.0034 18.09 23.76 29.05 18.09 23.76 29.05 N = 100 δο = 0.5 0.4997 -0.0003 0.0003 1.12 5.34 10.45 100 100 100 δο = 0.8 0.8005 0.0005 0.0042 1.14 5.78 11.49 100 100 100 δο = 0.9 0.8829 -0.0171 0.0292 9.12 17.42 24.56 18.31 30.45 37.66 δο = 1 0.9904 -0.0096 0.0312 15.12 21.45 28.52 15.12 21.45 28.52

Notes: Table 7 reports the mean, bias and mean square error of 10,000 simulations on the random effects ML estimator with the number of observations equal to N and the true value of δ equal to δo along with the rejection rate of the t-test with significance levels of 1%, 5% and 10%. The estimators are computed allowing heterogeneous time trends. For each simulation, the variables, ηi, qi0, and εit, are generated from the normal distributions with zero means and the variances, ση2, σq0

2, ν, respectively. The trend parameters gi are drawn from the normal distributions N(0.5,σg

2) with three different values of σg2: 0, 0.5, and 1.

Table 8

The Size and Power of the Likelihood Ratio Test Based on the RE ML Estimation with Heterogeneous Trends:

Testing the Joint Hypothesis of Unit Root with Heterogeneous Drifts, Ho: 1oδ = and 12, 22, 2, 2, 0o o o oaω ω ψ= = = = .

δο

= 1 δο = 0.95 δο

= 0.9 Significance 1% 5% 10% 1% 5% 10% 1% 5% 10%

2gσ = 0


2gσ = 0.5

N=500 χ2 (5) 0.62 4.17 9.42 91.24 96.22 100 100 100 100 χ2 (4) 1.62 8.79 16.82 95.23 98.23 100 100 100 100 Mix χ2 (4), χ2 (5) 0.90 5.51 12.66 92.35 97.52 100 100 100 100 N=100 χ2 (5) 0.73 4.26 9.89 14.54 36.45 52.32 68.63 87.94 93.44 χ2 (4) 1.82 9.02 16.80 23.27 48.26 60.32 79.22 94.55 98.22 Mix χ2 (4), χ2 (5) 1.04 6.21 13.21 18.46 40.25 69.45 72.33 91.54 96.33

2gσ = 1

N=500 χ2 (5) 1.02 4.13 7.89 90.21 97.33 100 100 100 100 χ2 (4) 1.67 7.43 14.22 96.52 98.55 100 100 100 100 Mix χ2 (4), χ2 (5) 1.24 5.63 10.28 93.55 98.04 100 100 100 100 N=100 χ2 (5) 0.86 4.56 8.21 13.18 37.44 51.11 70.22 88.23 95.21 χ2 (4) 1.89 7.23 15.32 24.26 44.57 61.22 80.32 94.22 100 Mix χ2 (4), χ2 (5) 1.10 5.12 13.31 18.48 41.71 56.22 75.42 91.55 97.43 Notes: Table 8 reports the size and power of the Likelihood Ratio test based on the random effects ML estimation with the p-values from χ2 (4), χ2(5), and a mixture of χ2 (4) and χ2 (5) over 10,000 simulations with the number of observations equal to N and the true value of δ equal to δo. The estimators are computed allowing heterogeneous time trends. For each simulation, the variables, ηi, qi0, and εit are generated from the normal distributions with zero means and the variances, ση2 = 2, σq0

2 =2, and ν =1, respectively. The trend parameters gi are drawn from the normal distributions N(0.5,σg



Based on the RE ML Estimation with Heterogeneous Trends: Testing the Single Hypothesis of Unit Root, Ho: 1oδ = .

δο

= 1 δο = 0.95 δο

= 0.9 Significance 1% 5% 10% 1% 5% 10% 1% 5% 10%

2gσ = 0

N=500 χ2 (1) 0.62 3.01 5.12 1.31 3.52 6.83 1.46 4.46 8.81 Mix χ2 (1), χ2 (0) 1.03 5.12 10.43 1.94 6.83 12.94 2.41 8.81 15.57 N=100 χ2 (1) 0.82 3.44 5.45 1.43 4.67 7.23 1.64 5.03 9.12 Mix χ2 (1), χ2 (0) 1.21 5.45 10.72 1.98 7.23 13.45 2.01 9.12 18.23

2gσ = 0.5

N=500 χ2 (1) 0.72 3.62 6.43 0.82 3.86 7.12 1.00 4.41 7.92 Mix χ2 (1), χ2 (0) 1.68 6.43 12.93 1.51 7.14 13.45 1.72 7.92 15.32 N=100 χ2 (1) 0.74 4.01 7.23 0.92 4.13 8.03 0.92 4.23 8.02 Mix χ2 (1), χ2 (0) 1.42 7.23 13.04 1.62 8.02 14.52 1.91 8.02 16.32

2gσ = 1

N=500 χ2 (1) 0.74 3.14 7.43 0.89 4.13 8.56 0.92 4.38 9.38 Mix χ2 (1), χ2 (0) 1.56 7.43 15.83 1.61 8.58 15.99 1.82 9.38 18.67 N=100 χ2 (1) 0.77 4.18 8.15 0.94 4.19 8.58 0.85 4.11 8.81 Mix χ2 (1), χ2 (0) 1.58 8.64 15.39 1.58 8.58 17.95 1.64 8.81 18.02 Notes: Table 9 reports the size and power of the Likelihood Ratio test based on the random effects ML estimation with the p-values from χ2 (1), and a mixture of χ2 (1) and χ2 (0) over 10,000 simulations with the number of observations equal to N and the true value of δ equal to δo. The estimators are computed allowing heterogeneous time trends. For each simulation, the variables, ηi, qi0, and εit are generated from the normal distributions with zero means and the variances, ση2 = 2, σq0

2 =2, and ν =1, respectively. The trend parameters gi are drawn from the normal distributions N(0.5,σg


Documents

Likelihood Based Inference for Dynamic Panel Data Modelsminiahn/archive/ahn_thomas_2006.pdf · seminar at Rice University, the Eighth Annual Texas Econometrics Camp, the University