Panel Data
Panel data refers to datasets with both cross-sectional and time-series variation
We can think of this either as:
- repeated observations for the same cross-section of observation units in
different time periods (months, years, etc.);
or as:
- time-series data for multiple observation units (households, firms, etc.)
Panel data is also known as longitudinal data
Some examples of panel datasets
Household surveys
- US Panel Study of Income Dynamics (PSID) or Health and Retirement
Study (HRS)
- British Household Panel Survey (BHPS) or English Longitudinal Study
of Ageing (ELSA)
Company accounts
- Compustat or Worldscope databases for publicly traded firms
- Bureau van Dijk datasets (FAME, AMADEUS, ORBIS) for all firms,
including subsidiaries
‘Census’ of production microdata for business establishments (‘plants’)
- US Longitudinal Research Database (LRD) or UK Annual Respondents
Database (ARD)
Sectoral/regional panel datasets
- sectoral national accounts or regional accounts for individual countries
- European System of national and regional Accounts (ESA, regional data
for EU countries); OECD STAN or EU KLEMS databases (sectoral data for
OECD/EU countries)
Country panel datasets
- World Bank World Development Indicators (WDI) or Penn World Tables
Panel datasets come in all shapes and sizes
We draw a broad distinction between panels in which the number of time periods is large enough to consider estimating separate time series models for each observation unit, and to rely on asymptotic properties derived by letting the number of time periods (T) go to infinity (long T panels), and panels in which the number of time periods is too small for this approach to be viable (short T panels)
For short T panels with many cross-sectional observations, we rely instead on asymptotic properties derived by considering the number of cross-section observation units (N) going to infinity
Datasets with a small number of observation units observed for a small number of time periods are more problematic, and it may not be appropriate to rely on any asymptotic properties to characterize the behaviour of estimators and test statistics in this case
For long T panels, a natural starting point is to estimate a separate time series model for each observation unit
We may then want to summarize the results for the panel as a whole by considering means or medians of the estimated parameters, or other summary statistics
- taking simple averages of the estimated parameters in N linear regression models estimated by OLS gives the Mean Group estimator, due to Pesaran and Smith (1995)
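As a rough illustration of the Mean Group idea, the Python sketch below fits a separate OLS regression of y on a single regressor x (with an intercept) for each observation unit and then averages the slope estimates. The data layout (a dict mapping each unit to its x and y series), the function names and the numbers are hypothetical, and the sketch reports point estimates only.

```python
from statistics import mean

def ols_slope(x, y):
    """OLS slope from regressing y on x with an intercept (one unit's time series)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def mean_group_slope(panel):
    """Simple average of unit-by-unit OLS slopes; panel maps unit id -> (x_list, y_list)."""
    slopes = [ols_slope(x, y) for x, y in panel.values()]
    return mean(slopes), slopes

# Hypothetical long-T panel with N = 2 units, each with its own intercept and slope.
panel = {
    "unit_a": ([1, 2, 3, 4, 5], [2.0, 4.0, 6.0, 8.0, 10.0]),   # slope 2, intercept 0
    "unit_b": ([1, 2, 3, 4, 5], [5.0, 9.0, 13.0, 17.0, 21.0]),  # slope 4, intercept 1
}
mg, slopes = mean_group_slope(panel)
print(slopes, mg)  # [2.0, 4.0] 3.0
```

With heterogeneous slopes like these, the Mean Group estimate summarizes the average slope across units rather than imposing a common one.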
Alternatively we may want to test and impose restrictions that some or all
of the estimated parameters are common to all the observation units
- such restrictions are known as ‘pooling’ restrictions, and estimators which impose them are referred to as ‘pooled’ estimators
We will look in a little more detail at some of the methods proposed for
long T panels later in the course, but our main focus will be on methods for
short T panels
Short T Panels
Having data for a small number of time periods limits our ability to allow
for heterogeneity across observation units in all parameters of the model
But we can still allow for unrestricted heterogeneity in the intercepts of
linear models, which in some contexts can be an important advantage over
cross-section datasets in controlling for omitted variables
Panel data is particularly useful when both:
- outcome variables and explanatory variables of interest vary (non-trivially)
over time
- important omitted variables are plausibly time-invariant (or very nearly)
In this setting, allowing for individual-specific intercepts controls for time-invariant omitted variables, while parameters of interest can still be estimated by exploiting the variation over time in y and X
One way to think about panel data in this setting is that it provides us with observations on the outcome y ‘before and after’ some change in the explanatory variable X
Examples
For micro production functions, measures of inputs and output generally vary over time, while hard-to-measure factors like technology and management quality may have important time-invariant components
For empirical growth models, measures of GDP per capita and investment
rates generally vary over time, while historical and geographical determi-
nants of income levels are time-invariant, and hard-to-measure factors like
institutional quality may have an important time-invariant component
Warning
Having repeated observations over time is much less useful if the outcomes
or explanatory variables of interest have little or no variation over time
This should be intuitive - having repeated observations on things that don’t
change adds no information over having an observation at one point in time
For this reason, panel data is less useful if we are interested in the effect of
education on earnings
- earnings vary over time, but for most working individuals observed in
labour force surveys, measures of educational attainment vary very little
once they have completed full-time education and entered the labour market
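One rough way to see this is to measure how much of a variable's total variation occurs over time within observation units, as opposed to between units. The Python sketch below computes that share for a small made-up panel; the function name, variables and values are hypothetical, chosen only to mimic earnings (which change over time) and completed education (which does not).

```python
from statistics import mean

def within_share(panel):
    """Share of the total sum of squares due to variation over time within units.

    panel maps unit id -> list of one variable's values over the T periods.
    """
    all_values = [v for series in panel.values() for v in series]
    grand_mean = mean(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    # Deviations of each observation from its own unit's time average.
    within_ss = sum((v - mean(series)) ** 2 for series in panel.values() for v in series)
    return within_ss / total_ss

# Hypothetical panel of 3 workers observed for T = 3 years.
earnings = {1: [30.0, 32.0, 35.0], 2: [40.0, 41.0, 45.0], 3: [25.0, 27.0, 26.0]}
education = {1: [12, 12, 12], 2: [16, 16, 16], 3: [11, 11, 11]}

print(within_share(earnings))   # strictly between 0 and 1: earnings change over time
print(within_share(education))  # 0.0: education is constant within each worker
```

A share of zero means the repeated observations on that variable carry no information beyond a single cross-section.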
Types of Variables and Error Components
With panel data, we distinguish between 3 types of observed variables
i) those that vary both across individual observation units and over time (for example, sales of firm i in year t are different from sales of the same firm i in year $t-1$, and different from sales of firm $j \neq i$ in year t)
ii) those that vary across individual observation units but are constant over
time for each observation unit (for example, characteristics of individuals like
race or gender)
iii) those that vary over time but are the same for all observation units
in a given time period (for example, the exchange rate between countries A
and B in a model of exports to country B for a panel of exporting firms in
country A)
For observed variables, we reflect this in our notation by using 2 subscripts for variables which vary over both dimensions of the panel, and only 1 subscript for variables which vary only over one dimension
For example: $SALES_{it}$ denotes sales of firm i in year t; $MALE_i$ denotes an indicator for whether individual i is male or female; $EXCHRATE_t$ denotes the exchange rate between countries A and B in year t
Similarly we can decompose the error term in linear models, or unobserved
components of models more generally, into a component which varies across
individuals but not over time, a component which varies over time but not
across individuals, and a remaining component which varies both across
individuals and over time
- and use the same notational convention to distinguish between these three
error components
With this notation, the most general linear model we could consider can
be written in the form:
$$y_{it} = x_{it}\beta + w_i\gamma + s_t\delta + (\eta_i + f_t + v_{it})$$
for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$, where
$y_{it}$ is a scalar outcome variable for individual i at time t
$x_{it}$ is a row vector containing observations on a set of explanatory variables that vary over individuals and time
$w_i$ is a row vector containing observations on a set of time-invariant explanatory variables
$s_t$ is a row vector containing observations on a set of common, time-varying explanatory variables
β, γ and δ are the corresponding column vectors of parameters
and the error term is decomposed into an individual-specific time-invariant component ($\eta_i$), a common time-varying component ($f_t$) and a residual component ($v_{it}$) which varies both across individuals and over time
This version of the linear model for panel data is called the three-way error components model (whether or not we include all 3 types of observed explanatory variables)
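To make the notation concrete, the Python sketch below draws data from the three-way error components model with one variable of each observed type ($x_{it}$, $w_i$, $s_t$). The standard normal draws, the parameter values and the function name `simulate_panel` are assumptions for illustration only; note that the code indexes i and t from 0 rather than 1.

```python
import random

def simulate_panel(N, T, beta, gamma, delta, seed=0):
    """Draw rows (i, t, y, x, w, s) from a three-way error components model
    with one regressor of each type and standard normal components (hypothetical DGP)."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(N)]     # w_i: varies over i only
    eta = [rng.gauss(0, 1) for _ in range(N)]   # eta_i: individual-specific effect
    s = [rng.gauss(0, 1) for _ in range(T)]     # s_t: varies over t only
    f = [rng.gauss(0, 1) for _ in range(T)]     # f_t: common time-varying effect
    rows = []
    for i in range(N):
        for t in range(T):
            x = rng.gauss(0, 1)                 # x_it: varies over i and t
            v = rng.gauss(0, 1)                 # v_it: residual error component
            y = x * beta + w[i] * gamma + s[t] * delta + (eta[i] + f[t] + v)
            rows.append((i, t, y, x, w[i], s[t]))
    return rows

rows = simulate_panel(N=4, T=3, beta=1.0, gamma=0.5, delta=-0.3)
print(len(rows))  # 12 rows (N * T), ordered by i and then by t
```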
Time Dummies
Since our focus is on short T panels, we can allow for the common time-varying component of the error term ($f_t$) simply by specifying period-specific intercepts; this component of the error term simply indicates that the intercept in our linear model may take different values in different time periods
First combine the common time-varying component of the error term with any common time-varying explanatory variables included in the original model to form $\phi_t = s_t\delta + f_t$, and re-write the original model as
$$y_{it} = x_{it}\beta + w_i\gamma + (\eta_i + \phi_t + v_{it})$$
Now define a set of T dummy variables with:
$D^1_t = 1$ for observations in period 1, and $D^1_t = 0$ otherwise
$D^2_t = 1$ for observations in period 2, and $D^2_t = 0$ otherwise
$\ldots$
$D^T_t = 1$ for observations in period T, and $D^T_t = 0$ otherwise
And re-write the model in the form
$$\begin{aligned} y_{it} &= x_{it}\beta + w_i\gamma + \phi_1 D^1_t + \phi_2 D^2_t + \ldots + \phi_T D^T_t + (\eta_i + v_{it}) \\ &= x_{it}\beta + w_i\gamma + \sum_{s=1}^{T} \phi_s D^s_t + (\eta_i + v_{it}) \end{aligned}$$
This is now a two-way error components model with period-specific intercepts
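As a minimal sketch of how the time dummies could be constructed, assuming the observations are ordered by individual and then by time and that each observation's period is recorded as an integer from 1 to T, the hypothetical helper below returns one row of $(D^1_t, \ldots, D^T_t)$ per observation.

```python
def time_dummies(periods, T):
    """Return, for each observation, the row of T time dummies (D^1_t, ..., D^T_t).

    periods lists the time period (an integer from 1 to T) of each observation.
    """
    return [[1 if t == s else 0 for s in range(1, T + 1)] for t in periods]

# Observations ordered by individual, then time: N = 2 individuals, T = 3 periods.
periods = [1, 2, 3, 1, 2, 3]
D = time_dummies(periods, T=3)
for row in D:
    print(row)
# Each row contains a single 1, in the column for that observation's period.
```

These columns would then be included alongside $x_{it}$ and $w_i$ as regressors, with no separate common intercept.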
We estimate the T additional parameters $(\phi_1, \phi_2, \ldots, \phi_T)$ together with the original β and γ parameter vectors
Since the number of time periods T is small, this does not present any problems
The dummy variables $(D^1_t, D^2_t, \ldots, D^T_t)$ are known as time dummies (or, with annual observations, year dummies)
There is no loss of generality with this approach, although we lose identification of the δ parameters on any observed common time-varying explanatory variables
With short T panels, this is usually not a concern, as short T panels are in any case not ideally suited to estimating the effects of such common time-varying covariates
Having estimated the $\phi_t$ parameters, we have the option of plotting these as a time series and investigating whether they vary with the business cycle, or have some other temporal pattern
The model with a full set of T time dummies is said to be ‘saturated’ in the time dimension
We have observations on T time periods, so only T degrees of freedom in
the time dimension; here we estimate T parameters on included variables
which vary only over time, and this is the maximum number of parameters
we could estimate on explanatory variables of this type
If we tried to add one or more additional common time-varying explanatory
variables, this would show up as a perfect multicollinearity problem
This also indicates that if we omit time dummies, we can include at most T common time-varying explanatory variables in the vector $s_t$
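The perfect multicollinearity can be checked directly: any common time-varying variable can be written as $s_t = \sum_{k=1}^{T} s_k D^k_t$, so its column is an exact linear combination of the T dummy columns. The Python sketch below verifies this for a hypothetical common variable (think of the exchange rate example) with made-up values.

```python
def time_dummies(periods, T):
    """Rows of T time dummies, one row per observation (periods are integers 1..T)."""
    return [[1 if t == s else 0 for s in range(1, T + 1)] for t in periods]

T = 3
periods = [1, 2, 3, 1, 2, 3]              # N = 2 individuals, T = 3 periods
s_by_period = {1: 0.8, 2: 1.1, 3: 0.9}    # hypothetical common variable s_t
s_column = [s_by_period[t] for t in periods]

D = time_dummies(periods, T)
# Weight each dummy column by the value of s_t in that period and add them up.
combo = [sum(s_by_period[k + 1] * row[k] for k in range(T)) for row in D]
print(combo == s_column)  # True: s_t adds no independent column once all T dummies are included
```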
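Having shown that it is easy to allow for period-specific intercepts, we will now suppress these and focus on the two-way error components model
$$y_{it} = x_{it}\beta + w_i\gamma + (\eta_i + v_{it})$$
for $i = 1, \ldots, N$ and $t = 1, \ldots, T$, where
$$\underset{1\times K}{x_{it}} = (x_{1it}, \ldots, x_{Kit}), \quad \underset{K\times 1}{\beta} = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}, \quad \underset{1\times G}{w_i} = (w_{1i}, \ldots, w_{Gi}), \quad \underset{G\times 1}{\gamma} = \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_G \end{pmatrix}$$
and $y_{it}$, $\eta_i$, and $v_{it}$ are scalars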
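Before studying estimation, we introduce some further notation
First stack the T observations for each individual
$$y_i = X_i\beta + W_i\gamma + (\eta_i j_T + v_i)$$
for $i = 1, \ldots, N$, where
$$\underset{T\times 1}{y_i} = \begin{pmatrix} y_{i1} \\ \vdots \\ y_{iT} \end{pmatrix}, \quad \underset{T\times K}{X_i} = \begin{pmatrix} x_{1i1} & \cdots & x_{Ki1} \\ \vdots & \ddots & \vdots \\ x_{1iT} & \cdots & x_{KiT} \end{pmatrix}, \quad \underset{T\times G}{W_i} = \begin{pmatrix} w_{1i} & \cdots & w_{Gi} \\ \vdots & \ddots & \vdots \\ w_{1i} & \cdots & w_{Gi} \end{pmatrix}, \quad \underset{T\times 1}{\eta_i j_T} = \begin{pmatrix} \eta_i \\ \vdots \\ \eta_i \end{pmatrix}, \quad \underset{T\times 1}{v_i} = \begin{pmatrix} v_{i1} \\ \vdots \\ v_{iT} \end{pmatrix}$$
$j_T$ is a $T \times 1$ column vector with each element equal to one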
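Then stack over the N individuals
$$y = X\beta + W\gamma + (\eta + v)$$
$$\underset{NT\times 1}{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad \underset{NT\times K}{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad \underset{NT\times G}{W} = \begin{pmatrix} W_1 \\ \vdots \\ W_N \end{pmatrix}, \quad \underset{NT\times 1}{\eta} = \begin{pmatrix} \eta_1 j_T \\ \vdots \\ \eta_N j_T \end{pmatrix}, \quad \underset{NT\times 1}{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix}$$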
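The stacking convention (all T observations on individual 1, then all T observations on individual 2, and so on) can be sketched in Python as below; the helper names and values are hypothetical. With 1-based indices, observation (i, t) sits in row $(i-1)T + t$ of the stacked vectors, which the code converts to a 0-based list position.

```python
def stack(blocks):
    """Stack per-individual blocks [b_1, ..., b_N] (each a list of T entries) into one NT list."""
    return [entry for block in blocks for entry in block]

def row_index(i, t, T):
    """Zero-based position of observation (i, t) in the stacked vector, for 1-based i and t."""
    return (i - 1) * T + (t - 1)

T = 3
y_blocks = [[1.0, 1.5, 2.0], [0.5, 0.7, 0.9]]         # y_1 and y_2 (hypothetical values)
eta = [0.3, -0.2]                                      # eta_1 and eta_2 (hypothetical values)

y = stack(y_blocks)
eta_stacked = stack([[eta_i] * T for eta_i in eta])    # eta_i * j_T for each i, then stacked

print(y[row_index(2, 1, T)])  # 0.5, the first observation on individual 2
print(eta_stacked)            # [0.3, 0.3, 0.3, -0.2, -0.2, -0.2]
```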
As discussed, panel data is most useful for estimating parameters in β on
time-varying explanatory variables. We now simplify further by omitting
consideration of time-invariant explanatory variables
If our parameters of interest are in β, this is again without loss of generality, since we can combine $w_i\gamma + \eta_i = \eta_i^*$, which just re-defines the individual-specific component of the error term. Setting $G = 0$ gives
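$$y = X\beta + (\eta + v)$$
$$\underset{NT\times 1}{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad \underset{NT\times K}{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad \underset{NT\times 1}{\eta} = \begin{pmatrix} \eta_1 j_T \\ \vdots \\ \eta_N j_T \end{pmatrix}, \quad \underset{NT\times 1}{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix}$$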
Two important assumptions that we maintain for short T panels:
$$y_i = X_i\beta + (\eta_i j_T + v_i)$$
Cross-sectional independence: Observations on $(y_i, X_i)$ are independent over $i = 1, \ldots, N$
Slope parameter homogeneity: The parameters in β are common to all $i = 1, \ldots, N$
The form of unobserved heterogeneity that we address relates to the individual-specific intercept terms ($\eta_i$) in our linear model relating $y_{it}$ to $x_{it}$ (known as ‘fixed effects’ or ‘random effects’, depending on whether they are assumed to be correlated or uncorrelated with the explanatory variables in $x_{it}$)
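$$\begin{aligned} y_{it} &= x_{it}\beta + (\eta_i + v_{it}) \\ &= x_{it}\beta + u_{it} \quad \text{for } i = 1, \ldots, N \text{ and } t = 1, \ldots, T \end{aligned}$$
$$y = X\beta + (\eta + v) = X\beta + u$$
$$u_{it} = \eta_i + v_{it}; \quad u = \eta + v$$
We can now define the (pooled) ordinary least squares estimator of the parameter vector β
$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y$$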
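The formula can be sketched in plain Python by solving the normal equations $(X'X)b = X'y$ rather than forming the inverse explicitly, which gives the same answer when X has full column rank. This is a self-contained illustration under stated assumptions (rows of X stacked as above, here including a column of ones as an intercept, and a hypothetical function name); in practice a numerical linear algebra library would be used.

```python
def pooled_ols(X, y):
    """Pooled OLS: solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.

    X is a list of NT rows (each a list of K regressors), y a list of NT outcomes,
    both stacked by individual and then by time.
    """
    K = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(K)] for a in range(K)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(K)]
    A = [XtX[a] + [Xty[a]] for a in range(K)]        # augmented matrix [X'X | X'y]
    for col in range(K):
        pivot = max(range(col, K), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: X does not have full column rank")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(K):
            if r != col:                             # eliminate this column from other rows
                factor = A[r][col] / A[col][col]
                A[r] = [a_r - factor * a_c for a_r, a_c in zip(A[r], A[col])]
    return [A[a][K] / A[a][a] for a in range(K)]

# Hypothetical stacked data: N = 2 individuals, T = 3 periods, y = 1 + 2x exactly.
x = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0]
X = [[1.0, xi] for xi in x]      # column of ones (intercept) and the regressor
y = [1.0 + 2.0 * xi for xi in x]
print(pooled_ols(X, y))          # approximately [1.0, 2.0]
```

The explicit singularity check mirrors the requirement that $(X'X)^{-1}$ must exist for the estimator to be computed.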
Properties of the (pooled) OLS estimator
We assume that both error components have expected values of zero, so that we have $E(u_{it}) = E(\eta_i) + E(v_{it}) = 0$. Note that $E(\eta_i)$ is defined over the individual observation units, while $E(v_{it})$ and $E(u_{it})$ are defined over individual observation units and time periods
We also assume that there is no correlation between (any of the explanatory variables in) $x_{it}$ and the time-varying component of the error term $v_{it}$
- otherwise we would have a source of simultaneity, and the OLS estimator would be inconsistent
In the panel data context, we say that the explanatory variables in $x_{it}$ are predetermined with respect to $v_{it}$
- we do not rule out the possibility that current $x_{it}$ may be correlated with lagged values $v_{i,t-k}$ for some $k > 0$
Assumption ($x_{it}$ predetermined)
$$E(x_{it}v_{it}) = 0 \quad \text{for } t = 1, 2, \ldots, T$$
Whether or not the OLS estimator of β is consistent then depends on whether or not (any of the explanatory variables in) $x_{it}$ are correlated with the time-invariant component of the error term, or the individual-specific effects, $\eta_i$
Under the further assumption that all of the explanatory variables in $x_{it}$ are uncorrelated with the individual-specific effects, we satisfy all of the conditions required to establish that $\hat{\beta}_{OLS}$ is a consistent estimator of β (provided that X has full column rank, so that $(X'X)^{-1}$ exists and the OLS estimator can be computed)
Assumption (uncorrelated individual effects, or ‘random effects’)
$$E(x_{it}\eta_i) = 0 \quad \text{for } t = 1, 2, \ldots, T$$
Under these assumptions, we have $E(x_{it}u_{it}) = E(x_{it}\eta_i) + E(x_{it}v_{it}) = 0$, so that we have the key orthogonality condition needed for OLS to be consistent: $\hat{\beta}_{OLS} \xrightarrow{p} \beta$ as $NT \to \infty$
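A small Monte Carlo experiment can illustrate this result. The Python sketch below assumes a hypothetical data-generating process with one regressor, $x_{it}$ drawn independently of $\eta_i$, standard normal components and β = 1, and computes the pooled OLS slope (with an intercept) for increasing N at fixed T = 3.

```python
import random

def pooled_slope(x, y):
    """Pooled OLS slope of y on x (with an intercept), using all NT observations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def simulate(N, T, beta, seed):
    """Hypothetical DGP: y_it = beta * x_it + eta_i + v_it, with x_it independent of eta_i."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(N):
        eta_i = rng.gauss(0, 1)
        for _ in range(T):
            x_it = rng.gauss(0, 1)      # drawn independently of eta_i ('random effects')
            v_it = rng.gauss(0, 1)
            x.append(x_it)
            y.append(beta * x_it + eta_i + v_it)
    return x, y

beta = 1.0
for N in (50, 500, 20000):              # large N, fixed T = 3
    x, y = simulate(N, T=3, beta=beta, seed=1)
    print(N, round(pooled_slope(x, y), 3))  # estimates settle near beta as N grows
```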
This consistency result holds as the total sample size (NT) goes to infinity
Importantly, it also holds in the semi-asymptotic sense, letting $N \to \infty$ with T held fixed (large N, fixed T asymptotics), which is more useful for approximating the behaviour of estimators in short T panels
Notice that the error term $u_{it} = \eta_i + v_{it}$ is serially correlated; even if the time-varying component $v_{it}$ is serially uncorrelated, we have $u_{it} = \eta_i + v_{it}$ and $u_{i,t-1} = \eta_i + v_{i,t-1}$, and these two errors are positively correlated through the common $\eta_i$ component
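Under the additional assumptions that $\eta_i$ and $v_{it}$ are independent of each other, with variances $\sigma_\eta^2$ and $\sigma_v^2$, and that $v_{it}$ is serially uncorrelated, $\text{Corr}(u_{it}, u_{is}) = \sigma_\eta^2 / (\sigma_\eta^2 + \sigma_v^2)$ for any $t \neq s$. The Python sketch below checks this by simulation; the normal distributions, the variances and the function names are assumptions for illustration.

```python
import random

def simulate_errors(N, T, sigma_eta, sigma_v, seed):
    """Draw u_it = eta_i + v_it, with eta_i and serially uncorrelated v_it independent."""
    rng = random.Random(seed)
    u = []
    for _ in range(N):
        eta_i = rng.gauss(0, sigma_eta)
        u.append([eta_i + rng.gauss(0, sigma_v) for _ in range(T)])
    return u

def corr_adjacent(u):
    """Sample correlation between u_it and u_i,t-1, pooling all adjacent-period pairs."""
    pairs = [(row[t], row[t - 1]) for row in u for t in range(1, len(row))]
    a = [p[0] for p in pairs]
    b = [p[1] for p in pairs]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in pairs)
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

sigma_eta, sigma_v = 1.0, 1.0
u = simulate_errors(N=20000, T=4, sigma_eta=sigma_eta, sigma_v=sigma_v, seed=2)
theory = sigma_eta ** 2 / (sigma_eta ** 2 + sigma_v ** 2)   # 0.5 with these values
print(round(corr_adjacent(u), 3), theory)
```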
Consequently valid inference for the (pooled) OLS estimator in panel data
models with individual-specific effects requires the use of cluster-robust
standard errors and test statistics
- clustering on the identifier for individual observation units (the i subscript
variable) allows for this correlation between the error terms for the same
individual in different time periods
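A minimal sketch of a cluster-robust standard error for the slope in a pooled OLS regression of y on a single regressor x (with an intercept) is given below. It uses the usual sandwich form, summing the products of the demeaned regressor and the residuals within each individual before squaring, and omits the finite-sample adjustments that statistical packages typically apply. The function name and the simulated data (where x has a persistent individual component, independent of $\eta_i$) are hypothetical.

```python
import random

def cluster_robust_slope(x, y, ids):
    """Pooled OLS slope of y on x (with an intercept) and its cluster-robust standard error.

    ids holds the individual identifier of each observation: errors may be correlated
    within an individual but are assumed independent across individuals.
    No finite-sample degrees-of-freedom adjustment is applied.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xd = [a - x_bar for a in x]                 # regressor in deviations from its mean
    sxx = sum(a * a for a in xd)
    b = sum(a * (c - y_bar) for a, c in zip(xd, y)) / sxx
    a0 = y_bar - b * x_bar                      # estimated intercept
    resid = [c - a0 - b * a for a, c in zip(x, y)]
    # Add up (demeaned x) * residual within each individual, then square the totals.
    score = {}
    for g, a, e in zip(ids, xd, resid):
        score[g] = score.get(g, 0.0) + a * e
    var_b = sum(s * s for s in score.values()) / sxx ** 2
    return b, var_b ** 0.5

# Hypothetical data: N = 2000 individuals, T = 5 periods, beta = 1.
rng = random.Random(3)
x, y, ids = [], [], []
for i in range(2000):
    eta_i = rng.gauss(0, 1)
    a_i = rng.gauss(0, 1)                       # persistent part of x_it, independent of eta_i
    for t in range(5):
        x_it = a_i + rng.gauss(0, 1)
        x.append(x_it)
        y.append(1.0 * x_it + eta_i + rng.gauss(0, 1))
        ids.append(i)

b, se_cluster = cluster_robust_slope(x, y, ids)

# Conventional standard error that treats all NT errors as uncorrelated, for comparison.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
ssr = sum((c - (y_bar - b * x_bar) - b * a) ** 2 for a, c in zip(x, y))
se_iid = (ssr / (n - 2) / sxx) ** 0.5
print(round(b, 3), round(se_cluster, 4), round(se_iid, 4))
```

In this set-up both the regressor and the error are correlated within individuals, so the clustered standard error should come out noticeably larger than the conventional one, which understates the sampling variability of the pooled OLS estimate.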
Another consequence of this serial correlation is that the OLS estimator of β is not efficient in panel data models with individual-specific effects
- we will discuss the efficient estimator of β in a particular version of the linear model with uncorrelated individual effects later in the course
The assumption that all of the explanatory variables in $x_{it}$ are uncorrelated with the individual-specific effects requires that all of the included explanatory variables are uncorrelated with any time-invariant omitted variables whose influence on $y_{it}$ is reflected in the value of $\eta_i$
This assumption is highly restrictive in many economic applications
Notice that the OLS estimator of β in a single cross-section data sample would also be consistent under the same assumptions made in this section
- the consistency of the (pooled) OLS estimator thus follows from these assumptions, and not specifically from the availability of repeated observations over time, i.e. panel data
If one or more of the explanatory variables in $x_{it}$ is correlated with the individual-specific effects, the OLS estimator of β is inconsistent
- we can view $\eta_i$ as a time-invariant omitted variable, which in this case is both relevant and correlated with at least some of the included explanatory variables, so that we have a form of ‘omitted variable bias’
Assumption (correlated individual effects, or ‘fixed effects’)
$$E(x_{it}\eta_i) \neq 0 \quad \text{for } t = 1, 2, \ldots, T$$
In this case we have $E(x_{it}u_{it}) = E(x_{it}\eta_i) + 0 \neq 0$, so that the key orthogonality condition needed for OLS to be consistent is violated
Consequently the (pooled) OLS estimator is inconsistent in panel data
models with correlated individual effects
- this holds regardless of whether we consider $N \to \infty$, $T \to \infty$, or both
OLS using the panel data (pooled OLS) is subject to the same kind of
omitted variable bias as OLS in a single cross-section
Simply using the repeated observations over time for each individual does
not change this
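The Python sketch below illustrates this with a hypothetical data-generating process in which $x_{it} = \eta_i + e_{it}$, with $\eta_i$, $e_{it}$ and $v_{it}$ independent standard normal and β = 1. Here $E(x_{it}\eta_i) = 1$ and $\text{Var}(x_{it}) = 2$, so the pooled OLS slope converges to $1 + 1/2 = 1.5$ rather than to β, and taking more time periods does not remove the bias.

```python
import random

def pooled_slope(x, y):
    """Pooled OLS slope of y on x (with an intercept), using all NT observations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def simulate_correlated(N, T, beta, seed):
    """Hypothetical DGP: y_it = beta * x_it + eta_i + v_it, with x_it = eta_i + e_it."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(N):
        eta_i = rng.gauss(0, 1)
        for _ in range(T):
            x_it = eta_i + rng.gauss(0, 1)   # correlated with eta_i ('fixed effects')
            x.append(x_it)
            y.append(beta * x_it + eta_i + rng.gauss(0, 1))
    return x, y

beta = 1.0
for T in (3, 30):
    x, y = simulate_correlated(N=5000, T=T, beta=beta, seed=4)
    print(T, round(pooled_slope(x, y), 3))
# Probability limit here: beta + Cov(x_it, eta_i) / Var(x_it) = 1 + 1/2 = 1.5 for either T
```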
However the availability of repeated observations over time allows us to
transform the original model in order to construct consistent estimators of
parameters on time-varying explanatory variables in models with correlated
individual effects
This is one of the major advantages of empirical work using panel data
compared to empirical work using single cross-section datasets
Panel data is useful when we suspect that cross-section regression results
would be biased, due to the presence of (relevant and correlated) omitted
variables
- particularly if it is plausible that important omitted variables are time-invariant (or vary little over the sample period)
- and the dependent variable and the explanatory variables of interest vary
over time
Examples
Micro production functions: firms with better managers tend to be larger,
with higher levels of capital and labour inputs
Empirical growth models: countries with better institutions or more favorable geography tend to have better environments for investment, and so have higher investment rates
Our next task will be to consider estimators for panel data which can
estimate parameters consistently in this setting