National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
This thesis examines the statistical properties of the Poisson AR(1) model of Al-Osh and Alzaid (1987) and McKenzie (1988). The analysis includes forecasting, estimation, testing for independence and specification, and the addition of regressors to the model.
The Poisson AR(1) model is an infinite server queue, and as such is well suited for modeling short-term disability claimants who are waiting to recover from an injury or illness. One of the goals of the thesis is to develop statistical methods for analyzing series of monthly counts of claimants collecting short-term disability benefits from the Workers' Compensation Board (WCB) of British Columbia.
We consider four types of forecasts: the k-step ahead conditional mean, median, mode and distribution. For low count series the k-step ahead conditional distribution is practical and much more informative than the other forecasts.
We consider three estimation methods: conditional least squares (CLS), generalized least squares (GLS) and maximum likelihood (ML). In the case of CLS estimation we find an analytic expression for the information, and in the GLS case we find an approximation for the information. We find neat expressions for the score function and the observed Fisher information matrix. The score expressions lead to new definitions of residuals.
Special care is taken to test for independence since the test is on the boundary of the parameter space. The score test is asymptotically equivalent to testing whether the CLS estimate of the correlation coefficient is zero. Further, we define a Wald and likelihood ratio test.
Then we use the general specification test of McCabe and Leybourne (1996) to test whether the model is sufficient to explain the variation found in the data.
Next we add regressors to the model and update our earlier forecasting, estimation
and testing results. We also show the model is identifiable.
We conclude with a detailed application to monthly WCB claims counts. The preliminary analysis includes plots of the series, autocorrelation function and partial autocorrelation function. Model selection is based on the preliminary analysis, t-tests for the parameters, the general specification test and residuals. We also include forecasts for the first six months of 1995.
Table of Contents
Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
    1.1 General
    1.2 Overview of topics
2 Poisson AR(1) model
    2.1 Model definition
    2.2 Interpretation
    2.3 Basic properties
    2.4 Poisson AR(p) model
    2.5 An illustrative example
3 Forecasting
    3.1 Minimum mean squared error
    3.2 Minimum mean absolute error
    3.3 Forecast distributions
    3.4 Prediction intervals
    3.5 Duration
4 Estimation
    4.1 Likelihood theory and estimating functions
    4.2 Conditional least squares for the Poisson AR(1) model
    4.3 Generalized least squares for the Poisson AR(1) model
    4.4 The score and Fisher information for the Poisson AR(1) model
    4.5 The score and Fisher information for a general AR(1) model
    4.6 Asymptotics of the conditional maximum likelihood estimators for the Poisson AR(1) model
    4.7 Comparison of methods
5 Testing for independence
    5.1 Gaussian AR(1)
    5.2 Conditional least squares
    5.3 Score test
    5.4 The score function on the boundary of the parameter space
6 General misspecification test
    6.1 Overview
    6.2 Outline of the test
    6.3 Details for the Poisson AR(1) model
7 Models with covariates
    7.1 Model definition and introduction
    7.2 Forecasting
    7.3 Estimation
    7.4 Testing
8 Application to counts of workers collecting disability benefits
    8.1 Workers' Compensation Data
    8.2 Model selection and testing
    8.3 Arrival process
    8.4 Forecasting
    8.5 Gaussian AR(1) models
Bibliography
Appendix
List of Tables
3.3.1 k-step ahead conditional means, medians, modes and point mass forecasts
3.4.1 95% prediction intervals for the k-step ahead conditional distribution
4.7.1 The diagonal elements of the Godambe information matrix for GLS, CLS and CML when α = 0.3 and λ = 1
4.7.2 The diagonal elements of the Godambe information matrix for GLS, CLS and CML when α = 0.7 and λ = 1
5.4.1 Tests for independence in the illustrative data set
5.4.2 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0 and λ = 1
5.4.3 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0.1 and λ = 1
8.1.1 A summary of simple statistics for data sets 1 through 5
8.2.1 Tests for independence in data set 1*
8.2.2 This table displays the seasonal arrival rate for data set 3
8.2.3 This table summarizes the parameter estimation for data sets 1 to 5; included are the parameter estimates and the upper and lower 95% confidence limits
8.2.4 This table summarizes the joint information matrix test of models 1, 2 and 3 on data set 3
8.2.5 This table contains the mean duration and 95% confidence interval for the mean duration for data sets 1*, 2, ..., 5
8.3.1 This table summarizes the parameter estimation for the arrival processes in data sets 1A to 5A; included are the parameter estimates and 95% confidence intervals. The last two columns contain estimated arrival rates and 95% confidence intervals from the Poisson AR(1) model
8.3.2 This table displays the seasonal arrival rate for data set 3A
8.3.3 This table summarizes the information matrix test for data sets 1A-5A
8.4.1 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 1*
8.4.2 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 2
8.4.3 The k-step ahead conditional means, conditional medians and point mass forecasts for data set 3
8.4.4 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 4
8.4.5 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 5
8.4.6 The marginal means, medians and distributions for data set 3
8.5.1 The Gaussian AR(1) model parameter estimates for data sets 1 to 5
8.5.2 This table displays the seasonal arrival rate for data set 3 given by the Gaussian AR(1) model
List of Figures
2.5.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
2.5.2 Correlogram for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
2.5.3 Sample partial autocorrelation function for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
3.3.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, employed in the heavy manufacturing industry and collecting STWLB due to a burn related injury
4.4.1 Residual plot of the continuation process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
4.4.2 Residual plot of the arrival process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
4.4.3 Autocorrelations in the continuation residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
4.4.4 Autocorrelations in the arrival residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
4.7.1 The asymptotic efficiency of conditional least squares as a function of α when λ = 1
4.7.2 The asymptotic efficiency of conditional least squares as a function of λ
4.7.3 Box plots comparing the sampling distributions of α̂ when the arrival process {εt} is uniform over {0,1,2}
4.7.4 Box plots comparing the sampling distributions of λ̂ when the arrival process {εt} is uniform over {0,1,2}
5.2.1 A comparison of the power for the Gaussian and Poisson based tests as a function of α; λ = 1 and n = 100
5.2.2 A comparison of the power for the Gaussian and Poisson based tests as a function of λ; α = 0.01 and n = 100
5.4.1 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0 and λ = 1
5.4.2 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0.1 and λ = 1
8.1.1 A time series plot of data set 1
8.1.2 A time series plot of data set 2
8.1.3 A time series plot of data set 3
8.1.4 A time series plot of data set 4
8.1.5 A time series plot of data set 5
8.1.6 ACF's and PACF's for data sets 1 to 3
8.1.7 ACF's and PACF's for data sets 4 to 5
8.1.8 ACF's and PACF's for data set 1*
8.2.1 Pearson, continuation and arrival residuals plotted against time for data set 1*
8.2.2 Pearson, continuation and arrival residuals plotted against time for data set 2
8.2.3 Pearson, continuation and arrival residuals plotted against time for data set 3
8.2.4 Pearson, continuation and arrival residuals plotted against model regressors in data set 3
8.2.5 Pearson, continuation and arrival residuals plotted against time for data set 4
8.2.6 Pearson, continuation and arrival residuals plotted against time for data set 5
8.5.1 The bar chart represents the k-step ahead conditional cumulative distribution for the Poisson AR(1) model, while the line graph represents the forecasts from the Gaussian AR(1) model
Acknowledgements

I thank my supervisors, Dr. Brendan McCabe and Dr. Marty Puterman, for their encouragement and guidance throughout my graduate career at UBC.

I would like to acknowledge the assistance and valuable comments of another member of my supervising committee, Dr. Bent Jørgensen.

I would like to thank the Workers' Compensation Board of British Columbia for providing the data, which motivated this thesis. I further thank the Workers' Compensation Board of British Columbia for providing me with some financial support.

I thank my wife Mary Kelly for her support and patience. Our first son Terry Freeland was born during our stay at UBC and our second son Sandy Freeland was born only a few weeks ago. Both boys have brought us vast amounts of joy and happiness.

Finally I would like to thank my supervisors, Dr. Brendan McCabe and Dr. Marty Puterman, for their financial support to me in the past six years.
Chapter 1
1. Introduction
1.1 General
Count data is generated when the number of occurrences of an event type is recorded. Examples include the monthly number of motor vehicle accidents, the annual number of murders, the number of goals scored in a hockey game, the number of stolen bases by a baseball player in a season and the monthly number of new cancer patients. Often the prime objective in studying count data is to understand how counts are changing over time, for instance seasonal patterns such as more suicides in February, and trends such as increases in the number of cancer patients. For this reason count data is often collected over time to form a time series. Often the term discrete time series is used and includes more general situations in which the discrete values are not necessarily counts.
Methods for examining and modeling time series have been around a long time. The most popular method for modeling time series is the Box-Jenkins modeling procedure (ARIMA models), developed by Box and Jenkins (1970). The limitation of these models is that the random variation in the series is assumed to be normally distributed. When counts are "large" it is often assumed that time series of counts can be adequately approximated by continuous time series which are normally distributed. The reasoning for this is that many common distributions for count data (e.g. binomial, Poisson and negative binomial) have an approximate normal distribution when the distribution mean is "large". Often, however, series have very small counts or even many consecutive zeroes. Series such as the monthly number of polio cases in the US, or the monthly number of new cases of women with AIDS in Vancouver, will typically contain many consecutive months with no occurrences.
Because of the inappropriateness of classic time series methods for the study of time series of count data, much effort has been and continues to be put into the development of the methodology necessary to study such time series. It is important to find out when and to what extent classical time series methods fail. Throughout the thesis we will compare our results for the Poisson AR(1) model to the results one would get by incorrectly using a Gaussian AR(1) model. For example, we get inefficient parameter estimates when estimation is based on the pseudo Gaussian AR(1) likelihood. However, when it comes to testing for independence in a Poisson AR(1) time series, one can incorrectly assume a Gaussian AR(1) model with little consequence.
Several authors have studied the time series of polio counts in the US; see Zeger (1988), Chan and Ledolter (1995), Jørgensen et al. (1995) and Song (1996). These works are based on the theory of state space models. For an overview of state space models see Fahrmeir and Tutz (1994). Other authors have taken a different approach to modeling count time series, by trying to develop models with properties similar to those of the popular ARIMA models. See Al-Osh and Alzaid (1987), McKenzie (1988), Joe (1996 & 1997), Jørgensen and Song (1995) and Song (1996).
One of the goals of this thesis is to develop models to use in an analysis of data on short-term disability (STD) claim counts at the Workers' Compensation Board of British Columbia (WCB). The WCB provides disability insurance for more than 130,000 employers in British Columbia. Every year the WCB receives about 200,000 new claims. There are two broad types of disability claims: health care only claims and wage loss claims. In this thesis, claims counts for those workers collecting Short Term Wage Loss Benefits (STWLB) will be examined.
A large portion of the economy in British Columbia has historically been based on the resource sector. In industries such as forestry, fishing and mining, injuries such as broken bones, cuts, dislocations and sprains are predominant. Lately the depletion of natural resources and other environmental concerns have reduced the activities of many resource industries. This, however, has not brought a decrease in the number of claims, since injuries from other industries are becoming more prevalent. For example, the number of repetitive strain injuries suffered by store clerks and computer programmers has increased dramatically. About half of all the claims at the WCB are strains, and of these about half are back strains. The data which have been provided by the WCB contain the monthly count of the number of workers collecting STWLB. These data are grouped by 47 injuries, 16 industries, 10 claims centres, 2 genders and 4 age groups, resulting in 47×16×10×2×4 = 60,160 separate series. Of course many of the series have no occurrences at all.
Each month the number of workers collecting STWLB is composed of two components: new claims and continuing claims. Since the number of continuing claims this month depends on the total number collecting claims last month, these data have an auto-regressive structure of order 1. The count AR(1) model discussed in Al-Osh and Alzaid (1987) is appropriate for studying such data. Much of the work done to date on such AR(1) models for count data has concerned probabilistic issues, such as correlation structure, distribution and time reversibility. Little or no attention has been devoted to issues of inference and forecasting. There is a need for statisticians and econometricians to study these models keeping statistical and practicality issues in mind. This thesis focuses on forecasting, estimation, inference, covariates, residual analysis, and the fitting of the model to actual data.
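The two-component structure just described (continuing claims via binomial thinning, plus Poisson arrivals) can be sketched as a short simulation. This is an illustrative sketch, not code from the thesis; the function name `simulate_poisson_ar1` and the parameter names `alpha` (continuation probability) and `lam` (arrival rate) are ours.

```python
import numpy as np

def simulate_poisson_ar1(alpha, lam, n, rng):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where 'alpha ∘' is binomial
    thinning (each of last month's claimants continues with probability
    alpha) and eps_t ~ Poisson(lam) counts the new claims each month."""
    x = np.empty(n, dtype=int)
    # start from the stationary marginal, which is Poisson(lam / (1 - alpha))
    x[0] = rng.poisson(lam / (1.0 - alpha))
    for t in range(1, n):
        continuing = rng.binomial(x[t - 1], alpha)  # claims carried over
        arrivals = rng.poisson(lam)                 # new claims this month
        x[t] = continuing + arrivals
    return x

series = simulate_poisson_ar1(alpha=0.5, lam=1.0, n=5000,
                              rng=np.random.default_rng(0))
```

With α = 0.5 and λ = 1 the stationary marginal is Poisson(λ/(1 − α)) = Poisson(2), so the long-run sample mean should sit near 2.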
1.2 Overview of topics
The outline of this thesis is as follows. In Chapter 2 we begin by presenting the Poisson AR(1) model of Al-Osh and Alzaid (1987). The appropriateness of fitting such a model to the WCB claim data is discussed. We can think of disabled workers as remaining in a queue waiting to get healthy. Under certain assumptions this queuing process is equivalent to the Poisson AR(1) model. This is important since it gives justification for the use of binomial thinning to model the correlation in these data. We continue the chapter by reviewing known properties of the Poisson AR(1) model. Then we discuss three extensions of this model to a Poisson AR(p) model that are found in the literature. Our contribution is to show that the partial autocorrelation function can be used to select the number of lags in the Poisson AR(p) model of Jin-Guan and Yuan (1991). That is, the partial autocorrelations beyond lag p are zero, as they are for the Gaussian AR(p) model. The chapter concludes with an illustration applied to one of the WCB data series.
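The lag-selection claim — that the Poisson AR(p) partial autocorrelations vanish beyond lag p, just as in the Gaussian case — can be checked numerically. Below is a sketch of ours (not code from the thesis) that computes the sample PACF via the Durbin-Levinson recursion on a simulated Poisson AR(1) series; beyond lag 1 the values should be near zero.

```python
import numpy as np

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion:
    phi_{k,k} = (rho_k - sum_j phi_{k-1,j} rho_{k-j}) / (1 - sum_j phi_{k-1,j} rho_j)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    acf = np.array([np.sum(x[: n - k] * x[k:]) / np.sum(x * x)
                    for k in range(max_lag + 1)])
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    prev = np.zeros(max_lag + 1)          # phi_{k-1, j} coefficients
    for k in range(1, max_lag + 1):
        num = acf[k] - np.sum(prev[1:k] * acf[1:k][::-1])
        den = 1.0 - np.sum(prev[1:k] * acf[1:k])
        phi = prev.copy()
        phi[k] = num / den                # the lag-k partial autocorrelation
        phi[1:k] = prev[1:k] - phi[k] * prev[1:k][::-1]
        pacf[k] = phi[k]
        prev = phi
    return pacf

# simulate a Poisson AR(1) series: binomial thinning plus Poisson arrivals
rng = np.random.default_rng(1)
alpha, lam, n = 0.6, 1.0, 4000
x = [rng.poisson(lam / (1 - alpha))]
for _ in range(n - 1):
    x.append(rng.binomial(x[-1], alpha) + rng.poisson(lam))

pacf = sample_pacf(x, max_lag=5)
# pacf[1] should be near alpha = 0.6; pacf[2..5] should be near zero
```

This mirrors how the sample PACF would be read off a correlogram to choose p for one of the WCB series.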
We consider Chapter 3 to be a major contribution. In this chapter we consider how to generate forecasts from the Poisson AR(1) model. Two criteria are considered for finding optimal forecasts: the minimum squared error and minimum absolute error of the forecasts. The squared error approach results in the conditional mean as the optimal forecast, while the absolute error approach yields the conditional median as the optimal forecast. It is interesting to note that the conditional median is an integer, which may be desirable from the point of view of data cohesion. We derive formulas for the k-step ahead conditional mean and variance as well as the k-step ahead conditional moment generating function. Our analysis is concerned with count series, more specifically low count series. That is, the number of possible outcomes for the k-step ahead count is small. In such cases it is possible and more useful to calculate the probability mass for each of these outcomes, which we call point mass forecasting. We show how to find confidence intervals around each point mass forecast when parameter estimates are used. The chapter concludes by finding confidence intervals for the mean duration of claims.
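For the Poisson AR(1) model the k-step ahead conditional distribution has a known closed form: given X_t = x, X_{t+k} is the convolution of a Binomial(x, α^k) count of still-continuing claimants and a Poisson arrival count with mean λ(1 − α^k)/(1 − α). The sketch below (our function names, not the thesis's) computes this point mass forecast and reads off the conditional mean and the integer-valued conditional median.

```python
import math

def binomial_pmf(j, n, p):
    return math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)

def poisson_pmf(j, mu):
    return math.exp(-mu) * mu ** j / math.factorial(j)

def k_step_pmf(x, alpha, lam, k, max_count=30):
    """Point mass forecast P(X_{t+k} = m | X_t = x) for the Poisson AR(1)
    model: the convolution of Binomial(x, alpha**k) (claimants still
    continuing after k months) and Poisson(lam*(1-alpha**k)/(1-alpha))
    (arrivals over the k months), truncated at max_count."""
    p = alpha ** k
    mu = lam * (1.0 - p) / (1.0 - alpha)
    return [sum(binomial_pmf(j, x, p) * poisson_pmf(m - j, mu)
                for j in range(min(m, x) + 1))
            for m in range(max_count + 1)]

pmf = k_step_pmf(x=3, alpha=0.5, lam=1.0, k=2)
cond_mean = sum(m * pm for m, pm in enumerate(pmf))
cond_median = next(m for m in range(len(pmf)) if sum(pmf[: m + 1]) >= 0.5)
# analytic check: E[X_{t+k} | X_t = x] = alpha**k * x + lam*(1-alpha**k)/(1-alpha)
```

Here the analytic conditional mean is 0.25·3 + 1.5 = 2.25, while the median forecast is the integer 2 — illustrating why the point mass forecast is the more informative summary for low counts.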
Chapter 4 begins with a brief review of likelihood theory and estimating functions and follows the exposition in Barndorff-Nielsen and Sørensen (1994) and Godambe and Heyde (1987). This theory is used to unify the following methods of estimation and inference for this model: conditional least squares (CLS), generalized least squares (GLS), and conditional maximum likelihood (CML). Al-Osh and Alzaid (1987) considered estimation by CLS and CML, but ignored inference except to refer to the general results for CLS in Klimko and Nelson (1978). Wooldridge (1991) considers GLS estimation and inference for parametric models of conditional means and variances. Our contributions include the following: we prove a strong law of large numbers which requires simple moment restrictions that typically hold for autoregressive processes. Then we find an analytic expression for the expected information matrix in the CLS case and derive an approximation for the expected information matrix in the GLS case. Further, we find neat forms for expressing the score function and observed Fisher information matrix in terms of conditional expectations. This result generalizes to other cases where each observation is a drawing from a convolution of two distributions. It is relatively easy to numerically calculate the expected Fisher information. This makes Fisher scoring much faster than using the observed information in a Gauss-Newton method. We also show that the Poisson AR(1) model is α-mixing. We directly compare the asymptotic efficiency of CLS to CML, and find that the efficiency is one when α = 0 and decreases as α increases. Finally, we use a simulation to show that the methods of CLS and GLS are more robust to model misspecification than CML.
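Of the three estimation methods, CLS is the simplest to illustrate: since E[X_t | X_{t−1}] = αX_{t−1} + λ, minimizing the conditional sum of squares reduces to an ordinary least squares regression of X_t on X_{t−1}. A sketch of ours (not the thesis's code), recovering the parameters from a simulated series:

```python
import numpy as np

def cls_estimates(x):
    """Conditional least squares for the Poisson AR(1) model: because
    E[X_t | X_{t-1}] = alpha * X_{t-1} + lam, minimizing the sum of squared
    one-step prediction errors is an OLS regression of X_t on X_{t-1}."""
    x = np.asarray(x, dtype=float)
    y, z = x[1:], x[:-1]                  # response and lagged regressor
    alpha_hat = (np.sum((z - z.mean()) * (y - y.mean()))
                 / np.sum((z - z.mean()) ** 2))
    lam_hat = y.mean() - alpha_hat * z.mean()
    return alpha_hat, lam_hat

# simulated Poisson AR(1): binomial thinning plus Poisson arrivals
rng = np.random.default_rng(42)
alpha_true, lam_true = 0.5, 1.0
x = [rng.poisson(lam_true / (1 - alpha_true))]
for _ in range(4999):
    x.append(rng.binomial(x[-1], alpha_true) + rng.poisson(lam_true))

alpha_hat, lam_hat = cls_estimates(x)
```

The estimates should land close to (0.5, 1.0) for a series of this length; GLS and CML refine this by weighting by, or fully using, the conditional distribution.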
To date in the literature no one has considered any statistical tests on the Poisson AR(1) model. In Chapter 5 we develop tests to check if a series is independent. This is important since it would be inappropriate to fit an autoregressive model to a series of independent random variables. We note that special care must be taken since we are testing to see if the parameter value lies on the boundary of its domain.
A test based on the score is developed and found to be asymptotically equivalent to a one sided test of the least squares estimator for α. In fact the difference between the score and CLS statistics is in the denominators, which are respectively the sample mean and variance. Since under the null hypothesis the series consists of independent and identically distributed Poisson random variables, the sample mean and variance should be asymptotically equivalent. However, if the data are over-dispersed then the sample mean will be smaller than the sample variance, so the score based statistic will be larger than the CLS statistic. It is further found that the score is a well defined martingale for values of α in a neighborhood of zero, and hence a Taylor series expansion of the score function around the point α = 0 is also well defined. This leads to one sided score, Wald and likelihood ratio tests.
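The relationship between the two statistics can be made concrete: both share the lag-1 cross-product numerator, with the CLS version standardized by the sample variance and the score version by the sample mean. The sketch below is ours, not the thesis's exact formula; in particular the √n scaling is the usual one for a lag-1 autocorrelation statistic and is an assumption here.

```python
import numpy as np

def independence_stats(x):
    """Two statistics for H0: alpha = 0 (an i.i.d. Poisson series).
    Both share the lag-1 cross-product numerator; the CLS version divides
    by the sample variance, the score version by the sample mean.  Under
    H0 they agree asymptotically (Poisson mean equals variance), while
    over-dispersion (variance > mean) inflates the score statistic
    relative to the CLS one."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    num = np.sum((x[1:] - xbar) * (x[:-1] - xbar)) / n
    cls_stat = np.sqrt(n) * num / np.var(x)   # denominator: sample variance
    score_stat = np.sqrt(n) * num / xbar      # denominator: sample mean
    return cls_stat, score_stat

# Under H0 both statistics are approximately N(0, 1); a one-sided 5% test
# rejects independence when the statistic exceeds 1.645.
rng = np.random.default_rng(7)
cls_stat, score_stat = independence_stats(rng.poisson(1.0, size=2000))
```

For an i.i.d. Poisson sample the two statistics should be nearly identical; feeding in over-dispersed counts (e.g. negative binomial) would pull them apart in the direction described above.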
Chapter 6 considers a general specification test for the model. Having satisfied ourselves that a series is dependent, is it adequately modeled by the Poisson AR(1) process? This general specification test is often referred to as the information matrix test. Basically it tests whether the difference between the two representations of the information matrix (sometimes referred to as the inner and outer products) is distributed about zero, which should be the case if the model is correctly specified. Chesher (1983) showed that the information matrix test can be interpreted as a score test, and amounts to testing whether the model parameters are stochastic. This test will therefore indicate
whether or not over-dispersion is present in the data. McCabe and Leybourne (1996) extended Chesher's result to a much more general setting, including time series of dependent non-homogeneous random vectors. In this chapter we outline the result of McCabe and Leybourne and present the details for the Poisson AR(1) model. To assess the test we simulate 100 series for which the Poisson model is the correct specification and 100 series for which the Poisson model is a misspecification. We find that the level of the test is approximately correct for finite series of length 200 and that the test has strong power.
In Chapter 7 we extend the model to include covariates. It is the first time anyone
has considered adding covariates to both the continuation process and the arrival process.
Joe (1996) briefly considers the addition of covariates to the mean of the process. Note
that Joe's parameterization is slightly different than ours. We find the k-step ahead moment
generating function. From this we get the k-step ahead conditional mean, variance and
distribution. Further we find the marginal distribution. Then we show how to construct
individual confidence intervals around the point mass forecasts (k-step ahead conditional
distribution). Next we give necessary and sufficient conditions for the model to be
identifiable. We further give conditions for the model to be α-mixing. If the covariates
are such that the model is identifiable, α-mixing and stationary then the maximum
likelihood estimates are asymptotically normal. We conclude the chapter by outlining
testing for independence and for specification.
Joe (1997) applied the Poisson AR(1) model to daily counts of children reporting
symptoms associated with air pollution. The advantage of the WCB data over the data
used by Joe (1997) is that the WCB data can be interpreted as a queue and hence justifies
our use of the Poisson AR(1) model. In Chapter 8 we apply the methods in the previous
chapters to modeling and forecasting five monthly time series of counts of disabled
workers collecting STWLB. Our analysis includes time-series plots as well as plots of the
sample autocorrelation function and partial autocorrelation function. All five series
appear to be autoregressive of order 1 and one of the series appears to be seasonal. Model
selection is based on t-ratios, the information matrix specification test and residual
analysis. We give a detailed discussion of the interpretation of the residuals.
For the five WCB series we analyze in this chapter we were able to obtain
additional data about the number of arrivals each month. The previously analyzed data
contain the total number collecting each month, that is, the number continuing plus the
number of arrivals. We use these data to assess whether the arrivals are Poisson. In
addition we directly estimate the arrival parameters and compare them to the estimates
obtained using the Poisson AR(1) model.

Next we calculate point mass forecasts for the five series. We note that the k-step
ahead distribution quickly approaches the marginal distribution. We conclude the chapter
by comparing the results of the Poisson AR(1) model to the results that would come from
assuming a Gaussian AR(1) model.
2. Poisson AR(1) model
In this chapter, we begin by presenting the Poisson AR(1) model, Al-Osh and Alzaid
(1987). The appropriateness of fitting such a model to the WCB claims data is discussed.
We can think of workers collecting STWLB as waiting in a queue. When they begin
receiving benefits they enter the queue and when they stop receiving benefits they exit the
queue. Under certain assumptions this queuing process is equivalent to the Poisson AR(1)
model. Next we review some basic properties of the model, which are shown to follow
from the moment generating function. We then discuss extensions of the model to a
Poisson AR(p) model which has properties similar to the Gaussian AR(p) model. We
conclude the chapter with an illustrative example which demonstrates the theory covered
in this chapter.
2.1 Model formulation
Let X_1, X_2, ..., X_n be a series of dependent Poisson counts generated according to the
model

X_t = α ∘ X_{t-1} + ε_t,

where X_0 has a Poisson distribution with parameter λ/(1-α), written as X_0 ~ Po(λ/(1-α)),
and {ε_t}_{t=1}^n is a series of independent identically distributed (iid) Poisson random
variables with parameter λ, that is ε_t ~ Po(λ). The thinning operator "∘" is defined as
follows: given X_{t-1}, α ∘ X_{t-1} = Σ_{i=1}^{X_{t-1}} B_i(α), where B_1(α), B_2(α), ..., B_{X_{t-1}}(α) are iid
Bernoulli random variables with P(B_i(α) = 1) = 1 - P(B_i(α) = 0) = α. Since α ∘ X_{t-1}
given X_{t-1} is a sum of iid Bernoulli random variables it follows that it has a binomial
distribution with parameters α and X_{t-1}, written as α ∘ X_{t-1} | X_{t-1} ~ Bi(α, X_{t-1}). The term
thinning operator is used since its operation is to randomly thin out or reduce the
number in a group. It is further assumed that B_i(α) and ε_t are independent.
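The thinning recursion is straightforward to simulate. The following sketch (our own illustration, not from the thesis; all function and variable names are ours) draws a series by thinning the previous count with a binomial draw and adding independent Poisson arrivals:

```python
import numpy as np

def simulate_poisson_ar1(alpha, lam, n, seed=0):
    """Draw n observations from X_t = alpha o X_{t-1} + eps_t,
    started from the stationary distribution Po(lam / (1 - alpha))."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1, dtype=np.int64)
    x[0] = rng.poisson(lam / (1 - alpha))
    for t in range(1, n + 1):
        survivors = rng.binomial(x[t - 1], alpha)  # alpha o X_{t-1} | X_{t-1} ~ Bi(alpha, X_{t-1})
        x[t] = survivors + rng.poisson(lam)        # independent Po(lam) innovations
    return x[1:]

series = simulate_poisson_ar1(alpha=0.4, lam=5.2, n=50000)
print(series.mean(), series.var())  # both should be near 5.2 / 0.6 = 8.67
```

For a long simulated series the sample mean and variance should both be close to λ/(1-α), in line with the Poisson marginal distribution derived below.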
For comparison purposes we define the Gaussian AR(1) model as

X_t = αX_{t-1} + λ + η_t,

where {η_t}_{t=1}^n is a sequence of independent identically distributed normal random
variables with mean zero and variance σ².
An important distinction between the Poisson AR(1) model and the Gaussian
AR(1) model is that in the Poisson model X_t is composed of two random components,
the survivorship component α ∘ X_{t-1} | X_{t-1} and the new entrant (innovation) component
ε_t. In the Gaussian model, given X_{t-1}, the first component αX_{t-1} is not random. The
Poisson model is also complicated by the fact that these two random components are not
observed. That is, the distribution of X_t given X_{t-1} is given by the convolution of the two
random components. This makes the Poisson model much harder to work with than the
Gaussian model.
This model has been studied by Al-Osh and Alzaid (1987) and McKenzie (1988).
Joe (1996) extended this model by developing a general method to define a random
operator for cases where the marginal distribution is in the convolution-closed infinitely
divisible class. His method is consistent with the AR(1) Poisson model in that it defines
the binomial thinning operator when the marginal distribution is Poisson.
2.2 Model interpretation
This model can be interpreted as a birth and death process, see Ross (1983, Section 5.3)
for an introduction to birth and death processes. Each individual at time t - 1 has
probability α of continuing to be alive at time t, and at each time t the number of
births follows a Poisson distribution with mean λ.
Alternatively, the model can be interpreted as an infinite server queue, see Ross
(1983, Example 2.3(b)). The service time is geometric with parameter 1 - α and the
arrival process is Poisson with mean λ. One fundamental result from queuing theory is
that the expected length of the queue is equal to the arrival rate times the expected
waiting time, or L = λW, where L, λ and W are respectively the expected queue
length, the arrival rate and the expected waiting time, see Little (1961). In this example
the mean waiting time, W, is equal to 1/(1-α) and the expected queue length
is L = λ/(1-α).
With regards to short-term disability claims, X_t is the number of workers
collecting short-term wage loss benefits (STWLB) at time t. It equals the sum of the
number of workers continuing to collect from time t - 1, α ∘ X_{t-1}, and the number of
new claims at time t, ε_t. The waiting time 1/(1-α) is the mean number of months that
a newly disabled worker is expected to collect STWLB. This is referred to as duration at
the WCB and is an important input into managerial decision making. Note that it is not
directly extractable from available WCB data.
2.3 Basic properties
In this section we look at the stationary distribution of the Poisson AR(1) process and
basic quantities, such as the mean, variance, moment generating function and correlation.

For the Poisson AR(1) model the conditional mean and variance of X_t given
X_{t-1} are respectively E[X_t | X_{t-1}] = αX_{t-1} + λ and Var[X_t | X_{t-1}] = α(1-α)X_{t-1} + λ. The
Gaussian AR(1) model has the same conditional mean but the conditional variance, σ²,
is different, McKenzie (1988).
In the following proposition the stationary distribution for the Poisson AR(1)
model is given, and from this we can find the unconditional moments of X_t. McKenzie
(1988) sketches the proof of this result; we give a simple proof using moment generating
functions.

Proposition 2.3.1 When X_0 is Poisson with mean λ/(1-α) the marginal distribution of
X_t is Poisson with mean λ/(1-α).
Proof. We use induction to prove the result. By assumption the result is true for X_0. We
now assume X_{t-1} is Poisson with mean λ/(1-α), that is, the moment generating
function of X_{t-1} is M_{X_{t-1}}(s) = exp(λ/(1-α) (e^s - 1)). The moment generating function of X_t
is

M_{X_t}(s) = E[E[exp(s α ∘ X_{t-1} + s ε_t) | X_{t-1}]]
         = E[(αe^s + (1 - α))^{X_{t-1}} exp(λ(e^s - 1))]
         = M_{X_{t-1}}(s') exp(λ(e^s - 1)),

where e^{s'} = αe^s + 1 - α. Making this substitution gives us

M_{X_t}(s) = exp(λ/(1-α) α(e^s - 1)) exp(λ(e^s - 1)) = exp(λ/(1-α) (e^s - 1)).

Thus the marginal distribution of X_t is Poisson with mean λ/(1-α) and the
result follows. □
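The substitution step can be checked numerically: thinning a Po(λ/(1-α)) count and adding Po(λ) arrivals leaves the moment generating function unchanged. A small sketch (ours, not from the thesis), using the identity E[z^X] = exp(μ(z - 1)) for X ~ Po(μ):

```python
import math

alpha, lam = 0.4, 5.2
mu = lam / (1 - alpha)  # stationary mean lambda / (1 - alpha)

def mgf_po(mean, s):
    # moment generating function of a Poisson(mean) random variable
    return math.exp(mean * (math.exp(s) - 1))

for s in (-0.5, 0.1, 0.7):
    # thinning replaces e^s by alpha*e^s + 1 - alpha inside the pgf
    z = alpha * math.exp(s) + 1 - alpha
    one_step = math.exp(mu * (z - 1)) * mgf_po(lam, s)  # thinned count plus Po(lam) arrivals
    assert abs(one_step - mgf_po(mu, s)) < 1e-9 * one_step
print("one step of the recursion preserves the Po(lambda/(1-alpha)) MGF")
```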
Remark

Since the marginal distribution of X_t is Poisson it follows that the unconditional mean
and variance of X_t are both equal to λ/(1-α). In addition, writing μ = λ/(1-α), the first
three marginal moments of X_t are respectively μ, μ + μ² and μ + 3μ² + μ³.

An important quantity for this model is the autocorrelation function, which is the
same as in the standard Gaussian AR(1) model, and is given by ρ_k = α^k, k = 1, 2, ..., see
Al-Osh and Alzaid (1987).
2.4 Poisson AR(p) model

A natural extension of the Poisson AR(1) model is the following Poisson AR(p) model,

X_t = α_1 ∘ X_{t-1} + α_2 ∘ X_{t-2} + ... + α_p ∘ X_{t-p} + ε_t,   (2.4.1)

where {ε_t}_{t=1}^∞ is again a sequence of iid Poisson random variables, "∘" is the binomial
thinning operator, given X_{t-1}, X_{t-2}, ..., X_{t-p} the components α_1 ∘ X_{t-1}, ..., α_p ∘ X_{t-p} and ε_t are
mutually independent, and α_j ∈ [0,1], j = 1, 2, ..., p. Note that the marginal distribution is
Poisson only for the case p = 1. This extension is studied in detail in Jin-Guan and Yuan
(1991).
Alzaid and Al-Osh (1990) examine a different Poisson AR(p) model where the
thinning operator is defined such that α_1 ∘ X_t, α_2 ∘ X_t, ..., α_p ∘ X_t given X_t has a
multinomial distribution with parameters α_1, α_2, ..., α_p and X_t. A non-linear Poisson
AR(p) model is found in Joe (1996). Joe writes his model as X_t = A_t(X_{t-1}, X_{t-2}, ..., X_{t-p})
+ ε_t, where the probability function of A_t(X_{t-1}, X_{t-2}, ..., X_{t-p}) is a (p-1)-dimensional sum.
The model by Jin-Guan and Yuan (1991) has the same autocorrelation function as
the Gaussian AR(p) model. The parameters are easily estimated by conditional least
squares, see Theorem 2.4.4. A drawback to this model is the lack of a physical
interpretation as in the Poisson AR(1) model. Jin-Guan and Yuan give no interpretation
of this model, nor can we think of any physical system it could represent. However it is
often the case that ARMA models do not have a physical interpretation, so this may not be
much of a concern.
It is possible to give a physical interpretation of the model in Alzaid and Al-Osh
(1990) but the interpretation is a bit odd and is as follows. Let X_t be the number of
newborn females in period t. Each female has a reproductive life span of p periods and is
permitted to have at most one offspring. From the cohort X_t of newborn females the
distribution of their female offspring in the next p periods is assumed to be multinomial.
That is, they let α_1 ∘ X_t, α_2 ∘ X_t, ..., α_p ∘ X_t denote the female offspring respectively in
periods t+1, t+2, ..., t+p and assume that α_1 ∘ X_t, α_2 ∘ X_t, ..., α_p ∘ X_t given X_t has a
multinomial distribution with parameters α_1, α_2, ..., α_p and X_t. The innovation ε_t
represents the number of newborn females that immigrate into the system at time t. The
autocorrelation function for this process is the same as the Gaussian ARMA(p, p-1)
process.
The higher order AR(p) models of Joe (1996) are extremely difficult to use in
practice. The conditional moments for his model are nonlinear and involve the calculation
of multi-dimensional sums. In his AR(2) model the calculation of E[X_t | X_{t-1}, X_{t-2}]
requires the calculation of four 2-dimensional sums. This is a major drawback since it
makes conditional least squares estimation much more difficult. Maximum likelihood
estimation is possible but also difficult, since even when p = 2 the conditional
probabilities involve 4-dimensional sums. Joe (1997) contains an example where an
AR(2) model is estimated. However the model does have a physical interpretation and as
a special case can be interpreted as a queue. For the AR(2) model there exist random
variables Z_1, Z_2, Z_3, Z_{12}, Z_{13}, Z_{23} and Z_{123} such that the distribution of X_{t+1}, X_{t+2}, X_{t+3} is
the same as the distribution of Z_1 + Z_{12} + Z_{13} + Z_{123}, Z_2 + Z_{12} + Z_{23} + Z_{123},
Z_3 + Z_{13} + Z_{23} + Z_{123}. This is not a queue since Z_{13} represents the number present at both
time t+1 and t+3 but not present at time t+2. It is possible to make Z_{13} degenerate and
equal to zero, in which case the model becomes a queue.
In the remainder of this section we examine the properties of the Poisson AR(p)
model of Jin-Guan and Yuan (1991) and henceforth refer to it simply as the Poisson
AR(p) model. The objective of the rest of this section is to show that we can use the
sample autocorrelation function and partial autocorrelation function to select the order in
a Poisson AR(p) model in exactly the same way as they are used to select the order in a
Gaussian AR(p) model.
Theorem 2.4.1 (Jin-Guan and Yuan, 1991) Let {ε_t}_{t=1}^∞ be count valued random variables
with mean μ, finite variance σ², and let α_j ∈ [0,1], j = 1, 2, ..., p. If the roots of
z^p - α_1 z^{p-1} - ... - α_p = 0
are inside the unit circle, then there exists a unique stationary count valued time series
{X_t} which satisfies (2.4.1) and Cov[X_s, ε_t] = 0, s < t.
The proof of this result is long and is found in Jin-Guan and Yuan (1991).

We define the sample covariance and sample correlation respectively as
γ̂_k = n^{-1} Σ_{t=1}^{n-k} (X_t - X̄)(X_{t+k} - X̄) and ρ̂_k = γ̂_k / γ̂_0.
Theorem 2.4.2 The Poisson AR(p) process {X_t} defined by (2.4.1) is ergodic.

Theorem 2.4.3 γ̂_k and ρ̂_k are strongly consistent.

Jin-Guan and Yuan (1991) show that the Poisson AR(p) process {X_t} is ergodic and use
this to prove the strong consistency of the sample covariances and correlations.

The following result, also from Jin-Guan and Yuan (1991), implies that the
autocorrelation function of the Poisson AR(p) model is the same as for the Gaussian
AR(p) model. We provide a simple alternative proof to that found in Jin-Guan and Yuan
(1991).
Proposition 2.4.1 The Yule-Walker equation, ρ_k = α_1 ρ_{k-1} + α_2 ρ_{k-2} + ... + α_p ρ_{k-p}, holds
for the Poisson AR(p) model.
Proof. Multiplying (2.4.1) by X_{t-k} and taking expectations we get

E[X_t X_{t-k}] = α_1 E[X_{t-1} X_{t-k}] + ... + α_p E[X_{t-p} X_{t-k}] + E[ε_t X_{t-k}].   (2.4.2)

Next we take the expectation of (2.4.1) and multiply by E[X_{t-k}] to get

E[X_t]E[X_{t-k}] = α_1 E[X_{t-1}]E[X_{t-k}] + ... + α_p E[X_{t-p}]E[X_{t-k}] + E[ε_t]E[X_{t-k}].   (2.4.3)

We note that E[ε_t X_{t-k}] = E[ε_t]E[X_{t-k}] and that because of stationarity E[X_{t-s} X_{t-k}]
- E[X_{t-s}]E[X_{t-k}] = Cov[X_{t-s}, X_{t-k}] = γ_{s-k}. Taking the difference between (2.4.2) and
(2.4.3) we get γ_k = α_1 γ_{k-1} + α_2 γ_{k-2} + ... + α_p γ_{k-p}. Finally, dividing this by γ_0 completes
the proof. □
There is a common misconception that the sample autocorrelations for a series of
iid data will be zero. In fact for iid data we can expect one autocorrelation out of 20 to
be larger in absolute value than 2/√n, and as a result values of |ρ̂_k| larger than 2/√n are
statistically significant at the 5% level.
Parameter estimates for the Poisson AR(p) model can be found using the method
of conditional least squares, Klimko and Nelson (1978). In this method the parameter
estimates are chosen to minimize the sum of squared distances between X_t and
E[X_t | X_{t-1}, X_{t-2}, ...]. Details for the Poisson AR(1) case are found in Section 4.2.
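Since E[X_t | X_{t-1}] = αX_{t-1} + λ is linear in the AR(1) case, conditional least squares there reduces to a simple linear regression of X_t on X_{t-1}. A sketch (our illustration on simulated data; the thesis's treatment is in Section 4.2):

```python
import numpy as np

# simulate a Poisson AR(1) series by binomial thinning
rng = np.random.default_rng(1)
alpha, lam, n = 0.4, 5.2, 20000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

# CLS minimizes sum_t (X_t - alpha X_{t-1} - lam)^2, i.e. OLS of X_t on X_{t-1}
y, z = x[1:].astype(float), x[:-1].astype(float)
alpha_hat = np.cov(y, z, bias=True)[0, 1] / np.var(z)
lam_hat = y.mean() - alpha_hat * z.mean()
print(alpha_hat, lam_hat)  # should be near the true values 0.4 and 5.2
```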
Theorem 2.4.4 The conditional least squares estimates α̂_1, α̂_2, ..., α̂_p and λ̂ of
α_1, α_2, ..., α_p and λ are strongly consistent, and further they are asymptotically normal with
covariance matrix n^{-1}Σ, where Σ is a finite covariance matrix.

Jin-Guan and Yuan (1991) prove this by noting that the Poisson AR(p) model satisfies the
conditions needed in Klimko and Nelson (1978) for the conditional least squares
estimates to be consistent and asymptotically normal. The covariance matrix Σ is equal
to the inverse of the Godambe information, see Section 4.1.1, and an expression for it is
given in Jin-Guan and Yuan (1991).
Definition 2.4.1 The pth partial autocorrelation, π_p, is the last coefficient, α_p, when
fitting a Poisson AR(p) model, and measures the excess correlation at lag p which is not
accounted for in a Poisson AR(p-1) model.
The following new result is useful in model selection, since it shows that a
Poisson AR(p) process has the same partial autocorrelations as a Gaussian AR(p) process.

Corollary 2.4.1 If the series {X_t} follows a Poisson AR(p) process and satisfies the
conditions in Theorem 2.4.1 then the partial autocorrelations beyond lag p are zero, that
is α_{p+k} = 0 for k ≥ 1.

Proof: By Theorem 2.4.1 X_t is uniquely defined by (2.4.1) with p lags. Therefore the
only way to represent X_t by an equation with p + k lags is to set
α_{p+1} = α_{p+2} = ... = α_{p+k} = 0, and hence π_{p+k} = 0. □
As a result of Theorem 2.4.4 we can use conditional least squares to find strongly
consistent estimates of the partial autocorrelation coefficients.

Another estimate comes from the Yule-Walker equations. Setting k = p in the
Yule-Walker equation and solving for α_p gives us α_p = ρ_p - α_1 ρ_{p-1} - α_2 ρ_{p-2}
- ... - α_{p-1} ρ_1. This suggests the following estimate for π_p,

π̂_p = ρ̂_p - α̂_1 ρ̂_{p-1} - α̂_2 ρ̂_{p-2} - ... - α̂_{p-1} ρ̂_1,   (2.4.4)

where α̂_1, α̂_2, ..., α̂_{p-1} are any √n consistent estimates of α_1, α_2, ..., α_{p-1} when fitting a
Poisson AR(p-1) model.
If {X_t} is a sequence of iid random variables satisfying certain mild moment
restrictions, for example finite variance and fourth moment, then √n π̂_p has a standard
normal asymptotic distribution, where the sample partial autocorrelation coefficient π̂_p is
defined as in (2.4.4). As a result, values of |π̂_p| larger than 2/√n are statistically
significant at the 5% level.
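The sample autocorrelations and the 2/√n significance bound are easy to compute directly; a sketch (ours, not from the thesis) on a simulated Poisson AR(1) series, whose theoretical autocorrelations are ρ_k = α^k:

```python
import numpy as np

def sample_acf(x, max_lag):
    """rho_hat_k = gamma_hat_k / gamma_hat_0, as defined in the text."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    g0 = np.mean(xc * xc)
    return np.array([np.mean(xc[:-k] * xc[k:]) / g0 for k in range(1, max_lag + 1)])

# simulate a Poisson AR(1) series by binomial thinning
rng = np.random.default_rng(2)
alpha, lam, n = 0.4, 5.2, 50000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

rho = sample_acf(x, 3)
bound = 2 / np.sqrt(n)  # approximate 5% significance bound for an iid series
print(rho, bound)       # rho should be near (0.4, 0.16, 0.064)
```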
2.5 An illustrative example

To illustrate the points in this chapter and the following chapters we have selected
one of the series from the WCB claims data set, which is analyzed in more detail in
Chapter 8. As mentioned in Section 1.1 the WCB data are monthly claims counts of
workers collecting STWLB and are grouped into more than 60,000 separate series.

One of the categories in which the data are grouped is type of industry, most of
which are strongly affected by seasonality. Examples of seasonally affected industries are
logging, hotels, restaurants, fishing, and retail. To model the series from these industries
it will be necessary to add seasonal regressors to the Poisson AR(1) model defined in
Section 2.1. The addition of covariates or regressors to the model will be covered in
Chapter 7.

The heavy manufacturing industry is one which we feel is less sensitive to
seasonality. It is from this industry that we have selected a series to examine in the
following example.
Example 2.5.1 The data series used in this example and the examples in Chapters 3
through 6 consists of monthly claims counts of workers collecting STWLB from the
Richmond claims center between January 1987 and December 1994. All the claimants are
males, are between the ages of 25 and 34, are employed in the heavy manufacturing
industry and are collecting STWLB due to a burn related injury.
A time series plot of this data is given in Figure 2.5.1. From this plot the data
appear to be stationary and non-seasonal. This is confirmed by the correlogram, Figure
2.5.2. In a non-stationary time series the autocorrelations do not come down to zero
except at large lags. If the series were seasonal then we would expect the absolute value
of the sample autocorrelation at lags 6 and 12 to be large. The correlogram for an AR(1)
process with 0 < α < 1 should move to zero exponentially as the lag increases. While the
autocorrelations in our correlogram approach zero quickly, the decay does not appear
exponential. A good discussion on interpreting correlograms is found in Chatfield (1989),
Sections 2.7.2 and 4.1.1.
Figure 2.5.3 is a plot of the sample partial autocorrelations, which were calculated
using (2.4.4). This data set consists of 12 × 8 = 96 observations. As noted at the end of
Section 2.4, any partial autocorrelation larger than 2/√96 ≈ 0.204 is statistically
significant at the 5% level. Consequently the second partial autocorrelation is on the
border of being significant.
Since the correlogram does not give any strong evidence against an AR(1), and
since we have a physical interpretation which suggests an AR(1) model, it is
appropriate to proceed assuming an AR(1) model.
Suppose we know the "true" parameter values are α = 0.40 and λ = 5.2 (these are
actually the maximum likelihood estimates calculated in Section 4.6). Then the
unconditional expected number of claimants collecting STWLB each month is
5.2/(1 - 0.40) = 8.67. The expected waiting time, or the expected number of months that
a newly injured claimant can expect to be off work, is 1/(1 - 0.40) = 1.67. A property of the
geometric service time is that it is memoryless. That is, at the start of the current month
all claimants collecting this month, both old and new claimants, can expect to collect, in
addition to the current month, for an average of 0.67 months.
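These quantities follow directly from the queue interpretation and Little's law; a minimal sketch (ours) with the example's parameter values:

```python
alpha, lam = 0.40, 5.2

mean_count = lam / (1 - alpha)  # expected queue length L = lambda * W (Little's law)
waiting = 1 / (1 - alpha)       # W: mean number of months a new claimant collects
residual = waiting - 1          # memoryless: expected months beyond the current one

print(round(mean_count, 2), round(waiting, 2), round(residual, 2))  # 8.67 1.67 0.67
```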
Figure 2.5.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Figure 2.5.2 Correlogram for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Figure 2.5.3 Sample partial autocorrelation function for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
3. Forecasting
The first three sections of this chapter concern forecasting when the model parameters are
known. We begin by considering two criteria for optimal forecasting: minimum mean
squared error forecasting and minimum mean absolute error forecasting. The first
criterion almost always results in a non-integer forecast and is the same point forecast that
the usual Gaussian AR(1) model would produce. The second criterion leads to an integer
forecast which is attractive from the point of view of data cohesion. When forecasting a
future count from a low count time series there is only a small set of possible values the
future outcome is likely to take. In these situations it is practical and desirable to give the
individual probabilities for each outcome in the set. We call this point mass forecasting.
A fourth type of forecast is the conditional mode, which is found by selecting the
outcome (point mass forecast) with the largest probability. We conclude the chapter by
constructing individual confidence intervals for the point mass forecasts when using
parameter estimates.
3.1 Minimum mean squared error
Consider a sample {X_t}_{t=1}^N from the Poisson AR(1) model in Chapter 2. The objective is
to find a forecast, X̂_{N+k}, of X_{N+k} that minimizes the expected squared error given the
sample. That is, to find X̂_{N+k} which minimizes E[(X_{N+k} - X̂_{N+k})² | X_N]. The first order
condition is E[X_{N+k} - X̂_{N+k} | X_N] = 0. This implies that the conditional mean,
X̂_{N+k} = E[X_{N+k} | X_N], is the forecast of X_{N+k} that minimizes the mean squared forecast
error. Note that this is a general result when forecasting time series and does not depend
on the model.
The next result is for the k-step ahead conditional mean, and has never explicitly
appeared in the literature except for the case where k = 1. It is the same as the k-step ahead
conditional mean for the Gaussian AR(1) model.
Proposition 3.1.1 In the Poisson AR(1) model the k-step ahead conditional mean is
E[X_{N+k} | X_N] = α^k X_N + λ(1 - α^k)/(1 - α), k = 1, 2, ....

Proof. We prove the result by induction. Since ε_t is independent of α ∘ X_{t-1}, the one step
ahead conditional mean is E[X_{N+1} | X_N] = αX_N + λ. Now suppose that the k-1 step
ahead conditional mean is E[X_{N+k-1} | X_N] = α^{k-1} X_N + λ(1 - α^{k-1})/(1 - α). Then the
k-step ahead conditional mean is

E[X_{N+k} | X_N] = E[E[X_{N+k} | X_{N+1}] | X_N]
              = α^{k-1} E[X_{N+1} | X_N] + λ(1 - α^{k-1})/(1 - α)
              = α^k X_N + λ(1 - α^k)/(1 - α),

and the result follows by induction. □
It is also interesting to know the variation of X_{N+k} given X_N around the forecast X̂_{N+k},
which is given in the following proposition.

Proposition 3.1.2 In the Poisson AR(1) model the k-step ahead conditional variance is
given by Var[X_{N+k} | X_N] = α^k (1 - α^k) X_N + λ(1 - α^k)/(1 - α), k = 1, 2, 3, ....
Proof. We prove the result by induction. The one step ahead conditional variance is
Var[X_{N+1} | X_N] = α(1 - α)X_N + λ. Now suppose that the k-1 step ahead conditional
variance is Var[X_{N+k-1} | X_N] = α^{k-1}(1 - α^{k-1})X_N + λ(1 - α^{k-1})/(1 - α). Then the k-step
ahead conditional variance is

Var[X_{N+k} | X_N] = Var[E[X_{N+k} | X_{N+1}] | X_N] + E[Var[X_{N+k} | X_{N+1}] | X_N]
               = α^{2(k-1)} Var[X_{N+1} | X_N] + α^{k-1}(1 - α^{k-1}) E[X_{N+1} | X_N] + λ(1 - α^{k-1})/(1 - α)
               = α^k (1 - α^k) X_N + λ(1 - α^k)/(1 - α). □

For the Gaussian AR(1) model the k-step ahead conditional variance is
(1 - α^{2k})/(1 - α²) σ².
As k goes to infinity the conditional mean and variance respectively go to the
stationary (unconditional) mean and variance of the process. That is,
lim_{k→∞} E[X_{N+k} | X_N] = λ/(1-α) and lim_{k→∞} Var[X_{N+k} | X_N] = λ/(1-α).

A third result, which actually includes the two previous results, is the conditional
moment generating function of X_{N+k} given X_N.
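The closed forms for the k-step ahead conditional mean and variance, together with their limits, can be sketched as follows (our illustration; x_N = 11 is an assumed last observed count):

```python
alpha, lam, x_N = 0.4, 5.2, 11  # x_N: an assumed last observed count

def cond_mean(k):
    # E[X_{N+k} | X_N] = alpha^k X_N + lam (1 - alpha^k) / (1 - alpha)
    return alpha**k * x_N + lam * (1 - alpha**k) / (1 - alpha)

def cond_var(k):
    # Var[X_{N+k} | X_N] = alpha^k (1 - alpha^k) X_N + lam (1 - alpha^k) / (1 - alpha)
    return alpha**k * (1 - alpha**k) * x_N + lam * (1 - alpha**k) / (1 - alpha)

for k in (1, 3, 20):
    print(k, round(cond_mean(k), 3), round(cond_var(k), 3))
# both sequences approach the stationary value lam / (1 - alpha) = 8.667
```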
Theorem 3.1.1 For the Poisson AR(1) model the distribution of X_{N+k} given X_N is a
convolution of a binomial distribution with parameters α^k and X_N and a Poisson
distribution with parameter λ(1 - α^k)/(1 - α). That is, the k-step ahead conditional moment
generating function is given by

M_{X_{N+k}|X_N}(s) = (α^k e^s + 1 - α^k)^{X_N} exp(λ(1 - α^k)/(1 - α) (e^s - 1)).   (3.1.1)

Proof. We prove the result by induction. The one step ahead conditional moment
generating function is given by M_{X_{N+1}|X_N}(s) = (αe^s + 1 - α)^{X_N} exp(λ(e^s - 1)).
Now suppose that the k-1 step ahead conditional moment generating function is
M_{X_{N+k-1}|X_N}(s) = (α^{k-1} e^s + 1 - α^{k-1})^{X_N} exp(λ(1 - α^{k-1})/(1 - α) (e^s - 1)).
Then the k-step ahead conditional moment generating function is

M_{X_{N+k}|X_N}(s) = E[(α^{k-1} e^s + 1 - α^{k-1})^{X_{N+1}} | X_N] exp(λ(1 - α^{k-1})/(1 - α) (e^s - 1))
                = M_{X_{N+1}|X_N}(s') exp(λ(1 - α^{k-1})/(1 - α) (e^s - 1)),

where e^{s'} = α^{k-1} e^s + (1 - α^{k-1}). Substituting this for e^{s'} gives (3.1.1). □
The above result shows that the distribution of X_{N+k} | X_N is a convolution of a
binomial distribution with parameters α^k and X_N and a Poisson distribution with
parameter λ(1 - α^k)/(1 - α). Hence it has mean α^k X_N + λ(1 - α^k)/(1 - α) and variance
α^k (1 - α^k) X_N + λ(1 - α^k)/(1 - α), which of course agrees with Propositions 3.1.1 and 3.1.2. This
extends the results in McKenzie (1988) to cases where k > 1.
In the usual Gaussian AR(1) model X_{N+k} | X_N has a normal distribution with
mean α^k X_N + λ(1 - α^k)/(1 - α) and variance (1 - α^{2k})/(1 - α²) σ². So while the conditional
means of X_{N+k} | X_N in the Poisson and Gaussian models are the same, their conditional
distributions are quite different.
Corollary 3.1.1 Let μ_k denote the distribution of X_{N+k} | X_N and let μ be the
distribution of a Poisson random variable with mean λ/(1-α). Then μ_k ⇒ μ. That is, μ_k
converges weakly to μ, or X_{N+k} | X_N has a Poisson limiting distribution with mean λ/(1-α).

Proof. From (3.1.1) we have

lim_{k→∞} M_{X_{N+k}|X_N}(s) = exp(λ/(1-α) (e^s - 1)),

which is the moment generating function of a Poisson distribution with mean λ/(1-α). The
result follows since convergence of the moment generating function implies convergence
of the probability measure whenever the moment generating function exists in a
neighborhood of zero, see for example Billingsley (1986). □
3.2 Minimum mean absolute error
The objective in this section is to find a forecast, X̂_{N+k}, of X_{N+k} that minimizes the
expected absolute error given the sample. That is, to find X̂_{N+k} which minimizes
E[|X_{N+k} - X̂_{N+k}| | X_N].
Let p_k(x | X_N) be the conditional probability function of X_{N+k} given X_N. We
will define the conditional median of X_{N+k} given X_N as the smallest non-negative
integer m_k for which Σ_{x=0}^{m_k} p_k(x | X_N) ≥ 0.5. An alternative definition is to let m_k be the
largest non-negative integer such that Σ_{x=0}^{m_k} p_k(x | X_N) ≤ 0.5. However, if p_k(0 | X_N) > 0.5
then the median under this alternative definition would not be defined.
Proposition 3.2.1 In the Poisson AR(1) model the k-step ahead conditional median is the
forecast which minimizes the expected absolute forecast error. That is, E[|X_{N+k} - X̂_{N+k}| | X_N]
has a global minimum at X̂_{N+k} = m_k.

Proof. Suppose that X̂_{N+k} is between m - 1 and m, where m is a non-negative integer.
Then E[|X_{N+k} - X̂_{N+k}| | X_N] = Σ_{x=0}^{m-1} (X̂_{N+k} - x) p_k(x) + Σ_{x=m}^{∞} (x - X̂_{N+k}) p_k(x). The slope of this expectation as
a function of X̂_{N+k} is Σ_{x=0}^{m-1} p_k(x) - Σ_{x=m}^{∞} p_k(x). If m ≤ m_k the slope is negative and if
m ≥ m_k + 1 the slope is positive. The minimum therefore occurs at X̂_{N+k} = m_k. □

It is interesting to note that we did not have to restrict our search to values in the non-
negative integers in order to get an integer solution.
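The proposition can be checked by brute force on any pmf: the smallest m with cumulative probability at least 0.5 also minimizes the expected absolute error over the reals. A sketch (ours, with a made-up pmf):

```python
import numpy as np

pmf = {0: 0.2, 1: 0.25, 2: 0.3, 3: 0.15, 4: 0.1}  # a made-up forecast distribution

# smallest m with cumulative probability >= 0.5 -- the conditional median m_k
cum, median = 0.0, None
for v in sorted(pmf):
    cum += pmf[v]
    if cum >= 0.5:
        median = v
        break

def mae(c):
    # expected absolute forecast error for forecast value c
    return sum(p * abs(v - c) for v, p in pmf.items())

grid = np.arange(0.0, 4.001, 0.001)
best = grid[np.argmin([mae(c) for c in grid])]
print(median, float(best))  # the unconstrained minimizer coincides with the integer median
```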
3.3 Point mass forecasts
In this section we look at point mass forecasts, or the k-step ahead conditional
distribution. In cases where the counts are low, the median forecast may not be very
informative. For example, consider the following two cases of a discrete random variable
X: case 1, P(X = 0) = 1 - P(X = 1) = 0.50, and case 2, P(X = 0) = 1 - P(X = 5) = 0.90. In
both cases the median of X is 0 and the mean is 0.5, but in case 2 there is almost twice
the probability of observing a zero. Since there are only two outcomes in this example it
would be more informative to give the probability distribution for the outcomes.
By Theorem 3.1.1 the distribution of X_{N+k} given X_N is a convolution of a
binomial distribution with parameters α^k and X_N and a Poisson distribution with
parameter λ(1 - α^k)/(1 - α). The probability mass function of X_{N+k} given X_N is

p_k(x | X_N) = Σ_{s=0}^{min(x, X_N)} (X_N choose s) (α^k)^s (1 - α^k)^{X_N - s} exp(-λ(1 - α^k)/(1 - α)) (λ(1 - α^k)/(1 - α))^{x-s} / (x - s)!.   (3.3.1)

In the following example we illustrate the conditional mean, median, mode and point
mass forecasts. The conditional mode is easy to find since it is the point, x, at which the
probability mass, p_k(x | X_N), is largest, where x is a non-negative integer.
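Equation (3.3.1) is a finite convolution sum and is easy to evaluate directly. A sketch (ours, not from the thesis) for the one-step distribution with the example's values α = 0.4, λ = 5.2, X_N = 11:

```python
from math import comb, exp, factorial

def pk(x, k, x_N, alpha, lam):
    """k-step ahead pmf: convolution of Bi(alpha^k, x_N) and
    Po(lam * (1 - alpha^k) / (1 - alpha)), as in equation (3.3.1)."""
    a = alpha**k
    m = lam * (1 - a) / (1 - alpha)
    return sum(comb(x_N, s) * a**s * (1 - a)**(x_N - s)
               * exp(-m) * m**(x - s) / factorial(x - s)
               for s in range(min(x, x_N) + 1))

dist = [pk(x, 1, 11, 0.4, 5.2) for x in range(40)]
mean = sum(x * p for x, p in enumerate(dist))
print(round(mean, 2), round(max(dist), 3))  # mean matches alpha * 11 + lam = 9.6
```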
Example 3.3.1 This is a continuation of Example 2.5.1. Our series consists of 8 × 12
= 96 observations and the last observation is X_96 = 11. Again we assume that the "true"
parameter values are α = 0.40 and λ = 5.2. Figure 3.3.1 is a time series plot of the
observations and predicted values (one step ahead conditional mean forecasts). It
indicates that the one step ahead conditional mean forecasts are reasonably good.

Conditional forecasts for the first 6 months of 1995 given a count of 11 in
December 1994 are displayed in Table 3.3.1. The table includes the k-step ahead
conditional mean and median forecasts as well as the k-step ahead conditional distribution
(point mass forecasts). The last column contains the limiting distribution or unconditional
distribution.
Figure 3.3.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, employed in the heavy manufacturing industry, and collecting STWLB due to a burn related injury.

Table 3.3.1 k-step ahead conditional means, medians, modes and point mass forecasts.
Remarks
1. The conditional mean forecasts quickly approach the unconditional mean of 8.67.
2. The conditional median forecast is 9 and remains unchanged except in the limit.
3. The conditional mode is easy to find from the table of point mass forecasts (conditional distribution). For example, the largest 1-step ahead point mass, p_1(x|11), is 0.142 and occurs when x = 9.
4. The 6 step ahead conditional distribution is very close to the limiting distribution.
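These forecasts can be reproduced numerically. The sketch below is our own illustration in plain Python (the function name pk is ours); it evaluates the k-step ahead conditional distribution (3.3.4) together with the conditional mean and mode for the example values α = 0.40, λ = 5.2 and X_96 = 11.

```python
import math

def pk(x, x0, k, alpha, lam):
    # k-step ahead conditional pmf p_k(x | X_t = x0): the convolution of a
    # Binomial(x0, alpha**k) survivor count and a Poisson innovation with
    # mean lam * (1 - alpha**k) / (1 - alpha).
    p_surv = alpha ** k
    mu = lam * (1 - p_surv) / (1 - alpha)
    total = 0.0
    for s in range(min(x, x0) + 1):
        binom = math.comb(x0, s) * p_surv ** s * (1 - p_surv) ** (x0 - s)
        total += binom * math.exp(-mu) * mu ** (x - s) / math.factorial(x - s)
    return total

alpha, lam, x0 = 0.40, 5.2, 11
dist1 = [pk(x, x0, 1, alpha, lam) for x in range(60)]   # 1-step ahead distribution
mean1 = sum(x * p for x, p in enumerate(dist1))          # = alpha * 11 + lam = 9.6
mode1 = max(range(60), key=lambda x: dist1[x])           # largest point mass
mean6 = sum(x * pk(x, x0, 6, alpha, lam) for x in range(60))
```

The 1-step ahead mode is 9 with mass about 0.142, matching Remark 3, and the 6-step ahead conditional mean is already close to the unconditional mean λ/(1−α) = 8.67.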
3.4 Prediction intervals
The point mass forecasts in Section 3.3 were calculated assuming the "true" parameter values were known. Since in practice we have to estimate the parameters, we would like to include this uncertainty by constructing point mass prediction intervals. The following result, which is found in Serfling (1980), will be used to develop the prediction intervals.
Theorem 3.4.1 Suppose that the sequence {Y_n} of k-dimensional random vectors is asymptotically normal with mean μ and covariance matrix n^{-1}Σ = n^{-1}[σ_{ij}]. Let g(y) be a real-valued function having a non-zero differential at y = μ. Then g(Y_n) is asymptotically normal with mean g(μ) and covariance matrix n^{-1}[∇g(μ)]^T Σ [∇g(μ)].
Now suppose we have a sample of size n and denote the maximum likelihood estimates for this sample by α̂_n and λ̂_n. Under mild regularity conditions (see Section 4.1), (α̂_n, λ̂_n)^T is asymptotically normal with mean (α_0, λ_0)^T and variance n^{-1} i^{-1}, where i is the Fisher information matrix and α_0 and λ_0 are the "true" parameter values.

As a consequence of the above result, for fixed x, p_k(x|X_n; α̂_n, λ̂_n) has an asymptotically normal distribution with mean p_k(x|X_n; α_0, λ_0) and variance

$$n^{-1}\left[ \left(\frac{\partial p_k}{\partial \alpha}\right)^2 i_{11}^{-1} + 2\,\frac{\partial p_k}{\partial \alpha}\frac{\partial p_k}{\partial \lambda}\, i_{12}^{-1} + \left(\frac{\partial p_k}{\partial \lambda}\right)^2 i_{22}^{-1} \right],$$

where i_{11}^{-1} and i_{22}^{-1} are the diagonal elements of i^{-1} and i_{12}^{-1} is the off-diagonal element. The
partial derivatives of the point mass probability p_k(x|X_n; α, λ) can either be found directly or with the help of the expressions in Section 4.5.

An approximate 95% confidence interval for p_k(x|X_n; α_0, λ_0) is

$$p_k(x \mid X_n; \hat\alpha_n, \hat\lambda_n) \pm 1.96\, \widehat{\mathrm{se}}\!\left[ p_k(x \mid X_n; \hat\alpha_n, \hat\lambda_n) \right], \qquad (3.4.4)$$

where the standard error is the square root of the estimated variance above.
We now continue Example 3.3.1.
Example 3.4.1 The maximum likelihood estimates for our data set are α̂ = 0.40 and λ̂ = 5.2. The inverse of the expected Fisher information matrix is evaluated at these parameter estimates; the details of these calculations are found in Example 4.4.1.
In Section 4.1 we show that the Poisson AR(1) model satisfies the regularity conditions for the maximum likelihood estimates to be asymptotically normal. Consequently we can use (3.4.4) to construct individual approximate 95% prediction intervals for the k-step ahead point mass forecasts. Table 3.4.1 contains the 95% prediction intervals for the first 6 months of 1995.
Remarks
1) The width of the prediction intervals increases with the number of steps ahead. However, after about 6 steps ahead the width changes very little as it is very close to its maximum.
2) The prediction intervals for the 6 step ahead conditional distribution are almost the same as the prediction intervals for the unconditional (limiting) distribution.
3) These are individual confidence intervals for the probabilities in the k-step ahead conditional distribution, NOT the forecasts.
3.5 Duration
In this section we examine duration, which is the number of months that the WCB continues to pay a claim. Under the Poisson AR(1) model assumption the duration time is geometric with parameter 1−α (see Section 2.2), and the mean duration is (1−α)^{-1}.
Suppose we have a sample of size n and denote the maximum likelihood estimators by α̂_n and λ̂_n. We assume that α̂_n is asymptotically normal with mean α_0 and variance n^{-1}σ²(α_0, λ_0), where σ²(α_0, λ_0) is that portion of the inverse of the Fisher information matrix which pertains to α, and α_0 and λ_0 are the "true" parameter values.

As a result of Theorem 3.4.1 the estimated mean duration (1−α̂_n)^{-1} is asymptotically normal with mean (1−α_0)^{-1} and variance (1−α_0)^{-4}σ²(α_0, λ_0)n^{-1}. Hence an approximate 95% confidence interval for the mean duration is (1−α̂_n)^{-1} ± 1.96(1−α̂_n)^{-2}σ(α̂_n, λ̂_n)n^{-1/2}.
Example 3.5.1 For our illustrative example the estimated mean duration is (1 − 0.40)^{-1} = 1.667 months and an approximate 95% confidence interval for the mean duration is (1.229, 2.104).
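The interval can be reproduced with the delta method described above. The sketch below is our own illustration in plain Python; var_alpha stands for the α-entry of the inverse Fisher information, and its value here is an assumed one chosen so that the interval matches the reported (1.229, 2.104).

```python
import math

def duration_ci(alpha_hat, var_alpha, n, z=1.96):
    # Delta method for g(a) = (1 - a)^(-1): g'(a) = (1 - a)^(-2), so the
    # standard error of the estimated mean duration is
    # (1 - alpha_hat)^(-2) * sqrt(var_alpha / n).
    est = 1.0 / (1.0 - alpha_hat)
    se = math.sqrt(var_alpha / n) / (1.0 - alpha_hat) ** 2
    return est, (est - z * se, est + z * se)

# alpha_hat = 0.40 and n = 96 come from the example; var_alpha = 0.62 is
# an assumed illustrative value, not a quantity computed from the data.
est, (lo, hi) = duration_ci(0.40, 0.62, 96)
```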
Chapter 4
4. Estimation
Section 4.1 outlines the asymptotic theory of estimating functions and is illustrated by reviewing estimation and inference for the Gaussian AR(1) model. In Sections 4.2 and 4.3 we show that the methods of conditional least squares (CLS) and generalized least squares (GLS) satisfy the regularity conditions for asymptotic normality of Section 4.1. Further, we derive an analytic expression for the Godambe information matrix in the CLS case and an approximation to the Godambe information matrix in the GLS case. Then in Section 4.4 we show that the score function and observed Fisher information matrix can be neatly represented as conditional expectations. Further, in Section 4.5 we show that we can generalize our expressions in Section 4.4 to general AR(1) models. In Section 4.6 we show that the method of maximum likelihood (ML) estimation satisfies the regularity conditions for asymptotic normality. Finally, in Section 4.7 we compare the asymptotic efficiency and robustness of the CLS and ML estimators.
4.1 Likelihood theory and estimating functions
The goal of this section is to present a unified approach to estimation through the use of estimating functions. We provide the estimating function approach since it includes many standard estimation techniques, such as maximum likelihood, generalized method of moments, conditional least squares and generalized least squares.
A detailed review of likelihood theory and an introduction to estimating functions is found in Barndorff-Nielsen and Sørensen (1994). Godambe and Heyde (1987) give more details on the asymptotics and optimality of estimating functions. Both of these papers consider estimation and inference for multivariate continuous time stochastic processes. However, in this review we will restrict ourselves to univariate discrete time stochastic processes. Many applications of estimating functions can be found in Godambe (1991).
Let {X_t}_{t=1}^∞ be a discrete time stochastic process for which we consider parametric statistical models of the form (Ω^n, F, P_θ^n; θ ∈ Θ), where Ω^n is the sample space, F is a sigma field, P_θ^n is a probability measure defined on the measurable space (Ω^n, F) and Θ is an open subset of ℝ^p. We assume that all of the probability measures in the family P^n = {P_θ^n; θ ∈ Θ} are dominated by a common σ-finite measure μ and we let p_n(x; θ) = dP_θ^n/dμ denote a version of the Radon-Nikodym derivative.
Definition 4.1.1 We call a p-dimensional function Ψ_n(θ) = Ψ_n(X_1, X_2, ..., X_n; θ) a regular estimating function if it satisfies the following conditions for all θ = (θ_1, θ_2, ..., θ_p) ∈ Θ:
1) Ψ_n(θ) is a zero mean square integrable martingale with respect to P_θ^n and the sigma field F_n = σ(X_1, X_2, ..., X_n),
2) the covariance matrix E_θ[Ψ_n(θ)Ψ_n^T(θ)] is positive definite,
3) Ψ_n is almost surely differentiable with respect to the components of θ,
4) Ψ̇_n is nonsingular, where Ψ̇_n denotes the matrix of partial derivatives of Ψ_n with respect to the components of θ.
The parameter θ is estimated by solving the system of equations Ψ_n(θ) = 0, and we let θ̂ denote the estimate. Estimating functions or estimating equations arise from many standard estimation techniques. The first order conditions for minimizing conditional least squares are an example of an estimating equation. The score function is another common estimating function. In the case where Ψ_n(θ) is the score function we will write U_n instead of Ψ_n. Finally, note that Ψ_n is also called an inference function, since both estimation and inference are based on it.
Definition 4.1.2 A sequence {F_t}_{t=1}^∞ of sigma fields is said to be adapted to a sequence {X_t}_{t=1}^∞ of random variables if X_t is F_t measurable. Further, {X_t, F_t}_{t=1}^∞ is called an adapted sequence.

Definition 4.1.3 Let {F_t}_{t=1}^∞ be an increasing sequence of sigma fields. An adapted sequence {X_t, F_t}_{t=1}^∞ on a probability space (Ω, F, P) is called a martingale if for all t, E[|X_t|] exists and is finite and E[X_t | F_{t-1}] = X_{t-1}.

Definition 4.1.4 A stochastic process {X_t}_{t=1}^∞ is called a martingale difference sequence if X_t is measurable with respect to F_t and E[X_t | F_{t-1}] = 0.
Remarks
1) Unless otherwise stated we will assume our martingales to be vector valued and denote the transpose of X_t by X_t^T.
2) If {X_t}_{t=1}^∞ is a martingale difference sequence then the sequence is uncorrelated. That is, for all s and t such that s < t, E[X_s X_t] = E[X_s E[X_t | F_s]] = E[X_s · 0] = 0.
3) If the sequence {X_t}_{t=1}^∞ is a martingale then a martingale difference sequence {Y_t}_{t=1}^∞ can be formed by defining Y_t = X_t − X_{t-1}. Further, the variance of the martingale is the sum of the variances of these differences.
In the following well known example we illustrate Definitions 4.1.1 to 4.1.4.

Example 4.1.1 Let μ be Lebesgue measure, Ω^{n+1} = ℝ^{n+1}, let F^{n+1} be the Borel sigma field of ℝ^{n+1} and Θ = (0,1) × ℝ^+, where ℝ^+ is the set of positive real numbers. The family of probability measures P^{n+1} is defined by the following version of the Radon-Nikodym derivative,

$$p(x; \alpha, \lambda) = \sqrt{\frac{1-\alpha^2}{2\pi}}\, e^{-\frac{1-\alpha^2}{2}\left(x_0 - \frac{\lambda}{1-\alpha}\right)^2} \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_t - \alpha x_{t-1} - \lambda)^2},$$

where (α, λ)^T ∈ Θ and x = (x_0, x_1, ..., x_n)^T ∈ ℝ^{n+1}.

This is the Gaussian AR(1) model defined in (2.1.2), with α ∈ (0,1), σ² = 1 and X_0 = λ/(1−α) + ε, where ε is normal with mean zero and variance 1/(1−α²). Note that defining X_0 in this way makes the unconditional distribution of X_t normal with mean λ/(1−α) and variance 1/(1−α²). Note that p(x; α, λ) is the likelihood, and the log-likelihood is proportional to

$$-\frac{1}{2}\sum_{t=1}^{n}(x_t - \alpha x_{t-1} - \lambda)^2 + \frac{1}{2}\log(1-\alpha^2) - \frac{1-\alpha^2}{2}\left(x_0 - \frac{\lambda}{1-\alpha}\right)^2.$$
The relative contribution of the last two terms of this expression to the log-likelihood is small for large samples. Since our primary objective is to obtain asymptotic properties, we will consider instead the following approximation to the log-likelihood,

$$l_n(\alpha, \lambda) = -\frac{1}{2}\sum_{t=1}^{n}(x_t - \alpha x_{t-1} - \lambda)^2.$$

The score function associated with this log-likelihood is

$$U_n(\alpha, \lambda) = \left( \sum_{t=1}^{n} x_{t-1}\varepsilon_t,\; \sum_{t=1}^{n} \varepsilon_t \right)^T, \qquad \varepsilon_t = x_t - \alpha x_{t-1} - \lambda. \qquad (4.1.1)$$
It is useful to express U_n as U_n(α, λ) = Σ_{t=1}^n u_t, where u_t = (X_{t-1}ε_t, ε_t)^T.

Next we show that U_n is a regular estimating function. It is well known that score functions are martingales, and it is easy to verify that U_n is a martingale with respect to F_n = σ(X_0, X_1, ..., X_n). The martingale differences of the score are u_t = U_t − U_{t-1} = (X_{t-1}ε_t, ε_t)^T, which have the following variance,

$$E[u_t u_t^T] = \begin{pmatrix} E[X_{t-1}^2] & E[X_{t-1}] \\ E[X_{t-1}] & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{1-\alpha^2} + \frac{\lambda^2}{(1-\alpha)^2} & \frac{\lambda}{1-\alpha} \\ \frac{\lambda}{1-\alpha} & 1 \end{pmatrix}.$$
This matrix is positive definite since, for any 2-dimensional vector l = (l_1, l_2)^T, l^T E[u_t u_t^T] l = E[(l_1 X_{t-1} + l_2)^2] is positive except if l_1 = l_2 = 0. The variance of U_n is simply n times the variance of u_t and is positive definite.
The partial derivatives of U_n exist and are given by the following matrix:

$$\dot U_n = -\begin{pmatrix} \sum_{t=1}^{n} X_{t-1}^2 & \sum_{t=1}^{n} X_{t-1} \\ \sum_{t=1}^{n} X_{t-1} & n \end{pmatrix}. \qquad (4.1.2)$$

For any non-zero vector l = (l_1, l_2)^T, the quadratic form l^T \dot U_n l = −Σ_{t=1}^n (l_1 X_{t-1} + l_2)^2 is negative except for the case where X_0 = X_1 = ⋯ = X_n. Therefore U_n is a regular estimating function.

The solution of U_n(α̂, λ̂) = 0 gives the following parameter estimates,

$$\hat\alpha = \frac{n\sum_{t=1}^{n} X_{t-1}X_t - \sum_{t=1}^{n} X_{t-1}\sum_{t=1}^{n} X_t}{n\sum_{t=1}^{n} X_{t-1}^2 - \left(\sum_{t=1}^{n} X_{t-1}\right)^2}, \qquad \hat\lambda = \frac{1}{n}\left( \sum_{t=1}^{n} X_t - \hat\alpha \sum_{t=1}^{n} X_{t-1} \right).$$
Definition 4.1.5 Let {m_t}_{t=1}^∞ be a sequence of martingale differences which generates the martingale M_n = Σ_{t=1}^n m_t. Then ⟨M⟩_n = Σ_{t=1}^n E[m_t m_t^T | F_{t-1}] is called the quadratic characteristic of M_n and [M]_n = Σ_{t=1}^n m_t m_t^T is called the quadratic variation of M_n.

Remark
If M_n is a martingale with respect to F_n then [M]_n − ⟨M⟩_n is also a martingale with respect to the same sigma field F_n.
Definition 4.1.6 A sequence {X_t}_{t=1}^∞ is said to be stationary if for every finite k the joint distribution of the vector (X_t, X_{t+1}, ..., X_{t+k}) is independent of t.

Definition 4.1.7 A sequence {X_t}_{t=1}^∞ is said to be α-mixing if there exists a sequence of positive numbers {α(t)}_{t=1}^∞ convergent to zero, such that

|P(A ∩ B) − P(A)P(B)| ≤ α(n)

for any set A ∈ σ(X_1, X_2, ..., X_k), any set B ∈ σ(X_{k+n}, X_{k+n+1}, ...) and k ≥ 1, n ≥ 1.

Definition 4.1.8 Let {X_t}_{t=1}^∞ be a stationary stochastic process defined on the probability space (Ω, F, P). If for every two sets A and B ∈ F, lim_{n→∞} P(X_1 ∈ A ∩ X_n ∈ B) = P(X_1 ∈ A)P(X_n ∈ B), then the sequence {X_t}_{t=1}^∞ is called ergodic.
Theorem 4.1.1 Let g be a measurable function onto ℝ^s and define Y_t = g(X_t, X_{t+1}, ..., X_{t+τ}). Then
1) if {X_t}_{t=1}^∞ is stationary then {Y_t}_{t=1}^∞ is stationary,
2) if {X_t}_{t=1}^∞ is α-mixing with α(n) = O(n^{-r}) for some r > 0 then {Y_t}_{t=1}^∞ is α-mixing with α(n) = O(n^{-r}),
3) if {X_t}_{t=1}^∞ is ergodic then {Y_t}_{t=1}^∞ is ergodic.

For a proof of parts 1 and 3 see Stout (1974, pp. 170, 182) and for a proof of part 2 see White and Domowitz (1984, Lemma 2.1).

Remarks
1) Parts 1 and 3 hold even when the number of arguments of g is infinite.
2) If {X_t}_{t=1}^∞ is stationary and α-mixing then it is ergodic. The converse is not always true.
For a proof of the Ergodic theorem see Stout (1974, p. 181).

The following regularity conditions are needed to establish a central limit theorem. Let {m_t}_{t=1}^∞ be a sequence of martingale differences which generate the martingale M_n = Σ_{t=1}^n m_t.
Theorem 4.1.3 If M_n is a one dimensional zero mean square integrable martingale satisfying conditions C1 and C2 then

$$\langle M \rangle_n^{-1/2} M_n \to_d N(0,1) \quad \text{(stable)},$$

and if condition C2 is replaced by condition C2' then

$$[M]_n^{-1/2} M_n \to_d N(0,1) \quad \text{(mixing)}.$$
A proof of this result can be found in Hall and Heyde (1980, ch. 3).

Remarks
1) Condition C1 can be replaced by weaker conditions, see Hall and Heyde (1980, ch. 3).
2) Loosely speaking, the regularity conditions ensure that M_n is not dominated by a few of the m_t's and that the variance is standardized to unity.
3) If the martingale difference sequence {m_t}_{t=1}^∞ is stationary and var[m_t] < ∞ then condition C1 is satisfied.
4) If the martingale difference sequence {m_t}_{t=1}^∞ is stationary ergodic and E[m_t^2] < ∞ then conditions C2 and C2' are satisfied with η² = 1 almost surely, and the result is called asymptotic normality.
5) If η² is not degenerate the result is called mixed asymptotic normality and the model is called non-ergodic.
6) For the multivariate case, one simply checks that for all non-zero p-dimensional vectors l the conditions for the univariate martingale l^T M_n hold. In this case the Cramér-Wold (1936) device says that the convergence results above hold for the vector M_n.
Example 4.1.2 In this example we show that the estimating function in Example 4.1.1 satisfies condition C1. For any 2-dimensional vector l = (l_1, l_2)^T we have Var[l^T u_t] = l^T E[u_t u_t^T] l, which we showed in Example 4.1.1 to be equal to l_1² E[X_{t-1}²] + 2 l_1 l_2 E[X_{t-1}] + l_2². The variance of l^T U_n is simply n times this, and substituting into condition C1 gives the required bound. We defer verifying conditions C2 and C2' to Example 4.1.5.
Theorem 4.1.4 (Continuous Mapping Theorem, CMT) Let Z_n and Z be random vectors from some sample space to k-dimensional Euclidean space ℝ^k. Further let g(·) be a measurable function from ℝ^k to ℝ^m. We will allow g(·) to be discontinuous at a set of zero measure points D_g. If Z_n →_d Z and if P(Z ∈ D_g) = 0 then the Continuous Mapping Theorem says that g(Z_n) →_d g(Z).

See McCabe and Tremayne (1993, Section 4.4) for a sketch of the proof and for many illustrations of its use.
Remark
The following three cases of the CMT are widely used. If X_n →_d X and Y_n →_d c then X_n + Y_n →_d X + c, X_n Y_n →_d cX and, provided c ≠ 0, X_n/Y_n →_d X/c.

We require the following additional regularity conditions for the next theorem.

C3) Ψ̇_n^{-1}(θ*)Ψ_n(θ) →_P 0, for any θ* in a suitable neighborhood of θ.

C4) Ψ̇_n^{-1}(θ)Ψ̇_n(θ*) →_P I_p, for any θ* in a suitable neighborhood of θ.
Theorem 4.1.5 Let Ψ_n be a regular inference function and let θ̂_n be a solution of Ψ_n(θ) = 0. If the regularity conditions C1, C2, C3 and C4 are satisfied then

$$\langle \Psi(\theta) \rangle_n^{-1/2}\, \dot\Psi_n(\theta)\,(\hat\theta_n - \theta) \to_d N(0, I_p).$$

The proof of this result is outlined in Godambe and Heyde (1987).

Remarks
1) In condition C3 a suitable neighborhood of θ means one such that θ̂_n is in the neighborhood for all n greater than some number N.
2) The result of Theorem 4.1.5 also holds when condition C2 is replaced by C2' and ⟨Ψ(θ)⟩_n is replaced by [Ψ(θ)]_n.

The following regularity conditions, for the sequence of martingale differences {ψ_t}, lead to a simplification of Theorem 4.1.5.
C5) For all s and t, E[ψ_s ψ_s^T] = E[ψ_t ψ_t^T] < ∞ and E[ψ̇_s] = E[ψ̇_t], where ψ̇_t is the matrix of partial derivatives of ψ_t with respect to the components of θ.

C6) n^{-1}[Ψ]_n converges almost surely.

C7) n^{-1}Ψ̇_n converges almost surely.

Remarks
1) Condition C5 implies condition C1.
2) Conditions C5 and C6 imply either condition C2 or C2'.
3) If {ψ_t} is stationary then condition C5 holds.
4) If {ψ_t} is stationary ergodic, E[ψ_t ψ_t^T] < ∞ and E[|ψ̇_t|] < ∞ then conditions C6 and C7 hold.
Definition 4.1.4 When condition C5 holds for the regular estimating function Ψ_n we define the variability matrix as V(θ) = Var[ψ_t(θ)] = E[ψ_t(θ)ψ_t^T(θ)].

Remark
A desired property of an estimating function is that it gives similar estimates in repeated samples. That is, we would like the function Ψ_n to not vary much from sample to sample, or the variance of Ψ_n to be as small as possible. We therefore desire V to be as "small" as possible.
Definition 4.1.4 When condition C5 holds for the regular estimating function Ψ_n we define the sensitivity matrix as S(θ) = E[ψ̇_t(θ)].

Remark
Another desired property of an estimating function is that it should be easy to distinguish between small changes in the parameters. The steeper the estimating function is around the true parameter value, the easier it is to distinguish and identify the parameter. It is therefore desirable for S(θ) to be as "large" as possible.
Definition 4.1.4 When condition C5 holds for the regular estimating function Ψ_n we define the Godambe information matrix as j = S^T(θ)V^{-1}(θ)S(θ).
Definition 4.1.9 When estimation is based on the score and it satisfies the conditions for a regular estimating function as well as condition C5, then we define the Fisher information matrix as i = V = −S.
Remarks
1) The Godambe information can be interpreted as a measure of the amount of information that can be obtained about the parameter from the estimating function.
2) All the estimating functions considered in this chapter satisfy conditions C5, C6 and C7.
3) All of the estimating functions considered in this thesis are score or quasi-score functions, that is Ψ_n = L̇_n, where L_n(x_1, x_2, ..., x_n; θ) is either a likelihood or a pseudo likelihood. Since in this case Ψ̇_n will be a matrix of second partial derivatives of L_n with respect to θ, S(θ) will be a symmetric matrix. By pseudo likelihood we mean that the true likelihood is replaced by a simpler likelihood, which retains some of the properties of the true likelihood. For example, the Poisson AR(1) likelihood could be replaced by the pseudo Gaussian AR(1) likelihood, which is much simpler. Recall that the Poisson AR(1) model and the Gaussian AR(1) model have the same conditional mean, marginal mean, autocorrelation function and partial autocorrelation function.
Corollary 4.1.1 Let Ψ_n be a regular inference function and let θ̂_n be a solution of Ψ_n(θ) = 0. If the regularity conditions C3, C4, C5, C6 and C7 are satisfied then

$$n^{1/2}(\hat\theta_n - \theta) \to_d N(0, j^{-1}).$$

Proof. From Theorem 4.1.5 we have convergence to a p-dimensional standard normal random variable Z. Applying the CMT, the left hand side is equivalent to n^{1/2}(θ̂_n − θ), while the random variable on the right hand side has a multivariate normal distribution with variance covariance matrix S^{-1}(θ)V(θ)S^{-T}(θ) = j^{-1}.
Example 4.1.4 We show that condition C5 holds for the Gaussian AR(1) model in Example 4.1.1. In Example 4.1.1 we showed that E[u_t u_t^T] is finite and, due to stationarity, independent of t. The expected value of u̇_t is given by

$$E[\dot u_t] = -\begin{pmatrix} E[X_{t-1}^2] & E[X_{t-1}] \\ E[X_{t-1}] & 1 \end{pmatrix},$$

which is also finite and independent of t. Hence condition C5 is satisfied. We will defer showing that conditions C6 and C7 hold to Example 4.1.5.
Ideally we would like the variation in θ̂_n to be as small as possible. That is, we want j^{-1} to be small, or the Godambe information to be large. The following regularity condition allows comparison between the Godambe information and the Fisher information. We denote the score function as U_n(x; θ) = p_n(x; θ)^{-1} (∂/∂θ) p_n(x; θ).

C8) The score function is a regular inference function and the order of integration and differentiation may be interchanged in forming the relevant expectations.

Theorem 4.1.6 For a regular estimating function which satisfies conditions C5 through C8 the Godambe information is always less than or equal to the Fisher information. More specifically, det(i − j) ≥ 0.
Next we state a strong law of large numbers for martingales and prove some special cases which will be used later in the thesis.

Theorem 4.1.7 (Martingale strong law) Let {Y_t, F_t} be a martingale difference sequence, where F_t = σ(Y_1, Y_2, ..., Y_t) is the σ-field generated by Y_1, Y_2, ..., Y_t. If there exists an r ≥ 1 such that Σ_{t=1}^∞ E[|Y_t|^{2r}]/t^{1+r} < ∞, then n^{-1} Σ_{t=1}^n Y_t → 0 almost surely as n → ∞.

For a proof of this strong law see Chow (1960) and Stout (1974).

Remark
If there exists an M such that E[Y_t²] < M for all t, then n^{-1} Σ_{t=1}^n Y_t → 0 almost surely, since Σ_{t=1}^∞ E[Y_t²]/t² ≤ M Σ_{t=1}^∞ t^{-2} < ∞.
Next, in Theorems 4.1.8 and 4.1.9, we prove a strong law of large numbers which only requires restrictions on moments. In such cases, Theorems 4.1.8 and 4.1.9 can be used instead of the Ergodic theorem, which requires showing the process is ergodic. Suppose a stochastic process {X_t}_{t=0}^∞ has the following moment restrictions:

sup_t E[|X_t|^{2k}] < ∞ and E[X_t^k] = E[X_s^k], (4.1.8)

for all positive integers s and t and k = 1, 2, ..., p. Further suppose the conditional expectation of X_t^k given F_{t-1} = σ(X_1, X_2, ..., X_{t-1}) is a polynomial in X_{t-1} of degree k. That is, the conditional expectation can be written as

E[X_t^k | F_{t-1}] = a_{k1}X_{t-1}^k + a_{k2}X_{t-1}^{k-1} + ⋯ + a_{k,k}X_{t-1} + a_k, (4.1.9)

where 0 ≤ a_{k1} < 1.
Theorem 4.1.8 Let {X_t}_{t=0}^∞ be a stochastic process which satisfies the moment restrictions in (4.1.8) and (4.1.9). Then the strong law of large numbers holds for X_t^k, that is, n^{-1} Σ_{t=1}^n X_t^k → E[X_t^k] almost surely as n → ∞, for k = 1, 2, ..., p.

Proof. First note that conditions (4.1.8) and (4.1.9) together imply

E[X_t^k] = (1 − a_{k1})^{-1} { a_{k2}E[X_t^{k-1}] + a_{k3}E[X_t^{k-2}] + ⋯ + a_{k,k}E[X_t] + a_k }.

We will proceed to prove the result by induction on k. Consider the case k = 1. Let Y_t = X_t − a_{11}X_{t-1} − a_1 and note that {Y_t, F_t} forms a martingale difference sequence. Further, Y_t satisfies condition (4.1.8) and hence satisfies the conditions for Theorem 4.1.7. Therefore, as n → ∞,

n^{-1} Σ_{t=1}^n (X_t − a_{11}X_{t-1} − a_1) → 0 almost surely,

which implies that n^{-1} Σ_{t=1}^n X_t → a_1/(1 − a_{11}) = E[X_t].
We now assume that for l = 1, 2, ..., k, where k < p, n^{-1} Σ_{t=1}^n X_t^l → E[X_t^l] almost surely. Let Y_t = X_t^{k+1} − a_{k+1,1}X_{t-1}^{k+1} − a_{k+1,2}X_{t-1}^k − ⋯ − a_{k+1,k+1}X_{t-1} − a_{k+1}. Then {Y_t, F_t} is again a martingale difference sequence and satisfies condition (4.1.8). By Theorem 4.1.7 the corresponding sums converge almost surely as n → ∞. The proof then follows by induction on k.
Theorem 4.1.9 Under the assumptions of Theorem 4.1.8,

n^{-1} Σ_{t=j+1}^n X_{t-j} X_t → E[X_{t-j} X_t]

almost surely as n → ∞.

Proof. Note that repeated application of the conditional expectation assumption implies that E[X_t | F_{t-j}] = b_j X_{t-j} + c_j for some b_j and c_j, and this implies that E[X_{t-j}X_t] = b_j E[X_{t-j}²] + c_j E[X_{t-j}]. Let Y_t = X_{t-j}(X_t − b_j X_{t-j} − c_j). Then {Y_t, F_t} is again a martingale difference sequence and satisfies condition (4.1.8). By Theorem 4.1.7 we have the required almost sure convergence.
Remarks
1) Under the same conditions the more general result n^{-1} Σ_t X_{t-j}^k X_t^l → E[X_{t-j}^k X_t^l] can be proven using basically the same proof, only it is more cumbersome to write down.
2) We believe that these are new results.
3) The conditions for Theorem 4.1.7 are satisfied by the Poisson AR(1) model.
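The strong laws above can be checked by simulating the Poisson AR(1) recursion (binomial thinning of the previous count plus a Poisson innovation) and comparing sample moments with the stationary Poisson(λ/(1−α)) moments. A sketch of ours, assuming numpy is available:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, lam, n = 0.4, 5.2, 200_000
mu = lam / (1 - alpha)                  # stationary Poisson mean, about 8.667

x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(mu)                  # start in the stationary distribution
for t in range(1, n):
    # binomial thinning of the previous count plus a Poisson(lam) innovation
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

m1 = x.mean()                                # should approach E[X] = mu
m2 = (x.astype(float) ** 2).mean()           # should approach E[X^2] = mu + mu^2
c1 = (x[:-1].astype(float) * x[1:]).mean()   # E[X_{t-1} X_t] = alpha E[X^2] + lam E[X]
```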
Example 4.1.5 We continue with Example 4.1.4. Since the unconditional distribution of X_t is normal with mean λ/(1−α) and variance 1/(1−α²), we have that for all s and t, E[X_t^k] = E[X_s^k] < ∞, k = 1, 2, .... Also, the conditional expectation of X_t given X_{t-1} is linear in X_{t-1}. Hence we can apply Theorems 4.1.8 and 4.1.9 to conclude that n^{-1}[U]_n converges almost surely, establishing conditions C6 and C2'.

Theorems 4.1.8 and 4.1.9 can also be applied to n^{-1}U̇_n(θ*), which gives us n^{-1}U̇_n(θ*) → S almost surely, where the equality to S results because u̇_t is independent of the parameters; this establishes condition C7. Applying the CMT we establish condition C3. Also, since U̇_n(θ) = U̇_n(θ*) for all θ and θ* in the parameter space, condition C4 is trivially satisfied.

Since conditions C3 through C7 are satisfied, Corollary 4.1.1 gives us the following asymptotic result,

$$n^{1/2}\left( (\hat\alpha_n, \hat\lambda_n)^T - (\alpha, \lambda)^T \right) \to_d N(0, i^{-1}),$$

where i = V and appears in (4.1.7).
4.2 Conditional least squares for the Poisson AR(1) model
We begin by defining the family of statistical models. Let μ be the counting measure, Ω^n = N_0^n, where N_0 is the set of non-negative integers, let F be the set of all possible subsets of Ω^n, and Θ = (0,1) × ℝ^+. We define the family of probability measures P^n by the following version of the Radon-Nikodym derivative,

$$p_n(x; \alpha, \lambda) = p(x_0) \prod_{t=1}^{n} p(x_t \mid x_{t-1}),$$

where p(x_t | x_{t-1}) is the conditional probability (4.2.1), p(x_0) is the Poisson(λ/(1−α)) marginal probability (4.2.2), and (α, λ) ∈ Θ. This defines the Poisson AR(1) model given in (2.1.1).
In this section we consider estimation of the Poisson AR(1) model parameters by the method known as conditional least squares (CLS). Klimko and Nelson (1978) and Hall and Heyde (1980, ch. 6) consider CLS estimation and inference for stochastic processes. The parameter estimates are selected to minimize the sum of the squared distances of each observation X_t from its conditional expected value given the previous observations X_0, X_1, ..., X_{t-1}, E[X_t | F_{t-1}] = αX_{t-1} + λ, where F_t is the standard filtration, that is F_t = σ(X_0, X_1, ..., X_t). The problem is equivalent to maximizing the following function over the parameter space,

$$Q_n(\alpha, \lambda) = -\frac{1}{2}\sum_{t=1}^{n}(X_t - \alpha X_{t-1} - \lambda)^2.$$
The first order conditions to this maximization problem lead to the estimating function Ψ_n. Note this estimating function is the same as the score function for the Gaussian AR(1) model given in (4.1.1) and is therefore called a quasi-score. The following are a direct consequence of this equivalence:
1) from (4.1.2), the partial derivatives of the estimating function are the same,
2) this matrix is non-singular (except if X_0 = X_1 = ⋯ = X_n = 0),
3) the solutions to Ψ_n(α̂, λ̂) = 0 are the same as for the Gaussian AR(1) score (note that in cases where α̂ is negative we set it equal to zero),
4) Ψ_n is a martingale with respect to F_n = σ(X_0, X_1, ..., X_n), since the conditional expectation of X_t given X_{t-1} is the same in both the Poisson and Gaussian AR(1) models.
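The closed-form CLS estimates can be checked on simulated data. The sketch below is our own illustration, assuming numpy; the estimator is the least squares fit of X_t on X_{t-1} with an intercept:

```python
import numpy as np

def cls_estimates(x):
    # Solve the CLS normal equations: regress X_t on X_{t-1} with an intercept.
    y = x[1:].astype(float)
    z = x[:-1].astype(float)
    n = len(y)
    alpha_hat = ((n * (z * y).sum() - z.sum() * y.sum())
                 / (n * (z * z).sum() - z.sum() ** 2))
    alpha_hat = max(alpha_hat, 0.0)     # truncate negative estimates at zero
    lam_hat = (y.sum() - alpha_hat * z.sum()) / n
    return alpha_hat, lam_hat

# Simulate a Poisson AR(1) path at the example parameter values.
rng = np.random.default_rng(7)
alpha, lam, n = 0.4, 5.2, 100_000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

a_hat, l_hat = cls_estimates(x)
```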
Next we calculate the variance of ψ_t. Note that the expectations are now taken with respect to the Poisson AR(1) model. The resulting matrix is positive definite since, for any 2-dimensional vector l = (l_1, l_2)^T, the quantity l^T E[ψ_t ψ_t^T] l is positive except if l_1 = l_2 = 0. The expected value of ψ̇_t is the same as in the Gaussian AR(1) case. Since E[ψ_t ψ_t^T] and E[ψ̇_t] are both finite and independent of t, condition C5 holds and we define the variability matrix and the sensitivity matrix respectively as V = E[ψ_t ψ_t^T] and S = E[ψ̇_t].
The Godambe information j = S^T V^{-1} S and its inverse j^{-1} = S^{-1} V S^{-T} can then be written out explicitly in terms of α and λ. Note these explicit expressions for the Godambe information and its inverse are new.
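For given parameter values the matrices V, S and j^{-1} = S^{-1}V S^{-T} can be evaluated numerically from the Poisson marginal moments. A sketch of ours, assuming numpy; it uses the model's conditional variance α(1−α)X_{t-1} + λ and the standard Poisson(μ) raw moments E[X] = μ, E[X²] = μ + μ², E[X³] = μ³ + 3μ² + μ:

```python
import numpy as np

alpha, lam = 0.4, 5.2
mu = lam / (1 - alpha)
m1, m2, m3 = mu, mu + mu**2, mu**3 + 3 * mu**2 + mu   # Poisson(mu) raw moments
c = alpha * (1 - alpha)          # Var[X_t | X_{t-1}] = c * X_{t-1} + lam

# V = E[psi_t psi_t^T] with psi_t = (X_{t-1} eps_t, eps_t)^T:
# each entry conditions on X_{t-1} and averages the conditional variance.
V = np.array([[c * m3 + lam * m2, c * m2 + lam * m1],
              [c * m2 + lam * m1, c * m1 + lam]])
# S = E[psi_t-dot], identical to the Gaussian AR(1) case.
S = -np.array([[m2, m1],
               [m1, 1.0]])

J = S.T @ np.linalg.inv(V) @ S   # Godambe information j = S^T V^{-1} S
avar = np.linalg.inv(J)          # asymptotic covariance of sqrt(n)(estimates - truth)
```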
Condition (4.1.8) is satisfied since the marginal distribution of X_t is Poisson with mean λ/(1−α) and all moments of the Poisson distribution exist and are finite. Condition (4.1.9) is also satisfied since E[X_t | X_{t-1}] = αX_{t-1} + λ. We can therefore use Theorems 4.1.8 and 4.1.9 to show the following:
1) n^{-1}[Ψ]_n → V almost surely, which is condition C6,
2) n^{-1}Ψ̇_n → S almost surely, which is condition C7.

Since Ψ̇_n(α, λ) is independent of α and λ, condition C4 holds. Further, condition C3 follows from writing Ψ̇_n^{-1}(α, λ)Ψ_n(α, λ) = (n^{-1}Ψ̇_n(α, λ))^{-1} n^{-1}Ψ_n(α, λ), which by the CMT converges in probability to zero.

Finally we can apply Corollary 4.1.1, since conditions C3 through C7 are satisfied, to get

$$n^{1/2}\left( (\hat\alpha_n, \hat\lambda_n)^T - (\alpha, \lambda)^T \right) \to_d N(0, j^{-1}).$$
4.3 Generalized least squares for the Poisson AR(1) model
In this section we use generalized (weighted) least squares (GLS) to estimate the model parameters, see Wooldridge (1991a). In this method, parameter estimates are selected to minimize the sum of the weighted squared distances of each observation X_t from its conditional expected value given the previous observation X_{t-1}, E[X_t | X_{t-1}] = αX_{t-1} + λ. The weights are the inverse of the estimated conditional variance V̂ar[X_t | X_{t-1}] = α̂_n(1 − α̂_n)X_{t-1} + λ̂_n, where α̂_n and λ̂_n are strongly consistent estimates, such as the conditional least squares estimates. Observations with large conditional variances are given smaller weightings than those with smaller conditional variances. This problem is equivalent to minimizing the following function over the parameter space,

$$Q_n(\alpha, \lambda) = \sum_{t=1}^{n} \frac{(X_t - \alpha X_{t-1} - \lambda)^2}{\hat\alpha_n(1-\hat\alpha_n)X_{t-1} + \hat\lambda_n}.$$
The first order conditions to this minimization problem lead to the estimating function Ψ_n defined in (4.3.1). The sequence {ψ_t}, where ψ_t = Ψ_t − Ψ_{t-1}, is a martingale difference sequence. Therefore Ψ_n is a martingale with respect to F_t = σ(X_0, X_1, ..., X_t, α̂_n, λ̂_n). The partial derivatives of ψ_t with respect to α and λ follow directly.
The matrix Ψ̇_n = Σ_{t=1}^n ψ̇_t is non-singular except if all of the X_t's are zero. Notice that the conditional covariance matrix of ψ_t is positive definite, and hence the variance of Ψ_n is also positive definite. Ψ_n is therefore a regular inference function. If we assume that λ̂_n > δ for all n, for some δ > 0, then ψ_t ψ_t^T is dominated by a function which has a finite expectation, and the Dominated Convergence Theorem applies.
Further, this implies the convergence of the corresponding expectations. Note that strong consistency of α̂_n and λ̂_n is required to use the Dominated Convergence Theorem. Similarly, ψ̇_t is dominated by a function which has a finite expectation, so by the Dominated Convergence Theorem the expectation of ψ̇_t also converges. We will therefore define the variability matrix V and sensitivity matrix S as the corresponding limits.
Theorem 4.3.1 Let Ψ_n be the inference function defined in (4.3.1). Then the following converge under the Poisson AR(1) model:

n^{-1}[Ψ]_n →_P V, (4.3.6)

n^{-1}Ψ̇_n →_P S. (4.3.7)
Before we prove this result we need to show the following new result.

Proposition 4.3.1 If the stochastic process {X_t} follows the Poisson AR(1) model with α ∈ (0,1), then it is α-mixing with α(n) = O(α^n).
Proof. We let p_n(X_{t+n} | X_t), p(X_t | X_{t-1}) and p(X_t) denote respectively the conditional probability of X_{t+n} given X_t, the conditional probability of X_t given X_{t-1} and the marginal probability of X_t. These are respectively defined in (3.3.1), (4.2.1) and (4.2.2). We denote the joint distribution of X_1, X_2, ..., X_k, the joint distribution of X_{k+n}, X_{k+n+1}, ..., and the joint distribution of X_1, ..., X_k, X_{k+n}, X_{k+n+1}, ... in the obvious way.

For any A ∈ σ(X_1, X_2, ..., X_k) and any B ∈ σ(X_{k+n}, X_{k+n+1}, ...), the difference P(A ∩ B) − P(A)P(B) can be bounded by comparing the n-step transition probability p_n(x | X_k) with the marginal probability p(x). Writing out p_n(x | X_k) as the convolution of a Binomial(X_k, α^n) count and a Poisson count with mean λ(1−α^n)/(1−α), every term in the difference p_n(x | X_k) − p(x) carries a factor α^n, with a multiplier depending on X_k that has finite expectation under the Poisson marginal. Summing over the sets A and B and taking expectations therefore gives

|P(A ∩ B) − P(A)P(B)| ≤ Cα^n

for some constant C not depending on n. Therefore {X_t} is α-mixing with α(n) = O(α^n).
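The geometric rate in Proposition 4.3.1 is mirrored in the model's autocovariance function, Cov(X_t, X_{t+n}) = α^n Var(X_t) with Var(X_t) = λ/(1−α), a standard property of the Poisson AR(1). A simulation check of ours, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(11)
alpha, lam, n = 0.4, 5.2, 200_000
mu = lam / (1 - alpha)           # marginal mean and variance (Poisson)

x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(mu)
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

d = x.astype(float) - mu
covs = [(d[:-j] * d[j:]).mean() for j in range(1, 5)]   # sample autocovariances
theory = [alpha ** j * mu for j in range(1, 5)]         # geometric decay alpha^j * mu
```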
Proof (Theorem 4.3.1). We begin by introducing some notation. Let Y_t denote the entries of ψ_t ψ_t^T, let Y_t* denote the corresponding entries with the estimated weights removed, and let S_n = Σ_{t=1}^n Y_t and S_n* = Σ_{t=1}^n Y_t*. Showing (4.3.6) is equivalent to showing n^{-1}S_n →_P V.

Since {X_t}_{t=1}^∞ is stationary and α-mixing with α(n) = O(α^n), by Theorem 4.1.1 {Y_t}_{t=1}^∞ is also stationary and α-mixing with α(n) = O(α^n). Further, {X_t}_{t=1}^∞ and {Y_t}_{t=1}^∞ are both ergodic.

Since Y_t and Y_t* are both positive and Y_t is bounded by a multiple of Y_t* (using λ̂_n > δ), the variance of n^{-1}S_n is less than or equal to the variance of the corresponding multiple of n^{-1}S_n*. By the Ergodic theorem n^{-1}S_n* → E[Y_t*] almost surely, so this quantity converges almost surely and its asymptotic variance is zero. Chebyshev's inequality can then be used to show that n^{-1}S_n − E[n^{-1}S_n] →_P 0. Since lim_{n→∞} E[Y_t] = V we therefore have the desired result, n^{-1}S_n →_P V. The proof of (4.3.7) is similar and is omitted.
Condition C2' is satisfied since we have shown that n^{-1}[Ψ]_n →_P V and lim_{n→∞} n^{-1}Var[Ψ_n] = V. Condition C1 is also satisfied since the summands have uniformly bounded variances. Condition C4 is satisfied since [Ψ̇_n(α, λ)]^{-1}Ψ̇_n(α̂_n, λ̂_n) = I_2. Finally, condition C3 is satisfied since Ψ̇_n^{-1}(α, λ)Ψ_n(α, λ) = (n^{-1}Ψ̇_n(α, λ))^{-1} n^{-1}Ψ_n(α, λ), which by the CMT (Theorem 4.1.4) converges in probability to zero. We can therefore use Theorem 4.1.5, and by the CMT (Theorem 4.1.4) the following holds:

$$n^{1/2}\left( (\hat\alpha_n, \hat\lambda_n)^T - (\alpha, \lambda)^T \right) \to_d N(0, S^{-1} V S^{-T}).$$
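A GLS step can be sketched as a weighted least squares fit with weights equal to the inverse estimated conditional variance at pilot (e.g. CLS) estimates. This is our own illustration, assuming numpy; the function name gls_step is ours:

```python
import numpy as np

def gls_step(x, alpha0, lam0):
    # Weighted least squares of X_t on (X_{t-1}, 1) with weights
    # 1 / (alpha0*(1-alpha0)*X_{t-1} + lam0), the inverse estimated
    # conditional variance at the pilot estimates (alpha0, lam0).
    y = x[1:].astype(float)
    z = x[:-1].astype(float)
    w = 1.0 / (alpha0 * (1 - alpha0) * z + lam0)
    Z = np.column_stack([z, np.ones_like(z)])
    A = Z.T @ (w[:, None] * Z)
    b = Z.T @ (w * y)
    return np.linalg.solve(A, b)        # (alpha_hat, lam_hat)

# Simulate a Poisson AR(1) path and apply one GLS step.
rng = np.random.default_rng(3)
alpha, lam, n = 0.4, 5.2, 100_000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

a_gls, l_gls = gls_step(x, 0.4, 5.2)    # pilot values; in practice use CLS estimates
```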
The following theorem gives an approximation for V.

Theorem 4.3.2 The function V in (4.3.5) can be approximated, up to a remainder term R, by a Taylor expansion of the weight function about the marginal mean λ/(1−α), where the remainder R is bounded.
Proof: Making use of the following two identities,

we can rewrite the information matrix as follows,

To approximate the expectation above we consider the following function,
The kth derivative of g(x) with respect to x is,

and a Taylor expansion about the point λ/(1 − α) is,

where

R(ξ) will be largest in absolute value when ξ = 0. Now we can approximate the expected value of g(X_{t−1}):

where the error in the approximation is negative and bounded in absolute value by
4.4 The score and Fisher information for the Poisson AR(1) model
In this section we derive expressions for the score function and observed Fisher information for the Poisson AR(1) model. The conditional likelihood given X_0 is L(α, λ) = Π_{t=1}^n p(X_t | X_{t−1}), where p(X_t | X_{t−1}) is defined in (4.2.1).
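The conditional probability p(X_t | X_{t−1}) in (4.2.1) is the convolution of a Binomial(X_{t−1}, α) survival count with Poisson(λ) arrivals, so it and the conditional log-likelihood can be evaluated by a finite sum. A minimal sketch (function names are ours):

```python
from math import comb, exp, factorial, log

def trans_prob(x_t, x_prev, alpha, lam):
    """p(X_t | X_{t-1}): Binomial(x_prev, alpha) thinning convolved
    with Poisson(lam) arrivals."""
    if x_t < 0:
        return 0.0
    return sum(comb(x_prev, s) * alpha ** s * (1 - alpha) ** (x_prev - s)
               * exp(-lam) * lam ** (x_t - s) / factorial(x_t - s)
               for s in range(min(x_t, x_prev) + 1))

def cond_loglik(xs, alpha, lam):
    """Conditional log-likelihood: sum of log p(X_t | X_{t-1}), t >= 1."""
    return sum(log(trans_prob(xs[t], xs[t - 1], alpha, lam))
               for t in range(1, len(xs)))
```

At α = 0 the transition probability collapses to the Poisson(λ) pmf, the iid case used in Chapter 5.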
Let U_α be the score with respect to α, that is

where l(·) is the log-likelihood. The partial derivative of the conditional probability is found by making use of the following derivative

and the relation
Thus the score with respect to α is,

Let U_λ denote the score with respect to λ, that is

The partial derivative of the conditional probability is found by making use of the following derivative
Thus the score with respect to λ is,

The second derivative of the conditional probability can now easily be found.
(an expression involving the shifted conditional probabilities p(X_t − 2 | X_{t−1} − 2) and p(X_t − 1 | X_{t−1} − 1))
and
The second derivatives of the log-likelihood are then,
and
Next we show how to write the first and second derivatives of the log-likelihood as conditional expectations. To make these conditional expectations more clear, we consider the following example with just two random variables. Let X and Y be independent random variables and denote their densities respectively as f_X(x) and f_Y(y), where the densities are with respect to the measure ν (Lebesgue measure or counting measure). Let Z = X + Y be the convolution of X and Y. The joint distribution of Z and X is f_X(x) f_Y(z − x) (note the Jacobian for this transformation is 1). The density for Z is found by integrating out X:

f_Z(z) = ∫ f_X(x) f_Y(z − x) dν(x).

The conditional density for X given Z is

f_{X|Z}(x | z) = f_X(x) f_Y(z − x) / f_Z(z).

The conditional moments for X and Y are then,
and
Proposition 4.4.1 For the Poisson AR(1) model, with conditional probabilities (4.2.1),

where E_t[·] denotes the conditional expectation with respect to the sigma field F_t = σ(X_0, X_1, …, X_{t−1}, X_t).
Proof: We first show (4.4.4). Rearranging (4.4.1) we see that

Next the left hand side of (4.4.5) is

which is the right hand side of (4.4.5).

Next the left hand side of (4.4.6) is
= α² X_{t−1}(X_{t−1} − 1) p(X_t − 2 | X_{t−1} − 2) / p(X_t | X_{t−1}),

which is the right hand side of (4.4.6).
Next the left hand side of (4.4.7) is
which, after reindexing the summation using (x + 1) C(X_{t−1}, x + 1) = X_{t−1} C(X_{t−1} − 1, x), collapses to a single shifted conditional probability divided by p(X_t | X_{t−1}); this is the right hand side of (4.4.7).
Finally the lefi hand side of (4.4.8) is
= λ² p(X_t − 2 | X_{t−1}) / p(X_t | X_{t−1}),
which is the right hand side of (4.4.8).
Proposition 4.4.1 leads to the following new expressions for the first and second derivatives of the log-likelihood, which involve conditional covariance terms of the form

Σ_t {E_t[(α∘X_{t−1}) ε_t] − E_t[α∘X_{t−1}] E_t[ε_t]},
and
Note that E_t[α∘X_{t−1}] ≠ α X_{t−1} and E_t[ε_t] ≠ λ, whereas E_{t−1}[α∘X_{t−1}] = α X_{t−1} and E_{t−1}[ε_t] = λ. Further note that given X_t the correlation between α∘X_{t−1} and ε_t is −1. This follows because, conditional on X_t, the two components sum to the constant X_t.
In cases where the time series is comprised of low counts the expected information is easy to calculate numerically. For example, if the probability of a count larger than 6 is almost zero then there are 36 outcomes to sum over in the calculation of E[E_t[α∘X_{t−1}]²] or E[E_t[ε_t]²]. Hence the expected information is easy to calculate.
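The truncated-sum idea can be sketched as follows; as a sanity check the same double sum recovers the stationary mean λ/(1 − α) (the cutoff, parameter values and helper names are illustrative):

```python
from math import comb, exp, factorial

def poisson_pmf(x, mean):
    return exp(-mean) * mean ** x / factorial(x) if x >= 0 else 0.0

def trans_prob(x_t, x_prev, alpha, lam):
    # convolution of Binomial(x_prev, alpha) thinning and Poisson(lam) arrivals
    return sum(comb(x_prev, s) * alpha ** s * (1 - alpha) ** (x_prev - s)
               * poisson_pmf(x_t - s, lam)
               for s in range(min(x_t, x_prev) + 1))

def expect_pair(g, alpha, lam, cutoff):
    """Approximate E[g(X_{t-1}, X_t)] by a truncated double sum, using
    the stationary Poisson(lam/(1-alpha)) marginal for X_{t-1}."""
    mu = lam / (1 - alpha)
    return sum(poisson_pmf(i, mu) * trans_prob(j, i, alpha, lam) * g(i, j)
               for i in range(cutoff + 1) for j in range(cutoff + 1))

# sanity check: E[X_t] should equal the stationary mean lam/(1-alpha)
m = expect_pair(lambda i, j: j, alpha=0.4, lam=1.0, cutoff=25)
```

The same `expect_pair` routine can be applied to the squared conditional expectations appearing in the expected information.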
The above representation of the score function leads to a new definition of residuals. The usual way to define a residual is to take the difference between a random variable and its conditional expectation. That is, for the Poisson AR(1) model residuals can be defined as r_t = X_t − E_{t−1}[X_t] = X_t − αX_{t−1} − λ. However, since X_t is comprised of two random components it would be nice to define two sets of residuals, one for each random component.

The natural way to define such residuals is as follows: for the continuation component let r_{1t} = α∘X_{t−1} − αX_{t−1} and for the arrival component let r_{2t} = ε_t − λ. Unfortunately these definitions won't work, because α∘X_{t−1} and ε_t are not observable.

However, we can replace α∘X_{t−1} and ε_t respectively with E_t[α∘X_{t−1}] and E_t[ε_t] (their conditional expectations given the observed values of X_{t−1} and X_t). That is, redefine the residuals as r*_{1t} = E_t[r_{1t}] = E_t[α∘X_{t−1}] − αX_{t−1} and r*_{2t} = E_t[r_{2t}] = E_t[ε_t] − λ.
Remarks
1. These conditional expectations are easily calculated with the help of Proposition 4.4.1.

2. We are using the aggregate data to estimate the individual unobserved processes (the continuation process and the arrival process).

3. The residuals are equivalent to the martingale differences of the score function. Further, note that our new expressions for the score function lead to these new definitions for residuals, which otherwise would not have been obvious.

4. Adding the components of the two new sets of residuals gives the old set of residuals. That is,
5. All the residuals above should be standardized before making residual plots.
6. Since the two sets of residuals are calculated from aggregate data they might be highly correlated. This appears to be the case in Example 4.4.1. For low count series (series with many zeros) there appears to be less correlation; see the residual plots for data set 1, Figure 8.2.1, and the discussion of the residuals in Section 8.2.
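Assuming the closed forms that appear in the proof of Proposition 4.4.1, namely E_t[α∘X_{t−1}] = αX_{t−1} p(X_t − 1 | X_{t−1} − 1)/p(X_t | X_{t−1}) and E_t[ε_t] = λ p(X_t − 1 | X_{t−1})/p(X_t | X_{t−1}), the two residual series can be computed as sketched below (helper names are ours):

```python
from math import comb, exp, factorial

def trans_prob(x_t, x_prev, alpha, lam):
    # p(X_t | X_{t-1}); returns 0 for negative arguments
    if x_t < 0:
        return 0.0
    return sum(comb(x_prev, s) * alpha ** s * (1 - alpha) ** (x_prev - s)
               * exp(-lam) * lam ** (x_t - s) / factorial(x_t - s)
               for s in range(min(x_t, x_prev) + 1))

def continuation_residual(x_t, x_prev, alpha, lam):
    """r*_1t = E_t[alpha o X_{t-1}] - alpha * X_{t-1} (assumed closed form)."""
    p = trans_prob(x_t, x_prev, alpha, lam)
    e_cont = (alpha * x_prev * trans_prob(x_t - 1, x_prev - 1, alpha, lam) / p
              if x_prev > 0 else 0.0)
    return e_cont - alpha * x_prev

def arrival_residual(x_t, x_prev, alpha, lam):
    """r*_2t = E_t[eps_t] - lam (assumed closed form)."""
    p = trans_prob(x_t, x_prev, alpha, lam)
    return lam * trans_prob(x_t - 1, x_prev, alpha, lam) / p - lam
```

By construction the two residuals add up to the ordinary residual X_t − αX_{t−1} − λ (Remark 4), which gives a useful numerical check.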
Example 4.4.1 In this example we continue our analysis of our illustrative data set by examining the residual plots. Figures 4.4.1 and 4.4.2 are respectively the residual plots for the continuation and arrival processes, while Figures 4.4.3 and 4.4.4 are respectively the autocorrelations for the continuation and arrival residuals.

Both residual plots have patterns similar to the Pearson residual plot in Figure 2.5.4. None of the autocorrelations in the continuation residuals are statistically significant at the 5% level. The second autocorrelation for the arrival residuals is borderline significant at the 5% level.
Residuals: Continuation Process (heavy manufacturing industry, males, ages 25-34, burns)

Figure 4.4.1 Residual plot of the continuation process for the time series of monthly claim counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
Residuals: Arrival Process (heavy manufacturing industry, males, ages 25-34, burns)

Figure 4.4.2 Residual plot of the arrival process for the time series of monthly claim counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
Autocorrelations of the Residuals: Continuation Process (heavy manufacturing industry, males, ages 25-34, burns)

Figure 4.4.3 Autocorrelations in the continuation residuals for the time series of monthly claim counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
Autocorrelations of the Residuals: Arrival Process (heavy manufacturing industry, males, ages 25-34, burns)

Figure 4.4.4 Autocorrelations in the arrival residuals for the time series of monthly claim counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
4.5 The score and Fisher information for a general AR(1) model

We now show how to extend this to a general AR(1) model of the following form. Let X_t = α∘X_{t−1} + ε_t, where α∘X_{t−1} | X_{t−1} has density f(s | X_{t−1}; α), ε_t has density g(ε; λ) and X_t | X_{t−1} has density

The conditional density of α∘X_{t−1} | X_{t−1}, X_t is

Let F_t = σ(X_0, X_1, …, X_t). The conditional moments of α∘X_{t−1} | F_t and ε_t | F_t can be expressed as follows.
Given a sample x_0, x_1, …, x_n from the above AR(1) model the likelihood is the product of the conditional densities, where we take h(x_0) = 1. The log-likelihood and score are respectively then l(α, λ) = Σ_{i=1}^n log(h(x_i | x_{i−1}; α, λ)) and
If ∂f(s | x_{t−1}; α)/∂α and ∂g(x_t − s; λ)/∂λ are polynomials of the following form,

∂f(s | x_{t−1}; α)/∂α = (a_0 + a_1 s + a_2 s² + ⋯ + a_p s^p) f(s | x_{t−1}; α),

and

∂g(x_t − s; λ)/∂λ = (b_0 + b_1(x_t − s) + b_2(x_t − s)² + ⋯ + b_q(x_t − s)^q) g(x_t − s; λ),

then the score can be rewritten in terms of conditional moments as,
Note that we can write

and

by simply defining γ(s; α) = (∂f(s | x_{t−1}; α)/∂α)/f(s | x_{t−1}; α) and γ̃(x_t − s; λ) = (∂g(x_t − s; λ)/∂λ)/g(x_t − s; λ). Thus the score can be written in terms of conditional expectations as,
The second derivatives of the log-likelihood are,

and

∂²g(x_t − s; λ)/∂λ² = γ̃′(x_t − s; λ) g(x_t − s; λ) + (γ̃(x_t − s; λ))² g(x_t − s; λ).

Using these two equations we can rewrite the second derivatives of the log-likelihood as,
Applying these results to the AR(1) models of Joe (1996) is a bit more complicated, since Joe allows the distribution of α∘X_{t−1} to depend on λ. If we define δ(·) such that ∂f(s | x_{t−1}; α, λ)/∂λ = δ(s; λ) f(s | x_{t−1}; α, λ), then the score function with respect to λ becomes

Similar modifications can be made to the expressions for the second derivatives of the log-likelihood.
In the following three cases α∘X_{t−1} is independent of λ.

1) The Poisson AR(1) model, see Section 4.4.

2) The Gaussian AR(1) model, with α∘X_{t−1} = αX_{t−1} and ε_t ~ N(λ, σ²). This is slightly different from Joe (1996), who lets α∘X_{t−1} | X_{t−1} ~ N(αX_{t−1}, ασ²) and ε_t ~ N(λ, (1 − α)σ²). Note however that both models are equivalent.

3) The generalized Poisson AR(1) model, where X_t has a generalized Poisson distribution with parameters λ/(1 − α) and κ/(1 − α). Note Joe (1996) allows the two parameters to be different.

In these 3 cases the functions γ(s) and γ̃(s) are polynomials in s, and hence the score function and observed Fisher information can be written in terms of the conditional moments of α∘X_{t−1} and ε_t.
4.6 Asymptotics of the conditional maximum likelihood estimators for the Poisson AR(1) model

In Section 4.3 we showed that the Poisson AR(1) model is α-mixing. Since it is also stationary, the model is ergodic. If the Fisher information is finite and positive definite then the regularity conditions C1–C3 and C5–C7 hold. The next theorem shows that the Fisher information is finite.
Theorem 4.6.1 Let u_t = U_{α,t} − U_{α,t−1}, v_t = U_{λ,t} − U_{λ,t−1} and w_t = (u_t, v_t)^T, where U_α and U_λ are the score functions for the Poisson AR(1) model with respect to α and λ respectively. Further, let ẇ_t denote the matrix of partial derivatives of w_t with respect to α and λ. For any two dimensional vector l and any positive integer k,

Proof: For the Poisson AR(1) model we note the following:

for all positive integers k and l. Also note that, for any two dimensional vector l,

is a polynomial in E_t[α∘X_{t−1}] and E_t[ε_t] of degree k, which we write as

for some constants a_1, a_2, …, a_k. The expected value of (l^T w_t)^k is

The proof of the second part is automatic since E[(l^T w_t)²] = E[l^T w_t w_t^T l] = E[l^T ẇ_t l].
The next proposition shows that the Fisher information is positive definite.
Proposition 4.6.1 E[(l^T w_t)²] = 0 if and only if l₁ = l₂ = 0, where l = (l₁, l₂)^T.

Proof: It is sufficient to show that Var[l₁ E_t[α∘X_{t−1}] + l₂ E_t[ε_t]] ≠ 0, or that Var[l₁ αX_{t−1} p(X_t − 1 | X_{t−1} − 1) + l₂ λ p(X_t − 1 | X_{t−1})] ≠ 0 when l is non-zero. We prove the result by contradiction.

Suppose there exists a non-zero l such that Var[l₁ αX_{t−1} p(X_t − 1 | X_{t−1} − 1) + l₂ λ p(X_t − 1 | X_{t−1})] = 0. This implies that

almost everywhere for some constant c. In particular (4.6.4) holds when X_t = 1 and X_{t−1} = 0, which gives us

or l₂ = cλ^{−1}e^λ. Taking X_t = 1 and X_{t−1} = 1 in (4.6.4) we can solve for l₁ as follows

or l₁ = ce^λ. Finally, taking X_t = 1 and X_{t−1} = 2 in (4.6.4) we have

This implies either α = 1, α = 2 or c = 0. Since α = 1 and α = 2 are outside the parameter space we conclude that c = 0; however, this implies that l₁ = l₂ = 0.
It can be shown that the regularity condition C4 follows from the uniform strong law of large numbers; see Ferguson (1996, Ch. 16). Therefore the regularity conditions for Corollary 4.1.1 are satisfied, and as a result the maximum likelihood estimators α̂_n and λ̂_n have the following asymptotic distribution:

where I is the Fisher information matrix.

The parameter estimates can be found using a Newton-Raphson type iterative procedure as follows: let U(θ) be the score function and let U̇(θ) denote the matrix of partial derivatives of U with respect to θ, where θ = (α, λ)^T. The iterative procedure is defined by

θ^(i+1) = θ^(i) − U̇(θ^(i))^{−1} U(θ^(i)).

In some cases this procedure can be modified by replacing U̇(θ^(i)) with E[U̇(θ^(i))] and is sometimes referred to as Fisher scoring.
We use Fisher scoring to estimate the parameters in the Poisson AR(1) model, since E[U̇(θ^(i))] is easy to calculate. The CLS estimates generally work well as starting values. Occasionally (less than 1% of the time) we found that the CLS estimates caused the algorithm to diverge.
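A numerical sketch of the Newton-Raphson iteration follows, with finite differences standing in for the analytic score and information (the thesis uses the analytic expressions and Fisher scoring; this simplified variant, and the toy series below, are only illustrative):

```python
from math import comb, exp, factorial, log

def trans_prob(x_t, x_prev, alpha, lam):
    # p(X_t | X_{t-1}) from (4.2.1)
    return sum(comb(x_prev, s) * alpha ** s * (1 - alpha) ** (x_prev - s)
               * exp(-lam) * lam ** (x_t - s) / factorial(x_t - s)
               for s in range(min(x_t, x_prev) + 1))

def cond_loglik(xs, alpha, lam):
    return sum(log(trans_prob(xs[t], xs[t - 1], alpha, lam))
               for t in range(1, len(xs)))

def cml_newton(xs, alpha0=0.3, steps=25, h=1e-4):
    """Damped Newton-Raphson on the conditional log-likelihood with
    finite-difference score and information; returns the best point seen."""
    a, l = alpha0, sum(xs) / len(xs)
    f = lambda a_, l_: cond_loglik(xs, a_, l_)
    best = (f(a, l), a, l)
    for _ in range(steps):
        ga = (f(a + h, l) - f(a - h, l)) / (2 * h)
        gl = (f(a, l + h) - f(a, l - h)) / (2 * h)
        haa = (f(a + h, l) - 2 * f(a, l) + f(a - h, l)) / h ** 2
        hll = (f(a, l + h) - 2 * f(a, l) + f(a, l - h)) / h ** 2
        hal = (f(a + h, l + h) - f(a + h, l - h)
               - f(a - h, l + h) + f(a - h, l - h)) / (4 * h ** 2)
        det = haa * hll - hal * hal
        if abs(det) < 1e-12:
            break
        da = max(-0.1, min(0.1, (hll * ga - hal * gl) / det))  # damped step
        dl = max(-0.5, min(0.5, (haa * gl - hal * ga) / det))
        a = min(max(a - da, 1e-3), 0.999)
        l = max(l - dl, 1e-3)
        if f(a, l) > best[0]:
            best = (f(a, l), a, l)
    return best[1], best[2]

xs = [2, 1, 1, 3, 2, 0, 1, 2, 4, 2, 1, 1, 0, 2, 3, 1, 2, 2, 1, 0]
a_hat, l_hat = cml_newton(xs)
```

Fisher scoring replaces the finite-difference Hessian with the expected information, which for low counts can be obtained by the truncated summation of Section 4.4.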
Example 4.6.1 The maximum likelihood estimates for our illustrative example are α̂ = 0.40 and λ̂ = 5.2. The inverse of the expected Fisher information matrix at these parameter estimates is given by
4.7 Comparison of methods

Al-Osh and Alzaid (1987) used a simulation to compare the following three estimation methods for the Poisson AR(1) model: Yule-Walker, CLS and CML. They note that the algebraic formulas for the Yule-Walker and CLS estimates are almost identical, which is also seen in their numerical comparisons. Their results also show that there is only a small efficiency gain in CML when α is small, say less than 0.3. However, for larger values of α, say between 0.5 and 1, the efficiency gain in CML seems well worth the effort.
Our analysis differs in that we look at the asymptotic efficiency (AE). That is, we consider what happens as the sample size goes to infinity, and our results are not based on a simulation. We let θ̂_n be an estimate of θ and denote its Godambe information by j and its inverse by j^{−1}. Let j_k^{−1} be the (k, k) element of j^{−1} and let i_k^{−1} be the (k, k) element of the inverse of the Fisher information matrix. Cox and Hinkley (1974) define the AE of the
Figure 4.7.1 The asymptotic efficiency of conditional least squares as a function of α when λ = 1.

Figure 4.7.2 The asymptotic efficiency of conditional least squares as a function of λ when α = 0.3.
Figure 4.7.1 shows that as α goes to zero the AE goes to 1; that is, for small values of α there is little difference between CLS and CML. However, as α goes to one the AE goes to 0; that is, there is a substantial advantage in using CML when α is large. Figure 4.7.2 shows that as λ increases the AE in estimating α rises while the AE in estimating λ drops. In both cases the AE appears to be approaching a limit of about 0.83.
For GLS estimation we can use Theorem 4.3.2 to find bounds on the Godambe information. Unfortunately, we found no increase in information in moving from CLS estimation to GLS estimation. Tables 4.7.1 and 4.7.2 show that the CLS information is contained within our bounds for the GLS information. Further, the upper bound on the GLS information is not significantly closer to the CML information.
Next we tested our estimation methods on some misspecified data. We simulated 200 series of length 100 using binomial thinning with parameter α = 0.5 and misspecifying the arrival process by letting the distribution of ε_t be uniform over {0, 1, 2}. The resulting sampling distributions of α̂ and λ̂ for the estimation methods CLS, GLS and CML are summarized by box plots in Figures 4.7.3 and 4.7.4.
Figure 4.7.3 shows that the CML estimates for α are strongly biased; in fact almost all of the CML estimates for α are greater than 0.5. In contrast, the CLS and GLS estimates for α are only slightly biased. Note that the interpretation of α has not changed; that is, α is the probability that a claimant in the current period continues to be a claimant in the next period.
In Figure 4.7.4 we see that the CML estimates for λ are biased estimates of the mean of ε_t, which is 1; in fact almost all of the CML estimates for λ are smaller than 1. In contrast, the sample mean of the CLS and GLS estimates for λ is very close to 1. We can think of λ as the mean parameter of ε_t; however, it is not used to specify the distribution of ε_t. Estimates of λ are therefore estimates of the mean of ε_t.
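The misspecification experiment can be reproduced in outline; here only the closed-form CLS estimates are computed (the GLS and CML fits are omitted), and the seed and helper names are arbitrary:

```python
import random

def simulate_misspecified(alpha, n, rng):
    """Binomial thinning with arrivals uniform on {0, 1, 2} (mean 1)."""
    x = rng.randrange(3)
    xs = [x]
    for _ in range(n - 1):
        x = sum(rng.random() < alpha for _ in range(x)) + rng.randrange(3)
        xs.append(x)
    return xs

def cls_estimates(xs):
    """Closed-form CLS: regress X_t on X_{t-1}."""
    y, z = xs[1:], xs[:-1]
    my, mz = sum(y) / len(y), sum(z) / len(z)
    a = (sum((yi - my) * (zi - mz) for yi, zi in zip(y, z))
         / sum((zi - mz) ** 2 for zi in z))
    return a, my - a * mz

rng = random.Random(1)
ests = [cls_estimates(simulate_misspecified(0.5, 100, rng)) for _ in range(200)]
mean_a = sum(a for a, _ in ests) / len(ests)
mean_l = sum(l for _, l in ests) / len(ests)
```

Consistent with the discussion above, the CLS averages stay close to α = 0.5 and to the arrival mean 1 even though the arrival distribution is not Poisson.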
Box Plots (estimates for alpha)

Figure 4.7.3 Box plots comparing the sampling distributions of α̂ when the arrival process {ε_t} is uniform over {0, 1, 2}.
Box Plots (estimates for lambda)

Figure 4.7.4 Box plots comparing the sampling distributions of λ̂ when the arrival process {ε_t} is uniform over {0, 1, 2}.
Chapter 5
Testing for independence
In this chapter we consider testing whether the thinning parameter α in the Poisson AR(1) model is zero, that is, testing for independent observations. We have a strong belief that an AR(1) model is appropriate for WCB data, and in such cases it may not be necessary to test for independence. However, the test for independence is interesting from a mathematical point of view and would be necessary before fitting the model to other data sets, which may not be interpretable as a queue.
The first section examines what would happen if one blindly went ahead and used the standard Gaussian AR(1) asymptotic results. The next section bases inference on the conditional least squares estimator for α. As noted in Section 4.3 this estimator is the same as the Gaussian estimator for α; however, this time the asymptotic properties of the estimator are worked out under the Poisson model. We then derive a score based test and show that it is asymptotically equivalent to the CLS based test. Finally, we examine the score function in detail about the point α = 0, leading to non-standard Wald and likelihood ratio tests.
5.1 Gaussian AR(1)
Suppose we decided to ignore the fact that our time series is integer valued, and used the usual Gaussian AR(1) model. In Example 4.1.5 we showed that √n(α̂_n − α) is asymptotically normal with mean zero and variance 1 − α². Under the null hypothesis, H₀: α = 0, √n α̂_n is asymptotically standard normal. We reject the null hypothesis when √n α̂_n is larger than the critical value c_δ, where c_δ is selected so that the probability of committing a type I error is δ (the significance level of the test). That is, under H₀: α = 0, P(√n α̂_n > c_δ) = P(Z > c_δ) = δ, where Z denotes a standard normal random variable. Note this is a one sided test and that a negative estimate for α would lead us to accept the null hypothesis.

The power of the test is the probability of not committing a type II error, that is, the probability of rejecting H₀: α = 0 when H₁: 0 < α < 1 is true. The power of the test under the Gaussian assumption is,
In the next section we continue by using the correct asymptotic distribution for √n α̂_n.
5.2 Conditional least squares
A more sophisticated test for the hypothesis is based on the conditional least squares
method. In section 4.2 we showed that the CLS estirnate for a was the same as the
Gaussian baçed estimate, but with a larger asymptotic variance, 1 - a + a(l - a)2 /h .
Chapter 5. Testing for independence 112
However under the nul1 hypothesis both variance expressions reduce to 1 and hence we
would use the same cntical value for both tests- Another way to put this is that the
significance level of the test in section 5.1 is correct. However the power function in
section 5. L is incorrect under the Poisson AR(1) assumption since the variance used is too
small. The difference between the Gaussian and Poisson based variance is a(l - a ) ' / ~ , ,
which is small when h is large or whena is near zero or one.
To help assess this effect on the power function we consider two graphs of the power function: first the power as a function of α, and second as a function of λ.

Power of Gaussian vs Poisson Test (function of alpha)

Figure 5.2.1 A comparison of the power for the Gaussian and Poisson based tests as a function of α, with λ = 1 and n = 100.
Figure 5.2.1 shows that for α between 0.01 and 0.16 the Gaussian based power understates the "true" power, and for α larger than 0.16 the Gaussian based power overstates the "true" power. However, overall the error in the Gaussian based power seems small. For example, if the true model is the Poisson AR(1) with α = 0.30, λ = 1 and we performed the test on a sample of size 100, then the probability of rejecting H₀: α = 0 is 90.0% and not 93.2% as given by the Gaussian based power.

Figure 5.2.2 shows the following: the Gaussian based power is independent of λ; for extremely small values of λ the "true" power based on the Poisson AR(1) model is large, but as λ increases the "true" power quickly converges to the Gaussian based power. For example, if the true model is the Poisson AR(1) with α = 0.01, λ = 0.1 and we performed the test on a sample of size 100, then the probability of rejecting H₀: α = 0 is 8.0% and not 6.1% as given by the Gaussian based power.
Power of Gaussian vs Poisson Test (function of lambda)

Figure 5.2.2 A comparison of the power for the Gaussian and Poisson based tests as a function of λ, with α = 0.01 and n = 100.
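Using the two variance expressions above (1 − α² in the Gaussian case, plus α(1 − α)²/λ under the Poisson model), the power curves can be sketched as follows; exact values may differ slightly from the thesis figures:

```python
from math import erf, sqrt

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(alpha, lam, n, crit=1.645, poisson=True):
    """P(sqrt(n) * alpha_hat > crit) under the stated asymptotic variance."""
    v = 1.0 - alpha ** 2
    if poisson:
        v += alpha * (1.0 - alpha) ** 2 / lam  # extra Poisson AR(1) variance
    return 1.0 - norm_cdf((crit - sqrt(n) * alpha) / sqrt(v))
```

At α = 0 both versions give the 5% significance level; at α = 0.3, λ = 1, n = 100 the Poisson based power is a few points below the Gaussian one, matching the pattern in Figure 5.2.1.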
5.3 Score test

In this section we derive a test for α = 0 by considering the distribution of the score function when α = 0.

We begin by noting the following simplification for the conditional probabilities, equation (4.1), when α = 0: p(X_t | X_{t−1}) = e^{−λ} λ^{X_t} / X_t!. That is, the series is iid Poisson. Making use of this simplification the score function (4.4.1) reduces to,
Under H₀: α = 0, U_α(0, λ) is a zero mean square integrable martingale. Its quadratic characteristic and variance are both equal to n(1 + λ). Since conditions for the strong law hold, from Theorems 4.1.8 and 4.1.9 we have that n^{−1}[U_α(0, λ)]_n →_a.s. (1 + λ). The martingale differences each have identical moments, and all of these moments exist and are finite. This ensures that the Lindeberg condition trivially holds. Thus the conditions for the central limit theorem are satisfied and the distribution of the score is asymptotically normal.

To find the distribution of the score under the alternative hypothesis H₁: 0 < α < 1 we rewrite the score,
The first term in the score is a zero mean square integrable martingale. Applying the strong law of large numbers we get,

Therefore under the alternative hypothesis n^{−1/2} U_α(0, λ) will diverge since it has no mean correction. This implies that the test is consistent; that is, asymptotically the null hypothesis is rejected with probability one when the alternative hypothesis is true. Also note again that the test is one sided, that is, large positive values of the score lead us to reject the null hypothesis.

In practice we replace the nuisance parameter λ with an estimate. Under H₀: α = 0 the maximum likelihood estimate for the Poisson mean parameter is the sample mean λ̂ = X̄. Plugging this estimate into the score function we get,
The second term is O_p(1) and the first term, ignoring the 1/X̄, is a zero mean square integrable martingale with variance λ². Again the central limit theorem holds, and together with the continuous mapping theorem we get n^{−1/2} U_α(0, λ̂) →_d N(0, 1).

Alternatively, we could have found the distribution of the score by noting that,

where α̂_n is the conditional least squares estimator for α. Since both X̄ and n^{−1} Σ_{t=1}^n (X_{t−1} − X̄)² converge in probability to λ under the null hypothesis and to λ/(1 − α) under the alternative hypothesis, the continuous mapping theorem implies that the score statistic and √n α̂_n both have the same asymptotic distribution. The difference between the score test and the conditional least squares t-test is in the estimate for λ in the denominator of the statistic.
To summarize, the test based on the Gaussian model has the correct significance level, but the wrong power. The error in the power, however, is not large. The test based on the score is asymptotically equivalent under alternatives to the conditional least squares t-test. So from the point of view of testing for independence it is interesting to note that naively using the standard Gaussian test gives the correct result except for a small error in the test's power.
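A small Monte Carlo sketch of the two statistics under H₀ (iid Poisson(1) series, as in the simulations of Section 5.4); the exact normalizations are our reading of the description above, and the seed is arbitrary:

```python
import random
from math import exp, sqrt

def poisson_draw(mean, rng):
    # Knuth's multiplication method, adequate for small means
    limit, k, p = exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def score_stat(xs):
    """Score statistic with lambda estimated by the sample mean."""
    n, m = len(xs), sum(xs) / len(xs)
    num = sum((xs[t] - m) * (xs[t - 1] - m) for t in range(1, n))
    return num / (sqrt(n) * m)

def cls_stat(xs):
    """CLS t-statistic sqrt(n)*alpha_hat; the denominator uses the
    sample variance of the lagged series instead of the sample mean."""
    n, m = len(xs), sum(xs) / len(xs)
    num = sum((xs[t] - m) * (xs[t - 1] - m) for t in range(1, n))
    den = sum((x - m) ** 2 for x in xs[:-1])
    return sqrt(n) * num / den

rng = random.Random(7)
reps, rej_score, rej_cls = 500, 0, 0
for _ in range(reps):
    xs = [poisson_draw(1.0, rng) for _ in range(200)]
    rej_score += score_stat(xs) > 1.645
    rej_cls += cls_stat(xs) > 1.645
rate_score, rate_cls = rej_score / reps, rej_cls / reps
```

Both one-sided tests should reject at a rate near the nominal 5% level, in line with Table 5.4.2.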
5.4 The score function on the boundary of the parameter space

In this section we take a detailed look at the score function about the point α = 0; although this is a boundary point of the parameter space, the score is well defined and well behaved.
Recall the interpretation of α is the probability that an individual in the system at time t − 1 remains in the system at time t. In such a case values of α less than zero would not make any sense. However, the probability function p(x; α) = α^x(1 − α)^{1−x}, x = 0, 1, is well defined for any real number α and its derivatives exist and are continuous.

Similarly, even though the Poisson AR(1) likelihood is not defined for negative values of α, we can still examine the function's mathematical properties at the point α = 0. To get an idea of what the likelihood looks like at α = 0 we simulated ten data sets and plotted the likelihood as a function of α. Figure 5.4.1 shows five plots of the likelihood for simulated independent and identically distributed Poisson samples with mean 1, which correspond to the case where α = 0. For comparison we have also included five plots for the case where α is near zero in Figure 5.4.2.
Figure 5.4.1 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0 and λ = 1.
Figure 5.4.2 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0.1 and λ = 1.
In both figures the likelihood plots are smooth with a unique maximum. These plots do not answer the question of whether or not the score function is well behaved about α = 0, but suggest that the derivative of the likelihood about α = 0 may exist and be continuous. Also note that the maximum for one of the simulated series in Figure 5.4.1 occurs at a point where α is negative.
We know that the score is defined at α = 0; we even gave an expression for it in Section 5.3. However, is it defined for α < 0, and if so how does it behave? Recall that the conditional probabilities p(X_t | X_{t−1}), equation (4.1), are polynomials in α. One might think that a negative value for α could cause the conditional probabilities to become negative. To consider this possibility we rewrite the conditional probabilities as follows.

The first term is positive and the summation is less than one in absolute value. Therefore
given a value δ > 0 it is possible to find a neighborhood of α = 0 such that the probability of p(X_t | X_{t−1}) being non-positive is less than δ.

Since p(X_t | X_{t−1}) is a polynomial in α, the derivatives of p(X_t | X_{t−1}) with respect to α exist and are continuous. Further, derivatives of log(p(X_t | X_{t−1})) with respect to α will be the ratio of two polynomials in α, with the polynomial in the denominator being positive for α in some neighborhood of zero. Hence the derivative of log(p(X_t | X_{t−1})) with respect to α exists and is continuous for α in some neighborhood of zero.
Before calculating the Fisher information matrix we note the following simplification when α = 0.

Substituting this into Equation 4.4.3 we find the observed Fisher information. Under the null hypothesis the expected information is

In a similar manner we find i_{αλ} = 1 and i_{λλ} = 1/λ. The inverse of the Fisher information matrix is then
Let ᾱ_n be the value of α that maximizes the likelihood when the search for α is not restricted to (0, 1). We call ᾱ_n the likelihood maximizer to distinguish it from the maximum likelihood estimate α̂_n, which must lie in the parameter space (0, 1). Under the null hypothesis √n ᾱ_n converges in distribution to a standard normal random variable and √n α̂_n converges in distribution to a random variable Z⁺, where Z⁺ is defined as

Z⁺ = max(Z, 0),

where Z has a standard normal distribution.
We can define the Wald statistic in the usual way, W_n = n(α̂_n − 0)². If the maximum likelihood estimator α̂_n is replaced by the likelihood maximizer ᾱ_n then the Wald statistic has the usual chi-square distribution with one degree of freedom. However, for the maximum likelihood estimate the Wald statistic converges to a modified chi-square random variable defined by

P(W ≤ w) = 1/2 + (1/2) P(χ²₁ ≤ w),  w ≥ 0,

where χ²₁ has a chi-square distribution with one degree of freedom.
We also define the likelihood ratio statistic in the usual way, that is 2 log(Λ), where Λ = L(α̂_n, λ̂_n; X)/L(0, X̄_n; X), α̂_n and λ̂_n are the maximum likelihood estimates, and X̄_n is the sample mean. Our analysis at the beginning of this section showed that the derivatives of the log-likelihood with respect to α exist and are continuous at α = 0. It is therefore possible to make the usual Taylor series expansion of the log-likelihood about α = 0 and show its asymptotic equivalence to the Wald statistic.
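The boundary limit W → (max(Z, 0))² can be checked by simulation; 2.706 is the 90th percentile of χ²₁, so a one-sided 5% test of this mixture rejects when W exceeds it:

```python
import random

rng = random.Random(42)
draws = 200_000
n_zero = n_reject = 0
for _ in range(draws):
    z = rng.gauss(0.0, 1.0)
    w = max(z, 0.0) ** 2          # limiting Wald statistic at the boundary
    n_zero += (w == 0.0)
    n_reject += (w > 2.706)       # 90th percentile of chi-square(1)

p_zero = n_zero / draws
p_reject = n_reject / draws
```

About half the mass sits at zero, and the 5% critical value is the 90% point of χ²₁ rather than the usual 95% point 3.841.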
Example 5.4.1 In this example we test our illustrative data set for independence. Table 5.4.1 shows the test statistic values as well as the 5% and 1% critical values. In all the cases the null hypothesis of independence is rejected.

Test    Statistic Value    5% Critical Value    1% Critical Value
CLS     4.66               1.645                2.33

Table 5.4.1 Tests for independence in the illustrative data set.
To assess how well the tests for independence work for small samples we apply them to two sets of simulated data. For both sets of data we generate 1000 series of length 200. In the first set we let α = 0 and λ = 1, and in the second set we let α = 0.1 and λ = 1. Table 5.4.2 shows the percentage of times the null hypothesis of independence was rejected by each test for the first set of data, while Table 5.4.3 shows the percentage of times the null hypothesis of independence was rejected by each test for the second set of data.
Test         Rejections at the 5% level    Rejections at the 1% level
CLS          4.2%                          0.7%
Our Score    4.3%                          0.6%

Table 5.4.2 The percentage of times the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0 and λ = 1.

Test         Rejections at the 5% level    Rejections at the 1% level
CLS          37.4%                         16.0%
Our Score    36.3%                         15.0%
Wald         35.9%                         17.0%
LR           36.0%                         15.5%

Table 5.4.3 The percentage of times the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0.1 and λ = 1.

From Table 5.4.2 we see that the probability of committing a type I error is about the same for all the tests. In Table 5.4.3 we see that the power of the CLS test may be
Chapter 5. Testing for independence 123
slightly higher than the other tests at the 5% level and that the power of the Wald test
may be slightly higher than the other tests at the 1% level. The table also shows that there
is a large probability of cornmithg a type II error when a is small.
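The size and power figures in Tables 5.4.2 and 5.4.3 can be checked with a short simulation. The sketch below is a minimal, assumption-laden version of the experiment: it generates Poisson AR(1) series by binomial thinning and applies a one-sided test based on √n times the lag-1 sample autocorrelation, which is asymptotically standard normal under independence. This statistic is a stand-in for the CLS test statistic, and the function names are illustrative, not from the thesis.

```python
import numpy as np

def simulate_poisson_ar1(n, alpha, lam, rng):
    """X_t = alpha o X_{t-1} + eps_t with binomial thinning and
    Poisson(lam) arrivals; X_0 is drawn from the stationary
    Poisson(lam / (1 - alpha)) marginal distribution."""
    x = np.empty(n, dtype=np.int64)
    x[0] = rng.poisson(lam / (1.0 - alpha))
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # alpha o X_{t-1}
        x[t] = survivors + rng.poisson(lam)        # new arrivals eps_t
    return x

def rejection_rate(alpha, lam, n=200, reps=1000, crit=1.645, seed=0):
    """Fraction of simulated series for which sqrt(n) times the lag-1
    sample autocorrelation exceeds the one-sided 5% critical value."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = simulate_poisson_ar1(n, alpha, lam, rng).astype(float)
        xc = x - x.mean()
        rho1 = np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc)
        if np.sqrt(n) * rho1 > crit:
            rejections += 1
    return rejections / reps

print(rejection_rate(0.0, 1.0))  # empirical size at the 5% level
print(rejection_rate(0.1, 1.0))  # empirical power at the 5% level
```

The second rate is far below 100%, consistent with the large type II error probability noted above when α is small.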
Chapter 6
6. General misspecification test
In this chapter we apply a general specification test, sometimes called the information matrix test, to our model. The test is equivalent to a score test of whether the parameters in the model are stochastic or not. Section 6.1 motivates the information matrix test. Then in Section 6.2 we derive the test for the simplest possible case and state the more general result found in McCabe and Leybourne (1996). Finally in Section 6.3 we give the details for the Poisson AR(1) model and evaluate the test on some simulated data.
6.1 Motivation
An advantage of maximum likelihood estimation is that the asymptotic variance of its estimators often attains the Cramer-Rao lower bound. However, a drawback to maximum likelihood estimation is that the estimates are not robust to model misspecification. We saw this in Section 4.7, where we found that for the misspecified model our parameter estimates are severely biased. It is therefore important to check a model's specification if maximum likelihood estimation is to be used.
Let L and l denote the likelihood and log-likelihood respectively, and let θ be a vector of parameters. Also let the first and second partial derivatives of L and l with respect to θ be denoted by L̇_θ, L̈_θ, l̇_θ and l̈_θ respectively. The expected Fisher information can be expressed in two ways: the Hessian form −E[l̈_θ] and the outer product form E[l̇_θ l̇_θᵀ]. When the model is correctly specified l̈_θ + l̇_θ l̇_θᵀ has a distribution with mean zero. An equivalent expression for l̈_θ + l̇_θ l̇_θᵀ is L̈_θ/L, which is a zero mean martingale. The equivalence of these two expressions is well known and can be shown by considering the second partial derivative of the log-likelihood as follows:

l̈_θ = ∂(L̇_θ/L)/∂θᵀ = L̈_θ/L − (L̇_θ/L)(L̇_θ/L)ᵀ = L̈_θ/L − l̇_θ l̇_θᵀ.

Barndorff-Nielsen and Sorensen (1994) state that L̈_θ/L is a martingale. This is easy to prove as follows: Let f_t(y_t) denote the conditional density of Y_t given the past observations, and write the likelihood as L_n = ∏_{t=1}^n f_t(y_t). Let F_m be the sigma field generated by y_1, y_2, …, y_m. For m < n we have

E[L̈_n/L_n | F_m] = L̈_m/L_m.
Therefore L̈_θ/L is a martingale. Under certain regularity conditions, see Section 4.1, the martingale central limit theorem can be used to find the asymptotic distribution of l̈_θ + l̇_θ l̇_θᵀ.
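The claim that l̈_θ + l̇_θ l̇_θᵀ has mean zero under correct specification can be illustrated numerically. The sketch below uses a hypothetical one-parameter Poisson(λ) likelihood, not a model from the text: the empirical mean of l̈ + l̇² is near zero for Poisson data and visibly positive for overdispersed data with the same mean.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 2.0, 200_000

def hessian_plus_outer(y, lam):
    """Elementwise l'' + (l')^2 for the Poisson(lam) log-likelihood;
    its mean is zero when the data really are Poisson(lam)."""
    score = y / lam - 1.0      # l' = d log f / d lam
    hess = -y / lam ** 2       # l'' = d^2 log f / d lam^2
    return hess + score ** 2

# Correctly specified data: y ~ Poisson(lam).
y_ok = rng.poisson(lam, size=n)
print(hessian_plus_outer(y_ok, lam).mean())   # near 0

# Overdispersed data with the same mean (negative binomial with
# variance 2 * lam): the mean shifts to roughly (var - lam) / lam^2.
y_bad = rng.negative_binomial(2, 2.0 / (2.0 + lam), size=n)
print(hessian_plus_outer(y_bad, lam).mean())
```

The second printed value is close to (var − λ)/λ² = 0.5 here, which is exactly the kind of departure from zero the information matrix test is built to detect.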
A good description of the information matrix test is found in White (1982). Chesher (1983) shows that for sequences of independent observations the information matrix test can be interpreted as a score based test of whether the model parameters are stochastic. McCabe and Leybourne (1996) extend Chesher's result to a much more general setting, namely to sequences of random vectors, which are possibly dependent and non-stationary.
6.2 Outline of the test
In this section we derive the score based interpretation of the information matrix test in the simplest case and then state the more general result found in McCabe and Leybourne (1996).
Let y_1, y_2, …, y_n be a sequence of observations. We will assume that the distribution of y_t depends on an unobserved random parameter θ_t, with joint density f(y_t | θ_t)g(θ_t). We will also assume the sequence of parameters {θ_t}_{t=1}^n is uncorrelated and that E[θ_t] = μ_0 and var[θ_t] = π ≥ 0. If π = 0 then the parameters are not stochastic and the observations are independent and identically distributed. Let L(y|θ) be the conditional likelihood, where y = (y_1, …, y_n)ᵀ and θ = (θ_1, …, θ_n)ᵀ. Consider the following Taylor series expansion of L(y|θ) about the vector μ = (μ_0, …, μ_0)ᵀ:

L(y|θ) ≈ L(y|μ) + L̇_θ(y|μ)ᵀ(θ − μ) + ½ (θ − μ)ᵀ L̈_θ(y|μ)(θ − μ).
To proceed we need the following well-known property of the trace operator: E(XᵀAX) = tr(AΣ), where X is an n × 1 random vector, A is an n × n matrix of constants and E[XXᵀ] = Σ. In the following we will use E_θ to denote expectation with respect to θ. The marginal likelihood is then

L(y) = E_θ[L(y|θ)] = L(y|μ) + ½ tr(L̈_θ(y|μ)Ω) + higher order terms,

where Ω = var(θ) = diag(π). Therefore if π is close to zero the likelihood can be approximated by L(y) ≈ L(y|μ) + ½ tr(L̈_θ(y|μ)Ω). Using the relationship L̈_θ = L(l̈_θ + l̇_θ l̇_θᵀ) from Section 6.1, we can rewrite the likelihood approximation as

L(y) ≈ L(y|μ){1 + ½ tr((l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ} Ω)}.
We are interested in testing the null hypothesis H_0: π = 0 (θ is not stochastic) against the alternative hypothesis H_a: π > 0 (θ is stochastic). The score with respect to π, evaluated at π = 0, simplifies to U_n(y, 0) = ½ tr((l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ}). It is not completely obvious that this is a martingale, since the number of parameters and the dimensions of l̇_θ and l̈_θ are all increasing with n.

Note that the elements of l̇_θ are ∂ log f(y_t | θ_t)/∂θ_t and the diagonal elements of l̈_θ are ∂² log f(y_t | θ_t)/∂θ_t². The score therefore becomes

U_n(y, 0) = ½ Σ_{t=1}^n {∂² log f(y_t | θ_t)/∂θ_t² + (∂ log f(y_t | θ_t)/∂θ_t)²}|_{θ=μ}.

This is, of course, the same as replacing the n parameters in θ with the single (1-dimensional) parameter μ_0, so that the score can be written as U_n(y, 0) = ½ Σ_{t=1}^n {l̈_t(μ_0) + l̇_t(μ_0)²}, where l̇_t(μ_0) = ∂ log f(y_t | μ_0)/∂μ_0; in this form it is clearly a martingale. Since y_1, y_2, …, y_n is a sequence of independent and identically distributed random variables and the derivatives of f with respect to μ_0 are assumed continuous, {l̈_t(μ_0) + l̇_t(μ_0)²}_{t=1}^n is also a sequence of independent and identically distributed random variables. Assuming the variance of these summands exists and is finite, say ς², the central limit theorem implies that

n^{−1/2} U_n(y, 0) →_d N(0, ς²/4).
For the more general case we let {y_t}_{t=1}^N be a sequence of p × 1 vectors of observations. It is assumed that the marginal distribution of y_t depends on the parameter θ_t, which is a k × 1 random vector. Let L(y|θ) and L(y) be respectively the conditional and marginal likelihoods, where y = (y_1ᵀ, …, y_Nᵀ)ᵀ and θ = (θ_1ᵀ, …, θ_Nᵀ)ᵀ. Also assume that E[θ_t] = μ and var[θ_t] = Ω(Π), where Π = (π_1, …, π_m)ᵀ, and that Ω(0) = 0_{k×k}. McCabe and Leybourne show that the marginal likelihood can be approximated as

L(y) = L(y|μ){1 + ½ tr((l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ} Ω)}.
We are interested in testing the hypothesis that none of the parameters are stochastic, H_0: Π = 0, against the alternative that at least one of the parameters is stochastic, H_a: π_i > 0 for at least one i, i = 1, 2, …, m. McCabe and Leybourne show that the test statistic to be used in this case is U_N = tr{(l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ} Ω̇(Π)|_{Π=0}}, or equivalently U_N = vec[(l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ}]ᵀ vec[Ω̇]e, where e is the m × 1 unit vector and vec is the operation of column stacking. Under H_0: Π = 0, U_N is a zero mean martingale, and under appropriate regularity conditions, again see Section 4.1, [U]_N^{−1/2} U_N →_d N(0, 1) as N → ∞.
6.3 Details for the Poisson AR(1) model
In this section we present the details of the specification test for the Poisson AR(1) model and apply the test to simulated data, some for which the Poisson AR(1) is the correct specification and some for which it is not.

We will denote the conditional probability of X_t given X_{t−1} and the parameter values α_t and λ_t by

p_t(X_t | X_{t−1}) = Σ_{x=0}^{min(X_t, X_{t−1})} C(X_{t−1}, x) α_t^x (1 − α_t)^{X_{t−1}−x} e^{−λ_t} λ_t^{X_t−x} / (X_t − x)!,

where C(X_{t−1}, x) is the binomial coefficient. We assume that the sequences {α_t}_{t=1}^n and {λ_t}_{t=1}^n are independent and identically distributed with the following means, variances and covariance: E[α_t] = α, E[λ_t] = λ, var[α_t] = π_1 ≥ 0, var[λ_t] = π_2 ≥ 0 and cov[α_t, λ_t] = −π_3 ≤ 0.
We can justify the use of a negative covariance as follows: To simplify the argument we assume α_t = α and λ_t = λ for all t. The marginal mean of X_t is μ = λ/(1 − α). If the mean μ is fixed then increasing α corresponds to decreasing λ, and we therefore assume α and λ to be negatively correlated.
Our vector of random parameters is θ = (α_1, α_2, …, α_n, λ_1, λ_2, …, λ_n)ᵀ and the first derivative of the log-likelihood with respect to θ is

l̇_θ = (l̇_{α_1}, l̇_{α_2}, …, l̇_{α_n}, l̇_{λ_1}, l̇_{λ_2}, …, l̇_{λ_n})ᵀ,

with l̈_θ the corresponding matrix of second derivatives. The variance matrix of θ can be written as

Ω = var(θ) = ( π_1 I_n    −π_3 I_n )
             ( −π_3 I_n    π_2 I_n ),

where I_n denotes the n × n identity matrix. The derivatives of Ω with respect to π_1, π_2 and π_3 are

∂Ω/∂π_1 = ( I_n  0 ; 0  0 ),   ∂Ω/∂π_2 = ( 0  0 ; 0  I_n ),   ∂Ω/∂π_3 = ( 0  −I_n ; −I_n  0 ).
Finally the score statistic, to test the null hypothesis H_0: π_1 = 0, π_2 = 0 and π_3 = 0 against the alternative hypothesis H_a: π_i > 0 for at least one i, is given by

U_n = Σ_{t=1}^n {l̈_{α_t}(α, λ) + l̇²_{α_t}(α, λ) + l̈_{λ_t}(α, λ) + l̇²_{λ_t}(α, λ) − 2l̇_{α_t}(α, λ)l̇_{λ_t}(α, λ) − 2l̈_{αλ,t}(α, λ)}.

Note that all the derivatives in this expression can be found in Section 4.4. If this joint test rejects the null hypothesis then it can be separated into individual tests to try and identify which parts of the model are inadequate.
In the Poisson AR(1) model there are two random components: the thinning operator and the arrival process. Recall in Section 4.4 we showed the Pearson residuals could be decomposed into continuation and arrival residuals. In a similar manner we can decompose the score statistic into components. However in this case the decomposition is into three parts: a continuation component (for the binomial thinning operator), an arrival component and an interaction component for the interaction between the binomial thinning operator and the arrival process.
In the test statistic U_n the component Σ_{t=1}^n {l̈_{α_t}(α, λ) + l̇²_{α_t}(α, λ)} tests the adequacy of the binomial thinning operator to explain the variation in those who continue to collect from one period to the next. Binomial thinning assumes that the recovery of individuals is independent, which is probably a reasonable assumption. It is unlikely that one individual's recovery time would affect another's, unless we had limited medical services and recovery of one meant that the next individual could start treatment. The binomial thinning operator also assumes that all individuals recover at the same rate. This is a less realistic assumption, since there is wide variation in individual health due to genetic factors and lifestyle choices, such as diet and exercise. Recovery rates should vary from person to person, since we would expect recovery rates to depend on the person's health immediately prior to injury or illness.
We now examine the thinning operator in more detail. The Bernoulli assumption seems reasonable. That is, for individual i who is collecting disability at time t−1 there are two outcomes: recovers (stops collecting STD) or does not recover (continues to collect STD). This, of course, ignores other possibilities, such as death, which we could either assume has a negligible probability or can simply add to the recovery probability and think of as an exit probability.

However, each individual collecting disability at time t−1 will have a different level of health and therefore will have a different probability of recovery before the next period. The number of individuals who continue to collect at time t can be written as Σ_{i=1}^{X_{t−1}} B_i, where the B_i are independent Bernoulli random variables and the probability that individual i recovers is P(B_i = 0) = 1 − α_i.
This thinning operator has more variation than the binomial thinning operator. The question is how much more variation, or is the binomial thinning operator sufficient to account for the majority of the variation? If we found that the binomial thinning operator was not sufficient to explain the variation, that is, the result of the specification test was to reject the hypothesis that the binomial thinning parameter was non-stochastic, then we would have to consider over-dispersed models such as in McKenzie (1986), Al-Osh and Aly (1992) and Joe (1996). The problem with the thinning operators used in McKenzie (1986) and Al-Osh and Aly (1992) is that they are a random sum of geometric random variables and it is possible to have α ∘ X_{t−1} > X_{t−1}, which wouldn't make sense if we want to use it to model a queue.
The component Σ_{t=1}^n {l̈_{λ_t}(α, λ) + l̇²_{λ_t}(α, λ)} of the test statistic U_n tests the adequacy of the Poisson distribution in describing the arrival process. Two reasons for using the Poisson distribution are: first, it has a simple probability mass function with well-known properties and second, when combined with the binomial thinning operator the marginal distribution of our process remains Poisson.
In practice we often use the Poisson distribution to model count data. The main criticism of this is that most real data are over-dispersed, that is, the Poisson distribution is not sufficient to describe the variation found in the data. One method for dealing with over-dispersed count data is to let the Poisson mean λ be random. Usually it is assumed that λ has a gamma distribution, which transforms the distribution into a negative binomial. For AR(1) models with negative binomial marginals see McKenzie (1986), Al-Osh and Aly (1992) and Joe (1996).
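The Poisson-gamma mixture argument can be illustrated with a short simulation (the parameter values are arbitrary): drawing the Poisson mean from a gamma distribution yields negative binomial counts whose variance exceeds their mean by mean²/shape.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mean_lam, shape = 500_000, 1.0, 2.0

# Random Poisson mean: lam_t ~ Gamma(shape, scale) with E[lam_t] = 1.
lam_t = rng.gamma(shape, mean_lam / shape, size=n)
y = rng.poisson(lam_t)                    # Poisson-gamma mixture

print(y.mean(), y.var())                  # variance exceeds the mean
print(mean_lam + mean_lam ** 2 / shape)   # negative binomial variance
```

The sample variance is close to mean + mean²/shape = 1.5 here, while a pure Poisson with the same mean would have variance 1.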
Finally, the component −2Σ_{t=1}^n {l̇_{α_t}(α, λ)l̇_{λ_t}(α, λ) + l̈_{αλ,t}(α, λ)} tests if the arrival and departure processes are independent. If the workers are a small cohort then the arrival and departure processes are dependent, since, as workers recover, the cohort size increases, which raises the cohort's exposure to injury and increases the number of new injuries (arrivals). In most industries this is unlikely to be a problem since the number of injured workers is usually a very small fraction of the industry's work force.
Next, suppose we believe that the binomial thinning operator with a fixed non-stochastic parameter is sufficient to explain the variation in the number of individuals remaining in the system from one period to the next. However, suppose we have picked the arrival process to be Poisson out of convenience and wish to check the adequacy of this specification. In this case, the score statistic simplifies to U_n = Σ_{t=1}^n {l̈_{λ_t}(α, λ) + l̇²_{λ_t}(α, λ)}. Using the expressions in Section 4.5 we can rewrite this as

U_n = Σ_{t=1}^n {λ² + E_t[ε_t²] − (1 + 2λ)E_t[ε_t]},

which is, of course, a zero mean martingale. For low count series it is easy to calculate the variance numerically. Since the martingale differences, U_t − U_{t−1}, are bounded by λ² + X_t², which has finite moments, the weak law of large numbers holds for the martingale differences and their squares. That is, n^{−1} Σ_{t=1}^n (U_t − U_{t−1}) →_p E[U_t − U_{t−1}] = 0 and n^{−1}[U]_n →_p E[(U_t − U_{t−1})²] = var[U_t − U_{t−1}]. The bound on the martingale differences also means that the Lindeberg condition is satisfied; hence var[U_n]^{−1/2} U_n →_d N(0, 1).
To evaluate how well this test performs, we apply it to some simulated data which are correctly specified, and to some simulated data which are misspecified. In the first case, we simulated 100 series of length 200 from the Poisson AR(1) model with α = 0.5 and λ = 1. At the 1% level of significance the information matrix test rejects 3 series and at the 5% level it rejects 8 series. The number of rejections is only slightly higher than expected, so it appears that a sample size of 200 is sufficient for the distribution of the test statistic to be approximately standard normal.

Next, to assess the power of the test we simulate some misspecified data by letting ε_t follow the uniform distribution over the set of integers {0, 1, 2}, that is, P(ε_t = i) = 1/3, i = 0, 1, 2. Note that the mean of ε_t remains the same. This time, out of 100 series, the information matrix test rejects 92 series at the 1% level and all 100 series at the 5% level. This indicates that the test has strong power against this type of misspecification.
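The misspecification experiment above can be sketched as follows: both innovation laws have mean one, but the uniform innovations are underdispersed, which shows up in the dispersion index (variance over mean) of the simulated series. The sampler interface below is an illustrative choice, not the thesis's code.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n, alpha, draw_eps, x0):
    """INAR(1) recursion with a user-supplied innovation sampler."""
    x = np.empty(n, dtype=np.int64)
    x[0] = x0
    for t in range(1, n):
        x[t] = rng.binomial(x[t - 1], alpha) + draw_eps()
    return x

n, alpha, burn = 100_000, 0.5, 1000
poisson_eps = lambda: rng.poisson(1.0)     # correctly specified
uniform_eps = lambda: rng.integers(0, 3)   # uniform on {0, 1, 2}

x_p = simulate(n, alpha, poisson_eps, 2)[burn:]
x_u = simulate(n, alpha, uniform_eps, 2)[burn:]

# Both innovation laws have mean 1, so both series have mean near 2,
# but uniform innovations (variance 2/3 instead of 1) give a series
# whose variance falls short of its mean.
print(x_p.var() / x_p.mean())   # dispersion index near 1
print(x_u.var() / x_u.mean())   # dispersion index below 1
```

The Poisson case has a Poisson marginal, so its dispersion index is one; the uniform case is visibly underdispersed, which is the departure the information matrix test detects.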
Example 6.3.1 In this example we test the specification of the Poisson arrivals in our illustrative data set. The information matrix test statistic is 1.45 (p-value 14.7%). We therefore accept the null hypothesis that the Poisson parameter is non-stochastic.
Chapter 7
7. Models with covariates

7.1 Model definition and introduction

Let X_0, X_1, …, X_n be a series of dependent Poisson counts generated according to the following model

X_t = α_t ∘ X_{t−1} + ε_t,   t = 1, 2, …, n,

where X_0 has a Poisson distribution with mean λ_0 and {ε_t}_{t=1}^n is a series of independently distributed Poisson random variables with mean λ_t. The thinning operator "∘" is defined as in Section 2.1. Given X_{t−1}, α_t ∘ X_{t−1} and ε_t are assumed to be independent; this can be checked with the model specification test, see Sections 6.3 and 7.5. As in Section 6.3 we will denote the conditional probability of X_t given X_{t−1} as p_t(X_t | X_{t−1}).
An easy way to incorporate covariates into the model is to use a link function, which is the common method in generalized linear models. The idea behind the link function is to map an unrestricted (real) parameter space for the regression coefficients into the restricted parameter space required by the model. For example, since the parameter space for α_t is (0, 1), the following logistic link is appropriate

α_t = exp(W_tᵀγ)/(1 + exp(W_tᵀγ)),

where W_t is an m-dimensional vector of time-varying covariates and γ ∈ ℝ^m is an m-dimensional vector of parameters. Similarly, taking λ_t = exp(Z_tᵀβ), where Z_t is a p-dimensional vector of time-varying covariates and β ∈ ℝ^p, will ensure that λ_t is positive.
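A minimal sketch of the two link functions, assuming the logistic link for α_t and the log link for λ_t as above; the covariate and parameter values are hypothetical.

```python
import numpy as np

def alpha_t(W, gamma):
    """Logistic link: maps w_t' gamma in R into alpha_t in (0, 1)."""
    eta = W @ gamma
    return np.exp(eta) / (1.0 + np.exp(eta))

def lambda_t(Z, beta):
    """Log link: maps z_t' beta in R into lambda_t in (0, inf)."""
    return np.exp(Z @ beta)

# Hypothetical covariates: an intercept plus one seasonal indicator.
W = np.array([[1.0, 0.0], [1.0, 1.0]])
Z = np.array([[1.0, 0.0], [1.0, 1.0]])
gamma = np.array([1.5, -0.5])
beta = np.array([0.2, 0.8])

print(alpha_t(W, gamma))   # values strictly inside (0, 1)
print(lambda_t(Z, beta))   # strictly positive arrival means
```

Any γ ∈ ℝ^m and β ∈ ℝ^p produce valid parameter values, which is the point of the link construction.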
7.2 Forecasting

The first step in the process of forecasting is to find the k-step ahead distribution, that is, the conditional distribution of X_{N+k} given X_N. This distribution is determined by the conditional moment generating function of X_{N+k} given X_N, which is given in the following theorem.

Theorem 7.2.1 For the Poisson AR(1) model defined in Section 7.1 the k step ahead conditional moment generating function is given by

M_{X_{N+k}|X_N}(s) = (e^s ∏_{j=N+1}^{N+k} α_j + 1 − ∏_{j=N+1}^{N+k} α_j)^{X_N} exp{(e^s − 1) Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j}.

Proof: The result is proved using induction. The one step ahead conditional moment generating function is

M_{X_{N+1}|X_N}(s) = (e^s α_{N+1} + 1 − α_{N+1})^{X_N} exp{(e^s − 1)λ_{N+1}}.

Now, suppose that the k−1 step ahead conditional moment generating function has the form stated in the theorem, where we take ∏_{j=i+1}^{N+k−1} α_j = 1 when i = N + k − 1. Applying this hypothesis to X_{N+k} given X_{N+1} and conditioning on X_N, the k step ahead conditional moment generating function is

M_{X_{N+k}|X_N}(s) = M_{X_{N+1}|X_N}(s′) exp{(e^s − 1) Σ_{i=N+2}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j},

where e^{s′} = e^s ∏_{j=N+2}^{N+k} α_j + (1 − ∏_{j=N+2}^{N+k} α_j). Substituting this in for e^{s′} gives

M_{X_{N+k}|X_N}(s) = (e^s ∏_{j=N+1}^{N+k} α_j + (1 − ∏_{j=N+1}^{N+k} α_j))^{X_N} exp{(e^s − 1) Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j}.  □
Remarks

1. The distribution of X_{N+k} | X_N is a convolution of a binomial distribution with parameters ∏_{j=N+1}^{N+k} α_j and X_N, and a Poisson distribution with parameter Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j. Hence, it has mean X_N ∏_{j=N+1}^{N+k} α_j + Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j and variance X_N ∏_{j=N+1}^{N+k} α_j (1 − ∏_{j=N+1}^{N+k} α_j) + Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j.

2. From the conditional moment generating function the conditional distribution of X_t given X_0 is a convolution of a binomial distribution with parameters ∏_{j=1}^t α_j and X_0 and a Poisson distribution with parameter Σ_{i=1}^t λ_i ∏_{j=i+1}^t α_j. Hence, if the unconditional distribution of X_0 is Poisson with mean λ_0, then the unconditional distribution of X_t is Poisson with mean λ_t + λ_{t−1}α_t + λ_{t−2}α_tα_{t−1} + ⋯ + λ_0α_tα_{t−1}⋯α_1.
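In the special case where α_t = α and λ_t = λ are constant, the convolution in Remark 1 reduces to Binomial(X_N, α^k) plus Poisson(λ(1 − α^k)/(1 − α)), and the k-step ahead forecast distribution can be computed directly. The sketch below assumes this constant-parameter case; the function name and truncation point are illustrative.

```python
import numpy as np
from math import comb, exp, factorial

def forecast_pmf(x_n, alpha, lam, k, max_x=30):
    """k-step-ahead conditional pmf of X_{N+k} given X_N = x_n for
    constant alpha and lam: the convolution of Binomial(x_n, alpha^k)
    with Poisson(lam * (1 - alpha**k) / (1 - alpha))."""
    a_k = alpha ** k
    pois_mean = lam * (1.0 - a_k) / (1.0 - alpha)
    binom = np.array([comb(x_n, j) * a_k ** j * (1 - a_k) ** (x_n - j)
                      for j in range(x_n + 1)])
    pois = np.array([exp(-pois_mean) * pois_mean ** i / factorial(i)
                     for i in range(max_x + 1)])
    return np.convolve(binom, pois)[: max_x + 1]

pmf = forecast_pmf(x_n=5, alpha=0.5, lam=1.0, k=3)
# conditional mean should equal x_n * alpha^k + lam (1 - alpha^k)/(1 - alpha)
print(sum(i * p for i, p in enumerate(pmf)))
# conditional median: smallest m with cumulative probability >= 0.5
print(int(np.searchsorted(np.cumsum(pmf), 0.5)))
```

For time-varying α_t and λ_t the same convolution applies with the binomial parameter ∏ α_j and the Poisson parameter Σ λ_i ∏ α_j from Remark 1.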
For comparison purposes we define a corresponding Gaussian AR(1) model with covariates as follows:

X_t = α_t X_{t−1} + λ_t + ε_t,

where ε_t is normally distributed with mean zero and variance σ² and the parameters λ_t and α_t are defined as in the Poisson model. For this Gaussian AR(1) model, X_{N+k} | X_N has a normal distribution with mean X_N ∏_{j=N+1}^{N+k} α_j + Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j and variance (1 + Σ_{i=N+1}^{N+k−1} ∏_{j=i+1}^{N+k} α_j²)σ². As in Section 3.1, the conditional means of X_{N+k} | X_N in the Poisson and Gaussian models are the same, but the conditional distributions are quite different.
Let p_k(x) be the conditional probability function of X_{N+k} given X_N and define the conditional median of X_{N+k} given X_N as the smallest non-negative integer m_k such that Σ_{x=0}^{m_k} p_k(x) ≥ 0.5. This is the same definition used in Chapter 3. If, at time N, we know the future covariate values, {W_t}_{t=N+1}^{N+k} and {Z_t}_{t=N+1}^{N+k}, as well as the parameter values γ and β, then the k step ahead conditional probability function, p_k(x), can be found from the k step ahead moment generating function. Proposition 3.2.1 says that the k step ahead conditional median is the forecast that minimizes the k step ahead absolute forecast error.

With regard to forecasting count series, the biggest advantage in using a data cohesive model is that individual probabilities (point mass forecasts) can be calculated for all possible future outcomes. This is especially useful for low count series, where only a few of the outcomes have non-zero probabilities.

If the purpose of a model is to make forecasts, it is helpful to select covariates that are either deterministic or easy to forecast, since future covariate values are needed to forecast with these types of models.
Next we consider the limiting distribution for a couple of simple cases.

Example 7.2.1 Suppose that there are two different rates of new injuries, and each month the rate switches back and forth. That is, λ_{2t+1} = λ_1 and λ_{2t} = λ_2, t = 0, 1, 2, …. Also assume that the recovery rate is constant over time, that is α_t = α fixed. The unconditional distributions of X_{2t+1} and X_{2t} are Poisson with respective means

λ_1 + λ_2α + λ_1α² + ⋯ + λ_1α^{2t} + λ_0α^{2t+1}  and  λ_2 + λ_1α + λ_2α² + ⋯ + λ_1α^{2t−1} + λ_0α^{2t}.

There is no limiting distribution in this case. However the following subsequential limits hold: X_{2t+1} →_d Po((λ_1 + λ_2α)/(1 − α²)) and X_{2t} →_d Po((λ_2 + λ_1α)/(1 − α²)). Also if X_0 ~ Po((λ_2 + λ_1α)/(1 − α²)) then the unconditional or marginal distributions are X_{2t+1} ~ Po((λ_1 + λ_2α)/(1 − α²)) and X_{2t} ~ Po((λ_2 + λ_1α)/(1 − α²)).
Example 7.2.2 Consider the following modification of Example 7.2.1. Suppose there are two distinct seasons (winter and summer) and that the two seasons have different injury rates.

Example 7.2.2 is fairly realistic in that many industries have two well defined seasons. Exposure to injury is often highest during the summer months, when there is a larger work force and more over-time. In contrast, during the winter months, when there is less work, the work force is smaller and those employees who are working are encouraged to take holidays. In British Columbia, the logging industry and the fishing industry are two examples of industries whose exposure to injury changes according to the season.
In Section 3.4 we showed how to construct confidence intervals for the probability mass forecasts. The following modifications are needed when regressors are included in the model. Suppose we have a sample of size n and denote the maximum likelihood estimates for this sample by γ̂_n and β̂_n. We will assume that (γ̂_nᵀ, β̂_nᵀ)ᵀ is asymptotically normal with mean (γ_0ᵀ, β_0ᵀ)ᵀ and variance n^{−1}i^{−1}, where i is the Fisher information matrix and γ_0 and β_0 are the "true" parameter values.

The k step ahead probability function can be written as p_k(x | X_n; α_{*,k}, λ*), where α_{*,k} = ∏_{j=n+1}^{n+k} α_j, k = 1, 2, …, and λ* = λ_{n+1}α_{n+1,k−1} + λ_{n+2}α_{n+2,k−2} + ⋯ + λ_{n+k−1}α_{n+k−1,1} + λ_{n+k}, with α_{n+i,k−i} = ∏_{j=n+i+1}^{n+k} α_j. By Theorem 3.4.1, for fixed x, p_k(x | X_n; γ̂_n, β̂_n) has an asymptotically normal distribution with mean p_k(x | X_n; γ_0, β_0) and variance given by the delta method applied with the asymptotic variance n^{−1}i^{−1}, where the matrix i^{−1} is partitioned as

i^{−1} = ( i_γ^{−1}     i_{γβ}^{−1} )
         ( i_{βγ}^{−1}  i_β^{−1}    ),

and the matrices i_β^{−1}, i_{βγ}^{−1} = (i_{γβ}^{−1})ᵀ and i_γ^{−1} are of dimensions p × p, p × m and m × m. The partial derivatives of the point mass probability p_k(x | X_n; γ, β) follow from the chain rule. Expressions for the partial derivatives ∂p_k(x | X_n)/∂α_{*,k} and ∂p_k(x | X_n)/∂λ* are found in (4.4.1) and (4.4.3) respectively; the remaining partial derivatives, of α_{*,k} and λ* with respect to γ and β, follow from the link functions.
7.3 Estimation

The model parameters can be estimated using the Newton-Raphson iterative scheme discussed in Section 4.6. In this case, the vector of parameters is θ = (γ_1, γ_2, …, γ_m, β_1, β_2, …, β_p)ᵀ. The addition of regressors to the model makes the expected information E[l̈(θ^{(k)})] less practical to compute. In Chapter 8 we fit Poisson AR(1) models to some WCB data. In these models α is held constant over time. The parameter estimates were found using a quasi-Newton procedure, in which we calculate l̈(θ^{(k)}) numerically. We found the following starting values worked well: α = 0.9, β_1 = β_2 = ⋯ = β_p = 0.

We now consider the asymptotic properties of the model. Let U_{γ,n} = Σ_{t=1}^n l̇_{α_t} α̇_t and U_{β,n} = Σ_{t=1}^n l̇_{λ_t} λ̇_t denote the score functions for the departure parameters γ and the arrival parameters β respectively, where l̇_{α_t} = ∂ log p_t(X_t | X_{t−1})/∂α_t, l̇_{λ_t} = ∂ log p_t(X_t | X_{t−1})/∂λ_t, α̇_t = ∂α_t/∂γ = W_t α_t(1 − α_t) and λ̇_t = ∂λ_t/∂β = Z_t λ_t. We also denote the martingale differences by u_{γ,t} = U_{γ,t} − U_{γ,t−1} and u_{β,t} = U_{β,t} − U_{β,t−1}. Using the expressions in Section 4.5 for the derivatives of the conditional probability, the martingale differences can be written in closed form.
The following proposition shows this model can be identified.

Proposition 7.3.1 The Poisson AR(1) model defined in Section 7.1 can be identified if and only if the following two matrices have full row rank: [W_1, W_2, …, W_n] and [Z_1, Z_2, …, Z_n].

Proof: A statistical model is identifiable if its Fisher information is positive definite. It is therefore sufficient to show that

E[(aᵀu_{γ,t} + bᵀu_{β,t})²] = 0                                  (7.3.1)

if and only if a_1 = a_2 = ⋯ = a_m = b_1 = b_2 = ⋯ = b_p = 0, where a = (a_1, a_2, …, a_m)ᵀ and b = (b_1, b_2, …, b_p)ᵀ. We can write aᵀu_{γ,t} + bᵀu_{β,t} = c_{1t}l̇_{α_t} + c_{2t}l̇_{λ_t}, where c_{1t} = α_t(1 − α_t)aᵀW_t and c_{2t} = λ_t bᵀZ_t. From Proposition 4.6.1, E[(c_{1t}l̇_{α_t} + c_{2t}l̇_{λ_t})²] = 0 if and only if c_{1t} = c_{2t} = 0. That is, (7.3.1) holds if and only if aᵀ[W_1, W_2, …, W_n] = 0 and bᵀ[Z_1, Z_2, …, Z_n] = 0, which by assumption hold if and only if a_1 = a_2 = ⋯ = a_m = b_1 = b_2 = ⋯ = b_p = 0. □
If the regularity conditions for the martingale CLT hold then the score can be used to make inferences about the parameters γ and β. Basically, if the covariate processes are well behaved then these conditions will be satisfied.
As mentioned in Example 7.2.2, in many industries exposure to injury is seasonal, in which case the addition of seasonal covariates to the arrival process is appropriate. A common method for modeling seasonality in monthly data is to use indicator covariates for each month. Let Z_t = (z_{t1}, z_{t2}, …, z_{t,12})ᵀ, where the i-th component is one in month i and zero in all other months. Note, if a constant is included in the regressors then one of the monthly indicators must be dropped.

Often it is more appropriate to use seasonal indicators rather than monthly indicators. However, the number and length of the seasons is somewhat arbitrary. Some authors prefer to use sinusoidal covariates, such as Z_t = (sin(2πt/12), sin(4πt/12), …)ᵀ. When all 12 sinusoidal components are included both models are the same. Usually only a few of the sinusoidal components are needed. However, this still generates 12 different monthly levels.
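A minimal sketch of the two seasonal covariate constructions discussed above. The monthly indicators follow the text; the sinusoidal basis below is an illustrative variant that uses sine/cosine pairs plus an intercept, not necessarily the exact components the text intends.

```python
import numpy as np

def monthly_indicators(t):
    """12 monthly dummy covariates z_t: component i is 1 in month i."""
    z = np.zeros(12)
    z[t % 12] = 1.0
    return z

def sinusoidal_covariates(t, n_harmonics=2):
    """Intercept plus a few sine/cosine pairs at monthly frequencies."""
    return np.array([1.0] + [f(2.0 * np.pi * h * t / 12.0)
                             for h in range(1, n_harmonics + 1)
                             for f in (np.sin, np.cos)])

Z = np.array([monthly_indicators(t) for t in range(24)])
print(Z.sum(axis=1))             # exactly one indicator per month
print(sinusoidal_covariates(0))  # 5 components: intercept + 2 pairs
```

With an intercept included among the regressors, one monthly indicator would have to be dropped, as noted above.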
Suppose we have a model with 12 values of the thinning parameter, one for each month of the year, that is, α_1, α_2, …, α_12 are the thinning parameters for January, February, …, December respectively. Similarly, suppose the model also has 12 values of the arrival parameter, λ_1, λ_2, …, λ_12, again corresponding to the 12 months of the year.

Theorem 7.3.1 If X_0 has a Poisson distribution with mean μ_0, then the marginal distribution of X_j is Poisson with mean μ_j, where

μ_j = (λ_j + α_jλ_{j−1} + α_jα_{j−1}λ_{j−2} + ⋯ + α_jα_{j−1}⋯α_{j−10}λ_{j−11}) / (1 − α_1α_2⋯α_12),

with the subscripts interpreted modulo 12.

Proof: Since X_{12r+k} is a convolution of α_{12r+k} ∘ X_{12r+k−1} and ε_{12r+k}, its mean must be equal to α_k μ_{k−1} + λ_k, or, in other words, μ_k = α_k μ_{k−1} + λ_k. We will show this equation holds for k = 1, and omit the tedious details for k = 2, 3, …, 12; the calculation consists of writing μ_0 and μ_1 out in long form and collecting terms. □

It can be shown that the sequence of random vectors {(X_{12t+1}, X_{12t+2}, …, X_{12t+12})}_{t=0}^∞ is stationary.
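Theorem 7.3.1 can be checked numerically: iterating μ_t = α_t μ_{t−1} + λ_t with 12-periodic parameters converges, from any starting value, to the 12-periodic pattern given by the closed form. The parameter values below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
alphas = rng.uniform(0.2, 0.8, size=12)   # monthly thinning parameters
lams = rng.uniform(0.5, 2.0, size=12)     # monthly arrival means

def closed_form_mu(j):
    """mu_j = (lam_j + a_j lam_{j-1} + a_j a_{j-1} lam_{j-2} + ...)
    / (1 - a_1 a_2 ... a_12), subscripts taken modulo 12."""
    num, prod = 0.0, 1.0
    for r in range(12):
        num += prod * lams[(j - r) % 12]
        prod *= alphas[(j - r) % 12]
    return num / (1.0 - np.prod(alphas))

# Iterate the mean recursion mu_t = alpha_t mu_{t-1} + lam_t.
mu, trace = 10.0, []
for t in range(1, 600):
    mu = alphas[t % 12] * mu + lams[t % 12]
    trace.append((t % 12, mu))

month, mu_last = trace[-1]
print(abs(mu_last - closed_form_mu(month)))   # essentially zero
print(abs(trace[-1][1] - trace[-13][1]))      # 12-periodicity
```

The vanishing differences illustrate both the closed form for μ_j and the periodic stationarity of the 12-dimensional vector process noted above.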
Proposition 7.3.2 If the stochastic process X_t follows the Poisson AR(1) model as defined in Section 7.1 and there exist α_u < 1 and λ_u < ∞ such that α_t ≤ α_u and λ_t ≤ λ_u for all t, then it is α-mixing with α(n) = O(α_uⁿ).

The proof of Proposition 7.3.2 is very similar to that of Proposition 4.3.1, but the notation required makes it tedious and it is therefore omitted.
Proposition 7.3.3 Under the assumptions of Proposition 7.3.2 all the moments of X_t exist and are finite. Further, all the moments of u_t and u̇_t exist and are finite, where u_t = U_t − U_{t−1}, U_t is the score function and u̇_t is the matrix of partial derivatives of u_t with respect to γ and β.

Proof: As noted in the Remarks following Theorem 7.2.1, the marginal distribution of X_t is Poisson with mean λ_t + λ_{t−1}α_t + λ_{t−2}α_tα_{t−1} + ⋯ + λ_0α_tα_{t−1}⋯α_1, which is bounded by λ_u/(1 − α_u) < ∞. Hence, all the moments of X_t exist and are finite. The second part follows from the fact that both u_t and u̇_t are polynomials in X_t with bounded coefficients and hence all of their moments exist and are finite. □
Proposition 7.3.4 If in addition to the assumptions of Proposition 7.3.2 we assume that the model is identifiable and X_t is stationary, then the Fisher information, i, is finite and positive definite.

Proof: By Proposition 7.3.2, X_t is α-mixing, which combined with stationarity implies that X_t is ergodic. As a consequence of Theorem 4.1.1, u_t and u̇_t are both ergodic. Further, u_t and u̇_t have finite variances due to Proposition 7.3.3. The conditions for Corollary 4.1.1 are therefore satisfied and the result follows. □
7.4 Testing

In the first part of the section we consider testing for independence, or more formally testing the null hypothesis, H_0: α = 0, against the alternative hypothesis, H_a: 0 < α < 1. We begin by finding the Fisher information matrix under the null hypothesis.

Let l̇_{α_t} = ∂ log p_t(X_t | X_{t−1})/∂α, l̈_{α_t} = ∂² log p_t(X_t | X_{t−1})/∂α², l̈_{αλ,t} = ∂² log p_t(X_t | X_{t−1})/∂α∂λ_t and l̈_{λ_t} = ∂² log p_t(X_t | X_{t−1})/∂λ_t². All of these partial derivatives are found in Section 4.4, and when α = 0 they simplify considerably; their expected values under the assumption α = 0 give the entries of the information matrix.

Next we calculate the partial derivatives with respect to β: λ̇_t = ∂λ_t/∂β = Z_t λ_t and λ̈_t = ∂²λ_t/∂β∂βᵀ = Z_t Z_tᵀ λ_t. Combining these with the derivatives above gives the Fisher information. We will assume there exists a positive definite matrix i such that i_n → i as n → ∞, and we let σ² denote the first entry of i^{−1}.
We define the following three sets of parameter "estimates": let ᾱ_n and β̄_n be the unrestricted maximizers of the likelihood, let α̂_n and β̂_n be the maximum likelihood estimates (that is, maximizing over the parameter space) and let β̃_n be the maximum likelihood estimate of β when α = 0.

Under the null hypothesis √n ᾱ_n/σ converges in distribution to a standard normal random variable and √n α̂_n/σ converges in distribution to a random variable Z*, where Z* is defined as

Z* = Z if Z > 0 and Z* = 0 if Z ≤ 0,

where Z has a standard normal distribution.
If we define the Wald and likelihood ratio statistics respectively as W_n = n ᾱ_n²/σ̂² and 2 log Λ_n, where Λ_n = L(ᾱ_n, β̄_n; X)/L(0, β̃_n; X), then under the null hypothesis both converge to the usual chi-square random variable with one degree of freedom. However, if we redefine the Wald and likelihood ratio statistics as W_n = n α̂_n²/σ̂² and 2 log Λ_n, where Λ_n = L(α̂_n, β̂_n; X)/L(0, β̃_n; X), respectively, then under the null hypothesis both converge to a modified chi-square random variable defined by

χ̃² = 0 with probability 1/2 and χ̃² = χ₁² with probability 1/2,

where χ₁² has a chi-square distribution with one degree of freedom.
Consider the zero mean martingale $U_n(\alpha, \beta)$, the score function with respect to $\alpha$. Under the null hypothesis
$U_n(0, \beta)$ is a mean zero martingale. However, under the alternative it is necessary to
subtract a positive quantity, $\alpha\sum_{t=1}^{n} X_{t-1}/\lambda_t$, from $U_n(0, \beta)$ to get a zero mean martingale. Hence large positive
values of $U_n(0, \beta)$ give evidence in favor of the alternative hypothesis. That is, the score
test is one-sided.
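The 50/50 mixture null distribution of the boundary-restricted statistics can be checked by simulation. The following is a minimal sketch, not part of the thesis: it uses the standard result that the restricted estimate behaves like $\max(Z, 0)$ for $Z \sim N(0,1)$, so the modified Wald statistic behaves like $\max(Z, 0)^2$.

```python
import random

# Monte Carlo sketch of the null distribution of the boundary-restricted
# Wald statistic: simulate max(Z, 0)^2 for Z ~ N(0, 1), a 50/50 mixture
# of a point mass at zero and a chi-square(1) random variable.
random.seed(42)
n_sims = 200_000
stats = [max(random.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(n_sims)]

prop_zero = sum(s == 0.0 for s in stats) / n_sims
# The 95th percentile of the mixture equals the chi-square(1) 90th
# percentile, about 2.71, which is the one-sided critical value used later.
crit = sorted(stats)[int(0.95 * n_sims)]
print(round(prop_zero, 2), round(crit, 2))
```

Roughly half the simulated statistics are exactly zero, and the mixture's upper critical values match a chi-square(1) table read at doubled tail probabilities.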
Next we consider the information matrix test for specification. We will allow
covariates in both the arrival and departure processes and use the same notation as in
Section 7.3. That is, we let $u_t = U_t - U_{t-1}$, where $U_t$ denotes the score function at time $t$,
and let $\dot u_t$ denote the matrix of partial derivatives of $u_t$ with respect to $\gamma$ and $\beta$.
The information matrix test is based on the following martingale: $M_n = \sum_{t=1}^{n} m_t$,
where $m_t = 1'(u_t u_t' + \dot u_t)1$ and $1$ is an $(m+p)$-vector of 1's. The quadratic variation
of $M_n$ is given by $[M]_n = \sum_{t=1}^{n} m_t^2$. Under the assumptions of Proposition 7.3.3, $m_t$ is
stationary and $\alpha$-mixing. Further, all the moments of $m_t$ exist and are finite. This is
sufficient for the conditions of Theorem 4.1.1 to hold and hence $[M]_n^{-1/2} M_n$ converges in
distribution to a standard normal random variable.
Chapter 8
8. Application to counts of workers collecting disability benefits
In this chapter we analyze five data series drawn from the WCB data set. Section 8.1
contains descriptions of the five data series and preliminary analysis of the data. Next, in
Section 8.2, we carry out our in-depth analysis, which includes: model estimation,
selection and testing. We examine the arrival processes in Section 8.3 and calculate
forecasts for the first six months of 1995 in Section 8.4. Finally, in Section 8.5 we show
what happens if the Gaussian AR(1) model is fit to the data.
8.1 Workers' Compensation Data
We have selected the following five data series for analysis in this chapter. Each series
contains monthly counts of claimants collecting STWLB from the WCB. All the
claimants are male, between the ages of 35 and 54, work in the logging industry and
reported their claim to the Richmond service delivery location. The distinguishing
difference between the five series is the nature of the injury. We will refer to the five
series as data sets 1, 2, 3, 4 and 5. The claimants in data set 1 have burn related injuries. In
data set 2 the claimants have soft tissue injuries, such as contusions and bruises. The
claimants in data set 3 have cuts, lacerations or punctures. Claimants in data set 4 have
dermatitis and data set 5 contains claimants with dislocations.
Table 8.1.1 contains a summary of simple descriptive statistics for the five data
sets and Figures 8.1.1 through 8.1.5 contain time series plots for the five data sets. Plots
of the sample autocorrelation function and sample partial autocorrelation function are
found in Figures 8.1.6 and 8.1.7.
Table 8.1.1 A summary of simple descriptive statistics for data sets 1 through 5.

A property of the Poisson AR(1) model is that the marginal mean and variance
should be the same, see Proposition 2.3.1. In Table 8.1.1 we see that for each data set the
mean and variance are close except for data set 3, where the variance is almost twice the
mean. However, if the Poisson mean is non-constant this would cause the variance to be
larger than the mean.
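The mean-variance equality can be checked by simulation. Below is a minimal sketch of the model (binomial thinning of the previous count plus Poisson arrivals), with illustrative parameters rather than the estimates from any of the data sets:

```python
import math
import random

def simulate_poisson_ar1(alpha, lam, n, seed=1):
    """Simulate X_t = alpha o X_{t-1} + eps_t: binomial thinning of the
    previous count plus Poisson(lam) arrivals.  At stationarity the
    marginal distribution is Poisson with mean lam / (1 - alpha)."""
    rng = random.Random(seed)

    def rpois(mu):
        # Poisson sampler by inversion; adequate for small means.
        x, p, u = 0, math.exp(-mu), rng.random()
        c = p
        while u > c:
            x += 1
            p *= mu / x
            c += p
        return x

    x = rpois(lam / (1 - alpha))  # start at the stationary distribution
    out = []
    for _ in range(n):
        survivors = sum(rng.random() < alpha for _ in range(x))  # alpha o X
        x = survivors + rpois(lam)
        out.append(x)
    return out

xs = simulate_poisson_ar1(alpha=0.4, lam=3.0, n=200_000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(round(mean, 2), round(var, 2))  # both should be near 3/(1 - 0.4) = 5
```

A marked gap between the two sample moments, as in data set 3, is therefore evidence against a constant Poisson arrival rate.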
In the time series plot of data set 1, Figure 8.1.1, there is a significant change in
the pattern after the middle of 1993. It therefore seems unlikely that an AR(1) model will
fit the series well. This is further confirmed by the sample autocorrelation function, which
doesn't decay fast enough for an AR(1) model. Also, the first two lags of the partial
autocorrelation function are statistically significant at the 5% level, suggesting that an
AR(2) model might be appropriate. However, an AR(2) model has the problem that it is
not easy to interpret and it has more than one specification. See the discussion at the
beginning of Section 2.4.
The difficulty with analyzing a series with very low counts, such as data set 1, is
that a single claim can drastically change the shape and correlation pattern of the series.
For instance, in data set 1 there appears to be a single claimant collecting STWLB
between May 1993 and December 1994. It is impossible to tell this for sure, since this
same pattern could be caused by several individual claims, although, based on the earlier
claims frequency in the series, this is less likely.

Further investigation shows that our conjecture of a single long duration claimant
is correct. Since the frequency of severe claims is low (one claim in ten years) and since
our model is not designed to handle such claims we removed this outlier from data set 1.
To distinguish this new series from the original we refer to it as data set 1*.
The sample autocorrelation function and sample partial autocorrelation function
for data set 1* are found in Figure 8.1.8. There is a slight seasonal pattern in the sample
autocorrelation function. However, a seasonal model may not be necessary, since the
correlations at lags 6 and 12 are well within the 5% confidence limits. The sample partial
autocorrelation function indicates that an AR(1) model is appropriate.
The claims counts in the second data set are significantly higher than in data set 1.
Hence one or two persistent claims would not have a profound effect on the series' shape
and correlation pattern. The time series plot of data set 2, Figure 8.1.2, looks stationary
with possible seasonality. A seasonal pattern appears in the sample autocorrelation
function, which has large negative correlations at lags 7, 8 and 9, and large positive
correlations at lags 13 and 14. The sample partial autocorrelation function is consistent
with the AR(1) model.
The claims counts in data set 3 are also relatively large, with a mean of 6.133. In
July 1988 the claims count is unusually high at 21 and is the only observation above 14.
The time series plot, Figure 8.1.3, shows a seasonal pattern and a drop in the variation after
January 1990. The sample autocorrelation function confirms a seasonal pattern, while the
sample partial autocorrelation function suggests an AR(1) model is appropriate, see Figure
8.1.6.
The claims counts in data set 4 are low. Between June 1990 and April 1991 there
appears to be a persistent claim; again, it is impossible to tell for sure without further
investigation. The later half of the series appears to have a slightly lower claims
frequency than the first half of the series. The sample autocorrelation function and partial
autocorrelation function, Figure 8.1.7, are consistent with an AR(1) model.
In data set 5 the claims counts are low with slightly higher claims counts
occurring between January 1990 and December 1992, which again could be caused by a
single claimant with a severe or reoccurring dislocation. The sample autocorrelation
function and partial autocorrelation function, Figure 8.1.7, are consistent with an AR(1)
model.
Figure 8.1.1 A time series plot of data set 1.

Figure 8.1.2 A time series plot of data set 2.

Figure 8.1.3 A time series plot of data set 3.

Figure 8.1.4 A time series plot of data set 4.

Figure 8.1.5 A time series plot of data set 5.

Figure 8.1.6 ACF's and PACF's for data sets 1 to 3.

Figure 8.1.7 ACF's and PACF's for data sets 4 and 5.

Figure 8.1.8 ACF and PACF for data set 1*.
8.2 Model selection and testing

In this section we select and estimate a Poisson AR(1) model for each of our data sets.
We restrict the class of models by considering only models where the departure rate is
fixed, that is, $\alpha_t = \alpha$ for all $t$, and where the arrival rate is either constant or depends on
sinusoidal seasonal regressors. The parameter estimates for data sets 1*, 2, 3, 4 and 5 are
summarized in Table 8.2.3.
Our preliminary analysis in Section 8.1 suggested the following need for
regressors: data sets 1* and 2 might need seasonal regressors, data set 3 will almost
certainly need seasonal regressors and data sets 4 and 5 will not require seasonal
regressors.
In analyzing data set 1*, we failed to find any seasonal regressors for which the
coefficients were significantly different from zero at the 5% level. We therefore
proceeded to analyze the model with a constant arrival rate. The information matrix test
statistic for the joint test that neither the departure nor the arrival parameters are
stochastic is 0.240 (P-value 0.81), while for the individual test that the arrival parameter
is non-stochastic it is 0.134 (P-value 0.89). In both cases we accept the null hypothesis of
non-stochastic parameters. Henceforth in this section we will refer to these two tests as
simply the joint information matrix test and the individual information matrix test.

In Table 8.2.3 we see that the lower bound of the 95% confidence interval for $\alpha$ is
close to zero, 0.007. It is therefore worth testing for independence in this series. In Table
8.2.1 we see that all of the tests reject independence at the 5% level and that the CLS and
Wald tests reject independence at the 1% level. We conclude that it is unlikely the series is
independent.
In Figure 8.2.1 we have included three residual plots: Pearson, continuation and
arrival; see Section 4.4 for the development of continuation and arrival residuals. Recall
the Pearson residuals can be decomposed into the continuation and arrival residuals as
follows:

$r_t = r_{1,t} + r_{2,t}$,

where $r_{1,t} = E_t[\alpha \circ X_{t-1}] - \alpha X_{t-1}$ and $r_{2,t} = E_t[\varepsilon_t] - \lambda$ are, respectively, the continuation and
arrival residuals at time $t$. The residuals can be standardized as follows: $r_t/E_{t-1}[r_t^2]^{1/2}$,
$r_{1,t}/E_{t-1}[r_{1,t}^2]^{1/2}$ and $r_{2,t}/E_{t-1}[r_{2,t}^2]^{1/2}$. Recall $E_t$ is the operation of expectation conditional
on $\mathcal{F}_t = \sigma(X_0, X_1, \ldots, X_t)$.
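The conditional expectation $E_t[\alpha \circ X_{t-1}]$ that drives the cases below can be computed by conditioning on how $X_t$ splits into continuing claimants and new arrivals. The following is a sketch of that computation (the function name is ours), using the data set 1* estimates $\hat\alpha \approx 0.240$ and $\hat\lambda \approx 0.134$:

```python
from math import comb, exp, factorial

def cond_expected_continuers(x_prev, x, alpha, lam):
    """E_t[alpha o X_{t-1}] given X_{t-1} = x_prev and X_t = x: average the
    number of continuing claimants b over its conditional law, where b is
    Binomial(x_prev, alpha) a priori and the remaining x - b counts must
    come from the Poisson(lam) arrival process."""
    bs = range(min(x_prev, x) + 1)
    weights = [
        comb(x_prev, b) * alpha**b * (1 - alpha) ** (x_prev - b)
        * exp(-lam) * lam ** (x - b) / factorial(x - b)
        for b in bs
    ]
    total = sum(weights)
    return sum(b * w for b, w in zip(bs, weights)) / total

# With X_{t-1} = 0 nobody can continue, so the expectation is forced to 0
# (case 1 below); with X_{t-1} = 1, a larger X_t makes continuation of the
# one earlier claimant more likely (case 5 versus case 4 below).
print(cond_expected_continuers(0, 1, 0.240, 0.134))  # 0.0
```

This makes concrete why the continuation residual is identically zero whenever $X_{t-1} = 0$: the entire deviation must then be attributed to the arrival process.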
Since the continuation and arrival residuals are new ideas, we now analyze these
residuals in detail for data set 1*. We begin by considering the following cases for the
standardized continuation residuals of data set 1*:

Case 1. $X_{t-1} = 0$.
In this case, $\alpha \circ X_{t-1}$ given $X_{t-1} = 0$ is not random but identically equal to zero, hence
$E_t[\alpha \circ X_{t-1}] = 0$. Since $\alpha X_{t-1}$ is also equal to zero in this case, the residual at time $t$ is zero.
This is a key observation and an important property. It shows that, in this case, all of the
deviation between the observed value of $X_t$ and its expected value at time $t-1$ is due to the
arrival process and not the continuation process.
Case 2. $X_{t-1} = 1$ and $X_t = 0$.

In this case, $\alpha \circ X_{t-1}$ given $X_t = 0$ is non-random and equal to zero since nobody
continues. Therefore $E_t[\alpha \circ X_{t-1}] = 0$. Since $\alpha X_{t-1}$ is positive, the residual at time $t$ is
negative. For data set 1* the standardized residual is -0.709.
Case 3. $X_{t-1} = 2$ and $X_t = 0$.

This case is similar to case 2; the difference is that $\alpha X_{t-1}$ is twice as large as in case 2. For
data set 1* the standardized residual is -0.919. Note that the residual is not twice the value
in case 2, since, in this case, the standardization is conditional on $X_{t-1} = 2$ and not
$X_{t-1} = 1$.
Case 4. $X_{t-1} = 1$ and $X_t = 1$.

In this case, $\alpha \circ X_{t-1}$ given $X_{t-1} = 1$ and $X_t = 1$ is a random variable taking values in the
set $\{0, 1\}$. That is, the one individual collecting at time $t$ is either continuing to collect
from time $t-1$ or is a new claim at time $t$. For data set 1* the standardized residual is
Case 5. $X_{t-1} = 1$ and $X_t = 2$.

In this case, $\alpha \circ X_{t-1}$ given $X_{t-1} = 1$ and $X_t = 2$ is again a random variable taking values in
the set $\{0, 1\}$. Note this is a different random variable than the one in case 4, that is,
$P(\alpha \circ X_{t-1} = 1 \mid X_{t-1} = 1, X_t = 2) > P(\alpha \circ X_{t-1} = 1 \mid X_{t-1} = 1, X_t = 1)$. At time $t$ there are two
possibilities: two arrivals or one arrival and one continuing claim. For data set 1* the
standardized residual is 1.727.
Although other cases are possible these are the only five observed in data set 1*.
In a similar manner we now analyze the standardized arrival residuals for data set 1*.

Case 1. $X_{t-1} = 0$ and $X_t = 0$.

Conditional on $X_t = 0$, $\varepsilon_t$ is non-random and equal to zero, hence $E_t[\varepsilon_t] = 0$. This results
in a negative residual. For data set 1* the standardized residual is -0.366.
Case 2. $X_{t-1} = 1$ and $X_t = 0$.

This case is the same as case 1 except that the scaling (standard deviation conditional on
$X_{t-1} = 1$) is different. For data set 1* the standardized residual is -0.520.

Case 3. $X_{t-1} = 2$ and $X_t = 0$.

This case is the same as cases 1 and 2 except that the scaling (standard deviation
conditional on $X_{t-1} = 2$) is different. For data set 1* the standardized residual is -0.654.
Case 4. $X_{t-1} = 0$ and $X_t = 1$.

In this case, $\varepsilon_t$ given $X_{t-1} = 0$ and $X_t = 1$ is non-random and equal to 1. For data set 1* the
standardized residual is 2.366.

Case 5. $X_{t-1} = 1$ and $X_t = 1$.

In this case, $\varepsilon_t$ given $X_{t-1} = 1$ and $X_t = 1$ is a random variable taking values in the set
$\{0, 1\}$. Thus $E_t[\varepsilon_t] > 0$. For data set 1* the standardized residual is 0.638.

Case 6. $X_{t-1} = 1$ and $X_t = 2$.

In this case, $\varepsilon_t$ given $X_{t-1} = 1$ and $X_t = 2$ is a random variable taking values in the set
$\{1, 2\}$. Thus $E_t[\varepsilon_t] > 0$. For data set 1* the standardized residual is 4.043.
For data set 2 we found the coefficients for the following seasonal regressors
statistically significant at the 5% level: $\sin(2\pi t/12)$ and $\cos(2\pi t/12)$. If these regressors
are essential to the model then the information matrix test for a simpler model should
reject the null hypothesis of non-stochastic parameters. For the model with a constant
arrival rate the joint information matrix test statistic is 9.537 (P-value 0.59), while the
individual information matrix test statistic is 0.417 (P-value 0.68). In both cases we
accept the null hypothesis of non-stochastic parameters. That is, the model with a
constant arrival rate is sufficient to explain the variation observed in the series. The
simulation in Section 6.3 showed that the information matrix test had good power against
the misspecification considered. However, the power of the test may be lower for other
types of misspecification. We therefore need to check the residuals before making any
conclusions.
The residual plots for this simple model, Figure 8.2.2, look random. Further, none
of the sample autocorrelations for the residuals are statistically significant at the 5% level.
We therefore choose to use the simpler model with a constant arrival rate.
In our analysis of data set 3 we found the following seasonal regressors
statistically significant at the 5% level: $\sin(2\pi t/12)$ and $\cos(2\pi t/12)$. The joint
information matrix test statistic for this model is 1.008 (P-value 0.31). Figure 8.2.3 shows the
residuals plotted in chronological order. At the 5% level, none of the sample
autocorrelations are significant. In Figure 8.2.4 the residuals are plotted against the two
seasonal regressors. These plots indicate no significant problems.
Before we can go ahead and accept this model we need to show that simpler
models are not adequate. We consider three simpler models with the following
regressors: constant only (model 1), constant plus $\sin(2\pi t/12)$ (model 2) and constant
plus $\cos(2\pi t/12)$ (model 3). The joint information matrix tests for these three models are
summarized in Table 8.2.4.
For model 1, the Pearson and continuation residuals have a significant sample
autocorrelation at lag 12, while the arrival residuals have significant sample
autocorrelations at lags 2 and 12. These correlations in the residuals along with the low P-value
for the joint information matrix test lead us to reject model 1.
The Pearson, continuation and arrival residuals for model 2 have significant
sample autocorrelations at lags 2, 12 and 2, respectively. Although the P-value for the
joint information matrix test is above 0.05 we reject the model because the residuals are
correlated.

In the case of model 3 the residuals appear to be uncorrelated; however, we reject
the model due to the low P-value of the joint information matrix test.
In the case of data set 4 we fit a model with a constant arrival rate. The joint
information matrix statistic is 0.222 (P-value 0.82). This leads us to accept the null
hypothesis that the parameters are non-stochastic. In the three residual plots, see Figure
8.2.5, the suspected persistent claim is evident (observations 54-64). In the case of the
Pearson and arrival residuals this causes a band of residuals close to zero. However, in the
case of the continuation residuals this band of residuals is quite far from zero. This causes
the lag 1 sample autocorrelation for the continuation residuals to be significant at the 5%
level. Otherwise the residuals for this model look good, and we decide not to remove or
further investigate the suspected outlier.
For data set 5 we also fit a model with a constant arrival rate. The joint
information matrix test statistic is -0.514 (P-value 0.61) and therefore we accept the null
hypothesis of non-stochastic parameters. The residual plots are found in Figure 8.2.6.
None of the sample autocorrelations for the residuals are significant at the 5% level.
Recall duration is the number of months that a claimant collects STWLB. In
Section 2.2 we showed that the mean duration is $(1-\alpha)^{-1}$. We further showed how to
construct 95% confidence intervals for the mean duration in Section 3.5. The mean
durations, Table 8.2.5, for data sets 1*, 2, 3 and 4 are between one and two months. In the
case of data set 5 the mean duration is longer, between two and three months.
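The point estimate of the mean duration follows directly from $\hat\alpha$. A delta-method sketch of an approximate interval is shown below; the construction in Section 3.5 may differ, and the standard error used here is backed out of the reported confidence interval for $\alpha$, an assumption on our part:

```python
def mean_duration_ci(alpha_hat, se_alpha, z=1.96):
    """Mean duration is 1/(1 - alpha); a delta-method sketch gives the
    plug-in estimate a standard error of se_alpha / (1 - alpha)^2."""
    duration = 1.0 / (1.0 - alpha_hat)
    se = se_alpha / (1.0 - alpha_hat) ** 2
    return duration, (duration - z * se, duration + z * se)

# Data set 5: alpha_hat ~ 0.652; se ~ 0.058 is inferred from the width of
# its reported 95% confidence interval (an illustrative assumption).
dur, (lo, hi) = mean_duration_ci(0.652, 0.058)
print(round(dur, 2))  # about 2.87 months, i.e. between two and three
```
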
Table 8.2.1 Tests for independence in data set 1*.

Test          Statistic Value   5% Critical Value
CLS           2.456             1.645
Our Score     2.256             1.645
Wald          6.893             2.71
LR            5.037             2.71
Usual Score   5.090             2.71

Table 8.2.2 This table displays the seasonal arrival rate for data set 3.

Month       Arrival Rate
January     2.353
February    2.415
March       2.737
April       3.310
May         4.060
June        4.783
July        5.177
August      5.043
September   4.450
October     3.680
November    3.000
December    2.547
Table 8.2.3 This table summarizes the parameter estimates and the upper and lower 95% confidence limits.

Table 8.2.4 This table summarizes the joint information matrix test of models 1, 2 and 3 on data set 3.

Table 8.2.5 This table contains the mean duration and 95% confidence interval for the mean duration for data sets 1*, 2, ..., 5.
Figure 8.2.1 Pearson, continuation and arrival residuals plotted against time for data set 1*.

Figure 8.2.2 Pearson, continuation and arrival residuals plotted against time for data set 2.

Figure 8.2.3 Pearson, continuation and arrival residuals plotted against time for data set 3.

Figure 8.2.4 Pearson, continuation and arrival residuals plotted against model regressors in data set 3.

Figure 8.2.5 Pearson, continuation and arrival residuals plotted against time for data set 4.

Figure 8.2.6 Pearson, continuation and arrival residuals plotted against time for data set 5.
8.3 Arrival process

The Poisson AR(1) model assumes the continuation and arrival processes are not directly
observable. For the five data sets described in Section 8.1 we were able to obtain the
arrival data from the WCB. To distinguish the two sets of data we will refer to the arrivals
as data sets 1A, 2A, 3A, 4A and 5A. In general data of this form may not be readily
available, since it requires more detailed record keeping. Since these data are available we
consider the following two questions: how well were we able to estimate the arrival rates
with the Poisson AR(1) model? Are the arrival processes Poisson?
We begin by estimating the arrival rates for the arrival data, which we assume to
be independent and Poisson. For data sets 1A, 2A, 4A and 5A we assume a constant
mean, while for data set 3A we assume a seasonally changing mean. The maximum
likelihood estimates and 95% confidence intervals for the parameters are found in Table
8.3.1. For data sets 1*, 4 and 5 the parameter estimates from the Poisson AR(1) model
are contained within these 95% confidence limits. The estimated arrival rate for data set
2A, 4.475, is contained in the somewhat wider 95% confidence interval for the same
parameter in the Poisson AR(1) model. In the case of data set 3, two of the three
parameter estimates from the Poisson AR(1) model are contained within the 95%
confidence intervals for the same parameters as calculated from data set 3A. The estimate
for $\beta_0$ in data set 3 is contained in the 95% confidence interval for $\beta_0$ as calculated
from data set 3A.
Table 8.3.2 displays the estimated seasonal arrival rate for data set 3A. These rates
are slightly lower than those estimated by the Poisson AR(1) model. Also note the
seasonal pattern has shifted by one month. That is, the lowest and highest months are
respectively February and August, whereas for the Poisson AR(1) model they are
January and July.

Overall, the Poisson AR(1) model appears to do a good job at estimating the
arrival rate. Next we check the Poisson assumption.
If the Poisson assumption is correct we would expect the mean and variance of
each arrival process to be close. This is the case for data sets 1A, 4A and 5A. However,
for data set 2A the variance, 7.171, is "much larger" than the mean, 4.475. Since the
arrival rate is non-constant for data set 3A we would expect the variance to be larger than
the mean. The Poisson specification can be formally tested by the information matrix test
as follows.
The Poisson probability function is

$p(X_t) = e^{-\lambda}\lambda^{X_t}/X_t!$.

The first and second derivatives of $\log[p(X_t)]$ are:

$\frac{\partial}{\partial\lambda}\log p(X_t) = \frac{X_t}{\lambda} - 1$

and

$\frac{\partial^2}{\partial\lambda^2}\log p(X_t) = -\frac{X_t}{\lambda^2}$.

Therefore the information matrix test statistic is based on

$\sum_{t=1}^{n}\left[\left(\frac{X_t}{\lambda}-1\right)^2 - \frac{X_t}{\lambda^2}\right] = \frac{1}{\lambda^2}\sum_{t=1}^{n}\left[(X_t-\lambda)^2 - X_t\right].$

Since the denominator is constant, we consider the following statistic instead:

$U_n(\lambda) = \sum_{t=1}^{n}\left[(X_t-\lambda)^2 - X_t\right].$

We have that $[U(\lambda)]_n^{-1/2}U_n(\lambda) \xrightarrow{d} N(0,1)$, since the data are independent and identically
distributed with finite moments. Note the test is basically checking whether the mean and
variance are the same.

The parameter $\lambda$ is unknown, so we replace it with $\bar{X}$ (the maximum likelihood
estimate). $U_n(\bar{X})$ is related to $U_n(\lambda)$ as follows:

$U_n(\bar{X}) = U_n(\lambda) - n(\bar{X} - \lambda)^2.$
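As a sketch of the computation (with simulated rather than WCB data), the test reduces to a standardized dispersion check: the statistic is near zero for equidispersed data and strongly positive for overdispersed data.

```python
import math
import random

def rpois(rng, mu):
    # Poisson sampler by inversion; adequate for small means.
    x, p, u = 0, math.exp(-mu), rng.random()
    c = p
    while u > c:
        x += 1
        p *= mu / x
        c += p
    return x

def poisson_dispersion_stat(xs):
    """U_n(xbar): the sum of (x - xbar)^2 - x, which has mean zero under a
    Poisson model, standardized by the root of its quadratic variation."""
    xbar = sum(xs) / len(xs)
    terms = [(x - xbar) ** 2 - x for x in xs]
    return sum(terms) / math.sqrt(sum(t * t for t in terms))

rng = random.Random(0)
pois = [rpois(rng, 4.0) for _ in range(5000)]                     # equidispersed
over = [rpois(rng, rng.choice([2.0, 6.0])) for _ in range(5000)]  # overdispersed
null_stat = poisson_dispersion_stat(pois)
over_stat = poisson_dispersion_stat(over)
print(round(null_stat, 1), round(over_stat, 1))
```

The first statistic behaves like a standard normal draw, while the second is many standard deviations above zero, which is the pattern seen for data set 2A.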
The results of the information matrix test are summarized in Table 8.3.3. In the
case of data set 2A, the P-value is small which suggests the arrivals are unlikely to be
Poisson.
Finally we check the assumption of independent arrivals. None of the first 12
sample autocorrelations for data sets 1A, 3A and 5A were significant at the 5% level.
Note for data set 3A we used the sample autocorrelations of the residuals. For data set 2A
lags 1 and 12 had autocorrelations that were significant at the 5% level. This indicates
possible seasonality as well as violation of the independence assumption. For data set 4A
the sixth lag was significant at the 5% level, indicating possible seasonality.

To conclude, the Poisson AR(1) model adequately estimates the arrival parameters.
All of the data sets except data set 2A appear to be Poisson and independent.
Table 8.3.1 This table summarizes the parameter estimation for the arrival processes in data sets 1A to 5A; included are the parameter estimates and 95% confidence intervals. The last two columns contain estimated arrival rates and 95% confidence intervals from the Poisson AR(1) model.
Table 8.3.2 This table displays the seasonal arrival rate for data set 3A.

Month       Arrival Rate     Arrival Rate
            (arrival data)   (Poisson AR(1))
January     1.987            2.353
February    1.916            2.415
March       2.095            2.737
April       2.538            3.310
May         3.235            4.060
June        4.064            4.783
July        4.735            5.177
August      4.911            5.043
September   4.490            4.450
October     3.706            3.680
November    2.908            3.000
December    2.315            2.547
Table 8.3.3 This table summarizes the information matrix test for data sets 1A-5A.

Data Set   Test Statistic   P-value
1A         -0.543           0.59
2A          2.954           0.00
3A          1.407           0.16
4A         -0.385           0.70
5A          0.170           0.87
8.4 Forecasting

In this section we calculate forecasts for the first 6 months of 1995, which are found in
Tables 8.4.1-8.4.6.

Since the models used for data sets 1*, 2, 4 and 5 are simple, that is the arrival and
departure rates are fixed over time, we can apply the forecasting techniques of Chapter 3.
For these four data sets we have calculated individual 95% confidence intervals for the k-step
ahead conditional distribution, $k = 1, 2, \ldots, 6, \infty$.
Note that in each case the 95% confidence intervals for the 6-step ahead
conditional distribution are very close to the 95% confidence intervals for the marginal
distribution. Therefore if we require forecasts beyond six months into the future we can
simply use the marginal distribution.
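These conditional distributions can be computed directly. The sketch below assumes the standard decomposition of the k-step transition into binomial survivors of the current count plus accumulated Poisson arrivals, and uses the data set 1* estimates with a final observed count of zero; it reproduces the 1-step probabilities 0.875 and 0.117 and the 6-step probability 0.838 quoted in the discussion of Figure 8.5.1.

```python
from math import comb, exp, factorial

def k_step_pmf(x_n, alpha, lam, k, max_count=30):
    """k-step ahead conditional pmf of the Poisson AR(1): survivors of
    X_n = x_n are Binomial(x_n, alpha^k) and new arrivals are Poisson with
    mean lam * (1 - alpha^k) / (1 - alpha); the pmf is their convolution."""
    p = alpha ** k
    mu = lam * (1 - p) / (1 - alpha)
    binom = [comb(x_n, b) * p**b * (1 - p) ** (x_n - b) for b in range(x_n + 1)]
    pois = [exp(-mu) * mu**a / factorial(a) for a in range(max_count + 1)]
    return [sum(binom[b] * pois[j - b] for b in range(min(x_n, j) + 1))
            for j in range(max_count + 1)]

# Data set 1* estimates (alpha ~ 0.240, lam ~ 0.134) with a final count of 0:
pmf1 = k_step_pmf(0, 0.240, 0.134, k=1)
pmf6 = k_step_pmf(0, 0.240, 0.134, k=6)
print(round(pmf1[0], 3), round(pmf1[1], 3), round(pmf6[0], 3))  # 0.875 0.117 0.838
```

As $k$ grows, $\alpha^k \to 0$ and the pmf converges to the Poisson marginal with mean $\lambda/(1-\alpha)$, which is why the 6-step ahead and marginal intervals nearly coincide.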
A few of the lower bounds in Tables 8.4.4 and 8.4.5 are negative. This is because
the confidence intervals are constructed by applying an asymptotic normal result to a
finite sample and are therefore only approximate. However, the confidence intervals that
are affected by a negative lower bound are for probabilities that are very small (less than
1%).
In Section 7.2 we show how to construct individual confidence intervals for the k-step
ahead conditional distribution when the arrival and departure rates depend on
regressors. However, to practically apply these results would require a significant amount
of programming. Therefore for data set 3 we have only calculated the k-step ahead
conditional distribution. A possible alternative for constructing confidence intervals for
the k-step ahead conditional distribution is to use a bootstrap method, see Efron (1982),
which may be more practical than the method in Section 7.2.
The marginal distribution for data set 3 is given in Table 8.4.6. Notice the
marginal mean peaks in August, which is one month after the arrival rate peaks, see Table
8.2.2. Also note, there is a fair bit of difference between the 6-step ahead conditional
distribution and the marginal distribution for June.
Table 8.4.3 The k-step ahead conditional means, medians, modes and point mass forecasts for data set 3.

Table 8.4.6 The marginal distributions for data set 3.
8.5 Gaussian AR(1) models

In this section we fit the Gaussian AR(1) model, see Example 4.1.1 for the model
definition, to our five data sets and compare the results to the Poisson AR(1) model.

Table 8.5.1 contains the parameter estimates for the Gaussian AR(1) model. Most
of the estimates are quite close to the corresponding parameter estimates of the Poisson
AR(1) model. The largest difference is in the estimate for $\alpha$ in data set 4, where the
Gaussian model estimates 0.291 and the Poisson model estimates 0.404. However, the
Gaussian estimate is still well within the 95% confidence interval for $\alpha$ given by the
Poisson model.
The Gaussian model gives wider 95% confidence intervals for $\lambda$ than the Poisson
model. In fact for data set 1* the Gaussian 95% confidence interval for $\lambda$ includes zero.
For data sets 2, 3 and 5 the Gaussian model gives wider 95% confidence intervals for $\alpha$
than the Poisson model, while for data sets 1* and 4 the Gaussian model gives narrower
95% confidence intervals for $\alpha$ than the Poisson model.
Note the parameters $\beta_0$, $\beta_1$ and $\beta_2$ for data set 3 cannot be directly compared to
the corresponding parameters in the Poisson model, since in the Gaussian model we have
not used an exponential link function. However, the estimated arrival rates for the
Gaussian model, Table 8.5.2, can be directly compared to the arrival rates of the Poisson
model. While the arrival rates calculated by the Poisson AR(1) model are slightly higher
than those calculated directly from the arrival data, the arrival rates calculated by the
Gaussian AR(1) model are slightly lower than those calculated directly from the arrival data.
The biggest advantage in using the Poisson AR(1) model over the Gaussian
AR(1) model is in the area of forecasting. Figure 8.5.1 displays Poisson AR(1) model
forecasts and Gaussian AR(1) model forecasts for the first six months of 1995 for data
set 1*. From the Poisson AR(1) model we have calculated the forecast distribution, which
is the bar portion of the chart. The three lines on the chart mark the predicted values and
the 95% prediction interval for the Gaussian AR(1) model.
The cumulative probabilities are marked on the left Y-axis. For the 1-step ahead
distribution there is a 0.875 probability that the count will be zero and a 0.117 probability
that the count will be 1, while for the 6-step ahead distribution there is a 0.838 probability
of a count of zero and a 0.148 probability of a count of 1.

The numbers for the predicted values in the Gaussian AR(1) model are labeled on
the right Y-axis. The mean predicted value is about 0.2 and the 95% prediction interval is
approximately between -0.6 and 1.0.
The lower bound of the 95% prediction interval as given by the Gaussian AR(1)
model is also negative for data sets 3, 4 and 5. In data set 2 the counts are sufficiently
high, so that the lower bound is positive. This illustrates the importance of using
appropriate distributional assumptions in modeling.
Table 8.5.1 The Gaussian AR(1) model parameter estimates for data sets 1* to 5.

Data Set  Parameter              Gaussian   Gaussian          Poisson    Poisson
                                 Estimate   95% C.I.          Estimate   95% C.I.
1*        α                       0.221     (0.047, 0.395)     0.240     (0.007, 0.472)
1*        λ                       0.136     (-0.045, 0.317)    0.134     (0.064, 0.204)
2         α                       0.449     (0.289, 0.609)     0.472     (0.344, 0.599)
2         λ                       5.400     (3.823, 6.977)     5.188     (3.898, 6.478)
3         α                       0.489     (0.330, 0.648)     0.406     (0.294, 0.519)
3         β₀ (constant)           6.147     (5.192, 7.102)     1.250     (1.039, 1.461)
3         β₁ (sin(2πt/12))       -1.683     (-2.794, -0.572)  -0.243     (-0.401, -0.051)
3         β₂ (cos(2πt/12))       -1.138     (-2.238, -0.038)  -0.315     (-0.483, -0.147)
4         α                       0.291     (0.120, 0.462)     0.404     (0.203, 0.604)
4         λ                       0.199     (0.014, 0.384)     0.170     (0.090, 0.251)
5         α                       0.591     (0.447, 0.735)     0.652     (0.539, 0.765)
5         λ                       0.376     (0.153, 0.599)     0.333     (0.209, 0.457)
Month        Arrival Rate      Arrival Rate      Arrival Rate
             Gaussian AR(1)    Poisson AR(1)     (arrival data)
January      1.871             2.353             1.987
February     2.008             2.415             1.916
March        2.449             2.737             2.095
April        3.076             3.310             2.538
May          3.720             4.060             3.235
June         4.209             4.783             4.064
July         4.412             5.177             4.735
August       4.274             5.043             4.911
September    3.833             4.450             4.490
October      3.207             3.680             3.706
November     2.563             3.000             2.908
December     2.074             2.547             2.315

The monthly arrival rates calculated from the arrival data, the Poisson AR(1) model, and the Gaussian AR(1) model.
Forecasts (data set 1*)

Figure 8.5.1 The bar chart represents the k-step ahead conditional cumulative distribution for the Poisson AR(1) model, while the line graph represents the forecasts from the Gaussian AR(1) model.
In this chapter we have illustrated the methods developed in the earlier chapters. The preliminary analysis is identical to the preliminary analysis of continuous-valued time series. This includes time series plots, as well as plots of the autocorrelation and partial autocorrelation functions. This analysis gives us a starting point for model selection.
In Chapter 4 we showed that the parameters can be estimated via maximum likelihood estimation and gave expressions for the observed Fisher information. The expected Fisher information is easily evaluated numerically, and can be inverted to get the asymptotic variance of the parameter estimates. This allows us to perform t-tests on the model parameters and is useful in determining which covariates to include in the model. For models with covariates it is more difficult to calculate the expected Fisher information. In this case the observed Fisher information can be used, and it can easily be calculated by numerically differentiating the score function.
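As a sketch of that last step, the code below (a hypothetical implementation, not the thesis's own software) builds the Poisson AR(1) transition probabilities, evaluates the conditional log-likelihood, and approximates the observed Fisher information by a central finite-difference Hessian, whose inverse gives asymptotic variances for t-tests. The count series and the evaluation point are illustrative only.

```python
import numpy as np
from math import comb, exp, factorial

def transition_prob(y, x, alpha, lam):
    # P(X_t = y | X_{t-1} = x) for the Poisson AR(1) (INAR(1)) model:
    # binomial thinning of x plus an independent Poisson(lam) innovation.
    return sum(comb(x, k) * alpha**k * (1 - alpha)**(x - k)
               * exp(-lam) * lam**(y - k) / factorial(y - k)
               for k in range(min(x, y) + 1))

def loglik(theta, data):
    # Conditional log-likelihood given the first observation.
    alpha, lam = theta
    return sum(np.log(transition_prob(data[t], data[t - 1], alpha, lam))
               for t in range(1, len(data)))

def observed_information(theta, data, h=1e-5):
    # Observed Fisher information: negative Hessian of the log-likelihood,
    # approximated by central finite differences of the log-likelihood itself.
    p = len(theta)
    H = np.zeros((p, p))
    base = np.array(theta, dtype=float)
    for i in range(p):
        for j in range(p):
            def f(di, dj):
                t2 = base.copy()
                t2[i] += di
                t2[j] += dj
                return loglik(t2, data)
            H[i, j] = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)
    return -H

# Illustrative low-count series and evaluation point (alpha, lam).
counts = [0, 1, 0, 2, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0, 1, 2, 1, 0]
info = observed_information([0.3, 0.55], counts)
se = np.sqrt(np.diag(np.linalg.inv(info)))  # asymptotic standard errors
```

In practice one would evaluate the information at the maximum likelihood estimate; the resulting standard errors feed directly into the t-tests described above.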
If the t-statistic for α is small, we can further test for independence with the tests found in Chapter 5. This was the case for data set 1*, which we concluded to be a dependent series.
Model selection is further refined with the help of the information matrix test (see Chapter 6) and the new residuals defined in Section 4.4. The information matrix test checks whether the model is sufficient to explain the variation found in the data. Patterns in the residuals may indicate the need for additional regressors. We found the simple
Poisson AR(1) model without regressors was sufficient to model data sets 1*, 2, 4 and 5. In the case of data set 3 the model required the following two seasonal regressors in the arrival process: sin(2πt/12) and cos(2πt/12).
The WCB was able to provide us with the arrival process, which the Poisson AR(1) model assumes to be latent. From these data we were able to directly estimate the arrival parameters. We found these estimates to be close to the estimates found using the aggregate data. The additional data also allowed us to check whether the arrivals were independent and Poisson. Independence was assessed using the sample autocorrelation function. We found that the arrivals for data set 2 appeared to be dependent. Further, we found that the arrivals for data set 4 may be seasonal, which was not indicated in the aggregate data. The information matrix test was used to assess the Poisson assumption, which was only rejected in the case of data set 2. We conclude that the Poisson assumption appears to be realistic for the WCB data.
In Chapter 3 we considered forecasting for the Poisson AR(1) model. If data cohesion is considered important, the k-step ahead conditional median or mode can be used as a forecast. The k-step ahead conditional mean can also be used as a forecast; however, it is not integer-valued. For low count series we proposed using the k-step ahead conditional distribution as a forecast. In the case of data sets 1*, 4 and 5 the counts are low enough that the k-step ahead conditional distribution is easy to read. However, for data sets 2 and 3 the k-step ahead conditional distribution is much harder to read, since the non-zero probabilities are spread over more non-negative integers.
The analysis in Chapter 8 concluded with a look at the results from fitting a Gaussian AR(1) model to the 5 data sets. We found that the parameter estimates were similar to the Poisson AR(1) estimates. We know from Section 4.7 that the loss of efficiency in using the Gaussian AR(1) likelihood increases with α, but that the Gaussian estimates are more robust than the Poisson estimates. With the possible exception of data set 2, our analysis has found the Poisson AR(1) model adequate. Further, in four of the five series the estimate for α was above 0.400. We therefore favor the Poisson estimates. Finally, the Gaussian AR(1) prediction intervals are meaningless, since the data are discrete valued. In fact, we saw that the Gaussian AR(1) prediction intervals can include negative values, which again is meaningless for non-negative data.
We conclude with some possible future avenues for analyzing the WCB data. Our analysis only considered simple sinusoidal regressors. Better models may be found by considering economic regressors, such as employment rates, weather conditions and sales.
The WCB often funds accident prevention programs and is interested in whether or not a program has had an effect on injury rates. Indicator regressors can be used to model the change in the arrival rates before and after the start of an accident prevention program. We can then estimate the change and test whether it is significantly different from zero.
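As a sketch of how such a test could look when the arrival counts themselves are available (as they were from the WCB): regress monthly arrivals on an intercept and a post-program indicator with a Poisson log link, then apply a Wald t-test to the indicator coefficient. Everything below, including the simulated counts and the change point, is hypothetical.

```python
import numpy as np

def poisson_glm_irls(X, y, iters=25):
    # Poisson regression with log link, fitted by iteratively reweighted
    # least squares; returns coefficient estimates and asymptotic s.e.'s.
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)
        W = mu                          # Poisson variance equals the mean
        z = X @ beta + (y - mu) / mu    # working response
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    cov = np.linalg.inv((X.T * np.exp(X @ beta)) @ X)
    return beta, np.sqrt(np.diag(cov))

# Hypothetical monthly arrival counts; the prevention program starts at
# month 12, and the indicator column switches on from that month.
rng = np.random.default_rng(0)
y = np.concatenate([rng.poisson(5.0, 12), rng.poisson(3.5, 12)])
X = np.column_stack([np.ones(24), np.arange(24) >= 12])

beta, se = poisson_glm_irls(X, y.astype(float))
t_stat = beta[1] / se[1]  # Wald t-test for the program effect
```

With this design the fitted rates reduce to the pre- and post-program sample means, so the test is simply comparing arrival rates before and after the intervention on the log scale.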
Bibliography
Al-Osh, M.A. and Alzaid, A.A. (1992). First order autoregressive time series with negative binomial and geometric marginals. Communications in Statistics A 21, 2483-2492.

Al-Osh, M.A. and Alzaid, A.A. (1987). First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis 8, 261-275.

Al-Osh, M.A. and Alzaid, A.A. (1988). On maximum likelihood estimation for a subcritical branching process with immigration. Pakistan Journal of Statistics 4, 147-156.

Alzaid, A.A. and Al-Osh, M.A. (1988). First-order integer-valued autoregressive (INAR(1)) process: Distributional and regression properties. Statistica Neerlandica 42, 53-61.

Alzaid, A.A. and Al-Osh, M.A. (1990). An integer-valued pth-order autoregressive structure (INAR(p)) process. Journal of Applied Probability 27, 314-324.

Barndorff-Nielsen, O.E. and Sorensen, M. (1994). A Review of Some Aspects of Asymptotic Likelihood Theory for Stochastic Processes. International Statistical Review 62, 133-165.

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day.

Billingsley, P. (1986). Probability and Measure, 2nd Edition. New York: John Wiley & Sons.

Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. New York: Springer-Verlag.

Brown, B.M. (1971). Martingale Central Limit Theorems. The Annals of Mathematical Statistics 42, 59-66.

Chan, K.S. and Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association 90, 242-252.

Chant, D. (1974). On asymptotic tests of composite hypotheses in nonstandard conditions. Biometrika 61, 291-298.

Chesher, A. (1983). The Information Matrix Test: Simplified Calculation Via a Score Test Interpretation. Economics Letters 13, 45-48.

Chow, G.C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591-605.

Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. London: Chapman and Hall.

Cramer, H. and Wold, H. (1936). Some theorems on distribution functions. Journal of the London Mathematical Society 11, 290-295.

Crowder, M.J. (1976). Maximum Likelihood Estimation for Dependent Observations. Journal of the Royal Statistical Society, Series B 38, 45-53.

Davidson, J. (1994). Stochastic Limit Theory. New York: Oxford University Press.

Davidson, R. and MacKinnon, J.G. (1993). Estimation and Inference in Econometrics. New York: Oxford University Press.

Efron, B. (1982). The Jackknife, the Bootstrap and other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.

Fahrmeir, L. (1988). A Note on Asymptotic Testing Theory for Nonhomogeneous Observations. Stochastic Processes and their Applications 28, 267-273.

Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. New York: Springer-Verlag.

Ferguson, T.S. (1996). A Course in Large Sample Theory. New York: Chapman and Hall.

Godambe, V.P. and Heyde, C.C. (1987). Quasi-likelihood and optimal estimation. International Statistical Review 55, 231-244.

Godfrey, L.G. (1988). Misspecification tests in econometrics: The Lagrange multiplier principle and other approaches. New York: Cambridge University Press.

Hall, P. and Heyde, C.C. (1980). Martingale Limit Theory and its Application. New York: Academic Press.

Hall, W.J. and Mathiason, D.J. (1990). On Large-Sample Estimation and Testing in Parametric Models. International Statistical Review 58, 77-97.

Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press.

Harvey, A.C. (1981). Time Series Models. Oxford: Philip Allan.
Harvey, A.C. and Fernandes, C. (1989). Time series models for count or qualitative observations. Journal of Business & Economic Statistics 7, 407-417.

Hillier, F.S. and Lieberman, G.J. (1986). Introduction to Operations Research, 4th edition. Oakland: Holden-Day.

Jacobs, P.A. and Lewis, P.A.W. (1977). A mixed autoregressive-moving average exponential sequence and point process (EARMA 1,1). Advances in Applied Probability 9, 87-104.

Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. I: Correlation and runs properties. Journal of the Royal Statistical Society, Series B 40, 94-105.

Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. II: Asymptotic properties. Journal of the Royal Statistical Society, Series B 40, 222-228.

Jacobs, P.A. and Lewis, P.A.W. (1983). Stationary discrete autoregressive-moving average time series generated by mixtures. Journal of Time Series Analysis 4, 19-36.

Jin-Guan, D. and Yuan, L. (1991). The integer-valued autoregressive (INAR(p)) model. Journal of Time Series Analysis 12, 129-142.

Joe, H. (1997). Multivariate models and dependence concepts. London: Chapman & Hall.

Joe, H. (1996). Time series models with univariate margins in the convolution-closed infinitely divisible class. Journal of Applied Probability 33, 664-677.

Jørgensen, B., Lundbye-Christensen, S., Song, X.-K. and Sun, L. (1995). A state space model for multivariate longitudinal count data. Technical Report #148, Department of Statistics, University of British Columbia.

Jørgensen, B. and Song, X.-K. (1998). Stationary time-series models with exponential dispersion model margins. Journal of Applied Probability 35 (to appear).

Kalman, R.E. (1987). Regression methods for non-stationary categorical time series: Asymptotic estimation theory. Annals of Statistics 17, 79-98.

Klimko, L.A. and Nelson, P.I. (1978). On conditional least squares estimation for stochastic processes. The Annals of Statistics 6, 629-642.

Little, J.D.C. (1961). A proof for the queuing formula L = λW. Operations Research 9, 383-387.

MacDonald, I.L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-Valued Time Series. Monographs on Statistics and Applied Probability 70. London: Chapman and Hall.

McCabe, B. and Tremayne, A. (1993). Elements of modern asymptotic theory with statistical applications. Manchester: Manchester University Press.

McCabe, B. and Leybourne, S. (1996). A General Test. Working Paper.

McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability 20, 822-835.

McKenzie, E. (1986). Autoregressive moving-average processes with negative-binomial and geometric marginal distributions. Advances in Applied Probability 18, 679-705.

McLeish, D.L. (1975). A maximal inequality and dependent strong laws. Annals of Probability 3, 826-836.

Pierce, D.A. (1982). The Asymptotic Effect of Substituting Estimators for Parameters in Certain Types of Statistics. The Annals of Statistics 10, 475-478.

Ross, S.M. (1983). Stochastic Processes. New York: John Wiley & Sons.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons.

Song, X.-K. (1996). Some statistical models for the multivariate analysis of longitudinal data. Ph.D. thesis. Department of Statistics, University of British Columbia.

Sprott, D.A. (1983). Estimating the parameters of a convolution by maximum likelihood. Journal of the American Statistical Association 78, 457-460.

Stout, W.F. (1974). Almost Sure Convergence. New York: Academic Press.

White, H. (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica 50, 1-25.

White, H. (1984). Asymptotic Theory for Econometricians. New York: Academic Press.

White, H. and Domowitz, I. (1984). Nonlinear regression with dependent observations. Econometrica 52, 143-161.

Wooldridge, J.M. (1991a). On the application of robust, regression-based diagnostics to models of conditional means and conditional variances. Journal of Econometrics 47, 5-46.

Wooldridge, J.M. (1991b). Specification testing and quasi-maximum-likelihood estimation. Journal of Econometrics 48, 29-55.

Zeger, S.L. (1988). A regression model for time series of counts. Biometrika 75, 621-629.

Zeger, S.L., Liang, K.-Y. and Self, S.G. (1985). The analysis of binary longitudinal data with time independent covariates. Biometrika 72, 31-38.
Appendix
The following is a list of the data sets used in this thesis. Data set 0 refers to the illustrative data set introduced in Section 2.5, and unlike the other series starts at January 1987.
O 1 1' 2 3 4 5 1A 2A 3A 4A 5A Jan-85 0 0 9 6 0 0 0 2 2 0 0 Feb-435 0 0 6 7 1 0 0 0 3 1 0 Mar-85 0 0 6 8 0 1 0 3 4 0 1 Apr-85 0 0 7 9 0 1 0 3 5 0 0 May-85 0 0 1 0 6 1 1 0 8 1 1 0 3un-85 0 0 8 8 0 1 0 2 4 0 0 JuI-85 O O 1 4 5 O 1 O 1 0 4 O O
Aug-85 0 0 8 3 0 1 0 4 1 0 0 Sep85 0 0 7 7 0 0 0 5 4 0 0 Oct-85 O 0 1 0 1 1 O 1 0 8 8 0 1 Nov-85 0 0 1 0 8 1 1 0 9 5 1 0 Dec-85 0 0 1 2 4 0 2 0 6 3 0 2 Jan-86 0 0 8 2 0 0 0 6 1 0 0 Feb-86 0 0 8 3 0 0 0 4 2 0 0 Mar-86 1 1 8 4 0 0 1 5 3 0 0 Apr-86 1 1 8 5 1 0 0 4 4 1 0 May-86 1 1 1 3 7 1 1 1 8 2 1 0 Jun-86 1 1 1 2 8 0 0 1 8 5 0 0 Jul-86 0 0 1 4 1 2 0 O 0 7 8 0 0
Aug-86 0 0 1 3 1 1 O 1 0 6 6 0 1 Sep-86 0 0 1 3 1 2 0 1 0 6 7 0 1 Oct-86 0 0 8 6 1 1 0 3 5 1 0 Nov-86 0 0 1 3 2 1 1 0 8 1 1 0 Dec-86 1 1 1 0 2 0 1 1 3 0 0 0 Jan-87 6 1 1 1 2 3 O O 1 5 2 O O Feb-8711 O O 1 2 3 O O O 7 1 O O M a r - 8 7 5 0 0 9 5 0 0 0 4 2 0 0 A p r - 8 7 5 0 0 8 6 0 1 0 4 2 0 1 May-87 5 O O 13 13 2 O O 10 9 2 O Jun-87 2 O O 9 1 2 O O O 4 6 O O JuI-87 7 O O 8 2 1 O O O 3 15 O O
A u g - 8 7 4 0 0 6 9 0 0 0 3 3 0 0 Sep-87 5 O O 7 1 1 1 O O 4 6 1 O Ott-87 4 O O 1 0 11 O O O 8 7 O O Nov-87 6 1 1 17 10 O 2 1 11 5 O 2 Dec-87 8 O O 1 1 8 O 1 O 6 2 O 1 Jan-88 7 1 1 1 3 5 O O 1 8 2 O 0 Feb-88 7 O O 1 0 4 O O O 4 2 O O M a r - 8 8 9 1 1 9 4 0 2 0 5 3 0 2 Apr-88 9 2 2 1 5 4 O 2 2 7 2 O 1 May-88 13 O O 1 3 2 O 2 O 8 1 O O Jun-88 12 O O 1 2 9 O 2 O 9 8 O 1 Ju l - 8811 O O 8 8 O 1 O 2 5 O O