National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
This thesis examines the statistical properties of the Poisson AR(1) model of Al-Osh and Alzaid (1987) and McKenzie (1988). The analysis includes forecasting, estimation, testing for independence and specification, and the addition of regressors to the model.
The Poisson AR(1) model is an infinite server queue, and as such is well suited for modeling short-term disability claimants who are waiting to recover from an injury or illness. One of the goals of the thesis is to develop statistical methods for analyzing series of monthly counts of claimants collecting short-term disability benefits from the Workers' Compensation Board (WCB) of British Columbia.
We consider four types of forecasts: the k-step ahead conditional mean, median, mode and distribution. For low count series the k-step ahead conditional distribution is practical and much more informative than the other forecasts.
We consider three estimation methods: conditional least squares (CLS), generalized least squares (GLS) and maximum likelihood (ML). In the case of CLS estimation we find an analytic expression for the information, and in the GLS case we find an approximation for the information. We find neat expressions for the score function and the observed Fisher information matrix. The score expressions lead to new definitions of residuals.
Special care is taken to test for independence since the test is on the boundary of the parameter space. The score test is asymptotically equivalent to testing whether the CLS estimate of the correlation coefficient is zero. Further, we define a Wald and likelihood ratio test.
Then we use the general specification test of McCabe and Leybourne (1996) to test whether the model is sufficient to explain the variation found in the data.
Next we add regressors to the model and update our earlier forecasting, estimation
and testing results. We also show the model is identifiable.
We conclude with a detailed application to monthly WCB claims counts. The preliminary analysis includes plots of the series, autocorrelation function and partial autocorrelation function. Model selection is based on the preliminary analysis, t-tests for the parameters, the general specification test and residuals. We also include forecasts for the first six months of 1995.
Table of Contents
Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
    1.1 General
    1.2 Overview of topics
2 Poisson AR(1) model
    2.1 Model definition
    2.2 Interpretation
    2.3 Basic properties
    2.4 Poisson AR(p) model
    2.5 An illustrative example
3 Forecasting
    3.1 Minimum mean squared error
    3.2 Minimum mean absolute error
    3.3 Forecast distributions
    3.4 Prediction intervals
    3.5 Duration
4 Estimation
    4.1 Likelihood theory and estimating functions
    4.2 Conditional least squares for the Poisson AR(1) model
    4.3 Generalized least squares for the Poisson AR(1) model
    4.4 The score and Fisher information for the Poisson AR(1) model
    4.5 The score and Fisher information for a general AR(1) model
    4.6 Asymptotics of the conditional maximum likelihood estimators for the Poisson AR(1) model
    4.7 Comparison of methods
5 Testing for independence
    5.1 Gaussian AR(1)
    5.2 Conditional least squares
    5.3 Score test
    5.4 The score function on the boundary of the parameter space
6 General misspecification test
    6.1 Overview
    6.2 Outline of the test
    6.3 Details for the Poisson AR(1) model
7 Models with covariates
    7.1 Model definition and introduction
    7.2 Forecasting
    7.3 Estimation
    7.4 Testing
8 Application to counts of workers collecting disability benefits
    8.1 Workers' Compensation Data
    8.2 Model selection and testing
    8.3 Arrival process
    8.4 Forecasting
    8.5 Gaussian AR(1) models
Bibliography
Appendix
List of Tables
3.3.1 k-step ahead conditional means, medians, modes and point mass forecasts
3.4.1 95% prediction intervals for the k-step ahead conditional distribution
4.7.1 The diagonal elements of the Godambe information matrix for GLS, CLS and CML when α = 0.3 and λ = 1
4.7.2 The diagonal elements of the Godambe information matrix for GLS, CLS and CML when α = 0.7 and λ = 1
5.4.1 Tests for independence in the illustrative data set
5.4.2 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0 and λ = 1
5.4.3 The percentage of time the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0.1 and λ = 1
8.1.1 A summary of simple statistics for data sets 1 through 5
8.2.1 Tests for independence in data set 1*
8.2.2 This table displays the seasonal arrival rate for data set 3
8.2.3 This table summarizes the parameter estimation for data sets 1 to 5; included are the parameter estimates and the upper and lower 95% confidence limits
8.2.4 This table summarizes the joint information matrix test of models 1, 2 and 3 on data set 3
8.2.5 This table contains the mean duration and 95% confidence interval for the mean duration for data sets 1*, 2, ..., 5
8.3.1 This table summarizes the parameter estimation for the arrival processes in data sets 1A to 5A; included are the parameter estimates and 95% confidence intervals. The last two columns contain estimated arrival rates and 95% confidence intervals from the Poisson AR(1) model
8.3.2 This table displays the seasonal arrival rate for data set 3A
8.3.3 This table summarizes the information matrix test for data sets 1A-5A
8.4.1 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 1*
8.4.2 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 2
8.4.3 The k-step ahead conditional means, conditional medians and point mass forecasts for data set 3
8.4.4 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 4
8.4.5 Individual 95% confidence intervals for the k-step ahead conditional distribution for data set 5
8.4.6 The marginal means, medians and distributions for data set 3
8.5.1 The Gaussian AR(1) model parameter estimates for data sets 1 to 5
8.5.2 This table displays the seasonal arrival rate for data set 3 given by the Gaussian AR(1) model
List of Figures
2.5.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
2.5.2 Correlogram for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
2.5.3 Sample partial autocorrelation function for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
3.3.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, employed in the heavy manufacturing industry and collecting STWLB due to a burn related injury
4.4.1 Residual plot of the continuation process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
4.4.2 Residual plot of the arrival process for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
4.4.3 Autocorrelations in the continuation residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
4.4.4 Autocorrelations in the arrival residuals for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury
4.7.1 The asymptotic efficiency of conditional least squares as a function of α when λ = 1
4.7.2 The asymptotic efficiency of conditional least squares as a function of λ
4.7.3 Box plots comparing the sampling distributions of α̂ when the arrival process {εt} is uniform over {0,1,2}
4.7.4 Box plots comparing the sampling distributions of λ̂ when the arrival process {εt} is uniform over {0,1,2}
5.2.1 A comparison of the power for the Gaussian and Poisson based tests as a function of α; λ = 1 and n = 100
5.2.2 A comparison of the power for the Gaussian and Poisson based tests as a function of λ; α = 0.01 and n = 100
5.4.1 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0 and λ = 1
5.4.2 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0.1 and λ = 1
8.1.1 A time series plot of data set 1
8.1.2 A time series plot of data set 2
8.1.3 A time series plot of data set 3
8.1.4 A time series plot of data set 4
8.1.5 A time series plot of data set 5
8.1.6 ACF's and PACF's for data sets 1 to 3
8.1.7 ACF's and PACF's for data sets 4 to 5
8.1.8 ACF's and PACF's for data set 1*
8.2.1 Pearson, continuation and arrival residuals plotted against time for data set 1*
8.2.2 Pearson, continuation and arrival residuals plotted against time for data set 2
8.2.3 Pearson, continuation and arrival residuals plotted against time for data set 3
8.2.4 Pearson, continuation and arrival residuals plotted against model regressors in data set 3
8.2.5 Pearson, continuation and arrival residuals plotted against time for data set 4
8.2.6 Pearson, continuation and arrival residuals plotted against time for data set 5
8.5.1 The bar chart represents the k-step ahead conditional cumulative distribution for the Poisson AR(1) model, while the line graph represents the forecasts from the Gaussian AR(1) model
Acknowledgements

I thank my supervisors, Dr. Brendan McCabe and Dr. Marty Puterman, for their encouragement and guidance throughout my graduate career at UBC.

I would like to acknowledge the assistance and valuable comments of another member of my supervising committee, Dr. Bent Jørgensen.

I would like to thank the Workers' Compensation Board of British Columbia for providing the data, which motivated this thesis. I further thank the Workers' Compensation Board of British Columbia for providing me with some financial support.

I thank my wife Mary Kelly for her support and patience. Our first son Terry Freeland was born during our stay at UBC and our second son Sandy Freeland was born only a few weeks ago. Both boys have brought us vast amounts of joy and happiness.

Finally I would like to thank my supervisors, Dr. Brendan McCabe and Dr. Marty Puterman, for their financial support to me in the past six years.
Chapter 1
1. Introduction
1.1 General
Count data is generated when the number of occurrences of an event type is recorded. Examples include the monthly number of motor vehicle accidents, the annual number of murders, the number of goals scored in a hockey game, the number of stolen bases by a baseball player in a season and the monthly number of new cancer patients. Often the prime objective in studying count data is to understand how counts are changing over time, for instance seasonal patterns such as more suicides in February, and trends such as increases in the number of cancer patients. For this reason count data is often collected over time to form a time series. Often the term discrete time series is used and includes more general situations in which the discrete values are not necessarily counts.
Methods for examining and modeling time series have been around a long time. The most popular method for modeling time series is the Box-Jenkins modeling procedure (ARIMA models), developed by Box and Jenkins (1970). The limitation of these models is that the random variation in the series is assumed to be normally distributed. When counts are "large" it is often assumed that time series of counts can be adequately approximated by continuous time series which are normally distributed. The reasoning for this is that many common distributions for count data (e.g. binomial, Poisson and negative binomial) have an approximate normal distribution when the distribution mean is "large". Often, however, series have very small counts or even many consecutive zeroes. Series such as the monthly number of polio cases in the US, or the monthly number of new cases of women with AIDS in Vancouver, will typically contain many consecutive months with no occurrences.
Because of the inappropriateness of classic time series methods for the study of time series of count data, much effort has been and continues to be put into the development of the methodology necessary to study such time series. It is important to find out when and to what extent classical time series methods fail. Throughout the thesis we will compare our results for the Poisson AR(1) model to the results one would get by incorrectly using a Gaussian AR(1) model. For example, we get inefficient parameter estimates when estimation is based on the pseudo Gaussian AR(1) likelihood. However, when it comes to testing for independence in a Poisson AR(1) time series, one can incorrectly assume a Gaussian AR(1) model with little consequence.
Several authors have studied the time series of polio counts in the US; see Zeger (1988), Chan and Ledolter (1995), Jørgensen et al. (1995) and Song (1996). These works are based on the theory of state space models. For an overview of state space models see Fahrmeir and Tutz (1994). Other authors have taken a different approach to modeling count time series, by trying to develop models with properties similar to those of the popular ARIMA models. See Al-Osh and Alzaid (1987), McKenzie (1988), Joe (1996 & 1997), Jørgensen and Song (1995) and Song (1996).
One of the goals of this thesis is to develop models to use in an analysis of data on short-term disability (STD) claim counts at the Workers' Compensation Board of British Columbia (WCB). The WCB provides disability insurance for more than 130,000 employers in British Columbia. Every year the WCB receives about 200,000 new claims. There are two broad types of disability claims: health care only claims and wage loss claims. In this thesis, claims counts for those workers collecting Short Term Wage Loss Benefits (STWLB) will be examined.
A large portion of the economy in British Columbia has historically been based on the resource sector. In industries such as forestry, fishing and mining, injuries such as broken bones, cuts, dislocations and sprains are predominant. Lately the depletion of natural resources and other environmental concerns have reduced the activities of many resource industries. This, however, has not brought a decrease in the number of claims, since injuries from other industries are becoming more prevalent. For example, the number of repetitive strain injuries suffered by store clerks and computer programmers has increased dramatically. About half of all the claims at the WCB are strains, and of these about half are back strains. The data which have been provided by the WCB contain the monthly count of the number of workers collecting STWLB. These data are grouped by 47 injuries, 16 industries, 10 claims centres, 2 genders and 4 age groups, resulting in 47×16×10×2×4 = 60,160 separate series. Of course many of the series have no occurrences at all.
Each month the number of workers collecting STWLB is composed of two components: new claims and continuing claims. Since the number of continuing claims this month depends on the total number collecting claims last month, these data have an auto-regressive structure of order 1. The count AR(1) model discussed in Al-Osh and Alzaid (1987) is appropriate for studying such data. Much of the work done to date on such AR(1) models for count data has concerned probabilistic issues, such as correlation structure, distribution and time reversibility. Little or no attention has been devoted to issues of inference and forecasting. There is a need for statisticians and econometricians to study these models keeping statistical and practicality issues in mind. This thesis focuses on forecasting, estimation, inference, covariates, residual analysis, and the fitting of the model to actual data.
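The two-component structure just described (continuing claims via binomial thinning, plus Poisson arrivals) can be sketched as a short simulation. This is an illustrative sketch, not code from the thesis; the function name `simulate_poisson_ar1` and the parameter names `alpha` (continuation probability) and `lam` (arrival rate) are ours.

```python
import numpy as np

def simulate_poisson_ar1(alpha, lam, n, rng):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where 'alpha ∘' is binomial
    thinning (each of last month's claimants continues with probability
    alpha) and eps_t ~ Poisson(lam) counts the new claims each month."""
    x = np.empty(n, dtype=int)
    # start from the stationary marginal, which is Poisson(lam / (1 - alpha))
    x[0] = rng.poisson(lam / (1.0 - alpha))
    for t in range(1, n):
        continuing = rng.binomial(x[t - 1], alpha)  # claims carried over
        arrivals = rng.poisson(lam)                 # new claims this month
        x[t] = continuing + arrivals
    return x

series = simulate_poisson_ar1(alpha=0.5, lam=1.0, n=5000,
                              rng=np.random.default_rng(0))
```

With α = 0.5 and λ = 1 the stationary marginal is Poisson(λ/(1 − α)) = Poisson(2), so the long-run sample mean should sit near 2.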
1.2 Overview of topics
The outline of this thesis is as follows. In Chapter 2 we begin by presenting the Poisson AR(1) model of Al-Osh and Alzaid (1987). The appropriateness of fitting such a model to the WCB claim data is discussed. We can think of disabled workers as remaining in a queue waiting to get healthy. Under certain assumptions this queuing process is equivalent to the Poisson AR(1) model. This is important since it gives justification for the use of binomial thinning to model the correlation in these data. We continue the chapter by reviewing known properties of the Poisson AR(1) model. Then we discuss three extensions of this model to a Poisson AR(p) model that are found in the literature. Our contribution is to show that the partial autocorrelation function can be used to select the number of lags in the Poisson AR(p) model of Jin-Guan and Yuan (1991). That is, the partial autocorrelations beyond lag p are zero, as they are for the Gaussian AR(p) model. The chapter concludes with an illustration applied to one of the WCB data series.
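The lag-selection claim — that the Poisson AR(p) partial autocorrelations vanish beyond lag p, just as in the Gaussian case — can be checked numerically. Below is a sketch of ours (not code from the thesis) that computes the sample PACF via the Durbin-Levinson recursion on a simulated Poisson AR(1) series; beyond lag 1 the values should be near zero.

```python
import numpy as np

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion:
    phi_{k,k} = (rho_k - sum_j phi_{k-1,j} rho_{k-j}) / (1 - sum_j phi_{k-1,j} rho_j)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    acf = np.array([np.sum(x[: n - k] * x[k:]) / np.sum(x * x)
                    for k in range(max_lag + 1)])
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    prev = np.zeros(max_lag + 1)          # phi_{k-1, j} coefficients
    for k in range(1, max_lag + 1):
        num = acf[k] - np.sum(prev[1:k] * acf[1:k][::-1])
        den = 1.0 - np.sum(prev[1:k] * acf[1:k])
        phi = prev.copy()
        phi[k] = num / den                # the lag-k partial autocorrelation
        phi[1:k] = prev[1:k] - phi[k] * prev[1:k][::-1]
        pacf[k] = phi[k]
        prev = phi
    return pacf

# simulate a Poisson AR(1) series: binomial thinning plus Poisson arrivals
rng = np.random.default_rng(1)
alpha, lam, n = 0.6, 1.0, 4000
x = [rng.poisson(lam / (1 - alpha))]
for _ in range(n - 1):
    x.append(rng.binomial(x[-1], alpha) + rng.poisson(lam))

pacf = sample_pacf(x, max_lag=5)
# pacf[1] should be near alpha = 0.6; pacf[2..5] should be near zero
```

This mirrors how the sample PACF would be read off a correlogram to choose p for one of the WCB series.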
We consider Chapter 3 to be a major contribution. In this chapter we consider how to generate forecasts from the Poisson AR(1) model. Two criteria are considered for finding optimal forecasts: the minimum squared error and minimum absolute error of the forecasts. The squared error approach results in the conditional mean as the optimal forecast, while the absolute error approach yields the conditional median as the optimal forecast. It is interesting to note that the conditional median is an integer, which may be desirable from the point of view of data cohesion. We derive formulas for the k-step ahead conditional mean and variance as well as the k-step ahead conditional moment generating function. Our analysis is concerned with count series, more specifically low count series. That is, the number of possible outcomes for the k-step ahead count is small. In such cases it is possible and more useful to calculate the probability mass for each of these outcomes, which we call point mass forecasting. We show how to find confidence intervals around each point mass forecast when parameter estimates are used. The chapter concludes by finding confidence intervals for the mean duration of claims.
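For the Poisson AR(1) model the k-step ahead conditional distribution has a known closed form: given X_t = x, X_{t+k} is the convolution of a Binomial(x, α^k) count of still-continuing claimants and a Poisson arrival count with mean λ(1 − α^k)/(1 − α). The sketch below (our function names, not the thesis's) computes this point mass forecast and reads off the conditional mean and the integer-valued conditional median.

```python
import math

def binomial_pmf(j, n, p):
    return math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)

def poisson_pmf(j, mu):
    return math.exp(-mu) * mu ** j / math.factorial(j)

def k_step_pmf(x, alpha, lam, k, max_count=30):
    """Point mass forecast P(X_{t+k} = m | X_t = x) for the Poisson AR(1)
    model: the convolution of Binomial(x, alpha**k) (claimants still
    continuing after k months) and Poisson(lam*(1-alpha**k)/(1-alpha))
    (arrivals over the k months), truncated at max_count."""
    p = alpha ** k
    mu = lam * (1.0 - p) / (1.0 - alpha)
    return [sum(binomial_pmf(j, x, p) * poisson_pmf(m - j, mu)
                for j in range(min(m, x) + 1))
            for m in range(max_count + 1)]

pmf = k_step_pmf(x=3, alpha=0.5, lam=1.0, k=2)
cond_mean = sum(m * pm for m, pm in enumerate(pmf))
cond_median = next(m for m in range(len(pmf)) if sum(pmf[: m + 1]) >= 0.5)
# analytic check: E[X_{t+k} | X_t = x] = alpha**k * x + lam*(1-alpha**k)/(1-alpha)
```

Here the analytic conditional mean is 0.25·3 + 1.5 = 2.25, while the median forecast is the integer 2 — illustrating why the point mass forecast is the more informative summary for low counts.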
Chapter 4 begins with a brief review of likelihood theory and estimating functions and follows the exposition in Barndorff-Nielsen and Sørensen (1994) and Godambe and Heyde (1987). This theory is used to unify the following methods of estimation and inference for this model: conditional least squares (CLS), generalized least squares (GLS), and conditional maximum likelihood (CML). Al-Osh and Alzaid (1987) considered estimation by CLS and CML, but ignored inference except to refer to the general results for CLS in Klimko and Nelson (1978). Wooldridge (1991) considers GLS estimation and inference for parametric models of conditional means and variances. Our contributions include the following: we prove a strong law of large numbers which requires simple moment restrictions that typically hold for autoregressive processes. Then we find an analytic expression for the expected information matrix in the CLS case and derive an approximation for the expected information matrix in the GLS case. Further, we find neat forms for expressing the score function and observed Fisher information matrix in terms of conditional expectations. This result generalizes to other cases where each observation is a drawing from a convolution of two distributions. It is relatively easy to numerically calculate the expected Fisher information. This makes Fisher scoring much faster than using the observed information in a Gauss-Newton method. We also show that the Poisson AR(1) model is α-mixing. We directly compare the asymptotic efficiency of CLS to CML, and find that the efficiency is one when α = 0 and decreases as α increases. Finally, we use a simulation to show that the methods of CLS and GLS are more robust to model misspecification than CML.
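Of the three estimation methods, CLS is the simplest to illustrate: since E[X_t | X_{t−1}] = αX_{t−1} + λ, minimizing the conditional sum of squares reduces to an ordinary least squares regression of X_t on X_{t−1}. A sketch of ours (not the thesis's code), recovering the parameters from a simulated series:

```python
import numpy as np

def cls_estimates(x):
    """Conditional least squares for the Poisson AR(1) model: because
    E[X_t | X_{t-1}] = alpha * X_{t-1} + lam, minimizing the sum of squared
    one-step prediction errors is an OLS regression of X_t on X_{t-1}."""
    x = np.asarray(x, dtype=float)
    y, z = x[1:], x[:-1]                  # response and lagged regressor
    alpha_hat = (np.sum((z - z.mean()) * (y - y.mean()))
                 / np.sum((z - z.mean()) ** 2))
    lam_hat = y.mean() - alpha_hat * z.mean()
    return alpha_hat, lam_hat

# simulated Poisson AR(1): binomial thinning plus Poisson arrivals
rng = np.random.default_rng(42)
alpha_true, lam_true = 0.5, 1.0
x = [rng.poisson(lam_true / (1 - alpha_true))]
for _ in range(4999):
    x.append(rng.binomial(x[-1], alpha_true) + rng.poisson(lam_true))

alpha_hat, lam_hat = cls_estimates(x)
```

The estimates should land close to (0.5, 1.0) for a series of this length; GLS and CML refine this by weighting by, or fully using, the conditional distribution.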
To date in the literature no one has considered any statistical tests on the Poisson AR(1) model. In Chapter 5 we develop tests to check if a series is independent. This is important since it would be inappropriate to fit an autoregressive model to a series of independent random variables. We note that special care must be taken since we are testing to see if the parameter value lies on the boundary of its domain.
A test based on the score is developed and found to be asymptotically equivalent to a one sided test of the least squares estimator for α. In fact the difference between the score and CLS statistics is in the denominators, which are respectively the sample mean and variance. Since under the null hypothesis the series consists of independent and identically distributed Poisson random variables, the sample mean and variance should be asymptotically equivalent. However, if the data are over-dispersed then the sample mean will be smaller than the sample variance, so the score based statistic will be larger than the CLS statistic. It is further found that the score is a well defined martingale for values of α in a neighborhood of zero, and hence a Taylor series expansion of the score function around the point α = 0 is also well defined. This leads to one sided score, Wald and likelihood ratio tests.
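The relationship between the two statistics can be made concrete: both share the lag-1 cross-product numerator, with the CLS version standardized by the sample variance and the score version by the sample mean. The sketch below is ours, not the thesis's exact formula; in particular the √n scaling is the usual one for a lag-1 autocorrelation statistic and is an assumption here.

```python
import numpy as np

def independence_stats(x):
    """Two statistics for H0: alpha = 0 (an i.i.d. Poisson series).
    Both share the lag-1 cross-product numerator; the CLS version divides
    by the sample variance, the score version by the sample mean.  Under
    H0 they agree asymptotically (Poisson mean equals variance), while
    over-dispersion (variance > mean) inflates the score statistic
    relative to the CLS one."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    num = np.sum((x[1:] - xbar) * (x[:-1] - xbar)) / n
    cls_stat = np.sqrt(n) * num / np.var(x)   # denominator: sample variance
    score_stat = np.sqrt(n) * num / xbar      # denominator: sample mean
    return cls_stat, score_stat

# Under H0 both statistics are approximately N(0, 1); a one-sided 5% test
# rejects independence when the statistic exceeds 1.645.
rng = np.random.default_rng(7)
cls_stat, score_stat = independence_stats(rng.poisson(1.0, size=2000))
```

For an i.i.d. Poisson sample the two statistics should be nearly identical; feeding in over-dispersed counts (e.g. negative binomial) would pull them apart in the direction described above.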
Chapter 6 considers a general specification test for the model. Having satisfied ourselves that a series is dependent, is it adequately modeled by the Poisson AR(1) process? This general specification test is often referred to as the information matrix test. Basically it tests whether the difference between the two representations of the information matrix (sometimes referred to as the inner and outer products) is distributed about zero, which should be the case if the model is correctly specified. Chesher (1983) showed that the information matrix test can be interpreted as a score test, and amounts to testing whether the model parameters are stochastic. This test will therefore indicate
whether or not over-dispersion is present in the data. McCabe and Leybourne (1996) extended Chesher's result to a much more general setting, including time series of dependent non-homogeneous random vectors. In this chapter we outline the result of McCabe and Leybourne and present the details for the Poisson AR(1) model. To assess the test we simulate 100 series for which the Poisson model is the correct specification and 100 series for which the Poisson model is a misspecification. We find that the level of the test is approximately correct for finite series of length 200 and that the test has strong power.
In Chapter 7 we extend the model to include covariates. It is the first time anyone
has considered adding covariates to both the continuation process and the arrival process.
Joe (1996) briefly considers the addition of covariates to the mean of the process. Note
that Joe's parameterization is slightly different than ours. We find the k-step ahead moment
generating function. From this we get the k-step ahead conditional mean, variance and
distribution. Further we find the marginal distribution. Then we show how to construct
individual confidence intervals around the point mass forecasts (k-step ahead conditional
distribution). Next we give necessary and sufficient conditions for the model to be
identifiable. We further give conditions for the model to be α-mixing. If the covariates
are such that the model is identifiable, α-mixing and stationary then the maximum
likelihood estimates are asymptotically normal. We conclude the chapter by outlining
testing for independence and for specification.
Joe (1997) applied the Poisson AR(1) model to daily counts of children reporting
symptoms associated with air pollution. The advantage of the WCB data over the data
used by Joe (1997) is that the WCB data can be interpreted as a queue and hence justifies
our use of the Poisson AR(1) model. In Chapter 8 we apply the methods in the previous
chapters to modeling and forecasting five monthly time series of counts of disabled
workers collecting STWLB. Our analysis includes time-series plots as well as plots of the
sample autocorrelation function and partial autocorrelation function. All five series
appear to be autoregressive of order 1 and one of the series appears to be seasonal. Model
selection is based on t-ratios, the information matrix specification test and residual
analysis. We give a detailed discussion of the interpretation of the residuals.
For the five WCB series we analyze in this chapter we were able to obtain
additional data about the number of arrivals each month. The previously analyzed data
contain the total number collecting each month, that is, the number continuing plus the
number of arrivals. We use these data to assess whether the arrivals are Poisson. In
addition we directly estimate the arrival parameters and compare them to the estimates
obtained using the Poisson AR(1) model.

Next we calculate point mass forecasts for the five series. We note that the k-step
ahead distribution quickly approaches the marginal distribution. We conclude the chapter
by comparing the results of the Poisson AR(1) model to the results that would come from
assuming a Gaussian AR(1) model.
2. Poisson AR(1) model
In this chapter, we begin by presenting the Poisson AR(1) model, Al-Osh and Alzaid
(1987). The appropriateness of fitting such a model to the WCB claims data is discussed.
We can think of workers collecting STWLB as waiting in a queue. When they begin
receiving benefits they enter the queue and when they stop receiving benefits they exit the
queue. Under certain assumptions this queuing process is equivalent to the Poisson AR(1)
model. Next we review some basic properties of the model, which are shown to follow
from the moment generating function. We then discuss extensions of the model to a
Poisson AR(p) model which has properties similar to the Gaussian AR(p) model. We
conclude the chapter with an illustrative example which demonstrates the theory covered
in this chapter.
2.1 Model formulation
Let X_1, X_2, ..., X_n be a series of dependent Poisson counts generated according to the
model

X_t = α ∘ X_{t-1} + ε_t,

where X_0 has a Poisson distribution with parameter λ/(1-α), written as X_0 ~ Po(λ/(1-α)),
and {ε_t}_{t=1}^n is a series of independent identically distributed (iid) Poisson random
variables with parameter λ, that is ε_t ~ Po(λ). The thinning operator "∘" is defined as
follows: given X_{t-1}, α ∘ X_{t-1} = Σ_{i=1}^{X_{t-1}} B_i(α), where B_1(α), B_2(α), ..., B_{X_{t-1}}(α) are iid
Bernoulli random variables with P(B_i(α) = 1) = 1 - P(B_i(α) = 0) = α. Since α ∘ X_{t-1}
given X_{t-1} is a sum of iid Bernoulli random variables it follows that it has a binomial
distribution with parameters α and X_{t-1}, written as α ∘ X_{t-1} | X_{t-1} ~ Bi(α, X_{t-1}). The term
thinning operator is used since its operation is to randomly thin out or reduce the
number in a group. It is further assumed that B_i(α) and ε_t are independent.
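The thinning recursion is straightforward to simulate. The following sketch (our own illustration, not from the thesis; all function and variable names are ours) draws a series by thinning the previous count with a binomial draw and adding independent Poisson arrivals:

```python
import numpy as np

def simulate_poisson_ar1(alpha, lam, n, seed=0):
    """Draw n observations from X_t = alpha o X_{t-1} + eps_t,
    started from the stationary distribution Po(lam / (1 - alpha))."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1, dtype=np.int64)
    x[0] = rng.poisson(lam / (1 - alpha))
    for t in range(1, n + 1):
        survivors = rng.binomial(x[t - 1], alpha)  # alpha o X_{t-1} | X_{t-1} ~ Bi(alpha, X_{t-1})
        x[t] = survivors + rng.poisson(lam)        # independent Po(lam) innovations
    return x[1:]

series = simulate_poisson_ar1(alpha=0.4, lam=5.2, n=50000)
print(series.mean(), series.var())  # both should be near 5.2 / 0.6 = 8.67
```

For a long simulated series the sample mean and variance should both be close to λ/(1-α), in line with the Poisson marginal distribution derived below.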
For comparison purposes we define the Gaussian AR(1) model as

X_t = αX_{t-1} + λ + η_t,

where {η_t}_{t=1}^n is a sequence of independent identically distributed normal random
variables with mean zero and variance σ².
An important distinction between the Poisson AR(1) model and the Gaussian
AR(1) model is that in the Poisson model X_t is composed of two random components,
the survivorship component α ∘ X_{t-1} | X_{t-1} and the new entrant (innovation) component
ε_t. In the Gaussian model, given X_{t-1}, the first component αX_{t-1} is not random. The
Poisson model is also complicated by the fact that these two random components are not
observed. That is, the distribution of X_t given X_{t-1} is given by the convolution of the two
random components. This makes the Poisson model much harder to work with than the
Gaussian model.
This model has been studied by Al-Osh and Alzaid (1987) and McKenzie (1988).
Joe (1996) extended this model by developing a general method to define a random
operator for cases where the marginal distribution is in the convolution-closed infinitely
divisible class. His method is consistent with the AR(1) Poisson model in that it defines
the binomial thinning operator when the marginal distribution is Poisson.
2.2 Model interpretation
This model can be interpreted as a birth and death process, see Ross (1983, Section 5.3)
for an introduction to birth and death processes. Each individual at time t - 1 has
probability α of continuing to be alive at time t, and at each time t the number of
births follows a Poisson distribution with mean λ.
Alternatively, the model can be interpreted as an infinite server queue, see Ross
(1983, Example 2.3(b)). The service time is geometric with parameter 1 - α and the
arrival process is Poisson with mean λ. One fundamental result from queuing theory is
that the expected length of the queue is equal to the arrival rate times the expected
waiting time, or L = λW, where L, λ and W are respectively the expected queue
length, the arrival rate and the expected waiting time, see Little (1961). In this example
the mean waiting time, W, is equal to 1/(1-α) and the expected queue length
is L = λ/(1-α).
With regards to short-term disability claims, X_t is the number of workers
collecting short-term wage loss benefits (STWLB) at time t. It equals the sum of the
number of workers continuing to collect from time t - 1, α ∘ X_{t-1}, and the number of
new claims at time t, ε_t. The waiting time 1/(1-α) is the mean number of months that
a newly disabled worker is expected to collect STWLB. This is referred to as duration at
the WCB and is an important input into managerial decision making. Note that it is not
directly extractable from available WCB data.
2.3 Basic properties
In this section we look at the stationary distribution of the Poisson AR(1) process and
basic quantities, such as the mean, variance, moment generating function and correlation.

For the Poisson AR(1) model the conditional mean and variance of X_t given
X_{t-1} are respectively E[X_t | X_{t-1}] = αX_{t-1} + λ and Var[X_t | X_{t-1}] = α(1-α)X_{t-1} + λ. The
Gaussian AR(1) model has the same conditional mean but the conditional variance, σ²,
is different, McKenzie (1988).
In the following proposition the stationary distribution for the Poisson AR(1)
model is given, and from this we can find the unconditional moments of X_t. McKenzie
(1988) sketches the proof of this result; we give a simple proof using moment generating
functions.

Proposition 2.3.1 When X_0 is Poisson with mean λ/(1-α) the marginal distribution of
X_t is Poisson with mean λ/(1-α).
Proof. We use induction to prove the result. By assumption the result is true for X_0. We
now assume X_{t-1} is Poisson with mean λ/(1-α), that is, the moment generating
function of X_{t-1} is M_{X_{t-1}}(s) = exp(λ/(1-α) (e^s - 1)). The moment generating function of X_t
is

M_{X_t}(s) = E[E[exp(s α ∘ X_{t-1} + s ε_t) | X_{t-1}]]
         = E[(αe^s + (1 - α))^{X_{t-1}} exp(λ(e^s - 1))]
         = M_{X_{t-1}}(s') exp(λ(e^s - 1)),

where e^{s'} = αe^s + 1 - α. Making this substitution gives us

M_{X_t}(s) = exp(λ/(1-α) α(e^s - 1)) exp(λ(e^s - 1)) = exp(λ/(1-α) (e^s - 1)).

Thus the marginal distribution of X_t is Poisson with mean λ/(1-α) and the
result follows. □
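The substitution step can be checked numerically: thinning a Po(λ/(1-α)) count and adding Po(λ) arrivals leaves the moment generating function unchanged. A small sketch (ours, not from the thesis), using the identity E[z^X] = exp(μ(z - 1)) for X ~ Po(μ):

```python
import math

alpha, lam = 0.4, 5.2
mu = lam / (1 - alpha)  # stationary mean lambda / (1 - alpha)

def mgf_po(mean, s):
    # moment generating function of a Poisson(mean) random variable
    return math.exp(mean * (math.exp(s) - 1))

for s in (-0.5, 0.1, 0.7):
    # thinning replaces e^s by alpha*e^s + 1 - alpha inside the pgf
    z = alpha * math.exp(s) + 1 - alpha
    one_step = math.exp(mu * (z - 1)) * mgf_po(lam, s)  # thinned count plus Po(lam) arrivals
    assert abs(one_step - mgf_po(mu, s)) < 1e-9 * one_step
print("one step of the recursion preserves the Po(lambda/(1-alpha)) MGF")
```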
Remark

Since the marginal distribution of X_t is Poisson it follows that the unconditional mean
and variance of X_t are both equal to λ/(1-α). In addition, writing μ = λ/(1-α), the first
three marginal moments of X_t are respectively μ, μ + μ² and μ + 3μ² + μ³.

An important quantity for this model is the autocorrelation function, which is the
same as in the standard Gaussian AR(1) model, and is given by ρ_k = α^k, k = 1, 2, ..., see
Al-Osh and Alzaid (1987).
2.4 Poisson AR(p) model

A natural extension of the Poisson AR(1) model is the following Poisson AR(p) model,

X_t = α_1 ∘ X_{t-1} + α_2 ∘ X_{t-2} + ... + α_p ∘ X_{t-p} + ε_t,   (2.4.1)

where {ε_t}_{t=1}^∞ is again a sequence of iid Poisson random variables, "∘" is the binomial
thinning operator, given X_{t-1}, X_{t-2}, ..., X_{t-p} the components α_1 ∘ X_{t-1}, ..., α_p ∘ X_{t-p} and ε_t are
mutually independent, and α_j ∈ [0,1], j = 1, 2, ..., p. Note that the marginal distribution is
Poisson only for the case p = 1. This extension is studied in detail in Jin-Guan and Yuan
(1991).
Alzaid and Al-Osh (1990) examine a different Poisson AR(p) model where the
thinning operator is defined such that α_1 ∘ X_t, α_2 ∘ X_t, ..., α_p ∘ X_t given X_t has a
multinomial distribution with parameters α_1, α_2, ..., α_p and X_t. A non-linear Poisson
AR(p) model is found in Joe (1996). Joe writes his model as X_t = A_t(X_{t-1}, X_{t-2}, ..., X_{t-p})
+ ε_t, where the probability function of A_t(X_{t-1}, X_{t-2}, ..., X_{t-p}) is a (p-1)-dimensional sum.
The model by Jin-Guan and Yuan (1991) has the same autocorrelation function as
the Gaussian AR(p) model. The parameters are easily estimated by conditional least
squares, see Theorem 2.4.4. A drawback to this model is the lack of a physical
interpretation as in the Poisson AR(1) model. Jin-Guan and Yuan give no interpretation
of this model, nor can we think of any physical system it could represent. However it is
often the case that ARMA models do not have a physical interpretation, so this may not be
much of a concern.
It is possible to give a physical interpretation of the model in Alzaid and Al-Osh
(1990) but the interpretation is a bit odd and is as follows. Let X_t be the number of
newborn females in period t. Each female has a reproductive life span of p periods and is
permitted to have at most one offspring. From the cohort X_t of newborn females the
distribution of their female offspring in the next p periods is assumed to be multinomial.
That is, they let α_1 ∘ X_t, α_2 ∘ X_t, ..., α_p ∘ X_t denote the female offspring respectively in
periods t+1, t+2, ..., t+p and assume that α_1 ∘ X_t, α_2 ∘ X_t, ..., α_p ∘ X_t given X_t has a
multinomial distribution with parameters α_1, α_2, ..., α_p and X_t. The innovation ε_t
represents the number of newborn females that immigrate into the system at time t. The
autocorrelation function for this process is the same as the Gaussian ARMA(p, p-1)
process.
The higher order AR(p) models of Joe (1996) are extremely difficult to use in
practice. The conditional moments for his model are nonlinear and involve the calculation
of multi-dimensional sums. In his AR(2) model the calculation of E[X_t | X_{t-1}, X_{t-2}]
requires the calculation of four 2-dimensional sums. This is a major drawback since it
makes conditional least squares estimation much more difficult. Maximum likelihood
estimation is possible but also difficult, since even when p = 2 the conditional
probabilities involve 4-dimensional sums. Joe (1997) contains an example where an
AR(2) model is estimated. However the model does have a physical interpretation and as
a special case can be interpreted as a queue. For the AR(2) model there exist random
variables Z_1, Z_2, Z_3, Z_{12}, Z_{13}, Z_{23} and Z_{123} such that the distribution of X_{t+1}, X_{t+2}, X_{t+3} is
the same as the distribution of Z_1 + Z_{12} + Z_{13} + Z_{123}, Z_2 + Z_{12} + Z_{23} + Z_{123},
Z_3 + Z_{13} + Z_{23} + Z_{123}. This is not a queue since Z_{13} represents the number present at both
time t+1 and t+3 but not present at time t+2. It is possible to make Z_{13} degenerate and
equal to zero, in which case the model becomes a queue.
In the remainder of this section we examine the properties of the Poisson AR(p)
model of Jin-Guan and Yuan (1991) and henceforth refer to it simply as the Poisson
AR(p) model. The objective of the rest of this section is to show that we can use the
sample autocorrelation function and partial autocorrelation function to select the order in
a Poisson AR(p) model in exactly the same way as they are used to select the order in a
Gaussian AR(p) model.
Theorem 2.4.1 (Jin-Guan and Yuan, 1991) Let {ε_t}_{t=1}^∞ be count valued random variables
with mean μ, finite variance σ², and let α_j ∈ [0,1], j = 1, 2, ..., p. If the roots of
z^p - α_1 z^{p-1} - ... - α_p = 0
are inside the unit circle, then there exists a unique stationary count valued time series
{X_t} which satisfies (2.4.1) and Cov[X_s, ε_t] = 0, s < t.
The proof of this result is long and is found in Jin-Guan and Yuan (1991).

We define the sample covariance and sample correlation respectively as
γ̂_k = n^{-1} Σ_{t=1}^{n-k} (X_t - X̄)(X_{t+k} - X̄) and ρ̂_k = γ̂_k / γ̂_0.
Theorem 2.4.2 The Poisson AR(p) process {X_t} defined by (2.4.1) is ergodic.

Theorem 2.4.3 γ̂_k and ρ̂_k are strongly consistent.

Jin-Guan and Yuan (1991) show that the Poisson AR(p) process {X_t} is ergodic and use
this to prove the strong consistency of the sample covariances and correlations.

The following result, also from Jin-Guan and Yuan (1991), implies that the
autocorrelation function of the Poisson AR(p) model is the same as for the Gaussian
AR(p) model. We provide a simple alternative proof to that found in Jin-Guan and Yuan
(1991).
Proposition 2.4.1 The Yule-Walker equation, ρ_k = α_1 ρ_{k-1} + α_2 ρ_{k-2} + ... + α_p ρ_{k-p}, holds
for the Poisson AR(p) model.
Proof. Multiplying (2.4.1) by X_{t-k} and taking expectations we get

E[X_t X_{t-k}] = α_1 E[X_{t-1} X_{t-k}] + ... + α_p E[X_{t-p} X_{t-k}] + E[ε_t X_{t-k}].   (2.4.2)

Next we take the expectation of (2.4.1) and multiply by E[X_{t-k}] to get

E[X_t]E[X_{t-k}] = α_1 E[X_{t-1}]E[X_{t-k}] + ... + α_p E[X_{t-p}]E[X_{t-k}] + E[ε_t]E[X_{t-k}].   (2.4.3)

We note that E[ε_t X_{t-k}] = E[ε_t]E[X_{t-k}] and that because of stationarity E[X_{t-s} X_{t-k}]
- E[X_{t-s}]E[X_{t-k}] = Cov[X_{t-s}, X_{t-k}] = γ_{s-k}. Taking the difference between (2.4.2) and
(2.4.3) we get γ_k = α_1 γ_{k-1} + α_2 γ_{k-2} + ... + α_p γ_{k-p}. Finally, dividing this by γ_0 completes
the proof. □
There is a common misconception that the sample autocorrelations for a series of
iid data will be zero. In fact for iid data we can expect one autocorrelation out of 20 to
be larger in absolute value than 2/√n, and as a result values of |ρ̂_k| larger than 2/√n are
statistically significant at the 5% level.
Parameter estimates for the Poisson AR(p) model can be found using the method
of conditional least squares, Klimko and Nelson (1978). In this method the parameter
estimates are chosen to minimize the sum of squared distances between X_t and
E[X_t | X_{t-1}, X_{t-2}, ...]. Details for the Poisson AR(1) case are found in Section 4.2.
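Since E[X_t | X_{t-1}] = αX_{t-1} + λ is linear in the AR(1) case, conditional least squares there reduces to a simple linear regression of X_t on X_{t-1}. A sketch (our illustration on simulated data; the thesis's treatment is in Section 4.2):

```python
import numpy as np

# simulate a Poisson AR(1) series by binomial thinning
rng = np.random.default_rng(1)
alpha, lam, n = 0.4, 5.2, 20000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

# CLS minimizes sum_t (X_t - alpha X_{t-1} - lam)^2, i.e. OLS of X_t on X_{t-1}
y, z = x[1:].astype(float), x[:-1].astype(float)
alpha_hat = np.cov(y, z, bias=True)[0, 1] / np.var(z)
lam_hat = y.mean() - alpha_hat * z.mean()
print(alpha_hat, lam_hat)  # should be near the true values 0.4 and 5.2
```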
Theorem 2.4.4 The conditional least squares estimates α̂_1, α̂_2, ..., α̂_p and λ̂ of
α_1, α_2, ..., α_p and λ are strongly consistent, and further they are asymptotically normal with
covariance matrix n^{-1}Σ, where Σ is a finite covariance matrix.

Jin-Guan and Yuan (1991) prove this by noting that the Poisson AR(p) model satisfies the
conditions needed in Klimko and Nelson (1978) for the conditional least squares
estimates to be consistent and asymptotically normal. The covariance matrix Σ is equal
to the inverse of the Godambe information, see Section 4.1.1, and an expression for it is
given in Jin-Guan and Yuan (1991).
Definition 2.4.1 The pth partial autocorrelation, π_p, is the last coefficient, α_p, when
fitting a Poisson AR(p) model, and measures the excess correlation at lag p which is not
accounted for in a Poisson AR(p-1) model.
The following new result is useful in model selection, since it shows that a
Poisson AR(p) process has the same partial autocorrelations as a Gaussian AR(p) process.

Corollary 2.4.1 If the series {X_t} follows a Poisson AR(p) process and satisfies the
conditions in Theorem 2.4.1 then the partial autocorrelations beyond lag p are zero, that
is α_{p+k} = 0 for k ≥ 1.

Proof: By Theorem 2.4.1 X_t is uniquely defined by (2.4.1) with p lags. Therefore the
only way to represent X_t by an equation with p + k lags is to set
α_{p+1} = α_{p+2} = ... = α_{p+k} = 0, and hence π_{p+k} = 0. □
As a result of Theorem 2.4.4 we can use conditional least squares to find strongly
consistent estimates of the partial autocorrelation coefficients.

Another estimate comes from the Yule-Walker equations. Setting k = p in the
Yule-Walker equation and solving for α_p gives us α_p = ρ_p - α_1 ρ_{p-1} - α_2 ρ_{p-2}
- ... - α_{p-1} ρ_1. This suggests the following estimate for π_p,

π̂_p = ρ̂_p - α̂_1 ρ̂_{p-1} - α̂_2 ρ̂_{p-2} - ... - α̂_{p-1} ρ̂_1,   (2.4.4)

where α̂_1, α̂_2, ..., α̂_{p-1} are any √n consistent estimates of α_1, α_2, ..., α_{p-1} when fitting a
Poisson AR(p-1) model.
If {X_t} is a sequence of iid random variables satisfying certain mild moment
restrictions, for example finite variance and fourth moment, then √n π̂_p has a standard
normal asymptotic distribution, where the sample partial autocorrelation coefficient π̂_p is
defined as in (2.4.4). As a result, values of |π̂_p| larger than 2/√n are statistically
significant at the 5% level.
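The sample autocorrelations and the 2/√n significance bound are easy to compute directly; a sketch (ours, not from the thesis) on a simulated Poisson AR(1) series, whose theoretical autocorrelations are ρ_k = α^k:

```python
import numpy as np

def sample_acf(x, max_lag):
    """rho_hat_k = gamma_hat_k / gamma_hat_0, as defined in the text."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    g0 = np.mean(xc * xc)
    return np.array([np.mean(xc[:-k] * xc[k:]) / g0 for k in range(1, max_lag + 1)])

# simulate a Poisson AR(1) series by binomial thinning
rng = np.random.default_rng(2)
alpha, lam, n = 0.4, 5.2, 50000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

rho = sample_acf(x, 3)
bound = 2 / np.sqrt(n)  # approximate 5% significance bound for an iid series
print(rho, bound)       # rho should be near (0.4, 0.16, 0.064)
```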
2.5 An illustrative example

To illustrate the points in this chapter and the following chapters we have selected
one of the series from the WCB claims data set, which is analyzed in more detail in
Chapter 8. As mentioned in Section 1.1 the WCB data are monthly claims counts of
workers collecting STWLB and are grouped into more than 60,000 separate series.

One of the categories in which the data are grouped is type of industry, most of
which are strongly affected by seasonality. Examples of seasonally affected industries are
logging, hotels, restaurants, fishing, and retail. To model the series from these industries
it will be necessary to add seasonal regressors to the Poisson AR(1) model defined in
Section 2.1. The addition of covariates or regressors to the model will be covered in
Chapter 7.

The heavy manufacturing industry is one which we feel is less sensitive to
seasonality. It is from this industry that we have selected a series to examine in the
following example.
Example 2.5.1 The data series used in this example and the examples in Chapters 3
through 6 consists of monthly claims counts of workers collecting STWLB from the
Richmond claims center between January 1987 and December 1994. All the claimants are
males, are between the ages of 25 and 34, are employed in the heavy manufacturing
industry and are collecting STWLB due to a burn related injury.
A time series plot of this data is given in Figure 2.5.1. From this plot the data
appear to be stationary and non-seasonal. This is confirmed by the correlogram, Figure
2.5.2. In a non-stationary time series the autocorrelations do not come down to zero
except at large lags. If the series were seasonal then we would expect the absolute value
of the sample autocorrelation at lags 6 and 12 to be large. The correlogram for an AR(1)
process with 0 < α < 1 should move to zero exponentially as the lag increases. While the
autocorrelations in our correlogram approach zero quickly, the decay does not appear
exponential. A good discussion on interpreting correlograms is found in Chatfield (1989),
Sections 2.7.2 and 4.1.1.
Figure 2.5.3 is a plot of the sample partial autocorrelations, which were calculated
using (2.4.4). This data set consists of 12 × 8 = 96 observations. As noted at the end of
Section 2.4, any partial autocorrelation larger than 2/√96 ≈ 0.204 is statistically
significant at the 5% level. Consequently the second partial autocorrelation is on the
border of being significant.
Since the correlogram does not give any strong evidence against an AR(1), and
since we have a physical interpretation which suggests an AR(1) model, it is
appropriate to proceed assuming an AR(1) model.
Suppose we know the "true" parameter values are α = 0.40 and λ = 5.2 (these are
actually the maximum likelihood estimates calculated in Section 4.6). Then the
unconditional expected number of claimants collecting STWLB each month is
5.2/(1 - 0.40) = 8.67. The expected waiting time, or the expected number of months that
a newly injured claimant can expect to be off work, is 1/(1 - 0.40) = 1.67. A property of the
geometric service time is that it is memoryless. That is, at the start of the current month
all claimants collecting this month, both old and new claimants, can expect to collect, in
addition to the current month, for an average of 0.67 months.
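These quantities follow directly from the queue interpretation and Little's law; a minimal sketch (ours) with the example's parameter values:

```python
alpha, lam = 0.40, 5.2

mean_count = lam / (1 - alpha)  # expected queue length L = lambda * W (Little's law)
waiting = 1 / (1 - alpha)       # W: mean number of months a new claimant collects
residual = waiting - 1          # memoryless: expected months beyond the current one

print(round(mean_count, 2), round(waiting, 2), round(residual, 2))  # 8.67 1.67 0.67
```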
Figure 2.5.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Figure 2.5.2 Correlogram for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.

Figure 2.5.3 Sample partial autocorrelation function for the time series of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
3. Forecasting
The first three sections of this chapter concern forecasting when the model parameters are
known. We begin by considering two criteria for optimal forecasting: minimum mean
squared error forecasting and minimum mean absolute error forecasting. The first
criterion almost always results in a non-integer forecast and is the same point forecast that
the usual Gaussian AR(1) model would produce. The second criterion leads to an integer
forecast which is attractive from the point of view of data cohesion. When forecasting a
future count from a low count time series there is only a small set of possible values the
future outcome is likely to take. In these situations it is practical and desirable to give the
individual probabilities for each outcome in the set. We call this point mass forecasting.
A fourth type of forecast is the conditional mode, which is found by selecting the
outcome (point mass forecast) with the largest probability. We conclude the chapter by
constructing individual confidence intervals for the point mass forecasts when using
parameter estimates.
3.1 Minimum mean squared error
Consider a sample {X_t}_{t=1}^N from the Poisson AR(1) model in Chapter 2. The objective is
to find a forecast, X̂_{N+k}, of X_{N+k} that minimizes the expected squared error given the
sample. That is, to find X̂_{N+k} which minimizes E[(X_{N+k} - X̂_{N+k})² | X_N]. The first order
condition is E[X_{N+k} - X̂_{N+k} | X_N] = 0. This implies that the conditional mean,
X̂_{N+k} = E[X_{N+k} | X_N], is the forecast of X_{N+k} that minimizes the mean squared forecast
error. Note that this is a general result when forecasting time series and does not depend
on the model.
The next result is for the k-step ahead conditional mean, and has never explicitly
appeared in the literature except for the case where k = 1. It is the same as the k-step ahead
conditional mean for the Gaussian AR(1) model.
Proposition 3.1.1 In the Poisson AR(1) model the k-step ahead conditional mean is
E[X_{N+k} | X_N] = α^k X_N + λ(1 - α^k)/(1 - α), k = 1, 2, ....

Proof. We prove the result by induction. Since ε_t is independent of α ∘ X_{t-1}, the one step
ahead conditional mean is E[X_{N+1} | X_N] = αX_N + λ. Now suppose that the k-1 step
ahead conditional mean is E[X_{N+k-1} | X_N] = α^{k-1} X_N + λ(1 - α^{k-1})/(1 - α). Then the
k-step ahead conditional mean is

E[X_{N+k} | X_N] = E[E[X_{N+k} | X_{N+1}] | X_N]
              = α^{k-1} E[X_{N+1} | X_N] + λ(1 - α^{k-1})/(1 - α)
              = α^k X_N + λ(1 - α^k)/(1 - α),

and the result follows by induction. □
It is also interesting to know the variation of X_{N+k} given X_N around the forecast X̂_{N+k},
which is given in the following proposition.

Proposition 3.1.2 In the Poisson AR(1) model the k-step ahead conditional variance is
given by Var[X_{N+k} | X_N] = α^k (1 - α^k) X_N + λ(1 - α^k)/(1 - α), k = 1, 2, 3, ....
Proof. We prove the result by induction. The one step ahead conditional variance is
Var[X_{N+1} | X_N] = α(1 - α)X_N + λ. Now suppose that the k-1 step ahead conditional
variance is Var[X_{N+k-1} | X_N] = α^{k-1}(1 - α^{k-1})X_N + λ(1 - α^{k-1})/(1 - α). Then the k-step
ahead conditional variance is

Var[X_{N+k} | X_N] = Var[E[X_{N+k} | X_{N+1}] | X_N] + E[Var[X_{N+k} | X_{N+1}] | X_N]
               = α^{2(k-1)} Var[X_{N+1} | X_N] + α^{k-1}(1 - α^{k-1}) E[X_{N+1} | X_N] + λ(1 - α^{k-1})/(1 - α)
               = α^k (1 - α^k) X_N + λ(1 - α^k)/(1 - α). □

For the Gaussian AR(1) model the k-step ahead conditional variance is
(1 - α^{2k})/(1 - α²) σ².
As k goes to infinity the conditional mean and variance respectively go to the
stationary (unconditional) mean and variance of the process. That is,
lim_{k→∞} E[X_{N+k} | X_N] = λ/(1-α) and lim_{k→∞} Var[X_{N+k} | X_N] = λ/(1-α).

A third result, which actually includes the two previous results, is the conditional
moment generating function of X_{N+k} given X_N.
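The closed forms for the k-step ahead conditional mean and variance, together with their limits, can be sketched as follows (our illustration; x_N = 11 is an assumed last observed count):

```python
alpha, lam, x_N = 0.4, 5.2, 11  # x_N: an assumed last observed count

def cond_mean(k):
    # E[X_{N+k} | X_N] = alpha^k X_N + lam (1 - alpha^k) / (1 - alpha)
    return alpha**k * x_N + lam * (1 - alpha**k) / (1 - alpha)

def cond_var(k):
    # Var[X_{N+k} | X_N] = alpha^k (1 - alpha^k) X_N + lam (1 - alpha^k) / (1 - alpha)
    return alpha**k * (1 - alpha**k) * x_N + lam * (1 - alpha**k) / (1 - alpha)

for k in (1, 3, 20):
    print(k, round(cond_mean(k), 3), round(cond_var(k), 3))
# both sequences approach the stationary value lam / (1 - alpha) = 8.667
```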
Theorem 3.1.1 For the Poisson AR(1) model the distribution of X_{N+k} given X_N is a
convolution of a binomial distribution with parameters α^k and X_N and a Poisson
distribution with parameter λ(1 - α^k)/(1 - α). That is, the k-step ahead conditional moment
generating function is given by

M_{X_{N+k}|X_N}(s) = (α^k e^s + 1 - α^k)^{X_N} exp(λ(1 - α^k)/(1 - α) (e^s - 1)).   (3.1.1)

Proof. We prove the result by induction. The one step ahead conditional moment
generating function is given by M_{X_{N+1}|X_N}(s) = (αe^s + 1 - α)^{X_N} exp(λ(e^s - 1)).
Now suppose that the k-1 step ahead conditional moment generating function is
M_{X_{N+k-1}|X_N}(s) = (α^{k-1} e^s + 1 - α^{k-1})^{X_N} exp(λ(1 - α^{k-1})/(1 - α) (e^s - 1)).
Then the k-step ahead conditional moment generating function is

M_{X_{N+k}|X_N}(s) = E[(α^{k-1} e^s + 1 - α^{k-1})^{X_{N+1}} | X_N] exp(λ(1 - α^{k-1})/(1 - α) (e^s - 1))
                = M_{X_{N+1}|X_N}(s') exp(λ(1 - α^{k-1})/(1 - α) (e^s - 1)),

where e^{s'} = α^{k-1} e^s + (1 - α^{k-1}). Substituting this for e^{s'} gives (3.1.1). □
The above result shows that the distribution of X_{N+k} | X_N is a convolution of a
binomial distribution with parameters α^k and X_N and a Poisson distribution with
parameter λ(1 - α^k)/(1 - α). Hence it has mean α^k X_N + λ(1 - α^k)/(1 - α) and variance
α^k (1 - α^k) X_N + λ(1 - α^k)/(1 - α), which of course agrees with Propositions 3.1.1 and 3.1.2. This
extends the results in McKenzie (1988) to cases where k > 1.
In the usual Gaussian AR(1) model X_{N+k} | X_N has a normal distribution with
mean α^k X_N + λ(1 - α^k)/(1 - α) and variance (1 - α^{2k})/(1 - α²) σ². So while the conditional
means of X_{N+k} | X_N in the Poisson and Gaussian models are the same, their conditional
distributions are quite different.
Corollary 3.1.1 Let μ_k denote the distribution of X_{N+k} | X_N and let μ be the
distribution of a Poisson random variable with mean λ/(1-α). Then μ_k ⇒ μ. That is, μ_k
converges weakly to μ, or X_{N+k} | X_N has a Poisson limiting distribution with mean λ/(1-α).

Proof. From (3.1.1) we have

lim_{k→∞} M_{X_{N+k}|X_N}(s) = exp(λ/(1-α) (e^s - 1)),

which is the moment generating function of a Poisson distribution with mean λ/(1-α). The
result follows since convergence of the moment generating function implies convergence
of the probability measure whenever the moment generating function exists in a
neighborhood of zero, see for example Billingsley (1986). □
3.2 Minimum mean absolute error
The objective in this section is to find a forecast, X̂_{N+k}, of X_{N+k} that minimizes the
expected absolute error given the sample. That is, to find X̂_{N+k} which minimizes
E[|X_{N+k} - X̂_{N+k}| | X_N].
Let p_k(x | X_N) be the conditional probability function of X_{N+k} given X_N. We
will define the conditional median of X_{N+k} given X_N as the smallest non-negative
integer m_k for which Σ_{x=0}^{m_k} p_k(x | X_N) ≥ 0.5. An alternative definition is to let m_k be the
largest non-negative integer such that Σ_{x=0}^{m_k} p_k(x | X_N) ≤ 0.5. However, if p_k(0 | X_N) > 0.5
then the median under this alternative definition would not be defined.
Proposition 3.2.1 In the Poisson AR(1) model the k-step ahead conditional median is the
forecast which minimizes the expected absolute forecast error. That is, E[|X_{N+k} - X̂_{N+k}| | X_N]
has a global minimum at X̂_{N+k} = m_k.

Proof. Suppose that X̂_{N+k} is between m - 1 and m, where m is a non-negative integer.
Then E[|X_{N+k} - X̂_{N+k}| | X_N] = Σ_{x=0}^{m-1} (X̂_{N+k} - x) p_k(x) + Σ_{x=m}^{∞} (x - X̂_{N+k}) p_k(x). The slope of this expectation as
a function of X̂_{N+k} is Σ_{x=0}^{m-1} p_k(x) - Σ_{x=m}^{∞} p_k(x). If m ≤ m_k the slope is negative and if
m ≥ m_k + 1 the slope is positive. The minimum therefore occurs at X̂_{N+k} = m_k. □

It is interesting to note that we did not have to restrict our search to values in the non-
negative integers in order to get an integer solution.
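The proposition can be checked by brute force on any pmf: the smallest m with cumulative probability at least 0.5 also minimizes the expected absolute error over the reals. A sketch (ours, with a made-up pmf):

```python
import numpy as np

pmf = {0: 0.2, 1: 0.25, 2: 0.3, 3: 0.15, 4: 0.1}  # a made-up forecast distribution

# smallest m with cumulative probability >= 0.5 -- the conditional median m_k
cum, median = 0.0, None
for v in sorted(pmf):
    cum += pmf[v]
    if cum >= 0.5:
        median = v
        break

def mae(c):
    # expected absolute forecast error for forecast value c
    return sum(p * abs(v - c) for v, p in pmf.items())

grid = np.arange(0.0, 4.001, 0.001)
best = grid[np.argmin([mae(c) for c in grid])]
print(median, float(best))  # the unconstrained minimizer coincides with the integer median
```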
3.3 Point mass forecasts
In this section we look at point mass forecasts, or the k-step ahead conditional
distribution. In cases where the counts are low, the median forecast may not be very
informative. For example, consider the following two cases of a discrete random variable
X: case 1, P(X = 0) = 1 - P(X = 1) = 0.50, and case 2, P(X = 0) = 1 - P(X = 5) = 0.90. In
both cases the median of X is 0 and the mean is 0.5, but in case 2 there is almost twice
the probability of observing a zero. Since there are only two outcomes in this example it
would be more informative to give the probability distribution for the outcomes.
By Theorem 3.1.1 the distribution of X_{N+k} given X_N is a convolution of a
binomial distribution with parameters α^k and X_N and a Poisson distribution with
parameter λ(1 - α^k)/(1 - α). The probability mass function of X_{N+k} given X_N is

p_k(x | X_N) = Σ_{s=0}^{min(x, X_N)} (X_N choose s) (α^k)^s (1 - α^k)^{X_N - s} exp(-λ(1 - α^k)/(1 - α)) (λ(1 - α^k)/(1 - α))^{x-s} / (x - s)!.   (3.3.1)

In the following example we illustrate the conditional mean, median, mode and point
mass forecasts. The conditional mode is easy to find since it is the point, x, at which the
probability mass, p_k(x | X_N), is largest, where x is a non-negative integer.
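Equation (3.3.1) is a finite convolution sum and is easy to evaluate directly. A sketch (ours, not from the thesis) for the one-step distribution with the example's values α = 0.4, λ = 5.2, X_N = 11:

```python
from math import comb, exp, factorial

def pk(x, k, x_N, alpha, lam):
    """k-step ahead pmf: convolution of Bi(alpha^k, x_N) and
    Po(lam * (1 - alpha^k) / (1 - alpha)), as in equation (3.3.1)."""
    a = alpha**k
    m = lam * (1 - a) / (1 - alpha)
    return sum(comb(x_N, s) * a**s * (1 - a)**(x_N - s)
               * exp(-m) * m**(x - s) / factorial(x - s)
               for s in range(min(x, x_N) + 1))

dist = [pk(x, 1, 11, 0.4, 5.2) for x in range(40)]
mean = sum(x * p for x, p in enumerate(dist))
print(round(mean, 2), round(max(dist), 3))  # mean matches alpha * 11 + lam = 9.6
```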
Example 3.3.1 This is a continuation of Example 2.5.1. Our series consists of 8 × 12
= 96 observations and the last observation is X_96 = 11. Again we assume that the "true"
parameter values are α = 0.40 and λ = 5.2. Figure 3.3.1 is a time series plot of the
observations and predicted values (one step ahead conditional mean forecasts). It
indicates that the one step ahead conditional mean forecasts are reasonably good.

Conditional forecasts for the first 6 months of 1995 given a count of 11 in
December 1994 are displayed in Table 3.3.1. The table includes the k-step ahead
conditional mean and median forecasts as well as the k-step ahead conditional distribution
(point mass forecasts). The last column contains the limiting distribution or unconditional
distribution.
Figure 3.3.1 Time series plot of monthly claims counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, employed in the heavy manufacturing industry, and collecting STWLB due to a burn related injury.

Table 3.3.1 k-step ahead conditional means, medians, modes and point mass forecasts.
Remarks
1. The conditional mean forecasts quickly approach the unconditional mean of 8.67.
2. The conditional median forecast is 9 and remains unchanged except in the limit.
3. The conditional mode is easy to find from the table of point mass forecasts (conditional distribution). For example, the largest 1-step ahead point mass, p_1(x|11), is 0.142 and occurs when x = 9.
4. The 6 step ahead conditional distribution is very close to the limiting distribution.
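These forecasts can be reproduced numerically. The sketch below is our own illustration in plain Python (the function name pk is ours); it evaluates the k-step ahead conditional distribution (3.3.4) together with the conditional mean and mode for the example values α = 0.40, λ = 5.2 and X_96 = 11.

```python
import math

def pk(x, x0, k, alpha, lam):
    # k-step ahead conditional pmf p_k(x | X_t = x0): the convolution of a
    # Binomial(x0, alpha**k) survivor count and a Poisson innovation with
    # mean lam * (1 - alpha**k) / (1 - alpha).
    p_surv = alpha ** k
    mu = lam * (1 - p_surv) / (1 - alpha)
    total = 0.0
    for s in range(min(x, x0) + 1):
        binom = math.comb(x0, s) * p_surv ** s * (1 - p_surv) ** (x0 - s)
        total += binom * math.exp(-mu) * mu ** (x - s) / math.factorial(x - s)
    return total

alpha, lam, x0 = 0.40, 5.2, 11
dist1 = [pk(x, x0, 1, alpha, lam) for x in range(60)]   # 1-step ahead distribution
mean1 = sum(x * p for x, p in enumerate(dist1))          # = alpha * 11 + lam = 9.6
mode1 = max(range(60), key=lambda x: dist1[x])           # largest point mass
mean6 = sum(x * pk(x, x0, 6, alpha, lam) for x in range(60))
```

The 1-step ahead mode is 9 with mass about 0.142, matching Remark 3, and the 6-step ahead conditional mean is already close to the unconditional mean λ/(1−α) = 8.67.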
3.4 Prediction intervals
The point mass forecasts in Section 3.3 were calculated assuming the "true" parameter values were known. Since in practice we have to estimate the parameters, we would like to include this uncertainty by constructing point mass prediction intervals. The following result, which is found in Serfling (1980), will be used to develop the prediction intervals.
Theorem 3.4.1 Suppose that the sequence {Y_n} of k-dimensional random vectors is asymptotically normal with mean μ and covariance matrix n^{-1}Σ = n^{-1}[σ_{ij}]. Let g(y) be a real-valued function having a non-zero differential at y = μ. Then g(Y_n) is asymptotically normal with mean g(μ) and covariance matrix n^{-1}[∇g(μ)]^T Σ [∇g(μ)].
Now suppose we have a sample of size n and denote the maximum likelihood estimates for this sample by α̂_n and λ̂_n. Under mild regularity conditions (see Section 4.1), (α̂_n, λ̂_n)^T is asymptotically normal with mean (α_0, λ_0)^T and variance n^{-1} i^{-1}, where i is the Fisher information matrix and α_0 and λ_0 are the "true" parameter values.

As a consequence of the above result, for fixed x, p_k(x|X_n; α̂_n, λ̂_n) has an asymptotically normal distribution with mean p_k(x|X_n; α_0, λ_0) and variance

$$n^{-1}\left[ \left(\frac{\partial p_k}{\partial \alpha}\right)^2 i_{11}^{-1} + 2\,\frac{\partial p_k}{\partial \alpha}\frac{\partial p_k}{\partial \lambda}\, i_{12}^{-1} + \left(\frac{\partial p_k}{\partial \lambda}\right)^2 i_{22}^{-1} \right],$$

where i_{11}^{-1} and i_{22}^{-1} are the diagonal elements of i^{-1} and i_{12}^{-1} is the off-diagonal element. The
partial derivatives of the point mass probability p_k(x|X_n; α, λ) can either be found directly or with the help of the expressions in Section 4.5.

An approximate 95% confidence interval for p_k(x|X_n; α_0, λ_0) is

$$p_k(x \mid X_n; \hat\alpha_n, \hat\lambda_n) \pm 1.96\, \widehat{\mathrm{se}}\!\left[ p_k(x \mid X_n; \hat\alpha_n, \hat\lambda_n) \right], \qquad (3.4.4)$$

where the standard error is the square root of the estimated variance above.
We now continue Example 3.3.1.
Example 3.4.1 The maximum likelihood estimates for our data set are α̂ = 0.40 and λ̂ = 5.2. The inverse of the expected Fisher information matrix is evaluated at these parameter estimates; the details of these calculations are found in Example 4.4.1.
In Section 4.1 we show that the Poisson AR(1) model satisfies the regularity conditions for the maximum likelihood estimates to be asymptotically normal. Consequently we can use (3.4.4) to construct individual approximate 95% prediction intervals for the k-step ahead point mass forecasts. Table 3.4.1 contains the 95% prediction intervals for the first 6 months of 1995.
Remarks
1) The width of the prediction intervals increases with the number of steps ahead. However, after about 6 steps ahead the width changes very little as it is very close to its maximum.
2) The prediction intervals for the 6 step ahead conditional distribution are almost the same as the prediction intervals for the unconditional (limiting) distribution.
3) These are individual confidence intervals for the probabilities in the k-step ahead conditional distribution, NOT the forecasts.
3.5 Duration
In this section we examine duration, which is the number of months that the WCB continues to pay a claim. Under the Poisson AR(1) model assumption the duration time is geometric with parameter 1−α (see Section 2.2), and the mean duration is (1−α)^{-1}.
Suppose we have a sample of size n and denote the maximum likelihood estimators by α̂_n and λ̂_n. We assume that α̂_n is asymptotically normal with mean α_0 and variance n^{-1}σ²(α_0, λ_0), where σ²(α_0, λ_0) is that portion of the inverse of the Fisher information matrix which pertains to α, and α_0 and λ_0 are the "true" parameter values.

As a result of Theorem 3.4.1 the estimated mean duration (1−α̂_n)^{-1} is asymptotically normal with mean (1−α_0)^{-1} and variance (1−α_0)^{-4}σ²(α_0, λ_0)n^{-1}. Hence an approximate 95% confidence interval for the mean duration is (1−α̂_n)^{-1} ± 1.96(1−α̂_n)^{-2}σ(α̂_n, λ̂_n)n^{-1/2}.
Example 3.5.1 For our illustrative example the estimated mean duration is (1 − 0.40)^{-1} = 1.667 months and an approximate 95% confidence interval for the mean duration is (1.229, 2.104).
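The interval can be reproduced with the delta method described above. The sketch below is our own illustration in plain Python; var_alpha stands for the α-entry of the inverse Fisher information, and its value here is an assumed one chosen so that the interval matches the reported (1.229, 2.104).

```python
import math

def duration_ci(alpha_hat, var_alpha, n, z=1.96):
    # Delta method for g(a) = (1 - a)^(-1): g'(a) = (1 - a)^(-2), so the
    # standard error of the estimated mean duration is
    # (1 - alpha_hat)^(-2) * sqrt(var_alpha / n).
    est = 1.0 / (1.0 - alpha_hat)
    se = math.sqrt(var_alpha / n) / (1.0 - alpha_hat) ** 2
    return est, (est - z * se, est + z * se)

# alpha_hat = 0.40 and n = 96 come from the example; var_alpha = 0.62 is
# an assumed illustrative value, not a quantity computed from the data.
est, (lo, hi) = duration_ci(0.40, 0.62, 96)
```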
Chapter 4
4. Estimation
Section 4.1 outlines the asymptotic theory of estimating functions and is illustrated by reviewing estimation and inference for the Gaussian AR(1) model. In Sections 4.2 and 4.3 we show that the methods of conditional least squares (CLS) and generalized least squares (GLS) satisfy the regularity conditions for asymptotic normality of Section 4.1. Further, we derive an analytic expression for the Godambe information matrix in the CLS case and an approximation to the Godambe information matrix in the GLS case. Then in Section 4.4 we show that the score function and observed Fisher information matrix can be neatly represented as conditional expectations. Further, in Section 4.5 we show that we can generalize our expressions in Section 4.4 to general AR(1) models. In Section 4.6 we show that the method of maximum likelihood (ML) estimation satisfies the regularity conditions for asymptotic normality. Finally, in Section 4.7 we compare the asymptotic efficiency and robustness of the CLS and ML estimators.
4.1 Likelihood theory and estimating functions
The goal of this section is to present a unified approach to estimation through the use of estimating functions. We provide the estimating function approach since it includes many standard estimation techniques, such as maximum likelihood, generalized method of moments, conditional least squares and generalized least squares.
A detailed review of likelihood theory and an introduction to estimating functions is found in Barndorff-Nielsen and Sørensen (1994). Godambe and Heyde (1987) give more details on the asymptotics and optimality of estimating functions. Both of these papers consider estimation and inference for multivariate continuous time stochastic processes. However, in this review we will restrict ourselves to univariate discrete time stochastic processes. Many applications of estimating functions can be found in Godambe (1991).
Let {X_t}_{t=1}^∞ be a discrete time stochastic process for which we consider parametric statistical models of the form (Ω^n, F, P_θ^n; θ ∈ Θ), where Ω^n is the sample space, F is a sigma field, P_θ^n is a probability measure defined on the measurable space (Ω^n, F) and Θ is an open subset of ℝ^p. We assume that all of the probability measures in the family P^n = {P_θ^n; θ ∈ Θ} are dominated by a common σ-finite measure μ and we let p_n(x; θ) = dP_θ^n/dμ denote a version of the Radon-Nikodym derivative.
Definition 4.1.1 We call a p-dimensional function Ψ_n(θ) = Ψ_n(X_1, X_2, ..., X_n; θ) a regular estimating function if it satisfies the following conditions for all θ = (θ_1, θ_2, ..., θ_p) ∈ Θ:
1) Ψ_n(θ) is a zero mean square integrable martingale with respect to P_θ^n and the sigma field F_n = σ(X_1, X_2, ..., X_n),
2) the covariance matrix E_θ[Ψ_n(θ)Ψ_n^T(θ)] is positive definite,
3) Ψ_n is almost surely differentiable with respect to the components of θ,
4) Ψ̇_n is nonsingular, where Ψ̇_n denotes the matrix of partial derivatives of Ψ_n with respect to the components of θ.
The parameter θ is estimated by solving the system of equations Ψ_n(θ) = 0, and we let θ̂ denote the estimate. Estimating functions or estimating equations arise from many standard estimation techniques. The first order conditions for minimizing conditional least squares are an example of an estimating equation. The score function is another common estimating function. In the case where Ψ_n(θ) is the score function we will write U_n instead of Ψ_n. Finally, note that Ψ_n is also called an inference function, since both estimation and inference are based on it.
Definition 4.1.2 A sequence {F_t}_{t=1}^∞ of sigma fields is said to be adapted to a sequence {X_t}_{t=1}^∞ of random variables if X_t is F_t measurable. Further, {X_t, F_t}_{t=1}^∞ is called an adapted sequence.

Definition 4.1.3 Let {F_t}_{t=1}^∞ be an increasing sequence of sigma fields. An adapted sequence {X_t, F_t}_{t=1}^∞ on a probability space (Ω, F, P) is called a martingale if for all t, E[|X_t|] exists and is finite and E[X_t | F_{t-1}] = X_{t-1}.

Definition 4.1.4 A stochastic process {X_t}_{t=1}^∞ is called a martingale difference sequence if X_t is measurable with respect to F_t and E[X_t | F_{t-1}] = 0.
Remarks
1) Unless otherwise stated we will assume our martingales to be vector valued and denote the transpose of X_t by X_t^T.
2) If {X_t}_{t=1}^∞ is a martingale difference sequence then the sequence is uncorrelated. That is, for all s and t such that s < t, E[X_s X_t] = E[X_s E[X_t | F_s]] = E[X_s · 0] = 0.
3) If the sequence {X_t}_{t=1}^∞ is a martingale then a martingale difference sequence {Y_t}_{t=1}^∞ can be formed by defining Y_t = X_t − X_{t-1}. Further, the variance of the martingale is the sum of the variances of these differences.
In the following well known example we illustrate Definitions 4.1.1 to 4.1.4.

Example 4.1.1 Let μ be Lebesgue measure, Ω^{n+1} = ℝ^{n+1}, let F^{n+1} be the Borel sigma field of ℝ^{n+1} and Θ = (0,1) × ℝ^+, where ℝ^+ is the set of positive real numbers. The family of probability measures P^{n+1} is defined by the following version of the Radon-Nikodym derivative,

$$p(x; \alpha, \lambda) = \sqrt{\frac{1-\alpha^2}{2\pi}}\, e^{-\frac{1-\alpha^2}{2}\left(x_0 - \frac{\lambda}{1-\alpha}\right)^2} \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_t - \alpha x_{t-1} - \lambda)^2},$$

where (α, λ)^T ∈ Θ and x = (x_0, x_1, ..., x_n)^T ∈ ℝ^{n+1}.

This is the Gaussian AR(1) model defined in (2.1.2), with α ∈ (0,1), σ² = 1 and X_0 = λ/(1−α) + ε, where ε is normal with mean zero and variance 1/(1−α²). Note that defining X_0 in this way makes the unconditional distribution of X_t normal with mean λ/(1−α) and variance 1/(1−α²). Note that p(x; α, λ) is the likelihood, and the log-likelihood is proportional to

$$-\frac{1}{2}\sum_{t=1}^{n}(x_t - \alpha x_{t-1} - \lambda)^2 + \frac{1}{2}\log(1-\alpha^2) - \frac{1-\alpha^2}{2}\left(x_0 - \frac{\lambda}{1-\alpha}\right)^2.$$
The relative contribution of the last two terms of this expression to the log-likelihood is small for large samples. Since our primary objective is to obtain asymptotic properties, we will consider instead the following approximation to the log-likelihood,

$$l_n(\alpha, \lambda) = -\frac{1}{2}\sum_{t=1}^{n}(x_t - \alpha x_{t-1} - \lambda)^2.$$

The score function associated with this log-likelihood is

$$U_n(\alpha, \lambda) = \left( \sum_{t=1}^{n} x_{t-1}\varepsilon_t,\; \sum_{t=1}^{n} \varepsilon_t \right)^T, \qquad \varepsilon_t = x_t - \alpha x_{t-1} - \lambda. \qquad (4.1.1)$$
It is useful to express U_n as U_n(α, λ) = Σ_{t=1}^n u_t, where u_t = (X_{t-1}ε_t, ε_t)^T.

Next we show that U_n is a regular estimating function. It is well known that score functions are martingales, and it is easy to verify that U_n is a martingale with respect to F_n = σ(X_0, X_1, ..., X_n). The martingale differences of the score are u_t = U_t − U_{t-1} = (X_{t-1}ε_t, ε_t)^T, which have the following variance,

$$E[u_t u_t^T] = \begin{pmatrix} E[X_{t-1}^2] & E[X_{t-1}] \\ E[X_{t-1}] & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{1-\alpha^2} + \frac{\lambda^2}{(1-\alpha)^2} & \frac{\lambda}{1-\alpha} \\ \frac{\lambda}{1-\alpha} & 1 \end{pmatrix}.$$
This matrix is positive definite since, for any 2-dimensional vector l = (l_1, l_2)^T, l^T E[u_t u_t^T] l = E[(l_1 X_{t-1} + l_2)^2] is positive except if l_1 = l_2 = 0. The variance of U_n is simply n times the variance of u_t and is positive definite.
The partial derivatives of U_n exist and are given by the following matrix:

$$\dot U_n = -\begin{pmatrix} \sum_{t=1}^{n} X_{t-1}^2 & \sum_{t=1}^{n} X_{t-1} \\ \sum_{t=1}^{n} X_{t-1} & n \end{pmatrix}. \qquad (4.1.2)$$

For any non-zero vector l = (l_1, l_2)^T, the quadratic form l^T \dot U_n l = −Σ_{t=1}^n (l_1 X_{t-1} + l_2)^2 is negative except for the case where X_0 = X_1 = ⋯ = X_n. Therefore U_n is a regular estimating function.

The solution of U_n(α̂, λ̂) = 0 gives the following parameter estimates,

$$\hat\alpha = \frac{n\sum_{t=1}^{n} X_{t-1}X_t - \sum_{t=1}^{n} X_{t-1}\sum_{t=1}^{n} X_t}{n\sum_{t=1}^{n} X_{t-1}^2 - \left(\sum_{t=1}^{n} X_{t-1}\right)^2}, \qquad \hat\lambda = \frac{1}{n}\left( \sum_{t=1}^{n} X_t - \hat\alpha \sum_{t=1}^{n} X_{t-1} \right).$$
Definition 4.1.5 Let {m_t}_{t=1}^∞ be a sequence of martingale differences which generates the martingale M_n = Σ_{t=1}^n m_t. Then ⟨M⟩_n = Σ_{t=1}^n E[m_t m_t^T | F_{t-1}] is called the quadratic characteristic of M_n and [M]_n = Σ_{t=1}^n m_t m_t^T is called the quadratic variation of M_n.

Remark
If M_n is a martingale with respect to F_n then [M]_n − ⟨M⟩_n is also a martingale with respect to the same sigma field F_n.
Definition 4.1.6 A sequence {X_t}_{t=1}^∞ is said to be stationary if for every finite k the joint distribution of the vector (X_t, X_{t+1}, ..., X_{t+k}) is independent of t.

Definition 4.1.7 A sequence {X_t}_{t=1}^∞ is said to be α-mixing if there exists a sequence of positive numbers {α(t)}_{t=1}^∞ convergent to zero, such that

|P(A ∩ B) − P(A)P(B)| ≤ α(n)

for any set A ∈ σ(X_1, X_2, ..., X_k), any set B ∈ σ(X_{k+n}, X_{k+n+1}, ...) and k ≥ 1, n ≥ 1.

Definition 4.1.8 Let {X_t}_{t=1}^∞ be a stationary stochastic process defined on the probability space (Ω, F, P). If for every two sets A and B ∈ F, lim_{n→∞} P(X_1 ∈ A ∩ X_n ∈ B) = P(X_1 ∈ A)P(X_n ∈ B), then the sequence {X_t}_{t=1}^∞ is called ergodic.
Theorem 4.1.1 Let g be a measurable function onto ℝ^s and define Y_t = g(X_t, X_{t+1}, ..., X_{t+τ}). Then
1) if {X_t}_{t=1}^∞ is stationary then {Y_t}_{t=1}^∞ is stationary,
2) if {X_t}_{t=1}^∞ is α-mixing with α(n) = O(n^{-r}) for some r > 0 then {Y_t}_{t=1}^∞ is α-mixing with α(n) = O(n^{-r}),
3) if {X_t}_{t=1}^∞ is ergodic then {Y_t}_{t=1}^∞ is ergodic.

For a proof of parts 1 and 3 see Stout (1974, pp. 170, 182) and for a proof of part 2 see White and Domowitz (1984, Lemma 2.1).

Remarks
1) Parts 1 and 3 hold even when the number of arguments of g is infinite.
2) If {X_t}_{t=1}^∞ is stationary and α-mixing then it is ergodic. The converse is not always true.
For a proof of the Ergodic theorem see Stout (1974, p. 181).

The following regularity conditions are needed to establish a central limit theorem. Let {m_t}_{t=1}^∞ be a sequence of martingale differences which generate the martingale M_n = Σ_{t=1}^n m_t.
Theorem 4.1.3 If M_n is a one dimensional zero mean square integrable martingale satisfying conditions C1 and C2 then

$$\langle M \rangle_n^{-1/2} M_n \to_d N(0,1) \quad \text{(stable)},$$

and if condition C2 is replaced by condition C2' then

$$[M]_n^{-1/2} M_n \to_d N(0,1) \quad \text{(mixing)}.$$
A proof of this result can be found in Hall and Heyde (1980, ch. 3).

Remarks
1) Condition C1 can be replaced by weaker conditions, see Hall and Heyde (1980, ch. 3).
2) Loosely speaking, the regularity conditions ensure that M_n is not dominated by a few of the m_t's and that the variance is standardized to unity.
3) If the martingale difference sequence {m_t}_{t=1}^∞ is stationary and var[m_t] < ∞ then condition C1 is satisfied.
4) If the martingale difference sequence {m_t}_{t=1}^∞ is stationary ergodic and E[m_t^2] < ∞ then conditions C2 and C2' are satisfied with η² = 1 almost surely, and the result is called asymptotic normality.
5) If η² is not degenerate the result is called mixed asymptotic normality and the model is called non-ergodic.
6) For the multivariate case, one simply checks that for all non-zero p-dimensional vectors l the conditions for the univariate martingale l^T M_n hold. In this case the Cramér-Wold (1936) device says that the convergence results above hold for the vector M_n.
Example 4.1.2 In this example we show that the estimating function in Example 4.1.1 satisfies condition C1. For any 2-dimensional vector l = (l_1, l_2)^T we have Var[l^T u_t] = l^T E[u_t u_t^T] l, which we showed in Example 4.1.1 to be equal to l_1² E[X_{t-1}²] + 2 l_1 l_2 E[X_{t-1}] + l_2². The variance of l^T U_n is simply n times this, and substituting into condition C1 gives the required bound. We defer verifying conditions C2 and C2' to Example 4.1.5.
Theorem 4.1.4 (Continuous Mapping Theorem, CMT) Let Z_n and Z be random vectors from some sample space to k-dimensional Euclidean space ℝ^k. Further let g(·) be a measurable function from ℝ^k to ℝ^m. We will allow g(·) to be discontinuous at a set of zero measure points D_g. If Z_n →_d Z and if P(Z ∈ D_g) = 0 then the Continuous Mapping Theorem says that g(Z_n) →_d g(Z).

See McCabe and Tremayne (1993, Section 4.4) for a sketch of the proof and for many illustrations of its use.
Remark
The following three cases of the CMT are widely used. If X_n →_d X and Y_n →_d c then X_n + Y_n →_d X + c, X_n Y_n →_d cX and, provided c ≠ 0, X_n/Y_n →_d X/c.

We require the following additional regularity conditions for the next theorem.

C3) Ψ̇_n^{-1}(θ*)Ψ_n(θ) →_P 0, for any θ* in a suitable neighborhood of θ.

C4) Ψ̇_n^{-1}(θ)Ψ̇_n(θ*) →_P I_p, for any θ* in a suitable neighborhood of θ.
Theorem 4.1.5 Let Ψ_n be a regular inference function and let θ̂_n be a solution of Ψ_n(θ) = 0. If the regularity conditions C1, C2, C3 and C4 are satisfied then

$$\langle \Psi(\theta) \rangle_n^{-1/2}\, \dot\Psi_n(\theta)\,(\hat\theta_n - \theta) \to_d N(0, I_p).$$

The proof of this result is outlined in Godambe and Heyde (1987).

Remarks
1) In condition C3 a suitable neighborhood of θ means one such that θ̂_n is in the neighborhood for all n greater than some number N.
2) The result of Theorem 4.1.5 also holds when condition C2 is replaced by C2' and ⟨Ψ(θ)⟩_n is replaced by [Ψ(θ)]_n.

The following regularity conditions, for the sequence of martingale differences {ψ_t}, lead to a simplification of Theorem 4.1.5.
C5) For all s and t, E[ψ_s ψ_s^T] = E[ψ_t ψ_t^T] < ∞ and E[ψ̇_s] = E[ψ̇_t], where ψ̇_t is the matrix of partial derivatives of ψ_t with respect to the components of θ.

C6) n^{-1}[Ψ]_n converges almost surely.

C7) n^{-1}Ψ̇_n converges almost surely.

Remarks
1) Condition C5 implies condition C1.
2) Conditions C5 and C6 imply either condition C2 or C2'.
3) If {ψ_t} is stationary then condition C5 holds.
4) If {ψ_t} is stationary ergodic, E[ψ_t ψ_t^T] < ∞ and E[|ψ̇_t|] < ∞ then conditions C6 and C7 hold.
Definition 4.1.4 When condition C5 holds for the regular estimating function Ψ_n we define the variability matrix as V(θ) = Var[ψ_t(θ)] = E[ψ_t(θ)ψ_t^T(θ)].

Remark
A desired property of an estimating function is that it gives similar estimates in repeated samples. That is, we would like the function Ψ_n to not vary much from sample to sample, or the variance of Ψ_n to be as small as possible. We therefore desire V to be as "small" as possible.
Definition 4.1.4 When condition C5 holds for the regular estimating function Ψ_n we define the sensitivity matrix as S(θ) = E[ψ̇_t(θ)].

Remark
Another desired property of an estimating function is that it should be easy to distinguish between small changes in the parameters. The steeper the estimating function is around the true parameter value, the easier it is to distinguish and identify the parameter. It is therefore desirable for S(θ) to be as "large" as possible.
Definition 4.1.4 When condition C5 holds for the regular estimating function Ψ_n we define the Godambe information matrix as j = S^T(θ)V^{-1}(θ)S(θ).
Definition 4.1.9 When estimation is based on the score and it satisfies the conditions for a regular estimating function as well as condition C5, then we define the Fisher information matrix as i = V = −S.
Remarks
1) The Godambe information can be interpreted as a measure of the amount of information that can be obtained about the parameter from the estimating function.
2) All the estimating functions considered in this chapter satisfy conditions C5, C6 and C7.
3) All of the estimating functions considered in this thesis are score or quasi-score functions, that is Ψ_n = L̇_n, where L_n(x_1, x_2, ..., x_n; θ) is either a likelihood or a pseudo likelihood. Since in this case Ψ̇_n will be a matrix of second partial derivatives of L_n with respect to θ, S(θ) will be a symmetric matrix. By pseudo likelihood we mean that the true likelihood is replaced by a simpler likelihood, which retains some of the properties of the true likelihood. For example, the Poisson AR(1) likelihood could be replaced by the pseudo Gaussian AR(1) likelihood, which is much simpler. Recall that the Poisson AR(1) model and the Gaussian AR(1) model have the same conditional mean, marginal mean, autocorrelation function and partial autocorrelation function.
Corollary 4.1.1 Let Ψ_n be a regular inference function and let θ̂_n be a solution of Ψ_n(θ) = 0. If the regularity conditions C3, C4, C5, C6 and C7 are satisfied then

$$n^{1/2}(\hat\theta_n - \theta) \to_d N(0, j^{-1}).$$

Proof. From Theorem 4.1.5 we have convergence to a p-dimensional standard normal random variable Z. Applying the CMT, the left hand side is equivalent to n^{1/2}(θ̂_n − θ), while the random variable on the right hand side has a multivariate normal distribution with variance covariance matrix S^{-1}(θ)V(θ)S^{-T}(θ) = j^{-1}.
Example 4.1.4 We show that condition C5 holds for the Gaussian AR(1) model in Example 4.1.1. In Example 4.1.1 we showed that E[u_t u_t^T] is finite and, due to stationarity, independent of t. The expected value of u̇_t is given by

$$E[\dot u_t] = -\begin{pmatrix} E[X_{t-1}^2] & E[X_{t-1}] \\ E[X_{t-1}] & 1 \end{pmatrix},$$

which is also finite and independent of t. Hence condition C5 is satisfied. We will defer showing that conditions C6 and C7 hold to Example 4.1.5.
Ideally we would like the variation in θ̂_n to be as small as possible. That is, we want j^{-1} to be small, or the Godambe information to be large. The following regularity condition allows comparison between the Godambe information and the Fisher information. We denote the score function as U_n(x; θ) = p_n(x; θ)^{-1} (∂/∂θ) p_n(x; θ).

C8) The score function is a regular inference function and the order of integration and differentiation may be interchanged in forming the relevant expectations.

Theorem 4.1.6 For a regular estimating function which satisfies conditions C5 through C8 the Godambe information is always less than or equal to the Fisher information. More specifically, det(i − j) ≥ 0.
Next we state a strong law of large numbers for martingales and prove some special cases which will be used later in the thesis.

Theorem 4.1.7 (Martingale strong law) Let {Y_t, F_t} be a martingale difference sequence, where F_t = σ(Y_1, Y_2, ..., Y_t) is the σ-field generated by Y_1, Y_2, ..., Y_t. If there exists an r ≥ 1 such that Σ_{t=1}^∞ E[|Y_t|^{2r}]/t^{1+r} < ∞, then n^{-1} Σ_{t=1}^n Y_t → 0 almost surely as n → ∞.

For a proof of this strong law see Chow (1960) and Stout (1974).

Remark
If there exists an M such that E[Y_t²] < M for all t, then n^{-1} Σ_{t=1}^n Y_t → 0 almost surely, since Σ_{t=1}^∞ E[Y_t²]/t² ≤ M Σ_{t=1}^∞ t^{-2} < ∞.
Next, in Theorems 4.1.8 and 4.1.9, we prove a strong law of large numbers which only requires restrictions on moments. In such cases, Theorems 4.1.8 and 4.1.9 can be used instead of the Ergodic theorem, which requires showing the process is ergodic. Suppose a stochastic process {X_t}_{t=0}^∞ has the following moment restrictions:

sup_t E[|X_t|^{2k}] < ∞ and E[X_t^k] = E[X_s^k], (4.1.8)

for all positive integers s and t and k = 1, 2, ..., p. Further suppose the conditional expectation of X_t^k given F_{t-1} = σ(X_1, X_2, ..., X_{t-1}) is a polynomial in X_{t-1} of degree k. That is, the conditional expectation can be written as

E[X_t^k | F_{t-1}] = a_{k1}X_{t-1}^k + a_{k2}X_{t-1}^{k-1} + ⋯ + a_{k,k}X_{t-1} + a_k, (4.1.9)

where 0 ≤ a_{k1} < 1.
Theorem 4.1.8 Let {X_t}_{t=0}^∞ be a stochastic process which satisfies the moment restrictions in (4.1.8) and (4.1.9). Then the strong law of large numbers holds for X_t^k, that is, n^{-1} Σ_{t=1}^n X_t^k → E[X_t^k] almost surely as n → ∞, for k = 1, 2, ..., p.

Proof. First note that conditions (4.1.8) and (4.1.9) together imply

E[X_t^k] = (1 − a_{k1})^{-1} { a_{k2}E[X_t^{k-1}] + a_{k3}E[X_t^{k-2}] + ⋯ + a_{k,k}E[X_t] + a_k }.

We will proceed to prove the result by induction on k. Consider the case k = 1. Let Y_t = X_t − a_{11}X_{t-1} − a_1 and note that {Y_t, F_t} forms a martingale difference sequence. Further, Y_t satisfies condition (4.1.8) and hence satisfies the conditions for Theorem 4.1.7. Therefore, as n → ∞,

n^{-1} Σ_{t=1}^n (X_t − a_{11}X_{t-1} − a_1) → 0 almost surely,

which implies that n^{-1} Σ_{t=1}^n X_t → a_1/(1 − a_{11}) = E[X_t].
We now assume that for l = 1, 2, ..., k, where k < p, n^{-1} Σ_{t=1}^n X_t^l → E[X_t^l] almost surely. Let Y_t = X_t^{k+1} − a_{k+1,1}X_{t-1}^{k+1} − a_{k+1,2}X_{t-1}^k − ⋯ − a_{k+1,k+1}X_{t-1} − a_{k+1}. Then {Y_t, F_t} is again a martingale difference sequence and satisfies condition (4.1.8). By Theorem 4.1.7 the corresponding sums converge almost surely as n → ∞. The proof then follows by induction on k.
Theorem 4.1.9 Under the assumptions of Theorem 4.1.8,

n^{-1} Σ_{t=j+1}^n X_{t-j} X_t → E[X_{t-j} X_t]

almost surely as n → ∞.

Proof. Note that repeated application of the conditional expectation assumption implies that E[X_t | F_{t-j}] = b_j X_{t-j} + c_j for some b_j and c_j, and this implies that E[X_{t-j}X_t] = b_j E[X_{t-j}²] + c_j E[X_{t-j}]. Let Y_t = X_{t-j}(X_t − b_j X_{t-j} − c_j). Then {Y_t, F_t} is again a martingale difference sequence and satisfies condition (4.1.8). By Theorem 4.1.7 we have the required almost sure convergence.
Remarks
1) Under the same conditions the more general result n^{-1} Σ_t X_{t-j}^k X_t^l → E[X_{t-j}^k X_t^l] can be proven using basically the same proof, only it is more cumbersome to write down.
2) We believe that these are new results.
3) The conditions for Theorem 4.1.7 are satisfied by the Poisson AR(1) model.
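The strong laws above can be checked by simulating the Poisson AR(1) recursion (binomial thinning of the previous count plus a Poisson innovation) and comparing sample moments with the stationary Poisson(λ/(1−α)) moments. A sketch of ours, assuming numpy is available:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, lam, n = 0.4, 5.2, 200_000
mu = lam / (1 - alpha)                  # stationary Poisson mean, about 8.667

x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(mu)                  # start in the stationary distribution
for t in range(1, n):
    # binomial thinning of the previous count plus a Poisson(lam) innovation
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

m1 = x.mean()                                # should approach E[X] = mu
m2 = (x.astype(float) ** 2).mean()           # should approach E[X^2] = mu + mu^2
c1 = (x[:-1].astype(float) * x[1:]).mean()   # E[X_{t-1} X_t] = alpha E[X^2] + lam E[X]
```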
Example 4.1.5 We continue with Example 4.1.4. Since the unconditional distribution of X_t is normal with mean λ/(1−α) and variance 1/(1−α²), we have that for all s and t, E[X_t^k] = E[X_s^k] < ∞, k = 1, 2, .... Also, the conditional expectation of X_t given X_{t-1} is linear in X_{t-1}. Hence we can apply Theorems 4.1.8 and 4.1.9 to conclude that n^{-1}[U]_n converges almost surely, establishing conditions C6 and C2'.

Theorems 4.1.8 and 4.1.9 can also be applied to n^{-1}U̇_n(θ*), which gives us n^{-1}U̇_n(θ*) → S almost surely, where the equality to S results because u̇_t is independent of the parameters; this establishes condition C7. Applying the CMT we establish condition C3. Also, since U̇_n(θ) = U̇_n(θ*) for all θ and θ* in the parameter space, condition C4 is trivially satisfied.

Since conditions C3 through C7 are satisfied, Corollary 4.1.1 gives us the following asymptotic result,

$$n^{1/2}\left( (\hat\alpha_n, \hat\lambda_n)^T - (\alpha, \lambda)^T \right) \to_d N(0, i^{-1}),$$

where i = V and appears in (4.1.7).
4.2 Conditional least squares for the Poisson AR(1) model
We begin by defining the family of statistical models. Let μ be the counting measure, Ω^n = N_0^n, where N_0 is the set of non-negative integers, let F be the set of all possible subsets of Ω^n, and Θ = (0,1) × ℝ^+. We define the family of probability measures P^n by the following version of the Radon-Nikodym derivative,

$$p_n(x; \alpha, \lambda) = p(x_0) \prod_{t=1}^{n} p(x_t \mid x_{t-1}),$$

where p(x_t | x_{t-1}) is the conditional probability (4.2.1), p(x_0) is the Poisson(λ/(1−α)) marginal probability (4.2.2), and (α, λ) ∈ Θ. This defines the Poisson AR(1) model given in (2.1.1).
In this section we consider estimation of the Poisson AR(1) model parameters by the method known as conditional least squares (CLS). Klimko and Nelson (1978) and Hall and Heyde (1980, ch. 6) consider CLS estimation and inference for stochastic processes. The parameter estimates are selected to minimize the sum of the squared distances of each observation X_t from its conditional expected value given the previous observations X_0, X_1, ..., X_{t-1}, E[X_t | F_{t-1}] = αX_{t-1} + λ, where F_t is the standard filtration, that is F_t = σ(X_0, X_1, ..., X_t). The problem is equivalent to maximizing the following function over the parameter space,

$$Q_n(\alpha, \lambda) = -\frac{1}{2}\sum_{t=1}^{n}(X_t - \alpha X_{t-1} - \lambda)^2.$$
The first order conditions to this maximization problem lead to the estimating function Ψ_n. Note this estimating function is the same as the score function for the Gaussian AR(1) model given in (4.1.1) and is therefore called a quasi-score. The following are a direct consequence of this equivalence:
1) from (4.1.2), the partial derivatives of the estimating function are the same,
2) this matrix is non-singular (except if X_0 = X_1 = ⋯ = X_n = 0),
3) the solutions to Ψ_n(α̂, λ̂) = 0 are the same as for the Gaussian AR(1) score (note that in cases where α̂ is negative we set it equal to zero),
4) Ψ_n is a martingale with respect to F_n = σ(X_0, X_1, ..., X_n), since the conditional expectation of X_t given X_{t-1} is the same in both the Poisson and Gaussian AR(1) models.
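The closed-form CLS estimates can be checked on simulated data. The sketch below is our own illustration, assuming numpy; the estimator is the least squares fit of X_t on X_{t-1} with an intercept:

```python
import numpy as np

def cls_estimates(x):
    # Solve the CLS normal equations: regress X_t on X_{t-1} with an intercept.
    y = x[1:].astype(float)
    z = x[:-1].astype(float)
    n = len(y)
    alpha_hat = ((n * (z * y).sum() - z.sum() * y.sum())
                 / (n * (z * z).sum() - z.sum() ** 2))
    alpha_hat = max(alpha_hat, 0.0)     # truncate negative estimates at zero
    lam_hat = (y.sum() - alpha_hat * z.sum()) / n
    return alpha_hat, lam_hat

# Simulate a Poisson AR(1) path at the example parameter values.
rng = np.random.default_rng(7)
alpha, lam, n = 0.4, 5.2, 100_000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

a_hat, l_hat = cls_estimates(x)
```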
Next we calculate the variance of ψ_t. Note that the expectations are now taken with respect to the Poisson AR(1) model. The resulting matrix is positive definite since, for any 2-dimensional vector l = (l_1, l_2)^T, the quantity l^T E[ψ_t ψ_t^T] l is positive except if l_1 = l_2 = 0. The expected value of ψ̇_t is the same as in the Gaussian AR(1) case. Since E[ψ_t ψ_t^T] and E[ψ̇_t] are both finite and independent of t, condition C5 holds and we define the variability matrix and the sensitivity matrix respectively as V = E[ψ_t ψ_t^T] and S = E[ψ̇_t].
The Godambe information j = S^T V^{-1} S and its inverse j^{-1} = S^{-1} V S^{-T} can then be written out explicitly in terms of α and λ. Note these explicit expressions for the Godambe information and its inverse are new.
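For given parameter values the matrices V, S and j^{-1} = S^{-1}V S^{-T} can be evaluated numerically from the Poisson marginal moments. A sketch of ours, assuming numpy; it uses the model's conditional variance α(1−α)X_{t-1} + λ and the standard Poisson(μ) raw moments E[X] = μ, E[X²] = μ + μ², E[X³] = μ³ + 3μ² + μ:

```python
import numpy as np

alpha, lam = 0.4, 5.2
mu = lam / (1 - alpha)
m1, m2, m3 = mu, mu + mu**2, mu**3 + 3 * mu**2 + mu   # Poisson(mu) raw moments
c = alpha * (1 - alpha)          # Var[X_t | X_{t-1}] = c * X_{t-1} + lam

# V = E[psi_t psi_t^T] with psi_t = (X_{t-1} eps_t, eps_t)^T:
# each entry conditions on X_{t-1} and averages the conditional variance.
V = np.array([[c * m3 + lam * m2, c * m2 + lam * m1],
              [c * m2 + lam * m1, c * m1 + lam]])
# S = E[psi_t-dot], identical to the Gaussian AR(1) case.
S = -np.array([[m2, m1],
               [m1, 1.0]])

J = S.T @ np.linalg.inv(V) @ S   # Godambe information j = S^T V^{-1} S
avar = np.linalg.inv(J)          # asymptotic covariance of sqrt(n)(estimates - truth)
```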
Condition (4.1.8) is satisfied since the marginal distribution of X_t is Poisson with mean λ/(1−α) and all moments of the Poisson distribution exist and are finite. Condition (4.1.9) is also satisfied since E[X_t | X_{t-1}] = αX_{t-1} + λ. We can therefore use Theorems 4.1.8 and 4.1.9 to show the following:
1) n^{-1}[Ψ]_n → V almost surely, which is condition C6,
2) n^{-1}Ψ̇_n → S almost surely, which is condition C7.

Since Ψ̇_n(α, λ) is independent of α and λ, condition C4 holds. Further, condition C3 follows from writing Ψ̇_n^{-1}(α, λ)Ψ_n(α, λ) = (n^{-1}Ψ̇_n(α, λ))^{-1} n^{-1}Ψ_n(α, λ), which by the CMT converges in probability to zero.

Finally we can apply Corollary 4.1.1, since conditions C3 through C7 are satisfied, to get

$$n^{1/2}\left( (\hat\alpha_n, \hat\lambda_n)^T - (\alpha, \lambda)^T \right) \to_d N(0, j^{-1}).$$
4.3 Generalized least squares for the Poisson AR(1) model
In this section we use generalized (weighted) least squares (GLS) to estimate the model parameters, see Wooldridge (1991a). In this method, parameter estimates are selected to minimize the sum of the weighted squared distances of each observation X_t from its conditional expected value given the previous observation X_{t-1}, E[X_t | X_{t-1}] = αX_{t-1} + λ. The weights are the inverse of the estimated conditional variance V̂ar[X_t | X_{t-1}] = α̂_n(1 − α̂_n)X_{t-1} + λ̂_n, where α̂_n and λ̂_n are strongly consistent estimates, such as the conditional least squares estimates. Observations with large conditional variances are given smaller weightings than those with smaller conditional variances. This problem is equivalent to minimizing the following function over the parameter space,

$$Q_n(\alpha, \lambda) = \sum_{t=1}^{n} \frac{(X_t - \alpha X_{t-1} - \lambda)^2}{\hat\alpha_n(1-\hat\alpha_n)X_{t-1} + \hat\lambda_n}.$$
The first order conditions to this minimization problem lead to the estimating function Ψ_n defined in (4.3.1). The sequence {ψ_t}, where ψ_t = Ψ_t − Ψ_{t-1}, is a martingale difference sequence. Therefore Ψ_n is a martingale with respect to F_t = σ(X_0, X_1, ..., X_t, α̂_n, λ̂_n). The partial derivatives of ψ_t with respect to α and λ follow directly.
The matrix Ψ̇_n = Σ_{t=1}^n ψ̇_t is non-singular except if all of the X_t's are zero. Notice that the conditional covariance matrix of ψ_t is positive definite, and hence the variance of Ψ_n is also positive definite. Ψ_n is therefore a regular inference function. If we assume that λ̂_n > δ for all n, for some δ > 0, then ψ_t ψ_t^T is dominated by a function which has a finite expectation, and the Dominated Convergence Theorem applies.
Further, this implies the convergence of the corresponding expectations. Note that strong consistency of α̂_n and λ̂_n is required to use the Dominated Convergence Theorem. Similarly, ψ̇_t is dominated by a function which has a finite expectation, so by the Dominated Convergence Theorem the expectation of ψ̇_t also converges. We will therefore define the variability matrix V and sensitivity matrix S as the corresponding limits.
Theorem 4.3.1 Let Ψ_n be the inference function defined in (4.3.1). Then the following converge under the Poisson AR(1) model:

n^{-1}[Ψ]_n →_P V, (4.3.6)

n^{-1}Ψ̇_n →_P S. (4.3.7)
Before we prove this result we need to show the following new result.

Proposition 4.3.1 If the stochastic process {X_t} follows the Poisson AR(1) model with α ∈ (0,1), then it is α-mixing with α(n) = O(α^n).
Proof. We let p_n(X_{t+n} | X_t), p(X_t | X_{t-1}) and p(X_t) denote respectively the conditional probability of X_{t+n} given X_t, the conditional probability of X_t given X_{t-1} and the marginal probability of X_t. These are respectively defined in (3.3.1), (4.2.1) and (4.2.2). We denote the joint distribution of X_1, X_2, ..., X_k, the joint distribution of X_{k+n}, X_{k+n+1}, ..., and the joint distribution of X_1, ..., X_k, X_{k+n}, X_{k+n+1}, ... in the obvious way.

For any A ∈ σ(X_1, X_2, ..., X_k) and any B ∈ σ(X_{k+n}, X_{k+n+1}, ...), the difference P(A ∩ B) − P(A)P(B) can be bounded by comparing the n-step transition probability p_n(x | X_k) with the marginal probability p(x). Writing out p_n(x | X_k) as the convolution of a Binomial(X_k, α^n) count and a Poisson count with mean λ(1−α^n)/(1−α), every term in the difference p_n(x | X_k) − p(x) carries a factor α^n, with a multiplier depending on X_k that has finite expectation under the Poisson marginal. Summing over the sets A and B and taking expectations therefore gives

|P(A ∩ B) − P(A)P(B)| ≤ Cα^n

for some constant C not depending on n. Therefore {X_t} is α-mixing with α(n) = O(α^n).
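The geometric rate in Proposition 4.3.1 is mirrored in the model's autocovariance function, Cov(X_t, X_{t+n}) = α^n Var(X_t) with Var(X_t) = λ/(1−α), a standard property of the Poisson AR(1). A simulation check of ours, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(11)
alpha, lam, n = 0.4, 5.2, 200_000
mu = lam / (1 - alpha)           # marginal mean and variance (Poisson)

x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(mu)
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

d = x.astype(float) - mu
covs = [(d[:-j] * d[j:]).mean() for j in range(1, 5)]   # sample autocovariances
theory = [alpha ** j * mu for j in range(1, 5)]         # geometric decay alpha^j * mu
```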
Proof (Theorem 4.3.1). We begin by introducing some notation. Let Y_t denote the entries of ψ_t ψ_t^T, let Y_t* denote the corresponding entries with the estimated weights removed, and let S_n = Σ_{t=1}^n Y_t and S_n* = Σ_{t=1}^n Y_t*. Showing (4.3.6) is equivalent to showing n^{-1}S_n →_P V.

Since {X_t}_{t=1}^∞ is stationary and α-mixing with α(n) = O(α^n), by Theorem 4.1.1 {Y_t}_{t=1}^∞ is also stationary and α-mixing with α(n) = O(α^n). Further, {X_t}_{t=1}^∞ and {Y_t}_{t=1}^∞ are both ergodic.

Since Y_t and Y_t* are both positive and Y_t is bounded by a multiple of Y_t* (using λ̂_n > δ), the variance of n^{-1}S_n is less than or equal to the variance of the corresponding multiple of n^{-1}S_n*. By the Ergodic theorem n^{-1}S_n* → E[Y_t*] almost surely, so this quantity converges almost surely and its asymptotic variance is zero. Chebyshev's inequality can then be used to show that n^{-1}S_n − E[n^{-1}S_n] →_P 0. Since lim_{n→∞} E[Y_t] = V we therefore have the desired result, n^{-1}S_n →_P V. The proof of (4.3.7) is similar and is omitted.
Condition C2' is satisfied since we have shown that n^{-1}[Ψ]_n →_P V and lim_{n→∞} n^{-1}Var[Ψ_n] = V. Condition C1 is also satisfied since the summands have uniformly bounded variances. Condition C4 is satisfied since [Ψ̇_n(α, λ)]^{-1}Ψ̇_n(α̂_n, λ̂_n) = I_2. Finally, condition C3 is satisfied since Ψ̇_n^{-1}(α, λ)Ψ_n(α, λ) = (n^{-1}Ψ̇_n(α, λ))^{-1} n^{-1}Ψ_n(α, λ), which by the CMT (Theorem 4.1.4) converges in probability to zero. We can therefore use Theorem 4.1.5, and by the CMT (Theorem 4.1.4) the following holds:

$$n^{1/2}\left( (\hat\alpha_n, \hat\lambda_n)^T - (\alpha, \lambda)^T \right) \to_d N(0, S^{-1} V S^{-T}).$$
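A GLS step can be sketched as a weighted least squares fit with weights equal to the inverse estimated conditional variance at pilot (e.g. CLS) estimates. This is our own illustration, assuming numpy; the function name gls_step is ours:

```python
import numpy as np

def gls_step(x, alpha0, lam0):
    # Weighted least squares of X_t on (X_{t-1}, 1) with weights
    # 1 / (alpha0*(1-alpha0)*X_{t-1} + lam0), the inverse estimated
    # conditional variance at the pilot estimates (alpha0, lam0).
    y = x[1:].astype(float)
    z = x[:-1].astype(float)
    w = 1.0 / (alpha0 * (1 - alpha0) * z + lam0)
    Z = np.column_stack([z, np.ones_like(z)])
    A = Z.T @ (w[:, None] * Z)
    b = Z.T @ (w * y)
    return np.linalg.solve(A, b)        # (alpha_hat, lam_hat)

# Simulate a Poisson AR(1) path and apply one GLS step.
rng = np.random.default_rng(3)
alpha, lam, n = 0.4, 5.2, 100_000
x = np.empty(n, dtype=np.int64)
x[0] = rng.poisson(lam / (1 - alpha))
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)

a_gls, l_gls = gls_step(x, 0.4, 5.2)    # pilot values; in practice use CLS estimates
```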
The following theorem gives an approximation for V.

Theorem 4.3.2 The function V in (4.3.5) can be approximated, up to a remainder term R, by a Taylor expansion of the weight function about the marginal mean λ/(1−α), where the remainder R is bounded.
Proof: Making use of the following two identities,

we can rewrite the information matrix as follows,

To approximate the expectation above we consider the following function,
The kth derivative of g(x) with respect to x is,

and a Taylor expansion about the point λ/(1 − α) is,

where

R(ξ) will be largest in absolute value when ξ = 0. Now we can approximate the expected value of g(X_{t−1}):

where the error in the approximation is negative and bounded in absolute value by
4.4 The score and Fisher information for the Poisson AR(1) model
In this section we derive expressions for the score function and observed Fisher information for the Poisson AR(1) model. The conditional likelihood given X_0 is L(α, λ) = Π_{t=1}^n p(X_t | X_{t−1}), where p(X_t | X_{t−1}) is defined in (4.2.1).
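The conditional probability p(X_t | X_{t−1}) in (4.2.1) is the convolution of a Binomial(X_{t−1}, α) survival count with Poisson(λ) arrivals, so it and the conditional log-likelihood can be evaluated by a finite sum. A minimal sketch (function names are ours):

```python
from math import comb, exp, factorial, log

def trans_prob(x_t, x_prev, alpha, lam):
    """p(X_t | X_{t-1}): Binomial(x_prev, alpha) thinning convolved
    with Poisson(lam) arrivals."""
    if x_t < 0:
        return 0.0
    return sum(comb(x_prev, s) * alpha ** s * (1 - alpha) ** (x_prev - s)
               * exp(-lam) * lam ** (x_t - s) / factorial(x_t - s)
               for s in range(min(x_t, x_prev) + 1))

def cond_loglik(xs, alpha, lam):
    """Conditional log-likelihood: sum of log p(X_t | X_{t-1}), t >= 1."""
    return sum(log(trans_prob(xs[t], xs[t - 1], alpha, lam))
               for t in range(1, len(xs)))
```

At α = 0 the transition probability collapses to the Poisson(λ) pmf, the iid case used in Chapter 5.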
Let U_α be the score with respect to α, that is

where l(·) is the log-likelihood. The partial derivative of the conditional probability is found by making use of the following derivative

and the relation
Thus the score with respect to α is,

Let U_λ denote the score with respect to λ, that is

The partial derivative of the conditional probability is found by making use of the following derivative
Thus the score with respect to λ is,

The second derivative of the conditional probability can now easily be found.
(an expression involving the shifted conditional probabilities p(X_t − 2 | X_{t−1} − 2) and p(X_t − 1 | X_{t−1} − 1))
and
The second derivatives of the log-likelihood are then,
and
Next we show how to write the first and second derivatives of the log-likelihood as conditional expectations. To make these conditional expectations more clear, we consider the following example with just two random variables. Let X and Y be independent random variables and denote their densities respectively as f_X(x) and f_Y(y), where the densities are with respect to the measure ν (Lebesgue measure or counting measure). Let Z = X + Y be the convolution of X and Y. The joint distribution of Z and X is f_X(x) f_Y(z − x) (note the Jacobian for this transformation is 1). The density for Z is found by integrating out X:

f_Z(z) = ∫ f_X(x) f_Y(z − x) dν(x).

The conditional density for X given Z is

f_{X|Z}(x | z) = f_X(x) f_Y(z − x) / f_Z(z).

The conditional moments for X and Y are then,
and
Proposition 4.4.1 For the Poisson AR(1) model, with conditional probabilities (4.2.1),

where E_t[·] denotes the conditional expectation with respect to the sigma field F_t = σ(X_0, X_1, …, X_{t−1}, X_t).
Proof: We first show (4.4.4). Rearranging (4.4.1) we see that

Next the left hand side of (4.4.5) is

which is the right hand side of (4.4.5).

Next the left hand side of (4.4.6) is
= α² X_{t−1}(X_{t−1} − 1) p(X_t − 2 | X_{t−1} − 2) / p(X_t | X_{t−1}),

which is the right hand side of (4.4.6).
Next the left hand side of (4.4.7) is
which, after reindexing the summation using (x + 1) C(X_{t−1}, x + 1) = X_{t−1} C(X_{t−1} − 1, x), collapses to a single shifted conditional probability divided by p(X_t | X_{t−1}); this is the right hand side of (4.4.7).
Finally the lefi hand side of (4.4.8) is
= λ² p(X_t − 2 | X_{t−1}) / p(X_t | X_{t−1}),
which is the right hand side of (4.4.8).
Proposition 4.4.1 leads to the following new expressions for the first and second derivatives of the log-likelihood, which involve conditional covariance terms of the form

Σ_t {E_t[(α∘X_{t−1}) ε_t] − E_t[α∘X_{t−1}] E_t[ε_t]},
and
Note that E_t[α∘X_{t−1}] ≠ α X_{t−1} and E_t[ε_t] ≠ λ, whereas E_{t−1}[α∘X_{t−1}] = α X_{t−1} and E_{t−1}[ε_t] = λ. Further note that given X_t the correlation between α∘X_{t−1} and ε_t is −1. This follows because, conditional on X_t, the two components sum to the constant X_t.
In cases where the time series is comprised of low counts the expected information is easy to calculate numerically. For example, if the probability of a count larger than 6 is almost zero then there are 36 outcomes to sum over in the calculation of E[E_t[α∘X_{t−1}]²] or E[E_t[ε_t]²]. Hence the expected information is easy to calculate.
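The truncated-sum idea can be sketched as follows; as a sanity check the same double sum recovers the stationary mean λ/(1 − α) (the cutoff, parameter values and helper names are illustrative):

```python
from math import comb, exp, factorial

def poisson_pmf(x, mean):
    return exp(-mean) * mean ** x / factorial(x) if x >= 0 else 0.0

def trans_prob(x_t, x_prev, alpha, lam):
    # convolution of Binomial(x_prev, alpha) thinning and Poisson(lam) arrivals
    return sum(comb(x_prev, s) * alpha ** s * (1 - alpha) ** (x_prev - s)
               * poisson_pmf(x_t - s, lam)
               for s in range(min(x_t, x_prev) + 1))

def expect_pair(g, alpha, lam, cutoff):
    """Approximate E[g(X_{t-1}, X_t)] by a truncated double sum, using
    the stationary Poisson(lam/(1-alpha)) marginal for X_{t-1}."""
    mu = lam / (1 - alpha)
    return sum(poisson_pmf(i, mu) * trans_prob(j, i, alpha, lam) * g(i, j)
               for i in range(cutoff + 1) for j in range(cutoff + 1))

# sanity check: E[X_t] should equal the stationary mean lam/(1-alpha)
m = expect_pair(lambda i, j: j, alpha=0.4, lam=1.0, cutoff=25)
```

The same `expect_pair` routine can be applied to the squared conditional expectations appearing in the expected information.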
The above representation of the score function leads to a new definition of residuals. The usual way to define a residual is to take the difference between a random variable and its conditional expectation. That is, for the Poisson AR(1) model residuals can be defined as r_t = X_t − E_{t−1}[X_t] = X_t − αX_{t−1} − λ. However, since X_t is comprised of two random components it would be nice to define two sets of residuals, one for each random component.

The natural way to define such residuals is as follows: for the continuation component let r_{1t} = α∘X_{t−1} − αX_{t−1} and for the arrival component let r_{2t} = ε_t − λ. Unfortunately these definitions won't work, because α∘X_{t−1} and ε_t are not observable.

However, we can replace α∘X_{t−1} and ε_t respectively with E_t[α∘X_{t−1}] and E_t[ε_t] (their conditional expectations given the observed values of X_{t−1} and X_t). That is, redefine the residuals as r*_{1t} = E_t[r_{1t}] = E_t[α∘X_{t−1}] − αX_{t−1} and r*_{2t} = E_t[r_{2t}] = E_t[ε_t] − λ.
Remarks
1. These conditional expectations are easily calculated with the help of Proposition 4.4.1.

2. We are using the aggregate data to estimate the individual unobserved processes (the continuation process and the arrival process).

3. The residuals are equivalent to the martingale differences of the score function. Further, note that our new expressions for the score function lead to these new definitions for residuals, which otherwise would not have been obvious.

4. Adding the components of the two new sets of residuals gives the old set of residuals. That is,
5. All the residuals above should be standardized before making residual plots.
6. Since the two sets of residuals are calculated from aggregate data they might be highly correlated. This appears to be the case in Example 4.4.1. For low count series (series with many zeros) there appears to be less correlation; see the residual plots for data set 1, Figure 8.2.1, and the discussion of the residuals in Section 8.2.
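Assuming the closed forms that appear in the proof of Proposition 4.4.1, namely E_t[α∘X_{t−1}] = αX_{t−1} p(X_t − 1 | X_{t−1} − 1)/p(X_t | X_{t−1}) and E_t[ε_t] = λ p(X_t − 1 | X_{t−1})/p(X_t | X_{t−1}), the two residual series can be computed as sketched below (helper names are ours):

```python
from math import comb, exp, factorial

def trans_prob(x_t, x_prev, alpha, lam):
    # p(X_t | X_{t-1}); returns 0 for negative arguments
    if x_t < 0:
        return 0.0
    return sum(comb(x_prev, s) * alpha ** s * (1 - alpha) ** (x_prev - s)
               * exp(-lam) * lam ** (x_t - s) / factorial(x_t - s)
               for s in range(min(x_t, x_prev) + 1))

def continuation_residual(x_t, x_prev, alpha, lam):
    """r*_1t = E_t[alpha o X_{t-1}] - alpha * X_{t-1} (assumed closed form)."""
    p = trans_prob(x_t, x_prev, alpha, lam)
    e_cont = (alpha * x_prev * trans_prob(x_t - 1, x_prev - 1, alpha, lam) / p
              if x_prev > 0 else 0.0)
    return e_cont - alpha * x_prev

def arrival_residual(x_t, x_prev, alpha, lam):
    """r*_2t = E_t[eps_t] - lam (assumed closed form)."""
    p = trans_prob(x_t, x_prev, alpha, lam)
    return lam * trans_prob(x_t - 1, x_prev, alpha, lam) / p - lam
```

By construction the two residuals add up to the ordinary residual X_t − αX_{t−1} − λ (Remark 4), which gives a useful numerical check.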
Example 4.4.1 In this example we continue our analysis of our illustrative data set by examining the residual plots. Figures 4.4.1 and 4.4.2 are respectively the residual plots for the continuation and arrival processes, while Figures 4.4.3 and 4.4.4 are respectively the autocorrelations for the continuation and arrival residuals.

Both residual plots have patterns similar to the Pearson residual plot in Figure 2.5.4. None of the autocorrelations in the continuation residuals are statistically significant at the 5% level. The second autocorrelation for the arrival residuals is borderline significant at the 5% level.
Residuals: Continuation Process (heavy manufacturing industry, males, ages 25-34, burns)

Figure 4.4.1 Residual plot of the continuation process for the time series of monthly claim counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
Residuals: Arrival Process (heavy manufacturing industry, males, ages 25-34, burns)

Figure 4.4.2 Residual plot of the arrival process for the time series of monthly claim counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
Autocorrelations of the Residuals: Continuation Process (heavy manufacturing industry, males, ages 25-34, burns)

Figure 4.4.3 Autocorrelations in the continuation residuals for the time series of monthly claim counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
Autocorrelations of the Residuals: Arrival Process (heavy manufacturing industry, males, ages 25-34, burns)

Figure 4.4.4 Autocorrelations in the arrival residuals for the time series of monthly claim counts of workers collecting STWLB from January 1987 to December 1994. All claimants are male, between the ages of 25 and 34, are employed in the heavy manufacturing industry and are collecting STWLB due to a burn related injury.
4.5 The score and Fisher information for a general AR(1) model

We now show how to extend this to a general AR(1) model of the following form. Let X_t = α∘X_{t−1} + ε_t, where α∘X_{t−1} | X_{t−1} has density f(s | X_{t−1}; α), ε_t has density g(ε; λ) and X_t | X_{t−1} has density

The conditional density of α∘X_{t−1} | X_{t−1}, X_t is

Let F_t = σ(X_0, X_1, …, X_t). The conditional moments of α∘X_{t−1} | F_t and ε_t | F_t can be expressed as follows.
Given a sample x_0, x_1, …, x_n from the above AR(1) model the likelihood is the product of the conditional densities, where we take h(x_0) = 1. The log-likelihood and score are respectively then l(α, λ) = Σ_{i=1}^n log(h(x_i | x_{i−1}; α, λ)) and
If ∂f(s | x_{t−1}; α)/∂α and ∂g(x_t − s; λ)/∂λ are polynomials of the following form,

∂f(s | x_{t−1}; α)/∂α = (a_0 + a_1 s + a_2 s² + ⋯ + a_p s^p) f(s | x_{t−1}; α),

and

∂g(x_t − s; λ)/∂λ = (b_0 + b_1(x_t − s) + b_2(x_t − s)² + ⋯ + b_q(x_t − s)^q) g(x_t − s; λ),

then the score can be rewritten in terms of conditional moments as,
Note that we can write

and

by simply defining γ(s; α) = (∂f(s | x_{t−1}; α)/∂α)/f(s | x_{t−1}; α) and γ̃(x_t − s; λ) = (∂g(x_t − s; λ)/∂λ)/g(x_t − s; λ). Thus the score can be written in terms of conditional expectations as,
The second derivatives of the log-likelihood are,

and

∂²g(x_t − s; λ)/∂λ² = γ̃′(x_t − s; λ) g(x_t − s; λ) + (γ̃(x_t − s; λ))² g(x_t − s; λ).

Using these two equations we can rewrite the second derivatives of the log-likelihood as,
Applying these results to the AR(1) models of Joe (1996) is a bit more complicated, since Joe allows the distribution of α∘X_{t−1} to depend on λ. If we define δ(·) such that ∂f(s | x_{t−1}; α, λ)/∂λ = δ(s; λ) f(s | x_{t−1}; α, λ), then the score function with respect to λ becomes

Similar modifications can be made to the expressions for the second derivatives of the log-likelihood.
In the following three cases α∘X_{t−1} is independent of λ.

1) The Poisson AR(1) model, see Section 4.4.

2) The Gaussian AR(1) model, with α∘X_{t−1} = αX_{t−1} and ε_t ~ N(λ, σ²). This is slightly different from Joe (1996), who lets α∘X_{t−1} | X_{t−1} ~ N(αX_{t−1}, ασ²) and ε_t ~ N(λ, (1 − α)σ²). Note however that both models are equivalent.

3) The generalized Poisson AR(1) model, where X_t has a generalized Poisson distribution with parameters λ/(1 − α) and κ/(1 − α). Note Joe (1996) allows the two parameters to be different.

In these 3 cases the functions γ(s) and γ̃(s) are polynomials in s, and hence the score function and observed Fisher information can be written in terms of the conditional moments of α∘X_{t−1} and ε_t.
4.6 Asymptotics of the conditional maximum likelihood estimators for the Poisson AR(1) model

In Section 4.3 we showed that the Poisson AR(1) model is α-mixing. Since it is also stationary, the model is ergodic. If the Fisher information is finite and positive definite then the regularity conditions C1–C3 and C5–C7 hold. The next theorem shows that the Fisher information is finite.
Theorem 4.6.1 Let u_t = U_{α,t} − U_{α,t−1}, v_t = U_{λ,t} − U_{λ,t−1} and w_t = (u_t, v_t)^T, where U_α and U_λ are the score functions for the Poisson AR(1) model with respect to α and λ respectively. Further, let ẇ_t denote the matrix of partial derivatives of w_t with respect to α and λ. For any two dimensional vector l and any positive integer k,

Proof: For the Poisson AR(1) model we note the following:

for all positive integers k and l. Also note that, for any two dimensional vector l,

is a polynomial in E_t[α∘X_{t−1}] and E_t[ε_t] of degree k, which we write as

for some constants a_1, a_2, …, a_k. The expected value of (l^T w_t)^k is

The proof of the second part is automatic since E[(l^T w_t)²] = E[l^T w_t w_t^T l] = E[l^T ẇ_t l].
The next proposition shows that the Fisher information is positive definite.
Proposition 4.6.1 E[(l^T w_t)²] = 0 if and only if l₁ = l₂ = 0, where l = (l₁, l₂)^T.

Proof: It is sufficient to show that Var[l₁ E_t[α∘X_{t−1}] + l₂ E_t[ε_t]] ≠ 0, or that Var[l₁ αX_{t−1} p(X_t − 1 | X_{t−1} − 1) + l₂ λ p(X_t − 1 | X_{t−1})] ≠ 0 when l is non-zero. We prove the result by contradiction.

Suppose there exists a non-zero l such that Var[l₁ αX_{t−1} p(X_t − 1 | X_{t−1} − 1) + l₂ λ p(X_t − 1 | X_{t−1})] = 0. This implies that

almost everywhere for some constant c. In particular (4.6.4) holds when X_t = 1 and X_{t−1} = 0, which gives us

or l₂ = cλ^{−1}e^λ. Taking X_t = 1 and X_{t−1} = 1 in (4.6.4) we can solve for l₁ as follows

or l₁ = ce^λ. Finally, taking X_t = 1 and X_{t−1} = 2 in (4.6.4) we have

This implies either α = 1, α = 2 or c = 0. Since α = 1 and α = 2 are outside the parameter space we conclude that c = 0; however, this implies that l₁ = l₂ = 0.
It can be shown that the regularity condition C4 follows from the uniform strong law of large numbers; see Ferguson (1996, Ch. 16). Therefore the regularity conditions for Corollary 4.1.1 are satisfied, and as a result the maximum likelihood estimators α̂_n and λ̂_n have the following asymptotic distribution:

where I is the Fisher information matrix.

The parameter estimates can be found using a Newton-Raphson type iterative procedure as follows: let U(θ) be the score function and let U̇(θ) denote the matrix of partial derivatives of U with respect to θ, where θ = (α, λ)^T. The iterative procedure is defined by

θ^(i+1) = θ^(i) − U̇(θ^(i))^{−1} U(θ^(i)).

In some cases this procedure can be modified by replacing U̇(θ^(i)) with E[U̇(θ^(i))] and is sometimes referred to as Fisher scoring.
We use Fisher scoring to estimate the parameters in the Poisson AR(1) model, since E[U̇(θ^(i))] is easy to calculate. The CLS estimates generally work well as starting values. Occasionally (less than 1% of the time) we found that the CLS estimates caused the algorithm to diverge.
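A numerical sketch of the Newton-Raphson iteration follows, with finite differences standing in for the analytic score and information (the thesis uses the analytic expressions and Fisher scoring; this simplified variant, and the toy series below, are only illustrative):

```python
from math import comb, exp, factorial, log

def trans_prob(x_t, x_prev, alpha, lam):
    # p(X_t | X_{t-1}) from (4.2.1)
    return sum(comb(x_prev, s) * alpha ** s * (1 - alpha) ** (x_prev - s)
               * exp(-lam) * lam ** (x_t - s) / factorial(x_t - s)
               for s in range(min(x_t, x_prev) + 1))

def cond_loglik(xs, alpha, lam):
    return sum(log(trans_prob(xs[t], xs[t - 1], alpha, lam))
               for t in range(1, len(xs)))

def cml_newton(xs, alpha0=0.3, steps=25, h=1e-4):
    """Damped Newton-Raphson on the conditional log-likelihood with
    finite-difference score and information; returns the best point seen."""
    a, l = alpha0, sum(xs) / len(xs)
    f = lambda a_, l_: cond_loglik(xs, a_, l_)
    best = (f(a, l), a, l)
    for _ in range(steps):
        ga = (f(a + h, l) - f(a - h, l)) / (2 * h)
        gl = (f(a, l + h) - f(a, l - h)) / (2 * h)
        haa = (f(a + h, l) - 2 * f(a, l) + f(a - h, l)) / h ** 2
        hll = (f(a, l + h) - 2 * f(a, l) + f(a, l - h)) / h ** 2
        hal = (f(a + h, l + h) - f(a + h, l - h)
               - f(a - h, l + h) + f(a - h, l - h)) / (4 * h ** 2)
        det = haa * hll - hal * hal
        if abs(det) < 1e-12:
            break
        da = max(-0.1, min(0.1, (hll * ga - hal * gl) / det))  # damped step
        dl = max(-0.5, min(0.5, (haa * gl - hal * ga) / det))
        a = min(max(a - da, 1e-3), 0.999)
        l = max(l - dl, 1e-3)
        if f(a, l) > best[0]:
            best = (f(a, l), a, l)
    return best[1], best[2]

xs = [2, 1, 1, 3, 2, 0, 1, 2, 4, 2, 1, 1, 0, 2, 3, 1, 2, 2, 1, 0]
a_hat, l_hat = cml_newton(xs)
```

Fisher scoring replaces the finite-difference Hessian with the expected information, which for low counts can be obtained by the truncated summation of Section 4.4.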
Example 4.6.1 The maximum likelihood estimates for our illustrative example are α̂ = 0.40 and λ̂ = 5.2. The inverse of the expected Fisher information matrix at these parameter estimates is given by
4.7 Comparison of methods

Al-Osh and Alzaid (1987) used a simulation to compare the following three estimation methods for the Poisson AR(1) model: Yule-Walker, CLS and CML. They note that the algebraic formulas for the Yule-Walker and CLS estimates are almost identical, which is also seen in their numerical comparisons. Their results also show that there is only a small efficiency gain in CML when α is small, say less than 0.3. However, for larger values of α, say between 0.5 and 1, the efficiency gain in CML seems well worth the effort.
Our analysis differs in that we look at the asymptotic efficiency (AE). That is, we consider what happens as the sample size goes to infinity, and our results are not based on a simulation. We let θ̂_n be an estimate of θ and denote its Godambe information by j and its inverse by j^{−1}. Let j_k^{−1} be the (k, k) element of j^{−1} and let i_k^{−1} be the (k, k) element of the inverse of the Fisher information matrix. Cox and Hinkley (1974) define the AE of the
Figure 4.7.1 The asymptotic efficiency of conditional least squares as a function of α when λ = 1.

Figure 4.7.2 The asymptotic efficiency of conditional least squares as a function of λ when α = 0.3.
Figure 4.7.1 shows that as α goes to zero the AE goes to 1; that is, for small values of α there is little difference between CLS and CML. However, as α goes to one the AE goes to 0; that is, there is a substantial advantage in using CML when α is large. Figure 4.7.2 shows that as λ increases the AE in estimating α rises while the AE in estimating λ drops. In both cases the AE appears to be approaching a limit of about 0.83.
For GLS estimation we can use Theorem 4.3.2 to find bounds on the Godambe information. Unfortunately, we found no increase in information in moving from CLS estimation to GLS estimation. Tables 4.7.1 and 4.7.2 show that the CLS information is contained within our bounds for the GLS information. Further, the upper bound on the GLS information is not significantly closer to the CML information.
Next we tested our estimation methods on some misspecified data. We simulated 200 series of length 100 using binomial thinning with parameter α = 0.5 and misspecifying the arrival process by letting the distribution of ε_t be uniform over {0, 1, 2}. The resulting sampling distributions of α̂ and λ̂ for the estimation methods CLS, GLS and CML are summarized by box plots in Figures 4.7.3 and 4.7.4.
Figure 4.7.3 shows that the CML estimates for α are strongly biased; in fact almost all of the CML estimates for α are greater than 0.5. In contrast, the CLS and GLS estimates for α are only slightly biased. Note that the interpretation of α has not changed; that is, α is the probability that a claimant in the current period continues to be a claimant in the next period.
In Figure 4.7.4 we see that the CML estimates for λ are biased estimates of the mean of ε_t, which is 1; in fact almost all of the CML estimates for λ are smaller than 1. In contrast, the sample mean of the CLS and GLS estimates for λ is very close to 1. We can think of λ as the mean parameter of ε_t; however, it is not used to specify the distribution of ε_t. Estimates of λ are therefore estimates of the mean of ε_t.
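The misspecification experiment can be reproduced in outline; here only the closed-form CLS estimates are computed (the GLS and CML fits are omitted), and the seed and helper names are arbitrary:

```python
import random

def simulate_misspecified(alpha, n, rng):
    """Binomial thinning with arrivals uniform on {0, 1, 2} (mean 1)."""
    x = rng.randrange(3)
    xs = [x]
    for _ in range(n - 1):
        x = sum(rng.random() < alpha for _ in range(x)) + rng.randrange(3)
        xs.append(x)
    return xs

def cls_estimates(xs):
    """Closed-form CLS: regress X_t on X_{t-1}."""
    y, z = xs[1:], xs[:-1]
    my, mz = sum(y) / len(y), sum(z) / len(z)
    a = (sum((yi - my) * (zi - mz) for yi, zi in zip(y, z))
         / sum((zi - mz) ** 2 for zi in z))
    return a, my - a * mz

rng = random.Random(1)
ests = [cls_estimates(simulate_misspecified(0.5, 100, rng)) for _ in range(200)]
mean_a = sum(a for a, _ in ests) / len(ests)
mean_l = sum(l for _, l in ests) / len(ests)
```

Consistent with the discussion above, the CLS averages stay close to α = 0.5 and to the arrival mean 1 even though the arrival distribution is not Poisson.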
Box Plots (estimates for alpha)

Figure 4.7.3 Box plots comparing the sampling distributions of α̂ when the arrival process {ε_t} is uniform over {0, 1, 2}.
Box Plots (estimates for lambda)

Figure 4.7.4 Box plots comparing the sampling distributions of λ̂ when the arrival process {ε_t} is uniform over {0, 1, 2}.
Chapter 5
Testing for independence
In this chapter we consider testing whether the thinning parameter α in the Poisson AR(1) model is zero, that is, testing for independent observations. We have a strong belief that an AR(1) model is appropriate for WCB data, and in such cases it may not be necessary to test for independence. However, the test for independence is interesting from a mathematical point of view and would be necessary before fitting the model to other data sets, which may not be interpretable as a queue.
The first section examines what would happen if one blindly went ahead and used the standard Gaussian AR(1) asymptotic results. The next section bases inference on the conditional least squares estimator for α. As noted in Section 4.3 this estimator is the same as the Gaussian estimator for α; however, this time the asymptotic properties of the estimator are worked out under the Poisson model. We then derive a score based test and show that it is asymptotically equivalent to the CLS based test. Finally, we examine the score function in detail about the point α = 0, leading to non-standard Wald and likelihood ratio tests.
5.1 Gaussian AR(1)
Suppose we decided to ignore the fact that our time series is integer valued, and used the usual Gaussian AR(1) model. In Example 4.1.5 we showed that √n(α̂_n − α) is asymptotically normal with mean zero and variance 1 − α². Under the null hypothesis, H₀: α = 0, √n α̂_n is asymptotically standard normal. We reject the null hypothesis when √n α̂_n is larger than the critical value c_δ, where c_δ is selected so that the probability of committing a type I error is δ (the significance level of the test). That is, under H₀: α = 0, P(√n α̂_n > c_δ) = P(Z > c_δ) = δ, where Z denotes a standard normal random variable. Note this is a one sided test and that a negative estimate for α would lead us to accept the null hypothesis.

The power of the test is the probability of not committing a type II error, that is, the probability of rejecting H₀: α = 0 when H₁: 0 < α < 1 is true. The power of the test under the Gaussian assumption is,
In the next section we continue by using the correct asymptotic distribution for √n α̂_n.
5.2 Conditional least squares
A more sophisticated test for the hypothesis is based on the conditional least squares
method. In section 4.2 we showed that the CLS estirnate for a was the same as the
Gaussian baçed estimate, but with a larger asymptotic variance, 1 - a + a(l - a)2 /h .
Chapter 5. Testing for independence 112
However under the nul1 hypothesis both variance expressions reduce to 1 and hence we
would use the same cntical value for both tests- Another way to put this is that the
significance level of the test in section 5.1 is correct. However the power function in
section 5. L is incorrect under the Poisson AR(1) assumption since the variance used is too
small. The difference between the Gaussian and Poisson based variance is a(l - a ) ' / ~ , ,
which is small when h is large or whena is near zero or one.
To help assess this effect on the power function we consider two graphs of the power function: first the power as a function of α, and second as a function of λ.

Power of Gaussian vs Poisson Test (function of alpha)

Figure 5.2.1 A comparison of the power for the Gaussian and Poisson based tests as a function of α, with λ = 1 and n = 100.
Figure 5.2.1 shows that for α between 0.01 and 0.16 the Gaussian based power understates the "true" power, and for α larger than 0.16 the Gaussian based power overstates the "true" power. However, overall the error in the Gaussian based power seems small. For example, if the true model is the Poisson AR(1) with α = 0.30, λ = 1 and we performed the test on a sample of size 100, then the probability of rejecting H₀: α = 0 is 90.0% and not 93.2% as given by the Gaussian based power.

Figure 5.2.2 shows the following: the Gaussian based power is independent of λ; for extremely small values of λ the "true" power based on the Poisson AR(1) model is large, but as λ increases the "true" power quickly converges to the Gaussian based power. For example, if the true model is the Poisson AR(1) with α = 0.01, λ = 0.1 and we performed the test on a sample of size 100, then the probability of rejecting H₀: α = 0 is 8.0% and not 6.1% as given by the Gaussian based power.
Power of Gaussian vs Poisson Test (function of lambda)

Figure 5.2.2 A comparison of the power for the Gaussian and Poisson based tests as a function of λ, with α = 0.01 and n = 100.
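Using the two variance expressions above (1 − α² in the Gaussian case, plus α(1 − α)²/λ under the Poisson model), the power curves can be sketched as follows; exact values may differ slightly from the thesis figures:

```python
from math import erf, sqrt

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(alpha, lam, n, crit=1.645, poisson=True):
    """P(sqrt(n) * alpha_hat > crit) under the stated asymptotic variance."""
    v = 1.0 - alpha ** 2
    if poisson:
        v += alpha * (1.0 - alpha) ** 2 / lam  # extra Poisson AR(1) variance
    return 1.0 - norm_cdf((crit - sqrt(n) * alpha) / sqrt(v))
```

At α = 0 both versions give the 5% significance level; at α = 0.3, λ = 1, n = 100 the Poisson based power is a few points below the Gaussian one, matching the pattern in Figure 5.2.1.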
5.3 Score test

In this section we derive a test for α = 0 by considering the distribution of the score function when α = 0.

We begin by noting the following simplification for the conditional probabilities, equation (4.1), when α = 0: p(X_t | X_{t−1}) = e^{−λ} λ^{X_t} / X_t!. That is, the series is iid Poisson. Making use of this simplification the score function (4.4.1) reduces to,
Under H₀: α = 0, U_α(0, λ) is a zero mean square integrable martingale. Its quadratic characteristic and variance are both equal to n(1 + λ). Since conditions for the strong law hold, from Theorems 4.1.8 and 4.1.9 we have that n^{−1}[U_α(0, λ)]_n →_a.s. (1 + λ). The martingale differences each have identical moments, and all of these moments exist and are finite. This ensures that the Lindeberg condition trivially holds. Thus the conditions for the central limit theorem are satisfied and the distribution of the score is asymptotically normal.

To find the distribution of the score under the alternative hypothesis H₁: 0 < α < 1 we rewrite the score,
The first term in the score is a zero mean square integrable martingale. Applying the strong law of large numbers we get,

Therefore under the alternative hypothesis n^{−1/2} U_α(0, λ) will diverge since it has no mean correction. This implies that the test is consistent; that is, asymptotically the null hypothesis is rejected with probability one when the alternative hypothesis is true. Also note again that the test is one sided, that is, large positive values of the score lead us to reject the null hypothesis.

In practice we replace the nuisance parameter λ with an estimate. Under H₀: α = 0 the maximum likelihood estimate for the Poisson mean parameter is the sample mean λ̂ = X̄. Plugging this estimate into the score function we get,
The second term is O_p(1) and the first term, ignoring the 1/X̄, is a zero mean square integrable martingale with variance λ². Again the central limit theorem holds, and together with the continuous mapping theorem we get n^{−1/2} U_α(0, λ̂) →_d N(0, 1).

Alternatively, we could have found the distribution of the score by noting that,

where α̂_n is the conditional least squares estimator for α. Since both X̄ and n^{−1} Σ_{t=1}^n (X_{t−1} − X̄)² converge in probability to λ under the null hypothesis and to λ/(1 − α) under the alternative hypothesis, the continuous mapping theorem implies that the score statistic and √n α̂_n both have the same asymptotic distribution. The difference between the score test and the conditional least squares t-test is in the estimate for λ in the denominator of the statistic.
To summarize, the test based on the Gaussian model has the correct significance level, but the wrong power. The error in the power, however, is not large. The test based on the score is asymptotically equivalent under alternatives to the conditional least squares t-test. So from the point of view of testing for independence it is interesting to note that naively using the standard Gaussian test gives the correct result except for a small error in the test's power.
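A small Monte Carlo sketch of the two statistics under H₀ (iid Poisson(1) series, as in the simulations of Section 5.4); the exact normalizations are our reading of the description above, and the seed is arbitrary:

```python
import random
from math import exp, sqrt

def poisson_draw(mean, rng):
    # Knuth's multiplication method, adequate for small means
    limit, k, p = exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def score_stat(xs):
    """Score statistic with lambda estimated by the sample mean."""
    n, m = len(xs), sum(xs) / len(xs)
    num = sum((xs[t] - m) * (xs[t - 1] - m) for t in range(1, n))
    return num / (sqrt(n) * m)

def cls_stat(xs):
    """CLS t-statistic sqrt(n)*alpha_hat; the denominator uses the
    sample variance of the lagged series instead of the sample mean."""
    n, m = len(xs), sum(xs) / len(xs)
    num = sum((xs[t] - m) * (xs[t - 1] - m) for t in range(1, n))
    den = sum((x - m) ** 2 for x in xs[:-1])
    return sqrt(n) * num / den

rng = random.Random(7)
reps, rej_score, rej_cls = 500, 0, 0
for _ in range(reps):
    xs = [poisson_draw(1.0, rng) for _ in range(200)]
    rej_score += score_stat(xs) > 1.645
    rej_cls += cls_stat(xs) > 1.645
rate_score, rate_cls = rej_score / reps, rej_cls / reps
```

Both one-sided tests should reject at a rate near the nominal 5% level, in line with Table 5.4.2.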
5.4 The score function on the boundary of the parameter space

In this section we take a detailed look at the score function about the point α = 0; although this is a boundary point of the parameter space, the score is well defined and well behaved.
Recall the interpretation of α is the probability that an individual in the system at time t − 1 remains in the system at time t. In such a case values of α less than zero would not make any sense. However, the probability function p(x; α) = α^x(1 − α)^{1−x}, x = 0, 1, is well defined for any real number α and its derivatives exist and are continuous.

Similarly, even though the Poisson AR(1) likelihood is not defined for negative values of α, we can still examine the function's mathematical properties at the point α = 0. To get an idea of what the likelihood looks like at α = 0 we simulated ten data sets and plotted the likelihood as a function of α. Figure 5.4.1 shows five plots of the likelihood for simulated independent and identically distributed Poisson samples with mean 1, which correspond to the case where α = 0. For comparison we have also included five plots for the case where α is near zero in Figure 5.4.2.
Figure 5.4.1 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0 and λ = 1.
Figure 5.4.2 Plots of the likelihood as a function of α for five simulated samples of size 200 with α = 0.1 and λ = 1.
In both figures the likelihood plots are smooth with a unique maximum. These plots do not answer the question of whether or not the score function is well behaved about α = 0, but suggest that the derivative of the likelihood about α = 0 may exist and be continuous. Also note that the maximum for one of the simulated series in Figure 5.4.1 occurs at a point where α is negative.
We know that the score is defined at α = 0; we even gave an expression for it in Section 5.3. However, is it defined for α < 0, and if so how does it behave? Recall that the conditional probabilities p(X_t | X_{t−1}), equation (4.1), are polynomials in α. One might think that a negative value for α could cause the conditional probabilities to become negative. To consider this possibility we rewrite the conditional probabilities as follows.

The first term is positive and the summation is less than one in absolute value. Therefore
given a value δ > 0 it is possible to find a neighborhood of α = 0 such that the probability of p(X_t | X_{t−1}) being non-positive is less than δ.

Since p(X_t | X_{t−1}) is a polynomial in α, the derivatives of p(X_t | X_{t−1}) with respect to α exist and are continuous. Further, derivatives of log(p(X_t | X_{t−1})) with respect to α will be the ratio of two polynomials in α, with the polynomial in the denominator being positive for α in some neighborhood of zero. Hence the derivative of log(p(X_t | X_{t−1})) with respect to α exists and is continuous for α in some neighborhood of zero.
Before calculating the Fisher information matrix we note the following simplification when α = 0.

Substituting this into Equation 4.4.3 we find the observed Fisher information. Under the null hypothesis the expected information is

In a similar manner we find i_{αλ} = 1 and i_{λλ} = 1/λ. The inverse of the Fisher information matrix is then
Let ᾱ_n be the value of α that maximizes the likelihood when the search for α is not restricted to (0, 1). We call ᾱ_n the likelihood maximizer to distinguish it from the maximum likelihood estimate α̂_n, which must lie in the parameter space (0, 1). Under the null hypothesis √n ᾱ_n converges in distribution to a standard normal random variable and √n α̂_n converges in distribution to a random variable Z⁺, where Z⁺ is defined as

Z⁺ = max(Z, 0),

where Z has a standard normal distribution.
We can define the Wald statistic in the usual way, W_n = n(α̂_n − 0)². If the maximum likelihood estimator α̂_n is replaced by the likelihood maximizer ᾱ_n then the Wald statistic has the usual chi-square distribution with one degree of freedom. However, for the maximum likelihood estimate the Wald statistic converges to a modified chi-square random variable defined by

P(W ≤ w) = 1/2 + (1/2) P(χ²₁ ≤ w),  w ≥ 0,

where χ²₁ has a chi-square distribution with one degree of freedom.
We also define the likelihood ratio statistic in the usual way, that is 2 log(Λ), where Λ = L(α̂_n, λ̂_n; X)/L(0, X̄_n; X), α̂_n and λ̂_n are the maximum likelihood estimates, and X̄_n is the sample mean. Our analysis at the beginning of this section showed that the derivatives of the log-likelihood with respect to α exist and are continuous at α = 0. It is therefore possible to make the usual Taylor series expansion of the log-likelihood about α = 0 and show its asymptotic equivalence to the Wald statistic.
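The boundary limit W → (max(Z, 0))² can be checked by simulation; 2.706 is the 90th percentile of χ²₁, so a one-sided 5% test of this mixture rejects when W exceeds it:

```python
import random

rng = random.Random(42)
draws = 200_000
n_zero = n_reject = 0
for _ in range(draws):
    z = rng.gauss(0.0, 1.0)
    w = max(z, 0.0) ** 2          # limiting Wald statistic at the boundary
    n_zero += (w == 0.0)
    n_reject += (w > 2.706)       # 90th percentile of chi-square(1)

p_zero = n_zero / draws
p_reject = n_reject / draws
```

About half the mass sits at zero, and the 5% critical value is the 90% point of χ²₁ rather than the usual 95% point 3.841.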
Example 5.4.1 In this example we test our illustrative data set for independence. Table 5.4.1 shows the test statistic values as well as the 5% and 1% critical values. In all the cases the null hypothesis of independence is rejected.

Test    Statistic Value    5% Critical Value    1% Critical Value
CLS     4.66               1.645                2.33

Table 5.4.1 Tests for independence in the illustrative data set.
To assess how well the tests for independence work for small samples we apply them to two sets of simulated data. For both sets of data we generate 1000 series of length 200. In the first set we let α = 0 and λ = 1, and in the second set we let α = 0.1 and λ = 1. Table 5.4.2 shows the percentage of times the null hypothesis of independence was rejected by each test for the first set of data, while Table 5.4.3 shows the percentage of times the null hypothesis of independence was rejected by each test for the second set of data.
Test         Rejections at the 5% level    Rejections at the 1% level
CLS          4.2%                          0.7%
Our Score    4.3%                          0.6%

Table 5.4.2 The percentage of times the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0 and λ = 1.

Test         Rejections at the 5% level    Rejections at the 1% level
CLS          37.4%                         16.0%
Our Score    36.3%                         15.0%
Wald         35.9%                         17.0%
LR           36.0%                         15.5%

Table 5.4.3 The percentage of times the null hypothesis of independence was rejected out of 1000 simulated series of length 200 with α = 0.1 and λ = 1.

From Table 5.4.2 we see that the probability of committing a type I error is about the same for all the tests. In Table 5.4.3 we see that the power of the CLS test may be
Chapter 5. Testing for independence 123
slightly higher than the other tests at the 5% level and that the power of the Wald test
may be slightly higher than the other tests at the 1% level. The table also shows that there
is a large probability of cornmithg a type II error when a is small.
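The size and power figures in Tables 5.4.2 and 5.4.3 can be checked with a short simulation. The sketch below is a minimal, assumption-laden version of the experiment: it generates Poisson AR(1) series by binomial thinning and applies a one-sided test based on √n times the lag-1 sample autocorrelation, which is asymptotically standard normal under independence. This statistic is a stand-in for the CLS test statistic, and the function names are illustrative, not from the thesis.

```python
import numpy as np

def simulate_poisson_ar1(n, alpha, lam, rng):
    """X_t = alpha o X_{t-1} + eps_t with binomial thinning and
    Poisson(lam) arrivals; X_0 is drawn from the stationary
    Poisson(lam / (1 - alpha)) marginal distribution."""
    x = np.empty(n, dtype=np.int64)
    x[0] = rng.poisson(lam / (1.0 - alpha))
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # alpha o X_{t-1}
        x[t] = survivors + rng.poisson(lam)        # new arrivals eps_t
    return x

def rejection_rate(alpha, lam, n=200, reps=1000, crit=1.645, seed=0):
    """Fraction of simulated series for which sqrt(n) times the lag-1
    sample autocorrelation exceeds the one-sided 5% critical value."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = simulate_poisson_ar1(n, alpha, lam, rng).astype(float)
        xc = x - x.mean()
        rho1 = np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc)
        if np.sqrt(n) * rho1 > crit:
            rejections += 1
    return rejections / reps

print(rejection_rate(0.0, 1.0))  # empirical size at the 5% level
print(rejection_rate(0.1, 1.0))  # empirical power at the 5% level
```

The second rate is far below 100%, consistent with the large type II error probability noted above when α is small.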
Chapter 6
6. General misspecification test
In this chapter we apply a general specification test, sometimes called the information matrix test, to our model. The test is equivalent to a score test of whether the parameters in the model are stochastic or not. Section 6.1 motivates the information matrix test. Then in Section 6.2 we derive the test for the simplest possible case and state the more general result found in McCabe and Leybourne (1996). Finally in Section 6.3 we give the details for the Poisson AR(1) model and evaluate the test on some simulated data.
6.1 Motivation
An advantage of maximum likelihood estimation is that the asymptotic variance of its estimators often attains the Cramer-Rao lower bound. However, a drawback to maximum likelihood estimation is that the estimates are not robust to model misspecification. We saw this in Section 4.7, where we found that for the misspecified model our parameter estimates are severely biased. It is therefore important to check a model's specification if maximum likelihood estimation is to be used.
Let L and l denote the likelihood and log-likelihood respectively, and let θ be a vector of parameters. Also let the first and second partial derivatives of L and l with respect to θ be denoted by L̇_θ, L̈_θ, l̇_θ and l̈_θ respectively. The expected Fisher information can be expressed in two ways: the Hessian form −E[l̈_θ] and the outer product form E[l̇_θ l̇_θᵀ]. When the model is correctly specified l̈_θ + l̇_θ l̇_θᵀ has a distribution with mean zero. An equivalent expression for l̈_θ + l̇_θ l̇_θᵀ is L̈_θ/L, which is a zero mean martingale. The equivalence of these two expressions is well known and can be shown by considering the second partial derivative of the log-likelihood as follows:

l̈_θ = ∂(L̇_θ/L)/∂θᵀ = L̈_θ/L − (L̇_θ/L)(L̇_θ/L)ᵀ = L̈_θ/L − l̇_θ l̇_θᵀ.

Barndorff-Nielsen and Sorensen (1994) state that L̈_θ/L is a martingale. This is easy to prove as follows: Let f_t(y_t) denote the conditional density of Y_t given the past observations, and write the likelihood as L_n = ∏_{t=1}^n f_t(y_t). Let F_m be the sigma field generated by y_1, y_2, …, y_m. For m < n we have

E[L̈_n/L_n | F_m] = L̈_m/L_m.
Therefore L̈_θ/L is a martingale. Under certain regularity conditions, see Section 4.1, the martingale central limit theorem can be used to find the asymptotic distribution of l̈_θ + l̇_θ l̇_θᵀ.
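The claim that l̈_θ + l̇_θ l̇_θᵀ has mean zero under correct specification can be illustrated numerically. The sketch below uses a hypothetical one-parameter Poisson(λ) likelihood, not a model from the text: the empirical mean of l̈ + l̇² is near zero for Poisson data and visibly positive for overdispersed data with the same mean.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 2.0, 200_000

def hessian_plus_outer(y, lam):
    """Elementwise l'' + (l')^2 for the Poisson(lam) log-likelihood;
    its mean is zero when the data really are Poisson(lam)."""
    score = y / lam - 1.0      # l' = d log f / d lam
    hess = -y / lam ** 2       # l'' = d^2 log f / d lam^2
    return hess + score ** 2

# Correctly specified data: y ~ Poisson(lam).
y_ok = rng.poisson(lam, size=n)
print(hessian_plus_outer(y_ok, lam).mean())   # near 0

# Overdispersed data with the same mean (negative binomial with
# variance 2 * lam): the mean shifts to roughly (var - lam) / lam^2.
y_bad = rng.negative_binomial(2, 2.0 / (2.0 + lam), size=n)
print(hessian_plus_outer(y_bad, lam).mean())
```

The second printed value is close to (var − λ)/λ² = 0.5 here, which is exactly the kind of departure from zero the information matrix test is built to detect.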
A good description of the information matrix test is found in White (1982). Chesher (1983) shows that for sequences of independent observations the information matrix test can be interpreted as a score based test of whether the model parameters are stochastic. McCabe and Leybourne (1996) extend Chesher's result to a much more general setting, namely to sequences of random vectors, which are possibly dependent and non-stationary.
6.2 Outline of the test
In this section we derive the score based interpretation of the information matrix test in the simplest case and then state the more general result found in McCabe and Leybourne (1996).
Let y_1, y_2, …, y_n be a sequence of observations. We will assume that the distribution of y_t depends on an unobserved random parameter θ_t, with joint density f(y_t | θ_t)g(θ_t). We will also assume the sequence of parameters {θ_t}_{t=1}^n is uncorrelated and that E[θ_t] = μ_0 and var[θ_t] = π ≥ 0. If π = 0 then the parameters are not stochastic and the observations are independent and identically distributed. Let L(y|θ) be the conditional likelihood, where y = (y_1, …, y_n)ᵀ and θ = (θ_1, …, θ_n)ᵀ. Consider the following Taylor series expansion of L(y|θ) about the vector μ = (μ_0, …, μ_0)ᵀ:

L(y|θ) ≈ L(y|μ) + L̇_θ(y|μ)ᵀ(θ − μ) + ½ (θ − μ)ᵀ L̈_θ(y|μ)(θ − μ).
To proceed we need the following well-known property of the trace operator: E(XᵀAX) = tr(AΣ), where X is an n × 1 random vector, A is an n × n matrix of constants and E[XXᵀ] = Σ. In the following we will use E_θ to denote expectation with respect to θ. The marginal likelihood is then

L(y) = E_θ[L(y|θ)] = L(y|μ) + ½ tr(L̈_θ(y|μ)Ω) + higher order terms,

where Ω = var(θ) = diag(π). Therefore if π is close to zero the likelihood can be approximated by L(y) ≈ L(y|μ) + ½ tr(L̈_θ(y|μ)Ω). Using the relationship L̈_θ = L(l̈_θ + l̇_θ l̇_θᵀ) from Section 6.1, we can rewrite the likelihood approximation as

L(y) ≈ L(y|μ){1 + ½ tr((l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ} Ω)}.
We are interested in testing the null hypothesis H_0: π = 0 (θ is not stochastic) against the alternative hypothesis H_a: π > 0 (θ is stochastic). The score with respect to π, evaluated at π = 0, simplifies to U_n(y, 0) = ½ tr((l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ}). It is not completely obvious that this is a martingale, since the number of parameters and the dimensions of l̇_θ and l̈_θ are all increasing with n.

Note that the elements of l̇_θ are ∂ log f(y_t | θ_t)/∂θ_t and the diagonal elements of l̈_θ are ∂² log f(y_t | θ_t)/∂θ_t². The score therefore becomes

U_n(y, 0) = ½ Σ_{t=1}^n {∂² log f(y_t | θ_t)/∂θ_t² + (∂ log f(y_t | θ_t)/∂θ_t)²}|_{θ=μ}.

This is, of course, the same as replacing the n parameters in θ with the single (1-dimensional) parameter μ_0, so that the score can be written as U_n(y, 0) = ½ Σ_{t=1}^n {l̈_t(μ_0) + l̇_t(μ_0)²}, where l̇_t(μ_0) = ∂ log f(y_t | μ_0)/∂μ_0; in this form it is clearly a martingale. Since y_1, y_2, …, y_n is a sequence of independent and identically distributed random variables and the derivatives of f with respect to μ_0 are assumed continuous, {l̈_t(μ_0) + l̇_t(μ_0)²}_{t=1}^n is also a sequence of independent and identically distributed random variables. Assuming the variance of these summands exists and is finite, say ς², the central limit theorem implies that

n^{−1/2} U_n(y, 0) →_d N(0, ς²/4).
For the more general case we let {y_t}_{t=1}^N be a sequence of p × 1 vectors of observations. It is assumed that the marginal distribution of y_t depends on the parameter θ_t, which is a k × 1 random vector. Let L(y|θ) and L(y) be respectively the conditional and marginal likelihoods, where y = (y_1ᵀ, …, y_Nᵀ)ᵀ and θ = (θ_1ᵀ, …, θ_Nᵀ)ᵀ. Also assume that E[θ_t] = μ and var[θ_t] = Ω(Π), where Π = (π_1, …, π_m)ᵀ, and that Ω(0) = 0_{k×k}. McCabe and Leybourne show that the marginal likelihood can be approximated as

L(y) = L(y|μ){1 + ½ tr((l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ} Ω)}.
We are interested in testing the hypothesis that none of the parameters are stochastic, H_0: Π = 0, against the alternative that at least one of the parameters is stochastic, H_a: π_i > 0 for at least one i, i = 1, 2, …, m. McCabe and Leybourne show that the test statistic to be used in this case is U_N = tr{(l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ} Ω̇(Π)|_{Π=0}}, or equivalently U_N = vec[(l̈_θ + l̇_θ l̇_θᵀ)|_{θ=μ}]ᵀ vec[Ω̇]e, where e is the m × 1 unit vector and vec is the operation of column stacking. Under H_0: Π = 0, U_N is a zero mean martingale, and under appropriate regularity conditions, again see Section 4.1, [U]_N^{−1/2} U_N →_d N(0, 1) as N → ∞.
6.3 Details for the Poisson AR(1) model
In this section we present the details of the specification test for the Poisson AR(1) model and apply the test to simulated data, some for which the Poisson AR(1) is the correct specification and some for which it is not.

We will denote the conditional probability of X_t given X_{t−1} and the parameter values α_t and λ_t by

p_t(X_t | X_{t−1}) = Σ_{x=0}^{min(X_t, X_{t−1})} C(X_{t−1}, x) α_t^x (1 − α_t)^{X_{t−1}−x} e^{−λ_t} λ_t^{X_t−x} / (X_t − x)!,

where C(X_{t−1}, x) is the binomial coefficient. We assume that the sequences {α_t}_{t=1}^n and {λ_t}_{t=1}^n are independent and identically distributed with the following means, variances and covariance: E[α_t] = α, E[λ_t] = λ, var[α_t] = π_1 ≥ 0, var[λ_t] = π_2 ≥ 0 and cov[α_t, λ_t] = −π_3 ≤ 0.
We can justify the use of a negative covariance as follows: To simplify the argument we assume α_t = α and λ_t = λ for all t. The marginal mean of X_t is μ = λ/(1 − α). If the mean μ is fixed then increasing α corresponds to decreasing λ, and we therefore assume α and λ to be negatively correlated.
Our vector of random parameters is θ = (α_1, α_2, …, α_n, λ_1, λ_2, …, λ_n)ᵀ and the first derivative of the log-likelihood with respect to θ is

l̇_θ = (l̇_{α_1}, l̇_{α_2}, …, l̇_{α_n}, l̇_{λ_1}, l̇_{λ_2}, …, l̇_{λ_n})ᵀ,

with l̈_θ the corresponding matrix of second derivatives. The variance matrix of θ can be written as

Ω = var(θ) = ( π_1 I_n    −π_3 I_n )
             ( −π_3 I_n    π_2 I_n ),

where I_n denotes the n × n identity matrix. The derivatives of Ω with respect to π_1, π_2 and π_3 are

∂Ω/∂π_1 = ( I_n  0 ; 0  0 ),   ∂Ω/∂π_2 = ( 0  0 ; 0  I_n ),   ∂Ω/∂π_3 = ( 0  −I_n ; −I_n  0 ).
Finally the score statistic, to test the null hypothesis H_0: π_1 = 0, π_2 = 0 and π_3 = 0 against the alternative hypothesis H_a: π_i > 0 for at least one i, is given by

U_n = Σ_{t=1}^n {l̈_{α_t}(α, λ) + l̇²_{α_t}(α, λ) + l̈_{λ_t}(α, λ) + l̇²_{λ_t}(α, λ) − 2l̇_{α_t}(α, λ)l̇_{λ_t}(α, λ) − 2l̈_{αλ,t}(α, λ)}.

Note that all the derivatives in this expression can be found in Section 4.4. If this joint test rejects the null hypothesis then it can be separated into individual tests to try and identify which parts of the model are inadequate.
In the Poisson AR(1) model there are two random components: the thinning operator and the arrival process. Recall in Section 4.4 we showed the Pearson residuals could be decomposed into continuation and arrival residuals. In a similar manner we can decompose the score statistic into components. However in this case the decomposition is into three parts: a continuation component (for the binomial thinning operator), an arrival component and an interaction component for the interaction between the binomial thinning operator and the arrival process.
In the test statistic U_n the component Σ_{t=1}^n {l̈_{α_t}(α, λ) + l̇²_{α_t}(α, λ)} tests the adequacy of the binomial thinning operator to explain the variation in those who continue to collect from one period to the next. Binomial thinning assumes that the recovery of individuals is independent, which is probably a reasonable assumption. It is unlikely that one individual's recovery time would affect another's, unless we had limited medical services and recovery of one meant that the next individual could start treatment. The binomial thinning operator also assumes that all individuals recover at the same rate. This is a less realistic assumption, since there is wide variation in individual health due to genetic factors and lifestyle choices, such as diet and exercise. Recovery rates should vary from person to person, since we would expect recovery rates to depend on the person's health immediately prior to injury or illness.
We now examine the thinning operator in more detail. The Bernoulli assumption seems reasonable. That is, for individual i who is collecting disability at time t−1 there are two outcomes: recovers (stops collecting STD) or does not recover (continues to collect STD). This, of course, ignores other possibilities, such as death, which we could either assume has a negligible probability or can simply add to the recovery probability and think of as an exit probability.

However, each individual collecting disability at time t−1 will have a different level of health and therefore will have a different probability of recovery before the next period. The number of individuals who continue to collect at time t can be written as Σ_{i=1}^{X_{t−1}} B_i, where the B_i are independent Bernoulli random variables and the probability that individual i recovers is P(B_i = 0) = 1 − α_i.
This thinning operator has more variation than the binomial thinning operator. The question is how much more variation, or is the binomial thinning operator sufficient to account for the majority of the variation? If we found that the binomial thinning operator was not sufficient to explain the variation, that is, the result of the specification test was to reject the hypothesis that the binomial thinning parameter was non-stochastic, then we would have to consider over-dispersed models such as in McKenzie (1986), Al-Osh and Aly (1992) and Joe (1996). The problem with the thinning operators used in McKenzie (1986) and Al-Osh and Aly (1992) is that they are a random sum of geometric random variables and it is possible to have α ∘ X_{t−1} > X_{t−1}, which wouldn't make sense if we want to use it to model a queue.
The component Σ_{t=1}^n {l̈_{λ_t}(α, λ) + l̇²_{λ_t}(α, λ)} of the test statistic U_n tests the adequacy of the Poisson distribution in describing the arrival process. Two reasons for using the Poisson distribution are: first, it has a simple probability mass function with well-known properties and second, when combined with the binomial thinning operator the marginal distribution of our process remains Poisson.
In practice we often use the Poisson distribution to model count data. The main criticism of this is that most real data are over-dispersed, that is, the Poisson distribution is not sufficient to describe the variation found in the data. One method for dealing with over-dispersed count data is to let the Poisson mean λ be random. Usually it is assumed that λ has a gamma distribution, which transforms the distribution into a negative binomial. For AR(1) models with negative binomial marginals see McKenzie (1986), Al-Osh and Aly (1992) and Joe (1996).
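The Poisson-gamma mixture argument can be illustrated with a short simulation (the parameter values are arbitrary): drawing the Poisson mean from a gamma distribution yields negative binomial counts whose variance exceeds their mean by mean²/shape.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mean_lam, shape = 500_000, 1.0, 2.0

# Random Poisson mean: lam_t ~ Gamma(shape, scale) with E[lam_t] = 1.
lam_t = rng.gamma(shape, mean_lam / shape, size=n)
y = rng.poisson(lam_t)                    # Poisson-gamma mixture

print(y.mean(), y.var())                  # variance exceeds the mean
print(mean_lam + mean_lam ** 2 / shape)   # negative binomial variance
```

The sample variance is close to mean + mean²/shape = 1.5 here, while a pure Poisson with the same mean would have variance 1.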
Finally, the component −2Σ_{t=1}^n {l̇_{α_t}(α, λ)l̇_{λ_t}(α, λ) + l̈_{αλ,t}(α, λ)} tests if the arrival and departure processes are independent. If the workers are a small cohort then the arrival and departure processes are dependent, since, as workers recover, the cohort size increases, which raises the cohort's exposure to injury and increases the number of new injuries (arrivals). In most industries this is unlikely to be a problem since the number of injured workers is usually a very small fraction of the industry's work force.
Next, suppose we believe that the binomial thinning operator with a fixed non-stochastic parameter is sufficient to explain the variation in the number of individuals remaining in the system from one period to the next. However, suppose we have picked the arrival process to be Poisson out of convenience and wish to check the adequacy of this specification. In this case, the score statistic simplifies to U_n = Σ_{t=1}^n {l̈_{λ_t}(α, λ) + l̇²_{λ_t}(α, λ)}. Using the expressions in Section 4.5 we can rewrite this as

U_n = Σ_{t=1}^n {λ² + E_t[ε_t²] − (1 + 2λ)E_t[ε_t]},

which is, of course, a zero mean martingale. For low count series it is easy to calculate the variance numerically. Since the martingale differences, U_t − U_{t−1}, are bounded by λ² + X_t², which has finite moments, the weak law of large numbers holds for the martingale differences and their squares. That is, n^{−1} Σ_{t=1}^n (U_t − U_{t−1}) →_p E[U_t − U_{t−1}] = 0 and n^{−1}[U]_n →_p E[(U_t − U_{t−1})²] = var[U_t − U_{t−1}]. The bound on the martingale differences also means that the Lindeberg condition is satisfied; hence var[U_n]^{−1/2} U_n →_d N(0, 1).
To evaluate how well this test performs, we apply it to some simulated data which are correctly specified, and to some simulated data which are misspecified. In the first case, we simulated 100 series of length 200 from the Poisson AR(1) model with α = 0.5 and λ = 1. At the 1% level of significance the information matrix test rejects 3 series and at the 5% level it rejects 8 series. The number of rejections is only slightly higher than expected, so it appears that a sample size of 200 is sufficient for the distribution of the test statistic to be approximately standard normal.

Next, to assess the power of the test we simulate some misspecified data by letting ε_t follow the uniform distribution over the set of integers {0, 1, 2}, that is, P(ε_t = i) = 1/3, i = 0, 1, 2. Note that the mean of ε_t remains the same. This time, out of 100 series, the information matrix test rejects 92 series at the 1% level and all 100 series at the 5% level. This indicates that the test has strong power against this type of misspecification.
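The misspecification experiment above can be sketched as follows: both innovation laws have mean one, but the uniform innovations are underdispersed, which shows up in the dispersion index (variance over mean) of the simulated series. The sampler interface below is an illustrative choice, not the thesis's code.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n, alpha, draw_eps, x0):
    """INAR(1) recursion with a user-supplied innovation sampler."""
    x = np.empty(n, dtype=np.int64)
    x[0] = x0
    for t in range(1, n):
        x[t] = rng.binomial(x[t - 1], alpha) + draw_eps()
    return x

n, alpha, burn = 100_000, 0.5, 1000
poisson_eps = lambda: rng.poisson(1.0)     # correctly specified
uniform_eps = lambda: rng.integers(0, 3)   # uniform on {0, 1, 2}

x_p = simulate(n, alpha, poisson_eps, 2)[burn:]
x_u = simulate(n, alpha, uniform_eps, 2)[burn:]

# Both innovation laws have mean 1, so both series have mean near 2,
# but uniform innovations (variance 2/3 instead of 1) give a series
# whose variance falls short of its mean.
print(x_p.var() / x_p.mean())   # dispersion index near 1
print(x_u.var() / x_u.mean())   # dispersion index below 1
```

The Poisson case has a Poisson marginal, so its dispersion index is one; the uniform case is visibly underdispersed, which is the departure the information matrix test detects.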
Example 6.3.1 In this example we test the specification of the Poisson arrivals in our illustrative data set. The information matrix test statistic is 1.45 (p-value 14.7%). We therefore accept the null hypothesis that the Poisson parameter is non-stochastic.
Chapter 7
7. Models with covariates

7.1 Model definition and introduction

Let X_0, X_1, …, X_n be a series of dependent Poisson counts generated according to the following model

X_t = α_t ∘ X_{t−1} + ε_t,   t = 1, 2, …, n,

where X_0 has a Poisson distribution with mean λ_0 and {ε_t}_{t=1}^n is a series of independently distributed Poisson random variables with mean λ_t. The thinning operator "∘" is defined as in Section 2.1. Given X_{t−1}, α_t ∘ X_{t−1} and ε_t are assumed to be independent; this can be checked with the model specification test, see Sections 6.3 and 7.5. As in Section 6.3 we will denote the conditional probability of X_t given X_{t−1} as p_t(X_t | X_{t−1}).
An easy way to incorporate covariates into the model is to use a link function, which is the common method in generalized linear models. The idea behind the link function is to map an unrestricted (real) parameter space for the regression coefficients into the restricted parameter space required by the model. For example, since the parameter space for α_t is (0, 1), the following logistic link is appropriate

α_t = exp(W_tᵀγ)/(1 + exp(W_tᵀγ)),

where W_t is an m-dimensional vector of time-varying covariates and γ ∈ ℝ^m is an m-dimensional vector of parameters. Similarly, taking λ_t = exp(Z_tᵀβ), where Z_t is a p-dimensional vector of time-varying covariates and β ∈ ℝ^p, will ensure that λ_t is positive.
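A minimal sketch of the two link functions, assuming the logistic link for α_t and the log link for λ_t as above; the covariate and parameter values are hypothetical.

```python
import numpy as np

def alpha_t(W, gamma):
    """Logistic link: maps w_t' gamma in R into alpha_t in (0, 1)."""
    eta = W @ gamma
    return np.exp(eta) / (1.0 + np.exp(eta))

def lambda_t(Z, beta):
    """Log link: maps z_t' beta in R into lambda_t in (0, inf)."""
    return np.exp(Z @ beta)

# Hypothetical covariates: an intercept plus one seasonal indicator.
W = np.array([[1.0, 0.0], [1.0, 1.0]])
Z = np.array([[1.0, 0.0], [1.0, 1.0]])
gamma = np.array([1.5, -0.5])
beta = np.array([0.2, 0.8])

print(alpha_t(W, gamma))   # values strictly inside (0, 1)
print(lambda_t(Z, beta))   # strictly positive arrival means
```

Any γ ∈ ℝ^m and β ∈ ℝ^p produce valid parameter values, which is the point of the link construction.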
7.2 Forecasting

The first step in the process of forecasting is to find the k-step ahead distribution, that is, the conditional distribution of X_{N+k} given X_N. This distribution is determined by the conditional moment generating function of X_{N+k} given X_N, which is given in the following theorem.

Theorem 7.2.1 For the Poisson AR(1) model defined in Section 7.1 the k step ahead conditional moment generating function is given by

M_{X_{N+k}|X_N}(s) = (e^s ∏_{j=N+1}^{N+k} α_j + 1 − ∏_{j=N+1}^{N+k} α_j)^{X_N} exp{(e^s − 1) Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j}.

Proof: The result is proved using induction. The one step ahead conditional moment generating function is

M_{X_{N+1}|X_N}(s) = (e^s α_{N+1} + 1 − α_{N+1})^{X_N} exp{(e^s − 1)λ_{N+1}}.

Now, suppose that the k−1 step ahead conditional moment generating function has the form stated in the theorem, where we take ∏_{j=i+1}^{N+k−1} α_j = 1 when i = N + k − 1. Applying this hypothesis to X_{N+k} given X_{N+1} and conditioning on X_N, the k step ahead conditional moment generating function is

M_{X_{N+k}|X_N}(s) = M_{X_{N+1}|X_N}(s′) exp{(e^s − 1) Σ_{i=N+2}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j},

where e^{s′} = e^s ∏_{j=N+2}^{N+k} α_j + (1 − ∏_{j=N+2}^{N+k} α_j). Substituting this in for e^{s′} gives

M_{X_{N+k}|X_N}(s) = (e^s ∏_{j=N+1}^{N+k} α_j + (1 − ∏_{j=N+1}^{N+k} α_j))^{X_N} exp{(e^s − 1) Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j}.  □
Remarks

1. The distribution of X_{N+k} | X_N is a convolution of a binomial distribution with parameters ∏_{j=N+1}^{N+k} α_j and X_N, and a Poisson distribution with parameter Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j. Hence, it has mean X_N ∏_{j=N+1}^{N+k} α_j + Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j and variance X_N ∏_{j=N+1}^{N+k} α_j (1 − ∏_{j=N+1}^{N+k} α_j) + Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j.

2. From the conditional moment generating function the conditional distribution of X_t given X_0 is a convolution of a binomial distribution with parameters ∏_{j=1}^t α_j and X_0 and a Poisson distribution with parameter Σ_{i=1}^t λ_i ∏_{j=i+1}^t α_j. Hence, if the unconditional distribution of X_0 is Poisson with mean λ_0, then the unconditional distribution of X_t is Poisson with mean λ_t + λ_{t−1}α_t + λ_{t−2}α_tα_{t−1} + ⋯ + λ_0α_tα_{t−1}⋯α_1.
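In the special case where α_t = α and λ_t = λ are constant, the convolution in Remark 1 reduces to Binomial(X_N, α^k) plus Poisson(λ(1 − α^k)/(1 − α)), and the k-step ahead forecast distribution can be computed directly. The sketch below assumes this constant-parameter case; the function name and truncation point are illustrative.

```python
import numpy as np
from math import comb, exp, factorial

def forecast_pmf(x_n, alpha, lam, k, max_x=30):
    """k-step-ahead conditional pmf of X_{N+k} given X_N = x_n for
    constant alpha and lam: the convolution of Binomial(x_n, alpha^k)
    with Poisson(lam * (1 - alpha**k) / (1 - alpha))."""
    a_k = alpha ** k
    pois_mean = lam * (1.0 - a_k) / (1.0 - alpha)
    binom = np.array([comb(x_n, j) * a_k ** j * (1 - a_k) ** (x_n - j)
                      for j in range(x_n + 1)])
    pois = np.array([exp(-pois_mean) * pois_mean ** i / factorial(i)
                     for i in range(max_x + 1)])
    return np.convolve(binom, pois)[: max_x + 1]

pmf = forecast_pmf(x_n=5, alpha=0.5, lam=1.0, k=3)
# conditional mean should equal x_n * alpha^k + lam (1 - alpha^k)/(1 - alpha)
print(sum(i * p for i, p in enumerate(pmf)))
# conditional median: smallest m with cumulative probability >= 0.5
print(int(np.searchsorted(np.cumsum(pmf), 0.5)))
```

For time-varying α_t and λ_t the same convolution applies with the binomial parameter ∏ α_j and the Poisson parameter Σ λ_i ∏ α_j from Remark 1.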
For comparison purposes we define a corresponding Gaussian AR(1) model with covariates as follows:

X_t = α_t X_{t−1} + λ_t + ε_t,

where ε_t is normally distributed with mean zero and variance σ² and the parameters λ_t and α_t are defined as in the Poisson model. For this Gaussian AR(1) model, X_{N+k} | X_N has a normal distribution with mean X_N ∏_{j=N+1}^{N+k} α_j + Σ_{i=N+1}^{N+k} λ_i ∏_{j=i+1}^{N+k} α_j and variance (1 + Σ_{i=N+1}^{N+k−1} ∏_{j=i+1}^{N+k} α_j²)σ². As in Section 3.1, the conditional means of X_{N+k} | X_N in the Poisson and Gaussian models are the same, but the conditional distributions are quite different.
Let p_k(x) be the conditional probability function of X_{N+k} given X_N and define the conditional median of X_{N+k} given X_N as the smallest non-negative integer m_k such that Σ_{x=0}^{m_k} p_k(x) ≥ 0.5. This is the same definition used in Chapter 3. If, at time N, we know the future covariate values, {W_t}_{t=N+1}^{N+k} and {Z_t}_{t=N+1}^{N+k}, as well as the parameter values γ and β, then the k step ahead conditional probability function, p_k(x), can be found from the k step ahead moment generating function. Proposition 3.2.1 says that the k step ahead conditional median is the forecast that minimizes the k step ahead absolute forecast error.

With regard to forecasting count series, the biggest advantage in using a data cohesive model is that individual probabilities (point mass forecasts) can be calculated for all possible future outcomes. This is especially useful for low count series, where only a few of the outcomes have non-zero probabilities.

If the purpose of a model is to make forecasts, it is helpful to select covariates that are either deterministic or easy to forecast, since future covariate values are needed to forecast with these types of models.
Next we consider the limiting distribution for a couple of simple cases.

Example 7.2.1 Suppose that there are two different rates of new injuries, and each month the rate switches back and forth. That is, λ_{2t+1} = λ_1 and λ_{2t} = λ_2, t = 0, 1, 2, …. Also assume that the recovery rate is constant over time, that is α_t = α fixed. The unconditional distributions of X_{2t+1} and X_{2t} are Poisson with respective means

λ_1 + λ_2α + λ_1α² + ⋯ + λ_1α^{2t} + λ_0α^{2t+1}  and  λ_2 + λ_1α + λ_2α² + ⋯ + λ_1α^{2t−1} + λ_0α^{2t}.

There is no limiting distribution in this case. However the following subsequential limits hold: X_{2t+1} →_d Po((λ_1 + λ_2α)/(1 − α²)) and X_{2t} →_d Po((λ_2 + λ_1α)/(1 − α²)). Also if X_0 ~ Po((λ_2 + λ_1α)/(1 − α²)) then the unconditional or marginal distributions are X_{2t+1} ~ Po((λ_1 + λ_2α)/(1 − α²)) and X_{2t} ~ Po((λ_2 + λ_1α)/(1 − α²)).
Example 7.2.2 Consider the following modification of Example 7.2.1. Suppose there are two distinct seasons (winter and summer) and that the two seasons have different injury rates.

Example 7.2.2 is fairly realistic in that many industries have two well defined seasons. Exposure to injury is often highest during the summer months, when there is a larger work force and more over-time. In contrast, during the winter months, when there is less work, the work force is smaller and those employees who are working are encouraged to take holidays. In British Columbia, the logging industry and the fishing industry are two examples of industries whose exposure to injury changes according to the season.
In Section 3.4 we showed how to construct confidence intervals for the probability mass forecasts. The following modifications are needed when regressors are included in the model. Suppose we have a sample of size n and denote the maximum likelihood estimates for this sample by γ̂_n and β̂_n. We will assume that (γ̂_nᵀ, β̂_nᵀ)ᵀ is asymptotically normal with mean (γ_0ᵀ, β_0ᵀ)ᵀ and variance n^{−1}i^{−1}, where i is the Fisher information matrix and γ_0 and β_0 are the "true" parameter values.

The k step ahead probability function can be written as p_k(x | X_n; α_{*,k}, λ*), where α_{*,k} = ∏_{j=n+1}^{n+k} α_j, k = 1, 2, …, and λ* = λ_{n+1}α_{n+1,k−1} + λ_{n+2}α_{n+2,k−2} + ⋯ + λ_{n+k−1}α_{n+k−1,1} + λ_{n+k}, with α_{n+i,k−i} = ∏_{j=n+i+1}^{n+k} α_j. By Theorem 3.4.1, for fixed x, p_k(x | X_n; γ̂_n, β̂_n) has an asymptotically normal distribution with mean p_k(x | X_n; γ_0, β_0) and variance given by the delta method applied with the asymptotic variance n^{−1}i^{−1}, where the matrix i^{−1} is partitioned as

i^{−1} = ( i_γ^{−1}     i_{γβ}^{−1} )
         ( i_{βγ}^{−1}  i_β^{−1}    ),

and the matrices i_β^{−1}, i_{βγ}^{−1} = (i_{γβ}^{−1})ᵀ and i_γ^{−1} are of dimensions p × p, p × m and m × m. The partial derivatives of the point mass probability p_k(x | X_n; γ, β) follow from the chain rule. Expressions for the partial derivatives ∂p_k(x | X_n)/∂α_{*,k} and ∂p_k(x | X_n)/∂λ* are found in (4.4.1) and (4.4.3) respectively; the remaining partial derivatives, of α_{*,k} and λ* with respect to γ and β, follow from the link functions.
7.3 Estimation

The model parameters can be estimated using the Newton-Raphson iterative scheme discussed in Section 4.6. In this case, the vector of parameters is θ = (γ_1, γ_2, …, γ_m, β_1, β_2, …, β_p)ᵀ. The addition of regressors to the model makes the expected information E[l̈(θ^{(k)})] less practical to compute. In Chapter 8 we fit Poisson AR(1) models to some WCB data. In these models α is held constant over time. The parameter estimates were found using a quasi-Newton procedure, in which we calculate l̈(θ^{(k)}) numerically. We found the following starting values worked well: α = 0.9, β_1 = β_2 = ⋯ = β_p = 0.

We now consider the asymptotic properties of the model. Let U_{γ,n} = Σ_{t=1}^n l̇_{α_t} α̇_t and U_{β,n} = Σ_{t=1}^n l̇_{λ_t} λ̇_t denote the score functions for the departure parameters γ and the arrival parameters β respectively, where l̇_{α_t} = ∂ log p_t(X_t | X_{t−1})/∂α_t, l̇_{λ_t} = ∂ log p_t(X_t | X_{t−1})/∂λ_t, α̇_t = ∂α_t/∂γ = W_t α_t(1 − α_t) and λ̇_t = ∂λ_t/∂β = Z_t λ_t. We also denote the martingale differences by u_{γ,t} = U_{γ,t} − U_{γ,t−1} and u_{β,t} = U_{β,t} − U_{β,t−1}. Using the expressions in Section 4.5 for the derivatives of the conditional probability, the martingale differences can be written in closed form.
The following proposition shows this model can be identified.

Proposition 7.3.1 The Poisson AR(1) model defined in Section 7.1 can be identified if and only if the following two matrices have full row rank: [W_1, W_2, …, W_n] and [Z_1, Z_2, …, Z_n].

Proof: A statistical model is identifiable if its Fisher information is positive definite. It is therefore sufficient to show that

E[(aᵀu_{γ,t} + bᵀu_{β,t})²] = 0                                  (7.3.1)

if and only if a_1 = a_2 = ⋯ = a_m = b_1 = b_2 = ⋯ = b_p = 0, where a = (a_1, a_2, …, a_m)ᵀ and b = (b_1, b_2, …, b_p)ᵀ. We can write aᵀu_{γ,t} + bᵀu_{β,t} = c_{1t}l̇_{α_t} + c_{2t}l̇_{λ_t}, where c_{1t} = α_t(1 − α_t)aᵀW_t and c_{2t} = λ_t bᵀZ_t. From Proposition 4.6.1, E[(c_{1t}l̇_{α_t} + c_{2t}l̇_{λ_t})²] = 0 if and only if c_{1t} = c_{2t} = 0. That is, (7.3.1) holds if and only if aᵀ[W_1, W_2, …, W_n] = 0 and bᵀ[Z_1, Z_2, …, Z_n] = 0, which by assumption hold if and only if a_1 = a_2 = ⋯ = a_m = b_1 = b_2 = ⋯ = b_p = 0. □
If the regularity conditions for the martingale CLT hold then the score can be used to make inferences about the parameters γ and β. Basically, if the covariate processes are well behaved then these conditions will be satisfied.
As mentioned in Example 7.2.2, in many industries exposure to injury is seasonal, in which case the addition of seasonal covariates to the arrival process is appropriate. A common method for modeling seasonality in monthly data is to use indicator covariates for each month. Let Z_t = (z_{t1}, z_{t2}, …, z_{t,12})ᵀ, where the i-th component is one in month i and zero in all other months. Note, if a constant is included in the regressors then one of the monthly indicators must be dropped.

Often it is more appropriate to use seasonal indicators rather than monthly indicators. However, the number and length of the seasons is somewhat arbitrary. Some authors prefer to use sinusoidal covariates, such as Z_t = (sin(2πt/12), sin(4πt/12), …)ᵀ. When all 12 sinusoidal components are included both models are the same. Usually only a few of the sinusoidal components are needed. However, this still generates 12 different monthly levels.
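A minimal sketch of the two seasonal covariate constructions discussed above. The monthly indicators follow the text; the sinusoidal basis below is an illustrative variant that uses sine/cosine pairs plus an intercept, not necessarily the exact components the text intends.

```python
import numpy as np

def monthly_indicators(t):
    """12 monthly dummy covariates z_t: component i is 1 in month i."""
    z = np.zeros(12)
    z[t % 12] = 1.0
    return z

def sinusoidal_covariates(t, n_harmonics=2):
    """Intercept plus a few sine/cosine pairs at monthly frequencies."""
    return np.array([1.0] + [f(2.0 * np.pi * h * t / 12.0)
                             for h in range(1, n_harmonics + 1)
                             for f in (np.sin, np.cos)])

Z = np.array([monthly_indicators(t) for t in range(24)])
print(Z.sum(axis=1))             # exactly one indicator per month
print(sinusoidal_covariates(0))  # 5 components: intercept + 2 pairs
```

With an intercept included among the regressors, one monthly indicator would have to be dropped, as noted above.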
Suppose we have a model with 12 values of the thinning parameter, one for each month of the year, that is, α_1, α_2, …, α_12 are the thinning parameters for January, February, …, December respectively. Similarly, suppose the model also has 12 values of the arrival parameter, λ_1, λ_2, …, λ_12, again corresponding to the 12 months of the year.

Theorem 7.3.1 If X_0 has a Poisson distribution with mean μ_0, then the marginal distribution of X_j is Poisson with mean μ_j, where

μ_j = (λ_j + α_jλ_{j−1} + α_jα_{j−1}λ_{j−2} + ⋯ + α_jα_{j−1}⋯α_{j−10}λ_{j−11}) / (1 − α_1α_2⋯α_12),

with the subscripts interpreted modulo 12.

Proof: Since X_{12r+k} is a convolution of α_{12r+k} ∘ X_{12r+k−1} and ε_{12r+k}, its mean must be equal to α_k μ_{k−1} + λ_k, or, in other words, μ_k = α_k μ_{k−1} + λ_k. We will show this equation holds for k = 1, and omit the tedious details for k = 2, 3, …, 12; the calculation consists of writing μ_0 and μ_1 out in long form and collecting terms. □

It can be shown that the sequence of random vectors {(X_{12t+1}, X_{12t+2}, …, X_{12t+12})}_{t=0}^∞ is stationary.
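Theorem 7.3.1 can be checked numerically: iterating μ_t = α_t μ_{t−1} + λ_t with 12-periodic parameters converges, from any starting value, to the 12-periodic pattern given by the closed form. The parameter values below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
alphas = rng.uniform(0.2, 0.8, size=12)   # monthly thinning parameters
lams = rng.uniform(0.5, 2.0, size=12)     # monthly arrival means

def closed_form_mu(j):
    """mu_j = (lam_j + a_j lam_{j-1} + a_j a_{j-1} lam_{j-2} + ...)
    / (1 - a_1 a_2 ... a_12), subscripts taken modulo 12."""
    num, prod = 0.0, 1.0
    for r in range(12):
        num += prod * lams[(j - r) % 12]
        prod *= alphas[(j - r) % 12]
    return num / (1.0 - np.prod(alphas))

# Iterate the mean recursion mu_t = alpha_t mu_{t-1} + lam_t.
mu, trace = 10.0, []
for t in range(1, 600):
    mu = alphas[t % 12] * mu + lams[t % 12]
    trace.append((t % 12, mu))

month, mu_last = trace[-1]
print(abs(mu_last - closed_form_mu(month)))   # essentially zero
print(abs(trace[-1][1] - trace[-13][1]))      # 12-periodicity
```

The vanishing differences illustrate both the closed form for μ_j and the periodic stationarity of the 12-dimensional vector process noted above.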
Proposition 7.3.2 If the stochastic process X_t follows the Poisson AR(1) model as defined in Section 7.1 and there exist α_u < 1 and λ_u < ∞ such that α_t ≤ α_u and λ_t ≤ λ_u for all t, then it is α-mixing with α(n) = O(α_uⁿ).

The proof of Proposition 7.3.2 is very similar to that of Proposition 4.3.1, but the notation required makes it tedious and it is therefore omitted.
Proposition 7.3.3 Under the assumptions of Proposition 7.3.2 all the moments of X_t exist and are finite. Further, all the moments of u_t and u̇_t exist and are finite, where u_t = U_t − U_{t−1}, U_t is the score function and u̇_t is the matrix of partial derivatives of u_t with respect to γ and β.

Proof: As noted in the Remarks following Theorem 7.2.1, the marginal distribution of X_t is Poisson with mean λ_t + λ_{t−1}α_t + λ_{t−2}α_tα_{t−1} + ⋯ + λ_0α_tα_{t−1}⋯α_1, which is bounded by λ_u/(1 − α_u) < ∞. Hence, all the moments of X_t exist and are finite. The second part follows from the fact that both u_t and u̇_t are polynomials in X_t with bounded coefficients and hence all of their moments exist and are finite. □
Proposition 7.3.4 If in addition to the assumptions of Proposition 7.3.2 we assume that the model is identifiable and X_t is stationary, then the Fisher information, i, is finite and positive definite.

Proof: By Proposition 7.3.2, X_t is α-mixing, which combined with stationarity implies that X_t is ergodic. As a consequence of Theorem 4.1.1, u_t and u̇_t are both ergodic. Further, u_t and u̇_t have finite variances due to Proposition 7.3.3. The conditions for Corollary 4.1.1 are therefore satisfied and the result follows. □
7.4 Testing

In the first part of the section we consider testing for independence, or more formally testing the null hypothesis, H_0: α = 0, against the alternative hypothesis, H_a: 0 < α < 1. We begin by finding the Fisher information matrix under the null hypothesis.

Let l̇_{α_t} = ∂ log p_t(X_t | X_{t−1})/∂α, l̈_{α_t} = ∂² log p_t(X_t | X_{t−1})/∂α², l̈_{αλ,t} = ∂² log p_t(X_t | X_{t−1})/∂α∂λ_t and l̈_{λ_t} = ∂² log p_t(X_t | X_{t−1})/∂λ_t². All of these partial derivatives are found in Section 4.4, and when α = 0 they simplify considerably; their expected values under the assumption α = 0 give the entries of the information matrix.

Next we calculate the partial derivatives with respect to β: λ̇_t = ∂λ_t/∂β = Z_t λ_t and λ̈_t = ∂²λ_t/∂β∂βᵀ = Z_t Z_tᵀ λ_t. Combining these with the derivatives above gives the Fisher information. We will assume there exists a positive definite matrix i such that i_n → i as n → ∞, and we let σ² denote the first entry of i^{−1}.
We define the following three sets of parameter "estimates": let ᾱ_n and β̄_n be the unrestricted maximizers of the likelihood, let α̂_n and β̂_n be the maximum likelihood estimates (that is, maximizing over the parameter space) and let β̃_n be the maximum likelihood estimate of β when α = 0.

Under the null hypothesis √n ᾱ_n/σ converges in distribution to a standard normal random variable and √n α̂_n/σ converges in distribution to a random variable Z*, where Z* is defined as

Z* = Z if Z > 0 and Z* = 0 if Z ≤ 0,

where Z has a standard normal distribution.
If we define the Wald and likelihood ratio statistics respectively as W_n = n ᾱ_n²/σ̂² and 2 log Λ_n, where Λ_n = L(ᾱ_n, β̄_n; X)/L(0, β̃_n; X), then under the null hypothesis both converge to the usual chi-square random variable with one degree of freedom. However, if we redefine the Wald and likelihood ratio statistics as W_n = n α̂_n²/σ̂² and 2 log Λ_n, where Λ_n = L(α̂_n, β̂_n; X)/L(0, β̃_n; X), respectively, then under the null hypothesis both converge to a modified chi-square random variable defined by

χ̃² = 0 with probability 1/2 and χ̃² = χ₁² with probability 1/2,

where χ₁² has a chi-square distribution with one degree of freedom.
Consider the zero mean martingale $U_n(\alpha, \beta)$, the score function with respect to $\alpha$. Under the null hypothesis
$U_n(0, \beta)$ is a mean zero martingale. However, under the alternative it is necessary to
subtract a positive quantity, $\alpha\sum_{t=1}^{n} X_{t-1}/\lambda_t$, from $U_n(0, \beta)$ to get a zero mean martingale. Hence large positive
values of $U_n(0, \beta)$ give evidence in favor of the alternative hypothesis. That is, the score
test is one-sided.
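The 50/50 mixture null distribution of the boundary-restricted statistics can be checked by simulation. The following is a minimal sketch, not part of the thesis: it uses the standard result that the restricted estimate behaves like $\max(Z, 0)$ for $Z \sim N(0,1)$, so the modified Wald statistic behaves like $\max(Z, 0)^2$.

```python
import random

# Monte Carlo sketch of the null distribution of the boundary-restricted
# Wald statistic: simulate max(Z, 0)^2 for Z ~ N(0, 1), a 50/50 mixture
# of a point mass at zero and a chi-square(1) random variable.
random.seed(42)
n_sims = 200_000
stats = [max(random.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(n_sims)]

prop_zero = sum(s == 0.0 for s in stats) / n_sims
# The 95th percentile of the mixture equals the chi-square(1) 90th
# percentile, about 2.71, which is the one-sided critical value used later.
crit = sorted(stats)[int(0.95 * n_sims)]
print(round(prop_zero, 2), round(crit, 2))
```

Roughly half the simulated statistics are exactly zero, and the mixture's upper critical values match a chi-square(1) table read at doubled tail probabilities.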
Next we consider the information matrix test for specification. We will allow
covariates in both the arrival and departure processes and use the same notation as in
Section 7.3. That is, we let $u_t = U_t - U_{t-1}$, where $U_t$ denotes the score function at time $t$,
and let $\dot u_t$ denote the matrix of partial derivatives of $u_t$ with respect to $\gamma$ and $\beta$.
The information matrix test is based on the following martingale: $M_n = \sum_{t=1}^{n} m_t$,
where $m_t = 1'(u_t u_t' + \dot u_t)1$ and $1$ is an $(m+p)$-vector of 1's. The quadratic variation
of $M_n$ is given by $[M]_n = \sum_{t=1}^{n} m_t^2$. Under the assumptions of Proposition 7.3.3, $m_t$ is
stationary and $\alpha$-mixing. Further, all the moments of $m_t$ exist and are finite. This is
sufficient for the conditions of Theorem 4.1.1 to hold and hence $[M]_n^{-1/2} M_n$ converges in
distribution to a standard normal random variable.
Chapter 8
8. Application to counts of workers collecting disability benefits
In this chapter we analyze five data series drawn from the WCB data set. Section 8.1
contains descriptions of the five data series and preliminary analysis of the data. Next, in
Section 8.2, we carry out our in-depth analysis, which includes: model estimation,
selection and testing. We examine the arrival processes in Section 8.3 and calculate
forecasts for the first six months of 1995 in Section 8.4. Finally, in Section 8.5 we show
what happens if the Gaussian AR(1) model is fit to the data.
8.1 Workers' Compensation Data
We have selected the following five data series for analysis in this chapter. Each series
contains monthly counts of claimants collecting STWLB from the WCB. All the
claimants are male, between the ages of 35 and 54, work in the logging industry and
reported their claim to the Richmond service delivery location. The distinguishing
difference between the five series is the nature of the injury. We will refer to the five
series as data sets 1, 2, 3, 4 and 5. The claimants in data set 1 have burn related injuries. In
data set 2 the claimants have soft tissue injuries, such as contusions and bruises. The
claimants in data set 3 have cuts, lacerations or punctures. Claimants in data set 4 have
dermatitis and data set 5 contains claimants with dislocations.
Table 8.1.1 contains a summary of simple descriptive statistics for the five data
sets and Figures 8.1.1 through 8.1.5 contain time series plots for the five data sets. Plots
of the sample autocorrelation function and sample partial autocorrelation function are
found in Figures 8.1.6 and 8.1.7.
Table 8.1.1 A summary of simple descriptive statistics for data sets 1 through 5.

A property of the Poisson AR(1) model is that the marginal mean and variance
should be the same, see Proposition 2.3.1. In Table 8.1.1 we see that for each data set the
mean and variance are close except for data set 3, where the variance is almost twice the
mean. However, if the Poisson mean is non-constant this would cause the variance to be
larger than the mean.
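The mean-variance equality can be checked by simulation. Below is a minimal sketch of the model (binomial thinning of the previous count plus Poisson arrivals), with illustrative parameters rather than the estimates from any of the data sets:

```python
import math
import random

def simulate_poisson_ar1(alpha, lam, n, seed=1):
    """Simulate X_t = alpha o X_{t-1} + eps_t: binomial thinning of the
    previous count plus Poisson(lam) arrivals.  At stationarity the
    marginal distribution is Poisson with mean lam / (1 - alpha)."""
    rng = random.Random(seed)

    def rpois(mu):
        # Poisson sampler by inversion; adequate for small means.
        x, p, u = 0, math.exp(-mu), rng.random()
        c = p
        while u > c:
            x += 1
            p *= mu / x
            c += p
        return x

    x = rpois(lam / (1 - alpha))  # start at the stationary distribution
    out = []
    for _ in range(n):
        survivors = sum(rng.random() < alpha for _ in range(x))  # alpha o X
        x = survivors + rpois(lam)
        out.append(x)
    return out

xs = simulate_poisson_ar1(alpha=0.4, lam=3.0, n=200_000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(round(mean, 2), round(var, 2))  # both should be near 3/(1 - 0.4) = 5
```

A marked gap between the two sample moments, as in data set 3, is therefore evidence against a constant Poisson arrival rate.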
In the time series plot of data set 1, Figure 8.1.1, there is a significant change in
the pattern after the middle of 1993. It therefore seems unlikely that an AR(1) model will
fit the series well. This is further confirmed by the sample autocorrelation function, which
doesn't decay fast enough for an AR(1) model. Also, the first two lags of the partial
autocorrelation function are statistically significant at the 5% level, suggesting that an
AR(2) model might be appropriate. However, an AR(2) model has the problem that it is
not easy to interpret and it has more than one specification. See the discussion at the
beginning of Section 2.4.
The difficulty with analyzing a series with very low counts, such as data set 1, is
that a single claim can drastically change the shape and correlation pattern of the series.
For instance, in data set 1 there appears to be a single claimant collecting STWLB
between May 1993 and December 1994. It is impossible to tell this for sure, since this
same pattern could be caused by several individual claims, although, based on the earlier
claims frequency in the series, this is less likely.

Further investigation shows that our conjecture of a single long duration claimant
is correct. Since the frequency of severe claims is low (one claim in ten years) and since
our model is not designed to handle such claims we removed this outlier from data set 1.
To distinguish this new series from the original we refer to it as data set 1*.
The sample autocorrelation function and sample partial autocorrelation function
for data set 1* are found in Figure 8.1.8. There is a slight seasonal pattern in the sample
autocorrelation function. However, a seasonal model may not be necessary, since the
correlations at lags 6 and 12 are well within the 5% confidence limits. The sample partial
autocorrelation function indicates that an AR(1) model is appropriate.
The claims counts in the second data set are significantly higher than in data set 1.
Hence one or two persistent claims would not have a profound effect on the series' shape
and correlation pattern. The time series plot of data set 2, Figure 8.1.2, looks stationary
with possible seasonality. A seasonal pattern appears in the sample autocorrelation
function, which has large negative correlations at lags 7, 8 and 9, and large positive
correlations at lags 13 and 14. The sample partial autocorrelation function is consistent
with the AR(1) model.
The claims counts in data set 3 are also relatively large, with a mean of 6.133. In
July 1988 the claims count is unusually high at 21 and is the only observation above 14.
The time series plot, Figure 8.1.3, shows a seasonal pattern and a drop in the variation after
January 1990. The sample autocorrelation function confirms a seasonal pattern, while the
sample partial autocorrelation function suggests an AR(1) model is appropriate, see Figure
8.1.6.
The claims counts in data set 4 are low. Between June 1990 and April 1991 there
appears to be a persistent claim; again, it is impossible to tell for sure without further
investigation. The later half of the series appears to have a slightly lower claims
frequency than the first half of the series. The sample autocorrelation function and partial
autocorrelation function, Figure 8.1.7, are consistent with an AR(1) model.
In data set 5 the claims counts are low with slightly higher claims counts
occurring between January 1990 and December 1992, which again could be caused by a
single claimant with a severe or reoccurring dislocation. The sample autocorrelation
function and partial autocorrelation function, Figure 8.1.7, are consistent with an AR(1)
model.
Figure 8.1.1 A time series plot of data set 1.

Figure 8.1.2 A time series plot of data set 2.

Figure 8.1.3 A time series plot of data set 3.

Figure 8.1.4 A time series plot of data set 4.

Figure 8.1.5 A time series plot of data set 5.

Figure 8.1.6 ACF's and PACF's for data sets 1 to 3.

Figure 8.1.7 ACF's and PACF's for data sets 4 and 5.

Figure 8.1.8 ACF and PACF for data set 1*.
8.2 Model selection and testing

In this section we select and estimate a Poisson AR(1) model for each of our data sets.
We restrict the class of models by considering only models where the departure rate is
fixed, that is, $\alpha_t = \alpha$ for all $t$, and where the arrival rate is either constant or depends on
sinusoidal seasonal regressors. The parameter estimates for data sets 1*, 2, 3, 4 and 5 are
summarized in Table 8.2.3.
Our preliminary analysis in Section 8.1 suggested the following need for
regressors: data sets 1* and 2 might need seasonal regressors, data set 3 will almost
certainly need seasonal regressors and data sets 4 and 5 will not require seasonal
regressors.
In analyzing data set 1*, we failed to find any seasonal regressors for which the
coefficients were significantly different from zero at the 5% level. We therefore
proceeded to analyze the model with a constant arrival rate. The information matrix test
statistic for the joint test that neither the departure nor the arrival parameters are
stochastic is 0.240 (P-value 0.81), while for the individual test that the arrival parameter
is non-stochastic it is 0.134 (P-value 0.89). In both cases we accept the null hypothesis of
non-stochastic parameters. Henceforth in this section we will refer to these two tests as
simply the joint information matrix test and the individual information matrix test.

In Table 8.2.3 we see that the lower bound of the 95% confidence interval for $\alpha$ is
close to zero, 0.007. It is therefore worth testing for independence in this series. In Table
8.2.1 we see that all of the tests reject independence at the 5% level and that the CLS and
Wald tests reject independence at the 1% level. We conclude that it is unlikely the series is
independent.
In Figure 8.2.1 we have included three residual plots: Pearson, continuation and
arrival; see Section 4.4 for the development of continuation and arrival residuals. Recall
the Pearson residuals can be decomposed into the continuation and arrival residuals as
follows:

$r_t = r_{1,t} + r_{2,t}$,

where $r_{1,t} = E_t[\alpha \circ X_{t-1}] - \alpha X_{t-1}$ and $r_{2,t} = E_t[\varepsilon_t] - \lambda$ are, respectively, the continuation and
arrival residuals at time $t$. The residuals can be standardized as follows: $r_t/E_{t-1}[r_t^2]^{1/2}$,
$r_{1,t}/E_{t-1}[r_{1,t}^2]^{1/2}$ and $r_{2,t}/E_{t-1}[r_{2,t}^2]^{1/2}$. Recall $E_t$ is the operation of expectation conditional
on $\mathcal{F}_t = \sigma(X_0, X_1, \ldots, X_t)$.
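The conditional expectation $E_t[\alpha \circ X_{t-1}]$ that drives the cases below can be computed by conditioning on how $X_t$ splits into continuing claimants and new arrivals. The following is a sketch of that computation (the function name is ours), using the data set 1* estimates $\hat\alpha \approx 0.240$ and $\hat\lambda \approx 0.134$:

```python
from math import comb, exp, factorial

def cond_expected_continuers(x_prev, x, alpha, lam):
    """E_t[alpha o X_{t-1}] given X_{t-1} = x_prev and X_t = x: average the
    number of continuing claimants b over its conditional law, where b is
    Binomial(x_prev, alpha) a priori and the remaining x - b counts must
    come from the Poisson(lam) arrival process."""
    bs = range(min(x_prev, x) + 1)
    weights = [
        comb(x_prev, b) * alpha**b * (1 - alpha) ** (x_prev - b)
        * exp(-lam) * lam ** (x - b) / factorial(x - b)
        for b in bs
    ]
    total = sum(weights)
    return sum(b * w for b, w in zip(bs, weights)) / total

# With X_{t-1} = 0 nobody can continue, so the expectation is forced to 0
# (case 1 below); with X_{t-1} = 1, a larger X_t makes continuation of the
# one earlier claimant more likely (case 5 versus case 4 below).
print(cond_expected_continuers(0, 1, 0.240, 0.134))  # 0.0
```

This makes concrete why the continuation residual is identically zero whenever $X_{t-1} = 0$: the entire deviation must then be attributed to the arrival process.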
Since the continuation and arrival residuals are new ideas, we now analyze these
residuals in detail for data set 1*. We begin by considering the following cases for the
standardized continuation residuals of data set 1*:

Case 1. $X_{t-1} = 0$.
In this case, $\alpha \circ X_{t-1}$ given $X_{t-1} = 0$ is not random but identically equal to zero, hence
$E_t[\alpha \circ X_{t-1}] = 0$. Since $\alpha X_{t-1}$ is also equal to zero in this case, the residual at time $t$ is zero.
This is a key observation and an important property. It shows that, in this case, all of the
deviation between the observed value of $X_t$ and its expected value at time $t-1$ is due to the
arrival process and not the continuation process.
Case 2. $X_{t-1} = 1$ and $X_t = 0$.

In this case, $\alpha \circ X_{t-1}$ given $X_t = 0$ is non-random and equal to zero since nobody
continues. Therefore $E_t[\alpha \circ X_{t-1}] = 0$. Since $\alpha X_{t-1}$ is positive, the residual at time $t$ is
negative. For data set 1* the standardized residual is -0.709.
Case 3. $X_{t-1} = 2$ and $X_t = 0$.

This case is similar to case 2; the difference is that $\alpha X_{t-1}$ is twice as large as in case 2. For
data set 1* the standardized residual is -0.919. Note that the residual is not twice the value
in case 2, since, in this case, the standardization is conditional on $X_{t-1} = 2$ and not
$X_{t-1} = 1$.
Case 4. $X_{t-1} = 1$ and $X_t = 1$.

In this case, $\alpha \circ X_{t-1}$ given $X_{t-1} = 1$ and $X_t = 1$ is a random variable taking values in the
set $\{0, 1\}$. That is, the one individual collecting at time $t$ is either continuing to collect
from time $t-1$ or is a new claim at time $t$. For data set 1* the standardized residual is
Case 5. $X_{t-1} = 1$ and $X_t = 2$.

In this case, $\alpha \circ X_{t-1}$ given $X_{t-1} = 1$ and $X_t = 2$ is again a random variable taking values in
the set $\{0, 1\}$. Note this is a different random variable than the one in case 4, that is,
$P(\alpha \circ X_{t-1} = 1 \mid X_{t-1} = 1, X_t = 2) > P(\alpha \circ X_{t-1} = 1 \mid X_{t-1} = 1, X_t = 1)$. At time $t$ there are two
possibilities: two arrivals or one arrival and one continuing claim. For data set 1* the
standardized residual is 1.727.
Although other cases are possible these are the only five observed in data set 1*.
In a similar manner we now analyze the standardized arrival residuals for data set 1*.

Case 1. $X_{t-1} = 0$ and $X_t = 0$.

Conditional on $X_t = 0$, $\varepsilon_t$ is non-random and equal to zero, hence $E_t[\varepsilon_t] = 0$. This results
in a negative residual. For data set 1* the standardized residual is -0.366.
Case 2. $X_{t-1} = 1$ and $X_t = 0$.

This case is the same as case 1 except that the scaling (standard deviation conditional on
$X_{t-1} = 1$) is different. For data set 1* the standardized residual is -0.520.

Case 3. $X_{t-1} = 2$ and $X_t = 0$.

This case is the same as cases 1 and 2 except that the scaling (standard deviation
conditional on $X_{t-1} = 2$) is different. For data set 1* the standardized residual is -0.654.
Case 4. $X_{t-1} = 0$ and $X_t = 1$.

In this case, $\varepsilon_t$ given $X_{t-1} = 0$ and $X_t = 1$ is non-random and equal to 1. For data set 1* the
standardized residual is 2.366.

Case 5. $X_{t-1} = 1$ and $X_t = 1$.

In this case, $\varepsilon_t$ given $X_{t-1} = 1$ and $X_t = 1$ is a random variable taking values in the set
$\{0, 1\}$. Thus $E_t[\varepsilon_t] > 0$. For data set 1* the standardized residual is 0.638.

Case 6. $X_{t-1} = 1$ and $X_t = 2$.

In this case, $\varepsilon_t$ given $X_{t-1} = 1$ and $X_t = 2$ is a random variable taking values in the set
$\{1, 2\}$. Thus $E_t[\varepsilon_t] > 0$. For data set 1* the standardized residual is 4.043.
For data set 2 we found the coefficients for the following seasonal regressors
statistically significant at the 5% level: $\sin(2\pi t/12)$ and $\cos(2\pi t/12)$. If these regressors
are essential to the model then the information matrix test for a simpler model should
reject the null hypothesis of non-stochastic parameters. For the model with a constant
arrival rate the joint information matrix test statistic is 9.537 (P-value 0.59), while the
individual information matrix test statistic is 0.417 (P-value 0.68). In both cases we
accept the null hypothesis of non-stochastic parameters. That is, the model with a
constant arrival rate is sufficient to explain the variation observed in the series. The
simulation in Section 6.3 showed that the information matrix test had good power against
the misspecification considered. However, the power of the test may be lower for other
types of misspecification. We therefore need to check the residuals before making any
conclusions.
The residual plots for this simple model, Figure 8.2.2, look random. Further, none
of the sample autocorrelations for the residuals are statistically significant at the 5% level.
We therefore choose to use the simpler model with a constant arrival rate.
In our analysis of data set 3 we found the following seasonal regressors
statistically significant at the 5% level: $\sin(2\pi t/12)$ and $\cos(2\pi t/12)$. The joint
information matrix test statistic for this model is 1.008 (P-value 0.31). Figure 8.2.3 shows the
residuals plotted in chronological order. At the 5% level, none of the sample
autocorrelations are significant. In Figure 8.2.4 the residuals are plotted against the two
seasonal regressors. These plots indicate no significant problems.
Before we can go ahead and accept this model we need to show that simpler
models are not adequate. We consider three simpler models with the following
regressors: constant only (model 1), constant plus $\sin(2\pi t/12)$ (model 2) and constant
plus $\cos(2\pi t/12)$ (model 3). The joint information matrix tests for these three models are
summarized in Table 8.2.4.
For model 1, the Pearson and continuation residuals have a significant sample
autocorrelation at lag 12, while the arrival residuals have significant sample
autocorrelations at lags 2 and 12. These correlations in the residuals along with the low P-value
for the joint information matrix test lead us to reject model 1.
The Pearson, continuation and arrival residuals for model 2 have significant
sample autocorrelations at lags 2, 12 and 2, respectively. Although the P-value for the
joint information matrix test is above 0.05 we reject the model because the residuals are
correlated.

In the case of model 3 the residuals appear to be uncorrelated; however, we reject
the model due to the low P-value of the joint information matrix test.
In the case of data set 4 we fit a model with a constant arrival rate. The joint
information matrix statistic is 0.222 (P-value 0.82). This leads us to accept the null
hypothesis that the parameters are non-stochastic. In the three residual plots, see Figure
8.2.5, the suspected persistent claim is evident (observations 54-64). In the case of the
Pearson and arrival residuals this causes a band of residuals close to zero. However, in the
case of the continuation residuals this band of residuals is quite far from zero. This causes
the lag 1 sample autocorrelation for the continuation residuals to be significant at the 5%
level. Otherwise the residuals for this model look good, and we decide not to remove or
further investigate the suspected outlier.
For data set 5 we also fit a model with a constant arrival rate. The joint
information matrix test statistic is -0.514 (P-value 0.61) and therefore we accept the null
hypothesis of non-stochastic parameters. The residual plots are found in Figure 8.2.6.
None of the sample autocorrelations for the residuals are significant at the 5% level.
Recall duration is the number of months that a claimant collects STWLB. In
Section 2.2 we showed that the mean duration is $(1-\alpha)^{-1}$. We further showed how to
construct 95% confidence intervals for the mean duration in Section 3.5. The mean
durations, Table 8.2.5, for data sets 1*, 2, 3 and 4 are between one and two months. In the
case of data set 5 the mean duration is longer, between two and three months.
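The point estimate of the mean duration follows directly from $\hat\alpha$. A delta-method sketch of an approximate interval is shown below; the construction in Section 3.5 may differ, and the standard error used here is backed out of the reported confidence interval for $\alpha$, an assumption on our part:

```python
def mean_duration_ci(alpha_hat, se_alpha, z=1.96):
    """Mean duration is 1/(1 - alpha); a delta-method sketch gives the
    plug-in estimate a standard error of se_alpha / (1 - alpha)^2."""
    duration = 1.0 / (1.0 - alpha_hat)
    se = se_alpha / (1.0 - alpha_hat) ** 2
    return duration, (duration - z * se, duration + z * se)

# Data set 5: alpha_hat ~ 0.652; se ~ 0.058 is inferred from the width of
# its reported 95% confidence interval (an illustrative assumption).
dur, (lo, hi) = mean_duration_ci(0.652, 0.058)
print(round(dur, 2))  # about 2.87 months, i.e. between two and three
```
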
Table 8.2.1 Tests for independence in data set 1*.

Test          Statistic Value   5% Critical Value
CLS           2.456             1.645
Our Score     2.256             1.645
Wald          6.893             2.71
LR            5.037             2.71
Usual Score   5.090             2.71

Table 8.2.2 This table displays the seasonal arrival rate for data set 3.

Month       Arrival Rate
January     2.353
February    2.415
March       2.737
April       3.310
May         4.060
June        4.783
July        5.177
August      5.043
September   4.450
October     3.680
November    3.000
December    2.547
Table 8.2.3 This table summarizes the parameter estimates and the upper and lower 95% confidence limits.

Table 8.2.4 This table summarizes the joint information matrix test of models 1, 2 and 3 on data set 3.

Table 8.2.5 This table contains the mean duration and 95% confidence interval for the mean duration for data sets 1*, 2, ..., 5.
Figure 8.2.1 Pearson, continuation and arrival residuals plotted against time for data set 1*.

Figure 8.2.2 Pearson, continuation and arrival residuals plotted against time for data set 2.

Figure 8.2.3 Pearson, continuation and arrival residuals plotted against time for data set 3.

Figure 8.2.4 Pearson, continuation and arrival residuals plotted against model regressors in data set 3.

Figure 8.2.5 Pearson, continuation and arrival residuals plotted against time for data set 4.

Figure 8.2.6 Pearson, continuation and arrival residuals plotted against time for data set 5.
8.3 Arrival process

The Poisson AR(1) model assumes the continuation and arrival processes are not directly
observable. For the five data sets described in Section 8.1 we were able to obtain the
arrival data from the WCB. To distinguish the two sets of data we will refer to the arrivals
as data sets 1A, 2A, 3A, 4A and 5A. In general data of this form may not be readily
available, since it requires more detailed record keeping. Since these data are available we
consider the following two questions: how well were we able to estimate the arrival rates
with the Poisson AR(1) model? Are the arrival processes Poisson?
We begin by estimating the arrival rates for the arrival data, which we assume to
be independent and Poisson. For data sets 1A, 2A, 4A and 5A we assume a constant
mean, while for data set 3A we assume a seasonally changing mean. The maximum
likelihood estimates and 95% confidence intervals for the parameters are found in Table
8.3.1. For data sets 1*, 4 and 5 the parameter estimates from the Poisson AR(1) model
are contained within these 95% confidence limits. The estimated arrival rate for data set
2A, 4.475, is contained in the somewhat wider 95% confidence interval for the same
parameter in the Poisson AR(1) model. In the case of data set 3, two of the three
parameter estimates from the Poisson AR(1) model are contained within the 95%
confidence intervals for the same parameters as calculated from data set 3A. The estimate
for $\beta_0$ in data set 3 is contained in the 95% confidence interval for $\beta_0$ as calculated
from data set 3A.
Table 8.3.2 displays the estimated seasonal arrival rate for data set 3A. These rates
are slightly lower than those estimated by the Poisson AR(1) model. Also note the
seasonal pattern has shifted by one month. That is, the lowest and highest months are
respectively February and August, whereas for the Poisson AR(1) model they are
January and July.

Overall, the Poisson AR(1) model appears to do a good job at estimating the
arrival rate. Next we check the Poisson assumption.
If the Poisson assumption is correct we would expect the mean and variance of
each arrival process to be close. This is the case for data sets 1A, 4A and 5A. However,
for data set 2A the variance, 7.171, is "much larger" than the mean, 4.475. Since the
arrival rate is non-constant for data set 3A we would expect the variance to be larger than
the mean. The Poisson specification can be formally tested by the information matrix test
as follows.
The Poisson probability function is

$p(X_t) = e^{-\lambda}\lambda^{X_t}/X_t!$.

The first and second derivatives of $\log[p(X_t)]$ are:

$\frac{\partial}{\partial\lambda}\log p(X_t) = \frac{X_t}{\lambda} - 1$

and

$\frac{\partial^2}{\partial\lambda^2}\log p(X_t) = -\frac{X_t}{\lambda^2}$.

Therefore the information matrix test statistic is based on

$\sum_{t=1}^{n}\left[\left(\frac{X_t}{\lambda}-1\right)^2 - \frac{X_t}{\lambda^2}\right] = \frac{1}{\lambda^2}\sum_{t=1}^{n}\left[(X_t-\lambda)^2 - X_t\right].$

Since the denominator is constant, we consider the following statistic instead:

$U_n(\lambda) = \sum_{t=1}^{n}\left[(X_t-\lambda)^2 - X_t\right].$

We have that $[U(\lambda)]_n^{-1/2}U_n(\lambda) \xrightarrow{d} N(0,1)$, since the data are independent and identically
distributed with finite moments. Note the test is basically checking whether the mean and
variance are the same.

The parameter $\lambda$ is unknown, so we replace it with $\bar{X}$ (the maximum likelihood
estimate). $U_n(\bar{X})$ is related to $U_n(\lambda)$ as follows:

$U_n(\bar{X}) = U_n(\lambda) - n(\bar{X} - \lambda)^2.$
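As a sketch of the computation (with simulated rather than WCB data), the test reduces to a standardized dispersion check: the statistic is near zero for equidispersed data and strongly positive for overdispersed data.

```python
import math
import random

def rpois(rng, mu):
    # Poisson sampler by inversion; adequate for small means.
    x, p, u = 0, math.exp(-mu), rng.random()
    c = p
    while u > c:
        x += 1
        p *= mu / x
        c += p
    return x

def poisson_dispersion_stat(xs):
    """U_n(xbar): the sum of (x - xbar)^2 - x, which has mean zero under a
    Poisson model, standardized by the root of its quadratic variation."""
    xbar = sum(xs) / len(xs)
    terms = [(x - xbar) ** 2 - x for x in xs]
    return sum(terms) / math.sqrt(sum(t * t for t in terms))

rng = random.Random(0)
pois = [rpois(rng, 4.0) for _ in range(5000)]                     # equidispersed
over = [rpois(rng, rng.choice([2.0, 6.0])) for _ in range(5000)]  # overdispersed
null_stat = poisson_dispersion_stat(pois)
over_stat = poisson_dispersion_stat(over)
print(round(null_stat, 1), round(over_stat, 1))
```

The first statistic behaves like a standard normal draw, while the second is many standard deviations above zero, which is the pattern seen for data set 2A.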
The results of the information matrix test are summarized in Table 8.3.3. In the
case of data set 2A, the P-value is small which suggests the arrivals are unlikely to be
Poisson.
Finally we check the assumption of independent arrivals. None of the first 12
sample autocorrelations for data sets 1A, 3A and 5A were significant at the 5% level.
Note for data set 3A we used the sample autocorrelations of the residuals. For data set 2A
lags 1 and 12 had autocorrelations that were significant at the 5% level. This indicates
possible seasonality as well as violation of the independence assumption. For data set 4A
the sixth lag was significant at the 5% level, indicating possible seasonality.

To conclude, the Poisson AR(1) model adequately estimates the arrival parameters.
All of the data sets except data set 2A appear to be Poisson and independent.
Table 8.3.1 This table summarizes the parameter estimation for the arrival processes in data sets 1A to 5A; included are the parameter estimates and 95% confidence intervals. The last two columns contain estimated arrival rates and 95% confidence intervals from the Poisson AR(1) model.
Table 8.3.2 This table displays the seasonal arrival rate for data set 3A.

Month       Arrival Rate     Arrival Rate
            (arrival data)   (Poisson AR(1))
January     1.987            2.353
February    1.916            2.415
March       2.095            2.737
April       2.538            3.310
May         3.235            4.060
June        4.064            4.783
July        4.735            5.177
August      4.911            5.043
September   4.490            4.450
October     3.706            3.680
November    2.908            3.000
December    2.315            2.547
Table 8.3.3 This table summarizes the information matrix test for data sets 1A-5A.

Data Set   Test Statistic   P-value
1A         -0.543           0.59
2A          2.954           0.00
3A          1.407           0.16
4A         -0.385           0.70
5A          0.170           0.87
8.4 Forecasting

In this section we calculate forecasts for the first 6 months of 1995, which are found in
Tables 8.4.1-8.4.6.

Since the models used for data sets 1*, 2, 4 and 5 are simple, that is the arrival and
departure rates are fixed over time, we can apply the forecasting techniques of Chapter 3.
For these four data sets we have calculated individual 95% confidence intervals for the k-step
ahead conditional distribution, $k = 1, 2, \ldots, 6, \infty$.
Note that in each case the 95% confidence intervals for the 6-step ahead
conditional distribution are very close to the 95% confidence intervals for the marginal
distribution. Therefore if we require forecasts beyond six months into the future we can
simply use the marginal distribution.
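These conditional distributions can be computed directly. The sketch below assumes the standard decomposition of the k-step transition into binomial survivors of the current count plus accumulated Poisson arrivals, and uses the data set 1* estimates with a final observed count of zero; it reproduces the 1-step probabilities 0.875 and 0.117 and the 6-step probability 0.838 quoted in the discussion of Figure 8.5.1.

```python
from math import comb, exp, factorial

def k_step_pmf(x_n, alpha, lam, k, max_count=30):
    """k-step ahead conditional pmf of the Poisson AR(1): survivors of
    X_n = x_n are Binomial(x_n, alpha^k) and new arrivals are Poisson with
    mean lam * (1 - alpha^k) / (1 - alpha); the pmf is their convolution."""
    p = alpha ** k
    mu = lam * (1 - p) / (1 - alpha)
    binom = [comb(x_n, b) * p**b * (1 - p) ** (x_n - b) for b in range(x_n + 1)]
    pois = [exp(-mu) * mu**a / factorial(a) for a in range(max_count + 1)]
    return [sum(binom[b] * pois[j - b] for b in range(min(x_n, j) + 1))
            for j in range(max_count + 1)]

# Data set 1* estimates (alpha ~ 0.240, lam ~ 0.134) with a final count of 0:
pmf1 = k_step_pmf(0, 0.240, 0.134, k=1)
pmf6 = k_step_pmf(0, 0.240, 0.134, k=6)
print(round(pmf1[0], 3), round(pmf1[1], 3), round(pmf6[0], 3))  # 0.875 0.117 0.838
```

As $k$ grows, $\alpha^k \to 0$ and the pmf converges to the Poisson marginal with mean $\lambda/(1-\alpha)$, which is why the 6-step ahead and marginal intervals nearly coincide.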
A few of the lower bounds in Tables 8.4.4 and 8.4.5 are negative. This is because
the confidence intervals are constructed by applying an asymptotic normal result to a
finite sample and are therefore only approximate. However, the confidence intervals that
are affected by a negative lower bound are for probabilities that are very small (less than
1%).
In Section 7.2 we show how to construct individual confidence intervals for the k-step
ahead conditional distribution when the arrival and departure rates depend on
regressors. However, to practically apply these results would require a significant amount
of programming. Therefore for data set 3 we have only calculated the k-step ahead
conditional distribution. A possible alternative for constructing confidence intervals for
the k-step ahead conditional distribution is to use a bootstrap method, see Efron (1982),
which may be more practical than the method in Section 7.2.
The marginal distribution for data set 3 is given in Table 8.4.6. Notice the
marginal mean peaks in August, which is one month after the arrival rate peaks, see Table
8.2.2. Also note, there is a fair bit of difference between the 6-step ahead conditional
distribution and the marginal distribution for June.
Table 8.4.3 The k-step ahead conditional means, medians, modes and point mass forecasts for data set 3.

Table 8.4.6 The marginal distributions for data set 3.
8.5 Gaussian AR(1) models

In this section we fit the Gaussian AR(1) model, see Example 4.1.1 for the model
definition, to our five data sets and compare the results to the Poisson AR(1) model.

Table 8.5.1 contains the parameter estimates for the Gaussian AR(1) model. Most
of the estimates are quite close to the corresponding parameter estimates of the Poisson
AR(1) model. The largest difference is in the estimate for $\alpha$ in data set 4, where the
Gaussian model estimates 0.291 and the Poisson model estimates 0.404. However, the
Gaussian estimate is still well within the 95% confidence interval for $\alpha$ given by the
Poisson model.
The Gaussian model gives wider 95% confidence intervals for $\lambda$ than the Poisson
model. In fact for data set 1* the Gaussian 95% confidence interval for $\lambda$ includes zero.
For data sets 2, 3 and 5 the Gaussian model gives wider 95% confidence intervals for $\alpha$
than the Poisson model, while for data sets 1* and 4 the Gaussian model gives narrower
95% confidence intervals for $\alpha$ than the Poisson model.
Note the parameters $\beta_0$, $\beta_1$ and $\beta_2$ for data set 3 cannot be directly compared to
the corresponding parameters in the Poisson model, since in the Gaussian model we have
not used an exponential link function. However, the estimated arrival rates for the
Gaussian model, Table 8.5.2, can be directly compared to the arrival rates of the Poisson
model. While the arrival rates calculated by the Poisson AR(1) model are slightly higher
than those calculated directly from the arrival data, the arrival rates calculated by the
Gaussian AR(1) model are slightly lower than those calculated directly from the arrival data.
The biggest advantage in using the Poisson AR(1) model over the Gaussian
AR(1) model is in the area of forecasting. Figure 8.5.1 displays Poisson AR(1) model
forecasts and Gaussian AR(1) model forecasts for the first six months of 1995 for data
set 1*. From the Poisson AR(1) model we have calculated the forecast distribution, which
is the bar portion of the chart. The three lines on the chart mark the predicted values and
the 95% prediction interval for the Gaussian AR(1) model.
The cumulative probabilities are marked on the left Y-axis. For the 1-step ahead
distribution there is a 0.875 probability that the count will be zero and a 0.117 probability
that the count will be 1, while for the 6-step ahead distribution there is a 0.838 probability
of a count of zero and a 0.148 probability of a count of 1.

The numbers for the predicted values in the Gaussian AR(1) model are labeled on
the right Y-axis. The mean predicted value is about 0.2 and the 95% prediction interval is
approximately between -0.6 and 1.0.
The lower bound of the 95% prediction interval as given by the Gaussian AR(1)
model is also negative for data sets 3, 4 and 5. In data set 2 the counts are sufficiently
high, so that the lower bound is positive. This illustrates the importance of using
appropriate distributional assumptions in modeling.
Table 8.5.1 The Gaussian AR(1) model parameter estimates for data sets 1* to 5.

Data Set  Parameter              Gaussian   Gaussian          Poisson    Poisson
                                 Estimate   95% C.I.          Estimate   95% C.I.
1*        α                       0.221     (0.047, 0.395)     0.240     (0.007, 0.472)
1*        λ                       0.136     (-0.045, 0.317)    0.134     (0.064, 0.204)
2         α                       0.449     (0.289, 0.609)     0.472     (0.344, 0.599)
2         λ                       5.400     (3.823, 6.977)     5.188     (3.898, 6.478)
3         α                       0.489     (0.330, 0.648)     0.406     (0.294, 0.519)
3         β₀ (constant)           6.147     (5.192, 7.102)     1.250     (1.039, 1.461)
3         β₁ (sin(2πt/12))       -1.683     (-2.794, -0.572)  -0.243     (-0.401, -0.051)
3         β₂ (cos(2πt/12))       -1.138     (-2.238, -0.038)  -0.315     (-0.483, -0.147)
4         α                       0.291     (0.120, 0.462)     0.404     (0.203, 0.604)
4         λ                       0.199     (0.014, 0.384)     0.170     (0.090, 0.251)
5         α                       0.591     (0.447, 0.735)     0.652     (0.539, 0.765)
5         λ                       0.376     (0.153, 0.599)     0.333     (0.209, 0.457)
Month        Arrival Rate      Arrival Rate      Arrival Rate
             Gaussian AR(1)    Poisson AR(1)     (arrival data)
January      1.871             2.353             1.987
February     2.008             2.415             1.916
March        2.449             2.737             2.095
April        3.076             3.310             2.538
May          3.720             4.060             3.235
June         4.209             4.783             4.064
July         4.412             5.177             4.735
August       4.274             5.043             4.911
September    3.833             4.450             4.490
October      3.207             3.680             3.706
November     2.563             3.000             2.908
December     2.074             2.547             2.315

The monthly arrival rates calculated from the arrival data, the Poisson AR(1) model, and the Gaussian AR(1) model.
Forecasts (data set 1*)

Figure 8.5.1 The bar chart represents the k-step ahead conditional cumulative distribution for the Poisson AR(1) model, while the line graph represents the forecasts from the Gaussian AR(1) model.
In this chapter we have illustrated the methods developed in the earlier chapters. The preliminary analysis is identical to the preliminary analysis of continuous-valued time series. This includes time series plots, as well as plots of the autocorrelation and partial autocorrelation functions. This analysis gives us a starting point for model selection.
In Chapter 4 we showed that the parameters can be estimated via maximum likelihood estimation and gave expressions for the observed Fisher information. The expected Fisher information is easily evaluated numerically, and can be inverted to get the asymptotic variance of the parameter estimates. This allows us to perform t-tests on the model parameters and is useful in determining which covariates to include in the model. For models with covariates it is more difficult to calculate the expected Fisher information. In this case the observed Fisher information can be used, and it can easily be calculated by numerically differentiating the score function.
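As a sketch of that last step, the code below (a hypothetical implementation, not the thesis's own software) builds the Poisson AR(1) transition probabilities, evaluates the conditional log-likelihood, and approximates the observed Fisher information by a central finite-difference Hessian, whose inverse gives asymptotic variances for t-tests. The count series and the evaluation point are illustrative only.

```python
import numpy as np
from math import comb, exp, factorial

def transition_prob(y, x, alpha, lam):
    # P(X_t = y | X_{t-1} = x) for the Poisson AR(1) (INAR(1)) model:
    # binomial thinning of x plus an independent Poisson(lam) innovation.
    return sum(comb(x, k) * alpha**k * (1 - alpha)**(x - k)
               * exp(-lam) * lam**(y - k) / factorial(y - k)
               for k in range(min(x, y) + 1))

def loglik(theta, data):
    # Conditional log-likelihood given the first observation.
    alpha, lam = theta
    return sum(np.log(transition_prob(data[t], data[t - 1], alpha, lam))
               for t in range(1, len(data)))

def observed_information(theta, data, h=1e-5):
    # Observed Fisher information: negative Hessian of the log-likelihood,
    # approximated by central finite differences of the log-likelihood itself.
    p = len(theta)
    H = np.zeros((p, p))
    base = np.array(theta, dtype=float)
    for i in range(p):
        for j in range(p):
            def f(di, dj):
                t2 = base.copy()
                t2[i] += di
                t2[j] += dj
                return loglik(t2, data)
            H[i, j] = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)
    return -H

# Illustrative low-count series and evaluation point (alpha, lam).
counts = [0, 1, 0, 2, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0, 1, 2, 1, 0]
info = observed_information([0.3, 0.55], counts)
se = np.sqrt(np.diag(np.linalg.inv(info)))  # asymptotic standard errors
```

In practice one would evaluate the information at the maximum likelihood estimate; the resulting standard errors feed directly into the t-tests described above.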
If the t-statistic for α is small, we can further test for independence with the tests found in Chapter 5. This was the case for data set 1*, which we concluded to be a dependent series.
Model selection is further refined with the help of the information matrix test (see Chapter 6) and the new residuals defined in Section 4.4. The information matrix test checks whether the model is sufficient to explain the variation found in the data. Patterns in the residuals may indicate the need for additional regressors. We found the simple
Poisson AR(1) model without regressors was sufficient to model data sets 1*, 2, 4 and 5. In the case of data set 3 the model required the following two seasonal regressors in the arrival process: sin(2πt/12) and cos(2πt/12).
The WCB was able to provide us with the arrival process, which the Poisson AR(1) model assumes to be latent. From these data we were able to directly estimate the arrival parameters. We found these estimates to be close to the estimates found using the aggregate data. The additional data also allowed us to check whether the arrivals were independent and Poisson. Independence was assessed using the sample autocorrelation function. We found that the arrivals for data set 2 appeared to be dependent. Further, we found that the arrivals for data set 4 may be seasonal, which was not indicated in the aggregate data. The information matrix test was used to assess the Poisson assumption, which was only rejected in the case of data set 2. We conclude that the Poisson assumption appears to be realistic for the WCB data.
In Chapter 3 we considered forecasting for the Poisson AR(1) model. If data cohesion is considered important, the k-step ahead conditional median or mode can be used as a forecast. The k-step ahead conditional mean can also be used as a forecast; however, it is not integer-valued. For low count series we proposed using the k-step ahead conditional distribution as a forecast. In the case of data sets 1*, 4 and 5 the counts are low enough that the k-step ahead conditional distribution is easy to read. However, for data sets 2 and 3 the k-step ahead conditional distribution is much harder to read, since the non-zero probabilities are spread over more non-negative integers.
The analysis in Chapter 8 concluded with a look at the results from fitting a Gaussian AR(1) model to the 5 data sets. We found that the parameter estimates were similar to the Poisson AR(1) estimates. We know from Section 4.7 that the loss of efficiency in using the Gaussian AR(1) likelihood increases with α, but that the Gaussian estimates are more robust than the Poisson estimates. With the possible exception of data set 2, our analysis has found the Poisson AR(1) model adequate. Further, in four of the five series the estimate for α was above 0.400. We therefore favor the Poisson estimates. Finally, the Gaussian AR(1) prediction intervals are meaningless, since the data are discrete valued. In fact, we saw that the Gaussian AR(1) prediction intervals can include negative values, which again is meaningless for non-negative data.
We conclude with some possible future avenues for analyzing the WCB data. Our analysis only considered simple sinusoidal regressors. Better models may be found by considering economic regressors, such as employment rates, weather conditions and sales.
The WCB often funds accident prevention programs and is interested in whether or not a program has had an effect on injury rates. Indicator regressors can be used to model the change in the arrival rates before and after the start of an accident prevention program. We can then estimate the change and test whether it is significantly different from zero.
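As a sketch of how such a test could look when the arrival counts themselves are available (as they were from the WCB): regress monthly arrivals on an intercept and a post-program indicator with a Poisson log link, then apply a Wald t-test to the indicator coefficient. Everything below, including the simulated counts and the change point, is hypothetical.

```python
import numpy as np

def poisson_glm_irls(X, y, iters=25):
    # Poisson regression with log link, fitted by iteratively reweighted
    # least squares; returns coefficient estimates and asymptotic s.e.'s.
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)
        W = mu                          # Poisson variance equals the mean
        z = X @ beta + (y - mu) / mu    # working response
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    cov = np.linalg.inv((X.T * np.exp(X @ beta)) @ X)
    return beta, np.sqrt(np.diag(cov))

# Hypothetical monthly arrival counts; the prevention program starts at
# month 12, and the indicator column switches on from that month.
rng = np.random.default_rng(0)
y = np.concatenate([rng.poisson(5.0, 12), rng.poisson(3.5, 12)])
X = np.column_stack([np.ones(24), np.arange(24) >= 12])

beta, se = poisson_glm_irls(X, y.astype(float))
t_stat = beta[1] / se[1]  # Wald t-test for the program effect
```

With this design the fitted rates reduce to the pre- and post-program sample means, so the test is simply comparing arrival rates before and after the intervention on the log scale.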
Bibliography
Al-Osh, M.A. and Alzaid, A.A. (1992). First order autoregressive time series with negative binomial and geometric marginals. Communications in Statistics A 21, 2483-2492.

Al-Osh, M.A. and Alzaid, A.A. (1987). First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis 8, 261-275.

Al-Osh, M.A. and Alzaid, A.A. (1988). On maximum likelihood estimation for a subcritical branching process with immigration. Pakistan Journal of Statistics 4, 147-156.

Alzaid, A.A. and Al-Osh, M.A. (1988). First-order integer-valued autoregressive (INAR(1)) process: Distributional and regression properties. Statistica Neerlandica 42, 53-61.

Alzaid, A.A. and Al-Osh, M.A. (1990). An integer-valued pth-order autoregressive structure (INAR(p)) process. Journal of Applied Probability 27, 314-324.

Barndorff-Nielsen, O.E. and Sorensen, M. (1994). A Review of Some Aspects of Asymptotic Likelihood Theory for Stochastic Processes. International Statistical Review 62, 133-165.

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day.

Billingsley, P. (1986). Probability and Measure, 2nd Edition. New York: John Wiley & Sons.

Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. New York: Springer-Verlag.

Brown, B.M. (1971). Martingale Central Limit Theorems. The Annals of Mathematical Statistics 42, 59-66.

Chan, K.S. and Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association 90, 242-252.

Chant, D. (1974). On asymptotic tests of composite hypotheses in nonstandard conditions. Biometrika 61, 291-298.

Chesher, A. (1983). The Information Matrix Test: Simplified Calculation Via a Score Test Interpretation. Economics Letters 13, 45-48.

Chow, G.C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591-605.

Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. London: Chapman and Hall.

Cramer, H. and Wold, H. (1936). Some theorems on distribution functions. Journal of the London Mathematical Society 11, 290-295.

Crowder, M.J. (1976). Maximum Likelihood Estimation for Dependent Observations. Journal of the Royal Statistical Society, Series B 38, 45-53.

Davidson, J. (1994). Stochastic Limit Theory. New York: Oxford University Press.

Davidson, R. and MacKinnon, J.G. (1993). Estimation and Inference in Econometrics. New York: Oxford University Press.

Efron, B. (1982). The Jackknife, the Bootstrap and other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.

Fahrmeir, L. (1988). A Note on Asymptotic Testing Theory for Nonhomogeneous Observations. Stochastic Processes and their Applications 28, 267-273.

Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. New York: Springer-Verlag.

Ferguson, T.S. (1996). A Course in Large Sample Theory. New York: Chapman and Hall.

Godambe, V.P. and Heyde, C.C. (1987). Quasi-likelihood and optimal estimation. International Statistical Review 55, 231-244.

Godfrey, L.G. (1988). Misspecification tests in econometrics: The Lagrange multiplier principle and other approaches. New York: Cambridge University Press.

Hall, P. and Heyde, C.C. (1980). Martingale Limit Theory and its Application. New York: Academic Press.

Hall, W.J. and Mathiason, D.J. (1990). On Large-Sample Estimation and Testing in Parametric Models. International Statistical Review 58, 77-97.

Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press.

Harvey, A.C. (1981). Time Series Models. Oxford: Philip Allan.
Harvey, A.C. and Fernandes, C. (1989). Time series models for count or qualitative observations. Journal of Business & Economic Statistics 7, 407-417.

Hillier, F.S. and Lieberman, G.J. (1986). Introduction to Operations Research, 4th edition. Oakland: Holden-Day.

Jacobs, P.A. and Lewis, P.A.W. (1977). A mixed autoregressive-moving average exponential sequence and point process (EARMA 1,1). Advances in Applied Probability 9, 87-104.

Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. I: Correlation and runs properties. Journal of the Royal Statistical Society, Series B 40, 94-105.

Jacobs, P.A. and Lewis, P.A.W. (1978). Discrete time series generated by mixtures. II: Asymptotic properties. Journal of the Royal Statistical Society, Series B 40, 222-228.

Jacobs, P.A. and Lewis, P.A.W. (1983). Stationary discrete autoregressive-moving average time series generated by mixtures. Journal of Time Series Analysis 4, 19-36.

Jin-Guan, D. and Yuan, L. (1991). The integer-valued autoregressive (INAR(p)) model. Journal of Time Series Analysis 12, 129-142.

Joe, H. (1997). Multivariate models and dependence concepts. London: Chapman & Hall.

Joe, H. (1996). Time series models with univariate margins in the convolution-closed infinitely divisible class. Journal of Applied Probability 33, 664-677.

Jørgensen, B., Lundbye-Christensen, S., Song, X.-K. and Sun, L. (1995). A state space model for multivariate longitudinal count data. Technical Report #148, Department of Statistics, University of British Columbia.

Jørgensen, B. and Song, X.-K. (1998). Stationary time-series models with exponential dispersion model margins. Journal of Applied Probability 35 (to appear).

Kalman, R.E. (1987). Regression methods for non-stationary categorical time series: Asymptotic estimation theory. Annals of Statistics 17, 79-98.

Klimko, L.A. and Nelson, P.I. (1978). On conditional least squares estimation for stochastic processes. The Annals of Statistics 6, 629-642.

Little, J.D.C. (1961). A proof for the queuing formula L = λW. Operations Research 9, 383-387.

MacDonald, I.L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-Valued Time Series. Monographs on Statistics and Applied Probability 70. London: Chapman and Hall.

McCabe, B. and Tremayne, A. (1993). Elements of modern asymptotic theory with statistical applications. Manchester: Manchester University Press.

McCabe, B. and Leybourne, S. (1996). A General Test. Working Paper.

McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability 20, 822-835.

McKenzie, E. (1986). Autoregressive moving-average processes with negative-binomial and geometric marginal distributions. Advances in Applied Probability 18, 679-705.

McLeish, D.L. (1975). A maximal inequality and dependent strong laws. Annals of Probability 3, 826-836.

Pierce, D.A. (1982). The Asymptotic Effect of Substituting Estimators for Parameters in Certain Types of Statistics. The Annals of Statistics 10, 475-478.

Ross, S.M. (1983). Stochastic Processes. New York: John Wiley & Sons.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons.

Song, X.-K. (1996). Some statistical models for the multivariate analysis of longitudinal data. Ph.D. thesis. Department of Statistics, University of British Columbia.

Sprott, D.A. (1983). Estimating the parameters of a convolution by maximum likelihood. Journal of the American Statistical Association 78, 457-460.

Stout, W.F. (1974). Almost Sure Convergence. New York: Academic Press.

White, H. (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica 50, 1-25.

White, H. (1984). Asymptotic Theory for Econometricians. New York: Academic Press.

White, H. and Domowitz, I. (1984). Nonlinear regression with dependent observations. Econometrica 52, 143-161.

Wooldridge, J.M. (1991a). On the application of robust, regression-based diagnostics to models of conditional means and conditional variances. Journal of Econometrics 47, 5-46.

Wooldridge, J.M. (1991b). Specification testing and quasi-maximum-likelihood estimation. Journal of Econometrics 48, 29-55.

Zeger, S.L. (1988). A regression model for time series of counts. Biometrika 75, 621-629.

Zeger, S.L., Liang, K.-Y. and Self, S.G. (1985). The analysis of binary longitudinal data with time independent covariates. Biometrika 72, 31-38.
Appendix
The following is a list of the data sets used in this thesis. Data set 0 refers to the illustrative data set introduced in Section 2.5, and unlike the other series starts at January 1987.
O 1 1' 2 3 4 5 1A 2A 3A 4A 5A Jan-85 0 0 9 6 0 0 0 2 2 0 0 Feb-435 0 0 6 7 1 0 0 0 3 1 0 Mar-85 0 0 6 8 0 1 0 3 4 0 1 Apr-85 0 0 7 9 0 1 0 3 5 0 0 May-85 0 0 1 0 6 1 1 0 8 1 1 0 3un-85 0 0 8 8 0 1 0 2 4 0 0 JuI-85 O O 1 4 5 O 1 O 1 0 4 O O
Aug-85 0 0 8 3 0 1 0 4 1 0 0 Sep85 0 0 7 7 0 0 0 5 4 0 0 Oct-85 O 0 1 0 1 1 O 1 0 8 8 0 1 Nov-85 0 0 1 0 8 1 1 0 9 5 1 0 Dec-85 0 0 1 2 4 0 2 0 6 3 0 2 Jan-86 0 0 8 2 0 0 0 6 1 0 0 Feb-86 0 0 8 3 0 0 0 4 2 0 0 Mar-86 1 1 8 4 0 0 1 5 3 0 0 Apr-86 1 1 8 5 1 0 0 4 4 1 0 May-86 1 1 1 3 7 1 1 1 8 2 1 0 Jun-86 1 1 1 2 8 0 0 1 8 5 0 0 Jul-86 0 0 1 4 1 2 0 O 0 7 8 0 0
Aug-86 0 0 1 3 1 1 O 1 0 6 6 0 1 Sep-86 0 0 1 3 1 2 0 1 0 6 7 0 1 Oct-86 0 0 8 6 1 1 0 3 5 1 0 Nov-86 0 0 1 3 2 1 1 0 8 1 1 0 Dec-86 1 1 1 0 2 0 1 1 3 0 0 0 Jan-87 6 1 1 1 2 3 O O 1 5 2 O O Feb-8711 O O 1 2 3 O O O 7 1 O O M a r - 8 7 5 0 0 9 5 0 0 0 4 2 0 0 A p r - 8 7 5 0 0 8 6 0 1 0 4 2 0 1 May-87 5 O O 13 13 2 O O 10 9 2 O Jun-87 2 O O 9 1 2 O O O 4 6 O O JuI-87 7 O O 8 2 1 O O O 3 15 O O
A u g - 8 7 4 0 0 6 9 0 0 0 3 3 0 0 Sep-87 5 O O 7 1 1 1 O O 4 6 1 O Ott-87 4 O O 1 0 11 O O O 8 7 O O Nov-87 6 1 1 17 10 O 2 1 11 5 O 2 Dec-87 8 O O 1 1 8 O 1 O 6 2 O 1 Jan-88 7 1 1 1 3 5 O O 1 8 2 O 0 Feb-88 7 O O 1 0 4 O O O 4 2 O O M a r - 8 8 9 1 1 9 4 0 2 0 5 3 0 2 Apr-88 9 2 2 1 5 4 O 2 2 7 2 O 1 May-88 13 O O 1 3 2 O 2 O 8 1 O O Jun-88 12 O O 1 2 9 O 2 O 9 8 O 1 Ju l - 8811 O O 8 8 O 1 O 2 5 O O