A General Multivariate Chain Ladder Model

Yanwei Zhang
CNA Insurance Company
[email protected]

Abstract: A general multivariate stochastic reserving model is formulated, which not only specifies contemporaneous correlations, but also allows structural connections among triangles. Its structure extends the existing multivariate chain ladder models in a natural way, and this extension proves to be advantageous in improving model adequacy and increasing model flexibility. It is general in the sense that it includes various models in the chain ladder framework as special cases. At the heart of this model is the seemingly unrelated regression technique, which is utilized to estimate parameters that reflect contemporaneous correlations. The use of this technique is essential to construct flexible models, and related statistical theories are applied to study properties of existing estimators. A numerical example is utilized to show the advantage of the proposed model in studying multiple triangles that are related both structurally and contemporaneously.

Keywords: Chain ladder; multivariate; seemingly unrelated regression


1 Introduction

Reserving with multiple triangles simultaneously is important for two reasons: structural connections may exist among triangles, where the development of one triangle depends on past information from other triangles, and joint development that takes into account the contemporaneous correlations among triangles could result in more efficient estimation. In the area where triangles are linked structurally, developing paid and incurred triangles is one example, where Separate Chain Ladders (SCL) result in unreasonably divergent predicted paid-to-incurred ratios, and the Munich Chain Ladder (MuCL) model by Quarg and Mack (2004) provides a good solution. The second area of joint development focuses on the contemporaneous correlations between triangles instead of structural connections, where several papers, Braun (2004), Prohl and Schmidt (2005), Kremer (2005), Schmidt (2006) and Merz and Wuthrich (2008a, 2008b), have made great contributions concerning first moment estimators as well as corresponding conditional mean square error (MSE) estimators. What characterizes this model is a diagonal development matrix, and it (along with the proposed estimators) is referred to as Multivariate Chain Ladder (MCL) in the following discussion, a name that stems from the title of the paper by Prohl and Schmidt (2005).

A General Multivariate Chain Ladder (GMCL) model is considered in this paper, which not only specifies contemporaneous correlations, but also allows structural connections among triangles. The structure of GMCL extends MCL in a natural way. However, this simple extension has a great advantage over MCL in that GMCL performs well when MCL is inadequate by including an intercept term, as suggested by Barnett and Zehnwirth (2000). Also, the full bivariate GMCL, identical to the Double Regression model first considered by Mack (2003), can be applied to develop paid and incurred triangles together where SCL is inappropriate. Hence, joint development with paid and incurred triangles from multiple lines is possible under GMCL. Moreover, the Backward Recursive Reserve Development (BRRD) approach described by Marker and Mohl (1980) can be viewed as a restricted bivariate model.

At the heart of GMCL is the employment of Seemingly Unrelated Regressions (SUR) to estimate parameters. It allows flexible model structures and takes into account the contemporaneous correlations. It will be shown that existing proposed estimators can find their equivalents in the SUR estimator family. After this relationship is established, related SUR theories are utilized to study the properties of the feasible estimators proposed by Merz and Wuthrich (2008b) and the necessary and sufficient condition for MCL to collapse to SCL (Prohl and Schmidt 2005).

We formulate the model and introduce SUR in Sections 2 and 3, and then build the connection with existing models in Sections 4 and 5. Lastly, a numerical example is employed to illustrate how the GMCL model overcomes difficulties in existing models.


2 Model Specification

In this section, the GMCL model that handles multiple correlated triangles is formulated. It is assumed that $N$ triangles of the same size are available, and $n \in \{1, \dots, N\}$ refers to the $n$th triangle, $i \in \{1, \dots, I\}$ the $i$th accident year and $k \in \{1, \dots, I\}$ the $k$th development year. Denote $\mathbf{Y}_{i,k} = (Y^{(1)}_{i,k}, \dots, Y^{(N)}_{i,k})'$ as an $N \times 1$ vector of cumulative losses at accident year $i$ and development year $k$, where $(n)$ refers to the $n$th triangle. Each cumulative loss $Y^{(n)}_{i,k}$ is explicitly assumed to be positive. As a convention, bold face is used for vectors and matrices and $'$ denotes the transpose of a vector or matrix. The data available for the $N$ loss triangles could be represented by Table 1. It is to be noted that our use of the triangular form does not play an important role in developing the theory since we do not give an explicit formula for our estimators, which will become clear in Section 3. However, we base the discussion on the triangular form for easy presentation.

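Table 1: $N$ loss triangles (rows: accident year $i$; columns: development year $k$)

$$
\begin{array}{c|ccccc}
i \backslash k & 1 & 2 & \cdots & I-1 & I \\
\hline
1 & \mathbf{Y}_{1,1} & \mathbf{Y}_{1,2} & \cdots & \mathbf{Y}_{1,I-1} & \mathbf{Y}_{1,I} \\
2 & \mathbf{Y}_{2,1} & \mathbf{Y}_{2,2} & \cdots & \mathbf{Y}_{2,I-1} & \\
\vdots & \vdots & \vdots & & & \\
i & \mathbf{Y}_{i,1} & \cdots & \mathbf{Y}_{i,I+1-i} & & \\
\vdots & \vdots & & & & \\
I-1 & \mathbf{Y}_{I-1,1} & \mathbf{Y}_{I-1,2} & & & \\
I & \mathbf{Y}_{I,1} & & & &
\end{array}
$$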

Denote $\mathcal{D} = \{\mathbf{Y}_{i,k};\ i + k \le I + 1,\ 1 \le i \le I,\ 1 \le k \le I\}$ as the set of all the observed losses, $\mathcal{D}_{\cdot,k} = \{\mathbf{Y}_{i,j};\ 1 \le i \le I,\ j \le k\}$ as the set of all the losses up to and including development year $k$, and $\mathcal{D}_{i,k} = \{\mathbf{Y}_{i,j};\ j \le k\}$ as the set of losses for accident year $i$ up to and including development year $k$. We denote $\mathbf{Y}^{(n)}_{\le,k} = (Y^{(n)}_{1,k}, \dots, Y^{(n)}_{I+1-k,k})'$ as an $(I+1-k) \times 1$ vector of the first $(I+1-k)$ losses at development year $k$ in the $n$th triangle, and $\mathbf{Y}^{(n)}_{<,k} = (Y^{(n)}_{1,k}, \dots, Y^{(n)}_{I-k,k})'$ as the first $(I-k)$ losses. Also we denote by $D(\mathbf{a})$ and $D(\mathbf{a})^b$ the $N \times N$ diagonal matrices with the $N$-dimensional vectors $\mathbf{a} = (a_1, \dots, a_N)'$ and $(a_1^b, \dots, a_N^b)'$ along the diagonal respectively. We use vec as the vectorization operator of a matrix such that the vectorization of an $m \times n$ matrix $\mathbf{A} = (\mathbf{a}_1, \dots, \mathbf{a}_n)$, where each column $\mathbf{a}_i$ is an $m \times 1$ vector, will be an $mn \times 1$ vector, that is, $\mathrm{vec}(\mathbf{A}) = (\mathbf{a}_1', \dots, \mathbf{a}_n')'$.

As a natural generalization of the MCL model, we consider the following as our baseline model in development period $k$, which refers to the development from development year $k$ to $k+1$:

$$
\mathbf{Y}_{i,k+1} = \mathbf{B}_k \cdot \mathbf{Y}_{i,k} + \boldsymbol{\varepsilon}_{i,k}, \tag{2.1}
$$

where

$$
\mathbf{B}_k =
\begin{pmatrix}
\beta_{11} & \cdots & \beta_{1N} \\
\vdots & \ddots & \vdots \\
\beta_{N1} & \cdots & \beta_{NN}
\end{pmatrix}
$$

is an $N \times N$ development matrix in development period $k$, and the $n$th row contains the development parameters for the $n$th triangle. It is to be noted that we omit the development period indicator $k$ for the $\beta$'s in $\mathbf{B}_k$ for simplicity since these components are not used directly in the recursive calculations in the following discussion. We refer to model (2.1) as the baseline model because it does not have an intercept term. We will defer the discussion of models with intercepts to Section 3.4 because one of the main goals of this paper is to establish the connection of GMCL with existing models which have only assumed the multiplicative relationship between successive evaluations, and working on a model with intercepts makes this relationship obscure. However, at this point, it is worth emphasizing that a model with intercepts is often preferred to (2.1), or at least one should test whether to include intercepts before adopting the structure in (2.1).

One can see that the non-diagonal development matrix $\mathbf{B}_k$ allows the development of one triangle in period $k$ to be directly dependent upon the loss information of the other triangles at development year $k$. One potential use of this feature is to regress paid and incurred losses against each other so that the predicted paid and incurred losses will be close to each other. However, the fully parameterized model has $N^2$ development parameters in each period. Direct application without parameter restrictions will rarely be practical, in that either parameter estimation will not be feasible or structural connections are not deemed necessary. Sometimes even under parameter restrictions, the parameters specified are not estimable, especially in tail development periods. One possible solution is to use data in trapezoid form to make sure there are enough observations in the tail. Alternatively, Mack (2003) suggests one "robustify" the tail parameters simply using SCL. We will discuss the justification of using SCL in Section 6. Despite the potential non-identifiability problem, we base all the discussions on the full model and the results will apply to sub-models with restrictions as well.
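To make the role of the off-diagonal entries concrete, the following is a minimal sketch of one development step under (2.1) for $N = 2$. All numbers, and the labels "paid" and "incurred", are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical development matrix B_k for N = 2 triangles (say paid, incurred).
# The off-diagonal entries let each triangle use the other's loss at year k.
B_k = np.array([[0.90, 0.25],
                [0.10, 0.95]])

# Cumulative losses Y_{i,k} of one accident year: (paid, incurred).
Y_ik = np.array([1000.0, 1500.0])

# Expected next evaluation under (2.1): E[Y_{i,k+1} | D_{i,k}] = B_k Y_{i,k}.
Y_next = B_k @ Y_ik

# A diagonal B_k removes the structural link and gives separate factors.
B_diag = np.diag([1.20, 1.05])
Y_next_scl = B_diag @ Y_ik

print(Y_next, Y_next_scl)
```

With a diagonal matrix each triangle is developed by its own factor only, which is the restricted structure studied in Section 4.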

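For model (2.1), the following assumptions are made:

$$
E(\boldsymbol{\varepsilon}_{i,k} \mid \mathcal{D}_{i,k}) = \mathbf{0}, \tag{2.2}
$$

$$
\mathrm{cov}(\boldsymbol{\varepsilon}_{i,k} \mid \mathcal{D}_{i,k}) = D(\mathbf{Y}_{i,k})^{1/2} \cdot \boldsymbol{\Sigma}_k \cdot D(\mathbf{Y}_{i,k})^{1/2}, \tag{2.3}
$$

$$
\text{losses from different accident years are independent}, \tag{2.4}
$$

$$
\boldsymbol{\varepsilon}_{i,k} \text{ are symmetrically distributed}, \tag{2.5}
$$

where

$$
\boldsymbol{\Sigma}_k =
\begin{pmatrix}
\sigma_{11} & \cdots & \sigma_{1N} \\
\vdots & \ddots & \vdots \\
\sigma_{N1} & \cdots & \sigma_{NN}
\end{pmatrix}
$$

is a symmetric positive definite $N \times N$ matrix. Again, for each $\sigma$, we leave out the development period indicator $k$ for simplicity.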

Remark 1.

• In (2.3), one could specify $D(\mathbf{Y}_{i,k})^{\delta/2}\, \boldsymbol{\Sigma}_k\, D(\mathbf{Y}_{i,k})^{\delta/2}$ as a more general covariance structure, where $\delta$ determines the weight in the estimation. We base the discussion here on $\delta = 1$, but the results are easily modified for a different $\delta$.

• (2.5) means that the continuous error term vector $\boldsymbol{\varepsilon}_{i,k}$ has a symmetric probability density, that is, $p(\boldsymbol{\varepsilon}_{i,k}) = p(-\boldsymbol{\varepsilon}_{i,k})$. This is often an appropriate approximation for aggregate triangles, but should be used with caution for highly skewed data.

3 The Seemingly Unrelated Regressions

A further look at the structure specified in (2.1) reveals that none of the dependent variables $Y^{(n)}_{i,k+1}$ appears as an explanatory variable in the other equations, and the individual equations for each triangle are linked statistically through the non-zero correlation between the error terms, which suggests the employment of Seemingly Unrelated Regressions (SUR) for parameter estimation. In this section, we derive the parameter estimation under SUR. We use the development from $k$ to $k+1$ as an example for illustration, and estimation for other periods follows the same procedure.

3.1 OLS, GLS and FGLS Estimators

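It is easy to verify that model (2.1) could be written as the following system of equations:

$$
\begin{pmatrix}
\mathbf{Y}^{(1)}_{\le,k+1} \\
\mathbf{Y}^{(2)}_{\le,k+1} \\
\vdots \\
\mathbf{Y}^{(N)}_{\le,k+1}
\end{pmatrix}
=
\begin{pmatrix}
\mathbf{X}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{X}_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{X}_N
\end{pmatrix}
\begin{pmatrix}
\boldsymbol{\beta}_1 \\
\boldsymbol{\beta}_2 \\
\vdots \\
\boldsymbol{\beta}_N
\end{pmatrix}
+
\begin{pmatrix}
\boldsymbol{\varepsilon}_1 \\
\boldsymbol{\varepsilon}_2 \\
\vdots \\
\boldsymbol{\varepsilon}_N
\end{pmatrix}, \tag{3.1}
$$

where for $n = 1, 2, \dots, N$:

$\mathbf{Y}^{(n)}_{\le,k+1}$ is an $(I-k) \times 1$ vector of all the observed losses at development year $k+1$ from the $n$th triangle;

$\mathbf{X}_n = (\mathbf{Y}^{(1)}_{<,k}, \dots, \mathbf{Y}^{(N)}_{<,k})$ is an $(I-k) \times N$ matrix of the first $I-k$ observations at development year $k$ from each triangle, and satisfies $\mathbf{X}_1 = \mathbf{X}_2 = \cdots = \mathbf{X}_N$;

$\boldsymbol{\beta}_n = (\beta_{n1}, \dots, \beta_{nN})'$ is an $N \times 1$ vector of the development parameters in the $n$th equation;

$\boldsymbol{\varepsilon}_n$ is an $(I-k) \times 1$ vector of error terms in the $n$th equation.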

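Model (3.1) can be further written as

$$
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{3.2}
$$

where:

$\mathbf{Y} = \mathrm{vec}(\mathbf{Y}^{(1)}_{\le,k+1}, \dots, \mathbf{Y}^{(N)}_{\le,k+1})$ is an $N(I-k) \times 1$ vector of response variables;

$\mathbf{X} = \mathbf{I}_N \otimes \mathbf{X}_n$ is an $N(I-k) \times N^2$ block diagonal matrix with $\mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_N$ along the diagonal, where $\mathbf{I}_N$ is the $N \times N$ identity matrix and $\otimes$ is the Kronecker product operator of matrices;

$\boldsymbol{\beta} = \mathrm{vec}(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \dots, \boldsymbol{\beta}_N)$ is an $N^2 \times 1$ vector of development parameters;

$\boldsymbol{\varepsilon} = \mathrm{vec}(\boldsymbol{\varepsilon}_1, \boldsymbol{\varepsilon}_2, \dots, \boldsymbol{\varepsilon}_N)$ is an $N(I-k) \times 1$ vector of error terms.

Denote $\mathbf{W} = \mathrm{vec}(\mathbf{Y}^{(1)}_{<,k}, \mathbf{Y}^{(2)}_{<,k}, \dots, \mathbf{Y}^{(N)}_{<,k})$ as an $N(I-k) \times 1$ vector of the first $(I-k)$ observed losses at development year $k$. Then from (2.3) and (2.4), we know that

$$
\mathrm{cov}(\boldsymbol{\varepsilon}) = E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') = D(\mathbf{W})^{1/2} (\boldsymbol{\Sigma}_k \otimes \mathbf{I}) D(\mathbf{W})^{1/2}, \tag{3.3}
$$

where $\mathbf{I}$ is the identity matrix of order $(I-k) \times (I-k)$, and $D$ is the diagonal operator defined in Section 2. Pre-multiplying both sides of (3.2) by $D(\mathbf{W})^{-1/2}$, one gets the following:

$$
D(\mathbf{W})^{-1/2}\mathbf{Y} = D(\mathbf{W})^{-1/2}\mathbf{X}\boldsymbol{\beta} + D(\mathbf{W})^{-1/2}\boldsymbol{\varepsilon},
$$

$$
\mathbf{Y}^* = \mathbf{X}^*\boldsymbol{\beta} + \boldsymbol{\varepsilon}^*, \tag{3.4}
$$

where $\mathbf{Y}^* = D(\mathbf{W})^{-1/2}\mathbf{Y}$, $\mathbf{X}^* = D(\mathbf{W})^{-1/2}\mathbf{X}$, $\boldsymbol{\varepsilon}^* = D(\mathbf{W})^{-1/2}\boldsymbol{\varepsilon}$, and using (3.3) we have

$$
\mathrm{cov}(\boldsymbol{\varepsilon}^*) = D(\mathbf{W})^{-1/2}\, \mathrm{cov}(\boldsymbol{\varepsilon})\, D(\mathbf{W})^{-1/2} = \boldsymbol{\Sigma}_k \otimes \mathbf{I}. \tag{3.5}
$$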

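We see that the variance-covariance structure in (3.5) is consistent with the typical SUR assumption introduced by Zellner (1962). It is to be emphasized that, for this reason, it is model (3.4) that we will use to derive parameter estimators and ascertain other properties, not the original model (3.2). We use the asterisk to distinguish the transformed variables from the original. Applying Aitken's Generalized Least Squares (GLS) to (3.4), we can get the best linear unbiased estimator (BLUE) of $\boldsymbol{\beta}$ as

$$
\hat{\boldsymbol{\beta}}_G = \left(\mathbf{X}^{*\prime} (\boldsymbol{\Sigma}_k \otimes \mathbf{I})^{-1} \mathbf{X}^*\right)^{-1} \mathbf{X}^{*\prime} (\boldsymbol{\Sigma}_k \otimes \mathbf{I})^{-1} \mathbf{Y}^*. \tag{3.6}
$$

And the variance-covariance matrix of the estimator $\hat{\boldsymbol{\beta}}_G$ is easily shown to be

$$
V(\hat{\boldsymbol{\beta}}_G) = \left(\mathbf{X}^{*\prime} (\boldsymbol{\Sigma}_k \otimes \mathbf{I})^{-1} \mathbf{X}^*\right)^{-1}. \tag{3.7}
$$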
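The transformation (3.4) and the GLS formulas (3.6) and (3.7) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name `gls_sur`, the data layout (rows are accident years, columns are triangles) and all numbers are hypothetical, and $\boldsymbol{\Sigma}_k$ is treated as known.

```python
import numpy as np

def gls_sur(Y_prev, Y_next, Sigma):
    """GLS for one development period of the full model (2.1).

    Y_prev, Y_next: (I-k) x N arrays of losses at years k and k+1.
    Sigma: known N x N matrix Sigma_k. Returns (B_hat, V) as in (3.6), (3.7).
    """
    T, N = Y_prev.shape
    X = np.kron(np.eye(N), Y_prev)            # block diagonal, N*T x N^2
    W = Y_prev.flatten(order="F")             # vec: stack triangle columns
    scale = 1.0 / np.sqrt(W)                  # diagonal of D(W)^(-1/2)
    X_star = X * scale[:, None]
    Y_star = Y_next.flatten(order="F") * scale
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
    A = X_star.T @ Omega_inv @ X_star
    beta = np.linalg.solve(A, X_star.T @ Omega_inv @ Y_star)
    V = np.linalg.inv(A)
    # beta stacks beta_1, ..., beta_N, i.e. the rows of B_k.
    return beta.reshape(N, N), V

rng = np.random.default_rng(0)
T, N = 6, 2
Y_prev = rng.uniform(500.0, 2000.0, size=(T, N))
B_true = np.array([[1.10, 0.20],
                   [0.05, 1.15]])
Y_next = Y_prev @ B_true.T                    # noise-free on purpose
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
B_hat, V = gls_sur(Y_prev, Y_next, Sigma)
print(B_hat)
```

The data are generated without noise so that the estimator should reproduce `B_true`, which is a convenient sanity check on the ordering of the stacked vectors.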


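It is to be noted that the estimators in (3.6) and (3.7) are based on a known variance-covariance matrix $\boldsymbol{\Sigma}_k$, but this is generally not the case. When the variance-covariance matrix is not known, an estimator of it can be used to make (3.6) and (3.7) operational. Using the estimated variance-covariance matrix $\hat{\boldsymbol{\Sigma}}_k$, one gets a Feasible Generalized Least Squares (FGLS) estimator of $\boldsymbol{\beta}$ as:

$$
\hat{\boldsymbol{\beta}}_{FG} = \left(\mathbf{X}^{*\prime} (\hat{\boldsymbol{\Sigma}}_k \otimes \mathbf{I})^{-1} \mathbf{X}^*\right)^{-1} \mathbf{X}^{*\prime} (\hat{\boldsymbol{\Sigma}}_k \otimes \mathbf{I})^{-1} \mathbf{Y}^*. \tag{3.8}
$$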

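There are many possible choices for the estimator of $\boldsymbol{\Sigma}_k$, among which one commonly used consistent yet not unbiased estimator is:

$$
\hat{\boldsymbol{\Sigma}}_o = \frac{1}{I-k} (\hat{\boldsymbol{\varepsilon}}^*_1, \hat{\boldsymbol{\varepsilon}}^*_2, \dots, \hat{\boldsymbol{\varepsilon}}^*_N)' (\hat{\boldsymbol{\varepsilon}}^*_1, \hat{\boldsymbol{\varepsilon}}^*_2, \dots, \hat{\boldsymbol{\varepsilon}}^*_N), \tag{3.9}
$$

where $\hat{\boldsymbol{\varepsilon}}^*_n$ is an $(I-k) \times 1$ vector of residuals obtained by applying Ordinary Least Squares (OLS) to the $n$th single-equation regression in (3.4). As another choice, an unbiased estimator is proposed by Zellner and Huang (1962) to be $\hat{\boldsymbol{\Sigma}}_u$, whose elements satisfy:

$$
\hat{\sigma}_{nm} =
\begin{cases}
\dfrac{\hat{\boldsymbol{\varepsilon}}^{*\prime}_n \hat{\boldsymbol{\varepsilon}}^*_n}{I - k - K_n} & \forall\, n = m, \\[2ex]
\dfrac{\hat{\boldsymbol{\varepsilon}}^{*\prime}_n \hat{\boldsymbol{\varepsilon}}^*_m}{I - k - K_n - K_m + \mathrm{tr}\left[(\mathbf{X}^{*\prime}_n \mathbf{X}^*_n)^{-1} \mathbf{X}^{*\prime}_n \mathbf{X}^*_m (\mathbf{X}^{*\prime}_m \mathbf{X}^*_m)^{-1} \mathbf{X}^{*\prime}_m \mathbf{X}^*_n\right]} & \forall\, n \ne m,
\end{cases} \tag{3.10}
$$

where $K_n$ is the number of regressors in the $n$th equation, and tr is the trace of a matrix. Another interesting estimator that also corrects the degrees of freedom is $\hat{\boldsymbol{\Sigma}}_g$, whose elements satisfy:

$$
\hat{\sigma}_{nm} = \frac{\hat{\boldsymbol{\varepsilon}}^{*\prime}_n \hat{\boldsymbol{\varepsilon}}^*_m}{\left[(I - k - K_n)(I - k - K_m)\right]^{1/2}}. \tag{3.11}
$$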

This estimator is unbiased only if $n$ equals $m$. Up to this point, we see that both the GLS estimator $\hat{\boldsymbol{\beta}}_G$ and its feasible estimator $\hat{\boldsymbol{\beta}}_{FG}$ take into account the correlation between the error terms in the estimation. However, if this correlation is ignored, then we will obtain the OLS estimator as:

$$
\hat{\boldsymbol{\beta}}_O = (\mathbf{X}^{*\prime} \mathbf{X}^*)^{-1} \mathbf{X}^{*\prime} \mathbf{Y}^*. \tag{3.12}
$$
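As an illustration of how the three covariance estimators (3.9) to (3.11) differ, the sketch below computes them from single-equation OLS residuals of the transformed model (3.4). The function name `sur_cov_estimators`, the simulated data and the data layout (rows are accident years, columns are triangles) are assumptions made for this example; the full model is assumed, so that $K_n = N$ for every equation.

```python
import numpy as np

def sur_cov_estimators(Y_prev, Y_next):
    """Return (Sigma_o, Sigma_u, Sigma_g) from OLS residuals of (3.4)."""
    T, N = Y_prev.shape
    K = N                                     # regressors per equation
    # Transformed regressors of equation n: rows scaled by 1/sqrt(Y^(n)_{i,k}).
    X_star = [Y_prev / np.sqrt(Y_prev[:, [n]]) for n in range(N)]
    resid = np.empty((T, N))
    for n in range(N):
        y_star = Y_next[:, n] / np.sqrt(Y_prev[:, n])
        coef, *_ = np.linalg.lstsq(X_star[n], y_star, rcond=None)
        resid[:, n] = y_star - X_star[n] @ coef
    S = resid.T @ resid                       # cross products of residuals
    Sigma_o = S / T                           # (3.9)
    Sigma_g = S / (T - K)                     # (3.11) with K_n = K_m = K
    Sigma_u = np.empty((N, N))                # (3.10)
    for n in range(N):
        Pn = np.linalg.solve(X_star[n].T @ X_star[n], X_star[n].T)
        for m in range(N):
            Pm = np.linalg.solve(X_star[m].T @ X_star[m], X_star[m].T)
            # For n == m the trace equals K, giving the divisor T - K.
            tr = np.trace(Pn @ X_star[m] @ Pm @ X_star[n])
            Sigma_u[n, m] = S[n, m] / (T - 2 * K + tr)
    return Sigma_o, Sigma_u, Sigma_g

rng = np.random.default_rng(1)
T, N = 8, 2
Y_prev = rng.uniform(500.0, 2000.0, size=(T, N))
B_true = np.array([[1.10, 0.20], [0.05, 1.15]])
z = rng.multivariate_normal(np.zeros(N), [[1.0, 0.5], [0.5, 2.0]], size=T)
Y_next = Y_prev @ B_true.T + np.sqrt(Y_prev) * z   # errors as in (2.3)
Sigma_o, Sigma_u, Sigma_g = sur_cov_estimators(Y_prev, Y_next)
print(Sigma_o, Sigma_u, Sigma_g, sep="\n")
```

The diagonals of `Sigma_u` and `Sigma_g` coincide; the estimators differ only in how the off-diagonal divisor is chosen.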

3.2 Iterative Estimators (IFGLS)

We see that when $\boldsymbol{\Sigma}_k$ is unknown, an estimator of it should be used in the FGLS estimation. The use of $\hat{\boldsymbol{\Sigma}}_k$ will increase the variability of $\hat{\boldsymbol{\beta}}_{FG}$, and one may ask whether the performance of $\hat{\boldsymbol{\beta}}_{FG}$ can be improved and a more efficient estimator can be constructed. One approach is to iterate with respect to the choice of an observable estimator $\hat{\boldsymbol{\Sigma}}_k$. In fact the estimation of $\hat{\boldsymbol{\beta}}_{FG}$ stated in the above section can be viewed as a two-stage Aitken's estimation as described in Zellner (1962). We first replace $\boldsymbol{\Sigma}_k$ by $\mathbf{I}$ so that we obtain the OLS estimator of $\boldsymbol{\beta}$. And then we estimate $\boldsymbol{\Sigma}_k$ from the residuals based on the OLS estimate of the coefficient vector $\boldsymbol{\beta}$. This new estimator of $\boldsymbol{\Sigma}_k$ is then used in (3.6) to obtain the FGLS estimator in (3.8). However, we can continue this procedure and go beyond two steps. Likewise, the residuals based on the FGLS estimator can be used to develop another estimate of $\boldsymbol{\Sigma}_k$, and this, when used in (3.6), provides yet another estimate of $\boldsymbol{\beta}$. Repeating this process leads to an Iterative Feasible Generalized Least Squares (IFGLS) estimator of $\boldsymbol{\beta}$.

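Denote the estimator at the $l$th round by $\hat{\boldsymbol{\beta}}^{(l)}_I$. It is easy to see that

$$
\hat{\boldsymbol{\beta}}^{(l)}_I = \left(\mathbf{X}^{*\prime} (\hat{\boldsymbol{\Sigma}}^{(l)}_k \otimes \mathbf{I})^{-1} \mathbf{X}^*\right)^{-1} \mathbf{X}^{*\prime} (\hat{\boldsymbol{\Sigma}}^{(l)}_k \otimes \mathbf{I})^{-1} \mathbf{Y}^*, \tag{3.13}
$$

where $\hat{\boldsymbol{\Sigma}}^{(l)}_k$ is a consistent estimator of $\boldsymbol{\Sigma}_k$ constructed from the residuals based on $\hat{\boldsymbol{\beta}}^{(l-1)}_I$. Particularly, we denote the OLS estimator as $\hat{\boldsymbol{\beta}}^{(0)}_I$, estimated based on $\hat{\boldsymbol{\Sigma}}^{(0)}_k = \mathbf{I}$. And hence, the FGLS estimator will be $\hat{\boldsymbol{\beta}}^{(1)}_I$ based on the estimators in (3.9) or (3.10) or (3.11).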
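The iteration in (3.13) can be sketched as a short loop. This is a minimal illustration, not the author's implementation: the function name `ifgls`, the stopping rule, the simulated data and the choice of the estimator (3.9) in every round are assumptions made for the example.

```python
import numpy as np

def ifgls(Y_prev, Y_next, n_iter=20, tol=1e-10):
    """Iterate (3.13) for the full model, starting from OLS (Sigma = I)."""
    T, N = Y_prev.shape
    scale = 1.0 / np.sqrt(Y_prev.flatten(order="F"))   # D(W)^(-1/2)
    X_star = np.kron(np.eye(N), Y_prev) * scale[:, None]
    Y_star = Y_next.flatten(order="F") * scale
    Sigma = np.eye(N)                                  # round 0 gives OLS
    beta = None
    for _ in range(n_iter + 1):
        Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
        A = X_star.T @ Omega_inv @ X_star
        beta_new = np.linalg.solve(A, X_star.T @ Omega_inv @ Y_star)
        converged = beta is not None and np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
        # Residuals by equation (columns), then the estimator (3.9).
        resid = (Y_star - X_star @ beta).reshape(N, T).T
        Sigma = resid.T @ resid / T
    return beta.reshape(N, N), Sigma

rng = np.random.default_rng(2)
T, N = 12, 2
Y_prev = rng.uniform(500.0, 2000.0, size=(T, N))
B_true = np.array([[1.10, 0.20], [0.05, 1.15]])
z = rng.multivariate_normal(np.zeros(N), [[1.0, 0.5], [0.5, 2.0]], size=T)
Y_next = Y_prev @ B_true.T + np.sqrt(Y_prev) * z       # errors as in (2.3)
B_hat, Sigma_hat = ifgls(Y_prev, Y_next)
print(B_hat)
```

Stopping after the first update of `Sigma` would give the two-stage FGLS estimator $\hat{\boldsymbol{\beta}}^{(1)}_I$.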

3.3 Some Results

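Using equation (2.1) recursively, one can obtain the ultimate values of the losses for accident year $i$ in terms of the latest observed losses $\mathbf{Y}_{i,I+1-i}$ as:

$$
\mathbf{Y}_{i,I} = \left(\prod_{k=I-1}^{I+1-i} \mathbf{B}_k\right) \mathbf{Y}_{i,I+1-i} + \sum_{k=I+1-i}^{I-1} \left(\prod_{l=I-1}^{k+1} \mathbf{B}_l\right) \boldsymbol{\varepsilon}_{i,k}, \tag{3.14}
$$

where $\prod_{k=I-1}^{I+1-i} \mathbf{B}_k$ represents the strict product of $\mathbf{B}_k$ from lower limit to upper limit, and $\prod_{l=I-1}^{I} \mathbf{B}_l$ is defined to be the identity matrix $\mathbf{I}$ to simplify the expression. Using (3.14) and assumptions (2.2) and (2.4), it can be shown that the expectation of the ultimate losses for accident year $i$ conditional on the observed triangles $\mathcal{D}$ is:

$$
E(\mathbf{Y}_{i,I} \mid \mathcal{D}) = E(\mathbf{Y}_{i,I} \mid \mathcal{D}_{\cdot,I+1-i}) = E(\mathbf{Y}_{i,I} \mid \mathcal{D}_{i,I+1-i}) = \left(\prod_{k=I-1}^{I+1-i} \mathbf{B}_k\right) \mathbf{Y}_{i,I+1-i}. \tag{3.15}
$$

Now, we can estimate the expected ultimate loss using the estimated development matrix $\hat{\mathbf{B}}_k$ as:

$$
\hat{\mathbf{Y}}_{i,I} = \left(\prod_{k=I-1}^{I+1-i} \hat{\mathbf{B}}_k\right) \mathbf{Y}_{i,I+1-i}. \tag{3.16}
$$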
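The ordering of the matrix product in (3.16) matters when the development matrices are not diagonal. The sketch below applies the estimated matrices to the latest diagonal, earliest period first; the function name `project_ultimate`, the dictionary layout and all numbers are hypothetical.

```python
import numpy as np

def project_ultimate(latest, B_hats, I):
    """Apply (3.16): latest[i-1] holds Y_{i,I+1-i}; B_hats[k] estimates B_k."""
    ultimates = {}
    for i in range(1, I + 1):
        y = np.asarray(latest[i - 1], dtype=float)
        # Periods k = I+1-i, ..., I-1; B_{I-1} is applied last, so the overall
        # factor is B_{I-1} ... B_{I+1-i}, as in the product of (3.16).
        for k in range(I + 1 - i, I):
            y = B_hats[k] @ y
        ultimates[i] = y
    return ultimates

I = 3
B_hats = {1: np.array([[1.5, 0.1], [0.0, 1.4]]),
          2: np.array([[1.1, 0.0], [0.05, 1.05]])}
latest = [np.array([300.0, 400.0]),    # accident year 1, already at year I
          np.array([200.0, 300.0]),    # accident year 2, at year 2
          np.array([100.0, 150.0])]    # accident year 3, at year 1
ult = project_ultimate(latest, B_hats, I)
print(ult)
```

Accident year 1 is fully developed, so its ultimate equals its latest observation, matching the convention that the empty product is the identity matrix.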

We now summarize some properties with regard to this estimation in the following lemma:


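Lemma 1. Under assumptions (2.2) to (2.5), the following hold:

(i) $E(\hat{\mathbf{B}}_k \mid \mathcal{D}_{\cdot,k}) = \mathbf{B}_k$ and $E(\hat{\mathbf{B}}_k) = \mathbf{B}_k$;

(ii) Estimators of the development matrix from different development periods are uncorrelated, that is, $E(\hat{\mathbf{B}}_j \hat{\mathbf{B}}_k) = E(\hat{\mathbf{B}}_j) E(\hat{\mathbf{B}}_k)\ \forall\, j \ne k$;

(iii) Conditional on $\mathcal{D}_{\cdot,I+1-i}$, the estimator $\hat{\mathbf{Y}}_{i,I}$ is an unbiased estimator of $E(\mathbf{Y}_{i,I} \mid \mathcal{D})$, that is, $E(\hat{\mathbf{Y}}_{i,I} \mid \mathcal{D}_{\cdot,I+1-i}) = E(\mathbf{Y}_{i,I} \mid \mathcal{D})$;

(iv) The estimator $\hat{\mathbf{Y}}_{i,I}$ is an unbiased estimator of $E(\mathbf{Y}_{i,I})$, that is, $E(\hat{\mathbf{Y}}_{i,I}) = E(\mathbf{Y}_{i,I})$.

Proof. (i): $E(\hat{\mathbf{B}}_k \mid \mathcal{D}_{\cdot,k}) = \mathbf{B}_k$ follows directly from the results given by Kakwani (1967) under assumption (2.5). Hence, $E(\hat{\mathbf{B}}_k) = E(E(\hat{\mathbf{B}}_k \mid \mathcal{D}_{\cdot,k})) = E(\mathbf{B}_k) = \mathbf{B}_k$.

(ii): Without loss of generality, assume $j < k$, then:

$$
\begin{aligned}
E(\hat{\mathbf{B}}_j \hat{\mathbf{B}}_k) &= E(E(\hat{\mathbf{B}}_j \hat{\mathbf{B}}_k \mid \mathcal{D}_{\cdot,k})) \\
&= E(\hat{\mathbf{B}}_j E(\hat{\mathbf{B}}_k \mid \mathcal{D}_{\cdot,k})) \\
&= E(\hat{\mathbf{B}}_j) \mathbf{B}_k \\
&= E(\hat{\mathbf{B}}_j) E(\hat{\mathbf{B}}_k),
\end{aligned}
$$

where the last step uses the unconditional result in (i).

(iii): Given $\mathcal{D}_{\cdot,I+1-i}$, $\mathbf{Y}_{i,I+1-i}$ is observed, and hence using the results from (i), (ii) and (3.15), we have:

$$
\begin{aligned}
E(\hat{\mathbf{Y}}_{i,I} \mid \mathcal{D}_{\cdot,I+1-i}) &= E\left(\left(\prod_{k=I-1}^{I+1-i} \hat{\mathbf{B}}_k\right) \mathbf{Y}_{i,I+1-i} \,\Big|\, \mathcal{D}_{\cdot,I+1-i}\right) \\
&= E\left(\prod_{k=I-1}^{I+1-i} \hat{\mathbf{B}}_k\right) \mathbf{Y}_{i,I+1-i} \\
&= \left(\prod_{k=I-1}^{I+1-i} E(\hat{\mathbf{B}}_k)\right) \mathbf{Y}_{i,I+1-i} \\
&= \left(\prod_{k=I-1}^{I+1-i} \mathbf{B}_k\right) \mathbf{Y}_{i,I+1-i} \\
&= E(\mathbf{Y}_{i,I} \mid \mathcal{D}).
\end{aligned}
$$

(iv): From the result in (iii) and (3.15), we have

$$
E(\hat{\mathbf{Y}}_{i,I} \mid \mathcal{D}_{\cdot,I+1-i}) = E(\mathbf{Y}_{i,I} \mid \mathcal{D}) = E(\mathbf{Y}_{i,I} \mid \mathcal{D}_{\cdot,I+1-i}).
$$

Taking expectation on both sides, we have

$$
E(\hat{\mathbf{Y}}_{i,I}) = E(\mathbf{Y}_{i,I}).
$$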

Remark 2.

• (i) in Lemma 1 does not depend on the choice of the variance-covariance estimators $\hat{\boldsymbol{\Sigma}}_o$, $\hat{\boldsymbol{\Sigma}}_u$ or $\hat{\boldsymbol{\Sigma}}_g$ since they are all even functions of the error term vector $\boldsymbol{\varepsilon}^* = \mathrm{vec}(\boldsymbol{\varepsilon}^*_1, \dots, \boldsymbol{\varepsilon}^*_N)$ under (2.5) and all the corresponding development matrix estimators will be unbiased. See Kakwani (1967).

• Results similar to Lemma 1 could also be found in Lemma 3.5 (a-d) in Merz and Wuthrich (2008b). However, their lemma is actually in terms of the GLS estimator where the variance-covariance matrix $\boldsymbol{\Sigma}_k$ is assumed to be known. Merz and Wuthrich (2008b) did not ascertain the properties of their proposed feasible estimator with regard to first moment estimation, which uses the unbiased estimator $\hat{\boldsymbol{\Sigma}}_u$. However, we see from the above that assumption (2.5) is also needed for their Lemma 3.5 (a-d) to apply to the feasible estimators.

3.4 Model with Intercepts

Barnett and Zehnwirth (2000) pointed out that the chain ladder model is often not adequate since it tends to overestimate large values and underestimate small values, and hence the residual plot has a downward trend. As we will see in Section 6, this is also true for MCL since the structure of MCL is almost identical to SCL, and the only difference is that MCL takes into account the variance-covariance structure in the estimation of development parameters. The cause of the failure of SCL and MCL in the residual plot is that both exclude the intercept term. Although it is not clear how an intercept can be added using the approach given by Prohl and Schmidt (2005) and Merz and Wuthrich (2008a, 2008b), this brings no difficulty under SUR. Let $\mathbf{A}_k = (\beta_{10}, \dots, \beta_{N0})'$ be an $N \times 1$ vector of intercepts, where $\beta_{n0}$ is the intercept for the $n$th triangle. Again, we omit the development period indicator $k$ for simplicity. Now we can extend (2.1) as

$$
\mathbf{Y}_{i,k+1} = \mathbf{A}_k + \mathbf{B}_k \cdot \mathbf{Y}_{i,k} + \boldsymbol{\varepsilon}_{i,k}, \tag{3.17}
$$

where assumptions (2.2)-(2.4) remain the same. We will refer to (3.17) as the model with intercepts, and we will always refer to $\mathbf{B}_k$ as the development matrix. The estimation of model (3.17) is straightforward under the SUR technique stated in Section 3.1. We only need to change the definition of $\mathbf{X}_n$ and $\boldsymbol{\beta}_n$ in equation (3.1) to reflect the inclusion of intercepts. To be specific, we now define $\mathbf{X}_n = (\mathbf{1}, \mathbf{Y}^{(1)}_{<,k}, \dots, \mathbf{Y}^{(N)}_{<,k})$ as an $(I-k) \times (N+1)$ matrix, where $\mathbf{1}$ is a column of 1's, and $\boldsymbol{\beta}_n = (\beta_{n0}, \beta_{n1}, \dots, \beta_{nN})'$ as an $(N+1) \times 1$ vector of parameters to be estimated for the $n$th triangle. Then we proceed similarly as in Section 3.1 to obtain our estimators of $\boldsymbol{\beta}$ (now including both $\mathbf{A}_k$ and $\mathbf{B}_k$), $\boldsymbol{\Sigma}_k$, and $V(\hat{\boldsymbol{\beta}})$.


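In combining the estimated $\boldsymbol{\beta}$'s from each period to obtain the expected ultimate losses, we use the following augmented procedure in order to keep the multiplicative structure as in (3.16) so that the inclusion of intercepts has no effect on our previous results. Moreover, this approach proved to be very useful in deriving the mean square error estimators in a separate paper. We re-write (3.17) as

$$
\begin{pmatrix} 1 \\ \mathbf{Y}_{i,k+1} \end{pmatrix}
=
\begin{pmatrix} 1 & \mathbf{0} \\ \mathbf{A}_k & \mathbf{B}_k \end{pmatrix}
\begin{pmatrix} 1 \\ \mathbf{Y}_{i,k} \end{pmatrix}
+
\begin{pmatrix} 0 \\ \boldsymbol{\varepsilon}_{i,k} \end{pmatrix}, \tag{3.18}
$$

where we just add a constant equality as the first row. Now denote

$$
\mathbf{Z}_{i,k} = \begin{pmatrix} 1 \\ \mathbf{Y}_{i,k} \end{pmatrix}, \quad
\mathbf{E}_k = \begin{pmatrix} 1 & \mathbf{0} \\ \mathbf{A}_k & \mathbf{B}_k \end{pmatrix}, \quad
\mathbf{e}_{i,k} = \begin{pmatrix} 0 \\ \boldsymbol{\varepsilon}_{i,k} \end{pmatrix},
$$

and (3.18) becomes

$$
\mathbf{Z}_{i,k+1} = \mathbf{E}_k \cdot \mathbf{Z}_{i,k} + \mathbf{e}_{i,k}. \tag{3.19}
$$

We see that we are able to make (3.17) a multiplicative model. Now it is easy to see that (3.14) and (3.15) also hold if we replace $\mathbf{Y}_{i,k}$, $\mathbf{B}_k$ and $\boldsymbol{\varepsilon}_{i,k}$ with $\mathbf{Z}_{i,k}$, $\mathbf{E}_k$ and $\mathbf{e}_{i,k}$ respectively since $E(\mathbf{e}_{i,k} \mid \mathcal{D}_{i,k}) = \mathbf{0}$ according to (2.2). We then obtain the estimator $\hat{\mathbf{Z}}_{i,I} = \left(\prod_{k=I-1}^{I+1-i} \hat{\mathbf{E}}_k\right) \mathbf{Z}_{i,I+1-i}$. Removing the constant 1 in $\hat{\mathbf{Z}}_{i,I}$ will result in the estimator $\hat{\mathbf{Y}}_{i,I}$. Our augmented approach does not have any impact on Lemma 1 since we just added non-stochastic terms. Specifically, items (i) and (ii) also hold by replacing $\mathbf{B}_k$ with $\mathbf{E}_k$ and $\hat{\mathbf{B}}_k$ with $\hat{\mathbf{E}}_k$ since $(\hat{\mathbf{A}}_k, \hat{\mathbf{B}}_k)$ is unbiased. Hence items (iii) and (iv) follow immediately.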
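The augmentation in (3.18) and (3.19) can be checked numerically: developing with intercepts step by step should agree with multiplying the augmented matrices. The helper name `augment` and all numbers below are hypothetical.

```python
import numpy as np

def augment(A_k, B_k):
    """Build E_k = [[1, 0], [A_k, B_k]] from an intercept vector and B_k."""
    N = len(A_k)
    E_k = np.zeros((N + 1, N + 1))
    E_k[0, 0] = 1.0          # constant equality 1 = 1 in the first row
    E_k[1:, 0] = A_k
    E_k[1:, 1:] = B_k
    return E_k

A1, B1 = np.array([50.0, 20.0]), np.array([[1.2, 0.0], [0.1, 1.1]])
A2, B2 = np.array([10.0, 5.0]), np.array([[1.05, 0.0], [0.0, 1.02]])
Y = np.array([100.0, 200.0])

# Direct recursion with intercepts, as in (3.17) without the error terms.
direct = A2 + B2 @ (A1 + B1 @ Y)

# Purely multiplicative recursion on Z = (1, Y')', as in (3.19).
Z = np.concatenate(([1.0], Y))
Z_ult = augment(A2, B2) @ augment(A1, B1) @ Z
print(direct, Z_ult)
```

Dropping the leading 1 from `Z_ult` recovers the same projected losses as the direct recursion.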

Although models with intercepts are preferred, we will focus on the baseline model (2.1) in Sections 3 and 4 to study the relationship of GMCL and SUR estimators with existing models and re-investigate some established theories under SUR. We will illustrate the use of (3.17) in the example section.

4 Multivariate Chain Ladder

In this section, we focus on a restricted version of model (2.1) where the development matrix $\mathbf{B}_k$ is restricted to be diagonal, that is,

$$
\mathbf{B}_k = D(\beta_{11}, \beta_{22}, \dots, \beta_{NN}). \tag{4.1}
$$

Under this restriction, the structure of model (2.1) coincides with the Multivariate Chain Ladder model (MCL) proposed by Prohl and Schmidt (2005). In the following, we will show that the estimators proposed by Prohl and Schmidt (2005) and Merz and Wuthrich (2008b) could find their equivalents in the SUR framework. We will also study the positive semi-definiteness of the unbiased variance-covariance estimator given by Merz and Wuthrich (2008b). Moreover, a necessary and sufficient condition for the MCL to collapse to Separate Chain Ladder (SCL) is also given and compared to that given by Prohl and Schmidt (2005). Efficiency comparisons are also made between MCL and SCL. All the discussions in this section are based on the restriction in (4.1), and we continue to focus on development period $k$.

The estimation of the restricted parameters could be obtained in a similar way as described in Section 3. We will discuss two ways to specify linear restrictions, of which one is illustrated here and the other in Section 5. Denote $\boldsymbol{\beta} = (\beta_{11}, \dots, \beta_{1N}, \dots, \beta_{NN})'$ as the full $N^2 \times 1$ vector of parameters, and $\boldsymbol{\beta}_R = (\beta_{11}, \beta_{22}, \dots, \beta_{NN})'$ as the restricted $N \times 1$ vector under (4.1). We want to set all the off-diagonal elements of the original $\mathbf{B}_k$ to be zero, which can be achieved through a restriction matrix such that $\boldsymbol{\beta} = \mathbf{R}\boldsymbol{\beta}_R$, where $\mathbf{R}$ is an $N^2 \times N$ matrix that has a value of 1 at the $(n + (n-1)N,\ n)$ element and 0 elsewhere. Thus, this equation will map all the off-diagonal elements of the original $\mathbf{B}_k$ to zero (see Henningsen and Hamann 2007). Then the estimator of $\boldsymbol{\beta}_R$ can be obtained by replacing $\mathbf{X}^*$ with the new design matrix $\mathbf{X}^*\mathbf{R}$ in (3.6), (3.8) and (3.12) respectively, where it is easy to verify that

$$
\mathbf{X}^*\mathbf{R} =
\begin{pmatrix}
\mathbf{Y}^{(1)}_{<,k} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{Y}^{(2)}_{<,k} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{Y}^{(N)}_{<,k}
\end{pmatrix}^{1/2}. \tag{4.2}
$$

It is to be noted that in the FGLS estimation, the residuals used to estimate $\boldsymbol{\Sigma}_k$ should also be obtained by running OLS under the restriction in (4.1).
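The restriction matrix and the restricted design matrix in (4.2) can be illustrated with a small example. The function name `restriction_matrix` and the data are hypothetical; the losses are perfect squares only so that the square roots are easy to read.

```python
import numpy as np

def restriction_matrix(N):
    """N^2 x N matrix R with a 1 at the 1-based position (n + (n-1)N, n)."""
    R = np.zeros((N * N, N))
    for n in range(1, N + 1):
        R[n + (n - 1) * N - 1, n - 1] = 1.0
    return R

N = 2
R = restriction_matrix(N)

# beta = R beta_R places the restricted parameters on the diagonal of B_k.
beta_R = np.array([1.5, 1.3])
B_k = (R @ beta_R).reshape(N, N)

# Restricted design matrix X* R for T = 3 accident years.
Y_prev = np.array([[100.0, 400.0],
                   [900.0, 1600.0],
                   [2500.0, 3600.0]])
scale = 1.0 / np.sqrt(Y_prev.flatten(order="F"))       # D(W)^(-1/2)
X_star = np.kron(np.eye(N), Y_prev) * scale[:, None]
XR = X_star @ R
print(B_k)
print(XR)
```

Each column of `XR` holds the square roots of one triangle's losses in its own block and zeros elsewhere, which is the block-diagonal form in (4.2).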

4.1 Relationship with Prohl and Schmidt (2005)

Intuitively, the GLS estimator must be equivalent to that given by Prohl and Schmidt (2005) when $\boldsymbol{\Sigma}_k$ is known: because the GLS estimator is BLUE (Theil 1971, p. 238-239), it must satisfy the expression given by the estimator from Prohl and Schmidt (2005) (see Schmidt 2006, Thm 4.3). However, the fact that they are derived in fairly different ways and bear different forms makes this relationship unclear, and since most discussions in the following are based on this equivalence, we devote this section to showing this relationship explicitly.

Define $\mathbf{F}_{i,k} := D(\mathbf{Y}_{i,k})^{-1} \mathbf{Y}_{i,k+1}$ as an $N \times 1$ vector of the observed development factors at accident year $i$ from the $N$ triangles, and $F^{(n)}_{i,k} := Y^{(n)}_{i,k+1} / Y^{(n)}_{i,k}$ as the observed development factor at accident year $i$ in the $n$th triangle. The parameter estimator proposed by Prohl and Schmidt (2005), in our notation, is:

$$
\left(\sum_{i=1}^{I-k} D(\mathbf{Y}_{i,k})^{1/2} \boldsymbol{\Sigma}_k^{-1} D(\mathbf{Y}_{i,k})^{1/2}\right)^{-1} \left(\sum_{i=1}^{I-k} D(\mathbf{Y}_{i,k})^{1/2} \boldsymbol{\Sigma}_k^{-1} D(\mathbf{Y}_{i,k})^{1/2} \mathbf{F}_{i,k}\right). \tag{4.3}
$$

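Up to this point, we have structured model (2.1) in a way that is consistent with the general structure of the seemingly unrelated regressions. That is, we transform our model to (3.4) so that the variance-covariance of the error terms satisfies (3.5). But to show the equivalence of GLS and the estimator given by Prohl and Schmidt (2005), it is more convenient to re-order the restricted model (3.1) by accident year instead of triangle so that the components of (3.2), $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, become:

$$
\mathbf{Y} = \begin{pmatrix} \mathbf{Y}_{1,k+1} \\ \vdots \\ \mathbf{Y}_{I-k,k+1} \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} D(\mathbf{Y}_{1,k}) \\ \vdots \\ D(\mathbf{Y}_{I-k,k}) \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_{11} \\ \vdots \\ \beta_{NN} \end{pmatrix}.
$$

Denote $\tilde{\mathbf{W}} = \mathrm{vec}(\mathbf{Y}_{1,k}, \dots, \mathbf{Y}_{I-k,k})$, so the variance-covariance matrix for the error term becomes

$$
\mathrm{cov}(\boldsymbol{\varepsilon}) = D(\tilde{\mathbf{W}})^{1/2} (\mathbf{I} \otimes \boldsymbol{\Sigma}_k) D(\tilde{\mathbf{W}})^{1/2}.
$$

Multiplying both sides of the re-ordered equation of (3.2) by $D(\tilde{\mathbf{W}})^{-1/2}$, we have

$$
\tilde{\mathbf{Y}} = \tilde{\mathbf{X}}\boldsymbol{\beta} + \tilde{\boldsymbol{\varepsilon}}, \tag{4.4}
$$

where $\tilde{\mathbf{Y}} = D(\tilde{\mathbf{W}})^{-1/2}\mathbf{Y} = \mathrm{vec}(D(\mathbf{Y}_{1,k})^{1/2}\mathbf{F}_{1,k}, \dots, D(\mathbf{Y}_{I-k,k})^{1/2}\mathbf{F}_{I-k,k})$, $\tilde{\mathbf{X}} = D(\tilde{\mathbf{W}})^{-1/2}\mathbf{X} = \mathbf{X}^{1/2}$ and $\mathrm{cov}(\tilde{\boldsymbol{\varepsilon}}) = \mathbf{I} \otimes \boldsymbol{\Sigma}_k$. The structure of (4.4) is only used in this sub-section to show the equivalence between (4.3) and the GLS estimator, so we use the tilde to distinguish it from (3.4). Applying Aitken's Generalized Least Squares to (4.4), we obtain

$$
\begin{aligned}
\hat{\boldsymbol{\beta}} &= \left(\tilde{\mathbf{X}}' (\mathbf{I} \otimes \boldsymbol{\Sigma}_k)^{-1} \tilde{\mathbf{X}}\right)^{-1} \tilde{\mathbf{X}}' (\mathbf{I} \otimes \boldsymbol{\Sigma}_k)^{-1} \tilde{\mathbf{Y}} \\
&= \left(\sum_{i=1}^{I-k} D(\mathbf{Y}_{i,k})^{1/2} \boldsymbol{\Sigma}_k^{-1} D(\mathbf{Y}_{i,k})^{1/2}\right)^{-1} \left(\sum_{i=1}^{I-k} D(\mathbf{Y}_{i,k})^{1/2} \boldsymbol{\Sigma}_k^{-1} D(\mathbf{Y}_{i,k})^{1/2} \mathbf{F}_{i,k}\right).
\end{aligned} \tag{4.5}
$$

The equivalence in the last step is easily verified by partitioning $\tilde{\mathbf{Y}}$ as an $(I-k) \times 1$ block matrix where the $i$th component is $D(\mathbf{Y}_{i,k})^{1/2}\mathbf{F}_{i,k}$, $\tilde{\mathbf{X}}$ as an $(I-k) \times 1$ block matrix where the $i$th component is $D(\mathbf{Y}_{i,k})^{1/2}$, and $(\mathbf{I} \otimes \boldsymbol{\Sigma}_k)^{-1}$ as an $(I-k) \times (I-k)$ block-diagonal matrix where $\boldsymbol{\Sigma}_k^{-1}$ is along the diagonal. Therefore, (4.5) shows that the estimator given by Prohl and Schmidt (2005) is in fact the GLS estimator.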
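The equivalence stated in (4.5) can also be checked numerically. The sketch below computes the accident-year form (4.3) and the triangle-ordered restricted GLS estimator built from (4.2) on the same simulated data; the data, the matrix `Sigma` and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 2
Y_prev = rng.uniform(500.0, 2000.0, size=(T, N))
Y_next = Y_prev * np.array([1.4, 1.2]) + rng.normal(0.0, 20.0, size=(T, N))
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
Sigma_inv = np.linalg.inv(Sigma)

# Accident-year form (4.3): sums over i of D^(1/2) Sigma^(-1) D^(1/2) terms.
lhs, rhs = np.zeros((N, N)), np.zeros(N)
for i in range(T):
    D_half = np.diag(np.sqrt(Y_prev[i]))
    F_i = Y_next[i] / Y_prev[i]               # observed development factors
    M = D_half @ Sigma_inv @ D_half
    lhs += M
    rhs += M @ F_i
beta_ps = np.linalg.solve(lhs, rhs)

# Triangle-ordered GLS (3.6) with the restricted design matrix X*R of (4.2).
X_R = np.zeros((N * T, N))
for n in range(N):
    X_R[n * T:(n + 1) * T, n] = np.sqrt(Y_prev[:, n])
Y_star = (Y_next / np.sqrt(Y_prev)).flatten(order="F")
Omega_inv = np.kron(Sigma_inv, np.eye(T))
beta_gls = np.linalg.solve(X_R.T @ Omega_inv @ X_R,
                           X_R.T @ Omega_inv @ Y_star)
print(beta_ps, beta_gls)
```

Both orderings describe the same regression, so the two vectors should agree up to floating-point error.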

4.2 Relationship with Merz and Wuthrich (2008b)

Merz and Wuthrich (2008b) proposed an iterated feasible procedure to estimate (4.3) using an unbiased estimator of $\boldsymbol{\Sigma}_k$. This iterated procedure will be equivalent to the IFGLS described in Section 3.2 if the estimators of $\boldsymbol{\Sigma}_k$ are the same. We first show that their estimator is equivalent to $\hat{\boldsymbol{\Sigma}}_u$ under restriction (4.1), and then we discuss the positive semi-definiteness of this estimator.

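Since it is well-known that the development factors from SCL are equal to the OLS estimators $\hat{\boldsymbol{\beta}}_O$ in (3.12) with restriction (4.1), the unbiased estimator of $\boldsymbol{\Sigma}_k$ proposed by Merz and Wuthrich (2008b) could be written as

$$
\hat{\sigma}_{nm} =
\begin{cases}
\dfrac{\hat{\boldsymbol{\varepsilon}}^{*\prime}_n \hat{\boldsymbol{\varepsilon}}^*_n}{I - k - 1} & \forall\, n = m, \\[2ex]
\dfrac{\hat{\boldsymbol{\varepsilon}}^{*\prime}_n \hat{\boldsymbol{\varepsilon}}^*_m}{I - k - 2 + \omega_{nm}} & \forall\, n \ne m,
\end{cases} \tag{4.6}
$$

where $\hat{\boldsymbol{\varepsilon}}^*_n$ is an $(I-k) \times 1$ residual vector obtained by applying OLS to the $n$th equation in (3.4) under (4.1), and

$$
\omega_{nm} = \frac{\left(\sum_{i=1}^{I-k} \sqrt{Y^{(n)}_{i,k}} \sqrt{Y^{(m)}_{i,k}}\right)^2}{\left(\sum_{i=1}^{I-k} Y^{(n)}_{i,k}\right) \left(\sum_{i=1}^{I-k} Y^{(m)}_{i,k}\right)}. \tag{4.7}
$$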

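It is obvious that the elements along the diagonal in (4.6) are the same as those in (3.10), and since $K_n = K_m = 1$, to show that (4.6) and (3.10) are equal with respect to the off-diagonal elements, we need to show that

$$
\mathrm{tr}\left[(\mathbf{X}^{*\prime}_n \mathbf{X}^*_n)^{-1} \mathbf{X}^{*\prime}_n \mathbf{X}^*_m (\mathbf{X}^{*\prime}_m \mathbf{X}^*_m)^{-1} \mathbf{X}^{*\prime}_m \mathbf{X}^*_n\right] = \omega_{nm}, \tag{4.8}
$$

where $\mathbf{X}^*_n$ and $\mathbf{X}^*_m$ are the $n$th and $m$th diagonal components of (4.2), and satisfy $\mathbf{X}^*_n = \sqrt{\mathbf{Y}^{(n)}_{<,k}}$, $\mathbf{X}^{*\prime}_n \mathbf{X}^*_n = \sum_{i=1}^{I-k} Y^{(n)}_{i,k}$ and $\mathbf{X}^{*\prime}_n \mathbf{X}^*_m = \sum_{i=1}^{I-k} \sqrt{Y^{(n)}_{i,k}} \sqrt{Y^{(m)}_{i,k}}$. Therefore,

$$
\begin{aligned}
&\mathrm{tr}\left[(\mathbf{X}^{*\prime}_n \mathbf{X}^*_n)^{-1} \mathbf{X}^{*\prime}_n \mathbf{X}^*_m (\mathbf{X}^{*\prime}_m \mathbf{X}^*_m)^{-1} \mathbf{X}^{*\prime}_m \mathbf{X}^*_n\right] \\
&\quad = \mathrm{tr}\left[\left(\sum_{i=1}^{I-k} Y^{(n)}_{i,k}\right)^{-1} \left(\sum_{i=1}^{I-k} \sqrt{Y^{(n)}_{i,k}} \sqrt{Y^{(m)}_{i,k}}\right) \left(\sum_{i=1}^{I-k} Y^{(m)}_{i,k}\right)^{-1} \left(\sum_{i=1}^{I-k} \sqrt{Y^{(n)}_{i,k}} \sqrt{Y^{(m)}_{i,k}}\right)\right] \\
&\quad = \frac{\left(\sum_{i=1}^{I-k} \sqrt{Y^{(n)}_{i,k}} \sqrt{Y^{(m)}_{i,k}}\right)^2}{\left(\sum_{i=1}^{I-k} Y^{(n)}_{i,k}\right) \left(\sum_{i=1}^{I-k} Y^{(m)}_{i,k}\right)} \\
&\quad = \omega_{nm}.
\end{aligned}
$$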

Result (4.8) shows that under (4.1), the unbiased estimator $\hat{\boldsymbol{\Sigma}}_u$ from the SUR is equivalent to that proposed by Merz and Wuthrich (2008b) in the first iteration. And therefore, the iteration procedure used by Merz and Wuthrich (2008b) will be equivalent to the iterative feasible generalized least squares described in Section 3.2 using the unbiased estimator $\hat{\boldsymbol{\Sigma}}_u$ in each iteration.
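Identity (4.8) is easy to confirm numerically for a single pair of triangles. The simulated losses and variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
Yn = rng.uniform(100.0, 1000.0, size=7)   # Y^(n)_{i,k}, i = 1, ..., I-k
Ym = rng.uniform(100.0, 1000.0, size=7)   # Y^(m)_{i,k}

# Under (4.1) each transformed regressor is a single column of square roots.
Xn = np.sqrt(Yn)[:, None]
Xm = np.sqrt(Ym)[:, None]

# Left-hand side of (4.8).
trace_term = np.trace(
    np.linalg.inv(Xn.T @ Xn) @ Xn.T @ Xm @ np.linalg.inv(Xm.T @ Xm) @ Xm.T @ Xn
)

# omega_nm from (4.7).
omega = np.sum(np.sqrt(Yn * Ym)) ** 2 / (Yn.sum() * Ym.sum())
print(trace_term, omega)
```

By the Cauchy-Schwarz inequality $\omega_{nm}$ lies in $(0, 1]$, so the off-diagonal divisor $I - k - 2 + \omega_{nm}$ in (4.6) sits between the divisors $I - k - 2$ and $I - k - 1$.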


Remark 3.

• Since the variance-covariance estimators of Mack (1993) ($N = 1$) and Braun (2004) ($N = 2$) are two special cases of the estimators given by Merz and Wuthrich (2008b), the equivalence shown in (4.8) also means that the unbiased estimator in (3.10) includes the univariate Mack (1993) and bivariate Braun (2004) estimators as special cases.

• It is known that, although unbiased, the estimator $\hat{\boldsymbol{\Sigma}}_u$ is not necessarily positive semi-definite (Theil 1971, p. 322). Hence, the claim by Merz and Wuthrich (2008b, p. 191) that their proposed estimator is positive semi-definite is not justified.

4.3 Equivalence of MCL and SCL

It is well-known that the development factors from Separate Chain Ladder (SCL) are equal to the OLS estimators $\hat{\boldsymbol{\beta}}_O$ under (4.1). Hence, the relationship between MCL and SCL is converted to that between $\hat{\boldsymbol{\beta}}_G$ and $\hat{\boldsymbol{\beta}}_O$. The necessary and sufficient condition for the GLS estimator in the seemingly unrelated regressions to collapse to the OLS estimator is already well studied in the statistical literature (see Baltagi 1988, Bartels and Fiebig 1991), and is summarized in the following lemma.

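Lemma 2. The necessary and sufficient condition for $\hat{\boldsymbol{\beta}}_G = \hat{\boldsymbol{\beta}}_O$ is:

(i) $\boldsymbol{\Sigma}_k$ is diagonal; or

(ii) $\mathbf{X}^*_n$ and $\mathbf{X}^*_m$ span the same column space $\forall\, n \ne m$.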

We first consider whether condition (ii) could be fulfilled by the MCL. In the MCL case, each $X_n^{*} = \sqrt{Y^{(n)}_{\cdot,k}}$ is an $(I-k) \times 1$ vector. So for (ii) to be valid, it requires $\forall\, n > 1$:

$$Y^{(n)}_{\cdot,k} = c_n \cdot Y^{(1)}_{\cdot,k}, \qquad (4.9)$$

where $c_n$ is a positive constant. This means that at development year $k$, the first $I-k$ observations from each triangle should be proportional to each other by a positive constant. Hence, for MCL and SCL to be equivalent, we have the following theorem:

Theorem 1. The necessary and sufficient condition for the multivariate chain ladder to be equivalent to separate univariate chain ladder is, $\forall\, 1 \le k \le I-1$:

(i) $\Sigma_k$ is diagonal; or
(ii) the first $I-k$ losses at development year $k$ from different triangles are proportional to each other.
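Condition (ii) can be checked numerically. The sketch below is a minimal illustration, not the estimation code used in the paper: it treats two equations with one regressor each and a known 2x2 covariance matrix, and compares equation-by-equation OLS with the stacked GLS solution. All data, the covariance matrix and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ols_slopes(x1, y1, x2, y2):
    """Separate regressions through the origin, one per equation."""
    return dot(x1, y1) / dot(x1, x1), dot(x2, y2) / dot(x2, x2)

def gls_slopes(x1, y1, x2, y2, sigma):
    """GLS for the stacked two-equation system with known covariance sigma."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 ** 2
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det  # entries of sigma^{-1}
    # Normal equations A b = r with A = X'(sigma^{-1} (x) I)X.
    a11, a12, a22 = i11 * dot(x1, x1), i12 * dot(x1, x2), i22 * dot(x2, x2)
    r1 = i11 * dot(x1, y1) + i12 * dot(x1, y2)
    r2 = i12 * dot(x2, y1) + i22 * dot(x2, y2)
    d = a11 * a22 - a12 ** 2
    return (a22 * r1 - a12 * r2) / d, (a11 * r2 - a12 * r1) / d

sigma = [[1.0, 0.6], [0.6, 2.0]]       # hypothetical, non-diagonal
x1 = [1.0, 2.0, 3.0, 4.0]
y1 = [2.1, 3.9, 6.2, 7.8]
y2 = [1.2, 2.7, 2.1, 5.4]

x2_prop = [2.0 * v for v in x1]        # proportional regressors, as in (4.9)
x2_free = [1.0, 3.0, 2.0, 5.0]         # not proportional

ols_prop = ols_slopes(x1, y1, x2_prop, y2)
gls_prop = gls_slopes(x1, y1, x2_prop, y2, sigma)
ols_free = ols_slopes(x1, y1, x2_free, y2)
gls_free = gls_slopes(x1, y1, x2_free, y2, sigma)
```

With proportional regressors the two estimators coincide even though the covariance matrix is not diagonal; with the non-proportional regressors they differ.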

Remark 4.

• Although condition (ii) in Theorem 1 is quite strict, it is still not a sufficient condition for the additivity of these triangles considered by Ajne (1994), since the latest observed cells $Y_{i,I+1-i}$ are not included in (ii).

• The condition given by Pröhl and Schmidt (2005, Thm 4.1), that is, part (i) in the above theorem, is only sufficient. They showed that for the MCL to be equivalent to SCL, it must be that either $\Sigma_k$ is diagonal, or $Y_{i,k} = c_i \cdot Y_{1,k}$, where $c_i$ is a positive constant. They excluded the second condition because it implies $\mathrm{cov}(Y_{i,k}, Y_{j,k}) = c_i \cdot c_j \cdot \mathrm{cov}(Y_{1,k}) \neq 0$, contradicting the independence assumption in (2.4). However, (4.9) does not contradict the independence assumption; all it implies is a perfect contemporaneous correlation.

4.4 Efficiency Comparison between Estimators

Now we consider the efficiency of the three estimators $\beta_O$, $\beta_G$, and $\beta_{FG}$. When $\Sigma_k$ is known, $\beta_G$ is the best linear unbiased estimator by Aitken's theorem. The difference between $\beta_G$ and $\beta_O$ in terms of efficiency is:

$$V(\beta_O) - V(\beta_G) = (X^{*\prime}X^{*})^{-1}X^{*\prime}(\Sigma_k \otimes I)X^{*}(X^{*\prime}X^{*})^{-1} - \big(X^{*\prime}(\Sigma_k \otimes I)^{-1}X^{*}\big)^{-1} = P(\Sigma_k \otimes I)P',$$

where $P = (X^{*\prime}X^{*})^{-1}X^{*\prime} - \big(X^{*\prime}(\Sigma_k \otimes I)^{-1}X^{*}\big)^{-1}X^{*\prime}(\Sigma_k \otimes I)^{-1}$ and satisfies $PX^{*} = 0$.

Since $\Sigma_k \otimes I$ is positive definite, $P(\Sigma_k \otimes I)P'$ is at least positive semi-definite, and hence $\beta_G$ is at least as efficient as $\beta_O$. They are equally efficient when the condition in Theorem 1 is satisfied.

However, this conclusion does not hold for the comparison of $\beta_{FG}$ and $\beta_O$, since the use of an estimated $\Sigma_k$ increases the variability of the FGLS estimator. Unfortunately, although it is recognized that the FGLS estimator is not universally better than the OLS estimator, the general exact relationship between them is still an open question. Zellner (1963) considered a special case of a two-equation model where the error terms are normally distributed and regressors from different equations are orthogonal to each other, that is, $X_1^{*\prime}X_2^{*} = 0$. In this special case, Zellner (1963) showed that for a sample with fewer than 20 observations and an absolute value of the true contemporaneous correlation $\rho_{12} = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$ below 0.3, $\beta_{FG}$ is in fact less efficient than $\beta_O$. Moreover, some asymptotic approximation results find that the efficiency gain is reduced by:

1. Small correlation between disturbance terms;
2. High correlation between regressors across different equations;
3. High correlation between regressors within the equation.

So if the asymptotic results still hold in finite samples, we would expect that the required correlation should be even larger than 0.3 to gain efficiency by using $\beta_{FG}$. In the case of multiple correlated triangles, although it is reasonable to expect that more efficient estimates will result from developing the triangles jointly and taking the correlation into account, it is not surprising that the reverse could happen due to the use of an estimated $\Sigma_k$. This is more likely to happen with small correlations between triangles.

The iterative procedure, although aimed at improving the efficiency of the FGLS estimator, cannot avoid this potential efficiency loss either. Srivastava and Giles (1987, ch. 5) showed that iterations may not always be worthwhile: FGLS and IFGLS have the same variance in large samples, but in small samples IFGLS could result in a loss in efficiency. Situations favorable to the IFGLS estimator were found to be a low correlation between regressors across the equations and a high correlation between error terms.

In the following, we summarize the comparison between OLS and FGLS, which also applies to SCL and feasible MCL.

• Unbiasedness: The OLS estimator is unbiased in that, using (3.12),

$$E(\beta_O - \beta \mid D_{\cdot,k}) = (X^{*\prime}X^{*})^{-1}X^{*\prime}\,E(\varepsilon^{*} \mid D_{\cdot,k}) = 0.$$

The FGLS estimator is unbiased provided that the error terms follow a symmetric distribution (see Kakwani 1967). For insurance data, where losses are unlimited but always positive, this assumption of symmetry only holds as an approximation.

• Estimation of $\Sigma_k$: OLS ignores the jointness of the system of equations, so it does not require an estimate of $\Sigma_k$ in estimating the coefficients, whereas FGLS does. The unbiased estimator $\Sigma_u$ used by Merz and Wüthrich (2008b) is not necessarily positive semi-definite. The estimator $\Sigma_o$ tends to underestimate the parameter in small samples such as loss triangle data. The positive semi-definite estimator $\Sigma_g$ partially corrects for degrees of freedom and is usually preferred.

• Efficiency: The FGLS estimator $\beta_{FG}$ should improve on the efficiency of $\beta_O$ at least asymptotically, since it uses the extra information that the equations are correlated. However, when finite-sample properties are considered, and depending on how the correlation information is actually taken into account, we cannot be sure, a priori, that $\beta_{FG}$ is universally superior to $\beta_O$. In fact, efficiency loss will result in examples with small correlations (Srivastava and Giles 1987).

5 The Bivariate Model

Of special interest is the bivariate model, a sub-model of (2.1) where $N = 2$. When applied to paid and incurred data, this model provides a solution to the problem of divergent predicted paid-to-incurred loss ratios in the separate development. We discuss the similarities and differences between the bivariate model and the Munich Chain Ladder (MuCL) model. Also, when applied to paid and case reserve data, this model, with the necessary parameter restrictions, coincides with the Backward Recursive Reserve Development (BRRD) model described by Marker and Mohl (1980). Denote the first triangle by $P$ for paid and the second by $C$ for incurred or case reserve, depending on the context. The bivariate model becomes

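$$P_{i,k+1} = \beta_{11}P_{i,k} + \beta_{12}C_{i,k} + \varepsilon^{P}_{i,k}, \qquad (5.1)$$

$$C_{i,k+1} = \beta_{21}P_{i,k} + \beta_{22}C_{i,k} + \varepsilon^{C}_{i,k}, \qquad (5.2)$$

where $\varepsilon^{P}_{i,k}$ and $\varepsilon^{C}_{i,k}$ are the error terms for the two triangles, respectively.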
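To make the structure of (5.1)-(5.2) concrete, the following Python sketch computes the expected one-step development of a paid and an incurred (or case reserve) value with the error terms set to zero. The coefficients and starting values are hypothetical and chosen only for illustration; `bivariate_step` is our own helper name.

```python
def bivariate_step(p, c, b11, b12, b21, b22):
    """Expected next-period (paid, incurred) under (5.1)-(5.2), errors set to zero."""
    return b11 * p + b12 * c, b21 * p + b22 * c

# Hypothetical coefficients and current values.
b11, b12, b21, b22 = 1.2, 0.1, 0.05, 0.95
p, c = 100.0, 300.0

p_next, c_next = bivariate_step(p, c, b11, b12, b21, b22)

# Dividing each equation by its own current value gives development factors
# that depend on the incurred-to-paid and paid-to-incurred ratios.
paid_factor = b11 + b12 * (c / p)
incurred_factor = b21 * (p / c) + b22
print(p_next, c_next, paid_factor, incurred_factor)
```

The off-diagonal coefficients are what couple the two triangles: with `b12 = b21 = 0` the step reduces to two separate chain ladder developments.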

5.1 Paid and Incurred Triangles

Quarg and Mack (2004) observed that when both paid and incurred triangles are available, the projections based on paid losses and on incurred losses separately can differ substantially. This is because in separate development, for each accident year, the ratio of the projected paid-to-incurred ratio to the corresponding average at a given development year remains the same as that at the latest observed development year. They proposed the MuCL as a solution, where the paid development factor is regressed on the incurred-to-paid ratio and the incurred development factor is regressed on the paid-to-incurred ratio. We see in the following that this is also the idea behind the bivariate model when applied to paid and incurred data, since dividing (5.1) by $P_{i,k}$ and (5.2) by $C_{i,k}$ gives

$$P_{i,k+1}/P_{i,k} = \beta_{11} + \beta_{12}\,(C_{i,k}/P_{i,k}) + \varepsilon^{P}_{i,k}/P_{i,k}, \qquad (5.3)$$

$$C_{i,k+1}/C_{i,k} = \beta_{21}\,(P_{i,k}/C_{i,k}) + \beta_{22} + \varepsilon^{C}_{i,k}/C_{i,k}. \qquad (5.4)$$

Hence one can reasonably expect that the bivariate model will also result in convergent predicted paid-to-incurred ratios. In fact, this model structure is identical to the Double Regression model considered by Mack (2003) in an ASTIN colloquium presentation, where he argued that it will produce about the same result as the MuCL. However, the estimators given by Mack are essentially $\beta_O$. Although we believe that the two methods will both generate a convergent paid-to-incurred ratio, there is no guarantee that they will result in the same or even similar ultimate loss amounts, considering the following differences:

• MuCL uses standardized residuals and pools all residuals from each period together; the bivariate model does not use residuals and proceeds sequentially by development period.

• MuCL considers only structural dependency and runs two separate regressions; the bivariate model reflects both structural and contemporaneous correlations if FGLS estimators are used.


Besides the above differences in estimation, we have the following remark:

Remark 5.

• Neither MuCL nor the bivariate model results in identical paid and incurred amounts.

• Although the pooling in MuCL makes the estimation more stable, the estimated development factors do not satisfy (ii) in Lemma 1, which makes it hard to derive a mean square error estimator.

5.2 The Backward Recursive Reserve Development Model

Marker and Mohl (1980) described a Backward Recursive Reserve Development (BRRD) model that tries to model incremental paid losses and case reserves together. This is further discussed by Wiser (2001). Although this method is mainly used in claims-made insurance reserving, it is worth pointing out that it can also be modeled using a restricted form of (2.1).

The fundamental idea of BRRD is to track the development of a case reserve into subsequent paid losses and remaining reserves. It constructs a table of paid-on-reserve ratios, that is, the incremental paid losses from $k$ to $k+1$ divided by the case reserve at $k$, and a second table of remaining-in-reserve ratios, that is, the case reserve at $k+1$ divided by the case reserve at $k$. The two ratio tables are completed by projecting down each column using simple averages. Using the projected ratios, one completes the incremental paid and case reserve tables by multiplying the case reserve at $k$ by the projected paid-on-reserve and remaining-in-reserve ratios at $k+1$, respectively. The sum of projected incremental paid losses for each year is then the IBNR loss reserve. This process can be represented mathematically as

$$(P_{i,k+1} - P_{i,k})/C_{i,k} = \beta_{12} + \varepsilon^{P}_{i,k}, \qquad C_{i,k+1}/C_{i,k} = \beta_{22} + \varepsilon^{C}_{i,k}, \qquad (5.5)$$

where $E(\varepsilon^{P}_{i,k} \mid D_{i,k}) = E(\varepsilon^{C}_{i,k} \mid D_{i,k}) = 0$, and $\mathrm{cov}\big((\varepsilon^{P}_{i,k}, \varepsilon^{C}_{i,k})' \mid D_{i,k}\big) = D\big((\sigma_{11}, \sigma_{22})'\big)$. Multiplying both equations by $C_{i,k}$, we have

$$P_{i,k+1} - P_{i,k} = \beta_{12}C_{i,k} + \varepsilon^{P*}_{i,k}, \qquad C_{i,k+1} = \beta_{22}C_{i,k} + \varepsilon^{C*}_{i,k}, \qquad (5.6)$$

where $\varepsilon^{P*}_{i,k} = C_{i,k}\,\varepsilon^{P}_{i,k}$, $\varepsilon^{C*}_{i,k} = C_{i,k}\,\varepsilon^{C}_{i,k}$, $E(\varepsilon^{P*}_{i,k} \mid D_{i,k}) = E(\varepsilon^{C*}_{i,k} \mid D_{i,k}) = 0$ and

$$\mathrm{cov}\big((\varepsilon^{P*}_{i,k}, \varepsilon^{C*}_{i,k})' \mid D_{i,k}\big) = C^{2}_{i,k} \cdot D\big((\sigma_{11}, \sigma_{22})'\big). \qquad (5.7)$$
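The BRRD procedure described in this section can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions: the 3x3 triangles are hypothetical toy data, rows are accident years stored as ragged lists, and the function name `brrd_reserves` is ours.

```python
def brrd_reserves(paid, case):
    """paid: cumulative paid rows; case: case-reserve rows (ragged, same shape)."""
    n_dev = len(paid[0])
    por, rir = [], []  # column averages: paid-on-reserve, remaining-in-reserve
    for k in range(n_dev - 1):
        rows = [i for i in range(len(paid)) if len(paid[i]) > k + 1]
        por.append(sum((paid[i][k + 1] - paid[i][k]) / case[i][k] for i in rows) / len(rows))
        rir.append(sum(case[i][k + 1] / case[i][k] for i in rows) / len(rows))
    reserves = []
    for i in range(len(paid)):
        reserve, c = 0.0, float(case[i][-1])
        # Roll the latest case reserve forward through the remaining periods.
        for k in range(len(paid[i]) - 1, n_dev - 1):
            reserve += c * por[k]   # projected incremental paid
            c *= rir[k]             # projected remaining case reserve
        reserves.append(reserve)
    return por, rir, reserves

# Hypothetical toy data: three accident years, three development periods.
paid = [[100, 150, 170], [110, 160], [120]]
case = [[80, 40, 10], [90, 45], [100]]

por, rir, reserves = brrd_reserves(paid, case)
print(por, rir, reserves)
```

The simple column averages play the role of $\beta_{12}$ and $\beta_{22}$ in (5.5); a regression-based fit of (5.6) with the variance structure (5.7) would weight the observations differently.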


So we see that BRRD can be modeled using GMCL by replacing (2.3) with (5.7) and imposing the restrictions $\beta_{11} = 1$ and $\beta_{21} = 0$, which can be achieved by the following:

$$R\beta = q, \qquad (5.8)$$

where

$$R = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad q = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \beta = (\beta_{11}, \beta_{12}, \beta_{21}, \beta_{22})'.$$

Each linearly independent restriction is represented by one row of $R$ and the corresponding element of $q$. See Greene (2003, ch. 6) for more details.
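The restriction system (5.8) is easy to verify for a candidate coefficient vector. The Python sketch below uses the $R$ and $q$ given above; the two coefficient vectors are hypothetical, and `satisfies` is our own helper name.

```python
R = [[1, 0, 0, 0],
     [0, 0, 1, 0]]
q = [1, 0]

def satisfies(beta, tol=1e-12):
    """True if R * beta equals q, with beta = (b11, b12, b21, b22)."""
    return all(
        abs(sum(r * b for r, b in zip(row, beta)) - target) <= tol
        for row, target in zip(R, q)
    )

beta_brrd = [1.0, 0.59, 0.0, 0.5]    # hypothetical, obeys b11 = 1 and b21 = 0
beta_free = [1.2, 0.1, 0.05, 0.95]   # hypothetical unrestricted coefficients

print(satisfies(beta_brrd), satisfies(beta_free))
```

Adding a further restriction amounts to appending one row to `R` and one element to `q`.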

6 Numerical Example

In this section, we consider an example where paid and incurred triangles from different lines are modeled simultaneously. The data are from Schedule P of General Accident Insurance Company, published by NAIC. The three triangles used are shown in Tables 2-4. Tables 2 and 3 are cumulative paid and incurred losses from Personal Auto Insurance, while Table 4 is cumulative paid losses from Commercial Auto Insurance.

We should first note that the data are not fully developed, but we will not consider development beyond year 10 in this paper. In analyzing these data, we compare the results under SCL, MCL, MuCL and GMCL, but we first address an important point in GMCL and MCL, where we use the SCL for development in the tail to "robustify" tail estimates (see Mack 2003). This approach is not only necessary but also very reasonable because:

• As stated in Section 4.4, although the multivariate models try to achieve more efficient estimation by taking into account the residual covariance structure, this efficiency gain is reduced when the correlation between disturbance terms is small, which is often the case in tail development periods (see the results in Merz and Wüthrich 2008b for example).

• Data available in tail development periods are often well developed, which results in a small $\Sigma_k$ to be used in the feasible estimation in (3.8). Inverting such a small matrix sometimes gives erroneous parameter estimates. For example, fitting the MCL in development years 8-9 in our example gives an unreasonable estimate of the development parameters of $(1307.25, 0.38, -1187.99)'$.

• If GMCL is used to model paid and incurred losses, high correlation between the paid and incurred losses could occur in later development periods, which may lead to collinearity problems.

• It is observed that divergent paid-to-incurred ratios in SCL usually occur only for immature accident years. This not only means that the model should regulate the development of paid and incurred losses in earlier periods, but also suggests that SCL works well for development in the tail. So it makes intuitive sense to combine GMCL and SCL.

Table 2: Cumulative Paid Triangle from Personal Auto

AY        1        2        3        4        5        6        7        8        9       10
 1  101,125  209,921  266,618  305,107  327,850  340,669  348,430  351,193  353,353  353,584
 2  102,541  203,213  260,677  303,182  328,932  340,948  347,333  349,813  350,523
 3  114,932  227,704  298,120  345,542  367,760  377,999  383,611  385,224
 4  114,452  227,761  301,072  340,669  359,979  369,248  373,325
 5  115,597  243,611  315,215  354,490  372,376  382,738
 6  127,760  259,416  326,975  365,780  386,725
 7  135,616  262,294  327,086  367,357
 8  127,177  244,249  317,972
 9  128,631  246,803
10  126,288

Table 3: Cumulative Incurred Triangle from Personal Auto

AY        1        2        3        4        5        6        7        8        9       10
 1  325,423  336,426  346,061  347,726  350,995  353,598  354,797  355,025  354,986  355,363
 2  323,627  339,267  344,507  349,295  351,038  351,583  352,050  352,231  352,193
 3  358,410  386,330  385,684  384,699  387,678  387,954  388,540  389,436
 4  405,319  396,641  391,833  384,819  380,914  380,163  379,706
 5  434,065  429,311  422,181  409,322  394,154  392,802
 6  417,178  422,307  413,486  406,711  406,503
 7  398,929  398,787  398,020  400,540
 8  378,754  361,097  369,328
 9  351,081  335,507
10  329,236

Table 4: Cumulative Paid Triangle from Commercial Auto

AY       1       2       3        4        5        6        7        8        9       10
 1  19,827  44,449  61,205   77,398   88,079   95,695   99,853  104,789  105,427  106,690
 2  22,331  48,480  68,789   92,356  104,958  112,399  115,638  117,415  118,571
 3  22,533  44,484  65,691   88,435  102,044  112,672  115,973  118,359
 4  23,128  51,328  81,542   98,063  113,149  121,515  124,347
 5  25,053  57,220  84,607  104,936  117,663  126,180
 6  30,136  64,767  92,288  108,835  121,326
 7  34,764  69,125  91,354  111,987
 8  31,803  63,471  92,439
 9  40,559  77,667
10  46,285

As for SCL, in the last development period, where the variance-covariance matrix $\Sigma_k$ cannot be estimated, we use Mack's (1993) extrapolation method to obtain an estimate of the diagonal components. That is, we use

$$\sigma_{nn,I-1} = \min\big(\sigma_{nn,I-2},\ \sigma_{nn,I-3},\ \sigma^{2}_{nn,I-2}/\sigma_{nn,I-3}\big), \qquad (6.1)$$

where $\sigma_{nn,k}$ is the $n$th diagonal element of the variance-covariance matrix $\Sigma_k$. For the GMCL and MCL considered in the following, we fit the multivariate model for development years 1-7 and SCL for development years 7-10. The point at which we split the models reflects our belief that the gain from increasing model complexity after year 7 is minor, but one could use more development years for the multivariate model if this were deemed necessary. Also, we use the estimator $\Sigma_g$ as the estimator for the variance-covariance matrix.

Table 5: Predicted Ultimate Paid-to-Incurred Loss Ratios (%)

AY        SCL     MCL   GMCL1    MuCL   GMCL2   GMCL3
1       99.50   99.50   99.50   99.50   99.50   99.50
2       99.49   99.49   99.49   99.55   99.49   99.49
3       99.29   99.29   99.29  100.23   99.29   99.29
4       99.20   99.20   99.20  100.23   99.20   99.20
5       99.83   99.82   99.46  100.04   99.43   99.38
6      100.43  100.43   99.64  100.03   99.57   99.51
7      103.53  103.53  102.08   99.95   99.69   99.44
8      111.23  111.23  106.73   99.81   99.83   99.41
9      122.10  122.15  111.17   99.67   99.98   99.39
10     126.22  126.15  111.37   99.69   99.97   99.38
Total  105.57  105.56  102.67   99.88   99.60   99.40
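The extrapolation rule (6.1) is a one-line computation. The Python sketch below is an illustration only; the function name `mack_tail_variance` is ours, and the example inputs are the $\sigma_{11}$ and $\sigma_{33}$ entries for $k = 7$ and $k = 8$ as reported in Table 6.

```python
def mack_tail_variance(s_im3, s_im2):
    """Rule (6.1): s_im3 is sigma_{nn,I-3}, s_im2 is sigma_{nn,I-2}."""
    return min(s_im2, s_im3, s_im2 ** 2 / s_im3)

# Entries for k = 7 and k = 8 from Table 6.
sigma11_last = mack_tail_variance(1.426, 2.976)
sigma33_last = mack_tail_variance(35.31, 0.782)
print(sigma11_last, round(sigma33_last, 3))
```

Under these inputs the rule returns values that agree, after rounding, with the $k = 9$ entries of the corresponding rows in Table 6.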

We first study the simple SCL as the starting point. We fit the model, calculate the predicted ultimate paid-to-incurred loss ratios, as shown in Table 5, and generate diagnostic plots using standardized residuals and fitted values, as shown in Figure 1, where the red line is the LOESS smoother of the residuals. Three problems with SCL are apparent:

#1 It ignores contemporaneous correlations among triangles, and the corresponding mean square error will be underestimated if positive correlation exists.

#2 The plots of standardized residuals against the fitted values for each triangle show a clear downward trend, which indicates that the model assumption is not appropriate.

#3 The predicted ultimate paid-to-incurred loss ratios go far beyond 100% for accident years 8-10.

MCL is designed to solve problem #1, but it does not help to resolve the last two, since the model structure of MCL is almost identical to that of SCL. We see in Table 5 that the predicted ultimate paid-to-incurred loss ratios under MCL also diverge for immature accident years. The residual plots are almost identical to those of SCL, so we simply label them together in Figure 1.
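The divergence described in problem #3 can be reproduced approximately with a plain volume-weighted chain ladder on Tables 2 and 3. The Python sketch below is our own illustration, not the code used for the paper (which relies on the R package ChainLadder); it assumes that the SCL factors are the usual volume-weighted averages, and the function name `chain_ladder_ultimates` is ours.

```python
paid = [
    [101125, 209921, 266618, 305107, 327850, 340669, 348430, 351193, 353353, 353584],
    [102541, 203213, 260677, 303182, 328932, 340948, 347333, 349813, 350523],
    [114932, 227704, 298120, 345542, 367760, 377999, 383611, 385224],
    [114452, 227761, 301072, 340669, 359979, 369248, 373325],
    [115597, 243611, 315215, 354490, 372376, 382738],
    [127760, 259416, 326975, 365780, 386725],
    [135616, 262294, 327086, 367357],
    [127177, 244249, 317972],
    [128631, 246803],
    [126288],
]
incurred = [
    [325423, 336426, 346061, 347726, 350995, 353598, 354797, 355025, 354986, 355363],
    [323627, 339267, 344507, 349295, 351038, 351583, 352050, 352231, 352193],
    [358410, 386330, 385684, 384699, 387678, 387954, 388540, 389436],
    [405319, 396641, 391833, 384819, 380914, 380163, 379706],
    [434065, 429311, 422181, 409322, 394154, 392802],
    [417178, 422307, 413486, 406711, 406503],
    [398929, 398787, 398020, 400540],
    [378754, 361097, 369328],
    [351081, 335507],
    [329236],
]

def chain_ladder_ultimates(tri):
    """Volume-weighted chain ladder ultimates for a ragged cumulative triangle."""
    n = len(tri)
    factors = []
    for k in range(n - 1):
        rows = [r for r in tri if len(r) > k + 1]
        factors.append(sum(r[k + 1] for r in rows) / sum(r[k] for r in rows))
    ultimates = []
    for row in tri:
        value = float(row[-1])
        for k in range(len(row) - 1, n - 1):
            value *= factors[k]
        ultimates.append(value)
    return ultimates

# Predicted ultimate paid-to-incurred ratios in percent, by accident year.
ratios = [100 * p / c for p, c in
          zip(chain_ladder_ultimates(paid), chain_ladder_ultimates(incurred))]
print([round(r, 2) for r in ratios])
```

Under the stated assumption, the ratios for the most recent accident years rise well above 100%, in line with the SCL column of Table 5.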

Figure 1: Residual Plots of SCL & MCL. [Three panels: Residual Plot for Triangle 1, Triangle 2 and Triangle 3; standardised residuals against fitted values.]

Figure 2: Residual Plots of GMCL1. [Three panels: Residual Plot for Triangle 1, Triangle 2 and Triangle 3; standardised residuals against fitted values.]

Figure 3: Residual Plots of GMCL2. [Three panels: Residual Plot for Triangle 1, Triangle 2 and Triangle 3; standardised residuals against fitted values.]

Barnett and Zehnwirth (2000) pointed out that SCL (and hence MCL) is often inadequate due to the lack of intercept terms. Models excluding intercepts tend to overestimate large values and underestimate small values, which is exactly the reason for the downward pattern in the residual plots in Figure 1. So a reasonable fix for problem #2 is to include intercepts. We now simply add intercepts to the MCL model, and refer to this model as GMCL1. We see that the residual plots in Figure 2 now show no clear pattern and the LOESS smoother is very flat for the first two triangles. The small downward trend in the third plot is within the realm of acceptability. The model GMCL1 seems to be a reasonable fit for each triangle. However, as we can see from Table 5, GMCL1 still results in divergent ratios, although it partly corrects this divergence since these ratios are closer to 100% than those in SCL and MCL. This is not surprising, since GMCL1 does not reflect the structural relationship between the paid and incurred triangles.

The MuCL model is designed to account for this dependence between the paid and incurred triangles, which results in predicted paid-to-incurred ratios close to 100%, as shown in Table 5. However, as we commented in Section 5, MuCL does not reflect the contemporaneous correlations, and hence cannot be used to solve problem #1. Furthermore, MuCL cannot model the three triangles simultaneously.

We now fit the GMCL model with intercepts and a block-diagonal development matrix with the following structure:

$$(A_k, B_k) = \begin{pmatrix} \beta_{10} & \beta_{11} & \beta_{12} & 0 \\ \beta_{20} & \beta_{21} & \beta_{22} & 0 \\ \beta_{30} & 0 & 0 & \beta_{33} \end{pmatrix}. \qquad (6.2)$$

We refer to this model structure as GMCL2. One can see that the ratios in Table 5 are close to 100%, and they even outperform MuCL for accident years 3-6, where MuCL has predicted ratios over 100%. The residual plots in Figure 3 do not show a clear pattern. Although the LOESS line in the second plot has a fast downturn at large values, this is determined by a single data point. The overall pattern reveals no clear violation of model assumptions. Hence, GMCL2 successfully solves all the problems listed above.

However, the GMCL2 model tends to overfit the data since it has too many parameters, which goes against the basic principle of parsimony. We therefore eliminate parameters that are not necessary. In this elimination process, we consider not only statistical significance but also practical intuition. That is, we keep the estimated parameters $\beta_{11}$, $\beta_{22}$ and $\beta_{33}$ regardless of their statistical significance for intuitive interpretation, and then remove variables that seem redundant in a step-wise manner. A consequence of this is that different model structures could be used in different periods. However, allowing the development structure to vary across development periods is in fact desirable, since models in the chain ladder framework estimate loss development sequentially, and relationships among triangles could vary over time due to changes in economic forces or claims handling. A universal algorithm such as SCL or MCL is seldom appropriate. We use this more parsimonious model as the final selected one, and refer to it as GMCL3. One can see that the results in Table 5 also show convergent paid-to-incurred ratios for GMCL3, and the residual plots in Figure 4 reveal no clear patterns. Again, the small downturn in the third plot is within the realm of acceptability. Along with these residual plots, we show the QQ plots of the standardized residuals against Normal quantiles in Figure 5. We see that our data do not quite follow a Normal distribution. However, the Normal assumption is not necessary for our model, but we do need to check the assumption of symmetry in (2.5). As an approximation to the joint density, we use the marginal histograms along with the Gaussian kernel density estimates in Figure 6 as simple diagnostic tools. Except for a little skewness in the third plot, we see no clear indication of a violation of the symmetry assumption.

Figure 4: Residual Plots of GMCL3. [Three panels: Residual Plot for Triangle 1, Triangle 2 and Triangle 3; standardised residuals against fitted values.]

Figure 5: QQ-Normal Plots of GMCL3. [Three panels: QQ-Plot for Triangle 1, Triangle 2 and Triangle 3; sample quantiles against theoretical quantiles.]

Figure 6: Histogram Plots of GMCL3. [Three panels: Histogram for Triangle 1, Triangle 2 and Triangle 3; density of standardized residuals.]

The estimated parameters from GMCL3 are shown in Table 6. Parameters restricted to zero are left blank to improve readability. We see that most intercepts remain in the model, which helps correct the inadequacy of SCL and MCL, and that $\beta_{12}$ and $\beta_{21}$ together enter the model in four of the six development periods, which helps to predict convergent paid-to-incurred ratios. The last three rows show the residual correlations, which are defined as $\rho_{ij} = \sigma_{ij}/\sqrt{\sigma_{ii} \cdot \sigma_{jj}}$. We see clearly that there are strong positive correlations between the two Personal Auto triangles and between the two paid triangles, while the correlation between Personal Auto incurred and Commercial Auto paid is hard to interpret, with positive and negative values appearing equally often.

Table 6: Estimated Parameters from GMCL3

Rows with an estimate in every development period k:

k          1        2        3       4       5      6      7        8        9
β11    1.498    1.285    0.991    0.92    0.94  0.947  1.006    1.004    1.001
β22    0.916    0.772     0.54  -0.006   0.934  1.001  1.001   0.9999    1.001
β33    1.641    1.441    0.987    1.07   1.081  0.947  1.027    1.008    1.012
σ11  204.322   159.28   39.835  12.911   2.023  4.395  1.426    2.976    1.426
σ22   625.12   12.659   15.371   7.578   3.133  1.311  0.376  6.8e-07  1.2e-12
σ33  290.638  354.369  158.311  23.104  20.608  0.017  35.31    0.782    0.017

Residual correlations (six periods of the multivariate fit):

k        1      2      3       4      5       6
ρ12  0.388  0.347  0.724   0.922  0.537   0.902
ρ13  0.168  0.906   0.75  -0.087  0.342   0.042
ρ23 -0.329  0.056  0.993  -0.449  0.845  -0.364

Rows with blank (zero-restricted) cells; the column positions of the blanks are not recoverable from the flattened source, so the nonzero estimates are listed in order of development period:

β10: 43480, 48237, 32184, 24728
β12: 0.155
β20: 33246, 57935, 66070, 82813, 25029
β21: 0.127, 0.363, 0.888
β30: 12255, 20500, 6196, 9199
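The residual correlations follow directly from the definition $\rho_{ij} = \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}}$. The Python sketch below converts a covariance matrix into a correlation matrix; the 3x3 input is hypothetical (the off-diagonal covariances are not reported in Table 6), and `correlation_matrix` is our own helper name.

```python
import math

def correlation_matrix(sigma):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    n = len(sigma)
    return [
        [sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j]) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical residual covariance matrix for three triangles.
sigma_k = [
    [4.0, 3.0, -1.0],
    [3.0, 9.0, 0.6],
    [-1.0, 0.6, 1.0],
]

rho = correlation_matrix(sigma_k)
print(rho[0][1], rho[0][2], rho[1][2])
```

Each off-diagonal entry lies in [-1, 1] whenever the covariance matrix is positive semi-definite, which is one practical reason to prefer a positive semi-definite estimator of $\Sigma_k$.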


Table 7 shows the summary statistics for the portfolio, that is, the sum of Personal Auto paid and Commercial Auto paid. Our choice of the paid triangle as representative of Personal Auto is not crucial for the final estimate, since the paid and the incurred predictions are very close under GMCL3. However, this is not true for SCL or MCL, where extra analysis is needed to pick one ultimate estimate for the Personal Auto line. It should also be noted that, although not used in the calculation of the portfolio statistics, the Personal Auto incurred triangle not only directly affects the development of the Personal Auto paid triangle, but also affects the estimation of all the development parameters through the estimated variance-covariance matrix.

Table 7: Predicted Portfolio Results

AY        Latest   Ultimate     IBNR
1        460,274    460,274        0
2        469,094    470,744    1,650
3        503,583    507,798    4,215
4        497,672    507,816   10,144
5        508,918    526,604   17,686
6        508,051    543,803   35,752
7        479,344    548,507   69,163
8        410,411    541,103  130,692
9        324,470    562,619  238,149
10       172,573    572,860  400,287
Total  4,334,390  5,242,127  907,737
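The relationship between the rows of Table 7 is simply IBNR = Ultimate - Latest for each accident year. The short Python check below recomputes the IBNR row from the other two; the figures are copied from Table 7, and the variable names are ours.

```python
latest = [460274, 469094, 503583, 497672, 508918,
          508051, 479344, 410411, 324470, 172573]
ultimate = [460274, 470744, 507798, 507816, 526604,
            543803, 548507, 541103, 562619, 572860]

# IBNR by accident year and for the portfolio.
ibnr = [u - l for u, l in zip(ultimate, latest)]
total_latest, total_ultimate, total_ibnr = sum(latest), sum(ultimate), sum(ibnr)
print(ibnr, total_latest, total_ultimate, total_ibnr)
```

The recomputed totals for Ultimate and IBNR differ from the published totals by one unit, which is consistent with the per-year figures having been rounded before summation.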

Remark 6.

• We also tried the iterative procedure under GMCL3, but unfortunately it does not converge for these data.

• The R package "ChainLadder" by Gesmann and Zhang (2010) is used to analyze the data. Code that reproduces the results in this paper can be found in the help documentation of the function "MultiChainLadder" (type ?MultiChainLadder). At the time of writing, version 0.1.3-3 had not been released on CRAN, but it can be downloaded from: http://code.google.com/p/chainladder/source/checkout.

7 Conclusions

In this paper, we proposed a general multivariate reserving model which specifies both a non-diagonal development matrix with intercepts and contemporaneous correlations. The advantages of this model over existing ones are:

• It improves model adequacy by including intercepts.
• It helps to predict consistent paid and incurred results through a non-diagonal development matrix.
• Simultaneous estimation takes correlation into account and hence results in more efficient estimation.
• Existing SUR theory can be used to study the properties of different estimators.

This model structure is very flexible and can potentially be extended further for other purposes. However, the weakness of the GMCL model (as with SCL and MCL) is that it does not determine the dynamics of the stochastic claim process. For example, one cannot simulate the whole triangle without knowing the first column (see Jessen, Mikosch and Samorodnitsky 2009 for more details). In this paper, we only consider the first moment estimation of GMCL, and we leave the mean square error estimation to a separate paper, where we generalize the Mack (1993, 1999) approach to the GMCL case.

8 Acknowledgements

The author is indebted to David R. Clark at Munich Reinsurance America for his generous help in formulating the idea of multivariate reserving. Thanks are also due to: the CAS dependency task force, especially Dr. Glenn Meyers and Prof. Edward Frees, for providing the data used in the numerical analysis; Dr. Gerhard Quarg for pointing out that the bivariate model is identical to the double regression model of Mack; Markus Gesmann for his generous help in constructing the R package; and Dr. Thomas Mack for providing detailed information on the double regression model.

9 References

1. Ajne B (1994). Additivity of chain-ladder projections, ASTIN Bulletin, 24, No.2, 311-318.

2. Baltagi B.H (1989). Applications of a necessary and sufficient condition for OLS to be BLUE, Statistics & Probability Letters, 8, 457-461.

3. Barnett G, Zehnwirth B (2000). Best estimates for reserves, Proceedings of the Casualty Actuarial Society, LXXXVII, 245-321.

4. Bartels R, Fiebig D.G (1991). A simple characterization of seemingly unrelated regressions models in which OLS is BLUE, The American Statistician, 45, No.2, 137-140.

5. Braun C (2004). The prediction error of the chain ladder method applied to correlated run-off triangles, ASTIN Bulletin, 34, No.2, 399-423.

6. Gesmann M, Zhang Y (2010). ChainLadder: Mack, Bootstrap, Munich and Multivariate-chain-ladder Methods. R package version 0.1.3-3.

7. Greene W.H (2003). Econometric analysis, Prentice Hall.

8. Henningsen A, Hamann J.D (2007). Systemfit: A package for estimating systems of simultaneous equations in R, Journal of Statistical Software, 23(4), 1-40.

9. Jessen, Mikosch and Samorodnitsky (2009). Prediction of outstanding payments in a Poisson cluster model, Preprint, http://www.math.ku.dk/~mikosch/preprint.html.

10. Kakwani N.C (1967). The unbiasedness of Zellner's seemingly unrelated regression equations estimators, Journal of the American Statistical Association, 62, 141-142.

11. Kremer E (2005). The correlated chain-ladder method for reserving in case of correlated claims developments, Blätter DGVFM, 27, 315-322.

12. Mack T (1993). Distribution-free calculation of the standard error, ASTIN Bulletin, 23, No.2.

13. Mack T (1999). The standard error of chain ladder reserve estimates: recursive calculation and inclusion of a tail factor, ASTIN Bulletin, 29, No.2, 361-366.

14. Mack T (2003). Presentation at ASTIN Colloquium 2003, Berlin.

15. Marker O. J, Mohl F. J (1980). Rating claims-made insurance policies, CAS Discussion Paper Program, 265-304.

16. Merz M, Wüthrich M (2008a). Prediction error of the chain ladder reserving method applied to correlated run off trapezoids, Annals of Actuarial Science, 2(1), 25-50.

17. Merz M, Wüthrich M (2008b). Prediction error of the multivariate chain ladder reserving method, North American Actuarial Journal, 12, No.2, 175-197.

18. Pröhl C, Schmidt K.D (2005). Multivariate chain-ladder, Dresdner Schriften zur Versicherungsmathematik.

19. Quarg G, Mack T (2004). Munich chain ladder, Blätter DGVFM, Band XXVI, 4, 597-630.

20. Schmidt K.D (2006). Optimal and additive loss reserving for dependent lines of business, CAS Forum (fall), 319-351.

21. Srivastava V.K, Giles D.E.A (1987). Seemingly unrelated regression equations models, Marcel Dekker Inc.

22. Theil H (1971). Principles of econometrics, Wiley.

23. Wiser F. R (2001). Loss reserving, Foundations of Casualty Actuarial Science, Casualty Actuarial Society.

24. Zellner A (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, Journal of the American Statistical Association, 57, 348-368.

25. Zellner A, Huang D.S (1962). Further properties of efficient estimators for seemingly unrelated regression equations, International Economic Review, 3, No.3, 300-313.

26. Zellner A (1963). Estimators for seemingly unrelated regressions equations: some exact finite sample results, Journal of the American Statistical Association, 58, 977-992.