
Maastricht University
Faculty of Economics and Business Administration

Model Points for Asset Liability Models

June 28, 2008

Master thesis

Author
N.C. Jansen (i040010)

Supervisors
Dr. J.J.B. de Swart (PwC)
Dr. F.J. Wille (PwC)
Prof. dr. ir. S. van Hoesel (UM)
Prof. dr. R.H.M.A. Kleynen (UM)


Abstract

ALM models are used by insurance companies as input to evaluate their solvency. They give rise to computational complexity because of the number of policies and, mainly, the stochastic element; this stochastic element creates the need for scenarios. Some insurance companies now use data grouping methods to reduce run times, but these lack a theoretical basis. In this paper we develop a theoretical framework to optimize grouping strategies and to derive upper bounds for the inaccuracy caused by grouping. We discuss determinants of the deviations in the projected future cash flows and, building on those, we bound the error resulting from a given grouping strategy. One of the main results of the paper concerns the effect of linear data on the grouping strategy: such attributes can simply be grouped away. To make the method work in practice we rely on grid construction, infinity norms and numerical derivatives. We furthermore apply the method to a real-life insurance product.


Contents

1 Introduction
2 Life insurances
   2.1 ALM models
   2.2 Solvency II
   2.3 Model Points
3 Framework for the model point generation
   3.1 The problem in an Operations Research context
   3.2 Minimizing the error induced by making use of model points
   3.3 Numerical model point creation
      3.3.1 Grid construction
      3.3.2 Determining the amount of buckets
      3.3.3 Creating the model points
      3.3.4 Assigning the policies to the model points
      3.3.5 Estimating the errors
4 A real life example
   4.1 Describing the product
   4.2 Selecting the attributes
   4.3 Exploring the function
   4.4 Interpreting the function
   4.5 Generating model points
   4.6 Results
      4.6.1 Grouping
      4.6.2 Approximate deviation from the base run
5 Conclusion and future work
A Appendix
   A.1 Grid construction
   A.2 Notations
   A.3 Figures
   A.4 Tables


    1 Introduction

Insurance companies have gained increasing interest in Economic Capital, which is used for making optimal economic decisions, risk-based pricing and performance management. In addition, regulators are further developing their capital requirement regulations in the form of Solvency II, with which insurers need to comply to stay in business. Insurance companies use Asset Liability models to project their future cash flows as input for evaluating their solvency position.

Insurers have a portfolio of policies with different attributes, and different scenarios can occur. To project their expected cash flows, they need to calculate the present value of every scenario/policy combination. These projections can take an enormous amount of time given current processing power, so much so that more sophisticated techniques were needed to tackle the problem. Multiple options exist to deal with it, and generally no single option will solve the problem alone. The options are:

    Replicating portfolios

    Software optimization

    Scenario optimization

    Grid technology

    Model Points

Replicating portfolios have found usage in the financial industry to value cash flows that are not actively traded, such as insurance products. Investing in a replicating portfolio will not eliminate all risk, but the remaining risk is non-systematic and can therefore be diversified away. The market value of the replicating portfolio is then used to determine the value of the liability cash flow.

Optimizing the software so that it runs faster is possible; however, since most companies use certified products such as MoSes, it is not our goal to beat these well-known software products.

Scenario optimization is performed by carefully selecting, out of a large set of scenarios, a subset that still accurately predicts future events. In this paper we assume that this has already taken place.

Grid technology spreads the calculation burden of ALM models across a grid of computers. Run times can be reduced by approximately the number of processors available. The drawback, however, is that extra hardware costs money and that the processors cannot concurrently be used for other purposes.

The following observation makes data reduction methods very powerful. Projecting the future cash flows takes a lot of time, yet the model that computes them runs in polynomial time. This means that we can decrease


the input size and achieve a great reduction in the time needed to calculate the present values. With model points, the technique we present in this paper, we try to minimize the input size while remaining accurate within some bounds.

In essence, insurance products depend on policies and scenarios, where the scenarios are generated by a scenario generator that produces up to a certain number of scenarios. We assume that scenario optimization has already taken place; therefore, only the number of policies can be reduced.

Model points are already being used by life insurers and have historically been based on actuarial knowledge. Theoretical foundations are therefore lacking and, as far as our knowledge reaches, no research has investigated the process of creating model points and measuring their inaccuracy as an application for insurers.

Correctly measuring the inaccuracy is an important task. At present, inaccuracy can only be measured by comparing a grouped run to an ungrouped one, the so-called base run. As said previously, a base run takes a lot of time and can therefore not be performed too often. Moreover, the base run is based on a set of policies which may change over time. Current practice is to compare the grouped run at time t0 to the base run at time t0, which is still correct: the inaccuracy can then be measured perfectly. An additional time-consuming task, however, is to validate the model points after time has passed and the portfolio has migrated. Here an error is made by comparing the grouped run at t1 to the ungrouped run at t0. By presenting a recipe based on theoretical foundations we can therefore not only improve processing times, but also estimate the error correctly, without having to perform a base run again.

The remainder of the paper is organized as follows. In Section 2 we explain more about life insurance, ALM and Solvency, and the idea of model points. From there we continue to Section 3, where we create a theoretical framework for the model points: we explain notation, model the problem and find good indicators for grouping and error approximation. We then move on to a real-life product in Section 4, to which we apply our model and present the results. Finally, in Section 5 we make concluding remarks and comment on further research.


    2 Life insurances

Insurers protect the insured against unforeseeable (mostly negative) events that can occur in the future. Of course, this does not come for free: the insured pays a premium, which can be monthly, yearly or at some other frequency. Whenever a certain event happens, the insurer has a liability to fulfil. To calculate the present value of the expected liabilities to which they are exposed at a certain moment in time, insurers use an Asset Liability Model (ALM). ALM models are used to calculate the present value of all future assets and liabilities; in this paper, however, we are only interested in the liability side.

    2.1 ALM models

By using these ALM models, companies can show that they are able to pay out all policies under normal conditions. There exists a minimum amount of financial resources that insurers must hold in order to cover their risks. The rules that stipulate those amounts are bundled under the name Solvency. Their purpose is to ensure the financial soundness of insurance undertakings and, in the end, to protect policyholders and the stability of the financial system as a whole.

Previously, only closed-form ALM models were used, which calculated the present values deterministically. In the last decade the industry has been moving towards a more realistic valuation [12], for which stochastic modeling has proved to be very useful. It makes use of various scenarios, where every scenario represents some future event; these scenarios are produced by a scenario generator. The development of more realistic models does, however, come with a number of drawbacks, most notably very slow run times.

2.2 Solvency II

Recently (July 2007) [14], the arrival of Solvency II was announced, which replaces old requirements and establishes more harmonized requirements across the EU, thereby promoting competitive equality as well as high and more uniform levels of customer protection. It is similar to Basel II, the corresponding regulation in the banking world. Under Solvency II, market-consistent embedded values and other initiatives make it even more important for insurance companies to model on a stochastic basis. Doing so, as already said, has significant implications for the run times of valuing insurance portfolios. However, reporting needs to be done on a frequent basis, which means there are certain time constraints that need to be fulfilled. Since Solvency II is expected to be effective from 2012 [15], there is an enormous interest in methods that can rapidly as well as accurately predict the liabilities to which insurers are exposed [13]. As said in the introduction, model points are a way to accomplish this.


    2.3 Model Points

A model point is an aggregate of policies that should be a good representation of a cluster of policies. But what does good mean? If we compare the present value produced by the original policies with the present value obtained when using our model points, we have an indication of the quality of our model points. An additional criterion is that we do not want too many of them, since otherwise we see no reduction in computation time.

A typical liability model can be decomposed into three components: scenarios, policies and attributes. As already stated, the scenarios are produced by a scenario generator, and we have assumed that they are already optimized. They are therefore assumed to be genuinely different and necessary, which means there is no further grouping to be done there. The attributes are characteristics of the policyholder (age, sex, insured amount, etc.) and product specifics (duration, product, premium, etc.); these are necessary to define the product. Furthermore, the number of attributes of a policy is typically around 50, whereas the number of policies is vastly larger. This means it is the policies that we can group together. This is done in such a way that, if policies resemble one another, we can simply merge them into a new fictional policy which represents the individual policies. This is called a model point.


    3 Framework for the model point generation

    3.1 The problem in an Operations Research context

In this section we will outline the model and provide a theoretical framework for model point creation. We expect as input a data set $X \in \mathbb{R}^{m \times n}$, an $m \times n$ matrix with policy data, where $m$ is the number of policies and $n$ the number of attributes. A single policy is denoted by $x = [x_1\ x_2\ \ldots\ x_n]^T$, a column vector of the attributes. Furthermore, we expect a black-box model $C(x)$ which, for a scenario and a policy $x$, projects the cash flows of this policy to a certain moment in time. A set of scenarios $S$, with $s \in S$, is implicitly assumed in the model.

In the matrix $X$, every element is denoted by $x_{ij} \in X$, $i \in M$ and $j \in N$, where $M$ is the set of policies and $N$ the set of attributes of a policy. The black-box model is described by the function $C(x): \mathbb{R}^n \rightarrow \mathbb{R}$, which calculates the discounted cash flows of every policy $x$. The objective is to minimize the CPU time needed to calculate the present value of the cash flows $C(x)$, subject to the restriction that the quality of the model points is sufficient.

Note that attributes can be of very different types (e.g. dates, amounts, percentages), which are all on different scales. In order to compare them at a later stage, they can be scaled to correct for this: we take the maximum and the minimum of the range of an attribute and divide every attribute value, minus the minimum, by the respective range. For the sake of notation we assume the data to be already scaled.
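In symbols, with the scaled value written $\tilde{x}_{ij}$ (notation introduced here only for illustration), this min-max scaling reads

$$\tilde{x}_{ij} = \frac{x_{ij} - \min_{i' \in M} x_{i'j}}{\max_{i' \in M} x_{i'j} - \min_{i' \in M} x_{i'j}} \in [0, 1], \qquad i \in M,\ j \in N.$$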

Model point creation could be done in two ways: we could simply take a representative sample of the policies and view these as the model points, or we could use a more sophisticated method and carefully construct the model points. We take the approach of carefully constructing the model points, which we will call grouping. This is done by aggregating policies in the $n$-dimensional space $D \subseteq \mathbb{R}^n$.

We will now define our objective function. Let $\bar{X}$ be our set of obtained model point policies and $k \in \bar{M}$ the index of a model point, where $\bar{M}$ is the set of all model point indices. Furthermore, let the number of model point policies be $\bar{m} = |\bar{M}|$. Then, assuming that the time needed to calculate the present value of every policy's cash flow is a constant $t$ minutes,^1 we get the simple objective function:

$$\min\; t \cdot \bar{m} \qquad (1)$$

There are of course limits to the number of policies we can group, since we still need to ensure that our set $\bar{X}$ is a good representation of the original set $X$. In other words, we do not want the loss of accuracy to be bigger than a certain amount $p \in \mathbb{R}$. The next question is how to measure this accuracy loss. In the end, the present value of the cash flows resulting from the use of the model

^1 Note that this need not hold in practice: policies with a longer duration tend to take longer than policies with a short duration.


3.2 Minimizing the error induced by making use of model points

where $\sum_{q=3}^{\infty} \frac{1}{q!} \frac{d^{(q)}C}{dx^{(q)}}$ are the $q$-th order derivative terms.^2 Furthermore, the Jacobian is

$$\frac{dC}{dx} = \left[ \frac{\partial C}{\partial x_1},\ \frac{\partial C}{\partial x_2},\ \ldots,\ \frac{\partial C}{\partial x_n} \right]$$

and the Hessian is

$$H = \frac{d^2C}{dx^2} = \begin{bmatrix} \frac{\partial^2 C}{\partial x_1^2} & \frac{\partial^2 C}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 C}{\partial x_1 \partial x_n} \\ \frac{\partial^2 C}{\partial x_2 \partial x_1} & \frac{\partial^2 C}{\partial x_2^2} & \cdots & \frac{\partial^2 C}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 C}{\partial x_n \partial x_1} & \frac{\partial^2 C}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 C}{\partial x_n^2} \end{bmatrix}$$

If we now sum our equation (3) over all policies we get the following formula:

$$\sum_{i \in M} \left( C(x^k) + \frac{dC(x^k)}{dx}\,(x^i - x^k) + \frac{1}{2}\,(x^i - x^k)^T\, \frac{d^2C(x^k)}{dx^2}\,(x^i - x^k) + \sum_{q=3}^{\infty} \frac{1}{q!} \frac{d^{(q)}C}{dx^{(q)}}(\cdot) \right)$$

Because the Jacobian, the Hessian and the higher order derivatives are identical for every policy $i$, we can take these terms out of the sum.

For simplicity, suppose first that $\frac{1}{q!} \frac{d^{(q)}C}{dx^{(q)}} = 0$ for $q = 2$, which means that we suppose that $C(x)$ is linear.^3 Then the following set of equations holds:

$$\sum_{i \in M} C(x^i) = m\,C(x^k) + \frac{dC(x^k)}{dx} \sum_{i \in M} (x^i - x^k) = m\,C(x^k) + \frac{dC(x^k)}{dx} \Big( \sum_{i \in M} x^i - m\,\frac{\sum_{i \in M} x^i}{m} \Big) = m\,C(x^k) \qquad (4)$$

This states that we do not make any error at all! For grouping this is a really powerful result, because it means that we can use only one bucket for a specific attribute that is linear in the present value of its cash flows. In other words, we can group such an attribute away.
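As a quick numerical sanity check of (4), the following sketch applies the grouping to a synthetic linear function standing in for the black-box model (an illustration only, not the thesis' ALM model):

    import numpy as np

    # Sanity check of result (4): if C(x) is linear, replacing a group of
    # policies by their average (the model point) leaves the total unchanged.
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=4), 2.5            # arbitrary linear coefficients
    C = lambda x: a @ x + b                   # linear cash-flow function

    X = rng.uniform(size=(1000, 4))           # 1000 policies, 4 scaled attributes
    x_k = X.mean(axis=0)                      # one model point: the group average

    total_individual = sum(C(x) for x in X)
    total_grouped = len(X) * C(x_k)
    print(total_individual - total_grouped)   # ~0, up to floating-point noise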

^2 Note that this is a non-standard use of a Taylor representation: if we replace $x - x^k$ by $h$ we recognise the familiar Taylor expression. Letting $h \to 0$ is equivalent to increasing the number of model points.

^3 This implicitly sets all higher order derivatives equal to 0 as well.

Suppose now that $\frac{1}{q!} \frac{d^{(q)}C}{dx^{(q)}} \neq 0$. If we rewrite $C(x)$ accordingly, this gives:


$$\sum_{i \in M} C(x^i) = \sum_{k \in \bar{M}} \sum_{i \in G_k} C(x^i) \overset{(*)}{=} \sum_{k \in \bar{M}} |G_k|\; C\!\left( \begin{bmatrix} \sum_{i \in G_k} x_{i1} / |G_k| \\ \sum_{i \in G_k} x_{i2} / |G_k| \\ \vdots \\ \sum_{i \in G_k} x_{in} / |G_k| \end{bmatrix} \right) + \sum_{k \in \bar{M}} \sum_{i \in G_k} \frac{1}{2}\,(x^i - x^k)^T\, {}^k\!H^{mp}\, (x^i - x^k) = \sum_{k \in \bar{M}} |G_k|\,C(x^k) + \sum_{k \in \bar{M}} \sum_{i \in G_k} \frac{1}{2}\,(x^i - x^k)^T\, {}^k\!H^{mp}\, (x^i - x^k)$$

where ${}^k\!H^{mp}$ is the Hessian evaluated at model point $k$, and in (*) we used our result from (4). This, however, only holds whenever $\frac{1}{q!} \frac{d^{(q)}C}{dx^{(q)}} = 0$ for $q \geq 3$. We will assume from now on that higher order effects are negligible, so that the error we make lies solely in the second order derivatives. Note that we may only consider them negligible whenever the distances $x^i - x^k$, $k \in \bar{M}$, $i \in G_k$, are small.

Considering now a single group $k$, we can express the error made for this group as follows:

$$\epsilon_k = \left| \sum_{i \in G_k} C(x^i) - |G_k|\,C(x^k) \right| = \left| \sum_{i \in G_k} \frac{1}{2}\,(x^i - x^k)^T\, {}^k\!H^{mp}\, (x^i - x^k) \right|$$

Writing out the matrix product entry by entry, for $j_1, j_2 \in N$ we have

$$\epsilon_k = \left| \sum_{i \in G_k} \sum_{j_1 \in N} \sum_{j_2 \in N} \frac{1}{2}\, \frac{\partial^2 C(x^k)}{\partial x_{j_1} \partial x_{j_2}}\, \big( x_{ij_1} - x^k_{j_1} \big) \big( x_{ij_2} - x^k_{j_2} \big) \right| \qquad (5)$$

Suppose now that the cross derivatives are equal to zero, $\frac{\partial^2 C(x^k)}{\partial x_{j_1} \partial x_{j_2}} = 0$ for $j_1 \neq j_2$. Then the error per model point reduces to

$$\epsilon_k = \left| \sum_{i \in G_k} \sum_{j \in N} \frac{1}{2}\, \frac{\partial^2 C(x^k)}{\partial x_j^2}\, \big( x_{ij} - x^k_j \big)^2 \right|$$

This defines our grouping method: if $\frac{\partial^2 C}{\partial x_j^2}$ is large in a certain area, then we need $x_{ij} - x^k_j$ to be small. In other words, we need many groups in those areas where the second order derivatives are large.


But what if the cross derivatives are not equal to zero? By determining a norm of the Hessian $H$ we can still define our grouping method. Let ${}^k\!H^{mp}$ denote the Hessian evaluated at model point $k$. If we were to calculate $\|H\|_2$ we would have to calculate the eigenvalues of $H$ and consequently solve the system of linear equations $(H - \lambda I)u = 0$, where $u$ is the eigenvector corresponding to the eigenvalue $\lambda$. By Cramer's rule, this system has non-trivial solutions if and only if its determinant vanishes, which means that the solutions are given by $\det(H - \lambda I) = 0$, the characteristic equation of $H$. This, however, involves solving a polynomial of order $n$, $p(\lambda) = \sum_{j \in N} (-1)^j S_j \lambda^{n-j}$, where the $S_j$ are the sums of the principal minors. Since no closed-form solutions exist when $n > 4$, one has to resort to root-finding algorithms such as Newton's method. Such algorithms are readily available in packages like Mathematica or Matlab, but they are very costly and not desirable in practice. What we can do instead is replace the 2-norm by the $\infty$-norm, $\|H\|_\infty = \max_{j \in N} \sum_{j' \in N} |H_{jj'}|$,^4 where $j$ indexes the rows of $H$, $j'$ the columns, and $H_{jj'}$ is the element of $H$ in row $j$ and column $j'$. This is an easy calculation and can therefore be done in practice. We can now upper bound the error by

$$\epsilon_k \leq \frac{\|{}^k\!H^{mp}\|_\infty}{2} \sum_{i \in G_k} \sum_{j_1 \in N} \sum_{j_2 \in N} \big| x_{ij_1} - x^k_{j_1} \big|\, \big| x_{ij_2} - x^k_{j_2} \big| \qquad \forall k \in \bar{M} \qquad (6)$$

which states again that we should create more buckets wherever the infinity norm is bigger. We also see that whenever our function $C(x)$ is linear in an attribute $j$, the corresponding row sum is zero and, again, one bucket suffices for that attribute.

^4 If the cross derivatives are 0, each row sum consists of the diagonal element of $H$ only.
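Both $\|H\|_\infty$ and the bound (6) are cheap to evaluate. A minimal sketch (Python with illustrative names; the implementation described in Section 4.5 uses VBA instead):

    import numpy as np

    def inf_norm(H):
        """Infinity norm of a matrix: the maximum absolute row sum."""
        return np.abs(H).sum(axis=1).max()

    def error_bound(H_k, group, x_k):
        """Upper bound (6) on the grouping error of one model point.
        group: |G_k| x n array of the policies in G_k; x_k: their average.
        Uses sum_{j1} sum_{j2} |d_j1||d_j2| = (sum_j |d_j|)^2 per policy."""
        d = np.abs(group - x_k)   # |x_ij - x^k_j| per policy and attribute
        return 0.5 * inf_norm(H_k) * (d.sum(axis=1) ** 2).sum()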

    3.3 Numerical model point creation

Before we move on, we will first address some practical issues with the method of Section 3.2. So far we have assumed that the function $C(x)$ is known, and we have shown that, whenever we know this function, we can differentiate it analytically; we then obtain a grouping strategy as well as an upper bound on the error made, as shown in equation (6). However, we do not always have access to this function. Therefore we also define a numerical way to obtain a grouping strategy.

    3.3.1 Grid construction

To obtain information about the function $C(x)$, we need to explore it over the range of the attributes. In order to do so we create a grid, which we will call our exploration grid to avoid confusion later on, when we describe the creation of the model points in Section 3.3.3. The exploration grid measures the value of $C(x)$ at different values of $x_j$ over the range of every attribute $j$. Constructing such a grid can be a time-consuming task, especially if we want it to be very precise; we discuss this in Appendix A.1.


It is, however, a one-time investment: once we know the landscape in which our function $C(x)$ lives, we do not have to perform this action again. In this paper we assume that we have created a grid in a nested way, as in Appendix A.1. Denote by $L$ the set of grid points and let a single grid point be denoted by $l \in L$; the number of grid points is then $|L|$. The grid points are constructed by defining, per attribute $j$, a number of buckets $b^e_j$, which we call our exploration buckets. The feasible range of attribute $j$ is divided into $b^e_j$ buckets. This implies a division of the space $D$ into an equal number of hypercubes, $|L| = \prod_{j \in N} b^e_j$, which we index by $l \in L$. Then $D_l$ is a hypercube, with $\bigcup_l D_l = D$ and $D_l \cap D_{l'} = \emptyset$ for $l \neq l'$, $l, l' \in L$. In Appendix A.1 we describe two grid construction methods; for more advanced grids we refer to [10], where an adaptive grid is discussed.

If we move back to our Hessian $H$, as discussed in Section 3.2, it now has to be evaluated in $|L|$ points, each point being the center of a hypercube $D_l$. Let us distinguish between those Hessians by introducing the notation ${}^l\!H^e$ for the exploration Hessian evaluated in grid point $l$. Since we need to compute the Hessian numerically with the central difference formula, we need to evaluate the function $C(x)$ three times per entry of ${}^l\!H^e$: $C(x_1, \ldots, x_j, \ldots, x_n)$, $C(x_1, \ldots, x_j + \Delta_j, \ldots, x_n)$ and $C(x_1, \ldots, x_j - \Delta_j, \ldots, x_n)$. Since ${}^l\!H^e$ is a symmetric $n \times n$ matrix, this means we need to call $C(x)$ $3 \cdot \frac{n(n+1)}{2}$ times per grid point, and $|L| \cdot 3 \cdot \frac{n(n+1)}{2}$ times in total. Whenever we construct such a grid this should really be taken into consideration, since the number of function calls becomes very large even for a small number of grid points.
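As a concrete illustration of the three evaluations per entry, here is a minimal Python sketch for the diagonal entries of a numerical Hessian (the interface and names are ours, not the thesis'; the full symmetric Hessian would additionally need the mixed entries):

    import numpy as np

    def hessian_diagonal(C, x, deltas):
        """Central-difference estimate of the diagonal of the Hessian of C
        at x. Each entry uses the three evaluations named above: C(x),
        C(x + D_j e_j) and C(x - D_j e_j); C(x) is computed once and reused."""
        n = len(x)
        c0 = C(x)                    # one shared base evaluation
        diag = np.empty(n)
        for j in range(n):
            e = np.zeros(n)
            e[j] = deltas[j]
            diag[j] = (C(x + e) - 2.0 * c0 + C(x - e)) / deltas[j] ** 2
        return diag                  # 2n + 1 calls to C in total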

    3.3.2 Determining the amount of buckets

Let us now define, for every attribute $j$, the maximum row sum of ${}^l\!H^e$ over all grid points $l \in L$:

$$h^e_j = \max_{l \in L} \sum_{j' \in N} \big| {}^l\!H^e_{jj'} \big|, \qquad j \in N$$

We collect these values in a column vector $h^e \in \mathbb{R}^n$. The most important attribute is the attribute $j^*$ corresponding to the value $\|h^e\|_\infty$; note that this value equals $\max_{l \in L} \|{}^l\!H^e\|_\infty$. The vector $h^e$ defines our grouping structure. First we set the number of buckets for attribute $j^*$; the number of buckets for every other attribute is then determined by its value $h^e_j$, $j \neq j^*$, relative to $h^e_{j^*}$. Let $b^{mp}_j$ denote the number of model point buckets for attribute $j$. Then we calculate $b^{mp}_j$ with the formula

$$b^{mp}_j = b^{mp}_{j^*} \left( \frac{h^e_j}{h^e_{j^*}} \right)^{\alpha}, \qquad j \in N \setminus \{j^*\},$$

where $\alpha \in [0, 1]$ is a parameter that can be chosen to adjust the number of buckets $b^{mp}_j$. Note that the number of buckets $b^{mp}_j$, $j \in N$, can never exceed $b^{mp}_{j^*}$.
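A minimal sketch of this allocation rule, with rounding to at least one bucket added since the text leaves that detail open (function names are illustrative):

    import numpy as np

    def bucket_counts(h_e, b_star, alpha=1.0):
        """Allocate model point buckets per attribute from the importance
        vector h^e; b_star is the bucket count chosen for the most important
        attribute j*, alpha in [0, 1]."""
        ratios = h_e / h_e.max()              # h^e_j / h^e_{j*}
        return np.maximum(1, np.round(b_star * ratios ** alpha)).astype(int)

    # With the h^e values of Table 3 (end date, date of birth, interest,
    # insured amount), b_star = 3 and alpha = 1, this reproduces the
    # 3 / 2 / 1 / 1 allocation used in Section 4.6.1:
    print(bucket_counts(np.array([431257.4164, 305609.0, 1.2607455, 0.098635801]), 3))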

    3.3.3 Creating the model points

The buckets for every attribute $j$ can be spread in various ways over the attribute's range. We could spread them evenly for every attribute $j$, or be somewhat more sophisticated and let the population distribution decide


on the cut-off points. Dividing the range of every attribute $j$ in any such way, we obtain $\prod_{j \in N} b^{mp}_j$ hypercubes in $\mathbb{R}^n$. Note that some of these hypercubes may not contain any policies. We do not create model points in empty sets, which means that the number of model points constructed in this way is $\bar{m} \leq \prod_j b^{mp}_j$, where $k$ is the model point of group $G_k$, just as in Section 3.2. Whenever the number of model points is considered too large, one should decrease $b^{mp}_{j^*}$ and recompute $b^{mp}_j$, $j \neq j^*$, until the desired number of groups is reached.

    3.3.4 Assigning the policies to the model points

The next step is to assign every policy $i$ to a group $G_k$. This is done by checking whether the policy lies within the range of group $G_k$ in every dimension. Once we know, for every policy $i \in M$, its group $i \in G_k$, we can compute the final model point policies $x^k$, $k \in \bar{M}$, as follows:

$$x^k = \left[ \frac{\sum_{i \in G_k} x_{i1}}{|G_k|}\ \ \ldots\ \ \frac{\sum_{i \in G_k} x_{in}}{|G_k|} \right]^T, \qquad k \in \bar{M}$$

A short sketch of this assignment and averaging step is given below.
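The sketch assumes the bucket cut-offs per attribute are already given (a Python illustration of Sections 3.3.3 and 3.3.4, not production code):

    import numpy as np

    def build_model_points(X, edges):
        """Assign each scaled policy to its hypercube and average per group.
        X: m x n policy matrix; edges[j]: sorted interior cut-offs of
        attribute j."""
        n = X.shape[1]
        # Bucket code of every policy in every dimension
        codes = np.stack([np.searchsorted(edges[j], X[:, j])
                          for j in range(n)], axis=1)
        groups = {}
        for i, code in enumerate(map(tuple, codes)):
            groups.setdefault(code, []).append(i)
        # Model point = attribute-wise average over its group G_k; empty
        # hypercubes never appear as keys, matching Section 3.3.3.
        return {code: X[idx].mean(axis=0) for code, idx in groups.items()}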

3.3.5 Estimating the errors

To estimate the errors made by using the model points we can use the bound introduced in (6). However, we then need to evaluate ${}^k\!H^{mp}$, $k \in \bar{M}$, to obtain an exact upper bound on the error made by our grouping strategy. This involves again calling the function $C(x)$, now $\bar{m} \cdot 3 \cdot \frac{n(n+1)}{2}$ times. This is no problem whenever $\bar{m}$ is small (which is what we are trying to achieve anyhow). Whenever $\bar{m}$ is considered too large, however, we can also approximate the upper bound using the values of ${}^l\!H^e$, $l \in L$, which we have already calculated; this costs no extra function calls. One approximation considers the distance from the model point to every grid point, $(x^l - x^k)$, and uses either the maximum or the closest ${}^l\!H^e$. Another, somewhat more sophisticated method uses interpolation techniques as in [1]. We describe an interpolation method that looks at the grid points that are closest in every direction: in dimension $n$ there are $2n$ such grid points, because in every dimension there are two directions in which we can go. These are all incorporated in the estimate of ${}^k\!H^{mp}$ through their relative distance from $x^k$. Let the closest grid points to $k$ be denoted by $l_z \in L$, $z = 1, \ldots, 2n$. Then

$${}^k\!H^{mp} \approx \frac{{}^{l_1}\!H^e \left\| x^{l_1} - x^k \right\| + \ldots + {}^{l_{2n}}\!H^e \left\| x^{l_{2n}} - x^k \right\|}{\sum_{z=1}^{2n} \left\| x^{l_z} - x^k \right\|}$$

Of course, since this is only an indication, it cannot guarantee a certain error bound, but it is still important from a practical point of view.
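The interpolation above fits in a few lines; note, as an aside, that the formula as stated weights by distance, whereas inverse-distance weighting would be the more common interpolation choice (names here are illustrative):

    import numpy as np

    def interpolate_hessian(x_k, neighbours, hessians):
        """Distance-weighted combination of exploration Hessians for model
        point x_k. neighbours: the 2n closest grid points; hessians: the
        corresponding lH^e matrices."""
        d = np.array([np.linalg.norm(x_l - x_k) for x_l in neighbours])
        return sum(H * w for H, w in zip(hessians, d)) / d.sum()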

4 A real life example

This is why we will look at one specific product type. Some attributes take only a few values over their range, which means there is not much grouping to be done on them. There are also many attributes that are calculated by the model, and attributes that are not used for our specific product. We first produce descriptive statistics for some of the attributes in Table 1. From these statistics we can get a broad idea of which attributes might be interesting to group on. We can already identify one on which we will not group: sex, since it has only two possible values, which are integers. After a thorough investigation of the total set of attributes, we concluded that five attributes were the main determinants. All others were either dependent on those five or considered to be critical, which means that they will not be grouped upon. The resulting attributes are:

    1. date of birth

    2. start date

    3. end date

    4. insured amount

    5. interest

    4.3 Exploring the function

Now that we have found our set of interesting variables, we want to measure their influence on the projected cash flow under every scenario. For this we follow the method described in Section 3.3 and construct a grid of virtual policies. They are virtual in the sense that the policies do not come from the data set $X$: they are constructed using the feasible input range of every attribute. This feasible input range is assumed to accompany every program. Our program, however, did not have such a range, so we constructed one from the descriptive statistics of the data set, presented in Table 1; the feasible input range of an attribute is then the difference between its maximum and minimum value.

There exist various options to create such a grid, and we discuss two methods in Appendix A.1. Let us first consider a nested grid. This would require at least $6^n$ function calls when we consider only 2 grid points per attribute (one at its minimum and one at its maximum, each also disturbed with $+\Delta$ and $-\Delta$). For $n = 5$ this is already 7776 calls of $C(x)$. Since a policy takes on average 70 seconds to calculate on our hardware, this would take more than 6 days, which we considered too long. Therefore we use a sequential grid construction: for every attribute we move from its minimum to its maximum in 10 steps and set all other interesting parameters equal to their averages; these are then the virtual policies. We are aware that we lose a lot of accuracy this way, and we can only hope that the averages are a good representative of the whole


function. This also means that we have only one entry in the matrix $H(x^l)$, $l \in L$: the second order partial derivative with respect to the attribute along whose axis we created the grid. This entry is then immediately the $\infty$-norm of the matrix ${}^l\!H^e$, $l \in L$. Before we continue, we should check that the constructed grid points represent feasible combinations of the attributes: no start date may fall after an end date, and the date of birth of a person should lie before the start date of a policy.

After constructing the grid points, we can feed them one by one into the model and obtain the resulting discounted cash flows for every scenario.

    4.4 Interpreting the function

Now that we have projected the cash flows of every virtual policy for 500 scenarios, we first investigate the movement of the cash flows over the ten grid points. We look at scenario 1, which is plotted in Figure 2.

A first step is interpreting these graphs. Starting with the cash flows for the date of birth, we see that the later you are born, the more negative the projected cash flow becomes. This can be explained by the fact that the later you were born, the younger you are now and the longer you are expected to live. The insured amount is built up of two components, the death benefit insured amount and the surrender benefit insured amount. The ratio between the death benefit and the surrender benefit is on average 1:4, which makes the surrender benefit the strongest determinant of the projected cash flows. Once you have died, there is no possibility to surrender anymore; this means that the longer you live, the more surrender benefit you will receive, and possibly the death benefit in addition.

The start date fluctuates a lot, and one can hardly tell what the drivers behind this function are. However, if we look at the scale, it is not an interesting variable at all. We can therefore simply put this attribute in one bucket without making a significant error.

Looking at the end date, we see that when we move the end date to a later moment, the cash flow increases, which means that the liability decreases (the cash flows are all negative). Because we expected the liability to increase when a policy has a longer duration, this result seems counterintuitive. Looking at the actuarial model, however, it is easily explained. By shifting the end date to a later moment, we increase the duration. The investment premium is calculated from the insured amount, the duration and the interest rate; discounting the insured amount at the same interest rate over a longer term results in a lower investment premium. In turn, the investment premium is used to calculate the possible surrender cash flows, which are now of course lower.

Considering the insured amount, we see a linear relationship at the chosen grid points between the projected cash flows and the insured amount. This is only an expectation, however: the function may still fluctuate in between the grid points. Nevertheless, when making the buckets we expect that we can put this attribute in one bucket without making any significant error.


Discounting with a higher interest rate results in a lower investment premium, which in turn causes the discounted cash flows to be lower.

What we can see from these functions is that they are either monotonically increasing or monotonically decreasing, except for the start date, which was not considered important because of its scale. (Had the scale been very large, the number of buckets needed would have been very big, which would cause issues.) All the functions seem to be not far from linear, and we therefore do not expect large errors at all, even if we group them all into one large model point (we discuss this further in Section 4.6).

To estimate the second order partial derivatives for every attribute we use the central difference formula. The graphs are presented in Figure 3. Looking at them, one first of all sees that all second order derivatives are not far from zero, which indicates linearity. Since we do not know the whole cash flow function, a function like that of the end date might fluctuate a little in between; this can simply be seen as numerical noise. Using the recipe presented in Section 3.3, we take the maximum second order partial derivative at every grid point and take the maximum of these to determine the number of buckets per attribute. The results are presented in Table 3, sorted by importance. We see that the end date is the most important attribute. Setting the number of buckets for the end date automatically induces the number of buckets for the other attributes, as defined in Section 3.3.2.

    4.5 Generating model points

Until now we have not used any policy data: the projected cash flow functions were based solely on the feasible ranges of the attributes, as discussed in Section 4.3. The policy data that we will use comes from an existing insurance company; for privacy and computational reasons we have modified the data. We now use these policies for the bucketing of the attributes, by spreading the population evenly over the buckets as described in Section 3.3.2. In Microsoft Visual Basic for Applications (VBA) we developed a script that produces a cumulative distribution function, which gives us the cut-off points for the buckets at multiples of $1/b_j$. Now that we have set our buckets, we need to find out which policies fall in which group. We do this with the method described in Section 3.3.4: again in VBA, we check whether a policy is within the range of every bucket's cut-off points and code every policy with a group number. The model points are then the averages of every attribute over the policies within the group.
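The cumulative-distribution bucketing can be sketched as follows (a Python illustration of the idea; the thesis' own implementation was the VBA script described above):

    import numpy as np

    def cutoff_points(values, b):
        """Equal-population cut-offs at 1/b, 2/b, ... of the empirical CDF."""
        return np.quantile(values, [q / b for q in range(1, b)])

    def assign_groups(values, edges):
        """Code every policy with its bucket number for one attribute."""
        return np.searchsorted(edges, values)

    # Example: 3 buckets for the end date attribute would use the two
    # interior cut-offs cutoff_points(end_dates, 3).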

    4.6 Results

We first produce a base run, in which we calculate the cash flows using all single policies. Our resulting data set has 243 policies. The important statistics we need are the calculation time and the resulting present value per


scenario; we present them in Table 2. As shown in Table 2, calculating the cash flows for all single policies took 160 minutes.^5

^5 All runs were performed on an IBM Lenovo T61 with an Intel T7500 Core 2 Duo 2.2 GHz processor and 2.0 GB of RAM. Microsoft Visual Basic for Applications was run on the Windows XP operating system.

    4.6.1 Grouping

To show our method at work, we explain stepwise what we can achieve. Table 3 is produced as described in Section 3.3.2, from which we see that the end date requires the most groups. We initially give the end date 3 groups, which gives the date of birth 2 groups and the insured amount and the interest rate only 1, according to the formula in Section 3.3.2. We thus go from 243 individual policies to 6 model point policies. However, let us start bottom-up to show the algorithm at work. We start by grouping the insured amount into one model point. Since this attribute looked linear, we do not expect any error, apart from the error made by assuming that the cross derivatives are zero. Grouping this attribute away decreases the number of policies by a factor 3, i.e. from 243 policies to only 81 model point policies. Computation times dropped from over 2.4 hours to only 62 minutes. If we now look at the cash flow deviation, we still have an accuracy of 99.9995%, which is in line with our theory.

As we have seen, we can next eliminate the start date attribute, since it does not contribute much to the discounted cash flow. In doing so we group this attribute away as well and set its value equal to its average. A reduction by a factor 3 is again achieved, i.e. from 81 model point policies to 27. Our accuracy drops slightly, but we are still 99.83% accurate. This run took only 20 minutes.

The interest rate is then grouped into one bucket, which leaves us with only 9 model point policies. Computation time is now only 7 minutes and, quite unexpectedly, the accuracy is only 97.15%, which is still fair for most life insurers, since they aim for an accuracy between 95% and 98%. The result was unexpected given that the interest rate looked linear. Recall, however, that we only gained very local insight, by assuming that the cross derivatives are zero and by having, at every grid point, only the second order partial derivative in one direction. Although the interest rate is nearly linear on this local part of the hyperplane, it fluctuates more elsewhere.

Therefore we check whether perhaps the date of birth should have been next on the list instead of the interest rate. When grouping, in addition to the insured amount and the start date, the date of birth into one model point, we are 98.25% accurate. This is also unexpected, but can be explained by the weakness of our local grid (see Appendix A.1).

Let us, however, continue in the way our algorithm predicted. In the end this leaves us with 6 model points, which are 95.91% accurate; this took only 3.3 minutes. So what would happen if we grouped them all into 1 model point? Doing so still leaves us with 95.72% accuracy. Again we are


confronted with the limited reliability of the local insight of a sequential grid. The function $C(x)$ seems to be very flat and does not fluctuate much over the whole domain. Since insurance companies generally do have trouble grouping policies while maintaining accuracy, such flatness is not typical; it could be due to the chosen product or the restrictions on the input parameters. The product was, however, not very well suited for demonstrating our method.

    4.6.2 Approximate deviation from the base run

For the insured amount we looked at the error predicted when grouping all policies into one model point. Using the interpolation method described in Section 3.3.5, our model estimates an accuracy of 99.999999%, which is higher than the actual accuracy, as we can see in Table 2. Although this does not look too bad compared to the realized 99.9995%, one has to put it in perspective against the total error range we are looking at: the accuracy will never drop below 95.72%, since that is the accuracy obtained when grouping all policies into one model point. Again, we cannot be conclusive; the lack of a good grid disturbs the outcomes.


    5 Conclusion and future work

In this paper we have provided a way to estimate the error induced by the use of model points, a solution to a problem that insurance companies were not even aware of: they estimated the errors of their grouping strategy based on an outdated base run. We have shown how to correctly upper bound these errors without having to calculate the base run again.

Furthermore, we have defined a way in which insurance companies can group their policies. Whenever a linear attribute is encountered, it can be grouped away without making any error at all. If an attribute is non-linear, it should be grouped according to the Hessian in the relevant area of the domain of the function that discounts the cash flows.

Even if the exact analytical function is not available, we have shown numerical ways to group the policies. We made use of grids to explore the function that calculates the present value of the cash flows. These exploration grids do consume a lot of time; however, if one is willing to make the one-time investment of exploring this function, we can upper bound the error made. Moreover, if one does not want to invest too much in the construction of the grid, we can still provide an approximate upper bound.

We have illustrated the method on a simple but real-life example. We have shown the trade-offs and drawbacks of a fast local exploration of the landscape using the sequential grid versus a slow nested grid construction. The somewhat weak results can be attributed to the unfortunate choice of product and the lack of computing power for a good grid.

Improvements can be made by better distributing the buckets over an attribute: they could be distributed depending on the Hessian over the range of the attribute, instead of uniformly. Concerning the grids, a more sophisticated grid, such as an adaptive one, may be of great help. However, if we know the analytical function we may not need the exploration grids at all and we can perfectly upper bound the errors; the exact calculations can be performed with software such as Mathematica or Maple.

There is still a lot of work to be done in this area; however, a first step has been made which can greatly benefit insurance companies.


    References

[1] R. S. Anderssen and M. Hegland. For numerical differentiation, dimensionality can be a blessing. Mathematics of Computation, 68(227):1121–1141, February 1999.

[2] G. K. Gupta, A. Strehl, and J. Ghosh. Distance based clustering of association rules. Department of Electrical and Computer Engineering, 1991.

[3] A. Oosenbrug. Levensverzekering in Nederland. Shaker Publishing, 1999.

[4] G. Nakamura, S. Z. Wang, and Y. B. Wang. Numerical differentiation for the second order derivative of functions with several variables. Mathematics Subject Classification, 1991.

[5] H. U. Gerber. Life Insurance Mathematics. Springer, 1997.

[6] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer-Verlag, 2nd edition, 1991.

[7] J. M. Mulvey and H. P. Crowder. Cluster analysis: An application of Lagrangian relaxation. 1979.

[8] J. M. Mulvey and H. P. Crowder. Impact of similarity measures on web-page clustering. 2000.

[9] P. K. Agarwal and C. M. Procopiuc. Exact and approximation algorithms for clustering. Management Science, 25(4):329–340, 1997.

[10] R. K. Srivastava, D. S. McRae, and M. T. Odman. An adaptive grid algorithm for air-quality modeling. Journal of Computational Physics, 165:437–472, 2000.

[11] V. I. Rotar. Actuarial Models: The Mathematics of Insurance. Chapman & Hall, 2006.

[12] J. Rowland and D. Dullaway. Smart modeling for a stochastic world. Emphasis Magazine, 2004.

[13] M. Sarjeant and S. Morrison. Advances in risk management systems of life insurers. Information Technology, September 2007.

[14] HM Treasury. Solvency II: A new framework for prudential regulation of insurance in the EU. Crown, February 2006.

[15] HM Treasury. Supervising insurance groups under Solvency II: A discussion paper. Crown, February 2006.

[16] G. R. Wood and B. P. Zhang. Estimation of the Lipschitz constant of a function. Journal of Global Optimization, 8(1):91–103, January 1996.


    A Appendix

A.1 Grid construction

When we do not have an analytical model, we rely on numerical methods. To gain insight into our cash flow function in order to later construct the Hessian, we need to know the value of $C(x)$ evaluated at different points. A correct grid can only be constructed when we can isolate an attribute: we fix all attributes at a value except for one, which we let increase over its range in a certain number of steps depending on the chosen grid size. We can do this for all attributes. We do, however, need to check that the policies generated in this fashion are feasible: no end date can fall before a start date, etc.^6 There are many ways to construct such a grid, see [10], but we present two: sequential and nested grid construction. A nested grid construction considers all possible combinations of the attribute values. A sequential grid construction, on the contrary, calculates a grid per attribute, while fixing the other attributes at a certain value. The set obtained by calculating sequential grid points is therefore a subset of the nested version. Although the nested version is far more accurate, its computational complexity is overwhelming [1].

^6 If we let the grid size go to infinity, we obtain the exact $n$-dimensional landscape.


Algorithm A.1: Sequential grid construction(C(x))

    Δ_j = [0 ... 0 Δ_j 0 ... 0]^T
    for r ← 0 to |R| do
        x_{r,1} ← min_i x_{i1} + r · (max_i x_{i1} − min_i x_{i1}) / |R|
        x_{r,2} ← x̄_2, x_{r,3} ← x̄_3, ..., x_{r,n} ← x̄_n
        evaluate C(x_r), C(x_r + Δ_1), C(x_r − Δ_1)
    for r ← 0 to |R| do
        x_{r,2} ← min_i x_{i2} + r · (max_i x_{i2} − min_i x_{i2}) / |R|
        x_{r,1} ← x̄_1, x_{r,3} ← x̄_3, ..., x_{r,n} ← x̄_n
        evaluate C(x_r), C(x_r + Δ_2), C(x_r − Δ_2)
    ...
    for r ← 0 to |R| do
        x_{r,n} ← min_i x_{in} + r · (max_i x_{in} − min_i x_{in}) / |R|
        x_{r,1} ← x̄_1, x_{r,2} ← x̄_2, ..., x_{r,n−1} ← x̄_{n−1}
        evaluate C(x_r), C(x_r + Δ_n), C(x_r − Δ_n)

As an illustration, let $R$ be the set of buckets for every attribute $j$, with $r \in R$ a bucket, and consider constructing $|R|$ buckets for every attribute. The nested construction evaluates $|R|^n (1 + 2n)$ policies in total, which is exponential in the number of attributes. Suppose the time to compute the cash flow of one policy is $t$ seconds; the nested grid construction then takes $t \cdot |R|^n (1 + 2n)$ seconds. To visualize this, consider 4 attributes with 10 grid points each and a computation time of 70 seconds per policy: this would take about 72 days to compute. For the purpose of this paper this was considered too long, and we use the sequential grid, which takes $3 \cdot n \cdot t \cdot |R|$ seconds and is thus a factor $\frac{|R|^{n-1}(1 + 2n)}{3n}$ faster.

For the sequential version we need to choose a point at which to fix the other attributes. For now we fix them at their averages, but we are well aware that this might not be the correct choice. In addition, the few points created in the sequential loops only give very local insight into the true $n$-dimensional space.

Algorithm A.2: Nested grid construction(C(x))

    for r_1 ← 0 to |R| do
        x_{r_1,1} ← min_i x_{i1} + r_1 · (max_i x_{i1} − min_i x_{i1}) / |R|
        for r_2 ← 0 to |R| do
            x_{r_2,2} ← min_i x_{i2} + r_2 · (max_i x_{i2} − min_i x_{i2}) / |R|
            ...
            for r_n ← 0 to |R| do
                x_{r_n,n} ← min_i x_{in} + r_n · (max_i x_{in} − min_i x_{in}) / |R|
                evaluate C(x_{r_1,1}, x_{r_2,2}, ..., x_{r_n,n})
                evaluate C(x_{r_1,1} + Δ_1, ..., x_{r_n,n}), ..., C(x_{r_1,1}, ..., x_{r_n,n} + Δ_n)
                evaluate C(x_{r_1,1} − Δ_1, ..., x_{r_n,n}), ..., C(x_{r_1,1}, ..., x_{r_n,n} − Δ_n)

Faster hardware or splitting the workload over multiple processing units could help speed up the computation times for the grid construction; however, this does not reduce the exponent $n$, which is the main determinant of the long run time. The bottom line is that one should really make use of adaptive grids.
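For completeness, a runnable sketch of Algorithm A.1 (with an illustrative interface: lo, hi and avg stand for the per-attribute minima, maxima and averages):

    import numpy as np

    def sequential_grid(C, lo, hi, avg, n_steps, delta):
        """Sequential grid construction (Algorithm A.1) as a Python sketch.
        For each attribute j, sweep from lo[j] to hi[j] in n_steps steps
        while fixing the other attributes at their averages, and evaluate C
        at x, x + delta*e_j and x - delta*e_j for the central differences.
        Cost: 3 * n * (n_steps + 1) calls to C, versus the exponential
        |R|^n * (1 + 2n) of the nested version."""
        n = len(avg)
        results = {}
        for j in range(n):
            e = np.zeros(n)
            e[j] = delta
            for r in range(n_steps + 1):
                x = avg.copy()
                x[j] = lo[j] + r * (hi[j] - lo[j]) / n_steps
                results[(j, r)] = (C(x), C(x + e), C(x - e))
        return results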


    A.2 Notations

1. $D = [0, 1]^n$: the $n$-dimensional space in which the scaled function $C(x)$ lives

2. $M = \{1, \ldots, m\}$: the policies, where $i \in M$ is a policy

3. $\bar{M} = \{1, \ldots, \bar{m}\}$: the model point policies, where $k \in \bar{M}$ is a model point

4. $N = \{1, \ldots, n\}$: the policy attributes, where $j \in N$ is an attribute

5. $L$: the grid points, where $l \in L$ is a grid point

6. $G_k$: the group of policies represented by model point $k$; $G_k \subseteq M$, $G_k \cap G_{k'} = \emptyset$, $\bigcup_k G_k = M$, $k, k' \in \bar{M}$

7. $x \in \mathbb{R}^n$: a generic policy consisting of $n$ attributes

8. $x^i \in \mathbb{R}^n$: a specific policy $i$

9. $x^k \in \mathbb{R}^n$: model point policy $k$

10. $x_{ij} \in [0, 1]$: the value of attribute $j$ for policy $i$

11. $C(x): \mathbb{R}^n \rightarrow \mathbb{R}$: the function which discounts the future cash flows of policy $x$

12. $b^e_j$: the number of buckets for attribute $j$ in the exploration grid

13. $b^{mp}_j$: the number of buckets for attribute $j$ in the model point grid; $\prod_{j \in N} b^{mp}_j \geq \bar{m}$

14. $H \in \mathbb{R}^{n \times n}$: the general Hessian

15. $H_j \in \mathbb{R}^n$: the $j$-th row of $H$

16. $H_{jj'} \in \mathbb{R}$: the element at row $j$ and column $j'$ of the Hessian

17. $\|H\|_\infty \in \mathbb{R}$: the infinity norm of the Hessian

18. ${}^l\!H^e \in \mathbb{R}^{n \times n}$: the Hessian evaluated in exploration grid point $l$

19. ${}^k\!H^{mp} \in \mathbb{R}^{n \times n}$: the Hessian evaluated in model point $k$

20. $h^e_j = \max_{l \in L} \sum_{j' \in N} |{}^l\!H^e_{jj'}| \in \mathbb{R}$: the maximum row sum over all exploration grid points $l$, for attribute $j$

21. $h^e = [h^e_1\ h^e_2\ \ldots\ h^e_n]^T \in \mathbb{R}^n$

22. $\|h^e\|_\infty = \|H\|_\infty \in \mathbb{R}$: the maximum value in the vector $h^e$


    A.3 Figures

[Figure omitted: flowchart in which the individual policies feed both the model point generator (producing grouped policies) and C(X); the grouped cash flows are compared with the base-run cash flows to estimate the error.]

Figure 1: An overview of the structure


[Figure omitted: line plots of the projected cash flows at the ten grid points under scenario 1, one panel per attribute: date of birth, start date, end date, insured amount and interest.]

Figure 2: Cash flows for 5 different attributes


[Figure omitted: line plots of the estimated second order partial derivatives at the ten grid points, one panel per attribute: date of birth, end date, insured amount and interest rate.]

Figure 3: 2nd order partial derivative for 4 different attributes


    A.4 Tables

Parameter       | Nr. of different values | Min       | Max        | Average
date of birth   | 2137                    | 16-8-1933 | 31-12-1975 | 26-11-1952
sex             | 2                       | 0         | 1          | -
start date      | 1156                    | 5-4-1995  | 23-6-2007  | 36679
end date        | 1469                    | 22-7-2009 | 21-6-2037  | 43254
insured amount  | 2505                    | 1086      | 213292     | 2911885
interest rate   | 1327                    | 0         | 0.08552    | 0.065022

Table 1: Descriptive statistics

Accuracy (%) | Time (s) | Policies | Buckets: date of birth / start date / end date / insured amount / interest
100          | 9655     | 243      | - / - / - / - / -
99.9995      | 3718     | 81       | - / - / - / 1 / -
99.83        | 1194     | 27       | - / 1 / - / 1 / -
97.19        | 415      | 9        | - / 1 / - / 1 / 1
98.25        | 285      | 9        | 1 / 1 / - / 1 / -
95.9153      | 200      | 6        | 2 / 1 / 3 / 1 / 1
95.72        | 9        | 1        | 1 / 1 / 1 / 1 / 1

Table 2: Results (a dash means the attribute was left ungrouped)

Attribute       | h^e
end date        | 431257.4164
date of birth   | 305609
interest        | 1.2607455
insured amount  | 0.098635801

Table 3: Maximal second order partial derivatives (scaled)