
P. Embrechts, C. Klüppelberg, T. Mikosch
Modelling Extremal Events for Insurance and Finance
December 9, 1996

Springer-Verlag
Berlin Heidelberg New York London Paris Tokyo Hong Kong Barcelona Budapest

Preface

6 Statistical Methods for Extremal Events

6.1 Introduction

In the previous chapters we have introduced a multitude of probabilistic models in order to describe, in a mathematically sound way, extremal events in the one-dimensional case. The real world however often informs us about such events through statistical data: major insurance claims, flood levels of rivers, large decreases (or indeed increases) of stock market values over a certain period of time, extreme levels of environmental indicators such as ozone or carbon monoxide, wind-speed values at a certain site, wave heights during a storm or maximal and minimal performance values of a portfolio. All these, and indeed many more examples, have in common that they concern questions about extreme values of some underlying set of data. At this point it would be utterly foolish (and indeed very wrong) to say that all such problems can be cast into one or the other probabilistic model treated so far: this is definitely not the case! Applied mathematical (including statistical) modelling is all about trying to offer the applied researcher (the finance expert, the insurer, the environmentalist, the biologist, the hydrologist, the risk manager, ...) the necessary set of tools in order to deduce scientifically sound conclusions from data. It is however also very much about reporting correctly: the data have to be presented in a clear and objective way, precise questions have to be formulated, model-based answers given, always stressing the underlying assumptions. The whole process constitutes an art: statistical theory plays only a relatively small, though crucial role here.

The previous chapters have given us a whole battery of techniques with which to formulate in a mathematically precise way the basic questions underlying extreme value theory. This chapter aims at going one step further: based on data, we shall present statistical tools allowing us to link questions asked in practice to a particular (though often non-unique) probabilistic model. Our treatment as regards these statistical tools will definitely not be complete, though we hope it will be representative of current statistical methodology in this fast-expanding area. The reader will meet data, basic descriptive methods, and techniques from mathematical statistics concerning estimation and testing in extreme value models. We have tried to keep the technical level of the chapter down: the reader who has struggled through Chapter 5 on point processes may well be relieved! At the same time, chapters like the one on point processes are there to show how modern probability theory is capable of handling fairly complicated but realistic models. The real expert on Extremal Event Modelling will definitely have to master both "extremes".

After the mathematical theory of maxima, order statistics and heavy-tailed distributions presented in the previous chapters, we now turn to the crucial question:

How do extreme values manifest themselves in real data?

A full answer to this question would not only take most of the present chapter, one could write whole volumes on it. Let us start by seeing how in practice extremes in data manifest themselves. We do this through a series of partly hypothetical, partly real examples. At a later stage in the chapter, we will come back to some of the examples for a more detailed analysis.

Example 6.1.1 (River Nidd data)
A standard data-set in extreme value theory concerns flows of the river Nidd in Yorkshire, England; the source of the data is the Flood Studies Report NERC [472]. We are grateful to Richard Smith for having provided us with a copy of the data. The basic set contains 154 observations on flow data above 65 CUMECS over the 35-year period 1934-1970. A crude de-clustering technique was used by the hydrologists to prepare these data. Though the full set contains a series of values for each year, for a first analysis only the annual maxima are considered. In this way, intra-year dependencies are avoided and a valid assumption may be to suppose that the data $x_1, \ldots, x_{35}$ are realisations from a sequence $X_1, \ldots, X_{35}$ of iid rvs all with common extreme value distribution $H$ say. Suppose we want to answer questions like:

- What is the probability that the maximum flow for the next year will exceed a level $x$?
- What is the probability that the maximum flow for the next year exceeds all previous levels?
- What is the expected length of time (in years say) before the occurrence of a specific high quantity of flow?

Clearly, a crucial step forward in answering these questions would be our gaining knowledge of the df $H$. The theory of Chapter 3 gives us relevant parametric models for $H$; see the Fisher-Tippett theorem (Theorem 3.2.3) where the extreme value distributions enter. Standard statistical tools such as maximum likelihood estimation (MLE) are available. □

Figure 6.1.2 The river Nidd data 1934-1970 (top) and the corresponding annual maxima (bottom). The data are measured in CUMECS.

Figure 6.1.4 4 580 claims from a fire insurance portfolio. The values are multiples of 1 000 SFr. The corresponding histogram of the claims $\le$ 5 000 SFr (left) and of the remaining claims exceeding 5 000 SFr (right). The data are very skewed to the right. The x-axis of the histogram on the rhs reaches up to 250 due to a very large claim around 225; see also the top figure.

Example 6.1.3 (Insurance claims)
Suppose our data consist of fire insurance claims $x_1, \ldots, x_n$ over a specified period of time in a well-defined portfolio, as for instance presented in Figure 6.1.4. Depending on the type of fire causing the specific claims, a condition of the type "$x_1, \ldots, x_n$ come from an iid sample $X_1, \ldots, X_n$ with df $F$" may or may not be justified. Suppose for the sake of argument that the underlying portfolio is such that the above assumption can be made. Questions we want to answer (or tasks we want to perform) could be:

- Calculate next year's premium volume needed in order to cover, with sufficiently high probability, future losses in this portfolio.
- What is the probable-maximum-loss of this portfolio if the latter is defined as a high (for instance the 0.999-) quantile of the df $F$?
- Given that we want to write an excess-of-loss cover (see Example 8.7.4) with priority $a_k$ (also referred to as attachment point) resulting in a one-in-$k$-year event, how do we calculate $a_k$? The latter means that we want to calculate $a_k$ so that the probability of exceeding $a_k$ equals $1/k$.
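Once a df is available, the questions raised in Examples 6.1.1 and 6.1.3 reduce to elementary tail calculations. The following Python sketch illustrates this under a purely illustrative assumption: the relevant df is taken to be a Gumbel df $\Lambda((x-\mu)/\psi)$ with known parameters. The parameter values and all function names are hypothetical; they are not estimates from the Nidd or the fire insurance data. The sketch uses two standard facts for iid observations with continuous df $H$: the number of years until the first exceedance of a level $x$ is geometric with mean $1/(1-H(x))$, and the probability that the next observation exceeds all $n$ previous ones is $1/(n+1)$.

```python
import math


def gumbel_cdf(x, mu, psi):
    """Gumbel df with location mu and scale psi: exp(-exp(-(x - mu) / psi))."""
    return math.exp(-math.exp(-(x - mu) / psi))


def gumbel_quantile(p, mu, psi):
    """Inverse of gumbel_cdf for 0 < p < 1."""
    return mu - psi * math.log(-math.log(p))


def exceedance_probability(x, mu, psi):
    """P(next year's maximum exceeds the level x)."""
    return 1.0 - gumbel_cdf(x, mu, psi)


def expected_waiting_time(x, mu, psi):
    """Mean number of years until the level x is first exceeded (geometric law)."""
    return 1.0 / exceedance_probability(x, mu, psi)


def record_probability(n):
    """P(next observation exceeds all n previous ones), iid continuous case."""
    return 1.0 / (n + 1)


def one_in_k_year_level(k, mu, psi):
    """Level a_k with exceedance probability 1/k, i.e. the (1 - 1/k)-quantile."""
    return gumbel_quantile(1.0 - 1.0 / k, mu, psi)


if __name__ == "__main__":
    mu, psi = 100.0, 40.0  # hypothetical location and scale, illustration only
    a_100 = one_in_k_year_level(100, mu, psi)
    print("one-in-100-year level:", a_100)
    print("exceedance probability at that level:", exceedance_probability(a_100, mu, psi))
    print("expected waiting time (years):", expected_waiting_time(a_100, mu, psi))
    print("record probability after 35 years:", record_probability(35))
```

The record probability does not depend on the df at all, whereas the other quantities are only as good as the fitted model, in particular for levels outside the range of the data.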

Again, as in the previous example, we are faced with a standard statistical fitting problem. The main difference is that in this case we do not immediately have a specific parametric model (such as the extreme value distributions in Example 6.1.1) in mind. We first have to learn about the data:

- Is $F$ light- or heavy-tailed?
- What are its further shape properties: skewed, flat, unimodal, ...?

In the heavy-tailed case fitting by a subexponential distribution (see Chapter 1 and Appendix A3.2) might be called for. The method of exceedances from Section 6.5 will be relevant. □

Example 6.1.5 (ECOMOR reinsurance)
The ECOMOR reinsurance contract stands for "Le Traité d'Excédent du Coût Moyen Relatif" and was introduced by the French actuary Thépaut [615] as a novel contract aiming to enlarge the reinsurer's flexibility in constructing customised products. A precise mathematical description is given in Example 8.7.7. Suppose over a period $[0, t]$, the claims $x_1, \ldots, x_{n(t)}$ are received by the primary insurer. The ECOMOR contract binds the reinsurer to cover (for a specific premium) the excesses above the $k$th largest claim. This leads us to a model where $X_1, \ldots, X_{N(t)}$ are (conditionally) iid with specific model assumptions on the underlying df $F$ of $X_i$ and on the counting process $(N(t))$; see Chapter 1. The relevant theory underlying the ECOMOR contracts, i.e. the distributional properties of the $k$ largest order statistics $X_{1,N(t)}, \ldots, X_{k,N(t)}$ from a randomly indexed ordered sample, was given in Section 4.3. Standard models are hence at our disposal. It is perhaps worthwhile to stress that, though innovative in nature, ECOMOR never was a success. □

Example 6.1.6 (Value-at-Risk)
Suppose a financial portfolio consists of a number of underlying assets (bonds, stocks, derivatives, ...), all having individual (though correlated) values at any time $t$. Every asset has its specific Profit-Loss (P&L) distribution, which can be represented as a probability distribution governing the (random) changes of value. Through the estimation of portfolio covariances, the portfolio manager then estimates the overall portfolio P&L distribution. Management and regulators may now be interested in setting "minimal requirements" or, for the sake of argument, a maximal limit on the potential losses. A possible quantity is the so-called Value-at-Risk (VaR) measure briefly treated in the discussion of Figure 4 of the Reader Guidelines. There the VaR is defined as the 5% quantile of the P&L distribution. The following questions are relevant.

- Estimate the VaR for a given portfolio.

- Estimate the probability that, given we exceed the VaR, we exceed it by a certain amount. This corresponds to the calculation of the so-called shortfall distribution.

The first question concerns quantile estimation for an estimated df, in many cases outside the range of our data. The second question obviously concerns the estimation of the excess df as defined in Section 3.4 (modulo a change of sign: we are talking about losses!). The theory presented in the latter section advocates the use of the generalised Pareto distribution as a natural parametric model in this case. □

Figure 6.1.7 Daily log-returns of BMW share prices for the period January 2, 1973 - July 23, 1996 ($n$ = 6 146), together with a histogram of the data.

Example 6.1.8 (Fighting the arch-enemy with mathematics)
The above heading is the actual title of an interesting paper by de Haan [290] on the famous Dutch dyke project following the disastrous flooding of parts of the Dutch provinces of Holland and Zeeland on February 1, 1953, killing over 1 800 people. In it, de Haan gives an account of the theoretical and applied work done in connection with the problem of how to determine a safe height for the sea dykes in the Netherlands. More than with any other event, the resulting work by Dutch mathematicians under van Dantzig gave the statistical methodology of extremal events a decisive push. The statistical analyses also made a considerable contribution to the final decision making about the dyke heights. The problem faced was the following: given a small number $p$ (in the range of $10^{-4}$ to $10^{-3}$), determine the height of the sea dykes such that the probability that there is a flood in a given year equals $p$. Again, we are confronted with a quantile estimation problem. From the data available, it was clear that one needed estimates well outside the range of the data. The seawater level in the Netherlands is typically measured in (N.A.P. + $x$) m (N.A.P. = Normaal Amsterdams Peil, the Dutch reference level corresponding to mean sea level). The 1953 flood was caused by a (N.A.P. + 3.85) m surge, whereas historical accounts estimate a (N.A.P. + 4) m for the 1570 flood, the worst recorded. The van Dantzig report estimated the $(1 - 10^{-4})$-quantile as (N.A.P. + 5.14) m for the annual maximum. That is, the one-in-ten-thousand-year surge height is estimated as (N.A.P. + 5.14) m. We urge all interested in extreme value statistics to read de Haan [290]. □

Many more examples with an increasing degree of complexity could have been given including:

- non-stationarity (seasonality, trends),
- sparse data,
- multivariate observations,
- infinite-dimensional data (for instance continuously monitored processes).

The literature cited throughout the book contains a multitude of examples. Besides the work mentioned already by Smith on the river Nidd and de Haan's paper on the dyke project, we call the following papers to the reader's attention:

- Rootzén and Tajvidi [547] where a careful analysis of Swedish wind storm losses (i.e. insurance data) is given. Besides the use of standard methodology (fitting of generalised extreme value and Pareto distributions), problems concerning trend analysis enter, together with a covariate analysis looking at the potential influence from numerous environmental factors.
- Resnick [527] considers heavy tail modelling in a huge data-set ($n \approx 50\,000$) in the field of the teletraffic industry. Besides giving a very readable and thought provoking review of some of the classical methods, extremes in time series models are specifically addressed. See also Sections 5.5 and 8.4.
- Smith [588] applies extreme value theory to the study of ozone in Houston, Texas. A key question concerns the detection of a possible trend in ground-level ozone. Such a study is particularly interesting as air-quality standards are often formulated in terms of the highest level of permitted emissions.

The above papers are not only written by masters at their trade (de Haan, Resnick, Rootzén, Smith), they also cover a variety of fields (hydrology, insurance, electrical engineering, environmental research).

Within the context of finance, numerous papers analysing specific data are being published; see Figure 6.1.7 for a typical example of financial return data. A paper which uses up-to-date statistical methodology on extremes is for instance Danielson and de Vries [151] where models for high frequency foreign exchange recordings are treated. See also Müller et al. [465] for more background on the data. Interesting case studies are also to be found in Barnett and Turkman [52], Falk, Hüsler and Reiss [222], and Longin [424]. The latter paper analyses US stock market data.

We hope that the examples above have singled out a series of problems. We now want to present their statistical solutions. There is no way in which we can achieve completeness concerning the statistical models now understood: the definitive book on this still awaits the writing. A formidable task indeed!

The following sections should offer the reader both hands-on experience of some basic methods, as well as a survival kit to get him/her safely through the "jungle of papers on extreme value statistics". The outcome should be a better understanding of those basic methods, together with a clear(er) overview of where the field is heading to. This chapter should also be a guide on where to look for further help on specific problems at hand.

Of the more modern textbooks containing a fair amount of statistical techniques we would like to single out Falk et al. [222] and Reiss [521]. The latter book also contains a large amount of historical notes. It always pays to go back to the early papers and books written by the old masters, and the annotated references in Reiss [521] could be your guide. However, whatever you decide to read, don't miss out on Gumbel [286]!

6.2 Exploratory Data Analysis for Extremes

One of the reasons why Gumbel's book [286] is such a feast to read is its inclusion of roughly 100 graphs and 50 tables. The author very much stresses the importance of looking at data before engaging in a detailed statistical analysis. In our age of nearly unlimited computing power this graphical data exploration is becoming increasingly important. The reader interested in some recent developments in this area may for instance consult Chambers et al. [106], Cleveland [119] or Tufts [621]. In the sections to follow we discuss some of the more useful graphical methods.

6.2.1 Probability and Quantile Plots

Given a set of data to be analysed, one usually starts with a histogram, one or more box-plots, a plot of the empirical df, in the multi-dimensional case a scatterplot or a so-called draughtsman's display which combines all $2 \times 2$ scatterplots in a graphical matrix form. Keeping to the main theme of the book, we restrict ourselves however to the one-dimensional case and start with a discussion of the problem:

Find a df $F$ which is a good model for the iid data $X, X_1, \ldots, X_n$.

Figure 6.2.1 QQ-plot of exponentially (a), uniformly (b), lognormally (c) distributed simulated data versus the exponential distribution. In (d) a QQ-plot of $t_4$-distributed data versus the standard normal distribution is given.

Figure 6.2.2 QQ-plots: (a) Gumbel distributed simulated data versus Gumbel distribution. GEV distributed data with parameters (b): $\xi = 0.3$, (c): $\xi = -0.3$, (d): $\xi = 0.7$, versus Gumbel. The values $\xi = 0.7$ and $\xi = 0.3$ are chosen so that $\alpha = 1/\xi$ either belongs to the range $(1, 2)$ (typically encountered for insurance data) or $(3, 4)$ (corresponding to many examples in finance).
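Plots of the kind shown in Figures 6.2.1 and 6.2.2 are easy to produce. The following Python sketch is one possible implementation and not the code used for the figures. It computes the points of a QQ-plot of a sample against the standard Gumbel df, using the plotting positions $p_{k,n} = (n-k+0.5)/n$ and the Gumbel quantile function $-\ln(-\ln p)$, and then regresses the ordered data on the reference quantiles, so that slope and intercept give rough estimates of scale and location. The function names, the simulated sample and its parameters (location 2, scale 3) are assumptions made for illustration.

```python
import math
import random


def plotting_positions(n):
    """p_{k,n} = (n - k + 0.5) / n for k = 1..n, where k = 1 is the largest value."""
    return [(n - k + 0.5) / n for k in range(1, n + 1)]


def gumbel_quantile(p):
    """Quantile function of the standard Gumbel df exp(-exp(-x))."""
    return -math.log(-math.log(p))


def qq_points(sample, quantile_fn):
    """Pairs (X_{k,n}, quantile_fn(p_{k,n})), k = 1..n, ordered from the largest down."""
    ordered = sorted(sample, reverse=True)
    return [(x, quantile_fn(p))
            for x, p in zip(ordered, plotting_positions(len(ordered)))]


def least_squares_line(points):
    """Regress the data x on the reference quantile q: x ~ intercept + slope * q."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    q_bar = sum(q for _, q in points) / n
    s_qq = sum((q - q_bar) ** 2 for _, q in points)
    s_qx = sum((q - q_bar) * (x - x_bar) for x, q in points)
    slope = s_qx / s_qq
    return x_bar - slope * q_bar, slope


def simulate_gumbel(n, mu, psi, seed=0):
    """Simulate n Gumbel(mu, psi) values by inversion (hypothetical test data)."""
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        u = rng.random()
        if u > 0.0:  # skip u = 0 to avoid log(0)
            values.append(mu + psi * gumbel_quantile(u))
    return values


if __name__ == "__main__":
    data = simulate_gumbel(1000, mu=2.0, psi=3.0)
    intercept, slope = least_squares_line(qq_points(data, gumbel_quantile))
    print("location estimate (intercept):", intercept)
    print("scale estimate (slope):", slope)
```

Plotting the pairs returned by `qq_points` with any graphics package gives the QQ-plot itself; the regression line is only a quick first guess of location and scale, not a replacement for proper estimation.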

Define the ordered sample $X_{n,n} \le \cdots \le X_{1,n}$. The theoretical basis that underlies probability plots is the quantile transformation of Lemma 4.1.9, which implies that for $F$ continuous, the rvs $U_i = F(X_i)$, for $i = 1, \ldots, n$, are iid uniform on $(0, 1)$. Moreover,
$$\left(F(X_{k,n})\right)_{k=1,\ldots,n} \stackrel{d}{=} \left(U_{k,n}\right)_{k=1,\ldots,n}.$$
From this it follows that
$$E F(X_{k,n}) = \frac{n-k+1}{n+1}, \quad k = 1, \ldots, n.$$
Also note that $F_n(X_{k,n}) = (n-k+1)/n$, where $F_n$ stands for the empirical df of $F$. The graph
$$\left\{ \left( F(X_{k,n}),\ \frac{n-k+1}{n+1} \right) : k = 1, \ldots, n \right\}$$
is called a probability plot (PP-plot). More common however is to plot the graph
$$\left\{ \left( X_{k,n},\ F^{\leftarrow}\!\left( \frac{n-k+1}{n+1} \right) \right) : k = 1, \ldots, n \right\} \tag{6.1}$$
standardly referred to as the quantile plot (QQ-plot). In both cases, the approximate linearity of the plot is justified by the Glivenko-Cantelli theorem; see Example 2.1.4. The theory of weak convergence of empirical processes forms the basis for the construction of confidence bands around the graphs, leading to hypothesis testing. We refrain from entering into details here; see for instance Shorack and Wellner [573], p. 247.

There exist various variants of (6.1) of the type
$$\left\{ \left( X_{k,n},\ F^{\leftarrow}(p_{k,n}) \right) : k = 1, \ldots, n \right\}, \tag{6.2}$$
where $p_{k,n}$ is a certain plotting position. Typical choices are
$$p_{k,n} = \frac{n-k+\delta_k}{n+\gamma_k},$$
with $(\delta_k, \gamma_k)$ appropriately chosen allowing for some continuity correction. We shall mostly take (6.1) or (6.2) with
$$p_{k,n} = \frac{n-k+0.5}{n}.$$
For a Gumbel distribution
$$\Lambda(x) = \exp\left\{-e^{-x}\right\}, \quad x \in \mathbb{R},$$

the method is easily applied and leads to so-called double logarithmic plotting. Assume for instance that we want to test whether the sample $X_1, \ldots, X_n$ comes from $\Lambda$. To this end, we take the ordered sample and plot $X_{k,n}$ (more precisely the $k$th largest observation $x_{k,n}$) against $\Lambda^{\leftarrow}(p_{k,n}) = -\ln(-\ln p_{k,n})$, where $p_{k,n}$ is a plotting position as discussed above. If the Gumbel distribution provides a good fit to our data, then this QQ-plot should look roughly linear; see Figure 6.2.2(a).

Mostly, however, the data would be tested against a location-scale family $F((\cdot - \mu)/\psi)$ where in some cases (for instance when $F = \Phi$ standard normal) $\mu$ and $\psi$ are the mean and standard deviation of $X$. A QQ-plot using $F$ would still be linear, however with slope $\psi$ and intercept $\mu$. Using linear regression for instance, a quick estimate of both parameters can be deduced.

In summary, the main merits of QQ-plots stem from the following properties, taken from Chambers [105]; see also Barnett [50], Castillo [101], Section 6.2.1, David [153], Section 7.8, and Gnanadesikan [261].

(a) Comparison of distributions. If the data were generated from a random sample of the reference distribution, the plot should look roughly linear. This remains true if the data come from a linear transformation of the distribution.
(b) Outliers. If one or a few of the data values are contaminated by gross error or for any reason are markedly different in value from the remaining values, the latter being more or less distributed like the reference distribution, the outlying points may be easily identified on the plot.
(c) Location and scale. Because a change of one of the distributions by a linear transformation simply transforms the plot by the same transformation, one may estimate graphically (through the intercept and slope) location and scale parameters for a sample of data, on the assumption that the data come from the reference distribution.
(d) Shape. Some difference in distributional shape may be deduced from the plot. For example if the reference distribution has heavier tails (tends to have more large values) the plot will curve down at the left and/or up at the right.

For an illustration of (a) and (d) see Figure 6.2.1. For an illustration of (d) in a two-sided case see Figure 6.2.1(d).

So far we have considered only location-scale families. In the case of the generalised extreme value distribution (GEV), see Definition 3.4.1,

$$
\begin{aligned}
H_{\xi;\mu,\psi}(x) &= \exp\left\{ -\left( 1 + \xi\,\frac{x-\mu}{\psi} \right)^{-1/\xi} \right\}, \quad 1 + \xi (x-\mu)/\psi > 0, \\
&= \begin{cases}
\Phi_\alpha\left(1 + (x-\mu)/(\alpha\psi)\right) & \text{for } x > \mu - \alpha\psi,\ \xi = 1/\alpha > 0, \\
\Psi_\alpha\left(-\left(1 - (x-\mu)/(\alpha\psi)\right)\right) & \text{for } x < \mu + \alpha\psi,\ \xi = -1/\alpha < 0, \\
\Lambda\left((x-\mu)/\psi\right) & \text{for } x \in \mathbb{R},\ \xi = 0,
\end{cases}
\end{aligned}
\tag{6.3}
$$
besides the location and scale parameters $\mu \in \mathbb{R}$, $\psi > 0$, a shape parameter $\xi \in \mathbb{R}$ enters, making immediate interpretation of a QQ-plot more delicate. Recall that $\Phi_\alpha$, $\Psi_\alpha$ and $\Lambda$ denote the standard extreme value distributions Fréchet, Weibull and Gumbel; see Definition 3.2.6. A preferred method for testing graphically whether our sample comes from $H_{\xi;\mu,\psi}$ would be to first obtain an estimate $\hat\xi$ for $\xi$ either by guessing or by one of the methods given in Section 6.4.2, and consequently work out a QQ-plot using $H_{\hat\xi;0,1}$ where again $\mu$ and $\psi$ may be estimated either by visual inspection or through linear regression. These preliminary estimates are often used as starting values in numerical iteration procedures.

6.2.2 The Mean Excess Function

Another useful graphical tool, in particular for discrimination in the tails, is the mean excess function. Note that we have already introduced this function in the context of the GEV; see Definition 3.4.6. We recall it here for convenience.

Definition 6.2.3 (Mean excess function)
Let $X$ be a rv with right endpoint $x_F$; then
$$e(u) = E(X - u \mid X > u), \quad 0 \le u < x_F, \tag{6.4}$$
is called the mean excess function of $X$. □

The quantity $e(u)$ is often referred to as the mean excess over the threshold value $u$. This interpretation will be crucial in Section 6.5. In an insurance context, $e(u)$ can be interpreted as the expected claim size in the unlimited layer, over priority $u$. Here $e(u)$ is also called the mean excess loss function. In a reliability or medical context, $e(u)$ is referred to as the mean residual life function. In a financial risk management context, switching from the right tail to the left tail, $e(u)$ is referred to as the shortfall. A summary of the most important mean excess functions is to be found in Table 3.4.7.

Figure 6.2.4 Graphs of the mean excess function $e(u)$ of some standard distributions; see also Table 3.4.7. Note that heavy-tailed dfs typically have $e(u)$ tending to infinity. Curves shown: Exponential; Gamma ($\alpha > 1$); Weibull ($\tau > 1$); Weibull ($\tau < 1$) or lognormal; Pareto.

In Example 3.4.8 we already noted that any continuous df $F$ is uniquely determined by its mean excess function; see (3.48) and (3.49) for the relevant formulae linking $F$ to $e$ and vice versa.

Example 6.2.5 (Some elementary properties of the mean excess function)
If $X$ is Exp($\lambda$) distributed, then $e(u) = \lambda^{-1}$ for all $u > 0$. Now assume that $X$ is a rv with support unbounded to the right and df $F$. If for all $y \in \mathbb{R}$,
$$\lim_{x \to \infty} \frac{\overline{F}(x-y)}{\overline{F}(x)} = e^{\gamma y}, \tag{6.5}$$
for some $\gamma \in [0, \infty]$, then $\lim_{u \to \infty} e(u) = \gamma^{-1}$. For the proof use $e(u) = \int_u^\infty \overline{F}(y)\,dy / \overline{F}(u)$ and apply Karamata's theorem (Theorem A3.6) to $\overline{F} \circ \ln$. Notice that for $F \in \mathcal{S}$ (the class of subexponential distributions; see Definition 1.3.3), (6.5) is satisfied with $\gamma = 0$ so that in this heavy-tailed case, $e(u)$ tends to $\infty$ as $u \to \infty$. On the other hand, superexponential functions of the type $\overline{F}(x) \sim \exp\{-x^a\}$, $a > 1$, satisfy the limit relation (6.5) with $\gamma = \infty$ so that the mean excess function tends to 0. The intermediate cases are covered by the so-called $\mathcal{S}(\gamma)$-classes; see Definition 1.4.9, Embrechts and Goldie [202] and the references therein. □

Example 6.2.6 Recall that for $X$ generalised Pareto the mean excess function is linear; see Theorem 3.4.13(e). The mean excess function of a heavy-tailed df, for large values of the argument, typically appears to be between a constant function (for Exp($\lambda$)) and a straight line with positive slope (for the Pareto case).

It was noticed by Benktander [60] that interesting claim size distributions correspond to mean excess functions of the form
$$e(u) = \begin{cases} u^{1-\beta}/\alpha, & \alpha > 0,\ 0 \le \beta < 1, \\ u/(\alpha + 2\beta \ln u), & \alpha, \beta > 0. \end{cases}$$
Note that $e(u)$ increases but the rate of increase decreases with $u$. As a consequence, Benktander [60] introduced the two families of distributions which, within the insurance world, now bear his name. The Benktander-type-I and -type-II classes are defined in Table 1.2.6. □

A graphical test for tail behaviour can now be based on the empirical mean excess function $e_n(u)$. Suppose that $X_1, \ldots, X_n$ are iid with df $F$ and let $F_n$ denote the empirical df and $\Delta_n(u) = \{i : i = 1, \ldots, n,\ X_i > u\}$, then
$$e_n(u) = \frac{1}{\overline{F}_n(u)} \int_u^\infty \overline{F}_n(y)\,dy = \frac{1}{\operatorname{card} \Delta_n(u)} \sum_{i \in \Delta_n(u)} (X_i - u), \quad u \ge 0, \tag{6.6}$$
with the convention that $0/0 = 0$. A mean excess plot (ME-plot) then consists of the graph
$$\left\{ \left( X_{k,n},\ e_n(X_{k,n}) \right) : k = 1, \ldots, n \right\}.$$
The statistical properties of $e_n(u)$ can again be derived by using the relevant empirical process theory as explained in Shorack and Wellner [573], p. 778. For our purposes, the ME-plot is used only as a graphical method, mainly for distinguishing between light- and heavy-tailed models; see Figure 6.2.7 for some simulated examples. Indeed caution is called for when interpreting such plots. Due to the sparseness of the data available for calculating $e_n(u)$ for large $u$-values, the resulting plots are very sensitive to changes in the data towards the end of the range; see for instance Figure 6.2.8. For this reason, more robust versions like median excess plots and related procedures have been suggested; see for instance Beirlant, Teugels and Vynckier [57] or Rootzén and Tajvidi [547]. For a critical assessment concerning the use of mean excess functions in insurance see Rytgaard [556].
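The empirical mean excess function (6.6) and the ME-plot are straightforward to compute. The following Python sketch is a minimal illustration under stated assumptions: the function names are our own, and the simulated Pareto sample with $\overline{F}(x) = x^{-\alpha}$, $x \ge 1$, $\alpha = 1.7$, only mimics the setting of the simulated examples. For this Pareto df the theoretical mean excess function follows directly from $e(u) = \int_u^\infty \overline{F}(y)\,dy / \overline{F}(u)$, namely $e(u) = u/(\alpha - 1)$ for $u \ge 1$ and $\alpha > 1$.

```python
import random


def mean_excess(sample, u):
    """Empirical mean excess e_n(u): mean of X_i - u over all X_i > u.

    Returns 0.0 when no observation exceeds u (convention 0/0 = 0).
    """
    excesses = [x - u for x in sample if x > u]
    return sum(excesses) / len(excesses) if excesses else 0.0


def me_plot_points(sample):
    """ME-plot points (X_{k,n}, e_n(X_{k,n})), k = 1..n, from the largest value down."""
    return [(x, mean_excess(sample, x)) for x in sorted(sample, reverse=True)]


def pareto_mean_excess(u, alpha):
    """Theoretical e(u) = u / (alpha - 1) for tail x**(-alpha), x >= 1, alpha > 1."""
    return u / (alpha - 1.0)


def simulate_pareto(n, alpha, seed=0):
    """Simulate n values with tail x**(-alpha), x >= 1, by inversion."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    data = simulate_pareto(1000, alpha=1.7)
    for u in (1.0, 2.0, 5.0, 10.0):
        print(u, mean_excess(data, u), pareto_mean_excess(u, 1.7))
```

Running the sketch with different seeds shows the instability discussed above: for large $u$ only a handful of observations enter $e_n(u)$, so the empirical values scatter widely around the theoretical line.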


Figure 6.2.7 The empirical mean excess function $e_n(u)$ of simulated data ($n$ = 1 000) compared with the corresponding theoretical mean excess function $e(u)$ (dashed line): standard exponential (top), lognormal (middle) with $\ln X \sim N(0, 4)$, Pareto (bottom) with tail index 1.7.


Figure 6.2.8 The mean excess function of the Pareto distribution $\overline{F}(x) = x^{-1.7}$, $x \ge 1$, together with 20 empirical mean excess functions $e_n(u)$ each based on simulated data ($n$ = 1000) from the above distribution. Note the very unstable behaviour, especially towards the higher values of $u$. This is typical and makes the precise interpretation of $e_n(u)$ difficult; see also Figure 6.2.7.

Example 6.2.9 (Exploratory data analysis for some examples from insurance and finance)
In Figures 6.2.10-6.2.12 we have graphically summarised some properties of three real data-sets. Two come from insurance, one from finance. The data underlying Figure 6.2.11 correspond to Danish fire insurance claims in millions of Danish Kroner (1985 prices). The data were communicated to us by Mette Rytgaard and correspond to the period 1980-1993, inclusively. There is a total of $n = 2493$ observations.

The second insurance data, presented in Figure 6.2.12, correspond to a portfolio of industrial fire data ($n = 8043$) reported over a two year period. This data-set is definitely considered by the portfolio manager as "dangerous", i.e. large claim considerations do enter substantially in the final premium calculation.


Figure 6.2.10 Exploratory data analysis of BMW share prices. Top: the 500 largest values from the upper and lower tails. Middle: the corresponding log-histograms. Bottom: the ME-plot. See Example 6.2.9 for some comments.


Figure 6.2.11 Exploratory data analysis of Danish insurance claims caused by fire: the data (top left), the histogram of the log-transformed data (top right), the ME-plot (bottom left) and a QQ-plot against standard exponential quantiles (bottom right). See Example 6.2.9 for some comments.


Figure 6.2.12 Exploratory data analysis of insurance claims caused by industrial fire: the data (top left), the histogram of the log-transformed data (top right), the ME-plot (bottom left) and a QQ-plot against standard exponential quantiles (bottom right). See Example 6.2.9 for some comments.

Data                Danish      Industrial
n                   2 493       8 043
min                 0.3134      0.003
1st quartile        1.157       0.587
median              1.634       1.526
mean                3.063       14.65
3rd quartile        2.645       4.488
max                 263.3       13 520
$\hat{x}_{0.99}$    24.61378    184.0009

Table 6.2.13 Basic statistics for the Danish and the industrial fire data; $\hat{x}_{0.99}$ stands for the empirical 99%-quantile.

A first glance at the figures and Table 6.2.13 for both data-sets immediately reveals heavy-tailedness and skewedness to the right. The corresponding mean excess functions are close to a straight line which indicates that the underlying distributions may be modelled by Pareto-like dfs. The QQ-plots against the standard exponential quantiles also clearly show tails much heavier than exponential ones.
Whereas the insurance data may be supposed to represent iid observations, this is definitely not the case for the BMW daily log-return data underlying Figure 6.2.10. For the full data-set see Figure 6.1.7. The period covered is January 23, 1973 - July 12, 1996, resulting in n = 6 146 observations on the log-returns. Nevertheless, we may certainly assume stationarity of the underlying time series so that many limit results (such as the SLLN) remain valid under general conditions. This would allow us to interpret the graphs of Figure 6.2.10 in a way similar to the iid case, i.e. we will assume that the empirical plots (histogram, empirical mean excess function, QQ-plot) are close to their theoretical counterparts. Note that we contrast these tools for the positive daily log-returns and the absolute values of the negative ones. The log-histograms again show skewedness to the right and heavy-tailedness. It is interesting to observe that the left and right tail of the distribution of the log-returns are different. Indeed, both the histograms and the ME-plots (mind the different slopes) indicate that the left tail of the distribution is heavier than the right one.
In Figure 6.2.10 we have singled out the 500 largest positive (left) and negative (right) log-returns over the above period. In Table 6.2.14 we have summarised some basic statistics for the three resulting data-sets: BMW-all, BMW-upper and BMW-lower. The nomenclature should be obvious.
We would like to stress that it is our aim to fit tail-probabilities (i.e. probabilities of extreme returns). Hence it is natural for such a fitting to disregard

the "small" returns. The choice of 500 at this point is rather arbitrary; we will come back to this issue and indeed a more detailed analysis in Section 6.5.2. □

Data            BMW-all      BMW-upper    BMW-lower
n               6 145        500          500
min             -0.26880     0.01818      0.01719
1st quartile    -0.006656    0.020800     0.019490
median          0.00000      0.02561      0.02333
mean            0.0003407    0.0300500    0.0284500
3rd quartile    0.007126     0.032990     0.031280
max             0.2591       0.2591       0.2688

Table 6.2.14 Basic statistics for the BMW data.

6.2.3 Gumbel's Method of Exceedances

There is a multitude of fairly easy analytic results concerning extremes which yield useful preliminary information on the data. The first method, Gumbel's method of exceedances, concerns the question:

How many values among future observations exceed past records?

Let $X_{n,n} < \cdots < X_{1,n}$ as usual be the order statistics of a sample $X_1, \ldots, X_n$ embedded in an infinite iid sequence $(X_i)$ with continuous df $F$. Take the $k$th upper order statistic $X_{k,n}$ as a (random) threshold value and denote by $S_r^n(k)$, $r \ge 1$, the number of exceedances of $X_{k,n}$ among the next $r$ observations $X_{n+1}, \ldots, X_{n+r}$, i.e.

$$S_r^n(k) = \sum_{i=1}^{r} I_{\{X_{n+i} > X_{k,n}\}}\,.$$

For ease of notation, we sometimes write $S$ for $S_r^n(k)$ below.

Lemma 6.2.15 (Order statistics and the hypergeometric df)
The rv $S$ defined above has a hypergeometric distribution, i.e.

$$P(S = j) = \frac{\dbinom{r+n-k-j}{n-k} \dbinom{j+k-1}{k-1}}{\dbinom{r+n}{n}}\,, \quad j = 0, 1, \ldots, r\,. \tag{6.7}$$

Proof. Conditioning yields

$$P(S = j) = \int_0^\infty P(S = j \mid X_{k,n} = u)\,dF_{k,n}(u)\,,$$

where $F_{k,n}$ denotes the df of $X_{k,n}$. Now use the fact that $(X_1, \ldots, X_n)$ and $(X_{n+1}, \ldots, X_{n+r})$ are independent, that $\sum_{i=1}^{r} I_{\{X_i > u\}}$ has a binomial distribution with parameters $r$ and $\overline{F}(u)$, and, from Proposition 4.1.2(b), that

$$dF_{k,n}(u) = \frac{n!}{(k-1)!\,(n-k)!}\, F^{n-k}(u)\, \overline{F}^{\,k-1}(u)\,dF(u)$$

to obtain (6.7). □

Remark. It readily follows from the definition of $S$ and the argument given in the above proof that $ES = rk/(n+1)$ for the mean number of exceedances of the random threshold $X_{k,n}$. For a detailed discussion on the hypergeometric distribution see for instance Johnson and Kotz [357]. □

Example 6.2.16 Suppose n = 100, r = 12. We want to calculate the probabilities $p_k = P(S_{12}^{100}(k) = 0)$ that there are no exceedances of the level $X_{k,100}$, $k \ge 1$, in the next twelve observations. For $j = 0$, formula (6.7) reduces to

$$P(S_r^n(k) = 0) = \frac{n(n-1) \cdots (n-k+1)}{(r+n)(r+n-1) \cdots (r+n-k+1)}\,.$$

In tabulated form we obtain for n = 100 and r = 12,

k        1        2        3        4        5
$p_k$    0.893    0.796    0.709    0.631    0.561

So if we have, say, 100 monthly data points and set out to design a certain standard equal to the third largest observation, there is about a 70% chance that this level will not be exceeded during the next year. □

p        k = 1     k = 2     k = 3     k = 4     k = 5
j = 0    0.7778    0.6010    0.4612    0.3514    0.2657
j = 1    0.1768    0.2795    0.3295    0.3428    0.3321
j = 2    0.0370    0.0899    0.1446    0.1929    0.2299
j = 3    0.0070    0.0234    0.0482    0.0791    0.1130
j = 4    0.0012    0.0051    0.0130    0.0255    0.0427
j = 5    0.0002    0.0009    0.0029    0.0066    0.0128
j = 6    0.0000    0.0001    0.0005    0.0014    0.0031
j = 7    0.0000    0.0000    0.0001    0.0002    0.0006
j = 8    0.0000    0.0000    0.0000    0.0000    0.0001
j = 9    0.0000    0.0000    0.0000    0.0000    0.0000

Table 6.2.17 Exceedance probabilities of the river Nidd data. For given $k$ (order statistic) and $j$ (number of exceedances), $p = P(S_{10}^{35}(k) = j)$ as calculated in (6.7), is given; see Example 6.2.18.
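The probabilities in Example 6.2.16 and Table 6.2.17 follow directly from (6.7). A minimal sketch in plain Python (the function names are ours, not the text's) evaluates the hypergeometric formula with exact binomial coefficients and, for $j = 0$, the product form given in Example 6.2.16.

```python
from math import comb


def exceedance_pmf(j, n, r, k):
    """P(S = j) from (6.7): j of the next r observations exceed X_{k,n}."""
    if not 0 <= j <= r:
        return 0.0
    return comb(r + n - k - j, n - k) * comb(j + k - 1, k - 1) / comb(r + n, n)


def no_exceedance_prob(n, r, k):
    """P(S = 0) via the product n(n-1)...(n-k+1) / ((r+n)...(r+n-k+1))."""
    prob = 1.0
    for i in range(k):
        prob *= (n - i) / (r + n - i)
    return prob


if __name__ == "__main__":
    # n = 100, r = 12 as in Example 6.2.16.
    print([round(no_exceedance_prob(100, 12, k), 3) for k in range(1, 6)])
    # n = 35, r = 10 as for the river Nidd data; first column of Table 6.2.17.
    print([round(exceedance_pmf(j, 35, 10, 1), 4) for j in range(0, 3)])
```

Summing `j * exceedance_pmf(j, n, r, k)` over $j$ gives a quick numerical check of the mean $rk/(n+1)$ stated in the Remark.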

Example 6.2.18 (River Nidd data, continuation)
For the river Nidd annual data from Example 6.1.1 we have that n = 35. The exceedance probabilities (6.7) for the next r = 10 years are given in Table 6.2.17. For example, the probability of not exceeding, during the next 10 years, the largest annual flow observed so far equals $P(S_{10}^{35}(1) = 0) = 0.7778$. The probability of exceeding at least once, during the next 10 years, the third highest level observed so far equals $1 - P(S_{10}^{35}(3) = 0) = 1 - 0.4612 = 0.5388$. □

6.2.4 The Return Period

In this section we are interested in answering the question:

What is the mean waiting time between specific extremal events?

This question is usually made precise in the following way. Let $(X_i)$ be a sequence of iid rvs with continuous df $F$ and $u$ a given threshold. We consider the sequence $(I_{\{X_i > u\}})$ of iid Bernoulli rvs with success probability $p = \overline{F}(u)$. Consequently, the time of the first success

$$L(u) = \min\{i \ge 1 : X_i > u\}\,,$$

i.e. the time of the first exceedance of the threshold $u$, is a geometric rv with distribution

$$P(L(u) = k) = (1-p)^{k-1} p\,, \quad k = 1, 2, \ldots\,.$$

Notice that the successive exceedance times

$$L_1(u) = L(u)\,, \quad L_{n+1}(u) = \min\{i > L_n(u) : X_i > u\}\,, \quad n \ge 1\,,$$

have iid increments $L_{n+1}(u) - L_n(u)$, distributed as $L(u)$, which describe the time periods between two consecutive exceedances of $u$ by $(X_n)$. The return period of the events $\{X_i > u\}$ is then defined as $EL(u) = p^{-1} = (\overline{F}(u))^{-1}$, which increases to $\infty$ as $u \to \infty$. For ease of notation we take dfs with unbounded support above. All relevant questions concerning the return period can now be answered straightforwardly through the corresponding properties of the geometric distribution. Below we give some examples.
Define

$$r_k = P(L(u) \le k) = p \sum_{i=1}^{k} (1-p)^{i-1} = 1 - (1-p)^k\,, \quad k \in \mathbb{N}\,.$$

Hence $r_k$ is the probability that there will be at least one exceedance of $u$ before time $k$ (or within $k$ observations). This gives a 1-1 relationship between $r_k$ and the return period $p^{-1}$.
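The 1-1 relationship between $r_k$ and the return period is easy to exploit numerically. The sketch below, in plain Python with function names of our own choosing, computes $r_k$ from $p$, inverts the relation, and returns the corresponding return period. The inputs $k = 50$ and $r_k = 0.1$ are illustrative values only.

```python
def prob_exceedance_within(k, p):
    """r_k = P(L(u) <= k) = 1 - (1 - p)^k for exceedance probability p."""
    return 1.0 - (1.0 - p) ** k


def exceedance_prob_from_risk(k, risk):
    """Invert r_k = risk for p, i.e. p = 1 - (1 - risk)^(1/k)."""
    return 1.0 - (1.0 - risk) ** (1.0 / k)


def return_period(p):
    """Mean waiting time E L(u) = 1/p of the geometric waiting time."""
    return 1.0 / p


if __name__ == "__main__":
    # Illustrative requirement: at most 10% exceedance risk within 50 periods.
    p = exceedance_prob_from_risk(50, 0.1)
    print(p, return_period(p))
```

Since $r_k$ is strictly increasing in $p$ for fixed $k$, the inversion is unique, which is the 1-1 relationship referred to above.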

One is often interested in the probability that there will be an exceedance of $u$ before the return period. This probability equals

$$P(L(u) \le EL(u)) = P(L(u) \le [1/p]) = 1 - (1-p)^{[1/p]}\,,$$

where $[x]$ denotes the integer part of $x$. For high thresholds $u$, i.e. for $u \uparrow \infty$ and consequently $p \downarrow 0$, we obtain

$$\lim_{u \uparrow \infty} P(L(u) \le EL(u)) = \lim_{p \downarrow 0} \left(1 - (1-p)^{[1/p]}\right) = 1 - e^{-1} = 0.63212\,.$$

This shows that for high thresholds the mean of $L(u)$ (the return period) is larger than its median.

Example 6.2.19 (Return period, t-year event)
Within an insurance context, a structure is to be insured on the basis that it will last at least 50 years with no more than 10% risk of failure. What does this information imply for the return period? Using the language above, the engineering requirement translates into

$$P(L(u) \le 50) \le 0.1\,.$$

Here we tacitly assumed that a structure failure for each year $i$ can be modelled through the event $\{X_i > u\}$, where $X_i$ is a structure-dependent critical component, say. We assume the iid property of the $X_i$. The above condition, solved for $P(L(u) \le 50) = 1 - (1-p)^{50} = 0.1$, now immediately implies that $p = 0.002105$, i.e. $EL(u) = 475$. In insurance language one speaks in this case about a 475-year event.
The important next question concerns the implication of a t-year event requirement on the underlying threshold value. By definition this means that for the corresponding threshold $u_t$,

$$t = EL(u_t) = \frac{1}{\overline{F}(u_t)}\,,$$

hence

$$u_t = F^{\leftarrow}(1 - t^{-1})\,.$$

In the present example, $u_{475} = F^{\leftarrow}(1 - 1/475) \approx F^{\leftarrow}(0.9979)$. This leads us once more to the crucial problem of high quantile estimation. □

Example 6.2.20 (Continuation of Example 6.1.8)
In the case of the Dutch dyke example, recall that, assuming stationarity among the annual maxima of sea levels, the last comparable flood before 1953 took place in November 1570, so that in the above language one would

speak about a 382-year event. The 1953 level hence corresponds roughly to the $(1 - 1/382)$-quantile of the distribution of the annual maximum. The subsequent government requirements demanded dykes to be built corresponding to a 1 000-to-10 000-year event! □

The above examples clearly stress the need for a solution to the following problems:
- Find reliable estimators for high quantiles from iid data.
- Because of increasing safety requirements, implying t-year events with increasingly higher t, the iid assumption may not always be tenable. Moreover, most data in practice will exhibit dependence and/or non-stationarity. Find therefore quantile estimation procedures for non-iid data.

6.2.5 Records as an Exploratory Tool

Suppose that the rvs $X_i$ are iid with df $F$. Recall from Section 5.4 the definitions of records and record times: a record $X_n$ occurs if $X_n > M_{n-1} = \max(X_1, \ldots, X_{n-1})$. By definition we take $X_1$ as a record. In Section 5.4 we used point process language in order to describe records and record times $L_n$. The latter are the random times at which the process $(M_n)$ jumps. Define the record counting process as

$$N_1 = 1\,, \quad N_n = 1 + \sum_{k=2}^{n} I_{\{X_k > M_{k-1}\}}\,, \quad n \ge 2\,.$$

The following result (on the mean $EN_n$) may be surprising.

Lemma 6.2.21 (Moments of $N_n$)
Suppose $(X_i)$ are iid with continuous df $F$ and $(N_n)$ defined as above. Then

$$EN_n = \sum_{k=1}^{n} \frac{1}{k} \quad \text{and} \quad \operatorname{var}(N_n) = \sum_{k=1}^{n} \left(\frac{1}{k} - \frac{1}{k^2}\right)\,.$$

Proof. From the definition of $N_n$ we obtain

$$EN_n = 1 + \sum_{k=2}^{n} P(X_k > M_{k-1}) = 1 + \sum_{k=2}^{n} \int_{-\infty}^{+\infty} P(X_k > u)\,dP(M_{k-1} \le u)\,.$$

Now use $P(M_{k-1} \le u) = F^{k-1}(u)$, so that $P(X_k > M_{k-1}) = \int_{-\infty}^{+\infty} \overline{F}(u)\,dF^{k-1}(u) = 1/k$, which immediately yields the result for $EN_n$. The same argument works for $\operatorname{var}(N_n)$. □
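Lemma 6.2.21 is straightforward to evaluate, and counting the records in a data-set is equally simple. The following minimal sketch in plain Python uses function names of our own choosing; the sample passed to `count_records` is hypothetical.

```python
from math import sqrt


def expected_records(n):
    """E N_n = sum_{k=1}^n 1/k (Lemma 6.2.21)."""
    return sum(1.0 / k for k in range(1, n + 1))


def variance_records(n):
    """var(N_n) = sum_{k=1}^n (1/k - 1/k^2) (Lemma 6.2.21)."""
    return sum(1.0 / k - 1.0 / k**2 for k in range(1, n + 1))


def count_records(data):
    """Number of records: X_1 counts, then every X_k > max(X_1, ..., X_{k-1})."""
    records, current_max = 0, None
    for x in data:
        if current_max is None or x > current_max:
            records += 1
            current_max = x
    return records


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(expected_records(n), 1), round(sqrt(variance_records(n)), 1))
    print(count_records([3, 1, 4, 1, 5, 9, 2, 6]))  # records at 3, 4, 5, 9
```

Comparing `count_records(data)` with `expected_records(len(data))`, on the scale of the standard deviation, gives a crude check of the iid hypothesis.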

Notice that $EN_n$ and $\operatorname{var}(N_n)$ are both of the order $\ln n$ as $n \to \infty$. More precisely, $EN_n - \ln n \to \gamma$, where $\gamma = 0.5772\ldots$ denotes Euler's constant. As a consequence:

the number of records of iid data grows very slowly!

Before reading further, guess the answer to the next question:

How many records do we expect in 100, 1 000 or 10 000 iid observations?

Table 6.2.22 contains the somewhat surprising answer; see also Figures 5.4.2 and 5.4.12.

$n = 10^k$, k =    $EN_n$    $\ln n$    $\ln n + \gamma$    $D_n$
1                  2.9       2.3        2.9                 1.2
2                  5.2       4.6        5.2                 1.9
3                  7.5       6.9        7.5                 2.4
4                  9.8       9.2        9.8                 2.8
5                  12.1      11.5       12.1                3.2
6                  14.4      13.8       14.4                3.6
7                  16.7      16.1       16.7                3.9
8                  19.0      18.4       19.0                4.2
9                  21.3      20.7       21.3                4.4

Table 6.2.22 Expected number of records $EN_n$ in an iid sequence $(X_n)$, together with the asymptotic approximations $\ln n$, $\ln n + \gamma$, and standard deviation $D_n = \sqrt{\operatorname{var}(N_n)}$, based on Lemma 6.2.21.

Figure 6.2.23 Vancouver sunshine data and the corresponding numbers of records.

Example 6.2.24 (Records in real data)
In Figure 6.2.23 the total amount of sunshine hours in Vancouver during the month of July from 1909 until 1973 is given. The data are taken from

Glick [260]. There are 6 records in these n = 64 observations, namely for i = 1, 2, 6, 10, 23, 53. Clearly one would need a much larger n in order to test confidently the iid hypothesis for the underlying data $X_1, \ldots, X_{64}$ on the basis of the record values. If the data were iid, then we would obtain $EN_{64} = 4.74$. The observed value of 6 agrees rather well. On the basis of these observations we have no reason to doubt the iid hypothesis. The picture however changes dramatically in Figure 3 of the Reader Guidelines, based on catastrophic insurance claims for the period 1970-1995. It is immediately clear that the number of records does not exhibit a logarithmic growth. □

6.2.6 The Ratio of Maximum and Sum

In this section we consider a further simple tool for detecting heavy tails of a distribution and for giving a rough estimate of the order of its finite moments. Suppose that the rvs $X, X_1, X_2, \ldots$ are iid and define for any positive $p$ the quantities

$$S_n(p) = |X_1|^p + \cdots + |X_n|^p\,, \quad M_n(p) = \max(|X_1|^p, \ldots, |X_n|^p)\,, \quad n \ge 1\,.$$

We also write $M_n = M_n(1)$ and $S_n = S_n(1)$, slightly abusing our usual notation. One way to study the underlying distribution is to look at the distributional or a.s. behaviour of functionals $f(S_n(p), M_n(p))$. For instance, in Section 8.2.4 we gained some information about the limit behaviour of the ratio $M_n/S_n$. In particular, we know the following facts ($Y_1$, $Y_2$ and $Y_2(p)$ are appropriate non-degenerate rvs):

$$\begin{aligned}
\frac{M_n}{S_n} \stackrel{\text{a.s.}}{\to} 0 \quad &\Longleftrightarrow \quad E|X| < \infty\,, \\
\frac{M_n}{S_n} \stackrel{P}{\to} 0 \quad &\Longleftrightarrow \quad E|X| I_{\{|X| \le x\}} \in \mathcal{R}_0\,, \\
\frac{S_n - nE|X|}{M_n} \stackrel{d}{\to} Y_1 \quad &\Longleftrightarrow \quad P(|X| > x) \in \mathcal{R}_{-\alpha} \text{ for some } \alpha \in (1, 2)\,, \\
\frac{M_n}{S_n} \stackrel{d}{\to} Y_2 \quad &\Longleftrightarrow \quad P(|X| > x) \in \mathcal{R}_{-\alpha} \text{ for some } \alpha \in (0, 1)\,, \\
\frac{M_n}{S_n} \stackrel{P}{\to} 1 \quad &\Longleftrightarrow \quad P(|X| > x) \in \mathcal{R}_0\,.
\end{aligned}$$

Writing

$$R_n(p) = \frac{M_n(p)}{S_n(p)}\,, \quad n \ge 1\,, \quad p > 0\,, \tag{6.8}$$

we may conclude from the latter relations that the following equivalences hold:

$$\begin{aligned}
R_n(p) \stackrel{\text{a.s.}}{\to} 0 \quad &\Longleftrightarrow \quad E|X|^p < \infty\,, \\
R_n(p) \stackrel{P}{\to} 0 \quad &\Longleftrightarrow \quad E|X|^p I_{\{|X| \le x\}} \in \mathcal{R}_0\,, \\
R_n(p) \stackrel{d}{\to} Y_2(p) \quad &\Longleftrightarrow \quad P(|X| > x) \in \mathcal{R}_{-\alpha p} \text{ for some } \alpha \in (0, 1)\,, \\
R_n(p) \stackrel{P}{\to} 1 \quad &\Longleftrightarrow \quad P(|X| > x) \in \mathcal{R}_0\,.
\end{aligned}$$

Now it is immediate how one can use these limit results to obtain some preliminary information about $P(|X| > x)$: plot $R_n(p)$ against $n$ for a variety of $p$-values. Then $R_n(p)$ should be small for large $n$ provided that $E|X|^p < \infty$. On the other hand, if there are significant deviations of $R_n(p)$ from zero for large $n$, this is an indication for $E|X|^p$ being infinite; see Figures 6.2.25-6.2.29 for some examples of simulated and real data.
Clearly, what has been said about the absolute value of the $X_i$ can be modified in the natural way to get information about the right or left distribution tail: replace everywhere $|X_i|^p$ by the $p$th power of the positive or negative part of the $X_i$. Moreover, the ratio of maximum and sum can be replaced by more complicated functionals of the upper order statistics of a sample; see for instance the definition of the empirical large claim index in Section 8.2.4. This allows one to discriminate between distributions in a more subtle way.

Notes and Comments

The statistical properties of QQ-plots, with special emphasis on the heavy-tailed case, are studied for instance in Kratz and Resnick [403]. The importance of the mean excess function (or plot) as a diagnostic tool for insurance data is nicely demonstrated in Hogg and Klugman [326]; see also Beirlant et al. [57] and the references therein. Return periods and t-year events have a long history in hydrology; see for instance Castillo [101] and Rosbjerg [548]. For relevant statistical techniques coming more from a reliability context, see Crowder et al. [140]; methods more related to medical statistics are to be found in Andersen et al. [10].
Since the fundamental paper by Foster and Stuart [240] numerous papers have been published on records; see for instance Pfeifer [492], Chapter 4, Resnick [525], Chapter 4, and the references cited therein; see also Goldie and Resnick [273], Nagaraja [470] and Nevsorov [473]. We find Glick [260] a very entertaining introduction. Smith [587] gives more information on statistical inference for records, especially in the non-iid case. In Section 5.4 we have discussed in more detail the relevant limit theorems for records and their connections with point process theory and extremal processes. Records in the

Figure 6.2.25 The ratio $R_n(p)$ for different $p$. The $X_i$ are 2 500 iid standard exponential data. (Panels: p = 1, 1.5, 2, 2.5, 3; horizontal axis n.)
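The ratio $R_n(p)$ of (6.8), as plotted in these figures, can be computed in a single pass over the data. The following is a minimal sketch in plain Python; the function name `max_sum_ratios` and the convention of returning 0 while all terms are still zero are our own choices, and the simulated samples in the demonstration are purely illustrative.

```python
import random


def max_sum_ratios(data, p):
    """Running ratios R_n(p) = M_n(p) / S_n(p), n = 1, ..., len(data)."""
    ratios = []
    running_sum, running_max = 0.0, 0.0
    for x in data:
        term = abs(x) ** p
        running_sum += term
        running_max = max(running_max, term)
        # Convention chosen here: ratio 0 while every term so far is zero.
        ratios.append(running_max / running_sum if running_sum > 0 else 0.0)
    return ratios


if __name__ == "__main__":
    random.seed(1)
    exponential = [random.expovariate(1.0) for _ in range(2500)]
    pareto = [random.paretovariate(2.0) for _ in range(2500)]
    for p in (1, 2, 3):
        print(p, max_sum_ratios(exponential, p)[-1], max_sum_ratios(pareto, p)[-1])
```

Plotting the returned list against $n$ for several values of $p$ reproduces the type of display used in Figures 6.2.25-6.2.29.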

Figure 6.2.26 The ratio $R_n(p)$ for different $p$. The $X_i$ are 2 500 iid lognormal data. (Panels: p = 1, 1.5, 2, 2.5, 3; horizontal axis n.)


Figure 6.2.27 The ratio $R_n(p)$ for different $n$ and $p$. The $X_i$ are 2 500 iid Pareto data with shape parameter 2. (Panels: p = 1, 1.5, 2, 2.5, 3; horizontal axis n.)

Figure 6.2.28 The ratio $R_n(p)$ for different $n$ and $p$. The $X_i$ are 1 864 daily log-returns from the German stock index DAX. The behaviour of $R_n(p)$ indicates that these data come from a distribution with infinite 4th moment. (Panels: p = 1, 2, 4, 6, 8; horizontal axis n.)


Figure 6.2.29 The ratio $R_n(p)$ for different $n$ and $p$. The $X_i$ correspond to the Danish fire insurance data from Figure 6.2.11 (n = 2 493). The behaviour of $R_n(p)$ indicates that these data come from a distribution with infinite 2nd moment. Also compare with Figures 6.2.25-6.2.27. (Panels: p = 1, 1.5, 2, 2.5, 3; horizontal axis n.)

presence of a trend have been investigated by several authors, in particular for sports data. A good place to start is Ballerini and Resnick [41] and the references therein. The behaviour of records in an increasing population is for instance described in Yang [638]. Smith [587] discusses the forecasting problem of records based on maximum likelihood methodology.
The exploratory techniques introduced so far all started from an iid assumption on the underlying data. Their interpretation becomes hazardous when applied in the non-iid case, as for instance to data exhibiting a trend. Various statistical de-trending techniques exist within the realm of regression theory and time series analysis. These range from fitting a deterministic trend to the data to averaging, differencing, and so on. By one or more of these methods one would hope to filter out some iid residuals to which the previous methods again would apply; see for instance Brockwell and Davis [91], Section 1.4, Feigin and Resnick [228] or Kendall and Stuart [371], Chapter 46. It is perhaps worth stressing at this point that extremes in the de-trended data do not necessarily correspond to extremes in the original data.

6.3 Parameter Estimation for the Generalised Extreme Value Distribution

Recall from (6.3) the generalised extreme value distribution (GEV)

$$H_{\xi;\mu,\psi}(x) = \exp\left\{-\left(1 + \xi\,\frac{x - \mu}{\psi}\right)^{-1/\xi}\right\}\,, \quad 1 + \xi\,\frac{x - \mu}{\psi} > 0\,. \tag{6.9}$$

As usual the case $\xi = 0$ corresponds to the Gumbel distribution

$$H_{0;\mu,\psi}(x) = \exp\left\{-e^{-(x-\mu)/\psi}\right\}\,, \quad x \in \mathbb{R}\,. \tag{6.10}$$

The parameter $\theta = (\xi, \mu, \psi) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}_+$ consists of a shape parameter $\xi$, location parameter $\mu$ and scale parameter $\psi$. For notational convenience, we shall either write $H_\xi$ or $H_\theta$ depending on the case in hand. In Theorem 3.4.5 we saw that $H_\xi$ arises as the limit distribution of normalised maxima of iid rvs. Standard statistical methodology from parametric estimation theory is available if our data consist of a sample

$$X_1, \ldots, X_n \quad \text{iid from } H_\theta\,. \tag{6.11}$$

We mention here that the assumption of $X_i$ having an exact extreme value distribution $H_\xi$ is perhaps not the most realistic one. In the next section we turn to the more tenable assumption that the $X_i$ are approximately $H_\xi$ distributed. The "approximately" will be interpreted as "belonging to the maximum domain of attraction of".
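For numerical work it helps to have (6.9) and (6.10) available as a single function. The sketch below, in plain Python with a function name of our own choosing, evaluates the GEV df, treats $\xi = 0$ as the Gumbel case, and returns 0 or 1 outside the support implied by $1 + \xi(x - \mu)/\psi > 0$. Mapping a floating-point overflow to the value 0 is an implementation choice made here, not part of the text.

```python
import math


def gev_cdf(x, xi, mu, psi):
    """GEV df H_{xi;mu,psi}(x) of (6.9); xi = 0 gives the Gumbel df (6.10)."""
    if psi <= 0:
        raise ValueError("scale parameter psi must be positive")
    z = (x - mu) / psi
    try:
        if xi == 0.0:
            return math.exp(-math.exp(-z))
        t = 1.0 + xi * z
        if t <= 0.0:
            # Outside the support: below the left endpoint (xi > 0)
            # or above the right endpoint (xi < 0).
            return 0.0 if xi > 0 else 1.0
        return math.exp(-t ** (-1.0 / xi))
    except OverflowError:
        # The inner term exceeded the float range, so the df is numerically 0.
        return 0.0


if __name__ == "__main__":
    print(gev_cdf(0.0, 0.0, 0.0, 1.0))   # Gumbel at 0: exp(-1)
    print(gev_cdf(1.0, 1.0, 0.0, 1.0))   # xi = 1: exp(-1/2)
```

Note that some software packages parametrise the shape with the opposite sign, so the convention should be checked before comparing results.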

Fitting of Annual Maxima

As already discussed in Example 6.1.1, data of the above type may become available when the $X_i$ can be interpreted as maxima over disjoint time periods of length $s$ say. In hydrology, which is the cradle of many of the ideas for statistics of extremal events, this period mostly consists of one year; see for instance the river Nidd data in Figure 6.1.2. The 1-year period is chosen in order to compensate for intra-year seasonalities. Therefore the original data may look like

$$\begin{aligned}
X^{(1)} &= \left(X_1^{(1)}, \ldots, X_s^{(1)}\right) \\
X^{(2)} &= \left(X_1^{(2)}, \ldots, X_s^{(2)}\right) \\
&\ \ \vdots \\
X^{(n)} &= \left(X_1^{(n)}, \ldots, X_s^{(n)}\right)
\end{aligned}$$

where the vectors $(X^{(i)})$ are assumed to be iid, but within each vector $X^{(i)}$ the various components may (and mostly will) be dependent. The time length $s$ is chosen so that the above conditions are likely to be satisfied. The basic iid sample from $H_\theta$ on which statistical inference is to be performed then consists of

$$X_i = \max\left(X_1^{(i)}, \ldots, X_s^{(i)}\right)\,, \quad i = 1, \ldots, n\,. \tag{6.12}$$

For historical reasons and since $s$ often corresponds to a 1-year period, statistical inference for $H_\theta$ based on data of the form (6.12) is referred to as fitting of annual maxima.
Below we discuss some of the main techniques for estimating $\theta$ in the exact model (6.11).

6.3.1 Maximum Likelihood Estimation

The set-up (6.11) corresponds to the standard parametric case of statistical inference and hence in principle can be solved by maximum likelihood methodology. Suppose that $H_\theta$ has density $h_\theta$. Then the likelihood function based on the data $\mathbf{X} = (X_1, \ldots, X_n)$ is given by

$$L(\theta; \mathbf{X}) = \prod_{i=1}^{n} h_\theta(X_i)\, I_{\{1 + \xi (X_i - \mu)/\psi > 0\}}\,.$$

Denote by $\ell(\theta; \mathbf{X}) = \ln L(\theta; \mathbf{X})$ the log-likelihood function. The maximum likelihood estimator (MLE) for $\theta$ then equals

$$\hat{\theta}_n = \arg\max_{\theta \in \Theta} \ell(\theta; \mathbf{X})\,,$$

i.e. $\hat{\theta}_n = \hat{\theta}_n(X_1, \ldots, X_n)$ maximises $\ell(\theta; \mathbf{X})$ over an appropriate parameter space $\Theta$. In the case of $H_{0;\mu,\psi}$ this gives us

$$\ell((0, \mu, \psi); \mathbf{X}) = -n \ln \psi - \sum_{i=1}^{n} \exp\left\{-\frac{X_i - \mu}{\psi}\right\} - \sum_{i=1}^{n} \frac{X_i - \mu}{\psi}\,.$$

Differentiating the latter function with respect to $\mu$ and $\psi$ yields the likelihood equations in the Gumbel case:

$$\begin{aligned}
0 &= n - \sum_{i=1}^{n} \exp\left\{-\frac{X_i - \mu}{\psi}\right\}\,, \\
0 &= n + \sum_{i=1}^{n} \frac{X_i - \mu}{\psi} \left(\exp\left\{-\frac{X_i - \mu}{\psi}\right\} - 1\right)\,.
\end{aligned}$$

Clearly no explicit solution exists to these equations. The situation for $H_\xi$ when $\xi \ne 0$ is even more complicated, so that numerical procedures are called for. Jenkinson [351] and Prescott and Walden [505, 506] suggest variants of the Newton-Raphson scheme. With the existence of the Fortran algorithm published in Hosking [332] and its supplement in Macleod [428], the numerical calculation of the MLE $\hat{\theta}_n$ for general $H_\theta$ poses no serious problem in principle.
Notice that we said in principle. Indeed in the so-called regular cases maximum likelihood estimation offers a technique yielding efficient, consistent and asymptotically normal estimators. See for instance Cox and Hinkley [130] and Lehmann [416] for a general discussion on maximum likelihood estimation. Relevant for applications in extreme value theory, typical non-regular cases may occur whenever the support of the underlying df depends on the unknown parameters. Therefore, although we have reliable numerical procedures for finding the MLE $\hat{\theta}_n$, we are less certain about its properties, especially in the small sample case. For a discussion on this point see Smith [583].
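To illustrate that the Gumbel likelihood equations are easy to solve numerically, here is a minimal sketch in plain Python. It is not one of the Newton-Raphson variants cited above. Instead it eliminates $\mu$ through the first equation, $\mu = -\psi \ln\left(n^{-1} \sum_i e^{-X_i/\psi}\right)$, which turns the second equation into the one-dimensional problem $\psi = \overline{X}_n - \sum_i X_i e^{-X_i/\psi} / \sum_i e^{-X_i/\psi}$, solved here by bisection. The function name, tolerance and bracketing strategy are our own choices.

```python
import math


def gumbel_mle(data, tol=1e-10, max_iter=200):
    """Return (mu_hat, psi_hat) solving the Gumbel likelihood equations."""
    n = len(data)
    if n < 2 or min(data) == max(data):
        raise ValueError("need at least two distinct observations")
    x_min, x_max = min(data), max(data)
    mean = sum(data) / n

    def weighted_mean(psi):
        # Weights exp(-(x - x_min)/psi); the shift by x_min avoids overflow.
        weights = [math.exp(-(x - x_min) / psi) for x in data]
        return sum(w * x for w, x in zip(weights, data)) / sum(weights)

    def score(psi):
        # Second likelihood equation after eliminating mu.
        return psi - mean + weighted_mean(psi)

    hi = x_max - x_min           # score(hi) > 0 for non-constant data
    lo = hi
    while score(lo) >= 0:        # halve until the root is bracketed
        lo /= 2.0
    for _ in range(max_iter):    # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    psi = 0.5 * (lo + hi)
    # First likelihood equation: sum_i exp(-(X_i - mu)/psi) = n.
    shifted = sum(math.exp(-(x - x_min) / psi) for x in data) / n
    mu = x_min - psi * math.log(shifted)
    return mu, psi


if __name__ == "__main__":
    # Deterministic pseudo-sample: Gumbel quantiles with mu = 1, psi = 2.
    n = 2000
    sample = [1.0 - 2.0 * math.log(-math.log((i - 0.5) / n)) for i in range(1, n + 1)]
    print(gumbel_mle(sample))
```

The reduction to one dimension is specific to $\xi = 0$; for general $\xi$ a three-parameter numerical optimiser is needed.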
Smith [583] shows that the classical (good) properties of the MLE hold whenever $\xi > -1/2$; this is not the case for $\xi \le -1/2$. As most distributions encountered in insurance and finance have support unbounded to the right (this is possible only for $\xi \ge 0$), the MLE technique offers a useful and reliable procedure in those fields.
At this point we would like to quantify a bit more the often encountered statement that for applications in insurance (and finance for that matter) the case $\xi \ge 0$ is most important. Clearly, all financial data must be bounded to the right; an obvious (though somewhat silly) bound is total wealth. The main point however is that in most data there does not seem to be clustering

towards a well-defined upper limit but more a steady increase over time of the underlying maxima. The latter would then, for iid data, much more naturally be modelled within $\xi \ge 0$. A typical example is to be found in the Danish fire insurance data of Figure 6.2.11.
An example where a natural upper limit may exist is given in Figure 6.3.1. The data underlying this example correspond to a portfolio of water-damage insurance. In contrast to the industrial fire data of Figure 6.2.12, in this case the portfolio manager realises that large claims only play a minor role. Though the data again show an increasing ME-plot, for values above 5 000 the mean excess losses grow much more slowly than would be expected from a really heavy-tailed model, unbounded to the right. The ME-plot for these data should be compared with those for the Danish fire data (Figure 6.2.11) and the industrial fire data (Figure 6.2.12). The Pickands estimator (to be introduced in Section 6.4.2) of the extreme value index in Figure 6.4.2 indicates that $\xi$ is close to zero. Compare also with the corresponding estimates of $\xi$ for the fire data; see Figures 6.5.5 and 6.5.6.

An Extension to Upper Order Statistics

So far, our data has consisted of $n$ iid observations of maxima which we have assumed to follow exactly a GEV $H_\theta$. By appropriately defining the underlying time periods, we design independence into the model; see (6.12). Suppose now that, rather than just having the largest observation available, we possess the $k$ largest of each period (year, say). In the notation of (6.12) this would amount to data

$$X^{(i)}_{k,s} \le \cdots \le X^{(i)}_{1,s} = X_i\,, \quad i = 1, \ldots, n\,.$$

Maximum likelihood theory based on these $k \times n$ observations would use the joint density of the independent vectors $(X^{(i)}_{k,s}, \ldots, X^{(i)}_{1,s})$, $i = 1, \ldots, n$. Only rarely in practical cases could we assume that for each $i$ the latter vectors are derived from iid data. If that were the case then maximum likelihood estimation should be based on the joint density of $k$ upper order statistics from a GEV as discussed in Theorem 4.1.3:

$$\frac{s!}{(s-k)!}\, H_\theta^{s-k}(x_k) \prod_{\ell=1}^{k} h_\theta(x_\ell)\,, \quad x_k < \cdots < x_1\,,$$

where, depending on $\theta$, the $x$-values satisfy the relevant domain restrictions. The standard error of the MLEs for $\mu$ and $\psi$ can already be reduced considerably if $k = 2$, i.e. we take the two largest observations into account. For a brief discussion on this method see Smith [589], Section 4.18, and Smith


Figure 6.3.1 Exploratory data analysis of insurance claims caused by water: the data (top, left), the histogram of the log-transformed data (top, right), the ME-plot (bottom). Notice the kink in the ME-plot in the range (5 000, 6 000) reflecting the fact that the data seem to cluster towards some specific upper value.

[584], where also further references and examples are to be found. The case $n = 1$, i.e. only one year of observations say, and $k > 1$ was first discussed in Weissman [631].
A final statement concerning maximum likelihood methodology, again taken from Smith [589], is worth stressing:

  The big advantage of maximum likelihood procedures is that they can be generalised, with very little change in the basic methodology, to much more complicated models in which trends or other effects may be present.

If the above quote has made you curious, do read Smith [589].

6.3.2 Method of Probability-Weighted Moments

Among all the ad-hoc methods used in parameter estimation, the method of moments has attracted a lot of interest. In full generality it consists of equating model-moments based on $H_\theta$ to the corresponding empirical moments based on the data. Their general properties are notoriously unreliable on account of the poor sampling properties of second- and higher-order sample moments, a statement taken from Smith [589], p. 447. The class of probability-weighted moment estimators stands out as more promising. This method goes back to Hosking, Wallis and Wood [334]. Define

$$w_r(\theta) = E\left(X H_\theta^r(X)\right)\,, \quad r \in \mathbb{N}_0\,, \tag{6.13}$$

where $H_\theta$ is the GEV and $X$ has df $H_\theta$ with parameter $\theta = (\xi, \mu, \psi)$. Recall that for $\xi \ge 1$, $\overline{H}_\theta$ is regularly varying with index $-1/\xi$. Hence $w_0$ is infinite. Therefore we restrict ourselves to the case $\xi < 1$. Define the empirical analogue to (6.13),

$$\hat{w}_r(\theta) = \int_{-\infty}^{+\infty} x H_\theta^r(x)\,dF_n(x)\,, \quad r \in \mathbb{N}_0\,,$$

where $F_n$ is the empirical df corresponding to the data $X_1, \ldots, X_n$. In order to estimate $\theta$ we solve the equations

$$w_r(\theta) = \hat{w}_r(\theta)\,, \quad r = 0, 1, 2\,.$$

We immediately obtain

$$\hat{w}_r(\theta) = \frac{1}{n} \sum_{j=1}^{n} X_{j,n} H_\theta^r(X_{j,n})\,, \quad r = 0, 1, 2\,. \tag{6.14}$$

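Recall the quantile transformation from Lemma 4.1.9(b):

$$\left(H_\theta(X_{n,n}), \ldots, H_\theta(X_{1,n})\right) \stackrel{d}{=} \left(U_{n,n}, \ldots, U_{1,n}\right),$$

where $U_{n,n} \le \cdots \le U_{1,n}$ are the order statistics of an iid sequence $U_1, \ldots, U_n$ uniformly distributed on $(0, 1)$. With this interpretation, (6.14) can be written as

$$\hat w_r(\theta) = \frac{1}{n} \sum_{j=1}^{n} X_{j,n} U_{j,n}^r, \quad r = 0, 1, 2. \tag{6.15}$$

Clearly, for $r = 0$, the rhs becomes $\overline{X}_n$, the sample mean. In order to calculate $w_r(\theta)$ for general $r$, observe that

$$w_r(\theta) = \int_{-\infty}^{+\infty} x H_\theta^r(x)\, dH_\theta(x) = \int_0^1 H_\theta^{\leftarrow}(y)\, y^r\, dy,$$

where for $0 < y < 1$,

$$H_\theta^{\leftarrow}(y) = \begin{cases} \mu - \dfrac{\psi}{\xi}\left(1 - (-\ln y)^{-\xi}\right) & \text{if } \xi \ne 0, \\[2mm] \mu - \psi \ln(-\ln y) & \text{if } \xi = 0. \end{cases}$$

This yields for $\xi < 1$ and $\xi \ne 0$, after some calculation,

$$w_r(\theta) = \frac{1}{r+1} \left\{ \mu - \frac{\psi}{\xi} \left( 1 - \Gamma(1 - \xi)(1 + r)^{\xi} \right) \right\}, \tag{6.16}$$

where $\Gamma$ denotes the Gamma function $\Gamma(t) = \int_0^\infty e^{-u} u^{t-1}\, du$, $t > 0$. A combination of (6.15) and (6.16) gives us a probability-weighted moment estimator $\hat\theta_n^{(1)}$. Further estimators can be obtained by replacing $U_{j,n}^r$ in (6.15) by some statistic. Examples are:

- $\hat\theta_n^{(2)}$, where $U_{j,n}$ is replaced by any plotting position $p_{j,n}$ as defined in Section 6.2.1.
- $\hat\theta_n^{(3)}$, where $U_{j,n}^r$ is replaced by

$$E U_{j,n}^r = \frac{(n-j)(n-j-1) \cdots (n-j-r+1)}{(n-1)(n-2) \cdots (n-r)}, \quad r = 1, 2.$$

From (6.16), we immediately obtain

$$w_0(\theta) = \mu - \frac{\psi}{\xi} \left(1 - \Gamma(1 - \xi)\right),$$
$$2 w_1(\theta) - w_0(\theta) = \frac{\psi}{\xi}\, \Gamma(1 - \xi) \left(2^{\xi} - 1\right),$$
$$3 w_2(\theta) - w_0(\theta) = \frac{\psi}{\xi}\, \Gamma(1 - \xi) \left(3^{\xi} - 1\right),$$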

and hence

$$\frac{3 w_2(\theta) - w_0(\theta)}{2 w_1(\theta) - w_0(\theta)} = \frac{3^{\xi} - 1}{2^{\xi} - 1}.$$

Applying any of the estimators above to the last equation yields an estimator $\hat\xi$ of $\xi$. Given $\hat\xi$, the parameters $\mu$ and $\psi$ are then estimated by

$$\hat\psi = \frac{(2 \hat w_1 - \hat w_0)\, \hat\xi}{\Gamma(1 - \hat\xi) \left(2^{\hat\xi} - 1\right)}, \qquad \hat\mu = \hat w_0 + \frac{\hat\psi}{\hat\xi} \left(1 - \Gamma(1 - \hat\xi)\right),$$

where $\hat w_0$, $\hat w_1$, $\hat w_2$ are any of the empirical probability-weighted moments discussed above. The case $\xi = 0$ can of course also be covered by this method. For a discussion on the behaviour of these estimators see Hosking et al. [334]. Smith [589] summarises as follows.

"The method is simple to apply and performs well in simulation studies. However, until there is some convincing theoretical explanation of its properties, it is unlikely to be universally accepted. There is also the disadvantage that, at present at least, it does not extend to more complicated situations such as regression models based on extreme value distributions."

6.3.3 Tail and Quantile Estimation, a First Go

Let us return to the basic set-up of (6.11) and (6.12), i.e. we have an iid sample $X_1, \ldots, X_n$ from $H_\theta$. In this situation, a quantile estimator can be readily obtained. Indeed, by the methods discussed in the previous sections, we obtain an estimate $\hat\theta$ of $\theta$. Given any $p \in (0, 1)$, the $p$-quantile $x_p$ is defined via $x_p = H_\theta^{\leftarrow}(p)$; see Definition 3.3.5. A natural estimator for $x_p$, based on $X_1, \ldots, X_n$, then becomes

$$\hat x_p = H_{\hat\theta}^{\leftarrow}(p).$$

By the definition of $H_\theta$ this leads to

$$\hat x_p = \hat\mu - \frac{\hat\psi}{\hat\xi} \left(1 - (-\ln p)^{-\hat\xi}\right).$$

The corresponding tail estimate for $\overline{H}_\theta(x)$, for $x$ in the appropriate domain, is

$$\overline{H}_{\hat\theta}(x) = 1 - \exp\left\{ -\left(1 + \hat\xi\, \frac{x - \hat\mu}{\hat\psi}\right)^{-1/\hat\xi} \right\},$$

where $\hat\theta = (\hat\xi, \hat\mu, \hat\psi)$ is either estimated by the MLE or by a probability-weighted moment estimator.

Notes and Comments

A recommendable account of estimation methods for the GEV, including a detailed discussion of the pros and cons of the different methods, is Buishand [99]. Hosking [331] discusses the problem of hypothesis testing within GEV.

If the extreme value distribution is known to be Fréchet, Gumbel or Weibull, the above methods can be adapted to the specific df under consideration. This may simplify the estimation problem in the case of $\xi \ge 0$ (Fréchet, Gumbel), but not for the Weibull distribution. The latter is due to non-regularity problems of the MLE as explained in Section 6.3.1. The vast amount of papers written on estimation for the three-parameter Weibull reflects this situation; see for instance Lawless [410, 411], Lockhart and Stephens [422], Mann [436], Smith and Naylor [590] and references therein. To indicate the sort of problems that may occur, we refer to Smith [583] who studies the Pareto-like probability densities

$$f(x; K, \alpha) \sim c\, \alpha\, (K - x)^{\alpha - 1}, \quad x \uparrow K,$$

where $K$ and $\alpha$ are unknown parameters.

6.4 Estimating under Maximum Domain of Attraction Conditions

6.4.1 Introduction

Relaxing condition (6.11), we assume in this section that for some $\xi \in \mathbb{R}$,

$$X_1, \ldots, X_n \text{ are iid from } F \in \mathrm{MDA}(H_\xi). \tag{6.17}$$

By Proposition 3.3.2, $F \in \mathrm{MDA}(H_\xi)$ is equivalent to

$$\lim_{n \to \infty} n \overline{F}(c_n x + d_n) = -\ln H_\xi(x) \tag{6.18}$$

for appropriate norming sequences $(c_n)$ and $(d_n)$, where $x$ belongs to a suitable domain depending on the sign of $\xi$. Let us from the start be very clear about the fundamental difference between (6.11) and (6.17). Consider for illustrative purposes only the standard Fréchet case $\xi = 1/\alpha > 0$. Now (6.11) means that our sample $X_1, \ldots, X_n$ exactly follows a Fréchet distribution, i.e.

$$\overline{F}(x) = 1 - \exp\left\{-x^{-\alpha}\right\}, \quad x > 0.$$

On the other hand, by virtue of Theorem 3.3.7 assumption (6.17) reduces in the Fréchet case to

$$\overline{F}(x) = x^{-\alpha} L(x), \quad x > 0,$$

for some slowly varying function $L$. Clearly, in this case the estimation of the tail $\overline{F}(x)$ is much more involved due to the non-parametric character of $L$. In various applications, one would mainly (in some cases, solely) be interested in $\alpha$. So (6.11) amounts to full parametric assumptions, whereas (6.17) is essentially semi-parametric in nature: there is a parametric part $\alpha$ and a non-parametric part $L$. Because of this difference, inference under (6.17) is usually regarded as inference for heavy-tailed distributions, as opposed to inference for the GEV under (6.11).

A handwaving consequence of (6.18) is that for large $u = c_n x + d_n$,

$$n \overline{F}(u) \approx \left(1 + \xi\, \frac{u - d_n}{c_n}\right)^{-1/\xi},$$

so that a tail-estimator could take on the form

$$\left(\overline{F}(u)\right)^{\wedge} = \frac{1}{n} \left(1 + \hat\xi\, \frac{u - \hat d_n}{\hat c_n}\right)^{-1/\hat\xi}, \tag{6.19}$$

for appropriate estimators $\hat\xi$, $\hat c_n$ and $\hat d_n$. As (6.17) is essentially a tail-property, estimation of $\xi$ may be based on $k$ upper order statistics $X_{k,n} \le \cdots \le X_{1,n}$. A whole battery of classical approaches has exploited this natural idea; see Section 6.4.2. The following mathematical conditions are usually imposed:

(a) $k(n) \to \infty$: use a sufficiently large number of order statistics, but
(b) $n/k(n) \to \infty$: as we are interested in a tail property, we should also make sure to concentrate only on the upper order statistics. Let the tails speak for themselves. (6.20)

When working out the details later, we will be able to see where exactly the properties on $(k(n))$ enter. Indeed it is precisely this (for the moment perhaps redundant) degree of freedom $k$ which will allow us to obtain the necessary statistical properties like consistency and asymptotic normality for our estimators.

From (6.19) we would in principle be in the position to estimate the quantile $x_p = F^{\leftarrow}(p)$, for fixed $p \in (0, 1)$, as follows

$$\hat x_p = \hat d_n + \frac{\hat c_n}{\hat\xi} \left( (n(1 - p))^{-\hat\xi} - 1 \right). \tag{6.21}$$

Typically, we will be interested in estimating high $p$-quantiles outside the sample $X_1, \ldots, X_n$. This means that $p = p_n$ is chosen in such a way that $p > 1 - 1/n$, hence the empirical df satisfies $\overline{F}_n(p) = 0$ and does not yield any information about such quantiles. In order to get good estimators for $\xi$, $c_n$ and $d_n$ in (6.21) a subsequence trick is needed. Assume for notational convenience that $n/k \in \mathbb{N}$. A standard approach now consists of passing to a subsequence $(n/k)$ with $k = k(n)$ satisfying (6.20). The quantile $x_p$ is then estimated by

$$\hat x_p = \hat d_{n/k} + \frac{\hat c_{n/k}}{\hat\xi} \left( \left(\frac{n}{k}(1 - p_n)\right)^{-\hat\xi} - 1 \right). \tag{6.22}$$

Why does this work? One reason behind this construction is that we need to estimate at two levels. First, we have to find a reliable estimate for $\xi$: this task will be worked out in Section 6.4.2. Condition (6.20) will appear very naturally. Second, we need to estimate the norming constants $c_n$ and $d_n$ which themselves are defined via quantiles of $F$. For instance, in the Fréchet case we know that $c_n = F^{\leftarrow}(1 - n^{-1})$; see Theorem 3.3.7. Hence estimating $c_n$ is equivalent to the problem of estimating $x_p$ at the boundary of our data range. By going to the subsequence $(n/k)$, we move away from the critical boundary value $1 - n^{-1}$ to the safer $1 - (n/k)^{-1}$. Estimating $c_{n/k}$ is thus reduced to estimating quantiles within the range of our data. Similar arguments hold for $d_{n/k}$, and indeed for the Gumbel and Weibull case. We may therefore hope that the construction in (6.22) leads to a good estimator for $x_p$. The above discussion is only heuristic; a detailed statistical analysis shows that this approach can be made to work.

In the context of statistics of extremal events it may also be of interest to estimate the following quantity which is closely related to the quantiles $x_p$:

$$x_{p,r} = F^{\leftarrow}(p^{1/r}), \quad r \in \mathbb{N}.$$

Notice that $x_p = x_{p,1}$.
The interpretation of $x_{p,r}$ is obvious from

$$p = F^r(x_{p,r}) = P\left(\max(X_{n+1}, \ldots, X_{n+r}) \le x_{p,r}\right),$$

so $x_{p,r}$ is that level which, with a given probability $p$, will not be exceeded by the next $r$ observations $X_{n+1}, \ldots, X_{n+r}$. As an estimate we then obtain from (6.22)

$$\hat x_{p,r} = \hat d_{n/k} + \frac{\hat c_{n/k}}{\hat\xi} \left( \left(\frac{n}{k}\left(1 - p^{1/r}\right)\right)^{-\hat\xi} - 1 \right).$$
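The estimator for $x_{p,r}$ is a plain formula once estimates of the shape parameter and of the norming constants are available. The following is a minimal sketch only: the function name and argument names are hypothetical, the three estimates are assumed to be supplied by the user, and the treatment of $\hat\xi = 0$ by the limit $-\ln y$ is an assumed convention rather than something stated in the text.

```python
import math

def mda_quantile_estimate(xi_hat, c_hat, d_hat, n, k, p, r=1):
    """Evaluate d + (c / xi) * (((n/k) * (1 - p^(1/r)))^(-xi) - 1).

    With r = 1 this is the form of (6.22). xi_hat, c_hat and d_hat stand for
    estimates of xi, c_{n/k} and d_{n/k} obtained elsewhere.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    if r < 1 or k < 1 or k > n:
        raise ValueError("need r >= 1 and 1 <= k <= n")
    y = (n / k) * (1.0 - p ** (1.0 / r))  # argument of the power in (6.22)
    if abs(xi_hat) < 1e-12:
        # Assumed convention: the limit of (y^(-xi) - 1) / xi as xi -> 0 is -ln(y).
        return d_hat - c_hat * math.log(y)
    return d_hat + c_hat / xi_hat * (y ** (-xi_hat) - 1.0)
```

For example, `mda_quantile_estimate(0.5, 2.0, 10.0, n=1000, k=100, p=0.999)` evaluates $10 + 4\,(0.01^{-1/2} - 1)$.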

In what follows we will concentrate only on estimation of $x_p$; from the definition of $x_{p,r}$ it is clear how one has to proceed for general $r$.

From the above heuristics we obtain a programme for the remainder of this section:

(a) Find appropriate estimators for the shape parameter $\xi$ of the GEV.
(b) Find appropriate estimators for the norming constants $c_n$ and $d_n$.
(c) Show that the estimators proposed above yield reasonable approximations to the distribution tail in its far end and to high quantiles.
(d) Determine the statistical properties of these estimators.

6.4.2 Estimating the Shape Parameter $\xi$

In this section we study different estimators of the shape parameter $\xi$ for $F \in \mathrm{MDA}(H_\xi)$. We also give some of their statistical properties.

Method 1: Pickands's Estimator for $\xi \in \mathbb{R}$

The basic idea behind this estimator consists of finding a condition equivalent to $F \in \mathrm{MDA}(H_\xi)$ which involves the parameter $\xi$ in an easy way. The key to Pickands's estimator and its various generalisations is Theorem 3.4.5, where it was shown that for $F \in \mathrm{MDA}(H_\xi)$, $U(t) = F^{\leftarrow}(1 - t^{-1})$ satisfies

$$\lim_{t \to \infty} \frac{U(2t) - U(t)}{U(t) - U(t/2)} = 2^{\xi}.$$

Furthermore, this limit relation holds locally uniformly. Hence whenever $\lim_{t \to \infty} c(t) = 2$ exists for a positive function $c$,

$$\lim_{t \to \infty} \frac{U(c(t)\,t) - U(t)}{U(t) - U(t/c(t))} = 2^{\xi}. \tag{6.23}$$

The basic idea now consists of constructing an empirical estimator using (6.23). To that effect, let

$$V_{n,n} \le \cdots \le V_{1,n}$$

be the order statistics from an iid sample $V_1, \ldots, V_n$ with common Pareto df $F_V(x) = 1 - x^{-1}$, $x \ge 1$. It follows in the same way as for the quantile transformation, see Lemma 4.1.9(b), that

$$(X_{k,n})_{k=1,\ldots,n} \stackrel{d}{=} (U(V_{k,n}))_{k=1,\ldots,n},$$

where $X_1, \ldots, X_n$ are iid with df $F$. Notice that $V_{k,n}$ is the empirical $(1 - k/n)$-quantile of $F_V$. Using similar methods as in Examples 4.1.11 and 4.1.12, i.e. making use of the quantile transformation, it is not difficult to see that

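$$\frac{k}{n}\, V_{k,n} \stackrel{P}{\to} 1, \quad n \to \infty,$$

whenever $k = k(n) \to \infty$ and $k/n \to 0$. In particular,

$$V_{k,n} \stackrel{P}{\to} \infty \quad \text{and} \quad \frac{V_{2k,n}}{V_{k,n}} \stackrel{P}{\to} \frac{1}{2}, \quad n \to \infty.$$

Combining this with (6.23) and using a subsequence argument, see Appendix A1.2, yields

$$\frac{U(V_{k,n}) - U(V_{2k,n})}{U(V_{2k,n}) - U(V_{4k,n})} \stackrel{P}{\to} 2^{\xi}, \quad n \to \infty.$$

Motivated by the discussion above and by (6.23), we now define the Pickands estimator

$$\hat\xi^{(P)}_{k,n} = \frac{1}{\ln 2}\, \ln \frac{X_{k,n} - X_{2k,n}}{X_{2k,n} - X_{4k,n}}. \tag{6.24}$$

This estimator turns out to be weakly consistent provided $k \to \infty$, $k/n \to 0$:

$$\hat\xi^{(P)}_{k,n} \stackrel{P}{\to} \xi, \quad n \to \infty.$$

This was already observed by Pickands [493]. A full analysis of $\hat\xi^{(P)}_{k,n}$ is to be found in Dekkers and de Haan [170] from which the following result is taken.

Theorem 6.4.1 (Properties of the Pickands estimator)
Suppose $(X_n)$ is an iid sequence with df $F \in \mathrm{MDA}(H_\xi)$, $\xi \in \mathbb{R}$. Let $\hat\xi^{(P)} = \hat\xi^{(P)}_{k,n}$ be the Pickands estimator (6.24).

(a) (Weak consistency) If $k \to \infty$, $k/n \to 0$ for $n \to \infty$, then
$$\hat\xi^{(P)} \stackrel{P}{\to} \xi, \quad n \to \infty.$$

(b) (Strong consistency) If $k/n \to 0$, $k/\ln\ln n \to \infty$ for $n \to \infty$, then
$$\hat\xi^{(P)} \stackrel{\text{a.s.}}{\to} \xi, \quad n \to \infty.$$

(c) (Asymptotic normality) Under further conditions on $k$ and $F$ (see Dekkers and de Haan [170], p. 1799),
$$\sqrt{k}\, (\hat\xi - \xi) \stackrel{d}{\to} N(0, v(\xi)), \quad n \to \infty,$$
where
$$v(\xi) = \frac{\xi^2 \left(2^{2\xi+1} + 1\right)}{\left(2\, (2^{\xi} - 1) \ln 2\right)^2}. \qquad \Box$$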
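The estimator (6.24) and the variance function $v(\xi)$ of part (c) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the value used at $\xi = 0$ is the limit of $v(\xi)$ as $\xi \to 0$ (an assumption made here to avoid a 0/0 expression), and the confidence interval simply plugs the estimate into $v$, which presumes that the conditions of part (c) hold.

```python
import math

def pickands_estimator(sample, k):
    """Pickands estimate (6.24) built from X_{k,n}, X_{2k,n} and X_{4k,n}."""
    x = sorted(sample, reverse=True)  # x[j - 1] is X_{j,n}; x[0] is the maximum
    if k < 1 or 4 * k > len(x):
        raise ValueError("need k >= 1 and 4k <= n")
    num = x[k - 1] - x[2 * k - 1]
    den = x[2 * k - 1] - x[4 * k - 1]
    if num <= 0 or den <= 0:
        raise ValueError("tied order statistics: the log-ratio is undefined")
    return math.log(num / den) / math.log(2.0)

def pickands_asymptotic_variance(xi):
    """v(xi) from Theorem 6.4.1(c); at xi = 0 the limiting value is used."""
    if abs(xi) < 1e-8:
        return 3.0 / (4.0 * math.log(2.0) ** 4)  # limit of v(xi) as xi -> 0
    return xi ** 2 * (2.0 ** (2.0 * xi + 1.0) + 1.0) / (
        2.0 * (2.0 ** xi - 1.0) * math.log(2.0)
    ) ** 2

def pickands_confidence_interval(sample, k, z=1.96):
    """Plug-in interval: estimate +/- z * sqrt(v(estimate) / k)."""
    est = pickands_estimator(sample, k)
    half = z * math.sqrt(pickands_asymptotic_variance(est) / k)
    return est - half, est + half
```

A quick sanity check is to feed in the deterministic values $x_j = ((n/j)^{\xi} - 1)/\xi$, $j = 1, \ldots, n$, for which the ratio in (6.24) equals $2^{\xi}$ exactly, so the function should return $\xi$ up to rounding.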

Figure 6.4.2 Pickands-plot for the water-damage claim data; see Figure 6.3.1. The estimate of $\xi$ appears to be close to 0. The upper and lower lines constitute asymptotic 95% confidence bands.

Remarks. 1) This theorem forms the core of a whole series of results obtained in Dekkers and de Haan [170]; on it one can base quantile and tail estimators and (asymptotic) confidence interval constructions. The quoted paper [170] also contains various simulated and real life examples in order to see the theory in action. We strongly advise the reader to go through it, perhaps avoiding upon first reading the (rather extensive) technical details. The main idea behind the above construction goes back to Pickands [493]. A nice summary is to be found in de Haan [290] from which the derivation above is taken.

2) In the spirit of Section 6.4.1 notice that the calculation of Pickands's estimator (6.24) involves a sequence of upper order statistics increasing with $n$. Consequently, one mostly includes a so-called Pickands-plot in the analysis, i.e.

$$\left\{ (k, \hat\xi^{(P)}_{k,n}) : k = 1, \ldots, n \right\},$$

in order to allow for a choice depending on $k$. The interpretation of such plots, i.e. the optimal choice of $k$, is a delicate point for which no uniformly best solution exists. It is intuitively clear that one should choose $\hat\xi^{(P)}_{k,n}$ from such a $k$-region where the plot is roughly horizontal. We shall come back to this point later; see the Summary at the end of this section. $\Box$

Method 2: Hill's Estimator for $\xi = \alpha^{-1} > 0$

Suppose $X_1, \ldots, X_n$ are iid with df $F \in \mathrm{MDA}(\Phi_\alpha)$, $\alpha > 0$, thus $\overline{F}(x) = x^{-\alpha} L(x)$, $x > 0$, for a slowly varying function $L$; see Theorem 3.3.7. Distributions with such tails form the prime examples for modelling heavy-tailed phenomena; see for instance Section 1.3. For many applications the knowledge of the index $\alpha$ of regular variation is of major importance. If for instance $\alpha < 2$ then $E X_1^2 = \infty$. This case is often observed in the modelling of insurance data; see for instance Hogg and Klugman [326].

Empirical studies on the tails of daily log-returns in finance have indicated that one frequently encounters values $\alpha$ between 3 and 4; see for instance Guillaume et al. [285], Longin [424] and Loretan and Phillips [425]. Information of the latter type implies that, whereas covariances of such data would be well defined, the construction of confidence intervals for the sample autocovariances and autocorrelations on the basis of asymptotic (central limit) theory may be questionable as typically a finite fourth moment condition is asked for.

The Hill estimator of $\alpha$ essentially takes on the following form:

$$\hat\alpha^{(H)} = \hat\alpha^{(H)}_{k,n} = \left( \frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k,n} \right)^{-1}, \tag{6.25}$$

where $k = k(n) \to \infty$ in an appropriate way, so that as in the case of Pickands's estimator, an increasing sequence of upper order statistics is used. One of the interesting facts concerning (6.25) is that various asymptotically equivalent versions of $\hat\alpha^{(H)}$ can be derived through essentially different methods, showing that the Hill estimator is very natural. Below we discuss some derivations.

The MLE approach (Hill [322]). Assume for the moment that $X$ is a rv with df $F$ so that for $\alpha > 0$

$$P(X > x) = \overline{F}(x) = x^{-\alpha}, \quad x \ge 1.$$

Then it immediately follows that $Y = \ln X$ satisfies

$$P(Y > y) = e^{-\alpha y}, \quad y \ge 0,$$

i.e. $Y$ is $\mathrm{Exp}(\alpha)$ and hence the MLE of $\alpha$ is given by

$$\hat\alpha_n = \overline{Y}_n^{-1} = \left( \frac{1}{n} \sum_{j=1}^{n} \ln X_j \right)^{-1} = \left( \frac{1}{n} \sum_{j=1}^{n} \ln X_{j,n} \right)^{-1}.$$

A trivial generalisation concerns

$$\overline{F}(x) = C x^{-\alpha}, \quad x \ge u > 0, \tag{6.26}$$

with $u$ known. If we interpret (6.26) as fully specified, i.e. $C = u^{\alpha}$, then we immediately obtain as MLE of $\alpha$

$$\hat\alpha_n = \left( \frac{1}{n} \sum_{j=1}^{n} \ln\left(\frac{X_{j,n}}{u}\right) \right)^{-1} = \left( \frac{1}{n} \sum_{j=1}^{n} \ln X_{j,n} - \ln u \right)^{-1}. \tag{6.27}$$

Now we often do not have the precise parametric information of these examples, but in the spirit of $\mathrm{MDA}(\Phi_\alpha)$ we assume that $F$ behaves like a Pareto df above a certain known threshold $u$ say. Let

$$K = \mathrm{card}\left\{ i : X_{i,n} > u,\ i = 1, \ldots, n \right\}. \tag{6.28}$$

Conditionally on the event $\{K = k\}$, maximum likelihood estimation of $\alpha$ and $C$ in (6.26) reduces to maximising the joint density of $(X_{k,n}, \ldots, X_{1,n})$. From Theorem 4.1.3 we deduce

$$f_{X_{k,n}, \ldots, X_{1,n}}(x_k, \ldots, x_1) = \frac{n!}{(n-k)!} \left(1 - C x_k^{-\alpha}\right)^{n-k} C^k \alpha^k \prod_{i=1}^{k} x_i^{-(\alpha+1)}, \quad u < x_k < \cdots < x_1.$$

A straightforward calculation yields the conditional MLEs

$$\hat\alpha^{(H)}_{k,n} = \left( \frac{1}{k} \sum_{j=1}^{k} \ln\left(\frac{X_{j,n}}{X_{k,n}}\right) \right)^{-1} = \left( \frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k,n} \right)^{-1},$$

$$\hat C_{k,n} = \frac{k}{n}\, X_{k,n}^{\hat\alpha^{(H)}_{k,n}}.$$

So Hill's estimator has the same form as the MLE in the exact model underlying (6.27) but now having the deterministic $u$ replaced by the random threshold $X_{k,n}$, where $k$ is defined through (6.28). We also immediately obtain an estimate for the tail $\overline{F}(x)$,

$$\left(\overline{F}(x)\right)^{\wedge} = \frac{k}{n} \left(\frac{x}{X_{k,n}}\right)^{-\hat\alpha^{(H)}_{k,n}}, \tag{6.29}$$

and for the $p$-quantile,

$$\hat x_p = \left(\frac{n}{k}(1 - p)\right)^{-1/\hat\alpha^{(H)}_{k,n}} X_{k,n}. \tag{6.30}$$
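Formulas (6.25), (6.29) and (6.30) can be sketched in a few lines. The function names below are hypothetical and the code assumes a sample whose $k$ largest values are positive; it is meant as an illustration of the formulas, not as a recommendation on how to choose $k$.

```python
import math

def _upper_order_statistics(sample, k):
    """Return the sample sorted in decreasing order, after basic checks."""
    x = sorted(sample, reverse=True)  # x[j - 1] is X_{j,n}
    if not 2 <= k <= len(x):
        raise ValueError("need 2 <= k <= n")
    if x[k - 1] <= 0:
        raise ValueError("the k largest observations must be positive")
    return x

def hill_alpha(sample, k):
    """Hill estimate (6.25) of the tail index alpha."""
    x = _upper_order_statistics(sample, k)
    mean_log_excess = sum(math.log(x[j]) for j in range(k)) / k - math.log(x[k - 1])
    if mean_log_excess <= 0:
        raise ValueError("the k largest observations are all equal")
    return 1.0 / mean_log_excess

def hill_tail_estimate(sample, k, x_value):
    """Tail estimate (6.29); intended for x_value >= X_{k,n}."""
    x = _upper_order_statistics(sample, k)
    alpha = hill_alpha(sample, k)
    return (k / len(x)) * (x_value / x[k - 1]) ** (-alpha)

def hill_quantile_estimate(sample, k, p):
    """Quantile estimate (6.30) for 0 < p < 1."""
    x = _upper_order_statistics(sample, k)
    alpha = hill_alpha(sample, k)
    return ((len(x) / k) * (1.0 - p)) ** (-1.0 / alpha) * x[k - 1]
```

By construction the two estimates are inverse to each other: evaluating the tail estimate at the estimated $p$-quantile returns $1 - p$, and at $x = X_{k,n}$ it returns $k/n$.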

Figure 6.4.3 Tail and quantile estimation based on a Hill-fit; see (6.29) and (6.30). The data are the Danish insurance claims from Example 6.2.9. Top, left: the Hill-plot for $\xi = 1/\alpha$ as a function of $k$ upper order statistics (lower horizontal axis) and of the threshold $u$ (upper horizontal axis), i.e. there are $k$ exceedances of the threshold $u$. Top, right: the fit of the shifted excess df $F_u(x - u)$, $x \ge u$, on log-scale. Middle: tail-fit of $\overline{F}(x + u)$, $x \ge 0$. Bottom: estimation of the 0.99-quantile as a function of the $k$ upper order statistics and of the corresponding threshold value $u$. The estimation of the tail is based on $k = 109$ ($u = 10$) and $\hat\alpha = \hat\xi^{-1} = 1.618$. Compare also with the GPD-fit in Figure 6.5.5.

Figure 6.4.4 Tail and quantile estimation based on a Hill-fit; see (6.29) and (6.30). The data are the industrial fire claims from Example 6.2.9. Top, left: the Hill-plot for $\xi = 1/\alpha$ as a function of $k$ upper order statistics (lower horizontal axis) and of the threshold $u$ (upper horizontal axis), i.e. there are $k$ exceedances of the threshold $u$. Top, right: the fit of the shifted excess df $F_u(x - u)$, $x \ge u$, on log-scale. Middle: tail-fit of $\overline{F}(x + u)$, $x \ge 0$. Bottom: estimation of the 0.99-quantile as a function of the $k$ upper order statistics and of the corresponding threshold value $u$. The estimation of the tail is based on $k = 149$ ($u = 100$) and $\hat\alpha = \hat\xi^{-1} = 1.058$. Compare also with the GPD-fit in Figure 6.5.5.

From (6.29) we obtain an estimator of the excess df $F_u(x - u)$, $x \ge u$, by using $F_u(x - u) = 1 - \overline{F}(x)/\overline{F}(u)$. Examples based on these estimators are to be found in Figures 6.4.3 and 6.4.4 for the Danish, respectively industrial, fire insurance data. We will come back to these data in more detail in Section 6.5.2.

Example 6.4.5 (The Hill estimator at work)
In Figures 6.4.3 and 6.4.4 we have applied the above methods to the Danish fire insurance data (Figure 6.2.11) and the industrial fire insurance data (Figure 6.2.12); for a preliminary analysis see Example 6.2.9. For the Danish data we have chosen as an initial threshold $u = 10$ ($k = 109$). The corresponding Hill estimate has a value $\hat\xi = 0.618$. When changed to $u = 18$ ($k = 47$), we obtain $\hat\xi = 0.497$. The Hill-plot shows a fairly stable behaviour in the range $(0.5, 0.7)$. As in most applications, the quantities of main interest are the high quantiles. We therefore turn immediately to Figure 6.4.3 (bottom), where $\hat x_{0.99}$ is plotted across all relevant $u$- (equivalently, $k$-) values. For $k$ in the region $(45, 110)$ the quantile-Hill-plot shows a remarkably stable behaviour around the value 24.7. This agrees perfectly with the empirical estimate of 24.6 for $x_{0.99}$; see Table 6.2.13. This should be contrasted with the situation in Figure 6.4.4 for the industrial fire data. For the latter data, the estimate for $\xi$ ranges from 0.945 for $u = 100$ ($k = 149$) over 0.745 for $u = 300$ ($k = 49$) to 0.799 for $u = 500$ ($k = 29$). All estimates clearly correspond to infinite variance models! An estimate for $x_{0.99}$ in the range $(180, 200)$ emerges, again in agreement with the empirical value of 184. We would like to stress at this point that the above discussion represents only the beginning of a detailed analysis. The further discussions have to be conducted together with the actuary responsible for the underlying data. $\Box$

The regular variation approach (de Haan [291]).
This approach is in the same spirit as the construction of Pickands's estimator, i.e. base the inference on a suitable reformulation of $F \in \mathrm{MDA}(\Phi_\alpha)$. Indeed $F \in \mathrm{MDA}(\Phi_\alpha)$ if and only if

$$\lim_{t \to \infty} \frac{\overline{F}(tx)}{\overline{F}(t)} = x^{-\alpha}, \quad x > 0.$$

Using partial integration, we obtain

$$\int_t^\infty (\ln x - \ln t)\, dF(x) = \int_t^\infty \frac{\overline{F}(x)}{x}\, dx,$$

so that by Karamata's theorem (Theorem A3.6)

$$\frac{1}{\overline{F}(t)} \int_t^\infty (\ln x - \ln t)\, dF(x) \to \frac{1}{\alpha}, \quad t \to \infty. \tag{6.31}$$

How do we find an estimator from this result? Two choices have to be made:

(a) replace $F$ by an estimator; the obvious candidate here is the empirical distribution function

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{\{X_i \le x\}} = \frac{1}{n} \sum_{i=1}^{n} I_{\{X_{i,n} \le x\}},$$

(b) replace $t$ by an appropriate high, data dependent level (recall $t \to \infty$); we take $t = X_{k,n}$ for some $k = k(n)$.

The choice of $t$ is motivated by the fact that $X_{k,n} \stackrel{\text{a.s.}}{\to} \infty$ provided $k = k(n) \to \infty$ and $k/n \to 0$; see Proposition 4.1.14. From (6.31) the following estimator results:

$$\frac{1}{\overline{F}_n(X_{k,n})} \int_{X_{k,n}}^\infty (\ln x - \ln X_{k,n})\, dF_n(x) = \frac{1}{k-1} \sum_{j=1}^{k-1} \ln X_{j,n} - \ln X_{k,n},$$

which, modulo the factor $k - 1$, is again of the form $(\hat\alpha^{(H)})^{-1}$ in (6.25). Notice that the change from $k$ to $k - 1$ is asymptotically negligible; see Example 4.1.11.

The mean excess function approach. This is essentially a reformulation of the approach above; we prefer to list it separately because of its methodological merit. Suppose $X$ is a rv with df $F \in \mathrm{MDA}(\Phi_\alpha)$, $\alpha > 0$, and for notational convenience assume that $X > 1$ a.s. One can now rewrite (6.31) as follows (see also Example 6.2.5):

$$E(\ln X - \ln t \mid \ln X > \ln t) \to \frac{1}{\alpha}, \quad t \to \infty.$$

So denoting $u = \ln t$ and $e^*(u)$ the mean excess function of $\ln X$ (see Definition 6.2.3) we obtain

$$e^*(u) \to \frac{1}{\alpha}, \quad u \to \infty.$$

The reciprocal of Hill's estimator can then be interpreted as the empirical mean excess function of $\ln X$ calculated at the threshold $u = \ln X_{k,n}$, i.e. $e_n^*(\ln X_{k,n})$.

We summarise as follows.

Suppose $X_1, \ldots, X_n$ are iid with df $F \in \mathrm{MDA}(\Phi_\alpha)$, $\alpha > 0$, then a natural estimator for $\alpha$ is provided by Hill's estimator

$$\hat\alpha^{(H)}_{k,n} = \left( \frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k,n} \right)^{-1}, \tag{6.32}$$

where $k = k(n)$ satisfies (6.20).
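The mean excess interpretation can be checked numerically. The sketch below assumes that the empirical mean excess function is the average of the excesses over $u$ among the observations strictly above $u$ (the usual reading of Definition 6.2.3), that the data are positive, and that there are no ties; all function names are hypothetical. Under these assumptions the empirical mean excess function of the log-data at $\ln X_{k,n}$ averages over $k - 1$ points, so it equals $k/(k-1)$ times the reciprocal of (6.32), in line with the factor noted in the regular variation approach.

```python
import math

def empirical_mean_excess(values, u):
    """Average of (v - u) over all observations v strictly above u."""
    excesses = [v - u for v in values if v > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses)

def hill_reciprocal(sample, k):
    """(1/k) * sum_{j<=k} ln X_{j,n} - ln X_{k,n}, the reciprocal of (6.32)."""
    x = sorted(sample, reverse=True)  # x[j - 1] is X_{j,n}
    return sum(math.log(x[j]) for j in range(k)) / k - math.log(x[k - 1])

def mean_excess_of_logs_at_kth(sample, k):
    """Empirical mean excess function of ln X evaluated at u = ln X_{k,n}."""
    logs = sorted((math.log(v) for v in sample), reverse=True)
    return empirical_mean_excess(logs, logs[k - 1])
```

For the toy sample $(1, e, e^2, e^3)$ and $k = 3$ the reciprocal of Hill's estimator is 1, while the empirical mean excess of the logs at $\ln X_{3,4} = 1$ is $3/2$.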

Below we summarise the main properties of the Hill estimator. Before looking at the theorem, you may want to refresh your memory on the meaning of linear processes (see Sections 5.5 and 7.1) and weakly dependent strictly stationary processes (see Section 4.4).

Theorem 6.4.6 (Properties of the Hill estimator)
Suppose $(X_n)$ is strictly stationary with marginal distribution $F$ satisfying for some $\alpha > 0$ and $L \in \mathcal{R}_0$,

$$\overline{F}(x) = P(X > x) = x^{-\alpha} L(x), \quad x > 0.$$

Let $\hat\alpha^{(H)} = \hat\alpha^{(H)}_{k,n}$ be the Hill estimator (6.32).

(a) (Weak consistency) Assume that one of the following conditions is satisfied:
- $(X_n)$ is iid (Mason [441]),
- $(X_n)$ is weakly dependent (Rootzén, Leadbetter and de Haan [546], Hsing [337]),
- $(X_n)$ is a linear process (Resnick and Stărică [529, 531]).
If $k \to \infty$, $k/n \to 0$ for $n \to \infty$, then
$$\hat\alpha^{(H)} \stackrel{P}{\to} \alpha.$$

(b) (Strong consistency) (Deheuvels, Häusler and Mason [167]) If $k/n \to 0$, $k/\ln\ln n \to \infty$ for $n \to \infty$ and $(X_n)$ is an iid sequence, then
$$\hat\alpha^{(H)} \stackrel{\text{a.s.}}{\to} \alpha.$$

(c) (Asymptotic normality) If further conditions on $k$ and $F$ are satisfied (see for instance the Notes and Comments below) and $(X_n)$ is an iid sequence, then
$$\sqrt{k} \left(\hat\alpha^{(H)} - \alpha\right) \stackrel{d}{\to} N\left(0, \alpha^2\right). \qquad \Box$$

Remarks. 3) Theorem 6.4.6 should be viewed as a counterpart to Theorem 6.4.1 on the Pickands estimator. Because of the importance of $\hat\alpha^{(H)}$, we prefer to formulate Theorem 6.4.6 in its present form for sequences $(X_n)$ more general than iid.

4) Do not interpret this theorem as saying that the Hill estimator is always fine. The theorem only says that rather generally the standard statistical properties hold. One still needs a crucial set of conditions on $F$ and $k(n)$. In particular, second-order regular variation assumptions on $F$ have to be imposed to derive the asymptotic normality of $\hat\alpha^{(H)}$. The same applies to the case of the Pickands estimator. Notice that these conditions are not verifiable in practice.

Figure 6.4.7 Pickands-, Hill- and DEdH-plots for 2 000 simulated iid data with df given by $\overline{F}(x) = x^{-1}$, $x \ge 1$. For reasons of comparison, we choose the Hill estimator for $\alpha = 1$ as $\hat\xi^{(H)} = (\hat\alpha^{(H)})^{-1}$. Various Hill- and related plots from simulated and real life data are for instance given in Figures 6.4.11, 6.4.12 and Section 6.5.2. See also Figure 4.1.13 for a "Hill horror plot" for the tail $\overline{F}(x) = 1/(x \ln x)$ and Figure 5.5.4 for the case of dependent data.

5) In Example 4.1.12 we prove the weak consistency of Hill's estimator for the iid case and indicate how to prove its asymptotic normality. Moreover, we give an example showing that the rate of convergence of Hill's estimator can be arbitrarily slow.

6) As in the case of the Pickands estimator, an analysis based on the Hill estimator is usually summarised graphically. The Hill-plot

$$\left\{ (k, \hat\alpha^{(H)}_{k,n}) : k = 2, \ldots, n \right\}$$

is instrumental in finding the optimal $k$. Smoothing Hill-plots over a specific range of $k$-values may defuse the critical problem of the choice of $k$; see for instance Resnick and Stărică [530].

7) The asymptotic variance of $\hat\alpha^{(H)}$ depends on the unknown parameter $\alpha$ so that in order to calculate the asymptotic confidence intervals an appropriate estimator of $\alpha$, typically $\hat\alpha^{(H)}$, has to be inserted.

8) The Hill estimator is very sensitive with respect to dependence in the data; see for instance Figure 5.5.4 in the case of an autoregressive process. For ARMA and weakly dependent processes special techniques have been developed, for instance by first fitting an ARMA model to the data and then applying the Hill estimator to the residuals. See for instance the references mentioned under part (a) of Theorem 6.4.6. $\Box$
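The Hill-plot of Remark 6 and the plug-in confidence intervals of Remark 7 can be computed in one pass over the ordered sample. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the bands use the iid limit $N(0, \alpha^2)$ of Theorem 6.4.6(c) with $\hat\alpha^{(H)}_{k,n}$ inserted for $\alpha$, and only positive order statistics are used. Plotting is left to the reader's preferred tool.

```python
import math

def hill_plot_points(sample, z=1.96):
    """Return a list of (k, alpha_hat, lower, upper) for k = 2, 3, ...

    lower/upper are alpha_hat -/+ z * alpha_hat / sqrt(k), i.e. the asymptotic
    band with the Hill estimate plugged in for the unknown alpha.
    """
    x = sorted(sample, reverse=True)  # x[k - 1] is X_{k,n}
    points = []
    running_log_sum = 0.0             # sum of ln X_{j,n} for j <= k
    for k in range(1, len(x) + 1):
        if x[k - 1] <= 0:
            break                     # logarithms need positive order statistics
        log_xk = math.log(x[k - 1])
        running_log_sum += log_xk
        if k < 2:
            continue
        mean_log_excess = running_log_sum / k - log_xk
        if mean_log_excess <= 0:
            continue                  # all k largest values tied: estimate undefined
        alpha_hat = 1.0 / mean_log_excess
        half_width = z * alpha_hat / math.sqrt(k)
        points.append((k, alpha_hat, alpha_hat - half_width, alpha_hat + half_width))
    return points
```

Each tuple gives one point of the Hill-plot together with its band; a stable, roughly horizontal stretch of the second coordinate over a range of $k$ is what one looks for.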


Figure 6.4.8 Pickands-, Hill- and DEdH-plots with asymptotic 95% confidence bands for 2 000 absolute values of iid standard Cauchy rvs. The tail of the latter is Pareto-like with index $\alpha = 1$. Recall that, for given $k$, the DEdH and the Hill estimator use the $k$ upper order statistics of the sample, whereas the Pickands estimator uses $4k$ of them. In the case of the Pickands estimator one clearly sees the trade-off between variance and bias; see also the discussion in the Notes and Comments.

Method 3: The Dekkers-Einmahl-de Haan Estimator for $\xi \in \mathbb{R}$

A disadvantage of Hill's estimator is that it is essentially designed for $F \in \mathrm{MDA}(H_\xi)$, $\xi > 0$. We have already stressed before that this class of models suffices for many applications in the realm of finance and insurance. In Dekkers, Einmahl and de Haan [172], Hill's estimator is extended to cover the whole class $H_\xi$, $\xi \in \mathbb{R}$. In Theorem 3.3.12 we saw that for $F \in \mathrm{MDA}(H_\xi)$, $\xi < 0$, the right endpoint $x_F$ of $F$ is finite. We can and do assume that $x_F > 0$; if necessary we shift the domain of the df. In Section 3.3 we found that the maximum domain of attraction conditions for $H_\xi$ all involve some kind of regular variation. As for deriving the Pickands and Hill estimator, one can reformulate regular variation conditions to find estimators for any $\xi \in \mathbb{R}$. Dekkers et al. [172] come up with the following proposal:

$$\hat\xi = 1 + H_n^{(1)} + \frac{1}{2} \left( \frac{(H_n^{(1)})^2}{H_n^{(2)}} - 1 \right)^{-1}, \tag{6.33}$$

where

$$H_n^{(1)} = \frac{1}{k} \sum_{j=1}^{k} \left(\ln X_{j,n} - \ln X_{k+1,n}\right)$$

is the reciprocal of Hill's estimator (modulo an unimportant change from $k$ to $k + 1$) and

$$H_n^{(2)} = \frac{1}{k} \sum_{j=1}^{k} \left(\ln X_{j,n} - \ln X_{k+1,n}\right)^2.$$

Because $H_n^{(1)}$ and $H_n^{(2)}$ can be interpreted as empirical moments, $\hat\xi$ is also referred to as a moment estimator of $\xi$. To make sense, in all the estimators discussed so far we could (and actually should) replace $\ln x$ by $\ln(1 \vee x)$. In practice, this should not pose problems because we assume $k/n \to 0$. Hence the relation $X_{k,n} \stackrel{\text{a.s.}}{\to} x_F > 0$ holds; see Example 4.1.14.

At this point we pause for a moment and see where we are. First of all:

Do we have all relevant approaches for estimating the shape parameter $\xi$?

Although various estimators have been presented, we have to answer this question by no! The above derivations were all motivated by analytical results on regular variation.
In Chapter 5 however we have tried hard to convince you that point process methods are the methodology to use when one discusses extremes, and we possibly could use point process theory to find alternative estimation procedures. This can be made to work; one programme runs under the heading

Point process of exceedances,

or, as the hydrologists call it,

POT: the Peaks Over Threshold method.

Because of its fundamental importance we decided to spend a whole section on this method; see Section 6.5.

Notes and Comments

In the previous sections we discussed some of the main issues underlying the statistical estimation of the shape parameter $\xi$. This general area is rapidly expanding so that an overview at any moment of time is immediately outdated. The references cited are therefore not exhaustive and reflect our personal interest. The fact that a particular paper does not appear in the list of references does not mean that it is considered less important.

The Hill Estimator: the Bias-Variance Trade-Off

Theorem 6.4.6 for iid data asserts that whenever $\overline{F}(x) = x^{-\alpha} L(x)$, $\alpha > 0$, then the Hill estimator $\hat\alpha^{(H)} = \hat\alpha^{(H)}_{k,n}$ satisfies

$$\sqrt{k} \left(\hat\alpha^{(H)} - \alpha\right) \stackrel{d}{\to} N\left(0, \alpha^2\right),$$

where $k = k(n) \to \infty$ at an appropriate rate. However, in the formulation of Theorem 6.4.6 we have not told you the whole story: depending on the precise choice of $k$ and on the slowly varying function $L$, there is an important trade-off between bias and variance possible. It all comes down to second-order behaviour of $L$, i.e. asymptotic behaviour beyond the defining property $L(tx) \sim L(x)$, $x \to \infty$. Typically, for increasing $k$ the asymptotic variance $\alpha^2/k$ of $\hat\alpha^{(H)}$ decreases: so let us take $k$ as large as possible. Unfortunately, when doing so, a bias may enter!

A fundamental paper introducing higher-order regular variation techniques for solving this problem is Goldie and Smith [274]. In our discussion below we follow de Haan and Peng [293]. Similar results are to be found in de Haan and Resnick [298], Hall [305], Häusler and Teugels [316] and Smith [581] for instance.

The second-order property needed beyond $\overline{F}(x) = x^{-\alpha} L(x)$ is that

$$\lim_{x \to \infty} \frac{\overline{F}(tx)/\overline{F}(x) - t^{-\alpha}}{a(x)} = t^{-\alpha}\, \frac{t^{\rho} - 1}{\rho}, \quad t > 0, \tag{6.34}$$

exists, where $a(x)$ is a measurable function of constant sign. The right-hand side of (6.34) is to be interpreted as $t^{-\alpha} \ln t$ if $\rho = 0$. The constant $\rho \le 0$ is the second-order parameter governing the rate of convergence of $\bar F(tx)/\bar F(x)$ to $t^{-\alpha}$. It necessarily follows that $|a(x)| \in \mathcal{R}_{\rho}$; see Geluk and de Haan [248], Theorem 1.9. In terms of $U(t) = F^{\leftarrow}(1 - t^{-1})$, (6.34) is equivalent to
\[
\lim_{x\to\infty} \frac{U(tx)/U(x) - t^{1/\alpha}}{A(x)} = t^{1/\alpha}\,\frac{t^{\rho/\alpha} - 1}{\rho/\alpha}, \tag{6.35}
\]
where $A(x) = \alpha^{-2}\, a(U(x))$.

The following result is proved as Theorem 1 in de Haan and Peng [293].

Theorem 6.4.9 (The bias-variance trade-off for the Hill estimator)
Suppose (6.35), or equivalently (6.34), holds and $k = k(n) \to \infty$, $k/n \to 0$ as $n \to \infty$. If
\[
\lim_{n\to\infty} \sqrt{k}\, A\!\left(\frac{n}{k}\right) = \lambda \in \mathbb{R}, \tag{6.36}
\]
then as $n \to \infty$
\[
\sqrt{k}\,\bigl(\hat\alpha^{(H)} - \alpha\bigr) \xrightarrow{d} N\!\left(\frac{\alpha^3 \lambda}{\rho - \alpha},\; \alpha^2\right). \qquad \Box
\]

Example 6.4.10 (The choice of the value $k$)
Consider the special case
\[
\bar F(x) = c\,x^{-\alpha}\bigl(1 + x^{-\beta}\bigr)
\]
for positive constants $c$, $\alpha$ and $\beta$. We can choose
\[
a(x) = \beta x^{-\beta},
\]
giving $\rho = -\beta$ in (6.34). Since $U(t) = (ct)^{1/\alpha}(1 + o(1))$, we obtain
\[
A(x) = \frac{\beta}{\alpha^2}\,(cx)^{-\beta/\alpha}\,(1 + o(1)).
\]
Then (6.36) yields $k$ such that
\[
k \sim C\, n^{2\beta/(2\beta + \alpha)}, \qquad k \to \infty,
\]
where $C$ is a constant depending on $\alpha$, $\beta$, $c$ and $\lambda$. Moreover, $\lambda = 0$ if and only if $C = 0$, hence $k = o\bigl(n^{2\beta/(2\beta+\alpha)}\bigr)$. $\Box$
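As a worked check of Example 6.4.10, the rate of $k$ can be read off directly from (6.36) once the expression for $A$ from the example is inserted. The explicit form of the constant $C$ below is our own computation, not taken from the cited papers, and it assumes $\lambda > 0$.

```latex
\begin{align*}
\sqrt{k}\,A\!\left(\frac{n}{k}\right)
  &\sim \sqrt{k}\,\frac{\beta}{\alpha^{2}}\left(\frac{cn}{k}\right)^{-\beta/\alpha}
   = \frac{\beta}{\alpha^{2}}\,c^{-\beta/\alpha}\,
     k^{(2\beta+\alpha)/(2\alpha)}\,n^{-\beta/\alpha}
   \;\longrightarrow\; \lambda \\[4pt]
\Longrightarrow\qquad
k &\sim \left(\frac{\lambda\,\alpha^{2}\,c^{\beta/\alpha}}{\beta}\right)^{2\alpha/(2\beta+\alpha)}
     n^{2\beta/(2\beta+\alpha)} .
\end{align*}
```

The exponent of $n$ is $(\beta/\alpha)\cdot 2\alpha/(2\beta+\alpha) = 2\beta/(2\beta+\alpha)$, in agreement with the example, and the bracket vanishes exactly when $\lambda = 0$.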

From (6.36) it follows that for $k$ tending to infinity sufficiently slowly, i.e. taking only a moderate number of order statistics into account for the construction of the Hill estimator, $\lambda = 0$ will follow. In this case $\hat\alpha^{(H)}$ is an asymptotically unbiased estimator for $\alpha$, as announced in Theorem 6.4.6. The asymptotic mean squared error equals
\[
\frac{1}{k}\left(\alpha^2 + \frac{\alpha^6 \lambda^2}{(\rho - \alpha)^2}\right).
\]
Theorem 6.4.9 also explains the typical behaviour of the Hill-plot showing large variations for small $k$ versus small variations (leading to a biased estimate) for large $k$. As always, the trick consists of

finding the optimal $k$!

Results such as Theorem 6.4.9 are useful mainly from a methodological point of view. Condition (6.34) is rarely verifiable in practice. We shall come back to this point later; see the Summary at the end of this section.

A Comparison of Different Estimators of the Shape Parameter

At this point, we should pose the question

Which estimators of the shape parameter $\xi$ should one use?

As so often in statistics, there is no clear-cut answer. It all depends on the possible values of $\xi$ and, as we have seen, on the precise properties of the underlying df $F$. Some general statements can however be made. For $\xi = \alpha^{-1} > 0$ and dfs satisfying (6.34), de Haan and Peng [293] proved results similar to Theorem 6.4.9 for the Pickands estimator (6.24) and the Dekkers-Einmahl-de Haan (DEdH) estimator (6.33). It turns out that in the case $\rho = 0$ the Hill estimator has minimum mean squared error. The asymptotic relative efficiencies for these estimators critically depend on the interplay between $\rho$ and $\xi$. Both the Pickands and the DEdH estimator work for general $\xi \in \mathbb{R}$. For $\xi > -2$ the DEdH estimator has lower variance than Pickands's. Moreover Pickands's estimator is difficult to use since it is rather unstable; see Figures 6.4.7 and 6.4.8. There exist various papers combining higher-order expansions of $\bar F$ together with resampling methods. 
The bootstrap for Hill's estimator has been studied for instance by Hall [306]. An application to the analysis of high frequency foreign exchange rate data is given by Danielson and de Vries [151]; see also Pictet, Dacorogna and Müller [494]. For applications to insurance data see Beirlant et al. [57].
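To make the bias-variance discussion concrete, the following minimal sketch computes the points of a Hill-plot for simulated exact Pareto data. The function names `hill_xi` and `hill_plot_points`, the sample size and the seed are illustrative assumptions; the approximate standard error uses the asymptotic variance $\alpha^2/k$ from Theorem 6.4.6 with $\alpha$ replaced by its estimate.

```python
import math
import random


def hill_xi(sample, k):
    """Hill estimate of xi = 1/alpha from the k largest observations.

    Computes (1/k) * sum_{j=1..k} ln X_{j,n} - ln X_{k,n}, where
    X_{1,n} >= ... >= X_{n,n} are the order statistics.
    """
    if not 2 <= k <= len(sample):
        raise ValueError("need 2 <= k <= n")
    top = sorted(sample, reverse=True)[:k]   # X_{1,n}, ..., X_{k,n}
    if top[-1] <= 0:
        raise ValueError("the k largest observations must be positive")
    return sum(math.log(x) for x in top) / k - math.log(top[-1])


def hill_plot_points(sample, k_values):
    """Pairs (k, alpha_hat) that one would draw in a Hill-plot."""
    return [(k, 1.0 / hill_xi(sample, k)) for k in k_values]


if __name__ == "__main__":
    rng = random.Random(0)   # arbitrary seed, for reproducibility only
    alpha = 2.0
    # Exact Pareto tail P(X > x) = x^(-alpha), x >= 1, by inverse transform.
    pareto = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(5000)]
    for k, a in hill_plot_points(pareto, [50, 100, 200, 500, 1000]):
        print(f"k={k:5d}  alpha_hat={a:.3f}  approx. s.e.={a / math.sqrt(k):.3f}")
```

Replacing the Pareto sample by data whose tail is only approximately Pareto is a simple way to watch the bias appear as $k$ grows.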

Figure 6.4.11 (Warning) A comparative study of Hill-plots ($20 \le k \le 1\,900$) for 1 900 iid simulated data from the distributions: standard exponential (top), heavy-tailed Weibull with shape parameter 0.5 (middle), standard lognormal (bottom). The Hill estimator does not estimate anything reasonable in these cases. A (too) quick glance at these plots could give you an estimate of 3 for the exponential distribution. This should be a warning to everybody using Hill- and related plots! They must be treated with extreme care. One definitely has to contrast such estimates with the exploratory data analysis techniques from Section 6.2.

Figure 6.4.12 1 864 daily log-returns (closing data) of the German stock index DAX (September 20, 1988 to August 24, 1995) (top) and the corresponding Hill-plot of the absolute values (bottom). It gives relatively stable estimates around the value 2.8 in the region $100 \le k \le 300$. This is a much wider region than in Figure 6.4.11. This Hill-plot is also qualitatively different from the exact Pareto case; see Figure 6.4.7. The deviations can be due to the complicated dependence structure of financial time series.

Besides the many papers already referred to, we would also like to mention Anderson [13], Boos [79], Csörgő and Mason [145], Davis and Resnick [156], Drees [183], Falk [221], Häusler and Teugels [316], Lo [421] and Smith and Weissman [593].

More Estimators for the Index of Regular Variation

Hahn and Weiner [302] apply Karamata's theorem to derive a joint estimator of the index of regular variation and an asymmetry parameter for distribution tails. Their method essentially uses truncated moments. An alternative approach based on point process methods is discussed in Höpfner [328, 329] and Jacod and Höpfner [330]. Csörgő, Deheuvels and Mason [144] study kernel type estimates of $\alpha$ including the Hill estimator as a special case. Wei [629] proposes conditional maximum likelihood estimation under both full and partial knowledge of the slowly varying function.

In the case of models like
\[
\bar F(x) = \exp\{-x^{\alpha} L(x)\}, \qquad \alpha > 0, \quad L \in \mathcal{R}_0,
\]
one should consult Beirlant and Teugels [56], Beirlant et al. [57], Chapter 4, Broniatowski [93], Keller and Klüppelberg [370] and Klüppelberg and Villaseñor [396].

Estimators for the Index of a Stable Law

Since the early work by Mandelbrot [432] numerous papers have been published concerning the hypothesis that logarithmic returns in financial data follow a stable process with parameter $0 < \alpha < 2$. Though exact stability has been disputed, a growing consensus is forming around the heavy-tailedness of log-returns. Consequently, various authors have focussed on parameter estimation in stable models. Examples include Koutrouvelis [402] using regression type estimators based on the empirical characteristic function; see also Feuerverger [234], Feuerverger and McDunnough [235] and references therein. McCulloch [446] suggests estimators based on functions of the sample quantiles; this paper also contains a good overall discussion. Though McCulloch's approach seems optimal in the exact stable case, the situation may change dramatically if only slight deviations from stability are present in the data. DuMouchel [191, 192] is a good place to start reading on this. For a detailed discussion of these problems together with an overview of the use of stable distributions and processes in finance see Mittnik and Rachev [460, 461] and the references therein.

6.4.3 Estimating the Norming Constants

In the previous section we obtained estimators for the shape parameter $\xi$ given iid data $X_1, \ldots, X_n$ with df $F \in \mathrm{MDA}(H_\xi)$. Recall that the latter condition is equivalent to
\[
c_n^{-1}(M_n - d_n) \xrightarrow{d} H_\xi
\]
for appropriate norming constants $c_n > 0$ and $d_n \in \mathbb{R}$. We also know that this relation holds if and only if
\[
n \bar F(c_n x + d_n) \to -\ln H_\xi(x), \qquad n \to \infty, \quad x \in \mathbb{R}.
\]
As we have already seen in Section 6.4.1, norming constants enter in quantile and tail estimation; see (6.19). Below we discuss one method for estimating norming constants. Towards the end of this section we give some further references to other methods. In Section 3.3 we gave analytic formulae linking the norming sequences $(c_n)$ and $(d_n)$ with the tail $\bar F$. 
For instance, in the Gumbel case $\xi = 0$ with right endpoint $x_F = \infty$ the following formulae were derived in Theorem 3.3.27:
\[
c_n = a(d_n), \qquad d_n = F^{\leftarrow}\bigl(1 - n^{-1}\bigr), \tag{6.37}
\]
where $a(\cdot)$ stands for the auxiliary function which can be taken in the form

\[
a(x) = \int_x^{\infty} \frac{\bar F(y)}{\bar F(x)}\, dy.
\]
Notice the problem: on the one hand, we need the norming constants $c_n$ and $d_n$ in order to obtain quantile and tail estimates. On the other hand, (6.37) defines them as functions of just that tail, so it seems that

this surely is a race we cannot win!

Though this is partly true let us see how far we can get. We will try to convince you that the appropriate reformulation of the above sentence is:

this is a race which will be difficult to win!

Consider the more general set-up $F \in \mathrm{MDA}(H_\xi)$, $\xi \ge 0$, which includes the cases most important for our purposes, the Fréchet and the Gumbel distribution. In Examples 3.3.33 and 3.3.34 we showed how one can unify these two maximum domains of attraction by the logarithmic transformation
\[
x^* = \ln(1 \vee x), \qquad x \in \mathbb{R}.
\]
Together with Theorem 3.3.26 the following useful result can be obtained.

Lemma 6.4.13 (Embedding $\mathrm{MDA}(H_\xi)$, $\xi \ge 0$, in $\mathrm{MDA}(\Lambda)$)
Let $X_1, \ldots, X_n$ be iid with df $F \in \mathrm{MDA}(H_\xi)$, $\xi \ge 0$, with $x_F = \infty$ and norming constants $c_n > 0$ and $d_n \in \mathbb{R}$. Then $X_1^*, \ldots, X_n^*$ are iid with df $F^* \in \mathrm{MDA}(\Lambda)$ and auxiliary function
\[
a^*(t) = \int_t^{\infty} \frac{\bar F^*(y)}{\bar F^*(t)}\, dy.
\]
The norming constants can be chosen as
\[
d_n^* = (F^*)^{\leftarrow}\bigl(1 - n^{-1}\bigr),
\]
\[
c_n^* = a^*(d_n^*) = \int_{d_n^*}^{\infty} \frac{\bar F^*(y)}{\bar F^*(d_n^*)}\, dy \sim n \int_{d_n^*}^{\infty} \bar F^*(y)\, dy. \qquad \Box
\]
In Section 6.4.1 we tried to convince you that our estimators have to be based on the $k$ largest order statistics $X_{k,n}, \ldots, X_{1,n}$, where $k = k(n) \to \infty$. From the above lemma we obtain estimators if we replace $F^*$ by the empirical df $F_n^*$:

\[
\hat d^*_{n/k} = X^*_{k+1,n} = \ln(1 \vee X_{k+1,n}),
\]
\begin{align*}
\hat c^*_{n/k} &= \frac{n}{k} \int_{\hat d^*_{n/k}}^{\infty} \bar F^*_n(y)\, dy
  = \frac{n}{k} \int_{\ln X_{k+1,n}}^{\ln X_{1,n}} \bar F^*_n(y)\, dy \\
 &= \frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k+1,n}. \tag{6.38}
\end{align*}
The latter is a version of the Hill estimator. The change from $k$ to $k+1$ in (6.38) is asymptotically unimportant; see Example 4.1.11.

Next we make the transformation back from $F^*$ to $F$ via
\[
\frac{n}{k}\, P\bigl(X^* > c^*_{n/k}\, x + d^*_{n/k}\bigr) = \frac{n}{k}\, P\bigl(X > \exp\{c^*_{n/k}\, x + d^*_{n/k}\}\bigr), \qquad x > 0.
\]
Finally we use that $F^* \in \mathrm{MDA}(\Lambda)$, hence the left-hand side converges to $e^{-x}$ as $n \to \infty$, provided that $n/k \to \infty$. We thus obtain the tail estimator
\[
\widehat{\bar F(x)} = \frac{k}{n} \Bigl(\exp\bigl\{-\hat d^*_{n/k} + \ln x\bigr\}\Bigr)^{-1/\hat c^*_{n/k}}
 = \frac{k}{n} \left(\frac{x}{X_{k+1,n}}\right)^{-1/\hat c^*_{n/k}}.
\]
This tail estimator was already obtained by Hill [322] for the exact model (6.26); see (6.30). As a quantile estimator we obtain
\[
\hat x_p = \left(\frac{n}{k}(1 - p)\right)^{-\hat c^*_{n/k}} X_{k+1,n}.
\]
Time to summarise all of this:

Let $\hat\xi^{(H)}$ denote the Hill estimator for $\xi$, i.e.
\[
\hat\xi^{(H)} = \frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k,n}.
\]

Let $X_1, \ldots, X_n$ be a sample from $F \in \mathrm{MDA}(H_\xi)$, $\xi \ge 0$, and $k = k(n) \to \infty$ such that $k/n \to 0$. Then for $x$ large enough, a tail estimator for $\bar F(x)$ becomes
\[
\widehat{\bar F(x)} = \frac{k}{n} \left(\frac{x}{X_{k+1,n}}\right)^{-1/\hat\xi^{(H)}}.
\]
The quantile $x_p$ so that $F(x_p) = p \in (0, 1)$ can be estimated by
\[
\hat x_p = \left(\frac{n}{k}(1 - p)\right)^{-\hat\xi^{(H)}} X_{k+1,n}.
\]

Notes and Comments

It should be clear from the above that similar quantile estimation methods can be worked out using alternative parameter estimators as discussed in the previous sections. For instance, both Hill [322] and Weissman [631] base their approach on maximum likelihood theory and the limit distribution of the $k$ upper order statistics as in Theorem 4.2.8. They cover the whole range $\xi \in \mathbb{R}$. Other estimators of the norming constants were proposed by Dijk and de Haan [180] and Falk, Hüsler and Reiss [222].

6.4.4 Tail and Quantile Estimation

As before, assume that we consider a sample $X_1, \ldots, X_n$ of iid rvs with df $F \in \mathrm{MDA}(H_\xi)$ for some $\xi \in \mathbb{R}$. Let $0 < p < 1$ and let $x_p$ denote the corresponding $p$-quantile.

The whole point behind the domain of attraction condition $F \in \mathrm{MDA}(H_\xi)$ is to be able to estimate quantiles outside the range of the data, i.e. $p > 1 - 1/n$. The latter is of course equivalent to finding estimators for the tail $\bar F(x)$ with $x$ large. In Sections 6.3.3 and 6.4.3 we have already discussed some possibilities. Indeed, whenever we have estimators for the shape parameter $\xi$ and the norming constants $c_n$ and $d_n$, natural estimators of $x_p$ and $\bar F(x)$ can immediately be derived from the defining property of $F \in \mathrm{MDA}(H_\xi)$. We want to discuss some of them in more detail and point out their most important properties and caveats. From the start we would like to stress that estimation outside the range of the data can be made only if extra model assumptions are imposed. There is no magical technique which yields reliable results for free. One could formulate as in finance:

There is no free lunch when it comes to high quantile estimation!
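The Hill-based tail and quantile estimators summarised in Section 6.4.3 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Hill estimate is centred at $\ln X_{k+1,n}$ as in (6.38) rather than at $\ln X_{k,n}$ (an asymptotically unimportant difference), and the data are simulated from an exact Pareto tail with $\alpha = 2$.

```python
import math
import random


def hill_tail_fit(sample, k):
    """Return (xi_hat, x_ref): Hill estimate of xi as in (6.38) and X_{k+1,n}."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    desc = sorted(sample, reverse=True)      # X_{1,n} >= X_{2,n} >= ...
    x_ref = desc[k]                          # X_{k+1,n}
    if x_ref <= 0:
        raise ValueError("X_{k+1,n} must be positive")
    xi_hat = sum(math.log(x) for x in desc[:k]) / k - math.log(x_ref)
    return xi_hat, x_ref


def tail_estimate(x, n, k, xi_hat, x_ref):
    """Estimate of P(X > x): (k/n) * (x / X_{k+1,n})^(-1/xi_hat)."""
    return (k / n) * (x / x_ref) ** (-1.0 / xi_hat)


def quantile_estimate(p, n, k, xi_hat, x_ref):
    """Estimate of x_p: ((n/k) * (1 - p))^(-xi_hat) * X_{k+1,n}."""
    return ((n / k) * (1.0 - p)) ** (-xi_hat) * x_ref


if __name__ == "__main__":
    rng = random.Random(0)   # arbitrary seed
    n, alpha = 5000, 2.0
    data = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
    for k in (100, 250, 500):
        xi_hat, x_ref = hill_tail_fit(data, k)
        q = quantile_estimate(0.999, n, k, xi_hat, x_ref)
        print(f"k={k}: xi_hat={xi_hat:.3f}, estimated 0.999-quantile={q:.2f}")
    print("true 0.999-quantile:", 0.001 ** (-1.0 / alpha))
```

By construction the two estimators are inverse to each other: plugging the estimated quantile into the tail estimator returns $1 - p$.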

In our discussion below, we closely follow the paper by Dekkers and de Haan [170]. The main results are formulated in terms of conditions on $U(t) = F^{\leftarrow}(1 - t^{-1})$ so that $x_p = U(1/(1 - p))$. Denoting $U_n(t) = F_n^{\leftarrow}(1 - t^{-1})$, where $F_n^{\leftarrow}$ is the empirical quantile function,
\[
U_n\!\left(\frac{n}{k - 1}\right) = F_n^{\leftarrow}\!\left(1 - \frac{k - 1}{n}\right) = X_{k,n}, \qquad k = 1, \ldots, n.
\]
Hence $X_{k,n}$ appears as a natural estimator of the $(1 - (k-1)/n)$-quantile. The range $[X_{n,n}, X_{1,n}]$ of the data allows one to make a within-sample estimation up to the $(1 - n^{-1})$-quantile. For high quantile estimation the following situations are of main interest:

(a) high quantiles within the sample: $p = p_n \uparrow 1$, $n(1 - p_n) \to c$, $c \in (1, \infty]$;
(b) high quantiles outside the sample: $p = p_n \uparrow 1$, $n(1 - p_n) \to c$, $0 \le c < 1$.

Case (a) for $c = \infty$ is addressed by the following result which is Theorem 3.1 in Dekkers and de Haan [170]. It basically tells us that we can just use the empirical quantile function for estimating $x_p$.

Theorem 6.4.14 (Estimating high quantiles I)
Suppose $X_1, \ldots, X_n$ is an iid sample from $F \in \mathrm{MDA}(H_\xi)$, $\xi \in \mathbb{R}$, and $F$ has a positive density $f$. Assume that the density $U'$ is in $\mathcal{R}_{\xi - 1}$. Write $p = p_n$ and $k = k(n) = [n(1 - p_n)]$, where $[x]$ denotes the integer part of $x$. If the conditions
\[
p_n \to 1 \quad \text{and} \quad n(1 - p_n) \to \infty
\]
hold then
\[
\sqrt{2k}\; \frac{X_{k,n} - x_p}{X_{k,n} - X_{2k,n}} \xrightarrow{d} N\!\left(0,\; \frac{2^{2\xi + 1}\, \xi^2}{(2^{\xi} - 1)^2}\right). \qquad \Box
\]

Remark. 1) The condition $U' \in \mathcal{R}_{\xi - 1}$ can be reformulated in terms of $F$. For instance for $\xi > 0$, the condition becomes $f \in \mathcal{R}_{-1 - 1/\xi}$. $\Box$

In Theorem 3.4.5 we characterised $F \in \mathrm{MDA}(H_\xi)$ through the asymptotic behaviour of $U$:
\[
\lim_{t \to \infty} \frac{U(tx) - U(t)}{U(ty) - U(t)} = \frac{x^{\xi} - 1}{y^{\xi} - 1}, \qquad x, y > 0, \quad y \ne 1.
\]
For $\xi = 0$ the latter limit has to be interpreted as $\ln x / \ln y$. We can rewrite the above as follows:
\[
U(tx) = \frac{x^{\xi} - 1}{1 - y^{\xi}}\, \bigl(U(t) - U(ty)\bigr)(1 + o(1)) + U(t). \tag{6.39}
\]
Using this relation, a heuristic argument suggests an estimator for the quantiles $x_p$ outside the range of the data. Indeed, replace $U$ by $U_n$ in (6.39) and

put $y = 1/2$, $x = (k - 1)/(n(1 - p))$ and $t = n/(k - 1)$. Substitute $\xi$ by an appropriate estimator $\hat\xi$. Doing so, and neglecting $o(1)$-terms, one finds the following estimator of $x_p$:
\[
\hat x_p = \frac{\bigl(k/(n(1 - p))\bigr)^{\hat\xi} - 1}{1 - 2^{-\hat\xi}}\, \bigl(X_{k,n} - X_{2k,n}\bigr) + X_{k,n}. \tag{6.40}
\]
The following result is Theorem 3.3 in Dekkers and de Haan [170].

Theorem 6.4.15 (Estimating high quantiles II)
Suppose $X_1, \ldots, X_n$ is an iid sample from $F \in \mathrm{MDA}(H_\xi)$, $\xi \in \mathbb{R}$, and assume that $\lim_{n \to \infty} n(1 - p) = c$ for some $c > 0$. Let $\hat x_p$ be defined by (6.40) with $\hat\xi$ the Pickands estimator (6.24). Then for every fixed $k > c$,
\[
\frac{\hat x_p - x_p}{X_{k,n} - X_{2k,n}} \xrightarrow{d} Y,
\]
where
\[
Y = \frac{(k/c)^{\xi} - 2^{-\xi}}{1 - 2^{-\xi}} + \frac{1 - (Q_k/c)^{\xi}}{\exp\{\xi H_k\} - 1}. \tag{6.41}
\]
The rvs $H_k$ and $Q_k$ are independent, $Q_k$ has a gamma distribution with parameter $2k + 1$ and
\[
H_k = \sum_{j = k + 1}^{2k} \frac{E_j}{j}
\]
for iid standard exponential rvs $E_1, E_2, \ldots$. $\Box$

Remarks. 2) The case $0 < c < 1$ of Theorem 6.4.15 corresponds to extrapolation outside the range of the data. For the extreme case $c = 0$, a relevant result is to be found in de Haan [289], Theorem 5.1. Most of these results depend on highly technical conditions on the asymptotic behaviour of $F$. There is a strong need for comparative numerical studies on these high quantile estimators.
3) Approximations to the df of $Y$ in (6.41) can be worked out explicitly.
4) As for the situation of Theorem 6.4.14, no results seem to exist concerning the optimal choice of $k$. For the consistency of the Pickands estimator $\hat\xi$, which is part of the estimator $\hat x_p$, one actually needs $k = k(n) \to \infty$; see Theorem 6.4.1.
5) In the case $\xi < 0$ similar results can be obtained for the estimation of the right endpoint $x_F$ of $F$. We refer the interested reader to Dekkers and de Haan [170] for further details and some examples. $\Box$
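A minimal sketch of the quantile estimator (6.40) combined with the Pickands estimator (6.24), which uses $X_{k,n}$, $X_{2k,n}$ and $X_{4k,n}$, is given below. The function names are hypothetical. The demonstration uses an idealised "sample" $X_{j,n} = (n/j)^{\xi}$, i.e. the order statistics are replaced by the exact quantiles $U(n/j)$ of a Pareto-type law with $U(t) = t^{\xi}$; this is an assumption made only so that both estimators can be checked by hand.

```python
import math


def pickands_xi(desc, k):
    """Pickands estimator (6.24); desc holds the data in decreasing order."""
    if k < 1 or 4 * k > len(desc):
        raise ValueError("need k >= 1 and 4k <= n")
    x_k, x_2k, x_4k = desc[k - 1], desc[2 * k - 1], desc[4 * k - 1]
    return math.log((x_k - x_2k) / (x_2k - x_4k)) / math.log(2.0)


def high_quantile(desc, k, p, xi_hat):
    """Quantile estimator (6.40); xi_hat must be non-zero."""
    n = len(desc)
    x_k, x_2k = desc[k - 1], desc[2 * k - 1]
    factor = ((k / (n * (1.0 - p))) ** xi_hat - 1.0) / (1.0 - 2.0 ** (-xi_hat))
    return factor * (x_k - x_2k) + x_k


if __name__ == "__main__":
    n, k, xi, p = 4000, 100, 0.5, 0.9999
    # Idealised order statistics X_{j,n} = U(n/j) with U(t) = t^xi.
    desc = [(n / j) ** xi for j in range(1, n + 1)]
    xi_hat = pickands_xi(desc, k)
    print("Pickands estimate:", xi_hat)
    print("estimated quantile:", high_quantile(desc, k, p, xi_hat))
    print("true quantile U(1/(1-p)):", (1.0 / (1.0 - p)) ** xi)
```

For this idealised input the ratio of spacings equals $2^{\xi}$ exactly, so the Pickands estimate reproduces $\xi$ and (6.40) returns $U(1/(1-p))$ although $p$ lies outside the sample range. With real data both quantities fluctuate strongly in $k$.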

Summary

Throughout Section 6.4, we have discovered various estimators for the important shape parameter $\xi$ of dfs in the maximum domain of attraction of the GEV. From these and further estimators, either for the location and scale parameters and/or norming constants, estimators for the tail $\bar F$ and high quantiles resulted. The properties of these estimators crucially depend on the higher-order behaviour of the underlying distribution tail $\bar F$. The latter is unfortunately not verifiable in practice.

On various occasions we hinted at the fact that the determination of the number $k$ of upper order statistics finally used remains a delicate point in the whole set-up. Various papers exist which offer a semi-automatic or automatic, so-called "optimal", choice of $k$. See for instance Beirlant et al. [57] for a regression based procedure with various examples to insurance data, and Danielson and de Vries [151] for an alternative method motivated by examples in finance. We personally prefer a rather pragmatic approach, realising that, whatever method one chooses, the "Hill horror plot" (Figure 4.1.13) would fool most, if not all. It also serves to show how delicate a tail analysis in practice really is. On the other hand, in the "nice case" of exact Pareto behaviour, all methods work well; see Figure 6.4.7.

Our experience in analysing data, especially in (re)insurance, shows that in practice one is often faced with data which are clearly heavy-tailed and for which "exact" Pareto behaviour of $\bar F(x)$ sets in for relatively low values of $x$; see for instance Figure 6.2.12. The latter is not so obvious in the world of finance. 
This is mainly due to more complicated dependence structures in most of the finance data; compare for instance Figures 6.5.10 and 6.5.11. A "nice" example from the realm of finance was discussed in Figure 6.4.12. The conclusion "the data are heavy-tailed" invariably has to be backed up with information from the user who provided the data in the first place! Furthermore, any analysis performed has to be supported by exploratory data analysis techniques as outlined in Section 6.2. Otherwise, situations as explained in Figure 6.4.11 may occur.

It is our experience that in many cases one obtains a Hill- (or related) plot which tends to have a fairly noticeable horizontal stretch across different (often lower) $k$-values. A choice of $k$ in such a region is to be preferred. Though the above may sound vague, we suggest that the user of extremal event techniques experiment on both simulated and real data in order to get a feeling for what is going on. A final piece of advice along this route: never go for one estimate only. Always calculate and plot estimates of the relevant quantities (a quantile say) across a wide range of $k$-values; for examples see Figures 6.4.3 and 6.4.4. In the next section we shall come back to this

point, replacing $k$ by a threshold $u$. We already warn the reader beforehand: the approach offered in Section 6.5 suffers from the same problems as those discussed in this section.

Notes and Comments

So far, we have given only a rather brief discussion of the statistical estimation of parameters, tails and quantiles in the heavy-tailed case. This area is still under intensive investigation so that at present no complete picture can be given. Besides the availability of a whole series of mathematical results, a lot of insight is obtained through simulation and real life examples. In the next section some further techniques and indeed practical examples will be discussed. The interested reader is strongly advised to consult the papers referred to so far. An interesting discussion of the main issues is de Haan [291], where applications to currency exchange rates, life span estimation and sea level data are also given; see also Davis and Resnick [156]. In Einmahl [195] a critical discussion concerning the exact meaning of extrapolating outside the data is given. He stresses the usefulness of the empirical df as an estimator.

6.5 Fitting Excesses over a Threshold

6.5.1 Fitting the GPD

Methodology introduced so far was obtained either on the assumption that the data come from a GEV (see Section 6.3) or belong to its maximum domain of attraction (see Section 6.4). We based statistical estimation of the relevant parameters on maximum likelihood, the method of probability-weighted moments or some appropriate condition of regular variation type. In Section 3.4 we laid the foundation for an alternative approach based on exceedances of high thresholds. The key idea of this approach is explained below.

Suppose $X, X_1, \ldots, X_n$ are iid with df $F \in \mathrm{MDA}(H_\xi)$ for some $\xi \in \mathbb{R}$. First, choose a high threshold $u$ and denote by
\[
N_u = \mathrm{card}\{i : i = 1, \ldots, n,\; X_i > u\}
\]
the number of exceedances of $u$ by $X_1, \ldots, X_n$. 
We denote the corresponding excesses by $Y_1, \ldots, Y_{N_u}$; see Figure 6.5.1. The excess df of $X$ is given by

\[
F_u(y) = P(X - u \le y \mid X > u) = P(Y \le y \mid X > u), \qquad y \ge 0;
\]
see Definition 3.4.6.

Figure 6.5.1 Data $X_1, \ldots, X_{13}$ and the corresponding excesses $Y_1, \ldots, Y_{N_u}$ over the threshold $u$.

The latter relation can also be written as
\[
\bar F(u + y) = \bar F(u)\, \bar F_u(y). \tag{6.42}
\]
Now recall the definition of the generalised Pareto distribution (GPD) from Definition 3.4.9: a GPD $G_{\xi,\beta}$ with parameters $\xi \in \mathbb{R}$ and $\beta > 0$ has distribution tail
\[
\bar G_{\xi,\beta}(x) =
\begin{cases}
\left(1 + \xi\, \dfrac{x}{\beta}\right)^{-1/\xi} & \text{if } \xi \ne 0, \\[6pt]
e^{-x/\beta} & \text{if } \xi = 0,
\end{cases}
\qquad x \in D(\xi, \beta),
\]
where
\[
D(\xi, \beta) =
\begin{cases}
[0, \infty) & \text{if } \xi \ge 0, \\
[0, -\beta/\xi] & \text{if } \xi < 0.
\end{cases}
\]
Theorem 3.4.13(b) gives a limit result for $\bar F_u(y)$, namely
\[
\lim_{u \uparrow x_F}\; \sup_{0 < x < x_F - u} \bigl|\bar F_u(x) - \bar G_{\xi, \beta(u)}(x)\bigr| = 0,
\]

for an appropriate positive function $\beta$. Based on this result, for $u$ large, the following approximation suggests itself:
\[
\bar F_u(y) \approx \bar G_{\xi, \beta(u)}(y). \tag{6.43}
\]
It is important to note that $\beta$ is a function of the threshold $u$. In practice, $u$ will have to be taken sufficiently large. Given such a $u$, $\xi$ and $\beta = \beta(u)$ are estimated from the excess data, so that the resulting estimates depend on $u$; see our discussion below.

Relation (6.42) then suggests a method for estimating the far end tail of $F$ by estimating $\bar F_u(y)$ and $\bar F(u)$ separately. A natural estimator for $\bar F(u)$ is given by the empirical df
\[
\widehat{\bar F(u)} = \bar F_n(u) = \frac{1}{n} \sum_{i=1}^{n} I_{\{X_i > u\}} = \frac{N_u}{n}.
\]
On the other hand, the generalised Pareto approximation (6.43) (remember that $u$ is large!) motivates an estimator of the form
\[
\widehat{\bar F_u(y)} = \bar G_{\hat\xi, \hat\beta}(y) \tag{6.44}
\]
for appropriate $\hat\xi = \hat\xi_{N_u}$ and $\hat\beta = \hat\beta_{N_u}$. A resulting estimator for the tail $\bar F(u + y)$ for $y > 0$ then takes on the form
\[
\widehat{\bar F(u + y)} = \frac{N_u}{n} \left(1 + \hat\xi\, \frac{y}{\hat\beta}\right)^{-1/\hat\xi}. \tag{6.45}
\]
In the Fréchet and Gumbel case ($\xi \ge 0$), the domain restriction in (6.45) is $y \ge 0$, clearly stressing that we estimate $F$ in the upper tail. An estimator of the quantile $x_p$ results immediately:
\[
\hat x_p = u + \frac{\hat\beta}{\hat\xi} \left(\left(\frac{n}{N_u}(1 - p)\right)^{-\hat\xi} - 1\right). \tag{6.46}
\]
Furthermore, for $\hat\xi < 0$ an estimator of the right endpoint $x_F$ of $F$ is given by
\[
\hat x_F = u - \frac{\hat\beta}{\hat\xi}.
\]
The latter is obtained by putting $\hat x_F = \hat x_1$ (i.e. $p = 1$) in (6.46). In Section 6.4.1 we said that the method of exceedances belongs to the realm of point process theory. From (6.45) and (6.46) this is clear: statistical properties of the resulting estimators crucially depend on the distributional properties of

the point process of exceedances $(N_u)$; see for instance Example 5.1.3 and Theorem 5.3.2, and also the Notes and Comments below.

The above method is intuitively appealing. It goes back to hydrologists. Over the last 25 years they have developed this estimation procedure under the acronym of Peaks Over Threshold (POT) method. In order to work out the relevant estimators the following input is needed:

- reliable models for the point process of exceedances,
- a sufficiently high threshold $u$,
- estimators $\hat\xi$ and $\hat\beta$,
- and, if necessary, an estimator $\hat\nu$ for location.

If one wants to choose an optimal threshold $u$ one faces similar problems as for the choice of the number $k$ of upper order statistics for the Hill estimator. A value of $u$ too high results in too few exceedances and consequently high variance estimators. For $u$ too small estimators become biased. Theoretically, it is possible to choose $u$ asymptotically optimal by a quantification of a bias versus variance trade-off, very much in the same spirit as discussed in Theorem 6.4.9. In reality however, the same problems as already encountered for other tail estimators before do occur. We refer to the examples in Section 6.5.2 for illustrations on this.

One method which is of immediate use in practice is based on the linearity of the mean excess function $e(u)$ for the GPD. From Theorem 3.4.13(e) we know that for a rv $X$ with df $G_{\xi,\beta}$,
\[
e(u) = E(X - u \mid X > u) = \frac{\beta + \xi u}{1 - \xi}, \qquad u \in D(\xi, \beta), \quad \xi < 1,
\]
hence $e(u)$ is linear. Recall from (6.6) that the empirical mean excess function of a given sample $X_1, \ldots, X_n$ is defined by
\[
e_n(u) = \frac{1}{N_u} \sum_{i \in \Delta_n(u)} (X_i - u), \qquad u > 0,
\]
where as before $N_u = \mathrm{card}\{i : i = 1, \ldots, n,\; X_i > u\} = \mathrm{card}\, \Delta_n(u)$. The remark above now suggests a graphical approach for choosing $u$:

choose $u > 0$ such that $e_n(x)$ is approximately linear for $x \ge u$.

The key difficulty of course lies in the interpretation of approximately. Only practice can tell! 
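The empirical mean excess function is simple to compute directly from its definition. The following sketch, with hypothetical function names, evaluates $e_n(u)$ at a given threshold and produces the points of an ME-plot with thresholds running through the order statistics; the parameter `min_exceedances`, which drops the most extreme thresholds, is an assumption of the sketch and not part of the definition.

```python
def mean_excess(sample, u):
    """Empirical mean excess e_n(u): average of X_i - u over all X_i > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observation exceeds u")
    return sum(excesses) / len(excesses)


def mean_excess_plot_points(sample, min_exceedances=5):
    """Pairs (u, e_n(u)) with u running through the smaller order statistics."""
    asc = sorted(sample)
    points = []
    for u in asc[: len(asc) - min_exceedances]:
        if asc[-1] > u:                      # guard against ties at the maximum
            points.append((u, mean_excess(asc, u)))
    return points


if __name__ == "__main__":
    toy = [1, 2, 3, 4, 5]
    print(mean_excess(toy, 2))               # excesses 1, 2, 3
    print(mean_excess_plot_points(toy, min_exceedances=2))
```

One would then look for a threshold beyond which the plotted points lie roughly on a straight line, keeping in mind that the rightmost points are averages over very few observations.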
One often observes a change in the slope of $e_n(u)$ for some value of $u$. Referring to some examples on sulphate and nitrate level in an acid rain study Smith [589], p. 460, says the following:

The general form of these mean excess plots is not atypical of real data, especially the change of slope near 100 in both plots. Smith [582] observed similar


behaviour in data on extreme insurance claims, and Davison and Smith [163] used a similar plot to identify a change in the distribution of the threshold form of the River Nidd data. Such plots therefore appear to be an extremely useful diagnostic in this form of analysis.

Figure 6.5.2 A "horror plot" for the MLE of the shape parameter $\xi$ of a GPD. The simulated data come from a df given by $\bar F(x) = 1/(x \ln x)$. Left: sample size 1 000. Right: sample size 10 000. The upper horizontal axis indicates the threshold value $u$, the lower one the corresponding number $k$ of exceedances of $u$. As for Hill estimation (see Figure 4.1.13) MLE also becomes questionable for such perturbed Pareto tails.

The reader should never expect a unique choice of $u$ to appear. We recommend using plots to reinforce judgement and common sense, and comparing the resulting estimates across a variety of $u$-values. In applications we often prefer plots indicating the threshold value $u$, as well as the number of exceedances used for the estimation, on the horizontal axes; the estimated value of the parameter or the quantile, say, is plotted on the vertical one. The latter is illustrated in Section 6.5.2. As can be seen from the examples, and indeed can be proved, all these plots exhibit the same behaviour as the Hill- and Pickands-plots before: high variability for $u$ large (few observations) versus bias for $u$ small (many observations, but at the same time the approximation (6.43) may not be applicable).

Concerning estimators for $\xi$ and $\beta$, various methods similar to those discussed in Section 6.4.2 exist.

Maximum Likelihood Estimation

The following results are to be found in Smith [585].

Recall that our original data $\mathbf{X} = (X_1, \ldots, X_n)$ are iid with common df $F$. Assume $F$ is GPD with parameters $\xi$ and $\beta$, thus the density $f$ becomes
\[
f(x) = \frac{1}{\beta} \left(1 + \xi\, \frac{x}{\beta}\right)^{-\frac{1}{\xi} - 1}, \qquad x \in D(\xi, \beta).
\]
The log-likelihood function equals
\[
\ell((\xi, \beta); \mathbf{X}) = -n \ln \beta - \left(\frac{1}{\xi} + 1\right) \sum_{i=1}^{n} \ln\left(1 + \frac{\xi}{\beta} X_i\right).
\]
Notice that the arguments of the above function have to satisfy the domain restriction $X_i \in D(\xi, \beta)$. For notational convenience, we have dropped that part from the likelihood function. Recall that $D(\xi, \beta) = [0, \infty)$ for $\xi \ge 0$. Now likelihood equations can be derived and solved numerically, yielding the MLE $\hat\xi_n$, $\hat\beta_n$. This method works fine if $\xi > -1/2$; in this case one can show that
\[
n^{1/2} \left(\hat\xi_n - \xi,\; \frac{\hat\beta_n}{\beta} - 1\right) \xrightarrow{d} N(0, M^{-1}), \qquad n \to \infty,
\]
where
\[
M^{-1} = (1 + \xi) \begin{pmatrix} 1 + \xi & -1 \\ -1 & 2 \end{pmatrix}
\]
and $N(\mu, \Sigma)$ stands for the bivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. The usual MLE properties like consistency and asymptotic efficiency hold.

Because of (6.43), it is more realistic to assume a GPD for the excesses $Y_1, \ldots, Y_N$, where $N = N_u$ is independent of the $Y_i$. The resulting conditional likelihood equations can be solved best via a reparametrisation $(\xi, \beta) \to (\xi, \tau)$, where $\tau = -\xi/\beta$. This leads to the solution
\[
\hat\xi = \hat\xi(\tau) = N^{-1} \sum_{i=1}^{N} \ln(1 - \tau Y_i),
\]
where $\tau$ satisfies
\[
h(\tau) = \frac{1}{\tau} + \frac{1}{N} \left(\frac{1}{\hat\xi(\tau)} + 1\right) \sum_{i=1}^{N} \frac{Y_i}{1 - \tau Y_i} = 0.
\]
The function $h(\tau)$, defined for $\tau \in (-\infty, 1/\max(Y_1, \ldots, Y_N))$ so that every $1 - \tau Y_i$ is positive, is continuous at 0. Letting $u = u_n \to \infty$, Smith [585] derives various limit results for the distribution of $(\hat\xi_N, \hat\beta_N)$. As in the case of the Hill estimator (see Theorem 6.4.9), an asymptotic bias may enter. The latter again crucially depends on a second-order condition for $F$.
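The reparametrisation $\tau = -\xi/\beta$ reduces the fit to a one-dimensional problem, which the following sketch exploits. Substituting $\hat\xi(\tau)$ and $\beta = -\hat\xi(\tau)/\tau$ into the log-likelihood gives the profile $-N \ln \beta - N(1 + \hat\xi(\tau))$, which is maximised here instead of solving $h(\tau) = 0$. Several simplifications are our own assumptions and not part of Smith's method: only $\tau < 0$ (that is, $\xi > 0$) is searched, the search runs over $s = \ln(-\tau \bar Y)$ on a fixed grid followed by a golden-section refinement, and the function names are hypothetical.

```python
import math
import random


def _xi_of_tau(excesses, tau):
    """xi_hat(tau) = N^{-1} * sum ln(1 - tau * Y_i)."""
    return sum(math.log1p(-tau * y) for y in excesses) / len(excesses)


def _profile_loglik(excesses, tau):
    """GPD log-likelihood at (xi_hat(tau), beta = -xi_hat(tau) / tau)."""
    n = len(excesses)
    xi = _xi_of_tau(excesses, tau)
    beta = -xi / tau
    return -n * math.log(beta) - n * (1.0 + xi)


def fit_gpd_heavy_tail(excesses):
    """Maximise the profile likelihood over tau < 0; returns (xi_hat, beta_hat)."""
    mean_y = sum(excesses) / len(excesses)

    def objective(s):                        # s = ln(-tau * mean_y), scale free
        return _profile_loglik(excesses, -math.exp(s) / mean_y)

    grid = [-7.0 + 0.1 * i for i in range(141)]          # s in [-7, 7]
    best = max(range(len(grid)), key=lambda i: objective(grid[i]))
    lo = grid[max(best - 1, 0)]
    hi = grid[min(best + 1, len(grid) - 1)]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(50):                      # golden-section search for the maximum
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    tau = -math.exp(0.5 * (lo + hi)) / mean_y
    xi = _xi_of_tau(excesses, tau)
    return xi, -xi / tau


if __name__ == "__main__":
    rng = random.Random(0)   # arbitrary seed
    xi0, beta0 = 0.5, 2.0
    # Exact GPD excesses by inverse transform.
    ys = [beta0 / xi0 * ((1.0 - rng.random()) ** (-xi0) - 1.0) for _ in range(3000)]
    print(fit_gpd_heavy_tail(ys))
```

For light-tailed or bounded data ($\xi \le 0$) this restricted search is not appropriate, and the full range of $\tau$ below $1/\max(Y_1, \ldots, Y_N)$ would have to be examined.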

Method of Probability-Weighted Moments

Similarly to our discussion in Section 6.3.2, Hosking and Wallis [333] also worked out a probability-weighted moment approach for the GPD. This is based on the quantities (see Theorem 3.4.13(a))
\[
w_r = E\left[Z \bigl(\bar G_{\xi,\beta}(Z)\bigr)^r\right] = \frac{\beta}{(r + 1)(r + 1 - \xi)}, \qquad r = 0, 1,
\]
where $Z$ has GPD $G_{\xi,\beta}$. We immediately obtain
\[
\beta = \frac{2 w_0 w_1}{w_0 - 2 w_1} \quad \text{and} \quad \xi = 2 - \frac{w_0}{w_0 - 2 w_1}.
\]
If we now replace $w_0$ and $w_1$ by empirical moment estimators, we obtain the probability-weighted moment estimators $\hat\xi$ and $\hat\beta$. Hosking and Wallis [333] give formulae for the approximate standard errors of these estimators. They compare their approach to the MLE approach and come to the conclusion that in the case $\xi \ge 0$ the method of probability-weighted moments offers a viable alternative. However, as we have already stressed in the case of a GEV, maximum likelihood methodology allows us to fit much more general models including time dependence of the parameters and the influence of explanatory variables.

6.5.2 An Application to Real Data

In the above discussion we have outlined the basic principles behind the GPD fitting programme. Turning to the practical applications, two main issues need to be addressed:

(a) fit the conditional df $F_u(x)$ for an appropriate range of $x$- (and indeed $u$-) values;
(b) fit the unconditional df $F(x)$, again for appropriate $x$-values.

Though formulae (6.44) and (6.45) in principle solve the problem, in practice care has to be taken about the precise range of the data available and/or the interval over which we want to fit. In our examples, we have used a set-up which is motivated mainly by insurance applications.

Take for instance the Danish fire insurance data. Looking at the ME-plot in Figure 6.2.11 we see that the data are clearly heavy-tailed. In order to estimate the shape parameter $\xi$ a choice of the threshold $u$ (equivalently, of the number $k$ of exceedances) has to be made. 
In the light of the above discussion concerning the use of ME-plots at this stage, we suggest a first choice of $u = 10$ resulting in 109 exceedances. This means we choose $u$ from a

region above which the ME-plot is roughly linear. An alternative choice would perhaps be in the range $u \approx 18$. Figure 6.5.5 (top, left) gives the resulting estimates of $\xi$ as a function of $u$ (upper horizontal axis) and of the number $k$ of exceedances of $u$ (lower horizontal axis): the resulting plot is relatively stable with estimated values mainly in the range $(0.4, 0.6)$. Compare this plot with the Hill-plot in Figure 6.4.3. For $u = 10$ maximum likelihood estimates $\hat\xi = 0.497$ (s.e. $= 0.143$) and $\hat\beta = 6.98$ result. A change to $u = 18$ yields $\hat\xi = 0.735$ (s.e. $= 0.253$) based on $k = 47$ exceedances.

From these estimates, using (6.44), an estimate for the (conditional) excess df $F_u(x)$ can be plotted. Following standard practice in reinsurance, in Figure 6.5.5 (top, right) we plot the shifted df $F_u(x - u)$, $x \ge u$. In the language of reinsurance the latter procedure estimates the probability that a claim lies in a given interval, given that the claim has indeed pierced the level $u = 10$. Though the above estimation (once $u = 10$ is chosen) only uses the 109 largest claims, a crucial question still concerns where the layer ($u = 10$ and above) is to be positioned in the total portfolio; i.e. we also want to estimate the tail of the unconditional df $F$ which yields information on the frequency with which a given high level $u$ is pierced. At this point we need the full data-set and turn to formula (6.45). A straightforward calculation allows us to express $\hat F(z)$ as a three-parameter GPD:
\[
\hat F(z) = 1 - \left(1 + \hat\xi\, \frac{z - u - \hat\nu}{\hat\beta'}\right)^{-1/\hat\xi}, \qquad z \ge u, \tag{6.47}
\]
where
\[
\hat\nu = \frac{\hat\beta}{\hat\xi} \left(\left(\frac{N_u}{n}\right)^{\hat\xi} - 1\right) \quad \text{and} \quad \hat\beta' = \hat\beta \left(\frac{N_u}{n}\right)^{\hat\xi}.
\]
We would like to stress that the above df is designed only to fit the data well above the threshold $u$. Below $u$, where the data are typically abundant, various standard techniques can be used; for instance the empirical df. By combining both, GPD above $u$ and empirical below $u$, a good overall fit can be obtained. 
There are of course various possibilities to �ne{tune such aconstruction.Finally, using the above �t to F (z), we can give estimates for the p{quantiles, p � F (u). In Figure 6.5.5 (bottom) we have summarised the 0:99{quantile estimates obtained by the above method across a wide range ofu{values (upper horizontal axis), i.e. for each u{value a new model was �ttedand x0:99 estimated. Alternatively, the number of exccedances of u is indicatedon the lower horizontal axis. For these data a rather stable picture emerges.A value in the range (25,26) follows. Con�dence intervals can be calculated.
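The tail fit (6.47) and the quantile estimates derived from it can be sketched in a few lines of Python. This is only an illustration, not the S-Plus code used for the figures: the function names are hypothetical, the quantile formula is obtained here by solving $\hat F(z) = p$ in (6.47) for $z$, and the total sample size `N_TOTAL` is an assumption, since this section does not state $n$ for the Danish data. The sketch requires $\hat\xi \ne 0$.

```python
def three_param_gpd(xi, beta, n_exceed, n_total):
    """Return (nu, beta_prime) as defined below (6.47); requires xi != 0."""
    ratio = n_exceed / n_total                  # N_u / n
    beta_prime = beta * ratio ** xi
    nu = (beta / xi) * (ratio ** xi - 1.0)
    return nu, beta_prime


def tail_fit_cdf(z, u, xi, beta, n_exceed, n_total):
    """Estimated F(z) for z >= u, in the three-parameter form (6.47)."""
    if z < u:
        raise ValueError("the GPD tail fit is only meant for z >= u")
    nu, beta_prime = three_param_gpd(xi, beta, n_exceed, n_total)
    return 1.0 - (1.0 + xi * (z - u - nu) / beta_prime) ** (-1.0 / xi)


def tail_fit_quantile(p, u, xi, beta, n_exceed, n_total):
    """Solve F(z) = p for z by inverting (6.47); valid for F(u) <= p < 1."""
    if not (1.0 - n_exceed / n_total <= p < 1.0):
        raise ValueError("p must satisfy F(u) <= p < 1")
    nu, beta_prime = three_param_gpd(xi, beta, n_exceed, n_total)
    return u + nu + (beta_prime / xi) * ((1.0 - p) ** (-xi) - 1.0)


# Estimates quoted in the text for the Danish fire data at u = 10.
U, XI, BETA, N_EXCEED = 10.0, 0.497, 6.98, 109
N_TOTAL = 2493  # ASSUMPTION: total number of claims, not stated in this section

for p in (0.99, 0.999, 0.9999):
    print(p, round(tail_fit_quantile(p, U, XI, BETA, N_EXCEED, N_TOTAL), 1))
```

By construction the fitted df satisfies $\hat F(u) = 1 - N_u/n$, so the GPD piece joins the empirical df at the threshold. With the assumed sample size, the 0.99-quantile from this sketch falls inside the range (25, 26) reported above; a different $n$ would shift all three values.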


Figure 6.5.3 The positive BMW log-returns from Figure 6.2.10. Top, left: MLE of $\xi$ as a function of $u$ and $k$ with asymptotic 95% confidence band. Top, right: GPD-fit to $F_u(x - u)$, $x \ge u$, on log-scale. Middle: GPD tail-fit for $\overline{F}(x + u)$, $x \ge 0$. Bottom: estimates of the 95%-quantile as a function of the threshold $u$ (upper horizontal axis) and of the corresponding number $k$ of the upper order statistics (lower horizontal axis). A GPD with parameters $\xi = 0.24$, $\beta = 0.013$ is fitted, corresponding to $u = 0.0355$ and $k = 100$; i.e. the distribution almost has an infinite 4th moment.


Figure 6.5.4 The absolute values of the negative BMW log-returns from Figure 6.2.10. Top, left: MLE of $\xi$ as a function of $u$ and $k$ with asymptotic 95% confidence band. Top, right: GPD-fit to $F_u(x - u)$, $x \ge u$, on log-scale. Middle: GPD tail-fit for $\overline{F}(x + u)$, $x \ge 0$. Bottom: estimates of the 95%-quantile as a function of the threshold $u$ (upper horizontal axis) and of the corresponding number $k$ of the upper order statistics (lower horizontal axis). A GPD with parameters $\xi = 0.343$, $\beta = 0.011$ is fitted, corresponding to $u = 0.0345$ and $k = 100$; i.e. the distribution has an infinite 3rd moment. As mentioned in the discussion of Figure 6.2.10, the left tail of the distribution appears to be heavier than the right one.


Figure 6.5.5 The Danish fire insurance data; see Figure 6.2.11. Top, left: MLE for the shape parameter $\xi$ of the GPD. The upper horizontal axis indicates the threshold $u$, the lower one the number $k$ of exceedances/upper order statistics involved in the estimation. Top, right: fit of the shifted excess df $F_u(x - u)$, $x \ge u$, on log-scale. Middle: GPD tail-fit for $\overline{F}(x + u)$, $x \ge 0$. Bottom: estimates of the 0.99-quantile as a function of $u$ (upper horizontal axis) and $k$ (lower horizontal axis). A GPD with parameters $\xi = 0.497$ and $\beta = 6.98$ is fitted, corresponding to $k = 109$ exceedances of $u = 10$. Compare also with Figure 6.4.3 for the corresponding Hill-fit.


Figure 6.5.6 The industrial fire insurance data; see Figure 6.2.12. Top, left: MLE for the shape parameter $\xi$ of the GPD. The upper horizontal axis indicates the threshold $u$, the lower one the number $k$ of exceedances/upper order statistics involved in the estimation. Top, right: fit of the shifted excess df $F_u(x - u)$, $x \ge u$, on log-scale. Middle: GPD tail-fit for $\overline{F}(x + u)$, $x \ge 0$. Bottom: estimates of the 0.99-quantile as a function of $u$ (upper horizontal axis) and $k$ (lower horizontal axis). A GPD with parameters $\xi = 0.747$ and $\beta = 48.1$ is fitted, corresponding to $k = 149$ exceedances of $u = 100$. Compare also with Figure 6.4.4 for the corresponding Hill-fit.

The software needed to do these and further analyses is discussed in the Notes and Comments below.

Figure 6.5.6 for the industrial fire data (see Figure 6.2.12) and Figure 6.5.3 for the BMW share prices (see Figure 6.2.10) can be interpreted in a similar way.

Mission Improbable: How to Predict the Unpredictable

On studying the above data analyses, the reader may have wondered why we restricted our plots to $x_{0.99}$ for the Danish and industrial insurance data, and $x_{0.95}$ for the BMW data. In answering this question, we restrict attention to the insurance data. At various stages throughout the text we hinted at the fact that extreme value theory (EVT) offers methodology allowing for extrapolation outside the range of the available data. The reason why we are very reluctant to produce plots for high quantiles like 0.999 or even 0.9999 is that we feel that such estimates are to be treated with extreme care. Recall Richard Smith's statement from the Preface: "There is always going to be an element of doubt, as one is extrapolating into areas one doesn't know about. But what EVT is doing is making the best use of whatever data you have about extreme phenomena." Both fire insurance data-sets have information on extremes, and indeed EVT has produced models which make best use of whatever data we had at our disposal. Using these models, estimates for the $p$-quantiles $x_p$ for every $p \in (0, 1)$ can be given. The statistical reliability of these estimates becomes, as we have seen, very difficult to judge in general. Though we can work out approximate confidence intervals for these estimators, such constructions strongly rely on mathematical assumptions which are unverifiable in practice.

In Figures 6.5.7 and 6.5.8 we have reproduced the GPD estimates for $x_{0.999}$ and $x_{0.9999}$ for both the Danish and the industrial fire data. These plots should be interpreted with the above quote from Smith in mind. For instance, for the Danish fire insurance data we see that the estimate of about 25 for $x_{0.99}$ jumps to about 90 for $x_{0.999}$ and to around 300 for $x_{0.9999}$. Likewise for the industrial fire data, we get an increase from around 190 for $x_{0.99}$ to about 1 400 for $x_{0.999}$ and 10 000 for $x_{0.9999}$. These model-based estimates could form the basis for a detailed discussion with the actuary/underwriter/broker/client responsible for these data. One can use them to calculate so-called technical premiums, which are to be interpreted as those premiums which we as statisticians believe to most honestly reflect the information available from the data. Clearly many other factors have to enter at this stage of the discussion. We already stressed before that in dealing with high layers/extremes one should always consider total exposure as an alternative. Economic considerations, management strategy and market forces will enter, so that by using all these inputs we are able to come up with a premium acceptable both for the insurer and for the insured. Finally, once the EVT model-machinery (GPD for instance) is put into place, it offers an ideal platform for simulation experiments and stress-scenarios. For instance, questions about the influence of single or few observations and model-robustness can be analysed in a straightforward way. Though we have restricted ourselves to a more detailed discussion for the examples from insurance, similar remarks apply to financial or indeed any other kind of data where extremal events play an important role.

Figure 6.5.7 GPD-model based estimates of the 0.999-quantile (top) and 0.9999-quantile (bottom) for the Danish fire insurance data, as functions of the threshold $u$ (upper horizontal axis) and the number $k$ of exceedances (lower horizontal axis). WARNING: for the interpretation of these plots, read "Mission Improbable: How to Predict the Unpredictable" above.

Figure 6.5.8 GPD-model based estimates of the 0.999-quantile (top) and 0.9999-quantile (bottom) for the industrial fire data, as functions of the threshold $u$ (upper horizontal axis) and the number $k$ of exceedances (lower horizontal axis). WARNING: for the interpretation of these plots, read "Mission Improbable: How to Predict the Unpredictable" above.

Notes and Comments

The POT method has been used by hydrologists for more than 25 years. It has also been suggested for dealing with large claims in insurance; see for instance Kremer [405], Reiss [520] and Teugels [612, 613]. It may be viewed as an alternative approach to the more classical GEV fitting.

In the present section, we gave a brief heuristic introduction to the POT. The practical use of the GPD in extreme value modelling is best learnt from the fundamental papers by Smith [588], Davison [162], Davison and Smith [163], North [477] and the references therein. Falk [221] uses the POT method for estimating $\xi$. Its theoretical foundation was already laid by Pickands [493] and developed further for instance by Smith [585] and Leadbetter [413]. The statistical estimation of the parameters of the GPD is also studied in Tajvidi [605].

The POT model is usually formulated as follows:

(a) the excesses of an iid (or stationary) sequence over a high threshold $u$ occur at the times of a Poisson process;
(b) the corresponding excesses over $u$ are independent and have a GPD;
(c) excesses and exceedance times are independent of each other.

Here one basically looks at a space-time problem: excess sizes and exceedance times. Therefore it is natural to model this problem in a two-dimensional point process or in a marked point process setting; see Falk et al. [222], Section 2.3 and Section 10.3, and Leadbetter [413] for the necessary theoretical background. There also the stationary non-iid case is treated. Using these tools one can justify the above assumptions on the excesses and exceedance times in an asymptotic sense. A partial justification is to be found in Section 5.3.1 (weak convergence of the point process of exceedances to a Poisson limit) and in Theorem 3.4.13(b) (GPD approximation of the excess df).
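To make assumptions (a)-(c) concrete, the following Python sketch simulates one realisation of the POT model: exceedance times from a homogeneous Poisson process, and independent GPD excesses drawn by inverting the GPD df. It is a minimal illustration only. The function name, the intensity and the time horizon are hypothetical choices, while the shape and scale values are the Danish fire estimates quoted earlier in this section. The sampler assumes $\xi \ne 0$.

```python
import random


def simulate_pot(intensity, horizon, xi, beta, seed=0):
    """Simulate exceedance times on [0, horizon] and their GPD excesses.

    (a) exceedance times: homogeneous Poisson process, built from iid
        exponential inter-arrival times with mean 1 / intensity;
    (b) excesses: iid GPD(xi, beta), sampled by inverse transform;
    (c) times and excesses are drawn independently of each other.
    """
    rng = random.Random(seed)
    times, excesses = [], []
    t = rng.expovariate(intensity)
    while t <= horizon:
        times.append(t)
        v = 1.0 - rng.random()                  # uniform on (0, 1]
        # Solve 1 - (1 + xi * y / beta) ** (-1 / xi) = 1 - v for y.
        excesses.append((beta / xi) * (v ** (-xi) - 1.0))
        t += rng.expovariate(intensity)
    return times, excesses


# Hypothetical intensity (exceedances per unit time) and horizon;
# xi and beta are the Danish fire estimates for u = 10.
times, excesses = simulate_pot(intensity=10.0, horizon=11.0, xi=0.497, beta=6.98)
print(len(times), max(excesses))
```

Such simulated samples are one way to carry out the simulation experiments and stress-scenarios mentioned above, for instance by refitting the model to each simulated sample and looking at the spread of the resulting quantile estimates.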

The POT method allows for fitting GPD models with time-dependent parameters $\xi(t)$, $\nu(t)$ and $\beta(t)$; in particular one may include non-stationarity effects (trends, seasonality) in the model; see for instance Smith [588]. These are further attractive aspects of GPD fitting.

Example 6.5.9 (Diagnostic tools for checking the assumptions of the POT method)
In Figures 6.5.10 and 6.5.11 we consider some diagnostic tools (suggested by Smith and Shively [591]) for checking the Poisson process assumption for the exceedance times in the POT model. Figure 6.5.10 (top) shows the excesses over $u = 10$ of the Danish fire insurance claims; see Figure 6.2.11. The left middle figure is a plot of the first sample autocorrelations of the excesses, which are consistent with independence of the data. In the right middle figure the corresponding inter-arrival times of the exceedances appear. If these times came from a homogeneous Poisson process they should be iid exponential; see Example 2.5.2. The (Lowess smoothed) curve in the figure indicates possible deviations from the iid assumptions; it is basically a smoothed mean value of the data and estimates the reciprocal of the intensity of the Poisson process. The curve is almost a straight line, parallel to the horizontal axis. In the left bottom figure a QQ-plot of the inter-arrival times versus exponential quantiles is given. The exponential fit is quite convincing. The sample autocorrelations of the inter-arrival times (bottom, right) indicate that they are indeed uncorrelated.

The picture changes for the absolute values of the negative log-returns of the BMW share prices (see Figure 6.2.10). In Figure 6.5.11 the excesses over $u = 0.0344$ are given. Their sample autocorrelations are close to zero (middle, left). However, the inter-arrival times of the exceedances (middle, right) have a tendency to form clusters of large/small values. This does not seem to influence the constancy of their smoothed mean value curve. The corresponding QQ-plot (bottom, left) against exponential quantiles shows a clear deviation from a straight line. The sample autocorrelation at the first lag indicates dependence of inter-arrival times (bottom, right). A similar picture emerges for the positive log-returns. □

Figure 6.5.10 Top: the excesses over $u = 10$ of the Danish fire insurance data; see Figure 6.2.11. Middle, left: the sample autocorrelations of the excesses. Middle, right: the inter-arrival times of the exceedances and smoothed mean values curve. Bottom, left: QQ-plot of the inter-arrival times against exponential quantiles. Bottom, right: sample autocorrelations of these times. See Example 6.5.9 for further comments.

Figure 6.5.11 Top: the excesses over $u = 0.0344$ of the absolute values of the negative BMW log-returns; see Figure 6.2.10. Middle, left: the sample autocorrelations of the excesses. Middle, right: the inter-arrival times of the exceedances and smoothed mean values curve. Bottom, left: QQ-plot of the inter-arrival times against exponential quantiles. Bottom, right: sample autocorrelations of these times. See Example 6.5.9 for further comments.

Concerning software: most of the analyses done in this and other chapters may be performed in any statistical software environment; pick your favourite. We have mainly used S-Plus. An introduction to the latter package is for instance to be found in Spector [597]. Venables and Ripley [622] give a nice introduction to modern applied statistics with S-Plus. The programs used for the analyses of the present chapter were written by Alexander McNeil and can be obtained from http://www.math.ethz.ch/~mcneil/software.html. We thank Richard Smith for having made some code available forming the basis of the above programs. Further S-Plus programs providing confidence intervals for parameters in the GPD have been made available by Nader Tajvidi under http://www.math.chalmers.se/~nader/software.html.

Various customised packages for extreme value fitting exist. Examples are XTREMES, which comes as part of Falk et al. [222], and ANEX [18].

In the context of risk management, RiskMetrics [537] forms an interesting software environment in which various of the techniques discussed so far, especially concerning quantile (VaR) estimation, are to be found.
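For readers without access to the packages above, the basic diagnostics of Example 6.5.9 are easy to reproduce in any language. The Python sketch below is one possible version, not the S-Plus code used for the figures. All function names are hypothetical, the Lowess smoother is not reproduced (only the overall mean of the inter-arrival times is computed), and the exceedance times are simulated from a homogeneous Poisson process with an arbitrary intensity rather than taken from the Danish or BMW data.

```python
import math
import random


def inter_arrival_times(exceedance_times):
    """Differences between successive (sorted) exceedance times."""
    return [b - a for a, b in zip(exceedance_times, exceedance_times[1:])]


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0, 1, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for h in range(max_lag + 1):
        ch = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / n
        acf.append(ch / c0)
    return acf


def exponential_qq(x):
    """Pairs (k-th smallest observation, standard exponential quantile at
    level k / (n + 1)); roughly on a line if x is exponential."""
    n = len(x)
    xs = sorted(x)
    return [(xs[k], -math.log(1.0 - (k + 1) / (n + 1))) for k in range(n)]


# Simulated exceedance times; the intensity 2.0 is an arbitrary choice.
rng = random.Random(1)
t, times = 0.0, []
for _ in range(500):
    t += rng.expovariate(2.0)
    times.append(t)

gaps = inter_arrival_times(times)
print(sum(gaps) / len(gaps))       # estimates the reciprocal of the intensity
print(sample_acf(gaps, 3))         # lags 1-3 should be close to zero here
print(exponential_qq(gaps)[:3])    # first few points of the QQ-plot
```

Applied to real exceedance times, clusters of short inter-arrival times would show up as a bent QQ-plot and as a first-lag autocorrelation clearly away from zero, which is the pattern described for the BMW data in Example 6.5.9.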
