
Computational Statistics & Data Analysis 47 (2004) 689–704
www.elsevier.com/locate/csda

Reiss and Thomas' automatic selection of the number of extremes

Cláudia Neves (a), M.I. Fraga Alves (b,*)

(a) UIMA, Department of Mathematics, University of Aveiro, Portugal
(b) CEAUL, DEIO, Faculty of Sciences, University of Lisbon, Lisbon, Portugal

Received 1 March 2002; received in revised form 1 November 2003

Abstract

Most widely used semi-parametric estimators of the extreme value parameter depend on the number of upper extremes which locate where the tail of a distribution begins. In the presence of a random sample with finite size, the problem concerning the choice of the number of upper extremes is not easy to handle. This number k is not only governed by the sample size n, but also ruled by parameters characterizing F. When the underlying distribution function is known, the optimum value k can be attained through the minimization of the asymptotic mean squared error of the considered estimator. Nevertheless, the merit of such a procedure may be compromised by the assessment of k values equaling the sample size n. An alternative procedure entails that the adequate number k should be the value which minimizes a mean distance encapsulating a penalty term, just as presented by Reiss and Thomas (Statistical Analysis of Extreme Values, Birkhäuser, Basel, 1997, p. 121). The performance evaluation of Reiss and Thomas' heuristic procedure will be carried out undertaking the asymptotically determined k values as a reference.
© 2003 Elsevier B.V. All rights reserved.

MSC: 62G05; 62G32

Keywords: Generalized extreme value distribution; Generalized Pareto distribution; Semi-parametric estimation; Mean squared error; Regular variation; Simulation

1. Introduction

The order statistics $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ are the legacy of independent random variables $X_1, X_2, \ldots, X_n$, with common distribution function (d.f.) $F$, after arranging these in nondecreasing order.

Research partially supported by FCT/POCTI/FEDER.
* Corresponding author.
E-mail addresses: [email protected] (C. Neves), [email protected] (M.I. Fraga Alves).

0167-9473/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2003.11.011


Due to their nature, semi-parametric models are never specified in detail by hand. Instead, the only assumption made is that $F \in \mathcal{D}(G_\gamma)$, i.e., for some index $\gamma \in \mathbb{R}$, $F$ is in the domain of attraction of an extreme-value distribution $G_\gamma$:

$$\exists\, a_n > 0,\ b_n \in \mathbb{R}:\quad F^n(a_n x + b_n) \xrightarrow[n\to\infty]{} G_\gamma(x),$$

for all $x$, with

$$G_\gamma(x) := \begin{cases} \exp\{-(1+\gamma x)^{-1/\gamma}\}, & 1+\gamma x > 0, & \text{if } \gamma \ne 0,\\ \exp\{-\exp(-x)\}, & x \in \mathbb{R}, & \text{if } \gamma = 0,\end{cases}$$

the Generalized extreme value distribution (GEV), giving rise to the problem of estimating the tail index $\gamma \in \mathbb{R}$ (also known as the regular variation parameter).

The following necessary and sufficient condition for $F \in \mathcal{D}(G_\gamma)$ was established by de Haan (1984) (first order extended regular variation property):

$$\lim_{p \to 0} \frac{Q(px) - Q(p)}{a(p)} = \frac{x^{-\gamma} - 1}{\gamma},$$

for $x > 0$ and some positive function $a$, with $Q(p) := (1-F)^{\leftarrow}(p)$ denoting the upper quantile function (q.f.) pertaining to $F$. In order to get the limit distributions of estimators of $\gamma$, some second-order conditions are needed, depending on a second-order parameter $\rho \le 0$ (see de Haan and Stadtmüller, 1996, for details on second-order extended regular variation).

Observe that the limiting function $(x^{-\gamma}-1)/\gamma$ is the upper q.f. of the Generalized Pareto (GP) distribution (see (7)). This fact reflects its exceptional role in extreme value theory. Further results concerning the importance of the GP and GEV models and respective domains of attraction, in the context of extreme value approximations, may be found in Reiss (1989) or in Reiss and Thomas (1997).
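For instance (a one-line check of ours, taking $F = W_\gamma$ from (7) below and the admissible choice $a(p) = p^{-\gamma}$), the condition holds exactly, with no limit needed:

$$Q(p) = \frac{p^{-\gamma}-1}{\gamma}, \qquad \frac{Q(px)-Q(p)}{p^{-\gamma}} = \frac{(px)^{-\gamma} - p^{-\gamma}}{\gamma\, p^{-\gamma}} = \frac{x^{-\gamma}-1}{\gamma}.$$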

The task of estimating the shape parameter $\gamma$ from empirical observations corresponds to the establishment of inferences about the tail behavior of the underlying distribution, namely whether it has a finite endpoint or not, and whether the tail of the density declines exponentially fast or according to a power function.

In the above framework, we encourage the selection of the $k$ largest observations from a random sample which will, hopefully, characterize the situation being modelled, in contrast with the alternative possibility of making inferences by retaining observations above a given high threshold $u$.

The most widely used semi-parametric estimators of the tail index (Hill, 1975; Pickands, 1975; Dekkers et al., 1989, among others) depend precisely on the number of upper extremes which locate the tail of $F$. Provided a random sample of finite size $n$, the non-trivial problem of choosing the number of upper extremes, in a way that these might present a satisfactory picture of the tail, has encountered various approaches. It is rather frequent for methodologies dedicated to $k$ selection to comprise theoretical results which, in turn, include specification of dependencies between $k$ and $n$, allowing, however, the first to increase with the latter in a controlled manner. In general, from a random sample, we will be able to draw the $k$ highest observations representing only a tiny fraction of the total sample, i.e., $k = k(n) \to \infty$ such that $k(n)/n \to 0$ as $n \to \infty$. Several studies (Hall, 1982; Dekkers and de Haan, 1993; de Haan and Peng, 1998) concentrate on choosing $k$ to minimize the asymptotic mean squared error of the adopted semi-parametric estimator. More recently, adaptive modified procedures for automated selection of the optimal sample fraction, such as bootstrap methods (Danielsson et al., 1997; Draisma et al., 1999; Gomes and Oliveira, 2001), sequential procedures (Drees and Kaufmann, 1998), direct estimation of the asymptotic mean squared error (Beirlant et al., 1999) or other adaptive estimator based procedures (Drees et al., 2000), brought an improvement in practicality. Naturally, the great majority of these methods incorporate the required search in a framework where sequences $k(n)$ do not depend exclusively on $n$ and $\gamma$, but are also given in terms of the second-order regular variation parameter $\rho \le 0$, meaning that $k = k(n, \gamma, \rho)$. Nevertheless, the kind of procedures just described should be taken into consideration at any particular instance, specifically when obtaining $k$ through the minimization of the asymptotic mean squared error inherited by the adopted $\gamma$ estimator.

On the other hand, Reiss and Thomas' method pursues the same objective of selecting $k$ by means of a wholly different perspective. Such a procedure seeks the value of $k$ in complete ignorance about the distribution $F$, yielding as appropriate the value $k^*$ which minimizes a mean distance encapsulating a penalty term. In a certain sense, this weight coefficient is intended to be more severe with respect to $\gamma$ estimates originating in observations taken further away from the actual tail.

Without diminishing the practical importance of traditional methods, but motivated instead by the lack of studies relating to the most convenient choice for the penalty term, we revisit here Reiss and Thomas' heuristic method with the ultimate aim of presenting a detailed study of its performance. Our main purpose is, therefore, to present the most relevant results obtained in the light of a simulation study.

The outline of this paper is as follows. In Section 2, we define some semi-parametric estimators, often appearing in the literature, indicating only roughly their familiar properties of consistency, asymptotic normality and asymptotic efficiency. In Section 3, we restrict our attention to the optimal sequences, i.e., sequences which asymptotically minimize the asymptotic mean squared error of the referred semi-parametric estimators. In Section 4, we draw conclusions concerning an adequate specification of the weight coefficient. The performance of the automatic choice procedure is evaluated in terms of the deviations produced by $\gamma$ estimates obtained by means of both heuristic and asymptotic methodologies of picking $k$. Finally, in Section 5 we will pay special attention to the analysis of the procedure's behavior for different kinds of tails exhibited by distinct underlying distributions belonging to the GEV domain of attraction. A practical illustration of the performance of Reiss and Thomas' method (RT-method) puts an end to this section.

2. Semi-parametric estimators

The present section will be dedicated to the definition of some frequently used semi-parametric estimators, with their asymptotic properties being narrowly specified.


Assume that $x_\infty := \sup\{x: F(x) < 1\} > 0$, and consider the following estimators based on the $k = m + 1$ upper order statistics:

$$M^{(j)}_{m,n} := \frac{1}{m}\sum_{i=0}^{m-1}\left(\log X_{n-i:n} - \log X_{n-m:n}\right)^j, \qquad j = 1, 2.$$

For $\gamma > 0$, or equivalently, in the presence of heavy tails, Hill's estimator (Hill, 1975) is given by

$$\hat\gamma^{H}_{m,n} := M^{(1)}_{m,n} = \frac{1}{m}\sum_{i=0}^{m-1}\log\frac{X_{n-i:n}}{X_{n-m:n}}. \tag{1}$$

Mason (1982) proved its weak consistency for intermediate sequences $m(n)$ such that $m(n)/n \to 0$ as $n \to \infty$. Under certain second-order conditions, Hill's estimator asymptotic normality was shown by Davis and Resnick (1984), Haeusler and Teugels (1985), Goldie and Smith (1987) and Dekkers et al. (1989), namely

$$\sqrt{m}\,(\hat\gamma^{H}_{m,n} - \gamma) \sim N(0, \gamma^2).$$

For general $\gamma$, Pickands (1975) defined a location-invariant estimator as follows:

$$\hat\gamma^{P}_{m,n} := \frac{1}{\log 2}\,\log\frac{X_{n-m+1:n} - X_{n-2m+1:n}}{X_{n-2m+1:n} - X_{n-4m+1:n}} \tag{2}$$

and proved its weak consistency. Dekkers and de Haan (1989) showed its asymptotic normality,

$$\sqrt{m}\,(\hat\gamma^{P}_{m,n} - \gamma) \sim N(0, \sigma^2(\gamma)), \quad\text{where}\quad \sigma^2(\gamma) = \begin{cases}\dfrac{\gamma^2(2^{2\gamma+1}+1)}{\left((2^{\gamma} - 1)\log 2\right)^2}, & \gamma \ne 0,\\[1ex] \dfrac{3}{\log^4 2}, & \gamma = 0.\end{cases}$$

Still for general $\gamma$, Dekkers et al. (1989) defined the Moment estimator

$$\hat\gamma^{M}_{m,n} := M^{(1)}_{m,n} + 1 - \tfrac{1}{2}\left\{1 - (M^{(1)}_{m,n})^2/M^{(2)}_{m,n}\right\}^{-1}. \tag{3}$$

Under extra conditions imposed upon the tail of $F$, the same authors concluded about the asymptotic normality of the Moment estimator in the presence of an intermediate sequence $m(n)$ verifying $m(n)/n \to 0$ as $n \to \infty$:

$$\sqrt{m}\,(\hat\gamma^{M}_{m,n} - \gamma) \sim N(0, \sigma^2(\gamma)),$$

with

$$\sigma^2(\gamma) = \begin{cases} 1 + \gamma^2, & \gamma \ge 0,\\[1ex] (1-\gamma)^2(1-2\gamma)\left[4 - 8\,\dfrac{1-2\gamma}{1-3\gamma} + \dfrac{(5-11\gamma)(1-2\gamma)}{(1-3\gamma)(1-4\gamma)}\right], & \gamma < 0.\end{cases}$$
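A direct transcription of (3), again a sketch under the same assumptions as before (our function names, positive top order statistics):

```python
import numpy as np

def moment_estimator(sample, m):
    # Moment estimator (3) of Dekkers et al. (1989), valid for any real gamma
    x = np.sort(sample)
    logs = np.log(x[-m:]) - np.log(x[-m - 1])    # log-excesses over X_{n-m:n}
    m1 = np.mean(logs)                           # M^(1)_{m,n}
    m2 = np.mean(logs ** 2)                      # M^(2)_{m,n}
    return m1 + 1.0 - 0.5 / (1.0 - m1 ** 2 / m2)
```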

More recently, Fraga Alves (2001) has introduced a Hill-type estimator which accumulates the location-invariance property, namely

$$\hat\gamma^{HInv}_{m_0,m,n} = \frac{1}{m_0}\sum_{i=0}^{m_0-1}\log\frac{X_{n-i:n} - X_{n-m:n}}{X_{n-m_0:n} - X_{n-m:n}}, \tag{4}$$

and proved its weak consistency; assuming, additionally, that the tail of $F$ verifies a second-order regularly varying condition, the asymptotic normality of this estimator is achieved:

$$\sqrt{m_0}\,(\hat\gamma^{HInv}_{m_0,m,n} - \gamma) \sim N(0, \gamma^2).$$
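The shift by the random threshold $X_{n-m:n}$ is what buys the location invariance; a sketch of (4) under the same conventions (ours; it requires $m_0 < m$):

```python
import numpy as np

def hill_invariant(sample, m0, m):
    # Location-invariant Hill-type estimator (4) of Fraga Alves (2001); m0 < m
    x = np.sort(sample)
    shift = x[-m - 1]                    # X_{n-m:n}, subtracted from every term
    num = x[-m0:] - shift                # X_{n-i:n} - X_{n-m:n}, i = 0, ..., m0-1
    den = x[-m0 - 1] - shift             # X_{n-m0:n} - X_{n-m:n}
    return np.mean(np.log(num / den))
```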

On the other hand, Resnick and Stărică (1997, 1998) have suggested alternative versions of the Hill and Moment estimators. By means of a smoothing technique they have defined a Smooth Hill estimator,

$$\mathrm{av}\hat\gamma^{H}_{m,n} = \frac{1}{(u-1)m}\sum_{p=m+1}^{um}\hat\gamma^{H}_{p,n}, \tag{5}$$

where $u \in \mathbb{N}$ and $u > 1$. Usually $u$ may be chosen between $n^{0.1}$ and $n^{0.2}$. Resnick and Stărică (1997) also verified that the smoothing technique allows a variance reduction but is still quite innocuous towards the bias. The Smooth Moment estimator is given by

$$\mathrm{av}\hat\gamma^{M}_{m,n} = \frac{1}{(1-s)m}\sum_{p=[ms]+1}^{m}\hat\gamma^{M}_{p,n}, \tag{6}$$

with $s \in (0,1)$ usually taken between 0.3 and 0.5.

The estimates process associated with these new versions, $\mathrm{av}\hat\gamma_{k,n}$, has revealed itself to be less sensitive to small changes in $k$, because it frequently exhibits a relatively stable trajectory in the proximity of the true $\gamma$ value. In contrast, Pickands' estimator displays high volatility of the sample paths as a function of $k$. In this light, choosing a value of $k$ for this estimator appears useless and we lay Pickands' estimator aside. Notwithstanding, in Neves and Fraga Alves (2003) we go further in the present study and include results edified by Draisma et al. (1999), which allow one to find asymptotically optimal sequences using Pickands' estimator, along with results concerning the RT-method.
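The smoothed versions are plain averages of the estimates process, so they can be sketched by reusing the `hill` and `moment_estimator` functions above (our names; `u*m` must stay below the sample size, and the mean over $m - [ms]$ terms matches the $(1-s)m$ normalization in (6) up to rounding):

```python
import numpy as np

def smooth_hill(sample, m, u=2):
    # Smooth Hill estimator (5): average of Hill estimates for p = m+1, ..., u*m
    return np.mean([hill(sample, p) for p in range(m + 1, u * m + 1)])

def smooth_moment(sample, m, s=0.3):
    # Smooth Moment estimator (6): average of Moment estimates for p = [ms]+1, ..., m
    start = int(np.floor(m * s)) + 1
    return np.mean([moment_estimator(sample, p) for p in range(start, m + 1)])
```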

3. Optimal choice of the sample fraction in tail index estimation

If the d.f. $F$ is known, the number $k$ can be obtained, at a glance, when undertaking the optimality criterion in the sense of minimizing the asymptotic mean squared error, $\mathrm{MSE}_\infty(\hat\gamma_{k,n})$. In general, under this perspective, the definition of the optimal $k$, $k_{\mathrm{opt}}$, stems from an approximation to the estimator's asymptotic distribution using a finite sample size $n$. The upper sample fraction $k_{\mathrm{opt}}/n$ is thus the result of the minimization of the mean squared error.

The task of choosing $k_{\mathrm{opt}}$ is addressed here by attempting to estimate the tail index $\gamma$, provided random samples of size $n$ from the GP($\gamma$) distribution,

$$W_\gamma(x) = 1 + \log G_\gamma(x), \qquad -1 \le \log G_\gamma(x) \le 0, \tag{7}$$

and from the GEV($\gamma$) distribution. We take $k_{\mathrm{opt}}(n) = m_{\mathrm{opt}}(n) + 1$ as our optimal sequence, although for the most part it is sufficient for us to work directly with $m_{\mathrm{opt}}(n)$.
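Samples from GP($\gamma$) and GEV($\gamma$) can be drawn by inverting the respective distribution functions; a minimal sketch of this sampling step (inverse-transform sampling; the seed is arbitrary and the function names are ours):

```python
import numpy as np

rng = np.random.default_rng(2004)   # arbitrary seed

def sample_gp(gamma, n):
    # GP(gamma) via inversion of W_gamma in (7): W^{-1}(p) = ((1-p)^{-gamma} - 1)/gamma
    u = rng.uniform(size=n)
    if gamma == 0.0:
        return -np.log(1.0 - u)                     # standard exponential limit
    return ((1.0 - u) ** (-gamma) - 1.0) / gamma

def sample_gev(gamma, n):
    # GEV(gamma) via inversion of G_gamma: G^{-1}(p) = ((-log p)^{-gamma} - 1)/gamma
    u = rng.uniform(size=n)
    if gamma == 0.0:
        return -np.log(-np.log(u))                  # Gumbel limit
    return ((-np.log(u)) ** (-gamma) - 1.0) / gamma
```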


3.1. Results for Hill's estimator

With respect to the GP($\gamma$) model with $\gamma > 0$ ($\rho = -\gamma$), following the de Haan and Peng (1998) procedure, the optimal sequence, i.e., the sequence which asymptotically minimizes the asymptotic mean squared error of Hill's estimator, is given straightforwardly by

$$m_{\mathrm{opt}}(n) \sim \left\{\frac{(1+\gamma)^2}{2\gamma}\right\}^{1/(2\gamma+1)} n^{2\gamma/(2\gamma+1)}, \qquad \mathrm{MSE}_\infty(\hat\gamma^{H}_{m_{\mathrm{opt}},n}) = \frac{\gamma(1+2\gamma)}{2m_{\mathrm{opt}}}.$$

For the GEV($\gamma$) distribution with $\gamma > 0$ ($\rho = \max(-1,-\gamma)$), the de Haan and Peng (1998) conditions yield an optimal number of upper extremes and a companion mean squared error, as presented below:

$$m_{\mathrm{opt}}(n) \sim \begin{cases} \left\{\dfrac{(1+\gamma)^2}{2\gamma}\right\}^{1/(2\gamma+1)} n^{2\gamma/(2\gamma+1)}, & 0 < \gamma < 1,\\[1ex] 2\left\{\dfrac{n}{3}\right\}^{2/3}, & \gamma = 1,\\[1ex] 2n^{2/3}, & \gamma > 1,\end{cases} \qquad \mathrm{MSE}_\infty(\hat\gamma^{H}_{m_{\mathrm{opt}},n}) = \begin{cases}\dfrac{\gamma(1+2\gamma)}{2m_{\mathrm{opt}}}, & 0 < \gamma < 1,\\[1ex] \dfrac{3}{2m_{\mathrm{opt}}}, & \gamma = 1,\\[1ex] \dfrac{3\gamma^2}{2m_{\mathrm{opt}}}, & \gamma > 1.\end{cases}$$
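As a numeric illustration of the formulas above (a sketch of ours, not taken from the paper): for $\gamma = 1$ and $n = 1000$ under GP($\gamma$), $m_{\mathrm{opt}} \approx 2^{1/3}\cdot 1000^{2/3} \approx 126$, with minimal asymptotic MSE $\gamma(1+2\gamma)/(2m_{\mathrm{opt}}) \approx 0.012$.

```python
def m_opt_hill_gp(gamma, n):
    # asymptotically MSE-optimal m for Hill's estimator under GP(gamma), gamma > 0
    c = ((1.0 + gamma) ** 2 / (2.0 * gamma)) ** (1.0 / (2.0 * gamma + 1.0))
    return c * n ** (2.0 * gamma / (2.0 * gamma + 1.0))

def mse_opt_hill_gp(gamma, n):
    # companion minimal asymptotic MSE: gamma*(1 + 2*gamma)/(2*m_opt)
    return gamma * (1.0 + 2.0 * gamma) / (2.0 * m_opt_hill_gp(gamma, n))

print(m_opt_hill_gp(1.0, 1000), mse_opt_hill_gp(1.0, 1000))   # ~126, ~0.0119
```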

3.2. Results for the Moment estimator

Supported by the procedure of Dekkers and de Haan (1993), and on the basis of the GP($\gamma$) distribution ($\rho = -|\gamma|$, $\gamma \ne 0$), the optimal number of highest observations to use in tail index estimation and the corresponding asymptotic mean squared error are given by

$$m_{\mathrm{opt}}(n) \sim \begin{cases}\left\{\dfrac{54}{5}\right\}^{1/5} n^{4/5}, & \gamma = -1,\\[1ex] \left\{\dfrac{-2\gamma^5(\gamma+1)^2}{(1-\gamma)^2(1-3\gamma)^2\sigma^2(\gamma)}\right\}^{1/(2\gamma-1)} n^{2\gamma/(2\gamma-1)}, & \gamma < 0,\ \gamma \ne -1,\\[1ex] \dfrac{(\log n - \log m_{\mathrm{opt}}(n))^5}{100}, & \gamma = 0,\\[1ex] \left\{\dfrac{(1+\gamma^2)(1+\gamma)^4}{2\gamma^5}\right\}^{1/(2\gamma+1)} n^{2\gamma/(2\gamma+1)}, & \gamma > 0,\end{cases}$$


Table 1
Turning points $n_t$ for the Moment estimator in the GP($\gamma$) distribution

$\gamma$   −1.4   −1.3   −1.2   −1.1    −1.0   −0.9    −0.8   −0.7   −0.6
$n_t$       818   1483   3440   14401     11   16802   4755   2510   1796

$\gamma$   −0.5   −0.4   −0.3   −0.2    −0.1    0.1     0.2    0.3    0.4
$n_t$      1621   1874   3051   9073   116819  73938   3371    642    219

$$\mathrm{MSE}_\infty(\hat\gamma^{M}_{m_{\mathrm{opt}},n}) = \begin{cases}\dfrac{1}{m_{\mathrm{opt}}}\left(\sigma^2(-1) + \dfrac{6}{5}\right), & \gamma = -1,\\[1ex] \dfrac{\sigma^2(\gamma)(1-2\gamma)}{-2\gamma\, m_{\mathrm{opt}}}, & \gamma < 0,\ \gamma \ne -1,\\[1ex] \dfrac{2}{5(\log n - \log m_{\mathrm{opt}})^2} + \dfrac{1}{m_{\mathrm{opt}}}, & \gamma = 0,\\[1ex] \dfrac{(1+\gamma^2)(1+2\gamma)}{2\gamma\, m_{\mathrm{opt}}}, & \gamma > 0.\end{cases}$$

It should be noticed that the practical utility of such results is somewhat controversial. In fact, the underlying idea is that optimal sequences $k_{\mathrm{opt}} = k_{\mathrm{opt}}(n, \gamma, \rho)$ are, effectively, of those specific orders. For finite $n$, such approximations may result in a lack of applicability, since we may be confronted with $k_{\mathrm{opt}}$ values lying outside the sample. Despite that fact, the problem we principally address here takes precisely the form of finding $k_{\mathrm{opt}}$ with respect to the function $s^{\leftarrow}$, defined as the generalized inverse of the function $s$ appearing in the Dekkers and de Haan (1993) results concerning $\gamma \ne 0$. An illustrative example of this situation stems from the fact that $s$ is a decreasing function. Hence, relations (8) and (9) hold:

$$\frac{k}{n} \le 1 \iff n \ge \frac{\gamma^2(1-\gamma)^2}{s(1)} \quad\text{if } \gamma > 0, \tag{8}$$

$$\frac{k}{n} \le 1 \iff n \ge s(1) \quad\text{if } \gamma < 0. \tag{9}$$

Table 1 gives a practical interpretation of relations (8) and (9): it reports, with respect to some values of $\gamma$, the sample size $n = n_t$ needed to achieve $k_{\mathrm{opt}} < n$.

For $\gamma$ ranging from $-1.4$ to $-0.1$ with step 0.1 (notation: $\gamma = -1.4\,(0.1)\,{-0.1}$), or in the case $\gamma = 0.1\,(0.1)\,0.4$, the determined turning points, $n_t$, aim to alert the reader to the need for careful use of the $k_{\mathrm{opt}}(n)$ sequences. The value $n = n_t$ corresponding to $\gamma = -1.0$ seems rather troublesome, revealing a paradoxical behavior of $\gamma$ values near $-1$. But what really happens is that, in the process of deduction of $k_{\mathrm{opt}}$, we must resort to a higher order approximation to the sampled distribution, since an approximation similar to the one taken for the remaining values of $\gamma$ would produce sequences $k(n)$ going to $\infty$ very fast.

Again undertaking the Dekkers and de Haan (1993) methodology, but for the GEV($\gamma$) distribution ($\rho = \max(-1, -|\gamma|)$, $\gamma \ne 0$), we obtain

$$m_{\mathrm{opt}}(n) \sim \begin{cases}\left\{\dfrac{(1-\gamma)^2(1-2\gamma)^2}{2^3\sigma^2(\gamma)(2-\gamma)^2}\right\}^{-1/3} n^{2/3}, & \gamma < -1,\\[1ex] \{2\sigma^2(-1)\}^{1/3}\, n^{2/3}, & \gamma = -1,\\[1ex] \left\{\dfrac{-2\gamma^5(1+\gamma)^2}{\sigma^2(\gamma)(1-\gamma)^2(1-3\gamma)^2}\right\}^{1/(2\gamma-1)} n^{2\gamma/(2\gamma-1)}, & -1 < \gamma < 0,\\[1ex] \dfrac{(\log n - \log m_{\mathrm{opt}}(n))^5}{100}, & \gamma = 0,\\[1ex] \left\{\dfrac{(1+\gamma)^4(1+\gamma^2)}{2\gamma^5}\right\}^{1/(2\gamma+1)} n^{2\gamma/(2\gamma+1)}, & 0 < \gamma < 1,\\[1ex] \left\{\dfrac{64}{9}\right\}^{1/3} n^{2/3}, & \gamma = 1,\\[1ex] \left\{\dfrac{(1-2\gamma)^2}{2^5(1+\gamma^2)}\right\}^{-1/3} n^{2/3}, & \gamma > 1,\end{cases}$$

$$\mathrm{MSE}_\infty(\hat\gamma^{M}_{m_{\mathrm{opt}},n}) = \begin{cases}\dfrac{3\sigma^2(\gamma)}{2m_{\mathrm{opt}}}, & \gamma < -1,\\[1ex] \dfrac{3\sigma^2(-1)}{2m_{\mathrm{opt}}}, & \gamma = -1,\\[1ex] \dfrac{(2\gamma-1)\sigma^2(\gamma)}{2\gamma\, m_{\mathrm{opt}}}, & -1 < \gamma < 0,\\[1ex] \dfrac{2}{5(\log n - \log m_{\mathrm{opt}})^2} + \dfrac{1}{m_{\mathrm{opt}}}, & \gamma = 0,\\[1ex] \dfrac{(1+2\gamma)(1+\gamma^2)}{2\gamma\, m_{\mathrm{opt}}}, & 0 < \gamma < 1,\\[1ex] \dfrac{3}{m_{\mathrm{opt}}}, & \gamma = 1,\\[1ex] \dfrac{3(1+\gamma^2)}{2m_{\mathrm{opt}}}, & \gamma > 1.\end{cases}$$

Remarks similar to the previous ones, regarding the finite sample properties, apply here as well.


4. Automatic choice of the number k of upper extremes

Reiss and Thomas (1997) have proposed a heuristic method of choosing the number of upper extremes to use in tail index estimation. This methodology, incorporated in the "Xtremes" package, selects $k$ in an automatic manner, adopting as optimum the value $k^*$ which minimizes

$$\frac{1}{k}\sum_{i \le k} i^{\beta}\,\big|\hat\gamma_{i,n} - \mathrm{med}(\hat\gamma_{1,n}, \ldots, \hat\gamma_{k,n})\big|, \qquad 0 \le \beta \le \tfrac{1}{2}. \tag{10}$$

Reiss and Thomas have also suggested minimizing the following modification of (10):

$$\frac{1}{k-1}\sum_{i < k} i^{\beta}\,(\hat\gamma_{i,n} - \hat\gamma_{k,n})^2, \qquad 0 \le \beta \le \tfrac{1}{2}. \tag{11}$$

As far as the authors know, no methodology regarding the specification of $\beta$ has been available. The main motivation of our work is then to substantially reduce the magnitude of variation of $\beta$, i.e., to find good sets of weights with which the method often computes $k^* = k^*(\beta)$ capable of locating reasonable estimates of $\gamma$, when adopting the semi-parametric estimators previously mentioned: Hill's, the Moment and respective smooth versions, and the location invariant Hill-type estimator.
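To make the procedure concrete, here is a sketch of both criteria (our code, not the Xtremes implementation; `estimates[k]` holds $\hat\gamma_{k,n}$ for $k = 1, \ldots, K$, with index 0 unused, and the minimization simply scans all available $k$):

```python
import numpy as np

def rt_select(estimates, beta, version=10):
    # Reiss-Thomas automatic choice: return the k* minimizing (10) or (11)
    K = len(estimates) - 1
    best_k, best_val = None, np.inf
    for k in range(2, K + 1):
        g = np.asarray(estimates[1:k + 1], dtype=float)   # gamma-hat_{1,n}, ..., gamma-hat_{k,n}
        i = np.arange(1, k + 1, dtype=float)
        if version == 10:
            val = np.mean(i ** beta * np.abs(g - np.median(g)))             # criterion (10)
        else:
            val = np.sum(i[:-1] ** beta * (g[:-1] - g[-1]) ** 2) / (k - 1)  # criterion (11)
        if val < best_val:
            best_k, best_val = k, val
    return best_k
```

Paired, for example, with the `hill` sketch of Section 2 (and up to the $k = m+1$ convention), `estimates = [np.nan] + [hill(sample, m) for m in range(1, n)]` reproduces the kind of search performed in the simulations below.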

We have proceeded with the generation of 5000 replicates of independent samples of size $n = 1000$ from the GP($\gamma$) and GEV($\gamma$) models with $\gamma = -3.0\,(0.1)\,3.0$. For each sample or replicate, the heuristic procedure was applied in order to compute $k^* = k^*(\beta)$ with respect to $\beta = 0\,(0.1)\,0.5$. Averaging over all samples, we got a mean number of upper extremes, $k^*(\beta)$, and a simulated mean squared error, $\mathrm{MSE}(\hat\gamma_{k^*(\beta),n})$. As the intention here was to identify eventual similarities relating the performance of the RT-method for the semi-parametric estimators involved, the same 5000 samples were preserved during all the present work.

Since the focus is solely on a simulation study, the performance evaluation of Reiss and Thomas' procedure with respect to $\beta$ was carried out undertaking the following measure:

$$\sum_{\gamma \in \Gamma}\big[\mathrm{MSE}(\hat\gamma_{k^*(\beta),n}) - \mathrm{MSE}_\infty(\hat\gamma_{k_{\mathrm{opt}},n})\big]^2, \tag{12}$$

where $\Gamma$ is a suitably chosen set of values of $\gamma$. The value of $\beta$ to be selected as the most appropriate should be the one for which (12) is minimal.

Notice that, in a certain sense, (12) represents a measure between two mean squared errors (taking $\mathrm{MSE}_\infty$ as the optimal reference value), ousting the alternative possibility of comparing $k^*(\beta)$ with the "optimal" $k_{\mathrm{opt}}$ itself, since such a comparison may be found meaningless for finite sample sizes, as illustrated in Section 3. The $\gamma$ intervals taken as relevant for each estimator aimed to connect, somehow, asymptotic results with the simulation outcome with respect to each $\beta = 0\,(0.1)\,0.5$, and to articulate inferences dominated by the GP($\gamma$) and GEV($\gamma$) models. This was attempted via numerical minimization of (12) for each estimator, within each model.


From this point on, we shall reveal matching results for the GP($\gamma$) and GEV($\gamma$) models in order to consolidate the present study. When no agreement between the two models was found concerning the $\beta$ value to be specified, we decided on the selection of a suitable intermediate $\beta$. This was borne out in practice: resorting to the Moment estimator, further simulations were conducted for $\beta = 0.05\,(0.05)\,0.45$, which turned out to point in the direction of a slight agreement between the two models.

In the light of the information provided by the simulation study, it soon became clear that the mean number $k^* = k^*(\beta)$ delivered often appears associated with $\beta = 0$. Also, it is easy to recognize the ability of $\beta$ values near zero to promote higher subsample sizes, because the penalty term $i^{\beta}$ becomes approximately 1 and, therefore, quite harmless.

Taking all into consideration, we found rather feasible combinations of semi-parametric estimator / $\beta$ value / version (10) or (11) of the RT-method:

(1) Hill's estimator:
If the only existing information about the tail index is that $\gamma > 0$, then select $\beta = 0$ in version (10).
If it is reasonable to ascertain that $\gamma > 1$, then select $\beta = 0$ encapsulated in version (10) of the method, whereas for $0 < \gamma \le 1$, choose $\beta = 0.3$ in (10).

(2) Moment estimator:
Applying the method with the Moment estimator, the results seem to reveal a slight hostility between the GEV and GP distributions. However, generally speaking, it is possible to advise:
For $\gamma < 0$, $\beta = 0.35$ in version (11) as the most appropriate choice.
If $\gamma > 0$, select $\beta = 0.4$ encapsulated in (10).
If more detailed information is available, namely if the true $\gamma$ is less than or equal to 0.5, $\beta = 0.5$ gathers our preferences concerning the minimization of (12), whether the underlying model is the GEV or the GP distribution. However, for the GP distribution, superiority is on the side of version (10), whereas for the GEV distribution, version (11) prevails against (10). Since this methodology is meant to deal with any model, choosing between the two versions, (10) or (11), has become the issue. The overall conclusion stems from the numerical study of the ratio $\mathrm{MSE}_{(10)}(\hat\gamma_{k^*,n})/\mathrm{MSE}_{(11)}(\hat\gamma_{k^*,n})$: if $\gamma \le 0.5$, select $\beta = 0.5$ with (11); if $0.5 < \gamma \le 1$, make $\beta = 0.35$ in version (10); for $1 < \gamma \le 2$, select $\beta = 0.25$ in (11); if $\gamma > 2$, choose $\beta = 0.25$ encapsulated in (10).

(3) Hill-type estimator:
Apply version (10) of the RT-method with $\beta = 0$, regardless of any prior information involving the true positive $\gamma$.

(4) Smooth Hill estimator:
For positive $\gamma$, under the GP distribution, the method of automatic choice of the number of extremes has its best performance when $\beta = 0$ in version (11). Under the GEV model, the method performs reasonably well with $\beta = 0$ in version (10). Because, under the GEV distribution, both versions of the method have produced similar mean squared errors, we recommend the use of (11) with $\beta = 0$.
If it is possible to verify that $0 < \gamma \le 1$, then choose $\beta = 0.05$ in (11), whereas in the case of $\gamma > 1$, select $\beta = 0$ in (11).

(5) Smooth Moment estimator ($s = 0.3$):
For $\gamma < -0.5$, we select the pair $\beta = 0.45$ and version (10).
For $\gamma \ge -0.5$, the appropriate choice is $\beta = 0$ with version (10).

The overall conclusion we emphasize in this paper is that $\beta = 0$ has frequently been seen as the most proper choice, independently of the version of the method at stake, in the sense that, in most cases, the minimum of (12) is attained at $\beta = 0$.

5. Illustrative examples

In this section we illustrate the performance of the RT-method through its application to sets of samples taken from distinct parent distributions. The purpose, at present, is to study the robustness of the heuristic method, for models differing from the ones previously considered (GP and GEV distributions), regarding the advisable rules for the choice of $\beta$ presented in Section 4.

We consider seven distributions as key examples:

• Cauchy distribution ($\gamma = 1$, $\rho = -2$):
$$F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan x, \qquad x \in \mathbb{R}.$$

• Burr($\beta, \tau, \lambda$) distribution ($\gamma = 1/(\tau\lambda)$, $\rho = -1/\lambda$):
$$F(x) = 1 - \left(\frac{\beta}{\beta + x^{\tau}}\right)^{\lambda}, \qquad x > 0,$$
with $(\beta, \tau, \lambda) = (1, 2, 2)$ or $(\beta, \tau, \lambda) = (1, \tfrac12, 2)$.

• Weibull($\lambda, \tau$) distribution ($\gamma = \rho = 0$):
$$F(x) = 1 - \exp\{-\lambda x^{\tau}\}, \qquad x > 0,$$
with $(\lambda, \tau) = (1, \tfrac12)$.

• Standard normal distribution ($\gamma = \rho = 0$).

• Uniform distribution on the unit interval ($\gamma = \rho = -1$).

• Reversed Burr($\beta, \tau, \lambda$) distribution ($\gamma = -1/(\tau\lambda)$, $\rho = -1/\lambda$):
$$F(x) = 1 - \left(\frac{\beta}{\beta + (x_+ - x)^{-\tau}}\right)^{\lambda}, \qquad x < x_+,$$
with $(\beta, \tau, \lambda) = (1, 4, 1)$ and $x_+ = 1$.
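The Burr and reversed Burr samples can be generated by inverting the distribution functions just listed; a sketch under the parametrization above (our function names and seed):

```python
import numpy as np

rng = np.random.default_rng(1990)   # arbitrary seed

def sample_burr(beta, tau, lam, n):
    # Burr(beta, tau, lam): gamma = 1/(tau*lam), rho = -1/lam
    u = rng.uniform(size=n)
    return (beta * ((1.0 - u) ** (-1.0 / lam) - 1.0)) ** (1.0 / tau)

def sample_reversed_burr(beta, tau, lam, x_plus, n):
    # Reversed Burr with finite endpoint x_plus: gamma = -1/(tau*lam)
    u = rng.uniform(size=n)
    return x_plus - (beta * ((1.0 - u) ** (-1.0 / lam) - 1.0)) ** (-1.0 / tau)
```

The Cauchy, Weibull, normal and uniform samples can be drawn directly from NumPy's generator (e.g. `rng.standard_cauchy`, `rng.weibull`).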

We have simulated 5000 samples of size $n = 1000$ taken from the selected parent distributions. For each sample we have obtained an estimate of the extreme value index $\gamma$ by the use of the Hill and Smooth Moment estimators, embedded in the RT-method. Averaging over all samples, we got the mean number of upper extremes $k^*$ and the respective simulated mean squared error $\mathrm{MSE}(\hat\gamma_{k^*,n})$.


In the beginning of this paper we gave heed to the fact that the optimal choice of the number of extremes is subject to a variety of different approaches. Matthys and Beirlant (2000) contains a review of several adaptive methods, with application to a practical example and a number of simulated data sets. However, we wish to remain within the scope of an adaptive sequential choice of the number of extremes. Thus, with respect to Hill's estimator, we will also consider a theoretically based adaptive procedure by Drees and Kaufmann (1998). This method relies on results concerning the Hall model, which forms a generalization of the Pareto model. Since, for any intermediate sequence $k(n)$, the maximum random fluctuation of the series $\sqrt{i}\,(\hat\gamma^{H}_{i,n} - \gamma)$, $1 \le i \le k(n)-1$, is of order $\sqrt{\log\log n}$, Drees and Kaufmann derived a stopping time criterion for an asymptotically equivalent deterministic sequence in order to select an optimum value $k^*$. This method should produce the intended effect of retaining the $k^*$ beyond which the bias starts to dominate the variance part of the series (see Corollary 1 of Drees and Kaufmann, 1998). The disclosed advantage of the method is that the second-order parameter $\rho \le 0$ may either be set equal to a fixed value $\rho_0$ or be estimated. After conducting exhaustive simulations for moderate sample sizes, such as $n = 1000$, it turned out that the estimation of the parameter $\rho$ results only in a token of the real value. In fact, for many distributions, better results are obtained if $\rho$ is fixed (at $\rho_0 = -1$) instead of being estimated.

For fixed $\rho_0 = -1$, Drees and Kaufmann's method (DK-method) can be accomplished by following the path:

Step 1: $r_n = 2.5 \times \tilde\gamma_n \times n^{0.25}$, with $\tilde\gamma_n = \hat\gamma^{H}_{2\sqrt{n},n}$.

Step 2: Obtain $\bar k_n(r_n) := \min\{k = 1, \ldots, n-1 : \max_{i<k}\sqrt{i}\,|\hat\gamma^{H}_{i,n} - \hat\gamma^{H}_{k,n}| > r_n\}$. If the condition $\max_{i<k}\sqrt{i}\,|\hat\gamma^{H}_{i,n} - \hat\gamma^{H}_{k,n}| > r_n$ is satisfied for some $k$, so that $\bar k_n(r_n)$ is well defined, then move forward to Step 3. Else, assign $0.9 \times r_n$ to $r_n$ and repeat Step 2.

Step 3: For $\varepsilon \in (0,1)$ (in particular $\varepsilon = 0.7$), determine

$$k^*_{DK} = \left[\frac{1}{3}\,(2\tilde\gamma_n^2)^{1/3}\left(\frac{\bar k_n(r_n^{\varepsilon})}{(\bar k_n(r_n))^{\varepsilon}}\right)^{1/(1-\varepsilon)}\right],$$

where $[x]$ denotes the largest integer smaller than or equal to $x$.

Theorem 1 of Drees and Kaufmann (1998) ensures that $k^*_{DK}$ is a consistent estimator of $k_{\mathrm{opt}}$ if the underlying distribution function satisfies the Hall condition. Unfortunately, $k^*_{DK}$ is not always well defined, since the procedure may return $k^*_{DK} < 2$. When applying the DK-method to the Cauchy and both Burr models, the samples for which this happened were ignored, and we were left with $N = 4959$ samples for the Cauchy distribution, $N = 4999$ for Burr(1,2,2) and $N = 4986$ samples for the second Burr model. The results are presented in Table 2.
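A sketch of the three steps (ours, not the authors' implementation; the Hill estimates are computed once via cumulative sums, `int()` plays the role of $[\,\cdot\,]$, and `None` is returned when $k^*_{DK}$ is undefined, as discussed above):

```python
import numpy as np

def dk_select(sample, eps=0.7):
    # Drees-Kaufmann selection of k* for Hill's estimator, with rho fixed at -1
    n = len(sample)
    logs = np.log(np.sort(sample))[::-1]         # descending log order statistics
    csum = np.cumsum(logs)
    m = np.arange(1, n)
    hills = csum[:n - 1] / m - logs[1:n]         # Hill estimates for m = 1, ..., n-1

    gamma0 = hills[int(2 * np.sqrt(n)) - 1]      # initial estimate with m = 2*sqrt(n)
    r = 2.5 * gamma0 * n ** 0.25                 # step 1

    def k_bar(threshold):
        # smallest k with max_{i<k} sqrt(i)|hill_i - hill_k| > threshold, else None
        for k in range(2, n):
            i = np.arange(1, k, dtype=float)
            if np.max(np.sqrt(i) * np.abs(hills[:k - 1] - hills[k - 1])) > threshold:
                return k
        return None

    for _ in range(200):                         # step 2: shrink r until k_bar is defined
        if k_bar(r) is not None:
            break
        r *= 0.9
    k1, k2 = k_bar(r), k_bar(r ** eps)
    if k1 is None or k2 is None:
        return None                              # k*_DK undefined for this sample
    k_star = int((2.0 * gamma0 ** 2) ** (1.0 / 3.0) / 3.0
                 * (k2 / k1 ** eps) ** (1.0 / (1.0 - eps)))   # step 3
    return k_star if k_star >= 2 else None
```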

On the other hand, version (10) of the RT-method was applied with $\beta = 0.3$ because, for the Cauchy and both Burr distributions, the true value of $\gamma$ is less than or equal to 1. Simulation results are summarized in Table 3. The minimal empirical mean squared error of Hill's estimator is also available: $\mathrm{MSE}(\hat\gamma^{Sim}_{k^*,n})$ denotes the empirical mean squared error of $\hat\gamma^{H}_{k,n}$, which was found to be minimal at $k^*_{\mathrm{sim}}$.


Table 2
Optimal number of extremes by the use of Hill's estimator in the DK-method

Hill             $k^*_{DK}$   $\hat\gamma^{DK}_{k^*}$   $\mathrm{MSE}(\hat\gamma^{DK}_{k^*})$
Cauchy              101          1.0118                    0.0129
Burr(1,2,2)          63          0.3003                    0.0044
Burr(1,4,1)         175          0.2745                    0.0011
Burr(1,1/2,2)        73          1.2183                    0.0796
Burr(1,1,1)         131          1.0635                    0.0161

Table 3
Optimal number of extremes by the use of Hill's estimator in the RT-method

Hill             $k^*_{RT}$           $k^*_{\mathrm{sim}}$   $\hat\gamma^{RT}_{k^*}$   $\mathrm{MSE}(\hat\gamma^{RT}_{k^*})$   $\mathrm{MSE}(\hat\gamma^{Sim}_{k^*})$
Cauchy           183 ($\beta = 0.3$)     153                    0.9771                    0.0351                                  0.0093
Burr(1,2,2)       78 ($\beta = 0.3$)      41                    0.2766                    0.0041                                  0.0035
Burr(1,4,1)      143 ($\beta = 0.3$)     111                    0.2497                    0.0022                                  0.0009
Burr(1,1/2,2)     78 ($\beta = 0.3$)      41                    1.1062                    0.0697                                  0.0558
Burr(1,1,1)      143 ($\beta = 0.3$)     111                    0.9999                    0.0353                                  0.0132

The RT-method does not take into account prior knowledge about the underlying distribution function, but it is still capable of producing better results than the DK-method. This applies, in particular, to both the Burr(1,2,2) and Burr(1,1/2,2) models, for which the parameter $\rho$ equals $-\tfrac12$. Although the Burr distribution is of Hall type, the performance of the DK-method seems to be inevitably affected by the misspecification of the parameter $\rho$. Further simulation results built on distributions possessing second-order parameter $\rho = -1$ corroborate the latter. As an example, Tables 2 and 3 also contain the results for the Burr(1,4,1) and Burr(1,1,1) models, which now lead to a preference for the DK-method.

Series of Smooth Moment estimates are dealt with via version (10) of the RT-method, using an adequate $\beta$ settled in accordance with the true value of the extreme value index of the underlying distribution and item (5) of Section 4. Simulation results are presented in Table 4. Unlike the Smooth Moment estimator, for which we have obtained the highest values of empirical MSE (in Table 4), the application of the RT-method does not seem to introduce a significant increment in the MSE of Hill's estimator: notice the proximity of the values of $\mathrm{MSE}(\hat\gamma^{RT}_{k^*})$ and $\mathrm{MSE}(\hat\gamma^{Sim}_{k^*})$, specifically for the Burr(1,2,2) and Burr(1,4,1) distributions. The smooth version of the Moment estimator is clearly affected by its embedding in the RT-method, in the sense that there is a substantial increase in the resultant MSE.

We come to an end with two practical applications: a first one considering the Norwegian fire portfolio data set in 1990, taken from Beirlant et al. (1996), used as well in Beirlant et al. (2001) to illustrate different methods of threshold selection and to give an idea of the typical problems with claim data modelling, and in Beirlant and Goegebeur (2003); the other application is built on Standard & Poor's 500 (S&P500) closing values from January 1980 up to Friday, October 16, 1987, already used in Matthys and Beirlant (2000).


Table 4
Optimal number of extremes by the use of the Smooth Moment estimator

Smooth Moment      $k^*_{RT}$            $k^*_{\mathrm{sim}}$   $\hat\gamma^{RT}_{k^*}$   $\mathrm{MSE}(\hat\gamma^{RT}_{k^*})$   $\mathrm{MSE}(\hat\gamma^{Sim}_{k^*})$
Cauchy             402 ($\beta = 0$)        299                     0.9704                    0.0733                                  0.0113
Burr(1,2,2)        669 ($\beta = 0$)        344                     0.1232                    0.0323                                  0.0082
Burr(1,1/2,2)      308 ($\beta = 0$)         97                     1.2706                    0.2607                                  0.0423
Weibull(1,1/2)     333 ($\beta = 0$)         35                     0.50566                   0.3679                                  0.0654
Normal(0,1)        189 ($\beta = 0$)        127                    −0.3155                    0.2527                                  0.0165
Uniform(0,1)       527 ($\beta = 0.45$)     605                    −1.0664                    0.0315                                  0.0072
Rev. Burr(1,4,1)   423 ($\beta = 0$)        999                    −0.3720                    0.0429                                  0.0061

[Fig. 1. Hill and Smooth Moment estimates plots using the Fire Claims data, with the selected values annotated: $k^*_{DK} = 100$ and $k^*_{RT} = 342$ for Hill, $k^*_{RT} = 626$ for Smooth Moment.]

Fig. 1 shows the Hill and Smooth Moment estimates plots for the sample of 628 fire claims. The plots seem reasonably flat for a wide range of values around 0.6. Here, the RT-method enables us to discard the inherent subjectivity of inferring the value of $\gamma$ from the stable region in the graph. Following the considerations made in Section 4, we applied the original form (10) of the RT-method, by the use of Hill's estimator and $\beta = 0.3$: the optimal number of upper extremes, $k^* = 342$, resulted in the point estimate $\hat\gamma_{k^*} = 0.6453$. As Hill's estimates process seems to have glided over the neighborhood of 0.6 before acquiring substantial bias, while remaining less than 1 for all $k$, the specification of $\beta$ proves to be rather easy. Notice that the DK-method for Hill's estimator has selected $k^* = 100$, with $\hat\gamma_{k^*} = 0.6544$. Using the Smooth Moment estimator and $\beta = 0$ in the RT-method, the value $k^* = 626$ determined the estimate $\hat\gamma_{k^*} = 0.5650$.


[Fig. 2. Hill plot, $\{(k, \hat\gamma^{H}_{k,n}),\ 1 < k < 500\}$, with respect to the S&P500 percentage falls, with the selected values annotated: $k^*_{DK} = 29$ and $k^*_{RT} = 104$.]

From now on we consider the S&P500 data. We base our inference on the absolute values of the 950 negative percentage changes (negative returns). A realistic model should reflect the stylized features of the return series: returns are not i.i.d. but weakly correlated, and are heavy-tailed. The tail index of the distribution in the origin of the data needs to be estimated by Hill's estimator because we are dealing with weak dependence and, therefore, our purposes would not be served by the use of the Smooth Moment estimator. Fig. 2 corresponds to the plot of Hill's estimates for the 500 most extreme percentage falls. The RT-method in its original version (10), with $\beta = 0.3$, produced the optimal number of upper extremes $k^* = 104$ which, in turn, was responsible for the point estimate $\hat\gamma_{k^*} = 0.2729$. By the use of the DK-method we obtained $k^* = 29$ and the point estimate $\hat\gamma_{k^*} = 0.24791$.

Finally, we wish to stress that the cases studied here allowed us to foresee that such a heuristic methodology must coexist with the estimates plot, and should never be dissociated from the traditional procedures of monitoring the most accurate portions of the plot. The improvement offered by this study rests on the recommendation of a particular point value for the parameter of interest, $\gamma$, resulting from a computational algorithm that is merely a convenient weighted distance, avoiding in a certain sense an inherently subjective choice of the estimated value for subsequent statistical inferences. Here, we have prescribed particular weights to be taken into account in practice.


References

Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G., 1999. Tail index estimation and an exponential regression model. Extremes 2 (2), 177–200.
Beirlant, J., Goegebeur, Y., 2003. Regression with response distributions of Pareto-type. Comput. Statist. Data Anal. 42 (4), 595–619.
Beirlant, J., Matthys, G., Dierckx, G., 2001. Heavy-tailed distributions and rating. Astin Bull. 31 (1), 37–58.
Beirlant, J., Teugels, J.L., Vynckier, P., 1996. Practical Analysis of Extreme Values. Leuven University Press.
Danielsson, J., de Haan, L., Peng, L., de Vries, C.G., 1997. Using a bootstrap method to choose the sample fraction in tail index estimation. Working Paper, Erasmus University Rotterdam.
Davis, R., Resnick, S., 1984. Tail estimates motivated by extreme-value theory. Ann. Statist. 12, 1467–1487.
Dekkers, A.L.M., de Haan, L., 1989. On the estimation of the extreme-value index and large quantile estimation. Ann. Statist. 17, 1795–1832.
Dekkers, A.L.M., de Haan, L., 1993. Optimal choice of sample fraction in extreme-value estimation. J. Multivariate Anal. 47, 173–195.
Dekkers, A.L.M., Einmahl, J., de Haan, L., 1989. A moment estimator for the index of an extreme-value distribution. Ann. Statist. 17, 1833–1855.
Draisma, G., de Haan, L., Peng, L., Pereira, T.T., 1999. A bootstrap method to achieve optimality in estimating the extreme-value index. Extremes 2 (4), 367–404.
Drees, H., Kaufmann, E., 1998. Selecting the optimal sample fraction in univariate extreme value estimation. Stochastic Process. Appl. 75, 149–172.
Drees, H., de Haan, L., Resnick, S., 2000. How to make a Hill plot. Ann. Statist. 28, 254–274.
Fraga Alves, M.I., 2001. A location invariant Hill-type estimator. Extremes 4 (3), 199–217.
Goldie, C., Smith, R., 1987. Slow variation with remainder: theory and applications. Quart. J. Math. 38, 45–71.
Gomes, M.I., Oliveira, O., 2001. The bootstrap methodology in statistics of extremes-choice of the optimal sample fraction. Extremes 4 (4), 331–358.
de Haan, L., 1984. Slow variation and characterization of domains of attraction. In: Tiago de Oliveira, J. (Ed.), Statistical Extremes and Applications. Reidel Publishing, Dordrecht, pp. 31–48.
de Haan, L., Peng, L., 1998. Comparison of tail index estimators. Statist. Neerlandica 52, 60–70.
de Haan, L., Stadtmüller, U., 1996. Generalized regular variation of second order. J. Austral. Math. Soc. A 61, 381–395.
Haeusler, E., Teugels, J.L., 1985. On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Statist. 13, 743–756.
Hall, P., 1982. On some simple estimates of an exponent of regular variation. J. Roy. Statist. Soc. Ser. B 44, 37–42.
Hill, B.M., 1975. A simple general approach to inference about the tail of a distribution. Ann. Statist. 3, 1163–1174.
Mason, D., 1982. Laws of large numbers for sums of extreme values. Ann. Probab. 10, 754–764.
Matthys, G., Beirlant, J., 2000. Adaptive threshold selection in tail index estimation. In: Embrechts, P. (Ed.), Extremes and Integrated Risk Management. Risk Books, London, pp. 37–49.
Neves, C., Fraga Alves, M.I., 2003. Automatic choice of the number of extremes. Cadernos de Matemática CM03, Universidade de Aveiro, pp. 1–30.
Pickands III, J., 1975. Statistical inference using extreme order statistics. Ann. Statist. 3, 119–131.
Reiss, R.-D., 1989. Approximate Distributions of Order Statistics, Springer Series in Statistics. Springer, New York.
Reiss, R.-D., Thomas, M., 1997. Statistical Analysis of Extreme Values, with Applications to Insurance, Finance, Hydrology and Other Fields. Birkhäuser, Basel.
Resnick, S.I., Stărică, C., 1997. Smoothing the Hill estimator. Adv. Appl. Probab. 29, 271–293.
Resnick, S.I., Stărică, C., 1998. Smoothing the Moment estimator of the extreme value parameter. Extremes 1 (3), 263–293.