103
logo Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow, Russia February, 13-18, 2006, Bamberg, Germany June, 19-23, 2006, Brest, France May, 14-19, 2007, Trondheim, Norway PhD course Natalia Markovich Analysis methods of heavy-tailed data

Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

Embed Size (px)

Citation preview

Page 1: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Analysis methods of heavy-tailed data

Natalia Markovich

Institute of Control SciencesRussian Academy of Sciences, Moscow, Russia

February, 13-18, 2006, Bamberg, GermanyJune, 19-23, 2006, Brest, France

May, 14-19, 2007, Trondheim, NorwayPhD course

Natalia Markovich Analysis methods of heavy-tailed data

Page 2: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Chapter 1

Introduction:

definitions and basic properties of classes of heavy-taileddistributions. Tail index estimation. Methods for the selection ofthe number of the largest order statistics in Hill estimator.Rough methods for the detection of heavy tails and the numberof finite moments.

The Section 1 contains the introduction

with necessary definitions, basic properties and examples ofheavy-tailed data. The tail index indicates the shape of the tailand therefore it is the basic characteristic of heavy-tailed data.Methods of tail index estimation are presented.Finally, several rough tools for the detection of heavy-tails, thenumber of finite moments and dependence are considered.

Natalia Markovich Analysis methods of heavy-tailed data

Page 3: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Main assumption.

Let X1, . . . ,Xn be a sample of i.i.d. r.v.s distributed with theheavy-tailed DF F (x).

Definition

A DF F (x) (or the r.v. X ) is called heavy-tailed if its tailF (x) = 1 − F (x) > 0, x ≥ 0, satisfies ∀y ≥ 0

limx→∞

P{X > x + y |X > x} = limx→∞

F (x + y)/F (x) = 1.

Comparison of probability tails.

Natalia Markovich Analysis methods of heavy-tailed data

Page 4: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Examples of heavy- and light-tailed distributions

Heavy-tailed Subexponential:distributions: Pareto, Lognormal, Weibull

with shape parameter less than 1.With regularly varyingtails: Pareto, Cauchy, Burr, Frechet,Zipf-Mandelbrot law.

Light-tailed exponential, gamma, Weibulldistributions with shape parameter more than 1,

normal, finite distributions.

Natalia Markovich Analysis methods of heavy-tailed data

Page 5: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Heavy-tailed distributions have been accepted asrealistic models for various phenomena:

WWW-session characteristicssizes and durations of sub-sessions; sizes of responsesinter-response time intervals

on/off-periods of packet traffic

file sizes

service-time in queueing model

flood levels of rivers

major insurance claims

extreme levels of ozon concentrations

high wind-speed values

wave heights during a storm

low and high temperatures

Natalia Markovich Analysis methods of heavy-tailed data

Page 6: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Basic definitions

Let X n = {X1, . . . ,Xn} be a sample of i.i.d. r.v. distributed withthe DF F (x) and Mn = max(X1,X2, . . . ,Xn).Extreme value theory assumes that for a suitable choice ofnormalizing constants an,bn it holds

P{(Mn − bn)/an ≤ x} = F n(bn + anx) →n→∞ Hγ(x), x ∈ R,

and an Extreme Value DF Hγ(x) is of the following type:

Hγ(x) =

exp(−x−1/γ), x > 0, γ > 0 ’Frechet’,exp(−(−x)−1/γ), x < 0, γ < 0 ’Weibull’,exp(−e−x), γ = 0, x ∈ R ’Gumbel’.

Definition

The parameter γ is called the extreme value index (EVI) anddefines the shape of the tail of the r.v. X .The parameter α = 1/γ is called tail index.

Natalia Markovich Analysis methods of heavy-tailed data

Page 7: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Pickands’s theorem:

The limit distribution of the excess distribution of the Xi ’s is nec-essarily of the Generalized Pareto form

limu↑xF ,u+x<xF

P (X1 − u > x |X1 > u) → (1 + γx)−1/γ+ , x ∈ R,

wherexF = sup{x ∈ R : F (x) < 1}

is the right endpoint of the distribution F and the shape parameterγ ∈ R.

Therefore, the Generalized Pareto distribution (GPD) with DF

Ψσ,γ(x) = 1 − (1 + γx/σ)−1/γ+

is often used as a model of the tail of the distribution.

Natalia Markovich Analysis methods of heavy-tailed data

Page 8: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

The classes of heavy-tailed distributions

distributions with regularly varying tails ( RVT )(X ∈ R−1/γ)

P{X > x} = x−1/γℓ(x),∀x > 0, γ > 0,

where ℓ is a slowly varying function, i.e.

limx→∞

ℓ(tx)/ℓ(x) = 1, ∀t > 0.

Examples:

Pareto, Burr, Frechet distributions belong to RVT .

Examples of ℓ(x) give

c ln x , c ln(ln x) and all functions converging to positiveconstants.

Natalia Markovich Analysis methods of heavy-tailed data

Page 9: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

The classes of heavy-tailed distributions

subexponential distributions ( S)(X ∈ S)

P{Sn > x} ∼ nP{X1 > x} ∼ P{Mn > x} as x → ∞,

where Sn = X1 + ...Xn, n ≥ 2, Mn = maxi=1,...,n{Xi}.

Example:

Weibull with shape parameter less than 1 belongs to S.

Intuitively, subexponentiality means

that the only way the sum can be large is by one of thesummands getting large (in contrast, in the light-tailed case allsummands are large if the sum is so).

Natalia Markovich Analysis methods of heavy-tailed data

Page 10: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Basic properties of regularly varying distributions:

Lemma

Let X ∈ R−α. Then,

(i) X ∈ S.

(ii) E{Xβ} <∞ if β < α, E{Xβ} = ∞ if β ≥ α.

(iii) If α > 1, then X r ∈ R1−α and

P{X r > x} ∼ ℓ(x)x1−α/((α− 1)E{X}) as x → ∞.

(iv) If Y is non-negative and independent of X such thatP{Y > x} = ℓ2(x)x−α2 , then X + Y ∈ R−min(α,α2) andP{X + Y > x} ∼ P{X > x} + P{Y > x} as x → ∞.

(v) If Y is non-negative and independent of X such thatE{Y α+ε} <∞ for some ε > 0 then XY ∈ R−α and

P{XY > x} ∼ E{Y α}P{X > x} as x → ∞.

Natalia Markovich Analysis methods of heavy-tailed data

Page 11: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Important property for the rough detection of heavytails and the number of finite moments:

Let X ∈ R−α.

Then E{Xβ} <∞, if β < 1/γ; E{Xβ} = ∞, if β > 1/γ.

Examples:

If α = 2, γ = 0.5, thenEX1 <∞ (the first moment is finite),EX 2

1 = ∞ (the second moment is infinite or does not exist).

If α = 0.5, γ = 2, thenEX1 = ∞, EX 2

1 = ∞... (all moments are infinite or do notexist).

Natalia Markovich Analysis methods of heavy-tailed data

Page 12: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Estimators of tail index

X(1) ≤ X(2) ≤ ... ≤ X(n)

are order statistics of the sample X n = {X1,X2, ...,Xn}For γ > 0:

Hill’s estimator

γH(n, k) =1k

k∑

i=1

ln X(n−i+1) − ln X(n−k)

Ratio estimator Goldie, Smith, (1987)

an = an(xn) =n∑

i=1

ln(Xi/xn)I{Xi > xn}/n∑

i=1

I{Xi > xn},

xn is am arbitrary threshold level, e.g., xn = X(n−k)

Natalia Markovich Analysis methods of heavy-tailed data

Page 13: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Estimators of tail index

For γ ∈ R:

”Moment estimator”, Dekkers, Einmahl, de Haan, (1989):

γM(n, k) = γH(n, k) + 1 − 0.5(

1 − (γH(n, k))2/Sn,k ))−1

,

Sn,k = (1/k)∑k

i=1

(ln X(n−i+1) − ln X(n−k)

)2

” UH estimator”, Berlinet, (1998):

γUH(n, k) = (1/k)

k∑

i=1

ln UHi−ln UHk+1,UHi = X(n−i)γH(n, i)

Pickands’s estimator

γP(n, k) = 1/ ln 2 ln(X(k) − X(2k)

)/(X(2k) − X(4k)

)

No recursiveness!Natalia Markovich Analysis methods of heavy-tailed data

Page 14: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Group estimator, Davydov, Paulauskas, Rackauskas,(2000):

The sample X n is divided into l groups V1, ...,Vl , each groupcontaining m r.v.s, i.e. n = l · m.

Estimator of the function of the tail index:

zl = (1/l)∑l

i=1 kli = αα+1 = 1

1+γ ,

where

kli = M(2)li /M(1)

li , M(1)li = max{Xj : Xj ∈ Vi}

and M(2)li is the second largest element in the same group Vi .

Natalia Markovich Analysis methods of heavy-tailed data

Page 15: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Group estimator, Davydov, Paulauskas, Rackauskas,(2000).Theoretical background.

1 For distributions with regularly varying tails

1 − F (x) = x−αℓ(x),

and l = m = [√

n], it is proved

zl →a.s. α

α+ 1=

11 + γ

.

2 For distributions

1 − F (x) = C1x−α + C2x−β + o(x−β),

with β = 2α it holds

l(l−1l∑

i=1

kli−α(1+α)−1)

(l∑

i=1

k2li − l−1

l∑

i=1

kli

)−1/2

→p N(0,1).

Natalia Markovich Analysis methods of heavy-tailed data

Page 16: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Recursiveness of the estimate zl . On-line estimation.

Having the next group of observations Vl+1 it follows

γl+1 =

(l

l + 1· 1

1 + γl+

kl+1l+1

l + 1

)−1

− 1

After getting i additional groups with m elements eachVl+1, ...,Vl+i

γl+i = (l + i)(

l1 + γl

+ kl+1l+1+ ...+ kl+il+i

)−1

− 1,

i.e.

γl+i is obtained using γl by O(1) calculations.

Natalia Markovich Analysis methods of heavy-tailed data

Page 17: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Recursiveness of the estimate zl . On-line estimation.

Sincezl+i = 1/(1 + γl+i)

it holds

zl+i =

lzl +

i∑

j=1

kl+jl+j

/(l + i),

bias(zl+i) = bias(zl), var(zl+i) = var(zl)l/(l + i),

var(zl+i) < var(zl) for ∀i > 0

Natalia Markovich Analysis methods of heavy-tailed data

Page 18: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

The selection of the number of the largest orderstatistics in Hill estimator

A Hill plot {(k , γH(n, k)),1 ≤ k ≤ n − 1}):the estimate of γH(n, k) is chosen from an interval in whichthis function demonstrate stability.Plot of the mean excess function{(u,e(u)) : X(1) < u < X(n)}, where

e(u) =n∑

i=1

(Xi − u)1{Xi > u}/n∑

i=1

1{Xi > u}

is the sample mean excess function over the threshold u.If this plot follows a reasonably straight line above a certainvalue of u, then this indicates that excesses over u followeP(u) = (1 + γu)/(1 − γ) of generalized Pareto distributionwith positive shape parameter. One can take the nearestorder statistics to u as the estimate of k .

Natalia Markovich Analysis methods of heavy-tailed data

Page 19: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

The sensitivity of the Hill estimate to k .

0 500 10004

2

0

2

4

6

8

10

12

14

16

k

gam

ma(k

)

0 500 10004

2

0

2

4

6

8

10

k

gam

ma(k

)

0 500 10002

1.5

1

0.5

0

0.5

1

1.5

k

gam

ma(k

)

The Hill’s estimate against k for 15 samples of Weibull(left), Pareto (middle) and Frech et (right) distributions, allwith shape parameter α = 1/γ = 0.5. Sample size isn = 1000.

Natalia Markovich Analysis methods of heavy-tailed data

Page 20: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Plot of the mean excess function.

Left: Weibull distribution. Right: Pareto distribution. Theshape parameter is 0.5, sample size 1000.

Natalia Markovich Analysis methods of heavy-tailed data

Page 21: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Bootstrap method for k selection.

Minimizing of the empirical bootstrap estimate of the meansquared error of γ:

MSE(γ) = E (γ − γ)2 → mink.

The bootstrap estimate is obtained by drawing B sampleswith replacement from the original data set X n.

Smaller re-samples {X ∗1 , ...,X∗n1} of the size

n1 = nβ , 0 < β < 1,are used.

The corresponding smaller k1 and an optimal k are relatedby:

k = k1(n/n1)α, 0 < α < 1,

where β = 1/2 and α = 2/3, P.Hall, (1990) .

Natalia Markovich Analysis methods of heavy-tailed data

Page 22: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Empirical bootstrap estimate of the MSE(γ) is

MSE∗(n1, k1) =(

b∗(n1, k1))2

+ var∗(n1, k1) → mink1

,

where

b∗(n1, k1) =1B

B∑

b=1

γ∗b(n1, k1) − γ(n, k),

var∗(n1, k1) =1

B − 1

B∑

b=1

(γ∗b(n1, k1) −

1B

B∑

b=1

γ∗b(n1, k1)

)2

are the empirical bootstrap estimates of the bias and thevariance,

γ∗b is the Hill’s estimate constructed by some re-sample of thesize n1 with the parameter k1.

Natalia Markovich Analysis methods of heavy-tailed data

Page 23: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Double bootstrap method for k selection.

Danielsson, de Haan, Peng and de Vries, (1997) requiresless parameters than bootstrap method, Hall, (1990):

n1 and B are required, α is not required.

Auxiliary statistic:

MSE(zn,k ) = E(

zn,k − z⋆n,k

)2,

wherezn,k = Mn,k − 2(γH(n, k))2,

Mn,k =1k

k∑

j=1

(log X(n−j+1) − log X(n−k)

)2,

z⋆n,k is a bootstrap estimate of zn,k .

Natalia Markovich Analysis methods of heavy-tailed data

Page 24: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Double bootstrap method for k selection.

Mn,k/2γH(n, k) and γH(n, k) are consistent estimates for γ,

=⇒ zn,k → 0 as n → ∞.

=⇒ AMSE(zn,k ) = E(zn,k )2 → mink.

Natalia Markovich Analysis methods of heavy-tailed data

Page 25: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Double bootstrap procedure is

Draw B bootstrap sub-samples of the size n1 ∈ (√

n,n)(e.g., n1 ∼ n3/4) from the original sample and determinethe value k⋆

n1that minimizes MSE of zn1,k .

Repeat this for B sub-samples of the size n2 = [n21/n] ([x ]

is the integer part of the number) and determine the valuek⋆

n2that minimizes MSE of zn2,k .

Calculate koptn from the formula

koptn = [

(k⋆n1

)2

k⋆n2

(1 − 1

ρ1

) 22ρ1−1

], ρ1 =log k⋆

n1

2 log(k⋆n1/n1)

,

and estimate γ by the Hill’s estimate with koptn .

The method is robust with respect to the choice of n1, Gomes,Oliveira, (2000).

Natalia Markovich Analysis methods of heavy-tailed data

Page 26: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Sequential procedure for k selection

is based on the theoretical result:

√i(γH(n, i) − γ) ∼ (log log n)1/2, 2 ≤ i ≤ k

in probability, Drees & Kaufmann, (1998) .

Natalia Markovich Analysis methods of heavy-tailed data

Page 27: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Sequential procedure for k selection. Algorithm.

An initial estimate γ0 = γH(n,2√

n) for the parameter γ isobtained by the Hill’s estimate.For rn = 2.5γ0n0.25 we compute

k(rn) = min{k ∈ 1, . . . ,n − 1| max2≤i≤k

√i(γH(n, i)−γH(n, k)) > rn}.

If rn is too large and max2≤i≤k√

i(γH(n, i) − γH(n, k)) > rn

is not satisfied it is recommended repeatedly replace rn by0.9rn until k(rn) i well defined.Similarly, k(r ε

n) for ε = 0.7 is computed.Optimal k

kopt =13

(k(r ε

n)

(k(rn))ε

)1/(1−ε)

(2γ0)1/3

is calculated and γ is estimated by γH(n, kopt).

The method is sensitive to the choice of rn.Natalia Markovich Analysis methods of heavy-tailed data

Page 28: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Plot method for m selection in the group estimator.

The plot {(m,1/zm − 1)} for Pareto distribution with γ = 1,the true γ is shown by dotted line. Sample sizes aren = {150,500,1000}.

Natalia Markovich Analysis methods of heavy-tailed data

Page 29: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Plot method for m selection in the group estimator.

Plot: {(m, zm),m0 < m < M0}, m0 > 2, M0 < n/2m = n/l , zm = (m/n)

∑[n/m]i=1 k(n/m)i

From consistency result zl →a.s. 11+γ it follows that

there must be an interval [m−,m+]

such that zm ≈ α/(1 + α) = (1 + γ)−1 for all m ∈ [m−,m+].

We suggest choosing the average value

z = mean{1/zm − 1 : m ∈ [m−,m+]}

and m∗ ∈ [m−,m+] as a point such that zm∗ = z.

Natalia Markovich Analysis methods of heavy-tailed data

Page 30: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Bootstrap method for m automatical selection.

Minimizing of the empirical bootstrap estimate of the meansquared error of (1 + γl)

−1, l = n/m:

MSE(γl) = E

(1l

l∑

i=1

kli −1

1 + γ

)2

→ minm.

The bootstrap estimate is obtained by drawing B sampleswith replacement from the original data set X n.

Smaller re-samples {X ∗1 , ...,X∗n1} of the size

n1 = nd , 0 < d < 1,are used.

The re-sample is divided into l1 subgroups.

The size of subgroups m1 and m are related by:m = m1(n/n1)

c , 0 < c < 1,where m1 is the size of subgroups in re-samples.

Natalia Markovich Analysis methods of heavy-tailed data

Page 31: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Empirical bootstrap estimate of the MSE

MSE∗(l1,m1) =(

b∗(l1,m1))2

+ var∗(l1,m1) → minm1,

where

b∗(l1,m1) =1B

B∑

b=1

zbl1 − zl ,

var∗(l1,m1) =1

B − 1

B∑

b=1

(zb

l1 −1B

B∑

b=1

zbl1

)2

are the empirical bootstrap estimates of the bias and thevariance,zb

l1= 1

l1

∑l1i=1 kl1 i

is the group estimator constructed by somere-sample.

How to select c and d?

Natalia Markovich Analysis methods of heavy-tailed data

Page 32: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Simulation study: the selection of c and d .

Asymptotic theory (P.Hall, (1990)) recommends

d = 1/2 and c = 2/3

for the bootstrap estimation of the parameter k of the Hill’sestimate of γ.

We check c = {0.05,0.1(0.1); 0.5} for a fixed d = 0.5.

Relative bias and the square root of the mean squared error:

Biasγ =1γ

1

NR

NR∑

i=1

γi − γ

,

RMSEγ =1γ

√√√√ 1NR

NR∑

i=1

(γi − γ)2

Natalia Markovich Analysis methods of heavy-tailed data

Page 33: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Conclusions:

the best values of c for the fixed d = 0.5 arec = {0.3 ÷ 0.5};

the bias of the group estimator is larger for Weibulldistribution.

Further research:

proof of theoretically best values of c and d for the groupestimator using the bootstrap.

Natalia Markovich Analysis methods of heavy-tailed data

Page 34: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Rough methods for the detection of heavy tails and thenumber of finite moments.

1. Ratio of the maximum to the sumLet X1,X2, ...,Xn be i.i.d. r.v.s. We define the followingstatistic

Rn(p) =Mn(p)

Sn(p), n ≥ 1, p > 0, where

Sn(p) = |X1|p + . . . |Xn|p,Mn(p) = max

(|X1|p, . . . , |Xn|p

), n ≥ 1

to check the moment conditions of the data.

For different values of p the plot of n → Rn(p) gives

a preliminary information about the distribution P{|X | > x}.Then E|X |p <∞ follows, if Rn(p) is small for large n. For largen a significant difference between Rn(p) and zero indicates thatthe moment E|X |p is infinite.

Natalia Markovich Analysis methods of heavy-tailed data

Page 35: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Rough methods for the detection of heavy tails and thenumber of finite moments.

2. QQ-plot A QQ-plot (or ”quantiles against quantiles”-plot)draws the dependence

{(

X(k),F←(

n−k+1n+1

)): k = 1, ...,n}, where

X(1) ≥ ... ≥ X(n) are the order statistics of the sample, andF← is an inverse function of the DF F .

Often, the QQ-plot is built as a dependence of exponentialquantiles

against the order statistics of the underlying sample. Then F←

is an inverse function of the exponential DF. Then a linearQQ-plot corresponds to the exponential distribution.

Natalia Markovich Analysis methods of heavy-tailed data

Page 36: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Rough methods for the detection of heavy tails and thenumber of finite moments.

3. Plot of the mean excess function

en(u) =

n∑

i=1

(Xi − u)1{Xi > u}/n∑

i=1

1{Xi > u}

1 Heavy-tailed distributions: e(u) → ∞, u → ∞.2 A Pareto distribution: a linear e(u).3 An exponential distribution: the constant e(u) = 1/λ.4 Light-tailed distributions: e(u) → 0, u → ∞.

Natalia Markovich Analysis methods of heavy-tailed data

Page 37: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Rough methods for the detection of heavy tails and thenumber of finite moments.

Plots of function e(u) forseveral distributions.

Natalia Markovich Analysis methods of heavy-tailed data

Page 38: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Rough methods for the detection of heavy tails and thenumber of finite moments.

4. Hill’s and other estimators of the tail index

The Hill estimate works bad if1 the underlying DF does not have a regularly varying tail,2 the tail index α = 1/γ is not positive,3 the sample size is not large enough,4 the tail is not heavy enough, i.e. γ is not big,5 F ⊆ Rα, since it strongly depends on the slowly varying

function ℓ(x) that is usually unknown.

The disadvantages of Hill’s estimate show

that one has to apply several estimates of the tail index to dealwith the complex analysis of data.

Natalia Markovich Analysis methods of heavy-tailed data

Page 39: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Definition of the independence.

Definition

Random variables ξ1, ξ2, ..., ξn are independent if for anyx1, x2, ..., xn it holds

P{ξ1 ≤ x1, ξ2 ≤ x2 , ..., ξn ≤ xn}= P{ξ1 ≤ x1}P{ξ2 ≤ x2}...P{ξn ≤ xn}

or in terms of cumulative distribution functions

F (x1, x2, ..., xn) = F1(x1)F2(x2)...Fn(xn),

where Fk (xk ) is a distribution function of the r.v. ξk .

Natalia Markovich Analysis methods of heavy-tailed data

Page 40: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Rough methods for the dependence detection.

Correlation between two r.v.s X1 and X2 is defined by

ρ(X1,X2) =cov(X1,X2)√

var(X1)var(X2),

wherecov(X1,X2) = E(X1X2) − E(X1)E(X2)

is a covariance, and var(x) is a variance.

1 If X1 and X2 are independent then ρ(X1,X2) = 0.2 If ρ(X1,X2) = 0 then not necessary X1 and X2 are independent!3 If Gaussian r.v.s X1 and X2 are not correlated then they are

independent.4 For non-Gaussian r.v.s it may be not true.

Natalia Markovich Analysis methods of heavy-tailed data

Page 41: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Rough methods for the dependence detection.

Generally, the covariances and correlations cannot indicate thedependence.

1 They describe the degree to which two r.v.s. are linearlydependent:

ρ(X1,X2) ∈ [−1,1].

2 ρ(X1,X2) = ±1 ⇔ X1 and X2 are perfectly linearly dependent,i.e. X2 = α+ βX1, for some α ∈ R and β 6= 0.

What tool could be an appropriate to indicate thedependence?

Natalia Markovich Analysis methods of heavy-tailed data

Page 42: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Exact methods for the dependence detection.

The mixing conditions of a stationary sequence should beconsidered for dependence detection.

Definition

(Rosenblatt (1956b)) The strictly stationary ergodic sequenceof random vectors Xt is strongly mixing with rate function φk forσ-field, if

supA∈σ(Xt ,t≤0),B∈σ(Xt ,t>k)

|P (A ∩ B)−P (A) P (B) | = φk → 0, k → ∞.

The rate φk shows how fast the dependence between the pastand the future decreases.

Natalia Markovich Analysis methods of heavy-tailed data

Page 43: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Exact methods for the dependence detection.

Another measure of dependence is given by the dependenceindex:

βn = supx ,y

n∑

j=1

|fj(x , y) − f (x)f (y)|,

where f (x) is a marginal probability density function (PDF) of astationary sequence {Xj , j = 1,2, ...},fj(x , y) is a joint PDF of X1 and X1+j , j = 1,2, ....

1 for i.i.d. sequences βn = 0 for all n,2 for sequences with strong long range dependence βn may tend

to infinity, and3 in between βn may converge to a finite limit at various rates.

It is difficult to estimate such dependence measures bystatistical tools.

Natalia Markovich Analysis methods of heavy-tailed data

Page 44: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Rough methods for the dependence detection.

The autocorrelation function ( ACF ) is

ρX (h) = ρ(Xt ,Xt+h) = E ((Xt − E(Xt))(Xt+h − E(Xt+h))) /Var (Xt)

Let {Xt , t = 0,±1,±2, ...} be a stationary sample series. Thestandard sample ACF at lag h ∈ Z is

ρn,X (h) =

∑n−ht=1 (Xt − X n)(Xt+h − X n)∑n

t=1(Xt − X n)2,

where X n = 1n

∑nt=1 Xt represents the sample mean.

The accuracy of ρn,X (h) may be poor if the sample size n issmall or h is large. The relevance of ρn,X (h) is determined by itsrate of convergence to the real ACF . When the distribution ofthe Xt ’s is very heavy-tailed (in the sense that EX 4

t = ∞), thisrate can be extremely slow.

Natalia Markovich Analysis methods of heavy-tailed data

Page 45: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

The autocorrelation function for heavy-tailed data.

For heavy-tailed data with infinite variance it is better to use themodified sample ACF :

ρn,X (h) =

∑n−ht=1 XtXt+h∑n

t=1 X 2t

,

i.e. without X n.

However, this estimate may behave in a very unpredictable wayif one uses the class of non-linear processes in the sense thatthis sample ACF may converge in distribution to anon-degenerate r.v. depending on h.For linear models it converges in distribution to a constantdepending on h, Davis and Resnick (1985).

Natalia Markovich Analysis methods of heavy-tailed data

Page 46: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Confidence intervals of the sample ACF: linearprocesses.

The causal ARMA process (autoregressive moving average)

Xt has the representation

Xt =∞∑

j=0

ψjZt−j , t = 0,±1, ..., (1)

where Zt is an i.i.d. noise sequence and {ψj} is a sequence ofreal numbers providing the convergence of random series in(1), i.e.,

∑∞j=0 |ψj | <∞, Brockwell and Davis (1991).

The model ARMA(p,q) has the formXt =

∑pj=0 θjXt−j +

∑qj=0 ψjZt−j , t = 1, ...,n.

The model MA(q) has the form Xt =∑q

j=0 ψjZt−j .

Natalia Markovich Analysis methods of heavy-tailed data

Page 47: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Confidence intervals of the sample ACF: linearprocesses.

Bartlett’s formula.

If the process is linear (1),∑∞

j=−∞ |ψj | <∞ and EZ 4t <∞, then

standard sample ACF ρn,X (i) has the asymptotic joint normaldistribution with mean ρX (i) (i.e. ACF) and variancevar

(ρn,X (i)

)= cii/n,

cii =∞∑

k=−∞

[ρ2X (k + i) + ρX (k − i)ρX (k + i) + 2ρ2

X (i)ρ2X (k)

− 4ρX (i)ρX (k)ρX (k + i)], (2)

as n → ∞.The Bartlett’s formula (2) allows to check the hypothesisρn,X (i) = 0.

Natalia Markovich Analysis methods of heavy-tailed data

Page 48: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Confidence intervals of the sample ACF.

Example

For i.i.d. white noise Zt we have ρZ (0) = 1 and ρZ (i) = 0 fori 6= 0 (since Zt and Zt+i are independent) andvar

(ρn,Z (i)

)= 1/n by (2).

For ARMA process driven by such a white noise Zt

1 the sample ACF is approximately normally distributed withmean 0 and variance 1/n for sufficiently large n.

2 It provides 95% confidence interval with the bounds ±1.96/√

nfor the sample ACF.

3 The hypothesis ρn,X (i) = 0 is accepted if ρn,X (i) falls within thisinterval.

What would happened if1 the noise Zt is not normal, and (or)2 the process is not linear?

Natalia Markovich Analysis methods of heavy-tailed data

Page 49: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Confidence intervals of the heavy-tailed sample ACFρn,X (h): linear processes.

ARMA process with i.i.d. regularly varying noise and tail index0 < α < 2 (infinite variance).

ρn,X (h) estimates the quantity∑

j ψjψj+h/∑

j ψ2j that represents

the autocorrelation cov(X0,Xh) in the case of a finite variance.

Illusion: heavy-tailed sample ACF ρn,X (h) can be applied toheavy-tailed processes without problem.

Recommendation (Resnick (2006), p.349):

For α < 1 use ρn,X (h);

for 1 < α < 2 the classical sample ACF ρn,X (h).

The calculation of confidence intervals in both cases is noteasy.

Natalia Markovich Analysis methods of heavy-tailed data

Page 50: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Confidence intervals of the heavy-tailed sample ACFρn,X (h): nonlinear processes.

GARCH(p,q) process (Mikosch (2002)):

Xt = µ+ σtZt , t ∈ Z ,

σ2t = α0 +

p∑

i=1

αiX2t−i +

q∑

j=1

βjσ2t−j ,

(Zt) is an i.i.d. sequence,

σt and Zt are independent for a fixed t ,

the mean µ is estimated from the data, particularly µ = 0,

αi and βj are non-negative constants.

If the marginal distribution of the time series is veryheavy-tailed, namely, the fourth moment is infinite, then theasymptotic normal confidence bounds for the sample ACF arenot applicable anymore.

Natalia Markovich Analysis methods of heavy-tailed data

Page 51: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of long-range dependence (LRD).

Definition

A stationary process (Xt) is long range dependent if∑∞h=0 |ρX (h)| = ∞, where ρX (h) is the ACF at lag h ∈ Z , and

short range dependent otherwise.

This property implies that even though ρX (h)’s areindividually small for large lags, their cumulative effect isimportant.

To detect LRD by statistical procedures one replaces ρX (h)by the sample ACF ρn,X (h).

The LRD effect is typical for long time series, e.g., severalthousand points. One can look at lags 250,300,350, etc.

Usual short range dependent data sets would show asample ACF dying after only a few lags and then persistingwithin the 95% Gaussian confidence window ±1.96/

√n.

Natalia Markovich Analysis methods of heavy-tailed data

Page 52: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of long-range dependence: Hurst parameter.

One can assume that for some constant cρ > 0

ρX (h) ∼ cρh2(H−1) for large h and H ∈ (0.5,1),

(LRD case).

The constant H ∈ (0.5,1) is called the Hurst parameter.

The closer H is to 1 the slower is the rate of ρX (h) to zero ash → ∞, i.e., the longer is the range of dependence in the timeseries.

Kettani and Gubner (2002):

Hn = 0.5(1 + log2(1 + ρn,X (1))

).

Natalia Markovich Analysis methods of heavy-tailed data

Page 53: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Dependence detection by bivariate data

Measures

Rank coefficient Kendall’s τ can be estimated by the sample:

ρτ =2Sτ

n(n − 1), where Sτ =

n∑

i=1

n∑

j=i+1

sign(rj − ri

),

where ri is the order number of the individuum by the secondproperty with the number i by the first property.

Rank coefficient Spearman’s ρ can be estimated by the sample:

ρS = 1 − 6Sρ

n3 − n, where Sρ =

n∑

i=1

(ri − i)2

Pickands dependence function A(t).

ρτ and ρS can be represented by means of A(t).Natalia Markovich Analysis methods of heavy-tailed data

Page 54: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Pickands dependence function

Let (X1,Y1), ..., (Xn,Yn) be a bivariate i.i.d. random sample

with a bivariate extreme value distribution G(x , y).

Example: X1 is a TCP-flow file size, Y1 is a duration of itstransmission.

Similarly to univariate case it implies, that there existnormalizing constants aj,n > 0 and bj,n ∈ R, j = 1,2 such thatas n → ∞,

P{(M1,n − b1,n

)/a1,n ≤ x ,

(M2,n − b2,n

)/a2,n ≤ y} (3)

= F n (a1,nx + b1,n,a2,ny + b2,n)→ G(x , y),

where M1,n = max (X1, ...,Xn), M2,n = max (Y1, ...,Yn) are thecomponent-wise maxima.The vector (M1,n,M2,n) will in general not be presence in theoriginal data.

Natalia Markovich Analysis methods of heavy-tailed data

Page 55: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Pickands dependence function.

Definition

Let F1(x) and F2(y) be the DFs of X and Y .

Let Gj(x), j = 1,2 be a univariate extreme value DF and Fj(x)is in its domain of attraction.

G(x , y) may be determined by margins G1(x) and G2(y) by therepresentation

G(x , y) = exp(

log (G1(x)G2(y)) A(

log (G2(y))

log (G1(x)G2(y))

)),

where A(t), t ∈ [0,1], is the Pickands dependence function.

Natalia Markovich Analysis methods of heavy-tailed data

Page 56: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Pickands dependence function: properties.

In the bivariate case the function A(t) satisfies two properties:

1 (1 − t) ∨ t ≤ A(t) ≤ 1, t ∈ [0,1], i.e. A(0) = A(1) = 1 andlies inside the triangle determined by points (0,1), (1,1)and (0.5,0.5);

2 A(t) is convex.

Case A(t) ≡ 1 corresponds to a total independence

and A(t) = (1 − t) ∨ t corresponds to a total dependence.

Natalia Markovich Analysis methods of heavy-tailed data

Page 57: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

A-estimators

Caperaa et al. (1997):

AHTn (t) =

((1/n)

n∑

i=1

min

(ξi/ξn

1 − t,ηi/ηn

t

))−1

,

Hall and Tajvidi (2000):

log ACn (t) =

1n

n∑

i=1

log max(

t ξi , (1 − t)ηi

)

−t1n

n∑

i=1

log ξi − (1 − t)1n

n∑

i=1

log ηi .

Here ξi = − log G1(Xi) and ηi = − log G2(Yi), i = 1, ...,n,ξn = n−1∑n

i=1 ξi , ηn = n−1∑ni=1 ηi .

Natalia Markovich Analysis methods of heavy-tailed data

Page 58: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Problems of A-estimators

1 The estimators are not convex. They may be improved bytaking a convex hull.

2 The margins G1(x) and G2(x) are unknown. One has toreplace them by their estimates G1(x) and G2(x), e.g., byempirical DFs constructed by component-wise maximaover blocks of data. The amount of these maxima may bevery moderate that may reflect on the accuracy of anempirical DF.

3 The component-wise maxima may be not observabletogether, i.e. there are such pairs of maxima which do notpresence in the sample.

Natalia Markovich Analysis methods of heavy-tailed data

Page 59: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Bivariate quantile curve

Both estimates of A(t) allow to get the estimate G(x , y) andconstruct bivariate quantile curves

Q(G,p) = {(x , y) : G(x , y) = p}, 0 < p < 1, (4)

G(x , y) ≈ exp

log

(G1(x)G2(y)

)A

log(

G2(y))

log(

G1(x)G2(y))

,

assuming that G1(x) = p(1−w)/A(w) and G2(x) = pw/A(w) forsome w ∈ [0,1] in order to get G(x , y) = p, Beirlant et al.(2004).The quantile curve consists of the points

Q(G,p) = {(

G−11 (p(1−w)/A(w)), G−1

2 (pw/A(w)))

: w ∈ [0,1]}.

Natalia Markovich Analysis methods of heavy-tailed data

Page 60: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Web-traffic analysis

Characteristics of sub-sessions:the size of a sub-session (s.s.s);

the duration of a sub-session (d.s.s.).

Characteristics of the transferred Web-pages:

the size of the response (s.r.);

the inter-response time (i.r.t.).

Natalia Markovich Analysis methods of heavy-tailed data

Page 61: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Description of the Web data

Main statistical characteristics of the Web-data.

s.s.s.(B) d.s.s.(sec) s.r.(B) i.r.t.(sec)Sample 373 373 7107 7107SizeMini 128 2 0 6.543 · 10−3

mumMaxi 5.884 · 107 9.058 · 104 2.052 · 107 5.676 · 104

mumMean 1.283 · 106 1.728 · 103 5.395 · 104 80.908StDev 4.079 · 106 5.206 · 103 4.931 · 105 728.266

Natalia Markovich Analysis methods of heavy-tailed data

Page 62: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Results of the Web traffic analysis.

Estimation of the tail index for Web-traffic characteristics.

r.v. c mb γbl γp

l γHn,k

s.s.s. 0.3 8 1.179 0.877 0.960.4 10 0.8560.5 22 0.902

s.r. 0.3 72 0.75 0.763 0.7630.4 71 0.870.5 92 0.85

i.r.t. 0.3 42 0.69 0.495 0.380.4 65 0.6250.5 156 0.611

d.s.s. 0.3 10 0.658 0.739 0.50.4 13 0.5390.5 18 0.683

Natalia Markovich Analysis methods of heavy-tailed data

Page 63: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Results of the Web traffic analysis.

Notations to the previous table:

γbl is the group estimator with the bootstrap selected

parameter m (mb);

γpl is the group estimator with the plot selected parameter

m;

γHn,k is the Hill’s estimator with the plot selected parameter

k .

Natalia Markovich Analysis methods of heavy-tailed data

Page 64: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

The EVI estimation by the Hill’s estimator and the groupestimator γl for the data sets size of sub-sessions (left) andduration of sub-sessions (right).

Natalia Markovich Analysis methods of heavy-tailed data

Page 65: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

The EVI estimation by the Hill’s estimator and the groupestimator γl for the data sets inter-response times (left) andsize of responses (right).

Natalia Markovich Analysis methods of heavy-tailed data

Page 66: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Conclusions from analysis of the tail index:

1 the distributions of considered Web-traffic characteristicsare heavy-tailed;

2 at least βth moments, β ≥ 2 of the distribution of the s.s.s.,s.r., d.s.s. are not finite;

3 the distribution of i.r.t. has two finite moments;

4 it might be possible for s.s.s. (when 1 < γ < 2) that α < 1and the expectation could be also not finite.

Natalia Markovich Analysis methods of heavy-tailed data

Page 67: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Results of the Web traffic analysis.

Analysis of values Rn(p).

the values Rn(p) are dramatically large for large n andp ≥ 2, e.g., in the case of the duration of sub-sessions andn = 350 ln Rn(p) ≈ 10 for p = 2 and ln Rn(p) ≈ 103 forp = 3.

Hence, one may conclude that all moments of theconsidered r.v.s apart from the first one are not finite.

Natalia Markovich Analysis methods of heavy-tailed data

Page 68: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

n → ln Rn(p) of the duration of sub-sessions (left) and thesize of sub-sessions (right) for a variety of p-values.

Natalia Markovich Analysis methods of heavy-tailed data

Page 69: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

n → ln Rn(p) of the inter-response times (left) and the sizeof responses (right) for a variety of p-values.

Natalia Markovich Analysis methods of heavy-tailed data

Page 70: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Results of the Web traffic analysis.

Analysis of the mean excess function.

the plots u → en(u) tend to infinity for large u implyingheavy tails.

These plots are close to a linear shape for all sets of data.The latter implies that the considered distributions can bemodelled by a DF of a Pareto type.

Natalia Markovich Analysis methods of heavy-tailed data

Page 71: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Exceedance en(u) against the threshold u for the durationof sub-sessions (left) and the size of sub-sessions (right).

Natalia Markovich Analysis methods of heavy-tailed data

Page 72: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Exceedance en(u) against the threshold u for theinter-response times (left) and the size of responses (right).

Natalia Markovich Analysis methods of heavy-tailed data

Page 73: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Results of the Web traffic analysis.

Analysis of QQ-plots.

Two versions of QQ-plots are shown.

Right top plots show, that the distribution of the d.s.s. isclose to lognormal and Pareto distributions; distributions ofs.s.s., i.r.t. and s.r. are close to Pareto.

Left top plots show that the exponential distribution cannotbe used as an appropriate model for these r.v.s.;distributions of d.s.s., s.s.s., i.r.t. and s.r. are close to aGeneralized Pareto distribution with the DF

Ψσ,γ(x) =

{1 − (1 + γx/σ)−1/γ , γ 6= 0,1 − exp (−x/σ) , γ = 0,

with different values of the parameters γ and σ.

The QQ-plot does not give a unique model to fit theunderlying distribution.

Natalia Markovich Analysis methods of heavy-tailed data

Page 74: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

QQ-plots for the duration of sub-sessions.

Left: exponential quantiles (top) and quantiles of theGPD(0.3;1) (bottom) against d.s.s./s (the linear curvescorrespond to the QQ-plot of the same distributions).Right: Empirical DFs of the r.v. Ui = F (Xi). Exponential,Pareto, Weibull, lognormal and normal distributions are used asmodels of F . The linear curve corresponds to the exponentialdistribution.

Natalia Markovich Analysis methods of heavy-tailed data

Page 75: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

QQ-plot for the size of sub-sessions.

Left: exponential quantiles (top) and quantiles of theGPD(1;0.015) (bottom, first plot) and the GPD(0.05;0.3)(bottom, second plot) against s.s.s./s (the linear curvescorrespond to the QQ-plot of the same distributions).Right: Empirical DFs of the r.v. Ui = F (Xi). Exponential,Pareto, Weibull, lognormal and normal distributions are used asthe models of F . The linear curve corresponds to theexponential distribution.

Natalia Markovich Analysis methods of heavy-tailed data

Page 76: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

QQ-plots for the inter-response times.

Left: exponential quantiles (top) and quantiles of theGPD(0.8;0.015) (bottom) against i.r.t./s (the linear curvescorrespond to the QQ-plot of the same distributions).Right: Empirical DFs of the r.v. Ui = F (Xi). Exponential,Pareto, Weibull, lognormal and normal distributions are used asmodels of F . The linear curve corresponds to the exponentialdistribution.

Natalia Markovich Analysis methods of heavy-tailed data

Page 77: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

QQ-plot for the size of responses.

Left: exponential quantiles (top) and quantiles of theGPD(1;0.015) (bottom) against s.r./s (the linear curvescorrespond to the QQ-plot of the same distributions).Right: Empirical DFs of the r.v. Ui = F (Xi). Exponential,Pareto, Weibull, lognormal and normal distributions are used asthe models of F . The linear curve corresponds to theexponential distribution.

Natalia Markovich Analysis methods of heavy-tailed data

Page 78: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Results of the Web traffic analysis.

ACF analysis of Web-data.

The previous analysis shows that

the considered Web data are heavy-tailed with infinitevariance. Therefore, the application of formula

ρn,X (h) =

∑n−ht=1 XtXt+h∑n

t=1 X 2t

is relevant.

Natalia Markovich Analysis methods of heavy-tailed data

Page 79: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Summary results of the preliminary analysis

Comparison of the recommended methods for Web traffic data

Amount of finite moments Type of distributionr.v. Rn(p) Hill & Group QQ-plot en(u)

estimators.s.s. 1 1 GPD(0.015; 1), Pareto(B) GPD(0.05; 0.3) -liked.s.s. 1 1 GPD(1; 0.3), Pareto(sec) lognormal -likes.r. 1 1 GPD(0.015; 1) Pareto(B) -likei.r.t. 1 2 GPD(0.015; 0.8) Pareto(sec) -like

Natalia Markovich Analysis methods of heavy-tailed data

Page 80: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of dependence

ACF estimation by the modified sample ACF and the standardsample ACF for the data sets s.s.s. (first two plots left), d.s.s.(last two plots right).

The dotted horizontal lines indicate 95% asymptotic confidencebounds (±1.96/

√n) corresponding to the ACF of i.i.d.

Gaussian r.v.s.

Natalia Markovich Analysis methods of heavy-tailed data

Page 81: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of dependence

ACF estimation by the modified sample ACF and the standardsample ACF for the data sets i.r.t. (first two plots left), s.r. (lasttwo plots right).

The dotted horizontal lines indicate 95% asymptotic confidencebounds (±1.96/

√n) corresponding to the ACF of i.i.d.

Gaussian r.v.s.

Natalia Markovich Analysis methods of heavy-tailed data

Page 82: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Hurst parameter estimation for Web traffic data

Table: Hurst parameter estimation for Web traffic.

Data s.s.s. d .s.s. i .r .t . s.r .Hn 0.493 0.488 0.508 0.507

Main conclusion:

all data sets are heavy-tailed and not long-range dependent.

Natalia Markovich Analysis methods of heavy-tailed data

Page 83: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

TCP-flow analysis

We observe

TCP-flow sizes S and

durations D

gathered from one source destination pair.

Motivation is to estimate

the distribution of the maximal rate (or throughput)R = S/D and

the expected throughput R (or S/D)

that the transport system provides.

Natalia Markovich Analysis methods of heavy-tailed data

Page 84: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Description of the TCP-flow data

The analyzed data consist of

TCP-flow sizes and durations of transmissions have beenmeasured from the mobile network of the Finnish operatorElisa;

mobile TCP connections from periods of low, average and highnetwork load conditions;

TCP flows on port 80 (a WWW (HTTP) application).

The number of analyzed flows is 610 000 and, for practicalreasons, we consider 61 disjoint bivariate samples, each of sizen = 10 000.

Natalia Markovich Analysis methods of heavy-tailed data

Page 85: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Description of the TCP-flow data

Statis Unit Definition Sample Mean Sample Variance

tic Min Max Min MaxSize kB Content 9.0 20.3 1303 204553

Transmitted 9.5 20.1 1357 206658

Dura sec SYN-FIN 18.2 30.4 2219 52125tion

The results, [min,max] ranges over all 61 samples.

’Content’ refers to the size of the downloaded web content and’Transmitted’ means Content plus segments retransmitted byTCP. Both are measures of the size of a flow. ’SYN-FIN’ meansfrom the three-way handshaking (synchronization) to finish.

Natalia Markovich Analysis methods of heavy-tailed data

Page 86: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Results of the TCP-flow data analysis

Table: Estimation of the EVI for flow sizes (“Content” and“Transmitted”) and durations (“SYN-FIN”)

γH (n, k) γl γM(n, k) γUH(n, k)Min Max Min Max Min Max Min Max

Content 0.59 0.87 0.45 0.98 0.51 0.98 0.52 0.99

Transmit 0.58 1.15 0.45 0.94 0.52 0.96 0.53 0.97ted

SYN-FIN 0.52 1.00 0.38 0.77 0.36 0.86 0.37 0.82

Natalia Markovich Analysis methods of heavy-tailed data

Page 87: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Conclusions from analysis of the tail index forTCP-flow data:

1 the distributions of TCP-flow size and duration areheavy-tailed;

2 all estimators apart of the group estimator γl indicate thatthe flow sizes samples (both content and transmitted) mayhave infinite variance under the assumption that theirdistributions are regularly varying;

3 some samples of flow durations may have two finite firstmoments.

Natalia Markovich Analysis methods of heavy-tailed data

Page 88: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Results of the TCP-flow data analysis

Table: Comparison of the “rough” methods for TCP-flow data

Amount of first finite Type of distributionmoments

Rn(p) Estimators of γ QQ-plot en(u)

Content 1 2 or 1 GPD(1,1.3) Pareto-like

Transmit 1 2 or < 1 GPD(1,1.3) Pareto-ted likeSYN-FIN < 1 2 or < 1 GPD(1,0.85) Pareto-

like

Natalia Markovich Analysis methods of heavy-tailed data

Page 89: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of dependence

ACF estimation by the modified sample ACF and the standardsample ACF of one sub-sample (n = 1000) of the TCP-flowsizes (first two plots left), and durations (last two plots right).

The horizontal lines indicate 95% asymptotic confidencebounds (±1.96/

√n).

Conclusions:

The TCP-flow sizes may be independent.

The ACFs of the TCP-flow durations have three clusters thatmay indicate the dependence.

Natalia Markovich Analysis methods of heavy-tailed data

Page 90: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of long-range dependence

Table: Hurst parameter estimation for Web traffic and TCP-flow data

Data TCP − flow size TCP − flow durationHn 0.498 0.506

Main conclusion:

TCP-flow size and duration data sets are heavy-tailed and notlong-range dependent.

Natalia Markovich Analysis methods of heavy-tailed data

Page 91: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Problems of throughput investigation

The distributions of both S and D are heavy-tailed andtheir expectations may not be finite. Thus, ES/D may benot computable.

Since S and D are dependent and positive, then the DF ofthe ratio R = S/D is defined by

FR(x) = P{S/D ≤ x} =

∫ ∞

0

∫ zx

0f (y , z)dydz

=

∫ ∞

0

∫ zx

0dF (y , z),

f (y , z) is a joint PDF of S and D, and its expectation by

ER =

∫ ∞

0xdFR(x),

if the latter integral converges.

Natalia Markovich Analysis methods of heavy-tailed data

Page 92: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Bivariate analysis of TCP-flow data

First, we check the dependencebetween the pairs

(S1,D1), ..., (Sn,Dn) to apply(3). For this purpose, we cancalculate the ACF of the r.v.sri =

√S2

i + D2i , i = 1, ...,n.

The sample ACF of {ri} is smallin absolute value at all lags(possible exception are two lagsthat do not persist within 95%confidence interval). One maysuppose that the sizes-durationpairs are independent.

Natalia Markovich Analysis methods of heavy-tailed data

Page 93: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Bivariate analysis of TCP-flow data

0 10 20 30 40

Block Maximaof Size (MB)

0

1

2

3

4

Blo

ckM

axim

aof

Dur

atio

n(h

ours

)

Scatter plot of pairs of blockmaxima (M j

S,m,MjD,m),

j = 1, . . . ,610, when the blocksize is m = 1 000. The pairsthat are presented in the initialsample are marked by dots asfar as the pairs that are notpresented (e.g., the maximalsize in the group does notnecessarily correspond to themaximal duration in this group)by circles. Lines D = S/384and D = S/42 indicate 384 kb/s(EDGE) and 42 kb/s (GPRS)access rates.

Natalia Markovich Analysis methods of heavy-tailed data

Page 94: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of dependence of the block maxima

0 10 20 30 40 50 60

Lag

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

AC

F

0 10 20 30 40 50 60

Lag

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

AC

F

Estimates of standard sample ACF of the both maxima samples

of size 61 corresponding to TCP-flow sizes (left), and durations(right). The dotted horizontal lines indicate 95% asymptoticconfidence bounds ±1.96/

√n.

Natalia Markovich Analysis methods of heavy-tailed data

Page 95: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of dependence of the block maxima

0 100 200 300 400 500 600

Lag

-0.2

-0.1

0

0.1

0.2

AC

F

0 100 200 300 400 500 600

Lag

-0.2

-0.1

0

0.1

0.2

AC

F

Estimates of standard sample ACF of the both maxima samples

of size 610 corresponding to TCP-flow sizes (left), anddurations (right). The dotted horizontal lines indicate 95%asymptotic confidence bounds ±1.96/

√n.

Natalia Markovich Analysis methods of heavy-tailed data

Page 96: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of distribution of the block maxima

The Generalized Extreme Value (GEV) distribution

Hγ(x) =

{exp(−(1 + γ

( x−µσ

))−1/γ), γ 6= 0

exp(−e−(x−µ

σ)), γ = 0.

is applied as a model of the block maxima distribution.

Maximum likelihood estimates of GEV parameters by blockmaxima of size 610 of TCP-flow data

Statistic Definition γ µ σ

Size Content 0.332259 7075.92 4605.53Duration SYN-FIN 0.10263 3775.8 2433.27

Natalia Markovich Analysis methods of heavy-tailed data

Page 97: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of distribution of the block maxima

10000 20000 30000 40000

Sorted Sample

10000

20000

30000

40000

Qua

ntile

sof

GE

V

2000 4000 6000 8000 100001200014000

Sorted Sample

2500

5000

7500

10000

12500

15000

Qua

ntile

sof

GE

V

QQ-plots of block maxima samples

corresponding to TCP-flow sizes (left) and durations (right).

Natalia Markovich Analysis methods of heavy-tailed data

Page 98: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Testing of dependence of the TCP-flow data

0 0.2 0.4 0.6 0.8 1

t

0.5

0.6

0.7

0.8

0.9

1A

(t)

0 0.2 0.4 0.6 0.8 1

t

0.5

0.6

0.7

0.8

0.9

1

A(t

)

The estimation of the Pickands dependence function

by estimators ACn (t) (dashed line) and AHT

n (t) (solid line). Themaxima sample of size 61 (left) and of size 610 (right). Themarginal distributions G1(x) and G2(x) of TCP-flow sizes anddurations are estimated by GEV.

Conclusions: TCP-flow size and duration are dependent.

Natalia Markovich Analysis methods of heavy-tailed data

Page 99: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Bivariate quantile curves of the TCP-flow data

Using estimates of A(t) one can construct bivariate quantilecurves of TCP-flow data by (4).

0 10000 20000 30000 40000 50000

Flow Size

0

5000

10000

15000

20000

Flo

wD

urat

ion

p = 0.9

p = 0.95

p = 0.75

0 10000 20000 30000 40000 50000

Flow Size

0

5000

10000

15000

20000

Flo

wD

urat

ion

p = 0.9

p = 0.95

p = 0.75

p = 0.975

Estimated quantile curves of TCP-flow data

for p ∈ {0.75,0.9,0.95} corresponding to estimator ACn (t): the

maxima sample of size 61 (left), of size 610 (right).

Natalia Markovich Analysis methods of heavy-tailed data

Page 100: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Conclusions from bivariate analysis of TCP-flow data

The analysis is made from samples of moderate size.

Size S and duration D are heavy-tailed with probablyinfinite second moment.

Their distributions are complicated in the sense that theydo not belong to any known parametric models.

Estimates of the Pickands dependence function show thatS and D are dependent.

Bivariate quantile curves show that the bivariate extremevalue distribution of (S,D) is ’not quite heavy-tailed’ in thesense that not many observations fall in the ”outliers area”,i.e. beyond the 97.5% quantile curve. This can be a specialproperty of this mobile TCP data.Bivariate quantile curves are sensitive to, at least,

1 estimation of parameters of margins of G(x , y) andestimates of A(t) and

2 the amount of component-wise maxima, or to the block size.Natalia Markovich Analysis methods of heavy-tailed data

Page 101: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Notes

Considering the real data three items have to be investigated:

1 the preliminary detection of heavy tails;2 the dependence structure of univariate data;3 the dependence structure of multivariate data.

Natalia Markovich Analysis methods of heavy-tailed data

Page 102: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Reference:

1. Markovich, N.M. and Krieger, U.R. (2005) StatisticalInspection and Analysis Techniques for Traffic Data Arisingfrom the Internet, HETNETs’04 special journal issue on”Convergent Multi-Service Networks and Next GenerationInternet: Performance Modelling and Evaluation”pp.72/1-72/9.

2. Beirlant, J., Goegebeur, Y., Teugels, J. and Segers, J.(2004) Statistics of Extremes: Theory and Applications.Wiley, Chichester, West Sussex.

3. Davis, R. and Resnick, S. (1985) Limit theory for movingaverages of random variables with regularly varying tailprobabilities. Ann. Probability 13, pp.179-195.

4. Embrechts, P., Kluppelberg, C. and Mikosch, T. (1997)Modeling Extremal Events. Springer, Berlin.

5. Brockwell P.J. and Davis R.A. (1991) Time series:Theory and Methods, 2nd edition. Springer, New York.

Natalia Markovich Analysis methods of heavy-tailed data

Page 103: Analysis methods of heavy-tailed data - John Wiley & … Analysis methods of heavy-tailed data Natalia Markovich Institute of Control Sciences Russian Academy of Sciences, Moscow,

logo

Reference:

6. Kettani, H. and Gubner, J.A. (2002) A novel approach tothe estimation of the hurst parameter in self-similar traffic.In: Proceedings of IEEE conference on local computernetworks, Tampa, Florida.

7. Resnick, S.I. (2006) Heavy-Tail Phenomena. Probabilisticand Statistical Modeling. Springer, New York.

Natalia Markovich Analysis methods of heavy-tailed data