
Tail Risk of Multivariate Regular Variation

Harry Joe∗ Haijun Li†

Third Revision, May 2010

Abstract

Tail risk refers to the risk associated with extreme values and is often affected by extremal

dependence among multivariate extremes. Multivariate tail risk, as measured by a coherent risk

measure of tail conditional expectation, is analyzed for multivariate regularly varying distribu-

tions. Asymptotic expressions for tail risk are established in terms of the intensity measure that

characterizes multivariate regular variation. Tractable bounds for tail risk are derived in terms

of the tail dependence function that describes extremal dependence. Various examples involving

Archimedean copulas are presented to illustrate the results and quality of the bounds.

Key words and phrases: Coherent risk, tail conditional expectation, regularly varying, cop-

ula, tail dependence.

MSC2000 classification: 62H20, 91B30.

1 Introduction

The performance (gain or loss, etc.) of a financial portfolio at the end of a given period is often evaluated by a real-valued random variable X. A risk measure ρ is defined as a measurable mapping, satisfying certain coherency principles, from the space of all performance variables into R [28]. These coherency principles provide a set of operational axioms that ρ should satisfy in order to accurately characterize the risky behavior of portfolios. The coherent risk measure, introduced in [5] for analyzing the economic risk of financial portfolios, is an example of such an axiomatic approach. Let L be the convex cone¹ consisting of all the performance variables that represent losses of financial portfolios at the end of a given period. Note that −X, where X ∈ L, represents the net worth of a financial position. A mapping ρ : L → R is called a coherent risk measure if ρ satisfies the following four economically coherent axioms:

∗ Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada. This author is supported by NSERC Discovery Grant.

† Department of Mathematics, Washington State University, Pullman, WA 99164, U.S.A. This author is supported in part by NSF grant CMMI 0825960.

¹ A subset L of a linear space is a convex cone if x1 ∈ L and x2 ∈ L imply that λ1x1 + λ2x2 ∈ L for any λ1 > 0 and λ2 > 0. A convex cone is called salient if it does not contain both x and −x for any non-zero vector x.


1. (monotonicity) For X1, X2 ∈ L with X1 ≤ X2 almost surely, ρ(X1) ≤ ρ(X2).

2. (subadditivity) For all X1, X2 ∈ L, ρ(X1 + X2) ≤ ρ(X1) + ρ(X2).

3. (positive homogeneity) For all X ∈ L and every λ > 0, ρ(λX) = λρ(X).

4. (translation invariance) For all X ∈ L and every l ∈ R, ρ(X + l) = ρ(X) + l.

The interpretations of these axioms have been well documented in the literature (see, e.g., [28] for details), and the risk ρ(X) for loss X corresponds to the amount of extra capital that has to be invested in some secure instrument so that the resulting position ρ(X) − X is acceptable to regulators/supervisors. The general theory of coherent risk measures was developed for arbitrary real random variables in [12], and the convex measures that combine subadditivity and positive homogeneity into the convexity property were extended to càdlàg processes in [9], and to abstract spaces in [14] that include deterministic, stochastic, single or multi-period cash-stream structures.

It follows from the duality theory that any coherent risk measure ρ(X) arises as the supremum of expected values of X, taken over a convex set of probability measures on environmental states, all of which are absolutely continuous with respect to the underlying physical measure. If the set is taken to be the set of all conditional probability measures conditioning on events with probability greater than or equal to p, 0 < p < 1, then the corresponding coherent risk measure is known as the worst conditional expectation $\mathrm{WCE}_p(X)$, which, in the case that the loss variable X is continuous, equals the tail conditional expectation (TCE) defined as follows,

\[ \mathrm{TCE}_p(X) := E\bigl(X \mid X > \mathrm{VaR}_p(X)\bigr), \tag{1.1} \]

where $\mathrm{VaR}_p(X) := \inf\{x \in \mathbb{R} : \Pr\{X > x\} \le 1 - p\}$ is known as the Value-at-Risk (VaR) with

confidence level p (i.e., p-quantile). The VaR has been widely used in risk management, but it

violates the subadditivity of coherency on convex cone L and often underestimates risks. Although

VaR is coherent on a much smaller convex cone consisting of only linearized portfolio losses from

elliptically distributed risk factors, the non-subadditivity of VaR can occur in the situations where

portfolio losses are skewed or heavy-tailed with asymmetric dependence structures [28]. It can be

shown that for continuous losses, TCE is the average of VaR over all confidence levels greater than

p, focusing more than VaR does on extremal losses. Thus, TCE is more conservative than VaR at

the same level of confidence (i.e., TCEp(X) ≥ VaRp(X)) and provides an effective tool for analyzing

tail risks. The TCE is also related to the expected residual lifetime, a performance measure widely

used in reliability theory and survival analysis.

For light-tailed loss distributions, such as normal distributions, TCE and VaR at the same level p of confidence are asymptotically equal as p → 1. Another example of light-tailed losses is the phase-type distribution². The explicit relation between TCE and VaR for phase-type loss distributions was obtained in [8], from which the asymptotic equivalence of TCE and VaR as p → 1 is evident.

² That is, the hitting time distribution of a finite-state Markov chain.

It is precisely the heavy tails of loss distributions that make TCE more effective in analyzing tail risks. Formally, a non-negative loss variable X with distribution function (df) F has a heavy or regularly varying right tail at ∞ with heavy-tail index α if its survival function is of the following form (see, e.g., [7] for details),

\[ \bar F(r) := 1 - F(r) = r^{-\alpha} L(r), \quad r > 0,\ \alpha > 0, \tag{1.2} \]

where L is a slowly varying function; that is, L is a positive function on (0, ∞) with the property

\[ \lim_{r\to\infty} \frac{L(cr)}{L(r)} = 1, \quad \text{for every } c > 0. \tag{1.3} \]

For example, the Pareto distribution with survival function $\bar F(r) = (1+r)^{-\alpha}$, r ≥ 0, has a regularly varying tail. It can be easily verified that if α > 1 for a Pareto loss variable X, then

\[ \mathrm{TCE}_p(X) \approx \frac{\alpha}{\alpha-1}\,\mathrm{VaR}_p(X), \quad \text{as } p \to 1. \tag{1.4} \]

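In fact, (1.4) holds for any loss distribution (1.2) with heavy-tail index α > 1. Observe that

\[
\mathrm{TCE}_p(X) = \frac{E\bigl(X\, I\{X > \mathrm{VaR}_p(X)\}\bigr)}{\Pr\{X > \mathrm{VaR}_p(X)\}}
= \frac{1}{\Pr\{X > \mathrm{VaR}_p(X)\}} \left( \mathrm{VaR}_p(X)\,\Pr\{X > \mathrm{VaR}_p(X)\} + \int_{\mathrm{VaR}_p(X)}^{\infty} \Pr\{X > x\}\,dx \right), \tag{1.5}
\]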

where I(A) hereafter denotes the indicator function of set A. By the Karamata theorem (see, e.g.,

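[31]), we have

\[ \int_{\mathrm{VaR}_p(X)}^{\infty} \Pr\{X > x\}\,dx \approx \frac{1}{\alpha-1}\,\mathrm{VaR}_p(X)\,\Pr\{X > \mathrm{VaR}_p(X)\}, \quad \text{as } p \to 1. \tag{1.6} \]

Plugging this estimate into (1.5), we obtain (1.4) for any regularly varying distribution with α > 1.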
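As a quick numerical illustration of (1.4), the following Python sketch evaluates the ratio of TCE to VaR for the Pareto survival function (1 + r)^(−α), using the decomposition (1.5) with the tail integral in closed form. The function names and the parametrization by the tail probability q = 1 − p are illustrative assumptions, not notation from the paper.

```python
def pareto_var(q: float, alpha: float) -> float:
    """VaR at level p = 1 - q for the survival function (1 + r)^(-alpha)."""
    return q ** (-1.0 / alpha) - 1.0


def pareto_tce(q: float, alpha: float) -> float:
    """TCE at level p = 1 - q via (1.5); requires alpha > 1."""
    v = pareto_var(q, alpha)
    # Integral of (1 + x)^(-alpha) over (v, infinity), in closed form.
    tail_integral = (1.0 + v) ** (1.0 - alpha) / (alpha - 1.0)
    # For a continuous loss, Pr{X > VaR_p(X)} = q.
    return v + tail_integral / q


def tce_over_var(q: float, alpha: float) -> float:
    """Ratio TCE / VaR, which (1.4) says tends to alpha / (alpha - 1)."""
    return pareto_tce(q, alpha) / pareto_var(q, alpha)


if __name__ == "__main__":
    for alpha in (2.0, 5.0):
        for q in (1e-2, 1e-4, 1e-8):
            print(alpha, q, tce_over_var(q, alpha), alpha / (alpha - 1.0))
```

As q decreases, the printed ratio should approach α/(α − 1); the convergence is slower for larger α because the shift by 1 in the Pareto form matters longer.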

The asymptotic formula (1.4) of TCE for univariate tail risks is fairly straightforward, but the multivariate case remains unsettled and is the focus of this paper. Consider a random vector X = (X1, . . . , Xd) from a multi-asset portfolio at the end of a given period, where the i-th component Xi corresponds to the loss of the financial position on the i-th market. A risk measure R(X) for loss vector X corresponds to a subset of R^d consisting of all the deterministic portfolios x such that the modified position x − X is acceptable to regulators/supervisors. The coherency

principles that are similar to the univariate case were formulated in [20] for multivariate risk measure

R(X), and it was further shown in [6] that for continuous loss vectors, multivariate TCEs are

coherent in the sense of [20]. Note, however, that multivariate TCEs, to be formally defined in

Section 2, are subsets of Rd, which lack tractable expressions even for some widely used multivariate

distributions, such as multivariate normals. The effect of dependence among losses X1, . . . , Xd in

different assets on the multivariate TCE also remains difficult to understand. In this paper, we


study asymptotic behaviors of multivariate TCEs for multivariate regularly varying distributions.

Our method, based on tail dependence functions developed in [29, 18], not only yields explicit

asymptotic expressions of multivariate TCEs for various multivariate distributions, but also leads

to better insights into how the dependence among extreme losses would affect analysis on tail risks.

The rest of the paper is organized as follows. In Section 2, we briefly discuss the multivariate

coherent risk measures, and then obtain the tail estimates of TCEs for multivariate regular variation

in terms of intensity measures and their asymptotic bounds in terms of tail dependence functions.

In Section 3, we present several examples to examine the quality of the bounds. Section 4 concludes the paper with some remarks, and the Appendix in Section 5 details two lengthy proofs. Throughout this paper, measurability of functions and sets is assumed without explicit mention, and the maximum operator is denoted by ∨.

2 Tail Risks of Multivariate Regular Variation

To explain the vector-valued coherent risk measures, we use the notation from [20]. Let K be a closed, salient convex cone¹ of R^d such that R^d_+ ⊆ K. The convex cone K induces a partial order on R^d: x ≤_K y if and only if y ∈ x + K. Note that a convex cone K must be an upper set³

with respect to partial order ≤K induced by itself. Moreover, if A is an upper set with respect to

partial order ≤K , then for any x ∈ A and k ∈ K, x + k ≥K x, leading to x + k ∈ A and thus

A+K ⊆ A. Observe that we always have A+K ⊇ A due to the fact that any closed convex cone

must contain the origin. Conversely, if A + K = A for some subset A, then for any y ≥K x with

x ∈ A, y ∈ x+K ⊆ A+K = A, implying that A must be upper with respect to partial order ≤K .

Hence, A is an upper set with respect to partial order ≤K if and only if A+K = A.

If K = R^d_+, then the ≤_K-order becomes the usual component-wise order. For any two loss random vectors X and Y on the probability space (Ω, F, P), define X ≤_K Y if and only if Y − X ∈ K, P-almost surely. Using the partial order ≥_K rather than the usual component-wise partial order can account for some financial market frictions, such as transaction costs.

Definition 2.1. Consider random loss vectors on a probability space (Ω, F, P). A vector-valued coherent risk measure R(·) is a measurable set-valued map satisfying that R(X) ⊂ R^d is closed for any loss random vector X and 0 ∈ R(0) ≠ R^d, as well as the following axioms:

1. (Monotonicity) For any X and Y , X ≤K Y implies that R(X) ⊇ R(Y ).

2. (Subadditivity) For any X and Y , R(X + Y ) ⊇ R(X) +R(Y ).

3. (Positive Homogeneity) For any X and positive s, R(sX) = sR(X).

4. (Translation Invariance) For any X and any deterministic vector l, R(X + l) = R(X) + l.

3A set S is called upper (lower) with respect to partial order ≤K if s ≤K (≥K) s′ and s ∈ S imply that s′ ∈ S.


Note that the risk set R(X) consists of all the deterministic portfolios x such that the multi-

variate portfolio x−X is acceptable to the regulator/supervisor. The motivation for set-valued

risk measures is that investors are sometimes not able to aggregate their multivariate portfolios

on various security markets because of liquidity problems and/or transaction costs between the

different security markets (e.g., having assets in several currencies at the same time). See [20] for

details.

When d = 1, ρ(X) := inf{r : r ∈ R(X)} is a univariate coherent risk measure satisfying the four axioms discussed in Section 1, and thus R(X) = [ρ(X), ∞). It was shown in [20] that the worst conditional expectation for random vector X, defined as

\[ \mathrm{WCE}_p(X) := \{x \in \mathbb{R}^d : E(x - X \mid B) \ge_K 0,\ \forall B \in \mathcal{F} \text{ with } P(B) \ge 1-p\}, \quad 0 < p < 1, \]

is a vector-valued coherent risk measure. Since $\mathrm{WCE}_p(X) = \bigcap_{B \in \mathcal{F},\ P(B) \ge 1-p} \bigl(E(X \mid B) + K\bigr)$ and K is an upper set, $\mathrm{WCE}_p(X)$ is also an upper set. For any continuous random vector X,

$\mathrm{WCE}_p(X)$ equals the tail conditional expectation (TCE) for X, defined as in [6] by

\[
\mathrm{TCE}_p(X) := \{x \in \mathbb{R}^d : E(x - X \mid X \in A) \ge_K 0,\ \forall A \in \mathcal{Q}_p(X)\}
= \bigcap_{A \in \mathcal{Q}_p(X)} \bigl(E(X \mid X \in A) + K\bigr), \quad 0 < p < 1, \tag{2.1}
\]

where $\mathcal{Q}_p(X) := \{A \subseteq \mathbb{R}^d : A \text{ is Borel-measurable},\ A + K = A,\ \Pr\{X \in A\} \ge 1 - p\}$ is the set of all the upper sets (with respect to ≤_K) with probability mass greater than or equal to 1 − p. Observe that $\mathrm{TCE}_p(X)$ is a convex and upper set that consists of all the portfolios x of capital reserves that can be used to cover the expected losses E(X | X ∈ A) in the events that X ∈ A.

Note that the multivariate coherent risk measures discussed in [20, 6] are defined for essentially bounded random vectors. To discuss asymptotic properties, these measures have to be extended to the set of all random vectors on $\overline{\mathbb{R}}^d = [-\infty, \infty]^d$. This can be done using the idea in [12] that allows vectors in R(X) to have components taking the value ∞; that is, the positions corresponding to these components are so risky, whatever that means, that no matter how much capital is added, the positions will remain unacceptable. We also need to exclude the situations where components of the vectors in R(X) take the value −∞, which would mean that arbitrary amounts of capital could be withdrawn without endangering the portfolios (see [12] for details). As a matter of fact, it can be easily verified that $\mathrm{TCE}_p(X)$ is coherent in the sense of Definition 2.1 if X, which may not be bounded, has a continuous density function.

The extreme value analysis of $\mathrm{TCE}_p(X)$ as p → 1 boils down to analyzing the asymptotic behavior of E(X | X ∈ rB) as r → ∞ for various upper sets B, for which multivariate regular variation is well suited. A non-negative random vector X with joint df F is said to have a multivariate regularly varying (MRV, see [30]) distribution F if there exists a Radon measure µ (i.e., finite on compact sets), called the intensity measure, on $\overline{\mathbb{R}}^d_+ \setminus \{0\}$ such that

\[ \lim_{r\to\infty} \frac{\Pr\{X \in rB\}}{\Pr\{\|X\| > r\}} = \mu(B), \tag{2.2} \]


for any relatively compact set $B \subset \overline{\mathbb{R}}^d_+ \setminus \{0\}$⁴ with µ(∂B) = 0, where || · || denotes a norm on R^d. Any MRV df F with support in R^d_+ admits the following spectral representation: for all continuity points x of µ,

\[ \lim_{r\to\infty} \frac{1 - F(rx)}{1 - F(r\mathbf{1})} = \lim_{r\to\infty} \frac{\Pr\{X/r \in [0, x]^c\}}{\Pr\{X/r \in [0, \mathbf{1}]^c\}} = k\,\mu([0, x]^c), \tag{2.3} \]

where k > 0 is a constant and $\mu([0, x]^c) = \int_{S^{d-1}_+} \max_{1 \le j \le d} (u_j/x_j)^{\alpha}\, S(du)$ for a finite measure S on $S^{d-1}_+ := \{x \in \mathbb{R}^d_+ : \|x\| = 1\}$. Non-degenerate margins $F_j$, 1 ≤ j ≤ d, of an MRV df F are regularly

varying in the sense of (1.2). Since $F_1, \dots, F_d$ are usually assumed to be tail equivalent [31], we have that $\bar F_j(x) = L_j(x)/x^{\alpha}$, 1 ≤ j ≤ d, where $L_i(x)/L_j(x) \to c_{ij}$ as x → ∞, $0 < c_{ij} < \infty$. We assume hereafter that $c_{ij} = 1$ for notational convenience. If $c_{ij} \ne 1$ for some i ≠ j, we can properly rescale the margins and the results still follow. We also assume that the heavy-tail index α > 1 to ensure the existence of expectations. Examples and properties of MRV distributions, including the relation between MRV distributions and multivariate extreme value distributions with identical Fréchet margins, can be found in [30, 31].

The asymptotic relation between TCEp(X) and intensity measure µ is given below and its

proof is detailed in Appendix in Section 5.

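Theorem 2.2. Let X be a non-negative loss vector that has an MRV df with intensity measure µ.

1. Let B be an upper set bounded away from 0. Then, for 1 ≤ j ≤ d,

\[ \lim_{r\to\infty} r^{-1} E(X_j \mid X \in rB) = \int_0^{\infty} \frac{\mu(A_j(w) \cap B)}{\mu(B)}\,dw =: u_j(B; \mu), \]

where $A_j(w) := \{(x_1, \dots, x_d) \in \mathbb{R}^d : x_j > w\}$.

2. Let $\mathcal{Q}_{\|\cdot\|} := \{B \subseteq \mathbb{R}^d : B + K = B,\ B \cap S^{d-1}_+ \ne \emptyset,\ B \subseteq (\mathbb{B}^d)^c\}$, where $\mathbb{B}^d := \{x \in \mathbb{R}^d : \|x\| < 1\}$ denotes the open unit ball in R^d with respect to the norm || · ||. As p → 1,

\[ \mathrm{TCE}_p(X) \approx \bigcap_{B \in \mathcal{Q}_{\|\cdot\|}} \mathrm{VaR}_{1-(1-p)/\mu(B)}(\|X\|)\, \bigl((u_1(B; \mu), \dots, u_d(B; \mu)) + K\bigr). \]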

Remark 2.3. 1. Theorem 2.2 provides the multivariate extension of (1.4) and shows how ex-

tremal dependence, as described by the intensity measure, would quantitatively affect tail

risks. It also provides a unified tool to analyze the structural properties of tail asymptotics of

TCEs for various portfolio and risk aggregations of loss vector (X1, . . . , Xd). For example, the

tail asymptotics of TCEs of the portfolio aggregation $\sum_{i=1}^d X_i$ can be obtained from Theorem 2.2 (1) by taking $B = \{x : \sum_{i=1}^d x_i > 1\}$ (also see [3]). The tail estimate obtained in Theorem 2.2 (2) can also be applied to analyzing coherent aggregations [20] of extremal risks.

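2. Theorem 2.2 (1) can be used in analyzing portfolio tail risk decomposition. For example, for any 1 ≤ j ≤ d,

\[ E\Bigl(X_j \Bigm| \sum_{i=1}^d X_i > \mathrm{VaR}_p\Bigl(\sum_{i=1}^d X_i\Bigr)\Bigr) \approx \mathrm{VaR}_p\Bigl(\sum_{i=1}^d X_i\Bigr)\, u_j(B; \mu), \quad \text{as } p \to 1, \]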

⁴ Here $\overline{\mathbb{R}}^d_+ = [0, \infty]^d$ is compact and the punctured version $\overline{\mathbb{R}}^d_+ \setminus \{0\}$ is modified via the one-point uncompactification (see, e.g., [31]).


where $B = \{x : \sum_{i=1}^d x_i > 1\}$. The tail estimate of $E\bigl(X_j \mid \sum_{i=1}^d X_i > \mathrm{VaR}_p(\sum_{i=1}^d X_i)\bigr)$ provides the contribution to the total tail risk attributable to risk j, as measured by TCEs. The risk allocation/decomposition with TCE for elliptically distributed loss vectors can be found in [24].

3. The computation of VaR for the norm ||X|| is difficult in general, but the tail estimate of

VaRp(||X||), when p→ 1, is relatively simple in light of (2.2). The tail estimates of VaR of the

sum are obtained in [2, 1, 4, 22, 13] in a similar spirit. For the maximum norm of loss vector (X1, . . . , Xd) with identical margins, the VaR can be estimated from the asymptotic relation $\Pr\{\max_{1 \le i \le d} X_i > r\} \approx \Pr\{X_1 > r\}/\mu(B)$ for sufficiently large r, where $B = (1, \infty) \times \mathbb{R}^{d-1}$.

In the situations that the asymptotic expression obtained in Theorem 2.2 may be intractable,

we can utilize the method of tail dependence functions introduced in [29, 18] to derive tractable

bounds for TCE. For notational convenience, we only consider the case where K = Rd+ in the

remainder of this paper.

The idea is to separate the margins from the dependence structure of df F, so that TCEs can be expressed asymptotically in terms of the marginal heavy-tail index and tail dependence of the copula of F. Assume that the df F of random vector X = (X1, . . . , Xd) has continuous margins F1, . . . , Fd; then, from [32], the copula C of F can be uniquely expressed as

\[ C(u_1, \dots, u_d) = F\bigl(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\bigr), \quad (u_1, \dots, u_d) \in [0, 1]^d, \]

where $F_j^{-1}$, 1 ≤ j ≤ d, are the quantile functions of the margins. The extremal dependence

of a df F can be described by various tail dependence parameters of its copula C. The upper

tail dependence parameters, for example, are the conditional probabilities that random vector

(U1, . . . , Ud) := (F1(X1), . . . , Fd(Xd)) with standard uniform margins belongs to upper tail orthants

given that a univariate margin takes extreme values:

\[ \lambda_U = \lim_{u \downarrow 0} \Pr\{U_1 > 1-u, \dots, U_d > 1-u \mid U_d > 1-u\} = \lim_{u \downarrow 0} \frac{\bar C(1-u, \dots, 1-u)}{u}, \tag{2.4} \]

where $\bar C$ denotes the survival function of C. Bivariate tail dependence has been widely studied

[16], and various multivariate versions of tail dependence parameters have also been introduced

and studied in [21, 25]. In fact, various upper tail dependence parameters can be represented by

the upper tail dependence function [21, 29, 18], defined as follows,

\[ b^*(w) := \lim_{u \downarrow 0} \frac{\bar C(1 - u w_j,\ 1 \le j \le d)}{u}, \quad \forall w = (w_1, \dots, w_d) \in \mathbb{R}^d_+. \tag{2.5} \]

The lower tail dependence can be similarly studied but we focus only on upper tail dependence in

this paper. It was shown in [18] that b∗(w) > 0 for all w ∈ Rd+ if and only if λU > 0. Unlike λU ,

however, the tail dependence function provides all the extremal dependence information [29, 18, 26].
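To make the limit in (2.5) concrete, the sketch below evaluates the finite-u ratio for one specific case: a distribution whose survival copula is the MTCJ copula of Example 3.2, so that the survival function of C at 1 − uw equals the MTCJ copula at uw. This choice of copula, and all function names, are assumptions made for illustration only.

```python
def mtcj(u, delta):
    """MTCJ copula: [sum_j u_j^(-delta) - (d - 1)]^(-1/delta), delta > 0."""
    d = len(u)
    return (sum(x ** (-delta) for x in u) - (d - 1)) ** (-1.0 / delta)


def tail_ratio(w, delta, u):
    """Finite-u version of (2.5) when the survival copula is MTCJ.

    Requires u * w_j <= 1 for every j.
    """
    return mtcj([u * wj for wj in w], delta) / u


def b_star(w, delta):
    """Limit of tail_ratio as u -> 0: (sum_j w_j^(-delta))^(-1/delta)."""
    return sum(wj ** (-delta) for wj in w) ** (-1.0 / delta)


if __name__ == "__main__":
    w, delta = [1.0, 2.0], 1.0
    for u in (1e-1, 1e-2, 1e-4, 1e-6):
        print(u, tail_ratio(w, delta, u), b_star(w, delta))
```

The printed ratios should settle on b*(w) as u shrinks; taking w = (1, . . . , 1) gives the finite-u version of the tail dependence parameter in (2.4).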


Using the inclusion-exclusion principle, we define the upper exponent function of C as follows:

\[ a^*(w) := \sum_{S \subseteq \{1, \dots, d\},\ S \ne \emptyset} (-1)^{|S|-1}\, b^*_S(w_i, i \in S; C_S), \tag{2.6} \]

where $b^*_S(w_i, i \in S; C_S)$ denotes the upper tail dependence function of the margin $C_S$ of C with component indexes in S.
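The inclusion-exclusion sum in (2.6) is easy to evaluate once the marginal tail dependence functions are available. The sketch below does this for an assumed family, b*_S(w) = (Σ_{i∈S} w_i^(−δ))^(−1/δ), which is the form that appears for the MTCJ survival copula in Example 3.3; the helper names are hypothetical.

```python
from itertools import combinations


def b_star_margin(w_sub, delta):
    """Assumed tail dependence function of a margin: (sum w_i^(-delta))^(-1/delta).

    For a single index this reduces to w_i, as it should for a univariate margin.
    """
    return sum(x ** (-delta) for x in w_sub) ** (-1.0 / delta)


def exponent_function(w, delta):
    """Upper exponent function a*(w) of (2.6) by inclusion-exclusion over subsets S."""
    d = len(w)
    total = 0.0
    for size in range(1, d + 1):
        sign = (-1) ** (size - 1)
        for subset in combinations(range(d), size):
            total += sign * b_star_margin([w[i] for i in subset], delta)
    return total


if __name__ == "__main__":
    # Bivariate case with delta = 1: a*(1, 1) = 1 + 1 - 1/2.
    print(exponent_function([1.0, 1.0], 1.0))
```

The loop visits all 2^d − 1 non-empty subsets, so it is only practical for moderate d; for exchangeable models the subsets of equal size can be grouped with binomial coefficients instead.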

The intensity measure µ and tail dependence function b∗ of an MRV distribution F are uniquely

determined from each other and their detailed relations can be found in [26]. In particular,

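\[ b^*(w) = \frac{\mu\bigl(\prod_{i=1}^d [w_i^{-1/\alpha}, \infty]\bigr)}{\mu\bigl([1, \infty] \times \overline{\mathbb{R}}^{d-1}_+\bigr)}, \quad \text{and} \quad \frac{\mu([w, \infty])}{\mu([0, \mathbf{1}]^c)} = \frac{b^*(w_1^{-\alpha}, \dots, w_d^{-\alpha})}{a^*(1, \dots, 1)}. \tag{2.7} \]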

Using this equivalence and Theorem 2.2 (1), E(X | X ∈ rB) can be asymptotically expressed in

terms of the tail dependence function b∗ for sufficiently large r. But the asymptotic estimation of

TCEp(X) via Theorem 2.2 (2) is still cumbersome because B ∈ Q||·|| can be quite arbitrary. More

tractable bounds for TCEp(X) can be established directly using the tail dependence, as shown in

the next theorem whose proof is detailed in Appendix in Section 5.

Theorem 2.4. Let X be a non-negative loss vector with an MRV df F and heavy-tail index α > 1.

Assume that the copula C of F has a positive upper tail dependence function b∗(w) > 0. Let ||·||max

denote the maximum norm.

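1. For 1 ≤ j ≤ d,

\[ \lim_{r\to\infty} \frac{1}{r} E(X_j \mid X \in r(x, \infty]) = \int_0^{\infty} \frac{b^*(x_1^{-\alpha}, \dots, (w_j \vee x_j)^{-\alpha}, \dots, x_d^{-\alpha})}{b^*(x_1^{-\alpha}, \dots, x_d^{-\alpha})}\,dw_j. \]

2. Let $S_j(b^*, \alpha) := \int_0^{\infty} \frac{b^*(1, \dots, 1, (w_j \vee 1)^{-\alpha}, 1, \dots, 1)}{b^*(1, \dots, 1)}\,dw_j$, 1 ≤ j ≤ d. For sufficiently small 1 − p,

\[ \mathrm{TCE}_p(X) \subseteq \mathrm{VaR}_{1-(1-p)\,a^*(1, \dots, 1)/b^*(1, \dots, 1)}(\|X\|_{\max})\, \bigl((S_1(b^*, \alpha), \dots, S_d(b^*, \alpha)) + \mathbb{R}^d_+\bigr). \]

3. For sufficiently small 1 − p,

\[ \mathrm{VaR}_p(\|X\|_{\max})\, \bigl((s_1(b^*, \alpha), \dots, s_d(b^*, \alpha)) + \mathbb{R}^d_+\bigr) \subseteq \mathrm{TCE}_p(X), \]

where, for 1 ≤ j ≤ d,

\[ s_j(b^*, \alpha) := \frac{\alpha}{\alpha-1}\,\frac{1}{b^*(1, \dots, 1)} + \sum_{\emptyset \ne S \subseteq \{i : i \ne j\}} (-1)^{|S|}\, \frac{\int_0^1 w_j\, d\,b^*_{\{j\} \cup S}(w_j^{-\alpha}, 1, \dots, 1; C_{\{j\} \cup S})}{b^*(1, \dots, 1)}, \]

and $b^*_{\{j\} \cup S}(w_j^{-\alpha}, 1, \dots, 1; C_{\{j\} \cup S})$ denotes the upper tail dependence function of the multivariate margin $C_{\{j\} \cup S}$ evaluated with the j-th argument being $w_j^{-\alpha}$ and the others being one.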

Observe that if d = 1, then Theorem 2.4 (2) and (3) reduce to (1.4). In multivariate risk

management, the upper (subset) bound presented in Theorem 2.4 (3) is more important, because it

provides a set of portfolios of conservative reserves so that even in worst case scenarios the resulting

positions are still acceptable to regulators/supervisors.


3 Illustrative Examples of Bounds for Tail Risks

We present some examples to examine the quality of the results in Theorem 2.4 when used as approximations. The examples show that the approximations are better with more tail dependence and a larger ζ, where ζ is in the exponent of the second order expansion

\[ \bar C(1 - u w_j,\ 1 \le j \le d) \approx u\, b^*(w) + u^{1+\zeta}\, b_2^*(w), \quad u \to 0. \tag{3.1} \]

It is intuitive that if ζ is larger (especially if ζ ≥ 1), then the second order term is less important. Note that for the Fréchet upper bound copula, $\bar C_U(1 - uw) = u \min\{w_1, \dots, w_d\}$, and there is no second order term.

Example 3.1. (a) Analysis of complete dependence (the Fréchet upper bound). Let $C_U$ be the Fréchet upper bound copula of dimension d. Then $b^*(w) = \min\{w_1, \dots, w_d\}$ and $b^*(\mathbf{1}) = 1$, $a^*(\mathbf{1}) = 1$. In part (2) of Theorem 2.4, $1 - (1-p)a^*/b^* = p$, and for α > 1, $S_j(b^*, \alpha) = 1 + \int_1^{\infty} \min\{1, w^{-\alpha}\}\,dw = 1 + (\alpha-1)^{-1} = \alpha/(\alpha-1)$. In part (3) of Theorem 2.4, for α > 1, $s_j(b^*, \alpha) = \alpha/(\alpha-1) + \sum_{\emptyset \ne S \subseteq \{i : i \ne j\}} (-1)^{|S|} \cdot 0 = \alpha/(\alpha-1)$. That is, the expressions in parts (2) and (3) coincide.

(b) Analysis of near independence. As the d-variate copula C (with tail dependence) moves

towards independence, b∗(1) → 0 and a∗(1) → d and 1 − (1 − p)a∗(1)/b∗(1) > 0 only if

p > 1−b∗(1)/a∗(1) so that for small b∗(1), the result in part (2) of Theorem 2.4 is non-trivial

only for large p near 1. This is a hint that all of the limiting results of Theorem 2.4 are

worse for weak tail dependence. In this case, one has to use Theorem 2.2 to approximate the

multivariate TCE.

Example 3.2. We show some details for two copula families to illustrate Theorem 2.4. The first

copula is the exchangeable MTCJ copula (or Mardia-Takahasi-Cook-Johnson copula, see [27, 33,

11]), and the second is a mixture of the MTCJ copula and the independence copula. Second order

expansions of the tail dependence functions are obtained and the approximation from part (1) of

Theorem 2.4 is summarized in Tables for some special cases.

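(a) The MTCJ copula in dimension d, with dependence increasing in δ, is:

\[ C(u; \delta) = \bigl[u_1^{-\delta} + \cdots + u_d^{-\delta} - (d-1)\bigr]^{-1/\delta}, \quad \delta > 0. \tag{3.2} \]

Let $w_j > 0$ for j = 1, . . . , d, and let $W := w_1^{-\delta} + \cdots + w_d^{-\delta}$. Then

\[
\begin{aligned}
C(uw; \delta) &= u\bigl[w_1^{-\delta} + \cdots + w_d^{-\delta} - (d-1)u^{\delta}\bigr]^{-1/\delta} = u W^{-1/\delta}\bigl[1 - (d-1)u^{\delta}/W\bigr]^{-1/\delta} \\
&\approx u W^{-1/\delta}\bigl[1 + (d-1)\delta^{-1}u^{\delta}/W\bigr] = u\, b^*(w; \delta) + u^{1+\delta}\, b_2^*(w; \delta), \quad \text{as } u \to 0,
\end{aligned}
\]

where $b^*(w; \delta) = W^{-1/\delta} = (w_1^{-\delta} + \cdots + w_d^{-\delta})^{-1/\delta}$ and $b_2^*(w; \delta) = (d-1)\delta^{-1}(w_1^{-\delta} + \cdots + w_d^{-\delta})^{-1/\delta-1}$. The second order term of C(uw; δ) is $O(u^{1+\zeta})$, where ζ = δ increases with more dependence.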


Suppose (X1, . . . , Xd) is multivariate Pareto of the form used in [27]; the univariate survival function is $x^{-\alpha}$ for x > 1 for all d margins and the survival copula is given in (3.2). That is,

\[ \bar F(x) = C(x_1^{-\alpha}, \dots, x_d^{-\alpha}; \delta) = \bigl[x_1^{\delta\alpha} + \cdots + x_d^{\delta\alpha} - (d-1)\bigr]^{-1/\delta}, \quad x_j > 1,\ j = 1, \dots, d. \tag{3.3} \]

An expression for the conditional expectation (given for the first component only because of symmetry) is:

\[ E[X_1 \mid X_1 > x_1, \dots, X_d > x_d] = x_1 + \frac{\int_0^{\infty} \bar F(x_1 + z_1, x_2, \dots, x_d)\,dz_1}{\bar F(x_1, \dots, x_d)}, \]

leading to the TCE

\[ r^{-1} E[X_1 \mid X_1 > r x_1, \dots, X_d > r x_d] = x_1 + \frac{\int_0^{\infty} \bar F(r x_1 + r w_1, r x_2, \dots, r x_d)\,dw_1}{\bar F(rx)}. \tag{3.4} \]

The above expectations exist for α > 1.

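• Exact calculation of the last summand in (3.4):

\[ \frac{\int_0^{\infty} C\bigl((r[x_1 + w_1])^{-\alpha}, (r x_2)^{-\alpha}, \dots, (r x_d)^{-\alpha}; \delta\bigr)\,dw_1}{C\bigl((r x_1)^{-\alpha}, \dots, (r x_d)^{-\alpha}; \delta\bigr)} = \frac{\int_0^{\infty} \bigl[(r[x_1 + w_1])^{\alpha\delta} + (r x_2)^{\alpha\delta} + \cdots + (r x_d)^{\alpha\delta} - (d-1)\bigr]^{-1/\delta}\,dw_1}{\bigl[(r x_1)^{\alpha\delta} + \cdots + (r x_d)^{\alpha\delta} - (d-1)\bigr]^{-1/\delta}}. \]

• First order approximation of the last summand in (3.4):

\[ \frac{\int_0^{\infty} b^*\bigl((x_1 + w_1)^{-\alpha}, x_2^{-\alpha}, \dots, x_d^{-\alpha}; \delta\bigr)\,dw_1}{b^*\bigl(x_1^{-\alpha}, \dots, x_d^{-\alpha}; \delta\bigr)} = \frac{\int_0^{\infty} \bigl((x_1 + w_1)^{\alpha\delta} + x_2^{\alpha\delta} + \cdots + x_d^{\alpha\delta}\bigr)^{-1/\delta}\,dw_1}{\bigl(x_1^{\alpha\delta} + \cdots + x_d^{\alpha\delta}\bigr)^{-1/\delta}}. \]

This can be computed via numerical integration. Let the numerator and denominator of the above be denoted as $N_1 := N_1(x; \alpha, \delta)$ and $D_1 := D_1(x; \alpha, \delta)$.

• Second order approximation of the last summand in (3.4):

\[ \frac{r^{-\alpha} N_1 + r^{-\alpha(1+\delta)} \int_0^{\infty} b_2^*\bigl((x_1 + w_1)^{-\alpha}, x_2^{-\alpha}, \dots, x_d^{-\alpha}; \delta\bigr)\,dw_1}{r^{-\alpha} D_1 + r^{-\alpha(1+\delta)}\, b_2^*\bigl(x_1^{-\alpha}, \dots, x_d^{-\alpha}; \delta\bigr)} = \frac{N_1 + (d-1) r^{-\alpha\delta} \delta^{-1} \int_0^{\infty} \bigl((x_1 + w_1)^{\alpha\delta} + x_2^{\alpha\delta} + \cdots + x_d^{\alpha\delta}\bigr)^{-1/\delta-1}\,dw_1}{D_1 + (d-1) r^{-\alpha\delta} \delta^{-1} \bigl(x_1^{\alpha\delta} + \cdots + x_d^{\alpha\delta}\bigr)^{-1/\delta-1}}. \]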

Table 1 has some (representative) results to show how the approximations compare; we take $r = (1-p)^{-1/\alpha}$, d = 2, x1 = x2 = 1, p = 0.999, α = 2 and 5, and δ ∈ [0.1, 1.9]. The table shows that the first order approximation is noticeably worse only when the dependence is weak and the exponent ζ of the second order term is much less than 1; in these cases, the second order term of the expansion is useful.


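(b) Mixture model with MTCJ and independence copulas. Now, the second order term is between O(u) and O(u²), depending on the amount of dependence in the copula. Let

\[ C(u; \delta, \beta) = (1-\beta) \prod_{j=1}^d u_j + \beta \bigl[u_1^{-\delta} + \cdots + u_d^{-\delta} - (d-1)\bigr]^{-1/\delta}, \quad \delta > 0,\ 0 < \beta < 1, \]

so that dependence increases as δ and β increase. Let $W := w_1^{-\delta} + \cdots + w_d^{-\delta}$. Then

\[ C(uw; \delta, \beta) \approx (1-\beta) u^d \prod_{j=1}^d w_j + \beta u W^{-1/\delta}\bigl[1 + (d-1)\delta^{-1}u^{\delta}/W\bigr] = u\, b^*(w; \delta, \beta) + u^{1+\zeta}\, b_2^*(w; \delta, \beta), \]

where

\[ b^*(w; \delta, \beta) = \beta W^{-1/\delta} = \beta (w_1^{-\delta} + \cdots + w_d^{-\delta})^{-1/\delta}, \]

\[
b_2^*(w; \delta, \beta) =
\begin{cases}
(d-1)\beta\delta^{-1}(w_1^{-\delta} + \cdots + w_d^{-\delta})^{-1/\delta-1} & \text{if } \delta < d-1, \\
(1-\beta)\prod_{j=1}^d w_j + (d-1)\beta\delta^{-1}(w_1^{-\delta} + \cdots + w_d^{-\delta})^{-1/\delta-1} & \text{if } \delta = d-1, \\
(1-\beta)\prod_{j=1}^d w_j & \text{if } \delta > d-1,
\end{cases}
\]

and ζ = δ if δ < d − 1 and ζ = d − 1 if δ ≥ d − 1. The second order term is not far from the first order term if δ is near 0 (i.e., weak dependence). Similar to part (a), we list the exact TCE and the first/second order approximations for the last summand in (3.4).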

• Exact (assuming α > 1 as before): with $P_x = \prod_{j=1}^d x_j^{-\alpha}$,

\[ \frac{\beta \int_0^{\infty} \bigl\{(r[x_1 + w_1])^{\alpha\delta} + (r x_2)^{\alpha\delta} + \cdots + (r x_d)^{\alpha\delta} - (d-1)\bigr\}^{-1/\delta}\,dw_1 + (1-\beta)\, r^{-d\alpha} P_x\, x_1/(\alpha-1)}{\beta \bigl\{(r x_1)^{\alpha\delta} + \cdots + (r x_d)^{\alpha\delta} - (d-1)\bigr\}^{-1/\delta} + (1-\beta)\, r^{-d\alpha} P_x}, \]

since $\int_0^{\infty} (x_1 + w)^{-\alpha}\,dw = x_1^{-\alpha+1}/(\alpha-1)$.

• First order approximation: this is the same as in part (a) because β cancels from the numerator and denominator.

• Second order approximation: this is the same as in part (a) for δ < d − 1. For δ ≥ d − 1, one gets

\[ \frac{\int_0^{\infty} b^*\bigl((x_1 + w_1)^{-\alpha}, x_2^{-\alpha}, \dots, x_d^{-\alpha}; \delta, \beta\bigr)\,dw_1 + r^{-\alpha(d-1)} \int_0^{\infty} b_2^*\bigl((x_1 + w_1)^{-\alpha}, x_2^{-\alpha}, \dots, x_d^{-\alpha}; \delta, \beta\bigr)\,dw_1}{b^*\bigl(x_1^{-\alpha}, \dots, x_d^{-\alpha}; \delta, \beta\bigr) + r^{-\alpha(d-1)}\, b_2^*\bigl(x_1^{-\alpha}, \dots, x_d^{-\alpha}; \delta, \beta\bigr)}. \]

Table 2 has some (representative) results to show how the approximations compare; we take $r = (1-p)^{-1/\alpha}$, d = 2, x1 = x2 = 1, p = 0.999, β = 0.25, α = 2 and 5, δ ∈ [0.1, 1.9]. The conclusions are similar to Table 1, except that the first and second order approximations are slightly off in the last decimal place shown, even for δ > 1. The accuracy is of order $O(u^d) = O(u^2)$ for δ > 1 rather than the order $O(u^{1+\delta})$ in part (a).
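The exact value and the two approximations of the last summand in (3.4) can be evaluated with a few lines of numerical integration. The Python sketch below is one possible way to do so and is not the authors' code: it uses a plain midpoint rule after the substitution w = (1 − t)/t, and the names `integrate_0_inf` and `last_summand` are hypothetical. The parameter `beta` enters only the exact value (beta = 1 gives the pure MTCJ model of part (a)); the second order value returned is the part (a) formula, which applies to the mixture only when δ < d − 1.

```python
import math


def integrate_0_inf(f, n=20000):
    """Midpoint rule for the integral of f over [0, inf), using w = (1 - t) / t."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += f((1.0 - t) / t) / (t * t)
    return total * h


def last_summand(alpha, delta, p, x=(1.0, 1.0), beta=1.0):
    """Return (exact, first order, second order) values of the last summand in (3.4)."""
    d = len(x)
    r = (1.0 - p) ** (-1.0 / alpha)
    ad = alpha * delta
    rest = sum(xj ** ad for xj in x[1:])       # x_2^(alpha*delta) + ... + x_d^(alpha*delta)
    c2 = (d - 1) * r ** (-ad) / delta          # weight of the second order term

    def base(w):
        return (x[0] + w) ** ad + rest

    # Exact: MTCJ part weighted by beta plus the independence part in closed form.
    mtcj_num = integrate_0_inf(lambda w: (r ** ad * base(w) - (d - 1)) ** (-1.0 / delta))
    mtcj_den = (r ** ad * base(0.0) - (d - 1)) ** (-1.0 / delta)
    indep_den = (1.0 - beta) * r ** (-d * alpha) * math.prod(xj ** (-alpha) for xj in x)
    indep_num = indep_den * x[0] / (alpha - 1.0)
    exact = (beta * mtcj_num + indep_num) / (beta * mtcj_den + indep_den)

    # First order: N1 / D1 (beta cancels).
    n1 = integrate_0_inf(lambda w: base(w) ** (-1.0 / delta))
    d1 = base(0.0) ** (-1.0 / delta)

    # Second order, part (a) formula.
    n2 = integrate_0_inf(lambda w: base(w) ** (-1.0 / delta - 1.0))
    d2 = base(0.0) ** (-1.0 / delta - 1.0)
    return exact, n1 / d1, (n1 + c2 * n2) / (d1 + c2 * d2)


if __name__ == "__main__":
    print(last_summand(2.0, 0.5, 0.999))             # MTCJ, part (a)
    print(last_summand(2.0, 0.5, 0.999, beta=0.25))  # mixture, part (b)
```

For α = 2, δ = 0.5 and p = 0.999 the integrals are available in closed form, and the three values for the MTCJ case should come out close to 1.968, 2.000 and 1.969, with the mixture exact value close to 1.957, in line with the corresponding rows of Tables 1 and 2.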


Table 1: Values of exact TCE minus x1, together with first/second order approximations for the bivariate MTCJ copula with Pareto survival margins; r = (1 − p)^(−1/α), x1 = x2 = 1, p = 0.999.

              α = 2                      α = 5
  δ     exact   appr1   appr2     exact    appr1    appr2
 0.1    2.114   4.063   3.349     0.3955   0.5556   0.5079
 0.3    2.257   2.464   2.290     0.4382   0.4639   0.4428
 0.5    1.968   2.000   1.969     0.4133   0.4180   0.4134
 0.7    1.761   1.766   1.761     0.3883   0.3892   0.3883
 0.9    1.622   1.624   1.622     0.3690   0.3692   0.3690
 1.1    1.526   1.526   1.526     0.3543   0.3543   0.3543
 1.3    1.456   1.456   1.456     0.3429   0.3429   0.3429
 1.5    1.402   1.402   1.402     0.3338   0.3338   0.3338
 1.7    1.360   1.360   1.360     0.3263   0.3263   0.3263
 1.9    1.326   1.326   1.326     0.3200   0.3200   0.3200

Table 2: Values of exact TCE minus x1, together with first/second order approximations for the bivariate mixture of independence and MTCJ copulas, with Pareto survival margins; r = (1 − p)^(−1/α), x1 = x2 = 1, p = 0.999, β = 0.25.

              α = 2                      α = 5
  δ     exact   appr1   appr2     exact    appr1    appr2
 0.1    1.951   4.063   3.349     0.3742   0.5556   0.5079
 0.3    2.227   2.464   2.290     0.4338   0.4639   0.4428
 0.5    1.957   2.000   1.969     0.4114   0.4180   0.4134
 0.7    1.755   1.766   1.761     0.3872   0.3892   0.3883
 0.9    1.622   1.624   1.622     0.3683   0.3692   0.3690
 1.1    1.523   1.526   1.526     0.3538   0.3544   0.3542
 1.3    1.453   1.456   1.455     0.3424   0.3429   0.3428
 1.5    1.400   1.402   1.402     0.3334   0.3338   0.3337
 1.7    1.358   1.360   1.360     0.3259   0.3263   0.3262
 1.9    1.324   1.326   1.325     0.3197   0.3200   0.3199


Table 3: Bounds for parts (2) and (3) of Theorem 2.4 for the MTCJ copula, with Pareto survival margins; p = 0.999, and (1 − p)^(−1/α) α/(α − 1) = 63.25 and 4.98 provides an intermediate value for α = 2 and 5 respectively.

                  α = 2                              α = 5
  δ     LB2     UB2     LB3     UB3        LB2     UB2     LB3     UB3
 0.2    21.46   2908.   11.53   31340.     2.954   211.4   2.133   2208.
 0.5    47.21   375.8   41.61   1175.      4.270   30.05   3.967   105.2
 0.8    55.07   216.3   51.97   488.9      4.613   17.33   4.456   43.01
 1.0    57.48   177.0   55.23   353.8      4.718   14.16   4.605   30.76
 1.5    60.29   132.4   59.10   220.2      4.841   10.52   4.782   18.74
 2.0    61.45   112.9   60.72   169.2      4.893   8.944   4.857   14.21
 3.0    62.38   94.96   62.02   126.8      4.935   7.500   4.918   10.47
 4.0    62.74   86.54   62.53   108.4      4.952   6.826   4.942   8.872
 5.0    62.91   81.66   62.77   98.22      4.960   6.435   4.953   7.988
 8.0    63.11   74.54   63.05   84.07      4.970   5.869   4.967   6.764

Example 3.3. We show the quality of the approximations in parts (2) and (3) of Theorem 2.4 for (3.3) with survival copula (3.2). Since $b^*(w) = (w_1^{-\delta} + \cdots + w_d^{-\delta})^{-1/\delta}$, the margins are given by $b^*_S(w_j : j \in S) = (\sum_{j \in S} w_j^{-\delta})^{-1/\delta}$, and these can be used to compute $s_j(b^*, \alpha)$ and $S_j(b^*, \alpha)$ via numerical integration. The exponent function a∗ is given in (2.6). If (X1, . . . , Xd) has the distribution in (3.3), the distribution of $X_{\max} = \max\{X_1, \dots, X_d\}$ is

\[ F_{X_{\max}}(x) = F(x, \dots, x) = 1 + \sum_{j=1}^d (-1)^j \binom{d}{j} (j x^{\alpha\delta} - j + 1)^{-1/\delta}, \quad x > 0. \]

Based on this distribution, expressions of the form $\mathrm{VaR}_{g(p)}(\|X\|_{\max})$ can be computed numerically. Because of exchangeability, parts (2) and (3) have the form

\[ UB_d\,[\mathbf{1}_d, \infty] \subseteq \mathrm{TCE}_p(X) \subseteq LB_d\,[\mathbf{1}_d, \infty]. \]

Table 3 lists the values of $LB_d$ and $UB_d$ for d = 2, 3 with α = 2 and 5. As might be expected, the ratio $UB_d/LB_d$ decreases as δ and α increase, and increases as d increases.
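The bivariate bounds can be reproduced with elementary numerics. The sketch below is a hedged illustration restricted to d = 2, not the authors' implementation: it takes the closed forms of S_j and s_j for b∗(w) = (w_1^(−δ) + w_2^(−δ))^(−1/δ), evaluates the integrals with a midpoint rule, and solves for the VaR of the maximum by bisection on its survival function. All function names are hypothetical.

```python
def midpoint(f, n=20000):
    """Midpoint rule for the integral of f over (0, 1)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def var_max(q, alpha, delta):
    """Solve Pr{max(X1, X2) > x} = q by bisection for the bivariate case of (3.3)."""
    def surv(x):
        return 2.0 * x ** (-alpha) - (2.0 * x ** (alpha * delta) - 1.0) ** (-1.0 / delta)

    # surv(1) = 1 >= q, and surv(x) <= 2 x^(-alpha), so the root lies in [lo, hi].
    lo, hi = 1.0, (2.0 / q) ** (1.0 / alpha)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if surv(mid) > q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bivariate_bounds(alpha, delta, p):
    """Return (LB_2, UB_2) from parts (2) and (3) of Theorem 2.4, for d = 2 only."""
    beta = 1.0 / delta
    k = alpha * delta
    b1 = 2.0 ** (-beta)          # b*(1, 1)
    a1 = 2.0 - b1                # a*(1, 1) by inclusion-exclusion
    # Integral over (1, inf) of (w^k + 1)^(-beta), after substituting w = 1 / t.
    tail = midpoint(lambda t: t ** (alpha - 2.0) * (1.0 + t ** k) ** (-beta))
    # Integral over (0, 1) of (w^k + 1)^(-beta).
    head = midpoint(lambda w: (w ** k + 1.0) ** (-beta))
    S = 1.0 + tail / b1
    s = (alpha / (alpha - 1.0) - (b1 - head)) / b1
    lb = var_max((1.0 - p) * a1 / b1, alpha, delta) * S
    ub = var_max(1.0 - p, alpha, delta) * s
    return lb, ub


if __name__ == "__main__":
    print(bivariate_bounds(2.0, 1.0, 0.999))
```

For α = 2, δ = 1 and p = 0.999 the two integrals both equal π/4, and the sketch should return values close to the Table 3 entries LB2 = 57.48 and UB2 = 177.0.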

Example 3.4. We consider general Archimedean copulas which satisfy a regular variation condition. Consider a loss vector $(X_1,\dots,X_d)$ that has regularly varying margins with heavy-tail index α > 1, and the Archimedean survival copula $C(u;\varphi) = \varphi\big(\sum_{i=1}^{d} \varphi^{-1}(u_i)\big)$, where the Laplace transform φ is regularly varying at ∞ in the sense of (1.2) with tail index β > 0. It follows from Proposition 2.8 of [18] that $b^*(w_1,\dots,w_d) = \big(w_1^{-1/\beta} + \cdots + w_d^{-1/\beta}\big)^{-\beta}$. Observe that $(X_1,\dots,X_d)$ is more tail dependent as β decreases. Thus, for 1 ≤ j ≤ d,

\[
S_j(b^*,\alpha) = 1 + d^{\beta}\int_1^\infty \big(w^{\alpha/\beta} + d - 1\big)^{-\beta}\,dw,
\]

\[
s_j(b^*,\alpha) = \frac{\alpha}{\alpha-1}\,d^{\beta} + d^{\beta}\sum_{\emptyset\neq S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\Big[(|S|+1)^{-\beta} - \int_0^1 \big(w^{\alpha/\beta} + |S|\big)^{-\beta}\,dw\Big].
\]

It follows from Theorem 2.4 that computable asymptotic bounds are given by

\[
(S_1(b^*,\alpha),\dots,S_d(b^*,\alpha)) + \mathbb{R}^d_+ \;\supseteq\; \lim_{p\to 1}\frac{\mathrm{TCE}_p(X)}{\mathrm{VaR}_p(\|X\|_{\max})} \;\supseteq\; (s_1(b^*,\alpha),\dots,s_d(b^*,\alpha)) + \mathbb{R}^d_+.
\]

Since

\[
\lim_{\beta\to 0}\int_1^\infty \big(w^{\alpha/\beta} + d - 1\big)^{-\beta}\,dw = \int_1^\infty w^{-\alpha}\,dw = \frac{1}{\alpha-1}, \quad\text{and}\quad \lim_{\beta\to 0}\int_0^1 \big(w^{\alpha/\beta} + |S|\big)^{-\beta}\,dw = 1,
\]

we obtain that for fixed α > 1, $\lim_{\beta\to 0} s_j(b^*,\alpha)/S_j(b^*,\alpha) = 1$ for 1 ≤ j ≤ d. That is, the asymptotic subset and superset bounds are approximately identical for small β.
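The two bounds of Example 3.4 involve only one-dimensional integrals, so they are easy to evaluate. The sketch below is one possible way to do so with a plain midpoint rule; the function names (`midpoint`, `S_bound`, `s_bound`) and the substitution $u = w^{-(\alpha-1)}$ used to map the infinite range onto (0, 1] are choices made here for illustration, not the authors' implementation. By exchangeability, the sum over subsets S is grouped by the size k = |S|, with $\binom{d-1}{k}$ subsets of each size.

```python
from math import comb, pi


def midpoint(f, n: int = 20000) -> float:
    """Composite midpoint rule for the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))


def S_bound(d: int, alpha: float, beta: float) -> float:
    """S_j = 1 + d^beta * int_1^inf (w^(alpha/beta) + d - 1)^(-beta) dw."""
    # With u = w^(-(alpha-1)) the tail integral becomes
    # (alpha-1)^(-1) * int_0^1 (1 + (d-1) u^(alpha/(beta(alpha-1))))^(-beta) du.
    e = alpha / (beta * (alpha - 1.0))
    tail = midpoint(lambda u: (1.0 + (d - 1) * u ** e) ** (-beta)) / (alpha - 1.0)
    return 1.0 + d ** beta * tail


def s_bound(d: int, alpha: float, beta: float) -> float:
    """s_j = d^beta * (alpha/(alpha-1) + sum over nonempty S of (-1)^|S| [...])."""
    total = alpha / (alpha - 1.0)
    for k in range(1, d):  # k = |S|
        integral = midpoint(lambda w, k=k: (w ** (alpha / beta) + k) ** (-beta))
        total += comb(d - 1, k) * (-1) ** k * ((k + 1) ** (-beta) - integral)
    return d ** beta * total


# Illustrative values: the ratio s_j / S_j approaches 1 as beta decreases
for beta in (1.0, 0.1, 0.01):
    print(beta, s_bound(3, 2.0, beta) / S_bound(3, 2.0, beta))
```

For d = 2, α = 2 and β = 1 both integrals have closed forms in terms of arctan, giving $S_j = 1 + \pi/2$ and $s_j = 3 + \pi/2$, which can serve as a check on the quadrature.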

Remark 3.5. With the point process approach applied to data in the tails, the tail dependence

function b∗ can be estimated, and then the results in the theorems can be used. An outline of the

steps is as follows.

1. For risk variable j, a heavy-tail index estimation method, such as the Hill estimator, can be applied to data values above a threshold to get an estimated univariate tail index $\alpha_j$. If any risk variable shows a thin tail (i.e., exponentially decaying), it can be removed from calculations of multivariate extremal risk. For the remaining risks with possibly different heavy-tail indexes, make appropriate power transforms and rescale the data so that exceedances above the threshold have a Pareto distribution with tail index α in the middle of the range of the $\alpha_j$'s (see page 310 of [31]).

2. Transform to Fréchet margins and use the point process likelihood approach for the joint tails

of the risk variables [10, 19, 16]. The exponent function a∗ corresponds to the intensity mea-

sure of the point process. For example, the simplest exchangeable Gumbel model discussed

in Example 3.2 (a), nested Gumbel models [15], scale mixture models [17, 25], and several

other parametric models of a∗ all have tractable forms so that the point process likelihood

can be easily optimized numerically; included are models with flexible dependence (labeled as

MM1, MM2, MM3 in [16]) which have a parameter denoting a minimal dependence level and

additional parameters for each pair that add onto the minimal dependence. An estimated b∗

can be obtained from a∗ using the inclusion-exclusion relation.


3. Combining α in step 1 and $b^*$ in step 2, Theorem 2.4 can be used to obtain bounds on the scaled risks, with one-dimensional numerical integration. With the rescaled risk variables $X_1,\dots,X_d$, let the thresholds $T_j$ satisfy $F_{X_j}(T_j) = q$ for all j, where q might be in the range [0.5, 0.8]. Let $F_{X_j}(x_j) = q + (1-q)\,[1 - (1 + x_j - T_j)^{-\alpha}]$ for $x_j > T_j$ with the estimated common α. With a parametric $a^*$, we use the copula $C(u_1,\dots,u_d) = \exp\{-a^*(-\log u_1,\dots,-\log u_d)\}$ in the tail region. For $x_1 > T_1,\dots,x_d > T_d$, the estimated tail distribution is:

\[
F_{X_1,\dots,X_d}(x_1,\dots,x_d) = C\big(F_{X_1}(x_1),\dots,F_{X_d}(x_d)\big). \tag{3.5}
\]

The tail conditional expectation $E(X_j \mid X \in r(x,\infty])$ can be evaluated using the estimated α and $b^*$, or using (3.5), through one-dimensional numerical integration, as in Example 3.2, provided $r x_j > T_j$ for all j.

4. For non-rectangular upper sets B such that $rB \subset \prod_{j=1}^{d}[T_j,\infty)$, $E(X_j \mid X \in rB)$ can be evaluated by simulation of the tail of (3.5). This is better than fitting a distribution to the entire data set for simulation, because it avoids extrapolation from a fit that is dominated by the middle of the data and reduces the simulation sample size.

5. Parts (2) and (3) of Theorem 2.4 are useful as a way to more quickly give insight on the effect

of the univariate tail index α and the amount of tail dependence (represented by b∗) on the

size of TCEp(X) for p near 1. The pattern in the limit should carry over to the non-limit in

the tail region.
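Step 1 of the outline above can be sketched as follows. This is a minimal illustration of the standard Hill estimator and of the power transform that brings a variable with tail index $\alpha_j$ to a common index α; the function names `hill_tail_index` and `to_common_index`, the choice k = 500, and the synthetic "sample" made of exact Pareto quantiles are all assumptions made here for demonstration, not part of the authors' procedure.

```python
from math import log


def hill_tail_index(data, k: int) -> float:
    """Hill estimate of the tail index alpha based on the k largest observations."""
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    xs = sorted(data, reverse=True)      # X_(1) >= X_(2) >= ...
    threshold = xs[k]                    # X_(k+1), used as the threshold
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    gamma = sum(log(x / threshold) for x in xs[:k]) / k   # estimate of 1/alpha
    return 1.0 / gamma


def to_common_index(x: float, alpha_j: float, alpha: float) -> float:
    """Power transform: if X has tail index alpha_j, X^(alpha_j/alpha) has tail index alpha."""
    return x ** (alpha_j / alpha)


# Deterministic illustration: exact quantiles of a Pareto law with survival x^(-2), x >= 1
n, alpha_true = 10000, 2.0
sample = [(1.0 - (i + 0.5) / n) ** (-1.0 / alpha_true) for i in range(n)]
alpha_hat = hill_tail_index(sample, k=500)
print(alpha_hat)
```

In practice the estimate depends on the number k of upper order statistics, so it is usually inspected over a range of k before a common α is fixed.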

4 Concluding Remarks

Our results illustrate how tail risk is quantitatively affected by extremal dependence and also show

how the tool of tail dependence functions can be used to estimate such an asymptotic relation.

Similar to the univariate case (1.4), the multivariate tail conditional expectation TCEp(X) as

p → 1 is essentially linearly related to the value-at-risk of an aggregated norm of X. In contrast

to the univariate case where the asymptotic proportionality constant is related to the heavy-tail

index α, the asymptotic proportionality constants in the multivariate case depend not only on the

heavy-tail index α but also on the tail dependence structure.

As illustrated in the paper, the lower and upper bounds for multivariate TCEs become approx-

imately equal for highly tail dependent distributions, and thus our method is especially effective

for analyzing extremal risks for loss variables with significant tail dependence. For example, non-

overlapping aggregations of large numbers of loss variables in high-dimensional portfolios can have

strong tail dependence even though loss variables themselves only demonstrate weak tail depen-

dence; see [23]. When the lower and upper bounds are far apart, reducing the class of relevant

upper sets is suggested.

The quality of the bounds presented in Theorem 2.4 might be poor for distributions with weaker tail dependence. In this situation, one may aggregate loss variables with weak tail dependence, which also corresponds to choosing some reduced class of specific upper sets B in Theorem 2.2, so that better bounds can be obtained. One can also use higher order expansions such as (3.1) to reveal the dependence structure at sub-extreme levels so that more accurate, tractable bounds can be developed. Our numerical examples via the second order expansion show some significant improvements in the presence of weak tail dependence, but more theoretical studies are needed in this area.

Acknowledgments. The authors would like to thank two referees, an Editor and the Editor-in-Chief for their comments, which led to an improvement of the presentation of this paper.

5 Appendix: Proofs

5.1 Proof of Theorem 2.2

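Proof. To estimate $E(X \mid X \in rB)$ for any upper set B bounded away from 0, consider

\[
E(X_j \mid X \in rB) = \int_0^\infty \Pr\{X_j > x \mid X \in rB\}\,dx = r\int_0^\infty \frac{\Pr\{X_j > rw,\ X \in rB\}}{\Pr\{X \in rB\}}\,dw \tag{5.1}
\]

for any 1 ≤ j ≤ d. We first argue that we can pass the limit through the integral in (5.1). Since

\[
\Pr\{X_j > rw,\ X \in rB\} \le \Pr\{X_j > rw\}, \tag{5.2}
\]

it follows from the Karamata theorem (1.6) that for any fixed c > 0,

\[
\lim_{r\to\infty}\int_c^\infty \frac{\Pr\{X_j > rw\}}{\Pr\{X \in rB\}}\,dw
= \lim_{r\to\infty}\int_{rc}^\infty \frac{\Pr\{X_j > x\}}{r\,\Pr\{X \in rB\}}\,dx
= \frac{c}{\alpha-1}\,\lim_{r\to\infty}\frac{\Pr\{X_j > rc\}}{\Pr\{X \in rB\}}.
\]

Let $A_j(w) := \{(x_1,\dots,x_d) \in \mathbb{R}^d : x_j > w\}$; then via (2.2) we have

\[
\lim_{r\to\infty}\int_c^\infty \frac{\Pr\{X_j > rw\}}{\Pr\{X \in rB\}}\,dw
= \frac{c}{\alpha-1}\,\frac{\mu(A_j(c))}{\mu(B)}
= \int_c^\infty \frac{\mu(A_j(w))}{\mu(B)}\,dw, \tag{5.3}
\]

where the last equality follows from direct calculation via (2.3). Because of (5.2), (5.3) and the generalized dominated convergence theorem, we have from (2.2) that for any c > 0,

\[
\lim_{r\to\infty}\int_c^\infty \frac{\Pr\{X_j > rw,\ X \in rB\}}{\Pr\{X \in rB\}}\,dw
= \int_c^\infty \lim_{r\to\infty}\frac{\Pr\{X_j > rw,\ X \in rB\}}{\Pr\{X \in rB\}}\,dw
= \int_c^\infty \frac{\mu(A_j(w)\cap B)}{\mu(B)}\,dw,
\]

which implies that for any small ε > 0, there exists $r_\varepsilon$ such that for all $r \ge r_\varepsilon$,

\[
\begin{aligned}
&\Big|\int_0^\infty \frac{\Pr\{X_j > rw,\ X \in rB\}}{\Pr\{X \in rB\}}\,dw - \int_0^\infty \frac{\mu(A_j(w)\cap B)}{\mu(B)}\,dw\Big| \\
&\quad\le \int_0^{\varepsilon/3} \frac{\Pr\{X_j > rw,\ X \in rB\}}{\Pr\{X \in rB\}}\,dw
+ \Big|\int_{\varepsilon/3}^\infty \frac{\Pr\{X_j > rw,\ X \in rB\}}{\Pr\{X \in rB\}}\,dw - \int_{\varepsilon/3}^\infty \frac{\mu(A_j(w)\cap B)}{\mu(B)}\,dw\Big|
+ \int_0^{\varepsilon/3} \frac{\mu(A_j(w)\cap B)}{\mu(B)}\,dw \\
&\quad\le \int_0^{\varepsilon/3} \frac{\Pr\{X_j > rw,\ X \in rB\}}{\Pr\{X \in rB\}}\,dw + \frac{\varepsilon}{3} + \int_0^{\varepsilon/3} \frac{\mu(A_j(w)\cap B)}{\mu(B)}\,dw
\le \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,
\end{aligned}
\]

where the last inequality follows from the facts that $\Pr\{X_j > rw,\ X \in rB\} \le \Pr\{X \in rB\}$ and $\mu(A_j(w)\cap B) \le \mu(B)$. Therefore, we have from (5.1) that

\[
\lim_{r\to\infty}\frac{1}{r}\,E(X_j \mid X \in rB)
= \lim_{r\to\infty}\int_0^\infty \frac{\Pr\{X_j > rw,\ X \in rB\}}{\Pr\{X \in rB\}}\,dw
= \int_0^\infty \frac{\mu(A_j(w)\cap B)}{\mu(B)}\,dw. \tag{5.4}
\]

This concludes the proof of statement (1).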
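The direct calculation behind the last equality in (5.3) can be written out in one line. The display below assumes that (2.3) is the scaling property $\mu(tB) = t^{-\alpha}\mu(B)$ for t > 0; this reading of (2.3) is an assumption made here, since the equation itself is stated earlier in the paper. Because $A_j(w) = w\,A_j(1)$ for w > 0, and α > 1,

\[
\int_c^\infty \mu(A_j(w))\,dw
= \mu(A_j(1))\int_c^\infty w^{-\alpha}\,dw
= \mu(A_j(1))\,\frac{c^{1-\alpha}}{\alpha-1}
= \frac{c}{\alpha-1}\,\mu(A_j(c)).
\]

Dividing both sides by $\mu(B)$ gives the form used in (5.3).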

For statement (2), we simplify (2.1) asymptotically. For any upper set $A \in Q_p(X)$, there exists an upper set B with $B \cap \mathbb{S}^{d-1}_+ \neq \emptyset$ and a positive number $r_B$ such that $A = r_B B$. Since $\Pr\{X \in rB\}$ is decreasing in r, we can find $r_{B,p} \ge r_B$ for any $A = r_B B$ such that $\Pr\{X \in A\} \ge \Pr\{X \in r_{B,p}B\} = 1-p$, as p → 1. It follows from (5.4) that $E(X_j \mid X \in r_{B,p}B)$ is asymptotically increasing for sufficiently small 1 − p and goes to +∞ as p → 1, and thus we have $E(X \mid X \in A) \le E(X \mid X \in r_{B,p}B)$ for sufficiently small 1 − p. Since $E(X \mid X \in A) + K \supseteq E(X \mid X \in r_{B,p}B) + K$ for sufficiently small 1 − p, and $r_{B,p}B \in Q_p(X)$, we have

\[
\lim_{p\to 1}\Big[\Big(\bigcap_{B\in Q}\big(E(X \mid X \in r_{B,p}B) + K\big)\Big)\setminus \mathrm{TCE}_p(X)\Big]
= \lim_{p\to 1}\Big[\Big(\bigcap_{A\in Q_p(X)}\big(E(X \mid X \in A) + K\big)\Big)\setminus \mathrm{TCE}_p(X)\Big]
= \emptyset,
\]

where $Q := \{B \subseteq \mathbb{R}^d : B + K = B,\ B \cap \mathbb{S}^{d-1}_+ \neq \emptyset,\ B \text{ is bounded away from } 0 \text{ and } \Pr\{X \in r_{B,p}B\} = 1-p\}$. That is, (2.1) can be rewritten as follows, for sufficiently small 1 − p,

\[
\mathrm{TCE}_p(X) \approx \bigcap_{B\in Q}\big(E(X \mid X \in r_{B,p}B) + K\big). \tag{5.5}
\]

For any B ∈ Q, there exists a real number $r_B$ with $r_B \ge 1$ such that $r_B B \in Q_{\|\cdot\|} = \{B \subseteq \mathbb{R}^d : B + K = B,\ B \cap \mathbb{S}^{d-1}_+ \neq \emptyset,\ B \subseteq (\mathbb{B}^d)^c\}$. That is, for any B ∈ Q with $\Pr\{X \in r_{B,p}B\} = 1-p$, we can find a $B' \in Q_{\|\cdot\|}$ and a real number $r_{B',p}$ (e.g., $r_{B',p} = r_{B,p}/r_B$) such that $r_{B,p}B = r_{B',p}B'$. Thus (5.5) can be rewritten further as

\[
\mathrm{TCE}_p(X) \approx \bigcap_{B\in Q_{\|\cdot\|},\ \Pr\{X \in r_{B,p}B\} = 1-p}\big(E(X \mid X \in r_{B,p}B) + K\big), \tag{5.6}
\]

for sufficiently small 1 − p. Observe that as p → 1, $r_{B,p} \to \infty$, and thus it follows from (2.2) that for sufficiently small 1 − p, $\mu(B)\Pr\{\|X\| > r_{B,p}\} \approx 1-p$, implying that $r_{B,p} \approx \mathrm{VaR}_{1-(1-p)/\mu(B)}(\|X\|)$ as p → 1. Therefore, (5.4) and (5.6) imply that

\[
\mathrm{TCE}_p(X) \approx \bigcap_{B\in Q_{\|\cdot\|}} \mathrm{VaR}_{1-(1-p)/\mu(B)}(\|X\|)\,\big((u_1(B;\mu),\dots,u_d(B;\mu)) + K\big)
\]

as p → 1, where $u_j(B;\mu) = \int_0^\infty \frac{\mu(A_j(w)\cap B)}{\mu(B)}\,dw$, 1 ≤ j ≤ d.


5.2 Proof of Theorem 2.4

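Proof. Since margins $F_1,\dots,F_d$ of F are tail equivalent [31], we have that $\bar F_j(x) = L_j(x)/x^{\alpha}$, 1 ≤ j ≤ d, where $L_i(x)/L_j(x) \to 1$ as x → ∞.

(1) Without loss of generality, let j = 1. A straightforward calculation shows

\[
\begin{aligned}
E(X_1 \mid X > rx)
&= \int_0^\infty \frac{\Pr\{X_1 > x,\ X_1 > rx_1,\dots,X_d > rx_d\}}{\Pr\{X_1 > rx_1,\dots,X_d > rx_d\}}\,dx \\
&= rx_1 + \int_{rx_1}^\infty \frac{\Pr\{X_1 > x,\ X_2 > rx_2,\dots,X_d > rx_d\}}{\Pr\{X_1 > rx_1,\dots,X_d > rx_d\}}\,dx \\
&= r\Big(x_1 + \int_{x_1}^\infty \frac{\Pr\{X_1 > rw,\ X_2 > rx_2,\dots,X_d > rx_d\}}{\Pr\{X_1 > rx_1,\dots,X_d > rx_d\}}\,dw\Big) \\
&= r\Big(x_1 + \int_{x_1}^\infty \frac{\Pr\{U_1 > F_1(rw),\ U_2 > F_2(rx_2),\dots,U_d > F_d(rx_d)\}}{\Pr\{U_1 > F_1(rx_1),\dots,U_d > F_d(rx_d)\}}\,dw\Big).
\end{aligned}
\]

Applying the Karamata theorem and the generalized dominated convergence theorem, we are allowed to pass the limit through the integral. Since $L_j$, 1 ≤ j ≤ d, are slowly varying and the margins are tail equivalent, we have

\[
\begin{aligned}
\lim_{r\to\infty}\frac{1}{r}\,E(X_1 \mid X > rx)
&= x_1 + \lim_{r\to\infty}\int_{x_1}^\infty \frac{\Pr\{U_1 > 1 - L_1(rw)/(rw)^{\alpha},\dots,U_d > 1 - L_d(rx_d)/(rx_d)^{\alpha}\}}{\Pr\{U_1 > 1 - L_1(rx_1)/(rx_1)^{\alpha},\dots,U_d > 1 - L_d(rx_d)/(rx_d)^{\alpha}\}}\,dw \\
&= x_1 + \int_{x_1}^\infty \lim_{r\to\infty}\frac{\Pr\{U_1 > 1 - w^{-\alpha}L_1(r)r^{-\alpha},\ U_2 > 1 - x_2^{-\alpha}L_1(r)r^{-\alpha},\dots,U_d > 1 - x_d^{-\alpha}L_1(r)r^{-\alpha}\}}{\Pr\{U_1 > 1 - x_1^{-\alpha}L_1(r)r^{-\alpha},\dots,U_d > 1 - x_d^{-\alpha}L_1(r)r^{-\alpha}\}}\,dw \\
&= x_1 + \int_{x_1}^\infty \lim_{u\to 0}\frac{\Pr\{U_1 > 1 - w^{-\alpha}u,\ U_2 > 1 - x_2^{-\alpha}u,\dots,U_d > 1 - x_d^{-\alpha}u\}}{\Pr\{U_1 > 1 - x_1^{-\alpha}u,\dots,U_d > 1 - x_d^{-\alpha}u\}}\,dw \\
&= x_1 + \int_{x_1}^\infty \frac{b^*(w^{-\alpha},x_2^{-\alpha},\dots,x_d^{-\alpha})}{b^*(x_1^{-\alpha},x_2^{-\alpha},\dots,x_d^{-\alpha})}\,dw
= \int_0^\infty \frac{b^*((w_1\vee x_1)^{-\alpha},x_2^{-\alpha},\dots,x_d^{-\alpha})}{b^*(x_1^{-\alpha},\dots,x_d^{-\alpha})}\,dw_1.
\end{aligned}
\]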

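(2) It follows from (5.6) that as p → 1,

\[
\mathrm{TCE}_p(X) \subseteq \bigcap_{x\in\mathbb{S}^{d-1}_+}\big(E(X \mid X \in r_{x,p}(x,\infty]) + \mathbb{R}^d_+\big),
\]

where $r_{x,p}$ satisfies $\Pr\{X \in r_{x,p}(x,\infty]\} = 1-p$. Since $b^*(\mathbf{1}) > 0$, it follows from Theorem 2.4 of [25] that $\mu((\mathbf{1},\infty]) > 0$. Since $\|X\|_{\max}$ is regularly varying at ∞, for sufficiently small 1 − p there exists $r_{\mathbf{1},p}$ such that $\mu((\mathbf{1},\infty])\Pr\{\|X\|_{\max} > r_{\mathbf{1},p}\} = 1-p$, which implies that $r_{\mathbf{1},p} \approx \mathrm{VaR}_{1-(1-p)/\mu((\mathbf{1},\infty])}(\|X\|_{\max})$ as p → 1. Observe that as p → 1, $r_{\mathbf{1},p} \to \infty$, and thus it follows from (2.2) that for sufficiently small 1 − p,

\[
\Pr\{X \in r_{\mathbf{1},p}(\mathbf{1},\infty]\} \approx \mu((\mathbf{1},\infty])\,\Pr\{\|X\|_{\max} > r_{\mathbf{1},p}\} = 1-p.
\]

Therefore, as p → 1,

\[
\mathrm{TCE}_p(X) \subseteq \bigcap_{x\in\mathbb{S}^{d-1}_+}\big(E(X \mid X \in r_{x,p}(x,\infty]) + \mathbb{R}^d_+\big) \subseteq E(X \mid X \in r_{\mathbf{1},p}(\mathbf{1},\infty]) + \mathbb{R}^d_+. \tag{5.7}
\]

Since $\|X\|_{\max} > r_{\mathbf{1},p}$ if and only if $X \in r_{\mathbf{1},p}[0,\mathbf{1}]^c$, the constant k in (2.3) equals 1 and $\mu([0,\mathbf{1}]^c) = 1$. It then follows from (2.7) that $\mu((\mathbf{1},\infty]) = b^*(1,\dots,1)/a^*(1,\dots,1)$, and thus from (1) that as p → 1,

\[
E(X \mid X \in r_{\mathbf{1},p}(\mathbf{1},\infty]) \approx \mathrm{VaR}_{1-(1-p)\,a^*(1,\dots,1)/b^*(1,\dots,1)}(\|X\|_{\max})\,\big(S_1(b^*,\alpha),\dots,S_d(b^*,\alpha)\big),
\]

where $S_j(b^*,\alpha) = \int_0^\infty \frac{b^*(1,\dots,1,(w_j\vee 1)^{-\alpha},1,\dots,1)}{b^*(1,\dots,1)}\,dw_j$, 1 ≤ j ≤ d. Plugging this into (5.7), we obtain (2).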

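(3) In light of (5.6), consider, for any $B \in Q_{\|\cdot\|_{\max}}$ with $\Pr\{X \in r_{B,p}B\} = 1-p$,

\[
E(X_j \mid X \in r_{B,p}B) = \frac{E\big(X_j\,I\{X \in r_{B,p}B\}\big)}{\Pr\{X \in r_{B,p}B\}}.
\]

Since $(1,\infty]^d \subseteq B \subseteq [0,\mathbf{1}]^c$ for any $B \in Q_{\|\cdot\|_{\max}}$, we have

\[
E(X_j \mid X \in r_{B,p}B) \le \frac{E\big(X_j\,I\{X \in r_{B,p}[0,\mathbf{1}]^c\}\big)}{\Pr\{X \in r_{B,p}(1,\infty]^d\}}
= \int_0^\infty \frac{\Pr\big(\{X_j > x\}\cap\{X \in r_{B,p}[0,\mathbf{1}]^c\}\big)}{\Pr\{X \in r_{B,p}(1,\infty]^d\}}\,dx. \tag{5.8}
\]

If $x > r_{B,p}$ then

\[
\Pr\big(\{X_j > x\}\cap\{X \in r_{B,p}[0,\mathbf{1}]^c\}\big) = \Pr\{X_j > x\}.
\]

If $x \le r_{B,p}$ then

\[
\begin{aligned}
\Pr\big(\{X_j > x\}\cap\{X \in r_{B,p}[0,\mathbf{1}]^c\}\big)
&= \Pr\Big(\{X_j > x\}\cap\Big(\bigcup_{i=1}^{d}\{X_i > r_{B,p}\}\Big)\Big) \\
&= \Pr\Big(\bigcup_{i=1}^{d}\big(\{X_j > x\}\cap\{X_i > r_{B,p}\}\big)\Big)
= \Pr\Big(\Big(\bigcup_{i\neq j}\{X_j > x,\ X_i > r_{B,p}\}\Big)\cup\{X_j > r_{B,p}\}\Big) \\
&= \sum_{S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\Pr\{X_j > r_{B,p},\ X_i > r_{B,p},\ i\in S\}
- \sum_{\emptyset\neq S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\Pr\{X_j > x,\ X_i > r_{B,p},\ i\in S\} \\
&= \Pr\{X_j > r_{B,p}\} + \sum_{\emptyset\neq S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\big(\Pr\{X_j > r_{B,p},\ X_i > r_{B,p},\ i\in S\} - \Pr\{X_j > x,\ X_i > r_{B,p},\ i\in S\}\big).
\end{aligned} \tag{5.9}
\]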
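The inclusion-exclusion identity (5.9) holds for any probability measure, including the empirical measure of a finite sample, so it can be checked exactly by counting. The sketch below does this on simulated points; the function names, the seed, and the use of independent Pareto draws are arbitrary choices for illustration (the identity itself does not depend on the dependence structure), with `r` playing the role of $r_{B,p}$ and `x <= r` as required.

```python
import random
from itertools import combinations


def lhs_count(points, j, x, r):
    """Number of points with X_j > x and max_i X_i > r (left-hand side of (5.9))."""
    return sum(1 for pt in points if pt[j] > x and max(pt) > r)


def rhs_count(points, j, x, r):
    """Right-hand side of (5.9) with probabilities replaced by sample counts."""
    d = len(points[0])
    others = [i for i in range(d) if i != j]

    def count(level, S):
        # points with X_j > level and X_i > r for every i in S
        return sum(1 for pt in points if pt[j] > level and all(pt[i] > r for i in S))

    total = count(r, ())                      # the term Pr{X_j > r}
    for size in range(1, len(others) + 1):    # nonempty subsets S of {i : i != j}
        for S in combinations(others, size):
            total += (-1) ** size * (count(r, S) - count(x, S))
    return total


rng = random.Random(0)
points = [tuple(rng.paretovariate(2.0) for _ in range(3)) for _ in range(2000)]
print(lhs_count(points, 0, 1.2, 2.0), rhs_count(points, 0, 1.2, 2.0))
```

The two counts agree for every component j and every level x not exceeding r, which mirrors the way (5.9) is used for the integral over $[0, r_{B,p}]$.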

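Since the margins are tail equivalent and the $L_j$ are slowly varying, we have, for any $0 \le w_j \le 1$ and any $\emptyset \neq S \subseteq \{i : i \neq j\}$,

\[
\begin{aligned}
\lim_{p\to 1}\frac{\Pr\{X_j > r_{B,p}w_j,\ X_i > r_{B,p},\ i\in S\}}{\Pr\{X \in r_{B,p}(1,\infty]^d\}}
&= \lim_{p\to 1}\frac{\Pr\{U_j > 1 - w_j^{-\alpha}r_{B,p}^{-\alpha}L_j(r_{B,p}w_j),\ U_i > 1 - r_{B,p}^{-\alpha}L_i(r_{B,p}),\ i\in S\}}{\Pr\{U_i > 1 - r_{B,p}^{-\alpha}L_i(r_{B,p}),\ 1\le i\le d\}} \\
&= \lim_{r_{B,p}\to\infty}\frac{\Pr\{U_j > 1 - w_j^{-\alpha}r_{B,p}^{-\alpha}L_1(r_{B,p}),\ U_i > 1 - r_{B,p}^{-\alpha}L_1(r_{B,p}),\ i\in S\}}{\Pr\{U_i > 1 - r_{B,p}^{-\alpha}L_1(r_{B,p}),\ 1\le i\le d\}} \\
&= \frac{b^*_{\{j\}\cup S}(w_j^{-\alpha},1,\dots,1;\,C_{\{j\}\cup S})}{b^*(1,\dots,1)},
\end{aligned}
\]

where $b^*_{\{j\}\cup S}(w_j^{-\alpha},1,\dots,1;\,C_{\{j\}\cup S})$ denotes the upper tail dependence function of the multivariate margin $C_{\{j\}\cup S}$ evaluated with the j-th argument being $w_j^{-\alpha}$ and the others being one. Similarly,

\[
\lim_{p\to 1}\frac{\Pr\{X_j > r_{B,p},\ X_i > r_{B,p},\ i\in S\}}{\Pr\{X \in r_{B,p}(1,\infty]^d\}} = \frac{b^*_{\{j\}\cup S}(1,\dots,1;\,C_{\{j\}\cup S})}{b^*(1,\dots,1)},
\qquad
\lim_{p\to 1}\frac{\Pr\{X_j > r_{B,p}\}}{\Pr\{X \in r_{B,p}(1,\infty]^d\}} = \frac{1}{b^*(1,\dots,1)}. \tag{5.10}
\]

Using the bounded convergence theorem, we then have, for sufficiently small 1 − p,

\[
\begin{aligned}
&\int_0^1 \sum_{\emptyset\neq S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\,\frac{\Pr\{X_j > r_{B,p},\ X_i > r_{B,p},\ i\in S\} - \Pr\{X_j > r_{B,p}w_j,\ X_i > r_{B,p},\ i\in S\}}{\Pr\{X \in r_{B,p}(1,\infty]^d\}}\,dw_j \\
&\qquad\approx \sum_{\emptyset\neq S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\,\frac{b^*_{\{j\}\cup S}(1,\dots,1;\,C_{\{j\}\cup S}) - \int_0^1 b^*_{\{j\}\cup S}(w_j^{-\alpha},1,\dots,1;\,C_{\{j\}\cup S})\,dw_j}{b^*(1,\dots,1)}.
\end{aligned} \tag{5.11}
\]

Plugging (5.10) and (5.11) into (5.9), we have, for sufficiently small 1 − p,

\[
\begin{aligned}
\int_0^{r_{B,p}} \frac{\Pr\big(\{X_j > x\}\cap\{X \in r_{B,p}[0,\mathbf{1}]^c\}\big)}{\Pr\{X \in r_{B,p}(1,\infty]^d\}}\,dx
&\approx \frac{r_{B,p}}{b^*(1,\dots,1)} \\
&\quad + r_{B,p}\sum_{\emptyset\neq S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\,\frac{b^*_{\{j\}\cup S}(1,\dots,1;\,C_{\{j\}\cup S}) - \int_0^1 b^*_{\{j\}\cup S}(w_j^{-\alpha},1,\dots,1;\,C_{\{j\}\cup S})\,dw_j}{b^*(1,\dots,1)}.
\end{aligned} \tag{5.12}
\]

On the other hand, using the Karamata theorem (1.6), we have, for sufficiently small 1 − p,

\[
\begin{aligned}
\int_{r_{B,p}}^\infty \frac{\Pr\big(\{X_j > x\}\cap\{X \in r_{B,p}[0,\mathbf{1}]^c\}\big)}{\Pr\{X \in r_{B,p}(1,\infty]^d\}}\,dx
&= \int_{r_{B,p}}^\infty \frac{\Pr\{X_j > x\}}{\Pr\{X \in r_{B,p}(1,\infty]^d\}}\,dx \\
&\approx r_{B,p}\,\frac{1}{\alpha-1}\,\frac{\Pr\{X_j > r_{B,p}\}}{\Pr\{X \in r_{B,p}(1,\infty]^d\}}
\approx \frac{r_{B,p}}{(\alpha-1)\,b^*(1,\dots,1)}.
\end{aligned} \tag{5.13}
\]

Combining (5.12) and (5.13) into (5.8), we have, for sufficiently small 1 − p,

\[
E(X_j \mid X \in r_{B,p}B) \le \frac{\alpha}{\alpha-1}\,\frac{r_{B,p}}{b^*(1,\dots,1)}
+ r_{B,p}\sum_{\emptyset\neq S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\,\frac{b^*_{\{j\}\cup S}(1,\dots,1;\,C_{\{j\}\cup S}) - \int_0^1 b^*_{\{j\}\cup S}(w_j^{-\alpha},1,\dots,1;\,C_{\{j\}\cup S})\,dw_j}{b^*(1,\dots,1)}.
\]

As p → 1, $r_{B,p} \approx \mathrm{VaR}_{1-(1-p)/\mu(B)}(\|X\|_{\max}) \le \mathrm{VaR}_{1-(1-p)/\mu([0,\mathbf{1}]^c)}(\|X\|_{\max}) = \mathrm{VaR}_p(\|X\|_{\max})$, due to the fact that $\mu([0,\mathbf{1}]^c) = 1$. Thus, for sufficiently small 1 − p,

\[
\frac{E(X_j \mid X \in r_{B,p}B)}{\mathrm{VaR}_p(\|X\|_{\max})} \le \frac{\alpha}{\alpha-1}\,\frac{1}{b^*(1,\dots,1)}
+ \sum_{\emptyset\neq S\subseteq\{i:\,i\neq j\}} (-1)^{|S|}\,\frac{b^*_{\{j\}\cup S}(1,\dots,1;\,C_{\{j\}\cup S}) - \int_0^1 b^*_{\{j\}\cup S}(w_j^{-\alpha},1,\dots,1;\,C_{\{j\}\cup S})\,dw_j}{b^*(1,\dots,1)}
= s_j(b^*,\alpha),
\]

for any $B \in Q_{\|\cdot\|_{\max}}$, where the equality follows from integration by parts. Therefore,

\[
\mathrm{TCE}_p(X) \supseteq \mathrm{VaR}_p(\|X\|_{\max})\,\big((s_1(b^*,\alpha),\dots,s_d(b^*,\alpha)) + \mathbb{R}^d_+\big),
\]

for sufficiently small 1 − p.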


References

[1] Albrecher, H., Asmussen, S. and Kortschak, D. (2006). Tail asymptotics for the sum of two

heavy-tailed dependent risks. Extremes, 9:107–130.

[2] Alink, S., Löwe, M. and Wüthrich, M. V. (2004). Diversification of aggregate dependent risks. Insurance: Math. Econom., 35:77–95.

[3] Alink, S., Löwe, M. and Wüthrich, M. V. (2005). Analysis of the expected shortfall of aggregate dependent risks. ASTIN Bulletin, 35(1):25–43.

[4] Alink, S., Löwe, M. and Wüthrich, M. V. (2007). Diversification for general copula dependence. Statistica Neerlandica, 61:446–465.

[5] Artzner, P., Delbaen, F., Eber, J.M. and Heath, D. (1999). Coherent measures of risks. Math-

ematical Finance 9:203–228.

[6] Bentahar, I. (2006). Tail conditional expectation for vector-valued risks. Discussion paper 2006-029, http://sfb649.wiwi.hu-berlin.de, Technische Universität Berlin, Germany.

[7] Bingham, N. H., Goldie, C. M. and Teugels, J. L. (1987). Regular Variation. Cambridge

University Press, Cambridge, UK.

[8] Cai, J. and Li, H. (2005). Conditional tail expectations for multivariate phase-type distribu-

tions. J. Appl. Prob. 42:810–825.

[9] Cheridito, P., Delbaen, F. and Klüppelberg, C. (2004). Coherent and convex monetary risk measures for bounded càdlàg processes. Stochastic Processes and their Applications, 112:1–22.

[10] Coles, S. G. and Tawn, J. A. (1991). Modelling extreme multivariate events. J. R. Statist. Soc.,

B, 53:377–392.

[11] Cook, R.D. and Johnson, M.E. (1981). A family of distributions for modelling non-elliptically

symmetric multivariate data. J. Roy. Statist. Soc. B, 43:210–218.

[12] Delbaen, F. (2002). Coherent risk measure on general probability spaces. Advances in Finance and Stochastics: Essays in Honour of Dieter Sondermann, Eds. K. Sandmann, P. J. Schönbucher, Springer-Verlag, Berlin, 1–37.

[13] Embrechts, P., Nešlehová, J. and Wüthrich, M. V. (2009). Additivity properties for value-at-risk under Archimedean dependence and heavy-tailedness. Insurance: Mathematics and Economics, 44(2):164–169.

[14] Föllmer, H. and Schied, A. (2002). Convex measures of risk and trading constraints. Finance and Stochastics, 6:426–447.


[15] Joe, H. (1993). Parametric family of multivariate distributions with given margins. J. Multi-

variate Anal., 46:262–282.

[16] Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London.

[17] Joe, H. and Hu, T. (1996). Multivariate distributions from mixtures of max-infinitely divisible

distributions. J. Multivariate Anal., 57:240–265.

[18] Joe, H., Li, H. and Nikoloulopoulos, A.K. (2010). Tail dependence functions and vine copulas.

Journal of Multivariate Analysis, 101:252–270.

[19] Joe, H., Smith, R. L. and Weissman, I. (1992). Bivariate threshold methods for extremes. J. R. Statist. Soc. B, 54:171–183.

[20] Jouini, E., Meddeb, M. and Touzi, N. (2004). Vector-valued coherent risk measures. Finance

and Stochastics 8:531–552.

[21] Klüppelberg, C., Kuhn, G. and Peng, L. (2008). Semi-parametric models for the multivariate tail dependence function – the asymptotically dependent. Scandinavian Journal of Statistics, 35(4):701–718.

[22] Kortschak, D. and Albrecher, H. (2009). Asymptotic results for the sum of dependent non-

identically distributed random variables. Methodol. Comput. Appl. Probab. 11:279–306.

[23] Kousky, C. and Cooke, R. M. (2009). Climate Change and Risk Management: Challenges for

insurance, adaptation and loss estimation. Discussion paper RFF DP 09-03-Rev, Resources

For the Future (http://www.rff.org/RFF/Documents/).

[24] Landsman Z. and Valdez, E. (2003). Tail conditional expectations for elliptical distributions.

North American Actuarial Journal, 7:55–71.

[25] Li, H. (2009). Orthant tail dependence of multivariate extreme value distributions. Journal of

Multivariate Analysis, 100:243–256.

[26] Li, H. and Sun, Y. (2009). Tail dependence for heavy-tailed scale mixtures of multivariate

distributions. J. Appl. Prob. 46 (4):925–937.

[27] Mardia, K.V. (1962). Multivariate Pareto distributions. Ann. Math. Statist., 33:1008–1015.

[28] McNeil, A. J., Frey, R. and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press, Princeton, New Jersey.

[29] Nikoloulopoulos, A.K., Joe, H. and Li, H. (2009). Extreme value properties of multivariate t

copulas. Extremes, 12:129–148.


[30] Resnick, S. (1987). Extreme Values, Regular Variation, and Point Processes, Springer, New

York.

[31] Resnick, S. (2007). Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer,

New York.

[32] Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8:229–231.

[33] Takahasi, K. (1965). Note on the multivariate Burr’s distribution. Ann. Inst. Statist. Math.,

17:257–260.
