
R. MATÚŠ

THE MODELLING OF HYDROLOGICAL JOINT EVENTS ON THE MORAVA RIVER USING AGGREGATION OPERATORS

KEY WORDS

• Archimedean copula
• Extreme value copula
• Archimax copula
• Archimax dependence function A
• Morava River

ABSTRACT

This article studies the multivariate dependence properties and joint probability distributions of complex hydrological variables and designs appropriate copula functions for their probabilistic description under the specific physiographic conditions of the Morava River in south-west Slovakia. The steps involved in investigating the dependence between two random variables and in modeling it using Archimedean, Extreme Value (EV) and Archimax copulas are presented. Our approach allows the dependence structure to be modeled independently of the marginal distributions, which is not possible with standard classical methods. The methodology has been applied to the joint modeling of maximum annual flood peak flows and volumes. GOF tests using Wilcoxon p-values have shown that the Archimax copula with an Archimedean generator φ and our proposed dependence function A gives visibly better results than the Archimedean or EV copulas. Furthermore, the Archimax copula with the proposed dependence function A can also model asymmetric data, which occur frequently in hydrology. This copula-based approach is promising, since it can account for the wide range of dependence structures encountered in hydrology.

Rastislav Matúš
Department of Water Resources Management, Faculty of Civil Engineering, Slovak University of Technology in Bratislava, Radlinského 11, 813 68 Bratislava, Slovakia, [email protected]
Research field: copulas, flood frequency analysis

2009/3 PAGES 9 – 15 RECEIVED 10. 12. 2007 ACCEPTED 1. 6. 2009

1. INTRODUCTION

The flood peak plays a key role in assessing the hydrologic safety of localities of interest, as well as in estimating the incidence and severity of floods (De Michele, et al., 2004). Unfortunately, the statistical methods widely used in engineering applications are usually directed only at the analysis of peak discharges. Relying on such results alone seems inappropriate, since the duration and volume of critical flows are often as important as the peak, and ignoring them could lead to a severe underestimation of the risk associated with a given event (Adamson, et al., 1999). For instance, flood protection levees may fail not only as a result of overtopping but also because extreme durations of high water levels cause saturation and collapse. Prolonged flooding in a main stream may increase the backwater effects in tributary channels, thereby extending the flood impact upstream in these lateral channels. Classical multivariate distributions (multivariate normal, bivariate Pareto, bivariate gamma, etc.) are widely used, although they require all marginal distributions to belong to the same family. Furthermore, extensions beyond the bivariate case are not straightforward, and the parameters of the marginal distributions are also used to model the dependence between the random variables (Favre, et al., 2004). A construction of multivariate distributions based on Sklar's theorem (Sklar, 1959) does not suffer from these drawbacks.

2009 SLOVAK UNIVERSITY OF TECHNOLOGY 9


1.1 General Theory about Copulas

The theory of copulas can be found in general textbooks such as those of Nelsen (1999), Joe (1997) and Salvadori, et al. (2007). To define a copula, consider p random variables U1, U2, …, Up, each uniformly distributed on the unit interval, whose joint distribution function C is defined as

$$C(u_1, u_2, \dots, u_p) = P(U_1 \le u_1, U_2 \le u_2, \dots, U_p \le u_p), \qquad (1.1)$$

where u1, …, up denote realizations. These p variables are the distribution functions (also referred to as probability integral transformations) of p outcomes X1, X2, …, Xp that we wish to understand, i.e., Ui = Fi(Xi); in other words, they are given by the marginal distribution functions F1, F2, …, Fp of the multivariate distribution function

$$F(x_1, x_2, \dots, x_p) = C\big(F_1(x_1), F_2(x_2), \dots, F_p(x_p)\big), \qquad (1.2)$$

which is defined using a copula function evaluated at the realizations x1, x2, …, xp. Sklar (1959) showed that the converse of (1.2) also holds, i.e., any multivariate distribution function F can be written in the form (1.2) for some copula C. Furthermore, if the marginals are continuous, this copula is unique. The copula function thus provides a unifying and flexible way to study joint distributions (with different marginals) and allows the dependence structure to be modeled independently of the marginal distributions. In the remainder of the article we limit the discussion to the bivariate case for simplicity.
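As a minimal sketch of the construction (1.2) in the bivariate case, the Python code below couples two marginals from different families through a single copula. The Gumbel-Hougaard copula and its parameter θ = 2 are illustrative assumptions; the Gumbel marginal uses the location and scale fitted to Qmax in Section 2.2, and reading Weibull(1.234e+09, 8.738e-01) as (scale, shape) is our assumption. All function names are hypothetical.

```python
import math


def gumbel_hougaard(u: float, v: float, theta: float = 2.0) -> float:
    """Gumbel-Hougaard copula exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta)), theta >= 1."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def gumbel_cdf(x: float, loc: float = 417.81, scale: float = 211.479) -> float:
    """Marginal F1 of the peak flow: GEV with shape 0 (Gumbel), parameters from Section 2.2."""
    return math.exp(-math.exp(-(x - loc) / scale))


def weibull_cdf(y: float, scale: float = 1.234e9, shape: float = 0.8738) -> float:
    """Marginal F2 of the volume; the (scale, shape) reading of the parameters is an assumption."""
    if y <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((y / scale) ** shape))


def joint_cdf(x: float, y: float) -> float:
    """Sklar's construction (1.2): F(x, y) = C(F1(x), F2(y))."""
    return gumbel_hougaard(gumbel_cdf(x), weibull_cdf(y))
```

Because C(u, 1) = u for any copula, joint_cdf(x, y) tends to the Gumbel marginal as y grows, so the marginals are recovered whichever copula is plugged in.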

1.2 Dependence

Schweizer and Wolff (1981) established that the copula accounts for all the dependence between two random variables, X1 and X2, in the following sense. Let g1 and g2 be two strictly increasing (but otherwise arbitrary) functions over the ranges of X1 and X2. Then the transformed variables g1(X1) and g2(X2) have the same copula as X1 and X2. Thus, the manner in which X1 and X2 “move together” is captured by the copula, regardless of the scale in which each variable is measured (Frees and Valdez, 1998). One measure of association that can be expressed solely in terms of the copula function is Kendall's τ. Unlike the well-known Pearson correlation coefficient, which measures only linear dependence, Kendall's τ also captures nonlinear (monotone) dependence between variables; it does not depend on the marginals (and is thus not affected by strictly increasing nonlinear changes of scale), and it can be used to estimate the parameters of several copulas.
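The rank invariance described above can be checked numerically. The sketch below computes the sample Kendall's τ (the tau-a version, assuming no ties) by counting concordant and discordant pairs; the function name and the data in the usage note are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Sample Kendall's tau (tau-a): (concordant - discordant) / (n choose 2); no ties assumed."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1  # concordant pair
        elif prod < 0:
            score -= 1  # discordant pair
    return 2.0 * score / (n * (n - 1))
```

For x = [1.2, 3.4, 0.5, 2.2, 4.1] and y = [0.9, 2.5, 1.1, 1.8, 3.0], nine of the ten pairs are concordant, so τ = 0.8; the value is unchanged if x is replaced by its cubes and y by −1/y, since both maps are strictly increasing.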

1.3 Archimedean Copulas

The Archimedean representation reduces the study of a multivariate copula to a single univariate function. For simplicity, we consider bivariate copulas, so that p = 2. Assume that φ: [0,1] → [0,∞] is a continuous, strictly decreasing, convex function such that φ(1) = 0, and let φ−1 denote its inverse. Then the function

$$C(u,v) = \begin{cases} \varphi^{-1}\big(\varphi(u) + \varphi(v)\big) & \text{for } \varphi(u) + \varphi(v) \le \varphi(0), \\ 0 & \text{otherwise}, \end{cases} \qquad (1.3)$$

is said to be an Archimedean copula, and φ is its generator (Nelsen, 1999). Different choices of generator yield different families of copulas (Nelsen, 1999; Joe, 1997). Archimedean copulas have several useful properties (symmetry, associativity, etc.), and their estimation is simplified by the following relation to Kendall's tau:

$$\tau = 1 + 4 \int_0^1 \frac{\varphi(t)}{\varphi'(t)}\, dt. \qquad (1.4)$$

If symmetry in the arguments is too restrictive, extensions are available (Joe, 1997).
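To illustrate (1.3) and (1.4), the sketch below implements the Archimedean generator A.14 used in Section 2.2, which we assume to be family (4.2.14) of Nelsen (1999), φ(t) = (t^(−1/θ) − 1)^θ with θ ≥ 1. Integrating (1.4) for this generator gives τ = (2θ − 1)/(2θ + 1) (our own derivation), which the code compares with a numerical evaluation of (1.4). The function names are hypothetical.

```python
THETA = 1.812  # value reported in Section 2.2


def phi(t, theta=THETA):
    """Generator assumed for A.14: phi(t) = (t^(-1/theta) - 1)^theta, theta >= 1."""
    return (t ** (-1.0 / theta) - 1.0) ** theta


def phi_inv(s, theta=THETA):
    """Inverse generator: (1 + s^(1/theta))^(-theta)."""
    return (1.0 + s ** (1.0 / theta)) ** (-theta)


def dphi(t, theta=THETA):
    """Derivative phi'(t), negative on (0, 1)."""
    return -((t ** (-1.0 / theta) - 1.0) ** (theta - 1.0)) * t ** (-1.0 / theta - 1.0)


def archimedean_copula(u, v, theta=THETA):
    """Eq. (1.3); this generator is strict (phi(0) = infinity), so no cut-off is needed."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return phi_inv(phi(u, theta) + phi(v, theta), theta)


def tau_numeric(theta=THETA, n=20000):
    """Eq. (1.4) evaluated with the midpoint rule on [0, 1]."""
    h = 1.0 / n
    total = sum(phi((k + 0.5) * h, theta) / dphi((k + 0.5) * h, theta) for k in range(n))
    return 1.0 + 4.0 * total * h


def tau_closed(theta=THETA):
    """Closed form (2 theta - 1) / (2 theta + 1) obtained by integrating (1.4)."""
    return (2.0 * theta - 1.0) / (2.0 * theta + 1.0)


def theta_from_tau(tau):
    """Inverse of tau_closed: a moment-type estimate in the spirit of Genest and Rivest (1993)."""
    return (1.0 + tau) / (2.0 * (1.0 - tau))
```

Inverting the closed form at the sample value τ = 0.5382122 from Section 2.2 gives θ ≈ 1.67 for a purely Archimedean A.14 model; this differs from θ = 1.812 reported for the Archimax model, whose Kendall's τ also depends on A.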

1.4 The Extreme Value Copula

A copula is said to be an EV copula if, for all t > 0, the scaling property $C(u^t, v^t) = (C(u,v))^t$ holds for all (u,v) ∈ I². EV copulas are max-stable, meaning that if (X1, Y1), (X2, Y2), …, (Xn, Yn) are i.i.d. random pairs from an EV copula C and Mn = max{X1, X2, …, Xn}, Nn = max{Y1, Y2, …, Yn}, then the copula associated with the random pair (Mn, Nn) is also C. It can be shown (Gumbel, 1960) that EV copulas can be represented in the form:

$$C(u,v) = \exp\left[\log(uv)\, A\!\left(\frac{\log u}{\log(uv)}\right)\right], \qquad (1.5)$$

where A: [0,1] → [1/2,1] is a convex function such that max(t, 1 − t) ≤ A(t) ≤ 1 for all t ∈ [0,1]. The function A is called the dependence function.
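A minimal sketch of the representation (1.5) follows. It uses the Gumbel (logistic) dependence function A(t) = (t^θ + (1 − t)^θ)^(1/θ) with θ = 2 purely as an illustrative choice; this is not the dependence function proposed in this article, and the function names are hypothetical.

```python
import math


def pickands_logistic(t: float, theta: float = 2.0) -> float:
    """Gumbel (logistic) dependence function A(t) = (t^theta + (1 - t)^theta)^(1/theta), theta >= 1."""
    return (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta)


def ev_copula(u: float, v: float, A=pickands_logistic) -> float:
    """EV copula (1.5): C(u, v) = exp[log(uv) * A(log u / log(uv))]."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0 or v >= 1.0:
        return min(u, v)  # boundary conditions C(u, 1) = u and C(1, v) = v
    lu, lv = math.log(u), math.log(v)
    return math.exp((lu + lv) * A(lu / (lu + lv)))
```

Plugging in A ≡ 1 returns the independence copula uv and A(t) = max(t, 1 − t) returns min(u, v), which gives an easy sanity check on the two bounds for A; raising both arguments to a power t reproduces the scaling property C(u^t, v^t) = C(u,v)^t.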

1.5 The Archimax copula

Capéraà, et al. (2000) combined the EV and Archimedean copula classes into a single class called Archimax copulas, represented in the form

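$$C(u,v) = \varphi^{-1}\left[\big(\varphi(u) + \varphi(v)\big)\, A\!\left(\frac{\varphi(u)}{\varphi(u) + \varphi(v)}\right)\right], \qquad (1.6)$$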

where A is a valid dependence function and φ a valid Archimedean generator. Archimax copulas reduce to Archimedean copulas for A(t) = 1 and to EV copulas for φ(t) = −log(t). In the same paper, Capéraà et al. proved that (1.6) is a valid copula for any combination of a valid generator φ and a valid dependence function A. Matúš and Bacigál (2007) constructed an Archimax dependence function A. Furthermore, because simulating data for extreme-value analysis requires derivatives of the Archimax copula, we divided the dependence function A into three parts as follows:

(1.7)

where a ∈ (0,1), b = max(a, 1 − a), c ∈ [a,1], and

D3 = −2a(1+b)c + (−1+b)c² + a²(−1+b+4c)/(4(−1+a)ac),
D2 = (−1+b)(−c + a(−1+2c))/(2(−1+a)ac),
D1 = (1 − b)/(4a − 4a²c).

The Archimax copulas reduce to Archimedean copulas for a = 0, b = 1, c = 0 and to the Fréchet-Hoeffding upper bound C(u,v) = min(u,v) for a = b = 0.5, c = 0. A multivariate approach using the Archimax copula with our proposed dependence function A can also model asymmetric hydrological variables. The Archimax copula with the proposed dependence function A has the following form:

(1.8)

The conditional copula given U = u, i.e., C(v | u) = ∂C(u,v)/∂u, then has the form: (1.9)
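The Archimax form (1.6) can be written as a function that takes the generator and the dependence function as arguments. The sketch below assumes that A.14 denotes Nelsen's family (4.2.14) with θ = 1.812 (Section 2.2) and uses the logistic A only as an example of a valid dependence function; the three-part dependence function (1.7) is not reproduced because its full expression is not given here. All names are hypothetical.

```python
import math

THETA = 1.812  # generator parameter reported in Section 2.2


def phi_a14(t, theta=THETA):
    """Generator assumed for A.14 (Nelsen's family 4.2.14)."""
    return (t ** (-1.0 / theta) - 1.0) ** theta


def phi_a14_inv(s, theta=THETA):
    """Inverse of phi_a14."""
    return (1.0 + s ** (1.0 / theta)) ** (-theta)


def neg_log(t):
    """phi(t) = -log t, the generator that turns (1.6) into an EV copula."""
    return -math.log(t)


def neg_log_inv(s):
    return math.exp(-s)


def logistic_A(t, theta=2.0):
    """Gumbel (logistic) dependence function, used only as an example of a valid A."""
    return (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta)


def archimax(u, v, phi, phi_inv, A):
    """Archimax copula (1.6): phi^-1[(phi(u) + phi(v)) * A(phi(u) / (phi(u) + phi(v)))]."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    s = phi(u) + phi(v)
    if s == 0.0:  # happens only for u = v = 1
        return 1.0
    return phi_inv(s * A(phi(u) / s))
```

Passing A ≡ 1 recovers the Archimedean copula (1.3), passing neg_log and neg_log_inv recovers the EV copula (1.5), and A(t) = max(t, 1 − t) yields the Fréchet-Hoeffding upper bound min(u, v), mirroring the special cases listed above.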

Algorithm 1.1, which simulates random pairs from our Archimax copula, has the following steps:

Step 1. Generate uniform random variables s, q of length n.
Step 2. Compute v1, v2 and q1, q2:

Step 3. Search the interval for a root of the function fi with respect to q as its argument:

The desired simulated pairs are (si, vi).
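Algorithm 1.1 follows the conditional-distribution (inversion) method: draw s and q, then find v such that C(v | s) = q. Because the intermediate quantities of Step 2 are not reproduced here, the sketch below is a generic version of that idea, not the author's implementation. It uses the Archimedean copula with generator A.14 (assumed to be Nelsen's family (4.2.14), θ = 1.812), for which C(v | u) = φ'(u)/φ'(C(u,v)), and solves for v by bisection; the boundary clamp, tolerance and seed are arbitrary choices, and all names are hypothetical.

```python
import random

THETA = 1.812  # generator parameter reported in Section 2.2


def phi(t, theta=THETA):
    """Generator assumed for A.14 (Nelsen's family 4.2.14)."""
    return (t ** (-1.0 / theta) - 1.0) ** theta


def phi_inv(s, theta=THETA):
    """Inverse generator."""
    return (1.0 + s ** (1.0 / theta)) ** (-theta)


def dphi(t, theta=THETA):
    """Derivative phi'(t), negative on (0, 1)."""
    return -((t ** (-1.0 / theta) - 1.0) ** (theta - 1.0)) * t ** (-1.0 / theta - 1.0)


def cond_cdf(v, u, theta=THETA):
    """C(v | u) = dC(u, v)/du = phi'(u) / phi'(C(u, v)) for an Archimedean copula."""
    c = phi_inv(phi(u, theta) + phi(v, theta), theta)
    return dphi(u, theta) / dphi(c, theta)


def sample(n, theta=THETA, seed=1, tol=1e-10):
    """Return n pairs (s_i, v_i): s_i uniform, v_i solves C(v | s_i) = q_i."""
    rng = random.Random(seed)
    eps = 1e-12  # keep arguments away from 0 and 1
    pairs = []
    for _ in range(n):
        s = min(max(rng.random(), eps), 1.0 - eps)
        q = rng.random()
        lo, hi = eps, 1.0 - eps
        while hi - lo > tol:  # bisection; C(v | s) is non-decreasing in v
            mid = 0.5 * (lo + hi)
            if cond_cdf(mid, s, theta) < q:
                lo = mid
            else:
                hi = mid
        pairs.append((s, 0.5 * (lo + hi)))
    return pairs
```

Bisection is used because C(v | s) is non-decreasing in v for every copula, so the root is always bracketed by (0, 1) and no derivative of the conditional copula is needed.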

2. APPLICATION: BIVARIATE FREQUENCY ANALYSIS

2.1 Problem Definition

Many hydrological engineering planning, design and management problems require detailed knowledge of flood event characteristics, such as flood peak, volume and duration. Flood frequency analysis often focuses on flood peak values and hence provides a limited assessment of flood events. This application concerns the bivariate frequency analysis of the peak flow and volume of the Morava River, whose watershed is situated in the south-west of Slovakia. The underlying data (Fig. 2.1) are the annual peak flows Qmax and the corresponding volumes V, which were derived using a program written in Delphi. Discharges were measured at Moravský sv. Ján from 1921 to 2002. Figure 2.2a shows the flows and volumes as a time series.

Fig. 1.1 Dependence function A: a=0.3, b=0.8, c=0.1 (upper curve); a=0.5, b=0.5, c=0 (lower one).

2.2 Modeling and results

The annual maximum flows Qmax were fitted by a GEV distribution (Beirlant, et al., 2004). The fitted shape parameter is 0.0, i.e., a Gumbel distribution, Q ~ GEV(417.81, 211.479, 0.0). For the corresponding volumes V we obtained V ~ Weibull(1.234e+09, 8.738e-01). The parameters were estimated using several methods discussed in Beirlant, et al. (2004), including the maximum likelihood method and the weighted method (see Fig. 2.3). In the second step we modeled the link between the two variables. Before a copula model for the pair (X,Y) was sought, visual tools were used to check for the presence of dependence. The scatter plot of the ranks shown in Fig. 2.2b suggests a positive association between the peak flow and volume, as might be expected. This is confirmed by the χ-plot (Fisher and Switzer, 2001) and the K-plot (Genest and Boies, 2003), reproduced in panels c) and d) of Fig. 2.2, respectively. As can be seen, most of the points fall outside the “confidence band” of the χ-plot, and an obvious curvature is apparent in the K-plot. To quantify the degree of dependence in the pair (X,Y), the sample value of Kendall's tau was computed: τ = 0.5382122.
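The χ-plot used above can be computed directly from the data. The sketch below follows our reading of Fisher and Switzer (2001): for each point it compares the empirical joint proportion H_i with the product F_i·G_i of the marginal proportions, and it drops the most extreme points. The constant 1.78 for an approximate 95% band ±1.78/√n is quoted from that paper as we recall it and should be checked before use; the function name is hypothetical.

```python
import math


def chi_plot(x, y):
    """Return the chi-plot points (lambda_i, chi_i) and the approximate 95% limit 1.78 / sqrt(n)."""
    n = len(x)
    points = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        h = sum(1 for j in others if x[j] <= x[i] and y[j] <= y[i]) / (n - 1)
        f = sum(1 for j in others if x[j] <= x[i]) / (n - 1)
        g = sum(1 for j in others if y[j] <= y[i]) / (n - 1)
        denom = f * (1.0 - f) * g * (1.0 - g)
        if denom == 0.0:
            continue  # chi_i is undefined at the extreme ranks
        chi = (h - f * g) / math.sqrt(denom)
        prod = (f - 0.5) * (g - 0.5)
        sign = (prod > 0) - (prod < 0)
        lam = 4.0 * sign * max((f - 0.5) ** 2, (g - 0.5) ** 2)
        if abs(lam) < 4.0 * (1.0 / (n - 1) - 0.5) ** 2:  # drop the most extreme points
            points.append((lam, chi))
    return points, 1.78 / math.sqrt(n)
```

For perfectly concordant data every χ_i equals 1 and for perfectly discordant data every χ_i equals −1; points falling outside ±1.78/√n, as most of the Morava points do, indicate dependence.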

We considered 27 families of Archimedean copulas as stated in Nelsen (1999) and Joe (1997); EV copulas (Gumbel (1960), Galambos (1987), Hüsler and Reiss (1989), Tawn (1988), and the BB5 copula (Joe, 1997)); and Archimax copulas defined by an Archimedean generator φ and the dependence function A described in Section 1.5. Some of these families could be eliminated outright, because the range of dependence they span was insufficient to account for the association observed in the data set. For the remaining families, the semi-parametric estimation method of Genest, et al. (1995) was used, in which the copula parameter θ is estimated by maximizing the pseudo-log-likelihood built from the ranks Ri and Si of the observations Xi and Yi:

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log c_{\theta}\!\left(\frac{R_i}{n+1}, \frac{S_i}{n+1}\right), \qquad (2.1)$$

where cθ denotes the copula's density. For one-parameter Archimedean copulas, θ was also estimated nonparametrically (Genest and Rivest, 1993) from equation (1.4). To help sift through the remaining models, graphical diagnostic tools were used, for instance QQ-plots of the univariate distribution function of the copula against standard uniform quantiles (Fisher and Switzer, 2001; Genest and Rivest, 1993). Furthermore, we checked the closeness of the simulated data to the observed ones (Fig. 2.4). In the case of the Archimax copulas, the parameters a, b, c were selected from a set of 417 possible triplets by fitting the empirical copula

$$C_n(u,v) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\!\left(\frac{R_i}{n+1} \le u,\ \frac{S_i}{n+1} \le v\right), \qquad (2.2)$$

where Ri and Si are the ranks of Xi and Yi.

Fig. 2.1 Annual Maximum Peak Discharge Qmax, Volume V and Duration D (Morava River, 1941).
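The two fitting ideas above, the pseudo-likelihood (2.1) and the comparison with the empirical copula (2.2), can be sketched on pseudo-observations R_i/(n + 1). Because the density of the proposed Archimax copula is not reproduced here, the Clayton copula serves as a stand-in family with a closed-form CDF and density. The golden-section search assumes a unimodal pseudo-likelihood, and the least-squares distance in select_by_empirical_copula is an assumed criterion, since the exact criterion used to choose among the 417 triplets is not stated. All names are hypothetical.

```python
import math
import random


def pseudo_obs(x):
    """Rescaled ranks R_i / (n + 1); ties are assumed absent."""
    n = len(x)
    ranks = [0] * n
    for r, i in enumerate(sorted(range(n), key=x.__getitem__), start=1):
        ranks[i] = r
    return [r / (n + 1) for r in ranks]


def clayton_cdf(u, v, theta):
    """Clayton copula (stand-in family), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_logpdf(u, v, theta):
    """log c(u, v) = log(1 + theta) - (1 + theta) log(uv) - (2 + 1/theta) log(u^-theta + v^-theta - 1)."""
    s = u ** -theta + v ** -theta - 1.0
    return math.log1p(theta) - (1.0 + theta) * math.log(u * v) - (2.0 + 1.0 / theta) * math.log(s)


def pseudo_mle(u, v, lo=0.01, hi=20.0, iters=60):
    """Maximize the pseudo-log-likelihood (2.1) by golden-section search."""
    def loglik(theta):
        return sum(clayton_logpdf(a, b, theta) for a, b in zip(u, v))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if loglik(c) > loglik(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def empirical_copula(u, v, a, b):
    """Empirical copula (2.2) evaluated at (a, b)."""
    return sum(1 for ui, vi in zip(u, v) if ui <= a and vi <= b) / len(u)


def select_by_empirical_copula(u, v, grid):
    """Candidate minimizing sum_i [C_n(U_i, V_i) - C_theta(U_i, V_i)]^2 over the observations."""
    cn = [empirical_copula(u, v, a, b) for a, b in zip(u, v)]
    def distance(theta):
        return sum((c - clayton_cdf(a, b, theta)) ** 2 for c, a, b in zip(cn, u, v))
    return min(grid, key=distance)


def simulate_clayton(n, theta, seed=0):
    """Clayton sample via the closed-form conditional inverse (test data only)."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(n):
        s, q = max(rng.random(), 1e-12), max(rng.random(), 1e-12)
        vs.append(((q ** (-theta / (1.0 + theta)) - 1.0) * s ** -theta + 1.0) ** (-1.0 / theta))
        us.append(s)
    return us, vs
```

In the same way, a grid of (a, b, c) triplets could be scanned by replacing clayton_cdf with the Archimax copula (1.8).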

In addition to the graphical methods, goodness-of-fit (GOF) tests (Kolmogorov-Smirnov, Pearson's χ², Wilcoxon and t tests) were used to evaluate the quality of fit. We conclude that the Archimax copula with our proposed dependence function A fits the given extreme hydrological data most appropriately (see Table 1). According to these graphical methods and GOF tests, several other copula families also show good fitting plots and p-values. As a further graphical check, 10,000 pairs of points were generated from Cθ' (Fig. 2.5a). Furthermore, the margins of 10,000 random pairs (Ui, Vi) from each of the estimated copula models Cθ' were transformed back into the original units using the marginal distributions F' and G' identified in Section 2.2 for peak and volume (Genest and Favre, 2007). The resulting scatter plots of pairs (Xi, Yi) = (F'−1(Ui), G'−1(Vi)) are displayed in Fig. 2.5b along with the actual observations. We decided to model our data with the Archimax copula using the proposed dependence function A and the Archimedean generator A.14, $\varphi(t) = (t^{-1/\theta} - 1)^{\theta}$, with θ = 1.812, a = 0.1, b = 0.95, c = 0.1.
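The back-transformation (Xi, Yi) = (F'−1(Ui), G'−1(Vi)) can be sketched as follows. The marginal parameters are those of Section 2.2 (the Weibull pair is read as scale, shape, which is an assumption), and the copula sample comes from a Clayton copula with θ = 2.331, chosen only because its τ = θ/(θ + 2) matches the sample τ = 0.538; it is a stand-in, not the fitted Archimax copula. All names are hypothetical.

```python
import math
import random

# Marginals fitted in Section 2.2; reading Weibull(1.234e+09, 8.738e-01) as (scale, shape) is an assumption.
GUMBEL_LOC, GUMBEL_SCALE = 417.81, 211.479
WEIBULL_SCALE, WEIBULL_SHAPE = 1.234e9, 0.8738


def gumbel_ppf(u):
    """F'^-1(u) for F'(x) = exp(-exp(-(x - loc) / scale))."""
    return GUMBEL_LOC - GUMBEL_SCALE * math.log(-math.log(u))


def weibull_ppf(v):
    """G'^-1(v) for G'(y) = 1 - exp(-(y / scale)^shape)."""
    return WEIBULL_SCALE * (-math.log1p(-v)) ** (1.0 / WEIBULL_SHAPE)


def simulate_clayton(n, theta, seed=0):
    """Stand-in copula sample via the closed-form Clayton conditional inverse."""
    rng = random.Random(seed)
    eps = 1e-12
    pairs = []
    for _ in range(n):
        u, q = max(rng.random(), eps), max(rng.random(), eps)
        v = ((q ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
        pairs.append((u, min(v, 1.0 - eps)))
    return pairs


def to_original_units(pairs):
    """(X_i, Y_i) = (F'^-1(U_i), G'^-1(V_i))."""
    return [(gumbel_ppf(u), weibull_ppf(v)) for u, v in pairs]
```

Since both quantile functions are strictly increasing, the ranks, and hence Kendall's τ, of the simulated pairs are unchanged by the transformation.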

Fig. 2.2 a) Time series of maximum annual flows Qmax and volumes V (the thinner curve represents volumes, ×10^5); b) scatter plot of the ranks of Xi, Yi; c) χ-plot; d) K-plot.

Fig. 2.3 Histograms with fitted Gumbel and Weibull distributions.

Fig. 2.4 QQ-plots of empirical vs. an EV and Archimax copula.


According to the conditional Archimax copula (1.9), the range of all possible flood volumes V for the maximum registered flood peak Qmax = 1573 m³/s extends from 20 918 422 m³ to 7 168 188 154 m³ (Fig. 2.5c). The upper limit is used when estimating the return period T = 1/(1 − C(u,v)). For example, the return period for Qmax = 1573 m³/s and the maximum flood volume of this range, Vmax = 7 168 188 154 m³, is 236 years.
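The return period formula can be checked against the Gumbel marginal of Section 2.2. For any copula, C(u, 1) = u and C(u, v) ≤ u, so T = 1/(1 − C(u, v)) is bounded above by the univariate return period 1/(1 − u) of the peak and approaches it as v → 1. The sketch below evaluates that bound only; it does not reproduce the Archimax computation itself, and the names are hypothetical.

```python
import math

# Gumbel marginal of Qmax fitted in Section 2.2 (GEV with shape 0).
GUMBEL_LOC, GUMBEL_SCALE = 417.81, 211.479


def gumbel_cdf(x):
    return math.exp(-math.exp(-(x - GUMBEL_LOC) / GUMBEL_SCALE))


def return_period(c_uv):
    """T = 1 / (1 - C(u, v)) in years, for annual maxima."""
    return 1.0 / (1.0 - c_uv)


u_peak = gumbel_cdf(1573.0)
# C(u, v) <= u for every copula and C(u, 1) = u, so this is the limit of T as v -> 1.
T_limit = return_period(u_peak)
```

For Qmax = 1573 m³/s the bound is about 236 years, which agrees with the value quoted above for the largest volume in the range.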

3. CONCLUSIONS

Copula functions offer a promising area for future research, as they may be approached from a mathematical, statistical or computational point of view. In recent years, numerous successful applications of copula methodology have been made, most notably in survival analysis, actuarial science and finance. Most of these models have also been used in hydrology; however, the asymmetry that frequently occurs in hydrological and hydro-meteorological data has generally not been taken into account. A multivariate approach using the Archimax copula with our proposed dependence function A can also model asymmetric variables, and for the data studied here it gives visibly better results than the other copula families considered. However, further investigation of the statistical techniques used to calibrate copulas to hydrological data is needed. This would aid in identifying copula functions that are relevant to specific hydrological or hydro-meteorological problems. Furthermore, we could, for example, relate Kendall's tau or Spearman's rho to physiographic data (such as watershed area, slope, etc.), which are always available in practical cases. Favre, et al. (2004) also proposed estimating the parameters using a Bayesian approach, which is more suitable than the maximum likelihood method when the sample size is small, as is usually the case in hydrology. The trivariate modeling of flow, volume and duration is also of great interest to hydrologists (Grimaldi, et al., 2005).

Acknowledgements
The study was carried out within the projects VEGA 1/2032/05: „Stochastic Analysis of Hydro-meteorological Processes: Modeling of the Unstationarity, Heteroskedascity and Unlinearity“, and APVT 20 – 003204: „The Latest Methods of the Stochastic and Unstochastic Modeling of the Uncertainty and Their Engineering Applications”.

Fig. 2.5 a) Simulation and b) back-transformation of 10,000 pairs (Ui, Vi) from the Archimax copula Cθn with φ(t) = A.14; c) possible flood volumes for the maximum registered flood peak of 1573 m³/s, ranging from 20 918 422 to 7 168 188 154 m³.

Table 1 Best GOF test results within each class, with the corresponding parameter estimates.


REFERENCES

[1] Adamson, P.T. - Metcalfe, A.V. - Parmentier, B. (1999) Bivariate Extreme Value Distributions: An Application of the Gibbs Sampler to the Analysis of Floods. Water Resour. Res., 35, 2825–2832.

[2] Beirlant, J. - Goegebeur, Y. - Segers, J. - Teugels, J. (2004) Statistics of Extremes: Theory and Applications. Wiley.

[3] Capéraà, P. - Fougères, A.-L. - Genest, C. (2000) Bivariate Distributions with Given Extreme Value Attractor. J. Multivariate Anal., 72, 30–49.

[4] Favre, A.C. - Adlouni, S. El. - Perreault, L. - Thiémonge, N. - Bobée, B. (2004) Multivariate Hydrological Frequency Analysis Using Copulas. Water Resour. Res., 40.

[5] Fisher, N.I. - Switzer, P. (2001) Graphical Assessment of Dependence: Is a Picture Worth 100 Tests? Amer. Statist., 55, 233–239.

[6] Frees, E.W. - Valdez, E.A. (1998) Understanding Relationships Using Copulas. North American Actuarial Journal, 2(1):1-25.

[7] Galambos, J. (1987) The Asymptotic Theory of Extreme Order Statistics. Malabar, FL.: Krieger Publishing Co.

[8] Genest, C. - Favre, A.C. (2007) Everything You Always Wanted to Know About Copula Modeling But Were Afraid to Ask. Journal of Hydrologic Engineering, 12, 347-368.

[9] Genest, C. - Boies, J.C. (2003) Detecting Dependence with Kendall Plots. Amer. Statist., 57, 275–284.

[10] Genest, C. - Ghoudi, K. - Rivest, L. (1995) A Semi-parametric Estimation Procedure of Dependence Parameters in Multivariate Families of Distributions. Biometrika, 82, 543–552.

[11] Genest, C. - Rivest, L.P. (1993) Statistical Inference Procedures for Bivariate Archimedean Copulas. J. Amer. Statist. Assoc., 88, 1034–1043.

[12] Grimaldi, S. - Serinaldi, F. - Napolitano, F. - Ubertini, L. (2005) A 3-copula Function Application for Design Hyetograph Analysis. Proceedings of symposium S2 held during the Seventh IAHS Scientific Assembly at Foz do Iguaçu, Brazil, April 2005. IAHS Publ. 293.

[13] Gumbel, E. J. (1960) Distributions des Valeurs Extrêmes en Plusieurs Dimensions. Publ. Inst. Statist. Univ. Paris, 9, 171–173.

[14] Hüsler, J. - Reiss, R.-D. (1989) Maxima of Normal Random Vectors: Between Independence and Complete Dependence. Statist. Probab. Lett., 7, 283–286.

[15] Joe, H. (1997) Multivariate Models and Dependence Concepts. Chapman and Hall, New York.

[16] De Michele, C. - Salvadori, G. - Canossi, M. - Petaccia, A. - Rosso, R. (2004) Bivariate Statistical Approach to Check Adequacy of Dam Spillway. J. Hydrol. Eng. May 6.

[17] Matúš, R. - Bacigál, T. (2007) Selection of the Right Copula for Hydrological Extremes. Journal of Electrical Engineering Vol. 57, 1-4.

[18] Nelsen, R. B. (1999) An Introduction to Copulas. Lecture Notes in Statistics, Springer-Verlag, New York.

[19] Salvadori, G. - De Michele, C. - Kottegoda, N.T. - Rosso, R. (2007) Extremes in Nature. An Approach Using Copulas. Water Science and Technology Library, Vol. 56. Springer. 296 pp.

[20] Schweizer, B. - Wolff, E. F. (1981) On Nonparametric Measures of Dependence for Random Variables. Ann. Stat., 9, 879–885.

[21] Sklar, A. (1959) Fonctions de Répartition à n Dimensions et Leurs Marges. Publ. Inst. Stat. Univ. Paris, 8, 229–231.

[22] Tawn, J. A. (1988) Bivariate Extreme Value Theory: Models and Estimation. Biometrika, 75, 397–415.