
Stochastic Processes: Theory and Methods. Handbook of Statistics, Vol. 19. Edited by D. N. Shanbhag and C. R. Rao. Elsevier Science (North-Holland), 2001.


Preface

J. Neyman, one of the pioneers in laying the foundations of modern statistical theory, stressed the importance of stochastic processes in a paper written in 1960 in the following terms: "Currently in the period of dynamic indeterminism in science, there is hardly a serious piece of research which, if treated realistically, does not involve operations on stochastic processes". Arising from the need to solve practical problems, several major advances have taken place in the theory of stochastic processes and their applications. Books by Doob (1953; J. Wiley and Sons), Feller (1957, 1966; J. Wiley and Sons) and Loève (1960; D. van Nostrand and Co., Inc.), among others, have created growing awareness and interest in the use of stochastic processes in scientific and technological studies. Journals such as Journal of Applied Probability, Advances in Applied Probability, Zeitschrift für Wahrscheinlichkeitstheorie (now called Probability Theory and Related Fields), Annals of Probability, Annals of Applied Probability, Theory of Probability and its Applications and Stochastic Processes and their Applications have all contributed to the subject's phenomenal growth.

The literature on stochastic processes is very extensive and is distributed across several books and journals. There is a need to review the different lines of research and development in stochastic processes and to present a consolidated and comprehensive account for the benefit of students, research workers, teachers and consultants. With this in view, North Holland has decided to bring out two volumes in the series Handbook of Statistics with the titles Stochastic Processes: Theory and Methods and Stochastic Processes: Modeling and Simulation. The first volume is going to press and the second volume, which is under preparation, will be published soon.

The present volume comprises, among others, chapters on the following topics: Point Processes (R. K. Milne), Renewal Theory (D. R. Grey), Markov Chains with Applications (R. L. Tweedie), Diffusion Processes (S. R. S. Varadhan), Martingales and Applications (M. M. Rao), Itô's Stochastic Calculus and its Applications (S. Watanabe), Continuous-time ARMA Processes (P. J. Brockwell), Random Walk and Fluctuation Theory (N. H. Bingham), Poisson Approximation (A. D. Barbour), Branching Processes (K. B. Athreya and A. N. Vidyashankar), Gaussian Processes (W. Li and Q.-M. Shao), Lévy Processes (J. Bertoin), Pareto Processes (B. C. Arnold), Stochastic Processes in Reliability (M. Kijima et al.), Stochastic Processes in Insurance and Finance (P. Embrechts et al.), Stochastic Networks (H. Daduna), Record Sequences with Applications (J. A. Bunge and C. M. Goldie), and Associated Sequences and Related Inference Problems (B. L. S. Prakasa Rao and I. Dewan). Additionally, the volume includes contributions that address some specific research problems; among these are A. Kłopotowski and M. G. Nadkarni, Y. Kakihara, A. Bobrowski et al., and R. Bhattacharya and E. C. Waymire. There are also further chapters that deal with some general or specific themes: the chapter of I. V. Basawa is on Inference in Stochastic Processes, and those of B. L. S. Prakasa Rao and of C. R. Rao and D. N. Shanbhag concentrate on characterization and identifiability of some Stochastic Processes and related Probability Distributions.

An effort is made in this volume to cover as many branches of stochastic processes as possible. Also, to get the balance right, we have retained some chapters with an applied flavour in this volume. In the planned second volume, we keep the option of including one or two chapters of a theoretical nature, assuming that they provide specialists in applied areas with avenues for future research.

We are most grateful to all the contributors and the referees for making this project successful. Some of the contributors have also reviewed other researchers' work. In particular, we are indebted to D. R. Grey, M. Manoharan and J. Ferreira for accepting the task of reviewing several chapters. Also, we would like to thank the publishing editors of Elsevier, Drs G. Wanrooy and N. van Dijk, for their patience and encouragement. Finally, we would like to thank the Department of Statistics, The Pennsylvania State University, USA, and the Department of Probability and Statistics, The University of Sheffield, UK, for providing us with facilities to edit this volume. This project is supported by the US Army Research Grant DAA H 04-96-1-0082.

D. N. Shanbhag
C. R. Rao

Contributors

O. Arino, Department of Mathematics, University of Pau, 64000 Pau, France; IRD, LIA/GEODES, 32, Avenue Henri-Varagnat, 93243 Bondy, France, e-mail: [email protected] (Ch. 8)

    B. C. Arnold, Department of Statistics, University of California, Riverside, CA 92521, USA, e-mail: [email protected] (Ch. 1)

K. B. Athreya, Departments of Mathematics and Statistics, Iowa State University, Ames, IA 50011, USA, e-mail: [email protected] (Ch. 2)

    A. D. Barbour, Department of Applied Mathematics, University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland, e-mail: [email protected] (Ch. 4)

    I. V. Basawa, Department of Statistics, University of Georgia, Athens, Georgia 30602-1952, USA, e-mail: [email protected] (Ch. 3)

J. Bertoin, Laboratoire de Probabilités, Université Pierre et Marie Curie, 175 rue du Chevaleret, F-75013 Paris, France, e-mail: [email protected] (Ch. 5)

    N. H. Bingham, Department of Mathematical Sciences, Brunel University, Uxbridge, Middlesex UB8 3PH, UK, e-mail: [email protected] (Ch. 7)

    R. Bhattacharya, Department of Mathematics, Indiana University, Bloomington, IN 47405, USA, e-mail: [email protected] (Ch. 6)

A. Bobrowski, Department of Mathematics, University of Houston, 4800 Calhoun Road, Houston, TX 77204-3476, USA; on leave from Department of Mathematics, Technical University of Lublin, ul. Nadbystrzycka 38A, 20-618 Lublin, Poland, e-mail: [email protected] (Ch. 8)

    P. J. Brockwell, Statistics Department, Colorado State University, Fort Collins, CO 80523-1877, USA, e-mail: [email protected] (Ch. 9)

J. Bunge, Department of Statistical Science & Department of Social Statistics, Cornell University, Ithaca, New York 14853-3901, USA, e-mail: [email protected] (Ch. 10)

    R. Chakraborty, Human Genetics Center, School of Public Health, University of Texas, Health Science Center, P.O. Box 20334, Houston, TX 77225, USA, e-mail: [email protected] (Ch. 8)

H. Daduna, Institute of Mathematical Stochastics, Department of Mathematics, University of Hamburg, Bundesstrasse 55, D-20146, Hamburg, Germany, e-mail: [email protected] (Ch. 11)


    I. Dewan, Indian Statistical Institute, 7, S.J.S. Sansanwal Marg, New Delhi 110 016, India, e-mail: [email protected] (Ch. 20)

    P. Embrechts, Department of Mathematics, ETHZ, CH-8092 Zurich, Switzerland, e-mail: [email protected] (Ch. 12)

R. Frey, Swiss Banking Institute, University of Zurich, Plattenstr. 14, CH-8032 Zurich, Switzerland, e-mail: [email protected] (Ch. 12)

H. Furrer, Swiss Re Life & Health, Mythenquai 50/60, CH-8002 Zurich, Switzerland, e-mail: [email protected] (Ch. 12)

    C. M. Goldie, School of Mathematical Sciences, University of Sussex, Brighton BN1 9QH, England, e-mail: [email protected] (Ch. 10)

D. R. Grey, Department of Probability and Statistics, The University of Sheffield, Sheffield, S3 7RH, UK, e-mail: [email protected] (Ch. 13)

    Y. Kakihara, Department of Mathematics, University of California, Riverside, CA 92521-0135, USA, e-mail: [email protected] (Ch. 14)

M. Kijima, Faculty of Economics, Tokyo Metropolitan University, 1-1 Minami-Ohsawa, Hachioji, Tokyo 192-0397, Japan, e-mail: [email protected] (Ch. 15)

    M. Kimmel, Department of Statistics, Rice University, P.O. Box 1892, Houston, TX 77251, USA, e-mail: [email protected] (Ch. 8)

A. Kłopotowski, Institut Galilée, Université Paris XIII, 93430 Villetaneuse cedex, France, e-mail: [email protected] (Ch. 16)

    H. Li, Department of Pure and Applied Mathematics, Washington State University, Pullman, Washington 99164-3113, USA, e-mail: [email protected] (Ch. 15)

    W. V. Li, Department of Mathematical Sciences, University of Delaware, Newark, DE 19716, USA, e-mail: [email protected] (Ch. 17)

R. K. Milne, Department of Mathematics and Statistics, The University of Western Australia, Nedlands 6907, Australia, e-mail: [email protected] (Ch. 18)

M. G. Nadkarni, Department of Mathematics, University of Mumbai, Kalina, Mumbai 400098, India, e-mail: [email protected] (Ch. 16)

B. L. S. P. Rao, Indian Statistical Institute, 7, S.J.S. Sansanwal Marg, New Delhi, 110016, India, e-mail: [email protected] (Chs. 19, 20)

C. R. Rao, Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA, e-mail: [email protected] (Ch. 21)

M. M. Rao, Department of Mathematics, University of California, Riverside, California 92521-0135, USA, e-mail: [email protected] (Ch. 22)

M. Shaked, Department of Mathematics, University of Arizona, Tucson, AZ 85721, USA, e-mail: [email protected] (Ch. 15)

D. N. Shanbhag, Probability and Statistics Section, School of Mathematics and Statistics, University of Sheffield, Sheffield, S3 7RH, UK, e-mail: [email protected] (Ch. 21)

Q.-M. Shao, Department of Mathematics, University of Oregon, Eugene, OR 97403, USA, e-mail: [email protected] (Ch. 17)


R. L. Tweedie, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0378, USA, e-mail: tweedie@space.state.colostate.edu (Ch. 23)

    S. R. S. Varadhan, New York University, Courant Institute of Mathematical Sciences, New York, NY 10012-1185, USA, e-mail: [email protected] (Ch. 24)

    A. N. Vidyashankar, Department of Statistics, University of Georgia, Athens, GA 30612, USA, e-mail: [email protected] (Ch. 2)

    S. Watanabe, Department of Mathematics, Graduate School of Science, Kyoto University, Kyoto 606-8502 Japan, e-mail: [email protected] (Ch. 25)

    E. C. Waymire, Department of Mathematics, Oregon State University, Corvallis, OR 97331, USA, e-mail: [email protected] (Ch. 6)

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 © 2001 Elsevier Science B.V. All rights reserved.

    Pareto Processes

Barry C. Arnold

1. Introduction

As Vilfredo Pareto (1897) observed, many economic variables have heavy tailed distributions not well modelled by the normal curve. Instead, he proposed a model subsequently christened, in his honor, the Pareto distribution. The defining feature of this distribution is that its survival function P(X > x) decreases at the rate of a negative power of x as x → ∞. Thus we have

P(X > x) ∼ c x^{−α}  as x → ∞ .   (1.1)

A spectrum of generalizations of Pareto's distribution has been proposed for modelling economic variables. A convenient survey may be found in Arnold (1983). The classical Pareto distribution has a survival function of the form

F̄_X(x) = (x/σ)^{−α},  x > σ   (1.2)

where σ > 0 is a scale parameter and α > 0 is an inequality parameter. If X has distribution (1.2) we will write X ∼ P(I)(σ, α).

A minor modification of (1.2), obtained by introducing a location parameter, is as follows:

F̄_X(x) = [1 + (x − μ)/σ]^{−α},  x > μ .   (1.3)

Here μ is a location parameter, σ a scale parameter and α an inequality parameter. If X has distribution (1.3) we write X ∼ P(II)(μ, σ, α).

A third variant of Pareto's distribution has as its survival function

F̄_X(x) = [1 + ((x − μ)/σ)^{α}]^{−1},  x > μ   (1.4)

where μ is a location parameter, σ is a scale parameter and α is an inequality parameter. If X has distribution (1.4), we write X ∼ P(III)(μ, σ, α).

Clearly all three of the Pareto distributions (1.2)-(1.4) exhibit the tail behavior (1.1) postulated by Pareto. In practice, it is difficult to discriminate between models (1.3) and (1.4), and the choice may justifiably be made on the basis of which model is mathematically more tractable. In Section 2 we will review distributional properties of these Pareto models. However, in economic applications one rarely encounters random samples from specific distributions. More commonly one encounters realizations of stochastic processes. An argument (modelled after Pareto's arguments) can be made to justify an assumption that the observed processes will have Pareto marginal distributions. The present chapter will have as its chief focus a survey of stationary stochastic processes with Pareto marginal distributions. They will provide reasonable alternatives to the normal and/or log-normal processes which are frequently used to model economic time series. One-dimensional processes will generally be described, but natural extensions to multivariate settings will be pointed out along the way.
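For concreteness, the three survival functions (1.2)-(1.4) are easy to write down directly. The following minimal sketch (plain Python; the function names and parameter values are ours and purely illustrative) evaluates each and exhibits the common power-law tail behavior (1.1).

```python
# Survival functions of the three Pareto variants (1.2)-(1.4).
# Parameter names follow the text: sigma = scale, mu = location,
# alpha = inequality parameter.

def surv_pareto1(x, sigma, alpha):
    """Classical Pareto (I): (x/sigma)**(-alpha) for x > sigma."""
    return (x / sigma) ** (-alpha) if x > sigma else 1.0

def surv_pareto2(x, mu, sigma, alpha):
    """Pareto (II): (1 + (x-mu)/sigma)**(-alpha) for x > mu."""
    return (1.0 + (x - mu) / sigma) ** (-alpha) if x > mu else 1.0

def surv_pareto3(x, mu, sigma, alpha):
    """Pareto (III): 1/(1 + ((x-mu)/sigma)**alpha) for x > mu."""
    return 1.0 / (1.0 + ((x - mu) / sigma) ** alpha) if x > mu else 1.0

# All three tails decay like a negative power of x, as in (1.1):
for x in (1e3, 1e6):
    print(surv_pareto1(x, 1.0, 2.0),
          surv_pareto2(x, 0.0, 1.0, 2.0),
          surv_pareto3(x, 0.0, 1.0, 2.0))
```

For large x the three printed values agree to leading order, which is exactly the sense in which (1.3) and (1.4) are hard to tell apart in practice.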

    2. Distributional properties of Pareto variables

    A convenient reference for distributional properties of the Pareto and various generalized Pareto distributions is Arnold (1983). In this section we will document only those properties that play salient roles in the development of the stochastic processes to be described in subsequent sections.

The classical Pareto distribution is intimately related to the (shifted or translated) exponential distribution. A random variable Y has a translated exponential distribution, written Y ∼ Texp(μ, σ), if it admits the representation

Y = μ + σZ   (2.1)

where μ ∈ ℝ, σ ∈ ℝ⁺ and Z has a standard exponential distribution; i.e.

P(Z > z) = e^{−z},  z > 0 .   (2.2)

If X has a classical Pareto distribution, specifically X ∼ P(I)(σ, α), then if we define Y = log X it is readily verified that Y ∼ Texp(log σ, 1/α). In particular the logarithm of a classical Pareto variable with unit scale parameter, i.e. P(I)(1, α), will have an exponential distribution with scale parameter 1/α or, equivalently, intensity α.
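This log-exponential relationship is easily checked by simulation; the sketch below (our own construction, arbitrary parameter values, sampling by the standard inversion method) transforms a classical Pareto sample and inspects the first two moments of the rescaled logarithm.

```python
import numpy as np

# If X ~ P(I)(sigma, alpha), then log X ~ Texp(log sigma, 1/alpha);
# equivalently, alpha*(log X - log sigma) is standard exponential.
rng = np.random.default_rng(0)
sigma, alpha, n = 2.0, 3.0, 200_000

u = rng.random(n)
x = sigma * u ** (-1.0 / alpha)           # P(I)(sigma, alpha) by inversion
z = alpha * (np.log(x) - np.log(sigma))   # candidate standard exponential

print(z.min() >= 0.0, z.mean(), z.var())  # mean and variance both near 1
```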

If X_1, X_2, ... are i.i.d. exponential random variables with common intensity λ and if N, independent of the X_i's, has a geometric (p) distribution, i.e., P(N = n) = p(1 − p)^{n−1}, n = 1, 2, ..., then Y = Σ_{i=1}^N X_i will again have an exponential distribution with intensity pλ. Thus pY =ᵈ X_1 (where here and henceforth, =ᵈ indicates equality in distribution). Perhaps the simplest verification of the fact that pY =ᵈ X_1 involves evaluation of the moment generating function of pY by conditioning on N. If we recall the simple relationship between the classical Pareto distribution and the exponential distribution, we may immediately write down a distributional result involving geometric products of Pareto variables. Specifically, if X_1, X_2, ... are i.i.d. P(I)(1, α) random variables and if N, independent of the X_i's, has a geometric (p) distribution, then if we define

Y = ∏_{i=1}^N X_i

we find that

Y^p =ᵈ X_1 .   (2.3)

Since minima of independent exponential random variables are exponentially distributed, a similar property holds for classical Pareto variables. Specifically, if X_1, ..., X_n are independent random variables with X_i ∼ P(I)(σ, α_i), i = 1, 2, ..., n, then

min_{i=1,...,n} X_i ∼ P(I)(σ, Σ_{i=1}^n α_i) .   (2.4)

As we shall see, properties (2.3) and (2.4) will be useful in constructing a variety of stochastic processes with classical Pareto marginals.
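Both properties lend themselves to quick Monte Carlo checks. The sketch below (our own construction, with arbitrary parameter values) compares quantiles for the geometric-product property (2.3) and a tail probability for the minimum property (2.4).

```python
import numpy as np

# Monte Carlo check of the geometric-product property (2.3) and the
# minimum property (2.4) for classical Pareto variables.
rng = np.random.default_rng(1)
alpha, p, n = 2.0, 0.3, 100_000

def pareto1(size):
    """i.i.d. P(I)(1, alpha) sample via inversion."""
    return rng.random(size) ** (-1.0 / alpha)

# (2.3): with N ~ geometric(p), Y = prod_{i<=N} X_i satisfies Y**p =d X_1.
N = rng.geometric(p, size=n)
y_p = np.exp(p * np.array([np.log(pareto1(k)).sum() for k in N]))
x1 = pareto1(n)
print(np.quantile(y_p, [0.5, 0.9]))   # should roughly match the next line
print(np.quantile(x1, [0.5, 0.9]))

# (2.4): min of two independent P(I)(1, alpha) variables is P(I)(1, 2*alpha).
m = np.minimum(pareto1(n), pareto1(n))
x0 = 1.5
print((m > x0).mean(), x0 ** (-2.0 * alpha))   # empirical vs. exact survival
```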

The Pareto (III) family of distributions is closed under geometric minimization. Specifically, if X_1, X_2, ... are independent, identically distributed P(III)(μ, σ, α) variables and if N, independent of the X_i's, has a geometric (p) distribution, we define

Y = min_{i≤N} X_i .   (2.5)

One then finds that μ + p^{−1/α}(Y − μ) =ᵈ X_1, as is readily verified from the survival function of Y = min_{i≤N} X_i.

3. Multivariate Pareto distributions

A natural k-dimensional version of the Pareto (II) distribution (1.3) is a translated form of Mardia's multivariate Pareto distribution, with joint survival function

F̄_X(x) = [1 + Σ_{i=1}^k (x_i − μ_i)/σ_i]^{−α},  x_i > μ_i, i = 1, 2, ..., k .   (3.1)

For any k_1 < k, all k_1-dimensional marginals are again of the (k_1-dimensional) form (3.1) (merely set "unwanted" x_i's equal to their corresponding values μ_i in (3.1)). To indicate partitioning of the k-dimensional vector X into k_1- and (k − k_1)-dimensional subvectors, we use the notation X = (X^{(1)}, X^{(2)}). Analogously we partition μ = (μ^{(1)}, μ^{(2)}) and σ = (σ^{(1)}, σ^{(2)}). Then we have

X^{(1)} ∼ MP^{(k_1)}(II)(μ^{(1)}, σ^{(1)}, α)   (3.2)

and of course, univariate marginals are of the Pareto (II) form displayed in (1.3). Conditional distributions are also in the same family. One finds

X^{(1)} | X^{(2)} = x^{(2)} ∼ MP^{(k_1)}(II)(μ^{(1)}, c(x^{(2)}) σ^{(1)}, α + k − k_1)   (3.3)

where

c(x^{(2)}) = 1 + Σ_{i=k_1+1}^k (x_i − μ_i)/σ_i .

A convenient stochastic representation of the multivariate Pareto (II) distribution is available. One may construct an MP^{(k)}(II) random vector X by defining

X_i = μ_i + σ_i (W_i / Z),  i = 1, 2, ..., k   (3.4)

where the W_i's are independent standard exponential variables and where Z (independent of the W_i's) is a Γ(α, 1) random variable.
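The representation (3.4) is easy to exercise numerically. The sketch below (illustrative parameters of our own choosing) simulates it and compares an empirical marginal survival probability with (1.3) and an empirical joint survival probability with the Mardia-type form (3.1).

```python
import numpy as np

# Simulate (3.4): X_i = mu_i + sigma_i*(W_i/Z), with W_i i.i.d. standard
# exponential and Z ~ Gamma(alpha, 1) independent of the W_i's.
rng = np.random.default_rng(2)
k, n, alpha = 3, 200_000, 2.5
mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([1.0, 2.0, 0.5])

W = rng.exponential(size=(n, k))
Z = rng.gamma(alpha, size=(n, 1))
X = mu + sigma * (W / Z)

# Univariate marginal vs. the Pareto (II) survival (1.3):
x0 = 2.0
print((X[:, 0] > x0).mean(), (1.0 + (x0 - mu[0]) / sigma[0]) ** (-alpha))

# Joint survival at mu + sigma vs. the MP(k)(II) form (3.1):
# here 1 + sum of (x_i - mu_i)/sigma_i equals 1 + k.
print((X > mu + sigma).all(axis=1).mean(), (1.0 + k) ** (-alpha))
```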

A minor modification of the representation (3.4) yields a multivariate Pareto (III) distribution. We will have X ∼ MP^{(k)}(III)(μ, σ, α) if

X_i = μ_i + σ_i (W_i / Z)^{1/α_i},  i = 1, 2, ..., k   (3.5)

where W_1, W_2, ..., W_k and Z are independent standard exponential variables. The corresponding joint survival function is of the form

F̄_X(x) = [1 + Σ_{i=1}^k ((x_i − μ_i)/σ_i)^{α_i}]^{−1},  x_i > μ_i, i = 1, 2, ..., k .   (3.6)

This will have k_1-dimensional marginals of the MP^{(k_1)}(III) form. The conditionals are, it turns out, of a related Pareto (IV) form (see Arnold, 1983, Chapter 6 for details).
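Analogously, (3.5) can be simulated and checked against (3.6); the parameters in the sketch below are again only illustrative.

```python
import numpy as np

# Simulate (3.5): X_i = mu_i + sigma_i*(W_i/Z)**(1/alpha_i), with
# W_1,...,W_k, Z independent standard exponentials, and compare the
# empirical joint survival with (3.6).
rng = np.random.default_rng(3)
k, n = 2, 300_000
mu = np.array([0.0, 0.0])
sigma = np.array([1.0, 2.0])
alpha = np.array([1.5, 3.0])

W = rng.exponential(size=(n, k))
Z = rng.exponential(size=(n, 1))
X = mu + sigma * (W / Z) ** (1.0 / alpha)

x = np.array([1.0, 3.0])
emp = (X > x).all(axis=1).mean()
theo = 1.0 / (1.0 + (((x - mu) / sigma) ** alpha).sum())
print(emp, theo)
```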

Properties of the classical Pareto and the Pareto (III) distribution, in conjunction with the information provided by the representations (3.4) and (3.5), can be used to generate a wide variety of alternative multivariate Pareto distributions. For example, we know from (3.5) that certain scale mixtures of Weibull variables yield Pareto (III) variables. In (3.5) the W_i's were taken to be independent standard exponential variables. Instead, the joint distribution of W could be taken to be any one of the wide variety of multivariate exponential distributions (with standard exponential marginals) available in the literature. Arnold (1975) and Block (1975) catalog a variety of such distributions. See also Hutchinson and Lai (1990).

An alternative route, again relying on the available plethora of multivariate exponential distributions, involves marginal transformations. Begin with W having standard exponential marginals. Then define

X_i = μ_i + σ_i (e^{W_i/α} − 1),  i = 1, 2, ..., k   (3.7)

to get a multivariate Pareto (II) vector X (set μ_i = σ_i to get a k-variate Pareto (I) distribution). Alternatively define

X_i = μ_i + σ_i (e^{W_i} − 1)^{1/α_i},  i = 1, 2, ..., k   (3.8)

to get a multivariate Pareto (III) distribution. For example, a suitable choice of multivariate exponential distribution for W (namely, one for which F̄_W(w) = [Σ_{i=1}^k e^{w_i/α} − k + 1]^{−α}, w > 0) yields, using (3.7), the MP^{(k)}(II)(μ, σ, α) displayed in (3.1).
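As a quick sanity check on the first transformation (one coordinate suffices, since (3.7) acts marginally), the following sketch uses illustrative parameter values of our own choosing:

```python
import numpy as np

# Marginal check of the transformation (3.7): for a standard exponential W,
# X = mu + sigma*(exp(W/alpha) - 1) should be Pareto (II)(mu, sigma, alpha).
rng = np.random.default_rng(4)
mu, sigma, alpha, n = 1.0, 2.0, 3.0, 200_000

W = rng.exponential(size=n)
X = mu + sigma * np.expm1(W / alpha)   # expm1(t) = exp(t) - 1

x0 = 4.0
print((X > x0).mean(), (1.0 + (x0 - mu) / sigma) ** (-alpha))
```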

The fact that geometric sums of independent exponential variables are again exponential can be, and has been, used extensively to generate multivariate exponential distributions. One can use dependent or independent geometric random variables in conjunction with vector random variables with dependent or independent exponential marginals. A parallel spectrum of possible constructions involves geometric minima.

Suppose X_1, X_2, ... are independent, identically distributed random vectors and suppose that N, independent of the X_j's, has a geometric (p) distribution. Using a coordinatewise definition of the minimum of random vectors, we may define

Y = min_{i≤N} X_i .   (3.9)

Now suppose that for some vector c(p) we have

c(p) Y =ᵈ X_1 ,   (3.10)

where the multiplication of vectors is assumed to be done coordinatewise. We, following Rachev and Resnick (1991), will say that the common distribution of the X_i's is min-geometric stable (max-geometric stability is defined in a parallel fashion). In one dimension, for (3.10) to hold for every p, it was necessary and sufficient that the X's have Pareto (III) distributions. In k dimensions a rich class of solutions to (3.10) is available. We know that the marginals must be of the Pareto (III) form. However, the joint distribution can have a variety of forms. Using results in Resnick (1987, Chapter 5) and in Rachev and Resnick (1991), relating min-geometric stable distributions to min-stable distributions, which are in turn related to multivariate extreme value distributions, we arrive at the following characterization of joint distributions for the X's such that (3.10) holds for every p ∈ (0, 1). There must exist non-negative integrable functions f_i(s), i = 1, 2, ..., k on [0, 1] satisfying

∫_0^1 f_i(s) ds = 1,  i = 1, 2, ..., k

and

F̄_X(x) = [1 + ∫_0^1 max_{1≤i≤k} {f_i(s) ((x_i − μ_i)/σ_i)^{α_i}} ds]^{−1} .   (3.11)

A particular choice of the f_i's in (3.11) yields

F̄_X(x) = [1 + Σ_{i=1}^k ((x_i − μ_i)/σ_i)^{α_i}]^{−1},  x_i > μ_i, i = 1, 2, ..., k   (3.12)

(a simple transform of a family of k-dimensional logistic distributions discussed by Strauss, 1979).

Semi-Pareto variants of (3.11) are available (they will generally satisfy (3.10) for just one fixed value of p). They are of the form

F̄_X(x) = [1 + ∫_0^1 max_{1≤i≤k} {f_i(s) g_i(x_i)} ds]^{−1}   (3.13)

for suitable choices of the functions g_i. If Y is the geometric (p) minimum defined in (3.9), its joint survival function is

F̄_Y(y) = p F̄_X(y) / [1 − (1 − p) F̄_X(y)] ,   (3.14)

so that, beginning with X_j's distributed as in (3.12), we obtain

F̄_Y(y) = [1 + (1/p) Σ_{i=1}^k ((y_i − μ_i)/σ_i)^{α_i}]^{−1} .   (3.15)

Equation (3.15) provides another example of a min-geometric stable Pareto (III) distribution (because a geometric sum of i.i.d. geometric random variables is again geometric). This will be true for any Y obtained via (3.9) beginning with any choice of distribution of the X_j's with Pareto (III) marginals. Thus, repeated geometric minimization will not lead to a broader class than (3.11). In fact, we have an alternative description of the class of all k-dimensional min-geometric stable Pareto (III) distributions. They are of the form (3.9) for some p ∈ (0, 1) and some arbitrary choice of F_X(x) with Pareto (III) marginals.
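The one-dimensional content of this stability property can be checked directly by simulation. The sketch below (our own construction, standard P(III)(0, 1, α) case with arbitrary α and p) rescales a geometric minimum and compares a survival probability with that of a single Pareto (III) variable.

```python
import numpy as np

# One-dimensional min-geometric stability: for X_i ~ P(III)(0, 1, alpha)
# and N ~ geometric(p), p**(-1/alpha) * min_{i<=N} X_i =d X_1.
rng = np.random.default_rng(5)
alpha, p, n = 2.0, 0.25, 100_000

def pareto3(size):
    """i.i.d. P(III)(0, 1, alpha) via inversion of F(x) = x^a/(1 + x^a)."""
    u = rng.random(size)
    return (u / (1.0 - u)) ** (1.0 / alpha)

N = rng.geometric(p, size=n)
Y = np.array([pareto3(k).min() for k in N])
Ystar = p ** (-1.0 / alpha) * Y

x0 = 1.0
print((Ystar > x0).mean(), 1.0 / (1.0 + x0 ** alpha))  # both near 1/2
```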

A broader class of k-dimensional Pareto (III) variables can be encountered if we define Y coordinatewise by

Y_i = min_{j≤N_i} X_{ij} ,   (3.16)

where the N_i's are geometric random variables which may be dependent (for example, with a joint distribution related to the class of multivariate exponential distributions introduced in Arnold (1975)). See Arnold (1990) for an example of a highly flexible family of multivariate Pareto (III) distributions derived using a geometric minimization model with dependent N_i's.

It will be recalled that in one dimension, geometric-product stability was a salient property of the Pareto (I)(σ, α) distribution. It is natural to seek multivariate parallels of this. A k-dimensional random vector X has a distribution that is geometric-multiplication stable if, whenever we take X_1, X_2, ... i.i.d. with the same distribution as X and N, independent of the X_j's, being a geometric (p) random variable, we have a(p) ∏_{j≤N} X_j =ᵈ X for some vector a(p), the multiplication of vectors again being done coordinatewise. Coordinatewise logarithms reduce geometric multiplication to geometric summation, so it is convenient to work with the transform (defined assuming X_j > 1 for every j)

φ_X(s) = E(∏_{j=1}^k X_j^{s_j}),  s < 0 .   (3.17)

Referring to Mittnik and Rachev's representation theorem for multivariate geometric summation stable distributions (using Laplace transforms instead of characteristic functions, since our random vector Z has non-negative coordinates) we may write

φ_X(s) = [1 − log ψ(s)]^{−1}   (3.18)

where ψ(s) is the Laplace transform of a non-negative stable random vector with degenerate marginal distributions, i.e., of some stable non-negative random vector Z with Z_j = c_j w.p. 1 for each j. This last curious constraint is required to ensure that the marginals of X are of the Pareto (I) form, as desired.

As we have seen, some of the multivariate Pareto distributions introduced above have, in addition to Pareto marginal distributions, Pareto conditional distributions. It is of interest to explore the class of distributions with Pareto conditionals since they may provide plausible alternatives to the usual multivariate Pareto distributions. For notational convenience we will focus our discussion on the bivariate case. A helpful reference on distributions with specified conditionals is Arnold et al. (1992).


Suppose that we wish to identify all bivariate distributions for a random vector (X, Y) such that all of its conditionals are of the Pareto (II) form, i.e., such that for each y > 0

X | Y = y ∼ Pareto (II)(μ_1(y), σ_1(y), α_1(y))   (3.19)

and for each x > 0,

Y | X = x ∼ Pareto (II)(μ_2(x), σ_2(x), α_2(x)) .   (3.20)

We saw earlier that the modified Mardia bivariate Pareto (II) distribution, (3.1), had Pareto (II) conditionals (see (3.3)). In addition, it has Pareto (II) marginals. If we wish to identify all possible distributions with Pareto (II) conditionals as in (3.19)-(3.20), we need to consider the following equation, obtained by writing the joint density of (X, Y) as a product of a marginal and a conditional density in the two possible ways:

f_Y(y) [α_1(y)/σ_1(y)] [1 + (x − μ_1(y))/σ_1(y)]^{−(α_1(y)+1)} I(x > μ_1(y))
  = f_X(x) [α_2(x)/σ_2(x)] [1 + (y − μ_2(x))/σ_2(x)]^{−(α_2(x)+1)} I(y > μ_2(x)) .   (3.21)

In its most general form, (3.21) is difficult to solve. It can be readily solved in the special case in which we assume μ_1(y) = μ_1, μ_2(x) = μ_2, α_1(y) = α_2(x) = α. In this case, following arguments presented in Arnold (1987), we find that the joint density of (X, Y) must be of the form

f_{X,Y}(x, y) = c(λ) [λ_0 + λ_1(x − μ_1) + λ_2(y − μ_2) + λ_{12}(x − μ_1)(y − μ_2)]^{−(α+1)} I(x > μ_1) I(y > μ_2)   (3.22)

for suitable choices of the parameters λ_0, λ_1, λ_2, λ_{12}; c(λ) is a normalizing constant chosen to ensure that the density integrates to 1.

The location parameters μ_1 and μ_2 can assume any real value. The λ's must all be non-negative and, to ensure integrability, we must distinguish two cases:

(i) α ∈ (0, 1]. In this case we require λ_1 > 0, λ_2 > 0, λ_{12} > 0 and λ_0 ≥ 0.
(ii) α ∈ (1, ∞). In this case we must have λ_0 > 0, λ_1 > 0, λ_2 > 0 and λ_{12} ≥ 0.

If the joint density is of the form (3.22), the corresponding conditional distributions are of the following simple form:

X | Y = y ∼ Pareto (II)(μ_1, [λ_0 + λ_2(y − μ_2)] / [λ_1 + λ_{12}(y − μ_2)], α)   (3.23)

and

Y | X = x ∼ Pareto (II)(μ_2, [λ_0 + λ_1(x − μ_1)] / [λ_2 + λ_{12}(x − μ_1)], α) .   (3.24)
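The conditional scale in (3.23) can be confirmed numerically: fix Y = y, normalize the corresponding slice of the joint density (3.22) on a grid, and compare the resulting conditional survival probability with the stated Pareto (II) form. All parameter values in the sketch below are illustrative.

```python
import numpy as np

# Numerical check of (3.23) against a normalized slice of (3.22).
l0, l1, l2, l12 = 1.0, 0.5, 0.8, 0.3   # lambda_0, lambda_1, lambda_2, lambda_12
mu1, mu2, alpha = 0.0, 0.0, 2.0
y = 1.5

u = np.linspace(0.0, 400.0, 400_001)   # grid of x - mu1 values
du = u[1] - u[0]
slice_ = (l0 + l1 * u + (l2 + l12 * u) * (y - mu2)) ** (-(alpha + 1.0))
slice_ /= slice_.sum() * du            # normalize the conditional density

emp = slice_[u > 2.0].sum() * du       # numeric P(X - mu1 > 2 | Y = y)
sig = (l0 + l2 * (y - mu2)) / (l1 + l12 * (y - mu2))
theo = (1.0 + 2.0 / sig) ** (-alpha)   # Pareto (II)(mu1, sig, alpha) survival
print(emp, theo)
```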

We turn next to consider joint densities for (X, Y) which have Pareto (III) conditionals, i.e., such that for every y > 0,

X | Y = y ∼ Pareto (III)(μ_1(y), σ_1(y), α_1(y))   (3.25)

and for each x > 0,

Y | X = x ∼ Pareto (III)(μ_2(x), σ_2(x), α_2(x)) .   (3.26)

Writing the joint density as a product of a marginal and a conditional density in the two possible ways yields the following equation:

f_Y(y) [α_1(y)/σ_1(y)] ((x − μ_1(y))/σ_1(y))^{α_1(y)−1} [1 + ((x − μ_1(y))/σ_1(y))^{α_1(y)}]^{−2} I(x > μ_1(y))
  = f_X(x) [α_2(x)/σ_2(x)] ((y − μ_2(x))/σ_2(x))^{α_2(x)−1} [1 + ((y − μ_2(x))/σ_2(x))^{α_2(x)}]^{−2} I(y > μ_2(x)) .   (3.27)

Equation (3.27), like (3.21), is difficult to solve in general. A special case, in which solution is straightforward, occurs when μ_1(y) = μ_1, μ_2(x) = μ_2 and α_1(y) = α_1, α_2(x) = α_2. In such a case, it is evident that X̃ = (X − μ_1)^{α_1} and Ỹ = (Y − μ_2)^{α_2} will have Pareto (II) conditionals (with μ = 0 and α = 1). It follows from our earlier observations that

f_{X̃,Ỹ}(x̃, ỹ) = c(λ) [1 + λ_1 x̃ + λ_2 ỹ + λ_{12} x̃ ỹ]^{−2} I(x̃ > 0) I(ỹ > 0)   (3.28)

and then, transforming back to (X, Y), we find

f_{X,Y}(x, y) = c(λ) α_1 α_2 (x − μ_1)^{α_1−1} (y − μ_2)^{α_2−1} [1 + λ_1(x − μ_1)^{α_1} + λ_2(y − μ_2)^{α_2} + λ_{12}(x − μ_1)^{α_1}(y − μ_2)^{α_2}]^{−2} I(x > μ_1) I(y > μ_2)   (3.29)

where λ_1 > 0, λ_2 > 0, λ_{12} > 0, α_1 > 0, α_2 > 0 and μ_1 ∈ ℝ, μ_2 ∈ ℝ. If the joint density of (X, Y) is of the form (3.29) then the corresponding conditional distributions are of a relatively simple form:

X | Y = y ∼ P(III)(μ_1, {[1 + λ_2(y − μ_2)^{α_2}] / [λ_1 + λ_{12}(y − μ_2)^{α_2}]}^{1/α_1}, α_1)   (3.30)

and

Y | X = x ∼ P(III)(μ_2, {[1 + λ_1(x − μ_1)^{α_1}] / [λ_2 + λ_{12}(x − μ_1)^{α_1}]}^{1/α_2}, α_2) .   (3.31)

In Section 8, we will describe certain Markov processes with transition probabilities governed by the conditional distributions in (3.23) and (3.30). In order for these chains to be stationary we require exchangeable versions of the joint densities (3.22) and (3.29) (i.e., λ_1 = λ_2, μ_1 = μ_2 and, in (3.29), α_1 = α_2).

Note that the models (3.22) and (3.29), which have Pareto conditionals, in general do not have Pareto marginals. The exceptional case occurs in model (3.22) if λ_{12} = 0 (and necessarily α > 1), in which situation the model reduces to Mardia's bivariate Pareto distribution (cf. (3.1); note that the α in (3.1) is one less than the α in (3.22)).

Multivariate versions of the Pareto conditionals densities (3.22) and (3.29) are readily identified. They will have conditional densities of X_i given X_{(i)} = x_{(i)} (where X_{(i)} is X with the ith coordinate deleted) of, respectively, Pareto (II) or Pareto (III) form. They are

(a) Multivariate Pareto (II) conditionals distribution:

f_X(x) = [Σ_{δ∈ξ_k} λ_δ ∏_{i=1}^k (x_i − μ_i)^{δ_i}]^{−(α+1)} I(x > μ)   (3.32)

where ξ_k is the set of all vectors of 0's and 1's of dimension k. All the λ_δ's are non-negative in (3.32). Some, but not all, can be zero.

(b) Multivariate Pareto (III) conditionals distribution:

f_X(x) = ∏_{i=1}^k [α_i (x_i − μ_i)^{α_i−1}] [Σ_{δ∈ξ_k} λ_δ ∏_{i=1}^k (x_i − μ_i)^{δ_i α_i}]^{−2} I(x > μ)   (3.33)

where ξ_k is as defined following (3.32). The λ_δ's in (3.33) are non-negative and again, some but not all can be zero.

    4. Pareto processes

    Any stochastic process whose marginal distributions are of the Pareto and/or generalized Pareto form could legitimately be called a Pareto process. Un- doubtedly, the same could be said for any Markov process whose transition distributions are of the Pareto form. It is clearly impossible to survey all such processes. Our attention will be concentrated on processes which can be said to be autoregressive in nature and whose structure mirrors the classical normal auto- regressive process, the differences of course being,

(i) Pareto distributions will play the role played by normal distributions, and
(ii) geometric multiplication or minimization will replace addition in modeling the dependence of the value of the process at time n on its values at previous times.

    The classical normal autoregressive processes have proved to be flexible and useful modeling tools. The Pareto processes to be introduced can be expected to

  • 12 B. C. Arnold

better model time series with heavy tailed marginals. As we shall see, a variety of sample path behaviors will be encountered among the different models, and selection of the appropriate model will depend on knowledge of the typical nature of the stochastic evolution of the process being modelled. For example, a series exhibiting frequent flat spots of random duration in its trajectory might be well fitted by a particular type of Pareto process, while a series exhibiting steady growth interrupted by random catastrophic decreases might be better fitted by an alternative model. We will introduce, in Sections 5 and 6, a spectrum of first order autoregressive models in which the value of the process at time n depends on its value at the immediately preceding time point, n - 1, and on an independent "innovation" variable, as with normal processes. Subsequently, in Section 7, we will discuss variant processes, including absolutely continuous versions, higher order autoregressions, analogs of moving averages and ARMA models, and multivariate processes. The discussion in Sections 5 and 6 will be quite detailed. The development of higher order models will be considerably more cursory. It is recognized that, in applications, higher order models (and models with even more bells and whistles) need to be considered; however, the basic ideas discussed in Sections 5 and 6 should be adequate to permit the interested researcher to flesh out the needed details on a case by case basis. In Section 8, a brief introduction is provided to related topics such as semi-Pareto processes, general minification processes, processes with Pareto conditionals and processes involving Markovian minimization.

5. Autoregressive classical Pareto processes

The two basic processes to be presented are direct transformations of well-known exponential processes, those introduced by Lawrence and Lewis (1981) and by Gaver and Lewis (1980). It thus seems appropriate to label the processes with the names of these researchers.

    5.1. The Lawrence-Lewis classical Pareto process

Begin with a sequence, the innovation sequence, {ε_n}_{n=1}^∞, of independent identically distributed P(I)(1, α) random variables (recall (1.2)). We will use these to construct an autoregressive stationary stochastic process with the Pareto (I) (σ, α) distribution as its stationary distribution. For n = 1, 2, . . . define

X_n = σ ε_n^p            with probability p
    = X_{n-1} ε_n^p      with probability 1 - p        (5.1)

where p ∈ (0, 1). Such a process will be called a first order Lawrence-Lewis Pareto (I) process or, more briefly, an LLP(I)(1) process (a first order autoregressive Pareto (I) process). The notation is clearly designed to extend to higher order processes. If we introduce a sequence {U_n}_{n=1}^∞ of i.i.d. Bernoulli (p) random variables (independent of the ε_n's) the process can be described by a more compact formulation:


X_n = (X_{n-1})^{1-U_n} σ^{U_n} ε_n^p .        (5.2)

If we take logarithms in this expression, we arrive at the Lawrence and Lewis (1981) NEAR process with exponential marginals. If a process, {X_n}, is defined using (5.1) (equivalently (5.2)), then provided that X_0 ~ P(I)(σ, α), it is readily verified that, for every n, X_n ~ P(I)(σ, α). It is then, as advertised, a stationary process with the desired common distribution of the X_n's. The influence of the parameters α and p on the sample path behavior of the process can be appreciated upon viewing the simulated sample paths displayed in Figure 1. Note that if the initial distribution (that of X_0) is not of the Pareto (I) (σ, α) form, the process will not be stationary, but it will be the case that X_n converges in distribution to the Pareto (I) (σ, α) form as n → ∞.
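For readers who wish to experiment, a minimal simulation sketch of the recursion (5.1) follows; the function name and parameterization are illustrative only, and ε_n ~ P(I)(1, α) is generated by inverse transform.

```python
import random

def simulate_llp1(sigma, alpha, p, n, seed=1):
    """Simulate n transitions of the first order Lawrence-Lewis
    Pareto (I) process (5.1):
        X_n = sigma * eps_n**p       with probability p
            = X_{n-1} * eps_n**p     with probability 1 - p
    with i.i.d. innovations eps_n ~ Pareto(I)(1, alpha)."""
    rng = random.Random(seed)
    # P(I)(1, alpha) by inverse transform: P(eps > t) = t**(-alpha), t >= 1
    pareto1 = lambda: (1.0 - rng.random()) ** (-1.0 / alpha)
    x = sigma * pareto1()          # start in the stationary P(I)(sigma, alpha) law
    path = [x]
    for _ in range(n):
        eps = pareto1()
        x = (sigma if rng.random() < p else x) * eps ** p
        path.append(x)
    return path
```

With σ = 1 and α = 3, the stationary survival function is P(X_n > t) = t^{-3}, so roughly 12.5% of the simulated values should exceed t = 2, and, by (5.6), about 1/(1 + p) of the one-step moves should be strict increases.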

[Figure 1: twelve panels of simulated sample paths, one for each combination of alpha = 1, 2, 4 and p = .3, .5, .7, .9, each plotted against Time = 0 to 100.]

    Fig. 1. Simulated sample paths of Lawrence-Lewis classical Pareto processes.


Provided that α > 2 (to ensure the existence of second moments) we can compute the autocovariance cov(X_{n-1}, X_n) by conditioning on U_n in (5.2), recalling that X_{n-1}, ε_n, and U_n are independent. We find

cov(X_{n-1}, X_n) = - (1 - p) α² σ² / [(α - 1)²(α - 2)(α - p)]        (5.3)

    and consequently the autocorrelation is given by

ρ(X_{n-1}, X_n) = - (1 - p) α / (α - p) .        (5.4)

The negative sign of the autocorrelation may be used as a diagnostic key in deciding whether a particular time series might appropriately be modelled by a Lawrence-Lewis Pareto (I) process. It is possible to write down expressions for autocorrelations corresponding to lags bigger than 1, but they do not have a simple form. The following expression for E(X_n X_{n+k}) gives the flavor of the complications encountered in such computations

E(X_n X_{n+k}) = [p α² σ² / ((α - 1)(α - p))] [1 - (1 - p)^k (α/(α - p))^k] / [1 - (1 - p)(α/(α - p))]
                + (1 - p)^k (α/(α - p))^k α σ² / (α - 2) .        (5.5)

Fluctuation probabilities (i.e., P(X_{n-1} < X_n)) are potentially useful diagnostic tools for modelling purposes. For the Lawrence-Lewis Pareto (I) process we find

P(X_{n-1} < X_n) = P(X_{n-1} < X_{n-1}^{1-U_n} σ^{U_n} ε_n^p)
                 = p P(X_{n-1} < σ ε_n^p) + (1 - p) P(ε_n^p > 1)
                 = 1/(1 + p) .        (5.6)

Equations (5.4) and (5.6) provide quick consistent method of moments estimates of p and α based on an observed sample path realization from an LLP(I)(1) process.
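A sketch of the resulting estimate of p (the estimator name below is ours, not standard): since (5.6) gives P(X_{n-1} < X_n) = 1/(1 + p), the observed fraction f of strict up-moves yields the consistent estimate p̂ = 1/f - 1; α can then be recovered from the lag 1 sample autocorrelation via (5.4).

```python
def estimate_p_llp1(path):
    """Method of moments estimate of p for an LLP(I)(1) process,
    based on (5.6): P(X_{n-1} < X_n) = 1/(1+p), so p = 1/f - 1,
    where f is the observed fraction of strict up-moves."""
    ups = sum(1 for a, b in zip(path, path[1:]) if a < b)
    f = ups / (len(path) - 1)
    return 1.0 / f - 1.0
```

For example, a strictly alternating sequence such as 1, 2, 1, 2, . . . has f ≈ 1/2 and therefore p̂ ≈ 1.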

5.2. The Gaver-Lewis classical Pareto process

Again we begin with an innovation sequence, {ε_n}_{n=1}^∞, of independent identically distributed P(I)(σ, α) random variables. This time for n = 1, 2, . . . we define

X_n = σ^p X_{n-1}^{1-p} ε_n        with probability p
    = σ^p X_{n-1}^{1-p}            with probability 1 - p        (5.7)

where p ∈ (0, 1). Such a process will be called a first order Gaver-Lewis Pareto (I) process (GLP(I)(1)). Higher order extensions are possible. The logarithm of the process defined by (5.7) is recognizable as the Gaver-Lewis (1980) exponential process, hence the name of the corresponding Pareto process. If {U_n}_{n=1}^∞ is a sequence of i.i.d. Bernoulli (p) random variables, independent of the ε_n's, then we can describe the process in a slightly more succinct form

X_n = σ^p X_{n-1}^{1-p} ε_n^{U_n} .        (5.8)

Such a process has a stationary distribution of the classical Pareto (I) (σ, α) form. Consequently, if X_0 ~ P(I)(σ, α), the process is completely stationary. The influence of the parameters α and p on the sample path behavior of the process can be understood by referring to Figure 2, which displays simulated sample paths for various values of the parameters.
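A minimal simulation sketch of (5.8), taking σ = 1 for simplicity (function name illustrative):

```python
import random

def simulate_glp1(alpha, p, n, seed=1):
    """Simulate the first order Gaver-Lewis Pareto (I) process (5.8),
    taking sigma = 1 for simplicity:
        X_n = X_{n-1}**(1-p) * eps_n**U_n ,
    with U_n ~ Bernoulli(p) and eps_n ~ Pareto(I)(1, alpha)."""
    rng = random.Random(seed)
    pareto1 = lambda: (1.0 - rng.random()) ** (-1.0 / alpha)
    x = pareto1()                  # X_0 in the stationary P(I)(1, alpha) law
    path = [x]
    for _ in range(n):
        x = x ** (1.0 - p)
        if rng.random() < p:       # the innovation enters only when U_n = 1
            x *= pareto1()
        path.append(x)
    return path
```

Runs without an innovation (U_n = 0) decay geometrically on the log scale, and the simulated lag 1 autocorrelation should come out positive, in agreement with (5.10).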

[Figure 2: twelve panels of simulated sample paths, one for each combination of alpha = 1, 2, 4 and p = .3, .5, .7, .9, each plotted against Time = 0 to 100.]

    Fig. 2. Simulated sample paths of Gaver-Lewis classical Pareto processes.


Assuming α > 2, the autocovariance structure of the process can be investigated. We find

cov(X_{n-1}, X_n) = (1 - p) α σ² / [(α + p - 2)(α - 1)²]        (5.9)

    and consequently the lag 1 autocorrelation is given by

ρ(X_{n-1}, X_n) = (1 - p)(α - 2) / (α + p - 2) .        (5.10)

Observe that this autocorrelation is positive (in contrast to that of the Lawrence-Lewis Pareto process, which has negative autocorrelation; cf. Eq. (5.4)). It may be remarked that the corresponding exponential processes (the Lawrence-Lewis and Gaver-Lewis processes) are not distinguishable by their autocorrelation structure.

The fluctuation probabilities of the Gaver-Lewis Pareto process are also potentially useful for modelling diagnostics. We find, using (5.8), and assuming σ = 1 without loss of generality,

P(X_{n-1} < X_n) = P(X_{n-1} < X_{n-1}^{1-p} ε_n^{U_n})
                 = p P(X_{n-1}^p < ε_n)
                 = p²/(1 + p) .        (5.11)

Equations (5.10) and (5.11) provide quick consistent method of moments estimates of p and α based on an observed sample path realization from the Gaver-Lewis Pareto process.

The structure of the Gaver-Lewis process leads to a curious free lunch for estimating p. It is possible, and indeed probable, that the process will generate runs of values whose logarithms form a geometric progression. When this happens, we can determine p exactly from the sample path realization! A similar situation will be encountered with the Yeh-Arnold-Robertson process to be introduced in the next section. One way to avoid this anomalous behavior is to consider absolutely continuous variant processes (see Section 7).

    6. Autoregressive Pareto (III) processes

The role played by multiplication in Section 5 will, in this section, be played by minimization. Again two basic processes will be discussed. The first was introduced in Yeh et al. (1988). The second, in a slightly disguised and in fact time reversed form, was introduced in Arnold (1989). The present formulations are designed to highlight the close parallels between these two Pareto (III) processes and the two classical Pareto processes introduced in the last section.


6.1. The Yeh-Arnold-Robertson Pareto (III) process

Begin with an innovation sequence {ε_n}_{n=1}^∞ of independent identically distributed Pareto (III) (0, σ, α) random variables. For n = 1, 2, . . . , following Yeh et al. (1988), define

X_n = p^{-1/α} X_{n-1}                    with probability p
    = min{p^{-1/α} X_{n-1}, ε_n}          with probability 1 - p        (6.1)

where p ∈ (0, 1). Such a process will be called a first order Yeh-Arnold-Robertson Pareto (III) process (YARP(III)(1)). If we define {U_n}_{n=1}^∞ to be a sequence of i.i.d. Bernoulli (p) random variables (independent of the ε_n's) then we can describe the process in more succinct form

X_n = min{ p^{-1/α} X_{n-1}, ε_n/(1 - U_n) }        (6.2)

where 1/0 is interpreted as +∞. By conditioning on U_n, it is readily verified that the YARP(III)(1) process has a Pareto (III) (0, σ, α) stationary distribution and will be a completely stationary process if X_0 ~ P(III)(0, σ, α). Representative simulated sample paths for a variety of values of p and α are displayed in Figure 3.

The lag one autocovariances of this process involve evaluation and integration of incomplete beta functions. They are most easily obtained by simulation or by numerical integration. See Yeh et al. (1988), where a brief table of approximate autocorrelations may be found.

Fluctuation probabilities are not difficult to evaluate for the YARP(III)(1) process. Referring to (6.1) or (6.2), and noting that p^{-1/α} > 1, we have

P(X_{n-1} < X_n) = 1 - P(X_{n-1} ≥ X_n)
                 = 1 - (1 - p) P(X_{n-1} ≥ ε_n)
                 = (1 + p)/2        (6.3)

(since ε_n and X_{n-1} are i.i.d.). This simple expression for P(X_{n-1} < X_n) can be used to develop a simple consistent estimate of p based on an observed sample path from the process. Estimation of the other parameters of the process (α and σ) can be accomplished via the method of moments. Yeh, Arnold and Robertson (1988) recommend that a logarithmic transformation be used to avoid moment assumptions.
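A simulation sketch of (6.1) follows (names illustrative; ε_n ~ P(III)(0, σ, α) is generated by inverting its survival function 1/(1 + (x/σ)^α)):

```python
import random

def simulate_yarp1(sigma, alpha, p, n, seed=1):
    """Simulate the first order Yeh-Arnold-Robertson Pareto (III)
    process (6.1):
        X_n = p**(-1/alpha) * X_{n-1}               w.p. p
            = min(p**(-1/alpha) * X_{n-1}, eps_n)   w.p. 1 - p
    with eps_n ~ Pareto(III)(0, sigma, alpha)."""
    rng = random.Random(seed)
    # Pareto(III)(0, sigma, alpha) by inverse transform on the survival function
    p3 = lambda: sigma * (1.0 / (1.0 - rng.random()) - 1.0) ** (1.0 / alpha)
    c = p ** (-1.0 / alpha)
    x = p3()                       # X_0 in the stationary law
    path = [x]
    for _ in range(n):
        x = c * x if rng.random() < p else min(c * x, p3())
        path.append(x)
    return path
```

For a long path, the fraction of strict up-moves should be close to (1 + p)/2, in agreement with (6.3), and about half of the values should exceed σ.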

This stochastic process has some remarkable features. One is that, with positive probability, it can generate runs of values in exact geometric progression. An analogous phenomenon was noted in the study of the Gaver-Lewis Pareto process. In the present case, perusal of a long sample path from a YARP(III)(1) process would allow us to know exactly the value of p^{-1/α}. A minor modification of the process, described below in Section 7, will be free of this defect. It can be observed that runs of values in exact geometric progression (due to a constant inflation factor) might be quite appropriate in certain economic scenarios.


[Figure 3: twelve panels of simulated sample paths, one for each combination of alpha = 1, 2, 4 and p = .3, .5, .7, .9, each plotted against Time = 0 to 100.]

Fig. 3. Simulated sample paths of Yeh-Arnold-Robertson Pareto (III) processes.

A second unusual property of the process involves its remarkably well behaved extreme values. To see this define

T_n = min_{0≤i≤n} X_i .        (6.4)


P(T_n > t) = [1 + (t/σ)^α]^{-1} { [1 + p (t/σ)^α] / [1 + (t/σ)^α] }^n ,   t ≥ 0 .        (6.7)

From (6.7), the asymptotic behavior of T_n is readily determined, viz.: [n(1 - p)]^{1/α} T_n/σ converges in distribution to a Weibull (α) limit.

To determine the distribution of M_n = max_{0≤i≤n} X_i


and

M = max_{0≤i≤n} X_i .        (6.13)


[Figure 4: twelve panels of simulated sample paths, one for each combination of alpha = 1, 2, 4 and p = .3, .5, .7, .9, each plotted against Time = 0 to 100.]

    Fig. 4. Simulated sample paths of Arnold Pareto (III) processes.

Consequently, by conditioning on U_n, we find

P(X_{n-1} = X_n) = 1 + ((1 - p)/p) log(1 - p) ,        (6.16)

P(X_{n-1} < X_n) = ((1 - p)/p)² [ p/(1 - p) + log(1 - p) ] ,        (6.17)

P(X_{n-1} > X_n) = -((1 - p)/p) [ 1 + log(1 - p)/p ] .        (6.18)


    In fact for any k,

P(X_{n-1} = X_n = X_{n+1} = · · · = X_{n+k-1}) = [1 + ((1 - p)/p) log(1 - p)]^k .        (6.19)

Thus flat spots in the process can occur. Indeed, when X_{n-1} is small, flat spots are quite likely, since

P(X_{n-1} = X_n = · · · = X_{n+k-1} | X_{n-1} = x_{n-1}) = p^k [1 + (1 - p)(x_{n-1}/σ)^α]^{-k} .        (6.20)

In addition, from (6.19) and (6.20), it is evident that values of p close to 1 are conducive to more frequent occurrences of flat spots in the process.

The simple fluctuation probabilities permit straightforward consistent estimation of the parameter p. A variety of methods are available for consistent estimation of σ and α, taking advantage of the stationarity of the process. Any of the techniques described, for example, in Arnold (1983) for estimation based on i.i.d. Pareto (III) (0, σ, α) samples can be used to consistently estimate the parameters σ, α based on the stationary sequence X_0, X_1, X_2, . . . .
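As an illustration, the flat spot probability (6.16) can be checked by simulation. The sketch below assumes the first order Arnold process has the univariate form of (7.17), X_n = min{X_{n-1}, (1 - p)^{-1/α} ε_n} with probability p and X_n = (1 - p)^{-1/α} ε_n otherwise; the function name is illustrative.

```python
import random, math

def simulate_arnold1(sigma, alpha, p, n, seed=1):
    """Simulate a first order Arnold Pareto (III) process, taken here
    (as in the univariate version of (7.17)) to be
        X_n = min(X_{n-1}, c * eps_n)   w.p. p
            = c * eps_n                 w.p. 1 - p ,
    where c = (1-p)**(-1/alpha) and eps_n ~ Pareto(III)(0, sigma, alpha)."""
    rng = random.Random(seed)
    p3 = lambda: sigma * (1.0 / (1.0 - rng.random()) - 1.0) ** (1.0 / alpha)
    c = (1.0 - p) ** (-1.0 / alpha)
    x = p3()                       # X_0 in the stationary law
    path = [x]
    for _ in range(n):
        x = min(x, c * p3()) if rng.random() < p else c * p3()
        path.append(x)
    return path

# (6.16): flat spots X_{n-1} = X_n occur with probability
# 1 + ((1 - p)/p) * log(1 - p); for p = 0.5 this is about 0.307.
```

Note that this flat spot probability depends on p alone, not on σ or α, since the comparison of X_{n-1} with (1 - p)^{-1/α} ε_n is scale free.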

    7. Extensions and modifications

    7.1. Higher order processes

The four basic processes introduced in Sections 5 and 6 were labelled first order processes. This indicates that the conditional distribution of X_n, given the past, depends only on the value of the process one time unit before, i.e., on X_{n-1}. Paralleling the development of normal autoregressive processes of higher orders, we can consider processes analogous to our Pareto processes in which X_n depends on the k previous values X_{n-1}, X_{n-2}, . . . , X_{n-k}. A kth order version of the Lawrence-Lewis Pareto (I) process would be of the form

X_n = σ ε̇_n            with probability p_0
    = X_{n-1} ε̇_n      with probability p_1
      . . .
    = X_{n-k} ε̇_n      with probability p_k        (7.1)

where Σ_{j=0}^k p_j = 1 and the innovation sequence, {ε̇_n}, is chosen to have a common distribution that is selected to ensure that the process is a stationary Pareto (I) (σ, α) process. If we consider Y_n = log X_n, then (7.1) will correspond to a stationary Pareto (I) (σ, α) process if {Y_n} is a stationary kth order exponential process. A good survey of the theory associated with such processes may be found in Lawrence and Lewis (1985). It turns out that an assumption that ε̇_n ~ P(I)(1, α/p_0) is appropriate to ensure that the process is a stationary one with X_n ~ P(I)(σ, α), n = 0, 1, 2, . . . . More general models of the form

X_n = σ ε̇_n                 with probability p_0
    = X_{n-1}^{c_1} ε̇_n     with probability p_1
      . . .
    = X_{n-k}^{c_k} ε̇_n     with probability p_k        (7.2)

where c_1, c_2, . . . , c_k > 0, would be more flexible. However, for k > 2, it is difficult to determine the appropriate distribution for the innovations {ε̇_n}, or indeed to determine whether there exists a distribution for the ε̇_n's which will guarantee that (7.2) describes a Pareto (I) (σ, α) process.

How should we define a kth order version of the Gaver-Lewis Pareto (I) process? Note from (5.8) that the first order process can be written in the form

X_n = X_{n-1}^{1-p} ε̇_n        (7.3)

where ε̇_n = σ^p ε_n^{U_n}. A natural kth order version of such a process would be defined by

X_n = ( Π_{j=1}^k X_{n-j}^{p_j} ) ε̇_n        (7.4)

where Σ_{j=1}^k p_j < 1 and the common distribution of the ε̇_n's is suitably selected. If we write Y_n = log X_n, where {X_n} satisfies (7.4), then the process {Y_n} will have a standard linear autoregressive structure of order k with an exponential stationary distribution for Y_n. Details for the analysis of such processes may be found in (for example) Brockwell and Davis (1991).

Higher order versions of the Yeh et al. process were described in Yeh-Shu (1983). They will be of the form (cf. Eq. (6.2))

X_n = min{c_1 X_{n-1}, ε̇_n}        with probability p_1
    = min{c_2 X_{n-2}, ε̇_n}        with probability p_2
      . . .
    = min{c_k X_{n-k}, ε̇_n}        with probability p_k        (7.5)

where Σ_{j=1}^k p_j = 1 and the innovation sequence {ε̇_n} is chosen to ensure a stationary Pareto (III) (0, σ, α) distribution for X_n. As Yeh-Shu (1983) points out, in many cases when k = 2, it is possible to give an explicit description of the form of the required innovation distribution, but when k > 2 it seems difficult to determine the appropriate distribution for ε̇_n.


A kth order version of the Arnold process (cf. (6.14)) would be of the form

X_n = ε̇_n                       with probability p_0
    = min{X_{n-1}, ε̇_n}         with probability p_1
      . . .
    = min{X_{n-k}, ε̇_n}         with probability p_k        (7.6)

where Σ_{j=0}^k p_j = 1 and the innovation sequence {ε̇_n} is chosen to ensure that X_n ~ P(III)(0, σ, α). Here, too, difficulties are encountered in identifying the needed distribution of ε̇_n when k > 2.

Analogous moving average and autoregressive-moving average models can also be defined. Yeh-Shu (1983) discusses such extensions of the Yeh-Arnold-Robertson process. Davis and Resnick (1989) provide material relevant to the study of min-ARMA processes; actually they describe max-ARMA processes of the form

X_n = max{φ_1 X_{n-1}, φ_2 X_{n-2}, . . . , φ_p X_{n-p}, ε_n, θ_1 ε_{n-1}, . . . , θ_q ε_{n-q}} .        (7.7)

    7.2. Absolutely continuous modifications

Both the Gaver-Lewis process and the Yeh-Arnold-Robertson process have a singular joint distribution for X_n and X_{n-1}. It was observed that this feature of the two processes allowed for exact estimation of some of the parameters of the models. Realistically, this seems implausible in practice. In any event, it seems appropriate to investigate modifications of the processes which will avoid this pitfall. There are many possible approaches to this problem. Perhaps the simplest one is that suggested at the end of Yeh et al. (1988). They propose that, instead of using one fixed value of p (see Eqs. (5.7) and (6.1)) at each stage of the process, we use a randomly selected value of p. Specifically, the absolutely continuous version of the Gaver-Lewis Pareto (I) process begins with {ε_n}_{n=0}^∞, a sequence of i.i.d. P(I)(σ, α) random variables, and {B_n}_{n=1}^∞, a sequence of i.i.d. random variables, independent of the ε_n's, whose common distribution function G has support (0, 1). We then take X_0 = ε_0 and for n ≥ 1, given X_{n-1} = x_{n-1} and B_n = b_n, define

X_n = σ^{b_n} x_{n-1}^{1-b_n} ε_n        with probability b_n
    = σ^{b_n} x_{n-1}^{1-b_n}            with probability 1 - b_n .        (7.8)

It follows that, given B_n = b_n, X_n will have a P(I)(σ, α) distribution and so, unconditionally, X_n ~ P(I)(σ, α) and our process is stationary. Provided that G, the common distribution of the B_n's, is absolutely continuous, the joint distribution of (X_{n-1}, X_n) will be absolutely continuous. A convenient one parameter family of absolutely continuous distributions which might be used for G, the common distribution of the B_n's above, is the power function distribution family; i.e., distributions of the form


G_δ(x) = x^δ ,   0 < x < 1 ,        (7.9)

for some δ > 0. An analogous absolutely continuous version of the Yeh-Arnold-Robertson Pareto (III) process can be constructed, now beginning with i.i.d. P(III)(0, σ, α) innovations {ε_n} and an independent sequence {B_n} of i.i.d. random variables whose common distribution function G has support (0, 1). For n ≥ 1, given X_{n-1} = x_{n-1} and B_n = b_n, define

X_n = b_n^{-1/α} X_{n-1}                     with probability b_n
    = min{b_n^{-1/α} X_{n-1}, ε_n}           with probability 1 - b_n .        (7.10)

It follows that, given B_n = b_n, X_n will have a Pareto (III) (0, σ, α) distribution and, consequently, unconditionally X_n ~ P(III)(0, σ, α). The resulting stationary process will be such that the joint distribution of (X_{n-1}, X_n) is absolutely continuous provided that G, the common distribution of the B_n's, is absolutely continuous. Arnold and Robertson (1989) discuss a closely related logistic process in some detail.

It should be noted that the fluctuation probabilities for the absolutely continuous versions of the Gaver-Lewis Pareto (I) process and the Yeh-Arnold-Robertson Pareto (III) process remain relatively simple. We have, for the absolutely continuous GLP(I) process,

P(X_{n-1} < X_n) = E(P(X_{n-1} < X_n | B_n))
                 = E( B_n²/(1 + B_n) )        (7.11)

    and for the absolutely continuous YARP(III) process

P(X_{n-1} < X_n) = E(P(X_{n-1} < X_n | B_n))
                 = [1 + E(B_n)]/2 .        (7.12)

Of course, one can make the same modification (replacing p by a random variable B_n) in the Lawrence-Lewis Pareto (I) process and the Arnold Pareto (III) process. The corresponding fluctuation probabilities for these modified processes are obtainable by considering Eqs. (5.6) and (6.16)-(6.18). We need to treat the symbol p appearing in these expressions as a random variable with distribution G and compute the expected values of the right hand sides of the equations; e.g., for the absolutely continuous version of the Lawrence-Lewis process

P(X_{n-1} < X_n) = E( 1/(1 + B_n) ) .        (7.13)


    7.3. Multivariate processes

All four of the Pareto processes introduced in Sections 5 and 6 admit simple multivariate extensions. The innovation variables become innovation vectors and will be constrained to have suitable stability properties. The k-dimensional Lawrence-Lewis Pareto process begins with a sequence of innovation vectors {ε_n}_{n=0}^∞ which are independent identically distributed with a common k-variate geometric multiplication stable distribution with P(I)(1, α_j), j = 1, 2, . . . , k, marginal distributions (refer to (3.18), which gives the general form of a generating function for such variables). Now for n = 1, 2, . . . define, for some p ∈ (0, 1),

X_n = σ ε_n^p            with probability p
    = X_{n-1} ε_n^p      with probability 1 - p .        (7.14)

In this defining equation, multiplication of vectors and raising of a vector to a power is understood to be done coordinatewise. Then, provided we take X_0 = σ ε_0, we will have a completely stationary process with a k-variate multiplication stable stationary distribution with P(I)(σ_i, α_i) marginals. The marginal processes {X_n(j)}_{n=0}^∞ for j = 1, 2, . . . , k are of course Lawrence-Lewis processes of the kind introduced in Section 5.

In analogous fashion, the k-dimensional Gaver-Lewis process can be defined in terms of the same k-variate geometric multiplication stable innovation sequence {ε_n}_{n=0}^∞. Define

X_n = σ^p X_{n-1}^{1-p} ε_n        with probability p
    = σ^p X_{n-1}^{1-p}            with probability 1 - p        (7.15)

(recall again that operations in (7.15) are performed coordinatewise) and set X_0 = σ ε_0. This process has marginal processes of the Gaver-Lewis form.

To construct k-dimensional analogs of the Yeh-Arnold-Robertson and Arnold processes we need an innovation sequence {ε_n} whose common distribution is k-variate geometric-minimum stable (with P(III)(0, σ_j, α_j) marginals). Refer to (3.11) for the general form of such distributions. Now for n = 1, 2, . . . define, for some p ∈ (0, 1),

X_n = p^{-1/α} X_{n-1}                    with probability p
    = min{p^{-1/α} X_{n-1}, ε_n}          with probability 1 - p        (7.16)

(where all operations are performed coordinatewise; thus, for example, with probability p, X_n(j) = p^{-1/α_j} X_{n-1}(j)). Then, provided we set X_0 = ε_0, we will have a k-variate process whose stationary distribution is k-variate geometric minimum stable with Pareto (III) (0, σ_j, α_j) marginals. Of course the marginal processes are univariate Yeh-Arnold-Robertson processes.


The k-variate Arnold process begins with the same innovation sequence {ε_n}_{n=0}^∞ as that used in the Yeh-Arnold-Robertson process, but now we define

X_n = min{X_{n-1}, (1 - p)^{-1/α} ε_n}        with probability p
    = (1 - p)^{-1/α} ε_n                      with probability 1 - p .        (7.17)

If X_0 = ε_0 then we have a stationary process whose marginal processes are of the Arnold type.

In practice, it would be necessary to make some assumptions about the structure of the distribution chosen for ε_n in (7.14) and (7.15) or in (7.16) and (7.17). Some assumption specifying that the distribution of ε_n is known except for a few parameters would seem to be required since, for example, estimation of the structure functions f_1(s), f_2(s), . . . , f_k(s) appearing in (3.11) would appear to be infeasible.

    8. Related processes

There are numerous possible variations beginning with our basic Pareto processes. In this section, attention will generally be focussed on one-dimensional processes, unless the extension to k dimensions involves no more than notational adjustment via underlining.

    8.1. Semi-Pareto processes

In the definitions of the Yeh-Arnold-Robertson and Arnold processes the innovation sequence {ε_n} was taken to have a Pareto (III) (0, σ, α) distribution. Such a choice of innovation distribution led to a stationary process because the Pareto (III) distribution was geometric minimum stable. However, for a fixed choice of p, the process would still be completely stationary if the ε_n's had a semi-Pareto distribution as defined in (2.9). Such semi-Pareto versions of the Yeh-Arnold-Robertson process were introduced by Pillai (1991). Analogous semi-Pareto versions of the Arnold process can be constructed and, moreover, k-dimensional versions of those processes can be readily envisioned.

Fix a particular value of p ∈ (0, 1). Then begin with a k-dimensional innovation sequence {ε_n}_{n=0}^∞ having a common semi-Pareto (p) distribution as described in Eq. (3.13). Now for a Yeh-Arnold-Robertson semi-Pareto process define, exactly as in (7.16),

X_n = p^{-1/α} X_{n-1}                    with probability p
    = min{p^{-1/α} X_{n-1}, ε_n}          with probability 1 - p .        (8.1)

Provided we set X_0 = ε_0, we will have a stationary k-variate semi-Pareto process with marginals given by (3.13).


Analogously, a k-variate semi-Pareto process of the Arnold type is obtainable using the same innovation sequence as used in (8.1) (i.e., with density (3.13)) and with successive X_n's defined as in (7.17).

The material in Section 6 dealing with extremes of the YARP(III) process continues to hold true if the distribution of the ε_i's is semi-Pareto instead of Pareto (III). Thus if {X_n} is as defined in (6.1) but with the ε_n's being i.i.d. with common survival function

P(ε_n > x) = [1 + φ_p(x)]^{-1}

where φ_p(x) = (1/p) φ_p(p^{1/α} x) for the specific choice of p used in the definition (6.1), then (Pillai, 1991) the level crossing processes (6.8) are Markovian, with transition matrices now given by

P = [1 + φ_p(t)]^{-1} ( p + φ_p(t)        1 - p
                        (1 - p) φ_p(t)    1 + p φ_p(t) ) .        (8.2)

This can be used to obtain the asymptotic distribution of the maximum, M_n (as defined in (6.5)). More generally, Chrapek et al. (1996) discuss the asymptotic distribution of M_n^{(k)}, the kth largest observation in X_0, X_1, X_2, . . . , X_n.

    8.2. A Markovian variant of the geometric minimization scheme

Suppose that {U_n}_{n=-∞}^∞ is a doubly infinite sequence of i.i.d. Bernoulli (1 - p) random variables. Two natural sequences of geometric random variables can be constructed from these Bernoulli variables. For t = 0, 1, 2, . . . define

N_t^+ = 1   if and only if   U_t = 1

and, for i = 2, 3, . . . ,

N_t^+ = i   if and only if   U_t = 0, U_{t+1} = 0, . . . , U_{t+i-2} = 0, U_{t+i-1} = 1 .        (8.3)

These N_t^+'s are geometric (p) random variables with possible values 1, 2, . . . . N_t^+ is basically the waiting time until a success in the U_n sequence, beginning at time t and going forward. If instead we choose to go backward we may define

N_t^- = 1   if and only if   U_t = 1

and, for i = 2, 3, . . . ,

N_t^- = i   if and only if   U_t = 0, . . . , U_{t-i+2} = 0, U_{t-i+1} = 1 .        (8.4)

Now, consider a sequence {ε_n} of i.i.d. random vectors with common distribution of the form (3.11) (i.e., k-variate geometric stable with Pareto (III) (0, σ_j, α_j) marginals). Assume that the {ε_n} and the {U_n} processes are independent. We can now define a forward innovation process by


X_t = (1 - p)^{-1/α} min_{i=1,2,...,N_t^+} ε_{t+i-1}        (8.5)

and a backward innovation process by

X_t = (1 - p)^{-1/α} min_{i=1,2,...,N_t^-} ε_{t-i+1} .        (8.6)

The two processes (8.5) and (8.6) are time reversals of each other. The backward innovation model (8.6) is essentially identical to the k-variate Arnold process described in (7.17) (they would be identical if the index set of the process (7.17) had been chosen to be n = 0, ±1, ±2, . . . instead of 0, 1, 2, . . .). One advantage of the representations (8.5) and (8.6) is that they readily admit simple extensions in which the i.i.d. sequence of Bernoulli variables used in their definition is replaced by a Markovian sequence.

The idea was first described in Arnold (1993) in the context of one-dimensional logistic processes but it is readily adapted to the current situation. We will focus on the forward innovation process (8.5), though of course a parallel development could be described for the backward innovation process (8.6). Instead of having {U_n} be a sequence of i.i.d. Bernoulli (1 - p) random variables, we assume that {U_n} denotes a stationary first order Markov chain with state space {0, 1} and transition matrix

P = ( p_0      1 - p_0 )
    ( 1 - p_1      p_1 )   (8.7)

where 0 < p_0, p_1 < 1. The long run distribution of this chain is

P(U_i = 0) = (1 - p_1)/(2 - p_0 - p_1) ,
P(U_i = 1) = (1 - p_0)/(2 - p_0 - p_1) .   (8.8)

Now consider a sequence {ε_n}_{n=0}^∞ of i.i.d. innovations assumed to be independent of the Markov process U_0, U_1, .... For each integer i, let N(i) denote the number of U's, beginning with the ith U, that must be observed until the first instance in which a U is equal to 1 (i.e., until state 1 is visited). Then define

X_n = min_{1 ≤ j ≤ N(n)} ε_{n+j-1} .   (8.9)

B. C. Arnold

By conditioning on N(n) we have

P(X_n > x) = Σ_{k=1}^∞ P(X_n > x | N(n) = k) P(N(n) = k)

           = Σ_{k=1}^∞ [P(ε_0 > x)]^k P(N(n) = k)

           = [(1 - p_0)/(2 - p_0 - p_1)] F̄_ε(x) + [(1 - p_0)(1 - p_1)/(2 - p_0 - p_1)] · F̄_ε²(x)/(1 - p_0 F̄_ε(x)) .   (8.11)

In order to have F̄_X(x) of the form (3.11), we need to solve the quadratic equation (8.11) for F̄_ε(x) (only one of the solutions to (8.11) will be a legitimate survival function). In the one-dimensional case Arnold (1993) shows that the common distribution of the X_n's will be Pareto (III) (μ, σ, α) if we take the common distribution of the ε_n's to be such that

ε_n =d μ + σ (Z_n^{-1} - 1)^{1/α}   (8.12)

where

Z_n = [(1 - p_0)/(2 - p_0 - p_1)] U_n + [(1 - p_0)(1 - p_1) U_n²]/[(2 - p_0 - p_1)(1 - p_0 U_n)]   (8.13)

in which the U_n's are i.i.d. Uniform(0, 1) random variables. In that paper, suggestions are given regarding parameter estimation for the process. Extensions to higher order Markov processes are of course possible. Arnold (1993) describes a one-dimensional process involving second order Markovian minimization in some detail.

    8.3. Pareto conditionals processes

The arguments usually given to justify the use of processes with Pareto marginal distributions admit relatively minor modifications which lead to consideration of processes with Pareto conditionals instead of Pareto marginals. In the realm of normal processes, an assumption of normal marginals and one of normal conditionals are essentially equivalent. This is of course not true for other distributions. We will briefly describe Pareto conditionals processes in this sub-section. For simplicity we will focus on one-dimensional processes; higher dimensional processes can of course be analogously treated.

Our interest is in stochastic processes {X_n}_{n=0}^∞ which have the property that the conditional distribution of X_n given that X_{n-1} = x_{n-1} is of the Pareto (II) form with parameters which are functions of x_{n-1}. In addition, we will require that the processes be stationary.

    For example, if we require that

X_n | X_{n-1} = x_{n-1} ~ Pareto (II)(μ(x_{n-1}), σ(x_{n-1}), α(x_{n-1}))


for given functions μ(·), σ(·) and α(·), then we have a first order Markov process which, under quite general conditions, will have a well defined stationary distribution. In general, however, it will not be easy to identify this stationary distribution. One case in which the long run distribution can be identified is related to the class of bivariate distributions with Pareto (II) conditionals discussed in Section 3. It may be observed that a joint density of the form

Fig. 5. Simulated sample paths of Pareto conditionals processes (panels indexed by alpha = 1, 2, 4, delta = 1, 2, 5 and gamma = 1, 2; horizontal axis: Time, 0 to 50).


f_{X,Y}(x, y) ∝ (1 + γx + γy + δxy)^{-(α+1)} I(x > 0, y > 0)   (8.14)

where γ > 0 and δ > 0, has all of its conditionals, of X given Y = y and of Y given X = x, of the Pareto (II) form (cf. Eq. (3.22)). In addition, it has identical marginal distributions for X and Y, namely

f_X(x) ∝ (γ + δx)^{-1}(1 + γx)^{-α} .   (8.15)

    The corresponding conditional distributions are

Y | X = x ~ Pareto (II)(0, (1 + γx)/(γ + δx), α) .   (8.16)

    Using these observations we may readily describe a stationary process with Pareto (II) conditionals as follows. Assume that X0 has density (8.15) and that, for each n,

X_n | X_{n-1} = x_{n-1} ~ Pareto (II)(0, (1 + γx_{n-1})/(γ + δx_{n-1}), α) .   (8.17)

    Estimation of the parameters of this stationary process can be accomplished using the method of moments (see Arnold et al. (1995) for discussion of the moments of the densities (8.14) and (8.15)). Some simulated sample paths for realizations of the process (8.17) are displayed in Figure 5.
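A minimal simulation sketch of the process (8.17) (our own; instead of initializing exactly from the stationary density (8.15), the chain is started arbitrarily and a burn-in is discarded, and the values of gamma, delta and alpha are illustrative):

```python
import random

# Simulate the stationary Pareto (II) conditionals process (8.17).
# gamma = delta = 1, alpha = 2 and the burn-in length are illustrative choices.

def rpareto2(sigma, alpha):
    """Pareto (II)(0, sigma, alpha): x = sigma * (u**(-1/alpha) - 1)."""
    u = random.random()
    return sigma * (u ** (-1.0 / alpha) - 1.0)

def pareto_conditionals_path(T, gamma, delta, alpha, burn=500, seed=2):
    random.seed(seed)
    x = 1.0                                   # arbitrary starting value
    path = []
    for t in range(T + burn):
        sigma = (1.0 + gamma * x) / (gamma + delta * x)   # conditional scale in (8.17)
        x = rpareto2(sigma, alpha)
        if t >= burn:
            path.append(x)
    return path

path = pareto_conditionals_path(50, gamma=1.0, delta=1.0, alpha=2.0)
print(len(path), all(x > 0 for x in path))    # prints: 50 True
```

Paths produced this way resemble those of Figure 5: mostly moderate values punctuated by occasional large excursions, as expected from the heavy conditional tails.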

    References

    Arnold, B. C. (1975). Multivariate exponential distributions based on hierarchical successive damage. J. Appl. Probab. 12, 142-147.

Arnold, B. C. (1983). Pareto Distributions. International Cooperative Publishing House, Fairland, Maryland.

    Arnold, B. C. (1987). Bivariate distributions with Pareto conditionals. Statist. Probab. Lett. 5, 263-266.

Arnold, B. C. (1989). A logistic process constructed using geometric minimization. Statist. Probab. Lett. 7, 253-257.

Arnold, B. C. (1990). A flexible family of multivariate Pareto distributions. J. Statist. Plann. Inf. 24, 249-258.

Arnold, B. C. (1993). Logistic processes involving Markovian minimization. Comm. Statist. Theory Methods 22, 1699-1707.

Arnold, B. C., E. Castillo and J. M. Sarabia (1992). Conditionally specified distributions. Lecture Notes in Statistics, Vol. 73. Springer Verlag, Berlin.

Arnold, B. C. and J. T. Hallett (1989). A characterization of the Pareto process among stationary stochastic processes of the form X_n = c min(X_{n-1}, Y_n). Statist. Probab. Lett. 377-380.

Arnold, B. C. and L. Laguna (1976). A stochastic mechanism leading to asymptotically Paretian distributions. Business Econ. Statist. Sect. Proc. Amer. Statist. Assoc. 208-210.

Arnold, B. C. and C. A. Robertson (1989). Autoregressive logistic processes. J. Appl. Probab. 26, 524-531.

    Arnold, B. C., C. A. Robertson and H. C. Yeh (1986). Some properties of a Pareto type distribution. Sankhya Ser. A 48, 404-408.


Arnold, B. C., J. M. Sarabia and E. Castillo (1995). Distributions with conditionals in the Pickands-DeHaan generalized Pareto family. Journal of the Indian Association for Productivity, Quality and Reliability 20, 28-35.

    Block, H. W. (1975). Physical models leading to multivariate exponential and negative binomial distributions. Modeling and Simulation 6, 445-450.

    Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods, 2nd Edition. Springer Verlag, New York.

    Chrapek, M., J. Dudkiewicz and W. Dziubdziela (1996). On the limit distributions of kth order statistics for semi-Pareto processes. Appl. Math. 24, 189-193.

Davis, R. A. and S. I. Resnick (1989). Basic properties and prediction of Max-ARMA processes. Adv. Appl. Prob. 21, 781-803.

Gaver, D. P. and P. A. W. Lewis (1980). First order autoregressive gamma sequences and point processes. Adv. Appl. Prob. 12, 727-745.

Hutchinson, T. P. and C. D. Lai (1990). Continuous Bivariate Distributions Emphasizing Applications. Rumsby Scientific Publishing, Adelaide, Australia.

Lawrance, A. J. and P. A. W. Lewis (1981). A new autoregressive time series model in exponential variables (NEAR(1)). Adv. Appl. Prob. 13, 826-845.

Lawrance, A. J. and P. A. W. Lewis (1985). Modelling and residual analysis of nonlinear autoregressive time series in exponential variables. J. R. Statist. Soc. B 47, 162-202.

Mardia, K. V. (1962). Multivariate Pareto distributions. Ann. Math. Statist. 33, 1008-1015.

Mittnik, S. and S. T. Rachev (1991). Alternative multivariate stable distributions and their applications to financial modeling. In Stable Processes and Related Topics (Eds. S. Cambanis, G. Samorodnitsky and M. S. Taqqu), pp. 107-119. Birkhäuser, Boston.

Pareto, V. (1897). Cours d'Economie Politique, Vol. II. F. Rouge, Lausanne.

Pillai, R. N. (1991). Semi-Pareto processes. J. Appl. Prob. 28, 461-465.

Rachev, S. T. and S. Resnick (1991). Max-geometric infinite divisibility and stability. Technical Report No. 108, University of California, Santa Barbara.

Resnick, S. I. (1987). Extreme Values, Regular Variation and Point Processes. Springer, New York.

Strauss, D. J. (1979). Some results on random utility. J. Math. Psychology 20, 35-52.

Yeh, H. C., B. C. Arnold and C. A. Robertson (1988). Pareto processes. J. Appl. Probab. 25, 291-301.

Yeh-Shu, H. C. (1983). Pareto processes. Ph.D. dissertation, University of California, Riverside.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 © 2001 Elsevier Science B.V. All rights reserved.

    Branching Processes

K. B. Athreya and A. N. Vidyashankar

In this survey we give a concise account of the theory of branching processes. We describe the branching process of a single type in discrete time, followed by the multitype case. The continuous time branching process of a single type is discussed next, followed by branching processes in random environments in discrete time. Finally we deal with branching random walks.

    1. Introduction

The subject of branching processes is now over half a century old. The problem of survival of family names in the British peerage had already been attempted in the previous century by Rev. Watson, although the correct solution by Steffensen appeared only in the 1930s. The subject took off in the late 1940s and 50s with the work of Kolmogorov, Yaglom, and Sevastyanov and their students in Russia, and Harris and Bellman in the United States. Harris's authoritative book (1963) appeared and stimulated much research on the subject. The book by Mode (1971) on multitype branching processes followed. Then in 1972 the book by Athreya and Ney (1972) was published. Jagers (1975) wrote a book on branching processes with biological applications in mind. On a more abstract level, the book by Asmussen and Hering (1983) came out. Seneta and Heyde (1977) wrote a scholarly book in the 70s on the early history of branching processes.

The subject of branching processes has had obvious implications for population dynamics, but with the development of computer science it has found new applications in areas such as algorithms, data structures, combinatorics, and molecular biology, especially in molecular DNA sequencing. This led to a conference titled 'Classical and Modern Branching Processes' at the IMA, Minneapolis, where new developments were surveyed and open problems identified. The proceedings of the conference were published in 1997 (Athreya and Jagers).

Also in the mid 1980s Dynkin, building on the earlier work of Fisher and Feller on population genetics and that of the Japanese school of Watanabe, Ikeda and Nagasawa on branching Markov processes, introduced the notion of superprocesses (with deep connections to the theory of partial differential equations), which arose as scaled limits of branching processes that allowed random movement of particles. This has become a major area of contemporary research in probability theory (see Dawson (1991)).

    Thus the area of branching processes is alive and well. New applications continue to be discovered and, in turn, inspire new questions for the subject.

The goal of the present article is to give a quick and succinct introduction to this exciting area of research. The literature is vast and one has had to make a selection of topics. What is presented here does reflect the authors' interests and preferences. Apart from the books mentioned earlier, we must refer to the work of the Swedish school, led by Jagers, on general branching processes with greater levels of dependence. For an account of this, see Jagers (1991) and the references therein. We also have not dealt with the problems of statistical inference in branching processes. Apart from the book of Guttorp (1991), the work of Dion and Essebbar (1993) with its extensive bibliography is very helpful. We end this introduction with an outline of the rest of the paper.

The next section deals with the so-called simple branching process of single type. This is followed by the multitype case. The continuous time branching process of single type is discussed next, followed by branching processes in random environments. The final section deals with branching random walks.

    2. Branching processes: Single type case

Let {p_j : j = 0, 1, 2, ...} be a probability distribution and {ξ_{n,i} : i = 1, 2, ..., n = 0, 1, 2, ...} be independent random variables with a common distribution P(ξ_{1,1} = j) = p_j, j = 0, 1, 2, .... For any nonnegative integer k, let Z_0 = k,

Z_1 = Σ_{i=1}^{Z_0} ξ_{0,i} = Σ_{i=1}^{k} ξ_{0,i}

and recursively define

Z_{n+1} = Σ_{i=1}^{Z_n} ξ_{n,i} for n = 0, 1, 2, ... .   (2.1)

The sequence {Z_n}_0^∞ is called a branching process with initial population size k and offspring distribution {p_j}. The random variable Z_n is to be thought of as the population size of the nth generation and the recursive relation (2.1) says that Z_{n+1} is the sum of the offspring of all the individuals of the nth generation. The independence of the offspring sizes ξ_{n,i} among themselves and of the past history of the process renders {Z_n}_0^∞ a Markov chain with state space N⁺ = {0, 1, 2, ...} and transition probability matrix

p_{kj} = P( Σ_{i=1}^{k} ξ_{1,i} = j )   (2.2)


where for k = 0 the sum Σ_{i=1}^{k} is defined as zero. From now on we rule out the deterministic case when p_j = 1 for some j. It is clear that if Z_0 = 0 then Z_n = 0 for all n ≥ 1, i.e., 0 is an absorbing barrier. The event {Z_n = 0 for some n > 0} is called the extinction event. By considering the cases p_0 = 0 and p_0 > 0 it is easy to conclude that for any initial condition Z_0, the events of extinction or explosion, i.e., the event {Z_n → ∞ as n → ∞}, are the only two possibilities with positive probability. That is, the population size Z_n does not fluctuate forever. Two natural questions are: what is the probability of extinction, and at what rate does Z_n go to ∞ on the event of explosion? Since the only data for the problem is the offspring distribution {p_j}, we seek answers to the above two questions in terms of {p_j}. Let

q = P(Z_n = 0 for some n ≥ 1 | Z_0 = 1) .   (2.3)

It is easy to see that (2.1) implies the key property that the lines of descent of distinct individuals are independent. Thus, for k ≥ 0,

P(Z_n = 0 for some n ≥ 1 | Z_0 = k) = q^k   (2.4)

and hence

q = Σ_{k=0}^∞ P(Z_n = 0 for some n ≥ 1, Z_1 = k | Z_0 = 1)

  = Σ_{k=0}^∞ P(Z_n = 0 for some n > 1 | Z_1 = k) P(Z_1 = k | Z_0 = 1)

  = Σ_{k=0}^∞ q^k p_k = f(q) ,   (2.5)

where

f(s) = Σ_j p_j s^j , 0 ≤ s ≤ 1 ,   (2.6)

is the probability generating function of the offspring distribution {p_j}. Thus the extinction probability q is a root of the equation f(s) = s.

THEOREM 1. The extinction probability q is the smallest root of the equation f(s) = s in [0, 1]. Further, q = 1 if m ≡ Σ_j j p_j ≤ 1 and q < 1 if m > 1. □

The cases m < 1, m = 1 and m > 1 are referred to respectively as subcritical, critical and supercritical cases. Thus, in the first two cases, the population dies out with probability one and in the last case there is positive probability that the


population size goes off to infinity as n increases. One qualitative difference between the subcritical and critical cases is that if T ≡ min{n : n ≥ 1, Z_n = 0} is the extinction time, then the mean value of T, i.e., ET, is always finite for m < 1 and can be infinite for m = 1 even though P(T < ∞) = 1 in both cases (see Seneta (1967) for details). Other differences will become clear later (see for instance Theorem 5 and Theorem 7).
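Numerically, the extinction probability is easy to obtain: q is the smallest root of f(s) = s in [0, 1], and the iterates f_n(0) = f(f_{n-1}(0)) increase to it. A small sketch of ours (the offspring law (p_0, p_1, p_2) = (1/4, 1/4, 1/2), with m = 1.25, is an illustrative choice):

```python
# Compute the extinction probability q as the smallest root of f(s) = s
# by functional iteration, f_n(0) increasing to q.
# The offspring law below is an illustrative choice.

def pgf(s, p):
    """Offspring probability generating function f(s) = sum_j p_j s**j."""
    return sum(pj * s ** j for j, pj in enumerate(p))

def extinction_prob(p, iters=200):
    q = 0.0
    for _ in range(iters):
        q = pgf(q, p)          # f_{n+1}(0) = f(f_n(0))
    return q

p = [0.25, 0.25, 0.5]          # m = 0.25 + 2 * 0.5 = 1.25 > 1: supercritical
q = extinction_prob(p)
print(round(q, 4))             # prints: 0.5
```

Here f(s) = 1/4 + s/4 + s²/2, and f(s) = s has roots 1/2 and 1; the iteration converges to the smaller root, q = 1/2.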

    We now describe the fundamental limit theorems associated with supercritical branching processes. The first limit theorem describes the behavior of the branching process in the supercritical case. (Kesten and Stigum (1966) and Athreya (1971).)

THEOREM 2. The sequence

W_n = Z_n/m^n   (2.7)

is a nonnegative martingale and hence converges with probability 1 (w.p.1) to a limit W. Further,

(i) P(W = 0 | Z_0 = 1) is one or q according as

Σ_j j (log j) p_j = ∞ or < ∞ .

(ii) If Σ_j j (log j) p_j < ∞, then E(W | Z_0 = 1) = 1 and W has an absolutely continuous distribution on (0, ∞). □

If {Z_n : n ≥ 1} and {Z'_n : n ≥ 1} are two independent copies of branching processes, then Z_n/Z'_n converges to a random variable V, and if m < ∞ then P(0 < V < ∞) = 1; however, if m = ∞ then P(V = 0) and P(V = ∞) are both positive. This says that in the infinite mean case, a branching process initiated by distinct ancestors could have different growth rates with positive probability. For more complete results consult the works of Grey (1979, 1980), and Schuh and Barbour (1977).
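A quick numerical illustration of the martingale in Theorem 2 (the offspring law (1/4, 1/4, 1/2), with m = 1.25, is an illustrative choice of ours): since E(W_n) = 1 for every n, the sample average of W_n = Z_n/m^n over independent replicates should stay near 1.

```python
import random

# Monte Carlo check that E(W_n) = 1 for W_n = Z_n / m**n.
# The offspring law (1/4, 1/4, 1/2) with m = 1.25 is an illustrative choice.

def offspring(p):
    u, acc = random.random(), 0.0
    for j, pj in enumerate(p):
        acc += pj
        if u < acc:
            return j
    return len(p) - 1

def W_n(p, m, n, seed):
    random.seed(seed)
    z = 1
    for _ in range(n):
        z = sum(offspring(p) for _ in range(z))   # one step of the recursion (2.1)
    return z / m ** n

p, m = [0.25, 0.25, 0.5], 1.25
reps = 2000
avg = sum(W_n(p, m, 10, seed=s) for s in range(reps)) / reps
print(abs(avg - 1.0) < 0.15)
```

The comparison prints True: the average across 2000 replicates of W_10 is close to 1, as the martingale property requires (the tolerance 0.15 generously covers the Monte Carlo error).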

It follows from the above theorem that, under a finite mean assumption on the offspring distribution, Z_{n+1}/Z_n converges to m (< ∞) as n → ∞ w.p. 1 on the set of explosion. A central limit theorem for the sequence {Z_{n+1}/Z_n : n ≥ 1} is given below (see Athreya (1968) and Heyde (1971)).


THEOREM 3. Assume p_0 = 0 and E(Z_1^{2+δ}) < ∞ for some δ > 0. Set σ² = E(Z_1²) - m². Then

√Z_n (Z_{n+1}/Z_n - m) converges in distribution to N(0, σ²),

where N(0, σ²) is a normal random variable with mean 0 and variance σ². □

A law of iterated logarithm associated with the above convergence has been established by Heyde (1971), and some large deviation results are contained in Athreya and Vidyashankar (1993), and Athreya (1994).

THEOREM 4. Assume p_0 = 0, p_1 ≠ 0, E(Z_1^{2r+δ}) < ∞ for some r ≥ 1 and δ > 0, and m^r p_1 > 1. Then

lim_{n→∞} p_1^{-n} P( |Z_{n+1}/Z_n - m| > ε ) = C(ε) = Σ_k P( |X̄_k - m| > ε ) q_k ,

where X̄_k = k^{-1} Σ_{j=1}^{k} X_j, the X_j's are i.i.d. as Z_1, and q_k = lim_{n→∞} p_1^{-n} P(Z_n = k). Furthermore, the limit C(ε) is a finite positive constant. □

A number of related large deviation results concerning the rates of convergence of the martingale W_n to W and other refinements of Theorem 4 can be found in Athreya (1994).

    We now move on to describe the critical branching processes. The first result in this direction describes the behavior of the process conditioned on non-extinction.

THEOREM 5. Let m = 1 and σ² ≡ Σ_j j² p_j - 1 be finite. For any initial Z_0 ≠ 0, and 0 < x < ∞,

lim_{n→∞} P( Z_n/n ≤ (σ²/2) x | Z_n > 0 ) = 1 - e^{-x} .   (2.9) □

Thus, given that the population is not extinct at time n, its size Z_n behaves like n times an exponential random variable with mean σ²/2. It follows from the above theorem that the sequence {Z_{n+1}/Z_n : n ≥ 1}, conditioned on non-extinction, converges to 1 in probability as n → ∞. The large deviations associated with this convergence have been considered by Athreya and Vidyashankar (1997) and are the content of our next theorem concerning critical branching processes.

THEOREM 6. Let m = 1 and assume that E(Z_1^{2r+δ}) < ∞ for some r > 1 and δ > 0. Then

lim_{n→∞} n P( |Z_{n+1}/Z_n - 1| > ε | Z_n > 0 ) = q(ε) ,   (2.10)

where 0 < q(ε) < ∞. □

Theorems 4 and 6 show that the large deviation decay rate in the supercritical case is geometric while in the critical case it is only algebraic. This difference in the rates can be attributed to the behavior of the generating functions (see Athreya and Vidyashankar (1997)).

We now move on to the subcritical case. In this case we know from Theorem 1 that Z_n → 0 as n → ∞. The next theorem describes the behavior of Z_n when one conditions on Z_n > 0 and brings out yet another distinction between the subcritical and critical cases.

THEOREM 7. Let m < 1. Then for any initial Z_0 ≠ 0,

lim_{n→∞} P(Z_n = j | Z_n > 0) = π_j   (2.11)

exists for all j ≥ 1 and Σ_j π_j = 1; furthermore, Σ_{j≥1} j π_j < ∞ if and only if E(Z_1 log Z_1) < ∞. □

The above results describe the probabilistic behavior of the process. The statistical problem of estimating the mean m and other parameters of a supercritical branching process has received considerable attention in the literature; for a classical treatment of the problem refer to the book of Guttorp (1991). More recently, Wei and Winnicki (1990), Winnicki (1991), Sriram et al. (1991), and Datta and Sriram (1995) have investigated the estimation of m from a branching process with immigration. Applications in the context of polymerase chain reaction have been investigated by Sun and Waterman (1997).
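As a small illustration of such estimation (not developed in this text), one standard nonparametric estimator of m from a single observed path is the ratio (Z_1 + ... + Z_n)/(Z_0 + ... + Z_{n-1}), total offspring over total parents. The sketch below, with an illustrative offspring law and sample sizes of our own choosing, applies it to a simulated supercritical path:

```python
import random

# Estimate m from one simulated supercritical path by the ratio estimator
# m_hat = (Z_1 + ... + Z_n) / (Z_0 + ... + Z_{n-1}).
# The offspring law (1/4, 1/4, 1/2), with m = 1.25, is an illustrative choice.

def offspring(p):
    u, acc = random.random(), 0.0
    for j, pj in enumerate(p):
        acc += pj
        if u < acc:
            return j
    return len(p) - 1

def path(p, n_gen, seed):
    random.seed(seed)
    z, out = 1, [1]
    for _ in range(n_gen):
        z = sum(offspring(p) for _ in range(z))
        out.append(z)
        if z == 0:
            break
    return out

p, m = [0.25, 0.25, 0.5], 1.25
Z = []
for s in range(100):                 # look for a clearly surviving path
    Z = path(p, 25, seed=s)
    if len(Z) == 26 and Z[-1] > 200:
        break
m_hat = sum(Z[1:]) / sum(Z[:-1])
print(abs(m_hat - m) < 0.2)
```

On a surviving path the estimator is consistent (its error shrinks like the reciprocal square root of the total population observed), so the comparison prints True.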

    3. Branching process: Multitype case

Consider now a population with k types of individuals. Assume a type i individual produces children of all types according to a probability distribution {p_i(j) : j = (j_1, j_2, ..., j_k), j_r = 0, 1, 2, ..., r = 1, 2, ..., k}. Assume as before that all individuals produce offspring independently of each other and of the past history of the process. Let Z_{ni} be the number of type i individuals in the nth generation. Then the vector Z_n = (Z_{n1}, Z_{n2}, ..., Z_{nk}) of population sizes in the nth generation evolves by the recursive relation

Z_{n+1} = Σ_{i=1}^{k} Σ_{r=1}^{Z_{ni}} ξ^{(i)}_{n,r}   (3.1)

where {ξ^{(i)}_{n,r} : r = 1, 2, ..., n = 0, 1, 2, ..., i = 1, 2, ..., k} are independent random vectors with ξ^{(i)}_{n,r} having distribution p_i(·). This renders {Z_n}_0^∞ a Markov chain with state space (N⁺)^k, the k dimensional nonnegative integer lattice. The state (0, 0, ..., 0) is an absorbing state and so if the distributions p_i(·) are not degenerate then the population either dies out or explodes as n → ∞.

As before, (3.1) yields the independence of lines of descent. The key role of the mean m in the single type case is played by the maximal eigenvalue of the mean matrix M = ((m_ij)) where

m_ij = E(Z_{1j} | Z_0 = e_i) , 1 ≤ i, j ≤ k ,   (3.2)


e_i being the unit vector in the ith direction, i.e. e_i = (0, 0, ..., 1, ..., 0), with 1 at the ith place and 0 elsewhere.

We assume that the process {Z_n}_0^∞ is irreducible in the sense that for each (i, j), there exists n_ij such that m_ij^{(n_ij)} > 0, where m_ij^{(r)} is the (i, j)th element of the rth power of M. By the Perron-Frobenius theorem it follows that M has a maximal eigenvalue ρ that a) is strictly positive, b) is of algebraic and geometric multiplicity one, c) admits strictly positive left and right eigenvectors u and v normalized such that

u · 1 = 1 , u · v = 1 , Mu = ρu , v'M = ρv'   (3.3)

where ' stands for transpose, · stands for the dot product and 1 is the vector with all components equal to one, and d) is such that all other eigenvalues λ of M satisfy |λ| < ρ. Let

q_i = P(Z_n = 0 for some n ≥ 1 | Z_0 = e_i)   (3.4)

be the extinction probability starting with one ancestor of type i and, for s = (s_1, s_2, ..., s_k), let

f_i(s) = E(s^{Z_1} | Z_0 = e_i)   (3.5)

where 0 ≤ s_j ≤ 1 for j = 1, 2, ..., k and s^{Z_1} = Π_{j=1}^{k} s_j^{Z_{1j}}, and

f(s) = (f_1(s), f_2(s), ..., f_k(s)) .   (3.6)

Then, as in the one type case, it can be shown that q = (q_1, q_2, ..., q_k) is the solution of the equation

q = f(q)   (3.7)

that is smallest in the sense that if q' = (q'_1, q'_2, ..., q'_k) is another solution to (3.7) with 0 ≤ q'_i ≤ 1, then q_i ≤ q'_i for all 1 ≤ i ≤ k. The next result is an analogue of Theorem 1 for the multitype case.

THEOREM 8. Let {Z_n}_0^∞ be a k type irreducible Galton-Watson branching process with mean matrix M = ((m_ij)). Then q_i < 1 for all i = 1, 2, ..., k if and only if ρ > 1, where ρ is the Perron-Frobenius eigenvalue of M. □
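Theorem 8 reduces the classification of a multitype process to computing ρ, which can be done by power iteration on M. A minimal sketch (the 2 x 2 mean matrix is an illustrative choice of ours):

```python
# Power iteration for the Perron-Frobenius eigenvalue rho of a nonnegative
# irreducible mean matrix M; the 2 x 2 example is an illustrative choice.

def perron_eigenvalue(M, iters=500):
    k = len(M)
    v = [1.0] * k
    rho = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
        rho = sum(w) / sum(v)        # converges to rho as v aligns with the eigenvector
        s = sum(w)
        v = [x / s for x in w]       # renormalize so the components sum to 1
    return rho

M = [[0.5, 1.0],
     [0.75, 0.5]]
rho = perron_eigenvalue(M)
print(round(rho, 4))                 # eigenvalues are 0.5 +/- sqrt(0.75); prints 1.366
```

Since ρ ≈ 1.366 > 1 here, a process with this mean matrix is supercritical, so by Theorem 8 every q_i < 1.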

The cases ρ < 1, ρ = 1, ρ > 1 are referred to respectively as subcritical, critical and supercritical. The next theorem describes the behavior of the process in the supercritical case (Kesten and Stigum (1966), Athreya (1970), Athreya and Ney (1970), Athreya and Ney (1972)).

THEOREM 9. Let u and v be as in (3.3) and ρ > 1. Then

W_n ≡ (u · Z_n)/ρ^n   (3.8)

is a nonnegative martingale and hence converges w.p. 1 to a limit W. Further,

P(W = 0 | Z_0 = e_i) = q_i for all i


    if and only if

E(Z_{1j} log Z_{1j} | Z_0 = e_i) < ∞ for all i, j   (3.9)

    in which case

E(W | Z_0 = e_i) = v_i for all i

and W has an absolutely continuous distribution with a continuous strictly positive density on (0, ∞). □

As in the single type case, even if (3.9) fails to hold, there always exist Seneta constants c_n such that (u · Z_n)/c_n converges to a nontrivial limit W, c_{n+1}/c_n converges to ρ, and Z_n/(u · Z_n) converges to v on the set of non-extinction. This result was established by Hoppe (1976). Thus the relative proportions of the various types stabilize to a deterministic distribution and the growth rates of all types are identical to the exponential rate ρ^n.

    In the multitype supercritical case the population vector Z~ is such tha