
[Spanos] Statistical Foundations of Econometric Modelling


Contents

Foreword by David Hendry
Preface
Acknowledgements
List of symbols and abbreviations

Part I  Introduction

1 Econometric modelling, a preliminary view
  1.1 Econometrics - a brief historical overview
  1.2 Econometric modelling - a sketch of a methodology
  Looking ahead

2 Descriptive study of data
  2.1 Histograms and their numerical characteristics
  2.2 Frequency curves
  2.3 Looking ahead

Part II  Probability theory

3 Probability
  3.1 The notion of probability
  3.2 The axiomatic approach
  3.3 Conditional probability

4 Random variables and probability distributions
  4.1 The concept of a random variable
  4.2 The distribution and density functions
  4.3 The notion of a probability model
  4.4 Some univariate distributions
  4.5 Numerical characteristics of random variables

5 Random vectors and their distributions
  5.1 Joint distribution and density functions
  5.2 Some bivariate distributions
  5.3 Marginal distributions
  5.4 Conditional distributions

6 Functions of random variables
  6.1 Functions of one random variable
  6.2* Functions of several random variables
  6.3 Functions of normally distributed random variables, a summary
  Looking ahead
  Appendix 6.1 - The normal and related distributions

7 The general notion of expectation
  7.1 Expectation of a function of random variables
  7.2 Conditional expectation
  7.3 Looking ahead

8* Stochastic processes
  8.1 The concept of a stochastic process
  8.2 Restricting the time-heterogeneity of a stochastic process
  8.3 Restricting the memory of a stochastic process
  8.4 Some special stochastic processes
  8.5 Summary

9 Limit theorems
  9.1 The early limit theorems
  9.2 The law of large numbers

10* Introduction to asymptotic theory
  10.1 Introduction
  10.2 Modes of convergence
  10.3 Convergence of moments
  10.4 The 'big O' and 'little o' notation
  10.5 Extending the limit theorems
  10.6 Error bounds and asymptotic expansions

Part III  Statistical inference

11 The nature of statistical inference
  11.1 Introduction
  11.2 The sampling model
  11.3 The frequency approach
  11.4 An overview of statistical inference
  11.5 Statistics and their distributions
  Appendix 11.1 - The empirical distribution function

12 Estimation I - properties of estimators
  12.1 Finite sample properties
  12.2 Asymptotic properties
  12.3 Predictors and their properties

13 Estimation II - methods
  13.1 The method of least-squares
  13.2 The method of moments
  13.3 The maximum likelihood method

14 Hypothesis testing and confidence regions
  14.1 Testing, definitions and concepts
  14.2 Optimal tests
  14.3 Constructing optimal tests
  14.4 The likelihood ratio test procedure
  14.5 Confidence estimation
  14.6 Prediction

15* The multivariate normal distribution
  15.1 Multivariate distributions
  15.2 The multivariate normal distribution
  15.3 Quadratic forms related to the normal distribution
  15.4 Estimation
  15.5 Hypothesis testing and confidence regions

16* Asymptotic test procedures
  16.1 Asymptotic properties
  16.2 The likelihood ratio and related test procedures

Part IV  The linear regression and related statistical models

17 Statistical models in econometrics
  17.1 Simple statistical models
  17.2 Economic data and the sampling model
  17.3 Economic data and the probability model
  17.4 The statistical generating mechanism
  17.5 Looking ahead
  Appendix 17.1 - Data

18 The Gauss linear model
  18.1 Specification
  18.2 Estimation
  18.3 Hypothesis testing and confidence intervals
  18.4 Experimental design
  18.5 Looking ahead

19 The linear regression model I - specification, estimation and testing
  19.1 Introduction
  19.2 Specification
  19.3 Discussion of the assumptions
  19.4 Estimation
  19.5 Specification testing
  19.6 Prediction
  19.7 The residuals
  19.8 Summary and conclusion
  Appendix 19.1 - A note on measurement systems

20 The linear regression model II - departures from the assumptions underlying the statistical GM
  20.1 The stochastic linear regression model
  20.2 The statistical parameters of interest
  20.3 Weak exogeneity
  20.4 Restrictions on the statistical parameters of interest
  20.5 Collinearity
  20.6 'Near' collinearity

21 The linear regression model III - departures from the assumptions underlying the probability model
  21.1 Misspecification testing and auxiliary regressions
  21.2 Normality
  21.3 Linearity
  21.4 Homoskedasticity
  21.5 Parameter time invariance
  21.6 Parameter structural change
  Appendix 21.1 - Variance stabilising transformations

22 The linear regression model IV - departures from the sampling model assumption
  22.1 Implications of a non-random sample
  22.2 Tackling temporal dependence
  22.3 Testing the independent sample assumption
  22.4 Looking back
  Appendix 22.1 - Deriving the conditional expectation

23 The dynamic linear regression model
  23.1 Specification
  23.2 Estimation
  23.3 Misspecification testing
  23.4 Specification testing
  23.5 Prediction
  23.6 Looking back

24 The multivariate linear regression model
  24.1 Introduction
  24.2 Specification and estimation
  24.3 A priori information
  24.4 The Zellner and Malinvaud formulations
  24.5 Specification testing
  24.6 Misspecification testing
  24.7 Prediction
  24.8 The multivariate dynamic linear regression (MDLR) model
  Appendix 24.1 - The Wishart distribution
  Appendix 24.2 - Kronecker products and matrix differentiation

25 The simultaneous equations model
  25.1 Introduction
  25.2 The multivariate linear regression and simultaneous equations models
  25.3 Identification using linear homogeneous restrictions
  25.4 Specification
  25.5 Maximum likelihood estimation
  25.6 Least-squares estimation
  25.7 Instrumental variables
  25.8 Misspecification testing
  25.9 Specification testing
  25.10 Prediction

26 Epilogue: towards a methodology of econometric modelling
  26.1 A methodologist's critical eye
  26.2 Econometric modelling, formalising a methodology
  26.3 Conclusion

References

Index

* Starred chapters and/or sections are typically more difficult and might be avoided at first reading.

PART I

    Introduction

CHAPTER 1

Econometric modelling, a preliminary view

    1.1 Econometrics - a brief historical overview

It is customary to begin a textbook by defining its subject matter. In this case this brings us immediately up against the problem of defining 'econometrics'. Such a definition, however, raises some very difficult methodological issues which could not be discussed at this stage. The epilogue might be a better place to give a proper definition. For the purposes of the discussion which follows it suffices to use a working definition which provides only broad guide-posts of its intended scope:

Econometrics is concerned with the systematic study of economic phenomena using observed data.

This definition is much broader than certain textbook definitions narrowing the subject matter of econometrics to the 'measurement' of theoretical relationships as suggested by economic theory. It is argued in the epilogue that the latter definition of econometrics constitutes a relic of an outdated methodology, that of logical positivism (see Caldwell (1982)). The methodological position underlying the definition given above is largely hidden behind the word 'systematic'. The term systematic is used to describe the use of observed data in a framework where economic theory as well as statistical inference play an important role, as yet undefined. The use of observed data is what distinguishes econometrics from other forms of studying economic phenomena.

Econometrics, defined as the study of the economy using observed data, can be traced as far back as 1676, predating economics as a separate discipline by a century. Sir William Petty could be credited with the first 'systematic' attempt to study economic phenomena using data in his Political Arithmetik. Systematic in this case is used relative to the state of the art in statistics and economics of the time.

Petty (1676) used the pioneering results in descriptive statistics developed by his friend John Graunt and certain rudimentary forms of economic theorising to produce the first 'systematic' attempt in studying economic phenomena using data. Petty might also be credited as the first to submit to a most serious temptation in econometric modelling. According to Hull, one of his main biographers and collector of his works:

Petty sometimes appears to be seeking figures that will support a conclusion he has already reached; Graunt uses his numerical data as a basis for conclusions, declining to go beyond them.

(See Hull (1899), p. xxv.)

Econometrics, since Petty's time, has developed alongside statistics and economic theory, borrowing and lending to both subjects. In order to understand the development of econometrics we need to relate it to developments in these subjects.

Graunt and Petty initiated three important developments in statistics:
(i) the systematic collection of (numerical) data;
(ii) the mathematical theory of probability related to life-tables; and
(iii) the development of what we nowadays call descriptive statistics (see Chapter 2) into a coherent set of techniques for analysing numerical data.

It was rather unfortunate that the last two lines of thought developed largely independently of each other for the next two centuries. Their slow convergence during the second half of the nineteenth and early twentieth centuries in the hands of Galton, Edgeworth, Pearson and Yule, inter alia, culminated with the Fisher paradigm which was to dominate statistical theory to this day.

The development of the calculus of probability emanating from Graunt's work began with Halley (1656-1742) and continued with De Moivre (1667-1754), Daniel Bernoulli (1700-82), Bayes (1702-61), Lagrange (1736-1813), Laplace (1749-1827), Legendre (1752-1833) and Gauss (1777-1855), inter alia. In the hands of De Moivre the main line of the calculus of probability emanating from Jacob Bernoulli (1654-1705) was joined up with Halley's life tables to begin a remarkable development of probability theory (see Hacking (1975), Maistrov (1974)).

The most important of these developments can be summarised under the following headings:
(i) manipulation of probabilities (addition, multiplication);
(ii) families of distribution functions (normal, binomial, Poisson, exponential);
(iii) law of error, least-squares, least-absolute errors;
(iv) limit theorems (law of large numbers, central limit theorem);
(v) life-table probabilities and annuities;
(vi) higher order approximations;
(vii) probability generating functions.
Some of these topics will be considered in some detail in Part II because they form the foundation of statistical inference.

The tradition in Political Arithmetik originated by Petty was continued by Gregory King (1648-1712). Davenant might be credited with publishing the first 'empirical' demand schedule (see Stigler (1954)), drawing freely from King's unpublished work. For this reason his empirical demand for wheat schedule is credited to King and it has become known as 'King's law'. Using King's data on the change in the price (p_t) associated with a given change in quantity (q_t), Yule (1915) derived the empirical equation explicitly as

p_t = -2.33q_t + 0.05q_t^2 - 0.0017q_t^3.

Apart from this demand schedule, King and Davenant extended the line of thought related to the population and death rates in various directions, thus establishing a tradition in Political Arithmetik, 'the art of reasoning by figures upon things, relating to government'. Political Arithmetik was to stagnate for almost a century without any major developments in the descriptive study of data apart from grouping and calculation of tendencies.

From the economic theory viewpoint Political Arithmetik played an important role in classical economics, where numerical data on money stock, prices, wages, public finance, exports and imports were extensively used as important tools in the various controversies. The best example of the tradition established by Graunt and Petty is provided by Malthus' 'Essay on the Principles of Population'. In the bullionist and currency-banking schools controversies numerical data played an important role (see Schumpeter (1954)). During the same period the calculation of index numbers made its first appearance.

With the establishment of the statistical society in 1834 began a more coordinated activity in most European countries for more reliable and complete data. During the period 1850-90 a sequence of statistical congresses established a common tradition for collecting and publishing data on many economic and social variables, making very rapid progress on this front. In relation to statistical techniques, however, the progress was much slower. Measures of central tendency (arithmetic mean, median, mode, geometric mean) and graphical techniques were developed. Measures of dispersion (standard deviation, interquartile range), correlation and relative frequencies made their appearance towards the end of this period. A leading figure of this period, who can be considered as one of the few people to straddle all three lines of development emanating from Graunt and Petty, was the Belgian statistician Quetelet (1796-1874).

From the econometric viewpoint the most notable example of empirical modelling was Engel's study of family budgets. The most important of his conclusions was:

The poorer a family, the greater the proportion of its total expenditure that must be devoted to the provision of food.

(Quoted by Stigler (1954).)
This has become known as Engel's law and the relationship between consumption and income as Engel's curve (see Deaton and Muellbauer (1980)).

The most important contribution during this period (early nineteenth century) from the modelling viewpoint was what we nowadays call the Gauss linear model. In modern notation this model takes the following general form:

y = Xβ + u, (1.2)

where y is a T x 1 vector of observations linearly related to the unknown k x 1 vector β via a known fixed T x k matrix X (rank(X) = k, T > k) but subject to (observation) error u. This formulation was used to model a situation such that:

it is suspected that for settings x_1, x_2, ..., x_k there is a value y related by a linear relation:

y = Σ_{i=1}^{k} β_i x_i,

where β ≡ (β_i) is unknown. A number T of observations on y can be made, corresponding to T different sets of (x_1, ..., x_k), i.e. we obtain a data set (y_t, x_t1, x_t2, ..., x_tk), t = 1, 2, ..., T, but the readings y_t on y are subject to error.

(See Heyde and Seneta (1977).)
The problem as seen at the time was one of interpolation (approximation), that is, to 'approximate' the value of β. The solution proposed came in the form of the least-squares approximation of β based on the minimisation of

u'u = (y - Xβ)'(y - Xβ),

which leads to

β̂ = (X'X)⁻¹X'y (1.4)

(see Seal (1967)).
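As a concrete illustration of (1.2)-(1.4), the minimal sketch below computes the least-squares approximation β̂ = (X'X)⁻¹X'y for a small simulated data set; the sample size, design matrix and parameter values are hypothetical, chosen only to show the arithmetic, not taken from the text.

```python
import numpy as np

# Hypothetical data: T observations generated as y = X*beta + u with fixed settings X.
rng = np.random.default_rng(0)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.uniform(0.0, 10.0, size=(T, k - 1))])  # fixed 'design' matrix
beta_true = np.array([1.0, 2.0, -0.5])   # unknown in practice; used here only to generate y
u = rng.normal(0.0, 1.0, size=T)         # observation error
y = X @ beta_true + u

# Least-squares approximation of beta, as in (1.4): solve (X'X) beta_hat = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # should be close to beta_true
```

Solving the normal equations directly is numerically preferable to forming an explicit inverse, but the result is the same estimator as in (1.4).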


The problem, as well as the solution, had nothing to do with probability theory as such. The probabilistic arguments entered the problem as an afterthought in the attempts of Gauss and Laplace to justify the method of least-squares. If the error terms u_t, t = 1, 2, ..., T, are assumed to be independent and identically distributed (IID) according to the normal distribution, i.e.

u_t ~ N(0, σ²), t = 1, 2, ..., T, (1.5)

then β̂ in (4) can be justified as 'the optimal solution' from a probabilistic viewpoint (see Heyde and Seneta (1977), Seal (1967), Maistrov (1974)).

The Gauss linear model was later given a very different interpretation in the context of probability theory by Galton, Pearson and Yule, which gave rise to what is nowadays called the linear regression model. The model given in (2) is now interpreted as based wholly on probabilistic arguments: y_t and X_t are assumed to be jointly normally distributed random variables and β'x_t is viewed as the conditional expectation of y_t given that X_t = x_t (X_t takes the value x_t) for t = 1, 2, ..., T, i.e.

E(y_t | X_t = x_t) = β'x_t, t = 1, 2, ..., T, (1.6)

with the error term u_t defined by u_t = y_t - E(y_t | X_t = x_t) (see Chapter 19 for further details). The linear regression model

y_t = E(y_t | X_t = x_t) + u_t, t = 1, 2, ..., T,

can be written in matrix form as in (2) and the two models become indistinguishable in terms of notation. From the modelling viewpoint, however, the two models are very different. The Gauss linear model describes a 'law-like' relationship where the x_t's are known constants. On the other hand, the linear regression model refers to a 'predictive-like' relationship where y_t is related to the observed values of the random vector X_t (for further discussion see Chapter 19). This important difference went largely unnoticed by Galton, Pearson and the early twentieth-century applied econometricians. Galton in particular used the linear regression model to establish 'law-like' causal relationships in support of his theories of heredity in the then newly established discipline of eugenics.
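To make the 'predictive-like' reading of (1.6) concrete, the hedged sketch below simulates jointly normal (X_t, y_t) pairs and compares the sample mean of y within narrow bins of x to the theoretical conditional expectation E(y | X = x), which for a bivariate normal distribution is linear in x; the means, variances and correlation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bivariate normal distribution for (X, y).
mu_x, mu_y, sd_x, sd_y, rho = 5.0, 10.0, 2.0, 3.0, 0.6
cov = np.array([[sd_x**2, rho * sd_x * sd_y],
                [rho * sd_x * sd_y, sd_y**2]])
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=20000).T

# Theoretical regression function: E(y | X = x) = mu_y + rho*(sd_y/sd_x)*(x - mu_x).
beta1 = rho * sd_y / sd_x
beta0 = mu_y - beta1 * mu_x

# Empirical conditional means of y within narrow bins of x lie close to beta0 + beta1*x.
bins = np.linspace(mu_x - 2 * sd_x, mu_x + 2 * sd_x, 9)
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (x >= lo) & (x < hi)
    mid = 0.5 * (lo + hi)
    print(f"x near {mid:4.1f}:  E(y|x) = {beta0 + beta1 * mid:5.2f},  "
          f"sample mean = {y[mask].mean():5.2f}")
```

The regression line here is a property of the joint distribution of the random variables, not a 'law' connecting fixed constants, which is exactly the distinction drawn in the text.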

The Gauss linear model was initially developed by astronomers in their attempt to determine 'law-like' relationships for planetary orbits, using a large number of observations with less than totally accurate instruments. The nature of their problem was such as to enable them to assume that their theories could account for all the information in the data apart from a white-noise (see Chapter 8) error term u_t. The situation being modelled resembles an 'experimental design' situation because of the relative constancy of the phenomena in question, with nature playing the role of the experimenter. Later, Fisher extended the applicability of the Gauss linear model to 'experimental-like' phenomena using the idea of randomisation (see Fisher (1958)). Similarly, the linear regression model, firmly based on the idea of conditional expectation, was later extended by Pearson to the case of stochastic regressors (see Seal (1967)).

In the context of the Gauss linear and linear regression models the convergence of descriptive statistics and the calculus of probability became a reality, with Galton (1822-1911), Edgeworth (1845-1926), Pearson (1857-1936) and Yule (1871-1951) being the main protagonists. In the hands of Fisher (1890-1962) the convergence was completed and a new modelling paradigm was proposed. One of the most important contributing factors to these developments in the early twentieth century was the availability of more complete and reliable data towards the end of the nineteenth century. Another important development contributing to the convergence of the descriptive study of data and the calculus of probability came in the form of Pearson's family of frequency curves, which provided the basis for the transition from histograms to probability density functions (see Chapter 2). Moreover, the various concepts and techniques developed in descriptive statistics were to be reinterpreted and provide the basis for the probability theory framework. The frequency curves as used in descriptive statistics provide convenient 'models' for the observed data at hand. On the other hand, probability density functions were postulated as 'models' of the population giving rise to the data, with the latter viewed as a representative sample from the former. The change from the descriptive statistics to the probability theory approach in statistical modelling went almost unnoticed until the mid-1930s, when the latter approach, formalised by Fisher, dominated the scene.

During the period 1890-1920 the distinction between the population from where the observed data constitute a sample and the sample itself was blurred by the applied statisticians. This was mainly because the paradigm tacitly used, as formulated by Pearson, was firmly rooted in the descriptive statistics tradition where the modelling proceeds from the observed data in hand to the frequency (probability) model and no distinction between the population and the sample is needed. In a sense the population consists of the data in hand. In the context of the Fisher paradigm, however, a probability model is postulated as a generalised description of the actual data generation process (DGP), or the population, and the observed data are viewed as a realisation of a sample from the process. The transition from the Pearson to the Fisher paradigm was rather slow and went largely unnoticed even by the protagonists. In the exchanges between Fisher and Pearson about the superiority of maximum likelihood estimation over the method of moments on efficiency grounds, Pearson


never pointed out that his method of moments was developed for a different statistical paradigm where the probability model is not postulated a priori (see Chapter 13). The distinction between the population and the sample was initially raised during the last decade of the nineteenth century and early twentieth century in relation to higher order approximations of the central limit theorem (CLT) results emanating from Bernoulli, De Moivre and Laplace. These limit theorems were sharpened considerably by the Russian school (Chebyshev (1821-94), Liapounov (1857-1918), Markov (1856-1922), Kolmogorov (1903- )) (see Maistrov (1974)) and used extensively during this period. Edgeworth and Charlier, among others, proposed asymptotic expansions which could be used to improve the approximation offered by the CLT for a given sample size T (see Cramer (1972)). The development of a formal distribution theory based on a fixed sample size T, however, began with Gosset's (Student's) t and Fisher's F distributions (see Kendall and Stuart (1969)). These results provided the basis of modern statistical theory based on the Fisher paradigm. The transition from the Pearson to the Fisher paradigm became apparent in the 1930s when the theory of estimation and testing as we know it today was formulated. It was also the time when probability theory itself was given its axiomatic foundations by Kolmogorov (1933) and firmly established as part of mathematics proper. By the late 1930s probability theory as well as statistical inference as we know them today were firmly established.

The Gauss linear and linear regression models were appropriate for modelling essentially static phenomena. Yule (1926) discussed the difficulties raised when time series data are used in the context of the linear regression model and gave an insightful discussion of 'non-sense regressions' (see Hendry and Morgan (1986)). In an attempt to circumvent these problems, Yule (1927) proposed the linear autoregressive model (AR(m)), in which the x_t's of the linear regression model are replaced by the lagged y_t's.

An alternative model for time-series data was suggested by Slutsky (1927) in his discussion of the dangers in 'smoothing' such data using weighted averaging. He showed that weighted averaging of a white-noise process u_t can produce a data series with periodicities. Hence, somebody looking for cyclic behaviour can be easily fooled when the data series have been smoothed. His discussion gave rise to the other important family of time-series models, subsequently called the moving average (MA(p)) model.
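A small simulation makes Slutsky's point tangible. In modern notation an MA(p) process writes y_t as a weighted sum of the current and p lagged white-noise shocks; the sketch below applies a simple equal-weight moving average to pure noise and reports the autocorrelations induced entirely by the smoothing. The window length, sample size and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
T, window = 500, 8

u = rng.normal(size=T + window)            # white noise: no cycles, no memory
weights = np.ones(window) / window         # equal-weight moving average (a special MA case)
y = np.convolve(u, weights, mode="valid")  # smoothed series

def autocorr(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# The raw noise shows negligible autocorrelation; the smoothed series shows strong
# positive short-lag dependence, which can masquerade as 'cycles' to the unwary eye.
for lag in (1, 2, 4, 8):
    print(f"lag {lag}: noise {autocorr(u, lag):+.2f}, smoothed {autocorr(y, lag):+.2f}")
```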


Wold (1938) provided the foundations for time series modelling by relating the above models to the mathematical theory of probability established by Kolmogorov (1933). These developments in time series modelling were to have only a marginal effect on mainstream econometric modelling until the mid-70s, when a slow but sure convergence of the two methodologies began. One of the main aims of the present book is to complete this convergence in the context of a reformulated methodology.

With the above developments in probability theory and statistical inference in mind, let us consider the history of econometric modelling in the early twentieth century. The marginalist revolution of the 1870s, with Walras and Jevons the protagonists, began to take root and with it a change of attitude towards mathematical and statistical techniques and their role in studying the economy. In classical economics observed data were used mainly to 'establish' tendencies in support of theoretical arguments or as 'facts' to be explained. The mathematisation of economic theory brought about by the marginalist revolution contributed towards a purposeful attempt to quantify theoretical relationships using observed data. The theoretical relationships formulated in terms of equations, such as demand and supply functions, seemed to offer themselves for quantification using the newly established techniques of correlation and regression.

The early literature in econometric modelling concentrated mostly on two general areas, business cycles and demand curves (see Stigler (1954)). This can be explained by the availability of data and the influence of the marginalist revolution. The statistical analysis of business cycles took the form of applying correlation as a tool to separate long-term secular movements, periodic movements and short-run oscillations (see Hooker (1905), Moore (1914), inter alia). The empirical studies in demand theory concentrated mostly on estimating demand curves using the Gauss linear model disguised as regression analysis. The estimation of such curves was treated as 'curve fitting' with any probabilistic arguments being coincidental. Numerous studies of empirical demand schedules, mostly of agricultural products, were published during the period 1910-30 (see Stigler (1954), Morgan (1982), Hendry and Morgan (1986)), seeking to establish an empirical foundation for the 'law of demand'. These studies purported to estimate demand schedules of the simple form

q_t^D = α_0 + α_1 p_t, (1.10)

where q_t^D refers to quantities demanded at time t (intentions on behalf of economic agents to buy a certain quantity of a commodity) corresponding to a range of hypothetical prices p_t. By adopting the Gauss linear model these studies tried to 'approximate' (10) by fitting the 'best' line through the scatter diagram of (q_t, p_t), t = 1, 2, ..., T, where q_t usually referred to


quantities transacted (or produced) and the corresponding prices p_t at time t. That is, they would estimate

q_t = b_0 + b_1 p_t, t = 1, 2, ..., T, (1.11)

using least-squares or some other interpolation method and interpret the estimated coefficients b_0 and b_1 as estimates of the theoretical parameters α_0 and α_1 respectively, if the signs and values were consistent with the 'law of demand'. This simplistic modelling approach, however, ran into difficulties immediately. Moore (1914) estimated (11) using data on pig-iron (raw steel) production and price and 'discovered' (or so he thought) a positively sloping demand schedule (b_1 > 0). This result attracted considerable criticism from the applied econometricians of the time such as Lehfeldt and Wright (see Stigler (1962)) and raised the most important issue in econometric modelling: the connection between the estimated equations using observed data and the theoretical relationships postulated by economic theory. Lehfeldt (1915), commenting on Moore's 'discovery', argued that the estimated equation was not a demand but a supply curve. Several applied econometricians argued that Moore's estimated equation was a mixture of demand and supply. Others, taking a more extreme view, raised the issue of whether estimated equations represent statistical artifacts or genuine empirical demand or supply curves. It might surprise the reader to learn that the same issue remains largely unresolved to this day. Several 'solutions' have been suggested since then but no satisfactory answer has emerged.

During the next two decades (1910-30) the applied econometricians struggled with the problem and proposed several ingenious ways to 'resolve' some of the problems raised by the estimated versus theoretical relationships issue. Their attempts were mainly directed towards specifying more 'realistic' theoretical models and attempting to rid the observed data of 'irrelevant information'. For example, the scenario of demand and supply curves simultaneously shifting, allowing us to observe only their intersection points, received considerable attention (see Working (1927), inter alia). The time dimension of time-series data proved particularly difficult to 'solve' given that the theoretical model was commonly static. Hence 'detrending' the data was a popular way to 'purify' the observed data in order to bring them closer to the theoretical concepts they purported to measure (see Morgan (1982)). As argued below, the estimated-theoretical issue raises numerous problems which, given the state of the art as far as statistical inference is concerned, could not have been resolved in any satisfactory way. In modern terminology these problems can be summarised under the following headings:
(i) theoretical variables versus observed data;


(ii) statistical model specification;
(iii) statistical misspecification testing;
(iv) specification testing, reparametrisation, identification;
(v) empirical versus theoretical models.

By the late 1920s there was a deeply felt need for a more organised effort to face the problems raised by the early applied econometricians such as Moore, Mitchell, Schultz, Clark, Working, Wallace, Wright, inter alia. This led to the creation of the Econometric Society in 1930. Frisch, Tinbergen and Fisher (Irving) initiated the establishment of 'an international society for the advancement of economic theory in its relation to statistics and mathematics'. The decade immediately after the creation of the Econometric Society can be characterised as the period during which the foundations of modern econometrics were laid, mainly by posing some important and insightful questions.

An important attempt to resolve some of the problems raised by the estimated-theoretical distinction was made by Frisch (1928), (1934). Arguing from the Gauss linear model viewpoint, Frisch suggested the so-called errors-in-variables formulation where the theoretical relationships defined in terms of the theoretical variables ξ_t ≡ (ξ_1t, ..., ξ_kt)' are defined by the system of k linear equations:

A'ξ_t = 0, (1.12)

and the observed data y_t ≡ (y_1t, ..., y_kt)' are related to ξ_t via

y_t = ξ_t + ε_t, (1.13)

where ε_t are errors of measurement. This formulation emphasises the distinction between theoretical variables and observed data, with the measurement equations (13) relating the two. The problem as seen by Frisch was one of approximation (interpolation) in the context of linear algebra, in the same way as the Gauss linear model was viewed. Frisch, however, with his confluence analysis offered no proper solution to the problem. A complete solution to the simplest case was only recently provided, 50 years later, by Kalman (1982). It is fair to say that although Frisch understood the problems raised by the empirical-theoretical distinction, as the quotation below testifies, his formulation of the problem turned out to be rather unsuccessful in this respect. Commenting on Tinbergen's 'A statistical test of business cycle theories', Frisch argued that:

The question of what connection there is between relations we work with in theory and those we get by fitting curves to actual statistical data is a very delicate one. Tinbergen in his work hardly mentions it. He more or less takes it for granted that the relations he has found are in their nature the same as the theory ... This is, in my opinion, unsatisfactory. In a work of this sort, the connection between statistical and theoretical relations must be thoroughly understood and the nature of the information which the statistical relations furnish - although they are not identical with the theoretical relations - should be clearly brought out.

(See Frisch (1938), pp. 2-3.)

As mentioned above, by the late 1930s the Fisher paradigm of statistical inference was formulated into a coherent body of knowledge with a firm foundation in probability theory. The first important attempt to introduce this paradigm into econometrics was made by Koopmans (1937). He proposed a resetting of Frisch's errors-in-variables formulation in the context of the Fisher paradigm and related the least-squares method to that of maximum likelihood, arguing that the latter paradigm provides us with additional insight as to the nature of the problem posed and its 'solution' (estimation). Seven years later Haavelmo (a student of Frisch) published his celebrated monograph on 'The probability approach in econometrics' (see Haavelmo (1944)), where he argued that the probability approach (the Fisher paradigm) was the most promising approach to econometric modelling (see Morgan (1984)). His argument in a nutshell was that if statistical inference (estimation, testing and prediction) is to be used systematically we need to accept the framework in the context of which these results become available. This entails formulating theoretical propositions in the context of a well-defined statistical model. In the same monograph Haavelmo exemplified a methodological awareness far ahead of his time. In relation to the above discussion of the appropriateness of the Gauss linear model in modelling economic phenomena he distinguished between observed data resulting from:

(1) experiments that we should like to make to see if certain real economic phenomena - when artificially isolated from 'other influences' - would verify certain hypotheses, and

(2) the stream of experiments that Nature is steadily turning out from her own enormous laboratory, and which we merely watch as passive observers.

He went on to argue:

In the first case we can make the agreement or disagreement between theory and facts depend upon two things: the facts we choose to consider, as well as our theory about them ... In the second case we can only try to adjust our theories to reality as it appears before us. And what is the meaning of a design of experiments in this case? It is this: We try to choose a theory and a design of experiments to go with it, in such a way that the resulting data would be those which we get by passive observation of reality. And to the extent that we succeed in doing so, we become master of reality - by passive agreement.
Now if we examine current economic theories, we see that a great many of them, in particular the more profound ones, require experiments of the first type mentioned above. On the other hand, the kind of economic data that we actually have belong mostly to the second type.

(See Haavelmo (1944).)

Unfortunately for econometrics, Haavelmo's views on the methodology of econometric modelling had much less influence than his formulation of a statistical model thought to be tailor-made for econometrics: the so-called simultaneous equations model.

In an attempt to capture the interdependence of economic relationships Haavelmo (1943) proposed an alternative to Frisch's errors-in-variables formulation where no distinction between theoretical variables and observed data is made. The simultaneous equation formulation was specified by the system

Γ'y_t + Δ'x_t + ε_t = 0, (1.14)

where y_t refers to the variables whose behaviour this system purports to explain (endogenous) and x_t to the explanatory (exogenous) variables whose behaviour lies outside the intended scope of the theory underlying (14), and ε_t is the error term (see Chapter 25). The statistical analysis of (14) provided the agenda for a group of distinguished statisticians and econometricians assembled in Chicago in 1945. This group, known as the Cowles Foundation Group, introduced the newly developed techniques of estimation (maximum likelihood) and testing into econometrics via the simultaneous equation model. Their results, published in two monographs (see Koopmans (1950) and Hood and Koopmans (1953)), were to provide the main research agenda in econometrics for the next 25 years.

It is important to note that despite Haavelmo's stated intentions in his discussion of the methodology of econometric modelling (see Haavelmo (1944)), the simultaneous equation model was later viewed in the Gauss linear model tradition where the theory is assumed to account for all the information in the data apart from some non-systematic (white-noise) errors. Indeed the research in econometric theory for the next 25-30 years was dominated by the Gauss linear model and its misspecification analysis and the simultaneous equations model and its identification and estimation. The initial optimism about the potential of the simultaneous equations model and its appropriateness for econometric modelling was not fulfilled. The problems related to the issue of estimated versus theoretical relationships mentioned above were largely ignored because of this initial optimism. By the late 1970s the experience with large macroeconometric models based on the simultaneous equations formulation called into question the whole approach to econometric modelling (see Sims (1980), Malinvaud (1982), inter alia).

The inability of large macroeconometric models to compete with Box-Jenkins ARIMA models, which have no economic theory content, on prediction grounds (see Cooper (1972)) renewed the interest of econometricians in the issue of static theory versus dynamic time-series data raised in the 1920s and 30s. Granger and Newbold (1974) questioned the conventional econometric approach of paying little attention to the time series features of economic data; the result of specifying statistical models using only the information provided by economic theory. By the late 1970s it was clear that the simultaneous equations model, although very useful, was not a panacea for all econometric modelling problems. The whole econometric methodology needed a reconsideration in view of the experience of the three decades since the Cowles Foundation.

The purpose of the next section is to consider an outline of a particular approach to econometric modelling which takes account of some of the problems raised above. It is only an outline because in order to formulate the methodology in any detail we need to use concepts and results which are developed in the rest of the book. In particular, an important feature of the proposed methodology is the recasting of statistical models of interest in econometrics in the Fisherian mould, where the probabilistic assumptions are made directly in terms of the observable random variables giving rise to the observed data and not some unobservable error term. The concepts and ideas involved in this recasting are developed in Parts II-IV. Hence, a more detailed discussion of the proposed methodology is given in the epilogue.

    1.2 Econometric modelling - a sketch of a methodology

In order to motivate the methodology of econometric modelling adopted below let us consider a simplistic view of a commonly propounded methodology as given in Fig. 1.1 (for similar diagrams see Intriligator (1978), Koutsoyiannis (1977), inter alia). In order to explain the procedure represented by the diagram let us consider an extensively researched theoretical relationship, the transactions demand for money.

There is a proliferation of theories related to the demand for money which are beyond the scope of the present discussion (for a survey see Fisher (1978)). For our purposes it suffices to consider a simple theory where the transactions demand for money depends on income, the price level and the interest rate.


Fig. 1.1. The 'textbook' approach to econometric modelling: theory → theoretical model → econometric model (data) → estimation and testing (statistical inference) → prediction (forecasting), policy evaluation.

That is,

M^D = f(Y, P, I). (1.15)

Most theories of the demand for money can be accommodated in some variation of (15) by attributing different interpretations to Y. The theoretical model is a mathematical formulation of a theory. In the present case we expressed the theory directly in the functional form (15) in an attempt to keep the discussion to a minimum. Let the theoretical model be an explicit functional form for (15), say

M^D = A Y^{α_1} P^{α_2} I^{α_3}, (1.16)

or

ln M^D = α_0 + α_1 ln Y + α_2 ln P + α_3 ln I (1.17)

in log-linear form, with α_0 = ln A being a constant.

    being a constant.The next step in the methodological scheme represented by Fig. 1.1 is to

    transform the theoretical model (17) into an econometric model. This iscommonly achieved in an interrelated sequence of steps which is rarelyexplicitly stated. Firstly, certain data series, assumed to representmeasurements of the theoretical variables involved, are chosen. Secondly,the theoretical variables are assumed to coincide with the variables givingrise to the observed data chosen. This enables us to respecify (17) in terms ofthese observable variables, say, V,, f,, Ft and I-t

    ln Vt = txtl + a1 ln 1-,)+ a2 ln #f+ aafr, (1.18)The last step is to turn (18) into an econometric (statistical)model byattaching an error term t1t which is commonly assumed to be a normallydistributed random variable of the form


u_t ~ N(0, σ²), t ∈ T, (1.19)

reflecting the effects of the excluded variables. Adding this error term onto (18) yields

m_t = α_0 + α_1 y_t + α_2 p_t + α_3 i_t + u_t, t ∈ T, (1.20)

where small letters represent the logarithms of the corresponding capital letters. Equation (20) is now viewed as a Gauss linear model, with the estimation, testing and prediction techniques related to it at our disposal to analyse 'the transactions demand for money'. The next stage is to estimate (20) using the statistical results related to the Gauss linear model and test the postulated assumptions for the error term. If any of the assumptions are invalid we correct by respecifying the error term, and then we proceed to test the a priori restrictions suggested by the theory, such as α_1 ≈ 1, α_2 ≈ 1, -1 < α_3 < 0.
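The textbook recipe just described amounts to a few lines of computation. The sketch below estimates a log-linear equation like (1.20) by least-squares on simulated series; the data, parameter values and variable names are invented purely for illustration and are not the UK money data discussed later in the book.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 80

# Hypothetical logged series for income, prices and the interest rate.
y = np.cumsum(rng.normal(0.01, 0.02, T)) + 10.0   # log income
p = np.cumsum(rng.normal(0.01, 0.01, T)) + 4.0    # log price level
i = np.log(rng.uniform(0.04, 0.12, T))            # log interest rate
m = 0.5 + 1.0 * y + 1.0 * p - 0.2 * i + rng.normal(0, 0.05, T)  # log money, by construction

# 'Econometric model' (1.20): regress m_t on a constant, y_t, p_t and i_t.
X = np.column_stack([np.ones(T), y, p, i])
alpha_hat, *_ = np.linalg.lstsq(X, m, rcond=None)
resid = m - X @ alpha_hat
print("alpha_hat:", np.round(alpha_hat, 3))
print("residual variance:", round(resid.var(ddof=X.shape[1]), 5))
```

Whether such estimates mean anything depends, as the rest of this section argues, on whether the Gauss linear model bears any resemblance to the way the observed series were actually generated.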


One of the main weaknesses of this textbook methodology is that the starting point of econometric modelling is some theory. This arises because the intended scope of econometrics is narrowly defined as the 'measurement of theoretical relationships'. Such a definition was rejected at the outset of the present book as narrow and misleading. Theories are developed not for the sake of theorising but in order to understand some observable phenomenon of interest. Hence, defining the intended scope of econometrics as providing numbers for our own constructions, and ignoring the original aim of explaining phenomena of interest, restricts its scope considerably by attaching 'blinkers' to the modeller. In a nutshell, it presupposes that the only 'legitimate information' contained in the observed data chosen is what the theory allows. This presents the modeller with insurmountable difficulties at the statistical model specification stage when the data do not fit the 'straightjacket' chosen for them without their nature being taken into consideration. The problem becomes more apparent when the theoretical model is turned into a statistical (econometric) model by attaching a white-noise error term to a reinterpreted equation in terms of observable variables. It is naive to suggest that the statistical model should be the same whatever the observed data chosen. In order to see this let us consider the demand schedule at time t referred to in Section 1.1:

q_t^D = α_0 + α_1 p_t. (1.21)

If the data refer to intentions q_it^D = q^D(p_it), i = 1, 2, ..., n, which correspond to the hypothetical range of prices p_it, i = 1, 2, ..., n, then the most appropriate statistical model in the context of which (21) can be analysed is indeed the Gauss linear model. This is because the way the observed data were generated was under conditions which resemble an experimental situation; the hypothetical prices p_it, i = 1, 2, ..., n, were called out and the economic agents considered their intentions to buy at time t. This suggests that α_0 and α_1 can be estimated using

q_it^D = α_0 + α_1 p_it + u_it, i = 1, 2, ..., n. (1.22)

In Haavelmo's categorisation, (q_it^D, p_it), i = 1, 2, ..., n, constitute observed data of type one: experimental-like situations, see Section 1.1. On the other hand, if the observed data come in the form of time series (q_t, p_t), t = 1, 2, ..., T, where q_t refers to quantities transacted and p_t the corresponding prices at time t, then the data are of type two: generated by nature. In this case the Gauss linear model seems wholly inappropriate unless there exists additional information ensuring that

q_t^D(p_t) = q_t for all t. (1.23)

Such a condition is highly unlikely to hold given that in practice other


factors such as supply-side, historical and institutional ones will influence the determination of q_t and p_t. It is highly likely that the data (q_t, p_t), t = 1, 2, ..., T, when used in the context of the Gauss linear model

q_t = b_0 + b_1 p_t + u_t,

will give rise to very misleading estimates for the theoretical parameters of interest α_0 and α_1 (see Chapter 19 for the demand for money). This is because the GM represented by the Gauss linear model bears little, if any, resemblance to the actual DGP which gave rise to the observed data (q_t, p_t), t = 1, 2, ..., T. In order to account for this some alternative statistical model should be specified in this case (see Part IV for several such models). Moreover, in this case the theoretical model (21) might not be estimable. A moment's reflection suggests that without any additional information the estimable form of the model is likely to be an adjustment process (price or/and quantity). If the observed data have a distinct time dimension this should be taken into consideration in deciding the estimable form of the model as well as in specifying the statistical model in the context of which the latter will be analysed. The estimable form of the model is directly related to the observable phenomenon of interest which gave rise to the data (the actual DGP). More often than not the intended scope of the theory in question is not the demand schedule itself but the explanation of changes in prices and quantities of interest. In such a case a demand and/or a supply schedule are used as a means to explain price and quantity changes, not as the intended scope of the theory.
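The warning about type-two data can be illustrated with a toy simulation: when quantities and prices are jointly determined by shifting demand and supply schedules, a least-squares fit of quantity on price recovers neither schedule. The structural coefficients and shock sizes below are arbitrary assumptions chosen only to make the point.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000

# Hypothetical structure: demand q = a0 - a1*p + e_d, supply q = c0 + c1*p + e_s.
a0, a1 = 20.0, 1.0            # demand intercept and (absolute) slope
c0, c1 = 2.0, 1.5             # supply intercept and slope
e_d = rng.normal(0, 2.0, T)   # large demand shocks (demand curve shifts a lot)
e_s = rng.normal(0, 0.3, T)   # small supply shocks

# Observed (transacted) price and quantity are the equilibrium of the two schedules.
p = (a0 - c0 + e_d - e_s) / (a1 + c1)
q = c0 + c1 * p + e_s

# Least-squares fit of q on p, as in the Gauss linear model above.
b1 = np.cov(q, p)[0, 1] / np.var(p, ddof=1)
print(f"true demand slope: {-a1:.2f}, true supply slope: {c1:.2f}, fitted slope: {b1:.2f}")
# With demand shifting far more than supply, the equilibrium points trace out the
# supply curve, so the fitted 'demand' slope comes out positive - Moore's pig-iron trap.
```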

In the context of the textbook methodology, distinguishing between the theoretical and estimable models in view of the observed data seems totally unnecessary for three interrelated reasons:
(i) the observed data are treated as an afterthought;
(ii) the actual DGP has no role to play; and
(iii) theoretical variables are assumed to coincide (one-to-one) with the observed data chosen.
As in the case of (21) above, the theoretical variables do not correspond directly to a particular observed data series unless we generate the data ourselves by 'artificially isolating the economic phenomenon of interest from other influences' (see the Haavelmo (1944) quotation in Section 1.1). We only have to think of theoretical variables such as aggregate demand for money, income, price level and interest rates and dozens of available data series become possible candidates for measuring these variables. Commonly, none of these data series measures what the theoretical variables refer to. Proceeding to assume that what is estimable coincides with the theoretical model and the statistical model differs from these by a


white-noise error term regardless of the observed data chosen can only lead to misleading conclusions.

The question which naturally arises at this stage is whether we can tackle some of the problems raised above in the context of an alternative methodological framework. In view of the apparent limitations of the textbook methodology, any alternative framework should be flexible enough so as to allow the modeller to ask some of the questions raised above, even though readily available answers might not always be forthcoming. With this in mind, such a methodological framework should attribute an important role to the actual DGP in order to widen the intended scope of econometric modelling. Indeed, the estimable model should be interpreted as an approximation to the actual DGP. This brings the nature of the observed data to the centre of the scene, with the statistical model being defined directly in terms of the random variables giving rise to the data and not the error term. The statistical model should be specified as a generalised description of the mechanism giving rise to the data, in view of the estimable model, because the latter is going to be analysed in its context. A sketch of such a methodological framework is given in Fig. 1.2. An important feature of this framework is that it can include the textbook methodology as a special case under certain conditions. When the actual DGP is 'designed' to resemble the conditions assumed by the theory in question (Haavelmo type one observed data) then the theoretical and estimable models could coincide and the statistical model could differ from these by a white-noise error. In general, however, we need to distinguish between them even though the estimable model might not be readily available in some cases, such as the case of the transactions demand for money (see Chapter 23).

In order to be able to turn the above skeleton of a methodology into a fully fleshed framework we need to formulate some of the concepts involved in more detail and discuss its implementation at length. Hence, a more detailed discussion of this methodology is considered in the epilogue, where the various components shown in Fig. 1.2 are properly defined and their role explained. In the meantime the following working definitions will suffice for the discussion which follows:

Theory: a conceptual construct providing an idealised description of the phenomena within its intended scope which will enable us to seek explanations and predictions related to the actual DGP.


Fig. 1.2. An alternative approach to econometric modelling: theory and the actual data generating process give rise to the theoretical model and the observed data; these lead to the estimable model and the statistical model; estimation, misspecification testing, reparametrisation and model selection then yield the empirical econometric model, used for prediction and policy evaluation. The part from the estimable and statistical models through to the empirical econometric model is enclosed in a dotted rectangle.

Estimable model: a particular form of the theoretical model which is potentially estimable in view of the actual DGP and the observed data chosen.

Statistical model: a probabilistic formulation purporting to provide a generalised description of the actual DGP with a view to analysing the estimable model in its context.

Empirical econometric model: a reformulation (reparametrisation/restriction) of a well-defined estimated statistical model in view of the estimable model which can be used for description, explanation or/and prediction.


    Looking ahead

As the title of the book exemplifies, its main aim is the statistical foundations of econometric modelling. In relation to Fig. 1.2 the book concentrates mainly on the part within the dotted rectangle. The specification of a statistical model in terms of the variables giving rise to the observed data, as well as the related statistical inference results, will be the subject matter of Parts II and III. In Part IV various statistical models of interest in econometric modelling and the related statistical inference results will be considered in some detail. Special attention will be given to the procedure from the specification of the statistical model to the 'design' of the empirical econometric model. The transactions demand for money example considered above will be used throughout Part IV in an attempt to illustrate the 'dangers' awaiting the unaware in the context of the textbook methodology as well as to compare this with the alternative methodology formalised in the present book.

Parts II and III form an integral part of econometric modelling and should not be viewed as providing a summary of the concepts and definitions to be used in Part IV. A sound background in probability theory and statistical inference is crucial for the implementation of the approach adopted in the present book. This is mainly because the modeller is required to specify the 'appropriate' statistical model taking into consideration the nature of the data in hand as well as the estimable model. This entails making decisions about characteristics of the random variables which gave rise to the observed data chosen, such as normality, independence, stationarity, mixing, before any estimation is even attempted. This is one of the most crucial decisions in the context of econometric modelling because an inappropriate choice of the statistical model renders the related statistical inference conclusions invalid. Hence, the reader is advised to view Parts II and III as an integral part of econometric modelling and not as reference appendices. In Part IV the reader is encouraged to view econometric modelling as a thinking person's activity and not as a sequence of technique recipes. Chapter 2 provides a very brief introduction to the Pearson paradigm in an attempt to motivate the Fisher paradigm which is the subject matter of Parts II and III.


CHAPTER 2

    Descriptive study of data

    2.1 Histograms and their numerical characteristics

By the descriptive study of data we refer to the summarisation and exposition (tabulation, grouping, graphical representation) of observed data as well as the derivation of numerical characteristics such as measures of location, dispersion and shape.

Although the descriptive study of data is an important facet of modelling with real data in itself, in the present study it is mainly used to motivate the need for probability theory and statistical inference proper.

In order to make the discussion more specific let us consider the after-tax personal income data of 23 000 households for 1979-80 in the UK. These data in raw form constitute 23 000 numbers between £1000 and £50 000. This presents us with a formidable task in attempting to understand how income is distributed among the 23 000 households represented in the data. The purpose of descriptive statistics is to help us make some sense of such data. A natural way to proceed is to summarise the data by allocating the numbers into classes (intervals). The number of intervals is chosen a priori and depends on the degree of summarisation needed. In the present case the income data are allocated into 15 intervals, as shown in Table 2.1 below (see National Income and Expenditure (1983)). The first column of the table shows the income intervals, the second column shows the number of incomes falling into each interval and the third column the relative frequency for each interval. The relative frequency is calculated by dividing the number of observations in each interval by the total number of observations. Summarising the data in Table 2.1 enables us to get some idea of how income is distributed among the various classes. If we plot the relative frequencies in a bar graph we get what is known as the histogram,


Fig. 2.1. The histogram and frequency polygon of the personal income data (relative frequency plotted against income).

The pictorial representation of the relative frequencies gives us a more vivid impression of the distribution of income. Looking at the histogram we can see that most households earn less than £4500 and in some sense we can separate them into two larger groups: those earning between £1000 and £4500 and those above £4500. The first impression is that the distribution of income inside these two larger groups appears to be rather similar.
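The tabulation step just described is easy to mechanise. The following Python sketch is purely illustrative: the 23 000 incomes are simulated (the actual National Income and Expenditure figures are not reproduced here), and the intervals are equal-width rather than those of Table 2.1.

```python
import numpy as np

# Simulated stand-in for the 23 000 after-tax incomes (in £'000); a right-skewed
# draw is used purely for illustration, not the National Income and Expenditure data.
rng = np.random.default_rng(0)
incomes = np.clip(rng.lognormal(mean=1.2, sigma=0.6, size=23_000), 1.0, 50.0)

# Allocate the observations into 15 (equal-width) intervals and compute the
# relative frequencies, i.e. the third column of a table like Table 2.1.
counts, edges = np.histogram(incomes, bins=15)
rel_freq = counts / incomes.size

for lo, hi, freq in zip(edges[:-1], edges[1:], rel_freq):
    print(f"£{lo:5.1f}-{hi:5.1f} (thousand): {freq:.3f}")
```

Plotting these relative frequencies as a bar graph would produce a histogram of the kind shown in Fig. 2.1.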

For further information on the distribution of income we could calculate various numerical characteristics describing the histogram's location, dispersion and shape. Such measures can be calculated directly in terms of the raw data. However, in the present case it is more convenient for expositional purposes to use the grouped data. The main reason for this is to introduce various concepts which will be reinterpreted in the context of probability theory in Part II.

The mean as a measure of location takes the form

$$\bar{z} = \sum_{i=1}^{15} \phi_i z_i,$$

where φ_i and z_i refer to the relative frequency and the midpoint of interval i. The mode as a measure of location refers to the value of income that occurs most frequently in the data set. In the present case the mode belongs to the first interval £1.0-1.5. Another measure of location is the median, referring to the value of income in the middle when incomes are arranged in an ascending (or descending) order according to the size of income. The best way to calculate the median is to plot the cumulative frequency graph, which is more convenient for answering questions such as 'How many observations fall below a particular value of income?' (see Fig. 2.2). From the cumulative frequency graph we can see that the median belongs to the interval £3.0-3.5.

Fig. 2.2. The cumulative histogram and ogive of the personal income data (cumulative relative frequency plotted against income).


Comparing the three measures of location we can see that mode < median < mean, confirming the obvious asymmetry of the histogram.
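The cumulative-frequency calculation behind Fig. 2.2 can be sketched as follows; the relative frequencies used here are hypothetical, and the median interval is simply the first one at which the ogive reaches 0.5.

```python
import numpy as np

# Hypothetical interval upper endpoints (£'000) and relative frequencies phi_i.
upper = np.array([1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0, 20.0])
phi = np.array([0.16, 0.13, 0.11, 0.09, 0.08, 0.08, 0.07, 0.06, 0.09, 0.08, 0.05])

cum = np.cumsum(phi)                           # ordinates of the cumulative frequency graph
median_interval = int(np.argmax(cum >= 0.5))   # first interval where it crosses 0.5
print("cumulative relative frequencies:", np.round(cum, 2))
print("median lies in the interval ending at", upper[median_interval])
```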

Another important feature of the histogram is the dispersion of the relative frequencies around a measure of central tendency. The most frequently used measure of dispersion is the variance, defined by

$$s^2 = \sum_{i=1}^{15} (z_i - \bar{z})^2 \phi_i = 4.85,$$

which is a measure of dispersion around the mean; its square root s is known as the standard deviation.

    We can extend the concept of the variance to

$$m_k = \sum_{i=1}^{15} (z_i - \bar{z})^k \phi_i, \qquad k = 3, 4, \ldots,$$

defining what are known as higher central moments. These higher moments can be used to get a better idea of the shape of the histogram. For example, the standardised forms of the third and fourth moments, defined by

$$SK = \frac{m_3}{s^3} \quad \text{and} \quad K = \frac{m_4}{s^4}, \qquad (2.4)$$

known as the skewness and kurtosis coefficients, measure the asymmetry and peakedness of the histogram, respectively. In the case of a symmetric histogram SK = 0, and the more peaked the histogram the greater the value of K. For the income data

    SK = 1.43 and K = 7.33,

which confirms the asymmetry of the histogram (skewed to the right). The above numerical characteristics referring to the location, dispersion and shape were calculated for the data set as a whole. It was argued above, however, that it may be preferable to separate the data into two larger groups and study those separately. Let us consider the groups £1.0-4.5 and £4.5-20.0 separately. The numerical characteristics for the two groups are

$$\bar{z}_1 = 2.5, \quad s_1^2 = 0.996, \quad SK_1 = 0.252,$$

and

$$\bar{z}_2 = 6.18, \quad s_2^2 = 3.814, \quad SK_2 = 2.55, \quad K_2 = 11.93,$$

respectively. Looking at these measures we can see that although the two subsets of the income data seemed qualitatively rather similar, they actually differ substantially. The second group has much bigger dispersion, skewness and kurtosis coefficients.
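The grouped-data formulae above translate directly into a few lines of code. The sketch below uses hypothetical midpoints and relative frequencies (not the entries of Table 2.1) and computes the mean, variance, SK and K exactly as defined above.

```python
import numpy as np

# Hypothetical grouped data: interval midpoints z_i (£'000) and relative
# frequencies phi_i (summing to one); these are not the Table 2.1 entries.
z = np.array([1.25, 1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75,
              5.5, 6.5, 7.5, 9.0, 11.0, 14.0, 18.0])
phi = np.array([0.15, 0.14, 0.12, 0.10, 0.09, 0.08, 0.07, 0.06,
                0.05, 0.04, 0.03, 0.03, 0.02, 0.01, 0.01])
assert abs(phi.sum() - 1.0) < 1e-9

mean = np.sum(phi * z)                    # z-bar = sum_i phi_i z_i
var = np.sum(phi * (z - mean) ** 2)       # s^2
s = np.sqrt(var)
m3 = np.sum(phi * (z - mean) ** 3)        # third central moment
m4 = np.sum(phi * (z - mean) ** 4)        # fourth central moment
SK = m3 / s ** 3                          # skewness coefficient
K = m4 / s ** 4                           # kurtosis coefficient

print(f"mean={mean:.2f}, variance={var:.2f}, SK={SK:.2f}, K={K:.2f}")
```

Running the same calculation on two subsets of the intervals would reproduce the kind of group-by-group comparison made above.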

Returning to the numerical characteristics of the data set as a whole, we can see that these seem to represent an uneasy compromise between the above two subsets. This confirms our first intuitive reaction based on the histogram that it might be more appropriate to study the two larger groups separately.

Another form of graphical representation for time-series data is the time graph {(z_t, t), t = 1, 2, ..., T}. The temporal pattern of an economic time series is important not only in the context of descriptive statistics but also plays an important role in econometric modelling in the context of statistical inference proper; see Part IV.

    2.2 Frequency curves

Although the histogram can be a very useful way to summarise and study observed data, it is not a very convenient descriptor of data. This is because m − 1 parameters φ_1, φ_2, ..., φ_{m−1} (m being the number of intervals) are needed to describe it. Moreover, analytically the histogram is a cumbersome step function of the form

$$\phi(z) = \sum_{i=1}^{m} \frac{\phi_i}{(z_{i+1} - z_i)}\, \mathbf{1}_{[z_i,\, z_{i+1})}(z),$$

where [z_i, z_{i+1}) represents the ith half-closed interval and 1(·) is the indicator function

$$\mathbf{1}_{[z_i,\, z_{i+1})}(z) = \begin{cases} 1 & \text{for } z \in [z_i, z_{i+1}) \\ 0 & \text{for } z \notin [z_i, z_{i+1}). \end{cases}$$

Hence, the histogram is not an ideal descriptor, especially in relation to the modelling facet of observed data.
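For concreteness, the step function φ(z) can be evaluated directly from its definition; the sketch below uses arbitrary interval endpoints and relative frequencies rather than those of the income data.

```python
import numpy as np

def histogram_step(z, edges, phi):
    """Evaluate phi(z) = sum_i [phi_i / (z_{i+1} - z_i)] * 1_[z_i, z_{i+1})(z)."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    for z_i, z_i1, p in zip(edges[:-1], edges[1:], phi):
        inside = (z >= z_i) & (z < z_i1)   # indicator of the half-closed interval
        out[inside] = p / (z_i1 - z_i)     # constant height over that interval
    return out

edges = np.array([1.0, 1.5, 2.0, 3.0, 4.5])   # arbitrary interval endpoints
phi = np.array([0.4, 0.3, 0.2, 0.1])          # relative frequencies (sum to one)
print(histogram_step([1.2, 1.7, 2.5, 4.0, 5.0], edges, phi))
```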

The first step towards a more convenient descriptor of observed data is the so-called frequency polygon, which is a modified histogram. This is obtained by joining up the midpoints of the step function, as shown in Fig. 2.1, to get a continuous function.

An analogous graph for the cumulative frequency graph is known as the ogive (see Fig. 2.2). These two graphs can be interpreted as the histograms obtained by increasing the number of intervals. In summarising the data in the form of a histogram some information is lost. The greater the number of intervals, the smaller the information lost. This suggests that by increasing the number of intervals we might get more realistic descriptors for our data.

Intuition suggests that if we keep on increasing the number of intervals to infinity we should get a much smoother frequency curve. Moreover, with a smooth frequency curve we should be able to describe it in some functional form with fewer than m − 1 parameters. For example, if we were to describe


the two subsets of the data separately we could conceivably be able to express a smoothed version of the frequency polygons in a polynomial form with one or two parameters. This line of reasoning led statisticians in the second part of the nineteenth century to suggest various such families of frequency curves with various shapes for describing observed data.

The Pearson family of frequency curves

In his attempt to derive a general family of frequency curves to describe observed data, Karl Pearson in the late 1890s suggested a family based on the differential equation

$$\frac{\mathrm{d} \log \phi(z)}{\mathrm{d}z} = \frac{z + a}{b_0 + b_1 z + b_2 z^2},$$

which satisfies the condition that the curve touches the z-axis at φ(z) = 0 and has an optimum at z = −a, that is, the curve has one mode. Clearly, the solution of the above equation depends on the roots of the denominator. By imposing different conditions on these roots and choosing different values for a, b_0, b_1 and b_2 we can generate numerous frequency curves of different shapes (bell-shaped, U-shaped, J-shaped, etc.), such as

$$\text{(iii)} \quad \phi(z) = A\, a_0^{A}\, z^{-(A+1)}, \quad z \geq a_0 \quad \text{(J-shaped)}. \qquad (2.10)$$

In the case of the income data above we can see that the J-shaped frequency curve (iii) seems to be our best choice. As can be seen it has only one parameter, A, and it is clearly a much more convenient descriptor (if appropriate) of the income data than the histogram. For a_0 equal to the lowest income value this is known as the Pareto frequency curve. Looking at Fig. 2.1 we can see that for incomes greater than £4.5 the Pareto frequency curve seems a very reasonable descriptor.
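As a hedged illustration of how parsimonious such a descriptor is, the sketch below evaluates a Pareto-type frequency curve of the above form; the value of A used is purely illustrative and is not an estimate fitted to the income data.

```python
import numpy as np

def pareto_curve(z, a0, A):
    """Pareto-type frequency curve: A * a0**A * z**(-(A + 1)) for z >= a0, else 0."""
    z = np.asarray(z, dtype=float)
    return np.where(z >= a0, A * a0 ** A * z ** (-(A + 1)), 0.0)

a0 = 1.0        # lowest income value (£'000), as in the text
A = 1.5         # illustrative shape parameter; not an estimate from the income data
z_grid = np.linspace(1.0, 20.0, 5)
print(np.round(pareto_curve(z_grid, a0, A), 4))
```

Overlaying such a curve on the histogram of Fig. 2.1 gives a visual check of how well a one-parameter descriptor summarises the upper tail of the data.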

An important property of the Pearson family of frequency curves is that the parameters a, b_0, b_1 and b_2 are completely determined from knowledge of the first four moments. This implies that any such frequency curve can be fitted to the data using these moments (see Kendall and Stuart (1969)). At this point, instead of considering how such frequency curves can be fitted to observed data, we are going to leave the story unfinished, to be taken up in Parts III and IV, in order to look ahead to probability theory and statistical inference proper.


2.3 Looking ahead

The most important drawback of descriptive statistics is that the study of the observed data enables us to draw certain conclusions which relate only to the data in hand. The temptation in analysing the above income data is to attempt to make generalisations beyond the data in hand, in particular about the distribution of income in the UK. This, however, is not possible in the descriptive statistics framework. In order to be able to generalise beyond the data in hand we need to 'model' the distribution of income in the UK and not just 'describe' the observed data in hand. Such a general 'model' is provided by probability theory, to be considered in Part II. It turns out that the model provided by probability theory owes a lot to the earlier developed descriptive statistics. In particular, most of the concepts which form the basis of the probability model were motivated by the descriptive statistics concepts considered above. The concepts of measures of location, dispersion and shape, as well as the frequency curve, were transplanted into probability theory with renewed interpretations. The frequency curve, when reinterpreted, becomes a density function purporting to model observable real world phenomena. In particular, the Pearson family of frequency curves can be reinterpreted as a family of density functions. As for the various measures, they will now be reinterpreted in terms of the density function.

Equipped with the probability model to be developed in Part II we can go on to analyse observed data (now interpreted as generated by some assumed probability model) in the context of statistical inference proper, the subject matter of Part III. In such a context we can generalise beyond the observed data in hand. Probability theory and statistical inference will enable us to construct and analyse statistical models of particular interest in econometrics, the subject matter of Part IV.

In Chapter 3 we consider the axiomatic approach to probability which forms the foundation for the discussion in Part II. Chapter 4 introduces the concept of a random variable and related notions, arguably the most widely used concept in the present book. In Chapters 5-10 we develop the mathematical framework in the context of which the probability model can be analysed, as a prelude to Part III.

    Additional references

PART II

    Probability theory

CHAPTER 3

Probability

'Why do we need probability theory in analysing observed data?' In the descriptive study of data considered in the previous chapter it was emphasised that the results cannot be generalised outside the observed data under consideration. Any question relating to the population from which the observed data were drawn cannot be answered within the descriptive statistics framework. In order to be able to do that we need the theoretical framework offered by probability theory. In effect, probability theory develops a mathematical model which provides the logical foundation of statistical inference procedures for analysing observed data.

In developing a mathematical model we must first identify the important features, relations and entities in the real world phenomena and then devise the concepts and choose the assumptions with which to project a generalised description of these phenomena; an idealised picture of these phenomena. The model as a consistent mathematical system has a 'life of its own' and can be analysed and studied without direct reference to real world phenomena. Moreover, by definition a model should not be judged as 'true' or 'false', because we have no means of making such judgments (see Chapter 26). A model can only be judged as a 'good' or 'better' approximation to the 'reality' it purports to explain if it enables us to come to grips with the phenomena in question. That is, whether in studying the model's behaviour the patterns and results revealed can help us identify and understand the real phenomena within the theory's intended scope.

The main aim of the present chapter is to construct a theoretical model for probability theory. In Section 3.1 we consider the notion of probability itself as a prelude to the axiomatisation of the concept in Section 3.2. The probability model developed comes in the form of a probability space (S, ℱ, P(·)). In Section 3.3 this is extended to a conditional probability space.


    3.1 The notion of probability

The theory of probability had its origins in gambling and games of chance in the mid-seventeenth century, and its early history is associated with the names of Huygens, Pascal, Fermat and Bernoulli. This early development of probability was rather sporadic and without any rigorous mathematical foundations. The first attempts at some mathematical rigour, and a more sophisticated analytical apparatus than just combinatorial reasoning, are credited to Laplace, De Moivre, Gauss and Poisson (see Maistrov (1974)). Laplace proposed what is known today as the classical definition of probability:

Definition 1

If a random experiment can result in N mutually exclusive and equally likely outcomes, and if N_A of these outcomes result in the occurrence of the event A, then the probability of A is defined by

$$P(A) = \frac{N_A}{N}.$$

To illustrate the definition let us consider the random experiment of tossing a fair coin twice and observing the face which shows up. The set of all equally likely outcomes is

S = {(HT), (TH), (HH), (TT)}, with N = 4.

Let the event A be 'observing at least one head (H)'; then

A = {(HT), (TH), (HH)}.

Since N_A = 3, P(A) = 3/4.
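The classical calculation for this example can be reproduced by direct enumeration; the following sketch simply lists the four equally likely outcomes and counts those favourable to A.

```python
from itertools import product

# All equally likely outcomes of tossing a fair coin twice.
S = list(product("HT", repeat=2))            # ('H','H'), ('H','T'), ('T','H'), ('T','T')
N = len(S)

# Event A: 'observing at least one head (H)'.
A = [outcome for outcome in S if "H" in outcome]
N_A = len(A)

print(f"P(A) = {N_A}/{N} = {N_A / N}")       # 3/4 = 0.75
```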

Applying the classical definition in the above example is rather straightforward, but in general it can be a tedious exercise in combinatorics (see Feller (1968)). Moreover, there are a number of serious shortcomings to this definition of probability which render it totally inadequate as a foundation for probability theory. The obvious limitations of the classical approach are:
(i) it is applicable to situations where there is only a finite number of possible outcomes; and
(ii) the 'equally likely' condition renders the definition circular.
Some important random experiments, even in gambling games (in response to which the classical approach was developed), give rise to a set of infinite outcomes. For example, the game played by tossing a coin until it turns up heads gives rise to the infinite set of possible outcomes S = {(H), (TH), (TTH), (TTTH), ...}; it is conceivable that somebody could flip a coin indefinitely without ever turning up heads! The idea of 'equally likely' is


synonymous with 'equally probable'; thus probability is defined using the idea of probability! Moreover, the definition is applicable to situations where an apparent 'objective' symmetry exists, which raises not only the question of circularity but also of how this definition can be applied to the case of a biased coin, or to considering the probability that next year's rate of inflation in the UK will be 10%. Where are the 'equally likely' outcomes and which ones result in the occurrence of the event? These objections were well known even by the founders of this approach, and since the 1850s several attempts have been made to resolve the problems related to the 'equally likely' presupposition and extend the area of applicability of probability theory.

The most influential of the approaches suggested in an attempt to tackle the problems posed by the classical approach are the so-called frequency and subjective approaches to probability. The frequency approach had its origins in the writings of Poisson, but it was not until the late 1920s that Von Mises put forward a systematic account of the approach. The basic argument of the frequency approach is that probability does not have to be restricted to situations of apparent symmetry (equally likely), since the notion of probability should be interpreted as stemming from the observable stability of empirical frequencies. For example, in the case of a fair coin we say that the probability of A = {(H)} is ½, not because there are two equally likely outcomes, but because repeated series of large numbers of trials demonstrate that the empirical frequency of occurrence of A 'converges' to the limit ½ as the number of trials goes to infinity. If we denote by n_A the number of occurrences of an event A in n trials, then if

$$\lim_{n \to \infty} \left(\frac{n_A}{n}\right) = P_A,$$

we say that P(A) = P_A. Fig. 3.1 illustrates this notion for the case A = {H} in a typical example of 100 trials. As can be seen, although there are some 'wild fluctuations' of the relative frequency for a small number of trials, as these increase the relative frequency tends to 'settle' (converge) around ½.
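The behaviour illustrated in Fig. 3.1 is easy to mimic by simulation. The sketch below (a fair coin simulated with a fixed seed; the numbers produced are illustrative, not those of the figure) tracks the relative frequency n_A/n as n grows from 1 to 100.

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials = 100
tosses = rng.integers(0, 2, size=n_trials)      # 1 = heads (event A), 0 = tails

# Relative frequency n_A / n after each trial.
rel_freq = np.cumsum(tosses) / np.arange(1, n_trials + 1)

for n in (1, 5, 10, 25, 50, 100):
    print(f"after {n:3d} trials: n_A/n = {rel_freq[n - 1]:.3f}")
```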

Despite the fact that the frequency approach seems to be an improvement over the classical approach, giving objective status to the notion of probability by rendering it a property of real world phenomena, there are some obvious objections to it. 'What is meant by "limit as n goes to infinity"?' 'How can we generate infinite sequences of trials?' 'What happens to phenomena where repeated trials are not possible?'

The subjective approach to probability gives the notion of probability a subjective status by regarding it as 'degrees of belief' on behalf of individuals assessing the uncertainty of a particular situation. The

Fig. 3.1. Observed relative frequency of an experiment with 100 coin tossings.

protagonists of this approach are, inter alia, Ramsey (1926), de Finetti (1937), Savage (1954), Keynes (1921) and Jeffreys (1961); see Barnett (1973) and Leamer (1978) on the differences between the frequency and subjective approaches as well as the differences among the subjectivists.

Recent statistical controversies are mainly due to the attitudes adopted towards the frequency and subjective definitions of probability. Although these controversies are well beyond the material covered in this book, it is advisable to remember that the two approaches lead to alternative methods of statistical inference. The frequentists will conduct the discussion around what happens 'in the long run' or 'on average', and attempt to develop 'objective' procedures which perform well according to these criteria. On the other hand, a subjectivist will be concerned with the question of revising prior beliefs in the light of the available information in the form of the observed data, and thus devise methods and techniques to answer such questions (see Barnett (1973)). Although the question of the meaning of probability was high on the agenda of probabilists from the mid-nineteenth century, this did not get in the way of impressive developments in the subject, in particular the systematic development of mathematical techniques related to what we nowadays call limit theorems (see Chapter 9). These developments were mainly the work of the Russian School (Chebyshev, Markov, Liapounov and Bernstein). By the 1920s there was a wealth of such results and probability began to grow into a systematic body of knowledge. Although various people attempted a systematisation of probability, it was the work of the Russian mathematician Kolmogorov which proved to be the cornerstone for a systematic approach to


probability theory. Kolmogorov managed to relate the concept of probability to that of a measure in integration theory and exploited to the full the analogies between set theory and the theory of functions on the one hand and the concept of a random variable on the other. In a monumental monograph in 1933 he proposed an axiomatisation of probability theory, establishing it once and for all as part of mathematics proper. There is no doubt that this monograph proved to be the watershed for the later development of probability theory, growing enormously in importance and applicability. Probability theory today plays a very important role in many disciplines including physics, chemistry, biology, sociology and economics.

    3.2 The axiomatic approach

The axiomatic approach to probability proceeds from a set of axioms (accepted without questioning as obvious), which are based on many centuries of human experience, and the subsequent development is built deductively using formal logical arguments, like any other part of mathematics such as geometry or linear algebra. In mathematics an axiomatic system is required to be complete, non-redundant and consistent. By complete we mean that the set of axioms postulated should enable us to prove every other theorem in the theory in question using the axioms and mathematical logic. The notion of non-redundancy refers to the impossibility of deriving any axiom of the system from the other axioms. Consistency refers to the non-contradictory nature of the axioms.

A probability model is by construction intended to be a description of a chance mechanism giving rise to observed data. The starting point of such a model is provided by the concept of a random experiment, describing a simplistic and idealised process giving rise to observed data.

Definition 2

A random experiment, denoted by ℰ, is an experiment which satisfies the following conditions:
(a) all possible distinct outcomes are known a priori;
(b) in any particular trial the outcome is not known a priori; and
(c) it can be repeated under identical conditions.

Although at first sight this might seem very unrealistic, even as a model of a chance mechanism, it will be shown in the following chapters that it can be extended to provide the basis for much more realistic probability and statistical models.

The axiomatic approach to probability theory can be viewed as a formalisation of the concept of a random experiment. In an attempt to


formalise condition (a), 'all possible distinct outcomes are known a priori', Kolmogorov devised the set S which includes 'all possible distinct outcomes' and has to be postulated before the experiment is performed.

Definition 3

The sample space, denoted by S, is defined to be the set of all possible outcomes of the experiment ℰ. The elements of S are called elementary events.

Example

Consider the random experiment ℰ of tossing a fair coin twice and observing the faces turning up. The sample space of ℰ is

S = {(HT), (TH), (HH), (TT)},

with (HT), (TH), (HH), (TT) being the elementary events belonging to S.

The second ingredient of ℰ to be formulated relates to (b) and in particular to the various forms events can take. A moment's reflection suggests that there is no particular reason why we should be interested in elementary outcomes only. For example, in the coin experiment we might be interested in such events as A_1: 'at least one H', A_2: 'at most one H', and these are not elementary events; in particular

A_1 = {(HT), (TH), (HH)} and A_2 = {(HT), (TH), (TT)}

are combinations of elementary events. All such outcomes are called events associated with the sample space S and they are defined by 'combining' elementary events. Understanding the concept of an event is crucial for the discussion which follows. Intuitively, an event is any proposition associated with ℰ which may occur or not at each trial. We say that event A_1 occurs when any one of the elementary events it comprises occurs. Thus, when a trial is made only one elementary event is observed but a large number of events may have occurred. For example, if the elementary event (HT) occurs in a particular trial, A_1 and A_2 have occurred as well.

Given that S is a set whose members are the elementary events, this takes us

immediately into the realm of set theory, and events can be formally defined to be subsets of S formed by set theoretic operations ('∪' union, '∩' intersection, '−' complementation) on the elementary events (see Binmore (1980)). Two special events are S itself, called the sure event, and the impossible event ∅, defined to contain no elements of S, i.e. ∅ = { }; the latter is defined for completeness.
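These set theoretic operations on events can be checked mechanically; in the sketch below Python's built-in set type stands in for subsets of S.

```python
S = {"HT", "TH", "HH", "TT"}     # sample space of the two-toss experiment
A1 = {"HT", "TH", "HH"}          # 'at least one H'
A2 = {"HT", "TH", "TT"}          # 'at most one H'

print(A1 | A2)                   # union: equals S
print(A1 & A2)                   # intersection: {'HT', 'TH'}, i.e. 'exactly one H'
print(S - A1)                    # complement of A1: {'TT'}, i.e. 'no heads'
```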

A third ingredient of ℰ associated with (b) which Kolmogorov had to formalise was the idea of uncertainty related to the outcome of any particular trial of ℰ. This he formalised in the notion of probabilities attributed to the various events associated with ℰ, such as P(A_1), P(A_2), expressing the 'likelihood' of occurrence of these events. Although attributing probabilities to the elementary events presents no particular mathematical problems, doing the same for events in general is not as straightforward. The difficulty arises because if A_1 and A_2 are events, Ā_1 = S − A_1, Ā_2 = S − A_2, A_1 ∪ A_2, A_1 ∩ A_2, A_1 − A_2, etc., are also events, because the occurrence or non-occurrence of A_1 and A_2 implies the occurrence or not of these events. This implies that for the attribution of probabilities to make sense we have to impose some mathematical structure on the set of all events, say ℱ, which reflects the fact that whichever way we combine these events, the end result is always an event. The temptation at this stage is to define ℱ to be the set of all subsets of S, called the power set; surely this covers all possibilities! In the above example the power set of S takes the form

ℱ = {S, ∅, {(HT)}, {(TH)}, {(HH)}, {(TT)}, {(TH),(HT)}, {(TH),(HH)}, {(TH),(TT)}, {(HT),(HH)}, {(HT),(TT)}, {(HH),(TT)}, {(HT),(TH),(HH)}, {(HT),(TH),(TT)}, {(HH),(TT),(TH)}, {(HH),(TT),(HT)}}.

It can be easily checked that whichever way we combine any events in ℱ we end up with events in ℱ. For example,

{(HH),(TT)} ∩ {(TH),(HT)} = ∅ ∈ ℱ;  {(HH),(TH)} ∪ {(TH),(HT)} = {(HH),(TH),(HT)} ∈ ℱ;  etc.

It turns out that in most cases, where the power set does not lead to any inconsistencies in attributing probabilities, we define the set of events ℱ to be the power set of S. But when S is infinite or uncountable (it has as many


elements as there are real numbers), or we are interested in some but not all possible events, inconsistencies can arise. For example, suppose the events A_1, A_2, ... are such that A_i ∩ A_j = ∅ (i ≠ j), i, j = 1, 2, ..., ⋃_{i=1}^∞ A_i = S and P(A_i) = α > 0 for all i, where P(A_i) refers to the probability assigned to the event A_i. Then P(S) = Σ_{i=1}^∞ P(A_i) = Σ_{i=1}^∞ α > 1 (see below), which is an absurd probability, being greater than one; similar inconsistencies arise when S is uncountable. Apart from these inconsistencies, sometimes we are not interested in all the subsets of S. Hence, we need to define ℱ independently of the power set by endowing it with a mathematical structure which ensures that no inconsistencies arise. This is achieved by requiring that ℱ has a special mathematical structure: it is a σ-field related to S.

Definition 4

Let ℱ be a set of subsets of S. ℱ is called a σ-field if:
(i) if A ∈ ℱ, then Ā ∈ ℱ (closure under complementation); and
(ii) if A_i ∈ ℱ, i = 1, 2, ..., then (⋃_{i=1}^∞ A_i) ∈ ℱ (closure under countable union).

Note that (i) and (ii) taken together imply the following:
(iii) S ∈ ℱ, because A ∪ Ā = S;
(iv) ∅ ∈ ℱ (from (iii), S̄ = ∅ ∈ ℱ); and
(v) if A_i ∈ ℱ, i = 1, 2, ..., then (⋂_{i=1}^∞ A_i) ∈ ℱ.

These suggest that a σ-field is a set of subsets of S which is closed under complementation and under countable unions and intersections. That is, any of these operations on the elements of ℱ will give rise to an element of ℱ. It can be checked that the power set of S is indeed a σ-field, and so is the set

ℱ_1 = {{(HT)}, {(HH),(TH),(TT)}, ∅, S},

but the set C = {{(HT),(TH)}} is not, because ∅ ∉ C, S ∉ C and the complement {(HH),(TT)} ∉ C. What we can do, however, in the latter case is to start from C and construct the minimal σ-field generated by its elements. This can be achieved by extending C to include all the events generated by set theoretic operations (unions, intersections, complementations) on the elements of C. Then the minimal σ-field generated by C is

ℱ_C = {S, ∅, {(HT),(TH)}, {(HH),(TT)}},

and we denote it by ℱ_C = σ(C).
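For a finite S the construction of σ(C) can be carried out mechanically by closing C under complements and unions until nothing new appears. The sketch below is an illustrative helper (the function name and the representation of the outcomes are hypothetical), applied to C = {{(HT),(TH)}}.

```python
from itertools import combinations

def generate_sigma_field(S, C):
    """Close a collection C of subsets of the finite set S under complementation
    and (pairwise, hence finite) unions; for a finite S this yields sigma(C)."""
    events = {frozenset(), frozenset(S)} | {frozenset(c) for c in C}
    while True:
        new = set(events)
        new |= {frozenset(S) - e for e in events}               # complements
        new |= {a | b for a, b in combinations(events, 2)}      # unions
        if new == events:                                       # nothing new: closed
            return events
        events = new

S = {"HT", "TH", "HH", "TT"}
C = [{"HT", "TH"}]                     # the collection C of the text
for event in sorted(generate_sigma_field(S, C), key=lambda e: (len(e), sorted(e))):
    print(set(event) if event else "empty set")
```

The four events printed are exactly S, the impossible event, {(HT),(TH)} and its complement, i.e. σ(C) as given above.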

This way of constructing a σ-field can be very useful in cases where the events of interest are fewer than the ones given by the power set in the case of a finite S. For example, if we are interested in events with one of each H or T there is no point in defining the σ-field to be the power set, and ℱ_C can do as well, with fewer events to attribute probabilities to. The usefulness of this method of constructing σ-fields is much greater in the case where S is either infinite or uncountable; in such cases this method is indispensable. Let us


consider an example where S is uncountable and discuss the construction of such a σ-field.

Example

Let S be the real line ℝ = {x: −∞ < x < ∞} and the set of events of interest be

J = {B_x: x ∈ ℝ}, where B_x = {z: z ≤ x} = (−∞, x].

This is an educated choice, which will prove to be very useful in the sequel.

How can we construct a σ-field on ℝ? The definition of a σ-field suggests that if we start from the events B_x, x ∈ ℝ, and extend this set to include the complements of the B_x and their countable unions, we should be able to define a σ-field on ℝ, σ(J), the minimal σ-field generated by the events B_x, x ∈ ℝ. By definition B_x ∈ σ(J). If we take complements of B_x: B̄_x = {z: z ∈ ℝ, z > x} = (x, ∞) ∈ σ(J). Taking countable unions of the B_x: ⋃_{j=1}^∞ (−∞, x − (1/j)] = (−∞, x) ∈ σ(J). These imply that σ(J) is indeed a σ-field. In order to see how large a collection σ(J) is, we can show that events of the form (x, ∞), [x, ∞), (x, z] for x < z, and {x} also belong to σ(J), using set theoretic operations as follows:

(x, ∞) = ℝ − (−∞, x] ∈ σ(J);
[x, ∞) = ℝ − (−∞, x) ∈ σ(J);
(x, z] = (−∞, z] ∩ (x, ∞) ∈ σ(J), for x < z;
{x} = [x, ∞) ∩ (−∞, x] ∈ σ(J).


Definition 5

Probability is defined as a set function P(·) on ℱ satisfying the following axioms:

Axiom 1: P(A) ≥ 0 for every A ∈ ℱ;
Axiom 2: P(S) = 1; and
Axiom 3: P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) for any sequence {A_i, i = 1, 2, ...} of mutually exclusive events in ℱ (that is, A_i ∩ A_j = ∅ for i ≠ j); this is called countable additivity.

In other words, probability is defined to be a set function with ℱ as its domain and the closed real line interval [0, 1] as its range, so that P(·): ℱ → [0, 1]. The first two axioms seem rather self-evident and are satisfied by both the classical and the frequency definitions of probability. Hence, in some sense, the axiomatic definition of probability 'overcomes' the deficiencies of the other definitions by making the interpretation of probability dispensable for the mathematical model to be built. The third axiom is less obvious, stating that the probability of the union of unrelated (mutually exclusive) events must be equal to the sum of their separate probabilities. For example, since {(HT)} ∩ {(HH)} = ∅,

P({(HT)} ∪ {(HH)}) = P({(HT)}) + P({(HH)}) = ¼ + ¼ = ½.

Again this coincides with the 'frequency interpretation' result. To summarise the argument so far, Kolmogorov formalised the conditions (a) and (b) of the random experiment ℰ in the form of the trinity (S, ℱ, P(·)) comprising the set of all outcomes S (the sample space), a σ-field ℱ of events related to S, and a probability function P(·) assigning probabilities to events in ℱ. For the coin example,
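A purely illustrative sketch of such a trinity for the coin example, with equal weights attached to the four elementary outcomes and the power set taken as ℱ, together with a numerical check of the three axioms:

```python
from itertools import combinations

S = frozenset({"HT", "TH", "HH", "TT"})

def power_set(s):
    """All subsets of s: the sigma-field F used here for the coin example."""
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

F = power_set(S)

def P(event):
    """Probability set function: each elementary outcome carries weight 1/4."""
    return len(event) / len(S)

# Axiom 1 (non-negativity) and Axiom 2 (P(S) = 1).
assert all(P(A) >= 0 for A in F)
assert P(S) == 1.0

# Axiom 3, finite version: additivity over mutually exclusive events.
for A in F:
    for B in F:
        if not (A & B):                                  # A and B are disjoint
            assert abs(P(A | B) - (P(A) + P(B))) < 1e-12

print("P({HT} u {HH}) =", P(frozenset({"HT", "HH"})))    # 0.5, as in the text
```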