Jacod - European Summer School - Statistics and High Frequency Data 2009


    Statistics and high frequency data

    Jean Jacod

    1 Introduction

This short course is devoted to a few statistical problems related to the observation of a given process on a fixed time interval, when the observations occur at regularly spaced discrete times. This kind of observation may occur in many different contexts, but it is particularly relevant in finance: we do have now huge amounts of data on the prices of various assets, exchange rates, and so on, typically tick data which are recorded at every transaction time. So we are mainly concerned with the problems which arise in this context, and the concrete applications we will give all pertain to finance.

In some sense these are not standard statistical problems, for which we want to estimate some unknown parameter. We are rather concerned with the estimation of some random quantities. This means that we would like to have procedures that are as model-free as possible, and also that they are in some sense more akin to nonparametric statistics.

Let us describe the general setting in some more detail. We have an underlying

process $X=(X_t)_{t\ge0}$, which may be multi-dimensional (its components are then denoted by $X^1, X^2, \ldots$). This process is defined on some probability space $(\Omega,\mathcal F,\mathbb P)$. We observe this process at discrete times, equally spaced, over some fixed finite interval $[0,T]$, and we are concerned with asymptotic properties as the time lag, denoted by $\Delta_n$, goes to $0$. In practice, this means that we are in the context of high frequency data.

The objects of interest are various quantities related to the particular outcome which is (partially) observed. The main object is the volatility, but other quantities or features are also of much interest for modeling purposes, for example whether the observed path has jumps and, when this is the case, whether several components may jump at the same times or not.

All these quantities are related in some way to the probabilistic model which is assumed for $X$: we do indeed need some model assumption, otherwise nothing can be said. In fact, any given set of observed values $X_0, X_{\Delta_n}, \ldots, X_{i\Delta_n}, \ldots$, with $\Delta_n$ fixed, is of course compatible with many different models for the continuous-time process $X$: for example we can suppose that $X$ is piecewise constant between the observation times, or that it is piecewise linear between these times. Of course neither one of these two models is in general compatible with the observations if we modify the frequency of the observations.

Institut de mathématiques de Jussieu, Université Pierre et Marie Curie (Paris-6) and CNRS, UMR 7586, 4 place Jussieu, 75252 Paris, France

So in the sequel we will always assume that $X$ is an Itô semimartingale, that is a semimartingale whose characteristics are absolutely continuous with respect to Lebesgue measure. This is compatible with virtually all semimartingale models used for modeling quantities like asset prices or log-prices, although it rules out some non-semimartingale models sometimes used in this context, like the fractional Brownian motion.

Before stating more precisely the questions which we will consider, and in order to be able to formulate them in precise terms, we recall the structure of Itô semimartingales. We refer to [13], Chapter I, for more details.

Semimartingales: We start with a basic filtered probability space $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P)$, the family of sub-$\sigma$-fields $(\mathcal F_t)$ of $\mathcal F$ being increasing and right-continuous in $t$. A semimartingale is simply the sum of a local martingale on this space, plus an adapted process of finite variation (meaning, its paths are right-continuous, with finite variation on any finite interval). In the multidimensional case it means that each component is a real-valued semimartingale.

Any multidimensional semimartingale can be written as

$$X_t = X_0 + B_t + X^c_t + \int_0^t\!\int_{\mathbb R^d} \kappa(x)\,(\mu-\nu)(\mathrm ds,\mathrm dx) + \int_0^t\!\int_{\mathbb R^d} \kappa'(x)\,\mu(\mathrm ds,\mathrm dx). \qquad (1.1)$$

    In this formula we use the following notation:

- $\mu$ is the jump measure of $X$: if we denote by $\Delta X_t = X_t - X_{t-}$ the size of the jump of $X$ at time $t$ (recall that $X$ is right-continuous with left limits), then the set $\{t : \Delta X_t(\omega)\ne0\}$ is at most countable for each $\omega$, and $\mu$ is the random measure on $(0,\infty)\times\mathbb R^d$ defined by

$$\mu(\omega;\mathrm dt,\mathrm dx) = \sum_{s>0:\,\Delta X_s(\omega)\ne0} \varepsilon_{(s,\Delta X_s(\omega))}(\mathrm dt,\mathrm dx), \qquad \varepsilon_a = \text{the Dirac measure sitting at } a.$$

- $\nu$ is the compensator (or, predictable compensator) of $\mu$. This is the unique random measure on $(0,\infty)\times\mathbb R^d$ such that, for any Borel subset $A$ of $\mathbb R^d$ at a positive distance of $0$, the process $\nu((0,t]\times A)$ is predictable and the difference $\mu((0,t]\times A) - \nu((0,t]\times A)$ is a local martingale.

- $\kappa$ is a truncation function, that is a function $\kappa:\mathbb R^d\to\mathbb R^d$, bounded with compact support, such that $\kappa(x)=x$ for all $x$ in a neighborhood of $0$. This function is fixed throughout, and we choose it to be continuous for convenience.

- $\kappa'$ is the function $\kappa'(x) = x - \kappa(x)$.

- $B$ is a predictable process of finite variation, with $B_0=0$.

- $X^c$ is a continuous local martingale with $X^c_0=0$, called the continuous martingale part of $X$.

With this notation, the decomposition (1.1) is unique (up to null sets), but the process $B$ depends on the choice of the truncation function $\kappa$. The continuous martingale part does not depend on the choice of $\kappa$. Note that the first integral in (1.1) is a stochastic integral (in general), whereas the second one is a pathwise integral (in fact, for any $t$ it is simply the finite sum $\sum_{s\le t}\kappa'(\Delta X_s)$). Of course (1.1) should be read componentwise in the multidimensional setting.

In the sequel we use the shorthand notation $*$ to denote the (possibly stochastic) integral w.r.t. a random measure, and also for the (possibly stochastic) integral of a process w.r.t. a semimartingale. For example, (1.1) may be written more shortly as

$$X = X_0 + B + X^c + \kappa*(\mu-\nu) + \kappa'*\mu. \qquad (1.2)$$

The $*$ symbol will also be used, as a superscript, to denote the transpose of a vector or matrix (no confusion may arise).

Another process is of great interest, namely the quadratic variation of the continuous martingale part $X^c$, which is the following $\mathbb R^d\otimes\mathbb R^d$-valued process:

$$C = \langle X^c, X^c\rangle, \quad\text{that is, componentwise,}\quad C^{ij} = \langle X^{i,c}, X^{j,c}\rangle. \qquad (1.3)$$

This is a continuous adapted process with $C_0=0$, which further is increasing in the set $\mathcal M^+_d$ of symmetric nonnegative matrices, that is $C_t - C_s$ belongs to $\mathcal M^+_d$ for all $t>s$.

The triple $(B,C,\nu)$ is called the triple of characteristics of $X$, this name coming from the fact that in good cases it completely determines the law of $X$.

The fundamental example of semimartingales is the case of Lévy processes. We say that $X$ is a Lévy process if it is adapted to the filtration, with right-continuous and left-limited paths and $X_0=0$, and such that $X_{t+s}-X_t$ is independent of $\mathcal F_t$ and has the same law as $X_s$ for all $s,t\ge0$. Such a process is always a semimartingale, and its characteristics $(B,C,\nu)$ are of the form

$$B_t(\omega) = bt, \qquad C_t(\omega) = ct, \qquad \nu(\omega;\mathrm dt,\mathrm dx) = \mathrm dt\,F(\mathrm dx). \qquad (1.4)$$

Here $b\in\mathbb R^d$ and $c\in\mathcal M^+_d$ and $F$ is a measure on $\mathbb R^d$ which does not charge $\{0\}$ and integrates the function $x\mapsto |x|^2\wedge 1$. The triple $(b,c,F)$ is connected with the law of the variables $X_t$ by the formula (for all $u\in\mathbb R^d$)

$$\mathbb E\big(\mathrm e^{\mathrm i\langle u,X_t\rangle}\big) = \exp\, t\Big(\mathrm i\langle u,b\rangle - \tfrac12\langle u,cu\rangle + \int F(\mathrm dx)\,\big(\mathrm e^{\mathrm i\langle u,x\rangle} - 1 - \mathrm i\langle u,\kappa(x)\rangle\big)\Big), \qquad (1.5)$$

called the Lévy–Khintchine formula. So we sometimes call $(b,c,F)$ the characteristics of $X$ as well, and it is the Lévy–Khintchine characteristics of the law of $X_1$ in the context of infinitely divisible distributions. $b$ is called the drift, $c$ is the covariance matrix of the Gaussian part, and $F$ is called the Lévy measure.

As seen above, for a Lévy process the characteristics $(B,C,\nu)$ are deterministic, and they do characterize the law of the process. Conversely, if the characteristics of a semimartingale $X$ are deterministic one can show that $X$ has independent increments, and if they are of the form (1.4) then $X$ is a Lévy process.

Itô semimartingales. By definition, an Itô semimartingale is a semimartingale whose characteristics $(B,C,\nu)$ are absolutely continuous with respect to Lebesgue measure, in the following sense:

$$B_t(\omega) = \int_0^t b_s(\omega)\,\mathrm ds, \qquad C_t(\omega) = \int_0^t c_s(\omega)\,\mathrm ds, \qquad \nu(\omega;\mathrm dt,\mathrm dx) = \mathrm dt\,F_{\omega,t}(\mathrm dx). \qquad (1.6)$$

Here we can always choose a version of the processes $b$ or $c$ which is optional, or even predictable, and likewise choose $F$ in such a way that $F_t(A)$ is optional, or even predictable, for all Borel subsets $A$ of $\mathbb R^d$.

It turns out that Itô semimartingales have a nice representation in terms of a Wiener process and a Poisson random measure, and this representation will be very useful for us. Namely, $X$ can be written as follows (where for example $\kappa(\delta)*(\mu-\nu)_t$ denotes the value at time $t$ of the integral process $\kappa(\delta)*(\mu-\nu)$):

$$X_t = X_0 + \int_0^t b_s\,\mathrm ds + \int_0^t \sigma_s\,\mathrm dW_s + \kappa(\delta)*(\mu-\nu)_t + \kappa'(\delta)*\mu_t. \qquad (1.7)$$

In this formula $W$ is a standard $d'$-dimensional Wiener process and $\mu$ is a Poisson random measure on $(0,\infty)\times E$ with intensity measure $\nu(\mathrm dt,\mathrm dx) = \mathrm dt\,\lambda(\mathrm dx)$, where $\lambda$ is a $\sigma$-finite and infinite measure without atom on an auxiliary measurable set $(E,\mathcal E)$.

Of course the process $b_t$ is the same in (1.6) and in (1.7), and $\sigma = (\sigma^{ij})_{1\le i\le d,\,1\le j\le d'}$ is an $\mathbb R^d\otimes\mathbb R^{d'}$-valued optional (or predictable, as one wishes) process such that $c = \sigma\sigma^*$, and $\delta = \delta(\omega,t,x)$ is a predictable function on $\Omega\times[0,\infty)\times E$ (that is, measurable with respect to $\mathcal P\otimes\mathcal E$, where $\mathcal P$ is the predictable $\sigma$-field of $\Omega\times[0,\infty)$). The connection between $\delta$ above and $F$ in (1.6) is that $F_{\omega,t}$ is the image of the measure $\lambda$ by the map $x\mapsto \delta(\omega,t,x)$, restricted to $\mathbb R^d\setminus\{0\}$.

Remark 1.1 One should be a bit more precise in characterizing $W$ and $\mu$: $W$ is an $(\mathcal F_t)$-Wiener process, meaning it is $(\mathcal F_t)$-adapted and $W_{t+s}-W_t$ is independent of $\mathcal F_t$ (on top of being a Wiener process, of course). Likewise, $\mu$ is an $(\mathcal F_t)$-Poisson measure, meaning that $\mu((0,t]\times A)$ is $\mathcal F_t$-measurable and $\mu((t,t+s]\times A)$ is independent of $\mathcal F_t$, for all $A\in\mathcal E$.

Remark 1.2 The original space $(\Omega,\mathcal F,\mathbb P)$ on which $X$ is defined may be too small to accommodate a Wiener process and a Poisson measure, so we may have to enlarge the space. Such an enlargement is always possible.

Remark 1.3 When the matrix $c_t(\omega)$ is of full rank for all $(\omega,t)$ and $d'=d$, then it has a unique square-root $\sigma_t(\omega)$, which further is invertible. In this case we have $W = \sigma^{-1}\cdot X^c$. Otherwise, there are many ways of choosing $\sigma$ such that $\sigma\sigma^* = c$, hence many ways of choosing $W$ and its dimension $d'$ (which can always be taken such that $d'\le d$).

In a similar way, we have a lot of freedom for the choice of $\delta$. In particular we can choose at will the space $(E,\mathcal E)$ and the measure $\lambda$, subject to the above conditions, and for example we can always take $E=\mathbb R$ with $\lambda$ the Lebesgue measure, although in the $d$-dimensional case it is somewhat more intuitive to take $E=\mathbb R^d$.

Of course a Lévy process is an Itô semimartingale (compare (1.4) and (1.6)). In this case the two representations (1.2) and (1.7) coincide if we take $E=\mathbb R^d$ and $\lambda = F$ (the Lévy measure) and $\mu$ equal to the jump measure of $X$ (the jump measure of a Lévy process is indeed a Poisson measure) and $\delta(\omega,t,x) = x$, and also if we recall that in this case the continuous martingale (or Gaussian) part of $X$ is always of the form $X^c = \sigma W$, with $\sigma\sigma^* = c$.

The setting of Itô semimartingales encompasses most processes used for modeling purposes, at least in mathematical finance. For example, solutions of stochastic differential equations driven by a Wiener process, or by a Lévy process, or by a Wiener process plus a Poisson random measure, are all Itô semimartingales. Such solutions are obtained directly in the form (1.7), which of course implies that $X$ is an Itô semimartingale.

The volatility. In a financial context, the process $c_t$ is called the volatility (sometimes it is $\sigma_t$ which is thus called). This is by far the most important quantity which needs to be estimated, and there are many ways to do so. A very widespread way of doing so consists in using the so-called implied volatility: it is performed by using the observed current prices of options drawn on the stock under consideration, by somehow inverting the Black–Scholes equation or extensions of it.

However, this way usually assumes a given type of model, for example that the stock price is a diffusion process of a certain type, with unknown coefficients. Among the coefficients there is the volatility, which further may be "stochastic", meaning that it depends on some random inputs other than the Wiener process which drives the price itself. But then it is of primary importance to have a sound model, and this can be checked only by statistical means. That is, we have to make a statistical analysis, based on series of (necessarily discrete) observations of the prices.

In other words, there is a large body of work, essentially in the econometrics literature, about the (statistical) estimation of the volatility. This means finding good methods for estimating the path $t\mapsto c_t(\omega)$ for $t\in[0,T]$, on the basis of the observation of $X_{i\Delta_n}(\omega)$ for all $i=0,1,\ldots,[T/\Delta_n]$.

In a sense this is very similar to the non-parametric estimation of a function $c(t)$, say in the 1-dimensional case, when one observes the Gaussian process

$$Y_t = \int_0^t \sqrt{c(s)}\,\mathrm dW_s$$

(here $W$ is a standard 1-dimensional Wiener process) at the times $i\Delta_n$, and when $\Delta_n$ is small (that is, we consider the asymptotics $\Delta_n\to0$). As is well known, this is possible only under some regularity assumptions on the function $c(t)$, whereas the integrated value $\int_0^t c(s)\,\mathrm ds$ can be estimated as in parametric statistics, since it is just a number. On the other hand, if we know $\int_0^t c(s)\,\mathrm ds$ for all $t$, then we also know the function $c(t)$, up to a Lebesgue-null set, of course: it should be emphasized that if we modify $c$ on such a null set, we do not change the process $Y$ itself; the same comment applies to the volatility process $c_t$ in (1.6).

This is why we mainly consider, as in most of the literature, the problem of estimating the integrated volatility, which with our notation is the process $C_t$. One has to be aware of the fact that in the case of a general Itô semimartingale, this means estimating the random number or matrix $C_t(\omega)$, for the observed $\omega$, although of course $\omega$ is indeed not fully observed.


Let us consider for simplicity the 1-dimensional case, when further $X$ is continuous, that is

$$X_t = X_0 + \int_0^t b_s\,\mathrm ds + \int_0^t \sigma_s\,\mathrm dW_s, \qquad (1.8)$$

and $\sigma_t$ (equivalently, $c_t=\sigma_t^2$) is random. It may be of the form $\sigma_t(\omega) = \sigma(X_t(\omega))$, it can also be by itself the solution of another stochastic differential equation, driven by $W$ and perhaps another Wiener process $W'$, and perhaps also some Poisson measures if it has jumps (even though $X$ itself does not jump).

By far, the simplest thing to do is to consider the realized integrated volatility, or approximate quadratic variation, that is the process

$$B(2,\Delta_n)_t = \sum_{i=1}^{[t/\Delta_n]} |\Delta^n_i X|^2, \qquad\text{where } \Delta^n_i X = X_{i\Delta_n} - X_{(i-1)\Delta_n}. \qquad (1.9)$$

Then if (1.8) holds, by well-known results on the quadratic variation (going back to Itô in this case), we know that

$$B(2,\Delta_n)_t \;\stackrel{\mathbb P}{\longrightarrow}\; C_t \qquad (1.10)$$

(convergence in probability), and this convergence is even uniform in $t$ over finite intervals. Further, as we will see later, we have a rate of convergence (namely $1/\sqrt{\Delta_n}$) under some appropriate assumptions.
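As a sanity check of (1.9)–(1.10), here is a minimal simulation sketch: we discretize a model of the form (1.8) with a deterministic, time-varying volatility path (an illustrative stand-in for a stochastic one; the drift value and the volatility curve below are arbitrary choices, not from the text), and compare the realized variance with a Riemann approximation of $C_T$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate (1.8) on a fine grid: dX = b dt + sigma_t dW.
T, n = 1.0, 200_000                     # horizon and number of steps (Delta_n = T/n)
dt = T / n
t = np.linspace(0.0, T, n + 1)
b = 0.05                                # constant drift (irrelevant in the limit)
sigma = 0.2 + 0.1 * np.sin(2 * np.pi * t[:-1])   # spot volatility path
dW = rng.normal(0.0, np.sqrt(dt), n)
X = np.concatenate(([0.0], np.cumsum(b * dt + sigma * dW)))

# Realized variance (1.9): sum of squared increments.
realized_var = np.sum(np.diff(X) ** 2)

# Integrated volatility C_T = int_0^T sigma_s^2 ds (Riemann sum).
C_T = np.sum(sigma ** 2) * dt

print(realized_var, C_T)                # the two numbers should be close
```

The drift contributes only at order $\Delta_n$, which is why it does not affect the limit.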

Now what happens when $X$ is discontinuous ? We no longer have (1.10), but rather

$$B(2,\Delta_n)_t \;\stackrel{\mathbb P}{\longrightarrow}\; C_t + \sum_{s\le t} |\Delta X_s|^2 \qquad (1.11)$$

(the right side above is always finite, and is the quadratic variation of the semimartingale $X$, also denoted $[X,X]_t$). Nevertheless we do want to estimate $C_t$: a good part of these notes is devoted to this problem. For example, we will show that both quantities

$$B(1,1,\Delta_n)_t = \sum_{i=1}^{[t/\Delta_n]} |\Delta^n_i X|\,|\Delta^n_{i+1} X|, \qquad B(2,\varpi,\alpha)_t = \sum_{i=1}^{[t/\Delta_n]} |\Delta^n_i X|^2\, 1_{\{|\Delta^n_i X|\le \alpha\Delta_n^\varpi\}} \qquad (1.12)$$

converge in probability to $\frac{2}{\pi}\,C_t$ and $C_t$ respectively, as soon as $\varpi\in(0,1/2)$ and $\alpha>0$ for the second one.
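The two estimators in (1.12) can be illustrated numerically. In this sketch the jump count, the jump law, and the truncation constants $\alpha$ and $\varpi$ are arbitrary illustrative choices; the bipower sum is rescaled by $\pi/2$ to undo the $2/\pi$ factor mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)

T, n = 1.0, 100_000
dt = T / n
sigma = 0.2                                   # constant volatility: C_T = sigma^2 * T
dX = sigma * rng.normal(0.0, np.sqrt(dt), n)  # Brownian increments

# Add a few large jumps (compound Poisson part).
jump_idx = rng.choice(n, size=5, replace=False)
dX[jump_idx] += rng.normal(0.0, 0.5, 5)

C_T = sigma ** 2 * T

# Plain realized variance picks up the squared jumps, cf. (1.11).
rv = np.sum(dX ** 2)

# Bipower variation: sum |increment_i| * |increment_{i+1}|.
bipower = np.sum(np.abs(dX[:-1]) * np.abs(dX[1:]))

# Truncated realized variance with threshold alpha * Delta_n^varpi, varpi in (0, 1/2).
alpha, varpi = 3 * sigma, 0.49
trunc_rv = np.sum(np.where(np.abs(dX) <= alpha * dt ** varpi, dX ** 2, 0.0))

print(rv, (np.pi / 2) * bipower, trunc_rv)    # rv overshoots C_T; the other two are close
```

The bipower sum is robust to jumps because a jump affects only the two products containing it, each multiplied by a small Brownian increment.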

Inference for jumps. Now, when $X$ is discontinuous, there is also a lot of interest about jumps and, to begin with: are the observations compatible with a model without jumps, or should we use a model with jumps ? More complex questions may be posed: for a 2-dimensional process, do the jumps occur at the same times for the two components or not ? Are there infinitely many (small) jumps ? In this case, what is the "concentration" of the jumps near 0 ?

Here again, the analysis is based on the asymptotic behavior of quantities involving sums of functions of the increments $\Delta^n_i X$ of the observed process. So, before going to the main results in a general situation, we consider first two very simple cases: when $X=\sigma W$ for a constant $\sigma>0$, and when $X=\sigma W+Y$ where $Y$ is a compound Poisson process. It is also of primary importance to determine which quantities can be consistently estimated when $\Delta_n\to0$, and which ones cannot be. We begin with the latter question.

    2 What can be estimated ?

Recall that our underlying process $X$ is observed at discrete times $0, \Delta_n, 2\Delta_n, \ldots$, up to some fixed time $T$. Obviously, we cannot have consistent estimators, as $\Delta_n\to0$, for quantities which cannot be retrieved when we observe the whole path $t\mapsto X_t(\omega)$ for $t\in[0,T]$, a situation referred to below as the complete observation scheme.

We begin with some simple observations:

1) The drift $b_t$ can never be identified in the complete observation scheme, except in some very special cases, like when $X_t = X_0 + \int_0^t b_s\,\mathrm ds$.

2) The quadratic variation of the process is fully known in the complete observation scheme, up to time $T$ of course. This implies in particular that the integrated volatility $C_t$ is known for all $t\le T$, hence also the process $c_t$ (this is of course up to a $\mathbb P$-null set for $C_t$, and a $\mathbb P(\mathrm d\omega)\otimes\mathrm dt$-null set for $c_t(\omega)$).

    3) The jumps are fully known in the complete observation scheme, up to time T again.

Now, the jumps are not so interesting by themselves. More important is the law of the jumps, in some sense. For Lévy processes the law of jumps is in fact determined by the Lévy measure. In a similar way, for a semimartingale the law of jumps can be considered as known if we know the measures $F_{t,\omega}$, since these measures specify the jump coefficient $\delta$ in (1.7). (Warning: this specification is in a weak sense, exactly as $c$ specifies $\sigma$; we may have several square-roots of $c$, as well as several $\delta$ such that $F_t$ is the image of $\lambda$, but all choices of $\sigma_t$ and $\delta$ which are compatible with a given $c_t$ and $F_t$ give rise to equations that have exactly the same weak solutions.)

Consider Lévy processes first. Basically, the restriction of $F$ to the complement of any neighborhood of $0$, after normalization, is the law of the jumps of $X$ lying outside this neighborhood. Hence to consistently estimate $F$ we need potentially infinitely many jumps far from $0$, and this is possible only if $T\to\infty$. In our situation with $T$ fixed there is no way of consistently estimating $F$.

We can still say something in the Lévy case: for the complete observation scheme, if there is a jump then $F$ is not the zero measure; if we have infinitely many jumps in $[0,T]$ then $F$ is an infinite measure; in this case, we can also determine for which $r>0$ the sum $\sum_{s\le T}|\Delta X_s|^r$ is finite, and this is also the set of $r$'s such that $\int_{\{|x|\le1\}}|x|^r\,F(\mathrm dx)<\infty$.


Hence we will be interested, when coming back to the actual discrete observation scheme, in estimating $C_t$ for $t\le T$, and whether there are zero or finitely many or infinitely many jumps in $[0,T]$.

3 Some simple limit theorems for Wiener plus compound Poisson processes

This section is about a very particular case: the underlying process is $X = \sigma W + Y$ for some $\sigma>0$, with $W$ a Wiener process and $Y$ a compound Poisson process independent of $W$. And in the first subsection we even consider the most elementary case of $X=\sigma W$. In these two cases we state all limit theorems that are available about sums of a function of the increments. We do not give the full proofs, but heuristic reasons for the results to be true. The reason for devoting a special section to this simple case is to show the variety of results that can be obtained, whereas the full proofs can be easily reconstructed without annoying technical details.

Before getting started, we introduce some notation, to be used also for a general $d$-dimensional semimartingale $X$ later on. Recall the increments $\Delta^n_i X$ in (1.9). First, for any $p>0$ and $j\le d$ we set

$$B(p,j,\Delta_n)_t = \sum_{i=1}^{[t/\Delta_n]} |\Delta^n_i X^j|^p. \qquad (3.1)$$

In the 1-dimensional case this is written simply $B(p,\Delta_n)_t$. Next, if $f$ is a function on $\mathbb R^d$, the state space of $X$ in general, we set

$$V(f,\Delta_n)_t = \sum_{i=1}^{[t/\Delta_n]} f(\Delta^n_i X), \qquad V'(f,\Delta_n)_t = \sum_{i=1}^{[t/\Delta_n]} f\big(\Delta^n_i X/\sqrt{\Delta_n}\big). \qquad (3.2)$$

The reason for introducing the normalization $1/\sqrt{\Delta_n}$ will be clear below. These functionals are related one to the other by the trivial identity $V'(f,\Delta_n) = V(f_n,\Delta_n)$ with $f_n(x) = f(x/\sqrt{\Delta_n})$. Moreover, with the notation

$$y\in\mathbb R \;\mapsto\; h_p(y) = |y|^p, \qquad x=(x^j)\in\mathbb R^d \;\mapsto\; h^j_p(x) = |x^j|^p, \qquad (3.3)$$

we also have $B(p,j,\Delta_n) = V(h^j_p,\Delta_n) = \Delta_n^{p/2}\, V'(h^j_p,\Delta_n)$. Finally, if we need to emphasize the dependency on the process $X$, we write these functionals as $B(X;p,j,\Delta_n)$ or $V(X;f,\Delta_n)$ or $V'(X;f,\Delta_n)$.

    3.1 The Wiener case.

Here we suppose that $X=\sigma W$ for some constant $\sigma>0$, so $d=1$. Among all the previous functionals, the simplest ones to study are the functionals $V'(f,\Delta_n)$ with $f$ a fixed function on $\mathbb R$. We need $f$ to be Borel, of course, and not too big, for example with polynomial growth, or even with exponential growth. In this case, the results are straightforward consequences of the usual law of large numbers (LLN) and central limit theorem (CLT).

Indeed, for any $n$ the variables $(\Delta^n_i X/\sqrt{\Delta_n} : i\ge1)$ are i.i.d. with law $\mathcal N(0,\sigma^2)$. In the formulas below we write $\rho_\sigma$ for the law $\mathcal N(0,\sigma^2)$, and $\rho_\sigma(g)$ for the integral of a function $g$ with respect to it. Therefore, with $f$ as above, the variables $f(\Delta^n_i X/\sqrt{\Delta_n})$ when $i$ varies are i.i.d. with moments of all orders, and their first and second moments equal $\rho_\sigma(f)$ and $\rho_\sigma(f^2)$ respectively. Then the classical LLN and CLT give us that

$$\Delta_n\, V'(f,\Delta_n)_t \;\stackrel{\mathbb P}{\longrightarrow}\; t\,\rho_\sigma(f), \qquad \frac{1}{\sqrt{\Delta_n}}\Big(\Delta_n\, V'(f,\Delta_n)_t - t\,\rho_\sigma(f)\Big) \;\stackrel{\mathcal L}{\longrightarrow}\; \mathcal N\Big(0,\; t\big(\rho_\sigma(f^2)-\rho_\sigma(f)^2\big)\Big). \qquad (3.4)$$

We clearly see here why we have put the normalizing factor $1/\sqrt{\Delta_n}$ inside the function $f$.
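A quick Monte Carlo check of the first part of (3.4), with the illustrative choice $f(x)=x^4$, for which $\rho_\sigma(f)=3\sigma^4$ (the fourth moment of $\mathcal N(0,\sigma^2)$); the parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

sigma, t = 0.5, 2.0
f = lambda x: x ** 4                     # rho_sigma(f) = 3 * sigma^4 for this f

vals = []
for n_steps in (100, 10_000, 1_000_000):
    dn = t / n_steps                     # Delta_n
    # increments of X = sigma * W over a grid of mesh Delta_n
    incr = sigma * rng.normal(0.0, np.sqrt(dn), n_steps)
    v_prime = np.sum(f(incr / np.sqrt(dn)))   # V'(f, Delta_n)_t
    vals.append(dn * v_prime)
    print(n_steps, vals[-1])             # should approach t * 3 * sigma^4 = 0.375
```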

The reader will observe that, contrary to the usual LLN, we get convergence in probability but not almost surely in the first part of (3.4). The reason is as follows: let $\zeta_i$ be a sequence of i.i.d. variables with the same law as $f(X_1)$. The LLN implies that $Z_n = \frac{t}{[t/\Delta_n]}\sum_{i=1}^{[t/\Delta_n]}\zeta_i$ converges a.s. to $t\,\rho_\sigma(f)$. Since $\Delta_n V'(f,\Delta_n)_t$ has the same law as $Z_n$, we deduce the convergence in probability in (3.4) because, for a deterministic limit, convergence in probability and convergence in law are equivalent. However the variables $V'(f,\Delta_n)_t$ are connected with one another in a way we do not really control when $n$ varies, so we cannot conclude that $\Delta_n V'(f,\Delta_n)_t \to t\,\rho_\sigma(f)$ a.s.

(3.4) gives us the convergence for any time $t$, but we also have functional convergence:

1) First, recall that a sequence $g_n$ of nonnegative increasing functions on $\mathbb R_+$ converging pointwise to a continuous function $g$ also converges locally uniformly; then, from the first part of (3.4) applied separately to the positive and negative parts $f^+$ and $f^-$ of $f$, and using a subsequence principle for the convergence in probability, we obtain

$$\Delta_n\, V'(f,\Delta_n)_t \;\stackrel{\text{u.c.p.}}{\longrightarrow}\; t\,\rho_\sigma(f) \qquad (3.5)$$

where $Z^n_t \stackrel{\text{u.c.p.}}{\longrightarrow} Z_t$ means convergence in probability, locally uniformly in time: that is, $\sup_{s\le t}|Z^n_s - Z_s| \stackrel{\mathbb P}{\longrightarrow} 0$ for all finite $t$.

2) Next, if instead of the 1-dimensional CLT we use the functional CLT, or Donsker's Theorem, we obtain

$$\frac{1}{\sqrt{\Delta_n}}\Big(\Delta_n\, V'(f,\Delta_n)_t - t\,\rho_\sigma(f)\Big)_{t\ge0} \;\stackrel{\mathcal L}{\Longrightarrow}\; \sqrt{\rho_\sigma(f^2)-\rho_\sigma(f)^2}\;\, W' \qquad (3.6)$$

where $W'$ is another standard Wiener process, and $\stackrel{\mathcal L}{\Longrightarrow}$ stands for the convergence in law of processes (for the Skorokhod topology). Here we see a new Wiener process $W'$ appear. What is its connection with the basic underlying Wiener process $W$? To study that, one can try to prove the joint convergence of the processes on the left side of (3.6) together with $W$ (or equivalently $X$) itself.

This is an easy task: consider the 2-dimensional process $Z^n$ whose first component is the left side of (3.6) and second component is $X_{\Delta_n[t/\Delta_n]}$ (the discretized version of $X$, which converges pointwise to $X$). Then $Z^n$ takes the form $Z^n_t = \sqrt{\Delta_n}\,\sum_{i=1}^{[t/\Delta_n]}\zeta^n_i$, where the $\zeta^n_i$ are 2-dimensional i.i.d. variables as $i$ varies, with the same distribution as $(g_1(X_1), g_2(X_1))$, where $g_1(x) = f(x)-\rho_\sigma(f)$ and $g_2(x)=x$. Then the 2-dimensional version of Donsker's Theorem gives us that

$$\left(\frac{1}{\sqrt{\Delta_n}}\Big(\Delta_n\, V'(f,\Delta_n)_t - t\,\rho_\sigma(f)\Big),\; X_t\right)_{t\ge0} \;\stackrel{\mathcal L}{\Longrightarrow}\; (B, X) \qquad (3.7)$$

and the pair $(B,X)$ is a 2-dimensional (correlated) Wiener process, characterized by its variance–covariance at time 1, which is the following matrix:

$$\begin{pmatrix} \rho_\sigma(f^2)-\rho_\sigma(f)^2 & \rho_\sigma(f g_2) \\ \rho_\sigma(f g_2) & \sigma^2 \end{pmatrix} \qquad (3.8)$$

(note that $\sigma^2 = \rho_\sigma(g_2^2)$ and also $\rho_\sigma(g_2)=0$, so the above matrix is positive semi-definite).

Equivalently, we can write $B$ as $B = \sqrt{\rho_\sigma(f^2)-\rho_\sigma(f)^2}\; W'$ with $W'$ a standard Brownian motion (as in (3.6)) which is correlated with $W$, the correlation coefficient being $\rho_\sigma(f g_2)\big/\big(\sigma\sqrt{\rho_\sigma(f^2)-\rho_\sigma(f)^2}\big)$.

Now we turn to the processes $B(p,\Delta_n)$. Since $B(p,\Delta_n) = \Delta_n^{p/2}\, V'(h_p,\Delta_n)$ this is just a particular case of (3.5) and (3.7), which we reformulate below ($m_p$ denotes the $p$-th absolute moment of the normal law $\mathcal N(0,1)$):

$$\Delta_n^{1-p/2}\, B(p,\Delta_n) \;\stackrel{\text{u.c.p.}}{\longrightarrow}\; t\,\sigma^p m_p, \qquad (3.9)$$

$$\left(\frac{1}{\sqrt{\Delta_n}}\Big(\Delta_n^{1-p/2}\, B(p,\Delta_n)_t - t\,\sigma^p m_p\Big),\; X_t\right)_{t\ge0} \;\stackrel{\mathcal L}{\Longrightarrow}\; (B, X), \qquad (3.10)$$

with $B$ a Wiener process with variance $\sigma^{2p}(m_{2p}-m_p^2)$ at time 1, independent of $X$ (the independence comes from the fact that $\rho_\sigma(g)=0$, where $g(x)=x\,|x|^p$).

Finally for the functionals $V(f,\Delta_n)$, the important thing is the behavior of $f$ near $0$, since the increments $\Delta^n_i X$ are all going to $0$ as $\Delta_n\to0$. In fact, $\sup_{i\le[t/\Delta_n]}|\Delta^n_i X|\to0$ pointwise, so when the function $f$ vanishes on a neighborhood of $0$, for all $n$ bigger than some (random) finite number $N$ depending also on $t$ we have

$$V(f,\Delta_n)_s = 0 \qquad \forall\, s\le t. \qquad (3.11)$$

For a general function $f$ we can combine (3.9) with (3.11): we easily obtain that (3.9) holds with $V(f,\Delta_n)$ instead of $B(p,\Delta_n)$ as soon as $f(x)\sim|x|^p$ as $x\to0$, and the same holds for (3.10) if we further have $f(x)=|x|^p$ on a neighborhood of $0$.
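The scaling (3.9) can be checked numerically; $m_p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$ is the $p$-th absolute moment of $\mathcal N(0,1)$. The values of $\sigma$, $t$ and the exponents below are arbitrary illustrative choices.

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(3)

def m_abs(p):
    # p-th absolute moment of N(0,1): 2^(p/2) * Gamma((p+1)/2) / sqrt(pi)
    return 2 ** (p / 2) * gamma((p + 1) / 2) / sqrt(pi)

sigma, t, n = 0.3, 1.0, 500_000
dn = t / n
incr = sigma * rng.normal(0.0, sqrt(dn), n)      # increments of sigma * W

for p in (1.0, 3.0, 4.0):
    b_p = np.sum(np.abs(incr) ** p)              # B(p, Delta_n)_t
    # Delta_n^(1 - p/2) * B(p, Delta_n)_t versus the limit t * sigma^p * m_p
    print(p, dn ** (1 - p / 2) * b_p, t * sigma ** p * m_abs(p))
```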

Of course these results do not exhaust all possibilities for the convergence of $V(f,\Delta_n)$. For example one may prove the following:

$$f(x) = |x|^p \log|x| \quad\Longrightarrow\quad \frac{\Delta_n^{1-p/2}}{\log(1/\Delta_n)}\, V(f,\Delta_n) \;\stackrel{\text{u.c.p.}}{\longrightarrow}\; -\frac12\, t\,\sigma^p m_p, \qquad (3.12)$$

and a CLT is also available in this situation. Or, we could consider functions $f$ which behave like $x^p$ as $x\downarrow0$ and like $(-x)^{p'}$ as $x\uparrow0$, with $p\ne p'$. However, we essentially restrict our attention to functions behaving like $h_p$: for simplicity first, then because more general functions do not really occur in the applications we have in mind, and also because the extension to processes $X$ more general than the Brownian motion is not easy for other functions.


    3.2 The Wiener plus compound Poisson case.

Our second example is when the underlying process $X$ has the form $X = \sigma W + Y$, where as before $\sigma>0$ and $W$ is a Brownian motion, and $Y$ is a compound Poisson process independent of $W$. We will write $X' = \sigma W$. Recall that $Y$ has the form

$$Y_t = \sum_{p\ge1} \Phi_p\, 1_{\{T_p\le t\}}, \qquad (3.13)$$

where the $T_p$'s are the successive arrival times of a Poisson process, say with parameter 1 (they are finite stopping times, positive, strictly increasing with $p$ and going to $\infty$), and the $\Phi_p$'s are i.i.d. variables, independent of the $T_p$'s, and with some law $G$. Note that in (3.13) the sum, for any given $t$, is actually a finite sum.
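A direct simulation of (3.13), with the illustrative choices $G=\mathcal N(0,1)$ for the jump law and rate-1 exponential inter-arrival times for the $T_p$'s:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate Y_t = sum_p Phi_p 1{T_p <= t} on [0, T]: arrival times T_p of a
# rate-1 Poisson process, jump sizes Phi_p i.i.d. with law G = N(0, 1).
T = 10.0
arrivals = []
s = rng.exponential(1.0)            # exponential(1) inter-arrival times
while s <= T:
    arrivals.append(s)
    s += rng.exponential(1.0)
arrivals = np.array(arrivals)
jumps = rng.normal(0.0, 1.0, arrivals.size)

def Y(t):
    # value of the compound Poisson process at time t (a finite sum)
    return jumps[arrivals <= t].sum()

print(arrivals.size, Y(T))          # number of jumps in [0, T] and terminal value
```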

The processes $V'(f,\Delta_n)$, which were particularly easy to study when $X$ was a Wiener process, are not so simple to analyze now. This is easy to understand: let us fix $t$; at stage $n$, we have $\Delta^n_i X = \Delta^n_i X'$ for all $i\le[t/\Delta_n]$, except for those finitely many $i$'s corresponding to an interval $((i-1)\Delta_n, i\Delta_n]$ containing at least one of the $T_p$'s. Furthermore, all those exceptional intervals contain exactly one $T_p$, as soon as $n$ is large enough (depending on $(\omega,t)$). Therefore for $n$ large we have

$$V'(f,\Delta_n)_t = V'(X';f,\Delta_n)_t + A^n_t, \quad\text{where}\quad A^n_t = \sum_{i=1}^{[t/\Delta_n]}\sum_{p\ge1} 1_{\{(i-1)\Delta_n < T_p \le i\Delta_n\}}\,\Big(f\big(\Delta^n_i X/\sqrt{\Delta_n}\big) - f\big(\Delta^n_i X'/\sqrt{\Delta_n}\big)\Big). \qquad (3.14)$$

Suppose for example that $f(x)=|x|^p$ with $p>2$. Then in (3.14) the leading term becomes $A^n_t$, which is approximately equal to $\Delta_n^{-p/2}\sum_{s\le t}|\Delta X_s|^p$. So $\Delta_n^{p/2}\, V'(f,\Delta_n)_t$ converges in probability to the variable

$$B(p)_t = \sum_{s\le t} |\Delta X_s|^p \qquad (3.15)$$

(we have just proved the convergence for any given $t$, but it is also a functional convergence, for the Skorokhod topology, in probability).

Again, these cases do not exhaust the possible behaviors of $f$, and further we have not given a CLT in the second situation above. But, when $f$ is not bounded it looks a bit strange to impose a specific behavior at infinity, and without this there is simply no convergence result for $V'(f,\Delta_n)_t$, not to speak about CLTs.

Now we turn to the processes $V(f,\Delta_n)$. To begin with, we observe that, similarly to (3.14), we have

$$V(f,\Delta_n)_t = V(X';f,\Delta_n)_t + A^n_t, \quad\text{where}\quad A^n_t = \sum_{i=1}^{[t/\Delta_n]}\sum_{p\ge1} 1_{\{(i-1)\Delta_n < T_p \le i\Delta_n\}}\,\big(f(\Delta^n_i X) - f(\Delta^n_i X')\big). \qquad (3.16)$$

Set

$$V(f)_t = \sum_{s\le t} f(\Delta X_s), \qquad (3.17)$$

which here, for any given $t$, is a finite sum. When $f$ is continuous and vanishes on a neighborhood of $0$, the term $V(X';f,\Delta_n)_t$ vanishes for $n$ large by (3.11), whereas $A^n_t\to V(f)_t$; hence

$$V(f,\Delta_n)_t \;\stackrel{\text{Sk}}{\longrightarrow}\; V(f)_t \qquad (3.18)$$

(functional convergence for the Skorokhod topology, in probability). Next take $f=h_p$. For any $\varepsilon>0$ we can write $f = f' + f''$ with $f'$ and $f''$ continuous, and $f''(x) = h_p(x)$ if $|x|\ge\varepsilon$ and $f''(x)=0$ if $|x|\le\varepsilon/2$ and $|f''|\le h_p$ everywhere. Since $f''$ vanishes around $0$, we have $V(f'',\Delta_n)_t \to V(f'')_t$ by (3.18), and $V(f'')_t$ converges to $V(f)_t$ as $\varepsilon\to0$. On the other hand the process $A^n$ associated with $f'$ by (3.16) is the sum of summands smaller than $2\varepsilon^p$, the number of them being bounded for each $(\omega,t)$ by a number independent of $\varepsilon$: hence $A^n_t$ is negligible and $V(f',\Delta_n)$ and $V(X';f',\Delta_n)$ behave essentially in the same way. This means heuristically that, with the symbol $\approx$ meaning "approximately equal to", we have

$$V(f'',\Delta_n)_t \approx V(f'')_t, \qquad V(f',\Delta_n)_t \approx \Delta_n^{p/2-1}\, t\,\sigma^p m_p. \qquad (3.19)$$

Adding these two expressions, we get

$$\begin{array}{ll}
V(f,\Delta_n)_t \;\stackrel{\text{Sk}}{\longrightarrow}\; V(f)_t & \text{if } p>2 \\[4pt]
V(f,\Delta_n)_t \;\stackrel{\text{Sk}}{\longrightarrow}\; V(f)_t + t\sigma^2 & \text{if } p=2 \\[4pt]
\Delta_n^{1-p/2}\, V(f,\Delta_n)_t \;\stackrel{\text{u.c.p.}}{\longrightarrow}\; t\,\sigma^p m_p & \text{if } p<2.
\end{array} \qquad (3.20)$$

This type of LLN, which shows a variety of behaviors according to how $f$ behaves near $0$, will be found for much more general processes later, in (almost) exactly the same terms.
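The trichotomy (3.20) can be observed numerically on a path of $\sigma W$ plus two deterministic jumps (placing the jumps at fixed times and sizes is an illustrative simplification of the compound Poisson setup):

```python
import numpy as np

rng = np.random.default_rng(5)

sigma, t = 1.0, 1.0
jump_times = np.array([0.3, 0.7])       # two fixed jumps, for illustration
jump_sizes = np.array([1.5, -2.0])

results = {}
for n in (1_000, 100_000):
    dn = t / n
    dX = sigma * rng.normal(0.0, np.sqrt(dn), n)     # increments of sigma * W
    dX[(jump_times / dn).astype(int)] += jump_sizes  # add the jumps
    for p in (3.0, 2.0, 1.0):
        v = np.sum(np.abs(dX) ** p)                  # V(h_p, Delta_n)_t
        # normalize by Delta_n^(1 - p/2) only in the case p < 2, cf. (3.20)
        results[(n, p)] = v if p >= 2 else dn ** (1 - p / 2) * v

# limits predicted by (3.20):
lim3 = np.sum(np.abs(jump_sizes) ** 3)               # p > 2: sum of |jumps|^p
lim2 = np.sum(jump_sizes ** 2) + t * sigma ** 2      # p = 2: quadratic variation
lim1 = t * sigma * np.sqrt(2 / np.pi)                # p < 2: t * sigma^p * m_p, p = 1
print(results[(100_000, 3.0)], lim3)
print(results[(100_000, 2.0)], lim2)
print(results[(100_000, 1.0)], lim1)
```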


Now we turn to the CLT. Here again we single out first the case where $f$ vanishes in a neighborhood of $0$. We need to find out what happens to the difference $V(f,\Delta_n) - V(f)$. It is easier to evaluate the difference $V(f,\Delta_n)_t - V(f)_{\Delta_n[t/\Delta_n]}$, since by (3.17) we have, for $n$ large,

$$V(f,\Delta_n)_s - V(f)_{\Delta_n[s/\Delta_n]} = \sum_{i=1}^{[s/\Delta_n]}\sum_{p\ge1} 1_{\{(i-1)\Delta_n < T_p \le i\Delta_n\}}\,\big(f(\Delta^n_i X) - f(\Delta X_{T_p})\big). \qquad (3.21)$$

When $f$ is $C^1$ outside $0$, each summand above is approximately $f'(\Delta X_{T_p})(\Delta^n_i X - \Delta X_{T_p})$, and $\Delta^n_i X - \Delta X_{T_p} = \sigma\,\Delta^n_i W$ has law $\mathcal N(0,\sigma^2\Delta_n)$; this leads to the stable convergence in law

$$\frac{1}{\sqrt{\Delta_n}}\,\big(V(f,\Delta_n)_t - V(f)_{\Delta_n[t/\Delta_n]}\big) \;\stackrel{\mathcal L\text{-}s}{\Longrightarrow}\; B(f)_t = \sigma \sum_{p\ge1:\,T_p\le t} f'(\Delta X_{T_p})\,U_p, \qquad (3.22)$$

where the $U_p$ are i.i.d. $\mathcal N(0,1)$ variables, independent of $X$, defined on an extension of the probability space (the notion of stable convergence in law, written $\stackrel{\mathcal L\text{-}s}{\Longrightarrow}$, is defined in Section 4 below).

When $f$ coincides with $h_p$ on a neighborhood of $0$ and is still $C^1$ outside $0$, exactly as for (3.19) we obtain heuristically that

$$V(f'',\Delta_n)_t \approx V(f'')_{\Delta_n[t/\Delta_n]} + \sqrt{\Delta_n}\,U^n_t, \qquad V(f',\Delta_n)_t \approx \Delta_n^{p/2-1}\,t\,\sigma^p m_p + \Delta_n^{p/2-1/2}\,U'^n_t,$$

where $U^n$ and $U'^n$ converge stably in law to the right side of (3.22) and to the process $B$ of (3.10), respectively. We then have two conflicting rates, and we can indeed prove that, with $B(f)$ as in (3.22) and $B$ as in (3.10) (thus depending on $p$):

$$\begin{array}{ll}
\dfrac{1}{\sqrt{\Delta_n}}\big(V(f,\Delta_n)_t - V(f)_{\Delta_n[t/\Delta_n]}\big) \;\stackrel{\mathcal L\text{-}s}{\Longrightarrow}\; B(f)_t & \text{if } p>3 \\[6pt]
\dfrac{1}{\sqrt{\Delta_n}}\big(V(f,\Delta_n)_t - V(f)_{\Delta_n[t/\Delta_n]}\big) \;\stackrel{\mathcal L\text{-}s}{\Longrightarrow}\; t\,\sigma^3 m_3 + B(f)_t & \text{if } p=3 \\[6pt]
\dfrac{1}{\Delta_n^{p/2-1}}\big(V(f,\Delta_n)_t - V(f)_{\Delta_n[t/\Delta_n]}\big) \;\stackrel{\text{u.c.p.}}{\longrightarrow}\; t\,\sigma^p m_p & \text{if } 2<p<3 \\[6pt]
\dfrac{1}{\sqrt{\Delta_n}}\big(V(f,\Delta_n)_t - V(f)_{\Delta_n[t/\Delta_n]} - t\sigma^2\big) \;\stackrel{\mathcal L\text{-}s}{\Longrightarrow}\; B_t + B(f)_t & \text{if } p=2 \\[6pt]
\dfrac{1}{\Delta_n^{1-p/2}}\big(\Delta_n^{1-p/2}\,V(f,\Delta_n)_t - t\,\sigma^p m_p\big) \;\stackrel{\text{Sk}}{\longrightarrow}\; V(f)_t & \text{if } 1<p<2 \\[6pt]
\dfrac{1}{\sqrt{\Delta_n}}\big(\sqrt{\Delta_n}\,V(f,\Delta_n)_t - t\,\sigma m_1\big) \;\stackrel{\mathcal L\text{-}s}{\Longrightarrow}\; V(f)_t + B_t & \text{if } p=1 \\[6pt]
\dfrac{1}{\sqrt{\Delta_n}}\big(\Delta_n^{1-p/2}\,V(f,\Delta_n)_t - t\,\sigma^p m_p\big) \;\stackrel{\mathcal L\text{-}s}{\Longrightarrow}\; B_t & \text{if } p<1
\end{array} \qquad (3.23)$$

Hence we have "standard" CLTs with rate $\sqrt{\Delta_n}$ when $p>3$, $p=2$ and $p<1$. When $p=3$ and $p=1$ we still have a CLT, with a bias. When $2<p<3$ or $1<p<2$ the rate is slower than $\sqrt{\Delta_n}$.


We see that these results exhibit again a large variety of behaviors. This will be encountered also for more general underlying processes $X$, with of course more complicated statements and proofs (in the present situation we have not really given a complete proof, but it is relatively easy along the lines outlined above). However, in the general situation we will not give such a complete picture, which is useless for practical applications. Only (3.20) and the cases $p>2$ in (3.23) will be given.

    4 Auxiliary limit theorems

    The aims of this section are twofold: first we define the stable convergence in law, alreadymentioned in the previous section. Second, we recall a number of limit theorems for partialsums of triangular arrays of random variables.

1) Stable convergence in law. This notion has been introduced by Rényi in [22], for the very same reasons as we need it here. We refer to [4] for a very simple exposition and to [13] for more details.

It often happens that a sequence of statistics $Z_n$ converges in law to a limit $Z$ which has, say, a mixed centered normal distribution: that is, $Z=\Sigma U$ where $U$ is an $N(0,1)$ variable and $\Sigma$ is a positive variable independent of $U$. This poses no problem other than computational when the law of $\Sigma$ is known. However, in many instances the law of $\Sigma$ is unknown, but we can find a sequence of statistics $\Sigma_n$ such that the pair $(Z_n,\Sigma_n)$ converges in law to $(Z,\Sigma)$; so although the law of the pair $(Z,\Sigma)$ is unknown, the variable $Z_n/\Sigma_n$ converges in law to $N(0,1)$, and we can base estimation or testing procedures on this new statistic $Z_n/\Sigma_n$. This is where the stable convergence in law comes into play.
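The studentization idea above can be sketched with a small Monte Carlo experiment (a hedged illustration with made-up distributions and sample sizes, not taken from the text): conditionally on a random volatility $\Sigma$ the data are i.i.d. $N(0,\Sigma^2)$, so $Z_n=\sqrt{n}\times$(sample mean) is asymptotically mixed normal $\Sigma U$, while the self-normalized statistic $Z_n/\Sigma_n$ is asymptotically $N(0,1)$:

```python
import math
import random
import statistics

random.seed(1)

def studentized_statistic(n):
    """One draw of Z_n / Sigma_n: conditionally on the random Sigma, the
    sample is i.i.d. N(0, Sigma^2); Z_n = sqrt(n) * mean is approximately
    Sigma * U with U ~ N(0,1) independent of Sigma (a mixed normal law),
    and dividing by the estimator Sigma_n removes the unknown mixing."""
    sigma = math.exp(random.gauss(0.0, 1.0))   # unknown random "volatility"
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    z = math.sqrt(n) * statistics.fmean(xs)    # mixed normal, law unknown
    s = statistics.pstdev(xs)                  # consistent estimator of Sigma
    return z / s

draws = [studentized_statistic(400) for _ in range(2000)]
m = statistics.fmean(draws)
v = statistics.pvariance(draws)
print(round(m, 2), round(v, 2))                # close to 0 and 1
```

The raw draws $Z_n$ have a mixed normal law whose variance $E(\Sigma^2)$ is unknown in practice, whereas the studentized draws can be compared directly with standard normal quantiles; the stable convergence in law is exactly what justifies this division.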

The formal definition is a bit involved. It applies to a sequence of random variables $Z_n$, all defined on the same probability space $(\Omega,\mathcal{F},\mathbb{P})$, and taking their values in the same state space $(E,\mathcal{E})$, assumed to be Polish (= metric, complete and separable). We say that $Z_n$ stably converges in law if there is a probability measure $\mu$ on the product $(\Omega\times E,\mathcal{F}\otimes\mathcal{E})$, such that $\mu(A\times E)=\mathbb{P}(A)$ for all $A\in\mathcal{F}$ and

$$\mathbb{E}(Y\,f(Z_n))\ \to\ \int Y(\omega)f(x)\,\mu(d\omega,dx) \tag{4.1}$$

for all bounded continuous functions $f$ on $E$ and bounded random variables $Y$ on $(\Omega,\mathcal{F})$.

This is an abstract definition, similar to the definition of the convergence in law, which says that $\mathbb{E}(f(Z_n))\to\int f(x)\,\mu(dx)$ for some probability measure $\mu$. Now for the convergence in law we usually want a limit, that is we say $Z_n\overset{\mathcal{L}}{\longrightarrow}Z$, and the variable $Z$ is any variable with law $\mu$, of course. In a similar way it is convenient to realize the limit $Z$ for the stable convergence in law.

We can always realize $Z$ in the following way: take $\tilde\Omega=\Omega\times E$ and $\tilde{\mathcal{F}}=\mathcal{F}\otimes\mathcal{E}$, endow $(\tilde\Omega,\tilde{\mathcal{F}})$ with the probability $\mu$, and put $Z(\omega,x)=x$. But, as for the simple convergence in law, we can also consider other extensions of $(\Omega,\mathcal{F},\mathbb{P})$: that is, we have a probability space $(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$, where $\tilde\Omega=\Omega\times\Omega'$ and $\tilde{\mathcal{F}}=\mathcal{F}\otimes\mathcal{F}'$ for some auxiliary measurable space $(\Omega',\mathcal{F}')$, and $\tilde{\mathbb{P}}$ is a probability measure on $(\tilde\Omega,\tilde{\mathcal{F}})$ whose first marginal is $\mathbb{P}$; and we also have a random variable $Z$ on this extension. Then in this setting, (4.1) is equivalent to saying (with $\tilde{\mathbb{E}}$ denoting the expectation w.r.t. $\tilde{\mathbb{P}}$)

$$\mathbb{E}(Y\,f(Z_n))\ \to\ \tilde{\mathbb{E}}(Y\,f(Z)) \tag{4.2}$$

for all $f$ and $Y$ as above, as soon as $\tilde{\mathbb{P}}(A\cap\{Z\in B\})=\mu(A\times B)$ for all $A\in\mathcal{F}$ and $B\in\mathcal{E}$. We then say that $Z_n$ converges stably to $Z$, and this convergence is denoted by $\overset{\mathcal{L}\text{-s}}{\longrightarrow}$.

Clearly, when $\mu$ is given, the property $\tilde{\mathbb{P}}(A\cap\{Z\in B\})=\mu(A\times B)$ for all $A\in\mathcal{F}$ and $B\in\mathcal{E}$ simply amounts to specifying the law of $Z$, conditionally on the $\sigma$-field $\mathcal{F}$. Therefore, saying $Z_n\overset{\mathcal{L}\text{-s}}{\longrightarrow}Z$ amounts to saying that we have the stable convergence in law towards a variable $Z$, defined on any extension $(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$ of $(\Omega,\mathcal{F},\mathbb{P})$, and with a specified conditional law knowing $\mathcal{F}$.

Obviously, the stable convergence in law implies the convergence in law. But it implies much more, and in particular the following crucial result: if $Z_n\overset{\mathcal{L}\text{-s}}{\longrightarrow}Z$ and if $Y_n$ and $Y$ are variables defined on $(\Omega,\mathcal{F},\mathbb{P})$ and with values in the same Polish space $F$, then

$$Y_n\overset{\mathbb{P}}{\longrightarrow}Y\ \Longrightarrow\ (Y_n,Z_n)\overset{\mathcal{L}\text{-s}}{\longrightarrow}(Y,Z). \tag{4.3}$$

On the other hand, there are criteria for stable convergence in law of a given sequence $Z_n$. The $\sigma$-field generated by all the $Z_n$ is necessarily separable, that is generated by a countable algebra, say $\mathcal{G}$. Then if for any finite family $(A_p:1\le p\le q)$ in $\mathcal{G}$ the sequence $(Z_n,(1_{A_p})_{1\le p\le q})$ of $E\times\mathbb{R}^q$-valued variables converges in law as $n\to\infty$, then necessarily $Z_n$ converges stably in law.

    2) Convergence of triangular arrays. Our aim is to prove the convergence of func-tionals like in (3.1) and (3.2), which appear in a natural way as partial sums of triangulararrays. We really need the convergence for the terminal time T, but in most cases theavailable convergence criteria also give the convergence as processes, for the Skorokhodtopology. So now we provide a set of conditions implying the convergence of partial sumsof triangular arrays, all results being in [13].

We are not looking for the most general situation here, and we restrict our attention to the case where the filtered probability space $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\ge0},\mathbb{P})$ is fixed. For each $n$ we have a sequence of $\mathbb{R}^d$-valued variables $(\chi^n_i:i\ge1)$, the components being denoted by $\chi^{n,j}_i$ for $j=1,\dots,d$. The key assumption is that for all $n,i$ the variable $\chi^n_i$ is $\mathcal{F}_{i\Delta_n}$-measurable, and this assumption is in force in the remainder of this section.

Conditional expectations w.r.t. $\mathcal{F}_{(i-1)\Delta_n}$ will play a crucial role, and to simplify notation we write $\mathbb{E}^n_{i-1}$ instead of $\mathbb{E}(\,\cdot\,|\,\mathcal{F}_{(i-1)\Delta_n})$; likewise $\mathbb{P}^n_{i-1}$ is the conditional probability.

Lemma 4.1 If we have

$$\sum_{i=1}^{[t/\Delta_n]}\mathbb{E}^n_{i-1}(\chi^n_i)\ \overset{\mathbb{P}}{\longrightarrow}\ 0\qquad\forall\,t>0, \tag{4.4}$$


processes: such a process can be viewed as a variable taking its values in the Skorokhod space $\mathbb{D}(\mathbb{R}^d)$ of all functions from $\mathbb{R}_+$ into $\mathbb{R}^d$ which are right-continuous with left limits, provided we endow this space with the Skorokhod topology, which makes it a Polish space. See [10] or Chapter VI of [13] for details on this topology. In fact, in Lemma 4.3 the convergence in law is also relative to this Skorokhod topology. The stable convergence in law for processes is denoted by $\overset{\mathcal{L}\text{-s}}{\Longrightarrow}$ below.

In the previous results the fact that all variables were defined on the same space $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\ge0},\mathbb{P})$ and the $\chi^n_i$ were $\mathcal{F}_{i\Delta_n}$-measurable was essentially irrelevant. This is no longer the case for the next result, for which this setting is fundamental.

Below we single out, among all martingales on $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\ge0},\mathbb{P})$, a possibly multidimensional Wiener process $W$. The following lemma holds for any choice of $W$, and even with no $W$ at all (in which case a martingale orthogonal to $W$ means any martingale), but we will use it mainly with the process $W$ showing in (1.7). The following is a particular case of Theorem IX.7.28 of [13].

Lemma 4.4 Assume (4.7) for some continuous adapted $\mathbb{R}^d$-valued process of finite variation $A$, and (4.8) with some continuous adapted process $C=(C^{jk})$ with values in $\mathcal{M}^+_d$ and increasing in this set, and also (4.9). Assume further

$$\sum_{i=1}^{[t/\Delta_n]}\mathbb{E}^n_{i-1}\bigl(\chi^n_i\,\Delta^n_iN\bigr)\ \overset{\mathbb{P}}{\longrightarrow}\ 0\qquad\forall\,t>0 \tag{4.10}$$

whenever $N$ is one of the components of $W$ or is a bounded martingale orthogonal to $W$.

Then the processes $\sum_{i=1}^{[t/\Delta_n]}\chi^n_i$ converge stably in law to $A+B$, where $B$ is a continuous process defined on an extension $(\tilde\Omega,\tilde{\mathcal{F}},\tilde{\mathbb{P}})$ of the space $(\Omega,\mathcal{F},\mathbb{P})$ and which, conditionally on the $\sigma$-field $\mathcal{F}$, is a centered Gaussian $\mathbb{R}^d$-valued process with independent increments satisfying $\tilde{\mathbb{E}}(B^j_tB^k_t\,|\,\mathcal{F})=C^{jk}_t$.

The conditions stated above completely specify the conditional law of $B$ knowing $\mathcal{F}$, so we are exactly in the setting explained in 1) above and the stable convergence in law is well defined. However one can say even more: letting $(\tilde{\mathcal{F}}_t)$ be the smallest filtration on $\tilde\Omega$ which makes $B$ adapted and which contains $(\mathcal{F}_t)$ (that is, $A\in\tilde{\mathcal{F}}_t$ whenever $A\in\mathcal{F}_t$), the process $B$ is a continuous local martingale on $(\tilde\Omega,\tilde{\mathcal{F}},(\tilde{\mathcal{F}}_t)_{t\ge0},\tilde{\mathbb{P}})$ which is orthogonal, in the martingale sense, to any martingale on the space $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\ge0},\mathbb{P})$, and whose quadratic variation process is $C$. Of course, on the extended space $B$ is no longer Gaussian.

The condition (4.10) could be replaced by weaker ones. For example, if it holds when $N$ is orthogonal to $W$, whereas $\sum_{i=1}^{[t/\Delta_n]}\mathbb{E}^n_{i-1}(\chi^n_i\,\Delta^n_iW^j)$ converges in probability to a continuous process for all indices $j$, we still have the stable convergence in law of $\sum_{i=1}^{[t/\Delta_n]}\chi^n_i$, but the limit has the form $A+B+M$, where the process $M$ is a stochastic integral with respect to $W$. See [13] for more details.


5 A first LLN (Law of Large Numbers)

    At this stage we start giving the basic limit theorems which are used later for statistical

    applications. Perhaps giving first all limit theorems in a purely probabilistic setting is notthe most pedagogical way of proceeding, but it is the most economical in terms of space...

We are in fact going to provide a version of the results of Section 3, and other connected results, when the basic process $X$ is an Ito semimartingale. There are two kinds of results: first, some LLNs similar to (3.5), (3.9), (3.18) or (3.20); second, some central limit theorems (CLTs) similar to (3.10) or (3.23). We will not give a complete picture, and rather restrict ourselves to those results which are used in the statistical applications.

    Warning: Below, and in all these notes, the proofs are often sketchy and sometimesabsent; for the full proofs, which are sometimes a bit complicated, we refer essentiallyto [15] (which is restricted to the 1-dimensional case for X, but the multidimensionalextension is straightforward).

In this section we provide some general results, valid for any $d$-dimensional semimartingale $X=(X^j)_{1\le j\le d}$, not necessarily Ito. We also use the notation (3.1) and (3.2). We start by recalling the fundamental result about quadratic variation, which says that for any indices $j,k$, and as $n\to\infty$ (recall $\Delta_n\to0$):

$$\sum_{i=1}^{[t/\Delta_n]}\Delta^n_iX^j\,\Delta^n_iX^k\ \overset{\text{Sk}}{\longrightarrow}\ [X^j,X^k]_t=C^{jk}_t+\sum_{s\le t}\Delta X^j_s\,\Delta X^k_s. \tag{5.1}$$

This is the convergence in probability, for the Skorokhod topology, and we even have the joint convergence, for the Skorokhod topology, of the $d^2$-dimensional processes indexed by $1\le j,k\le d$. When further $X$ has no fixed times of discontinuity, for example when it is an Ito semimartingale, we also have the convergence in probability for any fixed $t$.
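As a numerical sanity check of (5.1) — a sketch under illustrative parameters, not part of the text — one can simulate a two-dimensional path $dX=\sigma\,dW$ with one common jump, and compare the realized covariance with $C^{12}_T+\sum_{s\le T}\Delta X^1_s\,\Delta X^2_s$:

```python
import math
import random

random.seed(2)

def realized_cov(n, T=1.0, s1=0.5, s2=0.3, rho=0.6):
    """Euler scheme for two correlated Brownian components plus one joint
    jump; returns sum_i dX1_i * dX2_i, the realized covariance of (5.1)."""
    dt = T / n
    jump_time, j1, j2 = 0.5, 0.7, -0.4          # one joint jump (illustrative)
    rc = 0.0
    for i in range(n):
        dw1 = random.gauss(0.0, math.sqrt(dt))
        dw2 = rho * dw1 + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, math.sqrt(dt))
        dx1, dx2 = s1 * dw1, s2 * dw2
        if i * dt < jump_time <= (i + 1) * dt:   # add the jump to this increment
            dx1, dx2 = dx1 + j1, dx2 + j2
        rc += dx1 * dx2
    return rc

rc = realized_cov(20000)
target = 0.6 * 0.5 * 0.3 * 1.0 + 0.7 * (-0.4)    # rho*s1*s2*T + jump product
print(round(rc, 3), round(target, 3))
```

With 20000 increments the realized covariance is already very close to its limit; the jump contributes the product of the two jump sizes, exactly as the second term of (5.1) predicts.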

Theorem 5.1 Let $f$ be a continuous function from $\mathbb{R}^d$ into $\mathbb{R}^{d'}$.

a) If $f(x)=o(\|x\|^2)$ as $x\to0$, then

$$V(f,\Delta_n)_t\ \overset{\text{Sk}}{\longrightarrow}\ f\star\mu_t=\sum_{s\le t}f(\Delta X_s). \tag{5.2}$$

b) If $f$ coincides on a neighborhood of 0 with the function $g(x)=\sum_{j,k=1}^d\alpha_{jk}x^jx^k$ (here each $\alpha_{jk}$ is a vector in $\mathbb{R}^{d'}$), then

$$V(f,\Delta_n)_t\ \overset{\text{Sk}}{\longrightarrow}\ \sum_{j,k=1}^d\alpha_{jk}\,C^{jk}_t+f\star\mu_t. \tag{5.3}$$

Moreover both convergences above also hold in probability for any fixed $t$ such that $\mathbb{P}(\Delta X_t=0)=1$ (hence for all $t$ when $X$ is an Ito semimartingale).

Proof. 1) Suppose first that $f(x)=0$ when $\|x\|\le\varepsilon$, for some $\varepsilon>0$. Denote by $S_1,S_2,\dots$ the successive jump times of $X$ corresponding to jumps of norm bigger than $\varepsilon/2$, so $S_p\to\infty$. Fix $T>0$. For each $\omega$ there are two integers $Q=Q(T,\omega)$ and $N=N(T,\omega)$ such that $S_Q(\omega)\le T<S_{Q+1}(\omega)$, and such that for all $n\ge N$ and any interval $((i-1)\Delta_n,i\Delta_n]$ included in $[0,T]$, either there is no $S_q$ in this interval and $\|\Delta^n_iX\|\le\varepsilon$, or there is exactly one $S_q$ in it, in which case we set $\alpha^n_q=\Delta^n_iX-\Delta X_{S_q}$. Since $f(x)=0$ when $\|x\|\le\varepsilon$ we clearly have for all $t\le T$ and $n\ge N$:

$$\Bigl\|V(f,\Delta_n)_t-\sum_{q:\,S_q\le\Delta_n[t/\Delta_n]}f(\Delta X_{S_q})\Bigr\|\le\sum_{q=1}^{Q}\bigl\|f(\Delta X_{S_q}+\alpha^n_q)-f(\Delta X_{S_q})\bigr\|.$$

Then the continuity of $f$ yields (5.2), because $\alpha^n_q\to0$ for all $q$.

2) We now turn to the general case in (a). For any $\eta>0$ there is $\varepsilon>0$ such that we can write $f=f'+f''$, where $f''$ is continuous and vanishes for $\|x\|\le\varepsilon$, and where $\|f'(x)\|\le\eta\|x\|^2$. By virtue of (5.1) and the first part of the proof, we have

$$\|V(f',\Delta_n)\|\le\eta\sum_{i\le[\cdot/\Delta_n]}\|\Delta^n_iX\|^2\ \overset{\text{Sk}}{\longrightarrow}\ \eta\sum_{j=1}^d[X^j,X^j],\qquad V(f'',\Delta_n)\ \overset{\text{Sk}}{\longrightarrow}\ f''\star\mu.$$

Moreover, $f''\star\mu\overset{\text{u.c.p.}}{\longrightarrow}f\star\mu$ as $\varepsilon\to0$ follows easily from the Lebesgue convergence theorem and the property $f(x)=o(\|x\|^2)$ as $x\to0$, because $\sum_{s\le t}\|\Delta X_s\|^2<\infty$. Since $\eta>0$ and $\varepsilon>0$ are arbitrarily small, we deduce (5.2) from $V(f,\Delta_n)=V(f',\Delta_n)+V(f'',\Delta_n)$.

3) Now we prove (b). Let $f''=f-g$, which vanishes on a neighborhood of 0. If we combine (5.1) and (5.2), plus a classical property of the Skorokhod convergence, we obtain that the pair $(V(g,\Delta_n),V(f'',\Delta_n))$ converges (for the $2d'$-dimensional Skorokhod topology, in probability) to the pair $\bigl(\sum_{j,k=1}^d\alpha_{jk}C^{jk}+g\star\mu,\;f''\star\mu\bigr)$, and by adding the two components we obtain (5.3).

Finally the last claim comes from a classical property of the Skorokhod convergence, plus the fact that an Ito semimartingale has no fixed time of discontinuity. $\Box$

In particular, in the one-dimensional case we obtain (recall (3.1)):

$$r>2\ \Longrightarrow\ B(r,\Delta_n)_t\ \overset{\text{Sk}}{\longrightarrow}\ B(r)_t:=\sum_{s\le t}|\Delta X_s|^r. \tag{5.4}$$

This result is due to Lépingle [18], who even proved the almost sure convergence. It completely fails when $r\le2$, except under some special circumstances.
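A quick simulation illustrates (5.4) (parameters are illustrative, not from the text): for $r=3>2$ the Brownian contribution to the realized power variation is of order $\Delta_n^{r/2-1}$ and vanishes, so only the jumps survive in the limit:

```python
import math
import random

random.seed(6)

n, T, sigma, r = 20000, 1.0, 0.4, 3.0
dt = T / n
dx = [sigma * random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]  # continuous part
for idx, size in ((4000, 0.6), (15000, -0.8)):                     # two jumps (illustrative)
    dx[idx] += size

bp = sum(abs(d) ** r for d in dx)        # realized r-th power variation, r = 3
print(round(bp, 3))                      # close to |0.6|^3 + |0.8|^3 = 0.728
```

The 20000 Brownian increments together contribute only about $n\,\sigma^3\Delta_n^{3/2}m_3\approx10^{-3}$ here, which is why the sum is essentially $\sum_s|\Delta X_s|^3$.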

6 Some other LLNs

    6.1 Hypotheses.

So far we have generalized (3.18) to any semimartingale, under appropriate conditions on $f$. If we want to generalize (3.5) or (3.14) we need $X$ to be an Ito semimartingale, plus the fact that the processes $(b_t)$ and $(\sigma_t)$ and the function $\delta$ in (1.7) are locally bounded, and that $(\sigma_t)$ is either right-continuous or left-continuous.


When it comes to the CLTs we need even more. So, for a clearer exposition, we gather all hypotheses needed in the sequel, either for LLNs or CLTs, in a single assumption.

Assumption (H): The process $X$ has the form (1.7), and the volatility process $\sigma_t$ is also an Ito semimartingale, of the form

$$\sigma_t=\sigma_0+\int_0^t\tilde b_s\,ds+\int_0^t\tilde\sigma_s\,dW_s+\kappa(\tilde\delta)\star(\mu-\nu)_t+\kappa'(\tilde\delta)\star\mu_t. \tag{6.1}$$

In this formula, $\sigma_t$ (a $d\times d'$ matrix) is considered as an $\mathbb{R}^{dd'}$-valued process; $\tilde b_t(\omega)$ and $\tilde\sigma_t(\omega)$ are optional processes, respectively $dd'$- and $dd'^2$-dimensional, and $\tilde\delta(\omega,t,x)$ is a $dd'$-dimensional predictable function on $\Omega\times\mathbb{R}_+\times E$; finally $\kappa$ is a truncation function on $\mathbb{R}^{dd'}$ and $\kappa'(x)=x-\kappa(x)$.

Moreover, we have:

(a) The processes $\tilde b_t(\omega)$ and $\sup_{x\in E}\|\delta(\omega,t,x)\|/\gamma(x)$ and $\sup_{x\in E}\|\tilde\delta(\omega,t,x)\|/\tilde\gamma(x)$ are locally bounded, where $\gamma$ and $\tilde\gamma$ are (non-random) nonnegative functions satisfying $\int_E(\gamma(x)^2\wedge1)\,\lambda(dx)<\infty$ and $\int_E(\tilde\gamma(x)^2\wedge1)\,\lambda(dx)<\infty$.


where $Z$ is a multi-dimensional Lévy process and $f$ is a $C^2$ function with at most linear growth; then if $X$ consists of a subset of the components of $Y$, it satisfies Assumption (H). The same holds for more general equations driven by a Wiener process and a Poisson random measure.

6.2 The results.

Now we turn to the results. The first, and most essential, result is the following; recall that we use the notation $\rho_\sigma$ for the law $N(0,\sigma\sigma^*)$, and $\rho_\sigma^{\otimes k}$ denotes the $k$-fold tensor product. We also write $\rho_\sigma^{\otimes k}(f)=\int f(x)\,\rho_\sigma^{\otimes k}(dx)$ if $f$ is a (Borel) function on $(\mathbb{R}^d)^k$. With such a function $f$ we also associate the following processes:

$$V(f,k,\Delta_n)_t=\sum_{i=1}^{[t/\Delta_n]}f\bigl(\Delta^n_iX/\sqrt{\Delta_n},\dots,\Delta^n_{i+k-1}X/\sqrt{\Delta_n}\bigr). \tag{6.3}$$

Of course when $f$ is a function on $\mathbb{R}^d$, then $V(f,1,\Delta_n)$ coincides with the functional defined by (3.2).

Theorem 6.2 Assume (H) (or (H′) only, see Remark 6.1), and let $f$ be a continuous function on $(\mathbb{R}^d)^k$ for some $k\ge1$, which satisfies

$$|f(x_1,\dots,x_k)|\le K_0\prod_{j=1}^k\bigl(1+\|x_j\|^p\bigr) \tag{6.4}$$

for some $p\ge0$ and $K_0$. If either $X$ is continuous, or if $p<2$, then

$$\Delta_n\,V(f,k,\Delta_n)_t\ \overset{\text{u.c.p.}}{\longrightarrow}\ \int_0^t\rho_{\sigma_u}^{\otimes k}(f)\,du;$$

if moreover $f$ is positively homogeneous of degree $p$, this can equivalently be written with non-normalized increments as

$$\Delta_n^{1-p/2}\sum_{i=1}^{[t/\Delta_n]}f\bigl(\Delta^n_iX,\dots,\Delta^n_{i+k-1}X\bigr)\ \overset{\text{u.c.p.}}{\longrightarrow}\ \int_0^t\rho_{\sigma_u}^{\otimes k}(f)\,du. \tag{6.5}$$

Next we introduce truncated versions of these functionals. We choose $\varpi\in(0,\frac12)$ and $\alpha>0$, and for all indices $j,k\le d$ we set

$$V^{jk}(\varpi,\alpha,\Delta_n)_t=\sum_{i=1}^{[t/\Delta_n]}\bigl(\Delta^n_iX^j\,\Delta^n_iX^k\bigr)\,1_{\{\|\Delta^n_iX\|\le\alpha\Delta_n^{\varpi}\}}. \tag{6.6}$$


More generally one can consider the truncated analogue of $V(f,k,\Delta_n)$ of (6.3). With $\varpi$ and $\alpha$ as above, and if $f$ is a function on $(\mathbb{R}^d)^k$, we set

$$V(\varpi,\alpha;f,k,\Delta_n)_t=\sum_{i=1}^{[t/\Delta_n]}f\bigl(\Delta^n_iX/\sqrt{\Delta_n},\dots,\Delta^n_{i+k-1}X/\sqrt{\Delta_n}\bigr)\prod_{j=1,\dots,k}1_{\{\|\Delta^n_{i+j-1}X\|\le\alpha\Delta_n^{\varpi}\}}. \tag{6.7}$$

Theorem 6.3 Assume (H) (or (H′) only), and let $f$ be a continuous function on $(\mathbb{R}^d)^k$ for some $k\ge1$, which satisfies (6.4) for some $p\ge0$ and some $K_0>0$. Let also $\varpi\in(0,\frac12)$ and $\alpha>0$. If either $X$ is continuous, or $X$ is discontinuous and $p\le2$, we have

$$\Delta_n\,V(\varpi,\alpha;f,k,\Delta_n)_t\ \overset{\text{u.c.p.}}{\longrightarrow}\ \int_0^t\rho_{\sigma_u}^{\otimes k}(f)\,du.$$

In particular, $V^{jk}(\varpi,\alpha,\Delta_n)_t\overset{\text{u.c.p.}}{\longrightarrow}C^{jk}_t$.
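Here is a sketch of the truncation device of Theorem 6.3 on simulated data (all parameters are illustrative choices, not from the text): the plain realized variance converges to $C_T$ plus the sum of squared jumps, while discarding increments larger than $\alpha\Delta_n^\varpi$ recovers $C_T$ alone:

```python
import math
import random

random.seed(3)

n, T, sigma = 20000, 1.0, 0.4
dt = T / n
dx = [sigma * random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
for s, size in ((0.3, 0.8), (0.7, -0.5)):          # two jumps (illustrative)
    dx[min(int(s / dt), n - 1)] += size

rv = sum(d * d for d in dx)                        # plain realized variance
alpha, w = 4.0, 0.48                               # cutoff alpha * dt^w, w in (0, 1/2)
cut = alpha * dt ** w
trv = sum(d * d for d in dx if abs(d) <= cut)      # truncated version, as in (6.6)
print(round(rv, 2), round(trv, 2))                 # near 0.16 + 0.89, and near 0.16
```

The cutoff here is about twelve standard deviations of a Brownian increment, so with high probability no continuous increment is discarded, while both jump increments are; this is the reason for requiring $\varpi<1/2$.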

This result has no real interest when $X$ is continuous. When $X$ jumps, and at the expense of a more complicated proof, one could show that the result still holds for $p\le4$, and even for $p>4$ provided $\varpi$ is close enough to $1/2$ (with a restriction linking $\varpi$, $p$ and $r$), when additionally we have $\int_E(\gamma(x)^r\wedge1)\,\lambda(dx)<\infty$ for some $r<2$.


Then obviously this is equal to $\Delta_n\,V(f,l,\Delta_n)$, where

$$f(x_1,\dots,x_l)=\frac{1}{4\,m_{r_1}\cdots m_{r_l}}\Bigl(\prod_{v=1}^l|x_v^j+x_v^k|^{r_v}-\prod_{v=1}^l|x_v^j-x_v^k|^{r_v}\Bigr),$$

and $\rho_\sigma^{\otimes l}(f)=(\sigma\sigma^*)^{jk}$ by a simple calculation. Then we deduce from Theorem 6.2 the following result:

Theorem 6.4 Assume (H) (or (H′) only), and let $r_1,\dots,r_l\in(0,2)$ be such that $r_1+\cdots+r_l=2$. Then $V^{jk}(r_1,\dots,r_l,\Delta_n)_t\overset{\text{u.c.p.}}{\longrightarrow}C^{jk}_t$.

Now, the previous LLNs are not enough for the statistical applications we have in mind. Indeed, we need consistent estimators for a few other processes than $C_t$, and in particular for the following one, which appears as a conditional variance in some of the forthcoming CLTs:

$$D^{jk}(f)_t=\sum_{s\le t}f(\Delta X_s)\,\bigl(c^{jk}_{s-}+c^{jk}_s\bigr) \tag{6.11}$$

for indices $j,k\le d$ and a function $f$ on $\mathbb{R}^d$ with $|f(x)|\le K\|x\|^2$ for $\|x\|\le1$, so that the summands above are non-vanishing only when $\Delta X_s\ne0$ and the process $D^{jk}(f)$ is finite-valued.

To do this we take any sequence $k_n$ of integers satisfying

$$k_n\to\infty,\qquad k_n\Delta_n\to0, \tag{6.12}$$

and we let $I_{n,t}(i)=\{j\in\mathbb{N}:j\ne i,\;1\le j\le[t/\Delta_n],\;|i-j|\le k_n\}$ define a local window in time, of length $k_n\Delta_n$, around time $i\Delta_n$. We also choose $\varpi\in(0,1/2)$ and $\alpha>0$ as in (6.6). We will consider two distinct cases for $f$, and associate with it the functions $f_n$:

$$\begin{cases}f(x)=o(\|x\|^2)\text{ as }x\to0:&f_n(x)=f(x),\\[2pt] f(x)=\sum_{v,w=1}^d\alpha_{vw}x^vx^w\text{ on a neighborhood of }0:&f_n(x)=f(x)\,1_{\{\|x\|>\alpha\Delta_n^{\varpi}\}}.\end{cases} \tag{6.13}$$

Finally, we set

$$D^{jk}(f,\varpi,\alpha,\Delta_n)_t=\frac{1}{k_n\Delta_n}\sum_{i=1+k_n}^{[t/\Delta_n]-k_n}f_n(\Delta^n_iX)\sum_{l\in I_{n,t}(i)}\bigl(\Delta^n_lX^j\,\Delta^n_lX^k\bigr)\,1_{\{\|\Delta^n_lX\|\le\alpha\Delta_n^{\varpi}\}}. \tag{6.14}$$

Theorem 6.5 Assume (H) (or (H′) only), and let $f$ be a continuous function on $\mathbb{R}^d$ satisfying (6.13), and let $j,k\le d$, $\varpi\in(0,1/2)$ and $\alpha>0$. Then

$$D^{jk}(f,\varpi,\alpha,\Delta_n)\ \overset{\text{Sk}}{\longrightarrow}\ D^{jk}(f). \tag{6.15}$$

If further $X$ is continuous and $f(\lambda x)=\lambda^pf(x)$ for all $\lambda>0$ and $x\in\mathbb{R}^d$, for some $p>2$ (hence we are in the first case of (6.13)), then

$$\Delta_n^{1-p/2}\,D^{jk}(f,\varpi,\alpha,\Delta_n)_t\ \overset{\text{u.c.p.}}{\longrightarrow}\ 2\int_0^t\rho_{\sigma_u}(f)\,c^{jk}_u\,du. \tag{6.16}$$

Before proceeding to the proof of all those results, we give some preliminaries.
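The local-window estimator (6.14) can be sanity-checked by simulation (a sketch with illustrative parameters, not from the text). Below, the spot volatility switches from 0.3 to 0.6 at $t=1/2$, where the path also jumps by 1; with $f(x)=x^2$ (second case of (6.13)), the target (6.11) is $D^{11}(f)_1=1\cdot(0.09+0.36)=0.45$:

```python
import math
import random

random.seed(5)

n, T, kn = 20000, 1.0, 100                         # kn -> infinity, kn * dt -> 0
dt = T / n
dx = []
for i in range(n):
    sig = 0.3 if i * dt < 0.5 else 0.6             # spot volatility changes at t = 1/2
    dx.append(sig * random.gauss(0.0, math.sqrt(dt)))
dx[n // 2] += 1.0                                  # a price jump at t = 1/2
cut = 4.0 * dt ** 0.48                             # truncation level, as in (6.6)

# Estimator in the spirit of (6.14) with f(x) = x^2: f_n keeps only "jump"
# increments; the surrounding window of truncated squared increments,
# divided by kn * dt, estimates c_{s-} + c_s = 0.09 + 0.36.
est = 0.0
for i in range(kn, n - kn):
    if abs(dx[i]) > cut:
        window = sum(dx[l] ** 2 for l in range(i - kn, i + kn + 1)
                     if l != i and abs(dx[l]) <= cut)
        est += dx[i] ** 2 * window / (kn * dt)
print(round(est, 2))                               # near 1 * (0.09 + 0.36) = 0.45
```

Note how the two halves of the window automatically pick up the pre-jump and post-jump spot variances, which is exactly why the limit in (6.11) involves $c_{s-}+c_s$.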


6.3 A localization procedure.

Localization is a simple but very important tool for proving limit theorems for discretized processes over a finite time interval. We describe it in detail in the setting of the previous theorems, but it will also be used later for the CLTs.

The idea is that, for those theorems, we can replace the local boundedness assumptions in (H), for example, by boundedness (by a constant), which is a much stronger assumption. More precisely, we set:

Assumption (SH): We have (H) and also, for some constant $\Lambda$ and all $(\omega,t,x)$:

$$\|b_t(\omega)\|\le\Lambda,\quad\|\sigma_t(\omega)\|\le\Lambda,\quad\|X_t(\omega)\|\le\Lambda,\quad\|\tilde b_t(\omega)\|\le\Lambda,\quad\|\tilde\sigma_t(\omega)\|\le\Lambda,$$
$$\|\delta(\omega,t,x)\|\le\Lambda\,(\gamma(x)\wedge1),\qquad\|\tilde\delta(\omega,t,x)\|\le\Lambda\,(\tilde\gamma(x)\wedge1). \tag{6.17}$$

If these are satisfied, we can of course choose $\gamma$ and $\tilde\gamma$ smaller than 1.

Lemma 6.6 If $X$ satisfies (H) we can find a sequence of stopping times $R_p$ increasing to $+\infty$ and a sequence of processes $X(p)$ satisfying (SH), with volatility process $\sigma(p)$, such that

$$t<R_p\ \Longrightarrow\ X(p)_t=X_t,\quad\sigma(p)_t=\sigma_t. \tag{6.18}$$

Proof. Let $X$ satisfy (H). The processes $b_t$, $\tilde b_t$, $\tilde\sigma_t$, $\sup_{x\in E}\|\delta(t,x)\|/\gamma(x)$ and $\sup_{x\in E}\|\tilde\delta(t,x)\|/\tilde\gamma(x)$ are locally bounded, so we can assume the existence of a localizing sequence of stopping times $T_p$ (i.e. this sequence is increasing, with infinite limit) such that for $p\ge1$:

$$t\le T_p(\omega)\ \Longrightarrow\ \|b_t(\omega)\|\le p,\ \ \|\tilde b_t(\omega)\|\le p,\ \ \|\tilde\sigma_t(\omega)\|\le p,\ \ \|\delta(\omega,t,x)\|\le p\,\gamma(x),\ \ \|\tilde\delta(\omega,t,x)\|\le p\,\tilde\gamma(x). \tag{6.19}$$

We also set $S_p=\inf(t:\|X_t\|\ge p\text{ or }\|\sigma_t\|\ge p)$, so $R_p=T_p\wedge S_p$ is again a localizing sequence, and we have (6.19) for $t\le R_p$, and also $\|X_t\|\le p$ and $\|\sigma_t\|\le p$ for $t<R_p$. Then we set

$$b(p)_t=\begin{cases}b_t&\text{if }t\le R_p\\0&\text{otherwise,}\end{cases}\qquad \tilde b(p)_t=\begin{cases}\tilde b_t&\text{if }t\le R_p\\0&\text{otherwise,}\end{cases}\qquad \tilde\sigma(p)_t=\begin{cases}\tilde\sigma_t&\text{if }t\le R_p\\0&\text{otherwise,}\end{cases}$$
$$\delta(p)(\omega,t,x)=\begin{cases}\delta(\omega,t,x)&\text{if }\|\delta(\omega,t,x)\|\le2p\text{ and }t\le R_p\\0&\text{otherwise,}\end{cases}\qquad \tilde\delta(p)(\omega,t,x)=\begin{cases}\tilde\delta(\omega,t,x)&\text{if }\|\tilde\delta(\omega,t,x)\|\le2p\text{ and }t\le R_p\\0&\text{otherwise.}\end{cases}$$

At this stage we define the process $\sigma(p)$ by (6.1), with the starting point $\sigma(p)_0=\sigma_0$ if $\|\sigma_0\|<p$ and $\sigma(p)_0=0$ otherwise, and the coefficients $\tilde b(p)$, $\tilde\sigma(p)$ and $\tilde\delta(p)$; and then the process $X(p)$ by (1.7), with the starting point $X(p)_0=X_0$ if $\|X_0\|<p$ and $X(p)_0=0$ otherwise, and the coefficients $b(p)$ and $\delta(p)$ (as defined just above) and $\sigma(p)$.

We can write $\mu$ as $\mu=\sum_{t>0}1_D(t)\,\varepsilon_{(t,\beta_t)}$, where $D$ is the countable (random) support of $\mu$ and $\beta_t$ is $E$-valued. Outside a $\mathbb{P}$-null set $\mathcal{N}$ we have $\Delta X_t=1_D(t)\,\delta(t,\beta_t)$ and $\Delta X(p)_t=1_D(t)\,\delta(p)(t,\beta_t)$, and since $\|\Delta X_t\|\le2p$ when $t<R_p$ we deduce $\Delta X_t=\Delta X(p)_t$ if $t<R_p$, which implies that $\kappa'(\delta)\star\mu_t=\kappa'(\delta(p))\star\mu_t$ for $t<R_p$. As for the two local martingales $\kappa(\delta)\star(\mu-\nu)$ and $\kappa(\delta(p))\star(\mu-\nu)$, they have (a.s.) the same jumps on the predictable interval $[0,R_p]$ as soon as $\kappa(x)=0$ when $\|x\|>2p$ (this readily follows from the definition of $\delta(p)$), so they coincide a.s. on $[0,R_p]$.

The same argument shows that $\kappa'(\tilde\delta)\star\mu_t=\kappa'(\tilde\delta(p))\star\mu_t$ for $t<R_p$, and $\kappa(\tilde\delta)\star(\mu-\nu)_t=\kappa(\tilde\delta(p))\star(\mu-\nu)_t$ for $t\le R_p$. It first follows in an obvious way that $\sigma(p)_t=\sigma_t$ for all $t<R_p$, and then $X(p)_t=X_t$ for all $t<R_p$; that is, (6.18) holds.

Finally, by definition the coefficients $b(p)$, $\tilde b(p)$, $\tilde\sigma(p)$, $\delta(p)$ and $\tilde\delta(p)$ satisfy (6.17) with $\Lambda=2p$. Moreover the processes $\sigma(p)$ and $X(p)$ are constant after time $R_p$, and they have jumps bounded by $2p$, so they satisfy (6.17) with $\Lambda=3p$, and thus (SH) holds for $X(p)$. $\Box$

Now, suppose that, for example, Theorem 6.2 has been proved when $X$ satisfies (SH). Let $X$ satisfy (H) only, and let $(X(p),R_p)$ be as above. We then know that, for all $p$, $T$ and all appropriate functions $f$,

$$\sup_{t\le T}\Bigl|\Delta_n\,V^n(X(p);f,k,\Delta_n)_t-\int_0^t\rho^{\otimes k}_{\sigma(p)_u}(f)\,du\Bigr|\ \overset{\mathbb{P}}{\longrightarrow}\ 0. \tag{6.20}$$

On the set $\{R_p>T+1\}$, and if $k\Delta_n\le1$, we have $V^n(X(p);f,k,\Delta_n)_t=V^n(X;f,k,\Delta_n)_t$ and $\sigma(p)_t=\sigma_t$ for all $t\le T$, by (6.18). Since $\mathbb{P}(R_p>T+1)\to1$ as $p\to\infty$, it readily follows that $\Delta_n\,V^n(X;f,k,\Delta_n)_t\overset{\text{u.c.p.}}{\longrightarrow}\int_0^t\rho^{\otimes k}_{\sigma_u}(f)\,du$. This proves Theorem 6.2 under (H).

This procedure works in exactly the same way for all the theorems below, LLNs or CLTs, and we will call it the localization procedure, without further comment.

Remark 6.7 If we assume (SH), and if we choose the truncation functions in (1.7) and (6.1) in such a way that they coincide with the identity on the balls centered at 0 with radius $2\Lambda$, in $\mathbb{R}^d$ and $\mathbb{R}^{dd'}$ respectively, then clearly (1.7) and (6.1) can be rewritten as follows:

$$X_t=X_0+\int_0^tb_s\,ds+\int_0^t\sigma_s\,dW_s+\delta\star(\mu-\nu)_t,$$
$$\sigma_t=\sigma_0+\int_0^t\tilde b_s\,ds+\int_0^t\tilde\sigma_s\,dW_s+\tilde\delta\star(\mu-\nu)_t. \tag{6.21}$$

6.4 Some estimates.

Below, we assume (SH), and we use the form (6.21) for $X$ and $\sigma$. We will give a number of estimates, to be used for the LLNs and also for the CLTs, and we start with some notation (the letters chosen for the auxiliary variables below are used consistently in the rest of these notes). We set

$$\theta^n_{i,l}=\frac{1}{\sqrt{\Delta_n}}\int_{(i+l-1)\Delta_n}^{(i+l)\Delta_n}\bigl(b_s\,ds+(\sigma_s-\sigma_{(i-1)\Delta_n})\,dW_s\bigr),\qquad \beta^n_{i,l}=\sigma_{(i-1)\Delta_n}\,\Delta^n_{i+l}W/\sqrt{\Delta_n},$$
$$\chi^n_{i,l}=\frac{1}{\sqrt{\Delta_n}}\,\Delta^n_{i+l}\bigl(\delta\star(\mu-\nu)\bigr),\qquad \gamma^n_{i,l}=\theta^n_{i,l}+\chi^n_{i,l},\qquad \beta^n_i=\beta^n_{i,0},\quad\chi^n_i=\chi^n_{i,0},\quad\gamma^n_i=\gamma^n_{i,0}. \tag{6.22}$$


In particular, $\Delta^n_{i+l}X=\sqrt{\Delta_n}\,(\beta^n_{i,l}+\gamma^n_{i,l})$. It is well known that the boundedness of the coefficients in (SH) yields, through a repeated use of the Doob and Burkholder-Davis-Gundy inequalities, for all $q>0$ (below, $K$ denotes a constant which varies from line to line and may depend on the constants occurring in (SH); we write it $K_p$ if we want to emphasize its dependence on another parameter $p$):

$$\mathbb{E}^n_{i-1}(\|\Delta^n_iX^c\|^q)\le K_q\,\Delta_n^{q/2},\qquad \mathbb{E}(\|\sigma_{t+s}-\sigma_t\|^q\,|\,\mathcal{F}_t)\le K_q\,s^{1\wedge(q/2)},$$
$$\mathbb{E}^n_{i+l-1}(\|\beta^n_{i,l}\|^q)\le K_q,\qquad \mathbb{E}^n_{i+l-1}(\|\theta^n_{i,l}\|^q)\le K_{q,l}\,\Delta_n^{1\wedge(q/2)},$$
$$\mathbb{E}^n_{i+l-1}\bigl(\|\chi^n_{i,l}\|^q+\|\gamma^n_{i,l}\|^q\bigr)\le\begin{cases}K_{q,l}\,\Delta_n^{(1-q/2)\wedge0}&\text{in general,}\\ K_{q,l}\,\Delta_n^{1\wedge(q/2)}&\text{if }X\text{ is continuous.}\end{cases} \tag{6.23}$$

We also use the following notation, for $\varepsilon>0$:

$$\psi_\varepsilon(x)=\psi(x/\varepsilon),\quad\text{where }\psi\text{ is a }C^\infty\text{ function on }\mathbb{R}^d\text{ with }1_{\{\|x\|\le1\}}\le\psi(x)\le1_{\{\|x\|\le2\}}. \tag{6.24}$$

Lemma 6.8 Assume (SH), and let $r\in[0,2]$ be such that $\int_E(\gamma(x)^r\wedge1)\,\lambda(dx)<\infty$. Then the estimates (6.25), (6.26) and (6.27) derived below hold.

Proof. Set $N(\varepsilon)=1_{\{\gamma>\varepsilon\}}\star\mu$, $M(\varepsilon)=(\delta\,1_{\{\gamma\le\varepsilon\}})\star(\mu-\nu)$ and $B(\varepsilon)=(\delta\,1_{\{\gamma>\varepsilon\}})\star\nu$. Then if $\alpha_\varepsilon=\int_{\{\gamma(x)\le\varepsilon\}}\gamma(x)^r\,\lambda(dx)$, we have by (SH):

$$\mathbb{P}^n_{i+l-1}(\Delta^n_{i+l}N(\varepsilon)\ne0)\le\mathbb{E}^n_{i+l-1}\bigl(\Delta^n_{i+l}(1_{\{\gamma>\varepsilon\}}\star\mu)\bigr)=\Delta_n\,\lambda(\{\gamma>\varepsilon\})\le K\,\Delta_n\,\varepsilon^{-r},$$
$$\mathbb{E}^n_{i+l-1}(\|\Delta^n_{i+l}M(\varepsilon)\|^2)\le\Delta_n\int_{\{\gamma(x)\le\varepsilon\}}\gamma(x)^2\,\lambda(dx)\le\Delta_n\,\alpha_\varepsilon\,\varepsilon^{2-r},$$
$$\|\Delta^n_{i+l}B(\varepsilon)\|\le K\,\Delta_n\Bigl(1+\int_{\{\gamma(x)>\varepsilon\}}(\gamma(x)\wedge1)\,\lambda(dx)\Bigr)\le K\,\Delta_n\,\varepsilon^{-(r-1)^+}.$$

We also trivially have, for any sequence $\eta_n>0$:

$$\|\gamma^n_{i,l}\|^2\wedge\eta_n^2\le\eta_n^2\,1_{\{\Delta^n_{i+l}N(\varepsilon)\ne0\}}+3\|\theta^n_{i,l}\|^2+\frac{3}{\Delta_n}\,\|\Delta^n_{i+l}M(\varepsilon)\|^2+\frac{3}{\Delta_n}\,\|\Delta^n_{i+l}B(\varepsilon)\|^2.$$


Therefore, using (6.23), we get

$$\mathbb{E}^n_{i+l-1}\bigl(\|\gamma^n_{i,l}\|^2\wedge\eta_n^2\bigr)\le K\bigl(\eta_n^2\,\Delta_n\varepsilon^{-r}+\Delta_n+\alpha_\varepsilon\,\varepsilon^{2-r}+\Delta_n\,\varepsilon^{-2(r-1)^+}\bigr).$$

Then, since $\alpha_\varepsilon\to0$ as $\varepsilon\to0$, (6.25) follows by taking $\varepsilon=\varepsilon_n$ going to 0 slowly enough (depending on the sequence $\eta_n$).

Next, suppose $r\le1$. Then $\delta\star\mu=(\delta\,1_{\{\gamma>\varepsilon\}})\star\mu+A(\varepsilon)$, where $A(\varepsilon)=(\delta\,1_{\{\gamma\le\varepsilon\}})\star\mu$, and obviously $\mathbb{E}^n_{i+l-1}(\|\Delta^n_{i+l}A(\varepsilon)\|)\le K\,\Delta_n\,\alpha_\varepsilon\,\varepsilon^{1-r}$. Moreover

$$\frac{1}{\sqrt{\Delta_n}}\,\|\Delta^n_{i+l}(\delta\star\mu)\|\wedge\eta_n\le\eta_n\,1_{\{\Delta^n_{i+l}N(\varepsilon)\ne0\}}+\frac{1}{\sqrt{\Delta_n}}\,\|\Delta^n_{i+l}A(\varepsilon)\|.$$

Therefore

$$\mathbb{E}^n_{i+l-1}\Bigl(\frac{1}{\sqrt{\Delta_n}}\,\|\Delta^n_{i+l}(\delta\star\mu)\|\wedge\eta_n\Bigr)\le K\bigl(\eta_n\,\Delta_n\,\varepsilon^{-r}+\sqrt{\Delta_n}\,\alpha_\varepsilon\,\varepsilon^{1-r}\bigr),$$

and the same choice as above for $\varepsilon=\varepsilon_n$ gives (6.26).

Finally, we have for any $\varepsilon>0$:

$$\|\sqrt{\Delta_n}\,\gamma^n_{i,l}\|^2\wedge\varepsilon^2\le\varepsilon^2\,1_{\{\Delta^n_{i+l}N(\varepsilon)\ne0\}}+3\,\Delta_n\|\theta^n_{i,l}\|^2+3\,\|\Delta^n_{i+l}M(\varepsilon)\|^2+3\,\|\Delta^n_{i+l}B(\varepsilon)\|^2,$$

hence

$$\mathbb{E}^n_{i+l-1}\bigl(\|\sqrt{\Delta_n}\,\gamma^n_{i,l}\|^2\wedge\varepsilon^2\bigr)\le K\,\Delta_n\,g_n(\varepsilon),\quad\text{where }g_n(\varepsilon)=\varepsilon^{2-r}+\Delta_n+\alpha_\varepsilon\,\varepsilon^{2-r}+\Delta_n\,\varepsilon^{-2(r-1)^+}.$$

Since $g_n(\varepsilon)\to g(\varepsilon):=\varepsilon^{2-r}(1+\alpha_\varepsilon)$ as $n\to\infty$, and $g(\varepsilon)\to0$ as $\varepsilon\to0$, we readily get (6.27). $\Box$

Lemma 6.9 Assume (SH). Let $k\ge1$ and $l\ge0$ be integers, and let $q>0$. Let $f$ be a continuous function on $(\mathbb{R}^d)^k$ satisfying (6.4) for some $p\ge0$ and $K_0>0$.

a) If either $X$ is continuous or $qp\le2$, then

$$\Delta_n\sum_{i=1}^{[t/\Delta_n]}\mathbb{E}\Bigl(\bigl|f(\beta^n_{i,l}+\gamma^n_{i,l},\dots,\beta^n_{i,l+k-1}+\gamma^n_{i,l+k-1})-f(\beta^n_{i,l},\dots,\beta^n_{i,l+k-1})\bigr|^q\Bigr)\to0. \tag{6.28}$$

b) Under the same assumptions, and with $\varepsilon_n=\alpha\Delta_n^{\varpi-1/2}$ for some $\alpha>0$ and $\varpi\in(0,\frac12)$,

$$\Delta_n\sum_{i=1}^{[t/\Delta_n]}\mathbb{E}\Bigl(\Bigl|f(\beta^n_{i,l}+\gamma^n_{i,l},\dots)\prod_{j=1}^{k}1_{\{\|\beta^n_{i,l+j-1}+\gamma^n_{i,l+j-1}\|\le\varepsilon_n\}}-f(\beta^n_{i,l},\dots)\Bigr|^q\Bigr)\to0. \tag{6.29}$$

Proof. For $A>1$ and $\varepsilon>0$, the supremum $G_A(\varepsilon)$ of $|f(x_1+y_1,\dots,x_k+y_k)-f(x_1,\dots,x_k)|$ over all $\|x_j\|\le A$ and $\|y_j\|\le\varepsilon$ goes to 0 as $\varepsilon\to0$. We set $g(x,y)=1+\|x\|^{qp}+\|y\|^{qp}$. If we want to prove (6.29) the sequence $\varepsilon_n$ is of course as above, whereas if we want to prove (6.28) we put $\varepsilon_n=\infty$ for all $n$. Then for all $A>1$, $s\ge0$ and $\varepsilon>0$ we have, by a (tedious) calculation using (6.4), with a constant $K$ depending on $K_0,q,k$:

$$\Bigl|f(x_1+y_1,\dots,x_k+y_k)\prod_{j=1}^{k}1_{\{\|x_j+y_j\|\le\varepsilon_n\}}-f(x_1,\dots,x_k)\Bigr|^q\le G_A(\varepsilon)^q+K\sum_{m=1}^{k}\Bigl(h_{\varepsilon,s,A,n}(x_m,y_m)\prod_{j\ne m}g(x_j,y_j)\Bigr), \tag{6.30}$$

where

$$h_{\varepsilon,s,A,n}(x,y)=\frac{\|x\|^{pq+1}}{A}+\|x\|^{pq}\,(\|y\|\wedge1)+A^{pq}\Bigl(\frac{\|y\|^2}{\varepsilon^2}\wedge1\Bigr)+\frac{\|x\|^{pq+s}+\|y\|^{pq+s}}{\varepsilon_n^{\,s}}$$

(with the convention that the last term vanishes when $\varepsilon_n=\infty$). We apply these estimates with $x_j=\beta^n_{i,l+j-1}$ and $y_j=\gamma^n_{i,l+j-1}$. In view of (6.23) we have, if $X$ is continuous or if $pq\le2$:

$$\mathbb{E}^n_{i+j-2}\bigl(g(\beta^n_{i,l+j-1},\gamma^n_{i,l+j-1})\bigr)\le K. \tag{6.31}$$

Next consider $\zeta^n_{i,j,\varepsilon,A}=\mathbb{E}^n_{i+j-2}\bigl(h_{\varepsilon,s,A,n}(\beta^n_{i,l+j-1},\gamma^n_{i,l+j-1})\bigr)$ for an adequate choice of $s$, to be made below. When $X$ is continuous we take $s=1$, and (6.23) and the Cauchy-Schwarz inequality yield $\zeta^n_{i,j,\varepsilon,A}\le K(1/A+\sqrt{\Delta_n}+\Delta_nA^{pq}/\varepsilon^2)$. In the discontinuous case with $pq<2$ and $\varepsilon_n=\infty$ we take $s=2-pq>0$, and by (6.23) and Cauchy-Schwarz again, plus (6.25) with $r=2$, we get the existence of a sequence $a_n\to0$ such that $\zeta^n_{i,j,\varepsilon,A}\le K(1/A+1/A^s+A^{pq}a_n/\varepsilon^2)$. Finally, in the discontinuous case with $\varepsilon_n<\infty$, a similar argument based on (6.26) and (6.27) gives an analogous bound. In all cases we have established that, for suitable numbers $\phi_n(A,\varepsilon)$,

$$\sup_{\omega,i,j}\zeta^n_{i,j,\varepsilon,A}(\omega)\le\phi_n(A,\varepsilon),\quad\text{where }\lim_{A\to\infty}\limsup_n\phi_n(A,\varepsilon)=0. \tag{6.32}$$

At this stage we make use of (6.30), together with the two estimates (6.31) and (6.32), and take successive downward conditional expectations, to see that the left sides of (6.28) and (6.29) are smaller than $t\,(G_A(\varepsilon)^q+K\phi_n(A,\varepsilon))$. This holds for all $A>1$ and $\varepsilon>0$. Then, using $G_A(\varepsilon)\to0$ as $\varepsilon\to0$ and the last part of (6.32), we readily get the results. $\Box$

Lemma 6.10 Under (SH), for any function $(\omega,x)\mapsto g(\omega,x)$ on $\Omega\times\mathbb{R}^d$ which is $\mathcal{F}_{(i-1)\Delta_n}\otimes\mathcal{R}^d$-measurable, even in $x$, and with polynomial growth in $x$, we have

$$\mathbb{E}^n_{i-1}\bigl(\Delta^n_iN\,g(.,\beta^n_i)\bigr)=0 \tag{6.33}$$

for $N$ being any component of $W$, or being any bounded martingale orthogonal to $W$.

Proof. When $N=W^j$ we have $\Delta^n_iN\,g(.,\beta^n_i)(\omega)=h(\sigma_{(i-1)\Delta_n},\Delta^n_iW)(\omega)$ for a function $h(\omega,x,y)$ which is odd and with polynomial growth in $y$, so obviously (6.33) holds.

Next assume that $N$ is bounded and orthogonal to $W$. We consider the martingale $M_t=\mathbb{E}(g(.,\beta^n_i)\,|\,\mathcal{F}_t)$ for $t\ge(i-1)\Delta_n$. Since $W$ is an $(\mathcal{F}_t)$-Brownian motion, and since $\beta^n_i$ is a function of $\sigma_{(i-1)\Delta_n}$ and of $\Delta^n_iW$, we see that $(M_t)_{t\ge(i-1)\Delta_n}$ is also, conditionally on $\mathcal{F}_{(i-1)\Delta_n}$, a martingale w.r.t. the filtration which is generated by the process $W_t-W_{(i-1)\Delta_n}$. By the martingale representation theorem the process $M$ is thus of the form $M_t=M_{(i-1)\Delta_n}+\int_{(i-1)\Delta_n}^t\eta_s\,dW_s$ for an appropriate predictable process $\eta$. It follows that $M$ is orthogonal to the process $N'_t=N_t-N_{(i-1)\Delta_n}$ (for $t\ge(i-1)\Delta_n$); in other words the product $MN'$ is an $(\mathcal{F}_t)_{t\ge(i-1)\Delta_n}$-martingale. Hence

$$\mathbb{E}^n_{i-1}\bigl(\Delta^n_iN\,g(.,\sigma_{(i-1)\Delta_n}\Delta^n_iW/\sqrt{\Delta_n})\bigr)=\mathbb{E}^n_{i-1}(\Delta^n_iN\,M_{i\Delta_n})=\mathbb{E}^n_{i-1}(\Delta^n_iN\,\Delta^n_iM)=0,$$

and thus we get (6.33). $\Box$

6.5 Proof of Theorem 6.2.

When $f(\lambda x_1,\dots,\lambda x_k)=\lambda^pf(x_1,\dots,x_k)$, the functional based on the raw increments equals $\Delta_n^{p/2}\,V(f,k,\Delta_n)$, hence (6.5) readily follows from the first claim. For this first claim, and as seen above, it is enough to prove it under the stronger assumption (SH).

If we set

$$V'(f,k,\Delta_n)_t=\sum_{i=1}^{[t/\Delta_n]}f(\beta^n_{i,0},\dots,\beta^n_{i,k-1}),$$

we have $\Delta_n(V(f,k,\Delta_n)-V'(f,k,\Delta_n))\overset{\text{u.c.p.}}{\longrightarrow}0$ by Lemma 6.9(a) applied with $l=0$ and $q=1$. Therefore it is enough to prove that $\Delta_nV'(f,k,\Delta_n)_t\overset{\text{u.c.p.}}{\longrightarrow}\int_0^t\rho^{\otimes k}_{\sigma_v}(f)\,dv$. For this, with $I(n,t,l)$ denoting the set of all $i\in\{1,\dots,[t/\Delta_n]\}$ which are equal to $l$ modulo $k$, it is obviously enough to show that for $l=0,1,\dots,k-1$:

$$\sum_{i\in I(n,t,l)}\zeta^n_i\ \overset{\text{u.c.p.}}{\longrightarrow}\ \frac{1}{k}\int_0^t\rho^{\otimes k}_{\sigma_v}(f)\,dv,\quad\text{where }\zeta^n_i=\Delta_n\,f(\beta^n_{i,0},\dots,\beta^n_{i,k-1}). \tag{6.34}$$

Observe that $\zeta^n_i$ is $\mathcal{F}_{(i+k-1)\Delta_n}$-measurable, and obviously

$$\mathbb{E}^n_{i-1}(\zeta^n_i)=\Delta_n\,\rho^{\otimes k}_{\sigma_{(i-1)\Delta_n}}(f),\qquad \mathbb{E}^n_{i-1}(|\zeta^n_i|^2)\le K\,\Delta_n^2.$$

By Riemann integration we have $\sum_{i\in I(n,t,l)}\mathbb{E}^n_{i-1}(\zeta^n_i)\overset{\text{u.c.p.}}{\longrightarrow}\frac1k\int_0^t\rho^{\otimes k}_{\sigma_v}(f)\,dv$, because $t\mapsto\rho^{\otimes k}_{\sigma_t}(f)$ is right-continuous with left limits. Hence (6.34) follows from Lemma 4.1.

6.6 Proof of Theorem 6.3.

The proof is exactly the same as for Theorem 6.2, once noticed that, in view of Lemma 6.9(b) applied with $\varepsilon_n=\alpha\Delta_n^{\varpi-1/2}$, we have $\Delta_n\bigl(V(\varpi,\alpha;f,k,\Delta_n)_t-V(f,k,\Delta_n)_t\bigr)\overset{\text{u.c.p.}}{\longrightarrow}0$.

6.7 Proof of Theorem 6.5.

Once more we may assume (SH). Below, $j,k$ are fixed, as well as $\varpi$ and $\alpha$ and the function $f$ satisfying (6.13), and for simplicity we write $D=D^{jk}(f)$ and $D^n=D^{jk}(f,\varpi,\alpha,\Delta_n)$. Set also

$$D'^n_t=\frac{1}{k_n}\sum_{i=1+k_n}^{[t/\Delta_n]-k_n}f_n(\Delta^n_iX)\sum_{l\in I_{n,t}(i)}\beta^{n,j}_l\,\beta^{n,k}_l,\qquad D''^n_t=\frac{1}{k_n}\sum_{i=1+k_n}^{[t/\Delta_n]-k_n}f(\sqrt{\Delta_n}\,\beta^n_i)\sum_{l\in I_{n,t}(i)}\beta^{n,j}_l\,\beta^{n,k}_l. \tag{6.35}$$


Lemma 6.11 We have $\overline D^n \stackrel{\text{Sk}}{\longrightarrow} D$.

Proof. a) Let $\psi_\varepsilon$ be as in (6.24) and
$$Y(\varepsilon)^n_t = \frac{1}{k_n}\sum_{i=1+k_n}^{[t/\Delta_n]-k_n} (f_n\psi_\varepsilon)(\Delta^n_iX)\sum_{l\in I_{n,t}(i)} \beta^{n,j}_l\,\beta^{n,k}_l,\qquad
Z(\varepsilon)^n_t = \overline D^n_t - Y(\varepsilon)^n_t.$$
It is obviously enough to show the following three properties, for some suitable processes $Z(\varepsilon)$:
$$\lim_{\varepsilon\to0}\ \limsup_n\ \mathbb{E}\Big(\sup_{s\le t} |Y(\varepsilon)^n_s|\Big) = 0,\tag{6.36}$$
$$\varepsilon\in(0,1)\ \Rightarrow\ Z(\varepsilon)^n \stackrel{\text{Sk}}{\longrightarrow} Z(\varepsilon)\ \text{ as } n\to\infty,\tag{6.37}$$
$$Z(\varepsilon) \stackrel{\text{u.c.p.}}{\longrightarrow} D\ \text{ as } \varepsilon\to0.\tag{6.38}$$

b) Let us prove (6.36) in the first case of (6.13). We have $|(f\psi_\varepsilon)(x)| \le \theta(\varepsilon)\,\|x\|^2$ for some function $\theta$ such that $\theta(\varepsilon)\to0$ as $\varepsilon\to0$. Hence (6.23) yields $\mathbb{E}^n_{i-1}\big(|(f\psi_\varepsilon)(\Delta^n_iX)|\big) \le K\theta(\varepsilon)\Delta_n$. Now, $Y(\varepsilon)^n_t$ is the sum of less than $2k_n[t/\Delta_n]$ terms, all smaller in absolute value than $\frac{1}{k_n}\,|(f\psi_\varepsilon)(\Delta^n_iX)|\,\|\beta^n_j\|^2$ for some $i\ne j$. By taking two successive conditional expectations and by using again (6.23) the expectation of such a term is smaller than $K\theta(\varepsilon)\Delta_n/k_n$, hence the expectation in (6.36) is smaller than $Kt\,\theta(\varepsilon)$ and we obtain (6.36).

Next, consider the second case of (6.13); then $(f_n\psi_\varepsilon)(x) = g(x)1_{\{\|x\|\le\varepsilon\}}$ for $n$ large enough, and the same argument applies.

c) Now we prove (6.37). Let $(T_q)_{q\ge1}$ denote the successive jump times of $X$ with jump size bigger than $\varepsilon$. For each $\omega$ and $t>0$, we have the following properties for all $n$ large enough: there is no $T_q$ in $(0,k_n\Delta_n]$, nor in $(t-(k_n+1)\Delta_n,\,t]$; there is at most one $T_q$ in an interval $((i-1)\Delta_n, i\Delta_n]$ with $i\Delta_n\le t$, and if this is not the case we have $\psi_\varepsilon(\Delta^n_iX) = 1$. Hence for $n$ large enough we have
$$Z(\varepsilon)^n_t = \sum_{q:\ k_n\Delta_n < T_q \le t-(k_n+1)\Delta_n} \zeta^n_q,$$
where
$$\zeta^n_q = \frac{1}{k_n}\,\big(f(1-\psi_\varepsilon)\big)\big(\Delta^n_{i(n,q)}X\big)\sum_{l\in I(n,q)} \beta^{n,j}_l\,\beta^{n,k}_l,$$
and $i(n,q) = \inf(i:\ i\Delta_n \ge T_q)$ and $I(n,q) = \{l:\ l\ne i(n,q),\ |l-i(n,q)|\le k_n\}$.

To get (6.37) it is enough that $\zeta^n_q \stackrel{\mathbb{P}}{\longrightarrow} \big(f(1-\psi_\varepsilon)\big)(\Delta X_{T_q})\,\big(c^{jk}_{T_q-} + c^{jk}_{T_q}\big)$ for any $q$. Since $\big(f(1-\psi_\varepsilon)\big)(\Delta^n_{i(n,q)}X) \to \big(f(1-\psi_\varepsilon)\big)(\Delta X_{T_q})$ pointwise, it remains to prove that
$$\frac{1}{k_n}\sum_{l\in I_-(n,q)} \beta^{n,j}_l\,\beta^{n,k}_l \stackrel{\mathbb{P}}{\longrightarrow} c^{jk}_{T_q-},\qquad
\frac{1}{k_n}\sum_{l\in I_+(n,q)} \beta^{n,j}_l\,\beta^{n,k}_l \stackrel{\mathbb{P}}{\longrightarrow} c^{jk}_{T_q},\tag{6.39}$$
where $I_-(n,q)$ and $I_+(n,q)$ are the subsets of $I(n,q)$ consisting in those $l$ smaller, respectively bigger, than $i(n,q)$. Letting $l(n,q)$ be the smallest $l$ in $I(n,q)$, we see that the left side of the first expression in (6.39) is $U^n_q + \overline U^n_q$, where
$$U^n_q = \sum_{r,s=1}^d \sigma^{jr}_{l(n,q)\Delta_n}\,\sigma^{ks}_{l(n,q)\Delta_n}\,U^n_q(r,s),\qquad
U^n_q(r,s) = \frac{1}{k_n\Delta_n}\sum_{l\in I_-(n,q)} \Delta^n_lW^r\,\Delta^n_lW^s,$$
$$\overline U^n_q = \sum_{r,s=1}^d \frac{1}{k_n\Delta_n}\sum_{l\in I_-(n,q)} \big(\sigma^{jr}_{(l-1)\Delta_n} - \sigma^{jr}_{l(n,q)\Delta_n}\big)\big(\sigma^{ks}_{(l-1)\Delta_n} - \sigma^{ks}_{l(n,q)\Delta_n}\big)\,\Delta^n_lW^r\,\Delta^n_lW^s.$$

On the one hand, the variables $\Delta^n_lW$ are i.i.d. $N(0,\Delta_n I_d)$, so $U^n_q(r,s)$ is distributed as $1/k_n$ times the sum of $k_n$ i.i.d. variables with the same law as $W^r_1W^s_1$, hence obviously $U^n_q(r,s)$ converges in probability to 1 if $r=s$ and to 0 otherwise. Since $l(n,q)\Delta_n \to T_q$, we deduce that $U^n_q \stackrel{\mathbb{P}}{\longrightarrow} c^{jk}_{T_q-}$.

On the other hand, due to (6.23) and by successive integrations we obtain
$$\mathbb{E}\big(|\overline U^n_q|\big) \le \frac{1}{k_n}\sum_{l\in I_-(n,q)} \mathbb{E}\big(\|\sigma_{(l-1)\Delta_n} - \sigma_{l(n,q)\Delta_n}\|^2\big) \le K\,k_n\Delta_n,$$
which goes to 0 by virtue of (6.12). Therefore we have proved the first part of (6.39), and the second part is proved in a similar way.
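The local averages in (6.39) are the heart of the "spot volatility around a jump" idea, and are easy to see numerically. The sketch below is not from the notes: it uses an assumed toy model where $\sigma$ (one-dimensional, so $c=\sigma^2$) jumps from 1 to 2 at $T=0.5$, and recovers $c_{T-}$ and $c_T$ by averaging $k_n$ squared normalized increments on each side.

```python
import numpy as np

# Local realized-variance windows of length k_n on each side of a volatility jump.
rng = np.random.default_rng(0)
n, dn, kn = 10_000, 1.0e-4, 500       # grid on [0,1], window size k_n (toy values)
t = np.arange(n) * dn                  # left endpoint of each increment
sigma = np.where(t < 0.5, 1.0, 2.0)    # c jumps from 1 to 4 at T = 0.5 (assumed)
dX = sigma * np.sqrt(dn) * rng.standard_normal(n)
i0 = 5000                              # first increment starting at T
c_minus = np.mean(dX[i0 - kn:i0] ** 2) / dn   # window strictly before T
c_plus = np.mean(dX[i0:i0 + kn] ** 2) / dn    # window strictly after T
print(c_minus, c_plus)
```

The fluctuation of each window average is of order $c\sqrt{2/k_n}$, which is why $k_n\to\infty$ (with $k_n\Delta_n\to0$) is needed.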

Lemma 6.12 If $f$ is continuous and $f(\lambda x) = \lambda^p f(x)$ for all $\lambda>0$, $x\in\mathbb{R}^d$ and some $p\ge2$, we have
$$\Delta_n^{1-p/2}\,\widehat D^n_t\ \stackrel{\text{u.c.p.}}{\longrightarrow}\ 2\int_0^t \rho_{\sigma_u}(f)\,c^{jk}_u\,du.$$

Proof. First we observe that by polarization, and exactly as in the proof of Theorem 6.3, it is enough to show the result when $j=k$, and of course when $f\ge0$: then $\widehat D^n_t$ is increasing in $t$, and $\int_0^t \rho_{\sigma_u}(f)\,c^{jk}_u\,du$ is also increasing and continuous. Then instead of proving the local uniform convergence it is enough to prove the convergence (in probability) for any given $t$.

With our assumptions on $f$, we have
$$\Delta_n^{1-p/2}\,\widehat D^n_t = \frac{\Delta_n}{k_n}\sum_{i=1+k_n}^{[t/\Delta_n]-k_n}\ \sum_{l\in I_{n,t}(i)} f(\beta^n_i)\,\beta^{n,j}_l\,\beta^{n,k}_l.$$



Moreover, $\Delta_n\sum_{i=1+k_n}^{[t/\Delta_n]-k_n} \rho_{\sigma_{(i-1)\Delta_n}}(f)\,c^{jk}_{(i-1-k_n)\Delta_n} \to \int_0^t \rho_{\sigma_u}(f)\,c^{jk}_u\,du$ by Riemann integration. Therefore, it is enough to prove the following two properties:
$$\sum_{i=1+k_n}^{[t/\Delta_n]-k_n} \Delta_n\,\big(f(\beta^n_i) - \rho_{\sigma_{(i-1)\Delta_n}}(f)\big)\,c^{jk}_{(i-1-k_n)\Delta_n}\ \stackrel{\mathbb{P}}{\longrightarrow}\ 0,\tag{6.40}$$
$$Y^n_t := \frac{\Delta_n}{k_n}\sum_{i=1+k_n}^{[t/\Delta_n]-k_n}\ \sum_{l\in I_{n,t}(i)} \zeta^n_{i,l}\ \stackrel{\mathbb{P}}{\longrightarrow}\ 0,\quad\text{where } \zeta^n_{i,l} = f(\beta^n_i)\,\big(\beta^{n,j}_l\beta^{n,k}_l - c^{jk}_{(i-1-k_n)\Delta_n}\big).\tag{6.41}$$

Each summand, say $\theta^n_i$, in the left side of (6.40) is $\mathcal F_{i\Delta_n}$-measurable with $\mathbb{E}^n_{i-1}(\theta^n_i) = 0$ and $\mathbb{E}^n_{i-1}\big((\theta^n_i)^2\big) \le K\Delta_n^2$ (apply (6.23) and recall that $|f(x)| \le K\|x\|^p$ with our assumptions on $f$), so (6.40) follows from Lemma 4.1.

Proving (6.41) is a bit more involved. We set
$$\zeta'^n_{i,l} = f\big(\sigma_{(i-1-k_n)\Delta_n}\,\Delta^n_iW/\sqrt{\Delta_n}\big)\,\big(\beta^{n,j}_l\beta^{n,k}_l - c^{jk}_{(i-1-k_n)\Delta_n}\big),\qquad
Y'^n_t = \frac{\Delta_n}{k_n}\sum_{i=1+k_n}^{[t/\Delta_n]-k_n}\ \sum_{l\in I_{n,t}(i)} \zeta'^n_{i,l}.$$

On the one hand, for any $l\in I_{n,t}(i)$ (hence either $l<i$ or $l>i$) and by successive integration we have
$$\big|\mathbb{E}^n_{i-1-k_n}(\zeta'^n_{i,l})\big| = \Big|\rho_{\sigma_{(i-1-k_n)\Delta_n}}(f)\,\mathbb{E}^n_{i-1-k_n}\big(c^{jk}_{(l-1)\Delta_n} - c^{jk}_{(i-1-k_n)\Delta_n}\big)\Big| \le K\sqrt{k_n\Delta_n}$$
by (6.23), the boundedness of $\sigma$ and $|f(x)| \le K\|x\|^p$. Moreover $\mathbb{E}^n_{i-1-k_n}\big((\zeta'^n_{i,l})^2\big) \le K$ is obvious. Therefore, since $\mathbb{E}\big((Y'^n_t)^2\big)$ is $\Delta_n^2/k_n^2$ times the sum of all $\mathbb{E}(\zeta'^n_{i,l}\,\zeta'^n_{i',l'})$ for all $1+k_n \le i,i' \le [t/\Delta_n]-k_n$ and $l\in I_{n,t}(i)$ and $l'\in I_{n,t}(i')$, by singling out the cases where $|i-i'| > 2k_n$ and $|i-i'| \le 2k_n$, and in the first case by taking two successive conditional expectations, and in the second case by using the Cauchy-Schwarz inequality, we obtain that
$$\mathbb{E}\big((Y'^n_t)^2\big) \le K\,\frac{\Delta_n^2}{k_n^2}\,\Big(4k_n^2\,[t/\Delta_n]^2\,\sqrt{k_n\Delta_n} + 4k_n^2\,[t/\Delta_n]\Big) \le K\big(t^2\sqrt{k_n\Delta_n} + t\Delta_n\big) \to 0.$$

In order to get (6.41) it remains to prove that $Y^n_t - Y'^n_t \stackrel{\mathbb{P}}{\longrightarrow} 0$. By the Cauchy-Schwarz inequality and (6.23), we have
$$\mathbb{E}\big(|\zeta^n_{i,l} - \zeta'^n_{i,l}|\big) \le K\,\mathbb{E}\Big(\big|\rho_{\sigma_{(i-1)\Delta_n}}(f) - \rho_{\sigma_{(i-1-k_n)\Delta_n}}(f)\big|^2\Big)^{1/2}.$$
Then another application of Cauchy-Schwarz yields $\mathbb{E}\big(|Y^n_t - Y'^n_t|\big) \le Kt\,\sqrt{\gamma_n(t)}$, where
$$\gamma_n(t) = \frac{\Delta_n}{k_n}\sum_{i=1+k_n}^{[t/\Delta_n]-k_n}\ \sum_{l\in I_{n,t}(i)} \mathbb{E}\Big(\big|\rho_{\sigma_{(i-1)\Delta_n}}(f) - \rho_{\sigma_{(i-1-k_n)\Delta_n}}(f)\big|^2\Big)
= 2\Delta_n \sum_{i=1+k_n}^{[t/\Delta_n]-k_n} \mathbb{E}\Big(\big|\rho_{\sigma_{(i-1)\Delta_n}}(f) - \rho_{\sigma_{(i-1-k_n)\Delta_n}}(f)\big|^2\Big) \le 2\int_0^t g_n(s)\,ds,$$
with the notation $g_n(s) = \mathbb{E}\Big(\big(\rho_{\sigma_{\Delta_n(k_n+[s/\Delta_n])}}(f) - \rho_{\sigma_{\Delta_n[s/\Delta_n]}}(f)\big)^2\Big)$. Since $c_t$ is bounded and $f$ is with polynomial growth, we first have $g_n(s) \le K$. Since further $t\mapsto\sigma_t$ has no fixed time of discontinuity and $f$ is continuous and $\Delta_nk_n \to 0$, we next have $g_n(s) \to 0$ pointwise: hence $\gamma_n(t) \to 0$ and we have the result.
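The normalization $\Delta_n^{1-p/2}$ in Lemma 6.12 comes purely from the homogeneity $f(\lambda x)=\lambda^pf(x)$, since $f(\Delta^n_iX) = \Delta_n^{p/2}\,f(\Delta^n_iX/\sqrt{\Delta_n})$. A toy numerical sketch (not from the notes) of this scaling for the simplest statistic of this homogeneous type: $f(x)=x^4$, so $p=4$ and $\rho_\sigma(f)=3\sigma^4$, with an assumed constant $\sigma=1$, giving the limit $3t$.

```python
import numpy as np

# Scaled power variation: Delta_n^{1-p/2} * sum_i (Delta_i^n X)^p, with p = 4.
rng = np.random.default_rng(1)
n = 100_000
dn = 1.0 / n                               # t = 1
dX = np.sqrt(dn) * rng.standard_normal(n)  # Brownian increments, sigma = 1 (assumed)
val = dn ** (1 - 4 / 2) * np.sum(dX ** 4)  # = Delta_n^{-1} * sum (Delta_i^n X)^4
print(val)   # close to 3 = m_4 * t, with m_4 = E(N(0,1)^4)
```

Without the factor $\Delta_n^{1-p/2}$ the raw sum would collapse to 0 for $p>2$ when $X$ is continuous, which is why this rescaling appears throughout the section.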

Proof of (6.15). In view of Lemma 6.11 it is enough to prove that $D^n - \overline D^n \stackrel{\text{u.c.p.}}{\longrightarrow} 0$, and this will obviously follow if we prove that
$$\sup_{i\ne l}\ \frac{1}{\Delta_n^2}\,\mathbb{E}\big(|f_n(\Delta^n_iX)\,\eta^n_l|\big) \to 0 \quad\text{as } n\to\infty,\tag{6.42}$$
where
$$\eta^n_l = \frac{1}{\Delta_n}\,\Delta^n_lX^j\,\Delta^n_lX^k\,1_{\{\|\Delta^n_lX\|\le\alpha\Delta_n^\varpi\}} - \beta^{n,j}_l\,\beta^{n,k}_l.$$

A simple computation shows that for $x,y\in\mathbb{R}^d$ and $\varepsilon>0$, we have
$$\big|(x^j+y^j)(x^k+y^k)\,1_{\{\|x+y\|\le\varepsilon\}} - x^jx^k\big| \le K\Big(\frac{\|x\|^3}{\varepsilon} + \|x\|\,\big(\|y\|\wedge\varepsilon\big) + \|y\|^2\wedge\varepsilon^2\Big).$$
We apply this to $x = \sqrt{\Delta_n}\,\beta^n_l$ and $y = \sqrt{\Delta_n}\,\chi^n_l$ and $\varepsilon = \alpha\Delta_n^\varpi$, together with (6.23) and (6.27) and the Cauchy-Schwarz inequality, to get
$$\mathbb{E}^n_{l-1}\big(|\eta^n_l|\big) \le K\Delta_n\,\big(\Delta_n^{1/2} + \varepsilon_n\big)$$
for some $\varepsilon_n$ going to 0. On the other hand, (SH) implies that $\Delta^n_iX$ is bounded by a constant, hence (6.13) yields $|f_n(\Delta^n_iX)| \le K\,\|\Delta^n_iX\|^2$ and (6.23) again gives $\mathbb{E}^n_{i-1}\big(|f_n(\Delta^n_iX)|\big) \le K\Delta_n$. Then, by taking two successive conditional expectations, we get $\mathbb{E}\big(|f_n(\Delta^n_iX)\,\eta^n_l|\big) \le K\Delta_n^2\,\big(\Delta_n^{1/2}+\varepsilon_n\big)$ as soon as $l\ne i$, and (6.42) follows.
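The truncation at level $\alpha\Delta_n^\varpi$ that appears throughout this proof is easy to see on simulated data: since $\varpi<1/2$, the threshold is eventually much larger than a typical Brownian increment (of size $\sqrt{\Delta_n}$) but much smaller than a fixed-size jump. The sketch below is a toy illustration (not from the notes), with an assumed path made of a standard Brownian motion plus three fixed jumps of size 0.5.

```python
import numpy as np

# Truncated versus plain realized variance on a Brownian-plus-jumps toy path.
rng = np.random.default_rng(2)
n = 10_000
dn = 1.0 / n
dX = np.sqrt(dn) * rng.standard_normal(n)   # continuous part, c = 1 (assumed)
dX[[2000, 5000, 8000]] += 0.5               # three jumps (illustration only)
alpha, varpi = 4.0, 0.49
thr = alpha * dn ** varpi                   # truncation level alpha * Delta_n^varpi
rv_plain = np.sum(dX ** 2)                  # ~ 1 + sum of squared jumps
rv_trunc = np.sum(dX[np.abs(dX) <= thr] ** 2)   # ~ int_0^1 c_u du = 1
print(rv_plain, rv_trunc)
```

The truncated sum discards exactly the three jump increments here, so it estimates the continuous quadratic variation alone.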

Proof of (6.16). In view of Lemma 6.12 it is enough to prove that $\Delta_n^{1-r/2}\,(D^n - \widehat D^n) \stackrel{\text{u.c.p.}}{\longrightarrow} 0$, when $X$ is continuous and $f(\lambda x) = \lambda^r f(x)$ for some $r>2$. With the notation $\eta^n_l$ of the previous proof, this amounts to prove the following two properties:
$$\sup_{i\ne l}\ \frac{1}{\Delta_n^{1+r/2}}\,\mathbb{E}\big(|f(\Delta^n_iX)\,\eta^n_l|\big) \to 0 \quad\text{as } n\to\infty,\tag{6.43}$$
$$\sup_{i\ne l}\ \frac{1}{\Delta_n^{r/2}}\,\mathbb{E}\big(|f(\Delta^n_iX)|\,1_{\{\|\Delta^n_iX\|>\alpha\Delta_n^\varpi\}}\,|\eta^n_l|\big) \to 0 \quad\text{as } n\to\infty.\tag{6.44}$$

Since $X$ is continuous and $|f(x)| \le K\|x\|^r$, we have $\mathbb{E}^n_{i-1}\big(|f(\Delta^n_iX)|\big) \le K\Delta_n^{r/2}$, hence the proof of (6.43) is like in the previous proof. By the Bienaymé-Tchebycheff inequality and (6.23) we also have $\mathbb{E}^n_{i-1}\big(|f(\Delta^n_iX)|\,1_{\{\|\Delta^n_iX\|>\alpha\Delta_n^\varpi\}}\big) \le K_q\,\Delta_n^q$ for any $q>0$, hence (6.44) follows.

7 A first CLT

As we have seen after (3.14), we have the CLT (3.7) when $X$ is the sum of a Wiener process and a compound Poisson process, as soon as the function $f$ in $V(f,\Delta_n)$ satisfies $f(x)/|x|^p \to 0$ as $|x|\to\infty$, for some $p$


$\mathbb{R}^q$-valued process with independent increments, satisfying
$$\widetilde{\mathbb{E}}\big(\overline V(f^i,k)_t\,\overline V(f^j,k)_t \mid \mathcal F\big) = \int_0^t R^{ij}_{\sigma_u}(f,k)\,du.\tag{7.3}$$

Another, equivalent, way to characterize the limiting process $\overline V(f,k)$ is as follows, see [13]: for each $\omega$, the matrix $R_\sigma(f,k)$ is symmetric nonnegative, so we can find a square-root $S_\sigma(f,k)$, that is $S_\sigma(f,k)\,S_\sigma(f,k)^\star = R_\sigma(f,k)$, which as a function of $\sigma$ is measurable. Then there exists a $q$-dimensional Brownian motion $B = (B^i)_{i\le q}$ on an extension of the space $(\Omega,\mathcal F,\mathbb{P})$, independent of $\mathcal F$, and $\overline V(f,k)$ is given componentwise by
$$\overline V(f^i,k)_t = \sum_{j=1}^q \int_0^t S^{ij}_{\sigma_u}(f,k)\,dB^j_u.\tag{7.4}$$

As a consequence we obtain a CLT for estimating $C^{jk}_t$ when $X$ is continuous. It suffices to apply the theorem with $k=1$ and the $d\times d$-dimensional function $f$ with components $f^{jl}(x) = x^jx^l$. Upon a simple calculation using (7.2) in this case, we obtain:

Corollary 7.2 Assume (H) (or (H') only, although the result is not then a consequence of the previous theorem) and that $X$ is continuous. Then the $d\times d$-dimensional processes with components
$$\frac{1}{\sqrt{\Delta_n}}\,\Big(\sum_{i=1}^{[t/\Delta_n]} \Delta^n_iX^j\,\Delta^n_iX^k - C^{jk}_t\Big)$$
converge stably in law to a continuous process $(\overline V^{jk})_{1\le j,k\le d}$ defined on an extension $(\widetilde\Omega,\widetilde{\mathcal F},\widetilde{\mathbb{P}})$ of the space $(\Omega,\mathcal F,\mathbb{P})$, which conditionally on the $\sigma$-field $\mathcal F$ is a centered Gaussian $\mathbb{R}^{d\times d}$-valued process with independent increments, satisfying
$$\widetilde{\mathbb{E}}\big(\overline V^{jk}_t\,\overline V^{j'k'}_t \mid \mathcal F\big) = \int_0^t \big(c^{jj'}_u\,c^{kk'}_u + c^{jk'}_u\,c^{j'k}_u\big)\,du.\tag{7.5}$$
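This corollary can be checked by Monte Carlo in the simplest possible setting, an assumed constant volatility matrix (a toy model, not from the notes). For $d=2$, $c^{11}=c^{22}=1$, $c^{12}=\rho$ and $t=1$, the conditional variance of the limit for the component $(j,k)=(1,2)$ is $c^{11}c^{22} + (c^{12})^2 = 1+\rho^2$.

```python
import numpy as np

# Monte Carlo of the normalized realized-covariance error over M simulated paths.
rng = np.random.default_rng(3)
M, n, rho = 4000, 500, 0.5
dn = 1.0 / n                                   # t = 1
g1 = rng.standard_normal((M, n))
g2 = rho * g1 + np.sqrt(1 - rho ** 2) * rng.standard_normal((M, n))
dX1, dX2 = np.sqrt(dn) * g1, np.sqrt(dn) * g2  # correlated Brownian increments
rc = np.sum(dX1 * dX2, axis=1)                 # realized covariance per path
stat = (rc - rho) / np.sqrt(dn)                # normalized error; C_1^{12} = rho
print(stat.mean(), stat.var())                 # ~ 0 and ~ 1 + rho^2
```

With constant $c$ the variance identity is exact at every $n$, so even a moderate number of paths reproduces it closely.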

It turns out that this result is very special: Assumption (H) is required for Theorem 7.1, essentially because one needs that $\Delta_n\sum_{i=1}^{[t/\Delta_n]} \rho_{\sigma_{(i-1)\Delta_n}}(g)$ converges to $\int_0^t \rho_{\sigma_s}(g)\,ds$ at a rate faster than $1/\sqrt{\Delta_n}$, and this necessitates strong assumptions on $\sigma$ (instead of assuming that it is an Itô semimartingale, as in (H), one could require some Hölder continuity of its paths, with index bigger than $1/2$). However, for the corollary, and due to the quadratic form of the test function, some cancellations occur which allow to obtain the result under the weaker assumption (H') only. Although this is a theoretically important point, it is not proved here.

There is a variant of Theorem 7.1 which concerns the case where in (6.3) one takes the sum over those $i$ that are multiples of $k$. More precisely we set
$$V'(f,k,\Delta_n)_t = \sum_{i=1}^{[t/k\Delta_n]} f\big(\Delta^n_{(i-1)k+1}X/\sqrt{\Delta_n},\,\ldots,\,\Delta^n_{ik}X/\sqrt{\Delta_n}\big).\tag{7.6}$$
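The factor $1/k$ that this variant produces in the LLN simply reflects that the non-overlapping sum has about $1/k$ as many terms as the overlapping one. A toy numerical sketch (not from the notes), with $k=2$, $f(x_1,x_2)=|x_1|\,|x_2|$ and an assumed constant $\sigma=1$, for which $\rho^{\otimes2}_\sigma(f) = \big(\mathbb{E}|N(0,1)|\big)^2 = 2/\pi$:

```python
import numpy as np

# Overlapping versus non-overlapping bipower-type sums for Brownian increments.
rng = np.random.default_rng(4)
n = 20_000
dn = 1.0 / n                                  # t = 1
b = rng.standard_normal(n)                    # Delta_i^n X / sqrt(Delta_n), sigma = 1
v_over = dn * np.sum(np.abs(b[:-1]) * np.abs(b[1:]))        # every i (overlapping)
v_nonover = dn * np.sum(np.abs(b[0::2]) * np.abs(b[1::2]))  # disjoint blocks of k = 2
print(v_over, v_nonover)   # ~ 2/pi and ~ (1/2) * 2/pi
```

Both estimators are consistent for the same functional up to the factor $k$; the CLT below shows their asymptotic variances differ as well.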

The LLN is of course exactly the same as in Theorem 6.2, except that the limit should be divided by $k$ in (6.5). As for the CLT, it runs as follows (and although similar to Theorem 7.1, it is not a direct consequence):


Theorem 7.3 Under the same assumptions as in Theorem 7.1, the $q$-dimensional processes
$$\frac{1}{\sqrt{\Delta_n}}\,\Big(\Delta_n\,V'(f,k,\Delta_n)_t - \frac{1}{k}\int_0^t \rho^{\otimes k}_{\sigma_u}(f)\,du\Big)$$
converge stably in law to a continuous process $\overline V'(f,k)$ defined on an extension $(\widetilde\Omega,\widetilde{\mathcal F},\widetilde{\mathbb{P}})$ of the space $(\Omega,\mathcal F,\mathbb{P})$, which conditionally on the $\sigma$-field $\mathcal F$ is a centered Gaussian $\mathbb{R}^q$-valued process with independent increments, satisfying
$$\widetilde{\mathbb{E}}\big(\overline V'(f^i,k)_t\,\overline V'(f^j,k)_t \mid \mathcal F\big) = \frac{1}{k}\int_0^t \Big(\rho^{\otimes k}_{\sigma_u}(f^if^j) - \rho^{\otimes k}_{\sigma_u}(f^i)\,\rho^{\otimes k}_{\sigma_u}(f^j)\Big)\,du.\tag{7.7}$$

Theorem 7.1 does not allow to deduce a CLT associated with Theorem 6.4, since the function $f$ which is used in (6.10) cannot meet the assumptions above. Nevertheless such a CLT is available when $X$ is continuous: see [8], under the (weak) additional assumption that $c_t$ is everywhere invertible. When $X$ is discontinuous and with the additional assumption that $\int \big(\gamma(x)\wedge1\big)\,\lambda(dx) < \infty$


Proposition 7.5 The processes
$$\overline V^n_t = \sqrt{\Delta_n}\,\sum_{i=1}^{[t/\Delta_n]} \Big(\zeta^n_i - \rho^{\otimes k}_{\sigma_{(i-1)\Delta_n}}(f)\Big)\tag{7.8}$$
converge stably in law to the process $\overline V(f,k)$, as defined in Theorem 7.1.

Next, we successively prove the following three properties:
$$\sqrt{\Delta_n}\,\sum_{i=1}^{[t/\Delta_n]} \mathbb{E}^n_{i-1}(\zeta'^n_i)\ \stackrel{\text{u.c.p.}}{\longrightarrow}\ 0,\tag{7.9}$$
$$\sqrt{\Delta_n}\,\sum_{i=1}^{[t/\Delta_n]} \big(\zeta'^n_i - \mathbb{E}^n_{i-1}(\zeta'^n_i)\big)\ \stackrel{\text{u.c.p.}}{\longrightarrow}\ 0,\tag{7.10}$$
$$\frac{1}{\sqrt{\Delta_n}}\,\Big(\Delta_n\sum_{i=1}^{[t/\Delta_n]} \rho^{\otimes k}_{\sigma_{(i-1)\Delta_n}}(f) - \int_0^t \rho^{\otimes k}_{\sigma_u}(f)\,du\Big)\ \stackrel{\text{u.c.p.}}{\longrightarrow}\ 0.\tag{7.11}$$

Obviously our theorem is a consequence of these three properties and of Proposition 7.5. Apart from (7.10), which is a simple consequence of Lemma 6.9, all these steps are non-trivial, and the most difficult is (7.9).

7.2 Proof of (7.10).

We use the notation $I(n,t,l)$ of the proof of Theorem 6.2, and it is of course enough to prove
$$\sqrt{\Delta_n}\,\sum_{i\in I(n,t,l)} \big(\zeta'^n_i - \mathbb{E}^n_{i-1}(\zeta'^n_i)\big)\ \stackrel{\text{u.c.p.}}{\longrightarrow}\ 0$$
for each $l = 0,\ldots,k-1$. Since each $\zeta'^n_i$ is $\mathcal F_{(i+k-1)\Delta_n}$-measurable, by Lemma 4.1 it is even enough to prove that
$$\Delta_n\,\sum_{i\in I(n,t,l)} \mathbb{E}^n_{i-1}\big((\zeta'^n_i)^2\big)\ \stackrel{\text{u.c.p.}}{\longrightarrow}\ 0.$$
But this is a trivial consequence of Lemma 6.9 applied with $q=2$ and $l=0$: in case (a) the function $f$ obviously satisfies (6.4) for some $r\ge0$ and $X$ is continuous, whereas in case (b) it satisfies (6.4) with $r=0$.

7.3 Proof of (7.11).

Let us consider the function $g(\sigma) = \rho^{\otimes k}_\sigma(f)$, defined on the set $\mathcal M$. (7.11) amounts to
$$\sum_{i=1}^{[t/\Delta_n]} \zeta^n_i\ \stackrel{\text{u.c.p.}}{\longrightarrow}\ 0,\qquad\text{where } \zeta^n_i = \frac{1}{\sqrt{\Delta_n}}\int_{(i-1)\Delta_n}^{i\Delta_n} \big(g(\sigma_u) - g(\sigma_{(i-1)\Delta_n})\big)\,du.\tag{7.12}$$



Since $f$ is at least $C^1$ with derivatives having polynomial growth, the function $g$ is $C^1_b$ on $\mathcal M$. However, the problem here is that $\sigma$ may have jumps, and even when it is continuous its paths are typically Hölder with index $\alpha$ for any $\alpha < 1/2$, but not for $\alpha = 1/2$: so (7.12) is not trivial.

With $\nabla g$ denoting the gradient of $g$ (a $d\times d$-dimensional function), we may write $\zeta^n_i = \zeta'^n_i + \zeta''^n_i$, where (with matrix notation)
$$\zeta'^n_i = \frac{1}{\sqrt{\Delta_n}}\,\nabla g(\sigma_{(i-1)\Delta_n})\int_{(i-1)\Delta_n}^{i\Delta_n} \big(\sigma_u - \sigma_{(i-1)\Delta_n}\big)\,du,$$
$$\zeta''^n_i = \frac{1}{\sqrt{\Delta_n}}\int_{(i-1)\Delta_n}^{i\Delta_n} \Big(g(\sigma_u) - g(\sigma_{(i-1)\Delta_n}) - \nabla g(\sigma_{(i-1)\Delta_n})\,\big(\sigma_u - \sigma_{(i-1)\Delta_n}\big)\Big)\,du.$$
In view of (6.21) we can decompose further $\zeta'^n_i$ as $\zeta'^n_i = \widehat\zeta^n_i + \widetilde\zeta^n_i$, where
$$\widehat\zeta^n_i = \frac{1}{\sqrt{\Delta_n}}\,\nabla g(\sigma_{(i-1)\Delta_n})\int_{(i-1)\Delta_n}^{i\Delta_n} du \int_{(i-1)\Delta_n}^{u} \widetilde b_s\,ds,$$
$$\widetilde\zeta^n_i = \frac{1}{\sqrt{\Delta_n}}\,\nabla g(\sigma_{(i-1)\Delta_n})\int_{(i-1)\Delta_n}^{i\Delta_n} du\,\Big(\int_{(i-1)\Delta_n}^{u} \widetilde\sigma_s\,dW_s + \int_{(i-1)\Delta_n}^{u}\!\int \widetilde\delta(s,x)\,(\mu-\nu)(ds,dx)\Big).$$
On the one hand, we have $|\widehat\zeta^n_i| \le K\Delta_n^{3/2}$ (recall that $g$ is $C^1_b$ and $\widetilde b$ is bounded), so $\sum_{i=1}^{[t/\Delta_n]} \widehat\zeta^n_i \stackrel{\text{u.c.p.}}{\longrightarrow} 0$. On the other hand, we have $\mathbb{E}^n_{i-1}(\widetilde\zeta^n_i) = 0$ and $\mathbb{E}^n_{i-1}\big((\widetilde\zeta^n_i)^2\big) \le K\Delta_n^2$ by the Doob and Cauchy-Schwarz inequalities, hence $\sum_{i=1}^{[t/\Delta_n]} \widetilde\zeta^n_i \stackrel{\text{u.c.p.}}{\longrightarrow} 0$ by Lemma 4.1.

Finally, since $g$ is $C^1_b$ on the compact set $\mathcal M$, we have $|g(\sigma') - g(\sigma) - \nabla g(\sigma)(\sigma'-\sigma)| \le K\,\|\sigma'-\sigma\|\,h(\|\sigma'-\sigma\|)$ for all $\sigma,\sigma'\in\mathcal M$, where $h(\varepsilon)\to0$ as $\varepsilon\to0$. Therefore
$$|\zeta''^n_i| \le \frac{1}{\sqrt{\Delta_n}}\int_{(i-1)\Delta_n}^{i\Delta_n} h\big(\|\sigma_u - \sigma_{(i-1)\Delta_n}\|\big)\,\|\sigma_u - \sigma_{(i-1)\Delta_n}\|\,du$$
$$\le \frac{1}{\sqrt{\Delta_n}}\,h(\varepsilon)\int_{(i-1)\Delta_n}^{i\Delta_n} \|\sigma_u - \sigma_{(i-1)\Delta_n}\|\,du + \frac{K}{\varepsilon\sqrt{\Delta_n}}\int_{(i-1)\Delta_n}^{i\Delta_n} \|\sigma_u - \sigma_{(i-1)\Delta_n}\|^2\,du.$$
Since $h(\varepsilon)$ is arbitrarily small, we deduce from the above and from (6.23) that $\sum_{i=1}^{[t/\Delta_n]} \mathbb{E}(|\zeta''^n_i|) \to 0$. This finishes the proof of (7.12).

7.4 Proof of Proposition 7.5.

We prove the result when $k=2$ only. The case $k\ge3$ is more tedious but similar. Letting $g_t(x) = \int \rho_{\sigma_t}(dy)\,f(x,y)$, we have $\overline V^n(f)_t = \sum_{i=2}^{[t/\Delta_n]+1} \zeta^n_i + \zeta''^n_1 - \zeta''^n_{[t/\Delta_n]+1}$, where $\zeta^n_i = \zeta'^n_i + \zeta''^n_i$ and
$$\zeta'^n_i = \sqrt{\Delta_n}\,\Big(f\big(\beta^n_{i-1,0},\,\beta^n_{i-1,1}\big) - \int \rho_{\sigma_{(i-2)\Delta_n}}(dx)\,f\big(\beta^n_{i-1,0},\,x\big)\Big),$$
$$\zeta''^n_i = \sqrt{\Delta_n}\,\Big(\int \rho_{\sigma_{(i-1)\Delta_n}}(dx)\,f\big(\beta^n_{i,0},\,x\big) - \rho^{\otimes2}_{\sigma_{(i-1)\Delta_n}}(f)\Big).$$



Since obviously $\mathbb{E}(|\zeta''^n_i|) \le K\sqrt{\Delta_n}$, it is enough to prove that $\overline V'^n(f)_t = \sum_{i=2}^{[t/\Delta_n]+1} \zeta^n_i$ converges stably in law to the process $\overline V(f,2)$.

Note that $\zeta^n_i$ is $\mathcal F_{i\Delta_n}$-measurable, and a (tedious) calculation yields
$$\mathbb{E}^n_{i-1}(\zeta^n_i) = 0,\qquad \mathbb{E}^n_{i-1}\big((\zeta^n_i)^2\big) = \Delta_n\,\gamma^n_i,\qquad \mathbb{E}^n_{i-1}\big(|\zeta^n_i|^4\big) \le K\Delta_n^2,\tag{7.13}$$
where $\gamma^n_i = g\big(\sigma_{(i-2)\Delta_n},\,\sigma_{(i-1)\Delta_n},\,\beta^n_{i-1}\big)$ and
$$g(s,t,x) = \int \rho_t(dy)\,\Big(f(x,y)^2 + \Big(\int \rho_t(dz)\,f(y,z)\Big)^2\Big) - \Big(\int \rho_t(dy)\,f(x,y)\Big)^2 - \rho^{\otimes2}_t(f)^2 - 2\,\rho^{\otimes2}_t(f)\int \rho_s(dy)\,f(x,y) + 2\int \gamma(dy)\,\gamma(dz)\,f(x,sy)\,f(ty,tz)$$
(here, $\gamma$ is the law $N(0,I_d)$). Then if we can prove the following two properties:
$$\sum_{i=2}^{[t/\Delta_n]+1} \mathbb{E}^n_{i-1}\big(\zeta^n_i\,\Delta^n_iN\big)\ \stackrel{\mathbb{P}}{\longrightarrow}\ 0\tag{7.14}$$
for any $N$ which is a component of $W$ or is a bounded martingale orthogonal to $W$, and
$$\Delta_n \sum_{i=2}^{[t/\Delta_n]+1} \gamma^n_i\ \stackrel{\mathbb{P}}{\longrightarrow}\ \int_0^t R_{\sigma_u}(f,2)\,du\tag{7.15}$$
(with the notation (7.1); here $f$ is 1-dimensional, so $R_\sigma(f,2)$ is also 1-dimensional), then Lemma 4.4 will yield the stable convergence in law of $\overline V'^n$ to $\overline V(f,2)$.

Let us prove first (7.14). Recall $\zeta^n_i = \zeta'^n_i + \zeta''^n_i$, and observe that
$$\zeta'^n_i = \sqrt{\Delta_n}\ h\big(\sigma_{(i-2)\Delta_n},\,\Delta^n_{i-1}W/\sqrt{\Delta_n},\,\Delta^n_iW/\sqrt{\Delta_n}\big),\qquad
\zeta''^n_i = \sqrt{\Delta_n}\ h'\big(\sigma_{(i-1)\Delta_n},\,\Delta^n_iW/\sqrt{\Delta_n}\big),$$
where $h(\sigma,x,y)$ and $h'(\sigma,x)$ are continuous functions with polynomial growth in $x$ and $y$, uniform in $\sigma\in\mathcal M$. Then (7.14) when $N$ is a bounded martingale orthogonal to $W$ readily follows from Lemma 6.10.

Next, suppose that $N$ is a component of $W$, say $W^1$. Since $f$ is globally even and $\rho_\sigma$ is a measure symmetric about the origin, the function $h'(\sigma,x)$ is even in $x$, so $h'(\sigma,x)\,x^1$ is odd in $x$ and obviously $\mathbb{E}^n_{i-1}\big(\zeta''^n_i\,\Delta^n_iW^1\big) = 0$. So it remains to prove that
$$\sum_{i=2}^{[t/\Delta_n]+1} \delta^n_i\ \stackrel{\mathbb{P}}{\longrightarrow}\ 0,\qquad\text{where } \delta^n_i = \mathbb{E}^n_{i-1}\big(\zeta'^n_i\,\Delta^n_iW^1\big).\tag{7.16}$$

An argument similar to the previous one shows that $h(\sigma,x,y)$ is globally even in $(x,y)$, so $\delta^n_i$ has the form $\Delta_n\,k\big(\sigma_{(i-2)\Delta_n},\,\Delta^n_{i-1}W/\sqrt{\Delta_n}\big)$, where $k(\sigma,x) = \int \gamma(dy)\,h(\sigma,x,y)\,y^1$ is odd in $x$, and also $C^1$ in $x$ with derivatives with polynomial growth, uniformly in $\sigma\in\mathcal M$. Then $\mathbb{E}^n_{i-2}(\delta^n_i) = 0$ and $\mathbb{E}^n_{i-2}\big(|\delta^n_i|^2\big) \le K\Delta_n^2$. Since $\delta^n_i$ is also $\mathcal F_{(i-1)\Delta_n}$-measurable, we deduce (7.16) from Lemma 4.1, and we have finished the proof of (7.14).


Now we prove (7.15). Observe that $\gamma^n_i$ is $\mathcal F_{(i-1)\Delta_n}$-measurable and
$$\mathbb{E}^n_{i-2}(\gamma^n_i) = \bar h\big(\sigma_{(i-2)\Delta_n},\,\sigma_{(i-1)\Delta_n}\big),\qquad \mathbb{E}^n_{i-2}\big(|\gamma^n_i|^2\big) \le K,$$
where $\bar h(s,t) = \int \rho_s(dx)\,g(s,t,x)$. Then, by Lemma 4.1, the property (7.15) follows from
$$\Delta_n \sum_{i=1}^{[t/\Delta_n]} \bar h\big(\sigma_{(i-1)\Delta_n},\,\sigma_{i\Delta_n}\big)\ \stackrel{\mathbb{P}}{\longrightarrow}\ \int_0^t R_{\sigma_u}(f,2)\,du,\tag{7.17}$$
so it remains to show (7.17). On the one hand we have $|\bar h(s,t)| \le K$. On the other hand, since $f$ is continuous with polynomial growth and $\sigma_t$ is bounded, we clearly have $\bar h(s_n,t_n) \to \bar h(\sigma_t,\sigma_t)$ for any sequences $s_n$ and $t_n$ which converge to $\sigma_t$: since the latter property holds, for $\mathbb{P}$-almost all $\omega$ and Lebesgue-almost all $t$, for all sequences $s_n,t_n \to \sigma_t$, we deduce that
$$\Delta_n \sum_{i=1}^{[t/\Delta_n]} \bar h\big(\sigma_{(i-1)\Delta_n},\,\sigma_{i\Delta_n}\big)\ \longrightarrow\ \int_0^t \bar h(\sigma_u,\sigma_u)\,du.$$
Since
$$\bar h(\sigma_t,\sigma_t) = \rho^{\otimes2}_{\sigma_t}(f^2) - 3\,\rho^{\otimes2}_{\sigma_t}(f)^2 + 2\int \rho_{\sigma_t}(dx)\,\rho_{\sigma_t}(dy)\,\rho_{\sigma_t}(dz)\,f(x,y)\,f(y,z)$$
is trivially equal to $R_{\sigma_t}(f,2)$, as given by (7.1), we have (7.17).

7.5 Proof of (7.9).

As said before, this is the hard part, and it is divided into a number of steps.

Step 1. For $l = 0,\ldots,k-1$ we define the following (random) functions on $\mathbb{R}^d$:
$$g^n_{i,l}(x) = \int f\Big(\frac{\Delta^n_iX}{\sqrt{\Delta_n}},\,\ldots,\,\frac{\Delta^n_{i+l-1}X}{\sqrt{\Delta_n}},\,x,\,x_{l+1},\ldots,x_{k-1}\Big)\ \rho^{\otimes(k-l-1)}_{\sigma_{(i-1)\Delta_n}}\big(dx_{l+1},\ldots,dx_{k-1}\big)$$
(for $l=0$ we simply integrate $f(x,x_{l+1},\ldots,x_{k-1})$, whereas for $l=k-1$ we have no integration). As a function of $\omega$ this is $\mathcal F_{(i+l-1)\Delta_n}$-measurable. As a function of $x$ it is $C^1$, and further it has the following properties, according to the case (a) or (b) of Theorem 7.1 (we heavily use the fact that $\sigma_t$ is bounded, and also (6.23)):
$$|g^n_{i,l}(x)| + \|\nabla g^n_{i,l}(x)\| \le K\,Z^n_{i,l}\,\big(1+\|x\|^r\big)\quad\text{where}\quad
\begin{cases}
\text{in case (a): } r\ge0,\ \mathbb{E}^n_{i-1}\big(|Z^n_{i,l}|^p\big) \le K_p\ \forall p>0,\ Z^n_{i,l}\ \text{is}\ \mathcal F_{(i+l-2)\Delta_n}\text{-measurable,}\\
\text{in case (b): } r=0,\ Z^n_{i,l}=1.
\end{cases}\tag{7.18}$$
For all $A\ge1$ there is also a positive function $G_A(\varepsilon)$ tending to 0 as $\varepsilon\to0$, such that with $Z^n_{i,l}$ as above:
$$\|x\|\le A,\quad Z^n_{i,l}\le A,\quad \|y\|\le\varepsilon\quad\Longrightarrow\quad \big|g^n_{i,l}(x+y) - g^n_{i,l}(x)\big| \le G_A(\varepsilon).\tag{7.19}$$



Observing that $\zeta'^n_i$ is the sum over $l$ from 0 to $k-1$ of
$$f\Big(\frac{\Delta^n_iX}{\sqrt{\Delta_n}},\,\ldots,\,\frac{\Delta^n_{i+l}X}{\sqrt{\Delta_n}},\,\beta^n_{i,l+1},\ldots,\beta^n_{i,k-1}\Big) - f\Big(\frac{\Delta^n_iX}{\sqrt{\Delta_n}},\,\ldots,\,\frac{\Delta^n_{i+l-1}X}{\sqrt{\Delta_n}},\,\beta^n_{i,l},\ldots,\beta^n_{i,k-1}\Big),$$
we have
$$\mathbb{E}^n_{i-1}(\zeta'^n_i) = \sum_{l=0}^{k-1} \mathbb{E}^n_{i-1}\Big(g^n_{i,l}\big(\Delta^n_{i+l}X/\sqrt{\Delta_n}\big) - g^n_{i,l}\big(\beta^n_{i,l}\big)\Big).$$
Therefore it is enough to prove that for any $l\ge0$ we have
$$\sqrt{\Delta_n}\,\sum_{i=1}^{[t/\Delta_n]} \mathbb{E}^n_{i-1}\Big(g^n_{i,l}\big(\Delta^n_{i+l}X/\sqrt{\Delta_n}\big) - g^n_{i,l}\big(\beta^n_{i,l}\big)\Big)\ \stackrel{\text{u.c.p.}}{\longrightarrow}\ 0.\tag{7.20}$$

Step 2. In case (b) the process $X$ has jumps, but we assume that $\int \big(\gamma(x)\wedge1\big)\,\lambda(dx) < \infty$


If $q$ is a nonzero integer, we compute the $q$th moment of $Y_{s+t} - Y_s$ by differentiating $q$ times its Laplace transform at 0: this is the sum, over all choices $p_1,\ldots,p_k$ of positive integers with $\sum_{i=1}^k p_i = q$, of suitable constants times the product for all $i = 1,\ldots,k$ of the terms $t\int \big(\gamma(x)^{2p_i}\wedge1\big)\,\lambda(dx)$, each one being smaller than $Kt$. Then we deduce that $\mathbb{E}\big((Y_{s+t}-Y_s)^q \mid \mathcal F_s\big) \le K_q\,t$, and by interpolation this also holds for any real $q\ge1$.

Then, coming back to the definition of $\beta^n_{i,l}$ and $\chi^n_{i,l}$, and using the properties $\|\delta(t,x)\| \le K\big(\gamma(x)\wedge1\big)$ and $\|\widetilde\delta(t,x)\| \le K\big(\gamma(x)\wedge1\big)$, plus the fact that $\int \big(\gamma(x)\wedge1\big)\,\lambda(dx) < \infty$


which goes to 0 by the dominated convergence theorem and the bounds given in (SH) and $\int \big(\gamma(x)\wedge1\big)\,\lambda(dx) < \infty$. Next, by (7.18) and (7.19), for all $A\ge1$ and $\varepsilon>0$:
$$\big|\theta^n_{i,l}(3)\big| \le G_A(\varepsilon)\,\|\chi^n_{i,l}\| + K\,Z^n_{i,l}\,\big(1 + \|\beta^n_{i,l}\|^r + \|\chi^n_{i,l}\|^r\big)\,\|\chi^n_{i,l}\|\,\Big(1_{\{Z^n_{i,l}>A\}} + 1_{\{\|\beta^n_{i,l}\|>A\}} + 1_{\{\|\chi^n_{i,l}\|>\varepsilon\}}\Big)$$
$$\le G_A(\varepsilon)\,\|\chi^n_{i,l}\| + K\,Z^n_{i,l}\,\big(1+Z^n_{i,l}\big)\,\Big(\frac{\big(1+\|\beta^n_{i,l}\|\big)^{r+1}}{A} + \frac{\big(1+\|\beta^n_{i,l}\|\big)^r\,\|\chi^n_{i,l}\|^2}{\varepsilon} + \frac{\big(1+\|\beta^n_{i,l}\|\big)\,\|\chi^n_{i,l}\|^{r+1}}{A} + \frac{\|\chi^n_{i,l}\|^{r+2}}{\varepsilon}\Big).$$
By (7.23) we have $\mathbb{E}^n_{i+l-1}\big(\|\chi^n_{i,l}\|^q\big) \le K_q\,\Delta_n$ if $q\ge2$. Then in view of (6.23) we get by the Hölder inequality:
$$\mathbb{E}^n_{i+l-1}\big(|\theta^n_{i,l}(3)|\big) \le K\sqrt{\Delta_n}\,\Big(G_A(\varepsilon) + Z^n_{i,l}\,\big(1+Z^n_{i,l}\big)\,\Big(\frac{1}{A} + \Delta_n^{1/6}\Big)\Big).$$
Then since $\mathbb{E}\big((Z^n_{i,l})^q\big) \le K_q$ for all $q>0$ we have
$$\mathbb{E}\Big(\sqrt{\Delta_n}\,\sum_{i=1}^{[t/\Delta_n]} \mathbb{E}^n_{i+l-1}\big(|\theta^n_{i,l}(3)|\big)\Big) \le K\,t\,\Big(G_A(\varepsilon) + \frac{1}{A} + \Delta_n^{1/6}\Big),$$
and (7.27) for $j=3$ follows (choose $A$ big and then $\varepsilon$ small).

Step 7. It remains to prove (7.27) for $j=2$. By (7.18) we have
$$\big|\theta^n_{i,l}(2)\big| \le K\sqrt{\Delta_n}\,Z^n_{i,l}\,\big(1+\|\beta^n_{i,l}\|^r\big)\,\|\chi^n_{i,l}\|.$$



Hence by the Cauchy-Schwarz inequality and (6.23),
$$\mathbb{E}\Big(\big|\mathbb{E}^n_{i+l-1}\big(\theta^n_{i,l}(2)\big)\big|\Big) \le K\,\mathbb{E}\big(Z^n_{i,l}\,(\Delta_n + \alpha^n_{i,l})\big) \le K\Big(\Delta_n + \sqrt{\mathbb{E}(\alpha^n_{i,l})}\Big).$$
Then, in view of (7.26), the result is obvious.

7.6 Proof of Theorem 7.3.

The proof is exactly the same as above, with the following changes:

1) In Proposition 7.5 we substitute $\overline V^n$ and $\overline V(f,k)$ with
$$\overline V'^n_t = \sqrt{\Delta_n}\,\sum_{i=1}^{[t/k\Delta_n]} \Big(\zeta^n_{(i-1)k+1} - \rho^{\otimes k}_{\sigma_{(i-1)k\Delta_n}}(f)\Big)$$
and $\overline V'(f,k)$ respectively. The proof is then much shorter, because $\eta^n_i = \sqrt{\Delta_n}\,\big(\zeta^n_{(i-1)k+1} - \rho^{\otimes k}_{\sigma_{(i-1)k\Delta_n}}(f)\big)$ is $\mathcal F_{ik\Delta_n}$-measurable. We have
$$\mathbb{E}^n_{(i-1)k}(\eta^n_i) = 0,\qquad \mathbb{E}^n_{(i-1)k}\big((\eta^n_i)^2\big) = \Delta_n\,\gamma^n_i,\qquad \mathbb{E}^n_{(i-1)k}\big((\eta^n_i)^4\big) \le K\Delta_n^2,$$
with $\gamma^n_i = \rho^{\otimes k}_{\sigma_{(i-1)k\Delta_n}}(f^2) - \rho^{\otimes k}_{\sigma_{(i-1)k\Delta_n}}(f)^2$, and (7.17) is replaced by the obvious convergence of $\sum_{i=1}^{[t/k\Delta_n]} \mathbb{E}^n_{(i-1)k}\big((\eta^n_i)^2\big)$ to the right side of (7.7) (recall that we assumed $q=1$ here). We also have $\mathbb{E}^n_{(i-1)k}\big((N_{ik\Delta_n} - N_{(i-1)k\Delta_n})\,\eta^n_i\big) = 0$ when $N$ is a bounded martingale orthogonal to $W$ by Lemma 6.10, and also if $N$ is one of the components of $W$, because then this conditional expectation is the integral of a globally odd function, with respect to a measure on $(\mathbb{R}^d)^k$ which is symmetric about 0. So Lemma 4.4 applies directly, and the proposition is proved.

2) Next, we have to prove the analogues of (7.9), (7.10) and (7.11), where we only take the sum over those $i$ of the form $i = (j-1)k+1$, and where in (7.11) we divide the integral by $k$. Proving the new version of (7.10) is of course simpler than the old one; the new version of (7.11) is the old one for the time step $k\Delta_n$, whereas for (7.9) absolutely nothing is changed. So we are done.

7.7 Proof of Theorem 7.4.

For this theorem again we can essentially reproduce the previous proof, with $k=1$, and with the function $f$ with components $f^{jm}(x) = x^jx^m$ (here $m$ replaces the index $k$ in the theorem). Again it suffices by polarization to prove the result for a single pair $(j,m)$.

Below we set $u_n = \alpha\Delta_n^{\varpi-1/2}$, which goes to $\infty$. Introduce the function on $\mathbb{R}^d$ defined by $g_n(x) = x^jx^m\,\psi_{u_n}(x)$ (recall (6.24)), and set
$$\eta^n_i = \frac{1}{\Delta_n}\,\Delta^n_iX^j\,\Delta^n_iX^m\,1_{\{\|\Delta^n_iX\|\le\alpha\Delta_n^\varpi\}} - g_n\big(\Delta^n_iX/\sqrt{\Delta_n}\big),\qquad
\eta'^n_i = g_n\big(\Delta^n_iX/\sqrt{\Delta_n}\big) - \beta^{n,j}_i\,\beta^{n,m}_i.$$
