The Long Memory of the Efficient Market Fabrizio Lillo J. Doyne Farmer SFI WORKING PAPER: 2003-12-066 SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant. ©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder. www.santafe.edu SANTA FE INSTITUTE


The long memory of the efficient market

Fabrizio Lillo(1,2) and J. Doyne Farmer(1)

(1) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
(2) Istituto Nazionale per la Fisica della Materia, Unità di Palermo, Italy

(Dated: November 20, 2003)

Using data from the London Stock Exchange we demonstrate that the signs of orders obey a long-memory process. The autocorrelation function decays roughly as $\tau^{-\alpha}$ with $\alpha \approx 0.6$, corresponding to a Hurst exponent $H \approx 0.7$. The time $\tau$ is measured in terms of the number of intervening events. This is true for market orders, limit orders, and cancellations. Although the values for $\alpha$ vary from stock to stock, in the range 0.36 to 0.77, in most cases the exponents for different stocks are quite similar, and they are always less than one. This implies that the signs of future orders are quite predictable from the signs of past orders; all else being equal, this would suggest a very strong market inefficiency. We demonstrate, however, that fluctuations in signs are compensated for by anti-correlated fluctuations in transaction size and liquidity. For example, when buy orders become more likely, buy orders tend to be smaller than sell orders and buy liquidity tends to be higher than sell liquidity. By breaking down the data by institutional codes we show that some institutions display long-range memory and others do not.

Contents

I. Introduction
II. Data
III. Review of methods for understanding long-memory processes
    A. Definitions of long-memory
    B. Statistical tests for long-memory
    C. Methods of measuring the Hurst exponent
IV. Demonstration of long-memory for order signs
    A. A quick look at the autocorrelation function
    B. Statistical evidence for long-memory
    C. Estimating the Hurst exponents
    D. Idiosyncratic variation of the Hurst exponents
    E. Order flow in real time
    F. Autocorrelation of transaction volume
    G. NYSE
V. Market inefficiency?
    A. Inefficiency of prices in absence of liquidity fluctuations
    B. The key role of liquidity fluctuations in market efficiency
VI. Individual institutions
VII. Implications for market impact
VIII. Conclusions
    A. Possible causes of long-memory order flow
    B. Summary
Acknowledgments
References and Notes

I. INTRODUCTION

Roughly speaking, a random process is called a long-memory process if it has an autocorrelation function that is not integrable. This happens, for example, when the autocorrelation function decays asymptotically as a power law of the form $\tau^{-\alpha}$ with $\alpha < 1$. This is important because it implies that values from the distant past can have a significant effect on the present, and implies anomalous diffusion. We present a more technical discussion of long-memory processes in Section III.

Long-memory processes have been observed in different natural and human phenomena ranging from the level of rivers to the temperature of the Earth [1]. In finance there has been a long-standing debate as to whether or not prices have long-memory properties [2–5]. It is more widely accepted (though still not uncontroversial) that the volatility of prices is a long-memory process [6]; this has led to the development of alternate models for volatility, such as FIGARCH [7]. Another market property that seems to have long-memory properties is stock market trading volume [8]. Models of long-memory processes include fractional Brownian motion [9] and the FARIMA process introduced by Granger [10, 11].

In this paper we make use of a data set from the London Stock Exchange (LSE), which contains a full record of individual orders and cancellations, to show that stocks in the LSE display a remarkable long-memory property (footnote 1). We label each event as either a buy or a sell order, and assign it ±1 accordingly. The resulting time series has a power-law autocorrelation function, with exponents that are typically about 0.6, in the range $0.36 < \alpha < 0.77$. Positive autocorrelation coefficients are seen at statistically significant levels over lags of many thousand events, spanning many days. Thus the memory of the market is remarkably long.

Footnote 1: Bouchaud et al. [12] independently discovered the long-memory property of order signs for stocks in the Paris exchange. We thank them for acknowledging the oral presentation of our results in May 2003.

This immediately raises a conundrum concerning market efficiency. All other things being equal, such strong long-memory behavior would imply strong predictability, easily exploitable for substantial profits. How can this be compatible with market efficiency? We show that the answer is that other properties of the market adjust in order to compensate and keep the market efficient (footnote 2). In particular, the relative volume of buy and sell orders, and the relative buy and sell liquidity, are skewed in opposition to the imbalance in order signs. For example, suppose the long-memory of previous order signs predicts that buy orders are more likely in the near future. All things being equal, since buy orders have a positive price response, the price should go up. But the market compensates for this: when buy orders become more common, sell market orders tend to be larger than buy market orders, and volume at the best ask tends to be larger than the volume at the best bid. As a result, even though there are more buy orders, price responses are smaller for buy orders than sell orders. This happens in just the right amount, so that price returns remain roughly white, i.e. the autocorrelation of price returns rapidly decays to zero. In order to compensate and keep the market efficient, market order volume and liquidity are also long-memory processes.

This brings up the interesting question of what actually causes these long-memory properties of markets. While not answering this question, we provide a clue about the answer by making use of the institutional codes associated with each order. Some institutions show long-memory quite clearly, while others do not.

The paper is organized as follows: In Section II we give a summary of the data set, and in Section III we provide a more technical discussion of long-memory processes and the statistical techniques we use in this paper. Then in Section IV we present the evidence that these are long-memory processes, and in particular we demonstrate that these series pass stringent tests so that we can be sure that they are long-memory with a high degree of confidence. In Section V we show how other processes compensate for the long-memory of order signs, so that price changes remain roughly efficient, and show that both order size and liquidity are also long-memory processes. In Section VI we break down the orders by institution and show that the behavior of some institutions shows long-memory quite clearly, while others do not show it at all. In Section VII we discuss the implications of the long-memory properties of order flow for delayed market impact. We conclude in Section VIII, discussing some of the broader issues and the remaining questions.

Footnote 2: For a discussion of what we mean by “efficient”, see Section V.

TABLE I: Summary statistics of the 20 stocks we study for the period 1999-2002. The columns give the number of events of each type, in thousands. All events are “effective” events (see the discussion in the text).

ticker   market orders   limit orders   cancellations    total
AZN                652          2,067           1,454    4,173
BA                 381            950             598    1,929
BAA                226            683             487    1,397
BLT                297            825             557    1,679
BOOT               246            711             501    1,458
BSY                404          1,120             726    2,250
DGE                527          1,329             854    2,709
GUS                244            734             518    1,496
HG.                228            676             472    1,377
LLOY               723          1,664           1,020    3,407
PRU                448          1,227             821    2,496
PSON               373          1,063             734    2,170
RIO                381          1,122             771    2,274
RTO                276            620             389    1,285
RTR                479          1,250             820    2,549
SBRY               284            805             561    1,650
SHEL               717          4,137           3,511    8,365
TSCO               471            949             523    1,943
VOD              1,278          2,358           1,180    4,817
WPP                399          1,151             780    2,330
total            9,034         25,441          17,277   51,752

II. DATA

In order to have a representative sample of stocks we select 20 companies continuously traded at the London Stock Exchange (LSE) in the 4-year period 1999-2002. The stocks we analyzed are Astrazeneca (AZN), Bae Systems (BA.), Baa (BAA), BHP Billiton (BLT), Boots Group (BOOT), British Sky Broadcasting Group (BSY), Diageo (DGE), Gus (GUS), Hilton Group (HG.), Lloyds Tsb Group (LLOY), Prudential (PRU), Pearson (PSON), Rio Tinto (RIO), Rentokil Initial (RTO), Reuters Group (RTR), Sainsbury (SBRY), Shell Transport & Trading Co. (SHEL), Tesco (TSCO), Vodafone Group (VOD), and WPP Group (WPP). Table I gives a summary of the number of different events for the 20 stocks.

The London Stock Exchange consists of two markets, the electronic (SETS) exchange, and the upstairs market. We study only the electronic exchange. The data set we analyze contains every action by every institution participating in this exchange. In 1999 the electronic exchange contained roughly 57% of the order flow for a typical stock, and in 2002 roughly 62%. It is thus always a substantial fraction of the total order flow, and is believed to be the dominant mechanism for price formation. There are several types of orders allowed by the exchange, with names such as “fill or kill” and “execute or eliminate”. To place our analysis in more useful terms we label events in terms of their net effect on the limit order book. We label any order that results in an immediate transaction an effective market order, and any order that leaves a limit order sitting in the book an effective limit order. A single order may result in multiple effective orders. For example, consider a crossing limit order, i.e. a limit order whose limit price crosses the opposing best price quote. The part of the order that results in an immediate transaction is counted as an effective market order, while the remaining non-transacted part (if any) is counted as an effective limit order. Finally, we will also lump together any event that results in a queued limit order being removed without a transaction, and refer to such an event as a cancellation. Henceforth dropping the modifier “effective”, we can then classify events as one of three types: market order, limit order and cancellation. For the set of 20 stocks described above there is a total of roughly 9 million market orders, 25 million limit orders and 17 million cancellations. Throughout this paper, unless otherwise specified, we use the number of effective events as a measure of time, which we call event time. We typically do this in terms of the number of events of a given type, e.g. if we are studying market orders we measure event time in terms of the number of market orders, and if we are studying limit orders we measure event time in terms of the number of limit orders.
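The labelling of effective events described above can be illustrated with a minimal Python sketch. The names `EffectiveEvent` and `split_order` are hypothetical, and the sketch assumes that the executed quantity is already known and that any unexecuted remainder rests in the book (which would not hold for order types whose remainder is eliminated).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EffectiveEvent:
    kind: str   # "market" or "limit"
    sign: int   # +1 for buy, -1 for sell
    size: int   # number of shares


def split_order(sign: int, size: int, executed: int) -> List[EffectiveEvent]:
    """Split one submitted order into effective events.

    `executed` is the number of shares that transact immediately; the
    remainder is assumed (for this sketch) to rest in the book.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 (buy) or -1 (sell)")
    if not 0 <= executed <= size:
        raise ValueError("executed must lie between 0 and size")
    events = []
    if executed > 0:
        events.append(EffectiveEvent("market", sign, executed))
    if size > executed:
        events.append(EffectiveEvent("limit", sign, size - executed))
    return events


# A crossing buy limit order for 1,000 shares, 400 of which execute at once.
crossing = split_order(sign=+1, size=1000, executed=400)
print(crossing)
```

In this toy case the single submitted order contributes one tick of market-order event time and one tick of limit-order event time.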

Trading begins each day with an opening auction. There is a period leading up to the opening auction in which orders are placed but no transactions take place. The market is then cleared and for the remainder of the day (except for occasional exceptional periods) there is a continuous auction. We remove all data associated with the opening auction, and analyze only orders placed during the continuous auction.

An analysis of the limit order placement shows that in our dataset approximately 35% of the effective limit orders are placed behind the best price (i.e. inside the book), 33% are placed at the best price, and 32% are placed inside the spread. This is roughly true for all the stocks except for SHEL, for which the percentages are 71%, 18% and 11%, respectively. Moreover, for all the stocks the properties of buy and sell limit orders are approximately the same.

In our dataset cancellation occurs roughly 32% of the time at the best price and 68% of the time inside the book. This is quite consistent across stocks and between the cancellation of buy and sell limit orders. As for the case of the placement of limit orders, the only significant deviation is SHEL, for which the percentages are 14% and 86%.

In the following, price will indicate the mid price, i.e. $p(t) = (a(t) + b(t))/2$, where $a(t)$ and $b(t)$ are the best ask and best bid prices at time $t$, respectively.

III. REVIEW OF METHODS FOR UNDERSTANDING LONG-MEMORY PROCESSES

A. Definitions of long-memory

There are several ways of characterizing long-memory processes. A widespread definition is in terms of the autocovariance function $\gamma(k)$. We define a process as long-memory if in the limit $k \to \infty$

$$\gamma(k) \sim k^{-\alpha} L(k) \qquad (1)$$

where $0 < \alpha < 1$ and $L(x)$ is a slowly varying function (footnote 3) at infinity. The degree of long-memory dependence is given by the exponent $\alpha$; the smaller $\alpha$, the longer the memory.

Long-memory is also discussed in terms of the Hurst exponent $H$, which is simply related to $\alpha$. For a long-memory process $H = 1 - \alpha/2$ or $\alpha = 2 - 2H$. Short-memory processes have $H = 1/2$, and the autocorrelation function decays faster than $k^{-1}$. A positively correlated long-memory process is characterized by a Hurst exponent in the interval $(0.5, 1)$. The use of the Hurst exponent is motivated by the relationship to diffusion properties of the integrated process. For normal diffusion, whose increments do not display long-memory, the standard deviation asymptotically increases as $t^{1/2}$, whereas for diffusion processes with long-memory increments the standard deviation asymptotically increases as $t^{H} L(t)$, with $1/2 < H < 1$, and $L(t)$ a slowly varying function.
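As a quick numerical check of the relation $H = 1 - \alpha/2$, the following Python sketch converts between the two exponents. The function names are hypothetical and the range checks simply enforce the intervals stated in the text.

```python
def hurst_from_alpha(alpha: float) -> float:
    """H = 1 - alpha/2 for a long-memory process with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("the relation is stated only for 0 < alpha < 1")
    return 1.0 - alpha / 2.0


def alpha_from_hurst(hurst: float) -> float:
    """alpha = 2 - 2H for a positively correlated long-memory process."""
    if not 0.5 < hurst < 1.0:
        raise ValueError("the relation is stated only for 1/2 < H < 1")
    return 2.0 - 2.0 * hurst


# The typical exponent and the two ends of the reported range of alpha.
for alpha in (0.6, 0.36, 0.77):
    print(alpha, round(hurst_from_alpha(alpha), 3))
```

With these definitions the reported range of $\alpha$ from 0.36 to 0.77 corresponds to Hurst exponents between 0.615 and 0.82.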

Yet another equivalent definition of long-memory dependence can be given in terms of the behavior of the spectral density for low frequencies. A long-memory process has a spectral density which diverges for low frequencies as

$$g(f) \sim f^{1-2H} L(f), \qquad (2)$$

where $f$ is the frequency, and $L(f)$ is a slowly varying function in the limit $f \to 0$. This follows immediately from the fact that the autocorrelation and the spectral density are Fourier transforms of each other.

B. Statistical tests for long-memory

The empirical determination of the long-memory property of a time series is a difficult problem. The basic reason for this is that the strong autocorrelation of long-memory processes makes statistical fluctuations very large. Thus tests for long-memory tend to require large quantities of data and can often give inconclusive results. Furthermore, different methods of statistical analysis often give contradictory results. In this section we review two such tests and discuss some of their properties. In particular we discuss the classical R/S test, which is known to be too weak, and Lo’s modified R/S test, which is known to be too strong.

Footnote 3: $L(x)$ is a slowly varying function [13] if $\lim_{x \to \infty} L(tx)/L(x) = 1$. In the definition above, and for the purposes of this paper, we are considering only positively correlated long-memory processes. Negatively correlated long-memory processes also exist, but the long-memory processes we will consider in the rest of the paper are all positively correlated.

The basic idea behind the classical R/S test [14–16] is to compare the minimum and maximum values of running sums of deviations from the sample mean, renormalized by the sample standard deviation. For long-memory processes the deviations are larger than for non-long-memory processes. The classical R/S test has been proven to be too weak, i.e. it tends to indicate a time series has long-memory when it does not. In fact, Lo [3] showed that even for a short-memory process, such as a simple AR(1) process, the classical R/S test often rejects the null hypothesis of short-memory. This fact motivated Lo to introduce a test based on a modified R/S statistic which is sensitive only to true long-memory properties [3].

We now describe Lo’s modified R/S test. Consider a sample time series $X_1, X_2, \ldots, X_n$ with sample mean $\bar{X}_n = (1/n) \sum_j X_j$. Let $\sigma_x^2$ and $\gamma_j$ be the sample variance and the sample autocovariance at lag $j$. The modified rescaled range statistic $Q_n(q)$ is defined by

$$Q_n(q) \equiv \frac{1}{\sigma_n(q)} \left[ \max_{1 \le k \le n} \sum_{j=1}^{k} (X_j - \bar{X}_n) - \min_{1 \le k \le n} \sum_{j=1}^{k} (X_j - \bar{X}_n) \right] \qquad (3)$$

where

$$\sigma_n^2(q) \equiv \sigma_x^2 + 2 \sum_{j=1}^{q} \omega_j(q) \gamma_j, \qquad \omega_j(q) \equiv 1 - \frac{j}{q+1} \qquad (4)$$

and $q < n$. It is worth noting that $Q_n(q)$ differs from the classical R/S statistic of Mandelbrot only in the denominator. In the classical R/S test $\sigma_n(q)$ is replaced by the sample standard deviation $\sigma_x$.

The optimal value of $q$ to be used in Eq. (3) to compute $Q_n$ must be chosen carefully. Lo suggested the value $q = [k_n]$ where

$$k_n \equiv \left( \frac{3n}{2} \right)^{1/3} \left( \frac{2\rho}{1 - \rho^2} \right)^{2/3} \qquad (5)$$

where $[k_n]$ indicates the greatest integer less than or equal to $k_n$ and $\rho$ is the sample first-order autocorrelation coefficient of the data. Lo was able to prove that if the process has finite fourth moment and it has a short-memory dependence (and satisfies other supplementary conditions), $V_n \equiv Q_n/\sqrt{n}$ tends asymptotically to a random variable distributed according to

$$F_V(v) = 1 + 2 \sum_{k=1}^{\infty} (1 - 4k^2 v^2) e^{-2(kv)^2}. \qquad (6)$$

This result makes it possible to find the boundaries of a given confidence interval under the null hypothesis that the time series is short-memory. When $V_n$ is outside the interval $[0.809, 1.862]$, we can reject the null hypothesis of short range dependence with 95% confidence.
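A minimal Python sketch of the test may help fix ideas. It is not the authors' code: the function names are hypothetical, the autocovariances are normalized by $n$, the data-dependent lag uses the factor $2\rho/(1-\rho^2)$ from Lo's rule, and the absolute value and clamping in `truncation_lag` are safeguards added here so that the sketch does not fail on negatively correlated input.

```python
import math


def autocovariance(x, lag):
    """Sample autocovariance at the given lag (normalized by n)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def truncation_lag(x):
    """Data-dependent q = [k_n], clamped to 0 <= q < n (clamp is an assumption)."""
    n = len(x)
    rho = autocovariance(x, 1) / autocovariance(x, 0)
    k_n = (1.5 * n) ** (1.0 / 3.0) * (2.0 * abs(rho) / (1.0 - rho * rho)) ** (2.0 / 3.0)
    return max(0, min(int(math.floor(k_n)), n - 1))


def modified_rs(x, q):
    """Q_n(q): range of partial sums over sigma_n(q); q = 0 is the classical R/S."""
    n = len(x)
    mean = sum(x) / n
    partial_sums, running = [], 0.0
    for value in x:
        running += value - mean
        partial_sums.append(running)
    variance = autocovariance(x, 0)
    for j in range(1, q + 1):
        # Bartlett weights 1 - j/(q+1) on the first q autocovariances.
        variance += 2.0 * (1.0 - j / (q + 1.0)) * autocovariance(x, j)
    return (max(partial_sums) - min(partial_sums)) / math.sqrt(variance)


def lo_test(x, q=None, lower=0.809, upper=1.862):
    """Return (V_n, reject); reject=True means short memory is rejected at 95%."""
    if q is None:
        q = truncation_lag(x)
    v_n = modified_rs(x, q) / math.sqrt(len(x))
    return v_n, not (lower <= v_n <= upper)


# Toy persistent sign series: 500 buys followed by 500 sells.
block = [1] * 500 + [-1] * 500
v_n, reject = lo_test(block, q=10)
print(round(v_n, 3), reject)
```

The toy series is deliberately extreme; on real order-sign data one would call `lo_test` on each subset and count how often `reject` is true.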

Recently Taqqu and coworkers [17] have shown that Lo’s rescaled R/S test is too severe. In fact they showed numerically that even for a synthetic long-memory time series with a moderate value of the Hurst exponent (like $H = 0.6$) the Lo test cannot reject the null hypothesis of short range dependence. For our results here we are lucky that we are able to pass the rescaled R/S test for long-memory, but the stringency of this test should be borne in mind in evaluating our results.

C. Methods of measuring the Hurst exponent

The determination of the Hurst exponent of a long-memory process is not an easy task, especially when one cannot make any parametric assumptions about the investigated time series. Several heuristic methods have been introduced to estimate the Hurst exponent. Recently some authors suggested the use of a “portfolio” of estimators instead of relying on a single estimator which could be biased by the property of the time series under investigation [18]. In this paper we will discuss four widespread Hurst exponent estimators which we describe below. These methods are the periodogram method, the R/S method, Detrended Fluctuation Analysis and the fit of the autocorrelation function. We find that the first three methods give reasonable agreement both in real and in surrogate time series. The fourth method appears to be more noisy and less reliable.

To use the periodogram method, one first calculates the periodogram $I(f)$, which is an estimate of the spectral density:

$$I(f) = \frac{1}{2\pi n} \left| \sum_{j=1}^{n} X_j e^{ijf} \right|^2 \qquad (7)$$

where, as before, $n$ is the size of the sample $X_j$. Then a regression of the logarithm of the periodogram against the logarithm of $f$ for small values of $f$ gives a coefficient, which is an estimate of $1 - 2H$ (see Eq. 2). We make our regression on the lowest 10% of the frequencies [18].
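The periodogram estimator can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used for the results reported here: the function names are hypothetical, the periodogram is evaluated by a direct sum at the Fourier frequencies $f_k = 2\pi k/n$, and "lowest 10%" is interpreted as the lowest tenth of those frequencies up to $n/2$.

```python
import cmath
import math
import random


def periodogram(x, k):
    """I(f_k) at the Fourier frequency f_k = 2*pi*k/n, by direct summation."""
    n = len(x)
    f = 2.0 * math.pi * k / n
    total = sum(value * cmath.exp(1j * f * (j + 1)) for j, value in enumerate(x))
    return abs(total) ** 2 / (2.0 * math.pi * n)


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx


def hurst_periodogram(x, fraction=0.1):
    """Estimate H from the slope 1 - 2H of log I(f) against log f at low f."""
    n = len(x)
    m = max(2, int(fraction * (n // 2)))  # number of low frequencies used
    log_f, log_i = [], []
    for k in range(1, m + 1):
        value = periodogram(x, k)
        if value > 0.0:
            log_f.append(math.log(2.0 * math.pi * k / n))
            log_i.append(math.log(value))
    return (1.0 - ols_slope(log_f, log_i)) / 2.0


# Uncorrelated signs should give an estimate scattered around H = 1/2.
rng = random.Random(0)
white = [rng.choice((-1, 1)) for _ in range(4096)]
h_white = hurst_periodogram(white)
print(round(h_white, 2))
```

For long series a fast Fourier transform would replace the direct sum; the direct form is used here only to keep the sketch free of external dependencies.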

The second method is the R/S method [15, 16]. A description of the method, which is strongly based on R/S statistics, can be found in [1]. In summary, we divide a time series of length $n$ into $K$ blocks of size $n/K$. Then for each lag $k$ we compute the classical R/S statistic (i.e. Eq. (3) with $\sigma_x$ instead of $\sigma_n(q)$ in the denominator), starting at points $k_i = in/K + 1$, $i = 1, 2, \ldots$ such that $k_i + k \le n$. When $k < n/K$, one obtains $K$ different values of the R/S statistic. We chose logarithmically spaced values of $k$ and we plot the value of the R/S statistic versus $k$ in double logarithmic scale. The parameter $H$ is obtained by fitting a power-law relationship.
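The following Python sketch illustrates the R/S method as summarized above. It is a simplified illustration with hypothetical function names and parameter defaults: starting points are taken at the block boundaries (including the start of the series), all available (lag, R/S) pairs enter a single least squares fit, and no cut-offs are applied at very small or very large lags.

```python
import math
import random


def rescaled_range(segment):
    """Classical R/S statistic of one segment (None if it has zero variance)."""
    n = len(segment)
    mean = sum(segment) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)
    if std == 0.0:
        return None
    partial_sums, running = [], 0.0
    for value in segment:
        running += value - mean
        partial_sums.append(running)
    return (max(partial_sums) - min(partial_sums)) / std


def hurst_rs(x, num_blocks=10, num_lags=15, min_lag=16):
    """Slope of log(R/S) against log(lag) over logarithmically spaced lags."""
    n = len(x)
    starts = [i * n // num_blocks for i in range(num_blocks)]
    ratio = (n / min_lag) ** (1.0 / (num_lags - 1))
    lags = sorted({int(round(min_lag * ratio ** i)) for i in range(num_lags)})
    log_k, log_rs = [], []
    for k in lags:
        for start in starts:
            if start + k > n:
                continue  # the window would run past the end of the series
            value = rescaled_range(x[start:start + k])
            if value is not None:
                log_k.append(math.log(k))
                log_rs.append(math.log(value))
    mean_k = sum(log_k) / len(log_k)
    mean_rs = sum(log_rs) / len(log_rs)
    sxy = sum((a - mean_k) * (b - mean_rs) for a, b in zip(log_k, log_rs))
    sxx = sum((a - mean_k) ** 2 for a in log_k)
    return sxy / sxx


# Uncorrelated signs: the estimate should lie near (typically a bit above) 1/2.
rng = random.Random(0)
white = [rng.choice((-1, 1)) for _ in range(4000)]
h_white = hurst_rs(white)
print(round(h_white, 2))
```

The small upward bias of the classical R/S slope at short lags is a known finite-sample effect, which is one reason to compare it with other estimators.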


The third method is the Detrended Fluctuation Analysis [19]. The time series is first integrated. The integrated time series is divided into boxes of equal length $m$. In each box, a least squares line is fit to the data (representing the trend in that box). The $y$ coordinate of the straight line segments is denoted by $y_m(k)$. Next, we detrend the integrated time series, $y(k)$, by subtracting the local trend, $y_m(k)$, in each box. The root-mean-square fluctuation of this integrated and detrended time series is calculated by

$$F(m) = \sqrt{ \frac{1}{n} \sum_{k=1}^{n} [y(k) - y_m(k)]^2 } \qquad (8)$$

This computation is repeated over all time scales (box sizes) to characterize the relationship between $F(m)$ and the box size $m$. Typically $F(m)$ will increase with box size $m$. The Hurst exponent is obtained by fitting $F(m)$ with a relation $F(m) \propto m^H$. The proposers of this method claim that it is able to remove local trends due to bias in the enhanced occurrence of a class of events [19].
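The steps of Detrended Fluctuation Analysis can be sketched in Python as follows. This is a minimal illustration with hypothetical names and defaults: the mean is subtracted before integrating, points left over after the last full box are dropped (so the average runs over the covered points rather than all $n$), and box sizes are spaced logarithmically between `min_box` and a quarter of the series length.

```python
import math
import random


def integrate(x):
    """Cumulative sum of deviations from the sample mean."""
    mean = sum(x) / len(x)
    profile, running = [], 0.0
    for value in x:
        running += value - mean
        profile.append(running)
    return profile


def fluctuation(profile, m):
    """F(m): RMS residual after a least squares line fit in each box of length m."""
    num_boxes = len(profile) // m
    t_mean = (m - 1) / 2.0
    t_var = sum((t - t_mean) ** 2 for t in range(m))
    total = 0.0
    for b in range(num_boxes):
        box = profile[b * m:(b + 1) * m]
        y_mean = sum(box) / m
        slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(box)) / t_var
        total += sum((y - y_mean - slope * (t - t_mean)) ** 2
                     for t, y in enumerate(box))
    return math.sqrt(total / (num_boxes * m))


def hurst_dfa(x, min_box=8, num_sizes=12):
    """Slope of log F(m) against log m."""
    profile = integrate(x)
    max_box = len(x) // 4
    ratio = (max_box / min_box) ** (1.0 / (num_sizes - 1))
    sizes = sorted({int(round(min_box * ratio ** i)) for i in range(num_sizes)})
    log_m = [math.log(m) for m in sizes]
    log_f = [math.log(fluctuation(profile, m)) for m in sizes]
    mean_m = sum(log_m) / len(log_m)
    mean_f = sum(log_f) / len(log_f)
    sxy = sum((a - mean_m) * (b - mean_f) for a, b in zip(log_m, log_f))
    sxx = sum((a - mean_m) ** 2 for a in log_m)
    return sxy / sxx


# Uncorrelated signs should give an exponent close to 1/2.
rng = random.Random(1)
white = [rng.choice((-1, 1)) for _ in range(4000)]
h_white = hurst_dfa(white)
print(round(h_white, 2))
```

A profile that is exactly linear inside every box has zero fluctuation, which is a convenient sanity check for the detrending step.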

A fourth method is to simply compute the autocorrelation function and measure $\alpha = 2 - 2H$ by regressing the autocorrelation function with a power law. This method, however, suffers from the problem that the sample errors in adjacent autocorrelation coefficients are strongly correlated, and so this method is less accurate than the other three methods discussed above. Thus, we only use this method as an indication. Based on tests on real and surrogate data, we find that the first three methods all give very similar results; we use either the R/S method or the periodogram method when we want to get accurate values of the exponent $\alpha$.
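For completeness, the autocorrelation fit can be sketched in Python as below. The names `sample_acf` and `fit_alpha` are hypothetical, the fit is an ordinary least squares regression in double logarithmic scale, and non-positive autocorrelation values are simply skipped, which is a simplification made for this sketch.

```python
import math


def sample_acf(x, lag):
    """Sample autocorrelation at the given lag (covariance normalized by n)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


def fit_alpha(lags, acf_values):
    """Return alpha = -slope of log(acf) regressed on log(lag)."""
    points = [(math.log(k), math.log(r))
              for k, r in zip(lags, acf_values) if r > 0.0]
    if len(points) < 2:
        raise ValueError("need at least two positive autocorrelation values")
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    return -sxy / sxx


# Sanity check on an exact power law with exponent 0.6.
lags = [1, 2, 5, 10, 20, 50, 100]
exact = [k ** -0.6 for k in lags]
alpha = fit_alpha(lags, exact)
hurst = 1.0 - alpha / 2.0
print(round(alpha, 3), round(hurst, 3))
```

On real data one would pass `[sample_acf(signs, k) for k in lags]` as the second argument, bearing in mind the correlated errors discussed above.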

IV. DEMONSTRATION OF LONG-MEMORY FOR ORDER SIGNS

A. A quick look at the autocorrelation function

We consider the symbolic time series obtained in transaction time by replacing buy orders with +1 and sell orders with −1 irrespective of the volume (number of shares) in the order. This can be done for market orders, limit orders, or cancellations. As we will see, all of these series show very similar behavior. We reduce these series to ±1 rather than analyzing the signed series of order sizes $\omega_t$ in order to avoid problems created by the large fluctuations in order size; analysis of the signed series of order sizes produces results that do not converge very well.
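The construction of the sign series can be made concrete with a short Python sketch. The record layout (pairs of event type and side) and the name `sign_series` are assumptions made for illustration; the actual data format is not described here.

```python
from collections import defaultdict

SIGN = {"buy": +1, "sell": -1}


def sign_series(events):
    """Map time-ordered (event_type, side) records to one +/-1 series per type.

    Each series is indexed in the event time of its own type, and order
    volume is deliberately ignored.
    """
    series = defaultdict(list)
    for event_type, side in events:
        series[event_type].append(SIGN[side])
    return dict(series)


# Hypothetical fragment of an order flow.
events = [
    ("market", "buy"),
    ("limit", "sell"),
    ("market", "buy"),
    ("cancellation", "sell"),
    ("market", "sell"),
]
series = sign_series(events)
print(series["market"])
```

Each of the resulting lists can then be passed to an autocorrelation or Hurst exponent estimator separately.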

Figure 1 shows the sample autocorrelation functions of the order sign time series for Vodafone in the period 1999-2002 in double logarithmic scale. Vodafone was chosen to illustrate the results in this paper because it is one of the most highly capitalized and most heavily traded stocks in the LSE during this period; however, we see very similar results for all the other stocks in our dataset. In double logarithmic scale the autocorrelation function for market orders, limit orders and cancellations decays roughly linearly over more than 4 decades, although with some break in the slope for limit orders and cancellations. This suggests that a power-law relation $\rho(k) \sim k^{-\alpha}$ is a reasonable asymptotic approximation for the empirical autocorrelation function. Of course, for larger lags there are fewer independent intervals, and the statistical fluctuations are much larger.

FIG. 1: Autocorrelation function of sequences of order signs for Vodafone in the period 1999-2002 in double logarithmic scale for (a) market orders, (b) limit orders and (c) cancellations. The lag is measured in terms of the number of events of each type, e.g., number of market orders, number of limit orders, etc. In each case the autocorrelation function remains positive over periods much longer than the average number of events in a day.

Estimating $\alpha$ from the sample autocorrelation using an ordinary least squares fit gives $\alpha = 0.39$ for market orders. For limit orders there appears to be a break in the slope, with an exponent roughly 0.4 for lags less than roughly 500 and 0.6 for larger lags. There is a similar break in the slope for cancellations, with a slope roughly 0.4 for less than 50 lags and 0.7 for larger lags. As already mentioned, the sample autocorrelation is a poor method for estimating $\alpha$, and should only be considered an indication; later on we will use more reliable estimators. But the fact that $\alpha$ is much smaller than 1 in every case suggests that these might be long-memory processes [1]. The memory is quite persistent, as is evident from the fact that the sample autocorrelations remain positive over a very long span of time. The average daily number of market orders for Vodafone in the investigated period is approximately 1,300, whereas the slow decay of the autocorrelation function in Fig. 1 is seen for lags as large as 10,000. This indicates that the long-memory property of the market order placement is not just an intra-day phenomenon, but rather spans multiple days, persisting on a timescale of more than a week. Similar statements are true for limit orders and cancellations. See also Section IV E, where we analyze this phenomenon in real time rather than event time.
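The conversion from event time to calendar time used in this argument is simple arithmetic, sketched here in Python with the Vodafone figures quoted above. The function name is hypothetical and the calculation assumes a constant number of events per trading day, which is only an average.

```python
def lag_in_trading_days(lag_events: float, events_per_day: float) -> float:
    """Convert a lag in event time to trading days, assuming a constant daily rate."""
    return lag_events / events_per_day


# A lag of 10,000 market orders at about 1,300 market orders per day.
days = lag_in_trading_days(10_000, 1_300)
print(round(days, 1))
```

A lag of roughly 7.7 trading days exceeds the five trading days of a calendar week, consistent with the statement that the memory persists for more than a week.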

B. Statistical evidence for long-memory

In order to test for the presence of long-memory properties in the time series of market order signs both longitudinally (i.e. analyzing a stock for different time periods) and cross-sectionally (i.e. analyzing different stocks) we proceed as follows. We consider the set of 20 highly capitalized stocks described in Section II for the 4-year period 1999-2002. Since the number of orders is different for different stocks in different calendar years, we divide the data for each year and for each stock into subsets in such a way that each set contains roughly a fixed number of orders (footnote 4). To each set we apply the Lo test based on the modified R/S statistic, obtaining a value for the statistic $Q_n$. Since our time series consists of +1 and −1 we do not have problems with the existence of moments. Figure 2 shows the histogram of the 324 values of $Q_n$ for the subsets of market orders. For 289 (89.2%) subsets we can reject the null hypothesis of short-memory processes with 95% confidence. Repeating this test for limit orders and cancellations gives even stronger results: for limit orders, based on 468 subsets the short-memory hypothesis is rejected at the 95% level in 97% of the cases, and for cancellations using 558 subsets it is rejected in 96% of the cases. We can therefore conclude that these order sign time series are almost certainly long-memory processes. This result is even stronger when one considers the severity of this test, as pointed out in [17].

Footnote 4: The number of market orders ranges from 26,438 for WPP in 1999 to 415,392 for VOD in 2002, the number of limit orders ranges from 51,798 in 1999 for BLT to 2,552,410 for SHEL in 2002, and the number of cancellations ranges from 29,395 for BLT in 1999 to 2,259,526 for SHEL in 2002. We thus divided the market orders into 324 subsets ranging in size from 25,000 to 49,999; we divided limit orders into 468 subsets ranging in size from 50,000 to 99,999, and we divided cancellations into 558 subsets ranging in size from 29,000 to 57,999.

FIG. 2: Histogram of the modified R/S statistic $Q_n$ for subsets of the market order sign time series. The original set of 20 stocks traded in the LSE in the period 1999-2002 is divided into 324 disjoint subsets as described in the text. The black region is the 95% confidence interval region of the null hypothesis of short-memory. For 289 (89%) of them we can reject the null hypothesis of short-memory with at least 95% confidence. Similar analyses for limit orders and cancellations give even stronger results.

C. Estimating the Hurst exponents

Now that we have established that these are long-memory processes we determine the Hurst exponent $H$ to see if there is consistency in the exponent in different years and for different stocks. Recall that for a long-memory process the Hurst exponent is related to the exponent $\alpha$ of the autocorrelation function through $\alpha = 2 - 2H$.

The first estimator we used for the determination of the Hurst exponent is least squares fitting of the periodogram. The mean estimated value of the Hurst exponent is $H = 0.695 \pm 0.039$ for market orders, $H = 0.716 \pm 0.054$ for limit orders, and $H = 0.768 \pm 0.059$ for cancellations, where the error is the standard deviation. The histograms of the exponents obtained in this way are shown in Fig. 3. We see that in every case the Hurst exponent is roughly peaked around the value $H = 0.7$, which corresponds to $\alpha = 0.6$.

Following the suggestion of [18] we also estimate the Hurst exponent for market orders through the classical R/S method [1]. In this case the mean Hurst exponent is $0.696 \pm 0.032$, which is consistent with the value obtained with the periodogram method. Figure 3(a) gives a comparison of the results of the two methods. In the inset we plot the Hurst exponent obtained from the periodogram against the Hurst exponent obtained from the R/S method, showing that the results are quite correlated on a case-by-case basis, with no discernible bias.

D. Idiosyncratic variation of the Hurst exponents

The previous results bring up the interesting question of whether there are real variations in the Hurst exponents, or whether they have a universal value and the variations that we see are just sample fluctuations. To compare the longitudinal and cross-sectional variations we perform a classical ANOVA test. We assume that for each stock i the value of the Hurst exponent in different time periods is normally distributed with mean $m_i$ and standard deviation σ. We test the null hypothesis that all the $m_i$ are equal. We indicate with $H_{ij}$ the estimated Hurst exponent of stock i in sub-period j. There are r = 20 stocks, each of them with a variable number $n_i$ of sub-periods. The total number of subsets is $n = \sum_i n_i$. As usual the sum of squares of deviations of $H_{ij}$ can be decomposed into the sum of squared deviations within groups (i.e. stocks), $(n-r)s_2^2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i}(H_{ij}-\bar{H}_i)^2$, and the sum of squared deviations between groups, $(r-1)s_1^2 = \sum_{i=1}^{r}(\bar{H}_i-\bar{H})^2$, where $\bar{H}$ is the sample mean for the entire sample and $\bar{H}_i$ is the sample mean for stock i. Under the above null hypothesis, the sum of squared deviations within groups has a $\chi^2$ distribution with n − r degrees of freedom. Likewise the sum of squared deviations between groups has a $\chi^2$ distribution with r − 1 degrees of freedom. Therefore the logarithm of the ratio between $s_1$ and $s_2$ has Fisher's Z-distribution with (r − 1, n − r) degrees of freedom. For all three types of orders we reject the null hypothesis with 99% confidence. Moreover in all three cases $s_1 > s_2$, showing that the cross-sectional variation of the Hurst exponent is significantly larger than the longitudinal variation, which suggests that the variations in the exponents between stocks are statistically significant. Nonetheless, it is interesting that these variations are relatively small.

FIG. 3: Histogram of Hurst exponents for (a) market orders, (b) limit orders, and (c) cancellations, for subsets of the data as described in the text. In (a) we show histograms for both the periodogram (continuous line) and the R/S method (dashed line), while (b) and (c) show the periodogram method only. The inset of (a) plots the results from the two methods against each other (periodogram on the x axis and R/S on the y axis).

FIG. 4: Autocorrelation function of the quantity $(n_b(t) - n_s(t))/(n_b(t) + n_s(t))$, where $n_b(t)$ and $n_s(t)$ indicate the number of buy and sell market orders, respectively, in a time interval of length T = 5 minutes starting at time t. A lag unit on the x axis corresponds to 5 minutes. There are strong positive autocorrelations for periods of at least 5000 minutes, or roughly 10 days, and no indication that there is anything special about the daily boundary.
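The ANOVA decomposition of Section IV D can be sketched as follows. This is a minimal illustration under stated assumptions rather than the computation behind the reported results: the function name is hypothetical, and the between-group sum of squares is written in the textbook form in which each stock is weighted by its number of sub-periods $n_i$.

```python
def one_way_anova(groups):
    """One-way ANOVA for a list of groups (one list of H estimates per stock).

    Returns the between-group variance s1^2, the within-group variance s2^2
    and their ratio, which is compared with an F distribution with
    (r - 1, n - r) degrees of freedom under the null of equal means.
    """
    r = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Longitudinal variation: deviations of each estimate from its stock mean.
    ss_within = sum((h - m) ** 2 for g, m in zip(groups, means) for h in g)
    # Cross-sectional variation: deviations of stock means from the grand
    # mean, weighted by group size (textbook form, assumed here).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))

    s1_sq = ss_between / (r - 1)
    s2_sq = ss_within / (n - r)
    return s1_sq, s2_sq, s1_sq / s2_sq
```

A ratio well above one, as found for all three order types, indicates that the spread between stocks exceeds what the spread within stocks would produce by chance.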

E. Order flow in real time

We have shown that the sequence of order signs is a long-memory process in event time. In this section we briefly consider the correlation properties in real time. This is complicated by the fact that trading is not homogeneous. There are both strong intra-day periodicities, e.g. volume tends to increase near the open and the close, and also strong temporal autocorrelations in the number of trades. Thus the number of trades in any given time interval T can vary dramatically depending on the interval.

To understand the long-memory of orders in real time, we are seeking a quantity that gives information about imbalances in the signs of orders, but which is independent of the number of orders placed in a given time interval. We use two methods, which give similar results. Let $n_b(t)$ and $n_s(t)$ indicate the number of buy and sell market orders, respectively, in a time interval of length T starting at time t. The first method follows a majority rule, which assigns the value +1 if $n_b(t) > n_s(t)$ and the value −1 if $n_b(t) < n_s(t)$. When $n_b(t) = n_s(t)$ or there are no market orders in the interval we assign the value 0. The main defect of this method is that it does not distinguish between intervals with a small and a large imbalance of one type of order. The second method is to use a continuous variable $v(t)$ defined as $(n_b(t) - n_s(t))/(n_b(t) + n_s(t))$ when $n_b(t) + n_s(t) \neq 0$ and zero elsewhere. This is bounded between −1 and 1. In Figure 4 we show the autocorrelation function of $v(t)$ for a time interval T = 5 minutes for Vodafone. We note that a power law decay of the autocorrelation function fits the empirical data quite well, with an exponent α = 0.3, which is close to the corresponding value in event time.
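The two imbalance measures can be sketched as follows. This is a minimal illustration; the function names and the representation of orders as (timestamp in seconds, sign) pairs are assumptions made for the example.

```python
def count_orders(orders, t_start, interval, n_intervals):
    """Count buy and sell market orders in consecutive time intervals.

    orders is an iterable of (timestamp, sign) pairs with sign = +1 or -1.
    """
    n_buy = [0] * n_intervals
    n_sell = [0] * n_intervals
    for timestamp, sign in orders:
        k = int((timestamp - t_start) // interval)
        if 0 <= k < n_intervals:
            if sign > 0:
                n_buy[k] += 1
            else:
                n_sell[k] += 1
    return n_buy, n_sell


def majority_sign(n_buy, n_sell):
    """First method: +1, -1 or 0 according to which side has more orders."""
    if n_buy > n_sell:
        return 1
    if n_buy < n_sell:
        return -1
    return 0


def normalized_imbalance(n_buy, n_sell):
    """Second method: (n_b - n_s) / (n_b + n_s), and zero for empty intervals."""
    total = n_buy + n_sell
    return (n_buy - n_sell) / total if total else 0.0
```

The majority rule maps an interval with two buys and one sell to the same value as an interval with a hundred buys and no sells, which is the defect noted in the text; the normalized imbalance separates them (1/3 against 1).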

This study makes it quite clear that the long-memory properties of order signs persist across trading days. There are 102 intervals of length 5 minutes in a trading day, which means that the last lag in Figure 4 corresponds to approximately 10 trading days. Moreover the autocorrelation does not show any significant peak or break in slope near lag = 102, indicating that the long-memory properties of the market persist more or less unchanged across daily boundaries.

F. Autocorrelation of transaction volume

In this section we show that the volume of the transactions is a long-memory process in event time; later on in Section V B we will argue that this is connected to the long-memory properties of order signs via market efficiency. The long-memory properties of aggregated volume have been known for a long time [8, 29]. We use modified R/S statistics in order to test the null hypothesis that the transaction volume is a short-memory process in event time. The value of a stock changes in time because of the change in price. Therefore one could expect that the number of traded shares is non-stationary due to the non-stationarity of the price. For this reason we decide to investigate the value of the transaction, defined as the product of the number of traded shares and the transaction price. The value is invariant under stock splits. In Figure 5 we show the autocorrelation function of the volume of Vodafone measured in terms of value in the period 1999-2002. The inset shows a histogram of the transaction volume, which is well fit by a Gamma distribution. Once we adjust for scale, this seems to be roughly the same for all the stocks in the sample (see also [30]). Moreover this result is in contrast with what has been observed for the NYSE [29] (see Section IV G).

We applied the modified R/S test to the 324 subsets used to test the long-memory properties of market order size. One of the conditions for the applicability of the modified R/S test is that the unconditional kurtosis of the time series is finite; from the inset of Figure 5 it seems that the volume distribution does not have a power-law tail, so the modified R/S test is applicable. However, there could be biases in the test due to large fluctuations in volume. In 303 of the 324 subsets, or 94% of the time, we reject the null hypothesis of short-memory for the transaction volume with 95% confidence. The Hurst exponent estimated with the periodogram method varies across subsets and the mean value is H = 0.732 ± 0.075, within sampling error of the exponent found for order signs.


FIG. 5: Autocorrelation function of the sequence of transaction volume for Vodafone in the period 1999-2002. The transaction volume is defined as the product of the number of shares times the price. The lag is measured in terms of the number of transactions. The inset shows the unconditional histogram of transaction volume in double logarithmic scale.

G. NYSE

We performed a similar analysis on a set of stocks traded at the New York Stock Exchange. The properties of the unconditional volume distribution for individual transactions and the corresponding correlation properties were reported by Gopikrishnan et al. [29]. Their conclusion, based on the Trade and Quote database (TAQ), is that the distribution of volume of individual transactions is in the Levy regime, asymptotically decaying as $P(V) \sim V^{-\zeta}$, where ζ = 1.53 ± 0.07. Moreover they found that the volume is a short-memory process. We qualitatively confirm these findings. These properties for NYSE stocks contrast strongly with those of LSE stocks. In fact, for the LSE we find the opposite set of properties: the unconditional distribution does not appear to have a power law tail for large volumes⁵, and the volume is a long-memory process. These differences could be due to any of several factors, for example, the different trading mechanisms in the two stock exchanges, or the fact that we do not include upstairs trading volume in our analysis of the LSE, whereas it is included (and cannot be separated) for the NYSE.

⁵ Interestingly, the distribution of volumes for LSE stocks does have a power law tail at zero.

We have analyzed the correlation properties of the transaction sign time series in the NYSE. Unfortunately the TAQ database only contains quote changes and transactions, and does not give any direct information about orders. We cope with this using the Lee and Ready algorithm [32] to infer the sign of each transaction (which corresponds to the sign of the order initiating the transaction). However, this procedure is far from perfect: for NYSE data the algorithm is able to classify only about 85% of the trades. We do not know the sign of the remaining trades, which we classify randomly. This creates several problems. First, visual inspection and correlation analysis show that the unclassified trades tend to be strongly clustered in event time. Second, we strongly suspect that most of the trades within a given cluster have the same sign. Last, the TAQ database contains the transactions as reported by the specialist and not the individual market orders. The specialist could report a single transaction for several market orders of the same sign. These three facts substantially alter the correlation properties of the market order sign time series. Nonetheless, by random substitution of the sign of the unclassified trades, it is still quite clear that for NYSE stocks the market order sign is a long-memory process with exponents similar to those observed in the LSE.
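The sign inference and random substitution described above can be sketched as follows. This is a simplified illustration of a Lee and Ready style classification, not the exact procedure applied to the TAQ data: it compares the trade price with the quote midpoint and falls back on a tick test against the previous trade price. The function names are hypothetical, and refinements such as lagged quotes and looking back to the last different price are omitted.

```python
import random


def lee_ready_sign(price, bid, ask, prev_price=None):
    """Return +1 (buyer initiated), -1 (seller initiated) or 0 (unclassified)."""
    mid = 0.5 * (bid + ask)
    # Quote rule: trades above the midpoint are buys, below are sells.
    if price > mid:
        return 1
    if price < mid:
        return -1
    # Tick test for trades exactly at the midpoint.
    if prev_price is not None:
        if price > prev_price:
            return 1
        if price < prev_price:
            return -1
    return 0


def fill_unclassified(signs, rng):
    """Replace unclassified signs (0) by random +1 or -1, as in the text."""
    return [s if s != 0 else rng.choice((1, -1)) for s in signs]
```

Because the substituted signs are independent draws, this step can only dilute the measured autocorrelation, so long memory that survives the substitution is unlikely to be an artefact of it.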

V. MARKET INEFFICIENCY?

At first sight the long-memory property of the market order sign time series is puzzling when considered from the perspective of market efficiency. Long-memory implies strong predictability using a simple linear model. When this is combined with the fact that orders have price impact, it naively suggests that price changes should follow a long-memory process as well. That is, buy market orders tend to drive the price up, and sell market orders tend to drive it down. Thus, all other things being equal, a run of buy orders should imply future upward price movement, and a run of sell orders should imply future downward price movement. The predictability of signs is sufficiently strong that one would expect that profits could be made by taking advantage of it.

There are many ways to define market efficiency, and we should be clear how we are using it. Here we mean specifically linear efficiency, which we take to be the absence of patterns that are easily detectable by a linear time series algorithm. This allows for the possibility that there may be other more complicated nonlinear patterns, and assumes a trivial reference equilibrium of an IID random process. This is a strong efficiency in the sense of Fama [31], in that the information set is the sequence of recent buy or sell order signs, which for the LSE is publicly available during this period in real time. While one might quibble about time lags in availability, since the long-memory persists for days, this is clearly not an issue.

In this section we explore the consequences of long-memory in order signs, and show that its impact on the predictability of prices is offset by other factors. In particular, the relative size of buy and sell market orders and the relative size of the quotes at the best bid and ask move in a way that is anti-correlated with the long memory of order signs, and compensate to make the market at least approximately efficient. While order signs, market order volume, and volume at the best prices are all long-memory processes, price changes are not.

A. Inefficiency of prices in absence of liquidity fluctuations

In this subsection we show that if liquidity were fixed, the long-memory in the signs of orders would drive a strong inefficiency in prices. In the next subsection we show how market efficiency implies that liquidity must vary in order to cancel out the long-memory of order signs, which implies that liquidity is also a long-memory process.

We first construct a series of surrogate prices assuming that liquidity depends on volume but is otherwise fixed. The relation between the volume of a market order and the consequent price shift is described by the price impact function (also called the market impact function). Recent studies of the impact of a single transaction [20–23] have shown that the average market impact is a concave function of order or transaction volume, matching other studies based on time-aggregated volume [24–26]. It appears that the average impact varies across markets and stocks. For example, for a set of 1000 stocks traded at NYSE (which works with a specialist) the impact is roughly

$$E(r|V) = \frac{\mathrm{sign}(V)\,|V|^{\beta}}{\lambda} \qquad (9)$$

where r is the logarithmic price return, V is the volume of a transaction, λ is a liquidity parameter, and E(.) indicates the expected value. The exponent β depends on V and it is approximately 0.5 for small volumes and 0.2 for large volumes [22]. The liquidity parameter λ varies for each stock, and in general may also vary in time. Bouchaud and Potters [23] analyzed a much smaller set of stocks traded at the Paris Bourse and NASDAQ and suggested a logarithmic price impact function. For the LSE, Figure 6 shows the price impact of buy market orders for 5 highly capitalized stocks, i.e. AZN, DGE, LLOY, SHEL, and VOD. The price impact is well fit by the relation $E(r|V) \propto V^{\beta}$, where V should now be interpreted as the market order size V = |ω| and β ≃ 0.3.
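The impact function of Equation (9) is simple enough to write down directly. The sketch below is only an illustration; the function name and the convention that a signed volume carries the direction of the trade are assumptions.

```python
import math


def expected_return(volume, beta, lam=1.0):
    """Average price impact sign(V) |V|^beta / lambda for a signed volume V."""
    if volume == 0:
        return 0.0
    return math.copysign(abs(volume) ** beta, volume) / lam
```

With β = 0.3, doubling the volume multiplies the average impact by $2^{0.3} \approx 1.23$ rather than by 2, which is the concavity referred to in the text.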

FIG. 6: Market impact function of buy market orders for a set of 5 highly capitalized stocks traded in the LSE, specifically AZN (filled squares), DGE (empty squares), LLOY (triangles), SHEL (filled circles), and VOD (empty circles). Trades of different sizes are binned together, and the average size of the logarithmic price change for each bin is shown on the vertical axis. The dashed line is the best fit of the market impact of VOD with a functional form described in Eq. (9). The value of the fitted exponent for VOD is β = 0.3.

If one assumes that the price impact is a deterministic function of order size, since market order placement constitutes a long-memory process, the generated price return time series must be long-memory too. We test this conclusion by constructing a synthetic price time series using real market order flow with a deterministic impact function of the form of Equation (9), but with V now representing market order size. We use β = 0.3 as measured for Vodafone in Figure 6, and arbitrarily set λ = 1. For each real market order of volume $V_i$ and sign $\epsilon_i = \pm 1$ we construct the surrogate price shift $\Delta p_i = \epsilon_i V_i^{0.3}$, and construct a surrogate price series iteratively using $p_{i+1} = p_i + \Delta p_i$. In order to qualitatively test the correlation properties and the linear efficiency of this time series we use a diffusion plot. We plot the quantity $E((p_{t+\tau} - p_t)^2)$ as a function of τ. Provided the series has a finite second moment, as this series does, there are three possible cases, depending on the correlation properties of the price shift⁶: (1) When the price shift is uncorrelated $E((p_{t+\tau} - p_t)^2)$ depends linearly on τ. (2) When the price shift is a short-memory process (for example the autocorrelation decays exponentially) $E((p_{t+\tau} - p_t)^2)$ is nonlinear for τ smaller than the characteristic time scale and becomes linear for larger lags. (3) When the price shift is a long-memory process $E((p_{t+\tau} - p_t)^2) \sim \tau^{2H}$, where H is the Hurst exponent. The price time series is linearly efficient when $E((p_{t+\tau} - p_t)^2)$ depends linearly on τ, i.e. case (1). Figure 7 shows $E((p_{t+\tau} - p_t)^2)$ as a function of τ, where $p_t$ is the synthetic price generated as described above and t and τ describe time as measured by the number of events. There is clearly upward curvature, with a slope greater than one, implying that synthetic price returns are described by a long-memory process, in contradiction with the assumption of linear efficiency.
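The construction of the surrogate price series and of the diffusion plot can be sketched as follows. This is a minimal illustration with hypothetical function names; the defaults β = 0.3 and λ = 1 follow the text.

```python
def surrogate_prices(signs, volumes, beta=0.3, lam=1.0, p0=0.0):
    """Build p_{i+1} = p_i + eps_i * V_i^beta / lambda from order signs and sizes."""
    prices = [p0]
    for eps, volume in zip(signs, volumes):
        prices.append(prices[-1] + eps * volume ** beta / lam)
    return prices


def diffusion(prices, lags):
    """Mean squared price change E((p_{t+tau} - p_t)^2) for each lag tau."""
    result = {}
    for tau in lags:
        squared = [(prices[t + tau] - prices[t]) ** 2
                   for t in range(len(prices) - tau)]
        result[tau] = sum(squared) / len(squared)
    return result
```

Two limiting cases help in reading Figure 7. A sequence of identical signs gives a mean squared change growing as $\tau^2$, the extreme of persistence, whereas strictly alternating signs give a mean squared change that does not grow with τ at all; an efficient series lies between the two, with linear growth.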

⁶ A random process whose second moment does not exist can have anomalous diffusion with H ≠ 0.5 even though the autocorrelation function does not exist.


FIG. 7: Diffusion plot of two surrogate time series of prices. The dashed line is the diffusion plot of the surrogate time series obtained by using the real order flow of market orders of Vodafone (volume and sign) and by using a deterministic price impact of Eq. (9) with β = 0.3. The continuous line is the diffusion plot of the surrogate time series obtained from the real price shift due only to market orders for Vodafone. The dotted line is a linear function to be used as a guide for the eye.

There are two possible explanations for how the real price series can be efficient when the surrogate price series defined above is inefficient. We have only used market orders above, so the first possible reason is that the price shifts generated by limit orders and cancellations act to make the market efficient. The second possibility is that the assumption of a deterministic price impact is wrong and efficiency comes about due to fluctuations in the impact (and therefore in the liquidity). The first reason has been recently suggested in Ref. [12]. Their argument is that the price shift due to a market order is anticorrelated with the price shift generated by limit orders and cancellations placed between market orders. We verified empirically that such an anticorrelation does exist. However, as we will show below, the market is efficient even when we include only price shifts driven by market orders. Instead, we show that efficiency is due to fluctuations in liquidity.

To show that efficiency does not depend on limit orders and cancellations, we take the real price shift $\Delta p_i$ due to each market order i and we iteratively construct a surrogate time series $p_{i+1} = p_i + \Delta p_i$ starting with an arbitrary value $p_0$. Figure 7 shows the diffusion plot for this surrogate price time series. It is very close to linear, indicating the price series is linearly efficient. This shows that market order fluctuations are efficient even when considered by themselves, so the explanation of efficiency must lie elsewhere, as discussed in the next subsection.

B. The key role of liquidity fluctuations in market efficiency

Because liquidity varies strongly in time, for understanding market efficiency it is not a good approximation to describe it by a constant average value, as was done in Equation (9). In fact, studies of real data make it quite clear that the fluctuations in liquidity are large in comparison to the volume dependence of $E[\lambda_i|V]$, and that in Equation (9) one should regard λ as a random variable whose fluctuations are at least as large in relative terms as those of Δp [27]. In this section we show that fluctuations in liquidity are critical for market efficiency, and that in fact fluctuations in liquidity are anti-correlated with fluctuations in order signs in just such a way as to keep the market approximately linearly efficient.

We first show that uncorrelated fluctuations in liquidity are not sufficient to ensure linear efficiency. Consider Equation (9) and let us assume that the inverse of the liquidity $\ell_i \equiv 1/\lambda_i$ is a random variable uncorrelated with market order sign and size. In the previous section we have seen that $a_i \equiv \epsilon_i V_i^{\beta}$ is a long-memory random process. Therefore if $E(a_i) = 0$ then $E(\Delta p_i) = 0$, and the auto-covariance of price return is

$$\gamma_{\Delta p}(\tau) = E(\Delta p_{i+\tau}\,\Delta p_i) = E(a_{i+\tau}\,a_i\,\ell_{i+\tau}\,\ell_i) = E(a_{i+\tau}\,a_i)\,E(\ell_{i+\tau}\,\ell_i) = \gamma_a(\tau)\left(\gamma_\ell(\tau) + E(\ell)^2\right) \qquad (10)$$

Now ℓ is by definition a positive quantity and E(ℓ) > 0. Therefore the term in brackets in the last line of Eq. (10) cannot be zero and $\gamma_{\Delta p}(\tau) \neq 0$, i.e. when the liquidity is uncorrelated with the order flow, the market cannot be efficient.
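The step from the positivity of the inverse liquidity to the non-vanishing bracket in Eq. (10) can be made explicit. The short derivation below is a sketch, assuming the inverse liquidity is stationary so that its mean does not depend on the index.

```latex
\begin{align*}
\gamma_\ell(\tau) &= E(\ell_{i+\tau}\,\ell_i) - E(\ell)^2
  && \text{(definition of the auto-covariance)} \\
\gamma_\ell(\tau) + E(\ell)^2 &= E(\ell_{i+\tau}\,\ell_i) > 0
  && \text{(a product of positive quantities has positive mean)} \\
\gamma_{\Delta p}(\tau) &= \gamma_a(\tau)\,E(\ell_{i+\tau}\,\ell_i) \neq 0
  && \text{whenever } \gamma_a(\tau) \neq 0 .
\end{align*}
```

Since $a_i$ is a long-memory process, $\gamma_a(\tau)$ decays as a power law, so the auto-covariance of returns inherits the same slow decay up to a positive factor.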

Now we show that fluctuations in liquidity are not random, but rather are synchronized with the long-memory of order signs in order to make the market roughly linearly efficient. To do this we need a measure of liquidity. We take as our proxy the volume at the best price: for a buy market order we take the volume at the best ask, and for a sell market order we take the volume at the best bid. This can be justified on the grounds that only very few orders penetrate more than one occupied price level [27]. We now repeat the previous experiment, constructing surrogate price series using the real order flow. We know from the previous analysis that the market order sign is a long-memory process. In panel (a) of Figure 8 we plot the autocorrelation of the sign for Vodafone (top curve). We have also seen that the correlation between sign and volume decreases the autocorrelation of the synthetic price shift $\Delta p \propto \epsilon V^{\beta}$, but it is unable to remove the long-memory properties (see the diffusion plot in Figure 7). The bottom curve of Figure 8 (a) is the autocorrelation function of this synthetic time series.

FIG. 8: The autocorrelation function of surrogate price shift series $\Delta p_i$ with and without varying liquidity, based on data from Vodafone. (a) shows two autocorrelation functions on double logarithmic scale, one for $\Delta p_i = \epsilon_i$ (top curve) and one for $\Delta p_i = \epsilon_i V_i^{0.3}$ (bottom curve). $\epsilon_i$ is the sign of the ith market order and $V_i$ is its size. In (b) the autocorrelation is plotted on linear scale, and the increments are $\Delta p_i = \epsilon_i V_i^{0.3}/\lambda_i$, where $\lambda_i$ is the volume at the best price, i.e. the best ask when i is a buy order and the best bid when it is a sell order. Both series in (a) have long-memory, whereas in (b) the autocorrelation function becomes negative within ten lags or so. This demonstrates how liquidity, volume, and order signs interact to maintain linear market efficiency.

When we introduce fluctuations in liquidity through the proxy of volume at the best price, the price time series becomes linearly efficient. To show this we compare the autocorrelations of three surrogate time series, shown in Figure 8. The first has increments $\Delta p_i = \epsilon_i$, and clearly has long-memory. The second has increments $\Delta p_i = \epsilon_i V_i^{0.3}$, and still has long-memory, though less so than before, indicating that fluctuations in market order volume are somewhat anti-correlated with the long-memory in signs. The third has increments $\Delta p_i = \epsilon_i V_i^{0.3}/\lambda_i$, where $\lambda_i$ is the volume at the best price. The inclusion of the time varying liquidity term removes the long-memory, making the autocorrelation decrease sufficiently fast that it begins to have negative values in less than ten lags.
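The comparison of the three surrogate increment series can be sketched as follows. The function names are hypothetical, and the autocorrelation estimator (normalised by the full-sample variance) is one common choice rather than necessarily the one used for Figure 8.

```python
def surrogate_increments(signs, volumes, best_volumes, beta=0.3):
    """Return the three increment series compared in the text.

    best_volumes holds the liquidity proxy lambda_i: the volume at the best
    ask for a buy order and at the best bid for a sell order.
    """
    sign_only = [float(e) for e in signs]
    sign_volume = [e * v ** beta for e, v in zip(signs, volumes)]
    with_liquidity = [e * v ** beta / lam
                      for e, v, lam in zip(signs, volumes, best_volumes)]
    return sign_only, sign_volume, with_liquidity


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / (n * var)
            for k in range(1, max_lag + 1)]
```

Applied to real order flow, the first two series would be expected to show a slowly decaying positive autocorrelation, and the third an autocorrelation that crosses zero within a few lags.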

In more explicit terms, what does this imply about the market? For example, consider a period in which there has been a run of buy market orders. This implies that the next order is more likely to be a buy order. This is compensated for by two effects: First, the expected volume of the next buy market order is smaller than that of the next sell order. Second, it implies that the volume at the best ask is likely to be lower than the volume at the best bid, so that a buy market order of a given size will tend to generate a smaller price response than a sell market order. These two effects work together to nearly cancel the long-memory of the sign process to keep the market linearly efficient, with price increments that do not have long-memory. In describing things this way, we do not mean to necessarily suggest that the long-memory of order signs is primary, and that of volume and liquidity are consequences, but just that they are intimately related: One could equally well say that the long-memory of order signs adjusts in order to offset that of volume and liquidity. The key point is that in order to ensure linear efficiency the memories of these three processes are intimately linked. Thus, since order signs have long-memory, it is no surprise that volume and liquidity also have long-memory, as seen in Figure 9.

FIG. 9: The autocorrelation of the volume at the best prices, shown in double logarithmic scale, as a function of time measured in terms of the number of market orders. The three curves shown, from top to bottom, are the volume at the best ask, best bid, and best price (i.e. the best ask when the order is a buy order and the best bid when the order is a sell order). All three are long-memory processes.

VI. INDIVIDUAL INSTITUTIONS

In this section we consider the behavior of individual institutions in order to gain some understanding of what drives the long-memory processes described above. The LSE database allows us to track the actions of individual institutions through a numerical code which identifies the institution. For privacy reasons the code is different for different stocks and it is reshuffled each calendar month. Therefore our analysis will be limited to a single trading month.

We consider as a case study the market order placement of Vodafone in July 2002. Our choice is motivated by the fact that Vodafone is one of the most heavily traded stocks. In this month there were 45,774 market orders distributed across 155 trading institutions. We have found that the 12 most active institutions are responsible for more than 70% of market orders. Thus, the participation in trading is extremely inhomogeneous among the institutions [28], with few institutions placing many orders and many institutions placing only a few orders. Table II shows the identification code, the number of market orders, and the fraction of market orders that are buy orders for each of the twelve largest institutions. In Figure 10 we show the autocorrelation function of the time series of market order signs for four active institutions. In panel (a) we show two institutions whose market order flow is a long-memory process. One of the two institutions (code 3589) is the most active institution, which placed buy market orders 24% of the time, and the other one (code 1886) placed buy market orders 51% of the time. We see that in both cases the autocorrelation function is well-described by a power law with an exponent α ≃ 0.5, which corresponds to H = 0.75. Panel (b) shows two active institutions (code 3007 and code 823) whose market order sign time series is a short-memory process. To test the hypothesis that the individual market order placement is a long-memory process more rigorously, we apply the modified R/S test to the time series of the market order signs of the twelve most active institutions. Table II reports the value of $Q_n$. A boldface font indicates the cases when $Q_n$ is outside the 95% confidence interval of the null hypothesis of short-memory. We see that for 7 of the 12 active institutions we reject the null hypothesis of short-memory process.

TABLE II: Summary statistics of the 12 most active institutions trading Vodafone in July 2002.

code    number of market orders    fraction of buy    Qn
3589    6065                       0.24               2.27
2146    3929                       0.53               1.33
1666    3606                       0.54               1.99
3664    3532                       0.44               2.85
1556    3291                       0.50               1.88
3007    3132                       0.49               1.06
1886    2154                       0.51               2.35
1994    1530                       0.32               2.66
2196    1512                       0.51               1.50
2681    1420                       0.52               1.99
2742    1247                       0.49               1.66
823     1200                       0.41               1.59
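The summary figures quoted in this section can be checked against Table II with a few lines of code. The sketch below assumes that the 95% acceptance interval for the short-memory null is Lo's tabulated interval [0.809, 1.862]; that interval is an assumption of this example and is not stated in the text.

```python
# Rows of Table II: (code, number of market orders, fraction of buy, Qn).
TABLE_II = [
    (3589, 6065, 0.24, 2.27), (2146, 3929, 0.53, 1.33),
    (1666, 3606, 0.54, 1.99), (3664, 3532, 0.44, 2.85),
    (1556, 3291, 0.50, 1.88), (3007, 3132, 0.49, 1.06),
    (1886, 2154, 0.51, 2.35), (1994, 1530, 0.32, 2.66),
    (2196, 1512, 0.51, 1.50), (2681, 1420, 0.52, 1.99),
    (2742, 1247, 0.49, 1.66), (823, 1200, 0.41, 1.59),
]
TOTAL_MARKET_ORDERS = 45774          # all institutions, from the text
ASSUMED_95_INTERVAL = (0.809, 1.862)  # assumed acceptance interval


def top_share(table, total):
    """Fraction of all market orders placed by the institutions in the table."""
    return sum(row[1] for row in table) / total


def rejected_codes(table, interval=ASSUMED_95_INTERVAL):
    """Codes of institutions whose Qn lies outside the acceptance interval."""
    lower, upper = interval
    return [row[0] for row in table if row[3] < lower or row[3] > upper]
```

Under this assumption the table reproduces both statements in the text: the twelve institutions account for about 71% of market orders, and seven of them fall outside the interval, including codes 3589 and 1886 but not codes 3007 and 823.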

This result shows that even at the institution level the placement of orders has long-memory properties. This is not true for all institutions, but rather there is heterogeneity in their behavior. A correlated sign in the order placement could be an indication of splitting a large order into smaller orders in order to maximize profit without paying too much in terms of price impact. On the other hand an uncorrelated (or at least short-range correlated) sign in the order placement could indicate different strategies such as, for example, market making. In Section VIII A we suggest and discuss possible causes of the long-memory of order flow.

VII. IMPLICATIONS FOR MARKET IMPACT

In this section we discuss a practical consequence oflong-memory of order flow. We have seen in Section V.A

FIG. 10: Autocorrelation function of the market order signtime series for four institutions trading Vodafone in July 2002.In (a) we show two institutions displaying long-range memoryin market order placement. The top curve refers to institution3589 and the bottom one to institution 1886. The dashed linesare the best fit of the empirical curves with a power law. Theexponents are α = 0.47 and α = 0.51, respectively. In (b) weshow two institutions displaying short range order placement,institution 3007 (continuous line) and institution 823 (dashedline). In both panels the lag is measured as the number ofmarket orders.

that the market impact is a concave function of volume.We may therefore ask about the price shift in the future(in transaction time) given that an order of a given vol-ume and sign in the present. To be more specific, letus consider a buy market order of volume V1 occurringnow. The generated price shift ∆p1 is the difference be-tween the midprice just after the order and just beforethe order. Between this market order and the next mar-ket order the midprice can change because of new limitorders and cancellations, generating a price change ∆pi.When the next market order arrives a new midprice shift∆p2 occurs. The total price shift between the instant

Page 15: The Long Memory of the Efficient Market · The long memory of the efficient market Fabrizio Lillo1,2 and J. Doyne Farmer1 1Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501

14

FIG. 11: (a) A decomposition of the average delayed marketimpact of two successive market orders, as defined by Equa-tion (11), for Vodafone. All four elements are conditioned onthe size V1 of the first market order. The immediate impactof the first market order, E(∆p1|V1) is the continuous line;the impact of any intervening limit orders or cancellations,E(∆pi|V1), has triangles;, The immediate impact due to thesecond market order, E(∆p2|V1), is shown with squares, andthe total market impact E(∆p1−2|V1) is shown with circles(b) The average market impact for a series of market orders,conditioned on the volume V1 of the first order, E(∆p1−m|V1).m = 1 (black), m = 2 (red), m = 3 (green), m = 4 (blue),m = 5 (orange) m = 6 (cyan) and m = 10 (magenta). Theaverage market impact builds steadily with each order; this iscaused by the long-memory of the order sign and order size.both panels show the results for buy market orders

just before the first order is placed and the instant justafter the second market order is placed is therefore

∆p1−2 = ∆p1 + ∆pi + ∆p2 (11)

If the order flow were random we would expect thatE(∆p1−2|V1) = E(∆p1|V1) since the volume and the signof the next orders is uncorrelated with the correspond-ing quantities of the first order. In Figure 11 we present

a decomposition of the impact of two successive market orders in the terms described above. In panel (a) of Figure 11 we show the four quantities E(∆p1|V1) (the same quantity shown in Fig. 6), E(∆pi|V1), E(∆p2|V1) and E(∆p1−2|V1). We see that E(∆pi|V1) is almost zero, meaning that the price shift due to limit orders and cancellations after a market order is relatively unimportant. This result suggests that the role of price reversion due to limit orders and cancellations between two market orders is marginal in making the market efficient. On the other hand, E(∆p2|V1) is clearly positive and increasing with V1. This is due to the strong temporal correlation in market order sign and size. In fact, if the first market order is a buy, it is probable that the next market order is also a buy, and the volume of the second market order is correlated with that of the first. Therefore it is more probable that the price will move up due to the arrival of the second order. Figure 11 also shows E(∆p1−2|V1), which is simply the sum of the three terms, as shown in Eq. (11). The distance between E(∆p1−2|V1) and E(∆p1|V1) is a measure of the effect of the correlation of order sign and size on the delayed price impact.
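As a rough illustration of how the four conditional averages in Eq. (11) might be estimated, the sketch below bins events by the first order's volume and averages each component within a bin. The function names, the quantile binning, and the synthetic data (invented numbers, not market data) are all assumptions made for this example; the paper does not specify its binning procedure here.

```python
import numpy as np

def conditional_mean(v1, x, bin_edges):
    """Average x over the events whose v1 falls in each [edge_k, edge_k+1) bin."""
    v1 = np.asarray(v1, dtype=float)
    x = np.asarray(x, dtype=float)
    idx = np.digitize(v1, bin_edges) - 1           # bin index of every event
    means = np.full(len(bin_edges) - 1, np.nan)    # NaN marks an empty bin
    for b in range(len(means)):
        in_bin = idx == b
        if in_bin.any():
            means[b] = x[in_bin].mean()
    return means

def decompose_two_order_impact(v1, dp1, dpi, dp2, bin_edges):
    """Binned estimates of the four conditional averages in Eq. (11)."""
    dp1, dpi, dp2 = (np.asarray(a, dtype=float) for a in (dp1, dpi, dp2))
    return {
        "first": conditional_mean(v1, dp1, bin_edges),
        "between": conditional_mean(v1, dpi, bin_edges),
        "second": conditional_mean(v1, dp2, bin_edges),
        "total": conditional_mean(v1, dp1 + dpi + dp2, bin_edges),
    }

# Synthetic illustration only: these numbers are invented, not market data.
rng = np.random.default_rng(0)
n = 20_000
v1 = rng.lognormal(mean=0.0, sigma=1.0, size=n)
dp1 = 0.10 * v1**0.3 + rng.normal(0.0, 0.05, n)    # concave immediate impact
dpi = rng.normal(0.0, 0.05, n)                      # no systematic drift between orders
dp2 = 0.05 * v1**0.3 + rng.normal(0.0, 0.05, n)    # positive because orders are correlated
edges = np.quantile(v1, np.linspace(0.0, 1.0, 6))   # five equally populated bins
parts = decompose_two_order_impact(v1, dp1, dpi, dp2, edges)
```

Because the conditional average is linear, `parts["total"]` equals the sum of the other three entries bin by bin, which mirrors the statement that E(∆p1−2|V1) is simply the sum of the three terms.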

To extend this analysis to more orders, we study the delayed market impact E(∆p1−m|V1), where m is the number of future market orders. Panel (b) of Figure 11 shows this quantity as a function of V1 for m = 2, 3, 4, 5, 6 and m = 10. We see that for a fixed value of V1, E(∆p1−m|V1) is an increasing function of m. This is clearly due to the long-range correlation of market order sign and size. Eventually, for large values of m, E(∆p1−m|V1) becomes independent of m.
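The way persistent order signs make the delayed impact grow with m can be seen in a toy calculation. The sketch below is only an assumption-laden stand-in: signs follow a two-state Markov chain (which has short, exponentially decaying memory, not the long memory reported in the paper), every order moves the midprice by a fixed hypothetical amount `tick`, nothing happens between orders, and the conditioning is on the first order being a buy rather than on its volume V1.

```python
import numpy as np

def delayed_impact(p_before, p_after, m):
    """Price shift from just before order t to just after order t + m - 1."""
    p_before = np.asarray(p_before, dtype=float)
    p_after = np.asarray(p_after, dtype=float)
    n = len(p_before)
    return p_after[m - 1:] - p_before[: n - m + 1]

def persistent_signs(n, stay_prob, rng):
    """Toy +1/-1 signs: each order repeats the previous sign with probability stay_prob."""
    flips = rng.random(n) >= stay_prob
    flips[0] = False                                # the first order has nothing to flip
    parity = np.cumsum(flips) % 2                   # 0 = same as first sign, 1 = opposite
    first = 1 if rng.random() < 0.5 else -1
    return first * np.where(parity == 0, 1, -1)

def mean_impact_after_buy(signs, p_before, p_after, m):
    """Average delayed impact over windows whose first order is a buy."""
    d = delayed_impact(p_before, p_after, m)
    return float(d[signs[: len(d)] == 1].mean())

rng = np.random.default_rng(1)
signs = persistent_signs(50_000, stay_prob=0.8, rng=rng)
tick = 0.01                                         # hypothetical fixed impact per order
p_after = np.cumsum(tick * signs)                   # midprice just after each order
p_before = p_after - tick * signs                   # midprice just before each order
impact = {m: mean_impact_after_buy(signs, p_before, p_after, m) for m in (1, 2, 5, 10)}
```

In this toy the average impact after a buy rises with m and then levels off, but the levelling off comes from the exponential decay of the Markov correlations. It should not be read as a model of the liquidity mechanism discussed in the text.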

Our results lead to conclusions that are substantially different from those of Bouchaud et al. [12]. They have suggested that placement of limit orders by market makers may offset the impact of market orders. We suggest instead that this is due to fluctuations in liquidity, and demonstrate that limit order placement plays a fairly minor role in delayed impact, at least over a short time range. We also show that the fluctuations in the liquidity proxy eliminate the long memory in prices, and hence in impact, over a fairly short time range.

VIII. CONCLUSIONS

A. Possible causes of long-memory order flow

We have shown that the sign of order flow is a long-memory process. What might cause this? In this section we make a few speculations about the possible origin of long-memory.

One possible explanation for long-memory in order flow is that it simply reflects news arrival. Good (or bad) news may be clustered in time, driving the sign of order flow. Such news could either be external to the market, or it could be generated by factors internal to the market. If external, it could be a property of the natural world, a reflection of the environment that humans necessarily interact with. We know that floods, hurricanes, earthquakes, and other natural disasters have a power law distribution, and perhaps these are just symptoms of a ubiquitous property of the natural world that is reflected in what we consider “news”. Alternatively, this could be an internal property, due to human social dynamics. Such “news” might be internally generated, e.g. due to herding behavior [33], or it might be caused by inattention: time lags in the response of investors to news arrival can cause autocorrelations in order flow. However, it is not clear why this should have a power law distribution.

A different explanation is in terms of the execution of large orders, which leads to order splitting. It is well known that institutions with large orders frequently split them into small pieces [34, 35], spreading out the execution of the orders over periods that can be many months long. If the sizes of such orders have a power law distribution, and the time needed to fully execute an order is proportional to the size of the order, then this might give rise to power law autocorrelations in time. The idea that order size would have a power law distribution is not implausible given that many related quantities, such as firm size, wealth and mutual fund size, have power law distributions [36–38].
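The order-splitting argument can be explored with a small simulation. This is a toy sketch under strong assumptions that are not taken from the paper: hidden orders are executed strictly one after another, each is cut into unit-size pieces so that execution time is proportional to size, hidden-order signs are independent coin flips, and sizes are drawn either from a truncated power law with a hypothetical tail exponent `alpha = 1.5` or, for comparison, from a geometric distribution with a similar mean.

```python
import numpy as np

def pareto_sizes(n, alpha, rng, max_size=10_000):
    """Integer order sizes with P(size > s) = s**(-alpha), truncated at max_size."""
    u = 1.0 - rng.random(n)                         # uniform on (0, 1], avoids division by zero
    return np.minimum(np.ceil(u ** (-1.0 / alpha)), max_size).astype(int)

def split_order_signs(sizes, rng):
    """Execute hidden orders one after another, one unit-size market order per step."""
    hidden_signs = rng.choice([-1, 1], size=len(sizes))
    return np.repeat(hidden_signs, sizes)

def sample_autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(2)
n_hidden = 50_000
heavy = split_order_signs(pareto_sizes(n_hidden, alpha=1.5, rng=rng), rng)
light = split_order_signs(rng.geometric(p=1 / 3.6, size=n_hidden), rng)  # similar mean size

lags = (1, 5, 20)
acf_heavy = {k: sample_autocorrelation(heavy, k) for k in lags}
acf_light = {k: sample_autocorrelation(light, k) for k in lags}
```

In this toy, two market orders share a sign on average only when they belong to the same hidden order, so the sign autocorrelation at a given lag tracks the chance that a hidden order lasts at least that long. With power-law sizes that chance decays slowly with lag, whereas with geometric sizes it decays exponentially, which is the contrast the paragraph above speculates about.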

We have tended to discuss the autocorrelation of order signs as though this were a primary property, and the behavior of volume and liquidity as consequences, which must exist in order to maintain market efficiency. An alternative is that this reasoning is reversed, and that the autocorrelations of volume and liquidity are primary properties, and that of order signs is a consequence. However, it still remains to be determined why these should have such strong temporal autocorrelations. At this stage we simply don’t know what causes this phenomenon, but it is clearly a remarkable aspect of human economic activity that deserves more attention.

B. Summary

In this paper we have conclusively demonstrated that the signs of orders for 20 large British stocks are long-memory processes. This is separately true for market orders, limit orders, and cancellations. This long-memory behavior is surprising because it indicates that the signs of future orders are highly predictable. This predictability does not apply to price movements, however, because liquidity and market order size compensate in an anti-correlated manner. As a result, both volume of orders and volume at the best prices are also long-memory processes. Thus, despite the striking predictability of every feature of the market except prices, the market nonetheless appears to remain roughly linearly efficient.

Our analysis argues that the consequences of long-memory in order signs are somewhat different from those hypothesized by Bouchaud et al. [12]. The key difference concerns the way in which liquidity is treated. They assume a deterministic propagator for market impact, which amounts to assuming that liquidity is fixed. In contrast, we demonstrate that liquidity is time-varying in a manner that is anti-correlated with order signs. This, together with time variations in market order volume, is crucial in preventing delayed market impact (e.g. as shown in Figure 11) from being infinitely persistent.

We have not provided an explanation of why these long-memory properties exist. However, we do perform a breakdown of the order flow for the most active institutions, and show that some of them display the long-memory property quite clearly, while others do not show it at all. This shows that the effect is due to the behavior of certain institutions, and not others. It might, for example, be caused by strategic order splitting, to minimize transaction costs. However, this is just speculation, and in any case does not make it clear why this would cause power law autocorrelations, rather than exponential correlations with a long time constant. We intend to make a more thorough study of the institutional properties in the future [28].
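The contrast between power law and exponential autocorrelations can be written out explicitly. The lines below are a sketch based on the standard definition of long-memory (see e.g. Beran [1]); the symbols C(τ) for the autocorrelation at lag τ, γ for the decay exponent, and τ0 for the time constant are notation assumed for this note and may differ from the notation used elsewhere in the paper.

```latex
\begin{align}
C(\tau) &\sim \tau^{-\gamma}, \quad 0 < \gamma < 1
  &&\Longrightarrow\quad \sum_{\tau=1}^{\infty} C(\tau) = \infty
  \quad \text{(long-memory)}, \\
C(\tau) &= e^{-\tau/\tau_0}
  &&\Longrightarrow\quad \sum_{\tau=1}^{\infty} C(\tau) = \frac{1}{e^{1/\tau_0} - 1} < \infty
  \quad \text{(short memory)}.
\end{align}
```

For a large time constant the second sum is approximately τ0, so an exponential correlation can be long-lived yet still summable, whereas the power law case has no characteristic time scale at all.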

Acknowledgments

We would like to thank the James S. McDonnell Foundation for their Studying Complex Systems Research Award, Credit Suisse First Boston, McKinsey Corporation, Bob Maxfield, and Bill Miller for supporting this research. We would also like to thank Dick Foster for valuable conversations and Marcus Daniels for technical support.

[1] Beran, J. (1994). Statistics for Long-Memory Processes. Chapman & Hall.

[2] Mandelbrot, B.B. (1971). When Can Price Be Arbitraged Efficiently? A Limit to the Validity of the Random Walk and Martingale Models. Review of Economics and Statistics 53, 225-236.

[3] Lo, A.W. (1991). Long-term memory in stock market prices. Econometrica 59, 1279-1313.

[4] Willinger, W., Taqqu, M.S., and Teverovsky, V. (1999). Stock market prices and long-range dependence. Finance and Stochastics 3, 1-13.

[5] Campbell, J.Y., Lo, A.W., and MacKinlay, A.C. (1997). The Econometrics of Financial Markets. Princeton University Press, Princeton.

[6] Ding, Z., Granger, C.W.J., and Engle, R. (1993). A long memory property of stock market returns and a new model. Journal of Empirical Finance 1, 83-106.

[7] Baillie, R.T., Bollerslev, T., and Mikkelsen, H.O. (1996). Fractionally integrated generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 74, 3-30.

[8] Lobato, I.N. and Velasco, C. (2000). Long-Memory in Stock-Market Trading Volume. Journal of Business & Economic Statistics 18, 410-427.

[9] Mandelbrot, B.B. and van Ness, J.W. (1968). Fractional Brownian motions, fractional noises and applications. SIAM Review 10, No. 4, 422-437.

[10] Granger, C.W.J. and Joyeux, R. (1980). An introduction to long-range time series models and fractional differencing. Journal of Time Series Analysis 1, 15-30.

[11] Hosking, J.R.M. (1981). Fractional differencing. Biometrika 68, 165-176.

[12] Bouchaud, J.-P., Gefen, Y., Potters, M., and Wyart, M. (2003). Fluctuations and response in financial markets: the subtle nature of ’random’ price changes. xxx.lanl.gov/cond-mat/0307332.

[13] Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer-Verlag, Berlin Heidelberg.

[14] Hurst, H. (1951). Long Term Storage Capacity of Reservoirs. Transactions of the American Society of Civil Engineers 116, 770-799.

[15] Mandelbrot, B.B. (1972). Statistical Methodology for Non-Periodic Cycles: From the Covariance to R/S Analysis. Annals of Economic and Social Measurement 1, 259-290.

[16] Mandelbrot, B.B. (1975). Limit Theorems on the Self-Normalized Range for Weakly and Strongly Dependent Processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete 31, 271-285.

[17] Teverovsky, V., Taqqu, M.S., and Willinger, W. (1999). A critical look at Lo’s modified R/S statistic. Journal of Statistical Planning and Inference 80, 211-227.

[18] Taqqu, M.S., Teverovsky, V., and Willinger, W. (1995). Estimators for long-range dependence: an empirical study. Fractals 3, 785-788.

[19] Peng, C.-K., Buldyrev, S.V., Havlin, S., Simons, M., Stanley, H.E., and Goldberger, A.L. (1994). Mosaic organization of DNA nucleotides. Physical Review E 49, 1685-1689.

[20] Hausman, J.A. and Lo, A.W. (1992). An ordered probit analysis of transaction stock prices. Journal of Financial Economics 31, 319-379.

[21] Farmer, J.D. (1996). “Slippage 1996”, Prediction Company internal technical report, http://www.predict.com/jdf/slippage.pdf

[22] Lillo, F., Farmer, J.D., and Mantegna, R.N. (2003). Master curve for price impact function. Nature 421, 129-130.

[23] Potters, M. and Bouchaud, J.-P. (2002). More statistical properties of order books and price impact. Physica A 324, 133-140.

[24] Torre, N. (1997). BARRA Market Impact Model Handbook. BARRA Inc, Berkeley CA, www.barra.com.

[25] Kempf, A. and Korn, O. (1998). “Market depth and order size”, University of Mannheim technical report.

[26] Plerou, V., Gopikrishnan, P., Gabaix, X., and Stanley, H.E. (2002). Quantifying stock price response to demand fluctuations. Physical Review E 66, 027104.

[27] Farmer, J.D., Gillemot, L., Lillo, F., Mike, S., and Sen, A., in preparation.

[28] Lillo, F. et al. Agents in the market (in preparation).

[29] Gopikrishnan, P., Plerou, V., Gabaix, X., and Stanley, H.E. (2000). Statistical properties of share volume traded in financial markets. Physical Review E 62, R4493-R4496.

[30] Farmer, J.D. and Lillo, F. (2003). On the origin of power-law tails in price fluctuations. xxx.lanl.gov/cond-mat/0309416.

[31] Fama, E. (1970). Efficient capital markets: A review of theory and empirical work. Journal of Finance 25, 383-417.

[32] Lee, C. and Ready, M. (1991). Inferring Trade Direction from Intraday Data. Journal of Finance 46, 733-746.

[33] Cont, R. and Bouchaud, J.-P. (2000). Herd behavior and aggregate fluctuations in financial markets. Macroeconomic Dynamics 4, 170-196.

[34] Chan, L.K.C. and Lakonishok, J. (1993). Institutional trades and intraday stock price behavior. Journal of Financial Economics 33, 173-199.

[35] Chan, L.K.C. and Lakonishok, J. (1995). The behavior of stock prices around institutional trades. Journal of Finance 50, 1147-1174.

[36] Axtell, R. (2001). Zipf distribution of U.S. firm size. Science 293, 1818-1820.

[37] Pareto, V. (1896). Cours d’économie politique. Reprinted as a volume of Oeuvres Complètes (Droz, Geneva, 1896-1965).

[38] Gabaix, X., Gopikrishnan, P., Plerou, V., and Stanley, H.E. (2003). A theory for power-law distributions in financial market fluctuations. Nature 423, 267-270.