41
Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei February 22, 2017 Abstract How much of the structure of a Limit Order Book (LOB) by only observing the bid/ask price dynamics? In this paper we provide a model which, surprisingly, allows us to recover with reasonable empirical accuracy the general profile of the LOB in the setting of some small-tick stocks. Our approach exploits the empirically observed multi-scale dynamics of the LOB and we also apply such multi-scale analysis to obtain a jump diffusion model which connects the distribution of jumps with the distribution of orders inside the LOB. 1 Introduction The Limit Order Book (LOB) is used in most of today’s financial markets to match buyers and sellers. In this light, LOB plays a substantial role in financial trading activities. The survey paper [10] provides a comprehensive review of the LOB, including its basic mechanisms, empirical features observed in different markets and LOB models developed in the literature. As a trading venue, the LOB contains information about the potential sellers and buyers on the market, reflecting their perspectives on the value of the underlying security. Therefore, LOB information is important for the design of trading strategies (see, for example, [21]). However, the information of the whole LOB is not always available to the traders. In some cases, such as the equity market, the LOB data is not free; while in other cases, it is not open to the public at all. For example, in some electronic brokerage systems of the foreign exchange market, participants can only see the best price levels (the bid and ask prices) with associate order volume (see [20]). In contrast, trade and quote (TAQ) information, including the bid/ask price, sizes and prices of the trades, is much easier to access in financial markets. The main contribution of this paper is to demonstrate that a substantial portion of the LOB persists at sufficiently large time scales. As a consequence of this persistence, and due to the high frequency of the trades that occur during such time scales, it is possible to recover a substantial amount of information of the LOB only from TAQ data. This is far from trivial, because the LOB contains much more information than the TAQ data. In particular, the LOB includes all available price levels for both the buy and sell sides, and the order volumes at each price level. To perform this recovery, we start with a queueing model for the LOB dynamics. With the introduction of multiple time scales in the model, we are able to build a direct and analytic connec- tion between the characteristics of the whole LOB and the bid/ask price process, observed at the time of trade. By such a connection, we are able to recover the characteristics of the LOB from the TAQ data which contains the bid/ask price information we need. Our approach is mostly useful for the so-called small-tick stocks: stocks which tend to have a persistent relatively large spread. As examples, we study 3 stocks, Amazon.com (ticker: AMZN), Goldman Sachs (ticker: GS) and Baidu, Inc (ticker: BIDU), to test our recovery method. In our numerical experiments, we show 1

Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Unraveling Limit Order Books Using Just Bid/Ask Prices

Jose Blanchet, Xinyun Chen and Yanan Pei

February 22, 2017

Abstract

How much of the structure of a Limit Order Book (LOB) by only observing the bid/askprice dynamics? In this paper we provide a model which, surprisingly, allows us to recover withreasonable empirical accuracy the general profile of the LOB in the setting of some small-tickstocks. Our approach exploits the empirically observed multi-scale dynamics of the LOB andwe also apply such multi-scale analysis to obtain a jump diffusion model which connects thedistribution of jumps with the distribution of orders inside the LOB.

1 Introduction

The Limit Order Book (LOB) is used in most of today’s financial markets to match buyers andsellers. In this light, LOB plays a substantial role in financial trading activities. The survey paper[10] provides a comprehensive review of the LOB, including its basic mechanisms, empirical featuresobserved in different markets and LOB models developed in the literature.

As a trading venue, the LOB contains information about the potential sellers and buyers onthe market, reflecting their perspectives on the value of the underlying security. Therefore, LOBinformation is important for the design of trading strategies (see, for example, [21]). However, theinformation of the whole LOB is not always available to the traders. In some cases, such as theequity market, the LOB data is not free; while in other cases, it is not open to the public at all.For example, in some electronic brokerage systems of the foreign exchange market, participants canonly see the best price levels (the bid and ask prices) with associate order volume (see [20]). Incontrast, trade and quote (TAQ) information, including the bid/ask price, sizes and prices of thetrades, is much easier to access in financial markets.

The main contribution of this paper is to demonstrate that a substantial portion of the LOBpersists at sufficiently large time scales. As a consequence of this persistence, and due to the highfrequency of the trades that occur during such time scales, it is possible to recover a substantialamount of information of the LOB only from TAQ data. This is far from trivial, because the LOBcontains much more information than the TAQ data. In particular, the LOB includes all availableprice levels for both the buy and sell sides, and the order volumes at each price level.

To perform this recovery, we start with a queueing model for the LOB dynamics. With theintroduction of multiple time scales in the model, we are able to build a direct and analytic connec-tion between the characteristics of the whole LOB and the bid/ask price process, observed at thetime of trade. By such a connection, we are able to recover the characteristics of the LOB from theTAQ data which contains the bid/ask price information we need. Our approach is mostly usefulfor the so-called small-tick stocks: stocks which tend to have a persistent relatively large spread.As examples, we study 3 stocks, Amazon.com (ticker: AMZN), Goldman Sachs (ticker: GS) andBaidu, Inc (ticker: BIDU), to test our recovery method. In our numerical experiments, we show

1

Page 2: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

that the average number of orders at each price level in the order book can be reconstructed withsignificant accuracy by observing basically the bid/ask price dynamics.

This insight already provides a substantial value to traders who do not have access to the entireLOB information. We believe that our ideas can be used in trading systems in which order booksare maintained but only TAQ information is publicly available.

Moreover, we demonstrate in Section 6 how in long time scales (several minutes) the distributionof the LOB could manifest itself in the appearance of jumps in the limiting continuous-time priceprocess. We derive a bivariate jump diffusion model for the joint evolution of the bid/ask prices, asobserved at the times of trades in our LOB model. The volatility, jump distribution and intensityof the limiting process are linked to the structure of the LOB. To our best knowledge, this is thefirst paper that studies the connection between the price jumps and the order flow dynamics froma micro-structure / high-frequency perspective.

While our results are novel, the starting point of our approach, in particular a queueing modelof the LOB dynamics, is closely related to various works in high frequency trading literature. Forexample, Markovian queue models are studied in [23], [6], [1], [5], [19], [18], [3] and [24]. Paper [2]generalizes the Markovian queues and uses Hawkes process to capture the autocorrelation in orderflows as observed in real markets. Contributions [16] and [15] develop a class of state-dependentmodels to capture traders’ responses to the market depth. Given the complicated interplay betweenthe order flow and price to change on the LOB, it is usually difficult to obtain an analyticalexpression of the price process statistics from the LOB queueing models. Hence, more restrictionor model simplification are imposed to obtain analytic results on the price process. In [5], theauthors consider only the bid and ask price levels of the LOB, and provide an expression of theprice volatility in terms of the arrival and cancellation rates of orders. In [3] and [19], the authorsconsider only one side of the LOB to derive analytic characteristics of the price process. To recoverthe LOB structure from the TAQ information, however, we must connect the bid/ask price processwith the whole LOB dynamics. To this end, we introduce three different time scales in our queueingmodel for the LOB:

(a) limit order event times (limit order arrivals and cancellations, fractions of a second),(b) trade times (market order arrivals, usually around 10 seconds), and(c) after many trades (a few minutes).Our model is postulated at time scale (a), then we derive a simplified model at time scale (b),

and finally we derive continuous-time model which is suitable for time scale (c). We emphasizethat our time scalings are all motivated by empirical findings, as discussed in Section 2.2.

Under time scale (a), we model the dynamics of the LOB as a Markovian multi-class queueingsystem. The customers are the limit orders and they are waiting to be traded. This queueing systemhas two servers, corresponding to the bid and ask sides of the LOB. On each side, the service isprovided by the market orders, as they cause transactions against limit orders immediately upontheir arrivals, and the classes of the limit orders are determined by their prices. The highest servicepriority is given to the buy (sell) limit orders at the highest (lowest) price, and orders at the sameprice are served FIFO. While waiting, the limit orders can abandon the system at certain ratesdepending on their prices. These are the main ingredients of the queueing model. Of course,there are other aspects that are relevant and that we shall discuss in the body of the paper. Forexample, about the Markovian assumption (which implies Poisson arrivals), as we shall explainlater, our analysis will allow us to ultimately accommodate self-exciting arrival processes and otherextensions. At this moment, we only wish to convey the main elements to conceptually describehow our insights are obtained.

In the context of time scale (b) (the model at trade times), we observe empirically that thearrival rates of limit orders (i.e. orders inside the book) occur at a much faster rate than market

2

Page 3: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

orders (i.e. trade transactions). We introduce a scaling that allows us to prove an averagingprinciple for the Markovian multi-class queueing system at the times of trades (i.e. at the arrivaltimes of market orders). Such averaging principle allows us to compute (in the asymptotic limit)analytically the distribution of the bid/ask price changes between two consequent trades, as afunction of the model parameters (such as the limit orders’ arrival rates and cancellation rates overthe whole LOB). By these analytical results, we are able to provide a one-to-one correspondencebetween the bid/ask price changes, and the structure of the LOB.

Under time scale (c), we derive a jump-diffusion limit of the bid/ask price process in our LOBmodel. Interesting technical tools are needed to characterize the limiting process. For example, wehave to introduce a local-time like “pushing” process in the description of the joint dynamics ofthe bid/ask price process to make sure that the spread is kept non-negative.

The paper is organized as follows. In Section 2, we discuss the queueing model of the LOBunder time scale (a). In Section 3, we do an asymptotic analysis of the queueing model under timescale (b), and obtain the analytic results connecting the price change distribution and the LOBstructure. In Section 4, we use the analytic results to study the daily trading data of AMZN, GSand BIDU. In Section 5, we discuss various extensions which can be easily accommodated withinour multiscale analysis framework. Finally, in Section 6, we derive the macroscopic price dynamicson the LOB characterized by a pair of jump-diffusion processes under time scale (c).

2 The High Frequency Model

2.1 Model and Assumptions

In our queueing model, the LOB at time t is represented by vector q (t) such that the i-th coordinatefor i > 0, qi(t), is the number of sell limit orders that are waiting in the LOB at time t at price iδ.The number of buy limit orders at iδ are represented with a negative sign qi (t). We shall imposeassumptions to make sure that there we do not have a situation in which qj (t) < 0 and qi (t) > 0for i < j because this will mean that there are standing orders to be sold at a price, iδ, which islower than the price jδ at which |qj (t)| buy orders are standing in the book.

The parameter δ is the so-called tick size of the LOB. For most U.S. listed stocks, δ equals to1 cent. Figure ?? illustrates the representation of q (t).

For each limit order (ask or bid, respectively) we define its relative price as the differencebetween its own price and the corresponding benchmark (ask or bid) prices, as we shall specifywhen we describe the arrivals and cancellations of limit orders.

The benchmark price of all sell limit orders at time t are computed as follows. First, define

a (t) := a (q (t)) := min{iδ > 0 : qi (t) > 0},

if qi (t) ≤ 0 for all i > 0, then a (q (t)) =∞. Next define

b (t) := b (q (t)) := max{iδ > 0 : qi (t) < 0},if qi (t) ≥ 0 for all i > 0 then set b (q (t)) = −∞.In our model, the dynamics of the limit orders will depend on their relative prices, instead

of their actual prices. As a limit order’s relative price changes with the benchmark prices, thisdependence reflects traders’ response towards the change of market condition.

We shall assume that all types of orders are of the same size, or equivalently, of one unit ofshares. The dynamics of a LOB is driven by the following three types of events:

3

Page 4: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

i�

q(t)

5.21 5.22 5.23 5.24 5.25

5.195.185.175.165.15 · · ·· · ·

sell side

buy side

Figure 1: Illustration of the Queueing Model for LOB at time t

1. Limit order arrivals: a new limit order is inserted to the LOB.

2. Limit order cancellations: an existing limit order is removed from the LOB.

3. Market order arrivals: a market order gets immediately executed with an opposite limit ordersitting at the best price level upon its arrival.

Now we shall describe the stochastic processes used in our model corresponding to the threetypes of events.

Arrivals of Market Orders:

In order to capture the autocorrelation observed in the time series of trading volumes on bothsides, we shall model the arrivals of market orders by a 2-dimensional Hawkes processes. In partic-ular, let µa(t) (µb(t)) be the arrival rate of sell (buy) market orders and Na(t) (N b(t)) be the totalnumber of sell (buy) market orders that have arrived by time t. The arrival rates at t ≥ 0 satisfy

µa(t) = e−κt(µa(0) + µ

∫ t

0eκss+

∫ t

0δ1e

κsdNa(s) +

∫ t

0δ2e

κsdN b(s)

), (1)

µb(t) = e−κt(µb(0) + µ

∫ t

0eκss+

∫ t

0δ2e

κsdNa(s) +

∫ t

0δ1e

κsdN b(s)

),

4

Page 5: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

with κ > 0, µ > 0, δ1, δ2 > 0. To avoid explosive behavior in the market order arrival rates, weassume δ1 + δ2 < κ, see [13] for stability properties of (1).

Let {tk} be the sequence of market order arrival times. The dynamics of the limit orders isdescribed as follows:

Arrivals and Cancellations of Limit Orders:

L1. Arrivals of limit sell and buy orders are modelled as two independent Poisson processes withrates λa and λb, respectively.

L2. During the time interval [tk, tk+1), the benchmark prices for sell and buy limit orders area(tk) and b(tk) respectively. Note that a sell limit order placed at relative price of iδ at timet means that its price equals to a(tk) + iδ, and a buy limit order placed at relative price of iδat time t means that its price equals to b(tk)− iδ.

L3. A limit sell (buy) order, arriving during the time interval [tk, tk+1), will choose its relativeprice equal to iδ with probability pa(iδ; a(tk), b(tk)) (pb(iδ; a(tk), b(tk))) upon its arrival.

L4. We assume that pa(iδ; a, b) = pb(iδ; a, b) = 0 for all iδ ≤ −(a− b)/2, so that the actual priceof an arriving limit sell (buy) order, namely a + iδ (b − iδ) is greater (less) than (a + b)/2.Consequently, we have that∑

iδ>−(a−b)/2

pa(iδ; a, b) = 1 =∑

iδ>−(a−b)/2

pb(iδ; a, b).

L5. During the time interval [tk, tk+1), sell (buy) limit orders waiting at relative price of iδ arecancelled at rate αa(iδ; a(tk), b(tk)) > 0 (αb(iδ; a(tk), b(tk)) > 0).

L6. The arrivals of limit orders and the price selection mechanism of the limit orders are mutuallyindependent. Limit order arrivals are also independent of market order arrivals.

Remark: Our treatment allows to weaken the assumption that all market orders are of size 1and allow market orders which arrive in batches, as long as the size of the incoming market orderbatch is less than the volume of standing limit orders at the best price level. Additional extensions,which accommodate more complex dependence between the arrivals of market and limit orders isstudied in Section 5.

Assumption L2 and L4 play important roles in the model analysis. Under Assumption L4, theask and bid sides of the LOB can be treated as two separate systems. Under Assumption L2,between the arrivals of two consequent market orders, the benchmark prices remain unchanged.As a result, each side of the LOB evolves just as a set of independent infinite-server queues. Eachinfinite-server queue corresponds to a price level, and the queue consists of the limit orders at thatprice level. The arrival rate of the infinite-server queue is just the arrival rates of limit ordersat the corresponding price level, and the “service rate” is equal to the cancellation rate of thecorresponding class.

Before we move on into the model analysis part, we shall briefly discuss the empirical supportfor us to impose these assumptions.

5

Page 6: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

2.2 Discussion of model assumptions

Assumption L2 states that the arrival and cancellation rates of limit orders depend on the bid/askprices at the time of the previous trade, instead of depending on the real-time bid/ask price, as inthe models studied in [6], [16] and [2]. We impose this assumption in order to model a scenarioin which market participants retain a rich amount of market information (i.e. the actual pricesduring transactions) while trying to reduce variability on the amount of information used to postand cancel orders. The real-time bid/ask price of limit orders can change without trade and tendto be noisy. Indeed, due to the growing popularity of algorithmic trading, limit orders are put andcancelled without being executed at high frequency, especially for those inside the spread ([12]).Therefore, the continuous observation of the bid/ask prices may result in a process with too muchmicrostructure “noise” due to variability caused by cancellations of such fleeting orders. In the restof the paper, we shall call the bid/ask price process observed at the times of trades the bid/askprice per trade.

Assumption L4 states that the arriving limit orders never cross the mid-price at the time ofthe previous trade. The assumption largely simplifies our model analysis. We believe that thisassumption is reasonable for the so-called small-tick stocks we shall study. This is because, whenthe spread is persistently large, it is expensive to put limit orders crossing the mid-price. In ourdata, we find that the proportion of limit orders which cross the mid-price is small for the threestocks we study. Table 1 gives the average percentage of cross mid-price limit orders for AMZN,BIDU and GS among the 20 trading days in May of 2011. GS has a tighter spread and hence itscross-mid-price percentage is slightly larger.

Table 1: Descriptive Statistics

Statistics (Average of May 2011) AMZN BIDU GS

Number of Limit Order Submission 311143.10 299579.57 217414.62

Number of Trades 22386.71 23661.71 13319.29

Percentage of Hidden Trades 20.53% 22.08% 16.62%

Tick Size (cents) 1 1 1

Average Spread (cents) 7.41 6.11 3.83

Daily Cross Mid-Price Limit Order Percentage 4.93% 5.31% 7.69%

Cancel Without Execution LO Percentage 95.50% 94.48% 95.39%

3 Averaging Principle: Connecting Order Flows with Price Re-turns

In this section, we derive an analytic characterization of the bid/ask price per trade processesin terms of the model parameters. The result is obtained using the stochastic-average principleunder an asymptotic regime as suggested by empirical data. We shall first report the empiricalobservations that motivate our asymptotic analysis.

3.1 Empirical observations

Empirical Observation 1: Multi-Scale Evolution of Limit Order Flows and the Occur-rence of Trades.

6

Page 7: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

The contrast between the daily number of limit order events (including the submission andcancellation of limit orders) and the daily number of trades has been documented in LOB literature(see, for instance, [7]). We also summarize our empirical observation on the AMZN, BIDU and GSdata in Table 1 and provide basic descriptive statistics which support the fact that the evolutionof limit orders is much more frequent than the arrivals of market orders in the LOB.

Empirical Observation 2: Fleeting Orders and High Cancellation Rate.Due to the prevalence of algorithmic trading in these days, the cancellation rate of limit orders

has increased dramatically over recent years. For example, [11] reports the cancellation rate hasincreased dramatically, while [12] finds that in the year of 2010, about 95% of the limit orderswere canceled without execution in the NASDAQ market. Table 1 reports the monthly averagesof percentages of limit orders that are cancelled without (partial) execution in AMZN, BIDU andGS during May of 2011, and it also illustrates the high cancellation rate phenomenon in the mostrecent high frequency trading environment.

The high cancellation rate can be attributed to the large proportion of “fleeting” limit orders,which are usually put inside the spread and then canceled immediately if not being executed rightaway. Hautsch and Huang [12] reports that in 2010 NASDAQ data, the mean inter-arrival time ofmarket orders is about 7 seconds while the mean cancellation time of limit orders inside the spreadis less than 0.2 second.

3.2 Price change distribution

Before stating the formal results on the price change distribution, we briefly discuss the intuitionbehind the results and the proofs. Under Assumption L4, we can analyze the bid and ask sides ofthe LOB separately. So we shall focus on the ask side in the discussion, since the bid side followsthe same argument.

Following the empirical observations, we will assume that the arrival rates λa and cancellationrates αa(·) are substantially higher than the arrival rates of market orders µa(·) in our model. Wenow want to study the distribution of the ask price changes per trade, i.e. a(tk+1) − a(tk), where{tk} are the time of trades. Under our assumptions, between the arrivals of two consecutive marketorders, the dynamic of the ask side of the LOB is equivalent to a set of independent infinite-serverqueues. With λa >> µa(·) and αa(·) >> µa(·) as suggested by the Empirical Observations 1 and2, heuristically, we can approximate the distribution of the queues in the LOB at time tk+1 (giventhe state of the system at time tk) by the associated steady-state distribution.

Note that between time tk and tk+1, the infinite-server queue of limit orders at relative priceiδ has arrival rate λapa(iδ; a(tk), b(tk)) and cancellation rate αa(iδ; a(tk), b(tk)). As a result, thesteady-state distribution of the number of standing limit orders at relative price iδ is Poisson,with the mean equal to the arrival rate λapa(iδ; a(tk), b(tk)) divided by the cancellation rateαa(iδ; a(tk), b(tk)). Using this heuristic approximation, we have that

P (a(tk+1)− a(tk) ≥ iδ) = P (The queues at relative prices lower than iδ are all empty at time tk+1)

≈ exp

−i−1∑

jδ≥−(a(tk)−b(tk))/2

λapa(jδ; a(tk), b(tk))

αa(jδ; a(tk), b(tk))

, θa(iδ; a(tk), b(tk)).

(2)

7

Page 8: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Following the same argument, we can also approximate

P (b(tk+1)− b(tk) ≤ −iδ) ≈ exp

−i−1∑

jδ≥−(a(tk)−b(tk))/2

λbpb(jδ; a(tk), b(tk))

αb(jδ; a(tk), b(tk))

, θb(iδ; a(tk), b(tk)).

(3)

Since we are mainly interested in the price changes at the time of trades, we use a(t) (b(t))to denote the ask (bid) price per-trade, i.e., a(t) (b(t)) equals to the ask (bid) price at the timeof the last trade by time t. The approximations (2) and (3) are equivalent to saying that theprocesses {a(t) : t ≥ 0} and {b(t) : t ≥ 0} jump every time when a trade occurs and the jumpsizes almost follow the distributions given by θa and θb. This heuristic can be validated by thestochastic-averaging principle ([17]) and the result is summarized in Theorem 1. We use D[0,∞)to denote the space of right continuous with left limits functions from [0,∞) to R endowed withthe Skorokhod J1 topology (see [4] for reference).

Theorem 1. Consider a sequence of LOB systems indexed by n. In the n-th system, the totalnumber of orders in the order book is given by qn (0) = q (0) < ∞ at time 0, and we let an (0) =a (0) = a and bn(0) = b (0) = b. We assume that the market orders arrive according to the Hawkesprocess (1) and the distribution of incoming limit orders is pan(·) = pa(·) and pbn(·) = pb(·) (i.e. thedistribution remains constant along the sequence of systems).Suppose there exists a sequence of positive numbers {ξn : n ≥ 1} such that λan = ξnλ

a, λbn = ξnλb,

αan(·) = ξnαa(·), αbn(·) = ξnα

b(·) and ξn →∞ as n→∞.We also assume the regularity condition that for any (a, b) ∈ Z2

limi→∞

θa(iδ; a, b) = 0, and limi→∞

θb(iδ; a, b) = 0. (4)

Then, the corresponding price per-trade process (an(·), bn(·)) converges weakly in D([0,∞),Z2) to apure jump process (a(·), b(·)). The process (a(·), b(·)) jumps at times corresponding to the arrivalsin the Hawkes process (1). Moreover, if t is an arrival time in the Hawkes process, then

P(a(t)− a(t−) = iδ, b(t)− b(t−) = −jδ |a(t−), b(t−)

)= [θa(iδ; a(t−), b(t−))− θa((i+ 1)δ; a(t−), b(t−))]

× [θb(jδ; a(t−), b(t−))− θb((j + 1)δ; a(t−), b(t−))].

Remark: The regularity condition (4) on θa (iδ; a, b) not only is quite natural, but it also canbe easily verified in terms of pa (iδ; a, b) and αa (iδ; a, b) by the explicit formula for θa (iδ; a, b) givenin equation (2). The regularity condition on θb (iδ; a, b) holds following (3).

It is important to note that in the previous result we do not scale the arrival processes of marketorders, so this result simply describes the price processes at time scales corresponding to the inter-arrival times of market orders (i.e. in the order of a few seconds according to the representativedate discussed earlier). In Section 6 we shall introduce a scaling that will allow us to consider theprocess in longer time scales (several minutes or longer) by increasing the arrival rate of marketorders.

8

Page 9: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

4 Connecting Order Flows with Price Changes: An EmpiricalStudy

In this section, we use our model and Theorem 1 to study the limit order book data of AMZN,BIDU and GS. AMZN and BIDU are listed on NASDAQ, and GS is listed on NYSE. They areall small-tick stocks with daily average spreads being 7.41 cents, 6.11 cents and 3.83 cents in May2011, respectively (as reported in Table 1).

4.1 The LOB data

We use AMZN, BIDU and GS limit order book data up to level 50 for 20 consecutive tradingdays during May 2011 from NASDAQ’s Historical TotalView-ITCH data. For each stock, the datacontains information of all LOB events, including the submissions, cancellations and executions ofthe limit orders at the first 50 price levels during the specified time period. All the LOB eventsare time stamped from 9:30 am to 16:00 pm in EST, with decimal precision of nanoseconds (10−9

second).For each trading day we remove the data in the first half hour after market opens (9:30 am -

10:00 am in EST) and the last half hour before market closes (15:30 pm - 16:00 pm in EST), asthe LOB dynamics is more volatile in these two period than in the rest of the trading day. We alsocompress consecutive trades that have the same order ID, same price, at the same direction (bid orask) and with time difference less than 1 millisecond as a single trade, in order to reduce the noisein model calibration.

4.2 Model validation: empirical test of Theorem 1

In this section we use the LOB data to test Equations (2) and (3) derived from our model. Wefirst observe the empirical tail probability of the price change per-trade from the data. Then, wecalculate the model-implied tail probability of price change per-trade by plugging in Equations (2)and (3) the calibrated arrival and cancellation rates of limit orders at all relative price levels. Thedetailed calibration steps are given in Appendix 8.3.

Figure 2 compares the daily empirical and model-implied tail probabilities of the price changeper-trade on the bid and the ask sides, respectively, during 20 trading days in May of 2011. Figure3 reports the same results for BIDU, and Figure 4 for GS, in the same time window.

We can see that the empirical and model predicted tail probabilities agree quite well at positiverelative price levels. However for relative price level iδ ≤ 0, our model tends to give lower estimatescompare to the empirical results, which might be due to the lack of hidden limit order informationin the data base. A detailed discussion on our treatment of hidden orders is given in AppendixSection 8.3.

4.3 Model application: recover LOB from the price change distribution

We now apply our model to estimate the expected volumes of visible limit orders at each relativeprice level, upon the arrival of an market order, using only the bid/ask price per trade data.

According to Theorem 1, at each price level iδ, either on the bid or ask side, the number of limitorders can be approximated by the steady-state distribution of the infinite server queue, which is aPoisson random variable with mean ρa/b (iδ; a, b) := λa/bpa/b (iδ; a, b) /αa/b (iδ; a, b). On the otherhand, by Equations (2) and (3), we have

ρa/b (iδ; a, b) = log(θa/b(iδ; a, b)

)− log

(θa/b((i+ 1)δ; a, b)

). (5)

9

Page 10: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 2: AMZN tail probabilities in May 2011

Relative Price Level i

Tai

l Pro

babi

lity,

i.e.

P(P

er-T

rade

Pric

e C

hang

e >

= i /

)

Ask Side

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-27

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-26

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-25

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-24

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-23

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-20

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-19

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-18

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-17

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-16

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-13

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-12

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-11

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-10

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-09

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-06

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-05

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-04

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-03

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-02

EmpiricalModel

Relative Price Level i

Tai

l Pro

babi

lity,

i.e.

P(P

er-T

rade

Pric

e C

hang

e >

= i /

)

Bid Side

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-27

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-26

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-25

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-24

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-23

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-20

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-19

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-18

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-17

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-16

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-13

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-12

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-11

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-10

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-09

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-06

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-05

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-04

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-03

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-02

EmpiricalModel

10

Page 11: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 3: BIDU tail probabilities in May 2011

Relative Price Level i

Tai

l Pro

babi

lity,

i.e.

P(P

er-T

rade

Pric

e C

hang

e >

= i /

)

Ask Side

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-27

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-26

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-25

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-24

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-23

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-20

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-19

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-18

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-17

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-16

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-13

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-12

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-11

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-10

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-09

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-06

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-05

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-04

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-03

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-02

EmpiricalModel

Relative Price Level i

Tai

l Pro

babi

lity,

i.e.

P(P

er-T

rade

Pric

e C

hang

e >

= i /

)

Bid Side

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-27

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-26

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-25

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-24

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-23

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-20

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-19

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-18

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-17

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-16

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-13

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-12

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-11

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-10

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-09

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-06

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-05

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-04

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-03

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-02

EmpiricalModel

11

Page 12: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 4: GS tail probabilities in May 2011

Relative Price Level i

Tai

l Pro

babi

lity,

i.e.

P(P

er-T

rade

Pric

e C

hang

e >

= i /

)

Ask Side

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-27

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-26

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-25

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-24

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-23

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-20

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-19

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-18

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-17

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-16

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-13

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-12

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-11

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-10

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-09

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-06

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-05

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-04

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-03

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-02

EmpiricalModel

Relative Price Level i

Tai

l Pro

babi

lity,

i.e.

P(P

er-T

rade

Pric

e C

hang

e >

= i /

)

Bid Side

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-27

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-26

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-25

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-24

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-23

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-20

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-19

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-18

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-17

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-16

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-13

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-12

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-11

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-10

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-09

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-06

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-05

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-04

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-03

EmpiricalModel

-5 0 5 10 150

0.2

0.4

0.6

0.8

12011-05-02

EmpiricalModel

12

Page 13: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 5: AMZN LOB volume estimation results for both sides in May 2011

Relative Price Level

Ave

rage

Que

ue S

ize

Ask Side

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-27

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-26

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-25

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-24

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-23

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-20

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-19

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-18

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-17

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-16

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-13

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-12

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-11

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-10

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-09

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-06

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-05

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-04

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-03

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-02

ModelEmpirical

Relative Price Level

Ave

rage

Que

ue S

ize

Bid Side

-5 -4 -3 -2 -1 0 1 20

50

100

150

2002011-05-27

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-26

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-25

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-24

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-23

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-20

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-19

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-18

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-17

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-16

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-13

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-12

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-11

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-10

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-09

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 50

50

100

150

2002011-05-06

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-05

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

2002011-05-04

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-03

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

2002011-05-02

ModelEmpirical

13

Page 14: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 6: BIDU LOB volume estimation results for both sides in May 2011

Relative Price Level

Ave

rage

Que

ue S

ize

Ask Side

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-27

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-26

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-25

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-24

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-23

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-20

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-19

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-18

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-17

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-16

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-13

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-12

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-11

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-10

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

200

250

3002011-05-09

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-06

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

200

250

3002011-05-05

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

200

250

3002011-05-04

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

200

250

3002011-05-03

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-02

ModelEmpirical

Relative Price Level

Ave

rage

Que

ue S

ize

Bid Side

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-27

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-26

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-25

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-24

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-23

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-20

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-19

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-18

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-17

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-16

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-13

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-12

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-11

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-10

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

200

250

3002011-05-09

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-06

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

200

250

3002011-05-05

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

200

250

3002011-05-04

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 40

50

100

150

200

250

3002011-05-03

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 30

50

100

150

200

250

3002011-05-02

ModelEmpirical

14

Page 15: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 7: GS LOB volume estimation results for both sides in May 2011

Relative Price Level

Ave

rage

Que

ue S

ize

Ask Side

-2 -1 0 1 20

50

100

150

200

250

3002011-05-27

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-26

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-25

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-24

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-23

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-20

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-19

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-18

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-17

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-16

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-13

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-12

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-11

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-10

ModelEmpirical

-2 -1 0 1 2 30

50

100

150

200

250

3002011-05-09

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-06

ModelEmpirical

-2 -1 0 1 2 30

50

100

150

200

250

3002011-05-05

ModelEmpirical

-2 -1 0 1 2 30

50

100

150

200

250

3002011-05-04

ModelEmpirical

-2 -1 0 1 2 30

50

100

150

200

250

3002011-05-03

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-02

ModelEmpirical

Relative Price Level

Ave

rage

Que

ue S

ize

Bid Side

-2 -1 0 1 20

50

100

150

200

250

3002011-05-27

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-26

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-25

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-24

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-23

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-20

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-19

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-18

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-17

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-16

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-13

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-12

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-11

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-10

ModelEmpirical

-2 -1 0 1 2 30

50

100

150

200

250

3002011-05-09

ModelEmpirical

-2 -1 0 1 2 30

50

100

150

200

250

3002011-05-06

ModelEmpirical

-2 -1 0 1 2 30

50

100

150

200

250

3002011-05-05

ModelEmpirical

-2 -1 0 1 2 30

50

100

150

200

250

3002011-05-04

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-03

ModelEmpirical

-2 -1 0 1 20

50

100

150

200

250

3002011-05-02

ModelEmpirical

15

Page 16: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Note that θa/b(·) is the tail probability of the bid/ask price change per-trade and can be directlyobserved in the bid/ask price per trade data. Therefore, we first obtain the empirical estimation forθa/b(iδ; a, b) for each relative price iδ from the bid/ask price per trade data, and then compute theestimation for the mean number of limit orders sitting at iδ using equation (5). Since we assumethat all orders are of the same size in our model, we shall estimate the mean volume as the meannumber of orders multiplied by the average size of market orders, or equivalently, the average tradesize.

For each trading day, we do the estimation for the relative price levels that are smaller thanthe 95% quantile of the empirical price change per-trade distribution. The estimation results forAMZN, BIDU and GS are reported in Figure 5, 6 and 7, with a comparison to the actual meanvolumes observed in the LOB upon the arrivals of the market orders. Through the graphs, we cansee that our estimation of the limit order volume is good, especially for the relative price levelsaround 0.

To statistically test how good our model performs in terms of recovering the visible limit ordervolumes at trade times, we run linear regressions

Vbid/askmodel = β

bid/ask0 + β

bid/ask1 · V bid/ask

emp + εbid/ask,

where Vbid/askmodel is the vector of model estimated average volumes at relative price levels −2,−1, 0, 1, 2

during 20 trading days in May 2011, and Vbid/askemp is the vector of empirical average volumes at the

same relative price levels in the same time range.For stock AMZN, we can see from Figure 8 that for both bid and ask sides, the linear regressions

have high R2 (both above 85%) and with 95% confidence level we cannot reject null hypothesis

βbid/ask0 = 0 and β

bid/ask1 = 1. Similar test results for stocks BIDU and GS are given in Figure 9

and Figure 10. Although statistically we will reject the null hypothesis βbid/ask1 = 1 for these cases,

yet the upper bounds of the 95% confidence intervals of βbid/ask1 are quite close to 1.

Also note that our estimated average volumes are highly correlated (correlation coefficientsabove 90%) with the empirical average volumes for all three stocks, in that sense our model is ableto recover the volumes of limit orders sitting at different price levels to a substantial extent, eventhough we may fail the t tests in the previous linear regression setting in some cases.

The empirical results indicate that our model is able to recover the limit order volumes onseveral relative prices by observing the TAQ data only. It might be surprising that the recoveryis remarkably good despite the simple independence assumption of limit orders at different pricelevels, which we have imposed to make the model tractable.

To check if our model could adapt to different situations as stock trading dynamics change, wealso perform a similar study for AMZN in a 20-day time window from 2014-05-01 to 2014-05-29.Compared with May 2011, the stock price of AMZN has increased from around $200 to around$300, and the trading activity becomes less active. For the time period we choose in 2014, theaverage spread is 19.60 cents, much wider than what we have in May 2011, 7.41 cents. Figure 11shows the empirical and model estimated average volumes of visible limit orders at each relativeprice level upon the arrival of an market order. Again our model gives satisfactory results despitethe change of trading environment.

4.4 Discussion

In terms of recovering the information inside the LOB just by observing the bid/ask price, ourapproach seems to perform reasonably well. Still, deviations arise from the actual order book andthe model prediction during a few days in the testing trading month. It is appropriate to close this

16

Page 17: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 8: AMZN Volume Estimate Fitness Check in May 2011

Empirical Average Queue Size

Mod

el Im

plie

d A

vera

ge Q

ueue

Siz

e

AMZN May 2011

0 20 40 60 80 100 120 140 160 1800

20

40

60

80

100

120

140

160Ask Side

0 20 40 60 80 100 120 140 160 1800

20

40

60

80

100

120

140

160Bid Side

-0 2 [ -5.5218 , 4.1822 ]

-1 2 [ 0.88013 , 1.0182 ]

R-squared: 0.88361

correlation = 0.94

-0 2 [ -5.2014 , 5.7435 ]

-1 2 [ 0.84689 , 1.0005 ]

R-squared: 0.85323

correlation = 0.92371

Figure 9: BIDU Volume Estimate Fitness Check in May 2011

Empirical Average Queue Size

Mod

el Im

plie

d A

vera

ge Q

ueue

Siz

e

BIDU May 2011

0 50 100 150 200 250 3000

20

40

60

80

100

120

140

160

180

200Ask Side

0 50 100 150 200 250 3000

20

40

60

80

100

120

140

160

180

200Bid Side

-0 2 [ 3.4231 , 16.9184 ]

-1 2 [ 0.67446 , 0.81879 ]

R-squared: 0.81138

correlation = 0.90077

-0 2 [ -2.0446 , 8.842 ]

-1 2 [ 0.79502 , 0.91871 ]

R-squared: 0.88525

correlation = 0.94088

17

Page 18: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 10: GS Volume Estimate Fitness Check in May 2011

Empirical Average Queue Size

Mod

el Im

plie

d A

vera

ge Q

ueue

Siz

eGS May 2011

0 50 100 150 200 250 3000

50

100

150

200

250Ask Side

0 50 100 150 200 250 3000

50

100

150

200

250Bid Side

-0 2 [ -1.1216 , 13.3906 ]

-1 2 [ 0.78942 , 0.90639 ]

R-squared: 0.89413

correlation = 0.94559

-0 2 [ -3.1597 , 16.6553 ]

-1 2 [ 0.77102 , 0.93436 ]

R-squared: 0.81413

correlation = 0.90229

section with a discussion on the limitations and the possible issues in our approach that might playa role in some of the salient discrepancies.

The first possible issue that comes to our mind is the condition for the averaging principle tohold. In other words, it might be the case that the dynamic of the limit order events does notreach stationarity over the inter-trade intervals and hence the steady-state approximation is toocrude. We attempted to test this potential problem by splitting each and everyone of the inter-tradeintervals into five subintervals of equal size each. In Theorem 1, the analytical expression of theprice change distribution mainly depends on the ratio of the arrival rates and cancellation ratesof limit orders. Therefore, in Figure 12, we report the ratios between the frequency of limit orderarrivals and cancellations observed in each of the five sub-intervals during May 2011 for differentprice levels. If the ratios remain constant across the five subintervals then the data is consistentwith the application of the averaging principle.

In general, we see that, for relative price levels ≥ 0, the limit order events remain constant overthe five subintervals within the inter-trade intervals. For relative price levels < 0, the limit orderevents are less uniform, but the difference is moderate. So, this issue by itself did not appear tosignificantly explain discrepancies.

Given that most of the significant discrepancies happen on the relative price levels < 0, it isquite possible that the existence of hidden orders might have significant impact in our analysis. Thisproblem is specially challenging since there is no direct information on the arrivals and cancellationsof hidden orders in our data. We believe that we have dealt with this challenge within the constrainsof our model in a reasonable way (see Appendix 8.3), but certainly the fact that hidden orderscannot be fully observed was, we believe, a major contributing factor in the observed discrepancies.

Another issue that we do not model is that the system is assumed to be time homogeneousthroughout the day. We only remove the first and last half hours of the trading day to mitigatenon-homogeneous components. But relevant pieces of news might have an effect which changes thedynamics significantly and this will not be captured by our model. To solve this issue requires moresophisticated statistical analysis on the LOB data jointly with news reports data, which is out ofthe scope of the current paper.

18

Page 19: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 11: AMZN LOB volume estimation results for both sides in May 2014

Relative Price Level

Ave

rage

Que

ue S

ize

Ask Side

-5 -4 -3 -2 -1 0 1 2 3 4 5 60

50

100

150

2002014-05-29

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-28

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 60

50

100

150

2002014-05-27

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 50

50

100

150

2002014-05-23

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-22

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-21

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-20

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-19

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-16

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-15

ModelEmpirical

-5 -4-3-2-1 0 1 2 3 4 5 6 7 8 90

50

100

150

2002014-05-14

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 60

50

100

150

2002014-05-13

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 60

50

100

150

2002014-05-12

ModelEmpirical

-5 -4-3-2-1 0 1 2 3 4 5 6 7 8 90

50

100

150

2002014-05-09

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-08

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-07

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-06

ModelEmpirical

-5 -4-3-2-1 0 1 2 3 4 5 6 7 8 90

50

100

150

2002014-05-05

ModelEmpirical

-5 -4-3-2-1 0 1 2 3 4 5 6 7 8 90

50

100

150

2002014-05-02

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-01

ModelEmpirical

Relative Price Level

Ave

rage

Que

ue S

ize

Bid Side

-5 -4 -3 -2 -1 0 1 2 3 4 5 60

50

100

150

2002014-05-29

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-28

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 60

50

100

150

2002014-05-27

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 50

50

100

150

2002014-05-23

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-22

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-21

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-20

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-19

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 60

50

100

150

2002014-05-16

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-15

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-14

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 50

50

100

150

2002014-05-13

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 60

50

100

150

2002014-05-12

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-09

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-08

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-07

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-06

ModelEmpirical

-5 -4-3-2-1 0 1 2 3 4 5 6 7 8 90

50

100

150

2002014-05-05

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 80

50

100

150

2002014-05-02

ModelEmpirical

-5 -4 -3 -2 -1 0 1 2 3 4 5 6 70

50

100

150

2002014-05-01

ModelEmpirical

19

Page 20: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Figure 12: Inter-trade stationarity check for AMZN, BIDU and GS on both sides in May 2011

Time After Last Trade / Inter-Trade Time

Arr

ival

Fre

quen

cy /

Can

cella

tion

Fre

nque

ncy

Ask Side

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2Relative Price Level > 0

AMZNBIDUGS

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2Relative Price Level = 0

AMZNBIDUGS

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2Relative Price Level < 0

AMZNBIDUGS

Time After Last Trade / Inter-Trade Time

Arr

ival

Fre

quen

cy /

Can

cella

tion

Fre

nque

ncy

Bid Side

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2Relative Price Level > 0

AMZNBIDUGS

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2Relative Price Level = 0

AMZNBIDUGS

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2Relative Price Level < 0

AMZNBIDUGS

20

Page 21: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

5 Model Extension

The empirical results in Section 4 show that our multi-scale model yields reasonably accuraterecovery despite the independence assumptions between the arrival of limit and market orders. Inthis section, we show how the multi-scale framework can be extended in order to accommodatea dependence characteristics which may reflect the interaction between market participants infinancial markets.

For instance, a trader who wants to buy or sell the security has a choice between placing amarket order or a limit order. This choice, which may be a function of the whole order book, caninduce a general dependence structure. In the following extended model, we allow the traders tochoose between market and limit orders. Moreover, we allow traders’ choices of the order types(market or limit) and of the limit order prices all to be dependent on the current LOB status q(t).Recall that q(t) = {qi(t)} has been defined in Section 2.1 with qi (t) representing the number oflimit orders at price p (t)+iδ in the LOB. Thus, the buy and sell sides of the LOB can be correlatedas the order flow dynamics on both sides depends on the same vector q(t).

Given that the dynamic of the LOB now is much more complicated than the model that wehave studied in Section 3 ,we shall impose an upper bound on the prices at which limit ordercan be placed in the order book, in order to guarantee the regularity conditions as required inKurtz [17] when applying the stochastic averaging principle. Although the assumption is imposedfor technical reasons, we believe that this assumption can be motivated from a practical pointof view. In practice, there are protocols in place, such as the “limit up-limit down” mechanismapproved by SEC on US equity market (SEC [22]), in the operation of LOB dynamics which preventextremely wide price fluctuations during a single trading day. So this sort of mechanism motivatesour technical assumption, practically speaking.

To be precise, the LOB status is {qi(t)}Γi=1, where Γδ is the maximum price at which limitorders can be placed. Now, given q(t), the ask price of the LOB is defined as

a (t) = a (q (t)) = min (min{0 < iδ ≤ Γδ : qi (t) > 0},Γ + 1) ,

and the bid price as

b (t) = b (q (t)) = max (max{iδ > 0 : qi (t) < 0}, 0) .

The mathematical description of the extended model is given as follows:

Extended model: dynamics of the limit and market orders

1. Arrivals of sell and buy orders are modelled as a independent Poisson process with ratesλa > 0 and λb > 0.

2. For a sell (or buy) order arriving at time t, suppose the last trade happens at time tk < t.Then, with probability pm,a(a(tk), b(tk),q(t)) (or pm,b(a(tk), b(tk),q(t))), the order is chosenby the trader to be a market order order and with probability 1− pm,a(a(tk), b(tk),q(t)) (or1− pm,b(a(tk), b(tk),q(t))) to be a limit order.

3. If the sell (or buy) order is chosen to be a limit order. Then, it will choose its relativeprice, with respect to a(tk) (or b(tk)) equal to iδ with probability pa(iδ; a(tk), b(tk),q(t)) (orpb(iδ; a(tk), b(tk),q(t))).

21

Page 22: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

4. We assume that pa(iδ; a(tk), b(tk),q(t)) = 0 for all iδ ≤ b(t)−a(tk) and pb(iδ; a(tk), b(tk),q(t)) =0 for all iδ ≤ b(tk)− a(t), so that there is no trade triggered by an incoming limit order. Wealso assume that if i > Γ

pa(iδ; a(tk), b(tk),q(t)) = 0 = pb(iδ; a(tk), b(tk),q(t)).

5. During the time interval [tk, tk+1), limit sell (or buy) orders waiting at relative price of iδare cancelled at rate αa(iδ; a(tk), b(tk),q(t)) > 0 (or αb(iδ; a(tk), b(tk),q(t)) > 0). We alsoassume that if i > Γ

αa(iδ; a(tk), b(tk),q(t)) = 0 = αb(iδ; a(tk), b(tk),q(t)).

6. The orders choose their types and limit prices independently.

7. For any fixed (a, b), consider a modified process (abusing notation slightly), q (t), with con-stant arrival rates λa/bpa(iδ; a, b,q (t)) and cancellation rates αa/b(iδ; a, b,q (t)). We assumethat the modified process q (·), for fixed (a, b), is an irreducible and positive recurrent con-tinuous time Markov chain.

Remark: As a consequence of Assumption 7, for each fixed (a(tk), b(tk)) the (modified) LOBprocess q (·) with no market order arrival after time tk possesses a unique stationary distributionπ(dq; a(tk), b(tk)).

In the asymptotic regime, we will send the arrival rates of all types of orders to infinity, butthe probability that the traders choose market orders to zero. This is consistent with the empiricalobservation that trade happens much less frequently than limit order events. Under this asymptoticregime the stochastic average principle holds and we have the following convergence result for theextended model.

Theorem 2. Consider a sequence of LOBs indexed by n. In the n-th system, λan = ξnλa, λbn = ξnλb,αan(·) = ξnα

a(·) and αbn(·) = ξnαb(·), while pm,an (q) = pm,a(q)/ξn and pm,bn (q) = pm,b(q)/ξn.

Under Assumption 1 to 7, the corresponding price per-trade process (an(·), bn(·)) converges weaklyin D[0,∞) to a pure jump process (a(·), b(·)) with jump rate at time t equal to

λ (t) :=

∫ (λapm,a(a(t−), b(t−),q) + λbpm,b(a(t−), b(t−),q)

)π(dq; a(t−), b(t−)).

Given a jump occurs at time t, its jump size follows the following distribution:

P (∆a = i,∆b = j)

=λ (t)−1∫I(a(q)− a(t−) = i, b(q)− b(t−) = j)

·(λapm,a(a(t−), b(t−),q) + λbpm,b(a(t−), b(t−),q)

)π(dq; a(t−), b(t−)).

The proof of this result is given in Section 8.2. We introduced the upper bound Γ on the pricesan(·) in order to facilitate the verification of the so-called compact containment condition in orderto apply the averaging principle.

22

Page 23: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

6 Longer Time Scales: Continuous Time Model

In previous sections, we have built a connection between the distribution of the price change per-trade and the dynamics of the order flow. In this section we will characterize the dynamics of thetrading price in a longer time scale, as we will show. Our goal here is to explain how jumps in thecontinuous time price dynamics for the bid/ask price are related to the structure of the LOB.

Recall that a(t) and b(t) are the bid/ask price per-trade. Define s(t) = a(t) − b(t) to be thebid-ask spread per-trade at time t, and m(t) = (a(t) + b(t))/2 the mid-price. We shall develop astochastic model for the price-spread dynamics in longer time scale (order of several minutes ormore). The model will be a jump-diffusion limit of the discrete price-spread processes as given inSection 2.

Consider a sequence of limit order books indexed by n and their bid/ask price per-trade process{(an(·), bn(·))}. The dynamic of (an(·), bn(·)) is characterized by the arrival rate of market ordersand the price changes. By Theorem 1, the price change distribution at trade times can be put in aone-to-one correspondence with the sequence of traffic intensity (i.e. ratios between cancellationsand arrival times at each relative price in the LOB). Because of this one-to-one correspondence, weshall directly work with price change distribution at the time-scale of market orders.

To simplicity of the exposition (as we are mainly concerned to show how jumps in continuoustime process observed in the order of minutes are connected to the structure of the LOB), we shallassume that the price changes of the ask and bid sides follows the same distribution which dependsonly on the spread size. The proofs will be almost the same for the more general case wherethe ask and bid price changes have different distributions. Let {tk} be the sequence of the tradetimes. We use ∆n

a (sn(tk)) to denote the ask price change and ∆nb (sn(tk)) the bid price change. For

simplicity in the notation, we often write ∆na (tk) instead of ∆n

a (sn(tk)) (similarly for ∆nb (tk)). As

, ∆na (tk) and ∆n

b (tk) follow the same distribution, we simply provide the description for ∆na (tk) in

our following Assumption 1.

Assumption 1. (Price change distribution) First define,

∆na (sn(tk)) = (1− Iak ) · (−1)R

ak [Uak /(

√nδ)]δ + Iak [sn(tk)V

ak /(2δ)]δ (6)

where:i) Iak is Bernoulli with P (Iak = 1) = q for some q > 0,ii) Uak is a random variable with support on [0, ξ] for ξ ∈ (0,∞).iii) Rak is Bernoulli with P (Rak = 1) = (1 + 2β/

√n)/2 for some β > 0,

iv) V ak is a continuous random variable so that P (V a

k ≥ −1) = 1.v) the random variables Iak , Uak , Rak and V a

k are independent of each other (independence isassumed to hold across k and for the superindices a, b).

Then we let {an(tk+1) = an(tk) + ∆n

a (s(tk)) ∨ ([−sn(tk)/(2δ)]δ),

bn(tk+1) = bn(tk)−∆nb (s(tk)) ∨ ([−sn(tk)/(2δ)]δ),

(7)

and this is equivalent to assuming

θ(iδ, an(tk), bn(tk)) = P (∆n

a (s(tk)) ∨ ([−sn(tk)/(2δ)]δ) ≥ iδ) .

Remarks:1. Recall that we have ruled out orders which cross over the mid-price, this results in the cap

∨([−sn(tk)/(2δ)]δ) appearing in (7), which consequently yields an, bn.

23

Page 24: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Relative Price Level i

P(P

er-T

rade

Pric

e C

hang

e =

i/)

Distribution of Per-Trade Price Level Change >= 2

Ask Side0 5 10 15 20 25 30 35 40

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1Empirical FrequencyFitted Power Law Density

Bid Side0 5 10 15 20 25 30 35 40

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

Empirical FrequencyFitted Power Law Density

Figure 13: Tail Distribution of Price Change Per-trade, AMZN 2011-05-02.

2. The asymmetry in the distribution of Rak allows us to introduce a drift term in the spread,which ensures the ergodicity of the spread process.

Assumption 1 is imposed according to our empirical findings in the previous section, that is,a significant of trade prices are very close to the previous one while they jump far away fromthe previous one occasionally. For example stock AMZN during the trading day 2011-05-02, theoccurrence percentages of absolute value of the price change per-trade on the bid side being ≤ 1,≥ 5, ≥ 10 are 74.83%, 3.95% and 0.42% respectively. Similar statistics for the ask side are 76.09%,3.72% and 0.45% correspondingly. Figure 13 illustrates the tail part of the price change per-tradeand suggests that the probability density shapes of both the bid and ask sides decay similarly as apower-law tail.

In addition, we impose the following assumptions on time and space scalings, which are consis-tent with Empirical Observations 1 and 2. In order to carry out a heavy traffic approximation, weconsider a sequence of LOB systems indexed by n ∈ Z+, such that in the n-th system:

Assumption 2. (Time and Space Scale)

1. The “base rate” µn of Hawkes processes corresponding to market orders on each side satisfiesµn = nµ;

2. The “initial rate” µan(0) and µbn(0) of the Hawkes satisfy µan(0)n and µbn(0)

n → µ0 as n→∞.

3. Tick size δn → 0 so that either δn = o(n−1/2

)or δn = n−1/2;

4. We assume that qn = γ/n for some γ > 0.

Now we are ready to state our result on the spread and price dynamics on the order of manyminutes (i.e. after a large number of trades) and informed by the limit order book.

Theorem 3. For the n-th system, let sn(t) = an(t)−bn(t) be the spread process and mn(t) = an(t)+bn(t) be twice of the mid-price. Suppose (sn(0), mn(0)) = (s0,m0). Then, under Assumption 2 and3, the pair of processes (sn, mn) ∈ D([0,∞),R+×R) converges weakly to (s, m) ∈ D([0,∞),R+×R)with (s(0), m(0)) = (s0,m0) such that{

ds(t) = −η(t)dt+ dWa(t) + dWb (t) + s (t−) dJ1(t)/2 + s (t−) dJ2(t)/2 + dL(t),

dm(t) = dWa(t)− dWb (t) + s (t−) dJ1(t)/2− s (t−) dJ2(t)/2.(8)

24

Page 25: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Here,

1. Let µ(t) = 2κµκ−δ +

(2µ0 − 2κµ

κ−δ

)e−(κ−δ)t.

2. Then, η(t) = −2β(E[Uak ] + E[U bk]

)· µ(t)

3. Wa and Wb are two diffusion processes, such that

Wa(t) =√

E[(Uak )2] ·B1

(∫ t

0µ(s)ds

),Wb(t) =

√E[(U bk)2] ·B2

(∫ t

0µ(s)ds

)where B1 and B2 are two independent Brownian motions.

4. J1 and J2 are two i.i.d. compound non-homogeneous Poisson processes with time-varyingjump intensity γµ(t) and the jump density distribution given by the density of V a

1 .

5. s(t) ≥ 0 and L(t) satisfies: L(t) = 0, dL(t) ≥ 0 and s(t)dL(t) = 0 for all t ≥ 0.

Theorem 3 underscores the role that the LOB structure plays in the price dynamics over thecourse of a trading day. In particular, it is remarkable that the LOB appears to inform price-jumpdynamics at time scales of the order of several minutes up to hours.

7 Conclusions and additional potential extensions

In this paper we develop a Markovian model for the limit order book of small-tick stocks. Wepropose an asymptotic regime under which we derive an analytic expression of the price changeprocess in terms of the arrival and cancellation rates of the limit orders. These analytical expressionsare justified using multi-scale analysis and the stochastic-averaging framework. We have exploredextensions which include dependence between arriving market and limit orders in Section 5.

Our multiscale approach be further extended without much complications to include other typesof dependence between dynamics in the arrivals of the market orders. For instance, one way toextend our model is to allow traders to post market orders depending on the current bid/ask price.This modification can be introduced by thinning the original Hawkes arrivals of the market orders,the thinning parameter might depend on the observed bid/ask price (a(·), b(·)) to reflect traders’reaction on market conditions.

Other examples of the interactions between market participants that can be included in ourmodel extensions are correlation between the bid and ask sides and dependence between arrivalrate of market orders and the spread width.

8 Appendix: Technical Proofs

8.1 The Proof of Theorem 1

Proof of Theorem 1. In this proof, we will follow the notations required to apply Theorem 2.1and Example 2.3 in Kurtz [17]. Define

Xn(·) =(an(·), bn(·), µa(·), µb(·)

)∈ E1 := R4.

25

Page 26: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

In addition, let Yn(·) =(Y in(·) : i ≥ 1

)∈ Z∞, where Yn = qn encodes the number of limit orders

as explained in Section 2.1 (i.e. the number of orders at level iδ is encoded with a negative signif these are buy orders and with a positive sign if they are sell orders). We use E2 to denote thespace Z∞ endowed with the topology induced by the metric

d (y, z) =

∞∑i=1

2−i min(∣∣yi − zi∣∣ , 1) ,

which is equivalent to the pointwise convergence topology (i.e. {y (n)} ⊂ E2 converges to y ifyk (n) → yk for any given k ≥ 1). Because yi is an integer, the topology induced is equivalent tothe discrete topology on finitely many coordinates. We naturally endow E1 with the Euclideanmetric.

Then, {(Xn(·), Yn(·))} is a sequence of stochastic processes living in the product space E1×E2.To be precise, let us explicitly describe the dynamics of this Markov chain.

Equation (1) describes the dynamics of(µa(·), µb(·)

). Actually, if Na (·) and N b (·) are two

independent Poisson processes with unit rate, we can couple the dynamics (1) with the explicitrepresentations

Na (t) = Na

(∫ t

0µa(s)ds

)and N b (t) = N b

(∫ t

0µb(s)ds

).

Using this representation, we can redefine tk as follows. Set t0 = 0, and for k ≥ 1, let

tk = inf{t > tk−1 : Na (t)−Na (t−) +N b (t)−N b (t−) 6= 0}.

For each i, let{Na,pi (·)

}i≥1

,{N b,pi (·)

}i≥1

,{Na,αi (·)

}i≥1

,{N b,αi (·)

}i≥1

be four independent

sequences of Poisson processes with unit rate and independent of Na (·) and N b (·). Then, define

Na,pi,n (t) = Na,p

i

(∫ t

0ξnλ

apa(iδ; an(s), bn(s))ds

),

N b,pi,n (t) = N b,p

i

(∫ t

0ξnλ

bpb(iδ; an(s), bn(s))ds

),

Na,αi,n (t) = Na,α

i

(∫ t

0αa(iδ; an(s), bn(s)

) ∣∣∣Y an(s)+iδn (t)

∣∣∣ ds) ,N b,αi,n (t) = N b,α

i

(∫ t

0αb(iδ; an(s), bn(s)

) ∣∣∣Y bn(s)−iδn (t)

∣∣∣ ds) ,representing the number of arrivals of limit orders and the number of canceled limit orders at therelative price level iδ up to time t in the n-th LOB system. Therefore, Y i

n(·), defined as the numberof limit orders waiting at price level iδ in the n-th LOB system at current time, changes at time tby

dY iδ−an(t)n (t) = dNa,p

i,n (t)−dNa,αi,n (t) if iδ ≥ an(t) + bn(t)

2and otherwise dY bn(t)−iδ

n (t) = dN b,pi,n (t)−dN b,α

i,n (t)

and finally note that

an (·) = an

(t(Na+Nb)(·)

)and bn (·) = bn

(t(Na+Nb)(·)

).

26

Page 27: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

It is clear from the previous construction that for each n, (Xn(·), Yn(·)) is a continuous timeMarkov Chain. The process is non-explosive because

∑i pa/b(iδ; an(s), bn(s)) = 1.

Now, let us introduce a few definitions. Let C(E1) be the space of all continuous functionsf : E1 → R that vanish at infinity and let C(E1) be the space of all bounded continuous functionsf : E1 → R.

Next, define D to be the set of function h : E1 × E2 → R satisfying that for each y ∈ E2,

h (·, y) ∈ C(E1), ∂h (·, y) /∂x3 ∈ C(E1), ∂h (·, y) /∂x4 ∈ C(E1).

Now, we define A : D → C(E1 × E2) as

Ah (x, y) = µa ·[h(a (y) ,b (y) , µa + δ1, µ

b + δ2, y)− h (x, y)

]+ µb ·

[h(a (y) ,b (y) , µa + δ2, µ

b + δ1, y)− h (x, y)

]− κ (µa − µ) · ∂h

∂x3(x, y)− κ

(µb − µ

)· ∂h∂x4

(x, y) .

and B : D → C(E1 × E2) as

Bh (x, y) =∑i

{λapa(iδ; a, b)h(x, y + ea+i) + λbpb(iδ; a, b)h(x, y − eb−i)

−(λa + λb

)h(x, y)}

+∑i

{αa(iδ; a, b)yan+i ((h(x, y − ea+i)− h(x, y))

+ αb(iδ; a, b)∣∣∣yb−i∣∣∣ (h(x, y + eb−i)− h(x, y)

)}.

From the stochastic representation provided earlier it is straightforward to verify that the generatorof (Xn (·) , Yn (·)) is given by

Qh (x, y) , Ah (x, y) + ξnBh (x, y) ,

and therefore

Mhn (t) , h (Xn (t) , Yn (t))−

∫ t

0Qh (Xn (s) , Yn (s)) ds

is a local martingale.Next, pick h ∈ D and suppose that h (x, y) = f (x) for f : E1 → R (i.e. h(x, y) depends only

on x). Then, we have thatQh (x, y) =Ah (x, y)

and Mhn (·) is a martingale.

Now, pick h ∈ D such that h (x, y) = g(y1, y2, ..., yk

)for any fixed k <∞, then

Qh (x, y) =ξnBh (x, y)

and Mhn (·) is also a martingale.

In order to apply the Theorem 2.1 (stochastic averaging principle) in Kurtz [17], we need to show(1)that {Yn(t) : t ≥ 0, n = 1, 2, ...} are tight in E2; (2) Xn (·) satisfies the compact containmentconditionthat is, we need to show that for any T > 0 and ε > 0, there exists a compact set K ⊂ E1

such thatlimn→∞P{Xn (t) ∈ K for all t ∈ [0, T ]} ≥ 1− ε. (9)

27

Page 28: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

We first consider µa(T ) and µb(T ). By [13] with the assumption that µ > 0, κ > 0 δ1 > δ2 > 0 andδ1 + δ2 < κ, we have the that the Hawkes process is non-explosive (i.e. is bounded almost surelyon any compact time interval), since the spectral radius of the matrix is

M =

[δ1/κ δ2/κδ2/κ δ1/κ

].

Next, to ensure (9) we need to deal to deal with an (·) and bn (·), which simply are piecewiseconstant processes changing values at the times 0 < t1 < t2 < ... < tNa(T )+Nb(T ) and such that

an (tk) = an (tk) and bn (tk) = bn (tk).Because of the compact containment condition established for µa(·) and µb(·) we know that

N = Na (T ) +N b (T ) is finite almost surely. We can condition on

GT = σ{µa (s) , µb (s) , Na (s) , N b (s) : 0 ≤ s ≤ T

},

and proceed sequentially for k = 1, 2, ..., N , verifying that an (tk) and bn (tk) are tight. We firstconsider k = 1, i.e. s ∈ [0, t1). Note that for each fixed i, we can represent

∣∣Y in (s)

∣∣ =∣∣Y i (ξns)

∣∣,where

∣∣Y i (s)∣∣ is simply the number of customers-in-system in an infinite-server queue at time s.

The arrival rate of customers in this system is bounded from above by λa+λb and the service rate isbounded from below by max{αa(iδ− an(0); a (0) , b (0)), αb(bn(0)− iδ; a (0) , b (0))} > 0. Therefore,∣∣Y in (s)

∣∣ converges weakly as n→∞ to a Poisson random variable. Actually, we have that

Y in (s)⇒ Y i = I

(i−(b (0) + a (0)

)/2 > 0

)Zi,a − I

(i−(b (0) + a (0)

)/2 < 0

)Zi,b, (10)

where

Zi,a ∼ Poisson

(λapa

((i− a (0)) δ; a (0) , b (0)

)αa((i− a (0)) δ; a (0) , b (0))

), (11)

Zi,b ∼ Poisson

(λbpb

((b (0)− i

)δ; a (0) , b (0)

)αb((b (0)− i

)δ; a (0) , b (0))

).

By the thinning theorem the sequence{Y in (s)

}i≥1

is conditionally independent given Gt1 and so

weak convergence of{Y in (s)

}i≥1

towards{Y i}i≥1

as n→∞ occurs under the discrete topology on

finitely many coordinates (the coordinates of Y =(Y 1, Y 2, ...

)are conditionally independent with

limiting distribution (10)).Because of (2), the functions a (·) and b (·) are continuous under the discrete topology on finitely

many coordinates of y on the support of the random element {Y (s)}i≥1. So, we have that

(a (Yn (t1)) ,b (Yn (t1))) =(an (t1) , bn (t1)

)⇒(a (t1) , b (t1)

)(12)

as n→∞.So, from the argument leading to (12), applied subsequently on the intervals [t1, t2], ..., [tN−1, tN ],

we can verify two conditions required in the application of Theorem 2.1 from Kurtz [17].Finally, we need to check that for all T > 0 and h ∈ D with h (x, y) = f (x) for some f : E1 → R,∫ T

0E(|Ah (Xn (s) , Yn (s))|2

)ds <∞.

28

Page 29: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

To this end, it is enough to show that

sups∈[0,T ]

E(µa (s)2 + µb (s)2

)<∞. (13)

Note from (1) that there is a continuous positive function r (·) and a constant c ∈ (0,∞) such that

E(µa(t)2 + µa(t)2

)≤ r (T ) + c

∫ T

0E(µa(s)2 + µb(s)2

)ds.

By Gronwall’s lemma we conclude that (13) holds.Then, following Theorem 2.1 and Example 2.3 in Kurtz [17] we obtain that {Xn (·)} is tight

in D([0,∞), E1) and each limit process along subsequences, X(·) =(a(·), b(·), µa(·), µb(·)

), satisfies

that for f ∈ C(E1) such that hf (x, y) := f(x) ∈ D, the stochastic process

f(X(t))−∫ t

0

∫E2

Ahf (X(s), y)πX(s)(dy)ds, (14)

is a martingale, where πX(s)(·) is the unique stationary distribution of a stochastic process Y ∈ E2

which solves the martingale problem

g(Y (t))−∫ t

0Bhg(x, Y (u))du.

for hg(x, y) := g(y) ∈ D. In our case, πx(·) is simply the distribution of the sequence of independentrandom variables

{Y i}i≥1

given(a (0) , b (0)

). Now we compute that in (14),∫

E2

Af (X(s), y)πX(s)(dy)

=∑i,j

{µa(s) ·

[f(a(s) + iδ, b(s)− jδ, µa(s) + δ1, µ

b(s) + δ2)− f

(a(s), b(s), µa(s), µb(s)

)]+ µb(s) ·

[f(a(s) + iδ, b(s)− jδ, µa(s) + δ2, µ

b(s) + δ1)− f

(a(s−), b(s), µa(s), µb(s)

)]− κ (µa(s)− µ) · ∂f

∂x3

(a(s), b(s), µa(s), µb(s)

)− κ

(µb(s)− µ

)· ∂f∂x4

(a(s), b(s), µa(s), µb(s)

)}· πX(s) ({i, j})

=− κ (µa(s)− µ) · ∂f∂x3

(a(s), b(s), µa(s), µb(s)

)− κ

(µb(s)− µ

)· ∂f∂x4

(a(s), b(s), µa(s), µb(s)

)+∑i,j

{µa(s) ·

[f(a(s) + iδ, b(s)− jδ, µa(s) + δ1, µ

b(s) + δ2)− f

(a(s), b(s), µa(s), µb(s)

)]+ µb(s) ·

[f(a(s) + iδ, b(s)− jδ, µa(s) + δ2, µ

b(s) + δ1)− f

(a(s), b(s), µa(s), µb(s)

)]}·[θa(iδ; a(s), b(s)

)− θa

(iδ + δ; a(s), b(s)

)]×[θb(jδ; a(s), b(s)

)− θb

(jδ + δ; a(s), b(s)

)]One can check that the martingale problem (14) has a unique solution X(·), see for instance

Chapter 4.4 in Ethier and Kurtz [9]. In particular,(a(·), b(·)

)is equivalent in distribution to a

jump process with jump intensity µ(s) = µa(s) + µb(s) at time s and jump size distribution

P(a(t)− a(t−) = i, b(t)− b(t−) = −j|a(t−), b(t−)

)=[θa(i; a(t−), b(t−)

)− θa

(i+ 1; a(t−), b(t−)

)]×[θb(j; a(t−), b(t−)

)− θb

(j + 1; a(t−), b(t−)

)].

29

Page 30: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Since {Xn(·)} is tight and each of its convergent subsequence admits the same limit X(·), we canconclude that {Xn(·)} weakly converges to X(·) in D[0,∞).

8.2 Proof of Theorem 2

Proof of Theorem 2. We outline the argument following a very similar approach to thatexplained in Theorem 1. Define Xn(·) =

(an(·), bn(·))

)∈ R2 and Yn = qn encodes the number of

limit sell orders in the n-th LOB system at time t. We use E2 = ZΓ endowed with the Euclidiantopology, which is equivalent, because yi is an integer, the discrete topology. We also endow E1

with the Euclidean metric.The space of functions C(E1), C(E2) and D are all defined similarly as in the proof of Theorem

1. Define A : D → C(E1 × E2) as

Af (x, y) = λapm,a(a (y) ,b (y) , y) · [h (a (y) ,b (y) , y)− h (x, y)]

+ λbpm,b(a (y) ,b (y) , y) · [h (a (y) ,b (y) , y)− h (x, y)] ,

and Bn : D → C(E1 × E2) as

Bnh(x, y)

=Γ∑i=1

[λa(

1− pm,a(a(y),b(y), y)

ξn

)pa(iδ;x, y)h (x, y + ea+i)

+ λb(

1− pm,b(a(y),b(y), y)

ξn

)pb(iδ;x, y)h

(x, y + eb−i

)− 2h(x, y) + αa(iδ;x, y)

(ya+i(s)(h(x, y − ea+i)− h(x, y)

)+ αb(iδ;x, y)

∣∣∣yb−i∣∣∣ (h(x, y − eb−i)− h(x, y))].

The process (Xn (·) , Yn (·)) is a Markov jump process with generator Q such that for each h ∈ D,

Qh (x, y) = Ah (x, y) + ξnBnh (x, y) .

Now for hf ∈ D and suppose that hf (x, y) = f (x) for f : E1 → R (i.e. hf (x, y) depends onlyon x).we have that

Qhf (Xn(s), y) = Ahf (Xn(s), y) .

By the boundedness assumption on D, we have that

f (Xn(t))−∫ t

0Ahf (Xn(s), Yn(s)) ds

is a martingale. Likewise, we also have that if hg (x, y) = g(y1, ..., yΓ

)with hg ∈ D then

g (Yn (t))− ξn∫ t

0Bnhg (Xn(s), Yn(s)) ds

is a martingale.

30

Page 31: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Note that the sequence Bn → B as an operator acting on functions hg ∈ D such that hg (x, y) =g(y1, ..., yΓ

)pointwise, where

Bh(x, y) =Γ∑i=1

[λapa(iδ;x, y)g (y + ea+i)

+λbpb(iδ;x, y)g(y + eb−i

)− 2g(y)

+αa(iδ;x, y) · ya+i · (g(y − ea+i)− g(y))

+αb(iδ;x, y) ·∣∣∣yb−i∣∣∣ · (g(y − eb−i)− g(y)

) ].

Because {Xn(·)} are jump processes with uniform bound (an(·), bn(·) ∈ [0,Γδ]), and hence aretight on E1. Besides, under Assumption 7, {Yn(t) : t > 0, n = 1, 2, ...} are relatively compact.

Finally, since h is bounded, for any T > 0,∫ T

0 E(|Ah(Xn(s), Yn(s))|2

)ds <∞. Then according to

Theorem 2.1 and the subsequent Example 2.3 in Kurtz [17], every subsequence of {Xn(·)} admit asub-subsequence that converges weakly to some sub-limit process X(·), and each sub-limit processX(·) =

(a(·), b(·), λa(·), λb(·)

)is a solution to the martingale problem:

f(X(t))−∫ t

0

∫E2

Ahf (X(s), y)πX(s)(dy)ds, (15)

in the sense that the stochastic process defined by (15) is a martingale. Moreover, in the expression(15), πx(·) is the unique stationary distribution of a stochastic process Y ∈ E2 which satisfies themartingale problem

g(Y (t))−∫ t

0Bhg(x, Y (u))du.

In our case, πx(·) is simply the unique stationary distribution π(dq; a(tk), b(tk)) under the arrivalrates λapa(iδ; a(tk), b(tk),q) and λbpb(iδ; a(tk), b(tk),q), and the cancellation rates αa(iδ; a(tk), b(tk),q)and αb(iδ; a(tk), b(tk),q). Now we compute that in (15),∫

E2

Ahf (X(s), y)πX(s)(dy)

=

∫E2

{λapm,a(a(s), b(s), y) ·

[f (a(y), b(y))− f

(a(s), b(s)

)]+ λbpm,b(a(s), b(s), y) ·

[f (a(y), b(y))− f

(a(s), b(s)

)]}· π(dq; a(s), b(s))

Following Chapter 4.4 in Ethier and Kurtz [9] one can verify that the martingale problem (15)has a unique solution X(·). In particular,

(a(·), b(·)

)is equivalent in distribution to a jump process

with jump intensity

λ(s) = λa∫E2

pm,a(a(s−), b(s−), y)πX(s−)(dy)

+ λb∫E2

pm,b(a(s−), b(s−), y)πX(s−)(dy),

at time s, and jump size distribution at time s equal to

P(∆a(s) = i,∆b(s) = j

)= λ (s)

−1∫E2

I(a(y)− a(s−) = iδ, b(y)− b(s−) = jδ

)·(λapm,a(a(s−), b(s−), y) + λbpm,b(a(s−), b(s−), y)

)πX(s−)(dy).

31

Page 32: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Since {Xn(·)} is tight and each of its convergent subsequence admits the same limit X(·), we canconclude that {Xn(·)} weakly converges to X(·).

8.3 Model Calibration

Treatment of hidden orders: In our model, we define the bid/ask price per trade and use themas the benchmark price to determine the relative prices of limit orders between two consequenttrades. If all trades are against visible orders, we can simply take the bid and ask prices rightbefore the trade as the bid/ask price per trade, and the benchmark prices until the next trade.However, in real market, not all limit orders are displayed on the LOB and trades can happenagainst the so-called hidden limit orders. In our observation window, on average 20.53%, 22.08%and 16.62% of trades are against hidden limit orders for AMZN, BIDU and GS respectively. Thosetrades usually occur at a better price than the displayed bid/ask price and reveals the existenceof possible hidden liquidity at the trade price to the public. In this light, if a trade is againstsome hidden order at the bid (ask) side, we take the trade price as the benchmark price until thenext trade. For the other side, we still take the visible ask (bid) price at the time of trade as thebenchmark price.

On the other hand, the distribution of the price change per-trade is computed based on thesubmission and cancellation rates of limit orders in our model. From the data, however, we canonly calibrate the submission and cancellation rates of the visible limit orders. Therefore, weshall compute the empirical price change per trade between two consequent trades as the differencebetween the bid/ask price of the visible limit orders, right before the next trade, and the benchmarkprice.

After we determine the benchmark prices over all inter-trade intervals as described previously,we first transfer the dollar prices of all limit order events into relative prices with respect to thecorresponding benchmark prices. For each trading day, we estimate the arrival and cancellationrates of the LOB model as follows.

Let Lask be the number of sell limit orders that arrive during the trading day, and let D be theduration of the trading day which, as mentioned earlier, corresponds to 5.5 hours. Similarly, weuse Lbid to denote the number of buy limit orders during the trading day. The arrival rates of thebuy and sell limit orders, λb and λa, are estimated as follows:

λb = Lbid/D, λa = Lask/D.

We then write Lbidiδ (Laskiδ ) to denote the total number of buy (sell) limit orders at relative priceiδ during the trading day. We estimate the probability of a limit order sitting at a relative priceiδ, i.e. pa/b (iδ; a, b), at ask side and bid side to be

pb(iδ; a, b) =LbidiδLbid

, pa(iδ; a, b) =Laskiδ

Lask,

respectively.Next we use N bid (Nask) to denote the total number of buy (sell) market orders arriving during

the trading day. Moreover, we use N bid≥iδ (Nask

≥iδ ) to denote the total number of market buy (sell)orders during the trading day, that arrive at a time at which the real-time bid (ask) price is greaterthan or equal to iδ relatively to the benchmark price. The empirical estimates of tail probabilityof price change, i.e. θ (iδ; a, b) at each side are given by

θb (iδ; a, b) =N bid≥iδ

N bid, and θa (iδ; a, b) =

Nask≥iδ

Nask.

32

Page 33: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Estimating the cancellation rate of limit orders, i.e. αb (iδ; a, b), αa (iδ; a, b), is not entirelystraightforward as the frequency of cancellation events depends not only on the cancellation rate,but also on the queue size, or the volume of limit orders. The approach that we take is similar tothe volume-dependent model introduced in [16].

For each relative price level iδ, either on the bid or ask side, we partition the limit ordercancellation events into different bins, according to the queue sizes of limit orders on the pricelevel, right before the events. The bins are labeled by integers j ≥ 1, and the j-th bin contains allcancellation events right before which the queue sizes are in the set Ij = {100·(j−1)+1, · · · , 100·j}.

Our estimation of the cancellation involves several auxiliary quantities:

Cbid/askj (iδ) = total size of canceled buy/sell limit orders in bin j during the day,

Tbid/askj (iδ) = total time (secs) that buy/sell limit order queue sizes are in set Ij during the day,

Qbid/askj (iδ) = 100 · j − 50 = mid value of set Ij ,

wbid/askj (iδ) = C

bid/askj (iδ)/

∑bins k

Cbid/askk (iδ).

The estimated cancellation rates are computed as

αb(iδ; a, b) =∑

bins j at rel. price iδ

wbidj (iδ)×Cbidj (iδ)

Qbidj (iδ) · T bidj (iδ),

αa(iδ; a, b) =∑

bins j at rel. price iδ

waskj (iδ)×Caskj (iδ)

Qaskj (iδ) · T askj (iδ).

We define 0/0 = 0 in the previous equations. In order to maintain robustness in estimation, thebins with Tj(iδ) of less than 0.05 milliseconds are considered empty.

The intuition behind our approach is that, under our model, the number of cancellation eventsin each bin j, Cj(iδ), should roughly equal to Tj(iδ)Qj(iδ) multiplied by cancellation rate, givenenough observations. Therefore, for each j, the ratio Cj(iδ)/(Qj(iδ) · Tj(iδ)) should give approx-imately equal estimation on the cancellation rate. However, bins containing smaller number ofobservations might give noisier estimates of the cancellation rate. In this light, we introduce theweights wj(iδ), that is proportional to the number of observations in bin j, to help mitigate suchnoise.

8.4 The Proof of Theorem 3

Now let us describe the road map for the proof of Theorem 3. We first construct some auxiliary

process(Sn(·), Mn(·)

)living in the same probability space as the underlying process (sn(·), mn(·)).

The auxiliary process is a convenient Markov process whose generator can be analyzed to concludeweak convergence to the postulated limiting jump diffusion (8). The auxiliary process has thesame dynamics as the target process except when it is on the boundary-layer set [0, 2/

√n] × R.

We also show the time spent by the two processes on the boundary-layer is small and as a resulttheir difference caused by their different dynamics on the boundary is also small. Actually, suchdifference is negligible as n → ∞ and therefore the target process converges to the same limitprocess.

33

Page 34: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

First, we define the auxiliary process coupled with the target process in a path by path fashion.Recall that by Assumption 6 and (7), we can write{

sn(tk+1) = sn(tk) + ∆na(sn(tk)) ∨ ([−sn(tk)/2δ]δ) + ∆n

b (sn(tk)) ∨ ([−sn(tk)/2δ]δ),

mn(tk+1) = mn(tk) + ∆na((sn(tk))) ∨ ([−sn(tk)/2δ]δ)−∆n

b (sn(tk)) ∨ ([−sn(tk)/2δ]δ).

Now we define the auxiliary process (Sn(·), Mn(·)) coupled with (sn(·), mn(·))) as{Sn(tk+1) = Sn(tk) +

(∆na(Sn(tk)) + ∆n

b (Sn(tk)))∨(−Sn(tk)

),

Mn(tk+1) = Mn(tk) + ∆na((Sn(tk)))−∆n

b (Sn(tk)),(16)

with the initial condition Sn(0) = sn(0) and Mn(0) = mn(0).Then the main result in this section Theorem 3 is an immediate corollary of the following two

propositions.

Proposition 1. The auxiliary process (Sn(·), Mn(·)) converges weakly to the limit process given by(8).

Proposition 2. The difference process(sn(·)− Sn(·), mn(·)− Mn(·)

)converges weakly to (0, 0)

on DR2 [0, t] for any t <∞.

Proof of Proposition 1. Define Na(t) =∑{j:tj≤t} I

aj , Nb(t) =

∑{j:tj≤t} I

bj . Next, define

Sn(0) = sn(0) ≥ 0, and Mn(0) = mn(0), and set

Sna (t) =∑{j:tj≤t}

(−1)Raj [Uaj /(

√nδn)]δn, Snb (t) =

∑{j:tj≤t}

(−1)Rbk [U bk/(

√nδn)]δn).

We will also define

Sn (t) = Sna (t) + Snb (t) + sn(0),

Mn (t) = Sna (t)− Snb (t) + mn(0),

(so by convention we set t0 = 0 and Sn (0) = sn(0)). Also, we define

Rn1 (t) = Sn (t)−min(Sn (u) : u ≤ t, 0).

Let A = inf{t ≥ 0 : Na (t) ≥ 1} and B = inf{t ≥ 0 : Nb (t) ≥ 1} be the first arrival times of Na (·)and Nb (·), respectively. Since

Sn(tk) = (∆na(Sn(tk)) + ∆n

b (Sn(tk)) + Sn(tk))+,

we have that on min(A,B) > tSn(t) = Rn (t) .

The strategy proceeds as follows. Step 1): Show that if (sn(0), mn(0))⇒ (s(0), m(0)), the processes(Sn (t) ,Mn (t) : t ≥ 0) converges weakly in D[0,∞) to the process (X (t) : t ≥ 0) defined via

X1 (t) = s(0) + η(t) +Wa (t) +Wb (t) ,

X2 (t) = m(0) +Wa (t)−Wb (t) .

34

Page 35: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Step 2): Once Step 1) has been executed we can directly apply the continuous mapping principleto conclude joint weak convergence on [0,min(A,B)) of the processes

Rn(·)⇒ R (·) := X1 (·)−min (X1 (u) : 0 ≤ u ≤ ·) ,Mn(·)⇒ X2 (·) .

Step 3): By invoking the Skorokhod embedding theorem, we can assume that the joint weakconvergence in Step 2) occurs almost surely. We can add the jump right at time min (A,B) withoutchanging the distribution of X1 (t) and Sn(t) for t < min (A,B). More precisely, define

D = I (A < B)R (A)V a/2 + I (B ≤ A)R (B)V b/2,

where V b and V a are i.i.d. copies of V bk and V a

k respectively, and we also define

Dn = I (A < B) [Rn (A)V a/(2δ)]δ + I (B ≤ A) [Rn (B)V b/(2δ)]δ.

Then put on t ∈ [0,min (A,B)]

Sn(t) = Rn (t) I (t < min (A,B)) + I (t = min (A,B)) (Rn (min (A,B)) +Dn),

s(t) = R (t) I (t < min (A,B)) + I (t = min (A,B)) (R (min (A,B)) +D),

Mn(t) = Mn(t)I (t < min (A,B))

+ I (t = min (A,B)) I (A < B) [Rn (A)V a/(2δ)]δ

− I (t = min (A,B)) I (B ≤ A) [Rn (B)V b/(2δ)]δ,

m(t) = X2(t)I (t < min (A,B))

+ I (t = min (A,B)) I (A < B)R (A)V a/2δ

− I (t = min (A,B)) I (B ≤ A)R (B)V b/2δ.

So, assuming Step 2) and using Skorokhod embedding we then conclude that

sup0≤t≤min(A,B)

∣∣∣Sn(t)− s(t)∣∣∣+ sup

0≤t≤min(A,B)

∣∣∣Mn(t)− m(t)∣∣∣→ 0

almost surely. Step 4): Finally, note that the convergence extends throughout the interval [0, t] byrepeatedly applying Steps 1) to 3) given that there are only finitely many jumps in [0, t], which isguaranteed by tightness of processes Na(t) and Nb(t) as the scaled limits of Hawkes processes withstationarity condition δ/κ < 1 imposed. Clearly then this procedure completes the constructionto the solution of the SDE (8). So, we see that everything rests on the execution of Step 1), andfor this we invoke the martingale central limit theorem (see Ethier and Kurtz [9], Theorem 7.1.4).Define

Zak (n) = (−1)Rak[Uak /

(√nδn)]δn, Zbk(n) = (−1)R

bk

[U bk/

(√nδn)]δn.

We have that

EZak (n) = EZbk(n) =(−2β/

√n)E([Uak /

(√nδn)]δn)

=−2βE (Uak )

n+ o (1/n)

andV ar (Zak (n)) = E

([Uak /

(√nδn)]2

δ2n

)−O (1/n) .

35

Page 36: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Let µan(t) (µbn(t)) denote the intensity of market sell (buy) order at time t, and let Nan(t) (N b

n(t))denote the counting process of market sell (buy) order arrivals in the n-th system. By the scalingspecified in Assumption 3, we have

dµan(t) = −κ(µan(t)− nµ)dt+ δ1dNan(t) + δ2dN

bn(t),

dµbn(t) = −κ(µbn(t)− nµ)dt+ δ1dNbn(t) + δ2dN

an(t),

and µan(0) = µbn(0) = nµ0 > 0. Let µn(t) := µan(t) + µbn(t) and Nn(t) := Nan(t) +N b

n(t), thus{dµn(t) = −κ(µn(t)− 2nµ)dt+ (δ1 + δ2)dNn(t),

µn(0) = 2nµ0

(17)

Let µn(t) = µn(t)/n, and Λn(t) =∫ t

0 µn(s)ds. Since Zak (·) is independent of Nn(·), we use standardthe strong approximation results for random walks and renewal processes, see Chapter 2 of [8] andTheorem 2.1 c) in [14], to obtain

Sna (t) =

Nn(t)∑k=1

Zak (n) =

N(∫ t0 µn(s)ds)∑k=1

Zak (n)

= E [Zak (n)]

∫ t

0µn(s)ds+

√V ar(Zak (n))B

(∫ t

0µn(s)ds

)+ o

((∫ t

0µn(s)ds

)1/2)

= −2βE[Uak ] · Λn(t) +

√E[(Uak )2]√

nB(nΛn(t)

)+ o

((nΛn(t)

)1/2)

Let δ = δ1 + δ2, then

µn(t)− 2µ0 =

∫ t

0κ (2µ− µn(s)) ds+ δ

Nn(t)

n

By the law of large numbers, Nn(·)/n⇒ E[N(·)] =∫ t

0 µ(s)ds on D(0,∞), where N(t) is a Hawkesprocess with rate µ(t) such that

dµ(t) = −κ(µ(t)− 2µ)dt+ δdN(t),

µ(0) = 2µ0.

In particular, a standard convergence argument as in, for instance, Section 3.3 in Ethier and Kurtz[9] allows to conclude that µ(t) satisfies the following equation,

µ(t)− 2µ0 =

∫ t

0κ (2µ− µ(s)) ds+ δ

∫ t

0µ(s)ds.

And by the continuous mapping theorem, µn(·) converges to µ∞(·), which is the solution to theODE:

µ∞(t)− 2µ =

∫ t

0κ (2µ− µ∞(s)) ds+ δ

∫ t

0µ(s)ds,

from which we solve µ∞ = µ. Since we can solve the ODE

µ(t) =2κµ

κ− δ +

(2µ0 −

2κµ

κ− δ

)e−(κ−δ)t.

36

Page 37: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Applying the continuous mapping theorem again, we have as n→∞

Λn(t)⇒ Λ∞(t) =

∫ t

0µ(s)ds =

2κµt

κ− δ −2µ0 − 2κµ

κ−δκ− δ

(e−(κ−δ)t − 1

).

Thus

Sna (t)⇒ −2βE[Uak ]Λ∞(t) +√

E[(Uak )2] ·B1

(Λ∞(t)

)under the Skorokhod topology on compact sets. A completely analogous strategy is applicable toconclude

Snb (t)⇒ −2βE[U bk]Λ∞(t) +√E[(U bk)2] ·B2

(Λ∞(t)

),

where B1(·) and B2(·) are two independent Brownian motions. Let

η(t) = −2β(E[Uak ] + E[U bk]

)· Λ∞(t)

Wa(t) =√

E[(Uak )2] ·B1

(Λ∞(t)

)Wb(t) =

√E[(U bk)2] ·B2

(Λ∞(t)

)then the convergence holds jointly due to independency of limit processes and therefore we obtainthe conclusion required in Step 1). As indicated earlier, Steps 2) to 4) now follow directly. Notethat now J1 and J2 are compound Poisson processes with jump intensity γµ(t).

Proof of Proposition 2. For simplicity, we assume that ξ = 1, otherwise we can divide S,M and s, m by the constant ξ. Assume also that V a

k ≤ c for some c ≥ 1. The general case canbe dealt with using truncation because there are only a Poisson number of jumps that arise inO (1) time. Now, let us first give a bound for the difference sn(·) − Sn(·). For fixed n, we defineN(t) =

∑{j:tj≤t}(I

aj + Ibj ), intuitively N(t) corresponds to the number of jumps of the limiting

process from time 0 to time t. Now we prove by induction that

0 ≤ sn(tk)− Sn(tk) ≤(

(1 + c)N(tk) − 1)· 2√

n. (18)

At t0 = 0, we have Sn(t0) = s(t0). Now suppose the relation (18) holds at time tk−1, there are twocases at time tk, case 1): N(tk) = N(tk−1), and case 2): N(tk) > N(tk−1). First let us considerthe case when N(tk) = N(tk−1). In this case, we know that ∆n

a(sn(tk−1)) = ∆na(S(tk−1)) := ∆n

a

is independent of sn(tk−1) and Sn(tk−1). Also, keep in mind that |∆na (tk)| ≤ 1/

√n. Now we can

write the increment of the difference process

(sn(tk)− Sn(tk))− (sn(tk−1)− Sn(tk−1))

= ∆na ∨ ([−sn(tk)/(2δ)]δ) + ∆n

b ∨ ([−sn(tk)/(2δ)]δ)− (∆na + ∆n

b ) ∨ (−Sn(tk−1)). (19)

Therefore, if sn(tk−1) ≥ Sn(tk−1) ≥ 2/√n, we have

(sn(tk)− Sn(tk))− (sn(tk−1)− Sn(tk−1)) = ∆na + ∆n

b − (∆na + ∆n

b ) = 0

and as a result sn(tk−1) − Sn(tk−1) = sn(tk) − Sn(tk). If sn(tk−1) ≥ 2/√n ≥ Sn(tk−1) ≥ 0, We

have

(sn(tk)− Sn(tk))− (sn(tk−1)− Sn(tk−1))

= ∆na + ∆n

b − (∆na + ∆n

b ) ∨ (−Sn(tk−1)) = −(Sn(tk−1) + ∆na + ∆n

b )− ≤ 0.

37

Page 38: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

Therefore,

sn(tk−1)− Sn(tk−1) ≥ sn(tk)− Sn(tk) ≥2√n− (Sn(tk−1) + ∆n

a + ∆nb ) ≥ 0.

Otherwise, we have 0 ≤ Sn(tk−1) ≤ sn(tk−1) ≤ 2/√n. In this case, one can check that for any

fixed Sn(tk) = s and sn(tk) = s, the increment of the difference process (19) reaches its maximumat ∆n

a = −1//√n and ∆n

b = −s+ /√n and its minimum at ∆n

a = ∆nb = −[s/2δ]δ. Hence,

s− s ≤(sn(tk)− Sn(tk)

)−(sn(tk−1)− Sn(tk−1)

)≤ 0 ∨ (

1√n− s

2− s).

Plugging in Sn(tk) = s and sn(tk) = s, we have

0 ≤ sn(tk)− Sn(tk) ≤ (s− s) ∨ (1√n

+s

2) ≤ (sn(tk−1)− Sn(tk−1)) ∨ 2√

n.

The last inequality holds as s = sn(tk−1) ≤ 2/√n. In summary, we have proved that when N(tk) =

N(tk−1), if the relation (18) holds at time tk−1, so does it at time tk. Now if N(tk) ≥ N(tk−1) + 1,intuitively, at least one jump occurs in ∆n

a and ∆nb . If Iak = 1 we have

∆na(Sn(tk−1)) = IakV

ak [Sn(tk−1)/(2δ)]δ, and ∆n

a(sn(tk−1)) = IakVak [sn(tk−1)/(2δ)]δ.

If in addition Ibk = 1, then

∆nb (Sn(tk−1)) = IbkV

bk [Sn(tk−1)/(2δ)]δ, and ∆n

b (sn(tk−1)) = IbkVbk [sn(tk−1)/(2δ)]δ,

and therefore,sn(tk)− Sn(tk) ≤ (sn(tk−1)− Sn(tk−1))(IakV

ak + IbkV

bk + 1)/2.

As

0 ≤ V ak + V b

k + 1 ≤ 2c+ 1 and 0 ≤ sn(tk−1)− Sn(tk−1) ≤ ((c+ 1)N(tk−1) − 1) · 2√n,

by the induction assumption we have

0 ≤ sn(tk)− Sn(tk) ≤ ((c+ 1)N(tk−1) − 1)(2c+ 1) · 2√n≤ ((c+ 1)N(tk) − 1) · 2√

n.

If Ibk = 0, then following a similar argument as in the case when N(tk) = N(tk−1), we have

sn(tk)− Sn(tk) ≤ (sn(tk−1)− Sn(tk−1))(V ak + 1) + (sn(tk−1)− Sn(tk−1)) ∨ 2√

n

= ((c+ 1)N(tk) − 1) · 2√n

as c ≥ 1.

In summary, we have proved the relation (18) of Sn(·) and sn(·) by induction. Now let us turn tothe difference mn(·)− Mn(·). Actually, mn(t)− Mn(t) can be decomposed into two parts,

mn(t)− Mn(t)

≤∑

0≤k≤[nt]:N(tk+1)=N(tk)

[∆na(tk) ∨ ([sn(tk−1)/(2δ)]δ)−∆n

b (tk) ∨ ([sn(tk−1)/(2δ)]δ)− (∆na(tk)−∆n

b (tk))]

+

N(t)∑i=1

([sn(tk−1)/(2δ)]δ − [Sn(tk−1)/(2δ)]δ)(IakiVaki

+ IbkiVbki

),

38

Page 39: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

where {tki} are the jump times. We denote the two summation parts as

mn(t)− Mn(t) = εn0 (t) + εn1 (t).

Intuitively, εn0 (t) is the error corresponding to the diffusion part when Iak = Ibk = 0 and εn1 (t)is the error corresponding to the jumps. In the summation part εn0 (t), we write ∆n

a(sn(tk)) =∆na(Sn(tk)) = ∆n

a(tk) , because they are independent of sn(tk) and Sn(tk) when when Iak = Ibk = 0.Following a same induction argument as for sn − Sn, we can show that the error caused by jumpsεn1 (t) satisfies that

εn1 (t) ≤ ((1 + 2c)N(t) − 1) · 2√n.

On the other hand, note that εn0 (t) equals∑0≤k≤[nt]:N(tk+1)=N(tk)

[∆na(tk) ∨ ([sn(tk−1)/(2δ)]δ)−∆n

b (tk) ∨ ([sn(tk−1)/(2δ)]δ)− (∆na(tk)−∆n

b (tk))]

=∑

0≤k≤[nt]:N(tk+1)=N(tk)

[(∆na(tk) ∨ ([sn(tk−1)/(2δ)]δ)−∆n

a(tk))− (∆nb (tk) ∨ ([sn(tk−1)/(2δ)]δ)−∆n

b (tk))] .

Since ∆na(tk) and ∆n

b (tk) are independent and identically distributed, we have that for any k ≥ 1

E[εn0 (tk)− εn0 (tk−1)|Fntk ] = P (N(tk) = N(tk−1)) · (E [∆na(tk) ∨ ([sn(tk−1)/(2δ)]δ)−∆n

a(tk)]

−E [∆nb (tk) ∨ ([sn(tk−1)/(2δ)]δ)−∆n

b (tk)]) = 0,

where Fntk is the σ-field generated by {∆na(ti),∆

nb (ti), S

n(ti)}ki=1. Therefore, the process εn0 (·) is amartingale under the filtration Fn. Besides, as |∆n

a(tk)| ≤ 1/√n when N(tk) = N(tk−1), we have

|εn0 (tk)− εn0 (tk−1)| ≤ 2√nI(sn(tk−1) <

2√n

).

The quadratic variation

[εn0 ](t) ≤ 4

n

[nt]∑i=0

I(sn(ti) <2√n

).

Recall that we have proved sn(·) ≥ Sn(·),

[εn0 ](t) ≤ 4

n

[nt]∑i=0

I(Sn(ti) <2√n

).

Since 2/√n→ 0, for any ζ > 0 we have

limn→∞

[εn0 ](t) ≤ limn→∞

4

∫ t

0I(Sn(u) < ζ)du ≤ lim

n→∞4

∫ t

0f ζ(Sn(u))du,

where f ζ(·) is a smooth function on R+ and satisfies f(x) = 1 for all 0 ≤ x ≤ ζ, 0 ≤ f(x) ≤ 1for ζ ≤ x ≤ 2ζ and f(x) = 0 for x > 2ζ. (Such function can be constructed, for instance, byconvolution.) Since f ζ(·) is bounded and Sn(·) converges weakly to the limit process (8), we have

limn→∞

E[[εn0 ](t)] ≤ limn→∞

4E[

∫ t

0f ζ(Sn(u))du] = 4E[

∫ t

0f ζ(s(u))du] ≤ 4E[

∫ t

0I(s(u) ≤ 2ζ)],

39

Page 40: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

As the limit process s(·) has the same dynamics as a reflected Brownian motion except when at thefinite time of jumps on [0, t], we have E[

∫ t0 I(s(u) ≤ 2ζ)]→ 0 as ζ → 0. Since ζ can be arbitrarily

small, we conclude that the expected quadratic variation E[[εn0 ](t)] → 0 as n → 0 for any t < ∞.By Doob’s Inequality, we have that for all fixed ζ > 0,

P ( max0≤u≤t

|εn0 (u)| > ζ) ≤ E[[εn0 ](t)]

ζ2→ 0.

Therefore, εn0 (·) converges weakly to x(·) ≡ 0 in space D[0, t] for all t <∞. In the end, the countingprocess N(·) converges to a time inhomogeneous Poisson process with rate γµ(t), where µ(t) is givenin the proof of Proposition 1. Therefore for any t <∞

E[ max0≤u≤t

|((2c+ 1)N(u) − 1)2√n|] = E[((2c+ 1)N(t) − 1)

2√n

] = O(1√n

).

As a result, the process ((2c + 1)N(·) − 1) 2√n

converges weakly to x(·) ≡ 0 in space D[0, t]. Recall

that we have proved that ((2c + 1)N(·) − 1) 2√n

is an upper bound of |sn(·) − Sn(·)| and the ‘jump

part’ of |mn(·) − Mn(·)|. As a consequence, we can conclude that the difference process (sn(·) −Sn(·), mn(·)− Mn(·)) converges weakly to (0, 0) on any compact interval [0, t].

References

[1] F. Abergel and A. Jedidi. Econophysics of Order-Driven Markets, chapter A mathematicalapproach to order book modelling, pages 93–107. Springer, 2011.

[2] F. Abergel and A. Jedidi. Long-time behavior of a Hawkes process–based limit order book.SIAM Journal on Financial Mathematics, 6(1):1026–1043, 2015.

[3] C. Bayer, U. Horst, and J. Qiu. A functional limit theorem for limit order books with statedependent price dynamics. Preprint, 2015.

[4] P. Billingsley. Convergence of Probability Measures, 2nd Edition. Wiley, 1999.

[5] R. Cont and A. De Larrand. Price dynamics in a Markovian limit order book market. SIAMJournal on Financial Mathematics, 4:1–25, 2013.

[6] R. Cont, S. Stoikov, and R. Talreja. A stochastic model for order book dynamics. OperationsResearch, 58:549–563, 2010.

[7] R. Cont, A. Kukanov, and S. Stoikov. The price impact of order book events. Journal foFinancial Econometrics, 2013.

[8] M. Csorgo and P. Revesz. Strong Approximations in Probability and Statistics. AcademicPress, 1981.

[9] S. Ethier and T. Kurtz. Markov Processes: Characterization and Convergence. Wiley, 1986.

[10] M. Gould, M. Porter, S. Wiliams, M. McDonald, D. Fenn, and S. Howison. Limit order books.Quantitative Finance, 13:1709–1742, 2013.

[11] J. Hasbrouck and G. Saar. Technology and liquidity provision: The blurring of traditionaldefinitions. Journal of Finanical Markets, 12:143–172, 2009.

40

Page 41: Unraveling Limit Order Books Using Just Bid/Ask Pricesjblanche/papers/LOB_v1.pdf · Unraveling Limit Order Books Using Just Bid/Ask Prices Jose Blanchet, Xinyun Chen and Yanan Pei

[12] N. Hautsch and R. Huang. The market impact of a limit order. Journal of Economic Dynamicsand Control, 36:501–522, 2012.

[13] A. G. Hawkes and D. Oakes. A cluster process representation of a self-exciting process. Journalof Applied Probability, 11(3):493–503, 1974.

[14] L. Horvath. Strong approximation of renewal processes. Stochastic Processes and their Appli-cations, 18(1):127–138, 1984.

[15] W. Huang and M. Rosenbaum. Ergodicity and diffusivity of Markovian order book models: ageneral framework. Preprint, 2015.

[16] W. Huang, C.-A. Lehalle, and M. Rosenbaum. Simulating and analyzing order book data: Thequeue-reactive model. Journal of the American Statistical Association, 110:107–122, 2015.

[17] T. G. Kurtz. Applied Stochastic Analysis, chapter Averaging for martingale problems andstochastic approximation, pages 186–209. Springer, 1992.

[18] A. Lachapelle, J.-M. Lasry, C.-A. Lehalle, and P.-L. Lions. Efficiency of the price formationprocess in presence of high frequency participants: a mean field game analysis. Preprint, 2014.

[19] P. Lakner, J. Reed, and S. Stoikov. High frequency asymptotic for the limit order book.Preprint, 2013.

[20] M. Melvin, L. Menkhoff, and M. Schmeling. Exchange rate management in emerging markets:intervention via an electronic limit order book. Journal of International Economics, 79:54–63,2009.

[21] O. Obizhaevam and J. Wang. Optimal trading strategies and supply/demand dynamics. Jour-nal of Financial Markets, 16(1):1–32, 2013.

[22] SEC. Investor bulletin: Measures to address market volatility. URLhttps://www.sec.gov/investor/alerts/circuitbreakersbulletin.htm.

[23] E. Smith, D. J. Farmer, L. Gillemot, and S. Krishnamurthy. Statistical theory of the continuousdouble auction. Quantitative Finance, 3:481–514, 2003.

[24] I. M. Toke. The order book as a queueing system: average depth and influence of the size oflimit orders. Quantitative Finance, 15:795–808, 2015.

41