
Algorithmic Trading with Learning

Ryerson University

Damir Kinzebulatov1

(Fields Institute)

joint work with

Alvaro Cartea (University College London) and

Sebastian Jaimungal (University of Toronto)

1 www.math.toronto.edu/dkinz

Asset price St

Suppose that at time t < T the trader has a prediction about S_T.

S_T is a random variable.

e.g. in high-frequency trading, using data-analysis algorithms:

S_T − S_0 =  +2·10⁻² with prob 0.10
             +10⁻²   with prob 0.20
              0      with prob 0.55
             −10⁻²   with prob 0.10
             −2·10⁻² with prob 0.05
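A quick sanity check of this prediction in code (the numbers are taken from the five-point distribution above; the snippet itself is only illustrative):

```python
# Expected price change under the five-point prediction above.
outcomes = [2e-2, 1e-2, 0.0, -1e-2, -2e-2]
probs    = [0.10, 0.20, 0.55, 0.10, 0.05]

assert abs(sum(probs) - 1.0) < 1e-12          # the probabilities sum to one

expected_change = sum(x * p for x, p in zip(outcomes, probs))
print(expected_change)                         # ≈ 0.002, i.e. E[S_T] > S_0
```

The expected change is positive even though the most likely single outcome is "no change".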


Naive strategy:

if E[S_T] > S_t ⇒ buy

Advanced strategy:

– would incorporate prediction ST in the asset price process St

– would learn from the realized dynamics of the asset price


– incorporate prediction ST in the asset price process St . . .

A three-point prediction... S_T = −5, 0, 5 with prob 0.7, 0.2, 0.1

[Figure: sample midprice paths over time t ∈ [0, 1], pinned at the three predicted endpoints]

Story 1: Asset price as a randomized Brownian bridge


Recall:

Brownian bridge β_{tT} is a Gaussian process such that

β_{0T} = β_{TT} = 0,   β_{tT} ∼ N(0, t(T − t)/T)
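As a sketch, this law can be checked by Monte Carlo using the standard construction β_{tT} = W_t − (t/T) W_T (a construction choice, not stated in the slides):

```python
import numpy as np

# Simulate Brownian bridge paths via beta_t = W_t - (t/T) W_T and check
# that Var(beta_t) matches t(T - t)/T, e.g. 0.25 at t = 1/2 when T = 1.
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 100, 20_000
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.hstack([np.zeros((n_paths, 1)), dW.cumsum(axis=1)])
beta = W - (t / T) * W[:, -1:]            # pins beta_0 = beta_T = 0 exactly

k = n_steps // 2                          # check the variance at t = 1/2
print(beta[:, k].var(), t[k] * (T - t[k]) / T)   # both close to 0.25
```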


Algorithmic trading with learning – our model

S_t is a “randomized Brownian bridge” (RBB):

S_t = S_0 + σ β_{tT} + (t/T) D

D – the random change in the asset price (the distribution of D is known a priori)

β_{tT} – Brownian bridge (‘noise’) independent of D

Thus, S_T = S_0 + D.

As t ↑ T, the trader learns the realized value of D.


Insider trading is not possible

Let F_t = σ(S_u : u ≤ t).

The trader has access only to the filtration F_t (but not to the filtration of β_{tT})

⇒ the trader cannot distinguish between the noise β_{tT} and D.


What about the standard model?

S_t = S_0 + σ W_t (“arithmetic BM”)

corresponds to the choice D ∼ N(0, σ²T)


Proposition: The asset price S_t satisfies

dS_t = A(t, S_t) dt + σ dW_t,   S_t|_{t=0} = S_0,

where W_t is an F_t-Brownian motion,

A(t, S) = ( E[D | S_t = S] + S_0 − S ) / (T − t),

and

E[D | S_t = S] = ∫ x exp( x (S − S_0)/(σ²(T − t)) − x² t/(2σ²T(T − t)) ) μ_D(dx)
              / ∫ exp( x (S − S_0)/(σ²(T − t)) − x² t/(2σ²T(T − t)) ) μ_D(dx).
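For a discrete prior μ_D the conditional expectation above is a finite weighted sum. A small sketch (the two-point prior and all parameter values are assumed for illustration; a log-sum-exp shift is added for numerical stability as t → T):

```python
import numpy as np

def posterior_mean_D(S, t, S0, sigma, T, atoms, probs):
    """E[D | S_t = S] from the Proposition, for a discrete prior mu_D.

    A max-shift (log-sum-exp trick) keeps the exponentials finite as t -> T."""
    atoms, probs = np.asarray(atoms, float), np.asarray(probs, float)
    expo = atoms * (S - S0) / (sigma**2 * (T - t)) \
         - atoms**2 * t / (2 * sigma**2 * T * (T - t))
    w = probs * np.exp(expo - expo.max())
    return float((atoms * w).sum() / w.sum())

# Assumed two-point prior: D = +/-0.02 with prob 0.8 / 0.2.
atoms, probs = [0.02, -0.02], [0.8, 0.2]
S0, sigma, T = 1.0, 0.01, 1.0

# At t ~ 0 with S = S0 there is no information yet: posterior mean = prior mean.
print(posterior_mean_D(S0, 1e-9, S0, sigma, T, atoms, probs))      # ≈ 0.012

# Near t = T with S - S0 = +0.02 realized, the posterior concentrates on +0.02.
print(posterior_mean_D(S0 + 0.02, 1 - 1e-4, S0, sigma, T, atoms, probs))
```

This is exactly the learning mechanism of the model: the drift A(t, S) pulls the price toward S_0 + E[D | S_t], and the posterior mean converges to the realized D.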


Story 2: Trader’s optimization problem

(high-frequency trading)


Market microstructure: Limit Order Book

Oxford Centre for Industrial and Applied Mathematics:

An order matching a sell limit order is called a buy market order (not shown, because it is executed immediately!)


Market microstructure: Limit Order Book

To summarize:

– use buy market orders (MO) ⇒ pay higher prices
– use buy limit orders (LO) ⇒ pay lower prices, but have to wait . . .

(similarly for sell LO and sell MO)


Trader’s optimization problem: Strategy

Simplifying assumptions (not crucial)

– at each t, post LOs & MOs for 0 or 1 unit of the asset, at the best bid/ask price

⇒ trader’s strategy has 4 components:

ℓ⁺_t ∈ {0, 1} (sell LO)

ℓ⁻_t ∈ {0, 1} (buy LO)

m⁻_t ∈ {0, 1} (buy MO)

m⁺_t ∈ {0, 1} (sell MO)

– the spread is constant


Key quantities

Inventory:

Q_t = −∫₀ᵗ ℓ⁺_u dN⁺_u + ∫₀ᵗ ℓ⁻_u dN⁻_u − m⁺_t + m⁻_t

where the Poisson processes N⁺_t, N⁻_t count the numbers of filled sell and buy LOs

Cash process:

X_t = −∫₀ᵗ (S_u − Δ/2) ℓ⁻_u 1{Q_u < Q̄} dN⁻_u
    + ∫₀ᵗ (S_u + Δ/2) ℓ⁺_u 1{Q_u > Q̲} dN⁺_u
    − ∫₀ᵗ (S_u + Δ/2 + ε) 1{Q_u < Q̄} dm⁻_u
    + ∫₀ᵗ (S_u − Δ/2 − ε) 1{Q_u > Q̲} dm⁺_u

where Δ = the spread, ε = the transaction fee for a market order, S_t = the midprice

Constraints on inventory:

Q̲ ≤ Q_t ≤ Q̄ and Q_T = 0
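A toy simulation of the (Q_t, X_t) bookkeeping above, for a naive strategy that always posts both LOs and liquidates with MOs at T; the fill intensity, spread, fee, bounds, and midprice model are all assumed values, not the slides':

```python
import numpy as np

# Toy bookkeeping of inventory Q_t and cash X_t consistent with the
# definitions above: LO fills earn half the spread, MOs cross it and pay a fee.
rng = np.random.default_rng(1)
T, n = 1.0, 1000
dt = T / n
lam, Delta, eps = 50.0, 0.01, 0.002   # fill intensity, spread, MO fee (assumed)
Q_lo, Q_hi = -5, 5                    # inventory bounds

# Stand-in arithmetic-BM midprice path.
S = 1.0 + 0.01 * np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), n))])

Q, X, Q_path = 0, 0.0, []
for i in range(n):
    if rng.random() < lam * dt and Q > Q_lo:   # sell LO filled at ask S + Delta/2
        Q -= 1; X += S[i] + Delta / 2
    if rng.random() < lam * dt and Q < Q_hi:   # buy LO filled at bid S - Delta/2
        Q += 1; X -= S[i] - Delta / 2
    Q_path.append(Q)

while Q > 0:   # liquidate long inventory with sell MOs (cross spread, pay fee)
    Q -= 1; X += S[-1] - Delta / 2 - eps
while Q < 0:   # cover short inventory with buy MOs
    Q += 1; X -= S[-1] + Delta / 2 + eps

print(Q, round(X, 4))   # Q == 0: the terminal inventory constraint holds
```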


[Figure: sample paths of the asset price (S) and the inventory (Q) over time t ∈ [0, 1]]

Trader’s optimization problem: Goal

Goal: find

sup_{ {ℓ±_t}_{t≤T}, {m±_t}_{t≤T} } E[ X_T + Q_T ( S_T − (Δ/2) sgn(Q_T) − α Q_T ) ]    (1)

– 1st term: cash from trading
– 2nd term: profit/cost from closing the position at T

So far the midprice S_t was any process . . . We want the RBB

S_t = S_0 + σ β_{tT} + (t/T) D


Dynamic programming

Since the RBB S_t satisfies the SDE

dSt = A(t, St) dt+ σ dWt

we can use Dynamic Programming to solve the optimization problem


Dynamic programming

Goal: find the value function

H(t, S, Q, X) = sup_{ℓ±, m±} E[ X_T + Q_T ( S_T − (Δ/2) sgn(Q_T) − α Q_T ) | S_t = S, Q_t = Q, X_t = X ]


Dynamic programming

The value function H admits the representation

H(t, X, S, Q) = X + QS + g(t, S, Q)

where g solves (in the viscosity sense) the system of non-linear PDEs

0 = max{ ∂_t g + ½ σ² ∂_SS g + A(t, S)(Q + ∂_S g) − φQ²
    + 1_{Q < Q̄} max_{ℓ⁻ ∈ {0,1}} λ⁻ [ ℓ⁻ Δ/2 + g(t, S, Q + ℓ⁻) − g ]
    + 1_{Q > Q̲} max_{ℓ⁺ ∈ {0,1}} λ⁺ [ ℓ⁺ Δ/2 + g(t, S, Q − ℓ⁺) − g ] ;

    max{ −Δ/2 − ε + g(t, S, Q + 1) − g,  −Δ/2 − ε + g(t, S, Q − 1) − g,  0 } }

subject to the terminal condition

g(T, S, Q) = −(Δ/2)|Q| − αQ²,   Q̲ ≤ Q ≤ Q̄
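For the uninformed prior D ∼ N(0, σ²T) (the standard-model case above) the drift A vanishes, so g no longer depends on S and the system reduces to coupled ODEs in (t, Q). A minimal explicit-Euler sketch of that reduced case, with λ⁺ = λ⁻ = λ and all parameter values assumed, not the authors':

```python
import numpy as np

def solve_g_uninformed(T=1.0, Delta=0.01, eps=0.002, phi=1e-5, alpha=1e-4,
                       lam=50.0, Q_hi=5, n_t=2000):
    """Explicit Euler, backward in time, for the S-independent case A = 0."""
    dt = T / n_t
    Qs = np.arange(-Q_hi, Q_hi + 1).astype(float)
    g = -Delta / 2 * np.abs(Qs) - alpha * Qs**2        # terminal g(T, Q)
    for _ in range(n_t):
        cont = -phi * Qs**2                            # running inventory penalty
        # buy LO (Q -> Q+1) only posted when its gain Delta/2 + g(Q+1) - g(Q) > 0
        cont[:-1] += lam * np.maximum(0.0, Delta / 2 + g[1:] - g[:-1])
        # sell LO (Q -> Q-1) only posted when Delta/2 + g(Q-1) - g(Q) > 0
        cont[1:] += lam * np.maximum(0.0, Delta / 2 + g[:-1] - g[1:])
        g = g + dt * cont
        # market-order (impulse) part of the max, from a pre-update snapshot
        gi = g.copy()
        g[:-1] = np.maximum(g[:-1], -Delta / 2 - eps + gi[1:])
        g[1:] = np.maximum(g[1:], -Delta / 2 - eps + gi[:-1])
    return Qs, g

Qs, g0 = solve_g_uninformed()
print(dict(zip(Qs.tolist(), np.round(g0, 4))))   # g(0, Q), symmetric in Q
```

In the RBB case A(t, S) ≠ 0 and g depends on S as well, so a full finite-difference grid in S is needed; the reduced sketch only illustrates the structure of the max operators.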


Example


Example

Informed trader (IT) believes that

D = +0.02 with prob 0.8, −0.02 with prob 0.2

Compare the performance of the IT with:

– an uninformed trader (UT) who views

D ∼ N(0, σ²T)

(i.e. S_t is an arithmetic BM)

– an uninformed trader with learning (UL) who believes

D = +0.02, −0.02 with prob 0.5, 0.5


Example

[Figure: asset price (S) and inventory (Q) paths]

The strategy of the UT, who views the midprice as a Brownian motion

Example

[Figure: asset price (S) and inventory (Q) paths]

The strategy of the UL, who views D = −0.02, 0.02 with prob 0.5 each

Example

[Figure: asset price (S) and inventory (Q) paths]

The strategy of the IT, who views D = −0.02, 0.02 with prob 0.2, 0.8

Note: for large volatility the IT stops learning.

Example

[Figure: mean P&L vs. std of P&L for the three agents (IwL, UwL, UwoL); the bounds on inventory are increasing along each curve]

Risk-reward profiles for the three types of agents as the inventory bound increases

Example

[Figure: mean number of executed orders (l.o. buy, l.o. sell, m.o. buy, m.o. sell) per time interval]

UT: the mean executed Limit and Market orders

Example

[Figure: mean number of executed orders (l.o. buy, l.o. sell, m.o. buy, m.o. sell) per time interval]

UL: the mean executed Limit and Market orders

Example

[Figure: mean number of executed orders (l.o. buy, l.o. sell, m.o. buy, m.o. sell) per time interval]

IT: the mean executed Limit and Market orders

Multiple assets


Multiple assets

Asset midprices S are randomized Brownian bridges:

S^(i)_t = S^(i)_0 + σ^(i) β^(i)_{tT} + (t/T) D^(i)

β^(i)_{tT} – mutually independent standard Brownian bridges

D^(i) – the random changes in asset prices; they may be dependent

– asset prices interact non-linearly through D = (D^(i))

– the IT may trade in an asset that has high volatility, and in which they are marginally uninformed, but can learn joint information from a second, less volatile, asset


Multiple assets

For illustration purposes...

Probability of outcomes:

               D^(1) = −0.02   D^(1) = +0.02
D^(2) = −0.02       0.45            0.05
D^(2) = +0.02       0.05            0.45

σ^(1) = 0.02 and σ^(2) = 0.01

Observing only S^(1) or only S^(2), the agent is uninformed (each marginal of D^(i) is 50/50)
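The cross-asset learning effect can be read off the joint table directly; a small sketch with the values from the table above:

```python
# Joint prior over (D1, D2): each marginal is 50/50 (uninformative on its own),
# but conditioning on the other asset's outcome is informative.
joint = {(-0.02, -0.02): 0.45, (-0.02, +0.02): 0.05,
         (+0.02, -0.02): 0.05, (+0.02, +0.02): 0.45}

marginal_D1 = sum(p * d1 for (d1, d2), p in joint.items())
print(marginal_D1)                        # ≈ 0: the prior on D1 alone has no direction

cond = {k: p for k, p in joint.items() if k[1] == +0.02}   # observe D2 = +0.02
z = sum(cond.values())
cond_mean_D1 = sum((p / z) * d1 for (d1, d2), p in cond.items())
print(cond_mean_D1)                       # ≈ 0.016: now directionally informed about D1
```

So learning the less volatile asset's D^(2) shifts the conditional mean of D^(1) from 0 to 0.016, which is exactly what makes trading the volatile asset worthwhile.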


Multiple assets

[Figure: asset price (S) and inventory (Q) paths]

The strategy of a trader who excludes Asset 2 from their info

Multiple assets

[Figure: asset price (S) and inventory (Q) paths]

The strategy of a trader who includes Asset 2 in their info

Conclusions

– Agents who have info can outperform other traders

– We show how to trade when info is uncertain

– The optimal strategy learns from midprice dynamics and outperforms naive strategies

– Including info from other assets can add value in assets in which learning alone does not help

Thank you!

www.math.toronto.edu/dkinz
