Productive Data Tools for Quants Wes McKinney @wesmckinn Python in Finance 2013, 2013-04-05

Page 1: Productive Data Tools for Quants

Productive Data Tools for Quants

Wes McKinney @wesmckinn

Python in Finance 2013, 2013-04-05

Page 2: Productive Data Tools for Quants

Me

• Started pandas project at AQR in 2008

• Other Python projects I’ve been involved with: statsmodels, vbench, gpustats

• http://blog.wesmckinney.com

• Currently: Founder of stealth SF data startup

Page 3: Productive Data Tools for Quants

Book

• In print now!

• IPython

• NumPy

• pandas

• matplotlib

• Case studies

Page 4: Productive Data Tools for Quants

Finance languages

Page 5: Productive Data Tools for Quants

pandas

• Productivity-focused structured data manipulation tools for Python

• Fast, intuitive data structures

• Filling the gap between Python and more domain-specific languages like R

• Huge growth in 2011-2012, continuing in 2013

Page 6: Productive Data Tools for Quants

Productivity, why do we care?

Page 7: Productive Data Tools for Quants

People time = money

Page 8: Productive Data Tools for Quants

Productive is not the same as high performance

Page 9: Productive Data Tools for Quants

Tool bottlenecks impede innovation

Page 10: Productive Data Tools for Quants

Aside: vbench for performance testing

Page 11: Productive Data Tools for Quants

(Some) financial data challenges

• Metadata and data alignment

• “Missing” data

• Group operations

• Time series

Page 12: Productive Data Tools for Quants

Data alignment

• Stock universes

• Timestamps

Page 13: Productive Data Tools for Quants

Let’s talk about...

Page 14: Productive Data Tools for Quants

Let’s talk about...

a - b

a = Signal 1, b = Signal 2

Page 15: Productive Data Tools for Quants

Let’s talk about...

sum(a - b) / mean(c)

Page 16: Productive Data Tools for Quants

Data alignment

a - b

• Same length?

• Same metadata?

• Same frequency?

Assumptions can be dangerous
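
As an illustration of why these assumptions matter, here is a minimal sketch using plain NumPy arrays. The tickers and values are hypothetical and not from the talk. The two arrays have the same length, so the subtraction runs without error, yet it pairs up the wrong stocks because the universes are ordered differently.

```python
import numpy as np

# Hypothetical signals for two stock universes; tickers and values are made up.
tickers_a = ["AAPL", "GOOG", "MSFT"]
tickers_b = ["GOOG", "MSFT", "AAPL"]  # same names, different order
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 3.0, 1.0])

# Same length, so this runs, but it subtracts GOOG from AAPL, MSFT from GOOG, ...
positional = a - b

# Reordering b by label first gives the intended element-wise difference.
order = [tickers_b.index(t) for t in tickers_a]
by_label = a - b[order]

print(positional)  # silently wrong pairing
print(by_label)    # matched by ticker
```

Nothing in the positional version signals a problem, which is the danger the slide points at: the labels live outside the arrays, so keeping them in sync is left to the user.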

Page 17: Productive Data Tools for Quants

Data alignment

• pandas uses axis indexing to specify default join (“automatic data alignment”) behavior

    B  1       A  0       A  NA
    C  2   +   B  1   =   B  2
    D  3       C  2       C  4
    E  4       D  3       D  6
                          E  NA
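
The addition pictured on this slide can be reproduced with two pandas Series. This is a small sketch: the variable names are illustrative, and the `fill_value` variant at the end is an addition for contrast, not something shown on the slide.

```python
import pandas as pd

# The two operands from the slide: labels B..E and A..D.
left = pd.Series([1, 2, 3, 4], index=["B", "C", "D", "E"])
right = pd.Series([0, 1, 2, 3], index=["A", "B", "C", "D"])

# Addition aligns on index labels; labels present on only one side become NaN.
total = left + right
print(total)

# Illustrative alternative: treat a label missing on one side as 0.
filled = left.add(right, fill_value=0)
print(filled)
```

The result index is the union of both indexes, so no observation is dropped silently; the missing markers show exactly where the two inputs did not overlap.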

Page 18: Productive Data Tools for Quants

Hierarchical indexes

• Semantics: a tuple at each tick

• Enables easy group selection

• Terminology: “multiple levels”

• Natural part of GroupBy and reshape operations

    A  1
       2
       3
    B  1
       2
       3
       4
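
A sketch of the two-level index from this slide, built with `pd.MultiIndex.from_tuples`. The level names `group` and `item` and the stored values are assumptions made for illustration only.

```python
import pandas as pd

# One tuple per tick: outer key A or B, inner key 1..3 or 1..4.
# Level names and values are hypothetical.
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 3), ("B", 4)],
    names=["group", "item"],
)
s = pd.Series(range(7), index=index)

# Group selection: everything under outer key "A".
group_a = s.loc["A"]

# Reshape: move the inner level into columns (A has no item 4, so that cell is NaN).
wide = s.unstack("item")

# GroupBy over one level of the index.
totals = s.groupby(level="group").sum()

print(group_a)
print(wide)
print(totals)
```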

Page 19: Productive Data Tools for Quants

Missing data

• Interpolation (esp. time series)

• Dropping / filtering

• Replacing with value

• Excluding from statistical computations
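
Each of the four bullets above corresponds to a pandas method. The sketch below uses a hypothetical five-day price series with two gaps; the dates and prices are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices with two missing observations.
dates = pd.date_range("2013-04-01", periods=5, freq="D")
prices = pd.Series([10.0, np.nan, 12.0, np.nan, 14.0], index=dates)

interpolated = prices.interpolate()  # linear interpolation between neighbors
dropped = prices.dropna()            # drop the missing rows
replaced = prices.fillna(0.0)        # replace missing values with a constant
mean_skipping_na = prices.mean()     # statistics skip NaN by default

print(interpolated)
print(dropped)
print(replaced)
print(mean_skipping_na)
```

Which treatment is appropriate depends on the data: interpolating prices may be reasonable, while filling with zero is more natural for something like traded volume.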

Page 20: Productive Data Tools for Quants

Time series

• Data alignment

• Frequency conversions

• Date arithmetic

• Resampling

• Time zones

• “As of” joins and lookups
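
Several of these operations can be sketched on a small intraday series. The timestamps, prices, and time zone are assumptions chosen for illustration, and the calls shown are from current pandas, which may differ from the version available at the time of the talk.

```python
import pandas as pd

# Hypothetical 10-minute prices in New York time.
idx = pd.date_range("2013-04-01 09:30", periods=6, freq="10min",
                    tz="America/New_York")
prices = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0], index=idx)

# Resampling / frequency conversion: last price in each 30-minute bin.
half_hour = prices.resample("30min").last()

# Date arithmetic: one business day after a Friday lands on Monday.
next_bday = pd.Timestamp("2013-04-05") + pd.offsets.BDay(1)

# Time zones: convert the index to UTC.
utc = prices.tz_convert("UTC")

# "As of" lookup: latest known price at or before 09:45.
asof_value = prices.asof(pd.Timestamp("2013-04-01 09:45", tz="America/New_York"))

print(half_hour)
print(next_bday)
print(utc.index[0])
print(asof_value)
```

For the join form of an "as of" lookup, recent pandas versions also provide `pd.merge_asof`, which matches each row to the most recent row of another table by key.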

Page 21: Productive Data Tools for Quants

GroupBy

Input (key, value): A 0, B 5, C 10, A 5, B 10, C 15, A 10, B 15, C 20

Split by key:

    A: 0, 5, 10
    B: 5, 10, 15
    C: 10, 15, 20

Apply: sum within each group

Combine:

    A  15
    B  30
    C  45
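
The split-apply-combine picture on this slide maps onto a single `groupby` call. The sketch below uses the slide's keys and values; the column names `key` and `value` are assumptions.

```python
import pandas as pd

# Keys and values from the slide; column names are hypothetical.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "value": [0, 5, 10, 5, 10, 15, 10, 15, 20],
})

# Split: iterate over the groups to see what each piece contains.
pieces = {key: group["value"].tolist() for key, group in df.groupby("key")}
print(pieces)

# Apply and combine: sum each piece and assemble the results by key.
totals = df.groupby("key")["value"].sum()
print(totals)
```

The explicit `pieces` dictionary is only there to make the split step visible; in practice the last line does all three steps at once.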