Productive Data Tools for Quants
Wes McKinney (@wesmckinn)
Python in Finance 2013, 2013-04-05
Me
• Started pandas project at AQR in 2008
• Other Python projects I’ve been involved with: statsmodels, vbench, gpustats
• http://blog.wesmckinney.com
• Currently: Founder of stealth SF data startup
Book
• In print now!
• IPython
• NumPy
• pandas
• matplotlib
• Case studies
Finance languages
pandas
• Productivity-focused structured data manipulation tools for Python
• Fast, intuitive data structures
• Filling the gap between Python and more domain-specific languages like R
• Huge growth in 2011-2012, continuing in 2013
Productivity, why do we care?
People time = money
Productive is not the same as high-performance
Tool bottlenecks impede innovation
Aside: vbench for performance testing
(Some) financial data challenges
• Metadata and data alignment
• “Missing” data
• Group Operations
• Time series
Data alignment
• Stock universes
• Timestamps
Let’s talk about...
a - b   (a: Signal 1, b: Signal 2)
sum(a - b) / mean(c)
For a - b:
• Same length?
• Same metadata?
• Same frequency?
Data alignment
Assumptions can be dangerous
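A minimal sketch of why the questions above matter. It assumes two hypothetical daily signals where b has no value for one date (the dates and values are illustrative, not from the talk). Positional arithmetic on raw NumPy arrays fails here, and if the lengths had happened to match it would silently pair the wrong dates. pandas instead matches values by timestamp.

```python
import numpy as np
import pandas as pd

# Hypothetical signals: b has no observation for 2013-04-02
a = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(["2013-04-01", "2013-04-02", "2013-04-03"]))
b = pd.Series([0.5, 1.5],
              index=pd.to_datetime(["2013-04-01", "2013-04-03"]))

# Positional arithmetic: shapes (3,) and (2,) cannot broadcast, so NumPy raises
try:
    np.asarray(a) - np.asarray(b)
    positional_failed = False
except ValueError:
    positional_failed = True

# Label-based arithmetic: values are matched by timestamp, and a date
# present in only one input yields NaN instead of a wrong number
diff = a - b
print(diff)
```

The missing 2013-04-02 value shows up as NaN, which is visible and can be handled explicitly, rather than as a silently misaligned difference.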
Data alignment
• pandas uses axis indexing to specify default join (“automatic data alignment”) behavior

  B  1       A  0       A  NA
  C  2   +   B  1   =   B  2
  D  3       C  2       C  4
  E  4       D  3       D  6
                        E  NA
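The addition pictured on this slide can be reproduced with two Series. This is a small sketch; the variable names are illustrative. It also shows two ways to override the default outer join: `fill_value` and `Series.align` with `join="inner"`.

```python
import pandas as pd

# The two Series pictured on the slide
left = pd.Series([1, 2, 3, 4], index=["B", "C", "D", "E"])
right = pd.Series([0, 1, 2, 3], index=["A", "B", "C", "D"])

# Default: the result uses the union of labels (an outer join);
# labels found on only one side become NaN
total = left + right
print(total)  # A NaN, B 2.0, C 4.0, D 6.0, E NaN

# Treat a label missing on either side as 0 instead
filled = left.add(right, fill_value=0)  # A 0.0, B 2.0, C 4.0, D 6.0, E 4.0

# Restrict to shared labels explicitly (an inner join)
l_inner, r_inner = left.align(right, join="inner")
inner_total = l_inner + r_inner  # B 2, C 4, D 6
```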
Hierarchical indexes
• Semantics: a tuple at each tick
• Enables easy group selection
• Terminology: “multiple levels”
• Natural part of GroupBy and reshape operations
  A  1
     2
     3
  B  1
     2
     3
     4
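A sketch of the hierarchical index pictured here, built with `pd.MultiIndex`. The outer keys A/B and inner keys follow the slide. The level names `outer`/`inner` and the values 0 to 6 are hypothetical. The example touches group selection, GroupBy on a level, and reshaping.

```python
import pandas as pd

# Two-level index: A has inner keys 1-3, B has inner keys 1-4
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 3), ("B", 4)],
    names=["outer", "inner"],  # hypothetical level names
)
s = pd.Series(range(7), index=index)  # hypothetical values 0..6

# Group selection: indexing the outer level returns that group's sub-Series
group_a = s.loc["A"]  # inner 1, 2, 3 -> values 0, 1, 2

# GroupBy on an index level
level_sums = s.groupby(level="outer").sum()  # A 3, B 18

# Reshape: pivot the inner level into columns; (A, 4) does not exist, so it is NaN
wide = s.unstack(level="inner")
print(wide)
```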
Missing data
• Interpolation (esp. time series)
• Dropping / filtering
• Replacing with value
• Excluding from statistical computations
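Each of the four strategies above corresponds to a one-line pandas call. This sketch applies them to a hypothetical daily series with two gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two missing values
ts = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0],
               index=pd.date_range("2013-04-01", periods=5, freq="D"))

interpolated = ts.interpolate()      # linear interpolation: 1, 2, 3, 4, 5
dropped = ts.dropna()                # keeps only the 3 observed values
filled = ts.fillna(0.0)              # replace with a value: 1, 0, 3, 0, 5
mean = ts.mean()                     # NaN excluded by default: (1 + 3 + 5) / 3 = 3.0
strict_mean = ts.mean(skipna=False)  # NaN if any input is missing
```

For irregularly spaced timestamps, `ts.interpolate(method="time")` weights by elapsed time rather than by position. On this evenly spaced series it gives the same result.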
Time series
• Data alignment
• Frequency conversions
• Date arithmetic
• Resampling
• Time zones
• “As of” joins and lookups
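A compact sketch of several items from this list in modern pandas: resampling to a lower frequency, date arithmetic with a business-day offset, time zone conversion, and an "as of" join via `pd.merge_asof`. All prices, quotes, and trades are hypothetical. The 2013-04 dates are chosen only so the expected results are easy to check by hand.

```python
import pandas as pd

# Hypothetical daily series: two weeks starting Monday 2013-04-01, values 1..14
idx = pd.date_range("2013-04-01", periods=14, freq="D")
px = pd.Series(range(1, 15), index=idx)

# Resampling: weekly bins ending Sunday ("W" means "W-SUN"), last value in each bin
weekly = px.resample("W").last()  # 2013-04-07 -> 7, 2013-04-14 -> 14

# Date arithmetic: five business days after Monday 2013-04-01
five_bdays_later = idx[0] + pd.offsets.BDay(5)  # 2013-04-08

# Time zones: attach New York local time, then convert to UTC
# (2013-04-01 00:00 EDT is 2013-04-01 04:00 UTC)
utc = px.tz_localize("America/New_York").tz_convert("UTC")

# "As of" join: pair each trade with the latest quote at or before its time
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2013-04-05 09:30:00", "2013-04-05 09:30:02",
                            "2013-04-05 09:30:05"]),
    "bid": [100.0, 100.5, 101.0],
})
trades = pd.DataFrame({
    "time": pd.to_datetime(["2013-04-05 09:30:01", "2013-04-05 09:30:05"]),
    "qty": [100, 200],
})
asof = pd.merge_asof(trades, quotes, on="time")  # bids 100.0 and 101.0
print(asof)
```

`merge_asof` requires both frames to be sorted on the key. Its default `direction="backward"` matches on the most recent quote, including an exact timestamp match.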
GroupBy
Split-apply-combine, summing values by Key:

Input (Key and value):
  A  0
  B  5
  C  10
  A  5
  B  10
  C  15
  A  10
  B  15
  C  20

Split by Key:
  A: 0, 5, 10
  B: 5, 10, 15
  C: 10, 15, 20

Apply (sum):
  A: 15
  B: 30
  C: 45

Combine:
  A  15
  B  30
  C  45
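The split-apply-combine diagram above maps directly onto `DataFrame.groupby`. This is a minimal sketch using the diagram's numbers. The column names `key` and `data` are hypothetical labels chosen for the example.

```python
import pandas as pd

# Key/value table from the diagram (column names are hypothetical)
df = pd.DataFrame({
    "key": list("ABCABCABC"),
    "data": [0, 5, 10, 5, 10, 15, 10, 15, 20],
})

# Split: iterating a GroupBy yields (key, sub-DataFrame) pairs
pieces = {k: g["data"].tolist() for k, g in df.groupby("key")}
# {'A': [0, 5, 10], 'B': [5, 10, 15], 'C': [10, 15, 20]}

# Apply + combine in one step: sum each group; the result is indexed by key
totals = df.groupby("key")["data"].sum()  # A 15, B 30, C 45
print(totals)
```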