Data Wrangling Kung Fu With pandas (PyData SV 2013)


Data Wrangling Kung Fu with pandas

Wes McKinney (@wesmckinn)

PyData Conference 2013, 2013-3-20

Saturday, April 13,

Agile Tools for Real World Data

Wes McKinney

Python for Data Analysis


Me

• Started pandas in 2008

• Other Python projects I’ve been involved with: statsmodels, vbench, gpustats

• http://blog.wesmckinney.com

• New project in 2013...stay tuned


Observations

• Data often in wrong format for analysis

• Storage format is frequently not the analysis format

• Data preparation bottleneck in many workflows
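To make the storage-versus-analysis point concrete, here is a minimal sketch, not taken from the talk: the `records` table, its column names, and its values are all hypothetical. It assumes data stored in a "long" layout (one row per reading) that an analysis would rather see in a "wide" layout (one column per sensor).

```python
import pandas as pd

# Hypothetical "storage" layout: one row per (date, sensor) reading.
records = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-01", "2013-01-02", "2013-01-02"],
    "sensor": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# "Analysis" layout: one row per date, one column per sensor.
wide = records.pivot(index="date", columns="sensor", values="value")

print(wide)
```

`pivot` requires each (date, sensor) pair to appear only once; with duplicate pairs, `pivot_table` with an aggregation function is one alternative.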


pandas

• Productivity-focused structured data manipulation tools for Python

• Fast, intuitive data structures

• Filling the gap between Python and more domain-specific languages like R

• Huge growth in 2011-2012, continuing in 2013


Agenda

• Data reshaping

• Hierarchical indexing

• GroupBy

• Miscellaneous munging
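As a rough preview of how the first three agenda items fit together, the following sketch uses a small made-up sales table (the `df` frame, its columns, and its numbers are illustrative assumptions, not material from the talk). A GroupBy on two keys yields a hierarchically indexed result, which can then be sliced by its outer level or reshaped into a table.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "city": ["SF", "SF", "NY", "NY", "SF", "NY"],
    "year": [2011, 2012, 2011, 2012, 2012, 2011],
    "sales": [10, 20, 30, 40, 5, 15],
})

# GroupBy: total sales per (city, year). The result is a Series whose
# index is hierarchical (a MultiIndex with levels "city" and "year").
totals = df.groupby(["city", "year"])["sales"].sum()

# Hierarchical indexing: selecting an outer-level key returns the
# inner level ("year") for that city.
sf = totals.loc["SF"]

# Reshaping: move the inner index level into the columns, giving one
# row per city and one column per year.
table = totals.unstack("year")

print(totals)
print(sf)
print(table)
```

Grouping on two keys is what produces the hierarchical index here; `unstack` is one of several reshaping tools in pandas, and `stack` reverses it.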
