by Wes McKinney
Python and Data: What’s next?
Wes McKinney (@wesmckinn)
PyCon Singapore 2013-06-14
Me
2013: Analytics Startup in SF
Book
• Python essentials
• NumPy
• IPython
• matplotlib
• pandas
Published October 2012
Some context
• 2007 to 2013
• NumPy, SciPy mature
• IPython Notebook
• Key libraries/tools developed: scikit-learn, statsmodels, PyCUDA, ...
• pandas helps make Python a desirable data preparation language
pandas
• Fast structured data manipulation tools for Python with nice API
• Goal: make Python a halfway decent language for data preparation / statistical analysis
• Sometimes described as “R data frames in Python”
• Fast-growing user base / community
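To make the “R data frames in Python” comparison concrete, here is a minimal sketch of the kind of structured data manipulation pandas is aimed at. The table contents and variable names are invented for illustration and do not come from the talk.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
trades = pd.DataFrame(
    {
        "symbol": ["AAPL", "GOOG", "AAPL", "MSFT"],
        "price": [100.0, 800.0, 102.0, 35.0],
        "shares": [10, 2, 5, 20],
    }
)

# Column-wise arithmetic is vectorized, much like an R data frame.
trades["value"] = trades["price"] * trades["shares"]

# Boolean filtering selects rows without an explicit loop.
large = trades[trades["value"] > 1000]

# Arithmetic between labeled objects aligns on the index, not on position.
a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
aligned_sum = a + b  # labels without a partner ("x", "z") become NaN
```

Only the label "y" appears in both series, so it is the only entry of `aligned_sum` with a numeric result.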
Aside: vbench
Tool inefficiency impedes innovation
What can [image] tell us about Python?
Some Trends
• Decline of Desktop, Rise of Web/Cloud
• SVG / HTML5 Canvas / WebGL Tech
• Big Data
• JIT-compile all the things
• Democratize all the things
Challenge: Keeping Python relevant
Data on the Web
• Nirvana: ubiquitous, easy data analysis
• Challenges
  • JavaScript: weak language for implementing analytics
  • Computation needs to run “close” to data
  • Maintaining interactivity
Golden age for web visualization
SVG
Embracing the JavaScript
• Build bridges, not walls
• Some examples
  • IPython Notebook
  • RStudio
  • Rob Story’s pandas integrations
  • Chartkick
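One simple form of bridge between Python and JavaScript is to do the data preparation in pandas and hand the result to a browser-side charting library as JSON. The sketch below assumes a hypothetical monthly sales table. It is not taken from any of the tools listed above.

```python
import json

import pandas as pd

# Hypothetical data prepared on the Python side.
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "sales": [120, 135, 90]})

# orient="records" yields a JSON array of row objects, a shape that
# JavaScript charting code can consume directly.
payload = df.to_json(orient="records")

# Round-trip through the json module to show what the browser would receive.
records = json.loads(payload)
```

Here `payload` is a plain string, so it can be embedded in a page or returned from a web endpoint unchanged.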
In search of the perfect “data language”
• Minimal syntax overhead
• Domain-specific data types that all support missing (NA) values
• Rich built-in prep-related operations
  • E.g. set logic, group by, sorting, binning, indexing
• Integrate within a larger application
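As a rough illustration of the operations named above, the following sketch uses pandas as it exists today. The data and names are invented, and the example shows what such built-in operations can look like rather than the “perfect” language the slide has in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["SF", "NYC", "SF", "NYC", "SG"],
        "age": [25, 41, np.nan, 33, 58],
    },
    index=["a", "b", "c", "d", "e"],
)

# Missing (NA) values: reductions skip them by default.
mean_age = df["age"].mean()

# Set logic on row labels.
common = df.index.intersection(pd.Index(["b", "c", "z"]))

# Group by: split on a key, then aggregate each group.
avg_by_city = df.groupby("city")["age"].mean()

# Sorting: rows with a missing key are placed last by default.
by_age = df.sort_values("age", ascending=False)

# Binning: map a continuous column onto labeled intervals.
df["age_band"] = pd.cut(
    df["age"], bins=[0, 30, 50, 100], labels=["young", "mid", "senior"]
)

# Indexing: label-based row lookup.
row_d = df.loc["d"]
```

The row with the missing age stays in the table throughout: it is skipped by the means, sorted to the end, and given a missing `age_band`.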
JIT compiler tech
• LLVM: growing in popularity
• Rolling a new, fast compute engine much easier than it used to be
• But: not sure compiling Python code is the optimal long-term strategy
Big Data SQL
Some thoughts
• Web-friendliness: essential for survival
• You can never be too productive
• The data’s not getting any smaller
Thanks!