© 2016 Continuum Analytics - Confidential & Proprietary
Python Notebooks for Collaborative Data Science
Peter Wang
CTO, Co-Founder
Anaconda Open Data Science Platform
Open Data Science Platform
– 730+ Popular Python & R packages
– Compiled for Windows, Mac, and Linux
– Extensible via Conda Package Manager
– Easily sandbox and deploy packages & analytical computing environments
– Free and Open Source Core
– Foundation of our Enterprise Platform
Accelerate, Connect & Empower
Anaconda…is Trusted by Industry Leaders
• Financial Services: Risk management, quant modeling, data exploration and processing, algorithmic trading, compliance reporting
• Government: Fraud detection, data crawling, web & cyber data analytics, statistical modeling
• Healthcare & Life Sciences: Genomics data processing, cancer research, natural language processing for health data science
• High Tech: Customer behavior, recommendations, ad bidding, retargeting, social media analytics
• Retail & CPG: Engineering simulation, supply chain modeling, scientific analysis
• Oil & Gas: Pipeline monitoring, noise logging, seismic data processing, geophysics
Conda: Package and Environment Management
conda (Windows, Mac OSX, Linux)
– Install packages
– Update packages
– Create sandboxes: Conda environments
– Conda environments: Critical for reproducibility, collaboration & scale
Figure: three example conda environments. Env 1: Python 2.7. Env 2: Python 3.4, Pandas v0.18, Jupyter. Env 3: R, R Essentials. Additional version labels in the figure: NumPy v1.11, NumPy v1.10, Pandas v0.16.
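As a rough illustration of the sandboxing idea on this slide, the sketch below builds the `conda create` command lines for three environments like those in the figure. The environment names (`env1`, `env2`, `env3`) are hypothetical, the version pins are taken from the slide labels, and the sketch only prints the commands rather than running them. Depending on channel configuration, the R packages may need an extra channel.

```python
# Environment names are hypothetical; package pins follow the slide's labels.
ENVIRONMENTS = {
    "env1": ["python=2.7"],
    "env2": ["python=3.4", "pandas=0.18", "jupyter"],
    "env3": ["r-essentials"],  # the R Essentials bundle, which pulls in R itself
}


def conda_create_command(name, specs):
    """Return the argument list for creating one isolated conda environment."""
    return ["conda", "create", "--yes", "--name", name] + list(specs)


# Print one command per environment; pass a list to subprocess.run to execute it.
for env_name, env_specs in ENVIRONMENTS.items():
    print(" ".join(conda_create_command(env_name, env_specs)))
```

Because each environment carries its own interpreter and package versions, two projects can pin different Pandas or NumPy releases on the same machine without conflict.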
Anaconda Enterprise Components
OPEN DATA SCIENCE
Anaconda
• High performance Python & R
• 720+ data science packages
• Cross-platform package, dependency & environments
• Community driven package repository collaboration
Anaconda Navigator
• Desktop Portal & Installer
DATA SCIENCE GOVERNANCE
Anaconda Repository
• Storage & sharing of packages, environments, notebooks
• On-premise governance
• Enterprise authentication
Anaconda
• Deep Learning: Theano, Tensorflow, Caffe, Keras, Neon, Lasagne
• Natural Language Processing: NLTK, spaCy
• Machine Learning: Scikit-learn
• GPU enablement
DATA SCIENCE COLLABORATION
Anaconda Enterprise Notebooks
• Collaborative project based workflows for Python & R
• Enterprise authentication & permissioning
• Notebook sharing, versioning, search, differencing
Anaconda
• Interactive browser based dashboards & visualizations with Bokeh
• Bokeh apps using Python, R, Scala
DATA SCIENCE FOR BIG DATA
Anaconda Scale
• Hadoop & Spark integration
• Scalable distributed processing framework
• Integration with resource management & data stores
• Distributed package, dependency & environments
Anaconda Fusion
• Integration of Open Data Science with Microsoft Excel®
• Big Data querying & transformations
Anaconda Enterprise
Open Source Without Anxiety: Governance and Scalability
On-premises package repository
– Governance for your analytics environment
– Empower your data scientists within the structure of enterprise IT
Enterprise notebook collaboration
– Easily replicate and share analysts’ environments
– Centrally store proprietary libraries and manage versioning
Scalable analytics computations
– Scale up: leverage GPU and parallel-optimized libraries
– Scale out: easily manage Anaconda across your Hadoop/Spark cluster
– Scale up and out with Python and R
Enterprise data science deployment
– Encapsulate and deploy data science projects
– Deploy live notebooks, dashboards, interactive applications, and models with REST APIs
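To make "models with REST APIs" concrete, here is a minimal sketch using only the Python standard library. It is not how Anaconda Enterprise deploys projects; the `/predict` endpoint, the fixed linear "model", and all names (`WEIGHTS`, `predict`, `PredictHandler`, `make_server`) are assumptions chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = {"x1": 0.5, "x2": -0.25}
BIAS = 1.0


def predict(features):
    """Score one record; features missing from the input count as 0."""
    return BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())


class PredictHandler(BaseHTTPRequestHandler):
    """Accepts POST /predict with a JSON object and returns {"score": ...}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps({"score": predict(features)}).encode("utf-8")
        except (ValueError, TypeError, AttributeError):
            self.send_error(400, "body must be a JSON object of numeric features")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def make_server(port=0):
    """Build (but do not start) a local server; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server(8000).serve_forever()` would start the service locally; a client could then POST `{"x1": 2, "x2": 4}` to `/predict` and receive a JSON score in return.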
Continuum Sponsored Open-Source Projects
• Bokeh – Interactive Web Visualizations and Applications
• Dask – Painless distributed and parallel computations in Python
• Numba – JIT for Python applications
• Jupyter, Spyder – Notebooks and IDE for data science
• Pandas, Datashader, Blaze, …
Anaconda, Jupyter, and Notebooks
• In 2008, we helped kick off some of the initial efforts on IPython, which led to the separation of the kernel from the front-end. (Previously it was just a "nicer" command-line REPL.)
• We also helped fund the initial Notebook interface (based on Qt, not HTML) in 2008/2009
• Web Notebooks really started taking off around 2011
• Our Notebook cloud service launched in 2012
• Anaconda Enterprise Notebooks launched in 2014
Data Science Notebooks
Jupyter Notebook
• Interleave code, text, graphics
• Multiple languages: Python, R, SQL, SAS, Spark, Julia, etc.
• Runs in the browser
• Open Source
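The "interleave code, text, graphics" point follows from how a notebook is stored: a JSON document holding an ordered list of cells. The sketch below builds a tiny notebook by hand in the version 4 notebook format, using only the standard library. The helper names (`markdown_cell`, `code_cell`, `make_notebook`) and the cell contents are made up for illustration; in practice the `nbformat` package would normally handle this.

```python
import json


def markdown_cell(text):
    """A text cell: rendered as formatted prose in the browser."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A code cell: outputs (including images) are filled in when it runs."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_notebook(cells):
    """Wrap cells in the top-level structure of a version 4 notebook."""
    return {"cells": list(cells), "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


notebook = make_notebook([
    markdown_cell("# Sales overview\nText and code alternate in one document."),
    code_cell("total = sum([120, 340, 95])\ntotal"),
])

# Serialized form; saving this text as a .ipynb file yields an openable notebook.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Graphics are not separate files: when a code cell produces a plot, the image is stored in that cell's `outputs` list, which is why a single notebook file can be shared with its results intact.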
Notebook Demos
Anaconda Enterprise Notebook Computing
Figure: architecture diagram. Components: Web Interface (Anaconda Enterprise Notebook Server); Computation (Gateway & Project Nodes, running IPython kernels); Package Control (Internal Anaconda Repository); Authentication (Active Directory/LDAP, optional); User 1, User 2, User 3.
Workflow:
– An analyst logs into the Enterprise notebook server, authenticating against LDAP/AD
– Based on the project they select, they are redirected to the appropriate project node
– All notebooks and Python code run on project nodes; any needed packages are pulled down from your local repository
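The three workflow steps can be sketched as plain Python. This is a toy model of the flow, not Anaconda Enterprise code: the in-memory `DIRECTORY` stands in for LDAP/AD, and every name, project, node, and package set below is hypothetical.

```python
# All names below are hypothetical placeholders, not Anaconda Enterprise APIs.
DIRECTORY = {"analyst1": "correct-password"}        # stand-in for LDAP/AD
PROJECT_NODES = {                                   # project -> project node
    "fraud-detection": "project-node-a",
    "risk-model": "project-node-b",
}
LOCAL_REPOSITORY = {"numpy", "pandas", "bokeh"}     # packages hosted on-premises


def authenticate(user, password):
    """Step 1: check credentials against the directory stand-in."""
    return user in DIRECTORY and DIRECTORY[user] == password


def route_to_node(user, password, project):
    """Steps 1-2: authenticate, then pick the node that hosts the project."""
    if not authenticate(user, password):
        raise PermissionError("authentication failed for %r" % user)
    if project not in PROJECT_NODES:
        raise KeyError("no project node registered for %r" % project)
    return PROJECT_NODES[project]


def resolve_packages(requested):
    """Step 3: serve packages only from the local repository."""
    missing = sorted(set(requested) - LOCAL_REPOSITORY)
    if missing:
        raise LookupError("not in local repository: " + ", ".join(missing))
    return sorted(requested)


node = route_to_node("analyst1", "correct-password", "risk-model")
packages = resolve_packages(["pandas", "numpy"])
print(node, packages)
```

Keeping package resolution behind the local repository is what gives IT a single control point: a request for a package that has not been mirrored fails instead of reaching out to the public internet.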
Deploying Data Science Projects - Notebooks
Deploying Data Science Projects - Dashboards
JupyterLab
• Notebooks, plots, data tables, code editors
• Dashboard authoring
• Interactive distributed computing
• Collaboration between Continuum, Bloomberg, and others
Anaconda Fusion
Anaconda Fusion brings Open Data Science to Microsoft Excel
• BRING interactive visualizations, machine learning and ETL to Excel
• BRIDGE Excel Data to Python & R through notebooks
• ACCESS all the power of Python and Big Data, natively embedded inside Excel
Empowering Business Analysts & Data-driven Employees
• Anaconda Fusion is a Microsoft Excel® Add-in that enables a unique and simple link between Excel and Python without writing code
• Anaconda Fusion is targeted to Business Analysts who want “No Code” Data Science
Analysts and Data Scientists can keep using their preferred tools
“No Code” Data Science – Data Loading Example
1. Select Anaconda Fusion Notebook and click “Upload”
2. Select function you wish to run
3. Click “Run”
4. Data is loaded into spreadsheet
Just change one line of code in your notebook
Anaconda Fusion Use Cases
• Extract data – pull data directly into Excel to perform analysis
• Machine Learning – use trained models created by Data Scientists and plug them into your spreadsheet data
• Interactive Visualizations – create custom advanced interactive graphs, charts and plots from Excel data
• Big Data – analyze, transform, model and query data stored in Hadoop and Spark
Figure: Anaconda Fusion on Mac