© 2016 Continuum Analytics - Proprietary WHY OPEN DATA SCIENCE MATTERS Why open source is eating the world Travis Oliphant, CEO Michele Chambers, CMO & VP Products Continuum Analytics Gartner BI & Analytics Summit 2016

Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Page 1: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

© 2016 Continuum Analytics - Proprietary

WHY OPEN DATA SCIENCE MATTERS Why open source is eating the world

Travis Oliphant, CEO Michele Chambers, CMO & VP Products Continuum Analytics Gartner BI & Analytics Summit 2016

Page 2: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Agenda

© 2016 Continuum Analytics- Confidential & Proprietary

•  About Us
•  What is Modern Analytics and how is it different?
•  Why does Open Data Science matter?
•  Q&A

Page 3: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

About Us

Travis Oliphant (@teoliphant)
•  CEO & co-founder, Continuum Analytics
•  Ph.D. in Biomedical Engineering, Mayo Clinic
•  B.S., M.S. in Mathematics & Electrical Eng., BYU
•  Open Source contributor and leader since 1997
•  Creator of NumPy and SciPy
•  Started Numba
•  Author, Guide to NumPy

Page 4: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

About Us

Michele Chambers (@mcAnalytics)
•  CMO & VP Product, Continuum Analytics
•  M.B.A., Duke University; B.S., Computer Engineering
•  Author:
   Big Data Big Analytics (Wiley)
   Modern Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
   Advanced Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)

Page 5: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

WHAT’S THE PROBLEM?

Page 6: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Why are major corporations moving to Modern Analytics & Open Data Science?

Large Investment Banks, Major Upstream Oil & Gas, Global CPG Manufacturers

•  How can I create and deploy timely risk models?
•  How can I possibly identify the root causes of my complex problem and remediate early enough to create revenue assurance?
•  How can I take advantage of all this new sensor information now?

Page 7: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Industry Leaders Trusting Open Data Science

Page 8: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

The Past vs. Present

Proprietary Software (Decreasing Use)
•  Vendor lock-in
•  High costs
•  Lack of integration
•  Inability to easily deploy
•  Skills gap

Open Source Software (Accelerating Adoption)
•  Avoids vendor lock-in
•  Reduces cost
•  Open APIs and connectors
•  Eliminates chasm between build & deploy
•  Accessible to tomorrow’s talent

Page 9: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Evolving Technology

Proprietary Software (Status Quo)
•  Limited Data Sources
•  Legacy Compute Engines
•  On-premise

Open Source Software (Next Generation)
•  Big Data
•  Modern Analytics
•  Distributed Computing
•  High Performance Computing
•  Cloud
•  Streaming

Page 10: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

How’s Modern Analytics Different from Traditional Analytics?

[Diagram contrasting Traditional Analytics with Modern Analytics. Labels: Business Intelligence; SQL Analytics; Descriptive Statistics; Data Mining; Predictive Analytics; Prescriptive Analytics; Machine Learning; Simulation Optimization]

Page 11: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

How are Modern Architectures Different from Traditional Architectures?

Distributed Modern Architecture

Cloud Computing

Parallel Computing

Parallel & Distributed Computing

Stream Computing

Monolithic Traditional Architecture

Centralized Computing

Network Computing

High Performance Computing

Page 12: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Evolving Roles

Proprietary Software (Status Quo)
•  Statistician
•  Programmer

Open Source Software (Next Generation)
•  Data Science Teams
•  Business Analyst
•  Data Scientist
•  Developer
•  Data Engineer
•  DevOps

Page 13: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

How are Modern Roles Different from Traditional Roles?

Modern Roles: Team | Collaborative

Traditional Roles: Individual | Silo

Page 14: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Modern Data Science Teams use…

Data Scientist
•  Hadoop / Spark
•  Programming Languages
•  Analytic Libraries
•  IDE
•  Notebooks
•  Visualization

Biz Analyst
•  Spreadsheets
•  Visualization
•  Notebooks
•  Analytic Development Environment

Data Engineer
•  Database / Data Warehouse
•  ETL

Developer
•  Programming Languages
•  Analytic Libraries
•  IDE
•  Notebooks
•  Visualization

DevOps
•  Database / Data Warehouse
•  Middleware
•  Programming Languages

RIGHT TECHNOLOGY FOR THE PROBLEM

Page 15: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Modern Data Science Teams Want

Collaboration
•  Iterate on analysis
•  Share discoveries with team
•  Interact with teams across the globe

Interactivity
•  Interact with data
•  Build high performance models
•  Visualize results in context

Integration
•  Work with open source and legacy data systems
•  Leverage data science languages: Python, R, Matlab, SAS, SPSS, Excel, Java, C/C++, C#, .NET, FORTRAN and more

Predict, Share, Deploy with Open Data Science

Page 16: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

WHAT’S OPEN DATA SCIENCE?

Page 17: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Data Science is …

“An interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms”

Wikipedia

Page 18: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Data Science is not just Machine Learning…

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 19: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Data Science is Interdisciplinary…

Hadoop, Spark

GPUs, multi-cores

Classification, deep learning

Regression, PCA

Web crawling, scraping, 3rd party data & API providers, predictive services & APIs

Data warehouse, querying, reporting

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 20: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Open Data Science is …

an inclusive movement that makes the open source tools of data science (data, analytics, & computation) easily work together as a connected ecosystem

Page 21: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Open Data Science Means Open…

•  Availability
•  Innovation
•  Interoperability
•  Transparency

For everyone in the data science team

OPEN DATA SCIENCE IS THE FOUNDATION TO MODERNIZATION

Page 22: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

In other words, if you want to…

•  Control the Chaos
•  Empower the User
•  Lead with Analytics
•  Evangelize the New
•  Modernize the Core

YOU’VE GOT TO USE OPEN DATA SCIENCE

Page 23: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Open Data Science: Vibrant and Growing Community

•  Python Community: 30M+
•  Anaconda Downloads: 3M+
•  Packages in Anaconda: 720+
•  R Community: 16M+
•  Spark Python Usage: 50%+

Page 24: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Open Data Science Community

A gathering of Python enthusiasts for sharing ideas and learning from each other for ever-evolving challenges

Promotes and supports ongoing R&D of open source computing tools through educational, community and public channels

Support for the Apache Community of open source software projects which are for the public good

Supporting the R community, the R Foundation, and others developing and distributing R

Page 25: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Open Source Communities Create Powerful Technology for Data Science

Numba

dask

xlwings

Airflow

Blaze

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 26: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Python is the Common Language

Numba  

dask  

xlwings  

Airflow  

Blaze  

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 27: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Python Trusted by Industry Leaders

Page 28: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Python is Everywhere

“Everyone at JPMorgan now needs to know Python and there are around 5,000 developers using it at Bank of America. There are close to 10 million lines of Python code in Quartz and we got close to 3,000 commits a day. It’s a good scripting language and easily integrated into both the front and back ends, which was one of the reasons we chose it in the first place.”

Kirat Singh, Former Global Head of Risk Systems, Bank of America Merrill Lynch

Page 29: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Why Companies are Migrating to ODS…

Large Investment Bank
Problem
•  Hard to find people to create proprietary risk assessment models
•  Takes months or years to deploy
Solution
•  Moved to ODS and leveraged innovation in ODS
Results
•  Create and deploy risk models in days, not years, with data scientists who are easier to find and hire

Major Upstream Oil & Gas
Problem
•  Unable to ingest Big Data from sensors to proactively monitor oil well holes
Solution
•  Moved to ODS and leveraged diversity of ODS analytics to create novel visualizations and predictive models using sensor data
Results
•  Gained insights into oil hole issues in weeks, not years, to detect issues earlier and increase profitability

Global CPG Manufacturer
Problem
•  Complex model and simulation required with disparate internal and external data
Solution
•  Moved to ODS to easily integrate multiple data feeds and leverage OS innovation
Results
•  Created full lifecycle predictive model and simulation for revenue assurance

Page 30: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Python’s Not the Only One…

SQL

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 31: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

But it’s also a Great Glue Language

SQL

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC
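
As one hedged illustration of the "glue language" point on this slide, the sketch below uses only Python's standard library to pass data between a SQL engine (`sqlite3`) and a statistics routine (`statistics.mean`). The table name `readings`, the function `mean_reading_per_well`, and the sample values are hypothetical and do not come from the presentation.

```python
import sqlite3
import statistics


def mean_reading_per_well(rows):
    """Load (well, value) rows into an in-memory SQL table and average them per well.

    Hypothetical helper for illustration only: SQL handles storage and
    selection, while Python's statistics module handles the math.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (well TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

    result = {}
    # SQL side: find the distinct wells in a stable order.
    wells = [r[0] for r in conn.execute(
        "SELECT DISTINCT well FROM readings ORDER BY well")]
    for well in wells:
        # SQL side: pull the values for one well.
        values = [r[0] for r in conn.execute(
            "SELECT value FROM readings WHERE well = ?", (well,))]
        # Python side: compute the summary statistic.
        result[well] = statistics.mean(values)

    conn.close()
    return result


sample = [("A", 1.0), ("A", 3.0), ("B", 10.0)]
print(mean_reading_per_well(sample))
```

The same pattern (query in one system, compute in another, with Python in between) is what the slide's surrounding categories suggest; a production pipeline would likely push the aggregation into SQL or a dataframe library instead.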

Page 32: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Anaconda is the Open Data Science Platform Bringing Technology Together…

Numba

dask

Airflow

SQL

xlwings

Blaze

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 33: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

But Most Importantly Empowering Everyone on the Data Science Team

Data Scientist Biz Analyst Data Engineer Developer DevOps

Deploy & Operate

Explore & Analyze

Collaborate & Publish

Page 34: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Anaconda is…

•  the leading modern open source analytics platform powered by Python, the fastest growing open data science language
•  the innovative open data science platform to exploit data, analytics, and computation

•  Accelerates Time-to-Value
•  Empowers the Data Science team
•  Connects Open Source Communities

Page 35: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Introducing Anaconda: The Modern Open Source Analytics Platform Powered by Python

Enterprise Ready Platform
•  Simplify administration
•  Use modern data science
•  Collaborate with entire team
•  Leverage modern architectures
•  Integrate data sources
•  Accelerate performance

Security

Governance

Provenance

R Scala

Python

R | Scala

JS

C | C++ Fortran

APPLICATIONS

DATA

HARDWARE

ANALYTICS

Model Building

Data Exploration Software Development

HIGH PERFORMANCE

Business Analyst

Data Scientist

Developer

Data Engineer

DevOps

Data Science Team

Cloud On-premises

LANGUAGES

OPERATIONS

Page 36: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

ANACONDA GIVES SUPERPOWERS TO PEOPLE WHO CHANGE THE WORLD

Page 37: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Modern Data Science Teams Love Anaconda

Data Scientist
•  Hadoop / Spark
•  Programming Languages
•  Analytic Libraries
•  Notebooks
•  Visualization
•  IDE

Biz Analyst
•  Spreadsheets
•  Visualization
•  Notebooks
•  Analytic Development Environment

Data Engineer
•  Database / Data Warehouse
•  ETL

Developer
•  Programming Languages
•  Analytic Libraries
•  IDE
•  Notebooks
•  Visualization

DevOps
•  Database / Data Warehouse
•  Middleware
•  Programming Languages

Page 38: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Anaconda Trusted by Industry Leaders

•  Financial Services: Risk Mgmt, Quant modeling, Data exploration and processing, algorithmic trading, compliance reporting
•  Government: Fraud detection, data crawling, web & cyber data analytics, statistical modeling
•  Healthcare & Life Sciences: Genomics data processing, cancer research, natural language processing for health data science
•  High Tech: Customer behavior, recommendations, ad bidding, retargeting, social media analytics
•  Retail & CPG: Engineering simulation, supply chain modeling, scientific analysis
•  Oil & Gas: Pipeline monitoring, noise logging, seismic data processing, geophysics

Page 39: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

Questions?

… and so that is why Open Data Science is eating the world.

Page 40: Why Open Data Science Matters | Gartner BI & Analytics Summit '16

CONTINUUM ANALYTICS
We Empower Data Science Teams to Change the World

221 W. 6th Street, Suite #1550, Austin, TX 78701
+1 512.222.5440
[email protected]  @ContinuumIO

Stop by booth 421 to get a signed book