Data Science Apps: Beyond Notebooks Natalino Busa

Data science apps powered by Jupyter Notebooks



2 Natalino Busa - @natbusa

LinkedIn + Twitter + GitHub: @natbusa

DBS

Teradata

Cognitive Finance

ING Group

O’Reilly

Philips


Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC 3.0 BY

Learning: The Scientific Method

Ørsted's "First Introduction to General Physics" (1811)

https://en.m.wikipedia.org/wiki/History_of_scientific_method

Hans Christian Ørsted: observation, hypothesis, deduction, synthesis, experiment


Data Scientist Experience


Cloud, Tools, Math, Humans


The Jupyter Project

http://jupyter.org


Jupyter notebook: what is it?

The Jupyter Notebook

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Credit: Jupyter project, extracted from http://jupyter.org/index.html


Jupyter notebook: why?

Language of choice

The Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala.

Share notebooks

Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.

Interactive widgets

Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in realtime.

Big data integration

Leverage big data tools, such as Apache Spark, from Python, R and Scala. Explore that same data with pandas, scikit-learn, ggplot2, dplyr, etc.

Credit: Jupyter project, extracted from http://jupyter.org/index.html
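The rich-output point above relies on IPython's display protocol: when an object defines a `_repr_html_` method, the notebook renders the returned HTML in the cell output. A minimal sketch, assuming a hypothetical `ProgressBar` class that is not part of the talk:

```python
class ProgressBar:
    """Hypothetical example object; a notebook would render it as HTML."""

    def __init__(self, fraction):
        # Clamp to [0, 1] so the rendered width is always valid.
        self.fraction = max(0.0, min(1.0, fraction))

    def _repr_html_(self):
        # IPython looks for this method to obtain an HTML representation.
        percent = round(self.fraction * 100)
        return (
            '<div style="width:200px;border:1px solid #999">'
            f'<div style="width:{percent}%;background:#4a90d9">&nbsp;</div>'
            '</div>'
        )

    def __repr__(self):
        # Plain-text fallback for frontends without HTML support.
        return f"ProgressBar({self.fraction:.0%})"


bar = ProgressBar(0.4)
print(repr(bar))
```

In a notebook, leaving `bar` as the last expression of a cell would show the HTML bar; in a plain terminal only the text fallback appears.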


Text Cell

Code Cell

Cell Input

Cell Output

Edit, Run, Kernel, Widgets menus

Kernel Type

Cell output: ASCII, HTML, image, etc.
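The labels above map onto the notebook file format: an `.ipynb` file is a JSON document whose `cells` list holds markdown (text) cells and code cells, and each code cell stores its input (`source`) next to its `outputs`. A minimal sketch of a version 4 notebook built with the standard library; the cell contents and the `summarize` helper are illustrative assumptions:

```python
import json

# A tiny notebook: one text cell and one code cell with a stored output.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A text cell"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["print(1 + 1)"],  # cell input
            "outputs": [  # cell output, saved alongside the input
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}


def summarize(nb):
    """Hypothetical helper: list each cell's type and number of outputs."""
    return [(c["cell_type"], len(c.get("outputs", []))) for c in nb["cells"]]


# Round-trip through JSON, as the notebook server does when saving a file.
text = json.dumps(notebook, indent=1)
print(summarize(json.loads(text)))
```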


Architecture of a Jupyter Notebook

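[Diagram: the web browser runs the Jupyter Notebook web app and talks to the Jupyter Notebook server over HTTP and WebSockets; the server talks to the kernel over ∅MQ and reads and writes the notebook files.]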

https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html


Architecture of a Jupyter Notebook

• Modular architecture: Web App, Server, Kernel

• Kernels: Python, R, Scala, Bash, SQL

• Web App: asynchronous, rich editing, syntax highlighting, export and share
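The modularity shows in how a kernel is registered: the server only needs a small `kernel.json` kernelspec that tells it how to launch the kernel process. A sketch of such a spec for the IPython kernel, written as a Python dict; the `launch_command` helper and the connection-file path are hypothetical and only illustrate the substitution step:

```python
import json

# Contents of a typical kernel.json for the IPython kernel.
kernel_spec = {
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3",
    "language": "python",
}


def launch_command(spec, connection_file):
    """Hypothetical helper: fill the connection file path into argv."""
    return [arg.replace("{connection_file}", connection_file) for arg in spec["argv"]]


print(json.dumps(kernel_spec, indent=2))
print(launch_command(kernel_spec, "/tmp/kernel-1234.json"))
```

Kernels for other languages follow the same shape: only `argv`, `display_name`, and `language` change, which is why the web app and server stay the same.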


Jupyter Notebook

● Narratives and Use Cases

Narratives are collaborative, shareable, publishable, and reproducible. We believe that Narratives help both yourself and other researchers by sharing your use of Jupyter projects, technical specifics of your deployment, and installation and configuration tips so that others can learn from your experiences.

From https://jupyter.readthedocs.io/en/latest/use-cases/content-user.html


Jupyter is more than Notebooks

“What if I told you that the notebook is NOT the only sort of narrative that you can create with the Jupyter project?”


Examples of Jupyter powered narratives

● O’Reilly Orioles

● Examples - build your own!


Orioles: A powerful educational narrative


Geolocated clustering and prediction services with scikit-learn

Learn how to build a venue recommender and a geofencing alerting engine using geolocated data, ML clustering algorithms, and scikit-learn.
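To give a feel for the geofencing part, here is a minimal sketch of a geofence check based on the haversine great-circle distance. It is not the Oriole's code (which uses scikit-learn clustering); the function names, coordinates, and radius are illustrative assumptions, and only the standard library is used:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inside_geofence(point, center, radius_km):
    """True when `point` lies within `radius_km` of `center`."""
    return haversine_km(*point, *center) <= radius_km


venue = (52.3702, 4.8952)  # hypothetical venue location
user = (52.3731, 4.8922)   # hypothetical user position
print(inside_geofence(user, venue, radius_km=0.5))
```

An alerting engine would run a check like this for each incoming position against the fences of interest, for example fences centered on cluster centroids.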


Build your own narrative!

What do you need?

1. Understand how to communicate with the Jupyter server. Two ways: WebSockets or HTTP API endpoints.

2. Build your own web application. Many ways: e.g. Angular, Polymer, Dart, etc.
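For the WebSocket route, a client exchanges JSON messages that follow the Jupyter messaging protocol. A minimal sketch of building an `execute_request` message; the `make_execute_request` helper, the user name, and the protocol version value are illustrative assumptions, and actually sending the message requires a WebSocket client connected to the kernel's channels endpoint:

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id):
    """Hypothetical helper: build an execute_request message as a dict."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,      # unique per message
            "username": "demo",              # assumed user name
            "session": session_id,           # unique per client session
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.0",                # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
        },
        # Over WebSockets the target channel travels inside the message.
        "channel": "shell",
    }


msg = make_execute_request("print('hello')", session_id=uuid.uuid4().hex)
print(json.dumps(msg, indent=2))
```

The kernel's replies arrive as messages with the same layout, whose `parent_header` echoes the request header so a web app can match results to requests.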


Demos: kernel gateway

Purpose:

- Understand how to expose API endpoints

- Build your own narrative!

- Productivity gain: faster app prototyping


Jupyter Gateway: expose API endpoints

Declare the endpoint

Declare MIME type, headers, status

GET http://localhost:8800/counters/my_counter
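In the kernel gateway's notebook-http mode, a cell becomes an endpoint when it starts with a comment such as `# GET /counters/:name`; the gateway sets a `REQUEST` JSON string before running the cell, and whatever the cell prints becomes the response body. A companion cell starting with `# ResponseInfo GET /counters/:name` prints the headers and status. A minimal sketch, assuming a hypothetical in-memory `counters` dict and a stand-in `REQUEST` so that it runs outside the gateway; in a real notebook each section would be its own cell with the annotation comment on its first line:

```python
import json

counters = {"my_counter": 3}  # hypothetical in-memory state shared by cells

# Stand-in for the REQUEST string the gateway injects before a handler
# cell runs; defined here only so the sketch runs on its own.
REQUEST = json.dumps(
    {"path": {"name": "my_counter"}, "args": {}, "body": "", "headers": {}}
)

# --- handler cell ---
# GET /counters/:name
def get_counter(request_json):
    """Read the :name path parameter and return the counter as JSON."""
    request = json.loads(request_json)
    name = request["path"]["name"]
    return json.dumps({"name": name, "value": counters.get(name, 0)})

print(get_counter(REQUEST))

# --- response metadata cell ---
# ResponseInfo GET /counters/:name
def counter_response_info():
    """Declare the MIME type (via headers) and the HTTP status."""
    return json.dumps(
        {"headers": {"Content-Type": "application/json"}, "status": 200}
    )

print(counter_response_info())
```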


Jupyter: docker stacks

Docker container: Jupyter Notebook + Apache Toree

https://github.com/jupyter/docker-stacks


Dockerize your jupyter gateway api

# ${IMAGE} expands correctly both in a POSIX shell and in a Makefile
IMAGE=demos/kernel_gateway_demo

docker build -t ${IMAGE} .

docker run -p 8888:8888 ${IMAGE} \
    jupyter kernelgateway --KernelGatewayApp.ip=0.0.0.0 \
    --KernelGatewayApp.port=8888 \
    --KernelGatewayApp.api=notebook-http \
    --KernelGatewayApp.seed_uri=/srv/notebooks/autoscience.ipynb
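Once the container is running, any HTTP client can call the exposed API. A minimal sketch of a Python client using only the standard library; it assumes the seed notebook defines a `/counters/:name` endpoint (hypothetical here) and that the gateway is reachable on the published port 8888:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def counter_url(name, host="localhost", port=8888):
    """Build the endpoint URL; the /counters path is an assumed example."""
    return f"http://{host}:{port}/counters/{quote(name, safe='')}"


def fetch_counter(name, **kwargs):
    """Call the gateway and decode the JSON body of the response."""
    with urlopen(counter_url(name, **kwargs), timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


print(counter_url("my_counter"))
# fetch_counter("my_counter")  # requires the gateway container to be running
```

A browser-based web app would issue the same request with its own HTTP client; nothing in the call is specific to Python.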


Big Data apps: Dockerize your Jupyter gateway API with Toree

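[Diagram: a web browser running your own web app calls the Jupyter Kernel Gateway over an HTTP REST API; the gateway talks to the Toree kernel over ∅MQ and loads the notebook files; gateway and kernel are packaged in Docker containers.]

One web session = one server on a cloud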


Summary

• Jupyter Notebook is a great way to create and share data-driven use cases and projects

• Jupyter is more than notebooks

– gateway, kernels, hub, etc.

• Narratives powered by Jupyter

– O’Reilly Orioles

– build your own narrative


Resources

Jupyter

http://jupyter.org/index.html

https://jupyter.readthedocs.io/en/latest/index.html#

Jupyter Kernel Gateway

https://github.com/jupyter/kernel_gateway

http://jupyter-kernel-gateway.readthedocs.io/en/latest/

JupyterCon (first of its kind!)

https://conferences.oreilly.com/jupyter/jup-ny

Apache Toree (Spark Kernel)

https://toree.apache.org/

Web application dev

https://angular.io/

https://www.polymer-project.org/1.0/

Docker

https://github.com/jupyter/docker-stacks

https://www.docker.com/


LinkedIn and Twitter: @natbusa