Data analytics in the cloud with Jupyter Notebooks Graham Dumpleton [email protected]

Page 1: Data analytics in the cloud with Jupyter notebooks

Data analytics in the cloud with Jupyter Notebooks

Graham Dumpleton

[email protected]

Page 2: Data analytics in the cloud with Jupyter notebooks

http://jupyter.org/

Page 3: Data analytics in the cloud with Jupyter notebooks

Python Data Science Handbook / 04.12-Three-Dimensional-Plotting

Page 4: Data analytics in the cloud with Jupyter notebooks

Python Data Science Handbook / 04.13-Geographic-Data-With-Basemap

Page 5: Data analytics in the cloud with Jupyter notebooks

https://blog.data.gov.sg/how-we-caught-the-circle-line-rogue-train-with-data-79405c86ab6a

Page 6: Data analytics in the cloud with Jupyter notebooks
Page 7: Data analytics in the cloud with Jupyter notebooks

Who’s Using It?

Individuals

Collaborators

Teachers

Page 8: Data analytics in the cloud with Jupyter notebooks

Getting Started

pip3 install jupyter

jupyter notebook

Page 9: Data analytics in the cloud with Jupyter notebooks

Empty Workspace

Page 10: Data analytics in the cloud with Jupyter notebooks

Upload Notebooks

Page 11: Data analytics in the cloud with Jupyter notebooks

Local File System

$ ls notebooks/01*.ipynb
notebooks/01.00-IPython-Beyond-Normal-Python.ipynb
notebooks/01.01-Help-And-Documentation.ipynb
notebooks/01.02-Shell-Keyboard-Shortcuts.ipynb
notebooks/01.03-Magic-Commands.ipynb
notebooks/01.04-Input-Output-History.ipynb
notebooks/01.05-IPython-And-Shell-Commands.ipynb
notebooks/01.06-Errors-and-Debugging.ipynb
notebooks/01.07-Timing-and-Profiling.ipynb
notebooks/01.08-More-IPython-Resources.ipynb
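A notebook file is a JSON document, so a directory like the one listed above can also be inspected from Python. The following is a minimal sketch, assuming the nbformat 4 layout with a top-level `cells` list. The `summarise_notebooks` and `write_sample_notebook` helpers and the sample file name are hypothetical and exist only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def summarise_notebooks(directory, pattern="*.ipynb"):
    """Return {file name: number of cells} for notebooks matching pattern."""
    summary = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            notebook = json.load(handle)
        # Assumption: nbformat 4, which keeps cells in a top-level "cells" list.
        summary[path.name] = len(notebook.get("cells", []))
    return summary


def write_sample_notebook(path):
    """Write a tiny two-cell notebook (hypothetical sample data)."""
    notebook = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
            {"cell_type": "code", "metadata": {}, "source": ["print(1)"],
             "execution_count": None, "outputs": []},
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }
    Path(path).write_text(json.dumps(notebook), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        write_sample_notebook(Path(workdir) / "01.00-Sample.ipynb")
        print(summarise_notebooks(workdir, "01*.ipynb"))
```

The glob pattern mirrors the `01*.ipynb` wildcard used with `ls`, so the same selection works from a script.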

Page 12: Data analytics in the cloud with Jupyter notebooks

Browsing Files

Page 13: Data analytics in the cloud with Jupyter notebooks

Interacting with a Notebook

Page 14: Data analytics in the cloud with Jupyter notebooks

Status of Notebooks

Page 15: Data analytics in the cloud with Jupyter notebooks

Installing Packages

Page 16: Data analytics in the cloud with Jupyter notebooks

Positives

• Save notebooks/data locally.

• Python virtual environments.

• Select Python version you want.

• Install required Python packages.
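As a rough illustration of the virtual environment point, the sketch below creates an isolated environment with the standard library `venv` module. The `create_environment` helper and the `notebook-env` directory name are hypothetical, and pip is left out only to keep the example fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir):
    """Create a virtual environment and return the path of its config file."""
    # Assumption: with_pip=False keeps this sketch quick and offline. A real
    # setup would normally use with_pip=True so that packages such as jupyter
    # can be installed into the environment afterwards.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        config = create_environment(Path(workdir) / "notebook-env")
        # pyvenv.cfg records which Python installation the environment uses.
        print(config.read_text())
```

Each environment is tied to the Python version that created it, which is one way to select the version you want per project.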

Page 17: Data analytics in the cloud with Jupyter notebooks

Negatives

• Operating system differences.

• Python distribution differences.

• Python version differences.

• Package index differences.

• PyPI (pip) vs Anaconda (conda).

• Effort to set up and maintain.

Page 18: Data analytics in the cloud with Jupyter notebooks

Docker Images

https://github.com/jupyter/docker-stacks

Page 19: Data analytics in the cloud with Jupyter notebooks
Page 20: Data analytics in the cloud with Jupyter notebooks

Running Docker Image

docker run -it --rm -p 8888:8888 \
    jupyter/minimal-notebook

Page 21: Data analytics in the cloud with Jupyter notebooks

Positives

• Pre-created images.

• Bundled operating system packages.

• Known Python distribution/vendor.

• Bundled Python packages.

• Docker images are read only.

• Don’t need to maintain the image.

Page 22: Data analytics in the cloud with Jupyter notebooks

Negatives (1)

• More effort to customise experience.

• Build a custom Docker image to extend.

• Install extra packages each time you run it.

• Images can be very large.

• Multiple Python versions.

• Packages that you do not need.

Page 23: Data analytics in the cloud with Jupyter notebooks

Negatives (2)

• Access to and saving your notebooks/data.

• Need to mount persistent storage volumes.

• Ensuring access is done securely.
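One possible way to address the storage and access points is to bind-mount a local directory into the container and publish the port on the loopback interface only. The sketch below just builds the command line. The `docker_run_command` helper is hypothetical, and it assumes the image keeps user files under `/home/jovyan/work`.

```python
import shlex
from pathlib import Path


def docker_run_command(notebook_dir, image="jupyter/minimal-notebook", port=8888):
    """Build a docker run command that mounts notebook_dir into the container."""
    host_dir = Path(notebook_dir).resolve()
    return [
        "docker", "run", "-it", "--rm",
        # Publish on 127.0.0.1 only, so the notebook is not exposed to the network.
        "-p", f"127.0.0.1:{port}:8888",
        # Assumption: the image stores user files under /home/jovyan/work.
        "-v", f"{host_dir}:/home/jovyan/work",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so Docker is not required here.
    print(shlex.join(docker_run_command("notebooks")))
```

Returning an argument list rather than a string means the result can be passed to `subprocess.run` without shell quoting problems.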

Page 24: Data analytics in the cloud with Jupyter notebooks

tmpnb.org

https://tmpnb.org/

Page 25: Data analytics in the cloud with Jupyter notebooks

Azure Notebooks

https://notebooks.azure.com/

Page 26: Data analytics in the cloud with Jupyter notebooks

Binder Service

http://mybinder.org/

Page 27: Data analytics in the cloud with Jupyter notebooks

Positives

• Somebody else looks after everything.

Page 28: Data analytics in the cloud with Jupyter notebooks

Negatives

• Shared resource.

• Outside of your control.

• Reliability.

• Customisation.

• Software versions.

• Information security.

Page 29: Data analytics in the cloud with Jupyter notebooks

JupyterHub

https://jupyterhub.readthedocs.io

Page 30: Data analytics in the cloud with Jupyter notebooks

Positives

• Can customise however you want.

• Modify code for service.

• Use custom images.

Page 31: Data analytics in the cloud with Jupyter notebooks

Negatives

• Dedicated infrastructure.

• Effort to understand and set it up.

• Effort to keep it running.

Page 32: Data analytics in the cloud with Jupyter notebooks

Many Options to Choose From

Page 33: Data analytics in the cloud with Jupyter notebooks

OpenShift

Page 34: Data analytics in the cloud with Jupyter notebooks

Deployments

Page 35: Data analytics in the cloud with Jupyter notebooks

Docker Image

Page 36: Data analytics in the cloud with Jupyter notebooks

Image Stream

Page 37: Data analytics in the cloud with Jupyter notebooks

Notebook Storage

Page 38: Data analytics in the cloud with Jupyter notebooks

Attaching Storage

Page 39: Data analytics in the cloud with Jupyter notebooks

Shared Storage

Page 40: Data analytics in the cloud with Jupyter notebooks

Positives

• Use existing features of OpenShift.

• No special storage backends required.

• No custom provisioning applications.

• Cluster can still be used for other applications.

• Simply set quotas and users do what they want.

Page 41: Data analytics in the cloud with Jupyter notebooks

Source-to-Image

Page 42: Data analytics in the cloud with Jupyter notebooks

Positives

• Easily build custom images.

• Pre-populated with required Python packages.

• Pre-populated with required Jupyter Notebooks.

• Pre-populated with required data files.

• Direct to application, or to create images.

Page 43: Data analytics in the cloud with Jupyter notebooks

Service Catalog

Page 44: Data analytics in the cloud with Jupyter notebooks

Templates (builder)

Page 45: Data analytics in the cloud with Jupyter notebooks

Templates (cluster)

Page 46: Data analytics in the cloud with Jupyter notebooks

Templates (notebook)

Page 47: Data analytics in the cloud with Jupyter notebooks

IPyParallel Cluster

Page 48: Data analytics in the cloud with Jupyter notebooks

Parallel Computing

Page 49: Data analytics in the cloud with Jupyter notebooks

Positives

• Templates enable complex deployments.

• Don’t need something like JupyterHub.

Page 50: Data analytics in the cloud with Jupyter notebooks

Challenges

• Custom base images and builders.

• Learning curve for writing templates.

Page 51: Data analytics in the cloud with Jupyter notebooks

Command Line

oc new-app stats101-notebook-template \
    --param STUDENT_NUMBER=1 \
    --param CLASS_NUMBER=1234

oc new-app stats101-notebook-template \
    --param STUDENT_NUMBER=2 \
    --param CLASS_NUMBER=1234

oc delete all --selector class=1234
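For a whole class, the per-student commands above could be generated rather than typed by hand. This is a minimal sketch that only builds the argument lists. The `new_app_commands` and `cleanup_command` helpers are hypothetical, and the template and parameter names are taken from the slide.

```python
import shlex


def new_app_commands(template, class_number, student_numbers):
    """Build one 'oc new-app' argument list per student."""
    commands = []
    for student in student_numbers:
        commands.append([
            "oc", "new-app", template,
            "--param", f"STUDENT_NUMBER={student}",
            "--param", f"CLASS_NUMBER={class_number}",
        ])
    return commands


def cleanup_command(class_number):
    """Build the command that deletes everything labelled with the class."""
    return ["oc", "delete", "all", "--selector", f"class={class_number}"]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run to execute them.
    for command in new_app_commands("stats101-notebook-template", 1234, range(1, 4)):
        print(shlex.join(command))
    print(shlex.join(cleanup_command(1234)))
```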

Page 52: Data analytics in the cloud with Jupyter notebooks

REST API

import powershift.endpoints as endpoints

client = endpoints.Client()

projects = client.oapi.v1.projects.get()

def public_address(route):
    host = route.spec.host
    path = route.spec.path or '/'
    if route.spec.tls:
        return 'https://%s%s' % (host, path)
    return 'http://%s%s' % (host, path)

routes = client.oapi.v1.namespaces(namespace='stats101').routes.get()

for route in routes.items:
    print(' route=%r' % public_address(route))
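The URL-building logic can be tried without a cluster by substituting simple stand-in objects for the route resources. The sketch below is an assumption-laden illustration: `fake_route` and the `stats101.example.com` host are hypothetical, and the stand-ins only mimic the `spec.host`, `spec.path` and `spec.tls` attributes used on the slide.

```python
from types import SimpleNamespace


def public_address(route):
    """Return the public URL for a route-like object."""
    host = route.spec.host
    path = route.spec.path or '/'
    # A route with a TLS section is served over HTTPS, otherwise plain HTTP.
    scheme = 'https' if route.spec.tls else 'http'
    return '%s://%s%s' % (scheme, host, path)


def fake_route(host, path=None, tls=None):
    """Build a stand-in for a route resource (hypothetical test helper)."""
    return SimpleNamespace(spec=SimpleNamespace(host=host, path=path, tls=tls))


if __name__ == "__main__":
    plain = fake_route('stats101.example.com')
    secure = fake_route('stats101.example.com', '/notebook',
                        tls=SimpleNamespace(termination='edge'))
    for route in (plain, secure):
        print(' route=%r' % public_address(route))
```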

Page 53: Data analytics in the cloud with Jupyter notebooks

Positives

• Easily trigger multiple deployments using CLI.

• REST API also available for custom front ends.

Page 54: Data analytics in the cloud with Jupyter notebooks

Resources

• S2I enabled Jupyter Notebook images

• https://github.com/getwarped/jupyter-notebooks

• OpenShift versions of Jupyter Project images

• https://github.com/getwarped/jupyter-stacks

• Python REST API client for OpenShift

• https://github.com/getwarped/powershift