The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing
ESA EO Open Science 2.0 Conference, 12-14 October 2015
Philip Kershaw (CEDA), John Holt (Tessella plc.), José Gómez-Dans, Philip Lewis (UCL), Nicola Pounder, Jon Styles (Assimila Ltd.)
JASMIN (STFC/Stephen Kill)




Introduction

• OPTIRAD = OPTImisation environment for joint retrieval of multi-sensor RADiances
  – Collaboration: CEDA, UCL, Assimila Ltd, FastOpt and VU Amsterdam
  – Funded by ESA
• Overview of technical solution
  – Introduction to IPython (Jupyter) Notebook
  – Deployment on JASMIN-CEMS science cloud
• Make the case: IPython Notebook + Cloud = powerful combination for EO Open Science 2.0


OPTIRAD Goals

Address the challenge of producing consistent EO land surface information products from heterogeneous EO data input:

Collaboration: provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users.

Computing resources: processing at high spatial and temporal resolutions with computationally expensive algorithms.

Usability and access: easy execution and development of existing Python code, and the provision of interactive tutorials for new users.


IPython Notebook

• Provides Python kernels accessible via a web browser
• Sessions can be saved and shared
• Trivial access to parallel processing capabilities
  – IPython.parallel (ipyparallel)
• IPython Notebook is now Jupyter Notebook
• Support for other languages such as R
• New JupyterHub allows multi-user management of notebooks
• Gained traction as a teaching and collaborative tool
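As a rough illustration of the map-style parallelism mentioned above, the sketch below applies a function to a list of inputs using a pool of workers. It uses the standard library's `concurrent.futures` as a stand-in so that it runs without a cluster; with a running ipyparallel cluster, `ipyparallel.Client()[:].map_sync(func, items)` would play the same role. The `ndvi` function, the `parallel_map` helper and the sample values are hypothetical and are not taken from the OPTIRAD code.

```python
from concurrent.futures import ThreadPoolExecutor


def ndvi(bands):
    """Normalised difference of a (red, nir) reflectance pair (illustrative workload)."""
    red, nir = bands
    return (nir - red) / (nir + red)


def parallel_map(func, items, workers=4):
    """Apply func to every item using a pool of workers, preserving input order.

    Stand-in for an ipyparallel view: this helper is an assumption made for
    illustration, not part of the OPTIRAD platform.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


# Hypothetical (red, nir) reflectance pairs for three pixels.
pixels = [(0.1, 0.5), (0.2, 0.4), (0.3, 0.3)]
results = parallel_map(ndvi, pixels)
```

In a notebook, the call to `parallel_map` would sit in a single cell, which is what makes access to parallel processing feel trivial from the user's side.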


IPython Notebook + Cloud

• Cloud’s characteristics:
  – Broad network access, resource pooling, elasticity, scale (compute and storage)
  – Good fit for Big Data science applications
• Cloud-hosted Notebook: a model already demonstrated with public cloud services, e.g.
  – Wakari, Azure, Rackspace
• Central hosting allows central management of software packages
  – No installation steps needed for the user
• Algorithm prototyping environment next to Big Data
  – Acts as a precursor to operational processing services


Notebook: a user – application perspective

Support a spectrum of usage models

[Figure: spectrum of usage models. Labels: different classes of user; long-tail of science users.]


Design and development considerations

• Host on JASMIN-CEMS
  – Data analysis facility and science cloud at Rutherford Appleton Lab, UK
  – Advantage of proximity to locally hosted EO and climate science datasets
  – Integration with environmental sciences community
• Lightweight development and deployment philosophy
  – Build on Open Source and community efforts to use what’s already available
• How to meet multi-user support requirement?
  – Buy off-the-shelf: run Wakari on JASMIN-CEMS platform, or
  – Try JupyterHub: multi-user IPython Notebook solution, or
  – Roll our own solution
• How to integrate parallel processing?
  – IPython.parallel (ipyparallel) Python API accessed via the Notebook


Deployment Architecture

[Figure: OPTIRAD JASMIN cloud tenancy. Labels recovered from the diagram:]
• Browser access
• Firewall
• JupyterHub: manages users and provision of notebooks
• Swarm: manages allocation of containers for notebooks
• VMs in the Swarm pool running Docker containers: notebooks and kernels in containers (IPython Notebook, kernels, parallel controller)
• Slave VMs running parallel engines: nodes for parallel processing
• VM for shared services: NFS, LDAP
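To make the idea of Swarm allocating notebook containers across a pool of VMs more concrete, here is a toy model of a "spread" placement, in which each new container goes to the VM currently hosting the fewest. Classic Docker Swarm offered spread, binpack and random strategies; the slides do not say which one OPTIRAD used, so this is an assumption for illustration only. The `place_notebook` function, the VM names and the user names are hypothetical.

```python
def place_notebook(pool, user):
    """Assign a user's notebook container to the least-loaded VM.

    pool maps a VM name to the list of users whose containers it hosts.
    Ties are broken by VM name so that the result is deterministic.
    This is a toy model, not Swarm's actual scheduler.
    """
    vm = min(pool, key=lambda name: (len(pool[name]), name))
    pool[vm].append(user)
    return vm


# Hypothetical pool of three VMs, loosely named after the Swarm pool VMs.
pool = {"swarm-pool-0": [], "swarm-pool-1": [], "swarm-pool-2": []}
placements = [place_notebook(pool, user) for user in ["alice", "bob", "carol", "dave"]]
```

With three empty VMs and four users, the first three containers land on separate VMs and the fourth returns to the first, which is the behaviour one would want when notebooks have similar resource needs.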


Conclusions + Next Steps

• Experiences from project delivery
  – Off-the-shelf solution using JupyterHub paid off
  – JupyterHub and Swarm were new, but installation was straightforward and they proved operationally robust
• Challenges and future development
  – Extend use of containers for parallel compute
  – Challenge: managing cloud elasticity with both containers and host VMs
  – Provide object storage: Ceph likely to be adopted
  – Expand from OPTIRAD pilot to wider user community
  – Deploy with toolboxes, e.g. Sentinels or CIS
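The elasticity challenge listed above has two levels: containers scale with user demand, while the host VMs that run them must scale to fit the containers. A minimal sketch of the VM-level arithmetic follows, assuming a fixed number of containers per VM. The function names, the capacity figure and the `min_vms` floor are all hypothetical; a real policy would also need to account for memory, CPU and the cost of migrating running notebooks.

```python
import math


def vms_required(containers, containers_per_vm, min_vms=1):
    """Number of host VMs needed to run the given number of containers."""
    if containers_per_vm < 1:
        raise ValueError("containers_per_vm must be at least 1")
    return max(min_vms, math.ceil(containers / containers_per_vm))


def scaling_action(current_vms, containers, containers_per_vm):
    """Return how many VMs to add (positive) or release (negative)."""
    return vms_required(containers, containers_per_vm) - current_vms
```

For example, with a hypothetical capacity of four containers per VM, nine containers need three VMs, so a tenancy currently running two VMs would add one.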


Demo . . .

• A tutorial on EO data assimilation

– Notebook blurs the traditional separation between tutorial documentation and using the target system

– The two are one self-contained interactive unit


Further information

• OPTIRAD:
  – Optimisation Environment For Joint Retrieval Of Multi-Sensor Radiances (OPTIRAD), Proceedings of the ESA 2014 Conference on Big Data from Space (BiDS’14), http://dx.doi.org/10.2788/1823
• JASMIN paper (Sept 2013)
  – http://home.badc.rl.ac.uk/lawrence/static/2013/10/14/LawEA13_Jasmin.pdf
  – Cloud paper to follow soon
• Cloud-hosted JupyterHub with Docker for teaching:
  – https://developer.rackspace.com/blog/deploying-jupyterhub-for-education/
• JASMIN and CEDA:
  – http://jasmin.ac.uk/
  – http://www.ceda.ac.uk
• @PhilipJKershaw