
Clusterous: Easy cluster computing with Docker and AWS


Page 1: Clusterous: Easy cluster computing with Docker and AWS

Clusterous - Easy Cluster Computing with Docker and AWS
SIRCA
Balram Ramanathan

Tuesday 22nd March 2016

Page 2: Clusterous: Easy cluster computing with Docker and AWS

Who we are

● SIRCA was founded in 1997 by a group of Australian and New Zealand universities as a not-for-profit company
● Our mission is to enable data-intensive research
● We also provide academics access to a number of key large-scale data sets, primarily in the finance space

Page 3: Clusterous: Easy cluster computing with Docker and AWS

Project background

Clusterous is part of SIRCA’s contribution to the Big Data Knowledge Discovery project, a collaborative project funded by the Science and Industry Endowment Fund (SIEF). The project was created to bring scientists in data-centric disciplines together with leaders in information technology, exploring how big data and machine learning can create a new research paradigm and unlock new insights.

Page 4: Clusterous: Easy cluster computing with Docker and AWS

Problem we are trying to solve

● Scientists want access to compute power, but often end up stuck with physical machines, which are hard to scale
● AWS provides an answer, but it can be daunting to get started, and setting up and using a compute cluster is tedious
  ○ Any productivity gained from faster compute threatens to be offset by setup/admin overhead
● Getting your code to run on remote machines can be a headache of its own
  ○ Different OS versions, dependencies, etc.
  ○ How to deploy across multiple machines?
● There is clearly a need for a tool that makes cluster computing in the cloud easy for those who write code but aren’t cloud experts

Page 5: Clusterous: Easy cluster computing with Docker and AWS

Clusterous makes cluster computing easier

● Open source command line tool written in Python
● Use the simple config “wizard” to enter your AWS credentials and configure your account
● Put a few cluster parameters in a YAML file, such as instance types and the number of instances
● Start the cluster
● All clusters have a shared volume for your data, config files, etc.
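To make the “few cluster parameters in a YAML file” idea concrete, a cluster profile might look something like the sketch below. The field names here are illustrative assumptions, not Clusterous’s exact schema; consult the project’s documentation for the real keys.

```yaml
# Hypothetical cluster profile -- key names are illustrative only
cluster_name: demo-cluster
region: ap-southeast-2          # AWS region to launch in
master_instance_type: t2.medium # instance running cluster services
worker_instance_type: c4.xlarge # instances running your containers
worker_count: 8                 # number of worker instances
shared_volume_size: 40          # GB, mounted on every node
```

With a file like this in place, starting the cluster is a single command, and the same profile can be reused or tweaked (e.g. bumping `worker_count`) for later runs.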

Page 6: Clusterous: Easy cluster computing with Docker and AWS
Page 7: Clusterous: Easy cluster computing with Docker and AWS

BYO Code

● Clusterous doesn’t impose any parallel compute framework or language
● Put your code plus supporting libraries in Docker containers, and deploy them to the cluster with the help of “Environments”

Page 8: Clusterous: Easy cluster computing with Docker and AWS

Environments

● An “environment” is a complete running environment for your code
● An environment file is a simple YAML-based script for deploying your containers to the cluster
● It also copies files, builds Docker images (if needed), and creates a tunnel
● Get your application deployed and running in a single step
● Environment files are redistributable: write once, run many
● We have created environments for IPython Parallel and PySpark; many users may just use those
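The bullets above describe what an environment file does: copy files, build images if needed, run containers, and open a tunnel. A sketch along those lines might look as follows; the key names and image names are hypothetical, not the actual Clusterous environment-file schema.

```yaml
# Hypothetical environment file -- keys and image names are
# illustrative assumptions, not the exact Clusterous format
name: ipython-demo
copy:
  - ./src                # files copied to the cluster's shared volume
components:
  controller:
    image: myorg/ipyparallel-controller   # Docker image for the controller
    count: 1
    tunnel: 8888         # port tunnelled back to your machine
  engine:
    image: myorg/ipyparallel-engine       # Docker image run on workers
    count: 8
```

Because everything needed to deploy the application lives in one file like this, the same environment can be shared with colleagues and re-run on any Clusterous cluster, which is what makes environment files “write once, run many”.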

Page 9: Clusterous: Easy cluster computing with Docker and AWS

Our users so far

● Our partners have run their own parallel compute software on Clusterous clusters

● One project partner uses R for ecology simulations - they created rrqueue, an open source distributed task queue for R

● A team at Data61 ran Stateline, a framework for distributed Markov chain Monte Carlo sampling, on Clusterous

Page 10: Clusterous: Easy cluster computing with Docker and AWS

Demo time

Page 11: Clusterous: Easy cluster computing with Docker and AWS

Credits

● Our team consists of Balram Ramanathan, Lolo Fernandez and Ben King
● Big thank you to our project partners at Data61, the University of Sydney and Macquarie University for their input
● Thanks to SIEF for the funding
● We are aiming to release version 1.0 in the next few weeks