Zoe - Swarming Spark applications

Daniele Venzano
Research Engineer, EURECOM

My background

Software engineering (2010)
• Linux embedded systems, kernel drivers, graphical interfaces

Research (2012)
• Code analysis, OpenFlow, automatic bug detection

More research (now)
• Virtualization, networking, distributed systems performance

DSG and Eurecom

Research center on the French Riviera

Like this?

Or more like this?

DSG and Eurecom

Engineering research center
• Academic research in telecommunication, multimedia, networks and security
• Close ties with local and international companies

Distributed Systems Group
• Focusing on data-intensive applications (so-called “big data”) at all levels
• Performance impact of virtualization, storage and network technologies (that’s me!)
• Data processing frameworks (Hadoop, Spark)
• Machine learning algorithms

Docker at the Distributed Systems Group

Started investigating Docker in 2012
• Virtualization platform for Big Data research

Summer 2015
• Built Swarm cluster
• Planning to shift from VMs to containers for most use cases

Bigfoot project

Use cases

Internally at Eurecom:
• Laboratory sessions for Data Science course
  • ~100 students, fixed configuration, throw-away environments
• Academic research
  • Very dynamic loads, all kinds of software combinations, higher priorities near deadlines

Companies have similar use cases
• Production jobs
  • Fixed configuration, periodic executions
• Research teams

Smart airports

Power load forecasting

Customer location forecasting

The last 3 years: OpenStack + Sahara

Public/private cloud with VM-based virtualization
We contributed Spark support to Sahara
Users can create clusters on demand

Assumes infinite resources
Slow
• Create an HDFS+Spark cluster: 5 to 10 minutes
• Swarm takes a few seconds for the same task

Supporting new services/versions requires code changes

Users make static allocations

Why build on top of Docker and Swarm?

Swarm has a simple, documented API
• Start solving our problem immediately
Packaging software is very easy
• Freedom to experiment
Fast deployments
• No static allocation, automatic resizing
Swarm does only one thing and does it well

Zoe

Application scheduler on top of Swarm
• Queues requests when resources are scarce
Users can submit their own applications
• And create their own container images!
Dynamically resizes active applications
• Frees unused resources to speed up other apps
Can coexist with other Swarm users
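To make "queues requests when resources are scarce" concrete, here is a minimal sketch of strict FIFO admission against a fixed pool of containers. It is not Zoe's actual code: the `Request` and `FifoScheduler` names, and counting capacity in whole containers, are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical application request; 'containers' is how many it needs."""
    name: str
    containers: int


class FifoScheduler:
    """Toy FIFO admission: start requests in arrival order while they fit."""

    def __init__(self, capacity: int) -> None:
        self.free = capacity      # containers still available in the cluster
        self.queue = deque()      # requests waiting for resources
        self.running = {}         # application name -> containers in use

    def submit(self, request: Request) -> None:
        self.queue.append(request)
        self._dispatch()

    def finish(self, name: str) -> None:
        # Return the finished application's containers to the pool.
        self.free += self.running.pop(name)
        self._dispatch()

    def _dispatch(self) -> None:
        # Strict FIFO: stop at the first request that does not fit.
        while self.queue and self.queue[0].containers <= self.free:
            request = self.queue.popleft()
            self.free -= request.containers
            self.running[request.name] = request.containers


scheduler = FifoScheduler(capacity=10)
scheduler.submit(Request("spark-a", 6))
scheduler.submit(Request("spark-b", 6))   # does not fit yet, so it waits
scheduler.finish("spark-a")               # frees resources, spark-b starts
```

In this sketch a waiting request blocks everything behind it, which keeps the ordering simple at the cost of leaving some capacity idle.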

MSC Zoe
Launch: August 2015
Tonnage: 197,362 t
Capacity: 19,224 TEU
Length: 395.4 m
Engine: 83,800 HP
Crew: 22

What is a Zoe application?

Zoe architecture

[Diagram labels: Users submit application descriptions; Zoe schedules requests; Zoe scheduler; Swarm; Images from private registry or Docker Hub; Monitoring data]
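As a rough illustration of the flow in the diagram, the sketch below turns an application description into the list of containers a scheduler would ask Swarm to start. The description format is not shown on the slides, so everything here is hypothetical: the `ServiceSpec` and `AppDescription` classes, the `expand` helper, and the `example/...` image names are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceSpec:
    """One service of an application (hypothetical fields)."""
    name: str
    image: str       # image in a private registry or on Docker Hub
    count: int = 1   # number of identical containers to start


@dataclass
class AppDescription:
    """An application as a named collection of services (hypothetical)."""
    name: str
    services: list = field(default_factory=list)


def expand(app: AppDescription) -> list:
    """Flatten a description into one (container name, image) pair per container."""
    return [
        (f"{app.name}-{svc.name}-{index}", svc.image)
        for svc in app.services
        for index in range(svc.count)
    ]


app = AppDescription("wordcount", [
    ServiceSpec("spark-master", "example/spark-master"),
    ServiceSpec("spark-worker", "example/spark-worker", count=2),
])
for container_name, image in expand(app):
    print(container_name, image)
```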

Automatic resize of running applications

[Diagram layers: Volumes, Data layer, Applications]

Example: a data layer is not needed if there are no users
• Data is kept in volumes
• The data layer can be restarted when needed
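The data layer example can be sketched as a simple rule: a service may be stopped when nobody is using it and its state lives in a volume. This is only one reading of the slide; the `Service` fields and the `plan_release` helper are assumptions, not part of Zoe.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """A running service (hypothetical fields)."""
    name: str
    containers: int
    active_users: int
    state_in_volume: bool   # True if its data survives a restart


def plan_release(services):
    """Return (names to stop, containers freed) for idle, restartable services."""
    idle = [s for s in services if s.active_users == 0 and s.state_in_volume]
    return [s.name for s in idle], sum(s.containers for s in idle)


services = [
    Service("hdfs", containers=4, active_users=0, state_in_volume=True),
    Service("notebook", containers=1, active_users=2, state_in_volume=False),
    Service("spark-workers", containers=8, active_users=0, state_in_volume=False),
]
to_stop, freed = plan_release(services)
print(to_stop, freed)
```

Only the idle service backed by a volume is selected, because stopping the others would lose state or interrupt users.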

Examples of scheduling policies

FIFO – First In First Out

Priority based
• Researchers near deadlines have more priority
• Fits the Swarm priority model nicely

Deadline
• Finish this work by 3 p.m.
• Streaming analysis latency must be less than 200 ms

Size-based
• Run the smallest applications first
• Need to know the runtime in advance
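The four policies differ mainly in how they order the waiting queue, which a sort key captures well. The sketch below is illustrative only: the `Job` fields, the `POLICIES` table and the sample values are assumptions, and the deadline policy is modelled as earliest deadline first, which is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical queued application; all fields are illustrative."""
    name: str
    arrival: int         # submission order
    priority: int        # larger means more urgent
    deadline: float      # seconds from now
    est_runtime: float   # estimated runtime in seconds, needed by size-based


# Each policy is a sort key; arrival order breaks ties.
POLICIES = {
    "fifo": lambda job: job.arrival,
    "priority": lambda job: (-job.priority, job.arrival),
    "deadline": lambda job: (job.deadline, job.arrival),
    "size": lambda job: (job.est_runtime, job.arrival),
}


def order(jobs, policy):
    """Return job names in the order the given policy would start them."""
    return [job.name for job in sorted(jobs, key=POLICIES[policy])]


jobs = [
    Job("batch", arrival=0, priority=1, deadline=3600, est_runtime=900),
    Job("paper", arrival=1, priority=5, deadline=7200, est_runtime=1800),
    Job("quick", arrival=2, priority=1, deadline=600, est_runtime=60),
]
for policy in POLICIES:
    print(policy, order(jobs, policy))
```

With these sample values the high-priority "paper" job jumps the queue under the priority policy, while "quick" goes first under both the deadline and size-based policies.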

Zoe implementation

Two client implementations
• Web interface
• Command line for scripting
Simple FIFO scheduler
Docker images for Spark, HDFS, iPython and Spark notebooks
Open source on GitHub, images available on the Docker Hub

Zoe - future

Version 1.0 is set for March 2016
Big plans for Zoe
• One full-time programmer
All the companies we spoke to are very interested

Features for 1.0 and after:
• Create Zoe applications with more and more services
• Automatic resizing of applications
• Use the new volume management
• Monitoring
• Advanced scheduling

Using Docker Swarm for data-intensive apps

L2 networking for Docker containers
Service discovery via DNS

[Diagram: two hosts, each with eth0, eth1 and a Docker bridge; containers c1, c2, c3, c4]

What about Swarm 1.0 multi-host networking?
• We need hostnames to be visible from outside
• Will run measurements on overlay network performance
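With service discovery via DNS, a client finds a container by name and never hard-codes its IP address. A minimal sketch using only Python's standard `socket` module follows; the `resolve` helper is an assumption for illustration, and a real container hostname (for example a Spark master) would replace the loopback address used here so the example runs anywhere.

```python
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses for hostname, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry ends with a (address, port) pair; keep unique addresses only.
    return sorted({info[4][0] for info in infos})


# A numeric address resolves without contacting a DNS server.
print(resolve("127.0.0.1"))
```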

Key takeaways

1. Zoe is a data-intensive application scheduler that targets data scientists and private clouds

2. It is very easy to build cloud applications on top of Swarm

3. Data-intensive frameworks like Spark can run easily and efficiently on top of Swarm

4. Networking between Docker containers on different hosts can be made transparent

Thank you!
Daniele Venzano
http://[email protected]
