Zoe: Swarming Spark applications Daniele Venzano Research Engineer, EURECOM

DockerCon EU 2015: Zoe: Swarming Spark applications


Page 1: DockerCon EU 2015: Zoe: Swarming Spark applications

Zoe: Swarming Spark applications

Daniele Venzano
Research Engineer, EURECOM

Page 2: DockerCon EU 2015: Zoe: Swarming Spark applications

My background

Software engineering (2010)
• Linux embedded systems, kernel drivers, graphical interfaces

Research (2012)
• Code analysis, OpenFlow, automatic bug detection

More research (now)
• Virtualization, networking, distributed systems performance

Page 3: DockerCon EU 2015: Zoe: Swarming Spark applications

DSG and Eurecom
Research center on the French Riviera

Like this?

Page 4: DockerCon EU 2015: Zoe: Swarming Spark applications

DSG and Eurecom
Research center on the French Riviera

Or more like this?

Page 5: DockerCon EU 2015: Zoe: Swarming Spark applications

DSG and Eurecom

Engineering research center
• Academic research in telecommunication, multimedia, networks and security
• Close ties with local and international companies

Distributed Systems Group
• Focusing on data-intensive applications (so-called “big data”) at all levels
• Performance impact of virtualization, storage and network technologies (that’s me!)
• Data processing frameworks (Hadoop, Spark)
• Machine learning algorithms

Page 6: DockerCon EU 2015: Zoe: Swarming Spark applications

Docker at the Distributed Systems Group

Started investigating Docker in 2012
• Virtualization platform for Big Data research

Summer 2015
• Built Swarm cluster
• Planning to shift from VMs to containers for most use cases

Bigfoot project

Page 7: DockerCon EU 2015: Zoe: Swarming Spark applications

Use cases

Internally at Eurecom:
• Laboratory sessions for Data Science course
  • ~100 students, fixed configuration, throw-away environments
• Academic research
  • Very dynamic loads, all kinds of software combinations, higher priorities near deadlines

Companies have similar use cases
• Production jobs
  • Fixed configuration, periodic executions
• Research teams

Smart airports
Power load forecasting
Customer location forecasting

Page 8: DockerCon EU 2015: Zoe: Swarming Spark applications

The last 3 years: OpenStack + Sahara

Public/private cloud with VM-based virtualization
We contributed Spark support to Sahara
Users can create clusters on-demand
Assumes infinite resources
Slow
• Create an HDFS+Spark cluster: 5 to 10 minutes
• Swarm takes a few seconds for the same task
Supporting new services/versions requires code changes

Users make static allocations

Page 9: DockerCon EU 2015: Zoe: Swarming Spark applications

Why build on top of Docker and Swarm?

Swarm has a simple, documented API
• Start solving our problem immediately
Packaging software is very easy
• Freedom to experiment
Fast deployments
• No static allocation, automatic resizing
Swarm does only one thing and does it well

Page 10: DockerCon EU 2015: Zoe: Swarming Spark applications

Zoe

Application scheduler on top of Swarm
• Queues requests when resources are scarce
Users can submit their own applications
• And create their own container images!
Dynamically resizes active applications
• Frees unused resources to speed up other apps
Can coexist with other Swarm users
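The queueing behaviour on this slide can be illustrated with a minimal sketch. This is not Zoe's code: `AppRequest`, `ToyScheduler`, and the idea of counting capacity in container slots are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AppRequest:
    name: str
    containers: int  # container slots the application asks for (assumed unit)


@dataclass
class ToyScheduler:
    """Hypothetical FIFO admission queue; not the actual Zoe scheduler."""
    capacity: int                                   # total slots in the cluster
    running: dict = field(default_factory=dict)     # app name -> slots in use
    pending: deque = field(default_factory=deque)   # requests waiting for room

    def free_slots(self) -> int:
        return self.capacity - sum(self.running.values())

    def submit(self, req: AppRequest) -> None:
        # Every request goes through the queue, then we try to start work.
        self.pending.append(req)
        self._dispatch()

    def finish(self, name: str) -> None:
        # A finished application releases its slots for queued requests.
        self.running.pop(name)
        self._dispatch()

    def _dispatch(self) -> None:
        # Start requests in arrival order while the head of the queue fits.
        while self.pending and self.pending[0].containers <= self.free_slots():
            req = self.pending.popleft()
            self.running[req.name] = req.containers
```

In this sketch a small request waits behind a larger one at the head of the queue; that strict ordering is a simplification chosen for the example.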

MSC Zoe
• Launch: August 2015
• Tonnage: 197,362 t
• Capacity: 19,224 TEU
• Length: 395.4 m
• Engine: 83,800 HP
• Crew: 22

Page 11: DockerCon EU 2015: Zoe: Swarming Spark applications

What is a Zoe application?

Page 12: DockerCon EU 2015: Zoe: Swarming Spark applications

Zoe architecture

Diagram labels:
• Zoe scheduler
• Swarm
• Images from private registry or Docker Hub
• Monitoring data
• Users submit application descriptions
• Zoe schedules requests

Page 13: DockerCon EU 2015: Zoe: Swarming Spark applications

Automatic resize of running applications

Diagram layers: Volumes, Data layer, Applications

Example: a data layer is not needed if there are no users
• Data is kept in volumes
• The data layer can be restarted when needed
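The data-layer example can be sketched as a small decision rule. The `ToyCluster` class and its fields are hypothetical and only illustrate the idea that volumes outlive the data layer; they do not describe how Zoe implements resizing.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical model of the three layers on the slide."""
    volumes: dict = field(default_factory=dict)  # persistent data, never touched by resize
    data_layer_running: bool = False
    active_apps: int = 0                          # applications currently using the data layer

    def resize(self) -> str:
        # No users: the data layer can be stopped, the data stays in the volumes.
        if self.active_apps == 0 and self.data_layer_running:
            self.data_layer_running = False
            return "data layer stopped"
        # Users are back: restart the data layer on top of the same volumes.
        if self.active_apps > 0 and not self.data_layer_running:
            self.data_layer_running = True
            return "data layer started"
        return "no change"
```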

Page 14: DockerCon EU 2015: Zoe: Swarming Spark applications

Examples of scheduling policies

FIFO (First In First Out)
Priority-based
• Researchers near deadlines have higher priority
• Fits the Swarm priority model nicely
Deadline
• Finish this work by 3 p.m.
• Streaming analysis latency must be less than 200 ms
Size-based
• Run the smallest applications first
• Need to know the runtime in advance
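One way to compare these policies is to express each as a sort key over pending requests. The sketch below is an illustration under assumptions: the `Request` fields and the `POLICIES` table are hypothetical names, and the deadline policy is modelled as earliest-deadline-first, which the slides do not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    name: str
    arrival: int                      # submission order
    priority: int = 0                 # higher value runs first (assumed convention)
    deadline: Optional[float] = None  # None means "no deadline"
    est_runtime: float = 0.0          # size-based needs this known in advance


# Each policy is a key function; requests are served in ascending key order.
POLICIES: Dict[str, Callable[[Request], tuple]] = {
    "fifo": lambda r: (r.arrival,),
    "priority": lambda r: (-r.priority, r.arrival),
    # Assumption: earliest deadline first, requests without a deadline go last.
    "deadline": lambda r: (r.deadline if r.deadline is not None else float("inf"), r.arrival),
    "size": lambda r: (r.est_runtime, r.arrival),
}


def order(requests: List[Request], policy: str) -> List[str]:
    """Return request names in the order the given policy would serve them."""
    return [r.name for r in sorted(requests, key=POLICIES[policy])]
```

Arrival order is used as the tie-breaker in every policy so that the ordering stays deterministic.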

Page 15: DockerCon EU 2015: Zoe: Swarming Spark applications

Zoe implementation

Two client implementations
• Web interface
• Command line for scripting
Simple FIFO scheduler
Docker images for Spark, HDFS, iPython and Spark notebooks
Open source on GitHub, images available on the Docker Hub

Page 16: DockerCon EU 2015: Zoe: Swarming Spark applications

Zoe - future

Set date: version 1.0 in March 2016
Big plans for Zoe
• One full-time programmer
• All the companies we spoke to are very interested
Features for 1.0 and after:
• Create Zoe applications with more and more services
• Automatic resizing of applications
• Use the new volume management
• Monitoring
• Advanced scheduling

Page 17: DockerCon EU 2015: Zoe: Swarming Spark applications

Using Docker Swarm for data-intensive apps

L2 networking for Docker containers
Service discovery via DNS

[Diagram labels: Docker bridge, eth0, eth1 (each shown twice); containers c1, c2, c3, c4]

What about Swarm 1.0 multi-host networking?
- We need hostnames to be visible from outside
- Will run measurements on overlay network performance

Page 18: DockerCon EU 2015: Zoe: Swarming Spark applications

Key takeaways

1. Zoe is a data-intensive application scheduler that targets data scientists and private clouds

2. It is very easy to build cloud applications on top of Swarm

3. Data-intensive frameworks like Spark can run easily and efficiently on top of Swarm

4. Networking between Docker containers on different hosts can be made transparent

Page 19: DockerCon EU 2015: Zoe: Swarming Spark applications

Thank you!
Daniele Venzano
http://[email protected]