Zoe: Swarming Spark applications Daniele Venzano Research Engineer, EURECOM



My background

Software engineering (2010)
• Linux embedded systems, kernel drivers, graphical interfaces

Research (2012)
• Code analysis, OpenFlow, automatic bug detection

More research (now)
• Virtualization, networking, distributed systems performance


DSG and Eurecom

Research center on the French Riviera

Like this?


DSG and Eurecom

Research center on the French Riviera

Or more like this?


DSG and Eurecom

Engineering research center
• Academic research in telecommunication, multimedia, networks and security
• Close ties with local and international companies

Distributed Systems Group
• Focusing on data-intensive applications (so-called “big data”) at all levels
• Performance impact of virtualization, storage and network technologies (that’s me!)
• Data processing frameworks (Hadoop, Spark)
• Machine learning algorithms


Docker at the Distributed Systems Group

Started investigating Docker in 2012
• Virtualization platform for Big Data research

Summer 2015
• Built Swarm cluster
• Planning to shift from VMs to containers for most use cases

Bigfoot project


Use cases

Internally at Eurecom:
• Laboratory sessions for Data Science course: ~100 students, fixed configuration, throw-away environments
• Academic research: very dynamic loads, all kinds of software combinations, higher priorities near deadlines

Companies have similar use cases
• Production jobs: fixed configuration, periodic executions
• Research teams

Smart airports

Power load forecasting

Customer location forecasting


The last 3 years: OpenStack + Sahara

Public/private cloud with VM-based virtualization
We contributed Spark support to Sahara
Users can create clusters on demand

Assumes infinite resources

Slow
• Create an HDFS+Spark cluster: 5 to 10 minutes
• Swarm takes a few seconds for the same task

Supporting new services/versions requires code changes

Users make static allocations


Why build on top of Docker and Swarm?

Swarm has a simple, documented API
• Start solving our problem immediately

Packaging software is very easy
• Freedom to experiment

Fast deployments
• No static allocation, automatic resizing

Swarm does only one thing and does it well


Zoe

Application scheduler on top of Swarm
• Queues requests when resources are scarce

Users can submit their own applications
• And create their own container images!

Dynamically resizes active applications
• Frees unused resources to speed up other apps

Can coexist with other Swarm users
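A minimal sketch of the "queue requests when resources are scarce" idea, assuming a single resource (CPU cores) and strict arrival order. The `Request` and `FifoQueue` names and the core counts are hypothetical illustrations, not Zoe's actual code or API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    cores: int  # hypothetical resource unit, chosen for illustration


class FifoQueue:
    """Admit requests in arrival order while capacity remains."""

    def __init__(self, total_cores: int) -> None:
        self.free_cores = total_cores
        self.pending = deque()
        self.running = []

    def submit(self, request: Request) -> None:
        self.pending.append(request)
        self._dispatch()

    def finish(self, name: str) -> None:
        # Release the cores of a finished application, then retry the queue.
        for request in self.running:
            if request.name == name:
                self.running.remove(request)
                self.free_cores += request.cores
                break
        self._dispatch()

    def _dispatch(self) -> None:
        # Strict FIFO: stop at the first request that does not fit.
        while self.pending and self.pending[0].cores <= self.free_cores:
            request = self.pending.popleft()
            self.free_cores -= request.cores
            self.running.append(request)


queue = FifoQueue(total_cores=8)
queue.submit(Request("spark-a", 6))
queue.submit(Request("spark-b", 4))   # waits: only 2 cores are free
queue.submit(Request("notebook", 2))  # waits behind spark-b
print([r.name for r in queue.running])  # ['spark-a']
queue.finish("spark-a")
print([r.name for r in queue.running])  # ['spark-b', 'notebook']
```

Strict FIFO keeps the small `notebook` request waiting even though it would fit; a real scheduler might let it jump ahead, at the cost of delaying larger requests.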

MSC Zoe
Launch: August 2015
Tonnage: 197,362 t
Capacity: 19,224 TEU
Length: 395.4 m
Engine: 83,800 HP
Crew: 22


What is a Zoe application?


Zoe architecture

[Diagram: Zoe scheduler and Swarm. Users submit application descriptions; Zoe schedules requests; images from private registry or Docker Hub; monitoring data]


Automatic resize of running applications

[Diagram layers: Volumes, Data layer, Applications]

Example: a data layer is not needed if there are no users
• Data is kept in volumes
• The data layer can be restarted when needed
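The data-layer example can be sketched as a small reconcile step, assuming the scheduler knows how many users are active. `DataLayer`, `reconcile` and the volume name `hdfs-data` are hypothetical names for illustration, not part of Zoe.

```python
from dataclasses import dataclass


@dataclass
class DataLayer:
    running: bool = True
    # Hypothetical volume name; volumes outlive the containers.
    volumes: tuple = ("hdfs-data",)


def reconcile(layer: DataLayer, active_users: int) -> str:
    """Return the action taken so the data layer matches demand."""
    if active_users == 0 and layer.running:
        layer.running = False  # containers stop, volumes are untouched
        return "stopped"
    if active_users > 0 and not layer.running:
        layer.running = True  # restart on top of the same volumes
        return "restarted"
    return "unchanged"


layer = DataLayer()
actions = [reconcile(layer, users) for users in (3, 0, 0, 2)]
print(actions)  # ['unchanged', 'stopped', 'unchanged', 'restarted']
```

Because the data lives in volumes rather than in the containers, stopping the data layer frees its resources without losing anything.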


Examples of scheduling policies

FIFO (First In First Out)

Priority-based
• Researchers near deadlines have higher priority
• Fits the Swarm priority model nicely

Deadline
• Finish this work by 3 p.m.
• Streaming analysis latency must be less than 200 ms

Size-based
• Run the smallest applications first
• Need to know the runtime in advance
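One way to compare these policies is to express each as a sort key over the pending applications. The sketch below is an illustration only: the `App` fields, the sample values, and the choice of earliest-deadline-first for the deadline policy are assumptions, not Zoe's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    arrival: int        # submission order
    priority: int       # larger means more urgent (assumed convention)
    deadline: float     # hours from now
    est_runtime: float  # hours; size-based needs this in advance


# Each policy is a sort key; arrival order breaks ties.
POLICIES = {
    "fifo": lambda app: app.arrival,
    "priority": lambda app: (-app.priority, app.arrival),
    "deadline": lambda app: (app.deadline, app.arrival),
    "size": lambda app: (app.est_runtime, app.arrival),
}


def order(apps, policy):
    """Return application names in the order the policy would run them."""
    return [app.name for app in sorted(apps, key=POLICIES[policy])]


apps = [
    App("etl", arrival=0, priority=1, deadline=8.0, est_runtime=4.0),
    App("paper", arrival=1, priority=5, deadline=24.0, est_runtime=2.0),
    App("report", arrival=2, priority=1, deadline=3.0, est_runtime=0.5),
]
for policy in POLICIES:
    print(policy, order(apps, policy))
```

The size-based key depends on `est_runtime`, which is why that policy only works when the runtime is known before the application starts.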


Zoe implementation

Two client implementations
• Web interface
• Command line for scripting

Simple FIFO scheduler

Docker images for Spark, HDFS, iPython and Spark notebooks

Open source on GitHub, images available on the Docker Hub


Zoe - future

Set date: version 1.0 in March 2016

Big plans for Zoe
• One full-time programmer
• All the companies we spoke to are very interested

Features for 1.0 and after:
• Create Zoe applications with more and more services
• Automatic resizing of applications
• Use the new volume management
• Monitoring
• Advanced scheduling


Using Docker Swarm for data-intensive apps

L2 networking for Docker containers
Service discovery via DNS

[Diagram: two hosts, each with a Docker bridge, eth0 and eth1; containers c1, c2, c3, c4]

What about Swarm 1.0 multi-host networking?
- We need hostnames to be visible from outside
- Will run measurements on overlay network performance


Key takeaways

1. Zoe is a data-intensive application scheduler that targets data scientists and private clouds

2. It is very easy to build cloud applications on top of Swarm

3. Data-intensive frameworks like Spark can run easily and efficiently on top of Swarm

4. Networking between Docker containers on different hosts can be made transparent


Thank you!

Daniele Venzano
http://[email protected]