Leverage Mesos for running Spark Streaming production jobs by Iulian Dragos and Luc Bourlier



Agenda

● Intro to Mesos

● Spark deployment modes (coarse/fine-grained, cluster/client)

● Fault tolerance

○ driver failures

○ --supervise

○ WAL

○ Kafka direct

● Streaming backpressure

● profit!

Mesos


General (a “distributed kernel”)

Efficient resource management

Proven technology (in production at Apple and Twitter)

Typesafe & Mesosphere maintain the Spark/Mesos framework

Why Apache Mesos?


“Program against your datacenter as a single pool of resources”

Frameworks running on Mesos

HDFS

Marathon

Cassandra

ElasticSearch

Yarn (Myriad)

and of course, Spark

Resource Scheduling with Mesos


2-level scheduling

Mesos offers resources to frameworks

Frameworks accept or reject offers

Frameworks are free to schedule tasks on those resources as they see fit

Offers include CPU cores, memory, ports, disk
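The offer/accept loop can be sketched in a few lines of plain Python. This is an illustrative stand-in for the idea, not the Mesos scheduler API; the function and field names are made up for the example.

```python
# Sketch of Mesos two-level scheduling (illustrative, not the Mesos API):
# level 1: Mesos offers resources; level 2: the framework decides placement.

def framework_scheduler(offers, task_cpus, task_mem):
    """Accept any offer big enough for one task; decline the rest."""
    launched, declined = [], []
    for offer in offers:
        if offer["cpus"] >= task_cpus and offer["mem"] >= task_mem:
            # framework is free to place its task on the accepted resources
            launched.append({"offer_id": offer["id"],
                             "cpus": task_cpus, "mem": task_mem})
        else:
            declined.append(offer["id"])  # declined offers go back to Mesos
    return launched, declined

offers = [{"id": 1, "cpus": 4, "mem": 8192},   # big enough for a task
          {"id": 2, "cpus": 1, "mem": 512}]    # too small, declined
launched, declined = framework_scheduler(offers, task_cpus=2, task_mem=4096)
```

Declined offers return to the Mesos allocator, which can re-offer them to other frameworks; that is what makes the pool shareable.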

Spark Deployment modes


Deployment modes


Driver placement

Client mode:

on the machine starting the job

(developer machine)

quick and easy

if the machine stops, the job is lost

Cluster mode:

inside the cluster

managed by the framework

no need to keep the developer machine online

less convenient

Spark UI may be hard to find/unavailable

spark-shell and stdout/stderr are not available

Fine-grained:

Each Spark task is a Mesos task. Resources are acquired and freed as needed.

better resource sharing

slower startup time

good for batch and static streaming

Coarse-grained:

Mesos executors are acquired at the beginning of a job. Spark tasks are dispatched immediately.

fast startup for tasks

better for interactive sessions

resources are locked for the duration of the app


Executor policy
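In Spark 1.x the two policies are selected with a single property. A minimal sketch in spark-defaults.conf form, assuming the Spark 1.x property name:

```
# spark-defaults.conf
# fine-grained mode (Spark 1.x default): one Mesos task per Spark task
spark.mesos.coarse   false

# coarse-grained mode: executors hold their resources for the whole app
# spark.mesos.coarse true
```

The same property can also be set programmatically on the SparkConf before the context is created.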

Spark Streaming



Fault tolerance


Node dies


Write ahead log (WAL)


Driver failure


Driver restart


on restart, the driver can recover everything from the checkpoint dir

“can”: you need to code it that way

Standalone and Yarn have --supervise for this purpose

On Mesos, use Marathon

Driver restarts

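The "code it that way" part is the get-or-create pattern (in Spark it is StreamingContext.getOrCreate). A minimal generic sketch of the pattern in plain Python, using pickle as a stand-in for Spark's checkpoint format; names here are illustrative, not the Spark API:

```python
import os
import pickle
import tempfile

def get_or_create(checkpoint_dir, create_fn):
    """If a checkpoint exists, rebuild state from it and ignore create_fn;
    otherwise run create_fn and checkpoint the fresh state."""
    path = os.path.join(checkpoint_dir, "state.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f), True          # recovered from checkpoint
    state = create_fn()                          # setup code runs only once
    os.makedirs(checkpoint_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return state, False                          # freshly created

# demo: first call creates, second call (a "restarted driver") recovers
ckpt = tempfile.mkdtemp()
state1, rec1 = get_or_create(ckpt, lambda: {"kafka_offsets": 0})
state2, rec2 = get_or_create(ckpt, lambda: {"kafka_offsets": 99})
```

The key point the slide makes: recovery only works if all setup goes through the create function, so that a restarted driver rebuilds everything from the checkpoint instead of re-running setup.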

Recommended setup


use cluster-mode deployment

enable WAL or use a reliable source (Kafka)

driver restarts à la --supervise using Marathon

enable back-pressure (see next)
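Assuming Spark 1.5-era property names, the WAL and back-pressure parts of this setup map to configuration like:

```
# spark-defaults.conf (Spark 1.5-era property names)
spark.streaming.receiver.writeAheadLog.enable   true   # WAL (skip if using Kafka direct)
spark.streaming.backpressure.enabled            true   # dynamic rate limit
```

Cluster mode is chosen at submit time (spark-submit --deploy-mode cluster), and supervision comes from running the driver as a Marathon app rather than from a Spark property.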

Back pressure in Spark Streaming


Resilient system ≠ stable system

Data may arrive faster than Spark can process

⇒ scheduling delay

⇒ memory spill

⇒ driver stopped


Congestion support in Spark 1.4

Static rate limit

● conservative

● difficult to find the right limit (depends on cluster size)

● one limit to rule all streams


Back pressure

● (also known as flow control)

● a slow consumer should slow down the producer

○ classic example: TCP

● Reactive Streams
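A minimal illustration of the idea with a bounded buffer (plain Python, nothing Spark-specific): when the consumer lags, the buffer fills and the producer is forced to wait, which is exactly what TCP's flow-control window does to a fast sender.

```python
import queue

buf = queue.Queue(maxsize=2)      # bounded buffer between producer and consumer

buf.put("a")                      # producer runs ahead of the consumer...
buf.put("b")

producer_blocked = False
try:
    buf.put_nowait("c")           # buffer full: back pressure kicks in
except queue.Full:
    producer_blocked = True       # a blocking put() would pause the producer here

item = buf.get()                  # consumer makes progress, freeing capacity
buf.put_nowait("c")               # ...and the producer may proceed again
```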


Back pressure in Spark 1.5

● dynamic rate limit

● rate estimator

○ estimates the number of elements that can be safely processed by the system

during the batch interval

● rate sent to the receivers

● rate limiter

○ relies on TCP to slow down producers


Rate estimator

● each BatchCompleted event contains

○ processing delay, scheduling delay

○ number of elements in the mini-batch

● the rate is (roughly) elements/processingDelay

● but what about accumulated delay?


Rate estimator

Proportional-Integral-Derivative

● the P, I, D constants can be tuned to trade off convergence, overshooting, and oscillations


https://en.wikipedia.org/wiki/PID_controller
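The estimator can be sketched from the quantities named on the previous slides. This follows the spirit of Spark's PIDRateEstimator (P term from the current rate mismatch, I term from the accumulated scheduling delay); the constants are illustrative assumptions, not Spark's defaults, and the D term is omitted for brevity.

```python
def pid_rate(latest_rate, elements, proc_delay_ms, sched_delay_ms,
             batch_interval_ms, kp=1.0, ki=0.2, min_rate=100.0):
    """PID-style rate estimate (sketch; constants are illustrative)."""
    processing_rate = elements * 1000.0 / proc_delay_ms   # elements/second
    error = latest_rate - processing_rate                 # P: current mismatch
    # I: accumulated delay, expressed as the backlog implied by the
    # scheduling delay, spread over one batch interval
    hist_error = sched_delay_ms * processing_rate / batch_interval_ms
    new_rate = latest_rate - kp * error - ki * hist_error
    return max(new_rate, min_rate)                        # never go below floor

# overloaded: 500 elements took a full 1 s batch while we admitted 1000/s,
# so the estimate is pulled down toward the observed processing rate
slow = pid_rate(1000.0, elements=500, proc_delay_ms=1000,
                sched_delay_ms=0, batch_interval_ms=1000)

# underloaded: the system processed faster than the current rate,
# so the estimate is allowed to rise again
fast = pid_rate(500.0, elements=1000, proc_delay_ms=1000,
                sched_delay_ms=0, batch_interval_ms=1000)
```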


Back pressure in Spark 1.5

● each input has its own estimator

● works with all stream receivers,

including KafkaDirectInputStream

● configuration:

○ spark.streaming.backpressure.enabled true

○ spark.streaming.backpressure.minRate R


Limitations

● linearity assumption

● records with similar execution times

● TCP back pressure accumulates in the TCP channel


Back pressure with Reactive Streams


Reactive Streams

● standard for asynchronous stream processing

● consumer controls rate by asking for elements from producers

with Spark Streaming

● more direct back pressure control
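The demand-driven protocol can be sketched with a toy publisher (illustrative; the real Reactive Streams interfaces are Publisher/Subscriber/Subscription with onNext/request, not these names):

```python
class ToyPublisher:
    """Emits elements only when the subscriber has signalled demand."""

    def __init__(self, source):
        self.source = iter(source)
        self.demand = 0          # outstanding requests from the subscriber
        self.delivered = []

    def request(self, n):
        """Subscriber asks for up to n more elements (the back pressure signal)."""
        self.demand += n
        while self.demand > 0:
            try:
                item = next(self.source)
            except StopIteration:
                break
            self.delivered.append(item)  # stand-in for subscriber.on_next(item)
            self.demand -= 1

pub = ToyPublisher(range(100))
pub.request(3)    # the consumer controls the rate: only 3 elements flow
```

Compared with relying on TCP, the demand signal travels explicitly from consumer to producer, so nothing accumulates in an intermediate channel.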



Reactive Streams

● support expected with Spark 1.6

● SPARK-10420 (Spark Streaming Receiver)


Thank you

