
Scaling Spark Workloads on YARN - Boulder/Denver July 2015


Page 1: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

July 2015

Scaling Spark Workloads on YARN

Boulder/Denver Big Data
Shane Kumpf & Mac Moore, Solutions Engineers, Hortonworks

Page 2: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Agenda

• Introduction – Why we love Spark, Spark Strategy, What's Next
• YARN: The Data Operating System
• Spark: Processing Internals Review
• Spark on YARN
• Demo: Scaling Spark on YARN in the cloud
• Q & A

Page 3: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Why We Love Spark at Hortonworks

• Made for Data Science – all apps need to get predictive at scale and fine granularity
• Democratizes Machine Learning – Spark is doing for ML on Hadoop what Hive did for SQL on Hadoop
• Elegant Developer APIs – DataFrames, Machine Learning, and SQL
• Realizes the value of the Data Operating System – a key tool in the Hadoop toolbox
• Community – broad developer, customer, and partner interest

[Diagram: YARN: Data Operating System spanning Storage, Resource Management, Governance, Security, and Operations]

Page 4: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Data Operating System: Open Enterprise Hadoop

Hadoop/YARN powered data operating system: a 100% open source, multi-tenant data platform for any application, any dataset, anywhere. Built on a centralized architecture of shared enterprise services:
• Scalable tiered storage
• Resource and workload management
• Trusted data governance and metadata management
• Consistent operations
• Comprehensive security
• Developer APIs and tools

Page 5: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Themes for Spark Strategy

Spark is made for Data Science
• Lead in the community for ML optimization
• Data Science theme of Spark Summit / Hadoop Summit

Provide notebooks for data exploration & visualization
• iPython Ambari Stack
• Zeppelin – we're very excited about this project

Process more Hadoop data efficiently in Spark
• Hive/ORC data delivered, HBase work in progress

Innovate at the core
• Security, Spark on YARN improvements, and more

Page 6: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Current State of Security in Spark

• Only Spark on YARN supports Kerberos today – leverage Kerberos for authentication
• Spark reads data from HDFS & ORC – HDFS file permissions (& Ranger integration) apply to Spark jobs
• Spark submits jobs to a YARN queue – YARN queue ACLs (& Ranger integration) apply to Spark jobs
• Wire encryption – Spark has some coverage, but not all channels are covered
• LDAP authentication – no authentication in the Spark UI out of the box; supports a filter for hooking in LDAP

Page 7: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

What about ORC support?

ORC (Optimized Row Columnar) is an Apache TLP providing columnar storage for Hadoop.

Spark ORC Support
• ORC support in HDP/Spark since 1.2.x (Alpha)
• ORC support merged into Apache Spark in 1.4
• Joint blog with Databricks @ hortonworks.com
• Changes between ORC 1.3.1 and Spark 1.4.1
• ORC now uses the standard API to read/write

orc.apache.org  

Page 8: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


Introducing Apache Zeppelin…

Page 9: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Apache Zeppelin

Features
• A web-based notebook for interactive analytics
• Ad-hoc experimentation with Spark, Hive, Shell, Flink, Tajo, Ignite, Lens, etc.
• Deeply integrated with Spark and Hadoop
• Can be managed via Ambari Stacks
• Supports multiple language backends
• Pluggable "Interpreters"
• Incubating at Apache
• 100% open source and open community

Use Cases
• Data exploration & discovery
• Visualization – tables, graphs, charts
• Interactive snippet-at-a-time experience
• Collaboration and publishing
• "Modern Data Science Studio"

Page 10: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Where can I find more?

• Arun Murthy's Keynote at Hadoop Summit & Spark Summit
  – Hadoop Summit (http://bit.ly/1IC1BEG)
  – Spark Summit (http://bit.ly/1M7qw47)
• Data Science with Spark & Zeppelin session at Hadoop Summit – http://bit.ly/1DdKeTs
• Data Science with Spark + Zeppelin blog – http://bit.ly/1HFd545
• ORC Support in Spark blog – http://bit.ly/1OkA1uU

Page 11: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


YARN: The Data Operating System


Page 12: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


YARN Introduction

 

The Architectural Center
• YARN moved Hadoop "beyond batch": run batch, interactive, and real-time applications simultaneously on shared hardware.
• Intelligently places workloads on cluster members based on resource requirements, labels, and data locality.
• Runs user code in containers, providing isolation and lifecycle management.

[Diagram: Hortonworks Data Platform 2.2 – YARN: Data Operating System (cluster resource management) over HDFS (Hadoop Distributed File System); batch, interactive & real-time data access via Apache Pig, Hive, Cascading, HBase, Accumulo, Solr, Spark, Storm, Sqoop, Flume, and Kafka; governance with Apache Falcon; security with Apache Ranger and Knox; operations with Apache Ambari, ZooKeeper, and Oozie]

Page 13: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


YARN Architecture - Overview

 

Resource Manager
• Global resource scheduler

Node Manager
• Per-machine agent
• Manages the life-cycle of containers & resource monitoring

Container
• Basic unit of allocation
• Fine-grained resource allocation across multiple resource types (memory, CPU; future: disk, network, GPU, etc.)

Application Master
• Per-application master that manages application scheduling and task execution
• E.g. the MapReduce Application Master

   

Page 14: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


YARN Concepts

• Application – a job or a long-running service submitted to YARN
  – Job: MapReduce job
  – Service: HBase cluster

• Container – basic unit of allocation
  – MapReduce map or reduce task
  – HBase HMaster or RegionServer
  – Fine-grained resource allocations, e.g. container_0 = 2 GB, 1 CPU; container_1 = 1 GB, 6 CPU
  – Replaces the fixed map/reduce slots from Hadoop 1

Page 15: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


YARN Resource Request

Resource Model
• Ask for a specific amount of resources (memory, CPU, etc.) on a specific machine or rack.
• Capabilities define how much memory and CPU is requested.
• Relax Locality = false forces containers onto subsets of machines, aka YARN node labels.

A ResourceRequest carries: priority, resourceName, capability, numContainers, and relaxLocality (see the sketch below).
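The fields above map directly onto the YARN client API. As a rough sketch (not from the deck – the sizes and counts are arbitrary examples):

import org.apache.hadoop.yarn.api.records.{Priority, Resource, ResourceRequest}

// Illustrative values: ask for 4 containers of 2 GB / 1 vcore anywhere on the cluster
val capability = Resource.newInstance(2048, 1)   // capability: memory (MB) + vcores
val request = ResourceRequest.newInstance(
  Priority.newInstance(0),                       // priority
  ResourceRequest.ANY,                           // resourceName: "*" = any host or rack
  capability,
  4,                                             // numContainers
  true)                                          // relaxLocality: false would pin to the named host/rack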

Page 16: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

YARN Capacity Scheduler

Capacity Sharing (function)
• Elasticity
• Queues to subdivide resources
• Job submission Access Control Lists

Capacity Enforcement (function)
• Max capacity per queue
• User limits within queue
• Preemption

Administration (function)
• Ambari Capacity Scheduler View

Page 17: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


Hierarchical Queues

[Diagram: hierarchical queues – the root queue divided into Adhoc 10%, DW 70%, and Marketing 20%; sub-queues such as Dev, Prod, Reserved, P0 70%, and P1 30% illustrate parent and leaf queues]

Page 18: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

YARN capacity scheduler helps manage resources across the cluster

Page 19: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

YARN Application Submission – Walkthrough

[Diagram: clients submit to the ResourceManager (Scheduler), which starts an Application Master (AM 1, AM 2) on a NodeManager; each AM then receives containers (e.g. Container 1.1–1.3, Container 2.1–2.4) spread across the NodeManagers in the cluster]

Page 20: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Spark: Processing Internals Review

Page 21: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

First, a bit of review – What is Spark?

• Distributed runtime engine for fast, large-scale data processing.
• Designed for iterative computations and interactive data mining.
• Provides an API framework to support in-memory cluster computing.
• Multi-language support – Scala, Java, Python

Page 22: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


So what makes Spark fast? Data access methods are not equal!

Page 23: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


MapReduce vs Spark

• MapReduce – On disk

• Spark – In memory

Page 24: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

RDD – The main programming abstraction

Resilient Distributed Datasets
• Collections of objects spread across a cluster, cached or stored in RAM or on disk
• Built through parallel transformations
• Automatically rebuilt on failure
• Immutable – each transformation creates a new RDD

Operations
• Lazy transformations (e.g. map, filter, groupBy)
• Actions (e.g. count, collect, save)

Page 25: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

RDD In Action

[Diagram: a chain of RDDs linked by transformations, ending in an action that returns a value]

textFile = sc.textFile("SomeFile.txt")
linesWithSpark = textFile.filter(lambda line: "Spark" in line)

linesWithSpark.count()   # returns 74
linesWithSpark.first()   # returns '# Apache Spark'

Page 26: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

RDD Graph

textFile                              // RDD[String]
  .flatMap(line => line.split(" "))   // RDD[String] – one element per word
  .map(word => (word, 1))             // RDD[(String, Int)]
  .reduceByKey(_ + _, 3)              // RDD[(String, Int)]
  .collect()                          // Array[(String, Int)]

Page 27: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

DAG Scheduler

[Diagram: the textFile → flatMap → map → reduceByKey → collect graph split into Stage 1 (textFile and the two maps) and Stage 2 (reduceByKey, collect)]

Goals
• Split the graph into stages based on the types of transformations.
• Pipeline narrow transformations (transformations without data movement) into a single stage.

Page 28: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

DAG Scheduler – Double Click

Stage 1
1. Read HDFS split
2. Apply both maps
3. Write shuffle data

Stage 2
1. Read shuffle data
2. Final reduce
3. Send result to driver

Page 29: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Tasks – How work gets done

[Diagram: fetch input → execute task → write output]

The fundamental unit of work in Spark:
1. Fetch input based on the InputFormat or a shuffle.
2. Execute the task.
3. Materialize task output via a shuffle, a write, or a result sent to the driver.

Page 30: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Input Formats control task input

• Hadoop InputFormats control how data on HDFS is read into each task.
  – Controls splits – how data is split up; each task (by default) gets one split, which is typically a single HDFS block.
  – Controls the concept of a record – is a record a whole line? A single word? An XML element?
• Spark can use both the old and new API InputFormats for creating RDDs.
  – newAPIHadoopRDD and hadoopRDD
  – Save time: use Hadoop InputFormats rather than writing a custom RDD (see the sketch below).
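As a hedged illustration (not from the deck), creating an RDD from a new-API Hadoop InputFormat in the Spark shell might look like this; the HDFS path is a made-up example.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Read a text file through a new-API Hadoop InputFormat; each record is (byte offset, line)
val rdd = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///tmp/myfile.txt")
val lines = rdd.map { case (_, text) => text.toString }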

Page 31: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Executor – The Spark Worker

Isolation for tasks
1. Each application gets its own executors.
2. Executors run tasks in threads and cache data.
3. Executors run in separate processes for isolation.
4. An executor lives for the duration of the application.

Page 32: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Executor – The Spark Worker

[Diagram: a single executor process with multiple cores (Core 1–3), each running tasks that fetch input, execute, and write output, with further tasks queued within the executor]

Page 33: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

The gang's all here

[Diagram: the Application Master hosts the Spark Driver, which coordinates an executor on each worker node; every executor runs tasks over RDD partitions and holds a cache]

Page 34: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Spark: on YARN

Page 35: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Spark on YARN

Modus Operandi
• 1 executor = 1 YARN container
• 2 modes: yarn-client or yarn-cluster
• yarn-client = driver on the client side – good for the REPL
• yarn-cluster = driver inside the YARN Application Master – good for batch and automated jobs (see the example below)

[Diagram: the YARN ResourceManager launches the App Master; a monitoring UI tracks the application]
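As an illustration not taken from the deck, launching in each mode might look like the following; the application class and jar path are hypothetical placeholders.

# yarn-client: driver runs on the client machine – handy for the spark-shell REPL
spark-shell --master yarn-client

# yarn-cluster: driver runs inside the YARN Application Master – suited to batch and automated jobs
spark-submit --master yarn-cluster --class com.example.MyApp /path/to/myapp.jar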

Page 36: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Why Spark on YARN

Core Features
• Run other workloads along with Spark
• Leverage Spark Dynamic Resource Allocation
• Currently the only way to run in a Kerberized environment
• Ability to provide capacity guarantees via the Capacity Scheduler

[Diagram: Hortonworks Data Platform 2.2 architecture, as shown earlier – YARN over HDFS with the data access engines, governance, security, and operations components]

Page 37: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Executor Allocations on YARN

Static Allocation
• A static number of executors is started on the cluster.
• Executors live for the duration of the application, even when idle.

Dynamic Allocation
• A minimal number of executors is started initially.
• Executors are added exponentially based on pending tasks.
• After an idle period, executors are stopped and resources are returned to the resource pool.

Page 38: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Static Allocation Details

Static Allocation
• Traditional means of starting executors on nodes.

spark-shell --master yarn-client \
  --driver-memory 3686m \
  --executor-memory 17g \
  --executor-cores 7 \
  --num-executors 7

• A static number of executors is specified by the submitter.
• Size and count of executors is key for good performance.

Page 39: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Dynamic Allocation Details

Dynamic Allocation
• Scale executor count based on pending tasks.

spark-shell --master yarn-client \
  --driver-memory 3686m \
  --executor-memory 3686m \
  --executor-cores 1 \
  --conf "spark.dynamicAllocation.enabled=true" \
  --conf "spark.dynamicAllocation.minExecutors=1" \
  --conf "spark.dynamicAllocation.maxExecutors=100" \
  --conf "spark.shuffle.service.enabled=true"

• Minimum and maximum number of executors are specified.
• Exclusive to running Spark on YARN.

Page 40: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Enabling Dynamic Allocation

Dynamic allocation requires the spark_shuffle YARN aux service and is not enabled out of the box.

--conf "spark.dynamicAllocation.enabled=true" \
--conf "spark.shuffle.service.enabled=true"

1. Copy the spark-shuffle jar onto the NodeManager classpath.
2. Configure the YARN aux service for spark_shuffle:
   Add spark_shuffle to yarn.nodemanager.aux-services
   Set yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
3. Restart the NodeManagers to pick up the spark-shuffle jar.
4. Run the Spark job with the dynamic allocation configs.

Page 41: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Dynamic Allocation Configuration Options

spark.dynamicAllocation.minExecutors
Minimum number of executors; also the initial number spawned at job submission (the initial count can be overridden with initialExecutors).
--conf "spark.dynamicAllocation.minExecutors=1"

spark.dynamicAllocation.maxExecutors
Maximum number of executors; executors are added based on pending tasks up to this maximum.
--conf "spark.dynamicAllocation.maxExecutors=100"
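These settings can also be applied programmatically rather than via --conf flags; a minimal sketch (the values are illustrative, not from the deck):

import org.apache.spark.{SparkConf, SparkContext}

// Enable dynamic allocation with illustrative bounds; the external shuffle
// service must already be configured on the NodeManagers (previous slide).
val conf = new SparkConf()
  .setAppName("dynamic-allocation-example")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "100")
val sc = new SparkContext(conf)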

Page 42: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Dynamic Allocation Configuration Options

spark.dynamicAllocation.schedulerBacklogTimeout
Initial delay to wait before allocating additional executors. Default: 5 seconds.
--conf "spark.dynamicAllocation.schedulerBacklogTimeout=10"

spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
After the initial round of executors is scheduled, how long until the next round of scheduling? Default: 5 seconds.
--conf "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=10"

[Chart: executors started over time]

Page 43: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Dynamic Allocation – Good citizenship in a shared environment

spark.dynamicAllocation.executorIdleTimeout
Amount of idle time in seconds before an executor container is killed and its resources returned to YARN. Default: 10 minutes.
--conf "spark.dynamicAllocation.executorIdleTimeout=60"

spark.dynamicAllocation.cachedExecutorIdleTimeout
Because caching RDDs is key to performance, this setting keeps executors holding cached data around longer.
--conf "spark.dynamicAllocation.cachedExecutorIdleTimeout=1800"

Page 44: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Sizing your Spark job

Difficult Landscape
• Conflicting recommendations are often found online.
• Requires knowledge of the data set, task distribution, cluster topology, RDD cache churn, hardware profile…

1 executor per core? 1 executor per node? 3–5 executors if I/O bound? yarn.nodemanager.resource.memory-mb? 18 GB max heap?

It depends.

Page 45: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Common suggestions to improve performance

Do these things:
1. Cache RDDs in memory* … or scale elastically
2. Don't spill to disk if possible
3. Use a better serializer
4. Consider compression
5. Limit GC activity
6. Get parallelism right*

* New considerations with Spark on YARN (see the sketch below)
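For items 3 and 4, a hedged example of what those settings might look like; the property names are standard Spark configs, and the choice of values is illustrative rather than a recommendation from the deck.

import org.apache.spark.SparkConf

// Kryo is generally faster and more compact than Java serialization;
// compressing serialized RDD partitions trades CPU for memory.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.rdd.compress", "true")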

Page 46: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Sizing Spark Executors on YARN

Relationship
1. Setting the executor memory size sets the JVM heap, NOT the container.
2. Executor memory + the greater of (10% or 384 MB) = container size (see the worked example below).
3. To avoid wasted resources, ensure executor memory + memoryOverhead < yarn.scheduler.minimum-allocation-mb.
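As a worked example (the numbers are assumed, not from the deck): with --executor-memory 17g, the overhead is max(0.10 × 17408 MB, 384 MB) ≈ 1741 MB, so YARN must find a container of roughly 19149 MB, which the scheduler then rounds up to the next multiple of yarn.scheduler.minimum-allocation-mb.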

Page 47: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Sizing Spark Executors on YARN

Relevant YARN Container Settings
• yarn.nodemanager.resource.cpu-vcores
  – Number of vcores available for YARN containers per NodeManager
• yarn.nodemanager.resource.memory-mb
  – Total memory available for YARN containers per NodeManager
• yarn.scheduler.minimum-allocation-mb
  – Minimum resource request allowed per allocation, in megabytes
  – Smallest container available for an executor
• yarn.scheduler.maximum-allocation-mb
  – Maximum resource request allowed per allocation, in megabytes
  – Largest container available for an executor
  – Typically equal to yarn.nodemanager.resource.memory-mb

Page 48: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Tuning Advice

How do we get it right?
• Test, gather, and test some more
• Define an SLA!
• Tune the job, not the cluster
• Tune the job to meet the SLA!
• Don't tune prematurely – it's the root of all evil

Starting Points
• Keep your heap reasonable, but large enough to handle your dataset.
  – Recall that we only get about 60% of the heap for RDD caching.
  – Measure GC and ensure the percentage of time spent there is low.
• For jobs that depend heavily on cached RDDs, limit executors per machine to one where possible.
  – Per the first point, if RDD cache churn or GC are a problem, make smaller executors and run multiple per machine.
• On high-memory hardware, run multiple executors per machine.
  – Keep the heap reasonable.
• For CPU-bound tasks with limited data needs, more executors can be better.
  – Run with 2–4 GB executors with a single vcore and measure performance.
• Tune task parallelism (see the sketch below).
  – As a rule of thumb, increase the task count by 1.5x each round of testing and measure the results.
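A hedged illustration of the last point; the data and partition counts are arbitrary examples, not recommendations from the deck.

// In the Spark shell (sc is already defined)
val words = sc.parallelize(Seq("spark", "yarn", "spark"))

// Pass an explicit partition count to a shuffle operation, then raise it ~1.5x per test round
val counts32 = words.map(w => (w, 1)).reduceByKey(_ + _, 32)   // first round: 32 reduce tasks
val counts48 = words.map(w => (w, 1)).reduceByKey(_ + _, 48)   // next round: ~1.5x more tasks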

Page 49: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Avoid spilling or caching to disk

Caching strategies
• Use the default .cache() or .persist(), which stores data as deserialized Java objects (MEMORY_ONLY).
  – Trade-off: lower CPU usage versus the size of the data in memory.
• Don't use disk persistence.
  – It's typically faster to recompute the partition, and there is a good chance many of the blocks are still in the operating system page cache.
• If the default strategy results in the data not fitting in memory, use MEMORY_ONLY_SER, which stores the data as serialized objects.
  – Trade-off: higher CPU usage, but the data set is typically around 50% smaller in memory.
  – Can significantly impact job run time for larger data sets; use with caution.

import org.apache.spark.storage.StorageLevel._
theRdd.persist(MEMORY_ONLY_SER)

Page 50: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Data Access with Spark on YARN

Gotchas
• Don't cache base RDDs – poor distribution.
  – Do cache intermediate data sets – good distribution across dynamically allocated executors.
• Ensure executors remain running until you are done with the cached data.
  – Cached data goes away when the executors do, and is costly to recompute.
• Data locality is getting better, but isn't great.
  – SPARK-1767 introduced locality waits for cached data.
• computePreferredLocations is pretty broken.
  – Only use it if necessary; it gets overwritten in some scenarios, and better approaches are in the works.

val locData = InputFormatInfo.computePreferredLocations(
  Seq(new InputFormatInfo(conf, classOf[TextInputFormat], new Path("myfile.txt"))))
val sc = new SparkContext(conf, locData)

Page 51: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Future Improvements for Spark on YARN

RDD Sharing
– Short term: keep executors with RDD cache around longer
– HDFS memory tier for RDD caching
– Experimental off-heap caching in Tachyon (lower overhead than persist())
– Cache rebalancing

Data Locality for Dynamic Allocation
– No more preferredLocations; discover locality from RDD lineage.

Container/Executor Sizing
– Make it easier… automatically determine the appropriate size.
– Long term: specify task size only, and memory, cores, and overhead are determined automatically.

Secure All The Things!
– SASL for shuffle data
– SSL for the HTTP endpoints
– Encrypted shuffle – SPARK-5682

Page 52: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

DEMO: Scaling Spark workloads on YARN

Page 53: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Scaling compute independent of storage

[Diagram: an HDP 2.3 Hadoop cluster with management/master nodes (Ambari, masters), an edge node (clients), storage nodes (NodeManager + HDFS), and compute-only nodes (NodeManager only)]

Overview
1. A pattern that is gaining popularity in the cloud.
2. Save costs and leverage the elasticity of the cloud.
3. Scale NodeManagers (compute only) independently of traditional NodeManager/DataNode (compute + storage) workers.

Page 54: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

How it works

Overview
1. Leverage Spark Dynamic Allocation on YARN to scale the number of executors based on pending work.
2. If additional capacity is still needed, provision additional compute nodes, add them to the cluster, and continue to scale executors onto the new nodes.

[Diagram: the same HDP 2.3 cluster as before, now with additional compute-only NodeManager nodes added alongside the storage nodes, edge node, and management/master nodes]

Page 55: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Process Overview

[Diagram: Cloudbreak orchestrates the HDP/Spark cluster via Ambari's REST API and metrics; the Spark client submits work to compute nodes running executors in containers, and more compute nodes are added as needed]

1. Deploy cluster
2. Set alerts
3. Submit job
4. Executors increase
5. Capacity reached, alerts trigger
6. Scaling policy adds compute nodes

Page 56: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


DEMO – Leveraging Dynamic Allocation

Page 57: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Scenarios

Promising Use Cases
1. CPU-bound workloads
2. Bursty usage
3. Zeppelin / ad-hoc data exploration
4. Multi-tenant, multi-use, centralized cluster
5. Dev/QA clusters

Page 58: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Cloudbreak

"Cloud agnostic Hadoop As-A-Service API"

• Developed by SequenceIQ
• Open source, with options to extend with a custom UI
• Launches Ambari and deploys the selected distribution via Blueprints in Docker containers
• Customer registers, delegates access to cloud credentials, and runs Hadoop on their own cloud account (Azure, AWS, etc.)
• Elastic – spin up any number of nodes, scale up/down on the fly

Page 59: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Launch HDP on Any Cloud for Any Application

Cloudbreak:
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!

Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)

Page 60: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Step 1: Sign up for a free Cloudbreak account

URL to sign up for a free account: https://accounts.sequenceiq.com/

General Cloudbreak documentation: http://sequenceiq.com/cloudbreak/#cloudbreak

Page 61: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Step 2: Create or add credentials

• Varies by cloud, but typically only a couple of steps.

Page 62: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Step 3: Note the blueprint for your use case

• An Ambari blueprint describes the components of the HDP stack to include in the cloud deployment
• Cloudbreak comes with some default blueprints, such as a Spark cluster or a streaming architecture
• Pick the appropriate blueprint, or create your own!

Page 63: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Step 4: Create Cluster

• Ensure your credential is selected by clicking on "select a credential"
• Click Create cluster, give it a name, choose a region, choose a network
• Choose the desired blueprint
• Set the instance type and number of nodes
• Click create and start cluster

Page 64: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Step 5: Wait for cluster install to complete

• Depending on the instance types and blueprint chosen, cluster install should complete in 10–35 minutes
• Once the cluster install is complete, click on the Ambari server address link (highlighted on the screenshot) and log in to Ambari with admin/admin
• Your HDP cluster is ready to use

Page 65: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Periscope: Auto up and down scaling

• Define alerts for the number of pending YARN containers.

Page 66: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Periscope: Auto up and down scaling

• Define scaling policies for how Periscope should react to the defined alerts.

Page 67: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Periscope: Auto up and down scaling

• Define the min/max cluster size and "cooldown" period (how long to wait between scaling events).
• The number of compute nodes will automatically scale when out of capacity for containers.

Page 68: Scaling Spark Workloads on YARN - Boulder/Denver July 2015

Benefits

Why do I care?
• Less contention between jobs
  – Less waiting for your neighbor's job to finish; elastic scale gives us all compute time.
• Improved job run times
  – Testing has shown a 30%+ decrease in job run times for moderate-duration, CPU-bound jobs.
• Decreased costs over persistent IaaS clusters
  – Spin down resources not in use.
  – If time = money, improved job run times will decrease costs.
• Capacity planning hack!
  – Scaling up a lot? You should probably add more capacity…
  – Never scaling up? You probably overbuilt…

Page 69: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


DEMO – Auto Scaling IaaS

Page 70: Scaling Spark Workloads on YARN - Boulder/Denver July 2015


Q & A