Maintaining Low Latency While Maximizing Throughput on a Single Cluster

Yuliya Feldman
February 19, 2015

© 2015 MapR Technologies




Top-Ranked NoSQL

Top-Ranked Hadoop Distribution

Top-Ranked SQL-on-Hadoop Solution


What We Have – Cluster per Use Case

[Diagram: separate silos for a YARN cluster, a web-server farm, and a second YARN cluster]

Too much isolation and poor resource utilization


Need a Datacenter-wide Resource Manager

What choices do we have?

•  YARN (capacity/fair scheduler)

•  Omega

•  Mesos

•  Others (e.g. Quasar)


YARN

•  Inspired by Mesos, but a Hadoop-specific resource manager

•  Manages Hadoop resources well – “retail”

•  Pluggable schedulers for Hadoop

•  Has started handling long-lived tasks

•  Can preempt tasks

•  YARN-1051 (YARN Admission Control/Planner): enhances the resource allocation model with time


Mesos

•  Datacenter-wide resource manager: a negotiator between frameworks

•  Manages resources across all frameworks rather than within one particular framework (e.g., Hadoop): “wholesale”

•  Does two-level scheduling

•  Excellent Docker support

•  Schedules, allocates, and isolates CPU, memory, disk, network, and arbitrary custom resource types
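Two-level scheduling splits the decision in two: the master decides which framework gets offered which resources, and each framework's own scheduler decides whether to accept an offer and what to run on it. The sketch below models that split in plain Python. The `Offer` and `Framework` classes, the fixed offer order, and the accept rule are hypothetical simplifications for illustration, not the Mesos API (real Mesos orders offers with a fairness policy such as DRF).

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources on one node, as offered by the (hypothetical) master."""
    node: str
    cpus: float
    mem_gb: float


@dataclass
class Framework:
    """Second level: a framework's own scheduler accepts or declines offers."""
    name: str
    task_cpus: float
    task_mem_gb: float
    launched: list = field(default_factory=list)

    def on_offer(self, offer: Offer) -> bool:
        # Accept only if the offer can hold one task of this framework's size.
        if offer.cpus >= self.task_cpus and offer.mem_gb >= self.task_mem_gb:
            self.launched.append(offer.node)
            return True
        return False  # declined: the master is free to offer it elsewhere


def master_allocate(offers, frameworks):
    """First level: offer each node's resources to frameworks in a fixed
    order until one accepts; return {node: framework name}."""
    placement = {}
    for offer in offers:
        for fw in frameworks:
            if fw.on_offer(offer):
                placement[offer.node] = fw.name
                break
    return placement


if __name__ == "__main__":
    yarn = Framework("yarn", task_cpus=4, task_mem_gb=4)
    web = Framework("web", task_cpus=1, task_mem_gb=1)
    offers = [Offer("node1", 8, 8), Offer("node2", 1, 1)]
    print(master_allocate(offers, [yarn, web]))  # {'node1': 'yarn', 'node2': 'web'}
```

The master never needs to understand what a YARN task or a web server is; it only hands out offers, which is what makes Mesos "wholesale" and leaves the "retail" decisions to each framework.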


Can we…

–  Continue leveraging YARN resource scheduling capabilities for YARN-based applications?

–  Treat YARN as “yet another” framework within Mesos?

–  Spare YARN from having to deal with coexisting non-YARN applications?


Introducing Myriad


Apache Myriad: True Multi-tenancy

•  Open-source project launched in October 2014

–  MapR, eBay, Mesosphere, and others participating

•  Allows Mesos and YARN to cooperate with each other

•  Mesos: datacenter-wide resource manager

–  Docker containers and/or cgroups used for isolation

•  Hadoop is launched inside cgroup containers

•  Myriad manages the conversation between the ResourceManager (RM) and the Mesos master, and between the NodeManagers (NM) and the Mesos slaves


Why Myriad

•  Run many types of compute frameworks side by side

–  Hadoop family (YARN, Spark, Kafka, Storm, etc.)

–  Web-server farm

–  MPP databases (e.g., Vertica)

–  Other services: SOA web services, Jenkins/build farm, cron jobs, shell scripts, Kubernetes, Cassandra, Elasticsearch, etc.

–  Each compute framework is a cluster in itself

•  Need to break up a physical cluster into many virtual clusters

–  Using Docker (containers) for good isolation

–  But most schedulers can only manage individual nodes inside a cluster

•  Move resources between virtual clusters on demand


Utilize Excess Capacity for Analytics

[Chart: utilization of the DC server farm and of Hadoop analytics]

Long-lived excess capacity situations

•  “Scale up” Hadoop during long periods of low utilization

•  “Scale down” Hadoop ahead of anticipated high utilization
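The two bullets amount to a simple flexing policy: lend capacity to Hadoop only after the server farm has stayed quiet for a while, and take it back before a predicted peak rather than during it. A minimal sketch, assuming utilization samples in [0, 1] and a forecast for the next period; the function name `flex_decision`, the thresholds, and the window length are all hypothetical.

```python
def flex_decision(recent_utilization, forecast_peak,
                  low=0.3, high=0.7, min_window=6):
    """Return "scale_up", "scale_down", or "hold" for the Hadoop cluster.

    recent_utilization: server-farm utilization samples in [0, 1], oldest first.
    forecast_peak: anticipated server-farm utilization for the next period.
    Thresholds and window length are illustrative assumptions.
    """
    if forecast_peak >= high:
        # Shrink Hadoop *ahead* of anticipated high utilization.
        return "scale_down"
    window = recent_utilization[-min_window:]
    if len(window) == min_window and max(window) <= low:
        # Utilization has stayed low for the whole window: lend capacity to Hadoop.
        return "scale_up"
    return "hold"


if __name__ == "__main__":
    print(flex_decision([0.2] * 6, forecast_peak=0.25))  # scale_up
    print(flex_decision([0.2] * 6, forecast_peak=0.85))  # scale_down
    print(flex_decision([0.2, 0.5, 0.2, 0.2, 0.2, 0.2], forecast_peak=0.25))  # hold
```

Checking the forecast first gives the latency-sensitive server farm priority; requiring a full window of low samples keeps Hadoop from being grown on a brief dip.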


Myriad Again

•  Mesos creates virtual clusters

•  YARN uses resources provided by Mesos

•  Myriad can ask YARN to release some resources

•  Or give it more resources

[Diagram: Mesos underneath two YARN clusters and a web-server farm]


Myriad Services Architecture

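[Diagram: the YARN ResourceManager, with its fair-share YARN scheduler, hosts Myriad's Mesos scheduler, which receives offers from the Mesos master and tracks a Map<Node, Capacity>. Myriad launches tasks through Mesos; on each node an executor runs the NodeManager, which launches YARN containers as instructed via heartbeats (HB) and reports task status. Applications are submitted to the ResourceManager.]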
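The bookkeeping in this architecture can be sketched as: receive offers, launch a NodeManager where an offer is big enough, and record what that node now contributes to YARN in a `Map<Node, Capacity>`. The Python below is a minimal model under stated assumptions: the class name `MyriadSchedulerSketch`, the single fixed NodeManager profile, and the one-NodeManager-per-node rule are hypothetical, not Myriad's actual code, and any NodeManager overhead is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    cpus: float
    mem_gb: float

    def fits_in(self, other: Capacity) -> bool:
        return self.cpus <= other.cpus and self.mem_gb <= other.mem_gb

    def __add__(self, other: Capacity) -> Capacity:
        return Capacity(self.cpus + other.cpus, self.mem_gb + other.mem_gb)


class MyriadSchedulerSketch:
    """Hypothetical model of: Mesos offer -> launch NodeManager -> track capacity."""

    def __init__(self, nm_profile: Capacity):
        self.nm_profile = nm_profile  # size of each NodeManager task (assumed fixed)
        self.node_capacity: dict[str, Capacity] = {}  # the slide's Map<Node, Capacity>

    def resource_offers(self, offers: dict[str, Capacity]) -> list[str]:
        """Handle one batch of offers; return nodes where a NodeManager is launched."""
        launched = []
        for node, offered in offers.items():
            if node in self.node_capacity:
                continue  # assumption: at most one NodeManager per node
            if self.nm_profile.fits_in(offered):
                self.node_capacity[node] = self.nm_profile
                launched.append(node)
            # Offers that are not used would be declined back to Mesos.
        return launched

    def yarn_capacity(self) -> Capacity:
        """Total capacity across launched NodeManagers (overhead ignored)."""
        total = Capacity(0, 0)
        for cap in self.node_capacity.values():
            total = total + cap
        return total


if __name__ == "__main__":
    sched = MyriadSchedulerSketch(Capacity(4, 4))
    print(sched.resource_offers({"node1": Capacity(8, 8), "node2": Capacity(2, 4)}))  # ['node1']
    print(sched.yarn_capacity())  # Capacity(cpus=4, mem_gb=4)
```

Keeping the map on the scheduler side is what lets YARN's own fair-share scheduler keep working unchanged: it only ever sees NodeManagers and their advertised capacity.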


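How it works

[Diagram: the YARN ResourceManager hosts Myriad (REST API plus Mesos framework scheduler), which talks to the Mesos master. On a node, the Mesos slave launches a YARN NodeManager with 2.5 CPU, 2.5 GB; the NodeManager then advertises 2 CPU, 2 GB of resources to the ResourceManager.]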
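The NodeManager is launched with more resources (2.5 CPU, 2.5 GB) than it advertises to YARN (2 CPU, 2 GB). One plausible reading, assumed here rather than stated on the slide, is that the 0.5 CPU, 0.5 GB difference is held back for the NodeManager process itself. The helper `advertised_capacity` and its default overhead values are hypothetical and only make that arithmetic explicit.

```python
def advertised_capacity(launched_cpus, launched_mem_gb,
                        overhead_cpus=0.5, overhead_mem_gb=0.5):
    """Capacity a NodeManager would advertise to YARN, assuming a fixed
    overhead is held back for the NodeManager process (hypothetical default)."""
    cpus = launched_cpus - overhead_cpus
    mem_gb = launched_mem_gb - overhead_mem_gb
    if cpus < 0 or mem_gb < 0:
        raise ValueError("overhead exceeds the resources the task was launched with")
    return cpus, mem_gb


if __name__ == "__main__":
    # Numbers from the slide: launched with 2.5 CPU, 2.5 GB.
    print(advertised_capacity(2.5, 2.5))  # (2.0, 2.0)
```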


[Diagram, continued: the same setup; YARN containers C1 and C2 are launched on the NodeManager.]


Use Case – Web Traffic Spikes

[Diagram: Node1 and Node2 each run a Mesos slave and a YARN NodeManager sized 8 CPU, 8 GB, under the Mesos master and the ResourceManager (Myriad REST API and Mesos scheduler). On a web traffic spike, the NodeManagers are resized to 6 CPU, 6 GB each, and the freed 2 CPU, 2 GB on each node runs a WebService.]


[Diagram: the same two nodes once the web traffic spike is over; the NodeManagers are resized again so that the 2 CPU, 2 GB used by each WebService can return to YARN (labels show NodeManager sizes of 6 CPU, 6 GB and 8 CPU, 8 GB).]
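The arithmetic of this use case: each node's NodeManager starts at 8 CPU, 8 GB; during the spike each one gives up 2 CPU, 2 GB to a WebService (leaving 6 CPU, 6 GB), and when the spike is over the same amount goes back. The sketch below assumes every NodeManager is resized by the same delta; the function name `resize_nodemanagers` is hypothetical and is not Myriad's REST API.

```python
def resize_nodemanagers(nm_sizes, delta_cpus, delta_mem_gb):
    """Return new NodeManager sizes, each changed by the same delta.

    Negative deltas shrink YARN (releasing resources for other frameworks);
    positive deltas grow it back. Sizes are (cpus, mem_gb) tuples per node.
    The input mapping is not modified.
    """
    resized = {}
    for node, (cpus, mem_gb) in nm_sizes.items():
        new = (cpus + delta_cpus, mem_gb + delta_mem_gb)
        if min(new) < 0:
            raise ValueError(f"{node}: cannot shrink below zero")
        resized[node] = new
    return resized


if __name__ == "__main__":
    normal = {"node1": (8, 8), "node2": (8, 8)}
    # Web traffic spike: give 2 CPU, 2 GB per node to the WebService.
    spike = resize_nodemanagers(normal, -2, -2)
    print(spike)  # {'node1': (6, 6), 'node2': (6, 6)}
    # Spike over: return the capacity to YARN.
    print(resize_nodemanagers(spike, 2, 2))  # {'node1': (8, 8), 'node2': (8, 8)}
```

Resizing NodeManagers in place, instead of killing and relaunching them, lets running YARN containers survive as long as they fit in the smaller size.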


Myriad Demo

At MapR booth 1009


Maintaining Low Latency While Maximizing Throughput on a Single Cluster


Batch and Real-time Analytics Together

[Diagram: a compute cluster of eight nodes, each running a YARN NodeManager (NM) alongside an Apache Drill Drillbit, all managed by a single cluster/DC scheduler]


Sharing Resources between Batch and Real-Time

•  The resource usage pattern of real-time services can be unpredictable

–  Analysts use the services during the day

–  Analysts on the other side of the globe work during our night

–  There are steady states, spikes, and dips in the workloads

•  Batch resource usage is more or less predictable

–  The same jobs run over and over, with occasional spikes and dips


Real-time Services Resource Utilization/Provisioning

Aggressive resource provisioning: < 10% utilization

Moderate resource provisioning: < 60% utilization

Conservative resource provisioning: > 80% utilization
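Utilization here is average demand divided by provisioned capacity, so the same service lands in very different bands depending on how much is reserved for it. A worked example, assuming a hypothetical service that needs 8 cores on average; the function name `utilization` and the core counts are illustrative only.

```python
def utilization(avg_demand_cores, provisioned_cores):
    """Average fraction of provisioned capacity that is actually used."""
    if provisioned_cores <= 0:
        raise ValueError("provisioned capacity must be positive")
    return avg_demand_cores / provisioned_cores


if __name__ == "__main__":
    # Hypothetical service that needs 8 cores on average.
    for provisioned in (100, 16, 9):
        print(f"{provisioned:>3} cores provisioned -> {utilization(8, provisioned):.0%} utilized")
```

This prints 8%, 50%, and 89%: the aggressive, moderate, and conservative bands respectively. The catch with conservative provisioning is that 9 cores leave only 1 core of headroom for spikes, so extra capacity has to be obtainable on demand.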


What Can We Do To Provision Conservatively?

[Diagram: a compute cluster of NodeManager + Drillbit nodes under a cluster/DC ResourceManager. A Drill Service Watcher monitors Drill performance. When latency increases, it asks for more resources (accepting offers in Mesos, or requesting additional containers in YARN); the ResourceManager allocates them, preempting if needed, and they are held as dummy containers C1, C2, C3. When latency decreases, the resources are given back.]
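The watcher is a feedback loop on query latency. Holding capacity as dummy containers presumably keeps batch work off those resources so the Drillbits, which run outside YARN's control, can use them. The sketch models only the decision logic: the class name `DrillServiceWatcherSketch`, the two latency thresholds, the one-container step, and the cap of three containers are hypothetical, and the actual acquire/release calls to Mesos or YARN are left as comments.

```python
class DrillServiceWatcherSketch:
    """Hypothetical latency-driven controller for dummy containers."""

    def __init__(self, high_ms=500.0, low_ms=200.0, max_dummies=3):
        self.high_ms = high_ms          # above this, reserve more capacity
        self.low_ms = low_ms            # below this, give capacity back
        self.max_dummies = max_dummies  # upper bound on reserved containers
        self.dummies = []               # ids of dummy containers currently held

    def on_latency_sample(self, latency_ms):
        """Adjust the number of held dummy containers; return the action taken."""
        if latency_ms > self.high_ms and len(self.dummies) < self.max_dummies:
            # Real system: accept a Mesos offer or request a YARN container,
            # preempting batch work if needed.
            self.dummies.append(f"C{len(self.dummies) + 1}")
            return "acquire"
        if latency_ms < self.low_ms and self.dummies:
            # Real system: release the container back to the cluster scheduler.
            self.dummies.pop()
            return "release"
        return "hold"


if __name__ == "__main__":
    watcher = DrillServiceWatcherSketch()
    actions = [watcher.on_latency_sample(ms) for ms in (800, 900, 650, 300, 150)]
    print(actions)          # ['acquire', 'acquire', 'acquire', 'hold', 'release']
    print(watcher.dummies)  # ['C1', 'C2']
```

Using two thresholds instead of one gives the loop hysteresis: latencies between 200 ms and 500 ms change nothing, so the watcher does not flap between acquiring and releasing containers.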


SHOWTIME


Q & A

@mapr maprtech

[email protected]

Engage with us!

MapR

maprtech

mapr-technologies