© 2015 MapR Technologies
Maintaining Low Latency while Maximizing Throughput
Yuliya Feldman
February 19, 2015
Top-Ranked NoSQL
Top-Ranked Hadoop Distribution
Top-Ranked SQL-on-Hadoop Solution
What We Have – Cluster per Use Case
[Diagram: separate YARN clusters and web-server farms, one per use case]
Too much isolation and poor resource utilization
Need a Datacenter-wide Resource Manager
What choices do we have?
• YARN (capacity/fair scheduler)
• Omega
• Mesos
• Others (e.g. Quasar)
YARN
• Motivated by Mesos, but is a Hadoop resource manager
• Manages Hadoop resources well – “retail”
• Pluggable schedulers for Hadoop
• Started handling long-lived tasks
• Can preempt tasks
• YARN-1051 - YARN Admission Control/Planner: enhancing the resource allocation model with time
Mesos
• Datacenter-wide resource manager – a negotiator between frameworks
• Manages all resources across frameworks well, rather than resources within a particular framework (e.g. Hadoop) – “wholesale”
• Does two-level scheduling
• Excellent Docker support
• Schedules, allocates, and isolates CPU, memory, disk, network, and arbitrary custom resource types
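Mesos's two-level scheduling can be sketched as follows: the master (level 1) hands resource offers to frameworks, and each framework's own scheduler (level 2) decides which offers to accept. This is a minimal illustration; the class and method names are assumptions for this sketch, not the actual Mesos API.

```python
# Minimal sketch of Mesos-style two-level scheduling (illustrative names,
# not the actual Mesos API).

class Offer:
    def __init__(self, node, cpus, mem_gb):
        self.node, self.cpus, self.mem_gb = node, cpus, mem_gb

class Framework:
    """A framework accepts or declines offers against its own demand."""
    def __init__(self, name, need_cpus):
        self.name, self.need_cpus = name, need_cpus
        self.accepted = []

    def resource_offer(self, offer):
        # Level 2: the framework's own scheduler decides.
        if self.need_cpus > 0 and offer.cpus >= 1:
            self.accepted.append(offer)
            self.need_cpus -= offer.cpus
            return True   # accept
        return False      # decline; the master re-offers elsewhere

def master_offer_loop(offers, frameworks):
    # Level 1: the master hands each offer to one framework at a time.
    for offer in offers:
        for fw in frameworks:
            if fw.resource_offer(offer):
                break

yarn = Framework("YARN", need_cpus=4)
web = Framework("web", need_cpus=2)
master_offer_loop([Offer("n1", 2, 2), Offer("n2", 2, 2), Offer("n3", 2, 2)],
                  [yarn, web])
print([o.node for o in yarn.accepted])  # YARN takes the first two offers
print([o.node for o in web.accepted])   # the web framework gets the third
```

The key property this shows: the master never needs to understand any framework's workload; it only brokers offers, and declined resources flow to the next framework.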
Can we…
– Continue leveraging YARN's resource-scheduling capabilities for YARN-based applications?
– Treat YARN as “yet another” framework within Mesos?
– Keep YARN unaware of coexisting non-YARN applications?
Introducing Myriad
Apache Myriad: True Multi-tenancy
• Open-source project launched Oct '14 – MapR, eBay, Mesosphere, and others participating
• Allows Mesos and YARN to cooperate with each other
• Mesos acts as the datacenter-wide resource manager
  – Dockerized containers and/or cgroups used for isolation
• Hadoop is launched inside cgroup containers
• Myriad manages the conversation between the RM and the Mesos master, and between NMs and Mesos slaves
Why Myriad
• Run many types of compute frameworks side by side
  – Hadoop family (YARN, Spark, Kafka, Storm)
  – Web-server farms
  – MPP databases (e.g., Vertica)
  – Other services: SOA web services, Jenkins/build farms, cron jobs, shell scripts, Kubernetes, Cassandra, ElasticSearch, etc.
  – Each compute framework is a cluster in itself
• Need to break up a physical cluster into many virtual clusters
  – Using Docker (containers) for good isolation
  – But most schedulers can only manage individual nodes inside a cluster
• Move resources between virtual clusters on demand
Utilize Excess Capacity for Analytics
[Chart: datacenter server-farm utilization over time, showing long-lived periods of excess capacity available for Hadoop analytics]
• “Scale up” Hadoop during long periods of low utilization
• “Scale down” Hadoop ahead of anticipated high utilization
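The scale-up/scale-down idea can be sketched as a simple threshold policy. The thresholds and action names here are illustrative assumptions, not part of Myriad's actual API.

```python
# Sketch of a flex-up/flex-down policy driven by server-farm utilization.
# Threshold values are illustrative assumptions.

SCALE_UP_BELOW = 0.30    # lots of idle capacity -> grow Hadoop
SCALE_DOWN_ABOVE = 0.70  # farm getting busy -> shrink Hadoop

def flex_decision(utilization):
    """Return 'flexup', 'flexdown', or 'hold' for a given farm utilization."""
    if utilization < SCALE_UP_BELOW:
        return "flexup"    # scale up Hadoop during low utilization
    if utilization > SCALE_DOWN_ABOVE:
        return "flexdown"  # scale down ahead of anticipated high utilization
    return "hold"

print([flex_decision(u) for u in (0.1, 0.5, 0.9)])
# -> ['flexup', 'hold', 'flexdown']
```

In practice the decision would also use forecasts of upcoming load (e.g. a known daily traffic peak), not just the instantaneous utilization.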
Myriad Again
• Mesos creates virtual clusters
• YARN uses resources provided by Mesos
• Myriad can ask YARN to release some resources – or give it more
[Diagram: Mesos hosting two YARN virtual clusters alongside a web-server cluster]
Myriad Services Architecture
[Diagram: a Mesos scheduler embedded in the Resource Manager receives offers from the Mesos master and launches Node Manager tasks; a Mesos executor on each node launches the Node Manager and reports task status, while the YARN scheduler (fair-share) launches application containers via the Node Manager heartbeat, working from a Map<Node, Capacity> view of the cluster]
How It Works
[Diagram: a REST API request reaches the Myriad Mesos scheduler inside the YARN Resource Manager (Mesos framework + master side); the scheduler launches a YARN Node Manager on a Mesos slave with 2.5 CPU / 2.5 GB, and the Node Manager advertises 2 CPU / 2 GB to YARN, the remainder covering the Node Manager's own overhead]
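The resource accounting in this step can be sketched directly: the task taken from the Mesos offer is sized for the Node Manager plus its overhead, and only the remainder is advertised to YARN. The 0.5 CPU / 0.5 GB overhead figure matches the diagram's numbers; in a real deployment such values would be configurable, and this helper is an illustration, not Myriad code.

```python
# Sketch of the Node Manager sizing shown in the diagram: resources taken
# from the Mesos offer minus a fixed overhead are what YARN sees.
# Overhead values here simply match the diagram (2.5 offered, 2 advertised).

NM_OVERHEAD = {"cpus": 0.5, "mem_gb": 0.5}

def advertised_capacity(offer):
    """Capacity the Node Manager reports to YARN after reserving overhead."""
    return {k: offer[k] - NM_OVERHEAD[k] for k in offer}

offer = {"cpus": 2.5, "mem_gb": 2.5}   # resources accepted from the Mesos offer
print(advertised_capacity(offer))       # {'cpus': 2.0, 'mem_gb': 2.0}
```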
How It Works (continued)
[Diagram: YARN schedules application containers C1 and C2, which the Node Manager launches on the Mesos slave within the capacity it advertised]
Use Case – Web Traffic Spikes
[Diagram: two nodes, each with an 8 CPU / 8 GB Node Manager; when web traffic spikes, a REST call resizes each Node Manager down to 6 CPU / 6 GB, freeing 2 CPU / 2 GB per node for the WebService]
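The resize arithmetic in this use case is simple enough to sketch: carve a fixed slice out of each Node Manager for the web service during the spike, and give it back afterwards. The sizes match the diagram; the actual resize would be issued through Myriad's REST API, and this helper is an illustration only.

```python
# Sketch of the per-node resize in the web-traffic-spike use case.
# Sizes match the diagram: 8 CPU / 8 GB NodeManagers give up 2 CPU / 2 GB.

def resize_for_spike(nm, web_share):
    """Shrink a Node Manager's profile by the web service's share."""
    return {k: nm[k] - web_share[k] for k in nm}

nm = {"cpus": 8, "mem_gb": 8}
web = {"cpus": 2, "mem_gb": 2}
print(resize_for_spike(nm, web))   # {'cpus': 6, 'mem_gb': 6} per node
```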
Use Case – Web Traffic Spike Over
[Diagram: when the spike subsides, a REST call resizes each Node Manager back up from 6 CPU / 6 GB to 8 CPU / 8 GB, reclaiming the 2 CPU / 2 GB per node from the WebService]
Myriad Demo
At MapR booth 1009
Maintaining Low Latency while Maximizing Throughput
on a single cluster
Batch and Real-time Analytics Together
[Diagram: a compute cluster in which every node runs a YARN Node Manager (NM) alongside a Drill DrillBit, all coordinated by a cluster/datacenter scheduler]
Sharing Resources between Batch and Real-Time
• Real-time services' resource usage patterns can be unpredictable
– Analysts use services during the day
– Analysts on the other side of the globe work during the night
– There are steady states, spikes and dips in the workloads
• Batch resource usage is more or less predictable
  – Running the same jobs over and over, with occasional spikes and dips
Real-time Services Resource Utilization/Provisioning
• Aggressive resource provisioning: < 10% utilization
• Moderate resource provisioning: < 60% utilization
• Conservative resource provisioning: > 80% utilization
What Can We Do To Provision Conservatively?
[Diagram: a Drill Service Watcher, alongside the cluster/DC ResourceManager, monitors Drill performance on a compute cluster of NM + DrillBit nodes. On a latency increase it accepts offers (Mesos) or requests additional containers (YARN), allocating resources (preempting if needed) and holding them as dummy containers (C1, C2, C3); on a latency decrease it releases them]
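The watcher's control loop, grab resources when Drill query latency rises, release them when it falls, can be sketched as follows. The class and method names are illustrative assumptions, not from the actual watcher.

```python
# Sketch of a Drill Service Watcher control loop: hold "dummy" containers
# to reserve capacity while latency is high, release them when it recovers.
# Names and thresholds are illustrative assumptions.

class DrillServiceWatcher:
    def __init__(self, target_ms, slack_ms=20):
        self.target_ms = target_ms
        self.slack_ms = slack_ms          # dead band to avoid flapping
        self.held_containers = 0          # dummy containers holding resources

    def observe(self, latency_ms):
        if latency_ms > self.target_ms + self.slack_ms:
            # Latency increase: accept offers (Mesos) or request additional
            # containers (YARN), preempting batch work if needed.
            self.held_containers += 1
            return "acquire"
        if latency_ms < self.target_ms - self.slack_ms and self.held_containers:
            # Latency decrease: give resources back to batch workloads.
            self.held_containers -= 1
            return "release"
        return "steady"

w = DrillServiceWatcher(target_ms=100)
print([w.observe(ms) for ms in (150, 160, 90, 70)])
# -> ['acquire', 'acquire', 'steady', 'release']
```

The dead band (`slack_ms`) is the design point worth noting: without it, latency hovering near the target would cause the watcher to acquire and release containers on every sample.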
SHOWTIME
Q & A
Engage with us!
@mapr | maprtech | MapR | mapr-technologies