Junping Du: Hadoop Virtualization Extensions

Big Data in Cloud

Junping Du (堵俊平), Apache Hadoop Committer

Staff Engineer, VMware

Bio: Junping Du (堵俊平)
- Joined VMware in 2008, working first on cloud products
- Initiated VMware's earliest big data effort in 2010
- Automated Hadoop deployment on vSphere, which later became the open source project Serengeti
- Has contributed to the Apache Hadoop community since 2012
- Recently became an Apache Hadoop committer, currently the only one in the UTC+8 time zone

Agenda
- Virtualization, SDDC and Cloud
- Trends from my observation in Big Data
- YARN: resource hub for Big Data applications
- YARN in the Cloud

What is Virtualization? - @see VMware’s vSphere

[Diagram: vSphere stack, showing guests, their monitors, the VMkernel (scheduler, memory manager, virtual switch, virtual NIC, virtual SCSI, TCP/IP, file system, NIC drivers, I/O drivers), and the physical hardware]

• The monitor emulates physical devices: CPU, memory, I/O
• CPU is controlled by the scheduler and virtualized by the monitor
• Memory is allocated by the VMkernel and virtualized by the monitor
• Network and I/O devices are emulated and proxied through native device drivers

Server Virtualization Adoption on Path to 80% Over Next 5 Years

[Chart: % virtualized of x86 workloads, 2010 to 2018, on a 0% to 80% axis]
• IDC: 2012 to 2016 change = +12 pts
• Gartner: 2012 to 2016 change = +22 pts

[Chart: total x86 workloads (millions, 0 to 200) and % of x86 physical servers unvirtualized (0% to 100%), 2009 to 2016]
• IDC + VMware estimate of workloads [1]: 2012 to 2016 CAGR = 21%

Source(s): IDC: Annual Virtualization Forecast, Feb-13; Gartner: x86 Server Virtualization, Worldwide, 3Q12 Update; Gartner: Forecast x86 Server Virtualization, Worldwide, 2008-2018, Jul-11; VMware estimates.
Note: Server workloads only.
[1] Installed base totals assume a 5-year refresh.

Software-Defined Data Center

[Diagram: Windows, Linux, database, mission-critical, HPC, and big data apps shown twice: as apps on traditional infrastructure, and as apps on the Software-Defined Data Center, with virtual data centers (VDCs) on top of Software-Defined Data Center services]

• Abstract
• Pool
• Automate

Infrastructure for Traditional Enterprise Apps
• Existing applications bound to vendor-specific hardware
• Hardware-based resiliency
• Hardware-based QoS
• Hard to automate
• Complex to scale
• Traditional applications: 83M in 2012, 141M in 2016 (70% growth)

Infrastructure for New/Cloud/Data Apps
• Application-specific network and storage
• Software-based infrastructure
• Transformational economics
• Automation and agility
• Designed for scale
• Next-gen cloud applications: 6M in 2012, 48M in 2016 (700% growth)

SDDC Delivers Single Architecture for New and Existing Apps
• Infrastructure for new/cloud/data apps: application-specific network and storage
• Infrastructure for existing enterprise apps: existing applications bound to vendor-specific hardware
• SDDC: any application, any hardware

Let’s get back to Big Data …

New Trends in Big Data from my observation
- Hadoop 2.0: YARN acts as the key resource hub in the big data ecosystem
- MapReduce is not fast enough; we need faster engines such as Tez and Spark
- HDFS is adding support for more scenarios, e.g. caching for low-latency apps, snapshots for disaster recovery, and storage-tier awareness
- More Hadoop-based SQL engines: Apache Drill, Impala, Stinger, HAWQ, etc.
- For enterprise readiness, more effort is going into security, HA, QoS, monitoring and management

Hadoop MapReduce v1 (Classic)

• JobTracker
– Manages cluster resources and job scheduling
• TaskTracker
– Per-node agent
– Manages tasks

MapReduce v1 Limitations
• Scalability
– A single JobTracker manages both cluster resources and job scheduling
• SPOF (Single Point Of Failure)
– A JobTracker failure causes all queued and running jobs to fail
– Restart is very tricky due to complex state
• Hard partition of resources into map and reduce slots
– Low resource utilization
• Lacks support for alternate paradigms
• Lacks wire-compatible protocols
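To make the slot limitation concrete, here is a minimal sketch of why a hard map/reduce partition lowers utilization. The function names and the task counts are illustrative assumptions, not Hadoop code or measured data.

```python
def slot_utilization(map_slots, reduce_slots, map_demand, reduce_demand):
    """Fraction of slots busy when map and reduce slots are NOT interchangeable."""
    busy = min(map_slots, map_demand) + min(reduce_slots, reduce_demand)
    return busy / (map_slots + reduce_slots)


def pooled_utilization(total_slots, map_demand, reduce_demand):
    """Fraction of capacity busy when any task may use any free capacity."""
    busy = min(total_slots, map_demand + reduce_demand)
    return busy / total_slots


# Hypothetical map-heavy phase: 12 map tasks waiting, no reduce tasks yet.
partitioned = slot_utilization(map_slots=8, reduce_slots=4,
                               map_demand=12, reduce_demand=0)
pooled = pooled_utilization(total_slots=12, map_demand=12, reduce_demand=0)
print(f"partitioned: {partitioned:.2f}, pooled: {pooled:.2f}")
```

With a fixed partition the four reduce slots sit idle while map tasks queue; a pooled model can hand that capacity to whichever task type needs it.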

YARN Architecture
• Splits up the two major functions of the JobTracker
– Resource Manager (RM): cluster resource management
– Application Master (AM): task scheduling and monitoring
• NodeManager (NM): a new per-node slave
– Launches the applications’ containers
– Monitors their resource usage (CPU, memory) and reports it to the Resource Manager
• YARN maintains compatibility with existing MapReduce applications and supports other applications
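The split of responsibilities can be sketched as a toy model. This is not the YARN API: the class names mirror the roles above, but every method, the memory-only resource model, and the first-fit choice are assumptions made for illustration.

```python
class ResourceManager:
    """Toy RM: only tracks free memory per node and grants containers."""

    def __init__(self):
        self.free_mb = {}  # node name -> free memory reported by its NodeManager

    def register_node(self, node, memory_mb):
        # In YARN the NodeManager registers and then heartbeats; here it is one call.
        self.free_mb[node] = memory_mb

    def allocate(self, memory_mb):
        """Grant a container on the first node with room, or None if none fits."""
        for node, free in self.free_mb.items():
            if free >= memory_mb:
                self.free_mb[node] = free - memory_mb
                return (node, memory_mb)
        return None


class ApplicationMaster:
    """Toy AM: owns the application's tasks and asks the RM for containers."""

    def __init__(self, rm, tasks):
        self.rm = rm
        self.tasks = tasks  # task name -> memory needed in MB

    def run(self):
        # Task-level scheduling lives in the AM, not in the RM.
        return {task: self.rm.allocate(mb) for task, mb in self.tasks.items()}


rm = ResourceManager()
rm.register_node("nm1", 4096)
rm.register_node("nm2", 2048)
am = ApplicationMaster(rm, {"map-0": 3072, "map-1": 2048, "reduce-0": 2048})
placements = am.run()
print(placements)
```

The RM never sees tasks, only container requests, which is what lets non-MapReduce applications bring their own AM.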

YARN – Hub for Big Data Applications

[Diagram labels: YARN; HDFS; applications MapReduce, Tez, Storm, Spark, HBase, Impala, OpenMPI, Distributed Shell]

• App-specific AM
• HOYA (HBase On YArn)
– Long-running services (YARN-896)
• LLAMA (Low Latency Application MAster)
– Gang scheduler (YARN-624)

YARN and Cloud
• Two different perspectives:
– YARN-centric perspective
• YARN is the key platform for apps
• YARN is independent of infrastructure; running on top of the cloud shows YARN’s generality
– Cloud-centric perspective
• YARN is an umbrella for one kind of application
• Supporting YARN shows the cloud’s generality

YARN and Cloud: YARN-centric Perspective

[Diagram: big data apps (MapReduce, Tez, Storm, Spark, HBase, Impala, Open MPI, Distributed Shell, …) on YARN, with YARN on infrastructure: bare-metal machines or cloud infrastructure (VMware, OpenStack, …)]

• YARN is the “OS”
• Infrastructure (whether physical or cloud) is the “hardware”

YARN and Cloud: Cloud-centric Perspective

[Diagram: cloud infrastructure (VMware, OpenStack, etc.) hosting YARN apps (MapReduce, Tez, Storm, Spark, HBase, Impala, Open MPI, Distributed Shell, …), legacy apps, and other big data apps]

• Cloud infrastructure is the “OS”
• YARN is a group of “processes”

YARN vs. Cloud
• Similarity
– Both aim to share resources across applications
– Both provide global resource management
• YARN vs. Cloud
– YARN manages resources at the OS layer, while the cloud manages resources in the hypervisor (not directly comparable, but a hypervisor provides stronger isolation than an OS)
– Apps managed by YARN need a specific AppMaster, while apps managed by the cloud run exactly as they would on physical machines (Cloud +1)
– The YARN layer is close to the big data app, so it can better understand and estimate the app’s requirements (YARN +1)
– The cloud layer is close to the hardware resources, so it can more easily track real-time, global resource utilization (Cloud +1)

YARN + Cloud
• Why YARN + Cloud?
– Leverage virtualization for strong isolation, fine-grained resource sharing, and other benefits
– Uniform infrastructure to simplify enterprise IT
• What does it look like?
– YARN NMs run inside VMs managed by the cloud infrastructure
– A communication channel between the YARN RM and the cloud resource manager provides coordination
• How do we do it?
– The first item above is easy and works smoothly
– The second can be achieved in two ways
• YARN becomes aware of, and can manipulate, cloud resource changes
• YARN provides a generic resource notification mechanism that the cloud manager can use when resources change

Elastic YARN Node in the Cloud

[Diagram: a virtualization host running a virtual YARN node (VMDK, DataNode, NodeManager, containers) alongside other workloads]

• Container: add/remove resources?
• Grow/shrink memory by tens of GB?
• Grow/shrink the resources of a VM

• A VM’s resource boundary can be elastic
– CPU is easy: time slicing (with constraints)
– Memory is harder: page sharing and memory ballooning
– Under contention, enforce limits and proportional sharing
– “Stealing” resources behind an app’s back can cause bad performance (paging)
– App-aware resource management could address these issues
• Hadoop YARN resource model
– Dynamic when adding/removing nodes
– But static per node
• In this case, should we enable resource elasticity on VMs?
– If yes, performance is poor when resource contention happens
– If no, utilization is as low as on physical boxes, because free resources cannot be used by other busy VMs
• We need a better answer.


HVE provides the answer!
• Hadoop Virtualization Extensions
– A project initiated by VMware to enhance Hadoop running on virtualization
– A “driver” for the Hadoop “OS” running on cloud “hardware”
• Goal: make Hadoop cloud-ready
– Provide virtualization awareness to Hadoop, e.g. virtual topology, virtual resources, etc.
– Deliver generic utilities that can be leveraged by any virtualized platform
• Independent of virtualization platform and cloud infrastructure
• 100% contributed to the Apache Hadoop community

HVE
• Philosophy
– Make infrastructure-related components abstract
– Deliver different implementations that can be selected through configuration
• E.g. BlockPlacementPolicy

[Diagram: BlockPlacementPolicy (abstract), implemented by BlockPlacementPolicy Default and BlockPlacementPolicy For Virtualization]
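The "abstract component plus configurable implementations" philosophy can be sketched as follows. This is a Python illustration of the pattern, not Hadoop's Java classes: the `Node` fields, the placement rules, and the `block.placement.policy` key are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    rack: str
    host: str  # physical host the VM runs on (hypothetical field)


class BlockPlacementPolicy(ABC):
    """Abstract component: callers depend only on this interface."""

    @abstractmethod
    def choose_targets(self, nodes, replicas):
        ...


class DefaultPlacement(BlockPlacementPolicy):
    """Simplified stand-in: treats every node as an independent machine."""

    def choose_targets(self, nodes, replicas):
        return list(nodes[:replicas])


class VirtualizationAwarePlacement(BlockPlacementPolicy):
    """Never puts two replicas on VMs that share one physical host."""

    def choose_targets(self, nodes, replicas):
        chosen, used_hosts = [], set()
        for node in nodes:
            if len(chosen) == replicas:
                break
            if node.host not in used_hosts:
                chosen.append(node)
                used_hosts.add(node.host)
        return chosen


POLICIES = {"default": DefaultPlacement, "virtualization": VirtualizationAwarePlacement}


def load_policy(config):
    # "block.placement.policy" is a made-up key for this sketch.
    return POLICIES[config.get("block.placement.policy", "default")]()


nodes = [Node("vm1", "r1", "h1"), Node("vm2", "r1", "h1"), Node("vm3", "r1", "h2")]
default_names = [n.name for n in load_policy({}).choose_targets(nodes, 2)]
virt_names = [n.name for n in
              load_policy({"block.placement.policy": "virtualization"}).choose_targets(nodes, 2)]
print(default_names, virt_names)
```

Because the policy is chosen by configuration, the same cluster code runs unchanged on physical and virtualized deployments.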

• In this case, should we enable resource elasticity on VMs?
• Yes, and we try to get rid of resource contention
– Notify YARN that a node’s resources have changed
– The YARN RM scheduler won’t schedule new tasks on congested nodes
– The YARN scheduler preempts low-priority tasks if necessary
– This work is tracked in YARN-291
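A minimal sketch of that behavior, assuming a memory-only resource model: a node's capacity is updated at runtime, the scheduler refuses new tasks that no longer fit, and low-priority tasks are optionally preempted. The class and method names are illustrative and are not the YARN-291 implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    memory_mb: int
    priority: int  # assumption: a larger number means more important


@dataclass
class SchedulerNode:
    total_mb: int
    tasks: list = field(default_factory=list)

    @property
    def used_mb(self):
        return sum(t.memory_mb for t in self.tasks)

    def can_fit(self, task):
        """The scheduler skips this node when the task would exceed capacity."""
        return self.used_mb + task.memory_mb <= self.total_mb

    def update_resource(self, new_total_mb, preempt=True):
        """Apply a capacity change; return the tasks preempted to fit it."""
        self.total_mb = new_total_mb
        preempted = []
        if preempt:
            # Evict the lowest-priority tasks first until usage fits again.
            for task in sorted(self.tasks, key=lambda t: t.priority):
                if self.used_mb <= self.total_mb:
                    break
                self.tasks.remove(task)
                preempted.append(task)
        return preempted


node = SchedulerNode(total_mb=8192)
node.tasks += [Task("batch", 4096, priority=1), Task("interactive", 2048, priority=5)]

# The cloud manager shrinks the VM: no new work is placed, nothing is killed yet.
node.update_resource(4096, preempt=False)
fits_after_shrink = node.can_fit(Task("new", 1024, priority=1))

# If contention persists, low-priority work is preempted.
evicted = [t.name for t in node.update_resource(4096, preempt=True)]
print(fits_after_shrink, evicted)
```

Separating "stop scheduling" from "preempt" lets the node drain naturally when the shrink is temporary.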


Implementation – YARN-291 (umbrella)
• YARN-311
– Core scheduler changes
• YARN-313
– CLI
• YARN-312
– AdminProtocol changes
• REST API, JMX, etc.

[Diagram: components involved in UpdateNodeResource(): Cloud Resource Manager, Admin CLI, and AdminService; within the Resource Manager, the Resource Tracker Service, RMContext, RMNode, Scheduler, SchedulerNode, and cluster resource; the Node Manager connects via heartbeat]

yarn rmadmin -updateNodeResource <NodeId> <Resource>
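As a hedged illustration of how a cloud manager might drive that command, the helper below only builds the argument list. Expanding `<Resource>` into a memory size in MB followed by a vcore count is an assumption based on later Hadoop releases and should be checked against `yarn rmadmin -help` on the target cluster; the helper name and the example host are hypothetical.

```python
def update_node_resource_cmd(node_id, memory_mb, vcores):
    """Build the argv for 'yarn rmadmin -updateNodeResource'.

    Assumes <Resource> is passed as memory in MB followed by vcores.
    """
    if ":" not in node_id:
        raise ValueError("node_id is expected to look like host:port")
    if memory_mb <= 0 or vcores <= 0:
        raise ValueError("memory_mb and vcores must be positive")
    return ["yarn", "rmadmin", "-updateNodeResource",
            node_id, str(memory_mb), str(vcores)]


# Hypothetical node: shrink it to 8 GB and 4 vcores.
cmd = update_node_resource_cmd("node1.example.com:45454", 8192, 4)
print(" ".join(cmd))
# To actually run it on a machine with the Hadoop client installed:
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command construction separate from its execution makes the helper easy to test without a running cluster.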

Contributions to Apache Hadoop are welcome!
• Hadoop is the key platform
– For architecting big data
– A small contribution can change the world!
• An open source project is a great platform
– For people from different organizations to share great ideas and work
– The community is a great workplace
• Companies and individuals get credit
– For the work and resources they put in
– It is also easy to build an ecosystem and show expertise
• Big data has many challenges, like building the Tower of Babel
– Open source is the common language that ensures we can work together

Key messages in today’s talk
• SDDC and cloud are the future for architecting enterprise IT
• New trend in big data: YARN acts as an “OS” for big data apps
• At VMware, we try to support any “OS”, including YARN
• HVE acts as the “driver” that enables Hadoop on virtualization/cloud
• Contribute to Apache Hadoop

Reference
• YARN (MapReduce 2.0) – https://issues.apache.org/jira/browse/MAPREDUCE-279
• HVE topology extension – https://issues.apache.org/jira/browse/HADOOP-8468
• HVE topology extension for YARN – https://issues.apache.org/jira/browse/YARN-18
• HVE elastic resource configuration – https://issues.apache.org/jira/browse/YARN-291
• Gang scheduling – https://issues.apache.org/jira/browse/YARN-624
• Long-lived services in YARN – https://issues.apache.org/jira/browse/YARN-896
