Riding the Elephant - Hadoop 2.0


DESCRIPTION

Hadoop 2.0, and in particular YARN, has opened up many potential applications beyond MapReduce. This presentation explains some of the ways this happened and what you can now do that you couldn't before. It also introduces some new tools (Spark) and infrastructure pieces (Mesos) that make even more efficient use of a cluster.


Simon Elliston Ball, Head of Big Data, Red Gate Ventures

@sireb

Riding the Elephant: Hadoop 2.0

http://bit.ly/RidingElephants

Append-only distributed file system
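To illustrate what "append-only" means for a block-based file system, here is a minimal Python sketch. It is a hypothetical toy, not the HDFS API: `AppendOnlyFile`, `BLOCK_SIZE`, `REPLICATION` and the round-robin replica placement are assumptions chosen for readability. The tiny block size is only there to make the splitting visible.

```python
# Hypothetical toy model of an append-only, block-based file; this is NOT the HDFS API.
# BLOCK_SIZE is tiny so the behaviour is visible. Real HDFS defaults are far larger
# (64 MB in Hadoop 1.x, 128 MB in Hadoop 2.x), with 3 replicas per block.
BLOCK_SIZE = 4
REPLICATION = 3


class AppendOnlyFile:
    """Bytes may only be added at the end; existing bytes are never rewritten."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.blocks = []     # each block is a bytearray of at most BLOCK_SIZE bytes
        self.placement = []  # datanode names holding a replica of each block

    def append(self, data: bytes) -> None:
        for byte in data:
            # Open a new block when there is none yet or the last one is full.
            if not self.blocks or len(self.blocks[-1]) == BLOCK_SIZE:
                self._new_block()
            self.blocks[-1].append(byte)

    def _new_block(self) -> None:
        index = len(self.blocks)
        n = len(self.datanodes)
        # Simple round-robin placement; real HDFS uses a rack-aware policy.
        replicas = [self.datanodes[(index + i) % n] for i in range(min(REPLICATION, n))]
        self.blocks.append(bytearray())
        self.placement.append(replicas)

    def write_at(self, offset: int, data: bytes) -> None:
        # Random-access writes are deliberately unsupported.
        raise PermissionError("append-only file: cannot overwrite existing data")

    def read(self) -> bytes:
        return b"".join(bytes(block) for block in self.blocks)


if __name__ == "__main__":
    f = AppendOnlyFile(["dn1", "dn2", "dn3", "dn4"])
    f.append(b"hello ")
    f.append(b"world")
    print(f.read())  # b'hello world'
    print(f.placement)
```

The 11 bytes end up in three blocks (4 + 4 + 3), each replicated to three of the four datanodes, and any attempt to overwrite existing data is refused.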

In the beginning…

Map Reduce

Java.

JVM-based (Scala, Groovy, Jython, Clojure)

More languages

Streaming (Python, or anything that reads stdin and writes stdout)

HDP for Windows and .NET SDK
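With Hadoop Streaming, the mapper and reducer are ordinary programs: the mapper reads input lines on stdin and writes tab-separated key/value lines, and the reducer receives those lines sorted by key. Below is a minimal word-count sketch in Python under that contract. The file name `wordcount.py`, the `map`/`reduce`/`local` arguments and the lower-casing of words are assumptions for illustration, not part of the talk.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Reduce step: Streaming hands the reducer lines sorted by key,
    so all counts for one word arrive next to each other."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    step = sys.argv[1]
    if step == "map":
        results = mapper(sys.stdin)
    elif step == "reduce":
        results = reducer(sys.stdin)
    else:
        # "local": map, sort (a stand-in for the shuffle), then reduce in one process
        results = reducer(sorted(mapper(sys.stdin)))
    for out in results:
        print(out)
```

Locally, `cat input.txt | python wordcount.py local` runs the whole pipeline in one process. On a cluster the same script would be passed to the streaming jar, roughly `hadoop jar /path/to/hadoop-streaming.jar -input in -output out -mapper "python wordcount.py map" -reducer "python wordcount.py reduce" -file wordcount.py`; the jar's location depends on the distribution.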

Abstraction

Photo: https://www.flickr.com/photos/puroticorico/

Hive, Pig

Cascading, Scalding

SQL on Hadoop

Learning to share the toys

HBase

Solr on Hadoop

Sharing HDFS…

Map Reduce v1

[Diagram: jobs go to the JobTracker on the head node, which schedules Map and Reduce tasks onto TaskTrackers. Each data node runs a TaskTracker with a fixed set of map slots (m slot 1…m slot n) and reduce slots (r slot 1…r slot n), and each TaskTracker reports MR status back to the JobTracker.]
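To make the fixed-slot model concrete, here is a minimal Python sketch, assuming illustrative slot counts (in Hadoop 1.x these were configured per node through properties such as `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`). `TaskTracker`, `schedule_v1` and `utilization` are hypothetical names, and the greedy loop is a simplification of the real JobTracker scheduler. It shows how map tasks can queue while reduce slots sit idle.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Hadoop 1.x-style worker: fixed numbers of map slots and reduce slots."""
    map_slots: int
    reduce_slots: int
    maps_running: int = 0
    reduces_running: int = 0


def schedule_v1(trackers, pending_maps, pending_reduces):
    """Greedy JobTracker-style assignment. A map task can only use a map slot and a
    reduce task only a reduce slot. Returns the (maps, reduces) still waiting."""
    for tt in trackers:
        take = min(pending_maps, tt.map_slots - tt.maps_running)
        tt.maps_running += take
        pending_maps -= take
        take = min(pending_reduces, tt.reduce_slots - tt.reduces_running)
        tt.reduces_running += take
        pending_reduces -= take
    return pending_maps, pending_reduces


def utilization(trackers):
    """Fraction of all slots (map and reduce) that are busy."""
    used = sum(t.maps_running + t.reduces_running for t in trackers)
    total = sum(t.map_slots + t.reduce_slots for t in trackers)
    return used / total


if __name__ == "__main__":
    # Three data nodes, each with 4 map slots and 2 reduce slots (illustrative numbers).
    cluster = [TaskTracker(map_slots=4, reduce_slots=2) for _ in range(3)]
    # A job in its map phase: 20 map tasks ready, no reduce tasks yet.
    waiting = schedule_v1(cluster, pending_maps=20, pending_reduces=0)
    print(waiting)                         # (8, 0): 8 maps queue up...
    print(round(utilization(cluster), 2))  # 0.67: ...while 6 reduce slots sit idle
```

Because slots are typed, a third of this cluster's capacity is unusable during the map phase even though work is waiting.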

Typical Hadoop 1.x setup

[Diagram: HBase, Production, Adhoc]


YARN architecture

[Diagram: a YARN Client submits to the ResourceManager. Each data node runs a Node Manager hosting Containers, some of which run an Application Master; unused capacity is shown as a Free Slot.]
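The key change in YARN is that the ResourceManager hands out generic containers sized in memory and vcores, and each application's ApplicationMaster decides what runs in them. The following Python sketch models only that idea. `NodeManager`, `ResourceManager.allocate` and the first-fit policy are hypothetical simplifications; a real ApplicationMaster requests containers through the scheduler protocol (the Java client is `AMRMClient`), and real schedulers (Capacity, Fair) are far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """A worker node offering memory and virtual cores, not typed slots."""
    name: str
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)  # (app_id, memory_mb, vcores)

    def has_room(self, memory_mb, vcores):
        used_mem = sum(c[1] for c in self.containers)
        used_cores = sum(c[2] for c in self.containers)
        return used_mem + memory_mb <= self.memory_mb and used_cores + vcores <= self.vcores


class ResourceManager:
    """Toy RM: grants generic containers on the first node with enough room.
    It does not know what runs inside; that is up to each ApplicationMaster."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb, vcores):
        for node in self.nodes:
            if node.has_room(memory_mb, vcores):
                node.containers.append((app_id, memory_mb, vcores))
                return node.name
        return None  # no capacity right now; the request would have to wait


if __name__ == "__main__":
    # Three illustrative nodes, each with 8 GB and 6 vcores.
    rm = ResourceManager([NodeManager(f"nm{i}", memory_mb=8192, vcores=6) for i in range(1, 4)])
    am_node = rm.allocate("job1-am", 1024, 1)  # the ApplicationMaster's own container
    placed = [rm.allocate("job1-task", 1024, 1) for _ in range(20)]
    print(am_node)                             # nm1
    print(sum(p is not None for p in placed))  # 17
```

Since no container is reserved for a particular task type, every free vcore can take the next waiting task, whether it is a map, a reduce, or something that is not MapReduce at all.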

Removing the choke point

Advantages

60%-150% better cluster usage

Long-running applications

Not quite…

Operating system for Big Data?

Security

…but a framework for Big Data Apps

Data Access abstraction

Storm on YARN

A whole batch of new applications

HOYA (HBase on YARN)

Tez (Stinger)

MapReduce 2

Giraph

<Insert your application here>

Batch applications

Spinning YARNs with Spring

Services

Direct to YARN APIs

Spring Data Hadoop abstraction

Streaming

Why?

Machine Learning

Graphs

Services

Distributed Shell: anything.

Spark

A higher abstraction

Hadoop-based?

… but can run on YARN

In Memory

Distributed

Fault tolerant

Real-time

✓✓✓

✓

RDDs (Resilient Distributed Datasets)
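Spark's RDDs are kept in memory yet stay fault tolerant because each one records how it was derived from its parent (its lineage), so a lost partition can be recomputed rather than restored from a replica. The single-process Python sketch below illustrates only that idea. `ToyRDD` is hypothetical and is not the Spark API, although its method names mirror Spark's for readability; it does not model distribution at all.

```python
class ToyRDD:
    """Hypothetical single-process stand-in for a Spark RDD; NOT the Spark API.
    Each dataset remembers how to compute its partitions from its parent (its
    lineage), so a lost in-memory partition can be rebuilt on demand."""

    def __init__(self, compute, num_partitions, parent=None):
        self._compute = compute  # partition index -> list of records
        self.num_partitions = num_partitions
        self.parent = parent     # kept to make the lineage explicit
        self._persist = False
        self._cache = {}

    @classmethod
    def parallelize(cls, data, num_partitions):
        size = -(-len(data) // num_partitions)  # ceiling division
        chunks = [data[i * size:(i + 1) * size] for i in range(num_partitions)]
        return cls(lambda i: list(chunks[i]), num_partitions)

    def map(self, f):
        # Lazy: nothing is computed until a partition is requested.
        return ToyRDD(lambda i: [f(x) for x in self.partition(i)], self.num_partitions, self)

    def filter(self, pred):
        return ToyRDD(lambda i: [x for x in self.partition(i) if pred(x)], self.num_partitions, self)

    def cache(self):
        self._persist = True
        return self

    def partition(self, i):
        if i in self._cache:
            return self._cache[i]
        data = self._compute(i)  # (re)compute from lineage
        if self._persist:
            self._cache[i] = data
        return data

    def lose_partition(self, i):
        """Simulate losing an in-memory partition, e.g. when a worker dies."""
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


if __name__ == "__main__":
    calls = []

    def square(x):
        calls.append(x)  # count how often the map function actually runs
        return x * x

    squares = ToyRDD.parallelize(list(range(1, 7)), 2).map(square).cache()
    print(squares.collect(), len(calls))  # [1, 4, 9, 16, 25, 36] 6
    squares.collect()
    print(len(calls))                     # 6: served from memory
    squares.lose_partition(1)
    print(squares.collect(), len(calls))  # [1, 4, 9, 16, 25, 36] 9: only partition 1 rebuilt
```

Only the lost partition's three records are recomputed; the cached partition is reused as-is.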

Mesos

Wider sharing

[Diagram: Hadoop, Spark and Aurora running side by side on one set of hardware as Mesos frameworks; the Hadoop column shows YARN, MapReduce, HBase etc and HDFS.]
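Mesos shares a cluster through resource offers: the master decides which framework is offered an agent's free resources, and each framework's scheduler decides what, if anything, to launch on that offer. The Python sketch below models only this two-level split. `Framework`, `offer_round` and the in-order offer policy are hypothetical simplifications; Mesos's default allocator uses Dominant Resource Fairness rather than a fixed order.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    """Hypothetical framework scheduler (e.g. a Hadoop or Spark job) that runs
    fixed-size tasks on whatever resources it is offered."""
    name: str
    task_cpus: int
    task_mem_mb: int
    tasks_wanted: int
    launched: list = field(default_factory=list)  # agent name per launched task

    def resource_offer(self, agent, cpus, mem_mb):
        """Second level of scheduling: the framework picks what to run on an offer.
        Returns the resources it leaves unused (declines)."""
        while (len(self.launched) < self.tasks_wanted
               and cpus >= self.task_cpus and mem_mb >= self.task_mem_mb):
            cpus -= self.task_cpus
            mem_mb -= self.task_mem_mb
            self.launched.append(agent)
        return cpus, mem_mb


def offer_round(agents, frameworks):
    """First level: the master decides who is offered each agent's free resources.
    This toy simply offers them to each framework in turn."""
    for agent in list(agents):
        cpus, mem_mb = agents[agent]
        for fw in frameworks:
            cpus, mem_mb = fw.resource_offer(agent, cpus, mem_mb)
        agents[agent] = (cpus, mem_mb)


if __name__ == "__main__":
    agents = {"agent1": (4, 8192), "agent2": (4, 8192)}  # free (cpus, MB) per machine
    spark = Framework("spark", task_cpus=2, task_mem_mb=2048, tasks_wanted=3)
    hadoop = Framework("hadoop", task_cpus=1, task_mem_mb=1024, tasks_wanted=10)
    offer_round(agents, [spark, hadoop])
    print(spark.launched)   # ['agent1', 'agent1', 'agent2']
    print(hadoop.launched)  # ['agent2', 'agent2']
    print(agents)           # {'agent1': (0, 4096), 'agent2': (0, 4096)}
```

Resources one framework declines flow straight to the next, which is how different systems end up sharing the same machines.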

Hadoop is more than MapReduce

The new world

YARN opens up new paradigms

Infrastructure maturing: better sharing

Hadoop and beyond!

Thank you

Questions?

Simon Elliston Ball, Head of Big Data, Red Gate Ventures

@sireb

simon@simonellistonball.com

http://bit.ly/RidingElephants
