Lessons from building large clusters

Phil Day, HP Consulting

8th November 2010

Small vs Large Clusters

Small Production Clusters and Proof of Concept

– Built and run by a few skilful people

– Can be a natural extension to conventional IT

– You know the servers by

Large Production Clusters

– Built and run by pioneers

– Large development staff

– Major Hadoop contributors

– Understand the problems of

Images: Creative Commons 2.0 – Attribution Andrew Morrell (Flickr)

– Have, or want to start with, a small PoC (tens of nodes)

– Want to scale quickly to a large cluster (hundreds of nodes)

– Want the scale of large clusters, but with the build and operational model of a small one

– Want to run the cluster rather than build and develop it

– Need to integrate it with existing systems

Large Scale Early Adopters

Unfortunately not all things in life scale as well as Hadoop

Design – The Technology Challenge

Build – The Engineering Challenge

Transfer to Operations – The Service Management Challenge

Design – The Technology Challenge

Selecting all the right bits

Server Selection

– Core Nodes: Resilient, Big Memory, RAID

– Data Nodes: Not resilient, no RAID or hot swap, basic iLO

– Trade off Disks vs Cores vs Memory to match target load

– Need to consider disk allocation policy

– Network redundancy is useful to avoid rack switch failures

– Edge Nodes (Data ingress/egress & Mgmt)

– Higher spec data nodes

– Help provide the “appliance” view of the cluster

– Have Hadoop installed but don’t run as part of the cluster.
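
The disks vs cores vs memory trade-off for data nodes can be made concrete with a little arithmetic. The following is only a sketch: the node specification, the node count and the 25% space reserve are hypothetical figures chosen for illustration, not values from this talk. The replication factor of 3 is the HDFS default.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical data node specification (illustrative values only)."""
    cores: int
    memory_gb: int
    disks: int
    disk_tb: float


def balance(spec, nodes, replication=3, reserve=0.25):
    """Return per-core ratios and a rough usable HDFS capacity.

    reserve is the assumed fraction of raw disk kept free for
    temporary and non-HDFS data; replication is the HDFS block
    replication factor.
    """
    raw_tb = nodes * spec.disks * spec.disk_tb
    usable_tb = raw_tb * (1 - reserve) / replication
    return {
        "disks_per_core": spec.disks / spec.cores,
        "memory_gb_per_core": spec.memory_gb / spec.cores,
        "raw_tb": raw_tb,
        "usable_tb": usable_tb,
    }


# Example: 500 nodes, each with 8 cores, 16 GB RAM and 4 x 1 TB disks.
spec = NodeSpec(cores=8, memory_gb=16, disks=4, disk_tb=1.0)
report = balance(spec, nodes=500)
for key, value in report.items():
    print(f"{key}: {value}")
```

Comparing these ratios for two or three candidate configurations against the measured profile of a pilot workload is one way to decide which resource to favour.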

Network Selection

– Dual 1Gb from data nodes to rack switches

– 10Gb from rack switches to core, and from Edge nodes
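
With dual 1Gb links per data node and 10Gb uplinks, the rack switch oversubscription ratio is a quick sanity check on the design. The sketch below assumes 20 data nodes per rack, both node links active, and two 10Gb uplinks per rack switch; none of these counts come from the talk, so substitute your own.

```python
def oversubscription(nodes_per_rack, node_links, node_link_gbps,
                     uplinks, uplink_gbps):
    """Ratio of node-facing bandwidth to uplink bandwidth on a rack switch."""
    downstream = nodes_per_rack * node_links * node_link_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream


# Assumed rack: 20 nodes x 2 x 1Gb = 40Gb down, 2 x 10Gb = 20Gb up.
ratio = oversubscription(nodes_per_rack=20, node_links=2, node_link_gbps=1,
                         uplinks=2, uplink_gbps=10)
print(f"oversubscription {ratio:.1f}:1")
```

If the second node link is used only for failover rather than bonded for throughput, pass node_links=1 and the ratio halves.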

Build – The Engineering Challenge

Do you realise how many cardboard boxes that is?

Building at the scale of 500+ servers has its own set of problems

• Space and Environment

• Consistency of Build

• Failures during the Build

• Deployment time and the cost of rework

Two things we found very helpful:

Factory Integration Services

Cluster Management Utility

Build – HP Factory Integration Services

Reducing risk and time

• Many years' experience of building large clusters

• Site inspection

• Build, Configure, Soak Test

• Diagnose and fix DOA (dead-on-arrival) units

• Rack and Label

• Asset tagging

• Custom build and set-up

• Pack and Ship

• On-Site build and integration

www.hp.com/go/factoryexpress

Complex solutions ...

... Made simple

Build – HP Cluster Management Utility

Rack-aware deployment and monitoring

• Proven cluster deployment and management tool

• 11 years of experience

• Proven with clusters of 3500+ nodes

• Deployment

• Network- and power-load-aware deployment

• Easily extensible

• Kickstart integration

• Monitoring

• Scalable, non-intrusive monitoring

• Collectl integration

• Administration

• Command Line or GUI

• Cluster-wide configuration

www.hp.com/go/cmu

[Figure: CMU Dashboard, cluster performance over time. Panels: Disk (read), Disk (write), Network]

Operate – The Organisational Challenge

How do we know when it's working?

Clusters are not just large numbers of servers

• At scale it may never be 100% up (like a network), but it can be 100% down (like a server)

• Need to think more in terms of “How healthy is it?”

• Core nodes are important

• Data nodes much less so – unless they fail in patterns

• Edge nodes – somewhere in between

• Look at HDFS health for replication counts

• Nagios & Ganglia

• Collectl / CMU to visualise the cluster
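
The "how healthy is it?" view can be sketched as a role-weighted check rather than a simple up/down count. This is a minimal illustration, not the method used on the clusters described here: the node record layout, the status names and the 50% per-rack threshold are all assumptions. It treats any core node failure as critical, tolerates scattered data node failures, and flags data node failures that cluster in one rack as a pattern.

```python
from collections import Counter


def cluster_health(nodes, rack_threshold=0.5):
    """Classify cluster health from node records.

    nodes: list of dicts with keys "role" ("core", "edge" or "data"),
    "rack" and "up" (bool). rack_threshold is the assumed fraction of
    a rack's data nodes that must be down to count as a pattern.
    """
    # Core nodes (for example the NameNode) matter most.
    if any(n["role"] == "core" and not n["up"] for n in nodes):
        return "critical"

    # Data nodes matter little individually, unless they fail in patterns.
    data = [n for n in nodes if n["role"] == "data"]
    total = Counter(n["rack"] for n in data)
    down = Counter(n["rack"] for n in data if not n["up"])
    patterned = any(count / total[rack] >= rack_threshold
                    for rack, count in down.items())

    # Edge nodes sit somewhere in between.
    edge_down = any(n["role"] == "edge" and not n["up"] for n in nodes)

    return "degraded" if patterned or edge_down else "healthy"


def example_cluster():
    """Build a small hypothetical cluster: 1 core, 1 edge, 2 racks of 4."""
    nodes = [{"role": "core", "rack": "r0", "up": True},
             {"role": "edge", "rack": "r0", "up": True}]
    nodes += [{"role": "data", "rack": f"r{r}", "up": True}
              for r in (1, 2) for _ in range(4)]
    return nodes


nodes = example_cluster()
nodes[2]["up"] = False           # one data node down in rack r1
print(cluster_health(nodes))     # scattered failure
nodes[3]["up"] = False           # second data node down in the same rack
print(cluster_health(nodes))     # half of rack r1 is down
```

A check like this complements, rather than replaces, the HDFS replication counts and the Nagios and Ganglia views listed above.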

Summary

Key considerations when building a large cluster

• Use a pilot system to establish your server configuration

• Stand on the shoulders of the Pioneers

• Build and test in the factory if you can

• Consistency in the build and configuration is vital

• Cherish the NameNode, protect the Edge Nodes, and develop the right level of indifference to the Data Nodes

• Practice the key recovery cases

• Match training and support to the service expectations

And remember not all things in life scale as well as Hadoop

Questions?
