Making sense of performance and identifying stragglers in Data Analytics Framework


CSCI 8780 Advanced Distributed Systems

Manish Ranjan and Narita Pandhe

Introduction

- Large-scale data analytics has become widespread

- Research devoted to improving the performance of data analytics frameworks

- BUT comparatively little effort has been spent on identifying the performance bottlenecks!

More resource efficient

Faster

Experiments

What Cluster Configuration did we use?

- 1 master, 6 slaves

- Master config: 64-bit, 8GB RAM, 2 cores, 50GB SSD

- Slave config (each): 64-bit, 2GB RAM, 1 core, 30GB SSD

Config-related modifications: e.g. replication + SSDs

First: Benchmarking the NameNode

To first test the NameNode hardware and config: NNBench

What it does:

Generates a lot of HDFS-related requests

Why it does it:

To put high HDFS management stress on the NameNode

How it does it:

Simulates requests for creating, reading, renaming and deleting files on HDFS
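The four request types above can be illustrated with a small local analogue. This is only a sketch, not NNBench itself: it runs against a temporary directory on the local filesystem instead of HDFS, and the function name `metadata_stress` and the file count are hypothetical choices made for illustration.

```python
import os
import tempfile
import time


def metadata_stress(num_files=200):
    """Time create, read, rename and delete over many empty files.

    Local-filesystem stand-in (assumption): NNBench issues the same kinds
    of requests against HDFS so that the NameNode does the metadata work.
    """
    timings = {}
    with tempfile.TemporaryDirectory() as base:
        paths = [os.path.join(base, f"file_{i}") for i in range(num_files)]
        renamed = [p + ".renamed" for p in paths]

        start = time.perf_counter()
        for p in paths:
            with open(p, "w"):
                pass  # empty file: the cost is metadata, not data
        timings["create"] = time.perf_counter() - start

        start = time.perf_counter()
        for p in paths:
            with open(p) as f:
                f.read()
        timings["read"] = time.perf_counter() - start

        start = time.perf_counter()
        for src, dst in zip(paths, renamed):
            os.rename(src, dst)
        timings["rename"] = time.perf_counter() - start

        start = time.perf_counter()
        for p in renamed:
            os.remove(p)
        timings["delete"] = time.perf_counter() - start

        # Count what is left so the caller can check every file was removed.
        leftover = len(os.listdir(base))
    return timings, leftover


if __name__ == "__main__":
    timings, leftover = metadata_stress()
    for op, seconds in timings.items():
        print(f"{op:>6}: {seconds:.4f}s")
```

Because the files are empty, the time measured is dominated by metadata operations, which is the kind of load the benchmark is meant to place on the NameNode.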

What Workload did we use?

- TeraSort benchmark suite

- Goal of TeraSort: sort 1TB of data (or any other amount of data you want) as fast as possible.

- Limited by our cluster configuration, we performed several experiments with data of size 1GB, 5GB and 10GB.

- The TeraSort benchmark can be used to iron out your Hadoop configuration
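What the benchmark measures can be sketched in miniature: generate fixed-size records, sort them by key, and validate the order. This is a single-process illustration under stated assumptions, not the distributed Hadoop or Spark implementation; the function names `tera_gen`, `tera_sort` and `tera_validate` are hypothetical and only mirror the three stages of the suite.

```python
import random

KEY_BYTES = 10      # TeraSort records carry a 10-byte key
RECORD_BYTES = 100  # inside a 100-byte record


def tera_gen(num_records, seed=0):
    """Generate random 100-byte records (stand-in for the generation stage)."""
    rng = random.Random(seed)
    return [
        bytes(rng.getrandbits(8) for _ in range(RECORD_BYTES))
        for _ in range(num_records)
    ]


def tera_sort(records):
    """Sort records by their 10-byte key (in memory, single process)."""
    return sorted(records, key=lambda r: r[:KEY_BYTES])


def tera_validate(records):
    """Check that keys are in non-decreasing order."""
    keys = [r[:KEY_BYTES] for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    data = tera_gen(10_000)  # 10,000 records of 100 bytes = 1 MB
    out = tera_sort(data)
    print(len(out) * RECORD_BYTES, "bytes sorted, valid:", tera_validate(out))
```

At 100 bytes per record, the 1GB, 5GB and 10GB inputs correspond to roughly 10, 50 and 100 million records.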

Hadoop

Node legend: i-6c76c1da (M), i-40684ef0 (s1), i-41684ef1 (s2), i-42684ef2 (s3), i-43684ef3 (s4), i-4e684efe (s5), i-4f684eff (s6)

[Figure: per-node plot. Red: s6, Dark Green: s4]

Observations for 10GB

[Figures: per-node plots for the 10GB run]

Identified Stragglers

[Figure: per-node plot. Orange: s2, Red: s6]

[Figure: Hadoop and Spark panels. Red: s6, Bright Blue: s5, Orange: s2]

Conclusions

- A straggler task spends an unusually long amount of time in a particular part of task execution.

- It is usually not too hard to find a straggler for a specific execution; what is hard is to find one consistently!

- Though we were lucky enough to spot a few even in a cluster of mediocre strength, which emphasizes the need to understand the cluster meta info well.

E.g.: DFS disk read time, shuffle write time, shuffle read time, and Java's garbage collection

- Since Spark:

- often breaks jobs into many more tasks

- has much lower task launch overhead than Hadoop
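The straggler definition in the conclusions can be sketched as a simple check over per-task metrics. This is a minimal illustration under stated assumptions: the 1.5x-median cutoff, the field names and the sample values are all hypothetical and are not measurements from the experiments.

```python
from statistics import median

# Phases named in the conclusions; the dictionary keys are hypothetical.
PHASES = ("dfs_read", "shuffle_write", "shuffle_read", "gc")


def find_stragglers(tasks, threshold=1.5):
    """Return (task id, node, dominant phase) for unusually slow tasks.

    A task counts as a straggler here if its duration exceeds
    threshold x the median duration (an assumed cutoff).
    """
    cutoff = threshold * median(t["duration"] for t in tasks)
    stragglers = []
    for t in tasks:
        if t["duration"] > cutoff:
            # The phase with the most time points at where the task was stuck.
            dominant = max(PHASES, key=lambda p: t.get(p, 0.0))
            stragglers.append((t["id"], t["node"], dominant))
    return stragglers


if __name__ == "__main__":
    # Made-up sample data for illustration only (times in seconds).
    tasks = [
        {"id": 0, "node": "node-a", "duration": 10.0,
         "dfs_read": 3.0, "shuffle_write": 2.0, "shuffle_read": 2.0, "gc": 0.5},
        {"id": 1, "node": "node-b", "duration": 11.0,
         "dfs_read": 3.5, "shuffle_write": 2.0, "shuffle_read": 2.5, "gc": 0.5},
        {"id": 2, "node": "node-c", "duration": 30.0,
         "dfs_read": 4.0, "shuffle_write": 2.0, "shuffle_read": 20.0, "gc": 1.0},
    ]
    print(find_stragglers(tasks))  # [(2, 'node-c', 'shuffle_read')]
```

Running the same check over repeated executions and counting how often each node appears is one way to tell a consistent straggler from a one-off slow task.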

References

- Making Sense of Performance in Data Analytics Frameworks, Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, Byung-Gon Chun (UC Berkeley, ICSI, VMware, Seoul National University)

- No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics, https://www.cs.duke.edu/starfish/files/socc11-cluster-sizing.pdf

- http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

- https://github.com/ehiggs/spark-terasort

- aws.amazon.com
