Making sense of performance and identifying stragglers in Data Analytics Frameworks
CSCI 8780 Advanced Distributed Systems
Manish Ranjan and Narita Pandhe
Introduction
- Large-scale data analytics has become widespread
- Much research is devoted to improving the performance of data analytics frameworks
- BUT comparatively little effort has been spent on identifying the performance bottlenecks!
More resource efficient
Faster
Experiments
What Cluster Configuration did we use?
- 1 master, 6 slaves
- Master config: 64-bit, 8GB RAM, 2 cores, 50GB SSD
- Slave config (each): 64-bit, 2GB RAM, 1 core, 30GB SSD
- Config-related modifications: e.g. replication + SSDs
First: benchmarking the NameNode
Tool used to test the NameNode hardware and configuration: NNBench
- What it does: generates a large number of HDFS-related requests
- Why: to put a high HDFS management load on the NameNode
- How: simulates requests for creating, reading, renaming and deleting files on HDFS
What Workload did we use?
- TeraSort benchmark suite
- Goal of TeraSort: sort 1TB of data (or any other amount of data you want) as fast as possible.
- Limited by our cluster configuration, we performed several experiments with data of size 1GB, 5GB and 10GB.
- The TeraSort benchmark can also be used to iron out problems in your Hadoop configuration
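TeraGen, the data generator that accompanies TeraSort, takes its size argument as a number of 100-byte rows rather than as bytes. A minimal sketch of that conversion for the three data sizes above, assuming decimal gigabytes (1GB = 10^9 bytes); the function name is illustrative only:

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(size_gb: int) -> int:
    """Number of 100-byte rows needed for size_gb decimal gigabytes."""
    return size_gb * 10**9 // ROW_BYTES


# Row counts for the data sizes used in the experiments.
for size_gb in (1, 5, 10):
    print(f"{size_gb}GB -> {teragen_rows(size_gb):,} rows")
```

If sizes are meant as binary gigabytes (2^30 bytes), the row counts are about 7% larger.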
Hadoop
Nodes: i-6c76c1da (M), i-40684ef0 (s1), i-41684ef1 (s2), i-42684ef2 (s3), i-43684ef3 (s4), i-4e684efe (s5), i-4f684eff (s6)
Red: s6; Dark Green: s4
Observations for 10GB
Red: s6; Dark Green: s4
Identified Stragglers
Spark
Orange: s2; Red: s6
Hadoop and Spark
Red: s6; Bright Blue: s5; Orange: s2
Conclusions
- A straggler task spends an unusually long time in a particular part of task execution.
- It is usually not too hard to find a straggler in a specific execution. What is hard is to reproduce it consistently!
- Still, we were lucky enough to spot a few even on a modestly sized cluster, which emphasizes the need to understand the cluster meta info well.
Eg: DFS disk read time, shuffle write time, shuffle read time, and Java's garbage collection
- Spark, compared with Hadoop:
- often breaks jobs into many more tasks
- has much lower task launch overhead
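One simple way to flag candidate stragglers from per-task timings is to compare each task's duration with the median for its stage. The sketch below is an illustration only, not the method used to produce the plots in these slides; the 1.5x threshold, the function name, and the sample durations are all assumptions made for the example.

```python
from statistics import median


def find_stragglers(durations, threshold=1.5):
    """Return ids of tasks whose duration exceeds threshold x the median."""
    if not durations:
        return []
    cutoff = threshold * median(durations.values())
    return sorted(task for task, secs in durations.items() if secs > cutoff)


# Hypothetical per-task durations in seconds (not measured data).
sample = {"t1": 10.0, "t2": 11.0, "t3": 9.5, "t4": 31.0, "t5": 10.5}
print(find_stragglers(sample))
```

Applying the same comparison to a single phase, such as shuffle read time or garbage collection time, helps narrow down where a slow task spends its extra time.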
References
- Making Sense of Performance in Data Analytics Frameworks, Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, Byung-Gon Chun; UC Berkeley, ICSI, VMware, Seoul National University
- No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics, https://www.cs.duke.edu/starfish/files/socc11-cluster-sizing.pdf
- http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
- https://github.com/ehiggs/spark-terasort
- aws.amazon.com