Scaling MapReduce Applications
across Hybrid Clouds to Meet Soft
Deadlines
Dongseo University, Division of Computer & Information Engineering
Intelligent Smart Systems Research Lab
Presented by:
Ahmed Abdulhakim Al-Absi
About The Paper
2013 IEEE International Conference on Advanced
Information Networking and Applications (AINA)
Cloud Computing and Distributed Systems
(CLOUDS) Laboratory
University of Melbourne, Australia
Michael Mattess, Rodrigo N. Calheiros, and Rajkumar Buyya, Proceedings of the 27th IEEE International Conference on Advanced Information Networking and Applications (AINA 2013), IEEE CS Press, USA, Barcelona, Spain, March 25-28, 2013.
Outline
Motivation
Proposed Policy
Proposed Policy Scenario
Policy Parameters
Policy Parameters Algorithm
Implementation
Performance Evaluation: Experimental Testbed and Sample Application
Performance Analysis: Results – Map Phase
Performance Analysis: Results – Reduce Phase
Conclusion
My Opinion
Motivation
As MapReduce is becoming the prevalent programming model for building data processing applications in Clouds, timely execution of such applications becomes a necessity.
Existing approaches for executing deadline-constrained MapReduce applications focus on meeting deadlines via admission control. However, they cannot solve conflicts when applications with competing deadlines exceed the capacity of the local infrastructure.
Proposed Policy
To tackle this limitation of current approaches, the authors propose a novel policy for dynamic provisioning of public Cloud resources to speed up the execution of MapReduce applications.
The Map phase is the target of the soft deadline because this phase contains tasks that generally have uniform execution times.
Policy System
1. The initial state of the system consists of
local worker nodes registered with the master
node and ready to execute MapReduce tasks.
Datasets are available to the local workers and are also stored in the Cloud provider's storage service.
2. A MapReduce application is submitted to the
master node and the scheduler begins assigning
Map tasks to worker nodes;
3. When a predefined number or fraction of the Map
tasks completes, the master uses the provisioning
policy and, based on the application deadline,
requests additional resources from a Cloud
provider as defined by the policy;
4. Requested resources register with the master node
as they become available and the scheduler assigns
Map tasks to them;
5. When the Map phase completes, the scheduler
begins assigning Reduce tasks to workers;
6. Each Reduce task obtains the intermediate data to
be reduced from other nodes;
7. The output of the Reduce tasks may remain on the
nodes in anticipation of another MapReduce
application, which will further process the data.
Alternatively, the output can be collected by the
master node and sent back to the user that
submitted the application.
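Steps 1–4 above can be sketched as a master-side loop. This is an illustration only: `Task`, `run_map_phase`, and the trigger logic are hypothetical names, not Aneka's actual API.

```python
class Task:
    """Minimal stand-in for a Map task (hypothetical)."""
    def __init__(self, runtime):
        self.runtime = runtime  # measured run time in seconds

def run_map_phase(map_tasks, trigger_fraction, decide_extra_vms):
    """Steps 1-4 in miniature: the master assigns Map tasks; once a
    predefined fraction completes, it averages their run times and asks
    the provisioning policy how many public Cloud VMs to request."""
    completed = []
    requested = 0
    for task in map_tasks:          # steps 1-2: tasks run on local workers
        completed.append(task)
        if requested == 0 and len(completed) >= trigger_fraction * len(map_tasks):
            avg = sum(t.runtime for t in completed) / len(completed)
            requested = decide_extra_vms(avg)  # step 3: consult the policy
            # step 4: the requested VMs would register with the master here
    return requested
```

The key design point is that the policy runs only after the first batch of Map tasks has completed, so the average run time used for the estimate is measured rather than guessed.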
Proposed Policy Scenario
Policy Parameters
MARGIN is a 'safety margin' subtracted from the remaining time to account for errors in the prediction of task execution times.
LOCAL FACTOR is a multiplier applied to the average run time of the first batch of Map tasks to estimate the expected Map task execution time on the local Cluster, accounting for variation in worker performance.
REMOTE FACTOR predicts the expected Map task execution time on public Cloud resources, accounting for the expected difference in performance between local and public Cloud resources.
BOOT TIME is the expected amount of time between requesting a new resource from the public Cloud and that resource becoming ready to execute tasks.
Policy Parameters Algorithm
The algorithm decides the number of public Cloud resources to be allocated to a MapReduce application.
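The decision can be sketched using the four parameters above. The formulas here are illustrative assumptions; the paper's exact algorithm may differ in detail.

```python
import math

def extra_cloud_workers(remaining_tasks, avg_task_time, time_to_deadline,
                        local_workers, margin, local_factor,
                        remote_factor, boot_time):
    """Estimate how many public Cloud VMs to request so the Map phase
    meets its soft deadline (illustrative formulas only)."""
    budget = time_to_deadline - margin              # MARGIN: safety margin
    if budget <= 0:
        return 0                                    # too late to help
    local_task_time = avg_task_time * local_factor    # LOCAL FACTOR
    remote_task_time = avg_task_time * remote_factor  # REMOTE FACTOR

    # Map tasks the local Cluster can still finish before the deadline.
    local_capacity = local_workers * (budget // local_task_time)
    shortfall = remaining_tasks - local_capacity
    if shortfall <= 0:
        return 0                                    # local workers suffice

    # A new VM only starts contributing after BOOT TIME has elapsed.
    remote_budget = budget - boot_time
    tasks_per_vm = remote_budget // remote_task_time if remote_budget > 0 else 0
    if tasks_per_vm < 1:
        return 0                          # VMs could not finish a task in time
    return math.ceil(shortfall / tasks_per_vm)
```

For example, with 400 remaining tasks, a measured 60 s average task time, 30 minutes to the deadline, 8 local workers, a 2-minute margin, factors of 1.0 and 1.5, and a 5-minute boot time, this sketch would request 12 VMs.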
Implementation
Implemented in the Aneka Cloud Platform with some
changes:
Dynamic Provisioning: extended the existing MapReduce Service of Aneka to enable its interaction with the Provisioning Service.
Data Exchange: enabled Aneka MapReduce to work across local and remote resources, using HTTP and IIS to serve the intermediary data files.
Remote Storage: used Amazon S3 as the source of input data files when tasks run on EC2.
Performance Evaluation: Experimental Testbed and Sample Application
The environment is a hybrid Cloud composed of a
local Cluster and a public Cloud:
Local Cluster: 4 IBM System X3200 M3 servers running Citrix XenServer, each hosting 2 Windows Server 2008 virtual machines, for a total of 8 worker nodes.
Public Cloud Resources: provisioned from Amazon EC2, US East Coast data center; m1.small instances with a 1.0-1.2 GHz CPU and 1.7 GB of memory.
Dataset: 4.5 GB, replicated on the local Cluster and on S3.
The MapReduce application: word count
Performance Analysis: Results – Map Phase
The authors sequentially executed several requests for the word count application. On each request, they modified the sleep time in each Map task, while keeping the deadline for completing the Map phase constant at 30 minutes for each application.
Performance Analysis: Results – Reduce Phase
Two intermediate data sizes were used: 352 MB (small data) and 3422 MB (big data). Similarly, a sleep was inserted in the Reduce tasks to increase their execution time and observe how different sizes of Reduce tasks affect the execution time of applications. Two different sleep values were used: 60 seconds and 5 seconds.
Conclusion
This paper presented a dynamic provisioning policy for MapReduce applications and a prototype in the Aneka Cloud Platform.
Results showed that the approach, despite its lower complexity, delivers good results.
The policy was able to meet application deadlines, defined in terms of the completion time of the Map phase, even for increasing Map task execution times and decreasing deadlines.
My Opinion
Q & A
Thank You!