Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning...

Autonomic Cloud

Provisioning For Scientific

WorkflowsRyan Chard

Victoria University of Wellington

Science in the Cloud

Taverna

The Globus Galaxies Platform

A platform for creating cloud-hosted Science as a

Service gateways:

Galaxy to create, manage, execute, share workflows

Globus for data and identity management

HTCondor for job scheduling and execution

AWS for executing analyses

Spot instances

Elastic provisioner

20 gateways, 300+ researchers

Globus Genomics Usage

Cloud Provisioning Challenges:

Naïve use can be expensive and limit performance

38 instance types, multiple regions and availability zones

Compute/memory/network/GPU optimised

Different pricing models (On-demand, spot, reserved)

Differing tool requirements

Technical challenges

Instance setup, tools, dependencies, termination

Autoscaling/resource management

A Globus Galaxies Gateway

Worker

Instance Provisioning

Galaxy

Cost-aware

provisionerGlobus

Provisioning

HistoryTool Profiles

HTCondorQueue

Worker

Provisioning System

Worker

Galaxy

Globus

Provisioning

History

HTCondorQueue

Worker

Cost-aware

provisioner

Tool ProfilesProvisioning

History

Improving the Cloud Provisioner

Simplify cloud usage

Make it cheaper and faster

Monitor arbitrary queues to acquire resources and fulfil workloads

Applications

Platform

ProvisioningService

Cost Analysis

Tool Profiling

Smarter Cloud Use

Broaden search scopes

Use many instance types and all AZs

Use profiles to match workloads to

instances

Encode memory, CPU, and disk

requirements in ClassAds etc.

Evaluate spot market prices

All availability zones (AZs) and all suitable

instance types

Self termination

Over-provision requests

Request more resources than needed

Repurpose excess spot

requests

Migrate requests to idle jobs

Revert to on-demand

Cost Throughput

Platform

ProvisioningService

Cost Analysis

Cost-aware Provisioning Approach

1. Filter instance types with profiles

2. Determine price for each instance type across all AZs

3. Rank potential requests

4. Make requests and monitor

5. Cancel or repurpose excess active requests once one is fulfilled

Evaluation

Six production GG gateways

Galaxy/Condor logs + spot price

history

145 day period

Evaluation by mapping jobs to AWS spot

price history

Single Instance, Single AZ (SI-SAZ)

Single Instance, Multi AZ (SI-MAZ)

Multi Instance, Single AZ (MI-SAZ)

Multi Instance, Multi AZ (MI-MAZ)

Four provisioning scopes

Cost-aware Results

Cost/Day

Cost-aware Results

Increasing the search scope to include multiple instance types and availability zones results in cost savings between 43% and 95% per gateway

Overall reduction in cost of 92.1%

The difference between MI-SAZ and MI-MAZ less than 1%

$27,686.99 $23,321.29 $2,259.36 $2,200.24

SI-SAZ SI-MAZ MI-SAZ MI-MAZ

Profiling Application

Executions

Predict execution time and cost

Enable co-allocation of jobs

Minimise monetary cost of execution

Maximise performance

Applications

Provisioning

Service

Tool Profiling

Matching tools to resources

AWS is flexible

38 instance types

We need a way to determine resource

Identify suitable instance types

Trial and error is tedious

What Why

A profiling service!

Profiler

1. Submit

profiling request 5. Return profiles

Worker

Worker web service

PCP HTCondor

2. Provision

workers

3. Start/monitor profiling

Worker

W orker w eb service

PCP HTCondor

Worker

PCP HTCondor

Worker

PCP HTCondor

4. Parse PCP

log and store

profiles

A Cloud Tool Profiling Service

1. Describe profile requests in

2. Provision resources and apply

a profiling Web Service

3. Use Performance Co-pilot

(PCP) to capture usage

4. Capture and process PCP logs

5. Return profiles as JSON (or logs

via s3)

Evaluation

10 instance types

Instance storage (not EBS)

Five Globus Genomics tools

Range from 20-90 minute exec time

Account for 17.7% of total compute across four gateways

Capture:

Execution time, CPU utilisation, memory utilisation, disk, and network

Assess impact on cost

c3.2xlarge

c3.4xlarge

c3.8xlarge

g2.2xlarge

g2.8xlarge

r3.xlarge

r3.2xlarge

r3.4xlarge

r3.8xlarge

m3.2xlarge

BWA ALN

FASTQC

Bowtie

MarkDups

BWA MEM

BWA ALN

FASTQC

Bowtie

MarkDups

BWA MEM

Tools Instances Profiles

Execution Time

BWA MEM BWA ALN

Utilisation (Bowtie)

CPU Memory

Spot Instance Price

A dataset of 2000 production jobs

from GG gateways

Use submission time and AWS spot

price history to determine price per

instance type

Adaptive fastest, slowest, cheapest,

costliest = quickest, slowest,

cheapest, most expensive instance

Globus Genomics = naïve instance/zone selection

Provisioning as a Service

A service to acquire and manage cloud resources

Gateways subscribe to the service

Combines profiles and cost-aware techniques

Monitor a queue (HTCondor/Spark) and provision resources on

demand

Applications

Platform

ProvisioningService

Cost Analysis

Tool Profiling

The Globus Galaxies Platform

Worker

Galaxy

Cost-aware

provisionerGlobus

Provisioning

HistoryTool Profiles

HTCondorQueue

Worker

Provisioning History

Profiles

Galaxy

Globus

HTCondor

Worker

Cost-aware

provisionerHTCondor

A Provisioning Service

Cost-aware

provisionerHTCondor

Cost-aware

provisionerHTCondor

Provisioning

History

Tool Profiles

Cost-aware

provisionerHTCondor

Galaxy

Globus

HTCondor

Worker

Galaxy

Globus

HTCondor

Worker

Galaxy

Globus

HTCondor

Worker

Provisioning Architecture

Evaluation

Tune configurable parameters and see how it

changes performance

Run rate, idle requirement, repurpose

requests, resource restrictions

A reproducible dataset of production GG workloads (busiest days from 5 gateways)

1000 jobs (total ~8500 in 24 hours)

~3.5 hours of jobs

~8 hours execution duration

Transform exec time (sleep) by instance type

Monitor number of requests, fulfilled

instances, and cost to fulfil workload

Average Wait Per Job

Dashboard like configuration

AZ=1: single availability zone

IR: Idle requirement of jobs before

resources are provisioned

Re: Whether or not requests are

repurposed once orphaned

RR: Run rate, or interval between provisioning rounds

Base: base case (Re=T, RR=5,

IR=120,AZ=All)

Instances and Cost

Summary

Using the cloud naïvely can increase costs and decrease

performance

There are simple techniques that can substantially reduce

the cost of using the cloud

Matching tools to resources is essential for effective cloud

The provisioning service and profiler are available on github

Thanks!

Questions?

Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning...

Documents

Private-Cloud Provisioning in Multi-Cloud Environments...Private-Cloud Provisioning in Multi-Cloud Environments Get it right with 2 levels of abstraction The Provisioning Problem Powerful

Resource provisioning optimization in cloud computing

Distributed Services Scheduling and Cloud Provisioning

Williams Cloud-based Provisioning

Integrating Cloud & Cyberinfrastructure Manish Parashar NSF Cloud and Autonomic Computing Center

Resource Provisioning in Spot Market-based Cloud Computing ... · Resource Provisioning in Spot Market-based Cloud Computing Environments WilliamVoorsluys Submittedintotalfulﬁlment

IBM Smart Cloud Provisioning Overview

Provisioning software defined servers using Cloud OS · Provisioning Software-defined Servers ... Template-based software defined server provisioning HP Cloud OS for Moonshot

ElastMan: Autonomic Elasticity Manager for Cloud-Based Key ...pvr/GrascompCloudDay2012/AlShishtaw… · Autonomic Computing Cloud Computing and Elastic Services Automation of Elasticity

Autonomic Provisioning of Backend Databases in Dynamic Content Web Servers

Optimizing Resource Provisioning in Shared Cloud …delimitrou/papers/2014.techreport.hybrid.pdfOptimizing Resource Provisioning in Shared Cloud Systems Abstract Cloud computing promises

Grouper Provisioning: Locally & Cloud

FLEXIBLE ORGANIZATION OF REPOSITORIES FOR PROVISIONING CLOUD … · 2 REQUIREMENTS OF CLOUD INFRASTRUCTURE PROVISIONING Both grid computing and cloud computing are attempts at utility

Deploying SaaSLicensing & Provisioning for Cloud Applications

Autonomic Provisioning and Application Mapping on Spot ...menasce/cs788/slides/Tebekaemi... · from a cloud provider, the mapping of the various application components to these resources,and

Enabling Automated Network Services Provisioning for Cloud ... · Enabling Automated Network Services Provisioning for Cloud Based Applications Using Zero Touch Provisioning ... the

Cloud Computing Principles and Paradigms: 10 comet cloud-an autonomic cloud engine

Autonomic Resource Provisioning for Cloud-Based Software

Autonomic SLA-driven Provisioning for Cloud Applications Nicolas Bonvin, Thanasis Papaioannou, Karl Aberer Presented by Ismail Alan

Citrix xa xd cloud provisioning webinar