
A workflow partitioning and resource provisioning approach for executing large-scale workflows across multiple sites.


Page 1: Partitioning CCGrid 2012

Integration of Workflow Partitioning and Resource Provisioning

Weiwei Chen, Ewa Deelman {wchen,deelman}@isi.edu

Information Sciences Institute University of Southern California

CCGrid 2012, Ottawa, Canada

Page 2: Partitioning CCGrid 2012

Outline
•  Introduction
•  System Overview
•  Solution
   –  Heuristics
   –  Genetic Algorithms
   –  Ant Colony Optimization
•  Evaluation
   –  Heuristics
•  Related Work
•  Q&A

Page 3: Partitioning CCGrid 2012

Introduction
•  Scientific Workflows
   –  A set of jobs and the dependencies between them.
   –  A DAG (Directed Acyclic Graph), where nodes represent computation and directed edges represent data-flow dependencies.
•  Pegasus Workflow Management System
   –  Workflow Planner: Pegasus
      •  Abstract Workflow: portable, execution-site independent
      •  Concrete Workflow: bound to specific sites
   –  Workflow Engine: DAGMan
   –  Resource Provisioner: Wrangler
   –  Execution/Scheduling System: Condor/Condor-G
   –  Environment: Grids, Clouds, Clusters, many-cores


[Figure: example workflow DAG. Job1 fans out to Job2, Job3, and Job4, which fan in to Job5.]
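To make the DAG model concrete, here is a minimal sketch (plain Python, not the Pegasus API; the five-job example matches the figure above, and the topological-sort helper is illustrative):

```python
# Minimal sketch of a workflow DAG: nodes are jobs, directed edges are
# data-flow dependencies (each parent must finish before its child
# starts). Plain Python for illustration -- not the Pegasus API.
workflow = {
    "Job1": [],                        # entry job (no parents)
    "Job2": ["Job1"],
    "Job3": ["Job1"],
    "Job4": ["Job1"],
    "Job5": ["Job2", "Job3", "Job4"],  # fan-in: the sink job
}

def topological_order(dag):
    """Return the jobs in an order that respects all dependencies."""
    order, done = [], set()
    def visit(job):
        if job in done:
            return
        for parent in dag[job]:
            visit(parent)
        done.add(job)
        order.append(job)
    for job in dag:
        visit(job)
    return order

print(topological_order(workflow))
# ['Job1', 'Job2', 'Job3', 'Job4', 'Job5']
```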

Page 4: Partitioning CCGrid 2012

Introduction
•  Background
   –  Large-scale workflows require multiple execution sites to run.
   –  The entire CyberShake earthquake science workflow has 16,000 sub-workflows; each sub-workflow has ~24,000 jobs and requires ~58 GB of data.
   –  A Montage workflow covering an 8-degree square of sky has ~10,000 jobs and requires ~57 GB of data; the Galactic Plane survey covers 360 degrees along the plane and +/-20 degrees on either side of it.


Figure 1.1  Output of the Montage workflow. The image above was recently created to verify a bar in the spiral galaxy M31.

Figure 1.2  CyberShake workflow and example output for the Southern California area.

Page 5: Partitioning CCGrid 2012

Single Site

[Figure: single-site architecture. The Workflow Planner converts a DAX into a DAG for the Workflow Engine; the Job Scheduler, VM Provisioner, and Data Staging services run at the site.]

Page 6: Partitioning CCGrid 2012

Single Site
•  Constraints/Concerns
   –  Storage systems
   –  File systems
   –  Data transfer services
   –  Data constraints
   –  Service constraints

Page 7: Partitioning CCGrid 2012

Multiple Sites, No Partitioning

[Figure: multi-site architecture without partitioning. A single Workflow Planner, Workflow Engine, and Job Scheduler drive the whole DAG; each site has its own VM Provisioner and Data Staging service.]

Page 8: Partitioning CCGrid 2012

Multiple Sites, No Partitioning
•  Constraints/Concerns
   –  Job migration
   –  Load balancing
   –  Overhead
   –  Cost
   –  Deadline
   –  Resource utilization

Page 9: Partitioning CCGrid 2012

Multiple Sites, Partitioning

[Figure: multi-site architecture with partitioning. A Partitioner splits the DAX into sub-workflow DAXes and a Workflow Scheduler assigns each one to a site; each site runs its own Workflow Planner, Workflow Engine, Job Scheduler, VM Provisioner, and Data Staging service to execute its DAG.]

Page 10: Partitioning CCGrid 2012

Solution

•  A hierarchical workflow
   Ø  It contains workflows (sub-workflows) as its jobs (see the sketch after this list).
   Ø  Sub-workflows are planned at the execution sites and matched to the resources there.
•  Workflow Partitioning vs. Job Grouping/Clustering
   Ø  Heterogeneous Environments
      §  MPIDAG, Condor DAG, etc.
   Ø  Data Placement Services
      §  Bulk Data Transfer
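To illustrate the hierarchical model, here is a sketch in which the top-level workflow's jobs are themselves sub-workflows; the plain-Python structure is hypothetical, not the Pegasus DAX API:

```python
# Sketch of a hierarchical workflow: the top-level "jobs" are
# sub-workflows, each planned and executed at one site. Hypothetical
# plain-Python structure for illustration -- not the Pegasus DAX API.
sub_workflow_A = {"jobs": ["J10"], "site": "site1"}
sub_workflow_B = {"jobs": ["J2", "J3", "J6", "J8"], "site": "site2"}

top_level = {
    "subB": {"body": sub_workflow_B, "parents": []},
    "subA": {"body": sub_workflow_A, "parents": ["subB"]},
}

# A dependency between two sub-workflows implies a data transfer
# between their sites.
for name, node in top_level.items():
    print(name, "runs on", node["body"]["site"], "after", node["parents"])
```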


Page 11: Partitioning CCGrid 2012

Solution

•  Resource Provisioning
   Ø  Virtual Cluster Provisioning
   Ø  The number of resources and the types of VM instances (worker node, master node, and I/O node) are the parameters indicating the storage and computational capability of a virtual cluster (see the sketch below).
   Ø  The topology and structure of a virtual cluster balance the load across services (scheduling service, data transfer service, etc.) and avoid bottlenecks.
   Ø  On grids, the data transfer service is usually already available and needs no further configuration.
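A configuration sketch of these virtual-cluster parameters; all field names are hypothetical, chosen for illustration (an actual provisioner such as Wrangler uses its own description format):

```python
# Sketch of the virtual-cluster parameters discussed above. All field
# names are hypothetical; a real provisioner (e.g., Wrangler) uses its
# own description format.
virtual_cluster = {
    "site": "site1",
    "master_nodes": 1,            # runs the scheduling service
    "io_nodes": 1,                # runs the data transfer service
    "worker_nodes": 8,            # computational capability
    "worker_type": "c1.xlarge",   # hypothetical instance type
    "storage_gb": 50,             # storage capacity of the cluster
}

def capacity(vc):
    """Rough capability summary: compute slots and storage."""
    return {"slots": vc["worker_nodes"], "storage_gb": vc["storage_gb"]}

print(capacity(virtual_cluster))
```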


Page 12: Partitioning CCGrid 2012

Data Transfer across Sites

•  A pre-script to transfer data before and after job execution
•  A single data transfer job on demand
•  A bulk data transfer job
   Ø  merges many data transfers into one job (see the sketch below)


[Figure: timelines of the three data transfer approaches. Legend: Computation, Data Transfer]
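A sketch of the bulk approach, collapsing many per-file transfers into a single transfer job; the paths and the job structure are hypothetical:

```python
# Sketch of the bulk approach: many per-file transfers collapse into a
# single transfer job. Paths and the job structure are hypothetical.
pending_transfers = [
    ("site1:/data/f1", "site2:/data/f1"),
    ("site1:/data/f2", "site2:/data/f2"),
    ("site1:/data/f3", "site2:/data/f3"),
]

def make_bulk_transfer_job(transfers):
    """Merge many (src, dst) pairs into one transfer job."""
    return {"type": "bulk_transfer", "pairs": list(transfers)}

job = make_bulk_transfer_job(pending_transfers)
print(f"1 bulk job replaces {len(job['pairs'])} individual transfers")
```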

Page 13: Partitioning CCGrid 2012

Backward Search Algorithm
•  Targets a workflow with a fan-in-fan-out structure.
•  The search operation involves three steps; it starts from the sink job and proceeds backward (see the sketch after the figure caption below).
   –  First, check whether it is safe to add the whole fan structure into the sub-workflow (aggressive search).
   –  If not, a cut is issued between this fan-in job and its parents to avoid cyclic dependencies and to increase parallelism.
   –  Second, a neutral search is performed on its parent jobs, which includes all of its predecessors until the search reaches a fan-out job.
   –  If the partition is still too large, a conservative search is performed that includes all of its predecessors until it reaches a fan-in job or a fan-out job.


Figure 2.3  Search Operation
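A minimal sketch of the three search modes, under the assumption that "fan structure" means all predecessors and that a search stops at (but includes) its boundary job; this is an illustrative reading of the slide, not the paper's exact algorithm:

```python
# Sketch of the three search modes on a DAG given as job -> parents.
# Boundary jobs are included but not crossed; "aggressive" collects
# every predecessor. Illustrative reading of the slide only.

def predecessors(dag, job, stop):
    """Walk backward from `job`; `stop` marks boundary jobs."""
    found, stack = set(), list(dag[job])
    while stack:
        p = stack.pop()
        if p in found:
            continue
        found.add(p)
        if not stop(p):              # keep walking past non-boundary jobs
            stack.extend(dag[p])
    return found

def fan_out(dag, job):               # job with more than one child
    return sum(job in parents for parents in dag.values()) > 1

def fan_in(dag, job):                # job with more than one parent
    return len(dag[job]) > 1

def search(dag, job, mode):
    if mode == "aggressive":         # whole fan structure
        return predecessors(dag, job, stop=lambda p: False)
    if mode == "neutral":            # stop at fan-out jobs
        return predecessors(dag, job, stop=lambda p: fan_out(dag, p))
    if mode == "conservative":       # stop at fan-in or fan-out jobs
        return predecessors(dag, job,
                            stop=lambda p: fan_in(dag, p) or fan_out(dag, p))

dag = {"J1": [], "J2": ["J1"], "J3": ["J1"], "J6": ["J2", "J3"], "J10": ["J6"]}
print(search(dag, "J10", "conservative"))   # {'J6'}: J6 is a fan-in job
```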

Page 14: Partitioning CCGrid 2012

Heuristics (Storage Constraints)
•  Heuristic I
   –  Dependencies between sub-workflows should be reduced, since they represent data transfer between sites.
   –  Jobs that have parent-child relationships usually share a lot of data, so it is reasonable to schedule such jobs into the same sub-workflow.
   –  Heuristic I only checks three types of nodes (the fan-out job, the fan-in job, and the parents of the fan-in job) and searches for the potential candidate jobs that have parent-child relationships with them.
   –  A check operation verifies whether a job and its potential candidate jobs can be added to a sub-workflow without violating constraints (see the sketch after this list).
   –  Our algorithm reduces the time complexity of check operations by a factor of n, where n is the average depth of the fan-in-fan-out structure.
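A minimal sketch of the check operation, assuming a simple additive storage model; the job sizes and the 50-unit constraint mirror the worked example on the next slide:

```python
# Sketch of the check operation: can partition P, job J, and J's
# candidate list CL coexist within the storage constraint? The job
# sizes and the 50-unit limit mirror the worked example that follows.
STORAGE_LIMIT = 50

def check(partition, job, candidates, size):
    """True if Sum(CL + J + P) stays within the storage constraint."""
    total = (sum(size[j] for j in partition) + size[job]
             + sum(size[j] for j in candidates))
    return total <= STORAGE_LIMIT

size = {f"J{i}": 10 for i in range(1, 11)}           # each job: 10 units
print(check(set(), "J8", {"J2", "J3", "J6"}, size))  # 40 <= 50 -> True
```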


Page 15: Partitioning CCGrid 2012

Heuristic I: worked example (storage constraint = 50; each job = 10)

•  Aggressive search from the sink: J = J10, Candidate List (CL) = {J1, J2, J3, J4, J5, J6, J7, J8, J9}, Partition P1 = {}; Sum(CL+J+P) = 100 > 50, so the whole fan structure does not fit.
•  Less aggressive search: P1 = {J10}.
   –  Examine J8: CL = {J2, J3, J6}; Sum(CL+J+P) = 40 < 50, so P2 = {J2, J3, J6, J8}.
   –  Examine J9: CL = {J4, J5, J7}; Sum(CL+J+P) = 80 > 50, so a new partition P3 = {J4, J5, J7, J9} is created.
   –  Examine J1: Sum(CL+J+P) = 10 < 50; P4 = {J1}.
•  Final result: P1 = {J10}, P2 = {J2, J3, J6, J8}, P3 = {J4, J5, J7, J9}, P4 = {J1}.

[Figure: the example workflow. J1 fans out to J2, J3, J4, J5; J2 and J3 feed J6, while J4 and J5 feed J7; J6 feeds J8 and J7 feeds J9; J8 and J9 fan in to J10. Legend: Scheduled, Being Examined, Candidate, Not Examined, Partition]

Page 16: Partitioning CCGrid 2012

Heuristics/Hints
•  Two other heuristics (see the sketch after the figure caption below)
   –  Heuristic II adds a job to a sub-workflow if all of its unscheduled children can be added to that sub-workflow.
   –  For a job with multiple children, Heuristic III adds it to a sub-workflow only when all of its children have been scheduled.

Figure 2.4  Heuristics I, II, and III (from left to right) partition an example workflow into different sub-workflows.
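A sketch of the two admission conditions, assuming a DAG given as job-to-parents and a `fits` predicate standing in for the storage and cross-dependency checks; the names are illustrative:

```python
# Sketch of the two admission conditions. `dag` maps job -> parents;
# `fits` stands in for the storage/cross-dependency checks.
# Illustrative reading of the slide, not the paper's full algorithm.

def children_of(dag, job):
    return [j for j, parents in dag.items() if job in parents]

def heuristic_2_allows(dag, job, partition, scheduled, fits):
    """Heuristic II: add `job` if all of its unscheduled children
    can be added to the same sub-workflow without breaking constraints."""
    unscheduled = [c for c in children_of(dag, job) if c not in scheduled]
    return fits(set(partition) | {job} | set(unscheduled))

def heuristic_3_allows(dag, job, scheduled):
    """Heuristic III: a job with multiple children is deferred until
    every one of its children has been scheduled."""
    kids = children_of(dag, job)
    return len(kids) <= 1 or all(c in scheduled for c in kids)

dag = {"J1": [], "J2": ["J1"], "J3": ["J1"]}
fits = lambda jobs: len(jobs) <= 3                        # toy constraint
print(heuristic_2_allows(dag, "J1", set(), set(), fits))  # True
print(heuristic_3_allows(dag, "J1", scheduled={"J2"}))    # False: J3 pending
```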

Page 17: Partitioning CCGrid 2012

Heuristic II: check unscheduled children (worked example, constraint = 50)

•  The first step is similar to Heuristic I: J10 goes into P1, so P1 = {J10}.
•  Examine J8: CL = {J6}; Sum(CL+J+P) = 20 < 50, so P2 = {J8, J6}. Similarly to J8, J2 and J3 are added, giving P2 = {J8, J2, J3, J6}.
•  Examine J1: CL = {J4, J5, J7, J9}; Sum(CL+J+P) = 90 > 50 against P2, so a new partition is created; against the empty P3, Sum(CL+J+P) = 50, so P3 = {J1, J4, J5, J7, J9}.
•  Final result: P1 = {J10}, P2 = {J8, J2, J3, J6}, P3 = {J1, J4, J5, J7, J9}.

[Figure: the same example workflow. Legend: Scheduled, Being Examined, Candidate, Not Examined, Partition]

Page 18: Partitioning CCGrid 2012

Heuristic III: all children should be examined (worked example, constraint = 50)

•  The first step is similar to Heuristic I: J10 goes into P1, so P1 = {J10}.
•  Examine J8: CL = {J6}; Sum(CL+J+P) = 20 < 50 and J6 has no unexamined children, so P2 = {J8, J6}. Similarly to J8, J2 and J3 are added, giving P2 = {J8, J2, J3, J6}.
•  Examine J1: CL = {J4}; J1 still has an unexamined child (J4), so it is deferred. Similarly to J8, J9, J7, J4, J5, and J1 are then placed together, giving P3 = {J1, J4, J5, J7, J9}.
•  Final result: P1 = {J10}, P2 = {J8, J2, J3, J6}, P3 = {J1, J4, J5, J7, J9}.

[Figure: the same example workflow. Legend: Scheduled, Being Examined, Candidate, Not Examined, Partition]

Page 19: Partitioning CCGrid 2012

Genetic Algorithm


[Figure: chromosome encoding. Each gene assigns a job or a VM to a site: Job1..Job5 get sites (1, 2, 2, 1, 2) and VM1..VM6 get sites (2, 2, 2, 1, 1, 1).]

Page 20: Partitioning CCGrid 2012

Fitness Functions

•  Weighted objective:
   min( α · Makespan / Deadline + β · Cost / Budget )
•  With constraints:
   min(Makespan), subject to Cost < Budget
   min(Cost), subject to Makespan < Deadline
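A sketch of the weighted fitness function for the chromosome encoding of the previous slide; ALPHA, BETA, DEADLINE, BUDGET, and the toy makespan/cost models are illustrative stand-ins for a real workflow estimate:

```python
# Sketch of the weighted fitness above, for a chromosome that assigns
# each of 5 jobs and 6 VMs to a site (the encoding of the previous
# slide). ALPHA, BETA, DEADLINE, BUDGET, and the toy makespan/cost
# models are illustrative stand-ins for a real workflow estimate.
import random

ALPHA, BETA = 0.5, 0.5
DEADLINE, BUDGET = 100.0, 200.0

def fitness(chromosome, makespan, cost):
    """Lower is better: normalized, weighted makespan and cost."""
    return (ALPHA * makespan(chromosome) / DEADLINE
            + BETA * cost(chromosome) / BUDGET)

def toy_makespan(ch): return 50 + 10 * ch.count(1)   # stand-in model
def toy_cost(ch):     return 20 * ch.count(2)        # stand-in model

population = [[random.choice([1, 2]) for _ in range(11)] for _ in range(20)]
best = min(population, key=lambda ch: fitness(ch, toy_makespan, toy_cost))
print(best, fitness(best, toy_makespan, toy_cost))
```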

Page 21: Partitioning CCGrid 2012

Ant Colony Optimization

[Figure: the same job/VM-to-site encoding as in the GA. Global optimization works on the full assignment (Job1..Job5, VM1..VM6); local optimization refines the assignment within each site's jobs and VMs.]
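For flavor, a generic ant-colony sketch of the job-to-site assignment; the pheromone model, evaporation rate, ant count, and toy load-balance objective are assumptions for illustration, not the paper's formulation:

```python
# Sketch: generic ant-colony assignment of jobs to sites, in the
# spirit of the encoding above. The pheromone model, parameters, and
# the scoring function are illustrative assumptions.
import random

JOBS, SITES = 5, 2
pheromone = [[1.0] * SITES for _ in range(JOBS)]   # tau[job][site]

def build_solution():
    """Each ant picks a site per job, weighted by pheromone."""
    return [random.choices(range(SITES), weights=pheromone[j])[0]
            for j in range(JOBS)]

def score(assignment):
    """Toy objective: balance jobs across sites (lower is better)."""
    load = [assignment.count(s) for s in range(SITES)]
    return max(load) - min(load)

for _ in range(50):                      # iterations
    ants = [build_solution() for _ in range(10)]
    best = min(ants, key=score)
    for j in range(JOBS):                # evaporate, then reinforce best
        for s in range(SITES):
            pheromone[j][s] *= 0.9
        pheromone[j][best[j]] += 1.0

print(build_solution())
```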

Page 22: Partitioning CCGrid 2012

Scheduling Sub-workflows
•  Estimating the overall runtime of sub-workflows (see the sketch after this list)
   –  Critical Path
   –  Average CPU Time: the cumulative CPU time of all jobs divided by the number of available resources.
   –  Earliest Finish Time: the moment the last sink job completes.
•  Provisioning resources based on the estimation results
•  Scheduling sub-workflows onto sites
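A sketch of the first two estimators, assuming per-job runtimes and a job-to-parents map; this is an illustrative reading of the definitions above:

```python
# Sketch of the first two estimators above, on a DAG of job -> parents
# with per-job runtimes. Illustrative reading of the slide.

def critical_path(dag, runtime):
    """Longest runtime-weighted path ending at any job."""
    memo = {}
    def finish(job):
        if job not in memo:
            start = max((finish(p) for p in dag[job]), default=0.0)
            memo[job] = start + runtime[job]
        return memo[job]
    return max(finish(j) for j in dag)

def average_cpu_time(dag, runtime, resources):
    """Cumulative CPU time of all jobs / number of available resources."""
    return sum(runtime.values()) / resources

dag = {"J1": [], "J2": ["J1"], "J3": ["J1"], "J4": ["J2", "J3"]}
runtime = {"J1": 5.0, "J2": 3.0, "J3": 7.0, "J4": 2.0}
print(critical_path(dag, runtime))        # 5 + 7 + 2 = 14.0
print(average_cpu_time(dag, runtime, 2))  # 17 / 2 = 8.5
```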


Page 23: Partitioning CCGrid 2012

Evaluation: Heuristics
•  In this example, we aim to reduce data movement and makespan under storage constraints.
•  Workflows used:
   –  Montage: an astronomy application, I/O intensive, ~10,000 tasks and 57 GB of data.
   –  CyberShake: a seismology application, memory intensive, ~24,000 tasks and 58 GB of data.
   –  Epigenomics: a bioinformatics application, CPU intensive, ~1,500 tasks and 23 GB of data.
   –  Each workflow was run five times.

Page 24: Partitioning CCGrid 2012

Performance: CyberShake
•  Heuristic I produces 4 sub-workflows with 3 dependencies between them; Heuristic II produces 5 sub-workflows with 10 dependencies; Heuristic III produces 4 sub-workflows with 5 dependencies.
•  Heuristics II and III simply add a job if it does not violate the storage or cross-dependency constraints.
•  Heuristic I performs better in terms of both runtime reduction and disk usage because it tends to put the whole fan structure into the same sub-workflow.

Page 25: Partitioning CCGrid 2012

Performance: CyberShake
•  Storage constraints
•  With more sites and partitions, data movement increases although computational capability improves.
•  The CyberShake workflow across two sites with a storage constraint of 35 GB performs best.

Page 26: Partitioning CCGrid 2012

Performance of Estimator and Scheduler
•  Three estimators and two schedulers are evaluated with the CyberShake workflow.
•  The combination of the EFT estimator and the HEFT scheduler (EFT+HEFT) performs best (by more than 10%).
•  The HEFT scheduler is slightly better than the MinMin scheduler with all three estimators.

Page 27: Partitioning CCGrid 2012

Publications

•  Integration of Workflow Partitioning and Resource Provisioning, Weiwei Chen, Ewa Deelman, accepted, The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), Doctoral Symposium, Ottawa, Canada, May 13-15, 2012.
•  Improving Scientific Workflow Performance using Policy Based Data Placement, Muhammad Ali Amer, Ann Chervenak, and Weiwei Chen, accepted, 2012 IEEE International Symposium on Policies for Distributed Systems and Networks, Chapel Hill, NC, July 2012.
•  Fault Tolerant Clustering in Scientific Workflows, Weiwei Chen, Ewa Deelman, accepted, IEEE International Workshop on Scientific Workflows (SWF), in conjunction with the 8th IEEE World Congress on Services, Honolulu, Hawaii, June 2012.
•  Workflow Overhead Analysis and Optimizations, Weiwei Chen, Ewa Deelman, The 6th Workshop on Workflows in Support of Large-Scale Science, in conjunction with Supercomputing 2011, Seattle, November 2011.
•  Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints, Weiwei Chen, Ewa Deelman, 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Poland, September 2011.

Page 28: Partitioning CCGrid 2012

Future Work

•  GA and ACO: Efficiency
•  Provisioning Algorithms
•  Other Algorithms

Page 29: Partitioning CCGrid 2012

Q&A Thank you!

For further info: ���pegasus.isi.edu

www.isi.edu/~wchen

29