Scheduling Mixed Parallel Applications with Reservations


Henri Casanova

Information and Computer Science Dept.

University of Hawaiʻi at Manoa


Mixed Parallelism

• Both task- and data-parallelism
• “Malleable tasks with precedence constraints”

[Figure: a mixed-parallel schedule, processors vs. time]

Mixed Parallelism

• Mixed parallelism arises in many applications, many of them scientific workflows
• Example: image processing applications that apply a graph of data-parallel filters, e.g., [Hastings et al., 2003]
• Many workflow toolkits support mixed-parallel applications, e.g., [Stef-Praun et al., 2007], [Kanazawa, 2005], [Hunold et al., 2003]

Mixed-Parallel Scheduling

• Mixed-parallel scheduling has been studied by several researchers
  • NP-hard, with guaranteed algorithms [Lepère et al., 2001] [Jansen et al., 2006]
• Several heuristics have been proposed in the literature
  • One-step algorithms [Boudet et al., 2003] [Vydyanathan et al., 2006]
    • Task allocation and task mapping decisions happen concurrently
  • Two-step algorithms [Radulescu et al., 2001] [Bandala et al., 2006] [Rauber et al., 1998] [Suter et al., 2007]
    • First, compute task allocations
    • Second, map tasks to processors using some standard list-scheduling approach

The Allocation Problem

• We can give each task very few (one?) processors
  • We have tasks that run for a long time
  • But we can run many of them in parallel
• We can give each task many (all?) processors
  • We have tasks that run quickly, but typically with diminishing returns due to parallel efficiencies below 1
  • But we can’t run many tasks in parallel
• Trade-off: parallelism vs. task execution times
• Question: how do we achieve a good trade-off?

Critical Path and Work

[Figure: a schedule of task rectangles, processors vs. time]

Two constraints:
• Makespan × #procs ≥ total work
• Makespan ≥ critical path length

Total work = sum of the rectangle areas (processors × execution time, over all tasks)
Critical path length = execution time of the longest path in the DAG

Work vs. CP Trade-off

[Figure: critical path length and total work / # procs plotted against task allocations (small to large), with the best lower bound on makespan marked]
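Combining the two constraints gives makespan ≥ max(critical path length, total work / #procs): larger allocations shorten the critical path but, with parallel efficiencies below 1, increase total work. A minimal Python sketch that evaluates this bound for fixed allocations, assuming the DAG is given as a successor map; the names exec_time, alloc, succ, and makespan_lower_bound are ours, for illustration only.

```python
def makespan_lower_bound(exec_time, alloc, succ, num_procs):
    """Lower bound max(critical path length, total work / #procs).

    exec_time[t]: execution time of task t under its allocation
    alloc[t]:     number of processors allocated to task t
    succ[t]:      list of children of task t in the DAG
    """
    # Total work: sum of the rectangle areas (processors x time).
    work = sum(alloc[t] * exec_time[t] for t in exec_time)

    # Critical path: longest path through the DAG, memoized per task.
    longest = {}

    def longest_from(t):
        if t not in longest:
            longest[t] = exec_time[t] + max(
                (longest_from(c) for c in succ[t]), default=0.0)
        return longest[t]

    cp = max(longest_from(t) for t in exec_time)
    return max(cp, work / num_procs)


# Diamond DAG A -> {B, C} -> D (toy numbers).
exec_time = {"A": 4.0, "B": 6.0, "C": 3.0, "D": 2.0}
alloc = {"A": 2, "B": 1, "C": 4, "D": 2}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(makespan_lower_bound(exec_time, alloc, succ, 4))  # 12.0 (critical path A-B-D)
print(makespan_lower_bound(exec_time, alloc, succ, 2))  # 15.0 (work 30 / 2 procs)
```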

The CPA 2-Step Algorithm

• Original algorithm [Radulescu et al., 2001], for a homogeneous platform:
  • Start by allocating 1 processor to all tasks
  • Then pick a task and increase its allocation by 1 processor, picking the task that benefits the most from one extra processor in terms of execution time
  • Repeat until the critical path length and the total work / # procs become approximately equal
• Improved algorithm [Suter et al., 2007]: uses an empirically better stopping criterion
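A minimal Python sketch of the allocation phase described above, for a homogeneous platform. Several details are assumptions of ours rather than statements from the slide: execution times follow Amdahl’s law with a per-task sequential fraction, only tasks on the current critical path are candidates for an extra processor (growing a task off the critical path cannot shorten it), and “approximately equal” is implemented as stopping at the first iteration where the critical path length no longer exceeds total work / #procs. The list-scheduling phase is omitted.

```python
def amdahl_time(t1, alpha, p):
    """Execution time on p processors; alpha is the sequential fraction (assumed model)."""
    return t1 * (alpha + (1.0 - alpha) / p)


def critical_path(succ, times):
    """Return (length, tasks) of one longest path in the DAG."""
    bl = {}

    def bottom_level(t):
        if t not in bl:
            bl[t] = times[t] + max((bottom_level(c) for c in succ[t]), default=0.0)
        return bl[t]

    for t in succ:
        bottom_level(t)
    # The task with the largest bottom level starts a longest path;
    # follow the child with the largest bottom level at each step.
    t = max(succ, key=lambda x: bl[x])
    path = [t]
    while succ[t]:
        t = max(succ[t], key=lambda x: bl[x])
        path.append(t)
    return bl[path[0]], path


def cpa_allocations(tasks, succ, num_procs):
    """tasks[t] = (sequential time, sequential fraction). Returns {task: procs}."""
    alloc = {t: 1 for t in tasks}
    while True:
        times = {t: amdahl_time(*tasks[t], alloc[t]) for t in tasks}
        cp_len, cp_tasks = critical_path(succ, times)
        work = sum(alloc[t] * times[t] for t in tasks)
        # Stop once the critical path no longer dominates total work / #procs.
        if cp_len <= work / num_procs:
            return alloc
        growable = [t for t in cp_tasks if alloc[t] < num_procs]
        if not growable:
            return alloc
        # One more processor for the critical-path task whose time drops most.
        best = max(growable,
                   key=lambda t: times[t] - amdahl_time(*tasks[t], alloc[t] + 1))
        alloc[best] += 1


tasks = {"A": (10.0, 0.1), "B": (20.0, 0.1), "C": (20.0, 0.1), "D": (10.0, 0.1)}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(cpa_allocations(tasks, succ, num_procs=8))
```

Each iteration adds one processor to some task and no task exceeds num_procs, so the loop always terminates.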

Presentation Outline

Mixed-Parallel Scheduling

The Scheduling Problem with Reservations

Models and Assumptions

Algorithms for Minimizing Makespan

Algorithms for Meeting a Deadline

Conclusion

Batch Scheduling and Reservations

• Platforms are shared by users, today typically via batch schedulers
• Batch schedulers have known drawbacks, e.g., non-deterministic queue waiting times
• In many scenarios, one needs guarantees regarding application completion times
• As a result, most batch schedulers today support advance reservations: one can acquire reservations for some number of processors and for some period of time

Reservations

[Figure: a reservation schedule, processors vs. time]

We have to schedule around the existing reservations, i.e., in the holes in the reservation schedule

Reservations

[Figure: a reservation schedule, processors vs. time]

One reservation per task

Complexity

• The makespan minimization problem is NP-hard at several levels (and thus so is meeting a deadline)
• Mixed-parallel scheduling is NP-hard
  • Guaranteed algorithms [Lepère et al., 2001] [Jansen et al., 2006]
• Scheduling independent tasks with reservations is NP-hard and unapproximable in general [Eyraud-Dubois et al., 2007]
  • Guaranteed algorithms exist under restrictions
• Guaranteed algorithms for mixed-parallel scheduling with reservations are an open problem
• In this work we focus on developing heuristics

Models and Assumptions

Application
• We assume that the application is fully specified and static
  • Conservative reservations can be used to be safe
• Random DAGs are generated using the method in [Suter et al., 2007]
• Data-parallelism is modeled based on Amdahl’s law

Platform
• We assume that the reservation schedule does not change while we compute the schedule
• We assume that we know the reservation schedule
  • Access to it is sometimes not enabled by cluster administrators
• We ignore communication between tasks
  • Since a parent task may complete well before one of its children can start, data must be written to disk anyway
  • Communication costs can be modeled via the task execution time and/or the Amdahl’s law parameter
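For reference, the usual form of Amdahl’s law, in notation of our own choosing: T_t(1) is the sequential execution time of task t and α_t its non-parallelizable fraction (the exact parameterization used in the experiments is not given here). W_t(p) is the work, i.e., the CPU time consumed on p processors.

```latex
T_t(p) = T_t(1)\left(\alpha_t + \frac{1-\alpha_t}{p}\right),
\qquad
W_t(p) = p\,T_t(p) = T_t(1)\bigl(1 + (p-1)\,\alpha_t\bigr)
```

For α_t > 0 the work grows linearly with p, so parallel efficiency T_t(1) / W_t(p) is below 1 and larger allocations consume more CPU-hours.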

Minimizing Makespan

• Natural approach: adapt the CPA algorithm
  • It’s a simple algorithm: the first phase computes allocations, the second phase performs list scheduling
• Problem: allocations are computed without considering reservations
  • Considering reservations would involve considering time, which is only done in the second phase
• Greedy approach:
  • Sort the tasks by decreasing bottom-level
  • For each task in this order, determine the best feasible processor allocation, i.e., the one that has the earliest completion time
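A minimal Python sketch of this greedy step, assuming the reservation schedule is a list of (start, end, processors) rectangles on num_procs identical processors, and that each placed task becomes one more reservation. The names (Reservation, earliest_start, greedy_schedule) and the max_alloc hook, where a bound on allocations could be plugged in, are our own; this is not the authors’ implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: float
    end: float
    procs: int


def used_procs(busy, t):
    """Processors reserved at time t."""
    return sum(r.procs for r in busy if r.start <= t < r.end)


def fits(busy, num_procs, p, start, duration):
    """Can p processors be held over [start, start + duration)?"""
    # Usage can only rise where a reservation starts, so the peak over the
    # window is reached at its start or at a reservation start inside it.
    points = [start] + [r.start for r in busy if start < r.start < start + duration]
    return all(used_procs(busy, t) + p <= num_procs for t in points)


def earliest_start(busy, num_procs, p, duration, ready):
    """Earliest feasible start >= ready (assumes p <= num_procs)."""
    # The earliest start is either `ready` or the end of some reservation.
    for s in sorted({ready} | {r.end for r in busy if r.end > ready}):
        if fits(busy, num_procs, p, s, duration):
            return s
    raise AssertionError("unreachable when p <= num_procs")


def greedy_schedule(tasks, preds, bl, exec_time, reservations, num_procs, max_alloc):
    """Place tasks by decreasing bottom level, each with the allocation that
    completes earliest. Returns {task: (start, end, procs)}."""
    busy = list(reservations)
    schedule = {}
    for t in sorted(tasks, key=lambda x: -bl[x]):
        ready = max((schedule[u][1] for u in preds[t]), default=0.0)
        best = None
        for p in range(1, max_alloc(t) + 1):
            d = exec_time(t, p)
            s = earliest_start(busy, num_procs, p, d, ready)
            if best is None or s + d < best[1]:
                best = (s, s + d, p)
        schedule[t] = best
        busy.append(Reservation(*best))  # one reservation per task
    return schedule


# 4 processors, 3 of them reserved during [0, 10); one perfectly parallel task.
sched = greedy_schedule(["A"], {"A": []}, {"A": 1.0}, lambda t, p: 20.0 / p,
                        [Reservation(0, 10, 3)], num_procs=4, max_alloc=lambda t: 4)
print(sched)  # {'A': (10, 15.0, 4)}
```

With positive execution times, decreasing bottom level is also a topological order, so every parent is placed before its children. In the example, waiting for the reservation to end and using all 4 processors beats starting immediately on the single free one.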

Example

[Figure: reservation schedule (processors vs. time) for a DAG with tasks A, B, C, and D, showing the possible task configurations]

Computing Bottom-Levels

• Problem: computing bottom levels (BLs) requires that we know task execution times
• Task execution times depend on allocations, but we compute the allocations afterwards, using the bottom levels
• We compare four ways to compute BLs:
  • Use 1-processor allocations
  • Use “all”-processor allocations
  • Use CPA-computed allocations, using all processors
  • Use CPA-computed allocations, using the historical average number of non-reserved processors
• We find that the 4th method is marginally better
  • It wins in 78.4% of our simulations (more details on simulations later)
• All results hereafter use this method for computing BLs
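A minimal Python sketch of bottom-level computation for a given allocation map, assuming Amdahl’s-law execution times with a per-task sequential fraction (our parameterization; the function and argument names are hypothetical). The toy DAG shows why the choice of allocations matters: with 1-processor allocations B outranks C, while with 8-processor allocations C outranks B, so the greedy task order changes.

```python
def bottom_levels(succ, alloc, seq_time, alpha):
    """Bottom level of each task: its own execution time under its allocation
    (assumed Amdahl model) plus the largest bottom level among its children."""
    bl = {}

    def visit(t):
        if t not in bl:
            own = seq_time[t] * (alpha[t] + (1.0 - alpha[t]) / alloc[t])
            bl[t] = own + max((visit(c) for c in succ[t]), default=0.0)
        return bl[t]

    for t in succ:
        visit(t)
    return bl


succ = {"A": ["B", "C"], "B": [], "C": []}
seq_time = {"A": 10.0, "B": 40.0, "C": 20.0}
alpha = {"A": 0.5, "B": 0.0, "C": 1.0}  # B perfectly parallel, C sequential

bl_one = bottom_levels(succ, {t: 1 for t in succ}, seq_time, alpha)
bl_all = bottom_levels(succ, {t: 8 for t in succ}, seq_time, alpha)
print(bl_one)  # {'B': 40.0, 'C': 20.0, 'A': 50.0}
print(bl_all)  # {'B': 5.0, 'C': 20.0, 'A': 25.625}
```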

Bounding Allocations

• A known problem with such a greedy approach is that allocations are too large
  • The resulting reduction in parallelism ends up being detrimental to makespan
• Let’s try to bound allocations. Three methods:
  • BD_HALF: bound to half of the processors
  • BD_CPA: bound by the allocations in the CPA schedule computed using all processors
  • BD_CPAR: bound by the allocations in the CPA schedule computed using the historical average number of non-reserved processors
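A small Python sketch of the three bounds as a function that a greedy allocation loop could consult as its upper limit on candidate allocations. Assumptions of ours: BD_HALF rounds half of the processors down (never below 1), BD_ALL, which appears in the results, is read as “no bound”, and cpa_all / cpa_avg are hypothetical precomputed CPA allocation maps.

```python
def allocation_bound(method, task, num_procs, cpa_all=None, cpa_avg=None):
    """Largest allocation the greedy algorithm may consider for `task`.

    cpa_all: CPA allocations computed assuming all processors are available
    cpa_avg: CPA allocations computed with the historical average number of
             non-reserved processors
    """
    if method == "BD_ALL":   # no bound (our reading of the name)
        return num_procs
    if method == "BD_HALF":  # half of the processors, rounded down
        return max(1, num_procs // 2)
    if method == "BD_CPA":
        return min(cpa_all[task], num_procs)
    if method == "BD_CPAR":
        return min(cpa_avg[task], num_procs)
    raise ValueError(f"unknown method {method!r}")


# Example: the bound used for task "A" on a 7-processor cluster.
print(allocation_bound("BD_CPAR", "A", 7, cpa_avg={"A": 3}))  # 3
```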

Reservation Schedule Model?

• We conduct our experiments in simulation: cheap, repeatable, controllable
• We need to simulate environments for given reservation schedules
• Question: what does a typical reservation schedule look like?
• Answer: we don’t really know yet
  • There is no “reservation schedule” archive
  • Let’s look at what people have done in the past...

Synthetic Reservation Schedules

• We have schedules of batch jobs, e.g., the “Parallel Workloads Archive” by D. Feitelson
• Typical approach, e.g., in [Smith et al., 2000]:
  • Take a batch job schedule
  • Mark some jobs as “reserved”
  • Remove all other jobs
• Problem: the amount of reservation is then approximately constant over time, while in the real world we expect it to decrease with time
  • And we do see it behave this way in a real-world 2.5-year trace from the Grid5K platform
• We should generate reservation schedules where the amount of reservation decreases with time

Synthetic Reservation Schedules

• Three methods to “drop” reservations after the simulated application start time:
  • Linearly or exponentially, so that there are no reservations after 7 days
  • Based on job submission time
• Preliminary evaluations for 4 logs from the “Parallel Workloads Archive” indicate that the exponential method leads to schedules that are more correlated with the Grid5K data
  • But this is not conclusive because we have only one (good) data set at this point
• We run simulations with the 4 logs, the 3 methods above, and with the Grid5K data
• Bottom line for this work: for our purpose, we do not observe discrepancies in our results across any of the above
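A minimal Python sketch of the linear and exponential dropping methods, under assumptions of ours: a reservation starting delta seconds after the application start is kept with a probability that decreases with delta and is zero from 7 days on, reservations already under way at the application start are always kept, and the exponential decay rate (one per day here) is a hypothetical parameter. The method based on job submission time is not sketched.

```python
import math
import random

SECONDS_PER_DAY = 24 * 3600.0
HORIZON = 7 * SECONDS_PER_DAY  # no reservations kept beyond 7 days


def keep_probability(delta, method, rate=1.0 / SECONDS_PER_DAY):
    """Probability of keeping a reservation that starts `delta` seconds after
    the application start. `rate` is a hypothetical exponential decay rate."""
    if delta <= 0:
        return 1.0  # already under way: assumed always kept
    if delta >= HORIZON:
        return 0.0
    if method == "linear":
        return 1.0 - delta / HORIZON
    if method == "exponential":
        return math.exp(-rate * delta)
    raise ValueError(f"unknown method {method!r}")


def drop_reservations(reservations, app_start, method, rng):
    """reservations: list of (start, end, procs) marked as reserved in a batch log."""
    kept = []
    for start, end, procs in reservations:
        if end <= app_start:
            continue  # over before the application starts: irrelevant
        if rng.random() < keep_probability(start - app_start, method):
            kept.append((start, end, procs))
    return kept


log = [(0.0, 3600.0, 16),
       (2 * SECONDS_PER_DAY, 3 * SECONDS_PER_DAY, 32),
       (8 * SECONDS_PER_DAY, 9 * SECONDS_PER_DAY, 64)]
print(drop_reservations(log, app_start=600.0, method="exponential",
                        rng=random.Random(42)))
```

The first reservation is always kept and the last is always dropped; whether the middle one survives depends on the random draw.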

Simulation Procedure

• We use 40 application specifications (DAG size, width, regularity, etc.), with 20 samples each
• We use 36 reservation schedule specifications (batch log, generation method, etc.), with 50 samples each
• Total: (40 × 36) × (20 × 50) = 1,440 × 1,000 = 1,440,000 experiments
• Two metrics: makespan and CPU-hour consumption

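Simulation Results

Algorithm | Makespan: avg. deg. from best | Makespan: # of wins | CPU-hours: avg. deg. from best | CPU-hours: # of wins
BD_ALL | 33.75% | 36 | 42.48% | 0
BD_HALF | 28.38% | 3 | 37.83% | 1
BD_CPA | 0.29% | 1,026 | 0.75% | 6
BD_CPAR | 0.21% | 386 | 0.00% | 1,434

(avg. deg. from best = average degradation from the best algorithm; BD_ALL = greedy allocation with no bound)

Similar results for Grid5K reservation schedules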
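A minimal Python sketch of the two metrics in the table, assuming that each experiment instance yields one value per algorithm (lower is better), that degradation is measured relative to the best value on the same instance, and that ties credit a win to every tied algorithm. The algorithm names and numbers in the example are invented for illustration.

```python
def summarize(results):
    """results[instance][algorithm] = metric value (lower is better).

    Returns (average % degradation from best, number of wins) per algorithm.
    Ties credit a win to every tied algorithm (an assumption).
    """
    algorithms = list(next(iter(results.values())))
    degradation = {a: 0.0 for a in algorithms}
    wins = {a: 0 for a in algorithms}
    for row in results.values():
        best = min(row.values())
        for a, value in row.items():
            degradation[a] += 100.0 * (value - best) / best
            if value == best:
                wins[a] += 1
    n = len(results)
    return {a: degradation[a] / n for a in algorithms}, wins


results = {"instance1": {"alg1": 100.0, "alg2": 110.0},
           "instance2": {"alg1": 120.0, "alg2": 100.0}}
print(summarize(results))  # ({'alg1': 10.0, 'alg2': 5.0}, {'alg1': 1, 'alg2': 1})
```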

Meeting a Deadline

• A simple approach for meeting a deadline is to schedule backwards from the deadline, picking tasks by increasing bottom-level
• The way to be as safe as possible is to find, for each task, the feasible allocation that starts as late as possible, given that:
  • The exit task must complete before the deadline
  • The task must complete before all of its children begin
• Let’s see this on a simple example

Meeting a Deadline Example

[Figure, animated over several slides: a reservation schedule (processors vs. time) with a deadline; the possible configurations A, B, C, D, and E of Task 1, and then of Task 2, are tried in turn against the reservation schedule]
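A minimal Python sketch of this backward placement, assuming the reservation schedule is a list of (start, end, processors) rectangles on num_procs identical processors and that each placed task becomes a reservation. The latest feasible end time of a task is either its latest allowed end or the start of some reservation, since sliding a window later can only be blocked where a reservation begins. Tasks are taken by increasing bottom level, so every child is placed before its parents. All names are ours; the sketch returns None when this heuristic cannot meet the deadline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: float
    end: float
    procs: int


def fits(busy, num_procs, p, start, duration):
    """Can p processors be held over [start, start + duration)?"""
    points = [start] + [r.start for r in busy if start < r.start < start + duration]
    return all(sum(r.procs for r in busy if r.start <= t < r.end) + p <= num_procs
               for t in points)


def latest_start(busy, num_procs, p, duration, latest_end, now=0.0):
    """Latest start s >= now such that the task fits and ends by latest_end."""
    ends = {latest_end} | {r.start for r in busy if r.start < latest_end}
    for end in sorted(ends, reverse=True):
        s = end - duration
        if s >= now and fits(busy, num_procs, p, s, duration):
            return s
    return None


def backward_schedule(tasks, succ, bl, exec_time, reservations, num_procs,
                      deadline, max_alloc):
    """Place tasks by increasing bottom level, each with the allocation that
    starts latest. Returns {task: (start, end, procs)} or None."""
    busy = list(reservations)
    schedule = {}
    for t in sorted(tasks, key=lambda x: bl[x]):
        # Exit tasks must end by the deadline; others before their children start.
        latest_end = min((schedule[c][0] for c in succ[t]), default=deadline)
        best = None
        for p in range(1, max_alloc(t) + 1):
            d = exec_time(t, p)
            s = latest_start(busy, num_procs, p, d, latest_end)
            if s is not None and (best is None or s > best[0]):
                best = (s, s + d, p)
        if best is None:
            return None  # deadline cannot be met by this heuristic
        schedule[t] = best
        busy.append(Reservation(*best))
    return schedule


# Chain A -> B on 4 free processors, deadline 10, perfectly parallel tasks.
sched = backward_schedule(["A", "B"], {"A": ["B"], "B": []}, {"A": 2.0, "B": 1.0},
                          lambda t, p: 12.0 / p, [], 4, 10.0, lambda t: 4)
print(sched)  # {'B': (7.0, 10.0, 4), 'A': (4.0, 7.0, 4)}
```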

Algorithms

• We can employ the same techniques for bounding allocations as for the makespan minimization algorithms: BD_ALL, BD_HALF, BD_CPA, BD_CPAR
• Problem: the algorithms do not consider the tightness of the deadline
  • If the deadline is loose, the above algorithms will consume unnecessarily many CPU-hours
  • For a very loose deadline there should be no data-parallelism, and thus no parallel efficiency loss due to Amdahl’s law
• Question: how can we reason about deadline tightness?

Deadline Tightness

• For each task we have a choice of allocations:
  • Ones that use too many processors may be wasteful
  • Ones that use too few processors may be dangerous
• Idea: consider the CPA-computed schedule assuming an empty reservation schedule
  • Using all processors, or the historical average number of non-reserved processors
• Determine when the task would start in that schedule, i.e., at which fraction of the overall makespan
• Pick the allocation that allows the task to start at the same fraction of the time interval between “now” and the deadline

Matching the CPA schedule

[Figure: the CPA schedule (processors vs. time), where the task uses q processors and time intervals a and b are marked, next to the schedule with reservations, where the task uses p processors, time intervals c and d are marked, and the task “deadline” is indicated]

Pick the cheapest allocation such that: b / (a+b) > d / (c+d)
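A minimal Python sketch of this selection rule, under our reading of the figure (an assumption): a and b are the portions of the CPA makespan before and after the task’s start, and, for a candidate allocation, c and d are the portions of the interval between “now” and the deadline before and after the task’s start. Since c + d = deadline − now, the rule admits allocations whose start lies later than the scaled CPA start time, and “cheapest” is taken to mean fewest CPU-hours (processors × duration). Candidate start times would come from placing the task backwards from its latest allowed end; rc_pick and its arguments are hypothetical names.

```python
def rc_pick(candidates, a, b, now, deadline):
    """Resource-conservative (RC) choice of allocation for one task.

    candidates: (procs, start, duration) triples; `start` is the latest
                feasible start with that allocation, or None if infeasible
    a, b:       time before / after the task's start in the CPA schedule
    Returns the admissible candidate with the fewest CPU-hours, or None.
    """
    cpa_after = b / (a + b)  # fraction of the CPA makespan after the start
    admissible = []
    for procs, start, duration in candidates:
        if start is None:
            continue
        c, d = start - now, deadline - start
        if cpa_after > d / (c + d):  # starts later than the scaled CPA start
            admissible.append((procs, start, duration))
    if not admissible:
        return None  # what to do then is left open here
    return min(admissible, key=lambda x: x[0] * x[2])  # processors x duration


# The task starts 30% into the CPA makespan; deadline 100 time units from now.
candidates = [(1, 20.0, 60.0), (2, 40.0, 35.0), (4, 55.0, 20.0)]
print(rc_pick(candidates, a=30.0, b=70.0, now=0.0, deadline=100.0))  # (2, 40.0, 35.0)
```

In the example the 1-processor allocation would have to start before the scaled CPA start (20 < 30), so it is rejected; of the two remaining ones, 2 processors cost fewer CPU-hours (70 vs. 80).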

Simulation Experiments

• We call this new approach “resource conservative” (RC)
• We conduct simulations similar to those for the makespan minimization algorithms
• Issue: the RC approach can be in trouble when it tries to schedule the first tasks if the reservation schedule is non-stationary and/or tight
  • This could be addressed via some tunable parameter (e.g., pick an allocation that starts at least x% after the scaled CPA start time)
  • We do not use such a parameter in our results
• We use two metrics:
  • Tightest deadline achieved
    • Necessary because deadline tightness depends on the instance
    • Determined via binary search
  • CPU-hour consumption for a deadline that is 50% later than the tightest deadline
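Finding the tightest deadline by binary search presupposes that feasibility is monotone in the deadline (any deadline later than a feasible one is also feasible), which a heuristic need not strictly guarantee. A minimal Python sketch, where feasible is a hypothetical callback that would run the scheduling algorithm for a given deadline and report whether it succeeded; here a toy predicate stands in for it.

```python
def tightest_deadline(feasible, lo, hi, tol=1e-3):
    """Smallest deadline (within tol) for which feasible(deadline) holds.

    Assumes feasible(hi) is True and that feasibility is monotone.
    """
    if not feasible(hi):
        raise ValueError("upper bound is not feasible")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid  # hi always stays feasible
        else:
            lo = mid  # lo always stays infeasible
    return hi


# Toy predicate standing in for "the scheduler meets deadline D".
best = tightest_deadline(lambda D: D >= 42.0, lo=0.0, hi=1000.0)
loose = 1.5 * best  # "50% later", assuming deadlines are measured from "now"
print(best, loose)
```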

Simulation Results

Tightest deadline achieved (average degradation from best), by reservation schedule:

Algorithm | sparse | medium | tight | Grid5K
BD_ALL | 178% | 175% | 188% | 227%
BD_CPAR | 6.52% | 6.44% | 6.91% | 8.38%
RC_CPA | 13.17% | 13.27% | 17.36% | 19.51%
RC_CPAR | 4.12% | 4.27% | 8.26% | 15.14%

CPU-hours consumed for a loose deadline, by reservation schedule:

Algorithm | sparse | medium | tight | Grid5K
BD_ALL | 3556 | 3486 | 3768 | 2006
BD_CPAR | 231 | 236 | 243 | 179
RC_CPA | 6.39 | 6.80 | 7.98 | 2.15
RC_CPAR | 0.16 | 0.15 | 0.16 | 0.09

Conclusions

• Makespan minimization
  • Bounding task allocations based on the CPA schedule works well
• Meeting a deadline
  • Using the CPA schedule for determining task start times works well, at least when the reservation schedule isn’t too tight
    • Some tuning parameter may help for tight schedules
    • Or, one can use the same approach as for makespan minimization, but backwards
• In both cases, using the historical number of unreserved processors leads to marginal improvements

Possible Future Directions

• Use a recent one-step algorithm instead of CPA, e.g., iCASLB [Vydyanathan et al., 2006]
• Experiments in a real-world setting
  • What kind of interface should a batch scheduler expose if the full reservation schedule must remain hidden?
• Reservation schedule archive
  • Needs to be a community effort

Scheduling Mixed-Parallel Applications with Advance Reservations, Kento Aida and Henri Casanova, to appear in Proc. of HPDC 2008

Questions?
