RUN: Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor
Paul Regnier† George Lima† Ernesto Massa†
Greg Levin‡ Scott Brandt‡
†Federal University of Bahia, Brazil    ‡University of California, Santa Cruz
Multiprocessors
Most high-end computers today have multiple processors
In a busy computational environment, multiple processes compete for processor time. More processors means more scheduling complexity.
Real-Time Multiprocessor Scheduling

Real-time tasks have workload deadlines
Hard real-time = “Meet all deadlines!”
Problem: Schedule a set of periodic, hard real-time tasks on a multiprocessor system so that all deadlines are met.
Example: EDF on One Processor

On a single processor, Earliest Deadline First (EDF) is optimal (it can schedule any feasible task set)
Task 1: 2 units of work for every 10 units of time
Task 2: 6 units of work for every 15 units of time
Task 3: 10 units of work for every 25 units of time
(Figure: EDF schedule on one CPU, time = 0 to 30)
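This EDF example can be reproduced with a small simulation. The sketch below is our own code, not from the talk; it steps through time one unit at a time with the three tasks above. Over the hyperperiod lcm(10, 15, 25) = 150, total utilization is exactly 1, so the CPU is never idle.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    period: int     # p: release interval and relative deadline
    workload: int   # e: units of work required each period
    remaining: int = 0
    deadline: int = 0

def edf_schedule(tasks, horizon):
    """Return the task name run at each time unit (None = idle)."""
    timeline = []
    for t in range(horizon):
        for task in tasks:
            if t % task.period == 0:            # job release
                task.remaining = task.workload
                task.deadline = t + task.period
        ready = [x for x in tasks if x.remaining > 0]
        if ready:
            run = min(ready, key=lambda x: x.deadline)  # earliest deadline first
            run.remaining -= 1
            timeline.append(run.name)
        else:
            timeline.append(None)
    return timeline

tasks = [Task("T1", 10, 2), Task("T2", 15, 6), Task("T3", 25, 10)]
sched = edf_schedule(tasks, 150)   # 150 = lcm(10, 15, 25), the hyperperiod
```

Since total utilization is 2/10 + 6/15 + 10/25 = 1, EDF keeps the processor fully busy and every job finishes by its deadline.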
Scheduling Three Tasks

Example: 2 processors; 3 tasks, each with 2 units of work required every 3 time units
(Figure: timelines for Tasks 1, 2, and 3 from time = 0 to 3, marking job releases and deadlines)
Global Schedule

Example: 2 processors; 3 tasks, each with 2 units of work required every 3 time units
(Figure: schedule on CPU 1 and CPU 2 from time = 0 to 3; Task 1 migrates between processors)
Taxonomy of Multiprocessor Scheduling Algorithms

(Diagram: algorithm families, arranged from uniprocessor to multiprocessor)
- Uniprocessor Algorithms: EDF, LLF
- Partitioned Algorithms: Partitioned EDF
- Globalized Uniprocessor Algorithms: Global EDF
- Dedicated Global Algorithms, including Optimal Algorithms: pfair, LLREF, EKG, DP-Wrap
Problem Model
n tasks running on m processors
A periodic task T = (p, e) requires a workload e be completed within each period of length p
T's utilization u = e / p is the fraction of each period that the task must execute
(Figure: a periodic task's timeline with job releases at times 0, p, 2p, 3p; each period of length p carries workload e)
Assumptions
Processor Identity: All processors are equivalent
Task Independence: Tasks are independent
Task Unity: Tasks run on one processor at a time
Task Migration: Tasks may run on different processors at different times
No Overhead: context switches and migrations are free
In practice: these costs are built into worst-case execution time (WCET) estimates
The Big Goal (Version 1)
Design an optimal scheduling algorithm for periodic task sets on multiprocessors
A task set is feasible if there exists a schedule that meets all deadlines
A scheduling algorithm is optimal if it can always schedule any feasible task set
Necessary and Sufficient Conditions
Any set of tasks needing at most
1) one processor for each task (for all i, ui ≤ 1), and
2) m processors for all tasks (Σ ui ≤ m)
is feasible

Status: Solved. pfair (1996) was the first optimal algorithm
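The two conditions translate directly into a check. This is a sketch assuming tasks are given as (p, e) pairs, as in the problem model:

```python
def feasible(tasks, m):
    """Feasibility on m identical processors: every task fits on one
    processor (u_i <= 1) and total utilization fits on m (sum u_i <= m)."""
    utils = [e / p for (p, e) in tasks]
    return all(u <= 1 for u in utils) and sum(utils) <= m

# The earlier example: 3 tasks of 2 units per 3 time units, on 2 CPUs
print(feasible([(3, 2), (3, 2), (3, 2)], 2))   # True
```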
The Big Goal (Version 2)
Design an optimal scheduling algorithm with fewer context switches and migrations
(Finding a feasible schedule with the fewest migrations is NP-complete)
The Big Goal (Version 2)
Design an optimal scheduling algorithm with fewer context switches and migrations
Status: Ongoing…
All existing improvements over pfair use some form of deadline partitioning…
Deadline Partitioning

(Figures: the deadlines of Tasks 1-4 partition time into windows; within each window, work is scheduled across CPU 1 and CPU 2)
The Big Goal (Version 2)
Design an optimal scheduling algorithm with fewer context switches and migrations
Status: Ongoing…
All existing improvements over pfair use some form of deadline partitioning…
… but all the activity in each time window still leads to a large number of preemptions and migrations
Our Contribution: The first optimal algorithm which does not depend on deadline partitioning, and which therefore has much lower overhead
The RUN Scheduling Algorithm
RUN uses a sequence of reduction operations to transform a multiprocessor problem into a collection of uniprocessor problems. Uniprocessor scheduling is solved optimally by Earliest Deadline First (EDF). The uniprocessor schedules are then transformed back into a single multiprocessor schedule.

A reduction operation is composed of two steps: packing and dual
The Packing Operation

A collection of tasks with total utilization at most one can be packed into a single fixed-utilization task:

Task 1: u = 0.1
Task 2: u = 0.3
Task 3: u = 0.4
Packed Task: u = 0.8
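A minimal packing sketch. First-fit decreasing is our choice here, not necessarily the talk's heuristic; any grouping whose bins stay at utilization ≤ 1 is a valid packing.

```python
def pack(utilizations):
    """First-fit decreasing: group utilizations into bins summing to <= 1.
    Each resulting bin becomes one packed task."""
    bins = []
    for u in sorted(utilizations, reverse=True):
        for b in bins:
            if sum(b) + u <= 1.0 + 1e-9:   # tolerance for float rounding
                b.append(u)
                break
        else:
            bins.append([u])
    return bins

bins = pack([0.1, 0.3, 0.4])   # all three fit in one packed task (u = 0.8)
```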
Scheduling a Packed Task's Clients

We divide time at the packed task's deadlines and schedule its clients with EDF. In each segment, the packed task consumes processor time according to its total utilization.

Task 1: p = 5, u = 0.1
Task 2: p = 3, u = 0.3
Task 3: p = 2, u = 0.4
(Figure: EDF schedule of the three client tasks)
Defining a Packed Task

The packed task temporarily replaces (acts as a proxy for) its clients. It may not be periodic, but it has a fixed utilization in each segment.

Task 1: p = 5, u = 0.1
Task 2: p = 3, u = 0.3
Task 3: p = 2, u = 0.4
(Figure: the packed task shown above the EDF schedule of its clients)
The Dual of a Task

The dual of a task T is a task T* with the same deadlines, but complementary utilization / workloads:

T: u = 0.4, p = 3
T*: u = 0.6, p = 3
The Dual of a System

Given a system of n tasks and m processors, assume full utilization (Σ ui = m)

The dual system consists of n dual tasks running on n - m processors. Note that the dual utilizations sum to Σ (1 - ui) = n - Σ ui = n - m

If n < 2m, the dual system has fewer processors

The dual task T* represents the idle time of T. T* is executed in the dual system precisely when T is idle, and vice versa.
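Over utilizations alone, the dual construction is a one-liner (a sketch: real dual tasks also keep their primal task's deadlines):

```python
def dual(utilizations):
    """The dual system: each task's utilization u becomes 1 - u.
    (Real dual tasks also inherit their primal task's deadlines.)"""
    return [1.0 - u for u in utilizations]

# 3 packed tasks fully using 2 processors -> dual on 3 - 2 = 1 processor
duals = dual([0.8, 0.7, 0.5])   # approximately [0.2, 0.3, 0.5], summing to 1
```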
The Dual of a System

(Figures: schedules of Tasks 1-3 and the corresponding dual system, time = 0 to 3)
The Dual of a System
Because each dual task represents the idle time of its primal task, scheduling the dual system is equivalent to scheduling the original system.
(Figure: the original and dual schedules side by side, time = 0 to 3)
The RUN Algorithm
Reduction = Packing + Dual

Keep packing until the remaining tasks satisfy ui + uj > 1 for all pairs of tasks Ti, Tj

(Diagram: schedule the dual system, then recover the schedule for the "primal" system by replacing packed tasks in the schedule with their clients, ordered by EDF)
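The reduction loop can be sketched by tracking utilizations only. This is a toy version with our own function names: RUN's real reduction builds servers that carry deadlines, while this sketch just counts how many pack-plus-dual operations are needed.

```python
def pack_sums(utils):
    """First-fit decreasing packing; returns each bin's total utilization."""
    bins = []
    for u in sorted(utils, reverse=True):
        for i, s in enumerate(bins):
            if s + u <= 1.0 + 1e-9:
                bins[i] = s + u
                break
        else:
            bins.append(u)
    return bins

def reduction_levels(utils):
    """Count reduction (pack + dual) operations until at most one
    processor's worth of utilization remains."""
    levels = 0
    while sum(utils) > 1.0 + 1e-9:
        utils = [1.0 - s for s in pack_sums(utils)]   # pack, then dualize
        levels += 1
    return levels

# The five tasks of the simple example reduce in a single operation
levels = reduction_levels([0.2, 0.6, 0.3, 0.4, 0.5])
```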
RUN: A Simple Example

Step 1: Pack tasks until all pairs have utilization > 1

Task 1: u = 0.2 and Task 2: u = 0.6 pack into Task 12: u = 0.8
Task 3: u = 0.3 and Task 4: u = 0.4 pack into Task 34: u = 0.7
Task 5: u = 0.5 remains unpacked
RUN: A Simple Example

Step 2: Find the dual system on n - m = 3 - 2 = 1 processor

Task 12: u = 0.8 has dual Task 12*: u = 0.2
Task 34: u = 0.7 has dual Task 34*: u = 0.3
Task 5: u = 0.5 has dual Task 5*: u = 0.5
RUN: A Simple Example

Step 3: Schedule the uniprocessor dual system with EDF

Task 12*: u = 0.2, Task 34*: u = 0.3, Task 5*: u = 0.5
(Figure: dual CPU schedule, time = 0 to 6)
RUN: A Simple Example

Step 4: Schedule the packed tasks from the dual schedule

(Figure: CPU 1 and CPU 2 run each packed task exactly when its dual is idle, time = 0 to 6)
RUN: A Simple Example

Step 5: Packed tasks schedule their clients with EDF

(Figure: dual CPU, CPU 1, and CPU 2 schedules, time = 0 to 6)
RUN: A Simple Example

The original task set (Tasks 1-5) has been successfully scheduled!
(Figure: CPU 1 and CPU 2 schedules for Tasks 1-5, time = 0 to 6)
RUN: A Few Details

Several reductions may be necessary to produce a uniprocessor system. Each reduction reduces the number of processors by at least half. On random task sets with 100 processors and hundreds of tasks, fewer than 1 in 600 task sets required more than 2 reductions.

RUN requires Σ ui = m. When this is not the case, dummy tasks may be added during the first packing step to create a partially or entirely partitioned system.
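The dummy-task padding can be sketched as follows. The helper is our own; how RUN splits and places dummies to induce a partially partitioned system is more involved than this.

```python
def pad_with_dummies(utils, m):
    """Add dummy (idle) tasks so total utilization equals m exactly,
    keeping each dummy's utilization <= 1 like any real task."""
    slack = m - sum(utils)
    dummies = []
    while slack > 1e-9:
        d = min(1.0, slack)   # one dummy can fill at most one processor
        dummies.append(d)
        slack -= d
    return utils + dummies

padded = pad_with_dummies([0.5, 0.5, 0.4], 2)   # adds roughly 0.6 of dummy work
```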
Proven Theoretical Performance
RUN is optimal
Reduction is done off-line, prior to execution, and takes O(n log n)
Each scheduler invocation is O(n), with total scheduling overhead O(jn log m) when j jobs are scheduled
RUN suffers at most (3r+1)/2 preemptions per job on task sets requiring r reductions.
Since r ≤ 2 for most task sets, this gives a theoretical upper bound of 4 preemptions per job.
In practice, we never observed more than 3 preemptions per job on our randomly generated task sets.
Simulation and Comparison

We evaluated RUN in side-by-side simulations with the existing optimal algorithms LLREF, DP-Wrap, and EKG. Each data point is the average of 1000 task sets, generated uniformly at random with utilizations in the range [0.1, 0.99] and integer periods in the range [5, 100]. Simulations were run for 1000 time units each. Values shown are average migrations and preemptions per job.
Comparison Varying Processors

Number of processors varies from m = 2 to 32, with 2m tasks and 100% utilization

Comparison Varying Utilization

Total utilization varies from 55% to 100%, with 24 tasks running on 16 processors
Ongoing Work
Heuristic improvements to RUN
Extend RUN to broader problems: sporadic arrivals, arbitrary deadlines, non-identical multiprocessors, etc.
Develop general theory and new examples of non-deadline-partitioning algorithms
Summary: The RUN Algorithm
is the first optimal algorithm without deadline partitioning
outperforms existing optimal algorithms by a factor of 5 on large systems
scales well to larger systems
reduces gracefully to the very efficient Partitioned EDF on any task set that Partitioned EDF can schedule
Thanks for Listening
Questions?