43
RUN: Optimal Multiprocessor Real- Time Scheduling via Reduction to Uniprocessor Paul Regnier George Lima Ernesto Massa Greg Levin Scott Brandt Federal University of Bahia University of California Brazil

RUN: Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

  • Upload
    rupali

  • View
    58

  • Download
    0

Embed Size (px)

DESCRIPTION

Paul Regnier † George Lima † Ernesto Massa † Greg Levin ‡ Scott Brandt ‡ † Federal University of Bahia ‡ University of California Brazil Santa Cruz. - PowerPoint PPT Presentation

Citation preview

Page 1: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

RUN: Optimal Multiprocessor

Real-Time Scheduling via

Reduction to Uniprocessor

Paul Regnier† George Lima† Ernesto Massa†

Greg Levin‡ Scott Brandt‡

†Federal University of Bahia ‡ University of California Brazil Santa Cruz

Page 2: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

Multiprocessors

Most high-end computers today have multiple processors

In a busy computational environment, multiple processes compete for processor time More processors means more scheduling

complexity

2

Page 3: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

3

Real-Time Multiprocessor Scheduling Real-time tasks have workload deadlines

Hard real-time = “Meet all deadlines!”

Problem: Schedule a set of periodic, hard real-time tasks on a multiprocessor systems so that all deadlines are met.

Page 4: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

4

Example: EDF on One Processor On a single processor, Earliest Deadline First (EDF) is

optimal (it can schedule any feasible task set)

Task 1:

time = 0 10 20 30

CPU

2 units of work forevery 10 units of time

Task 3: 10 units of work forevery 25 units of time

Task 2: 6 units of work forevery 15 units of time

Page 5: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

5

Scheduling Three Tasks Example: 2 processors; 3 tasks, each with 2 units

of work required every 3 time unitsjob release

time = 0

deadline

Task 1

Task 2

Task 3

1 2 3

Page 6: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

6

Global Schedule Example: 2 processors; 3 tasks, each with 2 units

of work required every 3 time units

time = 0

CPU 1

CPU 2

1 2 3

Task 1 migrates between processors

Page 7: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

Taxonomy of Multiprocessor Scheduling Algorithms

UniprocessorAlgorithms

LLFEDF

PartitionedAlgorithms

GlobalizedUniprocessor

Algorithms

GlobalEDF

DedicatedGlobal

Algorithms

PartitionedEDF

OptimalAlgorithms

EKG

DP-Wrap

pfair

LLREF

7

Uniprocessor

Multiprocessor

Page 8: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

8

Problem Model

n tasks running on m processors

A periodic task T = (p, e) requires a workload e be completed within each period of length p

T's utilization u = e / p is the fraction of each period that the task must execute

job release

time = 0 p 2p 3p

job releasejob releasejob release

period p

workload e

Page 9: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

9

Assumptions

Processor Identity: All processors are equivalent

Task Independence: Tasks are independent

Task Unity: Tasks run on one processor at a time

Task Migration: Tasks may run on different processors at different times

No overhead: free context switches and migrations

In practice: built into WCET estimates

Page 10: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

10

The Big Goal (Version 1)

Design an optimal scheduling algorithm for periodic task sets on multiprocessors

A task set is feasible if there exists a schedule that meets all deadlines

A scheduling algorithm is optimal if it can

always schedule any feasible task set

Page 11: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

11

Necessary and Sufficient Conditions

Any set of tasks needing at most

1) 1 processor for each task ( for all i, ui ≤ 1 ) , and

2) m processors for all tasks ( ui ≤ m)

is feasible

1) Status: Solved

1) pfair (1996) was the first optimal algorithm

Page 12: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

12

The Big Goal (Version 2)

Design an optimal scheduling algorithm with fewer context switches and migrations

( Finding feasible schedule with fewest migrations is NP-Complete )

Page 13: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

13

The Big Goal (Version 2)

Design an optimal scheduling algorithm with fewer context switches and migrations

Status: Ongoing…

All existing improvements over pfair use some form of deadline partitioning…

Page 14: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

14

Deadline Partitioning

Task 1

Task 2

Task 4

Task 3

Page 15: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

15

Deadline Partitioning

CPU 2

CPU 1

Page 16: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

16

The Big Goal (Version 2)

Design an optimal scheduling algorithm with fewer context switches and migrations

Status: Ongoing…

All existing improvements over pfair use some form of deadline partitioning…

… but all the activity in each time window still leads to a large amount of preemptions and migrations

Our Contribution: The first optimal algorithm which does not depend on deadline partitioning, and which therefore has much lower overhead

Page 17: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

17

The RUN Scheduling Algorithm

RUN uses a sequence of reduction operations to transform a multiprocessor problem into a collection of uniprocessor problems Uniprocessor scheduling is solved optimally by

Earliest Deadline First (EDF) Uniprocessor schedules are transformed back into a

single multiprocessor schedule

A reduction operation is composed of two steps: packing and dual

Page 18: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

18

A collection of tasks with total utilization at most one can be packed into a single fixed-utilization task:

The Packing Operation

Task 1: u = 0.1

Task 2: u = 0.3

Task 3: u = 0.4

Packed Task:u = 0.8

Page 19: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

19

We divide time with the packed tasks’ deadlines, and schedule them with EDF. In each segment, consume according to total utilization.

Scheduling a Packed Task’s Clients

Task 1: p=5 , u=.1

Task 2: p=3 , u=.3

Task 3: p=2 , u=.4

EDF schedule of tasks

Page 20: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

20

The packed task temporarily replaces (acts as a proxy for) its clients. It may not be periodic, but has a fixed utilization in each segment.

Defining a Packed Task

Packed Task

EDF schedule of tasks

Task 1: p=5 , u=.1

Task 2: p=3 , u=.3

Task 3: p=2 , u=.4

Page 21: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

21

The dual of a task T is a task T* with the same deadlines, but complementary utilization / workloads:

The Dual of a Task

T: u = 0.4, p = 3

T*: u = 0.6, p = 3

Page 22: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

22

The Dual of a System

Given a system of n tasks and m processors, assume full utilization ( ui = m )

The dual system consists of n dual tasks running on n-m processors Note that the dual utilizations sum to ui = n - ui = n-m

If n < 2m, the dual system has fewer processors

The dual task T* represents the idle time of T. T* is executed in the dual system precisely when T is idle, and vice versa.

Page 23: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

23

The Dual of a SystemTask 1

Task 2

Task 3

Page 24: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

24

The Dual of a SystemTask 1

Task 2

Task 3

time = 0 1 2 3

Dual System

Page 25: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

25

The Dual of a System

Because each dual task represents the idle time of its primal task, scheduling the dual system is equivalent to scheduling the original system.

Original System

time = 0 1 2 3

Dual System

Page 26: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

26

The RUN Algorithm

Reduction = Packing + Dual Keep packing until remaining tasks satisfy:

ui + uj > 1 for all tasks Ti , Tj

Schedule of Dual System

Schedule for “Primal” System

Replace packed tasks in schedule with their clients, ordered by EDF

Page 27: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

27

RUN: A Simple Example

Task 1: u=.2

Task 2: u=.6

Task 3: u=.3

Step 1: Pack tasks until all pairs have utilization > 1

Task 5: u=.5

Task 4: u=.4

Task 12: u=.8

Page 28: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

28

RUN: A Simple Example

Task 3: u=.3

Step 1: Pack tasks until all pairs have utilization > 1

Task 5: u=.5

Task 4: u=.4

Task 12: u=.8

Task 34: u=.7

Page 29: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

29

RUN: A Simple Example Step 2: Find the dual system on n-m = 3-2 = 1 processor

Task 5: u=.5

Task 12: u=.8

Task 34: u=.7

Page 30: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

30

RUN: A Simple Example Step 2: Find the dual system on n-m = 3-2 = 1 processor

Task 5: u=.5

Task 12: u=.8

Task 34: u=.7

Task 5*: u=.5

Task 12*: u=.2

Task 34*: u=.3

Page 31: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

31

RUN: A Simple Example Step 3: Schedule uniprocessor dual with EDF

Task 5*: u=.5

Task 12*: u=.2

Task 34*: u=.3

time = 0 2 4 6

Dual CPU

Page 32: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

32

RUN: A Simple Example Step 4: Schedule packed tasks from dual schedule

time = 0 2 4 6

Dual CPU

Page 33: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

33

RUN: A Simple Example Step 4: Schedule packed tasks from dual schedule

time = 0 2 4 6

Dual CPU

CPU 1

CPU 2

Page 34: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

34

RUN: A Simple Example Step 5: Packed tasks schedule their clients with EDF

time = 0 2 4 6

Dual CPU

CPU 1

CPU 2

Page 35: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

35

RUN: A Simple Example The original task set has been successfully scheduled!

time = 0 2 4 6

CPU 1

CPU 2

Task 1: u=.2

Task 2: u=.6

Task 3: u=.3

Task 5: u=.5

Task 4: u=.4

Page 36: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

36

RUN: A Few Details

Several reductions may be necessary to produce a uniprocessor system Reduction reduces the number of processors by at

least half On random task sets with 100 processors and

hundreds of tasks, less than 1 in 600 task sets required more than 2 reductions.

RUN requires ui = m . When this is not the case, dummy tasks may be

added during the first packing step to create a partially or entirely partitioned system.

Page 37: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

37

Proven Theoretical Performance

RUN is optimal

Reduction is done off-line, prior to execution, and takes O(n log n)

Each scheduler invocation is O(n), with total scheduling overhead O(jn log m) when j jobs are scheduled

RUN suffers at most (3r+1)/2 preemptions per job on task sets requiring r reductions.

Since r ≤ 2 for most task sets, this gives a theoretical upper bound of 4 preemptions per job.

In practice, we never observed more than 3 preeptions per job on our randomly generated task sets.

Page 38: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

38

Simulation and Comparison

We evaluated RUN in side-by-side simulations with existing optimal algorithms LLREF, DP-Wrap, and EKG. Each data point is the average of 1000 task sets,

generated uniformly at random with utilizations in the range [0.1, 0.99] and integer periods in the range [5,100].

Simulations were run for 1000 time units each.

Values shown are average migrations and preemptions per job.

Page 39: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

39

Comparison Varying Processors Number of processors varies from m = 2 to 32,

with 2m tasks and 100% utilization

Page 40: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

40

Comparison Varying Utilization Total utilization varies from 55 to 100%, with 24

tasks running on 16 processors

Page 41: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

Ongoing Work

Heuristic improvements to RUN

Extend RUN to broader problems: sporadic arrivals, arbitrarily deadlines, non-identical multiprocessors, etc

Develop general theory and new examples of non-deadline-partitioning algorithms

41

Page 42: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

Summary- The RUN Algorithm:

is the first optimal algorithm without deadline partitioning

outperforms existing optimal algorithms by a factor of 5 on large systems

scales well to larger systems

reduces gracefully to the very efficient Partitioned EDF on any task set that Partitioned EDF can schedule

42

Page 43: RUN:  Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor

43

Thanks for Listening

Questions?