RUN: Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor
Paul Regnier† George Lima† Ernesto Massa†
Greg Levin‡ Scott Brandt‡
†Federal University of Bahia, Brazil    ‡University of California, Santa Cruz
Multiprocessors
Most high-end computers today have multiple processors
In a busy computational environment, multiple processes compete for processor time. More processors means more scheduling complexity.
Real-Time Multiprocessor Scheduling

Real-time tasks have workload deadlines
Hard real-time = “Meet all deadlines!”
Problem: Schedule a set of periodic, hard real-time tasks on a multiprocessor system so that all deadlines are met.
Example: EDF on One Processor

On a single processor, Earliest Deadline First (EDF) is optimal (it can schedule any feasible task set)
Task 1: 2 units of work for every 10 units of time
Task 2: 6 units of work for every 15 units of time
Task 3: 10 units of work for every 25 units of time
(Figure: EDF schedule on one CPU, time = 0 to 30)
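This EDF example can be reproduced with a small simulation. The sketch below is our own code, not from the talk; it steps through time one unit at a time with the three tasks above. Over the hyperperiod lcm(10, 15, 25) = 150, total utilization is exactly 1, so the CPU is never idle.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    period: int     # p: release interval and relative deadline
    workload: int   # e: units of work required each period
    remaining: int = 0
    deadline: int = 0

def edf_schedule(tasks, horizon):
    """Return the task name run at each time unit (None = idle)."""
    timeline = []
    for t in range(horizon):
        for task in tasks:
            if t % task.period == 0:            # job release
                task.remaining = task.workload
                task.deadline = t + task.period
        ready = [x for x in tasks if x.remaining > 0]
        if ready:
            run = min(ready, key=lambda x: x.deadline)  # earliest deadline first
            run.remaining -= 1
            timeline.append(run.name)
        else:
            timeline.append(None)
    return timeline

tasks = [Task("T1", 10, 2), Task("T2", 15, 6), Task("T3", 25, 10)]
sched = edf_schedule(tasks, 150)   # 150 = lcm(10, 15, 25), the hyperperiod
```

Since total utilization is 2/10 + 6/15 + 10/25 = 1, EDF keeps the processor fully busy and every job finishes by its deadline.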
Scheduling Three Tasks

Example: 2 processors; 3 tasks, each with 2 units of work required every 3 time units
(Figure: timelines for Tasks 1, 2, and 3 from time = 0 to 3, marking job releases and deadlines)
Global Schedule

Example: 2 processors; 3 tasks, each with 2 units of work required every 3 time units
(Figure: schedule on CPU 1 and CPU 2 from time = 0 to 3; Task 1 migrates between processors)
Taxonomy of Multiprocessor Scheduling Algorithms

(Diagram: algorithm families, arranged from uniprocessor to multiprocessor)
- Uniprocessor Algorithms: EDF, LLF
- Partitioned Algorithms: Partitioned EDF
- Globalized Uniprocessor Algorithms: Global EDF
- Dedicated Global Algorithms, including Optimal Algorithms: pfair, LLREF, EKG, DP-Wrap
Problem Model
n tasks running on m processors
A periodic task T = (p, e) requires a workload e be completed within each period of length p
T's utilization u = e / p is the fraction of each period that the task must execute
(Figure: a periodic task's timeline with job releases at times 0, p, 2p, 3p; each period of length p carries workload e)
Assumptions
Processor Identity: All processors are equivalent
Task Independence: Tasks are independent
Task Unity: Tasks run on one processor at a time
Task Migration: Tasks may run on different processors at different times
No Overhead: context switches and migrations are free
In practice: these costs are built into worst-case execution time (WCET) estimates
The Big Goal (Version 1)
Design an optimal scheduling algorithm for periodic task sets on multiprocessors
A task set is feasible if there exists a schedule that meets all deadlines
A scheduling algorithm is optimal if it can always schedule any feasible task set
Necessary and Sufficient Conditions
Any set of tasks needing at most
1) one processor for each task (for all i, ui ≤ 1), and
2) m processors for all tasks (Σ ui ≤ m)
is feasible

Status: Solved. pfair (1996) was the first optimal algorithm
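The two conditions translate directly into a check. This is a sketch assuming tasks are given as (p, e) pairs, as in the problem model:

```python
def feasible(tasks, m):
    """Feasibility on m identical processors: every task fits on one
    processor (u_i <= 1) and total utilization fits on m (sum u_i <= m)."""
    utils = [e / p for (p, e) in tasks]
    return all(u <= 1 for u in utils) and sum(utils) <= m

# The earlier example: 3 tasks of 2 units per 3 time units, on 2 CPUs
print(feasible([(3, 2), (3, 2), (3, 2)], 2))   # True
```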
The Big Goal (Version 2)
Design an optimal scheduling algorithm with fewer context switches and migrations
(Finding a feasible schedule with the fewest migrations is NP-complete)
The Big Goal (Version 2)
Design an optimal scheduling algorithm with fewer context switches and migrations
Status: Ongoing…
All existing improvements over pfair use some form of deadline partitioning…
Deadline Partitioning

(Figures: the deadlines of Tasks 1-4 partition time into windows; within each window, work is scheduled across CPU 1 and CPU 2)
The Big Goal (Version 2)
Design an optimal scheduling algorithm with fewer context switches and migrations
Status: Ongoing…
All existing improvements over pfair use some form of deadline partitioning…
… but all the activity in each time window still leads to a large number of preemptions and migrations
Our Contribution: The first optimal algorithm which does not depend on deadline partitioning, and which therefore has much lower overhead
The RUN Scheduling Algorithm
RUN uses a sequence of reduction operations to transform a multiprocessor problem into a collection of uniprocessor problems. Uniprocessor scheduling is solved optimally by Earliest Deadline First (EDF). The uniprocessor schedules are then transformed back into a single multiprocessor schedule.

A reduction operation is composed of two steps: packing and dual
The Packing Operation

A collection of tasks with total utilization at most one can be packed into a single fixed-utilization task:

Task 1: u = 0.1
Task 2: u = 0.3
Task 3: u = 0.4
Packed Task: u = 0.8
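A minimal packing sketch. First-fit decreasing is our choice here, not necessarily the talk's heuristic; any grouping whose bins stay at utilization ≤ 1 is a valid packing.

```python
def pack(utilizations):
    """First-fit decreasing: group utilizations into bins summing to <= 1.
    Each resulting bin becomes one packed task."""
    bins = []
    for u in sorted(utilizations, reverse=True):
        for b in bins:
            if sum(b) + u <= 1.0 + 1e-9:   # tolerance for float rounding
                b.append(u)
                break
        else:
            bins.append([u])
    return bins

bins = pack([0.1, 0.3, 0.4])   # all three fit in one packed task (u = 0.8)
```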
Scheduling a Packed Task's Clients

We divide time at the packed task's deadlines and schedule its clients with EDF. In each segment, the packed task consumes processor time according to its total utilization.

Task 1: p = 5, u = 0.1
Task 2: p = 3, u = 0.3
Task 3: p = 2, u = 0.4
(Figure: EDF schedule of the three client tasks)
Defining a Packed Task

The packed task temporarily replaces (acts as a proxy for) its clients. It may not be periodic, but it has a fixed utilization in each segment.

Task 1: p = 5, u = 0.1
Task 2: p = 3, u = 0.3
Task 3: p = 2, u = 0.4
(Figure: the packed task shown above the EDF schedule of its clients)
The Dual of a Task

The dual of a task T is a task T* with the same deadlines, but complementary utilization / workloads:

T: u = 0.4, p = 3
T*: u = 0.6, p = 3
The Dual of a System

Given a system of n tasks and m processors, assume full utilization (Σ ui = m)

The dual system consists of n dual tasks running on n - m processors. Note that the dual utilizations sum to Σ (1 - ui) = n - Σ ui = n - m

If n < 2m, the dual system has fewer processors

The dual task T* represents the idle time of T. T* is executed in the dual system precisely when T is idle, and vice versa.
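Over utilizations alone, the dual construction is a one-liner (a sketch: real dual tasks also keep their primal task's deadlines):

```python
def dual(utilizations):
    """The dual system: each task's utilization u becomes 1 - u.
    (Real dual tasks also inherit their primal task's deadlines.)"""
    return [1.0 - u for u in utilizations]

# 3 packed tasks fully using 2 processors -> dual on 3 - 2 = 1 processor
duals = dual([0.8, 0.7, 0.5])   # approximately [0.2, 0.3, 0.5], summing to 1
```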
The Dual of a System

(Figures: schedules of Tasks 1-3 and the corresponding dual system, time = 0 to 3)
The Dual of a System
Because each dual task represents the idle time of its primal task, scheduling the dual system is equivalent to scheduling the original system.
(Figure: the original and dual schedules side by side, time = 0 to 3)
The RUN Algorithm
Reduction = Packing + Dual

Keep packing until the remaining tasks satisfy ui + uj > 1 for all pairs of tasks Ti, Tj

(Diagram: schedule the dual system, then recover the schedule for the "primal" system by replacing packed tasks in the schedule with their clients, ordered by EDF)
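The reduction loop can be sketched by tracking utilizations only. This is a toy version with our own function names: RUN's real reduction builds servers that carry deadlines, while this sketch just counts how many pack-plus-dual operations are needed.

```python
def pack_sums(utils):
    """First-fit decreasing packing; returns each bin's total utilization."""
    bins = []
    for u in sorted(utils, reverse=True):
        for i, s in enumerate(bins):
            if s + u <= 1.0 + 1e-9:
                bins[i] = s + u
                break
        else:
            bins.append(u)
    return bins

def reduction_levels(utils):
    """Count reduction (pack + dual) operations until at most one
    processor's worth of utilization remains."""
    levels = 0
    while sum(utils) > 1.0 + 1e-9:
        utils = [1.0 - s for s in pack_sums(utils)]   # pack, then dualize
        levels += 1
    return levels

# The five tasks of the simple example reduce in a single operation
levels = reduction_levels([0.2, 0.6, 0.3, 0.4, 0.5])
```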
RUN: A Simple Example

Step 1: Pack tasks until all pairs have utilization > 1

Task 1: u = 0.2 and Task 2: u = 0.6 pack into Task 12: u = 0.8
Task 3: u = 0.3 and Task 4: u = 0.4 pack into Task 34: u = 0.7
Task 5: u = 0.5 remains unpacked
RUN: A Simple Example

Step 2: Find the dual system on n - m = 3 - 2 = 1 processor

Task 12: u = 0.8 has dual Task 12*: u = 0.2
Task 34: u = 0.7 has dual Task 34*: u = 0.3
Task 5: u = 0.5 has dual Task 5*: u = 0.5
RUN: A Simple Example

Step 3: Schedule the uniprocessor dual system with EDF

Task 12*: u = 0.2, Task 34*: u = 0.3, Task 5*: u = 0.5
(Figure: dual CPU schedule, time = 0 to 6)
RUN: A Simple Example

Step 4: Schedule the packed tasks from the dual schedule

(Figure: CPU 1 and CPU 2 run each packed task exactly when its dual is idle, time = 0 to 6)
RUN: A Simple Example

Step 5: Packed tasks schedule their clients with EDF

(Figure: dual CPU, CPU 1, and CPU 2 schedules, time = 0 to 6)
RUN: A Simple Example

The original task set (Tasks 1-5) has been successfully scheduled!
(Figure: CPU 1 and CPU 2 schedules for Tasks 1-5, time = 0 to 6)
RUN: A Few Details

Several reductions may be necessary to produce a uniprocessor system. Each reduction reduces the number of processors by at least half. On random task sets with 100 processors and hundreds of tasks, fewer than 1 in 600 task sets required more than 2 reductions.

RUN requires Σ ui = m. When this is not the case, dummy tasks may be added during the first packing step to create a partially or entirely partitioned system.
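The dummy-task padding can be sketched as follows. The helper is our own; how RUN splits and places dummies to induce a partially partitioned system is more involved than this.

```python
def pad_with_dummies(utils, m):
    """Add dummy (idle) tasks so total utilization equals m exactly,
    keeping each dummy's utilization <= 1 like any real task."""
    slack = m - sum(utils)
    dummies = []
    while slack > 1e-9:
        d = min(1.0, slack)   # one dummy can fill at most one processor
        dummies.append(d)
        slack -= d
    return utils + dummies

padded = pad_with_dummies([0.5, 0.5, 0.4], 2)   # adds roughly 0.6 of dummy work
```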
Proven Theoretical Performance
RUN is optimal
Reduction is done off-line, prior to execution, and takes O(n log n)
Each scheduler invocation is O(n), with total scheduling overhead O(jn log m) when j jobs are scheduled
RUN suffers at most (3r+1)/2 preemptions per job on task sets requiring r reductions.
Since r ≤ 2 for most task sets, this gives a theoretical upper bound of 4 preemptions per job.
In practice, we never observed more than 3 preemptions per job on our randomly generated task sets.
Simulation and Comparison

We evaluated RUN in side-by-side simulations with the existing optimal algorithms LLREF, DP-Wrap, and EKG. Each data point is the average of 1000 task sets, generated uniformly at random with utilizations in the range [0.1, 0.99] and integer periods in the range [5, 100]. Simulations were run for 1000 time units each. Values shown are average migrations and preemptions per job.
Comparison Varying Processors

Number of processors varies from m = 2 to 32, with 2m tasks and 100% utilization

Comparison Varying Utilization

Total utilization varies from 55% to 100%, with 24 tasks running on 16 processors
Ongoing Work
Heuristic improvements to RUN
Extend RUN to broader problems: sporadic arrivals, arbitrary deadlines, non-identical multiprocessors, etc.
Develop general theory and new examples of non-deadline-partitioning algorithms
Summary: The RUN Algorithm
is the first optimal algorithm without deadline partitioning
outperforms existing optimal algorithms by a factor of 5 on large systems
scales well to larger systems
reduces gracefully to the very efficient Partitioned EDF on any task set that Partitioned EDF can schedule
Thanks for Listening
Questions?