Multicore Scheduling

Minsoo Ryu
Department of Computer Science and Engineering, Hanyang University
Real-Time Computing and Communications Lab., Hanyang University

Lecture 14: Multicore Scheduling (source: rtcc.hanyang.ac.kr/.../Lecture_14_Multicore_Scheduling.pdf)




Outline

1. Global vs. Partitioned Scheduling
2. Other Scheduling Approaches
3. Q & A


Uniprocessor vs. Multiprocessor Scheduling

Uniprocessor scheduling
  • Decides which job runs and when

Multiprocessor scheduling
  • Decides not only when but also where a job runs
  • Has almost the same goals as uniprocessor scheduling
  • But it raises new issues
      • How to assign applications to multiple processors?
      • How to balance workload among processors?
      • How to define and exploit affinity?
      • How to manage processor heterogeneity?


Multiprocessor Scheduling Policies

The same policies as in uniprocessor scheduling
  • Priority-based scheduling: FCFS, SJF, SRTF, RM, EDF
  • Proportional share scheduling: PGPS, SFQ, WF2Q, Lottery and Stride, BVT, VTRR

Two approaches
  • Global scheduling
      • The system has a single global process queue
      • Processes are dispatched to any available processor
  • Partitioned scheduling
      • Each processor has a separate process queue
      • Each queue is scheduled by an independent scheduler
      • Process migration may or may not be allowed


Global vs. Partitioned Scheduling

[Figure: diagrams of global scheduling and partitioned scheduling]


Global vs. Partitioned Scheduling

Global scheduling
  • It is generally believed that global scheduling can achieve better performance
  • However, it can be inefficient due to contention at the single queue and increased cache misses

Partitioned scheduling
  • Performance can vary depending on the initial distribution of processes, i.e., a bin-packing problem
  • Different scheduling policies can be employed on different processors
  • We can use the rich and extensive results from uniprocessor scheduling theory
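
The bin-packing view of partitioning can be sketched in a few lines. This is only an illustration, assuming a first-fit decreasing heuristic and a per-processor utilization bound of 1; the function name `partition_first_fit`, the task names, and the utilization values are hypothetical and not taken from the lecture.

```python
from fractions import Fraction

def partition_first_fit(utilizations, num_cpus, bound=Fraction(1)):
    """Assign tasks to CPUs with first-fit decreasing.

    Returns one list of task names per CPU, or None if some task fits nowhere.
    """
    bins = [[] for _ in range(num_cpus)]
    load = [Fraction(0)] * num_cpus
    # Place the heaviest tasks first (the "decreasing" part of the heuristic).
    for name, u in sorted(utilizations.items(), key=lambda kv: kv[1], reverse=True):
        for cpu in range(num_cpus):
            if load[cpu] + u <= bound:
                bins[cpu].append(name)
                load[cpu] += u
                break
        else:
            return None  # no CPU has room for this task
    return bins

# Hypothetical task set: total utilization 1.5 on 2 CPUs, and it packs.
easy = {"T1": Fraction(3, 4), "T2": Fraction(1, 2), "T3": Fraction(1, 4)}
easy_result = partition_first_fit(easy, num_cpus=2)

# Hypothetical task set: total utilization 1.8 on 2 CPUs, but no packing exists.
hard = {"a": Fraction(3, 5), "b": Fraction(3, 5), "c": Fraction(3, 5)}
hard_result = partition_first_fit(hard, num_cpus=2)

print(easy_result)
print(hard_result)
```

The second task set shows why the initial distribution matters: its total utilization is below the capacity of two processors, yet no assignment keeps every processor at or below 100%.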


Global EDF

Consider the following tasks
  • Process X: period = 20, WCET = 15, deadline = 20
  • Process Y: period = 30, WCET = 15, deadline = 30
  • Process Z: period = 40, WCET = 10, deadline = 40

[Figure: global EDF schedule on CPU #1 and CPU #2, time axis 0 to 60]
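
One way to see how global EDF treats this task set is to simulate it. The sketch below is a minimal unit-time-step simulator, assuming preemptive scheduling, free migration, all tasks released at time 0, and ties broken by task name; these choices and the function name `simulate_global_edf` are assumptions made for illustration, not details given in the lecture. It needs Python 3.9 or later for `math.lcm`.

```python
from math import lcm

def simulate_global_edf(tasks, num_cpus, horizon):
    """Simulate preemptive global EDF in unit time steps.

    tasks maps a name to (period, wcet); each deadline equals the period.
    Returns (schedule, misses), where schedule[t] lists the tasks that
    run during [t, t+1) and misses lists (task, deadline) pairs.
    """
    jobs = []                    # active jobs: [absolute_deadline, name, remaining]
    schedule, misses = [], []
    for t in range(horizon):
        # Record and drop any job still unfinished at its deadline.
        misses += [(name, dl) for dl, name, _ in jobs if dl <= t]
        jobs = [j for j in jobs if j[0] > t]
        # Release a new job of every task whose period starts now.
        for name, (period, wcet) in tasks.items():
            if t % period == 0:
                jobs.append([t + period, name, wcet])
        # Global EDF: the num_cpus jobs with the earliest deadlines run.
        jobs.sort(key=lambda j: (j[0], j[1]))
        running = jobs[:num_cpus]
        for job in running:
            job[2] -= 1
        schedule.append([job[1] for job in running])
        jobs = [j for j in jobs if j[2] > 0]
    # Jobs whose deadline falls exactly at the horizon must be finished too.
    misses += [(name, dl) for dl, name, _ in jobs if dl <= horizon]
    return schedule, misses

tasks = {"X": (20, 15), "Y": (30, 15), "Z": (40, 10)}
horizon = lcm(20, 30, 40)        # hyperperiod of the three periods
schedule, misses = simulate_global_edf(tasks, num_cpus=2, horizon=horizon)
print("deadline misses:", misses)
print("t=0:", schedule[0], "t=15:", schedule[15])
```

Under these assumptions X and Y occupy both CPUs first, Z starts once one of them finishes, and no deadline is missed within one hyperperiod.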


Partitioned EDF

Consider the following tasks
  • Process X: period = 20, WCET = 15, deadline = 20
  • Process Y: period = 30, WCET = 15, deadline = 30
  • Process Z: period = 40, WCET = 10, deadline = 40

[Figure: partitioned EDF schedule with X on CPU #1 and Y, Z on CPU #2, time axis 0 to 60]


Schedulability Analysis

Global EDF
  • There is no single efficient test
  • Most tests are very complex

Partitioned EDF
  • It is sufficient to check that the CPU utilization of each processor does not exceed 100%
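
The per-processor check for partitioned EDF can be written directly. The sketch below assumes deadlines equal to periods, as in the X, Y, Z example from the Partitioned EDF slide; the function name `edf_partition_ok` and the second, failing partition are hypothetical and added only for contrast.

```python
from fractions import Fraction

def edf_partition_ok(partition, tasks):
    """Return True if every processor's total utilization is at most 1.

    partition is a list of task-name lists, one per processor.
    tasks maps a name to (period, wcet).
    """
    for cpu_tasks in partition:
        utilization = sum(Fraction(tasks[n][1], tasks[n][0]) for n in cpu_tasks)
        if utilization > 1:
            return False
    return True

tasks = {"X": (20, 15), "Y": (30, 15), "Z": (40, 10)}

# Partition from the lecture: X alone (0.75), Y and Z together (0.5 + 0.25).
print(edf_partition_ok([["X"], ["Y", "Z"]], tasks))

# Hypothetical alternative: X and Y together (0.75 + 0.5 = 1.25) overloads a CPU.
print(edf_partition_ok([["X", "Y"], ["Z"]], tasks))
```

Exact fractions are used instead of floats so that a utilization of exactly 1 is not rejected because of rounding.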

……


Global WFQ

Consider the following tasks
  • Process A: weight = 1
  • Process B: weight = 2
  • Process C: weight = 2
  • Process D: weight = 4

[Figure: global WFQ schedule on CPU #1 and CPU #2]


Partitioned WFQ with Load Balancing

Consider the following tasks
  • Process A: weight = 1
  • Process B: weight = 2
  • Process C: weight = 2
  • Process D: weight = 4

[Figure: partitioned WFQ schedule with A, D on CPU #1 and B, C on CPU #2; migrations between the CPUs are marked]
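
A small calculation shows why partitioned WFQ needs load balancing for these weights. The sketch below assumes an idealized fluid model in which a process with weight w receives m * w / W of a CPU under global scheduling (m CPUs, total weight W, valid here because no share exceeds one CPU), and w divided by the local weight sum under partitioning. The function names are hypothetical.

```python
from fractions import Fraction

weights = {"A": 1, "B": 2, "C": 2, "D": 4}

def global_shares(weights, num_cpus):
    """Ideal CPU share of each process when weights are honored machine-wide."""
    total = sum(weights.values())
    return {p: Fraction(num_cpus * w, total) for p, w in weights.items()}

def partitioned_shares(weights, partition):
    """CPU share of each process when weights are honored only within its CPU."""
    shares = {}
    for cpu_procs in partition:
        local_total = sum(weights[p] for p in cpu_procs)
        for p in cpu_procs:
            shares[p] = Fraction(weights[p], local_total)
    return shares

ideal = global_shares(weights, num_cpus=2)
local = partitioned_shares(weights, [["A", "D"], ["B", "C"]])

for p in weights:
    print(p, "ideal:", ideal[p], "partitioned:", local[p])
```

With A and D on one CPU (local weight 5) and B and C on the other (local weight 4), B and C receive 1/2 each instead of the ideal 4/9, while A and D fall short of 2/9 and 8/9. Migrating processes over time is one way to correct this drift.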


Other Scheduling Approaches

Coscheduling
  • Gang scheduling and symbiotic scheduling

Processor sets

Locality and cache issues

Processor heterogeneity


Gang Scheduling

First proposed by Ousterhout in 1982
  • Also known as coscheduling

Basic idea
  • All processes of an application are scheduled to run simultaneously
  • When a time slice ends, all running processes are preempted simultaneously, and all processes of a second application are scheduled for the next time slice
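
The basic idea can be sketched as a table with one row per time slice and one column per CPU. This is a simplified illustration, assuming applications take turns in round-robin order and that each application has no more processes than there are CPUs; the function name `gang_schedule` and the application and process names are hypothetical.

```python
from itertools import cycle, islice

def gang_schedule(apps, num_cpus, num_slices):
    """Return one row per time slice: the process on each CPU (None = idle).

    apps maps an application name to the list of its processes.
    """
    for name, procs in apps.items():
        if len(procs) > num_cpus:
            raise ValueError(f"{name} needs more than {num_cpus} CPUs")
    rows = []
    # Each slice belongs to exactly one application; all of its processes
    # start together and are preempted together when the slice ends.
    for name in islice(cycle(apps), num_slices):
        procs = apps[name]
        rows.append(procs + [None] * (num_cpus - len(procs)))
    return rows

apps = {"app1": ["a1", "a2", "a3", "a4"], "app2": ["b1", "b2"]}
rows = gang_schedule(apps, num_cpus=4, num_slices=3)
for slice_no, row in enumerate(rows):
    print(slice_no, row)
```

In this sketch the slice given to `app2` leaves two CPUs idle, since only whole applications are placed in a slice.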


Some Properties of Gang Scheduling

Advantages
  • Gang scheduling solves the problems associated with synchronization and communication among related processes

Disadvantages
  • Gang scheduling is a centralized scheduling strategy, which can become a bottleneck on large machines
  • It can result in poor cache performance
  • It can lead to fragmentation of processors when applications do not need all of the processors in the system but do not leave enough free processors to fit all of the processes of another application


Symbiotic Scheduling

Proposed by Snavely et al. in 1999
  • The term symbiosis was introduced to refer to an increase in throughput that can occur when particular jobs are coscheduled on multithreaded machines
  • Throughput may go up or down depending on how well the jobs in the running set symbiose, or "get along"

Symbiosis measurement
  • Throughput rate (TR)

[Equations: not recoverable from the source]


Processor Sets

Proposed by Black in 1990
  • The machine is partitioned into sets of processors, each of which executes a single parallel application
  • Using processor sets can also ensure that different applications get an equal portion of the machine (equi-partitioning)


Thread Clustering

Proposed by D. Tam, R. Azimi, and M. Stumm in 2007

Motivation
  • The cost of cross-chip sharing is high

Key ideas
  • Detect thread sharing patterns using performance counters
  • Place threads that heavily share data on the same chip
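
The two key ideas can be illustrated with a toy grouping step. The sketch below is a simplified stand-in and not the algorithm from the paper: it assumes pairwise sharing counts are already available (in practice they would be derived from performance counters), groups threads whose count reaches a threshold, and places each group on the least-loaded chip without modeling chip capacity. All names and numbers are hypothetical.

```python
def cluster_threads(sharing, threads, threshold):
    """Group threads linked by a sharing count of at least threshold (union-find)."""
    parent = {t: t for t in threads}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]   # path halving
            t = parent[t]
        return t

    for (a, b), count in sharing.items():
        if count >= threshold:
            parent[find(a)] = find(b)       # merge the two groups

    clusters = {}
    for t in threads:
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())

def place_on_chips(clusters, num_chips):
    """Put each cluster, largest first, on the chip with the fewest threads so far."""
    chips = [[] for _ in range(num_chips)]
    for cluster in sorted(clusters, key=len, reverse=True):
        min(chips, key=len).extend(cluster)
    return chips

threads = ["t1", "t2", "t3", "t4"]
# Hypothetical sharing counts between thread pairs.
sharing = {("t1", "t3"): 900, ("t2", "t4"): 750, ("t1", "t2"): 10}

clusters = cluster_threads(sharing, threads, threshold=100)
placement = place_on_chips(clusters, num_chips=2)
print(clusters)
print(placement)
```

Here `t1` and `t3` end up on one chip and `t2` and `t4` on the other, so the heavy sharing stays within a chip and only the light `t1`/`t2` sharing crosses chips.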


Processor Heterogeneity

ASISA (Asymmetric Single ISA) processors
  • Processors have the same ISA but different performance characteristics (clock speed, power, cache size, …)

Heterogeneity, scheduling, and performance
