Advanced Operating Systems
Lecture 6: Scheduling
University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani


Page 1: Advanced  Operating Systems

Univ. of Tehran Distributed Operating Systems

1

Advanced Operating Systems

University of Tehran, Dept. of EE and Computer Engineering

By: Dr. Nasser Yazdani

Lecture 6: Scheduling

Page 2: Advanced  Operating Systems

How to use resources efficiently: sharing the CPU and other resources of the system. References:

Surplus Fair Scheduling: A Proportional-Share CPU Scheduling Algorithm for Symmetric Multiprocessors

Scheduler Activations: Effective Kernel Support for User-Level Management of Parallelism

Condor: A Hunter of Idle Workstations

Virtual-Time Round-Robin: An O(1) Proportional Share Scheduler

A SMART Scheduler for Multimedia Applications

Linux CPU scheduling

Page 3: Advanced  Operating Systems

Outline

Scheduling. Scheduling policies. Scheduling on multiprocessors. Thread scheduling.

Page 4: Advanced  Operating Systems

What is Scheduling?

OS policies and mechanisms to allocate resources to entities. An OS often has many pending tasks: threads, async callbacks, device input. The order may matter for policy, correctness, or efficiency. Providing sufficient control is not easy: mechanisms must allow the policy to be expressed. A good scheduling policy ensures that the most important entity gets the resources it needs.

Page 5: Advanced  Operating Systems

Why Scheduling?

This topic was popular in the days of time sharing, when there was a shortage of resources. It seemed irrelevant in the era of PCs and workstations, when resources were plentiful. Now the topic is back from the dead to handle massive Internet servers with paying customers, where some customers are more important than others.

Page 6: Advanced  Operating Systems

Resources to Schedule?

Resources you might want to schedule: CPU time, physical memory, disk and network I/O, and I/O bus bandwidth. Entities that you might want to give resources to: users, processes, threads, web requests, or MIT accounts.

Page 7: Advanced  Operating Systems

Key Problems

Gap between desired policy and available mechanism. The desired policies often include elements that are not implementable. Furthermore, there are often many conflicting goals (low latency, high throughput, and fairness), and the scheduler must trade off among them.

Interaction between different schedulers. One has to take a systems view: just optimizing the CPU scheduler may do little for the overall desired policy.

Page 8: Advanced  Operating Systems

Scheduling Policy Examples

Allocate cycles in proportion to money. Maintain high throughput under high load. Never delay a high-priority thread by more than 1 ms. Maintain good interactive response. Can we enforce such policies with the thread scheduler alone?

Page 9: Advanced  Operating Systems

General Plan

Understand where scheduling is occurring. Expose scheduling decisions and allow control. Account for resource consumption, to allow intelligent control.

Page 10: Advanced  Operating Systems

Parallel Computing

Speedup: the final measure of success. Parallelism vs. concurrency: actual vs. possible by the application. Granularity: size of the concurrent tasks. Reconfigurability: number of processors. Communication cost. Preemption vs. non-preemption. Co-scheduling: some things are better scheduled together.

Page 11: Advanced  Operating Systems

Best Place for Scheduling?

The application is in the best position to know its own specific scheduling requirements: which threads run best simultaneously, which are on the critical path. But the kernel must make sure all play fairly.

MACH scheduling lets a process provide hints to discourage running, and makes it possible to hand off the processor to another thread. This makes it easier for the kernel to select the next thread and allows interleaving of concurrent threads. Low-level scheduling stays in the kernel, based on higher-level information from application space.

Page 12: Advanced  Operating Systems

Example

Give each process equal CPU time: interrupt every 10 msec and select another process in round-robin fashion. This works if processes are compute-bound. What if a process gives up some of its 10 ms to wait for input?

How long should the quantum be? Is 10 msec the right answer? A shorter quantum gives better interactive performance but lowers overall system throughput.

What if an environment computes for 1 msec and then sends an IPC to the file-server environment? Shouldn't the file server get more CPU time, since it operates on behalf of all other functions?

Potential improvement: track "recent" CPU use (e.g., over the last second) and always run the environment with the least recent CPU use. (Still, if you sleep long enough you lose.) Another solution: a directed yield, specifying on the yield which environment receives the remainder of the quantum (e.g., the file server, so it can compute on the yielding environment's behalf).
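The round-robin-with-quantum idea can be sketched as a tiny simulation. This is a minimal sketch under simplifying assumptions (compute-bound processes only, no I/O waits, illustrative burst lengths), not a real dispatcher:

```python
from collections import deque

def round_robin(bursts, quantum=10):
    """Simulate round-robin; bursts maps name -> remaining CPU need (ms).
    Returns the order in which processes complete. Sketch only: every
    process is assumed compute-bound, so I/O waits are ignored."""
    queue = deque(bursts.items())
    done = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= min(quantum, remaining)   # run for one quantum (or less)
        if remaining == 0:
            done.append(name)                  # finished: leaves the system
        else:
            queue.append((name, remaining))    # preempted: back of the queue
    return done

order = round_robin({"A": 25, "B": 10, "C": 5})   # hypothetical workload
```

Shrinking `quantum` lets short jobs like C finish sooner (better interactivity) at the cost of more context switches, which is exactly the trade-off discussed above.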

Page 13: Advanced  Operating Systems

Scheduling is a System Problem

The thread/process scheduler can't enforce policies by itself. It needs cooperation from all resource schedulers and from the software structure. Conflicting goals may limit its effectiveness.

Page 14: Advanced  Operating Systems

Goals

Low latency: people typing at editors want fast response; network services can be latency-bound, not CPU-bound. High throughput: minimize context switches to avoid wasting CPU on TLB misses, cache misses, even page faults. Fairness.

Page 15: Advanced  Operating Systems

Scheduling Approaches

FIFO: + fair, - high latency. Round robin: + fair, + low latency, - poor throughput. STCF/SRTCF (shortest time / shortest remaining time to completion first): + low latency, + high throughput, - unfair (starvation).

Page 16: Advanced  Operating Systems

Shortest Job First (SJF)

Two types: non-preemptive and preemptive. Requirement: the elapsed time needs to be known in advance. Provably optimal if all jobs are available simultaneously. Is SJF still optimal if the jobs are not all available simultaneously?

Page 17: Advanced  Operating Systems

Preemptive SJF

Also called Shortest Remaining Time First: schedule the job with the shortest remaining time required to complete. Requirement: the elapsed time needs to be known in advance.
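The preemptive discipline can be sketched as a small event-driven simulation. A minimal sketch, assuming integer arrival and burst times that are known in advance (as the slides require) and hypothetical job names:

```python
import heapq

def srtf(jobs):
    """Shortest Remaining Time First. jobs: list of (arrival, burst, name).
    Returns {name: completion_time}. Sketch only: times are integers and
    burst lengths are assumed known up front."""
    events = sorted(jobs)                     # arrivals in time order
    ready, done, t, i = [], {}, 0, 0
    while i < len(events) or ready:
        if not ready:                         # CPU idle: jump to next arrival
            t = max(t, events[i][0])
        while i < len(events) and events[i][0] <= t:
            arrival, burst, name = events[i]
            heapq.heappush(ready, (burst, name))   # keyed by remaining time
            i += 1
        remaining, name = heapq.heappop(ready)
        # Run until done, or until the next arrival gets a chance to preempt.
        horizon = events[i][0] if i < len(events) else t + remaining
        run = min(remaining, max(horizon - t, 1))
        t += run
        if remaining - run == 0:
            done[name] = t
        else:
            heapq.heappush(ready, (remaining - run, name))
    return done
```

With a long job arriving at 0 (burst 8) and a short one at 1 (burst 2), the short job preempts the long one and finishes at time 3, illustrating SRTF's low latency for short jobs.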

Page 18: Advanced  Operating Systems

Interactive Scheduling

Usually preemptive: time is sliced into quanta (time intervals), and a decision is made at the beginning of each quantum. Performance criteria: minimum response time, best proportionality. Representative algorithms: priority-based, round-robin, multi-queue and multilevel feedback, shortest process time, guaranteed scheduling, lottery scheduling, fair-share scheduling.

Page 19: Advanced  Operating Systems

Priority Scheduling

Each job is assigned a priority, with FCFS within each priority level. Select the highest-priority job over lower ones. Rationale: higher-priority jobs are more mission-critical (example: a DVD movie player vs. sending email). Problems: may not give the best average waiting time; indefinite blocking or starvation of a process.

Page 20: Advanced  Operating Systems

Set Priority

Two approaches: static (for systems with well-known and regular application behavior) and dynamic (otherwise). Priority may be based on: cost to the user, importance of the user, aging, or the percentage of CPU time used in the last X hours.

Page 21: Advanced  Operating Systems

Pitfall: Priority Inversion

• Low-priority thread X holds a lock.
• High-priority thread Y waits for the lock.
• Medium-priority thread Z preempts X.
• Y is indefinitely delayed despite its high priority.

This arises when a higher-priority process needs to read or modify kernel data currently being accessed by a lower-priority process: the higher-priority process must wait, but the lower-priority one cannot proceed quickly because of scheduling.

Solution: priority inheritance. While a lower-priority process holds the resource, it inherits the high priority until it is done with the resource in question; its priority then reverts to its natural value.
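The inheritance bookkeeping can be sketched in a few lines. This is a toy model of the X/Y/Z scenario above, not a real (thread-safe) lock; the class and field names are hypothetical:

```python
class Thread:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority          # effective priority (may be boosted)
        self.base_priority = priority     # natural value to revert to

class PiLock:
    """Toy priority-inheritance lock: tracks only the priority bookkeeping."""
    def __init__(self):
        self.holder = None

    def acquire(self, thread):
        if self.holder is None:
            self.holder = thread
            return True
        # Contended: the holder inherits the waiter's higher priority, so a
        # medium-priority thread can no longer preempt it indefinitely.
        if thread.priority > self.holder.priority:
            self.holder.priority = thread.priority
        return False                      # caller would block here

    def release(self):
        self.holder.priority = self.holder.base_priority   # revert
        self.holder = None

lock = PiLock()
x = Thread("X", priority=1)   # low
y = Thread("Y", priority=3)   # high
lock.acquire(x)
lock.acquire(y)               # Y blocks; X inherits priority 3
boosted = x.priority          # X now outranks a medium-priority Z (priority 2)
lock.release()
restored = x.priority         # back to X's natural priority
```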

Page 22: Advanced  Operating Systems

Pitfall: Long Code Paths

Large-granularity locks are convenient; non-preemptable threads are an extreme case. Either may delay high-priority processing.

Page 23: Advanced  Operating Systems

Pitfall: Efficiency

Efficient disk use requires unfairness: shortest-seek-first vs. FIFO, read-ahead vs. data needed now. Efficient paging policy creates delays: the OS may swap out my idle Emacs to free memory; what happens when I type a key? The thread scheduler doesn't control these.

Page 24: Advanced  Operating Systems

Pitfall: Multiple Schedulers

Every resource with multiple waiting threads has a scheduler: locks, the disk driver, the memory allocator. These schedulers may not cooperate, or may not even be explicit.

Page 25: Advanced  Operating Systems

Example: UNIX

Goals: a simple kernel concurrency model with limited preemption, and quick response to device interrupts. There are many kinds of execution environments; some transitions between them are not possible, and some cannot be controlled.

Page 26: Advanced  Operating Systems

UNIX Environments

[Diagram: per-process user halves above their kernel halves; timer and network soft interrupts; device and timer interrupts below.]

Page 27: Advanced  Operating Systems

UNIX: Process User Half

Interruptable, and preemptable via the timer interrupt: we don't trust user processes. It enters the kernel half via system calls and faults: save user state on the stack, raise the privilege level, jump to a known point in the kernel. Each process has a stack and saved registers.

Page 28: Advanced  Operating Systems

UNIX: Process Kernel Half

Executes system calls for its user process; this may involve many steps separated by sleep(). Interruptable, though it may postpone interrupts in critical sections. Not preemptable, which simplifies concurrent programming: no context switch until a voluntary sleep(), and no user process runs while a kernel half is runnable. Each kernel half has a stack and saved registers; many processes may be sleep()ing in the kernel.

Page 29: Advanced  Operating Systems

UNIX: Device Interrupts

Device hardware asks the CPU for an interrupt to signal new input or completion of output; this is cheaper than polling and has lower latency. Interrupts take priority over the user/kernel halves: save current state on the stack, mask other interrupts, run the interrupt handler function, then return and restore state. The real-time clock is a device too.

Page 30: Advanced  Operating Systems

UNIX: Soft Interrupts

Device interrupt handlers must be short, so expensive processing is deferred to a soft interrupt. It can't be done in the kernel half, since the relevant process isn't known. Examples: TCP protocol input processing; periodic process scheduling. Devices can interrupt a soft interrupt, and a soft interrupt has priority over user and kernel processes, but it is only entered on return from a device interrupt. It is similar to an async callback; it can't be a high-priority thread, since there is no preemption.

Page 31: Advanced  Operating Systems

UNIX Environments

[Diagram: transfers between user halves, kernel halves, soft interrupts, and device interrupts, annotated as "transfer with choice", "transfer, limited choice", and "transfer, no choice".]

Page 32: Advanced  Operating Systems

Pitfall: Server Processes

User-level servers (X11, DNS, NFS) schedule requests themselves, and they usually don't know about the kernel's scheduling policy. Network packet scheduling also interferes.

Page 33: Advanced  Operating Systems

Pitfall: Hardware Schedulers

The memory system is scheduled among CPUs, the I/O bus among devices, and the interrupt controller chooses the next interrupt. The hardware doesn't know about OS policy, and the OS often doesn't understand the hardware.

Page 34: Advanced  Operating Systems

Time Quantum

A time slice that is too large gives FIFO behavior and poor response time; one that is too small causes too many context switches (overhead) and inefficient CPU utilization. Heuristic: 70-80% of jobs should block within the time slice. Typical time slices are 10 to 100 ms. Time spent in the system depends on job size.

Page 35: Advanced  Operating Systems

Multi-Queue Scheduling

A hybrid between priority and round-robin: processes are assigned to one queue permanently, and scheduling between queues uses fixed priorities or a fixed percentage of CPU per queue. Example queues: system processes, interactive programs, background processes, student processes. This addresses the starvation and indefinite-blocking problems.

Page 36: Advanced  Operating Systems

Multi-Queue Scheduling: Example

[Diagram: three queues receiving fixed CPU shares of 50%, 30%, and 20%.]

Page 37: Advanced  Operating Systems

Multi-Processor Scheduling: Load Sharing

Decides which process to run, how long it runs, and where (on which CPU) to run it.

Page 38: Advanced  Operating Systems

Multi-Processor Scheduling Choices

Self-scheduled: each CPU dispatches a job from the ready queue. Master-slave: one CPU schedules the other CPUs. Asymmetric: one CPU runs the kernel and the others run user applications, or one CPU handles the network and the others handle applications.

Page 39: Advanced  Operating Systems

Gang Scheduling for Multi-Processors

A gang is a collection of processes belonging to one job. All the processes run at the same time; if one process is preempted, all processes of the gang are preempted. This helps eliminate the time a process spends waiting for other processes in its parallel computation.

Page 40: Advanced  Operating Systems

Scheduling Approaches: Multilevel Feedback Queues

A job starts in the highest-priority queue. If its time slice expires, lower its priority by one level; if it does not expire, raise its priority by one level. Age long-running jobs.
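The demotion rule above can be sketched as a small simulation. A minimal sketch with three illustrative quanta; the promotion-on-early-block rule and aging are omitted here because the sample jobs are purely CPU-bound:

```python
from collections import deque

def mlfq(jobs, quanta=(4, 8, 16)):
    """Multilevel feedback queues, demotion only. jobs: {name: burst}.
    A job enters the highest-priority queue (level 0); using up its whole
    slice demotes it one level. Returns completion order. Sketch only."""
    queues = [deque() for _ in quanta]
    for name, burst in jobs.items():
        queues[0].append((name, burst))            # start at highest priority
    done = []
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)   # highest non-empty
        name, remaining = queues[level].popleft()
        remaining -= min(quanta[level], remaining) # run one slice at this level
        if remaining == 0:
            done.append(name)
        else:                                      # slice expired: demote
            queues[min(level + 1, len(quanta) - 1)].append((name, remaining))
    return done

order = mlfq({"short": 3, "long": 30})   # hypothetical workload
```

The short job finishes in the top queue while the long job drifts down to the bottom queue with its larger quantum, which is the behavior the slide describes.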

Page 41: Advanced  Operating Systems

Lottery Scheduling

Claim: priority-based schemes are ad hoc. Lottery scheduling is a randomized scheme based on a currency abstraction. Idea: processes own lottery tickets; the CPU randomly draws a ticket and executes the corresponding process.
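The draw itself can be sketched in a few lines. A minimal sketch with hypothetical process names and ticket counts:

```python
import random

def lottery_pick(tickets, rng=random):
    """Draw one winner; tickets maps process -> ticket count. Each process
    wins with probability tickets[p] / total, giving probabilistic fairness."""
    total = sum(tickets.values())
    draw = rng.randrange(total)           # winning ticket number
    for proc, n in tickets.items():
        if draw < n:
            return proc
        draw -= n

# Over many draws, the allocation approaches the 3:1 ticket ratio.
rng = random.Random(0)                    # seeded for reproducibility
wins = {"A": 0, "B": 0}
for _ in range(10000):
    wins[lottery_pick({"A": 3, "B": 1}, rng)] += 1
```

Any process holding at least one ticket has nonzero win probability on every draw, which is why starvation cannot occur.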

Page 42: Advanced  Operating Systems

Properties of Lottery Scheduling

Guarantees fairness through probability, and guarantees no starvation as long as each process owns at least one ticket. To approximate SRTCF, give short jobs more tickets and long jobs fewer.

Page 43: Advanced  Operating Systems

Partially Consumed Tickets

What if a process is chosen but does not consume its entire time slice? The process receives compensation tickets. Idea: it gets chosen more frequently, but with a shorter time slice.

Page 44: Advanced  Operating Systems

Ticket Currencies

Load insulation: a process can dynamically change its ticketing policies without affecting other processes. Currencies must be converted before tickets are transferred.

[Diagram: a currency tree. The base currency issues 3000 tickets: 1000 fund Alice (who issues 200) and 2000 fund Bob (who issues 100). 200 of Alice's tickets fund process1 (which issues 500, split 200 to thread1 and 300 to thread2); 100 of Bob's tickets fund process2 (which issues 100, all backing thread3).]
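The conversion step can be sketched as walking up the currency tree, scaling by each currency's backing ratio. The rate-table format below is an assumption of this sketch; the values mirror the slide's example tree:

```python
def to_base(amount, currency, rates):
    """Convert an amount of a local currency into base tickets.
    rates maps currency -> (funding_in_parent_currency, tickets_issued,
    parent_currency). Sketch only; table format is illustrative."""
    while currency != "base":
        funding, issued, parent = rates[currency]
        amount = amount * funding / issued    # scale by the backing ratio
        currency = parent
    return amount

rates = {
    "Alice":    (1000, 200, "base"),    # Alice's 200 tickets backed by 1000 base
    "process1": (200, 500, "Alice"),    # process1's 500 backed by 200 Alice
}
base_value = to_base(200, "process1", rates)   # thread1's 200 process1 tickets
```

thread1's 200 process1 tickets are worth 80 Alice tickets, and hence 400 base tickets; inflating process1's currency (issuing more tickets) dilutes only process1's own threads, which is the load-insulation property.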

Page 45: Advanced  Operating Systems

Condor

Identifies idle workstations and schedules background jobs on them, guaranteeing that a job will eventually complete. Analysis of workstation usage patterns shows only about 30% utilization. Remote capacity allocation uses the Up-Down algorithm to allow fair access to remote capacity. Remote execution facilities: Remote Unix (RU).

Page 46: Advanced  Operating Systems

Condor Issues

Leverage, a performance measure: the ratio of the capacity consumed by a job remotely to the capacity consumed on the home station to support remote execution. Checkpointing: save the state of a job so that its execution can be resumed. Transparent placement of background jobs, and automatic restart if a background job fails. Users expect fair access and small overhead.

Page 47: Advanced  Operating Systems

Condor: Scheduling

A hybrid of centralized static and distributed approaches. Each workstation keeps its own state information and schedule. A central coordinator assigns capacity to workstations, and the workstations use that capacity to schedule.

Page 48: Advanced  Operating Systems

Real-Time Systems

The issues are scheduling and interrupts: tasks must complete by a particular deadline. Examples: accepting input from real-time sensors, process-control applications, responding to environmental events. How does one support real-time systems? If deadlines are short, often use a dedicated system; give real-time tasks absolute priority; do not support virtual memory; use early binding.

Page 49: Advanced  Operating Systems

Real-Time Scheduling

To initiate a job, one must specify its deadline and an estimate or upper bound on its resources. The system accepts or rejects the job; if it accepts, it agrees that it can meet the deadline, placing the job in a calendar, blocking out the resources it will need and planning when they will be allocated. Some systems support priorities, but this can violate the real-time guarantee for already-accepted jobs.

Page 50: Advanced  Operating Systems

User-Level Thread Scheduling

[Diagram: a possible scheduling with a 50-msec process quantum, threads running 5 msec per CPU burst.]

Page 51: Advanced  Operating Systems

Kernel-Level Thread Scheduling

[Diagram: a possible scheduling with a 50-msec process quantum, threads running 5 msec per CPU burst.]

Page 52: Advanced  Operating Systems

Thread Scheduling Examples

Solaris 2: priority-based process scheduling with four scheduling classes (real-time, system, time-sharing, interactive) and a set of priorities within each class. The scheduler converts the class-specific priorities into global priorities and runs the thread with the highest global priority. The thread runs until (1) it blocks, (2) it uses up its time slice, or (3) it is preempted by a higher-priority thread.

The JVM schedules threads with a preemptive, priority-based algorithm: it runs the "runnable" thread with the highest priority, applying FIFO among threads of equal priority. It schedules a new thread when (1) the current thread leaves the "runnable" state due to the block(), exit(), suspend(), or stop() methods, or (2) a thread with higher priority enters the "runnable" state.

Page 53: Advanced  Operating Systems

Surplus Fair Scheduling: Motivation

Diverse web and multimedia applications are popular: HTTP, streaming, e-commerce, games, etc. Applications are hosted on large servers, typically multiprocessors. Key challenge: design OS mechanisms for resource management.

[Diagram: end-stations connect over the network to a server hosting streaming, e-commerce, and web applications.]

Page 54: Advanced  Operating Systems

Requirements for OS Resource Management

Fair, proportionate allocation (e.g., 20% for HTTP, 30% for streaming). Application isolation: misbehaving or overloaded applications should not affect other applications. Efficiency: OS mechanisms should have low overhead. Focus: achieving these objectives for CPU scheduling on multiprocessor machines.

Page 55: Advanced  Operating Systems

Proportional-Share Scheduling

Associate a weight with each application and allocate CPU bandwidth proportional to weight (e.g., weights 2 and 1 yield 2/3 and 1/3 of the CPU bandwidth). Existing algorithms approximate the ideal algorithm, Generalized Processor Sharing: WFQ, SFQ, SMART, BVT, etc. Question: are the existing algorithms adequate for multiprocessor systems?

Page 56: Advanced  Operating Systems

Starvation Problem

SFQ assigns each thread a start tag (service received / weight) and schedules the thread with the minimum start tag.

[Timeline diagram, two CPUs: A (weight 100) and B (weight 1) run from time 0; A's start tags grow slowly (0, 1, ..., 10, 11) while B's grow quickly (0, 100, 1000). When C (weight 1) arrives around time 1000 with a fresh start tag, B's huge start tag keeps it from being scheduled: B starves.]

Page 57: Advanced  Operating Systems

Weight Readjustment

Reason for starvation: an infeasible weight assignment (e.g., 1:100 with 2 CPUs), so accounting differs from the actual allocation. Observation: a thread can't consume more than one CPU's bandwidth, so a thread can be assigned at most 1/p of the total CPU bandwidth. Feasibility constraint:

    w_i / (sum over j of w_j) <= 1/p

Page 58: Advanced  Operating Systems

Weight Readjustment (contd.)

Goal: convert the given weights into feasible weights. Threads are examined in decreasing order of weight, capping infeasible weights one CPU's worth at a time. Efficient: the algorithm is O(p), and it can be combined with existing algorithms.
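The capping idea can be sketched as follows. This is a sketch of the idea behind the feasibility constraint above (examine the largest weights in decreasing order and cap each infeasible one at exactly one CPU's share), not a claim to reproduce the paper's exact procedure:

```python
def readjust(weights, p):
    """Convert weights to feasible weights: afterwards no thread's share
    w_i / sum(w) exceeds 1/p. Sketch only; examines at most p threads."""
    w = sorted(weights, reverse=True)
    suffix = sum(w)                          # sum of w[i:] as we scan
    for i in range(min(p - 1, len(w))):
        rest = suffix - w[i]                 # weights after position i
        if w[i] * (p - i) > suffix:          # w[i]/suffix > 1/(p-i): infeasible
            w[i] = rest / (p - i - 1)        # cap at exactly one CPU's share
        suffix = rest
    return w

feasible = readjust([100, 1, 1], p=2)        # the slide's 1:100 example
```

For the infeasible 100:1:1 assignment on 2 CPUs, thread A's weight is capped to 2, so its share becomes 2/4 = 1/2, exactly one CPU; B and C then split the other CPU instead of starving.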

Page 59: Advanced  Operating Systems

Effect of Readjustment

[Two plots of number of iterations (x 10^5) vs. time (s) for threads A (wt=10), B (wt=1), C (wt=1), comparing SFQ without and with weight readjustment.]

Weight readjustment gets rid of the starvation problem.

Page 60: Advanced  Operating Systems

Short Jobs Problem

[Two plots of number of iterations (x 10^5) vs. time (s) for J1 (wt=20), J2-J21 (wt=1 each), and J_short (wt=5), comparing SFQ with the ideal allocation.]

With frequent arrivals and departures of short jobs, SFQ does unfair allocation!

Page 61: Advanced  Operating Systems

Surplus Fair Scheduling

[Diagram: service received by thread i over time, ideal vs. actual; the gap at time t is the surplus.]

Surplus = Service_actual - Service_ideal. The scheduler picks the threads with the least surplus values: lagging threads get closer to their due, and threads that are ahead are restrained.

Page 62: Advanced  Operating Systems

Surplus Fair Scheduling (contd.)

Start tag (S_i): the weighted service of thread i, S_i = Service_i / w_i. Virtual time (v): the minimum start tag of all runnable threads. Surplus (alpha_i): alpha_i = Service_i - Service_lagging = w_i * S_i - w_i * v. The scheduler selects threads in increasing order of surplus.
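One scheduling step using these definitions can be sketched directly. The thread-table format and the numbers are illustrative assumptions of this sketch:

```python
def pick_next(threads):
    """One surplus-fair step. threads: {name: (weight, start_tag)}.
    Computes surplus_i = w_i * S_i - w_i * v, with v the minimum start tag,
    and returns (thread with least surplus, all surpluses). Sketch only."""
    v = min(s for _, s in threads.values())          # virtual time
    surplus = {name: w * (s - v) for name, (w, s) in threads.items()}
    return min(surplus, key=surplus.get), surplus

# A heavy thread far ahead of virtual time accumulates a large surplus,
# so the lagging light thread is scheduled first.
name, surplus = pick_next({"A": (100, 11), "B": (1, 0), "C": (1, 10)})
```

Unlike SFQ's raw start-tag comparison, weighting the gap (S_i - v) by w_i is what keeps a high-weight thread from monopolizing a CPU it cannot feasibly be owed.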

Page 63: Advanced  Operating Systems

Surplus Fair Scheduling with Short Jobs

[Two plots of number of iterations (x 10^5) vs. time (s) for J1 (wt=20), J2-J21 (wt=1 each), and J_short (wt=5), comparing surplus fair scheduling with the ideal allocation.]

Surplus fair scheduling does proportionate allocation.

Page 64: Advanced  Operating Systems

Proportionate Allocation

[Bar chart: normalized processor shares received by two web servers under weight assignments 1:1, 1:2, 1:4, and 1:7.]

Page 65: Advanced  Operating Systems

Application Isolation

[Plot: frame rate (frames/sec) of an MPEG decoder vs. number of background compilations, for surplus fair scheduling and time-sharing.]

Page 66: Advanced  Operating Systems

Scheduling Overhead

[Plot: context-switch time (microsec) vs. number of processes, for surplus fair scheduling and time-sharing.]

Context-switch time (~10 μs) vs. quantum size (~100 ms).

Page 67: Advanced  Operating Systems

Summary

Existing proportional-share algorithms are inadequate for multiprocessors. The readjustment algorithm can reduce unfairness, and surplus fair scheduling is practical for multiprocessors: it achieves proportional fairness and isolation with low overhead, plus heuristics for incorporating processor affinity. Source code is available at http://lass.cs.umass.edu/software/gms

Page 68: Advanced  Operating Systems

Scheduler Activations

In a multiprocessor system, threads could be managed in: user space only (key feature: cooperative); kernel space only (key feature: preemptive); or user space on top of kernel space, with some user-level threads over some kernel-level threads over some CPUs.

Page 69: Advanced  Operating Systems

Scheduler Activations

User-level scheduling of threads: the application maintains the scheduling queue. The kernel allocates processors to tasks and makes an upcall to the scheduling code in the application when a thread blocks for I/O or is preempted. Only the user level is involved when a thread blocks on a critical section. The user level blocks on kernel calls, and the kernel returns control to the application scheduler.

Page 70: Advanced  Operating Systems

User-Level Thread Management

Sample measurements obtained on the Firefly running Topaz (in microseconds). For comparison: procedure call, 7 μs; kernel trap, 19 μs.

Operation   | FastThreads | Topaz Threads | Ultrix Processes
Null Fork   | 34          | 948           | 11300
Signal-Wait | 37          | 441           | 1840

Page 71: Advanced  Operating Systems

User-Level Threads on Top of Kernel Threads

Three layers: some user-level threads (how many?) on top of some kernel-level threads (how many?) on top of some CPUs (how many?). Problems are caused by: kernel threads being scheduled obliviously with respect to user-level thread state, and kernel threads blocking, resuming, and being preempted without notification to the user level.

Page 72: Advanced  Operating Systems

The Way Out: Scheduler Activations

Processor allocation is done by the kernel; thread scheduling is done by each address space. The kernel notifies the address-space thread scheduler of every event affecting the address space, and the address space notifies the kernel of the subset of user-level events that can affect processor-allocation decisions.

Page 73: Advanced  Operating Systems

Scheduler Activations (cont.)

Goal: design a kernel interface and a user-level thread package that combine the functionality of kernel threads with the performance and flexibility of user-level threads. Secondary goal: if thread operations do not involve kernel intervention, the achieved performance should be similar to user-level threads.

Page 74: Advanced  Operating Systems

Scheduler Activations (cont.)

The difficulty in achieving all of the above: in a multiprogrammed multiprocessor system, the required control and scheduling information is distributed between the kernel and user space. To manage the application's parallelism successfully, user-level support routines must be aware of kernel events (processor reallocations, I/O requests and completions, etc.), which are normally all hidden from the application.

Page 75: Advanced  Operating Systems

Scheduler Activations (cont.)

1. Provide each application with a virtual multiprocessor: each application knows how many processors it has been allocated and has total control over those processors and its own scheduling. The OS controls the allocation of processors among address spaces, including the ability to change the number of processors assigned to an application during its execution.

2. To achieve this, the kernel notifies the address-space thread scheduler of kernel events affecting it, so the application has complete knowledge of its scheduling state.

3. The user-space thread system notifies the kernel of operations that may affect processor-allocation decisions (helping performance).

The kernel mechanism that achieves all this: scheduler activations.

Page 76: Advanced  Operating Systems

Univ. of Tehran Distributed Operating Systems

76

Scheduler Activation Data Structures

Each scheduler activation maintains two execution stacks: one mapped into the kernel, and another mapped into the application address space.

Each user-level thread is provided with its own stack at creation time.

When a user-level thread calls into the kernel, it uses its activation's kernel stack.

The user-level thread scheduler runs on the activation's user-level stack.

The kernel maintains an activation control block, which records the state of the scheduler activation's thread when it blocks in the kernel or is preempted, and keeps track of which thread is running on every scheduler activation.
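These per-activation structures can be sketched as follows. This is a hypothetical Python model (field names are assumptions for illustration, not the paper's actual code): two stacks per activation, one user-level stack per thread, and a kernel-side control block.

```python
# Illustrative model of scheduler-activation data structures (assumed names).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class UserThread:
    tid: int
    stack: list = field(default_factory=list)   # own user-level stack, given at creation

@dataclass
class SchedulerActivation:
    sa_id: int
    kernel_stack: list = field(default_factory=list)  # mapped into the kernel
    user_stack: list = field(default_factory=list)    # mapped into the application
    running_thread: Optional[UserThread] = None       # thread currently on this activation

@dataclass
class ActivationControlBlock:
    # Kernel-side record of the activation's thread state, captured when the
    # thread blocks in the kernel or is preempted.
    sa_id: int
    saved_state: dict = field(default_factory=dict)
```

A user-level thread calling into the kernel would switch to `kernel_stack`, while the thread scheduler itself runs on `user_stack`.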


When a new Program is started…

The kernel creates a scheduler activation, assigns a processor to it, and does an upcall to the user-space application (at a fixed entry point). THEN the user-level system receives the upcall and uses the scheduler activation as the context in which to initialize itself and start running the first (main) thread.

The first thread may ask for the creation of more threads and additional processors.

For each added processor, the kernel creates a new activation and upcalls the user level to say that the new processor is there.

The user level picks a thread and executes it on the new processor.
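The startup flow above can be simulated in a few lines. This is a toy sketch with assumed class and method names (there is no real kernel here); it only shows the order of events: kernel creates an activation, upcalls a fixed entry point, and the user-level runtime picks a thread to run.

```python
# Toy simulation of program start under scheduler activations (assumed API).
class UserRuntime:
    def __init__(self):
        self.ready = []   # ready list of user-level threads
        self.log = []     # record of events, for inspection

    def upcall_entry(self, activation_id, processor_id):
        # Fixed entry point at which the kernel upcalls the user level.
        self.log.append(("upcall", activation_id, processor_id))
        thread = self.ready.pop(0) if self.ready else "main"
        self.log.append(("run", thread, processor_id))
        return thread

class Kernel:
    def __init__(self, runtime):
        self.runtime = runtime
        self.next_sa = 0

    def start_program(self):
        # Program start: one activation on one processor.
        return self.add_processor(processor_id=0)

    def add_processor(self, processor_id):
        # Each new processor gets a fresh activation plus an upcall.
        self.next_sa += 1
        return self.runtime.upcall_entry(self.next_sa, processor_id)

rt = UserRuntime()
k = Kernel(rt)
k.start_program()          # runs the "main" thread on processor 0
rt.ready.append("worker")
k.add_processor(1)         # new processor -> new activation -> runs "worker"
```

The point of the sketch is that the kernel never chooses which thread runs; it only hands processors (wrapped in activations) to the user-level runtime, which does the choosing.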


Notify the user level of an event…

The kernel creates a new scheduler activation, assigns a new processor to it, and upcalls the user space. As soon as the upcall happens, an event can be processed; the activation can then run a user-level thread, or trap and block into the kernel.

KEY DIFFERENCE between Scheduler Activations and Kernel Threads: if an activation's user-level thread is stopped by the kernel, the thread is never directly resumed by the kernel! INSTEAD, a new scheduler activation is made whose prime objective is to notify the user space that the thread has been suspended. Then the user-level thread system removes the state of the thread from the “old” activation, tells the kernel that the “old” activation can be reused, and decides which thread to run on the processor.

INVARIANT: number of activations = number of processors allocated to a job.
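The preemption protocol can be illustrated with a small simulation (assumed names, not the paper's code): the kernel stops a thread, makes a NEW activation to notify the user level, and the user level salvages the thread state, recycles the old activation, and picks what to run; the activation/processor invariant holds throughout.

```python
# Sketch of the "never resumed directly by the kernel" protocol (assumed names).
class Runtime:
    def __init__(self):
        self.ready = []          # ready list of user-level threads
        self.activations = set() # live activations; should equal #processors

    def upcall_preempted(self, new_sa, old_sa, saved_thread):
        # Kernel upcalls us on a NEW activation to report the preemption.
        self.activations.add(new_sa)
        self.ready.append(saved_thread)   # pull thread state out of the old activation
        self.activations.discard(old_sa)  # tell the kernel the old one is reusable
        return self.ready.pop(0)          # user level decides what runs next

rt = Runtime()
rt.activations = {"SA-A"}                 # one processor, one activation
nxt = rt.upcall_preempted("SA-B", "SA-A", saved_thread="T1")
assert nxt == "T1"                        # user level chose to resume T1 itself
assert len(rt.activations) == 1           # invariant: activations == processors
```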


Events that the Kernel Vectors to User Space as Activations

Add-this-processor(processor #)
/* Execute a runnable user-level thread */

Processor-has-been-preempted(preempted activation # and its machine state)
/* Return to the ready list the user-level thread that was executing in the context of the preempted scheduler activation */

Scheduler-activation-has-blocked(blocked activation #)
/* The blocked activation no longer uses its processor */

Scheduler-activation-has-unblocked(unblocked activation # and its machine state)
/* Return to the ready list the user-level thread that was executing in the context of the blocked scheduler activation */
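A user-level thread system's handlers for these four upcalls might look like the following sketch (handler names and the `machine_state` dictionary are assumptions for illustration):

```python
# Sketch of user-level handlers for the four kernel->user upcalls (assumed names).
class ThreadSystem:
    def __init__(self):
        self.ready = []  # ready list of runnable user-level threads

    def add_this_processor(self, processor):
        # New processor available: execute a runnable user-level thread on it.
        return self.ready.pop(0) if self.ready else None

    def processor_has_been_preempted(self, activation, machine_state):
        # Return the preempted activation's thread to the ready list.
        self.ready.append(machine_state["thread"])

    def scheduler_activation_has_blocked(self, activation):
        # The blocked activation no longer uses its processor; nothing to requeue yet.
        pass

    def scheduler_activation_has_unblocked(self, activation, machine_state):
        # The thread that was blocked is runnable again.
        self.ready.append(machine_state["thread"])

ts = ThreadSystem()
ts.scheduler_activation_has_unblocked("SA-1", {"thread": "T1"})
assert ts.add_this_processor("P2") == "T1"
```

Note that every handler only manipulates the ready list; actual dispatch decisions stay entirely at user level.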


I/O happens for Thread (1)…

[Figure, time T1: the kernel performs two AddProcessor upcalls to the user-level runtime system, which runs user threads from the user program on scheduler activations (A) and (B).]


A’s Thread has blocked on an I/O request

[Figure, time T2: the thread running on activation (A) blocks in the kernel on I/O; the kernel upcalls the user level on a fresh activation (C) with the notification “A’s thread has blocked”, while the thread on (B) keeps running.]


A’s Thread I/O completed

[Figure, time T3: the I/O completes; to deliver the notification the kernel needs a processor, so it creates a fresh activation (D) and upcalls the user level, reporting that thread (1)’s I/O has completed.]


A’s Thread resumes on Scheduler Activation D

[Figure, time T4: the user-level runtime picks thread (1) from the ready list and resumes it on scheduler activation (D).]


User-Level Events Notifying the Kernel

Add-more-processors(additional # of processors needed)
/* Allocate more processors to this address space and start them running scheduler activations */

This-processor-is-idle()
/* Preempt this processor if another address space needs it */

The kernel’s processor allocator can favor address spaces that use fewer processors (and penalize those that use more).
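One plausible user-level policy built on these two downcalls is sketched below (the balancing rule and all names are assumptions, not the paper's mechanism): request processors when the ready list outgrows the current allocation, and yield processors that have nothing to run.

```python
# Sketch of a user-level policy using the two user->kernel notifications (assumed names).
class Kernel:
    def add_more_processors(self, n):
        # Toy allocator: grant everything requested.
        return n

    def this_processor_is_idle(self):
        # A real kernel could preempt this processor for another address space.
        pass

class Runtime:
    def __init__(self, kernel):
        self.kernel = kernel
        self.ready = []
        self.processors = 1

    def balance(self):
        if len(self.ready) > self.processors:
            # Add-more-processors(additional # needed)
            granted = self.kernel.add_more_processors(len(self.ready) - self.processors)
            self.processors += granted
        elif not self.ready and self.processors > 1:
            # This-processor-is-idle()
            self.kernel.this_processor_is_idle()
            self.processors -= 1

rt = Runtime(Kernel())
rt.ready = ["T1", "T2", "T3"]
rt.balance()
```

An address space that yields idle processors promptly is exactly the kind the allocator described above would favor.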


Thread Operation Latencies (μsec.)


Speedup


Next Lecture: Concurrency

References