Univ. of Tehran Distributed Operating Systems
Advanced Operating Systems
University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Lecture 6: Scheduling
How to use resources efficiently: sharing the CPU and other resources of the system.

References:
- "Surplus Fair Scheduling: A Proportional-Share CPU Scheduling Algorithm for Symmetric Multiprocessors"
- "Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism"
- "Condor - A Hunter of Idle Workstations"
- "Virtual-Time Round-Robin: An O(1) Proportional Share Scheduler"
- "A SMART Scheduler for Multimedia Applications"
- Linux CPU scheduling
Outline
- Scheduling
- Scheduling policies
- Scheduling on multiprocessors
- Thread scheduling
What is Scheduling?
- OS policies and mechanisms to allocate resources to entities.
- An OS often has many pending tasks: threads, async callbacks, device input.
- The order may matter for policy, correctness, or efficiency.
- Providing sufficient control is not easy: mechanisms must allow policy to be expressed.
- A good scheduling policy ensures that the most important entity gets the resources it needs.
Why Scheduling?
- This topic was popular in the days of time sharing, when there was a shortage of resources.
- It seemed irrelevant in the era of PCs and workstations, when resources were plentiful.
- Now the topic is back from the dead to handle massive Internet servers with paying customers, where some customers are more important than others.
Resources to Schedule?
- Resources you might want to schedule: CPU time, physical memory, disk and network I/O, and I/O bus bandwidth.
- Entities that you might want to give resources to: users, processes, threads, web requests, or MIT accounts.
Key Problems
- Gap between desired policy and available mechanism. The desired policies often include elements that are not implementable. Furthermore, there are often many conflicting goals (low latency, high throughput, and fairness), and the scheduler must trade them off.
- Interaction between different schedulers. One has to take a systems view: just optimizing the CPU scheduler may do little for the overall desired policy.
Scheduling Policy Examples
- Allocate cycles in proportion to money.
- Maintain high throughput under high load.
- Never delay a high-priority thread by more than 1 ms.
- Maintain good interactive response.
- Can we enforce such policies with the thread scheduler?
General Plan
- Understand where scheduling is occurring.
- Expose scheduling decisions, allow control.
- Account for resource consumption, to allow intelligent control.
Parallel Computing
- Speedup: the final measure of success
- Parallelism vs concurrency: actual vs possible by the application
- Granularity: size of the concurrent tasks
- Reconfigurability: number of processors
- Communication cost
- Preemption vs non-preemption
- Co-scheduling: some things are better scheduled together
Best Place for Scheduling?
- The application is in the best position to know its own specific scheduling requirements: which threads run best simultaneously, which are on the critical path. But the kernel must make sure all play fairly.
- MACH scheduling: lets a process provide hints to discourage running; it is possible to hand off the processor to another thread. This makes it easier for the kernel to select the next thread and allows interleaving of concurrent threads.
- Low-level scheduling stays in the kernel, based on higher-level information from application space.
Example
- Give each process an equal share of CPU time: interrupt every 10 ms and select another process in round-robin fashion. Works if processes are compute-bound. What if a process gives up some of its 10 ms to wait for input?
- How long should the quantum be? Is 10 ms the right answer? A shorter quantum gives better interactive performance but lowers overall system throughput.
- What if the environment computes for 1 ms and sends an IPC to the file-server environment? Shouldn't the file server get more CPU time, since it operates on behalf of all other functions?
- Potential improvement: track "recent" CPU use (e.g., over the last second) and always run the environment with the least recent CPU use. (Still, if you sleep long enough you lose.) Another solution: directed yield; specify on the yield which environment receives the remainder of the quantum (e.g., the file server, so that it can compute on the environment's behalf).
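The basic 10 ms round-robin scheme can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: all jobs are compute-bound, there is no I/O blocking and no directed yield, and the job names and CPU times are hypothetical.

```python
from collections import deque

def round_robin(jobs, quantum=10):
    """Simulate round-robin; jobs maps name -> remaining CPU time (ms).
    Returns {name: completion time}, assuming compute-bound jobs."""
    queue = deque(jobs.items())
    clock, done = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)              # run one quantum (or less)
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # back of the line
        else:
            done[name] = clock                     # job finished
    return done

print(round_robin({"A": 30, "B": 10, "C": 20}))    # {'B': 20, 'C': 50, 'A': 60}
```

Note how the short job B still waits behind one quantum of A; this is the latency/fairness trade-off discussed above.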
Scheduling is a System Problem
- The thread/process scheduler can't enforce policies by itself.
- It needs cooperation from all resource schedulers and from the software structure.
- Conflicting goals may limit effectiveness.
Goals
- Low latency: people typing at editors want fast response; network services can be latency-bound, not CPU-bound.
- High throughput: minimize context switches to avoid wasting CPU on TLB misses, cache misses, even page faults.
- Fairness.
Scheduling Approaches
- FIFO: + fair, - high latency
- Round robin: + fair, + low latency, - poor throughput
- STCF/SRTCF (shortest time / remaining time to completion first): + low latency, + high throughput, - unfair (starvation)
Shortest Job First (SJF)
- Two types: non-preemptive and preemptive.
- Requirement: the elapsed (run) time must be known in advance.
- Optimal (provably) if all jobs are available simultaneously.
- Is SJF optimal if the jobs are not all available simultaneously?
Preemptive SJF
- Also called Shortest Remaining Time First.
- Schedule the job with the shortest remaining time required to complete.
- Requirement: the elapsed (run) time must be known in advance.
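Shortest Remaining Time First can be sketched as a unit-time simulation: at every tick, the scheduler re-checks which arrived job has the least remaining work. A minimal sketch with hypothetical arrival times and bursts:

```python
import heapq

def srtf(arrivals):
    """Preemptive SJF: each time unit, run the arrived job with the
    shortest remaining time.  arrivals: list of (arrival_time, name,
    burst).  Returns {name: completion_time}."""
    pending = sorted(arrivals)               # by arrival time
    ready, clock, done, i = [], 0, {}, 0
    while i < len(pending) or ready:
        while i < len(pending) and pending[i][0] <= clock:
            _, name, burst = pending[i]
            heapq.heappush(ready, (burst, name))   # keyed by remaining time
            i += 1
        if not ready:                        # CPU idle until next arrival
            clock = pending[i][0]
            continue
        remaining, name = heapq.heappop(ready)
        clock += 1                           # run one unit, then re-evaluate
        if remaining > 1:
            heapq.heappush(ready, (remaining - 1, name))
        else:
            done[name] = clock
    return done

print(srtf([(0, "A", 7), (2, "B", 4), (4, "C", 1)]))  # {'C': 5, 'B': 7, 'A': 12}
```

The long job A is preempted twice: B arrives with less remaining work, then C preempts B. This shows both the low average latency and the starvation risk for long jobs.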
Interactive Scheduling
- Usually preemptive: time is sliced into quanta (time intervals); decisions are made at the beginning of each quantum.
- Performance criteria: minimum response time, best proportionality.
- Representative algorithms: priority-based, round-robin, multi-queue and multi-level feedback, shortest process time, guaranteed scheduling, lottery scheduling, fair-share scheduling.
Priority Scheduling
- Each job is assigned a priority, with FCFS within each priority level.
- Select higher-priority jobs over lower-priority ones.
- Rationale: higher-priority jobs are more mission-critical. Example: DVD movie player vs. sending email.
- Problems: may not give the best average waiting time; indefinite blocking (starvation) of a process.
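"Priority with FCFS within each level" maps naturally onto a heap keyed by (priority, arrival order). A small sketch; the class name and the example jobs are made up for illustration:

```python
import heapq
from itertools import count

class PriorityScheduler:
    """Priority scheduling with FCFS within each priority level.
    Lower number = higher priority; arrival counter breaks ties FCFS."""
    def __init__(self):
        self.heap = []
        self.order = count()          # monotonically increasing arrival stamp

    def add(self, job, priority):
        heapq.heappush(self.heap, (priority, next(self.order), job))

    def pick(self):
        return heapq.heappop(self.heap)[2]

s = PriorityScheduler()
s.add("email", 3)
s.add("dvd_player", 1)                # more mission-critical
s.add("backup", 3)
print(s.pick(), s.pick(), s.pick())   # dvd_player, then email/backup in FCFS order
```

The starvation problem is visible here: as long as priority-1 jobs keep arriving, priority-3 jobs never get picked.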
Set Priority
- Two approaches:
  - Static (for systems with well-known and regular application behaviors)
  - Dynamic (otherwise)
- Priority may be based on: cost to user, importance of user, aging, percentage of CPU time used in the last X hours.
Pitfall: Priority Inversion
- Low-priority thread X holds a lock.
- High-priority thread Y waits for the lock.
- Medium-priority thread Z preempts X.
- Y is indefinitely delayed despite its high priority.
- The situation: a higher-priority process needs to read or modify kernel data currently being accessed by a lower-priority process. The higher-priority process must wait, but the lower-priority process cannot proceed quickly due to scheduling.
- Solution: priority inheritance. When a lower-priority process accesses the resource, it inherits the high priority until it is done with the resource in question; then its priority reverts to its natural value.
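The X/Y/Z scenario can be traced with a toy discrete model. This is a sketch, not real lock code: thread names, priorities, and work units are invented, and "running" means consuming one abstract work unit per step.

```python
def finish_order(with_inheritance):
    """Toy trace of priority inversion: X (low) holds a lock that Y (high)
    needs; Z (medium) is CPU-bound.  Bigger number = higher priority.
    Returns the order in which the three threads finish."""
    base = {"X": 1, "Z": 2, "Y": 3}
    prio = dict(base)
    lock_holder = "X"                   # X grabbed the lock before Y woke up
    work = {"X": 2, "Z": 2, "Y": 2}     # remaining work units per thread
    finished = []
    while work:
        if with_inheritance and lock_holder is not None:
            prio[lock_holder] = max(base[lock_holder], prio["Y"])  # inherit
        # Y cannot run while the lock is held; everyone else is runnable
        runnable = [t for t in work if not (t == "Y" and lock_holder)]
        t = max(runnable, key=lambda name: prio[name])
        work[t] -= 1                    # run the chosen thread for one unit
        if work[t] == 0:
            del work[t]
            finished.append(t)
            if t == lock_holder:
                lock_holder = None      # lock released; Y becomes runnable
    return finished

print(finish_order(False))   # Z preempts X, delaying Y: ['Z', 'X', 'Y']
print(finish_order(True))    # X is boosted past Z, Y runs sooner: ['X', 'Y', 'Z']
```

Without inheritance, the medium-priority Z runs to completion before the lock holder, so the high-priority Y finishes last; with inheritance, X is boosted, releases the lock quickly, and Y overtakes Z.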
Pitfall: Long Code Paths
- Large-granularity locks are convenient.
- Non-preemptable threads are an extreme case.
- Both may delay high-priority processing.
Pitfall: Efficiency
- Efficient disk use requires unfairness: shortest-seek-first vs FIFO; read-ahead vs data needed now.
- Efficient paging policy creates delays: the OS may swap out my idle Emacs to free memory. What happens when I type a key?
- The thread scheduler doesn't control these.
Pitfall: Multiple Schedulers
- Every resource with multiple waiting threads has a scheduler: locks, the disk driver, the memory allocator.
- The schedulers may not cooperate, or even be explicit.
Example: UNIX
- Goals: simple kernel concurrency model; limited preemption; quick response to device interrupts.
- Many kinds of execution environments.
- Some transitions are not possible; some transitions can't be controlled.
UNIX Environments
[Diagram: user level above the kernel; each process has a user half and a kernel half; timer and network soft interrupts; device and timer interrupts below them.]
UNIX: Process User Half
- Interruptable; preemptable via timer interrupt. We don't trust user processes.
- Enters the kernel half via system calls and faults: save user state on the stack, raise the privilege level, jump to a known point in the kernel.
- Each process has a stack and saved registers.
UNIX: Process Kernel Half
- Executes system calls for its user process; may involve many steps separated by sleep().
- Interruptable; may postpone interrupts in critical sections.
- Not preemptable, which simplifies concurrent programming: no context switch until a voluntary sleep(). No user process runs if a kernel half is runnable.
- Each kernel half has a stack and saved registers.
- Many processes may be sleep()ing in the kernel.
UNIX: Device Interrupts
- Device hardware asks the CPU for an interrupt, to signal new input or completion of output. Cheaper than polling, lower latency.
- Interrupts take priority over the user/kernel halves: save current state on the stack, mask other interrupts, run the interrupt handler function, return and restore state.
- The real-time clock is a device.
UNIX: Soft Interrupts
- Device interrupt handlers must be short; expensive processing is deferred to a soft interrupt.
- Can't do it in the kernel half: the process is not known. Examples: TCP protocol input processing; periodic process scheduling.
- Devices can interrupt a soft interrupt. A soft interrupt has priority over user and kernel processes, but is only entered on return from a device interrupt.
- Similar to an async callback; can't be a high-priority thread, since there is no preemption.
UNIX Environments
[Diagram: the same environments as before, annotated with the kinds of transitions between them: transfer with choice, transfer with limited choice, and transfer with no choice.]
Pitfall: Server Processes
- User-level servers schedule requests: X11, DNS, NFS.
- They usually don't know about the kernel's scheduling policy.
- Network packet scheduling also interferes.
Pitfall: Hardware Schedulers
- The memory system is scheduled among CPUs.
- The I/O bus is scheduled among devices.
- The interrupt controller chooses the next interrupt.
- Hardware doesn't know about OS policy; the OS often doesn't understand the hardware.
Time Quantum
- Time slice too large: FIFO behavior, poor response time.
- Time slice too small: too many context switches (overhead), inefficient CPU utilization.
- Heuristic: 70-80% of jobs should block within the time slice.
- Typical time slice: 10 to 100 ms.
- Time spent in the system depends on the size of the job.
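The quantum-size trade-off can be quantified with a back-of-the-envelope calculation. Assuming (hypothetically) a fixed context-switch cost of 10 microseconds and the worst case where every quantum ends in a switch:

```python
def cpu_efficiency(quantum_ms, switch_ms=0.01):
    """Fraction of CPU time spent on useful work when every quantum
    ends in a context switch costing switch_ms (assumed ~10 us)."""
    return quantum_ms / (quantum_ms + switch_ms)

for q in (0.1, 1, 10, 100):
    print(f"quantum {q:6} ms -> efficiency {cpu_efficiency(q):.4f}")
```

With a 0.1 ms quantum roughly 9% of the CPU is burned on switching; at 10-100 ms the overhead is negligible, which is consistent with the typical time-slice range above. (The real cost also includes the cache and TLB pollution mentioned earlier, which this simple ratio ignores.)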
Multi-Queue Scheduling
- Hybrid between priority and round-robin.
- Processes are assigned to one queue permanently.
- Scheduling between queues: fixed priorities, or a percentage of CPU time spent on each queue.
- Example queues: system processes, interactive programs, background processes, student processes.
- Addresses the starvation and indefinite-blocking problems.
Multi-Queue Scheduling: Example
[Diagram: three queues receiving fixed shares of the CPU: 50%, 30%, and 20%.]
Multi-Processor Scheduling: Load Sharing
- Decides: which process to run, how long it runs, and where (on which CPU) to run it.
[Cartoon: processes 1..n competing to ride the CPU "horsepower".]
Multi-Processor Scheduling Choices
- Self-scheduled: each CPU dispatches a job from the ready queue.
- Master-slave: one CPU schedules the other CPUs.
- Asymmetric: one CPU runs the kernel and the others run user applications; or one CPU handles the network and the others handle applications.
Gang Scheduling for Multi-Processors
- A gang is a collection of processes belonging to one job.
- All the processes run at the same time; if one process is preempted, all processes of the gang are preempted.
- Helps eliminate the time a process spends waiting for other processes in its parallel computation.
Scheduling Approaches
- Multilevel feedback queues:
  - A job starts in the highest-priority queue.
  - If its time slice expires, lower its priority by one level.
  - If its time slice does not expire, raise its priority by one level.
  - Age long-running jobs.
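The multilevel feedback rules above can be sketched as two small helpers. The function names and queue layout are invented for illustration; aging of long-running jobs is omitted for brevity.

```python
from collections import deque

def mlfq_pick(queues):
    """Scan from the highest-priority level (index 0) and take the
    first job FIFO.  Returns (level, job) or (None, None) if all empty."""
    for level, q in enumerate(queues):
        if q:
            return level, q.popleft()
    return None, None

def mlfq_requeue(queues, level, job, used_full_slice):
    """Demote a job that used its whole slice; promote one that blocked."""
    if used_full_slice:
        level = min(level + 1, len(queues) - 1)   # lower priority one level
    else:
        level = max(level - 1, 0)                 # raise priority one level
    queues[level].append(job)

queues = [deque(["A"]), deque(), deque()]
level, job = mlfq_pick(queues)                    # A starts at the top level
mlfq_requeue(queues, level, job, used_full_slice=True)
print(queues)                                     # A has drifted down a level
```

CPU-bound jobs sink toward the bottom (round-robin-like) levels, while interactive jobs that block early float back up, approximating SJF without knowing run times in advance.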
Lottery Scheduling
- Claim: priority-based schemes are ad hoc.
- Lottery scheduling: a randomized scheme, based on a currency abstraction.
- Idea: processes own lottery tickets; the CPU randomly draws a ticket and executes the corresponding process.
Properties of Lottery Scheduling
- Guarantees fairness through probability.
- Guarantees no starvation, as long as each process owns at least one ticket.
- To approximate SRTCF: short jobs get more tickets, long jobs get fewer.
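The core of lottery scheduling is a single random draw over the ticket space. A minimal sketch with made-up processes and ticket counts (compensation tickets and currencies are omitted):

```python
import random

def lottery_pick(tickets, rng=random):
    """Draw a winning ticket; tickets maps process -> ticket count."""
    total = sum(tickets.values())
    draw = rng.randrange(total)        # uniform in [0, total)
    for proc, n in tickets.items():
        if draw < n:                   # winning ticket lies in proc's range
            return proc
        draw -= n

# Over many draws, each process wins roughly in proportion to its tickets:
random.seed(0)
tickets = {"A": 75, "B": 25}
counts = {"A": 0, "B": 0}
for _ in range(10000):
    counts[lottery_pick(tickets)] += 1
print(counts)                          # A wins about 75% of the draws
```

Every ticket holder has a nonzero chance on every draw, which is exactly why lottery scheduling avoids starvation.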
Partially Consumed Tickets
- What if a process is chosen but does not consume its entire time slice? The process receives compensation tickets.
- Idea: get chosen more frequently, but with a shorter time slice.
Ticket Currencies
- Load insulation: a process can dynamically change its ticketing policies without affecting other processes.
- Currencies must be converted before transferring tickets.
[Diagram: a currency hierarchy. The base currency (3000 tickets) funds users Alice (200) and Bob (100); Alice's currency funds process1 (500), whose tickets back thread1 and thread2; Bob's funds process2 (100), backing thread3.]
Condor
- Identifies idle workstations and schedules background jobs on them.
- Guarantees that a job will eventually complete.
- Analysis of workstation usage patterns: only about 30% of capacity is used.
- Remote capacity allocation algorithms: the Up-Down algorithm allows fair access to remote capacity.
- Remote execution facilities: Remote Unix (RU).
Condor Issues
- Leverage (performance measure): the ratio of the capacity consumed by a job remotely to the capacity consumed on the home station to support remote execution.
- Checkpointing: save the state of a job so that its execution can be resumed.
- Transparent placement of background jobs; automatically restart a background job if it fails.
- Users expect to receive fair access; overhead must be small.
Condor: Scheduling
- A hybrid of centralized static and distributed approaches.
- Each workstation keeps its own state information and schedule.
- A central coordinator assigns capacity to workstations; workstations use that capacity to schedule.
Real-Time Systems
- The issues are scheduling and interrupts: tasks must complete by a particular deadline.
- Examples: accepting input from real-time sensors, process control applications, responding to environmental events.
- How does one support real-time systems? If deadlines are short, often use a dedicated system; give real-time tasks absolute priority; do not support virtual memory; use early binding.
Real-Time Scheduling
- To initiate a job, one must specify a deadline and an estimate/upper bound on the resources needed.
- The system accepts or rejects the job. If accepted, it agrees that it can meet the deadline: it places the job in a calendar, blocking out the resources it will need and planning when the resources will be allocated.
- Some systems support priorities, but this can violate the real-time guarantees of already accepted jobs.
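The accept/reject step can be illustrated with a simpler test than the calendar scheme described above: the classic utilization bound for earliest-deadline-first on one CPU (accept only if total utilization stays at or below 1). This is a stand-in for the slide's calendar mechanism, with hypothetical task parameters.

```python
def admit(tasks, new_task):
    """Utilization-based admission test for periodic tasks under EDF on
    one CPU: accept only if sum of cpu_time/period stays <= 1.
    Each task is a (cpu_time, period) pair in the same time unit."""
    util = sum(c / p for c, p in tasks + [new_task])
    return util <= 1.0

tasks = [(10, 50), (20, 100)]      # utilization 0.2 + 0.2 = 0.4
print(admit(tasks, (30, 60)))      # +0.5  -> 0.9  <= 1: accepted
print(admit(tasks, (40, 60)))      # +0.67 -> 1.07 >  1: rejected
```

Rejecting up front is what lets the system keep its promise to the jobs it has already accepted.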
User-Level Thread Scheduling
[Diagram: a possible scheduling with a 50-ms process quantum; threads run 5 ms per CPU burst.]
Kernel-Level Thread Scheduling
[Diagram: a possible scheduling with a 50-ms process quantum; threads run 5 ms per CPU burst.]
Thread Scheduling Examples
- Solaris 2: priority-based process scheduling with four scheduling classes: real-time, system, time sharing, interactive. A set of priorities within each class. The scheduler converts the class-specific priorities into global priorities and selects the thread with the highest global priority to run. The thread runs until (1) it blocks, (2) it uses up its time slice, or (3) it is preempted by a higher-priority thread.
- JVM: schedules threads using a preemptive, priority-based scheduling algorithm. It schedules the "runnable" thread with the highest priority; if two threads have the same priority, the JVM applies FIFO. A new thread is scheduled when (1) the current thread exits the "runnable" state due to a block(), exit(), suspend(), or stop() method, or (2) a thread with higher priority enters the "runnable" state.
Surplus Fair Scheduling: Motivation
- Diverse web and multimedia applications are popular: HTTP, streaming, e-commerce, games, etc.
- Applications are hosted on large servers (typically multiprocessors).
- Key challenge: design OS mechanisms for resource management.
[Diagram: end-stations connect over the network to a server hosting streaming, e-commerce, and web applications.]
Requirements for OS Resource Management
- Fair, proportionate allocation. E.g.: 20% for HTTP, 30% for streaming, etc.
- Application isolation: misbehaving/overloaded applications should not affect other applications.
- Efficiency: OS mechanisms should have low overhead.
- Focus: achieving these objectives for CPU scheduling on multiprocessor machines.
Proportional-Share Scheduling
- Associate a weight with each application and allocate CPU bandwidth in proportion to weight.
- Existing algorithms. Ideal algorithm: Generalized Processor Sharing. Practical examples: WFQ, SFQ, SMART, BVT, etc.
- Question: are the existing algorithms adequate for multiprocessor systems?
[Diagram: two applications with weights 2 and 1 receive 2/3 and 1/3 of the CPU bandwidth.]
Starvation Problem
- SFQ: each thread has a start tag (service / weight); SFQ schedules the thread with the minimum start tag.
[Timeline diagram: threads A (wt=100) and B (wt=1) on 2 CPUs. A's start tag advances slowly (0, 1, ..., 10, 11) while B's advances fast (0, 100, 1000). When C (wt=1) arrives around t=1000 with a much smaller start tag, A and C occupy both CPUs and B starves.]
Weight Readjustment
- Reason for starvation: an infeasible weight assignment (e.g., 1:100 for 2 CPUs); accounting differs from actual allocation.
- Observation: a thread can't consume more than one CPU's bandwidth, so a thread can be assigned at most 1/p of the total CPU bandwidth.
- Feasibility constraint:

    w_i / (sum_j w_j) <= 1/p
Weight Readjustment (contd.)
- Efficient: the algorithm is O(p).
- Can be combined with existing algorithms.
[Diagram: threads in decreasing order of weight mapped onto CPUs 1..p.]
- Goal: convert the given weights into feasible weights.
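One way to enforce the feasibility constraint is to scan the largest weights in decreasing order and cap any weight that would claim more than one CPU's worth of the remaining bandwidth. This is a sketch of the idea, not the paper's exact code; the repeated `sum()` makes it O(p*n), whereas precomputed suffix sums would recover O(p) after sorting.

```python
def readjust(weights, p):
    """Cap infeasible weights so no thread is entitled to more than one
    CPU, i.e. enforce w_i / sum(w) <= 1/p for p CPUs."""
    ws = sorted(weights, reverse=True)
    for i in range(min(p, len(ws))):
        rest = sum(ws[i + 1:])           # total weight ranked below ws[i]
        cpus = p - i                     # CPUs left for ws[i:] to share
        # infeasible iff ws[i] / (ws[i] + rest) > 1 / cpus,
        # i.e. iff ws[i] * (cpus - 1) > rest
        if cpus > 1 and ws[i] * (cpus - 1) > rest:
            ws[i] = rest / (cpus - 1)    # cap: exactly one CPU's share
    return ws

print(readjust([100, 1, 1], p=2))        # [2.0, 1, 1]: now 2/4 = 1/2 of total
```

With the starvation example (weights 100:1:1 on 2 CPUs), thread A's weight is capped to 2, so its share becomes exactly 1/2 and B and C split the other CPU.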
Effect of Readjustment
[Graphs: number of iterations (x10^5) vs time (s) for threads A (wt=10), B (wt=1), C (wt=1), under SFQ without and with weight readjustment.]
- Weight readjustment gets rid of the starvation problem.
Short Jobs Problem
[Graphs: number of iterations (x10^5) vs time (s) for J1 (wt=20), J2-J21 (wt=1 each), and J_short (wt=5), under SFQ and under the ideal allocation.]
- Frequent arrivals and departures of short jobs.
- SFQ does unfair allocation!
Surplus Fair Scheduling
- Surplus = Service_actual - Service_ideal
[Graph: service received by thread i vs time; the surplus at time t is the gap between the actual and ideal service curves.]
- The scheduler picks the threads with the least surplus values: lagging threads get closer to their due, and threads that are ahead are restrained.
Surplus Fair Scheduling (contd.)
- Start tag (S_i): weighted service of thread i, S_i = Service_i / w_i.
- Virtual time (v): minimum start tag over all runnable threads.
- Surplus (alpha_i): alpha_i = Service_i - Service_lagging = w_i * S_i - w_i * v.
- The scheduler selects threads in increasing order of surplus.
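The selection rule can be written down directly from the definitions above. A minimal sketch with hypothetical thread names, weights, and accumulated service:

```python
def pick_next(threads, virtual_time):
    """Surplus fair scheduling: choose the runnable thread with the least
    surplus  alpha_i = w_i * S_i - w_i * v,  where the start tag is
    S_i = service_i / w_i and v is the virtual time (min start tag).
    threads: list of (name, weight, service)."""
    def surplus(t):
        _, w, service = t
        start_tag = service / w
        return w * start_tag - w * virtual_time   # = service_i - w_i * v
    return min(threads, key=surplus)[0]

threads = [("A", 2, 10.0), ("B", 1, 3.0)]
v = min(s / w for _, w, s in threads)             # v = min(5.0, 3.0) = 3.0
print(pick_next(threads, v))                      # B: it has the least surplus
```

Here A has surplus 10 - 2*3 = 4 while B has 3 - 1*3 = 0, so the lagging thread B is chosen; unlike a raw start-tag comparison, surplus weighs how far each thread is ahead of its due in absolute service.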
Surplus Fair Scheduling with Short Jobs
[Graphs: number of iterations (x10^5) vs time (s) for J1 (wt=20), J2-J21 (wt=1 each), and J_short (wt=5), under surplus fair scheduling and under the ideal allocation.]
- Surplus fair scheduling does proportionate allocation.
Proportionate Allocation
[Graph: normalized processor shares received by two web servers for weight assignments 1:1, 1:2, 1:4, and 1:7.]
Application Isolation
[Graph: MPEG decoder frame rate (frames/sec) vs number of background compilations, under surplus fair scheduling and under time-sharing.]
Scheduling Overhead
[Graph: context-switch time (microseconds) vs number of processes, under surplus fair scheduling and under time-sharing.]
- Context-switch time (~10 microseconds) vs quantum size (~100 ms).
Summary
- Existing proportional-share algorithms are inadequate for multiprocessors.
- The readjustment algorithm can reduce unfairness.
- Surplus fair scheduling is practical for multiprocessors: it achieves proportional fairness and isolation, and has low overhead.
- Heuristics exist for incorporating processor affinity.
- Source code available at: http://lass.cs.umass.edu/software/gms
Scheduler Activations
- In a multiprocessor system, threads could be managed in:
  - User space only. Key feature: cooperative.
  - Kernel space only. Key feature: preemptive.
  - User space on top of kernel space:
      some user-level threads
      -------------------------
      some kernel-level threads
      -------------------------
      some CPUs
Scheduler Activations
- User-level scheduling of threads: the application maintains the scheduling queue.
- The kernel allocates processors to tasks and makes an upcall to scheduling code in the application when a thread blocks for I/O or is preempted.
- Only the user level is involved if a thread blocks on a critical section.
- The user level will block on kernel calls; the kernel returns control to the application scheduler.
User-Level Thread Management
- Sample measurements obtained on the Firefly running Topaz (in microseconds).
- Procedure call: 7 microseconds. Kernel trap: 19 microseconds.

  Operation     FastThreads   Topaz Threads   Ultrix Processes
  Null Fork          34            948            11300
  Signal-Wait        37            441             1840
User-Level on Top of Kernel Threads
- Three layers: some user-level threads (how many?) on some kernel-level threads (how many?) on some CPUs (how many?).
- Problems caused by:
  - Kernel threads are scheduled obliviously with respect to the user-level thread state.
  - Kernel threads block, resume, and are preempted without notification to the user level.
The Way Out: Scheduler Activations
- Processor allocation is done by the kernel.
- Thread scheduling is done by each address space.
- The kernel notifies the address-space thread scheduler of every event affecting the address space.
- The address space notifies the kernel of the subset of user-level events that can affect processor allocation decisions.
Scheduler Activations (cont.)
- Goal: design a kernel interface and a user-level thread package that combine the functionality of kernel threads with the performance and flexibility of user-level threads.
- Secondary goal: if thread operations do not involve kernel intervention, the achieved performance should be similar to that of user-level threads.
Scheduler Activations (cont.)
- The difficulty in achieving all of the above: in a multiprogrammed multiprocessor system, the required control and scheduling information is distributed between the kernel and user space.
- To manage the application's parallelism successfully, the user-level support routines (software) must be aware of kernel events (processor reallocations, I/O requests and completions, etc.) - all of which is normally hidden from the application.
Scheduler Activations (cont.)
1. Provide each application with a VIRTUAL MULTIPROCESSOR: each application knows how many processors it has been allocated and has total control over those processors and its own scheduling. The OS controls the allocation of processors among address spaces and can change the number of processors assigned to an application during its execution.
2. To achieve the above, the kernel NOTIFIES the address-space thread scheduler of kernel events affecting it, so the application has complete knowledge of its scheduling state.
3. The user-space thread system NOTIFIES the kernel of operations that may affect the allocation of processors (helping achieve good performance).
- The kernel mechanism that achieves the above: SCHEDULER ACTIVATIONS.
Scheduler Activation Data Structures
- Each scheduler activation maintains two execution stacks: one mapped into the kernel, another mapped into the application address space.
- Each user-level thread is provided with its own stack at creation time.
- When a user-level thread calls into the kernel, it uses its activation's kernel stack; the user-level thread scheduler runs on the activation's user-level stack.
- The kernel maintains an activation control block, recording the state of the scheduler activation's thread when it blocks in the kernel or is preempted, and keeps track of which thread is running on every scheduler activation.
When a New Program is Started...
- The kernel creates a scheduler activation, assigns a processor to it, and does an upcall to the user-space application (at a fixed entry point).
- The user-level system receives the upcall and uses the scheduler activation as the context in which to initialize itself and start running the first (main) thread.
- The first thread may ask for the creation of more threads and additional processors. For each processor, the kernel creates a new activation and upcalls the user level to say that the new processor is there; the user level picks a thread and executes it on the new processor.
Notifying the User Level of an Event...
- The kernel creates a new scheduler activation, assigns a new processor to it, and upcalls user space. As soon as the upcall happens, an event can be processed, a user-level thread can run, or execution can trap and block into the kernel.
- KEY DIFFERENCE between scheduler activations and kernel threads: if an activation's user-level thread is stopped by the kernel, the thread is never directly resumed by the kernel. INSTEAD, a new scheduler activation is made whose prime objective is to notify user space that the thread has been suspended.
- Then the user-level thread system removes the state of the thread from the "old" activation, tells the kernel that the "old" activation can be reused, and decides which thread to run on the processor.
- INVARIANT: number of activations = number of processors allocated to a job.
Events that the Kernel Vectors to User Space as Activations
- Add-this-processor(processor #)
  /* Execute a runnable user-level thread */
- Processor-has-been-preempted(preempted activation # and its machine state)
  /* Return to the ready list the user-level thread that was executing in the context of the preempted scheduler activation */
- Scheduler-activation-has-blocked(blocked activation #)
  /* The blocked activation no longer uses its processor */
- Scheduler-activation-has-been-unblocked(unblocked activation # and its machine state)
  /* Return to the ready list the user-level thread that was executing in the context of the blocked scheduler activation */
I/O Happens for Thread (1)...
[Diagram T1: the user-level runtime system runs threads (1)-(4) on processors handed to the user program by the kernel via Add-Processor upcalls on activations (A) and (B).]
A's Thread Has Blocked on an I/O Request
[Diagram T2: the thread running on activation (A) blocks on I/O; the kernel creates activation (C) to notify the user-level runtime system, which schedules another thread on the freed processor.]
A's Thread I/O Completed
[Diagram T3: the I/O completes; the kernel notifies the user level with a new activation (D), and thread (1) becomes runnable again.]
T3
Univ. of Tehran Distributed Operating Systems
83
A’s Thread resumes on Scheduler Activation D
(4)(3)(2)
User Program
User-LevelRuntime System
Processors
( C )Operating System Kernel
(1)
(D)
(1)
T4
User-Level Events Notifying the Kernel
- Add-more-processors(additional # of processors needed)
  /* Allocate more processors to this address space and start them running scheduler activations */
- This-processor-is-idle()
  /* Preempt this processor if another address space needs it */
- The kernel's processor allocator can favor address spaces that use fewer processors (and penalize those that use more).
Thread Operation Latencies (μsec.)
[Table not reproduced in these notes.]
Speedup
[Graph not reproduced in these notes.]
Next Lecture: Concurrency
References