Windows OS Internals - Copyright © 2005 David A. Solomon, Mark E. Russinovich, and Andreas Polze

Unit OS4: Scheduling and Dispatch

4.4. Windows Thread Scheduling

Copyright Notice

© 2000-2005 David A. Solomon and Mark Russinovich

These materials are part of the Windows Operating System Internals Curriculum Development Kit, developed by David A. Solomon and Mark E. Russinovich with Andreas Polze

Microsoft has licensed these materials from David Solomon Expert Seminars, Inc. for distribution to academic organizations solely for use in academic environments (and not for commercial use)

Roadmap for Section 4.4.

Windows Scheduling Principles

Windows API vs. NT Kernel Priorities

Scheduling Data Structures

Scheduling Scenarios

Priority Boosts and Priority Adjustments

Scheduling Criteria

CPU utilization: keep the CPU as busy as possible

Throughput: number of processes/threads that complete their execution per time unit

Turnaround time: amount of time to execute a particular process/thread

Waiting time: amount of time a process/thread has been waiting in the ready queue

Response time: amount of time it takes from when a request was submitted until the first response is produced, not output (i.e., the hourglass)
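The time-based criteria above can be written as simple differences between timestamps. The sketch below is illustrative only: the function name `scheduling_metrics` and its parameters are made up for this example, and it assumes the thread never blocks on I/O, so all time not spent on the CPU is spent in the ready queue.

```python
def scheduling_metrics(arrival, first_run, completion, cpu_time):
    """Return (turnaround, waiting, response) for one thread.

    All arguments are times in the same unit (for example, milliseconds).
    Assumption: the thread never blocks, so waiting = turnaround - cpu_time.
    """
    turnaround = completion - arrival   # total time to execute the thread
    waiting = turnaround - cpu_time     # time spent in the ready queue
    response = first_run - arrival      # time until the first response
    return turnaround, waiting, response


# A thread arrives at t=0, first runs at t=3, needs 5 units of CPU,
# and completes at t=12.
print(scheduling_metrics(arrival=0, first_run=3, completion=12, cpu_time=5))
# (12, 7, 3)
```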

How does the Windows scheduler relate to the issues discussed:

Priority-driven, preemptive scheduling system

Highest-priority runnable thread always runs

Thread runs for an amount of time called a quantum

No single scheduler: event-based scheduling code is spread across the kernel

Dispatcher routines are triggered by the following events:

Thread becomes ready for execution

Thread leaves running state (quantum expires, wait state)

Thread's priority changes (system call/NT activity)

Processor affinity of a running thread changes

Windows Scheduling Principles

32 priority levels

Threads within the same priority are scheduled following the round-robin policy

Non-realtime priorities are adjusted dynamically

Priority elevation as response to certain I/O and dispatch events

Quantum stretching to optimize responsiveness

Realtime priorities (i.e., > 15) are assigned statically to threads

Scheduling

Windows Scheduling-related APIs:

Get/SetPriorityClass

Get/SetThreadPriority

Get/SetProcessAffinityMask

SetThreadAffinityMask

SetThreadIdealProcessor

Suspend/ResumeThread

Multiple threads may be ready to run

“Who gets to use the CPU?”

From Windows API point of view:

Processes are given a priority class upon creation

Idle, Normal, High, Realtime

Windows 2000 added “Above normal” and “Below normal”

Threads have a relative priority within the class

Idle, Lowest, Below_Normal, Normal, Above_Normal, Highest, and Time_Critical

From the kernel’s view:

Threads have priorities 0 through 31

Threads are scheduled, not processes

Process priority class is not used to make scheduling decisions

Kernel: Thread Priority Levels

[Figure: kernel thread priority levels. Levels 16-31: 16 “real-time” levels. Levels 1-15: 15 variable levels. Level 0: used by zero page thread. Level i: used by idle thread(s).]

Windows vs. NT Kernel Priorities

Columns: Win32 process classes. Rows: Win32 thread priorities.

                 Realtime   High   AboveNormal   Normal   BelowNormal   Idle
Time-critical       31       15        15          15         15         15
Highest             26       15        12          10          8          6
Above-normal        25       14        11           9          7          5
Normal              24       13        10           8          6          4
Below-normal        23       12         9           7          5          3
Lowest              22       11         8           6          4          2
Idle                16        1         1           1          1          1

Table shows base priorities (“current” or “dynamic” thread priority may be higher if base is < 15)

Many utilities (such as Process Viewer) show the “dynamic priority” of threads rather than the base (Performance Monitor can show both)

Drivers can set a thread's priority to any value with KeSetPriorityThread
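The mapping in the table above can be expressed as a small lookup. This is only a sketch of the table itself, not of any Windows API: the names `CLASSES`, `BASE_PRIORITY`, and `base_priority` are made up for illustration, and the values are copied from the table.

```python
# Process priority classes, in the column order used by the table.
CLASSES = ["Realtime", "High", "AboveNormal", "Normal", "BelowNormal", "Idle"]

# Rows: relative thread priority. Values: kernel base priority per class.
BASE_PRIORITY = {
    "Time-critical": [31, 15, 15, 15, 15, 15],
    "Highest":       [26, 15, 12, 10, 8, 6],
    "Above-normal":  [25, 14, 11, 9, 7, 5],
    "Normal":        [24, 13, 10, 8, 6, 4],
    "Below-normal":  [23, 12, 9, 7, 5, 3],
    "Lowest":        [22, 11, 8, 6, 4, 2],
    "Idle":          [16, 1, 1, 1, 1, 1],
}


def base_priority(process_class, thread_priority):
    """Return the kernel base priority for a class / relative-priority pair."""
    column = CLASSES.index(process_class)
    return BASE_PRIORITY[thread_priority][column]


print(base_priority("Normal", "Normal"))         # 8
print(base_priority("Realtime", "Idle"))         # 16
print(base_priority("High", "Time-critical"))    # 15
```

Note that only the Realtime class reaches levels 16 and above; every other class tops out at 15.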

Special Thread Priorities

Idle threads -- one per CPU

When no threads want to run, Idle thread “runs”

Not a real priority level - appears to have priority zero, but actually runs “below” priority 0

Provides CPU idle time accounting (unused clock ticks are charged to the idle thread)

Loop:

Calls HAL to allow for power management

Processes DPC list

Dispatches to a thread if selected

Server 2003: in certain cases, scans per-CPU ready queues for next thread

Zero page thread -- one per NT system

Zeroes pages of memory in anticipation of “demand zero” page faults

Runs at priority zero (lower than any reachable from Windows)

Part of the “System” process (not a complete process)

Thread Scheduling Priorities vs. Interrupt Request Levels (IRQLs)

[Figure: IRQLs (x86), highest to lowest. Hardware interrupts: High (31), Power fail (30), Interprocessor Interrupt (29), Clock (28), Device n ... Device 1. Software interrupts: Dispatch/DPC (2), APC (1), Passive_Level (0). Thread priorities 0-31 are drawn at the bottom of the IRQL scale.]

Single Processor Thread Scheduling

Priority driven, preemptive

32 queues (FIFO lists) of “ready” threads

UP: highest priority thread always runs

MP: one of the highest-priority runnable threads will be running somewhere

No attempt to share processor(s) “fairly” among processes, only among threads

Time-sliced, round-robin within a priority level

Event-driven; no guaranteed execution period before preemption

When a thread becomes Ready, it either runs immediately or is inserted at the tail of the Ready queue for its current (dynamic) priority

Thread Scheduling

No central scheduler!

i.e. there is no always-instantiated routine called “the scheduler”

The “code that does scheduling” is not a thread

Scheduling routines are simply called whenever events occur that change the Ready state of a thread

Things that cause scheduling events include:

interval timer interrupts (for quantum end)

interval timer interrupts (for timed wait completion)

other hardware interrupts (for I/O wait completion)

one thread changes the state of a waitable object upon which other thread(s) are waiting

a thread waits on one or more dispatcher objects

a thread priority is changed

Based on doubly-linked lists (queues) of Ready threads

Nothing that takes “order-n time” for n threads

Scheduling Data Structures

[Figure: scheduling data structures. Each process holds a default base priority, default processor affinity, and default quantum. Each thread holds a base priority, current priority, processor affinity, and quantum. Ready queues are indexed 31 down to 0. Ready summary: bitmask for non-empty ready queues. Idle summary: bitmask for idle CPUs. Both bitmasks are drawn with bits 31 (or 63) down to 0.]
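A bitmask of non-empty ready queues lets the dispatcher find the highest-priority ready thread without walking all 32 queues. The following is a minimal sketch of that idea, not the kernel's actual code: the class `ReadyQueues` and its method names are hypothetical, and Python's `int.bit_length()` stands in for a hardware "find highest set bit" instruction.

```python
from collections import deque

NUM_PRIORITIES = 32


class ReadyQueues:
    """32 FIFO ready queues plus a summary bitmask (illustrative model)."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.ready_summary = 0  # bit p is set when queues[p] is non-empty

    def insert_tail(self, thread, priority):
        """Queue a thread at the tail of the ready queue for `priority`."""
        self.queues[priority].append(thread)
        self.ready_summary |= 1 << priority

    def highest_ready_priority(self):
        """Return the highest priority with a ready thread, or None."""
        if self.ready_summary == 0:
            return None
        return self.ready_summary.bit_length() - 1

    def pop_highest(self):
        """Remove and return the thread at the head of the highest queue."""
        priority = self.highest_ready_priority()
        if priority is None:
            return None
        thread = self.queues[priority].popleft()
        if not self.queues[priority]:
            self.ready_summary &= ~(1 << priority)  # queue is now empty
        return thread


rq = ReadyQueues()
rq.insert_tail("a", 8)
rq.insert_tail("b", 13)
rq.insert_tail("c", 8)
print(rq.highest_ready_priority())  # 13
print(rq.pop_highest())             # b
print(rq.pop_highest())             # a
```

The cost of picking the next thread depends on the number of priority levels, not on the number of threads.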

Scheduling Scenarios

Preemption

A thread becomes Ready at a higher priority than the running thread

Lower-priority Running thread is preempted

Preempted thread goes back to head of its Ready queue

action: pick lowest priority thread to preempt

Voluntary switch

Waiting on a dispatcher object

Termination

Explicit lowering of priority

action: scan for next Ready thread (starting at your priority & down)

Running thread experiences quantum end

Priority is decremented unless already at thread base priority

Thread goes to tail of ready queue for its new priority

May continue running if no equal or higher-priority threads are Ready

action: pick next thread at same priority level

Scheduling Scenarios

Preemption

Preemption is strictly event-driven

does not wait for the next clock tick

no guaranteed execution period before preemption

threads in kernel mode may be preempted (unless they raise IRQL to >= 2)

A preempted thread goes back to the head of its ready queue

[Figure: ready queues for priorities 18 down to 13, with threads marked Running and Ready and an arrow labeled “from Wait state”]
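The preemption rule can be sketched as a single decision made when a thread becomes ready. This is a simplified single-processor model under stated assumptions: threads are plain `(name, priority)` tuples, `ready` is a dictionary of per-priority queues, and the function name `make_ready` is hypothetical. Quantum handling and boosts are left out.

```python
from collections import deque


def make_ready(ready, running, new_thread):
    """Decide who runs when `new_thread` becomes ready.

    `ready` maps priority -> deque of (name, priority) tuples.
    Returns the thread that should be running afterwards.
    """
    _, priority = new_thread
    if running is None:
        return new_thread  # CPU was idle: run immediately
    if priority > running[1]:
        # Preemption: the running thread goes back to the HEAD of its queue.
        ready.setdefault(running[1], deque()).appendleft(running)
        return new_thread
    # Not higher priority: the newcomer goes to the TAIL of its queue.
    ready.setdefault(priority, deque()).append(new_thread)
    return running


ready = {16: deque([("B", 16)])}
running = ("A", 16)

running = make_ready(ready, running, ("C", 18))
print(running)            # ('C', 18)
print(list(ready[16]))    # [('A', 16), ('B', 16)]
```

Putting the preempted thread at the head of its queue means it resumes before its peers once the higher-priority thread leaves the CPU.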

Scheduling Scenarios

Ready after Wait Resolution

If newly-ready thread is not of higher priority than the running thread…

…it is put at the tail of the ready queue for its current priority

If priority >= 14, quantum is reset (t.b.d.)

If priority < 14 and you’re about to be boosted and didn’t already have a boost, quantum is set to process quantum - 1

[Figure: ready queues for priorities 18 down to 13, with threads marked Running and Ready and an arrow labeled “from Wait state”]

Scheduling Scenarios

Voluntary Switch

When the running thread gives up the CPU…

…schedule the thread at the head of the next non-empty “ready” queue

[Figure: ready queues for priorities 18 down to 13, with threads marked Running and Ready and an arrow labeled “to Waiting state”]

Scheduling Scenarios

Quantum End (“time-slicing”)

When the running thread exhausts its CPU quantum, it goes to the end of its ready queue

Applies to both real-time and dynamic priority threads, user and kernel mode

Quantums can be disabled for a thread by a kernel function

Default quantum on Professional is 2 clock ticks, 12 on Server

standard clock tick is 10 msec; might be 15 msec on some MP Pentium systems

if no other ready threads at that priority, same thread continues running (just gets new quantum)

if running at boosted priority, priority decays by one at quantum end (described later)

[Figure: ready queues for priorities 18 down to 13, with threads marked Running and Ready]
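With the figures above, a default quantum works out to 2 x 10 = 20 msec on Professional and 12 x 10 = 120 msec on Server (30 and 180 msec with a 15 msec tick). The sketch below restates that arithmetic and the round-robin step at quantum end. It is illustrative only: `quantum_ms` and `quantum_end` are hypothetical helper names, and the model covers a single priority level with no priority decay.

```python
from collections import deque


def quantum_ms(ticks, tick_ms=10):
    """Length of a quantum in milliseconds for a given clock tick."""
    return ticks * tick_ms


def quantum_end(queue, running):
    """Round-robin step when `running` exhausts its quantum.

    `queue` holds the other ready threads at the same priority.
    Returns the thread that runs next.
    """
    if not queue:
        return running        # no peers ready: same thread gets a new quantum
    queue.append(running)     # expired thread goes to the tail
    return queue.popleft()    # head of the queue runs next


print(quantum_ms(2))    # 20
print(quantum_ms(12))   # 120

peers = deque(["B", "C"])
print(quantum_end(peers, "A"))  # B
print(list(peers))              # ['C', 'A']
```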

Basic Thread Scheduling States

[Figure: state diagram with states Ready (1), Running (2), and Waiting (5). Transition labels: “voluntary switch” (Running to Waiting) and “preemption, quantum end” (Running to Ready).]

Priority Adjustments

Dynamic priority adjustments (boost and decay) are applied to threads in “dynamic” classes

Threads with base priorities 1-15 (technically, 1 through 14)

Disable if desired with SetThreadPriorityBoost or SetProcessPriorityBoost

Five types:

I/O completion

Wait completion on events or semaphores

When threads in the foreground process complete a wait

When GUI threads wake up for windows input

For CPU starvation avoidance

No automatic adjustments in “real-time” class (16 or above)

“Real time” here really means “system won’t change the relative priorities of your real-time threads”

Hence, scheduling is predictable with respect to other “real-time” threads (but not for absolute latency)

Priority Boosting

To favor I/O-intensive threads:

After an I/O: specified by device driver

IoCompleteRequest( Irp, PriorityBoost )

Other cases discussed in the Windows Scheduling Internals Section

After a wait on executive event or semaphore

After any wait on a dispatcher object by a thread in the foreground process

GUI threads that wake up to process windowing input (e.g. windows messages) get a boost of 2

Common boost values (see NTDDK.H)

1: disk, CD-ROM, parallel, Video

2: serial, network, named pipe, mailslot

6: keyboard or mouse

8: sound

Thread Priority Boost and Decay

Behavior of these boosts:

Applied to thread’s base priority

will not take you above priority 15

After a boost, you get one quantum

Then decays 1 level, runs another quantum

[Figure: thread priority (vertical axis) over time (horizontal axis). The thread runs at base priority, waits, gets a boost upon wait completion, runs, is preempted before quantum end, runs again, decays in priority at each quantum end, and finally round-robins at base priority. One quantum is marked on the time axis.]
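The boost-and-decay behavior can be traced with a few lines of arithmetic. This is a sketch under the rules stated on these slides (boost added to the base priority, capped at 15, one level of decay per quantum, no adjustment at 16 or above); the function names are hypothetical and quantum-length adjustments are ignored.

```python
REALTIME_FLOOR = 16   # priorities 16 and above receive no automatic boost
MAX_DYNAMIC = 15      # a boost never takes a thread above priority 15


def boosted_priority(base, boost):
    """Priority right after a boost is applied to the base priority."""
    if base >= REALTIME_FLOOR:
        return base
    return min(base + boost, MAX_DYNAMIC)


def decay_schedule(base, boost):
    """Priority used for each successive quantum after a boost."""
    current = boosted_priority(base, boost)
    schedule = [current]
    while current > base:
        current -= 1              # decays one level at each quantum end
        schedule.append(current)
    return schedule


print(decay_schedule(8, 2))    # [10, 9, 8]
print(decay_schedule(13, 6))   # [15, 14, 13]
print(decay_schedule(24, 6))   # [24]
```

The first call corresponds to a base-priority-8 thread receiving a boost of 2: it runs one quantum at 10, one at 9, and then returns to round-robin scheduling at 8.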

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 6 - Processes, Threads, and Jobs (from pp. 289)

Thread Scheduling (from pp. 325)

Thread States (from pp. 334)

Scheduling Scenarios (from pp. 345)

Source Code References

Windows Research Kernel sources

\base\ntos\ke\i386, \base\ntos\ke\amd64:

Ctxswap.asm - Context Swap

Clockint.asm - Clock Interrupt Handler

\base\ntos\ke

procobj.c - Process object

thredobj.c, thredsup.c - Thread object

Idsched.c - Idle scheduler

Wait.c - quantum management, wait resolution

Waitsup.c - dispatcher exit (deferred ready queue)

\base\ntos\inc\ke.h - structure/type definitions