Chapter 10 Synchronization and Scheduling in Multiprocessor Operating Systems Copyright © 2008


Operating Systems, by Dhananjay Dhamdhere. Copyright © 2008

Introduction

• Architecture of Multiprocessor Systems
• Issues in Multiprocessor Operating Systems
• Kernel Structure
• Process Synchronization
• Process Scheduling
• Case Studies


Architecture of Multiprocessor Systems

• Performance of uniprocessor systems depends on CPU performance, memory performance, and caches
  – Further improvements in system performance can be obtained only by using multiple CPUs


Architecture of Multiprocessor Systems (continued)



• Use of a cache coherence protocol is crucial to ensure that caches do not contain stale copies of data
  – Snooping-based approach (bus interconnection)
    • Each CPU snoops on the bus to analyze traffic and eliminate stale copies
    • Write-invalidate variant
      – At a write, the CPU updates memory and invalidates copies in other caches
  – Directory-based approach
    • Directory contains information about copies in caches
• TLB coherence is an analogous problem
  – Solution: TLB shootdown action
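The write-invalidate variant can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Bus` and `Cache` classes are hypothetical names, memory is updated on every write (write-through), and a real protocol tracks per-line states in hardware.

```python
class Bus:
    """Shared bus: holds main memory and the list of attached caches."""
    def __init__(self):
        self.memory = {}   # address -> value
        self.caches = []


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # address -> cached value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                 # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value              # the writer updates memory
        for other in self.bus.caches:              # other caches snoop the write
            if other is not self:
                other.lines.pop(addr, None)        # and drop their stale copy


bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
c0.write(0x10, 1)
first = c1.read(0x10)     # c1 now caches the value 1
c0.write(0x10, 2)         # invalidates c1's copy
second = c1.read(0x10)    # miss, so the fresh value is fetched
```

Without the invalidation loop, the second read by `c1` would return the stale value 1.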



• Multiprocessor systems are classified according to the manner of associating CPUs and memory units
  – Uniform memory access (UMA) architecture
    • Previously called tightly coupled multiprocessor
    • Also called symmetrical multiprocessor (SMP)
    • Examples: Balance system and VAX 8800
  – Nonuniform memory access (NUMA) architecture
    • Examples: HP AlphaServer and IBM NUMA-Q
  – No-remote-memory-access (NORMA) architecture
    • Example: Hypercube system by Intel
    • Is actually a distributed system (discussed later)





SMP Architecture

• SMP systems commonly use a bus or a cross-bar switch as the interconnection network
  – Only one conversation can be in progress over the bus at any time; other conversations are delayed
    • CPUs face unpredictable delays in accessing memory
    • Bus may become a bottleneck
  – With a cross-bar switch, performance is better
    • Switch delays are also more predictable
• Cache coherence protocols add to the delays
• SMP systems do not scale well beyond a small number of CPUs


NUMA Architecture

• Actual performance of a NUMA system depends on the nonlocal memory accesses made by processes
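The dependence on nonlocal accesses can be made concrete with a weighted average of access times. This is a back-of-the-envelope sketch; the function name and the latency figures are illustrative assumptions, not measurements from any real system.

```python
def effective_access_time(local_ns, remote_ns, remote_fraction):
    """Average memory access time when a fraction of accesses is nonlocal."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must lie between 0 and 1")
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical latencies: 100 ns local, 300 ns nonlocal.
all_local = effective_access_time(100, 300, 0.0)
quarter_remote = effective_access_time(100, 300, 0.25)
```

With these assumed figures, moving a quarter of the accesses to nonlocal memory raises the average access time by half.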


Issues in Multiprocessor Operating Systems

• Synchronization and scheduling algorithms should be scalable, so that system performance does not degrade with a growth in its size


Kernel Structure

• Kernel of a multiprocessor OS (SMP architecture) is called an SMP kernel
  – Any CPU can execute code in the kernel, and many CPUs could do so in parallel
• Based on two fundamental provisions:
  – Kernel is reentrant
  – CPUs coordinate their activities through synchronization and interprocessor interrupts


Kernel Structure: Synchronization

• Mutex locks for synchronization
  – Locking can be coarse-grained or fine-grained
    • Tradeoffs: simplicity vs. loss of parallelism
    • Deadlocks are an issue in fine-grained locking
• Parallelism can be ensured without substantial locking overhead:
  – Use of separate locks for kernel functionalities
  – Partitioning of the data structures of a kernel functionality
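Partitioning a data structure so that each partition has its own lock might look like the following sketch. `PartitionedTable` is a hypothetical name, and Python threads stand in for CPUs executing kernel code.

```python
import threading


class PartitionedTable:
    """A table split into partitions, each guarded by its own mutex lock."""
    def __init__(self, partitions=4):
        self.locks = [threading.Lock() for _ in range(partitions)]
        self.parts = [{} for _ in range(partitions)]

    def _index(self, key):
        return hash(key) % len(self.parts)

    def add(self, key, amount):
        i = self._index(key)
        with self.locks[i]:          # only one partition is locked
            self.parts[i][key] = self.parts[i].get(key, 0) + amount

    def get(self, key):
        i = self._index(key)
        with self.locks[i]:
            return self.parts[i].get(key, 0)


table = PartitionedTable()

def worker():
    for key in range(100):
        table.add(key, 1)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Threads that touch keys in different partitions never contend for the same lock, and no operation holds more than one lock, so this particular scheme cannot deadlock.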


Kernel Structure: Heap Management

• Parallelism in heap management can be provided by maintaining several free lists

• Locking is unnecessary if each CPU has its own free list
  – However, this would degrade performance
    • Allocation decisions would not be optimal
• Alternative: separate free lists to hold free memory areas of different sizes
  – CPU locks an appropriate free list
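A minimal sketch of free lists organized by size, each with its own lock, is shown below. The class name, the size classes, and the representation of a block as a `(size, number)` pair are all assumptions made for illustration.

```python
import threading


class SizeClassHeap:
    """One free list per block size; each list has its own lock."""
    def __init__(self, sizes=(64, 256, 1024), blocks_per_class=4):
        self.sizes = sorted(sizes)
        self.locks = {s: threading.Lock() for s in self.sizes}
        # A block is modelled as a (size, number) pair.
        self.free = {s: [(s, n) for n in range(blocks_per_class)]
                     for s in self.sizes}

    def allocate(self, request):
        """Take a block from the smallest size class that fits and is not empty."""
        for s in self.sizes:
            if s >= request:
                with self.locks[s]:          # lock only this free list
                    if self.free[s]:
                        return self.free[s].pop()
        return None                          # nothing large enough is free

    def release(self, block):
        size = block[0]
        with self.locks[size]:
            self.free[size].append(block)


heap = SizeClassHeap()
block = heap.allocate(100)    # served from the 256-byte list
```

Two CPUs requesting blocks of different sizes lock different lists and can proceed in parallel.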


Kernel Structure: Scheduling

• Scheduling suffers from heavy contention for the mutex locks Lrq and Lawt, because every CPU needs to set and release these locks while scheduling
  – Alternative: partition processes into subsets and entrust each subset to a CPU for scheduling
  – Fast scheduling but suboptimal performance
• An SMP kernel provides graceful degradation


Kernel Structure: NUMA Kernel

• CPUs in NUMA systems have different memory access times for local and nonlocal memory

• Each node in a NUMA system has its own separate kernel
  – Exclusively schedules processes whose address spaces are in local memory of the node
  – Concept can be generalized: an application region ensures good performance of an application. It has
    • A resource partition with one or more CPUs
    • An instance of the kernel


Process Synchronization


Process Synchronization (continued)

• Queued locks may not be scalable
• In NUMA, spin locks may lead to lock starvation
• Sleep locks may be preferred to spin locks if the memory or network traffic densities are high


Special Hardware for Process Synchronization

• The Sequent Balance system uses a special bus called system link and interface controller (SLIC) for synchronization
  – Special 64-bit register in each CPU in the system
    • Each bit implements a spin lock using SLIC
  – Spinning doesn’t generate memory/network traffic
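The idea of a register whose bits each act as a lock can be modelled in software as follows. This is purely a conceptual sketch: `BitLockRegister` is a hypothetical name, and in the real hardware the test-and-set of a bit is performed atomically over the SLIC bus, which plain Python does not reproduce.

```python
class BitLockRegister:
    """Model of a 64-bit register in which each bit is one spin lock."""
    WIDTH = 64

    def __init__(self):
        self.bits = 0

    def _mask(self, n):
        if not 0 <= n < self.WIDTH:
            raise ValueError("lock number must be in 0..63")
        return 1 << n

    def try_lock(self, n):
        """Set bit n if it is clear; report whether the lock was obtained."""
        mask = self._mask(n)
        if self.bits & mask:
            return False          # lock is held; the caller would keep spinning
        self.bits |= mask
        return True

    def unlock(self, n):
        self.bits &= ~self._mask(n)


reg = BitLockRegister()
got_it = reg.try_lock(3)
busy = reg.try_lock(3)      # second attempt fails while bit 3 is set
```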


A Scalable Software Scheme for Process Synchronization

• Scheme for process synchronization
  – NUMA and NORMA architectures
  – Scalable performance
    • Minimizes synchronization traffic to nonlocal memory units (NUMA) and over network (NORMA)


Process Synchronization (continued)

• Scheduling-aware synchronization
  – Adaptive lock
    • A process waiting for this lock spins if the holder of the lock is scheduled to run in parallel
    • Otherwise, the process is preempted and queued as in a queued lock
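The decision made by an adaptive lock can be sketched as below. The class and method names are hypothetical, and the `running` set stands in for whatever scheduler information a real kernel would consult to learn whether the lock holder is currently executing on some CPU.

```python
from collections import deque


class AdaptiveLock:
    def __init__(self):
        self.holder = None
        self.wait_queue = deque()

    def try_acquire(self, pid, running):
        """Return 'acquired', 'spin' or 'queued' for process pid.

        running: set of process ids currently executing on some CPU.
        """
        if self.holder is None:
            self.holder = pid
            return "acquired"
        if self.holder in running:
            return "spin"                 # holder is running: release is near
        self.wait_queue.append(pid)       # holder is not running: do not spin
        return "queued"

    def release(self):
        """Pass the lock to the first queued process, if any."""
        self.holder = self.wait_queue.popleft() if self.wait_queue else None
        return self.holder


lock = AdaptiveLock()
r1 = lock.try_acquire("A", running={"A"})
r2 = lock.try_acquire("B", running={"A", "B"})   # A is running in parallel
r3 = lock.try_acquire("C", running={"C"})        # A has been preempted
```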



Process Scheduling

• CPU scheduling decisions affect performance
  – How, when and where to schedule processes
• Affinity scheduling: schedule a process on a CPU where it has executed in the past
  – Good cache hit ratio
  – Interferes with load balancing across CPUs
• In an SMP kernel, CPUs can perform their own scheduling
  – Prevents kernel from becoming a bottleneck
  – Leads to scheduling anomalies
    • Correcting them requires shuffling of processes
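The tension between affinity and load balancing can be illustrated with a simple placement rule. This is one possible policy, not a description of any particular kernel; the function name and the `max_imbalance` threshold are assumptions.

```python
def pick_cpu(last_cpu, load, max_imbalance=2):
    """Choose a CPU for a process.

    last_cpu: CPU on which the process last ran, or None if it never ran.
    load: list giving the number of ready processes on each CPU.
    """
    least = min(range(len(load)), key=lambda cpu: load[cpu])
    # Honor affinity unless it would leave the loads too uneven.
    if last_cpu is not None and load[last_cpu] - load[least] <= max_imbalance:
        return last_cpu
    return least


stay = pick_cpu(0, [3, 2, 1])   # imbalance is tolerable: keep the warm cache
move = pick_cpu(0, [5, 2, 1])   # CPU 0 is overloaded: give up affinity
```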


Example: Process Shuffling in an SMP Kernel

• Process shuffling can be implemented by using the assigned workload table (AWT) and the interprocessor interrupt (IPI)
  – However, it leads to high scheduling overhead
    • Effect is more pronounced in a system containing a large number of CPUs
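One step of process shuffling might be sketched as follows, assuming (for illustration only) that the AWT can be viewed as a map from each CPU to the priority of the process it is running, and that a larger number means higher priority. The function name is hypothetical.

```python
def cpu_to_interrupt(awt, new_priority):
    """Pick the CPU that should receive an IPI when a process becomes ready.

    awt: dict mapping CPU id -> priority of the process it is running.
    Returns the CPU running the lowest-priority process if the new process
    outranks it, or None if no shuffling is needed.
    """
    victim = min(awt, key=lambda cpu: awt[cpu])
    return victim if awt[victim] < new_priority else None


target = cpu_to_interrupt({0: 5, 1: 2, 2: 7}, new_priority=4)
nobody = cpu_to_interrupt({0: 5, 1: 6}, new_priority=3)
```

Every such decision scans the table and may trigger an interrupt, which hints at why the overhead grows with the number of CPUs.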


Process Scheduling (continued)

• Processes of an application should be scheduled on different CPUs at the same time if they use spin locks for synchronization
  – Called coscheduling or gang scheduling
• A different approach is required when processes exchange messages by using a blocking protocol
  – In some situations, special efforts should be made not to schedule such processes in the same time slice
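Coscheduling can be sketched as packing each application's processes (its gang) into a single time slice. The first-fit packing used here is an assumption chosen for brevity; the function name is hypothetical.

```python
def gang_schedule(gangs, n_cpus):
    """Assign gangs to time slices so that each gang runs in one slice.

    gangs: dict mapping application name -> list of its process ids.
    Returns a list of slices; each slice is a list of (app, pid) pairs
    holding at most n_cpus entries.
    """
    slices = []
    for app, procs in gangs.items():
        if len(procs) > n_cpus:
            raise ValueError(f"gang {app} needs more than {n_cpus} CPUs")
        members = [(app, pid) for pid in procs]
        for current in slices:
            if len(current) + len(members) <= n_cpus:   # first slice with room
                current.extend(members)
                break
        else:
            slices.append(members)
    return slices


plan = gang_schedule({"A": [1, 2, 3], "B": [1, 2], "C": [1]}, n_cpus=4)
```

Because a gang is never split across slices, a process spinning on a lock can rely on the lock holder running at the same time.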


Case Studies

• Mach
• Linux
• SMP Support in Windows


Mach

• Mach OS implements scheduling hints
  – Thread issues hint to influence processor scheduling
    • For example, a hands-off hint to relinquish CPU in favor of a specific thread


Linux

• Multiprocessing support introduced in 2.0 kernel
  – Coarse-grained locking was employed
• Granularity of locks was made finer in later releases
  – Kernel was still nonpreemptible until 2.6 kernel
• Kernel provides:
  – Spin locks for locking of data structures
  – Reader–writer spin locks
  – Sequence lock
• Per-CPU data structures to reduce lock contention
• Other features: hard and soft affinity, load balancing
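The idea behind a sequence lock can be sketched in a few lines. This is a conceptual model, not the Linux implementation: it omits the mutual exclusion among writers and the memory barriers a real kernel needs, and the class name is hypothetical.

```python
class SeqLock:
    """Readers never block; they retry if a write overlapped their read."""
    def __init__(self, value):
        self.seq = 0          # even: stable, odd: write in progress
        self.value = value

    def write(self, value):
        self.seq += 1         # becomes odd: readers must not trust the data
        self.value = value
        self.seq += 1         # becomes even: data is consistent again

    def read(self):
        while True:
            start = self.seq
            if start % 2:                 # a write is in progress, retry
                continue
            snapshot = self.value
            if self.seq == start:         # no write intervened
                return snapshot


clock = SeqLock((0, 0))
clock.write((12, 30))
seen = clock.read()
```

The design favors writers: a reader pays only for two counter checks when no write occurs, but may have to repeat its read.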


SMP Support in Windows

• A hyperthreaded CPU is considered to be several logical processors

• Spin locks provide mutual exclusion over kernel data
  – A thread holding a spinlock is never preempted

• Queued spinlock uses a scalable software implementation scheme

• Uses many free lists of memory for parallel access
• Process default processor affinity and thread processor affinity together define the thread affinity set
• Thread affinity set defines hard affinity for a thread; the ideal processor defines soft affinity
• Uses both hard and soft affinity


Summary

• Multiprocessor OS exploits multiple CPUs in a computer to provide high throughput (system), computation speedup (application), and graceful degradation (of OS, when faults occur)
• Classification of multiprocessors
  – Uniform memory access (UMA) architecture
    • Also called symmetrical multiprocessor (SMP)
  – Nonuniform memory access (NUMA) architecture
• OS efficiently schedules user processes in parallel
  – Issues: kernel structure and synchronization delays


Summary (continued)

• Multiprocessor OS algorithms must be scalable
• Use of special kinds of locks:
  – Spin locks and sleep locks
• Important scheduling concepts in multiprocessor OSs:
  – Affinity scheduling
  – Coscheduling
  – Process shuffling
