
Chapter 10

Synchronization and Scheduling in Multiprocessor Operating Systems

Copyright © 2008

Operating Systems, by Dhananjay Dhamdhere. Copyright © 2008

Introduction

• Architecture of Multiprocessor Systems
• Issues in Multiprocessor Operating Systems
• Kernel Structure
• Process Synchronization
• Process Scheduling
• Case Studies


Architecture of Multiprocessor Systems

• Performance of a uniprocessor system depends on CPU and memory performance, and on caches
  – Further improvements in system performance can be obtained only by using multiple CPUs

Architecture of Multiprocessor Systems (continued)

• Use of a cache coherence protocol is crucial to ensure that caches do not contain stale copies of data
  – Snooping-based approach (bus interconnection)
    • CPU snoops on the bus to analyze traffic and eliminate stale copies
    • Write-invalidate variant
      – At a write, CPU updates memory and invalidates copies in other caches
  – Directory-based approach
    • Directory contains information about copies in caches
• TLB coherence is an analogous problem
  – Solution: TLB shootdown action
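
The write-invalidate variant can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real protocol: the class names `Bus` and `Cache`, the write-through policy, and the address `0x10` are all assumptions made for the example.

```python
class Bus:
    """Shared bus (hypothetical): carries writes and lets caches snoop them."""

    def __init__(self):
        self.memory = {}   # address -> value
        self.caches = []   # every cache attached to the bus

    def broadcast_invalidate(self, addr, writer):
        # Each cache other than the writer snoops the write and drops its copy.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)


class Cache:
    """Private cache of one CPU (hypothetical, write-through for simplicity)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # address -> cached value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                 # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(addr, self)  # invalidate other copies
        self.bus.memory[addr] = value              # update memory
        self.lines[addr] = value                   # keep own copy current


bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
bus.memory[0x10] = 1
c0.read(0x10)
c1.read(0x10)          # both caches now hold a copy of address 0x10
c0.write(0x10, 2)      # c1's copy is invalidated by the snooped write
```

After the write, `c1` no longer holds the line, so its next read misses and fetches the new value from memory instead of returning a stale copy.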


Architecture of Multiprocessor Systems (continued)

• Multiprocessor systems are classified according to the manner of associating CPUs and memory units
  – Uniform memory access (UMA) architecture
    • Previously called tightly coupled multiprocessor
    • Also called symmetrical multiprocessor (SMP)
    • Examples: Balance system and VAX 8800
  – Nonuniform memory access (NUMA) architecture
    • Examples: HP AlphaServer and IBM NUMA-Q
  – No-remote-memory-access (NORMA) architecture
    • Example: Hypercube system by Intel
    • Is actually a distributed system (discussed later)


SMP Architecture

• SMP systems commonly use a bus or a crossbar switch as the interconnection network
  – Only one conversation can be in progress over the bus at any time; other conversations are delayed
    • CPUs face unpredictable delays in accessing memory
    • Bus may become a bottleneck
  – With a crossbar switch, performance is better
    • Switch delays are also more predictable
• Cache coherence protocols add to the delays
• SMP systems do not scale well beyond a small number of CPUs


NUMA Architecture

• Actual performance of a NUMA system depends on the nonlocal memory accesses made by processes
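
One simple way to see this dependence is a weighted-average model of memory access time. The model and the numbers below (100 ns local, 400 ns nonlocal) are illustrative assumptions, not measurements of any system named in these slides.

```python
def effective_access_time(t_local_ns, t_remote_ns, remote_fraction):
    """Average access time when a given fraction of accesses is nonlocal.

    Assumed model: every access costs either t_local_ns or t_remote_ns.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must lie between 0 and 1")
    return (1.0 - remote_fraction) * t_local_ns + remote_fraction * t_remote_ns


# Hypothetical latencies: 100 ns for local memory, 400 ns for nonlocal memory.
for fraction in (0.0, 0.25, 0.5):
    print(fraction, effective_access_time(100, 400, fraction))
```

Under these assumed latencies, making a quarter of all accesses nonlocal raises the average access time from 100 ns to 175 ns.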


Issues in Multiprocessor Operating Systems

• Synchronization and scheduling algorithms should be scalable, so that system performance does not degrade with a growth in its size


Kernel Structure

• Kernel of a multiprocessor OS (SMP architecture) is called an SMP kernel
  – Any CPU can execute code in the kernel, and many CPUs could do so in parallel
• Based on two fundamental provisions:
  – Kernel is reentrant
  – CPUs coordinate their activities through synchronization and interprocessor interrupts


Kernel Structure: Synchronization

• Mutex locks for synchronization
  – Locking can be coarse-grained or fine-grained
    • Tradeoffs: simplicity vs. loss of parallelism
    • Deadlocks are an issue in fine-grained locking
• Parallelism can be ensured without substantial locking overhead:
  – Use of separate locks for kernel functionalities
  – Partitioning of the data structures of a kernel functionality
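
Partitioning a data structure so that each partition has its own lock might look like the following. This is a sketch under assumptions: `PartitionedTable` is a hypothetical name, Python threads stand in for CPUs, and acquiring locks in index order is one common way to avoid the deadlocks mentioned above, not necessarily the approach taken by any particular kernel.

```python
import threading


class PartitionedTable:
    """Hypothetical table split into partitions, each guarded by its own lock."""

    def __init__(self, partitions=8):
        self._parts = [{} for _ in range(partitions)]
        self._locks = [threading.Lock() for _ in range(partitions)]

    def _index(self, key):
        return hash(key) % len(self._parts)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:           # only one partition is locked
            self._parts[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._parts[i].get(key, default)

    def move(self, src_key, dst_key):
        """Touches up to two partitions; locks are taken in index order."""
        i, j = self._index(src_key), self._index(dst_key)
        order = sorted({i, j})         # a fixed global order prevents deadlock
        for k in order:
            self._locks[k].acquire()
        try:
            if src_key in self._parts[i]:
                self._parts[j][dst_key] = self._parts[i].pop(src_key)
        finally:
            for k in reversed(order):
                self._locks[k].release()

    def size(self):
        return sum(len(part) for part in self._parts)


table = PartitionedTable()
workers = [
    threading.Thread(
        target=lambda base=base: [table.put(base + n, n) for n in range(250)]
    )
    for base in (0, 1000, 2000, 3000)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Threads that update keys in different partitions do not contend with each other, which is the parallelism the partitioning is meant to preserve.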


Kernel Structure: Heap Management

• Parallelism in heap management can be provided by maintaining several free lists
• Locking is unnecessary if each CPU has its own free list
  – However, this arrangement would degrade performance
    • Allocation decisions would not be optimal
• Alternative: separate free lists to hold free memory areas of different sizes
  – CPU locks an appropriate free list
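
The size-based alternative can be sketched as follows. The class name `SizeClassHeap`, the three size classes, and the simple address counter used to obtain fresh memory are assumptions made for illustration only.

```python
import threading


class SizeClassHeap:
    """Hypothetical heap with one free list, and one lock, per block size."""

    def __init__(self, size_classes=(64, 256, 1024)):
        self.size_classes = sorted(size_classes)
        self.free_lists = {s: [] for s in self.size_classes}
        self.locks = {s: threading.Lock() for s in self.size_classes}
        self._next_addr = 0                  # stand-in for fresh memory
        self._grow_lock = threading.Lock()

    def _class_for(self, nbytes):
        for s in self.size_classes:
            if nbytes <= s:
                return s
        raise ValueError("request exceeds the largest size class")

    def allocate(self, nbytes):
        s = self._class_for(nbytes)
        with self.locks[s]:                  # lock only the matching free list
            if self.free_lists[s]:
                return (self.free_lists[s].pop(), s)
        with self._grow_lock:                # free list empty: carve a new block
            addr = self._next_addr
            self._next_addr += s
        return (addr, s)

    def free(self, block):
        addr, s = block
        with self.locks[s]:
            self.free_lists[s].append(addr)


heap = SizeClassHeap()
a = heap.allocate(100)   # served from the 256-byte class
heap.free(a)
b = heap.allocate(200)   # reuses the block that was just freed
c = heap.allocate(10)    # served from the 64-byte class
```

Two CPUs that request blocks of different sizes lock different free lists and therefore proceed in parallel.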


Kernel Structure: Scheduling

• Scheduling suffers from heavy contention for the mutex locks Lrq and Lawt because every CPU needs to set and release these locks while scheduling
  – Alternative: Partition processes into subsets and entrust each subset to a CPU for scheduling
  – Fast scheduling but suboptimal performance
• An SMP kernel provides graceful degradation: the OS can continue to operate when faults occur


Kernel Structure: NUMA Kernel

• CPUs in NUMA systems have different memory access times for local and nonlocal memory
• Each node in a NUMA system has its own separate kernel
  – Exclusively schedules processes whose address spaces are in local memory of the node
  – Concept can be generalized: An application region ensures good performance of an application. It has
    • A resource partition with one or more CPUs
    • An instance of the kernel


Process Synchronization


• Queued locks may not be scalable
• In NUMA, spin locks may lead to lock starvation
• Sleep locks may be preferred to spin locks if the memory or network traffic densities are high
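
The difference between spinning and sleeping on a lock can be sketched with Python threads. This is only an analogy: `SpinLock` and `SleepLock` are hypothetical classes, `threading.Lock` stands in for an atomic test-and-set flag, and Python threads do not reproduce the memory or network traffic that repeated tests of a lock variable cause on a real multiprocessor.

```python
import threading


class SpinLock:
    """Waiter keeps retrying the lock without giving up the CPU (busy-wait)."""

    def __init__(self):
        self._held = threading.Lock()   # stand-in for an atomic test-and-set flag

    def acquire(self):
        while not self._held.acquire(blocking=False):
            pass                        # spin: every retry tests the lock again

    def release(self):
        self._held.release()


class SleepLock:
    """Waiter is blocked until the lock is released, so it does not spin."""

    def __init__(self):
        self._held = threading.Lock()

    def acquire(self):
        self._held.acquire()            # blocking call: the thread waits idle

    def release(self):
        self._held.release()


def run_workers(lock, n_threads=4, n_iters=500):
    """Increment a shared counter under the given lock; return the final count."""
    counter = [0]

    def work():
        for _ in range(n_iters):
            lock.acquire()
            counter[0] += 1
            lock.release()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Both locks give the same mutual exclusion; they differ in what a waiter does while it waits, which is why traffic density influences the choice between them.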


Special Hardware for Process Synchronization

• The Sequent Balance system uses a special bus called system link and interface controller (SLIC) for synchronization
  – Special 64-bit register in each CPU in the system
    • Each bit implements a spin lock using SLIC
  – Spinning doesn’t generate memory/network traffic


A Scalable Software Scheme for Process Synchronization

• Scheme for process synchronization
  – NUMA and NORMA architectures
  – Scalable performance
    • Minimizes synchronization traffic to nonlocal memory units (NUMA) and over network (NORMA)


Process Synchronization (continued)

• Scheduling-aware synchronization
  – Adaptive lock
    • A process waiting for this lock spins if holder of the lock is scheduled to run in parallel
    • Otherwise, the process is preempted and queued as in a queued lock
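
The decision rule of an adaptive lock can be sketched as a small state machine. The class `AdaptiveLock`, the `running` set that stands in for the scheduler's view of which processes are on a CPU, and the string results are all hypothetical and serve only to illustrate the rule.

```python
from collections import deque


class AdaptiveLock:
    """Hypothetical adaptive lock: spin if the holder is running, else queue."""

    def __init__(self, running):
        self.running = running   # ids of processes currently on some CPU
        self.holder = None
        self.queue = deque()     # waiters that have been queued

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "acquired"
        if self.holder in self.running:
            return "spin"        # holder is executing, so release may be near
        self.queue.append(pid)   # holder is not running: do not waste CPU time
        return "queued"

    def release(self):
        # Hand the lock to the first queued waiter, if any.
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


running = {"P1", "P2"}
lock = AdaptiveLock(running)
first = lock.request("P1")       # lock is free
second = lock.request("P2")      # P1 is running in parallel
running.discard("P1")            # P1 loses its CPU while holding the lock
third = lock.request("P2")       # spinning would now be wasteful
next_holder = lock.release()
```

Here `P2` is told to spin while `P1` is running and is queued once `P1` has been preempted.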


Process Scheduling

• CPU scheduling decisions affect performance
  – How, when and where to schedule processes
• Affinity scheduling: schedule a process on a CPU where it has executed in the past
  • Good cache hit ratio
  • Interferes with load balancing across CPUs
• In an SMP kernel, CPUs can perform their own scheduling
  – Prevents kernel from becoming bottleneck
  – Leads to scheduling anomalies
    • Correcting requires shuffling of processes
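
The tension between affinity and load balancing can be sketched as a placement rule. The function `pick_cpu` and its `max_imbalance` threshold are assumptions for illustration; real schedulers use more elaborate criteria.

```python
def pick_cpu(last_cpu, queue_lengths, max_imbalance=2):
    """Choose a CPU for a process (hypothetical policy).

    Prefer the CPU where the process last ran, for a good cache hit ratio,
    unless its ready queue is longer than the shortest queue by more than
    max_imbalance processes.
    """
    least_loaded = min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
    if last_cpu is None:                        # no history: balance the load
        return least_loaded
    gap = queue_lengths[last_cpu] - queue_lengths[least_loaded]
    return last_cpu if gap <= max_imbalance else least_loaded


print(pick_cpu(1, [3, 4, 2]))     # small imbalance: affinity wins
print(pick_cpu(1, [0, 5, 1]))     # large imbalance: load balancing wins
print(pick_cpu(None, [2, 1, 3]))  # new process: least-loaded CPU
```

A larger `max_imbalance` favors cache affinity; a smaller one favors even load across CPUs.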


Example: Process Shuffling in an SMP Kernel

• Process shuffling can be implemented by using the assigned workload table AWT and the interprocessor interrupt (IPI)
  – However, it leads to high scheduling overhead
    • Effect is more pronounced in a system containing a large number of CPUs


Process Scheduling (continued)

• Processes of an application should be scheduled on different CPUs at the same time if they use spin locks for synchronization
  – Called coscheduling or gang scheduling
• A different approach is required when processes exchange messages by using a blocking protocol
  – In some situations, special efforts should be made not to schedule such processes in the same time slice
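
Coscheduling can be sketched as packing whole applications into time slices. The function `coschedule`, the first-fit packing rule, and the application names are assumptions made for the example, not a description of any specific scheduler.

```python
def coschedule(apps, n_cpus):
    """Assign applications to time slices so that all processes of an
    application run in the same slice, each on a different CPU.

    apps maps an application name to its list of processes. The result is a
    list of slices; each slice lists the (application, process) pairs that
    run together. First-fit packing is an assumed policy.
    """
    slices = []
    for app, procs in apps.items():
        if len(procs) > n_cpus:
            raise ValueError(f"{app} has more processes than there are CPUs")
        for slot in slices:
            if len(slot) + len(procs) <= n_cpus:   # whole gang fits here
                slot.extend((app, p) for p in procs)
                break
        else:                                      # no slice had room
            slices.append([(app, p) for p in procs])
    return slices


apps = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1"]}
schedule = coschedule(apps, n_cpus=4)
```

Because an application is never split across slices, a process spinning on a lock always has the lock holder running on another CPU in the same slice.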


Case Studies

• Mach
• Linux
• SMP Support in Windows


Mach

• Mach OS implements scheduling hints
  – Thread issues hint to influence processor scheduling
    • For example, a hands-off hint to relinquish CPU in favor of a specific thread


Linux

• Multiprocessing support introduced in 2.0 kernel
  – Coarse-grained locking was employed
• Granularity of locks was made finer in later releases
  – Kernel was still nonpreemptible until 2.6 kernel
• Kernel provides:
  – Spin locks for locking of data structures
  – Reader–writer spin locks
  – Sequence lock
• Per-CPU data structures to reduce lock contention
• Other features: hard and soft affinity, load balancing
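
The idea behind a sequence lock can be sketched in a few lines. This is a conceptual illustration, not the Linux implementation: the class `SeqLock` and its method names are hypothetical, and the sketch assumes writers are already serialized by some other lock.

```python
class SeqLock:
    """Conceptual sequence lock: readers retry instead of blocking writers."""

    def __init__(self, value):
        self.seq = 0        # even: no write in progress; odd: write in progress
        self.value = value

    def write(self, value):
        # Assumption: the caller already holds a lock that serializes writers.
        self.seq += 1       # becomes odd: readers must not trust the data
        self.value = value
        self.seq += 1       # becomes even again: data is consistent

    def read_begin(self):
        return self.seq

    def read_retry(self, start):
        # Retry if a write was in progress at the start or happened since.
        return start % 2 == 1 or self.seq != start

    def read(self):
        while True:
            start = self.read_begin()
            snapshot = self.value
            if not self.read_retry(start):
                return snapshot


clock = SeqLock((0, 0))
clock.write((1, 2))
start = clock.read_begin()
clock.write((3, 4))                   # a write lands during the read
must_retry = clock.read_retry(start)  # the reader detects it and retries
```

Readers never make a writer wait; a reader that overlaps a write notices the changed sequence number and repeats its read.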


SMP Support in Windows

• A hyperthreaded CPU is considered to be several logical processors
• Spin locks provide mutual exclusion over kernel data
  – A thread holding a spinlock is never preempted
• Queued spinlock uses a scalable software implementation scheme

• Uses many free lists of memory for parallel access
• Process default processor affinity and thread processor affinity together define thread affinity set
• Thread affinity set defines hard affinity for a thread; ideal processor defines soft affinity
• Uses both hard and soft affinity


Summary

• Multiprocessor OS exploits multiple CPUs in a computer to provide high throughput (system), computation speedup (application), and graceful degradation (of OS, when faults occur)
• Classification of multiprocessors
  – Uniform memory access (UMA) architecture
    • Also called symmetrical multiprocessor (SMP)
  – Nonuniform memory access (NUMA) architecture
• OS efficiently schedules user processes in parallel
  – Issues: kernel structure and synchronization delays


Summary (continued)

• Multiprocessor OS algorithms must be scalable
• Use of special kinds of locks:
  – Spin locks and sleep locks
• Important scheduling concepts in multiprocessor OSs:
  – Affinity scheduling
  – Coscheduling
  – Process shuffling

– Process shuffling