
SYLLABUS


Page 1: SYLLABUS

SYLLABUS: Section A

• Multi-Processor and Distributed Operating System:
  – Introduction, Architecture, Organization,
  – Resource sharing,
  – Load Balancing,
  – Availability and Fault Tolerance,
  – Design and Development Challenges,
  – Inter-process Communication,

• Distributed Applications:
  – Logical Clock,
  – Mutual Exclusion,
  – Distributed File System.

1

Page 2: SYLLABUS

ADVANCED OPERATING SYSTEMS

MCA 404

Page 3: SYLLABUS

SYLLABUS

4

Page 4: SYLLABUS

SYLLABUS: Section A

• Multi-Processor and Distributed Operating System:
  – Introduction,
  – Architecture,
  – Organization,
  – Resource sharing,
  – Load Balancing,
  – Availability and Fault Tolerance,
  – Design and Development Challenges,
  – Inter-process Communication,

• Distributed Applications:
  – Logical Clock,
  – Mutual Exclusion,
  – Distributed File System.

5

Page 5: SYLLABUS

SYLLABUS: Section B

• Real Time and Embedded Operating Systems:
  – Introduction,
  – Hardware Elements,
  – Structure
    • Interrupt Driven,
    • Nanokernel,
    • Microkernel and
    • Monolithic kernel based models.
  – Scheduling –
    • Periodic,
    • Aperiodic and
    • Sporadic Tasks,
  – Introduction to Energy Aware CPU Scheduling.

6

Page 6: SYLLABUS

SYLLABUS: Section C

• Cluster and Grid Computing:
  – Introduction to Cluster Computing and MOSIX OS,
  – Introduction to the Grid,
  – Grid Architecture,

• Computing Platforms:
  – Operating Systems and Network Interfaces,
  – Grid Monitoring and Scheduling,
  – Performance Analysis,
  – Case Studies.

7

Page 7: SYLLABUS

SYLLABUS: Section D

• Cloud Computing:
  – Introduction to Cloud,
  – Cloud Building Blocks,
  – Cloud as IaaS, PaaS and SaaS,
  – Hardware and software virtualization,
  – Virtualization of OS,
  – Hypervisor KVM,
  – SAN and
  – NAS back-end concepts.

• Mobile Computing:
  – Introduction,
  – Design Principles,
  – Structure, Platform and Features of Mobile Operating Systems (Android, IOS, Windows Mobile OS).

8

Page 8: SYLLABUS

SYLLABUS: References
• Sibsankar Haldar, Alex A. Arvind, "Operating Systems", Pearson Education Inc.
• Tanenbaum and Van Steen, "Distributed Systems: Principles and Paradigms", Pearson, 2007.
• M. L. Liu, "Distributed Computing: Principles and Applications", Addison Wesley, Pearson.
• Maozhen Li, Mark Baker, "The Grid – Core Technologies", John Wiley & Sons, 2005.

9

Page 9: SYLLABUS

10

Page 10: SYLLABUS

Happy New Year 2014

Page 11: SYLLABUS

Happy New Year: 2014
How to be Happy?

There are Nine Philosophies (Darshan) to be happy:
1. Brahm Darshan: Philosophy of understanding the God or Brahm.
2. Dev Darshan: Philosophy of understanding the lords (Devtas).
3. Gayatri Darshan: Understand the meaning of the Gayatri Mantra.
4. Ganga Darshan: Understand the meaning of the Ganga.
5. Vichar Darshan: Understand the power of a thought.
6. Karm Yog: Philosophy of Action/Effort.
7. Sam Darshan: Understand the philosophy of balance in life.
8. Dukh Darshan: Understand the value of stress and strain.
9. Sukh Darshan: Understand the key behind Happiness.

12

Page 12: SYLLABUS

13

Page 13: SYLLABUS

SECTION A
• Multi-Processor and Distributed Operating System:
  – Introduction, Architecture, Organisation

14

Page 14: SYLLABUS

MULTIPROCESSOR SYSTEMS:
INTRODUCTION, ARCHITECTURE AND ORGANISATION

15

Page 15: SYLLABUS

MULTIPROCESSOR SYSTEMS:
INTRODUCTION, ARCHITECTURE AND ORGANISATION

• A multiprocessor system is one that has more than one processor on-board in the computer.

16

Page 16: SYLLABUS

MULTI-PROCESSOR SYSTEM: TWO PROCESSORS
• There are two CPU chips on the same motherboard.
• Each CPU may be multicore, e.g. Dual Core, Quad Core etc.
• Each CPU has its own memory slots.

17

Page 17: SYLLABUS

MULTI-PROCESSOR SYSTEM: FOUR PROCESSORS

• There are four CPU chips on the same motherboard.

• Each CPU may be multicore (Dual Core, Quad Core etc.).

• Each CPU has its own memory slots.

18

Page 18: SYLLABUS

MULTI-PROCESSOR SYSTEM
• A multiprocessor is a tightly coupled computer system having two or more processing units (multiple processors), each sharing main memory and peripherals, in order to simultaneously process programs.

• Sometimes the term Multiprocessor is confused with the term Multiprocessing.

• While Multiprocessing is a type of processing in which two or more processors work together to execute more than one program simultaneously, the term Multiprocessor refers to the hardware architecture that allows multiprocessing.

19

Page 19: SYLLABUS

MULTI-PROCESSOR SYSTEM: INTRODUCTION, ARCHITECTURE AND ORGANISATION

• A CPU, or Central Processing Unit, is what is typically referred to as a processor. A processor contains many discrete parts within it, such as one or more memory caches for instructions and data, instruction decoders, and various types of execution units for performing arithmetic or logical operations.

• A multiprocessor system contains more than one such CPU, allowing them to work in parallel. This is called SMP, or Symmetric Multiprocessing.

• A multicore CPU has multiple execution cores on one CPU. This can mean different things depending on the exact architecture, but it basically means that a certain subset of the CPU's components is duplicated, so that multiple "cores" can work in parallel on separate operations. This is called CMP, Chip-level Multiprocessing.

20

Page 20: SYLLABUS

21

Page 21: SYLLABUS

MULTI-CORE PROCESSOR
• Multi-core processing refers to the use of multiple microprocessors, called "cores," that are built onto a single silicon die.

• A multi-core processor acts as a single unit.

• As such, it is more efficient, and establishes a standardized platform for which mass-produced software can easily be developed.

22

Page 22: SYLLABUS

MULTI-CORE PROCESSOR
• The design of a multi-core processor allows each core to communicate with the others, so that processing tasks may be divided and delegated appropriately.

• However, the actual delegation is dictated by software.

• When a task is completed, the processed information from all cores is returned to the motherboard via a single shared conduit.

• This process can often significantly improve performance over a single-core processor of comparable speed.

• The degree of performance improvement will depend upon the efficiency of the software code being run.

23
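The delegation described above can be sketched in a few lines of Python (a hedged illustration, not something from the slides): the standard multiprocessing module spreads chunks of a job across worker processes, the operating system places those workers on the available cores, and the partial results come back through one shared channel. The chunking scheme and the work function are invented for illustration.

```python
# Sketch of dividing a task across cores: each worker process handles one
# chunk, and the results are collected back through a single shared channel.
from multiprocessing import Pool

def process_chunk(chunk):
    # Illustrative work: sum of squares for this chunk.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]        # split the task four ways
    with Pool(processes=4) as pool:                # delegate to (up to) 4 cores
        partial_results = pool.map(process_chunk, chunks)
    print(sum(partial_results))                    # combine the returned pieces
```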

Page 23: SYLLABUS

MULTI-CORE PROCESSOR
• Multi-core is usually the term used to describe two or more CPUs working together on the same chip.

24

Page 24: SYLLABUS

MULTI-CORE PROCESSOR
• Multi-core is usually the term used to describe two or more CPUs working together on the same chip.

• A multi-core processor is a single computing component with two or more independent actual central processing units (called "cores"), which are the units that read and execute program instructions.

• The instructions are ordinary CPU instructions such as add, move data, and branch, but the multiple cores can run multiple instructions at the same time, increasing overall speed for programs amenable to parallel computing.

• Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP), or onto multiple dies in a single chip package.

25

Page 25: SYLLABUS

MULTI-CORE CPU

Substrate: An underlying substance or layer.

26

Page 26: SYLLABUS

MULTI-CORE PROCESSOR

• For example, a multicore processor may have a separate L1 and L2 cache and execution unit for each core, while it has a shared L3 cache for the entire processor.

• That means that while the processor has one big pool of slower cache, it has separate fast memory and arithmetic/logic units for each of several cores.

• This would allow each core to perform operations at the same time as the others.

27

Page 27: SYLLABUS

MULTI-CORE PROCESSOR

• There is an even further division, called SMT, Simultaneous Multithreading.

• This is where an even smaller subset of a processor's or core's components is duplicated.

• For example, an SMT core might have duplicate thread scheduling resources, so that the core looks like two separate "processors" to the operating system, even though it only has one set of execution units.

• One common implementation of this is Intel's Hyperthreading.

28

Page 28: SYLLABUS

MULTI-CORE PROCESSOR
Cache Hierarchy
• Modern system architectures have 2 or 3 levels in the cache hierarchy before going to main memory.
• Typically the outermost or Last Level Cache (LLC) will be shared by all cores on the same physical chip, while the innermost are per core.
• We are most interested in the data caches (D-cache), although there will also be caches for instructions (I-cache).

29

Page 29: SYLLABUS

MULTI-CORE

Cache Hierarchy…
• As an example, on the Intel Westmere EP processors of 2010 we see:
  – a 64KB L1 D-cache per core
  – a 256KB L2 D-cache per core
  – a single 12MB L3 D-cache per socket (some products went as high as 30MB L3)

30

Page 30: SYLLABUS

MULTI-CORE PROCESSOR
Cache Hierarchy…

• Cache Hits
  – When data is successfully found in the cache it is called a cache hit.
  – Data found in the L1 cache takes a few cycles to access.
  – The L2 cache may take 10 cycles.
  – The L3 cache takes 50+ cycles.
  – Main memory can take hundreds of cycles.

31
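Those latencies can be folded into a single figure using the standard average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty, applied level by level. The formula is textbook material rather than something on the slide, and the hit rates below are assumed purely for illustration; only the cycle counts echo the numbers above.

```python
# Rough sketch (hit rates are assumed, not from the slide): estimating the
# average memory access time (AMAT) of the cache hierarchy described above.
levels = [
    # (name, access latency in cycles, assumed hit rate)
    ("L1", 4, 0.90),
    ("L2", 10, 0.80),
    ("L3", 50, 0.70),
]
MEMORY_LATENCY = 300   # "hundreds of cycles"

def amat(levels, memory_latency):
    """AMAT = hit time + miss rate * miss penalty, applied from the LLC back to L1."""
    penalty = memory_latency
    for _name, latency, hit_rate in reversed(levels):
        penalty = latency + (1.0 - hit_rate) * penalty
    return penalty

print(f"estimated AMAT: {amat(levels, MEMORY_LATENCY):.1f} cycles")  # ~7.8
```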

Page 31: SYLLABUS

MULTI-CORE PROCESSOR

Cache Hierarchy…

• Cache Lines
  1. The CPU manages the allocation of space in the cache.
  2. When an address is read that is not already in the cache, it loads a larger chunk of memory than was requested.
  3. The expectation is that nearby addresses will soon be used.
  4. These chunks of memory are called cache lines.
  5. Cache lines are commonly 32, 64 or 128 bytes in size.
  6. A cache can only hold a limited number of lines, determined by the cache size.
  7. A 64KB cache with 64 byte lines has 1024 cache lines (worked through below).

32
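The arithmetic in point 7 can be checked directly; this trivial sketch just divides cache capacity by line size (both figures are the ones quoted above).

```python
# Number of cache lines = cache capacity / line size.
def cache_line_count(cache_size_bytes: int, line_size_bytes: int) -> int:
    return cache_size_bytes // line_size_bytes

print(cache_line_count(64 * 1024, 64))   # -> 1024 lines, as stated above
print(cache_line_count(64 * 1024, 128))  # -> 512 lines with 128-byte lines
```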

Page 32: SYLLABUS

MULTI-CORE PROCESSOR

Cache Hierarchy…

Replacement Policy
• When all the cache lines are being used, a line must be evicted to make room for new data.
• The process used to select a cache line is called the replacement policy.
• The most common replacement policy is least recently used (LRU).
• This policy assumes that the more recently used a cache line is, the more likely it is to be needed again soon.
• Another replacement policy is random replacement: a random cache line is evicted.

33
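A toy model of the LRU policy just described, assuming a Python OrderedDict stands in for the hardware's line bookkeeping (the addresses, the loader callback and the cache size are invented for illustration): a hit moves the line to the most-recently-used end, and a miss evicts the line at the least-recently-used end when the cache is full.

```python
# Toy model of LRU replacement for a small, fixed number of cache lines.
from collections import OrderedDict

class LRUCache:
    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.lines = OrderedDict()          # line address -> line data

    def access(self, line_addr, load_line):
        if line_addr in self.lines:         # cache hit: mark as most recently used
            self.lines.move_to_end(line_addr)
            return self.lines[line_addr]
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        data = load_line(line_addr)         # cache miss: fetch the whole line
        self.lines[line_addr] = data
        return data

# Usage: a 4-line cache backed by a fake "memory" loader.
cache = LRUCache(num_lines=4)
for addr in [0, 1, 2, 3, 0, 4, 1]:
    cache.access(addr, load_line=lambda a: f"line@{a}")
print(list(cache.lines))                    # -> [3, 0, 4, 1]
```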

Page 33: SYLLABUS

MULTI-CORE PROCESSOR

Cache Hierarchy…

Cache Misses
• When a program accesses an uncached memory address it is called a cache miss.
• Processing stalls while it attempts to fetch the data from the next level cache.
• In the worst case, the miss continues all the way to main memory.

34

Page 34: SYLLABUS

35

Page 35: SYLLABUS

MULTI-PROCESSOR SYSTEM

36

Page 36: SYLLABUS

MULTIPROCESSOR SYSTEMS:
INTRODUCTION, ARCHITECTURE AND ORGANISATION

(MMU: Memory Management Unit)

37

Page 37: SYLLABUS

MULTIPROCESSOR SYSTEMS: (SH & AAA: 1.11.1)
INTRODUCTION, ARCHITECTURE AND ORGANISATION

• A multiprocessor system is one that has more than one processor on-board in the computer.

• They execute independent streams of instructions simultaneously.
• They share
  – system buses,
  – the system clock,
  – and the main memory,
  – and may share peripheral devices too.

• Such systems are also referred to as tightly coupled multiprocessor systems, as opposed to a network of computers (called a distributed system).

• A uniprocessor system can execute only one process at any point of real time, though there might be many processes ready to be executed.

38

Page 38: SYLLABUS

MULTIPROCESSOR SYSTEMS:
INTRODUCTION, ARCHITECTURE AND ORGANISATION

• By contrast, a multiprocessor system can execute many different processes simultaneously at the same real time.

• However, the number of processors in the system restricts the degree of simultaneous process executions.

• There are two primary models of multiprocessor operating systems: symmetric and asymmetric.

• In a symmetric multiprocessor system, each processor executes the same copy of the resident operating system, takes its own decisions, and cooperates with other processors for smooth functioning of the entire system.

• In an asymmetric multiprocessor system, each processor is assigned a specific task, and there is a designated master processor that controls activities of the other subordinate processors. The master processor assigns work to the subordinate processors.

39

Page 39: SYLLABUS

MULTIPROCESSOR SYSTEMS: (SH & AAA: 1.11.1)
INTRODUCTION, ARCHITECTURE AND ORGANISATION

• In multiprocessor systems, many processors can execute operating system programs simultaneously.

• Consequently, kernel path synchronization is a major challenge in designing multiprocessor operating systems.

• We need a highly concurrent kernel to achieve real gains in system performance.

• Synchronization has a much stronger impact on performance in multiprocessor systems than on uniprocessor systems.

• Many known uniprocessor synchronization techniques are ineffective in multiprocessor systems.

• Multiprocessor systems need very sophisticated, specialized synchronization schemes.

• Another challenge in symmetric multiprocessor systems is to balance the workload among processors rationally.

40

Page 40: SYLLABUS

MULTIPROCESSOR SYSTEMS: (SH & AAA: 1.11.1)
INTRODUCTION, ARCHITECTURE AND ORGANISATION

• Multiprocessor operating systems are expected to be fault tolerant, that is, failures of a few processors should not halt the entire system, a concept called graceful degradation of the system.

• In multiprocessor systems, many processes may execute the kernel simultaneously.

• In uniprocessor systems, concurrency is only achieved in the form of execution interleavings; only one process can make progress in the kernel mode, while others are blocked in the kernel waiting for processor allocation or some events to occur.

41

Page 41: SYLLABUS

42

Page 42: SYLLABUS

MULTITHREAD SYSTEMS (SH & AAA: 1.11.6)
• A thread is an independent strand that executes a program concurrently with other threads within the context of the same process.

• A thread is a single sequential flow of control within a program execution.

• Each thread has a beginning, a sequence of instruction executions, and an end.

• At any given point of time, there is one single point of execution in each thread.

• A thread is not a process by itself.
• It cannot run on its own; it always runs within a process.

43

Page 43: SYLLABUS

MULTITHREAD SYSTEMS (SH & AAA: 1.11.6)…
• Thus, a multithreaded process may have multiple execution flows, different ones belonging to different threads.

• These threads share the same private address space of the process, and they share all the resources acquired by the process.
• They run in the same process execution context, and therefore, one thread may influence other threads in the process.
• Different systems implement the thread concept differently.
• Some systems have user-level library routines to manage threads in a process.
• An application process can be multithreaded, but the operating system sees only the process and not the contained threads*.
• *In some other systems, every thread has a kind of process entity called a lightweight process (LWP) in the operating system.

• *The LWPs in a process are truly independent strands.

44

Page 44: SYLLABUS

MULTITHREAD SYSTEMS (SH & AAA: 1.11.6)…
• When any thread makes a system call and is blocked, the entire process is blocked too, and no other threads in the process can make any progress until the former thread returns from the system call.

• No change in the operating system is required for thread handling.
• We often say the operating system is single threaded but applications are multithreaded.
• In some other systems, every thread has a kind of process entity called a lightweight process (LWP) in the operating system.
• The LWPs in a process are truly independent strands.
• If one LWP is blocked in the operating system, other sibling LWPs in the process can make progress in their executions.
• These systems are truly multithreaded as the threads are visible to the operating system. These systems need to provide support for LWP creation, maintenance, scheduling, and synchronization.

45

Page 45: SYLLABUS

46

Page 46: SYLLABUS

PROCESSES AND THREADS
Process Synchronisation
• Process synchronization is required when one process must wait for another to complete some operation before proceeding.
• For example,
  – one process (called a writer) may be writing data to a certain main memory area,
  – while another process (a reader) may be reading data from that area and sending it to the printer.
  – The reader and writer must be synchronized so that the writer does not overwrite existing data with new data until the reader has processed it.
  – Similarly, the reader should not start to read until data has actually been written to the area.

47
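A minimal sketch of that writer/reader example, assuming two Python threads stand in for the two processes (the buffer, the page strings and the "printer" print call are all illustrative): a condition variable blocks the writer until the reader has processed the previous item, and blocks the reader until something has actually been written.

```python
# Sketch of the writer/reader synchronization described above: the writer
# must not overwrite data until the reader has processed it, and the reader
# must not read until data has actually been written.
import threading

condition = threading.Condition()
shared_buffer = None          # one "memory area" shared by both sides
buffer_full = False

def writer():
    global shared_buffer, buffer_full
    for chunk in ["page 1", "page 2", "page 3"]:
        with condition:
            while buffer_full:              # wait until the reader is done
                condition.wait()
            shared_buffer = chunk
            buffer_full = True
            condition.notify()              # wake the reader

def reader():
    global shared_buffer, buffer_full
    for _ in range(3):
        with condition:
            while not buffer_full:          # wait until data is written
                condition.wait()
            print("sending to printer:", shared_buffer)
            buffer_full = False
            condition.notify()              # let the writer continue

t1, t2 = threading.Thread(target=writer), threading.Thread(target=reader)
t1.start(); t2.start(); t1.join(); t2.join()
```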

Page 47: SYLLABUS

PROCESSES AND THREADS
Process Synchronisation…
• Various synchronization techniques have been developed.
• In one method, the operating system provides special commands that allow one process to signal to the second when it begins and completes its operations, so that the second knows when it may start.
• In another approach, shared data, along with the code to read or write them, are encapsulated in a protected program module. The operating system then enforces rules of mutual exclusion, which allow only one reader or writer at a time to access the module (a sketch of this idea follows below).
• Process synchronization may also be supported by an interprocess communication facility, a feature of the operating system that allows processes to send messages to one another.

48
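The "protected module" approach can be sketched like this, assuming Python threads rather than separate processes and a counter standing in for the shared data (the class and its methods are invented for illustration): the data and the code that reads or writes it live inside one class, and a lock enforces mutual exclusion so only one caller is inside at a time.

```python
# Sketch of mutual exclusion via an encapsulated, lock-protected module:
# only one thread at a time may execute the read/write methods.
import threading

class ProtectedCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def write(self, delta: int) -> None:
        with self._lock:            # mutual exclusion: one writer at a time
            self._value += delta

    def read(self) -> int:
        with self._lock:            # readers are serialized too
            return self._value

counter = ProtectedCounter()
threads = [threading.Thread(target=lambda: [counter.write(1) for _ in range(10_000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.read())               # always 40000, thanks to the lock
```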

Page 48: SYLLABUS

PROCESSES AND THREADS
Process Synchronisation…
• Designing software as a group of cooperating processes has been made simpler by the concept of "threads."
• A single process may contain several executable programs (threads) that work together as a coherent whole.
• Example:
  – One thread might, for example, handle error signals,
  – another might send a message about the error to the user,
  – while a third thread is executing the actual task of the process.

• Modern operating systems provide management services (e.g., scheduling, synchronization) for such multithreaded processes.

49

Page 49: SYLLABUS

50

Page 50: SYLLABUS

PROCESSES AND THREADS
Threads
• The majority of processes seen on operating systems today are single threaded, meaning there is a single path of execution within the process.
• Should a process have to perform many sub-tasks during its operation, then a single threaded process would sequence these tasks in a serial manner, with each sub-task being required to wait for the completion of the previous sub-task before commencement.
• Such an arrangement can lead to great inefficiency in the use of the processor and in the apparent responsiveness of the computer.
• An example can illustrate the advantages of having multiple threads of execution, as shown in the figure.

51

Page 51: SYLLABUS

PROCESSES AND THREADS
• An example can illustrate the advantages of having multiple threads of execution, as shown in the figure.

• Suppose a user wants to print a document.

• A user process can be initiated to accept input from the operator to select the print action and start the printing action.

• Should the user process be required to check for further user commands subsequent to initiating the print, there are two options:
  1. the process can stop the printing periodically, poll for user input, then continue printing, or
  2. wait until printing has completed before accepting user input.

52

Page 52: SYLLABUS

PROCESSES AND THREADS
• An example can illustrate the advantages of having multiple threads of execution, as shown in the figure.

• Suppose a user wants to print a document.

• A user process can be initiated to accept input from the operator to select the print action and start the printing action.

• Should the user process be required to check for further user commands subsequent to initiating the print, there are two options:
  1. the process can stop the printing periodically, poll for user input, then continue printing, or
  2. wait until printing has completed before accepting user input.

• Either of these alternatives slows down printing and/or decreases responsiveness.

• By contrast, a multi-threaded process can have many paths of execution.

• A multi-threaded application can delegate the print operation to a different thread of execution.

• The input thread and print thread then run in parallel until printing is completed (sketched below).

53
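A minimal sketch of that delegation, assuming Python threads and a sleep standing in for the real print job (the function names and delays are invented): the print work runs on its own thread while the input thread keeps handling commands in parallel.

```python
# Illustrative sketch of delegating a print job to its own thread so the
# input thread stays responsive, as described in the example above.
import threading
import time

def print_document(pages: int) -> None:
    for page in range(1, pages + 1):
        time.sleep(0.5)                       # simulate slow printing
        print(f"[printer] finished page {page}")

# Delegate printing to a separate thread of execution.
print_thread = threading.Thread(target=print_document, args=(3,))
print_thread.start()

# Meanwhile the input thread keeps accepting "commands" in parallel.
for command in ["open file", "spell check", "save"]:
    print(f"[input] handling command: {command}")
    time.sleep(0.3)

print_thread.join()                           # wait for printing to finish
```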

Page 53: SYLLABUS

54

Page 54: SYLLABUS

MULTI-THREADING
Multithreading Computer Architecture
• Applications designed for use in multiprocessing are said to be threaded, which means that they are broken into smaller routines that can be run independently.

• Multithreading computer central processing units have hardware support to efficiently execute multiple threads.

• These are distinguished from multiprocessing systems (such as multi-core systems) in that the threads have to share the resources of a single core: the computing units, the CPU caches and the translation lookaside buffer (TLB).

• The TLB is a cache that memory management hardware uses to improve virtual address translation speed (a toy model follows below).

• All current desktop, laptop, and server processors include one or more TLBs in the memory management hardware.

55
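A toy model of the TLB idea, assuming a plain dictionary stands in for both the page table and the TLB (the page size, mappings and addresses are invented): a translation is looked up in the small TLB first, and only a miss falls back to the full page table.

```python
# Toy model of a TLB: a small cache of recent virtual-page -> physical-frame
# translations, consulted before the (slow) page-table walk.
PAGE_SIZE = 4096
page_table = {0: 7, 1: 3, 2: 9, 3: 1}      # virtual page -> physical frame (illustrative)
tlb = {}                                   # the "TLB": a small translation cache

def translate(virtual_addr: int) -> int:
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    if page in tlb:                        # TLB hit: fast path
        frame = tlb[page]
    else:                                  # TLB miss: walk the page table, then cache it
        frame = page_table[page]
        tlb[page] = frame
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))              # miss, then cached -> 0x3234
print(hex(translate(0x1238)))              # hit on the same page
```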

Page 55: SYLLABUS

Multithreading Computer Architecture…
• Where multiprocessing systems include multiple complete processing units, multithreading aims to increase utilization of a single core by using thread-level as well as instruction-level parallelism.
• As the two techniques are complementary, they are sometimes combined in systems with multiple multithreading CPUs and in CPUs with multiple multithreading cores.

56

Page 56: SYLLABUS

MULTI-THREADING
Thread
• A thread of execution is the smallest sequence of programmed instructions that can be managed independently by an operating system scheduler.
• The scheduler itself is a light-weight process.
• A scheduler is a program that coordinates the use of shared resources, such as a CPU, RAM, hard disk, printer etc.
• The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process.

[Figure: a process with two threads of execution on a single processor]

57

Page 57: SYLLABUS

MULTI-THREADING
Thread…
• Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources.
• In particular, the threads of a process share the latter's instructions (its code) and its context (the values that its variables reference at any given moment).
• On a single processor, multithreading is generally implemented by time-division multiplexing (as in multitasking): the processor switches between different threads.
• This context switching generally happens frequently enough that the user perceives the threads or tasks as running at the same time.
• On a multiprocessor or multi-core system, threads can be truly concurrent, with every processor or core executing a separate thread simultaneously.

58

Page 58: SYLLABUS

MULTI-THREADING
Thread…
• Many modern operating systems directly support both time-sliced and multiprocessor threading with a process scheduler.
• The kernel of an operating system allows programmers to manipulate threads via the system call interface.
• Some implementations are called a kernel thread, whereas a lightweight process (LWP) is a specific type of kernel thread that shares the same state and information.
• Programs can have user-space threads when threading with timers, signals, or other methods to interrupt their own execution, performing a sort of ad hoc time-slicing.

59

Page 59: SYLLABUS

MULTI-THREADING
Multithreading
• Multithreading is the ability of an operating system to execute different parts of a program, called threads, simultaneously.
• The programmer must carefully design the program in such a way that all the threads can run at the same time without interfering with each other.

60

Page 60: SYLLABUS

61

Page 61: SYLLABUS

MULTIPROCESSOR VS MULTITHREADED PROCESSOR

• A computer system may have more than one on-board processor (as shown in Fig. 2.1 on page 48).

• Such a computer system is called a multiprocessor or tightly coupled system.

62

[Figure: System (host) Bus, I/O Bus]

Page 62: SYLLABUS

MULTIPROCESSOR VS MULTITHREADED PROCESSOR

• A computer system may have more than one on-board processor (as shown in Fig. 2.1 on page 48).

• Such a computer system is called a multiprocessor or tightly coupled system.

• In a multiprocessor system all processors share the system bus, the system clock, and the main memory, and may share peripheral devices.

• The CPUs in the processors operate concurrently, and execute different instructions simultaneously.

• The instruction executions overlap in real time, and one instruction execution may affect the behaviour of another if both processors access the same memory locations in a conflicting manner.

• A multithreaded processor is one that can execute two or more threads of control in parallel within the processor itself.

63

[Figure: System (host) Bus, I/O Bus]

Page 63: SYLLABUS

MULTIPROCESSOR VS MULTITHREADED PROCESSOR
• A multithreaded processor is one that can execute two or more threads of control in parallel within the processor itself.
• A thread is viewed as a hardware-supported thread, which can be
  1. a full program (single-threaded process),
  2. an operating system thread (a lightweight process),
  3. a compiler generated thread (subordinate microthread),
  4. or even a hardware generated thread.

• Processor multithreading provides many "logical" CPUs (called hyperthreads) on a single physical processor.

• These hyperthreads share the execution units in the physical CPU.
• The physical processor has different register sets (containing the PC and other general-purpose registers), one for each hyper-thread.
• The execution contexts of currently executing threads are stored in separate register sets.
• Unless stated otherwise, we will deal with non-threaded, single processor systems, and hence we will mostly use the term "the processor" or "the CPU" in this context.

64

Page 64: SYLLABUS

MULTIPROCESSOR VS MULTITHREADED PROCESSOR
• Intel was the first to implement hyper-threading in its Xeon processor, and it later ported hyper-threads to the Pentium 4.
• Xeon forms two logical CPUs.

65

Page 65: SYLLABUS

MULTIPROCESSOR VS MULTITHREADED PROCESSOR
• A thread of execution is the smallest sequence of programmed instructions that can be managed independently by an operating system scheduler.

• The scheduler itself is a light-weight process.
• The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process.

• Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources. In particular, the threads of a process share the latter's instructions (its code) and its context (the values that its variables reference at any given moment).

• On a single processor, multithreading is generally implemented by time-division multiplexing (as in multitasking): the processor switches between different threads.

66

Page 66: SYLLABUS

MULTIPROCESSOR VS MULTITHREADED PROCESSOR
• A thread and a task are similar and are often confused.
• Most computers can only execute one program instruction at a time, but because they operate so fast, they appear to run many programs and serve many users simultaneously.

• The computer operating system gives each program a "turn" at running, then requires it to wait while another program gets a turn.

• Each of these programs is viewed by the operating system as a task for which certain resources are identified and kept track of.

• The operating system manages each application program in your PC system (spreadsheet, word processor, Web browser) as a separate task and lets you look at and control items on a task list.

67

Page 67: SYLLABUS

MULTIPROCESSOR VS MULTITHREADED PROCESSOR
• The operating system manages each application program in your PC system (spreadsheet, word processor, Web browser) as a separate task and lets you look at and control items on a task list.

• If the program initiates an I/O request, such as reading a file or writing to a printer, it creates a thread.

• The data kept as part of a thread allows a program to be re-entered at the right place when the I/O operation completes.

• Meanwhile, other concurrent uses of the program are maintained on other threads.

• Most of today's operating systems provide support for both multitasking and multithreading. They also allow multithreading within program processes so that the system is saved the overhead of creating a new process for each thread.

68

Page 68: SYLLABUS

PROCESSES AND THREADS
Application
• An application consists of one or more processes.
Process
• A process, in the simplest terms, is an executing program.
Thread
• One or more threads run in the context of the process.
• A thread is the basic unit to which the operating system allocates processor time.
• A thread can execute any part of the process code, including parts currently being executed by another thread.
Context
1. The part of a text or statement that surrounds a particular word or passage and determines its meaning.
2. The circumstances in which an event occurs; a setting.


PROCESSES AND THREADS

Job
•A job object allows groups of processes to be managed as a unit.
•Job objects are namable, securable, sharable objects that control attributes of the processes associated with them.
•Operations performed on the job object affect all processes associated with the job object.
Thread Pool
•A thread pool is a collection of worker threads that efficiently execute asynchronous callbacks on behalf of the application.
•The thread pool is primarily used to reduce the number of application threads and provide management of the worker threads.
Fiber
•A fiber is a unit of execution that must be manually scheduled by the application.
•Fibers run in the context of the threads that schedule them.
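
• A small Python sketch (illustrative only) of the thread-pool idea: a fixed set of worker threads executes queued callbacks so the application does not create one thread per task; the task function and its inputs are invented for the example.

    from concurrent.futures import ThreadPoolExecutor

    def fetch_length(name):
        # Stand-in for an asynchronous callback; here it just does trivial work.
        return name, len(name)

    # Four worker threads service all submitted tasks.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fetch_length, n) for n in ["alpha", "beta", "gamma"]]
        for f in futures:
            name, n = f.result()   # blocks until that task completes
            print(f"{name}: {n}")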


MULTI-PROCESSOR SYSTEM: INTRODUCTION, ARCHITECTURE AND ORGANISATION

• Thus, you could have a multiprocessor, multicore, multithreaded system. Something like two quad-core, hyperthreaded processors would give you 2x4x2 = 16 logical processors from the point of view of the operating system (see the sketch after this list).

• Different workloads benefit from different setups.
• A single-threaded workload running on a mostly single-purpose machine benefits from a very fast single-core/CPU system.
• Workloads that benefit from highly parallelized systems such as SMP/CMP/SMT setups include those with lots of small parts that can be worked on simultaneously, or systems used for many things at once, such as a desktop being used to surf the web, play a Flash game, and watch a video all at once.
• In general, hardware these days is trending more and more toward highly parallel architectures, as most single CPU/core raw speeds are "fast enough" for common workloads across most models.
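
• Illustrative arithmetic (not from the original slides; the socket, core, and thread counts are the example figures above):

    import os

    sockets = 2            # two physical processors
    cores_per_socket = 4   # quad-core
    threads_per_core = 2   # hyperthreading (SMT)

    logical_cpus = sockets * cores_per_socket * threads_per_core
    print(logical_cpus)    # 2 x 4 x 2 = 16 logical processors

    # On a running system, the OS-visible count can be queried directly:
    print(os.cpu_count())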


MULTI-PROCESSOR SYSTEM: INTRODUCTION, ARCHITECTURE AND ORGANISATION

COMPARISON OF SINGLE AND DUAL CORE CPU


MULTI-CORE PROCESSOR

• Processors were originally developed with only one core.
• A dual-core processor has two cores (e.g. AMD Phenom II X2, Intel Core 2 Duo), …
  – a quad-core processor contains four cores (e.g. AMD Phenom II X4, Intel's quad-core processors; see i5 and i7 at Intel Core),
  – a 6-core processor contains six cores (e.g. AMD Phenom II X6, Intel Core i7 Extreme Edition 980X),
  – an 8-core processor contains eight cores (e.g. Intel Xeon E7-2820, AMD FX-8350),
  – a 10-core processor contains ten cores (e.g. Intel Xeon E7-2850),
  – a 12-core processor contains twelve cores.


NETWORK TOPOLOGIES TO INTERCONNECT CORES

• A multi-core processor implements multiprocessing in a single physical package.

• Designers may couple cores in a multi-core device tightly or loosely.
• For example, cores may or may not share caches, and they may implement message passing or shared memory inter-core communication methods.

• Common network topologies to interconnect cores include
  – bus,
  – ring,
  – two-dimensional mesh, and
  – crossbar. (Crossbar: a given crossbar is a single-layer, non-blocking switch. "Non-blocking" means that other concurrent connections do not prevent connecting an arbitrary input to any arbitrary output.)


Full Bus Crossbar (or point-to-point bus)

Partial Bus Crossbar


MULTI-CORE PROCESSOR

• Homogeneous multi-core systems include only identical cores; heterogeneous multi-core systems have cores that are not identical.

• Just as with single-processor systems, cores in multi-core systems may implement architectures such as superscalar, VLIW (very long instruction word), vector processing, SIMD (single instruction, multiple data), or multithreading.

• Multi-core processors are widely used across many application domains, including
  1. general-purpose,
  2. embedded,
  3. network,
  4. digital signal processing (DSP), and
  5. graphics.


MULTI-CORE PROCESSOR

• The improvement in performance gained by the use of a multi-core processor depends very much on the software algorithms used and their implementation.

• In particular, possible gains are limited by the fraction of the software that can be run in parallel simultaneously on multiple cores; this effect is described by Amdahl's law (a worked example follows this list).

• In the best case, so-called embarrassingly parallel problems may realize speedup factors near the number of cores, or even more if the problem is split up enough to fit within each core's cache(s), avoiding use of the much slower main system memory.

• Most applications, however, are not accelerated so much unless programmers invest a prohibitive amount of effort in re-factoring the whole problem.

• The parallelization of software is a significant ongoing topic of research.
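
• To make Amdahl's law concrete, a short illustrative calculation (the 90% parallel fraction is an assumed figure, not from the slides): with parallel fraction p and n cores, speedup = 1 / ((1 - p) + p/n).

    def amdahl_speedup(p, n):
        """Speedup predicted by Amdahl's law for parallel fraction p on n cores."""
        return 1.0 / ((1.0 - p) + p / n)

    # Assumed example: 90% of the program parallelizes perfectly.
    for cores in (1, 2, 4, 8, 16):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
    # Even with 16 cores the speedup is only about 6.4x, because the 10%
    # serial fraction dominates as the core count grows.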


MULTI-PROCESSOR SYSTEM: MULTI-PROCESSOR SYSTEM: ARCHITECTURE AND ORGANISATIONARCHITECTURE AND ORGANISATION



SECTION A

• Multi-Processor and Distributed Operating System:
  – Introduction, Architecture, Organisation


DISTRIBUTED OPERATING SYSTEMS

Introduction
•In many systems such as banking, telephony, airline reservation, flight control, industrial process control, etc., data and functions are spatially distributed.
•For example,
  – a bank may have many computers installed at various branches that are geographically spread across distances.
  – Some branches may have sophisticated resources that other branches lack.
  – A user may also wish to withdraw money from any branch office even though she holds an account at a particular branch.
•There is therefore a great need to connect these computers so that all branches can share resources.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Introduction…
•Connecting these computers together enables users to access data stored at one branch from other branches, and thereby helps to eliminate or reduce redundancies of data and functions.
•Distributed systems are a necessity in modern-day life. We will become even more dependent on them in the coming days.
•A distributed system provides an environment in which users can conveniently use resources residing anywhere in the system.
•In this chapter we will study some basic issues (such as inter-process communication, deadlocks, fault tolerance, etc.) in designing and developing distributed systems.
•We will also study some distributed computation problems and their solutions.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Introduction…
•>> A distributed system is a powerful paradigm envisioned in the history of computing to make several computers work together to solve user problems. It is essentially a system of multiple computers, interconnected by computer networks, that are intended to work cooperatively.
•>> In real networks, messages from one site to another may hop through many intermediate sites before reaching their destinations.
•>> Traditionally, "distributed system" is a term used to refer to a large array of computer systems, ranging from tightly coupled (parallel) systems connected by switching networks to loosely coupled systems connected by computer networks such as local area networks, wide area networks, etc. In this book, distributed systems refer only to loosely coupled systems.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Introduction…
•In tightly coupled systems, processor intercommunication is bus based; the bus is time shared.
•Tight-coupling communication schemes provide a set of shared registers which may be accessed by the CPUs at rates commensurate with intra-CPU operation.
•The shared registers thus provide a fast inter-CPU communication path to minimize overhead for multi-tasking of small tasks with frequent data interchange.
•Examples of switching networks are shown on the following slides.


SWITCHING NETWORKS

• Multiprocessors. A computer system in which two or more CPUs share full access to a common RAM.
• Uniform memory access (UMA) is a shared memory architecture used in parallel computers.
• All the processors in the UMA model share the physical memory uniformly.
• In a UMA architecture, access time to a memory location is independent of which processor makes the request or which memory chip contains the transferred data.
• Uniform memory access architectures are often contrasted with non-uniform memory access (NUMA) architectures.
• In the UMA architecture, each processor may use a private cache. Peripherals are also shared in some fashion. The UMA model is suitable for general-purpose and time-sharing applications by multiple users. It can be used to speed up the execution of a single large program in time-critical applications.


SWITCHING NETWORKS

UMA multiprocessor using a crossbar switch

http://www.csd.uoc.gr/~hy345/notes/pdf/8.pdf

UMA: Uniform memory access


SWITCHING NETWORKS


UMA: Uniform memory access


SWITCHING NETWORKS


• An Omega network is a network configuration often used in parallel computing architectures. It is an indirect topology that relies on the perfect shuffle interconnection algorithm.


SWITCHING NETWORKS


• An 8x8 Omega network is a multistage interconnection network, meaning that processing elements (PEs) are connected using multiple stages of switches.
• Inputs and outputs are given addresses as shown in the figure.
• The outputs from each stage are connected to the inputs of the next stage using a perfect shuffle connection system.
• A perfect shuffle of a deck of cards with an even number of cards is accomplished by splitting the deck into an upper half and a lower half and then interlacing the cards alternately, one at a time from each half of the deck.
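
• A brief Python sketch (illustrative only, not part of the original notes) of the perfect shuffle permutation used between Omega-network stages; equivalently, for an 8-input stage, line i is routed according to a one-bit left rotation of its 3-bit address.

    def perfect_shuffle(items):
        """Interleave the upper and lower halves of an even-length list,
        like riffling a deck of cards one card at a time from each half."""
        half = len(items) // 2
        upper, lower = items[:half], items[half:]
        out = []
        for u, l in zip(upper, lower):
            out += [u, l]
        return out

    # Labels 0..7 after one perfect shuffle:
    print(perfect_shuffle(list(range(8))))      # [0, 4, 1, 5, 2, 6, 3, 7]

    # Equivalent wiring view for an 8-input stage: line i is routed to
    # position rotate_left(i) of the next stage (3-bit addresses).
    def rotate_left3(i):
        return ((i << 1) | (i >> 2)) & 0b111

    print([rotate_left3(i) for i in range(8)])  # [0, 2, 4, 6, 1, 3, 5, 7]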


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems
•A distributed system is a well-knit collection of independent, autonomous computers connected together via a communications network.
•These computers do not share memory, I/O devices, or system clocks.
•A computer network, in a restricted way, is often referred to as a distributed system.
•A computer in the network is variously called a site, host, node, etc.
•A site may be a uniprocessor or multi-processor computer; it is a full-fledged computer system that is typically managed by a local operating system.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems…
•The sites work concurrently and communicate among themselves by exchanging messages over the network.
•Each site is fitted with network interface cards and supporting software for the purpose of communicating with other sites on the network (see Section 10.6 on page 295).
•The messages are exchanged via communication lines.
•The message exchange is also asynchronous; that is, messages may be delivered after arbitrary delays.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems…
•Figure 16.1 presents a model of a typical computer network.

•There are four sites that are connected by five communication lines.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems…
•Figure 16.1 presents a model of a typical computer network.

•There are four sites that are connected by five communication lines.
•All sites are directly connected to one another, except sites A and C.
•Message exchange between sites A and C has to be routed via sites B and/or D.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems…
•Distributed systems are more often referred to as loosely coupled systems, in contrast to tightly coupled multiprocessor systems.
•One of the key differences between loosely coupled and tightly coupled systems is that in the latter there is only one operating system shared by all the processors, but in a distributed system there is usually one operating system for each site, and different sites can have different kinds of operating systems.
•A distributed system promotes resource sharing, expedites user computations, and enhances system availability and reliability.
•It allows incremental system growth by adding or replacing an individual computer.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems…
•Users at one site are able to access resources at other sites.
•They may be able to break up their computational tasks and utilize multiple processors available at different sites to speed up computations.
•Availability of resources is another feature a distributed system can provide: if a few sites fail, the remaining sites can continue their operations and provide services to users.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems…
•There were many attempts to develop operating systems to facilitate distributed processing in distributed systems.
•The earlier operating systems developed for distributed systems were called network operating systems, which later evolved to include various levels of transparency (such as location transparency, access transparency, control transparency, data transparency, execution transparency, name transparency, migration transparency, network transparency, performance transparency, etc.) in the system.
•Such systems are referred to as distributed operating systems.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems…
•The concept of a distributed operating system was very ambitious in the beginning.
•For the users it must look and feel like an ordinary centralized operating system, yet it runs on several independent computers connected by networks.
•That is, the network, the computers, and their management must be completely transparent to the users.
•The (distributed) operating system must create the illusion that everything is done locally in one system.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

Distributed Systems…
•Although it was apparent that building such a truly distributed operating system is almost impossible, there were numerous attempts to build one.
•In this book, we look at some basic issues related to distributed systems and study solutions proposed in those and related efforts.


SECTION A

• Multi-Processor and Distributed Operating System:
  – Introduction, Architecture, Organization,
  – Resource sharing,


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

16.3 Goals and Challenges
•Distributed systems are a necessity in modern life.
•We will become even more dependent on them in the coming days.
•They have been developed for many different reasons:
  1. resource sharing,
  2. load balancing,
  3. electronic communications,
  4. fault tolerance,
  5. high availability, etc.
•Users should be able to develop and run distributed applications.
•The following subsections briefly discuss these topics.


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

16.3.1 Resource Sharing
•Resource sharing is the prime goal in developing distributed systems.
•Users at one site can conveniently use resources available at other sites. For example,
  – Users at site S1 can draw pictures on a graphics plotter available at site S2.
  – Users at site S2 can make use of various storage devices available at site S1.
•Many organizations, such as banking systems, have data distributed geographically.
•Such data are vital resources for the survival of these organizations.
•A site may need to refer to data stored at other sites to make various decisions.
•Distributed databases have become very common in recent years.
•A distributed database is a database in which the storage devices are not all attached to a common processing unit. It may be stored on multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.


SECTION A

• Multi-Processor and Distributed Operating System:
  – Introduction,
  – Architecture,
  – Organization,
  – Resource sharing,
  – Load Balancing,


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM

16.3.2 Load Balancing
•For a given distributed system, workloads at different sites vary with time.
•If some sites are highly overloaded with work, parts of computations from these sites may be transferred to lightly loaded sites to normalize the workload among all sites.
•Load balancing enables busy sites to offload some work to lightly loaded sites, and thereby improves overall system performance (a minimal sketch of this idea follows).
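
• A toy Python sketch of this heuristic (purely illustrative; the site names, load figures, and threshold are invented): an overloaded site migrates work to the currently least-loaded site.

    # Hypothetical load figures: site name -> number of queued tasks.
    loads = {"S1": 12, "S2": 3, "S3": 7}
    THRESHOLD = 8  # a site above this is considered overloaded (assumed value)

    def least_loaded(loads):
        return min(loads, key=loads.get)

    for site, queued in sorted(loads.items()):
        if queued > THRESHOLD:
            target = least_loaded(loads)
            # Move one unit of work from the busy site to the lightest one.
            loads[site] -= 1
            loads[target] += 1
            print(f"migrated one task from {site} to {target}")

    print(loads)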




COMPUTATION SPEEDUP

• Computation Speedup.
  – Using several resources (such as processors, disks) from different sites in parallel can significantly improve the performance of some applications.
  – Users may partition their computation tasks and execute these partitions concurrently at different sites.
  – This enables them to reduce response times for their overall computation.
  – Note that the users, and not the system, partition their tasks.


ELECTRONIC COMMUNICATIONS

• Electronic Communications.
  – Users at various sites are able to interact in real time.
  – They can chat, browse web pages, exchange electronic mail, etc.
  – The communications are done without physically exchanging hardware devices such as CDs or sending parcels through postal mail.


IMPORTANT POINTS

• >> The term transparency in the context of distributed computing means that the system should hide its "distributed nature" from its applications and users by creating the illusion of a normal centralized system. For example, location transparency must ensure that users and applications do not have to be aware of the physical location of resources.
• >> Railway, airline, and hotel reservation systems are examples of distributed databases.
• >> There is an analogy in our day-to-day life: when someone has too much work to do, one seeks other people to help finish the work.
• >> Reliability is also a major concern in non-distributed systems, where devices holding information may be physically damaged.


SECTION A

• Multi-Processor and Distributed Operating System:
  – Introduction,
  – Architecture,
  – Organization,
  – Resource sharing,
  – Load Balancing,
  – Availability and Fault Tolerance,


AVAILABILITY AND FAULT TOLERANCE

• Two important goals of a distributed system are enhanced availability of information (data and functions) to users, with higher reliability.
• Availability ensures that information is accessible when it is needed; it is the percentage of time the system is up and accessible to users (a small calculation follows this list).
• Reliability ensures that the system does not corrupt or lose information. It is of utmost concern in a distributed system; but a highly reliable system that is poorly available is hardly of any use.
• A distributed system may suffer from many kinds of failures (e.g., site or link failures, message loss or corruption) at unpredictable times.
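
• For instance (figures assumed purely for illustration), availability is simply uptime divided by total time:

    # Assumed figures: total observation window and accumulated downtime.
    hours_in_month = 30 * 24          # 720 hours
    downtime_hours = 3.6              # e.g. repairs plus network outages

    availability = (hours_in_month - downtime_hours) / hours_in_month
    print(f"availability = {availability:.3%}")   # 99.500%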


AVAILABILITY AND FAULT TOLERANCE

• Development of distributed systems that are tolerant to certain kinds of failures is important.
• A fault-tolerant system is one that functions smoothly, with graceful degradation, when some failures occur in the system.
• Fault tolerance is a system's ability to behave in a well-defined manner when faults do occur in the system.
• Such systems are said to be reliable and available.
• Note that reliability and correctness are two different concepts.
• A system is correct as long as it is free of faults and its internal data structures do not contain any errors.
• A system is reliable if failures do not seriously impair its satisfactory operation. We quote Peter Denning here: "Reliability means not freedom from faults and errors, but tolerance against them."


SECTION A

• Multi-Processor and Distributed Operating System:
  – Introduction,
  – Architecture,
  – Organization,
  – Resource sharing,
  – Load Balancing,
  – Availability and Fault Tolerance,
  – Design and Development Challenges,


DESIGN AND DEVELOPMENT CHALLENGES

• Computers in a distributed system do not share memory or have common I/O devices.
• They communicate among themselves only by exchanging messages.
• Computations as well as message communications are asynchronous.
• >> An asynchronous exchange of messages means that messages may be delivered after arbitrary delays.
• Relative process execution speeds and message transmission delays are both unknown, and often, unbounded.
• Algorithms for asynchronous distributed systems should not rely on such bounds for their correctness.
• These factors pose a severe challenge in the design and development of distributed systems.


DESIGN AND DEVELOPMENT CHALLENGES

• A fundamental problem encountered in distributed systems is that no computer may have perfect knowledge of the global state of the entire system.
• Such lack of "up-to-date" information makes the implementation of many features much harder or impossible.
• For example, distributed deadlock detection, load balancing, and resource management are difficult problems.
• Managing global resources without accurate state information is very difficult.
• For example, there are no perfect online load balancing algorithms.
• Practical systems employ some kind of heuristic algorithm to rationalize the workload among computers.
• Heuristic: encouraging discovery of solutions.


DESIGN AND DEVELOPMENT CHALLENGES

• Fault tolerance is another formidable challenge.
• Constructing fault-tolerant communication primitives to exchange messages among different sites is of utmost importance.

• >> "You know you have a distributed system when the crash of a computer you've never heard of stops you from getting any work done." — Leslie Lamport

• Sometimes the crash of an unknown computer (somewhere in the distributed system) may be the cause of failure of the whole system.


DESIGN AND DEVELOPMENT CHALLENGES

Summary
1. Relative process execution speeds and message transmission delays are both unknown, and often, unbounded.
2. No computer may have perfect knowledge of the global state of the entire system. This causes problems for:
   a. distributed deadlock detection,
   b. load balancing, and
   c. resource management.
3. Managing global resources without accurate state information is very difficult.
4. There are no perfect online load balancing algorithms; practical systems employ some kind of heuristic algorithm.
5. Fault tolerance is another formidable (formidable: extremely difficult to defeat, overcome, manage) challenge.


SECTION A

• Multi-Processor and Distributed Operating System:
  – Introduction,
  – Architecture,
  – Organization,
  – Resource sharing,
  – Load Balancing,
  – Availability and Fault Tolerance,
  – Design and Development Challenges,
  – Inter-process Communication,


INTERPROCESS COMMUNICATION

Interprocess communication (IPC)
•IPC is a capability supported by some operating systems that allows one process to communicate with another process.
•The processes can be running on the same computer or on different computers connected through a network.
•IPC enables:
  – one application to control another application, and
  – several applications to share the same data without interfering with one another.
•IPC is required in all multiprocessing systems, but it is not generally supported by single-process operating systems such as DOS.
•OS/2 and MS-Windows support an IPC mechanism called DDE (Dynamic Data Exchange).


INTERPROCESS COMMUNICATION

Inter-Process Communication Methods
•There are several ways to support inter-process communication on an OS. These include (a short pipe example follows this list):
  1. Message queuing. One or more message queues carry messages between running processes, and the OS kernel manages them.
  2. Pipes. Information can only be sent in one direction and is buffered until received.
  3. Named pipes. A pipe has a certain name and can be used among processes that do not share a common origin.
  4. Shared memory. Permits information exchange through a predefined area of memory, which has to be allocated before data can be placed in the memory location.
  5. Semaphores. Solve problems when synchronization or race conditions arise between processes.
  6. Sockets. Processes use these to communicate over a network via a client/server relationship.
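
• As an illustration of method 2 above (not from the original notes), a minimal Python sketch using an anonymous pipe between a parent and a child process (POSIX-only, since it uses fork); data flows one way and is buffered until read.

    import os

    read_fd, write_fd = os.pipe()   # one-directional, kernel-buffered channel
    pid = os.fork()                 # POSIX only

    if pid == 0:
        # Child: writes into the pipe, then exits.
        os.close(read_fd)
        os.write(write_fd, b"hello from the child process")
        os.close(write_fd)
        os._exit(0)
    else:
        # Parent: reads from the pipe (blocks until data is available).
        os.close(write_fd)
        data = os.read(read_fd, 1024)
        os.close(read_fd)
        os.waitpid(pid, 0)
        print(data.decode())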


INTERPROCESS COMMUNICATION

16.5 Interprocess Communication
•In order to achieve some common tasks, processes need to communicate with one another.
•Shared memory-based communication is not possible between processes residing at different sites, as sites do not share main memory.
•In such situations, processes can communicate only by exchanging messages.
•Therefore, good message communication primitives are vital in distributed systems.
•The underlying communications network transports messages from one site to another.


INTERPROCESS COMMUNICATION

16.5 Interprocess Communication…
•Here we do not discuss how the network transports messages from one site to another.
•The operating system:
  – collects messages from local processes and puts the messages into the communications network,
  – and collects messages from the network and delivers them to local processes.


INTERPROCESS COMMUNICATION

16.5.1 Communication Primitives
•Primitive: a basic interface or segment of code that can be used to build more sophisticated program elements or interfaces.

•The message communication model that is widely used in distributed systems is the client-server model, in which a sender process (called the client) that needs some service sends a message to another process (called the server).
•Processes (clients or servers) do not need to know the details of the communications network to send and/or receive messages to and from one another.
•They only need to know the way to identify themselves to the system.
•In a single computer system, pids (process identification numbers, see Section 4.7 on page 93) are used to identify processes.
•In addition, processes in a distributed system must be able to locate the hosts of one another.


INTERPROCESS COMMUNICATION

• For inter-process communication, we must have a means to connect two remote processes.
• Broadly speaking, there are two basic ways to connect two processes: direct and indirect communication.
1. Direct.
  • Each process is identified by a pair (hname, pid).
  • This pair is called a process name, where the hname is a unique name (or network number) of a host in the network, and the pid is the process's local process identification number within the host.
  • The message communication is direct.
  • A sender process knows the identity of the receiver to whom it is sending a message.


INTERPROCESS COMMUNICATION

2. Indirect.
  • Alternatively, each host implements a set of communication points, or ports.
  • A communication point is represented by a pair (hname, id), where the hname is a unique name (or network number) of a host in the network, and the id is a unique port number within the host.
  • The message communication is indirect.
  • A sender process may not know the identity of the receiver process.
  • The sender sends messages to a specific port, and any process that is attached to the port receives the messages.


INTERPROCESS COMMUNICATION

• In either scheme, we assume that, given a process name or a port name, the communications network can locate the host where the process or the port resides.
• The operating system implements send and receive communication primitives (primitive: a basic interface or segment of code that can be used to build more sophisticated program elements or interfaces).
• The send primitive specifies a destination (a process name or a communication port) and provides a message.
• The receive primitive tells from whom (a process or a port) it expects a message and provides a buffer where the incoming message is stored.
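
• A minimal Python sketch (illustrative only) of port-based, indirect send and receive using UDP sockets; the host name "localhost" and port number 50007 are arbitrary example values. Run receiver() in one process and sender() in another; whichever process is bound to the port receives the message.

    import socket

    PORT = 50007  # example communication point: ("localhost", 50007)

    def receiver():
        # Binds to the port and provides a buffer for the incoming message.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.bind(("localhost", PORT))
            data, sender_addr = s.recvfrom(1024)   # blocking receive, 1 KB buffer
            print("received:", data.decode(), "from", sender_addr)

    def sender():
        # Names the destination port and provides a message.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(b"service request", ("localhost", PORT))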


INTERPROCESS COMMUNICATION

• >> In simple terms, clients ask questions and servers answer them.
• >>
  – Non-blocking primitives allow more concurrency, but programming with them becomes difficult.
  – Irreproducible, timing-dependent errors are often very difficult to diagnose and debug.
• The send and receive primitives, though their specifications are very simple, raised a lot of controversy in the earlier phase of implementation, mostly about what the acceptable semantics of these primitives should be.

134

Page 134: SYLLABUS

INTERPROCESS COMMUNICATION
• Two fundamental aspects that must be taken into account are
  – (1) unreliable versus reliable execution and
  – (2) non-blocking versus blocking execution.
Send
• An unreliable send operation puts a message in the network and returns to the caller (process). There is no guarantee that the message will be delivered to the destination; no automatic message retransmission is attempted if the original message is lost.
• A reliable send operation handles lost messages and retransmissions internally, so that when the send invocation returns to the caller the message has been delivered to the destination.

INTERPROCESS COMMUNICATION
• When a send operation is non-blocking, the send returns control to the sender as soon as the message data is queued for subsequent transmission. When the message is actually transmitted, the sender is interrupted to be informed about the transmission.
• A blocking send does not return control to the sender until the message has been sent (for unreliable systems) or until the message has been delivered to the destination (for reliable systems).
Receive
• In a reliable receive, the receiver sends an acknowledgment to the sender.
• In an unreliable receive, the receiver normally does not send an acknowledgment to the sender.
• A blocking receive does not return control to the receiver until a message has been copied into the receive buffer.
• A non-blocking receive returns immediately if there is no message for the receiver.
• >> In the Internet domain, for TCP-based sockets the communication is reliable; for UDP-based sockets the communication is unreliable. Sockets are used to implement a client-server architecture of inter-process communication. Servers listen to well-known ports to which clients connect (a UDP receive sketch follows).
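As an illustrative sketch (assuming a POSIX/Linux environment), a UDP socket gives the unreliable datagram service described above, and the MSG_DONTWAIT flag turns an individual recvfrom() into a non-blocking receive; the port number is arbitrary:

/* Sketch: non-blocking vs blocking receive on an (unreliable) UDP socket. */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);      /* UDP: unreliable datagrams */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);                 /* illustrative port number  */
    bind(s, (struct sockaddr *)&addr, sizeof addr);

    char buf[512];

    /* Non-blocking receive: returns immediately with EWOULDBLOCK/EAGAIN if
     * there is no message for the receiver.  Without MSG_DONTWAIT the same
     * call would block until a datagram is copied into buf. */
    ssize_t n = recvfrom(s, buf, sizeof buf, MSG_DONTWAIT, NULL, NULL);
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        printf("no message available right now\n");

    close(s);
    return 0;
}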


INTERPROCESS COMMUNICATION
16.5.2 Sockets.
• A socket is a mechanism for implementing both reliable and unreliable interprocess communication in distributed systems.
• It is somewhat similar to the pipe interprocess communication scheme (see Section 6.4.4 on page 145. A pipe is a "one-way" flow of data between two related processes. It is a fixed-size first-in first-out communication channel, and the size is system dependent. For each pipe, one process writes data into the pipe and another process reads the data out of the pipe. In UNIX systems, a pipe is implemented as a sequential file. A pipe has no name in the system.)
• A pipe is used for unidirectional byte streams between two related processes in the same computer system, while a socket can be used for bidirectional communication between two unrelated processes that can reside in two different computer systems; see Fig. 16.3. (A small pipe sketch is given below.)
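As a brief, hedged illustration of the pipe scheme described above (standard POSIX calls), a parent writes into a pipe and its related child process reads from it:

/* Sketch: one-way pipe between two related processes (parent -> child). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fd[2];
    if (pipe(fd) == -1) return 1;          /* fd[0]: read end, fd[1]: write end */

    if (fork() == 0) {                     /* child: reads from the pipe */
        char buf[64];
        close(fd[1]);
        ssize_t n = read(fd[0], buf, sizeof buf - 1);
        if (n > 0) { buf[n] = '\0'; printf("child read: %s\n", buf); }
        close(fd[0]);
        return 0;
    }

    close(fd[0]);                          /* parent: writes into the pipe */
    const char *msg = "hello through the pipe";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);
    wait(NULL);
    return 0;
}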


INTERPROCESS COMMUNICATION
[Figure 16.3: Socket connection between two processes P1 and P2 in two sites.]

INTERPROCESS COMMUNICATION
[Figure: the UNIX socket call sequence. The socket system call creates a new socket and returns a socket descriptor; bind attaches the socket to a (well-known) local address; the client executes the connect system call, through a socket descriptor, to initiate a connection with another (possibly remote) socket.]


INTERPROCESS COMMUNICATION
[Figure legend - TCB: Transmission Control Block; SYN: Synchronise flag; ACK: Acknowledgement.]


INTERPROCESS COMMUNICATION
• A socket is part of a logical communication channel and represents one end point of the channel.
• Thus, there is one socket at either end of the channel.
• It is an indirect communication scheme: any process that has access to the socket can exchange data over the channel.
• Each socket has a host address that is used to identify the host having the socket.
• The address format depends on the communication domain of the socket.
• In the Internet domain, a socket address is a pair consisting of a 32-bit host number and a 16-bit port number (see the structure sketch below).
• Other communication domains have different socket structures.
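On POSIX systems the Internet-domain socket address just described corresponds to struct sockaddr_in; the structure is standard, but the host and port values below are purely illustrative:

/* Internet-domain socket address: 32-bit host number + 16-bit port number. */
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

int main(void) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;                        /* Internet domain       */
    addr.sin_port   = htons(2049);                    /* 16-bit port (example) */
    inet_pton(AF_INET, "192.0.2.10", &addr.sin_addr); /* 32-bit host (example) */
    return 0;
}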


INTERPROCESS COMMUNICATION
• The operating system implements a set of system calls for sockets.
• Here we describe some system calls that are implemented in UNIX systems.
• The socket system call creates a new socket and returns a socket descriptor.
• The client executes the connect system call, through a socket descriptor, to initiate a connection with another (possibly remote) socket.
• A remote process (called a server or daemon) also creates a socket and binds the socket to a (well-known) local address.
• Then the server executes the listen system call on the socket to inform the operating system that it is ready to accept connections from other processes (clients).
• The server then executes the accept system call to accept individual connection requests from clients.

INTERPROCESS COMMUNICATION
• The accept system call returns a new socket descriptor for a newly accepted connection.
• The server, at this point, usually creates a new process or thread, and goes back to the original socket descriptor to accept further new connections.
• The newly created process/thread serves the client whose connection request has been accepted by the server.
• An application process executes ordinary file-system-based read and write system calls on a socket descriptor to receive and send, respectively, messages over the socket connection.
• Finally, a close system call on a socket descriptor terminates the connection (a minimal server sketch follows).
• >> In UNIX systems, for the Internet domain, port numbers less than 1024 are reserved for servers and are well-known. For example, HTTP servers listen to port 80 and telnet servers to port 23. The /etc/services file contains well-known services and their ports.
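The call sequence above can be put together as a minimal TCP echo-style server sketch (standard Berkeley sockets; the port number, single-request handling and missing error checks are illustrative simplifications):

/* Sketch: socket -> bind -> listen -> accept -> read/write -> close. */
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);        /* create a TCP socket */

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);                      /* illustrative port       */
    bind(srv, (struct sockaddr *)&addr, sizeof addr); /* well-known local address */
    listen(srv, 5);                                   /* ready to accept          */

    int conn = accept(srv, NULL, NULL);               /* new descriptor per client */

    char buf[256];
    ssize_t n = read(conn, buf, sizeof buf);          /* ordinary read/write */
    if (n > 0) write(conn, buf, n);                   /* echo the message back */

    close(conn);                                      /* terminate the connection */
    close(srv);
    return 0;
}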


INTERPROCESS COMMUNICATION
16.5.3 Remote Procedure Call.
• It is convenient to have interprocess communication schemes such as socket primitives available to application programs.
• The socket primitives are used to set up communication channels between two processes residing on the same computer or on two different computers.
• However, most application developers wish to execute functions/procedures that reside outside the process address space, in another process, possibly residing at a remote site.
• The remote procedure abstraction is useful for providing communication across a network.
• It helps application developers execute remote functions as if the functions were in the same address space. Here remote means outside the process address space; the other process need not be on a different computer system.

INTERPROCESS COMMUNICATION
• The abstraction makes the semantics of interprocess communication as close as possible to local procedure/function calls - the way traditional function calls transfer control and data within a program running in a single address space.
• The application may not know the exact location of the (remote) function.
• The system takes care of all the communication required for the remote procedure execution on behalf of applications, without their knowledge or involvement.
• Application developers need not worry about the complexities of interprocess communication and, unlike socket programming, they need not include code in the applications to handle interprocess communication.
• As the semantics of procedure calls are well understood, it becomes a little easier to develop distributed applications.

INTERPROCESS COMMUNICATION
• >> Underneath the socket abstraction, the operating system implements a set of communication protocols that handle data transport requests through sockets.
• The protocols help processes exchange data without concern about the underlying communications network.
• Ultimately, the operating system executes network interface driver routines to transmit and receive data to and from remote computers (see Section 10.6).

INTERPROCESS COMMUNICATION
• >> RPC is a message-passing IPC scheme.
• But the messages are transparent to applications.
• RPC has been successfully used in many distributed applications.

INTERPROCESS COMMUNICATION
• A procedure call mechanism is a way for one procedure/function to call another, where both procedures are part of one address space and process. Remote procedure call (RPC) is a mechanism that helps a process call a remote routine; it is a way for one process to execute a program (procedure or function) in another process (possibly on a remote computer) as if the program were local to the calling process's address space.
• In the following discussion, we assume that the two processes are on two computers.
• RPC is implemented by running an RPC daemon on every computer.
• Each daemon listens to a well-known local RPC port.

INTERPROCESS COMMUNICATION
• A daemon contains the required routine, which it executes, and it sends back the output of the execution to the requesting process.
• When an application process makes an RPC, it is blocked until it gets back a reply.
• The procedure parameters are passed across the network to the site where the procedure is executed.
• When the execution of the procedure is complete, the results are passed back to the calling process and the process resumes its local execution.
• RPC is normally implemented as follows: the calling program (called a client) makes a normal procedure call p(x, y, z, ...) as if p were a procedure in the client address space.

INTERPROCESS COMMUNICATION
• A dummy or stub procedure p' is included in the client address space or dynamically linked to it upon the call (see Fig. 16.4); the stub is generated by a compiler or pre-processor and not supplied by applications.
• The local stub collects the parameters of the procedure, forms them into one or more messages in a standard format, and sends the messages to the remote computer having the actual procedure (a simplified stub sketch follows).
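A highly simplified sketch of what such a client stub might do, assuming a hypothetical remote procedure int add(int x, int y). The message layout, procedure number, and the rpc_send/rpc_recv transport helpers are all illustrative assumptions (taken to be supplied by some RPC runtime), not an actual framework:

/* Hypothetical client stub for: int add(int x, int y);  (sketch only). */
#include <stdint.h>
#include <arpa/inet.h>

/* Assumed transport helpers, provided elsewhere by the RPC runtime. */
int rpc_send(const void *msg, size_t len);
int rpc_recv(void *reply, size_t len);

int add(int x, int y) {                 /* looks like a local call to the client */
    /* Marshal: pack a procedure id and the parameters in a standard format. */
    uint32_t msg[3];
    msg[0] = htonl(1);                  /* assumed procedure number for "add" */
    msg[1] = htonl((uint32_t)x);
    msg[2] = htonl((uint32_t)y);
    rpc_send(msg, sizeof msg);          /* ship the request to the RPC daemon */

    /* Block until the reply arrives, then unmarshal the result. */
    uint32_t reply;
    rpc_recv(&reply, sizeof reply);
    return (int)ntohl(reply);
}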


INTERPROCESS COMMUNICATION
• >> Calling a function/procedure/method by passing parameters is a basic step in programming. Using such a widely known programming technique for remote communication alleviates the complexity of distributed communication. One of the main objectives of remote procedure calls is that users can make remote communication look like local procedure calls and need not deal with the details of message communication at the network level.

INTERPROCESS COMMUNICATION
• P-447
• The stub finds out where the called function is located. It then blocks the client and waits for an answer from the remote computer. At the remote computer, there is another stub procedure in which the RPC server process waits for messages. Upon receiving messages, the receiver stub unpacks the parameters from the message and makes a call to the local procedure p with those parameters. The result of the procedure invocation follows an analogous path in the reverse direction. There is one restriction in RPCs, and that is in parameter passing. Most programming languages support parameter passing by value and by reference.

INTERPROCESS COMMUNICATION
• Passing parameters by value is easier; the stub copies the values into messages. Passing reference (pointer) parameters is not so easy. We need, for each object, a unique system-wide pointer so that the object can be remotely accessed. Such accesses add steep overhead. Value parameters also sometimes create problems in their representation across different types of processor architectures, for example little-endian versus big-endian integer values, different bit-sizes for the same data type, etc. Developing a fault-tolerant RPC mechanism is a demanding task. Error handling in distributed systems is far more complex than in centralized systems. There are several things that could go wrong during an RPC call.

INTERPROCESS COMMUNICATION
• If a client makes an RPC when the server is dead, the client may be left blocked forever unless a timeout scheme is built into the client or the local operating system (a timeout sketch follows). Timeouts introduce a new problem where the client times out too quickly, assuming that the server is down, when actually the server is congested and responding to requests very slowly. If the network is congested or has disconnected the client from the server, the client cannot tell whether or not the server is down. Crashing clients may also cause trouble for the server if it expects input data from clients.
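As an illustrative sketch of such a client-side timeout (POSIX sockets; the 2-second value is arbitrary, and the connect() to the RPC server is assumed to have been done already), a receive timeout can be set on the socket that carries the RPC request so the client is not blocked forever:

/* Sketch: bound the time a client waits for an RPC reply on a socket. */
#include <stdio.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    /* ... connect() to the RPC server and send the request here ... */

    /* If no reply arrives within 2 seconds, recv() fails with EWOULDBLOCK
     * instead of blocking the caller forever. */
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    char reply[128];
    if (recv(s, reply, sizeof reply, 0) < 0 &&
        (errno == EWOULDBLOCK || errno == EAGAIN))
        printf("RPC timed out: server may be down or just slow\n");
    return 0;
}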


INTERPROCESS COMMUNICATION
• The crucial question is what the accepted semantics of RPC during failures should be. Ideally we would expect exactly one execution of the remote procedure, but that is probably impossible to achieve. Some systems offer no guarantee at all (zero or more executions), some guarantee at most one execution (zero or one), and some guarantee at least one execution (one or more). We will not discuss these issues further here.

INTERPROCESS COMMUNICATION
16.6 System Environment
• A single computer system creates an environment (i.e., a software platform) in which users do their work with relative ease. Creating such an environment is one of the major goals of operating systems. Likewise, a distributed system creates an environment in which users can use with relative ease all the resources (local or remote) available throughout the system. Users may or may not be aware of the existence of multiple computers in the system, and accordingly the system can be a network operating system or a true distributed operating system. We discuss these two types of systems in the next two subsections. The key issue in these two kinds of systems is how aware users are of the multiplicity of computers. This visibility occurs in three important spheres, namely program execution, file system, and protection.
• P-448


SECTION A
• Multi-Processor and Distributed Operating System:
  – Introduction,
  – Architecture,
  – Organization,
  – Resource sharing,
  – Load Balancing,
  – Availability and Fault Tolerance,
  – Design and Development Challenges,
  – Inter-process Communication,
• Distributed Applications:
  – Logical Clock,
  – Mutual Exclusion,
  – Distributed File System.

MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
• For the most part, multiprocessor operating systems are just regular operating systems.
• They
  1. handle system calls,
  2. do memory management,
  3. provide a file system, and
  4. manage I/O devices.
• Nevertheless, there are some areas in which they have unique features. These include
  1. process synchronization,
  2. resource management, and
  3. scheduling.

MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
• A multiprocessor operating system refers to the use of two or more central processing units (CPUs) within a single computer system. These multiple CPUs are in close communication, sharing the computer bus, memory and other peripheral devices. Such systems are referred to as tightly coupled systems.
• These types of systems are used when very high speed is required to process a large volume of data. They are generally used in environments like
  – satellite control,
  – weather forecasting, etc.
• The basic organization of a multiprocessing system is shown in the figure.
[Figure: basic organization of a multiprocessing system.]

MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
• A multiprocessing system may be based on the symmetric multiprocessing model, in which each processor runs an identical copy of the operating system and these copies communicate with each other.
• In another scheme, each processor is assigned a specific task.
• A master processor controls the system.
• This scheme defines a master-slave relationship.
• These systems can save money in comparison to single-processor systems because the processors can share peripherals, power supplies and other devices.
• The main advantage of a multiprocessor system is to get more work done in a shorter period of time.

MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM
• Moreover, multiprocessor systems prove more reliable in the event of failure of one processor.
• In this situation, a multiprocessor system will not halt; it will only slow down.
• In order to employ a multiprocessing operating system effectively, the computer system must have the following:
  1. Motherboard support: a motherboard capable of handling multiple processors. This means additional sockets or slots for the extra chips and a chipset capable of handling the multiprocessing arrangement.
  2. Processor support: processors that are capable of being used in a multiprocessing system.

MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
• The whole task of multiprocessing is managed by the operating system, which allocates different tasks to be performed by the various processors in the system.
• Applications designed for use in multiprocessing are said to be threaded, which means that they are broken into smaller routines that can be run independently (see the sketch below).
• This allows the operating system to let these threads run on more than one processor simultaneously, which is multiprocessing and results in improved performance.
• A multiprocessor system allows processes to run in parallel.
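As a small, hedged illustration of a threaded application (POSIX threads), two routines run independently and may be scheduled on different processors by the operating system:

/* Sketch: a "threaded" application whose routines can run on separate CPUs. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    long id = (long)arg;
    printf("worker %ld running (possibly on its own processor)\n", id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);   /* two independent routines */
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);                          /* wait for both to finish */
    pthread_join(t2, NULL);
    return 0;
}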


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
• Parallel processing is the ability of the CPUs to process incoming jobs simultaneously.
• This is most important in a computer system where the CPUs divide and conquer the jobs.
• Generally, parallel processing is used in fields like
  1. artificial intelligence and expert systems,
  2. image processing,
  3. weather forecasting, etc.

MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
• In a multiprocessor system, the dynamic sharing of resources among the various processors may therefore cause a potential bottleneck.
• There are three main sources of contention that can be found in a multiprocessor operating system:
  – Locking system:
    • In order to provide safe access to the resources shared among multiple processors, they need to be protected by a locking scheme.
    • The purpose of locking is to serialize accesses to the protected resource by multiple processors.
    • Undisciplined use of locking can severely degrade the performance of the system.
    • This form of contention can be reduced by using a careful locking scheme, avoiding long critical sections, replacing locks with lock-free algorithms, or, whenever possible, avoiding sharing altogether.

MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
  – Locking system (continued):
    • Critical section.
    • In concurrent programming, a critical section is a piece of code that accesses a shared resource (data structure or device) that must not be concurrently accessed by more than one thread of execution.
    • A critical section will usually terminate in fixed time, and a thread, task, or process will have to wait only a bounded time to enter it (bounded waiting). Some synchronization mechanism is required at the entry and exit of the critical section to ensure exclusive use, for example a semaphore (a mutex-based sketch follows).
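A brief sketch of protecting a critical section with a lock (POSIX mutex); the shared counter and thread count are illustrative:

/* Sketch: serializing access to shared data with a mutex-protected
 * critical section. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                               /* shared resource */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);                     /* enter critical section  */
        counter++;                                     /* keep this section short */
        pthread_mutex_unlock(&lock);                   /* exit critical section   */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);                /* always 200000 here */
    return 0;
}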


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM
• There are three main sources of contention that can be found in a multiprocessor operating system:
  – Locking system: …
  – Shared data:
    • Continuous accesses to shared data items by multiple processors (with one or more of them writing the data) are serialized by the cache coherence protocol.
    • Even in a moderate-scale system, serialization delays can have a significant impact on system performance.
    • In addition, bursts of cache coherence traffic saturate the memory bus or the interconnection network, which also slows down the entire system.
    • This form of contention can be eliminated either by avoiding sharing or, when this is not possible, by using replication techniques to reduce the rate of write accesses to the shared data.

MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
• There are three main sources of contention that can be found in a multiprocessor operating system:
  – Locking system: …
  – Shared data: …
  – False sharing:
    • This form of contention arises when unrelated data items used by different processors are located next to each other in memory and, therefore, share a single cache line.
    • The effect of false sharing is the same as that of regular sharing, i.e., bouncing of the cache line among several processors.
    • Fortunately, once it is identified, false sharing can be easily eliminated by adjusting the memory layout of the non-shared data (a sketch of such a layout fix follows).
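As an illustrative sketch (assuming a 64-byte cache line, which is common but architecture-dependent, and a GCC/Clang alignment attribute), per-processor counters can be padded and aligned so that each occupies its own cache line:

/* Sketch: avoiding false sharing by giving each per-CPU counter its own
 * cache line (64-byte line size assumed). */
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: typical cache line size */

/* Without padding, counters for different CPUs would sit next to each
 * other in memory and share a cache line (false sharing). */
struct padded_counter {
    volatile uint64_t count;
    char pad[CACHE_LINE - sizeof(uint64_t)];   /* fill the rest of the line */
} __attribute__((aligned(CACHE_LINE)));        /* GCC/Clang alignment attribute */

static struct padded_counter per_cpu_counter[8];   /* one counter per processor */

void bump(int cpu) {
    per_cpu_counter[cpu].count++;   /* updates stay in that CPU's own cache line */
}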


MULTI-PROCESSOR AND DISTRIBUTED OPERATING SYSTEM*
• Apart from eliminating bottlenecks in the system, a multiprocessor operating system developer should provide support for efficiently running user applications on the multiprocessor.
• Some aspects of such support include
  1. mechanisms for task placement and migration across processors (see the sketch after this list),
  2. physical memory placement, ensuring that most of the memory pages used by an application are located in local memory, and
  3. scalable multiprocessor synchronization primitives.
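As one concrete, hedged illustration of task placement (a Linux-specific API; other systems expose different calls), a process can be pinned to a given processor with sched_setaffinity:

/* Sketch: placing the calling task on CPU 2 (Linux-specific). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);                              /* illustrative CPU number */

    /* pid 0 means "the calling process". */
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        perror("sched_setaffinity");
    else
        printf("task placed on CPU 2\n");
    return 0;
}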


DISTRIBUTED OPERATING SYSTEM*

DISTRIBUTED OPERATING SYSTEM*
• A distributed operating system is software over a collection of
  – independent,
  – networked,
  – communicating, and
  – physically separate
  computational nodes.
• Each individual node holds a specific software subset of the global aggregate operating system.
• Each subset is a composite of two distinct service provisioners.
  – The first is a ubiquitous minimal kernel, or microkernel, that directly controls that node's hardware.
  – The second is a higher-level collection of system management components that coordinate the node's individual and collaborative activities.

DISTRIBUTED OPERATING SYSTEM
• These components abstract microkernel functions and support user applications.
• The microkernel and the collection of management components work together.
• They support the system's goal of integrating multiple resources and processing functionality into an efficient and stable system.
• This seamless integration of individual nodes into a global system is referred to as transparency, or single system image, describing the illusion provided to users that the global system appears as a single computational entity.


DISTRIBUTED SYSTEMS
• A distributed system consists of
  – a collection of autonomous computers,
  – connected through a network and distribution middleware,
  which enables the computers to coordinate their activities and to share the resources of the system, so that users perceive the system as a single, integrated computing facility.
Middleware
Middleware is software that acts as a bridge between an operating system or database and applications, especially on a network.

DISTRIBUTED SYSTEMS
• A distributed system is a software system in which
  – components located on networked computers communicate and coordinate their actions
  – by passing messages.
• The components interact with each other in order to achieve a common goal.
• There are many alternatives for the message passing mechanism, including RPC-like connectors and message queues.
• Three significant characteristics of distributed systems are:
  1. concurrency of components,
  2. lack of a global clock, and
  3. independent failure of components.
• An important goal and challenge of distributed systems is location transparency.

DISTRIBUTED SYSTEMS
• Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.
• A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.
• Distributed computing also refers to the use of distributed systems to solve computational problems.
• In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers, which communicate with each other by message passing.