
96 Concurrent/Distributed Computing Paradigm

Andrew P. Bernat, Computer Research Association
Patricia J. Teller, University of Texas at El Paso

96.1 Introduction
96.2 Hardware Architectures
96.3 Software Architectures
    Busy-Wait: Concurrency without Abstractions • Semaphores • Monitors • Message Passing
96.4 Distributed Systems
96.5 Formal Approaches
96.6 Existing Languages with Concurrency Features
96.7 Research Issues
96.8 Summary

    96.1 Introduction

    Concurrent computing is the use of multiple, simultaneously executing processes or tasks to compute

    an answer or solve a problem. The original motivation for the development of concurrent computing

    techniques was for timesharing multiple users or jobs on a single computer. Modern workstations use

    this approach in a substantial manner. Another advantage of concurrent computing, and the reason

    for much of the current attention to the subject, is that it seems obvious that solving a problem using

    multiple computers is faster than using just one. Similarly, there is a powerful economic argument for using

    multiple inexpensive computers to solve a problem that normally requires an expensive supercomputer.

    Additionally, the use of multiple computers can provide fault tolerance.

Moreover, there is an additional powerful argument for concurrent computing: the world is inherently

    concurrent. Just as each of us engages in a large number of concurrent tasks (hearing while seeing while

    reading, etc.), operating systems need to handle multiple, simultaneously executing tasks; robots need

    to engage in a multiplicity of actions; database systems must simultaneously handle large numbers of

    users accessing and updating information; etc. Often, breaking a problem into concurrent tasks provides

    a simpler, more straightforward solution.

As an example, consider Conway's problem: input is in the form of 80-character records (card images
in the original problem, which gives an idea of how long it has been around); output is to be in the form of
120-character records; each pair of dollar signs, '$$', is to be replaced by a single dollar sign, '$'; and a space,
' ', is to be added at the end of each input record. In principle, a sequential solution may be developed,

    but the complications introduced require complex and non-obvious buffer manipulations. Moreover, a


concurrent solution consisting of three processes is both simpler and more elegant. The three processes

    execute within infinite loops performing the following actions:

    1. Process1 reads 80-character records into an 81-character buffer, places a space character in location

    81, and then outputs single characters from the buffer sequentially.

    2. Process2 reads single characters and copies them to output, but uses a simple state machine to

    substitute a single $ for two consecutive $$.

    3. Process3 reads single characters, saves them in a buffer, and outputs 120-character records.

    To develop an implementable solution, we need to decide how the independently executing processes

    communicate. A simple, widely used approach is to add two buffers: Buffer1 stores output characters from

    Process1 to be input to Process2; Buffer2 stores output characters from Process2 to be input to Process3.

    For simplicity, assume that Buffer1 and Buffer2 each hold a single character. Thus:

    1. Process1 reads 80-character records into an 81-character internal buffer, places a space character

    in location 81, and sequentially places in Buffer1 single characters from the internal buffer.

    2. Process2 reads single characters from Buffer1 and places them into Buffer2, but uses a simple state

    machine to substitute a single $ for two consecutive $$.

    3. Process3 reads single characters from Buffer2, saves them in an internal 120-character buffer, and

    outputs 120-character records.

    This solution demonstrates the essence of the concurrent paradigm: individual sequential processes that

    cooperate to solve a problem. The exemplified concurrency is pipelined concurrency, where the input of

    all processes but the first is provided by another process. Cooperation, in this and all other cases, requires

    that the processes:

    1. Share information and resources

    2. Not interfere during access to shared information or resources

    In the Conway solution, information is readily shared via the buffers. The chief problem is to ensure

    that concurrent accesses to the two buffers do not conflict; for example, Process2 does not attempt to

    retrieve a character from Buffer1 before it has been placed there by Process1 (which would lead to garbage

    characters), and Process1 does not attempt to place a character into Buffer1 before the previous character

    has been retrieved by Process2 (which would lead to lost characters).

    A simpler example of interference is provided by the following simple program (where the statements

within the cobegin-coend pair are to be executed simultaneously):

    x := 0

    cobegin

    x := x + 1

    x := x + 2

    coend

    Consider the value of x at the end of execution. Because each assignment statement is actually a sequence

    of machine-level instructions, various interleavings of the execution of these instructions result in different

    final values for x (i.e., 1, 2, or 3). Clearly, this is unacceptable!
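
To make the interference concrete, the following Java sketch (an illustration only; the class name and iteration counts are invented) runs two threads that repeatedly increment a shared counter with no entry or exit protocol. Because count++ is a read-modify-write sequence, interleavings can lose updates, and the final value varies from run to run:

public class RaceDemo {
    static int count = 0;                      // shared variable, no protection

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100000; i++) {
                count++;                       // read-modify-write: interleavings can lose updates
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 200000; a smaller, run-dependent value is usually printed.
        System.out.println("count = " + count);
    }
}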

    In each of these examples, it is clear that there are critical regions in which two (or more) processes

    have sections of code that may not be executed concurrently; we must have mutual exclusion between the

    critical regions. In the Conway example, critical regions include:

• Process1 placing a value into Buffer1
• Process2 retrieving a value from Buffer1
• Process2 placing a value into Buffer2
• Process3 retrieving a value from Buffer2


In the simple example above, each of the two assignment statements is a critical region. The essence of

    avoiding interference is to discover the critical regions and isolate them. This isolation takes the form of

an entry protocol to announce entry into a critical region and an exit protocol to announce that the
execution of the critical region has completed (below, the # introduces a comment and the ... represents
the appropriate program code):

    # entry protocol

    ...

    # critical region code

    ...

    # exit protocol

    ...

    This is the basic model used by the busy-wait and semaphore approaches (discussed below). It is a low-level

    model in the sense that careful attention must be paid to the placement of the entry and exit protocols to

    ensure that critical regions are properly protected.

    There are other implementation approaches to concurrency that solve the critical region problem by

    prohibiting any direct interference between concurrent processes. This is done by not allowing any sharing

    of variables. The monitor approach places all shared variables and other resources under the control of a

    single monitor module, which is accessed by only a single process at a time. The message-passing approach

    is to share information only through messages passed from process to process. Both of these approaches

    are discussed in this chapter.

    As well as avoiding interference in data access, we must avoid interference in the sharing of resources

    (e.g., keyboard input for multiple processes). Also, we must ensure that any physical actions of concurrent

    processes, such as movement of robotic arms, are appropriately synchronized.

    Thus, to develop concurrent solutions, we require notations to:

    1. Specify which portions of our processes can run concurrently

    2. Specify which information and resources are to be shared

    3. Prevent interference by concurrent processes by ensuring mutual exclusion

    4. Synchronize concurrent processes at appropriate points

    Further, any proposed solution to a concurrent problem must have certain properties (see, for example,

    [Ben-Ari, 1990]):

    1. Safety: this property must always be true; examples include:

    a. Non-interference

    b. No deadlock, which occurs when no process can continue because all processes are waiting upon

    conditions that can never occur

    c. Partial correctness: whenever the program terminates, it has the correct answer

    2. Liveness: this property must be true eventually; examples include:

    a. Program terminates (if it also has the correct answer, this is total correctness)

    b. No race: nondeterministic behavior caused by concurrently executing processes

    c. Fairness: each process has an opportunity to execute (this is affected by implementation and

    process/thread scheduling)

    The verification or proof that solutions satisfy these properties is vastly complicated by the concurrent

    execution of code: particular orderings of code execution may exhibit interference or deadlock while others

proceed nicely to termination. Returning to Conway's problem, suppose the execution of Process1 and

    Process2 are matched evenly so that each character placed by Process1 into Buffer1 is retrieved by Process2

    before Process1 is ready to output another character. In this case, when tested, the program exhibits the

    desired correctness properties, lack of deadlock, etc. But if, due to a variation in processor workload or

    type, Process1 runs faster, then characters will be overwritten and lost; on the other hand, if Process2

    runs faster, characters will be repeated. The fact that we tested the program under one particular set


of circumstances (even for all possible inputs) is irrelevant to this issue. Sufficient testing is impossible

    because of the exponential explosion in the number of possible interleavings of instruction execution

    that can occur. The only fully satisfactory approach is to use formal methods (techniques that are still

    predominantly under development), which are touched on later in this chapter.

    This chapter focuses on the software architectures used for concurrency, using a set of archetypical

    problems and their solutions for illustration. These problems are chosen because of the frequency with

    which they arise in computing; careful study of actual problems frequently leads to the realization that

    a seemingly complicated problem is, at heart, one of these archetypes. First, we briefly explore hardware

    architectures and their impact on software.

    96.2 Hardware Architectures

    Hardware can influence synchronization and communication primarily with respect to efficiency. Mul-

    tiprogramming is the interleaving of the execution of multiple programs on a processor; on a uniproces-

    sor, a time-sharing operating system implements multiprogramming. Although such an approach on a

    uniprocessor does not provide the execution speedup discussed in the introduction, it does provide the

    possibility of elegance and simplicity in problem solution, which is the second argument for the concurrent

    paradigm.

    By employing multiple computers, we have multiprocessing, or parallel processing. Multiprocessing can

    involve multiple computers working on the same program or on different programs at the same time.

    If a multiprocessor system is built so that processors share memory, then processes can communicate

    via global variables stored in shared memory; otherwise, they communicate via messages passed from

    process to process. In contrast to a multiprocessor system, a distributed system is comprised of multiple

    computers that are remote from each other. This chapter focuses on multiprogramming and multipro-

    cessing systems with a short introduction to the additional problems associated with distributed systems.

    In addition (but outside the scope of this chapter), a wide variety of hybrid hardware/software approaches

    exist.

    96.3 Software Architectures

    To specify a software architecture for implementing concurrency, we must provide the syntax and semantics

    to:

    1. Specify which information and resources are to be shared

    2. Specify which portions of processes can run concurrently

    3. Prevent interference by concurrent processes by ensuring mutual exclusion

    4. Synchronize concurrent processes at appropriate points

    The first feature requires no special notation (shared variables are simply global), and the third and fourth

    are usually merged into one. A large number of software mechanisms have been proposed to support these

    features; in this chapter we explore the most widely used among them:

    1. Busy-wait: implementable on virtually any processor without operating system support; this is

    concurrency without abstractions

    2. Semaphores: historically the oldest satisfactory mechanism

    3. Monitors: modules that encapsulate concurrent access to shared data

    4. Message passing: a higher-level abstraction widely used in distributed systems

    The references at the end of the chapter provide pointers to a number of other mechanisms, such as Unix

    fork/join, conditional critical regions, etc.


96.3.1 Busy-Wait: Concurrency without Abstractions

To illustrate the busy-wait mechanism, we use (following [Ben-Ari, 1982]) a very simple example consisting

    of two concurrent processes, each with a single critical region. The only assumption made is that each

    memory access is atomic; that is, it proceeds without interruption. Our task is to ensure mutual exclusion;

    the purpose of the exercise is to demonstrate the care with which a solution must be crafted to ensure the

    safety and liveness properties discussed above.

    Our first approach, which follows, is to ensure that the processes, p1 and p2, simply take turns in their

    critical regions.

    global var turn := 1

    process p1

    while true do ->

    # non-critical region

    ...

    # entry protocol

    while turn = 2 do ->

    # wait for turn

    # critical region

    ...

    # exit protocol

    turn := 2

    # rest of computation

    ...

    end p1

    process p2

    while true do ->

    # non-critical region

    ...

    # entry protocol

    while turn = 1 do ->

    # wait for turn

    # critical region

    ...

    # exit protocol

    turn := 1

    # rest of computation

    ...

    end p2

    This approach meets the desired properties but has a fundamental flaw: processes must take turns entering

    their critical regions. If p1 is ready and needs to execute its critical region at a higher frequency than p2, it

    cannot. The processes are an example of co-routines, historically one of the first approaches to concurrency.

    If we modify the solution to allow each process to proceed into its critical region if the other process is

    not in its critical region, and to then notify the other process, we obtain the following (where ci is used

    to signify that pi is in its critical region):

    global var c1 := false, c2 := false

    process p1

    while true do ->


# non-critical region

    ...

    # entry protocol

    while c2 do ->

    # wait for turn

    c1 := true # p1 in critical region

    # critical region

    ...

    # exit protocol

    c1 := false # p1 out of critical region

    # non-critical region

    end p1

    process p2

    while true do ->

    # non-critical region

    ...

    # entry protocol

    while c1 do ->

    # wait for turn

    c2 := true # p2 in critical region

    # critical region

    ...

    # exit protocol

    c2 := false # p2 out of critical region

    # non-critical region

    end p2

    Now, however, we have the possibility that the mutual exclusion requirement of the critical region can be

    violated; that is, both processes can be in their critical regions at the same time. (For example, suppose

    both c1 and c2 are false; p1 checks c2 via the loop and decides that it may enter its critical region; before

    it sets c1 to true, p2 checks c1 via its loop and decides that it may enter its critical region).

    As shown next, this disastrous possibility can be eliminated by having a process announce its intent to

    enter into its critical region before checking whether it can enter:

    global var c1 := false, c2 := false

    process p1

    while true do ->

    # non-critical region

    ...

    # entry protocol

    c1 := true # signal intent to enter

    while c2 do ->

    # wait for turn

    # critical region

    ...

    # exit protocol

    c1 := false # p1 out of critical region

    # non-critical region

    end p1

    process p2


while true do ->

    # non-critical region

    ...

    # entry protocol

    c2 := true # signal intent to enter

    while c1 do ->

    # wait for turn

    # critical region

    ...

    # exit protocol

c2 := false # p2 out of critical region

    # non-critical region

    end p2

But now we have raised the possibility of deadlock (when p1 sets c1 to true and p2 sets c2 to true at about the same time, both processes wait forever in their loops).

    A possible solution to this difficulty, which appears below, moves the announcement statement into the

    loop, together with a random delay:

    global var c1 := false, c2 := false

    process p1

    while true do ->

    # non-critical region

    ...

    # entry protocol

    c1 := true # signal intent to enter

    while c2 do ->

    c1 := false # give up intent if p2 already

    # in critical region

    c1 := true # try again

    # critical region

    ...

    # exit protocol

    c1 := false # p1 out of critical region

    # non-critical region

    ...

    end p1

    process p2

    while true do ->

    # non-critical region

    ...

    # entry protocol

    c2 := true # signal intent to enter

    while c1 do ->

    c2 := false # give up intent if p1 already

    # in critical region

    c2 := true # try again

    # critical region

    ...

    # exit protocol


c2 := false # p2 out of critical region

    # non-critical region

    ...

    end p2

But this is not a satisfactory solution because it can livelock in the (unlikely) situation that the two loops
proceed in perfect synchronization: each process repeatedly retracts and reasserts its intent without ever entering its critical region.

    A valid solution, such as that which appears below, can be developed by returning to the concept of

    taking turns when applicable, which ensures mutual exclusion while not requiring alternating turns (thus

    allowing true concurrency):

    global var c1 := false, c2 := false, turn := 1

    process p1

    while true do ->

    # non-critical region

    ...

    # entry protocol

    c1 := true # signal intent to enter

    turn := 2 # give p2 priority

    while c2 and turn = 2 do ->

    # wait if p2 in critical region

    # critical region

    ...

    # exit protocol

    c1 := false # p1 out of critical region

    # non-critical region

    ...

    end p1

    process p2

    while true do ->

    # non-critical region

    ...

    # entry protocol

    c2 := true # signal intent to enter

    turn := 1 # give p1 priority

    while c1 and turn = 1 do ->

# wait if p1 in critical region

    # critical region

    ...

    # exit protocol

c2 := false # p2 out of critical region

    # non-critical region

    ...

    end p2

    This solution is due to Peterson [1983]; the first valid solution was presented by Dekker.

    The importance of the busy-wait approach is threefold:

    1. It provides a nice introduction to the problems inherent in designing concurrent solutions.

2. It is executable on virtually every machine architecture without additional software support

    and is, thus, suitable for micro-controllers, etc.

    3. Variants are frequently used in hardware implementations.


However, this approach also suffers from two difficulties:

    1. It is very inefficient: machine cycles are expended when executing busy-wait loops.

    2. Programming at such a low level is highly prone to error.
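
As a point of comparison, the Peterson entry and exit protocols above can be transliterated into Java roughly as follows (a sketch only; the class and method names are invented, and the volatile qualifier is needed so that accesses to the protocol variables are not reordered by the compiler or hardware):

public class Peterson {
    static volatile boolean c1 = false, c2 = false;   // intent flags
    static volatile int turn = 1;                      // tie-breaker
    static int shared = 0;                             // protected by the protocol

    static void p1() {
        c1 = true; turn = 2;                           // entry protocol: signal intent, give p2 priority
        while (c2 && turn == 2) { /* busy-wait */ }
        shared++;                                      // critical region
        c1 = false;                                    // exit protocol
    }

    static void p2() {
        c2 = true; turn = 1;
        while (c1 && turn == 1) { /* busy-wait */ }
        shared++;
        c2 = false;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { for (int i = 0; i < 100000; i++) p1(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 100000; i++) p2(); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("shared = " + shared);      // 200000 if mutual exclusion held
    }
}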

    96.3.2 Semaphores

    Dijkstra [1968] presented the first abstract mechanism for synchronization in concurrent programs. The

    semaphore, so named in direct relation to the semaphores used on railroad lines to control traffic over a

    single track, is a non-negative integer-valued abstract data type with two operations:

    P(s) : delay until s > 0, then s := s - 1

    V(s) : s := s + 1

    When a process delays on a semaphore, it is awakened only when another process executes a V operation

    on that semaphore. Thus, it uses no machine cycles to check if it can proceed. If more than one process

    is delaying on a semaphore, only one (which one is implementation dependent) can be awakened by a V

    operation.

    Additionally, the value of s can be set at instance creation via the semaphore declaration; if set to

0, then some process must execute a V(s) operation before any process that executes P(s) can
continue. With this abstract data type, we have a mechanism that handles both interference

    and synchronization.

    Additional notes:

    1. These are the only two synchronization operations defined; in particular, the value of s is not

    determinable.

    2. Implementation of these operations must be either in the hardware or in the (non-interruptible)

    system kernel.

    3. By sleeping while waiting for a semaphore (the delay in P(s)), a process does not waste machine

    cycles by repeated checking.

4. The operation names (P and V) come from the Dutch words passeren (to pass) and vrijgeven (to
release); sometimes, wait and signal are used in place of P and V, respectively.

    5. Each of the P and V operations proceeds atomically; that is, it may not be interrupted by another

    process.

    The use of the semaphore in concurrent programming relates directly to the railroad analogy. Each

    critical section looks like the following:

    global var s : semaphore := 1

    # entry protocol

    P(s)

    # critical region

    ...

    # exit protocol

    V(s)

The initialization of s to 1 ensures that the first process executing P(s) will continue. (Deadlock would
arise if s were initialized to 0.) Only the first process to reach its P(s) statement is allowed to proceed, as
subsequent processes find s = 0 and delay. When the first process finishes its critical region, it executes
V(s), which sets s to 1. One of the waiting processes is awakened, finds s > 0, decrements s, and proceeds.
Note that it is important that these operations are atomic; this ensures that two processes cannot wake up and
each find s > 0.
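
For illustration, the same critical-region pattern can be sketched in Java using java.util.concurrent.Semaphore, whose acquire and release methods play the roles of P and V (the worker code and iteration counts are invented for the example):

import java.util.concurrent.Semaphore;

public class CriticalRegion {
    static final Semaphore s = new Semaphore(1);   // s initialized to 1
    static int shared = 0;

    static void worker() throws InterruptedException {
        for (int i = 0; i < 100000; i++) {
            s.acquire();                           // entry protocol: P(s)
            shared++;                              // critical region
            s.release();                           // exit protocol: V(s)
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable r = () -> { try { worker(); } catch (InterruptedException e) { } };
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("shared = " + shared);  // always 200000
    }
}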


96.3.2.1 Semaphores and Producer-Consumer

    The Producer-Consumer problem arises whenever one process is creating values to be used by another

process. Examples include Conway's problem, buffers of various kinds, etc. Here we first look at the

    multi-element buffer version of this problem and then add multiple producers and consumers as a

    refinement.

    # define the buffer

    const N := ... # size

    var buf[N] : int # buffer

    front := 1 # pointers

    rear := 1

semaphore empty := N # counts the number of empty slots in the buffer
full := 0 # counts the number of items in the buffer

    process producer

    var x : int

    while true do ->

    # produce x

    ...

    P(empty) # delay until there is space in the buffer

    buf[rear] := x # place value in the buffer

    V(full) # signal that the buffer is non-empty

    rear := rear mod N + 1 # update buffer pointer

    end producer

    process consumer

    var x : int

    while true do ->

    P(full) # delay until a value is in the buffer

    x := buf[front] # obtain value

    V(empty) # signal that the buffer is not full

    front := front mod N + 1 # update buffer pointer

    # consume x

    ...

    end consumer

    The buffer processing is conventional; only the actual buffer access must be placed into a critical region

    because there is no possibility of interference between the assignments to rear and front. Note also the

    use of two semaphores: empty to signal that the producer can proceed because there is at least one empty

    slot in the buffer and full to signal that the consumer can proceed because there is at least one item in

    the buffer. Although it is possible to solve this problem with one semaphore, less concurrency results. Note

    that the empty semaphore is initialized to N, the size of the buffer. The producer process can run up to N

    steps ahead of the consumer process.

    To allow multiple producers and/or consumers, we must protect the actual buffer operations with

    additional semaphores to prevent, for example, two producers from accessing rear simultaneously with

    read and assignment operations. These semaphores, mutexR and mutexF, guarantee mutual exclusion

    of access to the rear and front pointers, respectively. It is not sufficient to use empty here because up

    to N producers will be able to continue through the P(empty) statement.

    # define the buffer as previously

    semaphore empty := N, full := 0


semaphore mutexR := 1 # mutual exclusion on rear pointer

    mutexF := 1 # mutual exclusion on front pointer

    process pi # one for each producer

    var x : int

    while true do ->

    # produce x

    ...

    P(empty) # delay until there is space in the buffer

    P(mutexR) # delay until rear pointer is not in use

    # place value in the buffer and modify pointer

    buf[rear] := x; rear := rear mod N + 1

    V(mutexR) # release rear pointer

    V(full) # signal that the buffer is non-empty

    end pi

    process ci # one for each consumer

    var x : int

    while true do ->

    P(full) # delay until a value is in the buffer

    P(mutexF) # delay until front pointer is not in use

    # access the value in the buffer and modify pointer

    x := buf[front]; front := front mod N + 1

V(mutexF) # release front pointer

    V(empty) # signal that there is space in the buffer

    # consume x

    ...

    end ci
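
A Java sketch of this multiple-producer/multiple-consumer solution follows (the class and method names are invented, and the pointers are 0-based here): empty and full count free and filled slots, while mutexR and mutexF protect the rear and front pointers.

import java.util.concurrent.Semaphore;

public class BoundedBuffer {
    private final int N;
    private final int[] buf;
    private int front = 0, rear = 0;                   // 0-based buffer pointers
    private final Semaphore empty, full;               // slot counters
    private final Semaphore mutexR = new Semaphore(1); // protects rear
    private final Semaphore mutexF = new Semaphore(1); // protects front

    public BoundedBuffer(int n) {
        N = n;
        buf = new int[n];
        empty = new Semaphore(n);                      // N empty slots initially
        full = new Semaphore(0);                       // no items initially
    }

    public void deposit(int x) throws InterruptedException {
        empty.acquire();                               // P(empty): wait for a free slot
        mutexR.acquire();                              // P(mutexR)
        buf[rear] = x;
        rear = (rear + 1) % N;
        mutexR.release();                              // V(mutexR)
        full.release();                                // V(full): one more item available
    }

    public int fetch() throws InterruptedException {
        full.acquire();                                // P(full): wait for an item
        mutexF.acquire();                              // P(mutexF)
        int x = buf[front];
        front = (front + 1) % N;
        mutexF.release();                              // V(mutexF)
        empty.release();                               // V(empty): one more free slot
        return x;
    }
}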

    96.3.2.2 Semaphores and Readers-Writers

    The Readers-Writers model captures the fundamental actions of a database; i.e.,

• No exclusion between readers
• Exclusion between readers and a writer
• Exclusion between writers

    In other words, the software must guarantee only one update of a database record at a time, and no reading

    of that record while it is being updated.

    The simplest semaphore solution is to wait only for the first reader; subsequent readers need not check

    because no writer can be writing if there is already a reader reading (here, nr and nw are the numbers of

    active readers and writers, respectively):

    ...

    nr := nr + 1

    if nr = 1 -> P(rw) # if no one is presently reading,

    # then ensure no one is writing

    # before proceeding

    # access database

    ...

    nr := nr - 1

    if nr = 0 -> V(rw) # if no more are reading, possibly wake up

    # writer, or prepare for next reader


...

    P(rw) # delay until no readers or writers

    # access database

    ...

    V(rw) # wake up delayed reader or writer, or prepare

    # for next reader or writer

This solution gives readers preference over writers: new readers can continually freeze out waiting writers.
Extending this solution to other kinds of preferences, such as writer preference or first-come-first-served
preference, is cumbersome.
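
A Java sketch of the readers-preference protocol follows; note that it protects the reader count nr with an additional mutual-exclusion semaphore, a detail the fragment above leaves implicit (the class and method names are invented):

import java.util.concurrent.Semaphore;

public class Database {
    private final Semaphore rw = new Semaphore(1);     // excludes readers from writers, writers from writers
    private final Semaphore mutex = new Semaphore(1);  // protects nr
    private int nr = 0;                                // number of active readers

    public void read() throws InterruptedException {
        mutex.acquire();
        nr = nr + 1;
        if (nr == 1) rw.acquire();                     // first reader locks out writers
        mutex.release();

        // ... read the database ...

        mutex.acquire();
        nr = nr - 1;
        if (nr == 0) rw.release();                     // last reader lets a writer in
        mutex.release();
    }

    public void write() throws InterruptedException {
        rw.acquire();                                  // exclusive access
        // ... modify the database ...
        rw.release();
    }
}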

    A more general approach is known as passing the baton; it is easily extended to other kinds of prefer-

    ences because control is explicitly handed from process to process. Although a careful explanation of the

    approach is not given here, the concept is easily summarized. A process must check to ensure that it can

    legally proceed before doing so; if it cannot proceed, the process waits upon a semaphore assigned to it.

    For example, a writer process checks to see if no readers or writers are executing on the database before it

    proceeds; if they are executing on the database, then the writer process sleeps, waiting upon the semaphore

    assigned to it. When a process is finished accessing the database, it checks the conditions and wakes up (via

    signaling on the appropriate semaphore) one of the processes waiting upon the condition. This last opera-

    tion essentially passes the baton from one process to another. The key is that first a check is made to ensure

    that it is legal for the other process to wake up. The strength of the passing the baton approach emerges

    when its flexibility is used to develop more general solutions. Details may be found in Andrews [1991].

    96.3.2.3 Difficulties with Semaphores in Software Design

    While the use of semaphores does provide a complete solution to the interference problem, the correctness

    of the solution directly depends on the correct usage of the semaphore operations, which are fairly low-level

    and unstructured. Semaphores and shared variables are global to all processes and, like any global data

    structure, their correct usage requires considerable discipline by the programmer. Additionally, if a large

    system is to be built, any one implementor is likely responsible for only a portion of the semaphore usage

    so that correct pairing of Ps and Vs may be difficult. Despite this difficulty, semaphores are a widely used

    construct for concurrency.

    96.3.3 Monitors

    A more structured approach is to encapsulate the shared data/resources and their operations into a single

    module called a monitor. A monitor can contain non-externally accessible data and procedures that handle

    the state of resources. External access is strictly controlled through procedure calls to the monitor; mutual

    exclusion is ensured because procedure execution within the monitor is not concurrent.

Monitors have the traditional advantages of abstract data types, but they must also deal with two
issues arising from their use by concurrently executing processes: avoiding interference and providing

    synchronization. This section illustrates some sample applications of monitors and how they internally

    handle concurrency.

    Returning to the Producer-Consumer problem, we implement a monitor for handling shared access to

    the buffer. The monitor requires a synchronization mechanism to ensure that the Producer cannot overfill

    the buffer and that the Consumer cannot retrieve from an empty buffer. Monitors implement condition

    variables, the values of which are queues of processes delayed upon the corresponding conditions. Two

standard operations defined on condition variable cv are:

1. wait(cv): causes the executing process to delay and to be placed at the end of cv's queue; in

    order to allow eventual awakening of the process, the process must relinquish exclusive access to the

    monitor when it executes a wait.

2. signal(cv): causes the process at the head of cv's queue to be awakened; if the queue is empty,

    there is no effect.


Although these operations mirror those of semaphores, there is a key difference: the signal operation

    has no memory.

    96.3.3.1 Monitors and Producer-Consumer

    The buffer monitor can be defined as follows:

    monitor Buffer

    # define the buffer

const N := ... # size of the buffer
var buf[N] : int # buffer
front := 1 # buffer pointers
rear := 1
count := 0 # number of items in the buffer

    # define the condition variables

    var not_full, # signaled when count < N

    not_empty : cv # signaled when count > 0

    procedure deposit(data : int)

    if count = N # check for space

    then wait(not_full) # delay if no space

    buf[rear] := data

    rear := (rear mod N) + 1

count := count + 1

    signal(not_empty) # signal non-empty

    end

    procedure fetch(var data : int)

    if count = 0 # check for not empty

    then wait(not_empty) # delay if empty

    data := buf[front]

    front := (front mod N) + 1

count := count - 1

    signal(not_full) # signal not full

    end Buffer

    Using this monitor, the producer and consumer tasks can be redone as follows:

    process Producer

    var x : int

    while true do ->

    # produce x

    ...

    deposit(x)

    end Producer

    process Consumer

    var x : int

    while true do ->

    fetch(x)

    # consume x

    ...

    end Consumer

It is clear that programming (outside of the monitor) can now be done at a more abstract level, which
will lead to more reliable software.


96.3.3.2 Difficulties with Monitors

There are difficulties with monitors as well. Consider the case where we have two consumers, C1 and C2.
If the buffer is empty when C1 executes fetch, then C1 will delay on not_empty. If the producer then
executes deposit (note that deposit and fetch cannot be executed concurrently), it will eventually
signal(not_empty), which will awaken C1. But if C2 executes fetch before C1 continues execution
and its call to fetch proceeds, then C1 will access an empty buffer. Hence, the signal operation must be
considered to be a hint that proceeding with execution is possible, but not that it is correct. The following
two approaches are used to solve this problem:

    1. Replace the check on the condition variable with a check inside a loop to ensure that the condition

    is true before execution proceeds. For example:

    procedure deposit(data : int)

while count = N do -> # check for space
wait(not_full) # delay if no space

    buf[rear] := data

    rear := (rear mod N) + 1

count := count + 1

    signal(not_empty) # signal non-empty

    end

2. Give the highest priority to awakened processes so that intervening access to the monitor is not pos-
sible; this also requires that the signal operation be the last operation executed in any procedure

    in which it occurs (to ensure that two processes will not be executing within the monitor).

    Monitors form the basis for concurrent programming in a number of systems and provide an efficient,

    high-level synchronization mechanism. They have the further advantage, as do other abstract data types

    or objects, of allowing for local modification and tuning without affecting the remainder of the system.
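
As an illustration, Java's built-in monitor mechanism (synchronized methods plus wait and notifyAll) can express the Buffer monitor roughly as follows (a sketch; the class name is invented). Because Java provides only a single implicit condition queue per object, notifyAll replaces the separate not_full and not_empty condition variables, and each condition is re-checked in a while loop, following approach 1 above:

public class MonitorBuffer {
    private final int[] buf;
    private int front = 0, rear = 0, count = 0;

    public MonitorBuffer(int n) { buf = new int[n]; }

    public synchronized void deposit(int data) throws InterruptedException {
        while (count == buf.length) wait();            // delay while full (re-check on wake-up)
        buf[rear] = data;
        rear = (rear + 1) % buf.length;
        count = count + 1;
        notifyAll();                                   // plays the role of signal(not_empty)
    }

    public synchronized int fetch() throws InterruptedException {
        while (count == 0) wait();                     // delay while empty (re-check on wake-up)
        int data = buf[front];
        front = (front + 1) % buf.length;
        count = count - 1;
        notifyAll();                                   // plays the role of signal(not_full)
        return data;
    }
}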

    96.3.4 Message Passing

    Consider a hardware architecture with multiple independent computers. Creating a semaphore to be

    efficiently accessed by processes running on separate computers is a difficult problem. We need a new

    abstraction for this case: message passing in which a sending process outputs a message to a channel and

    a receiving process inputs the message from this same channel. There are a large number of variations of

    this basic concept, depending on the semantics of the operations and the channels.

    The basic primitives are:

    1. Channel declaration

    2. send

    3. receive

    If both sending and receiving processes block upon reaching their corresponding message-passing

    operation, we have synchronous communication; if the sending process can send a message and continue

    without waiting for receipt, the system is asynchronous. Analogies are telephone communication and the

    postal system. The synchronous approach allows for ready synchronization of processes (at the instant

    of message passing we know where both are in their execution). This was the approach chosen by Hoare

    [1985] for his communicating sequential processes model and its subsequent implementation in the

    occam language [Jones and Goldsmith, 1988]. If we desire asynchronicity, we can add intermediate buffer

    processes to the synchronous approach. An advantage of synchronous message passing is that it often

    simplifies analysis of an algorithm because it is known where the sending and receiving processes are in

    their execution at the moment the message is passed.
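
In Java, a synchronous channel can be sketched with java.util.concurrent.SynchronousQueue, whose put blocks until another thread is ready to take, so sender and receiver meet at the moment the message is passed (an illustrative sketch only; the names and message are invented):

import java.util.concurrent.SynchronousQueue;

public class SyncChannelDemo {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<String> channel = new SynchronousQueue<>();

        Thread receiver = new Thread(() -> {
            try {
                String msg = channel.take();           // blocks until a sender arrives
                System.out.println("received: " + msg);
            } catch (InterruptedException e) { }
        });
        receiver.start();

        channel.put("hello");                          // blocks until the receiver takes it
        receiver.join();
    }
}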

    Further variations arise, depending on whether channels are one process-to-one process or one-to-

    many, statically instantiated at load time or dynamically created during execution, bi-directional or


uni-directional, whether the receiving process must be named by the sending process, etc. However,

    the basic concept is the same in all cases; ease of use and efficiency of implementation vary.

    Further variations include remote procedure call (RPC), which is the core of many distributed systems,

    and rendezvous, the approach used in Ada. We further explore these approaches after looking more closely

    at simple message passing.

    Note that, in the message-passing approach, there are no shared variables so interference is not an

    issue. The critical section issue does not arise because there is no way for concurrent processes to interfere

    with each other. This is one of the major motivating factors for the use of message-passing software

    architectures.

    96.3.4.1 Message Passing and Producer-Consumer

    If the message-passing system is asynchronous, as demonstrated below, we can rely on the system itself to

    buffer values:

    channel P2C

    process Producer

    int x

    while true do ->

    # produce x

    send P2C x

    end Producer

    process Consumer

    int x

    while true do ->

    receive P2C x

    # consume x

    end Consumer

Using this approach, the Producer sends a message over channel P2C and continues producing and
sending (up to the channel capacity, at which point the send blocks), while the Consumer blocks at the
receive statement if no messages are available.
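
A Java sketch of this asynchronous solution uses a bounded BlockingQueue as the channel P2C; put blocks only when the channel is full and take blocks only when it is empty (the capacity, item count, and names are invented for the example):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncProducerConsumer {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> p2c = new ArrayBlockingQueue<>(16);   // channel P2C, capacity 16

        Thread producer = new Thread(() -> {
            try {
                for (int x = 0; x < 10; x++) {
                    p2c.put(x);                        // send P2C x (blocks only if the channel is full)
                }
            } catch (InterruptedException e) { }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    int x = p2c.take();                // receive P2C x (blocks if the channel is empty)
                    System.out.println("consumed " + x);
                }
            } catch (InterruptedException e) { }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}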

    If our system is synchronous, then as shown below, we create a separate buffer process:

    channel P2B, B2C

    process Buffer

    # create the buffer

    const N := ..

    var buffer[N] : int

    front := 1

    rear := 1

    count := 0 # number of items in the buffer

    while true do ->

    if

    # there is room and the producer is sending

count < N and receive P2B buffer[rear] ->
count++; rear := rear mod N + 1
else
# there are items and the consumer is receiving
count > 0 and send B2C buffer[front] ->
count--; front := front mod N + 1

    end Buffer


process Producer

    var x : int

    while true do ->

    # produce x

    ...

    send P2B x

    end Producer

    process Consumer

    var x : int

    while true do ->

    receive B2C x

    # consume x

    end Consumer

Above, the if statement is nondeterministic; that is, any true clause can be selected. The Boolean

    conditions in the clauses are called guards. The clauses are:

• If there is room and the producer wishes to send a character
• If there are items to retrieve and the consumer wishes to receive a character

    For implementation efficiency reasons, actual programming languages do not allow guards for both

    input and output statements, so we must modify our solution; for example, as shown below, we can modify

    the buffer and consumer processes to eliminate the output guard:

    channel P2B, B2C, C2B

    process Buffer

# define the buffer
const N := ... # size of the buffer
var buffer[N] : int
var front := 1
rear := 1
count := 0
while true do ->

    if

    # there is room and the producer is sending

count < N and receive P2B buffer[rear] ->
count++; rear := rear mod N + 1
else
# there are items and the consumer is requesting
count > 0 and receive C2B NIL ->
send B2C buffer[front]
count--; front := front mod N + 1

    end Buffer

    process Producer

    var x : int

    while true do ->

    # produce x


...

    send P2B x

    end Producer

    process Consumer

var x : int

    while true do ->

    send C2B NIL # announce ready for input

    receive B2C x

    # consume x

    ...

    end Consumer

    Above, the Consumer process first announces its intention to receive a value from the Buffer process (send

    C2B NIL; the NIL signifying that no message need be actually exchanged) and then actually receives the

    value (receive B2C x).

    This program is an example of client/server programming. The Consumer process is a client of the Buffer

    process; that is, it requests service from the buffer, which provides it. Client/server programming is widely

    used to provide services across a network and is based on the message-passing paradigm.

    96.3.4.2 Message Passing and Readers-Writers

    The message-passing approach to Readers-Writers is straightforward: do not accept a message from a

    reader or writer if a writer is writing; do not accept a message from a writer if a reader is reading. The

    solution, shown below, is simple if we adopt synchronous message passing and the notion of the database

    as a server:

    channel Rrequests, Rreceives, Wsends

    Reader

    send Rrequests

    receive Rreceives

    Writer

    send Wsends

    Server

    if

    # there are no writers, accept reader requests

    nw = 0 ->

    receive Rrequests

    # access the database

    ...

    send Rreceives

# there are no readers or writers, accept writer requests

    nr = 0 and nw = 0 ->

    receive Wsends

    # modify the database

    ...


96.3.4.3 Message Passing and Semaphore Simulation

    Of course, as we show next, message passing can simulate a semaphore (and vice versa if need be):

    channels P, V, initSemaphore

    process Semaphore

    var s : int

    receive initSemaphore i

    s := i

    while true do ->

    if

    # semaphore is non-zero accept P operation

s > 0 and receive P NIL ->

    s--

    # always accept V operation

    receive V NIL ->

    s++

    end Semaphore
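
Java has no guarded receive, so a sketch of the same idea must be structured a little differently: clients send P and V requests to a server thread over a channel, and a P request carries a private reply channel on which the server grants permission only when the count is positive (an illustration only; all class and method names are invented):

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class MessageSemaphore {
    // A request is either a V (reply == null) or a P carrying a reply channel.
    private static final class Request {
        final SynchronousQueue<Boolean> reply;
        Request(SynchronousQueue<Boolean> reply) { this.reply = reply; }
    }

    private final BlockingQueue<Request> requests = new LinkedBlockingQueue<>();

    public MessageSemaphore(int initial) {
        Thread server = new Thread(() -> {
            int s = initial;                                     // the semaphore count
            Queue<SynchronousQueue<Boolean>> waiting = new ArrayDeque<>();
            try {
                while (true) {
                    Request r = requests.take();
                    if (r.reply == null) {                       // V operation
                        if (waiting.isEmpty()) s++;
                        else waiting.remove().put(true);         // wake one delayed P
                    } else if (s > 0) {                          // P operation, count positive
                        s--;
                        r.reply.put(true);
                    } else {                                     // P operation, must delay
                        waiting.add(r.reply);
                    }
                }
            } catch (InterruptedException e) { }
        });
        server.setDaemon(true);
        server.start();
    }

    public void P() throws InterruptedException {
        SynchronousQueue<Boolean> reply = new SynchronousQueue<>();
        requests.put(new Request(reply));
        reply.take();                                            // delay until the server grants
    }

    public void V() throws InterruptedException {
        requests.put(new Request(null));
    }
}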

    96.3.4.4 The Remote Procedure Call and Rendezvous Abstractions

    The remote procedure call, or RPC, abstraction is widely used to provide client/server services in a dis-

    tributed system. Revisiting the client/server examples above, it is clear that the client executes a send-

    receive pair while the server executes a receive-send pair. Using the standard procedure model to

capture the server's actions, a call statement to capture the client's actions, and parameters to capture the

    messages being sent, we have:

    Client

    ...

    call Server(args)

    ...

    Server(formal args)

    ...

    return

    which mirrors traditional procedure calls. The difference is that the Server procedure can be on a

    machine remote to the Client process. Indeed, the Server is implemented as a process that is always

    delayed until a Client executes a call. If multiple Clients concurrently execute calls to a Server, the

Server must be re-entrant or must provide protection for shared information. The RPC approach forms

    the basis for distributed systems programs on a wide variety of platforms; its relationship to monitors

    should be clear.

    The calling process and procedure are not truly concurrent in the sense used throughout this chapter,

    in that the calling process delays once the call is made, the procedure does not execute until called, the

    procedure delays when the return is executed, and the calling process resumes execution only upon the

    return from the procedure. The model is similar to that of synchronous message passing if the execution of

    the procedure is viewed as a component of the message-passing process (essentially, the procedure creates

    the return message).
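
A minimal Java sketch of this call/return pattern built from message passing follows: the client's call sends the argument together with a private reply channel and then blocks on that channel, while the server loops, receiving calls and sending back results (the squaring service and all names are invented):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class RpcSketch {
    // A call message: the argument plus a private channel for the result.
    static final class Call {
        final int arg;
        final SynchronousQueue<Integer> reply;
        Call(int arg, SynchronousQueue<Integer> reply) { this.arg = arg; this.reply = reply; }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Call> serverChannel = new LinkedBlockingQueue<>();

        // The "remote" procedure: loop forever, receive a call, send back the result.
        Thread server = new Thread(() -> {
            try {
                while (true) {
                    Call c = serverChannel.take();
                    c.reply.put(c.arg * c.arg);          // the body of the procedure
                }
            } catch (InterruptedException e) { }
        });
        server.setDaemon(true);
        server.start();

        // The client's "call": send the arguments, then block for the return message.
        SynchronousQueue<Integer> reply = new SynchronousQueue<>();
        serverChannel.put(new Call(7, reply));
        System.out.println("result = " + reply.take()); // prints 49
    }
}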

    We can increase the power of this approach if we modify the procedure into a process and have both

    processes executing concurrently. When a call is made, execution of the calling process delays while

    execution of the called process continues until it is ready to accept the call (via a special statement). The

    called process continues execution, performing actions or calculating values for the return message. The

    return message is sent back to the caller, the called process continues executing, and the calling process


resumes execution once the message is received. Because there is an extended time period during which

    the two processes are synchronized (from called accept through called return), this model of concurrency

    is termed rendezvous. It is the basis for the model of concurrency used in the Ada language. The Ada model

    is not symmetric: the calling process must know the name of the process it is calling, but the called process

    need not know its caller. Accept statements may have guards, as discussed above for message passing, in

    order to control acceptance of calls. The complexity of these guards, and their priority, must be carefully

    followed during program implementation.

    There are several advantages to this approach, all based on the possibility of the called routine using

    multiple accept statements:

    1. The called routine can provide different responses to the calling process at different stages of its

    execution.

    2. The called routine can respond differently to different calling processes.

    3. The called routine chooses when it will receive a call.

    4. Different accept statements can be used to provide different services in a clear fashion (rather than

    through parameter values).

    96.3.4.5 Difficulties with Message Passing

    Message-passing systems are frequently inefficient during execution unless the algorithm is developed

    carefully. This is because messages take time to propagate, and this time is essentially overhead. For

example, a single-element buffer version of Conway's problem spends significantly more time exchanging
messages than performing any other operation.

    96.4 Distributed Systems

    In addition to the difficulties inherent in developing and understanding concurrent solutions, distributed

    systems contain the fundamental problem of identifying global state. For example, how do we determine if

a program has terminated? In the sequential case, this is obvious: we execute the exit or end statement.

    In the concurrent case, we must ensure that all processes are ready to terminate. In the multiprogramming

    case, we can do this by checking the ready queue; if it is empty, then there are no processes waiting to run,

which ensures that no process will ever be added to the ready queue (if no process can run, then there

    can be no changes to create another ready process). But if we are in a distributed system, there is no single

    ready queue to examine. If a process is in the suspended queue on its processor, it may be made ready by

    a message from a process on a different processor.

Similarly, we may still require mutual exclusion on a system resource; how do we ensure access across

    processors? The solution is to develop a method of determining global state; see, for example, Ben-Ari

    [1990].

    While a true distributed paradigm has not yet emerged in the programming paradigms domain, it

    will most likely evolve in the area of operating systems; for more information on distributed computing,

    readers are encouraged to look at Chapter 108 in this Handbook.

    96.5 Formal Approaches

    We argued above that software verification in concurrent programming must take into account the enor-

    mous number of possible interactions between concurrent processes. Obviously, traditional testing only

demonstrates the presence of good execution histories and is not a mechanism to verify any solution,

    sequential or concurrent. The use of a trace routine to generate execution histories is a standard sequential

technique that becomes infeasible in the concurrent domain. Consider, for example, that n processes, each
executing m atomic actions, generate (n × m)!/(m!)^n possible histories. For three processes, each executing only
two actions, this is a total of 6!/(2!)^3 = 720/8 = 90 possible histories!


The alternative is to use a formal, mathematically rigorous method to develop a solution and/or to verify

    a complete solution. Two approaches have been applied to verifying concurrent software:

    1. Axiomatic or assertional

    2. Process algebraic

    The axiomatic approach develops assertions in the predicate logic that characterize the possible states of a

    computation. The actions of a program are viewed as predicate transformers that move the computation

    from one state to another. The beginning state is specified by the pre-condition of the computation, and

    the final state is characterized by the post-condition. This approach has been exploited for some time in

    the sequential paradigm; see Schneider [1997] for a comprehensive introduction to the field in the context

    of concurrency.

    The process algebraic approach was pioneered by Hoare [1985], who also pioneered the coarse-grained

    model of concurrency. The concept is that the interactions between a system and its environment (which

    are all that is ultimately observable) can be modeled via a mathematical abstraction called a process (this is

    the abstraction of the computing process as used above). Processes can be combined via algebraic laws to

    form systems. Communication between processes is an example of this interaction. By building up a system

    through these mathematical laws and then transforming the abstract mathematics into an implementable

    language, one arrives at a correct solution. The occam language was designed to match the algebraic laws

    devised by Hoare; transformations exist between these laws and occam programming constructs (but

    the transformations are not perfect due to practicalities of implementation) [Hinchey and Jarvis, 1995].

    A number of subsequent efforts developed process algebras with varying properties [Milner, 1989]; see

    Magee and Kramer [1999] for the use of a process algebra in the development of Java programs.

    Although both approaches are in active use, they are not typically applied in the concurrent paradigm

    with any greater frequency than they are in the sequential paradigm, and they remain primarily research

    tools. The fundamental difficulty is that theoreticians search for the fundamental particles of computing

    to develop mathematical laws enabling formal reasoning. Practical languages are (inherently) extremely

    complex mixtures of these fundamental particles and laws in order to have sufficient power to solve

    real-world problems. Theoretical tools do not yet scale to these large, complex problems.

    96.6 Existing Languages with Concurrency Features

    A large number of languages have been developed to use the concurrency paradigm; most have remained

    in the laboratory environment. If the underlying operating system provides the requisite support, then

    semaphores can be implemented in any language via system calls. Higher-level concurrency control struc-

    tures require modification of the underlying sequential language; for example, Concurrent Pascal [Brinch

    Hansen, 1975] uses monitors while Concurrent C [Gehani and Roome, 1986] is based on the rendezvous.

    By beginning with a widely used sequential programming language, a designer has a large community

    from which to draw users to the new language. The Ada (concurrency based upon the rendezvous) and SR

    (which includes structures for all of the approaches discussed in this chapter and is therefore particularly

    useful for exploring concurrent programming) [Andrews and Olsson, 1993; see Hartley, 1995, for extensive

    examples] languages are examples of sequential languages with concurrent structures included from the

    initial stages of development.

    Object-oriented languages have similarly had concurrency features added. For example, Smalltalk has

    the Process and Semaphore classes to provide for the dynamic creation of independent processes and their

    interaction using the semaphore approach [Goldberg and Robson, 1989].

    Languages based on an inherently concurrent model include Linda (more a language-independent

    philosophy than a language) [Ahuja et al., 1986] and occam (synchronous message passing) [Jones and

    Goldsmith, 1988].

    A different approach is to provide a standardized interface (an application program interface or API)

    that is language independent. A language implementation then provides a set of library routines to im-

    plement this API. Thus, programmers can use a language of their choice while being assured that their


program will function correctly. Currently, the two main paradigms that are the basis for writing paral-

    lel programs are message passing and shared memory. A hybrid paradigm is used in systems comprised

    of shared-memory multiprocessor nodes that communicate via message passing. For writing message-

    passing programs, MPI (Message Passing Interface) [http://www-unix.mcs.anl.gov/mpi/index.html] is

a widely used standard; many variants of MPI exist, including MPICH (CH for Chameleon), which

    is a complete, freely-available implementation of the MPI specification, targeted at high performance

    [http://www-unix.mcs.anl.gov/mpi/mpich/].

MPI's interface includes features from a number of message-passing systems and attempts to provide portability and ease of use. The MPI programming model is an MPMD (multiple program, multiple data)

    model, in which every MPI process can execute a different program. A computation is envisioned as one

or more processes that communicate by calling library routines to send messages to and receive messages from other

    processes. In general, a fixed set of processes, one for each processor, is created at program initialization

    (versions of MPI that will support dynamic creation and termination of processes are anticipated). Local

    and global communication (e.g., broadcast and summation) is provided by point-to-point and collective

    communication operations, respectively. The former is used to send messages from one named process

    to another, while the latter is used to provide message passing among a group of processes. Most parallel

    algorithms are readily implemented using MPI. If an algorithm creates just one task per processor, it

    can be implemented directly with point-to-point or collective communication routines that meet its

    communication requirements. In contrast, if tasks are created dynamically or if several tasks are executed

    concurrently on a processor, the algorithm must be refined to permit an MPI implementation.
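As an illustration of this model, the following sketch (our own example, not drawn from the text; the token value and variable names are arbitrary) runs the usual one process per processor: rank 0 sends a token to rank 1 with a point-to-point operation, and all processes then take part in a collective reduction that sums their ranks onto rank 0.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, token = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's identity   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    /* Point-to-point communication between two named processes. */
    if (rank == 0 && size > 1) {
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    }

    /* Collective communication: a global summation delivered to rank 0. */
    int local = rank, sum = 0;
    MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks 0..%d is %d\n", size - 1, sum);

    MPI_Finalize();
    return 0;
}

Such a program is typically compiled with an MPI wrapper (e.g., mpicc) and started with one process per processor by the local job launcher.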

    The OpenMP API is becoming a standard that supports multi-platform shared-memory parallel pro-

    gramming in C/C++ and Fortran on all architectures, including Unix and Windows NT platforms.

    OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and

    flexible interface for developing parallel applications for platforms ranging from the desktop to the super-

    computer [http://www.openmp.org/]. This API is jointly defined by a group of major computer hardware

    and software vendors. OpenMP can be used to explicitly direct multi-threaded, shared memory paral-

    lelism. It is comprised of three primary API components: compiler directives, runtime library routines,

    and environment variables. Using the fork/join model of parallel execution, an OpenMP program begins

    as a single master thread. The master thread creates or forks a set of parallel threads, which concurrently

execute a parallel region construct. On completion, the parallel threads join (i.e., synchronize and

    terminate), leaving only the master thread. The API supports nested parallelism and dynamic threads,

that is, dynamic alteration of the number of active threads. Variable scoping (e.g., declaration of private and shared data), parallelism, and synchronization are specified through the use of compiler

    directives. By itself, OpenMP is not meant for distributed memory parallel systems. For example, for high-

    performance cluster architectures such as the IBM SP, where intranode communication is accomplished

    via shared memory and internode communication is performed via message passing, OpenMP is used

    within a node while MPI is used between nodes.
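A brief sketch of the fork/join model follows (again our own illustration, not taken from the text; the array size and values are arbitrary): the master thread forks a team at the parallel directive, the loop iterations are divided among the team, each thread keeps a private loop index, and the reduction clause combines the partial sums when the threads join.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    enum { N = 1000 };
    double a[N], sum = 0.0;

    /* Fork: a team of threads executes the parallel region; join at its end.
       The loop index is private to each thread; a and sum are shared, with
       the partial sums combined through the reduction clause. */
    #pragma omp parallel for shared(a) reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        sum += a[i];
    }

    printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}

If a non-OpenMP compiler ignores the directive, the loop simply runs sequentially; only the call to omp_get_max_threads ties this sketch to the OpenMP runtime library.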

Many parallel programming tools are available that help the user parallelize an application and then port it easily to a parallel machine, whether a shared-memory machine or a network of workstations.

    96.7 Research Issues

    While it is clear that concurrency is a necessary technique for the solution of many problems, it also is clear

    that progress must be made in order to ensure its effective application. That this is still a research issue

    is clear whenever an operating system crashes due to system processes that interfere with each other or

we discover someone else in our airplane seat due to concurrent access to the airline's database. This required

    progress falls into three categories:

    1. Theoretical advances must be made to develop formal techniques that scale to real-world appli-

    cations. For example, process interference checkers exist, but operate essentially by checking all


possible interactions between processes to detect deadlock and related errors; this approach quickly leads to combinatorial explosion.

    2. Design tools that provide development support for concurrent solutions. For example, debuggers

    that capture the concurrent computation without overwhelming the user with information.

    3. Languages with powerful structures to support the correct application of concurrency. For example,

the development of concurrent object-oriented languages appears straightforward: simply allow

    each object to run concurrently because each object is logically autonomous. However, there are a

    number of issues that need resolution, including:

a. Not all objects need to run concurrently: the majority of the computation will still be sequential, and keeping it sequential avoids unnecessary scheduler overhead.

    b. If we consider multiple concurrent objects attempting to communicate with the same object:

    i. Acceptance of a message must delay all other messages in order to correctly preserve the

    internal state of the object.

    ii. Ordering of message acceptance must be synchronized to ensure computations are correct.

iii. Acceptance of messages must occur only at appropriate points in the object's execution.

    c. Inheritance through the class hierarchy creates problems because it will mix this synchronization

    with object behavior.

    96.8 Summary

    The single outstanding problem with concurrency is the development of correct solutions (as it is in all

    software systems): the state of development of both formal methods and software engineering tools for

    concurrent solutions lags behind the sequential world in this regard and well behind hardware advances.

    Defining Terms

Asynchronous message passing: Message passing in which messages may be buffered, so the sending process may continue after the send is initiated; the receiving process blocks if the message

    queue is empty.

    Channel: The data structure, which may be realized in hardware, over which processes send messages.

    Client/server: The software architecture in which clients are able to request services of processes executing

    on remote machines.

Condition variable: A variable used within a monitor to delay an executing process.

Critical region: A section of code that must appear to be executed indivisibly.

    Deadlock: The state in which processes are waiting for events that can never occur; that is, the processes

    cannot progress.

    Distributed processing: The use of multiple processors that are remote from each other.

Fairness: The property that every process will eventually be able to progress, that is, enter its critical region.

    Message passing: A technique for providing mutual exclusion, communication, and synchronization

    among concurrent processes via sending messages between processes.

    Monitor: An encapsulation of a resource and the operations on that resource that serve to ensure mutual

    exclusion.

    Multiprocessing: The use of multiple processors.

    Multiprogramming: Simulating concurrency by interleaving instruction execution from multiple pro-

    grams; time sharing or time slicing.

    Mutual exclusion: The property ensuring that a critical region is executed indivisibly by one process or

    thread at a time.

    Race: Nondeterministic behavior caused by incorrectly synchronized concurrent processes.

    Remote procedure call: The message-passing architecture in which processes request services of processes

    executing procedures on remote machines.

    Rendezvous: The message-passing construct used in the Ada language.


Semaphore: A nonnegative integer-valued variable on which two operations are defined: P and V, which signal intent to enter and exit, respectively, a critical region.

Synchronous message passing: Message passing in which both sender and receiver must synchronize at the moment of message transmission.

    References

    Journals

Ahuja, S., Carriero, N., and Gelernter, D. 1986. Linda and Friends. Computer, 19(8):26-34.

    Andrews, G. R. and Schneider, F. B. 1983. Concepts and notations for concurrent programming. Comp.

Surv., 15(1):3-43; reprinted in Gehani, N. and McGettrick, A. D. 1988. Concurrent Programming.

    Addison-Wesley, New York.

    Brinch Hansen, P. 1975. The Programming Language Concurrent Pascal. IEEE Trans. on Software Engineer-

ing, 1(2):199-207; reprinted in Gehani, N. and McGettrick, A. D. 1988. Concurrent Programming.

    Addison-Wesley, New York.

Dijkstra, E. W. 1968. The structure of the T. H. E. multiprogramming system. CACM, 11:341-346.

Gehani, N. H. and Roome, W. D. 1986. Concurrent C. Software: Practice and Experience, 16(9):821-844;

    reprinted in Gehani, N. and McGettrick, A. D. 1988. Concurrent Programming. Addison-Wesley,

    New York.

Peterson, G. L. 1983. A new solution to Lamport's concurrent programming problem using small shared variables. ACM Trans. Prog. Lang. and Syst., 5(1):56-65.

    Books

    Andrews, G. R. 2000. Foundations of Multithreaded, Parallel, and Distributed Programming. Benjamin-

    Cummings, New York.

    Andrews, G. R. and Olsson, R. A. 1993. The SR Programming Language. Benjamin-Cummings, New York.

    Ben-Ari, M. 1982. Principles of Concurrent Programming. Prentice Hall, London.

    Ben-Ari, M. 1990. Principles of Concurrent and Distributed Programming. Prentice Hall, London.

    Bernstein, A. J. and Lewis, P. M. 1993. Concurrency in Programming and Database Systems. Jones and

    Bartlett, Boston.

    Filman, R. E. and Friedman, D. P. 1984. Coordinated Computing. McGraw-Hill, New York.

    Gehani, N. and McGettrick, A. D. 1988. Concurrent Programming. Addison-Wesley, New York.

Goldberg, A. and Robson, D. 1989. Smalltalk-80: The Language. Addison-Wesley, New York.

    Hartley, S. J. 1995. Operating Systems Programming. Oxford, New York.

    Hinchey, M. G. and Jarvis, S. A. 1995. The CSP Reference Book. McGraw-Hill, New York.

    Hoare, C. A. R. 1985. Communicating Sequential Processes. Prentice Hall, London.

    Jones, G. and Goldsmith, M. 1988. Programming occam 2. Prentice Hall, New York.

    Lester, B. P. 1993. The Art of Parallel Programming. Prentice Hall, New Jersey.

    Magee, J. and Kramer, J. 1999. Concurrency: State Models and Java Programs. Wiley, West Sussex.

    Milner, R. 1989. Communication and Concurrency. Addison-Wesley, New York.

Schneider, F. B. 1997. On Concurrent Programming. Springer-Verlag, New York.

    Wilkinson, B. and Allen, M. 1999. Parallel Programming: Techniques and Applications Using Networked

    Workstations and Parallel Computers. Prentice Hall, New Jersey.

    Further Information

    Further information can be gleaned from a number of sources; particularly recommended are Andrews

    [2000] for a comprehensive view of the field with an axiomatic flair, including a fascinating bibliography

    with historical notes and extensive problem sets; Schneider [1997] for a graduate level treatise on axiomatic

    semantics in the context of concurrency; Ben-Ari [1982] for a nice introduction including problem sets;

    Ben-Ari [1990], which adds Ada code examples, correctness arguments, and distributed computing; the


process algebra approach is developed in Hoare [1985] and Milner [1989] and demonstrated in Magee and

    Kramer [1999]; Filman and Friedman [1984] emphasize the various models of concurrent computation;

    Lester [1993] provides a comprehensive introduction including efficiency considerations, but without

    correctness arguments; Bernstein and Lewis [1993] use the axiomatic approach to develop concurrent

    solutions to a variety of problems with an emphasis on databases; Gehani and McGettrick [1988] reprint a

    number of the classic papers in the field. Wilkinson and Allen [1999] demonstrate parallel programming

    for a wide range of problems.

    The journal Concurrency: Practice and Experience focuses on practical experience with concurrent ma-

    chines and concurrent solutions to problems; concurrency is also frequently dealt with in a large number

    of society journals.

In addition, a large number of resources are available on the Web and can be found using standard search techniques.
