
A Fast, Scalable Mutual Exclusion Algorithm*

Jae-Heon Yang

Department of Mathematics and Computer Science

Mills College

Oakland, California 94613

James H. Anderson

Department of Computer Science

The University of North Carolina at Chapel Hill

Chapel Hill, North Carolina 27599-3175

October 1993

Revised November 1994

Abstract

This paper is concerned with synchronization under read/write atomicity in shared memory multiprocessors. We present a new algorithm for N-process mutual exclusion that requires only read and write operations and that has O(log N) time complexity, where "time" is measured by counting remote memory references. The time complexity of this algorithm is better than that of all prior solutions to the mutual exclusion problem that are based upon atomic read and write instructions; in fact, the time complexity of most prior solutions is unbounded. Performance studies are presented that show that our mutual exclusion algorithm exhibits scalable performance under heavy contention. In fact, its performance rivals that of the fastest queue-based spin locks based on strong primitives such as compare-and-swap and fetch-and-add. We also present a modified version of our algorithm that generates only O(1) memory references in the absence of contention.

Keywords: Fast mutual exclusion, local spinning, mutual exclusion, read/write atomicity, scalability,

synchronization primitives, time complexity.

CR Categories: D.4.1, D.4.2, F.3.1

*Preliminary version was presented at the Twelfth Annual ACM Symposium on Principles of Distributed Computing, Ithaca, New York, August 1993. Work supported, in part, by NSF Contracts CCR-9109497 and CCR-9216421 and by the Center for Excellence in Space Data and Information Sciences (CESDIS).

1 Introduction

The mutual exclusion problem is a paradigm for resolving conflicting accesses to shared resources and has been studied for many years, dating back to the seminal paper of Dijkstra [6]. In this problem, each of a set of processes repeatedly executes a program fragment known as its "critical section". Before and after executing its critical section, a process executes two other program fragments, its "entry section" and "exit section", respectively. The entry and exit sections must be designed so that at most one process executes its critical section at any time, and so that each process in its entry section eventually executes its critical section. The former is known as the mutual exclusion property, and the latter as the starvation-freedom property. In some variants of the problem, starvation-freedom is replaced by the weaker requirement of livelock-freedom: if some process is in its entry section, then some process eventually executes its critical section.

Most early solutions to the mutual exclusion problem required only minimal hardware support, specifically atomic read and write instructions. Although of theoretical importance, most such algorithms were judged to be impractical from a performance standpoint, leading to the development of solutions requiring stronger hardware support such as read-modify-write operations. The poor performance of early read/write algorithms stems partially from two factors. First, such algorithms are not scalable, i.e., performance degrades dramatically as the number of contending processes increases. Second, even in the absence of contention, such algorithms require a process contending for its critical section to execute many operations.

The second of these two problems has been subsequently addressed, specifically by Lamport in [9], where a read/write algorithm is given that requires only a constant number of operations per critical section acquisition in the absence of contention. Following the title of Lamport's paper, such algorithms have come to be known simply as "fast mutual exclusion algorithms". This designation is somewhat of a misnomer, as such algorithms are not necessarily fast in the presence of contention. In fact, the problem of designing a scalable mutual exclusion algorithm that requires only read/write atomicity, and that exhibits good performance under contention, has remained open. In this paper, we present such an algorithm.

Many mutual exclusion algorithms that exhibit poor scalability do so, in part, because they require processes to busy-wait on shared variables that are remotely-accessible, i.e., that require a traversal of the global interconnect between processors and shared memory when accessed. Recently, several queue-based mutual exclusion algorithms based on primitives that are stronger than reads and writes have been proposed in which this type of busy-waiting is avoided [3, 7, 11]. These algorithms are based on the idea of "local spinning". In particular, processes busy-wait only on locally accessible shared variables that do not require a traversal of the global interconnect when accessed. Performance studies presented in [3, 7, 11] have established the importance of local spinning in ensuring good performance under heavy contention.

Although the notion of a locally accessible shared variable may at first seem counterintuitive, there are two architectural paradigms that support it. In particular, on distributed shared memory machines, a shared variable can be made locally accessible by storing it in a local portion of shared memory, and on cache-coherent machines, programs can be structured so that when a process busy-waits on a shared variable, that variable migrates to a local cache line. Note that, because distributed shared memory machines require static allocation of shared variables to processors, an algorithm that locally spins on such a machine will also locally spin on cache-coherent machines. The reverse is not necessarily true. For this reason, we adopt the stricter distributed shared memory model when distinguishing between local and remote shared memory accesses in the time complexity computations given in this paper. Alternative definitions of local and remote memory accesses based on cache-coherence are briefly considered in Section 6.
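To make the distinction concrete, the following C sketch (our illustration, not code from the paper) contrasts the two kinds of busy-waiting under the distributed shared memory model; alloc_local is a hypothetical, machine-specific allocator that places a variable in the local portion of shared memory belonging to a given processor.

#include <stdatomic.h>

/* Hypothetical, machine-specific allocator: returns a flag stored in the
   local portion of shared memory belonging to processor proc. */
extern atomic_int *alloc_local(int proc);

/* Global spinning: the flag lives in another processor's memory module, so
   every iteration traverses the global interconnect; the number of remote
   references generated while waiting is unbounded. */
void global_spin(atomic_int *remote_flag)
{
    while (atomic_load(remote_flag) == 0)
        ; /* each load is a remote memory reference */
}

/* Local spinning: the waiting process spins on a flag in its own memory
   module and generates no remote references; the releasing process performs
   a single remote write. */
void local_spin(atomic_int *my_flag)
{
    while (atomic_load(my_flag) == 0)
        ; /* each load is satisfied locally */
}

void release(atomic_int *their_flag)
{
    atomic_store(their_flag, 1); /* one remote write */
}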

In a recent paper [2], Anderson presented a mutual exclusion algorithm that uses only local spins and that requires only atomic read and write operations. In his algorithm, each of N processes executes O(N) remote operations to enter its critical section whether there is contention or not. All other previously published mutual exclusion algorithms that are based on atomic reads and writes employ global busy-waiting and hence induce an unbounded number of remote operations under heavy contention. Most such algorithms also require O(N) remote operations in the absence of contention. Some exceptions to the latter include the algorithm given by Kessels in [8] and the previously mentioned one given by Lamport in [9]. Kessels' algorithm generates O(log N) remote operations in the absence of contention, while Lamport's generates O(1). A variant of Lamport's algorithm was recently presented by Styer in [14]. Although Styer claims that his algorithm is more scalable than Lamport's, in terms of time complexity, they are actually very similar: both generate unbounded remote operations under heavy contention and O(1) operations in the absence of contention. Styer's claims of scalability are predicated upon complexity calculations that ignore operations performed within busy-waiting constructs. Because the processes in his algorithm busy-wait on remote variables, such complexity calculations do not give a true indication of scalability. Another recent algorithm of interest is one given by Michael and Scott in [12]. Although this algorithm generates O(N) remote memory references in the presence of contention and O(1) in the absence of contention, it requires both full- and half-word reads and writes to memory, which is a level of hardware support more powerful than ordinary read/write atomicity.

In this paper, we present a new mutual exclusion algorithm that requires only atomic reads and writes and in which all spins are local. Our algorithm induces O(log N) remote operations under any amount of contention, and thus is an improvement over the algorithm given by Anderson in [2]. We also present a modified version of this algorithm, called the "fast-path" algorithm, that requires only O(1) remote operations in the absence of contention. Unfortunately, in the fast-path algorithm, worst-case complexity rises to O(N). However, we argue that this O(N) behavior is rare, occurring only when transiting from a period of high contention to a period of low contention. Under high contention, the fast-path algorithm induces only O(log N) remote operations. It is worth noting that our algorithm and its variation are starvation-free, whereas some of the aforementioned algorithms are not.

The results of this paper suggest some interesting questions regarding the inherent complexity of synchronization. The queue-based algorithms in [3, 7, 11] require O(1) remote operations, whether there is contention or not (actually the algorithms in [3, 7] have O(1) complexity only if coherent caches are provided), whereas our algorithm requires O(log N) remote operations. It is natural to ask whether this gap is due to a fundamental weakness of atomic reads and writes, or whether there is a mutual exclusion algorithm based on such operations that has O(1) complexity (which seems doubtful). A significant step towards answering this question was recently presented by us in [16]. This new result involves programs with limited "write-contention". The write-contention of a concurrent program is the number of processes that may be simultaneously enabled to write the same shared variable. It is shown in [16] that for any N-process mutual exclusion algorithm with write-contention K, there is an execution involving only one process in which that process executes at least Ω(log_K N) remote memory references in its entry section. This lower bound holds assuming any of a variety of synchronization primitives, including read, write, test-and-set, load-and-store, compare-and-swap, and fetch-and-add. Our mutual exclusion algorithm is contained within the class of algorithms based on such primitives that have constant write-contention. Hence, our algorithm is asymptotically optimal for this class. The execution that establishes the Ω(log_K N) lower bound in [16] involves only one process, implying that fast mutual exclusion requires high write-contention (i.e., K must approach N). All fast mutual exclusion algorithms for N processes presented to date (including the fast-path algorithm presented in this paper) have write-contention N.

Despite the significance of the lower bound proved in [16], we still do not know whether the apparent gap between mutual exclusion using reads and writes and mutual exclusion using stronger primitives is fundamental. If the lower bound for mutual exclusion under read/write atomicity turns out to be Ω(log N), then an additional question arises: namely, is it possible to develop a mutual exclusion algorithm under read/write atomicity that requires O(1) remote operations in the absence of contention and O(log N) remote operations in the presence of contention? As mentioned above, our fast-path algorithm almost achieves such bounds.

The rest of the paper is organized as follows. In Section 2, we present our model of concurrent programs.

The above-mentioned mutual exclusion algorithm is then presented in Section 3. In Section 4, we consider the

fast-path algorithm discussed above. In Section 5, we present results from performance studies conducted on

the BBN TC2000 and Sequent Symmetry multiprocessors. These studies indicate that our mutual exclusion

algorithm exhibits scalable performance under heavy contention. In fact, the performance of our algorithm

rivals (and sometimes beats) that of the queue-based algorithms mentioned above. We end the paper with

concluding remarks in Section 6. Correctness proofs for our algorithms are given in an appendix.

2 Definitions

In this section, we present our model of concurrent programs and define the relations used in reasoning about such programs. A concurrent program consists of a set of processes and a set of variables. A process is a sequential program consisting of labeled statements. Each variable of a concurrent program is either private or shared. A private variable is defined only within the scope of a single process, whereas a shared variable is defined globally and may be accessed by more than one process. Each process of a concurrent program has a special private variable called its program counter: the statement with label k in process p may be executed only when the value of the program counter of p equals k. For an example of the syntax we employ for programs, see Figure 1.

A program's semantics is defined by its set of "fair histories". The definition of a fair history, which is given below, formalizes the requirement that each statement of a program is subject to weak fairness. Before giving the definition of a fair history, we introduce a number of other concepts; all of these definitions apply to a given concurrent program.

A state is an assignment of values to the variables of the program. One or more states are designated as initial states. If state u can be reached from state t via the execution of statement s, then we say that s is enabled at state t and we write t →s u. If statement s is not enabled at state t, then we say that s is disabled at t. A history is a sequence t0 →s0 t1 →s1 ···, where t0 is an initial state. A history may be either finite or infinite; in the former case, it is required that no statement be enabled at the last state of the history. A history is fair if it is finite or if it is infinite and each statement is either disabled at infinitely many states of the history or is infinitely often executed in the history. Note that this fairness requirement implies that each continuously enabled statement is eventually executed. Unless otherwise noted, we henceforth assume that all histories are fair.

With regard to complexity, we assume that each shared variable is local to at most one process and is remote to all other processes. This assumption is reflective of a distributed shared memory model. We refer to a statement execution as an operation. An operation is remote if it accesses remote variables, and is local otherwise.

Following [5], we define safety properties using unless assertions and progress properties using leads-to assertions. Consider two predicates P and Q over the variables of a program. The assertion P unless Q holds iff for any pair of consecutive states in any history of the program, if P ∧ ¬Q holds in the first state, then P ∨ Q holds in the second. If predicate P is initially true and if P unless false holds, then predicate P is said to be an invariant. We say that predicate P leads-to predicate Q, denoted P ↦ Q, iff for each history t0 →s0 t1 →s1 ··· of the program, if P is true at some state ti, then Q is true at some state tj where j ≥ i.
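As a toy illustration of these relations (our example, not one used in the proofs of this paper): suppose the only statement of a program that modifies a variable x is s: x := x + 1, and that s is enabled exactly when x = 5. Then

(x = 5) unless (x = 6)
(x = 5) ↦ (x = 6)

The unless assertion holds because every step either leaves x = 5 true or, by executing s, establishes x = 6; the leads-to assertion holds because s is continuously enabled whenever x = 5, so weak fairness guarantees its eventual execution.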

3 Mutual Exclusion Algorithm

In this section, we present our mutual exclusion algorithm. We begin by stating more precisely the conditions that must be satisfied by such an algorithm. In the mutual exclusion problem, there are N processes, each of which has the following structure.

while true do

Noncritical Section;

Entry Section;

Critical Section;

Exit Section

od

It is assumed that each process begins execution in its noncritical section. It is further assumed that

each critical section execution terminates. By contrast, a process is allowed to halt in its noncritical section.

No variable appearing in any entry or exit section may be referred to in any noncritical or critical section

(except, of course, program counters). A program that solves this problem is required to satisfy the mutual

exclusion and starvation-freedom properties, given earlier in Section 1. We also require that each process in

its exit section eventually enters its noncritical section; this requirement is trivially satisfied by our solution

(and most others), so we will not consider it further.

As in [8], we first solve the mutual exclusion problem for two processes, and then apply our two-process solution in a binary arbitration tree to get an N-process solution. Our algorithm is depicted in Figure 1. In order to obtain an N-process algorithm, we associate each link in a binary tree with an entry section and

an exit section of the two-process solution. In other words, the entry and exit sections associated with the

two links connecting a given node to its subtree constitute a two-process mutual exclusion algorithm. This

is depicted in Figure 2. Initially, all processes start at the leaves of the tree. To enter its critical section, a

process is required to traverse a path from its leaf up to the root, executing the entry section of each link

on this path. Upon exiting its critical section, a process traverses this path in reverse, this time executing

the exit section of each link.
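To make this traversal concrete, the following C sketch (our own illustration, not code from the paper) numbers the internal nodes of a balanced binary arbitration tree heap-style, with the root at index 1 and the leaf of process i at index N + i; node_entry and node_exit are placeholders for the two-process entry and exit sections associated with a node's links, and N is assumed to be a power of two.

#define N 8 /* number of processes; assumed a power of two for simplicity */

/* Placeholders for the two-process entry/exit sections at a tree node;
   side is 0 if the caller arrives from the node's left subtree, 1 otherwise. */
extern void node_entry(int node, int side);
extern void node_exit(int node, int side);

/* Entry: walk from process i's leaf up to the root, executing the entry
   section of each link; O(log N) links are traversed. */
void tree_lock(int i)
{
    int node = (N + i) / 2; /* parent of leaf N + i */
    int side = i % 2;       /* which child the leaf is */
    while (node >= 1) {
        node_entry(node, side);
        side = node % 2;
        node = node / 2;
    }
}

/* Exit: retrace the same path in reverse, executing the exit section of
   each link from the root back down to the leaf. */
void tree_unlock(int i)
{
    int path[32], sides[32], depth = 0;
    int node = (N + i) / 2, side = i % 2;
    while (node >= 1) { /* recompute the leaf-to-root path */
        path[depth] = node;
        sides[depth] = side;
        depth++;
        side = node % 2;
        node = node / 2;
    }
    while (depth-- > 0) /* exit links in reverse (top-down) order */
        node_exit(path[depth], sides[depth]);
}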

The first mutual exclusion algorithm that used a binary arbitration tree is that given by Peterson and Fischer in [13]. However, in this algorithm, a process that reaches the top of a left subtree checks all leaves of the corresponding right subtree, resulting in O(N) remote memory references outside of busy-waiting loops. The structure of our arbitration tree is inherited from Kessels's solution in [8], which induces only O(log N) memory references outside of busy-waiting loops.

In Figure 1, our two-process mutual exclusion algorithm (from statement 4 to statement 19) is placed "on top" of two (N/2)-process versions of our algorithm. In this figure, ENTRY_LEFT and EXIT_LEFT, denoting the entry and exit sections of the (N/2)-process version of our algorithm, enforce mutual exclusion in the left subtree. Similarly, ENTRY_RIGHT and EXIT_RIGHT are used in the right subtree. We depict the algorithm in this way in order to hide the details relating to how variables are named in the arbitration tree.

The always section [5] in Figure 1 is used to define the expression side(i). Informally, side(i) = 0 iff process i is a process from the left subtree, and side(i) = 1 iff process i is a process from the right subtree.

shared var C : array[0..1] of -1..N-1;
           P : array[0..N-1] of 0..2;
           T : 0..N-1
initially C[0] = -1 ∧ C[1] = -1 ∧ (∀i :: P[i] = 0)
always (∀i : i < ⌊N/2⌋ :: side(i) = 0) ∧ (∀i : i ≥ ⌊N/2⌋ :: side(i) = 1)

process i
private var rival : -1..N-1;
while true do
 0: Noncritical Section;
 1: if i < ⌊N/2⌋ then
 2:     ENTRY_LEFT
    else
 3:     ENTRY_RIGHT
    fi;
 4: C[side(i)] := i;    /* side(i) = 0 ∨ side(i) = 1 */
 5: T := i;
 6: P[i] := 0;
 7: rival := C[1 - side(i)];
 8: if rival ≠ -1 then
 9:     if T = i then
10:         if P[rival] = 0 then
11:             P[rival] := 1 fi;
12:         while P[i] = 0 do /* null */ od;
13:         if T = i then
14:             while P[i] ≤ 1 do /* null */ od fi
        fi
    fi;
15: Critical Section;
16: C[side(i)] := -1;
17: rival := T;
18: if rival ≠ i then
19:     P[rival] := 2 fi;
20: if i < ⌊N/2⌋ then
21:     EXIT_LEFT
    else
22:     EXIT_RIGHT
    fi
od

Figure 1: N-process mutual exclusion algorithm.

[Figure: the entry sections ENTRY_LEFT and ENTRY_RIGHT of the two (N/2)-process instances feed into statements 4..14, executed with side(i) = 0 and side(i) = 1 respectively, which together form the N-process entry section.]

Figure 2: Entry section of our N-process mutual exclusion algorithm.

The expression 1 - side(i) is used to identify the C-variable of process i's competitor. We assume that process identifiers range over {0..N-1}.

The algorithm employs shared variables, C[0], C[1], and T, in addition to a spin location P[i] for each process i. Variable C[0] is used by a process from the left subtree to inform processes in the right subtree of its intent to enter its critical section. Observe that C[0] = i ≠ -1 holds while a process i from the left subtree executes its statements 5 through 16, and C[0] = -1 holds when no process from the left subtree executes these statements. Variable C[1] is used similarly. Variable T is used as a tie-breaker in the event that a process from the left subtree and a process from the right subtree attempt to enter their critical sections at the same time. The algorithm ensures that the two processes enter their critical sections according to the order in which they update T. Variable P[i] ranges over {0, 1, 2} and is used by process i whenever it needs to busy-wait. Note that P[i] is waited on only by process i, and thus can be stored in a memory location that is locally accessible to process i (in which case all spins are local).

Loosely speaking, the algorithm works as follows. Let process i be a process from the left subtree. When process i wants to enter its critical section, it informs processes in the right subtree of its intention by establishing C[0] = i. Then, process i assigns its identifier i to the tie-breaker variable T, and initializes its spin location P[i]. If no process in the right subtree has shown interest in entering its critical section, in other words, if C[1] = -1 holds when i executes statement 7, then process i proceeds directly to its critical section. Otherwise, i reads the tie-breaker variable T. If T = j ≠ i (i.e., if process j from the right subtree is competing against process i), then i can enter its critical section, as the algorithm prohibits process j from entering its critical section when C[0] = i ∧ T = j holds (recall that ties are broken in favor of the first process to update T). If T = i holds, then either process j executed statement 5 before process i, or process j has executed statement 4 but not statement 5. In the first case, i should wait until j exits its critical section, whereas, in the second case, i should be able to proceed to its critical section. This ambiguity is resolved by having process i execute statements 10 through 14. Statements 10 and 11 are executed by process i to release process j in the event that it is waiting for i to update the tie-breaker variable (i.e., j is busy-waiting at statement 12). Statements 12 through 14 are executed by i to determine which process updated the tie-breaker variable first. Note that P[i] ≥ 1 implies that j has already updated the tie-breaker, and P[i] = 2 implies that j has finished its critical section. To handle these two cases, process i first waits until P[i] ≥ 1 (i.e., until j has updated the tie-breaker), re-examines T to see which process updated T last, and finally, if necessary, waits until P[i] = 2 (i.e., until process j finishes its critical section).

After executing its critical section, process i informs process j that it is finished by establishing C[0] = -1. If T = j, in which case process j is waiting to enter its critical section, then process i updates P[j] in order to terminate j's busy-waiting loop.

With regard to complexity, note that if variable P[i] is local to process i in the two-process algorithm, then process i executes a constant number of remote operations in its two-process entry and exit sections. It follows that, in an N-process algorithm that employs a balanced binary tree, each process executes O(log N) remote operations in its (N-process) entry and exit sections.
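For concreteness, here is one possible C rendering of the two-process entry and exit sections (statements 4 through 19 of Figure 1), written for a single instance shared by processes 0 and 1. This is our own simplified sketch: plain volatile variables stand in for sequentially consistent shared memory (a real port would need the fences or atomics appropriate to the target machine), and P[i] would be allocated in process i's local memory module.

#define NPROCS 2 /* the two processes sharing this instance: i is 0 or 1 */

volatile int C[2] = { -1, -1 }; /* per-side announcement; -1 = absent */
volatile int T;                 /* tie-breaker */
volatile int P[NPROCS];         /* P[i]: spin location of process i, 0..2 */

void entry2(int i, int side)    /* statements 4..14 of Figure 1 */
{
    int rival;
    C[side] = i;                /*  4: announce intent to enter */
    T = i;                      /*  5: update the tie-breaker */
    P[i] = 0;                   /*  6: reset own spin location */
    rival = C[1 - side];        /*  7: look for a competitor */
    if (rival != -1) {          /*  8: the other side is competing */
        if (T == i) {           /*  9: tie must be resolved */
            if (P[rival] == 0)  /* 10 */
                P[rival] = 1;   /* 11: release the rival's first wait */
            while (P[i] == 0)   /* 12: wait until the rival has updated T */
                ;
            if (T == i)         /* 13: the rival updated T first, so it wins */
                while (P[i] <= 1) /* 14: wait until the rival exits */
                    ;
        }
    }
}

void exit2(int i, int side)     /* statements 16..19 of Figure 1 */
{
    int rival;
    C[side] = -1;               /* 16: withdraw the announcement */
    rival = T;                  /* 17 */
    if (rival != i)             /* 18: the other process is waiting */
        P[rival] = 2;           /* 19: let it into its critical section */
}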

4 Fast Mutual Exclusion in the Absence of Contention

As discussed in Section 1, most early mutual exclusion algorithms based on read/write atomicity are neither fast in the absence of contention, nor able to cope with high contention. Because Lamport's fast mutual exclusion algorithm induces O(1) remote operations in the absence of contention, and our mutual exclusion algorithm requires O(log N) remote operations given any level of contention, it seems reasonable to expect a solution to exist that induces O(1) remote operations when contention is absent, and O(log N) remote operations when contention is high.

The fast-path algorithm given in Figure 3 almost achieves that goal. The basic idea, illustrated in Figure 4, is to combine Lamport's fast mutual exclusion algorithm and our algorithm, specifically by placing an extra two-process version of our algorithm "on top" of the arbitration tree. The "left" entry section of this extra two-process program is executed by a process if that process detects no contention. The "right" entry section of this extra program is executed by the winning process from the arbitration tree. A process will compete within the arbitration tree (as before) if it detects any contention. As seen in Figure 3, the scheme used to detect contention is similar to that used in Lamport's algorithm. In this figure, we use ENTRY_k and EXIT_k to denote the entry and exit sections of the k-process version of our algorithm. A correctness proof for the algorithm of Figure 3 is given in the Appendix.

It should be clear that, in the absence of contention, a process enters its critical section after executing O(1) remote operations. Also, in the presence of contention, a process enters its critical section after executing O(log N) remote operations. However, when a period of contention ends, N remote operations might be required in order to re-open the fast entry section (see the while loop at line 22 in Figure 3). Nonetheless, performance studies we have done show that, under high contention, these statements are rarely executed. (Under low contention, they are obviously never executed.) For example, out of the 100,000 critical section executions in one experiment, these N statements were performed after only 55 critical section executions in the four-process case, and after only one in the eight- and sixteen-process cases.

In the absence of contention, our algorithm generates about twice as many remote memory operations

as Lamport's. However, under high contention, our algorithm is clearly superior, as Lamport's induces an

unbounded number of remote operations. Also, our fast-path algorithm ensures starvation-freedom, whereas

Lamport's algorithm does not.
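For concreteness, here is a C sketch of the contention-detection and re-opening steps of Figure 3 (our own simplified rendering: plain volatile variables stand in for sequentially consistent shared memory, and the caller is responsible for the ENTRY/EXIT sections, the critical section, and for clearing B[i] at statement 17).

#define NPROCS 16 /* example value of N; process ids are 0..NPROCS-1 */

volatile int X, Y = -1; /* Lamport-style contention detectors */
volatile int Z = 0;     /* true while the fast path is closed */
volatile int B[NPROCS]; /* B[i]: process i may be competing */

int try_fast_path(int i) /* statements 1..7 of Figure 3 */
{
    X = i;                 /* 1 */
    if (Y != -1) return 0; /* 2: contention, take the slow path */
    Y = i;                 /* 3 */
    if (X != i) return 0;  /* 4: overtaken, take the slow path */
    B[i] = 1;              /* 5 */
    if (Z) return 0;       /* 6: fast path closed; B[i] cleared at 17 */
    if (Y != i) return 0;  /* 7: lost the race; B[i] cleared at 17 */
    return 1;              /* enter via the extra two-process lock */
}

void reopen_fast_path(int i) /* statements 18..26 of Figure 3 */
{
    if (X == i) {          /* 18: possibly the last contender */
        int n, flag = 1;   /* 20, 21 */
        Z = 1;             /* 19: close the fast path while scanning */
        for (n = 0; n < NPROCS; n++) /* 22..24: the O(N) scan */
            if (B[n])
                flag = 0;  /* 23: someone is still competing */
        if (flag)
            Y = -1;        /* 25: re-open the fast path */
        Z = 0;             /* 26 */
    }
}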

5 Performance Results

To compare the scalability of our mutual exclusion algorithm with that of other algorithms, we conducted a

number of experiments on the BBN TC2000 and Sequent Symmetry multiprocessors. Results from some of

these experiments are presented in this section.

shared var B : array[0..N-1] of boolean;
           X, Y : -1..N-1;
           Z : boolean
initially Y = -1 ∧ Z = false ∧ (∀i :: B[i] = false)

process i
private var flag : boolean;
            n : 0..N
while true do
TOP:   0: Noncritical Section;
       1: X := i;
       2: if Y ≠ -1 then goto SLOW fi;
       3: Y := i;
       4: if X ≠ i then goto SLOW fi;
       5: B[i] := true;
       6: if Z then goto SLOW fi;
       7: if Y ≠ i then goto SLOW fi;
       8: ENTRY2;     /* Two-Process Entry Section */
       9: Critical Section;
      10: EXIT2;      /* Two-Process Exit Section */
      11: Y := -1;
      12: B[i] := false;
      13: goto TOP;
SLOW: 14: ENTRYN;     /* Arbitration Tree */
      15: ENTRY2;     /* Two-Process Entry Section */
      16: Critical Section;
      17: B[i] := false;
      18: if X = i then
      19:     Z := true;
      20:     flag := true;
      21:     n := 0;
      22:     while n < N do
      23:         if B[n] then flag := false fi;
      24:         n := n + 1
              od;
      25:     if flag then Y := -1 fi;
      26:     Z := false
          fi;
      27: EXIT2;      /* Two-Process Exit Section */
      28: EXITN       /* Arbitration Tree */
od

Figure 3: Fast, scalable mutual exclusion algorithm.

[Figure: the two branches are labeled "Fast (Lamport)" and "Scalable (Our algorithm)".]

Figure 4: Providing a fast path.

BBN TC2000

The BBN TC2000 is a distributed shared memory multiprocessor, each node of which contains a processor and a memory unit. The nodes are connected via a multi-stage interconnection network, known as the Butterfly switch. Each access to a remote memory location (i.e., one that requires a traversal of the interconnection network) takes about 2 microseconds, whereas each local reference takes about 0.6 microseconds. Each node's processor, a Motorola 88100, provides an atomic fetch-and-store instruction called xmem. Other strong primitives such as compare-and-swap and fetch-and-add are provided using the TC2000 hardware locking protocol [4]. The TC2000 has cache memory, but does not provide a cache coherence mechanism.

We tested seven mutual exclusion algorithms on the TC2000: a simple test-and-set algorithm; the queue-based algorithm using compare-and-swap given by Mellor-Crummey and Scott in [11]; the queue-based algorithm using fetch-and-add given by T. Anderson in [3]; the fast mutual exclusion algorithm given by Lamport in [9]; the tree-based algorithm given by Styer in [14]; the tree-based algorithm given by Peterson and Fischer in [13]; and the mutual exclusion algorithm described in Section 3. Performance results obtained by running these seven algorithms on the TC2000 are summarized in Figure 5. Each point (x, y) in each graph represents the average time y for one critical section execution with x competing processors. The timing results summarized in the graph were obtained by averaging over 10^5 critical section executions. The critical section consists of a read and an increment of a shared counter. Results obtained using larger critical sections, which for brevity are not presented here, show similar performance to that depicted in Figure 5. The timing results presented include the execution time of critical sections.

The performance of the test-and-set algorithm is given by the graph labeled T&S, Mellor-Crummey and Scott's algorithm by the graph labeled MCS, T. Anderson's algorithm by the graph labeled AND, Lamport's algorithm by the graph labeled LAMP, Styer's algorithm by the graph labeled STYER, Peterson and Fischer's algorithm by the graph labeled PF, and our algorithm by the graph labeled YA.

[Figure: average time per critical section execution (microseconds, 0 to 200) versus number of processors (0 to 64) for the LAMP, T&S, PF, STYER, AND, YA, and MCS algorithms.]

Figure 5: TC2000, small critical section.

On the TC2000, the MCS

algorithm was the best overall performer of the alternatives considered here. The graph depicted for the

MCS algorithm is mostly flat, except at the point for two processors. This anomaly at two processors

coincides with results reported by Mellor-Crummey and Scott on the Sequent Symmetry, and was attributed

by them to the lack of a compare-and-swap instruction on the Symmetry [11]. As our implementation of

their algorithm did employ compare-and-swap, we have not found a satisfying explanation for this behavior

on the TC2000.

T. Anderson's algorithm requires only local spinning when implemented on a machine with coherent

caches. On the Symmetry, where each process can spin on its own coherent cache, Anderson's algorithm

outperforms the MCS algorithm. However, on the TC2000, which does not support coherent caching,

Anderson's algorithm requires remote spinning, slowing its performance.

The simple T&S algorithm exhibited poor scalability. The average execution time for the 64-processor case, which is not depicted in Figure 5, is about 330 microseconds. Where contention among a large number of processors is possible, the test-and-set algorithm should be avoided, or used with a good backoff scheme [1].

Three algorithms based on atomic reads and writes (Lamport's, Peterson and Fischer's, and Styer's) also showed poor scalability. In particular, the performance of Lamport's algorithm degrades dramatically as the number of contenders increases. The average execution time for the 64-processor case, which is not depicted in Figure 5, is about 4000 microseconds. The performance of Styer's algorithm, which is better than that of Lamport's, is due to the tree structure employed. Styer's algorithm generates O(log N) remote operations outside of busy-waiting loops. Even though Peterson and Fischer's algorithm is also tree-based, it induces O(N) remote operations outside of busy-waiting loops, which results in poorer scalability.

Our mutual exclusion algorithm shows performance that is comparable to that of T. Anderson's and Mellor-Crummey and Scott's algorithms. Its good scalability emphasizes the importance of local spinning. The difference seen between our mutual exclusion algorithm and the MCS algorithm is explained by the amount of global traffic generated by each algorithm. The MCS algorithm generates O(1) remote operations per critical section execution, whereas ours generates O(log N). The global traffic of the other five algorithms is unbounded, as each employs global spinning. The performance of T. Anderson's algorithm is far better than that of the simple test-and-set algorithm. Because the processes in Anderson's algorithm spin globally on the TC2000, this might be interpreted as a counterexample to our belief that minimizing remote operations is important for good scalability. However, Mellor-Crummey and Scott reported in [11] that Anderson's algorithm produced far fewer remote operations than the test-and-set algorithm.

Sequent Symmetry

Performance results of experiments on the Sequent Symmetry are summarized in Figure 6. The Sequent Symmetry is a shared memory multiprocessor whose processor and memory nodes are interconnected via a shared bus. A processor node consists of an Intel 80386 and a 64 Kbyte, two-way set-associative cache. Cache coherence is maintained by a snoopy protocol. The Symmetry provides an atomic fetch-and-store instruction. Because other strong primitives are not provided, we used a version of Mellor-Crummey and Scott's algorithm that is implemented with fetch-and-store and that does not ensure starvation-freedom [11]. Fetch-and-add, which is used in T. Anderson's algorithm, was simulated by a test-and-set algorithm with randomized backoff, as Anderson did in [3].

[Figure: average time per critical section execution (microseconds, 10 to 100) versus number of processors (5 to 15) for the LAMP, PF, STYER, T&S, AND, YA, and MCS algorithms.]

Figure 6: Symmetry, small critical section.

The experiments on the Symmetry show similar results to those for the TC2000. However, on the Symmetry, T. Anderson's algorithm has the best overall performance, mainly because the availability of coherent caches makes all spins in his algorithm local. The performance of Lamport's algorithm on the Symmetry is

far better than that on the TC2000. This seems partly due to the fact that his algorithm is not starvation-free. Specifically, when a process enters its critical section, it can keep all needed variables in its own cache and repeatedly enter its critical section, without yielding to the other processes. In one of our tests for the two-process case, one process executed 50,000 critical sections during a period of time in which the other process executed only 120 critical sections.

Dependence on coherent caching for efficient synchronization [3, 7] is questionable, as many caching schemes do not cache shared writable data. Our solution requires neither a coherent cache for efficient implementation nor any strong primitives. An efficient implementation of our algorithm requires only that each processor has some part of shared memory that is locally accessible, and that read and write operations are atomic. We consider these to be minimal hardware requirements for efficient synchronization. It is worth noting that, without fetch-and-add and compare-and-swap primitives, T. Anderson's algorithm and Mellor-Crummey and Scott's algorithm are not starvation-free.

6 Concluding Remarks

We have presented a scalable mutual exclusion algorithm for shared memory multiprocessors that does not require any hardware support other than atomic read and write operations. Our algorithm has better worst-case time complexity than any previously published mutual exclusion algorithm based on read/write atomicity, requiring O(log N) remote operations under any amount of contention. We have also presented an extension of our algorithm for fast mutual exclusion in the absence of contention that generates O(1) remote operations in the absence of contention and O(N) in the worst case. In addition, we have suggested several open questions concerning the inherent complexity of multiprocessor synchronization.

In the complexity calculations given in this paper, the distinction between remote and local operations is based upon a static assignment of shared variables to processes. Other definitions, which incorporate specific architectural details of systems, are also possible. For example, for programs intended for machines with coherent caching, it might be appropriate to consider a read of a shared variable x by a process p to be local if x has not been written by another process since p's most recent access of x. However, because of the many parameters that go into defining a cache-coherence protocol, such definitions can be problematic. We therefore choose to leave this as a subject for further study.

A natural approach to measuring the time complexity of concurrent programs would be to simply count

the total number of operations executed. However, a straightforward application of such an approach does not

provide any insight into the behavior of mutual exclusion algorithms under heavy contention. In particular,

in any algorithm in which processes busy-wait, the number of operations needed for one process to get to

its critical section is unbounded. In order to serve as a measure of time complexity, a measure should be

both intuitive and easy to compute. In sequential programming, the usual measure of time complexity,

which is obtained by simply counting operations, satisfies these criteria. By contrast, there has been much

disagreement on how time complexity should be measured in concurrent programs, and a complexity measure

satisfying these criteria has yet to be adopted. We believe that an appropriate time complexity measure for

concurrent algorithms is one based on the number of remote memory references. As seen in this paper, such

a measure can be used to make meaningful distinctions concerning the performance of concurrent programs.

Because minimizing the number of remote memory references in shared-memory algorithms is analogous

to minimizing the number of messages in distributed message-passing algorithms, one might be tempted

to compare our algorithm with distributed mutual exclusion algorithms. However, in distributed message-

passing algorithms, processes respond to messages even when they are in their noncritical sections. By contrast, in shared-memory algorithms (including ours), a process may halt while in its noncritical section, without affecting other processes.

Acknowledgement: We would like to thank Howard Gobioff for helping with the performance studies given in Section 5. We would also like to acknowledge Argonne National Laboratories and Lawrence Livermore National Laboratories for providing us with access to the machines used in these studies. We are particularly grateful to Terry Gaasterland at Argonne, and Tammy Welcome and Brent Gorda at Lawrence Livermore for their help. We also thank the referees for their helpful remarks, and Leslie Lamport for sharing his hierarchical proof macros with us.

Appendix: Correctness Proof

In this appendix, we present the correctness proofs for the algorithms presented in Sections 3 and 4. As depicted in Figure 1, our N-process solution is obtained by associating an instance of the two-process solution (statements 4 to 19) with each internal node of a binary arbitration tree. By induction on the depth of the tree, it can be shown that the mutual exclusion and starvation-freedom properties hold for the N-process program provided they hold for the two-process program. In the rest of this section, we focus on the proof that the mutual exclusion and starvation-freedom properties hold for the two-process program.

Our assertions and proofs are stated in a hierarchical structure proposed by Lamport [10]. A sequence of conjunctions (disjunctions) is denoted by putting each conjunct (disjunct), preceded by a ∧ (∨), on a separate line. In proofs in Lamport's style, an assertion is proven by a series of high-level steps. Each of these steps may be proven by a list of steps at a lower level. Every step is preceded by ⟨level⟩number for reference in subsequent steps. We begin by presenting definitions and notational conventions that will be used in the remainder of the paper.

Notational Conventions: Unless specified otherwise, we assume that i, p, and q range over {0..N-1}. We denote statement number k of process i as k.i. Let S be a subset of the statement labels in process i. Then, i@{S} holds iff the program counter for process i equals some value in S. The expression (N i :: P(i)) denotes the number of i's satisfying predicate P(i). The following is a list of symbols we will use, ordered by increasing binding power: ≡, ↦, ⇒, ∨, ∧, (=, ≠, >, <, ≥, ≤), (+, -), ', ¬, (., @). The symbols enclosed in parentheses have the same priority. We sometimes use parentheses to override this binding rule.

Mutual Exclusion

To facilitate the presentation, we define the following predicate.

LME ≡ (∀i :: (N p : side(p) = side(i) :: p@{4..19}) ≤ 1)

Informally, LME holds if the (N/2)-process solutions guarantee mutual exclusion in the subtrees. In other words, the predicate LME serves as the induction hypothesis in our proof of mutual exclusion. We henceforth assume that LME is an invariant.

We next prove that the mutual exclusion property holds. In particular, we prove that if at most one process from the left subtree and at most one process from the right subtree compete, then at most one process may enter its critical section at a time. This property can be stated as follows.

LME ⇒ (N i :: i@{15}) ≤ 1

Next, we prove several assertions that are needed to establish (N i :: i@{15}) ≤ 1.

(I1)  C[1 - side(i)] = p ∧ p ≠ -1
      ⇒ side(p) = 1 - side(i)

⟨1⟩1. Init ⇒ C[1 - side(i)] = -1
           ⇒ I1
⟨1⟩2. I1 ⇒ I1'
  Assume: I1 ∧ ¬I1'
  Prove: false
  ⟨2⟩1. side(p) ≠ 1 - side(i)   by ¬I1'
  ⟨2⟩2. 4.p ∧ side(p) = 1 - side(i)   by I1 ∧ ¬I1'
  ⟨2⟩3. false   by ⟨2⟩1 and ⟨2⟩2

(I2)  C[side(i)] = i
      ⇒ i@{5..16}

⟨1⟩1. Init ⇒ C[side(i)] = -1
           ⇒ I2
⟨1⟩2. I2 ⇒ I2'
  Assume: I2 ∧ ¬I2'
  Prove: false
  ⟨2⟩1. C'[side(i)] = i   by ¬I2'
  ⟨2⟩2. ¬(i@{5..16})'   by ¬I2'
  ⟨2⟩3. ∨ 4.i
        ∨ 16.i
        by I2 ∧ ¬I2'
  Case: 4.i
    ⟨3⟩1. (i@{5})'   by 4.i
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩2
  Case: 16.i
    ⟨3⟩1. C'[side(i)] = -1   by 16.i
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩1

(I3)  i@{5..16}
      ⇒ C[side(i)] = i

⟨1⟩1. Init ⇒ i@{0}
           ⇒ I3
⟨1⟩2. I3 ⇒ I3'
  Assume: I3 ∧ ¬I3'
  Prove: false
  ⟨2⟩1. (i@{5..16})'   by ¬I3'
  ⟨2⟩2. C'[side(i)] ≠ i   by ¬I3'
  ⟨2⟩3. ∨ 4.i
        ∨ (4.p ∨ 16.p) ∧ side(p) = side(i)
        by I3 ∧ ¬I3'
  Case: 4.i
    ⟨3⟩1. C'[side(i)] = i   by 4.i
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩2
  Case: (4.p ∨ 16.p) ∧ side(p) = side(i)
    Case: p = i
      ⟨4⟩1. C'[side(i)] = i ∨ (i@{17})'   by 4.i ∨ 16.i
      ⟨4⟩2. false   by ⟨4⟩1, ⟨2⟩1, and ⟨2⟩2
    Case: p ≠ i
      ⟨4⟩1. (p@{5, 17})'   by 4.p ∨ 16.p
      ⟨4⟩2. ¬(i@{4..19})'   by LME, ⟨4⟩1, and side(p) = side(i)
      ⟨4⟩3. false   by ⟨4⟩2 and ⟨2⟩1

(I4)  ∧ p@{6..19}
      ∧ side(q) = 1 - side(p)
      ∧ q@{6..19}
      ⇒ ∨ T = p
         ∨ T = q

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I4
⟨1⟩2. I4 ⇒ I4'
  Assume: I4 ∧ ¬I4'
  Prove: false
  ⟨2⟩1. (p@{6..19})'   by ¬I4'
  ⟨2⟩2. side(q) = 1 - side(p)   by ¬I4'
  ⟨2⟩3. (q@{6..19})'   by ¬I4'
  ⟨2⟩4. T' ≠ p   by ¬I4'
  ⟨2⟩5. T' ≠ q   by ¬I4'
  ⟨2⟩6. 5.i   by I4 ∧ ¬I4'
  Case: i = p
    ⟨3⟩1. T' = p   by 5.p
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩4
  Case: i ≠ p ∧ side(i) = side(p)
    ⟨3⟩1. (i@{6})'   by 5.i
    ⟨3⟩2. false   by ⟨3⟩1, ⟨2⟩1, side(i) = side(p), and LME
  Case: i = q
    ⟨3⟩1. T' = q   by 5.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩5
  Case: i ≠ q ∧ side(i) = side(q)
    ⟨3⟩1. (i@{6})'   by 5.i
    ⟨3⟩2. false   by ⟨3⟩1, ⟨2⟩3, side(i) = side(q), and LME

(I5)  ∧ p@{8..11}
      ∧ C[1 - side(p)] = q ∧ q ≠ -1
      ⇒ ∨ p.rival = q
         ∨ T = q
         ∨ q@{5}

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I5
⟨1⟩2. I5 ⇒ I5'
  Assume: I5 ∧ ¬I5'
  Prove: false
  ⟨2⟩1. (p@{8..11})'   by ¬I5'
  ⟨2⟩2. C'[1 - side(p)] = q ∧ q ≠ -1   by ¬I5'
  ⟨2⟩3. p.rival' ≠ q   by ¬I5'
  ⟨2⟩4. T' ≠ q   by ¬I5'
  ⟨2⟩5. ¬(q@{5})'   by ¬I5'
  ⟨2⟩6. side(q) = 1 - side(p)   by I1 and ⟨2⟩2
  ⟨2⟩7. ∨ 4.q
        ∨ 5.i
        ∨ 7.p
        ∨ 17.p
        by I5 ∧ ¬I5'
  Case: 4.q
    ⟨3⟩1. (q@{5})'   by 4.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩5
  Case: 5.i
    Case: i = p
      ⟨4⟩1. (p@{6})'   by 5.p
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩1
    Case: i ≠ p ∧ side(i) = side(p)
      ⟨4⟩1. (i@{6})'   by 5.i
      ⟨4⟩2. false   by ⟨4⟩1, ⟨2⟩1, side(i) = side(p), and LME
    Case: i = q
      ⟨4⟩1. T' = q   by 5.q
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩4
    Case: i ≠ q ∧ side(i) = side(q)
      ⟨4⟩1. (i@{6})'   by 5.i
      ⟨4⟩2. C'[side(i)] = i   by I3 and ⟨4⟩1
      ⟨4⟩3. C'[1 - side(p)] = i   by ⟨4⟩2, ⟨2⟩6, and side(i) = side(q)
      ⟨4⟩4. false   by ⟨4⟩3, ⟨2⟩2, and i ≠ q
  Case: 7.p
    ⟨3⟩1. p.rival' = q   by 7.p and ⟨2⟩2
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩3
  Case: 17.p
    ⟨3⟩1. (p@{18})'   by 17.p
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩1

(I6)  ∧ p@{7..15}
      ∧ P[p] = 2
      ∧ C[1 - side(p)] = q ∧ q ≠ -1
      ⇒ ∨ T = q
         ∨ q@{5}

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I6
⟨1⟩2. I6 ⇒ I6'
  Assume: I6 ∧ ¬I6'
  Prove: false
  ⟨2⟩1. (p@{7..15})'   by ¬I6'
  ⟨2⟩2. P'[p] = 2   by ¬I6'
  ⟨2⟩3. C'[1 - side(p)] = q ∧ q ≠ -1   by ¬I6'
  ⟨2⟩4. T' ≠ q   by ¬I6'
  ⟨2⟩5. ¬(q@{5})'   by ¬I6'
  ⟨2⟩6. side(q) = 1 - side(p)   by I1 and ⟨2⟩3
  ⟨2⟩7. (q@{5..16})'   by I2, ⟨2⟩3, and ⟨2⟩6
  ⟨2⟩8. ∨ 4.q
        ∨ 5.i
        ∨ 6.p
        ∨ 19.i
        by I6 ∧ ¬I6'
  Case: 4.q
    ⟨3⟩1. (q@{5})'   by 4.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩5
  Case: 5.i
    Case: i = p
      ⟨4⟩1. (p@{6})'   by 5.p
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩1
    Case: i ≠ p ∧ side(i) = side(p)
      ⟨4⟩1. (i@{6})'   by 5.i
      ⟨4⟩2. false   by ⟨4⟩1, ⟨2⟩1, side(i) = side(p), and LME
    Case: i = q
      ⟨4⟩1. T' = q   by 5.q
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩4
    Case: i ≠ q ∧ side(i) = side(q)
      ⟨4⟩1. (i@{6})'   by 5.i
      ⟨4⟩2. false   by ⟨4⟩1, ⟨2⟩7, side(i) = side(q), and LME
  Case: 6.p
    ⟨3⟩1. P'[p] = 0   by 6.p
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩2
  Case: 19.i
    Case: i = p
      ⟨4⟩1. (p@{20})'   by 19.p
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩1
    Case: i ≠ p ∧ side(i) = side(p)
      ⟨4⟩1. i@{19} ∧ p@{7..15}   by 19.i and ⟨2⟩1
      ⟨4⟩2. false   by ⟨4⟩1, side(i) = side(p), and LME
    Case: i = q
      ⟨4⟩1. (q@{20})'   by 19.q
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩7
    Case: i ≠ q ∧ side(i) = side(q)
      ⟨4⟩1. i@{19} ∧ q@{5..16}   by 19.i and ⟨2⟩7
      ⟨4⟩2. false   by ⟨4⟩1, side(i) = side(q), and LME

The next invariant, (I7), is the crux of our mutual exclusion proof. The mutual exclusion property is then established in (I8).

(I7)  ∧ p@{15..19}
      ∧ C[1 - side(p)] = q ∧ q ≠ -1
      ⇒ ∨ T = q
         ∨ q@{5}

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I7
⟨1⟩2. I7 ⇒ I7'
  Assume: I7 ∧ ¬I7'
  Prove: false
  ⟨2⟩1. (p@{15..19})'   by ¬I7'
  ⟨2⟩2. C'[1 - side(p)] = q ∧ q ≠ -1   by ¬I7'
  ⟨2⟩3. T' ≠ q   by ¬I7'
  ⟨2⟩4. ¬(q@{5})'   by ¬I7'
  ⟨2⟩5. side(q) = 1 - side(p)   by I1 and ⟨2⟩2
  ⟨2⟩6. (q@{5..16})'   by I2, ⟨2⟩2, and ⟨2⟩5
  ⟨2⟩7. ∨ 4.q
        ∨ 5.i
        ∨ 8.p ∧ p.rival = -1
        ∨ (9.p ∨ 13.p) ∧ T ≠ p
        ∨ 14.p ∧ P[p] = 2
        by I7 ∧ ¬I7'
  Case: 4.q
    ⟨3⟩1. (q@{5})'   by 4.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩4
  Case: 5.i
    Case: i = p
      ⟨4⟩1. (p@{6})'   by 5.p
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩1
    Case: i ≠ p ∧ side(i) = side(p)
      ⟨4⟩1. (i@{6})'   by 5.i
      ⟨4⟩2. false   by ⟨4⟩1, ⟨2⟩1, side(i) = side(p), and LME
    Case: i = q
      ⟨4⟩1. T' = q   by 5.q
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩3
    Case: i ≠ q ∧ side(i) = side(q)
      ⟨4⟩1. (i@{6})'   by 5.i
      ⟨4⟩2. false   by ⟨4⟩1, ⟨2⟩6, side(i) = side(q), and LME
  Case: 8.p ∧ p.rival = -1
    ⟨3⟩1. p@{8} ∧ p.rival = -1   by 8.p
    ⟨3⟩2. C[1 - side(p)] = q ∧ q ≠ -1   by 8.p and ⟨2⟩2
    ⟨3⟩3. T ≠ q   by 8.p and ⟨2⟩3
    ⟨3⟩4. ¬q@{5}   by 8.p and ⟨2⟩4
    ⟨3⟩5. ∨ ¬(C[1 - side(p)] = q ∧ q ≠ -1)
          ∨ T = q
          ∨ q@{5}
          by I5 and ⟨3⟩1
    ⟨3⟩6. false   by ⟨3⟩2, ⟨3⟩3, ⟨3⟩4, and ⟨3⟩5
  Case: (9.p ∨ 13.p) ∧ T ≠ p
    ⟨3⟩1. (p@{15})' ∧ T' ≠ p   by 9.p ∨ 13.p and T ≠ p
    ⟨3⟩2. T' = q   by I4, ⟨3⟩1, ⟨2⟩5, and ⟨2⟩6
    ⟨3⟩3. false   by ⟨3⟩2 and ⟨2⟩3
  Case: 14.p ∧ P[p] = 2
    ⟨3⟩1. (p@{15})' ∧ P'[p] = 2   by 14.p
    ⟨3⟩2. T' = q ∨ (q@{5})'   by I6, ⟨3⟩1, and ⟨2⟩2
    ⟨3⟩3. false   by ⟨3⟩2, ⟨2⟩3, and ⟨2⟩4

(I8)  (N i :: i@{15}) ≤ 1

Assume: ¬I8
Prove: false
⟨1⟩1. ∧ p@{15} ∧ p ≠ -1
      ∧ q@{15} ∧ q ≠ -1
      ∧ p ≠ q
      for some p, q, by ¬I8
⟨1⟩2. side(q) = 1 - side(p)   by ⟨1⟩1 and LME
⟨1⟩3. C[side(p)] = p ∧ C[side(q)] = q   by I3 and ⟨1⟩1
⟨1⟩4. C[1 - side(q)] = p ∧ C[1 - side(p)] = q   by ⟨1⟩2 and ⟨1⟩3
⟨1⟩5. T = p ∧ T = q   by I7, ⟨1⟩1, and ⟨1⟩4
⟨1⟩6. false   by ⟨1⟩1 and ⟨1⟩5

Progress

Next, we prove that our two-process algorithm is starvation-free. We first prove a number of invariants that are needed to establish starvation-freedom.

(I9)  C[side(i)] = p ∧ p ≠ -1
      ⇒ side(p) = side(i)

⟨1⟩1. Init ⇒ C[side(i)] = -1
           ⇒ I9
⟨1⟩2. I9 ⇒ I9'
  Assume: I9 ∧ ¬I9'
  Prove: false
  ⟨2⟩1. side(p) ≠ side(i)   by ¬I9'
  ⟨2⟩2. 4.p ∧ side(p) = side(i)   by I9 ∧ ¬I9'
  ⟨2⟩3. false   by ⟨2⟩1 and ⟨2⟩2

(I10) i@{8..11}
      ⇒ ∨ i.rival = -1
         ∨ side(i.rival) = 1 - side(i)

⟨1⟩1. Init ⇒ i@{0}
           ⇒ I10
⟨1⟩2. I10 ⇒ I10'
  Assume: I10 ∧ ¬I10'
  Prove: false
  ⟨2⟩1. (i@{8..11})'   by ¬I10'
  ⟨2⟩2. i.rival' ≠ -1   by ¬I10'
  ⟨2⟩3. side(i.rival') ≠ 1 - side(i)   by ¬I10'
  ⟨2⟩4. ∨ 7.i
        ∨ 17.i
        by I10 ∧ ¬I10'
  Case: 7.i
    ⟨3⟩1. i.rival' = C'[1 - side(i)]   by 7.i
    ⟨3⟩2. side(i.rival') = 1 - side(i)   by I1, ⟨3⟩1, and ⟨2⟩2
    ⟨3⟩3. false   by ⟨3⟩2 and ⟨2⟩3
  Case: 17.i
    ⟨3⟩1. (i@{18})'   by 17.i
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩1

(I11) (∀j : side(j) = side(i) :: j@{0..4, 17..22})
      ⇒ C[side(i)] = -1

⟨1⟩1. Init ⇒ C[side(i)] = -1
           ⇒ I11
⟨1⟩2. I11 ⇒ I11'
  Assume: I11 ∧ ¬I11'
  Prove: false
  ⟨2⟩1. (∀j : side(j) = side(i) :: (j@{0..4, 17..22})')   by ¬I11'
  ⟨2⟩2. Let: C'[side(i)] = p ∧ p ≠ -1   by ¬I11'
  ⟨2⟩3. side(p) = side(i)   by I9 and ⟨2⟩2
  ⟨2⟩4. (p@{5..16})'   by I2 and ⟨2⟩2
  ⟨2⟩5. false   by ⟨2⟩4, ⟨2⟩3, and ⟨2⟩1

(I12) ∧ p@{8} ∧ p.rival ≠ -1 ∨ p@{9..14}
      ∧ side(q) = 1 - side(p)
      ∧ q@{18, 19}
      ⇒ ∨ q.rival = p
         ∨ q.rival = q

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I12
⟨1⟩2. I12 ⇒ I12'
  Assume: I12 ∧ ¬I12'
  Prove: false
  ⟨2⟩1. (p@{8})' ∧ p.rival' ≠ -1 ∨ (p@{9..14})'   by ¬I12'
  ⟨2⟩2. side(q) = 1 - side(p)   by ¬I12'
  ⟨2⟩3. (q@{18, 19})'   by ¬I12'
  ⟨2⟩4. q.rival' ≠ p   by ¬I12'
  ⟨2⟩5. q.rival' ≠ q   by ¬I12'
  ⟨2⟩6. C'[side(q)] = -1   by I11, ⟨2⟩3, and LME
  ⟨2⟩7. T' = p ∨ T' = q   by I4, ⟨2⟩1, ⟨2⟩2, and ⟨2⟩3
  ⟨2⟩8. ∨ 7.p
        ∨ 7.q
        ∨ 17.q
        by I12 ∧ ¬I12'
  Case: 7.p
    ⟨3⟩1. C'[1 - side(p)] = -1   by 7.p, ⟨2⟩6, and ⟨2⟩2
    ⟨3⟩2. (p@{8})' ∧ p.rival' = -1   by 7.p and ⟨3⟩1
    ⟨3⟩3. false   by ⟨3⟩2 and ⟨2⟩1
  Case: 7.q
    ⟨3⟩1. (q@{8})'   by 7.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩3
  Case: 17.q
    ⟨3⟩1. q.rival' = p ∨ q.rival' = q   by 17.q and ⟨2⟩7
    ⟨3⟩2. false   by ⟨3⟩1, ⟨2⟩4, and ⟨2⟩5

(I13) ∧ p@{8} ∧ p.rival ≠ -1 ∨ p@{9..14}
      ∧ side(q) = 1 - side(p)
      ∧ q@{19}
      ⇒ q.rival = p

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I13
⟨1⟩2. I13 ⇒ I13'
  Assume: I13 ∧ ¬I13'
  Prove: false
  ⟨2⟩1. (p@{8})' ∧ p.rival' ≠ -1 ∨ (p@{9..14})'   by ¬I13'
  ⟨2⟩2. side(q) = 1 - side(p)   by ¬I13'
  ⟨2⟩3. (q@{19})'   by ¬I13'
  ⟨2⟩4. q.rival' ≠ p   by ¬I13'
  ⟨2⟩5. C'[side(q)] = -1   by I11, ⟨2⟩3, and LME
  ⟨2⟩6. ∨ 7.p
        ∨ 7.q ∨ 17.q
        ∨ 18.q ∧ q.rival ≠ q
        by I13 ∧ ¬I13'
  Case: 7.p
    ⟨3⟩1. C'[1 - side(p)] = -1   by 7.p, ⟨2⟩5, and ⟨2⟩2
    ⟨3⟩2. (p@{8})' ∧ p.rival' = -1   by 7.p and ⟨3⟩1
    ⟨3⟩3. false   by ⟨3⟩2 and ⟨2⟩1
  Case: 7.q ∨ 17.q
    ⟨3⟩1. ¬(q@{19})'   by 7.q ∨ 17.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩3
  Case: 18.q ∧ q.rival ≠ q
    ⟨3⟩1. q.rival' = p   by I12, ⟨2⟩1, ⟨2⟩2, ⟨2⟩3, and q.rival' ≠ q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩4

(I14) ∧ p@{8} ∧ p.rival ≠ -1 ∨ p@{9..14}
      ∧ side(q) = 1 - side(p)
      ∧ q@{18}
      ∧ q.rival = q
      ⇒ T = q

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I14
⟨1⟩2. I14 ⇒ I14'
  Assume: I14 ∧ ¬I14'
  Prove: false
  ⟨2⟩1. (p@{8})' ∧ p.rival' ≠ -1 ∨ (p@{9..14})'   by ¬I14'
  ⟨2⟩2. side(q) = 1 - side(p)   by ¬I14'
  ⟨2⟩3. (q@{18})'   by ¬I14'
  ⟨2⟩4. q.rival' = q   by ¬I14'
  ⟨2⟩5. T' ≠ q   by ¬I14'
  ⟨2⟩6. C'[side(q)] = -1   by I11, ⟨2⟩3, and LME
  ⟨2⟩7. ∨ 5.i
        ∨ 7.p
        ∨ 7.q
        ∨ 17.q
        by I14 ∧ ¬I14'
  Case: 5.i
    Case: i = p
      ⟨4⟩1. (p@{6})'   by 5.p
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩1
    Case: i ≠ p ∧ side(i) = side(p)
      ⟨4⟩1. (i@{6})'   by 5.i
      ⟨4⟩2. false   by ⟨4⟩1, ⟨2⟩1, side(i) = side(p), and LME
    Case: i = q
      ⟨4⟩1. (q@{6})'   by 5.q
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩3
    Case: i ≠ q ∧ side(i) = side(q)
      ⟨4⟩1. (i@{6})'   by 5.i
      ⟨4⟩2. false   by ⟨4⟩1, ⟨2⟩3, side(i) = side(q), and LME
  Case: 7.p
    ⟨3⟩1. C'[1 - side(p)] = -1   by 7.p, ⟨2⟩6, and ⟨2⟩2
    ⟨3⟩2. (p@{8})' ∧ p.rival' = -1   by 7.p and ⟨3⟩1
    ⟨3⟩3. false   by ⟨3⟩2 and ⟨2⟩1
  Case: 7.q
    ⟨3⟩1. (q@{8})'   by 7.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩3
  Case: 17.q
    ⟨3⟩1. q.rival' = T'   by 17.q
    ⟨3⟩2. false   by ⟨3⟩1, ⟨2⟩4, and ⟨2⟩5

(I15) ∧ p@{8} ∧ p.rival ≠ -1 ∨ p@{9..14}
      ∧ (∀i : side(i) = 1 - side(p) :: i@{0..4, 20..22})
      ⇒ P[p] = 2

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I15
⟨1⟩2. I15 ⇒ I15'
  Assume: I15 ∧ ¬I15'
  Prove: false
  ⟨2⟩1. (p@{8})' ∧ p.rival' ≠ -1 ∨ (p@{9..14})'   by ¬I15'
  ⟨2⟩2. (∀i : side(i) = 1 - side(p) :: (i@{0..4, 20..22})')   by ¬I15'
  ⟨2⟩3. P'[p] ≠ 2   by ¬I15'
  ⟨2⟩4. C'[side(p)] = p   by I3 and ⟨2⟩1
  ⟨2⟩5. C'[1 - side(p)] = -1   by I11 and ⟨2⟩2
  ⟨2⟩6. ∨ 6.p
        ∨ 7.p
        ∨ 11.i ∧ i.rival = p
        ∨ 18.i ∧ side(i) = 1 - side(p) ∧ i.rival = i
        ∨ 19.i ∧ side(i) = 1 - side(p)
        by I15 ∧ ¬I15'
  Case: 6.p
    ⟨3⟩1. (p@{7})'   by 6.p
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩1
  Case: 7.p
    ⟨3⟩1. (p@{8})' ∧ p.rival' = -1   by 7.p and ⟨2⟩5
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩1
  Case: 11.i ∧ i.rival = p
    ⟨3⟩1. (i@{12})' ∧ i.rival' = p   by 11.i
    ⟨3⟩2. side(i) = 1 - side(p)   by I10 and ⟨3⟩1
    ⟨3⟩3. false   by ⟨3⟩1, ⟨3⟩2, and ⟨2⟩2
  Case: 18.i ∧ side(i) = 1 - side(p) ∧ i.rival = i
    ⟨3⟩1. i@{18}   by 18.i
    ⟨3⟩2. p@{8} ∧ p.rival ≠ -1 ∨ p@{9..14}   by 18.i and ⟨2⟩1
    ⟨3⟩3. T = i   by I14, ⟨3⟩2, side(i) = 1 - side(p), ⟨3⟩1, and i.rival = i
    ⟨3⟩4. T ≠ p   by ⟨3⟩3 and side(i) = 1 - side(p)
    ⟨3⟩5. C[1 - side(i)] = p   by 18.i, ⟨2⟩4, and side(i) = 1 - side(p)
    ⟨3⟩6. p@{5}   by I7, ⟨3⟩1, ⟨3⟩5, and ⟨3⟩4
    ⟨3⟩7. false   by ⟨3⟩6 and ⟨3⟩2
  Case: 19.i ∧ side(i) = 1 - side(p)
    ⟨3⟩1. i@{19}   by 19.i
    ⟨3⟩2. p@{8} ∧ p.rival ≠ -1 ∨ p@{9..14}   by 19.i and ⟨2⟩1
    ⟨3⟩3. i.rival = p   by I13, ⟨3⟩2, side(i) = 1 - side(p), and ⟨3⟩1
    ⟨3⟩4. P'[p] = 2   by ⟨3⟩3 and 19.i
    ⟨3⟩5. false   by ⟨3⟩4 and ⟨2⟩3

(I16) ∧ p@{10..12}
      ∧ side(q) = 1 - side(p)
      ∧ q@{8..12}
      ∧ T = q
      ⇒ q.rival = p

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I16
⟨1⟩2. I16 ⇒ I16'
  Assume: I16 ∧ ¬I16'
  Prove: false
  ⟨2⟩1. (p@{10..12})'   by ¬I16'
  ⟨2⟩2. side(q) = 1 - side(p)   by ¬I16'
  ⟨2⟩3. (q@{8..12})'   by ¬I16'
  ⟨2⟩4. T' = q   by ¬I16'
  ⟨2⟩5. q.rival' ≠ p   by ¬I16'
  ⟨2⟩6. C'[side(p)] = p   by I3 and ⟨2⟩1
  ⟨2⟩7. ∨ 5.q
        ∨ 7.q
        ∨ 9.p ∧ T = p
        ∨ 17.q
        by I16 ∧ ¬I16'
  Case: 5.q
    ⟨3⟩1. (q@{6})'   by 5.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩3
  Case: 7.q
    ⟨3⟩1. C[1 - side(q)] = p   by 7.q, ⟨2⟩2, and ⟨2⟩6
    ⟨3⟩2. q.rival' = p   by 7.q and ⟨3⟩1
    ⟨3⟩3. false   by ⟨3⟩2 and ⟨2⟩5
  Case: 9.p ∧ T = p
    ⟨3⟩1. T' = p   by 9.p and T = p
    ⟨3⟩2. false   by ⟨3⟩1, ⟨2⟩4, and ⟨2⟩2
  Case: 17.q
    ⟨3⟩1. (q@{18})'   by 17.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩3

(I17) ∧ p@{10..12}
      ∧ side(q) = 1 - side(p)
      ∧ q@{12..14}
      ∧ T = q
      ⇒ P[p] ≥ 1

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I17
⟨1⟩2. I17 ⇒ I17'
  Assume: I17 ∧ ¬I17'
  Prove: false
  ⟨2⟩1. (p@{10..12})'   by ¬I17'
  ⟨2⟩2. side(q) = 1 - side(p)   by ¬I17'
  ⟨2⟩3. (q@{12..14})'   by ¬I17'
  ⟨2⟩4. T' = q   by ¬I17'
  ⟨2⟩5. P'[p] = 0   by ¬I17'
  ⟨2⟩6. ∨ 5.q
        ∨ 6.p
        ∨ 9.p ∧ T = p
        ∨ 10.q ∧ P[q.rival] ≠ 0
        ∨ 11.q
        by I17 ∧ ¬I17'
  Case: 5.q
    ⟨3⟩1. (q@{6})'   by 5.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩3
  Case: 6.p
    ⟨3⟩1. (p@{7})'   by 6.p
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩1
  Case: 9.p ∧ T = p
    ⟨3⟩1. T' = p   by 9.p and T = p
    ⟨3⟩2. false   by ⟨3⟩1, ⟨2⟩4, and ⟨2⟩2
  Case: 10.q ∧ P[q.rival] ≠ 0
    ⟨3⟩1. q@{10} ∧ p@{10..12} ∧ T = q   by 10.q, ⟨2⟩1, and ⟨2⟩4
    ⟨3⟩2. q.rival = p   by I16, ⟨3⟩1, and ⟨2⟩2
    ⟨3⟩3. P[p] = 0   by 10.q and ⟨2⟩5
    ⟨3⟩4. (q@{11})'   by 10.q, ⟨3⟩2, and ⟨3⟩3
    ⟨3⟩5. false   by ⟨3⟩4 and ⟨2⟩3
  Case: 11.q
    ⟨3⟩1. q@{11} ∧ p@{10..12} ∧ T = q   by 11.q, ⟨2⟩1, and ⟨2⟩4
    ⟨3⟩2. q.rival = p   by I16, ⟨3⟩1, and ⟨2⟩2
    ⟨3⟩3. P'[p] = 1   by 11.q and ⟨3⟩2
    ⟨3⟩4. false   by ⟨3⟩3 and ⟨2⟩5

(I18) ∧ p@{12}
      ∧ side(q) = 1 - side(p)
      ∧ q@{12}
      ⇒ ∨ P[p] ≥ 1
         ∨ P[q] ≥ 1

Assume: ∧ p@{12}
        ∧ side(q) = 1 - side(p)
        ∧ q@{12}
Prove: ∨ P[p] ≥ 1
       ∨ P[q] ≥ 1
⟨1⟩1. ∨ T = p
      ∨ T = q
      by I4 and the assumptions
⟨1⟩2. ∨ P[p] ≥ 1
      ∨ P[q] ≥ 1
      by I17, ⟨1⟩1, and the assumptions

(I19) i@{13, 14}
      ⇒ P[i] ≥ 1

⟨1⟩1. Init ⇒ i@{0}
           ⇒ I19
⟨1⟩2. I19 ⇒ I19'
  Assume: I19 ∧ ¬I19'
  Prove: false
  ⟨2⟩1. (i@{13, 14})'   by ¬I19'
  ⟨2⟩2. P'[i] = 0   by ¬I19'
  ⟨2⟩3. ∨ 6.i
        ∨ 12.i ∧ P[i] ≥ 1
        by I19 ∧ ¬I19'
  Case: 6.i
    ⟨3⟩1. (i@{7})'   by 6.i
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩1
  Case: 12.i ∧ P[i] ≥ 1
    ⟨3⟩1. P'[i] ≥ 1   by 12.i and P[i] ≥ 1
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩2

(I20) ∧ p@{14}
      ∧ side(q) = 1 - side(p)
      ∧ q@{7..14}
      ∧ T = q
      ⇒ P[q] = 0

⟨1⟩1. Init ⇒ p@{0}
           ⇒ I20
⟨1⟩2. I20 ⇒ I20'
  Assume: I20 ∧ ¬I20'
  Prove: false
  ⟨2⟩1. (p@{14})'   by ¬I20'
  ⟨2⟩2. side(q) = 1 - side(p)   by ¬I20'
  ⟨2⟩3. (q@{7..14})'   by ¬I20'
  ⟨2⟩4. T' = q   by ¬I20'
  ⟨2⟩5. P'[q] ≠ 0   by ¬I20'
  ⟨2⟩6. ∨ 5.q
        ∨ 6.q
        ∨ 11.i ∧ i.rival = q
        ∨ 13.p ∧ T = p
        ∨ 19.i
        by I20 ∧ ¬I20'
  Case: 5.q
    ⟨3⟩1. (q@{6})'   by 5.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩3
  Case: 6.q
    ⟨3⟩1. P'[q] = 0   by 6.q
    ⟨3⟩2. false   by ⟨3⟩1 and ⟨2⟩5
  Case: 11.i ∧ i.rival = q
    ⟨3⟩1. i@{11}   by 11.i
    ⟨3⟩2. side(i) = 1 - side(q)   by I10, ⟨3⟩1, and i.rival = q
    ⟨3⟩3. p@{14}   by 11.i and ⟨2⟩1
    ⟨3⟩4. false   by ⟨3⟩1, ⟨3⟩2, ⟨3⟩3, ⟨2⟩2, and LME
  Case: 13.p ∧ T = p
    ⟨3⟩1. T' = p   by 13.p and T = p
    ⟨3⟩2. false   by ⟨3⟩1, ⟨2⟩4, and ⟨2⟩2
  Case: 19.i
    Case: i = p
      ⟨4⟩1. (p@{20})'   by 19.p
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩1
    Case: i ≠ p ∧ side(i) = side(p)
      ⟨4⟩1. i@{19} ∧ p@{14}   by 19.i and ⟨2⟩1
      ⟨4⟩2. false   by ⟨4⟩1, side(i) = side(p), and LME
    Case: i = q
      ⟨4⟩1. (q@{20})'   by 19.q
      ⟨4⟩2. false   by ⟨4⟩1 and ⟨2⟩3
    Case: i ≠ q ∧ side(i) = side(q)
      ⟨4⟩1. i@{19} ∧ q@{7..14}   by 19.i and ⟨2⟩3
      ⟨4⟩2. false   by ⟨4⟩1, side(i) = side(q), and LME

(I21) ¬(p@{14} ∧ side(q) = 1 - side(p) ∧ q@{14})

Assume: ¬I21
Prove: ⊥
⟨1⟩1. ∧ p@{14}
      ∧ side(q) = 1 - side(p)
      ∧ q@{14}
      by ¬I21
⟨1⟩2. P[p] ≥ 1 ∧ P[q] ≥ 1 by I19 and ⟨1⟩1
⟨1⟩3. T = p ∨ T = q by I4 and ⟨1⟩1
⟨1⟩4. P[p] = 0 ∨ P[q] = 0 by I20, ⟨1⟩1, and ⟨1⟩3
⟨1⟩5. ⊥ by ⟨1⟩2 and ⟨1⟩4

(I22) ∧ p@{6..14}
      ∧ side(q) = 1 - side(p)
      ∧ q@{15..19}
      ⇒ T = p

⟨1⟩1. Init ⇒ p@{0} ⇒ I22
⟨1⟩2. I22 ⇒ I22′
  Assume: I22 ∧ ¬I22′
  Prove: ⊥
  ⟨2⟩1. (p@{6..14})′ by ¬I22′
  ⟨2⟩2. side(q) = 1 - side(p) by ¬I22′
  ⟨2⟩3. (q@{15..19})′ by ¬I22′
  ⟨2⟩4. T′ ≠ p by ¬I22′
  ⟨2⟩5. C′[side(p)] = p by I3 and ⟨2⟩1
  ⟨2⟩6. ∨ 5.i
        ∨ 8.q ∧ q.rival = -1
        ∨ (9.q ∨ 13.q) ∧ T ≠ q
        ∨ 14.q ∧ P[q] = 2
        by I22 ∧ ¬I22′
  Case: 5.i
    Case: i = p
      ⟨4⟩1. T′ = p by 5.p
      ⟨4⟩2. ⊥ by ⟨4⟩1 and ⟨2⟩4
    Case: i ≠ p ∧ side(i) = side(p)
      ⟨4⟩1. (i@{6})′ by 5.i
      ⟨4⟩2. ⊥ by ⟨4⟩1, ⟨2⟩1, side(i) = side(p), and LME
    Case: i = q
      ⟨4⟩1. (q@{6})′ by 5.q
      ⟨4⟩2. ⊥ by ⟨4⟩1 and ⟨2⟩3
    Case: i ≠ q ∧ side(i) = side(q)
      ⟨4⟩1. (i@{6})′ by 5.i
      ⟨4⟩2. ⊥ by ⟨4⟩1, ⟨2⟩3, side(i) = side(q), and LME
  Case: 8.q ∧ q.rival = -1
    ⟨3⟩1. q@{8} by 8.q
    ⟨3⟩2. C[1 - side(q)] = p ∧ p ≠ -1 by 8.q, ⟨2⟩2, and ⟨2⟩5
    ⟨3⟩3. p@{6..14} by 8.q and ⟨2⟩1
    ⟨3⟩4. T = p by I5, ⟨3⟩1, ⟨3⟩2, ⟨3⟩3, and q.rival = -1
    ⟨3⟩5. T′ = p by 8.q and ⟨3⟩4
    ⟨3⟩6. ⊥ by ⟨3⟩5 and ⟨2⟩4
  Case: (9.q ∨ 13.q) ∧ T ≠ q
    ⟨3⟩1. (q@{15})′ ∧ T′ ≠ q by 9.q ∨ 13.q and T ≠ q
    ⟨3⟩2. T′ = p by I4, ⟨2⟩1, ⟨2⟩2, and ⟨3⟩1
    ⟨3⟩3. ⊥ by ⟨3⟩2 and ⟨2⟩4
  Case: 14.q ∧ P[q] = 2
    ⟨3⟩1. (q@{15})′ ∧ P′[q] = 2 by 14.q and P[q] = 2
    ⟨3⟩2. C[1 - side(q)] = p ∧ p ≠ -1 by 14.q, ⟨2⟩2, and ⟨2⟩5
    ⟨3⟩3. T′ = p by I6, ⟨3⟩1, ⟨3⟩2, and ⟨2⟩1
    ⟨3⟩4. ⊥ by ⟨3⟩3 and ⟨2⟩4

(I23) ∧ p@{7..14}
      ∧ side(q) = 1 - side(p)
      ∧ q@{18,19}
      ⇒ ∨ q.rival = p
        ∨ P[p] = 0
        ∨ P[p] = 2

⟨1⟩1. Init ⇒ p@{0} ⇒ I23
⟨1⟩2. I23 ⇒ I23′
  Assume: I23 ∧ ¬I23′
  Prove: ⊥
  ⟨2⟩1. (p@{7..14})′ by ¬I23′
  ⟨2⟩2. side(q) = 1 - side(p) by ¬I23′
  ⟨2⟩3. (q@{18,19})′ by ¬I23′
  ⟨2⟩4. q.rival′ ≠ p by ¬I23′
  ⟨2⟩5. P′[p] ≠ 0 by ¬I23′
  ⟨2⟩6. P′[p] ≠ 2 by ¬I23′
  ⟨2⟩7. ∨ 6.p
        ∨ 7.q
        ∨ 11.i ∧ i.rival = p
        ∨ 17.q
        by I23 ∧ ¬I23′
  Case: 6.p
    ⟨3⟩1. P′[p] = 0 by 6.p
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩5
  Case: 7.q
    ⟨3⟩1. (q@{8})′ by 7.q
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩3
  Case: 11.i ∧ i.rival = p
    ⟨3⟩1. i@{11} by 11.i
    ⟨3⟩2. side(i) = 1 - side(p) by I10, ⟨3⟩1, and i.rival = p
    ⟨3⟩3. side(i) = side(q) by ⟨3⟩2 and ⟨2⟩2
    Case: i = q
      ⟨4⟩1. (q@{12})′ by 11.q
      ⟨4⟩2. ⊥ by ⟨4⟩1 and ⟨2⟩3
    Case: i ≠ q
      ⟨4⟩1. (i@{12})′ by 11.i
      ⟨4⟩2. ⊥ by ⟨4⟩1, ⟨2⟩3, ⟨3⟩3, and LME
  Case: 17.q
    ⟨3⟩1. p@{7..14} ∧ q@{17} by 17.q and ⟨2⟩1
    ⟨3⟩2. T = p by I22, ⟨3⟩1, and ⟨2⟩2
    ⟨3⟩3. q.rival′ = p by 17.q and ⟨3⟩2
    ⟨3⟩4. ⊥ by ⟨3⟩3 and ⟨2⟩4

(I24) ∧ p@{7..14}
      ∧ (∀i : side(i) = 1 - side(p) :: i@{0..5, 20..22})
      ⇒ ∨ P[p] = 0
        ∨ P[p] = 2

⟨1⟩1. Init ⇒ p@{0} ⇒ I24
⟨1⟩2. I24 ⇒ I24′
  Assume: I24 ∧ ¬I24′
  Prove: ⊥
  ⟨2⟩1. (p@{7..14})′ by ¬I24′
  ⟨2⟩2. (∀i : side(i) = 1 - side(p) :: (i@{0..5, 20..22})′) by ¬I24′
  ⟨2⟩3. P′[p] ≠ 0 by ¬I24′
  ⟨2⟩4. P′[p] ≠ 2 by ¬I24′
  ⟨2⟩5. ∨ 6.p
        ∨ 11.i ∧ i.rival = p
        ∨ 18.i ∧ side(i) = 1 - side(p) ∧ i.rival = i
        ∨ 19.i ∧ side(i) = 1 - side(p)
        by I24 ∧ ¬I24′
  Case: 6.p
    ⟨3⟩1. P′[p] = 0 by 6.p
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩3
  Case: 11.i ∧ i.rival = p
    ⟨3⟩1. i@{11} by 11.i
    ⟨3⟩2. side(i) = 1 - side(p) by I10, ⟨3⟩1, and i.rival = p
    ⟨3⟩3. (i@{12})′ by 11.i
    ⟨3⟩4. ⊥ by ⟨3⟩2, ⟨3⟩3, and ⟨2⟩2
  Case: 18.i ∧ side(i) = 1 - side(p) ∧ i.rival = i
    ⟨3⟩1. i@{18} by 18.i
    ⟨3⟩2. p@{7..14} by 18.i and ⟨2⟩1
    ⟨3⟩3. P[p] = 0 ∨ P[p] = 2 by I23, ⟨3⟩1, ⟨3⟩2, side(i) = 1 - side(p), and i.rival = i
    ⟨3⟩4. P′[p] = 0 ∨ P′[p] = 2 by 18.i and ⟨3⟩3
    ⟨3⟩5. ⊥ by ⟨3⟩4, ⟨2⟩3, and ⟨2⟩4
  Case: 19.i ∧ side(i) = 1 - side(p)
    ⟨3⟩1. i@{19} by 19.i
    ⟨3⟩2. p@{7..14} by 19.i and ⟨2⟩1
    ⟨3⟩3. i.rival = p ∨ P[p] = 0 ∨ P[p] = 2 by I23, ⟨3⟩1, ⟨3⟩2, and side(i) = 1 - side(p)
    ⟨3⟩4. P′[p] = 0 ∨ P′[p] = 2 by 19.i and ⟨3⟩3
    ⟨3⟩5. ⊥ by ⟨3⟩4, ⟨2⟩3, and ⟨2⟩4

(I25) ∧ ((p@{8} ∧ p.rival ≠ -1) ∨ p@{9..14})
      ∧ side(q) = 1 - side(p)
      ∧ q@{18,19}
      ⇒ q.rival = p

⟨1⟩1. Init ⇒ p@{0} ⇒ I25
⟨1⟩2. I25 ⇒ I25′
  Assume: I25 ∧ ¬I25′
  Prove: ⊥
  ⟨2⟩1. ((p@{8})′ ∧ p.rival′ ≠ -1) ∨ (p@{9..14})′ by ¬I25′
  ⟨2⟩2. side(q) = 1 - side(p) by ¬I25′
  ⟨2⟩3. (q@{18,19})′ by ¬I25′
  ⟨2⟩4. q.rival′ ≠ p by ¬I25′
  ⟨2⟩5. C′[side(q)] = -1 by I11, ⟨2⟩3, and LME
  ⟨2⟩6. ∨ 7.p
        ∨ 7.q
        ∨ 17.q
        by I25 ∧ ¬I25′
  Case: 7.p
    ⟨3⟩1. C′[1 - side(p)] = -1 by 7.p, ⟨2⟩2, and ⟨2⟩5
    ⟨3⟩2. (p@{8})′ ∧ p.rival′ = -1 by 7.p and ⟨3⟩1
    ⟨3⟩3. ⊥ by ⟨3⟩2 and ⟨2⟩1
  Case: 7.q
    ⟨3⟩1. (q@{8})′ by 7.q
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩3
  Case: 17.q
    ⟨3⟩1. q@{17} ∧ p@{8..14} by 17.q and ⟨2⟩1
    ⟨3⟩2. T = p by I22, ⟨3⟩1, and ⟨2⟩2
    ⟨3⟩3. q.rival′ = p by 17.q and ⟨3⟩2
    ⟨3⟩4. ⊥ by ⟨3⟩3 and ⟨2⟩4

(I26) ∧ p@{13,14}
      ∧ (∀i : side(i) = 1 - side(p) :: i@{0..5, 20..22})
      ⇒ P[p] = 2

⟨1⟩1. Init ⇒ p@{0} ⇒ I26
⟨1⟩2. I26 ⇒ I26′
  Assume: I26 ∧ ¬I26′
  Prove: ⊥
  ⟨2⟩1. (p@{13,14})′ by ¬I26′
  ⟨2⟩2. (∀i : side(i) = 1 - side(p) :: (i@{0..5, 20..22})′) by ¬I26′
  ⟨2⟩3. P′[p] ≠ 2 by ¬I26′
  ⟨2⟩4. ∨ 6.p
        ∨ 11.i ∧ i.rival = p
        ∨ 12.p ∧ P[p] ≠ 0
        ∨ 18.i ∧ side(i) = 1 - side(p)
        ∨ 19.i ∧ side(i) = 1 - side(p)
        by I26 ∧ ¬I26′
  Case: 6.p
    ⟨3⟩1. (p@{7})′ by 6.p
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩1
  Case: 11.i ∧ i.rival = p
    ⟨3⟩1. i@{11} by 11.i
    ⟨3⟩2. side(i) = 1 - side(p) by I10, ⟨3⟩1, and i.rival = p
    ⟨3⟩3. (i@{12})′ by 11.i
    ⟨3⟩4. ⊥ by ⟨3⟩2, ⟨3⟩3, and ⟨2⟩2
  Case: 12.p ∧ P[p] ≠ 0
    ⟨3⟩1. (p@{13})′ ∧ P′[p] ≠ 0 by 12.p
    ⟨3⟩2. P′[p] = 2 by I24, ⟨3⟩1, and ⟨2⟩2
    ⟨3⟩3. ⊥ by ⟨3⟩2 and ⟨2⟩3
  Case: 18.i ∧ side(i) = 1 - side(p)
    ⟨3⟩1. i@{18} by 18.i
    ⟨3⟩2. p@{13,14} by 18.i and ⟨2⟩1
    ⟨3⟩3. i.rival = p by I25, ⟨3⟩1, ⟨3⟩2, and side(i) = 1 - side(p)
    ⟨3⟩4. (i@{19})′ by 18.i and ⟨3⟩3
    ⟨3⟩5. ⊥ by ⟨3⟩4, ⟨2⟩2, and side(i) = 1 - side(p)
  Case: 19.i ∧ side(i) = 1 - side(p)
    ⟨3⟩1. i@{19} by 19.i
    ⟨3⟩2. p@{13,14} by 19.i and ⟨2⟩1
    ⟨3⟩3. i.rival = p by I25, ⟨3⟩1, ⟨3⟩2, and side(i) = 1 - side(p)
    ⟨3⟩4. P′[p] = 2 by 19.i and ⟨3⟩3
    ⟨3⟩5. ⊥ by ⟨3⟩4 and ⟨2⟩3

(I27) ∧ p@{7..14}
      ∧ side(q) = 1 - side(p)
      ∧ q@{11}
      ∧ q.rival = p
      ⇒ P[p] = 0

⟨1⟩1. Init ⇒ p@{0} ⇒ I27
⟨1⟩2. I27 ⇒ I27′
  Assume: I27 ∧ ¬I27′
  Prove: ⊥
  ⟨2⟩1. (p@{7..14})′ by ¬I27′
  ⟨2⟩2. side(q) = 1 - side(p) by ¬I27′
  ⟨2⟩3. (q@{11})′ by ¬I27′
  ⟨2⟩4. q.rival′ = p by ¬I27′
  ⟨2⟩5. P′[p] ≠ 0 by ¬I27′
  ⟨2⟩6. ∨ 6.p
        ∨ 7.q ∨ 17.q
        ∨ 10.q ∧ P[q.rival] = 0
        ∨ 11.i ∧ i.rival = p
        ∨ 19.i
        by I27 ∧ ¬I27′
  Case: 6.p
    ⟨3⟩1. P′[p] = 0 by 6.p
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩5
  Case: 7.q ∨ 17.q
    ⟨3⟩1. (q@{8,18})′ by 7.q ∨ 17.q
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩3
  Case: 10.q ∧ P[q.rival] = 0
    ⟨3⟩1. q.rival = p by 10.q and ⟨2⟩4
    ⟨3⟩2. P[p] = 0 by ⟨3⟩1 and P[q.rival] = 0
    ⟨3⟩3. P′[p] = 0 by 10.q and ⟨3⟩2
    ⟨3⟩4. ⊥ by ⟨3⟩3 and ⟨2⟩5
  Case: 11.i ∧ i.rival = p
    ⟨3⟩1. i@{11} by 11.i
    ⟨3⟩2. side(i) = 1 - side(p) by I10, ⟨3⟩1, and i.rival = p
    ⟨3⟩3. side(i) = side(q) by ⟨3⟩2 and ⟨2⟩2
    Case: i = q
      ⟨4⟩1. (q@{12})′ by 11.q
      ⟨4⟩2. ⊥ by ⟨4⟩1 and ⟨2⟩3
    Case: i ≠ q
      ⟨4⟩1. (i@{12})′ by 11.i
      ⟨4⟩2. ⊥ by ⟨4⟩1, ⟨2⟩3, side(i) = side(q), and LME
  Case: 19.i
    ⟨3⟩1. p@{7..14} by 19.i and ⟨2⟩1
    ⟨3⟩2. q@{11} by 19.i and ⟨2⟩3
    ⟨3⟩3. i@{19} by 19.i
    Case: i = p
      ⟨4⟩1. ⊥ by ⟨3⟩1 and ⟨3⟩3
    Case: i ≠ p ∧ side(i) = side(p)
      ⟨4⟩1. ⊥ by ⟨3⟩1, ⟨3⟩3, side(i) = side(p), and LME
    Case: i = q
      ⟨4⟩1. ⊥ by ⟨3⟩2 and ⟨3⟩3
    Case: i ≠ q ∧ side(i) = side(q)
      ⟨4⟩1. ⊥ by ⟨3⟩2, ⟨3⟩3, side(i) = side(q), and LME

(I28) ∧ p@{14}
      ∧ side(q) = 1 - side(p)
      ∧ T = q
      ⇒ P[p] = 2

⟨1⟩1. Init ⇒ p@{0} ⇒ I28
⟨1⟩2. I28 ⇒ I28′
  Assume: I28 ∧ ¬I28′
  Prove: ⊥
  ⟨2⟩1. (p@{14})′ by ¬I28′
  ⟨2⟩2. side(q) = 1 - side(p) by ¬I28′
  ⟨2⟩3. T′ = q by ¬I28′
  ⟨2⟩4. P′[p] ≠ 2 by ¬I28′
  ⟨2⟩5. ∨ 5.q
        ∨ 6.p
        ∨ 11.i ∧ i.rival = p ∧ P[p] = 2
        ∨ 13.p ∧ T = p
        by I28 ∧ ¬I28′
  Case: 5.q
    ⟨3⟩1. q@{5} by 5.q
    ⟨3⟩2. (∀i : side(i) = side(q) :: i@{0..5, 20..22}) by ⟨3⟩1 and LME
    ⟨3⟩3. p@{14} by 5.q and ⟨2⟩1
    ⟨3⟩4. P[p] = 2 by I26, ⟨3⟩3, ⟨3⟩2, and ⟨2⟩2
    ⟨3⟩5. P′[p] = 2 by 5.q and ⟨3⟩4
    ⟨3⟩6. ⊥ by ⟨3⟩5 and ⟨2⟩4
  Case: 6.p
    ⟨3⟩1. (p@{7})′ by 6.p
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩1
  Case: 11.i ∧ i.rival = p ∧ P[p] = 2
    ⟨3⟩1. i@{11} by 11.i
    ⟨3⟩2. side(i) = 1 - side(p) by I10, ⟨3⟩1, and i.rival = p
    ⟨3⟩3. p@{14} by 11.i and ⟨2⟩1
    ⟨3⟩4. P[p] = 0 by I27, ⟨3⟩3, ⟨3⟩2, ⟨3⟩1, and i.rival = p
    ⟨3⟩5. ⊥ by ⟨3⟩4 and P[p] = 2
  Case: 13.p ∧ T = p
    ⟨3⟩1. T′ = p by 13.p and T = p
    ⟨3⟩2. ⊥ by ⟨3⟩1, ⟨2⟩3, and ⟨2⟩2

(I29) ∧ i@{12}
      ∧ side(q) = 1 - side(i)
      ∧ q@{14}
      ⇒ ∨ P[i] ≥ 1
        ∨ P[q] = 2

Assume: ¬I29
Prove: ⊥
⟨1⟩1. i@{12} by ¬I29
⟨1⟩2. side(q) = 1 - side(i) by ¬I29
⟨1⟩3. q@{14} by ¬I29
⟨1⟩4. P[i] = 0 by ¬I29
⟨1⟩5. P[q] ≤ 1 by ¬I29
⟨1⟩6. T = i ∨ T = q by I4, ⟨1⟩1, ⟨1⟩3, and ⟨1⟩2
⟨1⟩7. T = i by I17, ⟨1⟩1, ⟨1⟩3, ⟨1⟩2, ⟨1⟩4, and ⟨1⟩6
⟨1⟩8. P[q] = 2 by I28, ⟨1⟩3, ⟨1⟩2, and ⟨1⟩7
⟨1⟩9. ⊥ by ⟨1⟩8 and ⟨1⟩5

(I15) implies that if only one process is in its entry section, then that process's busy-waiting loops terminate. (I18), (I21), and (I29) imply that if two processes are in their entry sections, then at least one of them can make progress. Note that at most two processes may execute their entry sections at this level. Hence, we conclude that the two-process solution in Figure 1 is free from livelock.
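Because the proof above refers to the numbered statements of Figure 1 throughout, the following C sketch of the two-process entry and exit sections may help the reader follow the case analysis. It is reconstructed from the statement numbers and effects cited in the proof (for example, statement 5 writes T, statement 11 writes P[rival], and statements 12 and 14 are the two busy-waiting loops); Figure 1 remains authoritative, and the names entry2, exit2, NPROCS, and FREE are ours.

    /* A reconstruction, for reference only, of the two-process entry and
     * exit sections (statements 4-19) as used in the proof above.  All
     * shared variables are read and written atomically, as required by
     * the read/write atomicity model. */
    #include <stdatomic.h>

    #define NPROCS 8                 /* assumed number of processes      */
    #define FREE   (-1)

    atomic_int C[2] = {FREE, FREE};  /* C[s]: competitor on side s, or -1 */
    atomic_int T;                    /* tie-breaker                       */
    atomic_int P[NPROCS];            /* P[i] in {0,1,2}: spin flag of i   */

    void entry2(int i, int side)                     /* statements 4-14 */
    {
        atomic_store(&C[side], i);                   /* 4 */
        atomic_store(&T, i);                         /* 5 */
        atomic_store(&P[i], 0);                      /* 6 */
        int rival = atomic_load(&C[1 - side]);       /* 7 */
        if (rival != FREE) {                         /* 8 */
            if (atomic_load(&T) == i) {              /* 9 */
                if (atomic_load(&P[rival]) == 0)     /* 10 */
                    atomic_store(&P[rival], 1);      /* 11 */
                while (atomic_load(&P[i]) == 0)      /* 12: first spin loop */
                    ;
                if (atomic_load(&T) == i)            /* 13 */
                    while (atomic_load(&P[i]) <= 1)  /* 14: second spin loop */
                        ;
            }
        }
    }                                                /* 15: critical section */

    void exit2(int i, int side)                      /* statements 16-19 */
    {
        atomic_store(&C[side], FREE);                /* 16 */
        int rival = atomic_load(&T);                 /* 17 */
        if (rival != i)                              /* 18 */
            atomic_store(&P[rival], 2);              /* 19 */
    }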

Next, we prove that the algorithm is free from starvation. To facilitate the presentation, we define the following predicate, which represents the starvation-freedom property.

(∀i :: i@{1..14} ↦ i@{15})    (SF)

The following two assertions imply that (SF) holds.

(∀i :: i@{1..3} ↦ i@{4})    (LSF)

(∀i :: i@{4..14} ↦ i@{15})    (2SF)
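To see why (LSF) and (2SF) suffice, recall that ↦ is transitive and distributes over disjunction (see, e.g., the UNITY logic of [5]); in our notation, for any i,

\begin{align*}
i@\{1..14\} \;\equiv\; i@\{1..3\} \lor i@\{4..14\}
  &\;\mapsto\; i@\{4\} \lor i@\{15\}  && \text{by (LSF), (2SF), and case analysis} \\
  &\;\mapsto\; i@\{15\}               && \text{by (2SF), since } 4 \in \{4..14\}.
\end{align*}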

As we explained above, ENTRY LEFT (ENTRY RIGHT) and EXIT LEFT (EXIT RIGHT) are obtained by nesting the two-process mutual exclusion algorithm of lines 4 through 19; the shape of this nesting is sketched below. It is straightforward to show, by induction on the depth of the tree, that (LSF) follows from (2SF).
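The following C sketch shows this arbitration-tree construction under our assumptions; the node numbering and the helpers entry2_at_node and exit2_at_node (versions of entry2 and exit2 above that operate on the C, T, and P variables of a given tree node) are ours, not the paper's notation.

    /* Sketch of the arbitration tree: a process competes in a two-process
     * lock at each internal node on the path from its leaf to the root,
     * and releases those locks in reverse (root-first) order on exit. */
    void entry2_at_node(int node, int i, int side);  /* assumed per-node  */
    void exit2_at_node(int node, int i, int side);   /* copies of entry2/exit2 */

    void entry_tree(int i)
    {
        /* leaf of process i is node NPROCS + i; parent of node n is n/2 */
        for (int n = NPROCS + i; n > 1; n /= 2)
            entry2_at_node(n / 2, i, n % 2);  /* n % 2: left or right side */
    }

    void exit_tree(int i)
    {
        int path[32], depth = 0;              /* nodes from leaf to root  */
        for (int n = NPROCS + i; n > 1; n /= 2)
            path[depth++] = n;
        while (depth-- > 0)                   /* release nearest-root first */
            exit2_at_node(path[depth] / 2, i, path[depth] % 2);
    }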

In the rest of this section, we prove that (2SF) holds. First, we prove two unless assertions.

i@{12} ∧ P[i] ≥ 1 unless i@{13}    (U1)

⟨1⟩1. i@{12} ∧ P[i] ≥ 1 ⇒ ((i@{12})′ ∧ P′[i] ≥ 1) ∨ (i@{13})′
  Assume: i@{12} ∧ P[i] ≥ 1 ∧ (¬(i@{12})′ ∨ P′[i] = 0) ∧ ¬(i@{13})′
  Prove: ⊥
  ⟨2⟩1. i@{12} by the assumption
  ⟨2⟩2. P[i] ≥ 1 by the assumption
  ⟨2⟩3. ¬(i@{13})′ by the assumption
  ⟨2⟩4. ∨ 6.i
        ∨ 12.i
        by ¬U1
  Case: 6.i
    ⟨3⟩1. i@{6} by 6.i
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩1
  Case: 12.i
    ⟨3⟩1. (i@{13})′ by 12.i and ⟨2⟩2
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩3

i@{14} ∧ P[i] = 2 unless i@{15}    (U2)

⟨1⟩1. i@{14} ∧ P[i] = 2 ⇒ ((i@{14})′ ∧ P′[i] = 2) ∨ (i@{15})′
  Assume: i@{14} ∧ P[i] = 2 ∧ (¬(i@{14})′ ∨ P′[i] ≤ 1) ∧ ¬(i@{15})′
  Prove: ⊥
  ⟨2⟩1. i@{14} by the assumption
  ⟨2⟩2. P[i] = 2 by the assumption
  ⟨2⟩3. ¬(i@{15})′ by the assumption
  ⟨2⟩4. ∨ 6.i
        ∨ 11.q ∧ q.rival = i
        ∨ 14.i
        by ¬U2
  Case: 6.i
    ⟨3⟩1. i@{6} by 6.i
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩1
  Case: 11.q ∧ q.rival = i
    ⟨3⟩1. q@{11} by 11.q
    ⟨3⟩2. side(q) = 1 - side(i) by I10, ⟨3⟩1, and q.rival = i
    ⟨3⟩3. P[i] = 0 by I27, ⟨2⟩1, ⟨3⟩2, ⟨3⟩1, and q.rival = i
    ⟨3⟩4. ⊥ by ⟨3⟩3 and ⟨2⟩2
  Case: 14.i
    ⟨3⟩1. (i@{15})′ by 14.i and ⟨2⟩2
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩3

The next two assertions follow from these unless assertions, the definition of a fair history, and the program text; (U1) is used to prove (L1), and (U2) is used to prove (L2).

i@{12} ∧ P[i] ≥ 1 ↦ i@{13}    (L1)

i@{14} ∧ P[i] = 2 ↦ i@{15}    (L2)
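Each of these two steps is an instance of the usual ensures-style rule (cf. the UNITY logic of [5]): if B unless Q holds and some atomic statement of process i establishes Q from any state satisfying B, then weak fairness yields B ↦ Q. Schematically, for (L1):

\[
\frac{B \;\mathit{unless}\; Q \qquad \{B\}\; s\; \{Q\}\ \text{for some statement $s$ of $i$}}
     {B \;\mapsto\; Q}
\qquad
B \equiv i@\{12\} \wedge P[i] \ge 1,\quad Q \equiv i@\{13\},\quad s = 12.i.
\]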

The following assertions, which are stated without proof, follow directly from the definition of a fair history and the program text.

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{5..11} ↦
    i@{13} ∨
    (i@{12} ∧ side(q) = 1 - side(i) ∧ q@{12,15})    (L3)

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{15..18} ↦
    i@{13} ∨
    (i@{12} ∧ side(q) = 1 - side(i) ∧ q@{18})    (L4)

Assertions (L5) through (L10), given next, easily follow from the preceding assertions. In particular, (I25)

and (L4) imply that (L5) holds; (I25) and (L5) imply that (L6) holds; (L6) implies that (L7) holds; (L1)

and (L7) imply that (L8) holds; (I18) implies that (L9) holds; and (L1) implies that (L10) holds.

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{15..18} ↦
    i@{13} ∨
    (i@{12} ∧ side(q) = 1 - side(i) ∧ q@{18} ∧ q.rival = i)    (L5)

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{15..19} ↦
    i@{13} ∨
    (i@{12} ∧ side(q) = 1 - side(i) ∧ q@{19} ∧ q.rival = i)    (L6)

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{15..19} ↦
    i@{13} ∨
    (i@{12} ∧ P[i] = 2)    (L7)

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{15..19} ↦
    i@{13}    (L8)

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{12} ↦
    i@{12} ∧ side(q) = 1 - side(i) ∧ q@{12} ∧ (P[i] ≥ 1 ∨ P[q] ≥ 1)    (L9)

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{12} ∧ (P[i] ≥ 1 ∨ P[q] ≥ 1) ↦
    i@{13} ∨
    (i@{12} ∧ side(q) = 1 - side(i) ∧ q@{13})    (L10)

The next assertion, which is stated without proof, follows directly from the definition of a fair history and the program text.

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{13} ↦
    i@{13} ∨
    (i@{12} ∧ side(q) = 1 - side(i) ∧ q@{14,15})    (L11)

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{14} ↦
    i@{13} ∨
    (i@{12} ∧ side(q) = 1 - side(i) ∧ q@{15})    (L12)

By (L1) and (L2), (I29) implies that (L12) holds.

i@{12} ∧ side(q) = 1 - side(i) ∧ q@{5..15} ↦
    i@{13}    (L13)

Assertions (L3), (L8), (L11), and (L12) imply that (L13) holds.

i@{12} ↦ i@{13}    (L14)

(I15) implies that i@{12} ⇒ P[i] = 2 ∨ (∃q :: side(q) = 1 - side(i) ∧ q@{5..19}) holds. By (L1), (L8), and (L13), this implies that (L14) holds.

By proving assertions similar to (L3) through (L13), it is possible to establish (L15), given next, which is similar to (L14). For brevity, we omit the proof of (L15).

i@{14} ↦ i@{15}    (L15)

Note that (L14) implies that the first busy-waiting loop of process i terminates, while (L15) implies that the second busy-waiting loop of i terminates. Thus, we conclude that the algorithm in Figure 1 is free from starvation.

Correctness Proof for the Fast Algorithm

In this section, we prove that the mutual exclusion and starvation-freedom properties hold for the mutual exclusion algorithm of Figure 3. We first prove five invariants that are needed to prove that mutual exclusion holds. The first three are quite simple: (I30) follows from the mutual exclusion property of the algorithm in Figure 1, (I31) follows directly from the program text, and (I32) follows from (I30).

(I30) (N i :: i@{15..27}) ≤ 1

(I31) i@{6..12} ⇒ B[i]

(I32) i@{20..26} ⇒ Z
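For orientation, the fast path of Figure 3 follows the shape of Lamport's fast mutual exclusion algorithm [9], built around the variables X and Y that (I35) below tracks. The C sketch that follows shows only that classic shape and is not a transcription of Figure 3; in particular, the winner still passes through the arbitration layer ENTRY2 (see (I37)) before its critical section, and Figure 3's actual exit also maintains Z, B[], and flag, all omitted here.

    /* Classic Lamport-style fast-path attempt [9]: in the absence of
     * contention a process succeeds in O(1) remote references; otherwise
     * the caller falls back to the slow (tree-based) path.  X, Y, and
     * fast_exit are our names; FREE is as in the earlier sketch. */
    #include <stdbool.h>

    atomic_int X = FREE, Y = FREE;

    bool try_fast_entry(int i)
    {
        atomic_store(&X, i);
        if (atomic_load(&Y) != FREE)   /* someone else is (or was) active */
            return false;              /* caller takes the slow path      */
        atomic_store(&Y, i);
        if (atomic_load(&X) != i)      /* lost the race for the fast path */
            return false;
        return true;
    }

    void fast_exit(int i)
    {
        atomic_store(&Y, FREE);        /* reopen the fast path */
    }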

(I33) ∧ i.flag
      ∧ ∨ (i@{22,23} ∧ i.n > p)
        ∨ (i@{24} ∧ i.n ≥ p)
      ⇒ ¬p@{7..12}

⟨1⟩1. Init ⇒ p@{0} ⇒ I33
⟨1⟩2. I33 ⇒ I33′
  Assume: I33 ∧ ¬I33′
  Prove: ⊥
  ⟨2⟩1. i.flag′ by ¬I33′
  ⟨2⟩2. ∨ ((i@{22,23})′ ∧ i.n′ > p)
        ∨ ((i@{24})′ ∧ i.n′ ≥ p)
        by ¬I33′
  ⟨2⟩3. (p@{7..12})′ by ¬I33′
  ⟨2⟩4. ∨ 6.p ∧ ¬Z
        ∨ 20.i
        ∨ 21.i
        ∨ 23.i ∧ i.n = p ∧ ¬B[i.n]
        by I33 ∧ ¬I33′
  Case: 6.p ∧ ¬Z
    ⟨3⟩1. ¬i@{20..26} by I32 and ¬Z
    ⟨3⟩2. ¬(i@{20..26})′ by 6.p and ⟨3⟩1
    ⟨3⟩3. ⊥ by ⟨3⟩2 and ⟨2⟩2
  Case: 20.i
    ⟨3⟩1. (i@{21})′ by 20.i
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩2
  Case: 21.i
    ⟨3⟩1. (i@{22})′ ∧ i.n′ = 0 by 21.i
    ⟨3⟩2. ⊥ by ⟨3⟩1, ⟨2⟩2, and p ≥ 0
  Case: 23.i ∧ i.n = p ∧ ¬B[i.n]
    ⟨3⟩1. ¬B[p] by i.n = p ∧ ¬B[i.n]
    ⟨3⟩2. ¬p@{6..12} by I31 and ⟨3⟩1
    ⟨3⟩3. ¬(p@{6..12})′ by 23.i and ⟨3⟩2
    ⟨3⟩4. ⊥ by ⟨3⟩3 and ⟨2⟩3

(I34) ∧ i@{25}
      ∧ i.flag
      ⇒ (∀p :: ¬p@{7..12})

⟨1⟩1. Init ⇒ i@{0} ⇒ I34
⟨1⟩2. I34 ⇒ I34′
  Assume: I34 ∧ ¬I34′
  Prove: ⊥
  ⟨2⟩1. (i@{25})′ by ¬I34′
  ⟨2⟩2. i.flag′ by ¬I34′
  ⟨2⟩3. (∃p :: (p@{7..12})′) by ¬I34′
  ⟨2⟩4. ∨ 6.p ∧ ¬Z
        ∨ 20.i
        ∨ 22.i ∧ i.n ≥ N
        by I34 ∧ ¬I34′
  Case: 6.p ∧ ¬Z
    ⟨3⟩1. ¬i@{20..26} by I32 and ¬Z
    ⟨3⟩2. ¬(i@{20..26})′ by 6.p and ⟨3⟩1
    ⟨3⟩3. ⊥ by ⟨3⟩2 and ⟨2⟩1
  Case: 20.i
    ⟨3⟩1. (i@{21})′ by 20.i
    ⟨3⟩2. ⊥ by ⟨3⟩1 and ⟨2⟩1
  Case: 22.i ∧ i.n ≥ N
    ⟨3⟩1. i@{22} by 22.i
    ⟨3⟩2. (∀p :: i.n > p) by i.n ≥ N
    ⟨3⟩3. i.flag by 22.i and ⟨2⟩2
    ⟨3⟩4. (∀p :: ¬p@{7..12}) by I33, ⟨3⟩3, ⟨3⟩1, and ⟨3⟩2
    ⟨3⟩5. (∀p :: ¬(p@{7..12})′) by 22.i and ⟨3⟩4
    ⟨3⟩6. ⊥ by ⟨3⟩5 and ⟨2⟩3

The following assertion implies that the mutual exclusion property holds for the fast entry section.

(I35) ∧ ((N i :: i@{2} ∧ X = i ∧ Y = -1) +
        (N i :: i@{3} ∧ X = i) +
        (N i :: i@{4} ∧ X = i ∧ Y = i) +
        (N i :: i@{5..7} ∧ Y = i) +
        (N i :: i@{8..11})) ≤ 1
      ∧ ((∃p :: p@{8..11}) ⇒ Y ≠ -1)

⟨1⟩1. Init ⇒ (∀i :: i@{0}) ⇒ I35
⟨1⟩2. I35 ⇒ I35′
  Assume: I35 ∧ ¬I35′
  Prove: ⊥
  ⟨2⟩1. ∨ ((N i :: (i@{2})′ ∧ X′ = i ∧ Y′ = -1) +
          (N i :: (i@{3})′ ∧ X′ = i) +
          (N i :: (i@{4})′ ∧ X′ = i ∧ Y′ = i) +
          (N i :: (i@{5..7})′ ∧ Y′ = i) +
          (N i :: (i@{8..11})′)) > 1
        ∨ ∧ (∃p :: (p@{8..11})′)
          ∧ Y′ = -1
        by ¬I35′
  ⟨2⟩2. (N i :: X′ = i) ≤ 1 because process identifiers are unique
  ⟨2⟩3. ((N i :: (i@{2})′ ∧ X′ = i ∧ Y′ = -1) +
         (N i :: (i@{3})′ ∧ X′ = i) +
         (N i :: (i@{4})′ ∧ X′ = i ∧ Y′ = i)) ≤ 1
        by ⟨2⟩2
  ⟨2⟩4. ∨ 1.q ∧ Y = -1 ∧ ((∃p :: p@{8..11}) ⇒ Y ≠ -1)
        ∨ 7.q ∧ Y = q
        ∨ 11.q
        ∨ 25.q ∧ q.flag
        by I35 ∧ ¬I35′
  Case: 1.q ∧ Y = -1 ∧ ((∃p :: p@{8..11}) ⇒ Y ≠ -1)
    ⟨3⟩1. (∀p :: ¬p@{8..11}) by Y = -1 and ((∃p :: p@{8..11}) ⇒ Y ≠ -1)
    ⟨3⟩2. (∀p :: ¬(p@{8..11})′) by 1.q and ⟨3⟩1
    ⟨3⟩3. Y′ = -1 by 1.q and Y = -1
    ⟨3⟩4. (N i :: (i@{5..7})′ ∧ Y′ = i) = 0 by ⟨3⟩3
    ⟨3⟩5. ⊥ by ⟨2⟩1, ⟨2⟩3, ⟨3⟩4, and ⟨3⟩2
  Case: 7.q ∧ Y = q
    ⟨3⟩1. Y′ = q by 7.q and Y = q
    ⟨3⟩2. ((N i :: (i@{2})′ ∧ X′ = i ∧ Y′ = -1) +
           (N i :: (i@{3})′ ∧ X′ = i) +
           (N i :: (i@{4})′ ∧ X′ = i ∧ Y′ = i) +
           (N i :: (i@{5..7})′ ∧ Y′ = i) +
           (N i :: (i@{8..11})′)) ≤ 1
          by I35 and the fact that 7.q decrements (N i :: i@{5..7} ∧ Y = i) by one when it increments (N i :: i@{8..11}) by one
    ⟨3⟩3. ⊥ by ⟨2⟩1, ⟨3⟩1, and ⟨3⟩2
  Case: 11.q
    ⟨3⟩1. Y′ = -1 by 11.q
    ⟨3⟩2. (N i :: (i@{5..7})′ ∧ Y′ = i) = 0 by ⟨3⟩1
    ⟨3⟩3. (N i :: (i@{8..11})′) = 0 by I35 and the fact that 11.q decrements (N i :: i@{8..11}) by one
    ⟨3⟩4. ⊥ by ⟨2⟩1, ⟨2⟩3, ⟨3⟩2, and ⟨3⟩3
  Case: 25.q ∧ q.flag
    ⟨3⟩1. q@{25} by 25.q
    ⟨3⟩2. (∀p :: ¬p@{7..12}) by I34, ⟨3⟩1, and q.flag
    ⟨3⟩3. Y′ = -1 by 25.q and q.flag
    ⟨3⟩4. (N i :: (i@{5..7})′ ∧ Y′ = i) = 0 by ⟨3⟩3
    ⟨3⟩5. ⊥ by ⟨2⟩1, ⟨2⟩3, ⟨3⟩4, and ⟨3⟩2

ENTRY2 and EXIT2 satisfy the following properties.

(I36) ((N i :: i@{8..10}) ≤ 1 ∧ (N i :: i@{15..27}) ≤ 1) ⇒ (N i :: i@{9, 16..26}) ≤ 1

i@{8} ↦ i@{9}    (L16)

i@{15} ↦ i@{16}    (L17)

The proof of (I36) is similar to that of (I8) and is omitted for brevity. The proofs of (L16) and (L17) are similar to that of (2SF) and are also omitted for brevity. (Note that process identifiers and the function side are used for convenience in the proofs of (I8) and (2SF).)

The following two assertions imply that the mutual exclusion and starvation-freedom properties hold for the algorithm of Figure 3.

(I37) (N i :: i@{9,16}) ≤ 1

(I30), (I35), and (I36) imply that (I37) holds.

i@{1..8, 14, 15} ↦ i@{9,16}    (L18)

By the program text, i@{1..7} ↦ i@{8,14}. (SF) implies that i@{14} ↦ i@{15}. Hence, (L18) follows from (L16) and (L17).
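One plausible reading of (L16) through (L18), sketched below in C under our earlier assumptions, is that the fast-path winner (statement 8) and the winner of the slow, tree-based path (statement 15) are arbitrated by an ENTRY2 instance, so that at most one process reaches statement 9 or 16, the critical section (I37). The side names FAST and SLOW and the reuse of entry2 for ENTRY2 are ours; Figure 3 is authoritative.

    /* How the pieces compose: fast path, else slow path, then arbitration. */
    enum { FAST = 0, SLOW = 1 };

    void lock(int i)
    {
        if (try_fast_entry(i))
            entry2(i, FAST);     /* statement 8; critical section at 9   */
        else {
            entry_tree(i);       /* statement 14: Figure 1's entry       */
            entry2(i, SLOW);     /* statement 15; critical section at 16 */
        }
    }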

References

[1] A. Agarwal and M. Cherian, "Adaptive Backoff Synchronization Techniques", Proceedings of the 16th International Symposium on Computer Architecture, May 1989, pp. 396-406.

[2] J. Anderson, "A Fine-Grained Solution to the Mutual Exclusion Problem", Acta Informatica, Vol. 30, No. 3, 1993, pp. 249-265.

[3] T. Anderson, "The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors", IEEE Transactions on Parallel and Distributed Systems, Vol. 1, No. 1, January 1990, pp. 6-16.

[4] BBN Advanced Computers, Inside the TC2000 Computer, February 1990.

[5] K. Chandy and J. Misra, Parallel Program Design: A Foundation, Addison-Wesley, 1988.

[6] E. Dijkstra, "Solution of a Problem in Concurrent Programming Control", Communications of the ACM, Vol. 8, No. 9, 1965, p. 569.

[7] G. Graunke and S. Thakkar, "Synchronization Algorithms for Shared-Memory Multiprocessors", IEEE Computer, Vol. 23, June 1990, pp. 60-69.

[8] J. Kessels, "Arbitration Without Common Modifiable Variables", Acta Informatica, Vol. 17, 1982, pp. 135-141.

[9] L. Lamport, "A Fast Mutual Exclusion Algorithm", ACM Transactions on Computer Systems, Vol. 5, No. 1, February 1987, pp. 1-11.

[10] L. Lamport, "How to Write a Proof", Research Report 94, Digital Equipment Corporation Systems Research Center, February 1993.

[11] J. Mellor-Crummey and M. Scott, "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors", ACM Transactions on Computer Systems, Vol. 9, No. 1, February 1991, pp. 21-65.

[12] M. Michael and M. Scott, "Fast Mutual Exclusion, Even With Contention", Technical Report, University of Rochester, June 1993.

[13] G. Peterson and M. Fischer, "Economical Solutions for the Critical Section Problem in a Distributed System", Proceedings of the 9th ACM Symposium on Theory of Computing, May 1977, pp. 91-97.

[14] E. Styer, "Improving Fast Mutual Exclusion", Proceedings of the Eleventh Annual ACM Symposium on Principles of Distributed Computing, 1992, pp. 159-168.

[15] J. Yang and J. Anderson, "Fast, Scalable Synchronization with Minimal Hardware Support (Extended Abstract)", Proceedings of the 12th Annual ACM Symposium on Principles of Distributed Computing, August 1993, pp. 171-182.

[16] J. Yang and J. Anderson, "Time Bounds for Mutual Exclusion and Related Problems", Proceedings of the 26th Annual ACM Symposium on Theory of Computing, May 1994, pp. 224-233.
