TheBeneﬁtsofDualityinVerifyingConcurrent ProgramsunderTSO · TheBeneﬁtsofDualityinVerifyingConcurrent ProgramsunderTSO∗ Parosh Aziz Abdulla1, Mohamed Faouzi Atig2, Ahmed Bouajjani3,

The Benefits of Duality in Verifying ConcurrentPrograms under TSO∗

Parosh Aziz Abdulla1, Mohamed Faouzi Atig2, Ahmed Bouajjani3,and Tuan Phong Ngo4

1 Uppsala University, [email protected]


3 IRIF, Université Paris Diderot & IUF, [email protected]


AbstractWe address the problem of verifying safety properties of concurrent programs running over theTSO memory model. Known decision procedures for this model are based on complex encodingsof store buffers as lossy channels. These procedures assume that the number of processes is fixed.However, it is important in general to prove correctness of a system/algorithm in a parametricway with an arbitrarily large number of processes. In this paper, we introduce an alternative (yetequivalent) semantics to the classical one for the TSO model that is more amenable for efficientalgorithmic verification and for extension to parametric verification. For that, we adopt a dualview where load buffers are used instead of store buffers. The flow of information is now from thememory to load buffers. We show that this new semantics allows (1) to simplify drastically thesafety analysis under TSO, (2) to obtain a spectacular gain in efficiency and scalability comparedto existing procedures, and (3) to extend easily the decision procedure to the parametric case,which allows to obtain a new decidability result, and more importantly, a verification algorithmthat is more general and more efficient in practice than the one for bounded instances.

1998 ACM Subject Classification D.2.4 Software/Program Verification

Keywords and phrases Weak Memory Models, Reachability Problem, Parameterized Verification

Digital Object Identifier 10.4230/LIPIcs.CONCUR.2016.5

1 Introduction

Most modern processor architectures execute instructions in an out-of-order manner to gainefficiency. In the context of sequential programming, this out-of-order execution is transparentto the programmer since one can still work under the Sequential Consistency (SC) model [24].However, this is not true when we consider concurrent processes that share the memory. Infact, it turns out that concurrent algorithms such as mutual exclusion and producer-consumerprotocols may not behave correctly any more. Therefore, program verification is a relevant(and difficult) task in order to prove correctness under the new semantics. The inadequacyof the interleaving semantics has led to the invention of new program semantics, so called

∗ This work was supported in part by the Swedish Research Council and carried out within the Linnaeuscentre of excellence UPMARC, Uppsala Programming for Multicore Architectures Research Center.

© Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, and Tuan Phong Ngo;licensed under Creative Commons License CC-BY

27th International Conference on Concurrency Theory (CONCUR 2016).Editors: Josée Desharnais and Radha Jagadeesan; Article No. 5; pp. 5:1–5:15

Leibniz International Proceedings in InformaticsSchloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany

http://dx.doi.org/10.4230/LIPIcs.CONCUR.2016.5

http://creativecommons.org/licenses/by/3.0/

http://www.dagstuhl.de/lipics/

http://www.dagstuhl.de

5:2 The Benefits of Duality in Verifying Concurrent Programs under TSO

Weak (or relaxed) Memory Models (WMM), by allowing permutations between certain typesof memory operations [7, 20, 8]. Total Store Ordering (TSO) is one of the the most commonmodels, and it corresponds to the relaxation adopted by Sun’s SPARC multiprocessors [28]and formalizations of the x86-tso memory model [26, 27]. These models put an unboundedperfect (non-lossy) store buffer between each process and the main memory where a storebuffer carries the pending store operations of the process. When a process performs a storeoperation, it appends it to the end of its buffer. These operations are propagated to theshared memory non-deterministically in a FIFO manner. When a process reads a variable, itsearches its buffer for a pending store operation on that variable. If no such a store operationexists, it fetches the value of the variable from the main memory. Verifying programsrunning on the TSO memory model poses a difficult challenge since the unboundedness of thebuffers implies that the state space of the system is infinite even in the case where the inputprogram is finite-state. Decidability of safety properties has been obtained by constructingequivalent models that replace the perfect store buffer by lossy channels [11, 12, 2]. However,these constructions are complicated and involve several ingredients that lead to inefficientverification procedures. For instance, they require each message inside a lossy channel tocarry (instead of a single store operation) a full snapshot of the memory representing alocal view of the memory contents by the process. Furthermore, the reductions involvenon-deterministic guessing the lossy channel contents. The guessing is then resolved eitherby consistency checking [11] or by using explicit pointer variables (each corresponding to oneprocess) inside the buffers [2], causing a serious state space explosion problem.

In this paper, we introduce a novel semantics which we call the dual TSO semantics.Our aim is to provide an alternative (and equivalent) semantics that is more amenable forefficient algorithmic verification. The main idea is to have load buffers that contain pendingload operations (more precisely, values that will potentially be taken by forthcoming loadoperations) rather than store buffers (that contain store operations). The flow of informationwill now be in the reverse direction, i.e., store operations are performed by the processesatomically on the main memory, while values of variables are propagated non-deterministicallyfrom the memory to the load buffers of the processes. When a process performs a loadoperation it can fetch the value of the variable from the head of its load buffer. We show thatthe dual semantics is equivalent to the original one in the sense that any given set of processeswill reach the same set of local states under both semantics. The dual semantics allows us tounderstand the TSO model in a totally different way compared to the classical semantics.Furthermore, the dual semantics offers several important advantages from the point of viewof formal reasoning and program verification. First, the dual semantics allows transformingthe load buffers to lossy channels without adding the costly overhead that was necessaryin the case of store buffers. This means that we can apply the theory of well-structuredsystems [6, 5, 21] in a straightforward manner leading to a much simpler proof of decidabilityof safety properties. Second, the absence of extra overhead means that we obtain moreefficient algorithms and better scalability (as shown by our experimental results). Finally, thedual semantics allows extending the framework to perform parameterized verification whichis an important paradigm in concurrent program verification. Here, we consider systems,e.g., mutual exclusion protocols, that consist of an arbitrary number of processes. Theaim of parameterized verification is to prove correctness of the system regardless of thenumber of processes. It is not obvious how to perform parameterized verification underthe classical semantics. For instance, extending the framework of [2], would involve anunbounded number of pointer variables, thus leading to channel systems with unboundedmessage alphabets. In contrast, as we show in this paper, the simple nature of the dual

P.A. Abdulla, M. F. Atig, A. Bouajjani, and T. P.Ngo 5:3

semantics allows a straightforward extension of our verification algorithm to the case ofparameterized verification. This is the first time a decidability result is established for theparametrized verification of programs running over WMM. Notice that this result is takinginto account two sources of infinity: the number of processes, and the size of the buffers.

Based on our framework, we have implemented a tool and applied it to a large set ofbenchmarks. The experiments demonstrate the efficiency of the dual semantics compared tothe classical one (by two order of magnitude in average), and the feasibility of parametrizedverification in the former case. In fact, besides its theoretical generality, parametrizedverification is practically crucial in this setting: as our experiments show, it is much moreefficient than verification of bounded-size instances (starting from a number of componentsof 3 or 4), especially concerning memory consumption (which is the critical resource).

Related Work. There have been a lot of works related to the analysis of programs runningunder WMM (e.g., [25, 22, 23, 17, 2, 15, 16, 13, 14, 29]). Some of these works propose preciseanalysis techniques for checking safety properties or stability of finite-state programs underWMM (e.g., [2, 13, 19, 4]). Others propose stateless model-checking techniques for programsunder TSO and PSO (e.g., [1, 30, 18]). Different other techniques based on monitoring andtesting have also been developed during these last years (e.g., [15, 16, 25]). There are also anumber of efforts to design bounded model checking techniques for programs under WMM(e.g., [9, 29, 14]) which encode the verification problem in SAT/SMT.

The closest works to ours are those presented in [2, 11, 3, 12] which provide precise andsound techniques for checking safety properties for finite-state programs running under TSO.However, as stated in the introduction, these techniques are complicated and can not beextended, in a straightforward manner, to the verification of parameterized systems (as it isthe case of the developed techniques for the dual TSO semantics).

In Section 7, we experimentally compare our techniques with Memorax [2, 3] which is theonly precise and sound tool for checking safety properties for programs under TSO.

2 Preliminaries

Let Σ be a finite alphabet. We use Σ∗ (resp. Σ+) to denote the set of all words (resp.non-empty words) over Σ. Let ε be the empty word. The length of a word w ∈ Σ∗ is denotedby |w| (and in particular |ε| = 0). For every i : 1 ≤ i ≤ |w|, let w(i) be the symbol at positioni in w. For a ∈ Σ, we write a ∈ w if a appears in w, i.e., a = w(i) for some i : 1 ≤ i ≤ |w|.

Given two words u and v over Σ, we use u v to denote that u is a (not necessarilycontiguous) subword of v (i.e., if there is an injection h : 1, . . . , |u| 7→ 1, . . . , |v| such that:(1) h(i) < h(j) for all i < j, and (2) for every i ∈ 1, . . . , |u|, we have u(i) = v(h(i))).

Given a subset Σ′ ⊆ Σ and a word w ∈ Σ∗, we use w|Σ′ to denote the projection of wover Σ′, i.e., the word obtained from w by erasing all the symbols that are not in Σ′.

Let A and B be two sets and let f : A 7→ B be a total function from A to B. We usef [a← b] to denote the function f ′ such that f ′(a) = b and f ′(a′) = f(a′) for all a′ 6= a.

A transition system T is a tuple(C, Init, Act,∪a∈Act

a−→)where C is a (potentially infinite)

set of configurations, Init ⊆ C is a set of initial configurations, Act is a set of actions, and forevery a ∈ Act, a−→ ⊆ C×C is a transition relation. We use c a−→ c′ to denote that (c, c′) ∈ a−→.Let −→ = ∪a∈Act

a−→. A run π of T is of the form c0a1−−→ c1

a2−−→· · · an−−→ cn, where ciai+1−−−→ ci+1

for all i : 0 ≤ i < n. Then, we write c0 π−→ cn. We use target (π) to denote the configuration cn.The run π is said to be a computation if c0 ∈ Init. Two runs π1 = c0

a1−−→ c1a2−−→· · · am−−→ cm

and π2 = cm+1am+2−−−−→ cm+2

am+3−−−−→· · · an−−→ cn are compatible if cm = cm+1. Then, we write

CONCUR 2016


π1 • π2 to denote the run π = c0a1−−→ c1

a2−−→· · · am−−→ cmam+2−−−−→ cm+2

am+3−−−−→· · · an−−→ cn. Fortwo configurations c and c′, we use c ∗−→ c′ to denote that c π−→ c′ for some run π. Aconfiguration c is said to be reachable in T if c0 ∗−→ c for some c0 ∈ Init, and a set C ofconfigurations is said to be reachable in T if some c ∈ C is reachable in T .

3 Concurrent Systems

In this section, we define the syntax we use for concurrent programs, a model for representingcommunicating concurrent processes. Communication between processes is performed througha shared memory that consists of a finite number of shared variables (over finite domains) towhich all processes can read and write. Then we recall the classical TSO semantics includingthe transition system it induces and its reachability problem. Next, we introduce the DualTSO semantics and its induced transition system. Finally, we state the equivalence betweenthe two semantics (for a given concurrent program, we can reduce its reachability problemunder the classical TSO to its reachability problem under Dual TSO and vice-versa).

3.1 SyntaxLet V be a finite data domain and X be a finite set of variables. We assume w.l.o.g. that Vcontains the value 0. Let Ω(X,V ) be the smallest set of memory operations that contains (1)“no operation” nop, (2) read operation r(x, v), (3) write operation w(x, v), (4) fence operationfence, and (5) atomic read-write operation arw(x, v, v′), where x ∈ X, and v, v′ ∈ V .

A concurrent system is a tuple P = (A1, A2, . . . , An) where for every p : 1 ≤ p ≤ n, Apis a finite-state automaton describing the behavior of the process p. The automaton Ap isdefined as a triple

(Qp, q

initp ,∆p

)where Qp is a finite set of local states, qinit

p ∈ Qp is theinitial local state, and ∆p ⊆ Qp × Ω(X,V ) × Qp is a finite set of transitions. We defineP = 1, . . . , n to be the set of process IDs, Q := ∪p∈PQp to be the set of all local statesand ∆ := ∪p∈P∆p to be the set of all transitions.

3.2 Classical TSO semanticsIn the following, we recall the semantics of concurrent systems under the classical TSO modelas formalized in [26, 27]. To do that, we define the set of configurations and the inducedtransition relation. Let P= (A1, A2, . . . , An) be a concurrent system.

Configurations. A TSO-configuration c is a triple (q,b,mem) where (1) q : P 7→ Q is theglobal state of P mapping each process p ∈ P to a local state in Qp (i.e., q(p) ∈ Qp), (2)b : P 7→ (X × V )∗ gives the content of the store buffer of each process, and (3) mem : X 7→ V

defines the value of each shared variable. Observe that the store buffer of each processcontains a sequence of write operations, where each write operation is defined by a pair,namely a variable x and a value v that is assigned to x. The initial TSO-configuration cinitis defined by the tuple (qinit ,binit ,meminit) where, for all p ∈ P and x ∈ X, we have thatqinit(p) = qinit

p , binit(p) = ε and meminit(x) = 0. In other words, each process is in itsinitial local state, all the buffers are empty, and all the variables in the shared memory areinitialized to 0. We use CTSO to denote the set of TSO-configurations.

Transition Relation. The transition relation −→TSO between TSO-configurations is givenby a set of rules, described in Figure 1. Here we informally explain these rules. A noptransition (q, nop, q′) ∈ ∆p changes only the local state of the process p from q to q′. A


t = (q, nop, q′) q(p) = q

(q,b,mem) t−→TSO (q [p← q′] ,b,mem)Nop

t = (q,w(x, v), q′) q(p) = q

(q,b,mem) t−→TSO (q [p← q′] ,b [p← (x, v) · b(p)] ,mem)Write

t = updatep

(q,b [p← b(p) · (x, v)] ,mem) t−→TSO (q,b,mem [x← v])Update

t = (q, r(x, v), q′) q(p) = q b(p)|x×V = (x, v) · w(q,b,mem) t−→TSO (q [p← q′] ,b,mem)

Read-Own-Write

t = (q, r(x, v), q′) q(p) = q b(p)|x×V = ε mem(x) = v

(q,b,mem) t−→TSO (q [p← q′] ,b,mem)Read Memory

t = (q, arw(x, v, v′), q′) q(p) = q b(p) = ε mem(x) = v

(q,b,mem) t−→TSO (q [p← q′] ,b,mem [x← v′])ARW

t = (q, fence, q′) q(p) = q b(p) = ε

(q,b,mem) t−→TSO (q [p← q′] ,b,mem)Fence

Figure 1 The transition relation −→TSO under TSO. Here p ∈ P and t ∈ ∆p ∪

updatep

where

updatep is an operation that updates the memory using the oldest message in the buffer of process p.

write transition (q,w(x, v), q′) ∈ ∆p adds the message (x, v) to the tail of the store bufferof the process p. A memory update transition updatep can be performed at any time byremoving the oldest message in the store buffer of the process p and updating the memoryaccordingly. For a read transition (q, r(x, v), q′) ∈ ∆p, if the store buffer of the process pcontains some write operations to x, then the read value v must correspond to the valueof the most recent such a write operation. Otherwise the value v of x is fetched from thememory. A fence transition may be performed by a process p only if its store buffer is empty.Finally, an atomic read-write transition (q, arw(x, v, v′), q′) ∈ ∆p can be performed by theprocess p only if its store buffer is empty. This operation checks then whether the value of xis v and changes it to v′.

We use c−→TSO c′ to denote that c t−→TSO c

′ for some t ∈ ∆ ∪ ∆′ where∆′ :=

updatep| p ∈ P

. The transition system induced by P under the classical TSO

semantics is then given by TTSO = (CTSO, cinit, ∆ ∪ ∆′,−→TSO).

The TSO Reachability Problem. A global state qtarget is said to be reachable in TTSOiff there is a TSO-configuration c of the form (qtarget,b,mem), with b(p) = ε for all p ∈ P ,such that c is reachable in TTSO. The reachability problem for the concurrent system P underTSO asks, for a given global state qtarget, whether qtarget is reachable in TTSO. Observe that,in the definition of the reachability problem, we require that the buffers of the configurationc must be empty instead of being arbitrary. This is only for sake of simplicity and does notconstitute a restriction. Indeed, we can easily show that the “arbitrary buffer” reachabilityproblem is reducible to the “empty buffer” reachability problem.

CONCUR 2016


3.3 Dual TSO semantics

In this section, we define the Dual TSO semantics. The model has a perfect FIFO loadbuffer between the main memory and each process. The load buffer is used to store potentialread operations that will be performed by the process. Each message in the load buffer of aprocess p is either a pair of the form (x, v) or a triple of the form (x, v, own) where x ∈ Xand v ∈ V . A message of the form (x, v) corresponds to the fact that x had the value v inthe shared memory. While a message (x, v, own) corresponds to the fact that the process phas written the value v to x. A write operation w(x, v) of the process p immediately updatesthe shared memory and a message of the form (x, v, own) is then appended to the tail of theload buffer of p. Read propagation is then performed by non-deterministically choosing avariable (let’s say x and its value is v in the shared memory) and appending the message(x, v) to the tail of the load buffer of p. This propagation operation speculates on a readoperation of p on x that will be performed later on. Moreover, any message at the head ofthe load buffer can be removed at any time. A read operation r(x, v) of the process p can beexecuted if the message at the head of the load buffer (i.e., the oldest one) of p is of the form(x, v) and there is no pending message of the form (x, v′, own). In the case that the loadbuffer contains a message belonging to p (i.e., of the form (x, v′, own)), the read value mustcorrespond to the value of the most recent message belonging to p (implicitly, this allows tosimulate the Read-Own-Write transitions). A fence means that the load buffer of p must beempty before p can continue. Let P= (A1, A2, . . . , An) be a concurrent system.

Configurations. A DTSO-configuration c is a triple (q,b,mem) where q : P 7→ Q is theglobal state of P, b : P 7→ ((X × V ) ∪ (X × V × own))∗ is the content of the load buffer,and mem : X 7→ V gives the value of each variable. The initial DTSO-configuration cDinit isdefined by (qinit ,binit ,meminit) where, for all p ∈ P and x ∈ X, we have that qinit(p) = qinit

p ,binit(p) = ε and meminit(x) = 0. We use CDTSO to denote the set of DTSO-configurations.

Transition Relation. The transition relation −→DTSO induced by the Dual TSO semantics isgiven in Figure 2. This relation is induced by members of ∆ and∆aux :=

propagatexp , deletep| p ∈ P, x ∈ X

. propagatexp is an operation that speculates

on a read operation of p over x that will be executed later. This is done by appending themessage (x, v) to the tail of the load buffer of p where v is the current value of x in the sharedmemory. The operation deletep removes the oldest message in the load buffer of process p. Awrite operation w(x, v) updates the memory and appends the message (x, v, own) to the tailof the load buffer. A read operation r(x, v) checks first if the load buffer contains a messageof the form (x, v′, own), and in that case the read value v should correspond to the value ofthe most recent message of that form. If there is no message on the variable x belonging top in its load buffer then the value v of x is fetched from the message at the head of its loadbuffer.

We use c−→DTSO c′ to denote that c t−→DTSO c

′ for some t ∈ ∆ ∪ ∆aux. The trans-ition system induced by P under the Dual TSO semantics is then given by TDTSO =(CDTSO, cD

init, ∆ ∪ ∆aux,−→DTSO).

The Dual TSO Reachability Problem. The reachability problem for P under the Dual TSOsemantics is defined in a similar manner to the case of TSO. A global state qtarget is said tobe reachable in TDTSO iff there is a DTSO-configuration c of the form (qtarget,b,mem), withb(p) = ε for all p ∈ P , such that c is reachable in TDTSO. Then, the reachability problem


t = (q, nop, q′) q(p) = q

(q,b,mem) t−→DTSO (q [p← q′] ,b,mem)Nop

t = (q,w(x, v), q′) q(p) = q

(q,b,mem) t−→DTSO (q [p← q′] ,b [p← (x, v, own) · b(p)] ,mem [x← v])Write

t = propagatexp mem(x) = v

(q,b,mem) t−→DTSO (q,b [p← (x, v) · b(p)] ,mem)Propagate

t = deletep |m| = 1(q,b [p← b(p) ·m] ,mem) t−→DTSO (q,b,mem)

Delete

t = (q, r(x, v), q′) q(p) = q b(p)|x×V×own = (x, v, own) · w(q,b,mem) t−→DTSO (q [p← q′] ,b,mem)

Read-Own-Write

t = (q, r(x, v), q′) q(p) = q b(p)|x×V×own = ε b(p) = (x, v) · w(q,b,mem) t−→DTSO (q [p← q′] ,b,mem)

Read from buffer

t = (q, arw(x, v, v′), q′) q(p) = q b(p) = ε mem(x) = v

(q,b,mem) t−→DTSO (q [p← q′] ,b,mem [x← v′])ARW

t = (q, fence, q′) q(p) = q b(p) = ε

(q,b,mem) t−→DTSO (q [p← q′] ,b,mem)Fence

Figure 2 The induced transition relation −→DTSO under the Dual TSO semantics. Here p ∈ Pand t ∈ ∆p ∪∆′p where ∆′p :=

propagatex

p , deletep| x ∈ X.

consists in checking whether qtarget is reachable in TDTSO. The following theorem statesequivalence of the reachability problems under the TSO and Dual TSO semantics.

I Theorem 1. A global state qtarget is reachable in TTSO iff qtarget is reachable in TDTSO.

4 The Dual TSO Reachability Problem

In this section, we show the decidability of the Dual TSO reachability problem by makinguse of the framework of Well-Structured Transition Systems (Wsts) [5, 21]. First, we brieflyrecall the framework of Wsts and then we instantiate it to show the decidability of the DualTSO reachability problem.

4.1 Well-Structured Transition SystemsLet T =

(C, Init, Act,∪a∈Act

a−→)be a transition system. Let v be a well-quasi ordering on

C. Recall that a well-quasi ordering on C is a binary relation over C that is reflexive andtransitive and for every infinite sequence (ci)i≥0 of elements in C there exist i, j ∈ N suchthat i < j and ci v cj . A set U ⊆ C is called upward closed if for every c ∈ U and c′ ∈ C withc v c′, we have c′ ∈ U. It is known that every upward closed set U can be characterised bya finite minor set M ⊆ U such that: (i) for every c ∈ U there is c′ ∈ M such that c′ v c, and(ii) if c, c′ ∈ M and c v c′ then c = c′. We use min to denote the function which for a givenupward closed set U returns its minor set.

CONCUR 2016


Let D ⊆ C. The upward closure of D is defined as D ↑:= c′ ∈ C| ∃c ∈ D with c v c′. Wealso define the set of predecessors of D as PreT (D) := c| ∃c1 ∈ D, c−→ c1. For a finite set ofconfigurations M ⊆ C, we use minpre (M) to denote min (PreT (M ↑) ∪ M ↑).

The transition relation −→ is said to be monotonic wrt. v if, given c1, c2, c3 ∈ C s.t.c1−→ c2 and c1 v c3, we can compute a configuration c4 ∈ C and a run π s.t. c3 π−→ c4 andc2 v c4. The pair (T ,v) is called monotonic transition system if −→ is monotonic wrt. v.

Given a finite set of configurations M ⊆ C, the coverability problem of M in the monotonictransition system (T ,v) asks whether the set M ↑ is reachable in T . For the decidability ofthis problem the following three conditions are sufficient: (i) For every two configurationsc1 and c2, it is decidable whether c1 v c2, (ii) for every c ∈ C, we can check whetherc ↑ ∩Init 6= ∅, and (iii) for every c ∈ C, the set minpre (c) is finite and computable.

The solution for the coverability problem as suggested in [5, 21] is based on a backwardanalysis approach. It is shown that starting from a finite set M0 ⊆ C, the sequence (Mi)i≥0with Mi+1 := minpre (Mi), for i ≥ 0 reaches a fixpoint and is computable.

4.2 Dual TSO Transition System is a Wsts

In this section, we instantiate the framework of Wsts to show the following result:

I Theorem 2. The Dual TSO reachability problem is decidable.

The rest of this section is devoted to the proof of the above theorem. Let P =(A1, A2, . . . , An) be a concurrent system (as defined in Section 3). LetTDTSO = (CDTSO, cD

init, ∆ ∪ ∆aux,−→DTSO) be the transition system induced by P underthe Dual TSO semantics (as defined in Section 3.3). In the following, we will first define awell-quasi ordering v on the set of DTSO-configurations (Lemma 4) such that for every twoconfigurations c1 and c2, it is decidable whether c1 v c2. Then we show that the transitionsystem induced under the Dual TSO semantics is monotonic wrt. to v (Lemma 5). We willshow also that the Dual TSO reachability problem for P can be reduced to the coverabilityproblem in the monotonic transition system (TDTSO,v) (Lemma 6). (Observe that thisreduction is needed since we require that the load buffers are empty when defining the DualTSO reachability problem.) The second sufficient condition (i.e., checking whether the upwardclosed set c ↑, with c is a DTSO-configuration, contains an initial configuration) for thedecidability of the coverability problem is trivial. This check boils down to verifying whetherc is an initial configuration. Finally, the computability of the set of minimal configurationsfor the set of predecessors of any upward closed set is stated by the following lemma:

I Lemma 3. For any configuration c, we can compute minpre(c).

Well-quasi Ordering. In the following, we define a well-quasi ordering v on CDTSO. Let usfirst introduce some notations and definitions. Consider a wordw ∈ ((X × V ) ∪ (X × V × own))∗ representing the content of a load buffer. We definean operation that divides w into a number of fragments according to the most-recent own-messages concerning each variable. We define[w]own := (w1, (x1, v1, own), w2, . . . , wm, (xm, vm, own), wm+1), where the following con-ditions are satisfied: (1) xi 6= xj if i 6= j, (2) if (x, v, own) ∈ wi then x = xj forsome j < i (i.e., the most recent own-message on xj occurs at position j), and (3)w = w1 · (x1, v1, own) · w2 · · ·wm · (xm, vm, own) · wm+1 (i.e., the fragments correspondto the given word w).


Let w,w′ ∈ ((X × V ) ∪ (X × V × own))∗ be two words. Let us assume that[w]own = (w1, (x1, v1, own), w2, . . . , wr, (xr, vr, own), wr+1), and[w′]own = (w′1, (x′1, v′1, own), w′2, . . . , w′m, (x′m, v′m, own), w′m+1). We write w v w′ to denotethat the following conditions are satisfied: (i) r = m, (ii) x′i = xi and v′i = vi for alli : 1 ≤ i ≤ m, and (iii) wi w′i for all i : 1 ≤ i ≤ m+ 1.

Consider two DTSO-configurations c = (q,b,mem) and c′ = (q′,b′,mem′), we extendthe ordering v to configurations as follows: c v c′ iff the following conditions are satisfied:(i) q = q′, (ii) b(p) v b′(p) for all process p ∈ P , and (iii) mem′ = mem. The followinglemma shows that v is indeed a well-quasi ordering.

I Lemma 4. The relation v is a well-quasi ordering over CDTSO. Furthermore, for everytwo DTSO-configurations c1 and c2, it is decidable whether c1 v c2.

Monotonicity. Given configurations c1 = (q1,b1,mem1) , c2 = (q2,b2,mem2) , c3 =(q3,b3,mem3) ∈ CDTSO such that c1

t−→DTSO c2 for some transitiont ∈ ∆p ∪

propagatexp , deletep| x ∈ X

, with p ∈ P , and c1 v c3, we will show that it is

possible to compute a configuration c4 ∈ CDTSO and a run π such that c3 π−→DTSO c4and c2 v c4. To that aim, we first show that it is possible from c3 to reach a con-figuration c′3, by performing a certain number of deletep operations, such that the pro-cess p will have the same last message in its load buffer in the configurations c1 and c′3while c1 v c′3. Then, from the configuration c′3, the process p can perform the sametransition t as c1 did in order to reach the configuration c4 such that c2 v c4. Let usassume that [b1(p)]own is of the form 〈w1, (x1, v1, own), w2, . . . , wm, (xm, vm, own), wm+1〉,and [b3(p)]own is of the form 〈w′1, (x′1, v′1, own), w′2, . . . , w′m, (x′m, v′m, own), w′m+1〉. We definethe word w ∈ ((X × V ) ∪ (X × V × own))∗ to be the longest word such that w′m+1 =w′ · w with wm+1 w′. Observe that in this case we have either wm+1 = w′ = ε orw′(|w′|) = wm+1(|wm+1|). Then, after executing a certain number |w| of deletep operationsfrom the configuration c3, one can obtain a configuration c′3 = (q3,b′3,mem3) such thatb3 = b′3 [p← b′3(p) · w]. As a consequence, we have c1 v c′3. Furthermore, since c1 and c′3have the same global state, the same memory valuation, the same sequence of most-recentown messages concerning each variable, and the same last message in the load buffer of p, c′3can perform the transition t and reaches a configuration c4 such that c2 v c4. The followinglemma shows that (TDTSO,v) is monotonic system.

I Lemma 5. The relation −→DTSO is monotonic wrt. v.

From Reachability to Coverability. Let qtarget be a global state of P and Mtarget be theset of DTSO-configurations of the form c = (qtarget,b,mem) with b(p) = ε for all p ∈ P .Next, we show that the reachability problem of qtarget in TDTSO can be reduced to thecoverability problem of Mtarget in (TDTSO,v). Recall that qtarget in TDTSO iff Mtarget isreachable in TDTSO. Let us assume that Mtarget↑ is reachable in TDTSO. This means thatthere is a configuration c ∈ Mtarget↑ which is reachable in TDTSO. Let us assume that c isof the form (qtarget,b,mem). Then, from the configuration c, it is possible to reach theconfiguration c′ = (qtarget,b′,mem), with b′(p) = ε for all p ∈ P , by performing a sequenceof deletep operations to empty the load buffer of each process. It is then easy to see thatc′ ∈ Mtarget and so Mtarget is reachable in TDTSO. The other direction of the following lemmais trivial since Mtarget ⊆ Mtarget↑.

I Lemma 6. Mtarget↑ is reachable in TDTSO iff Mtarget is reachable in TDTSO.

CONCUR 2016


5 Parameterized Concurrent Systems

Let V be a finite data domain and X be a finite set of variables ranging over V . A parameter-ized concurrent system (or simply a parameterized system) consists of an unbounded numberof identical processes running under the Dual TSO semantics. Formally, a parameterizedsystem S is defined by a finite-state automaton A =

(Q, qinit ,∆

)uniformly describing the

behavior of each process. An instance of S is a concurrent system P= (A1, A2, . . . , An), forsome n ∈ N, where for every p : 1 ≤ p ≤ n, we have Ap = A. In other words, it consists of afinite set of processes each running the same code defined by A. Observe that consideringthat all processes run the same code is not a real restriction. In fact, the case where theprocesses run (finitely many) different finite-state automata A1, A2, . . . , Am can be easilyencoded in our model by constructing an extended automaton A which represents the unionof these automata A1, A2, . . . , Am. We use Inst(S) to denote all possible instances of S. Weuse TP = (CP, InitP, ActP,−→P) to denote the transition system induced by an instance P ofS under the Dual TSO semantics.

A parameterized configuration α is a pair (P, c) where P = 1, . . . , n, with n ∈ N, isthe set of process IDs and c is a DTSO-configuration of an instance P of S of the form(A1, A2, . . . , An). The parameterized configuration α = (P, c) is said to be initial if c is aninitial configuration of P (i.e., c ∈ InitP). We use C (resp. Init) to denote the set of all theparameterized configurations (resp. initial configurations) of S.

Let Act denote the set of actions of all possible instances of S (i.e., Act = ∪P∈Inst(S) ActP).We define a transition relation −→ on parameterized configurations such that (P, c) t−→ (P ′, c′)for some action t ∈ Act iff P ′ = P and there is an instance P of S such that t ∈ ActP andc−→P c′. The transition system induced by S is given by T = (C, Init, Act,−→).

In the following we extend the definition of the Dual TSO reachability problem to thecase of parameterized systems. A global state qtarget : P ′ 7→ Q is said to be reachable in T ifthere exists a parameterized configuration α = (P, (q,b,mem)), with b(p) = ε for all p ∈ P ,such that α is reachable in T and qtarget(1) · · ·qtarget(|P ′|) q(1) · · ·q(|P |). Then, thereachability problem consists in checking whether qtarget is reachable in T . In other words,the Dual TSO reachability problem for parameterized systems asks whether there is aninstance of the parameterized system that reaches a configuration with a number of processesin certain given local states.

6 Decidability of the Parameterized Verification Problem

We prove hereafter the following fact:

I Theorem 7. The Dual TSO reachability problem for parameterized systems is decidable.

Let S =(Q, qinit ,∆

)be a parameterized system and (C, Init, Act,−→) be its induced

transition system. The proof of Theorem 7 is done by instantiating the framework of Wsts.Following that framework, we will first define an ordering that we denote by E on the setof parameterized configurations and show the monotonicity of the the relation −→ wrt. thisordering (see Lemma 9 and Lemma 10). Then we will show that the Dual TSO reachabilityproblem for S can be reduced to the coverability problem in the monotonic transition system(T ,E) (Lemma 11). The second sufficient condition (i.e., checking whether the upward closedset α ↑, with α is a parameterized configuration, contains an initial configuration) forthe decidability of the coverability problem is trivial. This check boils down to whetherthe configuration α is initial. Finally, the last sufficient condition (i.e., computing the set


of minimal configurations for the set of predecessors of any upward closed set) for thedecidability of the coverability problem is stated by the following lemma:

I Lemma 8. For any parameterized configuration α, we can compute minpre(α).

Well-quasi Ordering. Let α = (P, (q,b,mem)) and α′ = (P ′, (q′,b′,mem′)) be two para-meterized configurations. We define the ordering E on the set of parameterized configurationsas follows: α E α′ iff the following conditions are satisfied: (1) mem = mem′ and (2) there isan injection h : 1, . . . , |P | 7→ 1, . . . , |P ′| such that (i) p < p′ implies h(p) < h(p′), and(ii) for every p ∈ 1, . . . , |P |, q(p) = q′(h(p)) and b(p) v b′(h(p)). The following lemmastates that E is indeed a well-quasi ordering.

I Lemma 9. The relation E is a well-quasi ordering over C. Furthermore, for every twoparameterized configurations α and α′, it is decidable whether α E α′.

Monotonicity. Let α1 = (P, (q1,b1,mem1)), α2 = (P, (q2,b2,mem2)) andα3 = (P ′, (q3,b3,mem3)) be parameterized configurations. Furthermore, we assume thatα1 E α3 and α1

t−→α2 for some transition t. Since α1 E α3, there is an injection func-tion h : 1, . . . , |P | 7→ 1, . . . , |P ′| such that (i) p < p′ implies h(p) < h(p′), and(ii) for every p ∈ 1, . . . , |P |, q1(p) = q3(h(p)) and b1(p) v b3(h(p)). We define theparameterized configuration α′ from α3 by only keeping the local state and load buf-fers of processes in h(P ). Formally, α′ = (P, (q′,b′,mem′)) is defined as follows: (i)mem′ = mem3 and (ii) for every p ∈ 1, . . . , |P |, q′(p) = q3(h(p)) and b′(p) = b3(h(p)).(Observe that (q1,b1,mem1) v (q′,b′,mem′)). Since the relation −→DTSO is monotonicwrt. the ordering v (see Lemma 5), there is a Dual TSO-configuration (q′′,b′′,mem′′) suchthat (q′,b′,mem′)−→∗DTSO (q′′,b′′,mem′′) and (q2,b2,mem2) v (q′′,b′′,mem′′). Considernow the parameterized configuration α4 = (P ′, (q4,b4,mem4)) such that mem′′ = mem4,(ii) for every p ∈ 1, . . . , |P |, q′′(p) = q4(h(p)) and b′′(p) = b4(h(p)), and (iii) forp ∈ (1, . . . , |P ′| \ h(1), . . . , h(|P |)), we have q4(p) = q3(p) and b4(p) = b3(p). It is easythen to see that α2 E α4 and α3−→∗ α4. Hence, we obtain the following result:

I Lemma 10. The relation −→ is monotonic wrt. E.

From Reachability to Coverability. Let qtarget : P ′ 7→ Q be a global state. Let Mtargetbe the set of parameterized configurations of the form α = (P ′, (qtarget,b,mem)) withb(p) = ε for all p ∈ P ′. In the following, we show that Mtarget ↑ is reachable in T iff there isa parameterized configuration α = (P, (q,b,mem)), with b(p) = ε for all p ∈ P , such thatα is reachable in T and qtarget(1) · · ·qtarget(|P ′|) q(1) · · ·q(|P |).

Let us assume that there is a parameterized configuration α = (P, (q,b,mem)), withb(p) = ε for all p ∈ P , such that α is reachable in T andqtarget(1) · · ·qtarget(|P ′|) q(1) · · ·q(|P |). It is then easy to show that α ∈ Mtarget ↑.

Now let us assume that there is α′ = (P ′′, (q′,b′,mem′)) ∈ Mtarget↑ which is reachable in T .From the configuration α′, it is possible to reach the configuration α′′ = (P ′′, (q′,b′′,mem′)),with b′′(p) = ε for all p ∈ P ′′, by performing a sequence of deletep operations to emptythe load buffer of each process. Since α′ ∈ Mtarget ↑, we have qtarget(1) · · ·qtarget(|P ′|) q′(1) · · ·q′(|P ′′|). Hence, α′′ is a witness of the state reachability problem.

I Lemma 11. qtarget is reachable in T iff Mtarget↑ is reachable in T .

CONCUR 2016


Table 1 Comparison between Dual-TSO and Memorax: The columns #P, #T and #C givethe number of processes, the running time in seconds and the number of generated configurations,respectively. If a tool runs out of time, we put TO in the #T column and • in the #C column.

Program #P Dual-TSO Memorax#T #C #T #C

SB 5 0.3 10641 559.7 10515914LB 3 0.0 2048 71.4 1499475WRC 4 0.0 1507 63.3 1398393ISA2 3 0.0 509 21.1 226519RWC 5 0.1 4277 61.5 1196988W+RWC 4 0.0 1713 83.6 1389009IRIW 4 0.0 520 34.4 358057Nbw_w_wr 2 0.0 222 10.7 200844Sense_rev_bar 2 0.1 1704 0.8 20577Dekker 2 0.1 5053 1.1 19788Dekker_simple 2 0.0 98 0.0 595Peterson 2 0.1 5442 5.2 90301Peterson_loop 2 0.2 7632 5.6 100082Szymanski 2 0.6 29018 1.0 26003MP 4 0.0 883 TO •Ticket_spin_lock 3 0.9 18963 TO •Bakery 2 2.6 82050 TO •Dijkstra 2 0.2 8324 TO •Lamport_fast 3 17.7 292543 TO •Burns 4 124.3 2762578 TO •

7 Experimental Results

We have implemented our techniques described in Section 4 and Section 6 in an open-sourcetool called Dual-TSO1. The tool checks the state reachability problems for (parameterized)concurrent systems under the Dual TSO semantics. We compare our tool with Memorax [2,3] which is the only precise and sound tool for deciding the state reachability problemof concurrent systems under TSO. Observe that Memorax cannot handle parameterizedverification. All experiments are performed on an Intel x86-32 Core2 2.4 Ghz machine and4GB of RAM.

In the following, we present two sets of results. The first set concerns the comparison ofDual-TSO with Memorax (see Table 1). The second set shows the benefit of the parameterizedverification compared to the use of the state reachability when increasing the number ofprocesses (see Figure 3 and Table 2). Our examples are from [2, 10, 13, 4, 25]. In allexperiments, we set up the time out to 600 seconds.

Table 1 presents a comparison between Dual-TSO and Memorax on a representative sampleof 20 benchmarks. In all these examples, Dual-TSO and Memorax return the same resultfor the state reachability problem (except 6 examples where Memorax runs out of time). Inthe examples where the two tools return, Dual-TSO out-performs Memorax and generatesfewer configurations (and so uses less memory). Indeed, Dual-TSO is 600 times faster than

1 https://www.it.uu.se/katalog/tuang296/dual-tso


Table 2 Parameterized verification with Dual-TSO.

Program Dual-TSO#T #C

SB 0.0 147LB 0.6 1028MP 0.0 149WRC 0.8 618ISA2 4.3 1539RWC 0.2 293W+RWC 1.5 828IRIW 4.6 648

0

200

400

600

2 3 4 5 6 7 8 9 10 11

SB

Dual-TSO

Memorax

0

200

400

600

2 3 4 5 6 7 8 9 10

LB

Dual-TSO

Memorax

0

200

400

600

2 4 6 8 10 12

MP

Dual-TSO

Memorax

0

200

400

600

3 4 5 6 7 8 9 10 11

WRC

Dual-TSO

Memorax

0

200

400

600

3 4 5 6 7 8 9 10 11

ISA2

Dual-TSO

Memorax

0

200

400

600

2 4 6 8 10 12 14

RWC

Dual-TSO

Memorax

0

200

400

600

3 4 5 6 7 8 9 10 11 12

W+RWC

Dual-TSO

Memorax

0

200

400

600

4 6 8 10 12 14

IRIW

Dual-TSO

Memorax

Figure 3 Running time of Memorax and Dual-TSO by increasing number of processes. The x axisis number of processes, the y axis is running time in seconds.

Memorax and generates 277 times fewer configurations on average.The second set compares the scalability of Memorax and Dual-TSO while increasing the

number of processes. The results are given in Fig. 3. We observe that Dual-TSO scalesbetter than Memorax in all these examples. In fact, Memorax can only handle the exampleswith at most 5 processes. Table 2 presents the running time and the number of generatedconfigurations when checking the state reachability problem for the parameterized version ofthese examples. We observe that the verification of these parameterized systems is muchmore efficient than verification of bounded-size instances (starting from a number of processesof 3 or 4), especially concerning memory consumption (which is given in terms of number ofgenerated configurations). The reason behind is that the size of the generated minor setsin the analysis of a parameterized system is usually smaller than the size of the generatedconfigurations during the analysis of an instance of the system with a large number ofprocesses.

8 Conclusion

In this paper, we have presented an alternative (yet equivalent) semantics to the classicalone for the TSO model. This new semantics allows us to understand the TSO model ina totally different way compared to the classical semantics. Furthermore, the proposedsemantics offers several important advantages from the point of view of formal reasoningand program verification. First, the dual semantics allows transforming the load buffers tolossy channels without adding the costly overhead that was necessary in the case of storebuffers. This means that we can apply the theory of well-structured systems [6, 5, 21] in a

CONCUR 2016


straightforward manner leading to a much simpler proof of decidability of safety properties.Second, the absence of extra overhead means that we obtain more efficient algorithmsand better scalability (as shown by our experimental results). Finally, the dual semanticsallows extending the framework to perform parameterized verification which is an importantparadigm in concurrent program verification.

In the future, we plan to apply our techniques to more memory models and to combinewith predicate abstraction for handling programs with unbounded data domain.

References1 P. Abdulla, S. Aronis, M.F. Atig, B. Jonsson, C. Leonardsson, and K. Sagonas. Stateless

model checking for TSO and PSO. In TACAS, volume 9035 of LNCS, pages 353–367.Springer, 2015.

2 P.A. Abdulla, M.F. Atig, Y.F. Chen, C. Leonardsson, and A. Rezine. Counter-exampleguided fence insertion under TSO. In TACAS 2012, pages 204–219, 2012.

3 P.A. Abdulla, M.F. Atig, Y.F. Chen, C. Leonardsson, and A. Rezine. Memorax, a preciseand sound tool for automatic fence insertion under TSO. In TACAS, pages 530–536, 2013.

4 P.A. Abdulla, M.F. Atig, and N.T. Phong. The best of both worlds: Trading efficiency andoptimality in fence insertion for TSO. In ESOP 2015, pages 308–332, 2015.

5 P.A. Abdulla, K. Cerans, B. Jonsson, and Y.K. Tsay. General decidability theorems forinfinite-state systems. In LICS’96, pages 313–321. IEEE Computer Society, 1996.

6 Parosh Aziz Abdulla. Well (and better) quasi-ordered transition systems. Bulletin ofSymbolic Logic, 16(4):457–515, 2010.

7 S. Adve and K. Gharachorloo. Shared memory consistency models: a tutorial. Computer,29(12), 1996.

8 S. Adve and M. D. Hill. Weak ordering - a new definition. In ISCA, 1990.9 J. Alglave, D. Kroening, and M. Tautschnig. Partial orders for efficient bounded model

checking of concurrent software. In CAV, volume 8044 of LNCS, pages 141–157, 2013.10 Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation,

testing, and data mining for weak memory. ACM TOPLAS, 36(2):7:1–7:74, 2014.11 M. F. Atig, A. Bouajjani, S. Burckhardt, and M. Musuvathi. On the verification problem

for weak memory models. In POPL, 2010.12 M.F. Atig, A. Bouajjani, S. Burckhardt, and M. Musuvathi. What’s decidable about weak

memory models? In ESOP, volume 7211 of LNCS, pages 26–46. Springer, 2012.13 Ahmed Bouajjani, Egor Derevenetc, and Roland Meyer. Checking and enforcing robustness

against TSO. In ESOP, volume 7792 of LNCS, pages 533–553. Springer, 2013.14 S. Burckhardt, R. Alur, and M. M. K. Martin. CheckFence: checking consistency of con-

current data types on relaxed memory models. In PLDI, pages 12–21. ACM, 2007.15 Sebastian Burckhardt and Madanlal Musuvathi. Effective program verification for relaxed

memory models. In CAV, volume 5123 of LNCS, pages 107–120. Springer, 2008.16 Jacob Burnim, Koushik Sen, and Christos Stergiou. Testing concurrent programs on relaxed

memory models. In ISSTA, pages 122–132. ACM, 2011.17 A. Marian Dan, Y. Meshman, M. T. Vechev, and E. Yahav. Predicate abstraction for

relaxed memory models. In SAS, volume 7935 of LNCS, pages 84–104. Springer, 2013.18 Brian Demsky and Patrick Lam. Satcheck: Sat-directed stateless model checking for SC

and TSO. In OOPSLA 2015, pages 20–36. ACM, 2015.19 Egor Derevenetc and Roland Meyer. Robustness against Power is PSpace-complete. In

ICALP (2), volume 8573 of LNCS, pages 158–170. Springer, 2014.20 M. Dubois, C. Scheurich, and F. A. Briggs. Memory access buffering in multiprocessors.

In ISCA, 1986.


21 A. Finkel and Ph. Schnoebelen. Well-structured transition systems everywhere! Theor.Comput. Sci., 256(1-2):63–92, 2001.

22 Michael Kuperstein, Martin T. Vechev, and Eran Yahav. Automatic inference of memoryfences. In FMCAD, pages 111–119. IEEE, 2010.

23 Michael Kuperstein, Martin T. Vechev, and Eran Yahav. Partial-coherence abstractionsfor relaxed memory models. In PLDI, pages 187–198. ACM, 2011.

24 L. Lamport. How to make a multiprocessor computer that correctly executes multiprocessprograms. IEEE Trans. Comp., C-28(9), 1979.

25 Feng Liu, Nayden Nedev, Nedyalko Prisadnikov, Martin T. Vechev, and Eran Yahav. Dy-namic synthesis for relaxed memory models. In PLDI ’12, pages 429–440, 2012.

26 S. Owens, S. Sarkar, and P. Sewell. A better x86 memory model: x86-tso. In TPHOL,2009.

27 P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen. x86-tso: A rigorous andusable programmer’s model for x86 multiprocessors. CACM, 53, 2010.

28 D. Weaver and T. Germond, editors. The SPARC Architecture Manual Version 9. PTRPrentice Hall, 1994.

29 Y. Yang, G. Gopalakrishnan, G. Lindstrom, and K. Slind. Nemos: A framework for ax-iomatic and executable specifications of memory consistency models. In IPDPS. IEEE,2004.

30 N. Zhang, M. Kusano, and C. Wang. Dynamic partial order reduction for relaxed memorymodels. In PLDI, pages 250–259. ACM, 2015.

CONCUR 2016

Documents

TheBeneﬁtsofDualityinVerifyingConcurrent ProgramsunderTSO · TheBeneﬁtsofDualityinVerifyingConcurrent ProgramsunderTSO∗ Parosh Aziz Abdulla1, Mohamed Faouzi Atig2, Ahmed Bouajjani3,