A Concurrency Control Protocol for Nested Transactionsravi/Papers/DBConf/nested-transactions.pdf · locking concurrency control unsuitable for nested transaction. 3. Optimistic concurrency

A Concurrency Control Protocol for Nested Transaction s

Ming-Ling Lo and C . V . RavishankarComputer Sciences Division, Dept. of EEC S

University of Michigan-Ann ArborAnn Arbor, MI 48109-212 2

Abstract

Nested transactions[6, 5] provide fine grain atom-icity, efficient recovery control, and structural mod-ularity. In distributed environments, they providea natural and semantically clean way of modelin gcomputations . However, the characteristics of nes-ted transactions are sufficiently different from those o ftraditional single-level transactions that concurrencycontrol for nested transactions should be reconsidere din order to exploit all its advantages .

In this paper, we investigate a new concurrenc ycontrol protocol for nested transactions, and introduc ethe notion of a request list for that purpose . Our ob-jectives are to provide shorter transaction turn-aroun dtimes and better system throughput . These goals areaccomplished by exploiting intra-transaction concur-rency and by reducing the time a transaction has t owait for consistent data states .

1 Introduction

Nested transactions are introduced to model long o rcomplex transactions by offering fine-grain fault atom-icity, more efficient recovery control, and structura lmodularity[6] . Nested transactions are most suitabl ewhen the amount of computation in a transaction islarge, when the data accessing pattern is complex, o rwhen one unit of computation directly or indirectl yinvokes computations at remote sites . Computationsinvoked at the remote sites are usually modeled as sub-transactions . These characteristics make nested trans -actions a suitable candidate for modeling transaction sin systems such as object oriented database systems .

Because of the aforementioned characteristics, nes-ted transactions used to their full advantages tend tobe larger . They have the following properties in com-parison to single level transactions : for a nested trans-action, (1) the expected execution time is longer, (2)the expect number of data items accessed is larger,

and (3) during the its life time, the expected number o ftransactions trying to access the same resources it ha saccessed is larger . Although all nested transactionsare not large, a well-designed nested transaction man-agement system should cope with situations in whichthe number of large transactions is significant . Threeproblems could arise in such situations :

1. Transaction execution times may be longer be-cause of larger and more complicated computa-tions . Communications to pass subtransactionstatus between application and system softwaremodules and communications between parent an dchildren subtransactions at different sites can alsoincrease the life time of nested transactions . Thismay result in degraded response time for inter -active users .

2. The effect of resource waiting may seriously pro -long transaction execution times . A transactionwaiting for resources may wait a long time simpl ybecause the waited nested transactions have a lo tof work to do . However, the waited nested trans-actions may very well be waiting for resources ac-cessed by other transactions, since nested trans-actions access more data items than single-levelones . The chain of resource waiting can be-come very long if the concurrency control pro-tocol is not properly designed . When the sys-tem workload is heavy, the degradation in trans -action execution time and system throughput dueto resource waiting can be multiplicative, and th echance for deadlock can be high . This makes plainlocking concurrency control unsuitable for neste dtransaction .

3. Optimistic concurrency control can be dangerous .If a nested transaction is aborted, the number o fdependent transactions that need be aborted wil lbe larger, which in turn will cause a even large rnumber of transactions to abort . That is, thefan-out of the cascading abort[1] will be greater .Consequently, aborting a long-lived nested trans-

67

action may potentially paralyze the whole system .

To avoid these problems, a concurrency control pro-tocol for nested transaction must :

• Minimize execution times for individual neste dtransactions .

• Minimize the time a transaction has to wait forresources held by other transactions .

• Minimize the number of cases in which a transac-tion has to proceed on the assumption that som eactive transactions will commit .

The above objectives are interrelated . By minim-izing transaction execution times we could also min-imize the penalty of waiting and reduce the danger o foptimistic assumptions . On the other hand, reducingresource wait times naturally reduces execution timesof transactions . One obvious approach to shorteningexecution times for individual nested transactions i sto exploit intra-transaction concurrency, i .e ., to allowsubtransactions from the same parent to execute con -currently .

In this paper, we investigate a new concurrencycontrol protocol for nested transactions that addresse sthese design goals . We minimize transaction executio ntimes by exploiting intra-transaction concurrency . Wealso introduced the concept of a request list, which isderived from the history mechanism[7] and is mergedwith the timestamp serialization protocol[4] . requestlist reduces the cases in which a nested transaction ha sto wait for resources held by others, and avoid alto-gether the assumption that same active transactionswill commit when allowing operations to proceed . Theproblem of cascading abort is avoid as a result . Andby using timestamp as the serialization protocol, dead-locks cannot occur, either . The primary costs in thismethod is paid at the time operations arrive at dataitems . We believe the costs is justifiable given the hig hchance of deadlocks and expensive cost of cascadingaborts for nested transactions .

Exploiting intra-transaction concurrency turn ou tnot to be as straightforward as we expected, due to th efact that existing concurrency control techniques ar ebased on the assumption that transactions are single -leveled . Section 2 provides backgrounds for our workand discusses issues concerning intra-transaction con -currency. Section 3 provides a brief description of thehistory mechanism. Section 4 and 5 gives and out -line and the details of our algorithm, respectively . Insection 6, we discuss how our methods avoids dead -lock and cascading abort, among others . Section 7conclude this paper .

2 Background

Work on concurrency control for single-level trans -actions has focussed mainly on inter-transaction con -currency, in particular inter-transaction concurrenc ybetween transactions from different programs . In suchsystems, operations from a single-level transaction ar eexecuted sequentially. Moreover, it is assumed thata program will issue and execute its transactions se-quentially and synchronously. That is, one transac-tion is executed after its preceding transactions in theprogram have completed. If one transaction must b eexecuted after the effect of another transaction has re-gistered into the database for the whole computatio nto be meaningful, the programmer is responsible fo renforcing such a requirement .

2.1 The Objectives of Serialization forNested Transactions

The concept of Serializability[2, 8] was first pro-posed in single-level transaction systems to preserv edata consistency . Serializability ensures that the ef-fect of executing two transactions is equivalent to thatof executing them sequentially . But data consistencyis not all we ask for . For the execution of a collec-tion of (sub)transactions to be meaningful, it is some -times necessary that these (sub)transactions be seri-alized in some particular order . For example, conside rtransactions A, B, C, and D that perform financia lcomputation for a company. A, B, and C calculat eand record salaries paid to three employees in a cer-tain month. D calculates the total amount of salariespaid that month. A meaningful way to serialize thesetransactions will be A-B-C-D . If the transactions ar eserialized the order A-B-D-C, data consistency is stil lpreserved, but the result returned by D does not sat-isfy its semantic requirement .

Traditional single-level transaction systems assum ethe programmer will arrange the transaction code i na correct order, for example, A-B-C-D . Since transac-tions from the same program are executed synchron-ously, semantical correctness is preserved . Moreover ,concurrently running programs are considered se-mantically unrelated. Enforcing specific serializationorder is not important to concurrency control proto-cols of single-level transaction systems . It needs onlyensure that transactions be serialized and data con-sistency preserved .

For nested transactions management systems, aset of subtransactions from the same parent transac-tion may have serialization ordering constraints among

68

them, yet still offer opportunities for exploiting con -currency . We need concurrency control protocols thatsupport concurrency between these subtransactionswhile preserving the serialization constraints .

2 .2 Serialization Ordering Requirement

To make the discussions of serialization and con-currency control in the following sections precise, wedefine the following terms :

For a set of computation units(e .g . transaction sor basic blocks) of a computation process, the logica lordering requirement is the minimal set of ordering re-quirements between these computational units for theprocess to bear its intended meaning . Two units ofcomputation may or may not have a logical orderin grequirement between them. The requirement is min-imal in the sense that if a unit X can either precedeor follow another unit Y without affecting the pro-gram 's intended results, no ordering is defined betweenX and Y . Using the example in section 2 .1, the logica lordering requirement will be {(A, D), (B, D), (C, D)} .There is no ordering requirement between A, B, and C,since they can be executed in arbitrary order . Imple-mentation details of a particular system may requireB to be executed before C to guarantee correctness ,but these issues are not relevant at this level ; the re-quirement is purely at the logical level .

The serialization ordering requirement refers spe-cifically to transactions . It is the order in which two(sub)transactions must be serialized for the computa-tion to preserve its intended meaning . If A and B aretwo (sub)transactions, the serialization requirementare formally defined as the collection of the followin gthree mutually exclusive relations :

1. NS : (A, B) E NS, if efforts need not be made toserialize the two subtransactions . For instance ,the two subtransaction may never access commondata items directly or indirectly .

2. AS : (A, B) E AS, if A and B must be serialized ,but the order is immaterial .

3. SS : (A, B) E SS, if A and B must be serialize din some specific order to preserve the meaning ofthe computation .

Note that NS and AS can hold only between sub -transactions without any logical ordering requirementsbetween them, whereas SS must hold between sub-transactions constrained by logical ordering require-ments . We want the the relation SS to be minima land compatible with the logical ordering requirement .We also need a way of specifying and communicating

Q Uncommitted Subtransactio n(Internally) Committed Subtransactlo n

Subtransactlons A, C are Invisible

Subtransactlons A, C, and D areto subtransaction B

visible to subtransactlon B

Figure 1 : Visibility of sub transactions

information about serialization ordering requirement sto the serialization mechanism .

2 .3 The Visibility of Internal Commit

A subtransaction can be in one of two kinds ofcommitted states . If a subtransaction is committedwhile its enclosing top-level transaction is not, th esubtransaction is in the state of internal commit . Ifa top-level transaction commits, the top-level trans-action and all its descendent subtransactions are inthe state of external commit or top-level commit . The(sub)transactions in such a state is also said to b ecommitted to the top level.

In a single-level transaction system, an operation i seither visible or invisible to all other transactions. Fornested transactions, because of their the hierarchica lstructure, the following visibility principles for com-mitment of operations should be observed[6] :

1. Top-level commit is visible to every operation .That is, an operation from a committed top-leve ltransaction is considered committed by all othe roperations in the system .

2. Internal commit of an operation A is visible t ooperation B if and only if:

(a) A and B are from the same top-level trans -action, and

(b) There is an ancestor of B which is a siblin gof some internally committed ancestor of A .

The visibility of commit is illustrated in Figure 1 .If the operation A is either committed to the top-level ,or is internally committed and the internal commit isvisible to operation B, we say A is visibly committe dwith regard to B .

Root

Root

69

Concurrency control protocols for nested transac-tions must enforce the visibility of internal commits .For locking methods, shadow copies can be used t oenforce correct visibility[6] . However, this involvesthe costs of copying and inheriting the shadows . Thecopying cost would be quite expensive if multiple sub -transactions from the same root transaction are to ex-ecute concurrently .

2.4 Related Concurrency Control Models

The easiest way to schedule subtransactions in anested transaction system is to execute sequentiall yall subtransactions belonging to the same root transac-tion while interleaving subtransactions from differen troot transactions . This approach is not satisfactor ybecause the active periods of individual transaction sbecomes long and the overall performance of the sys-tem might be poor as a result .

A slightly better method is to sequentially executesubtransactions from the same root, but with the fol-lowing proviso: a group of sub transactions are ex-ecuted concurrently, if they are adjacent to each otherin program code, and it can be determined that rela-tion NS or AS holds between each pair of them an dthat concurrent execution will not cause deadlock o rcontention . This is basically the way Argus [5] handle sits intra-transaction concurrency control . However ,this improvement is still unsatisfactory for the follow-ing reasons :

1. A subtransaction sees only its immediate chil-dren and no other descendants, hence it may notbe possible to determine whether NS, AS, or S Sholds between two of its children without viol-ating modularity and analyzing the descendants 'code . In distributed systems where a subtrans-action can be an invocation on a remote object ,such analysis may be impossible .

2. Analyzing the serialization ordering requirement sbetween subtransactions are difficult and error -prone . In some systems this information may no tbe available until run-time . For example, it maydepend on the parameters of program invocationor user input .

3. Even when the serialization ordering require-ments are available, performance will still not b esatisfactory if two subtransactions of relation S Sare executed sequentially.

3 History Mechanism for Single-leve lTransactions

The abstraction of history[7] has been designe din the context of single-level transactions to sup-port the implementation of atomic objects[5, 11] an dmeet the demand of systems with long transactions o rwith localized concurrency bottlenecks (hot-spots) . I tprovides the basis for our concurrency control mech-anism for nested transactions .

The history mechanism uses application semantic sto increase concurrency . Introducing application se-mantics into transactions has drawbacks . Not onlywriting applications becomes more complicated, i talso becomes difficult to write an application programwithout unnecessarily exposing details of the underly-ing concurrency control algorithm . The history mech-anism alleviates these problems by classifying the op-erations on data types into mutators and observers ,and hiding the serialization protocol from the applic-ations. Classifying operations into mutators and ob-servers provides a way of systematically and efficientl yexploiting application semantics without exposing to omany details . This method is particularly interestin gin the context of nested transactions because it greatl yreduces the number of cases in which an operation ha sto wait for others .

3.1 Data Objects with History Abstrac-tion

Conceptually, an atomic data object can be viewe das a state machine with four components : a set ofpossible states, an initial state, a set of possible trans-itions, and a set of rules that determine how the state sof the atomic object are changed by the transitions . Atransition is an ordered pair consisting of an operatio ninvocation on the data object and its result . We wil luse the terms operation and transition interchange-ably when there is no ambiguity.

The state of an atomic object is represented as alist, or history of previously executed transitions . Theexecution of an operation is implemented as an ad-dition to this history of transitions . Formally, th eknowledge possessed by an atomic object with the his-tory abstraction can be expressed as a triple (T, C, 0) ,where :

• T is the set of operations from all transactionsthat have arrived at the data object .

• C is the set of operations from committed trans -actions . C C T.

70

• O is a relation between elements of T . (1 1 ,1 2 ) E0, if the object knows t i is serialized before t 2 .

How the serialization ordering becomes known t othe object depends on the serialization protocol . Iftimestamp ordering is used, the data object discov-ers the ordering of two operations by comparing theirtimestamps . If the serialization order is determine dby the order of commitment, as is usually the cas ewhen locking is used for concurrency control, the dataobject obtain more information on ordering when it i sinformed of the completion of some subtransactions .

The notion of possible serialized sequences [7] is acentral idea used in the history mechanism to syn-chronize operations performed on the same data ob-ject . A possible serialized sequence of an object can b eviewed as a possible development of the history of thatobject . Each possible serialized sequence represent sa possible state of the object in the future when al loperations in the sequence are committed in the de-scribed order, and all other transitions in the histor yare aborted .

Formally, given a history (T, C, O), a sequence S o foperations is a possible serialized sequence i f

1. S includes all committed operations, i .e ., S J C .

2. S preserves known ordering of operations, i .e ., fo r11, t2 E S, if (1 1 ,1 2 ) E 0, then t i is ordered before1 2 in S .

There may be several possible serialized sequence sfor with a given history. For instance, consider the fol -lowing history, produced possibly under commit-orde rserialization protocol :

T = {t1 i t2,13} ,C = {tl} ,and 0 = {(t l , t2), (tl t3)} .

All possible serialized sequences for this history are :

(11, 1 2, 1 3), (11, 13, t2), (t1, 12), ( t 1, t3), (1 1 )

If timestamp ordering is used, the ordering betweenany two transitions are known, and 0 might becom e{(t 1 ) t 2 ), (t l , t 3 ), (1 2i t 3 )} . The possible serialized se-quences in this case are :

( t 1, t2) t3),( t 1) 1 2),( t1) 13),(t l )

3.2 Mutators and Observers

The history mechanism classifies operations on adata object into mutators and observers . A mutator isan operation that changes the object state . A mutator

differs from a write in that its may change the object sstate relative the object's previous state, whereas awrite determines new object state based solely on it sparameters . An observer is an operation that deducesinformation from the object state . Conceptually, anobserver derives its result by observing the possibleserialized sequences, if there are more than one suc hsequences, information deduced may be inconsistent .The observer fails in this case . An operation is bothan observer and a mutator when it both deduces in-formation from and changes the object state .

With the above definitions, the algorithm to syn-chronize operations arriving at a data object can b esummarized as follows :

1. Before adding a new observer into the history ,determine whether it observes different result sin different possible serialized sequences. If so ,either the new observer are delayed or aborted ,or some action needs be taken to make the obser-vations, such as aborting some active mutators .

2. Before adding a new mutator, we should determ-ine whether some observers serialized after th enew mutator would observe inconsistent result i fit is inserted into the history. The mutator isinserted only when no such observers exist . Oth-erwise, the it is aborted .

3. When a transaction commits, all operations it is -sued are marked as committed, i .e . included intothe set C . If a transaction aborts, the records ofthe operations it issued are erased from the his-tory .

3 .3 Advantages of the History Mechan-ism

The history mechanism potentially allows moreconcurrency in a single-level transaction system tha nits counterparts using read/write locking or timestam pconcurrency control[7] .

For instance, suppose transaction A accesses dat aobject X, and then B accesses X . With read/writelocking, unless both A and B are reads, B would b eblocked until A completes . In the history mechanism ,B would be only if A is a mutator, B is an observe ,and B observe inconsistent result .

With single version timestamp concurrency con-trol, if A read X and then B write X with a smalle rtimestamp, we would either have to abort A or B . Inthe corresponding situation for a history mechanism, i fA issues an observer to X and then B issues a mutatorwith smaller timestamp, we need not abort either A

71

or B unless an observer would be invalidated if B isinserted into the history .

The history mechanism is comparable to multi -version timestamp protocols[10] . However, it does no thave the problem of cascaded abort . When a mutatoris about to be insert into the history, we make sure n oexisting observers would observe differently because ofthe new inserting . Similarly, an observer is allowe dinto the history only when no abortion of existin gmutators can affect the information it derived . Whena mutator needs be aborted, it can simply be delet efrom the record, affecting no other operations .

4 The Algorithm for Nested Transac-tion Currency Contro l

In this section, we described a concurrency contro lprotocol for nested transaction that is derived from th ehistory mechanism and is merged with the timestampserialization protocol . The concept of a request list i sintroduced to support the protocol .

4.1 The Data Object Model

In our model, a data object knows (1) all operationsinvoked upon the object, (2) the ordering among theseoperations, (2) the originating transactions of theseoperations, and (3) the commit status of the origin-ating transactions . This knowledge is recorded in th erequest list . When an operation try to deduce inform-ation from a data object, an object state is derived forit from the request list depending on its serializationordering with respect to other operations .

Timestamps are used to order all top-level trans -actions. A precedence number, unique with the en -closing top-level transaction, is associated with eac hsubtransaction and used to order the subtransactions .The precedence number must be compatible with th elogical ordering requirement, i .e . if subtransaction A islogically required to be ordered before subtransactio nB, A will be assigned a smaller precedence number .Precedence numbers can be assigned either by examin-ing the location of subtransactions in the transactioncode or by specification by the programmer .

Since subtransactions may be invoked dynamicall yand in a hierarchical fashion, we do not assume pre-cedence numbers to be either consecutive or integer .Figure 2 shows one way of assigning precedence num-bers, with precedence determined by lexicographicalorder . For example, precedence number 1 .1 is smal-ler than 2, and 2 .1 .2 is greater than 1 .2 .2 .

0 (Sub)transaction

Root

Figure 2 : Example of precedence number assignmentfor subtransaction s

Formally, the request list of an data object D i sdefined as a tuple (T, C) where :

• T is a sequence constituted of all operations in-voked on the data object . The ordering of theoperations within the sequence reflects their seri -alization order . Operation x appears before op-eration y in T if and only if x should be serialize dbefore y.

• C is the set of all committed operations in T .

As in the history mechanism[7] operations are clas -sified into observers and mutators . An operation bot hchanges and derives information from the object' sstate will assume the roles of both a mutator and anobserver .

An epoch is defined as the interval between two ad-jacent mutators, or the interval from the start to thefirst mutator in the request list . An epoch whose be-ginning and end mutators are M; and M1 is denoted(M;, M.O . An observer 01 in the request list is con-sidered to be in the epoch (M;, M, ), if the ordering ofthe operations is M; < 0 1 < M1 . An operation beforean epoch (M;, MM ) is either an operation serialized be-fore M; or M; itself. An operation after (M;, M1 ) iseither an operation serialized after M3 or M, itself.

There is a set of candidate states associated wit heach epoch, which represents the set of possible statesthe data object could assume if and only if all oper-ations before that epoch have completed, i .e ., eithercommitted or aborted. The candidate states associ-ated with epoch (M;, Mi ) are the set of object statesan observer located in the epoch could possibly see .Figure 3 gives examples of candidate states . When amutator is committed or aborted, all epoch serializedafter this mutator must adjust their candidate state s

01 .1 .1

1 .1 .2

2 .1 .1

2 .1 .2

72

accordingly. If all mutators before an epoch have com-mitted, the set of candidate states associated with thi sepoch is a singleton .

A maindifference between the history mechanism[7] and re-quest list is that the ordering between two operation sin a history may be unknown, while that of two oper-ation in a request list is always known. History avoi dthis requirement because it tries to separate the seri-alizing protocol from its concurrency control protoco land support both timestamp ordering and commit or-der ordering .

4.2 The Basic Algorithm

According to criteria described below, an operationPi that arrives at a data object can be rejected or in-serted into the request list . If it is inserted, it will beinserted into an epoch (M=, M,) where M; < Pi < M1 .If the new operation is a mutator, the epoch is spli tinto two epochs : (M=, Pi ) and (PI , Mi ) . When we needabort an operation from an internally committed sub -transaction, we have to abort the youngest uncommit-ted ancestor . So simplicity, we sometime also refer t osuch a scenario as aborting the operation .

The concurrency control algorithm using the re-quest list is as follows :

1 . Operation ordering .

(a) Each top-level transaction is associated witha timestamp . Each subtransaction is as-signed a precedence number, unique withinits enclosing top-level transaction .

(b) The data object determines the precedenc eof operations according to the following cri-teria:

• Two op-erations with different timestamps, i .e .two operation comes from different roottransactions, are ordered by comparingtheir timestamps .

• Two operationswith the same timestamp are orderedby comparing the precedence number sof the subtransactions they come from .

2 . Processing of observers . When a new observe rarrives at a data object, the observer derives it sobservation from the set of candidate states asso-ciated with its enclosing epoch as follows :

• If the observer derives the same result fro mall candidate states in the period, the ob-server is inserted into the request list .

• If the observer derives different results fromdifferent candidate states, the observer isblocked and retried later when it can observ ea consistent result from all candidate states .The issue of retry will be discussed in th enext section .

3 . Processing of mutators . When a new mutator ar-rives at a data object, we must decide whethe ror not to insert the mutator into the request list .If the mutator is inserted, the candidate statesfor the all epochs located after the new mutato rmight be changed . An observer which previousl ysaw consistent results in one of these epoch maynow see inconsistent results . If this happens, th eobserver is said to be invalidated by the mutator .The issue of checking invalidation is discussed fur -ther in the next section . The criterion for decidin gwhether to insert the mutator is as follows :

(a) If no observer in the request list is invalid-ated, the mutator is inserted into the reques tlist .

(b) If some observers are invalidated, then :

• If some of the invalidated observer sare from different top-level transactions ,then :

If any of the invalidated observer shas committed to the top level, th emutator is aborted .Otherwise ,we call an algorithm ChooseAbort

to determine whether to abort th eoffending mutator or the invalidatedobservers . ChooseAbort can be de -signed according to the number an dcommit status of invalidated observ-ers (uncommitted or internally com-mitted), the relative cost of abor tmutators and observers, etc . Itcan be tuned to optimize systemthroughput, or simple heuristic canbe used. When an internally com-mitted observer needs to be aborted ,we have to abort the youngest un-committed ancestor of the observer .

• If the mutator and all invalidated ob -servers come from the same top-leve ltransaction as the mutator, i .e ., theyhave a common ancestor, then the ob -servers must be aborted regardless ofwhether any of them has internally com -mitted, since these observers must hav e

73

C Uncommitted mutator

Committed mutator

Set of candidate state s

I _1n_10 Ii L-102.1.20.4LI 1 -50,-35,-30,-5110 1

set (10)

sub (20)

add (15)

add (20)

sub (50)

Figure 3 : Examples of candidate states

been ordered after the mutator withinthe top-level transaction and some ofthem might logically depend on the res-ult of the mutator . None of the observ-ers in this case could have committed tothe top-level, since there is at least oneoperation, the mutator still uncommit-ted in this top-level transaction .

4. Commit order constraint . Subtransactions unde rthe same parent must commit in the order of thei rprecedence number . That is, a subtransaction cancommit only when all subtransactions under thesame root with smaller precedence numbers ar ecommitted . A top-level transaction can commi twhen all its children have internally committed .

5. Processing commits and aborts. When a(sub)transaction aborts, the operations issue dfrom this (sub)transaction are simply deletedfrom the request list . If a deleted operation is amutator, conceptually the immediate epochs be -fore and after the mutator will be merged int oone. The sets of candidate states associated wit hthe epochs after the aborted mutator may changeas a result . When a (sub)transaction commits ,the candidate states of the epochs after its mutat-ors may also need be updated . In both cases, thecardinalities of the changed candidate state set sare likely to become smaller .

The model above is interesting in two ways . First ,the serialization protocols for top-level transaction sand for subtransactions are decoupled . Timestamp sare different from precedence numbers in that whe na top-level transaction is aborted and retried it has anew timestamp, whereas if only an operation withina transaction is retried, its precedence number is stil lthe same. This subtle difference allow us to exploi tintra-transaction concurrency with minimal overhead .Potentially other serialization protocol can be usedfor top-level transaction or subtransactions to explor eother possibilities . Second, the issuing and commit-

ment of subtransactions are decoupled . In the abovediscussion, we made no assumptions about the waywe issue the subtransactions . In principle, all of thesubtransactions under the same parent can be issuedand executed concurrently, as long as we force thei rcommit order to conform to the serialization orderin grequirement .

4.3 Commit and Abort Issues

When a subtransaction is aborted, its parent coul deither proceed without the child, re-issue and retr ythe child, or abort itself. In our method, the con -currency control algorithm will initiate an abort onlywhen a mutator newly arrived at a data object invalid-ates some existing observers in the request list . In thiscase, either the offending mutator or the invalidatedobservers may be aborted .

If the offending mutator Mi is aborted, there is nopoint for its parent Pj to retry it again, because Miwill come with the same time-stamp and the sam eprecedence number, be inserted into the same place ,and invalidate the same and newer observers . If Pj isprogrammed in such a way that it keeps retrying Mi ,eventually Pi 's parent will see it as time-out and abortit . Therefore, when an offending mutator is aborted ,we abort it with the hint N0 REINSTANTIATE . The par-ent Pi could use this hint to make an appropriate de-cision . If this mutator is essential for the whole nestedtransaction, the nested transaction will have to abor titself eventually and a new nested transaction may b estarted with a new timestamp .

An observer to be aborted could be either totall yuncommitted or internally committed but not extern -ally committed . Suppose an uncommitted observerO; is aborted . Since new nested transactions willbe assigned new timestamps, and old transaction swill keep completing, the number of transactions wit htimestamps smaller than Oi will decrease with time .Eventually there will be no mutator that could pos-sibly invalidate Oi. Therefore, an uncommitted ob-server is aborted with the hint REINSTANTIATE to in-

74

dicate that if Oi is retried, it will have a better chanc eto succeed. On the other hand, it is dis-advantageousfor 0i 's parent to abort itself in this case . Aborting0 i ' s parent would waste system resources and intro -duce the possibility of infinite retires if the abortionleads to the abortion of the whole nested transaction .When a new nested transaction is instantiated with alarger timestamp, it faces an increased risk of beinginvalidated by mutators in the systems .

When an internally committed observer O i mustbe aborted, we cannot simply abort the observer itself ,since the (Vs parent have already see it as committed .Instead, we must abort the youngest ancestor A, o f0 i which has not committed internally . A2 has notpromised its parent anything yet .

Re-instantiating A j in this case can be either useles sor helpful . If many of A3 's descendants are mutators ,retrying Aj may be useless, because these mutator smay be invoked with different parameters dependin gon the new result of 0 i , and thus invalidate other ob-servers in the system . On the other hand, if nearl yall descendants of Al are observers, then the retry i svery likely to succeed . Heuristics can be devised to re -turn the right hint to the program to improve systemperformance .

When a newly arrived mutator invalidates existin gobservers, and none of the invalidated observers havecommitted to the top level, we use ChooseAbort to de-cide whether to abort the observers or the mutator, asdescribed before . The idea is to choose whichever sid ewill result in less overhead . If we abort the mutator ,it is very likely the whole top-level transaction will b eaborted and everything done so far would be wasted .If the observers are aborted, we also need to abor tall subtransactions that depend directly or indirectlyon the result of these observers . There are ways t oassociate the costs or predicted costs of algorithm-initiated abort with each mutator and observer, andChooseAbort may decide which side to abort by com-paring these costs of the observer and the mutator .On the other hand, we can of course always choose t oabort the mutator or the observer, and avoid overheadin ChooseAbort .

5 Details of the Algorithm

The primary costs characteristic to this algorith mcan be divided into three main categories : (1) the cos tof deciding what a newly arrived observer will see, (2 )the cost of deciding whether a newly arrived mutatorwill invalidate some existing observers, and (3) the

cost of retries . In this section, we discuss the detailsin our algorithm that minimize these costs .

5 .1 Classification of Operators

In order to use operator characteristics to improv eperformance, we further classify operations as follows :

• Relative mutators and absolute mutators : A rel-ative mutator changes the data object to a newstate relative to the original object state . Addingan amount to an account object and withdrawingan amount from an account object are examplesof relative mutators . Absolute mutators chang ethe states of data objects without referencing th eoriginal states . That is, the new states depend son the parameters of these mutators only . Ex-amples are : resetting an integer to zero or settin gan integer to 100 .

• Value observer and criterion observer. A valueobserver reports the value or part of the valu eof the data object . Criterion observers ask yes-or-no questions and return boolean values as an-swers . Examples are : asking whether an accountis greater than zero or less than a certain amount .

5 .2 Handling Observers

To simplify the discussion below, we assume all dat aobjects involved are scalar objects . We have extendedthese methods to composite objects .

Value Observers

Suppose O„ is a newly arrived value observer and 0„will be inserted into epoch (M2, M3 ) if accepted . 0„is handled according to the following principle :

Let Ml be the last visibly committed abso-lute mutator with respect to 0,, . If all mutat-ors between Ml and M2 including M2 are vis-ibly committed with respect to the 0,,, thenthe observer can observe a consistent resul tand is inserted into the request list . Other-wise 0,, is blocked and retried .

In an implementation, the above condition can b echecked by a backward linear scan of the request lis tfrom M2 . The scan stops at the first uncommittedmutator, in which case 0,, fail . If no uncommittedmutator has been encounter and a visibly committe dabsolute mutator is met, au is successful .

75

Criterion Observer s

The handling of criterion observers can be optimize dsignificantly. For each kind of criterion observers wedefine a worst possible value and a best possible valu e(BP and WP values hereafter) . Each epoch has a BPand a WP value, each of which is en element of the se tof candidate states . The question of whether a newlyarrived criterion observer can succeed and the inform-ation the observer need can be answered by consultin gthe BP and WP values of the epoch enclosing the ob-server .

For example, Figure 4 shows a data object export-ing the observer IsGreaterThanZero . For each epoch ,we keep WP value pmin, which is the smallest possiblevalue for all candidate states at that epoch, and a B Pvalue pmax, which is the greatest possible value . Whenan instance of IsGreaterThanZero arrives at the dat aobject, we know it must return YES if both pmin andpmax are greater than zero . No matter which activemutators are finally committed or aborted, the candid-ate states in the epoch would always give the answe rYES . If both pmin and pmax are less than zero, the ob-server must return NO. If pmin is less than zero whil epmax is greater than zero, we infer that the observe robserves inconsistent results, and should be blocke dand retried .

To summarize, we define a BP value and a W Pvalue for each criterion observer type. When an in-stance of that observer type arrives at the data ob-ject, if both BP and WP values satisfy the criterion ,then the observer succeeds with answer YES . If bothBP and WP values don't satisfy the criterion, the ob-server succeeds with the answer NO. When BP satisfiesthe criterion and WP value doesn ' t, the observer couldobserve inconsistent results, and therefore should b eblocked and retried .

Denote the BP value at a epoch h as BPh , the nthmutator in the request list as M,,, and the result of ex-ecuting M„ on a possible value BP," _ 1 as M,,(BPr,_I) .Then BP," is obtained in the following way :

1. BPo is the initial state of the data object .

2. If M„ is externally committed, then BP,, =On(BPr

3. If Mn is not externally committed yet, then com-pare the value BPn_ 1 and On(BPn_ 1 ), whicheveris better is chosen as BPn.

The WP values can be obtained in a similar fashion ,except that we choose whatever is worse when the op-eration in consideration is an uncommitted mutator .

Figure 4 gives examples of calculating BP and WPvalues .

When a new mutator arrives at a data object, i tmay be serialized at the end or in the middle of th erequest list . In either case, the structure of the reques tlist will change . If the mutator is appended to the en dof the request list, only the BP and WP values in thenew epoch need be calculated . If it is inserted in th emiddle of the request list, all of the BP and WP value sassociated with the epochs positioned after the newmutator will change . Similarly, when an operatio ncommits or aborts, the BP and WP values associatedwith the epochs after the operation also change .

The BP and WP values can be shared among differ-ent types of criterion observers . For example, if thereis another criterion observer IsGreaterThanFifty ,

the BP and WP values will be the same as those ofIsGreaterThanZero. If there is a criterion observerIslessThanZero, then pmin becomes the BP valu eand pmax becomes the WP value .

Value observers can also be handled using BP an dWP. A value observer derives a consistent result froman epoch if and only if the set of candidate states o fthe epoch is a singleton set, in which case the B Pand WP of the epoch would be the same, and thei rvalue is the answer to the values observer . In actualimplementation, if we can define BP and WP valuesfor all criterion observers, we need not maintain theset of candidate states at all .

The key issue in this discussion is : what is the defin-ition of better or worse? If the criterion observer isbased on numerical number comparison such as theone in the above example, the definition of better orworse can easily be given without ambiguity . However ,we are unable to exhaust all categories of criterion ob-servers that may be used, and are unable to give aformal definition of being better or worse . Our con-tention is that in most practical cases, there are a nunambiguous ways of obtaining the BP and WP val-ues. If a clear definition of BP and WP values is im-possible, we can simply reclassify the observer as avalue observer .

Compensation for the Visibility of InternalCommit

One might notice that we did not take into account thevisibility of internal commit in the above calculatio nof the BP and WP . That is, we treat as uncommitte dall mutators has not committed to the top level . Theresult is that we are overly pessimistic .

Consider the example in Figure 5 . The observe rin question is IsGreaterThanZero . Let add(15), de-

76

0 Uncommitted mutator

committed mutato r

IsGtThanZero= YES

IsGtThanZero

IsGtThanZero

IsGtThanZero= ??

_ ??

= YESIsGtThanZer o

=NO

BP=10 BP= 10 BP= 25 BP = 45 BP = -5WP=10 WP=-10 WP=-10 WP 10 WP=-40

set (10)

sub (20)

add (15)

add (20)

sub (50)

Figure 4 : Maintaining best and worst possible values for IsGreaterThanZero

noted as M3, be the mutator that divide epoch 2 an d3. The BP and WP values in the epoch 2, i .e ., BP2and WP2 , are 10 and -10, respectively. Accordingto the rules described above, BP3 and WP3 are 25

and -10 . Therefore, a newly arrived observer O, seri-alized into epoch 3 would not be able to derive anyconclusion and would be blocked . However, if the newobservers is from the same top-level transaction as M3and M3 is visibly committed with respect to O,, the nthe worse possible value O, can see is 5 and O, shouldin fact return the answer YES .

We need to compensate the effects of internal com-mit :

When a new observer arrives, if it canno tsucceed with current BP and WP values, wecheck to see if there are mutators whose in-ternal commits are visible to the observer . Ifsuch mutators exist, we calculate a tempor-ary set of BP and WP values with the mutat-ors regarded as committed . If both WP an dBP values in the temporary set satisfy or fai lthe criterion, the observer succeeds . Other-wise, the observer is blocked .

We choose not to consider internal commits whe nderiving BP and WP values but compensate for themlater, because considering internal commits when de -riving BP and WP values will result in one set o fBP and WP values for operations from each differen ttop-level transaction . The cases in which compensa-tion for internal commit is necessary are expected t obe few. We need to compensate for internal commi tonly when an observer and a mutator from the sam etop-level transaction access the same data object, andthe observer is serialize before the mutator in the top -level transaction . In other word, we use the simplerconcept of one set of BP and WP values and com-pensate for internal commit when necessary so tha tnormal processing will not be burdened by infrequen tspecial cases .

Such compensation is not necessarily expensive . Ifthe mutator is a commutative operation, the tempor-ary set of BP and WP values can be obtained by ap-plying the mutator on one of the old BP or WP values .

5.3 Handling Mutators

When a mutator M arrives at a data object, weneed decide whether M will invalidate any existin gobserver . The following test answers this question fo rvalue observers :

If there is a value observer O„ serialized afte rM and there is no externally committed ab-solute mutator serialized between them, the nO„ is invalidated by M. Otherwise no valu eobservers are invalidated .

To decide if a new mutator M invalidates any exist-ing criterion observer, we need to calculate new W Pand BP values for each epoch that is located afte rM and contains criterion observers . Then we conductthe following test on criterion observers located withinepochs whose BP and WP values have changed :

Is there is a criterion observer which returne dYES but now fails to satisfy the criterion ?Is there is a criterion observer which returne dNO but now satisfies the criterion ?

If the new mutator invalidate some existing cri-terion observers, it is processed as described in sec-tion 4. Otherwise, no observers are invalidated an dthe candidate states in each epoch located after th enew mutators should be updated .

Note that we were not concerned about the visib-ility problem of internal commits when handling th enew mutators . A mutator cannot arrive already in-ternally committed .

We also did not consider the possibility that a pre-viously blocked observer becomes able to observe aconsistent result because of the newly arrived mutator .

7 7

0 Uncommitted mutator ® Internally commited mutator

Externally committed mutator

YES, if add(15) is visibly committe dIeOreaterThanZero

NO, if add(15) is not visibly committed

Epoch 1

Epoch 2 Epoch 3

Epoch 4

Epoch 5

BP=10 BP= 10 BP = 25 BP=45 BP=-5WP=10 WP=-10 WP = -10 WP=10 WP=-65

set (10)

sub (20)

add (15)

add (20) .

sub (50 )

Figure 5 : Compensation for visibility of internal commi t

Inserting an uncommitted mutator can only make th erequest list more uncertain and reduce the chance thata consistent state can be observed .

5.4 The Handling of Retries

Another cost in our protocols is that cost of re-tries . This cost also exists in the history mechanismfor single-level transactions . In the original model ,when a new observer derives inconsistent results fro mdifferent possible serialized sequences, the observer i srejected and will be retried later by the issuing trans -action . The delay time before the next try is decide dby the transaction . The transaction may also chooseto abort the observer after some number of retries .

Retry is a serious problem for the original historymechanism, as stated in [7] . First, since retries are ini-tiated by the issuing transactions, there are additionalcommunication costs between the transaction softwar eand the data object manager, both when an observeris rejected by the data object and when it is retrie dby the transaction later . The cost is even higher ifthey reside in different sites . Second, the transactionretries by guessing a suitable delay . It is possible thatthe operation still cannot proceed on the next try, o rthat it could have been successfully tried earlier .

For these reasons, we choose to block the operationsat the data object sites and have the objects initiat ethe retries. A data object knows more about wha thappened inside itself and what is the best time t oretry . Communication costs will be saved, and trans -action code will be easier to write . The transactionmay choose to abort the observer if it does not retur nafter some period of time .

Retrying Value Observer s

Ideally, a blocked observer Ob should be retried whe nthe request list of the data object has changed in suc ha way that O b can now observe a consistent result .However, such condition is expensive to detect .

Our solution is to avoid testing for this condition ,and have the data object choose a time when it i shighly possible that the retry will succeed .

The following is a sufficient condition for the successof observers :

Suppose the blocked observer Ob is serializedin epoch (M2 , W . Let M1 be the last loggedabsolute mutator . O b will be able to observ ea consistent result if that M1 becomes visiblycommitted and all mutators between M1 andM2 in the request list including M2 becomeeither visibly committed or aborted .

If M1 is aborted instead of being visibly committed ,by definition, the last logged absolute mutator willbecome the M1 in the test .

The condition described above depends on a se-quence of consecutive mutators rather than a set ofpossible serialized sequences as in single-level historymechanism, and is therefore easier to evaluate .

In real implementations, we can implement heur-istics for the test and pay even less price . An goodheuristic would be to retry Ob when M2 becomes vis-ibly committed or aborted. The reason is that becauseM2 is the mutator with the largest timestamp befor e0, it is highly probable that when 0 is aborted orvisibly committed all mutators between M 1 and M2would have done so, too . Other good choices includ eretrying 0 when both M1 and M2 are completed, orretrying 0 when the last two mutators before 0 ar ecompleted .

7 8

Retrying Criterion Observers

Since it is easier for criterion observers to mak econsistent observations than for value observers, th eabove condition may be too conservative for criterio nobservers . As noted earlier, when a mutator commits ,we have to update the possible values for all epochslocated after it . When updating possible values fo rthese epochs, we can also decide whether a blocke dcriterion observer can now succeed .

When an observer is blocked, it is blocke don a BP/WP type and an epoch . The ob-server is retried when the BP/WP values i nthe epoch has changed .

The test above is optimistic . In real implementa-tion, heuristic attempting fewer tries can be used . Forexample, we can use the test for value observer to-gether with a background process retrying observer sthat have blocked for too long .

5 .5 Remarks

The discussion in this section has relied on the clas-sification of operations, such as classifying observersinto value and criterion ones . One might suggest tha tthe operations on data objects do not always allo wsuch classifications, or that it is not always possibleto define the WP value and BP value for a criterio nobserver . However, the classification and the WP an dBP values only serve to optimize performance . If wefound that a clear classification or or a definition W Por BP values is not possible for an observer, we ca nsimply classify the observer as a value observer, andthe algorithm can proceed without any difficulty. Sim-ilarly, if there is any ambiguity, we can classify th emutator as a relative mutator .

The concepts discussed in these sections, such a ssets of candidate states, need not correspond to phys-ical entities . We have provided methods to simplifythe protocol at the conceptual level, but various im-plementation issues still remain to be investigated .

6 Performance Failures

We use timestamps to order top-level transaction sin our model . As timestamp ordering is free fromdeadlock[3, 9], deadlocks cannot occur across two top-level transactions. Subtransactions are assigned pre-cedence numbers which are unique within a top-leve ltransaction . For similar reason, deadlock cannot oc-cur between subtransactions within the same top-leve ltransaction .

Cascading abort, or the domino effect cannot oc-cur, either. Suppose operations A and B are bothactive and access the same data object ; B is serial-ized after A . Aborting A can affect B only when Ais a mutator and B is an observer. Since the set ofcandidate states in B's epoch has included all possibl eobject states in which A can be aborted or committed ,if B was allowed into the request list in the first place ,it means B can derive a consistent result whether Acommits or not . Aborting A would not change B's ob-servation . Therefore, not only cascading abort canno toccur, aborting a top-level transaction will not affec tother top-level transactions at all .

Infinite retry cannot occur within a nested trans -action because the structure of any transaction is as-sumed to be finite, and in conflict we always resolvein favor of the operation with a smaller timestamp .Therefore, a subtransaction cannot be preempted byother subtransactions from the same root indefinitely .

Infinite retry between the top-level transactions ,however, is a problem . It is possible that a top-leve ltransaction might be aborted and retried indefinitely ,each time failing because some new operation with alarger timestamp has show up in the data objects i taccesses . This problem is intrinsic to any timestamp-based protocol, and is not aggravated in our case .Most timestamp-based systems assume it to be neg-ligible . How serious this problem is for our metho dcan only be measured in a real implementation, an dis remained for further investigation .

7 Conclusions

In this paper, we propose a new concurrency con-trol protocol for nested transactions . Our method tryto address three design goals : to minimize executio ntimes of individual transactions, the time a transac-tion has to spend on resource waiting, and the num-ber of cases in which a transaction proceeds based onthe assumption that some other transaction will com-mit . These goals will improve both program responsetimes and system throughput . We meet these goalsby exploiting intra-transaction concurrency, and usin gthe concept of request lists, which is derived from thehistory mechanism for single-level transactions an dmerged with a timestamp serialization protocol . Itpreserve the benefits or both methods . Problems inlocking and timestamp concurrency control protocols ,such as deadlocks and cascading aborts, are avoid alto-gether . Most notably, our method decouples the seri-alization protocol for top-level transaction from tha tfor subtransaction within the root, and decouples the

79

issuing of subtransactions from the synchronization o fcommitment . That is, we need not be concerned wit hhow subtransactions are issued as long as they arecommitted in the required order .

References

[1] P . A . Bernstein, V . Hadzilacos, and N. Goodman ,Concurrency Control and Recovery in Databas eSystems, Addison Wesley, 1987 .

[2] P. A. Bernstein, D . W. Shipman, and D. W .Wong, "Aspects of serializability in database con-currency conctrol," IEEE Trans . on Software En-gineering, vol . 5, no . 3, pp . 203-216, May 1979 .

[3] W . Cellary, E . Gelenbe, and T . Morzy, "Con-currency control in distributed database systems ,chap 1-7 and chap 11-12," North-Holland, 1988 .

[4] H . T. Kung and J . T. Robinson, "On optimisticmethods for concurrency control," ACM Trans .on Database Systems, vol . 6, no . 2, pp . 213-226 ,June 1981 .

[5] B . Liskov and R . Schiefer, "Guardians and ac-tions: Linguistic support for robust, distributedprograms," Transactions on Programming Lan-guages and Systems, Vol. 5, No. 3, pp . 381-404 ,July 1983 .

[6] J . Moss, "Nested transactions: An approac hto reliable distributed computing," MIT Press ,1985 .

[7] T. Ng, "Using history to implement atomic ob-jects," ACM, TOGS, Nov 1989 .

[8] C. H . Papadimitriou, The Theory of Concurrenc yControl, Computer Science Press, Rockville, MD,1986 .

[ 9] C. Papadimitriou, "Database concurrency con-trol," Computer science press, 1990 .

[10] D . P . Reed, "Implementing atomic object on de-centralized data," ACM, Trans . on ComputerSystems, vol . 1, no. 1, pp . 3-23, Feb . 1983 .

[11] W. Weihl and B.Liskov, "Implementation of resi-lient, atomic data types," ACM Transactions o nProgramming Languages and Systems, vol . 7, no .2, pp . 244-269, April 1985 .

About the authors

Ming-Ling Lo is a Ph. D. candidate in ComputerScience at the University of Michigan . His researchinterests include parallel and spatial databases, trans -action processing, and distributed systems .

Ming-Ling received his B . S. degree in ElectricalEngineering from National Taiwan University in 1985 .He can be reached at the EECS department, Uni-versity of Michigan, 1301 Beal Avenue, Ann Arbor, M I48109-2122, or by e-mail at mingling@eecs .umich .edu .

Chinya V. Ravishankar is presently Assistan tProfessor in the Electrical Engineering and ComputerSciences Department at the University of Michigan .His research interests include distributed systems, pro-gramming language design and software tools .

Ravishankar received his B. Tech. degree inChemical Engineering form the Indian Institute o fTechnology-Bombay, in 1975, and his M .S . and Ph .D .degrees in Computer science from the University ofWisconsin-Madison in 1986 and 1987, respectively . Hecan be reached at the EECS department, 1301 BealAvenue, Ann Arbor, MI 48109-2122, or by e-mail atravi@eecs .umich .edu .

80

Documents

A Concurrency Control Protocol for Nested Transactionsravi/Papers/DBConf/nested-transactions.pdf · locking concurrency control unsuitable for nested transaction. 3. Optimistic concurrency