RF-MVTC: an efficient risk-free multiversion concurrency control algorithm

CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2004; 16:1291–1311
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.756

Azzedine Boukerche1,∗,† and Terry Tuck2

1 SITE, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5
2 Department of Computer Science, University of North Texas, Denton, TX 76203, U.S.A.

SUMMARY

In this paper, we focus on the temporary return of data values that are incorrect for the given transactional semantics, which can have catastrophic effects similar to those in parallel and discrete event simulation systems. In many applications using online transaction processing environments, for instance, it is best to delay the response to a transaction’s read request until it is known, or at least likely, that no write message from an older update transaction will make the response incorrect. Examples of such applications are those where aberrant behavior is too costly, and those in which precommitted data is visible to some reactive entity. In light of the avoidance of risk in this approach, we propose a risk-free multiversion temporally correct concurrency-control algorithm. We discuss the algorithm and its implementation, and report on the performance results of simulation models using a cluster of workstations. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: concurrency; risk-free multiversion; databases; optimistic distributed simulations; rollbacks; cluster of workstations; multiversion timestamp ordering

1. INTRODUCTION

The fundamental problem addressed in this paper is temporally correct transaction concurrency in distributed database systems. In the context of this paper, however, the meaning of ‘temporal order’ is slightly different from that implied by the phrase ‘time stamp order,’ as is found in the literature [1–3]. The main characteristics of temporal order are as follows:

(i) the position within the temporal order of a read-only transaction is determined solely by the global system time when the transaction is first initiated;

∗Correspondence to: Azzedine Boukerche, SITE, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5.
†E-mail: [email protected]

Contract/grant sponsor: Texas Advanced Research Program

Received June 2002
Revised November 2002
Accepted December 2002

(ii) the position within the temporal order of an update transaction is determined by a combination of the global system time when the transaction is first initiated and any network delays on initiation messages that are sent on behalf of the transaction to local databases that it may update; and

(iii) once established, a transaction’s position in the temporal order of transactions never changes.

This paper addresses this problem in light of the characteristics above. We present a transaction concurrency-control algorithm that is designed to execute transactions in a high-performance manner while producing conflict-serializable schedules that preserve temporal dependencies that coexist with data dependencies. The algorithm, which we refer to as a risk-free multiversion temporally correct (RF-MVTC) concurrency-control algorithm, avoids all risk associated with responses to read requests: returned values are guaranteed to have been written only by committed transactions that immediately precede the reader in the temporal order in terms of data-item-level conflict. In summary, risk is avoided by employing a ‘paranoid’ approach with respect to delayed begins: if a declared table has been read by a younger transaction, regardless of the commit state, the begin is rejected and returned with a supportable timestamp. This approach simplifies the concurrency-control algorithm, thereby increasing potential performance.

2. PREVIOUS AND RELATED WORK

More practical concurrency-control methods such as conservative multiversion timestamp ordering (C-MVTO) [1] and chrono-scheduler [4] have been proposed that execute transactions and their constituent operations exclusively in chronological order. These methods are probably among the best for applications with database transactions having temporal dependencies to the extent that out-of-order executions may be incorrect, even when transactions access no common data items. Although these two methods use fundamentally different approaches, their high-level behavior disallows the execution of a younger‡ transaction until all older transactions are completed. The lack of data dependencies aside, temporal dependencies are strictly preserved.

There are many similarities between concurrency control protocols and synchronization protocols in parallel and discrete event simulation (PDES), where causal events must also be executed before their resultant events [5]. In a simulation, events must always be executed in increasing time order. Anomalous behavior might then result if an event is incorrectly simulated earlier in real time and affects state variables used by subsequent events. To speed up the execution of simulation applications, distributed simulation techniques have been proposed [6–8]. These techniques can be classified into two groups: conservative algorithms and optimistic algorithms. While conservative synchronization techniques rely on blocking to avoid violation of dependence constraints [7,8], optimistic methods rely on detecting synchronization errors at run-time and on recovery using a rollback mechanism [7,9,10]. In the optimistic approach, a rollback takes place whenever an event-message arrives ‘in the past’ (a straggler); it consists of restoring the (logical) process to the appropriate state and sending cancellation notices, using antimessages, for messages produced by the rolled-back portion of the computation [7].

‡The younger transaction has a timestamp that is greater than the older transaction’s timestamp, similar to the situation with people and their birthdays.

In database applications, a counterpart event to the sending of an incorrect message is the return of an incorrect data-item value in response to a transaction’s read request. This can be a common occurrence when the transaction concurrency-control method employs an abort and restart mechanism for recovery. For example, in database systems using timestamp ordering as the basis for serialization, such an event occurs whenever a data-item value is returned to the reader before the database receives or processes a write message from an older transaction. In this case, serializability is possible only if one of the two transactions is aborted. If the reader is aborted, then a portion of its execution was performed with a semantically incorrect data-item value.

In the paper ‘The Dark Side of Risk’ [11], the authors identified potential problems in parallel and distributed simulation that can arise when events are allowed to execute under incorrect conditions. They also describe problems related to the sending of incorrect messages. When such problems occur, the impact ranges from, at best, a simulation crash to, at worst, erroneous results without a detectable symptom. The problems are made possible by the simulation coder’s inability to address every unforeseeable exception. The issues raised in [11] motivated us to consider that the temporary return of data values that are incorrect for the given transactional semantics could have catastrophic effects similar to those in PDES. Thus, in many applications using online transaction processing (OLTP) environments, it is best to delay the response to a transaction’s read request until it is known, or at least likely, that no write message from an older update transaction will make the response incorrect. Examples of such applications are those where aberrant behavior is too costly and those in which precommitted data is visible to some reactive entity.

With the avoidance of risk as a motivating factor, a further constraint for the concept of temporal correctness is worthy of investigation. In its strictest form, risk-free temporal correctness implies that semantically incorrect data-item values are never returned to a transaction. The two forms of temporal correctness give rise to two fundamental strategies for providing supportive concurrency-control algorithms. For the relaxed form of temporal correctness, an aggressive approach can be used. When following this approach, if it is detected that semantically incorrect data has been returned to a transaction, the recovery action is to abort and restart the affected transaction in a manner that is temporally correct as originally defined. Aggressive approaches are generally regarded as being superior in performance with respect to the general goal of higher transaction throughput, as long as issues such as cascading aborts and transaction starvation are addressed appropriately. Towards this end, we propose the stricter form of temporal correctness, which necessitates the use of a different approach. As no portion of a transaction’s processing is to be performed with a semantically incorrect data-item value, we must disallow actions that either return an incorrect data-item value or cause a previously returned data-item value to become incorrect. In light of the avoidance of risk with this approach, we propose the RF-MVTC concurrency-control algorithm.

2.1. Database model

We view a database as a persistent store for a collection of named data items partitioned into disjoint sets that we refer to as tables. (Note: although the term ‘table’ implies the relational data model, the algorithm does not appear to be restricted to this application.) A distributed database system (DDBS) is viewed as a collection of logically interrelated databases distributed among a group of computer nodes that are interconnected via an assumed reliable network. In our view of the DDBS, we consider each data item to be tied to a particular database; that is, the DDBS is restricted to the non-replicated case. In our DDBS model, a single transaction manager (TM) manages each transaction in a dedicated/centralized fashion, and multiple TMs operate independently of each other. The TM executes each logical database operation in client/server style by dispatching the corresponding message to the target database within the DDBS. A concurrency-control algorithm governs the various synchronization mechanisms employed by the DDBS to control the execution order of database operations [1,4]. This execution order is called a schedule, and it is, in general, correct if it is equivalent to some schedule in which the transactions are executed serially (i.e. the schedule is serializable).

In our model, the execution of write operations requires support for multiple versions of each data item. Reducing semantic incorrectness associated with the return of an incorrect data-item version suggests a non-aggressive, if not conservative, synchronization technique. The requirement for an execution schedule that is equivalent to a timestamp-ordered serial schedule necessitates the use of timestamp ordering. Finally, improved concurrency combined with a non-aggressive synchronization technique requires the predeclaration of the data items to be accessed. Combining this with the requirement for ease of use for the application developer requires that predeclaration be done at other than the data-item level [12,13].

3. DESCRIPTION OF RISK-FREE MVTC

The correctness constraint for all schedules produced by the RF-MVTC concurrency-control algorithm is that they are conflict serializable and computationally equivalent to a temporally ordered serial schedule for the same set of transactions. As its name implies, the operation of the RF-MVTC algorithm is further constrained to be risk-free with respect to the semantic correctness of all responses to read messages. Basically, the risk associated with the temporary return of incorrect values followed by an abort is avoided.

It has been shown that the problem of concurrency control based on conflict serializability is decomposable into subproblems of synchronizing operations involved in the two types of conflict, read–write and write–write [1,14]. These two subproblems can be solved independently as long as their solutions are combined in a manner that yields an overall transaction ordering. Decomposing the overall problem into write–write and read–write synchronization problems [12,13,15] also simplifies the presentation of our RF-MVTC algorithm. The design constraint for write operations is that they are executed immediately upon receipt by a local database. This requires support for coexisting versions of each data item, that is, a multiversion DDBS. With a multiversion database, write–write synchronization is trivial; each version of a data item is made unique with the globally unique timestamp of its creating transaction. This uniqueness effectively eliminates conflict between any two writes targeting the same data item, thereby making the order of their execution inconsequential.

Let W-ts(x_k) represent the creation timestamp of version k of data item x. Also, let ts(T_i) represent the timestamp of transaction T_i. The following four rules define the conditions for read–write synchronization.

3.1. Write rule

Upon receipt of write_i(x, y) from T_i, create a new data-item version x_k with value y and timestamp W-ts(x_k) set to ts(T_i).


3.2. Read rule

Upon receipt of read_i(x) from T_i, return the value of data-item version x_j such that W-ts(x_j) is the largest timestamp satisfying W-ts(x_j) < ts(T_i).
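As an illustration only, the Write and Read rules can be sketched as a small multiversion store for a single data item. The class name `VersionedItem`, the use of plain numbers as timestamps, and the convention of returning `None` when no older version exists are assumptions of this sketch, not part of the algorithm as published.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class VersionedItem:
    """All versions of one data item, kept sorted by creation timestamp."""
    timestamps: list = field(default_factory=list)  # W-ts of each version
    values: list = field(default_factory=list)      # value of each version

    def write(self, ts, value):
        # Write rule: create a new version stamped with the writer's timestamp.
        # Timestamps are assumed globally unique, so writes never conflict.
        pos = bisect.bisect_left(self.timestamps, ts)
        self.timestamps.insert(pos, ts)
        self.values.insert(pos, value)

    def read(self, ts):
        # Read rule: pick the version with the largest W-ts strictly below ts.
        pos = bisect.bisect_left(self.timestamps, ts)
        if pos == 0:
            return None  # no older version exists (convention of this sketch)
        return self.values[pos - 1]
```

Because each version is keyed by a unique timestamp, writes may arrive in any order and the reader still sees the version that immediately precedes it in timestamp order.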

The Write and Read rules are the same as those for MVTC. The Read rule, when combined with the design constraint of being risk-free (i.e. avoiding semantic incorrectness associated with the return of an incorrect version for the targeted data item), requires that a database delay its response to a read request until the correct version is both available and committed. This gives rise to the following Delay rule.

3.3. Delay rule

A read_i(x) from T_i will not be processed while either (i) an older, uncommitted transaction has declared for update the table containing x, or (ii) the creating transaction for the accessed data-item version (per the Read rule) is still uncommitted.

The Delay rule ensures that every read request is answered with the correct and committed data-item value. When combined with the Read rule, the Delay rule allows databases to respond to read requests with only data values that have been written by committed transactions that immediately precede the reader in the temporal order in terms of data-item-level conflict. When combined with the Reorder rule (see Section 3.4), the second part of the Delay rule is similar to the realistic recovery described in [16], and offers the advantage that all responses are risk-free and no transaction will ever need to be restarted because of updates from a late-arriving writing transaction.
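The two conditions of the Delay rule can be written as a single predicate. The following is a minimal sketch; the function name `must_delay` and its argument layout are hypothetical and chosen only for illustration.

```python
def must_delay(reader_ts, active_writer_ts, version_committed):
    """Return True if the Delay rule blocks a read.

    reader_ts         -- timestamp of the reading transaction
    active_writer_ts  -- timestamps of uncommitted transactions that declared
                         the table containing the item for update
    version_committed -- commit state of the version chosen by the Read rule
    """
    # Condition (i): an older, uncommitted writer has declared the table.
    older_writer_pending = any(w < reader_ts for w in active_writer_ts)
    # Condition (ii): the version selected by the Read rule is uncommitted.
    return older_writer_pending or not version_committed
```

Note that a younger active writer never delays the reader; only writers that precede it in the temporal order can affect the value it should see.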

The Delay rule is an effective read–write synchronization mechanism as long as the writing transaction’s begin message is received at the database prior to its processing of a conflicting read from a younger transaction. However, it is possible for the writer’s begin message to be affected by network delays (or similar) to the extent that it is received at the local database after the servicing of a conflicting read. In such cases, it is necessary to reposition the writer in the temporal order of transactions to the earliest position that maintains that the writer is younger than all serviced readers of the declared tables.

3.4. Reorder rule

The timestamp ts(T_t) of transaction T_t will be reset if it declares for update any table that has been read by a younger transaction T_{t+d}. The new timestamp will be set to a value such that the reordered transaction T′_t is made to be younger than all serviced readers on any of its declared tables.

Whenever a transaction’s begin message arrives at a local database that some younger reader has already accessed, there is a possibility that allowing the transaction to proceed will lead to a non-serializable schedule. This possibility exists only when the reader has read from one of the tables included in the table list of the delinquent begin. True data-item-level conflict cannot be verified without allowing the delinquent transaction to proceed. Doing so would allow the transaction to access data-item values that may become semantically incorrect, should it ultimately have to be restarted due to data-item-level conflict with a previously serviced read from a younger transaction. Therefore, in order to provide risk-free concurrency, our method allows the transaction’s timestamp to be reset (via the Reorder rule) to the earliest possible time at which the transaction can be executed in a conflict-serializable fashion at all participating databases. This occurs by the TM’s adoption of the maximal timestamp returned by all hosts of declared tables, followed by a re-registration phase.
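The TM side of the Reorder rule (adopt the maximal returned timestamp, then re-register) might look as follows. This is a sketch under stated assumptions: `send_begin` is a hypothetical callable standing in for the begin message exchange, assumed to return the timestamp that the site now holds for the transaction (the original value if accepted, a larger value if the begin was too late), and the sites are contacted sequentially here rather than in parallel.

```python
def begin_with_reorder(ts, sites, send_begin):
    """Register an update transaction at all sites hosting declared tables.

    send_begin(site, ts) is a hypothetical stand-in for the begin message;
    it returns the timestamp the site holds for the transaction afterwards.
    Returns the timestamp finally adopted by the TM.
    """
    held = {site: send_begin(site, ts) for site in sites}
    new_ts = max(held.values(), default=ts)
    if new_ts != ts:
        # Re-registration phase: only sites holding a non-maximal
        # timestamp need to be told about the adopted value.
        for site, site_ts in held.items():
            if site_ts != new_ts:
                send_begin(site, new_ts)
    return new_ts
```

For example, if three sites answer a begin stamped 100 with 100, 121 and 106, the TM adopts 121 and resends the begin only to the first and third sites.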

4. TRANSACTION OPERATIONS

In this section, we describe each of the operations within a transaction and highlight the main differences between MVTC and RF-MVTC.

4.1. The begin operation

The start of a transaction specification is delimited with a begin statement. If the transaction includes possible database writes, the begin statement will include the specification (i.e. predeclaration) of all tables that may possibly be updated. In the case of read-only transactions, the begin statement is simply a delimiter and triggers the TM to reset itself for a new transaction. Whenever a begin statement is processed at the TM, a timestamp is created and initialized to the system time. This timestamp is globally unique, and associated with the transaction and all of its constituent operations. With the transaction ID, timestamp and table list information as arguments, a begin message is sent by the TM to each local database hosting at least one of the declared tables; no begin messages are needed with read-only transactions.

Upon receipt of the begin message, the local database registers the transaction in its active transaction list. If there is a record within the database of a younger transaction, then the begin message has arrived out of order and the transaction is marked (locally) as a straggler [9,10,17]. The transaction is then added to the active writer list associated with each of the declared tables.

If the transaction is a non-straggler, the local database replies to the begin message with an acknowledgment, and the handling of the begin is completed at the database. (The begin processing to this point is identical to that with ‘simple’ MVTC, and is included for readability; the remainder is unique to RF-MVTC.) If the transaction is marked as a straggler, during the update of each table’s active writer list, the table is checked for record of a read from a younger transaction. If one exists, the Reorder rule is applicable, and the straggler’s timestamp within the local database is updated with the sum of the youngest reader’s timestamp and an offset, so that the straggling writer is made to be younger than the youngest reader. This will reflect the earliest temporal position at which it can be safely executed within the local database. As a response to the begin message, a ‘too late’ exception is sent back to the associated TM along with the value of the adjusted timestamp. The TM handles the exception by resetting its timestamp to the maximum of all such returned timestamps (from local databases hosting tables to be updated), and then resending the begin with the updated timestamp to all local databases with a non-maximal timestamp value.
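The database-side handling of a begin message described above can be sketched as follows. The names `Table`, `handle_begin` and `OFFSET` are hypothetical, the offset value of 1 is an arbitrary choice for illustration, and the straggler pre-check is folded into the per-table comparison rather than modelled as a separate step.

```python
from dataclasses import dataclass, field

OFFSET = 1  # hypothetical increment that makes the writer younger than a reader


@dataclass
class Table:
    active_writers: dict = field(default_factory=dict)  # txn id -> timestamp
    youngest_reader_ts: float = float("-inf")


def handle_begin(tables, declared, txn_id, ts):
    """Process a begin message; return ("ok", ts) or ("too late", adjusted_ts)."""
    newtime = ts
    for name in declared:
        # A younger reader has already been serviced if its timestamp exceeds ts.
        newtime = max(newtime, tables[name].youngest_reader_ts)
    rejected = newtime > ts
    local_ts = newtime + OFFSET if rejected else ts
    for name in declared:
        # Register the writer so that younger readers block from now on.
        tables[name].active_writers[txn_id] = local_ts
    return ("too late", local_ts) if rejected else ("ok", ts)
```

In this sketch a rejected writer stays registered under its adjusted local timestamp, which matches the description of the local timestamp being updated before the exception is returned to the TM.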

4.2. The close operation

As mentioned in Section 4.1, the begin statement includes a declaration of all tables that may possibly be written to by the transaction. In some situations, it may be determined significantly prior to either a commit or rollback that a declared table will not be updated§. The close statement is provided for these situations. Databases relay all close messages to the scheduler upon receipt. The scheduler updates the table specified in a close message by deleting the sending transaction from the table’s associated active writer list. If the sending transaction is the table’s oldest writer at the time, the scheduler then signals all reader transactions blocked on the table to reevaluate the conditions for their blockage. (Blocked readers are freed when no older writers remain in the table’s active writer list.)

4.3. The commit operation

The commit statement ends a transaction and causes data-item updates to be made permanent. The TM maintains a list of all local databases accessed on behalf of the transaction and sends a commit message to each. At each local database, if data has been written by the committing transaction, then other transactions may be waiting for it to commit or rollback. Consequently, the commit processing is handled first at the table level by the scheduler, and then at the data-item level by the data manager (DM). For each table declared by the transaction, the associated active writer list is updated by removing the committing transaction. In cases where the committing transaction is the oldest writer in the list prior to its removal, a signal to resume is sent to all other younger readers waiting on the writer to complete within the table. After the table-level issues are addressed, the DM receives control and commits each data item written by the transaction. All readers blocked on a data item are signaled to resume when the item is committed.
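The two-level commit processing at a local database (table level first, data-item level second) can be illustrated with the sketch below. The data layout and the `wake` callback are assumptions made for this example; a real implementation would signal blocked reader threads instead of recording events.

```python
def commit_at_database(txn_id, active_writers, versions, wake):
    """Commit txn_id locally: table level first, then data-item level.

    active_writers -- {table: {txn_id: timestamp}} (active writer lists)
    versions       -- list of dicts with keys "writer", "item", "committed"
    wake           -- hypothetical callback that signals blocked readers
    """
    # Scheduler: remove the writer from each declared table's writer list.
    for table, writers in active_writers.items():
        if txn_id not in writers:
            continue
        was_oldest = writers[txn_id] == min(writers.values())
        del writers[txn_id]
        if was_oldest:
            wake(("table", table))  # readers re-check their block condition
    # Data manager: mark every version written by the transaction as committed.
    for version in versions:
        if version["writer"] == txn_id and not version["committed"]:
            version["committed"] = True
            wake(("item", version["item"]))
```

Only the removal of a table's oldest writer triggers a table-level signal, since a reader blocked on that table may still have other older writers ahead of it otherwise.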

4.4. The write operation

The write statement allows the transaction to post an update to a data item in a database. The processing at the target database is simple: the DM responds to the write message by creating a new version of the data item with the specified value and the timestamp of the transaction. Unlike MVTC, the DM need not check for transactions that must be aborted.

4.5. The read operation

The read statement returns the value of the specified data item. Via the Read rule, the database responds to a read request with the value of the data item’s latest version with timestamp less than that of the requesting transaction. This is done in a conservative fashion by blocking the reader (via the Delay rule) until all older writers are done and the targeted data-item version is committed.

Upon receipt of a read message, the database registers the transaction in its active transaction list if such has not already been done. Control is then passed to the scheduler, which checks the host table for the data item and blocks the reader if the table’s active writer list contains a transaction that is older than the reader. Each time the table’s oldest writer is dropped from the active writer list because of a commit, rollback or close on the table, it is possible that some blocked readers may no longer be younger than the table’s oldest writer. Whether blocked or not, the processing of the read is continued by the scheduler whenever no older writer is found in the table’s active writer list. The scheduler completes its role in the processing of the read by updating the table’s ‘youngest reader timestamp’ variable if the reader is the youngest reader yet to access the table. Control is then passed on to the DM. The DM will access the version of the data item with maximum timestamp that is less than that of the reading transaction. The DM checks the commit status of the data-item version, and if it is uncommitted, the reader is blocked until the associated writer completes its commit. The DM returns a value in response to the read whenever the accessed data-item version is or becomes committed.

§Although the MVTC algorithm is designed for transaction environments that are largely free of long-lived update transactions, inclusion of the close operation may improve its applicability.
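Returning to the read operation of Section 4.5, the scheduler and DM steps can be sketched as one non-blocking pass that reports why a read cannot yet be answered. The function name, the status strings and the dictionary layout are assumptions of this sketch; an actual implementation would block the reader and retry on a signal.

```python
import bisect


def handle_read(table, item_versions, reader_ts):
    """One pass over the read path; returns (status, value).

    table         -- dict with "active_writers" (list of timestamps) and
                     "youngest_reader_ts"
    item_versions -- list of (w_ts, value, committed) tuples sorted by w_ts
    """
    # Scheduler: block while an older writer is active on the table.
    if any(w < reader_ts for w in table["active_writers"]):
        return ("blocked on table", None)
    # Scheduler: record the reader if it is the youngest to access the table.
    table["youngest_reader_ts"] = max(table["youngest_reader_ts"], reader_ts)
    # DM: select the version with the largest timestamp below the reader's.
    pos = bisect.bisect_left([v[0] for v in item_versions], reader_ts)
    if pos == 0:
        return ("no version", None)  # convention of this sketch
    _, value, committed = item_versions[pos - 1]
    if not committed:
        return ("blocked on item", None)
    return ("ok", value)
```

The youngest reader timestamp is updated before the DM inspects the version, following the order of steps in the description; it is this variable that a late begin message is later compared against.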

4.6. The rollback operation

The rollback operation ends a transaction and causes its pending data-item updates to be dropped. The TM maintains a list of all local databases accessed on behalf of the transaction and sends a rollback message to each.

Upon receipt of the rollback message at a local database, the effects of any data-item updates by the sending transaction are first erased by the DM, and then the transaction is dropped from the active writer lists of all declared tables by the scheduler. In cases where the transaction is the oldest writer in a list prior to its removal, a signal to resume is sent to all younger readers waiting on the writer to commit or rollback. This order of processing is the opposite of that with the commit; if it were the same, a blocked reader could be freed by the scheduler and then blocked on a data-item version that was soon to be erased by the DM.

5. PSEUDO CODE OF THE RF-MVTC ALGORITHM

The high-level pseudo code in Listing 1 illustrates how the TM and database components operate in order to provide concurrency control according to the stated rules.

6. C-MVTO

MVTO is a well-known concurrency-control mechanism for achieving distributed database synchronization. We implemented the C-MVTO protocol to serve as a basis for comparison with the results obtained with the RF-MVTC protocol. Interested readers may wish to consult [1,4,18] for a complete description of the C-MVTO algorithm.

7. SIMULATION EXPERIMENTS

In this section, we present a set of experiments we carried out to study the performance of our scheme using functional DBMS models that implement the concurrency-control algorithms, rather than simulations using behavioral models. The transaction models used in our experiments consist of two types: read-only and update. For all update transactions, writesets are subsets of the readsets. Update transactions were executed in a two-step manner, so that data-item reads were completed prior to the first write. Consequently, blind writes were avoided in the experiments. A separate program was written to generate the transactions. The input parameters for the program are number of transactions, readset size, size distribution, access pattern¶, and update probability. The program was used to generate a fixed set of transactions for each experiment set. The same set of transactions was used for all runs. Each run was repeated several times.

    Procedure: RF_MVTC_Database()
    begin
      repeat
        wait for msg
        switch (msg)
          case msg is a begin operation
            register writer in active transaction list
            newtime := transaction’s original timestamp
            foreach declared table
              update table’s writer list with writer’s transaction info
              if younger reader has already read from the table
                newtime := max(newtime, table’s youngest reader TS)
            if declaring trans was older than any table reader
              update timestamp in local trans info to (newtime + offset)
              return reject and (newtime + offset) to calling transaction
            else
              return OK to calling trans
          case msg is a read operation
            locate table for data item
            while older writer exists in table’s list of active writers
              block reader
            if reader is the youngest to read from table
              update table’s youngest reader timestamp with that of reader
            locate version of targeted data written by youngest older writer
            return data item value
          case msg is a write operation
            create new version for data item
          case msg is a commit operation
            execute first-phase of commit
            if first-phase is OK at all participating local databases
              foreach declared table
                update all written data items as committed
                remove trans from table’s active writer list
                if trans was oldest in the table’s writer list
                  signal blocked readers to re-check block condition
            else
              invoke commit recovery
          case msg is a rollback operation
            foreach declared table
              drop all written data items
              remove trans from table’s active writer list
              if trans was oldest in the writer list
                signal blocked readers to re-check block condition
          case msg is a close operation
            remove writer from active writer list on specified tables
            if trans was oldest in the writer list
              signal blocked readers to re-check block condition
      until forever
    end

    Procedure: RF_MVTC_Trans_Manager()
    begin
      repeat
        wait for msg
        switch (msg)
          case msg is a begin operation
            trans_sites := hosts for declared tables
            trans_timestamp := systemtime
            foreach trans site do in parallel
              send table declaration and trans info
            if there was a rejection
              trans_timestamp := max(rejection times)
              foreach site not registered do in parallel
                re-register with new timestamp
          case msg is a read operation
            trans_sites := union(trans_sites, site of read)
            send read msg to site of data item
            return data item value to transaction
          case msg is a write operation
            send write msg to site of data item
          case msg is a rollback operation
            foreach site in trans_sites do in parallel
              send rollback to site
          case msg is a close operation
            foreach site in table list do in parallel
              send close msg to site
          case msg is a commit operation
            invoke 2-phase commit
      until forever
    end

Listing 1.

A functional DBMS model was developed in Java for each of the RF-MVTC, MVTC and C-MVTO concurrency-control algorithms. The models implement all required functionality for the scope of concurrency-control studies. Common DBMS functionality not implemented in the models falls into the high-level areas of persistence and post-commit recovery, query processing and data abstraction. Each model includes standalone TM, database (DB), and directory components that communicate with each other via Java remote method invocation (RMI). Owing to RF-MVTC’s increased complexity, additional functionality is implemented in the DB and TM components. With respect to the DB, the most significant extension is the addition of a data-item deleter component. The data-item deleter purges all written data-item versions that remain when a transaction aborts. As this action can be performed in parallel with the transaction restart, the data-item deleter is implemented as its own thread. In comparison, the other non-MVTC-specific components of the DB are activated via Java’s RMI mechanism, and are effectively server-side extensions of the calling TM’s thread.

The TM is also extended for RF-MVTC in order to provide a mechanism for aborting transactions. This mechanism is largely provided by two components within the TM: the late write monitor and the missed write monitor. The late write monitor is responsible for retrieving the TMs of younger readers that have already read from tables declared by the transaction in execution, and then communicating all writes to those TMs. The missed write monitor is the final recipient of such communications; it is responsible for monitoring for a missed write, and triggering the abort and restart of the executing transaction when one is detected. As this activity can take place in parallel with the execution of the transaction, both of these components are implemented with separate threads.

The experiments were performed using the JavaSoft Java™ 2 Runtime Environment, Standard Edition. The hardware consisted of two types of platforms interconnected with a 10 Mb/s LAN‖. The TMs were executed on a group of Intel Pentium III (350 MHz) PCs configured with 64 MB of memory, running Microsoft Windows NT Workstation 4.0, Service Pack 5. The DB was executed on an HP 9000/N4000 server configured with four 550 MHz PA-8600 processors and 4 GB of memory, running HP-UX.

In order to assess the performance of RF-MVTC, we used the following metrics:

• Turnaround time: the elapsed time between the receipt of a transaction’s first message at the database and the completion of the transaction’s commit processing at the database.

• Throughput: number of transactions executed, divided by the elapsed time, measured in transactions per second (tps).

• Fraction of blocked transactions: the concurrency-control algorithms will block the execution of a transaction’s operation while conflict with another transaction is present. The fraction is with respect to all transactions.

¶A two-mode selector between random and sequential. For a random access pattern, data items are chosen randomly (without replacement) from the database. For a sequential pattern, a contiguous range of data items is selected from the database.

‖The LAN was connecting over 1000 PCs and over 20 servers.



• Percentage of reordered transactions: as covered in the descriptions of the proposed algorithms, an update transaction’s position in the temporal ordering of transactions is shifted whenever its begin message is too late in arriving at a database. The percentage is with respect to all update transactions.
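The four metrics above can be computed from per-transaction records, as in the following sketch. The record layout, the millisecond time unit and all names are assumptions made for illustration; they are not taken from the models used in the experiments.

```java
import java.util.List;

/** Hypothetical helper for the four metrics; field and method names are illustrative. */
class RunMetricsSketch {
    /** One completed transaction as observed at the database (assumed record layout). */
    static final class Tx {
        final long firstMessageMs; // receipt of the first message at the database
        final long commitDoneMs;   // completion of commit processing at the database
        final boolean update;      // true for update, false for read-only
        final boolean blocked;     // true if any operation was blocked on a conflict
        final boolean reordered;   // true if the temporal position was shifted

        Tx(long firstMessageMs, long commitDoneMs, boolean update, boolean blocked, boolean reordered) {
            this.firstMessageMs = firstMessageMs;
            this.commitDoneMs = commitDoneMs;
            this.update = update;
            this.blocked = blocked;
            this.reordered = reordered;
        }
    }

    /** Mean turnaround time in seconds. */
    static double meanTurnaroundSeconds(List<Tx> txs) {
        if (txs.isEmpty()) return 0.0;
        long totalMs = 0;
        for (Tx t : txs) totalMs += t.commitDoneMs - t.firstMessageMs;
        return totalMs / 1000.0 / txs.size();
    }

    /** Transactions executed divided by elapsed time, in transactions per second. */
    static double throughputTps(List<Tx> txs, long elapsedMs) {
        return elapsedMs <= 0 ? 0.0 : txs.size() * 1000.0 / elapsedMs;
    }

    /** Fraction of blocked transactions, with respect to all transactions. */
    static double fractionBlocked(List<Tx> txs) {
        if (txs.isEmpty()) return 0.0;
        int blocked = 0;
        for (Tx t : txs) if (t.blocked) blocked++;
        return (double) blocked / txs.size();
    }

    /** Percentage of reordered transactions, with respect to update transactions only. */
    static double percentReordered(List<Tx> txs) {
        int updates = 0;
        int reordered = 0;
        for (Tx t : txs) {
            if (t.update) {
                updates++;
                if (t.reordered) reordered++;
            }
        }
        return updates == 0 ? 0.0 : 100.0 * reordered / updates;
    }
}
```

Note the different denominators: the blocked fraction is taken over all transactions, whereas the reorder percentage is taken over update transactions only, as stated in the definitions.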

7.1. Experimental results

In order to study the performance of our RF-MVTC scheme, we divide our set of experiments into two parts.

(i) The goal of the first set of experiments is to study the behavior of the proposed RF-MVTC algorithm in transaction environments with reduced intertransactional conflict among update transactions∗∗. The individual performance of each transaction type (i.e. update and read-only) is measured in experimental runs with various sizes of read-only transactions. Performance is provided in terms of read-only transaction size and in separate sections for each of the transaction types; and

(ii) the goal of the second set of experiments is to characterize the performance of both read-only and update transactions as the mix of competing transaction types is varied.

7.1.1. Experimental set 1: varying the sizes of read-only transactions

In order to reduce the likelihood of conflict between update transactions, the writeset size was fixed at two data items. The data items selected for updates were chosen at random, thereby distributing the probability of update uniformly across all data items within the database. In order to increase the likelihood of conflict with the read-only transactions, the ratio of update to read-only transactions was set at 4:1. This is accomplished in the experimental runs by restricting each TM to one of the two types of transactions, and allowing the TMs to execute as many transactions as possible within a run. In all runs, 10 TMs were executed simultaneously against a single database. The readsets for the read-only transactions were selected using a sequential pattern¶ in order to produce readsets consisting of adjacent data items. This pattern was chosen in an effort to focus the accesses of each read-only transaction on the fewest tables without reading any data item more than once. Using this pattern, experiment runs were executed with mean readset sizes for the read-only transactions of 5–100 data items. As a percentage of the total database size, these readset sizes correspond to 1–20%.
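The two access patterns used to build writesets and readsets can be sketched as below. This is an illustration under stated assumptions: the class and method names are hypothetical, data items are identified by integers from 0 to the database size minus one, and the sequential range does not wrap around the end of the database. The database size of 500 data items used in the usage note is what the stated percentages imply (five data items being 1%); it is not given explicitly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the two-mode (random or sequential) data-item selector. */
class ReadsetSelectorSketch {
    /** Sequential pattern: a contiguous range of data items (assumed not to wrap around). */
    static List<Integer> sequentialPattern(int dbSize, int setSize, Random rng) {
        checkSizes(dbSize, setSize);
        int start = rng.nextInt(dbSize - setSize + 1);
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < setSize; i++) {
            ids.add(start + i);
        }
        return ids;
    }

    /** Random pattern: data items chosen uniformly without replacement. */
    static List<Integer> randomPattern(int dbSize, int setSize, Random rng) {
        checkSizes(dbSize, setSize);
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < dbSize; i++) {
            all.add(i);
        }
        Collections.shuffle(all, rng);
        return new ArrayList<>(all.subList(0, setSize));
    }

    private static void checkSizes(int dbSize, int setSize) {
        if (setSize < 0 || setSize > dbSize) {
            throw new IllegalArgumentException("set size must be between 0 and the database size");
        }
    }
}
```

Under these assumptions, `sequentialPattern(500, 50, rng)` would produce a read-only readset covering 10% of the database, and `randomPattern(500, 2, rng)` a two-item writeset for an update transaction.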

7.1.1.1. Update transaction performance. The measured throughput for the update transactions is shown in Figure 1. As we can see, each of the three concurrency-control algorithms executed the most update transactions for the runs with smaller readset sizes. Furthermore, our results indicate that with either the MVTC or the RF-MVTC scheme, and with only two TMs executing read-only transactions, resource contention at the database is relatively low with the smaller transactions, and the update transactions are consequently able to execute more quickly. With C-MVTO, however, the higher throughput for the

∗∗Since update transactions cannot issue blind writes, conflict arises from their need to first read each data item targeted for update.



Figure 1. Update throughput (tps). [Axes: readset size (x), throughput in tps (y); series: RF-MVTC, MVTC, C-MVTO.]

runs with smaller read-only transactions is an indirect result of the blocks by the update transactions on read-only transactions being relatively short. Unlike the proposed algorithms, as the size of the read-only transactions is increased, the throughput with C-MVTO decreases substantially, as the duration of blocks is increased by larger readset sizes. This result was expected, since the lack of table predeclarations with C-MVTO means that progress on each TM’s transaction must be blocked until no other younger transactions are active. RF-MVTC, on the other hand, provided the highest throughput at an average of over 112 transactions per second††. MVTC was the middle-performing proposed algorithm. It averaged 84% of the throughput of RF-MVTC. The measured update turnaround times are shown in Figure 2.

7.1.1.2. Read-only transaction performance. Transaction throughput for the read-only transactions is shown in Figure 3. As the size of the read-only transactions is increased, the expected decrease in throughput occurs with all three algorithms. For experimental runs with the smallest readset sizes (i.e. those with readsets of fewer than 25 data items), the proposed algorithms yield significantly greater throughput than C-MVTO. Without the benefit of table-level writeset predeclarations, C-MVTO must block the execution of every read-only transaction while younger transactions are active, thereby missing the advantage of concurrency experienced with the proposed algorithms. As the size of the readsets is increased, the duration of the blocks decreases relative to the time required for reading more data items. For runs with readset sizes of 50 or more, throughput with C-MVTO surpasses that with MVTC and RF-MVTC. This is due to contention at the database. We also observe that RF-MVTC outperforms MVTC in terms of throughput for runs with the smallest readset sizes. The impact of

††For this experiment, statistics are in terms of measured values only. Many graphs show interpolated plots between these points that are not considered in the calculations.



Figure 2. Update turnaround time (s). [Axes: readset size (x), turnaround time in seconds (y); series: C-MVTO, MVTC, RF-MVTC.]

Figure 3. Read-only throughput (tps). [Axes: readset size (x), throughput in tps (y); series: RF-MVTC, MVTC, C-MVTO.]

overhead with MVTC is too large in comparison to the potential benefits, constrained by only two TMs executing relatively small read-only transactions. For runs with readsets of five data items, throughput with MVTC lags that with RF-MVTC by 10%.

The measured turnaround times are shown in Figure 4. In general, the turnaround time increases as the readsets increase. As we can see, these results mirror those in Figure 3. RF-MVTC provides the highest performance for smaller readsets. As the readset size increases from five data items, the relative performance of MVTC drops when compared to both RF-MVTC and C-MVTO. At 25 data items, the advantage for RF-MVTC reaches its maximum: the overall read-only transaction performance



Figure 4. Read-only turnaround time (s). [Axes: readset size (x), turnaround time in seconds (y); series: C-MVTO, MVTC, RF-MVTC.]

for MVTC is only 70% of that with RF-MVTC. However, at this point the trend reverses, and the relative performance of MVTC increases with larger readset sizes. For runs with the largest readset size, performance with MVTC is within 5% of that with RF-MVTC. For runs with readset sizes of 25 data items, the relative performance between these two algorithms is significantly affected by blocking, temporal reorders and aborts.

Recall that one of the design constraints shared by RF-MVTC and MVTC is that they execute transactions in temporal order. Both algorithms include a mechanism for reordering update transactions that arrive at a database after a younger transaction has completed irreversible data operations at one of the tables targeted by the update transaction. Hence, in our next experiments, we wish to evaluate the percentage of temporal reordering (i.e. reordered transactions).

The percentage of temporal reordering occurring in the runs for this experiment is plotted versus the readset sizes in Figure 5. The graph shows that reorders are generally infrequent, with less than 3% of the update transactions experiencing temporal reorders in all runs. The highest percentage of reorders occurs with the RF-MVTC algorithm. The table-level writeset predeclaration mechanism alone is sufficient to limit reorders to an average of less than 1.9%, as reordering is avoided for late-arriving update transactions that do not predeclare tables read by younger transactions. In addition to employing table-level predeclarations, the MVTC algorithm provides robustness by trading the possibility of transaction abortion for a larger time window in which a late-arriving update transaction can be executed in its original temporal position (i.e. with its original timestamp). The reduction in reorders is clearly seen in Figure 5; reorders with MVTC average 63% less than with RF-MVTC.

As mentioned above, the MVTC algorithm trades the possibility of transaction abortion for an increased tolerance for late-arriving update transactions. As an abort is needed only in cases where a transaction reads a data item that is an update target of a late-arriving older transaction, the rate of abortion with MVTC is expected to be much lower than the rate of reorders with RF-MVTC.



Figure 5. Percentage of reordered update transactions. [Axes: readset size (x), percentage of reordered transactions (y); series: RF-MVTC, MVTC.]

With RF-MVTC, a reorder is needed when there is table-level conflict between a younger reader and a late-arriving older writer, which is far more probable than conflict at the data-item level.
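The difference in granularity between the two conflict tests can be sketched as follows. Both methods are deliberately simplified illustrations, with hypothetical names and with tables and data items represented as strings; the actual algorithms involve timestamps and message handling that are not shown here.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical sketch contrasting table-level and data-item-level conflict tests. */
class LateWriterConflictSketch {
    /**
     * Table-level test (the granularity at which RF-MVTC reorders): a conflict exists
     * if any table predeclared by the late-arriving older writer has already been
     * read by a younger reader.
     */
    static boolean tableLevelConflict(Set<String> predeclaredTables, Set<String> tablesReadByYounger) {
        return !Collections.disjoint(predeclaredTables, tablesReadByYounger);
    }

    /**
     * Data-item-level test (the granularity at which MVTC aborts): a conflict exists
     * only if the younger reader has read a data item that the late writer updates.
     */
    static boolean itemLevelConflict(Set<String> itemsToWrite, Set<String> itemsReadByYounger) {
        return !Collections.disjoint(itemsToWrite, itemsReadByYounger);
    }
}
```

A late writer that predeclares a table and updates one of its data items conflicts at the table level with any younger reader of that table, but conflicts at the data-item level only if the reader touched that particular item. This is why reorders under the table-level test are expected to be more frequent than aborts under the item-level test.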

The percentage of aborted transactions with MVTC in our experiment is shown in Figure 6. The percentage is very small, with all runs for two of the readset sizes having no aborts at all. The average for all runs is 0.08%, with no run experiencing more than two aborted transactions (i.e. 0.67%). The fraction of read-only transactions that block on older transactions is shown in Figure 7. (C-MVTO is purposely omitted, since it blocks on all but the first transaction.) As we can see, the fraction of blocked read-only transactions with MVTC averages 7% less than that with RF-MVTC. It is our belief that this is due to the non-uniformity of inter-transactional conflict between the two types of transactions, coupled with increased reordering with RF-MVTC; the reorders with RF-MVTC appear to have put reordered update transactions into more favorable arrangements [19].

7.1.2. Experimental set 2: varying the ratios of transaction types

The goal of this experimental set is to characterize the performance of both read-only and update transactions as the mix of competing transaction types is varied. The set of update transactions from experimental set 1 is reused for this experimental set; each transaction consists of reads and writes on two randomly chosen data items. The choice of smaller sized update transactions was made in an effort to reduce conflict between them. With respect to the read-only transactions, a subset of those from the first experiment is reused: those with a mean readset size of 50 data items. These transactions were chosen in order to provide relatively high levels of conflict with the update transactions and load on the database, resulting from their mean size being fully 10% of the size of the database. The total number of competing TMs is fixed at 10, with each TM dedicated to the execution of one of the two transaction types during a given run. Multiple experiment runs are executed in order to measure



Figure 6. Percentage of aborted transactions. [Axes: readset size (x), percentage of aborted transactions (y); series: MVTC.]

Figure 7. Fraction of read-only transactions with blocks. [Axes: readset size (x), fraction of blocked transactions (y); series: MVTC, RF-MVTC.]

performance in transaction mixes consisting of 0–100% update transactions. These percentages are obtained by executing the runs with 0–10 of the 10 TMs dedicated to update transactions.

7.1.2.1. Update transaction performance. The measured throughput for the update transactions is shown in Figure 8. The figure shows that all three algorithms produce higher throughput with increasing numbers of update TMs, which suggests that no algorithm reaches a saturation point for update transactions. Of all the algorithms, RF-MVTC provides the highest level of throughput for each



Figure 8. Update throughput (tps). [Axes: number of update TMs (x), throughput in tps (y); series: RF-MVTC, MVTC, C-MVTO.]

experiment run, with the second-best MVTC producing an average of only 81% of that for RF-MVTC. C-MVTO performs poorly; on average, its throughput is only 22% of that with RF-MVTC.

Figure 9 shows the measured turnaround times. The extremely poor performance of C-MVTO is particularly noticeable in this graph. The drastic difference between C-MVTO and the others is expected, since its lack of table predeclarations means that progress on each TM’s transaction must be blocked until no other younger transactions are active. The drop-off of the turnaround time for C-MVTO results from blocks on read-only transactions being displaced by shorter duration update transactions as the number of update TMs is increased. Regarding the execution of update transactions within the parameters of this experiment, it is clear that the C-MVTO algorithm is not competitive with either the RF-MVTC or the MVTC algorithm.

Figure 10 depicts the measured turnaround times excluding those from C-MVTO. This exclusion allows a scale to be used that makes more visible the relative performance between the proposed algorithms. As a result, it becomes apparent that RF-MVTC provides the quickest turnaround times for update transactions. MVTC yields the next-quickest turnaround times, averaging approximately 17% longer than RF-MVTC. With respect to the performance of update transactions, RF-MVTC clearly outperforms the other algorithms in terms of throughput and turnaround time. As we can see, a 20% performance improvement is obtained when using RF-MVTC compared with MVTC.

7.1.2.2. Read-only transaction performance. For all mixes of update transactions (i.e. numbers of TMs executing update transactions), RF-MVTC provides the highest performance for update transactions, and the C-MVTO algorithm provides the lowest read-only transaction performance. Throughput for the read-only transactions is shown in Figure 11. In general, throughput decreases for all algorithms as the mix of read-only transactions decreases (i.e. as the number of TMs executing update transactions increases). The decrease for C-MVTO is more gradual, since the decrease in the number of executing read-only transactions is offset by quicker turnaround times. For runs with 100%



Figure 9. Update turnaround time (ms). [Axes: x-axis label not recoverable (ticks 0–10), turnaround time in ms (y); series: C-MVTO, MVTC, RF-MVTC.]

Figure 10. Update turnaround time (ms). [Axes: x-axis label not recoverable (ticks 0–10), turnaround time in ms (y).]

read-only transactions (i.e. no TMs executing update transactions), the throughput for C-MVTO is only 42% of that for the proposed algorithms. However, given the greater rate of decrease for the proposed algorithms, throughput for C-MVTO surpasses that for MVTC as the mix drops below 40% read-only transactions (i.e. six TMs executing updates).

Unlike C-MVTO, the table predeclarations enable the proposed algorithms to execute read-only transactions with little or no risk of missing updates. For runs with no TMs executing update transactions, the proposed algorithms average 26.7 read-only transactions per second, with MVTC performing approximately 3% less than RF-MVTC.



Figure 11. Read-only throughput (tps). [Axes: x-axis label not recoverable (ticks 0–10), throughput in tps (y); series: RF-MVTC, C-MVTO, MVTC.]

Figure 12. Read-only turnaround time (s). [Axes: x-axis label not recoverable (ticks 0–10), turnaround time in seconds (y); series: C-MVTO, MVTC, RF-MVTC.]

The measured turnaround times are shown in Figure 12. Plots for the proposed algorithms are relatively flat. This suggests that RF-MVTC (as well as MVTC) effectively executes read-only transactions while simultaneously executing non-conflicting update transactions, and that blocks by read-only transactions on conflicting update transactions are, on average, relatively short in duration. However, as the number of update TMs increases above 4, the corresponding increase in blocks on update transactions is evidenced by the slight decline in turnaround time. Unlike that with the MVTC and RF-MVTC algorithms, read-only transaction turnaround time with C-MVTO drops significantly as the number of update TMs is increased. When no TMs are executing update transactions, average



turnaround time is over 90% longer than with the proposed algorithms; it is comparable when the number of update TMs is increased to only four. The decrease in turnaround time is a result of a decrease in the average block time experienced by the read-only transactions: as TMs are changed from read-only to update transactions, blocks on long-duration read-only transactions are displaced by blocks on shorter duration update transactions. The decrease in turnaround times for the update transactions is a result of the same effect (see Figure 12).

Thus our results indicate clearly that with respect to read-only transactions, the C-MVTO algorithm provides poor performance in transaction mixes of 80% or more read-only transactions.

8. CONCLUSION

In this paper, we have presented a transaction concurrency-control algorithm that is designed to execute transactions in a high-performance manner while producing conflict-serializable schedules that preserve temporal dependencies that coexist with data dependencies. The algorithm, which we refer to as the RF-MVTC concurrency-control algorithm, avoids all risk associated with responses to read requests. Extensive simulation experiments were conducted to study its performance. The experimental results indicate that a careful implementation of RF-MVTC is a viable technique and that it improves the performance of database systems when compared to C-MVTO. Our results indicate that, with respect to overall throughput and turnaround time, our scheme substantially outperforms the well-known C-MVTO algorithm, and that update transactions execute much faster with our scheme than with the ‘simple’ MVTC and C-MVTO algorithms.

REFERENCES

1. Bernstein PA, Hadzilacos V, Goodman N. Concurrency Control and Recovery in Database Systems. Addison-Wesley: Reading, MA, 1987.

2. Badrinath BR, Ramamritham K. Semantics-based concurrency control: Beyond commutativity. ACM Transactions on Database Systems 1992; 17(1):163–199.

3. Sun R, Thomas G. Performance results on multiversion timestamp concurrency control with predeclared writesets. Proceedings of the 6th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1987; 177–184.

4. Ozsu MT, Valduriez P. Principles of Distributed Database Systems. Prentice-Hall: Englewood Cliffs, NJ, 1999.

5. Boukerche A, Tropper C. Parallel simulation on a hypercube multiprocessor. Distributed Computing, vol. 8. Springer: Berlin, 1995; 181–190.

6. Boukerche A. Time management in parallel simulation. High Performance Cluster Computing, vol. 2, Rajkumar B (ed.). Prentice-Hall: Englewood Cliffs, NJ, 1999; 375–394.

7. Fujimoto R. Parallel Discrete Event Simulation. Wiley: New York, 2000.

8. Misra J. Distributed discrete-event simulation. ACM Computing Surveys 1986; 18(1):39–65.

9. Boukerche A, Das SK, Datta A, LeMaster T. Implementation of a virtual time synchronizer for distributed databases. Proceedings of EuroPar ’98 (Lecture Notes in Computer Science). Springer: Berlin, 1998; 534–538.

10. Jefferson D. Virtual time. ACM TOPLAS 1985; 7(3):404–425.

11. Nicol D. The dark side of risk: What your mother never told you. Proceedings of the IEEE/ACM PADS 97, Austria, 1998.

12. Boukerche A, Tuck T. T3C: A temporally correct concurrency control algorithm for distributed databases. IEEE Symposium MASCOTS 2000, San Francisco, CA, 2000; 155–163.

13. Boukerche A, Tuck T. Improving conservative concurrency control in distributed databases. Proceedings of EuroPar 2001 (Lecture Notes in Computer Science, vol. 2150). Springer: Berlin, 2001; 301–309.

14. Thomasian A. Database Concurrency Control: Methods, Performance, and Analysis. Kluwer Academic, 1999.

15. Miller JA, Griffeth ND. Performance of time warp protocols for transaction management in object-oriented systems. International Journal in Computer Simulation 1994; 4(3):259–282.

16. Hadzilacos V. An operational model for database system reliability. Proceedings of the 2nd ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, 1983; 244–256.

17. Jefferson D, Motro A. The time warp mechanism for database concurrency-control. Proceedings of the 2nd International Conference on Data Engineering, 1986; 474–481.

18. Peckham J, Fortier P, MacKellar B. Advanced database system concepts. Database Systems Handbook. McGraw-Hill: New York, 1997; 509–524.

19. Boukerche A. Synchronization issues in large scale distributed systems. Technical Report, University of Ottawa, 2004 (in progress).

