
Directed Dependency Graph based Concurrency Control for Persistent Systems

Maurice G. Ashton & Frans A. Henskens¹

Discipline of Computer Science & Software Engineering, School of Electrical Engineering & Computer Science

University of Newcastle, N.S.W. 2308, Australia

1 Overview

Persistent stores abstract over all aspects of storage including the distinction between primary and secondary storage.

One consequence of this abstraction is that the store appears to the user to be free from failure. Since computers are

vulnerable to failure, persistent systems typically provide mechanisms to support this appearance of a failure-free store.

These allow the system to recover automatically from store failure to a self-consistent state, exhibiting a property called

stability.

A persistent store is said to be stable if it automatically recovers to a consistent state after a failure that has prevented

orderly system shutdown. Stability in persistent stores is typically provided using operations called checkpoints that

flush all modified data currently held in main memory to disk, and atomically create a snapshot of the store at that

moment. Early stability schemes, for example [9], checkpointed the entire store at once, requiring processing on the

store to cease during a checkpoint operation. In a multi-user store involving multiple nodes this would result in

unacceptable degradation of performance. Accordingly, systems have been developed which checkpoint parts of the

store independently (for example [11]). The stable state of such a store is the collection of these stable parts.

Checkpointing parts of the store independently, however, creates the possibility of logical inconsistencies between data

objects because the state of modified data in one object may influence the way a process modifies data in some other

object. As a result these objects have a dependency relationship that must be considered when checkpointing either of

them. Such dependencies have been described using associations in Casper [11], and more recently using directed

graphs of nodes representing entities [4].

2 DDG-based Stability

These directed graphs, termed Directed Dependency Graphs (DDGs), are maintained by the operating system as follows:

| Symbol | Description |
|---|---|
| E1, E2 | Entities, either data objects or processes. |
| P, O | Process and object respectively. |
| → | An edge in a DDG representing a dependency between two entities, e.g. E1 → E2 means that E1 depends on E2. |
| ↔ | An edge in a DDG representing bidirectional dependency between two entities. E1 ↔ E2 means that E1 depends on E2 and E2 depends on E1. |

Table 1 DDG Notation.

• When a process reads an unmodified object, no edge is added to the DDG.

• When a process P reads a modified object O, the edge P → O is added, if it does not already exist, to the

DDG(s) including P and O.

• When a process P modifies an object O, the edge P ↔ O is added, if it does not already exist, to the DDG(s)

including P and O.

1 Corresponding author: [email protected]


• → is transitive, but not symmetric, i.e. if (E1 → E2) and (E2 → E3), then it is implied that (E1 → E3), but E1 → E2

does not imply E2 → E1.

• The right hand side of a dependency relation may depend on the left hand side through transitivity (for instance

where E1 → E2, E2 → E3 and E3 → E1 it follows that E2 → E1).

• When a process belonging to a DDG reads a modified object or modifies an object that belongs to another

DDG, the two DDGs are merged using one of the above edges to create a single larger graph.

• As shown in Table 1, an edge → represents both →S (i.e. in terms of checkpoint or stabilise dependency) and

←R (i.e. in terms of rollback dependency). Thus E1 → E2 implies that checkpoint of E1 propagates to E2 (but

checkpoint of E2 does not propagate to E1) and that rollback of E2 propagates to E1 (but rollback of E1 does not

propagate to E2). A consequence of this is that if E1 ↔ E2, checkpoint and rollback of either entity propagates

to the other.

• A DDG shrinks when a set of dependent entities is checkpointed or reverts to its last stable state (rolls back).

Once a checkpoint or rollback operation is initiated for an entity E, the operation propagates to each entity that

is reachable from E in the DDG to which E belongs. Then, because each involved entity is now stable, all

edges attached to them are removed.

• At any instant each entity belongs to one and only one dependency graph. To find the set of entities dependent

on any entity, it is sufficient to find the entity in its graph and then, subject to the kind of operation, traverse

the directed graph starting from the entity. Thus the set of dependent entities may differ for entities in the

same DDG.
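The maintenance rules above can be illustrated with a small sketch. It is not the authors' implementation: the class and method names are hypothetical, and it assumes that a read of a modified object records a one-way edge (P depends on O) while a modification records a two-way edge. Checkpoint is taken to propagate along edges and rollback against them, as the bullets describe.

```python
from collections import defaultdict


class StabilityDDG:
    """Toy dependency graph: an edge a -> b means 'a depends on b'."""

    def __init__(self):
        self.depends_on = defaultdict(set)   # a -> {b}: a depends on b
        self.depended_by = defaultdict(set)  # b -> {a}: reverse index

    def _add_edge(self, a, b):
        self.depends_on[a].add(b)
        self.depended_by[b].add(a)

    def read_modified(self, process, obj):
        # Assumed rule: P reads a modified object O, giving a one-way edge.
        self._add_edge(process, obj)

    def modify(self, process, obj):
        # Assumed rule: P modifies O, giving a two-way edge.
        self._add_edge(process, obj)
        self._add_edge(obj, process)

    @staticmethod
    def _reach(start, adjacency):
        seen, stack = {start}, [start]
        while stack:
            for nxt in adjacency.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def stabilise_set(self, entity):
        # Checkpoint propagates along edges, to everything `entity` depends on.
        return self._reach(entity, self.depends_on)

    def rollback_set(self, entity):
        # Rollback propagates against edges, to everything depending on `entity`.
        return self._reach(entity, self.depended_by)

    def remove_stable(self, entities):
        # Entities that are now stable lose all their attached edges.
        for e in entities:
            for b in self.depends_on.pop(e, set()):
                self.depended_by[b].discard(e)
            for a in self.depended_by.pop(e, set()):
                self.depends_on[a].discard(e)


ddg = StabilityDDG()
ddg.modify("P1", "O1")         # P1 writes O1
ddg.read_modified("P2", "O1")  # P2 reads the modified O1
print(sorted(ddg.stabilise_set("P1")))  # ['O1', 'P1']
print(sorted(ddg.rollback_set("P1")))   # ['O1', 'P1', 'P2']
```

In this example P1 and P2 sit in the same graph, yet a checkpoint of P1 involves two entities while a rollback of P1 involves three, which is the sense in which the dependent set differs by entity and by operation.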


| Dependency Graph | Stabilising Graph | Rollback Graph |
|---|---|---|
| → | →S | ←R |
| ← | ←S | →R |
| ↔ | →S and ←S | →R and ←R |

Table 2 The relationship between edges in DDGs, stabilise graphs and rollback graphs [4].

With appropriate hardware support, it is possible to lazily construct DDGs by updating them to record dependency data

at the completion of each process time slice [4]. This assumes that dependency is recorded at the virtual page rather

than the individual object level, thus utilising and extending hardware that is typically already present to support virtual

memory management. Conventional virtual memory management requires the presence of status data indicating

whether the content of an in-memory page has been modified since the page was loaded (i.e. whether the page is dirty),

allowing the system to determine whether the page must be flushed to disk before the page frame it occupies can be re-

used. In order to efficiently support stabilise and rollback operations, it is necessary to distinguish between in-memory

pages that are unstable and unflushed (DIRTY) and unstable but flushed (MODIFIED). A page would have

MODIFIED state if, for instance, as part of virtual memory management it had been loaded, modified, flushed and then

reloaded.

Pages may remain in main memory for a period encompassing many process activations. The M_ACCESSED status

data allows detection of process access to modified object data during the process' current time-slice. This status data is

set for a page if the page is accessed while the MODIFIED status for the page is set. Dependencies between a process

and the pages containing modified objects (pages with the M_ACCESSED status set) are represented by the addition of

appropriate edges to the dependency graph at the conclusion of the process' period of activation. All

M_ACCESSED status data must be cleared at the commencement of a process time-slice. This may be achieved in a

single operation using appropriate hardware.


The inclusion of WRITTEN status data allows detection of page data modifications made by the current process. This

data is distinct from the MODIFIED status described previously because it describes the modification behaviour of the

current process rather than the status of the virtual page. The WRITTEN status is set together with the MODIFIED and

DIRTY status, but is cleared as part of the dependency graph update at the conclusion of the process time-slice. In

contrast the MODIFIED status is cleared at the next object checkpoint and the DIRTY status is cleared when the page is

flushed to disk. Pages with the WRITTEN status set cause the inclusion of an appropriate dependency graph edge.

Operation of the described status data is shown in Table 3.

| Operation | DIRTY | MODIFIED | M_ACCESSED | WRITTEN |
|---|---|---|---|---|
| Unmodified page retrieved | Cleared | Cleared | Cleared | Cleared |
| Modified page retrieved | Cleared | Set | Cleared | Cleared |
| Process reads data from page | Unchanged | Unchanged | Copy of MODIFIED | Unchanged |
| Process writes to page | Set | Set | Set | Set |
| End of process time-slice | Unchanged | Unchanged | Cleared | Cleared |
| Page flushed | Cleared | Unchanged | Unchanged | Unchanged |
| Object checkpoint | Cleared | Cleared | Unchanged | Unchanged |

Table 3 Effect of operations on page status data [4].
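The transitions in Table 3 can be read as a small state machine over four flags. The sketch below models them in software purely for illustration; the paper assumes hardware support, and the class name, method names and the returned edge labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageStatus:
    dirty: bool = False       # unstable and unflushed
    modified: bool = False    # unstable (flushed or not)
    m_accessed: bool = False  # modified data accessed in this time-slice
    written: bool = False     # written by the current process in this time-slice

    def load(self, was_modified: bool):
        # Page retrieved from disk; MODIFIED is set only for an unstable page.
        self.dirty = False
        self.modified = was_modified
        self.m_accessed = False
        self.written = False

    def read(self):
        # Table 3: M_ACCESSED takes a copy of MODIFIED on a read.
        self.m_accessed = self.modified

    def write(self):
        self.dirty = self.modified = self.m_accessed = self.written = True

    def end_of_time_slice(self):
        """Return the edge kind to record for the running process, then reset."""
        if self.written:
            edge = "write"
        elif self.m_accessed:
            edge = "dirty-read"
        else:
            edge = None
        self.m_accessed = False
        self.written = False
        return edge

    def flush(self):
        self.dirty = False

    def checkpoint(self):
        self.dirty = False
        self.modified = False


page = PageStatus()
page.load(was_modified=False)
page.write()
print(page.end_of_time_slice())  # write
page.flush()                     # page is now MODIFIED but no longer DIRTY
page.load(was_modified=True)     # reloaded later while still unstable
page.read()
print(page.end_of_time_slice())  # dirty-read
```

The second half of the example follows the loaded, modified, flushed and reloaded sequence described in the text, where a later reader still picks up a dependency although the page is no longer DIRTY.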

Ideally dependencies would be recorded on a per-object basis, as this reflects the “natural” granularity of data. Object-

based dependency recording has been proposed for Monads-MM [10]. However, this requires more complex data

structures and adds further overhead to the work of the ATU.

3 Concurrency Control

Most descriptions of concurrency control concentrate on the flat transaction model used for database systems. This

model represents one extreme of concurrency models ranging from isolation to cooperation. The transactional

concurrency control model enforces isolation and hides concurrency from the user. At the other extreme, concurrency

is achieved by cooperation between users. It is not clear which model of concurrency is most suited to persistent

systems. Some researchers [3] regard the cooperative model as the most appropriate while others [6] prefer to offer a

choice of models.

Transactional concurrency control techniques ensure that a set of concurrent transactions produce the same results as if

they had been executed serially, and may be broadly categorised as optimistic or pessimistic. Pessimistic schemes

typically use locks to prevent other concurrent transactions from accessing objects that are being used by the locking

transaction. Optimistic schemes proceed without locking but examine the transaction before it is committed to

determine its isolation, leading to a decision to commit or abort. Both approaches have advantages and disadvantages:

pessimistic methods achieve concurrency at the cost of wasted time, while optimistic methods achieve concurrency at

the cost of wasted work [2].

3.1 Using DDGs to support concurrency control

The stability mechanism described in [4] uses directed graphs to record dirty-read and write dependencies between

processes and entities. This information, recorded in directed dependency graphs (DDGs), is used to reduce the

‘domino effect’ of checkpoint and rollback operations found in other stability schemes.

It was recognised that this information formed part of that required for determining transaction isolation. This

observation led to the development of the new concurrency control technique called DDG Concurrency Control (DCC)

described in this paper.

3.2 Adapting Stability DDGs for Concurrency Control

Four issues leading to differences in data required to support stability and concurrency control schemes were identified:


1. The DDG stability technique maintains dependency information on a per process basis. On the other hand

concurrency control requires dependency information on a per transaction basis. Transactions use processes as

agents to carry out their operations.

2. The DDG stability technique records information about dirty-read and write accesses. Clean-read accesses

must also be recorded to support transaction isolation.

3. Stability checkpoints and transaction commits have different semantics.

4. The stability mechanism uses physical page granularity for recording dependencies.

The implications of these are discussed below.

3.2.1 The Relationship between Transactions and Processes

A transaction is an abstract concept that includes the user-defined boundaries, the required data resources, and accesses

(that may include mutation) to that data. Processes are entities that execute the activities specified by transactions. A

transaction may use a single process, a group of processes, or share a process (or processes) with other transactions to

execute its activities. A process executing on behalf of many transactions executes instructions for only one transaction

in any given timeslice [2].

Because of this relationship between transactions and processes, dependencies created by process activity are

appropriately viewed as existing between transactions and other entities rather than between each process and other

entities.

3.2.2 Dependencies

The DDGs used for supporting stability record edges for:

• Dirty-read accesses occurring when a process reads from an entity that has been modified since the last

checkpoint.

• Write accesses occurring when a process modifies an entity.

Clean-read accesses are not recorded in stability DDGs because these accesses do not produce dependencies that are of

consequence to stability operations.

For a transaction to be considered isolated it must be guaranteed that the transaction has not seen an inconsistent state of

the data. Consider two concurrent transactions Ta and Tb. Transaction Tb modifies a set of entities E. Transaction Ta reads some of these entities before transaction Tb has modified them and some of them after. This represents an

inconsistent view of the data for Ta, compromising its isolation. Such a transaction must be aborted.

If during the time that transaction Ta is in progress, some other transaction Tb modifies a set E of one or more entities,

then Ta is not compromised by Tb provided that all of Ta’s reads from E occur either before Tb’s writes or after Tb’s

writes, but not a mixture of both.

Thus for DDGs to be used for determining transaction isolation it is necessary to record information about transactions’

clean-read, dirty-read, and write accesses.
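The "before or after, but not both" condition can be made concrete with a minimal sketch. It assumes each access carries a logical timestamp, which is not something the paper specifies; the function name and argument shapes are hypothetical.

```python
def view_is_consistent(ta_read_times, tb_write_times):
    """True if every read by Ta of the set E falls on one side of Tb's writes."""
    if not ta_read_times or not tb_write_times:
        return True  # no interaction on E at all
    all_before = max(ta_read_times) < min(tb_write_times)
    all_after = min(ta_read_times) > max(tb_write_times)
    return all_before or all_after


print(view_is_consistent([1, 2], [3, 4]))  # True: all reads are clean
print(view_is_consistent([5, 6], [3, 4]))  # True: all reads are dirty
print(view_is_consistent([1, 5], [3, 4]))  # False: mixed view, Ta is compromised
```

A read that lands between two of Tb's writes also fails the check, which corresponds to the interleaved case treated later under repeated writes.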

3.2.3 Stability Checkpoints and Transaction Commits

Stability mechanisms provide the following transaction-related properties:

1. The abstraction of a stable computational store.

2. A logically consistent store restart state at all times.

3. Concurrency control at process level.

Full transaction support requires these properties to be augmented as follows:

1. Support for transaction-based events associated with programming language key words such as, for example,

BEGIN-TRX and COMMIT-TRX, used to define the extent of each transaction.

2. The extent described in (1) defines an atomic unit of work performed on the store that is isolated from any

other concurrent activity.

3. The means for managing concurrency must be flexible enough to cope with run-time determination of the

temporal extent and physical granularity of interaction.

One consequence of these requirements is that the transaction management system must have control over the timing of

checkpoints that correspond to COMMIT-TRX operations.


3.2.4 Granularity Considerations

Granularity of concurrency control for databases may be applied at field, record, record range, table, and database

levels. Finer granularity offers the opportunity for more concurrent activity but at the cost of extra overhead, while

coarse-grained control requires fewer overheads but reduces the opportunities for concurrent activity. Page locking is

also available for supporting concurrency control by some commercial systems. For example Oracle [8] and Microsoft

SQL Server [7] offer optimistic concurrency control at page level and pessimistic concurrency control at record level.

Oracle documentation [8] suggests using page locking where transactions are short and several records from the same

page are accessed at a time. On the other hand record locking should be used where transactions are long and access

many records in a table.

Typically a page may hold several data objects. Consider the situation where two objects A and B reside in the same

page and are modified separately by transactions Ta and Tb respectively. Ideally these two transactions should be

regarded as isolated, but because detection occurs at page rather than object level, they are considered to be dependent

on each other. For example, in the absence of extended inter-dependency rules, if transaction Ta committed using a

page stabilise operation, the modifications performed by transaction Tb on B would also be stabilised even though Tb

may not be ready to commit.

The DDG stability mechanism [4] on which DCC is based records page-level access information. A consequence of

extending this technique to support concurrency control is that, in the absence of any other mechanism, it limits

available granularity of concurrency control to the page level.

4 A Description using Set Notation

The set of entities accessed by a transaction T may be divided into three subsets defined as follows:

1. The clean-read set (CR): comprises entities that were in an unmodified state when read by transaction T.

2. The dirty-read set (DR): comprises entities that had been modified by some other as-yet uncommitted

transaction before being read by transaction T.

3. The write set (W): comprises entities that have been modified by transaction T.

The isolation of a transaction Ta may be determined by examining the intersection of Ta’s sets with those of other

transactions.

• In section 4.1, the state of isolation of a transaction Ta is examined after the transaction has completed all its

access operations and is about to commit.

• Section 4.2 describes those situations where the isolation of a transaction Ta has been compromised. If it is

determined that a transaction is not isolated before it has completed its operations, there is no point in

continuing the transaction; it should be aborted and its completed operations rolled-back.

4.1 Transaction Isolation Prior to Commit

In the following discussion the isolation of a transaction Ta is determined after it has completed all its operations and is

about to commit. To establish the isolation of transaction Ta, it is necessary to consider the intersections of Ta’s access

sets with the write sets of all overlapping (in the sense of transaction start and finish times) transactions Tb [5]. There

are three situations to consider:

1. Ta and Tb are mutually exclusive.

Ta has not accessed any entity modified by Tb. This is described symbolically as follows:

$DR_a \cap W_b = \emptyset$ and $W_b \cap (CR_a \cup W_a) = \emptyset$

Ta is isolated from Tb and may be committed.

2. Ta’s clean-read set intersects with Tb’s write set.

Symbolically this is described as:

$CR_a \cap W_b \neq \emptyset$ and $W_b \cap (DR_a \cup W_a) = \emptyset$

Ta is isolated from Tb where, if Ta has clean-read entities subsequently modified by Tb, then Ta has not dirty-

read or written to entities modified by Tb. In this situation Ta is logically isolated from Tb and may be

committed.

3. Ta’s dirty-read set intersects with Tb’s write set.

Symbolically this may be described as

$DR_a \cap W_b \neq \emptyset$ and $W_b \cap (CR_a \cup W_a) = \emptyset$

Ta has seen a consistent view of the entities modified by Tb but it is not isolated from Tb. If Tb commits and its

write set Wb is stabilised, then Ta becomes isolated from Tb and is thus able to commit. If Tb cannot commit

and is rolled-back (see section 5.3.4) then Ta must also be rolled-back. Where there are multiple reads by Ta of

dirty data interleaved with writes by Tb, Ta must be rolled back as Ta has seen an inconsistent view of the

entities modified by Tb. This is covered in more detail in section 5.3.2.

4.2 Compromised Transactions

1. Figure 1 illustrates the situation where transaction Ta has an inconsistent view of the set of entities modified by

Tb. Since decisions made by Ta based on this view are potentially inconsistent, the transaction must be aborted.

This consideration applies even if Tb successfully commits. (See further discussion in section 5.3.5.)

Figure 1 Compromised read-sets.

2. In Figure 2 either Ta or Tb has modified at least one entity that the other transaction has modified but not

committed, thus violating isolation. Since Ta is not isolated it cannot commit (neither can Tb commit).

Figure 2 Compromised write-sets.
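The cases of sections 4.1 and 4.2 reduce to a few set intersections, sketched below with Python sets. The function name and the returned labels are hypothetical, and the reading of Figure 1 as "both the clean-read and dirty-read sets intersect Tb's write set" is an interpretation of the surrounding text rather than something the figure itself confirms here.

```python
def classify(cr_a, dr_a, w_a, w_b):
    """Classify transaction Ta against one overlapping transaction Tb.

    cr_a, dr_a, w_a: clean-read, dirty-read and write sets of Ta.
    w_b: write set of Tb.
    """
    if w_a & w_b:
        return "compromised"         # section 4.2 case 2: overlapping write sets
    clean_hit = bool(cr_a & w_b)
    dirty_hit = bool(dr_a & w_b)
    if clean_hit and dirty_hit:
        return "compromised"         # section 4.2 case 1: mixed view of Tb's writes
    if dirty_hit:
        return "depends-on-Tb"       # section 4.1 case 3: must wait for Tb's outcome
    if clean_hit:
        return "isolated"            # section 4.1 case 2
    return "mutually-exclusive"      # section 4.1 case 1


print(classify({"x"}, set(), {"y"}, {"z"}))       # mutually-exclusive
print(classify({"x"}, set(), set(), {"x"}))       # isolated
print(classify(set(), {"x"}, set(), {"x"}))       # depends-on-Tb
print(classify({"x"}, {"y"}, set(), {"x", "y"}))  # compromised
```

A full check would apply this to every overlapping Tb and commit Ta only when no result is "compromised" and every "depends-on-Tb" partner has itself committed.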

4.3 Summary

This discussion has shown that a transaction’s state of isolation can be determined providing information is kept about

the clean-read, dirty-read (including multiple dirty reads as discussed in section 5.3.2) and write sets for each

transaction. Such information is collected and stored in DDGs at the end of each process context switch. The details of

this mechanism are described in the next section.

5 Operations on the Directed Dependency Graph

5.1 Access Representation

The DDG concurrency control mechanism creates edges between transactions and accessed entities as follows:

1. A clean-read edge is recorded as “—”. T — E indicates that transaction T has read an unmodified entity E.

2. A dirty-read edge is recorded as “→”. T → E records that transaction T has read an entity E that had been

modified since its most recent checkpoint.


3. A write edge is recorded as “↔”. T ↔ E indicates that transaction T has modified entity E since it was last

checkpointed.

Transaction concurrency control is incorporated into the existing stability system as follows:

• At the commencement of a transaction, the initiating process must exist in a single-node DDG. If that is not

the case, the process must initiate a stabilise operation, with isolation being the consequence. The process is

then part of a DDG associated by the system with the fledgling transaction.

• As the process (and any parallel processes incorporated in the transaction) interacts with entities in the store,

→, ↔ and — edges are used to incorporate the entities into the transaction DDG. Construction of the graph is

achieved lazily on process switch using access data collected as described above during each process time

quantum.

• Edges have a precedence order —, →, ↔ with the rule that insertion of an edge to the right in this order will

replace an edge to the left. An edge to the right will not be replaced by an edge to the left; indeed an edge to

the left will not be inserted if it occurs after an edge to the right.

• If there are no existing edges between any transaction node and the accessed entity node, the appropriate edge

is added and the entity belongs to (and becomes a node in) the same DDG as the transaction.

• If all prior edge(s) between other transaction nodes and the node representing the accessed entity are to nodes

in the same DDG as the process, the appropriate edge is inserted subject to the precedence rule.

• If one or more edges exist between other transaction nodes and the node representing the entity, the system

either inserts the appropriate edge or causes the transaction abort (rollback) operation(s) as described below.

• During each transaction DDG update, the system analyses any graph merge operations and determines whether

the merge causes a violation of transaction isolation and whether any transaction must be aborted as a result.

• A transaction that completes, i.e. whose DDG could be constructed without a need for transaction rollback,

commits by stabilising its transaction DDG.

• A transaction that aborts has its transaction DDG rolled back.
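The precedence rule among the bullets above can be sketched as an ordered enumeration. The names are hypothetical, and the ordering clean-read < dirty-read < write is the assumed reading of the precedence list.

```python
from enum import IntEnum


class Edge(IntEnum):
    CLEAN_READ = 1
    DIRTY_READ = 2
    WRITE = 3


def record_access(edges, transaction, entity, kind):
    """Keep only the highest-precedence edge per (transaction, entity) pair."""
    key = (transaction, entity)
    current = edges.get(key)
    if current is None or kind > current:
        edges[key] = kind  # an edge further right replaces one further left
    return edges[key]


edges = {}
record_access(edges, "Ta", "E1", Edge.CLEAN_READ)
record_access(edges, "Ta", "E1", Edge.WRITE)
record_access(edges, "Ta", "E1", Edge.CLEAN_READ)  # ignored: lower precedence
print(edges[("Ta", "E1")].name)  # WRITE
```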

In the following discussion of the effect of DDG edge insertion, a transaction Ta accesses an entity En, creating a new

edge. Transaction Tb is another transaction that has already accessed En. Decisions on the validity of Ta's edge-

producing access are made by considering the edge to be inserted Da with respect to each individual existing edge

between En and each other concurrent transaction Tb, as follows (this discussion assumes that the system has already

determined that there is no existing edge of higher or equal priority to Da between En and Ta):

5.2 Edge Insertion Rules

1. If there is no edge between any Tb and En the new edge is inserted. It must be either a clean read or a write

edge.

Figure 3 Addition of a new edge (may be either read or write).

2. If there is an existing — edge between Tb and En and the access by Ta is a read, a new — edge is inserted

between the node representing En and the node representing Ta.


Figure 4 Adding a read edge when there is an existing read edge.

3. If there is an existing — edge between Tb and En and the access by Ta is a write, a new ↔ edge is inserted

between the node representing En and the node representing Ta. At the same time it is necessary to check whether

Tb has dirty-read another entity that has been modified by Ta. If this is the case then Tb is forced to abort

(rollback) since Tb has seen an inconsistent view of the database.

Figure 5 Adding a write to an existing read.

4. If there is an existing ↔ edge between Tb and En and the access by Ta is a read, a new → edge is inserted

between the node representing En and the node representing Ta. At the same time it is necessary to check whether

Ta has previously clean-read another entity that has been modified by Tb. If this is the case then Ta is

forced to abort (rollback) since it is no longer isolated from Tb. If not, Ta is allowed to continue optimistically in

the hope that Tb commits before Ta, thus rendering Ta isolated.

Figure 6 Adding a read to an existing write.

5. If there is an existing ↔ edge between Tb and En and the access by Ta is a write, a ↔ edge is added between the

node representing Ta and En as illustrated in Figure 7. Both transactions must be aborted (rolled back).

Figure 7 Adding a write to a write.
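The five rules can be collected into one decision function, sketched below under several assumptions: edges are held in a dictionary keyed by (transaction, entity), the caller has already checked that Ta holds no edge of equal or higher precedence on the entity, and other transactions' dirty-read edges are handled through the writer they depend on. All names are hypothetical.

```python
CLEAN, DIRTY, WRITE = "clean-read", "dirty-read", "write"


def insert_edge(edges, ta, en, access):
    """Apply the insertion rules for an access ('read' or 'write') by ta to en.

    edges maps (transaction, entity) -> edge kind and is updated in place.
    Returns the set of transactions that must abort.
    """
    others = {t: k for (t, e), k in edges.items() if e == en and t != ta}
    aborts = set()

    if access == "write":
        new_kind = WRITE
    elif WRITE in others.values():
        new_kind = DIRTY   # rule 4: reading an entity another transaction wrote
    else:
        new_kind = CLEAN   # rules 1 and 2
    edges[(ta, en)] = new_kind

    for tb, kind in others.items():
        if kind == CLEAN and access == "write":
            # Rule 3: tb aborts if it dirty-read some entity written by ta.
            if any(k == DIRTY and edges.get((ta, e)) == WRITE
                   for (t, e), k in edges.items() if t == tb):
                aborts.add(tb)
        elif kind == WRITE and access == "read":
            # Rule 4: ta aborts if it clean-read some entity written by tb.
            if any(k == CLEAN and edges.get((tb, e)) == WRITE
                   for (t, e), k in edges.items() if t == ta):
                aborts.add(ta)
        elif kind == WRITE and access == "write":
            # Rule 5: write-write conflict, both abort.
            aborts.update({ta, tb})
    return aborts


edges = {}
insert_edge(edges, "Tb", "E1", "write")         # rule 1
insert_edge(edges, "Ta", "E2", "read")          # rule 1, clean read
insert_edge(edges, "Ta", "E1", "read")          # rule 4, Ta continues optimistically
print(insert_edge(edges, "Tb", "E2", "write"))  # {'Ta'}
```

In the example Ta read E2 before Tb's write and E1 after it, so rule 3 aborts Ta when Tb writes E2.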

5.3 Special Situations

5.3.1 Cycle Formation

A cycle may develop between two or more transactions where transactions with dirty reads are waiting for writing

transactions to commit as illustrated in Figure 8. This leads to a deadlock situation, as the transactions in the cycle

can never be committed. The DDG management software must detect the formation of cycles and abort sufficient

transactions to break the deadlock. There is the potential to form very large cycles with a consequent loss of

performance caused by the rollback. In the simulation experiments described in the next chapter, cycle formation

was monitored. The largest observed cycle involved only two transactions.


Figure 8 A cycle of dirty-read/write dependencies.
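Cycle detection of this kind can be sketched as a depth-first search over a waits-for graph, in which a transaction waits for every other transaction whose uncommitted writes it has dirty-read. This is a generic technique offered as an illustration; the paper does not say how its DDG management software finds cycles, and the function names and edge representation are hypothetical.

```python
def waits_for(edges):
    """Build {reader: {writers}} from a (transaction, entity) -> kind mapping."""
    writers = {}
    for (txn, entity), kind in edges.items():
        if kind == "write":
            writers.setdefault(entity, set()).add(txn)
    graph = {}
    for (txn, entity), kind in edges.items():
        if kind == "dirty-read":
            graph.setdefault(txn, set()).update(writers.get(entity, set()) - {txn})
    return graph


def find_cycle(graph):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:                 # back edge closes a cycle
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        colour[node] = BLACK
        path.pop()
        return None

    for start in list(graph):
        if colour.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


edges = {
    ("Ta", "E1"): "write", ("Tb", "E1"): "dirty-read",
    ("Tb", "E2"): "write", ("Ta", "E2"): "dirty-read",
}
print(sorted(find_cycle(waits_for(edges))))  # ['Ta', 'Tb']
```

Aborting any one transaction on the returned cycle is enough to break that particular deadlock; choosing which one is a policy decision the sketch leaves open.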

5.3.2 Repeated Writes

When a transaction Ta has added a dirty-read edge to an entity, the writing transaction Tb may perform a further

write to the same entity. (It is assumed that a transaction is performing operations that are internally consistent.)

While that does not add another write edge between Tb and En, further access by Ta must be detected if it occurs.

Such an action represents a violation of isolation and Ta must be rolled back. To enable detection of this situation,

such writes by Tb cause the dirty-read edge from Ta to be marked as a final dirty-read as shown in Figure 9. This is

not regarded as a new kind of dependency, but rather as an indicator that any further access by Ta violates its

isolation.

Figure 9 Adding a final dirty-read edge.
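The final dirty-read marker can be sketched as a per-reader flag that the writer sets on any repeated write. The class below tracks a single entity with one writer; its name and interface are hypothetical and are meant only to show the detection logic.

```python
class EntityAccess:
    """Tracks one writing transaction and its dirty readers for one entity."""

    def __init__(self, writer):
        self.writer = writer
        self.final = {}  # reader -> True once the writer has written again

    def dirty_read(self, reader):
        """Return True if the read is allowed, False if reader must roll back."""
        if self.final.get(reader):
            return False  # access after a 'final' mark violates isolation
        self.final.setdefault(reader, False)
        return True

    def write_again(self):
        # A further write adds no new write edge, but marks every existing
        # dirty-read edge on this entity as final.
        for reader in self.final:
            self.final[reader] = True


acc = EntityAccess("Tb")
print(acc.dirty_read("Ta"))  # True: first dirty read
acc.write_again()            # Tb writes the entity a second time
print(acc.dirty_read("Ta"))  # False: Ta must be rolled back
```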

5.3.3 Transaction Commit

When a transaction commits, one consequence is that the edges associated with the transaction must be removed from

the DDG. There are two special cases to consider.

5.3.4 Dirty-read Dependency

When a transaction has completed all its accesses, it is necessary to determine whether the transaction is isolated. The

situation may occur where the transaction has one or more dirty-read dependencies. Such a transaction cannot be

committed because it depends on uncommitted modifications by other transactions. There are two possible policies that

may be used:

1. Wait until the other transaction(s) have committed. This policy leaves the waiting transaction open to the

compromise of its isolation by the actions of other transactions and the possibility that it may later need to be

rolled back.

2. Immediately roll back the transaction. This policy is pessimistic in the sense that it precludes the possibility that the writing transaction eventually commits, rendering the waiting transaction isolated.

It may be that runtime monitoring indicates better throughput for a particular policy in some situations.
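The choice between the two policies can be expressed as a small decision function. This is a sketch only: the `Policy` enumeration and the `on_commit_request` name are assumptions introduced for illustration.

```python
from enum import Enum
from typing import Set


class Policy(Enum):
    WAIT = "wait"          # policy 1: wait for the writing transactions
    ROLLBACK = "rollback"  # policy 2: roll back immediately


def on_commit_request(uncommitted_writers: Set[str], policy: Policy) -> str:
    """Decide what happens when a transaction has completed its accesses.

    uncommitted_writers holds the transactions whose uncommitted
    modifications this transaction has dirty-read.
    """
    if not uncommitted_writers:
        return "commit"    # no dirty-read dependency remains: isolated
    return "wait" if policy is Policy.WAIT else "rollback"
```

A system that monitors throughput at run time could switch the `policy` argument between the two values.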

5.3.5 Clean-reads by Other Transactions

The second situation requiring special treatment is where the committing transaction Ta has modified entities that have

been clean-read earlier by another still-current transaction Tb. Such an interaction does not compromise the isolation of

Ta. However, if Tb accesses any entity modified by Ta, even after Ta has committed, then Tb has seen an inconsistent

view of the database and must be aborted. Under normal circumstances when Ta is committed, its edges are removed

from the DDG thus removing information about which entities Ta has modified. Tb could then access one of the

modified entities without any indication that it had viewed a potentially inconsistent state of the database.


This inconsistent access may be detected by adding a blocked edge between Tb and En where En has been

modified by Ta as shown in Figure 10. A blocked edge is not a dependency edge in the same sense as a dirty-read or a

write edge, and does not need to be considered in interactions with other transactions. These blocked edges must persist

until Tb is committed or rolled back.

Figure 10 Detecting inconsistent reads after transaction committal.
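A minimal Python sketch of this commit step is given below. It assumes the DDG edges are held as per-transaction sets, and that a transaction which clean-read any entity modified by Ta receives a blocked edge to every entity modified by Ta. The dictionaries and function names are hypothetical.

```python
from typing import Dict, Set


def commit_with_blocked_edges(
    committing: str,
    writes: Dict[str, Set[str]],
    clean_reads: Dict[str, Set[str]],
    blocked: Dict[str, Set[str]],
) -> None:
    """Remove the committing transaction's edges, leaving blocked edges behind.

    writes[t]      entities modified by transaction t
    clean_reads[t] entities clean-read by transaction t
    blocked[t]     entities t may no longer access (updated in place)
    """
    modified = writes.pop(committing, set())
    clean_reads.pop(committing, None)
    blocked.pop(committing, None)      # blocked edges end with their transaction
    for other, read_set in clean_reads.items():
        if read_set & modified:        # other clean-read something now modified
            blocked.setdefault(other, set()).update(modified)


def access_allowed(txn: str, entity: str, blocked: Dict[str, Set[str]]) -> bool:
    """False means txn has seen an inconsistent view and must be aborted."""
    return entity not in blocked.get(txn, set())
```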

5.3.6 Transaction Rollback

Transaction rollback removes the DDG edges between the rolled back transaction and the entities it has modified.

Other transactions with dirty-read edges to entities modified by the rolled back transaction must also be rolled back as

they are dependent on the uncommitted state of the entity.

Figure 11 Transaction Rollback.

In Figure 11, if Tb rolls back, Ta must also roll back. On the other hand, if Ta were to roll back, Tb would not be required to roll back, as it is not dependent on the actions of Ta.
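The cascade can be sketched as a simple closure computation. The edge sets and the `rollback_set` name below are assumptions made for illustration, with Tb as the writer and Ta as the dirty reader as in Figure 11.

```python
from typing import Dict, Set


def rollback_set(
    start: str,
    writes: Dict[str, Set[str]],
    dirty_reads: Dict[str, Set[str]],
) -> Set[str]:
    """Return every transaction that must roll back when `start` rolls back."""
    doomed = {start}
    frontier = [start]
    while frontier:
        txn = frontier.pop()
        tainted = writes.get(txn, set())   # entities losing their uncommitted state
        for reader, read_set in dirty_reads.items():
            if reader not in doomed and read_set & tainted:
                doomed.add(reader)
                frontier.append(reader)
    return doomed
```

With `writes = {"Tb": {"E1", "E2"}}` and `dirty_reads = {"Ta": {"E1"}}`, rolling back Tb also dooms Ta, whereas rolling back Ta affects only Ta.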

6 Relationship to Non-transactional Processes

Conventional DBMSs protect shared data by ensuring every access to the data complies either explicitly or implicitly

with transactional requirements. By contrast, in a persistent system any process may access any data object for which

the process has appropriate access rights. Thus in a persistent environment, and in the absence of any further protection

mechanism, it is possible for a data object involved in a transaction to be accessed by processes not involved in that

transaction. This could lead to a loss of integrity for the transaction.

Two options may be considered for preserving transactional consistency in persistent systems:

1. Enforce transaction behaviour on access to all data objects in the persistent store. The persistent store is shared

between all processes and thus applying transactional behaviour to the whole store provides the necessary

consistency. Such an approach is regarded as inflexible as it restricts computation to the transaction model

only [1].

2. Provide support for coexistent transactional and non-transactional activity.

In the remainder of this section, the relationships between transactional and non-transactional activities in persistent

systems are discussed together with mechanisms for supporting their coexistence.

6.1 The Computational Space

A stable store may be viewed as a collection of objects acted on by a set of processes, as illustrated in Figure 12. As

processes interact with objects, dependencies form between processes and objects as described in [4]. Where stability is

implemented using incremental checkpointing, these dependencies are recorded and used to determine which data

objects are written to the durable store.


Figure 12 Dependencies between entities in a stable store.

Activity in a stable store may be regarded as transactional in the sense that modifications performed by a process (or a

set of dependent processes) between successive checkpoints are either all stabilised together, or none of them are.

However, stability checkpoints are typically system-initiated, whereas transaction commits are specified in user program code and must correspond to user-specified transaction boundaries (e.g. BEGIN-TRX and COMMIT-TRX).

Stability and transaction mechanisms differ in the way they manage the effects of concurrent activities. In a stable

store objects may become transitively dependent on other objects through the actions of multiple processes [4]. There is

no requirement to apply transactional semantics to such dependencies and in fact to do so would be overly restrictive

[1]. On the other hand, in a transactional environment, dependencies between entities are governed by rules of

transactional isolation. An attempt to execute a transaction that fails to comply with these isolation requirements results

in the transaction being aborted and its actions rolled back.

In summary, a transactional system must support, in addition to the features provided by stability mechanisms:

1. Response to transaction events such as BEGIN-TRX and COMMIT-TRX,

2. Atomicity and isolation, and

3. Flexibility to cope with runtime determination of the extent of transaction actions.

It has been asserted that requirements of transactional concurrency are at odds with the properties of orthogonal

persistence and that to achieve these goals requires separate persistent and non-persistent worlds [1].

The approach presented in this work resolves this problem by dynamically partitioning the stable store into two spaces:

• the stable store, where stability operations are managed by the operating system, and

• the transaction-managed store where the transaction manager manages stability operations.

This approach is illustrated in Figure 13. Using the view expressed in [1], the stable store can be seen to be under the

control of an all-encompassing “transaction”, beginning with the completion of one checkpoint and extending to the

next checkpoint. By default all data entities exist and execute within the stable store which manages all non-

transactional activity. On the execution of an event indicating transaction activity, responsibility for the stability of

entities involved in that transaction passes to the transaction manager. The transaction manager then assumes

responsibility for durability of entities modified by the transaction and also for transaction atomicity and concurrency

control thus ensuring the transaction conforms to the required ACID properties. In other words entities involved in

transactions are moved to the transaction-managed space for the duration of the transaction as illustrated in Figure 13.


Figure 13 Entity management.

A transaction continues until either:

• the event associated with a COMMIT-TRX instruction is executed, signalling the successful completion of the

transaction,

• the event associated with an ABORT-TRX instruction is executed signalling the unsuccessful completion of

the transaction, or

• an unexpected system shutdown occurs, which also results in the unsuccessful termination of the transaction.

After any of these actions or events, management reverts to the stable store. The programmer is unaware of the

movement of data between the management schemes in the same way that the programmer is unaware of the data

movement between the durable and computational store. In this way orthogonality of persistence is maintained and

durability of transactions is decoupled from durability provided by the stability scheme.
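The movement of entities between the two management schemes might be modelled as below. This is a simplified sketch: the `Store` class and its methods are hypothetical, and it treats every transactional access as moving the entity immediately, which ignores the deferred move described for dirty reads in Section 6.2.1.3.

```python
from enum import Enum
from typing import Dict, Set


class Space(Enum):
    STABILITY = "stability-managed"
    TRANSACTION = "transaction-managed"


class Store:
    """Tracks which transactions currently hold each entity."""

    def __init__(self) -> None:
        self.holders: Dict[str, Set[str]] = {}

    def space_of(self, entity: str) -> Space:
        # An entity is stability-managed by default.
        return Space.TRANSACTION if entity in self.holders else Space.STABILITY

    def access(self, txn: str, entity: str) -> None:
        # A transactional access passes responsibility to the transaction manager.
        self.holders.setdefault(entity, set()).add(txn)

    def end(self, txn: str) -> None:
        # COMMIT-TRX, ABORT-TRX or recovery after an unexpected shutdown.
        for entity in list(self.holders):
            self.holders[entity].discard(txn)
            if not self.holders[entity]:
                del self.holders[entity]   # management reverts to the stable store
```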

Store partitioning raises issues about managing interaction between transactional and non-transactional activities. These

issues are discussed in the following section.

6.2 Managing Transaction – Non-Transaction Interaction

This section considers the consequences of interactions between transactional and non-transactional processes. At the

end of a timeslice the activities executed by the process during that timeslice are recorded in the DDG and decisions are

made about the subsequent actions of any involved processes. In the case of a transactional process, the consequences

of those accesses are used to determine whether the transaction should continue or be aborted and its actions rolled

back.

There are two ways that transactional and non-transactional processes interact. Either:

• a transaction accesses a stability-managed entity, or

• a non-transactional process accesses a transaction-managed data entity.

In the discussion that follows, T represents a transactional process, N represents a non-transactional process, and E

represents an entity accessed by both T and N.

6.2.1 Transactional Access to Stability-Managed Entities

In this case a transactional process T reads or modifies an entity E that is represented as a node of a DDG in stability-

managed space. A stability-managed entity E is either:

1. an unmodified entity represented by a single node stability DDG, or

2. a modified entity represented by a node in a multi-node stability DDG. A data entity only exists in such a

DDG because that data entity has been modified since it was last checkpointed. (Note: clean-reads are not

recorded in stability DDGs.) Consequently any data node in a multi-node DDG must be connected by at least a

single write edge to another node in that DDG.

The possibilities for the access performed by the transaction are:


1. a transactional process reads an unmodified data entity,

2. a transactional process modifies an unmodified data entity,

3. a transactional process reads a modified data entity, or

4. a transactional process modifies a previously modified data entity.

These cases are discussed individually in Sections 6.2.1.1 – 6.2.1.4.

6.2.1.1 A Transactional Process Reads an Unmodified Entity

When a transactional process T reads an unmodified stability-managed entity E, a clean-read edge is added between T and E.

Figure 14 A transaction accesses an unmodified data entity.

As a consequence of this action the accessed entity is moved from the stability-managed space to the transaction-

managed space.

6.2.1.2 A Transactional Process Modifies an Unmodified Entity

When a transactional process T modifies a stability-managed unmodified entity E a write edge is added between T

and E.

Figure 15 A transaction modifies an unmodified data entity.

As a consequence of this action the accessed entity is moved from the stability-managed space to the transaction-managed space.

6.2.1.3 A Transaction Reads a Modified Entity

When a transactional process T reads a stability-managed modified entity E connected by a write edge to a non-transactional process N, a non-final dirty-read edge is added between the nodes representing T and E.

Figure 16 A transaction reads a modified data entity.

The consequences of allowing this edge to continue to exist are:

1. Further write operations by N on E do not alter the dependency of T on E. However, a further access to E by the transaction T after such an action would violate its isolation. To detect this situation, the first write operation after the transaction T has read E causes the dirty-read edge between T and E to be marked as a final dirty-read. The reason the dirty-read edge is not initially marked as final is to allow T to perform multiple reads on E.

Further writes by the same or different non-transactional processes are permitted. Any transaction that reads the modified entity is only ever permitted to read one modified version of the entity E. A transaction that reads more than one modification of the same entity is compromised because it has seen an inconsistent view of the data.

2. A COMMIT-TRX operation on T propagates to E causing it to be committed. This propagates to N and

possibly further as a checkpoint operation.

3. An ABORT-TRX operation on T does not propagate to E and consequently does not affect N.

4. A checkpoint operation on N propagates to E causing it to be checkpointed. The existing dirty-read edge

between T and E converts to a clean-read edge and E is moved into the transaction-managed space.


5. A rollback operation on N propagates to E causing it to be rolled back. This action propagates to T forcing T to

abort.

In this situation the integrity of the transaction is maintained by all of these actions.
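The propagation rules for this case can be summarised as a lookup table. The following Python sketch is only a restatement of the list above for the chain in which T holds a dirty-read edge to E and N holds a write edge to E; the `PROPAGATION` table and `affected` helper are hypothetical names.

```python
from typing import Dict, Set, Tuple

# (operation, node it is applied to) -> other nodes the operation reaches
PROPAGATION: Dict[Tuple[str, str], Set[str]] = {
    ("COMMIT-TRX", "T"): {"E", "N"},  # E is committed; N is checkpointed
    ("ABORT-TRX", "T"): set(),        # does not reach E, so N is unaffected
    ("CHECKPOINT", "N"): {"E"},       # T's edge to E becomes a clean-read edge
    ("ROLLBACK", "N"): {"E", "T"},    # E rolls back and T is forced to abort
}


def affected(operation: str, origin: str) -> Set[str]:
    """Return the origin together with every node the operation propagates to."""
    return {origin} | PROPAGATION[(operation, origin)]
```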

6.2.1.4 A Transaction Modifies an Already Modified Entity

When a transactional process T modifies an entity E that is connected by a write edge to a non-transactional process N a

write edge is inserted between the node representing T and the node representing E.

Figure 17 A transaction modifies an already modified data entity.

The consequences of allowing this edge to continue to exist are:

1. A COMMIT-TRX operation on T propagates to E causing it to be committed. This propagates to N and possibly further, causing them to checkpoint.

2. An ABORT-TRX operation on T also propagates to E and N causing them to roll back.

3. If a checkpoint operation on N were allowed to occur it would propagate to E and T. This is unacceptable, as it would effectively represent a premature commit on transaction T. To avoid a compromise of the transaction’s integrity either:

a. accept that this represents an unacceptable relationship requiring both T and N to roll back, or

b. modify the stability manager so that prior to any checkpoint of N the DDG is examined for dependencies that would propagate the checkpoint to T. If such propagation could occur then both N and T are rolled back. (This is an optimistic approach because it leaves the edge in place in the hope that the transaction commits before the non-transactional process checkpoints.)

4. A further write access by N would violate the isolation of T because following this action T would have an inconsistent view of the data. This would require T to abort, causing E to roll back. The action would propagate to N and possibly further, causing them to roll back also.

5. On the other hand, further write operations by T are acceptable.
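The check in option 3(b) amounts to a reachability test over the DDG before a checkpoint is taken. A minimal Python sketch follows, assuming the graph is available as a map from each node to the nodes a checkpoint would propagate to. The `neighbours` map and the function name are illustrative assumptions.

```python
from typing import Dict, Set


def checkpoint_reaches_transaction(
    start: str,
    neighbours: Dict[str, Set[str]],
    transactional: Set[str],
) -> bool:
    """Return True if a checkpoint begun at `start` would reach a transaction."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        if node in transactional:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

If the function returns True for N, the stability manager would roll back both N and T instead of checkpointing.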

6.2.2 Non-Transactional Access to a Transaction-Managed Object

In this case a non-transactional process N reads or modifies an entity E that is represented as a node in a multi-node

DDG in transaction-managed space. It is possible for the node representing E to have either a clean-read edge or a

write edge connecting it to other nodes in the DDG. Where there are dirty-read edges, there will always be a write edge

as well because a dirty-read edge can only exist if there is a write edge.

The possibilities for an access performed by a non-transactional process are:

1. The non-transactional process reads an unmodified data entity.

2. The non-transactional process mutates an unmodified data entity.

3. The non-transactional process reads a modified entity.

4. The non-transactional process mutates a previously modified entity.

These cases are discussed individually in Sections 6.2.2.1 – 6.2.2.4.

6.2.2.1 A Non-Transactional Process Reads a Transaction-Read Object

When a non-transactional process N reads an entity E connected by a read edge to a transactional process T, no edge is added between E and N because such an action does not form a dependency for stability-managed processes.

Figure 18 A non-transactional process reads a transaction-read entity.


6.2.2.2 A Non-Transactional Process Modifies a Transaction-Read Object

When a non-transactional process N modifies an entity E connected by a read edge to a transactional process T a write

edge is added between the nodes representing N and E. The clean-read edge is given “final” status because any further

access by T would compromise the transaction by giving T an inconsistent view of the data.

Figure 19 A non-transactional process modifies a transaction-read entity.

The consequences of allowing this edge to remain are:

1. A COMMIT-TRX operation on T does not propagate to E or N because the final clean-read edge between T

and E only represents the requirement for no further access by T.

2. An ABORT-TRX operation on T does not propagate to E or N for the same reason as (1).

3. A checkpoint operation on N propagates to E causing E to checkpoint. This does not cause T to checkpoint but

blocking edges would need to be added to T and any entity modified by N to ensure that T is not compromised

by an inconsistent view of data modified by N.

4. A rollback operation on N propagates to E causing E to roll back. This action does not propagate to T.

5. Further writes by N are permitted because these do not affect the transaction T’s view of the data.

6.2.2.3 A Non-Transactional Process Reads a Transaction Modified Entity

When a non-transactional process N reads an entity E connected by a write edge to a transactional process T a dirty-read

edge is added between the nodes representing E and N.

Figure 20 A non-transactional process reads a transaction-managed modified entity.

The consequences of allowing this edge to remain are:

1. A COMMIT-TRX operation on T propagates to E causing E to commit. This action does not propagate to N.

The dirty-read edge between N and E is removed.

2. An ABORT-TRX operation on T propagates to E causing E to roll back. This propagates to N and possibly

further causing them to also roll back.

3. If a checkpoint operation on N were allowed to occur it would propagate to E and T. This is unacceptable, as it would effectively represent a premature commit on transaction T. To avoid a compromise of the transaction’s integrity either:

a. modify the stability manager so that prior to any checkpoint of N the DDG is examined for dependencies that would propagate the checkpoint to T. If such propagation could occur then both N and T are rolled back, or

b. accept that the situation represents an unacceptable relationship requiring both T and N to roll back.

4. A rollback of N does not propagate to E and therefore would not affect T.

5. T may execute further writes on E without violating the isolation of T. In contrast to the situation in section

6.2.1.3 there is no requirement to limit repeated reads by N.

6.2.2.4 A Non-Transactional Process Modifies a Transaction-Modified Entity

When a non-transactional process N modifies an entity E connected by a write edge to a transactional process T a write

edge is added between the nodes representing E and N.

Figure 21 A non-transactional process modifies a transaction-modified entity.

This action compromises the isolation of transaction T for the same reason as two transactions are not allowed to write

to the same entity. The transaction T is aborted. This propagates to E and N, causing both to roll back.
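The four cases of Section 6.2.2 can be collected into a single dispatch table. The Python sketch below restates the edge handling described in Sections 6.2.2.1 to 6.2.2.4; the function name and the string labels are hypothetical.

```python
from typing import Dict, Tuple

# (access by N, existing edge between T and E) -> action on the DDG
ACTIONS: Dict[Tuple[str, str], str] = {
    ("read", "clean-read"): "no edge added",
    ("write", "clean-read"): "add write edge N-E; mark T's clean-read edge final",
    ("read", "write"): "add dirty-read edge N-E",
    ("write", "write"): "abort T; roll back E and N",
}


def non_transactional_access(access: str, t_edge: str) -> str:
    """Return the DDG action for an access by non-transactional process N."""
    return ACTIONS[(access, t_edge)]
```

Only the last case terminates the transaction at once; the other three leave an edge configuration whose later consequences are listed in the corresponding subsections.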


7 Distributed Transactions

Where a transaction is distributed over more than one host computer, the commit operation requires the services of a

coordinator. By default, the host where the BEGIN-TRX was executed acts as the coordinator. When a COMMIT-TRX action is executed, causing a DDG traversal, the coordinator asks each host whether it is able to commit. Each host responds with a message saying whether it can commit, and at the same time records its response. If the coordinator

receives positive responses from all hosts it sends out a message instructing all the hosts to commit their part of the

checkpoint.

A host may fail after sending a positive response, but before it receives or can act on the commit message from the

coordinator. On restart the host sends a message to the coordinator saying that it gave a positive response to the

coordinator, but failed before it could act on the commit. The coordinator responds with the message that either yes the

transaction was committed, or no, the transaction was not committed. If the coordinator responded yes, the transaction

is committed.
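The exchange between coordinator and hosts, including the restart query, can be sketched in Python as follows. This is an in-memory illustration only: the `Coordinator` and `Host` classes, their method names, and the `crash_before_finish` flag used to simulate a failure are all assumptions, and no real messaging or durable logging is modelled.

```python
from typing import Dict, List, Set


class Host:
    def __init__(self, can_commit: bool = True) -> None:
        self.can_commit = can_commit
        self.voted_yes: Set[str] = set()    # recorded positive responses
        self.committed: Set[str] = set()
        self.crash_before_finish = False    # simulates failure after voting

    def prepare(self, txn: str) -> bool:
        if self.can_commit:
            self.voted_yes.add(txn)         # record the response when replying
        return self.can_commit

    def finish(self, txn: str) -> None:
        if self.crash_before_finish:
            return                          # commit message never acted on
        self.committed.add(txn)

    def recover(self, coordinator: "Coordinator") -> None:
        # On restart, ask about every transaction voted for but not committed.
        for txn in self.voted_yes - self.committed:
            if coordinator.query(txn):
                self.committed.add(txn)


class Coordinator:
    def __init__(self) -> None:
        self.outcome: Dict[str, bool] = {}

    def commit(self, txn: str, hosts: List[Host]) -> bool:
        votes = [h.prepare(txn) for h in hosts]
        decision = all(votes)
        self.outcome[txn] = decision        # remembered for later queries
        if decision:
            for h in hosts:
                h.finish(txn)
        return decision

    def query(self, txn: str) -> bool:
        return self.outcome.get(txn, False)
```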

8 Conclusion

The techniques described in this paper extend Jalili’s DDG-based stability scheme to provide support

for transaction-based concurrency control in persistent systems. These extensions support separate management for

concurrent transactional and non-transactional activities that co-exist in a persistent store. Interactions between these

kinds of activities are managed to ensure that transactions are aborted if they are compromised by either transactional or

non-transactional activity. As reported elsewhere this is achieved while maintaining performance consistent with that

provided by conventional bulk data management systems.

9 References

[1] Blackburn, S.M., Zigman, J.N., Concurrency — The fly in the ointment? in Proc., The Third International

Workshop on Persistence and Java, Tiburon, CA, USA, Morgan Kaufmann, pp 250 - 258, 1999.

[2] Gray, J., and Reuter, A., Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers,

San Mateo, CA, ISBN: 1-558-60190-2, 1993.

[3] Henskens, F.A., Koch, D.M., Jalili, R., Rosenberg, J., Hardware Support for Stability in a Persistent

Architecture in Proc., The Sixth International Workshop on Persistent Operating Systems, Tarascon, France,

Springer-Verlag and British Computing Society, pp 387-399, 1994.

[4] Jalili, R., A Failure Transparent Distributed Persistent Store, PhD Thesis, Basser Department of Computer

Science, University of Sydney, 1995.

[5] Kung, H.T. and Robinson, J.T., On Optimistic Methods for Concurrency Control, ACM Transactions on

Database Systems, vol 6(2), pp. 213 - 226, 1981.

[6] Lindström, A.G., User-level Memory Management and Kernel Persistence in the Grasshopper Operating

System, PhD Thesis, Basser Department of Computer Science, University of Sydney, 1996.

[7] Microsoft, Microsoft Locking Strategy: Dynamic Locking Initiative,

www.support.microsoft.com/support/sql/content/sql65/sqllock.asp, 2000.

[8] Oracle, Oracle Rdb7 SQL Reference Manual Volume 1 Part No A42814-1, Oracle Corporation, 1996.

[9] Rosenberg, J. and Henskens, F., Stability in a Persistent Store Based on a Large Virtual Memory in Proc.,

International Workshop on Computer Architectures to Support Security and Persistence of Information,

Bremen, Germany, Springer-Verlag, pp 229-245, 1990.

[10] Rosenberg, J., Koch, D.M., Keedy, J.L., A Massive Memory Supercomputer in Proc., 22nd Annual Hawaii

International Conference on System Sciences, Hawaii, pp 338-345, 1989.

[11] Vaughan, F., Basso, T.L., Dearle, A., Marlin, C., Barter, C., Casper: a Cached Architecture Supporting

Persistence, Computing Systems, vol 5(3), pp. 337 - 359, 1992.