Directed Dependency Graph based Concurrency Control for Persistent Systems
Maurice G. Ashton & Frans A. Henskens1
Discipline of Computer Science & Software Engineering, School of Electrical Engineering & Computer Science
University of Newcastle, N.S.W. 2308, Australia
1 Overview
Persistent stores abstract over all aspects of storage including the distinction between primary and secondary storage.
One consequence of this abstraction is that the store appears to the user to be free from failure. Since computers are
vulnerable to failure, persistent systems typically provide mechanisms to support this appearance of a failure-free store.
These allow the system to recover automatically from store failure to a self-consistent state, exhibiting a property called
stability.
A persistent store is said to be stable if it automatically recovers to a consistent state after a failure that has prevented
orderly system shutdown. Stability in persistent stores is typically provided using operations called checkpoints that
flush all modified data currently held in main memory to disk, and atomically create a snapshot of the store at that
moment. Early stability schemes, for example [9], checkpointed the entire store at once, requiring processing on the
store to cease during a checkpoint operation. In a multi-user store involving multiple nodes this would result in
unacceptable degradation of performance. Accordingly, systems have been developed which checkpoint parts of the
store independently (for example [11]). The stable state of such a store is the collection of these stable parts.
Checkpointing parts of the store independently, however, creates the possibility of logical inconsistencies between data
objects because the state of modified data in one object may influence the way a process modifies data in some other
object. As a result these objects have a dependency relationship that must be considered when checkpointing either of
them. Such dependencies have been described using associations in Casper [11], and more recently using directed
graphs of nodes representing entities [4].
2 DDG-based Stability
These directed graphs, termed Directed Dependency Graphs (DDGs), are maintained by the operating system as follows:
Symbol    Description
E1, E2    Entities, either data objects or processes.
P, O      Process and object respectively.
→         An edge in a DDG representing a dependency between two entities, e.g. E1 → E2 means
          that E1 depends on E2.
↔         An edge in a DDG representing bidirectional dependency between two entities. E1 ↔ E2
          means that E1 depends on E2 and E2 depends on E1.
Table 1 DDG Notation.
• When a process reads an unmodified object, no edge is added to the DDG.
• When a process P reads a modified object O, the edge P → O is added, if it does not already exist, to the
DDG(s) including P and O.
• When a process P modifies an object O, the edge P ↔ O is added, if it does not already exist, to the DDG(s)
including P and O.
1 Corresponding author: [email protected]
• → is transitive, but not symmetric, i.e. if (E1 → E2) and (E2 → E3), then it is implied that (E1 → E3), but
E1 → E2 does not imply E2 → E1.
• The right hand side of a dependency relation may depend on the left hand side through transitivity (for instance
where E1 → E2, E2 → E3 and E3 → E1 it follows that E2 → E1).
• When a process belonging to a DDG reads a modified object or modifies an object that belongs to another
DDG, the two DDGs are merged using one of the above edges to create a single larger graph.
• As shown in Table 1, an edge → represents both a dependency S (i.e. in terms of checkpoint or stabilise
dependency) and a dependency R (i.e. in terms of rollback dependency). Thus E1 → E2 implies that checkpoint
of E1 propagates to E2 (but checkpoint of E2 does not propagate to E1) and that rollback of E2 propagates to E1
(but rollback of E1 does not propagate to E2). A consequence of this is that if E1 ↔ E2, checkpoint and rollback
of either entity propagates to the other.
• A DDG shrinks when a set of dependent entities is checkpointed or reverts to its last stable state (rolls back).
Once a checkpoint or rollback operation is initiated for an entity E, the operation propagates to each entity that
is reachable from E in the DDG to which E belongs. Then, because each involved entity is now stable, all
edges attached to them are removed.
• At any instant each entity belongs to one and only one dependency graph. To find the set of entities dependent
on any entity, it is sufficient to find the entity in its graph and then, subject to the kind of operation, traverse
the directed graph starting from the entity. Thus the set of dependent entities may differ for entities in the
same DDG.
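The maintenance rules above can be illustrated with a minimal Python sketch. It is not the implementation of [4]: the class name DDG and the methods read_modified, modify, stabilise_set, rollback_set and settle are hypothetical names chosen for illustration, and entities are represented by plain strings.

```python
from collections import defaultdict


class DDG:
    """Toy directed dependency graph: an edge (a, b) records that a depends on b."""

    def __init__(self):
        self.deps = defaultdict(set)   # a -> entities that a depends on
        self.rdeps = defaultdict(set)  # b -> entities that depend on b

    def add_edge(self, a, b):
        self.deps[a].add(b)
        self.rdeps[b].add(a)

    def read_modified(self, process, obj):
        """Process reads a modified object: unidirectional edge P -> O."""
        self.add_edge(process, obj)

    def modify(self, process, obj):
        """Process modifies an object: bidirectional edge P <-> O."""
        self.add_edge(process, obj)
        self.add_edge(obj, process)

    @staticmethod
    def _reachable(start, neighbours):
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbours.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def stabilise_set(self, entity):
        """Checkpoint propagates along edges: E1 -> E2 drags E2 along with E1."""
        return self._reachable(entity, self.deps)

    def rollback_set(self, entity):
        """Rollback propagates against edges: E1 -> E2 drags E1 along with E2."""
        return self._reachable(entity, self.rdeps)

    def settle(self, entities):
        """Remove every edge attached to entities that have become stable."""
        for e in entities:
            for b in self.deps.pop(e, set()):
                self.rdeps[b].discard(e)
            for a in self.rdeps.pop(e, set()):
                self.deps[a].discard(e)


g = DDG()
g.modify("P1", "O1")         # P1 <-> O1
g.read_modified("P2", "O1")  # P2 -> O1
```

In this example a checkpoint of P1 involves only P1 and O1, whereas a rollback of P1 also involves P2, which has read the modified object. This is the sense in which the set of dependent entities differs with the kind of operation and the starting entity.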
[Table 2 body not reproduced: columns are Dependency Graph, Stabilising Graph (S edges) and Rollback Graph (R edges).]
Table 2 The relationship between edges in DDGs, stabilise graphs and rollback graphs [4].
With appropriate hardware support, it is possible to lazily construct DDGs by updating them to record dependency data
at the completion of each process time slice [4]. This assumes that dependency is recorded at the virtual page rather
than the individual object level, thus utilising and extending hardware that is typically already present to support virtual
memory management. Conventional virtual memory management requires the presence of status data indicating
whether the content of an in-memory page has been modified since the page was loaded (i.e. whether the page is dirty),
allowing the system to determine whether the page must be flushed to disk before the page frame it occupies can be re-
used. In order to efficiently support stabilise and rollback operations, it is necessary to distinguish between in-memory
pages that are unstable and unflushed (DIRTY) and unstable but flushed (MODIFIED). A page would have
MODIFIED state if, for instance, as part of virtual memory management it had been loaded, modified, flushed and then
reloaded.
Pages may remain in main memory for a period encompassing many process activations. The M_ACCESSED status
data allows detection of process access to modified object data during the process' current time-slice. This status data is
set for a page if the page is accessed while the MODIFIED status for the page is set. Dependencies between a process
and the pages containing modified objects (pages with the M_ACCESSED status set) are represented by the addition of
appropriate edges to the dependency graph at the conclusion of the process' period of activation. All
M_ACCESSED status data must be cleared at the commencement of a process time-slice. This may be achieved in a
single operation using appropriate hardware.
The inclusion of WRITTEN status data allows detection of page data modifications made by the current process. This
data is distinct from the MODIFIED status described previously because it describes the modification behaviour of the
current process rather than the status of the virtual page. The WRITTEN status is set together with the MODIFIED and
DIRTY status, but is cleared as part of the dependency graph update at the conclusion of the process time-slice. In
contrast the MODIFIED status is cleared at the next object checkpoint and the DIRTY status is cleared when the page is
flushed to disk. Pages with the WRITTEN status set cause the inclusion of an appropriate dependency graph edge.
Operation of the described status data is shown in Table 3.
Operation                      Status Data
                               DIRTY      MODIFIED   M_ACCESSED        WRITTEN
Unmodified page retrieved      Cleared    Cleared    Cleared           Cleared
Modified page retrieved        Cleared    Set        Cleared           Cleared
Process reads data from page   Unchanged  Unchanged  Copy of MODIFIED  Unchanged
Process writes to page         Set        Set        Set               Set
End of process time-slice      Unchanged  Unchanged  Cleared           Cleared
Page flushed                   Cleared    Unchanged  Unchanged         Unchanged
Object checkpoint              Cleared    Cleared    Unchanged         Unchanged
Table 3 Effect of operations on page status data [4].
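The transitions of Table 3 can be read as a small state machine over four flags. The following Python sketch models them in software purely for illustration; in the scheme described above the status data is maintained with hardware support, and the class and method names (PageStatus, retrieve, end_of_time_slice and so on) are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PageStatus:
    dirty: bool = False       # unstable and not yet flushed
    modified: bool = False    # unstable (changed since the last checkpoint)
    m_accessed: bool = False  # modified data accessed in the current time-slice
    written: bool = False     # written by the current process in this time-slice

    def retrieve(self, modified_on_disk):
        """Page loaded into memory; MODIFIED is set only for a modified page."""
        self.dirty = False
        self.modified = modified_on_disk
        self.m_accessed = False
        self.written = False

    def read(self):
        """Process reads data: M_ACCESSED takes a copy of MODIFIED."""
        self.m_accessed = self.modified

    def write(self):
        """Process writes to the page: all four flags are set."""
        self.dirty = self.modified = self.m_accessed = self.written = True

    def end_of_time_slice(self):
        """Report (M_ACCESSED, WRITTEN) for the DDG update, then clear both."""
        flags = (self.m_accessed, self.written)
        self.m_accessed = self.written = False
        return flags

    def flush(self):
        """Page flushed to disk: only DIRTY is cleared."""
        self.dirty = False

    def checkpoint(self):
        """Object checkpoint: DIRTY and MODIFIED are cleared."""
        self.dirty = False
        self.modified = False
```

The pair returned by end_of_time_slice is what a DDG update would inspect: a set M_ACCESSED flag calls for a dirty-read dependency and a set WRITTEN flag for a write dependency.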
Ideally dependencies would be recorded on a per-object basis as this reflects the “natural” granularity of data. Object-based
dependency recording has been proposed for Monads-MM [10]. However this requires more complex data
structures and adds further overhead to the work of the ATU.
3 Concurrency Control
Most descriptions of concurrency control concentrate on the flat transaction model used for database systems. This
model represents one extreme of concurrency models ranging from isolation to cooperation. The transactional
concurrency control model enforces isolation and hides concurrency from the user. At the other extreme, concurrency
is achieved by cooperation between users. It is not clear which model of concurrency is most suited to persistent
systems. Some researchers [3] regard the cooperative model as the most appropriate while others [6] prefer to offer a
choice of models.
Transactional concurrency control techniques ensure that a set of concurrent transactions produce the same results as if
they had been executed serially, and may be broadly categorised as optimistic or pessimistic. Pessimistic schemes
typically use locks to prevent other concurrent transactions from accessing objects that are being used by the locking
transaction. Optimistic schemes proceed without locking but examine the transaction before it is committed to
determine its isolation, leading to a decision to commit or abort. Both approaches have advantages and disadvantages:
pessimistic methods achieve concurrency at the cost of wasted time, while optimistic methods achieve concurrency at
the cost of wasted work [2].
3.1 Using DDGs to support concurrency control
The stability mechanism described in [4] uses directed graphs to record dirty-read and write dependencies between
processes and entities. This information, recorded in directed dependency graphs (DDGs), is used to reduce the
‘domino effect’ of checkpoint and rollback operations found in other stability schemes.
It was recognised that this information formed part of that required for determining transaction isolation. This
observation led to the development of the new concurrency control technique called DDG Concurrency Control (DCC)
described in this paper.
3.2 Adapting Stability DDGs for Concurrency Control
Four issues leading to differences in data required to support stability and concurrency control schemes were identified:
1. The DDG stability technique maintains dependency information on a per process basis. On the other hand
concurrency control requires dependency information on a per transaction basis. Transactions use processes as
agents to carry out their operations.
2. The DDG stability technique records information about dirty-read and write accesses. Clean-read accesses
must also be recorded to support transaction isolation.
3. Stability checkpoints and transaction commits have different semantics.
4. The stability mechanism uses physical page granularity for recording dependencies.
The implications of these are discussed below.
3.2.1 The Relationship between Transactions and Processes
A transaction is an abstract concept that includes the user-defined boundaries, the required data resources, and accesses
(that may include mutation) to that data. Processes are entities that execute the activities specified by transactions. A
transaction may use a single process, a group of processes, or share a process (or processes) with other transactions to
execute its activities. A process executing on behalf of many transactions executes instructions for only one transaction
in any given timeslice [2].
Because of this relationship between transactions and processes, dependencies created by process activity are
appropriately viewed as existing between transactions and other entities rather than between each process and other
entities.
3.2.2 Dependencies
The DDGs used for supporting stability record edges for:
• Dirty-read accesses occurring when a process reads from an entity that has been modified since the last
checkpoint.
• Write accesses occurring when a process modifies an entity.
Clean-read accesses are not recorded in stability DDGs because these accesses do not produce dependencies that are of
consequence to stability operations.
For a transaction to be considered isolated it must be guaranteed that the transaction has not seen an inconsistent state of
the data. Consider two concurrent transactions Ta and Tb. Transaction Tb modifies a set of entities E. Transaction Ta
reads some of these entities before transaction Tb has modified them and some of them after. This represents an
inconsistent view of the data for Ta, compromising its isolation. Such a transaction must be aborted.
If, during the time that transaction Ta is in progress, some other transaction Tb modifies a set E of one or more entities,
then Ta is not compromised by Tb provided that Ta’s reads from E either all occur before Tb’s writes or all occur
after Tb’s writes.
Thus for DDGs to be used for determining transaction isolation it is necessary to record information about transactions’
clean-read, dirty-read, and write accesses.
3.2.3 Stability Checkpoints and Transaction Commits
Stability mechanisms provide the following transaction-related properties:
1. The abstraction of a stable computational store.
2. A logically consistent store restart state at all times.
3. Concurrency control at process level.
Full transaction support requires these properties to be augmented as follows:
1. Support for transaction-based events associated with programming language keywords such as
BEGIN-TRX and COMMIT-TRX, used to define the extent of each transaction.
2. The extent described in (1) defines an atomic unit of work performed on the store that is isolated from any
other concurrent activity.
3. The means for managing concurrency must be flexible enough to cope with run-time determination of the
temporal extent and physical granularity of interaction.
One consequence of these requirements is that the transaction management system must have control over the timing of
checkpoints that correspond to COMMIT-TRX operations.
3.2.4 Granularity Considerations
Granularity of concurrency control for databases may be applied at field, record, record range, table, and database
levels. Finer granularity offers the opportunity for more concurrent activity but at the cost of extra overhead, while
coarse-grained control requires less overhead but reduces the opportunities for concurrent activity. Some commercial
systems also support page locking for concurrency control. For example Oracle [8] and Microsoft
SQL Server [7] offer optimistic concurrency control at page level and pessimistic concurrency control at record level.
Oracle documentation [8] suggests using page locking where transactions are short and several records from the same
page are accessed at a time. On the other hand record locking should be used where transactions are long and access
many records in a table.
Typically a page may hold several data objects. Consider the situation where two objects A and B reside in the same
page and are modified separately by transactions Ta and Tb respectively. Ideally these two transactions should be
regarded as isolated, but because detection occurs at page rather than object level, they are considered to be dependent
on each other. For example, in the absence of extended inter-dependency rules, if transaction Ta committed using a
page stabilise operation, the modifications performed by transaction Tb on B would also be stabilised even though Tb
may not be ready to commit.
The DDG stability mechanism [4] on which DCC is based records page-level access information. A consequence of
extending this technique to support concurrency control is that, in the absence of any other mechanism, it limits
available granularity of concurrency control to the page level.
4 A Description using Set Notation
The set of entities accessed by a transaction T may be divided into three subsets defined as follows:
1. The clean-read set (CR): comprises entities that were in an unmodified state when read by transaction T.
2. The dirty-read set (DR): comprises entities that had been modified by some other as-yet uncommitted
transaction before being read by transaction T.
3. The write set (W): comprises entities that have been modified by transaction T.
The isolation of a transaction Ta may be determined by examining the intersection of Ta’s sets with those of other
transactions.
• In section 4.1, the state of isolation of a transaction Ta is examined after the transaction has completed all its
access operations and is about to commit.
• Section 4.2 describes those situations where the isolation of a transaction Ta has been compromised. If it is
determined that a transaction is not isolated before it has completed its operations, there is no point in
continuing the transaction; it should be aborted and its completed operations rolled-back.
4.1 Transaction Isolation Prior to Commit
In the following discussion the isolation of a transaction Ta is determined after it has completed all its operations and is
about to commit. To establish the isolation of transaction Ta, it is necessary to consider the intersections of Ta’s access
sets with the write sets of all overlapping (in the sense of transaction start and finish times) transactions Tb [5]. There
are three situations to consider:
1. Ta and Tb are mutually exclusive.
Ta has not accessed any entity modified by Tb. This is described symbolically as follows:
$DR_a \cap W_b = \emptyset$ and $W_b \cap (CR_a \cup W_a) = \emptyset$
Ta is isolated from Tb and may be committed.
2. Ta’s clean-read set intersects with Tb’s write set.
Symbolically this is described as:
$CR_a \cap W_b \neq \emptyset$ and $W_b \cap (DR_a \cup W_a) = \emptyset$
Ta is isolated from Tb where, if Ta has clean-read entities subsequently modified by Tb, then Ta has not dirty-
read or written to entities modified by Tb. In this situation Ta is logically isolated from Tb and may be
committed.
3. Ta’s dirty-read set intersects with Tb’s write set.
Symbolically this may be described as
$DR_a \cap W_b \neq \emptyset$ and $W_b \cap (CR_a \cup W_a) = \emptyset$
Ta has seen a consistent view of the entities modified by Tb but it is not isolated from Tb. If Tb commits and its
write set Wb is stabilised, then Ta becomes isolated from Tb and is thus able to commit. If Tb cannot commit
and is rolled-back (see section 5.3.4) then Ta must also be rolled-back. Where there are multiple reads by Ta of
dirty data interleaved with writes by Tb, Ta must be rolled back as Ta has seen an inconsistent view of the
entities modified by Tb. This is covered in more detail in section 5.3.2.
4.2 Compromised Transactions
1. Figure 1 illustrates the situation where transaction Ta has an inconsistent view of the set of entities modified by
Tb. Since decisions made by Ta based on this view are potentially inconsistent, the transaction must be aborted.
This consideration applies even if Tb successfully commits. (See further discussion in section 5.3.5.)
Figure 1 Compromised read-sets.
2. In Figure 2 either Ta or Tb has modified at least one entity that the other transaction has modified but not
committed, thus violating isolation. Since Ta is not isolated it cannot commit (neither can Tb commit).
Figure 2 Compromised write-sets.
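The cases of sections 4.1 and 4.2 can be summarised as a small decision function over the access sets. The sketch below is illustrative only: the function name classify and the returned labels are assumptions, the sets are plain Python sets of entity names, and the case of multiple dirty reads interleaved with writes (section 5.3.2) cannot be detected from the sets alone and is not modelled.

```python
def classify(cr_a, dr_a, w_a, w_b):
    """Classify Ta against one overlapping transaction Tb from their access sets.

    cr_a, dr_a, w_a: clean-read, dirty-read and write sets of Ta.
    w_b: write set of Tb.
    """
    clean = cr_a & w_b       # entities Ta read before Tb modified them
    dirty = dr_a & w_b       # entities Ta read after Tb modified them
    both_wrote = w_a & w_b   # entities modified by both transactions

    if both_wrote:
        return "compromised: overlapping write sets"   # Figure 2
    if clean and dirty:
        return "compromised: inconsistent reads"       # Figure 1
    if dirty:
        return "dependent: wait for Tb"                # situation 3
    if clean:
        return "isolated: clean reads only"            # situation 2
    return "isolated: disjoint"                        # situation 1
```

For example, classify({"A"}, {"B"}, set(), {"A", "B"}) reports inconsistent reads, because Ta has seen A before and B after the modifications made by Tb.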
4.3 Summary
This discussion has shown that a transaction’s state of isolation can be determined providing information is kept about
the clean-read, dirty-read (including multiple dirty reads as discussed in section 5.3.2) and write sets for each
transaction. Such information is collected and stored in DDGs at the end of each process context switch. The details of
this mechanism are described in the next section.
5 Operations on the Directed Dependency Graph
5.1 Access Representation
The DDG concurrency control mechanism creates edges between transactions and accessed entities as follows:
1. A clean-read edge is recorded as “—”. T — E indicates that transaction T has read an unmodified entity E.
2. A dirty-read edge is recorded as “→”. T → E records that transaction T has read an entity E that had been
modified since its most recent checkpoint.
3. A write edge is recorded as “↔”. T ↔ E indicates that transaction T has modified entity E since it was last
checkpointed.
Transaction concurrency control is incorporated into the existing stability system as follows:
• At the commencement of a transaction, the initiating process must exist in a single-node DDG. If that is not
the case, the process must initiate a stabilise operation, with isolation being the consequence. The process is
then part of a DDG associated by the system with the fledgling transaction.
• As the process (and any parallel processes incorporated in the transaction) interacts with entities in the store,
↔, → and — edges are used to incorporate the entities into the transaction DDG. Construction of the graph is
achieved lazily on process switch using access data collected as described above during each process time
quantum.
• Edges have a precedence order —, →, ↔ with the rule that insertion of an edge to the right in this order will
replace an edge to the left. An edge to the right will not be replaced by an edge to the left; indeed an edge to
the left will not be inserted if it occurs after an edge to the right.
• If there are no existing edges between any transaction node and the accessed entity node, the appropriate edge
is added and the entity belongs to (and becomes a node in) the same DDG as the transaction.
• If all prior edge(s) between other transaction nodes and the node representing the accessed entity are to nodes
in the same DDG as the process, the appropriate edge is inserted subject to the precedence rule.
• If one or more edges exist between other transaction nodes and the node representing the entity, the system
either inserts the appropriate edge or causes the transaction abort (rollback) operation(s) as described below.
• During each transaction DDG update, the system analyses any graph merge operations and determines whether
the merge causes a violation of transaction isolation and whether any transaction must be aborted as a result.
• A transaction that completes, i.e. whose DDG could be constructed without a need for transaction rollback,
commits by stabilising its transaction DDG.
• A transaction that aborts has its transaction DDG rolled back.
In the following discussion of the effect of DDG edge insertion, a transaction Ta accesses an entity En, creating a new
edge. Transaction Tb is another transaction that has already accessed En. Decisions on the validity of Ta's edge-
producing access are made by considering the edge to be inserted Da with respect to each individual existing edge
between En and each other concurrent transaction Tb, as follows (this discussion assumes that the system has already
determined that there is no existing edge of higher or equal priority to Da between En and Ta):
5.2 Edge Insertion Rules
1. If there is no edge between any Tb and En the new edge is inserted. It must be either a clean read or a write
edge.
Figure 3 Addition of a new edge (may be either read or write).
2. If there is an existing — edge between Tb and En and the access by Ta is a read, a new — edge is inserted
between the node representing En and the node representing Ta.
Figure 4 Adding a read edge when there is an existing read edge.
3. If there is an existing — edge between Tb and En and the access by Ta is a write, a new write (↔) edge is inserted
between the node representing En and the node representing Ta. At the same time it is necessary to check whether
Tb has dirty-read another entity that has been modified by Ta. If it has, then Tb is forced to abort
(rollback) since Tb has seen an inconsistent view of the database.
Figure 5 Adding a write to an existing read.
4. If there is an existing write (↔) edge between Tb and En and the access by Ta is a read, a new dirty-read (→) edge
is inserted between the node representing En and the node representing Ta. At the same time it is necessary to
check whether Ta has previously clean-read another entity that has been modified by Tb. If it has, then Ta is
forced to abort (rollback) since it is no longer isolated from Tb. If not, Ta is allowed to continue optimistically in
the hope that Tb commits before Ta, thus rendering Ta isolated.
Figure 6 Adding a read to an existing write.
5. If there is an existing write (↔) edge between Tb and En and the access by Ta is a write, a write (↔) edge is added
between the node representing Ta and En as illustrated in Figure 7. Both transactions must be aborted (rolled back).
Figure 7 Adding a write to a write.
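The five rules may be sketched as a single insertion routine. This is a simplified illustration under stated assumptions rather than the DCC implementation: the DDG is reduced to a dictionary mapping (transaction, entity) pairs to an edge kind, the names CLEAN, DIRTY, WRITE, entities_with and insert_edge are hypothetical, and the final dirty-read and blocked edges of section 5.3 are not modelled.

```python
CLEAN, DIRTY, WRITE = 0, 1, 2  # precedence: a higher kind replaces a lower one


def entities_with(edges, txn, kind):
    """Entities to which txn currently holds an edge of the given kind."""
    return {e for (t, e), k in edges.items() if t == txn and k == kind}


def insert_edge(edges, ta, en, is_write):
    """Record an access by ta to en; return the set of transactions to abort."""
    others = {t: k for (t, e), k in edges.items() if e == en and t != ta}
    writers = {t for t, k in others.items() if k == WRITE}
    aborts = set()

    if is_write:
        new_kind = WRITE
        if writers:                       # rule 5: write on top of a write
            aborts |= writers | {ta}
        for tb, kind in others.items():   # rule 3: write on top of a clean read
            dirty_reads_of_ta = entities_with(edges, tb, DIRTY) & entities_with(edges, ta, WRITE)
            if kind == CLEAN and dirty_reads_of_ta:
                aborts.add(tb)
    else:
        # rules 1 and 2 give a clean read, rule 4 a dirty read
        new_kind = DIRTY if writers else CLEAN
        for tb in writers:                # rule 4: read on top of a write
            if entities_with(edges, ta, CLEAN) & entities_with(edges, tb, WRITE):
                aborts.add(ta)

    # precedence rule: never replace an existing edge by a lower-ranked one
    if edges.get((ta, en), -1) < new_kind:
        edges[(ta, en)] = new_kind
    return aborts
```

As an example, if Ta clean-reads E1, Tb then writes E1 and E2, and Ta finally reads E2, the last call returns {"Ta"}: Ta has seen E1 before and E2 after the modifications made by Tb.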
5.3 Special Situations
5.3.1 Cycle Formation
A cycle may develop between two or more transactions where transactions with dirty reads are waiting for writing
transactions to commit as illustrated in Figure 8. This leads to a deadlock situation, as the transactions in the cycle
can never be committed. The DDG management software must detect the formation of cycles and abort sufficient
transactions to break the deadlock. There is the potential to form very large cycles with a consequent loss of
performance caused by the rollback. In the simulation experiments described in the next chapter, cycle formation
was monitored. The largest observed cycle involved only two transactions.
Figure 8 A cycle of dirty-read/write dependencies.
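Cycle detection of this kind can be sketched as a depth-first search over a waits-for relation derived from the edges: a transaction with a dirty-read edge to an entity waits for the transaction holding the write edge to that entity. The representation of edges as a dictionary and the function names waits_for and find_cycle are assumptions made for illustration; the paper does not prescribe a particular detection algorithm.

```python
def waits_for(edges):
    """Build {reader: {writers it waits for}} from (txn, entity) -> kind edges."""
    writers = {e: t for (t, e), kind in edges.items() if kind == "write"}
    graph = {}
    for (t, e), kind in edges.items():
        if kind == "dirty-read" and e in writers and writers[e] != t:
            graph.setdefault(t, set()).add(writers[e])
    return graph


def find_cycle(graph):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:              # back edge: nxt is on the current path
                return path[path.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Once a cycle is reported, the DDG manager would choose one or more of its members to abort; the choice of victim is a policy decision outside this sketch.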
5.3.2 Repeated Writes
When a transaction Ta has added a dirty-read edge to an entity, the writing transaction Tb may perform a further
write to the same entity. (It is assumed that a transaction is performing operations that are internally consistent.)
While that does not add another write edge between Tb and En, further access by Ta must be detected if it occurs.
Such an action represents a violation of isolation and Ta must be rolled back. To enable detection of this situation,
such writes by Tb cause the dirty-read edge from Ta to be marked as a final dirty-read as shown in Figure 9. This is
not regarded as a new kind of dependency, but rather as an indicator that any further access by Ta violates its
isolation.
Figure 9 Adding a final dirty-read edge.
5.3.3 Transaction Commit
When a transaction commits, one consequence is that the edges associated with the transaction must be removed from
the DDG. There are two special cases to consider.
5.3.4 Dirty-read Dependency
When a transaction has completed all its accesses, it is necessary to determine whether the transaction is isolated. The
situation may occur where the transaction has one or more dirty-read dependencies. Such a transaction cannot be
committed because it depends on uncommitted modifications by other transactions. There are two possible policies that
may be used:
1. Wait until the other transaction(s) have committed. This policy leaves the waiting transaction open to the
compromise of its isolation by the actions of other transactions and the possibility that it may later need to be
rolled back.
2. Immediately roll back the transaction. This policy is pessimistic in the sense that it precludes the possibility
that the writing transaction eventually commits, rendering the waiting transaction isolated.
It may be that runtime monitoring indicates better throughput for a particular policy in some situations.
5.3.5 Clean-reads by Other Transactions
The second situation requiring special treatment is where the committing transaction Ta has modified entities that have
been clean-read earlier by another still-current transaction Tb. Such an interaction does not compromise the isolation of
Ta. However, if Tb accesses any entity modified by Ta, even after Ta has committed, then Tb has seen an inconsistent
view of the database and must be aborted. Under normal circumstances when Ta is committed, its edges are removed
from the DDG thus removing information about which entities Ta has modified. Tb could then access one of the
modified entities without any indication that it had viewed a potentially inconsistent state of the database.
This inconsistent access may be detected by adding a blocked edge between Tb and En where En has been
modified by Ta as shown in Figure 10. A blocked edge is not a dependency edge in the same sense as a dirty-read or a
write edge, and does not need to be considered in interactions with other transactions. These blocked edges must persist
until Tb is committed or rolled back.
Figure 10 Detecting inconsistent reads after transaction committal.
5.3.6 Transaction Rollback
Transaction rollback removes the DDG edges between the rolled back transaction and the entities it has modified.
Other transactions with dirty-read edges to entities modified by the rolled back transaction must also be rolled back as
they are dependent on the uncommitted state of the entity.
Figure 11 Transaction Rollback.
In Figure 11, if Tb rolls back, Ta must also roll back. On the other hand, if Ta were to roll back, Tb would not be required
to roll back as it is not dependent on the actions of Ta.
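The propagation of rollback through dirty-read edges can be sketched as a simple worklist computation. The function name rollback_cascade and the dictionary representation of edges are assumptions of this sketch; it also removes all edges of a rolled-back transaction, including its read edges, which is a simplification rather than something stated in the text.

```python
def rollback_cascade(edges, start):
    """Roll back `start` and every transaction that dirty-read its modifications.

    edges maps (transaction, entity) -> "clean-read" | "dirty-read" | "write".
    Returns the set of rolled-back transactions and removes their edges.
    """
    doomed, pending = set(), [start]
    while pending:
        txn = pending.pop()
        if txn in doomed:
            continue
        doomed.add(txn)
        modified = {e for (t, e), k in edges.items() if t == txn and k == "write"}
        for (t, e), k in edges.items():
            # a dirty reader depends on the uncommitted state of the entity
            if k == "dirty-read" and e in modified and t not in doomed:
                pending.append(t)
    for key in [k for k in edges if k[0] in doomed]:
        del edges[key]
    return doomed
```

With Tb holding a write edge to E1 and Ta holding a dirty-read edge to E1, rolling back Tb also rolls back Ta, whereas rolling back Ta leaves Tb untouched, as in Figure 11.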
6 Relationship to Non-transactional Processes
Conventional DBMSs protect shared data by ensuring every access to the data complies either explicitly or implicitly
with transactional requirements. By contrast, in a persistent system any process may access any data object for which
the process has appropriate access rights. Thus in a persistent environment, and in the absence of any further protection
mechanism, it is possible for a data object involved in a transaction to be accessed by processes not involved in that
transaction. This could lead to a loss of integrity for the transaction.
Two options may be considered for preserving transactional consistency in persistent systems:
1. Enforce transaction behaviour on access to all data objects in the persistent store. The persistent store is shared
between all processes and thus applying transactional behaviour to the whole store provides the necessary
consistency. Such an approach is regarded as inflexible as it restricts computation to the transaction model
only [1].
2. Provide support for coexistent transactional and non-transactional activity.
In the remainder of this section, the relationships between transactional and non-transactional activities in persistent
systems are discussed together with mechanisms for supporting their coexistence.
6.1 The Computational Space
A stable store may be viewed as a collection of objects acted on by a set of processes, as illustrated in Figure 12. As
processes interact with objects, dependencies form between processes and objects as described in [4]. Where stability is
implemented using incremental checkpointing, these dependencies are recorded and used to determine which data
objects are written to the durable store.
Figure 12 Dependencies between entities in a stable store.
Activity in a stable store may be regarded as transactional in the sense that modifications performed by a process (or a
set of dependent processes) between successive checkpoints are either all stabilised together, or none of them are.
Stability checkpoints, however, are typically system initiated. They thus differ from transactional commit operations,
which are specified in user program code and must correspond to user-specified transaction boundaries (e.g.
BEGIN-TRX and COMMIT-TRX).
Stability and transaction mechanisms differ in the way they manage the effects of concurrent activities. In a stable
store objects may become transitively dependent on other objects through the actions of multiple processes [4]. There is
no requirement to apply transactional semantics to such dependencies and in fact to do so would be overly restrictive
[1]. On the other hand, in a transactional environment, dependencies between entities are governed by rules of
transactional isolation. An attempt to execute a transaction that fails to comply with these isolation requirements results
in the transaction being aborted and its actions rolled back.
In summary, a transactional system must support, in addition to the features provided by stability mechanisms:
1. Response to transaction events such as BEGIN-TRX and COMMIT-TRX,
2. Atomicity and isolation, and
3. Flexibility to cope with runtime determination of the extent of transaction actions.
It has been asserted that requirements of transactional concurrency are at odds with the properties of orthogonal
persistence and that to achieve these goals requires separate persistent and non-persistent worlds [1].
The approach presented in this work resolves this problem by dynamically partitioning the stable store into two spaces,
• the stable store, where stability operations are managed by the operating system, and
• the transaction-managed store where the transaction manager manages stability operations.
This approach is illustrated in Figure 13. Using the view expressed in [1], the stable store can be seen to be under the
control of an all-encompassing “transaction”, beginning with the completion of one checkpoint and extending to the
next checkpoint. By default all data entities exist and execute within the stable store which manages all non-
transactional activity. On the execution of an event indicating transaction activity, responsibility for the stability of
entities involved in that transaction passes to the transaction manager. The transaction manager then assumes
responsibility for durability of entities modified by the transaction and also for transaction atomicity and concurrency
control thus ensuring the transaction conforms to the required ACID properties. In other words entities involved in
transactions are moved to the transaction-managed space for the duration of the transaction as illustrated in Figure 13.
[Diagram: entities divided between the Stable Store and the Transactional Store. Legend: Object, Process, Dependency.]
Figure 13 Entity management.
A transaction continues until either:
• the event associated with a COMMIT-TRX instruction is executed, signalling the successful completion of the
transaction,
• the event associated with an ABORT-TRX instruction is executed, signalling the unsuccessful completion of
the transaction, or
• an unexpected system shutdown occurs, which also results in the unsuccessful termination of the transaction.
After any of these actions or events, management reverts to the stable store. The programmer is unaware of the
movement of data between the management schemes in the same way that the programmer is unaware of the data
movement between the durable and computational store. In this way orthogonality of persistence is maintained and
durability of transactions is decoupled from durability provided by the stability scheme.
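The hand-over of responsibility described above can be sketched as follows. This is a minimal illustration only: the class and method names (EntityManager, begin_trx, access, end_trx, space_of) are hypothetical and do not come from the system described here, and the sketch ignores the dirty-read and shared-write cases treated in Section 6.2.

```python
from enum import Enum


class Space(Enum):
    STABILITY = "stability-managed"
    TRANSACTION = "transaction-managed"


class EntityManager:
    """Tracks which manager is responsible for each entity (hypothetical API)."""

    def __init__(self):
        self.space = {}    # entity name -> Space
        self.members = {}  # transaction id -> set of entity names

    def begin_trx(self, trx):
        # BEGIN-TRX: start tracking the entities this transaction touches.
        self.members[trx] = set()

    def access(self, trx, entity):
        # An access by a transaction moves the entity to transaction-managed space.
        self.members[trx].add(entity)
        self.space[entity] = Space.TRANSACTION

    def end_trx(self, trx):
        # COMMIT-TRX, ABORT-TRX and unexpected shutdown all return
        # management of the involved entities to the stable store.
        for entity in self.members.pop(trx):
            self.space[entity] = Space.STABILITY

    def space_of(self, entity):
        # By default every entity lives in the stability-managed space.
        return self.space.get(entity, Space.STABILITY)
```

The program issuing BEGIN-TRX and COMMIT-TRX never calls space_of itself; in this sketch only the two managers would consult it, which mirrors the claim that the movement between management schemes is invisible to the programmer.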
Store partitioning raises issues about managing interaction between transactional and non-transactional activities. These
issues are discussed in the following section.
6.2 Managing Transaction – Non-Transaction Interaction
This section considers the consequences of interactions between transactional and non-transactional processes. At the
end of a timeslice the activities executed by the process during that timeslice are recorded in the DDG and decisions are
made about the subsequent actions of any involved processes. In the case of a transactional process, the consequences
of those accesses are used to determine whether the transaction should continue or be aborted and its actions rolled
back.
There are two ways that transactional and non-transactional processes interact. Either:
• a transaction accesses a stability-managed entity, or
• a non-transactional process accesses a transaction-managed data entity.
In the discussion that follows, T represents a transactional process, N represents a non-transactional process, and E
represents an entity accessed by both T and N.
6.2.1 Transactional Access to Stability-Managed Entities
In this case a transactional process T reads or modifies an entity E that is represented as a node of a DDG in stability-
managed space. A stability-managed entity E is either:
1. an unmodified entity represented by a single-node stability DDG, or
2. a modified entity represented by a node in a multi-node stability DDG. A data entity only exists in such a
DDG because that data entity has been modified since it was last checkpointed. (Note: clean-reads are not
recorded in stability DDGs.) Consequently any data node in a multi-node DDG must be connected by at least a
single write edge to another node in that DDG.
The possibilities for the access performed by the transaction are:
1. a transactional process reads an unmodified data entity,
2. a transactional process modifies an unmodified data entity,
3. a transactional process reads a modified data entity, or
4. a transactional process modifies a previously modified data entity.
These cases are discussed individually in Sections 6.2.1.1 – 6.2.1.4.
6.2.1.1 A Transactional Process Reads an Unmodified Entity
When a transactional process T reads an unmodified stability-managed entity E a clean-read edge is added
between T and E.
Figure 14 A transaction accesses an unmodified data entity.
As a consequence of this action the accessed entity is moved from the stability-managed space to the transaction-
managed space.
6.2.1.2 A Transactional Process Modifies an Unmodified Entity
When a transactional process T modifies a stability-managed unmodified entity E a write edge is added between T
and E.
Figure 15 A transaction modifies an unmodified data entity.
As a consequence of this action the accessed entity is moved from the stability-managed space to the transaction-
managed space.
6.2.1.3 A Transaction Reads a Modified Entity
When a transactional process T reads a stability-managed modified entity E connected by a write edge to a non-
transactional process N a non-final dirty-read edge is added between the nodes representing T and E.
Figure 16 A transaction reads a modified data entity.
The consequences of allowing this edge to continue to exist are:
1. Further write operations by N on E do not alter the dependency of T on E. However, a further read of E by the
transaction T after such an action would violate its isolation. To detect this situation the first write operation
after the transaction T has read E causes the dirty-read edge between T and E to be marked as a final dirty read.
The reason the dirty-read edge is not initially marked as final is to allow T to perform multiple reads on E.
Further writes by the same or different non-transactional processes are permitted. Any transaction that reads
the modified entity is only ever permitted to read one modified version of the entity E. A transaction that
reads more than one modification of the same entity is compromised because it has seen an inconsistent
view of the data.
2. A COMMIT-TRX operation on T propagates to E causing it to be committed. This propagates to N and
possibly further as a checkpoint operation.
3. An ABORT-TRX operation on T does not propagate to E and consequently does not affect N.
4. A checkpoint operation on N propagates to E causing it to be checkpointed. The existing dirty-read edge
between T and E converts to a clean-read edge and E is moved into the transaction-managed space.
5. A rollback operation on N propagates to E causing it to be rolled back. This action propagates to T forcing T to
abort.
In this situation the integrity of the transaction is maintained by all of these actions.
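The non-final and final states of the dirty-read edge in consequence 1 can be illustrated with a small state machine. This is a sketch under assumptions: the names DirtyReadEdge, IsolationViolation, on_transaction_read and on_nontransactional_write are hypothetical, and raising an exception stands in for whatever mechanism actually aborts T.

```python
class IsolationViolation(Exception):
    """Raised when transaction T would see a second modified version of E."""


class DirtyReadEdge:
    """Edge between transaction T and an entity E modified by a non-transactional N."""

    def __init__(self):
        # The edge starts non-final so that T may read E repeatedly.
        self.final = False

    def on_transaction_read(self):
        # Reads are allowed only while E has not been written since T's first read.
        if self.final:
            raise IsolationViolation("T would read a second modified version of E")

    def on_nontransactional_write(self):
        # The first write after T's read marks the edge final;
        # later writes are permitted and leave the edge final.
        self.final = True
```

Keeping the flag on the edge rather than on E means that each transaction reading E gets its own record of which version it has seen.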
6.2.1.4 A Transaction Modifies an Already Modified Entity
When a transactional process T modifies an entity E that is connected by a write edge to a non-transactional process N a
write edge is inserted between the node representing T and the node representing E.
Figure 17 A transaction modifies an already modified data entity.
The consequences of allowing this edge to continue to exist are:
1. A COMMIT-TRX operation on T propagates to E causing it to be committed. This propagates to N and possibly
further causing them to checkpoint.
2. An ABORT-TRX operation on T also propagates to E and N causing them to roll back.
3. If a checkpoint operation on N were allowed to occur it would propagate to E and T. This is unacceptable, as it
would effectively represent a premature commit on transaction T. To avoid a compromise of the transaction’s
integrity either:
a. accept that this represents an unacceptable relationship requiring both T and N to roll back, or
b. modify the stability manager so that prior to any checkpoint of N the DDG is checked for dependencies that
would propagate the checkpoint to T. If such propagation could occur then both N and T are rolled back.
(This is an optimistic approach because it leaves the edge in place in the hope that the transaction
commits before the non-transactional process checkpoints.)
4. A further write access by N would violate the isolation of T because following this action T would have an
inconsistent view of the data. This would require T to abort causing E to roll back. The action would propagate
to N and possibly further causing them to roll back also.
5. On the other hand, further write operations by T are acceptable.
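The check in option 3(b) amounts to a reachability test over the DDG before N is checkpointed. The sketch below assumes a simple adjacency-map representation in which edges[x] lists the nodes a checkpoint of x propagates to; that representation and the function name checkpoint_reaches_transaction are illustrative assumptions, not the data structures of the scheme itself.

```python
from collections import deque


def checkpoint_reaches_transaction(edges, start, transactional):
    """Return True if a checkpoint starting at `start` would propagate
    to any node in `transactional`.

    `edges` maps each node to the nodes a checkpoint propagates to
    (an assumed representation of the DDG).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in transactional:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

A stability manager following option 3(b) would call this before checkpointing N and roll back both N and T when it returns True; when it returns False the checkpoint can proceed without touching any transaction.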
6.2.2 Non-Transactional Access to a Transaction-Managed Object
In this case a non-transactional process N reads or modifies an entity E that is represented as a node in a multi-node
DDG in transaction-managed space. It is possible for the node representing E to have either a clean-read edge or a
write edge connecting it to other nodes in the DDG. Where there are dirty-read edges, there will always be a write edge
as well because a dirty-read edge can only exist if there is a write edge.
The possibilities for an access performed by a non-transactional process are:
1. The non-transactional process reads an unmodified data entity.
2. The non-transactional process mutates an unmodified data entity.
3. The non-transactional process reads a modified entity.
4. The non-transactional process mutates a previously modified entity.
These cases are discussed individually in Sections 6.2.2.1 – 6.2.2.4.
6.2.2.1 A Non-Transactional Process Reads a Transaction-Read Object
When a non-transactional process N reads an entity E connected by a read edge to a transactional process T, no edge is
added between E and N because such an action does not form a dependency for stability-managed processes.
Figure 18 A non-transactional process reads a transaction-read entity.
6.2.2.2 A Non-Transactional Process Modifies a Transaction-Read Object
When a non-transactional process N modifies an entity E connected by a read edge to a transactional process T a write
edge is added between the nodes representing N and E. The clean-read edge is given “final” status because any further
access by T would compromise the transaction by giving T an inconsistent view of the data.
Figure 19 A non-transactional process modifies a transaction-read entity.
The consequences of allowing this edge to remain are:
1. A COMMIT-TRX operation on T does not propagate to E or N because the final clean-read edge between T
and E only represents the requirement for no further access by T.
2. An ABORT-TRX operation on T does not propagate to E or N for the same reason as (1).
3. A checkpoint operation on N propagates to E causing E to checkpoint. This does not cause T to checkpoint but
blocking edges would need to be added to T and any entity modified by N to ensure that T is not compromised
by an inconsistent view of data modified by N.
4. A rollback operation on N propagates to E causing E to roll back. This action does not propagate to T.
5. Further writes by N are permitted because these do not affect the transaction T’s view of the data.
6.2.2.3 A Non-Transactional Process Reads a Transaction Modified Entity
When a non-transactional process N reads an entity E connected by a write edge to a transactional process T a dirty-read
edge is added between the nodes representing E and N.
Figure 20 A non-transactional process reads a transaction-managed modified entity.
The consequences of allowing this edge to remain are:
1. A COMMIT-TRX operation on T propagates to E causing E to commit. This action does not propagate to N.
The dirty-read edge between N and E is removed.
2. An ABORT-TRX operation on T propagates to E causing E to roll back. This propagates to N and possibly
further causing them to also roll back.
3. If a checkpoint operation on N were allowed to occur it would propagate to E and T. This is unacceptable, as it
would effectively represent a premature commit on transaction T. To avoid a compromise of the transaction’s
integrity either:
a. modify the stability manager so that prior to any checkpoint of N the DDG is checked for dependencies that
would propagate the checkpoint to T. If such propagation could occur then both N and T are rolled back, or
b. accept that the situation represents an unacceptable relationship requiring both T and N to roll
back.
4. A rollback of N does not propagate to E and therefore would not affect T.
5. T may execute further writes on E without violating the isolation of T. In contrast to the situation in section
6.2.1.3 there is no requirement to limit repeated reads by N.
6.2.2.4 A Non-Transactional Process Modifies a Transaction-Modified Entity
When a non-transactional process N modifies an entity E connected by a write edge to a transactional process T a write
edge is added between the nodes representing E and N.
Figure 21 A non-transactional process modifies a transaction-modified entity.
This action compromises the isolation of transaction T for the same reason that two transactions are not allowed to write
to the same entity. The transaction T is aborted. This propagates to E and N, causing both to roll back.
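The four cases of Sections 6.2.2.1 – 6.2.2.4 can be summarised as a lookup from the kind of access and the existing edge between T and E to the edge added for N and the immediate effect on T. The function name and the string labels below are illustrative only, and the later consequences (commit, abort, checkpoint and rollback propagation) are not modelled.

```python
def nontransactional_access(op, t_edge):
    """Return (edge added between N and E, immediate effect on T).

    op:     "read" or "write", the access performed by N
    t_edge: "clean-read" or "write", the existing edge between T and E
    """
    cases = {
        # 6.2.2.1: no dependency is formed for stability-managed processes.
        ("read", "clean-read"): (None, "T unaffected"),
        # 6.2.2.2: T must not access E again.
        ("write", "clean-read"): ("write", "clean-read edge of T marked final"),
        # 6.2.2.3: N now depends on uncommitted data written by T.
        ("read", "write"): ("dirty-read", "T unaffected"),
        # 6.2.2.4: a write-write conflict compromises T's isolation.
        ("write", "write"): ("write", "T aborted; E and N rolled back"),
    }
    return cases[(op, t_edge)]
```

Only the last case forces an immediate abort; the other three leave T running and defer any conflict to a later commit, abort, checkpoint or rollback.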
7 Distributed Transactions
Where a transaction is distributed over more than one host computer, the commit operation requires the services of a
coordinator. By default, the host where the BEGIN-TRX was executed acts as the coordinator. When a COMMIT-TRX
action is executed causing a DDG traversal the coordinator asks each host whether it is able to commit. Each host
responds with a message saying whether it can commit, and at the same time records its response. If the coordinator
receives positive responses from all hosts it sends out a message instructing all the hosts to commit their part of the
checkpoint.
A host may fail after sending a positive response, but before it receives or can act on the commit message from the
coordinator. On restart the host sends a message to the coordinator saying that it gave a positive response to the
coordinator, but failed before it could act on the commit. The coordinator responds that either yes, the
transaction was committed, or no, the transaction was not committed. If the coordinator responds yes, the host
commits its part of the transaction.
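The exchange between coordinator and hosts, including recovery of a host that fails after voting, can be sketched as below. All names (Coordinator, Host, prepare, recover, fails_after_vote) are hypothetical, messages are modelled as direct method calls, and the handling of a negative vote (nothing is committed) is an assumption, since only the positive path is described above.

```python
class Coordinator:
    def __init__(self, hosts):
        self.hosts = hosts
        self.outcome = {}  # transaction id -> True if committed

    def commit(self, trx):
        # Ask every host whether it can commit; each host records its vote.
        votes = [host.prepare(trx) for host in self.hosts]
        decision = all(votes)
        self.outcome[trx] = decision
        # Instruct the hosts to commit only if every vote was positive.
        if decision:
            for host in self.hosts:
                host.commit(trx)
        return decision

    def was_committed(self, trx):
        # Answer a restarted host asking about a transaction it voted for.
        return self.outcome.get(trx, False)


class Host:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.fails_after_vote = False  # models a failure after a positive response
        self.voted_yes = set()
        self.committed = set()

    def prepare(self, trx):
        if self.can_commit:
            self.voted_yes.add(trx)  # record the response before replying
        return self.can_commit

    def commit(self, trx):
        if self.fails_after_vote:
            return  # host failed before it could act on the commit message
        self.committed.add(trx)

    def recover(self, coordinator):
        # On restart, ask about every transaction voted for but not yet committed.
        self.fails_after_vote = False
        for trx in self.voted_yes - self.committed:
            if coordinator.was_committed(trx):
                self.committed.add(trx)
```

Recording the vote before replying is what allows a restarted host to know which transactions it must ask the coordinator about.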
8 Conclusion
The techniques described in this paper extend Jalili’s DDG-based stability scheme to provide support
for transaction-based concurrency control in persistent systems. These extensions support separate management for
concurrent transactional and non-transactional activities that co-exist in a persistent store. Interactions between these
kinds of activities are managed to ensure that transactions are aborted if they are compromised by either transactional or
non-transactional activity. As reported elsewhere this is achieved while maintaining performance consistent with that
provided by conventional bulk data management systems.
9 References
[1] Blackburn, S.M., Zigman, J.N., Concurrency — The fly in the ointment? in Proc., The Third International
Workshop on Persistence and Java, Tiburon, CA, USA, Morgan Kaufmann, pp 250 - 258, 1999.
[2] Gray, J., and Reuter, A., Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers,
San Mateo, CA, ISBN: 1-558-60190-2, 1993.
[3] Henskens, F.A., Koch, D.M., Jalili, R., Rosenberg, J., Hardware Support for Stability in a Persistent
Architecture in Proc., The Sixth International Workshop on Persistent Operating Systems, Tarascon, France,
Springer-Verlag and British Computing Society, pp 387-399, 1994.
[4] Jalili, R., A Failure Transparent Distributed Persistent Store, PhD Thesis, Basser Department of Computer
Science, University of Sydney, 1995.
[5] Kung, H.T. and Robinson, J.T., On Optimistic Methods for Concurrency Control, ACM Transactions on
Database Systems, vol 6(2), pp. 213 - 226, 1981.
[6] Lindström, A.G., User-level Memory Management and Kernel Persistence in the Grasshopper Operating
System, PhD Thesis, Basser Department of Computer Science, University of Sydney, 1996.
[7] Microsoft, Microsoft Locking Strategy: Dynamic Locking Initiative,
www.support.microsoft.com/support/sql/content/sql65/sqllock.asp, 2000.
[8] Oracle, Oracle Rdb7 SQL Reference Manual Volume 1 Part No A42814-1, Oracle Corporation, 1996.
[9] Rosenberg, J. and Henskens, F., Stability in a Persistent Store Based on a Large Virtual Memory in Proc.,
International Workshop on Computer Architectures to Support Security and Persistence of Information,
Bremen, Germany, Springer-Verlag, pp 229-245, 1990.
[10] Rosenberg, J., Koch, D.M., Keedy, J.L., A Massive Memory Supercomputer in Proc., 22nd Annual Hawaii
International Conference on System Sciences, Hawaii, pp 338-345, 1989.
[11] Vaughan, F., Basso, T.L., Dearle, A., Marlin, C., Barter, C., Casper: a Cached Architecture Supporting
Persistence, Computing Systems, vol 5(3), pp. 337 - 359, 1992.