CSL 771: Database Implementation - Transaction Processing


CSL 771: Database Implementation - Transaction Processing
Maya Ramanath

All material (including figures) from:
Concurrency Control and Recovery in Database Systems
Phil Bernstein, Vassos Hadzilacos and Nathan Goodman
(http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)

Transactions
• Interaction with the DBMS is through SQL:

update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000

• A transaction is a unit of interaction

ACID Properties
• Atomicity
• Consistency
• Isolation
• Durability

Database system must ensure ACID properties

Atomicity and Consistency
• Single transaction
– Execution of a transaction is "all-or-nothing"
– Either the transaction completes in its entirety, or it "does not even start"
– As if the transaction never existed
– No partial effect must be visible
• Two outcomes: a transaction COMMITs or ABORTs

Consistency and Isolation
• Multiple transactions
– Concurrent execution can cause an inconsistent database state
– Each transaction is executed as if isolated from the others

Durability
• If a transaction commits, its effects are permanent
• But durability has a bigger scope
– Catastrophic failures (floods, fires, earthquakes)

What we will study…
• Concurrency Control
– Ensuring atomicity, consistency and isolation when multiple transactions are executed concurrently
• Recovery
– Ensuring durability and consistency in case of software/hardware failures

Terminology
• Data item
– A tuple, table, block
• Read(x)
• Write(x, 5)
• Start(T)
• Commit(T)
• Abort(T)
• Active transaction
– A transaction which has neither committed nor aborted

High level model

[Figure: Transaction 1, Transaction 2, …, Transaction n → Transaction Manager → Scheduler → Recovery Manager → Cache Manager → Disk]

Recoverability (1/2)
• Transaction T aborts
– T wrote some data items
– T' read items that T wrote
• The DBMS has to…
– Undo the effects of T
– Undo the effects of T'
– But T' has already committed

    T              T'
    Read(x)
    Write(x,k)
    Read(y)
                   Read(x)
                   Write(y,k')
                   Commit
    Abort

Recoverability (2/2)
• Let T1,…,Tn be a set of transactions
• Suppose Ti reads a value written by Tk, k < i
• An execution of transactions is recoverable if Ti commits only after all such Tk commit

Not recoverable (T2 commits before T1):
    T1             T2
    Write(x,2)
                   Read(x)
                   Write(y,2)
                   Commit

Recoverable (T1 commits first):
    T1             T2
    Write(x,2)
                   Read(x)
                   Write(y,2)
    Commit
                   Commit
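The definition can be checked mechanically. Below is a small Python sketch (not from the slides; the history encoding and the function name are illustrative assumptions) that decides whether a history, given as a list of (transaction, action, item) steps, is recoverable.

# Illustrative sketch: decide whether a history is recoverable.
# A history is a list of (txn, action, item); action is "r", "w", "c" or "a".
def is_recoverable(history):
    last_writer = {}   # item -> transaction that last wrote it
    reads_from = {}    # reader -> set of transactions it read from
    commit_pos = {}    # txn -> position of its commit in the history
    for pos, (txn, action, item) in enumerate(history):
        if action == "w":
            last_writer[item] = txn
        elif action == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "c":
            commit_pos[txn] = pos
    # Ti may commit only after every Tk it read from has committed.
    for reader, writers in reads_from.items():
        if reader not in commit_pos:
            continue
        for writer in writers:
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True

# The two schedules above:
h1 = [("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "y"), ("T2", "c", None)]
h2 = [("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "y"),
      ("T1", "c", None), ("T2", "c", None)]
print(is_recoverable(h1), is_recoverable(h2))   # False True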

Cascading Aborts (1/2)
• Because T was aborted, the transactions that read from T, directly or transitively (here T' and T''), also have to be aborted

    T              T'             T''
    Read(x)
    Write(x,k)
    Read(y)
                   Read(x)
                   Write(y,k')
    Abort
                                  Read(y)

Cascading Aborts (2/2)
• Recoverable executions do not prevent cascading aborts
• How can we prevent them, then?

Recoverable, but T2 must be aborted if T1 aborts instead of committing:
    T1             T2
    Write(x,2)
                   Read(x)
                   Write(y,2)
    Commit
                   Commit

No cascading abort possible (T2 reads x only after T1 commits):
    T1             T2
    Write(x,2)
    Commit
                   Read(x)
                   Write(y,2)
                   Commit

What we learnt so far…

Not recoverable:
    T1             T2
    Write(x,2)
                   Read(x)
                   Write(y,2)
                   Commit

Recoverable, with cascading aborts:
    T1             T2
    Write(x,2)
                   Read(x)
                   Write(y,2)
    Commit
                   Commit

Recoverable, without cascading aborts:
    T1             T2
    Write(x,2)
    Commit
                   Read(x)
                   Write(y,2)
                   Commit

What matters is the relative order of reading a value and committing the transaction that wrote it.

Strict Schedule (1/2)
• "Undo"-ing the effects of a transaction
– Restore the before-image of the data item

    T1             T2
    Write(x,1)
    Write(y,3)
                   Write(y,1)
    Commit
                   Read(x)
                   Abort

Equivalent to (final value of y: 3):
    T1             T2
    Write(x,1)
    Write(y,3)
    Commit

Strict Schedule (2/2)

    T1             T2
    Write(x,2)
                   Write(x,3)
    Abort

Initial value of x: 1. Should x be restored to 1 or 3?

    T1             T2
    Write(x,2)
                   Write(x,3)
    Abort
                   Abort

T1 restores x to 3? T2 restores x to 2? The outcome depends on the order in which the aborts are processed.

Strict executions avoid this: do not read or write a value which has been written by an active transaction until that transaction has committed or aborted.

    T1             T2
    Write(x,2)
    Abort
                   Write(x,3)

The Lost Update Problem

    T1                  T2
    Read(x)
                        Read(x)
                        Write(x, 200,000)
                        Commit
    Write(x, 200)
    Commit

Assume x is your account balance: T2's committed update is lost.

Serializable Schedules
• Serial schedule
– Simply execute transactions one after the other
• A serializable schedule is one which is equivalent to some serial schedule

SERIALIZABILITY THEORY


Serializable Schedules
T1: op11, op12, op13
T2: op21, op22, op23, op24
• Serial schedule
– Simply execute transactions one after the other:
  op11, op12, op13, op21, op22, op23, op24
• Serializable schedule
– Interleave operations
– Ensure the end result is equivalent to some serial schedule

Notation
r1[x] = Transaction 1, Read(x)
w1[x] = Transaction 1, Write(x)
c1 = Transaction 1, Commit
a1 = Transaction 1, Abort

Example history: r1[x], r1[y], w2[x], r2[y], c1, c2

Histories (1/3)
• The operations of a transaction T can be represented by a partial order.

[Figure: a partial order over the operations r1[x], r1[y], w1[z], c1]

Histories (2/3)
• Conflicting operations
– If two operations access the same data item and at least one of them is a write, then the operations conflict
– An order has to be specified for conflicting operations

Histories (3/3)
• Complete history: a history in which every transaction ends with a Commit or an Abort

Serializable Histories
• The goal: ensure that the interleaving of operations guarantees a serializable history.
• The method
– When are two histories equivalent?
– When is a history serial?

Equivalence of Histories (1/2)
H ≅ H' if
1. they are defined over the same set of transactions and they have the same operations
2. they order conflicting operations the same way

Equivalence of Histories (2/2)
[Figure: an example of equivalent histories. Source: Concurrency Control and Recovery in Database Systems, Bernstein, Hadzilacos and Goodman]

Serial History
• A complete history is serial if, for every pair of transactions Ti and Tk,
– all operations of Ti occur before all operations of Tk, OR
– all operations of Tk occur before all operations of Ti
• A history is serializable if its committed projection is equivalent to a serial history.

Serialization Graph
• Nodes: the committed transactions; edge Ti → Tk if some operation of Ti precedes and conflicts with some operation of Tk

[Figure: a serialization graph over T1, T2, T3]

Serializability Theorem
A history H is serializable if and only if its serialization graph SG(H) is acyclic.
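The theorem suggests a direct test: build SG(H) over the committed transactions and check it for a cycle. The Python sketch below is illustrative (the history encoding and helper names are assumptions, not the book's notation).

# Illustrative sketch: build SG(H) and test for acyclicity.
# Operations are tuples (txn, action, item) with action in {"r", "w", "c", "a"}.
def conflicts(a1, a2):
    return "w" in (a1, a2)                 # conflict: same item, at least one write

def serialization_graph(history):
    committed = {t for (t, act, _) in history if act == "c"}
    ops = [(t, act, x) for (t, act, x) in history
           if t in committed and act in ("r", "w")]
    edges = set()
    for i, (ti, ai, xi) in enumerate(ops):
        for tk, ak, xk in ops[i + 1:]:
            if ti != tk and xi == xk and conflicts(ai, ak):
                edges.add((ti, tk))        # Ti -> Tk: Ti's op precedes a conflicting op of Tk
    return committed, edges

def has_cycle(nodes, edges):
    succ = {n: [b for (a, b) in edges if a == n] for n in nodes}
    state = {}                             # node -> "visiting" | "done"
    def visit(n):
        if state.get(n) == "visiting":
            return True
        if state.get(n) == "done":
            return False
        state[n] = "visiting"
        if any(visit(m) for m in succ[n]):
            return True
        state[n] = "done"
        return False
    return any(visit(n) for n in nodes)

# H = r1[x] r1[y] w2[x] r2[y] c1 c2 (from the Notation slide)
H = [("T1", "r", "x"), ("T1", "r", "y"), ("T2", "w", "x"), ("T2", "r", "y"),
     ("T1", "c", None), ("T2", "c", None)]
nodes, edges = serialization_graph(H)
print(edges, "serializable" if not has_cycle(nodes, edges) else "not serializable")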

On your own
How do recoverability, strict schedules and cascading aborts fit into the big picture?

LOCKING

High level model

[Figure: Transaction 1, Transaction 2, …, Transaction n → Transaction Manager → Scheduler → Recovery Manager → Cache Manager → Disk]

Transaction Management
• Transaction Manager
– Receives transactions
– Sends operations to the scheduler
• Scheduler
– Execute op
– Reject op
– Delay op

[Figure: Transaction 1, Transaction 2, …, Transaction n submit operations such as Read1(x), Write2(y,k), Read2(x), Commit1 to the Transaction Manager, which forwards them through the Scheduler towards the Disk]

Locking
• Each data item x has a lock associated with it
• If T wants to access x
– The scheduler first acquires the lock on x on T's behalf
– Only one transaction can hold a lock on x
• T releases the lock after processing
• Locking is used by the scheduler to ensure serializability

Notation
• Read lock and write lock: rl[x], wl[x]
• Transaction Ti obtaining read and write locks: rli[x], wli[x]
• Lock table
– Entries of the form [x, r, Ti]
• Conflicting locks
– pli[x] and qlk[y] conflict if x = y, i ≠ k, and the operations p and q conflict
• Unlock: rui[x], wui[x]

Basic 2-Phase Locking (2PL)
On receiving operation pi[x]:
• RULE 1: Is a conflicting lock qlk[x] already set (p and q conflict)?
– YES: pi[x] is delayed
– NO: acquire pli[x]; pi[x] is scheduled
• RULE 2: pli[x] cannot be released until pi[x] is completed
• RULE 3 (2-phase rule): once a lock is released, no other locks may be obtained
(A minimal lock-manager sketch of these rules follows below.)
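A minimal sketch of the three rules as a lock table, in Python (the class and method names are assumptions; a real scheduler would also queue delayed operations and handle deadlocks):

# Illustrative sketch of basic 2PL.
COMPAT = {("r", "r"): True}                  # only two read locks are compatible

class TwoPhaseLockManager:
    def __init__(self):
        self.locks = {}                      # item -> list of (mode, txn)
        self.shrinking = set()               # txns that have already released a lock

    def acquire(self, txn, mode, item):
        if txn in self.shrinking:            # RULE 3: no locks after the first release
            raise RuntimeError(f"{txn} violates the 2-phase rule")
        holders = self.locks.setdefault(item, [])
        for held_mode, holder in holders:
            if holder != txn and not COMPAT.get((held_mode, mode), False):
                return False                 # RULE 1: conflicting lock held -> pi[x] delayed
        holders.append((mode, txn))
        return True                          # lock granted, pi[x] is scheduled

    def release(self, txn, item):
        # RULE 2 is the caller's duty: release only after the operation on item completed.
        self.locks[item] = [(m, t) for (m, t) in self.locks.get(item, []) if t != txn]
        self.shrinking.add(txn)

# T1 releases x and then tries to lock y, as in the 2-phase-rule example on the next slide.
lm = TwoPhaseLockManager()
lm.acquire("T1", "r", "x"); lm.release("T1", "x")
try:
    lm.acquire("T1", "w", "y")
except RuntimeError as e:
    print(e)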

The 2-phase rule
Once a lock is released, no other locks may be obtained.

T1: r1[x] w1[y] c1
T2: w2[x] w2[y] c2

H = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1

H violates the 2-phase rule (T1 acquires wl1[y] after releasing ru1[x]): its serialization graph has edges T1 → T2 (on x) and T2 → T1 (on y), a cycle, so H is not serializable.

Correctness of 2PL
• 2PL always produces serializable histories
• Proof outline
– STEP 1: Characterize the properties of histories produced by the scheduler
– STEP 2: Prove that any history with these properties is serializable (that is, SG(H) is acyclic)

Deadlocks (1/2)
T1: r1[x] w1[y] c1
T2: w2[y] w2[x] c2

Scheduler: rl1[x] wl2[y] r1[x] w2[y] <cannot proceed>
T1 now needs wl1[y], which T2 holds; T2 needs wl2[x], which rl1[x] blocks: each waits for the other.

Deadlocks (2/2)
Strategies to deal with deadlocks
• Timeouts
– Leads to inefficiency
• Detecting deadlocks
– Maintain a wait-for graph; a cycle indicates a deadlock (see the sketch below)
– Once a deadlock is detected, break the cycle by aborting a transaction
• New problem: starvation
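A small Python sketch of deadlock detection on a wait-for graph (illustrative; the graph encoding and function name are assumptions):

# Illustrative sketch: find a cycle in a wait-for graph.
# An edge Ti -> Tk means Ti is waiting for a lock held by Tk.
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None."""
    def dfs(node, path, on_path):
        for nxt in wait_for.get(node, []):
            if nxt in on_path:                       # cycle found
                return path[path.index(nxt):]
            cycle = dfs(nxt, path + [nxt], on_path | {nxt})
            if cycle:
                return cycle
        return None
    for start in wait_for:
        cycle = dfs(start, [start], {start})
        if cycle:
            return cycle
    return None

# The deadlock from the previous slide: T1 waits for T2 (y), T2 waits for T1 (x).
wfg = {"T1": ["T2"], "T2": ["T1"]}
victim_cycle = find_deadlock(wfg)
if victim_cycle:
    print("deadlock:", victim_cycle, "- abort one of them to break the cycle")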

Conservative 2PL
• Avoids deadlocks altogether
– T declares its readset and writeset
– The scheduler tries to acquire all required locks
– If not all locks can be acquired, T waits in a queue
• T never "starts" until all its locks are acquired
– Therefore, it can never be involved in a deadlock

On your own
Strict 2PL (2PL which ensures only strict schedules)

Extra Information
• Assumption: data items are organized in a tree
• Can we come up with a better (more efficient) protocol?

Tree Locking Protocol (1/3)
On receiving operation ai[x]:
• RULE 1: Is alk[x] already set by another transaction?
– YES: ai[x] is delayed
– NO: acquire ali[x]; ai[x] is scheduled
• RULE 2: if x is an intermediate (non-root) node and y is the parent of x, then ali[x] may be set only while holding ali[y]
• RULE 3: ali[x] cannot be released until ai[x] is completed
• RULE 4: once a lock is released, the same lock may not be re-obtained
(A sketch of these rules follows below.)
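A minimal Python sketch of these rules (illustrative; it reads RULE 2 as "a non-root node may be locked only while holding the lock on its parent" and keeps a single holder per node):

# Illustrative sketch of the tree locking rules.
class TreeLockManager:
    def __init__(self, parent):
        self.parent = parent                # node -> parent node (root maps to None)
        self.holder = {}                    # node -> txn currently holding its lock
        self.released = set()               # (txn, node) pairs already released once

    def acquire(self, txn, node):
        if (txn, node) in self.released:    # RULE 4: never re-obtain a released lock
            raise RuntimeError("RULE 4 violated")
        if self.holder.get(node) not in (None, txn):
            return False                    # RULE 1: conflicting lock held -> a[x] delayed
        p = self.parent.get(node)
        if p is not None and self.holder.get(p) != txn:
            raise RuntimeError("RULE 2 violated: lock the parent first")
        self.holder[node] = txn             # lock granted, a[x] can be scheduled
        return True

    def release(self, txn, node):
        # RULE 3 is the caller's duty: release only after a[x] has completed.
        if self.holder.get(node) == txn:
            del self.holder[node]
            self.released.add((txn, node))

# Usage: lock root-to-leaf, releasing ancestors as soon as their accesses are done.
lm = TreeLockManager({"A": None, "B": "A", "C": "B"})
lm.acquire("T1", "A"); lm.acquire("T1", "B"); lm.release("T1", "A"); lm.acquire("T1", "C")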

Tree Locking Protocol (2/3)
• Proposition: if Ti locks x before Tk, then for every v which is a descendant of x, if both Ti and Tk lock v, then Ti locks v before Tk.
• Theorem: the Tree Locking Protocol always produces serializable schedules.

Tree Locking Protocol (3/3)
• The Tree Locking Protocol avoids deadlock
• Releases locks earlier than 2PL
BUT
• Needs to know the access pattern to be effective
• Transactions should access nodes from root to leaf

Multi-granularity Locking (1/3)
• Granularity
– Refers to the relative size of the data item
– Attribute, tuple, table, page, file, etc.
• Efficiency depends on the granularity of locking
• Allow transactions to lock at different granularities

Multi-granularity Locking (2/3)
• Lock Instance Graph
[Figure: lock instance graph. Source: Concurrency Control and Recovery in Database Systems, Bernstein, Hadzilacos and Goodman]
• Explicit and implicit locks
• Intention read (ir) and intention write (iw) locks
• Intention write locks conflict with explicit read and write locks; intention read locks conflict only with explicit write locks; intention locks do not conflict with each other

Multi-granularity Locking (3/3)
• To set rli[x] or irli[x], Ti must first hold irli[y] or iwli[y], where y is the parent of x
• To set wli[x] or iwli[x], Ti must first hold iwli[y], where y is the parent of x
• To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or wli[y]) where y = x or y is an ancestor of x
• To release irli[x] (or iwli[x]), no child of x may be locked by Ti
(These rules are sketched below.)
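A minimal Python sketch of the compatibility matrix and the parent rules (illustrative; the `held` and `parent` structures are assumptions, and the matrix is the standard ir/iw/r/w table):

# Illustrative sketch of multi-granularity lock checking.
# True means the two lock modes are compatible (do not conflict).
COMPATIBLE = {
    "ir": {"ir": True,  "iw": True,  "r": True,  "w": False},
    "iw": {"ir": True,  "iw": True,  "r": False, "w": False},
    "r":  {"ir": True,  "iw": False, "r": True,  "w": False},
    "w":  {"ir": False, "iw": False, "r": False, "w": False},
}

# The first two rules above: which lock the parent must already carry.
REQUIRED_ON_PARENT = {"r": {"ir", "iw"}, "ir": {"ir", "iw"},
                      "w": {"iw"},        "iw": {"iw"}}

def may_set(mode, item, txn, held, parent):
    """held: (item, txn) -> set of modes held; parent: item -> parent item."""
    # The requested mode must not conflict with locks held by other transactions.
    for (other_item, other_txn), modes in held.items():
        if other_item == item and other_txn != txn:
            if any(not COMPATIBLE[mode][m] for m in modes):
                return False
    # The transaction must hold an appropriate intention lock on the parent.
    p = parent.get(item)
    if p is not None and not (REQUIRED_ON_PARENT[mode] & held.get((p, txn), set())):
        return False
    return True

parent = {"file": None, "tuple1": "file"}
held = {("file", "T1"): {"iw"}}
print(may_set("w", "tuple1", "T1", held, parent))   # True: T1 holds iw on the parent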

The Phantom Problem
• How to lock a tuple which does not (currently) exist?

T1: r1[x1], r1[x2], r1[X], c1
T2: w2[x3], w2[X], c2

rl1[x1], r1[x1], rl1[x2], r1[x2], wl2[x3], wl2[X], w2[x3], w2[X], wu2[x3,X], c2, rl1[X], r1[X], ru1[x1,x2,X], c1

T2 inserts a new tuple x3 into X; locking the existing tuples x1 and x2 cannot prevent T1 from seeing the phantom when it later reads X.

NON-LOCK-BASED SCHEDULERS

Timestamp Ordering (1/3)
• Each transaction is associated with a timestamp
– Ti denotes transaction T with timestamp i
• Each operation in the transaction has the same timestamp

Timestamp Ordering (2/3)
TO Rule: if pi[x] and qk[x] are conflicting operations, then pi[x] is processed before qk[x] iff i < k.

Theorem: if H is a history representing an execution produced by a TO scheduler, then H is serializable.

Timestamp Ordering (3/3)
• For each data item x, maintain: max-rt(x), max-wt(x), c(x)
• Request ri[x]
– Grant the request if TS(i) >= max-wt(x) and c(x) is true; update max-rt(x)
– Delay if TS(i) >= max-wt(x) and c(x) is false
– Else abort and restart Ti
• Request wi[x]
– Grant the request if TS(i) >= max-wt(x) and TS(i) >= max-rt(x); update max-wt(x), set c(x) = false
– Else abort and restart Ti
(These rules are sketched below.)

ON YOUR OWN: Thomas write rule, actions taken when a transaction has to commit or abort
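A minimal Python sketch of the request rules above (illustrative; the queue of delayed operations and the commit/abort bookkeeping that resets c(x) are omitted):

# Illustrative sketch of timestamp-ordering request handling.
GRANT, DELAY, ABORT = "grant", "delay", "abort"

class DataItem:
    def __init__(self):
        self.max_rt = 0        # largest timestamp of a read scheduled on x
        self.max_wt = 0        # largest timestamp of a write scheduled on x
        self.c = True          # True if the writer of the current value has committed

def request_read(ts, x):
    if ts >= x.max_wt and x.c:
        x.max_rt = max(x.max_rt, ts)
        return GRANT
    if ts >= x.max_wt and not x.c:
        return DELAY                       # wait for the earlier writer to finish
    return ABORT                           # too late: restart Ti with a new timestamp

def request_write(ts, x):
    if ts >= x.max_wt and ts >= x.max_rt:
        x.max_wt = ts
        x.c = False                        # set True again when the writer commits
        return GRANT
    return ABORT

# r4[x] is delayed because w3[x] has not committed; r2[x] arrives too late and aborts.
x = DataItem()
print(request_write(3, x), request_read(4, x), request_read(2, x))   # grant delay abort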

Validation
• Aggressively schedule all operations
• Do not commit until the transaction is "validated"

ON YOUR OWN

Summary
• Lock-based schedulers
– 2-Phase Locking
– Tree Locking Protocol
– Multi-granularity Locking
– Locking in the presence of updates
• Non-lock-based schedulers
– Timestamp Ordering
– Validation-based concurrency control (on your own)

RECOVERY
Source: Database Systems: The Complete Book. Garcia-Molina, Ullman and Widom

Logging
• Log the operations in the transaction(s)
• Believe the log
– Does the log say transaction T has committed?
– Or does it say aborted?
– Or does it have only a partial trace (implicit abort)?
• In case of failures, reconstruct the DB from its log

The basic setup

[Figure: transactions T1, T2, T3, …, Tk, each with its own buffer space, a shared buffer space for data and the log, and the disk holding the data and the LOG]

Terminology
• Data item: an element which can be read or written
– A tuple, relation, B+-tree index, etc.
• Input x: fetch x from the disk into the buffer
• Read x,t: read x into the local variable t
• Write x,t: write the value of t into x
• Output x: write x to disk

Example

update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000

For one qualifying tuple, with P its price field and S its status field:

Read P, x
x := x - x*0.1
Write P, x
Read S, y
y := 'CHEAP'
Write S, y
Output P
Output S

The system can fail at any point in this sequence, e.g. before either Output, between Output P and Output S, or after both.
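A toy rendering of the four primitives and the example above in Python (the DISK, BUFFER and LOCALS dictionaries and the initial values are illustrative assumptions):

# Illustrative sketch of Input / Read / Write / Output over a toy disk and buffer.
DISK = {"P": 5000, "S": "costly"}      # assumed initial values
BUFFER = {}                            # in-memory copies of data items
LOCALS = {}                            # local variables of the transaction

def input_(x):
    BUFFER[x] = DISK[x]                # Input x: fetch x from disk into the buffer

def read(x, t):
    if x not in BUFFER:
        input_(x)
    LOCALS[t] = BUFFER[x]              # Read x,t: read x into local variable t

def write(x, t):
    BUFFER[x] = LOCALS[t]              # Write x,t: write the value of t into x (buffer copy)

def output(x):
    DISK[x] = BUFFER[x]                # Output x: write x back to disk

read("P", "x"); LOCALS["x"] -= LOCALS["x"] * 0.1; write("P", "x")
read("S", "y"); LOCALS["y"] = "CHEAP"; write("S", "y")
output("P"); output("S")
print(DISK)                            # {'P': 4500.0, 'S': 'CHEAP'}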

Logs
• A sequence of log records
• Need to keep track of
– Start of transaction
– Update operations (write operations)
– End of transaction (COMMIT or ABORT)
• "Believe" the log: use the log to reconstruct a consistent DB state

Types of logs
• Undo logs
– Ensure that uncommitted transactions are rolled back (undone)
• Redo logs
– Ensure that committed transactions are redone
• Undo/Redo logs
– Both of the above
All three logging styles ensure atomicity and durability

Undo Logging (1/3)
• <START T>: start of transaction T
• <COMMIT T>
• <ABORT T>
• <T, A, x>: transaction T modified A, whose before-image is x

Undo Logging (2/3)

Operations            Log records
Read P, x             <START T>
x := x - x*0.1
Write P, x            <T, P, x>
Read S, y
y := 'CHEAP'
Write S, y            <T, S, y>
FLUSH LOG
Output P
Output S
                      <COMMIT T>
FLUSH LOG

U1: <T, X, v> must be flushed before Output X
U2: <COMMIT T> must be flushed only after all OUTPUTs

Undo Logging (3/3)
• Recovery with an undo log
1. If T has a <COMMIT T> entry, do nothing
2. If T has a <START T> entry but no <COMMIT T>
• T is incomplete and needs to be undone
• Restore old values from the <T, X, v> records
• There may be multiple transactions
– Start scanning from the end of the log
(A small recovery sketch follows below.)
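A minimal Python sketch of undo-log recovery (illustrative; the log is a list of tuples, newest record last, and `db` stands for the on-disk state):

# Illustrative sketch: recovery with an undo log.
def undo_recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):                    # scan from the end of the log
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, txn, item, before_image = rec
            db[item] = before_image              # undo the incomplete transaction
    # (A real system would now append <ABORT T> for each undone transaction.)
    return db

# T never logged <COMMIT T>, so both of its updates are undone.
log = [("START", "T"), ("UPDATE", "T", "P", 5000), ("UPDATE", "T", "S", "costly")]
print(undo_recover(log, {"P": 4500, "S": "cheap"}))   # {'P': 5000, 'S': 'costly'}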

Redo Logging (1/3)
• All incomplete transactions can be ignored
• Redo all completed transactions
• <T, A, x>: transaction T modified A, whose after-image is x

Redo Logging (2/3)

Operations            Log records
Read P, x             <START T>
x := x - x*0.1
Write P, x            <T, P, x>
Read S, y
y := 'CHEAP'
Write S, y            <T, S, y>
                      <COMMIT T>
FLUSH LOG
Output P
Output S

R1 (write-ahead logging): <T, X, v> and <COMMIT T> must be flushed before Output X

Redo Logging (3/3)
• Recovery with redo logging
– If T has a <COMMIT T> entry, redo T
– If T is incomplete, do nothing (add <ABORT T>)
• For multiple transactions
– Scan from the beginning of the log

Undo/Redo Logging (1/3)
• Undo logging: cannot write <COMMIT T> until all of T's updates are written to disk
• Redo logging: cannot flush updated pages from the buffer to disk until the transaction commits
• Undo/Redo logs attempt to strike a balance

Undo/Redo Logging (2/3)
Log records <T, X, a, b> now carry both the before-image a and the after-image b of X.

Operations            Log records
Read P, x             <START T>
x := x - x*0.1
Write P, x            <T, P, x, a>   (before- and after-image of P)
Read S, y
y := 'CHEAP'
Write S, y            <T, S, y, b>   (before- and after-image of S)
FLUSH LOG
Output P
Output S
                      <COMMIT T>

UR1: <T, X, a, b> must be flushed before Output X

For comparison:
U1: <T, X, v> must be flushed before Output X
U2: <COMMIT T> must be flushed only after all OUTPUTs
R1: <T, X, v> and <COMMIT T> must be flushed before Output X

Undo/Redo Logging (3/3)
• Recovery with undo/redo logging
– Redo all committed transactions (earliest first)
– Undo all uncommitted transactions (latest first)
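A combined Python sketch of undo/redo recovery (illustrative; each update record carries both images as ("UPDATE", txn, item, before, after)):

# Illustrative sketch: recovery with an undo/redo log.
def undo_redo_recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:                                  # redo committed txns, earliest first
        if rec[0] == "UPDATE" and rec[1] in committed:
            db[rec[2]] = rec[4]                      # reinstall the after-image
    for rec in reversed(log):                        # undo uncommitted txns, latest first
        if rec[0] == "UPDATE" and rec[1] not in committed:
            db[rec[2]] = rec[3]                      # restore the before-image
    return db

log = [("START", "T1"), ("UPDATE", "T1", "P", 5000, 4500), ("COMMIT", "T1"),
       ("START", "T2"), ("UPDATE", "T2", "S", "costly", "cheap")]
print(undo_redo_recover(log, {"P": 5000, "S": "cheap"}))
# T1 is redone (P = 4500); T2 is undone (S restored to "costly").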

What happens if there is a crash while you are writing a log record? What happens if there is a crash during recovery?

Checkpointing
• Logs can be huge… can we throw away portions of them?
• Can we avoid processing the entire log when there is a crash?

ON YOUR OWN