
Chapter 13

An Overview of Transaction Processing


Transactions

• Many enterprises use databases to store information about their state
  – E.g., balances of all depositors

• When an event occurs in the real world that changes the state of the enterprise, a program is executed to change the database state in a corresponding way
  – E.g., balance must be updated when you deposit

• Such a program is called a transaction


ACID Properties of Transactions

• Transaction execution must maintain the correctness of the database model

• Therefore additional requirements are placed on the execution of transactions beyond those placed on ordinary programs:
  – Atomicity
  – Consistency
  – Isolation
  – Durability


ACID Properties

• Atomic - Transaction should either complete or have no effect at all
  – Responsibility of transaction processing system

• Consistent - Transaction should correctly transform the database state to reflect the effect of a real world event
  – Responsibility of transaction designer


ACID Properties (cont'd)

• Isolation - The effect of concurrently executing a set of transactions is the same as if they had executed serially (serializable)
  – Responsibility of transaction processing system

• Durable - The effect of a transaction on the database state should not be lost once the transaction has committed
  – Responsibility of transaction processing system


Isolation

• Serial execution of a set of (consistent) transactions is correct, but performance might be inadequate

• Concurrent (interleaved) execution of a set of transactions offers performance benefits, but might not be correct


Interleaved Execution


Serializable Schedules

• The concurrent schedule S: r1(x) w2(z) w1(y) is equivalent to the serial schedules of T1 and T2 in either order:
  – T1, T2: r1(x) w1(y) w2(z) and
  – T2, T1: w2(z) r1(x) w1(y)

  since operations of distinct transactions on different data items commute. Hence, S is a serializable schedule


Serializable Schedules

• The concurrent schedule S: r1(z) r2(q) w2(z) r1(q) w1(y) is equivalent to the serial schedule T1, T2: r1(z) r1(q) w1(y) r2(q) w2(z) since read operations of distinct transactions on the same data item commute. Hence, S is a serializable schedule

• S is not equivalent to T2, T1 since read and write operations (or two write operations) of distinct transactions on the same data item do not commute


Non-Serializable Schedule

• Example: course registration; cur_reg is the number of current registrants

T1: r(cur_reg : 29) w(cur_reg : 30)
T2: r(cur_reg : 29) w(cur_reg : 30)

• Schedule not equivalent to T1, T2 or T2, T1

• Database state no longer corresponds to real-world state, integrity constraint violated


Commutativity

• Two operations commute if, when executed in either order:
  – The values returned by both are the same and
  – The database is left in the same final state

• Two schedules are equivalent if one can be derived from the other by a series of simple interchanges of commutative operations

• A schedule is serializable if it is equivalent to a serial schedule
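This equivalence test can be mechanized: build a precedence graph with an edge Ti -> Tj whenever an operation of Ti precedes a non-commuting operation of Tj (same item, distinct transactions, at least one write); the schedule is conflict-serializable exactly when the graph is acyclic. A minimal Python sketch, assuming a schedule is represented as a list of (transaction, operation, item) tuples; the function names are illustrative, not from the text:

    from itertools import combinations

    def precedence_edges(schedule):
        edges = set()
        for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
            # Non-commuting pair: distinct transactions, same item, at least one write
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))           # t1's operation comes first in the schedule
        return edges

    def is_conflict_serializable(schedule):
        edges = precedence_edges(schedule)
        nodes = {t for t, _, _ in schedule}
        WHITE, GREY, BLACK = 0, 1, 2          # colours for a depth-first cycle search
        colour = {n: WHITE for n in nodes}
        def dfs(n):
            colour[n] = GREY
            for a, b in edges:
                if a == n:
                    if colour[b] == GREY or (colour[b] == WHITE and not dfs(b)):
                        return False          # back edge: the precedence graph has a cycle
            colour[n] = BLACK
            return True
        return all(dfs(n) for n in nodes if colour[n] == WHITE)

    # S = r1(z) r2(q) w2(z) r1(q) w1(y) -- the serializable example above
    S = [("T1", "r", "z"), ("T2", "r", "q"), ("T2", "w", "z"),
         ("T1", "r", "q"), ("T1", "w", "y")]
    print(is_conflict_serializable(S))        # True: the only edge is T1 -> T2

    # The course-registration interleaving: both transactions read cur_reg before either writes
    S2 = [("T1", "r", "cur_reg"), ("T2", "r", "cur_reg"),
          ("T1", "w", "cur_reg"), ("T2", "w", "cur_reg")]
    print(is_conflict_serializable(S2))       # False: T1 -> T2 and T2 -> T1 form a cycle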


Concurrency Control

• Performance requirements might not be achievable if all schedules are required to be serializable

• In addition to serializable, DBMSs implement less stringent isolation levels
  – Serializable schedules are correct for all applications
  – Less stringent levels do not guarantee correctness for all applications, but are correct for some

• The concurrency control of a DBMS is responsible for implementing isolation levels

• Application programmer is responsible for choosing appropriate level


Implementing Serializability: Two-Phase Locking

• Locks are associated with each data item

• A transaction must acquire a read (shared) or write (exclusive) lock on an item in order to read or write it

• A write lock on an item conflicts with all other locks on the item; a read lock conflicts with a write lock

• If T1 requests a lock on x and T2 holds a conflicting lock on x, T1 must wait


Lock Release

Two-Phase locking: All locks are acquired before any lock is released

[Figure: number of locks held vs. time t. Under two-phase locking, locks are acquired during phase 1 and released during phase 2. Strict: transaction holds all locks until completion.]
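A minimal Python sketch of a lock manager enforcing these rules (shared/exclusive conflicts, strict release only at completion). The class and method names are illustrative assumptions, not from the text; a real lock manager blocks the requester and queues waiters rather than raising an error:

    class LockConflict(Exception):
        """Raised when a requested lock conflicts; a real system would make the caller wait."""

    class LockManager:
        def __init__(self):
            self.locks = {}                    # item -> {"mode": "S"/"X", "holders": set of txn ids}

        def acquire(self, txn, item, mode):
            entry = self.locks.setdefault(item, {"mode": None, "holders": set()})
            if not entry["holders"] or entry["holders"] == {txn}:
                # Unlocked, or txn is the only holder (covers the S -> X upgrade)
                entry["mode"] = "X" if "X" in (mode, entry["mode"]) else "S"
                entry["holders"].add(txn)
            elif mode == "S" and entry["mode"] == "S":
                entry["holders"].add(txn)      # read locks are compatible with read locks
            else:
                raise LockConflict(f"{txn} must wait for {mode} lock on {item}")

        def release_all(self, txn):
            # Strictness: called only when the transaction completes, never earlier
            for item in list(self.locks):
                self.locks[item]["holders"].discard(txn)
                if not self.locks[item]["holders"]:
                    del self.locks[item]

    lm = LockManager()
    lm.acquire("T1", "x", "S")                 # T1 reads x
    lm.acquire("T2", "x", "S")                 # compatible: both hold read locks on x
    try:
        lm.acquire("T2", "x", "X")             # T2 wants to write x while T1 still reads it
    except LockConflict as err:
        print(err)                             # T2 must wait
    lm.release_all("T1")                       # T1 completes; all of its locks go at once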


Correctness of Strict Two-Phase Locking

• Intuition: Active transactions cannot have executed operations that do not commute (since locks required for non-commutative operations conflict)

• Hence, a schedule produced by a two-phase locking concurrency control is serializable since operations of concurrent transactions can always be reordered to produce a serial schedule


Non-Strict Concurrency Controls

• Non-strict controls: locks can be released before completion

• Problem: (Bank account example)

w1(Bal) u1(Bal) r2(Bal) w2(Cred-Lim) commit2 abort1     (u1 denotes T1 releasing its lock on Bal before completing)

– Although abort1 rolls Bal back, the new value of Cred-Lim might have been affected

• The new credit limit might have been based on a deposit that never happened

– T1 has an effect even though it is aborted

– Hence, atomicity is violated


Anomalies in Non-Serializable Schedules

• Dirty read (previous example – write lock given up early)
     w1(x) r2(x) abort1

• Non-Repeatable Read (read lock given up early)
     r1(x) w2(x) commit2 r1(x)

• Lost Update (result of non-repeatable read – read lock given up early)
  – Two transactions trying to deposit in the same bank account – the deposit of transaction 2 is lost
     r1(Bal) r2(Bal) w2(Bal) commit2 w1(Bal) commit1


Deadlock

• When a transaction can hold locks and request another lock (e.g., in two-phase locking), a cycle of waiting transactions can result:
  – Suppose two transactions are both trying to update the value of x (for example to deposit in the same bank account)

r1(x) r2(x) request_w1(x) request_w2(x)

• A transaction in the cycle must be aborted by DBMS (since transactions will wait forever)

• DBMS uses deadlock detection algorithms or timeout to deal with this
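A deadlock detector can be sketched as a cycle check on a waits-for graph: an edge Ti -> Tj means Ti is waiting for a lock held by Tj. The Python below is a hedged illustration of the idea, not any particular DBMS's algorithm:

    def find_deadlock(waits_for):
        """waits_for: dict mapping a transaction to the set of transactions it waits for.
        Returns a list of transactions forming a cycle, or None."""
        def visit(txn, path, on_path):
            for holder in waits_for.get(txn, ()):
                if holder in on_path:                       # cycle found
                    return path[path.index(holder):]
                cycle = visit(holder, path + [holder], on_path | {holder})
                if cycle:
                    return cycle
            return None
        for start in waits_for:
            cycle = visit(start, [start], {start})
            if cycle:
                return cycle
        return None

    # The slide's example: after r1(x) r2(x), each transaction requests w(x)
    # and waits for the read lock held by the other.
    print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))      # ['T1', 'T2'] -- abort one of them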


Locking in Relational Databases

• Locking entire table restricts concurrency

• Locking only rows returned yields new anomaly

SELECT *
FROM Transcript T
WHERE T.CrsCode = ‘CS305’ AND T.Semester = ‘F2000’

T1: execute SELECT
T2: insert a new row satisfying WHERE clause
T1: execute SELECT again

• Inserted row is called a phantom • Strict two-phase row locking does not prevent phantoms


ANSI Standard Isolation Levels

• Defined in terms of anomalies
  – Anomaly prohibited at one level is also prohibited at all higher levels

– READ UNCOMMITTED: all anomalies possible

– READ COMMITTED: dirty read prohibited

– REPEATABLE READ: reads of individual tuples are repeatable (but phantoms are possible)

– SERIALIZABLE: phantoms prohibited; transaction execution is serializable
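The level is selected by the application, typically with the standard SQL SET TRANSACTION statement issued before the transaction body. A hedged Python sketch, assuming a generic DB-API style driver; connect_to_dbms is hypothetical, and the exact statement and the levels actually supported vary by DBMS:

    def run_report(connect_to_dbms):
        conn = connect_to_dbms()               # hypothetical: any DB-API style driver
        cur = conn.cursor()
        # A read-only report can often tolerate non-repeatable reads and phantoms,
        # trading guaranteed correctness for more concurrency.
        cur.execute("SET TRANSACTION ISOLATION LEVEL READ COMMITTED")
        cur.execute("SELECT COUNT(*) FROM Transcript WHERE Semester = 'F2000'")
        print(cur.fetchone())
        conn.commit()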


Locks in Relational Databases

• DBMS guarantees that each SQL statement is isolated

• Early (non-strict) lock release used to implement levels
  – Short-term locks - held for duration of single statement

– Long-term locks - held until transaction completes (strict)

• At all levels, transactions obtain long-term write locks


Locking Implementation of Isolation Levels

• READ UNCOMMITTED - no read locks (dirty reads possible since transaction can read a write-locked item)

• READ COMMITTED - short-term read locks on rows (non-repeatable reads possible since transaction releases read lock after reading)

• REPEATABLE READ - long-term read locks on rows (phantoms possible)

• SERIALIZABLE - combination of table, row, and index locks


Snapshot Isolation

• Not an ANSI standard isolation level, but used in a major DBMS (Oracle)

• Multiversion database: The old value of an item is not overwritten when it is updated. Instead, a new version is created
  – DBMS can construct, for any i, the state of an item as a result of the execution of the first i transactions to commit
  – Snapshot: The database state produced by the execution of the first i transactions to commit (snapshots never change)


Definition of Snapshot Isolation

• No read locks necessary: a transaction reads all values from the latest snapshot at the time it started. Thus, read-only transactions do not wait.

• A (read/write) transaction T that has updated x can commit if no other transaction that updated x while T was executing has committed
  – First-committer-wins
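A minimal sketch of first-committer-wins over a multiversion store, in Python. The data structures (a global commit counter and a version list per item) are illustrative assumptions; a real DBMS implements this very differently:

    class SnapshotStore:
        def __init__(self):
            self.commit_seq = 0
            self.versions = {}              # item -> list of (commit_seq, value)

        def begin(self):
            return {"start": self.commit_seq, "writes": {}}

        def read(self, txn, item):
            if item in txn["writes"]:
                return txn["writes"][item]
            # Latest version committed no later than the transaction's snapshot
            for seq, value in reversed(self.versions.get(item, [])):
                if seq <= txn["start"]:
                    return value
            return None

        def write(self, txn, item, value):
            txn["writes"][item] = value     # buffered until commit

        def commit(self, txn):
            for item in txn["writes"]:
                for seq, _ in self.versions.get(item, []):
                    if seq > txn["start"]:  # someone else updated item and committed first
                        raise RuntimeError("first-committer-wins: abort")
            self.commit_seq += 1
            for item, value in txn["writes"].items():
                self.versions.setdefault(item, []).append((self.commit_seq, value))

    db = SnapshotStore()
    setup = db.begin(); db.write(setup, "x", 100); db.commit(setup)

    t1, t2 = db.begin(), db.begin()
    db.write(t1, "x", db.read(t1, "x") + 10)
    db.write(t2, "x", db.read(t2, "x") + 20)
    db.commit(t1)                           # commits: x -> 110
    db.commit(t2)                           # raises: T1 updated x and committed first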


Snapshot Isolation Evaluation

• Good performance

• Serializability not guaranteed. Example:
     r1(x) r1(y) r2(x) r2(y) w1(x) w2(y)
  Each transaction updates a different item, so first-committer-wins lets both commit, yet the result is not equivalent to either serial order

• Implementation complicated by need to maintain multiversion database. Eventually old versions must be discarded (creates problems for long-running transactions)


Lock Granularity

• Cost of implementing locks:
  – Space: data structure in DBMS for each lock
  – Time: handling of lock request and release

• Locks can be associated with different size items: row (fine granularity), page, table (coarse granularity)

• Tradeoff: Coarse granularity locks have lower cost than fine granularity locks, but they reduce concurrency


Multiple Granularity Locking

• Some DBMSs implement a lock granularity hierarchy based on containment
  – E.g., a read lock on a table implies read locks on all the table’s pages (or rows)

• Problem: A read lock on a table should conflict with a write lock on a page (or row) in the table
  – How does DBMS detect this since different items are involved?

• Solution: Intentions locks


Intentions Locks

• Before being granted a read (S) or write (X) lock on an item, a transaction must acquire an intention read (IS) or intention write (IX) lock on the containing item

• IS and IX locks do not conflict with each other

• Conflicts between locks at different levels in the hierarchy are detected at the intentions level
  – E.g., To update a row, T1 gets an IX lock on the table (and an X lock on the page containing the row). To read all rows in the table, T2 gets an S lock on the table. The S and IX locks on the table conflict.
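These conflict rules are usually summarized as a compatibility matrix. A small Python sketch; the matrix is the standard IS/IX/S/X table, while the helper name is illustrative:

    # Multiple-granularity lock compatibility (True = compatible, False = conflict).
    COMPATIBLE = {
        ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
        ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
        ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "X"): False,
        ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "X"): False,
    }

    def conflicts(held, requested):
        return not COMPATIBLE[(held, requested)]

    # The slide's example: T1 holds IX on the table (it updates a row),
    # T2 requests S on the table (it reads all rows) -- the conflict surfaces
    # at the table level even though T1's X lock is on a single page.
    print(conflicts("IX", "S"))    # True
    print(conflicts("IS", "IX"))   # False: intention locks do not conflict with each other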


Implementing Serializability

• Phantom: T1 reads table R using predicate P. T2 inserts row t that satisfies P
  – T2 needs IX lock on R and X lock on page containing t
  – If no index available, T1 holds S lock on R in order to scan entire table; this conflicts with T2’s IX lock on R, so there is no problem
  – If an index is available, T1 gets IS lock on R and, using the index, gets S locks only on pages containing rows satisfying P. Thus locking does not prevent phantoms since IS and IX do not conflict and t might be inserted in a different page


Implementing Serializability

• Solution: Lock the index. In addition:
  – T1 gets IS lock on index, and S locks on index pages it scans
  – Since t satisfies P, the index entry for t must be inserted in one of those pages. T2 gets IX lock on index (this is a slight simplification) and X lock on that page. Hence, conflict is detected


Atomicity and Durability

• Atomicity deals with failure:
  – User aborts transaction (e.g., cancel button)
  – System aborts transaction (e.g., deadlock)
  – Transaction aborts itself (e.g., unexpected db state)
  – System crashes

• Durability deals with failure:
  – Media failure

• Mechanism for dealing with failures is the log


Log

• Log:
  – Append-only sequence of records used to restore database to a consistent state after a failure
  – Stored on non-volatile device distinct from mass storage device that contains database
    • Survives processor crash and media failure

• Update record:
  – Appended to log when a transaction updates an item
  – Contains before image: value of item prior to update
  – Used to restore item when transaction is aborted


Aborting a Transaction

• Transaction abort:
  – Scan log backward; apply before image in each of the transaction’s update records to database items to restore them to their original state. Begin record terminates scan
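A minimal Python sketch of an undo log with before images. The record tuples and names are illustrative assumptions; a real log also carries after images, LSNs, and more. The comment marks the write-ahead rule discussed later: the update record is appended before the database item is changed.

    db = {"Bal": 100}
    log = []                                   # append-only list of records

    def begin(txn):
        log.append(("B", txn))

    def update(txn, item, new_value):
        # Write-ahead rule: append the update record (with the before image)
        # to the log before changing the database item.
        log.append(("U", txn, item, db[item]))
        db[item] = new_value

    def abort(txn):
        # Scan the log backward, re-installing before images for this transaction;
        # the begin record terminates the scan.
        for rec in reversed(log):
            if rec[0] == "U" and rec[1] == txn:
                _, _, item, before_image = rec
                db[item] = before_image
            elif rec == ("B", txn):
                break
        log.append(("A", txn))

    begin("T1")
    update("T1", "Bal", 150)
    abort("T1")
    print(db["Bal"])                           # 100: the deposit was rolled back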


Log

[Figure: log contents when T2 is aborted (B = begin, U = update):
 B(T1)  U(T1)  B(T2)  U(T1)  U(T2)  U(T1)  U(T2)
 The backward scan starts at the end of the log and ends at T2's begin record.]


Recovery From Crash

• Crash:
  – Active transactions must be identified and aborted when system recovers
  – Commit and abort records identify completed transactions. If, during a backward log scan, the first record encountered for T is an update record, T was active at time of crash and must be rolled back
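Continuing the illustrative record format from the earlier sketch, a recovery pass can identify the unfinished transactions during a single backward scan and undo their updates; again, a sketch under stated assumptions, not the full algorithm of any particular DBMS:

    # Crash recovery by backward scan. Records are ("B", txn), ("U", txn, item, before),
    # ("C", txn), ("A", txn).

    def recover(db, log):
        finished = set()                       # transactions with a commit or abort record
        for rec in reversed(log):
            kind, txn = rec[0], rec[1]
            if kind in ("C", "A"):
                finished.add(txn)
            elif kind == "U" and txn not in finished:
                # First record seen for an unfinished transaction is an update:
                # it was active at the crash, so undo it with the before image.
                _, _, item, before_image = rec
                db[item] = before_image
        return db

    # Log shaped like the figure below: T2 aborted, T3 committed, T1 and T4 active at the crash.
    log = [("B","T1"), ("U","T1","x",1), ("B","T2"), ("U","T1","x",2), ("U","T2","y",7),
           ("U","T1","x",3), ("B","T3"), ("A","T2"), ("U","T3","z",0), ("C","T3"),
           ("B","T4"), ("U","T4","y",7)]
    print(recover({"x": 9, "y": 8, "z": 5}, log))   # T1's and T4's updates are undone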


Log

[Figure: log contents when the crash occurs (B = begin, U = update, C = commit, A = abort):
 B(T1)  U(T1)  B(T2)  U(T1)  U(T2)  U(T1)  B(T3)  A(T2)  U(T3)  C(T3)  B(T4)  U(T4)
 The crash occurs at the end of the log; the backward recovery scan finds that T1 and T4 were active at the time of the crash, so they are rolled back on recovery.]


Commit

• Transaction is not committed until its commit record is in the log
  – A crash at any time before that causes transaction to be rolled back


Write-Ahead Log

• Both the log and the database must be updated when a transaction modifies an item. If a crash occurs between the two updates, the transaction must be aborted on recovery
  – Database updated first - On recovery, the item is in the new state but there is no before image to roll it back. The transaction cannot be aborted
  – Log updated first - On recovery, the item is in the old state and the before image is in the log. Using the before image has no effect, but the transaction can be aborted

• Hence the update record in the log must be written ahead of the update to the item in the database


Practical Database Systems

• Two writes to mass store for each database update imply intolerable performance

• Real world:
  – DBMS maintains cache of recently accessed pages in memory. Most accesses are to cache. Pages which have been updated are eventually written to disk
  – DBMS maintains log buffer in memory. Records appended to buffer until it fills; then buffer written to log
  – Maintaining write-ahead feature more complex when buffers taken into account


DBMS Organization


Media Failure

• Durability requires that database be stored redundantly

• Log can be used as second copy if:
  – Update records contain after image (as well as before image): new value of item
  – A snapshot, or dump, of the database is periodically stored in non-volatile memory

• Recovery: Starting with most recent dump, play the log forward, update database using after images appended subsequent to dump
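A sketch of that recovery step in Python, under the same illustrative assumptions but with the update record extended to carry an after image, as the slide requires. Updates of transactions that never committed would then be rolled back exactly as in crash recovery:

    # Media recovery (illustrative): restore the dump, then replay after images
    # from the log records appended after the dump was taken.
    # Update records here are ("U", txn, item, before_image, after_image).

    def restore_from_dump(dump, log, dump_position):
        db = dict(dump)                               # start from the snapshot
        for rec in log[dump_position:]:               # forward scan from the dump point
            if rec[0] == "U":
                _, _, item, _, after_image = rec
                db[item] = after_image                # redo the update
        return db

    dump = {"Bal": 100}
    log = [("B", "T1"), ("U", "T1", "Bal", 100, 150), ("C", "T1")]
    print(restore_from_dump(dump, log, 0))            # {'Bal': 150}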


Distributed Transactions

• Information supporting a large enterprise is generally stored on multiple computer systems scattered through the enterprise
  – E.g., a bank has a database at each branch recording local branch data and a database at the main office recording aggregate data

• Each DBMS is independent, supporting local transactions at a site

• With increased enterprise integration and automation, global, or distributed, transactions involving multiple sites must also be supported


Distributed Transaction

• A transaction that invokes local transactions
  – E.g., Bank transfer: invoke withdraw at one site and deposit at another

• A system that supports distributed transactions is often referred to as a multidatabase system

• In addition to the local integrity constraints that apply at each site, global integrity constraints might also exist
  – E.g., Aggregate bank assets at central site = sum of assets at each branch


ACIDity of Distributed Transaction

• Although a distributed transaction is consistent, maintaining isolation in a multidatabase is an important issue

• Even if local sites are serializable, subtransactions of two distributed transactions might be serialized in different orders at different sites
  – At site A, T1A is serialized before T2A
  – At site B, T2B is serialized before T1B


ACIDity of Distributed Transaction

• Although a distributed transaction is consistent, maintaining atomicity in a multidatabase is an important issue

• Guaranteeing that subtransactions of a distributed transaction either all commit or all abort in spite of failures (e.g., message loss, site crash) requires the use of a two-phase commit protocol


Two-Phase Commit Protocol

• Implemented as an exchange of messages between the coordinator and the cohorts
  – The cohorts are the individual subtransactions that participated in the transaction
  – The coordinator polls the cohorts to see if they want to commit


Two-Phase Commit Protocol

• Prepare message (coordinator to cohort):
  – If cohort wants to abort, it aborts
  – If cohort wants to commit, it moves all update log records to non-volatile store and forces a prepared record to its log
  – Cohort sends a (ready or aborting) vote message to coordinator


Two-Phase Commit Protocol

• Vote message (cohort to coordinator): Cohort indicates ready to commit or aborting
  – If any are aborting, coordinator decides abort
  – If all are ready, coordinator decides commit and forces commit record to its log
  – Coordinator sends commit/abort message to all cohorts that voted ready


Two-Phase Commit Protocol

• Commit/abort message (coordinator to cohort):
  – Cohort commits locally by forcing a commit record to its log. Or, if abort message, it aborts
  – Cohort sends done message to coordinator

• Done message (cohort to coordinator):
  – When coordinator receives done message from all cohorts, it writes a complete record to its log
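A compressed Python sketch of this message exchange, with the cohorts modeled as in-process objects. The class and method names (Coordinator, Cohort, prepare, decide) are illustrative assumptions; a real implementation forces records to stable storage at each step and handles lost messages and crashes with timeouts and a recovery protocol:

    class Cohort:
        def __init__(self, name, wants_to_commit=True):
            self.name, self.wants_to_commit, self.log = name, wants_to_commit, []

        def prepare(self):                       # phase 1: coordinator's prepare message
            if self.wants_to_commit:
                self.log.append("prepared")      # force a prepared record before voting ready
                return "ready"
            self.log.append("abort")
            return "aborting"

        def decide(self, decision):              # phase 2: coordinator's commit/abort message
            self.log.append(decision)
            return "done"

    class Coordinator:
        def __init__(self, cohorts):
            self.cohorts, self.log = cohorts, []

        def run(self):
            votes = {c: c.prepare() for c in self.cohorts}                        # phase 1
            decision = "commit" if all(v == "ready" for v in votes.values()) else "abort"
            self.log.append(decision)            # the decision exists once this record is forced
            acks = [c.decide(decision) for c, v in votes.items() if v == "ready"] # phase 2
            if len(acks) == sum(1 for v in votes.values() if v == "ready"):
                self.log.append("complete")      # all done messages received
            return decision

    print(Coordinator([Cohort("withdraw"), Cohort("deposit")]).run())             # commit
    print(Coordinator([Cohort("withdraw"), Cohort("deposit", False)]).run())      # abort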


Global Serializability

• Theorem: If all sites use a two-phase locking protocol and a two-phase commit protocol is used, transactions are globally serializable
  – Transactions are serialized in the same order at every site – the order in which the transactions committed


ACIDity of Distributed Transactions

• Global deadlock can be another result of implementing two-phase locking and two-phase commit protocols
  – At site A, T1A is waiting for a lock held by T2A
  – At site B, T2B is waiting for a lock held by T1B

• System uses deadlock detection algorithms or timeout to deal with this


Replication

• Information is often replicated in a distributed system
  – Performance enhancement possible: access to a local replica replaces network communication
  – Availability improved: if a site containing a data item is unavailable, access a replica at a different site

• Major implementation problem: how do you keep the replicas synchronized when a replicated data item is updated?


Implementation of Replication

• DBMSs provide replica control modules to make replication invisible to the application

• Typical implementation: read one/write all
  – When the application requests to read an item, replica control fetches the nearest copy
  – When the application requests to write an item, replica control updates all copies

• Compared with non-replicated systems, read performance is better and write performance is worse
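A minimal read-one/write-all sketch in Python; the ReplicaControl class and the site dictionaries are illustrative assumptions. It shows why reads get cheaper (one local access) while writes get more expensive (every copy is touched):

    class ReplicaControl:
        def __init__(self, sites, local_site):
            self.sites = sites                    # site name -> dict acting as that site's copy
            self.local_site = local_site

        def read(self, item):
            # Read one: use the "nearest" copy; here, simply the local replica.
            return self.sites[self.local_site][item]

        def write(self, item, value):
            # Write all: every replica is updated, one access per site.
            for copy in self.sites.values():
                copy[item] = value

    sites = {"branch_A": {"rate": 3}, "branch_B": {"rate": 3}, "main": {"rate": 3}}
    rc = ReplicaControl(sites, local_site="branch_A")
    print(rc.read("rate"))        # 3, served locally without network traffic
    rc.write("rate", 4)           # synchronous in this sketch: all three copies change together
    print(sites["main"]["rate"])  # 4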


Implementation of Replication

• Synchronous update systems: all replicas updated as part of transaction. Supports serializability, but performance bad, deadlocks frequent, and cannot handle disconnected sites

• Asynchronous update systems: one replica updated as part of transaction. Others updated after transaction commits. Performance better, deadlocks less frequent, and disconnected sites can be supported, but serializability sacrificed

• Practical systems are generally asynchronous


Correctness

• Application designers must be aware of the fact that real-world systems do not always support ACID executions even if all transactions are consistent
  – Isolation levels lower than SERIALIZABLE might be used
  – Two-phase commit protocol might not be used
  – Replication might use asynchronous update