Synchronization & Transactions
Zachary G. IvesUniversity of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
April 3, 2008
Some slide content by S. Davidson, R. Ramakrishnan, J. Gehrke
Reminders Lists of project team members due
TONIGHT by email
Homework 3 is now past due
Note: no class next week due to travel
2
3
SynchronizationThe issue:
What do we do when decisions need to be made in a distributed system? e.g., Who decides what action should be taken? Whose
conflicting option wins? Who gets killed in a deadlock? Etc.Some options:
Central decision point Distributed decision points with locking/scheduling
(“pessimistic”) Distributed decision points with arbitration
(“optimistic”) Transactions (to come)
4
Multi-process Decision Arbitration: Mutual Exclusion (i.e., Locking)
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is grantedb) Process 2 then asks permission to enter the same critical region. The coordinator does not
reply.c) When process 1 exits the critical region, it tells the coordinator, when then replies to 2
5
Distributed Locking: Still Based on a Central Arbitrator
a) Two processes want to enter the same critical region at the same moment.b) Process 0 has the lowest timestamp, so it wins.c) When process 0 is done, it sends an OK also, so 2 can now enter the critical
region.
6
Eliminating the Request/Response: A Token Ring Algorithm
a) An unordered group of processes on a network.
b) A logical ring constructed in software.
7
Comparison of Costs
Algorithm Messages per entry/exit
Delay before entry (in message times) Problems
Centralized 3 2 Coordinator crash
Distributed 2 ( n – 1 ) 2 ( n – 1 ) Crash of any process
Token ring 1 to 0 to n – 1 Lost token, process crash
8
Electing a Leader: The Bully Algorithm Is Won By Highest ID
Suppose the old leader dies… Process 4 needs a decision; it runs an election Process 5 and 6 respond, telling 4 to stop Now 5 and 6 each hold an election
9
Totally-Ordered Multicasting: Arbitrates via Priority Updating a replicated database and leaving it in an
inconsistent state.
10
Another Approach: Time We can choose the decision that
happened first
… But how do we know what happened first?
11
Clock Synchronization
When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
12
The Berkeley Algorithm: Central Calibration
a) The time daemon asks all the other machines for their clock valuesb) The machines answerc) The time daemon tells everyone how to adjust their clock
13
The Magic Bullet: GPS GPS sends a time sync signal that’s quite
accurate (exact precision varies depending on situation)
Many distributed devices now use this clock to sync themselves … But of course not everything has GPS…
14
An Alternative: Logical Time We don’t really need to know times, as
long as we know relative ordering
Lamport clocks: create a partial ordering that respects causality
Vector clocks: more precisely capture causality
15
Lamport Clocks Each site maintains a clock value LC
Whenever an item publishes a timed item, use LC and postincrement it
Append clock on every published message When we receive a message with a bigger LC
timestamp, say LC2, set our local LC := LC2 + 1
Note that we always use a clock “later than” what we’ve heard – but a bigger LC value doesn’t mean that we have heard the previous message, or that there’s any causal ordering…
16
Vector Clocks Given N nodes, each node needs to keep a Lamport
clock for all of the other nodes, as well as its own logical clock!
Node i keeps LCi and TSj for every j ≠ i TSi[i] is another name for LCi TSj[j] is i’s copy of the last timestamp it has seen from j
Each time a site generates a new event, it increments its logical clock
Each time it hears an event from j, it updates TSj and then sets its clock to the max of TSj and LCi
So now one can test: Whether two events are causally related from the same node Whether one node logically succeeds another
17
More on Vector Clocks Their ordering exactly reflects causality – we can
compare two vector clocks to determine which event happens first
e1 occurs before e2 iff TS(e2) dominates TS(e1), i.e., TS(e1)[i] ≤ TS(e2)[i] for every i
If e1 < e2 then LC(e1) < LC(e2) and TS(e2) dominates TS(e1)
If TS(e2) doesn’t dominate TS(e1) then e1 doesn’t precede e2
18
A Final Note on Vector Clocks If information is only passed in one direction,
i.e., A B, then we may be able to tell when a B message came after an A message
But what about whether a 2nd A message occurred after the resulting B message?
“What does A know about what B knows about A?”
Need ACKs!
19
Synchronization Several general approaches:
Mutex Consensus Clock synchronization (establishes
chronological or causal ordering)
20
We Need More Than Synchronization
What needs to happen when you… Click on “purchase” on Amazon?
Suppose you purchased by credit card? Use online bill-paying services from your bank? Place a bid in an eBay-like auction system? Order music from iTunes?
What if your connection drops in the middle of downloading?
Is this more than a case of making a simple Web Service (-like) call?
21
Transactions Are a Means of Handling Failures There are many (especially, financial)
applications where we want to create atomic operations that either commit or roll back
This is one of the most basic services provided by database management systems, but we want to do it in a broader sense
Part of “ACID” semantics…
22
ACID Semantics Atomicity: operations are atomic, either
committing or aborting as a single entity Consistency: the state of the data is
internally consistent Isolation: all operations act as if they
were run by themselves Durability: all writes stay persistent!
23
A Problem Confronted by eBay eBay wants to sell an item to:
The highest bidder, once the auction is over, or The person who’s first to click “Buy It Now!”
But: What if the bidder doesn’t have the cash?
A solution: Record the item as sold Validate the PayPal or credit card info with a 3rd
party If not valid, discard this bidder and resume in prior
state
24
“No Payment” Isn’t the Only Source of Failure Suppose we start to transfer the money,
but a server goes down…Purchase:sb = Seller.balbb = Buyer.balWrite Buyer.bal= bb - $100
Write Item.sellTo = Buyer
Write Seller.bal= sb + $100
CRASH!
25
Providing Atomicity and Consistency Database systems provide transactions with the
ability to abort a transaction upon some failure condition Based on transaction logging – record all operations and
undo them as necessary Database systems also use the log to perform
recovery from crashes Undo all of the steps in a partially-complete transaction Then redo them in their entirety This is part of a protocol called ARIES
These can be the basis of persistent storage, and we can use middleware like J2EE to build distributed transactions with the ability to abort database operations if necessary
26
The Need for Isolation Suppose eBay seller S has a bank account
that we’re depositing money into, as people buy:
What if two purchases occur simultaneously, from two different servers on different continents?
S = Accounts.Get(1234)Write S.bal = S.bal + $50
27
Concurrent Deposits This update code is represented as a
sequence of read and write operations on “data items” (which for now should be thought of as individual accounts):
where S is the data item representing the seller’s account # 1234
Deposit 1 Deposit 2Read(S.bal) Read(S.bal)S.bal := S.bal + $50 S.bal:= S.bal + €10Write(S.bal) Write(S.bal)
28
A “Bad” Concurrent Execution
Only one action (e.g. a read or a write) can actually happen at a time for a given database, and we can interleave deposit operations in many ways:
Deposit 1 Deposit 2Read(S.bal) Read(S.bal)S.bal := S.bal + $50 S.bal:= S.bal + €10Write(S.bal) Write(S.bal)
time
BAD!
29
A “Good” Execution Previous execution would have been fine if the
accounts were different (i.e. one were S and one were T), i.e., transactions were independent
The following execution is a serial execution, and executes one transaction after the other:
Deposit 1 Deposit 2Read(S.bal) S.bal := S.bal + $50 write(S.bal) Read(S.bal) S.bal:= S.bal + $10 Write(S.bal)
time
GOOD!
30
Good Executions An execution is “good” if it is serial (transactions are
executed atomically and consecutively) or serializable (i.e. equivalent to some serial execution)
Equivalent to executing Deposit 1 then 3, or vice versa Why would we want to do this instead?
Deposit 1 Deposit 3read(S.bal) read(T.bal)S.bal := S.bal + $50 T.bal:= T.bal + €10write(S.bal) write(T.bal)
31
Concurrency Control A means of ensuring that transactions are
serializable There are many methods, of which we’ll
see one Lock-based concurrency control (2-phase
locking) Optimistic concurrency control (no locks –
based on timestamps) Multiversion CC …
Lock-Based Concurrency Control Strict Two-phase Locking (Strict 2PL) Protocol:
Each transaction must obtain: a S (shared) lock on object before reading an X (exclusive) lock on object before writing An owner of an S lock can upgrade it to X if no one else is
holding the lock
All locks held by a transaction are released when the transaction completes Locks are handled in a “growing” phase, then a
“shrinking” phase (Non-strict) 2PL Variant:
Release locks anytime, but cannot acquire locks after releasing any lock.
33
Benefits of Strict 2PL Strict 2PL allows only serializable
schedules. Additionally, it simplifies transaction aborts (Non-strict) 2PL also allows only serializable
schedules, but involves more complex abort processing
Aborting a Transaction If a transaction Ti is aborted, all its actions have
to be undone Not only that, if Tj reads an object last written by Ti, Tj
must be aborted as well! Most systems avoid such cascading aborts by
releasing a transaction’s locks only at commit time If Ti writes an object, Tj can read this only after Ti
commits Actions are undone by consulting the transaction
log mentioned earlier
The Transaction Log The following actions are recorded in the log:
Ti writes an object: the old value and the new value Log record must go to disk before the changed page does!
Ti commits/aborts: a log record indicating this action Log records are chained together by transaction
id, so it’s easy to undo a specific transaction Log is often mirrored and archived on stable
storage
Another Benefit of the Log:Recovering From a Crash 3 phases in the ARIES recovery algorithm:
Analysis Scan the log forward (from the most recent checkpoint)
to identify all pending transactions, unwritten pages Redo
Redo all updates to unwritten pages in the buffer pool, to ensure that all logged updates are in fact carried out and written to disk
Undo Undo all writes done by incomplete transactions by
working backwards in the log (Care must be taken to handle the case of a crash
occurring during the recovery process!)
A Danger with Locks: Deadlocks Deadlock: Cycle of transactions waiting for
locks to be released by each other Two ways of dealing with deadlocks:
Deadlock prevention Deadlock detection
Deadlock Prevention Assign priorities based on timestamps Assume Ti wants a lock that Tj holds Do one of:
Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts
Lower-priority transactions wait for higher-priority Wound-wait: If Ti has higher priority, Tj aborts; otherwise Ti
waits Higher-priority transactions never wait for lower-priority
If a transaction re-starts, make sure it has its original timestamp Keeps it from always getting aborted!
39
Database Transactions and Concurrency Control, Summarized The basic goal was to guarantee ACID
properties Transactions and logging provide Atomicity and
Consistency Locks ensure Isolation The transaction log (and RAID, backups, etc.) are
also used to ensure Durability
So far, we’ve been in the realm of databases, though, not distributed systems…
40
OK, this Works for Databases;What about Distributed Systems? We generally rely on a
middleware layer called application servers, aka TP monitors, to provide transactions across systems Tuxedo, iPlanet, WebSphere, etc. For atomicity, two-phase commit
protocol For isolation, need distributed
concurrency controlDB DB
TransactServer
TransactServer
WorkflowController
MsgQueue
WebServer
App Server
Client
Two-Phase Commit (2PC) Site at which a transaction originates is the
coordinator; other sites at which it executes are subordinates
Two rounds of communication, initiated by coordinator: Voting
Coordinator sends prepare messages, waits for yes or no votes
Then, decision or termination Coordinator sends commit or rollback messages, waits for
acks Any site can decide to abort a transaction!
42
Steps in 2PCWhen a transaction wants to commit:
Coordinator sends prepare message to each subordinate Subordinate force-writes an abort or prepare log record
and then sends a no (abort) or yes (prepare) message to coordinator
Coordinator considers votes: If unanimous yes votes, force-writes a commit log record
and sends commit message to all subordinates Else, force-writes abort log rec, and sends abort message
Subordinates force-write abort/commit log records based on message they get, then send ack message to coordinator
Coordinator writes end log record after getting all acks
43
Illustration of 2PCCoordinator Subordinate 1Subordinate 2force-write begin log entry
force-write prepared log entryforce-writeprepared log entry
send “prepare” send “prepare”
send “yes”send “yes”force-writecommit log entry
send “commit” send “commit”force-writecommit log entryforce-writecommit log entry
send “ack”send “ack”writeend log entry
Comments on 2PC Every message reflects a decision by the sender;
to ensure that this decision survives failures, it is first recorded in the local log
All log records for a transaction contain its ID and the coordinator’s ID The coordinator’s abort/commit record also includes IDs
of all subordinates Thm: there exists no distributed commit protocol
that can recover without communicating with other processes, in the presence of multiple failures!
What if a Site Fails in the Middle? If we have a commit or abort log record for transaction T,
but not an end record, we must redo/undo T If this site is the coordinator for T, keep sending commit/abort
msgs to subordinates until acks have been received If we have a prepare log record for transaction T, but not
commit/abort, this site is a subordinate for T Repeatedly contact the coordinator to find status of T, then write
commit/abort log record; redo/undo T; and write end log record If we don’t have even a prepare log record for T,
unilaterally abort and undo T This site may be coordinator! If so, subordinates may send
messages and need to also be undone
Blocking for the Coordinator If coordinator for transaction T fails,
subordinates who have voted yes cannot decide whether to commit or abort T until coordinator recovers T is blocked Even if all subordinates know each other (extra
overhead in prepare msg) they are blocked unless one of them voted no
Link and Remote Site Failures If a remote site does not respond during the
commit protocol for trnasaction T, either because the site failed or the link failed: If the current site is the coordinator for T, should
abort T If the current site is a subordinate, and has not
yet voted yes, it should abort T If the current site is a subordinate and has voted
yes, it is blocked until the coordinator responds!
Observations on 2PC Ack msgs used to let coordinator know
when it’s done with a transaction; until it receives all acks, it must keep T in the transaction-pending table
If the coordinator fails after sending prepare msgs but before writing commit/abort log recs, when it comes back up it aborts the transaction
49
From Distributed Commits toDistributed Concurrency Control What we saw were the steps involved in
preserving atomicity and consistency in a distributed fashion
Let’s briefly look at distributed isolation (locking)…
Distributed LockingHow do we manage locks across many
sites? Centralized: One site does all locking
Vulnerable to single site failure Primary Copy: All locking for an object done at
the primary copy site for this object Reading requires access to locking site as well as site
where the object is stored Fully Distributed: Locking for a copy done at
site where the copy is stored Locks at all sites holding the object being written
Distributed Deadlock Detection
Each site maintains a local waits-for graph A global deadlock might exist even if the local
graphs contain no cycles:T1 T1 T1T2 T2 T2
SITE A SITE B GLOBAL
Three solutions: Centralized (send all local graphs to one site) Hierarchical (organize sites into a hierarchy and
send local graphs to parent in the hierarchy) Timeout (abort transaction if it waits too long)
52
Summary of Transactions and Concurrency There are many (especially monetary) transfers
that need atomicity and isolation Transactions and concurrency control provide
these features In a distributed, 3-tier setting they run in an Application
Server Similar features are provided in a 2-tier setting for
applications that run directly in the DBMS Two-phase locking ensures isolation Two-phase commit is a voting scheme for doing
distributed commit