CSS434 Replication 1
CSS434 Distributed Transactions and Replication
Textbook Ch 14 - 15
Professor: Munehiro Fukuda

Outline
• Distributed transaction
  • Two-phase commitment protocol
• File replication
  • Group communication revisited
  • Primary copy replication
  • Active replication
    • Read-any-write-all protocol
    • Available-copies protocol
    • Quorum-based protocol

Distributed Transaction
Example: Banking Transaction
[Figure: a client runs transaction T against accounts A, B, C, and D held at the participant servers BranchX, BranchY, and BranchZ. Each participant joins the transaction; for example, the client's b.withdraw(3) reaches its server as b.withdraw(T, 3).]
T = openTransaction
    a.withdraw(4);
    c.deposit(4);
    b.withdraw(3);
    d.deposit(3);
closeTransaction
Note: the coordinator is in one of the servers, e.g. BranchX

Transaction Commitment
How can all participant servers either commit a transaction or abort it?
• One-phase atomic commit protocol
  • The coordinator keeps requesting all participants to commit until they return an acknowledgment.
  • A participant has no chance to initiate an abort.
• Two-phase commit protocol
  • Phase 1: the coordinator calls for the participants' votes.
  • Phase 2: complete a commit or an abort according to the outcome of the vote.

Two-Phase Commit Protocol
Operations
• canCommit?(trans) -> Yes / No
  Call from coordinator to participant to ask whether it can commit a transaction. Participant replies with its vote.
• doCommit(trans)
  Call from coordinator to participant to tell participant to commit its part of a transaction.
• doAbort(trans)
  Call from coordinator to participant to tell participant to abort its part of a transaction.
• haveCommitted(trans, participant)
  Call from participant to coordinator to confirm that it has committed the transaction.
• getDecision(trans) -> Yes / No
  Call from participant to coordinator to ask for the decision on a transaction after it has voted Yes but has still had no reply after some delay. Used to recover from server crash or delayed messages.
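
Two-Phase Commit Protocol
Communication
[Figure: message exchange between the coordinator and a participant, with the status of each side at every step.]
1. Coordinator: prepared to commit (waiting for votes); sends canCommit? to the participant.
2. Participant: prepared to commit (uncertain); replies Yes.
3. Coordinator: committed; sends doCommit.
4. Participant: committed; replies haveCommitted. The coordinator is then done.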

Two-Phase Commit Protocol
State Transition
[Figure: state transition diagrams for the coordinator and two workers. Each transition is labeled "message received / message sent".]
Coordinator:
• INIT -> WAIT: Client_wants_to_commit / CanCommit?
• WAIT -> ABORT: Vote-No / doAbort
• WAIT -> COMMIT: Vote-Yes / doCommit
Worker 1 and Worker 2 (identical diagrams):
• INIT -> READY: CanCommit? / Vote-Yes
• INIT -> ABORT: CanCommit? / Vote-No
• READY -> COMMIT: doCommit / Ack
• READY -> ABORT: doAbort / Ack
Other possible cases:
• The coordinator did not receive a Vote-Yes from every worker. → It times out and sends a doAbort.
• A worker did not receive a CanCommit?. → All workers eventually receive a doAbort.
• A worker did not receive a doCommit. → It times out and checks the other workers' status.
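
The two phases and the timeout case above can be sketched in a few lines of Python. This is only an illustrative, single-process model: the `Participant` class, its `will_vote_yes` and `reachable` flags, and the `two_phase_commit` function are hypothetical names introduced here, and a `TimeoutError` stands in for a vote that never arrives. A real implementation would also log every state change to stable storage.

```python
from enum import Enum


class Vote(Enum):
    YES = "Yes"
    NO = "No"


class Participant:
    """Hypothetical in-memory worker; states follow the INIT/READY/ABORT/COMMIT diagram."""

    def __init__(self, name, will_vote_yes=True, reachable=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.reachable = reachable
        self.state = "INIT"

    def can_commit(self, trans):
        if not self.reachable:
            # Models a lost CanCommit? or a lost vote (assumption of this sketch).
            raise TimeoutError(f"{self.name} did not answer canCommit?")
        if self.will_vote_yes:
            self.state = "READY"  # prepared to commit (uncertain)
            return Vote.YES
        self.state = "ABORT"      # a worker that votes No aborts right away
        return Vote.NO

    def do_commit(self, trans):
        self.state = "COMMIT"

    def do_abort(self, trans):
        self.state = "ABORT"


def two_phase_commit(trans, participants):
    """Return 'COMMIT' only if every participant votes Yes, otherwise 'ABORT'."""
    # Phase 1: collect votes; a missing vote (timeout) counts as No.
    all_yes = True
    for p in participants:
        try:
            if p.can_commit(trans) is not Vote.YES:
                all_yes = False
        except TimeoutError:
            all_yes = False
    # Phase 2: send the decision to every reachable participant.
    for p in participants:
        if not p.reachable:
            continue  # it would learn the outcome later, e.g. via getDecision
        if all_yes:
            p.do_commit(trans)
        else:
            p.do_abort(trans)
    return "COMMIT" if all_yes else "ABORT"
```

For example, `two_phase_commit("T", [Participant("w1"), Participant("w2", will_vote_yes=False)])` ends with both workers in the ABORT state.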

File Replication
Concepts
• Difference between replication and caching
  • A replica is associated with a server, whereas a cache is associated with a client.
  • A replica focuses on availability, while a cache focuses on locality.
  • A replica is more persistent than a cache.
  • A cache is contingent upon a replica.
• Advantages
  • Increased availability/reliability
  • Performance enhancement (response time and network traffic)
  • Scalability and autonomous operation
• Requirements
  • Naming: clients need not be aware of multiple replicas.
  • Consistency: data consistency among replicated files.
  • Replication control: explicit vs. implicit/lazy replication
  • ACID: Atomicity, Consistency, Isolation, and Durability

File Replication
Basic Architectural Model
[Figure: clients send requests through front ends to a group of replica managers.]
1. Request: send a client request to a server.
2. Coordination: deliver the request to each replica manager in some order.
3. Execution: process a client request but do not permanently commit it.
4. Agreement: agree on whether the execution will be committed (e.g. two-phase commit protocol).
5. Response: respond to the front end.
Ex: DNS, Web server

Review: Group Communication
• Group membership service
  • Create and destroy a group.
  • Add or withdraw a replica manager to/from a group.
  • Detect a failure.
  • Notify members of group membership changes.
  • Provide clients with a group address.
• Message delivery
  • Absolute ordering
  • Consistent ordering
[Figure: a client sends a message to a group of replica managers.]

Review: Group Communication
Example: ISIS
[Figure: group view over time for processes p1 to p4. A process joins the group and messages are multicast; one member crashes in the middle of a multicast, so the message reaches only some members (deleted or delivered?); later multicasts go to the available processes only; a process rejoins.]
In ISIS, if p4 receives this partially multicast message at the same time as it learns that p3 has crashed, it forwards the message to all the others and immediately sends a flush message. In other words, p1, p2, and p4 receive this multicast message as if p3 were still alive.

Review: Group Communication
Absolute Ordering - Linearizability
• Rule: mi must be delivered before mj if Ti < Tj
• Implementation:
  • A clock synchronized among machines
  • A sliding time window used to commit the delivery of messages whose timestamps fall in this window.
• Example: Distributed simulation
• Drawback
  • Too strict a constraint
  • No absolutely synchronized clock
  • No guarantee to catch all tardy messages
[Figure: messages mi and mj, timestamped Ti < Tj, are delivered to every receiver in timestamp order.]
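
One way to picture the sliding time window is a hold-back queue: each receiver buffers incoming messages and releases them in timestamp order once they are at least one window old. The sketch below is only an assumption about how such a window could work; `HoldBackQueue`, `window`, and the numeric timestamps are hypothetical, and all clocks are assumed to be perfectly synchronized. It also shows the drawback above: a tardy message that arrives after a later one has been delivered can only be rejected.

```python
import heapq


class HoldBackQueue:
    """Release messages in timestamp order after holding each for `window` time units."""

    def __init__(self, window):
        self.window = window
        self._heap = []  # (timestamp, message), smallest timestamp first
        self.last_delivered = float("-inf")

    def receive(self, timestamp, message):
        # A tardy message is older than one already delivered,
        # so it can no longer be placed in timestamp order.
        if timestamp < self.last_delivered:
            return False
        heapq.heappush(self._heap, (timestamp, message))
        return True

    def deliver_ready(self, now):
        """Deliver every buffered message whose timestamp is at least `window` old."""
        delivered = []
        while self._heap and self._heap[0][0] <= now - self.window:
            timestamp, message = heapq.heappop(self._heap)
            self.last_delivered = timestamp
            delivered.append(message)
        return delivered
```

A larger window catches more tardy messages but delays every delivery by the same amount, which is why the constraint is considered too strict for most applications.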

Review: Group Communication
Total Ordering - Sequential Consistency
• Rule: Messages are received in the same order at every receiver (regardless of their timestamps).
• Implementation:
  • A message is sent to a sequencer, assigned a sequence number, and finally multicast to the receivers.
  • Each receiver retrieves messages in increasing order of sequence number.
• Example: Replicated database update
• Drawback: A centralized algorithm
[Figure: messages mi and mj with Ti < Tj; every receiver gets them in the same order, which may differ from timestamp order.]
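
The sequencer scheme can be sketched as follows. The class names `Sequencer` and `Receiver` and their methods are hypothetical, and the network is modeled by plain method calls, so this only illustrates the ordering logic, not a real multicast.

```python
class Sequencer:
    """Central process that stamps each message with the next sequence number."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, message):
        seq = self.next_seq
        self.next_seq += 1
        return seq, message


class Receiver:
    """Delivers messages strictly in sequence-number order, buffering early arrivals."""

    def __init__(self):
        self.expected = 0
        self.pending = {}    # seq -> message, waiting for earlier numbers
        self.delivered = []

    def on_multicast(self, seq, message):
        self.pending[seq] = message
        # Deliver as many consecutive messages as are now available.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
```

Two receivers that get the same stamped messages in different network orders still end up with identical `delivered` lists. The single `Sequencer` instance is the centralized bottleneck named in the drawback.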

Multi-copy Update Problem
Keeping in mind the basic architecture and group communication models, how can we update multiple copies over replica servers?
• Read-only replication
  • Allow the replication of only immutable files.
• Primary backup replication
  • Designate one copy as the primary copy and all the others as secondary copies.
• Active backup replication
  • Access any or all of the replicas
    • Read-any-write-all protocol
    • Available-copies protocol
    • Quorum-based consensus

Primary-Copy Replication
[Figure: clients reach the primary replica manager through front ends; the primary forwards updates to the backup replica managers.]
1. Request: The front end sends a request to the primary replica.
2. Coordination: The primary takes each request atomically.
3. Execution: The primary executes the request and stores the results.
4. Agreement: The primary sends the updates to all the backups and receives an ack from them.
5. Response: The primary replies to the front end.
• Advantage: easy to implement, linearizable, copes with n-1 crashes.
• Disadvantage: large overhead, especially if the failing primary must be replaced with a backup.
• Ex: Sun NIS (Yellow Pages)
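
The five steps can be traced in a minimal in-memory sketch. `Primary`, `Backup`, `handle_write`, and `promote` are hypothetical names, a key-value dictionary stands in for the replicated files, and failure detection is reduced to a `ConnectionError`. It is meant to show the flow of an update, not a complete failover design.

```python
class Backup:
    def __init__(self):
        self.store = {}
        self.alive = True

    def apply(self, key, value):
        if not self.alive:
            raise ConnectionError("backup is down")
        self.store[key] = value
        return "ack"


class Primary:
    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def handle_write(self, key, value):
        # 3. Execution: the primary executes and stores the result.
        self.store[key] = value
        # 4. Agreement: push the update to every backup and collect acks.
        acks = 0
        for backup in self.backups:
            try:
                if backup.apply(key, value) == "ack":
                    acks += 1
            except ConnectionError:
                pass  # a crashed backup is simply skipped in this sketch
        # 5. Response: reply to the front end.
        return {"ok": True, "acks": acks}

    def handle_read(self, key):
        return self.store.get(key)


def promote(backup, remaining_backups):
    """Hypothetical failover: a surviving backup becomes the new primary."""
    new_primary = Primary(remaining_backups)
    new_primary.store = dict(backup.store)
    return new_primary
```

Because every acknowledged update is on all live backups, any one of them can be promoted, which is how the scheme copes with n-1 crashes.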

Active Replication
[Figure: each front end multicasts client requests to all replica managers.]
1. Request: The front end multicasts to all replicas.
2. Coordination: All replicas take the requests in the same sequential order.
3. Execution: Every replica executes the request.
4. Agreement: No agreement needed.
5. Response: Each replica replies to the front end.
• Advantage: achieves sequential consistency, copes with (n/2 - 1) Byzantine failures
• Disadvantage: no longer linearizable

Read-Any-Write-All Protocol
• Read
  • Lock any one of the replicas for a read.
• Write
  • Lock all of the replicas for a write.
• Sequential consistency
• Cannot tolerate even one failing replica upon a write.
[Figure: a front end reads from any one replica manager and writes to all of them.]

Available-Copies Protocol
• Read
  • Lock any one of the replicas for a read.
• Write
  • Lock all available replicas for a write.
• Recovering replica
  • Brings itself up to date by copying from other servers before accepting any user request.
• Better availability
• Cannot cope with network partition (inconsistency between the two sub-divided network groups).
[Figure: a front end reads from any one replica manager and writes to all available replicas, skipping a failed one (marked X).]
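
The difference between read-any-write-all and available-copies shows up as soon as one replica fails. The sketch below is a simplified model under stated assumptions: `Replica`, `write_all`, `write_available`, and `recover` are hypothetical names, locking is omitted, and failure is a simple `available` flag that every caller sees consistently (which is exactly what a network partition would break).

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.available = True
        self.value = None


def read_any(replicas):
    """Read from the first available replica."""
    for replica in replicas:
        if replica.available:
            return replica.value
    raise RuntimeError("no replica available")


def write_all(replicas, value):
    """Read-any-write-all: the write fails if any replica is unavailable."""
    if not all(replica.available for replica in replicas):
        return False
    for replica in replicas:
        replica.value = value
    return True


def write_available(replicas, value):
    """Available-copies: write to every replica that is currently available."""
    targets = [replica for replica in replicas if replica.available]
    if not targets:
        return False
    for replica in targets:
        replica.value = value
    return True


def recover(replica, replicas):
    """Copy the current value from an available peer before serving requests again."""
    peers = [other for other in replicas if other is not replica]
    replica.value = read_any(peers)
    replica.available = True
```

With one of three replicas down, `write_all` returns `False` while `write_available` succeeds; the failed replica holds a stale value until `recover` brings it up to date.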

Available-Copies Protocol
Example 1: Gossip
[Figure: clients talk to front ends (FE, timestamp Tf). An FE sends "Query, Tf" or "Update, Tf" to a replica manager RMi (timestamp Ti) or RMj (timestamp Tj), which returns "Value, Ti" or an update id. Replica managers exchange gossip messages, e.g. from RMj to RMk.]
• Query: if (Tf < Ti) return the value; else wait for RMi to be updated, or query RMj/RMk.
• Update: if (Tf > Tj) update RMj; else update the client, or ignore and update RMj.
• Gossip: if (Tj > Tk) update RMk; else discard the gossip message.
• Categorized as a lazy available-copies protocol.
• Tardy messages are ignored.

Available-Copies Protocol
Example 2: Bayou
[Figure: clients send updates through front ends to replica managers, one of which is the primary. The update log holds committed updates C0, C1, C2, ..., CN followed by tentative updates T0, T1, T2, T3, ..., Tn, Tn+1. Example: a secretary and other employees book 3pm (sent first); an executive books 3pm (sent later).]
To make a tentative update committed:
• Perform a dependency check
  • Check conflicts
  • Check priority
• Merge procedure
  • Cancel tentative updates
  • Change tentative updates
• Categorized as a lazy available-copies protocol.
• Tardy messages are reordered or merged.

Network Partitions
Well-known Solution: Quorum-Based Protocols
[Figure: front ends access two overlapping subsets of the replica managers, a read quorum and a write quorum.]
• #replicas in read quorum + #replicas in write quorum > n
• Read
  • Retrieve the read quorum.
  • Select the replica with the latest version.
  • Perform a read on it.
• Write
  • Retrieve the write quorum.
  • Find the latest version and increment it.
  • Perform a write on the entire write quorum.
• If a sufficient number of replicas cannot be obtained for the read/write quorum, the operation must be aborted.
• Read-any-write-all: r = 1, w = n
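
The read and write procedures can be sketched as below. `QuorumStore`, `Replica`, and the `reachable` flag are hypothetical names, and a quorum is formed simply from the first reachable replicas, so this is a single-process illustration of the r + w > n rule rather than a networked implementation.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None
        self.reachable = True


class QuorumStore:
    def __init__(self, replicas, r, w):
        n = len(replicas)
        if r + w <= n:
            raise ValueError("need r + w > n so read and write quorums overlap")
        self.replicas, self.r, self.w = replicas, r, w

    def _gather(self, size):
        """Return `size` reachable replicas, or None if the quorum cannot be formed."""
        reachable = [rep for rep in self.replicas if rep.reachable]
        return reachable[:size] if len(reachable) >= size else None

    def read(self):
        quorum = self._gather(self.r)
        if quorum is None:
            raise RuntimeError("read quorum unavailable; abort")
        # At least one member of any read quorum saw the latest write.
        latest = max(quorum, key=lambda rep: rep.version)
        return latest.value

    def write(self, value):
        quorum = self._gather(self.w)
        if quorum is None:
            raise RuntimeError("write quorum unavailable; abort")
        new_version = max(rep.version for rep in quorum) + 1
        for rep in quorum:
            rep.version, rep.value = new_version, value
```

With n = 5, r = 2, and w = 4, a write reaches four replicas, so any two reachable replicas include at least one with the newest version. If only two replicas remain reachable, reads still succeed but writes are aborted.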

Network Partitions
System example: Coda
1. Normal case:
• Read-any, write-all protocol
• Whenever a client writes back its file, it increments the file version at each server.
• Before a write W: Server 1 Version[1,1,1], Server 2 Version[1,1,1], Server 3 Version[1,1,1]
• After the write W: Server 1 Version[2,2,2], Server 2 Version[2,2,2], Server 3 Version[2,2,2]
2. Network disconnection:
• A client writes back its file to only the available servers.
• Version conflicts are detected and resolved automatically when the network is reconnected.
• After writes W on both sides of the partition: Server 1 Version[3,3,2], Server 2 Version[3,3,2], Server 3 Version[2,2,3]
3. Client disconnection:
• A client caches as many files as possible (in hoard walking).
• A client works locally if disconnected (in emulation mode).
• A client writes back updated files to servers (in reintegration mode).
[Figure: client states: hoard, emulation, reintegration.]
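
Conflict detection with version vectors like those above comes down to an element-wise comparison. The function below is a generic sketch of that comparison, not Coda's actual code; the name `compare` and its return labels are chosen here for illustration.

```python
def compare(v1, v2):
    """Compare two version vectors of equal length.

    Returns 'equal', 'first_newer', 'second_newer', or 'conflict'.
    """
    first_ahead = any(a > b for a, b in zip(v1, v2))
    second_ahead = any(a < b for a, b in zip(v1, v2))
    if first_ahead and second_ahead:
        return "conflict"  # each side saw a write the other missed
    if first_ahead:
        return "first_newer"
    if second_ahead:
        return "second_newer"
    return "equal"
```

For the partitioned case on this slide, `compare([3, 3, 2], [2, 2, 3])` reports a conflict, because neither vector dominates the other. Comparing `[2, 2, 2]` with `[1, 1, 1]` reports that the first is simply newer.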

Paper Review by Students
• ISIS System
• Gossip Architecture
• Bayou System
• Coda
• Discussions
  • What if a message is lost in ISIS group communication? What if another crash occurs when unstable/flush messages are exchanged?
  • What performance drawbacks does Gossip have?
  • What problems remain for users in Bayou?
  • Why doesn't Coda use read/write quorum?

Non-Turn-In Exercises
1. The following state transition diagram describes the two-phase commitment protocol. Let's assume that worker 1 crashed when the coordinator sent a commit message. Trace this diagram. To be specific, turn the appropriate dashed arrows into thick, solid arrows with your pen or pencil.
[Figure: state transition diagrams for the coordinator and two workers, drawn with dashed arrows. Each transition is labeled "message received / message sent".]
Coordinator:
• INIT -> WAIT: Client_wants_to_commit / CanCommit?
• WAIT -> ABORT: Vote-No / doAbort
• WAIT -> COMMIT: Vote-Yes / doCommit
Worker 1 and Worker 2 (identical diagrams):
• INIT -> READY: CanCommit? / Vote-Yes
• INIT -> ABORT: CanCommit? / Vote-No
• READY -> COMMIT: doCommit / Ack
• READY -> ABORT: doAbort / Ack

Non-Turn-In Exercises
2. Textbook p762, Q17.1: In a decentralized variant of the two-phase commit protocol, the participants communicate directly with one another instead of indirectly via the coordinator. In phase 1, the coordinator sends its vote to all the participants. In phase 2, if the coordinator's vote is No, the participants just abort the transaction; if it is Yes, each participant sends its vote to the coordinator and the other participants, each of which decides on the outcome according to the vote and carries it out. Calculate the number of messages and the number of rounds it takes. What are its advantages or disadvantages in comparison with the centralized variant?
3. Textbook p816, Q18.10: Explain why allowing backups to process read operations directly (i.e., without contacting a primary) leads to sequentially consistent rather than linearizable executions in a primary-copy replication.
4. Textbook p816, Q18.11: Could the gossip architecture be used for a distributed computer game as described below?
   The players move figures around a common scene. The state of the game is replicated at the players' workstations and at a server, which contains services controlling the game overall, such as collision detection. Updates are multicast to all replicas.
5. The quorum-based replication protocol can address network partition problems. Why didn't Coda use this protocol? Explain the reason.
6. What if a message is lost in ISIS group communication? Describe a solution.