hector
ICS 214B: Transaction Processing and Distributed Data Management
Replication Techniques
ICS214B Notes 14 2
Data Replication
[Figure: a database replicated across node 1, node 2, and node 3; labels: item, fragment]
• Goals for data replication
  – Higher throughput
  – Better response time
  – Higher availability
• Assumption: reliable network, fail-stop nodes
Basic Solution (for Concurrency Control)
• Treat each copy as an independent data item
• Object X has copies X1, X2, X3
[Figure: each copy X1, X2, X3 has its own lock manager; transactions Txi, Txj, Txk request locks from them]
• Read(X):
  – get shared X1 lock
  – get shared X2 lock
  – get shared X3 lock
  – read one of X1, X2, X3
  – at end of transaction, release X1, X2, X3 locks
[Figure: the reader contacts the lock managers of X1, X2, X3 and reads one copy]
• Write(X):
  – get exclusive X1 lock
  – get exclusive X2 lock
  – get exclusive X3 lock
  – write new value into X1, X2, X3
  – at end of transaction, release X1, X2, X3 locks
[Figure: the writer locks and writes each of X1, X2, X3]
• RAWA (read lock all, write lock all)
• Correctness OK
  – 2PL guarantees serializability
• Problem: low availability
[Figure: one of the copies X1, X2, X3 is down, so X cannot be accessed]
Basic Solution: Improvement
• Readers lock and access a single copy
• Writers lock all copies and update all copies
[Figure: a reader holds a lock on one of X1, X2, X3; a writer locking all copies will conflict with it]
ROWA
• Good availability for reads
• Poor availability for writes
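The difference between RAWA and ROWA can be sketched as a check on which copies must be reachable to take the required locks. This is only an illustration: the function names `can_read` and `can_write` and the "set of live copies" model are assumptions made for the example, not part of the slides.

```python
def can_read(scheme, copies, up):
    """True if a read of X can obtain the locks the scheme requires.

    copies: names of all copies of X; up: set of copies whose node is up.
    """
    if scheme == "RAWA":          # read lock all copies
        return all(c in up for c in copies)
    if scheme == "ROWA":          # read lock (and read) a single copy
        return any(c in up for c in copies)
    raise ValueError(f"unknown scheme: {scheme}")


def can_write(scheme, copies, up):
    """True if a write of X can lock and update every copy."""
    if scheme in ("RAWA", "ROWA"):  # both schemes write lock all copies
        return all(c in up for c in copies)
    raise ValueError(f"unknown scheme: {scheme}")


copies = ["X1", "X2", "X3"]
up = {"X1", "X2"}                 # hypothetical failure: X3's node is down

for scheme in ("RAWA", "ROWA"):
    print(scheme, "read:", can_read(scheme, copies, up),
          "write:", can_write(scheme, copies, up))
```

With one copy down, only the ROWA read still succeeds, which is the "good availability for reads, poor availability for writes" trade-off.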
Reminder
• With basic solution
  – use standard 2PL
  – use standard commit protocols (e.g., 2PC)
Another approach: Primary copy
• Select primary site
• All read/write locks requested at the primary site
• All data read at primary
• Updates are propagated to backups (using sequence numbers)
• Probability a transaction blocks = probability that primary is down
• Availability is the same as that of the primary node
[Figure: X1 is the primary copy (marked *), with backups X2 and X3; the reader and the writer both go to X1, where the writer takes its write lock]
Update Propagation Options
• Immediate write
  – Write all copies at the same time
• Deferred write
  – Wait until commit
Electing a new primary
• Elect new primary Pnew
• Make sure Pnew has committed T1,…,Tk and has locks for Tk+1,…,Tm
• Therefore, before each transaction Ti commits, we should ensure backups (potential future primaries) have Ti’s updates
[Figure: PRIMARY, Backup 1, …, Backup n. Committed: T1, T2, …, Tk. Pending: Tk+1, …, Tm]
Distributed Commit
[Figure: primary X1* with backups X2 and X3. Labels, in order: writer: lock, compute new value; prepare to write new value; ok; commit (write)]
Example
Node 1 (primary *): X1 = 0, Y1 = 0, Z1 = 0
Node 2: X2 = 0, Y2 = 0, Z2 = 0
T1: X ← 1; Y ← 1
T2: Y ← 2
T3: Z ← 3
Local Commit
[Figure: the writer locks, writes, and commits at the primary X1*; the update is then propagated to X2 and X3]
Write(X):
• Get exclusive X1* lock
• Write new value into X1*
• Commit at primary; get sequence number
• Perform X2, X3 updates in sequence-number order
...
...
Example t = 0
Node 1 (primary *): X1 = 0, Y1 = 0, Z1 = 0
Node 2: X2 = 0, Y2 = 0, Z2 = 0
T1: X ← 1; Y ← 1
T2: Y ← 2
T3: Z ← 3
Example t = 1
Node 1 (primary *): X1 = 1, Y1 = 1, Z1 = 3
Node 2: X2 = 0, Y2 = 0, Z2 = 0
T1: X ← 1; Y ← 1 (active at node 1)
T2: Y ← 2 (waiting for lock at node 1)
T3: Z ← 3 (active at node 1)
Example t = 2
Node 1 (primary *): X1 = 1, Y1 = 2, Z1 = 3
Node 2: X2 = 0, Y2 = 0, Z2 = 0
T1: X ← 1; Y ← 1 (committed)
T2: Y ← 2 (active at node 1)
T3: Z ← 3 (committed)
Updates by sequence number:
#1: Z ← 3
#2: X ← 1; Y ← 1
Example t = 3
Node 1 (primary *): X1 = 1, Y1 = 2, Z1 = 3
Node 2: X2 = 1, Y2 = 2, Z2 = 3
T1: X ← 1; Y ← 1 (committed)
T2: Y ← 2 (committed)
T3: Z ← 3 (committed)
Updates by sequence number:
#1: Z ← 3
#2: X ← 1; Y ← 1
#3: Y ← 2
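The local-commit example can be replayed with a short sketch. The class names `Primary` and `Backup` and the out-of-order delivery are assumptions made for illustration; the slides only require that backups apply updates in sequence-number order.

```python
class Primary:
    """Commits locally and tags each committed transaction with a sequence number."""

    def __init__(self):
        self.data = {"X": 0, "Y": 0, "Z": 0}
        self.next_seq = 1

    def commit(self, updates):
        self.data.update(updates)
        seq = self.next_seq
        self.next_seq += 1
        return seq, dict(updates)      # message to propagate to backups


class Backup:
    """Buffers incoming updates and applies them strictly in sequence order."""

    def __init__(self):
        self.data = {"X": 0, "Y": 0, "Z": 0}
        self.applied = 0               # highest sequence number applied so far
        self.pending = {}              # seq -> updates received but not yet applied

    def receive(self, seq, updates):
        self.pending[seq] = updates
        while self.applied + 1 in self.pending:
            self.applied += 1
            self.data.update(self.pending.pop(self.applied))


primary, backup = Primary(), Backup()
m1 = primary.commit({"Z": 3})           # T3 commits first: #1
m2 = primary.commit({"X": 1, "Y": 1})   # T1: #2
m3 = primary.commit({"Y": 2})           # T2: #3

# Hypothetical out-of-order arrival; the backup still applies #1, #2, #3.
for seq, updates in (m2, m3, m1):
    backup.receive(seq, updates)

print(backup.data)
```

Applying #3 before #2 would leave Y2 = 1 instead of 2, which is why the order matters.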
What good is RPWP-LC?
[Figure: primary and two backups; updates flow from the primary to the backups; label on the backups: can’t read!]
Answer: Can read “out-of-date” backup copy (also useful with 1-safe backups... later)
Basic Solution
• Read lock all; write lock all: RAWA
• Read lock one; write lock all: ROWA
• Read and write lock primary: RPWP
  – local commit: LC
  – distributed commit: DC
Comparison
N = number of nodes with copies
p = probability that a node is operational

Scheme     Probability can read   Probability can write
RAWA       p^N                    p^N
ROWA       p^N (*)                p^N (*)
RPWP:LC    p (**)                 p
RPWP:DC    p                      p

• *: assuming the read/write waits until all sites are active
• **: assuming we don’t care if other backups are active
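To get a feel for the numbers, the two expressions in the comparison (p^N when every copy must be up, p when only the primary must be up) can be evaluated directly. The helper names and the sample values p = 0.9 and N = 1..5 are assumptions chosen for illustration.

```python
def all_copies_up(p, n):
    """Probability that all n nodes holding a copy are operational: p^n."""
    return p ** n


def primary_up(p):
    """Probability that the single primary node is operational: p."""
    return p


p = 0.9
for n in range(1, 6):
    print(f"N={n}: all copies up = {all_copies_up(p, n):.4f}, "
          f"primary up = {primary_up(p):.4f}")
```

As N grows, p^N falls steadily while the primary-only figure stays at p, so requiring every copy to be up costs availability.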
Failures in Primary Copy model
• Backup fails
  – Asks primary for “lost” updates
• Primary fails
  – Data unavailable
Mobile Primary
(1) Elect new primary
(2) Ensure new primary has seen all previously committed transactions
(3) Resolve pending transactions
(4) Resume processing
[Figure: primary, backup, backup]
(1) Elections
• Similar to 3PC termination protocol
  – Nodes have IDs
  – Largest (or smallest) ID wins
(2) Ensure new primary has previously committed transactions
[Figure: primary, new primary, backup. Labels: “committed: T1, T2”; “need to get and apply: T1, T2”]
• Cannot use local commit
Failed Nodes: Example

Time          P1        P2        P3
now           primary   -down-    -down-    (primary commits T1)
later         -down-    -down-    -down-
later still   -down-    primary   backup    (commit T2, unaware of T1!)
(3) Resolve pending transactions
[Figure: primary, new primary, backup. Labels: “T3?”; T3 in “W” state at the new primary; T3 in “W” state at the backup]
• Should not use blocking commit
3PC takes care of this problem!
• Option A
  – Failed node waits for:
    commit info from active node, or
    all nodes are up and recovering
• Option B
  – Majority voting
Node Recovery
• All transactions have a commit sequence number
• Active nodes save update values “as long as necessary”
• Recovering node asks active primary for missed updates; applies in order
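The recovery steps above can be sketched as follows. The names `ActivePrimary`, `Node`, and `missed_updates` are hypothetical, and the sketch assumes the primary keeps every saved update; how long is "as long as necessary" is left open, as in the slides.

```python
class ActivePrimary:
    """Keeps the update values of committed transactions, keyed by sequence number."""

    def __init__(self):
        self.saved = []                         # list of (seq, updates) in commit order

    def commit(self, updates):
        seq = len(self.saved) + 1               # commit sequence number
        self.saved.append((seq, dict(updates)))
        return seq

    def missed_updates(self, last_seq):
        """Updates committed after sequence number last_seq."""
        return [(s, u) for s, u in self.saved if s > last_seq]


class Node:
    def __init__(self):
        self.data = {}
        self.last_seq = 0                       # last sequence number applied

    def apply(self, seq, updates):
        if seq != self.last_seq + 1:
            raise ValueError("updates must be applied in sequence order")
        self.data.update(updates)
        self.last_seq = seq

    def recover(self, primary):
        # Ask the active primary for everything missed; apply in order.
        missed = primary.missed_updates(self.last_seq)
        for seq, updates in sorted(missed, key=lambda su: su[0]):
            self.apply(seq, updates)


primary = ActivePrimary()
node = Node()

node.apply(primary.commit({"X": 1}), {"X": 1})  # node sees #1, then fails
primary.commit({"Y": 2})                        # #2 committed while node is down
primary.commit({"X": 5})                        # #3 committed while node is down

node.recover(primary)
print(node.data, node.last_seq)
```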
Example: Majority Commit

state:   C1           C2        C3
C        T1, T2, T3   T1, T2    T1, T3
P                     T3        T2
W        T4           T4        T4
Example: Majority Commit
t1: C1 fails
t2: C2 new primary
t3: C2 commits T1, T2, T3; aborts T4
t4: C2 resumes processing
t5: C2 commits T5, T6
t6: C1 recovers; asks C2 for latest state
t7: C2 sends committed and pending transactions; C2 involves C1 in any future transactions
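Step t3 can be reproduced with a simplified termination rule: with a majority of nodes reachable, a transaction is committed if any reachable node has it in state C or P, and aborted if every reachable node has it in state W. This is a sketch under that assumption only; the function name `resolve` is hypothetical, and a full 3PC termination protocol also handles abort states and further cases not modelled here.

```python
def resolve(states_by_node, total_nodes):
    """Decide the outcome of each transaction known to the reachable nodes.

    states_by_node: {node: {txn: "C" | "P" | "W"}} for reachable nodes only.
    total_nodes: number of nodes holding a copy (reachable or not).
    """
    if 2 * len(states_by_node) <= total_nodes:
        raise RuntimeError("no majority reachable; cannot resolve yet")

    txns = set()
    for states in states_by_node.values():
        txns.update(states)

    decisions = {}
    for txn in sorted(txns):
        seen = {states[txn] for states in states_by_node.values() if txn in states}
        # Committed or pre-committed somewhere: commit. All waiting: abort.
        decisions[txn] = "commit" if seen & {"C", "P"} else "abort"
    return decisions


# State of the surviving nodes after C1 fails (from the table in the example).
survivors = {
    "C2": {"T1": "C", "T2": "C", "T3": "P", "T4": "W"},
    "C3": {"T1": "C", "T3": "C", "T2": "P", "T4": "W"},
}
print(resolve(survivors, total_nodes=3))
```

The result commits T1, T2, T3 and aborts T4, matching t3 in the example.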
2-safe vs. 1-safe Backups
• Distributed commit results in 2-safe backups
[Figure: timeline across primary, backup 1, backup 2; axis: time]
(1) T end work
(2) send data
(3) get acks
(4) commit
Guarantee
• After transaction T commits at primary, any future primary will “see” T
[Figure: now: primary, backup 1, backup 2, with T1, T2, T3 at the primary; later: backup 1 has become the next primary and also holds T1, T2, T3]
Performance Hit
• 3PC is very expensive
  – many messages
  – locks held longer (less concurrency)
  [Note: group commit may help]
• Can use 2PC
  – may have blocking
  – 2PC still expensive
  [up to 1 second reported]
Alternative: 1-safe
• Commit transactions unilaterally at primary
• Send updates to backups as soon as possible
[Figure: timeline across primary, backup 1, backup 2; axis: time]
(1) T end work
(2) T commit
(3) send data
(4) purge data
Problem: Lost Transactions

         primary       backup 1                    backup 2
now:     T1, T2, T3    T1                          T1
later:   (failed)      T1, T4, T5 (next primary)   T1, T4

T2 and T3 never reached the backups, so they are lost.
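The difference between the 2-safe and 1-safe commit orders, and how 1-safe can lose transactions, can be sketched with a toy model. The classes `Primary` and `Backup`, the `ship` method, and the choice to model acknowledgements as always succeeding are assumptions for illustration.

```python
class Backup:
    def __init__(self):
        self.committed = []


class Primary:
    def __init__(self, backups, two_safe):
        self.backups = backups
        self.two_safe = two_safe
        self.committed = []
        self.unsent = []                    # 1-safe only: committed but not yet shipped

    def commit(self, txn):
        if self.two_safe:
            for b in self.backups:          # send data and (implicitly) get acks
                b.committed.append(txn)
            self.committed.append(txn)      # commit only after the backups have it
        else:
            self.committed.append(txn)      # commit unilaterally at the primary
            self.unsent.append(txn)         # send to backups as soon as possible

    def ship(self):
        """1-safe: propagate committed updates to the backups, then purge them."""
        for txn in self.unsent:
            for b in self.backups:
                b.committed.append(txn)
        self.unsent.clear()


backups = [Backup(), Backup()]
primary = Primary(backups, two_safe=False)

primary.commit("T1")
primary.ship()                              # T1 reaches the backups
primary.commit("T2")
primary.commit("T3")                        # primary fails here, before shipping

lost = [t for t in primary.committed if t not in backups[0].committed]
print("backup has:", backups[0].committed, "lost:", lost)
```

Constructing the primary with `two_safe=True` instead leaves nothing to lose, at the cost of waiting for the backups before every commit.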
Claim
• Lost transaction problem tolerable
  – failures rare
  – only a “few” transactions lost
Primary Recovery with 1-safe
• When failed primary recovers, need to “compensate” for missed transactions

         old primary                                       next primary   backup 2
now:     T1, T2, T3 (failed)                               T1, T4, T5     T1, T4
later:   T1, T2, T3, T3^-1, T2^-1, T4, T5 (now backup 3)   T1, T4, T5     T1, T4, T5

T3^-1 and T2^-1 are the compensation for T3 and T2.
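The compensation in this example can be sketched as a log reconciliation: keep the common prefix, undo the recovered node's extra transactions in reverse order, then apply what the new primary committed in the meantime. The function name `compensate` and the `^-1` string notation for a compensating transaction are assumptions for illustration.

```python
def compensate(old_log, new_log):
    """Return the recovered node's log after catching up with the new primary.

    old_log: transactions the failed primary had committed.
    new_log: transactions committed at the new primary.
    """
    # Length of the common prefix of the two logs.
    i = 0
    while i < len(old_log) and i < len(new_log) and old_log[i] == new_log[i]:
        i += 1

    # Undo the transactions the new primary never saw, most recent first.
    undo = [txn + "^-1" for txn in reversed(old_log[i:])]

    # Then apply the transactions missed while down.
    return old_log + undo + new_log[i:]


print(compensate(["T1", "T2", "T3"], ["T1", "T4", "T5"]))
```

For the logs in the example this yields T1, T2, T3, T3^-1, T2^-1, T4, T5.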
Log Shipping
• “Log shipping”: propagate updates to backup
• Backup replays log
• How to replay log efficiently?
  – e.g., elevator disk sweeps
  – e.g., avoid overwrites
[Figure: primary ships its log to the backup]
So Far in Data Replication
• RAWA
• ROWA
• Primary copy
  – static
    local commit
    distributed commit
  – mobile primary
    2-safe (blocking or non-blocking)
    1-safe
Correctness with replicated data
S1: r1[X1] r2[X2] w1[X1] w2[X2]
Is this schedule serializable?
[Figure: copies X1 and X2]
One copy serializable (1SR)
A schedule S on replicated data is 1SR if it is equivalent to a serial history of the same transactions on a one-copy database
To check 1SR
• Take schedule
• Treat ri[Xj] as ri[X] and wi[Xj] as wi[X], where Xj is a copy of X
• Compute P(S), the precedence graph of the resulting schedule
• If P(S) is acyclic, S is 1SR
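The check above can be sketched in a few lines. The schedule syntax (`r1[X1]`, `w2[X2]`), the helper names, and the convention that a copy is written as an item name followed by a copy number are assumptions for illustration; the test itself is the one stated in the bullets.

```python
import re


def parse(schedule):
    """'r1[X1] w2[X2]' -> [('r', 1, 'X'), ('w', 2, 'X')]; the copy number is dropped."""
    pattern = r"([rw])(\d+)\[([A-Za-z]+)\d*\]"
    return [(kind, int(txn), item)
            for kind, txn, item in re.findall(pattern, schedule)]


def precedence_edges(ops):
    """Edge Ti -> Tj for each pair of conflicting operations with Ti's first."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges


def is_acyclic(edges):
    """Repeatedly remove nodes with no incoming edge; a leftover means a cycle."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if all(dst != n for _, dst in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(a, b) for a, b in remaining if a not in sources}
    return True


def passes_1sr_check(schedule):
    return is_acyclic(precedence_edges(parse(schedule)))


s1 = "r1[X1] r2[X2] w1[X1] w2[X2]"
s2 = "r1[X1] w1[X1] w1[X2] r2[X1] w2[X1] w2[X2]"
print(passes_1sr_check(s1), sorted(precedence_edges(parse(s1))))
print(passes_1sr_check(s2), sorted(precedence_edges(parse(s2))))
```

Dropping the copy number is what turns ri[Xj] into ri[X]; everything after that is an ordinary precedence-graph test.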
Example
S1: r1[X1] r2[X2] w1[X1] w2[X2]
S1’: r1[X] r2[X] w1[X] w2[X]
P(S1): T1 → T2 and T2 → T1 (a cycle)
S1 is not 1SR!
Second example
S2: r1[X1] w1[X1] w1[X2] r2[X1] w2[X1] w2[X2]
S2’: r1[X] w1[X] w1[X] r2[X] w2[X] w2[X]
P(S2): T1 → T2 (acyclic)
• Equivalent serial schedule
SS: T1, T2
Next lecture
• Multiple fragments• Network partitions