Database Replication in Tashkent
CSEP 545 Transaction Processing
Sameh Elnikety
Replication for Performance
• Expensive
• Limited scalability
DB Replication is Challenging
• Single database system
 – Large, persistent state
 – Transactions
 – Complex software
• Replication challenges
 – Maintain consistency
 – Middleware replication
Background
[Figure: a standalone DBMS serves as a single replica; replication places a load balancer in front of Replicas 1–3]
Read Tx
[Figure: the load balancer routes read transaction T to a single replica]
• A read tx does not change DB state.
Update Tx 1/2
[Figure: update transaction T executes at one replica and produces a writeset (ws), which is sent to the other replicas]
• An update tx changes DB state.
• Apply (or commit) T's writeset everywhere.
• Example: T1: { set x = 1 }
Update Tx 2/2
[Figure: writesets from two concurrent update transactions reach all replicas]
• Ordering: commit updates in the same order at every replica.
• Example: T1: { set x = 1 }, T2: { set x = 7 }; all replicas must apply T1 and T2 in the same order to agree on the final value of x.
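The ordering requirement on the previous slide can be made concrete with a tiny sketch (the dict-as-database representation is purely illustrative): if two replicas apply the same two writesets in different orders, their states diverge.

```python
# Illustrative sketch: why every replica must commit updates in the
# same total order. Applying T1 and T2 in opposite orders leaves the
# two replicas with different values of x.

def apply(db, writeset):
    """Apply a writeset (here: a dict of key -> new value) to a replica."""
    db.update(writeset)

replica_a, replica_b = {"x": 0}, {"x": 0}
t1 = {"x": 1}   # T1: { set x = 1 }
t2 = {"x": 7}   # T2: { set x = 7 }

# Replica A commits T1 then T2; Replica B commits T2 then T1.
apply(replica_a, t1); apply(replica_a, t2)
apply(replica_b, t2); apply(replica_b, t1)

print(replica_a["x"], replica_b["x"])  # 7 1 -> the replicas diverge
```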
Sub-linear Scalability Wall
[Figure: adding Replica 4 helps little; every replica must still apply every writeset in commit order]
This Talk
• General scaling techniques
 – Address fundamental bottlenecks
 – Synergistic, implemented in middleware
 – Evaluated experimentally
Super-linear Scalability
[Figure: throughput (TPS) by system: Single 1X, Base 7X, United 12X, MALB 25X, UF 37X]
Big Picture: Let's Oversimplify
[Figure: a standalone DBMS divides its capacity between reading (R) and updating + logging (U)]
Big Picture: Let's Oversimplify (traditional replica)
[Figure: to serve N·R reads and N·U updates with N replicas, each traditional replica performs R reads, U updates, and applies (N-1) writesets (ws)]
Big Picture: Let's Oversimplify (optimized replica)
[Figure: the optimized replica reduces these costs to R*, U*, and (N-1)·ws* by uniting ordering & durability, MALB, and update filtering]
Key Points
1. Commit updates in order
 – Problem: serial synchronous disk writes
 – Solution: unite ordering and durability
2. Load balancing
 – Problem: optimizing for equal load causes memory contention
 – MALB: optimize for in-memory execution
3. Update propagation
 – Problem: updates propagated everywhere
 – Update filtering: propagate only where needed
Roadmap
[Figure: replicated system with load balancer and Replicas 1–3]
1. Ordering: commit updates in order (next)
2. Load balancing
3. Update propagation
Key Idea
• Traditionally: commit ordering and durability are separated
• Key idea: unite commit ordering and durability
All Replicas Must Agree
• All replicas agree on
 – which update txs commit
 – their commit order
• Total order
 – Determined by middleware
 – Followed by each replica
[Figure: Tx A and Tx B reach the durability stage at each of Replicas 1–3]
Order Outside DBMS
[Figure: the middleware orders Tx A before Tx B; the order A, B is delivered to Replicas 1–3, and each replica must make the transactions durable in that order]
Enforce External Commit Order
[Figure: at each replica, a proxy submits Tx A and Tx B through the SQL interface as Task A and Task B; internally the DBMS may commit B before A]
• The DBMS decides the order of concurrent commits, so the proxy cannot commit A & B concurrently and still guarantee the external order.
Enforce Order = Serial Commit
[Figure: the proxy enforces the external order by committing A first, waiting for its ack, and only then committing B]
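A minimal sketch of this serial-commit proxy, assuming a synchronous commit call into the DBMS (the class and callback names are made up for illustration, not the paper's actual code):

```python
# Sketch: a proxy enforces the middleware-assigned total order by
# issuing commits strictly one at a time, waiting for each ack
# before submitting the next commit.

class SerialCommitProxy:
    def __init__(self, dbms_commit):
        self.dbms_commit = dbms_commit  # synchronous commit call into the DBMS
        self.log = []                   # (tx, ack) pairs, in commit order

    def commit_in_order(self, ordered_txs):
        # The DBMS gives no control over the completion order of
        # concurrent commits, so the proxy must serialize them.
        for tx in ordered_txs:
            ack = self.dbms_commit(tx)  # blocks until this commit is done
            self.log.append((tx, ack))

proxy = SerialCommitProxy(lambda tx: f"ack-{tx}")
proxy.commit_in_order(["A", "B", "C"])
print([t for t, _ in proxy.log])  # ['A', 'B', 'C']: external order preserved
```

Correct, but as the next slide shows, paying one synchronous disk write per commit this way is slow.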
Commit Serialization is Slow
[Figure: timeline across proxy, DBMS, CPU, and disk: commit A → synchronous disk write → ack A, then commit B, then commit C, strictly one at a time]
• Problem: durability & ordering separated → serial disk writes
Unite D. & O. in Middleware
[Figure: the middleware logs the ordered batch A, B, C itself; the database runs with durability turned off]
• Solution: move durability to the middleware. Durability & ordering united in middleware → group commit
Implementation: Uniting D & O in MW
• Middleware logs tx effects
 – Durability of update txs guaranteed in middleware
 – Durability turned off at the database
• Middleware performs durability & ordering
 – United → group commit → fast
• Database commits update txs serially
 – Commit = quick main-memory operation
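The group-commit idea above can be sketched as follows (a simplified model with made-up names, not the system's actual log format): writesets accumulate in order, and one flush makes the whole batch durable with a single synchronous disk write.

```python
# Sketch of group commit in the middleware: the per-transaction disk
# write disappears because one buffered write plus one fsync makes an
# entire ordered batch of writesets durable together.

import os
import tempfile

class MiddlewareLog:
    def __init__(self, path):
        self.f = open(path, "ab")
        self.pending = []            # ordered, not-yet-durable writesets

    def append(self, writeset: bytes):
        self.pending.append(writeset)

    def group_commit(self):
        # Write the whole batch, then force it to disk once; the
        # database itself runs with durability off, so its commits
        # are quick main-memory operations.
        batch, self.pending = self.pending, []
        for ws in batch:
            self.f.write(ws + b"\n")
        self.f.flush()
        os.fsync(self.f.fileno())
        return len(batch)            # txs made durable by this flush

log = MiddlewareLog(os.path.join(tempfile.mkdtemp(), "mw.log"))
for ws in (b"A: set x = 1", b"B: set x = 7", b"C: set y = 2"):
    log.append(ws)
print(log.group_commit())  # 3: three commits, one disk write
```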
Uniting Improves Throughput
• Metric: throughput (TPS)
• Workload: TPC-W Ordering (50% updates)
• System: Linux cluster, PostgreSQL, 16 replicas, serializable execution
[Figure: United reaches 12X throughput vs. Single (1X) and Base (7X)]
Roadmap
[Figure: replicated system with load balancer and Replicas 1–3]
1. Ordering: commit updates in order (done)
2. Load balancing (next)
3. Update propagation
Key Idea
[Figure: load balancer distributing work across two replicas, each with memory and disk]
• Traditional: equal load on replicas
• MALB (Memory-Aware Load Balancing): optimize for in-memory execution
How Does MALB Work?
[Figure: the database has tables 1, 2, 3; tx type A accesses tables 1 and 2, tx type B accesses tables 2 and 3; a replica's memory holds only two of the three tables]
Read Data From Disk
[Figure: least-loaded balancing sends the mixed stream A, B, A, B to both replicas; each replica then needs tables 1, 2, and 3, which do not fit in memory, so both replicas read from disk → slow]
Data Fits in Memory
[Figure: MALB sends A, A, A, A to Replica 1 (tables 1 and 2 fit in its memory) and B, B, B, B to Replica 2 (tables 2 and 3 fit in its memory); both replicas run from memory → fast]
• Open questions: where does the memory info come from? What about many tx types and replicas?
Estimate Tx Memory Needs
• Exploit the tx execution plan
 – Which tables & indices are accessed
 – Their access pattern (linear scan vs. direct access)
• Metadata from the database
 – Sizes of tables and indices
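One plausible way to turn plan and metadata into a memory estimate, as a sketch (the table names, sizes, and the direct-access page allowance are all invented for illustration): a scanned object contributes its full size, while direct access contributes only its index plus a few data pages.

```python
# Illustrative sketch of estimating a tx type's memory needs from its
# execution plan (which objects, which access pattern) and database
# metadata (object sizes). All numbers here are made up.

PAGE = 8192                          # assumed page size in bytes
sizes = {"orders": 500 * PAGE,       # table/index sizes from DB metadata
         "orders_pk": 20 * PAGE,
         "items": 2000 * PAGE}

def memory_needs(plan):
    """plan: list of (object, access) pairs taken from the execution plan."""
    total = 0
    for obj, access in plan:
        if access == "scan":         # a linear scan touches the whole object
            total += sizes[obj]
        else:                        # direct access: index plus a few pages
            total += sizes[obj + "_pk"] + 4 * PAGE
    return total

# Tx type A scans `items`; tx type B probes `orders` through its index.
print(memory_needs([("items", "scan")]) // PAGE)     # 2000 pages
print(memory_needs([("orders", "direct")]) // PAGE)  # 24 pages
```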
Grouping Transactions
• Objective: construct tx groups that fit together in memory
• Bin packing
 – Item: tx memory needs
 – Bin: memory of a replica
 – Heuristic: Best Fit Decreasing
• Allocate replicas to tx groups, adjusting for group loads
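The Best Fit Decreasing heuristic named above can be sketched directly (the tx memory figures and capacity are made-up inputs): sort items by size descending, then place each in the fullest bin that still has room, opening a new bin when none fits.

```python
# Sketch of the grouping step via Best Fit Decreasing: pack tx types
# (items, sized by their memory needs) into replica-memory-sized bins.

def best_fit_decreasing(needs, capacity):
    """needs: {tx_type: memory estimate}; returns a list of tx groups."""
    bins = []  # each bin: [remaining_capacity, [tx types in this group]]
    for tx, size in sorted(needs.items(), key=lambda kv: -kv[1]):
        # "Best fit": the fullest bin (least remaining room) that fits.
        candidates = [b for b in bins if b[0] >= size]
        if candidates:
            best = min(candidates, key=lambda b: b[0])
        else:
            bins.append([capacity, []])  # open a new bin (replica memory)
            best = bins[-1]
        best[0] -= size
        best[1].append(tx)
    return [group for _, group in bins]

# Tx memory needs in MB; assume each replica has 4096 MB for data.
needs = {"A": 3500, "B": 2400, "C": 1600, "D": 1500, "E": 1300, "F": 1200}
print(best_fit_decreasing(needs, 4096))
# [['A'], ['B', 'C'], ['D', 'E', 'F']]
```

With these (invented) sizes the heuristic reproduces the grouping shown on the "MALB in Action" slides: {A}, {B, C}, {D, E, F}.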
MALB in Action
[Figure: given tx types A–F and their memory needs, MALB forms groups {A}, {B, C}, {D, E, F} that each fit in a replica's memory, and assigns one replica to each group]
MALB Summary
• Objective: optimize for in-memory execution
• Method
 – Estimate tx memory needs
 – Construct tx groups
 – Allocate replicas to tx groups
Experimental Evaluation
• Implementation: no change in consistency; still middleware
• Compare
 – United: efficient baseline system
 – MALB: exploits working-set information
• Same environment: Linux cluster running PostgreSQL; workload: TPC-W Ordering (50% update txs)
MALB Doubles Throughput
[Figure: TPC-W Ordering, 16 replicas: MALB (25X) achieves 105% higher throughput than United (12X); a second chart shows MALB's normalized read I/O is a small fraction of United's]
Big Gains with MALB
[Table: MALB's throughput gain over United across memory size (big to small) and database size:
4%, 0%, 29% / 48%, 105%, 45% / 182%, 75%, 12%.
Gains are small at the extremes, where replicas run entirely from memory or entirely from disk, and largest in between.]
Roadmap
[Figure: replicated system with load balancer and Replicas 1–3]
1. Ordering: commit updates in order (done)
2. Load balancing (done)
3. Update propagation (next)
Key Idea
• Traditional: propagate updates everywhere
• Update filtering: propagate updates only to where they are needed
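The filtering rule can be sketched in a few lines (the replica-to-tables assignment below is the illustrative layout from the upcoming example, not real configuration): a writeset goes only to replicas whose assigned tables it touches.

```python
# Sketch of update filtering: propagate a writeset only to replicas
# whose hosted tables intersect the tables the writeset modifies,
# instead of to every replica.

# Tables each replica must keep up to date, derived from MALB's groups:
# group A uses tables {1, 2}; group B uses tables {2, 3}.
replica_tables = {"replica1": {1, 2}, "replica2": {2, 3}}

def destinations(writeset_tables):
    """Replicas that need a writeset touching the given tables."""
    return sorted(r for r, tables in replica_tables.items()
                  if tables & writeset_tables)

print(destinations({1}))  # ['replica1']: filtered out at replica2
print(destinations({3}))  # ['replica2']: filtered out at replica1
print(destinations({2}))  # ['replica1', 'replica2']: needed by both
```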
Update Filtering Example
[Figure: MALB+UF assigns group A (tables 1, 2) to Replica 1 and group B (tables 2, 3) to Replica 2; an update to table 1 is propagated only to Replica 1, and an update to table 3 only to Replica 2]
Update Filtering in Action
[Figure: updates to the red table and to the green table are each propagated only to the replicas that hold that table]
MALB+UF Triples Throughput
[Figure: TPC-W Ordering, 16 replicas: UF reaches 37X, 49% above MALB's 25X (Single 1X, Base 7X, United 12X); a second chart shows propagated updates dropping from roughly 15 under MALB to roughly 7 under MALB+UF]
Filtering Opportunities
[Figure: ratio of MALB+UF to MALB: 1.49 under the 50%-update Ordering mix vs. 1.02 under the 5%-update Browsing mix; filtering pays off only when the workload has many updates to filter]
Conclusions
1. Commit updates in order
 – Problem: serial synchronous disk writes
 – Solution: unite ordering and durability
2. Load balancing
 – Problem: optimizing for equal load causes memory contention
 – MALB: optimize for in-memory execution
3. Update propagation
 – Problem: updates propagated everywhere
 – Update filtering: propagate only where needed