Upload
gaye
View
32
Download
0
Embed Size (px)
DESCRIPTION
LightTx :. A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions. Youyou Lu 1 , Jiwu Shu 1 , Jia Guo 1 , Shuai Li 1 , Onur Mutlu 2. 1 Tsinghua University 2 Carnegie Mellon University. Executive Summary. - PowerPoint PPT Presentation
Citation preview
A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions
Youyou Lu1, Jiwu Shu1, Jia Guo1, Shuai Li1, Onur Mutlu2
LightTx:
1Tsinghua University2Carnegie Mellon University
2
• Problem: Flash-based SSDs support transactions naturally (with out-of-place updates) but inefficiently:– Only a limited set of isolation levels are supported (inflexible)– Identifying transaction status is costly (heavyweight)
• Goal: a lightweight design to support flexible transactions• Observations and Key Ideas:
– Simultaneous updates can be written to different physical pages, and the FTL mapping table determines the ordering=> (Flexibility) make commit protocol page-independent
– Transactions have birth and death, and the near-logged update way enables efficient tracking=> (Lightweight) track recently updated flash blocks, and retire the dead transactions
• Results: up to 20.6% performance improvement, stable GC overhead, fast recovery with negligible persistence overhead
Executive Summary
3
H/W Interface
FTL
Flash Memory Pkg #1
Flash Memory Pkg #0
Flash Memory Pkg #2
Flash Memory Pkg #3
Flash Memory Pkg
Die 0 Die 1
Host Interconnect
Plane 0
Plane 0
Plane 1
Plane 1
• FTL (Flash Translation Layer)• Address mapping, garbage collection, wear leveling
• Out-of-place Update (address mapping)• Pages are updated to new physical pages instead of overwriting original pages
• Internal Parallelism• New pages are allocated from different pkgs/planes
• Page metadata (OOB): (4096 + 224)Bytes
SSD Basics
4
H/W Interface
FTL
Flash Memory Pkg #1
Flash Memory Pkg #0
Flash Memory Pkg #2
Flash Memory Pkg #3
Flash Memory Pkg
Die 0 Die 1
Host Interconnect
Plane 0
Plane 0
Plane 1
Plane 1
• Simultaneous updates and FTL ordering• (Out-of-place update) pages for the same LBA can be
updated simultaneously• (Ordering in mapping table) Only when the mapping table
is updated, the write is visible to the external• Near-logged update way
• Pages are allocated from blocks over different parallel units• Pages are sequentially allocated from each block
Block
Two Observations
5
Outline
• Executive Summary• Background
• Traditional Software Transactions• Existing Hardware Transactions
• LightTx Design• Evaluation• Conclusions
6HDD SSD
Traditional S/W Transaction
Logical ViewLogical View
FTL Mapping Table
• Transaction: Atomicity and Durability• Software Transaction
– Duplicate writes– Synchronization for ordering
Data Area Log Area
Data Area Log Area
7
We have both old and new versions in the SSD (out-of-place update).
Why shall we write the log?Why not support transactions inside the SSD?
8
Existing H/W Approaches• Atomic-Write [HPCA’11]
– Log-structured FTL – Commit protocol: Tag the last page “1”, while the others “0”– Limited Parallelism: one tx at a time– High mapping persistence overhead: persistence on each commit
• SCC/BPCC (Cyclic commit protocols) [OSDI’08]– Commit Protocol: Link all flash pages in a cyclic list by keeping pointers
in page metadata– High overhead in differentiate broken cyclic lists for partial erased
committed txs and aborted txs– SCC forces aborted pages erased before writing the new one– BPCC delays the erase of pages to its previous aborted pages are
erased– Limited Parallelism: txs without overlapped accesses are allowed
[HPCA’11] Beyond block i/o: Rethinking traditional storage primitives[OSDI’08] Transactional flash
9
Problems:• Tx support is inflexible (limited parallelism)
– Cannot meet the flexible demands from software– Cannot fully exploit the internal parallelism of SSDs
• Tx state tracking causes high overhead in the device
Our Goal: A lightweight design
to support flexible transactions
10
Outline
• Executive Summary• Background• LightTx Design
• Design Overview• Page Independent Commit Protocol• Zone-based Transaction State Tracking
• Evaluation• Conclusions
11
Goal
Flexible• Page-independent commit protocol: support
simultaneous updates, to enable flexible isolation level choices in the system
Lightweight• Zone-based transaction state tracking scheme: track
only blocks that have live txs and retire the dead ones, to reduce lower the cost
A lightweight design to support flexible transactions
12
Page-independent Commit Protocol• Observations:
– Simultaneous Updates (Out-of-place update)
– Version order (FTL mapping table)
A B C D E
1
2
3
Version number
Page
• How to support this?– Extend the storage
interface– Make commit protocol
page-independent
13
Design Overview
FTL Mapping Table
Active TxTable
Garbage Collection
Wear Levelling
Free Blocks and Zone Mgmt.
Read/Write Cache
Data AreaMapping
Table
Applications
File SystemsDatabase Systems
READ, WRITE, BEGIN, COMMIT, ABORT
FTL
Flash Media
Recovery Logic
Commit Logic
• Transaction Primitive– BEGIN(TxID)– COMMIT(TxID)– ABORT(TxID)– WRITE(TxID,
LBA, len …)
14
Page-independent Commit Protocol
• Transactional metadata: <TxID, TxCnt, TxVer>– TxID– TxCnt: (00…0N)– TxVer: commit sequence
• Keep it in the page metadata of each flash page
Page Data
LPN TxID TxCnt TxVer ECC
Page Metadata (OOB)
T0, 0
T0,0
T0,0
T0,0
T0,5
T1,0
T1,2
T3,1
A B C D E
1
2
3
Version number
Page
15
Zone-based Transaction State Tracking• Transaction Lifetime
Active
Live
BEGIN COMMIT/ABORT CHECKPOINT
Completed
Dead
ERASED
• Writes appended in the free flash blocks
– Retire the dead: write back the mapping table, and remove the dead from tx state maintenance
– Track the recently updated flash blocks
Can we write back the mapping back for each commit?- Ordering cost (waiting for mapping table persistence)- Mapping persistent is not atomic
16
• Block Zones– Free block: all pages are free– Available block: pages are available for allocation– Unavailable block: all pages have been written to but
some pages belong to (1) a live tx, or (2) a dead tx but has at least one page in some available block
– Checkpointed block: all pages have been written to and all pages belong to dead txs
• Respectively, we have Free, Available, Unavailable and Checkpointed Zones.
17
• Checkpoint– Periodically write back the mapping table (making the txs
dead)– And, sliding the zones (available + unavailable)
• Zone Sliding– Check all blocks in available and unavailable zones
• Move the block to the checkpointed zone if the block is checkpointed
• Move the block to the unavailable zone if the block is unavailable
– Pre-allocate free blocks to the available zone– Garbage collection is only performed on the checkpointed
zone
18
(1) Available Zone Updating
Tx4, 0Tx4, 0
Tx3, 0Tx4, 0
Tx4, 0
Tx3, 0Tx3, 0Tx3, 4
Tx5, 0
Tx5, 0Tx5, 0
2-0 4-0 6-1 6-2
2-2 7-0 2-3 4-1
3-3 5-0 5-1
Tx3
Tx4
Tx5
2-0 4-0 6-1 6-2Tx3
19
(2) Zone Sliding
Tx4, 0Tx4, 0
Tx3, 0Tx4, 0
Tx4, 0
Tx3, 0Tx3, 0Tx3, 4
Tx5, 0
Tx5, 0Tx5, 0
2-0 4-0 6-1 6-2
2-2 7-0 2-3 4-1
3-3 5-0 5-1
Tx3
Tx4
Tx5
Tx4, 5Tx5, 0
Tx6, 0Tx6, 0 Tx7, 0
Tx9, 0 Tx8, 0
4-2 4-3
7-1 7-2
Tx6
Tx7
Tx8
Tx9
6-3
5-27-3
Tx8, 0
Abort Tx5
2-0 4-0 6-1 6-2Tx3
3-3 5-0 5-1Tx5
2-2 7-0 2-3 4-1Tx4 7-3
Tx7, 2
Tx8, 3Tx9, 0
8-0
9-07-1 7-2
Tx7
Tx8
6-3 8-0
9-0
Tx4, 0Tx4, 0
Tx5, 0
9-1
20
(3) Zone Sliding
Tx4, 0Tx4, 0
Tx3, 0Tx4, 0
Tx4, 0
Tx3, 0Tx3, 0Tx3, 4
Tx5, 0
Tx5, 0Tx5, 0
2-0 4-0 6-1 6-2
2-2 7-0 2-3 4-1
3-3 5-0 5-1
Tx3
Tx4
Tx5
Tx4, 5Tx5, 0
Tx6, 0Tx6, 0 Tx7, 0
Tx9, 0 Tx8, 0
4-2 4-3
7-1 7-2
Tx6
Tx7
Tx8
Tx9
6-3
5-27-3
Tx8, 0
Abort Tx5
2-0 4-0 6-1 6-2Tx3
3-3 5-0 5-1Tx5
2-2 7-0 2-3 4-1Tx4 7-3
Tx7, 2
Tx8, 3Tx9, 0
8-0
9-07-1 7-2
Tx7
Tx8
6-3 8-0
9-0
9-1
Tx4, 0Tx4, 0
Tx5, 0
Tx3, 0Tx4, 0
Tx4, 0
Tx3, 0Tx3, 0Tx3, 4
Tx5, 0Tx5, 0
Tx4, 5Tx5, 0
Tx6, 0Tx6, 0 Tx7, 0
Tx9, 0Tx8, 0Tx8, 0
Tx4, 0Tx4, 0
Tx5, 0
21
(4) System Failure
Tx4, 0Tx4, 0
Tx3, 0Tx4, 0
Tx4, 0
Tx3, 0Tx3, 0Tx3, 4
Tx5, 0
Tx5, 0Tx5, 0
2-0 4-0 6-1 6-2
2-2 7-0 2-3 4-1
3-3 5-0 5-1
Tx3
Tx4
Tx5
Tx4, 5Tx5, 0
Tx6, 0Tx6, 0 Tx7, 0
Tx9, 0 Tx8, 0
4-2 4-3
7-1 7-2
Tx6
Tx7
Tx8
Tx9
6-3
5-27-3
Tx8, 0
Abort Tx5
2-0 4-0 6-1 6-2Tx3
3-3 5-0 5-1Tx5
2-2 7-0 2-3 4-1Tx4 7-3
Tx7, 2
Tx8, 3Tx9, 0
8-0
9-07-1 7-2
Tx7
Tx8
6-3 8-0
9-0
9-1
Tx4, 0Tx4, 0
Tx5, 0
Tx3, 0Tx4, 0
Tx4, 0
Tx3, 0Tx3, 0Tx3, 4
Tx5, 0Tx5, 0
Tx4, 5Tx5, 0
Tx6, 0Tx6, 0 Tx7, 0
Tx9, 0Tx8, 0Tx8, 0
Tx4, 0Tx4, 0
Tx5, 0
8-1 11-0
Tx10
Tx11
10-0
11-18-1 11-0Tx11 11-1
Tx11, 0Tx10, 0
Tx11, 3Tx11, 0
Tx9, 0
9-2
22
Recovery
Tx3, 0Tx4, 0
Tx4, 0
Tx4, 5
Tx3, 0Tx3, 0Tx3, 4
Tx5, 0Tx5, 0
Tx5, 0
Tx6, 0Tx6, 0 Tx7, 0
Tx9, 0Tx8, 0Tx8, 0
Tx7, 2
Tx8, 3
Tx11, 0Tx10, 0
Tx11, 3Tx11, 0
Tx9, 0Tx9, 0
Tx3, 0Tx4, 0
Tx4, 0
Tx4, 5
Tx3, 0Tx3, 0Tx3, 4
Tx5, 0Tx5, 0
Tx5, 0
Tx7, 0
Tx9, 0Tx8, 0Tx8, 0
7-1 7-2
Tx7
Tx8
Tx9
6-3
5-2
8-1 11-0
Tx10
Tx11
10-0
11-1
8-0
9-1 9-2
9-0• Scan the available zone
• Scan the unavailable zone
23
• Recovery– Scan the available zone
• If TxCnt matches, completed tx• If not, add the tx to the pending list
– Scan the unavailable zone• If TxID in the pending list, check TxCnt again. If TxCnt
matches, completed tx• If txID not in the pending list, discard it• If TxCnt still doesn’t match, uncompleted tx
– Replay with the sequence of TxVer
24
Outline
• Executive Summary• Background• LightTx Design• Evaluation• Conclusions
25
Experimental Setup
• SSD simulator– SSD add-on from
Microsoft on DiskSim– Parameters from Samsung
K9F8G08UXM NAND flash• Trace
– TPC-C benchmark: DBT2 on PostgreSQL 8.4.10
26
Flexibility
(1) For a given isolation level, LightTx provides as good or better tx throughput than other protocols.
(2) In LightTx, no-page-conflict and serialization isolation improve throughput by 19.6% and 20.6% over strict isolation.
27
Garbage Collection Cost
Atomic-Write SCC BPCC LightTx0
5
10
15
20
25
30
35
40
45
Nor
mal
ized
Gar
bage
Col
lect
ion
Cos
tTransaction Protocol
Abort Ratio (0%) Abort Ratio (10%) Abort Ratio (20%) Abort Ratio (50%)
Atomic-Write SCC BPCC LightTx0
100
200
300
400
500
600
Tra
nsac
tion
Thr
ough
put (
txs/
s)
Transaction Protocol
Abort Ratio (0%) Abort Ratio (10%) Abort Ratio (20%) Abort Ratio (50%)
(1) LightTx significantly outperforms SCC/BPCC when abort ratio is not zero.
(2)Garbage collection overhead in SCC/BPCC goes extremely high when abort ratio goes up.
28
Recovery Time and Persistence Overhead
4M 16M 64M 256M 1G0.00
0.05
0.10
0.15
0.20
0.25
0.30
6
7
8
Rec
over
y Tim
e (s
econ
ds)
Size of the Available Zone (byte)
Atomic-Write SCC/BPCC LightTx
4M 16M 64M 256M 1G
0
1
2
3
4
5
Map
ping
Per
sist
ence
Ove
rhea
d (%
)Size of the Available Zone (byte)
Atomic-Write SCC/BPCC LightTx
LightTx achieves fast recovery with low mapping persistence overhead.
29
Outline
• Executive Summary• Background• LightTx Design• Evaluation• Conclusions
30
Conclusion• Problem: Flash-based SSDs support transactions naturally (with out-of-
place updates) but inefficiently:– Only a limited set of isolation levels are supported (inflexible)– Identifying transaction status is costly (heavyweight)
• Goal: a lightweight design to support flexible transactions• Observations and Key Ideas:
– Simultaneous updates can be written to different physical pages, and the FTL mapping table determines the ordering=> (Flexibility) make commit protocol page-independent
– Transactions have birth and death, and the near-logged update way enables efficient tracking=> (Lightweight) track recently updated flash blocks, and retire the dead transactions
• Results: up to 20.6% performance improvement, stable GC overhead, fast recovery with negligible persistence overhead
31
A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions
Youyou Lu1, Jiwu Shu1, Jia Guo1, Shuai Li1, Onur Mutlu2
LightTx:
1Tsinghua University2Carnegie Mellon University
Thanks
32
Backup Slides
33
Existing Approaches (1)• Atomic Writes
– Log-structured FTL– Transaction state: <00…1>
[HPCA’11] Beyond block i/o: Rethinking traditional storage primitives
+ No logging, no commit record+ No tx state maintenance cost- Poor parallelism- Mapping persistence overhead
34
Existing Approaches (2)
A B C D E
1
2
3
4
Version number
Page• SCC/BPCC (Cyclic
commit protocol)– Use pointers in
the page metadata to put all pages in a cycle for each tx
[OSDI’08] Transactional flash
35
• SCC– Block erase
forced for aborted pages
A B C D E
1
2
3
4
Version number
Page
- Low garbage collection efficiency: lots of data moves due to forced block erase
36
SRS(D3) = {C2, D2}SRS(E3) = {D2}SRS(D4) = {C2, D2, D3}
A B C D E
1
2
3
4
Version number
Page
• BPCC– SRS: Straddle
Responsibility Set– Erasable only after
SRS is empty- Complex and costly SRS updates- Low garbage collection efficiency: wait until SRS is empty
37
+ No logging, no commit record+ Improved parallelism- Limited parallelism- High tx state maintenance overhead
+ No logging, no commit record+ No tx state maintenance overhead- Poor parallelism- Mapping persistence overheadProblems:• Tx support is inflexible (limited parallelism)
– Cannot meet the flexible demands from software– Cannot fully exploit the internal parallelism of SSDs
• Tx state tracking causes high overhead in the deviceA lightweight design to support flexible transactions
Atomic Writes SCC/BPCC