Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci - Usenix

Page 1

Yiying Zhang, Leo Prasath Aruraj, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau

University of Wisconsin - Madison

Page 2

Indirection

Reference an object with a different name

Flexible, simple, and modular

Indirection in computer systems

Virtual memory: virtual to physical memory address

Hard disks: bad sectors to nearby locations

RAID arrays: logical to array physical address

SSDs: logical to SSD physical address

* Usually attributed to Butler Lampson

Page 3

Excess indirection

Redundant levels of indirection in a system

e.g. OS on top of hypervisor(s)

e.g. File system on top of RAID

Are all indirections really necessary?

Some indirection can be removed

Space and performance cost

What about flash-based SSDs?

File system: file offset to logical address (F -> L)

Device: logical address to physical address (L -> P)
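The two levels named above can be sketched as two lookup tables that are composed on every access. This is only an illustration: the table contents, the file name, and the function name are made up for the example.

```python
# Hypothetical tables, for illustration only.
file_to_logical = {("fileA", 0): 7, ("fileA", 1): 3}   # F -> L, kept by the file system
logical_to_physical = {7: 120, 3: 45}                  # L -> P, kept by the device


def locate_with_two_levels(file_name, offset):
    """Resolve a file offset by going through both levels of indirection."""
    logical = file_to_logical[(file_name, offset)]
    return logical_to_physical[logical]


# Collapsing the two tables gives a single F -> P mapping.
file_to_physical = {
    key: logical_to_physical[logical] for key, logical in file_to_logical.items()
}
```

With the collapsed table, the device-side L -> P table is no longer consulted on the access path.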

Page 4

Indirection in SSDs (L -> P)

Mapping from logical to physical address

Hides erase-before-write and wear leveling

Implemented in Flash Translation Layer (FTL)

Cost of indirection

RAM space to maintain indirection table

Hybrid: small page-mapped area + big block-mapped area

Performance cost of garbage collection

Performance impact on random writes [Kim ’12]

[Figure: the FS sends LBA + data to the SSD; the FTL maps the logical address (L) to a physical address (P).]
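A minimal sketch of the L -> P indirection, assuming a purely page-mapped FTL with log-structured allocation. The class and field names are hypothetical, and garbage collection and wear leveling are not modeled.

```python
class PageMappedFTL:
    """Toy page-mapped FTL: every written logical page has an L -> P entry."""

    def __init__(self, num_physical_pages):
        self.l2p = {}                          # logical page -> physical page
        self.flash = [None] * num_physical_pages
        self.invalid = set()                   # stale physical pages awaiting erase
        self.next_free = 0                     # log-structured allocation pointer

    def write(self, lba, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages; garbage collection is not modeled")
        old = self.l2p.get(lba)
        if old is not None:
            self.invalid.add(old)              # flash pages are not overwritten in place
        ppn = self.next_free
        self.next_free += 1
        self.flash[ppn] = data
        self.l2p[lba] = ppn                    # the table hides the new location

    def read(self, lba):
        return self.flash[self.l2p[lba]]
```

The `l2p` dictionary is the indirection table whose RAM cost the slide refers to: it grows by one entry per logical page written.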

Page 5

Solution: De-indirection

Remove indirection in SSDs (L -> P)

Store physical addresses directly in file system (F -> P)

New interface: Nameless Write

Write without a name (logical address)

Device allocates and returns physical address

File system stores physical address

Advantages

Reduces space and performance cost of indirection

Device maintains critical controls

Page 6

Designed nameless-writing interfaces

Implemented a nameless-writing system

Built a nameless-writing SSD emulator

Ported ext3 to nameless writes

Evaluation results

Evaluated against two other FTLs

Small indirection table, ~20x reduction over traditional SSDs

Better random write throughput, ~20x over traditional SSDs

Page 7

Introduction

Problems of basic interfaces and solutions

Nameless-writing device and ext3

Results

Conclusion

Page 8

Nameless Write: writes only data and no name

Physical Read: reads using physical address

Free/Trim: invalidates block at physical address

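[Figure: the FS sends data to the SSD; the FTL returns the physical address P, which the FS stores. One P marks the address the structure points to, the other the address of the block.]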
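The three operations above can be sketched as a toy device class. The class name, method names, and first-free allocation policy are assumptions for illustration, not the actual device interface.

```python
class NamelessDevice:
    """Toy device exposing nameless write, physical read, and free/trim."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages
        self.free_pages = list(range(num_pages))

    def nameless_write(self, data):
        """Write data with no name; the device picks a page and returns its address."""
        if not self.free_pages:
            raise RuntimeError("device full")
        paddr = self.free_pages.pop(0)
        self.flash[paddr] = data
        return paddr

    def physical_read(self, paddr):
        """Read using a physical address."""
        return self.flash[paddr]

    def free(self, paddr):
        """Free/trim: invalidate the block at a physical address."""
        self.flash[paddr] = None
        self.free_pages.append(paddr)


# File system side: block pointers hold the physical addresses the device returned.
dev = NamelessDevice(8)
block_ptrs = [dev.nameless_write(b"hello")]
```

The file system never chooses an address here; it only records the one returned by `nameless_write`, which is the point of the interface.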

Page 9

P1: Cost of straw-man nameless-write approach

How to reduce the overheads of complete de-indirection?

P2: Migration during wear leveling

How to reflect physical address change in the file system?

P3: Locating metadata structures

How to find metadata structures efficiently?

Page 10

Overwrite a data block in a file in ext3

[Figure: FS and SSD views of a data block, its inode, and its directory (metadata and data). Labels: P0, P1, P2, P1 + offset. Red: address the structure points to. Blue: address of the block.]

Problems

Overhead of updating along FS tree

FS more complex and less flexible
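The overhead of updating along the FS tree can be sketched as follows, assuming every structure is written with a nameless write and each parent stores its child's physical address. The function names and the allocation counter are hypothetical stand-ins.

```python
import itertools

_next_addr = itertools.count(100)   # stand-in for device-side allocation


def nameless_write(block):
    """Hypothetical device call: stores the block, returns a fresh physical address."""
    return next(_next_addr)


def overwrite_data_block(chain):
    """Rewrite every structure from the data block up the tree.

    chain lists the structures bottom-up, e.g. ["data", "inode", "directory"].
    A rewritten child gets a new physical address, so its parent, which
    stores that address, has to be rewritten as well.
    """
    writes = []
    child_addr = None
    for name in chain:
        block = {"name": name, "child_addr": child_addr}
        child_addr = nameless_write(block)
        writes.append((name, child_addr))
    return writes
```

In this sketch, overwriting one data block in the chain `["data", "inode", "directory"]` costs three writes instead of one.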

Page 11

Problem of recursive updates

Writes propagate to reflect physical addresses

Solution: Two segments of address space

Stop recursive updates

Physical address space

Nameless write, physical read

Contains data blocks

Virtual address space

Traditional (virtual) read/write

Small indirection table in device

Contains metadata blocks (typically small metadata [Agrawal’07])

Page 12

[Figure: segmented address space. The virtual address space (superblock, inode bitmap, data bitmap, inodes, directories, journal) is written as data + logical address and translated through the L -> P table. The physical address space (data blocks) is written as data only; the FTL allocates physical addresses in the SSD's physical flash memory and returns the physical address.]

Page 13

Overwrite a data block with segmented address space

[Figure: FS and SSD views of a data block, its inode, and its directory (metadata and data). Labels: P0, P1, I#, Inode + L1, L1 -> P1. Red: address the structure points to. Blue: address of the block.]

Advantages

One level of update propagation

Simple implementation
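A rough sketch of an overwrite under the segmented address space, assuming data blocks go through nameless writes and the inode lives at a fixed logical address in the virtual segment. All class, method, and variable names are hypothetical, and freeing of stale pages is omitted.

```python
class SegmentedDevice:
    """Toy device with a physical segment (data) and a virtual segment (metadata)."""

    def __init__(self):
        self.flash = {}        # physical address -> contents
        self.l2p = {}          # small L -> P table, virtual segment only
        self.next_paddr = 0

    def _allocate(self, contents):
        paddr = self.next_paddr
        self.next_paddr += 1
        self.flash[paddr] = contents
        return paddr

    def nameless_write(self, data):
        return self._allocate(data)

    def physical_read(self, paddr):
        return self.flash[paddr]

    def virtual_write(self, laddr, contents):
        self.l2p[laddr] = self._allocate(contents)

    def virtual_read(self, laddr):
        return self.flash[self.l2p[laddr]]


dev = SegmentedDevice()
L1 = 1                                          # assumed logical address of the inode
p0 = dev.nameless_write("old data")
dev.virtual_write(L1, {"block_ptrs": [p0]})
directory = {"file.txt": L1}                    # directory refers to the inode, not to flash

p1 = dev.nameless_write("new data")             # overwrite: data lands at a new address
dev.virtual_write(L1, {"block_ptrs": [p1]})     # propagation stops at the inode
```

The inode's logical address stays the same across the overwrite, so the directory is never rewritten, and the device-side table holds one entry per metadata block rather than one per data block.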

Page 14

Block wear in SSDs

Uneven wear among blocks with data of different access frequency

Wear leveling

SSD moves data to distribute block erases evenly

Physical address change

File system needs to be informed

Only address changes in the physical address space need to be reported

[Figure: the SSD moves data from P1 to P2 while the FS still holds P1 and issues "Read P1".]

Page 15

New interface: Migration Callbacks

Device informs FS about physical address change

Temporary remapping table

Reads and overwrites to the old address are remapped to the new address

FS processes callbacks in background

Acknowledges device when metadata updated

[Figure: the FTL sends the callback P1 -> P2 to the FS; the FS updates its pointer from P1 to P2 and returns an Ack.]
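The callback flow can be sketched as below. It assumes a single migration per page before the acknowledgement arrives; chained migrations, overwrites, and crash recovery are not modeled, and all names are hypothetical.

```python
class MigratingDevice:
    """Toy device that migrates pages and keeps a temporary remapping table."""

    def __init__(self):
        self.flash = {}               # physical address -> contents
        self.remap = {}               # temporary remapping table: old -> new
        self.pending_callbacks = []   # callbacks not yet processed by the FS

    def migrate(self, old, new):
        """Wear leveling moves a page and queues a callback for the file system."""
        self.flash[new] = self.flash.pop(old)
        self.remap[old] = new
        self.pending_callbacks.append((old, new))

    def physical_read(self, paddr):
        paddr = self.remap.get(paddr, paddr)   # reads of the old address are redirected
        return self.flash[paddr]

    def ack(self, old):
        """The FS has updated its metadata, so the remap entry can be dropped."""
        del self.remap[old]


def process_callbacks(dev, block_ptrs):
    """Background FS work: patch pointers, then acknowledge each callback."""
    while dev.pending_callbacks:
        old, new = dev.pending_callbacks.pop(0)
        for i, ptr in enumerate(block_ptrs):
            if ptr == old:
                block_ptrs[i] = new
        dev.ack(old)
```

Until the acknowledgement, a read of the old address still succeeds through the remapping table, which is why that table only has to be temporary.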

Page 16

Problem: Locating metadata structures

e.g. During callbacks

e.g. During recovery

Naive approach: traversing all metadata

Solution: Associated Metadata

Small amount of metadata used to locate metadata

e.g. Inode number, inode generation number, block offset

Sent with nameless writes and migration callbacks

Stored adjacent to data pages on device, e.g. OOB area

Page 17

[Figure: processing migration callbacks. The SSD sends callbacks to the FS (P1->P2 with Assoc Meta1, P3->P4 with Assoc Meta2, P5->P6 with Assoc Meta3). The FS keeps them in a hash table (key: associated metadata; value: callback entries), updates Inode1 and Inode2 so that P1, P3, P5 become P2, P4, P6, and after the transaction commits (Xact Commit) acknowledges P1->P2, P3->P4, P5->P6.]

Callback entry: old physical addr, new physical addr, associated metadata, timestamp
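A minimal sketch of how associated metadata lets the file system go straight to the affected inode instead of traversing all metadata. The entry fields follow the slide; the in-memory inode layout, the function name, and the handling of stale entries (simply skipped here) are assumptions.

```python
from collections import namedtuple

AssocMeta = namedtuple("AssocMeta", "inode_number generation block_offset")
CallbackEntry = namedtuple("CallbackEntry", "old_paddr new_paddr assoc_meta timestamp")


def apply_callbacks(inodes, entries):
    """Apply migration callbacks and return the entries to acknowledge.

    inodes: {inode_number: {"generation": int, "block_ptrs": [paddr, ...]}}
    """
    table = {}
    for entry in entries:
        # Hash table: key is the associated metadata, value is the callback entries.
        table.setdefault(entry.assoc_meta, []).append(entry)

    applied = []
    for meta, bucket in table.items():
        inode = inodes.get(meta.inode_number)
        if inode is None or inode["generation"] != meta.generation:
            continue   # assumption: entries for deleted or reused inodes are skipped
        latest = max(bucket, key=lambda e: e.timestamp)
        inode["block_ptrs"][meta.block_offset] = latest.new_paddr
        applied.extend(bucket)
    return applied
```

The lookup cost is one dictionary access per callback, independent of how much metadata the file system holds.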

Page 18

Introduction

Nameless write interfaces

Results

Conclusion

Page 19

Supports nameless write interfaces

Flexible device allocation

Maintains small mapping table

Indirection of the virtual address space

Temporary remapping table for callbacks

Control of garbage collection and wear leveling

Minimize physical address migration (In-place GC)

Page 20

Ext3: Journaling file system extending ext2

Ordered journal mode

Metadata always written after data

Fits well with nameless writes

Interface support

Segmented address space

Nameless write

Physical read

Free/trim

Callback

Page 21

Total: 4360

Ext3: 1530

JBD: 480

Generic I/O: 2020

Headers: 340

Page 22

Introduction

Nameless write interfaces

Nameless-writing device and ext3

Conclusion

Page 23

SSD emulator

Linux pseudo block device

Data stored in memory

FTLs studied

Page mapping: log-structured allocation

ideal in performance, unrealistic in indirection space

Hybrid mapping: small page-mapped area + block-mapped area

models real SSDs, realistic in indirection space

Nameless-writing
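The indirection-space difference between page mapping and hybrid mapping comes down to how many table entries each scheme needs. A back-of-envelope sketch, where the capacity, page size, pages per block, and the share of the page-mapped area are all assumed values rather than figures from the talk:

```python
def page_mapped_entries(capacity_bytes, page_size):
    """One table entry per flash page."""
    return capacity_bytes // page_size


def hybrid_entries(capacity_bytes, page_size, pages_per_block, page_mapped_percent):
    """One entry per page in the small page-mapped area, one per block elsewhere."""
    total_pages = capacity_bytes // page_size
    page_area_pages = total_pages * page_mapped_percent // 100
    block_area_blocks = (total_pages - page_area_pages) // pages_per_block
    return page_area_pages + block_area_blocks


# Assumed example: 1 GiB device, 4 KiB pages, 64 pages per block, 5% page-mapped.
full = page_mapped_entries(2**30, 4096)
hybrid = hybrid_entries(2**30, 4096, 64, 5)
```

Under these assumptions the hybrid table needs far fewer entries than the page-mapped one, which is the sense in which page mapping is unrealistic in indirection space.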

Page 24

Mapping table sizes for typical file system images [Agrawal’09]

[Figure: mapping table sizes. Values shown: 100, 1024, 11, 118, 0.2, 2.2.]

Nameless writes use 2% to 7% of the mapping table space of traditional hybrid SSDs

Page 25

Sequential and sustained 4KB random write

Nameless writes deliver 20x the random write throughput of traditional hybrid SSDs

Performance of nameless writes is close to page FTL (upper-bound)

Page 26

Varmail, FileServer, and WebServer from Filebench

Similar performance when workload is read or sequential-write intensive

Performance of hybrid FTL is worse than the other two FTLs when workload has random writes

Page 27

Introduction

Nameless write interfaces

Nameless-writing device and ext3

Results

Page 28

Problem: Excess indirection in flash-based SSDs

Solution: De-indirection with Nameless Writes

Implementation of a nameless-writing system

Built an emulated nameless-writing SSD

Ported ext3 to nameless writes

Advantages of nameless writes

Reduce the space cost of indirection over traditional SSDs

Improve random write performance over traditional SSDs

Reduce energy cost, simplify SSD firmware

Page 29

“All problems in computer science can be solved by another level of indirection”

Usually attributed to Butler Lampson

Lampson attributes it to David Wheeler

And Wheeler usually added: “but that usually will create another problem”

Page 30

Too much: Excess indirection

e.g. file offset => logical address => physical address

Partial indirection

e.g. nameless writes with segmented address space

Too little: Cost of (complete) de-indirection

e.g. overheads of recursive update

Page 31

Thank you!

Questions?

The ADvanced Systems Laboratory (ADSL) http://www.cs.wisc.edu/adsl/