Aerie: Flexible File-System Interfaces to Storage-Class Memory
Haris Volos† Sanketh Nalli, Sankaralingam Panneerselvam, Venkatanathan Varadarajan, Prashant Saxena,
Michael M. Swift
HP Labs †
• Persistent • Short access time
2
Software overhead matters
Storage-Class Memory (SCM) Flash-backed DRAM
Flash
Phase-Change Memory
Spin Torque MRAM Resistive RAM
ns μs
Latency
3
Memory Controller
SCM DRAM
Storage-Class Memory (SCM)
• Persistent • Short access time • Byte addressable
Accessible via loads/stores
No software overhead
• Direct user-mode access for fast access to data – Moneta-D, PMFS, Quill,
NV-Heaps, Mnemosyne
• File system for sharing
– Shared namespace – Protection – Integrity
4
OS/ File System
Application
Accessing SCM today
+
Device Driver Device Driver
Generic Block Layer
I/O Scheduler
SCM FS
App
Virtual File System
App
Disk FS
I/O Bus
Does SCM need a kernel FS?
5
MMU protects CPU access
DMA is not protected
Variable latency to disk
~ Constant latency to SCM
Load/store interface
No standard interface
SCM Disk
• Enable implementation flexibility – Optimize file-system interface semantics – Optimize operations regarding metadata
Library file systems (libFS)
6
APP
LibFS API
[Exokernel (MIT), Nemesis (Cambridge)]
Aerie libFS in a nutshell
7
APP
LibFS (functionality)
Safely multiplex SCM
User
Kernel HW
API
APP
LibFS (layout, logic)
API
APP
OtherLibFS (layout, logic)
API
APP APP
LibFS (layout, logic)
API
APP
OtherLibFS (layout, logic)
API
Safely multiplex SCM
8
API
User
HW
/
common bob alice
common alice bob /
Aerie libFS in a nutshell
open
Kernel
Outline
• Overview • Motivation: Interface flexibility • Aerie: In-memory library file systems • Evaluation • Conclusion
9
• Universal abstraction: Everything is a file – Has generic-overhead cost
10
Application
POSIX File (Virtual File System)
Storage File IPC Network
Socket
POSIX File: Expensive abstraction
• Rigid interface and policies – Has fixed components and costs – Hinders application-specific customization
11
Application
POSIX File (Virtual File System)
UNIX concurrency semantics
Hierarchical names
Byte streams
Permissions
POSIX File: Expensive abstraction
POSIX File: Expensive abstraction
• Rigid interface and policies – Has fixed components and costs – Hinders application-specific customization
Application
POSIX File (Virtual File System)
open() Syscall Naming + Permissions
In-memory state (Byte streams)
Concurrency control
File Descriptors
~ 2.5 μs ≈ 25x SCM latency
File System
Motivating Example: Web Proxy
13
Web Proxy Cache
/cache
Characteristics • Flat namespace • Immutable files • Infrequent sharing
Motivating Example: Web Proxy
14
Web Proxy Cache
Proxy FS POSIX FS
Customizing the file system today
• Modify the kernel
• Add a layer over existing kernel file system • Use a user-mode framework such as FUSE
15
Cumbersome options
• Software interface overheads handicap fast SCM
• Flexible interface is a must for fast SCM
• Library file systems can help remove generic software overheads
16
Flexible interfaces more important than ever
Outline
• Overview • Motivation: Interface flexibility • Aerie: In-memory library file systems (libFS) • Evaluation • Conclusion
17
Kernel safely multiplexes SCM
18
allocation, protection, addressing
Kernel HW
• Allocation: Allocates SCM regions (i.e. extents)
• Protection: Keeps track of region access rights
• Addressing: Memory-maps SCM regions
<extent>
Library implements functionality
19
APP
LibFS (layout, logic)
allocation, protection, addressing
User
Kernel HW
API
APP
LibFS (layout, logic)
API
APP
OtherLibFS (layout, logic)
API
Implementing file-system features
• File-system objects
• Shared namespace
• Protection (access control)
• Integrity
20
File-system objects build on SCM extents
• Collection (or directory) – key → object ID (oid)
• mFile (or memory file) – Offset → data extent ID
21
Shared namespace
22
APP
LibFS (layout, logic)
allocation, protection, addressing
User
Kernel HW
API
APP
LibFS (layout, logic)
API
APP
OtherLibFS (layout, logic)
API
common alice bob /
23
APP
libFS
User
HW
APP
/
common bob alice
/
common bob alice
common alice bob /
Shared namespace
libFS
open
24
APP
libFS
User
HW
APP
common alice bob /
Decentralize access control via hardware-enforced permissions
libFS /
common bob alice
/
common bob alice Memory protection prevents Bob from accessing Alice’s files
25
APP
libFS
User
HW
APP
libFS /
shared alice
alice bob /
Hardware protection cannot guarantee integrity
common
/
common bob
foo
bar
Trusted FS
Service (TFS)
26
APP
LibFS (layout, logic)
API
User
Kernel HW
APP
LibFS (layout, logic)
API
Integrity via Trusted File Service
allocation, protection, addressing
27
HW
SCM
APP
LibFS API
File data Metadata
Read/ Write Read
Read/Write
APP
LibFS (layout, logic)
API
User
Trusted FS
Service (TFS)
Metadata Server (integrity)
Update
Decentralized architecture
RPC
28
User
HW
SCM
APP
LibFS API
File data Metadata
Read/ Write Read
Read/Write
APP
API
Trusted FS
Service (TFS)
Metadata Server (integrity)
Lease Manager
(sharing) Log Update
Reducing communication: Hierarchical leases + Batching
RPC
29
User
HW
SCM
APP
LibFS API
File data Metadata
Read/ Write Read
Read/Write
APP
API
Trusted FS
Service (TFS)
Metadata Server (integrity)
Lease Manager
(sharing) Log Free blocks
Update
Reducing communication: Hierarchical leases + Batching
RPC
common
adjust
Contention
30
User
HW
SCM
APP
LibFS API
File data Metadata
Read/ Write Read
Read/Write
APP
API
Trusted FS
Service (TFS)
Metadata Server (integrity)
Lease Manager
(sharing) Log Free blocks
Update
Reducing communication: Hierarchical leases + Batching
RPC
Prototype Implementation
• Extent API by Linux 3.2.2 x86-64 kernel modifications
• Communication via loopback RPC • Crash consistency through
– x86 CLFLUSH instruction (cache line flush) – Redo logging
• SCM emulation using DRAM
31
32
HW
SCM
APP
LibFS libFS
File data Metadata
WRITE READ (/common)
APP
Trusted FS
Service (TFS)
Metadata Server (integrity)
Lease Manager
(sharing) Log Free blocks
Example: A shared file
creat(/common/foo) LOCK (/common)
CreateFile
LinkBlock
write(...)
RPC
33
HW
SCM File data Metadata
RELEASE (/shared)
Trusted FS
Service (TFS)
Metadata Server (integrity)
Lease Manager
(sharing)
Example: A shared file
LOCK (/common)
APP
LibFS libFS
APP
LibFS
read(/common/foo)
READ (/common/foo) READ (/common)
API RPC
LOCK (foo)
File Systems
Functionality: PXFS • POSIX interface:
open/read/write/unlink • Hierarchical namespace • POSIX concurrency
semantics • File byte streams
34
File Systems
Functionality: PXFS • POSIX interface:
open/read/write/unlink • Hierarchical namespace • POSIX concurrency
semantics • File byte streams
Optimization: FlatFS • Key-value interface:
put/get/erase • Flat namespace
– Simplifies name resolution
• KV-store concurrency semantics – Reduce in-memory state
• Short, immutable files – Simplify storage allocation
35
File Systems
36
PXFS
APP
FlatFS
APP
Performance Evaluation
• Performance model – Writes to DRAM + software created delay – Reads to DRAM
• Configurations – RamFS: In-memory kernel FS – Ext4: ext4fs + RAM-disk – LibFS: PXFS and FlatFS
• Filebench workloads: Fileserver, Webserver, Webproxy
37
Application-workload performance
0
5
10
15
20
Fileserver Webserver Webproxy
Late
ncy
per o
p
( μs
)
RamFS
ext4
PXFS
FlatFS
38
• PXFS performs better than kernel-mode FS • FlatFS exploits app semantics to improve performance
10%
9% 22%
53% (2.1x) 30%
1
10
Late
ncy
per o
p ( μs
)
PXFS
ext4
FlatFS
Sensitivity to SCM performance: Webproxy
• Shorter SCM latencies favor – Direct access via load/store instructions – Interface specialization
39
0 100 1000 10000 Extra software delay ( ns )
Scalability: Webproxy
0200400600800
10001200
0 5 10
Thro
ughp
ut (K
ops/
s)
# Threads
PXFSRamFSext4FlatFS
• FlatFS retains its benefits over kernel-mode file systems
Machine: Intel 6-core 2-way HT
• Software interface overheads handicap fast SCM
• Flexible interface is a must for fast SCM
• Aerie: Library file systems help remove generic overheads for higher performance – FlatFS improves performance by up to 110%
41
Conclusion
Thank you! Questions?