The Direct Access File System (DAFS)
Matt DeBergalis, Peter Corbett, Steve Kleiman, Arthur Lent, Dave Noveck, Tom Talpey, Mark Wittle
Network Appliance, Inc.
Usenix FAST ’03
Presented by Tom Talpey
Outline
• DAFS
• DAT / RDMA
• DAFS API
• Benchmark results
DAFS – Direct Access File System
• A file access protocol based on NFSv4 and RDMA, designed specifically for high-performance data center (local) file sharing
• Low latency, high throughput, and low overhead
• Semantics for clustered file sharing environments
DAFS Design Points
• Designed for high performance
  – Minimize client-side overhead
  – Base protocol: remote DMA, flow control
  – Operations: batch I/O, cache hints, chaining
• Direct application access to transport resources
  – Transfers file data directly to application buffers
  – Bypasses operating system overhead
  – File semantics
• Improved semantics to enable local file sharing
  – Superset of CIFS, NFSv3, NFSv4 (and local file systems!)
  – Consistent high-speed locking
  – Graceful client and server failover, cluster fencing
• http://www.dafscollaborative.org
DAFS Protocol
• Session-based
• Strong authentication
• Optimized message format
• Multiple data transfer models
• Batch I/O
• Cache hints
• Chaining
DAFS Protocol Enhanced Semantics
• Rich locking
• Cluster fencing
• Shared key reservations
• Exactly-once failure semantics
• Append mode, create-unlinked, delete-on-last-close
DAT – Direct Access Transport
• Common requirements and an abstraction of services for RDMA (Remote Direct Memory Access)
  – A portable, high-performance transport underpinning for DAFS and applications
  – Defines communication endpoints, transfer semantics, memory description, signalling, etc.
• Transfer models (see the sketch below):
  – Send (like a traditional network flow)
  – RDMA Write (write directly to advertised peer memory)
  – RDMA Read (read from advertised peer memory)
• Transport independent
  – 1 Gb/s VI/IP, 10 Gb/s InfiniBand, future RDMA over IP
• http://www.datcollaborative.org
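
The three transfer models differ in which side drives the data movement. Below is a toy, single-process C sketch of their semantics; every name (post_send, post_rdma_write, post_rdma_read, struct remote_buf) is an illustrative assumption rather than the DAT or uDAPL API, and in a real provider the memcpy is the RDMA NIC moving data across the network.

/* Toy, single-process sketch of the three DAT transfer models.
 * All names here are illustrative assumptions, not the DAT API. */
#include <stdio.h>
#include <string.h>

/* A peer advertises a memory region (address + protection token)
 * out of band before RDMA can target it. */
struct remote_buf { char *addr; unsigned token; };

/* Send: data lands in a buffer the receiver pre-posted,
 * like a traditional network flow; the receiver is notified. */
static void post_send(char *posted_recv_buf, const char *src, size_t len) {
    memcpy(posted_recv_buf, src, len);
}

/* RDMA Write: the initiator places data directly into advertised
 * peer memory; the peer's CPU does not touch the transfer. */
static void post_rdma_write(struct remote_buf dst, const char *src, size_t len) {
    memcpy(dst.addr, src, len);
}

/* RDMA Read: the initiator pulls data directly from advertised
 * peer memory into a local buffer. */
static void post_rdma_read(char *dst, struct remote_buf src, size_t len) {
    memcpy(dst, src.addr, len);
}

int main(void) {
    char peer_mem[32] = "data owned by the peer";
    struct remote_buf adv = { peer_mem, 42 };   /* advertised region */
    char inbox[32] = { 0 }, local[32] = { 0 };

    post_send(inbox, "via send", 9);            /* flows through receiver */
    post_rdma_write(adv, "via write", 10);      /* lands in peer memory */
    post_rdma_read(local, adv, 10);             /* pulled from peer memory */

    printf("send:  %s\nwrite: %s\nread:  %s\n", inbox, peer_mem, local);
    return 0;
}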
DAFS Inline Read
[Figure: inline read flow. (1) The client posts a READ_INLINE request from a send descriptor; (2) the server reads the data into a server buffer and returns it inside the REPLY message; (3) the reply arrives in the client's pre-posted receive descriptor and the data is copied into the application buffer.]
DAFS Direct Read
[Figure: direct read flow. (1) The client posts a READ_DIRECT request advertising a registered application buffer; (2) the server transfers the data with an RDMA Write directly into that buffer; (3) the server sends a REPLY to complete the operation.]
DAFS Inline Write
[Figure: inline write flow. (1) The client posts a WRITE_INLINE request carrying the data from the application buffer; (2) the server receives the data into a server buffer and performs the write; (3) the server returns a REPLY.]
DAFS Direct Write
[Figure: direct write flow. (1) The client posts a WRITE_DIRECT request advertising a registered application buffer; (2) the server pulls the data with an RDMA Read from that buffer; (3) the server returns a REPLY.]
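
Taken together, the four flows show the trade-off: inline transfers pay a data copy but need only the request/reply messages, while direct transfers are zero-copy but require a registered buffer and an RDMA operation. Below is a minimal, runnable sketch of how a client library might choose between the two models by request size; the threshold value and all function names are illustrative assumptions, not the actual DAFS client internals.

/* Hypothetical sketch: choosing a DAFS transfer model by size.
 * The threshold and every function below are assumptions. */
#include <stdio.h>
#include <stddef.h>

#define INLINE_THRESHOLD 4096   /* assumed cutoff, not from the paper */

/* Inline read: data returns inside the REPLY and the client
 * copies it into the application buffer. */
static int send_inline_read(void *app_buf, size_t len) {
    (void)app_buf;
    printf("READ_INLINE: %zu bytes return inside the REPLY, one copy\n", len);
    return 0;
}

/* Direct read: the request advertises a registered buffer and the
 * server RDMA-writes the data straight into it, zero-copy. */
static int rdma_direct_read(void *reg_buf, size_t len) {
    (void)reg_buf;
    printf("READ_DIRECT: server RDMA Write of %zu bytes, zero-copy\n", len);
    return 0;
}

static int dafs_read(void *buf, size_t len) {
    /* Small transfers: message overhead dominates, inline wins.
     * Large transfers: RDMA amortizes setup and avoids the copy. */
    return len <= INLINE_THRESHOLD ? send_inline_read(buf, len)
                                   : rdma_direct_read(buf, len);
}

int main(void) {
    static char buf[1 << 16];
    dafs_read(buf, 512);       /* small: inline */
    dafs_read(buf, 65536);     /* large: direct */
    return 0;
}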
DAFS-enabled Applications
Three integration styles, from least to most invasive:

Raw Device Adapter
• Kernel-level plug-in; looks like a raw disk
• Stack: Application (unchanged) → disk I/O syscalls → Device Driver → DAFS Library → DAT Provider Library → NIC Driver → RDMA NIC
• App uses standard disk I/O calls
• Very limited access to DAFS features
• Performance similar to direct-attached disk

Kernel File System
• Kernel-level plug-in; a peer to the local FS
• Stack: Application (unchanged) → file I/O syscalls → File System → DAFS Library → DAT Provider Library → NIC Driver → RDMA NIC
• App uses standard file I/O semantics
• Limited access to DAFS features
• Performance similar to the local FS

User Library
• User-level library: Application (modified) → DAFS API calls → DAFS Library → DAT Provider Library → NIC Driver → RDMA NIC; the DAFS and DAT libraries run in user space, with only the NIC driver in the kernel and the RDMA NIC in hardware
• Best performance; full application access to DAFS semantics
• The paper focuses on this style
DAFS API
• File based: exports DAFS semantics
• Designed for highest application performance
• Lowest client CPU requirements of any I/O system
• Rich semantics that meet or exceed local file system capabilities
• Portable and consistent interface and semantics across platforms
  – No need for different mount options, caching policies, client-side SCSI commands, etc.
  – The DAFS API is completely specified in an open standard document, not in OS-specific documentation
• Operating system avoidance
The DAFS API
• Why a new API?
  – Backward compatibility with POSIX is fruitless
    • File descriptor sharing, signals, fork()/exec()
  – Performance
    • RDMA (memory registration), completion groups
  – New semantics
    • Batch I/O, cache hints, named attributes, open with key, delete on last close
  – Portability
    • OS independence and semantic consistency
Key DAFS API Features
• Asynchronous (see the sketch after this list)
  – High-performance interfaces support native asynchronous file I/O
  – Many I/Os can be issued and awaited concurrently
• Memory registration
  – Efficiently prewires application data buffers, permitting RDMA (direct data placement)
• Extended semantics
  – Batch I/O, delete on last close, open with key, cluster fencing, locking primitives
• Flexible completion model
  – Completion groups segregate related I/O
  – Applications can wait on specific requests, any one of a set, or any number of a set
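
A runnable sketch of how these features combine: register a buffer once, issue several asynchronous reads, then wait on their completion group. Every dap_* name below is a hypothetical stand-in in the style of the DAFS API, with stub bodies so the example compiles; the real signatures are defined in the open DAFS API specification.

/* Hypothetical sketch of asynchronous DAFS API usage.
 * All dap_* names and types are illustrative stand-ins. */
#include <stdio.h>
#include <stddef.h>

typedef struct { void *buf; size_t len; } dap_memreg_t;  /* registered memory */
typedef struct { int outstanding; } dap_cg_t;            /* completion group */

/* Prewire an application buffer so the NIC can RDMA into it. */
static dap_memreg_t dap_mem_register(void *buf, size_t len) {
    dap_memreg_t mr = { buf, len };
    return mr;
}

/* Issue an asynchronous read; its completion is reported through
 * the given completion group. (Stub: completes instantly.) */
static void dap_async_read(dap_memreg_t *mr, size_t off, size_t len,
                           dap_cg_t *cg) {
    (void)mr;
    cg->outstanding++;
    printf("issued read of %zu bytes at offset %zu\n", len, off);
}

/* Wait until all requests in the completion group are done. */
static void dap_cg_wait_all(dap_cg_t *cg) {
    printf("waiting on %d completions\n", cg->outstanding);
    cg->outstanding = 0;
}

int main(void) {
    enum { NREQ = 4, BLK = 8192 };
    static char buf[NREQ * BLK];
    dap_memreg_t mr = dap_mem_register(buf, sizeof buf);
    dap_cg_t cg = { 0 };

    /* Many I/Os issued concurrently, awaited together. */
    for (int i = 0; i < NREQ; i++)
        dap_async_read(&mr, (size_t)i * BLK, BLK, &cg);
    dap_cg_wait_all(&cg);
    return 0;
}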
Key DAFS API Features
• Batch I/O (see the sketch after this list)
  – Essentially free I/O: amortizes the cost of I/O issue over many requests
  – Asynchronous notification of any number of completions
  – Scatter/gather file regions and memory regions independently
  – Support for high-latency operations
  – Cache hints
• Security and authentication
  – Credentials for multiple users
  – Varying levels of client authentication: none, default, plaintext password, HOSTKEY, Kerberos V, GSS-API
• Abstraction
  – Server discovery, transient failure and recovery, failover, multipathing
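
To make the batch idea concrete, here is a small runnable sketch of a batch read submission in which the file regions and memory regions of each segment are described independently, so one request carries many I/Os. The structures and the function name are illustrative assumptions, not the DAFS wire format or API.

/* Hypothetical sketch of DAFS-style batch I/O: one submission
 * carries many (file region, memory region) pairs, amortizing
 * per-I/O issue cost. All names are illustrative stand-ins. */
#include <stdio.h>
#include <stddef.h>

struct io_seg {
    size_t file_off;  /* file region to read */
    size_t mem_off;   /* offset into a registered buffer */
    size_t len;       /* regions need not be contiguous or equal-sized */
};

/* Submit all segments as a single batch request (stub: just counts). */
static int batch_read_submit(const struct io_seg *segs, int n) {
    size_t total = 0;
    for (int i = 0; i < n; i++)
        total += segs[i].len;
    printf("one batch request: %d segments, %zu bytes total\n", n, total);
    return 0;
}

int main(void) {
    /* Non-contiguous file regions landing in non-contiguous memory. */
    struct io_seg segs[] = {
        { 0,      0,     4096 },
        { 65536,  8192,  4096 },
        { 131072, 16384, 8192 },
    };
    return batch_read_submit(segs, 3);
}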
Benchmarks
• Microbenchmarks measure throughput and per-operation cost of DAFS versus traditional network I/O
• An application benchmark demonstrates the value of modifying an application to use the DAFS API
Benchmark Configuration
• User-space DAFS library, VI provider
• NetApp F840 server, fully cached workload
  – Adapters (GbE):
    • Intel PRO/1000
    • Emulex GN9000 VI/TCP
  – NFSv3/UDP, DAFS
• Sun 280R client
  – Adapters:
    • Sun “Gem 2.0”
    • Emulex GN9000 VI/TCP
• Point-to-point connections
Microbenchmarks
• Measures read performance
• Kernel NFS client versus user-space DAFS library
• Asynchronous and synchronous I/O
• Throughput versus blocksize
• Throughput versus CPU time
• DAFS advantages are evident:
  – Increased throughput
  – Constant overhead per operation
Microbenchmark Results
Application (GNU gzip)
• Demonstrates the benefit of user-level I/O parallelism
• Read, compress, and write a 550 MB file
• gzip modified to use the DAFS API (see the sketch below)
  – Memory preregistration, asynchronous reads and writes
• 16 KB blocksize
• 1 CPU, 1 process: DAFS advantage
• 2 CPUs, 2 processes: DAFS 2x speedup
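
The pattern behind the modification can be sketched as a double-buffered pipeline: while block i is being compressed and written, the read of block i+1 is already in flight, so I/O overlaps CPU work. The dafs_* calls and compress_block below are hypothetical stand-ins (stubbed so the sketch compiles), not gzip's actual code or the DAFS API.

/* Hypothetical sketch of the gzip-style async pipeline over two
 * preregistered buffers. All functions are illustrative stubs. */
#include <stdio.h>
#include <stddef.h>

#define BLK 16384   /* 16 KB blocksize, as in the benchmark */

static void dafs_async_read(char *buf, size_t off)  { (void)buf; printf("read  @%zu in flight\n", off); }
static void dafs_async_write(char *buf, size_t off) { (void)buf; printf("write @%zu issued\n", off); }
static void dafs_wait(void)                         { /* await outstanding I/O (stub) */ }
static void compress_block(char *buf)               { (void)buf; /* deflate (stub) */ }

int main(void) {
    static char bufs[2][BLK];   /* two preregistered buffers */
    size_t nblocks = 4;

    dafs_async_read(bufs[0], 0);                    /* prime the pipeline */
    for (size_t i = 0; i < nblocks; i++) {
        char *cur = bufs[i % 2];
        dafs_wait();                                /* block i has landed */
        if (i + 1 < nblocks)                        /* overlap next read */
            dafs_async_read(bufs[(i + 1) % 2], (i + 1) * BLK);
        compress_block(cur);                        /* CPU overlaps the read */
        dafs_async_write(cur, i * BLK);             /* output offset illustrative */
    }
    dafs_wait();                                    /* drain final writes */
    return 0;
}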
GNU gzip Runtimes
Conclusion
• The DAFS protocol enables high-performance local file sharing
• The DAFS API leverages the benefits of user-space I/O
• The combination yields significant performance gains for I/O-intensive applications