Federated DAFS: Scalable Cluster-based Direct Access File Servers
Murali Rangarajan, Suresh Gopalakrishnan, Ashok Arumugam, Rabita Sarker
Rutgers University
Liviu Iftode
University of Maryland
SAN-2 Disco Lab
Network File Servers

- OS involvement increases latency & overhead
  - TCP/UDP protocol processing
  - Memory-to-memory copying

[Figure: NFS clients accessing a file server over TCP/IP]
User-level Memory Mapped Communication

- Application has direct access to the network interface
- OS involved only in connection setup, to ensure protection
- Performance benefits: zero-copy, low overhead

[Figure: send/receive path goes directly between the application and the NIC, bypassing the OS on the data path]
Virtual Interface Architecture

- Data transfer from user-space
- Setup & memory registration through the kernel
- Communication models
  - Send/Receive: a pair of descriptor queues
  - Remote DMA: receive operation not required

[Figure: the application posts descriptors to send, receive, and completion queues via the VI Provider Library; setup and memory registration go through the kernel agent to the VI NIC]
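The two VIA communication models above can be sketched as a toy simulation. This is purely illustrative: the class and function names are not the real VIPL API, only a model of the queue semantics (Send/Receive needs a pre-posted receive descriptor; RDMA writes straight into registered remote memory).

```python
from collections import deque

class VI:
    """Toy virtual interface: send/receive descriptor queues plus a completion queue."""
    def __init__(self):
        self.send_queue = deque()
        self.recv_queue = deque()
        self.completions = deque()

def send_receive(sender, receiver, payload):
    # Send/Receive model: the receiver must have pre-posted a receive
    # descriptor naming a buffer; the incoming message is matched to it.
    if not receiver.recv_queue:
        raise RuntimeError("no receive descriptor posted")
    buf = receiver.recv_queue.popleft()
    buf["data"] = payload
    sender.completions.append("send done")
    receiver.completions.append("recv done")

def rdma_write(sender, remote_buf, payload):
    # RDMA model: data lands directly in registered remote memory;
    # no receive descriptor (and no receiver-side operation) is required.
    remote_buf["data"] = payload
    sender.completions.append("rdma done")
```

The asymmetry is the point: RDMA removes the receiver from the data path entirely, which is what DAFS exploits for zero-copy transfers.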
Direct Access File System Model

[Figure: application buffers reach the DAFS client through the file access API; the client uses VIPL (user-level) or KVIPL (kernel) over the VI NIC driver to exchange data directly with the DAFS file server's buffers]
Goal: High-performance DAFS Server

- Cluster-based DAFS server
  - Direct access to network-attached storage distributed across a server cluster
  - Clusters of commodity computers: good performance at low cost
- User-level communication for server clustering
  - Low-overhead mechanism
  - Lightweight protocol for file access across the cluster
Outline

- Portable DAFS client and server implementation
- Clustering DAFS servers: Federated DAFS
- Performance evaluation
User-space DAFS Implementation

- DAFS client and server in user-space
- DAFS API primitives translate to RPCs on the server
- Staged event-driven architecture
- Portable across Linux, FreeBSD, and Solaris

[Figure: a DAFS API request travels from the application through the DAFS client, over the VI network to the DAFS server and its local FS; the response returns along the same path]
DAFS Server

[Figure: client connection requests are handled by a connection manager; DAFS API requests are serviced by a pool of protocol threads, which send responses back to the client]
Client-Server Communication

- VI channel established at client initialization
- VIA Send/Receive used except for dafs_read
- Zero-copy data transfers
  - Emulation of RDMA Read used for dafs_read
  - Scatter/gather I/O used in dafs_write

[Figure: for dafs_read(file, buf), the client sends a request and the server's response lands directly in the client's buffer; for dafs_write(file, buf), the request and data travel together using scatter/gather I/O]
Asynchronous I/O Implementation

- Applications use I/O descriptors to submit asynchronous read/write requests
- Read/write calls return immediately to the application
- Result stored in the I/O descriptor on completion
- Applications use I/O descriptors to wait or poll for completion
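The descriptor-based pattern above can be sketched in a few lines. This is a hedged illustration, not the DAFS client implementation: the names `IODescriptor` and `async_read` are invented, and a thread stands in for the real completion path.

```python
import threading

class IODescriptor:
    """Holds the result of an asynchronous request; filled in on completion."""
    def __init__(self):
        self._done = threading.Event()
        self.result = None

    def complete(self, result):
        # Called by the I/O layer when the operation finishes.
        self.result = result
        self._done.set()

    def wait(self):
        # Block until completion, then return the result.
        self._done.wait()
        return self.result

    def poll(self):
        # Non-blocking completion check.
        return self._done.is_set()

def async_read(data, desc):
    # Returns immediately; a worker completes the descriptor later.
    threading.Thread(target=lambda: desc.complete(data)).start()
```

The application submits the request, keeps working, and later either polls the descriptor or blocks in `wait()`, mirroring the three bullets above.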
Benefits of Clustering

[Figure: three configurations compared — (1) a single DAFS server with one local FS serving all clients; (2) standalone DAFS servers on a cluster, each client bound to one server's files; (3) clustered DAFS servers, where a clustering layer between each DAFS server and its local FS lets any server reach any file]
Clustering DAFS Servers Using FedFS

- Federated File System (FedFS)
  - Federation of local file systems on cluster nodes
- Extends the benefits of DAFS to cluster-based servers
- Low-overhead protocol over SAN

[Figure: client file I/O arrives at any DAFS server; the servers cooperate through FedFS over the SAN]
FedFS Goals

- Global name space across the cluster
  - Created dynamically for each distributed application
- Load balancing
- Dynamic reconfiguration
Virtual Directory (VD)

- Union of all local directories with the same pathname
- Each VD is mapped to a manager node
  - Determined using a hash function on the pathname
- Manager constructs and maintains the VD

[Figure: /usr on one node holds file1 and on another holds file2; the virtual directory /usr presents their union, file1 and file2]
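The pathname-to-manager mapping can be sketched as follows. The actual hash function FedFS uses is not specified here; a stable digest taken modulo the cluster size is one plausible choice, shown purely for illustration.

```python
import hashlib

def manager_node(pathname: str, num_nodes: int) -> int:
    """Map a pathname to the index of its manager node.

    Uses a stable digest (MD5 here, as an assumption) so that every
    node in the cluster computes the same mapping independently.
    """
    digest = hashlib.md5(pathname.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes
```

Because the mapping is deterministic and computed from the pathname alone, any node can locate the manager of "/usr" without any communication, which is what makes the scheme cheap over the SAN.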
Constructing a VD

- Constructed on first access to the directory
- Manager performs a dirmerge to merge the real directory info on cluster nodes into a VD
  - Summary of real directory info is generated and exchanged at initialization
- Cached in memory and updated on directory-modifying operations
File Access in FedFS

- Each file is mapped to a manager
  - Determined using a hash on the pathname
  - The manager maintains information about the file
- Ask the manager for the location (home) of the file
- Access the file from its home

[Figure: a server asks manager(f1) for the location of file f1, then accesses f1 from home(f1) over the VI network]
Optimizing File Access

- Directory Table (DT) to cache file information
  - File information cached after first lookup
  - Cache of the name space distributed across the cluster
- Block-level in-memory data cache
  - Data blocks cached on first access
  - LRU replacement
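A minimal sketch of the block-level LRU cache described above. The keying scheme, capacity unit, and class name are assumptions for illustration; the point is the recency-ordered map with eviction from the cold end.

```python
from collections import OrderedDict

class BlockCache:
    """Block-level in-memory data cache with LRU replacement (sketch)."""
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # (path, block_no) -> bytes, LRU-first

    def get(self, path, block_no):
        key = (path, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        return None  # miss: the caller fetches the block from its home node

    def put(self, path, block_no, data):
        key = (path, block_no)
        self.blocks[key] = data
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
```

On a miss, the server fetches the block from the file's home over the SAN and inserts it, so repeated remote reads of hot blocks are served locally.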
Communication in FedFS

- Two VI channels between any pair of server nodes
  - Send/Receive for request/response
  - RDMA exclusively for data transfer
- Descriptors and buffers registered at initialization

[Figure: between two server nodes, one channel carries Send/Receive request/response traffic; the other carries RDMA responses with data placed directly into the destination buffer]
Performance Evaluation

[Figure: multiple clients issue DAFS requests over the VI network to a cluster of DAFS servers federated through FedFS]
Experimental Platform

- Eight-node server cluster
  - 800 MHz PIII, 512 MB SDRAM, 9 GB 10K RPM SCSI
- Clients
  - Dual processor (300 MHz PII), 512 MB SDRAM
- Linux 2.4 on servers and clients
- Servers and clients equipped with Emulex cLAN adapters
  - 32-port Emulex switch in full-bandwidth configuration
SAN Performance Characteristics

- VIA latency and bandwidth
  - poll/wait used for latency/bandwidth measurements respectively

Packet Size (Bytes) | Roundtrip Latency (µs) | Bandwidth (MB/s)
256                 | 23.3                   | 56
512                 | 27.3                   | 85
1024                | 36.9                   | 108
2048                | 56.0                   | 109
4096                | 91.2                   | 110
Workloads

- Postmark: synthetic benchmark
  - Short-lived small files
  - Mix of metadata-intensive operations
- Benchmark outline
  - Create a pool of files
  - Perform transactions: READ/WRITE paired with CREATE/DELETE
  - Delete created files
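The benchmark outline above can be rendered as a toy loop. This is a sketch of the Postmark structure only; the pool size, transaction count, file size, and CREATE/DELETE mix below are placeholders, not the parameters used in the evaluation.

```python
import os
import random
import tempfile

def postmark(pool_size=20, transactions=50, file_size=1024):
    """Toy Postmark: create a file pool, run READ+CREATE/DELETE transactions, clean up."""
    root = tempfile.mkdtemp()
    files, next_id = [], 0

    def create():
        # CREATE = open, write, close
        nonlocal next_id
        path = os.path.join(root, f"f{next_id}")
        next_id += 1
        with open(path, "wb") as f:
            f.write(os.urandom(file_size))
        files.append(path)

    for _ in range(pool_size):          # phase 1: create a pool of files
        create()
    for _ in range(transactions):       # phase 2: transactions
        with open(random.choice(files), "rb") as f:  # READ = open, read, close
            f.read()
        if len(files) > 1 and random.random() < 0.5:
            os.unlink(files.pop(random.randrange(len(files))))  # DELETE = unlink
        else:
            create()
    remaining = len(files)
    for path in files:                  # phase 3: delete created files
        os.unlink(path)
    os.rmdir(root)
    return remaining
```

The mix of small creates, reads, and unlinks is what makes the workload metadata-intensive: most of the cost is in open/close and namespace operations rather than bulk data transfer.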
Workload Details

- Each client performs 30,000 transactions
- Each transaction: READ paired with CREATE/DELETE
  - READ = open, read, close
  - CREATE = open, write, close
  - DELETE = unlink
- Multiple clients used to reach maximum throughput
- Clients distribute requests to servers using a hash function on pathnames
Base Case (Single Server)

- Maximum throughput: 5075 transactions/second
- Average time per transaction
  - On client: ~200 µs
  - On server: ~100 µs
Postmark Throughput

[Figure: Postmark throughput (txns/sec) vs. number of servers (1 to 8), for file sizes of 2 K, 4 K, 8 K, and 16 K; throughput scales with cluster size up to roughly 30,000 txns/sec]

# Servers | 2    | 4 | 8
Speedup   | 1.75 | 3 | 5
FedFS Overheads

- Files are physically placed on the node that receives the client requests
- Only metadata operations may involve communication
  - first open(file)
  - delete(file)
- Observed communication overhead
  - Average of one roundtrip message among servers per transaction
Other Workloads

- No client request sent to the file's correct location
  - All files created outside Federated DAFS
  - Only READ operations (open, read, close)
- Potential increase in communication overhead
  - Optimized coherence protocol minimizes communication: avoids communication at open and close in the common case
  - Data caching reduces the frequency of communication for remote data access
Postmark Read Throughput

- Each transaction = READ

[Figure: Postmark read throughput (txns/sec) for 2 and 4 servers, comparing Federated DAFS against Federated DAFS with no cache]
Communication Overhead Without Caching

- Without caching, each read results in a remote fetch
- Each remote fetch costs ~65 µs
  - request message (< 256 B) + response message (4096 B)

# Servers | # Clients for Max. Throughput | # Transactions | # Remote Reads on Each Server
2         | 10                            | 300,000        | 150,000
4         | 20                            | 600,000        | 150,000
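The per-server cost of these remote fetches can be sanity-checked with back-of-the-envelope arithmetic, taking the slide's figures at face value (150,000 remote reads per server, ~65 µs per fetch):

```python
# Rough per-server communication time in the no-cache runs,
# using the numbers from the table above.
remote_reads_per_server = 150_000
fetch_cost_us = 65  # ~65 microseconds per remote fetch (request + 4 KB response)

overhead_seconds = remote_reads_per_server * fetch_cost_us / 1e6
# about 9.75 seconds of communication time per server
```

This is why the data cache matters: once hot blocks are cached locally, most of this remote-fetch time disappears from the critical path.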
Work in Progress

- Study other application workloads
- Optimized coherence protocols to minimize communication in Federated DAFS
- File migration
  - Alleviate performance degradation from communication overheads
  - Balance load
- Dynamic reconfiguration of the cluster
- Study DAFS over a wide area network
Conclusions

- Efficient user-level DAFS implementation
- Low-overhead user-level communication used to provide a lightweight clustering protocol (FedFS)
- Federated DAFS minimizes overheads by reducing communication among server nodes in the cluster
- Speedups of 3 on 4-node and 5 on 8-node clusters demonstrated using Federated DAFS

Thanks

Distributed Computing Laboratory
http://discolab.rutgers.edu
DAFS Performance

[Figure: Postmark throughput (txns/sec) vs. number of servers for 4 K files]