EFFECTIVENESS OF ZFS FILE SYSTEM
Improving Performance of a Distributed File System Using OSDs and Cooperative Cache
Submitted By: Parvez Gupta
Varenya Agrawal
Introduction
This work describes a cooperative cache algorithm used in zFS and explores the effectiveness of this algorithm and of zFS as a file system
This is done by comparing the system’s performance to NFS using the IOZONE benchmark
Results show that :
• zFS performs better than NFS when cooperative cache is activated
• Using pre-fetching in zFS also increases performance significantly
zFS
It is a distributed file system that uses Object Store Devices (OSD) and a set of cooperating machines
The objectives of zFS design are :
• Achieve a scalable file system
• Build it from off-the-shelf components
• Make use of the memory of all participating machines
• Increase performance linearly with each added machine
• Separate storage management from file management
The Architecture
zFS has six components :
• Front End (FE)
• Cooperative Cache (Cache)
• File Manager (FMGR)
• Lease Manager (LMGR)
• Transaction Server (TSVR)
• Object Store (ObS)
The Components
Object Store
• It is the storage device on which files and directories are created and from where they are retrieved
• It handles the physical disk chores of block allocation and mapping
• ObS API enables creation and deletion of objects (files)
Front End
• Runs on every workstation on which a client wants to use zFS
• Provides access to zFS files and directories
Lease Manager
• Leases are used to maintain data integrity in zFS
• They have an expiration period that is set in advance
• Each ObS has one lease manager, which acquires the major lease
• It grants exclusive leases on objects residing on the ObS
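The lease behaviour described above (a fixed expiration period, exclusive leases per object) can be sketched as follows. This is a minimal illustration, not zFS code: the names `Lease`, `LeaseManager`, `acquire` and the explicit `now` parameter are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str
    granted_at: float
    duration: float  # expiration period, fixed in advance

    def is_valid(self, now: float) -> bool:
        # A lease is valid only until its expiration period has elapsed.
        return now < self.granted_at + self.duration


class LeaseManager:
    """Hypothetical per-ObS lease manager granting exclusive object leases."""

    def __init__(self, duration: float):
        self.duration = duration
        self.leases = {}  # obj_id -> Lease

    def acquire(self, obj_id: str, holder: str, now: float) -> bool:
        current = self.leases.get(obj_id)
        # Refuse while another holder still has an unexpired lease.
        if current and current.is_valid(now) and current.holder != holder:
            return False
        self.leases[obj_id] = Lease(holder, now, self.duration)
        return True
```

Because every lease expires on its own, a holder that crashes cannot block an object forever; a second requester simply succeeds once the period has elapsed.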
File Manager
• Each zFS file is managed by a single file manager
• It obtains the exclusive lease from the lease manager
• It keeps track of each completed open() and read() request
Cooperative Cache
• Because network connections are fast, it takes less time to retrieve data from another machine's memory than from a local disk
Transaction Server
• Each directory operation is protected inside a transaction
• It helps maintain consistency of the file system
• It acquires all required leases and holds onto them for as long as it can
The Cooperative Cache
It is integrated with the Linux kernel cache because :
• The OS then does not require two separate caches with different policies that may interfere
• This provides comparable local performance between zFS and other local file systems in Linux
As a result, the following is achieved :
• The kernel invokes page eviction when available memory is low
• Caching is done on a per-page basis, not on whole files
• Pages of zFS and other file systems are treated equally
• Pages remain in the cache until memory pressure causes the kernel to discard them
• When eviction is invoked and a zFS page is the candidate, the decision is passed to a zFS routine
Cooperative cache algorithm
A page in the cooperative cache is either a singlet or replicated
When a client A wants to open a file for reading :
• The local cache is checked for the page
• In case of a cache miss, zFS requests the page and its lease from the file manager
• The file manager checks whether the requested pages are already present in another machine's memory in the network
• If not, zFS grants the leases to the client, which in turn reads the pages directly from the OSD, marking each page as a singlet
• If the requested pages reside in the memory of some other node B, the file manager sends a message to B to send the pages and leases to A
• Both A and B mark the pages as replicated. Node B is called a third-party node
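The read path above can be sketched in a few lines. This is a simplified model under stated assumptions: leases are omitted, the OSD is a plain dictionary, and the names `FileManager`, `Node`, `request` and `read` are hypothetical, not taken from zFS.

```python
class FileManager:
    """Tracks which nodes hold each page (hypothetical bookkeeping)."""

    def __init__(self):
        self.holders = {}  # page_id -> list of node names

    def request(self, page_id, requester, nodes, osd):
        holders = self.holders.setdefault(page_id, [])
        if not holders:
            # No cached copy anywhere: the client reads from the OSD
            # directly and marks the page as a singlet.
            nodes[requester].cache[page_id] = (osd[page_id], "singlet")
        else:
            # Some node B already holds the page: B sends it to the
            # requester, and both copies are marked replicated.
            provider = holders[0]
            data, _ = nodes[provider].cache[page_id]
            nodes[provider].cache[page_id] = (data, "replicated")
            nodes[requester].cache[page_id] = (data, "replicated")
        holders.append(requester)


class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # page_id -> (data, state)

    def read(self, page_id, fmgr, nodes, osd):
        if page_id not in self.cache:  # local cache miss
            fmgr.request(page_id, self.name, nodes, osd)
        return self.cache[page_id][0]
```

In this sketch the first reader of a page ends up with a singlet, and a second reader turns both copies into replicated pages without touching the OSD again.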
When memory becomes scarce on a node A, the kernel invokes page eviction. If the candidate is a zFS page :
• If the page is replicated, it is discarded
• If the page is a singlet, it is forwarded to another node using the following steps :
1. A message is sent to the zFS file manager indicating that the page is being sent to another machine B, the node with the largest free memory known to A
2. The page is forwarded to B
3. The page is discarded from the page cache of A
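The eviction steps can be sketched as below. This is an illustration only: it assumes a replicated page is simply dropped (the slide does not spell out that branch), and the function name `evict_page`, the `free_memory` table and the `("moved", ...)` message tuple are hypothetical.

```python
def evict_page(page_id, a, caches, free_memory, fmgr_messages):
    """Evict page_id from node a.

    caches: node name -> {page_id: (data, state)}
    free_memory: node name -> free memory as known to node a
    fmgr_messages: list standing in for messages sent to the file manager
    Returns the node the page was forwarded to, or None if it was discarded.
    """
    data, state = caches[a][page_id]
    if state == "replicated":
        # Another copy exists elsewhere, so the page is just discarded.
        del caches[a][page_id]
        return None
    # Singlet: choose B, the node with the largest free memory known to A.
    b = max((n for n in free_memory if n != a), key=free_memory.get)
    fmgr_messages.append(("moved", page_id, a, b))  # step 1: notify file manager
    caches[b][page_id] = (data, "singlet")          # step 2: forward page to B
    del caches[a][page_id]                          # step 3: discard from A
    return b
```

Notifying the file manager before forwarding means a crash in between leaves the file manager believing in a page that is not there, never the reverse.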
Effects of Node Failure and Network Delays
Node Failure :
• It is acceptable for the file manager to assume that pages exist on a node when they do not
• It is unacceptable for pages to exist on a node without the file manager being aware of them
• Thus the order of the steps for forwarding a singlet page is important
– Node failure before step 1 : the file manager will eventually detect this and update its data
– Node failure after step 1 : the file manager has been informed that the page is on B although it is not. Same situation as 1
– Failure after step 2 : does not pose any problem
Network Delays :
Case 1 :
• A replicated page residing on nodes M and N is discarded from M
– The zFS file manager sends a singlet message to N
– Due to network delay, this message reaches N only after memory pressure has developed on N, which has already discarded the page because it was still marked as replicated
Case 2 :
• A singlet message arrived at N before the page itself and was ignored. N then sent a reject message when asked to forward the page
• No problem arises if the page never arrives
• However, if the page arrives after the reject message has been sent, it causes an inconsistency
Case 3 :
Case 4 :
• The page was moved from N to M to O, where its recirculation count exceeded its limit
• O sends a release_lease message, which arrives before the move notification
Choosing proper third party node
• The zFS FMGR uses an enhanced round-robin method
• For each page range granted to node N, the FMGR records the time t(N)
• For every request, the FMGR scans all nodes holding the page range
• For each selected node Ni, the FMGR checks whether currentTime - t(Ni) > C. This checks whether enough time has passed for the pages granted to Ni to reach it
• If true, Ni is marked as a potential provider; the next node is checked
• Among the marked nodes, the node with the largest range, Nmax, is chosen
• For the next request, the FMGR starts its scan from node Nmax+1
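The selection procedure above can be sketched as a single function. The function name `choose_provider`, the tuple layout of `holders` and the tie-breaking rule (the first marked node in scan order wins) are assumptions made for this example.

```python
def choose_provider(holders, start, now, c):
    """Pick a third-party node for a page range.

    holders: list of (node, grant_time, range_len) for nodes holding the range
    start: index at which the round-robin scan begins
    now, c: current time and the delay threshold C
    Returns (node, next_start), or (None, start) if no node qualifies.
    """
    n = len(holders)
    best = None
    for step in range(n):
        i = (start + step) % n
        _, granted_at, range_len = holders[i]
        # Mark Ni only if enough time has passed for its pages to arrive.
        if now - granted_at > c:
            if best is None or range_len > holders[best][2]:
                best = i
    if best is None:
        return None, start
    # The next request starts scanning just after the chosen node.
    return holders[best][0], (best + 1) % n
```

For example, with holders `[("N0", 0.0, 4), ("N1", 0.0, 8), ("N2", 9.5, 16)]`, `now=10.0` and `c=1.0`, node N2 is skipped because its grant is too recent, N1 is chosen for its larger range, and the next scan starts at N2.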
Pre-fetching data in zFS
• The overhead for transmitting a data block over a network is composed of two parts :
– The network setup overhead
– The transmission time of the data block
• It is more efficient to transmit k pages in one message than to transmit each page in a separate message
– Researchers tested the time it takes to transmit a file of N pages in chunks of 1...k pages per message
– Best results were achieved for k=4 and k=8
– Similar performance was achieved by the zFS pre-fetching mechanism
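The two-part overhead suggests a simple cost model: each message pays the setup cost once, and each page pays its transmission time. The sketch below is only that model, with made-up cost values; it captures the amortised setup cost but not whatever limited the gains beyond k=8 in the measurements. The function name `transfer_time` and its parameters are hypothetical.

```python
import math


def transfer_time(n_pages, k, setup, per_page):
    """Estimated time to send n_pages in messages of up to k pages each.

    setup: network setup overhead paid once per message (assumed constant)
    per_page: transmission time of a single page (assumed constant)
    """
    messages = math.ceil(n_pages / k)
    return messages * setup + n_pages * per_page


# Illustrative values only: 16 pages, setup 1.0, per-page time 0.5.
times = {k: transfer_time(16, k, setup=1.0, per_page=0.5) for k in (1, 4, 8)}
```

With these values the estimate falls from 24.0 (k=1) to 12.0 (k=4) and 10.0 (k=8), showing how the setup overhead is spread over more pages as k grows.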
zFS Testing environment
The Server PC ran an OSD simulator
Another PC ran the Lease Manager, File Manager and Transaction Server
Four PCs ran the zFS front-end
NFS Testing environment
The Server PC ran an NFS server with eight NFS daemons (nfsd)
Four PCs ran the NFS clients
Methodology Used
• IOZONE benchmark tool was used to compare zFS’ performance to that of NFS
• NFS does not carry out pre-fetching; to compensate, IOZONE was configured to read the NFS-mounted file using record sizes of n=1,4,8,16 pages
• zFS-mounted files were read with a record size of one page but with pre-fetching parameter R=1,4,8,16 pages
Comparing zFS and NFS
Two scenarios were investigated during testing :
• The file size is smaller than the server's cache, so all the data resides in the server’s cache
• The file size is much larger than the server’s cache
Results for scenario I
Results for scenario II
Observations
• The performance of NFS was almost the same for different block sizes
• However, its performance was almost four times better when the file fit entirely in memory
• The performance of zFS with cooperative cache was much better than that of NFS
• When cooperative cache was deactivated, different behaviors were observed for different page ranges
Observations
• The performance of zFS for R=1 was lower than that of NFS
• For larger ranges, the performance of zFS was slightly better than that of NFS due to pre-fetching
• When cooperative cache was used, zFS performance was significantly better than that of NFS
• Performance with cooperative cache was lower in the second scenario due to memory pressure and discarded pages generating reject messages
Conclusion
• The results show that using the caches of all the clients as one cooperative cache gives better performance than both NFS and zFS without cooperative cache
• The results also show that using pre-fetching with ranges of four and eight pages results in much better performance