Jinyong Yoon, 2010. 10. 18.

Scale and Performance in a Distributed File System



Outline

▪ Andrew File System
▪ The Prototype
▪ Changes for Performance
▪ Effect of Changes for Performance
▪ Comparison with a Remote-Open File System
▪ Conclusion

Andrew File System

▪ Developed at Carnegie Mellon University
▪ A distributed file system designed with scale as a primary consideration
▪ Exploits locality of file references
▪ Presents a homogeneous, location-transparent file name space to all client workstations
▪ Uses 4.2BSD Unix

Servers
▪ A set of trusted servers, collectively called Vice

Clients
▪ A user-level process on each workstation, called Venus
▪ Hooks (intercepts) file system calls
▪ Contacts the servers only on opens and closes, transferring whole files
▪ Caches files from Vice
▪ Stores modified copies of files back on the servers
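The open/close-only interaction described above can be illustrated with a minimal Python sketch. `ToyVice` and `ToyVenus` are hypothetical names, the server is just an in-memory dictionary, and cache validation is deliberately left out; the point is only that reads and writes touch the local copy, while the server is contacted when a file is opened (if it is not cached) and when a modified file is closed.

```python
class ToyVice:
    """Hypothetical in-memory stand-in for a Vice server: pathname -> file contents."""

    def __init__(self):
        self.files = {}
        self.calls = 0  # number of client-server interactions

    def fetch(self, path):
        self.calls += 1
        return self.files[path]

    def store(self, path, data):
        self.calls += 1
        self.files[path] = data


class ToyVenus:
    """Hypothetical cache manager that talks to the server only on open and close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # pathname -> bytearray holding the whole file
        self.dirty = set()  # pathnames modified since they were opened

    def open(self, path):
        # Whole-file transfer: fetch the entire file if it is not cached yet.
        if path not in self.cache:
            self.cache[path] = bytearray(self.server.fetch(path))
        return path  # the pathname doubles as a file handle in this sketch

    def read(self, path, offset, size):
        # Served from the local copy; no server interaction.
        return bytes(self.cache[path][offset:offset + size])

    def write(self, path, offset, data):
        # Modifies only the local copy, growing it if the write extends the file.
        buf = self.cache[path]
        if len(buf) < offset + len(data):
            buf.extend(b"\0" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data
        self.dirty.add(path)

    def close(self, path):
        # A modified copy is stored back on the server when the file is closed.
        if path in self.dirty:
            self.server.store(path, bytes(self.cache[path]))
            self.dirty.discard(path)
```

Opening a file, reading and overwriting part of it, and closing it costs two server interactions in this sketch (one fetch, one store), no matter how many reads and writes happen in between; the server copy only changes at close.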

[Figure: Andrew File System architecture. Each client workstation runs a user program and Venus above the Unix kernel and a local disk; each server runs Vice above the Unix kernel and a disk; workstations and servers are connected by a network.]

The Prototype

▪ Venus on each client communicated with a dedicated, persistent process on the server
▪ Each server stored a directory hierarchy mirroring the structure of the Vice files
  ▪ .admin directories: Vice file status information
  ▪ Stub directories: the location database
▪ The Vice-Venus interface named files by their full pathname; there was no notion of a low-level name such as an inode
▪ Before using a cached file, Venus verified its timestamp with the server
▪ Each open of a file therefore resulted in at least one interaction with a server, even if the file was already in the cache and up to date
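The cost of the prototype's validate-on-open rule can be made concrete with a small Python sketch. `TimestampServer` and `PrototypeVenus` are hypothetical, files are keyed by full pathname as in the prototype's Vice-Venus interface, and a plain integer stands in for the timestamp. Every open of a cached file still costs one request.

```python
class TimestampServer:
    """Hypothetical server keeping (timestamp, data) for each full pathname."""

    def __init__(self):
        self.files = {}    # pathname -> (timestamp, data)
        self.requests = 0  # count of client-server interactions

    def get_timestamp(self, path):
        self.requests += 1
        return self.files[path][0]

    def fetch(self, path):
        self.requests += 1
        return self.files[path]


class PrototypeVenus:
    """Sketch of prototype-style caching: validate the timestamp on every open."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # pathname -> (timestamp, data)

    def open(self, path):
        cached = self.cache.get(path)
        # Even a cache hit needs one request to compare timestamps.
        if cached is not None and cached[0] == self.server.get_timestamp(path):
            return cached[1]
        # Not cached, or stale: fetch the whole file again.
        self.cache[path] = self.server.fetch(path)
        return self.cache[path][1]
```

Ten opens of an unchanged file cost ten requests here: one fetch followed by nine timestamp checks. A stale cached copy costs two requests (check, then fetch).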

stat primitive
▪ Used to test for the presence of files and to obtain status information before opening them
▪ Each stat call involved a cache validity check
▪ This increased total running time and the load on servers

Dedicated process per client
▪ Excessive context-switching overhead
▪ Critical resource limits were exceeded
▪ High virtual memory paging demands

Remote Procedure Call (RPC)
▪ Simplified the implementation
▪ But caused network-related resources in the kernel to be exceeded

Location database
▪ Made it difficult to move users' directories between servers

Other
▪ Programs could use Vice files without recompilation or relinking

Benchmark
▪ A command script that operates on a collection of files
▪ 70 files (the source code of an application program), about 200 KB in total
▪ Stand-alone benchmark with 5 phases: MakeDir, Copy, ScanDir, ReadAll, Make

Skewed distribution of Vice calls
▪ Dominated by two calls:
  ▪ TestAuth: validates cache entries
  ▪ GetFileStat: obtains status information about files absent from the cache

Load unit
▪ The load placed on a server by a single client workstation running this benchmark
▪ One load unit corresponds to roughly five Andrew users

CPU/disk utilization profiling
▪ The performance bottleneck is the server CPU, due to:
  ▪ Frequent context switches
  ▪ Time spent by the servers traversing full pathnames

Changes for Performance

Cache management
Previous:
▪ Status cache (in virtual memory) and data cache (on local disk)
▪ Only opening and closing operations are intercepted
▪ Modifications to a cached file are reflected back to Vice when the file is closed

Callback: the server promises to notify the client before allowing a modification
▪ This reduces cache validation traffic
▪ Both server and client must maintain callback state information
▪ There is a potential for inconsistency
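A minimal sketch of callbacks, assuming hypothetical `CallbackServer` and `CallbackVenus` classes and direct method calls in place of RPC. The server records which clients hold a callback on each file and breaks those callbacks before accepting a store from another client; a client holding a callback opens its cached copy without contacting the server. Crashes and lost messages are not modeled, and that is one place where the potential for inconsistency noted above would show up.

```python
class CallbackServer:
    """Hypothetical Vice-like server that records callback promises per file."""

    def __init__(self):
        self.files = {}      # pathname -> data
        self.callbacks = {}  # pathname -> set of clients holding a callback
        self.requests = 0

    def fetch(self, client, path):
        self.requests += 1
        # Promise to notify this client before the file is modified.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, client, path, data):
        self.requests += 1
        # Break callbacks held by every other client before accepting the update.
        for other in self.callbacks.get(path, set()) - {client}:
            other.break_callback(path)
        self.callbacks[path] = {client}
        self.files[path] = data


class CallbackVenus:
    """Sketch of a client that trusts its cache while it holds a callback."""

    def __init__(self, server):
        self.server = server
        self.cache = {}           # pathname -> data
        self.has_callback = set()

    def break_callback(self, path):
        # Invoked by the server: the cached copy may now be stale.
        self.has_callback.discard(path)

    def open(self, path):
        if path in self.cache and path in self.has_callback:
            return self.cache[path]  # no server interaction needed
        self.cache[path] = self.server.fetch(self, path)
        self.has_callback.add(path)
        return self.cache[path]

    def close_after_write(self, path, data):
        # Store the modified file on close; the writer keeps its callback.
        self.cache[path] = data
        self.server.store(self, path, data)
        self.has_callback.add(path)
```

With a callback in place, ten opens of the same file by one client cost a single server request; after another client stores a new version, the first client's next open refetches it.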

Name resolution
Previous:
▪ inode: a unique, fixed-length name
▪ pathname: one or more per file, variable-length
▪ namei routine: maps a pathname to an inode
▪ Each Vice pathname involved an implicit namei operation on the server
▪ This caused CPU overhead on the servers

fid: a unique, fixed-length, two-level name
▪ Venus maps each component of a pathname to a fid
▪ Three 32-bit fields: Volume number, Vnode number, Uniquifier
  ▪ Volume number: identifies a volume on one server
  ▪ Vnode number: index into a file storage information array
  ▪ Uniquifier: allows reuse of Vnode numbers
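The fid layout can be sketched as three unsigned 32-bit fields. Packing them in network byte order and the hypothetical `Volume` slot table below are illustrative assumptions, not the actual Vice wire or disk format; the sketch only shows why a uniquifier lets a vnode number be reused safely, since a stale fid still names the old slot but carries an outdated uniquifier.

```python
import struct
from typing import NamedTuple


class Fid(NamedTuple):
    """A fid as three unsigned 32-bit fields (96 bits in total)."""
    volume: int      # identifies a volume
    vnode: int       # index into the volume's file storage information array
    uniquifier: int  # distinguishes successive uses of the same vnode slot

    def pack(self) -> bytes:
        # Assumed encoding: network byte order, three unsigned 32-bit integers.
        return struct.pack("!III", self.volume, self.vnode, self.uniquifier)

    @classmethod
    def unpack(cls, raw: bytes) -> "Fid":
        return cls(*struct.unpack("!III", raw))


class Volume:
    """Hypothetical volume whose vnode slots can be freed and reused."""

    def __init__(self, number: int):
        self.number = number
        self.slots = []  # vnode number -> [uniquifier, data or None]

    def create(self, data: bytes) -> Fid:
        for vnode, slot in enumerate(self.slots):
            if slot[1] is None:  # reuse a free slot...
                slot[0] += 1     # ...with a fresh uniquifier
                slot[1] = data
                return Fid(self.number, vnode, slot[0])
        self.slots.append([1, data])
        return Fid(self.number, len(self.slots) - 1, 1)

    def delete(self, fid: Fid) -> None:
        self.slots[fid.vnode][1] = None

    def lookup(self, fid: Fid):
        uniquifier, data = self.slots[fid.vnode]
        # A stale fid points at the right slot but has an old uniquifier.
        return data if uniquifier == fid.uniquifier else None
```

Deleting a file and creating another can hand out the same vnode number, but the two fids differ in their uniquifier, so a lookup with the old fid returns `None` instead of the new file's data.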

Communication and server process structure
▪ Uses Lightweight Processes (LWPs) within a single server process instead of a dedicated process per client
▪ An LWP is bound to a particular client only for the duration of a single server operation

Low-level storage representation
▪ Files are accessed by their inodes rather than by pathnames
  ▪ On the servers: a file's vnode information identifies the inode storing its data
  ▪ On the clients: cached files are accessed through their local inodes
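The process structure above (one server process, a few workers, many clients) can be sketched with Python threads from a `ThreadPoolExecutor` standing in for LWPs. This is an assumption for illustration only: the real LWP package is a separate C library with its own scheduling, and `handle_request` and `serve` are hypothetical names. What the sketch shows is that a worker is tied to a client only while it handles one request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(client_id: int, op: str) -> tuple:
    # Runs on whichever pooled thread is free; that thread is bound to this
    # client only until the operation returns.
    return client_id, op, threading.current_thread().name


def serve(requests, pool_size=4):
    """Serve (client_id, op) requests from many clients with a small shared pool."""
    with ThreadPoolExecutor(max_workers=pool_size, thread_name_prefix="lwp") as pool:
        futures = [pool.submit(handle_request, c, op) for c, op in requests]
        return [f.result() for f in futures]
```

Serving 100 requests from 50 clients this way uses at most four worker threads, whereas a process-per-client design would need 50 processes and the context switching that comes with them.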

[Figure: Client-side file access. A user program's Unix file system calls pass through the Unix kernel; local files are served by the Unix file system on the local disk, and non-local file operations are handled by Venus.]

For each directory D in a pathname, Venus distinguishes three cases:
▪ D is in the cache and has a callback on it
▪ D is in the cache but has no callback on it
▪ D is not in the cache
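The three cases can be sketched as component-by-component pathname resolution on the client. The actions follow the original paper's description as we understand it: case 1 uses the cached directory with no network traffic; case 2 contacts the server, fetches a new copy only if the directory changed, and re-establishes a callback; case 3 fetches the directory and establishes a callback. `DirServer`, `ResolvingVenus`, the integer version numbers, and the string fids are hypothetical simplifications, and server-side callback bookkeeping is omitted.

```python
class DirServer:
    """Hypothetical server holding directories as fid -> (version, {name: fid})."""

    def __init__(self, dirs):
        self.dirs = dirs
        self.requests = 0

    def fetch(self, fid):
        self.requests += 1
        version, entries = self.dirs[fid]
        return version, dict(entries)  # the client gets its own copy

    def get_version(self, fid):
        self.requests += 1
        return self.dirs[fid][0]


class ResolvingVenus:
    """Sketch of client-side, component-by-component pathname resolution."""

    def __init__(self, server):
        self.server = server
        self.cache = {}         # directory fid -> (version, entries)
        self.callbacks = set()  # directory fids with a valid callback

    def break_callback(self, fid):
        self.callbacks.discard(fid)

    def _directory(self, fid):
        if fid in self.cache and fid in self.callbacks:
            pass  # Case 1: cached with a callback, no network traffic
        elif fid in self.cache:
            # Case 2: cached without a callback; refetch only if it changed.
            if self.server.get_version(fid) != self.cache[fid][0]:
                self.cache[fid] = self.server.fetch(fid)
            self.callbacks.add(fid)
        else:
            # Case 3: not cached; fetch it and establish a callback.
            self.cache[fid] = self.server.fetch(fid)
            self.callbacks.add(fid)
        return self.cache[fid][1]

    def resolve(self, root_fid, path):
        """Map a pathname such as '/usr/alice/notes.txt' to the fid of its last component."""
        fid = root_fid
        for name in path.strip("/").split("/"):
            fid = self._directory(fid)[name]
        return fid
```

Resolving a three-level path the first time costs three fetches (case 3 for every directory); repeating it costs nothing (case 1); after a callback break, it costs one version check, plus one fetch if the directory actually changed (case 2).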

Effect of Changes for Performance

Scalability
▪ On the benchmark, the revised Andrew is 19% slower than a stand-alone workstation
▪ The prototype was 70% slower

[Figure: Scalability]

Comparison with a Remote-Open File System

Remote open
▪ The data in a file are not fetched en masse
▪ Instead, the remote site potentially participates in each individual read and write operation
▪ The file is actually opened on the remote site rather than on the local site
▪ Example: NFS
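The difference in server traffic can be sketched by reading a file through a hypothetical remote-open server (`RemoteOpenServer`, with an assumed 4 KB block size and hypothetical helper `read_whole_file_remote_open`). Every read is a separate request handled at the remote site, so the request count grows with file size, whereas whole-file transfer needs a single fetch on open. In exchange, the first block is available without waiting for the whole file to arrive.

```python
BLOCK = 4096  # assumed size of each remote read, in bytes


class RemoteOpenServer:
    """Hypothetical remote-open server: each operation is a separate request."""

    def __init__(self, files):
        self.files = files  # pathname -> bytes
        self.requests = 0

    def open(self, path):
        self.requests += 1
        return path  # the file is opened on the server, not locally

    def read(self, handle, offset, size):
        self.requests += 1
        return self.files[handle][offset:offset + size]

    def close(self, handle):
        self.requests += 1


def read_whole_file_remote_open(server, path):
    """Read a file block by block; every read is a round trip to the server."""
    handle = server.open(path)
    chunks, offset = [], 0
    while True:
        chunk = server.read(handle, offset, BLOCK)
        if not chunk:  # an empty read signals end of file
            break
        chunks.append(chunk)
        offset += len(chunk)
    server.close(handle)
    return b"".join(chunks)
```

Reading a 10,000-byte file this way takes six requests (open, three data reads, one empty read at end of file, close); the count keeps growing with file size, while whole-file transfer stays at one fetch.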


Advantage of a remote-open file system
▪ Low latency

Conclusion

Scale impacts Andrew in areas besides performance and operability