Upload
christian-simpson
View
222
Download
1
Embed Size (px)
Citation preview
Distributed File Systems
File
A file is a collection of data with a user view (file structure) and a physical view (blocks).
Files contain both data and attributes
File = Name + Attributes + Data
A directory is a file that provides a mapping from text names to internal file identifiers.
File Systems
• File system is the interface to disk storage
• File systems implement file management:
– Naming and locating a file
– Accessing a file – create, delete, open, close, read, write
– Physical allocation of a file.
4
Directory module: relates file names to file IDs
File module: relates file IDs to particular files
Access control module: checks permission for operation requested
File access module: reads or writes file data or attributes
Block module: accesses and allocates disk blocks
Device module: disk I/O and buffering
File system modules
What is a file system? (a typical module structure for implementation of non-DFS)
Files
Files
DeviceDeviceBlocksBlocks
DirectoriesDirectories
Distributed File System
• A distributed file system (DFS) is a file system with
distributed storage and users. Files may be located
remotely on servers.
• SUN Network File System NFS ver3& ver4
• Andrew File System AFS
• A case for DFS
Introduction
I want to store my thesis on
the server! I need to have my
book always available..
I need to store my analysis
and reports safely…
I need storage for my reports
My boss wants…
Server AServer A
Uhm… perhaps time has come to buy a rack of servers….
Introduction
• A Case for DS
• AServer AServer A
Server BServer B
Server CServer C
Wow… now I can store a lot
more documents…
Hey… but where did I put my
docs?
Same here… I
don’t remembe
r..
I am not sure whether server A, or B, or C…
Uhm… … maybe we need a DFS?... Well after the paper and a
nap…
• A Case for DFS
Introduction
Server CServer C
Server BServer BServer AServer A
Good… I can access my
folders from anywhere..
Wow! I do not have to
remember which server I
stored the data into…
Nice… my boss will promote
me!
It is reliable, fault tolerant, highly available, location transparent…. I
hope I can finish my newspaper now…
Distributed File SystemDistributed File System
Cluster-Based Distributed File Systems (1)
• The difference between
• (a) distributing whole files across several servers and (b) striping files for parallel access.
File service architecture
Client computer Server computer
Applicationprogram
Applicationprogram
Client module
Flat file service
Directory service
Distributed File System (DFS) Requirements • Transparency - server-side changes should be invisible to the client-side.
– Access transparency: A single set of operations is provided for access to local/remote files.
– Location Transparency: All client processes see a uniform file name space.
– Migration Transparency: When files are moved from one server to another, users should not see it
– Performance Transparency
– Scaling Transparency
• File Replication– A file may be represented by several copies for service efficiency and fault tolerance.
• Concurrent File Updates– Changes to a file by one client should not interfere with the operation of other clients
simultaneously accessing the same file.
12
DFS: Case Studies • NFS (Network File System)
– Developed by Sun Microsystems (in 1985)
– Most popular, open, and widely used.
– NFS protocol standardised through IETF (RFC 1813)
• AFS (Andrew File System)– Developed by Carnegie Mellon University as part of Andrew
distributed computing environments (in 1986)
– A research project to create campus wide file system.
– Public domain implementation is available on Linux (LinuxAFS)
– It was adopted as a basis for the DCE/DFS file system in the Open Software Foundation (OSF, www.opengroup.org) DEC (Distributed Computing Environment)
NFS
• It used originally for Unix-based workstations
• In NFS, each file server provides a view for its
local file system
• NFS can use both TCP and UDP
NFS - History
1985: Original Version (in-house use) 1989: NFSv2 (RFC 1094)
– Operated entirely over UDP– Stateless protocol (the core)– Support for 2GB files
1995: NFSv3 (RFC 1813)– Support for 64 bit (> 2GB files)– Support for asynchronous writes– Support for TCP– Support for additional attributes– Other improvements
2000-2003: NFSv4 (RFC 3010, RFC 3530)– Collaboration with IETF– Sun hands over the development of NFS
2010: NFSv4.1– Adds Parallel NFS (pNFS) for parallel data access
NFS
• NFS come with communication protocol allows clients
with different OS and machines to access files on server is
called Open Network Computing Protocol ONC
• Client with windows system can access UNIX file server
because of NFS protocol independent of OS
NFS Access
• We have two types of access models: Remote Access
model and download/upload model
• The remote access model is that client request to file
client unaware of actual location of file
• Upload/download model is that client download file
from server and access it locally.
NFS Access
The remote access model. The upload/download model
NFS architecture
UNIX kernel
protocol
Client computer Server computer
system calls
Local Remote
UNIXfile
system
NFSclient
NFSserver
UNIXfile
system
Applicationprogram
Applicationprogram
NFS
UNIX
UNIX kernel
Virtual file systemVirtual file system
Oth
er f
ile s
yste
m
NFS Architecture
The basic NFS architecture for UNIX systems.
NFS Communication
a) Reading data from a file in NFS version 3.b) Reading data using a compound procedure in version 4.
Naming in NFS
Mounting (part of) a remote file system in NFS.
Naming in NFS
Mounting nested directories from multiple servers in NFS.
Automounting
Dropbox FolderDropbox Folder
Dropbox Folder
Dropbox Folder
Dropbox Folder
Dropbox Folder
Automaticsynchronization
File Handles
• In Unix, file has a node number or file handle
which is reference to a file in NFS independent
of the name of file.
• It is created by server when file is created
• It is 128 byte in ver. 4 of NFS
NFS v3 (1995)
ASYN_WRITE: asynchronous write– Allow server-side caching
COMMIT: – Flush server’s write buffers
READPLUSDIR: – Obtain directory’s file names, handles & attributes
64 bits for file size/offset– NFS v2 allows only 32 bits
NFS v4 (2000)
Based on SUN’s WebNFS
Stateful protocol:– Open/close requests– Integrates mount & locking protocols– Lease-based locking– COMPOUND request: group of multiple operations
NFS operationsOperation v3 v4 Description
Create Yes No Create a regular file
Create No Yes Create a nonregular file
Link Yes Yes Create a hard link to a file
Symlink Yes No Create a symbolic link to a file
Mkdir Yes No Create a subdirectory in a given directory
Mknod Yes No Create a special file
Rename Yes Yes Change the name of a file
Rmdir Yes No Remove an empty subdirectory from a directory
Open No Yes Open a file
Close No Yes Close a file
Lookup Yes Yes Look up a file by means of a file name
Readdir Yes Yes Read the entries in a directory
Readlink Yes Yes Read the path name stored in a symbolic link
Getattr Yes Yes Read the attribute values for a file
Setattr Yes Yes Set one or more attribute values for a file
Read Yes Yes Read the data contained in a file
Write Yes Yes Write data to a file
29
• read(fh, offset, count) -> attr, data• write(fh, offset, count, data) -> attr• create(dirfh, name, attr) -> newfh, attr• remove(dirfh, name) status• getattr(fh) -> attr• setattr(fh, attr) -> attr• lookup(dirfh, name) -> fh, attr• rename(dirfh, name, todirfh, toname)• link(newdirfh, newname, dirfh, name)• readdir(dirfh, cookie, count) -> entries• symlink(newdirfh, newname, string) -> status• readlink(fh) -> string• mkdir(dirfh, name, attr) -> newfh, attr• rmdir(dirfh, name) -> status• statfs(fh) -> fsstats
NFS server operations (simplified)
fh = file handle:
Filesystem identifier i-node number i-node generation
*
Model flat file service
Read(FileId, i, n) -> DataWrite(FileId, i, Data)
Create() -> FileIdDelete(FileId)
GetAttributes(FileId) -> AttrSetAttributes(FileId, Attr)
Model directory service
Lookup(Dir, Name) -> FileIdAddName(Dir, Name, File)
UnName(Dir, Name)GetNames(Dir, Pattern)
->NameSeq
File attribute record structure
File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
Owner
File type
Access control list
File Attributes (1)
Some general mandatory file attributes in NFS.
Attribute Description
TYPE The type of the file (regular, directory, symbolic link)
SIZE The length of the file in bytes
CHANGEIndicator for a client to see if and/or when the file has changed
FSID Server-unique identifier of the file's file system
File Attributes (2)
Some general recommended file attributes.
Attribute Description
ACL an access control list associated with the file
FILEHANDLE The server-provided file handle of this file
FILEID A file-system unique identifier for this file
FS_LOCATIONS Locations in the network where this file system may be found
OWNER The character-string name of the file's owner
TIME_ACCESS Time when the file data were last accessed
TIME_MODIFY Time when the file data were last modified
TIME_CREATE Time when the file was created
File Sharing (1)a) On a single processor, when a
read follows a write, the value returned by the read is the value just written.
b) In a distributed system with caching, obsolete values may be returned.
Semantics of File Sharing
Method Comment
UNIX semantics Every operation on a file is instantly visible to all processes
Session semantics No changes are visible to other processes until the file is closed
Immutable files No updates are possible; simplifies sharing and replication
Transaction All changes occur atomically
Server Caching• In delayed-write, when a page has been altered,
its new contents are written back to the disk only when the buffer page is required for another page.
–Unix sync operation writes pages to disk every 30 seconds
• In write-through, data in write operations is stored in the memory cache at the server and written to disk before a reply is sent to the client.
Failures & Retransmission
Three situations for handling retransmissions.a) The request is still in progressb) The reply has just been returnedc) The reply has been some time ago, but was lost.
Andrew File System (AFS)
• Developed at CMU (~1985), commercial product by Transarc (part of OSF/DCE)
• Segment network into clusters, with one file server per cluster
• Dynamic reconfigurations to balance load• Stateful protocol + aggressive caching
– Servers participate in client cache management
• Entire files are cached• Session semantics
– Weaker than UNIX semantics– See new data on next open()– Immediate updates of metadata
Interception and Caching
fd = open(pathname)– Files in local file system are opened as normal.– Cached files in shared file system are opened locally.– Other shared files are copied to cache.
All read/write directed to cached copy.close(fd)
– Local or cached copy is closed.– If shared file is modified it is copied back to server.
Vice File ServiceFlat file service with volumes
96 bit file identifier
Volume– Group of related files (e.g. one user’s files)– Used for location and management– Server is custodian of volume
Volume location database replicated at each server
volume number file handle uniquifier
Vice Interface
Fetch(fid) -> attr, data Returns the attributes (status) and, optionally, the contents of fileidentified by the fid and records a callback promise on it.
Store(fid, attr, data) Updates the attributes and (optionally) the contents of a specifiedfile.
Create() -> fid Creates a new file and records a callback promise on it.
Remove(fid) Deletes the specified file.
SetLock(fid, mode) Sets a lock on the specified file or directory. The mode of thelock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid) Unlocks the specified file or directory.
RemoveCallback(fid) Informs server that a Venus process has flushed a file from itscache.
BreakCallback(fid) This call is made by a Vice server to a Venus process. It cancelsthe callback promise on the relevant file.
AFS propertiesAdvantages
– Only contact server on open/close– Usually single user at one workstation– Most files are read in entirety– Cache copies valid for long periods– Disk based cache survives reboot– Simplified caching scheme
Disadvantages– Files larger than cache can’t be opened– Workstations require disk– Not appropriate for database support
The Coda File System
Type of user Description
Owner The owner of a file
Group The group of users associated with a file
Everyone Any user of a process
Interactive Any process accessing the file from an interactive terminal
Network Any process accessing the file via the network
DialupAny process accessing the file through a dialup connection to the server
Batch Any process accessing the file as part of a batch job
Anonymous Anyone accessing the file without authentication
Authenticated Any authenticated user of a process
Service Any system-defined service process
Overview of Coda (I)
Overview of Coda (II)
Communication (I)Side effects in Coda's RPC2 system.
Communication (II)
a) Sending an invalidation message one at a time.
b) Sending invalidation messages in parallel.
NamingClients in Coda have access to a single shared name space.
File IdentifiersReplicated Volume ID (32-bits)
vnode
xFS: A “Serverless” File System• Distribute file server processing across a set of
available hosts, at the granularity of individual files– Separate file management & file storage/access
– Dynamically assign files to hosts
• Software RAID storage system– Striping file data across disks
– Log-structured organization
• Manager Map– Replicated at all hosts
Overview of xFS
A typical distribution of xFS processes across multiple machines.
Processes (I)The principle of log-based striping in xFS.
Processes (II)
Reading a block of data in xFS.
NamingMain data structures used in xFS.
Data structure Description
Manager map Maps file ID to manager
Imap Maps file ID to log address of file's inode
Inode Maps block number (i.e., offset) to log address of block
File identifier Reference used to index into manager map
File directory Maps a file name to a file identifier
Log addresses Triplet of stripe group, ID, segment ID, and segment offset
Stripe group map Maps stripe group ID to list of storage servers