Distributed File Systems File A file is a collection of data with a user view (file structure) and...

Preview:

Citation preview

Distributed File Systems

File

A file is a collection of data with a user view (file structure) and a physical view (blocks).

Files contain both data and attributes

File = Name + Attributes + Data

A directory is a file that provides a mapping from text names to internal file identifiers.

File Systems

• File system is the interface to disk storage

• File systems implement file management:

– Naming and locating a file

– Accessing a file – create, delete, open, close, read, write

– Physical allocation of a file.

4

Directory module: relates file names to file IDs

File module: relates file IDs to particular files

Access control module: checks permission for operation requested

File access module: reads or writes file data or attributes

Block module: accesses and allocates disk blocks

Device module: disk I/O and buffering

File system modules

What is a file system? (a typical module structure for implementation of non-DFS)

Files

Files

DeviceDeviceBlocksBlocks

DirectoriesDirectories

Distributed File System

• A distributed file system (DFS) is a file system with

distributed storage and users. Files may be located

remotely on servers.

• SUN Network File System NFS ver3& ver4

• Andrew File System AFS

• A case for DFS

Introduction

I want to store my thesis on

the server! I need to have my

book always available..

I need to store my analysis

and reports safely…

I need storage for my reports

My boss wants…

Server AServer A

Uhm… perhaps time has come to buy a rack of servers….

Introduction

• A Case for DS

• AServer AServer A

Server BServer B

Server CServer C

Wow… now I can store a lot

more documents…

Hey… but where did I put my

docs?

Same here… I

don’t remembe

r..

I am not sure whether server A, or B, or C…

Uhm… … maybe we need a DFS?... Well after the paper and a

nap…

• A Case for DFS

Introduction

Server CServer C

Server BServer BServer AServer A

Good… I can access my

folders from anywhere..

Wow! I do not have to

remember which server I

stored the data into…

Nice… my boss will promote

me!

It is reliable, fault tolerant, highly available, location transparent…. I

hope I can finish my newspaper now…

Distributed File SystemDistributed File System

Cluster-Based Distributed File Systems (1)

• The difference between

• (a) distributing whole files across several servers and (b) striping files for parallel access.

File service architecture

Client computer Server computer

Applicationprogram

Applicationprogram

Client module

Flat file service

Directory service

Distributed File System (DFS) Requirements • Transparency - server-side changes should be invisible to the client-side.

– Access transparency: A single set of operations is provided for access to local/remote files.

– Location Transparency: All client processes see a uniform file name space.

– Migration Transparency: When files are moved from one server to another, users should not see it

– Performance Transparency

– Scaling Transparency

• File Replication– A file may be represented by several copies for service efficiency and fault tolerance.

• Concurrent File Updates– Changes to a file by one client should not interfere with the operation of other clients

simultaneously accessing the same file.

12

DFS: Case Studies • NFS (Network File System)

– Developed by Sun Microsystems (in 1985)

– Most popular, open, and widely used.

– NFS protocol standardised through IETF (RFC 1813)

• AFS (Andrew File System)– Developed by Carnegie Mellon University as part of Andrew

distributed computing environments (in 1986)

– A research project to create campus wide file system.

– Public domain implementation is available on Linux (LinuxAFS)

– It was adopted as a basis for the DCE/DFS file system in the Open Software Foundation (OSF, www.opengroup.org) DEC (Distributed Computing Environment)

NFS

• It used originally for Unix-based workstations

• In NFS, each file server provides a view for its

local file system

• NFS can use both TCP and UDP

NFS - History

1985: Original Version (in-house use) 1989: NFSv2 (RFC 1094)

– Operated entirely over UDP– Stateless protocol (the core)– Support for 2GB files

1995: NFSv3 (RFC 1813)– Support for 64 bit (> 2GB files)– Support for asynchronous writes– Support for TCP– Support for additional attributes– Other improvements

2000-2003: NFSv4 (RFC 3010, RFC 3530)– Collaboration with IETF– Sun hands over the development of NFS

2010: NFSv4.1– Adds Parallel NFS (pNFS) for parallel data access

NFS

• NFS come with communication protocol allows clients

with different OS and machines to access files on server is

called Open Network Computing Protocol ONC

• Client with windows system can access UNIX file server

because of NFS protocol independent of OS

NFS Access

• We have two types of access models: Remote Access

model and download/upload model

• The remote access model is that client request to file

client unaware of actual location of file

• Upload/download model is that client download file

from server and access it locally.

NFS Access

The remote access model. The upload/download model

NFS architecture

UNIX kernel

protocol

Client computer Server computer

system calls

Local Remote

UNIXfile

system

NFSclient

NFSserver

UNIXfile

system

Applicationprogram

Applicationprogram

NFS

UNIX

UNIX kernel

Virtual file systemVirtual file system

Oth

er f

ile s

yste

m

NFS Architecture

The basic NFS architecture for UNIX systems.

NFS Communication

a) Reading data from a file in NFS version 3.b) Reading data using a compound procedure in version 4.

Naming in NFS

Mounting (part of) a remote file system in NFS.

Naming in NFS

Mounting nested directories from multiple servers in NFS.

Automounting

Dropbox FolderDropbox Folder

Dropbox Folder

Dropbox Folder

Dropbox Folder

Dropbox Folder

Automaticsynchronization

File Handles

• In Unix, file has a node number or file handle

which is reference to a file in NFS independent

of the name of file.

• It is created by server when file is created

• It is 128 byte in ver. 4 of NFS

NFS v3 (1995)

ASYN_WRITE: asynchronous write– Allow server-side caching

COMMIT: – Flush server’s write buffers

READPLUSDIR: – Obtain directory’s file names, handles & attributes

64 bits for file size/offset– NFS v2 allows only 32 bits

NFS v4 (2000)

Based on SUN’s WebNFS

Stateful protocol:– Open/close requests– Integrates mount & locking protocols– Lease-based locking– COMPOUND request: group of multiple operations

NFS operationsOperation v3 v4 Description

Create Yes No Create a regular file

Create No Yes Create a nonregular file

Link Yes Yes Create a hard link to a file

Symlink Yes No Create a symbolic link to a file

Mkdir Yes No Create a subdirectory in a given directory

Mknod Yes No Create a special file

Rename Yes Yes Change the name of a file

Rmdir Yes No Remove an empty subdirectory from a directory

Open No Yes Open a file

Close No Yes Close a file

Lookup Yes Yes Look up a file by means of a file name

Readdir Yes Yes Read the entries in a directory

Readlink Yes Yes Read the path name stored in a symbolic link

Getattr Yes Yes Read the attribute values for a file

Setattr Yes Yes Set one or more attribute values for a file

Read Yes Yes Read the data contained in a file

Write Yes Yes Write data to a file

29

• read(fh, offset, count) -> attr, data• write(fh, offset, count, data) -> attr• create(dirfh, name, attr) -> newfh, attr• remove(dirfh, name) status• getattr(fh) -> attr• setattr(fh, attr) -> attr• lookup(dirfh, name) -> fh, attr• rename(dirfh, name, todirfh, toname)• link(newdirfh, newname, dirfh, name)• readdir(dirfh, cookie, count) -> entries• symlink(newdirfh, newname, string) -> status• readlink(fh) -> string• mkdir(dirfh, name, attr) -> newfh, attr• rmdir(dirfh, name) -> status• statfs(fh) -> fsstats

NFS server operations (simplified)

fh = file handle:

Filesystem identifier i-node number i-node generation

*

Model flat file service

Read(FileId, i, n) -> DataWrite(FileId, i, Data)

Create() -> FileIdDelete(FileId)

GetAttributes(FileId) -> AttrSetAttributes(FileId, Attr)

Model directory service

Lookup(Dir, Name) -> FileIdAddName(Dir, Name, File)

UnName(Dir, Name)GetNames(Dir, Pattern)

->NameSeq

File attribute record structure

File length

Creation timestamp

Read timestamp

Write timestamp

Attribute timestamp

Reference count

Owner

File type

Access control list

File Attributes (1)

Some general mandatory file attributes in NFS.

Attribute Description

TYPE The type of the file (regular, directory, symbolic link)

SIZE The length of the file in bytes

CHANGEIndicator for a client to see if and/or when the file has changed

FSID Server-unique identifier of the file's file system

File Attributes (2)

Some general recommended file attributes.

Attribute Description

ACL an access control list associated with the file

FILEHANDLE The server-provided file handle of this file

FILEID A file-system unique identifier for this file

FS_LOCATIONS Locations in the network where this file system may be found

OWNER The character-string name of the file's owner

TIME_ACCESS Time when the file data were last accessed

TIME_MODIFY Time when the file data were last modified

TIME_CREATE Time when the file was created

File Sharing (1)a) On a single processor, when a

read follows a write, the value returned by the read is the value just written.

b) In a distributed system with caching, obsolete values may be returned.

Semantics of File Sharing

Method Comment

UNIX semantics Every operation on a file is instantly visible to all processes

Session semantics No changes are visible to other processes until the file is closed

Immutable files No updates are possible; simplifies sharing and replication

Transaction All changes occur atomically

Server Caching• In delayed-write, when a page has been altered,

its new contents are written back to the disk only when the buffer page is required for another page.

–Unix sync operation writes pages to disk every 30 seconds

• In write-through, data in write operations is stored in the memory cache at the server and written to disk before a reply is sent to the client.

Failures & Retransmission

Three situations for handling retransmissions.a) The request is still in progressb) The reply has just been returnedc) The reply has been some time ago, but was lost.

Andrew File System (AFS)

• Developed at CMU (~1985), commercial product by Transarc (part of OSF/DCE)

• Segment network into clusters, with one file server per cluster

• Dynamic reconfigurations to balance load• Stateful protocol + aggressive caching

– Servers participate in client cache management

• Entire files are cached• Session semantics

– Weaker than UNIX semantics– See new data on next open()– Immediate updates of metadata

Interception and Caching

fd = open(pathname)– Files in local file system are opened as normal.– Cached files in shared file system are opened locally.– Other shared files are copied to cache.

All read/write directed to cached copy.close(fd)

– Local or cached copy is closed.– If shared file is modified it is copied back to server.

Vice File ServiceFlat file service with volumes

96 bit file identifier

Volume– Group of related files (e.g. one user’s files)– Used for location and management– Server is custodian of volume

Volume location database replicated at each server

volume number file handle uniquifier

Vice Interface

Fetch(fid) -> attr, data Returns the attributes (status) and, optionally, the contents of fileidentified by the fid and records a callback promise on it.

Store(fid, attr, data) Updates the attributes and (optionally) the contents of a specifiedfile.

Create() -> fid Creates a new file and records a callback promise on it.

Remove(fid) Deletes the specified file.

SetLock(fid, mode) Sets a lock on the specified file or directory. The mode of thelock may be shared or exclusive. Locks that are not removed expire after 30 minutes.

ReleaseLock(fid) Unlocks the specified file or directory.

RemoveCallback(fid) Informs server that a Venus process has flushed a file from itscache.

BreakCallback(fid) This call is made by a Vice server to a Venus process. It cancelsthe callback promise on the relevant file.

AFS propertiesAdvantages

– Only contact server on open/close– Usually single user at one workstation– Most files are read in entirety– Cache copies valid for long periods– Disk based cache survives reboot– Simplified caching scheme

Disadvantages– Files larger than cache can’t be opened– Workstations require disk– Not appropriate for database support

The Coda File System

Type of user Description

Owner The owner of a file

Group The group of users associated with a file

Everyone Any user of a process

Interactive Any process accessing the file from an interactive terminal

Network Any process accessing the file via the network

DialupAny process accessing the file through a dialup connection to the server

Batch Any process accessing the file as part of a batch job

Anonymous Anyone accessing the file without authentication

Authenticated Any authenticated user of a process

Service Any system-defined service process

Overview of Coda (I)

Overview of Coda (II)

Communication (I)Side effects in Coda's RPC2 system.

Communication (II)

a) Sending an invalidation message one at a time.

b) Sending invalidation messages in parallel.

NamingClients in Coda have access to a single shared name space.

File IdentifiersReplicated Volume ID (32-bits)

vnode

xFS: A “Serverless” File System• Distribute file server processing across a set of

available hosts, at the granularity of individual files– Separate file management & file storage/access

– Dynamically assign files to hosts

• Software RAID storage system– Striping file data across disks

– Log-structured organization

• Manager Map– Replicated at all hosts

Overview of xFS

A typical distribution of xFS processes across multiple machines.

Processes (I)The principle of log-based striping in xFS.

Processes (II)

Reading a block of data in xFS.

NamingMain data structures used in xFS.

Data structure Description

Manager map Maps file ID to manager

Imap Maps file ID to log address of file's inode

Inode Maps block number (i.e., offset) to log address of block

File identifier Reference used to index into manager map

File directory Maps a file name to a file identifier

Log addresses Triplet of stripe group, ID, segment ID, and segment offset

Stripe group map Maps stripe group ID to list of storage servers

Recommended