Jeff's Filesystem Papers Review, Part I.
Review of "Design and Implementation of the Second Extended Filesystem"
Design and Implementation of the Second Extended Filesystem
By Remy Card, Theodore Ts'o and Stephen Tweedie
Institut Blaise Pascal, MIT, and University of Edinburgh
Very Linux-oriented.
This presentation is an academic review; the ideas presented are either quotes or paraphrases of the reviewed document.
History
VFS (Virtual File System)
Developed to ease the addition of new filesystems to Linux.
EFS (Extended File System)
Increased the maximum filesystem size and maximum filename length, but used linked lists to keep track of free blocks and inodes, had no separate access, inode-change, and modification timestamps, and suffered from bad performance and fragmentation.
Xia
An extension of the old Minix FS.
Ext2FS
Similar functionality to Xia, but based on the EFS code.
Basic Concepts
Inode
File type, access rights, owners, timestamps, size, pointers to data blocks.
Directories
A hierarchical tree; a directory can contain files and subdirectories.
Implemented as a special type of file containing a list of entries.
- Each entry is an inode number and a filename.
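To make the directory model concrete, here is a minimal Python sketch, assuming a toy in-memory inode table rather than ext2's real on-disk format. The `Inode` class, `lookup`, and `resolve` are hypothetical names; the only ext2-specific detail is that inode 2 is the root directory.

```python
from dataclasses import dataclass, field

ROOT_INO = 2  # ext2 reserves inode 2 for the root directory

@dataclass
class Inode:
    kind: str                                    # simplified file type: "dir" or "file"
    entries: list = field(default_factory=list)  # directories: (inode number, filename) pairs
    data: bytes = b""                            # regular files: contents

def lookup(table: dict, dir_ino: int, name: str) -> int:
    """Scan one directory's entry list for `name` and return its inode number."""
    for ino, entry_name in table[dir_ino].entries:
        if entry_name == name:
            return ino
    raise FileNotFoundError(name)

def resolve(table: dict, path: str) -> int:
    """Walk an absolute path one component at a time, starting at the root."""
    ino = ROOT_INO
    for component in path.strip("/").split("/"):
        if not component:
            continue  # "/" itself, or doubled slashes
        if table[ino].kind != "dir":
            raise NotADirectoryError(component)
        ino = lookup(table, ino, component)
    return ino

# A tiny tree: / contains home/, which contains notes.txt.
table = {
    2: Inode("dir", [(2, "."), (2, ".."), (11, "home")]),
    11: Inode("dir", [(11, "."), (2, ".."), (12, "notes.txt")]),
    12: Inode("file", data=b"hello\n"),
}
print(resolve(table, "/home/notes.txt"))  # 12
```

Note that the filename lives only in the parent directory's entry, never in the inode itself; that separation is what makes multiple names for one inode possible.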
Basic Concepts - continued
Links
Multiple names associated with one inode.
Hard links: only for files in the same FS.
- Not for directories, and not across filesystems.
Symlinks: a file that contains a filename; can be used for directories and for files on other filesystems.
Device Special Files
An access point for a device driver.
Character mode / block mode.
The paper gives some specifics on how they are accessed.
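The link and device-file semantics above are visible from user space through `stat`. The Python sketch below assumes a Unix-like system (Linux for the `/dev/null` line); it exercises generic VFS behaviour, so it should show the same results on ext2 or on any other Unix filesystem. `device_kind` is a hypothetical helper name.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "data.txt")
    with open(original, "w") as f:
        f.write("hello\n")

    # Hard link: a second directory entry naming the same inode.
    hard = os.path.join(d, "hard.txt")
    os.link(original, hard)
    same_inode = os.stat(hard).st_ino == os.stat(original).st_ino
    link_count = os.stat(original).st_nlink  # one inode, two names

    # Symlink: its own inode, whose content is the target's path name.
    soft = os.path.join(d, "soft.txt")
    os.symlink(original, soft)
    symlink_target = os.readlink(soft)
    soft_has_own_inode = os.lstat(soft).st_ino != os.stat(original).st_ino

def device_kind(path: str) -> str:
    """Report whether `path` is a character or block device special file."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "character device"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block device"
    else:
        return "not a device special file"
    # st_rdev holds the (major, minor) pair the kernel uses to pick the driver.
    return f"{kind} {os.major(st.st_rdev)}:{os.minor(st.st_rdev)}"

print(same_inode, link_count, symlink_target == original, soft_has_own_inode)
print(device_kind("/dev/null"))  # on Linux: character device 1:3
```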
VFS
System calls go through the VFS, so any type of FS can sit underneath.
The VFS defines a set of functions that every FS must implement, abstracting the interface.
Good design; arguably an example of the bridge pattern, if you feel like being SE450-oriented.
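For the SE450-minded, here is a Python sketch of the idea. It is an analogy, not Linux's code: the kernel's VFS is written in C and uses tables of function pointers rather than classes, and every class and method name below is hypothetical. The abstract `FileSystem` plays the implementor role of the bridge pattern, and `VFS` is the abstraction callers see.

```python
from abc import ABC, abstractmethod

class FileSystem(ABC):
    """Operations every concrete filesystem must supply (a hypothetical, tiny subset)."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

class MemoryFS(FileSystem):
    """A writable filesystem that keeps files in a dict."""
    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = data

class ReadOnlyFS(FileSystem):
    """A filesystem with fixed contents, e.g. something CD-ROM-like."""
    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        raise PermissionError(f"read-only filesystem: {path}")

class VFS:
    """Routes each call to the filesystem mounted at the longest matching prefix."""
    def __init__(self):
        self._mounts = {}  # mount point -> FileSystem

    def mount(self, point: str, fs: FileSystem) -> None:
        self._mounts[point.rstrip("/") or "/"] = fs

    def _route(self, path: str):
        candidates = [p for p in self._mounts
                      if path == p or path.startswith(p.rstrip("/") + "/")]
        point = max(candidates, key=len)
        rel = path if point == "/" else path[len(point):]
        return self._mounts[point], "/" + rel.lstrip("/")

    def read(self, path: str) -> bytes:
        fs, rel = self._route(path)
        return fs.read(rel)

    def write(self, path: str, data: bytes) -> None:
        fs, rel = self._route(path)
        fs.write(rel, data)

vfs = VFS()
vfs.mount("/", MemoryFS())
vfs.mount("/cdrom", ReadOnlyFS({"/README": b"welcome"}))
vfs.write("/tmp/a.txt", b"scratch")
print(vfs.read("/tmp/a.txt"), vfs.read("/cdrom/README"))
```

Callers only ever talk to `VFS`; adding a new filesystem means writing one more `FileSystem` subclass, with no change to the calling code.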
ext2fs standard features
Supports standard UNIX file types: regular files, directories, device special files, and symlinks.
4 TB limit on FS size.
Long filenames (up to 255 characters).
Reserves 5% of the blocks for the root user to keep processes from filling up the FS (of course, this doesn't help if you are doing something stupid like running a daemon as root).
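The reserved-block rule boils down to a simple check at allocation time. This is a simplified Python sketch with hypothetical function names; real ext2 can also grant the reserve to a configured user or group, which is left out here.

```python
RESERVED_PERCENT = 5  # the default reserve described above; adjustable per filesystem

def reserved_blocks(total_blocks: int, percent: int = RESERVED_PERCENT) -> int:
    """Blocks held back for root, using integer math to avoid float rounding."""
    return total_blocks * percent // 100

def can_allocate(uid: int, free_blocks: int, total_blocks: int, wanted: int = 1) -> bool:
    """Root (uid 0) may use the reserve; everyone else must leave it untouched."""
    if uid == 0:
        return free_blocks >= wanted
    return free_blocks - reserved_blocks(total_blocks) >= wanted

total = 1_000_000  # blocks on a hypothetical filesystem
print(reserved_blocks(total))             # 50000
print(can_allocate(1000, 40_000, total))  # False: ordinary user, only the reserve is left
print(can_allocate(0, 40_000, total))     # True: root can still write
```

The daemon-running-as-root caveat is exactly the `uid == 0` branch: a runaway root process passes the check and can eat the reserve too.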
ext2fs advanced features
File attributes can be set on files or on directories; new files inherit the attributes of the directory they are created in.
Metadata can be forced to be written synchronously to maintain consistency, or it can be written asynchronously.
The administrator can choose the logical block size when creating the FS, trading off seek time against wasted disk space.
Fast symlinks: store the target filename in the inode rather than in a data block.
Tradeoff: the target name must be shorter than 60 characters.
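Two of the tradeoffs on this slide are easy to quantify. The Python sketch below computes the slack (allocated but unused bytes in the last block) for one file at the usual ext2 block sizes, and checks whether a symlink target fits in the inode. The fast-symlink test assumes, as in the Linux ext2 code, that the stored name includes a terminating NUL and is counted in bytes; the function names are hypothetical, and indirect blocks are ignored.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    return -(-file_size // block_size)  # ceiling division

def slack_bytes(file_size: int, block_size: int) -> int:
    """Bytes allocated to the file's last block but not used by it."""
    return blocks_needed(file_size, block_size) * block_size - file_size

# The inode's block-pointer area: 15 pointers (12 direct, 1 indirect,
# 1 double-indirect, 1 triple-indirect) of 4 bytes each.
I_BLOCK_BYTES = 15 * 4  # = 60

def is_fast_symlink(target: str) -> bool:
    """Assumed rule: target bytes plus a terminating NUL must fit in 60 bytes."""
    return len(target.encode()) + 1 <= I_BLOCK_BYTES

for bs in (1024, 2048, 4096):
    print(bs, blocks_needed(5000, bs), slack_bytes(5000, bs))
# 1024 5 120
# 2048 3 1144
# 4096 2 3192
print(is_fast_symlink("a" * 59), is_fast_symlink("a" * 60))  # True False
```

Bigger blocks mean fewer blocks (and fewer I/O requests) per file, but the waste in the last block grows with the block size.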
ext2fs advanced features (continued)
Clean/not-clean flag in the superblock.
Forced checks after a certain number of mounts.
Secure deletion of files (can be enabled or disabled): when a file is deleted, its blocks are overwritten with random data.
Immutable files (cannot be modified, deleted, or renamed).
Append-only files (data can only be added at the end; useful for logs).
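The clean flag, the mount counter, and secure deletion can be sketched in Python. The `Superblock` field names mirror ext2's (`s_state`, `s_mnt_count`, `s_max_mnt_count`), but the logic is simplified and the function names are hypothetical. `secure_delete` is only a user-space imitation: ext2 overwrote the file's blocks inside the filesystem, whereas a user-space overwrite is not guaranteed to reach the same physical blocks on journaling or copy-on-write filesystems or on SSDs. Immutable and append-only flags are normally set with `chattr +i` and `chattr +a` and are not shown.

```python
import os
import tempfile
from dataclasses import dataclass

EXT2_VALID_FS = 1  # s_state bit meaning "cleanly unmounted"

@dataclass
class Superblock:
    # Field names mirror ext2's superblock; everything else is simplified.
    s_state: int
    s_mnt_count: int
    s_max_mnt_count: int

def mount(sb: Superblock) -> None:
    sb.s_state &= ~EXT2_VALID_FS  # "not clean" until a proper unmount
    sb.s_mnt_count += 1

def unmount(sb: Superblock) -> None:
    sb.s_state |= EXT2_VALID_FS

def needs_check(sb: Superblock) -> bool:
    if not (sb.s_state & EXT2_VALID_FS):
        return True  # crashed or not unmounted cleanly
    return sb.s_max_mnt_count > 0 and sb.s_mnt_count >= sb.s_max_mnt_count

def secure_delete(path: str) -> None:
    """User-space imitation of secure deletion: overwrite with random bytes, then unlink."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(os.urandom(size))
        f.flush()
        os.fsync(f.fileno())  # force the overwrite to the device before unlinking
    os.remove(path)

sb = Superblock(s_state=EXT2_VALID_FS, s_mnt_count=18, s_max_mnt_count=20)
mount(sb)
print(needs_check(sb))  # True: a crash now would leave the FS marked not clean
unmount(sb)
print(needs_check(sb))  # False: clean, and 19 < 20
mount(sb)
unmount(sb)
print(needs_check(sb))  # True: the 20th mount reached the limit

fd, path = tempfile.mkstemp()
os.write(fd, b"secret")
os.close(fd)
secure_delete(path)
print(os.path.exists(path))  # False
```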
ext2fs physical structure
Block Groups
A block group is a contiguous chunk of the disk.
Each block group holds a copy of the superblock and the group descriptors, plus its own block bitmap, inode bitmap, slice of the inode table, and data blocks.
- Recovery is possible when the primary superblock is corrupt.
- Seek time is reduced: a group's bitmaps and inode table sit close to its data blocks, so updating them requires less head movement.
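Because every group has the same layout, finding the group that holds a given block or inode is plain arithmetic. A Python sketch, assuming standard ext2 numbering (inodes start at 1) and an example geometry with 1 KiB blocks; the function names and the inodes-per-group value are illustrative.

```python
def group_of_block(block: int, blocks_per_group: int, first_data_block: int) -> int:
    """Which block group holds a given block number."""
    return (block - first_data_block) // blocks_per_group

def group_of_inode(ino: int, inodes_per_group: int) -> tuple:
    """(group, index within that group's slice of the inode table); inodes start at 1."""
    return (ino - 1) // inodes_per_group, (ino - 1) % inodes_per_group

# Example geometry (assumed): 1 KiB blocks, so block 0 is boot space and the
# first superblock lives in block 1; one 1 KiB bitmap tracks 8 * 1024 blocks.
BLOCK_SIZE = 1024
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE  # 8192
FIRST_DATA_BLOCK = 1
INODES_PER_GROUP = 2048            # arbitrary, chosen when the FS is created

print(group_of_block(8192, BLOCKS_PER_GROUP, FIRST_DATA_BLOCK))  # 0
print(group_of_block(8193, BLOCKS_PER_GROUP, FIRST_DATA_BLOCK))  # 1
print(group_of_inode(2, INODES_PER_GROUP))     # (0, 1): the root inode
print(group_of_inode(4097, INODES_PER_GROUP))  # (2, 0)
```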
ext2fs directory implementation
A directory is just a file containing a linked list of entries; each entry holds:
Inode identifier
Entry length
- The entry length is variable, so short names don't waste space in the directory.
Name length
Filename
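The entry layout above can be packed and walked with Python's `struct` module. This sketch follows the paper-era layout (32-bit inode, 16-bit entry length, 16-bit name length, little-endian) and assumes a 1024-byte directory block; later ext2 revisions split the name length into an 8-bit length and an 8-bit file type. The function names are hypothetical.

```python
import struct

# Entry header: inode (32 bits), entry length (16 bits), name length (16 bits).
HEADER = struct.Struct("<IHH")

def entry_size(name: bytes) -> int:
    """Minimum record length: header plus name, rounded up to a multiple of 4."""
    return (HEADER.size + len(name) + 3) & ~3

def pack_block(entries, block_size: int = 1024) -> bytes:
    """Lay out (inode, name) entries in one directory block."""
    out = bytearray()
    for i, (ino, name) in enumerate(entries):
        rec_len = entry_size(name)
        if i == len(entries) - 1:
            rec_len = block_size - len(out)  # last entry's length runs to the end of the block
        out += HEADER.pack(ino, rec_len, len(name)) + name
        out += bytes(rec_len - HEADER.size - len(name))  # zero padding
    return bytes(out)

def unpack_block(block: bytes):
    """Walk a directory block by following each entry's length field."""
    entries, off = [], 0
    while off < len(block):
        ino, rec_len, name_len = HEADER.unpack_from(block, off)
        if rec_len == 0:
            break  # corrupt block; avoid looping forever
        if ino != 0:  # inode 0 marks an unused entry
            start = off + HEADER.size
            entries.append((ino, block[start:start + name_len]))
        off += rec_len
    return entries

block = pack_block([(2, b"."), (2, b".."), (12, b"notes.txt")])
print(len(block), unpack_block(block))
# 1024 [(2, b'.'), (2, b'..'), (12, b'notes.txt')]
```

Because the walk follows each entry length rather than the name length, deleting an entry can be as cheap as growing the previous entry's length to swallow it.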
ext2fs performance optimizations
Buffer cache management
Readahead: when one block is requested, several contiguous blocks are read into the buffer cache.
Allocation
Tries to allocate a file's data blocks in the same block group as its inode.
Preallocates up to 8 adjacent blocks when allocating a new block.
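Goal-directed allocation with preallocation can be sketched over a simple bitmap. In this Python sketch the bitmap is a list of booleans (True = in use), `goal` stands for the block the allocator would like to use (for example, right after the file's last block), and the 8-block window follows the figure above. All names are hypothetical, and real ext2 later gives back preallocated blocks the file never uses; that step is omitted.

```python
import errno

PREALLOC_BLOCKS = 8  # assumed window, matching the "up to 8 adjacent blocks" figure

def find_free(bitmap: list, goal: int) -> int:
    """First free block at or after `goal`, wrapping around the end of the bitmap."""
    n = len(bitmap)
    for i in range(n):
        b = (goal + i) % n
        if not bitmap[b]:
            return b
    raise OSError(errno.ENOSPC, "no free blocks")

def allocate(bitmap: list, goal: int, prealloc: int = PREALLOC_BLOCKS):
    """Allocate one block near `goal`, then reserve the free blocks right after it."""
    block = find_free(bitmap, goal)
    bitmap[block] = True
    reserved = []
    nxt = block + 1
    while len(reserved) < prealloc and nxt < len(bitmap) and not bitmap[nxt]:
        bitmap[nxt] = True  # held for this file's next writes
        reserved.append(nxt)
        nxt += 1
    return block, reserved

bitmap = [False] * 32  # hypothetical 32-block group
bitmap[3] = bitmap[10] = True
print(allocate(bitmap, goal=4))  # (4, [5, 6, 7, 8, 9])
print(allocate(bitmap, goal=0))  # (0, [1, 2])
```

Subsequent writes to the same file can take the reserved blocks, so the file ends up contiguous and readahead pays off.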
ext2fs library
Routines that let programs bypass the VFS and access an ext2 FS directly:
Open and close an FS.
Read and write bitmaps.
Create a new FS.
Check for bad blocks.
Create and expand directories; add and remove directory entries.
Path <=> inode resolution.
Scan the inode table, read and write inodes, allocate and deallocate blocks, etc.
ext2fs tools
tune2fs
Tuning: modifies FS configuration parameters.
e2fsck
Scans the disk and checks it for bad inodes.
Rebuilds the block and inode bitmaps.
Fixes data blocks claimed by multiple inodes.
Checks directory validity.
Checks inode link counts.
debugfs
An interactive interface to the ext2fs library.
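e2fsck's real passes are more involved, but the core of two checks listed above fits in a few lines. This Python sketch works on hypothetical in-memory tables and only detects problems (e2fsck also repairs them). It finds data blocks claimed by more than one inode and recomputes link counts from directory entries.

```python
from collections import Counter, defaultdict

def multiply_claimed(inode_blocks: dict) -> dict:
    """Map each data block claimed by more than one inode to its claimants."""
    owners = defaultdict(list)
    for ino, blocks in inode_blocks.items():
        for b in blocks:
            owners[b].append(ino)
    return {b: inos for b, inos in owners.items() if len(inos) > 1}

def link_count_errors(directories: dict, stored_links: dict) -> dict:
    """Compare each inode's stored link count with the directory entries naming it.

    `directories` maps a directory inode to its (inode, name) entries,
    including "." and "..", so directories are counted the same way.
    """
    actual = Counter(ino for entries in directories.values() for ino, _ in entries)
    return {ino: (stored, actual[ino])
            for ino, stored in stored_links.items() if stored != actual[ino]}

inode_blocks = {12: [100, 101], 13: [101, 102]}
directories = {2: [(2, "."), (2, ".."), (12, "a.txt"), (13, "b.txt")]}
stored_links = {2: 2, 12: 1, 13: 2}

print(multiply_claimed(inode_blocks))                # {101: [12, 13]}
print(link_count_errors(directories, stored_links))  # {13: (2, 1)}
```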
Performance
Block I/O: better than FFS.
Character I/O: worse than FFS.
Generally better than Xiafs and FFS.