ANATOMY OF LINUX JOURNALING FILE SYSTEMS
M. Tim Jones, Emulex
Overview
• Paper surveys past and current Linux journaling file systems
• Presents three modes of operation
  – Writeback mode
  – Ordered mode
  – Data mode
• Discusses tail packing
History
• IBM's JFS
  – First released in 1990
  – Updated since then (JFS2)
• Silicon Graphics' XFS
  – Released in 1994
  – Ported to Linux in 2001
History
• Smart FS
  – Developed for the Amiga
  – Supported by Linux until 2005
• ext3fs
  – Most commonly used
  – Extension of ext2 with journaling
  – Supported by Linux since 2001
History
• Reiser File System
  – Introduced many new features
  – Author is now serving a sentence of 15 years to life for second-degree homicide
    • Killed his estranged wife
    • Plea-bargained his first-degree homicide conviction down to second degree
Variations on Journaling
• Writeback mode:
  – Only journals metadata
  – Makes no guarantee that data updates will be written to disk before the associated metadata are marked as committed
• Ordered mode:
  – Only journals metadata
  – Guarantees that data updates are written to disk before the associated metadata are marked as committed
• Data mode:
  – Journals metadata and data updates
Writeback mode issues
• Metadata can be marked as committed before the data they point to are written to disk
  – File system can be corrupted if the system crashes
    • After some metadata are marked as committed
    • Before the data they point to are written to disk
Writeback mode issues
[Figure: a committed i-node points to block B. Annotation: due to a crash, block B was never written to disk.]
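The crash window can be illustrated with a toy model. This is a minimal sketch, not how any real file system is implemented: each mode is reduced to a hypothetical ordered list of disk operations for one update that appends a new block, and the writeback list shows only one possible (worst-case) ordering, since that mode enforces none. The step names and the function `state_after_crash` are made up for illustration.

```python
# Toy model of one file update under each journaling mode.
# Assumption: each mode is an ordered list of disk operations; the
# writeback ordering is just one possible worst case, because that
# mode does not order data writes relative to the metadata commit.
STEPS = {
    "writeback": ["commit_metadata", "write_data"],
    "ordered": ["write_data", "commit_metadata"],
    "data": ["journal_data", "commit_metadata", "write_data"],
}


def state_after_crash(mode, completed):
    """Describe the state if the system crashes after `completed` steps."""
    done = set(STEPS[mode][:completed])
    if "commit_metadata" not in done:
        # Nothing was committed: the update is lost but nothing dangles.
        return "update lost, metadata still consistent"
    if "write_data" in done or "journal_data" in done:
        # The data are on disk, or can be replayed from the journal.
        return "update intact"
    return "committed i-node points to an unwritten block"


for mode, steps in STEPS.items():
    for completed in range(len(steps) + 1):
        print(mode, completed, state_after_crash(mode, completed))
```

In this model only writeback mode can reach the state shown in the figure; ordered and data modes either lose the update cleanly or keep it intact.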
Data mode issues
• Most reliable
• Slowest:
  – All data must be written twice (once to the journal, once to their final location)
JFS2
• Supports
  – Ordered journaling
  – Extent-based allocation:
    • Allocates contiguous sets of blocks
      – Better read and write performance
      – Metadata are updated per extent, not per block
JFS2
• Uses B+ trees for
  – Fast directory lookups
  – Managing extent descriptors
• Has no internal journal commit policy
  – Relies on the timeouts of the kupdate daemon
    • Daemon that periodically writes modified buffers to disk
      – Typically every five seconds
XFS
• Supports full 64-bit addressing
• Uses B+ trees for both directories and file allocation
• Uses extent-based allocation with variable block size support (512 B to 64 KB)
• Uses delayed allocation for extents
  – Extent is not allocated until blocks are ready to be written to disk
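Why delayed allocation helps can be seen in a small simulation. This is a rough sketch under simplifying assumptions, not XFS code: the "disk" is a hypothetical bump allocator that always hands out the next free blocks, and two files receive interleaved writes. All names (`Disk`, `immediate`, `delayed`) are made up for illustration.

```python
class Disk:
    """Hypothetical allocator that hands out the next free blocks in order."""

    def __init__(self):
        self.next_free = 0

    def allocate(self, nblocks):
        start = self.next_free
        self.next_free += nblocks
        return (start, nblocks)  # an extent: (first block, length)


def add_extent(extents, extent):
    """Append an extent, merging it with the previous one if contiguous."""
    if extents and extents[-1][0] + extents[-1][1] == extent[0]:
        extents[-1] = (extents[-1][0], extents[-1][1] + extent[1])
    else:
        extents.append(extent)


def immediate(writes):
    """Allocate blocks as soon as each write arrives."""
    disk, files = Disk(), {}
    for name, nblocks in writes:
        add_extent(files.setdefault(name, []), disk.allocate(nblocks))
    return files


def delayed(writes):
    """Buffer writes and allocate one extent per file at flush time."""
    disk, pending = Disk(), {}
    for name, nblocks in writes:
        pending[name] = pending.get(name, 0) + nblocks
    return {name: [disk.allocate(n)] for name, n in pending.items()}


writes = [("a", 2), ("b", 2), ("a", 2), ("b", 2)]
print(immediate(writes))  # each file ends up split into two extents
print(delayed(writes))    # each file gets a single contiguous extent
```

With immediate allocation the interleaved writes fragment both files; deferring the decision until flush time lets the allocator see each file's total size first.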
Extent-based allocation
• When a process creates a file, the file system allocates a set of contiguous physical blocks to the file
  – Improves access times for large files
  – Reduces file fragmentation
• Large files can occupy multiple extents
  – ext4 extents can go up to 128 MB with a 4 KB block size
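The 128 MB figure can be checked with a little arithmetic. The sketch below assumes that an ext4 extent can cover at most 2^15 blocks (15 usable bits in its length field); that limit is not stated on the slide and the helper names are hypothetical.

```python
# Assumption: an extent covers at most 2**15 = 32768 blocks.
MAX_BLOCKS_PER_EXTENT = 2 ** 15


def max_extent_bytes(block_size):
    """Largest number of bytes a single extent can cover."""
    return MAX_BLOCKS_PER_EXTENT * block_size


def min_extents(file_size, block_size):
    """Fewest extents needed for a file of `file_size` bytes (ceiling division)."""
    return -(-file_size // max_extent_bytes(block_size))


print(max_extent_bytes(4096) // 2 ** 20, "MB per extent")
print(min_extents(2 ** 30, 4096), "extents for a 1 GB file")
```

Under that assumption a 1 GB file needs at least eight extents, which is why large files occupy several of them.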
Ext3fs
• Compatible with non-journaling ext2 FS
• Supports three journaling modes
  – Writeback
  – Ordered
  – Journal (data)
• Does not support extents
  – Not as fast as JFS, XFS and Reiser FS
A parenthesis: Ext2
• Essentially analogous to the UNIX fast file system we have discussed
  – Fifteen block addresses per i-node
  – Cylinder groups are called block groups
• Major differences include
  – Larger maximum file size: 16 GB to 2 TB
  – Various extensions
    • Online compression, full ACLs, …
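The 16 GB to 2 TB range follows from the fifteen block addresses. The sketch below is a back-of-the-envelope calculation under stated assumptions, none of which appear on the slide: twelve direct addresses plus one single, one double and one triple indirect block, 4-byte block addresses, and a 32-bit per-file counter of 512-byte sectors that caps the file size at 2 TB.

```python
# Assumption: a 32-bit count of 512-byte sectors caps file size at 2 TB.
SECTOR_COUNT_CAP = 2 ** 32 * 512


def ext2_max_file_bytes(block_size, direct=12, address_size=4):
    """Largest file reachable through 12 direct + 3 indirect addresses."""
    per_block = block_size // address_size  # addresses held by one block
    blocks = direct + per_block + per_block ** 2 + per_block ** 3
    return min(blocks * block_size, SECTOR_COUNT_CAP)


for block_size in (1024, 2048, 4096):
    size = ext2_max_file_bytes(block_size)
    print(block_size, "byte blocks:", round(size / 2 ** 30, 1), "GB")
```

With 1 KB blocks the indirect blocks are the limit (about 16 GB); with 4 KB blocks the tree could address more, so the sector counter becomes the limit.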
ReiserFS
• Introduced in 2001
• Now dead
• Default mode is ordered
• Includes tail packing
  – Uses empty space at the end of large files
  – Reduces internal fragmentation
Tail packing
• Also known as tail merging
• Tail here refers to the last block of a file
  – Rarely full
• Tail packing stores in the same block
  – Tails of several files
  – Very small files
• Reduces internal fragmentation
• Adds complexity
Without tail packing
[Figure: files A, B and C stored without tail packing. Annotation: too much wasted space.]
With tail packing
[Figure: files A, B and C stored with tail packing. Annotations: shares last block of file A; now occupies a single block.]
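The saving can be estimated by counting blocks. The sketch below is an illustration only, not ReiserFS's actual algorithm: it packs tails into shared blocks with a simple first-fit rule, never splits a tail across blocks, and uses made-up file sizes.

```python
def blocks_without_packing(sizes, block_size):
    """Every file rounds up to a whole number of blocks."""
    return sum(-(-size // block_size) for size in sizes)


def blocks_with_packing(sizes, block_size):
    """Full blocks stay per file; tails share blocks (first-fit, never split)."""
    full_blocks = sum(size // block_size for size in sizes)
    free = []  # free bytes left in each shared tail block
    for tail in (size % block_size for size in sizes):
        if tail == 0:
            continue
        for i, room in enumerate(free):
            if tail <= room:
                free[i] -= tail
                break
        else:
            free.append(block_size - tail)  # open a new shared block
    return full_blocks + len(free)


sizes = [9000, 6000, 300]  # hypothetical sizes in bytes for files A, B and C
print(blocks_without_packing(sizes, 4096))
print(blocks_with_packing(sizes, 4096))
```

For these sizes the three tails fit into one shared block, so six blocks shrink to four. The extra bookkeeping in the packing function hints at the added complexity.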
Reiser 4
• Was designed from scratch
• Was to use
  – Wandering logs
  – Delayed allocation of extents
    • As in XFS
Ext4fs (I)
• Evolution from ext3fs
  – Can mount an ext4fs partition as ext3fs or an ext3fs partition as ext4fs
• 64-bit file system
  – 48-bit block addresses
• Can support very large volumes
  – One exabyte, that is, 2^30 gigabytes!
  – Very large files (16 terabytes)
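These limits can be reproduced from the address width. The sketch below assumes a 4 KB block size and, for the file size limit, 32-bit block numbers within a file; neither assumption is stated on the slide, and the function names are hypothetical.

```python
BLOCK_SIZE = 4096           # assumption: 4 KB blocks
BLOCK_ADDRESS_BITS = 48     # 48-bit block addresses
FILE_BLOCK_INDEX_BITS = 32  # assumption: 32-bit block numbers within a file


def max_volume_bytes(block_size=BLOCK_SIZE, bits=BLOCK_ADDRESS_BITS):
    """Bytes addressable with `bits`-bit block addresses."""
    return 2 ** bits * block_size


def max_ext4_file_bytes(block_size=BLOCK_SIZE, bits=FILE_BLOCK_INDEX_BITS):
    """Bytes addressable with `bits`-bit per-file block numbers."""
    return 2 ** bits * block_size


print(max_volume_bytes() == 2 ** 60)              # one exabyte
print(max_volume_bytes() // 2 ** 30 == 2 ** 30)   # that is, 2^30 gigabytes
print(max_ext4_file_bytes() // 2 ** 40, "terabytes per file")
```

Under these assumptions, 2^48 blocks of 2^12 bytes give 2^60 bytes, and 2^32 blocks per file give 16 terabytes.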
Ext4fs (II)
• Can support extents
  – Then becomes incompatible with ext3fs
• Uses delayed extent allocation
  – Reduces file fragmentation
    • Especially when a file grows
• Checksums contents of journal
  – More reliable
Ext4fs (III)
• Uses H-trees instead of B+ or B* trees for indexes
• Includes an online defragmenting tool
  – e4defrag
  – Can defragment individual files or entire file systems
• Minimum timestamp resolution is one ns
Conclusions
• Journaling file systems
  – Protect data against computer crashes and power failures
  – Allow faster file system recovery after a crash
    • No need to fsck the whole file system
  – Have become the new standard