ANATOMY OF LINUX JOURNALING FILE SYSTEMS

M. Tim Jones, Emulex

Overview

• Paper surveys past and current Linux journaling file systems

• Presents three modes of operation
  – Writeback mode
  – Ordered mode
  – Data mode

• Discusses tail packing

History

• IBM's JFS
  – First released in 1990
  – Updated since then (JFS2)

• Silicon Graphics' XFS
  – Released in 1994
  – Ported to Linux in 2001

History

• Smart FS
  – Developed for the Amiga
  – Supported by Linux until 2005

• ext3fs
  – Most commonly used
  – Extension of ext2 with journaling
  – Supported by Linux since 2001

History

• Reiser File System
  – Introduced many new features
  – Author is now serving a sentence of 15 years to life for second-degree homicide
    • Killed his estranged wife
    • Plea-bargained his first-degree homicide conviction down to second degree

Variations on Journaling

• Writeback mode:
  – Only journals metadata
  – Makes no guarantee that data updates will be written to disk before associated metadata are marked as committed
• Ordered mode:
  – Only journals metadata
  – Makes that guarantee: data updates are written to disk before associated metadata are marked as committed
• Data mode:
  – Journals metadata and data updates
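The three modes differ in what goes into the journal and in when the data reach the disk relative to the commit. A minimal Python sketch, assuming one file update split into named steps; the step names and the `plan_writes` and `data_safe_at_commit` helpers are hypothetical, and the writeback ordering shown is only one possible worst case:

```python
from enum import Enum


class Mode(Enum):
    WRITEBACK = "writeback"
    ORDERED = "ordered"
    DATA = "data"


def plan_writes(mode):
    """Return an illustrative sequence of writes for one file update."""
    if mode is Mode.WRITEBACK:
        # Metadata only in the journal; the data write is unordered and may
        # land after the commit record (worst case shown here).
        return ["journal:metadata", "journal:commit", "disk:data", "disk:metadata"]
    if mode is Mode.ORDERED:
        # Metadata only in the journal, but data are forced out first.
        return ["disk:data", "journal:metadata", "journal:commit", "disk:metadata"]
    if mode is Mode.DATA:
        # Both data and metadata go through the journal.
        return ["journal:data", "journal:metadata", "journal:commit",
                "disk:data", "disk:metadata"]
    raise ValueError(f"unknown mode: {mode}")


def data_safe_at_commit(steps):
    """True if the data are on stable storage before the commit record."""
    commit = steps.index("journal:commit")
    return any(s in ("disk:data", "journal:data") for s in steps[:commit])
```

In this sketch only the writeback plan can reach the commit record with the data still in memory.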

Writeback mode issues

• Metadata can be marked as committed before the data they point to are written to disk
  – File system can be corrupted if the system crashes
    • After some metadata are marked as committed
    • Before the data they point to are written to disk

Writeback mode issues

[Figure: a committed i-node points to block B. Due to a crash, block B was never written to disk.]
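The crash window can be made concrete by trying a crash after every step of an update. The following is a rough sketch, not a model of any real file system: the step names and the `crash_outcome` and `unsafe_crash_points` helpers are made up for illustration, and the writeback sequence is an assumed worst-case ordering.

```python
def crash_outcome(steps, crash_after):
    """Classify the on-disk state if the system crashes after `crash_after` steps."""
    done = steps[:crash_after]
    committed = "commit metadata" in done
    data_on_disk = "write block B" in done
    if committed and not data_on_disk:
        # The committed i-node points to a block B that was never written.
        return "inconsistent"
    return "consistent"


def unsafe_crash_points(steps):
    """Return every crash point that leaves a committed i-node without its data."""
    return [n for n in range(len(steps) + 1)
            if crash_outcome(steps, n) == "inconsistent"]


# Assumed orderings for one update of block B.
WRITEBACK_WORST_CASE = ["journal metadata", "commit metadata", "write block B"]
ORDERED = ["write block B", "journal metadata", "commit metadata"]
```

Under these assumptions the writeback sequence has exactly one unsafe crash point (after the commit, before block B is written), while the ordered sequence has none.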

Data mode issues

• Most reliable
• Slowest:
  – All data must be written twice (once to the journal, once to their final location)
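A back-of-the-envelope estimate of the extra write volume, as a sketch only: the `bytes_written` helper is hypothetical, and it ignores commit records, journal block headers and any batching a real journal would do.

```python
def bytes_written(data_bytes, metadata_bytes, mode):
    """Estimate total bytes sent to disk for one update (simplified model)."""
    if mode not in ("writeback", "ordered", "data"):
        raise ValueError(f"unknown mode: {mode}")
    # Metadata are always journaled; data only in data mode.
    journal = metadata_bytes + (data_bytes if mode == "data" else 0)
    # Everything is eventually written to its final location as well.
    in_place = data_bytes + metadata_bytes
    return journal + in_place
```

With 1,000,000 bytes of data and 4,096 bytes of metadata, this model gives 1,008,192 bytes in ordered mode and 2,008,192 bytes in data mode, roughly double.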

JFS2

• Supports
  – Ordered journaling
  – Extent-based allocation:
    • Allocates contiguous sets of blocks
      – Better read and write performance
      – Metadata are only updated for the extent

JFS2

• Uses B+ trees for
  – Fast directory lookups
  – Managing extent descriptors
• Has no internal journal commit policy
  – Relies on timeouts of kupdate daemon
    • Daemon that periodically writes modified buffers to disk
      – Typically every five seconds

XFS

• Supports full 64-bit addressing
• Uses B+ trees for both directories and file allocation
• Uses extent-based allocation with variable block size support (512 B to 64 KB)
• Uses delayed allocation for extents
  – Extent is not allocated until blocks are ready to be written on disk
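Delayed allocation can be sketched as buffering writes in memory and asking the allocator for one contiguous run only at flush time. This is a toy model, not XFS code: `make_bump_allocator` and `DelayedAllocFile` are hypothetical names, and the allocator simply hands out the next free block numbers.

```python
def make_bump_allocator(first_free=100):
    """Return a toy allocator that hands out consecutive block numbers."""
    next_free = first_free

    def allocate(nblocks):
        nonlocal next_free
        start = next_free
        next_free += nblocks
        return start

    return allocate


class DelayedAllocFile:
    """File whose blocks are allocated only when they are flushed."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.pending = 0      # blocks buffered in memory, no disk space yet
        self.extents = []     # list of (start_block, length)

    def write(self, nblocks):
        self.pending += nblocks

    def flush(self):
        if self.pending:
            # One allocation covers the whole buffered run.
            start = self.allocator(self.pending)
            self.extents.append((start, self.pending))
            self.pending = 0
```

If two such files take turns writing one block at a time and are flushed afterwards, each ends up with a single contiguous extent. Allocating at every write would instead interleave their blocks on disk.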

Extent-based allocation

• When a process creates a file, the file system allocates a set of contiguous physical blocks to the file
  – Improves access times for large files
  – Reduces file fragmentation
• Large files can occupy multiple extents
  – ext4 extents can go up to 128 MB with a 4 KB block size
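An extent describes a run of contiguous blocks with a start and a length, so mapping a logical block to a physical one is a short search. A minimal sketch, assuming extents are kept in a list sorted by logical start; the `Extent` tuple and `lookup` helper are illustrative names, not an on-disk format. The last lines check the 128 MB figure, assuming at most 2^15 blocks per extent.

```python
from bisect import bisect_right
from typing import NamedTuple


class Extent(NamedTuple):
    logical_start: int    # first logical block of the file covered
    physical_start: int   # first physical block on disk
    length: int           # number of contiguous blocks


def lookup(extents, logical_block):
    """Map a logical block to a physical block, or None for a hole."""
    starts = [e.logical_start for e in extents]
    i = bisect_right(starts, logical_block) - 1
    if i >= 0:
        e = extents[i]
        if logical_block < e.logical_start + e.length:
            return e.physical_start + (logical_block - e.logical_start)
    return None


BLOCK_SIZE = 4096              # 4 KB blocks
MAX_EXTENT_BLOCKS = 2 ** 15    # assumed per-extent limit
MAX_EXTENT_BYTES = BLOCK_SIZE * MAX_EXTENT_BLOCKS   # 2**27 bytes = 128 MB
```

For example, with `[Extent(0, 1000, 8), Extent(8, 5000, 4)]`, logical block 9 maps to physical block 5001.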

Ext3fs

• Compatible with non-journaling ext2 FS
• Supports three journaling modes
  – Writeback
  – Ordered
  – Journal (data)
• Does not support extents
  – Not as fast as JFS, XFS and Reiser FS

A parenthesis: Ext2

• Essentially analogous to the UNIX fast file system we have discussed
  – Fifteen block addresses per i-node
  – Cylinder groups are called block groups
• Major differences include
  – Larger maximum file size: 16 GB to 2 TB
  – Various extensions
    • Online compression, full ACLs, …

ReiserFS

• Introduced in 2001
• Now dead
• Default mode is ordered
• Includes tail packing
  – Uses empty space at the end of large files
  – Reduces internal fragmentation

Tail packing

• Also known as tail merging
• Tail here refers to the last block of a file
  – Rarely full
• Tail packing stores in the same block
  – Tails of several files
  – Very small files
• Reduces internal fragmentation
• Adds complexity

Without tail packing

[Figure: File A, File B and File C each stored in their own blocks. Label: "Too much wasted space".]

With tail packing

[Figure: File A, File B and File C with tail packing. Labels: "Shares last block of file A" (File C); "Now occupies a single block".]
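The space saving can be estimated by counting blocks with and without shared tails. This is only a sketch: it packs tails with a simple first-fit-decreasing heuristic chosen for illustration, which is not how ReiserFS lays out tails, and the helper names are hypothetical.

```python
import math

BLOCK_SIZE = 4096  # assumed block size in bytes


def blocks_without_packing(sizes, block_size=BLOCK_SIZE):
    """Each file rounds up to a whole number of blocks."""
    return sum(math.ceil(s / block_size) for s in sizes)


def blocks_with_packing(sizes, block_size=BLOCK_SIZE):
    """Full blocks stay as they are; partial last blocks share space."""
    full = sum(s // block_size for s in sizes)
    tails = sorted((s % block_size for s in sizes if s % block_size),
                   reverse=True)
    free_space = []  # remaining bytes in each shared tail block
    for tail in tails:
        for i, free in enumerate(free_space):
            if tail <= free:
                free_space[i] -= tail
                break
        else:
            free_space.append(block_size - tail)
    return full + len(free_space)
```

For files of 5,000, 300 and 700 bytes with 4 KB blocks, this gives 4 blocks without packing and 2 blocks with packing.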

Reiser 4

• Was designed from scratch
• Was to use
  – Wandering logs
  – Delayed allocation of extents
    • As in XFS

Ext4fs (I)

• Evolution from ext3fs
  – Can mount an ext4fs partition as ext3fs or an ext3fs partition as ext4fs
• 64-bit file system
  – 48-bit block addresses
• Can support very large volumes
  – One exabyte, that is, 2^30 gigabytes!
  – Very large files (16 terabytes)
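The size limits follow from the address widths. A short arithmetic check, assuming 4 KB blocks and binary units (1 GB = 2^30 bytes); the 32-bit logical block number used for the file-size limit is an assumption not stated on the slide.

```python
BLOCK_SIZE = 4 * 1024                  # 4 KB blocks (assumed)

# Volume size: 48-bit block addresses.
MAX_VOLUME_BYTES = 2 ** 48 * BLOCK_SIZE
MAX_VOLUME_GB = MAX_VOLUME_BYTES // 2 ** 30

# File size: assuming 32-bit logical block numbers within a file.
MAX_FILE_BYTES = 2 ** 32 * BLOCK_SIZE
MAX_FILE_TB = MAX_FILE_BYTES // 2 ** 40
```

Here `MAX_VOLUME_BYTES` is 2^60 bytes (one exabyte in binary units), `MAX_VOLUME_GB` is 2^30, and `MAX_FILE_TB` is 16.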

Ext4fs (II)

• Can support extents
  – Becomes then incompatible with ext3fs
• Uses delayed extent allocation
  – Reduces file fragmentation
    • Especially when file grows
• Checksums contents of journal
  – More reliable
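Journal checksumming lets recovery detect a transaction that was only partly written or was damaged. A minimal sketch using CRC-32 from Python's `zlib`; the choice of CRC-32 and the `transaction_checksum` and `transaction_is_valid` helpers are illustrative assumptions, not the ext4 on-disk format.

```python
import zlib


def transaction_checksum(blocks):
    """Running CRC-32 over all blocks of a journal transaction."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def transaction_is_valid(blocks, stored_checksum):
    """Replay a transaction only if its blocks match the stored checksum."""
    return transaction_checksum(blocks) == stored_checksum
```

At commit time the checksum would be stored with the commit record. At recovery time a mismatch means the transaction should be skipped instead of replayed.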

Ext4fs (III)

• Uses H-trees instead of B+ or B* trees for indexes
• Includes an online defragmenting tool
  – e4defrag
  – Can defragment individual files or entire file systems
• Timestamp resolution is one nanosecond

Conclusions

• Journaling file systems
  – Protect data against computer crashes and power failures
  – Allow faster file system recovery after a crash
    • No need to fsck the whole file system
  – Have become the new standard
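Recovery is fast because only the journal is scanned, not the whole file system. A simplified sketch of journal replay, assuming a journal made of `("write", block, value)` and `("commit",)` records; the record layout and the `replay` helper are hypothetical.

```python
def replay(journal, disk):
    """Apply committed transactions to `disk`; drop an uncommitted tail."""
    pending = {}
    for record in journal:
        kind = record[0]
        if kind == "write":
            _, block, value = record
            pending[block] = value
        elif kind == "commit":
            # The transaction is complete: make its writes permanent.
            disk.update(pending)
            pending = {}
    # Writes still in `pending` were never committed and are ignored.
    return disk
```

For instance, replaying `[("write", 1, "a"), ("commit",), ("write", 2, "b")]` onto an empty disk applies only block 1. The work done is proportional to the journal length, whatever the size of the file system.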
