Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift

Why panic()? Improving Reliability through Restartable File Systems



Page 2: Data Availability

• Applications require data
  • Use FS to reliably store data
• Both hardware and software can fail
• Typical solution
  • Large clusters for availability
  • Reliability through replication

[Figure labels: GFS Master (x2), Slave Nodes (x2)]

Page 3

• Replication is infeasible in desktop environments
• Wouldn't RAID work?
  • It can only tolerate hardware failures
• FS crashes are more severe
  • Services and applications are killed
  • An OS reboot and recovery are required
• Need: better reliability in the event of file system failures

[Figure labels: App (x3), OS, FS, RAID Controller, Disks]

Page 4

• Motivation
• Background
• Restartable file systems
• Advantages and limitations
• Conclusions


Page 6

File systems already detect failures

ReiserFS:

    int journal_mark_dirty(….)
    {
        struct reiserfs_journal_cnode *cn = NULL;

        if (!cn) {
            cn = get_cnode(p_s_sb);
            if (!cn) {
                reiserfs_panic(p_s_sb, "get_cnode failed!\n");
            }
        }
    }

    void reiserfs_panic(struct super_block *sb, ...)
    {
        BUG();
        /* this is not actually called, but makes reiserfs_panic() "noreturn" */
        panic("REISERFS: panic %s\n", error_buf);
    }

Recovery: simplified by generic recovery mechanism

Page 7

1. Code to recover from all failures
  • Not feasible in reality
2. Restart on failure
  • Previous work has taken this approach

Previous systems, by recovery cost and state handling:

              Heavyweight                           Lightweight
  Stateless   Nooks/Shadow, Xen, Minix, L4, Nexus   SafeDrive, Singularity
  Stateful    CuriOS, EROS

FS need: stateful & lightweight recovery

Page 8

Goal: build a lightweight & stateful solution to tolerate file-system failures

Solution: a single generic recovery mechanism for any file system failure

1. Detect failures through assertions
2. Clean up resources used by the file system
3. Restore file-system state before crash
4. Continue to service new file system requests

FS failures: completely transparent to applications
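
The idea of detecting a failure through an assertion and then restarting, rather than calling panic(), can be illustrated in user space. The following is a minimal sketch, assuming a toy "file system" and using the standard setjmp/longjmp pair as a stand-in for the recovery path. Every name in it (fs_fail_stop, FS_CHECK, toy_get_cnode, toy_mark_dirty, run_with_restart) is hypothetical and is not taken from the talk or from the kernel.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical user-space analogy; none of these names exist in the kernel. */
jmp_buf fs_recovery_point;   /* where control resumes after a detected failure */
int fs_restart_count;        /* number of simulated file-system restarts */
int alloc_broken = 1;        /* injected fault: the allocator fails until a restart */

/* Stand-in for panic(): jump to the recovery point instead of halting. */
void fs_fail_stop(const char *reason)
{
    (void)reason;            /* a real system would log the reason */
    longjmp(fs_recovery_point, 1);
}

/* Assertion-style check that turns a detected fault into a restart. */
#define FS_CHECK(cond, reason) \
    do { if (!(cond)) fs_fail_stop(reason); } while (0)

/* Toy allocator that mimics a failing get_cnode(). */
void *toy_get_cnode(void)
{
    static int node;
    return alloc_broken ? NULL : (void *)&node;
}

/* Toy operation following the journal_mark_dirty() pattern. */
int toy_mark_dirty(void)
{
    void *cn = toy_get_cnode();
    FS_CHECK(cn != NULL, "get_cnode failed!");
    return 0;
}

/* Wrap the operation: on failure, "restart" the file system and re-execute. */
int run_with_restart(void)
{
    if (setjmp(fs_recovery_point) != 0) {
        /* Recovery path: clean up, restore state, then continue. */
        fs_restart_count++;
        alloc_broken = 0;    /* assume the restart clears the transient fault */
    }
    assert(fs_restart_count <= 1);   /* this toy allows at most one restart */
    return toy_mark_dirty();
}
```

Calling run_with_restart() once hits the injected fault, takes the recovery path, and then completes the operation, so the caller only ever sees a successful return. The sketch covers detection and re-execution only; cleaning up resources and restoring state are reduced to a single flag here.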

Page 9

• Transparency
  • Multiple applications using FS upon crash
  • Intertwined execution
• Fault-tolerance
  • Handle a gamut of failures
  • Transform to fail-stop failures
• Consistency
  • OS and FS could be left in an inconsistent state

Page 10

• FS consistency is required to prevent data loss
  • Not all file systems support crash consistency
  • FS state is constantly modified by applications
• Periodically checkpoint FS state
  • Mark dirty blocks as copy-on-write
  • Ensure each checkpoint is written atomically
• On crash: revert to the last checkpoint
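
The checkpoint-and-revert idea can be sketched with an in-memory block array. This is a simplified illustration under stated assumptions, not the mechanism from the talk: it preserves a block's checkpointed contents the first time the block is written in an epoch (a pre-image form of copy-on-write), and the names toy_fs, toy_write, toy_checkpoint, and toy_revert are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 8
#define BLKSZ   16

/* Hypothetical in-memory "file system" made of fixed-size blocks. */
struct toy_fs {
    char live[NBLOCKS][BLKSZ];   /* current state, modified by applications */
    char saved[NBLOCKS][BLKSZ];  /* checkpointed contents of blocks touched this epoch */
    int  copied[NBLOCKS];        /* 1 if the block already has a saved copy this epoch */
    int  epoch;                  /* number of checkpoints taken so far */
};

void toy_init(struct toy_fs *fs)
{
    memset(fs, 0, sizeof(*fs));
}

/* Write with copy-on-write: keep the checkpointed contents on first touch. */
void toy_write(struct toy_fs *fs, int b, const char *data)
{
    assert(b >= 0 && b < NBLOCKS);
    if (!fs->copied[b]) {
        memcpy(fs->saved[b], fs->live[b], BLKSZ);
        fs->copied[b] = 1;
    }
    strncpy(fs->live[b], data, BLKSZ - 1);
    fs->live[b][BLKSZ - 1] = '\0';
}

/* Checkpoint: the live state becomes the recovery point; a new epoch begins. */
void toy_checkpoint(struct toy_fs *fs)
{
    memset(fs->copied, 0, sizeof(fs->copied));
    fs->epoch++;
}

/* Crash: revert every block touched in this epoch to its checkpointed contents. */
void toy_revert(struct toy_fs *fs)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (fs->copied[b]) {
            memcpy(fs->live[b], fs->saved[b], BLKSZ);
            fs->copied[b] = 0;
        }
    }
}
```

For example, writing "A" to block 0, checkpointing, then writing "B" to block 0 and "C" to block 1 before calling toy_revert() leaves block 0 holding "A" and block 1 empty. Writing the checkpoint to disk atomically is outside the scope of this sketch.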

Page 11

1. Periodically create checkpoints
2. File system crash
3. Unwind in-flight processes
4. Move to recent checkpoint
5. Replay completed operations
6. Re-execute unwound process

[Figure: timeline across the Application, VFS, and File System layers, split by a checkpoint into Epoch 0 and Epoch 1. Operations shown: open("file"), write(), read(), write(), write(), close(). Legend: Completed, In-progress, Crash. The numbers 1 to 6 mark where each step occurs.]
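
The six recovery steps can be walked through with a toy model. The sketch below assumes a single in-memory file whose only operation is a one-byte append, and an operation log kept outside the failing component. All names (toy_fs, toy_begin_write, toy_finish_write, toy_recover) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAXLOG  16
#define MAXDATA 64

enum op_status { OP_IN_PROGRESS, OP_COMPLETED };

/* One logged operation: a one-byte append standing in for write(). */
struct op {
    char byte;
    enum op_status status;
};

struct toy_fs {
    char data[MAXDATA];      /* live file contents (lost or corrupted on a crash) */
    char ckpt[MAXDATA];      /* contents at the last checkpoint */
    struct op log[MAXLOG];   /* operations issued since the last checkpoint */
    int  nlog;
};

static void apply(struct toy_fs *fs, char byte)
{
    size_t n = strlen(fs->data);
    assert(n + 1 < MAXDATA);
    fs->data[n] = byte;
    fs->data[n + 1] = '\0';
}

void toy_init(struct toy_fs *fs)
{
    memset(fs, 0, sizeof(*fs));
}

/* Step 1: checkpoint the state and throw away the log. */
void toy_checkpoint(struct toy_fs *fs)
{
    memcpy(fs->ckpt, fs->data, MAXDATA);
    fs->nlog = 0;
}

/* Log the request before it reaches the file system; returns its log index. */
int toy_begin_write(struct toy_fs *fs, char byte)
{
    assert(fs->nlog < MAXLOG);
    fs->log[fs->nlog].byte = byte;
    fs->log[fs->nlog].status = OP_IN_PROGRESS;
    return fs->nlog++;
}

/* The operation finished inside the file system. */
void toy_finish_write(struct toy_fs *fs, int idx)
{
    apply(fs, fs->log[idx].byte);
    fs->log[idx].status = OP_COMPLETED;
}

/* Steps 4 to 6: roll back, replay completed operations, re-execute in-flight ones. */
void toy_recover(struct toy_fs *fs)
{
    memcpy(fs->data, fs->ckpt, MAXDATA);            /* step 4 */
    for (int i = 0; i < fs->nlog; i++) {            /* step 5 */
        if (fs->log[i].status == OP_COMPLETED)
            apply(fs, fs->log[i].byte);
    }
    for (int i = 0; i < fs->nlog; i++) {            /* step 6 */
        if (fs->log[i].status == OP_IN_PROGRESS) {
            apply(fs, fs->log[i].byte);
            fs->log[i].status = OP_COMPLETED;
        }
    }
}
```

In this model, an append of 'a' followed by a checkpoint, a completed append of 'b', and an in-flight append of 'c' at the time of the crash recovers to "abc": the checkpoint supplies "a", replay supplies "b", and re-execution supplies "c". Unwinding the in-flight process (step 3) is represented only by the OP_IN_PROGRESS status.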

Page 12

• File systems are constantly modified
  • Hard to identify a consistent recovery point
• Naïve solution: prevent any new FS operation and call sync
  • Inefficient, with unacceptable overhead

Page 13

• All requests go through the VFS layer
• File systems write to disk through the page cache
• Control requests to the FS and dirty pages to disk

[Figure labels: App (x3), VFS, File System (ext3, VFAT), Page Cache, Disk]

Page 14

[Figure: App, VFS, File System, Page Cache, and Disk stacks, labeled "Regular" and "Membrane"; two STOP markers appear in the figure]

Page 15

• Have a built-in crash-consistency mechanism
  • Journaling or snapshotting
• Seamlessly integrate with these mechanisms
  • Need FSes to indicate the beginning and end of a transaction
  • Works for data and ordered journaling modes
  • Need to combine writeback mode with COW

Page 16

• Log operations at the VFS level
  • Need not modify existing file systems
• Operations: open, close, read, write, symlink, unlink, seek, etc.
  • Read:
• Logs are thrown away after each checkpoint
• What about logging writes?

Page 17

• Mainly used for replaying writes
• Goal: reduce the overhead of logging writes
• Solution: grab data from the page cache during recovery

[Figure: three VFS / File System / Page Cache stacks showing a Write (fd, buf, offset, count) request]
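
Logging a write without copying its data can be sketched as follows. This is a minimal illustration, assuming a single fixed-size file, a page cache that survives the file-system restart, and a checkpoint state of all zeros. The names toy_state, write_rec, toy_write, toy_crash, and toy_replay are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define FILESZ 32u
#define MAXLOG 8

/* A logged write records only where the data went, not the data itself. */
struct write_rec {
    size_t offset;
    size_t count;
};

struct toy_state {
    char page_cache[FILESZ];       /* assumed to survive the file-system restart */
    char fs_view[FILESZ];          /* file-system state, lost on a crash */
    struct write_rec log[MAXLOG];  /* writes since the last checkpoint */
    int  nlog;
};

/* Perform a write and log its (offset, count); returns 0 on success, -1 on error. */
int toy_write(struct toy_state *s, const char *buf, size_t offset, size_t count)
{
    if (offset > FILESZ || count > FILESZ - offset || s->nlog == MAXLOG)
        return -1;
    memcpy(s->page_cache + offset, buf, count);
    memcpy(s->fs_view + offset, buf, count);
    s->log[s->nlog].offset = offset;
    s->log[s->nlog].count = count;
    s->nlog++;
    return 0;
}

/* Simulated crash: the file system falls back to its (empty) checkpoint state. */
void toy_crash(struct toy_state *s)
{
    memset(s->fs_view, 0, FILESZ);
}

/* Replay: re-apply each logged write, taking the bytes from the page cache. */
void toy_replay(struct toy_state *s)
{
    for (int i = 0; i < s->nlog; i++) {
        const struct write_rec *r = &s->log[i];
        assert(r->offset + r->count <= FILESZ);
        memcpy(s->fs_view + r->offset, s->page_cache + r->offset, r->count);
    }
}
```

Because the page cache always holds the latest bytes for a range, replaying an older overlapping record copies the newest data, and the final state still matches what the application last wrote. The log entry stays small regardless of how much data each write carried, which is the overhead reduction the slide is after.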


Page 20

Setup


Page 23

Restart ext2 during random-read microbenchmark

Page 24

  Data (MB)   Recovery time (ms)
  10          12.9
  20          13.2
  40          16.1

Page 25

• Improves tolerance to file system failures
  • Build trust in new file systems (e.g., ext4, btrfs)
• Quick-fix bug patching
  • Developers transform corruptions into restarts
  • Restart instead of extensive code restructuring
• Encourages more integrity checks in FS code
  • Assertions could be seamlessly transformed into restarts
  • File systems become more robust to failures and crashes

Page 26

• Only tolerates fail-stop failures
• Not address-space based
  • Faults could corrupt other kernel components
• FS restart may be visible to applications
  • e.g., inode numbers could change after a restart

[Figure: timelines across the Application, VFS, and File System layers for Epoch 0, before the crash and after crash recovery. Both run create("file1"), stat("file1"), write("file1", 4k). file1 is reported with inode# 15 in one timeline and inode# 12 in the other: an inode# mismatch.]

Page 27

• Failures are inevitable in file systems
  • Learn to cope and not hope to avoid them
• Generic recovery mechanism for FS failures
  • Improves FS reliability and availability of data
  • Users: install new FSes with confidence
  • Developers: ship FSes faster, as not all exception cases are now show-stoppers

Page 28

Questions and Comments

Advanced Systems Lab (ADSL)
University of Wisconsin-Madison

http://www.cs.wisc.edu/adsl