Page 1: Diskless Checkpointing

Diskless Checkpointing

15 Nov 2001

Page 2: Diskless Checkpointing

Motivation

Checkpointing to stable storage: disk access is a major bottleneck!

Ways to reduce this cost: incremental checkpointing, copy-on-write, compression, memory exclusion, diskless checkpointing

Page 3: Diskless Checkpointing

Diskless?

Extra memory is available (e.g., in a network of workstations, NOW), so use memory instead of disk

Good: Network Bandwidth > Disk Bandwidth

Bad: Memory is not stable

Page 4: Diskless Checkpointing

Bottom-line

A NOW with n+m processors: the application runs on exactly n processors and should proceed as long as

- the number of processors in the system is at least n, and
- the failures occur within certain constraints.

Figure: available processors (n+m) = application processors (n) + chkpnt processors (m)

Page 5: Diskless Checkpointing

Overview

Coordinated chkpnt (sync-and-stop)

To chkpnt:
- Application procs: chkpnt their state in memory.
- Chkpnt procs: encode the application chkpnts and store the encodings in memory.

To recover:
- Non-failed procs roll back.
- Replacement procs are chosen.
- Replacement procs calculate the chkpnts of the failed procs using the other chkpnts and the encodings.

Page 6: Diskless Checkpointing

Outline

Application processor chkpnt
- Disk-based
- Diskless: incremental, forked (or copy-on-write)
- Optimizations

Encoding the chkpnts
- Parity (RAID level 5)
- Mirroring
- 1-dimensional parity
- 2-dimensional parity
- Reed-Solomon coding
- Optimizations

Results

Page 7: Diskless Checkpointing

Application Processor Chkpnt

Goal

The processor should be able to roll back to its most recent chkpnt.

Must tolerate failures that occur during a chkpnt: each coordinated chkpnt must remain valid until the next coordinated chkpnt has been completed.

Page 8: Diskless Checkpointing

Disk-based Chkpnt

To chkpnt: save all values in the stack, heap, and registers to disk.

To recover: overwrite the address space with the stored chkpnt.

Space demands: 2M on disk, because the previous chkpnt must stay valid until the new one is complete

(M: the size of an application processor's address space)

Page 9: Diskless Checkpointing

Simple Diskless Chkpnt

To chkpnt: wait until the encoding has been calculated, then overwrite the diskless chkpnt in memory.

To recover: roll back from the in-memory chkpnt.

Space demands: extra M in memory

(M: the size of an application processor's address space)

Page 10: Diskless Checkpointing

Incremental Diskless Chkpnt

To chkpnt: initially set all pages R_ONLY; on a page fault, copy the page and set it RW.

To recover: restore all pages that were set RW from their saved copies.

Space demands: extra I in memory

(I: the incremental chkpnt size)
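The R_ONLY/RW mechanism can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not the implementation measured in the talk: a real system would write-protect pages through the OS (e.g., with `mprotect`) and catch the resulting fault, whereas here a hypothetical `IncrementalCheckpointer` class keeps a per-page flag and uses a hypothetical 4-byte `PAGE_SIZE` to keep the example readable.

```python
PAGE_SIZE = 4  # hypothetical tiny page size (bytes), chosen for readability


class IncrementalCheckpointer:
    """Simulates page-level incremental diskless checkpointing in user space.

    A per-page 'read_only' flag stands in for OS write protection; the first
    write to a protected page plays the role of the page fault.
    """

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.num_pages = len(memory) // PAGE_SIZE
        self.read_only = [False] * self.num_pages
        self.saved = {}  # page index -> contents at the time of the chkpnt

    def checkpoint(self) -> None:
        # Start a new chkpnt: drop old copies and mark every page R_ONLY.
        self.saved.clear()
        self.read_only = [True] * self.num_pages

    def write(self, addr: int, value: int) -> None:
        page = addr // PAGE_SIZE
        if self.read_only[page]:
            # Simulated page fault: copy the page, then set it RW.
            start = page * PAGE_SIZE
            self.saved[page] = bytes(self.memory[start:start + PAGE_SIZE])
            self.read_only[page] = False
        self.memory[addr] = value

    def recover(self) -> None:
        # Roll back: restore every page that became RW since the chkpnt.
        for page, data in self.saved.items():
            start = page * PAGE_SIZE
            self.memory[start:start + PAGE_SIZE] = data
        self.checkpoint()  # the restored state becomes the current chkpnt

    def incremental_size(self) -> int:
        # I in the space-demand formula: only faulted pages were copied.
        return len(self.saved) * PAGE_SIZE
```

Two writes to the same page cost one page copy, which is why I depends on how many distinct pages are touched, not on how many bytes change.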

Page 11: Diskless Checkpointing

Forked Diskless Chkpnt

To chkpnt: the application clones itself (e.g., with fork()).

To recover: overwrite the state with the clone's, or let the clone assume the role of the application.

Space demands: extra 2I in memory

(I: the incremental chkpnt size)

Page 12: Diskless Checkpointing

Optimizations

Breaking the chkpnt into chunks: more efficient use of memory

Sending diffs (incremental): send the bitwise XOR of the current copy and the chkpnt copy; unmodified pages need not be sent

Compressing diffs: unmodified regions of memory XOR to zero, so the diffs compress well
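Diff-based sending and compression can be sketched as follows. Assumptions: zlib stands in for whatever compressor an implementation uses, buffers have equal length, and `make_diff`/`apply_diff` are hypothetical names. The same diff can also update a parity encoding, because XOR-ing (old XOR new) into $b_1 \oplus \cdots \oplus b_n$ replaces the old chkpnt with the new one.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_diff(current: bytes, previous_chkpnt: bytes) -> bytes:
    # Unmodified bytes XOR to 0, so a mostly unchanged region becomes mostly
    # zeros, which zlib (an assumed compressor) shrinks to a few bytes.
    return zlib.compress(xor_bytes(current, previous_chkpnt))


def apply_diff(old: bytes, compressed_diff: bytes) -> bytes:
    # Works on an application chkpnt and on a parity block alike:
    # parity xor (old_i xor new_i) == parity with old_i swapped for new_i.
    return xor_bytes(old, zlib.decompress(compressed_diff))
```

In this sketch the application processor would send only the compressed diff, and the chkpnt processor would call `apply_diff` on its stored parity.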

Page 13: Diskless Checkpointing

Application Processor Chkpnt (review)

Simple diskless chkpnt: extra M in memory

Incremental diskless chkpnt: extra I in memory

Forked diskless chkpnt: extra 2I in memory, less CPU activity

Optimizations: chkpnt in chunks, diffs, and compressed diffs

Page 14: Diskless Checkpointing

Encoding the chkpnts

Goal

Extra chkpnt processors should store enough information that the chkpnts of failed processors may be reconstructed.

Notation: m = number of chkpnt processors, n = number of application processors

Page 15: Diskless Checkpointing

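Parity (RAID level 5, m=1)

Notation: $b_i^j$ = j-th byte of application processor i; $b_{ckp}^j$ = j-th byte of the chkpnt processor

Example: n=4, m=1 (application processors holding $b_1^j, \dots, b_4^j$; one chkpnt processor holding $b_{ckp}^j$)

To chkpnt:

$b_{ckp}^j = b_1^j \oplus b_2^j \oplus \cdots \oplus b_n^j$

On failure of ith proc:

$b_i^j = b_1^j \oplus \cdots \oplus b_{i-1}^j \oplus b_{i+1}^j \oplus \cdots \oplus b_n^j \oplus b_{ckp}^j$

Can tolerate: only one processor failure

Remarks: the chkpnt processor is a bottleneck of communication and computation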
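Both parity formulas apply the same XOR to every byte j, so they can be sketched over whole chkpnts at once. Assumptions: chkpnts are equal-length byte strings (`zip` would silently truncate otherwise), and `encode_parity`/`recover_parity` are hypothetical names.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR; assumes equal-length inputs."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(chkpnts: list[bytes]) -> bytes:
    """b_ckp^j = b_1^j xor ... xor b_n^j, for every byte j at once."""
    return reduce(xor_bytes, chkpnts)


def recover_parity(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one lost chkpnt from the n-1 survivors and the parity."""
    return reduce(xor_bytes, survivors, parity)
```

Because the chkpnt processor must receive and XOR all n chkpnts, this single `encode_parity` call is exactly where the communication and computation bottleneck sits.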

Page 16: Diskless Checkpointing

Mirroring (m=n)

Notation: $b_i^j$ = j-th byte of application processor i; $b_{ckp_i}^j$ = j-th byte of chkpnt processor i

Example: n=m=4

To chkpnt: $b_{ckp_i}^j = b_i^j$

On failure of ith proc: $b_i^j = b_{ckp_i}^j$

Can tolerate: up to n processor failures, except the failure of both an application processor and its chkpnt processor

Remarks: fast, no calculation needed

Page 17: Diskless Checkpointing

1-Dimensional Parity (1<m<n)

Notation: $b_i^j$ = j-th byte of application processor i; $b_{ckp_k}^j$ = j-th byte of chkpnt processor k

Example: n=4, m=2

To chkpnt: the application processors are partitioned into m groups; the kth chkpnt processor calculates the parity of the chkpnts in group k ($G_k$): $b_{ckp_k}^j = \bigoplus_{i \in G_k} b_i^j$

On failure of ith proc: same as in parity encoding, using only the failed processor's group

Can tolerate: one processor failure per group

Remarks: more efficient in communication and computation than single parity
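Grouping changes only which chkpnts each parity covers. The sketch below assumes a round-robin assignment (processor i joins group i mod m), which the slides do not specify, and uses hypothetical function names; a failed processor's chkpnt is passed as `None`.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_groups(chkpnts: list[bytes], m: int):
    """Partition processors into m groups (round-robin, an assumption) and
    compute one parity per group; chkpnt processor k stores parities[k]."""
    groups = [list(range(k, len(chkpnts), m)) for k in range(m)]
    parities = [reduce(xor_bytes, [chkpnts[i] for i in grp]) for grp in groups]
    return groups, parities


def recover_groups(chkpnts, groups, parities):
    """chkpnts[i] is None for a failed processor; tolerates one failure per group."""
    restored = list(chkpnts)
    for grp, parity in zip(groups, parities):
        lost = [i for i in grp if restored[i] is None]
        if len(lost) > 1:
            raise ValueError(f"group {grp} lost {len(lost)} chkpnts; its parity covers only one")
        if lost:
            survivors = [restored[i] for i in grp if i != lost[0]]
            # Same rule as plain parity, restricted to this group.
            restored[lost[0]] = reduce(xor_bytes, survivors, parity)
    return restored
```

With n=4 and m=2, losing processors 0 and 1 is recoverable (different groups), while losing 0 and 2 is not (same group).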

Page 18: Diskless Checkpointing

2-Dimensional Parity

Notation: $b_i^j$ = j-th byte of application processor i

Example: n=4, m=4

To chkpnt: the application processors are arranged logically in a two-dimensional grid; each chkpnt processor calculates the parity of one row or one column.

On failure of ith proc: same as in parity encoding

Can tolerate: any two processor failures

Remarks: each chkpnt goes to both a row and a column chkpnt processor, so multicast is needed to avoid extra communication
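Recovery from any two failures in the 2-D layout can be sketched with a simple peeling loop: repeatedly fix any row or column that is missing exactly one chkpnt and still has its parity. Assumptions: the n=4 example is a 2×2 grid, lost chkpnts and lost parity processors are represented by `None`, and `encode_2d`/`recover_2d` are hypothetical names.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def xor_all(blocks):
    return reduce(xor_bytes, blocks)


def encode_2d(grid):
    """One parity per row and one per column (m = rows + cols chkpnt procs)."""
    rows, cols = len(grid), len(grid[0])
    row_par = [xor_all(grid[r]) for r in range(rows)]
    col_par = [xor_all([grid[r][c] for r in range(rows)]) for c in range(cols)]
    return row_par, col_par


def recover_2d(grid, row_par, col_par):
    """Cells set to None are lost chkpnts; a lost parity is passed as None."""
    rows, cols = len(grid), len(grid[0])
    grid = [list(row) for row in grid]
    progress = True
    while progress:
        progress = False
        for r in range(rows):  # a row missing exactly one cell
            missing = [c for c in range(cols) if grid[r][c] is None]
            if len(missing) == 1 and row_par[r] is not None:
                others = [grid[r][c] for c in range(cols) if c != missing[0]]
                grid[r][missing[0]] = xor_all(others + [row_par[r]])
                progress = True
        for c in range(cols):  # a column missing exactly one cell
            missing = [r for r in range(rows) if grid[r][c] is None]
            if len(missing) == 1 and col_par[c] is not None:
                others = [grid[r][c] for r in range(rows) if r != missing[0]]
                grid[missing[0]][c] = xor_all(others + [col_par[c]])
                progress = True
    if any(cell is None for row in grid for cell in row):
        raise ValueError("failure pattern not recoverable")
    return grid
```

Two failures in the same row are repaired by the column parities; a chkpnt and its own row parity processor failing together is repaired by the column parity.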

Page 19: Diskless Checkpointing

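Reed-Solomon Coding (m)

To chkpnt: use the m×n Vandermonde matrix F with $f_{i,j} = j^{i-1}$ and calculate the encodings by matrix-vector multiplication, $c_i = \sum_{j=1}^{n} f_{i,j}\, b_j$, where $b_j$ is the chkpnt of application processor j and $c_i$ is the encoding stored on chkpnt processor i

To recover: use Gaussian elimination

Can tolerate: any m failures

Remarks: arithmetic is performed in a Galois field; computation overhead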
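A compact sketch of the encode/recover path over GF(2^8). Assumptions not taken from the slides: the field is built from the primitive polynomial 0x11D, chkpnts are equal-length byte strings, processors are indexed from 0 in code (so `coeff(i, j)` is the slide's $f_{i,j}=j^{i-1}$ shifted to 0-based indices), and all function names are hypothetical. Row 0 of F is all ones, so the first encoding is exactly the plain parity. With m=2 every pattern of up to two failures decodes; for larger m this raw Vandermonde construction over GF(2^8) is not guaranteed to give an invertible system for every failure pattern, in which case `gf_invert` raises `ValueError`.

```python
PRIM_POLY = 0x11D  # x^8+x^4+x^3+x^2+1: an assumed, commonly used primitive polynomial

# Log/antilog tables; EXP is doubled so LOG[a] + LOG[b] needs no modulo.
EXP = [0] * 512
LOG = [0] * 256
_value = 1
for _power in range(255):
    EXP[_power] = _value
    LOG[_value] = _power
    _value <<= 1
    if _value & 0x100:
        _value ^= PRIM_POLY
for _power in range(255, 512):
    EXP[_power] = EXP[_power - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a: int) -> int:
    return EXP[255 - LOG[a]]  # a must be nonzero


def coeff(i: int, j: int) -> int:
    """f(i,j) = j^(i-1) with 1-based indices; here i and j are 0-based."""
    base = j + 1  # assumes n <= 255, so every base is a distinct nonzero element
    return EXP[(LOG[base] * i) % 255]


def mul_acc(out: bytearray, c: int, block: bytes) -> None:
    """out += c * block, bytewise (addition in GF(2^8) is XOR)."""
    for k, byte in enumerate(block):
        out[k] ^= gf_mul(c, byte)


def encode(chkpnts: list[bytes], m: int) -> list[bytes]:
    """Matrix-vector product: encoding i = sum over j of f(i,j) * chkpnt j."""
    encodings = []
    for i in range(m):
        out = bytearray(len(chkpnts[0]))
        for j, block in enumerate(chkpnts):
            mul_acc(out, coeff(i, j), block)
        encodings.append(bytes(out))
    return encodings


def gf_invert(mat: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan elimination over GF(2^8)."""
    size = len(mat)
    aug = [row[:] + [int(r == c) for c in range(size)] for r, row in enumerate(mat)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("singular system: this failure pattern cannot be decoded")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, v) for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, p) for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def recover(survivors: dict[int, bytes], encodings: dict[int, bytes], n: int) -> dict[int, bytes]:
    """survivors: proc index -> chkpnt; encodings: row i -> surviving encoding i."""
    lost = [j for j in range(n) if j not in survivors]
    if len(lost) > len(encodings):
        raise ValueError("more lost chkpnts than surviving encodings")
    rows = sorted(encodings)[: len(lost)]
    rhs = []  # remove the surviving chkpnts' contribution from each equation
    for i in rows:
        acc = bytearray(encodings[i])
        for j, block in survivors.items():
            mul_acc(acc, coeff(i, j), block)
        rhs.append(acc)
    inverse = gf_invert([[coeff(i, j) for j in lost] for i in rows])
    restored = dict(survivors)
    for u, j in enumerate(lost):
        out = bytearray(len(rhs[0]))
        for r, row_rhs in enumerate(rhs):
            mul_acc(out, inverse[u][r], row_rhs)
        restored[j] = bytes(out)
    return restored
```

The per-byte table lookups in `mul_acc` are where the computation overhead noted on the slide shows up, compared with the single XOR per byte of parity.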

Page 20: Diskless Checkpointing

Optimizations

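Sending and calculating the encoding in RAID level 5-based encodings (e.g., parity):

(a) DIRECT: every application processor sends its chkpnt straight to the chkpnt processor C1, which becomes the bottleneck
(b) FAN-IN: the parity is accumulated pairwise along a tree in log(n) steps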
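The difference between DIRECT and FAN-IN can be sketched by counting steps. This is a simplified model (an assumption, not the measured protocol): each XOR of two chkpnts counts as one step, XORs at different processors in the same round run in parallel, and the hypothetical functions `direct` and `fan_in` return the parity together with the number of sequential steps.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def direct(chkpnts):
    """DIRECT: the single chkpnt processor XORs all n chkpnts one after another."""
    return reduce(xor_bytes, chkpnts), len(chkpnts) - 1


def fan_in(chkpnts):
    """FAN-IN: pairs combine in parallel rounds, halving the number of partial
    parities each round, for ceil(log2 n) rounds in total."""
    level = list(chkpnts)
    rounds = 0
    while len(level) > 1:
        nxt = [xor_bytes(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # the odd one out waits for the next round
        level = nxt
        rounds += 1
    return level[0], rounds
```

For n=16 both produce the same parity, but DIRECT needs 15 sequential steps at one processor while FAN-IN needs 4 rounds.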

Page 21: Diskless Checkpointing

Encoding the Chkpnts (review)

Parity (RAID level 5, m=1): only one failure; bottleneck

Mirroring (m=n): up to n failures (unless both an app proc and its chkpnt proc fail); fast

1-Dimensional Parity: one failure per group; more efficient than Parity

2-Dimensional Parity: any two failures; comm overhead without multicast

Reed-Solomon Coding: any m failures; computation overhead

DIRECT vs. FAN-IN

Page 22: Diskless Checkpointing

Testing Applications (1)

CPU-intensive parallel programs: instances that took 1.5 to 2 hours on 16 processors

NBODY: N-body interactions among particles in a system. Particles are partitioned among processors, and the location field of each particle is updated. Expectation:

Poor with incremental chkpnt; good with diff-based compression

MAT: FP matrix product of two square matrices (Cannon's algorithm). All three matrices are partitioned in square blocks among processors; in each step, each processor adds the product of its input submatrices and passes the input submatrices on. Expectation:

Good with incremental chkpnt; very poor with diff-based compression

Page 23: Diskless Checkpointing

Testing Applications (2)

PSTSWM: nonlinear shallow water equations on a rotating sphere. Most pages are modified, but only a few bytes per page. Expectation:

Poor with incremental chkpnt; good with diff-based compression

CELL: parallel cellular automaton simulation program with two (sparse) grids of cellular automata (current/next). Expectation:

Poor with incremental chkpnt; good with compression

PCG: solves Ax=b for a large, sparse matrix, which is first converted to a small, dense format. Expectation:

Good with incremental chkpnt; very poor with diff-based compression

Page 24: Diskless Checkpointing

Diskless Checkpointing

20 Nov 2001

Page 25: Diskless Checkpointing

Disk-based vs. Diskless Chkpnt

 

                  Disk-based                      Diskless
Where to chkpnt?  In stable storage               In local memory
How to recover?   Restore from stable storage     Recalculate
Remarks           Tolerates whole-system failure  Cannot tolerate whole-system failure
                  Low BW to stable storage        Memory is much faster
                                                  Encoding (+communication) overhead

Page 26: Diskless Checkpointing

Recalculate the lost chkpnt?

Error detection & correction in digital communication vs. chkpnt recovery in diskless chkpnt (X marks a lost bit)

1-bit parity (m=1)
- Digital comm: 11001011[1] (right), 11000011[1] (detectable), 11001011[0] (detectable), 11000011[0] (oops)
- Diskless chkpnt: 11001011[1] (chkpnt), 1100X011[1] (tolerable), 11001011[X] (tolerable), 1100X011[X] (intolerable)

Mirroring (m=n)
- Digital comm: 11001011[11001011] (right), 11001011[11001010] (detectable), 11001011[00111100] (detectable), 11001010[11001010] (oops)
- Diskless chkpnt: 11001011[11001011] (right), 11001011[1100101X] (tolerable), 11001011[XXXXXXXX] (tolerable), 1100101X[1100101X] (intolerable)

Remarks:
- Difference: in a chkpnt system we can easily tell which node failed, so a loss is at a known position rather than an error at an unknown one.
- Some codes, e.g. Reed-Solomon, can also recover from errors in digital communication.
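The parity row of the table can be reproduced in a few lines: a parity bit can only detect an unknown flipped bit, but it can rebuild a bit whose position is known to be lost. A minimal sketch using even parity over bit strings, with "X" marking the lost bit as on the slide; the function names are hypothetical.

```python
def parity_bit(bits: str) -> int:
    """Even parity: the extra bit makes the total number of 1s even."""
    return bits.count("1") % 2


def check(bits: str, p: int) -> bool:
    # Digital communication: the receiver can only detect an odd number of
    # flipped bits; it cannot tell which bit is wrong.
    return parity_bit(bits) == p


def fill_erasure(bits: str, p: int) -> str:
    # Diskless chkpnt: the lost position 'X' is known, so parity determines it.
    if bits.count("X") != 1:
        raise ValueError("one parity bit can rebuild exactly one lost bit")
    known = bits.replace("X", "")
    missing = (known.count("1") + p) % 2
    return bits.replace("X", str(missing))
```

`check("11000011", 0)` returns True even though two bits differ from 11001011[1], which is the "oops" case; `fill_erasure("1100X011", 1)` restores 11001011.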

Page 27: Diskless Checkpointing

Performance

Criteria
- Latency: time between when a chkpnt is initiated and when it is ready for recovery
- Overhead: increase in execution time due to chkpnting

Applications

App     Description                          Pattern
NBODY   N-body interactions                  Most pages are modified, but only a few bytes per page
PSTSWM  Shallow water on a rotating sphere   Most pages are modified, but only a few bytes per page
CELL    Parallel cellular automaton          Most pages are modified, but only a few bytes per page
MAT     FP matrix multiplication (Cannon's)  Only small parts are updated, but in their entirety
PCG     PCG for a sparse matrix              Only small parts are updated, but in their entirety

Page 28: Diskless Checkpointing

Implementation

BASE: no chkpnt
DISK-FORK: disk-based chkpnt with fork()
SIMP: simple diskless
INC: incremental diskless
FORK: forked diskless
INC-FORK: incremental, forked diskless
C-SIMP, C-INC, C-FORK, C-INC-FORK: the corresponding diskless variants with diff-based compression

Page 29: Diskless Checkpointing

Experiment Framework

Network of 24 Sun Sparc5 workstations connected by a fast, switched Ethernet (~5 MB/s)

Each workstation has 96 MB of physical memory and 38 MB of local disk storage

Disks with a bandwidth of 1.7 MB/s are connected via Ethernet; NFS over Ethernet achieved a bandwidth of 0.13 MB/s


Page 32: Diskless Checkpointing

Discussion

Latency: diskless chkpnt has much lower latency than disk-based chkpnt, which lowers the expected running time of the application in the presence of failures (small recovery time)

Overhead: comparable…

Page 33: Diskless Checkpointing

Recommendations

DISK-FORK: if chkpnts are small, or if the likelihood of wholesale system failures is high

C-FORK: if many pages are modified, but only a few bytes per page

INC-FORK: if only a small number of pages are modified

Page 34: Diskless Checkpointing

Reference

J. S. Plank, K. Li, and M. A. Puening. "Diskless checkpointing." IEEE Transactions on Parallel and Distributed Systems, 9(10):972–986, Oct. 1998