Silent error resilience in numerical time-stepping schemes
Austin Benson*
Institute for Computational and Mathematical Engineering
Stanford University
Sven Schmit* (ICME) and Rob Schreiber (HP Labs)
SIAM PP 2014
* work done while interning at HP Labs
February 19, 2014
Illustrative example 2
[Figure: Crank-Nicolson solution plotted over $x$ and time $i\Delta t$, and differences $D_i$ versus step $i$ (log scale, $10^{-7}$ to $10^{-2}$) for the Richardson / Crank-Nicolson and forward / backward Euler pairs.]
$u_t = \frac{1}{100} u_{xx} + 0.1\,(\sin(2\pi t) + \cos(2\pi x))$
$t \in [0, 2]$, $x \in [0, 1]$
$u(x, 0) = x(x - 1)$
$\Delta x = 1/160$, $\Delta t = 1/100$
Illustrative example: what’s at fault? 3
[Figure: Crank-Nicolson solution plotted over $x$ and time $i\Delta t$, and differences $D_i$ versus step $i$ (log scale, $10^{-7}$ to $10^{-2}$) for the Richardson / Crank-Nicolson and forward / backward Euler pairs.]
- At step 120, a single entry in the RHS of the Crank-Nicolson and backward Euler linear solves was multiplied by 0.995
Main idea 4
[Figure: RK 4/5 differences $D_i$ versus iteration $i$; log-scale vertical axis from $10^{-10}$ to $10^{-2}$.]
- At each time step, the base method B generates $B_1, B_2, \ldots$
- The auxiliary method A "checks" with $A_1, A_2, \ldots$
- $D_i = \|B_i - A_i\|$ abnormal → possible error
What are these things? 5
[Figure: RK 4/5 differences $D_i$ versus iteration $i$; log-scale vertical axis from $10^{-10}$ to $10^{-2}$.]
- Base method B: higher-order scheme (Runge-Kutta 5)
- Auxiliary method A "checks": lower-order scheme (Runge-Kutta 4)
- A needs to be cheap: use embedded pairs [Fehlberg, 1969], [Dormand and Prince, 1980]
RK 1/2 A/B scheme 6
ODE: $u' = f(t, u)$.
$k_1^B = f(t_n, u_n^B)$
$u_{n+1}^B = u_n^B + h f(t_n + h/2,\; u_n^B + h k_1^B / 2)$
Re-use data!
$u_{n+1}^A = u_n^B + h k_1^B$
$D_{n+1} = \|u_{n+1}^A - u_{n+1}^B\|$
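The pair above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `rk12_step`, the test problem $u' = -u$, and the choice of the infinity norm for $D_{n+1}$ are assumptions made here. The point it shows is that the Euler check costs no extra evaluation of $f$, because it re-uses $k_1^B$.

```python
import numpy as np

def rk12_step(f, t, u, h):
    """One step of the midpoint (base B, order 2) / Euler (auxiliary A, order 1) pair."""
    k1 = f(t, u)                                   # derivative evaluation shared by both methods
    u_base = u + h * f(t + h / 2, u + h * k1 / 2)  # base method B: explicit midpoint rule
    u_aux = u + h * k1                             # auxiliary method A: forward Euler, re-uses k1
    d = np.linalg.norm(u_aux - u_base, ord=np.inf) # D_{n+1}; the norm choice is an assumption
    return u_base, d

# Hypothetical test problem: u' = -u, u(0) = 1, integrated to t = 2.
f = lambda t, u: -u
u = np.array([1.0])
h = 0.1
diffs = []
for n in range(20):
    u, d = rk12_step(f, n * h, u, h)
    diffs.append(d)
```

On this smooth problem the recorded differences shrink steadily from step to step, which is the "normal" behavior that a fault would disturb.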
Forward / Backward Euler A/B scheme 7
Want to solve: $u_t = k u_{xx}$ (1D)
$A U_{n+1}^B = U_n^B$
Re-use data!
$U_{n+1}^A = B U_n^B$
$D_{n+1} = \|U_{n+1}^B - U_{n+1}^A\|$
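A minimal sketch of this pair follows. The slide does not define the matrices, so the code assumes the standard second-order finite-difference Laplacian with homogeneous Dirichlet boundaries, giving $A = I - rL$ (backward Euler) and $B = I + rL$ (forward Euler) with $r = k\,\Delta t / \Delta x^2$. The helper names, grid size, and initial condition are hypothetical, and a dense solve is used only to keep the example short.

```python
import numpy as np

def heat_matrices(m, kappa, dx, dt):
    """Assumed discretization: A = I - r*L (backward Euler), B = I + r*L (forward Euler)."""
    r = kappa * dt / dx**2
    lap = -2.0 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)  # 1D Laplacian stencil
    return np.eye(m) - r * lap, np.eye(m) + r * lap

def fe_be_step(A, B, U):
    """Advance with backward Euler (base) and check with forward Euler (auxiliary)."""
    U_base = np.linalg.solve(A, U)                  # base: solve A U_{n+1}^B = U_n^B
    U_aux = B @ U                                   # auxiliary: U_{n+1}^A = B U_n^B, same input data
    D = np.linalg.norm(U_base - U_aux, ord=np.inf)  # D_{n+1}; the norm choice is an assumption
    return U_base, D

# Hypothetical setup: 49 interior points on [0, 1], so r = 0.25 and forward Euler is stable.
m, kappa, dx, dt = 49, 0.01, 1.0 / 50, 0.01
A, B = heat_matrices(m, kappa, dx, dt)
x = np.linspace(0.0, 1.0, m + 2)[1:-1]
U = np.sin(np.pi * x)
D_hist = []
for n in range(100):
    U, D = fe_be_step(A, B, U)
    D_hist.append(D)
```

The auxiliary step needs only one matrix-vector product on data the base method already holds, which is what makes the check cheap.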
Lots of these schemes 8
- Backward / Forward Euler, Richardson / Crank-Nicolson
- Runge-Kutta 2/3, 4/5
- Adams-Bashforth linear multistep method 2/3, 4/5
- Explicit check on implicit scheme
- Extrapolation
- Key idea: Auxiliary method A re-uses data and communication from base method B
Detecting errors 9
[Figure: Crank-Nicolson solution plotted over $x$ and time $i\Delta t$, and differences $D_i$ versus step $i$ (log scale, $10^{-7}$ to $10^{-2}$) for the Richardson / Crank-Nicolson and forward / backward Euler pairs.]
- Exercise in step detection
Detecting errors 10
[Figure: Crank-Nicolson solution plotted over $x$ and time $i\Delta t$, and differences $D_i$ versus step $i$ (log scale, $10^{-7}$ to $10^{-2}$) for the Richardson / Crank-Nicolson and forward / backward Euler pairs.]
$D_{n+1} = \|A_{n+1} - B_{n+1}\|_\infty$
$J_{n+1} = \frac{D_{n+1} - D_n}{D_n}$ (relative jump)
$V_{n+1} = \frac{\mathrm{Var}(D_{n-p+1}, \ldots, D_{n+1})}{\mathrm{Var}(D_{n-p}, \ldots, D_n)}$ (variance change)
- $p = 10$ is usually good
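As an illustration of the two statistics, the sketch below computes the relative jump and the variance change from a stored history of differences. The function names and the synthetic history are assumptions made for this example; `D[n]` stands for $D_n$, and `n` must be at least `p`.

```python
import numpy as np

def relative_jump(D, n):
    """J_{n+1} = (D_{n+1} - D_n) / D_n."""
    return (D[n + 1] - D[n]) / D[n]

def variance_change(D, n, p=10):
    """V_{n+1} = Var(D_{n-p+1}, ..., D_{n+1}) / Var(D_{n-p}, ..., D_n)."""
    new_window = D[n - p + 1 : n + 2]   # p + 1 values ending at D_{n+1}
    old_window = D[n - p : n + 1]       # p + 1 values ending at D_n
    return np.var(new_window) / np.var(old_window)

# Synthetic history: slowly drifting differences with one abrupt jump at index 20.
D = 1e-4 * (1.0 + 0.01 * np.arange(30))
D[20] *= 3.0

J_quiet, V_quiet = relative_jump(D, 15), variance_change(D, 15)
J_fault, V_fault = relative_jump(D, 19), variance_change(D, 19)
```

Because both windows have the same length, the ratio is the same whether population or sample variance is used.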
Error detection algorithm 11
input: tolerances τ_J and τ_V, scaling parameters Γ > 1, γ < 1
for n = 1, 2, ... do
    D_{n+1} := ‖A_{n+1} − B_{n+1}‖
    if J_{n+1} > τ_J and V_{n+1} > τ_V then
        FlagError()
        Move back in time
    end
    if J_{n+1} > τ_J then τ_J := Γ τ_J else τ_J := γ τ_J
    if V_{n+1} > τ_V then τ_V := Γ τ_V else τ_V := γ τ_V
end
- Γ = 1.4, γ = 0.95
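The threshold logic of the algorithm can be sketched as follows. This is a simplified reading, not the authors' code: it takes precomputed sequences of $J$ and $V$ values, only records the flagged indices, and leaves out the "move back in time" recovery. The names `detect`, `grow`, and `shrink` and the initial tolerances in the usage are hypothetical.

```python
def detect(J, V, tau_J, tau_V, grow=1.4, shrink=0.95):
    """Flag indices where both statistics exceed their adaptive tolerances."""
    flagged = []
    for n, (j, v) in enumerate(zip(J, V)):
        if j > tau_J and v > tau_V:
            flagged.append(n)           # stands in for FlagError(); no rollback in this sketch
        # Each tolerance grows after an exceedance and decays slowly otherwise.
        tau_J = grow * tau_J if j > tau_J else shrink * tau_J
        tau_V = grow * tau_V if v > tau_V else shrink * tau_V
    return flagged

# Synthetic statistics: quiet everywhere except index 12.
J = [0.01] * 20
V = [1.0] * 20
J[12], V[12] = 2.0, 50.0
flags = detect(J, V, tau_J=0.5, tau_V=5.0)

# A jump in J alone is not enough: both tests must fire together.
flags_J_only = detect(J, [1.0] * 20, tau_J=0.5, tau_V=5.0)
```

Requiring both tests to fire at the same step is what keeps a single noisy statistic from raising a flag on its own.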
Which errors matter? 12
- $\hat{B}_n$ and $\hat{A}_n$ are the outputs of B and A when a fault is injected
- $B_n$ and $A_n$ are the outputs when no fault is injected
Local truncation error-normalized error:
$L_n = \frac{\|\hat{B}_n - B_n\|}{\|B_n - A_n\|} \approx \frac{\text{difference caused by error}}{\text{local truncation error}}$
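A small numerical illustration of this ratio, under the assumption that the infinity norm is used (the slide does not name the norm). The function name and the sample vectors are made up for the example.

```python
import numpy as np

def lte_normalized_error(B_fault, B_clean, A_clean):
    """Change in B's output caused by the fault, relative to the fault-free B/A difference."""
    fault_effect = np.linalg.norm(B_fault - B_clean, ord=np.inf)
    lte_estimate = np.linalg.norm(B_clean - A_clean, ord=np.inf)
    return fault_effect / lte_estimate

# Made-up outputs at one step: the fault moves B by 0.004, while B and A differ by 0.001.
B_clean = np.array([1.0, 2.0])
A_clean = np.array([1.0, 2.001])
B_fault = np.array([1.0, 2.004])
L = lte_normalized_error(B_fault, B_clean, A_clean)
```

A value near 1 means the fault is about as large as the discretization error already present; values well above 1 are the faults that matter most.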
Experimental setup 13
[Figure: Crank-Nicolson solution plotted over $x$ and time $i\Delta t$, and differences $D_i$ versus step $i$ (log scale, $10^{-7}$ to $10^{-2}$) for the Richardson / Crank-Nicolson and forward / backward Euler pairs.]
- Solve the equation and artificially inject an error at one time step
- Do this for many trials with different types of errors
- True positive rate: #(real errors detected) / #(trials)
- False positive rate: #(non-errors "detected") / #(time steps)
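The two rates can be tallied as in the sketch below. Several details are assumptions made here, not stated on the slide: each trial is represented as a pair (fault step, list of flagged steps), a detection counts if it falls within `slack` steps after the fault, and the false positive rate is normalized by the total number of time steps over all trials.

```python
def detection_rates(trials, n_steps, slack=1):
    """Return (true positive rate, false positive rate) over a list of trials."""
    true_pos = 0
    false_pos = 0
    for fault_step, flagged in trials:
        hits = [s for s in flagged if fault_step <= s <= fault_step + slack]
        if hits:
            true_pos += 1                     # the injected error was detected
        false_pos += len(flagged) - len(hits) # flags that do not correspond to the fault
    tpr = true_pos / len(trials)
    fpr = false_pos / (n_steps * len(trials))
    return tpr, fpr

# Made-up results from four trials of 200 time steps each.
trials = [(120, [120]), (80, [81]), (50, []), (30, [10, 30])]
tpr, fpr = detection_rates(trials, n_steps=200, slack=1)
tpr_strict, fpr_strict = detection_rates(trials, n_steps=200, slack=0)
```

Setting `slack=0` counts only detections at the step of the fault, while `slack=1` also accepts detections one step later.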
Heat equation 14
- $u_t = 0.001\,u_{xx} + \left(1 - \sqrt{1 - 4(t - t^2)}\right) / (2 - 2t)$
- $u(x, 0) = 6|x - 1/2| - 3$
- Error: multiply one entry of the RHS in the linear solves by $z \sim N(1, 5 \times 10^{-5})$ at a single time step
[Figure: true positive rate versus LTE-normalized error, two panels. Legend: detected at step of fault; detected at step or step after fault.]
Panel: FE/BE, $\Delta x = 1/200$, $\Delta t = 1/100$, FPR = 0.000
Panel: R/CN, $\Delta x = 1/200$, $\Delta t = 1/100$, FPR = 0.012
Heat equation 15
- $u_t = 0.01\,u_{xx} + q(x, t)$, $q(x, t) = x e^{-t/2}$
- $u(x, 0) = 4x(x - 1)(x - 2)$
- Error: multiply $q(x, t)$ at one discrete $x$ by $z \sim N(1, 0.1)$ at a single time step
[Figure: true positive rate versus LTE-normalized error, two panels. Legend: detected at step of fault; detected at step or step after fault.]
Panel: FE/BE, $\Delta x = 1/100$, $\Delta t = 1/60$, FPR = 0.000
Panel: R/CN, $\Delta x = 1/100$, $\Delta t = 1/60$, FPR = 0.000
Adams-Bashforth 16
- $u''(t) - b(1 - u(t)^2)\,u'(t) + u(t) = 0$
- $u'(0) = 1$, $u(0) = 0$
- Error: multiply one derivative evaluation by $z \sim N(1, 0.1)$
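To make the fault model concrete, the sketch below writes the Van der Pol equation as a first-order system in $y = (u, u')$ and wraps the right-hand side so that exactly one evaluation is scaled by $z$. The wrapper class, the call index, and the fixed value of $z$ are hypothetical. In an experiment $z$ would be drawn at random, for example with `numpy.random.default_rng().normal(1.0, 0.1)`, if the second parameter of $N(1, 0.1)$ is read as a standard deviation, which the slide does not specify.

```python
import numpy as np

def van_der_pol(t, y, b=2.0):
    """First-order form of u'' - b(1 - u^2) u' + u = 0 with y = (u, u')."""
    u, v = y
    return np.array([v, b * (1.0 - u**2) * v - u])

class FaultyRHS:
    """Wrap a right-hand side so that one chosen evaluation is multiplied by z."""
    def __init__(self, f, fault_call, z):
        self.f = f
        self.fault_call = fault_call  # index of the evaluation to corrupt
        self.z = z
        self.calls = 0

    def __call__(self, t, y):
        out = self.f(t, y)
        if self.calls == self.fault_call:
            out = self.z * out        # silent error: the result is scaled, nothing is raised
        self.calls += 1
        return out

# Initial condition from the slide: u(0) = 0, u'(0) = 1. The fault hits the fourth evaluation.
y0 = np.array([0.0, 1.0])
rhs = FaultyRHS(van_der_pol, fault_call=3, z=1.1)
values = [rhs(0.0, y0) for _ in range(5)]
```

Passing `rhs` in place of `van_der_pol` to any time stepper injects the error without changing the stepper itself.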
[Figure: true positive rate versus LTE-normalized error (log scale, $10^0$ to $10^3$), two panels. Legend: detected at step of fault; detected at step or step after fault.]
Panel: AB23 on Van der Pol with $h = 1/20$, $b = 2$, FPR = 0.037
Panel: AB45 on Van der Pol with $h = 1/20$, $b = 2$, FPR = 0.052
Runge-Kutta 17
- $u''(t) - b(1 - u(t)^2)\,u'(t) + u(t) = 0$
- $u'(0) = 1$, $u(0) = 0$
- Error: multiply one derivative evaluation by $z \sim N(1, 0.1)$
[Figure: true positive rate versus LTE-normalized error (log scale, $10^0$ to $10^3$), two panels.]
Panel: RK23 on Van der Pol with $h = 1/10$, $b = 2$, FPR = 0.066
Panel: RK45 on Van der Pol with $h = 1/10$, $b = 2$, FPR = 0.098
Key ideas 18
Key ideas:
- Take advantage of "paired" solvers to check solutions
- High-impact error → easier to detect
- A simple detection scheme works pretty well
End 19
Questions? Samples:
- What is the performance penalty?
- Why does detection occur one step after the fault?
Information:
- Austin Benson: [email protected]
- Pre-print: see http://stanford.edu/~arbenson
Tardy error detection 20
[Figure: "Tardy error detection on heat equation": FE/BE difference $D_i$ versus time step $i$ (steps 128 to 140), with the step of the fault marked; vertical axis from $2.8 \times 10^{-5}$ to $4 \times 10^{-5}$.]
[Figure: "Component-wise absolute difference BE/FE": $|BE(i) - FE(i)|$ versus vector component $i$, shown for the step before the fault, the step of the fault, and the step after the fault.]