Compensated Algorithms
Philippe Langlois, Nicolas Louvet
Equipe DALI, Universite de Perpignan
http://webdali.univ-perp.fr
Compensated Algorithms – N. Louvet April 18, 2007 1 / 33
Motivation
Results computed in floating point arithmetic are possibly corrupted by
rounding errors.
Compensated algorithms = algorithms that correct the rounding errors
generated during the computation:
If r̂ is a computed result, how can we find a correcting term ĉ such that
r̄ = r̂ ⊕ ĉ is more accurate than r̂?
The aims of this presentation are:
- to recall the principle of so-called compensated algorithms,
- to present some details about the compensated Horner algorithm [1].
Context: IEEE-754 floating point arithmetic, rounding to the nearest, no underflow.
[1] S. Graillat, Ph. Langlois, N. Louvet. Compensated Horner Scheme. Research Report, DALI-LP2A, Université de Perpignan, Aug. 2005 (submitted).
Outline
1 Introduction
2 Principle of Compensated Algorithms
3 Sketch of proof for the compensated Horner algorithm
4 Conclusion
5 More slides
Faithful rounding with the CHS
Practical efficiency
Introduction
Why do we need compensated algorithms?
Backward stable algorithms vs. condition number
Backward stable algorithms:
The accuracy of the computed solution satisfies
accuracy ≲ condition number × u,
where
- u is the computing precision:
  IEEE-754 double, 53-bit mantissa, rounding to the nearest ⇒ u = 2⁻⁵³;
- the condition number quantifies the difficulty of solving the problem accurately.
Examples: summation, dot product, Horner algorithm, substitution for triangular system solving.
Accuracy of the Horner scheme
We consider the polynomial
p(x) = ∑_{i=0}^{n} a_i x^i, with a_i ∈ F, x ∈ F.

Algorithm (Horner scheme)
function r̂_0 = Horner(p, x)
  r̂_n = a_n
  for i = n − 1 : −1 : 0
    r̂_i = r̂_{i+1} ⊗ x ⊕ a_i
  end
Relative accuracy of the evaluation with the Horner scheme:
|Horner(p, x) − p(x)| / |p(x)| ≤ γ_{2n} cond(p, x), with γ_{2n} ≈ 2nu,
where
- u is the computing precision,
- cond(p, x) denotes the condition number of the evaluation:
  cond(p, x) = (∑_{i=0}^{n} |a_i x^i|) / |p(x)| ≥ 1.
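As a concrete illustration, here is a minimal Python sketch of the Horner recurrence above (the name `horner` and the convention that `a[i]` is the coefficient of x^i are ours); Python floats are IEEE-754 binary64 with rounding to the nearest, so each `*` and `+` below is a rounded ⊗ or ⊕.

```python
def horner(a, x):
    """Evaluate p(x) = a[0] + a[1]*x + ... + a[n]*x**n by the Horner
    scheme: r_i = r_{i+1} (*) x (+) a_i, rounding at every operation."""
    r = a[-1]                     # r_n = a_n
    for ai in reversed(a[:-1]):   # i = n-1 down to 0
        r = r * x + ai
    return r

# (x - 1)^3 expanded; near x = 1 the evaluation is ill conditioned,
# so the computed value may have few (or even no) correct digits.
p = [-1.0, 3.0, -3.0, 1.0]
print(horner(p, 1.000001))
```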
Accuracy ≲ condition number of the problem × u
[Figure: Accuracy of the polynomial evaluation (n = 50) — relative forward error vs. condition number, with reference lines u, 1/u, and γ_{2n} cond.]

How to manage ill-conditioned cases?
Compensated algorithms
Algorithms that correct the generated rounding errors
Many examples:
I Compensated summation: Neumaier (74), Sum2 in Ogita-Rump-Oishi (05)
I Compensated Horner algorithm
I Compensated substitution for triangular system solving
Accuracy as if computed in twice the working precision:
accuracy ≲ u + condition number × u²
More efficient than fixed length expansion libraries (double-double)
Not considered here: Kahan’s compensated summation (65), Priest’s
doubly compensated summation (92)
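As a hedged illustration of the first item, here is a sketch of Sum2-style compensated summation (Ogita-Rump-Oishi) in Python; the names `two_sum` and `sum2` are ours. Each `two_sum` recovers the rounding error of one addition exactly, and the accumulated errors form the correcting term.

```python
def two_sum(a, b):
    """Knuth's 2Sum: s = fl(a + b) and a + b = s + e exactly (6 flops)."""
    s = a + b
    z = s - a
    e = (a - (s - z)) + (b - z)
    return s, e

def sum2(xs):
    """Compensated summation in the style of Sum2: sum the terms,
    accumulate the exact per-addition rounding errors, compensate."""
    s = 0.0
    c = 0.0
    for x in xs:
        s, e = two_sum(s, x)
        c += e          # correcting term (itself rounded, but second order)
    return s + c

data = [1e16, 1.0, -1e16]
print(sum(data))    # 0.0: the 1.0 is lost in plain summation
print(sum2(data))   # 1.0: the rounding error is recovered
```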
Accuracy of the result ≲ u + condition number × u².
[Figure: Accuracy of polynomial evaluation with the compensated Horner scheme (n = 50) — relative forward error vs. condition number, with reference lines u, 1/u, 1/u², and u + γ_{2n}² cond.]

How is the compensated result computed?
Principle of Compensated Algorithms
The compensated result is r̄ = r̂ ⊕ ĉ. The correcting term ĉ is
an approximation of the forward error in r̂,
computed thanks to "Error-Free Transformations".
Compensated result
[Figure: forward error analysis. The exact function G maps the input x to r = G(x); its floating point implementation Ĝ maps x to the computed result r̂ = Ĝ(x). The forward error is the distance between r and r̂ in the output space R.]
Compensated result
[Figure: backward error analysis. The computed result r̂ = Ĝ(x) is also the exact image G(x + Δx) of a perturbed input; Δx is the backward error.]

Backward error analysis: identify r̂ as the exact solution of a perturbed problem,
r̂ = G(x + Δx).
Compensated result
[Figure: the correcting term. The forward error c = r − r̂ is approximated by ĉ, and the compensated result r̄ = r̂ ⊕ ĉ is closer to r.]

How to improve the quality of the computed result r̂?
- First, compute an approximation ĉ of the forward error c = r − r̂: ĉ is a correcting term for r̂.
- Then, compute the compensated result r̄ = r̂ ⊕ ĉ.
How to compute the correcting term ĉ?
ĉ is an approximation of the forward error c = r − r̂.
Classical error analysis:
Let p(x) = ∑_{i=0}^{n} a_i x^i be a polynomial with floating point coefficients.
- Backward error result: the Horner algorithm computes
  Horner(p, x) = ∑_{i=0}^{n} (1 + δ_i) a_i x^i, with |δ_i| ≤ γ_{2n} ≈ 2nu.
- Forward error result:
  c = p(x) − Horner(p, x) = −∑_{i=0}^{n} δ_i a_i x^i ⇒ |c| ≤ γ_{2n} ∑_{i=0}^{n} |a_i| |x|^i.
As the δ_i are unknown, classical error analysis does not solve our problem…
But we can do better thanks to Error-Free Transformations!
Error-free transformations
Error-Free Transformations (EFT) are properties and algorithms to compute the
rounding errors at the current working precision.
+ : (x, y) = 2Sum(a, b), 6 flops — Knuth (74),
    such that a + b = x + y and x = a ⊕ b;
× : (x, y) = 2Prod(a, b), 17 flops — Veltkamp, Dekker (71),
    such that a × b = x + y and x = a ⊗ b;
with a, b, x, y ∈ F.
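These two EFTs can be written directly in Python on IEEE-754 binary64; this is a sketch with our own names `two_sum`, `split`, `two_prod`, following the algorithms of Knuth and Veltkamp/Dekker cited above.

```python
from fractions import Fraction  # only used to check exactness below

def two_sum(a, b):
    """Knuth (74), 6 flops: x = a (+) b and a + b = x + y exactly."""
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y

def split(a):
    """Veltkamp splitting: a = hi + lo, each half short enough that
    products of halves are exact in binary64."""
    c = 134217729.0 * a          # 2**27 + 1 for binary64
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Veltkamp/Dekker (71), 17 flops: x = a (*) b and a*b = x + y exactly."""
    x = a * b
    ah, al = split(a)
    bh, bl = split(b)
    y = al * bl - (((x - ah * bh) - al * bh) - ah * bl)
    return x, y

# The transformations are error free: a + b and a * b are represented
# exactly as unevaluated sums x + y of two floats.
x, y = two_prod(1.1, 2.3)
assert Fraction(1.1) * Fraction(2.3) == Fraction(x) + Fraction(y)
```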
How to compute the correcting term ĉ thanks to EFT?
Algorithm (Horner scheme)
function r̂_0 = Horner(p, x)
  r̂_n = a_n
  for i = n − 1 : −1 : 0
    p̂_i = r̂_{i+1} ⊗ x   % rounding error π_i ∈ F ⇒ p̂_i = r̂_{i+1} x − π_i
    r̂_i = p̂_i ⊕ a_i     % rounding error σ_i ∈ F ⇒ r̂_i = p̂_i + a_i − σ_i
  end

For i = n − 1 : −1 : 0, we thus have r̂_i = r̂_{i+1} x + a_i − π_i − σ_i. Since r̂_n = a_n,
Horner(p, x) = ∑_{i=0}^{n} (a_i − π_i − σ_i) x^i, with π_n = σ_n = 0.
How to compute the correcting term ĉ thanks to EFT?
ĉ is an approximation of the forward error c = r − r̂.
Express r̂ w.r.t. the data and the elementary rounding errors: the Horner algorithm computes
Horner(p, x) = ∑_{i=0}^{n} (a_i − π_i − σ_i) x^i, with π_n = σ_n = 0.
Deduce an expression for the forward error c = r − r̂:
c = p(x) − Horner(p, x) = ∑_{i=0}^{n−1} (π_i + σ_i) x^i.
If we manage to find a closed-form formula for c w.r.t.
- the data,
- the elementary rounding errors (exactly computable thanks to EFT),
then we can easily compute a correcting term ĉ.
An EFT for the Horner Algorithm
We have
c = ∑_{i=0}^{n−1} (π_i + σ_i) x^i = (p_π + p_σ)(x),
with p_π(X) = ∑ π_i X^i and p_σ(X) = ∑ σ_i X^i.

Algorithm (EFT for Horner)
function [r̂_0, p_π, p_σ] = EFTHorner(p, x)
  r̂_n = a_n
  for i = n − 1 : −1 : 0
    [p̂_i, π_i] = 2Prod(r̂_{i+1}, x)
    [r̂_i, σ_i] = 2Sum(p̂_i, a_i)
    p_π[i] = π_i; p_σ[i] = σ_i
  end

This algorithm is an EFT for the Horner algorithm since
p(x) = Horner(p, x) + (p_π + p_σ)(x), the last term being exactly the forward error c.
Compensated Horner Algorithm
Since c = (p_π + p_σ)(x), we compute the correcting term ĉ as Horner(p_π ⊕ p_σ, x).
Algorithm (Compensated Horner scheme)
function r̄ = CompHorner(p, x)
  [r̂, p_π, p_σ] = EFTHorner(p, x)   % r̂ = Horner(p, x)
  ĉ = Horner(p_π ⊕ p_σ, x)
  r̄ = r̂ ⊕ ĉ
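The two algorithms above can be sketched in runnable Python (IEEE-754 binary64); the helper names, and the convention that `a[i]` is the coefficient of x^i, are our assumptions, not the authors' reference code.

```python
def two_sum(a, b):
    # Knuth's 2Sum: a + b = x + y exactly, with x = fl(a + b)
    x = a + b
    z = x - a
    return x, (a - (x - z)) + (b - z)

def two_prod(a, b):
    # Veltkamp/Dekker's 2Prod: a * b = x + y exactly, with x = fl(a * b)
    x = a * b
    c = 134217729.0 * a; ah = c - (c - a); al = a - ah
    c = 134217729.0 * b; bh = c - (c - b); bl = b - bh
    return x, al * bl - (((x - ah * bh) - al * bh) - ah * bl)

def eft_horner(a, x):
    """EFT for Horner: returns (Horner(p, x), p_pi, p_sigma) with
    p(x) = Horner(p, x) + (p_pi + p_sigma)(x) exactly."""
    n = len(a) - 1
    r = a[n]
    p_pi, p_sigma = [0.0] * n, [0.0] * n
    for i in range(n - 1, -1, -1):
        p, p_pi[i] = two_prod(r, x)
        r, p_sigma[i] = two_sum(p, a[i])
    return r, p_pi, p_sigma

def comp_horner(a, x):
    """Compensated Horner: r_bar = r_hat (+) Horner(p_pi (+) p_sigma, x)."""
    r, p_pi, p_sigma = eft_horner(a, x)
    c = 0.0
    for i in range(len(a) - 2, -1, -1):   # degree n-1 error polynomial
        c = c * x + (p_pi[i] + p_sigma[i])
    return r + c

# (x - 1)^3 at x = 1 + 2**-20: the exact value 2**-60 is tiny and the
# condition number is huge, yet the compensated result stays accurate.
p = [-1.0, 3.0, -3.0, 1.0]
print(comp_horner(p, 1.0 + 2.0 ** -20), 2.0 ** -60)
```

On such ill-conditioned inputs the plain Horner scheme may lose every digit, while the compensated result behaves as predicted by the bound u + γ_{2n}² cond(p, x).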
Next question: how to prove something about the accuracy of r̄?
(Accuracy of the result ≲ u + condition number × u².)
Difficult to answer in a general manner…
Sketch of proof for the compensated Horner algorithm
The key "ingredient" here is the EFT for the Horner algorithm,
p(x) = Horner(p, x) + (p_π + p_σ)(x), where the last term is exactly the forward error c.
Sketch of proof for the compensated Horner algorithm
We recall:
- r̂ = Horner(p, x) is the computed evaluation,
- c = (p_π + p_σ)(x) is the forward error in r̂,
- ĉ = Horner(p_π ⊕ p_σ, x) is the computed correcting term,
- r̄ = r̂ ⊕ ĉ is the compensated result.
Since the compensated result is r̄ = r̂ ⊕ ĉ,
|p(x) − r̄| = |p(x) − (1 + ε)(r̂ + ĉ)|, with |ε| ≤ u.
Using the EFT for the Horner scheme, r̂ = p(x) − c, so
|p(x) − r̄| = |p(x) − (1 + ε)(p(x) + ĉ − c)| ≤ u |p(x)| + (1 + u) |c − ĉ|.
How can we bound |c − ĉ|?
Sketch of proof for the compensated Horner algorithm
|c − ĉ| is the forward error in ĉ = Horner(p_π ⊕ p_σ, x). Then
|c − ĉ| ≤ γ_{2n−1} (p̃_π + p̃_σ)(x),
with (p̃_π + p̃_σ)(x) = ∑_{i=0}^{n−1} |π_i + σ_i| |x|^i. We bound this term generously as follows:
(p̃_π + p̃_σ)(x) ≤ γ_{2n} p̃(x),
with p̃(x) = ∑_{i=0}^{n} |a_i| |x|^i. Therefore,
|c − ĉ| ≤ γ_{2n−1} γ_{2n} p̃(x).
Nota:
γ_k = ku / (1 − ku) = ku + O(u²).
Sketch of proof for the compensated Horner algorithm
Then,
|p(x) − r̄| ≤ u |p(x)| + (1 + u) γ_{2n−1} γ_{2n} p̃(x) ≤ u |p(x)| + γ_{2n}² p̃(x).
Now we turn to relative accuracy, and we obtain the following theorem.

Theorem. Given a polynomial p with floating point coefficients and a floating point value x, let r̄ be the compensated evaluation of p(x) computed with CompHorner. Then
|p(x) − r̄| / |p(x)| ≤ u + γ_{2n}² cond(p, x).

Again, γ_{2n}² ≈ 4n²u².
Accuracy of the result ≲ u + condition number × u².
[Figure: Accuracy of polynomial evaluation (n = 25) — relative forward error vs. condition number for Horner, DDHorner, and CompHorner, with reference lines u, 1/u, 1/u², γ_{2n} cond, and u + γ_{2n}² cond.]
Conclusion
We have recalled how to define a correcting term:
correcting term = approximation of the forward error.
Compensating the Horner algorithm improves the accuracy:
- the accuracy of the compensated result is the same as if the result were computed in twice the working precision;
- remark: this is also true for compensated summation and compensated triangular system solving.
Faithful rounding with the CHS
Faithful rounding
Definition
A floating point number x̂ is said to be a faithful rounding of a real number x if
- either x̂ = x,
- or x̂ is one of the two floating point neighbours of x.
The error bound
|CompHorner(p, x) − p(x)| / |p(x)| ≤ u + γ_{2n}² cond(p, x), with γ_{2n}² ≈ 4n²u²,
is too large for reasoning about faithful rounding.
An a posteriori test
We recall:
- r̄ = CompHorner(p, x) is the compensated result,
- c = (p_π + p_σ)(x) is the exact (real) correcting term for r̂,
- ĉ = Horner(p_π ⊕ p_σ, x) is the computed (floating point) correcting term.
The following error bound on the computed correcting term holds:
|c − ĉ| ≤ fl( γ_{2n−1} Horner(|p_π| ⊕ |p_σ|, |x|) / (1 − 2(n + 1)u) ) =: β.
Then, we can perform a dynamic test for faithful rounding:

Theorem. β < (u/2) |r̄| ⇒ |c − ĉ| < (u/2) |r̄| ⇒ r̄ is a faithful rounding of p(x).
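The dynamic test can be sketched in Python on top of the EFT for Horner. The helper names, and the exact way the bound β is evaluated, are our assumptions; this only checks the sufficient condition β < (u/2)|r̄| and is not the authors' reference code.

```python
def two_sum(a, b):
    # a + b = x + y exactly, x = fl(a + b)
    x = a + b
    z = x - a
    return x, (a - (x - z)) + (b - z)

def two_prod(a, b):
    # a * b = x + y exactly, x = fl(a * b)
    x = a * b
    c = 134217729.0 * a; ah = c - (c - a); al = a - ah
    c = 134217729.0 * b; bh = c - (c - b); bl = b - bh
    return x, al * bl - (((x - ah * bh) - al * bh) - ah * bl)

def comp_horner_is_faithful(a, x):
    """Compensated Horner with the a posteriori test: returns
    (r_bar, certified); certified=True means the dynamic test proved
    that r_bar is a faithful rounding of p(x)."""
    u = 2.0 ** -53
    n = len(a) - 1
    r = a[n]
    errs = []
    for i in range(n - 1, -1, -1):     # EFT for Horner
        p, pi = two_prod(r, x)
        r, sigma = two_sum(p, a[i])
        errs.append((pi, sigma))
    c = 0.0   # Horner(p_pi + p_sigma, x)
    b = 0.0   # Horner(|p_pi| + |p_sigma|, |x|)
    for pi, sigma in errs:             # coefficients high to low already
        c = c * x + (pi + sigma)
        b = b * abs(x) + (abs(pi) + abs(sigma))
    r_bar = r + c
    g = (2 * n - 1) * u / (1 - (2 * n - 1) * u)   # gamma_{2n-1}
    beta = g * b / (1 - 2 * (n + 1) * u)
    return r_bar, beta < 0.5 * u * abs(r_bar)
```

For a well-conditioned evaluation the test typically succeeds; very close to a root the condition number blows up, the test may fail, and then nothing is certified.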
An a posteriori test
[Figure: Accuracy of polynomial evaluation with the compensated Horner scheme (n = 50) — relative forward error vs. condition number, with reference lines u, 1/u, 1/u², u + γ_{2n}² cond, and a threshold line involving (1 − u)/(2 + u), u, and γ_{2n−2}.]
Practical efficiency
Overhead to obtain more accuracy
Theoretical ratios (flops):
CompHorner / Horner ≈ 10.5    CompHornerIsFaith / Horner ≈ 13    DDHorner / Horner ≈ 14

Some practical ratios (running times²):

                                  CompHorner   CompHornerIsFaith   DDHorner
                                  / Horner     / Horner            / Horner
Pentium 4, 3.00 GHz   GCC 3.3.5   3.77         5.52                10.00
                      ICC 9.1     3.06         5.31                 8.88
Athlon 64, 2.00 GHz   GCC 4.0.1   3.89         4.43                10.48
Itanium 2, 1.4 GHz    GCC 3.4.6   3.64         4.59                 5.50
                      ICC 9.1     1.87         2.30                 8.78
                                  ∼ 2–4        ∼ 4–6               ∼ 5–10

² Average ratios for polynomials of degree 5 to 200.
Some comparisons
How do the more accurate algorithms compare to each other?

                                  CompHornerIsFaith   DDHorner       DDHorner
                                  / CompHorner        / CompHorner   / CompHornerIsFaith
Pentium 4, 3.00 GHz   GCC 3.3.5   1.46                2.66           1.89
                      ICC 9.1     1.71                2.92           1.72
Athlon 64, 2.00 GHz   GCC 4.0.1   1.14                2.70           2.38
Itanium 2, 1.4 GHz    GCC 3.4.6   1.27                1.51           1.27
                      ICC 9.1     1.24                4.67           3.84
                                  ≤ 2                 ∼ 2–5          ∼ 2–5
What is Instruction-Level Parallelism?
"All processors since about 1985, including those in the embedded space, use pipelining to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction-level parallelism (ILP) since the instructions can be evaluated in parallel." (Hennessy & Patterson)
A wide range of techniques have been developed to exploit the parallelism available among instructions (pipelining, superscalar architectures...).
Amount of ILP available in a code:
- if two instructions are parallel, they can execute simultaneously in a pipeline;
- if two instructions are dependent, they must be executed in order.
How to determine whether an instruction is dependent on another?
Dependences between instructions
Three different types of dependences:
- control dependences,
- name dependences,
- data dependences (or true dependences).
A control dependence determines the ordering of an instruction with respect to a branch instruction.
A name dependence occurs when two instructions use the same register or memory location (name), but there is in fact no flow of data between the instructions associated with that name.
But here we are mainly interested in data dependences.
Data dependences
An instruction i is data dependent on an instruction j if either
- instruction j produces a result that may be used by instruction i, or
- there exists a chain of dependences of the first type between i and j.
[Figure: a direct data dependence j → i, and a dependence chain j → k → i.]
If two instructions are data dependent, they cannot execute simultaneously.
Dependences are properties of programs: the presence of a data dependence in an instruction sequence reflects a data dependence in the source code.
What about the dependences in CompHorner and DDHorner?
Difference between DDHorner and CompHorner
function r̄ = CompHorner(P, x)
  s_n = a_n; c_n = 0
  for i = n − 1 : −1 : 0
    [p_i, π_i] = 2Prod(s_{i+1}, x)
    [s_i, σ_i] = 2Sum(p_i, a_i)
    c_i = c_{i+1} ⊗ x ⊕ (π_i ⊕ σ_i)
  end
  r̄ = s_0 ⊕ c_0

function r̂ = DDHorner(P, x)
  sh_n = a_n; sl_n = 0
  for i = n − 1 : −1 : 0
    %% [ph_i, pl_i] = [sh_{i+1}, sl_{i+1}] ⊗ x
    [th, tl] = 2Prod(sh_{i+1}, x)
    tl = sl_{i+1} ⊗ x ⊕ tl
    [ph_i, pl_i] = Fast2Sum(th, tl)
    %% [sh_i, sl_i] = [ph_i, pl_i] ⊕ a_i
    [th, tl] = 2Sum(ph_i, a_i)
    tl = tl ⊕ pl_i
    [sh_i, sl_i] = Fast2Sum(th, tl)
  end
  r̂ = sh_0
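For comparison, here is a Python sketch of DDHorner (function names are ours): the running sum is carried as a double-double pair (sh, sl) and renormalized with Fast2Sum after every operation, which serializes the loop body.

```python
def two_sum(a, b):
    # a + b = x + y exactly, x = fl(a + b)
    x = a + b
    z = x - a
    return x, (a - (x - z)) + (b - z)

def fast_two_sum(a, b):
    # Dekker's Fast2Sum: 3 flops, assumes |a| >= |b|
    s = a + b
    return s, b - (s - a)

def two_prod(a, b):
    # a * b = x + y exactly, x = fl(a * b)
    x = a * b
    c = 134217729.0 * a; ah = c - (c - a); al = a - ah
    c = 134217729.0 * b; bh = c - (c - b); bl = b - bh
    return x, al * bl - (((x - ah * bh) - al * bh) - ah * bl)

def dd_horner(a, x):
    """Horner scheme carrying a double-double accumulator (sh, sl)."""
    sh, sl = a[-1], 0.0
    for ai in reversed(a[:-1]):
        # [ph, pl] = [sh, sl] * x
        th, tl = two_prod(sh, x)
        tl = sl * x + tl
        ph, pl = fast_two_sum(th, tl)
        # [sh, sl] = [ph, pl] + ai
        th, tl = two_sum(ph, ai)
        tl = tl + pl
        sh, sl = fast_two_sum(th, tl)
    return sh
```

Note how each line of the loop body consumes the result of the previous one: the renormalizations create a long dependence chain, which is exactly the ILP argument developed below.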
Difference between DDHorner and CompHorner
function r̄ = CompHorner'(P, x)
  s_n = a_n; c_n = 0
  for i = n − 1 : −1 : 0
    [p_i, π_i] = 2Prod(s_{i+1}, x)
    t_i = c_{i+1} ⊗ x ⊕ π_i
    [s_i, σ_i] = 2Sum(p_i, a_i)
    c_i = t_i ⊕ σ_i
  end
  r̄ = s_0 ⊕ c_0

function r̂ = DDHorner(P, x)
  sh_n = a_n; sl_n = 0
  for i = n − 1 : −1 : 0
    %% [ph_i, pl_i] = [sh_{i+1}, sl_{i+1}] ⊗ x
    [th, tl] = 2Prod(sh_{i+1}, x)
    tl = sl_{i+1} ⊗ x ⊕ tl
    [ph_i, pl_i] = Fast2Sum(th, tl)
    %% [sh_i, sl_i] = [ph_i, pl_i] ⊕ a_i
    [th, tl] = 2Sum(ph_i, a_i)
    tl = tl ⊕ pl_i
    [sh_i, sl_i] = Fast2Sum(th, tl)
  end
  r̂ = sh_0
Difference in the data flow graph
[Figure: data flow graphs of the inner loops of CompHorner and DDHorner; the nodes are floating point operations (+, −, ×) and Veltkamp splittings, the edges are data dependences.]

We represent all the data dependences in the inner loop of each algorithm.
There is more parallelism among floating point operations in CompHorner than in DDHorner.
Thus more potential ILP, and greater practical performance!