78
Rounding Errors in Complex Floating-Point Multiplication Colin Percival [email protected] IRMACS, Simon Fraser University Rounding Errors in Complex Floating-Point Multiplication – p.1/24

Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival [email protected] IRMACS, Simon Fraser University

Embed Size (px)

Citation preview

Page 1: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Rounding Errors in ComplexFloating-Point Multiplication

Colin Percival

[email protected]

IRMACS, Simon Fraser University

Rounding Errors in Complex Floating-Point Multiplication – p.1/24

Page 2: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Review of Floating-Point Arithmetic

Floating-point values are expressed as a sign,significand, and exponent, e.g.,

(−1)s · βe−B · m

where {s, e, m} ⊂ N, 0 < e < E, βt−1 ≤ m < βt, andβ, t, B, E are parameters of the floating-point systembeing used.

As a special case, e = 0 and m = βt−1 represents ±0.

In IEEE 754 “double precision” arithmetic, β = 2, t = 53,B = 1075 and E = 2047.

There are also denormals, infinities, and NaNs, but innumerical code they usually never occur.

Rounding Errors in Complex Floating-Point Multiplication – p.2/24

Page 3: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Review of Floating-Point Arithmetic

Floating-point values are expressed as a sign,significand, and exponent, e.g.,

(−1)s · βe−B · m

where {s, e, m} ⊂ N, 0 < e < E, βt−1 ≤ m < βt, andβ, t, B, E are parameters of the floating-point systembeing used.

As a special case, e = 0 and m = βt−1 represents ±0.

In IEEE 754 “double precision” arithmetic, β = 2, t = 53,B = 1075 and E = 2047.

There are also denormals, infinities, and NaNs, but innumerical code they usually never occur.

Rounding Errors in Complex Floating-Point Multiplication – p.2/24

Page 4: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Review of Floating-Point Arithmetic

Floating-point values are expressed as a sign,significand, and exponent, e.g.,

(−1)s · βe−B · m

where {s, e, m} ⊂ N, 0 < e < E, βt−1 ≤ m < βt, andβ, t, B, E are parameters of the floating-point systembeing used.

As a special case, e = 0 and m = βt−1 represents ±0.

In IEEE 754 “double precision” arithmetic, β = 2, t = 53,B = 1075 and E = 2047.

There are also denormals, infinities, and NaNs, but innumerical code they usually never occur.

Rounding Errors in Complex Floating-Point Multiplication – p.2/24

Page 5: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Review of Floating-Point Arithmetic

Floating-point values are expressed as a sign,significand, and exponent, e.g.,

(−1)s · βe−B · m

where {s, e, m} ⊂ N, 0 < e < E, βt−1 ≤ m < βt, andβ, t, B, E are parameters of the floating-point systembeing used.

As a special case, e = 0 and m = βt−1 represents ±0.

In IEEE 754 “double precision” arithmetic, β = 2, t = 53,B = 1075 and E = 2047.

There are also denormals, infinities, and NaNs, but innumerical code they usually never occur.

Rounding Errors in Complex Floating-Point Multiplication – p.2/24

Page 6: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Terminology

Denote by ⊕, , and ⊗ the results of roundedfloating-point addition, subtraction, and multiplication,and define the “unit in the last place” ulp(x) for x 6= 0 asthe (unique) power of β such that

βt−1 ≤ |x| /ulp(x) < βt

and ulp(0) = 0.

Note that for x 6= 0,

ulp((−1)s · βe−B · m) = βe−B

and ulp(x) ≤ x · β1−t.

Also define ε = 1

2ulp(1) = 1

2β1−t.

Rounding Errors in Complex Floating-Point Multiplication – p.3/24

Page 7: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Terminology

Denote by ⊕, , and ⊗ the results of roundedfloating-point addition, subtraction, and multiplication,and define the “unit in the last place” ulp(x) for x 6= 0 asthe (unique) power of β such that

βt−1 ≤ |x| /ulp(x) < βt

and ulp(0) = 0.

Note that for x 6= 0,

ulp((−1)s · βe−B · m) = βe−B

and ulp(x) ≤ x · β1−t.

Also define ε = 1

2ulp(1) = 1

2β1−t.

Rounding Errors in Complex Floating-Point Multiplication – p.3/24

Page 8: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Terminology

Denote by ⊕, , and ⊗ the results of roundedfloating-point addition, subtraction, and multiplication,and define the “unit in the last place” ulp(x) for x 6= 0 asthe (unique) power of β such that

βt−1 ≤ |x| /ulp(x) < βt

and ulp(0) = 0.

Note that for x 6= 0,

ulp((−1)s · βe−B · m) = βe−B

and ulp(x) ≤ x · β1−t.

Also define ε = 1

2ulp(1) = 1

2β1−t.

Rounding Errors in Complex Floating-Point Multiplication – p.3/24

Page 9: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Floating-Point Rounding Errors

IEEE 754 requires that arithmetic operations produceresults which are exactly rounded, i.e., the same as ifthe values were computed to infinite precision prior torounding.

Rounding Errors in Complex Floating-Point Multiplication – p.4/24

Page 10: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Floating-Point Rounding Errors

IEEE 754 requires that arithmetic operations produceresults which are exactly rounded, i.e., the same as ifthe values were computed to infinite precision prior torounding.

In round-to-nearest mode on IEEE 754 systems,

|(x + y) − (x ⊕ y)| ≤ 1

2ulp(x + y) < ε(x + y)

|(x − y) − (x y)| ≤ 1

2ulp(x − y) < ε(x − y)

|(xy) − (x ⊗ y)| ≤ 1

2ulp(xy) < ε(xy)

Rounding Errors in Complex Floating-Point Multiplication – p.4/24

Page 11: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Floating-Point Rounding Errors

IEEE 754 requires that arithmetic operations produceresults which are exactly rounded, i.e., the same as ifthe values were computed to infinite precision prior torounding.

In round-to-nearest mode on IEEE 754 systems,

|(x + y) − (x ⊕ y)| ≤ 1

2ulp(x + y) < ε(x + y)

|(x − y) − (x y)| ≤ 1

2ulp(x − y) < ε(x − y)

|(xy) − (x ⊗ y)| ≤ 1

2ulp(xy) < ε(xy)

Rounding Errors in Complex Floating-Point Multiplication – p.4/24

Page 12: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Complex Rounding Errors

For complex z0 = a0 + ib0, z1 = a1 + ib1, if we computez2 = a2 + ib2 = (a0 ⊕ a1) + i(b0 ⊕ b1), then

|(z0 + z1) − z2| =√

((a0 + a1) − a2)2 + ((b0 + b1) − b2)2

<√

(ε |a0 + a1|)2 + (ε |b0 + b1|)2 = ε |z0 + z1|

Problem: If we compute

x2 = (a0 ⊗ a1) (b0 ⊗ b1)

y2 = (a0 ⊗ b1) ⊕ (b0 ⊗ a1),

what is the smallest α such that |z0z1 − z2| < εα |z0z1| ?

Rounding Errors in Complex Floating-Point Multiplication – p.5/24

Page 13: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Complex Rounding Errors

For complex z0 = a0 + ib0, z1 = a1 + ib1, if we computez2 = a2 + ib2 = (a0 ⊕ a1) + i(b0 ⊕ b1), then

|(z0 + z1) − z2| =√

((a0 + a1) − a2)2 + ((b0 + b1) − b2)2

<√

(ε |a0 + a1|)2 + (ε |b0 + b1|)2 = ε |z0 + z1|

Problem: If we compute

x2 = (a0 ⊗ a1) (b0 ⊗ b1)

y2 = (a0 ⊗ b1) ⊕ (b0 ⊗ a1),

what is the smallest α such that |z0z1 − z2| < εα |z0z1| ?

Rounding Errors in Complex Floating-Point Multiplication – p.5/24

Page 14: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Motivation

The Fast Fourier Transform makes large polynomial(and integer) arithmetic practical.

Finite Field arithmetic is slow on most CPUs.Floating-Point arithmetic has rounding errors.

Theorem [Percival, 2002]: The FFT allows accuratecomputation of the cyclic convolution z = x ∗ y of twovectors of length N = 2n of Gaussian integers if

|x| · |y| · ((1 + ε)3n(1 + εα)3n+1(1 + β)3n − 1) <1

2

where εα is the maximum relative error of complexmultiplication, and β is the maximum error in theprecomputed complex roots of unity used.

Rounding Errors in Complex Floating-Point Multiplication – p.6/24

Page 15: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Motivation

The Fast Fourier Transform makes large polynomial(and integer) arithmetic practical.

Finite Field arithmetic is slow on most CPUs.

Floating-Point arithmetic has rounding errors.

Theorem [Percival, 2002]: The FFT allows accuratecomputation of the cyclic convolution z = x ∗ y of twovectors of length N = 2n of Gaussian integers if

|x| · |y| · ((1 + ε)3n(1 + εα)3n+1(1 + β)3n − 1) <1

2

where εα is the maximum relative error of complexmultiplication, and β is the maximum error in theprecomputed complex roots of unity used.

Rounding Errors in Complex Floating-Point Multiplication – p.6/24

Page 16: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Motivation

The Fast Fourier Transform makes large polynomial(and integer) arithmetic practical.

Finite Field arithmetic is slow on most CPUs.Floating-Point arithmetic has rounding errors.

Theorem [Percival, 2002]: The FFT allows accuratecomputation of the cyclic convolution z = x ∗ y of twovectors of length N = 2n of Gaussian integers if

|x| · |y| · ((1 + ε)3n(1 + εα)3n+1(1 + β)3n − 1) <1

2

where εα is the maximum relative error of complexmultiplication, and β is the maximum error in theprecomputed complex roots of unity used.

Rounding Errors in Complex Floating-Point Multiplication – p.6/24

Page 17: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Motivation

The Fast Fourier Transform makes large polynomial(and integer) arithmetic practical.

Finite Field arithmetic is slow on most CPUs.Floating-Point arithmetic has rounding errors.

Theorem [Percival, 2002]: The FFT allows accuratecomputation of the cyclic convolution z = x ∗ y of twovectors of length N = 2n of Gaussian integers if

|x| · |y| · ((1 + ε)3n(1 + εα)3n+1(1 + β)3n − 1) <1

2

where εα is the maximum relative error of complexmultiplication, and β is the maximum error in theprecomputed complex roots of unity used.

Rounding Errors in Complex Floating-Point Multiplication – p.6/24

Page 18: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Previous Bounds

We can take α =√

8 [Higham, Accuracy and Stability ofNumerical Algorithms].

We can take α =√

16/3 [Olver, 1986].

We can take α =√

5 [Percival, 2002].

Conjectured based on comparing the results ofsingle-precision and double-precision complexmultiplication of several million randomly choseninputs.Unfortunately the proof was wrong...... and it took five years before anyone noticed!

Rounding Errors in Complex Floating-Point Multiplication – p.7/24

Page 19: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Previous Bounds

We can take α =√

8 [Higham, Accuracy and Stability ofNumerical Algorithms].

We can take α =√

16/3 [Olver, 1986].

We can take α =√

5 [Percival, 2002].

Conjectured based on comparing the results ofsingle-precision and double-precision complexmultiplication of several million randomly choseninputs.Unfortunately the proof was wrong...... and it took five years before anyone noticed!

Rounding Errors in Complex Floating-Point Multiplication – p.7/24

Page 20: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Previous Bounds

We can take α =√

8 [Higham, Accuracy and Stability ofNumerical Algorithms].

We can take α =√

16/3 [Olver, 1986].

We can take α =√

5 [Percival, 2002].

Conjectured based on comparing the results ofsingle-precision and double-precision complexmultiplication of several million randomly choseninputs.Unfortunately the proof was wrong...... and it took five years before anyone noticed!

Rounding Errors in Complex Floating-Point Multiplication – p.7/24

Page 21: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Previous Bounds

We can take α =√

8 [Higham, Accuracy and Stability ofNumerical Algorithms].

We can take α =√

16/3 [Olver, 1986].

We can take α =√

5 [Percival, 2002].

Conjectured based on comparing the results ofsingle-precision and double-precision complexmultiplication of several million randomly choseninputs.

Unfortunately the proof was wrong...... and it took five years before anyone noticed!

Rounding Errors in Complex Floating-Point Multiplication – p.7/24

Page 22: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Previous Bounds

We can take α =√

8 [Higham, Accuracy and Stability ofNumerical Algorithms].

We can take α =√

16/3 [Olver, 1986].

We can take α =√

5 [Percival, 2002].

Conjectured based on comparing the results ofsingle-precision and double-precision complexmultiplication of several million randomly choseninputs.Unfortunately the proof was wrong...

... and it took five years before anyone noticed!

Rounding Errors in Complex Floating-Point Multiplication – p.7/24

Page 23: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Previous Bounds

We can take α =√

8 [Higham, Accuracy and Stability ofNumerical Algorithms].

We can take α =√

16/3 [Olver, 1986].

We can take α =√

5 [Percival, 2002].

Conjectured based on comparing the results ofsingle-precision and double-precision complexmultiplication of several million randomly choseninputs.Unfortunately the proof was wrong...... and it took five years before anyone noticed!

Rounding Errors in Complex Floating-Point Multiplication – p.7/24

Page 24: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Error Bound

Theorem 1. [Brent, Percival, Zimmermann, 2006]

Let z0 = a0 + b0i and z1 = a1 + b1i, with a0, b0, a1, b1

floating-point values with t-digit base-β significands, and

z2 = ((a0 ⊗ a1) (b0 ⊗ b1)) + ((a0 ⊗ b1) ⊕ (b0 ⊗ a1))i.

Providing that no overflow or underflow occur, no denormal

values are produced, arithmetic results are correctly rounded

to a nearest representable value, z0z1 6= 0, and βt ≥ 25,

|z0z1 − z2| <1

2β1−t |z0z1| = ε

√5 |z0z1| .

Rounding Errors in Complex Floating-Point Multiplication – p.8/24

Page 25: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Equivalent Inputs

Without loss of generality, we can assume the greatestpossible relative error occurs when

0 ≤ a0, b0, a1, b1, by multiplying by powers of i,

b0b1 ≤ a0a1, by taking complex congugates andmultiplying z0, z1 by i,

b0a1 ≤ a0b1, by swapping z0 and z1,1

2≤ a0 < 1, by multiplying z0 by powers of 2, and

1

2≤ a0a1 < 1, by multiplying z1 by powers of 2,

Rounding Errors in Complex Floating-Point Multiplication – p.9/24

Page 26: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Equivalent Inputs

Without loss of generality, we can assume the greatestpossible relative error occurs when

0 ≤ a0, b0, a1, b1, by multiplying by powers of i,

b0b1 ≤ a0a1, by taking complex congugates andmultiplying z0, z1 by i,

b0a1 ≤ a0b1, by swapping z0 and z1,1

2≤ a0 < 1, by multiplying z0 by powers of 2, and

1

2≤ a0a1 < 1, by multiplying z1 by powers of 2,

Rounding Errors in Complex Floating-Point Multiplication – p.9/24

Page 27: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Equivalent Inputs

Without loss of generality, we can assume the greatestpossible relative error occurs when

0 ≤ a0, b0, a1, b1, by multiplying by powers of i,

b0b1 ≤ a0a1, by taking complex congugates andmultiplying z0, z1 by i,

b0a1 ≤ a0b1, by swapping z0 and z1,

1

2≤ a0 < 1, by multiplying z0 by powers of 2, and

1

2≤ a0a1 < 1, by multiplying z1 by powers of 2,

Rounding Errors in Complex Floating-Point Multiplication – p.9/24

Page 28: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Equivalent Inputs

Without loss of generality, we can assume the greatestpossible relative error occurs when

0 ≤ a0, b0, a1, b1, by multiplying by powers of i,

b0b1 ≤ a0a1, by taking complex congugates andmultiplying z0, z1 by i,

b0a1 ≤ a0b1, by swapping z0 and z1,1

2≤ a0 < 1, by multiplying z0 by powers of 2, and

1

2≤ a0a1 < 1, by multiplying z1 by powers of 2,

Rounding Errors in Complex Floating-Point Multiplication – p.9/24

Page 29: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Equivalent Inputs

Without loss of generality, we can assume the greatestpossible relative error occurs when

0 ≤ a0, b0, a1, b1, by multiplying by powers of i,

b0b1 ≤ a0a1, by taking complex congugates andmultiplying z0, z1 by i,

b0a1 ≤ a0b1, by swapping z0 and z1,1

2≤ a0 < 1, by multiplying z0 by powers of 2, and

1

2≤ a0a1 < 1, by multiplying z1 by powers of 2,

none of which affect the resulting relative error.

Rounding Errors in Complex Floating-Point Multiplication – p.9/24

Page 30: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Imaginary Error

To bound the imaginary error |=(z0z1 − z2)|, we considertwo cases:

Case I1: ulp(a0b1 + b0a1) < ulp(a0 ⊗ b1 + b0 ⊗ a1)

Case I2: ulp(a0 ⊗ b1 + b0 ⊗ a1) ≤ ulp(a0b1 + b0a1)

In each case, we find that

|(a0 ⊗ b1 + b0 ⊗ a1) − ((a0 ⊗ b1) ⊕ (b0 ⊗ a1))| < ε·(a0b1+b0a1)

and thus

|=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1).

Rounding Errors in Complex Floating-Point Multiplication – p.10/24

Page 31: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Imaginary Error

To bound the imaginary error |=(z0z1 − z2)|, we considertwo cases:

Case I1: ulp(a0b1 + b0a1) < ulp(a0 ⊗ b1 + b0 ⊗ a1)

Case I2: ulp(a0 ⊗ b1 + b0 ⊗ a1) ≤ ulp(a0b1 + b0a1)

In each case, we find that

|(a0 ⊗ b1 + b0 ⊗ a1) − ((a0 ⊗ b1) ⊕ (b0 ⊗ a1))| < ε·(a0b1+b0a1)

and thus

|=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1).

Rounding Errors in Complex Floating-Point Multiplication – p.10/24

Page 32: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Imaginary Error

To bound the imaginary error |=(z0z1 − z2)|, we considertwo cases:

Case I1: ulp(a0b1 + b0a1) < ulp(a0 ⊗ b1 + b0 ⊗ a1)

Case I2: ulp(a0 ⊗ b1 + b0 ⊗ a1) ≤ ulp(a0b1 + b0a1)

In each case, we find that

|(a0 ⊗ b1 + b0 ⊗ a1) − ((a0 ⊗ b1) ⊕ (b0 ⊗ a1))| < ε·(a0b1+b0a1)

and thus

|=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1).

Rounding Errors in Complex Floating-Point Multiplication – p.10/24

Page 33: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Imaginary Error

To bound the imaginary error |=(z0z1 − z2)|, we considertwo cases:

Case I1: ulp(a0b1 + b0a1) < ulp(a0 ⊗ b1 + b0 ⊗ a1)

Case I2: ulp(a0 ⊗ b1 + b0 ⊗ a1) ≤ ulp(a0b1 + b0a1)

In each case, we find that

|(a0 ⊗ b1 + b0 ⊗ a1) − ((a0 ⊗ b1) ⊕ (b0 ⊗ a1))| < ε·(a0b1+b0a1)

and thus

|=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1).

Rounding Errors in Complex Floating-Point Multiplication – p.10/24

Page 34: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Real Error

To bound the real error |<(z0z1 − z2)|, we will considerfour cases:

Case R1: ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Case R2: ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Case R3: ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Case R4:ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Since we have assumed that 0 ≤ b0b1 ≤ a0a1, these fourcases cover all possible inputs.

Once we have bounds on the real error for each ofthese cases, we can combine them with the imaginaryerror bound to obtain a bound on the complex error.

Rounding Errors in Complex Floating-Point Multiplication – p.11/24

Page 35: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Real Error

To bound the real error |<(z0z1 − z2)|, we will considerfour cases:

Case R1: ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Case R2: ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Case R3: ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Case R4:ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Since we have assumed that 0 ≤ b0b1 ≤ a0a1, these fourcases cover all possible inputs.

Once we have bounds on the real error for each ofthese cases, we can combine them with the imaginaryerror bound to obtain a bound on the complex error.

Rounding Errors in Complex Floating-Point Multiplication – p.11/24

Page 36: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Real Error

To bound the real error |<(z0z1 − z2)|, we will considerfour cases:

Case R1: ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Case R2: ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Case R3: ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Case R4:ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Since we have assumed that 0 ≤ b0b1 ≤ a0a1, these fourcases cover all possible inputs.

Once we have bounds on the real error for each ofthese cases, we can combine them with the imaginaryerror bound to obtain a bound on the complex error.

Rounding Errors in Complex Floating-Point Multiplication – p.11/24

Page 37: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Real Error

To bound the real error |<(z0z1 − z2)|, we will considerfour cases:

Case R1: ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Case R2: ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Case R3: ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Case R4:ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Since we have assumed that 0 ≤ b0b1 ≤ a0a1, these fourcases cover all possible inputs.

Once we have bounds on the real error for each ofthese cases, we can combine them with the imaginaryerror bound to obtain a bound on the complex error.

Rounding Errors in Complex Floating-Point Multiplication – p.11/24

Page 38: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Real Error

To bound the real error |<(z0z1 − z2)|, we will considerfour cases:

Case R1: ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Case R2: ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Case R3: ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Case R4: ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Since we have assumed that 0 ≤ b0b1 ≤ a0a1, these fourcases cover all possible inputs.

Once we have bounds on the real error for each ofthese cases, we can combine them with the imaginaryerror bound to obtain a bound on the complex error.

Rounding Errors in Complex Floating-Point Multiplication – p.11/24

Page 39: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Real Error

To bound the real error |<(z0z1 − z2)|, we will considerfour cases:

Case R1: ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Case R2: ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Case R3: ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Case R4: ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Since we have assumed that 0 ≤ b0b1 ≤ a0a1, these fourcases cover all possible inputs.

Once we have bounds on the real error for each ofthese cases, we can combine them with the imaginaryerror bound to obtain a bound on the complex error.

Rounding Errors in Complex Floating-Point Multiplication – p.11/24

Page 40: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Real Error

To bound the real error |<(z0z1 − z2)|, we will considerfour cases:

Case R1: ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Case R2: ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Case R3: ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Case R4: ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Since we have assumed that 0 ≤ b0b1 ≤ a0a1, these fourcases cover all possible inputs.

Once we have bounds on the real error for each ofthese cases, we can combine them with the imaginaryerror bound to obtain a bound on the complex error.

Rounding Errors in Complex Floating-Point Multiplication – p.11/24

Page 41: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R1

ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

|<(z0z1 − z2)| ≤1

2ulp(b0b1) +

1

2ulp(a0a1) +

1

2ulp(a0 ⊗ a1 − b0 ⊗ b1)

≤ 1

2ulp(b0b1) +

2

2ulp(a0 ⊗ a1 − b0 ⊗ b1)

≤ ε · (2a0a1 − b0b1) + 2ε2 |z0z1|

Rounding Errors in Complex Floating-Point Multiplication – p.12/24

Page 42: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R1

ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Note that

1

2ulp(a0 ⊗ a1 − b0 ⊗ b1) < ε · (a0a1 − b0b1 + ε(a0a1 + b0b1))

|<(z0z1 − z2)| ≤1

2ulp(b0b1) +

1

2ulp(a0a1) +

1

2ulp(a0 ⊗ a1 − b0 ⊗ b1)

≤ 1

2ulp(b0b1) +

2

2ulp(a0 ⊗ a1 − b0 ⊗ b1)

≤ ε · (2a0a1 − b0b1) + 2ε2 |z0z1|

Rounding Errors in Complex Floating-Point Multiplication – p.12/24

Page 43: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R1

ulp(b0b1) ≤ ulp(a0a1) ≤ ulp(a0 ⊗ a1 − b0 ⊗ b1)

Note that

1

2ulp(a0 ⊗ a1 − b0 ⊗ b1) < ε · (a0a1 − b0b1 + ε(a0a1 + b0b1))

Consequently,

|<(z0z1 − z2)| ≤1

2ulp(b0b1) +

1

2ulp(a0a1) +

1

2ulp(a0 ⊗ a1 − b0 ⊗ b1)

≤ 1

2ulp(b0b1) +

2

2ulp(a0 ⊗ a1 − b0 ⊗ b1)

≤ ε · (2a0a1 − b0b1) + 2ε2 |z0z1|

Rounding Errors in Complex Floating-Point Multiplication – p.12/24

Page 44: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R2

ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Rounding Errors in Complex Floating-Point Multiplication – p.13/24

Page 45: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R2

ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Note that ulp(x) < ulp(y) implies ulp(x) ≤ 1

2ulp(y), i.e.,

ulp(b0b1) ≤1

2ulp(a0 ⊗ a1 − b0 ⊗ b1)

ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤1

2ulp(a0a1)

Rounding Errors in Complex Floating-Point Multiplication – p.13/24

Page 46: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R2

ulp(b0b1) < ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(a0a1)

Note that ulp(x) < ulp(y) implies ulp(x) ≤ 1

2ulp(y), i.e.,

ulp(b0b1) ≤1

2ulp(a0 ⊗ a1 − b0 ⊗ b1)

ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤1

2ulp(a0a1)

Consequently,

|<(z0z1 − z2)| ≤(

1

8+

1

4+

1

2

)

· ulp(a0a1)

≤ ε ·(

7

4a0a1

)

Rounding Errors in Complex Floating-Point Multiplication – p.13/24

Page 47: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R3

ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Rounding Errors in Complex Floating-Point Multiplication – p.14/24

Page 48: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R3

ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Note that

(a0 ⊗ a1) (b0 ⊗ b1) = a0 ⊗ a1 − b0 ⊗ b1

ulp(b0b1) ≤1

2ulp(a0a1)

Rounding Errors in Complex Floating-Point Multiplication – p.14/24

Page 49: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R3

ulp(a0 ⊗ a1 − b0 ⊗ b1) ≤ ulp(b0b1) < ulp(a0a1)

Note that

(a0 ⊗ a1) (b0 ⊗ b1) = a0 ⊗ a1 − b0 ⊗ b1

ulp(b0b1) ≤1

2ulp(a0a1)

Consequently,

|<(z0z1 − z2)| ≤(

1

4+

1

2

)

ulp(a0a1)

≤ ε ·(

3

2a0a1

)

Rounding Errors in Complex Floating-Point Multiplication – p.14/24

Page 50: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R4

ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Rounding Errors in Complex Floating-Point Multiplication – p.15/24

Page 51: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R4

ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Note that

(a0 ⊗ a1) (b0 ⊗ b1) = a0 ⊗ a1 − b0 ⊗ b1

Rounding Errors in Complex Floating-Point Multiplication – p.15/24

Page 52: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Case R4

ulp(a0 ⊗ a1 − b0 ⊗ b1) < ulp(b0b1) = ulp(a0a1)

Note that

(a0 ⊗ a1) (b0 ⊗ b1) = a0 ⊗ a1 − b0 ⊗ b1

Consequently,

|<(z0z1 − z2)| ≤1

2ulp(a0a1) +

1

2ulp(b0b1)

≤ ε · (a0a1 + b0b1)

Rounding Errors in Complex Floating-Point Multiplication – p.15/24

Page 53: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Absolute Complex Error

Imaginary error: |=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1)

Case R1: |<(z0z1 − z2)| ≤ ε · (2a0a1 − b0b1) + 2ε2 |z0z1|=⇒ |z0z1 − z2| < ε

(

32/7 + 2ε)

|z0z1|

Case R2: |<(z0z1 − z2)| ≤ ε ·(

7

4a0a1

)

=⇒ |z0z1 − z2| < ε√

1024/207 |z0z1|Case R3: |<(z0z1 − z2)| ≤ ε ·

(

3

2a0a1

)

=⇒ |z0z1 − z2| < ε√

256/55 |z0z1|Case R4: |<(z0z1 − z2)| ≤ ε · (a0a1 + b0b1)

=⇒ |z0z1 − z2| < ε√

5 |z0z1|

Rounding Errors in Complex Floating-Point Multiplication – p.16/24

Page 54: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Absolute Complex Error

Imaginary error: |=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1)

Case R1: |<(z0z1 − z2)| ≤ ε · (2a0a1 − b0b1) + 2ε2 |z0z1|=⇒ |z0z1 − z2| < ε

(

32/7 + 2ε)

|z0z1|

Case R2: |<(z0z1 − z2)| ≤ ε ·(

7

4a0a1

)

=⇒ |z0z1 − z2| < ε√

1024/207 |z0z1|Case R3: |<(z0z1 − z2)| ≤ ε ·

(

3

2a0a1

)

=⇒ |z0z1 − z2| < ε√

256/55 |z0z1|Case R4: |<(z0z1 − z2)| ≤ ε · (a0a1 + b0b1)

=⇒ |z0z1 − z2| < ε√

5 |z0z1|

Rounding Errors in Complex Floating-Point Multiplication – p.16/24

Page 55: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Absolute Complex Error

Imaginary error: |=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1)

Case R1: |<(z0z1 − z2)| ≤ ε · (2a0a1 − b0b1) + 2ε2 |z0z1|=⇒ |z0z1 − z2| < ε

(

32/7 + 2ε)

|z0z1|

Case R2: |<(z0z1 − z2)| ≤ ε ·(

7

4a0a1

)

=⇒ |z0z1 − z2| < ε√

1024/207 |z0z1|

Case R3: |<(z0z1 − z2)| ≤ ε ·(

3

2a0a1

)

=⇒ |z0z1 − z2| < ε√

256/55 |z0z1|Case R4: |<(z0z1 − z2)| ≤ ε · (a0a1 + b0b1)

=⇒ |z0z1 − z2| < ε√

5 |z0z1|

Rounding Errors in Complex Floating-Point Multiplication – p.16/24

Page 56: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Absolute Complex Error

Imaginary error: |=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1)

Case R1: |<(z0z1 − z2)| ≤ ε · (2a0a1 − b0b1) + 2ε2 |z0z1|=⇒ |z0z1 − z2| < ε

(

32/7 + 2ε)

|z0z1|

Case R2: |<(z0z1 − z2)| ≤ ε ·(

7

4a0a1

)

=⇒ |z0z1 − z2| < ε√

1024/207 |z0z1|Case R3: |<(z0z1 − z2)| ≤ ε ·

(

3

2a0a1

)

=⇒ |z0z1 − z2| < ε√

256/55 |z0z1|

Case R4: |<(z0z1 − z2)| ≤ ε · (a0a1 + b0b1)

=⇒ |z0z1 − z2| < ε√

5 |z0z1|

Rounding Errors in Complex Floating-Point Multiplication – p.16/24

Page 57: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Absolute Complex Error

Imaginary error: |=(z0z1 − z2)| < ε · (2a0b1 + 2b0a1)

Case R1: |<(z0z1 − z2)| ≤ ε · (2a0a1 − b0b1) + 2ε2 |z0z1|=⇒ |z0z1 − z2| < ε

(

32/7 + 2ε)

|z0z1|

Case R2: |<(z0z1 − z2)| ≤ ε ·(

7

4a0a1

)

=⇒ |z0z1 − z2| < ε√

1024/207 |z0z1|Case R3: |<(z0z1 − z2)| ≤ ε ·

(

3

2a0a1

)

=⇒ |z0z1 − z2| < ε√

256/55 |z0z1|Case R4: |<(z0z1 − z2)| ≤ ε · (a0a1 + b0b1)

=⇒ |z0z1 − z2| < ε√

5 |z0z1|

Rounding Errors in Complex Floating-Point Multiplication – p.16/24

Page 58: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Worst-Case Multiplicands for β = 2

Theorem 2. Assume that

|z0z1 − z2||z0z1|

> ε√

5 − nε > ε · max(

1024/207,√

32/7 + 2ε)

for some positive integer n. Then a0 6= b0, a1 6= b1, and

a0a1 = 1/2 + (jaa + 1/2)ε + kaaε2 a0b1 = 1/2 + (jab + 1/2)ε + kabε

2

b0a1 = 1/2 + (jba + 1/2)ε + kbaε2 b0b1 = 1/2 + (jbb + 1/2)ε + kbbε

2

for some integers jxy, kxy satisfying

0 ≤ jaa, jab, jba, jbb <n

4, |kaa| , |kbb| < n, |kab| , |kba| <

n

2

Rounding Errors in Complex Floating-Point Multiplication – p.17/24

Page 59: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Sketch of Proof

From the argument in Theorem 1, case R4 must hold:ulp(b0b1) = ulp(a0a1) = ε, and there is no rounding errorintroduced in the subtraction.

Juggling of inequalities leads to

1 ≤ |z0z1|2 ≤ 5

5 − nε

ε2(5 − nε) < |z0z1 − z2|2 < 5ε2

Considering what these bounds imply about a0a1,|a0a1 − a0 ⊗ a1|, et cetera, provides the result desired.

Rounding Errors in Complex Floating-Point Multiplication – p.18/24

Page 60: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Sketch of Proof

From the argument in Theorem 1, case R4 must hold:ulp(b0b1) = ulp(a0a1) = ε, and there is no rounding errorintroduced in the subtraction.

Juggling of inequalities leads to

1 ≤ |z0z1|2 ≤ 5

5 − nε

ε2(5 − nε) < |z0z1 − z2|2 < 5ε2

Considering what these bounds imply about a0a1,|a0a1 − a0 ⊗ a1|, et cetera, provides the result desired.

Rounding Errors in Complex Floating-Point Multiplication – p.18/24

Page 61: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Sketch of Proof

From the argument in Theorem 1, case R4 must hold:ulp(b0b1) = ulp(a0a1) = ε, and there is no rounding errorintroduced in the subtraction.

Juggling of inequalities leads to

1 ≤ |z0z1|2 ≤ 5

5 − nε

ε2(5 − nε) < |z0z1 − z2|2 < 5ε2

Considering what these bounds imply about a0a1,|a0a1 − a0 ⊗ a1|, et cetera, provides the result desired.

Rounding Errors in Complex Floating-Point Multiplication – p.18/24

Page 62: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Computation is Useful!

At this point, I turned to computation.

Pruned exhaustive search of IEEE 754single-precision inputs, using the results of Theorem2 to eliminate most of the search space.Searching took about 5 CPU-hours.Worst case inputs:

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

This suggests a more general form...

Rounding Errors in Complex Floating-Point Multiplication – p.19/24

Page 63: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Computation is Useful!

At this point, I turned to computation.

Pruned exhaustive search of IEEE 754single-precision inputs, using the results of Theorem2 to eliminate most of the search space.

Searching took about 5 CPU-hours.Worst case inputs:

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

This suggests a more general form...

Rounding Errors in Complex Floating-Point Multiplication – p.19/24

Page 64: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Computation is Useful!

At this point, I turned to computation.

Pruned exhaustive search of IEEE 754single-precision inputs, using the results of Theorem2 to eliminate most of the search space.Searching took about 5 CPU-hours.

Worst case inputs:

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

This suggests a more general form...

Rounding Errors in Complex Floating-Point Multiplication – p.19/24

Page 65: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Computation is Useful!

At this point, I turned to computation.

Pruned exhaustive search of IEEE 754single-precision inputs, using the results of Theorem2 to eliminate most of the search space.Searching took about 5 CPU-hours.Worst case inputs:

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

This suggests a more general form...

Rounding Errors in Complex Floating-Point Multiplication – p.19/24

Page 66: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Computation is Useful!

At this point, I turned to computation.

Pruned exhaustive search of IEEE 754single-precision inputs, using the results of Theorem2 to eliminate most of the search space.Searching took about 5 CPU-hours.Worst case inputs:

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

This suggests a more general form...

Rounding Errors in Complex Floating-Point Multiplication – p.19/24

Page 67: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Worst-Case Multiplicands for β = 2

Theorem 3. Assume that

|z0z1 − z2||z0z1|

> ε√

5 − nε > ε · max(

1024/207,√

32/7 + 2ε)

for some n < 1

4ε−1/2 and ε ≤ 2−6. Then there exist integers c0, d0,

α0, β0, c1, d1, α1, β1 satisfying

z0 =c0

d0

(1 + i + (α0 + β0i)ε) z1 =c1

d1

(1 + i + (α1 + β1i)ε)

min(α0, β0) + min(α1, β1) ≥ 0 2c0c1 = d0d1 < 3n

|α0α1| , |α0β1| , |β0α1| , |β0β1| < n1

2< a0, b0, a1, b1 < 1.

Rounding Errors in Complex Floating-Point Multiplication – p.20/24

Page 68: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Worst-Case Multiplicands for IEEE 754

Now an “exhaustive” search is far less exhausting.

The IEEE 754 single-precision worst-case inputs are

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

and have a relative error of ε ·√

4.9999899864.

The IEEE 754 double-precision worst-case inputs are

a0 =3

4(1 + 4ε) b0 =

3

4a1 =

2

3(1 + 7ε) b1 =

2

3(1 + ε)

and have a relative error of ε ·√

4.9999999999999893.

Clearly ε√

5 is the best (practical) bound possible.

Rounding Errors in Complex Floating-Point Multiplication – p.21/24

Page 69: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Worst-Case Multiplicands for IEEE 754

Now an “exhaustive” search is far less exhausting.

The IEEE 754 single-precision worst-case inputs are

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

and have a relative error of ε ·√

4.9999899864.

The IEEE 754 double-precision worst-case inputs are

a0 =3

4(1 + 4ε) b0 =

3

4a1 =

2

3(1 + 7ε) b1 =

2

3(1 + ε)

and have a relative error of ε ·√

4.9999999999999893.

Clearly ε√

5 is the best (practical) bound possible.

Rounding Errors in Complex Floating-Point Multiplication – p.21/24

Page 70: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Worst-Case Multiplicands for IEEE 754

Now an “exhaustive” search is far less exhausting.

The IEEE 754 single-precision worst-case inputs are

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

and have a relative error of ε ·√

4.9999899864.

The IEEE 754 double-precision worst-case inputs are

a0 =3

4(1 + 4ε) b0 =

3

4a1 =

2

3(1 + 7ε) b1 =

2

3(1 + ε)

and have a relative error of ε ·√

4.9999999999999893.

Clearly ε√

5 is the best (practical) bound possible.

Rounding Errors in Complex Floating-Point Multiplication – p.21/24

Page 71: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Worst-Case Multiplicands for IEEE 754

Now an “exhaustive” search is far less exhausting.

The IEEE 754 single-precision worst-case inputs are

a0 =3

4b0 =

3

4(1 − 4ε) a1 =

2

3(1 + 11ε) b1 =

2

3(1 + 5ε)

and have a relative error of ε ·√

4.9999899864.

The IEEE 754 double-precision worst-case inputs are

a0 =3

4(1 + 4ε) b0 =

3

4a1 =

2

3(1 + 7ε) b1 =

2

3(1 + ε)

and have a relative error of ε ·√

4.9999999999999893.

Clearly ε√

5 is the best (practical) bound possible.

Rounding Errors in Complex Floating-Point Multiplication – p.21/24

Page 72: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Roots of Unity

We have the best possible error bound onmultiplication, but for FFTs we still need a bound on theerrors in the precomputed roots of unity.

Even if your CPU provides exactly roundedtranscendental functions, cos(2πk/2n) + i sin(2πk/2n) stillsuffers from rounding in the value of π used and themultiplication k ⊗ π, in addition to the two trigonometricevaluations.

Many FFT implementations use shockingly inaccurateiterations to compute roots of unity.

It is possible to compute the 2nth roots of unity in27

32· 2n + O(n) FLOPS with a maximum error < 2ε.

... I need to find time to write this paper some day.

Rounding Errors in Complex Floating-Point Multiplication – p.22/24

Page 73: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Roots of Unity

We have the best possible error bound onmultiplication, but for FFTs we still need a bound on theerrors in the precomputed roots of unity.

Even if your CPU provides exactly roundedtranscendental functions, cos(2πk/2n) + i sin(2πk/2n) stillsuffers from rounding in the value of π used and themultiplication k ⊗ π, in addition to the two trigonometricevaluations.

Many FFT implementations use shockingly inaccurateiterations to compute roots of unity.

It is possible to compute the 2nth roots of unity in27

32· 2n + O(n) FLOPS with a maximum error < 2ε.

... I need to find time to write this paper some day.

Rounding Errors in Complex Floating-Point Multiplication – p.22/24

Page 74: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Roots of Unity

We have the best possible error bound onmultiplication, but for FFTs we still need a bound on theerrors in the precomputed roots of unity.

Even if your CPU provides exactly roundedtranscendental functions, cos(2πk/2n) + i sin(2πk/2n) stillsuffers from rounding in the value of π used and themultiplication k ⊗ π, in addition to the two trigonometricevaluations.

Many FFT implementations use shockingly inaccurateiterations to compute roots of unity.

It is possible to compute the 2nth roots of unity in27

32· 2n + O(n) FLOPS with a maximum error < 2ε.

... I need to find time to write this paper some day.

Rounding Errors in Complex Floating-Point Multiplication – p.22/24

Page 75: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Roots of Unity

We have the best possible error bound onmultiplication, but for FFTs we still need a bound on theerrors in the precomputed roots of unity.

Even if your CPU provides exactly roundedtranscendental functions, cos(2πk/2n) + i sin(2πk/2n) stillsuffers from rounding in the value of π used and themultiplication k ⊗ π, in addition to the two trigonometricevaluations.

Many FFT implementations use shockingly inaccurateiterations to compute roots of unity.

It is possible to compute the 2nth roots of unity in27

32· 2n + O(n) FLOPS with a maximum error < 2ε.

... I need to find time to write this paper some day.

Rounding Errors in Complex Floating-Point Multiplication – p.22/24

Page 76: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

Roots of Unity

We have the best possible error bound onmultiplication, but for FFTs we still need a bound on theerrors in the precomputed roots of unity.

Even if your CPU provides exactly roundedtranscendental functions, cos(2πk/2n) + i sin(2πk/2n) stillsuffers from rounding in the value of π used and themultiplication k ⊗ π, in addition to the two trigonometricevaluations.

Many FFT implementations use shockingly inaccurateiterations to compute roots of unity.

It is possible to compute the 2nth roots of unity in27

32· 2n + O(n) FLOPS with a maximum error < 2ε.

... I need to find time to write this paper some day.

Rounding Errors in Complex Floating-Point Multiplication – p.22/24

Page 77: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

A Historical Note

“Indeed, in unpublished work R.P. Brent hasdemonstrated that in base 2, for example, [the errorterm] can be reduced to

√5 . . . ”

— F.W.J. Olver, 1986

Rounding Errors in Complex Floating-Point Multiplication – p.23/24

Page 78: Rounding Errors in Complex Floating-Point Multiplication · Rounding Errors in Complex Floating-Point Multiplication Colin Percival cperciva@irmacs.sfu.ca IRMACS, Simon Fraser University

References

N.J. Higham, Accuracy and Stability of NumericalAlgorithms, Second Edition, SIAM, 2002.

C. Percival, Rapid multiplication modulo the sum anddifference of highly composite numbers, Math. Comp.72 (2002), 387–395.

R.P. Brent, C. Percival, P. Zimmermann, Error boundson complex floating-point multiplication, Math. Comp.,to appear.

Rounding Errors in Complex Floating-Point Multiplication – p.24/24