

International Journal of Computer Mathematics, Vol. 81, No. 5, May 2004, pp. 595–605. DOI: 10.1080/00207160410001684235

COMPUTATIONS THAT REQUIRE HIGHER THAN DOUBLE PRECISION FOR ROBUST AND EXACT DECISION MAKING

SUDEBKUMAR PRASANT PAL∗, RAKESH KUMAR KOUL, FRAHAD MUSADEEKH, P. H. D. RAMAKRISHNA and HIRONMAY BASU

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721302, India

∗ Corresponding author. E-mail: [email protected]

(Revised 7 July 2003; In final form 17 December 2003)

Consider the problem of deciding the relative orientations of objects undergoing multiple translations and rotations. Such an orientation test involves the computation of expressions based on arithmetic operations, square roots and trigonometric functions. The computation of the signs of such expressions using double precision floating-point arithmetic in modern computers may result in errors. In this article we demonstrate the existence of examples where double precision is not sufficient to compute the correct sign of an expression. We consider (i) simple expressions involving only the four basic arithmetic operations, (ii) expressions involving the square-root function and (iii) expressions representing orientation tests in two and three dimensions involving objects undergoing arbitrary rotations by angles given in radians, thereby requiring the computation of trigonometric functions. We develop a system that uses the requisite high precision for computing the correct sign of such expressions. The system uses our floating-point filter called L-filter and the bigfloat extended precision package in LEDA (Library of Efficient Data Types and Algorithms).

Keywords: Floating-point filter; Exact computation; Geometric transformations; Collision detection

C.R. Categories: G.1.0; I.3.5

1 INTRODUCTION

Computations in science and engineering often require precise computation of arithmetic expressions. In some cases, a limited degree of error is permissible. Given the permissible error bound, expressions can be evaluated with requisite high precision to meet the error bound requirement. Another problem is that of determining whether an expression E evaluates to less than a constant V. This is equivalent to checking whether E − V is negative. In such computations, the signs of expressions are more important than the actual values of the expressions. In this article, we concentrate on the problem of correctly determining the signs of expressions using requisite high precision. Software for predicting collisions in dynamic environments needs to compute the relative orientations of moving objects. Consider the computation of relative orientations of objects undergoing multiple translations and rotations. Such an orientation test requires the computation of the signs of expressions based on the four basic arithmetic operations, square roots and trigonometric functions. The computation of such expressions using double precision floating-point arithmetic may result in the accumulation of round-off errors. We demonstrate the existence of examples where double precision is not sufficient to compute the correct signs of expressions. Incorrect computation of the sign of such an expression may result in wrong decisions about the relative orientation between transformed or moving objects in two and three dimensions.





In order to guarantee correct computation of relative orientations, we use high-precision floating-point computations. We develop a system that uses requisite higher precision for computing the correct sign of an expression. The system uses our floating-point filter called L-filter [1, 2] and the bigfloat extended precision package in LEDA [3].

One of the main contributions of our work is that we have considered transformations involving rotations by arbitrary angles (given in radians) for two- and three-dimensional spaces. Rotations are critical in many applications and sequences of rotations are likely to result in the accumulation of large round-off errors. We provide arbitrary rotation angles in radians and wish to compute rotations about arbitrary axes in three-dimensional space. Our L-filter supports orientation tests performed with objects that have undergone arbitrarily long sequences of rotation transformations, interspersed with translations. Such a facility is likely to have great utility in simulating large motion planning and collision detection applications.

In this article we demonstrate different types of examples of failure using double precision. Section 2 considers examples involving only arithmetic operations and square roots. Section 3 presents examples in two dimensions involving the computation of the rotation transformation, thereby requiring the evaluation of trigonometric functions. Section 4 demonstrates examples involving rotation transformations in three-dimensional space, requiring computations of trigonometric functions. The problems in these sections can be solved correctly using our floating-point filter. Section 5 briefly discusses our floating-point filter (L-filter). Section 6 concludes with a discussion on applications and future research directions.

2 EXAMPLES USING SIMPLE ARITHMETIC

2.1 Cube Example

Consider the expression:

E = f^3 − g^3 − (f^2 + g^2 + f × g) + 0.5

The value of E is +0.5 for any combination of values of f and g such that f − g = 1. When we computed the expression on an Intel Pentium-II machine with inputs f = 10,000,000 and g = 9,999,999, we got the result −0.5. When we computed the same expression on SGI Octane, Ultra-SPARC and DEC platforms, we got a result of −64640.5. Using our floating-point filter L-filter [1], we got the correct result of +0.5 at a higher precision.
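
For illustration, the following minimal C++ sketch (not part of the original system) evaluates the cube expression in plain double precision. The exact value is +0.5, but the intermediate cubes (around 10^21) exceed what a 53-bit mantissa can represent exactly, so the printed value and sign depend on the platform and compiler, as the measurements above show.

    #include <cstdio>

    int main() {
        // Inputs from the cube example: f - g = 1, so algebraically E = +0.5.
        double f = 10000000.0;   // 1e7
        double g =  9999999.0;   // 1e7 - 1
        double E = f * f * f - g * g * g - (f * f + g * g + f * g) + 0.5;
        // The intermediate cubes need about 70 bits of mantissa; a 53-bit double
        // rounds them, and the massive cancellation can flip the computed sign.
        std::printf("E = %.17g\n", E);
        return 0;
    }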

2.2 Example Using the Fourth Power

Consider the expression E = (x^4 − y^4) − (x^3 + x^2 y + x y^2 + y^3) − 1. For x = 1,000,000, y = 999,999, we have x − y = 1. Since (x^4 − y^4)/(x − y) = x^3 + x^2 y + x y^2 + y^3, we observe that E should be equal to −1. Computation using double precision gave us a result of 101,799,935 on Ultra-SPARC, DEC and SGI Octane machines. On an Intel machine we got a result of 22,527. Using MuPAD [4], we got the correct result. Our floating-point filter L-filter [1] gave the correct result −1 at a higher precision.
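
The double-precision evaluation can be contrasted with an exact integer evaluation of the same expression. The sketch below uses the 128-bit integer type available as a GCC/Clang extension, which is not something the paper uses, purely to verify the algebraic value of −1.

    #include <cstdio>

    int main() {
        // Double-precision evaluation of E = (x^4 - y^4) - (x^3 + x^2*y + x*y^2 + y^3) - 1.
        double x = 1000000.0, y = 999999.0;
        double Ed = (x * x * x * x - y * y * y * y)
                  - (x * x * x + x * x * y + x * y * y + y * y * y) - 1.0;

        // Exact check with 128-bit integers (GCC/Clang extension): since x - y = 1,
        // x^4 - y^4 = x^3 + x^2*y + x*y^2 + y^3, so the exact value of E is -1.
        __int128 X = 1000000, Y = 999999;
        __int128 Ee = (X * X * X * X - Y * Y * Y * Y)
                    - (X * X * X + X * X * Y + X * Y * Y + Y * Y * Y) - 1;

        std::printf("double: %.17g\n", Ed);            // typically a large, wrong value
        std::printf("exact : %lld\n", (long long)Ee);  // -1
        return 0;
    }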


2.3 Example Using a Determinant

Consider another example where we evaluate a 3 × 3 determinant. Computation of a determinant is important in deciding the orientation of a point with respect to a line or plane. Consider the determinant

D = | 1    50000     50000  |
    | 1    50000     0.0005 |
    | 1    0.0005    0.0005 |.

Let Exp = D × 10,000. D can be simplified as D = −(25 × 10^8 − 50 + 25 × 10^−8). Therefore, Exp = −10,000 × (25 × 10^8 − 50 + 25 × 10^−8), i.e., Exp = −25 × 10^12 + 5 × 10^5 − 25 × 10^−4, so Exp + 25 × 10^12 − 5 × 10^5 + 0.0025 = 0. Therefore, Exp + 25 × 10^12 − 5 × 10^5 + 0.003 > 0 (since 0.003 > 0.0025), i.e., h > 0, where h = Exp + 25 × 10^12 − 5 × 10^5 + 0.003. However, when we evaluated the expression h using double precision, we got an incorrect result of h = −0.00090625. We got the correct positive sign for h, computing with L-filter at a higher precision.
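
The following C++ sketch evaluates D by cofactor expansion along the first row (one of several possible evaluation orders, each with its own rounding behaviour) and then forms h. Exactly h = 0.0005 > 0, but the cancellation against terms of size 2.5 × 10^13 can drive the double-precision result negative.

    #include <cstdio>

    // 3x3 determinant by cofactor expansion along the first row.
    static double det3(double a11, double a12, double a13,
                       double a21, double a22, double a23,
                       double a31, double a32, double a33) {
        return a11 * (a22 * a33 - a23 * a32)
             - a12 * (a21 * a33 - a23 * a31)
             + a13 * (a21 * a32 - a22 * a31);
    }

    int main() {
        double D = det3(1.0, 50000.0, 50000.0,
                        1.0, 50000.0, 0.0005,
                        1.0, 0.0005,  0.0005);
        double Exp = D * 10000.0;
        // Exactly h = 0.0005, yet the computed sign may come out negative.
        double h = Exp + 25e12 - 5e5 + 0.003;
        std::printf("D = %.17g, h = %.17g\n", D, h);
        return 0;
    }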

2.4 Square-Root Example

Consider evaluating a logical expression, Expression = (√x + √y == √(x + y + 2√(xy))). Let L denote the left-hand side of Expression and R denote the right-hand side, i.e., L = √x + √y and R = √(x + y + 2√(xy)). By definition, Expression should always be TRUE. However, due to errors generated in evaluating the square roots, the value of Expression as computed at double precision may not be TRUE for some input combinations.

Extending this example further, we added a small positive term 2 × 10^−16 to R. Let g represent the expression 100 × [L − (R + 2 × 10^−16)]. The value of g should be less than zero. We have an example for x = 2 and y = 3, where g is computed as positive using double precision. The value of g computed using double precision on Ultra-SPARC, DEC, SGI Octane and Intel P-II machines was 0.44 × 10^−13. Using MuPAD we got the correct value −1.9984014437068253 × 10^−14. Using L-filter we computed the values of g for precision values 53, 62 and 63 as 4.440892 × 10^−14, −9.97466 × 10^−15 and −1.001802 × 10^−14, with error bounds 9.02883 × 10^−12, 1.76344 × 10^−14 and 8.81722 × 10^−15, respectively. With 53-bit precision, L-filter computation gives |value(g)| < error(g). For precision up to 62 bits, L-filter could not decide sign(g). For 63-bit precision, we have |value(g)| > error(g). At this precision, L-filter is able to decide the correct sign(g).
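
A direct C++ transcription of this test follows; exactly L equals R, so g = −2 × 10^−14, but the rounding of the square roots can make the computed g positive.

    #include <cmath>
    #include <cstdio>

    int main() {
        // L = sqrt(x) + sqrt(y), R = sqrt(x + y + 2*sqrt(x*y)); algebraically L == R.
        double x = 2.0, y = 3.0;
        double L = std::sqrt(x) + std::sqrt(y);
        double R = std::sqrt(x + y + 2.0 * std::sqrt(x * y));
        // Exactly g = 100 * (-2e-16) = -2e-14, yet the computed value may be positive.
        double g = 100.0 * (L - (R + 2e-16));
        std::printf("L == R ? %d,  g = %.17g\n", (int)(L == R), g);
        return 0;
    }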

2.5 High-Power Expression

Consider E = (f^8 − g^8)/h^5 − q. Let f = 9,999,999, g = 9,999,998, h = 100,000,000, q = 7,999,991,601. When we computed this example on Ultra-SPARC, DEC, SGI Octane and Intel P-II machines using double precision, we got the incorrect value +0.6795034408569336. However, when we computed the same example using MuPAD, we got the answer as −0.9960800010499998. The actual sign of the expression is negative. Using L-filter, we computed the value of E at precisions 53, 57 and 58 as 0.6795025, −1.089965 and −0.9538518 with errors 17.7636, 1.11022 and 0.555111, respectively. For p = 58, we have |value(E)| > error(E). Thus, the precision required to find the correct sign of E using L-filter is 58.


2.6 Rump’s Example

Consider E = 333.75 b^6 + a^2 (11 a^2 b^2 − b^6 − 121 b^4 − 2) + 5.5 b^8 + a/(2b), where a = 77,617 and b = 33,096. Rump [5] evaluated this expression using FORTRAN on an IBM System 370 mainframe using single, double and extended precision. In all three cases, the computed value began with the digits +1.172603. However, the correct value (to 40 decimal digits) is −0.82739605994682136814116509509547981629199. We computed this expression on Ultra-SPARC, DEC and SGI Octane, getting a value near +1.1726039400531787. On Intel P-II we got the value +5.76461 × 10^17. With MuPAD we got the value −0.827396059946821. Using L-filter, we computed the value of E for precisions 53, 127 and 128. At 53-bit precision, we got the value of E as 1.17260394005318 with a huge error bound of 1.84585 × 10^22. For p = 127 and p = 128, we got computed values for the expression as −0.82739605994682136814116509548 (for both precision values), with differing error bounds of 0.977185 and 0.488592, respectively. For p = 128, we have |value(E)| > error(E). The precision at which L-filter can decide the sign of Rump's expression correctly is, therefore, 128.
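
The expression can be transcribed directly in C++ as a sketch; the huge powers of b (up to about 10^36) and the cancellation between them are what defeat double precision, and the value actually printed varies across platforms, as noted above.

    #include <cstdio>

    int main() {
        // Rump's expression with a = 77617, b = 33096; the correct value is
        // approximately -0.827396059946821368..., but double precision typically
        // reports a value near +1.1726 or another wrong result.
        double a = 77617.0, b = 33096.0;
        double b2 = b * b, b4 = b2 * b2, b6 = b4 * b2, b8 = b4 * b4;
        double a2 = a * a;
        double E = 333.75 * b6 + a2 * (11.0 * a2 * b2 - b6 - 121.0 * b4 - 2.0)
                 + 5.5 * b8 + a / (2.0 * b);
        std::printf("E = %.17g\n", E);
        return 0;
    }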

2.7 Arguments of Correctness

In the above examples we provide integer inputs that are exactly representable in floating-point. However, due to rounding errors that creep in during computation, double precision fails to decide the signs of expressions correctly.

In the square-root example in Section 2.4, we have inputs that are as small as x = 2 and y = 3. The inputs are integers, and are exactly representable in binary format. Therefore, there are no input errors. We have √x + √y = √(x + y + 2√(xy)); therefore the sign of the expression √x + √y − {√(x + y + 2√(xy)) + 2 × 10^−16} is negative. However, double precision computes the sign of this expression as positive. Using our L-filter, we have been able to get the correct sign of the expression. This example shows that double precision may fail to give the correct sign for certain simple expressions involving arithmetic operations on simple inputs.

In the cube example (Sec. 2.1), we are evaluating the expression f^3 − g^3 − (f^2 + g^2 + f × g) + 0.5. Since algebraically f − g = 1, we have (f^3 − g^3)/(f − g) = f^2 + g^2 + f × g. Therefore, the final value of the expression should be +0.5. Here, the denominator term is f − g = 1, so we rule out the possibility of division by a higher degree term. In this respect, this example is comparatively better than the example in Section 2.5, where we perform division by a term raised to the fifth power.

In the example in Section 2.2, we are evaluating the sign of the expression (x^4 − y^4) − (x^3 + x^2 y + x y^2 + y^3) − 1. Here x − y = 1, so we have (x^4 − y^4)/(x − y) = x^3 + x^2 y + x y^2 + y^3. Therefore, the actual value of the expression is −1. Here, we are using higher powers of the input variables x and y, as compared to those in the previous example. However, this example is better than other examples in the sense that here we are not performing any division.

In the example in Section 2.5, we are evaluating the sign of the expression E = (f^8 − g^8)/h^5 − q. For the particular input combination, we have (f^8 − g^8)/h^5 = 7999991600.00391999895, as computed by MuPAD [4]. Therefore, E = 7999991600.00391999895 − 7999991601, which is negative. However, double precision computes the sign of the expression as positive. The example suffers from the drawback that we perform a division by a term containing the fifth power of h.

In Rump's example (Sec. 2.6), we are performing only one benign division. In this respect, this example is worth consideration. The sign of this expression is negative. The sign of the expression as computed by MuPAD and our L-filter is negative. Double precision computes the sign incorrectly. If we compare Rump's example with the other examples in this section, we notice that in the cube example (Sec. 2.1) the maximum power computed is 3, and in the fourth-power example (Sec. 2.2) we compute up to a maximum power of 4. Moreover, we are not performing division in these examples. Therefore, these examples are better than Rump's example.



3 TWO-DIMENSIONAL EXAMPLES

In this section, we discuss examples involving two-dimensional transformations. These examples have practical significance in geometric problems like collision detection.

3.1 Four-Angles Example

We considered counterclockwise rotation of the point P(1.01, 1.01) by four angles about the origin. The first three angles were 0.78539816339744828 rad each and the fourth angle was 0.7853981633974485 rad. We tested the orientation of the rotated point P′ against the directed line segment from (0, 0) to (−2, −2). On Ultra-SPARC, SGI Octane and Intel 486, the point P(1.01, 1.01) was wrongly rotated to finally land exactly on the line x = y. The rotated point must actually lie below the line x = y (as shown in Fig. 1), because the sum of the four given angles is actually more than π and we are performing counterclockwise rotations by the four angles. L-filter decided the orientation of the rotated point correctly at 70-bit precision.
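
A C++ sketch of this test follows; it composes the four rotations in double precision and then takes the cross product that orients P′ with respect to the segment from (0, 0) to (−2, −2), as read from the text above. Exactly, the rotated point lies strictly below x = y, but the double-precision image may land exactly on the line or on the wrong side.

    #include <cmath>
    #include <cstdio>

    int main() {
        // Rotate P = (1.01, 1.01) counterclockwise by four angles whose sum
        // slightly exceeds pi (three are just below pi/4, the last just above).
        double angles[4] = {0.78539816339744828, 0.78539816339744828,
                            0.78539816339744828, 0.7853981633974485};
        double x = 1.01, y = 1.01;
        for (double t : angles) {
            double c = std::cos(t), s = std::sin(t);
            double xr = c * x - s * y;
            double yr = s * x + c * y;
            x = xr;
            y = yr;
        }
        // Orientation with respect to the directed segment (0,0) -> (-2,-2):
        // sign of the cross product (-2)*y' - (-2)*x' = 2*(x' - y'); exactly > 0.
        double orient = -2.0 * y + 2.0 * x;
        std::printf("P' = (%.17g, %.17g), orientation = %.17g\n", x, y, orient);
        return 0;
    }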

3.2 Two-Dimensional Example Involving Trigonometric Functions

As another example, consider a projectile that is projected with a velocity v at an angle θ. The horizontal range of the projectile is given by the expression:

R = (2 v^2 / g) sin θ cos θ = (v^2 / g) sin 2θ.

From the above relation, we note that the value of R at θ = 45° is double the value of R at θ = 15°. Now, let us consider two balls B1 and B2 of radii r1 and r2 as projectiles (see Fig. 2). Initially the two balls are exactly touching each other, so the distance between the centres of the two balls is r1 + r2.

FIGURE 1 The four angles add up to more than two right angles.


FIGURE 2 Number of bumps = 80, v = 3.3 m s^−1, r1 = 0.1 m, r2 = 0.1 m.

The ball B1 is projected with a velocity of v meters per second at an angle of θ1 radians, the decimal representation of which is slightly more than 15°. The ball B2 is projected with the same velocity of v meters per second at an angle of θ2 radians, the decimal representation of which is slightly more than 45°. We give n bumps to ball B2, and 2n bumps to the ball B1. Assume that there is no loss of energy during the bumps. The total distance covered by ball B1 (say d1) should have been exactly the same as that covered by B2 (say d2), had the angles of projection been exactly 15° and 45°, respectively. Since θ1 is slightly greater than 15°, B1 will cover a distance that is greater than the distance that the projectile might have covered, if it were projected at an angle of exactly 15°. Since θ2 is slightly greater than 45°, B2 will cover a distance that is smaller than the distance that the projectile might have covered, if it were projected at an angle of exactly 45° (note that the horizontal range is maximum at θ = 45°). Therefore, the distance D between the centres of the two balls at the end of all the bumps should be less than r1 + r2, i.e., the two balls should collide. D can be computed as D = (R2 + r1 + r2) − R1, where R1 (R2) is the total distance traversed horizontally by ball B1 (B2) in 2n (n) bumps. In other words, the collision condition is (R2 + r1 + r2) − R1 < r1 + r2, or simply R2 − R1 < 0. However, when we computed the expression Exp = R2 − R1 in double precision, we got a positive value. The result shows that double precision fails to detect a collision between the two balls. We obtained such an example for the following input set: θ1 = 0.26179938779915 (θ1 is slightly greater than 15°, so the range is a bit greater) and θ2 = 0.785398163397449 (θ2 is slightly greater than 45°, so the range is a bit smaller).
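
The sketch below computes the two total ranges and their difference Exp = R2 − R1 in double precision. The paper's figure gives 80 bumps and the velocity and angles above; the value g = 9.8 m/s^2 is an assumption here, since the excerpt does not state it, so the printed numbers are only indicative of the effect.

    #include <cmath>
    #include <cstdio>

    int main() {
        double v = 3.3;                     // m/s
        double g = 9.8;                     // m/s^2 (assumed value)
        int    n = 80;                      // bumps for B2; B1 gets 2n bumps
        double theta1 = 0.26179938779915;   // slightly more than 15 degrees
        double theta2 = 0.785398163397449;  // slightly more than 45 degrees

        // Total horizontal distances: range per bump is (v^2/g) * sin(2*theta).
        double R1 = 2.0 * n * (v * v / g) * std::sin(2.0 * theta1);
        double R2 =       n * (v * v / g) * std::sin(2.0 * theta2);
        // Exactly R2 - R1 < 0 (the balls collide); double precision may report
        // a tiny positive value and miss the collision.
        double Exp = R2 - R1;
        std::printf("R1 = %.15g, R2 = %.15g, Exp = %.17g\n", R1, R2, Exp);
        return 0;
    }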

The computed value of Exp using double precision on various hardware platforms like Ultra-SPARC, DEC, SGI Octane and Intel P-II was positive, about 0.17 × 10^−13. We computed the expression using the LEDA real filter [3, 6]. We declared all variables to be of the LEDA real number type and computed the sign and value of the expression Exp. The correct value of Exp is negative, indicating that the two balls should collide. When we used the sign() function of LEDA real to compute the sign of Exp, we got the correct negative sign. While computing the sign of Exp, we printed the values of the subexpressions (R1 and R2) and the expression Exp before and after sign comparison. The values are shown in Table I. Observe that although the value of Exp is negative before sign comparison (using the sign() function of LEDA real), better values of the expression Exp and the subexpressions R1 and R2 are recomputed by the LEDA real filter internally during the execution of sign().

We also computed Exp using the type LEDA bigfloat at higher precisions (see Tab. II). We observed that as the number of bits increased, the value of the computed expression Exp became more accurate.


TABLE I Values of R1, R2 and Exp computed using the LEDA real filter.

                          R1                                R2                                Exp
Before sign comparison    88.9073394495404                  89.1073394495408                  −1.27902815759588 × 10^−12
After sign comparison     88.9073394495414539440050493962   89.107339449541282470498379451    −1.71484608900191 × 10^−13

TABLE II Values of R1, R2 and Exp using the LEDA bigfloat type.

Precision   R1                                        R2                                        Exp
54          88.90733944954156                         89.10733944954122                         −3.382295 × 10^−13
120         88.9073394495414539440050493962813476     89.1073394495412824704983794510693938     −1.71484608900191463519 × 10^−13
200         88.9073394495414539440050493962813457…    89.1073394495412824704983794510693933…    −1.7148460890019146… × 10^−13

Finally, we present the results obtained using our L-filter. At precisions 53, 59 and 60, the values computed were 1.704192 × 10^−14, −1.785793 × 10^−13 and −1.795785 × 10^−13, with error bounds 1.6273 × 10^−11, 2.54266 × 10^−13 and 1.27133 × 10^−13, respectively. For precision up to 59 bits, L-filter could not decide sign(Exp). For 60-bit precision, we have |value(Exp)| > error(Exp); at this precision L-filter is able to decide the correct sign of Exp.

4 THREE-DIMENSIONAL EXAMPLE WITH ROTATIONS

We consider two bodies undergoing the effect of a finite number of translations and rotations in three-dimensional space. We demonstrate that there may be an erroneous decision about collision between two such bodies if double precision is used for computations. We consider the orientation of a point p with respect to a plane face P. The relative orientation between the point p and the face P should not change if the same set of transformations is applied to both p and P. We find that executing the transformations in double precision can alter the relative orientation between the point and the plane. We demonstrate this by applying an arbitrary set of n rotations and n translations to the plane P and the point p. The translations and rotations are applied alternately, starting with a rotation. In the example, the plane P is specified by the points p1(2.945, 2.945, 0), p2(2.94, 2.945, 0) and p3(2.945, 2.94, 0). We check the orientation of the point p(2.945, 2.945, 10^−12) with respect to the above plane. We refer to the tables for the angles of rotation (Tab. III), axes of rotation (Tab. IV) and the translation vectors (Tab. V) used in this example. Note that rotations are applied in three-dimensional space about arbitrarily oriented axes. We use the standard sequence of two-dimensional rotations about coordinate axes and translations for performing each three-dimensional rotation, as in Ref. [7].

Using double precision, the determinant representing the orientation of the point is computed as a positive quantity before the sequence of transformations is applied. Since we apply the same set of transformations to both the plane and the point, the sign of the orientation determinant before and after the transformations should be identical. However, after the transformations, the orientation determinant is computed as a negative quantity, −5.46648 × 10^−17. We conclude that computation using double precision fails to decide the correct orientation of the transformed point. L-filter could solve this problem at 88-bit precision.


TABLE III Angles of rotation.

Number   Angle     Number   Angle     Number   Angle     Number   Angle
1         0.73     7         0.69     13        0.51     19       −0.73
2        −0.69     8        −0.67     14       −1.31     20        0.83
3         0.75     9         0.99     15        0.91     21        0.73
4        −0.74     10       −0.99     16       −0.61     22       −1.15
5         1.11     11        0.88     17        1.01     23        0.83
6        −0.68     12       −0.83     18        0.89     24       −0.74
25        0.83


A nice feature of this example is that the angles, axes of rotation and the translation vectors have been generated by some smart guesswork. In the course of generating this particular example, we have come across several other similar instances where we failed to compute the correct orientation using double precision. We initially chose a point which had one coordinate of the order of 10^−16. For this point, we got an example with only four rotations and transformations. In this particular example, the order of magnitude of the smallest coordinate is very low. Therefore, we increased the smallest coordinate of the point p by four orders of magnitude, setting its coordinates to (2.945, 2.945, 10^−12), and increased the number of rotations to 25. The correct orientation determination after applying all the transformations was possible with L-filter at 88-bit precision.

TABLE IV Initial and terminal points defining axes of rotation.

Number   Initial point          Final point
1        12.9, 1e−4, 9.9        90, 0.0011, 3
2        3.9, 2, −6             2.1, 1, 4.3
3        5.5, 4, 5              4.2, 1.2, 5.9
4        −2.9, 2, 5             −0.6, 1.33, 6.8
5        2.4, 7, −16            16, −3e−3, 16.8
6        −20.4, −7, −36.9       2, 8.3, 12
7        2.4, −9, 6             12, 2.3, 1.9
8        12.9, 27, 24           −34, −6.9, 1.2
9        12.6, 37, −22.9        10, −6.3, −2
10       −14.7, 7, −1           −16, 6.8, 1.2
11       4.7, −7, 11            2.9, 4.3, −2.7
12       2, 8.3, 12             23, 3.3, 2.9
13       12, 2.3, 1.9           2.3, 19.4, −2.3
14       −34, −6.9, 1.2         23, 4.2, −2
15       10, −6.3, −2           −147, 7, 2
16       −16, 6.8, 1.2          90, 1.1e−4, 3
17       24.3, 0.7, −0.1        21, 1, 4.3
18       −14.7, −7, 11          −14.2, −1.2, −5.9
19       1.47, −72, 1.3         4.2, −32, −59
20       −147, 7, 2             13, 9.8, 2.2
21       90, 1.1e−4, 3          12.9, 1e−4, 9
22       21, 1, 4.3             3.9, 2, 6
23       −14.2, −1.2, −5.9      5.5, 4, 5
24       4.2, −32, −59          2.3, −19.4, −2.3
25       −14.2, −22, −49        8.9, 42, −2


TABLE V The translation vectors.

1    1000, −2000, 3000      14   −2000, 5000, −3000
2    −2000, 3000, −4000     15   3000, −4000, 4000
3    3000, −1000, 1000      16   −4000, 3000, −1000
4    −4000, 2000, −4000     17   1000, −1000, 3000
5    1000, −3000, 3000      18   −2000, 2000, −1000
6    −2000, 4000, −2000     19   3000, −4000, 3000
7    3000, −3000, 1000      20   −4000, 4000, −3000
8    −4000, 5000, −2000     21   −4000, 4000, 3000
9    1000, −2000, 4000      22   1000, −3000, 1000
10   −2000, 4000, −2000     23   −2000, 5000, −3000
11   3000, −1000, 3000      24   3000, −4000, 4000
12   −4000, 4000, −3000     25   −4000, 3000, −1000
13   1000, −3000, 1000

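
For reference, a C++ sketch of the orientation predicate underlying this test follows; applying the same rigid motion to the plane and the point must not change the sign of the determinant below. The rotation here uses the Rodrigues formula about an axis through the origin and applies only a single illustrative rotation, whereas the paper composes coordinate-axis rotations and translations as in Ref. [7] and needs the full 25-step sequence of Tables III, IV and V to expose the sign flip.

    #include <cmath>
    #include <cstdio>

    struct Vec3 { double x, y, z; };

    // Orientation of point p with respect to the plane through p1, p2, p3:
    // the sign of det[p2-p1; p3-p1; p-p1].
    double orient3d(const Vec3& p1, const Vec3& p2, const Vec3& p3, const Vec3& p) {
        double ax = p2.x - p1.x, ay = p2.y - p1.y, az = p2.z - p1.z;
        double bx = p3.x - p1.x, by = p3.y - p1.y, bz = p3.z - p1.z;
        double cx = p.x  - p1.x, cy = p.y  - p1.y, cz = p.z  - p1.z;
        return ax * (by * cz - bz * cy) - ay * (bx * cz - bz * cx) + az * (bx * cy - by * cx);
    }

    // Rodrigues rotation of v by angle t about the unit axis k through the origin
    // (one possible way to realise rotations about arbitrarily oriented axes).
    Vec3 rotate(const Vec3& v, const Vec3& k, double t) {
        double c = std::cos(t), s = std::sin(t);
        double dot = k.x * v.x + k.y * v.y + k.z * v.z;
        Vec3 cr = {k.y * v.z - k.z * v.y, k.z * v.x - k.x * v.z, k.x * v.y - k.y * v.x};
        return {v.x * c + cr.x * s + k.x * dot * (1.0 - c),
                v.y * c + cr.y * s + k.y * dot * (1.0 - c),
                v.z * c + cr.z * s + k.z * dot * (1.0 - c)};
    }

    int main() {
        Vec3 p1 = {2.945, 2.945, 0.0}, p2 = {2.94, 2.945, 0.0},
             p3 = {2.945, 2.94, 0.0},  p  = {2.945, 2.945, 1e-12};
        std::printf("orientation before: %.17g\n", orient3d(p1, p2, p3, p));
        // Apply the same motion to all four points; exactly, the sign of the
        // orientation determinant cannot change. The paper's full sequence of
        // rotations and translations makes the double-precision sign flip.
        Vec3 axis = {0.0, 0.0, 1.0};
        double angle = 0.73;   // first angle of Table III, about the z-axis for brevity
        p1 = rotate(p1, axis, angle); p2 = rotate(p2, axis, angle);
        p3 = rotate(p3, axis, angle); p  = rotate(p,  axis, angle);
        std::printf("orientation after : %.17g\n", orient3d(p1, p2, p3, p));
        return 0;
    }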

5 FLOATING-POINT FILTERS

Double precision floating-point representation is not sufficient for the computation of the correct sign of an arbitrary expression. High-precision computation is therefore inevitable. However, computations at higher precision result in higher costs in terms of computation time and memory requirements. Therefore, we use floating-point filters for computing the signs of expressions correctly. A floating-point filter separates the computations that can be done correctly using machine (double) precision from the computations that need to be performed at a higher precision [see Refs. 1, 2, 8–10]. We have developed our floating-point filter L-filter [1] for computing the correct sign of an expression. The L-filter has two parts. One part, Part I, is used to compute the value of an expression using double precision. The other part, Part II, is used to compute the value of an expression at stipulated extended precisions. In both parts, the upper bound on the error in the computed expression is computed using double precision. We initially try to compute the sign of an expression E using Part I of the filter. We check whether the magnitude of the computed value of the expression E is greater than the computed error upper bound. If so, we declare that the sign of the expression E is the same as the sign of the machine-computed value of E; in such cases, we do not need to invoke higher precision computations. We invoke Part II if the magnitude of the computed value of the expression is less than the error upper bound computed in Part I. We then increase the precision by a fixed step, recompute the value of E at that precision, and check whether the modulus of the value of E computed by Part II of L-filter is greater than the calculated error upper bound. The iterative process continues until the error upper bound becomes less than the absolute value of the computed expression, at which point we are able to decide the sign of the expression.
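
The control flow just described can be summarised by the following C++ sketch. The FilteredValue type and the evaluate() callback are hypothetical stand-ins, not the paper's actual interface: a real implementation would evaluate the expression with LEDA bigfloats at the requested mantissa length and return a matching error upper bound.

    #include <cmath>
    #include <functional>

    struct FilteredValue {
        double value;   // approximate value of the expression at the given precision
        double error;   // upper bound on the absolute error of that approximation
    };

    // Returns +1 or -1 once the sign is certified, or 0 if undecided within the budget.
    int filtered_sign(const std::function<FilteredValue(int /*precision_bits*/)>& evaluate,
                      int start_precision = 53, int step = 10, int max_precision = 1000) {
        for (int p = start_precision; p <= max_precision; p += step) {
            FilteredValue r = evaluate(p);        // Part I at 53 bits, Part II beyond
            if (std::fabs(r.value) > r.error)     // error bound smaller than the value
                return (r.value > 0.0) ? +1 : -1;
            // Otherwise the error bound swamps the value: raise the precision and retry.
        }
        return 0;  // could not decide within the precision budget
    }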

A floating-point filter needs a multi-precision package as an underlying layer. We used the multi-precision data types of LEDA's bigfloat multi-precision package [3, 6]. (L-filter, however, does not use LEDA's inbuilt filter, LEDA real.) On top of this package, we have built our library. An application program interacts with the innermost layer through our library.


LEDA's internal filter, LEDA real, does not deal with trigonometric functions [11]. Trigonometric functions arise in problems where two-dimensional or three-dimensional rotations and shear come into the picture. Our L-filter is designed to deal with trigonometric functions. In L-filter, we do not use the standard built-in trigonometric functions like sine and cosine as provided by the compiler's library. We use our own accurate implementations of the sine and cosine functions, computed using a sufficiently large number of terms in the Taylor expansion of these functions at the requisite high precision. We account for the Taylor approximation errors and the round-off errors in this process and incorporate these error bounds in L-filter; the details are in Refs. [1, 10] and are omitted here because they are not relevant to the main theme of this article.
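
As a rough illustration of the truncation-error bookkeeping involved, the double-precision sketch below sums the Taylor series of sine and bounds the truncation error by the first omitted term. The paper's own implementation works with LEDA bigfloats at extended precision and additionally tracks round-off error, which is not shown here.

    #include <cmath>
    #include <cstdio>

    // Taylor-series sine with an explicit truncation bound (double precision only).
    double sin_taylor(double x, int terms, double* truncation_bound) {
        double term = x, sum = x;
        for (int k = 1; k < terms; ++k) {
            term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));  // next Taylor term
            sum += term;
        }
        // For an alternating series with decreasing terms (|x| small enough),
        // the truncation error is at most the magnitude of the first omitted term.
        *truncation_bound = std::fabs(term * x * x / ((2.0 * terms) * (2.0 * terms + 1.0)));
        return sum;
    }

    int main() {
        double bound;
        double s = sin_taylor(0.78539816339744828, 12, &bound);
        std::printf("sin ~ %.17g (truncation bound %.3g), libm: %.17g\n",
                    s, bound, std::sin(0.78539816339744828));
        return 0;
    }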

6 CONCLUDING REMARKS

The use of filters and high precision certainly increases the overhead in terms of running time. The reward is the guarantee of exact decisions about the signs of expressions. One important direction of research is, therefore, the design of efficient filters and multi-precision packages. A system that uses mere double precision will not be able to correctly resolve the signs of all expressions; such a system could at best declare that it cannot decide the correct sign using double precision. It is, however, observed that most expressions are not so complex, and the computation of their correct signs is possible using double precision. This is, indeed, an important observation because it implies that geometric software would possibly not slow down too much when equipped with a system like ours for exact computation of signs of expressions. The requirement of higher precision for resolving the sign of an expression would be a rare phenomenon in most practical computing.

One of the main contributions of our work is that we have considered transformations involving rotations by arbitrary angles (given in radians) for two- and three-dimensional spaces. Rotations are critical in many applications and sequences of rotations are likely to result in the accumulation of large round-off errors. It is generally true that examples such as that in Section 4 are rare and require considerable effort to discover. We have, however, observed that many similar examples can be generated by using a fairly random choice of angles, lengths, axes of rotation and translation vectors. This implies that such counterexamples exist in plenty. In problems where we have a sequence of a large number of rotations and translations, we may sometimes encounter predicates or orientation tests where software using double precision would fail to make a correct decision.

An interesting feature of our examples is that we have selected inputs with a reasonably small number of significant digits. In particular, in the example in Section 4, all the angles of rotation have a maximum of three significant digits. Input angles with a few significant digits are common in practical systems. We have also limited the difference between the highest and the lowest orders of magnitude of the input quantities.

Our filter can compute the correct sign of an expression that involves arithmetic operations such as addition, subtraction, multiplication, division and square root, and trigonometric functions such as sine and cosine. Functions like logarithms and probability distribution functions are important in various scientific applications. In the future, we plan to include more functions in our filter.

Acknowledgement

The author S. P. Pal acknowledges the support of a research grant from All India Council for Technical Education, New Delhi, India, during the period 1997–2000.


References

[1] Koul, R. K. (2000). A system for the exact computation of orientations of transformed geometric objects, Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, India.
[2] Musadeekh, F., Koul, R. K., Ramakrishnna, P. H. D. and Pal, S. P. (2001). Computations that require higher than double precision for robust and exact decision making, Technical Report TR/IIT/CSE/SPP2, Department of Computer Science and Engineering, IIT Kharagpur, 721302, India, July 2001. Presented at the International Conference on Energy, Automation and Information Technology, Electrical Engineering Department, IIT Kharagpur, 721302, India.
[3] Mehlhorn, K. and Naher, S. (1989). LEDA, a library for efficient data types and algorithms, TR A 04/89, FB10, Universitat des Saarlandes, Saarbrucken.
[4] MuPAD homepage. http://www.mupad.de.
[5] Yap, C. and Dube, T. (1995). The exact computation paradigm, In: D. Z. Du and F. K. Hwang (Eds.), Computing in Euclidean Geometry, pp. 452–486.
[6] Mehlhorn, K. and Naher, S. (1995). LEDA, a platform for combinatorial and geometric computing, Communications of the ACM, 38(1), 96–102.
[7] Hearn, D. and Baker, P. (1990). Computer Graphics, Prentice-Hall of India.
[8] Agarwal, A. (1998). A library for robust geometric computation based on semi-static error analysis, MSc thesis, Department of Mathematics, Indian Institute of Technology, Kharagpur, 721302, India.
[9] Burnikel, C., Funke, S. and Seel, M. (1998). Exact geometric predicates using cascaded computation, In: Proc. 14th Annu. ACM Sympos. Comput. Geom.
[10] Ramakrishna, P. H. D. (2000). Bounding errors in the computation of trigonometric functions and roots of polynomials, MTech thesis, Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, India.
[11] Mehlhorn, K., Michael, S., Naher, S. and Uhrig, C., The LEDA user manual, version 3.7.1, http://www.mpi-sb.mpg.de/LEDA/.
