CS 267: Applications of Parallel Computers Floating Point Arithmetic


James Demmel, www.cs.berkeley.edu/~demmel/cs267_Spr05


  • CS 267: Applications of Parallel Computers

    Floating Point Arithmetic

    James Demmel


  • Outline
    - A little history
    - IEEE floating point formats
    - Error analysis
    - Exception handling
    - Using exception handling to go faster
    - How to get extra precision cheaply
    - Dangers of Parallel and Heterogeneous Computing

  • A little history
    - Von Neumann and Goldstine, 1947
      - Can't expect to solve most big [n>15] linear systems without carrying many decimal digits [d>8], otherwise the computed answer would be completely inaccurate. WRONG!
    - Turing, 1949
      - Carrying d digits is equivalent to changing the input data in the d-th place and then solving Ax=b. So if A is only known to d digits, the answer is as accurate as the data deserves.
      - Backward Error Analysis
    - Rediscovered in 1961 by Wilkinson and publicized (Turing Award 1970)
    - Starting in the 1960s: many papers doing backward error analysis of various algorithms
    - Many years where each machine did FP arithmetic slightly differently
      - Both rounding and exception handling differed
      - Hard to write portable and reliable software
      - Motivated search for industry-wide standard, beginning late 1970s
      - First implementation: Intel 8087
    - Turing Award 1989 to W. Kahan for design of the IEEE Floating Point Standards 754 (binary) and 854 (decimal)
      - Nearly universally implemented in general purpose machines

  • Defining Floating Point Arithmetic
    - Representable numbers
      - Scientific notation: +/- d.d...d x r^exp
      - sign bit +/-
      - radix r (usually 2 or 10, sometimes 16)
      - significand d.d...d (how many base-r digits d?)
      - exponent exp (range?)
      - others?
    - Operations:
      - arithmetic: +, -, x, /, ... (how to round result to fit in format)
      - comparison (<, =, >)
      - conversion between different formats (short to long FP numbers, FP to integer)
      - exception handling (what to do for 0/0, 2*largest_number, etc.)
      - binary/decimal conversion (for I/O, when radix not 10)
    - Language/library support for these operations

  • IEEE Floating Point Arithmetic Standard 754 - Normalized Numbers
    - Normalized Nonzero Representable Numbers: +- 1.d...d x 2^exp
      - Macheps = Machine epsilon = 2^(-#significand bits) = relative error in each operation
      - OV = overflow threshold = largest number
      - UN = underflow threshold = smallest normalized number
    - +- Zero: +-, significand and exponent all zero
      - Why bother with -0? Later

    Format           # bits  # significand bits  macheps            # exponent bits  exponent range
    ---------------  ------  ------------------  -----------------  ---------------  --------------------------------
    Single           32      23+1                2^-24 (~10^-7)     8                2^-126 - 2^127 (~10^+-38)
    Double           64      52+1                2^-53 (~10^-16)    11               2^-1022 - 2^1023 (~10^+-308)
    Double Extended  >=80    >=64                <=2^-64 (~10^-19)  >=15             2^-16382 - 2^16383 (~10^+-4932)

    (Double Extended is 80 bits on Intel machines)
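
As a quick check of the Double row, here is a minimal Python sketch. It assumes the interpreter's `float` is an IEEE double, which holds on common platforms. One detail worth noting: `sys.float_info.epsilon` is the spacing of floats just above 1.0 (2^-52), which is twice the macheps convention used in the table.

```python
import sys

info = sys.float_info

# Significand bits, counting the implicit leading 1 (the "52+1" in the table).
significand_bits = info.mant_dig

# The table's macheps is the maximum relative rounding error, 2^-53.
# Python's epsilon is the gap between 1.0 and the next float, 2^-52.
macheps = info.epsilon / 2

overflow_threshold = info.max     # OV, just below 2^1024
underflow_threshold = info.min    # UN = 2^-1022, smallest normalized number

print(significand_bits, macheps, overflow_threshold, underflow_threshold)
```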

  • Rules for performing arithmetic
    - As simple as possible:
      - Take the exact value, and round it to the nearest floating point number (correct rounding)
      - Break ties by rounding to nearest floating point number whose bottom bit is zero (rounding to nearest even)
      - Other rounding options too (up, down, towards 0)
    - Don't need exact value to do this!
      - Early implementors worried it might be too expensive, but it isn't
    - Applies to
      - +, -, *, /
      - sqrt
      - conversion between formats
      - rem(a,b) = remainder of a after dividing by b
        - a = q*b + rem, q = floor(a/b)
        - cos(x) = cos(rem(x,2*pi)) for |x| >= 2*pi
        - cos(x) is exactly periodic, with period = rounded(2*pi)
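
The tie-breaking rule can be observed directly. The sketch below assumes IEEE double arithmetic in the default round-to-nearest-even mode; the variable names are illustrative only.

```python
one = 1.0
ulp = 2.0 ** -52          # spacing of doubles in [1, 2)
half_ulp = 2.0 ** -53

# Exact sum 1 + 2^-53 is a tie between 1 and 1 + 2^-52.
# The significand of 1.0 ends in a 0 bit, so the tie rounds down to 1.0.
tie_down = one + half_ulp

# Exact sum (1 + 2^-52) + 2^-53 is a tie between 1 + 2^-52 (odd last bit)
# and 1 + 2^-51 (even last bit), so this tie rounds up.
tie_up = (one + ulp) + half_ulp

# Anything past the halfway point is not a tie and rounds to the nearer neighbor.
above_half = one + (half_ulp + 2.0 ** -60)
```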

  • Error Analysis
    - Basic error formula: fl(a op b) = (a op b)*(1 + d), where
      - op is one of +, -, *, /
      - |d| <= macheps
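
The basic error formula can be checked with exact rational arithmetic. This is a minimal sketch assuming IEEE doubles and no overflow or underflow; `relative_rounding_error` is a hypothetical helper, not part of the slides.

```python
import operator
from fractions import Fraction

MACHEPS = Fraction(1, 2 ** 53)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def relative_rounding_error(a, b, op):
    """Return d with fl(a op b) = (a op b) * (1 + d), as an exact Fraction."""
    computed = Fraction(OPS[op](a, b))            # fl(a op b), converted exactly
    exact = OPS[op](Fraction(a), Fraction(b))     # a op b with no rounding
    return (computed - exact) / exact

errors = {op: relative_rounding_error(0.1, 0.3, op) for op in OPS}
```

Every entry of `errors` should satisfy `abs(d) <= MACHEPS`. The helper divides by the exact result, so it does not apply when that result is zero.
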
  • Example: polynomial evaluation using Horner's rule
    - Horner's rule to evaluate p = Σ(k=0..n) ck * x^k
      - p = cn; for k = n-1 downto 0, p = x*p + ck
    - Numerically stable:
      - Get p̂ = Σ(k=0..n) ĉk * x^k, where ĉk = ck*(1+εk) and |εk| <= (n+1)*macheps (to within a small constant factor)
    - Apply to (x-2)^9 = x^9 - 18*x^8 + ... - 512

  • Example: polynomial evaluation (continued)
    - (x-2)^9 = x^9 - 18*x^8 + ... - 512
    - We can compute error bounds using fl(a op b) = (a op b)*(1+d)
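
A sketch of this experiment follows. It evaluates the expanded form of (x-2)^9 with Horner's rule near the root and compares it with the factored form. The error bound is a crude first-order estimate, and its constant 2n is an assumption of this sketch rather than the slide's exact bound.

```python
from math import comb

def horner(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with Horner's rule."""
    p = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        p = x * p + c
    return p

# Coefficients of (x - 2)^9 from the binomial theorem: c_k = C(9, k) * (-2)^(9 - k).
COEFFS = [float(comb(9, k) * (-2) ** (9 - k)) for k in range(10)]

def error_bound(coeffs, x, macheps=2.0 ** -53):
    """First-order bound: about 2n * macheps * sum |c_k| * |x|^k (assumed constant)."""
    n = len(coeffs) - 1
    return 2 * n * macheps * horner([abs(c) for c in coeffs], abs(x))

x = 2.01
expanded = horner(COEFFS, x)     # dominated by rounding error this close to x = 2
factored = (x - 2.0) ** 9        # accurate reference value, about 1e-18
bound = error_bound(COEFFS, x)
```

Near x = 2 the bound is many orders of magnitude larger than the true value, so the expanded form cannot be expected to deliver even the correct sign there.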

  • Cray Arithmetic
    - Historically very important
      - Crays among the fastest machines
      - Other fast machines emulated it (Fujitsu, Hitachi, NEC)
    - Sloppy rounding
      - fl(a + b) not necessarily (a + b)(1+d), but instead fl(a + b) = a*(1+da) + b*(1+db), where |da|, |db| are small
      - Not with IEEE arithmetic
    - Fastest (at one time) eigenvalue algorithm in LAPACK fails
      - Need Π_k (ak - bk) accurately
      - Need to preprocess by setting each ak = 2*ak - ak (kills bottom bit)
    - Latest Crays do IEEE arithmetic
    - More cautionary tales: www.cs.berkeley.edu/~wkahan/ieee754status/why-ieee.pdf

  • General approach to error analysis
    - Suppose we want to evaluate x = f(z)
      - Ex: z = (A,b), x = solution of linear system Ax=b
    - Suppose we use a backward stable algorithm alg(z)
      - Ex: Gaussian elimination with pivoting
    - Then alg(z) = f(z + e), where backward error e is small
    - Error bound from Taylor expansion (scalar case):
      - alg(z) = f(z+e) ≈ f(z) + f'(z)*e
      - Absolute error = |alg(z) - f(z)| ≈ |f'(z)| * |e|
      - Relative error = |alg(z) - f(z)| / |f(z)| ≈ |f'(z)*z/f(z)| * |e/z|
      - Condition number = |f'(z)*z/f(z)|
      - Relative error (of output) = condition number * relative error of input |e/z|
    - Applies to multivariate case too
      - Ex: Gaussian elimination with pivoting for solving Ax=b
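
To make the scalar case concrete, the sketch below measures how much a small relative perturbation of the input is amplified and compares that with the condition number. The choice of f(z) = log(z) near z = 1 and the perturbation size are illustrative assumptions, not taken from the slides.

```python
import math

def condition_number(f, df, z):
    """Relative condition number |f'(z) * z / f(z)| of a scalar function."""
    return abs(df(z) * z / f(z))

z = 1.001
cond = condition_number(math.log, lambda t: 1.0 / t, z)    # about 1000

# Perturb the input by a relative amount of about 1e-8 and measure the effect.
z_perturbed = z * (1.0 + 1e-8)
rel_in = abs(z_perturbed - z) / abs(z)
rel_out = abs(math.log(z_perturbed) - math.log(z)) / abs(math.log(z))
amplification = rel_out / rel_in
```

The measured `amplification` should agree closely with `cond`: log is ill-conditioned near 1 because f(z) is close to zero while f'(z)*z is close to 1.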

  • What happens when the exact value is not a real number, or is too small or too large to represent accurately?

    You get an exception

  • Exception Handling
    - What happens when the exact value is not a real number, or too small or too large to represent accurately?
    - 5 Exceptions:
      - Overflow: exact result > OV, too large to represent
      - Underflow: exact result nonzero and < UN, too small to represent
      - Divide-by-zero: nonzero/0
      - Invalid: 0/0, sqrt(-1), ...
      - Inexact: you made a rounding error (very common!)
    - Possible responses
      - Stop with error message (unfriendly, not default)
      - Keep computing (default, but how?)

  • IEEE Floating Point Arithmetic Standard 754 - Denorms
    - Denormalized Numbers: +- 0.d...d x 2^min_exp
      - sign bit, nonzero significand, minimum exponent
      - Fills in gap between UN and 0
    - Underflow Exception
      - occurs when exact nonzero result is less than underflow threshold UN
      - Ex: UN/3
      - return a denorm, or zero
    - Why bother?
      - Necessary so that following code never divides by zero:
          if (a != b) then x = a/(a-b)
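
The guarantee behind `if (a != b) then x = a/(a-b)` can be tried out in Python. The sketch assumes IEEE doubles with gradual underflow enabled (no flush-to-zero mode); the specific values are chosen only for illustration.

```python
import sys

UN = sys.float_info.min        # underflow threshold 2^-1022

a = 1.5 * UN
b = 1.0 * UN
difference = a - b             # exact value 2^-1023, below UN, kept as a denorm

tiny = UN / 3                  # the UN/3 example: a nonzero denorm
smallest_denorm = 5e-324       # 2^-1074; halving it finally rounds to zero

if a != b:
    x = a / difference         # safe: the difference of unequal floats is nonzero
```

Without denorms, `a - b` would have to be rounded to zero or to UN, and the guarded division would either fail or return a badly wrong quotient.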

  • IEEE Floating Point Arithmetic Standard 754 - +- Infinity
    - +- Infinity: sign bit, zero significand, maximum exponent
    - Overflow Exception
      - occurs when exact finite result too large to represent accurately
      - Ex: 2*OV
      - return +- infinity
    - Divide by zero Exception
      - return +- infinity = 1/+-0
      - sign of zero important! Example later
    - Also return +- infinity for
      - 3+infinity, 2*infinity, infinity*infinity
      - Result is exact, not an exception!
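
A short Python sketch of these rules, assuming IEEE doubles. Python follows the IEEE default for overflow in multiplication and for arithmetic on infinities, but it deliberately raises `ZeroDivisionError` for float division by zero instead of returning an infinity, so that case is shown as the exception it is.

```python
import math
import sys

OV = sys.float_info.max

overflowed = 2.0 * OV                  # exact result too large: returns +inf
still_inf = [3.0 + math.inf, 2.0 * math.inf, math.inf * math.inf]

# The sign of zero is kept, and under IEEE rules it decides the sign of 1/0.
neg_zero = 1.0 / -math.inf             # -0.0
neg_zero_sign = math.copysign(1.0, neg_zero)

# Python departs from the IEEE default here and raises instead.
try:
    1.0 / 0.0
    python_traps_division = False
except ZeroDivisionError:
    python_traps_division = True
```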

  • IEEE Floating Point Arithmetic Standard 754 - NAN (Not A Number)
    - NAN: sign bit, nonzero significand, maximum exponent
    - Invalid Exception
      - occurs when exact result not a well-defined real number
      - 0/0
      - sqrt(-1)
      - infinity-infinity, infinity/infinity, 0*infinity
      - NAN + 3
      - NAN > 3?
      - Return a NAN in all these cases (a comparison such as NAN > 3 has no numeric result and evaluates to false)
    - Two kinds of NANs
      - Quiet: propagates without raising an exception
        - good for indicating missing data
        - Ex: max(3,NAN) = 3
      - Signaling: generates an exception when touched
        - good for detecting uninitialized data
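
Quiet NaN behavior can be sketched in Python, assuming IEEE doubles. `ieee_max` is a hypothetical helper that treats a NaN as missing data, in the spirit of the max(3,NAN) = 3 example; Python's built-in `max` gives an answer that depends on argument order, so it is not used here.

```python
import math

inf = math.inf
nan = math.nan

# Each of these has no well-defined real value and quietly yields a NaN.
invalid_results = [inf - inf, inf / inf, 0.0 * inf, nan + 3.0]

# Ordered comparisons with a NaN are false, and a NaN is not equal to itself.
nan_greater = nan > 3.0
nan_equal_self = nan == nan

def ieee_max(x, y):
    """max that ignores a quiet NaN operand (hypothetical helper)."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x > y else y
```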

  • Exception Handling User Interface
    - Each of the 5 exceptions has the following features
      - A sticky flag, which is set as soon as an exception occurs
      - The sticky flag can be reset and read by the user:
          reset overflow_flag and invalid_flag
          perform a computation
          test overflow_flag and invalid_flag to see if any exception occurred
      - An exception flag, which indicates whether a trap should occur
        - Not trapping is the default
        - Instead, continue computing, returning a NAN, infinity or denorm
      - On a trap, there should be a user-writable exception handler with access to the parameters of the exceptional operation
        - Trapping or precise interrupts like this are rarely implemented for performance reasons.
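
The reset / compute / test pattern can be sketched with Python's `decimal` module, whose contexts carry sticky flags and trap enablers of the same flavor. This is an analogy in decimal arithmetic, not access to the hardware's binary floating point flags, and the exponent limits below are an arbitrary choice that roughly mimics a double.

```python
import decimal

ctx = decimal.ExtendedContext.copy()      # starts with every trap disabled
ctx.Emax, ctx.Emin = 308, -307            # roughly the range of an IEEE double

big = decimal.Decimal("9e308")
zero = decimal.Decimal(0)

# 1) reset the flags, 2) compute, 3) test the flags afterwards.
ctx.clear_flags()
product = ctx.multiply(big, decimal.Decimal(10))    # overflows to Infinity
quotient = ctx.divide(zero, zero)                   # invalid: returns NaN
overflow_seen = ctx.flags[decimal.Overflow]
invalid_seen = ctx.flags[decimal.InvalidOperation]

# Enabling a trap turns the same event into a Python exception.
ctx.traps[decimal.Overflow] = True
try:
    ctx.multiply(big, decimal.Decimal(10))
    trapped = False
except decimal.Overflow:
    trapped = True
```

With traps off, the computation runs to completion and the flags are inspected once at the end, which is the usage the slide describes.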

  • Exploiting Exception Handling to Design Faster Algorithms
    - Demmel/Li (www.cs.berkeley.edu/~xiaoye)
    - Paradigm:
        1) Try fast, but possibly risky algorithm
        2) Quickly test for accuracy of answer (use exception handling)
        3) In rare case of inaccuracy, rerun using slower low risk algorithm
      - Quick with high probability
      - Assumes exception handling done quickly! (Will manufacturers do this?)
    - Ex 1: Solving triangular system Tx=b
      - Part of BLAS2: highly optimized, but risky
      - If T nearly singular, as when computing condition numbers, expect very large x, so scale inside inner loop: slow but low risk
      - Use paradigm with sticky flags to detect nearly singular T
      - Up to 9x faster on Dec Alpha
    - Ex 2: Computing eigenvalues (part of next LAPACK release)

          For k = 1 to n
              d = ak - s - bk^2/d
              if |d| < tol, d = -tol
              if d < 0, count++

        vs.

          For k = 1 to n
              d = ak - s - bk^2/d      ... ok to divide by 0
              count += signbit(d)
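
The two loops of Ex 2 can be sketched in Python. Several things here are assumptions of the sketch, not statements from the slides: the loops are read as the pivot-sign count used in bisection for a symmetric tridiagonal matrix with diagonal `a` and nonzero off-diagonal `b`, the recurrence starts from d = 1 with no off-diagonal term in the first step, and `tol` is an arbitrary tiny value. Because Python raises on float division by zero, the hypothetical helper `ieee_div` supplies the IEEE default result instead.

```python
import math

def ieee_div(x, y):
    """x / y with IEEE default results for a zero divisor (Python would raise)."""
    if y == 0.0:
        if x == 0.0 or math.isnan(x):
            return math.nan
        return math.copysign(math.inf, x) * math.copysign(1.0, y)
    return x / y

def count_safe(a, b, s, tol=1e-300):
    """Count negative pivots, nudging tiny pivots away from zero."""
    count, d = 0, 1.0
    for k in range(len(a)):
        off = b[k - 1] if k > 0 else 0.0
        d = a[k] - s - off * off / d
        if abs(d) < tol:
            d = -tol
        if d < 0:
            count += 1
    return count

def count_fast(a, b, s):
    """Same recurrence with no test on d in the loop; d may become +-infinity."""
    count, d = 0, 1.0
    for k in range(len(a)):
        off = b[k - 1] if k > 0 else 0.0
        d = a[k] - s - ieee_div(off * off, d)
        count += math.copysign(1.0, d) < 0     # signbit(d)
    return count

a = [2.0, 2.0, 2.0]
b = [1.0, 1.0]
counts = [count_fast(a, b, s) for s in (0.0, 1.0, 3.0, 4.0)]
```

For this small matrix the eigenvalues are 2 - sqrt(2), 2 and 2 + sqrt(2), so the counts for the four shifts should be 0, 1, 2 and 3. At s = 1 and s = 3 a pivot becomes exactly zero, and the fast loop passes through an infinity without any branch.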

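  • Summary of Values Representable in IEEE FP

    Value                       Exponent field   Significand field
    --------------------------  ---------------  -----------------
    +- Zero                     0...0            0......0
    Normalized nonzero numbers  Not 0 or all 1s  anything
    Denormalized numbers        0...0            nonzero
    +- Infinity                 1...1            0......0
    NANs                        1...1            nonzero

    - NANs: signaling and quiet
      - Many systems have only quiet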

  • Simulating extra precision
    - What if 64 or 80 bits is not enough?
      - Very large problems on very large machines may need more
      - Sometimes only known way to get right answer (mesh generation)
      - Sometimes you can trade communication for extra precision
    - Can simulate high precision efficiently just using floating point
    - Each extended precision number s is represented by an array (s1,s2,...,sn) where each sk is a FP number
      - s = s1 + s2 + ... + sn in exact arithmetic
      - s1 >> s2 >> ... >> sn
    - Ex: Computing (s1,s2) = a + b
      - if |a|
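
The example on this slide is cut off in the transcript. One well-known way to compute such a pair is the Fast2Sum construction sketched below; it may or may not match the slide's exact steps. It assumes IEEE doubles, round to nearest, and no overflow.

```python
from fractions import Fraction

def fast_two_sum(a, b):
    """Return (s1, s2) with s1 = fl(a + b) and s1 + s2 == a + b exactly."""
    if abs(a) < abs(b):
        a, b = b, a              # make a the larger operand in magnitude
    s1 = a + b                   # rounded sum
    s2 = (a - s1) + b            # rounding error of that sum, recovered exactly
    return s1, s2

s1, s2 = fast_two_sum(1.0, 2.0 ** -60)     # the small term survives in s2
t1, t2 = fast_two_sum(0.1, 0.2)
exact_match = Fraction(t1) + Fraction(t2) == Fraction(0.1) + Fraction(0.2)
```

The pair (s1, s2) carries roughly twice the working precision, and longer expansions (s1, ..., sn) are built by chaining the same step.
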
  • Hazards of Parallel and Heterogeneous Computing
    - What new bugs arise in parallel floating point programs?
    - Ex 1: Nonrepeatability
      - Makes debugging hard!
    - Ex 2: Different exception handling
      - Can cause programs to hang
    - Ex 3: Different rounding (even on IEEE FP machines)
      - Can cause hanging, or wrong results with no warning
    - See www.netlib.org/lapack/lawns/law
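
One common source of nonrepeatability is that floating point addition is not associative, so a parallel sum whose grouping depends on processor count or timing can change from run to run. A minimal sketch, assuming IEEE doubles:

```python
import math

values = [0.1, 0.2, 0.3]

# The same three numbers, added with two different groupings.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# math.fsum returns the correctly rounded sum, independent of the order.
order_free = math.fsum(values)
order_free_reversed = math.fsum(reversed(values))
```

The two groupings differ in the last bit, while the correctly rounded sum does not depend on the order, at the cost of extra work.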

