1
A Parallel Compensated Horner Scheme S. Graillat 1 , Y. Ibrahimy, C. Jeangoudoux 1 , C. Lauter 1 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS, LIP6 UMR 7606, F-75005, Paris, France {stef.graillat,clothilde.jeangoudoux,christoph.lauter}@lip6.fr The Compensated Horner Scheme [1, 2] is an accurate and fast algorithm to evaluate univariate polynomials in floating-point arithmetic. The accuracy of the computed result is similar to the one given by the Horner scheme computed in twice the working precision. The implementation of the Compensated Horner Scheme runs at least as fast as existing implementations of Horner Scheme producing the same output accuracy. It is based on the so-called error-free transformations. These are algorithms that make it possible to compute (in pure floating-point arithmetic) the rounding error for the elementary operations (addition, subtraction and multiplication). In- deed, it is possible to show that these elementary rounding errors can be represented exactly as floating-point numbers (unless underflow or overflow occurs). Parallelizing compensated algorithms is tedious even for summation and dot product algorithms [3]. In this talk, we will present a parallel version of the Com- pensated Horner Scheme. Some experiments on multicore and Graphics Processor Units (GPU) architectures will be presented to show the efficiency of this algo- rithm. References [1] S. Graillat, N. Louvet, and Ph. Langlois. Compensated Horner scheme. Research Report 04, Équipe de recherche DALI, Laboratoire LP2A, Université de Perpignan Via Domitia, France, 52 avenue Paul Alduy, 66860 Perpignan cedex, France, July 2005. [2] S. Graillat, Ph. Langlois, and N. Louvet. Algorithms for accurate, validated and fast polyno- mial evaluation. Japan J. Indust. Appl. Math., 26(2-3):191–214, 2009. [3] N. Yamanaka, T. Ogita, S. M. Rump, and S. Oishi. A parallel algorithm for accurate dot product. Parallel Comput., 34(6-8):392–410, 2008. 1

A Parallel Compensated Horner Schememoreno/HPCA-ACA-2017/Graillat-HPC-ACA.pdf · A Parallel Compensated Horner Scheme S. Graillat1, Y. Ibrahimy, C. Jeangoudoux1, C. Lauter1 1 Sorbonne

Embed Size (px)

Citation preview

Page 1: A Parallel Compensated Horner Schememoreno/HPCA-ACA-2017/Graillat-HPC-ACA.pdf · A Parallel Compensated Horner Scheme S. Graillat1, Y. Ibrahimy, C. Jeangoudoux1, C. Lauter1 1 Sorbonne

A Parallel Compensated Horner Scheme

S. Graillat1, Y. Ibrahimy, C. Jeangoudoux1, C. Lauter1

1 Sorbonne Universités, UPMC Univ Paris 06, CNRS, LIP6 UMR 7606, F-75005, Paris, France{stef.graillat,clothilde.jeangoudoux,christoph.lauter}@lip6.fr

The Compensated Horner Scheme [1, 2] is an accurate and fast algorithm toevaluate univariate polynomials in floating-point arithmetic. The accuracy of thecomputed result is similar to the one given by the Horner scheme computed in twicethe working precision. The implementation of the Compensated Horner Schemeruns at least as fast as existing implementations of Horner Scheme producing thesame output accuracy.

It is based on the so-called error-free transformations. These are algorithmsthat make it possible to compute (in pure floating-point arithmetic) the roundingerror for the elementary operations (addition, subtraction and multiplication). In-deed, it is possible to show that these elementary rounding errors can be representedexactly as floating-point numbers (unless underflow or overflow occurs).

Parallelizing compensated algorithms is tedious even for summation and dotproduct algorithms [3]. In this talk, we will present a parallel version of the Com-pensated Horner Scheme. Some experiments on multicore and Graphics ProcessorUnits (GPU) architectures will be presented to show the efficiency of this algo-rithm.

References[1] S. Graillat, N. Louvet, and Ph. Langlois. Compensated Horner scheme. Research Report 04,

Équipe de recherche DALI, Laboratoire LP2A, Université de Perpignan Via Domitia, France,52 avenue Paul Alduy, 66860 Perpignan cedex, France, July 2005.

[2] S. Graillat, Ph. Langlois, and N. Louvet. Algorithms for accurate, validated and fast polyno-mial evaluation. Japan J. Indust. Appl. Math., 26(2-3):191–214, 2009.

[3] N. Yamanaka, T. Ogita, S. M. Rump, and S. Oishi. A parallel algorithm for accurate dotproduct. Parallel Comput., 34(6-8):392–410, 2008.

1