Accurate and Efficient Floating Point Summation

1 January 2004

journal article
Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Scientific Computing

Vol. 25 (4), 1214-1248
https://doi.org/10.1137/s1064827502407627

Abstract

We present and analyze several simple algorithms for accurately computing the sum of n floating point numbers using a wider accumulator. Let f and F be the number of significant bits in the summands and the accumulator, respectively. Then assuming gradual underflow, no overflow, and round-to-nearest arithmetic, up to approximately $2^{F-f}$ numbers can be added accurately by simply summing the terms in decreasing order of exponents, yielding a sum correct to within about 1.5 units in the last place (ulps). We apply this result to the floating point formats in the IEEE floating point standard. For example, a dot product of single precision vectors of length at most 33 computed using double precision and sorting is guaranteed correct to nearly 1.5 ulps. If double-extended precision is used, the vector length can be as large as 65,537. We also investigate how the cost of sorting can be reduced or eliminated while retaining accuracy.
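The summation scheme described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes IEEE single-precision summands (f = 24) stored in Python floats. A Python float (IEEE double, F = 53) serves as the wider accumulator. Terms are ordered by the binary exponent returned by math.frexp, and the final result is rounded to single precision through the struct module. The helper names (to_single, sorted_wide_sum, sorted_wide_dot, naive_single_sum) are hypothetical. The dot-product helper relies on the fact that the product of two 24-bit significands fits in 48 bits and is therefore exact in double. Reading the abstract's dot-product bounds as $2^{F-f}$ with f = 48 gives $2^{5} = 32$ for double and $2^{16} = 65{,}536$ for double-extended, consistent with the quoted lengths of 33 and 65,537, although the paper's exact derivation is not reproduced here. The sketch does not check the length limit or the no-overflow and gradual-underflow assumptions.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round an IEEE double to the nearest IEEE single value.

    struct's 'f' format converts using the platform's default
    round-to-nearest mode and raises OverflowError outside the single range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sorted_wide_sum(values):
    """Sum terms in a double accumulator, largest exponent first.

    Assumption: the terms fit in far fewer than 53 bits (e.g. singles), so
    the double accumulator is genuinely wider than the summands.
    The result is rounded once to single precision at the end.
    """
    # Sort by binary exponent, largest first; ties keep their input order.
    ordered = sorted(values, key=lambda v: math.frexp(v)[1], reverse=True)
    acc = 0.0  # wider (double-precision) accumulator
    for v in ordered:
        acc += v
    return to_single(acc)


def sorted_wide_dot(x, y):
    """Dot product of single-precision vectors with a double accumulator.

    Each product of two singles has at most 48 significant bits and is
    exact in double, so only the summation step introduces rounding.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sorted_wide_sum([a * b for a, b in zip(x, y)])


def naive_single_sum(values):
    """Left-to-right sum in single precision, for comparison.

    Each sum is computed in double and rounded to single; because
    53 >= 2*24 + 2, this reproduces IEEE single-precision addition.
    """
    acc = 0.0
    for v in values:
        acc = to_single(acc + v)
    return acc


if __name__ == "__main__":
    terms = [1.0, 2.0**-60, -1.0, 2.0**-60]  # all exactly representable in single
    print(naive_single_sum(terms) == 2.0**-60)  # True: one small term is lost
    print(sorted_wide_sum(terms) == 2.0**-59)   # True: exact sum recovered
```

With the inputs 1, 2^-60, -1, 2^-60, rounding to single after each addition returns 2^-60. Even an unsorted double accumulator loses the first 2^-60 against 1. Visiting the two unit terms first lets the double accumulator return the exact sum 2^-59.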
