Floating Point System

Floating point arithmetic

IEEE floating point standard

Objective

Understand and handle the errors that arise in floating point calculations
Understand the role of stability in numerical calculations

Vocabulary

Base, sign bit, exponent, mantissa, underflow, overflow, rounding, chopping, roundoff error, absolute error, relative error, significant digits, truncation error, machine epsilon.

Concepts

Distribution of floating point numbers
Addition of floating point numbers
Loss of associativity and distributivity
Catastrophic cancellation
Prevention of catastrophic cancellation
Forward error analysis
Backward error analysis
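
Two of the concepts above, loss of associativity and catastrophic cancellation (with one way to prevent it), can be illustrated with a short sketch. It assumes IEEE double precision, which is what Python's float uses on common platforms. The test values and variable names are chosen only for illustration, and the half-angle rewrite is one standard remedy rather than the only one.

```python
import math

# Loss of associativity: the grouping of the additions changes the
# rounded result, even though the sums are equal in exact arithmetic.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("associative?", left == right)

# Catastrophic cancellation: for small x, cos(x) is so close to 1 that
# the subtraction 1 - cos(x) wipes out nearly all significant digits.
x = 1.0e-8
naive = 1.0 - math.cos(x)

# Prevention: rewrite with the identity 1 - cos(x) = 2 sin^2(x/2),
# which avoids subtracting two nearly equal numbers.
stable = 2.0 * math.sin(x / 2.0) ** 2

# For small x, 1 - cos(x) is approximately x^2 / 2 (Taylor expansion).
reference = x * x / 2.0
print("naive :", naive)
print("stable:", stable)
print("relative error (naive) :", abs(naive - reference) / reference)
print("relative error (stable):", abs(stable - reference) / reference)
```

The two formulas are algebraically identical, so the difference in accuracy comes entirely from how rounding errors propagate through each one.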

Extra:

IEEE standard
Subnormal numbers, guard bit

Some External Links:

Floating Points, an excellent general introduction by Cleve Moler to floating point numbers and the IEEE standard.

Basic Issues in Floating Point Arithmetic and Error Analysis, Supplementary Lecture Notes of J. Demmel on Floating Point System, University of California at Berkeley, September 1995. 

Miscalculating Area and Angles of a Needle-like Triangle, Notes of W. Kahan on common misconceptions of floating point calculations, University of California at Berkeley, September 1997. 

IEEE Floating Point Arithmetic, More advanced Lecture Notes of W. Kahan on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic, University of California at Berkeley, September 1995. 

Floating-Point Number Tutorial, which uses Java applets to visualize the significance of mantissa size and exponent range and the meaning of underflow, overflow, and roundoff error.

 

References

Goldberg, David, What every computer scientist should know about floating-point arithmetic, ACM Computing Surveys, Vol. 23, No. 1 (March 1991), pp. 5-48. (Electronic version not available.)

 

Correction to Misconceptions

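The machine epsilon is not the smallest positive machine-representable number. It measures the relative spacing of floating point numbers near 1, whereas the smallest positive representable number is set by the exponent range and is far smaller.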
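
A quick check makes the distinction concrete. This is a sketch assuming IEEE double precision and the convention that machine epsilon is the gap between 1 and the next larger floating point number. Some texts define it as half that value, the unit roundoff. The variable names are illustrative.

```python
import math
import sys

# Machine epsilon under the "gap above 1.0" convention: 2**-52 for doubles.
eps = sys.float_info.epsilon

# Smallest positive normalized double: 2**-1022.
smallest_normal = sys.float_info.min

# Smallest positive subnormal double: 2**-1074 (ldexp builds it exactly).
smallest_subnormal = math.ldexp(1.0, -1074)

print("machine epsilon   :", eps)
print("smallest normal   :", smallest_normal)
print("smallest subnormal:", smallest_subnormal)

# Adding eps to 1.0 is visible, but adding eps/2 rounds back to 1.0.
print("1 + eps   > 1 ?", 1.0 + eps > 1.0)
print("1 + eps/2 == 1 ?", 1.0 + eps / 2.0 == 1.0)
```

Numbers far smaller than machine epsilon are representable. They only disappear when added to a number of much larger magnitude, such as 1.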
 
 