# IEEE Floating Point Standard

## IEEE Floating Point Standard

### What is the IEEE Floating Point Standard?

The IEEE floating point standard (IEEE 754) is a floating point arithmetic system adopted by the Institute of Electrical and Electronics Engineers in 1985.

Requirements for machines adopting the IEEE floating point standard:

1. Arithmetic should be correctly rounded.
2. Floating point numbers should be represented consistently across machines.
3. Exception handling should be sensible and consistent.
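Requirement 1 can be observed from Python. This is a minimal sketch assuming CPython with IEEE double floats: `fractions.Fraction` stores a float's exact binary value, so adding two Fractions gives the exact real sum, and converting back with `float()` rounds it once to the nearest double. The helper name `correctly_rounded_sum` is hypothetical. If the hardware addition is correctly rounded, the two results agree.

```python
from fractions import Fraction

def correctly_rounded_sum(a: float, b: float) -> float:
    """Round the exact rational sum a + b to the nearest double (hypothetical helper).

    Fraction(a) holds the exact value of the double a, and float() on a
    Fraction performs a correctly rounded integer division in CPython.
    """
    return float(Fraction(a) + Fraction(b))

for a, b in [(0.1, 0.2), (1e16, 1.0), (1.0, 2.0**-60)]:
    hardware = a + b                       # sum computed by the floating point unit
    reference = correctly_rounded_sum(a, b)
    print(a, b, hardware, hardware == reference)
```

On IEEE-conforming hardware every comparison should print `True`, even in cases such as `0.1 + 0.2` where the result differs from the decimal value one might expect.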

### Floating point number representation

Single precision numbers in a 32-bit machine

The bit pattern $b_1 b_2 b_3 \ldots b_9 b_{10} b_{11} \ldots b_{32}$ of a word in a 32-bit machine represents the real number

$$(-1)^s \times 2^{e-127} \times (1.f)_2$$

where $s = b_1$, $e = (b_2 \ldots b_9)_2$, and $f = b_{10} b_{11} \ldots b_{32}$.

| Sign bit | Biased exponent | Fraction from normalized mantissa |
|---|---|---|
| 1 bit | 8 bits | 23 bits |
| $s$ | $e$ | $f$ |

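Note that only the fraction of the normalized mantissa is stored. The leading 1 is implicit (the hidden bit), so the mantissa actually carries 24 binary digits. The formula above applies to normalized numbers, for which $1 \le e \le 254$; the exponent fields $e = 0$ (zero and subnormal numbers) and $e = 255$ (infinities and NaNs) are reserved.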
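To make the field layout concrete, the following Python sketch extracts $s$, $e$ and $f$ from a number and then re-evaluates $(-1)^s \times 2^{e-127} \times (1.f)_2$. It assumes Python's `struct` module, whose `'>f'` format packs a value as a big-endian IEEE single; the helper names `decode_single` and `value_from_fields` are illustrative, and the reconstruction is only valid for normalized numbers.

```python
import struct

def decode_single(x: float) -> tuple:
    """Return the (s, e, f) fields of x after rounding it to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw pattern b1...b32
    s = bits >> 31               # b1: sign bit
    e = (bits >> 23) & 0xFF      # b2...b9: 8-bit biased exponent
    f = bits & 0x7FFFFF          # b10...b32: 23-bit fraction
    return s, e, f

def value_from_fields(s: int, e: int, f: int) -> float:
    """Evaluate (-1)^s * 2^(e-127) * (1.f)_2; valid only for 1 <= e <= 254."""
    mantissa = 1 + f / 2**23     # hidden bit plus 23 stored bits = 24 binary digits
    return (-1) ** s * 2.0 ** (e - 127) * mantissa

s, e, f = decode_single(-6.25)
print(s, e, f"{f:023b}")
print(value_from_fields(s, e, f))
```

For example, $-6.25 = -1.1001_2 \times 2^2$, so the sketch reports $s = 1$, $e = 2 + 127 = 129$ and a fraction of `1001` followed by 19 zeros, and the reconstruction returns `-6.25`.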

Double precision numbers in a 32-bit machine

The bit pattern $b_1 b_2 b_3 \ldots b_{12} b_{13} b_{14} \ldots b_{64}$ of two words in a 32-bit machine represents the real number

$$(-1)^s \times 2^{e-1023} \times (1.f)_2$$

where $s = b_1$, $e = (b_2 \ldots b_{12})_2$, and $f = b_{13} b_{14} \ldots b_{64}$.

| Sign bit | Biased exponent | Fraction from normalized mantissa |
|---|---|---|
| 1 bit | 11 bits | 52 bits |
| $s$ | $e$ | $f$ |

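Note that only the fraction of the normalized mantissa is stored. The leading 1 is implicit (the hidden bit), so the mantissa actually carries 53 binary digits. As in single precision, the formula applies to normalized numbers ($1 \le e \le 2046$); the exponent fields $e = 0$ and $e = 2047$ are reserved for zero and subnormals, and for infinities and NaNs, respectively.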
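The same decoding works for doubles, assuming (as holds on all mainstream platforms) that a Python `float` is an IEEE double. This sketch, with a hypothetical helper `decode_double`, pulls out the 1/11/52-bit fields and then shows the effect of the hidden bit: every integer up to $2^{53}$ is exact, but $2^{53} + 1$ would need 54 binary digits and cannot be stored.

```python
import struct

def decode_double(x: float) -> tuple:
    """Return the (s, e, f) fields of the IEEE double x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # raw pattern b1...b64
    s = bits >> 63                 # b1: sign bit
    e = (bits >> 52) & 0x7FF       # b2...b12: 11-bit biased exponent
    f = bits & ((1 << 52) - 1)     # b13...b64: 52-bit fraction
    return s, e, f

s, e, f = decode_double(0.1)
print(s, e, hex(f))

# 52 stored bits + 1 hidden bit = 53 binary digits of precision:
# 2**53 converts exactly, 2**53 + 1 does not.
print(float(2**53) == 2**53, float(2**53 + 1) == 2**53 + 1)
```

For $0.1$ this gives $s = 0$, $e = 1019$ (a true exponent of $1019 - 1023 = -4$) and $f$ = `0x999999999999a`; the repeating hex pattern reflects that $0.1$ has no finite binary expansion.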

Decimal values of some key floating point quantities on a 32-bit machine:

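| Quantity | Single precision | Double precision |
|---|---|---|
| Machine epsilon | $2^{-23} \approx 1.192 \times 10^{-7}$ | $2^{-52} \approx 2.220 \times 10^{-16}$ |
| Smallest positive normalized | $2^{-126} \approx 1.175 \times 10^{-38}$ | $2^{-1022} \approx 2.225 \times 10^{-308}$ |
| Largest positive | $(2 - 2^{-23}) \times 2^{127} \approx 3.403 \times 10^{38}$ | $(2 - 2^{-52}) \times 2^{1023} \approx 1.798 \times 10^{308}$ |
| Smallest positive subnormal | $2^{-149} \approx 1.401 \times 10^{-45}$ | $2^{-1074} \approx 4.941 \times 10^{-324}$ |
| Decimal precision | 6 significant digits | 15 significant digits |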
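The table entries can be checked numerically. This Python sketch assumes a Python `float` is an IEEE double, builds single precision values from their bit patterns with `struct`, and uses `math.ldexp(1.0, n)` to form $2^n$ exactly, including in the subnormal range. The helper `single_from_bits` is hypothetical.

```python
import math
import struct
import sys

def single_from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE single and widen it exactly to a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Double precision (Python's own float type).
print(sys.float_info.epsilon == math.ldexp(1.0, -52))         # machine epsilon
print(sys.float_info.min == math.ldexp(1.0, -1022))           # smallest positive normalized
print(sys.float_info.max ==
      (2 - math.ldexp(1.0, -52)) * math.ldexp(1.0, 1023))     # largest positive
print(5e-324 == math.ldexp(1.0, -1074))                       # smallest positive subnormal
print(sys.float_info.dig == 15)                               # decimal precision

# Single precision, built from raw bit patterns.
print(single_from_bits(0x3F800001) - 1.0 == math.ldexp(1.0, -23))   # gap above 1.0
print(single_from_bits(0x00800000) == math.ldexp(1.0, -126))        # exponent field 1, f = 0
print(single_from_bits(0x7F7FFFFF) ==
      (2 - math.ldexp(1.0, -23)) * math.ldexp(1.0, 127))            # exponent 254, f all ones
print(single_from_bits(0x00000001) == math.ldexp(1.0, -149))        # lowest fraction bit only
```

Each line should print `True`. The last one shows why the smallest subnormal is $2^{-149}$: with the exponent field at zero, the lowest fraction bit is worth $2^{-126} \times 2^{-23}$.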

### Rounding in IEEE standard

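Round to nearest is the default rounding mode and the most common choice. Given a real number $x$, its correctly rounded value is the floating point number $\mathrm{fl}(x)$ that is closest to $x$. If $x$ lies exactly halfway between two floating point numbers, the default rule (round half to even) picks the one whose last mantissa bit is 0.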
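Correct rounding can be observed directly in Python (3.9 or later, for `math.nextafter`). This sketch assumes IEEE doubles and uses exact `Fraction` arithmetic to confirm that the double stored for the literal `0.1` is closer to $1/10$ than either neighbouring double. It then converts integers that fall exactly halfway between two doubles: IEEE 754's default round-to-nearest mode resolves such ties toward the neighbour with an even last mantissa bit, and CPython's `int`-to-`float` conversion follows the same rule.

```python
import math
from fractions import Fraction

x = Fraction(1, 10)                    # the real number 0.1, held exactly
fl_x = 0.1                             # the double that the literal 0.1 rounds to
below = math.nextafter(fl_x, 0.0)      # neighbouring double just below
above = math.nextafter(fl_x, 1.0)      # neighbouring double just above

# fl(x) is strictly closer to x than either neighbour.
err = abs(Fraction(fl_x) - x)
print(err < abs(Fraction(below) - x), err < abs(Fraction(above) - x))

# Just above 2**53 the spacing between doubles is 2, so 2**53 + 1 and
# 2**53 + 3 are exact ties. Each rounds to the neighbour with an even mantissa.
print(float(2**53 + 1) == 2**53)       # tie resolved downward
print(float(2**53 + 3) == 2**53 + 4)   # tie resolved upward
```

Every line should print `True`. The two tie cases round in opposite directions, which is what distinguishes round half to even from always rounding halves up or down.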