IEEE 32 Bit Floating Point Format

BITS

00000000001111111111222222222233
01234567890123456789012345678901
SEEEEEEEEFFFFFFFFFFFFFFFFFFFFFFF

= (-1)^s × 2^(e-127) × 1.f

where s = S
      e = EEEEEEEE, e > 0, e < 255
      f = FFFFFFFFFFFFFFFFFFFFFFF

Denormals: e = 0; Specials (NaN, Inf, -Inf): e = 255
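
The formula above can be sketched in Python as a way to check hand calculations. This is a minimal illustration, not a reference implementation: the function names decode_fields and normalized_value are made up for this sketch, and it handles only the normalized case (0 < e < 255). Note that the bit diagram numbers bits from the left (S is bit 0), while the shifts below count from the least significant bit.

```python
def decode_fields(bits: int):
    """Split a 32-bit pattern into the integer fields (s, e, f).

    Hypothetical helper for illustration only.
    """
    s = (bits >> 31) & 0x1       # leftmost bit: sign
    e = (bits >> 23) & 0xFF      # next 8 bits: biased exponent
    f = bits & 0x7FFFFF          # last 23 bits: fraction
    return s, e, f


def normalized_value(bits: int) -> float:
    """Evaluate (-1)^s * 2^(e-127) * 1.f for a normalized pattern."""
    s, e, f = decode_fields(bits)
    if not 0 < e < 255:
        raise ValueError("e = 0 (denormal) or e = 255 (special) not handled")
    # 1.f in binary equals 1 + f / 2^23 when f is read as an integer.
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)
```

For instance, normalized_value(0x3F800000) has s = 0, e = 127, f = 0 and evaluates to 1.0.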

BYTES

SEEEEEEE EFFFFFFF FFFFFFFF FFFFFFFF

HEX

SEEE EEEE EFFF FFFF FFFF FFFF FFFF FFFF

Example: What is the IEEE 32 bit floating point
representation for the decimal number -11.5?

STEPS:

1. Convert to binary:

-11.5₁₀ = -1011.1₂

2. Convert to normalized binary scientific notation:

-1011.1₂ = -1.0111₂ × 2³

3. Determine s, e and f:

s = 1, e - 127 = 3, so e = 130₁₀ = 10000010₂
f = 01110000000000000000000₂

4. Assemble the 32 bits:

1 10000010 01110000000000000000000

11000001001110000000000000000000


5. Convert to hex:

1100 0001 0011 1000 0000 0000 0000 0000

C1380000
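
The result of the steps above can be cross-checked with Python's standard struct module. This is a sketch under the assumption that the platform's single-precision format is IEEE 754 (true on essentially all current systems); the helper names float_to_hex32 and float_to_fields are invented for this illustration.

```python
import struct


def float_to_hex32(x: float) -> str:
    """Return the 32-bit pattern of x as 8 hex digits (big-endian order)."""
    # Pack as a single-precision float, then reinterpret as unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{bits:08X}"


def float_to_fields(x: float) -> str:
    """Return the pattern of x as 'S EEEEEEEE FFF...F' (1, 8 and 23 bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{bits:032b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"


print(float_to_fields(-11.5))  # 1 10000010 01110000000000000000000
print(float_to_hex32(-11.5))   # C1380000
```

The same two helpers can be used to check answers to the problems below, since struct.pack rounds its argument to a single-precision value before the bits are read back.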

PROBLEMS:

1. What is the largest regular (non-special) 
   floating point number?

2. What is the smallest regular positive 
   (non-special) floating point number?

3. What is the best floating point representation for 
   3.1415926535?

4. Is zero a floating point number? a denormal? 
   What about +0 and -0?

5. The floating point approximation to a real number 
   should be the "nearest" floating point number.
   How does one determine nearest?

6. What about rounding? Round-up, round-down, 
   round-to-even, round-to-odd?