Binary Real Numbers
Fractions
- A fraction is the part of a number that is not whole (the digits past the radix point)
- In binary, each digit past the point is weighted by a negative power of 2 ($2^{-1}, 2^{-2}, \ldots$)
- Fractional Decimal → Binary
- Write the fraction in the left corner
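- Multiply the number by 2 and extract the integer part as the next binary digit (written below)
- Repeat on the remaining fractional part, moving to the right, until the number is 0.0 (some fractions, such as 0.1, never terminate, so stop once you have enough bits)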
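
The multiply-by-2 procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `fraction_to_binary` and the `max_bits` cutoff are assumptions added here so that non-terminating fractions still stop.

```python
def fraction_to_binary(frac, max_bits=16):
    """Return the binary digits of a fraction in [0, 1) as a string.

    max_bits is a hypothetical cutoff for fractions that never reach 0.0.
    """
    if not 0 <= frac < 1:
        raise ValueError("frac must be in [0, 1)")
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)        # the integer part is the next binary digit
        bits.append(str(bit))
        frac -= bit            # keep only the fractional part for the next round
    return "".join(bits) or "0"


print(fraction_to_binary(0.625))   # 101      (0.625 = 0.101 in binary)
print(fraction_to_binary(0.1, 8))  # 00011001 (truncated, 0.1 does not terminate)
```

The digits are read in the order they are produced, so the first extracted bit is the one right after the binary point.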

Fixed Point
- N bits available, divided into the integer and fraction components
- Conversion: Convert integer and fraction parts separately
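
As a rough sketch of the fixed-point conversion, the integer and fraction parts can be converted separately and then joined. The 4.4 bit split, the unsigned-only handling, and the name `to_fixed_point` are assumptions chosen for illustration; the point in the output is shown for readability only.

```python
def to_fixed_point(value, int_bits=4, frac_bits=4):
    """Unsigned fixed-point digits 'iiii.ffff' for a non-negative value.

    Extra fraction bits are truncated (an assumption of this sketch).
    """
    if value < 0 or value >= 2 ** int_bits:
        raise ValueError("value does not fit in the integer field")
    whole = int(value)
    frac = value - whole
    # Integer part: ordinary binary conversion, zero-padded on the left.
    int_part = format(whole, f"0{int_bits}b")
    # Fraction part: repeated multiplication by 2, zero-padded on the right.
    frac_digits = []
    for _ in range(frac_bits):
        frac *= 2
        bit = int(frac)
        frac_digits.append(str(bit))
        frac -= bit
    return int_part + "." + "".join(frac_digits)


print(to_fixed_point(5.75))  # 0101.1100
```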
Floating Point
- Base 2 scientific notation
- Normalized Representation
- The form $\pm 2^{exp} \cdot 1.xx\ldots$
- 1.xx is a binary fraction that always starts with 1, so the leading 1 is implied and not stored (to save space)
- Stored as
- Sign
- Exponent
- Mantissa
- Digits after the binary point
- Padded with 0s on the right (trailing zeros do not change the value of a fraction)
- Sizes
- Single Precision (32 bits)
- Sign: 1
- Exp: 8 (excess 127)
- Mantissa: 23
- Double Precision (64 bits)
- Sign: 1
- Exp: 11 (excess 1023)
- Mantissa: 52
- Decimal → FP Steps
- Convert integer and fractional components to binary
- Normalize
- Convert exponent to excess representation
- Assemble into final representation
- Range
- With a k-bit mantissa and largest exponent $E_{max}$
- $[-(2 - 2^{-k}) \cdot 2^{E_{max}},\ +(2 - 2^{-k}) \cdot 2^{E_{max}}]$
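
The encoding steps and the range formula can be tried out with a short Python sketch for single precision. The names `encode_float32` and `max_normal` are hypothetical, `math.frexp` stands in for the manual normalization step, and the mantissa is truncated rather than rounded, so this is an illustration under those assumptions and not how hardware does it.

```python
import math
import struct


def encode_float32(x):
    """Return (sign, exponent, mantissa) bit strings for a normalized value."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("this sketch handles nonzero finite values only")
    sign = 0 if x > 0 else 1
    # frexp gives abs(x) = m * 2**e with 0.5 <= m < 1; rescale to the 1.xx form.
    m, e = math.frexp(abs(x))
    significand, exp = m * 2, e - 1
    biased = exp + 127                           # excess-127 exponent
    if not 1 <= biased <= 254:
        raise ValueError("outside the normalized single-precision range")
    # Drop the implied leading 1 and keep 23 fraction bits (truncating).
    mantissa = int((significand - 1) * 2 ** 23)
    return format(sign, "b"), format(biased, "08b"), format(mantissa, "023b")


def max_normal(k, emax):
    """Largest magnitude with a k-bit mantissa: (2 - 2^-k) * 2^emax."""
    return (2 - 2.0 ** -k) * 2.0 ** emax


# -6.75 = -110.11 (binary) = -1.1011 * 2^2, so the stored exponent is 2 + 127 = 129
print(encode_float32(-6.75))
# ('1', '10000001', '10110000000000000000000')

# Cross-check the range formula against the largest float32 bit pattern.
largest = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
print(max_normal(23, 127) == largest)  # True
```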
Operations
- Addition/Subtraction
- Identify the operand with the bigger exponent
- Shift the mantissa of the operand with the smaller exponent to the right (raising its exponent by 1 per shift) until the two exponents match
- Perform the addition/subtraction of the resulting mantissas
- Normalize the result
- Perform underflow/overflow test
- Multiplication
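- Sign = product of signs (XOR of the sign bits)
- Add exponents (if they are stored in excess form, subtract the bias once so it is not counted twice)
- Multiply mantissas (including the implied leading 1)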
- Normalize if needed
- Underflow/overflow test
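
The addition and multiplication steps can be sketched on a toy representation. Everything here is an assumption made for illustration: values are `(sign, exponent, significand)` tuples, the exponent is kept unbiased, the significand is an integer that includes the leading 1 followed by `FRAC_BITS` fraction bits, and results are truncated instead of rounded.

```python
FRAC_BITS = 23  # assumed mantissa width (single precision)


def normalize(sign, exp, sig):
    """Rescale until sig is back in 1.xx form (exactly FRAC_BITS + 1 bits)."""
    if sig == 0:
        return (0, 0, 0)
    while sig >= 1 << (FRAC_BITS + 1):   # too large: shift right, raise exponent
        sig >>= 1
        exp += 1
    while sig < 1 << FRAC_BITS:          # too small: shift left, lower exponent
        sig <<= 1
        exp -= 1
    return (sign, exp, sig)


def check_range(value, emin=-126, emax=127):
    """Overflow/underflow test on the (unbiased) exponent."""
    _, exp, sig = value
    if sig != 0 and exp > emax:
        raise OverflowError("exponent overflow")
    if sig != 0 and exp < emin:
        raise ArithmeticError("exponent underflow")
    return value


def fp_add(a, b):
    if a[1] < b[1]:                      # make a the operand with the bigger exponent
        a, b = b, a
    (sa, ea, ma), (sb, eb, mb) = a, b
    mb >>= ea - eb                       # shift the smaller operand right to align
    total = (-ma if sa else ma) + (-mb if sb else mb)
    return check_range(normalize(1 if total < 0 else 0, ea, abs(total)))


def fp_mul(a, b):
    (sa, ea, ma), (sb, eb, mb) = a, b
    sig = (ma * mb) >> FRAC_BITS         # product of two 1.xx values, rescaled
    return check_range(normalize(sa ^ sb, ea + eb, sig))


def to_float(value):
    sign, exp, sig = value
    return (-1) ** sign * sig * 2.0 ** (exp - FRAC_BITS)


three = (0, 1, 3 << 22)    # 1.5 * 2^1
half = (0, -1, 1 << 23)    # 1.0 * 2^-1
print(to_float(fp_add(three, half)))   # 3.5
print(to_float(fp_mul(three, half)))   # 1.5
print(to_float(fp_mul(three, three)))  # 9.0
```

Subtraction falls out of `fp_add` by flipping the sign of the second operand before the call.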
Special Cases
Exponent | Fraction | Represents |
---|---|---|
0 | 0 | ±0 |
0 | Nonzero | ± denormalized |
1-254 | Anything | ± floating point |
255 | 0 | ± infinity |
255 | Nonzero | NaN |
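
The table (which uses the 8-bit single-precision exponent) can be turned into a small classifier. This is a sketch under assumptions: `classify_float32` and `float_to_bits` are hypothetical helper names, and the bit pattern is passed as a plain integer.

```python
import struct


def classify_float32(bits):
    """Classify a 32-bit pattern using the exponent and fraction fields."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF       # 8 exponent bits
    fraction = bits & 0x7FFFFF           # 23 fraction bits
    if exponent == 0:
        return sign + ("zero" if fraction == 0 else "denormalized")
    if exponent == 255:
        return (sign + "infinity") if fraction == 0 else "NaN"
    return sign + "normalized"           # exponent in 1..254


def float_to_bits(x):
    """Reinterpret a Python float, stored as float32, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(classify_float32(float_to_bits(1.0)))            # +normalized
print(classify_float32(float_to_bits(float("-inf"))))  # -infinity
print(classify_float32(0x00000001))                    # +denormalized
print(classify_float32(0x7FC00000))                    # NaN
```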