Floating-point - Normal/Denormal #196

@lingeandrea

Slide 45/83: 4. Basic concepts II, Integral and floating-point types
From Carl Burch's document, I found that the floating-point representation used is 8-bit (1+4+3, sign-exponent-mantissa). It would be better to state this on the slide; otherwise the numbers in binary notation are incomprehensible.

Furthermore, if I understand the 8-bit notation correctly, the binary number 00001111 (sign=0, exponent=0001, mantissa=111) = 1.111 x 2^(-7+1) = 1.111 x 2^(-6) = 0.000001111 = 1/64 + 1/128 + 1/256 + 1/512 = 1/64 x (1 + 1/2 + 1/4 + 1/8) = 1/64 x ((8 + 4 + 2 + 1)/8) = 1/64 x 15/8 = 15/512, i.e. 15/8//64, not 17/8//64 as written.
The same applies to 00000111 --> 15/8//128
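
As a cross-check of the arithmetic above, here is a minimal Python sketch that decodes this 8-bit layout with exact fractions. It assumes IEEE-754-style conventions that are not stated on the slide: an exponent bias of 7, and an exponent field of 0000 meaning a denormal (no implicit leading 1, exponent fixed at -6). The function name `decode_minifloat` is hypothetical and only used for illustration.

```python
from fractions import Fraction

# Assumed layout: 1 sign bit, 4 exponent bits (excess-7), 3 mantissa bits.
BIAS = 7
EXP_BITS = 4
MAN_BITS = 3


def decode_minifloat(bits: str) -> Fraction:
    """Decode an 8-character bit string into an exact fraction."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")

    sign = -1 if bits[0] == "1" else 1
    exp_field = int(bits[1:5], 2)
    man_field = int(bits[5:], 2)

    if exp_field == 2**EXP_BITS - 1:
        # Assumption: all-ones exponent is reserved for infinity/NaN.
        raise ValueError("exponent field 1111 encodes infinity/NaN")

    frac = Fraction(man_field, 2**MAN_BITS)
    if exp_field == 0:
        # Denormal (assumed rule): no implicit leading 1, exponent is 1 - BIAS.
        significand, exponent = frac, 1 - BIAS
    else:
        # Normal: implicit leading 1, exponent is field value minus the bias.
        significand, exponent = 1 + frac, exp_field - BIAS

    return sign * significand * Fraction(2) ** exponent


if __name__ == "__main__":
    for pattern in ("00001111", "00000111"):
        print(pattern, decode_minifloat(pattern))
```

Under these assumptions 00001111 decodes to 15/512, i.e. 15/8//64. For 00000111 the denormal rule gives 7/512 (7/8//64); reading it as a normalized number with exponent -7 would instead give 15/8//128, so the expected value depends on which convention the slide intends.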
