Floating-point - Normal/Denormal #196

Slide 45/83 4. Basic concepts II Integral and floating-point types
From Carl Burch's document, I found that the floating-point representation used here is 8-bit (1+4+3: sign, exponent, mantissa). This should be stated on the slide; otherwise the numbers in binary notation are incomprehensible.

Furthermore, if I understand the 8-bit notation correctly, the binary number 00001111 (sign=0, exponent=0001, mantissa=111) is 1.111 x 2^(1-7) = 1.111 x 2^(-6) = 0.000001111 in binary = 1/64 + 1/128 + 1/256 + 1/512 = (1/64)(1 + 1/2 + 1/4 + 1/8) = (1/64)((8 + 4 + 2 + 1)/8) = (1/64)(15/8), i.e. (15/8)/64, not (17/8)/64 as written.
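The same applies to 00000111 (sign=0, exponent=0000, mantissa=111): decoded with the same formula it is 1.111 x 2^(-7) = (15/8)/128. If exponent 0000 is instead treated as denormal, as in IEEE 754, the value is 0.111 x 2^(-6) = (7/8)/64.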
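
As a cross-check of the arithmetic above, here is a minimal Python sketch that decodes the 1+4+3 layout exactly with fractions. The function name `decode_fp8` and the constant `BIAS` are hypothetical, introduced only for illustration. The excess-7 exponent and the IEEE-style handling of exponent field 0000 (implicit leading 0, exponent fixed at -6) are assumptions and may not match what the slide intends.

```python
from fractions import Fraction

BIAS = 7  # assumed excess-7 exponent for the 1+4+3 layout


def decode_fp8(bits: str) -> Fraction:
    """Decode an 8-bit string (1 sign, 4 exponent, 3 mantissa bits) exactly."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 8 binary digits")
    sign = -1 if bits[0] == "1" else 1
    exp_field = int(bits[1:5], 2)
    frac = Fraction(int(bits[5:], 2), 8)  # the 0.xxx part of the mantissa
    if exp_field == 0b1111:
        raise ValueError("exponent 1111 is reserved for infinity/NaN")
    if exp_field == 0:
        # Assumed IEEE-style denormal: implicit leading 0, exponent 1 - BIAS.
        return sign * frac * Fraction(2) ** (1 - BIAS)
    # Normal number: implicit leading 1, exponent exp_field - BIAS.
    return sign * (1 + frac) * Fraction(2) ** (exp_field - BIAS)


print(decode_fp8("00001111"))  # 15/512, i.e. (15/8)/64
```

Using `Fraction` instead of `float` keeps every intermediate value exact, so the result can be compared directly with the hand-computed fractions.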
