C Floating-Point – Representing Integers in Doubles

c++, floating-point, ieee-754, math, precision

Can a double (of a given number of bytes, with a reasonable mantissa/exponent balance) always fully precisely hold the range of an unsigned integer of half that number of bytes?

E.g. can an eight byte double fully precisely hold the range of numbers of a four byte unsigned int?

What this boils down to is whether a two byte float can hold the range of a one byte unsigned int.

A one byte unsigned int will of course be 0 -> 255.

Best Answer

An IEEE754 64-bit double can represent any 32-bit integer, simply because it has 53-odd^(a) bits available for precision and the 32-bit integer only needs, well, 32 :-)

It would be plausible for a (non IEEE754 double precision) 64-bit floating point number to have fewer than 32 bits of precision. That would allow truly huge numbers (due to the larger exponent) but at the cost of precision.

The bottom line is that, provided there are at least as many bits of precision in the mantissa of the floating point number as there are in the integer (and enough bits in the exponent to scale it), then every value of that integer type can be represented without loss of precision.
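As a rough sketch of that rule, the comparison can be written against `std::numeric_limits`. The helper name `holds_exactly` is made up for this illustration, and the `static_assert` lines assume the usual IEEE754 `float` and `double` (24 and 53 bits of precision respectively); on a platform with different formats they may not hold.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when every value of the unsigned integer type Int
// should be exactly representable in the floating point type Float.
// It checks two things: enough significand bits, and enough exponent range
// to scale a value as large as the integer's top bit.
template <typename Float, typename Int>
constexpr bool holds_exactly() {
    return std::numeric_limits<Float>::radix == 2 &&
           std::numeric_limits<Float>::digits >= std::numeric_limits<Int>::digits &&
           std::numeric_limits<Float>::max_exponent >= std::numeric_limits<Int>::digits;
}

// Assumption: double is IEEE754 binary64 and float is IEEE754 binary32.
static_assert(holds_exactly<double, std::uint32_t>(),
              "53 bits of precision cover a 32-bit unsigned int");
static_assert(!holds_exactly<float, std::uint32_t>(),
              "24 bits of precision do not cover a 32-bit unsigned int");
static_assert(!holds_exactly<double, std::uint64_t>(),
              "53 bits of precision do not cover a 64-bit unsigned int");
```

The check is deliberately conservative: it answers "does the whole range fit", not "does this particular value fit". Individual large values such as powers of two can still be exact even when the whole range is not.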

^(a) Technically, the 53rd bit of precision is an implied 1 at the start of the sequence, so only 52 bits are actually stored and the amount of "variability" may only be 52 bits. Whether it's 52 or 53, it's still enough bits to represent every 32-bit integer.
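To see where the precision actually runs out, here is a small sketch that converts a 64-bit integer to `double` and back around 2^53. The name `round_trips` is invented for this example, and it assumes an IEEE754 binary64 `double` evaluated without extended intermediate precision (as on typical x86-64 builds).

```cpp
#include <cstdint>

// Hypothetical helper: true if v survives a conversion to double and back.
// Only call it with values well below 2^64; converting an out-of-range
// double back to uint64_t is undefined behaviour.
constexpr bool round_trips(std::uint64_t v) {
    return static_cast<std::uint64_t>(static_cast<double>(v)) == v;
}

// 2^53: the point past which consecutive integers stop being representable.
constexpr std::uint64_t two_53 = std::uint64_t{1} << 53;

// Every 32-bit value sits far below that limit.
constexpr std::uint64_t max_u32 = 0xFFFFFFFFu;
```

Under those assumptions, `round_trips(max_u32)` and `round_trips(two_53)` hold, while `round_trips(two_53 + 1)` does not, because 2^53 + 1 would need a 54th bit of precision.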
