C++ – Should I Use Double or Float?

c++ double-precision floating-point types

What are the advantages and disadvantages of using one instead of the other in C++?

Best Answer

If you want to know the true answer, you should read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

In short, although double offers higher precision in its representation, that alone does not guarantee a small error: for certain calculations, a poorly chosen algorithm produces large errors in either type. The "right" choice is: use as much precision as you need, but not more, and choose the right algorithm.

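Many compilers evaluate floating-point expressions in a wider type anyway in "non-strict" mode (i.e. they use a wider floating-point format available in hardware, such as 80-bit or 128-bit), and this should be taken into account as well. In practice, you will rarely see much difference in speed for scalar arithmetic, since both types are native to the hardware. float can still be faster when memory bandwidth or SIMD width is the bottleneck, because twice as many floats as doubles fit in a cache line or vector register.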
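As a hedged illustration, C++11 lets a program ask how the implementation evaluates intermediate floating-point results through the FLT_EVAL_METHOD macro from <cfloat>. The sketch below only reports that value; the helper name eval_method_description is made up for this example. A value of 0 (typical of x86-64 builds using SSE) means each type is evaluated in its own precision, while GCC on 32-bit x87 typically reports 2, meaning intermediates are kept in long double.

```cpp
#include <cfloat>  // FLT_EVAL_METHOD

// Hypothetical helper: describes how this implementation evaluates
// float and double expressions, as reported by FLT_EVAL_METHOD.
const char* eval_method_description() {
    switch (FLT_EVAL_METHOD) {
        case 0:  return "each type evaluated in its own range and precision";
        case 1:  return "float and double evaluated as double";
        case 2:  return "float, double and long double evaluated as long double";
        case -1: return "indeterminable";
        default: return "implementation-defined";
    }
}
```

Printing eval_method_description() once at startup is a cheap way to find out whether results computed on one build may differ slightly from another.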

Related Solutions

C++ – Understanding Double Precision

First you should read one (or both) of these articles: What Every Computer Scientist Should Know About Floating-Point Arithmetic and The Perils of Floating Point.

If you are looking for a solution for your template, I would suggest using template specialization for the cases where T==double and T==float.
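A minimal sketch of that idea, assuming the template needs a per-type comparison tolerance. The primary template Tolerance is declared but never defined, so only the float and double specializations can be used. The names Tolerance and nearly_equal and the tolerance values are hypothetical choices for illustration, not a recommendation.

```cpp
#include <cmath>  // std::fabs, std::fmax

// Primary template intentionally left undefined: only the
// specializations below can be instantiated.
template <typename T>
struct Tolerance;

// Hypothetical relative tolerances, loose enough for each type's precision.
template <>
struct Tolerance<float> {
    static constexpr float value = 1e-5f;
};

template <>
struct Tolerance<double> {
    static constexpr double value = 1e-12;
};

// Relative comparison whose tolerance depends on T via the specialization.
template <typename T>
bool nearly_equal(T a, T b) {
    return std::fabs(a - b) <=
           Tolerance<T>::value * std::fmax(std::fabs(a), std::fabs(b));
}
```

With this, nearly_equal(0.1 + 0.2, 0.3) is true for double, and calling nearly_equal with, say, int fails to compile because Tolerance<int> is an incomplete type.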

Float vs Double – Differences Between Float and Double in C++

Huge difference.

As the name implies, a double has 2x the precision of float^[1]. In general a double has 15 decimal digits of precision, while float has 7.

Here's how the number of digits is calculated:

double has 52 mantissa bits + 1 hidden bit: log(2⁵³)÷log(10) = 15.95 digits

float has 23 mantissa bits + 1 hidden bit: log(2²⁴)÷log(10) = 7.22 digits
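The same figures can be derived from std::numeric_limits, whose digits member counts the significand bits including the hidden bit (24 for IEEE float, 53 for IEEE double). This sketch, with hypothetical helper names, computes bits × log10(2):

```cpp
#include <cmath>   // std::log10
#include <limits>  // std::numeric_limits

// Decimal digits carried by a binary significand of `bits` bits:
// log10(2^bits) = bits * log10(2).
double decimal_digits(int bits) {
    return bits * std::log10(2.0);
}

// numeric_limits<T>::digits counts significand bits including the hidden bit.
template <typename T>
double decimal_digits_of() {
    return decimal_digits(std::numeric_limits<T>::digits);
}
```

Note that std::numeric_limits<float>::digits10 is 6 rather than 7: digits10 is the number of decimal digits guaranteed to survive a round trip from text to the type and back, which is floor((24 - 1) × log10 2) for float and floor((53 - 1) × log10 2) = 15 for double.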

This lower precision means that larger rounding errors accumulate when calculations are repeated, e.g.

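#include <cstdio>

int main() {
    float a = 1.f / 81;    // 1/81 has no exact binary representation
    float b = 0;
    for (int i = 0; i < 729; ++i)
        b += a;            // each addition rounds the sum to float precision
    printf("%.7g\n", b);   // prints 9.000023 (the exact answer is 729/81 = 9)
}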

while

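#include <cstdio>

int main() {
    double a = 1.0 / 81;   // still inexact, but with 53 significant bits
    double b = 0;
    for (int i = 0; i < 729; ++i)
        b += a;            // each addition rounds the sum to double precision
    printf("%.15g\n", b);  // prints 8.99999999999996
}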

Also, the maximum value of float is about 3.4e38, while that of double is about 1.8e308, so float can overflow to "infinity" (a special floating-point value) much more easily than double, even in something as simple as computing the factorial of 60.
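A small sketch of the factorial case, assuming IEEE float and double and ordinary (non-x87) evaluation. The function names are hypothetical. It multiplies in the chosen type and finds the first n whose factorial overflows to infinity.

```cpp
#include <cmath>  // std::isinf

// Computes n! by repeated multiplication in the floating-point type T.
template <typename T>
T factorial(int n) {
    T result = 1;
    for (int i = 2; i <= n; ++i)
        result *= static_cast<T>(i);
    return result;
}

// Smallest n for which n! overflows T to infinity.
template <typename T>
int first_overflowing_factorial() {
    int n = 1;
    while (!std::isinf(factorial<T>(n)))
        ++n;
    return n;
}
```

On a typical x86-64 or ARM build, float already overflows at 35! (about 1.03e40), whereas double holds up to 170! and overflows at 171!.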

During testing, a few test cases may contain such huge numbers, which can make your program fail if it uses float.

Of course, sometimes even double isn't accurate enough, which is why we also have long double^[1] (the above example gives 9.000000000000000066 on Mac). But all floating-point types suffer from round-off errors, so if exact results are essential (e.g. money processing), you should use integers or a fraction class.
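A minimal sketch of the integer approach, assuming amounts are stored as a whole number of cents in a 64-bit integer. Both helper names are hypothetical; the double version is there only for contrast.

```cpp
#include <cstdint>

// Hypothetical helper: money held as cents in a 64-bit integer, so every
// addition is exact as long as the integer range is not exceeded.
std::int64_t add_cents_repeatedly(std::int64_t cents, int count) {
    std::int64_t total = 0;
    for (int i = 0; i < count; ++i)
        total += cents;
    return total;
}

// The same computation in double: 0.10 has no exact binary
// representation, so each addition can introduce a rounding error.
double add_amount_repeatedly(double amount, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += amount;
    return total;
}
```

Ten additions of 10 cents give exactly 100 cents, while ten additions of 0.1 in IEEE double give 0.9999999999999999 rather than 1.0.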

Furthermore, don't use += to sum lots of floating-point numbers, as the errors accumulate quickly. If you're using Python, use math.fsum. Otherwise, try implementing the Kahan summation algorithm.
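A sketch of Kahan (compensated) summation in float, applied to the same 729 copies of 1/81 as the += loop above. The function names are hypothetical. One assumption worth stating: compiling with value-unsafe flags such as GCC/Clang's -ffast-math may let the compiler simplify (t - sum) - y to zero, which silently removes the compensation.

```cpp
#include <vector>

// Plain left-to-right summation, for comparison.
float naive_sum(const std::vector<float>& xs) {
    float sum = 0.0f;
    for (float x : xs)
        sum += x;
    return sum;
}

// Kahan summation: `c` holds the negated rounding error of the previous
// addition, and it is folded back into the next term.
float kahan_sum(const std::vector<float>& xs) {
    float sum = 0.0f;
    float c = 0.0f;
    for (float x : xs) {
        float y = x - c;        // next term, corrected by the previous error
        float t = sum + y;      // low-order bits of y may be lost here
        c = (t - sum) - y;      // (part of y actually added) minus y
        sum = t;
    }
    return sum;
}
```

For std::vector<float>(729, 1.0f / 81), kahan_sum stays within about a single float ulp of 9, whereas the naive loop drifts noticeably further, as the float example above shows.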

[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double precision. Nevertheless, on most platforms (compilers such as gcc and MSVC; architectures such as x86, x64 and ARM), float is indeed an IEEE single-precision floating-point number (binary32), and double is an IEEE double-precision floating-point number (binary64).
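If code relies on that layout, a hedged way to check it at compile time is std::numeric_limits: is_iec559 reports IEEE 754 conformance, and digits pins down the significand width. The function name below is hypothetical.

```cpp
#include <limits>  // std::numeric_limits

// True when float and double behave as IEEE 754 binary32 and binary64.
constexpr bool uses_ieee_binary32_and_binary64() {
    return std::numeric_limits<float>::is_iec559 &&
           std::numeric_limits<double>::is_iec559 &&
           std::numeric_limits<float>::digits == 24 &&
           std::numeric_limits<double>::digits == 53;
}
```

Placing static_assert(uses_ieee_binary32_and_binary64(), "IEEE float/double required"); in a source file turns an unexpected platform into a build error instead of a silent precision difference.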
