In general, the rules described above apply to all languages, including C,
C++, and assembler. The samples below demonstrate some of the rules using
FORTRAN PowerStation. All of the samples were compiled using FORTRAN
PowerStation 32 without any options, except for the last one, which is
written in C.
Please refer to the FORTRAN manual(s) supplied with Microsoft FORTRAN for a
description of numeric constants, and to article 36068 for a description of
the internal representation of floating-point values.
The first sample demonstrates two things:
- FORTRAN constants are single precision by default (C constants are
double precision by default).
- Calculations that contain any single precision terms are not much more
accurate than calculations in which all terms are single precision.
After being initialized with 1.1 (a single precision constant), y is as
inaccurate as a single precision variable.
x = 1.100000000000000 y = 1.100000023841858
The result of multiplying a single precision value by an accurate double
precision value is nearly as bad as multiplying two single precision
values. Both calculations have thousands of times as much error as
multiplying two double precision values.
true = 1.320000000000000 (multiplying 2 double precision values)
y = 1.320000052452087 (multiplying a double and a single)
z = 1.320000081062318 (multiplying 2 single precision values)
Sample Code
C Compile options: none
      real*8 x,y,z
      x = 1.1D0
      y = 1.1
      print *, 'x =',x, 'y =', y
      y = 1.2 * x
      z = 1.2 * 1.1
      print *, x, y, z
      end
Sample 2 uses the quadratic formula. It demonstrates that even double
precision calculations are not perfect, and that when small errors can have
drastic results, the result of a calculation should be tested before it is
relied on. The input to the square root function in sample 2 is only very
slightly negative, but it is still invalid. If the double precision
calculations did not have slight errors, the result would be:
Root = -1.1500000000
Instead, it generates the following error:
run-time error M6201: MATH
- sqrt: DOMAIN error
Sample Code
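C Compile options: none
C The values of a, b, and c below are assumed; they are chosen to be
C consistent with the expected result Root = -1.15.
      real*8 a,b,c,x,y
      a = 1.0D0
      b = 2.3D0
      c = 1.3225D0
      x = b**2
      y = 4*a*c
      print *,x,y,x-y
      print "(' Root =',F16.10)",(-b+dsqrt(x-y))/(2*a)
      end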
Sample 3 demonstrates that, because of optimizations that occur even if
optimization is not turned on, values may temporarily retain a higher
precision than expected, and that it is very unwise to test two
floating-point values for equality.
In this example, two values are both equal and not equal. At the first IF,
the value of Z is still on the coprocessor's stack and has the same
precision as Y. Therefore X does not equal Z and the first message is
printed. By the time of the second IF, Z has been reloaded from memory and
therefore has the same precision and value as X, so the second message is
also printed.
Sample Code
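C Compile options: none
C The assignments to y, x, and z are assumed for illustration.
C x and z are implicitly typed as single precision.
      real*8 y
      y = 27.1024D0
      x = 27.1024
      z = y
      if (x.ne.z) then
        print *,'X does not equal Z'
      end if
      if (x.eq.z) then
        print *,'X equals Z'
      end if
      end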
The first part of sample code 4 calculates the smallest possible difference
between two numbers close to 1.0. It does this by adding a single bit to
the binary representation of 1.0.
x = 1.00000000000000000 (one bit more than 1.0)
y = 1.00000000000000000 (exactly 1.0)
x-y = .00000000000000022 (smallest possible difference)
Some versions of FORTRAN round the numbers when displaying them so that the
inherent numerical imprecision is not so obvious. This is why x and y look
the same when displayed.
The second part of sample code 4 calculates the smallest possible
difference between 2 numbers close to 10.0. Again, it does this by adding a
single bit to the binary representation of 10.0. Notice that the difference
between numbers near 10 is larger than the difference near 1. This
demonstrates the general principle that the larger the absolute value of a
number, the less precisely it can be stored in a given number of bits.
x = 10.00000000000000000 (one bit more than 10.0)
y = 10.00000000000000000 (exactly 10.0)
x-y = .00000000000000178
The binary representation of these numbers is also displayed to show that
they do differ by only one bit.
x = 4024000000000001 Hex
y = 4024000000000000 Hex
The last part of sample code 4 shows that simple nonrepeating decimal
values often can be represented in binary only by a repeating fraction. In
this case x=1.05, which requires a repeating factor CCCCCCCC....(Hex) in
the mantissa. In FORTRAN, the last digit "C" is rounded up to "D" in order
to maintain the highest possible accuracy:
x = 3FF0CCCCCCCCCCCD (Hex representation of 1.05D0)
Even after rounding, the result is not perfectly accurate. There is some
error after the least significant digit, which we can see by removing the
first digit.
x-1 = .05000000000000004
Sample Code
C Compile options: none
C The assignments to x, y, and i(1) are assumed from the descriptions
C above; i(1) is taken to overlay the low-order 32 bits of x.
      IMPLICIT real*8 (A-Z)
      integer*4 i(2)
      real*8 x,y
      equivalence (i(1),x)
      x = 1.
      y = x
      i(1) = i(1) + 1
      print "(1x,'x =',F20.17,' y=',f20.17)", x,y
      print "(1x,'x-y=',F20.17)", x-y
      print *
      x = 10.
      y = x
      i(1) = i(1) + 1
      print "(1x,'x =',F20.17,' y=',f20.17)", x,y
      print "(1x,'x-y=',F20.17)", x-y
      print *
      print "(1x,'x =',Z16,' Hex y=',Z16,' Hex')", x,y
      print *
      x = 1.05D0
      print "(1x,'x =',F20.17)", x
      print "(1x,'x =',Z16,' Hex')", x
      print "(1x,'x-1=',F20.17)", x-1
      print *
      end
In C, floating constants are doubles by default. Use an "f" to indicate a
float value, as in "89.95f".
/* Compile options needed: none */
#include <stdio.h>
int main(void)
{
    float floatvar;
    double doublevar;
    /* Print double constant. */
    printf("89.95 = %f\n", 89.95);       // 89.95 = 89.950000
    /* Print float constant. */
    printf("89.95 = %f\n", 89.95F);      // 89.95 = 89.949997
    /*** Use double constant. ***/
    floatvar = 89.95;
    doublevar = 89.95;
    printf("89.95 = %f\n", floatvar);    // 89.95 = 89.949997
    printf("89.95 = %lf\n", doublevar);  // 89.95 = 89.950000
    /*** Use float constant. ***/
    floatvar = 89.95f;
    doublevar = 89.95f;
    printf("89.95 = %f\n", floatvar);    // 89.95 = 89.949997
    printf("89.95 = %lf\n", doublevar);  // 89.95 = 89.949997
    return 0;
}