PRB: Rounding Error Casting Double to Long (12297)
The information in this article applies to:
- Microsoft C for MS-DOS 5.1
- Microsoft C for MS-DOS 6.0
- Microsoft C for MS-DOS 6.0a
- Microsoft C for MS-DOS 6.0ax
- Microsoft C for OS/2 5.1
- Microsoft C for OS/2 6.0
- Microsoft C for OS/2 6.0a
- Microsoft C/C++ for MS-DOS 7.0
- Microsoft Visual C++ 1.0
- Microsoft Visual C++ 1.5
- Microsoft Visual C++ 1.51
- Microsoft Visual C++ 1.52
This article was previously published under Q12297 SYMPTOMS
An application produces incorrect results when it casts an expression
of type double to type long.
CAUSE
The error is caused by differences in arithmetic precision.
STATUS
The C and C++ standards specify that when an application converts a
floating-point number to an integer, the number is truncated toward
zero. When rounding is the preferred behavior, the application can
add or subtract 0.5 as appropriate so truncation provides the
correct result. The text below presents a macro for this purpose.
MORE INFORMATION
The following code example demonstrates this behavior when compiled
with any floating point math option other than /FPa. The application
displays -4049 as the value for Long2 and Long4, which is incorrect.
The application displays the correct value -4050 for Long1 and Long3.
Sample Code
/*
* Compile options needed: /FPc or /FPc87 or /FPi or /FPi87
*/
#include "stdio.h"
main()
{
long val1, val2, val3;
double mul1, mul2;
val1 = 45000;
mul1 = 0.09;
mul2 = (double)val1 * mul1 * -1.00;
printf("%7ld Long1 ", (long)mul2);
val2 = (long)mul2;
printf("%7ld Long2 ", (long)((double)val1 * mul1 * -1.00));
printf("%7ld Long3 ", val2);
val3 =(long)((double)val1 * mul1 * -1.00);
printf("%ld Long4 \n", val3);
}
The application produces the incorrect results by converting a 10-byte
real value to a long; the application produces correct results by
converting a 10-byte real to an 8-byte real and converting that value
to a long.
According to the type conversion rules, the conversion from a 10-byte
real to an 8-byte real rounds the number from -4049.99999999999985 to
-4050.0. When the application converts this value to a long, the value
-4050 results. However, when the application directly converts the
double value to a long, the application truncates toward zero. In this
example, -4049.99999999999985 becomes -4049.
Many numbers (such as .01) are repeating fractions in the binary
numbering system which cannot be represented exactly. Any
representation of these numbers is slightly more or less than the
"true" value. When a calculation involves one of these values, the
representation error propagates and can be magnified. Because the
error is present only in the least significant part of the number,
errors occur only when a calculation loses precision in an
intermediate value.
The conversions always truncate toward zero. The following macro
effectively rounds the number by increasing the magnitude of the
number by 0.5 then converting the number to an integer.
#define ROUNDL( d ) ((long)((d) + ((d) > 0 ? 0.5 : -0.5)))
The problem does not occur with the Microsoft Visual C++ 32-bit Editions as
Win32 does not support the long double data type for compatibility reasons.
For more information on converting floating-point numbers to integers,
see the type conversions section of the "C Language Reference" manual.
Modification Type: | Major | Last Reviewed: | 12/1/2003 |
---|
Keywords: | kb16bitonly kbprb KB12297 |
---|
|