Floating-Point Types

The Compiler supports both IEEE standard floating-point formats: 32-bit (single precision) and 64-bit (double precision). Table 1 shows the default format and value range of each floating-point type.

By default, the Compiler implements float in the IEEE 32-bit format and double in the IEEE 64-bit format. If speed matters more than the added precision of double arithmetic, use the -Fd command-line option (see -Fd: Double is IEEE32). With this option, the Compiler implements both float and double in the IEEE 32-bit format.

Use the -T option (see -T: Flexible Type Management) to change the default format of float, double, long double, or long long double.

Table 1. Floating-Point Representation
Type              Default Format  Min                       Max                       Formats Available with the -T Option
float             IEEE32          1.17549435E-38F           3.402823466E+38F          IEEE32, IEEE64
double            IEEE64          2.2250738585072014E-308   1.7976931348623157E+308   IEEE32, IEEE64
long double       IEEE64          2.2250738585072014E-308   1.7976931348623157E+308   IEEE32, IEEE64
long long double  IEEE64          2.2250738585072014E-308   1.7976931348623157E+308   IEEE32, IEEE64

Min and Max give the default value range: Min is the smallest positive normalized value and Max is the largest finite value. Negative values cover the same range with the sign reversed.