Strange behavior with NaN and -ffast-math
My previous blog post said that computations producing Inf
, NaN
, or -0.0
in programs compiled with -ffinite-math-only
and -fno-signed-zeros
might cause the program to behave in strange ways, such as not evaluating either the true or false part of an if
-statement.
I have received several questions about this, so let’s look at an example of how this can happen.
Example – vectorization
There are cases where the compiler can generate better code by splitting an if-then-else
if (x > y) {
do_something();
} else {
do_something_else();
}
into
if (x > y) {
do_something();
}
if (!(x > y)) {
do_something_else();
}
-ffinite-math-only
tells the compiler that no NaN
values will ever be seen when running the program, so the compiler optimizes this to
if (x > y) {
do_something();
}
if (x <= y) {
do_something_else();
}
But this means that neither do_something
nor do_something_else
is evaluated if x
or y
happens to be NaN
when the program runs.
This splitting of if-then-else
helps vectorization where it makes it easier to work with element masks. This can be seen with the function below when compiled with clang 13.0.0 (godbolt)
float a[1024];
float b[1024];
void foo(void) {
for (int i = 0; i < 1024; ++i) {
if (b[i] > 42.0f) {
a[i] = b[i] + 1.0f;
} else {
b[i] = a[i] + 1.0f;
}
}
}
The generated code uses masked moves to store the values
vmovups ymm2, ymmword ptr [rax + b+4096]
vcmpleps ymm3, ymm2, ymm0
vaddps ymm4, ymm1, ymmword ptr [rax + a+4096]
vmaskmovps ymmword ptr [rax + b+4096], ymm3, ymm4
vcmpltps ymm3, ymm0, ymm2
vaddps ymm2, ymm2, ymm1
vmaskmovps ymmword ptr [rax + a+4096], ymm3, ymm2
The mask calculated by vcmpleps
or vcmpltps
is false when the corresponding element in ymm2
(which contains b[i]
) is NaN
, so no value is stored when b[i]
is NaN
.
How to avoid such problems
This kind of strange behavior is uncommon – the usual failure mode is just that the program produces an incorrect value when NaN
, Inf
, or -0.0
is seen. But producing an incorrect value is not a desirable behavior either…
It is possible to detect the use of NaN
and Inf
by enabling trapping1 using
feenableexcept(FE_OVERFLOW | FE_INVALID | FE_DIVBYZERO);
as described in the previous blog post. But the best way to avoid problems is not to use -ffast-math
at all – I usually enable -ffast-math
just to see if it improves the performance. If it does, I manually apply the profitable optimizations in the source code (if they are safe for my use case) to get the same performance without the flag.
Updated: Corrected compiler flags in godbolt example. Clarified the text explaining the example.
-
This does not detect
-0.0
. But-0.0
is very unlikely to cause any problems. ↩