# Strange behavior with NaN and -ffast-math

My previous blog post said that computations producing Inf, NaN, or -0.0 in programs compiled with -ffinite-math-only and -fno-signed-zeros might cause the program to behave in strange ways, such as not evaluating either the true or false part of an if-statement.

# Example – vectorization

There are cases where the compiler can generate better code by splitting an if-then-else

    if (x > y) {
        do_something();
    } else {
        do_something_else();
    }


into

    if (x > y) {
        do_something();
    }
    if (!(x > y)) {
        do_something_else();
    }


-ffinite-math-only tells the compiler that no NaN or Inf values will ever be seen when running the program. When neither operand is NaN, !(x > y) is equivalent to x <= y, so the compiler optimizes this to

    if (x > y) {
        do_something();
    }
    if (x <= y) {
        do_something_else();
    }


But every ordered comparison involving NaN is false, so both x > y and x <= y are false if x or y happens to be NaN when the program runs. This means that neither do_something nor do_something_else is called in that case.
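The effect can be reproduced without any special compiler flags by writing the optimized form out by hand. The sketch below is only an illustration; the helper name branches_taken is hypothetical and not part of the original example. It records which of the two if-statements fires.

```c
#include <math.h>   /* NAN macro, for callers that want to pass a NaN */

/* Hand-written version of the "optimized" split if-then-else.
 * Returns a bit set: bit 0 is set if the first if-statement ran,
 * bit 1 is set if the second one ran. */
int branches_taken(float x, float y)
{
    int taken = 0;
    if (x > y) {
        taken |= 1;   /* stands in for do_something() */
    }
    if (x <= y) {
        taken |= 2;   /* stands in for do_something_else() */
    }
    return taken;
}
```

Calling branches_taken(2.0f, 1.0f) returns 1 and branches_taken(1.0f, 2.0f) returns 2, but branches_taken(NAN, 1.0f) returns 0 because neither condition holds. Compile this without -ffast-math; with the flag, the compiler is allowed to assume the NaN case cannot happen.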

Splitting the if-then-else this way helps vectorization because each half can then be handled with its own element mask. This can be seen in the function below when compiled with clang 13.0.0 (godbolt):

    float a[1024];
    float b[1024];

    void foo(void) {
        for (int i = 0; i < 1024; ++i) {
            if (b[i] > 42.0f) {
                a[i] = b[i] + 1.0f;
            } else {
                b[i] = a[i] + 1.0f;
            }
        }
    }


The generated code uses masked moves to store the values:

    vmovups     ymm2, ymmword ptr [rax + b+4096]
    vcmpleps    ymm3, ymm2, ymm0
    vaddps      ymm4, ymm1, ymmword ptr [rax + a+4096]
    vmaskmovps  ymmword ptr [rax + b+4096], ymm3, ymm4
    vcmpltps    ymm3, ymm0, ymm2
    vmaskmovps  ymmword ptr [rax + a+4096], ymm3, ymm2


The masks calculated by vcmpleps and vcmpltps are both false for any element of ymm2 (which contains b[i]) that is NaN, so neither masked store writes a value for that element when b[i] is NaN.
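To make the mask behavior concrete, here is a scalar model of one vector iteration of foo. It is a rough sketch of the source-level semantics under -ffinite-math-only, not a description of the exact instructions clang emits. The function name masked_step and the lane count N are assumptions for illustration; the two mask variables play the roles of the vcmpleps and vcmpltps results.

```c
#include <math.h>   /* NAN macro, for callers that want to pass a NaN */

#define N 8   /* assumed lane count: one 256-bit register holds 8 floats */

/* Scalar model of one vectorized iteration of foo().
 * Each lane computes two masks and performs a store only where
 * the corresponding mask is true.
 * Returns the number of lanes where no store happened at all. */
int masked_step(float a[N], float b[N])
{
    int skipped = 0;
    for (int i = 0; i < N; ++i) {
        float bi = b[i];
        int le_mask = (bi <= 42.0f);   /* role of the vcmpleps mask */
        int gt_mask = (42.0f < bi);    /* role of the vcmpltps mask */

        if (le_mask) {
            b[i] = a[i] + 1.0f;        /* masked store to b */
        }
        if (gt_mask) {
            a[i] = bi + 1.0f;          /* masked store to a */
        }
        if (!le_mask && !gt_mask) {
            ++skipped;                 /* only possible when bi is NaN */
        }
    }
    return skipped;
}
```

For a lane where b[i] is NAN, both masks are 0, so a[i] and b[i] are left unchanged and the lane is counted in the return value. For every non-NaN input exactly one of the two masks is set.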

# How to avoid such problems

This kind of strange behavior is uncommon – the usual failure mode is just that the program produces an incorrect value when NaN, Inf, or -0.0 is seen. But producing an incorrect value is not a desirable behavior either…

It is possible to detect the use of NaN and Inf by enabling trapping [1] using

    feenableexcept(FE_OVERFLOW | FE_INVALID | FE_DIVBYZERO);


as described in the previous blog post (feenableexcept is a GNU extension declared in <fenv.h>). But the best way to avoid problems is not to use -ffast-math at all – I usually enable -ffast-math only to see whether it improves performance. If it does, I manually apply the profitable optimizations in the source code (if they are safe for my use case) to get the same performance without the flag.
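As one illustration of applying an optimization by hand: a common reason -ffast-math speeds up a loop is that it lets the compiler reassociate a floating-point sum, which makes the reduction vectorizable. Assuming that is what the flag bought you, the hypothetical sum4 below does the reassociation explicitly with four independent accumulators, so it can be compiled without the flag. It is a sketch of the idea, not code from any particular project.

```c
#include <stddef.h>   /* size_t */

/* Sum n floats using four independent accumulators.
 * The additions are reordered by hand, which is the kind of
 * reassociation the compiler may only do on its own under -ffast-math. */
float sum4(const float *v, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: four independent dependency chains. */
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }

    float s = (s0 + s1) + (s2 + s3);

    /* Tail: the remaining 0 to 3 elements. */
    for (; i < n; ++i) {
        s += v[i];
    }
    return s;
}
```

The result may differ in the last bits from a strictly sequential sum because the additions happen in a different order, which is the same trade-off the flag makes. The difference is that NaN and Inf inputs still propagate as usual, since the compiler has not been told they cannot occur.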

Updated: Corrected compiler flags in godbolt example. Clarified the text explaining the example.

1. This does not detect -0.0. But -0.0 is very unlikely to cause any problems.

Written on October 26, 2021