Strange behavior with NaN and -ffast-math

My previous blog post said that computations producing Inf, NaN, or -0.0 in programs compiled with -ffinite-math-only and -fno-signed-zeros might cause the program to behave in strange ways, such as evaluating neither the true nor the false part of an if-statement.

I have received several questions about this, so let’s look at an example of how this can happen.

Example – vectorization

There are cases where the compiler can generate better code by splitting an if-then-else

if (x > y) {
  do_something();
} else {
  do_something_else();
}

into

if (x > y) {
  do_something();
}
if (!(x > y)) {
  do_something_else();
}

-ffinite-math-only tells the compiler that no NaN values will ever be seen when running the program. Without NaN, !(x > y) is equivalent to x <= y, so the compiler optimizes this to

if (x > y) {
  do_something();
}
if (x <= y) {
  do_something_else();
}

But both x > y and x <= y are false when either operand is NaN, so neither do_something nor do_something_else is called if x or y happens to be NaN when the program runs.
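
The rewritten condition can also be written out by hand, which makes the effect easy to reproduce. Below is a minimal sketch, assuming it is compiled without -ffast-math so that NaN comparisons follow IEEE 754. The helper name branches_taken is made up for illustration; it counts how many of the two split branches run.

```c
/* Hypothetical helper: counts how many of the two split branches run.
   Assumes compilation without -ffast-math, so comparisons with NaN
   are false as IEEE 754 specifies. */
int branches_taken(float x, float y) {
  int count = 0;
  if (x > y) {
    ++count;  /* stands in for do_something() */
  }
  if (x <= y) {
    ++count;  /* stands in for do_something_else() */
  }
  return count;
}
```

For ordinary values exactly one branch runs, so branches_taken(1.0f, 2.0f) returns 1. If either argument is NaN (for example NAN from <math.h>), it returns 0.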

Splitting the if-then-else this way helps vectorization because it makes it easier to work with element masks. This can be seen in the function below when compiled with clang 13.0.0 (godbolt)

float a[1024];
float b[1024];

void foo(void) {
  for (int i = 0; i < 1024; ++i) {
    if (b[i] > 42.0f) {
      a[i] = b[i] + 1.0f;
    } else {
      b[i] = a[i] + 1.0f;
    }
  }
}

The generated code uses masked moves to store the values

  vmovups     ymm2, ymmword ptr [rax + b+4096]
  vcmpleps    ymm3, ymm2, ymm0
  vaddps      ymm4, ymm1, ymmword ptr [rax + a+4096]
  vmaskmovps  ymmword ptr [rax + b+4096], ymm3, ymm4
  vcmpltps    ymm3, ymm0, ymm2
  vaddps      ymm2, ymm2, ymm1
  vmaskmovps  ymmword ptr [rax + a+4096], ymm3, ymm2

The masks calculated by vcmpleps and vcmpltps are both false for an element when the corresponding element in ymm2 (which contains b[i]) is NaN, so neither a[i] nor b[i] is stored when b[i] is NaN.
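
To make the masking concrete, here is a scalar sketch of what the vectorized loop does per element, next to the loop as written. The names foo_reference and foo_masked and the pointer-based signatures are mine, not something clang produces. The code assumes compilation without -ffast-math so that the two comparisons keep their IEEE 754 meaning.

```c
#include <stddef.h>

/* The loop as written in the source: a NaN in b[i] takes the else branch. */
void foo_reference(float *a, float *b, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    if (b[i] > 42.0f) {
      a[i] = b[i] + 1.0f;
    } else {
      b[i] = a[i] + 1.0f;
    }
  }
}

/* Scalar emulation of the masked stores: both masks and both sums are
   computed for every element, and each store happens only if its mask
   is set. */
void foo_masked(float *a, float *b, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    float bi = b[i];
    int mask_le = bi <= 42.0f;   /* vcmpleps: false for NaN */
    int mask_gt = 42.0f < bi;    /* vcmpltps: false for NaN */
    float new_b = a[i] + 1.0f;   /* vaddps */
    float new_a = bi + 1.0f;     /* vaddps */
    if (mask_le) b[i] = new_b;   /* vmaskmovps to b */
    if (mask_gt) a[i] = new_a;   /* vmaskmovps to a */
  }
}
```

For an element where b[i] is NaN, foo_reference writes a[i] + 1.0f to b[i], while foo_masked leaves both a[i] and b[i] untouched. For all other inputs the two functions agree.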

How to avoid such problems

This kind of strange behavior is uncommon – the usual failure mode is just that the program produces an incorrect value when NaN, Inf, or -0.0 is seen. But producing an incorrect value is not a desirable behavior either…

It is possible to detect the use of NaN and Inf by enabling trapping¹ using

feenableexcept(FE_OVERFLOW | FE_INVALID | FE_DIVBYZERO);

as described in the previous blog post. But the best way to avoid problems is not to use -ffast-math at all – I usually enable -ffast-math just to see if it improves the performance. If it does, I manually apply the profitable optimizations in the source code (if they are safe for my use case) to get the same performance without the flag.
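
As an illustration of applying such an optimization by hand, consider a sum reduction. -ffast-math allows the compiler to reassociate floating-point additions, which is what lets it vectorize a plain summation loop. A similar effect can be had without the flag by writing the reassociated form explicitly. This is only a sketch of one such transformation (the name sum4 and the choice of four accumulators are mine), and it is safe only if the changed rounding order is acceptable for your data.

```c
#include <stddef.h>

/* Sums x[0..n-1] using four independent accumulators. The additions are
   reassociated by hand, so the result may round differently from a
   plain left-to-right sum. */
float sum4(const float *x, size_t n) {
  float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += x[i];
    s1 += x[i + 1];
    s2 += x[i + 2];
    s3 += x[i + 3];
  }
  float sum = (s0 + s1) + (s2 + s3);
  /* Handle the remaining 0 to 3 elements. */
  for (; i < n; ++i) {
    sum += x[i];
  }
  return sum;
}
```

Because the code is compiled without -ffast-math, a NaN or Inf in the input still propagates to the result as IEEE 754 specifies.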

Updated: Corrected compiler flags in godbolt example. Clarified the text explaining the example.

¹ This does not detect -0.0. But -0.0 is very unlikely to cause any problems.

Written on October 26, 2021