Part 2: Verifying GCC optimizations using an SMT solver
This post describes the implementation of pysmtgcc. See “GCC Translation Validation” for background information.
Using an SMT solver
SMT solvers are used to find solutions to a system of equations. As a first example, let’s consider
\[3x + xy = 12\\\\ x^2 + y^2 = 13\] where \(x,y\in\mathbb{Z}\). There is a convenient Python API for Z3 and CVC5¹ that we can use to solve this equation.
Creating the variables
We start by creating the variables \(x\) and \(y\).
x = z3.Const("x", z3.IntSort())
y = z3.Const("y", z3.IntSort())
IntSort()
sets the type to be a bignum integer. We could have used some other type, such as BitVecSort(32) representing 32-bit integers.
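For example (my own illustration, not part of the script below), 32-bit variables would be created as
x = z3.Const("x", z3.BitVecSort(32))
y = z3.Const("y", z3.BitVecSort(32))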
Creating the equations to solve
Next, we create a solver representing the problem to solve and populate it with our formulas
solver = z3.Solver()
solver.append(z3.IntVal(3) * x + x * y == z3.IntVal(12))
solver.append(x * x + y * y == z3.IntVal(13))
The Python API can convert Python values to SMT values without an explicit cast, so we can write, e.g., z3.IntVal(3)
as 3
, which makes this easier to read
solver = z3.Solver()
solver.append(3 * x + x * y == 12)
solver.append(x * x + y * y == 13)
Running the solver
Finally, we add a call to solver.check()
to solve the equation. The solver will, in principle, test all possible values, but it has clever ways of eliminating cases that cannot be a solution, and it is surprisingly fast for most cases that occur in practice.
The SMT solver returns one of:
- sat – the system of equations has a solution. The solution can then be obtained by calling solver.model().
- unsat – no solution exists².
- unknown – the operation timed out, or the solver ran out of memory.
res = solver.check()
if res == z3.sat:
    print(f"{solver.model()}")
elif res == z3.unsat:
    print("Has no solution")
else:
    print("Timeout")
Summary
Putting all of this together gives us the Python script
import z3
x = z3.Const("x", z3.IntSort())
y = z3.Const("y", z3.IntSort())
solver = z3.Solver()
solver.append(3 * x + x * y == 12)
solver.append(x * x + y * y == 13)
res = solver.check()
if res == z3.sat:
    print(f"{solver.model()}")
elif res == z3.unsat:
    print("Has no solution")
else:
    print("Timeout")
Verifying compiler optimizations
We can use an SMT solver to verify compiler optimizations. The idea is that an optimization is valid if the original and optimized functions return the same value for all valid input³, so we let the SMT solver try to find a solution to
\[f(x) \neq f_{optimized}(x)\]
The optimization is valid if no solution exists.
Example: To check the optimization where
unsigned src(unsigned a, unsigned b)
{
  return (a & ~b) - (a & b);
}
is optimized to
unsigned tgt(unsigned a, unsigned b)
{
  return (a ^ b) - b;
}
we encode it as
import z3
a = z3.Const("a", z3.BitVecSort(32))
b = z3.Const("b", z3.BitVecSort(32))
src_retval = (a & ~b) - (a & b)
tgt_retval = (a ^ b) - b
solver = z3.Solver()
solver.append(src_retval != tgt_retval)
res = solver.check()
if res == z3.unsat:
    print("Transformation seems to be correct.")
elif res == z3.sat:
    print(f"Incorrect: {solver.model()}")
else:
    print("Timeout")
Undefined behavior
One complication is that some operations have restrictions on what values they accept. For example, the result of 1/x
is not defined for x=0
. We must therefore exclude any input that makes any instruction invoke undefined behavior. We must also check that the optimized function does not introduce any new undefined behavior for values where the original function had a defined result.
This is done by adding two SMT formulas src_invokes_ub
and tgt_invokes_ub
that keep track of the cases where the two functions invoke UB, and we change the solver setup to
solver = z3.Solver()
solver.append(z3.Not(src_invokes_ub))
solver.append(z3.Or(src_retval != tgt_retval, tgt_invokes_ub))
Example: To check the optimization where
int src(int x)
{
  return 1 / x;
}
is optimized to
int tgt(int x)
{
  return ((unsigned)x + 1) <= 2 ? x : 0;
}
we encode it as
import z3
x = z3.Const("x", z3.BitVecSort(32))
src_retval = 1 / x
src_invokes_ub = x == 0
tgt_retval = z3.If(z3.ULE((x + 1), 2), x, 0)
tgt_invokes_ub = False
solver = z3.Solver()
solver.append(z3.Not(src_invokes_ub))
solver.append(z3.Or(src_retval != tgt_retval, tgt_invokes_ub))
res = solver.check()
if res == z3.unsat:
    print("Transformation seems to be correct.")
elif res == z3.sat:
    print(f"Incorrect: {solver.model()}")
else:
    print("Timeout")
Note: One must be careful with src_invokes_ub as it makes the SMT solver ignore all values where it is True – e.g., a bug that always sets it to True will make the tool report that all optimizations are correct, regardless of what they do!
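To illustrate the pitfall (my own illustration, reusing the variables from the 1/x example above):
# Simulate the bug: src_invokes_ub is always true.
src_invokes_ub = z3.BoolVal(True)
solver = z3.Solver()
solver.append(z3.Not(src_invokes_ub))    # asserts False, so nothing can be sat
solver.append(z3.Or(src_retval != tgt_retval, tgt_invokes_ub))
print(solver.check())                    # unsat -- reported as "correct" no matter what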
pysmtgcc
does that checking in two steps – first, it checks that the functions return the same value
solver = z3.Solver()
solver.append(z3.Not(src_invokes_ub))
solver.append(src_retval != tgt_retval)
res = solver.check()
...
and then it checks that the optimized function does not have additional UB
solver = z3.Solver()
solver.append(z3.Not(src_invokes_ub))
solver.append(tgt_invokes_ub)
res = solver.check()
...
The reason is that this makes it easier for me to debug issues – if the return value differs, then I can just take the model values and generate a test case. If it is UB that does not modify the return value, then it needs a closer look to determine which instruction invokes undefined behavior.
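As an illustration (a sketch of my own, not how pysmtgcc is structured), the two-step check applied to the 1/x example above looks like this:
import z3

x = z3.Const("x", z3.BitVecSort(32))
src_retval = 1 / x                          # "/" on bit vectors is signed division, as in C
src_invokes_ub = x == 0
tgt_retval = z3.If(z3.ULE(x + 1, 2), x, 0)
tgt_invokes_ub = z3.BoolVal(False)

# Step 1: can the return values differ for input where src is defined?
solver = z3.Solver()
solver.append(z3.Not(src_invokes_ub))
solver.append(src_retval != tgt_retval)
print(solver.check())                       # expected: unsat

# Step 2: does tgt invoke UB for input where src is defined?
solver = z3.Solver()
solver.append(z3.Not(src_invokes_ub))
solver.append(tgt_invokes_ub)
print(solver.check())                       # expected: unsat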
Translating GIMPLE functions
Generating an SMT formula for a GIMPLE function is done by iterating over the IR and building the SMT expressions as we go. We keep track of the translated expressions using a Python dictionary, which we use when looking up instruction operands
def get_tree_as_smt(expr, smt_fun):
    if isinstance(expr, gcc.ParmDecl):
        return smt_fun.tree_to_smt[expr]
    if isinstance(expr, gcc.SsaName):
        return smt_fun.tree_to_smt[expr]
    if isinstance(expr, gcc.IntegerCst):
        if isinstance(expr.type, gcc.BooleanType):
            return BoolVal(expr.constant != 0)
        assert isinstance(expr.type, gcc.IntegerType)
        return BitVecVal(expr.constant, expr.type.precision)
    ...
The translation of a GIMPLE function is done as
for bb in fun.cfg.inverted_post_order:
    for stmt in bb.gimple:
        if isinstance(stmt, gcc.GimpleAssign) and len(stmt.rhs) == 2:
            # Binary operation
            rhs0 = get_tree_as_smt(stmt.rhs[0], smt_fun)
            rhs1 = get_tree_as_smt(stmt.rhs[1], smt_fun)
            if stmt.exprcode == gcc.BitIorExpr:
                smt_fun.tree_to_smt[stmt.lhs] = rhs0 | rhs1
            elif stmt.exprcode == gcc.BitAndExpr:
                smt_fun.tree_to_smt[stmt.lhs] = rhs0 & rhs1
            elif stmt.exprcode == gcc.BitXorExpr:
                smt_fun.tree_to_smt[stmt.lhs] = rhs0 ^ rhs1
            elif ...
                ...
        elif ...
            ...
where fun
is the GIMPLE function we receive from GCC, and smt_fun
is the class representing this function in pysmtgcc
.
Doing the translation in one pass works reasonably well for experiments like pysmtgcc
, but it has problems. For example, it would be hard to handle loops with this method⁴. So a real implementation would need to transform the IR into a representation it can analyze and modify before generating the SMT formula.
GIMPLE undefined behavior
GIMPLE undefined behavior is similar to undefined behavior in the C language. For example, arithmetic instructions such as PLUS_EXPR invoke undefined behavior if a signed addition wraps, and the program is then invalid⁵. So the implementation just builds invokes_ub as described in “Verifying compiler optimizations” above. For example, integer addition is implemented as
if stmt.exprcode == gcc.PlusExpr:
    res = rhs0 + rhs1
    if not stmt.lhs.type.overflow_wraps:
        erhs0 = SignExt(1, rhs0)
        erhs1 = SignExt(1, rhs1)
        eres = erhs0 + erhs1
        smt_fun.invokes_ub = Or(SignExt(1, res) != eres, smt_fun.invokes_ub)
    smt_fun.tree_to_smt[stmt.lhs] = res
It is, unfortunately, a bit unclear exactly what is UB in GIMPLE⁶. Some cases are documented in tree.def, but not all, and the text in tree.def is not always precise. For example, for the number of bits to shift by in shift instructions, it says
Note that the result is undefined if the second operand is larger than or equal to the first operand’s type size.
Does this mean that this invokes undefined behavior? Or that it is defined, but it can return an arbitrary value? Anyway, GCC optimizes code involving shift instructions in ways that are invalid for both interpretations⁷.
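For what it is worth, the “invokes undefined behavior” interpretation could be encoded along the same lines as the addition above. This is only a sketch following the conventions of that excerpt – the class name gcc.LshiftExpr is my assumption, and rhs1 is assumed to already have the same width as rhs0:
if stmt.exprcode == gcc.LshiftExpr:
    res = rhs0 << rhs1
    # Shifting by >= the operand's precision is treated as UB in this interpretation.
    smt_fun.invokes_ub = Or(UGE(rhs1, stmt.lhs.type.precision), smt_fun.invokes_ub)
    smt_fun.tree_to_smt[stmt.lhs] = res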
Annoyances
This section discusses some issues that annoyed me during the implementation.
Floating point
The IEEE-754 floating-point representation for NaN is not unique – for example, the significand may contain a payload that can be used to pass additional information⁸. The SMT solver does, however, canonicalize NaN to one value, which makes the result differ after some optimizations.
This is not a big problem – different CPU architectures handle the payload differently, so the GCC middle-end can only make a few assumptions. But the check fails for tests where a function
int foo(int i)
{
  union {
    int i;
    float f;
  } u;
  u.i = i;
  u.f = u.f;
  return u.i;
}
is optimized to
int foo(int i)
{
  return i;
}
as this gives different results for the SMT solver when, e.g., i = -2
.
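A small sketch of why this fails (my own encoding, not pysmtgcc’s): the round trip through the floating-point sort does not preserve the bits of a non-canonical NaN, so the solver finds a counterexample:
import z3

i = z3.Const("i", z3.BitVecSort(32))
f = z3.fpBVToFP(i, z3.Float32())   # u.f = u.f -- reinterpret the bits as a float
src_retval = z3.fpToIEEEBV(f)      # return u.i -- convert the float back to bits
tgt_retval = i                     # the optimized function returns i unchanged

solver = z3.Solver()
solver.append(src_retval != tgt_retval)
print(solver.check())              # sat -- e.g. for a NaN bit pattern such as i = -2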
We could fix this particular case by always representing a floating-point number as a bit vector, and only making it a floating-point sort when needed. But that would give us a similar problem when optimizing
int foo(int i)
{
  union {
    int i;
    float f;
  } u;
  float t = -0.0;
  u.i = i;
  u.f = u.f + t;
  return u.i;
}
as it converts the bit vector to a floating-point sort when doing the addition, but not when the addition with -0.0
has been eliminated.
I cannot see any way to solve all the problems without writing my own SMT floating point implementation⁹…
Interprocedural optimizations
GCC does interprocedural optimizations that move code between functions etc., and this cannot be handled by pysmtgcc
as it works on one function at a time. We mitigate this by skipping the interprocedural (“IPA”) passes in plugin1.py
, but that does not solve all problems…
The IPA passes do global analysis, which can then be queried by the normal GIMPLE passes that work on one function at a time. There are two cases where the use of such information is incorrectly reported as a miscompilation:
- Initialized static variables that are never written work in the same way as constants, so their uses can be substituted by the constants (which can then be constant folded, etc.).
- Static functions that are always called with constants for some arguments can be optimized with knowledge of those values. For example, if a function int foo(int a) { if (a > 10) return 42; return 23; } is only called as foo(1) and foo(9), then it can be optimized to the equivalent of int foo(int a) { return 23; }
We can solve this by letting pysmtgcc query the IPA information and generate the SMT formulas using that information:
- Treat global static variables in the same way as const variables if the IPA information says they are not written to.
- Add extra constraints to the function’s gcc.ParmDecl parameters, restricting the values to the ones used in the calls (see the sketch below).
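A sketch of the second point (hypothetical – the plugin has no API for this today, and the variable names are my own): if the IPA information says foo is only called as foo(1) and foo(9), the SMT variable created for the parameter a could be constrained as
a = smt_fun.tree_to_smt[parm_decl]    # the SMT variable for the gcc.ParmDecl "a"
solver.append(Or(a == 1, a == 9))     # only the values seen at the call sites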
But the gcc-python-plugin
does not have APIs for querying the IPA information, so I currently ignore those issues (by passing -Dstatic=""
to the compiler).
Parallel Z3
It is possible to get Z3 to distribute its work over multiple threads by setting
the parallel.enable
parameter to True
:
z3.set_param("parallel.enable", True)
I guess it makes Z3 faster, but timeouts stop working, so the resulting plugin is slower when some checks take “forever” to analyze instead of giving up after SOLVER_TIMEOUT
seconds.
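For reference, the per-query timeout is set through Z3’s timeout parameter, which takes the limit in milliseconds (a sketch – SOLVER_TIMEOUT is pysmtgcc’s constant, and exactly how pysmtgcc applies it is not shown here):
solver = z3.Solver()
solver.set("timeout", SOLVER_TIMEOUT * 1000)   # Z3 expects milliseconds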
Performance
It is common that optimization passes do not find anything to optimize, so the two functions we compare are often identical. Z3 seems to detect this and is generally fast when checking such cases, but I noticed that some functions timed out on each UB query (thus taking several hours to check the translation unit).
I initially did the checking as
solver.append(z3.Not(src_invokes_ub))
solver.append(tgt_invokes_ub)
Changing this to
solver.append(z3.Not(src_invokes_ub))
solver.append(tgt_invokes_ub != src_invokes_ub)
made it quick for the cases that used to time out. But other UB checks got slower for cases where src_invokes_ub
and tgt_invokes_ub
were not identical. So I am now using
solver.append(z3.Not(src_invokes_ub))
solver.append(tgt_invokes_ub != src_invokes_ub)
solver.append(tgt_invokes_ub)
which seems to be the best of both worlds.
- There are some minor API differences between Z3 and CVC5. For example, CVC5 is missing the fpToIEEEBV and to_smt2 functions. I am currently only using Z3 as I have not figured out how to enable timeouts for CVC5 using the Python API. But I am using the CVC5 API documentation, which is much nicer than the Z3 documentation. ↩
- SMT solvers can, in general, generate a proof that no solution exists. This is useful if we want to verify that the absence of a solution is not due to a bug in the SMT solver (solutions do not need a proof – we can just substitute the values and verify that they are correct). We are not using that functionality in pysmtgcc. ↩
- Memory access complicates this. See part 3 of this blog series. ↩
- Control flow is discussed in part 8 of this blog series. ↩
- This is much simpler than how LLVM handles UB. For LLVM, the instruction is valid, but it returns a poison or undef value, and the program is invalid if the poison/undef is used in certain ways. This enables some optimizations that are not valid in GIMPLE (but GCC can do those optimizations in the back end, where things like signed wrapping are not UB), but it makes the implementation more complex – many of the bugs found by Alive2 are related to how optimization passes propagate poison and undef values. ↩
- See, e.g., Agner Fog’s “Floating point exception tracking and NAN propagation”. ↩
- On the other hand, different CPU architectures handle this differently, so it is not obvious what the correct middle-end semantics is, even if we write our own SMT floating point implementation. ↩