Testing decompiler semantics

easier to understand to a programmer and uses the same semantics as the binary program. Both properties (don’t change the semantic and produce high level source code) are required. Without the first you get a nice looking program which doesn’t represent the starting binary. Without the later you may just look at a disassembler listing.

Modelling instructions

One important early step of preserving the semantics is the transformation from CPU instructions to an (CPU agnostic) internal representation. This can be done in two sub-steps (binary instructions to disassembled instructions and than disassembled instructions to internal representation) or in one sub-step. One advantage of using two sub-steps is that a 3rd party disassembler can be used which reduces development time. Regardless of the implementation the outcome is a list of internal statements for each binary instruction. For i386 such statements usually include: the actual operation, update of the flags and update of smaller and larger registers (inc %ax will also update %al, %ah and %eax). The task of verifying that all the side effects of the binary instructions were correctly modeled is made more difficult by the fact that unused side effects are discarded by the decompiler. So the test design should capture all side effects. This is done by folding all side effects into the return value. Here I used two test function types: one for the register values and one for the flag values. In the end a test methods looks like this:
080805b3 <inst_101_values_var_7>:
// save registers
 80805b3:       push   %ebx
 80805b4:       push   %edx
 80805b5:       push   %ebp
 80805b6:       push   %esi
 80805b7:       push   %edi
 80805b8:       push   %gs
// initialize all registers with random values
 80805ba:       mov    $0xf2e8b1b3,%eax
 80805bf:       mov    $0xf96ff545,%ebx
 80805c4:       mov    $0xc8a6ed47,%ecx
 80805c9:       mov    $0x4e0f4c05,%edx
 80805ce:       mov    $0x50690c3e,%ebp
 80805d3:       mov    $0xb7e7569c,%esi
 80805d8:       mov    $0xb3138312,%edi
// set all flags to a defined state
 80805dd:       neg    %eax
// the operation under test
 80805df:       inc    %ax
// merge all into %eax
 80805e1:       add    %ebx,%eax
 80805e3:       add    %ecx,%eax
 80805e5:       add    %edx,%eax
 80805e7:       add    %ebp,%eax
 80805e9:       add    %esi,%eax
 80805eb:       add    %edi,%eax
// substract the expected value; the values comes from
// the execution on a real CPU
 80805ed:       sub    $0xd8a162cb,%eax
// restore registers and return
 80805f2:       pop    %gs
 80805f4:       pop    %edi
 80805f5:       pop    %esi
 80805f6:       pop    %ebp
 80805f7:       pop    %edx
 80805f8:       pop    %ebx
 80805f9:       ret
The variant which tests flags is similar:
080b7493 <inst_226_flags_var_0>:
...
 80b74bd:       neg    %eax
// the operation under test
 80b74bf:       sbb    $0x20,%ah
// %ebp = CF?1:0
 80b74c2:       mov    $0x0,%ebp
 80b74c7:       mov    $0x1,%eax
 80b74cc:       cmovb  %eax,%ebp
// %edi = OF?2:0
 80b74cf:       mov    $0x0,%edi
 80b74d4:       mov    $0x2,%eax
 80b74d9:       cmovo  %eax,%edi
// %esi = PF?4:0
 80b74dc:       mov    $0x0,%esi
 80b74e1:       mov    $0x4,%eax
 80b74e6:       cmovp  %eax,%esi
// %edx = SF?8:0
 80b74e9:       mov    $0x0,%edx
 80b74ee:       mov    $0x8,%eax
 80b74f3:       cmovs  %eax,%edx
// %ecx = AF?16:0
// there is no condition code for AF but with
// the aaa instruction we can convert AF to CF
 80b74f6:       mov    $0x0,%ecx
 80b74fb:       mov    $0x0,%al
 80b74fd:       aaa
 80b74fe:       mov    $0x10,%eax
 80b7503:       cmovb  %eax,%ecx
// add them together
 80b7506:       xor    %eax,%eax
 80b7508:       add    %ebp,%eax
 80b750a:       add    %edi,%eax
 80b750c:       add    %esi,%eax
 80b750e:       add    %edx,%eax
 80b7510:       add    %ecx,%eax
// substract the expected value
 80b7512:       sub    $0x6,%eax
 80b7515:       pop    %gs
...

Reducing expressions

As you can see from the listing these tests assume that the decompiler evaluates/emulates/simplifies the instructions into a simple return 0. I know from the past that some decompilers are stronger and some are weaker in their ability to reduce complex expressions to simpler ones. To avoid additional barriers I chose in the above tests only simple instructions and corresponding transformations. In general a decompiler has to perform transformation to achieve the goal of high-level output. This can be as simple as x^x => 0 or to merge addition and shifts into multiplication (see for example this godbolt test). Normally at least one of the expressions is a variable (for example a function parameter with unknown value) while in the tests from above all values are fixed. However supporting the transformation of such expressions shouldn’t be difficult if the framework for expressions with a variable is already in place. And yes this means that the decompiler is in fact a CPU emulator.

Results

This is for the integer instruction of i386+. The ELF binary and the results are in in github. This binary test file contains two methods (one for the register values and for flag values) for each of the 290 instruction variants defined in the input file. So 580 functions in total.

Ghidra (version 9.1.2)

For some reason ghidra thinks in 81% (469/580) that edx is an input register but doesn’t model it as a function parameter. Also odd is that the return type is 64bit and that the register is shifted 64bits to the left. But in general this looks like the expected “return 0”;
longlong inst_0_values_var_0(void)
{
  uint in_EDX;
  return (ulonglong)in_EDX << 0x20;
}
In around 100 cases the parity flag is not modeled:
undefined8 inst_0_flags_var_0(void)
{
  undefined4 in_EDX;
  int iVar1;
  bool in_PF;
  iVar1 = 0;
  if (in_PF) {
    iVar1 = 4;
  }
  return CONCAT44(in_EDX,iVar1 + -4);
}
In the remaining cases the instruction is modeled with a loop and ghidra is unable to reduce this loop when all values are constant. The following case contains a bit scan forward (bsf):
undefined8 inst_30_values_var_0(void)
{
  int iVar1;
  undefined4 in_EDX;
  iVar1 = 0;
  while ((0x25e0215eU >> iVar1 & 1) == 0) {
    iVar1 = iVar1 + 1;
  }
  return CONCAT44(in_EDX,iVar1 + -1);
}

Retdec (current git master@a82f16e9)

In 97% (563/580) of the cases retdec returns 0 as expected. All other cases are non-zero concrete values.

Reko (current git master@9ed122)

There are 133 flag value functions which return zero. It looks like they are from instructions which do not change flags at all. Reko is sadly not able to reduce basic expressions further:
word32 inst_10_values_var_0()
{
  return -2802897708 + 2802897708;
}
This makes it harder to assess the degree of correctness. Using grep I found 35 cases of the above with identical values. But other cases are also ok but can’t be easily detected (​-0xD0AA51D7 == 794144297):
word32 inst_7_values_var_0()
{
  return -794144297 - 0xD0AA51D7;
}
Reko doesn’t remove branches it detects as not taken (if(false)) which probably prevents a simplification:
word32 inst_7_flags_var_0()
{
  word32 ebp_33 = 0x00;
  if (false)
    ebp_33 = 0x01;
  word32 edi_37 = 0x00;
  if (OVERFLOW(0x5F))
    edi_37 = 0x02;
  word32 esi_42 = 0x00;
  if (!P)
    esi_42 = 0x04;
  word32 edx_48 = 0x00;
  if (false)
    edx_48 = 0x08;
  return ebp_33 + edi_37 + esi_42 + edx_48 - 0x07;
}
In the above case it is also unclear where P comes from.

Snowman (current git master@aba7af)

Snowman returns zero in 133 cases. Similar to reko snowman also doesn’t perform addition:
int32_t inst_222_values_var_0() {
  return -0x616cb887 - 0x52aee46f + 0x1aec8e77 - 0xab5e0a0 - 0x1e414626 - 0x7a54e6ce - 0x588d02fe - 0x6af7e0ee;
}
Similar to reko branches not taken are not removed and make the output harder to understand:
int32_t inst_63_values_var_0() {
  int32_t ebp1;
  ebp1 = 0x3f43bd7f;
  if (0) {
    ebp1 = 0x223bf959;
  }
  return -0x4632d959 - 0x20980a29 + 0x4f46ab54 + 0x223bf959 + ebp1 - 0x1fbdb32d - 0x52def38c + 0x28a1280f;
}
It looks like some of the tested instructions are not supported by snowman:
int32_t inst_40_values_var_0() {
  __asm__("btc si, 0xcd");
  return -0x6cc9a392 + 0x712f984b + 0x594b4859 + 0x4a23f663 - 0x1f526abe + 0x36d033e2 - 0x756e1d3d - 0x49e4bf5c;
}
and
int32_t inst_56_values_var_0() {
  int32_t ebx1;
  ebx1 = 0xd3699e30;
  if (__intrinsic()) {
    *reinterpret_cast<int16_t*>(&ebx1) = 0x3dc3;
  }
  return -0x9cea9113 + ebx1 + 0x58c3d6e6 + 0x4deac45f + 0x6398c301 - 0x43786f8a + 0x8993dc3 - 0x5e7399c;
}

Hopper (4.5.29-demo)

I can’t export the output code to a file and therefore can only look at the output at random:
  • no simple arithmetic transformation similar to reko and snowman
  • no value propagation (like from the eax assignment to the return statement)
  • saving and restoring registers is visible
  • it is unclear where the magic CPU_FLAGS comes from
  • various instructions like rcl not implemented
  • no masking of shift counts
Some sample functions:
int inst_20_values_var_0() {
    eax = -0x21ab6f3b + 0x6855b113 - 0x46aa41d8;
    gs = gs;
    return eax;
}
int inst_20_flags_var_0() {
    stack[-4] = ebx;
    stack[-8] = edx;
    stack[-12] = ebp;
    stack[-16] = esi;
    stack[-20] = edi;
    esp = esp - 0x18;
    stack[-24] = gs;
    esi = 0xfa629d75;
    ebp = 0x0;
    if (esi < 0x0) {
            ebp = 0x1;
    }
    edi = 0x0;
    if (CPU_FLAGS & O) {
            edi = 0x2;
    }
    edx = 0x0;
    if (CPU_FLAGS & S) {
            edx = 0x8;
    }
    esi = 0xfa629d75;
    eax = edx + esi + edi + ebp + 0x0 - 0x8;
    gs = stack[-24];
    return eax;
}
int inst_187_values_var_0() {
  eax = (-0x35f0dfb3 << 0x4596d164) + 0xd4303404 - 0x752238d4;
  gs = gs;
  return eax;
}

Holdec (unreleased version)

The current version return zero for all functions but it took 2 weeks to arrive at this point. Quite a few bugs we found and also some TODOs are now done. While it is known that C has undefined behavior it turns out that i386+ CPUs also have their share (mostly in the flags department). Also related to flags I learned that the Adjust Flag (AF) is set by some instructions but only used aaa, aas, daa and das which work with packed and unpacked BCD values. These instructions are very rarely used and removed in x64. Or in other words: supporting AF correct is a rather low prio task.

The future

  1. Ensure that holdec correctly models the additional instructions like popcnt.
  2. Create a new test for the instructions of x64 and test that holdec produces “return 0″s.
  3. Create a new test for i386 FPU instructions and test that holdec supports these correctly.
  4. Create new tests for AARCH64: integer instructions, integer and float vector instructions, …
]]>

This entry was posted in decompiler and tagged , , . Bookmark the permalink.