easier to understand to a programmer and uses the same semantics as the binary program. Both properties (don’t change the semantic and produce high level source code) are required. Without the first you get a nice looking program which doesn’t represent the starting binary. Without the later you may just look at a disassembler listing.
Modelling instructions
One important early step of preserving the semantics is the transformation from CPU instructions to an (CPU agnostic) internal representation. This can be done in two sub-steps (binary instructions to disassembled instructions and than disassembled instructions to internal representation) or in one sub-step. One advantage of using two sub-steps is that a 3rd party disassembler can be used which reduces development time. Regardless of the implementation the outcome is a list of internal statements for each binary instruction. For i386 such statements usually include: the actual operation, update of the flags and update of smaller and larger registers (inc %ax
will also update %al
, %ah
and %eax
).
The task of verifying that all the side effects of the binary instructions were correctly modeled is made more difficult by the fact that unused side effects are discarded by the decompiler. So the test design should capture all side effects. This is done by folding all side effects into the return value. Here I used two test function types: one for the register values and one for the flag values.
In the end a test methods looks like this:
080805b3 <inst_101_values_var_7>: // save registers 80805b3: push %ebx 80805b4: push %edx 80805b5: push %ebp 80805b6: push %esi 80805b7: push %edi 80805b8: push %gs // initialize all registers with random values 80805ba: mov $0xf2e8b1b3,%eax 80805bf: mov $0xf96ff545,%ebx 80805c4: mov $0xc8a6ed47,%ecx 80805c9: mov $0x4e0f4c05,%edx 80805ce: mov $0x50690c3e,%ebp 80805d3: mov $0xb7e7569c,%esi 80805d8: mov $0xb3138312,%edi // set all flags to a defined state 80805dd: neg %eax // the operation under test 80805df: inc %ax // merge all into %eax 80805e1: add %ebx,%eax 80805e3: add %ecx,%eax 80805e5: add %edx,%eax 80805e7: add %ebp,%eax 80805e9: add %esi,%eax 80805eb: add %edi,%eax // substract the expected value; the values comes from // the execution on a real CPU 80805ed: sub $0xd8a162cb,%eax // restore registers and return 80805f2: pop %gs 80805f4: pop %edi 80805f5: pop %esi 80805f6: pop %ebp 80805f7: pop %edx 80805f8: pop %ebx 80805f9: retThe variant which tests flags is similar:
080b7493 <inst_226_flags_var_0>: ... 80b74bd: neg %eax // the operation under test 80b74bf: sbb $0x20,%ah // %ebp = CF?1:0 80b74c2: mov $0x0,%ebp 80b74c7: mov $0x1,%eax 80b74cc: cmovb %eax,%ebp // %edi = OF?2:0 80b74cf: mov $0x0,%edi 80b74d4: mov $0x2,%eax 80b74d9: cmovo %eax,%edi // %esi = PF?4:0 80b74dc: mov $0x0,%esi 80b74e1: mov $0x4,%eax 80b74e6: cmovp %eax,%esi // %edx = SF?8:0 80b74e9: mov $0x0,%edx 80b74ee: mov $0x8,%eax 80b74f3: cmovs %eax,%edx // %ecx = AF?16:0 // there is no condition code for AF but with // the aaa instruction we can convert AF to CF 80b74f6: mov $0x0,%ecx 80b74fb: mov $0x0,%al 80b74fd: aaa 80b74fe: mov $0x10,%eax 80b7503: cmovb %eax,%ecx // add them together 80b7506: xor %eax,%eax 80b7508: add %ebp,%eax 80b750a: add %edi,%eax 80b750c: add %esi,%eax 80b750e: add %edx,%eax 80b7510: add %ecx,%eax // substract the expected value 80b7512: sub $0x6,%eax 80b7515: pop %gs ...
Reducing expressions
As you can see from the listing these tests assume that the decompiler evaluates/emulates/simplifies the instructions into a simplereturn 0
. I know from the past that some decompilers are stronger and some are weaker in their ability to reduce complex expressions to simpler ones. To avoid additional barriers I chose in the above tests only simple instructions and corresponding transformations.
In general a decompiler has to perform transformation to achieve the goal of high-level output. This can be as simple as x^x => 0 or to merge addition and shifts into multiplication (see for example this godbolt test). Normally at least one of the expressions is a variable (for example a function parameter with unknown value) while in the tests from above all values are fixed. However supporting the transformation of such expressions shouldn’t be difficult if the framework for expressions with a variable is already in place. And yes this means that the decompiler is in fact a CPU emulator.
Results
This is for the integer instruction of i386+. The ELF binary and the results are in in github. This binary test file contains two methods (one for the register values and for flag values) for each of the 290 instruction variants defined in the input file. So 580 functions in total.Ghidra (version 9.1.2)
For some reason ghidra thinks in 81% (469/580) that edx is an input register but doesn’t model it as a function parameter. Also odd is that the return type is 64bit and that the register is shifted 64bits to the left. But in general this looks like the expected “return 0”;longlong inst_0_values_var_0(void) { uint in_EDX; return (ulonglong)in_EDX << 0x20; }In around 100 cases the parity flag is not modeled:
undefined8 inst_0_flags_var_0(void) { undefined4 in_EDX; int iVar1; bool in_PF; iVar1 = 0; if (in_PF) { iVar1 = 4; } return CONCAT44(in_EDX,iVar1 + -4); }In the remaining cases the instruction is modeled with a loop and ghidra is unable to reduce this loop when all values are constant. The following case contains a bit scan forward (
bsf
):
undefined8 inst_30_values_var_0(void) { int iVar1; undefined4 in_EDX; iVar1 = 0; while ((0x25e0215eU >> iVar1 & 1) == 0) { iVar1 = iVar1 + 1; } return CONCAT44(in_EDX,iVar1 + -1); }
Retdec (current git master@a82f16e9)
In 97% (563/580) of the cases retdec returns 0 as expected. All other cases are non-zero concrete values.Reko (current git master@9ed122)
There are 133 flag value functions which return zero. It looks like they are from instructions which do not change flags at all. Reko is sadly not able to reduce basic expressions further:word32 inst_10_values_var_0() { return -2802897708 + 2802897708; }This makes it harder to assess the degree of correctness. Using grep I found 35 cases of the above with identical values. But other cases are also ok but can’t be easily detected (-0xD0AA51D7 == 794144297):
word32 inst_7_values_var_0() { return -794144297 - 0xD0AA51D7; }Reko doesn’t remove branches it detects as not taken (
if(false)
) which probably prevents a simplification:
word32 inst_7_flags_var_0() { word32 ebp_33 = 0x00; if (false) ebp_33 = 0x01; word32 edi_37 = 0x00; if (OVERFLOW(0x5F)) edi_37 = 0x02; word32 esi_42 = 0x00; if (!P) esi_42 = 0x04; word32 edx_48 = 0x00; if (false) edx_48 = 0x08; return ebp_33 + edi_37 + esi_42 + edx_48 - 0x07; }In the above case it is also unclear where P comes from.
Snowman (current git master@aba7af)
Snowman returns zero in 133 cases. Similar to reko snowman also doesn’t perform addition:int32_t inst_222_values_var_0() { return -0x616cb887 - 0x52aee46f + 0x1aec8e77 - 0xab5e0a0 - 0x1e414626 - 0x7a54e6ce - 0x588d02fe - 0x6af7e0ee; }Similar to reko branches not taken are not removed and make the output harder to understand:
int32_t inst_63_values_var_0() { int32_t ebp1; ebp1 = 0x3f43bd7f; if (0) { ebp1 = 0x223bf959; } return -0x4632d959 - 0x20980a29 + 0x4f46ab54 + 0x223bf959 + ebp1 - 0x1fbdb32d - 0x52def38c + 0x28a1280f; }It looks like some of the tested instructions are not supported by snowman:
int32_t inst_40_values_var_0() { __asm__("btc si, 0xcd"); return -0x6cc9a392 + 0x712f984b + 0x594b4859 + 0x4a23f663 - 0x1f526abe + 0x36d033e2 - 0x756e1d3d - 0x49e4bf5c; }and
int32_t inst_56_values_var_0() { int32_t ebx1; ebx1 = 0xd3699e30; if (__intrinsic()) { *reinterpret_cast<int16_t*>(&ebx1) = 0x3dc3; } return -0x9cea9113 + ebx1 + 0x58c3d6e6 + 0x4deac45f + 0x6398c301 - 0x43786f8a + 0x8993dc3 - 0x5e7399c; }
Hopper (4.5.29-demo)
I can’t export the output code to a file and therefore can only look at the output at random:- no simple arithmetic transformation similar to reko and snowman
- no value propagation (like from the eax assignment to the return statement)
- saving and restoring registers is visible
- it is unclear where the magic CPU_FLAGS comes from
- various instructions like rcl not implemented
- no masking of shift counts
int inst_20_values_var_0() { eax = -0x21ab6f3b + 0x6855b113 - 0x46aa41d8; gs = gs; return eax; } int inst_20_flags_var_0() { stack[-4] = ebx; stack[-8] = edx; stack[-12] = ebp; stack[-16] = esi; stack[-20] = edi; esp = esp - 0x18; stack[-24] = gs; esi = 0xfa629d75; ebp = 0x0; if (esi < 0x0) { ebp = 0x1; } edi = 0x0; if (CPU_FLAGS & O) { edi = 0x2; } edx = 0x0; if (CPU_FLAGS & S) { edx = 0x8; } esi = 0xfa629d75; eax = edx + esi + edi + ebp + 0x0 - 0x8; gs = stack[-24]; return eax; } int inst_187_values_var_0() { eax = (-0x35f0dfb3 << 0x4596d164) + 0xd4303404 - 0x752238d4; gs = gs; return eax; }
Holdec (unreleased version)
The current version return zero for all functions but it took 2 weeks to arrive at this point. Quite a few bugs we found and also some TODOs are now done. While it is known that C has undefined behavior it turns out that i386+ CPUs also have their share (mostly in the flags department). Also related to flags I learned that the Adjust Flag (AF) is set by some instructions but only usedaaa
, aas
, daa
and das
which work with packed and unpacked BCD values. These instructions are very rarely used and removed in x64. Or in other words: supporting AF correct is a rather low prio task.
The future
- Ensure that holdec correctly models the additional instructions like
popcnt
. - Create a new test for the instructions of x64 and test that holdec produces “return 0″s.
- Create a new test for i386 FPU instructions and test that holdec supports these correctly.
- Create new tests for AARCH64: integer instructions, integer and float vector instructions, …