Modelling overlapping registers

pop eax;inc ax;sahf (sahf loads the content of ah into the eflags register) as an example for the different modeling variants. To avoid the AND and shift operations for the masking, I assume some pseudo functions like setLowByte(base, newByteValue).

  • Treat each register independently. This leads to wrong semantics:
eax := pop()
ax := ax + 1
flags := inc_flags(ax)
flags := unpack_flags(ah)
So the final value of flags will be unpack_flags(initialValueOfAh). This is wrong, because neither the pop nor the increment ever reaches ah.
  • Use one register (the widest) to model the registers which are parts of it:
eax := pop()
eax := setLowWord(eax, getLowWord(eax) + 1)
flags := inc_flags(getLowWord(eax))
flags := unpack_flags(getHighByte(eax))
The final flags value is unpack_flags(getHighByte(setLowWord(poppedValue, getLowWord(poppedValue) + 1))). The decompiler could transform this into the desired unpack_flags(getHighByte(getLowWord(poppedValue) + 1)). However, if some parts of the widest register are not required for the computation (e.g. only ax is used), they may be undefined, and with this modeling of aliased registers an artificial and unnecessary reference to them remains. So pop ax;inc ax;sahf would yield:
eax := setLowWord(eax,pop())
eax := setLowWord(eax, getLowWord(eax) + 1)
flags := inc_flags(getLowWord(eax))
flags := unpack_flags(getHighByte(eax))
Here the initial value of eax is not required, but after the transformation to SSA and the removal of unused registers the reference will remain.
  • Each register is modeled as a separate register but there are “update assignments”.
eax := pop()
ax := getLowWord(eax)
ah := getHighByte(eax)
al := getLowByte(eax)
ax := ax + 1
flags := inc_flags(ax)
eax := setLowWord(eax, ax)
ah := getHighByte(ax)
al := getLowByte(ax)
flags := unpack_flags(ah)
Here pure value propagation yields a final flags value of unpack_flags(getHighByte(getLowWord(poppedValue) + 1)). For pop ax;inc ax;sahf this variant would result in:
ax := pop()
eax := setLowWord(eax, ax)
ah := getHighByte(eax)
al := getLowByte(eax)
ax := ax + 1
flags := inc_flags(ax)
eax := setLowWord(eax, ax)
ah := getHighByte(ax)
al := getLowByte(ax)
flags := unpack_flags(ah)
with a final flags value of unpack_flags(getHighByte(poppedValue + 1)), exactly the expected value. Note that the update assignments do not perform the operation (+1 in our case) again and do not change the state of the program in any other way. The only disadvantage is the larger number of assignments that have to be generated and transformed into SSA form, but the majority will be removed since their values are unused. Mike writes that the last variant is the best solution. Holdec also uses this solution.
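To make the difference between the first and the last variant concrete, here is a minimal Python sketch. The helper names mirror the pseudo functions above but are my own illustration, not holdec code. The flag unpacking itself is left out: each function just returns the byte that sahf would load, and inc_flags is omitted because its result is overwritten anyway.

```python
MASK32 = 0xFFFFFFFF

def get_low_word(value):
    """Bits 0..15, i.e. the ax part of eax."""
    return value & 0xFFFF

def set_low_word(base, new_word):
    """Replace bits 0..15 of base, keep the upper 16 bits."""
    return (base & 0xFFFF0000) | (new_word & 0xFFFF)

def get_high_byte(value):
    """Bits 8..15, i.e. the ah part of ax or eax."""
    return (value >> 8) & 0xFF

def get_low_byte(value):
    """Bits 0..7, i.e. the al part."""
    return value & 0xFF

def run_independent(popped, initial_ah):
    """Variant 1: every register is independent, so ah never changes."""
    regs = {"eax": 0, "ax": 0, "ah": initial_ah}
    regs["eax"] = popped & MASK32            # pop eax
    regs["ax"] = (regs["ax"] + 1) & 0xFFFF   # inc ax
    return regs["ah"]                        # byte read by sahf

def run_with_updates(popped):
    """Variant 3: separate registers plus update assignments."""
    eax = popped & MASK32                    # pop eax
    ax = get_low_word(eax)                   # update assignments
    ah = get_high_byte(eax)
    al = get_low_byte(eax)
    ax = (ax + 1) & 0xFFFF                   # inc ax
    eax = set_low_word(eax, ax)              # update assignments
    ah = get_high_byte(ax)
    al = get_low_byte(ax)
    return ah                                # byte read by sahf
```

With a popped value of 0x123400FF the increment carries into ah, so the update-assignment variant reports 0x01, while the independent variant still reports whatever ah held before.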

Posted in decompiler, holdec

Modulo arithmetic and two's complement

0. Right? Wrong. If x=-128 (as a signed 8-bit value) then the negation wraps around and we get -(-128)<0 => -128<0 => true, while for x>0 we get -128>0 => false. So -x<0 should be transformed to x>0 || x==-128. However, it is not clear whether this second form is easier to understand and should therefore be preferred.
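The counterexample can be checked exhaustively for one width. The following sketch assumes signed 8-bit two's complement values; to_int8 and neg8 are hypothetical helpers written only for this check.

```python
def to_int8(value):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    value &= 0xFF
    return value - 0x100 if value >= 0x80 else value

def neg8(x):
    """Two's complement negation with 8-bit wrap-around."""
    return to_int8(-x)

all_values = range(-128, 128)

# Values where the naive rewrite of -x<0 to x>0 gives a different answer.
mismatches = [x for x in all_values if (neg8(x) < 0) != (x > 0)]

# The corrected rewrite x>0 || x==-128 agrees for every value.
fixed_ok = all((neg8(x) < 0) == (x > 0 or x == -128) for x in all_values)
```

Here mismatches contains only -128, and fixed_ok is True. For other widths the special value would be the most negative number of that width.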

Posted in decompiler, holdec, holdec1.2

Version 1.1 is there

Posted in decompiler, holdec

A repository of decompiler subjects

git repository of programs to decompile. Currently these include test programs from some other decompilers I know and malware from the internet. Feel free to add some more programs.

Posted in Uncategorized

First public version of holdec

download it and there is also some initial documentation.

Posted in decompiler, holdec

Example added

example of a decompiled output. A comparison with other decompilers will come soon.

Posted in decompiler, holdec

About the handling of CPU flags

stack the flags are also a core area of a decompiler. The decompiler has to know which flags are affected by each assembler command, the correct flag value, and which flag combinations are tested by the conditional jump or set commands. Note that in the following example the cmp command affects all relevant flags, while the dec command affects some flags but not the carry flag, which is tested by the jump command jb. This means that the dec command has no effect on the control flow, and since its changed register value is also not used, it has no effect on the result value.

test:
        movl    $10, %eax
        cmpl    $10, %ebx
        dec     %ecx
        jb      .L1
        movl    $7, %eax
        jmp     .L2
.L1:
        movl    $42, %eax
.L2:
        ret
A correct decompiled version could be:
// addr = 080483a0.0
// signature= func(test, ret=[<0, int(undef, 4),,unknown>], para=[<0, int(undef, 4),p1,reg[ebx]>, <1, int(undef, 4),p2,reg[ecx]>], varargs=false)
??? test(???)
{
  return p1  <  10 ? 42 : 7;
}
A variant using ecx would be wrong.

Posted in decompiler, holdec

About the problems of stack tracking

%eax is preserved. However, the value is changed, and so the return value of the function (in %eax) is defined:

main:
        pushl   %eax
        movl    $42, (%esp)
        popl    %eax
        ret
A correct decompilation (as done by holdec) is
// addr = 080483a0.0
// signature= func(main, ret=[<0, int(undef, 4),null,reg[eax]>], para=[], varargs=false)
??? main(???)
{
  return 42;
}
While this tiny test program is a problem for some decompilers, the following slightly changed program (also returning 42) is not:
main:
        pushl   %ebx
        movl    $42, %eax
        popl    %ebx
        ret
Another test of the stack tracking is to assume that a parameter is passed in %ebx which should be returned (in %eax). This can be done directly:
main:
        movl    %ebx, %eax
        ret
or via the stack
main:
        pushl   %ebx
        popl    %eax
        ret
As expected, not all decompilers pass this small test. Holdec (if given the information about the parameter in %ebx) will decompile it to
// addr = 080483a0.0
// signature= func(main, ret=[<0, int(undef, 4),,unknown>], para=[<0, int(undef, 4),parameter1,reg[ebx]>], varargs=false)
??? main(???)
{
  return parameter1;
}
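The four small programs from this post can be replayed on a toy machine to see what a stack tracker has to figure out. The sketch below is a hypothetical mini interpreter, not how holdec works internally: unknown initial register values are plain strings, the return address is not modeled, and the operation names are invented for this example.

```python
def run(program, regs):
    """Execute a list of operation tuples and return the final registers."""
    regs = dict(regs)
    stack = []
    for op, *args in program:
        if op == "push":                 # pushl %reg
            stack.append(regs[args[0]])
        elif op == "pop":                # popl %reg
            regs[args[0]] = stack.pop()
        elif op == "mov_imm":            # movl $imm, %reg
            regs[args[1]] = args[0]
        elif op == "mov_imm_top":        # movl $imm, (%esp)
            stack[-1] = args[0]
        elif op == "mov":                # movl %src, %dst
            regs[args[1]] = regs[args[0]]
        elif op == "ret":
            break
        else:
            raise ValueError("unknown operation: " + op)
    return regs

initial = {"eax": "eax0", "ebx": "ebx0"}

overwrite_slot = [("push", "eax"), ("mov_imm_top", 42), ("pop", "eax"), ("ret",)]
save_restore = [("push", "ebx"), ("mov_imm", 42, "eax"), ("pop", "ebx"), ("ret",)]
direct = [("mov", "ebx", "eax"), ("ret",)]
via_stack = [("push", "ebx"), ("pop", "eax"), ("ret",)]

results = {
    "overwrite_slot": run(overwrite_slot, initial)["eax"],
    "save_restore": run(save_restore, initial)["eax"],
    "direct": run(direct, initial)["eax"],
    "via_stack": run(via_stack, initial)["eax"],
}
```

The first two programs end with 42 in eax and the last two end with the initial value of ebx, which matches the decompiled results shown above. A push/pop pair therefore only preserves a register if the stack slot is not written in between.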

Posted in decompiler, holdec