Modelling overlapping registers

I use the following instruction sequence as an example for the different modeling variants:
pop eax
inc ax
sahf # loads the content of ah into the eflags register
To avoid spelling out the AND and shift operations for the masking, I assume some pseudo functions like setLowByte(base, newByteValue).
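The pseudo functions could be sketched in Python as follows. This is only an illustration under assumptions: the post names setLowByte explicitly and uses getLowWord, getLowByte, getHighByte and setLowWord in the listings, but their exact definitions are my reading of those listings. In particular, getHighByte is taken to mean bits 8..15 (the position of ah), both for a 16-bit and a 32-bit argument.

```python
# Hypothetical definitions of the pseudo functions used in the listings.
MASK8 = 0xFF
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def getLowByte(value):
    """Bits 0..7, i.e. the part of eax/ax that al occupies."""
    return value & MASK8


def getHighByte(value):
    """Bits 8..15, i.e. the part of eax/ax that ah occupies (assumption)."""
    return (value >> 8) & MASK8


def getLowWord(value):
    """Bits 0..15, i.e. the part of eax that ax occupies."""
    return value & MASK16


def setLowByte(base, newByteValue):
    """Replace bits 0..7 of base, keep all other bits."""
    return (base & ~MASK8 & MASK32) | (newByteValue & MASK8)


def setLowWord(base, newWordValue):
    """Replace bits 0..15 of base, keep the upper 16 bits."""
    return (base & ~MASK16 & MASK32) | (newWordValue & MASK16)
```

The setters mask the new value, so a 16-bit overflow such as 0xFFFF + 1 wraps to 0 instead of leaking into the upper half of the base register.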

  • Treat each register independently. This leads to wrong semantics:
eax := pop()
ax := ax + 1
flags := inc_flags(ax)
flags := unpack_flags(ah)
So the final value of flags will be unpack_flags(initialValueOfAh), which is wrong because neither pop eax nor inc ax updates ah.
  • Use one register (the widest) to model all registers which are parts of it:
eax := pop()
eax := setLowWord(eax, getLowWord(eax) + 1)
flags := inc_flags(getLowWord(eax))
flags := unpack_flags(getHighByte(eax))
The final flags value is unpack_flags(getHighByte(setLowWord(poppedValue, getLowWord(poppedValue) + 1))). The decompiler could transform this to the desired unpack_flags(getHighByte(getLowWord(poppedValue) + 1)). However, if some parts of the widest register are not required for the computation (e.g. only ax is used), they may be undefined, and with this modeling of aliased registers an artificial and unnecessary reference to them remains. So pop ax; inc ax; sahf would yield:
eax := setLowWord(eax,pop())
eax := setLowWord(eax, getLowWord(eax) + 1)
flags := inc_flags(getLowWord(eax))
flags := unpack_flags(getHighByte(eax))
Here the initial value of eax is not required, but even after the transformation to SSA and the removal of unused registers the reference to it will stay.
  • Model each register as a separate register, but add “update assignments” that keep the overlapping registers in sync:
eax := pop()
ax := getLowWord(eax)
ah := getHighByte(eax)
al := getLowByte(eax)
ax := ax + 1
flags := inc_flags(ax)
eax := setLowWord(eax, ax)
ah := getHighByte(ax)
al := getLowByte(ax)
flags := unpack_flags(ah)
Here pure value propagation gives a final flags value of unpack_flags(getHighByte(getLowWord(poppedValue) + 1)). For pop ax; inc ax; sahf this variant results in:
ax := pop()
eax := setLowWord(eax, ax)
ah := getHighByte(eax)
al := getLowByte(eax)
ax := ax + 1
flags := inc_flags(ax)
eax := setLowWord(eax, ax)
ah := getHighByte(ax)
al := getLowByte(ax)
flags := unpack_flags(ah)
with a final flags value of unpack_flags(getHighByte(poppedValue + 1)), exactly the expected value. Note that the update assignments do not perform the operation (+1 in our case) again and do not change the state of the program in any other way. The only disadvantage is the larger number of assignments which have to be generated and transformed into SSA form, but the majority will be removed since their values are unused. Mike writes that the last variant is the best solution. Holdec also uses this solution.
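To see the difference between the three variants on a concrete value, here is a small Python sketch that executes the pop eax; inc ax; sahf listings directly. It is an illustration, not how Holdec works internally: the function names are hypothetical, the helper definitions are my assumption (getHighByte is taken as bits 8..15), and inc_flags and unpack_flags are kept symbolic as tagged tuples rather than computing real flag bits.

```python
# Hypothetical helpers; getHighByte is assumed to mean bits 8..15 (ah).
def getLowByte(v):
    return v & 0xFF

def getHighByte(v):
    return (v >> 8) & 0xFF

def getLowWord(v):
    return v & 0xFFFF

def setLowWord(base, word):
    return (base & 0xFFFF0000) | (word & 0xFFFF)


def variant_independent(popped, initial):
    """Variant 1: every register is independent, no aliasing at all."""
    r = dict(initial)
    r["eax"] = popped
    r["ax"] = (r["ax"] + 1) & 0xFFFF
    r["flags"] = ("inc_flags", r["ax"])
    r["flags"] = ("unpack_flags", r["ah"])  # still the stale initial ah
    return r["flags"]


def variant_widest(popped):
    """Variant 2: only eax exists, sub-registers are get/set expressions."""
    eax = popped
    eax = setLowWord(eax, getLowWord(eax) + 1)
    flags = ("inc_flags", getLowWord(eax))
    flags = ("unpack_flags", getHighByte(eax))
    return flags


def variant_update_assignments(popped, initial):
    """Variant 3: separate registers plus update assignments."""
    r = dict(initial)
    r["eax"] = popped
    r["ax"] = getLowWord(r["eax"])    # update assignments after pop eax
    r["ah"] = getHighByte(r["eax"])
    r["al"] = getLowByte(r["eax"])
    r["ax"] = (r["ax"] + 1) & 0xFFFF  # the actual inc ax
    r["flags"] = ("inc_flags", r["ax"])
    r["eax"] = setLowWord(r["eax"], r["ax"])  # update assignments after inc ax
    r["ah"] = getHighByte(r["ax"])
    r["al"] = getLowByte(r["ax"])
    r["flags"] = ("unpack_flags", r["ah"])
    return r["flags"]


# Arbitrary initial register contents, chosen so a stale ah is easy to spot.
initial = {"eax": 0, "ax": 0, "ah": 0x7F, "al": 0, "flags": None}
popped = 0x123400FF  # ax = 0x00FF, so inc ax carries into ah

print(variant_independent(popped, initial))
print(variant_widest(popped))
print(variant_update_assignments(popped, initial))
```

With this input, the first variant reports the stale initial ah (0x7F), while the second and third agree on ah = 0x01, which matches getHighByte(getLowWord(poppedValue) + 1).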
