Decompilation

A decompiler has the following goals:

- provide a correct view of the executable
- produce source code that is similar to hand-written code (e.g. it has a similar structure and is compact)
- produce source code that is compilable

To achieve these goals, and thereby give the user an actual information gain, a decompiler has to solve the following main tasks:

separation of code and data: this should be as automatic as possible, but user input is still required
reliable function identification: determine the code ranges of all functions
understanding special idioms: this does not mean supporting compiler-specific idioms, but handling single assembler instructions or small groups of instructions that are not (easily) representable in a language like C. Examples are indexed jumps, the rep-prefixed instructions of the i386, SIMD instructions, or converting

ror    $0x8,%cx
ror    $0x10,%ecx
ror    $0x8,%cx

into a swab32(...) call.
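Why these three rotations are a byte swap can be checked mechanically. The following sketch emulates the instruction sequence literally in C (rotate the low 16 bits, rotate all 32 bits, rotate the low 16 bits again) and compares the result with a portable 32-bit byte swap. The helper names ror16, ror32 and emulate_ror_sequence are hypothetical, and swab32 here is only a plain C stand-in for the call the decompiler would like to emit, not the Linux kernel's implementation.

```c
#include <stdint.h>

/* Rotate a 16-bit value right by n bits (assumes 0 < n < 16). */
static uint16_t ror16(uint16_t v, unsigned n) {
    return (uint16_t)((v >> n) | (v << (16 - n)));
}

/* Rotate a 32-bit value right by n bits (assumes 0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n) {
    return (v >> n) | (v << (32 - n));
}

/* Literal emulation of:
     ror $0x8,%cx ; ror $0x10,%ecx ; ror $0x8,%cx
   Writing %cx only replaces the low 16 bits of %ecx. */
static uint32_t emulate_ror_sequence(uint32_t ecx) {
    ecx = (ecx & 0xFFFF0000u) | ror16((uint16_t)ecx, 8);  /* ror $0x8,%cx   */
    ecx = ror32(ecx, 16);                                 /* ror $0x10,%ecx */
    ecx = (ecx & 0xFFFF0000u) | ror16((uint16_t)ecx, 8);  /* ror $0x8,%cx   */
    return ecx;
}

/* Stand-in for the high-level call: reverse the byte order of x. */
static uint32_t swab32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

For 0x12345678 the first rotation gives 0x12347856, the 32-bit rotation gives 0x78561234, and the last rotation gives 0x78563412, which is exactly swab32(0x12345678). An idiom recognizer would match the instruction pattern itself rather than test values; the emulation only shows that the replacement preserves semantics.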

stack and function calls: these two depend on each other. Sub-problems are:
- identify saved registers
- identify how parameters are passed (on the caller side and on the callee side)
- construct the actual calls
- handle functions with multiple entries and exits
beautification/compactification: this part usually uses a control-flow and data-flow graph:
- value propagation
- simplification of expressions
- recognizing high-level control flow (if, if-else, loops)
- reorder statements
- reduce the amount of memory accesses
type analysis
output
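To illustrate the callee side of parameter identification, here is a minimal sketch under explicit assumptions: an i386 function with the standard frame set up by `push %ebp; mov %esp,%ebp`, 4-byte stack slots, and all parameters passed on the stack (as with cdecl). In such a frame 0(%ebp) holds the saved %ebp, 4(%ebp) the return address, 8(%ebp) the first parameter, and negative offsets address locals. The types and functions (SlotKind, Slot, classify_ebp_offset, count_stack_params) are hypothetical.

```c
/* Kinds of %ebp-relative stack slots in a standard i386 frame. */
typedef enum { SLOT_LOCAL, SLOT_SAVED_EBP, SLOT_RETURN_ADDR, SLOT_PARAM } SlotKind;

typedef struct {
    SlotKind kind;
    int index;  /* parameter or local number, counted from 0 */
} Slot;

/* Classify one %ebp-relative access, assuming 4-byte slots. */
static Slot classify_ebp_offset(int offset) {
    Slot s;
    if (offset < 0) {
        s.kind = SLOT_LOCAL;
        s.index = (-offset - 1) / 4;  /* -4 -> 0, -8 -> 1, ... */
    } else if (offset < 4) {
        s.kind = SLOT_SAVED_EBP;
        s.index = 0;
    } else if (offset < 8) {
        s.kind = SLOT_RETURN_ADDR;
        s.index = 0;
    } else {
        s.kind = SLOT_PARAM;
        s.index = (offset - 8) / 4;   /* 8 -> 0, 12 -> 1, ... */
    }
    return s;
}

/* Lower bound on the parameter count: highest parameter slot accessed + 1. */
static int count_stack_params(const int *offsets, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        Slot s = classify_ebp_offset(offsets[i]);
        if (s.kind == SLOT_PARAM && s.index + 1 > count)
            count = s.index + 1;
    }
    return count;
}
```

A callee that accesses -4(%ebp), 8(%ebp), 16(%ebp) and -12(%ebp) yields a count of 3. This is only a lower bound: a trailing parameter the callee never reads is invisible here, which is why the caller side matters too. With cdecl, for instance, the caller removes the arguments after the call (e.g. `add $0xc,%esp` for three 4-byte arguments), which can confirm or correct the callee-side estimate.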
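The first two beautification steps, value propagation and simplification of expressions, can be sketched on a tiny three-address form. This is a deliberately reduced example: the IR (Operand, Stmt) is hypothetical, propagation is limited to constants, only straight-line code is handled (no control-flow or data-flow graph, no memory), and only a few algebraic identities with a constant right operand are applied.

```c
#include <stdbool.h>

#define MAX_VARS 16

/* An operand is either a constant or a variable index (both in value). */
typedef struct { bool is_const; int value; } Operand;

/* dst = lhs op rhs, with op one of '+', '*'; op '=' means dst = lhs. */
typedef struct { int dst; char op; Operand lhs, rhs; } Stmt;

/* Replace a variable operand by its value if that value is a known constant. */
static Operand substitute(Operand o, const bool known[], const int val[]) {
    if (!o.is_const && known[o.value]) {
        o.is_const = true;
        o.value = val[o.value];
    }
    return o;
}

/* Constant propagation plus folding/simplification, in program order. */
static void propagate_and_fold(Stmt *s, int n) {
    bool known[MAX_VARS] = { false };
    int val[MAX_VARS] = { 0 };
    for (int i = 0; i < n; i++) {
        Stmt *st = &s[i];
        st->lhs = substitute(st->lhs, known, val);
        if (st->op != '=')
            st->rhs = substitute(st->rhs, known, val);
        if (st->op == '+' || st->op == '*') {
            if (st->lhs.is_const && st->rhs.is_const) {
                /* Both operands constant: fold into a constant assignment. */
                int r = st->op == '+' ? st->lhs.value + st->rhs.value
                                      : st->lhs.value * st->rhs.value;
                st->op = '=';
                st->lhs = (Operand){ true, r };
            } else if (st->rhs.is_const && st->rhs.value == (st->op == '+' ? 0 : 1)) {
                st->op = '=';                     /* x + 0 -> x, x * 1 -> x */
            } else if (st->op == '*' && st->rhs.is_const && st->rhs.value == 0) {
                st->op = '=';
                st->lhs = (Operand){ true, 0 };   /* x * 0 -> 0 */
            }
        }
        /* dst holds a known constant only if it was just assigned one. */
        known[st->dst] = (st->op == '=' && st->lhs.is_const);
        if (known[st->dst])
            val[st->dst] = st->lhs.value;
    }
}
```

For the sequence v0 = 4; v1 = v0 * 2; v2 = v3 + v1; v4 = v3 + 0; v5 = v3 * 0 (with v3 unknown), the pass rewrites it to v0 = 4; v1 = 8; v2 = v3 + 8; v4 = v3; v5 = 0. A real decompiler performs the same kind of rewriting on the data-flow graph across basic blocks, which is what makes the later steps (reordering statements, reducing memory accesses) possible.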

The resulting source code should compile, so that it can be worked on further with other tools such as IDEs. Almost every one of these problem areas is a big topic in itself, and there is literature on most of them in varying amounts.
