Decompilation

  A decompiler should:

  • provide a correct view of the executable
  • produce source code that resembles hand-written code (e.g. has a similar structure, is compact)
  • produce source code that is compilable

  To achieve these goals, a decompiler has to solve the following main tasks in order to provide an information gain to the user:
    • separation of code and data: this should be as automatic as possible but user input is still required

    • reliable function identification: determine the code ranges of all functions

    • understand special idioms: this does not mean supporting compiler-specific idioms, but rather single assembler instructions or small groups of instructions that are not (easily) representable in a language like C. This includes indexed jumps, the rep-prefixed instructions of i386, SIMD instructions, or converting

    ror    $0x8,%cx
    ror    $0x10,%ecx
    ror    $0x8,%cx

    into a swab32(...) call.

    • stack and function calls: both depend on each other. Sub-problems are:
      • identify saved registers
      • identify how parameters are passed (both at the call site and in the callee)
      • construct the actual calls
      • handle multiple entries and exits
    • beautification/compactification: this part usually works on a control-flow graph and a data-flow graph:
      • value propagation
      • simplification of expressions
      • recognizing high-level control flow (if, if-else, loops)
      • reordering statements
      • reducing the number of memory accesses
    • type analysis

    • output
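The rotate sequence in the idiom item above can be checked against a plain byte swap. The following is a minimal sketch in C, assuming a 32-bit register value. The helper names `ror16`, `ror32`, `rotate_idiom` and `check_idiom`, as well as this particular `swab32` definition, are illustrative and are not taken from any specific decompiler or header.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 16-bit value right by n bits (0 < n < 16). */
static uint16_t ror16(uint16_t v, unsigned n)
{
    return (uint16_t)((v >> n) | (v << (16 - n)));
}

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Models the three instructions; cx is the low 16 bits of ecx. */
static uint32_t rotate_idiom(uint32_t ecx)
{
    ecx = (ecx & 0xFFFF0000u) | ror16((uint16_t)ecx, 8);  /* ror $0x8,%cx   */
    ecx = ror32(ecx, 16);                                 /* ror $0x10,%ecx */
    ecx = (ecx & 0xFFFF0000u) | ror16((uint16_t)ecx, 8);  /* ror $0x8,%cx   */
    return ecx;
}

/* What a decompiler could emit instead: reverse the four bytes. */
static uint32_t swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Spot-check that both formulations agree on a few inputs. */
void check_idiom(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x11223344u, 0xDEADBEEFu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(rotate_idiom(samples[i]) == swab32(samples[i]));
}
```

A spot check like this only illustrates the equivalence. A real idiom matcher would presumably work on the instruction pattern or on a symbolic form of the register contents rather than on sample values.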

    The resulting source code should compile, so that it can be processed further with other tools such as IDEs. Almost every one of these problem areas is big enough in itself. There is literature on most of them, in varying amounts.
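To illustrate what the beautification step is aiming for, here is a hedged before/after sketch in C. Both functions are hypothetical and written by hand for this example: `sum_raw` imitates output that maps registers to variables one-to-one and uses goto-based control flow, while `sum_clean` shows what the same code might look like after value propagation, expression simplification and loop recovery. Neither is the output of an actual tool.

```c
#include <stddef.h>

/* Hypothetical raw output: one variable per register, gotos for control flow. */
int sum_raw(const int *a, size_t n)
{
    int eax;
    size_t ecx;
    const int *edx;

    eax = 0;
    ecx = 0;
    edx = a;
loop_head:
    if (ecx >= n)
        goto loop_exit;
    eax = eax + *(edx + ecx);
    ecx = ecx + 1;
    goto loop_head;
loop_exit:
    return eax;
}

/* The same logic after propagating edx = a and recovering the loop. */
int sum_clean(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Both versions compile and compute the same result, but only the second one resembles code a person would write.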
