Introducing the DMI, the Decompiler Maturity Index

This post introduces the Decompiler Maturity Index (DMI). A decompiler with a higher index is not automatically “better”, but it has more features.

Criteria

DMIC stands for DMI Criterion.

Basic

  • DMIC-A1: I’m able to build and run the decompiler
  • DMIC-A2: The decompiler is able to work on one subject (provided by the project if possible, otherwise chosen by me).
  • DMIC-A3: The project is active: There is a commit in the last 3 months.
  • DMIC-A4: The project has a history: There are at least 100 commits.
  • DMIC-A5: The decompiler is able to detect and output a simple loop.
  • DMIC-A6: The decompiler performs simple expression simplifications.
  • DMIC-A7: The decompiler produces some basic output for the hexdump subject (or a similarly complex subject).
  • DMIC-A8: The decompiler supports ia32 ELF. This is important since a lot of test subjects are in the ia32 ELF format.
  • DMIC-A9: The decompiler models all flag changes but also removes unused flag assignments.
  • DMIC-A10: The decompiler recognizes the number of arguments for a stack-based method call.
  • DMIC-A11: The decompiler propagates register values to other statements, both within the same block and into other blocks of the same function.
  • DMIC-A12: The decompiler understands indexed jumps using a jump table as generated by switch-case.
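
To make DMIC-A9 a bit more concrete, here is a minimal sketch of removing unused flag assignments inside a single basic block. The mini-IR (tuples of a destination and the set of locations it reads), the flag names and the function name are all assumptions made for illustration; they are not taken from any particular decompiler, and a real implementation would also have to track liveness across blocks.

```python
# Hypothetical mini-IR: each statement is (destination, set_of_sources).
# "ZF", "CF", ... stand for CPU flags; all names are illustrative only.
FLAGS = {"ZF", "CF", "SF", "OF"}


def remove_dead_flag_assignments(block, live_out=frozenset()):
    """Drop flag assignments that are never read before being redefined.

    `block` is a list of (dest, sources) tuples in execution order.
    `live_out` names the locations that are still needed after the block.
    Non-flag assignments are always kept in this sketch.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so that `live` holds everything read later on.
    for dest, sources in reversed(block):
        if dest in FLAGS and dest not in live:
            continue  # dead flag: nobody reads it before the next write
        live.discard(dest)    # this statement defines dest ...
        live.update(sources)  # ... and needs its sources
        kept.append((dest, sources))
    kept.reverse()
    return kept


# Roughly "add eax, ebx; dec ecx; jz ...": only the ZF written by dec is read.
example_block = [
    ("eax", {"eax", "ebx"}),
    ("ZF", {"eax"}),
    ("CF", {"eax"}),
    ("ecx", {"ecx"}),
    ("ZF", {"ecx"}),
    ("pc", {"ZF"}),
]
cleaned_block = remove_dead_flag_assignments(example_block)
```

In this example the two flag assignments that belong to the add are dropped, because ZF is overwritten before the jump reads it and CF is never read at all.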

Intermediate

  • DMIC-B1: The decompiler supports more than one CPU architecture. This ensures that the core is quite generic. i386 and AMD64 count here as one.
  • DMIC-B2: The decompiler supports more than one executable format.
  • DMIC-B3: The decompiler understands advanced i386 opcodes like string operations with rep prefix or the cpuid instruction.
  • DMIC-B4: The decompiler has a GUI.
  • DMIC-B5: The decompiler understands printf/scanf format strings and passes the correct number of arguments in the function call.
  • DMIC-B6: The decompiler outputs a simple counting-up loop as a for loop.
  • DMIC-B7: The decompiler detects and outputs short circuit boolean expressions (|| and &&).
  • DMIC-B8: The decompiler outputs string/int literals from the data segment when possible (for example, when the data is read-only).
  • DMIC-B9: The decompiler supports FPU operations and types.
  • DMIC-B10: The decompiler detects and removes dead blocks and jumps which cannot be taken.
  • DMIC-B11: The decompiler propagates memory values to other statements.
  • DMIC-B12: The decompiler is able to output the fields of a struct when these fields are used.
  • DMIC-B13: The decompiler is able to cope with code which uses casts or unions to interpret values in conflicting ways.
  • DMIC-B14: The decompiler detects local variables and replaces unstructured stack accesses with local variables.
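
As an illustration of what DMIC-B5 asks for, the following sketch counts how many variadic arguments a printf format string consumes. The function name and the regular expression are my own assumptions, not code from an existing decompiler. The sketch covers printf only (in scanf a `*` suppresses an assignment instead of consuming an argument) and ignores positional arguments such as `%1$d`.

```python
import re

# One printf conversion: flags, width, precision, length modifier, conversion.
# A "*" as width or precision consumes an additional int argument.
_CONVERSION = re.compile(
    r"%(?P<flags>[-+ #0]*)"
    r"(?P<width>\*|\d+)?"
    r"(?:\.(?P<precision>\*|\d*))?"
    r"(?P<length>hh|h|ll|l|j|z|t|L)?"
    r"(?P<conv>[diouxXeEfFgGaAcspn%])"
)


def count_printf_arguments(fmt):
    """Return the number of arguments after the format string itself."""
    count = 0
    for match in _CONVERSION.finditer(fmt):
        if match.group("conv") == "%":
            continue  # "%%" prints a literal percent sign
        count += 1
        if match.group("width") == "*":
            count += 1
        if match.group("precision") == "*":
            count += 1
    return count
```

A decompiler that sees a constant format string at a call site could use such a count to decide how many stack slots or registers belong to the call.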

Advanced

  • DMIC-C1: The decompiler understands at least one mechanism by which the subject calls the OS directly. Think Linux syscalls, MS-DOS interrupts or Amiga libraries.
  • DMIC-C2: The decompiler detects common library methods in statically compiled subjects. Something like FLIRT.
  • DMIC-C3: The decompiler knows the signatures of common libraries like libc and applies these signatures.
  • DMIC-C4: The decompiler knows some advanced expression transformations, for example recognizing a division that the compiler implemented as a multiplication.
  • DMIC-C5: The decompiler supports SIMD. At a minimum it models SIMD instructions as internal functions which capture their inputs and outputs. Alternatively, it decomposes them into their “real” semantics.
  • DMIC-C6: The decompiler detects the number of elements and the element size from a loop which iterates over an array.
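
The “division by multiplication” mentioned in DMIC-C4 can be shown with a small worked example. Optimizing compilers commonly replace an unsigned 32-bit division by 3 with a multiplication by the constant 0xAAAAAAAB followed by a right shift of 33 bits in total. The sketch below reproduces that arithmetic and shows one possible way to get back to the divisor; the function names are hypothetical and the recovery only handles this simple round-up form of the constant.

```python
MAGIC_DIV3 = 0xAAAAAAAB  # ceil(2**33 / 3)
SHIFT_DIV3 = 33


def div3_as_compiled(x):
    """Unsigned 32-bit x / 3 written as multiply and shift."""
    assert 0 <= x < 2**32
    return (x * MAGIC_DIV3) >> SHIFT_DIV3


def recover_divisor(magic, shift):
    """Guess the divisor d with magic == ceil(2**shift / d), or return None."""
    candidate = (2**shift + magic // 2) // magic  # 2**shift / magic, rounded
    if candidate > 0 and -(-(2**shift) // candidate) == magic:
        return candidate
    return None
```

With this, `recover_divisor(0xAAAAAAAB, 33)` yields 3, so a decompiler could print `x / 3` instead of the multiply and shift sequence.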

Of course, these criteria and categories are preliminary. Only once some decompiler projects have been examined will it become clear how well the DMI captures the state of a project. Some of the criteria also still lack focused test subjects. Stay tuned for the next blog post with some data.
