About passing floating point parameters

In i386 Linux software, floating point numbers are passed on the stack and not in registers. This creates two issues:

  1. How does a decompiler recognize that a floating point number is passed?
  2. How does a decompiler express an individual call?

There are multiple sources for type recognition in general. For floating point numbers the relevant ones are:

  • if the value is in a floating point register (x87 ST*, XMM, SSE,…) it is with high probability a floating point value. This source is not available when stack memory is used.
  • caller: if the caller writes the memory from a known floating point value (like from one of the registers) one can assume it is a floating point value
  • usage: if the called function performs floating point operations (at least loading) with the value it is a floating point value
  • printf: if other information says that it is a floating point value. The most common source is printf format strings.

Let us take a look at these. We choose the C double type because it is 8 bytes long and therefore takes up two “stack slots”. This makes it easy to see whether the decompiler sees a function taking two ints or one taking a double parameter.

If the caller passes in a floating point literal, this literal is not distinguishable from two 4-byte integers. If the called function only looks at the raw bytes, we call this “bytes” in the table below.

The source code, executable and decompiler output are on GitHub.

The first case (unknown_to_unknown) is not distinguishable from passing around two 4-byte integers in the binary form of the program. Therefore the decompilers can’t recognize it. It is included as a baseline.

Caller    Usage in function   Function
literal   bytes               unknown_to_unknown
literal   double              unknown_to_double
double    bytes               double_to_unknown
double    double              double_to_double

And the decompiler reactions:

Function             reko         Ghidra      retdec
unknown_to_unknown   two uint32   two uint    two uint32_t
unknown_to_double    real64       double      two uint32_t
double_to_unknown    two uint32   two uint    float80_t and uint32_t
double_to_double     real64       double      float80_t

We see that reko and Ghidra only use the usage information: they look at the function body and not the caller. retdec, on the other hand, looks at the caller, but the actual types are wrong.

When we look at the calls for functions where the signature is correctly recognized, we see that reko passes one argument to unknown_to_double and double_to_double. Ghidra, on the other hand, passes two integers to unknown_to_double while double_to_double is OK. retdec passes one double to double_to_double, but in the other cases it doesn’t produce correct output.


When we look at the two printf calls the decompilers have to solve two issues:
  • parse the format string to get the types of the parameters (and then output the call according to these types)
  • support long double (“%Lf” in the format string) which is 10 bytes and takes three “stack slots”
None of the three decompilers tested gets this right. See my inline comments.
// ===== Original source
printf("unknown: int-a=%d double=%f int-b=%d long double=%Lf int-c=%d\n",
100, 2.31, 101, (long double)2.32, 102);

printf("double: int-a=%d double=%f int-b=%d long double=%Lf int-c=%d\n",
200, 2.41+argc, 201, (long double)(2.42+argc), 202);

// ===== reko
// - tLoc3C is not defined
// - doesn't realize that %Lf takes 3 stack slots
// - 2920577024 aka 0xae147800 is part of the long double
printf("unknown: int-a=%d double=%f int-b=%d long double=%Lf int-c=%d\n",
100, 2.31, 101, tLoc3C, 2920577024);

printf("double: int-a=%d double=%f int-b=%d long double=%Lf int-c=%d\n",
200, rLoc1_134 + g_r804A100, 0xC9, tLoc3C, (word32) (real80) (g_r804A0F8 + rLoc1_134));

// ===== ghidra
// - extra uVar3
// - int literals and not floating literals
dVar2 = (double)param_1 + 1.24;
uVar3 = (undefined4)((ulonglong)dVar2 >> 0x20);
printf("unknown: int-a=%d double=%f int-b=%d long double=%Lf int-c=%d\n",
100, 0x47ae147b, 0x40027ae1, 0x65,
0xae147800, 0x947ae147, 0, 0x66, uVar3);

// - doesn't know that the 3 stack slots belong together
fVar1 = (float10)2.42 + (float10)param_1;
printf("double: int-a=%d double=%f int-b=%d long double=%Lf int-c=%d\n",
200, (double)((float10)param_1 +(float10)2.41),
0xc9,
SUB104(fVar1,0),
(int)((unkuint10)fVar1 >> 0x20),
(char)((unkuint10)fVar1 >> 0x40),
0xca, uVar3);

// ===== retdec
// - How? Why? The only positive is that it gets
// the number of extra arguments correct
printf("unknown: int-a=%d double=%f int-b=%d long double=%Lf int-c=%d\n",
100, 5.9415882152956413e-315, 0x40027ae1, 3.68165152720129934855e-4949L, -1);

// - the double and long double part is good
// - the ints are wrong
printf("double: int-a=%d double=%f int-b=%d long double=%Lf int-c=%d\n",
200, (float64_t)(v1 + 2.41L), 0, v1 + 2.42L);
While floating point numbers have been with us for a long time (the 80387 is from 1987) and we are not doing anything fancy like SIMD, the decompilers are lacking in this regard: retdec is just strange/broken, Ghidra doesn’t support floating point literals and reko has its own issues.

Thank you for reading and please send questions or feedback via email to holdec@kronotai.com or contact me on Twitter.
