Ghidra Decompiler - CLI guide

Ghidra has a decompiler that, unlike the rest of the program (which is written in Java), is written in C++. This caught my attention, so I started hacking on it. Unfortunately, little has been written about using the decompiler standalone, in the terminal, without the Ghidra GUI. This article tries to fill that void.

Building The Decompiler

Download the Ghidra release package from the project's GitHub releases page and unzip it:

$ unzip ghidra_11.1.2_PUBLIC_20240709.zip

cd into the decompiler's C++ source directory and build it:

$ cd ghidra_11.1.2_PUBLIC/Ghidra/Features/Decompiler/src/decompile/cpp
$ make decomp_opt -j $(nproc --all)

You should end up with an executable called decomp_opt.

Running the Decompiler

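While still inside that directory, export the SLEIGHHOME environment variable, pointing it at the root of the Ghidra installation so the decompiler can find the processor specification files it needs, then run the executable: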

$ export SLEIGHHOME=/home/shreeyash/ghidra_11.1.2_PUBLIC
$ ./decomp_opt
[decomp]>

The decompiler is now running and waiting for commands.

Note

Remember to export the environment variable every time before running decomp_opt. You might put the two commands into a small script to make life easier.

Decompile and view an ELF executable

Let’s start with a trivial C++ program with some control flow, compile it into an ELF executable, and decompile it.

Here’s the program; save and compile it:

$ cat a.cpp
#include <iostream>
#define THRESHOLD 20
int foo() {
  return 10;
}
int main() {
  int b = foo();
  std::cout << "The threshold is " << THRESHOLD << '\n';
  std::cout << "You returned " << b << '\n';
  if (b < THRESHOLD) {
    std::cout << "get in\n";
  } else {
    std::cout << "get out!\n";
  }
}
$ g++ -no-pie a.cpp -o a
$ ./a
The threshold is 20
You returned 10
get in

The executable is ready; all that’s left is decompilation.

Let’s start the decompiler, and load our file:

$ ./decomp_opt
[decomp]> load file a
a successfully loaded: Intel/AMD 64-bit x86

We’ve loaded our executable into the decompiler. C++ is an abstract language with constructs that mean nothing to a CPU, including, but not limited to, functions, structs, and loops. To implement them, the compiler has to translate these abstractions into concrete machine code, which manifests itself as control-flow instructions such as compare, branch, and jump.

If we peek into an executable, we’ll notice that what we called functions are now addresses, i.e. numbers that represent locations in memory. A function is run by jumping to its address (i.e. setting the program counter to it). Essentially, if we wish to decompile a function from the source, we have to find the address at which it resides. a.cpp has two functions, main and foo. To find the address of a function in the executable, we can use objdump:

$ objdump -C -D a
...
00000000004011c5 <main>:
4011c5:       f3 0f 1e fa             endbr64
4011c9:       55                      push   %rbp
4011ca:       48 89 e5                mov    %rsp,%rbp
4011cd:       48 83 ec 10             sub    $0x10,%rsp
4011d1:       e8 e0 ff ff ff          call   4011b6 <foo()>
4011d6:       89 45 fc                mov    %eax,-0x4(%rbp)
4011d9:       48 8d 05 24 0e 00 00    lea    0xe24(%rip),%rax        # 402004 <_IO_stdin_used+0x4>
4011e0:       48 89 c6                mov    %rax,%rsi
4011e3:       48 8d 05 96 2e 00 00    lea    0x2e96(%rip),%rax        # 404080 <std::cout@GLIBCXX_3.4>
4011ea:       48 89 c7                mov    %rax,%rdi
4011ed:       e8 9e fe ff ff          call   401090 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@plt>
4011f2:       48 89 c2                mov    %rax,%rdx
4011f5:       8b 45 fc                mov    -0x4(%rbp),%eax
...

Searching for ‘main’ reveals that it resides at address 0x4011c5.
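Because the executable is not stripped, the same address can also be read straight from the ELF symbol table, which is where objdump gets its labels. The sketch below uses a hypothetical helper, find_symbol (not part of Ghidra or binutils), and assumes a Linux system providing <elf.h> and a 64-bit ELF file with the same byte order as the host. Keep in mind that C++ names are mangled in the symbol table: main is not, but foo would have to be looked up as _Z3foov.

```cpp
#include <elf.h>

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Ghidra or binutils): look up a symbol's
// value in the .symtab of a 64-bit ELF file whose byte order matches the host.
// Returns std::nullopt if the file cannot be read, is not ELF64, has no
// SHT_SYMTAB section (e.g. it was stripped), or lacks the symbol.
std::optional<std::uint64_t> find_symbol(const std::string& path,
                                         std::string_view name) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return std::nullopt;
  const std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                              std::istreambuf_iterator<char>());

  // Copy a struct out of the file buffer at `off`, with bounds checking.
  auto read_at = [&buf](std::uint64_t off, auto& out) -> bool {
    if (off > buf.size() || buf.size() - off < sizeof out) return false;
    std::memcpy(&out, buf.data() + off, sizeof out);
    return true;
  };

  Elf64_Ehdr eh;
  if (!read_at(0, eh) || std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
      eh.e_ident[EI_CLASS] != ELFCLASS64)
    return std::nullopt;

  for (std::size_t i = 0; i < eh.e_shnum; ++i) {
    Elf64_Shdr symtab, strtab;
    if (!read_at(eh.e_shoff + i * sizeof(Elf64_Shdr), symtab)) return std::nullopt;
    if (symtab.sh_type != SHT_SYMTAB) continue;
    // sh_link holds the index of the string table containing the symbol names.
    if (!read_at(eh.e_shoff + symtab.sh_link * sizeof(Elf64_Shdr), strtab))
      return std::nullopt;

    for (std::uint64_t j = 0; j < symtab.sh_size / sizeof(Elf64_Sym); ++j) {
      Elf64_Sym sym;
      if (!read_at(symtab.sh_offset + j * sizeof(Elf64_Sym), sym)) break;
      const std::uint64_t name_off = strtab.sh_offset + sym.st_name;
      if (name_off >= buf.size()) continue;
      const char* s = buf.data() + name_off;
      // Find the terminating NUL without reading past the end of the buffer.
      const void* nul = std::memchr(s, '\0', buf.size() - name_off);
      if (nul == nullptr) continue;
      if (std::string_view(s, static_cast<const char*>(nul) - s) == name)
        return sym.st_value;
    }
  }
  return std::nullopt;
}
```

For the binary built above, find_symbol("a", "main") should agree with the 0x4011c5 that objdump reported; for a stripped binary, which has no .symtab section, it returns std::nullopt. Either way, the address is what we hand to the decompiler.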

[decomp]> load addr 0x4011c5 main
Function main: 0x004011c5

load addr takes an address and an optional label. A label is simply a name we assign to that address. Here we used ‘main’, but it could have been anything, for what it’s worth.

[decomp]> decompile
Decompiling main
Decompilation complete
[decomp]> print C

xunknown8 main(void)

{
  int4 iVar1;
  xunknown8 xVar2;

  iVar1 = func_0x004011b6();
  xVar2 = func_0x00401090(0x404080,0x402004);
  xVar2 = func_0x004010c0(xVar2,0x14);
  func_0x004010a0(xVar2,10);
  xVar2 = func_0x00401090(0x404080,0x402016);
  xVar2 = func_0x004010c0(xVar2,iVar1);
  func_0x004010a0(xVar2,10);
  if (iVar1 < 0x14) {
    func_0x00401090(0x404080,0x402024);
  }
  else {
    func_0x00401090(0x404080,0x40202c);
  }
  return 0;
}
[decomp]>

Just like that, we’ve decompiled our program. Notice that the original names are gone: variables and functions have been given generic names such as iVar1 and func_0x004011b6. This is because names (of variables and functions) are not necessary to execute a program.

Let’s analyze the decompiled output. The suffix of each generated function name is the function’s address, so we can look it up in the objdump output. Moreover, if we repeated the commands that got us main’s decompilation for every function called in the output, the decompilation of main would show the labels we assigned instead of raw addresses. Looking in the objdump output, we find that func_0x004011b6 is foo:

...
00000000004011b6 <foo()>:
4011b6:       f3 0f 1e fa             endbr64
4011ba:       55                      push   %rbp
4011bb:       48 89 e5                mov    %rsp,%rbp
4011be:       b8 0a 00 00 00          mov    $0xa,%eax
...
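Repeating load addr for every callee lets the decompiler print the names itself. As a cruder, purely text-level alternative, the placeholders can be rewritten after the fact, since each one ends in the callee’s address. The sketch below is a hypothetical helper, rename_placeholders, which assumes the func_0x<hex> naming seen above and an address-to-label map you fill in yourself (for example, from objdump).

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: rewrite decompiler placeholders such as
// "func_0x004011b6" using an address -> label map. Placeholders whose
// address is not in the map are left unchanged.
std::string rename_placeholders(
    const std::string& text,
    const std::map<std::uint64_t, std::string>& labels) {
  static const std::regex placeholder("func_0x([0-9a-fA-F]+)");
  std::string result;
  auto last = text.cbegin();
  for (std::sregex_iterator it(text.cbegin(), text.cend(), placeholder), end;
       it != end; ++it) {
    const std::smatch& m = *it;
    result.append(last, m[0].first);  // copy the text preceding the match
    // The hex digits after "func_0x" are the called function's address.
    const std::uint64_t addr = std::stoull(m[1].str(), nullptr, 16);
    const auto found = labels.find(addr);
    result += (found != labels.end()) ? found->second : m[0].str();
    last = m[0].second;
  }
  result.append(last, text.cend());  // copy the tail after the last match
  return result;
}
```

With a map of {0x4011b6, "foo"}, the line iVar1 = func_0x004011b6(); becomes iVar1 = foo();. Unlike labelling addresses inside the decompiler, this only changes the printed text and gives the decompiler no extra information.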

func_0x00401090 is not defined in the executable itself; however, objdump annotates the calls to it like this:

4011ed:       e8 9e fe ff ff          call   401090 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@plt>

It’s clear from this annotation that func_0x00401090 is operator<< overloaded to accept a std::basic_ostream object and a const char *. The @plt suffix indicates that the call goes through the .plt section of the executable. The .plt, which stands for Procedure Linkage Table, is a redirection table for external functions that live in shared objects. So func_0x00401090 is the operator<< found in libstdc++.so, which the program is linked against. It takes two arguments, both addresses of objects. A search reveals that the first argument is the object std::cout, which is defined in an external library (libstdc++.so), and the second is a string literal stored in the .rodata section of the executable.

$ objdump -s -j .rodata a
Contents of section .rodata:
402000 01000200 54686520 74687265 73686f6c  ....The threshol
402010 64206973 2000596f 75207265 7475726e  d is .You return
402020 65642000 67657420 696e0a00 67657420  ed .get in..get
402030 6f757421 0a00                        out!..

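Indeed, the string “The threshold is ” (including its trailing space) starts at address 0x402004. The other string arguments in the decompiled main point into the same section: 0x402016 holds “You returned ”, 0x402024 holds “get in\n”, and 0x40202c holds “get out!\n”.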
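Reading strings out of a hex dump by eye gets tedious. The sketch below performs the same lookup: cstring_at, a hypothetical helper, returns the NUL-terminated string at a virtual address, given a section’s bytes and the address the section starts at. The byte array is copied from the .rodata dump above, which starts at 0x402000.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the NUL-terminated string located at virtual
// address `addr` inside a section whose first byte lives at `base`.
std::string cstring_at(const std::vector<std::uint8_t>& section,
                       std::uint64_t base, std::uint64_t addr) {
  if (addr < base || addr - base >= section.size())
    throw std::out_of_range("address outside section");
  std::string s;
  // Copy bytes until the terminating NUL (or the end of the section).
  for (std::size_t i = addr - base; i < section.size() && section[i] != 0; ++i)
    s += static_cast<char>(section[i]);
  return s;
}

// The .rodata bytes from the objdump -s dump above, starting at 0x402000.
const std::vector<std::uint8_t> rodata = {
    0x01, 0x00, 0x02, 0x00, 0x54, 0x68, 0x65, 0x20,
    0x74, 0x68, 0x72, 0x65, 0x73, 0x68, 0x6f, 0x6c,
    0x64, 0x20, 0x69, 0x73, 0x20, 0x00, 0x59, 0x6f,
    0x75, 0x20, 0x72, 0x65, 0x74, 0x75, 0x72, 0x6e,
    0x65, 0x64, 0x20, 0x00, 0x67, 0x65, 0x74, 0x20,
    0x69, 0x6e, 0x0a, 0x00, 0x67, 0x65, 0x74, 0x20,
    0x6f, 0x75, 0x74, 0x21, 0x0a, 0x00};
```

cstring_at(rodata, 0x402000, 0x402004) yields "The threshold is ", the second argument of the first call to func_0x00401090.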

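Likewise, func_0x004010c0 and func_0x004010a0 are further overloads of operator<< that handle other types of data: judging by their arguments, the first inserts an int (0x14, i.e. THRESHOLD, and later iVar1) and the second the character '\n' (ASCII 10). What remains is the control flow: it checks whether iVar1, which is b in the original source, is less than 0x14 (THRESHOLD, i.e. 20) and calls the familiar func_0x00401090 (operator<<) with either “get in\n” or “get out!\n”.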
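To double-check this reading of the decompiled main, the calls can be written out by hand in C++ using explicit operator syntax. This is a hypothetical reconstruction, not output from Ghidra. It assumes func_0x00401090 is the free function operator<<(std::ostream&, const char*), func_0x004010c0 is the member std::ostream::operator<<(int) (it receives 0x14 and iVar1), and func_0x004010a0 is operator<<(std::ostream&, char) (it receives 10, the ASCII code of '\n'). A std::ostringstream stands in for std::cout at 0x404080 so that the output can be compared.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical hand-written "re-sugaring" of the decompiled main. `b` plays
// the role of iVar1 (foo's return value); `out` stands in for std::cout.
std::string rebuilt_main(int b) {
  std::ostringstream out;

  // xVar2 = func_0x00401090(0x404080,0x402004): operator<<(ostream&, const char*)
  std::ostream& x1 = std::operator<<(out, "The threshold is ");
  // xVar2 = func_0x004010c0(xVar2,0x14): member ostream::operator<<(int)
  std::ostream& x2 = x1.operator<<(0x14);
  // func_0x004010a0(xVar2,10): operator<<(ostream&, char), where 10 == '\n'
  std::operator<<(x2, static_cast<char>(10));

  // Same three-call pattern for "You returned " << b << '\n'.
  std::ostream& x3 = std::operator<<(out, "You returned ");
  std::ostream& x4 = x3.operator<<(b);
  std::operator<<(x4, static_cast<char>(10));

  // if (iVar1 < 0x14): both branches call func_0x00401090 with a string.
  if (b < 0x14) {
    std::operator<<(out, "get in\n");
  } else {
    std::operator<<(out, "get out!\n");
  }
  return out.str();
}
```

rebuilt_main(10) should produce the same three lines that ./a printed earlier, which supports the mapping of placeholders to overloads.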

Conclusion

Our work was made much easier by the fact that the executable was not ‘stripped’. Stripping is a process that removes all symbols that are not absolutely necessary for execution, which can noticeably reduce the executable’s size. In the real world, especially when dealing with proprietary software, executables may well be stripped. Unstripped executables let us move faster by simply searching for symbols, as we did to find main. Stripped executables require us to trace, find, and deduce what we need. In a later article, I may demo decompilation of stripped executables.