Difficulty: Beginner

Module 2: LLVM Compiler Architecture

Understanding the compilation pipeline that makes compiler-level function masking possible.

Module Objective

Learn how the LLVM compiler framework is structured, what intermediate representations it uses, how the X86 backend transforms IR into machine code, and specifically where FunctionPeekaboo’s X86RetModPass hooks into the pipeline to instrument functions at the PreEmit phase.

1. What Is LLVM?

LLVM (originally “Low Level Virtual Machine,” now just a name) is a modular compiler infrastructure used by Clang (C/C++), Rust, Swift, and many other languages. Its key design principle is a three-phase architecture:

LLVM Three-Phase Architecture

Frontend (Clang, rustc, swiftc): parses source → LLVM IR
Middle-End (optimizer): runs transform passes on IR
Backend (target-specific): lowers IR → machine code

The frontend (e.g., Clang for C/C++) parses source code into LLVM IR (Intermediate Representation). The middle-end runs optimization passes on this IR. The backend converts optimized IR into target-specific machine code (x86, ARM, RISC-V, etc.).
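To make the three phases concrete, here is a toy sketch. It is not how LLVM is implemented. It compiles space-separated sums such as `1 + 2 + a`: the frontend builds a tiny IR, a target-independent middle-end pass folds constants, and the backend emits x86-flavored pseudo-assembly. All type and function names (`IRInst`, `Module`, `frontend`, `constantFold`, `backend`) are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy IR instruction: dst = lhs + rhs. Operands are integer literals,
// source variable names, or names of earlier IR values (e.g. "%t1").
struct IRInst {
    std::string dst, lhs, rhs;
};

struct Module {
    std::vector<IRInst> insts;
    std::string result;  // operand holding the final value
};

// Phase 1 (frontend): parse "x + y + ..." (space-separated) into a chain of adds.
Module frontend(const std::string& src) {
    std::vector<std::string> terms;
    std::istringstream in(src);
    std::string tok;
    while (in >> tok)
        if (tok != "+") terms.push_back(tok);
    Module m;
    std::string acc = terms.at(0);
    for (std::size_t i = 1; i < terms.size(); ++i) {
        std::string dst = "%t" + std::to_string(i);
        m.insts.push_back({dst, acc, terms[i]});
        acc = dst;
    }
    m.result = acc;
    return m;
}

bool isNumber(const std::string& s) {
    if (s.empty()) return false;
    for (char c : s)
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    return true;
}

// Phase 2 (middle-end): a target-independent pass that folds adds whose
// operands are both constants and rewrites later uses of folded values.
void constantFold(Module& m) {
    std::map<std::string, std::string> known;  // IR value -> constant text
    auto resolve = [&known](const std::string& v) {
        auto it = known.find(v);
        return it == known.end() ? v : it->second;
    };
    std::vector<IRInst> kept;
    for (IRInst inst : m.insts) {
        inst.lhs = resolve(inst.lhs);
        inst.rhs = resolve(inst.rhs);
        if (isNumber(inst.lhs) && isNumber(inst.rhs))
            known[inst.dst] = std::to_string(std::stol(inst.lhs) + std::stol(inst.rhs));
        else
            kept.push_back(inst);
    }
    m.insts = kept;
    m.result = resolve(m.result);
}

// Phase 3 (backend): emit x86-flavored pseudo-assembly. Because the IR is a
// chain, every kept add after the first uses the previous result, held in eax.
std::vector<std::string> backend(const Module& m) {
    std::vector<std::string> out;
    if (m.insts.empty()) {
        out.push_back("mov eax, " + m.result);
    } else {
        out.push_back("mov eax, " + m.insts.front().lhs);
        for (const IRInst& inst : m.insts)
            out.push_back("add eax, " + inst.rhs);
    }
    out.push_back("ret");
    return out;
}
```

For `1 + 2 + a`, the frontend produces two adds, the middle-end folds the first into the constant 3, and the backend emits `mov eax, 3`, `add eax, a`, `ret`. Only the backend mentions a machine register such as `eax`; the earlier phases stay target-independent, which mirrors the division of labor described above.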

FunctionPeekaboo operates in the backend, specifically in the X86 backend. This is critical because by the late stages of the backend, all high-level language features have been lowered to concrete machine instructions, so FunctionPeekaboo can inject exact assembly sequences.

2. LLVM Intermediate Representation (IR)

LLVM IR is a typed, SSA-form (Static Single Assignment) intermediate language. It looks like a cross between assembly and a high-level language:

LLVM IR
; A simple function that adds two integers
define i32 @add(i32 %a, i32 %b) {
entry:
  %result = add i32 %a, %b
  ret i32 %result
}

; A function with a conditional branch
define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b
  br i1 %cmp, label %then, label %else

then:
  ret i32 %a

else:
  ret i32 %b
}
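Both functions above define each value (`%result`, `%cmp`) exactly once; that is the SSA property. Below is a minimal sketch of how straight-line code that reassigns a variable can be renamed into SSA form by giving every definition a fresh version. The `Assign` struct and `toSSA` function are hypothetical. The sketch ignores branches, where LLVM uses `phi` instructions to merge values arriving from different predecessor blocks.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One straight-line statement: dst = lhs <op> rhs.
struct Assign {
    std::string dst, op, lhs, rhs;
};

// Rename so that every variable is defined exactly once (SSA form).
// Each definition gets a fresh "name.N" version; each use refers to the
// most recent version. Names never assigned (function inputs) are unchanged.
// Straight-line code only: merging values across branches needs phi nodes.
std::vector<Assign> toSSA(const std::vector<Assign>& code) {
    std::map<std::string, int> version;  // variable -> latest version number
    auto use = [&version](const std::string& v) {
        auto it = version.find(v);
        return it == version.end() ? v : v + "." + std::to_string(it->second);
    };
    std::vector<Assign> out;
    for (const Assign& a : code) {
        std::string lhs = use(a.lhs);  // rename uses before the new definition,
        std::string rhs = use(a.rhs);  // so "x = x + c" reads the old x
        int n = ++version[a.dst];
        out.push_back({a.dst + "." + std::to_string(n), a.op, lhs, rhs});
    }
    return out;
}
```

For `x = a + b; x = x + c; y = x * x`, the result is `x.1 = a + b; x.2 = x.1 + c; y.1 = x.2 * x.2`: each name is assigned once, so every use points to exactly one definition.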

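Key properties of LLVM IR that matter for FunctionPeekaboo: it is largely target-independent, it uses an unlimited number of SSA values (often described as virtual registers) rather than physical registers such as RAX, and it has no stack frame layout yet.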

Why Not Modify IR?

FunctionPeekaboo could theoretically inject masking logic at the IR level, but this would be problematic. At the IR level there are no concrete machine instructions yet, so the injected code would have to survive optimization, lowering, instruction selection, register allocation, and scheduling intact. By modifying the code at the backend level, after these phases, FunctionPeekaboo injects exact x86 machine instructions that go directly into the output binary.

3. The X86 Backend Pipeline

The LLVM X86 backend converts IR into x86 machine code through a series of passes. Each pass transforms the code further toward final machine code:

X86 Backend Pass Pipeline (Simplified)

Phase | Pass Category | What Happens
1 | Instruction Selection (ISel) | IR instructions are matched to x86 machine instructions using pattern matching (SelectionDAG or GlobalISel)
2 | Machine IR (MIR) Optimization | Machine instructions are optimized: peephole optimizations, dead code elimination, instruction combining
3 | Register Allocation | Virtual registers are mapped to physical x86 registers (RAX, RCX, etc.); values are spilled to the stack when registers run out
4 | Prologue/Epilogue Insertion | Stack frame setup/teardown code is added (push rbp, sub rsp, etc.)
5 | Post-RA Optimization | Further optimization after register allocation (machine copy propagation, post-RA scheduling, branch folding, block placement)
6 | PreEmit | Final passes before code emission; this is where X86RetModPass runs
7 | Code Emission | Machine instructions are lowered to MCInst by the AsmPrinter, encoded by the MC layer, and written to the object file

The PreEmit phase is one of the last points at which the machine code can be modified before emission. By this point, register allocation is done, frame setup is in place, and the instructions are in essentially their final form. This makes it the ideal insertion point for FunctionPeekaboo’s stubs.
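The ordering argument above can be modeled with a toy pipeline. It assumes hypothetical `ToyFunction` and `ToyPass` types rather than LLVM's pass manager. A stand-in for X86RetModPass (`lateInstrument`) refuses to run until register allocation and frame setup have happened. Placed just before emission, it succeeds; placed right after ISel, it fails.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-function state tracked through a toy backend.
struct ToyFunction {
    bool selected = false;      // ISel done: IR -> machine instructions
    bool physRegs = false;      // register allocation done
    bool frame = false;         // prologue/epilogue inserted
    bool instrumented = false;  // late instrumentation pass ran
};

struct ToyPass {
    std::string name;
    std::function<void(ToyFunction&)> run;
};

// Stand-in for a late instrumentation pass: it needs physical registers and
// the final frame layout, so it refuses to run any earlier.
void lateInstrument(ToyFunction& f) {
    if (!f.physRegs || !f.frame)
        throw std::logic_error("needs physical registers and final frame layout");
    f.instrumented = true;
}

// Builds the toy pipeline with the instrumentation pass either just before
// emission (preEmit = true) or, for comparison, right after ISel.
std::vector<ToyPass> makePipeline(bool preEmit) {
    std::vector<ToyPass> p = {
        {"ISel", [](ToyFunction& f) { f.selected = true; }},
        {"RegAlloc", [](ToyFunction& f) { f.physRegs = true; }},
        {"PrologEpilog", [](ToyFunction& f) { f.frame = true; }},
        {"CodeEmission", [](ToyFunction&) {}},
    };
    ToyPass inst{"LateInstrument", lateInstrument};
    auto pos = preEmit ? p.end() - 1 : p.begin() + 1;
    p.insert(pos, inst);
    return p;
}

// Runs every pass in order; returns false if a pass rejected its position.
bool runPipeline(const std::vector<ToyPass>& passes, ToyFunction& f) {
    try {
        for (const ToyPass& pass : passes) pass.run(f);
    } catch (const std::logic_error&) {
        return false;
    }
    return true;
}
```

The design choice is the point: the later the pass runs, the more facts (register assignments, frame layout) it can rely on without re-deriving them.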

4. MachineFunction and MachineFunctionPass

In the LLVM backend, each function is represented as a MachineFunction object. A MachineFunctionPass is a pass that operates on one MachineFunction at a time — it receives each function, can inspect and modify its machine instructions, and returns whether it changed anything.

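C++
// Simplified MachineFunctionPass structure. The helper bodies
// (shouldInstrument, addPrologueStub, replaceReturns) are not shown;
// their declarations below are illustrative.
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"

using namespace llvm;

class X86RetModPass : public MachineFunctionPass {
public:
  static char ID;
  X86RetModPass() : MachineFunctionPass(ID) {}

  bool runOnMachineFunction(MachineFunction &MF) override {
    // This method is called once per function
    // MF contains all MachineBasicBlocks
    // Each MBB contains MachineInstr objects

    // Check if this function should be instrumented
    if (!shouldInstrument(MF))
      return false;  // no changes made

    // Instrument the function
    addPrologueStub(MF);
    replaceReturns(MF);
    return true;  // function was modified
  }

private:
  bool shouldInstrument(const MachineFunction &MF) const;
  void addPrologueStub(MachineFunction &MF);
  void replaceReturns(MachineFunction &MF);
};

char X86RetModPass::ID = 0;  // the address of ID identifies the pass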

The MachineFunction contains MachineBasicBlock objects, which in turn contain MachineInstr objects. FunctionPeekaboo’s X86RetModPass iterates through these to find all RET instructions and replace them with epilogue stubs, and to prepend prologue stubs to the function entry.
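A toy model of the MachineFunction → MachineBasicBlock → MachineInstr hierarchy shows the traversal. The structs below are hypothetical stand-ins, not LLVM classes, and `PROLOGUE_STUB`/`EPILOGUE_STUB` are placeholder opcodes rather than real stub code. In LLVM itself, the equivalent loop would walk `for (MachineBasicBlock &MBB : MF)` and `for (MachineInstr &MI : MBB)`, test `MI.isReturn()`, and create new instructions with `BuildMI`.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for MachineInstr, MachineBasicBlock and MachineFunction.
struct ToyInstr { std::string opcode; };
struct ToyBlock { std::vector<ToyInstr> instrs; };
struct ToyMachineFunction {
    std::string name;
    std::vector<ToyBlock> blocks;  // blocks[0] is the entry block
};

ToyBlock makeBlock(std::initializer_list<const char*> opcodes) {
    ToyBlock bb;
    for (const char* op : opcodes) bb.instrs.push_back(ToyInstr{op});
    return bb;
}

// Stand-in for MachineInstr::isReturn(): any opcode starting with "RET".
bool isReturn(const ToyInstr& mi) { return mi.opcode.rfind("RET", 0) == 0; }

int countReturns(const ToyMachineFunction& mf) {
    int n = 0;
    for (const ToyBlock& bb : mf.blocks)
        for (const ToyInstr& mi : bb.instrs)
            if (isReturn(mi)) ++n;
    return n;
}

// Mirrors the runOnMachineFunction contract (true = modified): puts a
// placeholder at the start of the entry block and one before every return.
// A function can have several returns, one per exit block.
bool instrument(ToyMachineFunction& mf) {
    if (mf.blocks.empty()) return false;
    for (ToyBlock& bb : mf.blocks) {
        std::vector<ToyInstr> out;
        for (const ToyInstr& mi : bb.instrs) {
            if (isReturn(mi)) out.push_back(ToyInstr{"EPILOGUE_STUB"});
            out.push_back(mi);
        }
        bb.instrs = std::move(out);
    }
    std::vector<ToyInstr>& entry = mf.blocks.front().instrs;
    entry.insert(entry.begin(), ToyInstr{"PROLOGUE_STUB"});
    return true;
}
```

Running `instrument` on a function with a conditional branch and two return blocks yields one prologue placeholder and two epilogue placeholders, because every exit path needs its own epilogue.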

5. Machine Instructions at the PreEmit Stage

At the PreEmit stage, instructions look like concrete x86 machine instructions, but they are still represented as MachineInstr objects (not yet serialized to bytes). For example:

MIR (Machine IR)
; A simple function at the PreEmit stage (simplified x86-64 System V example;
; exact opcode names and operand flags vary by LLVM version)
bb.0.entry:
  liveins: $edi, $esi
  $eax = LEA64_32r $rdi, 1, $rsi, 0, $noreg  ; eax = edi + esi
  RET64 $eax                                 ; return eax

; After X86RetModPass, the RET is replaced:
bb.0.entry:
  liveins: $edi, $esi
  ; ... prologue stub (inline bytes) ...
  $eax = LEA64_32r $rdi, 1, $rsi, 0, $noreg
  ; ... epilogue stub replaces the RET ...
  CALL64pcrel32 @handler       ; call handler to re-encrypt (the handler must
                               ; preserve $eax, which holds the return value)
  RET64 $eax                   ; then return

Key Advantage of PreEmit

At PreEmit, register allocation is complete, so FunctionPeekaboo knows exactly which physical registers are in use. The prologue stub can safely use registers that are known to be free (or save/restore them on the stack). The epilogue stub similarly knows the register state at each return point. This level of precision is only available in the backend.
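One way to make "registers known to be free" precise uses the calling convention. A register the ABI marks caller-saved (volatile) may be overwritten without saving it, as long as it holds no value that is still needed at that point. Below is a minimal sketch of that set computation, with register names passed in as data. The `clobberable` helper is hypothetical. A real pass would take the live set from LLVM's liveness utilities, such as `LivePhysRegs`, rather than from a hand-written list.

```cpp
#include <cassert>
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Registers a stub may overwrite without saving them first: those the
// calling convention marks caller-saved (volatile) that hold no value still
// needed here. At function entry the live set is typically the argument
// registers; just before a return it is the return-value register.
// Every other register must be saved and restored around the stub.
RegSet clobberable(const RegSet& callerSaved, const RegSet& live) {
    RegSet result;
    for (const std::string& r : callerSaved)
        if (live.count(r) == 0) result.insert(r);
    return result;
}
```

Under the Windows x64 convention (volatile general-purpose registers RAX, RCX, RDX, R8–R11; integer arguments in RCX, RDX, R8, R9; return value in RAX), the free set at entry is {RAX, R10, R11}, and just before a return it is every volatile register except RAX.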

6. How FunctionPeekaboo Integrates

FunctionPeekaboo adds a new MachineFunctionPass called X86RetModPass to the X86 backend’s pass pipeline. The integration touches three files in the LLVM source: two existing files are modified and one new file is added.

LLVM Modification Points

File | Change
lib/Target/X86/X86TargetMachine.cpp | Register X86RetModPass in the target pass pipeline at the PreEmit stage
lib/Target/X86/X86RetModPass.cpp | The new pass implementation (function detection, stub injection, metadata generation)
lib/Target/X86/CMakeLists.txt | Add the new source file to the build

The pass registration in X86TargetMachine.cpp places it at the PreEmit position:

C++
// In X86TargetMachine.cpp - addPreEmitPass()
// (assumes X86RetModPass is declared in a header included by this file)
void X86PassConfig::addPreEmitPass() {
  // ... existing passes ...
  addPass(new X86RetModPass());  // FunctionPeekaboo's pass
}

7. Function Attributes for Registration

Not every function should be instrumented, only functions explicitly marked by the developer. FunctionPeekaboo uses Clang's annotate function attribute to identify which functions to instrument:

C
// In the implant source code, mark functions for masking:
__attribute__((annotate("peekaboo")))
void beacon_checkin() {
    // This function will be self-masking
    // The attribute tells X86RetModPass to instrument it
}

// Unmarked functions are left alone:
void helper_function() {
    // This function will NOT be instrumented
    // It stays as normal code
}

The X86RetModPass checks each function for this attribute. If present, the function is registered for instrumentation: its address and size are recorded in the .funcmeta section, and prologue/epilogue stubs are injected.
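At the IR level, Clang does not turn `annotate` into an attribute on the function itself. Instead, it records each annotated function in the module-level `llvm.global.annotations` array, together with the annotation string, source file, and line. A backend pass can reach that array through `MF.getFunction().getParent()`. How FunctionPeekaboo performs this lookup is not shown here. The sketch below therefore models the table as plain (function, annotation) string pairs and uses a hypothetical `FuncMeta` record in place of a .funcmeta entry, whose real layout is not specified.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of llvm.global.annotations: (function name, annotation string).
using AnnotationTable = std::vector<std::pair<std::string, std::string>>;

// Hypothetical metadata record; the real .funcmeta layout is not shown here.
struct FuncMeta {
    std::string name;
    std::uint32_t sizeBytes;
};

// True if `fn` carries the given annotation string.
bool shouldInstrument(const std::string& fn, const AnnotationTable& table,
                      const std::string& tag = "peekaboo") {
    for (const auto& entry : table)
        if (entry.first == fn && entry.second == tag) return true;
    return false;
}

// Given each function's emitted size, record metadata only for tagged
// functions; unannotated functions are skipped and left untouched.
std::vector<FuncMeta> collectMeta(const std::map<std::string, std::uint32_t>& sizes,
                                  const AnnotationTable& table) {
    std::vector<FuncMeta> meta;
    for (const auto& entry : sizes)
        if (shouldInstrument(entry.first, table))
            meta.push_back(FuncMeta{entry.first, entry.second});
    return meta;
}
```

With `beacon_checkin` annotated and `helper_function` not, only `beacon_checkin` gets a metadata record, which matches the selection behavior described above.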

Registration Granularity

The developer has full control over which functions are masked. Functions that are called extremely frequently (for example, from hot loops) might be left unmasked for performance, since every call pays for the prologue and epilogue stubs. Functions containing sensitive logic (C2 communication, credential handling, lateral movement) should be masked. The attribute-based approach lets the developer make this trade-off per function.

8. Building LLVM with FunctionPeekaboo

To use FunctionPeekaboo, you must build a custom LLVM/Clang toolchain with the patch applied. The typical workflow is:

Bash
# 1. Clone the LLVM project
git clone https://github.com/llvm/llvm-project.git
cd llvm-project

# 2. Apply FunctionPeekaboo patches
#    (copies X86RetModPass.cpp, modifies X86TargetMachine.cpp, etc.)
git apply functionpeekaboo.patch

# 3. Build LLVM + Clang with X86 backend
cmake -S llvm -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_TARGETS_TO_BUILD="X86"

ninja -C build

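# 4. The resulting clang binary supports FunctionPeekaboo
# Use it to compile your implant (linking for the MSVC target also needs a
# Windows linker such as link.exe or lld-link and the Windows SDK libraries):
build/bin/clang -target x86_64-pc-windows-msvc \
  -O2 implant.c -o implant.exe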

# 5. Post-process the PE to set the entry point
python3 modifyEP.py implant.exe

Build Time Consideration

Building LLVM and Clang from source typically takes 30–60 minutes on modern multi-core hardware with sufficient RAM (16 GB+ recommended). This is a one-time cost: once the toolchain is built, recompiling the implant is fast. Apart from the modifyEP.py post-processing script, the custom Clang binary is the only tool needed, and no runtime dependencies are added to the compiled binary.

9. The Compilation Flow with FunctionPeekaboo

Here is the complete compilation flow from source code to a self-masking binary:

Full Compilation Pipeline

Source (.c): functions with the peekaboo attribute
→ Clang Frontend: parse → LLVM IR
→ Optimizer: standard passes
→ X86 Backend: ISel → RegAlloc → X86RetModPass
→ modifyEP.py: adjust the PE entry point to the .stub section

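The key additions are X86RetModPass in the backend and modifyEP.py as a post-build step. The optimizer runs unchanged, so standard optimization levels (-O2, -O3) work normally. With LTO, code generation happens inside the linker, so the linker must also be built from the patched LLVM tree for X86RetModPass to run. FunctionPeekaboo does not interfere with optimization because it runs after the IR optimizer and nearly all machine-level passes have finished.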

Knowledge Check

Q1: At which stage of the LLVM X86 backend does X86RetModPass run?

A) Instruction Selection (ISel)
B) Register Allocation
C) PreEmit (just before code emission)
D) During the optimization middle-end

Q2: Why does FunctionPeekaboo modify the LLVM backend rather than the IR?

A) IR does not support function attributes
B) At the backend level, concrete machine instructions are available, allowing exact x86 stub injection
C) IR modifications would be too fast to execute
D) The backend is simpler to modify

Q3: What mechanism does FunctionPeekaboo use to determine which functions to instrument?

A) It instruments every function automatically
B) It only instruments functions over 100 bytes
C) It uses a configuration file listing function names
D) Function attributes (annotations) mark functions for masking