Difficulty: Intermediate

Module 4: Function Registration & X86RetModPass

The LLVM MachineFunctionPass that transforms ordinary functions into self-masking ones.

Module Objective

Understand how the X86RetModPass MachineFunctionPass works: how it identifies functions to instrument, how it generates .funcmeta entries, how it inserts prologue and epilogue stubs, and the key implementation considerations at the machine instruction level.

1. Pass Registration in the X86 Backend

The X86RetModPass is registered as a PreEmit pass in the X86 target machine configuration. In LLVM, each target defines its pass pipeline through a TargetPassConfig subclass:

C++
// X86TargetMachine.cpp - Pass pipeline configuration
void X86PassConfig::addPreEmitPass() {
  // Standard pre-emit passes run first
  addPass(createX86IndirectBranchTrackingPass());  // CET IBT support

  // FunctionPeekaboo's pass runs last in PreEmit
  // This ensures all other transformations are complete
  addPass(new X86RetModPass());
}

Running last in PreEmit is critical: it means all other passes (register allocation, frame layout, branch optimization) have finished. The function’s machine code is in its final form, and the pass can calculate exact byte sizes and offsets.

2. Function Identification

The pass’s runOnMachineFunction() method is called for every function in the compilation unit. The first thing it does is check whether this function should be instrumented:

C++
bool X86RetModPass::runOnMachineFunction(MachineFunction &MF) {
  // Get the underlying LLVM IR function
  const Function &F = MF.getFunction();

  // Check for the "peekaboo" annotation attribute
  if (!F.hasFnAttribute("peekaboo"))
    return false;  // Skip this function - no changes

  // Also skip very small functions (< minimum stub size)
  // The prologue stub is 0x46 bytes; function must be larger
  if (estimateFunctionSize(MF) < MIN_FUNCTION_SIZE)
    return false;

  // This function should be instrumented
  instrumentFunction(MF);
  return true;  // We modified the function
}

Attribute Detection Mechanism

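In the source code, the developer marks functions with __attribute__((annotate("peekaboo"))). Stock Clang records this annotation in the module-level llvm.global.annotations array rather than as a function attribute, so the hasFnAttribute("peekaboo") check above assumes that the FunctionPeekaboo toolchain mirrors the annotation onto the Function as a string attribute early in the pipeline. Once attached, a string attribute stays on the IR Function through code generation, and the backend reaches it through MF.getFunction(). The pass simply checks for this attribute string.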
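As a usage sketch, marking a function might look like the following. The PEEKABOO macro and the two example functions are illustrative assumptions, not part of the toolchain. The macro expands to the annotation under Clang and to nothing elsewhere, so the file still builds with other compilers.

```cpp
// Hypothetical convenience macro (not part of FunctionPeekaboo itself).
#if defined(__clang__)
#define PEEKABOO __attribute__((annotate("peekaboo")))
#else
#define PEEKABOO  // other compilers: no annotation, function stays unmasked
#endif

// Marked: a candidate for instrumentation by X86RetModPass.
PEEKABOO int checksum(const unsigned char *data, int len) {
    int sum = 0;
    for (int i = 0; i < len; ++i)
        sum += data[i];
    return sum;
}

// Unmarked: the pass would skip this function.
int add(int a, int b) { return a + b; }
```

The annotation does not change what the function computes. It only tags the function so the backend can find it later.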

3. Function Size Calculation

Before injecting stubs, the pass needs the byte size of the function’s body. At the PreEmit stage, instructions are still MachineInstr objects rather than encoded bytes, so the pass sums the encoded size of each instruction:

C++
unsigned X86RetModPass::estimateFunctionSize(MachineFunction &MF) {
  unsigned Size = 0;
  for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      // Each instruction has a known encoding size
      // The MCCodeEmitter can compute exact sizes
      Size += getInstructionSize(MI);
    }
  }
  return Size;
}

The function size is recorded in the .funcmeta entry so the handler knows exactly how many bytes to XOR at runtime. Accuracy is essential — encrypting too few bytes leaves code exposed, while encrypting too many corrupts adjacent code.

4. The Instrumentation Process

Once a function is identified for instrumentation, X86RetModPass performs three transformations:

Three-Step Instrumentation

Inject Prologue Stub: Prepend a code sequence to the function’s entry block that calls the handler to decrypt the function body
Replace All Returns: Find every RET instruction and replace it with an epilogue stub that calls the handler to re-encrypt before returning
Emit .funcmeta Entry: Generate a metadata entry recording the function’s RVA, body size, XOR key, and initial encryption state

C++
void X86RetModPass::instrumentFunction(MachineFunction &MF) {
  // Step 1: Generate a random XOR key for this function
  uint8_t xorKey = generateRandomKey();

  // Step 2: Calculate the function body size (excluding stubs)
  unsigned bodySize = estimateFunctionSize(MF);

  // Step 3: Inject prologue stub at function entry
  MachineBasicBlock &EntryBB = MF.front();
  injectPrologue(EntryBB, xorKey);

  // Step 4: Find and replace all RET instructions
  for (MachineBasicBlock &MBB : MF) {
    for (auto MI = MBB.begin(); MI != MBB.end(); ) {
      if (MI->isReturn()) {
        MI = replaceReturn(MBB, MI);  // Replace RET with epilogue stub
      } else {
        ++MI;
      }
    }
  }

  // Step 5: Emit .funcmeta entry
  emitFuncMetaEntry(MF, bodySize, xorKey);
}

5. Handling Multiple Return Points

C/C++ functions can have multiple return paths (early returns, conditional returns, switch/case exits). Every single RET instruction in the function must be replaced with an epilogue stub. Missing even one return point means the function could return without re-encrypting itself, leaving it in cleartext.

C++
// Example: function with multiple returns
int process_command(int cmd) {
    if (cmd == 0) return -1;      // Early return (RET #1)

    if (cmd == 1) {
        do_thing_a();
        return 0;                  // Normal return (RET #2)
    }

    if (cmd == 2) {
        do_thing_b();
        return 1;                  // Another return (RET #3)
    }

    return -2;                     // Default return (RET #4)
}

// X86RetModPass replaces ALL FOUR RET instructions with epilogue stubs
// The compiler guarantees it sees every return point at the MachineInstr level

Compiler-Level Advantage

This is a key advantage of compiler-level instrumentation. At the source level, a developer might miss return paths hidden in macros, inlined functions, or complex control flow. At the machine instruction level, every RET opcode is visible and replaceable — the compiler guarantees completeness.
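The "replace every return" rule can be illustrated with a toy model that has no LLVM dependency. Everything here is an assumption made for illustration: the Op enum, the Block and Func aliases, and both function names are hypothetical, and EpilogueStub stands in for the real stub sequence.

```cpp
#include <cstddef>
#include <vector>

// Toy opcodes; EpilogueStub stands for "call handler to re-encrypt, then return".
enum class Op { Other, Ret, EpilogueStub };

using Block = std::vector<Op>;   // stand-in for a machine basic block
using Func = std::vector<Block>; // stand-in for a machine function

// Replace every Ret in every block and report how many were rewritten.
std::size_t replaceAllReturns(Func &func) {
    std::size_t replaced = 0;
    for (Block &block : func) {
        for (Op &op : block) {
            if (op == Op::Ret) {
                op = Op::EpilogueStub;
                ++replaced;
            }
        }
    }
    return replaced;
}

// True if any plain Ret survives, i.e. a path that would skip re-encryption.
bool hasUnprotectedReturn(const Func &func) {
    for (const Block &block : func)
        for (Op op : block)
            if (op == Op::Ret)
                return true;
    return false;
}
```

For a function shaped like process_command, with four blocks that each end in a return, replaceAllReturns would report four replacements and hasUnprotectedReturn would then be false.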

6. .funcmeta Entry Emission

For each instrumented function, the pass emits a structured entry into the .funcmeta section. This is done through LLVM’s MC (Machine Code) layer, which handles section management and object file emission:

C++
void X86RetModPass::emitFuncMetaEntry(
    MachineFunction &MF, unsigned bodySize, uint8_t xorKey) {

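  // Get or create the .funcmeta section. The target is a PE image, so this
  // is a COFF section (two-argument form from newer LLVM; older releases
  // also take a SectionKind)
  MCContext &Ctx = MF.getContext();
  MCSection *MetaSection = Ctx.getCOFFSection(
      ".funcmeta",
      COFF::IMAGE_SCN_CNT_INITIALIZED_DATA |
          COFF::IMAGE_SCN_MEM_READ | COFF::IMAGE_SCN_MEM_WRITE);  // RW, no execute

  // Switch to the .funcmeta section
  MCStreamer &Streamer = ...;
  Streamer.switchSection(MetaSection);

  // Emit the entry fields
  // Symbol for the function itself
  MCSymbol *FuncSym = MF.getTarget().getSymbol(&MF.getFunction());
  Streamer.emitCOFFImgRel32(FuncSym, 0);  // Function RVA (4 bytes, image-relative)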
  Streamer.emitIntValue(bodySize, 4);      // Body size (4 bytes)
  Streamer.emitIntValue(xorKey, 1);        // XOR key (1 byte)
  Streamer.emitIntValue(0, 1);             // IsEncrypted = 0 initially
  Streamer.emitIntValue(0, 2);             // Padding (2 bytes)
}
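The five emitted fields add up to a 12-byte record. A minimal sketch of how a reader of the section might view that record is shown below. The struct name FuncMetaEntry and the helper encodeEntry are assumptions for illustration, and the little-endian serialization simply mirrors x86 byte order.

```cpp
#include <array>
#include <cstdint>

// Assumed in-memory view of one .funcmeta entry, mirroring the emitted fields.
struct FuncMetaEntry {
    std::uint32_t FunctionRVA;   // 4 bytes
    std::uint32_t FunctionSize;  // 4 bytes
    std::uint8_t  XorKey;        // 1 byte
    std::uint8_t  IsEncrypted;   // 1 byte, 0 in the freshly linked image
    std::uint16_t Padding;       // 2 bytes, rounds the entry up to 12 bytes
};
static_assert(sizeof(FuncMetaEntry) == 12, "entry should be 12 bytes");

// Serialize in little-endian order, the byte order x86 uses.
std::array<std::uint8_t, 12> encodeEntry(const FuncMetaEntry &e) {
    std::array<std::uint8_t, 12> out{};
    for (int i = 0; i < 4; ++i) {
        out[i]     = static_cast<std::uint8_t>(e.FunctionRVA  >> (8 * i));
        out[4 + i] = static_cast<std::uint8_t>(e.FunctionSize >> (8 * i));
    }
    out[8] = e.XorKey;
    out[9] = e.IsEncrypted;
    // out[10] and out[11] stay zero (padding)
    return out;
}
```

A fixed 12-byte stride lets a handler walk the section as a plain array of entries.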

Initial State: Not Encrypted

When the binary is first produced by the compiler/linker, functions are not encrypted — the IsEncrypted field is 0. The .stub initialization code encrypts them all on first run. This means the on-disk binary has functions in cleartext, but they are encrypted before any application code executes. This is acceptable because the on-disk binary can be protected by other means (signing, packing, etc.).
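The first-run pass described above can be pictured as a loop over the metadata entries. This is only a sketch under stated assumptions: the image is modeled as a byte vector indexed by RVA, and MetaEntry and maskAllOnStartup are hypothetical names, not the actual .stub code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct MetaEntry {  // simplified .funcmeta entry (assumed layout)
    std::uint32_t rva;
    std::uint32_t size;
    std::uint8_t  key;
    std::uint8_t  isEncrypted;
};

// Mask every function that is still in cleartext; returns how many were masked.
std::size_t maskAllOnStartup(std::vector<std::uint8_t> &image,
                             std::vector<MetaEntry> &entries) {
    std::size_t masked = 0;
    for (MetaEntry &e : entries) {
        if (e.isEncrypted)
            continue;  // already masked; XOR-ing again would unmask it
        if (static_cast<std::size_t>(e.rva) + e.size > image.size())
            continue;  // entry points outside the image: skip rather than corrupt
        for (std::uint32_t i = 0; i < e.size; ++i)
            image[e.rva + i] ^= e.key;
        e.isEncrypted = 1;
        ++masked;
    }
    return masked;
}
```

Checking IsEncrypted before touching the bytes matters because XOR is its own inverse: running the loop twice without the flag would put every function back in cleartext.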

7. XOR Key Generation

Each function gets its own single-byte XOR key. A single byte (255 usable values, since a key of 0 would leave the bytes unchanged) is used because:

Consideration	Single-Byte XOR	Multi-Byte Key
Speed	Fastest possible — single XOR per byte	Slightly slower (index tracking, modular arithmetic)
Code size	Minimal handler code	Larger handler with key indexing logic
Cryptographic strength	Weak (brute-forceable in 255 attempts)	Stronger but still not truly secure without proper cipher
Purpose	Sufficient to defeat signature scanning	Overkill — the goal is masking, not encryption
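One way generateRandomKey() could be written is sketched below. The body is an assumption, since the module does not show it; in particular, std::random_device is an assumed entropy source.

```cpp
#include <cstdint>
#include <random>

// Draw a key uniformly from 1..255. Zero is excluded because XOR with 0
// leaves every byte unchanged.
std::uint8_t generateRandomKey() {
    static std::random_device rd;  // assumed entropy source
    // uniform_int_distribution is not specified for char-sized types,
    // so draw an int and narrow the result.
    std::uniform_int_distribution<int> dist(1, 255);
    return static_cast<std::uint8_t>(dist(rd));
}
```

Drawing from the closed range 1..255 avoids a retry loop for the zero case.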

Masking vs Encryption

FunctionPeekaboo is a masking technique, not a cryptographic one. The XOR key prevents static signature matching (YARA rules, byte-pattern scans) but would not withstand cryptanalysis. A determined analyst with a memory dump could XOR-decrypt any function in seconds. The point is not to make analysis impossible — it is to make automated scanning fail, which single-byte XOR accomplishes effectively.
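Both halves of that claim can be demonstrated in a few lines: masking hides a byte pattern from a naive scanner, and trying all 255 keys recovers it. The helper names xorMask, contains, and recoverKey are hypothetical, and the byte pattern used in the usage note is just a common x64 prologue chosen for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// XOR every byte with the key; applying it twice restores the input.
Bytes xorMask(Bytes data, std::uint8_t key) {
    for (std::uint8_t &b : data)
        b ^= key;
    return data;
}

// True if `needle` occurs in `haystack`, as a byte-pattern scanner would check.
bool contains(const Bytes &haystack, const Bytes &needle) {
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end()) != haystack.end();
}

// Try all 255 keys and return the first whose output contains a known
// plaintext pattern, or 0 if none does.
std::uint8_t recoverKey(const Bytes &masked, const Bytes &known) {
    for (int key = 1; key <= 255; ++key) {
        if (contains(xorMask(masked, static_cast<std::uint8_t>(key)), known))
            return static_cast<std::uint8_t>(key);
    }
    return 0;
}
```

For example, a body beginning with 55 48 89 E5 (push rbp; mov rbp, rsp) no longer matches that signature once masked, yet recoverKey finds the key after at most 255 trials.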

8. Register Preservation

The injected prologue and epilogue stubs must not corrupt any registers that the original function uses. At the PreEmit stage, register allocation is complete, so the pass knows exactly which registers are live:

C++
void X86RetModPass::injectPrologue(MachineBasicBlock &EntryBB, uint8_t key) {
  // The prologue stub must:
  // 1. Save ALL registers it uses (push to stack)
  // 2. Save flags (pushfq)
  // 3. Call the handler
  // 4. Restore flags (popfq)
  // 5. Restore ALL registers (pop from stack)
  // 6. Fall through to the (now decrypted) function body

  // The stub uses a CALL/POP trick for PIC (position-independent)
  // addressing, which means it uses one register for the return
  // address. All other registers are preserved via push/pop.
}

Why Register Safety Matters

Under the Microsoft x64 calling convention, the first four integer arguments arrive in RCX, RDX, R8, and R9. If the prologue stub clobbers RCX and the function body then reads its first argument from RCX, the program would crash or produce wrong results. The stub saves and restores all volatile registers to guarantee transparent operation.

9. Interaction with Optimizations

Because X86RetModPass runs after all optimization passes, it does not interfere with any standard compiler optimizations:

Optimization	Interaction with X86RetModPass
Inlining	Already happened at IR level. If a peekaboo function is inlined into a non-peekaboo function, the inlined copy is not instrumented (it’s no longer a separate function)
Tail call optimization	Tail calls convert a `CALL + RET` to a `JMP`. X86RetModPass must also handle `JMP` instructions that serve as tail returns
LTO (Link-Time Optimization)	LTO merges and optimizes across translation units at link time. X86RetModPass runs per-function after LTO, so it sees the final optimized code
Sibling call optimization	Similar to tail calls; the pass checks for jump instructions that exit the function scope

Tail Call Edge Case

When the compiler optimizes a function call followed by a return into a single jump (tail call), the RET instruction disappears. The jump target is in another function, so the current function never formally returns. X86RetModPass must detect these tail-call jumps and insert the re-encryption epilogue before the jump, not after a nonexistent return.
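The ordering requirement can be shown with a toy instruction stream. This is a sketch, not the pass's code: the Op enum and protectExits are hypothetical, and Reencrypt stands in for the epilogue's call to the handler.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy opcodes; TailJmp models a jump that leaves the function for good.
enum class Op { Other, Ret, TailJmp, Reencrypt };

// Insert a Reencrypt marker before every function exit, whether it is a
// plain return or a tail-call jump. Returns the number of exits covered.
std::size_t protectExits(std::vector<Op> &code) {
    std::vector<Op> out;
    out.reserve(code.size());
    std::size_t exits = 0;
    for (Op op : code) {
        if (op == Op::Ret || op == Op::TailJmp) {
            out.push_back(Op::Reencrypt);  // must run before control leaves
            ++exits;
        }
        out.push_back(op);
    }
    code = std::move(out);
    return exits;
}
```

Placing the marker ahead of the exit instruction is the whole point: anything emitted after a tail jump would never execute.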

10. Pass Output Summary

After X86RetModPass processes a function, the output consists of:

Instrumented Function Layout

Text
+----------------------------------+
| Prologue Stub (0x46 bytes)       |  ← Injected: saves regs, calls handler to decrypt
+----------------------------------+
| Original Function Body           |  ← Unchanged machine code (will be XOR'd at runtime)
|   ... instructions ...           |
|   ... multiple basic blocks ...  |
+----------------------------------+
| Epilogue Stub (at each RET)      |  ← Injected: calls handler to re-encrypt, then RET
+----------------------------------+

.funcmeta entry:
  FunctionRVA  = RVA of function body start (after prologue)
  FunctionSize = byte size of original body (excluding stubs)
  XorKey       = random byte (1-255)
  IsEncrypted  = 0 (set to 1 by .stub initialization)

Knowledge Check

Q1: Why does X86RetModPass run at the PreEmit stage rather than earlier in the pipeline?

A) PreEmit is the only stage that supports custom passes

B) At PreEmit, register allocation is complete and instructions are in final form, allowing exact byte-level stub injection

C) PreEmit runs before optimization, giving better performance

D) The pass needs access to source code, which is only available at PreEmit

Q2: What happens if X86RetModPass misses a return instruction in a function?

A) The program crashes immediately

B) The function is not encrypted at all

C) The function can return without re-encrypting, leaving its body in cleartext

D) The handler automatically detects and fixes the problem

Q3: Why does FunctionPeekaboo use a single-byte XOR key rather than a stronger cipher?

A) The goal is defeating automated signature scanning, not resisting cryptanalysis; single-byte XOR is fast and sufficient for masking

B) LLVM does not support multi-byte operations

C) Stronger ciphers would corrupt the function code

D) Windows only supports single-byte XOR for memory operations
