Module 4: Function Registration & X86RetModPass
The LLVM MachineFunctionPass that transforms ordinary functions into self-masking ones.
Module Objective
Understand how the X86RetModPass MachineFunctionPass works: how it identifies functions to instrument, how it generates .funcmeta entries, how it inserts prologue and epilogue stubs, and the key implementation considerations at the machine instruction level.
1. Pass Registration in the X86 Backend
The X86RetModPass is registered as a PreEmit pass in the X86 target machine configuration. In LLVM, each target defines its pass pipeline through a TargetPassConfig subclass:
```cpp
// X86TargetMachine.cpp - pass pipeline configuration (simplified excerpt)
void X86PassConfig::addPreEmitPass() {
  // Standard X86 pre-emit passes run first (upstream adds several more)
  addPass(createX86IndirectBranchTrackingPass()); // CET IBT support
  // FunctionPeekaboo's pass is added last in PreEmit,
  // so all other transformations are complete before it runs
  addPass(new X86RetModPass());
}
```
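Running last in PreEmit is critical. By then all other passes (register allocation, frame layout, branch optimization) have finished, so the function's instruction sequence is final and the pass can compute byte sizes and offsets. One caveat remains: when the object file is written, the MC layer can still relax a short jump into a longer encoding, so sizes computed at this stage are estimates rather than guarantees.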
2. Function Identification
The pass’s runOnMachineFunction() method is called for every function in the compilation unit. The first thing it does is check whether this function should be instrumented:
```cpp
bool X86RetModPass::runOnMachineFunction(MachineFunction &MF) {
  // Get the underlying LLVM IR function
  const Function &F = MF.getFunction();
  // Check for the "peekaboo" string function attribute
  if (!F.hasFnAttribute("peekaboo"))
    return false; // Skip this function - no changes
  // Also skip very small functions (< minimum stub size)
  // The prologue stub is 0x46 bytes; the function must be larger
  if (estimateFunctionSize(MF) < MIN_FUNCTION_SIZE)
    return false;
  // This function should be instrumented
  instrumentFunction(MF);
  return true; // We modified the function
}
```
Attribute Detection Mechanism
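In the source code, the developer marks functions with __attribute__((annotate("peekaboo"))). Clang does not turn this annotation into a function attribute by itself: it records it in the module-level llvm.global.annotations global. For F.hasFnAttribute("peekaboo") to succeed in the backend, an earlier step has to copy the annotation onto the function as a string attribute (for example, an IR pass that walks llvm.global.annotations and calls F.addFnAttr("peekaboo")). Once attached, the attribute survives through IR and MIR and is accessible via the Function object at the backend level, where the pass simply checks for it.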
3. Function Size Calculation
Before injecting stubs, the pass needs the byte size of the function's body. At the PreEmit stage, instructions are still MachineInstr objects rather than encoded bytes, so the pass estimates the size by summing each instruction's encoded length:
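```cpp
unsigned X86RetModPass::estimateFunctionSize(MachineFunction &MF) {
  unsigned Size = 0;
  for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      // Debug values, CFI directives and similar meta instructions emit no bytes
      if (MI.isMetaInstruction())
        continue;
      // getInstructionSize() is the pass's own helper; an exact size can be
      // obtained by encoding the lowered MCInst with the MCCodeEmitter
      Size += getInstructionSize(MI);
    }
  }
  return Size;
}
```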
The function size is recorded in the .funcmeta entry so the handler knows exactly how many bytes to XOR at runtime. Accuracy is essential — encrypting too few bytes leaves code exposed, while encrypting too many corrupts adjacent code.
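To make the accounting concrete outside LLVM, here is a minimal standalone model of the same loop. ModelInstr, ModelBlock and estimateBodySize are hypothetical names, not LLVM API: each instruction carries its encoded length plus a flag standing in for MachineInstr::isMetaInstruction(). Like the loop above, this model ignores any NOP padding the assembler inserts to honor basic-block alignment, which a real implementation would also need to count.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a MachineInstr: only what the size sum needs.
struct ModelInstr {
  unsigned encodedBytes;  // length the encoder would produce
  bool isMeta;            // debug value, CFI directive, label: emits no bytes
};

using ModelBlock = std::vector<ModelInstr>;

// Sum encoded sizes across all blocks, skipping meta instructions,
// mirroring the shape of estimateFunctionSize().
unsigned estimateBodySize(const std::vector<ModelBlock> &blocks) {
  unsigned size = 0;
  for (const ModelBlock &bb : blocks) {
    for (const ModelInstr &mi : bb) {
      if (mi.isMeta)
        continue;  // contributes nothing to the encoded body
      size += mi.encodedBytes;
    }
  }
  return size;
}
```

For a two-block function with instructions of 4, 3, 2 and 1 bytes plus one meta instruction, the model reports a 10-byte body.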
4. The Instrumentation Process
Once a function is identified for instrumentation, X86RetModPass performs three transformations:
Three-Step Instrumentation
- Inject Prologue Stub: Prepend a code sequence to the function’s entry block that calls the handler to decrypt the function body
- Replace All Returns: Find every RET instruction and replace it with an epilogue stub that calls the handler to re-encrypt before returning
- Emit .funcmeta Entry: Generate a metadata entry recording the function’s RVA, body size, XOR key, and initial encryption state
```cpp
void X86RetModPass::instrumentFunction(MachineFunction &MF) {
  // Step 1: Generate a random XOR key for this function (1-255)
  uint8_t xorKey = generateRandomKey();
  // Step 2: Measure the body size now, before any stub is inserted
  unsigned bodySize = estimateFunctionSize(MF);
  // Step 3: Inject prologue stub at function entry
  MachineBasicBlock &EntryBB = MF.front();
  injectPrologue(EntryBB, xorKey);
  // Step 4: Find and replace all return instructions
  for (MachineBasicBlock &MBB : MF) {
    for (auto MI = MBB.begin(); MI != MBB.end(); ) {
      if (MI->isReturn()) {
        // replaceReturn() must return the iterator just past the inserted
        // epilogue, so the stub's own RET is not visited again
        MI = replaceReturn(MBB, MI);
      } else {
        ++MI;
      }
    }
  }
  // Step 5: Emit .funcmeta entry
  emitFuncMetaEntry(MF, bodySize, xorKey);
}
```
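The iterator handling in Step 4 is easy to get wrong. Because the epilogue stub itself ends in a RET, a replaceReturn() that returned an iterator pointing into the freshly inserted stub would make the loop find that RET and instrument it again, forever. The standalone model below is not LLVM code: a block is a std::list of opcode strings, and PUSH_VOLATILES, reencrypt_handler and POP_VOLATILES are hypothetical placeholders for the real stub instructions. It shows only the iterator discipline.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical model: a basic block is a list of opcode strings.
using Block = std::list<std::string>;

// Insert a placeholder epilogue in front of the RET and return the iterator
// just past the RET, so the caller's loop cannot revisit it.
Block::iterator replaceReturn(Block &bb, Block::iterator ret) {
  bb.insert(ret, "PUSH_VOLATILES");
  bb.insert(ret, "CALL reencrypt_handler");
  bb.insert(ret, "POP_VOLATILES");
  return std::next(ret);  // the original RET stays as the stub's last instruction
}

// Walk every block and instrument each return exactly once.
void replaceAllReturns(std::vector<Block> &fn) {
  for (Block &bb : fn) {
    for (auto it = bb.begin(); it != bb.end();) {
      if (*it == "RET")
        it = replaceReturn(bb, it);
      else
        ++it;
    }
  }
}
```

std::list::insert places new elements before the given position without invalidating it, which is what lets replaceReturn() step past the original RET safely.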
5. Handling Multiple Return Points
C/C++ functions can have multiple return paths (early returns, conditional returns, switch/case exits). Every single RET instruction in the function must be replaced with an epilogue stub. Missing even one return point means the function could return without re-encrypting itself, leaving it in cleartext.
```cpp
// Example: function with multiple returns
void do_thing_a(void);
void do_thing_b(void);

int process_command(int cmd) {
  if (cmd == 0) return -1;   // Early return (source return #1)
  if (cmd == 1) {
    do_thing_a();
    return 0;                // Normal return (source return #2)
  }
  if (cmd == 2) {
    do_thing_b();
    return 1;                // Another return (source return #3)
  }
  return -2;                 // Default return (source return #4)
}
// Depending on optimization, the four source returns may be merged into one
// shared return block or duplicated into several. X86RetModPass replaces every
// return instruction that actually exists at the MachineInstr level.
```
Compiler-Level Advantage
This is a key advantage of compiler-level instrumentation. At the source level, a developer might miss return paths hidden in macros, inlined functions, or complex control flow. At the machine instruction level, every RET opcode is visible and replaceable — the compiler guarantees completeness.
6. .funcmeta Entry Emission
For each instrumented function, the pass emits a structured entry into the .funcmeta section. This is done through LLVM’s MC (Machine Code) layer, which handles section management and object file emission:
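```cpp
void X86RetModPass::emitFuncMetaEntry(
    MachineFunction &MF, unsigned bodySize, uint8_t xorKey) {
  // Get or create the .funcmeta section. FunctionRVA is a PE/COFF concept, so
  // this targets COFF (the getCOFFSection signature varies by LLVM version)
  MCContext &Ctx = MF.getContext();
  MCSection *MetaSection = Ctx.getCOFFSection(
      ".funcmeta",
      COFF::IMAGE_SCN_CNT_INITIALIZED_DATA | COFF::IMAGE_SCN_MEM_READ |
          COFF::IMAGE_SCN_MEM_WRITE);  // RW, no execute
  // A MachineFunctionPass has no streamer of its own; in practice this
  // emission is driven from the AsmPrinter's OutStreamer
  MCStreamer &Streamer = ...;
  Streamer.pushSection();
  Streamer.switchSection(MetaSection);
  // Label on the first body instruction, i.e. just after the prologue stub
  MCSymbol *BodySym = getBodyStartSymbol(MF);  // hypothetical helper
  // Emit the entry fields
  Streamer.emitCOFFImgRel32(BodySym, 0);  // Function RVA (4 bytes, image-relative)
  Streamer.emitIntValue(bodySize, 4);     // Body size (4 bytes)
  Streamer.emitIntValue(xorKey, 1);       // XOR key (1 byte)
  Streamer.emitIntValue(0, 1);            // IsEncrypted = 0 initially
  Streamer.emitIntValue(0, 2);            // Padding (2 bytes)
  Streamer.popSection();                  // Back to the function's code section
}
```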
Initial State: Not Encrypted
When the binary is first produced by the compiler/linker, functions are not encrypted: the IsEncrypted field is 0. The .stub initialization code encrypts them all at process startup. The on-disk binary therefore holds functions in cleartext, but they are encrypted before any application code executes. This is acceptable because the on-disk binary can be protected by other means (signing, packing, etc.).
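The entry layout and the startup masking step can be modeled in plain C++. This is a sketch under stated assumptions, not FunctionPeekaboo's handler: FuncMetaEntry mirrors the five fields emitted by emitFuncMetaEntry() (4 + 4 + 1 + 1 + 2 = 12 bytes, which natural alignment already produces without a packing pragma), the "image" is a simulated byte buffer in which an RVA is simply an offset, and toggleFunction and initializeAll are hypothetical names. A real handler working on mapped code pages also has to deal with memory protection, which this model does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout of one .funcmeta entry (12 bytes).
struct FuncMetaEntry {
  uint32_t FunctionRVA;   // body start, after the prologue stub
  uint32_t FunctionSize;  // bytes of original body covered by the XOR
  uint8_t XorKey;         // 1..255
  uint8_t IsEncrypted;    // 0 on disk, toggled at runtime
  uint16_t Padding;       // keeps the entry size a multiple of 4
};
static_assert(sizeof(FuncMetaEntry) == 12, "must match the 12 bytes emitted per entry");
static_assert(offsetof(FuncMetaEntry, XorKey) == 8, "key follows the two 32-bit fields");

// XOR the body inside the simulated image and flip the state flag.
// XOR is its own inverse, so the same routine masks and unmasks.
void toggleFunction(std::vector<uint8_t> &image, FuncMetaEntry &e) {
  assert(static_cast<std::size_t>(e.FunctionRVA) + e.FunctionSize <= image.size());
  for (uint32_t i = 0; i < e.FunctionSize; ++i)
    image[e.FunctionRVA + i] ^= e.XorKey;
  e.IsEncrypted ^= 1;
}

// Model of the .stub initialization: mask every entry still in cleartext.
void initializeAll(std::vector<uint8_t> &image, std::vector<FuncMetaEntry> &meta) {
  for (FuncMetaEntry &e : meta)
    if (!e.IsEncrypted)
      toggleFunction(image, e);
}
```

Only bytes inside [FunctionRVA, FunctionRVA + FunctionSize) change, which is why an accurate FunctionSize matters: neighbouring bytes are left exactly as they were.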
7. XOR Key Generation
Each function gets its own randomly chosen single-byte XOR key, drawn from 255 usable values (0 is excluded because XOR with 0 leaves the code unchanged). A single byte is used because:
| Consideration | Single-Byte XOR | Multi-Byte Key |
|---|---|---|
| Speed | Fastest possible — single XOR per byte | Slightly slower (index tracking, modular arithmetic) |
| Code size | Minimal handler code | Larger handler with key indexing logic |
| Cryptographic strength | Weak (brute-forceable in 255 attempts) | Stronger but still not truly secure without proper cipher |
| Purpose | Sufficient to defeat signature scanning | Overkill — the goal is masking, not encryption |
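instrumentFunction() calls a generateRandomKey() helper that is not shown. Below is a minimal sketch of one way to write it with the standard <random> facilities, drawing from [1, 255] so the key is never 0. One subtlety: std::uniform_int_distribution is not specified for char-sized types such as uint8_t, so the sketch draws an int and narrows it. The overload taking an engine is an addition made here for testability.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Draw a per-function key in [1, 255]; 0 is excluded because x ^ 0 == x.
uint8_t generateRandomKey(std::mt19937 &rng) {
  std::uniform_int_distribution<int> dist(1, 255);  // int, not uint8_t
  return static_cast<uint8_t>(dist(rng));
}

// Zero-argument form matching the call in instrumentFunction(),
// seeded once from the OS entropy source.
uint8_t generateRandomKey() {
  static std::mt19937 rng{std::random_device{}()};
  return generateRandomKey(rng);
}
```

Seeding from std::random_device gives different keys on every build; a fixed seed would make builds reproducible, at the cost of identical keys across builds.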
Masking vs Encryption
FunctionPeekaboo is a masking technique, not a cryptographic one. The XOR key prevents static signature matching (YARA rules, byte-pattern scans) but would not withstand cryptanalysis. A determined analyst with a memory dump could XOR-decrypt any function in seconds. The point is not to make analysis impossible — it is to make automated scanning fail, which single-byte XOR accomplishes effectively.
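The "brute-forceable in 255 attempts" entry in the table can be made concrete. The hypothetical recoverKey() below tries every nonzero key until the unmasked bytes match a known opening sequence. The example prefix 48 89 5C 24 is the start of mov qword ptr [rsp+disp8], rbx, a register save that often opens MSVC-compiled x64 functions; any predictable opening bytes work the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Try every nonzero single-byte key until the unmasked bytes match a known
// prefix at the start of the body. Returns 0 if no key matches.
uint8_t recoverKey(const std::vector<uint8_t> &masked,
                   const std::vector<uint8_t> &knownPrefix) {
  if (knownPrefix.empty() || masked.size() < knownPrefix.size())
    return 0;
  for (int k = 1; k <= 255; ++k) {
    bool match = true;
    for (std::size_t i = 0; i < knownPrefix.size(); ++i) {
      if ((masked[i] ^ k) != knownPrefix[i]) {
        match = false;
        break;
      }
    }
    if (match)
      return static_cast<uint8_t>(k);
  }
  return 0;
}
```

In practice an analyst with a memory dump rarely needs even this search, because each function's key sits in plaintext in its .funcmeta entry.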
8. Register Preservation
The injected prologue and epilogue stubs must not corrupt any registers that the original function uses. At the PreEmit stage, register allocation is complete, so the pass knows exactly which registers are live:
```cpp
void X86RetModPass::injectPrologue(MachineBasicBlock &EntryBB, uint8_t key) {
  // Outline only; the stub-building code is not shown.
  // The prologue stub must:
  // 1. Save ALL registers it uses (push to stack)
  // 2. Save flags (pushfq)
  // 3. Call the handler
  // 4. Restore flags (popfq)
  // 5. Restore ALL registers (pop from stack)
  // 6. Fall through to the (now decrypted) function body
  // The stub uses a CALL/POP trick for PIC (position-independent)
  // addressing, which means it uses one register for the return
  // address. All other registers are preserved via push/pop.
  // If the handler follows the Microsoft x64 ABI, RSP must also be 16-byte
  // aligned at the CALL, with 32 bytes of shadow space reserved for it.
}
```
Why Register Safety Matters
Under the Microsoft x64 calling convention, the first four integer arguments arrive in RCX, RDX, R8, and R9 (floating-point arguments in XMM0-XMM3). If the prologue stub clobbered RCX, for example, the function body would read a corrupted first argument and crash or produce wrong results. Because the handler call may overwrite any volatile register (RAX, RCX, RDX, R8-R11, XMM0-XMM5), the stub saves and restores all of them, along with the flags, to guarantee transparent operation.
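The save set follows directly from the ABI's volatile/nonvolatile split. The sketch below, assuming the handler obeys the Microsoft x64 convention, encodes that split as a compile-time table; kVolatileRegs and mustSaveAroundHandlerCall are hypothetical names used only for illustration. Nonvolatile registers (RBX, RBP, RDI, RSI, RSP, R12-R15, XMM6-XMM15) are preserved by the handler itself, so the stub does not need to save them.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Microsoft x64 ABI: registers a callee (here, the handler) may clobber.
constexpr std::array<std::string_view, 13> kVolatileRegs = {
    "RAX", "RCX", "RDX", "R8", "R9", "R10", "R11",
    "XMM0", "XMM1", "XMM2", "XMM3", "XMM4", "XMM5"};

// True if the stub must save this register around its CALL to the handler.
constexpr bool mustSaveAroundHandlerCall(std::string_view reg) {
  for (std::string_view v : kVolatileRegs)
    if (v == reg)
      return true;
  return false;
}

// RCX carries the first integer argument and is live at function entry.
static_assert(mustSaveAroundHandlerCall("RCX"), "first integer argument");
// RBX is nonvolatile: the handler itself must preserve it.
static_assert(!mustSaveAroundHandlerCall("RBX"), "nonvolatile register");
```

The table requires C++17 for constexpr std::string_view comparisons and constexpr iteration over std::array.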
9. Interaction with Optimizations
Because X86RetModPass runs after all optimization passes, it does not interfere with any standard compiler optimizations:
| Optimization | Interaction with X86RetModPass |
|---|---|
| Inlining | Already happened at IR level. If a peekaboo function is inlined into a non-peekaboo function, the inlined copy is not instrumented (it’s no longer a separate function) |
| Tail call optimization | Tail calls convert a CALL + RET to a JMP. X86RetModPass must also handle JMP instructions that serve as tail returns |
| LTO (Link-Time Optimization) | LTO merges and optimizes across translation units at link time. X86RetModPass runs per-function after LTO, so it sees the final optimized code |
| Sibling call optimization | Similar to tail calls; the pass checks for jump instructions that exit the function scope |
Tail Call Edge Case
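When the compiler optimizes a function call followed by a return into a single jump (tail call), the RET instruction disappears. The jump target is in another function, so the current function never formally returns. X86RetModPass must detect these tail-call jumps and insert the re-encryption epilogue before the jump, not after a nonexistent return. On X86 the expanded tail-call jumps (the TAILJMP* pseudo-instructions) are flagged as both returns and calls, so an isReturn() scan finds them, but the stub placed in front of them must end in the original jump rather than a RET.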
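The distinction can be captured by looking at two flags per terminator. In this standalone sketch, TerminatorInfo, EpilogueKind and classify are hypothetical names; the two booleans stand in for MachineInstr::isReturn() and MachineInstr::isCall(), and the opcode strings are only labels. The assumption, based on the X86 instruction definitions, is that tail-call jumps carry both flags while an ordinary return carries only isReturn.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of the two MachineInstr flags that matter here.
struct TerminatorInfo {
  std::string opcode;
  bool isReturn;  // stands in for MI.isReturn()
  bool isCall;    // stands in for MI.isCall()
};

enum class EpilogueKind { None, BeforeRet, BeforeTailJump };

// Choose which epilogue form a terminator needs.
EpilogueKind classify(const TerminatorInfo &t) {
  if (!t.isReturn)
    return EpilogueKind::None;  // ordinary branch inside the function
  return t.isCall ? EpilogueKind::BeforeTailJump  // re-encrypt, then JMP
                  : EpilogueKind::BeforeRet;      // re-encrypt, then RET
}
```

Keeping the decision in one place means replaceReturn() only has to emit the stub body and then reuse whichever exit instruction was already there.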
10. Pass Output Summary
After X86RetModPass processes a function, the output consists of:
Instrumented Function Layout
```text
+----------------------------------+
| Prologue Stub (0x46 bytes)       | ← Injected: saves regs, calls handler to decrypt
+----------------------------------+
| Original Function Body           | ← Unchanged machine code (will be XOR'd at runtime)
|   ... instructions ...           |
|   ... multiple basic blocks ...  |
+----------------------------------+
| Epilogue Stub (at each RET)      | ← Injected: calls handler to re-encrypt, then RET
+----------------------------------+

.funcmeta entry:
  FunctionRVA  = RVA of function body start (after prologue)
  FunctionSize = byte size of original body (excluding stubs)
  XorKey       = random byte (1-255)
  IsEncrypted  = 0 (set to 1 by .stub initialization)
```
Knowledge Check
Q1: Why does X86RetModPass run at the PreEmit stage rather than earlier in the pipeline?
Q2: What happens if X86RetModPass misses a return instruction in a function?
Q3: Why does FunctionPeekaboo use a single-byte XOR key rather than a stronger cipher?