Module 2: x64 Stack Frames & Unwinding
The architecture that makes stack spoofing both possible and incredibly difficult.
Why This Module?
SilentMoonwalk fabricates stack frames that must survive RtlVirtualUnwind inspection. To understand how that works, you need to know how x64 Windows structures stack frames, stores unwind metadata, and uses that metadata to reconstruct call chains. This is the foundation everything else builds upon.
The x64 Stack: Basics
On x64 Windows, the stack grows downward (from high addresses to low). The RSP register always points to the top of the stack (the lowest used address). Key rules:
- 16-byte alignment: RSP must be 16-byte aligned immediately before each CALL instruction. CALL then pushes the 8-byte return address, so on entry RSP is misaligned by 8 and the called function must re-align it before making calls of its own.
- Shadow space: The caller must reserve 32 bytes (0x20) of shadow space directly above the return address, where the callee may spill RCX, RDX, R8, and R9. This is part of the x64 calling convention.
- No mandatory frame pointer: Unlike x86, x64 functions are not required to use RBP as a frame pointer. Most functions use RSP-relative addressing exclusively.
[Figure: x64 stack frame layout for a typical function with 2 local variables]
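The alignment and shadow-space rules reduce to a little arithmetic. The sketch below is illustrative only: `frame_alloc_size` is a hypothetical helper, not a Windows API. It assumes a prologue that pushes some non-volatile registers and then executes a single `sub rsp, N`, and it returns the smallest N that leaves room for the callee shadow space and the locals while keeping RSP 16-byte aligned at call sites.

```c
#include <stdint.h>

/* Hypothetical helper: smallest N for `sub rsp, N` in a function that
 * calls other functions.
 *   local_bytes - bytes of local variables the function needs
 *   pushed_regs - number of non-volatile registers pushed before the sub */
uint32_t frame_alloc_size(uint32_t local_bytes, uint32_t pushed_regs)
{
    /* 32 bytes of shadow space for callees plus locals, in 8-byte units. */
    uint32_t n = 32u + ((local_bytes + 7u) & ~7u);

    /* Bytes below the caller's aligned RSP: return address, pushes, allocation. */
    uint32_t used = 8u + 8u * pushed_regs + n;

    /* RSP must be a multiple of 16 again before the next CALL. */
    if (used % 16u != 0u)
        n += 8u;
    return n;
}
```

With no locals and no pushes this yields 0x28, the familiar `sub rsp, 28h`; with two 8-byte locals it yields 0x38.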
The x64 Calling Convention
x64 Windows uses a single calling convention (no more __cdecl vs __stdcall confusion):
| Item | Location | Notes |
|---|---|---|
| 1st integer/pointer | RCX | Home (shadow) slot at [RSP+8] on function entry |
| 2nd integer/pointer | RDX | Home slot at [RSP+16] |
| 3rd integer/pointer | R8 | Home slot at [RSP+24] |
| 4th integer/pointer | R9 | Home slot at [RSP+32] |
| 5th+ parameters | Stack | At [RSP+40], [RSP+48], ... on function entry |
| Return value | RAX | Up to 64 bits |
| Volatile registers | RAX, RCX, RDX, R8-R11, XMM0-XMM5 | Caller must save if needed |
| Non-volatile registers | RBX, RBP, RDI, RSI, R12-R15, XMM6-XMM15 | Callee must preserve (push/pop or save to stack) |
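The stack offsets in the table are relative to RSP at function entry, before the prologue runs. The four home slots and the stack-passed arguments form one contiguous array of 8-byte slots directly above the return address. A minimal sketch of that mapping for integer/pointer arguments follows; `arg_register` and `arg_slot_offset` are hypothetical names introduced for illustration.

```c
#include <stddef.h>

/* Register that carries the zero-based integer/pointer argument `index`,
 * or NULL when the argument is passed on the stack. */
const char *arg_register(unsigned index)
{
    static const char *const regs[4] = { "RCX", "RDX", "R8", "R9" };
    return index < 4 ? regs[index] : NULL;
}

/* Offset from RSP (at function entry) of the slot that belongs to argument
 * `index`: the home slot for the first four, the actual storage for the rest. */
unsigned arg_slot_offset(unsigned index)
{
    return 8u + 8u * index;   /* skip the 8-byte return address */
}
```

For example, `arg_slot_offset(4)` gives 40, matching the [RSP+40] location of the fifth parameter.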
RUNTIME_FUNCTION: The Unwind Metadata
On x64 Windows, every non-leaf function must have a RUNTIME_FUNCTION entry in the module's .pdata section. This is the key structure that makes structured exception handling and stack unwinding work:
// From winnt.h - the entry in .pdata for each function
typedef struct _RUNTIME_FUNCTION {
    DWORD BeginAddress;   // RVA of function start
    DWORD EndAddress;     // RVA of function end
    DWORD UnwindData;     // RVA of UNWIND_INFO structure
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;

// The .pdata section is a sorted array of these entries.
// Given an instruction pointer (RIP), the OS binary-searches
// this array to find which function contains that address.
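The lookup described in the comments can be sketched as an ordinary binary search. This is a portable illustration, not the OS implementation: `RuntimeFunctionEntry` is a hypothetical mirror of RUNTIME_FUNCTION so the code builds without windows.h, and `lookup_function_entry` is an invented name. It assumes the table is sorted by BeginAddress and that EndAddress is exclusive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of RUNTIME_FUNCTION: three 32-bit RVAs. */
typedef struct {
    uint32_t BeginAddress;
    uint32_t EndAddress;
    uint32_t UnwindData;
} RuntimeFunctionEntry;

/* Find the entry whose [BeginAddress, EndAddress) range contains `rva`.
 * Returns NULL when no entry covers it, which a stack walker would treat
 * as a leaf function. */
const RuntimeFunctionEntry *lookup_function_entry(
    const RuntimeFunctionEntry *table, size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;                 /* target lies in the lower half */
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;             /* target lies in the upper half */
        else
            return &table[mid];       /* Begin <= rva < End */
    }
    return NULL;
}
```

The `rva` argument is the instruction pointer minus the module's image base, since every field in the table is an RVA.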
Leaf vs Non-Leaf Functions
A leaf function is one that changes no non-volatile register, including RSP: it calls no other function, allocates no stack space, and does not use SEH. Leaf functions do NOT need a RUNTIME_FUNCTION entry. To unwind through one, the unwinder simply reads the return address from [RSP] and adds 8 to RSP. All other functions (non-leaf) must have unwind metadata.
UNWIND_INFO: How to Reverse a Function's Prologue
The UNWIND_INFO structure describes exactly what a function's prologue did to the stack, so the unwinder can reverse those operations:
typedef unsigned char UBYTE;

// Declared first so that UNWIND_INFO can use it below
typedef union _UNWIND_CODE {
    struct {
        UBYTE CodeOffset;      // Offset in prologue where this op occurred
        UBYTE UnwindOp : 4;    // The operation type
        UBYTE OpInfo   : 4;    // Operation-specific info (register number, etc.)
    };
    USHORT FrameOffset;        // Used for UWOP_ALLOC_LARGE operand
} UNWIND_CODE;

typedef struct _UNWIND_INFO {
    UBYTE Version : 3;         // Currently 1 or 2
    UBYTE Flags   : 5;         // UNW_FLAG_NHANDLER, UNW_FLAG_EHANDLER, UNW_FLAG_UHANDLER, UNW_FLAG_CHAININFO
    UBYTE SizeOfProlog;        // Size in bytes of the function's prologue
    UBYTE CountOfCodes;        // Number of UNWIND_CODE entries
    UBYTE FrameRegister : 4;   // If nonzero, the function uses a frame pointer (0=none, 5=RBP)
    UBYTE FrameOffset   : 4;   // Scaled offset from RSP for the frame register
    UNWIND_CODE UnwindCode[];  // Variable-length array of unwind operations
    // Followed by optional exception handler or chained RUNTIME_FUNCTION
} UNWIND_INFO;
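Because the header is four packed bytes, it can also be decoded by hand, which is useful when reading unwind data out of a file or another process. A minimal sketch, assuming the MSVC bit-field layout in which the first-declared field occupies the low bits; `UnwindHeader` and `decode_unwind_header` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical unpacked view of the fixed 4-byte UNWIND_INFO header. */
typedef struct {
    uint8_t version;
    uint8_t flags;
    uint8_t size_of_prolog;
    uint8_t count_of_codes;
    uint8_t frame_register;
    uint8_t frame_offset;
} UnwindHeader;

/* Decode the first four bytes of an UNWIND_INFO structure. */
UnwindHeader decode_unwind_header(const uint8_t raw[4])
{
    UnwindHeader h;
    h.version        = raw[0] & 0x07;   /* low 3 bits of byte 0 */
    h.flags          = raw[0] >> 3;     /* high 5 bits of byte 0 */
    h.size_of_prolog = raw[1];
    h.count_of_codes = raw[2];
    h.frame_register = raw[3] & 0x0F;   /* low 4 bits of byte 3 */
    h.frame_offset   = raw[3] >> 4;     /* high 4 bits of byte 3 */
    return h;
}
```

For instance, a first byte of 0x21 decodes to version 1 with flags 4 (UNW_FLAG_CHAININFO), and a fourth byte of 0x25 decodes to frame register 5 (RBP) with a scaled offset of 2.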
Unwind Operation Codes
Each UNWIND_CODE describes one prologue operation that the unwinder must reverse:
| Op Code | Value | Meaning | Reversal |
|---|---|---|---|
| UWOP_PUSH_NONVOL | 0 | Pushed a non-volatile register | Pop register from [RSP], RSP += 8 |
| UWOP_ALLOC_LARGE | 1 | Allocated large stack space | RSP += allocation size |
| UWOP_ALLOC_SMALL | 2 | Allocated 8-128 bytes on stack | RSP += (OpInfo * 8) + 8 |
| UWOP_SET_FPREG | 3 | Set frame pointer register | RSP = FrameRegister - FrameOffset*16 |
| UWOP_SAVE_NONVOL | 4 | Saved register to stack (2-slot) | Restore register from saved location |
| UWOP_SAVE_NONVOL_FAR | 5 | Saved register to stack (3-slot) | Restore register from far offset |
| UWOP_SAVE_XMM128 | 8 | Saved 128-bit XMM register | Restore XMM from saved location |
| UWOP_PUSH_MACHFRAME | 10 | Machine frame push (interrupts) | Restore RSP from machine frame |
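The table translates into a loop that sums how many bytes a prologue took from the stack, which is the quantity a fabricated frame has to match. The sketch below is a simplified illustration, not the logic of RtlVirtualUnwind: `prologue_stack_bytes` and `slot16` are hypothetical names, the codes are read as raw little-endian bytes (two per slot), and the function gives up with -1 on operations it does not model (UWOP_SET_FPREG, UWOP_PUSH_MACHFRAME, and anything not listed).

```c
#include <stdint.h>

enum {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9, UWOP_PUSH_MACHFRAME = 10
};

/* Read unwind-code slot `i` as a 16-bit little-endian value. */
static uint32_t slot16(const uint8_t *codes, unsigned i)
{
    return (uint32_t)codes[2 * i] | ((uint32_t)codes[2 * i + 1] << 8);
}

/* Sum the bytes the prologue subtracted from RSP (return address excluded).
 * `count` is CountOfCodes. Returns -1 for truncated or unmodeled codes. */
int64_t prologue_stack_bytes(const uint8_t *codes, unsigned count)
{
    int64_t total = 0;
    unsigned i = 0;
    while (i < count) {
        unsigned op   = codes[2 * i + 1] & 0x0F;   /* UnwindOp: low nibble */
        unsigned info = codes[2 * i + 1] >> 4;     /* OpInfo: high nibble */
        switch (op) {
        case UWOP_PUSH_NONVOL:
            total += 8; i += 1; break;
        case UWOP_ALLOC_SMALL:
            total += (int64_t)info * 8 + 8; i += 1; break;
        case UWOP_ALLOC_LARGE:
            if (info == 0 && i + 1 < count) {          /* size / 8 in next slot */
                total += (int64_t)slot16(codes, i + 1) * 8; i += 2;
            } else if (info == 1 && i + 2 < count) {   /* 32-bit size in next two slots */
                total += (int64_t)slot16(codes, i + 1)
                       | ((int64_t)slot16(codes, i + 2) << 16);
                i += 3;
            } else {
                return -1;
            }
            break;
        case UWOP_SAVE_NONVOL:
        case UWOP_SAVE_XMM128:
            i += 2; break;                             /* RSP unchanged */
        case UWOP_SAVE_NONVOL_FAR:
        case UWOP_SAVE_XMM128_FAR:
            i += 3; break;                             /* RSP unchanged */
        default:
            return -1;                                 /* needs register state */
        }
    }
    return total;
}
```

For a prologue of `push rbp; push rbx; sub rsp, 28h`, the three codes sum to 0x38 bytes, so the return address sits 0x38 bytes above RSP once the prologue has finished.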
RtlVirtualUnwind: The Unwinding Engine
This is the core function that EDRs rely on to walk call stacks. Given an instruction pointer (RIP) and a stack pointer (RSP), it computes the calling function's RIP and RSP:
PEXCEPTION_ROUTINE RtlVirtualUnwind(
    DWORD HandlerType,                // UNW_FLAG_NHANDLER for stack walk
    DWORD64 ImageBase,                // Base of module containing RIP
    DWORD64 ControlPc,                // Current RIP (instruction pointer)
    PRUNTIME_FUNCTION FunctionEntry,  // RUNTIME_FUNCTION for this RIP
    PCONTEXT ContextRecord,           // CPU context (RSP, RBP, etc.)
    PVOID *HandlerData,               // Exception handler data (output)
    PDWORD64 EstablisherFrame,        // Frame pointer value (output)
    PKNONVOLATILE_CONTEXT_POINTERS ContextPointers  // Where registers were saved (output)
);

// The algorithm:
// 1. Look up UNWIND_INFO via FunctionEntry->UnwindData
// 2. Determine which unwind codes apply (based on ControlPc offset in function)
// 3. Reverse each applicable operation:
//    - UWOP_ALLOC_SMALL: add size to RSP
//    - UWOP_PUSH_NONVOL: pop register, add 8 to RSP
//    - etc.
// 4. After all codes: [RSP] contains the return address = caller's RIP
// 5. RSP += 8 (skip over return address)
// 6. Update ContextRecord with new RIP and RSP
// 7. Return to allow walking the next frame
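Steps 3 to 6 can be exercised on a simulated stack. This is a toy model, not the real routine: `MiniContext`, `MiniOp`, and `mini_unwind` are hypothetical, only three operation kinds exist, step 2 is skipped (all ops are assumed to apply), and no bounds checking is done on the stack array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, tiny stand-in for CONTEXT. */
typedef struct { uint64_t rip, rsp, rbx, rbp; } MiniContext;

typedef enum { OP_ALLOC, OP_POP_RBX, OP_POP_RBP } MiniOpKind;
typedef struct { MiniOpKind kind; uint32_t bytes; } MiniOp;

/* Read the 8-byte slot at simulated address `addr`; stack[0] lives at `base`. */
static uint64_t read64(const uint64_t *stack, uint64_t base, uint64_t addr)
{
    return stack[(addr - base) / 8];
}

/* Reverse the prologue ops in order, then pop the return address. */
void mini_unwind(MiniContext *ctx, const MiniOp *ops, size_t n,
                 const uint64_t *stack, uint64_t base)
{
    for (size_t i = 0; i < n; i++) {
        switch (ops[i].kind) {
        case OP_ALLOC:                       /* undo `sub rsp, bytes` */
            ctx->rsp += ops[i].bytes;
            break;
        case OP_POP_RBX:                     /* undo `push rbx` */
            ctx->rbx = read64(stack, base, ctx->rsp);
            ctx->rsp += 8;
            break;
        case OP_POP_RBP:                     /* undo `push rbp` */
            ctx->rbp = read64(stack, base, ctx->rsp);
            ctx->rsp += 8;
            break;
        }
    }
    ctx->rip = read64(stack, base, ctx->rsp);  /* step 4: return address */
    ctx->rsp += 8;                             /* step 5: skip over it */
}
```

With ops of "allocate 0x40, pop RBX, pop RBP" and RSP starting at 0x1000, the model reads RBX from 0x1040, RBP from 0x1048, and the return address from 0x1050. The caller's RIP therefore depends entirely on the frame size the unwind codes declare.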
Critical for SilentMoonwalk
RtlVirtualUnwind does NOT simply read return addresses from the stack. It computes the caller's RSP by reversing prologue operations, then reads the return address from the computed location. This means that spoofing a return address alone is insufficient — you must ensure the entire frame layout matches what the unwind codes expect. If the unwind codes say "RSP += 0x40 for UWOP_ALLOC_SMALL, then pop RBX, then pop RBP", then those exact values must be at those exact stack offsets.
Walking a Complete Stack
A full stack walk is just a loop calling RtlVirtualUnwind repeatedly until it reaches the thread entry point:
// Simplified stack walk loop (roughly what RtlWalkFrameChain does internally)
#include <windows.h>
#include <stdio.h>

// Note: ctx is modified in place, so pass a copy of the captured context.
void WalkStack(PCONTEXT ctx) {
    DWORD64 ImageBase;
    PRUNTIME_FUNCTION pFunc;

    while (ctx->Rip != 0) {
        // 1. Find RUNTIME_FUNCTION for current RIP
        pFunc = RtlLookupFunctionEntry(ctx->Rip, &ImageBase, NULL);
        if (pFunc == NULL) {
            // Leaf function: return address is at [RSP]
            ctx->Rip = *(DWORD64*)(ctx->Rsp);
            ctx->Rsp += 8;
        } else {
            // Non-leaf: use unwind info to compute caller's context
            PVOID handlerData;
            DWORD64 establisher;
            RtlVirtualUnwind(
                UNW_FLAG_NHANDLER, ImageBase, ctx->Rip,
                pFunc, ctx, &handlerData, &establisher, NULL
            );
        }
        // ctx now contains the CALLER's RIP and RSP
        printf(" Frame: RIP=0x%llx RSP=0x%llx\n", ctx->Rip, ctx->Rsp);
    }
}
The Chaining Mechanism (UNW_FLAG_CHAININFO)
Some functions have multiple prologue regions described by chained UNWIND_INFO structures. When the Flags field includes UNW_FLAG_CHAININFO, a RUNTIME_FUNCTION follows the unwind codes, pointing to the parent unwind info. The unwinder processes the current codes, then follows the chain to the parent, applying those codes too. SilentMoonwalk must handle this correctly when selecting target functions.
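Locating the chained entry is a matter of skipping the 4-byte header and the unwind-code array, which is padded to an even number of 2-byte slots so that what follows stays 4-byte aligned. A minimal sketch over raw UNWIND_INFO bytes; `chained_entry_offset` and `CHAININFO_BIT` are hypothetical names (the latter carries the value of UNW_FLAG_CHAININFO, 0x4, renamed to avoid clashing with winnt.h).

```c
#include <stdint.h>

#define CHAININFO_BIT 0x4u   /* same value as UNW_FLAG_CHAININFO */

/* Byte offset from the start of UNWIND_INFO to the chained RUNTIME_FUNCTION,
 * or 0 when the chain flag is not set (valid offsets are always >= 4). */
uint32_t chained_entry_offset(const uint8_t *unwind_info)
{
    uint32_t flags = (uint32_t)unwind_info[0] >> 3;   /* high 5 bits of byte 0 */
    uint32_t count = unwind_info[2];                  /* CountOfCodes */

    if (!(flags & CHAININFO_BIT))
        return 0;

    /* 4-byte header, then the codes rounded up to an even count, 2 bytes each. */
    return 4u + 2u * ((count + 1u) & ~1u);
}
```

A walker would read the RUNTIME_FUNCTION at that offset, follow its UnwindData RVA to the parent UNWIND_INFO, and repeat until it reaches a structure without the chain flag.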
RtlVirtualUnwind: One Iteration
[Diagram: (current frame) → FunctionEntry → UNWIND_INFO → ops on RSP → (previous frame)]
Why This Matters for SilentMoonwalk
SilentMoonwalk must construct fake stack frames where:
- The return address at each frame points to an instruction inside a real function with a valid RUNTIME_FUNCTION entry.
- The RSP offset between frames exactly matches what RtlVirtualUnwind will compute by reversing that function's unwind codes.
- Non-volatile register save slots contain plausible values (not NULL, not pointing to freed memory).
- The final frame in the chain terminates cleanly at a known thread entry point like RtlUserThreadStart or BaseThreadInitThunk.
Getting any of these wrong causes the stack walk to diverge, producing impossible frames or crashing — both of which are easily detected.
Pop Quiz: x64 Stack & Unwinding
Q1: What does the RUNTIME_FUNCTION structure's UnwindData field point to?
Q2: How does RtlVirtualUnwind find the caller's return address?
Q3: A leaf function in x64 terminology is one that: