Difficulty: Beginner

Module 2: x64 Stack Frames & Unwinding

The architecture that makes stack spoofing both possible and incredibly difficult.

Why This Module?

SilentMoonwalk fabricates stack frames that must survive RtlVirtualUnwind inspection. To understand how that works, you need to know how x64 Windows structures stack frames, stores unwind metadata, and uses that metadata to reconstruct call chains. This is the foundation everything else builds upon.

The x64 Stack: Basics

On x64 Windows, the stack grows downward (from high addresses to low). The RSP register always points to the top of the stack (the lowest used address). A typical frame is laid out as follows:

x64 Stack Frame Layout (typical function with 2 local vars)

Higher addresses (stack bottom)
Caller's frame...
Shadow space: home for R9 (8 bytes) ← allocated by the caller
Shadow space: home for R8 (8 bytes)
Shadow space: home for RDX (8 bytes)
Shadow space: home for RCX (8 bytes)
Return Address (8 bytes) ← pushed by CALL
Local variable 1 (8 bytes)
Local variable 2 (8 bytes)
Alignment padding (8 bytes)
RSP → (top of stack, 16-byte aligned)
Lower addresses (stack top)
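
The 16-byte alignment rule comes down to simple arithmetic. The sketch below assumes the usual convention that RSP is 16-byte aligned just before a CALL, so RSP modulo 16 is 8 at function entry. The helper name frame_alloc_size and its parameters are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes a prologue must subtract from RSP so
 * that RSP is 16-byte aligned again once the allocation is done.
 * Assumptions: RSP % 16 == 8 at entry (CALL pushed the return address),
 * and every pushed non-volatile register moves RSP by another 8 bytes. */
static uint32_t frame_alloc_size(uint32_t local_bytes, uint32_t pushed_regs,
                                 int makes_calls)
{
    assert(local_bytes % 8 == 0);

    /* Locals plus, if this function calls others, 32 bytes of shadow space. */
    uint32_t size = local_bytes + (makes_calls ? 32u : 0u);

    /* Return address + pushes + allocation must total a multiple of 16. */
    uint32_t misalign = (8u + 8u * pushed_regs + size) % 16u;
    if (misalign != 0)
        size += 8u; /* alignment padding */
    return size;
}
```

For two 8-byte locals, no pushes, and no calls, this gives 24 bytes: 16 for the locals plus 8 of padding.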

The x64 Calling Convention

x64 Windows uses a single calling convention (no more __cdecl vs __stdcall confusion):

Parameter | Location | Notes (stack offsets are relative to RSP at function entry)
--- | --- | ---
1st integer/pointer | RCX | Also has shadow space at [RSP+8]
2nd integer/pointer | RDX | Shadow space at [RSP+16]
3rd integer/pointer | R8 | Shadow space at [RSP+24]
4th integer/pointer | R9 | Shadow space at [RSP+32]
5th+ parameters | Stack | At [RSP+40], [RSP+48], ...
Return value | RAX | Up to 64 bits
Volatile registers | RAX, RCX, RDX, R8-R11 | Caller must save if needed
Non-volatile registers | RBX, RBP, RDI, RSI, R12-R15 | Callee must preserve (push/pop)
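
The parameter mapping can be written as two small lookups. This is only a sketch: arg_register and arg_home_offset are hypothetical helper names, and the offsets are relative to RSP at function entry, before the prologue has run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: register that carries the n-th integer/pointer
 * argument (1-based), or NULL when the argument is passed on the stack. */
static const char *arg_register(unsigned n)
{
    static const char *const regs[] = { "RCX", "RDX", "R8", "R9" };
    assert(n >= 1);
    return n <= 4 ? regs[n - 1] : NULL;
}

/* Hypothetical helper: offset from RSP (at function entry) of the n-th
 * argument's stack slot. For n <= 4 this is the shadow-space home slot,
 * for n >= 5 it is where the caller actually stored the argument. */
static uint32_t arg_home_offset(unsigned n)
{
    assert(n >= 1);
    return 8u * n; /* slot 0 at [RSP] holds the return address */
}
```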

RUNTIME_FUNCTION: The Unwind Metadata

On x64 Windows, every non-leaf function must have a RUNTIME_FUNCTION entry in the module's .pdata section. This is the key structure that makes structured exception handling and stack unwinding work:

// From winnt.h - the entry in .pdata for each function
typedef struct _RUNTIME_FUNCTION {
    DWORD BeginAddress;      // RVA of function start
    DWORD EndAddress;        // RVA of function end
    DWORD UnwindData;        // RVA of UNWIND_INFO structure
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;

// The .pdata section is a sorted array of these entries.
// Given an instruction pointer (RIP), the OS binary-searches
// this array to find which function contains that address.
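
The lookup described in the comment can be sketched in portable C. This is not the OS implementation: RuntimeFunction is a stand-in for the winnt.h structure, lookup_function is a hypothetical name, and EndAddress is treated as exclusive (one past the last byte of the function).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for the winnt.h RUNTIME_FUNCTION layout. */
typedef struct {
    uint32_t BeginAddress; /* RVA of function start */
    uint32_t EndAddress;   /* RVA of function end (assumed exclusive) */
    uint32_t UnwindData;   /* RVA of UNWIND_INFO */
} RuntimeFunction;

/* Binary search over a table sorted by BeginAddress.
 * Returns the entry whose range contains rva, or NULL if none does
 * (which the unwinder treats as a leaf function). */
static const RuntimeFunction *lookup_function(const RuntimeFunction *table,
                                              size_t count, uint32_t rva)
{
    assert(table != NULL || count == 0);

    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;          /* target is in the lower half */
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;      /* target is in the upper half */
        else
            return &table[mid];
    }
    return NULL;
}
```

The RVA passed in is the instruction pointer minus the module's image base, since all three fields are image-relative.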

Leaf vs Non-Leaf Functions

A leaf function is one that does not call any other function, does not change RSP or any other non-volatile register, and does not use SEH. Leaf functions do NOT need a RUNTIME_FUNCTION entry: because RSP still points at the return address for the whole body of the function, the unwinder simply reads the return address from [RSP], adds 8 to RSP, and continues. All other functions (non-leaf) must have unwind metadata.

UNWIND_INFO: How to Reverse a Function's Prologue

The UNWIND_INFO structure describes exactly what a function's prologue did to the stack, so the unwinder can reverse those operations:

typedef struct _UNWIND_INFO {
    UBYTE Version : 3;       // Currently 1 or 2
    UBYTE Flags : 5;         // UNW_FLAG_NHANDLER, UNW_FLAG_EHANDLER, UNW_FLAG_CHAININFO
    UBYTE SizeOfProlog;      // Size in bytes of the function's prologue
    UBYTE CountOfCodes;      // Number of UNWIND_CODE entries
    UBYTE FrameRegister : 4; // If nonzero, the function uses a frame pointer (0=none, 5=RBP)
    UBYTE FrameOffset : 4;   // Scaled offset from RSP for the frame register
    UNWIND_CODE UnwindCode[]; // Variable-length array of unwind operations
    // Followed by optional exception handler or chained RUNTIME_FUNCTION
} UNWIND_INFO;

typedef union _UNWIND_CODE {
    struct {
        UBYTE CodeOffset;    // Offset in prologue where this op occurred
        UBYTE UnwindOp : 4;  // The operation type
        UBYTE OpInfo : 4;    // Operation-specific info (register number, etc.)
    };
    USHORT FrameOffset;      // Used for UWOP_ALLOC_LARGE operand
} UNWIND_CODE;
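
Because the first four bytes of UNWIND_INFO are packed bitfields, it can help to see them decoded by hand. The following is a minimal sketch that assumes the bitfields are allocated from the least significant bit upward, as MSVC does. UnwindHeader and decode_unwind_header are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unpacked view of the fixed 4-byte UNWIND_INFO header. */
typedef struct {
    uint8_t version;        /* low 3 bits of byte 0 */
    uint8_t flags;          /* high 5 bits of byte 0 */
    uint8_t prolog_size;    /* byte 1 */
    uint8_t code_count;     /* byte 2: number of UNWIND_CODE slots */
    uint8_t frame_register; /* low nibble of byte 3 (0 = none, 5 = RBP) */
    uint8_t frame_offset;   /* high nibble of byte 3, scaled by 16 */
} UnwindHeader;

/* Decode the header from raw bytes without relying on compiler bitfields. */
static UnwindHeader decode_unwind_header(const uint8_t raw[4])
{
    assert(raw != NULL);

    UnwindHeader h;
    h.version        = raw[0] & 0x07;
    h.flags          = raw[0] >> 3;
    h.prolog_size    = raw[1];
    h.code_count     = raw[2];
    h.frame_register = raw[3] & 0x0F;
    h.frame_offset   = raw[3] >> 4;
    return h;
}
```

For example, the bytes 21 0C 03 35 decode to version 1, flags 4 (UNW_FLAG_CHAININFO), a 12-byte prologue, 3 unwind code slots, frame register 5 (RBP), and a scaled frame offset of 3.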

Unwind Operation Codes

Each UNWIND_CODE describes one prologue operation that the unwinder must reverse:

Op Code | Value | Meaning | Reversal
--- | --- | --- | ---
UWOP_PUSH_NONVOL | 0 | Pushed a non-volatile register | Pop register from [RSP], RSP += 8
UWOP_ALLOC_LARGE | 1 | Allocated large stack space | RSP += allocation size
UWOP_ALLOC_SMALL | 2 | Allocated 8-128 bytes on stack | RSP += (OpInfo * 8) + 8
UWOP_SET_FPREG | 3 | Set frame pointer register | RSP = FrameRegister - FrameOffset*16
UWOP_SAVE_NONVOL | 4 | Saved register to stack (2-slot) | Restore register from saved location
UWOP_SAVE_NONVOL_FAR | 5 | Saved register to stack (3-slot) | Restore register from far offset
UWOP_SAVE_XMM128 | 8 | Saved 128-bit XMM register | Restore XMM from saved location
UWOP_PUSH_MACHFRAME | 10 | Machine frame push (interrupts) | Restore RSP from machine frame
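
To see how these codes add up to a frame size, here is a rough sketch that sums the stack space a prologue consumed. It takes the unwind codes as 16-bit slots (low byte CodeOffset, then UnwindOp in bits 8-11 and OpInfo in bits 12-15). The function name frame_size_from_codes is hypothetical. It also recognizes UWOP_SAVE_XMM128_FAR (value 9), which the table does not list, and it gives up with -1 on UWOP_SET_FPREG, UWOP_PUSH_MACHFRAME, and anything unknown rather than guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9, UWOP_PUSH_MACHFRAME = 10
};

/* Returns the number of bytes between RSP (after the prologue) and the
 * return address, or -1 if the codes use an operation this sketch skips. */
static int64_t frame_size_from_codes(const uint16_t *slots, size_t count)
{
    assert(slots != NULL || count == 0);

    int64_t total = 0;
    size_t i = 0;
    while (i < count) {
        unsigned op   = (slots[i] >> 8) & 0xF;
        unsigned info = (slots[i] >> 12) & 0xF;
        switch (op) {
        case UWOP_PUSH_NONVOL:
            total += 8; i += 1; break;
        case UWOP_ALLOC_SMALL:
            total += (int64_t)info * 8 + 8; i += 1; break;
        case UWOP_ALLOC_LARGE:
            if (info == 0) {            /* size / 8 in the next slot */
                if (i + 1 >= count) return -1;
                total += (int64_t)slots[i + 1] * 8; i += 2;
            } else {                    /* unscaled 32-bit size in two slots */
                if (i + 2 >= count) return -1;
                total += (int64_t)(slots[i + 1] | ((uint32_t)slots[i + 2] << 16));
                i += 3;
            }
            break;
        case UWOP_SAVE_NONVOL: case UWOP_SAVE_XMM128:
            i += 2; break;              /* saves do not move RSP */
        case UWOP_SAVE_NONVOL_FAR: case UWOP_SAVE_XMM128_FAR:
            i += 3; break;
        default:
            return -1;                  /* out of scope for this sketch */
        }
    }
    return total;
}
```

For a prologue of push rbp, push rbx, sub rsp, 0x40, the slots are 0x7206, 0x3002, 0x5001 and the result is 0x50: the return address sits 0x50 bytes above RSP.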

RtlVirtualUnwind: The Unwinding Engine

This is the core function that EDRs rely on to walk call stacks. Given an instruction pointer (RIP) and a stack pointer (RSP), it computes the calling function's RIP and RSP:

PEXCEPTION_ROUTINE RtlVirtualUnwind(
    DWORD                          HandlerType,     // UNW_FLAG_NHANDLER for stack walk
    DWORD64                        ImageBase,        // Base of module containing RIP
    DWORD64                        ControlPc,        // Current RIP (instruction pointer)
    PRUNTIME_FUNCTION              FunctionEntry,    // RUNTIME_FUNCTION for this RIP
    PCONTEXT                       ContextRecord,    // CPU context (RSP, RBP, etc.)
    PVOID                          *HandlerData,     // Exception handler data (output)
    PDWORD64                       EstablisherFrame, // Frame pointer value (output)
    PKNONVOLATILE_CONTEXT_POINTERS ContextPointers   // Where registers were saved (output)
);

// The algorithm:
// 1. Look up UNWIND_INFO via FunctionEntry->UnwindData
// 2. Determine which unwind codes apply (based on ControlPc offset in function)
// 3. Reverse each applicable operation:
//    - UWOP_ALLOC_SMALL: add size to RSP
//    - UWOP_PUSH_NONVOL: pop register, add 8 to RSP
//    - etc.
// 4. After all codes: [RSP] contains the return address = caller's RIP
// 5. RSP += 8 (skip over return address)
// 6. Update ContextRecord with new RIP and RSP
// 7. Return to allow walking the next frame
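
Steps 2 through 6 can be simulated on a fake stack without any Windows headers. The sketch below is a toy model, not the real RtlVirtualUnwind: it handles only UWOP_PUSH_NONVOL and UWOP_ALLOC_SMALL, takes already-decoded codes, and models stack memory as an array of 8-byte slots. All type and function names (Code, Ctx, unwind_step) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_PUSH_NONVOL = 0, OP_ALLOC_SMALL = 2 };

typedef struct {
    uint8_t code_offset; /* prologue offset at which the op has completed */
    uint8_t op;
    uint8_t info;
} Code;

typedef struct {
    uint64_t rip, rsp;
    uint64_t regs[16];   /* indexed by register number (3 = RBX, 5 = RBP) */
} Ctx;

/* Simulated stack: slot i lives at address base + 8 * i. */
static uint64_t read_qword(const uint64_t *stack, size_t slots,
                           uint64_t base, uint64_t addr)
{
    assert(addr >= base && (addr - base) % 8 == 0);
    assert((addr - base) / 8 < slots);
    return stack[(addr - base) / 8];
}

/* One unwind step. pc_offset is ControlPc minus the function start.
 * Returns 0 on success, -1 on an operation this toy model skips. */
static int unwind_step(Ctx *ctx, uint32_t pc_offset,
                       const Code *codes, size_t count,
                       const uint64_t *stack, size_t slots, uint64_t base)
{
    for (size_t i = 0; i < count; i++) {
        if (codes[i].code_offset > pc_offset)
            continue; /* step 2: this prologue op has not executed yet */
        switch (codes[i].op) {
        case OP_ALLOC_SMALL:
            ctx->rsp += (uint64_t)codes[i].info * 8 + 8;
            break;
        case OP_PUSH_NONVOL:
            ctx->regs[codes[i].info] = read_qword(stack, slots, base, ctx->rsp);
            ctx->rsp += 8;
            break;
        default:
            return -1;
        }
    }
    /* Steps 4-6: [RSP] now holds the return address. */
    ctx->rip = read_qword(stack, slots, base, ctx->rsp);
    ctx->rsp += 8;
    return 0;
}
```

The pc_offset check is what lets the unwinder handle an instruction pointer that is still inside the prologue: operations that have not run yet are simply skipped.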

Critical for SilentMoonwalk

RtlVirtualUnwind does NOT simply read return addresses from the stack. It computes the caller's RSP by reversing prologue operations, then reads the return address from the computed location. This means that spoofing a return address alone is insufficient: the entire frame layout must match what the unwind codes expect. If the unwind codes say "RSP += 0x40 for UWOP_ALLOC_SMALL, then pop RBX, then pop RBP", then the fake frame needs 0x40 bytes of allocation space at RSP, followed by the slots the unwinder will pop into RBX and RBP, followed by the return address, each at exactly that offset.

Walking a Complete Stack

A full stack walk is just a loop calling RtlVirtualUnwind repeatedly until it reaches the thread entry point:

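// Simplified stack walk loop (what RtlWalkFrameChain does internally)
#include <windows.h>
#include <stdio.h>

// Note: ctx is updated in place, so pass a copy of the CONTEXT you captured.
void WalkStack(PCONTEXT ctx) {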
    DWORD64 ImageBase;
    PRUNTIME_FUNCTION pFunc;

    while (ctx->Rip != 0) {
        // 1. Find RUNTIME_FUNCTION for current RIP
        pFunc = RtlLookupFunctionEntry(ctx->Rip, &ImageBase, NULL);

        if (pFunc == NULL) {
            // Leaf function: return address is at [RSP]
            ctx->Rip = *(DWORD64*)(ctx->Rsp);
            ctx->Rsp += 8;
        } else {
            // Non-leaf: use unwind info to compute caller's context
            PVOID handlerData;
            DWORD64 establisher;
            RtlVirtualUnwind(
                UNW_FLAG_NHANDLER, ImageBase, ctx->Rip,
                pFunc, ctx, &handlerData, &establisher, NULL
            );
        }
        // ctx now contains the CALLER's RIP and RSP
        printf("  Frame: RIP=0x%llx RSP=0x%llx\n", ctx->Rip, ctx->Rsp);
    }
}

The Chaining Mechanism (UNW_FLAG_CHAININFO)

Some functions are described by more than one UNWIND_INFO structure, for example when their code is split into separate regions, and those structures are chained together. When the Flags field includes UNW_FLAG_CHAININFO, a RUNTIME_FUNCTION entry follows the unwind codes and points to the parent unwind info. The unwinder processes the current codes, then follows the chain to the parent and applies those codes too. SilentMoonwalk must handle this correctly when selecting target functions.
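
Finding the chained entry is a small offset calculation. The sketch below assumes the documented layout in which the unwind code array is padded to an even number of 2-byte slots so that whatever follows it is 4-byte aligned. chained_entry_offset is a hypothetical helper name, and UNW_FLAG_CHAININFO_BIT mirrors the winnt.h value 0x4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNW_FLAG_CHAININFO_BIT 0x4u /* same value as UNW_FLAG_CHAININFO */

/* Hypothetical helper: byte offset, from the start of UNWIND_INFO, of the
 * chained RUNTIME_FUNCTION that follows the unwind codes.
 * Layout: 4-byte header, then CountOfCodes 2-byte slots rounded up to an
 * even count. */
static size_t chained_entry_offset(uint8_t flags, uint8_t count_of_codes)
{
    assert((flags & UNW_FLAG_CHAININFO_BIT) != 0);

    size_t slots = ((size_t)count_of_codes + 1u) & ~(size_t)1u;
    return 4u + 2u * slots;
}
```

With 3 unwind codes the array is padded to 4 slots, so the chained entry starts at offset 12.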

RtlVirtualUnwind: One Iteration

RIP + RSP (current frame) → RtlLookupFunctionEntry → RUNTIME_FUNCTION → UNWIND_INFO → Reverse prologue ops on RSP → Caller RIP + RSP (previous frame)

Why This Matters for SilentMoonwalk

SilentMoonwalk must construct fake stack frames where:

  1. The return address at each frame points to an instruction inside a real function with a valid RUNTIME_FUNCTION entry.
  2. The RSP offset between frames exactly matches what RtlVirtualUnwind will compute by reversing that function's unwind codes.
  3. Non-volatile register save slots contain plausible values (not NULL, not pointing to freed memory).
  4. The final frame in the chain terminates cleanly at a known thread entry point like RtlUserThreadStart or BaseThreadInitThunk.

Getting any of these wrong causes the stack walk to diverge, producing impossible frames or crashing, and both outcomes are easily detected.
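
Requirement 2 can be illustrated with a little offset arithmetic. The sketch below is not SilentMoonwalk's actual code. It assumes each target function's frame size (the bytes the unwinder adds to RSP before reading the return address) is already known, and it computes where each fake return address has to be written relative to the starting RSP. The name return_address_offsets is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* For a chain of n fake frames, compute the offset from the initial RSP at
 * which each frame's return address must be placed. frame_sizes[i] is the
 * number of bytes the unwinder adds to RSP while reversing frame i's
 * prologue. The unwinder then consumes 8 more bytes for the return address. */
static void return_address_offsets(const uint64_t *frame_sizes, size_t n,
                                   uint64_t *out)
{
    assert(frame_sizes != NULL && out != NULL);

    uint64_t rsp = 0;
    for (size_t i = 0; i < n; i++) {
        assert(frame_sizes[i] % 8 == 0);
        rsp += frame_sizes[i]; /* unwinder skips the frame body */
        out[i] = rsp;          /* return address must sit exactly here */
        rsp += 8;              /* unwinder pops the return address */
    }
}
```

For frame sizes 0x50, 0x28, and 0x30, the return addresses belong at offsets 0x50, 0x80, and 0xB8. An error of even 8 bytes in one frame shifts every later frame and makes the walk diverge.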

Pop Quiz: x64 Stack & Unwinding

Q1: What does the RUNTIME_FUNCTION structure's UnwindData field point to?

UnwindData is an RVA pointing to the UNWIND_INFO structure, which contains the unwind codes that describe the function's prologue operations. The unwinder reverses these operations to compute the caller's RSP and find the return address.

Q2: How does RtlVirtualUnwind find the caller's return address?

RtlVirtualUnwind processes each UNWIND_CODE to reverse the function's prologue (undoing SUB RSP, PUSH operations, etc.), arriving at the correct RSP value where the return address was placed by the CALL instruction. It then reads [RSP] to get the caller's RIP.

Q3: A leaf function in x64 terminology is one that:

A leaf function makes no calls, doesn't adjust RSP (beyond the implicit CALL/RET), and uses no structured exception handling. It doesn't need a RUNTIME_FUNCTION entry. The unwinder handles it by simply reading the return address from [RSP].