Difficulty: Beginner

Module 3: x64 Stack Unwinding

How Windows navigates the call stack without frame pointers — and why Draugr must speak the same language.

Module Objective

Draugr constructs synthetic stack frames that must fool Windows' stack unwinder. To build convincing fakes, you need to understand the real thing: RUNTIME_FUNCTION, UNWIND_INFO, UNWIND_CODEs, and RtlVirtualUnwind. This module covers the entire x64 unwind mechanism from first principles.

1. The Frame Pointer Problem

In 32-bit x86, stack walking was simple. Every function established a frame pointer by saving EBP, then setting EBP = ESP. To walk the stack, you simply followed the chain of saved EBP values — each one pointed to the previous frame's EBP, creating a linked list up the call stack:

x86 ASM
; x86 standard function prolog:
push ebp           ; Save caller's frame pointer
mov  ebp, esp      ; Establish new frame pointer
sub  esp, 0x20     ; Allocate local variables

; Stack walking: follow EBP chain
; [EBP]     = previous EBP (caller's frame pointer)
; [EBP + 4] = return address (where to resume after RET)
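
The chain-following idea can be sketched in portable C against a simulated stack. This is an illustration only: the stack is an array of 4-byte slots, "addresses" are slot indices rather than real pointers, and `walk_ebp_chain` and `CHAIN_END` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel marking the outermost frame in the simulation. */
#define CHAIN_END 0xFFFFFFFFu

/* Walk a saved-EBP chain on a simulated 32-bit stack.
 *   stack[ebp]     = caller's saved EBP (a slot index here)
 *   stack[ebp + 1] = return address ([EBP + 4] in byte terms)
 * Collects up to max_out return addresses and returns how many were found. */
size_t walk_ebp_chain(const uint32_t *stack, size_t slots, uint32_t ebp,
                      uint32_t *out, size_t max_out)
{
    size_t n = 0;
    assert(stack != NULL && out != NULL);
    while (ebp != CHAIN_END && (size_t)ebp + 1 < slots && n < max_out) {
        out[n++] = stack[ebp + 1];   /* return address of this frame */
        ebp = stack[ebp];            /* follow the link to the caller's frame */
    }
    return n;
}
```

Each iteration needs nothing but the current frame pointer, which is exactly the property x64 gives up when RBP stops being reserved.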

x64 changed everything. The Microsoft x64 ABI does not require frame pointers. Most functions do not save or use RBP as a frame pointer. The reason: RBP is freed up as a general-purpose register, giving the compiler an extra register for optimization. On x64 with only 16 GPRs, that extra register matters.

The Consequence: No RBP Chain to Follow

Without mandatory frame pointers, the classic EBP-chain walking technique is useless on x64. Windows needed a completely different mechanism to walk the stack. The solution: metadata-driven unwinding. Instead of following a linked list at runtime, the unwinder reads compile-time metadata that describes exactly what each function's prolog does to the stack.

2. RUNTIME_FUNCTION

Every non-leaf function (one that calls other functions, allocates stack space, saves non-volatile registers, or uses exception handling) in a PE image has a corresponding RUNTIME_FUNCTION entry in the .pdata section. Leaf functions, which do none of these and never change RSP, need no entry: the return address stays at [RSP] for the whole function, so there is nothing to undo.

C
typedef struct _RUNTIME_FUNCTION {
    DWORD BeginAddress;   // RVA of function start
    DWORD EndAddress;     // RVA of function end (exclusive)
    DWORD UnwindData;     // RVA of UNWIND_INFO structure
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;

Field Details

Field        | Size    | Description
BeginAddress | 4 bytes | Relative Virtual Address (RVA) of the first instruction of the function
EndAddress   | 4 bytes | RVA of the byte after the last instruction (exclusive end)
UnwindData   | 4 bytes | RVA of the UNWIND_INFO that describes this function's prolog

Given a return address captured during stack walking, the unwinder calls RtlLookupFunctionEntry to search the .pdata section for a RUNTIME_FUNCTION whose [BeginAddress, EndAddress) range contains that address. This is a binary search — the .pdata entries are sorted by BeginAddress.

C
// Given a return address (RIP), find its RUNTIME_FUNCTION
PRUNTIME_FUNCTION RtlLookupFunctionEntry(
    DWORD64                ControlPc,    // The return address to look up
    PDWORD64               ImageBase,    // Output: base address of the module
    PUNWIND_HISTORY_TABLE  HistoryTable  // Optional: cache for repeated lookups
);

// Returns NULL if the address is in a leaf function (no .pdata entry)
// or if the address is not within any loaded module
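
The binary search over sorted .pdata entries can be sketched as follows. This is not the ntdll implementation; `RuntimeFunction` is a local stand-in for the structure shown earlier, and `lookup_function_entry` is a hypothetical helper that takes an RVA (the return address minus the module base) instead of an absolute address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the RUNTIME_FUNCTION layout (three 32-bit RVAs). */
typedef struct {
    uint32_t BeginAddress;
    uint32_t EndAddress;   /* exclusive */
    uint32_t UnwindData;
} RuntimeFunction;

/* Binary search a table sorted by BeginAddress for the entry whose
 * [BeginAddress, EndAddress) range contains rva. Returns NULL when no
 * entry covers it (leaf function, or address outside the table). */
const RuntimeFunction *lookup_function_entry(const RuntimeFunction *table,
                                             size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;              /* search window is [lo, hi) */
    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;                       /* target lies before this entry */
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;                   /* target lies after this entry */
        else
            return &table[mid];             /* Begin <= rva < End */
    }
    return NULL;
}
```

Note the half-open range: an address equal to EndAddress belongs to the next function, if any, not to this one.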

Important: What Happens When RtlLookupFunctionEntry Returns NULL

If a return address doesn't match any RUNTIME_FUNCTION (either because it's in a leaf function or in unbacked memory), the unwinder assumes the address at [RSP] is the return address and pops it. This is the leaf function convention. For Draugr, this is relevant because synthetic frames must ensure that when the unwinder looks up return addresses, it finds valid RUNTIME_FUNCTION entries for BaseThreadInitThunk and RtlUserThreadStart.

3. UNWIND_INFO and UNWIND_CODEs

The UNWIND_INFO structure describes what a function's prolog does. The prolog is the sequence of instructions at the start of a function that sets up the stack frame — saving registers, allocating local variable space, establishing a frame pointer.

C
typedef struct _UNWIND_INFO {
    UBYTE Version       : 3;    // Currently 1 (version 2 adds epilog codes); chaining is signalled by Flags, not Version
    UBYTE Flags         : 5;    // UNW_FLAG_EHANDLER, UNW_FLAG_UHANDLER, UNW_FLAG_CHAININFO
    UBYTE SizeOfProlog;         // Size of the function prolog in bytes
    UBYTE CountOfCodes;         // Number of UNWIND_CODE entries
    UBYTE FrameRegister : 4;    // Frame register (0 = no frame register)
    UBYTE FrameOffset   : 4;    // Scaled offset of frame register from RSP
    UNWIND_CODE UnwindCode[1];  // Variable-length array of unwind operations
    // Followed by optional exception handler data
} UNWIND_INFO, *PUNWIND_INFO;

The UnwindCode array contains one or more UNWIND_CODE entries, each describing a single prolog operation. The codes are stored in reverse order (last prolog instruction first) so the unwinder can process them in the order needed to reverse the prolog.

C
typedef union _UNWIND_CODE {
    struct {
        UBYTE CodeOffset;    // Offset in prolog where this operation occurs
        UBYTE UnwindOp : 4;  // Operation type (UWOP_*)
        UBYTE OpInfo   : 4;  // Operation-specific data
    };
    USHORT FrameOffset;      // Used as a 16-bit value for some operations
} UNWIND_CODE, *PUNWIND_CODE;
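
To make the bit layout concrete, here is a small sketch that decodes one two-byte UNWIND_CODE slot without relying on compiler bit-field ordering. `DecodedCode` and `decode_unwind_code` are hypothetical names used only for this example; the layout assumed is the one above, with UnwindOp in the low nibble and OpInfo in the high nibble of the second byte.

```c
#include <assert.h>
#include <stdint.h>

/* Unpacked view of a single UNWIND_CODE slot. */
typedef struct {
    uint8_t code_offset;  /* prolog offset just past the instruction */
    uint8_t unwind_op;    /* UWOP_* value (low 4 bits of byte 1) */
    uint8_t op_info;      /* operation-specific data (high 4 bits of byte 1) */
} DecodedCode;

/* Decode the two raw bytes of one unwind code slot. */
DecodedCode decode_unwind_code(const uint8_t raw[2])
{
    DecodedCode d;
    assert(raw != 0);
    d.code_offset = raw[0];
    d.unwind_op   = (uint8_t)(raw[1] & 0x0F);
    d.op_info     = (uint8_t)(raw[1] >> 4);
    return d;
}
```

For instance, the bytes 0x06 0x42 decode to CodeOffset 6, UnwindOp 2 (UWOP_ALLOC_SMALL), and OpInfo 4.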

UNWIND_CODE Operations

Each unwind code type describes a specific prolog instruction and its effect on the stack:

UWOP Types and Stack Effects

UWOP Code            | Value | Prolog Instruction                  | Stack Effect                                   | Extra Slots
UWOP_PUSH_NONVOL     | 0     | push rbx / push rdi / etc.          | +8 bytes (one QWORD)                           | 0
UWOP_ALLOC_LARGE     | 1     | sub rsp, N (large)                  | +N bytes                                       | 1 or 2
UWOP_ALLOC_SMALL     | 2     | sub rsp, N (8 to 128)               | +(OpInfo*8 + 8) bytes                          | 0
UWOP_SET_FPREG       | 3     | lea rbp, [rsp+N]                    | No stack size change                           | 0
UWOP_SAVE_NONVOL     | 4     | mov [rsp+N], rbx                    | No stack size change (saves to existing space) | 1
UWOP_SAVE_NONVOL_FAR | 5     | mov [rsp+N], rbx (large offset)     | No stack size change                           | 2
UWOP_SAVE_XMM128     | 8     | movaps [rsp+N], xmm6                | No stack size change                           | 1
UWOP_SAVE_XMM128_FAR | 9     | movaps [rsp+N], xmm6 (large offset) | No stack size change                           | 2
UWOP_PUSH_MACHFRAME  | 10    | Hardware interrupt frame            | +40 or +48 bytes                               | 0

The "Extra Slots" Column

Some UWOP codes consume additional UNWIND_CODE slots. For example, UWOP_ALLOC_LARGE with OpInfo=0 uses one extra slot (16-bit value: N/8 allocation), and with OpInfo=1 uses two extra slots (full 32-bit value). UWOP_SAVE_NONVOL uses one extra slot for the scaled offset. When iterating UNWIND_CODEs, Draugr must skip these extra slots correctly or the frame size calculation will be wrong.

Concrete Example

Consider a function with this prolog:

x86-64 ASM
; Function prolog:
push rbx              ; Save non-volatile register  [UWOP_PUSH_NONVOL, reg=rbx]
push rdi              ; Save non-volatile register  [UWOP_PUSH_NONVOL, reg=rdi]
sub  rsp, 0x28        ; Allocate 40 bytes locals    [UWOP_ALLOC_SMALL, OpInfo=4]
                       ;   (OpInfo * 8 + 8 = 4*8+8 = 40 = 0x28)

; Total stack consumed = 8 (push rbx) + 8 (push rdi) + 40 (sub rsp) = 56 bytes
; Plus the 8-byte return address pushed by CALL = 64 bytes total frame

The UNWIND_INFO for this function would contain 3 UNWIND_CODEs (in reverse order):

UNWIND_CODEs (reverse order)
[0] UWOP_ALLOC_SMALL   OpInfo=4   (sub rsp, 0x28)     +40 bytes
[1] UWOP_PUSH_NONVOL   OpInfo=7   (push rdi, reg=RDI)  +8 bytes
[2] UWOP_PUSH_NONVOL   OpInfo=3   (push rbx, reg=RBX)  +8 bytes

Total prolog stack consumption: 40 + 8 + 8 = 56 bytes
Frame size (including return addr): 56 + 8 = 64 bytes
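
The summation above, including the extra-slot skipping described earlier, can be sketched in portable C. This is a minimal illustration and not Draugr's code: `prolog_stack_bytes` is a hypothetical helper, each slot is passed as a 16-bit value (low byte CodeOffset, high byte OpInfo and UnwindOp), and version 2 epilog codes and dynamic allocations made through a frame register are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9, UWOP_PUSH_MACHFRAME = 10
};

/* Sum the bytes a prolog subtracts from RSP, as described by its unwind
 * code slots. Returns -1 for truncated or unrecognised codes. */
int64_t prolog_stack_bytes(const uint16_t *slots, size_t count)
{
    int64_t total = 0;
    size_t i = 0;
    assert(slots != NULL || count == 0);
    while (i < count) {
        unsigned op   = (slots[i] >> 8) & 0x0F;
        unsigned info = (slots[i] >> 12) & 0x0F;
        size_t extra;                       /* additional slots this op owns */

        switch (op) {
        case UWOP_ALLOC_LARGE:
            if (info > 1) return -1;
            extra = info ? 2 : 1;
            break;
        case UWOP_SAVE_NONVOL:
        case UWOP_SAVE_XMM128:     extra = 1; break;
        case UWOP_SAVE_NONVOL_FAR:
        case UWOP_SAVE_XMM128_FAR: extra = 2; break;
        case UWOP_PUSH_NONVOL:
        case UWOP_ALLOC_SMALL:
        case UWOP_SET_FPREG:
        case UWOP_PUSH_MACHFRAME:  extra = 0; break;
        default:                   return -1;
        }
        if (extra > count - i - 1) return -1;   /* extra slots run past the end */

        if (op == UWOP_PUSH_NONVOL)
            total += 8;
        else if (op == UWOP_ALLOC_SMALL)
            total += (int64_t)info * 8 + 8;
        else if (op == UWOP_ALLOC_LARGE && info == 0)
            total += (int64_t)slots[i + 1] * 8;             /* size / 8 in one slot */
        else if (op == UWOP_ALLOC_LARGE)
            total += (int64_t)((uint32_t)slots[i + 1] |
                               ((uint32_t)slots[i + 2] << 16)); /* raw 32-bit size */
        else if (op == UWOP_PUSH_MACHFRAME)
            total += info ? 48 : 40;

        i += 1 + extra;                     /* skip the op and its extra slots */
    }
    return total;
}
```

Feeding it the three slots from the example (0x4206, 0x7002, 0x3001, assuming one-byte push encodings and a four-byte sub) gives 56; adding 8 for the return address gives the 64-byte frame.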

4. RtlVirtualUnwind

RtlVirtualUnwind is the core Windows API that performs a single step of stack unwinding. Given the current PC (program counter/RIP) and a RUNTIME_FUNCTION, it reverses the prolog's effects to compute the caller's register state:

C
PEXCEPTION_ROUTINE RtlVirtualUnwind(
    ULONG                          HandlerType,      // UNW_FLAG_NHANDLER usually
    DWORD64                        ImageBase,         // Module base address
    DWORD64                        ControlPc,         // Current RIP
    PRUNTIME_FUNCTION              FunctionEntry,     // From RtlLookupFunctionEntry
    PCONTEXT                       ContextRecord,     // In/out: register state (caller's RIP and RSP on return)
    PVOID                         *HandlerData,       // Output: exception handler data
    PDWORD64                       EstablisherFrame,  // Output: frame base of the function being unwound
    PKNONVOLATILE_CONTEXT_POINTERS ContextPointers    // Optional output: saved register locations
);

RtlVirtualUnwind: Step-by-Step

  1. Input: current RIP + RUNTIME_FUNCTION
  2. Read UNWIND_INFO from UnwindData RVA
  3. Process each UNWIND_CODE
  4. Reverse prolog: restore RSP + regs
  5. Output: caller's RIP + RSP

For each UNWIND_CODE, RtlVirtualUnwind reverses the corresponding prolog operation:

Unwind Reversal Logic

UWOP Code        | Prolog Effect           | Unwind Reversal
UWOP_PUSH_NONVOL | RSP decreased by 8      | Read saved reg from [RSP], RSP += 8
UWOP_ALLOC_SMALL | RSP decreased by N      | RSP += (OpInfo * 8 + 8)
UWOP_ALLOC_LARGE | RSP decreased by N      | RSP += N (from extra slot data)
UWOP_SET_FPREG   | Frame register set      | RSP = FrameReg - FrameOffset * 16
UWOP_SAVE_NONVOL | Register saved to stack | Read saved reg from [RSP + offset * 8]

After processing all codes, RSP points at the saved return address. Reading [RSP] gives the caller's RIP. The unwinder then increments RSP by 8 (to account for the return address) to get the caller's RSP value.
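
One unwind step can be simulated on a fake stack to see the reversal in action. The sketch below is not RtlVirtualUnwind: it assumes RIP is already past the prolog so every code applies (the real routine compares each CodeOffset with RIP's position and also handles epilogs), it supports only three operations, and `SimContext` and `sim_virtual_unwind` are hypothetical names. RSP is a byte offset into an array of QWORDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t rip;
    uint64_t rsp;       /* byte offset into the simulated stack */
    uint64_t reg[16];   /* indexed by x64 register number (RBX=3, RDI=7) */
} SimContext;

/* Read one QWORD at byte offset off; returns -1 if misaligned or out of range. */
static int read_qword(const uint64_t *stack, size_t qwords,
                      uint64_t off, uint64_t *out)
{
    if (off % 8 != 0 || off / 8 >= qwords) return -1;
    *out = stack[off / 8];
    return 0;
}

/* Reverse a prolog described by unwind code slots, then pop the return
 * address. On success ctx holds the caller's RIP, RSP and restored regs. */
int sim_virtual_unwind(const uint64_t *stack, size_t qwords,
                       const uint16_t *slots, size_t count, SimContext *ctx)
{
    uint64_t frame_base;
    size_t i;
    assert(stack != NULL && ctx != NULL);
    frame_base = ctx->rsp;                  /* RSP as left by the prolog */
    for (i = 0; i < count; i++) {
        unsigned op   = (slots[i] >> 8) & 0x0F;
        unsigned info = (slots[i] >> 12) & 0x0F;
        switch (op) {
        case 0:  /* UWOP_PUSH_NONVOL: restore register, pop 8 bytes */
            if (read_qword(stack, qwords, ctx->rsp, &ctx->reg[info])) return -1;
            ctx->rsp += 8;
            break;
        case 2:  /* UWOP_ALLOC_SMALL: release OpInfo*8 + 8 bytes */
            ctx->rsp += (uint64_t)info * 8 + 8;
            break;
        case 4:  /* UWOP_SAVE_NONVOL: next slot holds offset / 8 */
            if (i + 1 >= count) return -1;
            if (read_qword(stack, qwords,
                           frame_base + (uint64_t)slots[i + 1] * 8,
                           &ctx->reg[info])) return -1;
            i++;                            /* consume the extra slot */
            break;
        default:
            return -1;                      /* not covered by this sketch */
        }
    }
    if (read_qword(stack, qwords, ctx->rsp, &ctx->rip)) return -1;
    ctx->rsp += 8;                          /* step over the return address */
    return 0;
}
```

With the example prolog from section 3 and RSP starting at offset 0, the saved RDI is read at offset 40, the saved RBX at 48, the return address at 56, and the caller's RSP comes out as 64.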

5. Chained Unwind Info

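Some functions can't be described by a single UNWIND_INFO, for example when their code is split into noncontiguous regions. The UNW_FLAG_CHAININFO flag marks an UNWIND_INFO as secondary: it is followed by a RUNTIME_FUNCTION entry that points to the primary (parent) unwind data, creating a chain of unwind data: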

C
// In the UNWIND_INFO for a function with chained info:
// Flags field contains UNW_FLAG_CHAININFO (0x04)
// After the UnwindCode array (padded to even count), there's another RUNTIME_FUNCTION

// Draugr handles this in DraugrCalculateStackSize:
// 1. Parse UNWIND_CODEs from the current UNWIND_INFO
// 2. Check if UNW_FLAG_CHAININFO is set
// 3. If yes, follow the chained RUNTIME_FUNCTION and repeat
// 4. Sum all stack sizes from the entire chain
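
Locating the chained RUNTIME_FUNCTION comes down to a little offset arithmetic: a 4-byte header, then CountOfCodes two-byte slots rounded up to an even count. The sketch below shows only that step on a raw byte buffer. `read_chained_entry` and `RuntimeFunction` are hypothetical names, and the `memcpy` assumes a little-endian host, matching the PE file layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNW_FLAG_CHAININFO 0x04

/* Local stand-in for the RUNTIME_FUNCTION layout (three 32-bit RVAs). */
typedef struct {
    uint32_t BeginAddress;
    uint32_t EndAddress;
    uint32_t UnwindData;
} RuntimeFunction;

/* Inspect raw UNWIND_INFO bytes.
 * Returns 1 and fills *parent when UNW_FLAG_CHAININFO is set,
 * 0 when the info is not chained, -1 when the buffer is too short. */
int read_chained_entry(const uint8_t *info, size_t len, RuntimeFunction *parent)
{
    unsigned flags, count;
    size_t padded, off;
    assert(info != NULL && parent != NULL);
    if (len < 4) return -1;
    flags = info[0] >> 3;                 /* Version = low 3 bits, Flags = high 5 */
    count = info[2];                      /* CountOfCodes */
    if (!(flags & UNW_FLAG_CHAININFO)) return 0;
    padded = (count + 1u) & ~1u;          /* code array is padded to an even count */
    off = 4 + padded * 2;                 /* header + code slots */
    if (len < off + sizeof *parent) return -1;
    memcpy(parent, info + off, sizeof *parent);
    return 1;
}
```

A caller that wants the total for a chained function would sum the codes of the current UNWIND_INFO, then repeat the process on the UNWIND_INFO named by the returned entry's UnwindData until no chain flag is set.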

When Chaining Occurs

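Chained unwind info is typically used when a function's code is split into noncontiguous segments, or when the compiler defers some register saves until after the main prolog.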

For BaseThreadInitThunk and RtlUserThreadStart, chained info is rare but must be handled correctly. If Draugr encounters a chain, it recursively follows it to accumulate the total stack frame size.

6. Why Draugr Must Parse UNWIND_CODEs

The Critical Requirement: Exact Frame Sizes

To construct synthetic frames for BaseThreadInitThunk and RtlUserThreadStart, Draugr must know the exact stack frame size of each function. This means parsing every UNWIND_CODE for each function and summing the stack contributions. Here's why precision is non-negotiable:

When the stack walker processes a synthetic frame, it performs these steps:

  1. Read a return address from the stack (this is the address Draugr placed)
  2. Call RtlLookupFunctionEntry to find the RUNTIME_FUNCTION for that address
  3. Read the UNWIND_INFO and compute the frame size from UNWIND_CODEs
  4. Add the frame size to the current RSP to locate the next return address

Why Frame Size Must Be Exact

Correct Frame Size

RSP + frameSize = next return address
Unwinder finds RtlUserThreadStart
Stack walk terminates cleanly

Off By Even 1 Byte

RSP + frameSize = garbage
Unwinder reads wrong return address
Stack walk produces nonsense / crash

Stack Layout (Draugr's Synthetic Frames)
; After Draugr builds the synthetic stack:
;
; [RSP + 0]                    = return addr into syscall stub (ntdll)
; [RSP + frameSize_BaseThread] = addr inside BaseThreadInitThunk
; [RSP + frameSize_BaseThread
;      + frameSize_RtlUser]    = addr inside RtlUserThreadStart
; [RSP + ... + 8]              = 0x0 (stack walk terminator)
;
; If frameSize_BaseThread is wrong, the unwinder looks at the wrong
; offset for the RtlUserThreadStart return address, and the entire
; spoof collapses.

Version Sensitivity

The frame sizes for BaseThreadInitThunk and RtlUserThreadStart can change between Windows versions because Microsoft may modify their prologs. This is why Draugr parses the UNWIND_CODEs dynamically at runtime rather than hardcoding frame sizes. By reading the actual .pdata metadata from kernel32.dll and ntdll.dll, Draugr automatically adapts to whatever Windows version is running.

Module 3 Quiz: x64 Stack Unwinding

Q1: Why doesn't x64 Windows use RBP frame pointer chains for stack walking like x86 did?

The x64 ABI dropped the requirement for frame pointers to free up RBP as a general-purpose register. With only 16 GPRs, every register matters for optimization. Instead, the compiler emits metadata (RUNTIME_FUNCTION + UNWIND_INFO) in the .pdata section that describes each function's stack layout, allowing metadata-driven unwinding without runtime frame pointer chains.

Q2: What does UWOP_ALLOC_SMALL with OpInfo=4 encode?

UWOP_ALLOC_SMALL encodes stack allocations from 8 to 128 bytes. The formula is: allocation = OpInfo * 8 + 8. So OpInfo=4 means 4 * 8 + 8 = 40 bytes, which corresponds to sub rsp, 0x28. This is the standard 32-byte shadow space (0x20) plus 8 bytes of alignment, commonly seen in functions that call other functions.

Q3: Why does Draugr parse UNWIND_CODEs at runtime instead of hardcoding frame sizes?

Microsoft can modify the prologs of BaseThreadInitThunk and RtlUserThreadStart in any Windows update, changing their frame sizes. Hardcoding sizes would break on different Windows versions. By parsing the actual UNWIND_CODEs from the .pdata section at runtime, Draugr dynamically computes the correct frame sizes for whatever version of kernel32.dll and ntdll.dll is loaded, ensuring compatibility across all supported Windows builds.