Module 3: x64 Stack Unwinding
How Windows navigates the call stack without frame pointers — and why Draugr must speak the same language.
Module Objective
Draugr constructs synthetic stack frames that must fool Windows' stack unwinder. To build convincing fakes, you need to understand the real thing: RUNTIME_FUNCTION, UNWIND_INFO, UNWIND_CODEs, and RtlVirtualUnwind. This module covers the entire x64 unwind mechanism from first principles.
1. The Frame Pointer Problem
In 32-bit x86, stack walking was simple. Every function established a frame pointer by saving EBP, then setting EBP = ESP. To walk the stack, you simply followed the chain of saved EBP values — each one pointed to the previous frame's EBP, creating a linked list up the call stack:
; x86 standard function prolog:
push ebp ; Save caller's frame pointer
mov ebp, esp ; Establish new frame pointer
sub esp, 0x20 ; Allocate local variables
; Stack walking: follow EBP chain
; [EBP] = previous EBP (caller's frame pointer)
; [EBP + 4] = return address (where to resume after RET)
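The EBP chain behaves like a singly linked list, which the following sketch models in portable C. It is a simulation only: `Frame` and `walk_frames` are hypothetical names introduced for illustration, and the "stack" is a set of ordinary structs rather than real thread memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a frame built by `push ebp; mov ebp, esp`:
   [EBP] holds the caller's saved EBP, [EBP + 4] holds the return address. */
typedef struct Frame {
    const struct Frame *saved_ebp;   /* [EBP]     : previous frame, NULL at the top */
    uint32_t            return_addr; /* [EBP + 4] : where this function returns to  */
} Frame;

/* Follow the saved-EBP chain, copying up to `max` return addresses into `out`.
   Returns the number of frames visited. */
size_t walk_frames(const Frame *ebp, uint32_t *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    while (ebp != NULL && n < max) {
        out[n++] = ebp->return_addr;
        ebp = ebp->saved_ebp;        /* step to the caller's frame */
    }
    return n;
}
```

The walker needs no metadata at all: each frame tells it where the next one is. That property is exactly what disappears on x64.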
x64 changed everything. The Microsoft x64 ABI does not require frame pointers. Most functions do not save or use RBP as a frame pointer. The reason: RBP is freed up as a general-purpose register, giving the compiler an extra register for optimization. On x64 with only 16 GPRs, that extra register matters.
The Consequence: No RBP Chain to Follow
Without mandatory frame pointers, the classic EBP-chain walking technique is useless on x64. Windows needed a completely different mechanism to walk the stack. The solution: metadata-driven unwinding. Instead of following a linked list at runtime, the unwinder reads compile-time metadata that describes exactly what each function's prolog does to the stack.
2. RUNTIME_FUNCTION
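Every non-leaf function in a PE image has a corresponding RUNTIME_FUNCTION entry in the .pdata section. A non-leaf function is one that calls other functions, allocates stack space, saves non-volatile registers, or uses exception handling. Leaf functions do none of these and never change RSP, so they need no entry: on entry and throughout their body the return address is still at [RSP], which makes their stack frame effectively zero-sized.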
typedef struct _RUNTIME_FUNCTION {
    DWORD BeginAddress; // RVA of function start
    DWORD EndAddress;   // RVA of function end (exclusive)
    DWORD UnwindData;   // RVA of UNWIND_INFO structure
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;
Field Details
| Field | Size | Description |
|---|---|---|
| BeginAddress | 4 bytes | Relative Virtual Address of the first instruction of the function |
| EndAddress | 4 bytes | RVA of the byte after the last instruction (exclusive end) |
| UnwindData | 4 bytes | RVA of the UNWIND_INFO that describes this function's prolog |
Given a return address captured during stack walking, the unwinder calls RtlLookupFunctionEntry to search the .pdata section for a RUNTIME_FUNCTION whose [BeginAddress, EndAddress) range contains that address. This is a binary search — the .pdata entries are sorted by BeginAddress.
// Given a return address (RIP), find its RUNTIME_FUNCTION
PRUNTIME_FUNCTION RtlLookupFunctionEntry(
    DWORD64 ControlPc,                  // The return address to look up
    PDWORD64 ImageBase,                 // Output: base address of the module
    PUNWIND_HISTORY_TABLE HistoryTable  // Optional: cache for repeated lookups
);
// Returns NULL if the address is in a leaf function (no .pdata entry)
// or if the address is not within any loaded module
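The range lookup described above can be sketched as a plain binary search. This is a portable illustration, not the ntdll implementation: `RuntimeFunction` is a stand-in for the Windows struct, `find_function_entry` is a hypothetical helper, and the caller is assumed to have already converted the address to an RVA (ControlPc minus the image base).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for RUNTIME_FUNCTION: three 32-bit RVAs. */
typedef struct {
    uint32_t BeginAddress;
    uint32_t EndAddress;   /* exclusive */
    uint32_t UnwindData;
} RuntimeFunction;

/* Hypothetical helper: search a table sorted by BeginAddress for the entry
   whose [BeginAddress, EndAddress) range contains `rva`.
   Returns NULL when no entry covers the address (for example a leaf function). */
const RuntimeFunction *find_function_entry(const RuntimeFunction *table,
                                           size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;

    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;                 /* address lies before this function */
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;             /* address lies after this function  */
        else
            return &table[mid];       /* BeginAddress <= rva < EndAddress  */
    }
    return NULL;
}
```

Note the exclusive end: an RVA equal to EndAddress belongs to whatever follows the function, not to the function itself.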
Important: What Happens When RtlLookupFunctionEntry Returns NULL
If a return address doesn't match any RUNTIME_FUNCTION (either because it's in a leaf function or in unbacked memory), the unwinder assumes the address at [RSP] is the return address and pops it. This is the leaf function convention. For Draugr, this is relevant because synthetic frames must ensure that when the unwinder looks up return addresses, it finds valid RUNTIME_FUNCTION entries for BaseThreadInitThunk and RtlUserThreadStart.
3. UNWIND_INFO and UNWIND_CODEs
The UNWIND_INFO structure describes what a function's prolog does. The prolog is the sequence of instructions at the start of a function that sets up the stack frame — saving registers, allocating local variable space, establishing a frame pointer.
typedef struct _UNWIND_INFO {
    UBYTE Version : 3;         // 1 in most images; 2 adds epilog codes (UWOP_EPILOG). Chaining is signaled by Flags, not Version
    UBYTE Flags : 5;           // UNW_FLAG_EHANDLER, UNW_FLAG_UHANDLER, UNW_FLAG_CHAININFO
    UBYTE SizeOfProlog;        // Size of the function prolog in bytes
    UBYTE CountOfCodes;        // Number of UNWIND_CODE slots (including extra slots)
    UBYTE FrameRegister : 4;   // Frame register (0 = no frame register)
    UBYTE FrameOffset : 4;     // Offset of frame register from RSP, scaled by 16
    UNWIND_CODE UnwindCode[1]; // Variable-length array of unwind operations
    // Followed by optional exception handler data or chained RUNTIME_FUNCTION
} UNWIND_INFO, *PUNWIND_INFO;
The UnwindCode array contains one or more UNWIND_CODE entries, each describing a single prolog operation. The codes are stored in reverse order (last prolog instruction first) so the unwinder can process them in the order needed to reverse the prolog.
typedef union _UNWIND_CODE {
    struct {
        UBYTE CodeOffset;   // Offset in prolog where this operation occurs
        UBYTE UnwindOp : 4; // Operation type (UWOP_*)
        UBYTE OpInfo : 4;   // Operation-specific data
    };
    USHORT FrameOffset;     // Used as a 16-bit value for some operations
} UNWIND_CODE, *PUNWIND_CODE;
UNWIND_CODE Operations
Each unwind code type describes a specific prolog instruction and its effect on the stack:
UWOP Types and Stack Effects
| UWOP Code | Value | Prolog Instruction | Stack Effect | Extra Slots |
|---|---|---|---|---|
| UWOP_PUSH_NONVOL | 0 | push rbx / push rdi / etc. | +8 bytes (one QWORD) | 0 |
| UWOP_ALLOC_LARGE | 1 | sub rsp, N (large) | +N bytes | 1 or 2 |
| UWOP_ALLOC_SMALL | 2 | sub rsp, N (8 to 128) | +(OpInfo*8 + 8) bytes | 0 |
| UWOP_SET_FPREG | 3 | lea rbp, [rsp+N] | No stack size change | 0 |
| UWOP_SAVE_NONVOL | 4 | mov [rsp+N], rbx | No stack size change (saves to existing space) | 1 |
| UWOP_SAVE_NONVOL_FAR | 5 | mov [rsp+N], rbx (large offset) | No stack size change | 2 |
| UWOP_SAVE_XMM128 | 8 | movaps [rsp+N], xmm6 (non-volatile XMM6 to XMM15) | No stack size change | 1 |
| UWOP_SAVE_XMM128_FAR | 9 | movaps [rsp+N], xmm6 (large offset) | No stack size change | 2 |
| UWOP_PUSH_MACHFRAME | 10 | Hardware interrupt frame | +40 bytes (OpInfo=0) or +48 bytes (OpInfo=1) | 0 |
The "Extra Slots" Column
Some UWOP codes consume additional UNWIND_CODE slots. For example, UWOP_ALLOC_LARGE with OpInfo=0 uses one extra slot (16-bit value: N/8 allocation), and with OpInfo=1 uses two extra slots (full 32-bit value). UWOP_SAVE_NONVOL uses one extra slot for the scaled offset. When iterating UNWIND_CODEs, Draugr must skip these extra slots correctly or the frame size calculation will be wrong.
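The slot-skipping rule can be captured in one small function. This is a sketch based on the table above, not Draugr's code: `unwind_code_slots` is a hypothetical helper that returns how many UNWIND_CODE slots an operation occupies in total (the code itself plus its extra slots), so an iterator can advance by that amount.

```c
#include <assert.h>
#include <stdint.h>

enum {
    UWOP_PUSH_NONVOL     = 0,
    UWOP_ALLOC_LARGE     = 1,
    UWOP_ALLOC_SMALL     = 2,
    UWOP_SET_FPREG       = 3,
    UWOP_SAVE_NONVOL     = 4,
    UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128     = 8,
    UWOP_SAVE_XMM128_FAR = 9,
    UWOP_PUSH_MACHFRAME  = 10
};

/* Total slots occupied by one operation (1 + extra slots).
   Returns 0 for operations this sketch does not recognize. */
unsigned unwind_code_slots(uint8_t unwind_op, uint8_t op_info)
{
    assert(op_info <= 0x0F);            /* OpInfo is a 4-bit field */
    switch (unwind_op) {
    case UWOP_PUSH_NONVOL:
    case UWOP_ALLOC_SMALL:
    case UWOP_SET_FPREG:
    case UWOP_PUSH_MACHFRAME:
        return 1;
    case UWOP_ALLOC_LARGE:
        if (op_info == 0) return 2;     /* one extra slot: size / 8      */
        if (op_info == 1) return 3;     /* two extra slots: 32-bit size  */
        return 0;
    case UWOP_SAVE_NONVOL:
    case UWOP_SAVE_XMM128:
        return 2;                       /* one extra slot: scaled offset */
    case UWOP_SAVE_NONVOL_FAR:
    case UWOP_SAVE_XMM128_FAR:
        return 3;                       /* two extra slots: 32-bit offset */
    default:
        return 0;
    }
}
```

Treating a return value of 0 as a hard stop is a deliberate choice here: guessing a slot count for an unknown operation would desynchronize every code that follows it.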
Concrete Example
Consider a function with this prolog:
; Function prolog:
push rbx ; Save non-volatile register [UWOP_PUSH_NONVOL, reg=rbx]
push rdi ; Save non-volatile register [UWOP_PUSH_NONVOL, reg=rdi]
sub rsp, 0x28 ; Allocate 40 bytes locals [UWOP_ALLOC_SMALL, OpInfo=4]
; (OpInfo * 8 + 8 = 4*8+8 = 40 = 0x28)
; Total stack consumed = 8 (push rbx) + 8 (push rdi) + 40 (sub rsp) = 56 bytes
; Plus the 8-byte return address pushed by CALL = 64 bytes total frame
The UNWIND_INFO for this function would contain 3 UNWIND_CODEs (in reverse order):
UNWIND_CODEs (reverse order)
[0] UWOP_ALLOC_SMALL OpInfo=4 (sub rsp, 0x28) +40 bytes
[1] UWOP_PUSH_NONVOL OpInfo=7 (push rdi, reg=RDI) +8 bytes
[2] UWOP_PUSH_NONVOL OpInfo=3 (push rbx, reg=RBX) +8 bytes
Total prolog stack consumption: 40 + 8 + 8 = 56 bytes
Frame size (including return addr): 56 + 8 = 64 bytes
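The arithmetic above can be checked mechanically. The following is a minimal sketch of a frame-size calculator over a raw UnwindCode array; it is not Draugr's actual routine, and `prolog_stack_bytes` is a hypothetical name. Each slot is two bytes: CodeOffset, then UnwindOp in the low nibble and OpInfo in the high nibble.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the stack bytes a prolog allocates. The result excludes the 8-byte
   return address. Returns -1 for malformed or unsupported codes. */
int64_t prolog_stack_bytes(const uint8_t *codes, size_t count)
{
    int64_t total = 0;
    size_t i = 0;

    assert(codes != NULL || count == 0);
    while (i < count) {
        const uint8_t *slot = codes + 2 * i;
        uint8_t op   = slot[1] & 0x0F;
        uint8_t info = slot[1] >> 4;

        switch (op) {
        case 0:                                  /* UWOP_PUSH_NONVOL */
            total += 8; i += 1; break;
        case 1:                                  /* UWOP_ALLOC_LARGE */
            if (info == 0 && i + 1 < count) {    /* next slot holds size / 8 */
                total += 8 * (int64_t)(slot[2] | (slot[3] << 8));
                i += 2;
            } else if (info == 1 && i + 2 < count) { /* next two slots: 32-bit size */
                total += (int64_t)((uint32_t)slot[2] | (uint32_t)slot[3] << 8 |
                                   (uint32_t)slot[4] << 16 | (uint32_t)slot[5] << 24);
                i += 3;
            } else {
                return -1;
            }
            break;
        case 2:                                  /* UWOP_ALLOC_SMALL */
            total += info * 8 + 8; i += 1; break;
        case 3:                                  /* UWOP_SET_FPREG: no size change */
            i += 1; break;
        case 4: case 8:                          /* SAVE_NONVOL / SAVE_XMM128 */
            i += 2; break;
        case 5: case 9:                          /* the _FAR variants */
            i += 3; break;
        case 10:                                 /* UWOP_PUSH_MACHFRAME */
            total += info ? 48 : 40; i += 1; break;
        default:
            return -1;
        }
    }
    return total;
}
```

Encoding the example prolog as the bytes `06 42, 02 70, 01 30` (the CodeOffset values 6, 2, 1 are illustrative and ignored by this function) yields 56; adding 8 for the return address gives the 64-byte frame.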
4. RtlVirtualUnwind
RtlVirtualUnwind is the core Windows API that performs a single step of stack unwinding. Given the current PC (program counter/RIP) and a RUNTIME_FUNCTION, it reverses the prolog's effects to compute the caller's register state:
PEXCEPTION_ROUTINE RtlVirtualUnwind(
    ULONG HandlerType,                // UNW_FLAG_NHANDLER usually
    DWORD64 ImageBase,                // Module base address
    DWORD64 ControlPc,                // Current RIP
    PRUNTIME_FUNCTION FunctionEntry,  // From RtlLookupFunctionEntry
    PCONTEXT ContextRecord,           // In/out: register state (holds the caller's RIP and RSP on return)
    PVOID *HandlerData,               // Output: exception handler data
    PDWORD64 EstablisherFrame,        // Output: frame base of the function being unwound
    PKNONVOLATILE_CONTEXT_POINTERS ContextPointers // Optional output: saved register locations
);
RtlVirtualUnwind: Step-by-Step
RIP + RUNTIME_FUNCTION -> UNWIND_INFO (from UnwindData RVA) -> process each UNWIND_CODE (restore RSP + regs) -> caller's RIP + RSP
For each UNWIND_CODE, RtlVirtualUnwind reverses the corresponding prolog operation:
Unwind Reversal Logic
| UWOP Code | Prolog Effect | Unwind Reversal |
|---|---|---|
| UWOP_PUSH_NONVOL | RSP decreased by 8 | Read saved reg from [RSP], RSP += 8 |
| UWOP_ALLOC_SMALL | RSP decreased by N | RSP += (OpInfo * 8 + 8) |
| UWOP_ALLOC_LARGE | RSP decreased by N | RSP += N (from extra slot data) |
| UWOP_SET_FPREG | Frame register set | RSP = FrameReg - FrameOffset * 16 |
| UWOP_SAVE_NONVOL | Register saved to stack | Read saved reg from [RSP + offset * 8] |
After processing all codes, RSP points at the saved return address. Reading [RSP] gives the caller's RIP. The unwinder then increments RSP by 8 (to account for the return address) to get the caller's RSP value.
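To make the reversal concrete, here is a toy unwinder that runs against a simulated stack. It is a sketch under heavy simplifications, not a model of the real RtlVirtualUnwind: it handles only UWOP_PUSH_NONVOL and UWOP_ALLOC_SMALL, and it assumes execution is already past the prolog so every code applies (the real routine also handles partially executed prologs and epilogs). `SimContext` and `sim_unwind` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t rsp;      /* byte offset into the simulated stack          */
    uint64_t rip;      /* filled with the caller's return address       */
    uint64_t reg[16];  /* x64 register numbers: 3 = RBX, 7 = RDI, ...   */
} SimContext;

/* Reverse a prolog described by `codes` (2 bytes per slot).
   Returns 0 on success, -1 on an unsupported code or out-of-range read. */
int sim_unwind(SimContext *ctx, const uint8_t *codes, size_t count,
               const uint64_t *stack, size_t stack_qwords)
{
    assert(ctx != NULL && codes != NULL && stack != NULL);

    for (size_t i = 0; i < count; i++) {
        uint8_t op   = codes[2 * i + 1] & 0x0F;
        uint8_t info = codes[2 * i + 1] >> 4;

        if (op == 0) {                       /* UWOP_PUSH_NONVOL: pop saved reg */
            if (ctx->rsp / 8 >= stack_qwords) return -1;
            ctx->reg[info] = stack[ctx->rsp / 8];
            ctx->rsp += 8;
        } else if (op == 2) {                /* UWOP_ALLOC_SMALL: free locals */
            ctx->rsp += (uint64_t)info * 8 + 8;
        } else {
            return -1;                       /* not covered by this sketch */
        }
    }

    /* RSP now points at the return address: read it, then pop it. */
    if (ctx->rsp / 8 >= stack_qwords) return -1;
    ctx->rip = stack[ctx->rsp / 8];
    ctx->rsp += 8;
    return 0;
}
```

Run against the example prolog (five QWORDs of locals, then saved RDI, saved RBX, and the return address), the function restores both registers and leaves `rsp` at 64, matching the frame size computed earlier.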
5. Chained Unwind Info
Some functions have complex prologs that can't be described by a single UNWIND_INFO. The UNW_FLAG_CHAININFO flag links one RUNTIME_FUNCTION to another, creating a chain of unwind data:
// In the UNWIND_INFO for a function with chained info:
// Flags field contains UNW_FLAG_CHAININFO (0x04)
// After the UnwindCode array (padded to even count), there's another RUNTIME_FUNCTION
// Draugr handles this in DraugrCalculateStackSize:
// 1. Parse UNWIND_CODEs from the current UNWIND_INFO
// 2. Check if UNW_FLAG_CHAININFO is set
// 3. If yes, follow the chained RUNTIME_FUNCTION and repeat
// 4. Sum all stack sizes from the entire chain
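Locating the chained entry is mostly an offset calculation, sketched below over raw bytes. This is an illustration, not Draugr's DraugrCalculateStackSize: `unwind_trailer_offset` and `read_chained_entry` are hypothetical helpers, and the code assumes little-endian data as found in PE images.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNW_FLAG_CHAININFO 0x4

/* Byte offset from the start of UNWIND_INFO to the data that follows the
   UnwindCode array: 4 header bytes plus 2 bytes per slot, with the slot
   count rounded up to an even number. */
size_t unwind_trailer_offset(uint8_t count_of_codes)
{
    return 4 + 2 * (((size_t)count_of_codes + 1) & ~(size_t)1);
}

/* If the UNWIND_INFO at `info` has UNW_FLAG_CHAININFO set, copy the chained
   RUNTIME_FUNCTION into out[0..2] (BeginAddress, EndAddress, UnwindData)
   and return 1. Otherwise return 0 and leave `out` untouched. */
int read_chained_entry(const uint8_t *info, uint32_t out[3])
{
    assert(info != NULL && out != NULL);

    uint8_t flags = info[0] >> 3;            /* high 5 bits of byte 0 */
    if ((flags & UNW_FLAG_CHAININFO) == 0)
        return 0;

    const uint8_t *p = info + unwind_trailer_offset(info[2]);
    for (int k = 0; k < 3; k++) {            /* three little-endian DWORDs */
        out[k] = (uint32_t)p[4 * k]
               | (uint32_t)p[4 * k + 1] << 8
               | (uint32_t)p[4 * k + 2] << 16
               | (uint32_t)p[4 * k + 3] << 24;
    }
    return 1;
}
```

A caller would take `out[2]` (the UnwindData RVA), resolve it against the image base, parse that UNWIND_INFO in turn, and repeat until the flag is clear. Capping the number of hops is a sensible guard against malformed or cyclic data.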
When Chaining Occurs
Chained unwind info is used when:
- A function has a hot/cold code split (common in optimized builds) where parts of the function are moved to different locations
- The compiler generates a multi-part prolog that exceeds the 255-code limit of a single UNWIND_INFO
- The function uses dynamic stack allocation (alloca) with complex frame structures
For BaseThreadInitThunk and RtlUserThreadStart, chained info is rare but must be handled correctly. If Draugr encounters a chain, it recursively follows it to accumulate the total stack frame size.
6. Why Draugr Must Parse UNWIND_CODEs
The Critical Requirement: Exact Frame Sizes
To construct synthetic frames for BaseThreadInitThunk and RtlUserThreadStart, Draugr must know the exact stack frame size of each function. This means parsing every UNWIND_CODE for each function and summing the stack contributions. Here's why precision is non-negotiable:
When the stack walker processes a synthetic frame, it performs these steps:
- Read a return address from the stack (this is the address Draugr placed)
- Call RtlLookupFunctionEntry to find the RUNTIME_FUNCTION for that address
- Read the UNWIND_INFO and compute the frame size from UNWIND_CODEs
- Add the frame size to the current RSP to locate the next return address
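The four steps above can be sketched as a loop over a simulated stack. Everything here is hypothetical scaffolding for illustration: `FuncInfo` stands in for the combination of RtlLookupFunctionEntry and UNWIND_CODE parsing, the addresses are made up, and an address with no entry simply ends the walk (the real unwinder would apply the leaf convention instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup record: an address range and the full frame size
   (prolog allocation plus the 8-byte return address) of that function. */
typedef struct {
    uint64_t begin, end;     /* [begin, end) */
    uint64_t frame_size;
} FuncInfo;

static const FuncInfo *lookup(const FuncInfo *funcs, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= funcs[i].begin && addr < funcs[i].end)
            return &funcs[i];
    return NULL;
}

/* Walk the simulated stack, recording each return address that resolves to
   a known function. Stops at a zero return address, an unknown address, or
   an offset that is misaligned or outside the stack. */
size_t walk_stack(const uint64_t *stack, size_t qwords,
                  const FuncInfo *funcs, size_t nfuncs,
                  uint64_t *out, size_t max)
{
    size_t off = 0, n = 0;                   /* off = byte offset from RSP */

    assert(stack != NULL && funcs != NULL && out != NULL);
    while (n < max && off % 8 == 0 && off / 8 < qwords) {
        uint64_t ret = stack[off / 8];       /* step 1: read return address */
        if (ret == 0)
            break;                           /* terminator */
        const FuncInfo *f = lookup(funcs, nfuncs, ret);  /* steps 2 and 3 */
        if (f == NULL)
            break;
        out[n++] = ret;
        off += f->frame_size;                /* step 4: locate next address */
    }
    return n;
}
```

With frame sizes 0x40 and 0x30 and return addresses planted at byte offsets 0 and 0x40, the walk recovers both frames. Change the first size to 0x48 and the walker reads filler data at offset 0x48, fails the lookup, and the chain ends after one frame.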
Why Frame Size Must Be Exact
[Comparison panels: "Correct Frame Size" vs. "Off By Even 1 Byte"]
Stack Layout (Draugr's Synthetic Frames)
; After Draugr builds the synthetic stack:
;
; [RSP + 0] = return addr into syscall stub (ntdll)
; [RSP + frameSize_BaseThread] = addr inside BaseThreadInitThunk
; [RSP + frameSize_BaseThread
; + frameSize_RtlUser] = addr inside RtlUserThreadStart
; [RSP + ... + 8] = 0x0 (stack walk terminator)
;
; If frameSize_BaseThread is wrong, the unwinder looks at the wrong
; offset for the RtlUserThreadStart return address, and the entire
; spoof collapses.
Version Sensitivity
The frame sizes for BaseThreadInitThunk and RtlUserThreadStart can change between Windows versions because Microsoft may modify their prologs. This is why Draugr parses the UNWIND_CODEs dynamically at runtime rather than hardcoding frame sizes. By reading the actual .pdata metadata from kernel32.dll and ntdll.dll, Draugr automatically adapts to whatever Windows version is running.
Module 3 Quiz: x64 Stack Unwinding
Q1: Why doesn't x64 Windows use RBP frame pointer chains for stack walking like x86 did?
Q2: What does UWOP_ALLOC_SMALL with OpInfo=4 encode?
sub rsp, 0x28. This is the standard 32-byte shadow space (0x20) plus 8 bytes of alignment, commonly seen in functions that call other functions.
Q3: Why does Draugr parse UNWIND_CODEs at runtime instead of hardcoding frame sizes?