Module 7: Gadgets, Fixup & Clean Return
After the syscall, a two-byte gadget in kernelbase.dll redirects execution to a cleanup routine that restores the original world.
Module Overview
The synthetic stack exists only while the syscall is in flight. Once the kernel returns to user mode, Draugr must dismantle the synthetic stack, restore all registers, and return to the original caller as if nothing happened. This module covers the three components that make this possible: the JMP [RBX] gadget, the Fixup assembly routine, and how the return value flows back seamlessly.
The JMP [RBX] Gadget
DraugrFindGadget scans the .text section of kernelbase.dll looking for the byte sequence 0xFF 0x23. This two-byte sequence encodes the instruction jmp qword ptr [rbx].
C - DraugrFindGadget
    #include <windows.h>
    #include <string.h>

    ULONG_PTR DraugrFindGadget(ULONG_PTR kernelbaseBase) {
        // Parse the PE headers to find the .text section
        PIMAGE_DOS_HEADER pDos = (PIMAGE_DOS_HEADER)kernelbaseBase;
        PIMAGE_NT_HEADERS pNt = (PIMAGE_NT_HEADERS)(kernelbaseBase + pDos->e_lfanew);
        PIMAGE_SECTION_HEADER pSec = IMAGE_FIRST_SECTION(pNt);
        // Find .text section
        for (WORD i = 0; i < pNt->FileHeader.NumberOfSections; i++) {
            if (memcmp(pSec[i].Name, ".text", 5) == 0) {
                ULONG_PTR start = kernelbaseBase + pSec[i].VirtualAddress;
                DWORD size = pSec[i].Misc.VirtualSize;
                // Scan for 0xFF 0x23 (jmp qword ptr [rbx]).
                // "j + 1 < size" avoids the unsigned wraparound that
                // "size - 1" would cause if size were 0.
                for (DWORD j = 0; j + 1 < size; j++) {
                    if (*(BYTE*)(start + j) == 0xFF &&
                        *(BYTE*)(start + j + 1) == 0x23)
                    {
                        return start + j; // Found the gadget
                    }
                }
            }
        }
        return 0; // Gadget not found
    }
Why This Gadget Is in kernelbase.dll
kernelbase.dll is a large, Microsoft-signed system DLL that is loaded into every process. Its .text section is hundreds of kilobytes of executable code. Within that much code, the byte sequence 0xFF 0x23 is virtually guaranteed to appear — possibly as an intentional indirect jump, or as a coincidental byte alignment within a longer instruction. Because the gadget resides in a signed, legitimate DLL, any EDR that checks return addresses against known modules will see a valid code address in kernelbase.dll.
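The "coincidental byte alignment" case can be illustrated on a plain byte buffer. The sketch below is not Draugr's code: `find_byte_pair` is a hypothetical helper, and the sample buffer is a hand-assembled `mov eax, 0x000023FF` (bytes B8 FF 23 00 00), whose immediate operand happens to contain 0xFF 0x23.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the offset of the first occurrence of the
   byte pair (b0, b1) in buf, or (size_t)-1 if the pair does not occur. */
size_t find_byte_pair(const uint8_t *buf, size_t len, uint8_t b0, uint8_t b1)
{
    assert(buf != NULL || len == 0);
    /* i + 1 < len keeps the second read in bounds, even when len is 0. */
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == b0 && buf[i + 1] == b1) {
            return i;
        }
    }
    return (size_t)-1;
}

/* mov eax, 0x000023FF: opcode B8 followed by the little-endian immediate.
   The pair FF 23 sits inside the immediate, not at an instruction start. */
static const uint8_t mov_eax_imm[] = { 0xB8, 0xFF, 0x23, 0x00, 0x00 };
```

Scanning `mov_eax_imm` for 0xFF 0x23 reports offset 1, which is the middle of the `mov` instruction. Decoding from that offset still yields `jmp qword ptr [rbx]`, which is why a raw byte scan does not need to know where the original instructions begin.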
Why JMP [RBX]?
The choice of JMP [RBX] is not arbitrary. It depends on a fundamental property of the x64 calling convention and the behavior of the syscall instruction.
RBX Is Callee-Saved (Non-Volatile)
In the Windows x64 ABI, RBX is a non-volatile register: any function must preserve it across a call. The kernel's system-call path likewise returns to user mode with the non-volatile registers intact. This means that after the syscall executes and returns to user mode, RBX still points to the PRM structure, exactly where Draugr set it before the syscall.
The Dereference Chain
| Step | State | Value |
|---|---|---|
| Before syscall | RBX = &PRM | Set by Spoof routine: mov rbx, rcx |
| During syscall | RBX = &PRM | Preserved (non-volatile) |
| After RET | RIP = gadget | Popped from RSP (top of synthetic stack) |
| Gadget executes | JMP [RBX] | Reads [&PRM + 0x00] = Fixup address |
| Result | RIP = Fixup | Execution enters the cleanup routine |
Alternative Gadgets
Other callee-saved registers (RDI, RSI, R12–R15) could theoretically be used for similar gadgets (e.g., JMP [RDI], JMP [R12]). JMP [RDI] (0xFF 0x27) and JMP [RSI] (0xFF 0x26) are also two-byte encodings, but the R12–R15 forms need a REX prefix and run three or four bytes (for example, JMP [R12] is 0x41 0xFF 0x24 0x24), so they occur less often and are harder to find reliably across Windows versions. Draugr uses JMP [RBX] (0xFF 0x23); RBX gadgets are nearly universal in large DLLs.
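The encoding lengths can be checked with a small encoder for `jmp qword ptr [reg]` (opcode FF /4). This is a sketch for illustration only: `encode_jmp_mem_reg` is a hypothetical name, and register numbers follow the standard x86-64 numbering (rax=0, rcx=1, rdx=2, rbx=3, rsp=4, rbp=5, rsi=6, rdi=7, r8–r15 = 8–15).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes `jmp qword ptr [reg]` into out and returns the byte count (2-4). */
size_t encode_jmp_mem_reg(unsigned reg, uint8_t out[4])
{
    size_t n = 0;
    unsigned rm = reg & 7;          /* low three bits go into ModRM.rm */

    assert(reg < 16);
    if (reg >= 8) {
        out[n++] = 0x41;            /* REX.B extends rm to r8-r15 */
    }
    out[n++] = 0xFF;                /* opcode; ModRM.reg = 4 selects JMP */
    if (rm == 4) {
        /* rsp/r12: rm=100 means "SIB byte follows" */
        out[n++] = 0x24;            /* mod=00 reg=100 rm=100 */
        out[n++] = 0x24;            /* SIB: no index, base=100 */
    } else if (rm == 5) {
        /* rbp/r13: mod=00 rm=101 means RIP-relative, so use disp8 = 0 */
        out[n++] = 0x65;            /* mod=01 reg=100 rm=101 */
        out[n++] = 0x00;
    } else {
        out[n++] = (uint8_t)(0x20 | rm);   /* mod=00 reg=100 rm */
    }
    return n;
}
```

For RBX (register 3) this yields the two bytes FF 23; RSI and RDI give FF 26 and FF 27; R14 gives the three bytes 41 FF 26; and R12 gives the four bytes 41 FF 24 24.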
The Syscall Return Flow
Here is the complete step-by-step sequence from the moment the syscall returns from kernel mode to the moment execution reaches the Fixup routine.
Return Path: Kernel to Fixup
sysret to user mode → pop RIP from RSP → jmp [rbx] → read PRM.Fixup → cleanup & restore
Detailed Sequence
- Kernel completes: The syscall handler finishes (e.g., NtAllocateVirtualMemory allocates memory). The return value is in RAX. The CPU transitions back to user mode via sysret.
- Execution resumes at the ret in ntdll: The syscall instruction in the ntdll stub is followed by ret. The ret instruction pops the 8-byte value at the top of the stack ([RSP]) into RIP.
- RIP = gadget address: The top of the synthetic stack held the JMP [RBX] gadget address. RSP is now 8 bytes higher and points into the BaseThreadInitThunk synthetic frame.
- Gadget executes: The CPU is now executing the jmp qword ptr [rbx] instruction inside kernelbase.dll. It reads the 8-byte value at the address stored in RBX.
- RBX dereference: RBX points to &PRM. Offset 0x00 of the PRM structure is the Fixup address. The CPU jumps there.
- Fixup begins: Execution is now in the Fixup routine with RAX still holding the syscall return value and RBX still pointing to the PRM.
The Fixup Routine
The Fixup routine has three jobs: deallocate the entire synthetic stack, restore all non-volatile registers from the PRM, and jump back to the original caller.
ASM - Fixup routine (from Stub.s)
    Fixup:
        ; ---- Deallocate the entire synthetic stack ----
        ; RSP currently points into the BaseThreadInitThunk frame;
        ; the 8-byte gadget slot was already popped by RET.
        ; Still to unwind:
        ;   BaseThreadInitThunk_Size + RtlUserThreadStart_Size
        ;   + 0x200 + 8 (NULL)
        ; Plus any stack arguments that were copied
        ; R12/R13 serve as scratch here; their saved values are reloaded below
        mov r12, [rbx + 0x40]          ; BaseThreadInitThunk frame size
        mov r13, [rbx + 0x48]          ; RtlUserThreadStart frame size
        add rsp, r12                   ; Skip BaseThreadInitThunk frame
        add rsp, r13                   ; Skip RtlUserThreadStart frame
        add rsp, 0x200                 ; Skip reserved buffer
        add rsp, 8                     ; Skip NULL terminator
        ; ---- Restore non-volatile registers from PRM ----
        mov rdi, [rbx + 0x10]          ; Restore RDI
        mov rsi, [rbx + 0x18]          ; Restore RSI
        mov r12, [rbx + 0x20]          ; Restore R12
        mov r13, [rbx + 0x28]          ; Restore R13
        mov r14, [rbx + 0x30]          ; Restore R14
        mov r15, [rbx + 0x38]          ; Restore R15
        ; ---- Return to the original caller ----
        ; RAX still holds the syscall return value (NTSTATUS)
        ; PRM.OG_retaddr holds the return address saved at Spoof entry
        jmp qword ptr [rbx + 0x08]     ; Jump to original return address
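The offsets that Fixup reads imply a PRM layout along the following lines. This is a model reconstructed from those offsets, not the structure from Draugr's headers: the type name and every field name are assumptions, and the real structure may carry more fields. `jmp_rbx_target` mimics what the gadget does, namely reading the 8-byte value at the address held in RBX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the PRM; only the offsets come from the text. */
typedef struct prm_model {
    uint64_t fixup;            /* 0x00: target of jmp [rbx] */
    uint64_t og_retaddr;       /* 0x08: target of jmp [rbx + 0x08] */
    uint64_t saved_rdi;        /* 0x10 */
    uint64_t saved_rsi;        /* 0x18 */
    uint64_t saved_r12;        /* 0x20 */
    uint64_t saved_r13;        /* 0x28 */
    uint64_t saved_r14;        /* 0x30 */
    uint64_t saved_r15;        /* 0x38 */
    uint64_t btit_frame_size;  /* 0x40: BaseThreadInitThunk frame size */
    uint64_t ruts_frame_size;  /* 0x48: RtlUserThreadStart frame size */
} prm_model;

/* Compile-time checks that the model matches the offsets used by Fixup. */
static_assert(offsetof(prm_model, og_retaddr) == 0x08, "OG_retaddr offset");
static_assert(offsetof(prm_model, saved_r15) == 0x38, "R15 offset");
static_assert(offsetof(prm_model, ruts_frame_size) == 0x48, "frame size offset");

/* Models jmp qword ptr [rbx]: returns the 8 bytes stored at address rbx. */
uint64_t jmp_rbx_target(const void *rbx)
{
    uint64_t target;
    assert(rbx != NULL);
    memcpy(&target, rbx, sizeof target);
    return target;
}
```

Because the Fixup address is the first field, a pointer to the structure is also a pointer to the jump target, which is what lets a plain `jmp [rbx]` with no displacement reach Fixup.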
Why JMP Instead of RET?
The Fixup routine uses jmp [rbx + 0x08] instead of ret to return to the caller. After deallocating the synthetic stack, RSP is restored to its original position — but the original return address was saved in the PRM structure (at OG_retaddr), not on the stack. Using JMP to the saved address avoids any dependency on the stack state and ensures the return is clean regardless of how many stack manipulations occurred.
State After Fixup Completes
| Component | State |
|---|---|
| RAX | Syscall return value (NTSTATUS) — untouched throughout |
| RSP | Restored to pre-Spoof position |
| RDI, RSI, R12–R15 | Restored from PRM saved values |
| RBX | Points to PRM (will be restored by the C compiler's epilogue) |
| RIP | Back at the original call site (the instruction after the DRAUGR_SYSCALL macro) |
| Synthetic stack | Fully deallocated — no trace remains |
Fixup Must Be Exact
The add rsp instructions in Fixup must deallocate exactly the number of bytes that Spoof allocated. On x64 Windows, the stack must be 16-byte aligned before any CALL instruction, so an error that is not a multiple of 16 leaves RSP misaligned. A misaligned RSP causes MOVAPS instructions (used by the C runtime and many Windows APIs) to raise a general-protection fault, which Windows reports as an access violation. An error that happens to preserve alignment is no safer: the caller then reads its locals and return address from the wrong stack slots. This is one of the most common bugs when implementing stack spoofing, and one of the hardest to debug because the crash occurs far from the actual mistake.
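The bookkeeping can be checked with plain integer arithmetic. The sketch below models RSP as a number and walks it through allocation, the stub's RET, and Fixup. It assumes no stack arguments were copied, and all function names and the frame sizes used in the usage note are hypothetical placeholders, not values taken from Draugr.

```c
#include <assert.h>
#include <stdint.h>

#define RESERVED_BUFFER 0x200u  /* reserved buffer from the Fixup listing */
#define SLOT            8u      /* one 8-byte stack slot */

/* Bytes the model allocates: NULL slot, buffer, both frames, gadget slot. */
uint64_t spoof_alloc_bytes(uint64_t btit_size, uint64_t ruts_size)
{
    return SLOT + RESERVED_BUFFER + ruts_size + btit_size + SLOT;
}

/* Bytes Fixup adds back: everything except the gadget slot popped by RET. */
uint64_t fixup_dealloc_bytes(uint64_t btit_size, uint64_t ruts_size)
{
    return btit_size + ruts_size + RESERVED_BUFFER + SLOT;
}

/* Returns RSP after Fixup; fixup_error injects a deliberate miscount. */
uint64_t rsp_after_fixup(uint64_t rsp_start, uint64_t btit_size,
                         uint64_t ruts_size, int64_t fixup_error)
{
    uint64_t alloc = spoof_alloc_bytes(btit_size, ruts_size);
    assert(rsp_start >= alloc);
    uint64_t rsp = rsp_start - alloc;     /* synthetic stack built */
    rsp += SLOT;                          /* RET pops the gadget address */
    rsp += fixup_dealloc_bytes(btit_size, ruts_size) + (uint64_t)fixup_error;
    return rsp;
}

int is_16_byte_aligned(uint64_t rsp)
{
    return (rsp & 0xF) == 0;
}
```

With a starting RSP of 0x7000 and placeholder frame sizes 0x30 and 0x50, a correct Fixup lands back on 0x7000. Passing `fixup_error = -8` (for instance, forgetting the NULL slot) lands on 0x6FF8, which fails the alignment check.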
Why This Return Mechanism Is Elegant
No Exception Handlers Needed
Unlike WithSecure's approach (which uses a Vectored Exception Handler to catch a crash at the fake return address), Draugr never crashes. The gadget redirects execution cleanly to the Fixup routine. No VEH registration, no exception dispatch overhead, no risk of interfering with the application's own exception handlers.
Native Execution Speed
The return path has three control transfers: RET (pops the gadget address), JMP [RBX] (to Fixup), and Fixup's final JMP to the saved return address, with a handful of ADD and MOV instructions in between. There are no timer callbacks, no thread pool work items, no context switches. The entire return happens at full CPU speed with zero overhead beyond the instructions themselves.
Return Value Preserved in RAX
Because the gadget and Fixup routine never touch RAX, the syscall's return value (typically an NTSTATUS code) flows back to the caller untouched. The calling C code can immediately check the return value: if (NT_SUCCESS(status)) { ... }. Other approaches (like timer-based restoration) struggle to return values because the syscall happens in a different execution context.
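For readers unfamiliar with the check, the sketch below shows what `NT_SUCCESS` tests. It is a portable model rather than the Windows headers: the `_MODEL` names are local stand-ins, though the status values and the "non-negative means success" rule match the documented NTSTATUS convention, where the top two bits carry the severity.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t NTSTATUS_MODEL;   /* stand-in for the Windows NTSTATUS type */
static_assert(sizeof(NTSTATUS_MODEL) == 4, "NTSTATUS is a 32-bit value");

/* Same test as the Windows NT_SUCCESS macro: signed value >= 0. */
#define NT_SUCCESS_MODEL(s) (((NTSTATUS_MODEL)(s)) >= 0)

#define STATUS_SUCCESS_MODEL        ((NTSTATUS_MODEL)0x00000000)
#define STATUS_ACCESS_DENIED_MODEL  ((NTSTATUS_MODEL)0xC0000022)

/* Severity is the top two bits: 0 success, 1 informational,
   2 warning, 3 error. */
unsigned status_severity(NTSTATUS_MODEL status)
{
    return (unsigned)(((uint32_t)status) >> 30);
}

/* Returns 1 when the value left in RAX would pass NT_SUCCESS. */
int status_is_success(NTSTATUS_MODEL status)
{
    return NT_SUCCESS_MODEL(status) ? 1 : 0;
}
```

Success and informational codes (severity 0 and 1) pass the check; warning and error codes (severity 2 and 3) have the sign bit set and fail it.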
Stack Perfectly Restored
After Fixup, the stack pointer, all callee-saved registers, and the return address are exactly as they were before the Spoof routine was called. The C compiler-generated code around the DRAUGR_SYSCALL macro continues executing without any awareness that the stack was ever manipulated.
Invisible to Stack Walkers During Execution
While the syscall is executing in kernel mode, any stack walker (including EDR kernel callbacks) that inspects the thread's stack sees only the synthetic frames. The real return address is safely hidden inside the PRM structure in user-mode memory, not on the stack at all. The stack contains only legitimate-looking frames pointing into signed Microsoft DLLs. Once Fixup completes, the synthetic stack is fully deallocated and the PRM is just a local variable — no forensic trace of the spoofing remains on the stack.
Contrast: Exception-Based Approaches Leave Traces
Approaches that rely on exception handlers (like WithSecure's VEH-based method) generate observable artifacts. The exception dispatch mechanism writes to the exception record chain, fires ETW events for the exception, and requires a registered VEH callback. Even after recovery, the exception metadata persists in system telemetry. Draugr's gadget-based approach generates no exceptions, no ETW exception events, and requires no handler registration. The entire operation completes as a series of normal memory reads, writes, and jumps.
Comparison: Return Mechanisms
Draugr: Gadget-Based Fixup
- Mechanism: JMP [RBX] gadget → Fixup routine
- Speed: Native, three control transfers on the return path
- Return value: Preserved in RAX, immediately available
- Side effects: None — no exceptions, no callbacks
- Complexity: Requires gadget scanning and PRM structure
WithSecure: VEH Crash Recovery
- Mechanism: Fake return address causes ACCESS_VIOLATION; VEH handler catches it and restores context
- Speed: Slower — exception dispatch is expensive (kernel roundtrip)
- Return value: Difficult to extract (saved in CONTEXT structure)
- Side effects: Generates exceptions visible to debuggers and ETW
- Complexity: Requires VEH registration and context manipulation
Timer-Based Restoration
- Mechanism: Thread pool callback restores the original stack after a sleep period
- Speed: Non-deterministic — depends on timer resolution
- Return value: Cannot retrieve — the syscall and restoration happen in different contexts
- Side effects: Thread pool activity is logged; timing-dependent bugs
- Complexity: Simpler setup but unreliable return flow
ThreadStackSpoofer: Overwrite During Sleep
- Mechanism: Overwrites return addresses on the real stack during Beacon sleep
- Speed: N/A — only active during sleep, not during API calls
- Return value: N/A — does not intercept individual syscalls
- Side effects: Stack is only spoofed while sleeping; real stack visible during execution
- Complexity: Simple but limited scope
Module 7 Quiz: Gadgets, Fixup & Clean Return
Q1: Why does Draugr use a JMP [RBX] gadget found in kernelbase.dll rather than jumping directly to the Fixup routine?
Q2: After the Fixup routine completes, what is the state of the syscall's return value?