Module 4: Stack Desynchronization Theory
Separating what the CPU actually does from what the stack says happened.
The Core Innovation
SilentMoonwalk's key insight is stack desynchronization: the physical stack layout seen by a stack walker does NOT have to reflect the actual execution path. By carefully constructing the stack, you can make the unwinder see call chain A → B → C → D while the actual execution path was completely different. This module explains the theory behind this separation.
Synchronized vs Desynchronized Stacks
In normal execution, the stack is synchronized — the return addresses on the stack directly correspond to the actual call chain. If main() calls foo() which calls bar(), the stack unwinder will see exactly that chain.
A desynchronized stack breaks this correspondence. The execution might follow path X → Y → Z, but the stack is constructed so the unwinder sees A → B → C. The two views are decoupled:
[Figure: Synchronized vs Desynchronized. Two panels: Synchronized (Normal) and Desynchronized (SilentMoonwalk).]
Why Desynchronization Is Possible
Stack desynchronization is possible because of a fundamental property of the x64 unwinding mechanism: the unwinder is stateless. It doesn't track what functions were actually called. It only looks at:
- The current RIP (instruction pointer)
- The current RSP (stack pointer)
- The RUNTIME_FUNCTION / UNWIND_INFO for the function containing RIP
- The values at specific stack offsets (as dictated by the unwind codes)
If you arrange these four things to be internally consistent, the unwinder will happily produce a clean call chain — regardless of whether those functions were ever actually called.
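The statelessness described above can be illustrated with a toy model. This is a minimal sketch, not the real RtlVirtualUnwind: `ToyFunction`, `ToyStack`, `Lookup`, and `Walk` are hypothetical names, and each function's unwind data is reduced to a single frame size. The point is only that the walker consumes an instruction pointer, a stack pointer, a lookup table, and stack contents, and has no record of which calls actually happened.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for RUNTIME_FUNCTION + UNWIND_INFO:
// a code range plus the number of bytes the prologue allocated.
struct ToyFunction {
    std::string name;
    uint64_t begin;      // first address of the function
    uint64_t end;        // one past the last address
    uint64_t frameSize;  // bytes between RSP and the return address
};

// Toy stack memory: maps an address to the 8-byte value stored there.
using ToyStack = std::map<uint64_t, uint64_t>;

// Finds the function whose code range contains rip, or nullptr.
const ToyFunction* Lookup(const std::vector<ToyFunction>& table, uint64_t rip) {
    for (const auto& f : table)
        if (rip >= f.begin && rip < f.end) return &f;
    return nullptr;
}

// Walks the stack using only (rip, rsp), the table, and the stack contents.
std::vector<std::string> Walk(const std::vector<ToyFunction>& table,
                              const ToyStack& stack,
                              uint64_t rip, uint64_t rsp) {
    std::vector<std::string> chain;
    while (const ToyFunction* f = Lookup(table, rip)) {
        chain.push_back(f->name);
        rsp += f->frameSize;        // undo the prologue's allocation
        auto it = stack.find(rsp);  // slot that should hold the return address
        if (it == stack.end()) break;
        rip = it->second;           // "return" into the caller
        rsp += 8;                   // pop the return address
    }
    return chain;
}
```

Given a table of three functions and a stack whose return-address slots sit at the offsets the table predicts, `Walk` reports the chain whether or not those functions ever ran. The walk ends when the recovered instruction pointer falls outside every known function.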
The Unwinder's Trust Model
RtlVirtualUnwind trusts that the stack has not been tampered with. It assumes that if RIP is inside function F at offset X, then F's prologue has already executed, and the stack contains F's saved registers and local variables at the offsets described by F's UNWIND_INFO. SilentMoonwalk exploits this trust by placing values at exactly the right offsets.
The SilentMoonwalk Approach
SilentMoonwalk achieves desynchronization through a multi-step process. At a high level:
Step 1: Select Target Functions
Choose a set of legitimate functions from system DLLs such as ntdll.dll, kernel32.dll, or KERNELBASE.dll whose call chain would be plausible. For example, a sleeping thread might reasonably show:
ntdll!NtWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx+0x8e
kernel32!SleepEx+0x63
kernel32!Sleep+0x9
SomeApp!WorkerThread+0x42
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21
Step 2: Compute Frame Sizes
For each target function, parse its RUNTIME_FUNCTION and UNWIND_INFO to determine the exact frame size: the total stack space allocated by the function's prologue (pushed registers plus the SUB RSP allocation). On the spoofed stack, each frame must occupy exactly this many bytes, followed by 8 bytes for the return address.
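// Computing frame size from UNWIND_INFO
// Returns the stack bytes allocated by the prologue (pushes + allocations),
// i.e. the distance from RSP to the return address.
// Not handled here: chained unwind info (UNW_FLAG_CHAININFO) and
// UWOP_PUSH_MACHFRAME (10), which describes interrupt/exception frames.
DWORD ComputeFrameSize(PUNWIND_INFO pUnwind) {
    DWORD frameSize = 0;
    for (UBYTE i = 0; i < pUnwind->CountOfCodes; i++) {
        UNWIND_CODE code = pUnwind->UnwindCode[i];
        switch (code.UnwindOp) {
        case UWOP_PUSH_NONVOL:      // 0
            frameSize += 8;         // Each push adds 8 bytes
            break;
        case UWOP_ALLOC_LARGE:      // 1
            if (code.OpInfo == 0) {
                // Next slot holds the allocation size divided by 8
                frameSize += pUnwind->UnwindCode[++i].FrameOffset * 8;
            } else {
                // Next two slots hold the unscaled 32-bit allocation size
                DWORD size = *(DWORD*)&pUnwind->UnwindCode[i + 1];
                frameSize += size;
                i += 2;
            }
            break;
        case UWOP_ALLOC_SMALL:      // 2
            frameSize += (code.OpInfo * 8) + 8;
            break;
        case UWOP_SET_FPREG:        // 3
            break;                  // No allocation, no extra slots
        case UWOP_SAVE_NONVOL:      // 4
        case UWOP_SAVE_XMM128:      // 8
            i += 1;                 // Skip the offset slot; no allocation
            break;
        case UWOP_SAVE_NONVOL_FAR:  // 5
        case UWOP_SAVE_XMM128_FAR:  // 9
            i += 2;                 // Skip the two offset slots; no allocation
            break;
        }
    }
    return frameSize; // Total bytes between RSP and return address
}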
Step 3: Construct the Fake Stack
Lay out the spoofed frames contiguously on the stack. Each frame has the exact size computed from the target function's unwind codes. At the top of each frame (from the unwinder's perspective), place the return address pointing into the next target function in the chain.
Synthetic Stack Layout
- NtWaitForSingleObject frame. Size: matches UNWIND_INFO of NtWaitForSingleObject. Return addr at computed offset → points into WaitForSingleObjectEx
- WaitForSingleObjectEx frame. Size: matches UNWIND_INFO of WaitForSingleObjectEx. Return addr → points into SleepEx
- SleepEx frame. Size: matches UNWIND_INFO of SleepEx. Return addr → points into BaseThreadInitThunk
- BaseThreadInitThunk frame. Size: matches UNWIND_INFO. Return addr → points into RtlUserThreadStart
- RtlUserThreadStart frame. Unwinder stops here.
Step 4: Use ROP to Actually Execute
The tricky part: this same stack layout must also function as a ROP chain for actual execution. SilentMoonwalk places gadget addresses at specific positions within each frame so that during execution, RSP advances through the frames via ADD RSP, N; RET gadgets, while from the unwinder's perspective, those same positions look like legitimate frame contents.
The Frame Size Problem
The hardest constraint in stack desynchronization is frame size matching. Consider this scenario:
SleepEx's UNWIND_INFO says:
UWOP_ALLOC_SMALL: 0x28 bytes (OpInfo=4, so (4*8)+8 = 40 = 0x28)
UWOP_PUSH_NONVOL: RBX (adds 8 bytes)
UWOP_PUSH_NONVOL: RSI (adds 8 bytes)
Total frame size: 0x28 + 8 + 8 = 0x38 bytes (56 bytes)
Plus 8 bytes for return address = 0x40 (64 bytes) total between caller RSP and callee RSP
This means in our synthetic stack, the "SleepEx frame" must occupy
exactly 0x40 bytes. Not 0x38, not 0x48. Exactly 0x40.
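The arithmetic above can be checked with a short sketch. It assumes the three unwind codes listed in the example and handles only those two opcodes; `DecodedCode`, `PrologueBytes`, and `SlotBytes` are hypothetical names, and `DecodedCode` is a simplified decoded form rather than the packed Windows UNWIND_CODE layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Opcode values from the x64 unwind code format.
constexpr uint8_t kPushNonvol = 0;  // UWOP_PUSH_NONVOL
constexpr uint8_t kAllocSmall = 2;  // UWOP_ALLOC_SMALL

// Hypothetical decoded form of one unwind code.
struct DecodedCode {
    uint8_t op;      // operation code
    uint8_t opInfo;  // register number or scaled allocation size
};

// Sums the stack bytes implied by push and small-alloc codes only.
uint32_t PrologueBytes(const std::vector<DecodedCode>& codes) {
    uint32_t total = 0;
    for (const auto& c : codes) {
        if (c.op == kPushNonvol) {
            total += 8;                    // one pushed 64-bit register
        } else if (c.op == kAllocSmall) {
            total += c.opInfo * 8u + 8u;   // 8 to 128 bytes
        } else {
            assert(false && "opcode outside the scope of this sketch");
        }
    }
    return total;
}

// Adds the 8-byte return address slot to the prologue allocation.
uint32_t SlotBytes(const std::vector<DecodedCode>& codes) {
    return PrologueBytes(codes) + 8;
}
```

For the codes in the example (ALLOC_SMALL with OpInfo=4, then pushes of RBX and RSI, register numbers 3 and 6), `PrologueBytes` gives 0x38 and `SlotBytes` gives 0x40, matching the totals worked out by hand.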
The Non-Volatile Register Constraint
It's not enough to just get the frame size right. If the unwind codes include UWOP_PUSH_NONVOL for RBX and RSI, the unwinder will read values from those stack positions and restore them into the register context. If those positions contain garbage or obviously wrong values (like 0xDEADBEEF), a sophisticated analyzer might flag the frame as synthetic. SilentMoonwalk must place plausible values for saved registers.
Execution Flow vs Unwind Flow
The key to understanding SilentMoonwalk is recognizing that execution and unwinding traverse the same stack memory in the same direction, but are driven by different mechanisms and read different things from it:
| Property | Execution (ROP chain) | Unwinding (RtlVirtualUnwind) |
|---|---|---|
| Direction | RSP increases (moves toward stack bottom) | RSP increases (reverses prologue) |
| Driven by | RET instructions popping addresses | UNWIND_CODE processing |
| What it reads | Gadget addresses from the stack | Return addresses + saved registers |
| When it happens | During actual code execution | When EDR/OS walks the stack |
| State tracking | CPU registers (RSP, RIP, etc.) | CONTEXT structure (simulated) |
SilentMoonwalk's genius is making both traversals work simultaneously on the same physical memory layout.
Comparison with Prior Approaches
ThreadStackSpoofer (Gen 1)
ThreadStackSpoofer by mgeeky simply overwrites the return address with NULL or a legitimate address before sleeping, then restores it after waking. This is a single-frame manipulation — it only modifies one return address. The unwinder hits the modified frame and either stops (NULL) or tries to unwind the spoofed function (usually failing because frame sizes don't match).
CallStackSpoofingPOC (Gen 2)
CallStackSpoofingPOC by pard0p uses a single ROP gadget (typically ADD RSP, 0x?? ; RET) to bridge between the real and fake parts of the stack. It constructs one fake frame. However, it still only spoofs one or two frames, not the entire chain.
SilentMoonwalk (Gen 3)
SilentMoonwalk constructs the entire call chain — every frame from the current position all the way back to RtlUserThreadStart. Each frame has correct size, plausible saved register values, and a return address pointing to a real instruction inside the target function. This is what makes it "fully dynamic" — it generates the spoofed stack on the fly based on the actual UNWIND_INFO of target functions.
Dynamic vs Static Spoofing
Earlier tools used hardcoded offsets and specific function addresses, making them brittle across Windows versions. SilentMoonwalk dynamically parses the .pdata section at runtime, computing frame sizes from the actual UNWIND_INFO structures. This means it works across Windows versions without needing to update offsets — as long as the target functions exist and have valid unwind data.
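The lookup that this kind of runtime parsing depends on can be sketched portably. On Windows, RtlLookupFunctionEntry performs this step; the version below is only an illustration of the idea, assuming a table sorted by begin address (as the PE format requires for .pdata). `FunctionEntry` and `FindEntry` are hypothetical names that mirror the three RVA fields of RUNTIME_FUNCTION.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the three RVA fields of RUNTIME_FUNCTION.
struct FunctionEntry {
    uint32_t beginRva;   // first byte of the function
    uint32_t endRva;     // one past the last byte of the function
    uint32_t unwindRva;  // location of the function's UNWIND_INFO
};

// Binary search for the entry covering rva.
// Assumes entries are sorted by beginRva and do not overlap.
const FunctionEntry* FindEntry(const std::vector<FunctionEntry>& table,
                               uint32_t rva) {
    std::size_t lo = 0, hi = table.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].beginRva) {
            hi = mid;            // target lies before this entry
        } else if (rva >= table[mid].endRva) {
            lo = mid + 1;        // target lies after this entry
        } else {
            return &table[mid];  // beginRva <= rva < endRva
        }
    }
    return nullptr;              // rva is not inside any described function
}
```

Because the result depends only on the table found in the loaded image, nothing in this step is tied to a particular Windows build, which is the property the paragraph above describes.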
Pop Quiz: Desynchronization Theory
Q1: Why can the unwinder be fooled by a synthetic stack?
Q2: What is the "frame size problem" in stack desynchronization?
Q3: How does SilentMoonwalk differ from ThreadStackSpoofer?