Module 8: Detection, Limitations & Countermeasures
No technique is invincible. Understanding the defenses shapes the next generation of offense.
The Arms Race
SilentMoonwalk represented a major leap in call stack spoofing, but defenders have not stood still. This module examines the hardware and software mechanisms that can detect or prevent stack spoofing, the inherent limitations of SilentMoonwalk's approach, and how successor tools like Draugr and Unwinder evolved the technique further.
Hardware Countermeasures
Intel CET: Control-Flow Enforcement Technology
Intel CET introduces a shadow stack — a second, hardware-protected stack that stores only return addresses. The CPU automatically pushes the return address to both the regular stack and the shadow stack on every CALL, and validates that both match on every RET:
```text
Normal execution with CET enabled:

CALL target_function:
  1. Push return address to regular stack (RSP)
  2. Push return address to shadow stack (SSP)   <-- hardware-managed
  3. Jump to target_function

RET:
  1. Pop return address from regular stack
  2. Pop return address from shadow stack
  3. Compare both addresses
  4. If they MATCH: continue normally
  5. If they DIFFER: #CP exception (Control Protection fault)

Impact on SilentMoonwalk:
  - JMP [RBX] pushes to neither stack --> no shadow stack entry for the spoofed frame
  - When the target API executes RET, the shadow stack still holds the return
    address of the last real CALL, but the regular stack holds the SPOOFED address
  - MISMATCH --> #CP fault --> process crashes or is flagged
```
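The mismatch can be reproduced in a toy model. The sketch below is not how the hardware is implemented; `ToyCpu` and its methods are hypothetical names that only mirror the CALL/RET steps listed above, with two vectors standing in for the regular and shadow stacks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a CET-style shadow stack. All names are illustrative.
struct ToyCpu {
    std::vector<std::uint64_t> stack;        // regular stack (software-writable)
    std::vector<std::uint64_t> shadowStack;  // shadow stack (hardware-managed)

    // CALL: the return address is pushed to both stacks.
    void call(std::uint64_t returnAddress) {
        stack.push_back(returnAddress);
        shadowStack.push_back(returnAddress);
    }

    // Manual push followed by a JMP: only the regular stack changes.
    void pushThenJmp(std::uint64_t spoofedReturnAddress) {
        stack.push_back(spoofedReturnAddress);
    }

    // RET: pops both stacks and compares them.
    // Returns false where real hardware would raise #CP.
    bool ret() {
        if (stack.empty() || shadowStack.empty()) return false;
        std::uint64_t fromStack = stack.back();
        std::uint64_t fromShadow = shadowStack.back();
        stack.pop_back();
        shadowStack.pop_back();
        return fromStack == fromShadow;
    }
};

// A genuine CALL followed by RET: both stacks agree.
inline bool normalCallReturns() {
    ToyCpu cpu;
    cpu.call(0x1000);
    return cpu.ret();
}

// A genuine CALL, then a spoofed frame entered via JMP, then the callee's RET.
inline bool spoofedJmpReturns() {
    ToyCpu cpu;
    cpu.call(0x1000);         // last genuine CALL
    cpu.pushThenJmp(0x7FF0);  // spoofed return address, no shadow entry
    return cpu.ret();         // compares 0x7FF0 against 0x1000
}
```

In this model `normalCallReturns()` succeeds while `spoofedJmpReturns()` fails, which is the point at which a CET-enabled process would fault.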
CET Adoption Status
Intel CET (specifically the shadow stack component) is supported in hardware starting from Intel 11th gen (Tiger Lake) and AMD Zen 3 processors. Windows 10 20H1+ and Windows 11 support it as Hardware-enforced Stack Protection (also known as CET Shadow Stacks). However, adoption is gradual: it must be enabled per-process via the /CETCOMPAT linker flag, and many applications and DLLs are not yet CET-compatible. As of late 2024, most enterprise software still runs without CET enforcement, but the trend is toward broader adoption.
ARM Pointer Authentication (PAC)
On ARM64 systems (including Windows on ARM), Pointer Authentication Codes provide a similar defense. Return addresses are cryptographically signed before being pushed to the stack and verified before use. Modifying a signed pointer invalidates the PAC, causing a fault. While not directly relevant to x64 SilentMoonwalk, this represents the platform direction.
Software Detection Strategies
1. Return Address Validation
EDRs can validate that each return address on the stack points to the instruction immediately following a CALL instruction, because a legitimate return address is always the address of the instruction after a CALL. SilentMoonwalk's return addresses may instead point to arbitrary offsets within functions (typically just after the prologue):
```cpp
#include <windows.h>

// Detection: verify each return address follows a CALL instruction.
// The caller must ensure the bytes before retAddr are readable.
bool IsReturnAddressAfterCall(PVOID retAddr) {
    // A CALL instruction on x64 can be:
    //   E8 xx xx xx xx     (CALL rel32)     - 5 bytes
    //   FF 15 xx xx xx xx  (CALL [rip+xx])  - 6 bytes
    //   FF Dx              (CALL reg)       - 2 bytes
    //   41 FF Dx           (CALL r8-r15)    - 3 bytes
    PBYTE addr = (PBYTE)retAddr;

    // Check for CALL rel32 (most common)
    if (addr[-5] == 0xE8)
        return true;

    // Check for a 6-byte indirect CALL (FF /2), e.g. CALL [rip+disp32].
    // The mask tests only the ModRM reg field, so this is a permissive heuristic.
    if (addr[-6] == 0xFF && (addr[-5] & 0x38) == 0x10)
        return true;

    // Check for CALL reg (2-byte)
    if (addr[-2] == 0xFF && (addr[-1] & 0xF8) == 0xD0)
        return true;

    // Check for CALL r8-r15 (3-byte with REX); the 2-byte check above
    // already matches these bytes, so this branch is kept for clarity only
    if (addr[-3] == 0x41 && addr[-2] == 0xFF && (addr[-1] & 0xF8) == 0xD0)
        return true;

    // No CALL found before this return address!
    // This could indicate a spoofed return address.
    return false;
}
```
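The check above reads up to six bytes before the return address without confirming they are mapped. A bounds-checked variant over a byte buffer is sketched below. It is a minimal illustration, not an EDR implementation: `precededByCall` is a hypothetical name, `code` stands in for a module's executable section, and the list of encodings is deliberately not exhaustive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the bytes immediately before `offset` in `code` match one
// of a few common x64 CALL encodings. `offset` plays the role of the
// candidate return address, expressed as an index into `code`.
inline bool precededByCall(const std::vector<std::uint8_t>& code, std::size_t offset) {
    if (offset > code.size()) return false;
    // at(n) reads the byte n positions before the return address.
    auto at = [&](std::size_t back) { return code[offset - back]; };

    // E8 rel32 (5 bytes)
    if (offset >= 5 && at(5) == 0xE8) return true;

    // FF /2 with a 32-bit displacement (6 bytes):
    //   ModRM 0x15        -> CALL [rip+disp32]
    //   ModRM 0x90..0x97  -> CALL [reg+disp32], excluding 0x94 (SIB form)
    if (offset >= 6 && at(6) == 0xFF) {
        std::uint8_t modrm = at(5);
        bool ripRelative = (modrm == 0x15);
        bool regDisp32 = ((modrm & 0xF8) == 0x90) && ((modrm & 0x07) != 0x04);
        if (ripRelative || regDisp32) return true;
    }

    // FF D0..D7 (2 bytes): CALL reg. This also covers the REX-prefixed
    // form 41 FF Dx, because its last two bytes are identical.
    if (offset >= 2 && at(2) == 0xFF && (at(1) & 0xF8) == 0xD0) return true;

    return false;
}
```

Working on offsets into a buffer rather than raw pointers keeps every read inside memory the caller has already validated, and it makes the heuristic easy to unit-test against hand-written byte sequences.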
Evasion: Choosing Post-CALL Return Addresses
SilentMoonwalk can counter this detection by specifically selecting return addresses that follow actual CALL instructions in the target function. Instead of pointing to an arbitrary post-prologue offset, the spoofer scans the target function for CALL instructions and uses the address immediately after a CALL as the return address. This makes the spoofed address indistinguishable from a real return point.
2. Call Chain Semantic Validation
A more sophisticated detection validates that the call chain is semantically plausible. For each pair of adjacent frames (caller, callee), the EDR verifies that the caller function actually contains a CALL instruction that could target the callee:
```cpp
#include <windows.h>

// Detection: verify caller actually calls callee
bool ValidateCallChainEdge(PVOID callerRetAddr, PVOID calleeFunc) {
    // The CALL instruction is at (callerRetAddr - sizeof(CALL))
    PBYTE callSite = (PBYTE)callerRetAddr - 5;  // assuming CALL rel32
    if (callSite[0] == 0xE8) {
        // Decode the relative target
        INT32 relTarget = *(INT32*)(callSite + 1);
        PVOID callTarget = (PVOID)(callSite + 5 + relTarget);

        // Does this CALL directly target the callee function?
        // (A target that only eventually reaches the callee, such as a
        // thunk, is not followed here.)
        if (callTarget == calleeFunc)
            return true;
    }
    // The call site could also be an indirect call, which is harder to validate
    return false;  // Suspicious: caller doesn't appear to call callee
}
```
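The target arithmetic is worth working through once: for `CALL rel32`, the target is the address of the next instruction (the return address) plus the signed 32-bit displacement. The sketch below does the same decoding over a byte buffer so it can be checked by hand. `directCallTarget` and `imageBase` are hypothetical names, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Decodes the target of a CALL rel32 that ends at `returnOffset` inside `code`.
// `imageBase` is the address at which code[0] is assumed to be mapped.
// Returns false if the five bytes before `returnOffset` are not an E8 CALL.
inline bool directCallTarget(const std::vector<std::uint8_t>& code,
                             std::uint64_t imageBase,
                             std::size_t returnOffset,
                             std::uint64_t& targetOut) {
    if (returnOffset < 5 || returnOffset > code.size()) return false;
    std::size_t callSite = returnOffset - 5;
    if (code[callSite] != 0xE8) return false;

    std::int32_t rel = 0;
    std::memcpy(&rel, &code[callSite + 1], sizeof rel);  // little-endian host assumed

    // target = return address + sign-extended displacement (wraps modulo 2^64)
    std::uint64_t returnAddress = imageBase + returnOffset;
    targetOut = returnAddress + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel));
    return true;
}
```

For example, with `imageBase = 0x1000`, the bytes `E8 0B 00 00 00` at offset 0 give a return address of `0x1005` and a target of `0x1005 + 0x0B = 0x1010`. A backward call `E8 F6 FF FF FF` at offset 5 gives `0x100A - 10 = 0x1000`.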
3. Stack Region Validation
EDRs can verify that RSP values in the unwind chain fall within the thread's actual stack bounds (stored in the TEB):
```cpp
#include <windows.h>

// Detection: verify stack pointers are within thread stack bounds
bool IsRspInThreadStack(DWORD64 rsp) {
    // The NT_TIB at the start of the Thread Environment Block holds the bounds
    PNT_TIB pTib = (PNT_TIB)NtCurrentTeb();
    DWORD64 stackBase = (DWORD64)pTib->StackBase;    // High address
    DWORD64 stackLimit = (DWORD64)pTib->StackLimit;  // Low address

    // RSP must be between StackLimit and StackBase
    return (rsp >= stackLimit && rsp < stackBase);
}

// If SilentMoonwalk allocates its synthetic stack via VirtualAlloc
// (separate from the thread's stack), the RSP values will be
// OUTSIDE the thread stack bounds --> detected!
//
// Mitigation: SilentMoonwalk should build synthetic frames ON the
// thread's actual stack (within StackLimit to StackBase range),
// rather than on a separately allocated buffer.
```
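Applied across a whole unwind, the same bounds test becomes a check on a sequence of RSP values. The sketch below is a portable illustration with hypothetical names (`StackBounds`, `unwindChainPlausible`). It also adds one assumption that the text above does not state: that RSP should strictly increase from frame to frame, since unwinding toward older frames moves up the stack on x64.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct StackBounds {
    std::uint64_t limit;  // lowest valid address (TEB StackLimit)
    std::uint64_t base;   // one past the highest valid address (TEB StackBase)
};

// Checks every RSP recorded during an unwind, newest frame first.
// Each value must lie inside the thread's stack, and (as an extra
// heuristic assumed here) must be higher than the previous one.
inline bool unwindChainPlausible(const std::vector<std::uint64_t>& rspPerFrame,
                                 const StackBounds& bounds) {
    std::uint64_t previous = 0;
    for (std::uint64_t rsp : rspPerFrame) {
        if (rsp < bounds.limit || rsp >= bounds.base) return false;  // outside the stack
        if (rsp <= previous) return false;                           // not moving upward
        previous = rsp;
    }
    return true;
}
```

A chain such as `{0x8000, 0x8040, 0x80A0}` inside bounds `[0x1000, 0x9000)` passes, while a chain containing an address from a separately allocated buffer fails on the first out-of-range frame.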
4. Timing-Based Detection
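The act of constructing a synthetic stack takes measurable time. If an EDR monitors the time between specific events (thread creation, API call, sleep entry), an abnormal delay might indicate stack manipulation. Because construction completes in microseconds, this signal is weak in practice. The table below summarizes SilentMoonwalk's exposure to each detection method discussed in this module: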
| Detection Method | What It Detects | SilentMoonwalk Exposure |
|---|---|---|
| CET Shadow Stacks | RSP/SSP return address mismatch | High — fundamental design conflict |
| Post-CALL validation | Return addr not after a CALL | Low — can be mitigated by careful addr selection |
| Semantic chain validation | Caller doesn't actually call callee | Medium — requires realistic chain selection |
| Stack bounds checking | RSP outside thread stack | High if using separate buffer, Low if using real stack |
| Saved register analysis | Implausible non-volatile reg values | Medium — plausible values mitigate this |
| Timing analysis | Abnormal delay before syscall | Low — construction is fast (microseconds) |
Inherent Limitations of SilentMoonwalk
1. Static Chain Templates
SilentMoonwalk uses pre-selected call chain templates. It computes frame sizes dynamically, but the set of functions it impersonates is largely fixed in advance. A truly dynamic approach would analyze the actual call chain that the target API produces during normal operation and replicate that exact chain.
2. Gadget Availability
The technique depends on finding suitable gadgets (JMP [RBX], ADD RSP, POP REG) within functions that have compatible UNWIND_INFO. On some Windows builds, the ideal gadgets may not exist, or they may exist only in functions with inconvenient frame sizes. This can limit which call chains can be spoofed.
3. Non-Volatile Register State
SilentMoonwalk must provide plausible values for all saved non-volatile registers in each frame. Currently, it uses heuristic approaches (module addresses, data section pointers). A sophisticated analyzer could detect patterns in these values that differ from what real function execution produces.
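From the defender's side, a first step toward such an analyzer is classifying each saved register value by the memory region it falls in. The sketch below is only an illustration under assumed inputs: `AddressRange`, `SavedValueKind`, and `classifySavedRegister` are hypothetical names, and a real analyzer would need far more context (heap ranges, small integers, function-specific expectations).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct AddressRange {
    std::uint64_t begin;  // inclusive
    std::uint64_t end;    // exclusive
    bool contains(std::uint64_t value) const { return value >= begin && value < end; }
};

enum class SavedValueKind { Zero, StackPointer, ModulePointer, Unclassified };

// Classifies one saved non-volatile register value by where it points.
// `threadStack` and `loadedModules` are assumed to be supplied by the caller.
inline SavedValueKind classifySavedRegister(std::uint64_t value,
                                            const AddressRange& threadStack,
                                            const std::vector<AddressRange>& loadedModules) {
    if (value == 0) return SavedValueKind::Zero;
    if (threadStack.contains(value)) return SavedValueKind::StackPointer;
    for (const AddressRange& image : loadedModules) {
        if (image.contains(value)) return SavedValueKind::ModulePointer;
    }
    return SavedValueKind::Unclassified;  // heap pointer, counter, flag, or junk
}
```

Legitimate code routinely saves counters, flags, and heap pointers, all of which land in `Unclassified` here. That wide variance in normal values is why this kind of check carries a high false positive risk.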
4. Single-Thread Focus
SilentMoonwalk spoofs the call stack for a single thread during a specific API call. It does not handle multi-threaded scenarios where multiple threads might be inspected simultaneously, or where an EDR correlates stack traces across threads.
Comparison: Stack Spoofing Tools
| Feature | ThreadStackSpoofer | CallStackSpoofingPOC | SilentMoonwalk | Draugr |
|---|---|---|---|---|
| Spoofed Frames | 1 (return addr only) | 1-2 (with ADD RSP gadget) | Full chain (N frames) | Full chain (N frames) |
| Unwind Compatible | No — breaks RtlVirtualUnwind | Partial — first frame only | Yes — all frames have valid unwind data | Yes — fake RUNTIME_FUNCTION/UNWIND_INFO entries |
| Dynamic | No — hardcoded addresses | Partially — one gadget resolved | Yes — parses .pdata at runtime | Yes — builds synthetic unwind metadata at runtime |
| Mechanism | Overwrite return addr before sleep | ROP with ADD RSP gadget | Multi-frame ROP + synthetic unwind chain | JMP [RBX] gadget chaining + synthetic RUNTIME_FUNCTION/UNWIND_INFO for BOFs |
| CET Resistant | No | No | No | No |
| Call Validation Resistant | No | Partially | With careful return addr selection | With synthetic unwind metadata |
| Implementation | Simple C/ASM | C++ with one gadget | C++ with gadget database | C++/ASM BOF with JMP [RBX] gadgets |
Draugr and Unwinder: The Next Generation
Draugr (by NtDallas) is oriented toward Cobalt Strike BOFs (Beacon Object Files). Rather than searching for arbitrary ROP gadgets, it builds synthetic stack frames and chains them together with JMP [RBX] gadgets. It also constructs fake RUNTIME_FUNCTION and UNWIND_INFO entries so that the synthetic frames pass RtlVirtualUnwind validation:
```cpp
// Draugr's approach (conceptual):
// 1. Enumerate loaded modules for JMP [RBX] gadgets in functions
//    with valid RUNTIME_FUNCTION entries
// 2. Construct synthetic RUNTIME_FUNCTION and UNWIND_INFO entries
//    that describe the desired frame layout
// 3. Chain synthetic frames using JMP [RBX] gadgets, where each
//    frame's unwind data matches the stack layout
// 4. The resulting stack passes RtlVirtualUnwind validation because
//    every frame has proper unwind metadata
//
// Advantage: Purpose-built for Cobalt Strike BOFs with JMP [RBX]
// gadget chaining and fake unwind metadata.
// Works within the BOF execution context without sacrificial threads.
```
The Unwinder Approach
Unwinder (by Kudaes, at github.com/Kudaes/Unwinder) instead implements a complete x64 unwinder from scratch. This custom unwinder processes UNWIND_INFO structures the same way RtlVirtualUnwind does, but it can also run the process in reverse: given a desired call chain, it computes the exact stack contents needed. This eliminates the gadget dependency entirely: no ROP chain is needed because the synthetic stack layout is derived directly from the unwind metadata.
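One building block that any unwinder needs, whether it runs forward or in reverse, is deriving a frame's size from its unwind codes. The sketch below is not Unwinder's code. It is a simplified illustration that takes the raw 16-bit unwind code slots (low byte: prologue offset, high byte: operation in the low nibble and operation info in the high nibble) and sums the stack adjustments. It ignores frame-pointer frames, chained unwind info, and machine frames, and `frameSizeFromUnwindCodes` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Operation codes as documented for the x64 UNWIND_CODE structure.
enum : std::uint8_t {
    UWOP_PUSH_NONVOL = 0,
    UWOP_ALLOC_LARGE = 1,
    UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3,
    UWOP_SAVE_NONVOL = 4,
    UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8,
    UWOP_SAVE_XMM128_FAR = 9
};

// Returns the number of bytes the prologue subtracts from RSP, or -1 if the
// codes are truncated or use an operation this sketch does not model.
inline long long frameSizeFromUnwindCodes(const std::vector<std::uint16_t>& slots) {
    long long size = 0;
    std::size_t i = 0;
    while (i < slots.size()) {
        std::uint8_t op = (slots[i] >> 8) & 0x0F;
        std::uint8_t info = (slots[i] >> 12) & 0x0F;
        switch (op) {
        case UWOP_PUSH_NONVOL:  // push of one 8-byte register
            size += 8; i += 1; break;
        case UWOP_ALLOC_SMALL:  // allocation of info * 8 + 8 bytes
            size += info * 8 + 8; i += 1; break;
        case UWOP_ALLOC_LARGE:
            if (info == 0) {    // next slot holds size / 8
                if (i + 1 >= slots.size()) return -1;
                size += static_cast<long long>(slots[i + 1]) * 8;
                i += 2;
            } else {            // next two slots hold the unscaled 32-bit size
                if (i + 2 >= slots.size()) return -1;
                size += static_cast<std::uint32_t>(slots[i + 1]) |
                        (static_cast<std::uint32_t>(slots[i + 2]) << 16);
                i += 3;
            }
            break;
        case UWOP_SET_FPREG:    // no RSP change; frame-pointer handling not modelled
            i += 1; break;
        case UWOP_SAVE_NONVOL:
        case UWOP_SAVE_XMM128:  // register saves into already-allocated space
            i += 2; break;
        case UWOP_SAVE_NONVOL_FAR:
        case UWOP_SAVE_XMM128_FAR:
            i += 3; break;
        default:
            return -1;
        }
    }
    return size;
}
```

For a prologue of `push rbx; sub rsp, 0x20`, the slots `{0x3206, 0x3002}` yield 32 + 8 = 40 bytes, so the return address sits 40 bytes above RSP once the prologue has run. Both a defender re-walking a stack and a tool laying out a synthetic one depend on this same arithmetic.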
Control Flow Guard (CFG)
CFG is a Windows mitigation that validates indirect call targets. Before every indirect CALL, the compiler inserts a check against a bitmap of valid call targets. How does this affect SilentMoonwalk?
```cpp
// CFG validation on indirect calls (conceptual):
// The compiler inserts a guard check before the call:
//   call [__guard_check_icall_fptr]   ; validate target
//   call rax                          ; actual indirect call

// Impact on SilentMoonwalk:
// - The JMP [RBX] gadget is not a compiler-instrumented indirect call,
//   so no CFG check runs before it
// - However, if the spoofer uses CALL-based gadgets, CFG may block them
// - SilentMoonwalk deliberately uses JMP-based trampolines to avoid CFG
//
// The CFG bitmap is process-wide; each CFG-aware module registers its
// function entry points as valid targets.
// Gadgets mid-function are NOT valid CFG targets for indirect CALL,
// but an uninstrumented JMP never consults the bitmap.
```
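The asymmetry between checked calls and unchecked jumps can be shown with a toy model. This is not how CFG is implemented (the real mechanism uses a compact bitmap, not a hash set); `ToyCfg` and its methods are hypothetical names chosen to mirror the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Toy stand-in for the CFG valid-target lookup.
class ToyCfg {
public:
    // A module registers one of its function entry points as a valid target.
    void registerFunctionEntry(std::uint64_t address) { validTargets_.insert(address); }

    // Models the guard check in front of an instrumented indirect CALL.
    bool checkedIndirectCall(std::uint64_t target) const {
        return validTargets_.count(target) != 0;
    }

    // Models a JMP with no guard check in front of it: nothing is looked up,
    // so every target is accepted.
    bool uninstrumentedJmp(std::uint64_t /*target*/) const { return true; }

private:
    std::unordered_set<std::uint64_t> validTargets_;
};
```

With only `0x1000` registered as a function entry, a checked call to the mid-function address `0x1024` is rejected, while the unchecked jump to the same address goes through.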
CFG + CET: The Combined Defense
When both CFG and CET are enabled, indirect calls are validated against the CFG bitmap (software check) and returns are validated against the shadow stack (hardware check). This combination significantly constrains ROP-based techniques. SilentMoonwalk's JMP [RBX] bypasses CFG but cannot bypass CET's shadow stack validation on the subsequent RET.
Detection Engineering: Building Detections
For blue team practitioners, here are actionable detection strategies ordered by implementation complexity:
| Priority | Detection | Implementation | False Positive Risk |
|---|---|---|---|
| 1 (Easy) | Enable CET shadow stacks | Link with /CETCOMPAT, enable in OS | Low (hardware-enforced) |
| 2 (Medium) | Stack bounds validation | Check RSP within TEB.StackBase/Limit during unwind | Low |
| 3 (Medium) | Post-CALL instruction check | Disassemble 2-6 bytes before each return address | Medium (some JIT code) |
| 4 (Hard) | Semantic chain validation | Verify caller-callee relationships via static analysis | Medium (indirect calls) |
| 5 (Hard) | Saved register plausibility | Check non-volatile reg values against expected ranges | High (wide variance in normal) |
The Future of Stack Spoofing
The arms race between stack spoofing and detection continues to evolve:
- Hardware enforcement (CET, PAC) will eventually make ROP-based spoofing obsolete on systems where it's enabled. The transition period, however, may last years.
- Synthetic unwind metadata (Draugr's approach of constructing fake RUNTIME_FUNCTION/UNWIND_INFO entries) and unwinder emulation (Kudaes' Unwinder) produce higher-fidelity spoofed stacks but at greater complexity.
- Hybrid approaches may combine stack spoofing with sleep obfuscation (encrypting memory during sleep) and module stomping (placing code in legitimate DLL memory) for layered evasion.
- Kernel-mode telemetry improvements may provide EDRs with tamper-resistant stack information that cannot be manipulated from user mode.
The Fundamental Lesson
SilentMoonwalk demonstrated that the x64 structured exception handling mechanism, designed for reliability and performance, creates an exploitable gap between what code actually executes and what the unwinding metadata reports. As long as the unwinder is stateless and trusts stack contents, some form of desynchronization attack will be possible. The defense community's response has been to add independent verification (shadow stacks, call validation) rather than trying to make the unwinder tamper-proof.
Pop Quiz: Detection & Countermeasures
Q1: How does Intel CET's shadow stack defeat SilentMoonwalk?
Q2: What advantage does Draugr's approach have over SilentMoonwalk?
Q3: Why is stack bounds checking (TEB.StackBase/StackLimit) an effective detection against naive implementations?