Difficulty: Advanced

Module 8: Detection, Limitations & Countermeasures

No technique is invincible. Understanding the defenses shapes the next generation of offense.

The Arms Race

SilentMoonwalk represented a major leap in call stack spoofing, but defenders have not stood still. This module examines the hardware and software mechanisms that can detect or prevent stack spoofing, the inherent limitations of SilentMoonwalk's approach, and how successor tools like Draugr and Unwinder evolved the technique further.

Hardware Countermeasures

Intel CET: Control-Flow Enforcement Technology

Intel CET introduces a shadow stack — a second, hardware-protected stack that stores only return addresses. The CPU automatically pushes the return address to both the regular stack and the shadow stack on every CALL, and validates that both match on every RET:

Normal execution with CET enabled:
  CALL target_function:
    1. Push return address to regular stack (RSP)
    2. Push return address to shadow stack (SSP)  <-- hardware-managed
    3. Jump to target_function

  RET:
    1. Pop return address from regular stack
    2. Pop return address from shadow stack
    3. Compare both addresses
    4. If they MATCH: continue normally
    5. If they DIFFER: #CP exception (Control Protection fault)

Impact on SilentMoonwalk:
  - JMP [RBX] doesn't push to either stack --> no shadow stack entry
  - When the target API executes RET, the top of the shadow stack still
    holds the return address of the last genuine CALL, while the
    regular stack holds the SPOOFED address
  - MISMATCH --> #CP fault --> process crashes or is flagged
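
The mismatch described above can be illustrated with a toy software model. This is only a sketch: the class name ShadowStackModel and its methods are hypothetical, and a real shadow stack is maintained by the CPU rather than by program code. Here, Ret returning false stands in for the #CP fault.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a CET-style shadow stack (illustration only).
class ShadowStackModel {
public:
    // CALL: the return address is recorded on both stacks.
    void Call(uint64_t returnAddress) {
        regular_.push_back(returnAddress);
        shadow_.push_back(returnAddress);
    }

    // Models a spoofer rewriting the return address on the regular stack.
    // The shadow copy is not reachable through ordinary stack writes.
    void OverwriteTopReturnAddress(uint64_t spoofed) {
        assert(!regular_.empty());
        regular_.back() = spoofed;
    }

    // RET: pop both copies and compare them.
    // Returns false where real hardware would raise #CP.
    bool Ret() {
        if (regular_.empty() || shadow_.empty()) return false;
        uint64_t fromRegular = regular_.back();
        uint64_t fromShadow = shadow_.back();
        regular_.pop_back();
        shadow_.pop_back();
        return fromRegular == fromShadow;
    }

private:
    std::vector<uint64_t> regular_;
    std::vector<uint64_t> shadow_;
};
```

In this model, an untouched Call/Ret pair succeeds, while any Ret that follows OverwriteTopReturnAddress with a different value fails the comparison.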

CET Adoption Status

The shadow stack component of CET is supported in hardware starting from Intel 11th gen (Tiger Lake) and AMD Zen 3 processors. Windows 10 20H1+ and Windows 11 expose it as Hardware-enforced Stack Protection (also known as CET Shadow Stacks). Adoption is gradual, however: an executable must opt in by being linked with the /CETCOMPAT flag, and many applications and DLLs are not yet CET-compatible. As of late 2024, most enterprise software still runs without CET enforcement, but the trend is toward broader adoption.

ARM Pointer Authentication (PAC)

On ARM64 systems (including Windows on ARM), Pointer Authentication Codes provide a similar defense. Return addresses are cryptographically signed before being pushed to the stack and verified before use. Modifying a signed pointer invalidates the PAC, causing a fault. While not directly relevant to x64 SilentMoonwalk, this represents the platform direction.

Software Detection Strategies

1. Return Address Validation

EDRs can validate that each return address on the stack points to an instruction immediately following a CALL instruction. Legitimate return addresses are always the instruction after a CALL. SilentMoonwalk's return addresses point to arbitrary offsets within functions (typically just after the prologue):

#include <windows.h>

// Detection: verify each return address follows a CALL instruction
bool IsReturnAddressAfterCall(PVOID retAddr) {
    // A CALL instruction on x64 can be:
    //   E8 xx xx xx xx     (CALL rel32)     - 5 bytes
    //   FF 15 xx xx xx xx  (CALL [rip+xx])  - 6 bytes
    //   FF Dx              (CALL reg)       - 2 bytes
    //   41 FF Dx           (CALL r8-r15)    - 3 bytes

    PBYTE addr = (PBYTE)retAddr;

    // Check for CALL rel32 (most common)
    if (addr[-5] == 0xE8)
        return true;

    // Check for CALL [rip+disp32]: opcode FF, ModRM 15 (mod=00, reg=/2, rm=101)
    if (addr[-6] == 0xFF && addr[-5] == 0x15)
        return true;

    // Check for CALL reg (2-byte)
    if (addr[-2] == 0xFF && (addr[-1] & 0xF8) == 0xD0)
        return true;

    // Check for CALL r8-r15 (3-byte with REX)
    if (addr[-3] == 0x41 && addr[-2] == 0xFF && (addr[-1] & 0xF8) == 0xD0)
        return true;

    // No CALL found before this return address!
    // This could indicate a spoofed return address.
    return false;
}
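
The function above reads bytes at negative offsets from the return address without confirming that those bytes are readable. A minimal sketch of a bounds-checked variant follows, assuming the detector has first copied the relevant code region into a buffer. PrecededByCall and its parameters are hypothetical names, and only the encodings listed above are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// `code` is a captured copy of an executable region; `retOffset` is the
// return address expressed as an offset into that copy.
bool PrecededByCall(const std::vector<uint8_t>& code, size_t retOffset) {
    if (retOffset > code.size()) return false;

    // Byte located `back` positions before the return address.
    auto at = [&](size_t back) -> uint8_t {
        assert(back >= 1 && back <= retOffset);
        return code[retOffset - back];
    };

    // CALL rel32: E8 xx xx xx xx
    if (retOffset >= 5 && at(5) == 0xE8) return true;

    // CALL [rip+disp32]: FF 15 xx xx xx xx
    if (retOffset >= 6 && at(6) == 0xFF && at(5) == 0x15) return true;

    // CALL reg: FF D0..D7. The REX-prefixed form (41 FF Dx) ends with the
    // same two bytes, so this test matches it as well.
    if (retOffset >= 2 && at(2) == 0xFF && (at(1) & 0xF8) == 0xD0) return true;

    return false;
}
```

Because x64 instructions are variable-length, a byte-pattern match like this can also fire on data that merely resembles a CALL, so it works better as one signal among several than as a verdict.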

Evasion: Choosing Post-CALL Return Addresses

SilentMoonwalk can counter this detection by specifically selecting return addresses that follow actual CALL instructions in the target function. Instead of pointing to an arbitrary post-prologue offset, the spoofer scans the target function for CALL instructions and uses the address immediately after a CALL as the return address. This makes the spoofed address indistinguishable from a real return point.

2. Call Chain Semantic Validation

A more sophisticated detection validates that the call chain is semantically plausible. For each pair of adjacent frames (caller, callee), the EDR verifies that the caller function actually contains a CALL instruction that could target the callee:

#include <windows.h>

// Detection: verify caller actually calls callee
bool ValidateCallChainEdge(PVOID callerRetAddr, PVOID calleeFunc) {
    // The CALL instruction is at (callerRetAddr - sizeof(CALL))
    PBYTE callSite = (PBYTE)callerRetAddr - 5; // assuming CALL rel32

    if (callSite[0] == 0xE8) {
        // Decode the relative target
        INT32 relTarget = *(INT32*)(callSite + 1);
        PVOID callTarget = (PVOID)(callSite + 5 + relTarget);

        // Does this CALL actually target the callee function?
        // (or a function that eventually calls it)
        if (callTarget == calleeFunc)
            return true;

        // Could also be an indirect call - harder to validate
    }
    return false;  // Suspicious: caller doesn't appear to call callee
}
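
For the indirect case mentioned in the comment above, one possible starting point is the import-style encoding CALL [rip+disp32]. The sketch below uses a hypothetical helper, RipRelativeCallSlot, that works on a captured copy of the code and assumes a little-endian host. It computes the address of the pointer slot the CALL reads from; a detector would then read that slot and compare the stored pointer with the callee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// If the six bytes before retOffset encode CALL [rip+disp32] (FF 15 disp32),
// store the virtual address of the pointer slot in slotVa and return true.
// `imageBase` is the virtual address that corresponds to code[0].
bool RipRelativeCallSlot(const std::vector<uint8_t>& code,
                         uint64_t imageBase,
                         size_t retOffset,
                         uint64_t& slotVa) {
    if (retOffset < 6 || retOffset > code.size()) return false;

    const uint8_t* insn = code.data() + (retOffset - 6);
    if (insn[0] != 0xFF || insn[1] != 0x15) return false;

    // Copy the displacement out to avoid an unaligned read.
    int32_t disp = 0;
    std::memcpy(&disp, insn + 2, sizeof(disp));

    // RIP-relative operands are relative to the next instruction,
    // which is exactly the return address.
    uint64_t returnVa = imageBase + retOffset;
    slotVa = returnVa + static_cast<uint64_t>(static_cast<int64_t>(disp));
    return true;
}
```

Register-based calls such as CALL RAX have no static target at all, which is why the summary table rates this class of validation as only a medium-strength signal.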

3. Stack Region Validation

EDRs can verify that RSP values in the unwind chain fall within the thread's actual stack bounds (stored in the TEB):

#include <windows.h>

// Detection: verify stack pointers are within thread stack bounds
bool IsRspInThreadStack(DWORD64 rsp) {
    // Get Thread Environment Block
    PNT_TIB pTib = (PNT_TIB)NtCurrentTeb();

    DWORD64 stackBase  = (DWORD64)pTib->StackBase;    // High address
    DWORD64 stackLimit = (DWORD64)pTib->StackLimit;   // Low address

    // RSP must be between StackLimit and StackBase
    return (rsp >= stackLimit && rsp < stackBase);
}

// If SilentMoonwalk allocates its synthetic stack via VirtualAlloc
// (separate from the thread's stack), the RSP values will be
// OUTSIDE the thread stack bounds --> detected!
//
// Mitigation: SilentMoonwalk should build synthetic frames ON the
// thread's actual stack (within StackLimit to StackBase range),
// rather than on a separately allocated buffer.
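
The same bounds test can be applied to every frame of an unwound chain rather than to a single RSP. The sketch below is platform-neutral and takes the bounds as plain integers; StackBounds and FirstSuspiciousFrame are hypothetical names. The alignment and ordering checks are additional heuristics assumed here, not something the original check performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct StackBounds {
    uint64_t limit;  // lowest valid address (StackLimit)
    uint64_t base;   // one past the highest valid address (StackBase)
};

// frameRsps holds the RSP recovered for each frame, innermost frame first.
// Returns the index of the first implausible frame, or -1 if all pass.
int FirstSuspiciousFrame(const std::vector<uint64_t>& frameRsps,
                         const StackBounds& bounds) {
    assert(bounds.limit < bounds.base);
    uint64_t previous = 0;
    for (size_t i = 0; i < frameRsps.size(); ++i) {
        const uint64_t rsp = frameRsps[i];
        const bool inBounds = rsp >= bounds.limit && rsp < bounds.base;
        const bool aligned = (rsp % 8) == 0;
        // Callers live at higher addresses, so RSP should rise while unwinding.
        const bool rises = (i == 0) || rsp > previous;
        if (!inBounds || !aligned || !rises) return static_cast<int>(i);
        previous = rsp;
    }
    return -1;
}
```

Returning the frame index rather than a plain boolean lets the caller report which frame left the thread's stack, which is useful when triaging an alert.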

4. Timing-Based Detection

The act of constructing a synthetic stack takes measurable time. If an EDR monitors the time between specific events (thread creation, API call, sleep entry), an abnormal delay might indicate stack manipulation. The table below summarizes each detection method and SilentMoonwalk's exposure to it:

Detection Method	What It Detects	SilentMoonwalk Exposure
CET Shadow Stacks	RSP/SSP return address mismatch	High — fundamental design conflict
Post-CALL validation	Return addr not after a CALL	Low — can be mitigated by careful addr selection
Semantic chain validation	Caller doesn't actually call callee	Medium — requires realistic chain selection
Stack bounds checking	RSP outside thread stack	High if using separate buffer, Low if using real stack
Saved register analysis	Implausible non-volatile reg values	Medium — plausible values mitigate this
Timing analysis	Abnormal delay before syscall	Low — construction is fast (microseconds)

Inherent Limitations of SilentMoonwalk

1. Static Chain Templates

SilentMoonwalk uses pre-selected call chain templates. While it dynamically computes frame sizes, the choice of which functions to spoof is somewhat fixed. A truly dynamic approach would analyze the actual call chain that the target API produces during normal operation and replicate that exact chain.

2. Gadget Availability

The technique depends on finding suitable gadgets (JMP [RBX], ADD RSP, POP REG) within functions that have compatible UNWIND_INFO. On some Windows builds, the ideal gadgets may not exist, or they may exist only in functions with inconvenient frame sizes. This can limit which call chains can be spoofed.

3. Non-Volatile Register State

SilentMoonwalk must provide plausible values for all saved non-volatile registers in each frame. Currently, it uses heuristic approaches (module addresses, data section pointers). A sophisticated analyzer could detect patterns in these values that differ from what real function execution produces.
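
A first step for such an analyzer might be to bucket each saved register value by the kind of memory it points into and then compare the distribution against a baseline from known-good stacks. The sketch below covers only the bucketing step; AddressRange, ValueKind, and ClassifySavedRegister are hypothetical names, and what counts as an anomalous distribution is left open because the source gives no thresholds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct AddressRange {
    uint64_t start;  // inclusive
    uint64_t end;    // exclusive
};

enum class ValueKind { Zero, StackAddress, ModuleAddress, Other };

// Buckets one saved non-volatile register value.
ValueKind ClassifySavedRegister(uint64_t value,
                                const std::vector<AddressRange>& modules,
                                const AddressRange& threadStack) {
    assert(threadStack.start < threadStack.end);
    if (value == 0) return ValueKind::Zero;
    if (value >= threadStack.start && value < threadStack.end)
        return ValueKind::StackAddress;
    for (const AddressRange& m : modules) {
        if (value >= m.start && value < m.end) return ValueKind::ModuleAddress;
    }
    return ValueKind::Other;  // heap pointers, counters, flags, and so on
}
```

Real register contents vary widely, which matches the high false-positive risk this kind of check carries in practice.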

4. Single-Thread Focus

SilentMoonwalk spoofs the call stack for a single thread during a specific API call. It does not handle multi-threaded scenarios where multiple threads might be inspected simultaneously, or where an EDR correlates stack traces across threads.

Comparison: Stack Spoofing Tools

Feature	ThreadStackSpoofer	CallStackSpoofingPOC	SilentMoonwalk	Draugr
Spoofed Frames	1 (return addr only)	1-2 (with ADD RSP gadget)	Full chain (N frames)	Full chain (N frames)
Unwind Compatible	No — breaks RtlVirtualUnwind	Partial — first frame only	Yes — all frames have valid unwind data	Yes — fake RUNTIME_FUNCTION/UNWIND_INFO entries
Dynamic	No — hardcoded addresses	Partially — one gadget resolved	Yes — parses .pdata at runtime	Yes — builds synthetic unwind metadata at runtime
Mechanism	Overwrite return addr before sleep	ROP with ADD RSP gadget	Multi-frame ROP + synthetic unwind chain	JMP [RBX] gadget chaining + synthetic RUNTIME_FUNCTION/UNWIND_INFO for BOFs
CET Resistant	No	No	No	No
Call Validation Resistant	No	Partially	With careful return addr selection	With synthetic unwind metadata
Implementation	Simple C/ASM	C++ with one gadget	C++ with gadget database	C++/ASM BOF with JMP [RBX] gadgets

Draugr and Unwinder: The Next Generation

Draugr (by NtDallas) took a different approach oriented toward Cobalt Strike BOFs (Beacon Object Files). Rather than finding arbitrary ROP gadgets, Draugr provides call stack spoofing for BOFs by constructing synthetic stack frames using JMP [RBX] gadgets to chain frames together. It constructs fake RUNTIME_FUNCTION and UNWIND_INFO entries so that the synthetic frames pass RtlVirtualUnwind validation:

// Draugr's approach (conceptual):
// 1. Enumerate loaded modules for JMP [RBX] gadgets in functions
//    with valid RUNTIME_FUNCTION entries
// 2. Construct synthetic RUNTIME_FUNCTION and UNWIND_INFO entries
//    that describe the desired frame layout
// 3. Chain synthetic frames using JMP [RBX] gadgets, where each
//    frame's unwind data matches the stack layout
// 4. The resulting stack passes RtlVirtualUnwind validation because
//    every frame has proper unwind metadata

// Advantage: Purpose-built for Cobalt Strike BOFs with JMP [RBX]
// gadget chaining and fake unwind metadata.
// Works within the BOF execution context without sacrificial threads.

The Unwinder Approach

Unwinder (by Kudaes, at github.com/Kudaes/Unwinder) takes a different approach by implementing a complete x64 unwinder from scratch. This custom unwinder processes UNWIND_INFO structures identically to RtlVirtualUnwind but can also reverse the process: given a desired call chain, it computes the exact stack contents needed. This eliminates the gadget dependency entirely — no ROP chain is needed because the synthetic stack is computed mathematically from the unwind metadata.
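
The forward direction, which any unwinder or stack-validating detector performs, can be sketched as summing the stack effects recorded in a function's unwind codes. This is a simplified illustration based on the documented x64 UNWIND_CODE layout, not Unwinder's actual code. FrameSizeFromUnwindCodes is a hypothetical name, and the sketch does not handle chained unwind info, machine frames, or dynamic allocations made after a frame pointer is set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each slot is one 16-bit UNWIND_CODE: bits 0-7 CodeOffset,
// bits 8-11 UnwindOp, bits 12-15 OpInfo.
// Returns the fixed stack bytes the prolog allocates (return address
// excluded), or -1 for input this sketch does not handle.
int64_t FrameSizeFromUnwindCodes(const std::vector<uint16_t>& slots) {
    int64_t size = 0;
    size_t i = 0;
    while (i < slots.size()) {
        const unsigned op = (slots[i] >> 8) & 0xF;
        const unsigned info = (slots[i] >> 12) & 0xF;
        size_t used = 1;    // slots consumed by this operation
        int64_t bytes = 0;  // stack bytes it accounts for
        switch (op) {
        case 0: bytes = 8; break;                        // UWOP_PUSH_NONVOL
        case 1:                                          // UWOP_ALLOC_LARGE
            if (info > 1) return -1;
            used = (info == 0) ? 2 : 3;
            break;
        case 2: bytes = static_cast<int64_t>(info) * 8 + 8; break;  // UWOP_ALLOC_SMALL
        case 3: break;                                   // UWOP_SET_FPREG
        case 4: case 8: used = 2; break;                 // SAVE_NONVOL, SAVE_XMM128
        case 5: case 9: used = 3; break;                 // the _FAR variants
        default: return -1;                              // e.g. UWOP_PUSH_MACHFRAME
        }
        if (i + used > slots.size()) return -1;          // truncated operand
        if (op == 1) {
            bytes = (info == 0)
                ? static_cast<int64_t>(slots[i + 1]) * 8            // scaled 16-bit size
                : (static_cast<int64_t>(slots[i + 1]) |
                   (static_cast<int64_t>(slots[i + 2]) << 16));     // unscaled 32-bit size
        }
        size += bytes;
        i += used;
    }
    assert(i == slots.size());
    return size;
}
```

For a prolog consisting of push rbx followed by sub rsp, 0x20, the two slots 0x3206 and 0x3002 yield 32 + 8 = 40 bytes, so the caller's return address sits 40 bytes above the post-prolog RSP.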

Control Flow Guard (CFG)

CFG is a Windows mitigation that validates indirect call targets. Before every indirect CALL, the compiler inserts a check against a bitmap of valid call targets. How does this affect SilentMoonwalk?

// CFG validation on indirect calls:
// The compiler inserts:
//   call [__guard_check_icall_fptr] ; validate target (call through a pointer)
//   call [rax]                      ; actual indirect call

// Impact on SilentMoonwalk:
// - JMP [RBX] is NOT a CALL, so CFG doesn't validate the target
// - However, if the spoofer uses CALL-based gadgets, CFG may block them
// - SilentMoonwalk deliberately uses JMP-based trampolines to avoid CFG
//
// The CFG bitmap is process-wide; each module contributes its valid
// targets (function entry points) when it is loaded.
// Gadgets mid-function are NOT valid CFG targets for indirect CALL,
// but JMP instructions bypass this check entirely.

CFG + CET: The Combined Defense

When both CFG and CET are enabled, indirect calls are validated against the CFG bitmap (software check) and returns are validated against the shadow stack (hardware check). This combination significantly constrains ROP-based techniques. SilentMoonwalk's JMP [RBX] bypasses CFG but cannot bypass CET's shadow stack validation on the subsequent RET.

Detection Engineering: Building Detections

For blue team practitioners, here are actionable detection strategies ordered by implementation complexity:

Priority	Detection	Implementation	False Positive Risk
1 (Easy)	Enable CET shadow stacks	Link with `/CETCOMPAT`, enable in OS	Low (hardware-enforced)
2 (Medium)	Stack bounds validation	Check RSP within TEB.StackBase/Limit during unwind	Low
3 (Medium)	Post-CALL instruction check	Disassemble 2-6 bytes before each return address	Medium (some JIT code)
4 (Hard)	Semantic chain validation	Verify caller-callee relationships via static analysis	Medium (indirect calls)
5 (Hard)	Saved register plausibility	Check non-volatile reg values against expected ranges	High (wide variance in normal)

The Future of Stack Spoofing

The arms race between stack spoofing and detection continues to evolve:

Hardware enforcement (CET, PAC) will eventually make ROP-based spoofing obsolete on systems where it's enabled. The transition period, however, may last years.
Synthetic unwind metadata (Draugr's approach of constructing fake RUNTIME_FUNCTION/UNWIND_INFO entries) and unwinder emulation (Kudaes' Unwinder) produce higher-fidelity spoofed stacks but at greater complexity.
Hybrid approaches may combine stack spoofing with sleep obfuscation (encrypting memory during sleep) and module stomping (placing code in legitimate DLL memory) for layered evasion.
Kernel-mode telemetry improvements may provide EDRs with tamper-resistant stack information that cannot be manipulated from user mode.

The Fundamental Lesson

SilentMoonwalk demonstrated that the x64 structured exception handling mechanism, designed for reliability and performance, creates an exploitable gap between what code actually executes and what the unwinding metadata reports. As long as the unwinder is stateless and trusts stack contents, some form of desynchronization attack will be possible. The defense community's response has been to add independent verification (shadow stacks, call validation) rather than trying to make the unwinder tamper-proof.
