Difficulty: Advanced

Module 8: Detection, Limitations & Countermeasures

No technique is invincible. Understanding the defenses shapes the next generation of offense.

The Arms Race

SilentMoonwalk represented a major leap in call stack spoofing, but defenders have not stood still. This module examines the hardware and software mechanisms that can detect or prevent stack spoofing, the inherent limitations of SilentMoonwalk's approach, and how successor tools like Draugr and Unwinder evolved the technique further.

Hardware Countermeasures

Intel CET: Control-Flow Enforcement Technology

Intel CET introduces a shadow stack — a second, hardware-protected stack that stores only return addresses. The CPU automatically pushes the return address to both the regular stack and the shadow stack on every CALL, and validates that both match on every RET:

Normal execution with CET enabled:
  CALL target_function:
    1. Push return address to regular stack (RSP)
    2. Push return address to shadow stack (SSP)  <-- hardware-managed
    3. Jump to target_function

  RET:
    1. Pop return address from regular stack
    2. Pop return address from shadow stack
    3. Compare both addresses
    4. If they MATCH: continue normally
    5. If they DIFFER: #CP exception (Control Protection fault)

Impact on SilentMoonwalk:
  - JMP [RBX] doesn't push to either stack --> no shadow stack entry
  - When the target API executes RET, the shadow stack has the REAL
    return address, but the regular stack has the SPOOFED address
  - MISMATCH --> #CP fault --> process crashes or is flagged
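The mismatch described above can be illustrated with a small software model. This is a toy simulation for building intuition, not a description of how the hardware implements CET: the `ToyCpu` type, its `Call`, `Jmp`, and `Ret` methods, and the addresses used are all hypothetical names and values invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a CPU with a CET-style shadow stack (illustrative only).
struct ToyCpu {
    std::vector<uint64_t> stack;        // regular stack: writable by software
    std::vector<uint64_t> shadowStack;  // shadow stack: only CALL/RET touch it

    // CALL pushes the return address to both stacks.
    void Call(uint64_t returnAddress) {
        stack.push_back(returnAddress);
        shadowStack.push_back(returnAddress);
    }

    // JMP transfers control without pushing anything to either stack.
    void Jmp() {}

    // RET pops both stacks and compares the two values. Returns false
    // where real hardware would raise a #CP (control protection) fault.
    bool Ret() {
        if (stack.empty() || shadowStack.empty()) return false;
        const uint64_t fromStack = stack.back();
        const uint64_t fromShadow = shadowStack.back();
        stack.pop_back();
        shadowStack.pop_back();
        return fromStack == fromShadow;
    }
};

// Scenario: one real CALL happens, then software plants a spoofed return
// address on the regular stack and reaches the target through a JMP.
inline bool SpoofedReturnSurvives() {
    ToyCpu cpu;
    cpu.Call(0x1000);                 // legitimate call: both stacks hold 0x1000
    cpu.stack.push_back(0x7FFE0000);  // spoofed entry: regular stack only
    cpu.Jmp();                        // no shadow stack entry is created
    return cpu.Ret();                 // compares 0x7FFE0000 against 0x1000
}
```

In this model a matched CALL/RET pair returns true from `Ret()`, while `SpoofedReturnSurvives()` returns false because the regular stack and the shadow stack disagree at the moment of the RET.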

CET Adoption Status

Shadow stack support, the CET component relevant here, ships in hardware starting with Intel 11th gen (Tiger Lake) and AMD Zen 3 processors. Windows 10 20H1+ and Windows 11 expose it as Hardware-enforced Stack Protection (also known as CET Shadow Stacks). Adoption is gradual, however: an executable opts in by being linked with the /CETCOMPAT flag, and many applications and DLLs are not yet CET-compatible. As of late 2024, most enterprise software still runs without CET enforcement, but the trend is toward broader adoption.

ARM Pointer Authentication (PAC)

On ARM64 systems (including Windows on ARM), Pointer Authentication Codes provide a similar defense. Return addresses are cryptographically signed before being pushed to the stack and verified before use. Modifying a signed pointer invalidates the PAC, causing a fault. While not directly relevant to x64 SilentMoonwalk, this represents the platform direction.

Software Detection Strategies

1. Return Address Validation

EDRs can validate that each return address on the stack points to an instruction immediately following a CALL instruction. Legitimate return addresses are always the instruction after a CALL. SilentMoonwalk's return addresses point to arbitrary offsets within functions (typically just after the prologue):

// Detection: verify each return address follows a CALL instruction
bool IsReturnAddressAfterCall(PVOID retAddr) {
    // A CALL instruction on x64 can be:
    //   E8 xx xx xx xx     (CALL rel32)     - 5 bytes
    //   FF 15 xx xx xx xx  (CALL [rip+xx])  - 6 bytes
    //   FF Dx              (CALL reg)       - 2 bytes
    //   41 FF Dx           (CALL r8-r15)    - 3 bytes

    PBYTE addr = (PBYTE)retAddr;

    // Check for CALL rel32 (most common)
    if (addr[-5] == 0xE8)
        return true;

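    // Check for 6-byte FF /2 forms: CALL [rip+disp32] (ModRM 0x15) or
    // CALL [reg+disp32] (ModRM 0x90-0x97, except 0x94, which takes a SIB byte)
    if (addr[-6] == 0xFF &&
        (addr[-5] == 0x15 || ((addr[-5] & 0xF8) == 0x90 && addr[-5] != 0x94)))
        return true;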

    // Check for CALL reg (2-byte)
    if (addr[-2] == 0xFF && (addr[-1] & 0xF8) == 0xD0)
        return true;

    // Check for CALL r8-r15 (3-byte with REX)
    if (addr[-3] == 0x41 && addr[-2] == 0xFF && (addr[-1] & 0xF8) == 0xD0)
        return true;

    // No CALL found before this return address!
    // This could indicate a spoofed return address.
    return false;
}
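The routine above reads up to six bytes before the return address without confirming that those bytes are readable. A minimal sketch of the same idea over a byte buffer with explicit bounds is shown here, which also makes the check easy to exercise off-target. `PrecededByCall` is a hypothetical name, and the sketch covers only the encodings listed in the comment of the routine above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the bytes ending at `offset` (the candidate return
// address, expressed as an offset into `code`) look like a CALL.
// Every read is guarded by a bounds check.
inline bool PrecededByCall(const std::vector<uint8_t>& code, size_t offset) {
    if (offset > code.size()) return false;
    const uint8_t* p = code.data();

    // E8 rel32 (5 bytes)
    if (offset >= 5 && p[offset - 5] == 0xE8) return true;

    // FF 15 disp32: CALL [rip+disp32] (6 bytes)
    if (offset >= 6 && p[offset - 6] == 0xFF && p[offset - 5] == 0x15) return true;

    // FF D0..D7: CALL reg (2 bytes). The REX form 41 FF D0..D7 ends in the
    // same two bytes, so this test also covers CALL r8-r15.
    if (offset >= 2 && p[offset - 2] == 0xFF && (p[offset - 1] & 0xF8) == 0xD0) return true;

    return false;
}
```

Because this is pattern matching rather than real disassembly, it can report a CALL where the matching bytes are actually the tail of a different instruction. A production check would typically decode instructions properly instead of matching bytes backwards.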

Evasion: Choosing Post-CALL Return Addresses

SilentMoonwalk can counter this detection by specifically selecting return addresses that follow actual CALL instructions in the target function. Instead of pointing to an arbitrary post-prologue offset, the spoofer scans the target function for CALL instructions and uses the address immediately after a CALL as the return address. This makes the spoofed address indistinguishable from a real return point.

2. Call Chain Semantic Validation

A more sophisticated detection validates that the call chain is semantically plausible. For each pair of adjacent frames (caller, callee), the EDR verifies that the caller function actually contains a CALL instruction that could target the callee:

// Detection: verify caller actually calls callee
bool ValidateCallChainEdge(PVOID callerRetAddr, PVOID calleeFunc) {
    // The CALL instruction is at (callerRetAddr - sizeof(CALL))
    PBYTE callSite = (PBYTE)callerRetAddr - 5; // assuming CALL rel32

    if (callSite[0] == 0xE8) {
        // Decode the relative target
        INT32 relTarget = *(INT32*)(callSite + 1);
        PVOID callTarget = (PVOID)(callSite + 5 + relTarget);

        // Does this CALL actually target the callee function?
        // (or a function that eventually calls it)
        if (callTarget == calleeFunc)
            return true;

        // Could also be an indirect call - harder to validate
    }
    return false;  // Suspicious: caller doesn't appear to call callee
}
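The rel32 arithmetic in the routine above can be checked in isolation. The following sketch decodes a CALL rel32 target from a byte buffer instead of live memory; `DecodeCallRel32Target`, `imageBase`, and `retOffset` are hypothetical names for this example, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Decodes the target of a CALL rel32 whose return address is
// `imageBase + retOffset`. Returns false if the five bytes before the
// return address are not an E8 CALL or fall outside the buffer.
inline bool DecodeCallRel32Target(const std::vector<uint8_t>& code,
                                  uint64_t imageBase,
                                  size_t retOffset,
                                  uint64_t& target) {
    if (retOffset < 5 || retOffset > code.size()) return false;
    const size_t callSite = retOffset - 5;
    if (code[callSite] != 0xE8) return false;

    // Copy the signed 32-bit displacement (assumes a little-endian host).
    int32_t rel = 0;
    std::memcpy(&rel, &code[callSite + 1], sizeof(rel));

    // Target = address of the next instruction (the return address) + rel32.
    // Unsigned wraparound makes negative displacements work as expected.
    target = imageBase + retOffset + static_cast<uint64_t>(static_cast<int64_t>(rel));
    return true;
}
```

For example, the bytes `E8 0B 00 00 00` at image base 0x1000 have a return address of 0x1005 and decode to a target of 0x1010. A validator would compare that target with the callee's entry point, keeping in mind that indirect calls and import thunks need separate handling.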

3. Stack Region Validation

EDRs can verify that RSP values in the unwind chain fall within the thread's actual stack bounds (stored in the TEB):

// Detection: verify stack pointers are within thread stack bounds
bool IsRspInThreadStack(DWORD64 rsp) {
    // Get Thread Environment Block
    PNT_TIB pTib = (PNT_TIB)NtCurrentTeb();

    DWORD64 stackBase  = (DWORD64)pTib->StackBase;    // High address
    DWORD64 stackLimit = (DWORD64)pTib->StackLimit;   // Low address

    // RSP must be between StackLimit and StackBase
    return (rsp >= stackLimit && rsp < stackBase);
}

// If SilentMoonwalk allocates its synthetic stack via VirtualAlloc
// (separate from the thread's stack), the RSP values will be
// OUTSIDE the thread stack bounds --> detected!
//
// Mitigation: SilentMoonwalk should build synthetic frames ON the
// thread's actual stack (within StackLimit to StackBase range),
// rather than on a separately allocated buffer.
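In practice the bounds check runs once per frame during a stack walk. The sketch below applies it to a list of per-frame RSP values. `StackBounds` and `FirstSuspiciousFrame` are hypothetical names standing in for the TEB fields and the EDR's walker, and the monotonicity check is an extra assumption added for this example rather than something described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two TEB fields used above.
struct StackBounds {
    uint64_t limit;  // lowest valid address (StackLimit)
    uint64_t base;   // one past the highest valid address (StackBase)
};

// Returns the index of the first suspicious frame, or -1 if every frame
// passes. `frameRsps` holds the RSP recorded for each frame, innermost first.
inline int FirstSuspiciousFrame(const std::vector<uint64_t>& frameRsps,
                                const StackBounds& bounds) {
    uint64_t previous = 0;
    for (size_t i = 0; i < frameRsps.size(); ++i) {
        const uint64_t rsp = frameRsps[i];

        // Check 1: RSP must lie inside [limit, base).
        if (rsp < bounds.limit || rsp >= bounds.base) return static_cast<int>(i);

        // Check 2 (assumption of this sketch): the x64 stack grows down, so
        // walking toward callers should yield strictly increasing RSP values.
        if (i > 0 && rsp <= previous) return static_cast<int>(i);

        previous = rsp;
    }
    return -1;
}
```

With bounds of 0x10000 to 0x20000, the sequence 0x1F000, 0x1F040, 0x1F100 passes, while a frame at 0x500000 (for example, inside a separately allocated buffer) is reported at its index.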

4. Timing-Based Detection

Constructing a synthetic stack takes measurable time. If an EDR timestamps specific events (thread creation, API call, sleep entry), an abnormally long gap between them may indicate that stack manipulation took place. The table below summarizes each detection method and SilentMoonwalk's exposure to it.
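The timing comparison itself can be sketched as a pure function over measured intervals. Everything here is an assumption made for illustration: the name `IsDelayAnomalous`, the idea of keeping a per-event baseline mean and standard deviation, and the default multiplier of 3.

```cpp
#include <cassert>
#include <chrono>

// Flags an interval as anomalous when it exceeds the baseline mean by more
// than `k` standard deviations. The baseline values are assumed to come
// from earlier measurements of the same pair of events.
inline bool IsDelayAnomalous(std::chrono::nanoseconds observed,
                             std::chrono::nanoseconds baselineMean,
                             std::chrono::nanoseconds baselineStdDev,
                             double k = 3.0) {
    const double threshold =
        static_cast<double>(baselineMean.count()) +
        k * static_cast<double>(baselineStdDev.count());
    return static_cast<double>(observed.count()) > threshold;
}
```

With a baseline of 1000 ns and a standard deviation of 100 ns, an observed gap of 1200 ns is within the threshold and a gap of 50 microseconds is not. Because frame construction completes in microseconds, scheduler and cache noise can easily mask it, which limits how much weight this signal can carry.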

| Detection Method | What It Detects | SilentMoonwalk Exposure |
|---|---|---|
| CET Shadow Stacks | RSP/SSP return address mismatch | High: fundamental design conflict |
| Post-CALL validation | Return addr not after a CALL | Low: can be mitigated by careful addr selection |
| Semantic chain validation | Caller doesn't actually call callee | Medium: requires realistic chain selection |
| Stack bounds checking | RSP outside thread stack | High if using separate buffer, Low if using real stack |
| Saved register analysis | Implausible non-volatile reg values | Medium: plausible values mitigate this |
| Timing analysis | Abnormal delay before syscall | Low: construction is fast (microseconds) |

Inherent Limitations of SilentMoonwalk

1. Static Chain Templates

SilentMoonwalk uses pre-selected call chain templates. While it dynamically computes frame sizes, the choice of which functions to spoof is somewhat fixed. A truly dynamic approach would analyze the actual call chain that the target API produces during normal operation and replicate that exact chain.

2. Gadget Availability

The technique depends on finding suitable gadgets (JMP [RBX], ADD RSP, POP REG) within functions that have compatible UNWIND_INFO. On some Windows builds, the ideal gadgets may not exist, or they may exist only in functions with inconvenient frame sizes. This can limit which call chains can be spoofed.

3. Non-Volatile Register State

SilentMoonwalk must provide plausible values for all saved non-volatile registers in each frame. Currently, it uses heuristic approaches (module addresses, data section pointers). A sophisticated analyzer could detect patterns in these values that differ from what real function execution produces.

4. Single-Thread Focus

SilentMoonwalk spoofs the call stack for a single thread during a specific API call. It does not handle multi-threaded scenarios where multiple threads might be inspected simultaneously, or where an EDR correlates stack traces across threads.

Comparison: Stack Spoofing Tools

| Feature | ThreadStackSpoofer | CallStackSpoofingPOC | SilentMoonwalk | Draugr |
|---|---|---|---|---|
| Spoofed Frames | 1 (return addr only) | 1-2 (with ADD RSP gadget) | Full chain (N frames) | Full chain (N frames) |
| Unwind Compatible | No: breaks RtlVirtualUnwind | Partial: first frame only | Yes: all frames have valid unwind data | Yes: fake RUNTIME_FUNCTION/UNWIND_INFO entries |
| Dynamic | No: hardcoded addresses | Partially: one gadget resolved | Yes: parses .pdata at runtime | Yes: builds synthetic unwind metadata at runtime |
| Mechanism | Overwrite return addr before sleep | ROP with ADD RSP gadget | Multi-frame ROP + synthetic unwind chain | JMP [RBX] gadget chaining + synthetic RUNTIME_FUNCTION/UNWIND_INFO for BOFs |
| CET Resistant | No | No | No | No |
| Call Validation Resistant | No | Partially | With careful return addr selection | With synthetic unwind metadata |
| Implementation | Simple C/ASM | C++ with one gadget | C++ with gadget database | C++/ASM BOF with JMP [RBX] gadgets |

Draugr and Unwinder: The Next Generation

Draugr (by NtDallas) took a different approach oriented toward Cobalt Strike BOFs (Beacon Object Files). Rather than finding arbitrary ROP gadgets, Draugr provides call stack spoofing for BOFs by constructing synthetic stack frames using JMP [RBX] gadgets to chain frames together. It constructs fake RUNTIME_FUNCTION and UNWIND_INFO entries so that the synthetic frames pass RtlVirtualUnwind validation:

// Draugr's approach (conceptual):
// 1. Enumerate loaded modules for JMP [RBX] gadgets in functions
//    with valid RUNTIME_FUNCTION entries
// 2. Construct synthetic RUNTIME_FUNCTION and UNWIND_INFO entries
//    that describe the desired frame layout
// 3. Chain synthetic frames using JMP [RBX] gadgets, where each
//    frame's unwind data matches the stack layout
// 4. The resulting stack passes RtlVirtualUnwind validation because
//    every frame has proper unwind metadata

// Advantage: Purpose-built for Cobalt Strike BOFs with JMP [RBX]
// gadget chaining and fake unwind metadata.
// Works within the BOF execution context without sacrificial threads.

The Unwinder Approach

Unwinder (by Kudaes, at github.com/Kudaes/Unwinder) takes a different approach by implementing a complete x64 unwinder from scratch. This custom unwinder processes UNWIND_INFO structures identically to RtlVirtualUnwind but can also reverse the process: given a desired call chain, it computes the exact stack contents needed. This eliminates the gadget dependency entirely — no ROP chain is needed because the synthetic stack is computed mathematically from the unwind metadata.

Control Flow Guard (CFG)

CFG is a Windows mitigation that validates indirect call targets. Before every indirect CALL, the compiler inserts a check against a bitmap of valid call targets. How does this affect SilentMoonwalk?

// CFG validation on indirect calls:
// The compiler inserts:
//   call __guard_check_icall_fptr  ; validate target
//   call [rax]                      ; actual indirect call

// Impact on SilentMoonwalk:
// - JMP [RBX] is NOT a CALL, so CFG doesn't validate the target
// - However, if the spoofer uses CALL-based gadgets, CFG may block them
// - SilentMoonwalk deliberately uses JMP-based trampolines to avoid CFG
//
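// The CFG bitmap is process-wide; each module registers its valid targets
// (function entry points) through its load configuration. Mid-function
// gadgets are NOT valid targets for a checked indirect CALL, but a bare JMP
// gadget has no check in front of it, so the bitmap is never consulted.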
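The difference between a checked indirect CALL and an unchecked JMP can be modeled in a few lines. This is a toy sketch: `ToyCfg`, `InstrumentedCall`, and `UninstrumentedJmp` are hypothetical names, and a hash set stands in for the real bitmap, which is a compact structure indexed by address.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Toy stand-in for the set of valid CFG call targets (illustrative only).
struct ToyCfg {
    std::unordered_set<uint64_t> validTargets;

    // Models the check the compiler emits before an indirect CALL.
    bool CheckIndirectCall(uint64_t target) const {
        return validTargets.count(target) != 0;
    }
};

// An instrumented indirect CALL: the transfer is allowed only if the
// target passes the check.
inline bool InstrumentedCall(const ToyCfg& cfg, uint64_t target) {
    return cfg.CheckIndirectCall(target);
}

// A bare JMP gadget found in existing code: no check precedes it, so the
// valid-target set is never consulted and the transfer always happens.
inline bool UninstrumentedJmp(const ToyCfg& /*cfg*/, uint64_t /*target*/) {
    return true;
}
```

If 0x401000 is registered as a function entry point, `InstrumentedCall` accepts it and rejects a mid-function address such as 0x401037, while `UninstrumentedJmp` accepts both. CFG constrains where checked call sites can go, not where every control transfer can go.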

CFG + CET: The Combined Defense

When both CFG and CET are enabled, indirect calls are validated against the CFG bitmap (software check) and returns are validated against the shadow stack (hardware check). This combination significantly constrains ROP-based techniques. SilentMoonwalk's JMP [RBX] bypasses CFG but cannot bypass CET's shadow stack validation on the subsequent RET.

Detection Engineering: Building Detections

For blue team practitioners, here are actionable detection strategies ordered by implementation complexity:

| Priority | Detection | Implementation | False Positive Risk |
|---|---|---|---|
| 1 (Easy) | Enable CET shadow stacks | Link with /CETCOMPAT, enable in OS | Low (hardware-enforced) |
| 2 (Medium) | Stack bounds validation | Check RSP within TEB.StackBase/Limit during unwind | Low |
| 3 (Medium) | Post-CALL instruction check | Disassemble 2-6 bytes before each return address | Medium (some JIT code) |
| 4 (Hard) | Semantic chain validation | Verify caller-callee relationships via static analysis | Medium (indirect calls) |
| 5 (Hard) | Saved register plausibility | Check non-volatile reg values against expected ranges | High (wide variance in normal) |

The Future of Stack Spoofing

The arms race between stack spoofing and detection continues to evolve.

The Fundamental Lesson

SilentMoonwalk demonstrated that the x64 structured exception handling mechanism, designed for reliability and performance, creates an exploitable gap between what code actually executes and what the unwinding metadata reports. As long as the unwinder is stateless and trusts stack contents, some form of desynchronization attack will be possible. The defense community's response has been to add independent verification (shadow stacks, call validation) rather than trying to make the unwinder tamper-proof.

Pop Quiz: Detection & Countermeasures

Q1: How does Intel CET's shadow stack defeat SilentMoonwalk?

CET pushes return addresses to a shadow stack on every CALL and validates them on every RET. Since SilentMoonwalk uses JMP [RBX] (which doesn't push to the shadow stack), when the target API executes RET, the regular stack has the spoofed return address but the shadow stack has no corresponding entry (or a different one), causing a control protection fault.

Q2: What advantage does Draugr's approach have over SilentMoonwalk?

Draugr (by NtDallas) constructs synthetic RUNTIME_FUNCTION and UNWIND_INFO entries and uses JMP [RBX] gadgets to chain fake stack frames that pass RtlVirtualUnwind validation. It is purpose-built for Cobalt Strike BOFs, providing call stack spoofing within the BOF execution context. Its synthetic unwind metadata approach produces frames that survive structured unwinding (though, like SilentMoonwalk, it does not bypass CET).

Q3: Why is stack bounds checking (TEB.StackBase/StackLimit) an effective detection against naive implementations?

Each thread's TEB records the stack base (high address) and stack limit (low address). If SilentMoonwalk builds its synthetic stack on a separately allocated memory buffer, the RSP values during unwinding will point outside the thread's stack range. An EDR checking RSP against these bounds will immediately flag the frame as invalid. The mitigation is to build synthetic frames directly on the thread's real stack.