Module 7: Stack & Return Address Handling
Making the call stack look legitimate when execution is anything but normal.
Module Objective
Explore the advanced stack handling challenges in Ekko and similar sleep obfuscation techniques: what happens when NtContinue-invoked functions return, why the call stack matters for detection, how gadget selection affects stack legitimacy, and techniques for constructing clean return paths that fool call stack inspection tools.
1. The Return Address Problem
When NtContinue sets RIP to a target function (like VirtualProtect), the function executes normally. But at the end, it executes a RET instruction. RET pops 8 bytes from the stack (at [RSP]) and jumps to that address. The question is: what is at [RSP]?
```text
Normal CALL flow:
  CALL VirtualProtect  --> pushes return address, jumps to function
  ... function executes
  RET                  --> pops return address, jumps back to caller

NtContinue flow (Ekko):
  NtContinue(&ctx)     --> loads the CONTEXT: RIP = VirtualProtect,
                           RSP = captured RSP - 8 (set by Ekko beforehand;
                           nothing is pushed)
  ... function executes
  RET                  --> pops [RSP], which is whatever the timer
                           thread left in that slot; Ekko never writes it
```
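The difference between the two flows can be modelled in a few lines of C. This is a toy sketch, not Windows code:

- The stack is an array of 64-bit slots.
- `rsp` is a slot index.
- The helper names `toy_call`, `toy_context_entry` and `toy_ret` are made up for illustration.

A real CALL writes the return address before the callee runs. The NtContinue-style entry only moves RSP, so RET consumes whatever was already in the slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (not real Windows code): the stack is an array of 64-bit
   slots and "rsp" is a slot index. Lower index = lower address, so a
   push decrements rsp, like the real x64 stack. */
enum { STACK_SLOTS = 16 };
static uint64_t toy_stack[STACK_SLOTS];

/* CALL: push the return address; the callee starts with rsp pointing at it. */
static size_t toy_call(size_t rsp, uint64_t return_address) {
    assert(rsp > 0 && rsp <= STACK_SLOTS);
    rsp -= 1;                        /* RSP -= 8 */
    toy_stack[rsp] = return_address; /* [RSP] = return address */
    return rsp;
}

/* NtContinue-style entry as in Ekko: RSP comes from a saved context
   (captured value minus one slot), but nothing is written to [RSP]. */
static size_t toy_context_entry(size_t captured_rsp) {
    assert(captured_rsp > 0 && captured_rsp <= STACK_SLOTS);
    return captured_rsp - 1;         /* Rsp -= 8, slot left untouched */
}

/* RET: pop [RSP] and return the address execution would resume at. */
static uint64_t toy_ret(size_t *rsp) {
    uint64_t target = toy_stack[*rsp];
    *rsp += 1;
    return target;
}
```

For example, `toy_call(8, 0x1111)` followed by `toy_ret` resumes at 0x1111. If stale data 0xDEAD sits in slot 7, `toy_context_entry(8)` followed by `toy_ret` resumes at 0xDEAD, which is the Ekko situation.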
In Ekko's PoC, the value at [RSP] after the Rsp -= 8 adjustment is not controlled. Ekko never writes to that slot, so the return address is whatever data the timer thread's earlier activity left there. Each API call therefore returns to an address that the PoC does not choose.
Known Flaw in the PoC
This is one of the "known flaws" that Cracked5pider mentions in Ekko's README. Because the return address is uncontrolled, each API call that completes and executes RET continues at whatever address that stack slot happens to hold. The technique still works because the timer thread's infrastructure eventually regains control and services the next timer, but the intermediate state is undefined. In a production implant, this could cause instability, crashes, or detection.
2. Why the Call Stack Matters for Detection
Modern EDR products and threat hunting tools inspect the call stack (backtrace) of threads to identify suspicious behavior. A legitimate thread's call stack shows a clean chain of return addresses, each pointing into known, signed modules:
```text
Legitimate call stack:
  ntdll!NtWaitForSingleObject+0x14
  KERNELBASE!WaitForSingleObjectEx+0x8e
  kernel32!WaitForSingleObject+0x12
  myapp!main+0x45
  kernel32!BaseThreadInitThunk+0x14
  ntdll!RtlUserThreadStart+0x21

Suspicious call stack (Ekko PoC):
  ntdll!NtWaitForSingleObject+0x14
  KERNELBASE!WaitForSingleObjectEx+0x8e
  0x00007FF612340058                 <-- unbacked memory (beacon code!)
  ???                                <-- unknown return address
```
Call Stack Unwinding
Stack unwinding tools (like Hunt-Sleeping-Beacons) walk the stack in one of two ways:

- Following saved RBP values.
- Using each module's unwind metadata: the RUNTIME_FUNCTION entries in .pdata and the UNWIND_INFO records they reference. This is more reliable on x64 Windows, where most code does not maintain an RBP frame chain.

If a return address points into unbacked memory (private memory that is not mapped from an image file on disk), it is flagged as suspicious. Ekko's PoC does not address this, leaving return addresses that may point into the implant's memory region or other non-standard locations.
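Once a scanner has a list of return addresses, the core classification step is a range check against the images mapped in the process. The sketch below assumes a hard-coded, hypothetical module table. A real tool would build that table from the operating system, for instance by calling VirtualQueryEx and checking for Type == MEM_IMAGE. This is an illustration of the idea, not Hunt-Sleeping-Beacons' actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical module table entry: in a real scanner this data would
   come from the OS; here it is hard-coded for illustration. */
typedef struct {
    const char *name;
    uint64_t base;
    uint64_t size;
} module_range;

/* Return the name of the module whose image contains addr,
   or NULL if no known image maps it (i.e. "unbacked"). */
static const char *owning_module(uint64_t addr, const module_range *mods, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* Subtraction form avoids overflow in base + size. */
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size)
            return mods[i].name;
    }
    return NULL;
}

/* Count return addresses that do not fall inside any known image. */
static size_t count_unbacked(const uint64_t *frames, size_t nframes,
                             const module_range *mods, size_t nmods) {
    size_t unbacked = 0;
    for (size_t i = 0; i < nframes; i++)
        if (owning_module(frames[i], mods, nmods) == NULL)
            unbacked++;
    return unbacked;
}
```

Fed the "suspicious" stack from the example above (with made-up module bases), only the raw 0x00007FF612340058 frame falls outside every range, so `count_unbacked` reports one unbacked frame.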
3. Stack Alignment Deep Dive
The x64 ABI is strict about stack alignment. Let us trace through what happens with different RSP values to understand the Rsp -= 8 adjustment in precise detail:
| Scenario | RSP Value | Alignment | Result |
|---|---|---|---|
| Captured RSP | 0x...F000 | 16-byte aligned | Matches "before CALL" state |
| After Rsp -= 8 | 0x...EFF8 | 16-byte aligned - 8 | Matches "after CALL pushed return addr" |
| Function entry | 0x...EFF8 | 16-byte aligned - 8 | Correct alignment for function |
| After function's prologue (e.g. SUB RSP, 0x28) | 0x...EFD0 | 16-byte aligned | Local variables properly aligned |
If Ekko did not subtract 8, the function would start with RSP 16-byte aligned (the "before CALL" state) instead of 8 bytes below a 16-byte boundary, which violates the ABI. Code on the API's call path may use MOVAPS to save XMM registers into stack slots whose offsets assume correct entry alignment. With RSP off by 8, every such access is misaligned. The CPU then raises a general-protection fault (#GP), which Windows reports as STATUS_ACCESS_VIOLATION (0xC0000005), crashing the timer thread.
```text
Without Rsp -= 8:
  RSP = 0x...F000              (16-aligned at entry: wrong)
  Function does: SUB RSP, 0x28
  RSP = 0x...EFD8              (NOT 16-aligned!)
  MOVAPS [RSP+0x20], XMM6  --> writes to 0x...EFF8 (misaligned!)
  MOVAPS [RSP+0x10], XMM7  --> writes to 0x...EFE8 (misaligned, never reached)
  MOVAPS [RSP], XMM8       --> writes to 0x...EFD8 (misaligned, never reached)
  CRASH: #GP on the first MOVAPS, reported as STATUS_ACCESS_VIOLATION

With Rsp -= 8:
  RSP = 0x...EFF8              (16-aligned - 8)
  Function does: SUB RSP, 0x28
  RSP = 0x...EFD0              (16-aligned!)
  All MOVAPS operations target properly aligned addresses
  (0x...EFF0, 0x...EFE0, 0x...EFD0).
  No crash.
```
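The alignment rule in the trace reduces to simple arithmetic. At entry, RSP must equal 8 modulo 16; the prologue then subtracts an amount (0x28 in the trace) that lands RSP on a 16-byte boundary. This sketch checks the addresses from the trace. The helper names are illustrative, and the model assumes a prologue that consists of a single SUB RSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of 16, the alignment MOVAPS requires
   for its memory operand. */
static bool is_aligned16(uint64_t addr) {
    return (addr & 0xF) == 0;
}

/* x64 ABI rule at function entry: RSP == 8 (mod 16), because CALL
   pushed 8 bytes onto a 16-byte-aligned stack. */
static bool entry_rsp_valid(uint64_t entry_rsp) {
    return (entry_rsp & 0xF) == 8;
}

/* RSP after a prologue that only does SUB RSP, frame_size. */
static uint64_t rsp_after_sub(uint64_t entry_rsp, uint64_t frame_size) {
    return entry_rsp - frame_size;
}

/* Would MOVAPS [RSP+disp], xmm hit an aligned address for this RSP? */
static bool movaps_ok(uint64_t rsp, uint64_t disp) {
    return is_aligned16(rsp + disp);
}
```

Starting from the captured value 0x...F000 unmodified, `rsp_after_sub(..., 0x28)` gives 0x...EFD8, and all three MOVAPS targets fail `movaps_ok`. Starting from 0x...EFF8 (after Rsp -= 8), all three pass.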
4. Gadget Selection for Clean Returns
To fix the return address problem, an improved implementation would place a controlled value at [RSP] before each NtContinue context switch. This value would be a "gadget" — an address in a legitimate DLL that performs a useful action (or does nothing) and returns cleanly:
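```c
// Improved approach: control the return address
// FindGadget is a hypothetical helper (not a Windows API) that returns
// the address of the first match of a byte pattern inside a module.
// Here it looks for a lone "RET" (0xC3) in ntdll
PVOID retGadget = FindGadget(hNtdll, "\xC3", 1);
// Write the gadget address into the return-address slot,
// i.e. the stack location RopProtRW.Rsp points at
*(DWORD64*)(RopProtRW.Rsp) = (DWORD64)retGadget;
// Now when VirtualProtect executes RET, it jumps to the gadget
// (a single RET instruction in ntdll, a disk-backed module).
// The gadget's own RET then pops the next stack slot, so that slot
// must also hold a valid continuation for control to return cleanly
// into the timer infrastructure
```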
Common Gadget Types
- RET gadget (0xC3): Simply returns, passing control to the next address on the stack. Useful for "doing nothing" in a clean way.
- ADD RSP, XX; RET: Cleans up stack space and returns. Useful for skipping over shadow space or extra arguments.
- JMP [RBX] / JMP RAX: Indirect jump gadgets for trampolining through a register-controlled target. Used by FOLIAGE/AceLdr for more complex chains.
- POP RCX; RET: Loads a value from the stack into a register and returns. Classical ROP building block.
5. Constructing a Legitimate-Looking Stack
For maximal stealth, the call stack during the sleep window should look like a normal thread that happens to be waiting. Detection tools like Hunt-Sleeping-Beacons look for threads in a wait state whose call stack contains unbacked memory addresses. The ideal sleeping call stack would be:
```text
Ideal stealth call stack during sleep:
  ntdll!NtWaitForSingleObject+0x14     (the sleep wait)
  ntdll!RtlRegisterWait+0x??           (timer infrastructure)
  ntdll!TppTimerExpiration+0x??        (thread pool timer)
  ntdll!TppWorkerThread+0x??           (thread pool worker)
  kernel32!BaseThreadInitThunk+0x14    (thread start)
  ntdll!RtlUserThreadStart+0x21        (NT thread entry)

Every address is in a signed, disk-backed system DLL.
No addresses in unbacked/private memory.
```
Achieving this requires not just controlling the return address but also constructing fake stack frames with proper frame pointers (RBP chain) and unwind metadata that matches the fabricated call stack. This is significantly more complex than Ekko's PoC and is a feature of more advanced implementations.
6. Stack Spoofing Techniques
Several techniques have been developed to create legitimate-looking call stacks:
| Technique | Mechanism | Complexity | Used By |
|---|---|---|---|
| Frame Pointer Overwrite | Set RBP to create a chain of frames pointing into legitimate DLLs | Medium | Various custom implants |
| Synthetic Stack Frames | Write complete fake frames on the stack with return addresses, saved RBP, and shadow space matching real function prologues | High | Advanced sleep obfuscation |
| Thread Stack Spoofing | Replace the entire thread stack with a fabricated one before sleeping, restore after wakeup | Very High | Research implementations |
| Unwind Metadata Alignment | Ensure return addresses align with .pdata entries so stack unwinders produce valid results | High | Advanced tooling |
Ekko's PoC Does Not Spoof Stacks
Ekko's published proof-of-concept does not implement any stack spoofing. The call stack during sleep will contain addresses from the timer thread's real execution path, which likely includes one or more addresses pointing into the implant's (now encrypted) memory region. This is a detectable artifact. A production implementation should add stack frame construction to hide these addresses.
7. The Shadow Space Requirement
Beyond the return address, Windows API functions expect 32 bytes of "shadow space" above the return address. In a normal call, the caller allocates this. In Ekko's NtContinue-driven approach, this space must exist on the stack:
```text
Expected stack layout at function entry:
  High addresses
  ...
  [RSP + 0x20]   shadow[3]   (R9 home)
  [RSP + 0x18]   shadow[2]   (R8 home)
  [RSP + 0x10]   shadow[1]   (RDX home)
  [RSP + 0x08]   shadow[0]   (RCX home)
  [RSP + 0x00]   return address
  Low addresses  (RSP points here)
```
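The captured RSP points into the timer thread's existing stack, so the 32 bytes above the adjusted RSP already hold whatever data the timer thread left there. Callees use the shadow space to optionally store parameters, and many functions write to it. Ekko's timer callbacks may therefore overwrite 32 bytes of the timer thread's stack above each adjusted RSP.

This does not break the PoC in practice. The kernel does not reset user-mode stack contents between callbacks, so a likelier explanation is the ABI itself. RtlCaptureContext records its caller's state. If the captured RSP is the thread pool's RSP at the point where it called the timer callback, those 32 bytes are the home space the thread pool already reserved for that call, and a callee is always allowed to overwrite them.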
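The home-slot addresses in the layout above follow directly from the entry RSP. This illustrative sketch (the helper names are hypothetical) computes them, which shows exactly which 32 bytes a callee may overwrite. With Ekko's adjusted RSP of captured - 8, the range starts at the captured RSP itself.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the x64 Windows stack at function entry: [RSP] holds the
   return address, and the four 8-byte home slots for RCX, RDX, R8, R9
   sit at RSP+0x08 .. RSP+0x20. */
enum { SHADOW_SLOTS = 4, SLOT_SIZE = 8 };

/* Address of the home slot for register argument index 0..3
   (0 = RCX, 1 = RDX, 2 = R8, 3 = R9). */
static uint64_t home_slot_addr(uint64_t entry_rsp, unsigned arg_index) {
    assert(arg_index < SHADOW_SLOTS);
    return entry_rsp + (uint64_t)SLOT_SIZE * (1u + arg_index);
}

/* Half-open range [lo, hi) of caller-owned bytes that a callee may
   overwrite when it homes its register arguments. */
static void shadow_range(uint64_t entry_rsp, uint64_t *lo, uint64_t *hi) {
    *lo = home_slot_addr(entry_rsp, 0);
    *hi = home_slot_addr(entry_rsp, SHADOW_SLOTS - 1) + SLOT_SIZE;
}
```

For an entry RSP of 0x1000, the RCX home slot is at 0x1008, the R9 home slot at 0x1020, and the writable range is [0x1008, 0x1028), i.e. 32 bytes.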
8. Improving Ekko: Controlled Stack Layout
An improved version of Ekko could allocate a dedicated stack buffer and control its contents precisely:
```c
// Improved approach: dedicated stack for NtContinue operations
// The buffer must stay alive until the whole timer chain has run
BYTE FakeStack[0x1000] = { 0 }; // 4KB stack buffer
// Reserve 0x20 bytes at the top of the buffer for the callee's
// shadow space, so its home-slot writes stay inside FakeStack
DWORD64 stackTop = (DWORD64)&FakeStack[0x1000 - 0x20];
// Align down to 16 bytes (rounding down keeps us inside the buffer)
stackTop &= ~(DWORD64)0xF;
// For each context, use the controlled stack:
// Place return address at stackTop - 8 (retGadget as found earlier)
*(DWORD64*)(stackTop - 8) = (DWORD64)retGadget;
// Set RSP to stackTop - 8 (simulating post-CALL state)
RopProtRW.Rsp = stackTop - 8;
// Shadow space occupies stackTop .. stackTop + 0x1F, which lies inside
// FakeStack and is zero-initialized, so the callee can write it safely
// Now:
// 1. RSP is properly aligned (16-byte aligned - 8)
// 2. Return address is a controlled gadget in ntdll
// 3. Shadow space is clean, writable, and inside the buffer
// 4. No corruption of the timer thread's real stack
```
Benefits of a Controlled Stack
- Predictable return — Each function returns to a known gadget
- No stack corruption — Shadow space writes go to a dedicated buffer
- Stackable frames — Multiple frames can be constructed for a legitimate-looking backtrace
- Debugger-safe — Stack unwinding produces controlled results
9. Call Stack Inspection Tools
Understanding what defenders look for helps build better stack handling. Key tools that inspect sleeping thread stacks:
| Tool | Technique | What It Flags |
|---|---|---|
| Hunt-Sleeping-Beacons | Enumerates threads in wait state, walks call stack via StackWalk64 | Return addresses in unbacked (private) memory, anomalous frame chains |
| BeaconEye | Scans process memory for Cobalt Strike configuration structures | CS config patterns (not stack-specific but complementary) |
| Patriot | Inspects sleeping thread contexts via NtGetContextThread | RIP pointing into unbacked memory while thread is sleeping |
| Moneta | Scans for anomalous memory regions with unusual permissions | RWX pages, unbacked executable memory (even if encrypted) |
Against these tools, Ekko's PoC is exposed on all four fronts:

- It does not disguise the thread context state that Patriot inspects.
- It does not clean the call stack (Hunt-Sleeping-Beacons).
- It encrypts only the image region, so implant data held elsewhere in the process, such as heap allocations, stays in plaintext for configuration scanners like BeaconEye.
- It restores the image to PAGE_EXECUTE_READWRITE after waking, leaving the kind of RWX permission anomaly that Moneta flags.

Advanced implementations must address each of these vectors.
Knowledge Check
Q1: What is the primary stack-related flaw in Ekko's proof-of-concept?
Q2: Why do detection tools inspect the call stack of sleeping threads?
Q3: What would happen if Ekko did NOT perform the Rsp -= 8 adjustment?