Difficulty: Advanced

Module 7: Stack & Return Address Handling

Making the call stack look legitimate when execution is anything but normal.

Module Objective

Explore the advanced stack handling challenges in Ekko and similar sleep obfuscation techniques: what happens when NtContinue-invoked functions return, why the call stack matters for detection, how gadget selection affects stack legitimacy, and techniques for constructing clean return paths that fool call stack inspection tools.

1. The Return Address Problem

When NtContinue sets RIP to a target function (like VirtualProtect), the function executes normally. But at the end, it executes a RET instruction. RET pops 8 bytes from the stack (at [RSP]) and jumps to that address. The question is: what is at [RSP]?

Normal CALL flow:
  CALL VirtualProtect  -->  pushes return address, jumps to function
  ...                       function executes
  RET                  -->  pops return address, jumps back to caller

NtContinue flow (Ekko):
  NtContinue loads RIP = VirtualProtect, RSP = captured RSP - 8
  ...                       function executes
  RET                  -->  pops [RSP] ... which is whatever was
                            at that stack location from the
                            baseline context capture

In Ekko's PoC, the value at [RSP] after the Rsp -= 8 adjustment is not controlled. It is whatever data happened to be at that location on the timer thread's stack when the context was captured. This means the RET from each API call returns to an unpredictable address.
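The point that RET blindly consumes whatever sits at [RSP] can be illustrated with a toy model. This is only a sketch: `ToyCpu` and `toy_ret` are hypothetical names invented for illustration, and the model simulates a tiny stack in an array rather than touching a real thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8

/* Hypothetical toy model: a few 8-byte stack slots plus RSP and RIP. */
typedef struct {
    uint64_t slots[TOY_SLOTS]; /* slot i lives at address base + 8 * i */
    uint64_t base;             /* address of slots[0] in the simulated space */
    uint64_t rsp;
    uint64_t rip;
} ToyCpu;

/* Models RET: load RIP from [RSP], then add 8 to RSP. */
static void toy_ret(ToyCpu *cpu) {
    size_t index = (size_t)((cpu->rsp - cpu->base) / 8);
    assert(index < TOY_SLOTS); /* stay inside the simulated stack */
    cpu->rip = cpu->slots[index];
    cpu->rsp += 8;
}
```

If the slot at RSP was never written deliberately, `toy_ret` still loads it into RIP without complaint, which is the situation the paragraph above describes.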

Known Flaw in the PoC

This is one of the "known flaws" that Cracked5pider mentions in Ekko's README. Because the return address is uncontrolled, the RET at the end of each API call transfers execution to whatever address that stack slot happens to hold. The technique still works because the timer thread's infrastructure eventually regains control and services the next timer, but the intermediate state is undefined. In a production implant, this could cause instability, crashes, or detection.

2. Why the Call Stack Matters for Detection

Modern EDR products and threat hunting tools inspect the call stack (backtrace) of threads to identify suspicious behavior. A legitimate thread's call stack shows a clean chain of return addresses, each pointing into known, signed modules:

Legitimate call stack:
  ntdll!NtWaitForSingleObject+0x14
  KERNELBASE!WaitForSingleObjectEx+0x8e
  kernel32!WaitForSingleObject+0x12
  myapp!main+0x45
  kernel32!BaseThreadInitThunk+0x14
  ntdll!RtlUserThreadStart+0x21

Suspicious call stack (Ekko PoC):
  ntdll!NtWaitForSingleObject+0x14
  KERNELBASE!WaitForSingleObjectEx+0x8e
  0x00007FF612340058  <-- unbacked memory (beacon code!)
  ???                  <-- unknown return address

Call Stack Unwinding

Stack unwinding tools (like Hunt-Sleeping-Beacons) walk the stack frame by frame. On x64 Windows the walk is driven mainly by the unwind metadata (.pdata/.xdata sections) of loaded modules, with saved RBP values used only where a function establishes a frame pointer. If a return address points into unbacked memory (memory not backed by a file on disk), it is flagged as suspicious. Ekko's PoC does not address this, leaving return addresses that may point into the implant's memory region or other non-standard locations.
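From the defender's side, the core check is a range lookup: does each return address fall inside some loaded image? The sketch below assumes the module list has already been collected. `ModuleRange`, `find_backing_module`, and `count_unbacked_frames` are hypothetical names, and a real tool would ask the memory manager instead (for example, checking for `MEM_IMAGE` regions via `VirtualQuery`).

```c
#include <stddef.h>
#include <stdint.h>

/* One loaded image covering [base, base + size). Supplied by the caller. */
typedef struct {
    const char *name;
    uint64_t base;
    uint64_t size;
} ModuleRange;

/* Returns the module containing addr, or NULL if no module backs it. */
static const ModuleRange *find_backing_module(const ModuleRange *mods,
                                              size_t mod_count,
                                              uint64_t addr) {
    for (size_t i = 0; i < mod_count; i++) {
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size) {
            return &mods[i];
        }
    }
    return NULL;
}

/* Counts return addresses in a captured backtrace that no module backs. */
static size_t count_unbacked_frames(const ModuleRange *mods, size_t mod_count,
                                    const uint64_t *frames, size_t frame_count) {
    size_t unbacked = 0;
    for (size_t i = 0; i < frame_count; i++) {
        if (find_backing_module(mods, mod_count, frames[i]) == NULL) {
            unbacked++;
        }
    }
    return unbacked;
}
```

A thread in a wait state with a non-zero unbacked count is the kind of result such a tool would surface for review. The module bases used to exercise this sketch are made up.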

3. Stack Alignment Deep Dive

The x64 ABI is strict about stack alignment. Let us trace through what happens with different RSP values to understand the Rsp -= 8 adjustment in precise detail:

Scenario	RSP Value	Alignment	Result
Captured RSP	`0x...F000`	16-byte aligned	Matches "before CALL" state
After Rsp -= 8	`0x...EFF8`	16-byte aligned - 8	Matches "after CALL pushed return addr"
Function entry	`0x...EFF8`	16-byte aligned - 8	Correct alignment for function
After function's SUB RSP, XX	`0x...EFn0`	16-byte aligned	Local variables properly aligned

If Ekko did not subtract 8, the function would see RSP at 16-byte alignment (the "before CALL" state), which is not what a callee expects at entry. Functions such as VirtualProtect, or the routines they call, may use aligned SSE instructions like MOVAPS on stack addresses computed relative to RSP, and these require a 16-byte-aligned memory operand. If RSP is off by 8 bytes, such an instruction raises a general-protection fault (#GP), which Windows reports as an access violation (STATUS_ACCESS_VIOLATION, 0xC0000005), crashing the timer thread.

Without Rsp -= 8:
  RSP = 0x...F000  (16-aligned)
  Function does: SUB RSP, 0x28
  RSP = 0x...EFD8  (NOT 16-aligned!)
  MOVAPS [RSP+0x20], XMM6  -->  writes to 0x...EFF8 (misaligned!)
  MOVAPS [RSP+0x10], XMM7  -->  writes to 0x...EFE8 (misaligned!)
  MOVAPS [RSP], XMM8       -->  writes to 0x...EFD8 (misaligned!)
  CRASH: the first misaligned MOVAPS raises #GP

With Rsp -= 8:
  RSP = 0x...EFF8  (16-aligned - 8)
  Function does: SUB RSP, 0x28
  RSP = 0x...EFD0  (16-aligned!)
  All MOVAPS operations target properly aligned addresses.
  No crash.
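The two traces above are plain modular arithmetic, which the following sketch reproduces. The helper names `is_aligned16` and `rsp_after_prologue` are hypothetical, and the concrete RSP value in the usage note is an arbitrary example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of 16, as MOVAPS requires for memory operands. */
static bool is_aligned16(uint64_t addr) {
    return (addr & 0xF) == 0;
}

/* RSP after a prologue that executes SUB RSP, frame_size. */
static uint64_t rsp_after_prologue(uint64_t entry_rsp, uint64_t frame_size) {
    assert(frame_size % 8 == 0); /* x64 frames are sized in 8-byte units */
    return entry_rsp - frame_size;
}
```

For a captured RSP of 0x14F000, `rsp_after_prologue(0x14F000, 0x28)` gives 0x14EFD8, which fails `is_aligned16`, while `rsp_after_prologue(0x14F000 - 8, 0x28)` gives 0x14EFD0, which passes.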

4. Gadget Selection for Clean Returns

To fix the return address problem, an improved implementation would place a controlled value at [RSP] before each NtContinue context switch. This value would be a "gadget" — an address in a legitimate DLL that performs a useful action (or does nothing) and returns cleanly:

// Improved approach: control the return address
// Find a "RET" gadget in ntdll (just returns immediately)
PVOID retGadget = FindGadget(hNtdll, "\xC3", 1);

// Write the gadget address as the return address
// by placing it at the RSP location in the context
*(DWORD64*)(RopProtRW.Rsp) = (DWORD64)retGadget;

// Now when VirtualProtect does RET, it jumps to the
// gadget (a single RET instruction in ntdll), which
// returns cleanly into the timer infrastructure

Common Gadget Types

RET gadget (0xC3) — Simply returns, passing control to the next address on the stack. Useful for "doing nothing" in a clean way.
ADD RSP, XX; RET — Cleans up stack space and returns. Useful for skipping over shadow space or extra arguments.
JMP [RBX] / JMP RAX — Indirect jump gadgets for trampolining through a register-controlled target. Used by FOLIAGE/AceLdr for more complex chains.
POP RCX; RET — Loads a value from the stack into a register and returns. Classical ROP building block.

5. Constructing a Legitimate-Looking Stack

For maximal stealth, the call stack during the sleep window should look like a normal thread that happens to be waiting. Detection tools like Hunt-Sleeping-Beacons look for threads in a wait state whose call stack contains unbacked memory addresses. The ideal sleeping call stack would be:

Ideal stealth call stack during sleep:
  ntdll!NtWaitForSingleObject+0x14     (the sleep wait)
  ntdll!RtlRegisterWait+0x??           (timer infrastructure)
  ntdll!TppTimerExpiration+0x??        (thread pool timer)
  ntdll!TppWorkerThread+0x??           (thread pool worker)
  kernel32!BaseThreadInitThunk+0x14    (thread start)
  ntdll!RtlUserThreadStart+0x21       (NT thread entry)

Every address is in a signed, disk-backed system DLL.
No addresses in unbacked/private memory.

Achieving this requires not just controlling the return address but also constructing fake stack frames whose sizes and layout agree with the unwind metadata (.pdata/.xdata) of the functions the fabricated return addresses point into, plus a consistent RBP chain for any tool that follows frame pointers. This is significantly more complex than Ekko's PoC and is a feature of more advanced implementations.

6. Stack Spoofing Techniques

Several techniques have been developed to create legitimate-looking call stacks:

Technique	Mechanism	Complexity	Used By
Frame Pointer Overwrite	Set RBP to create a chain of frames pointing into legitimate DLLs	Medium	Various custom implants
Synthetic Stack Frames	Write complete fake frames on the stack with return addresses, saved RBP, and shadow space matching real function prologues	High	Advanced sleep obfuscation
Thread Stack Spoofing	Replace the entire thread stack with a fabricated one before sleeping, restore after wakeup	Very High	Research implementations
Unwind Metadata Alignment	Ensure return addresses align with .pdata entries so stack unwinders produce valid results	High	Advanced tooling

Ekko's PoC Does Not Spoof Stacks

Ekko's published proof-of-concept does not implement any stack spoofing. The call stack during sleep will contain addresses from the timer thread's real execution path, which likely includes one or more addresses pointing into the implant's (now encrypted) memory region. This is a detectable artifact. A production implementation should add stack frame construction to hide these addresses.

7. The Shadow Space Requirement

Beyond the return address, Windows API functions expect 32 bytes of "shadow space" above the return address. In a normal call, the caller allocates this. In Ekko's NtContinue-driven approach, this space must exist on the stack:

Expected stack layout at function entry:

High addresses
  ...
  [RSP + 0x20]  shadow[3] (R9 home)
  [RSP + 0x18]  shadow[2] (R8 home)
  [RSP + 0x10]  shadow[1] (RDX home)
  [RSP + 0x08]  shadow[0] (RCX home)
  [RSP + 0x00]  return address
Low addresses (RSP points here)

Since the captured RSP points into the timer thread's existing stack, the 32 bytes above the return address slot already contain whatever data was on that stack. Callees may use the shadow space to spill their register parameters or as scratch storage, and many functions do write to it. This means each NtContinue-invoked call may overwrite 32 bytes of the timer thread's stack above the adjusted RSP. In the PoC this does not cause visible problems, presumably because the thread pool dispatcher does not depend on those bytes surviving across callback invocations.
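To make the affected byte range concrete, the sketch below computes the slots a callee is entitled to touch at and above RSP on entry, following the layout shown earlier in this section. `EntryLayout` and `layout_at_entry` are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Slots at and above RSP when a function is entered (x64 Windows ABI). */
typedef struct {
    uint64_t return_address_slot; /* [RSP + 0x00] */
    uint64_t home[4];             /* RCX, RDX, R8, R9 homes at +0x08..+0x20 */
    uint64_t first_stack_arg;     /* [RSP + 0x28], just past the shadow space */
} EntryLayout;

/* Derives every slot address from the RSP value seen at function entry. */
static EntryLayout layout_at_entry(uint64_t entry_rsp) {
    EntryLayout layout;
    layout.return_address_slot = entry_rsp;
    for (int i = 0; i < 4; i++) {
        layout.home[i] = entry_rsp + 0x08 + 8 * (uint64_t)i;
    }
    layout.first_stack_arg = entry_rsp + 0x28;
    return layout;
}
```

With an adjusted RSP of 0x14EFF8 (an arbitrary example), the four home slots span 0x14F000 through 0x14F01F, which is exactly the 32 bytes starting at the originally captured RSP.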

8. Improving Ekko: Controlled Stack Layout

An improved version of Ekko could allocate a dedicated stack buffer and control its contents precisely:

// Improved approach: dedicated stack for NtContinue operations
BYTE FakeStack[0x1000] = { 0 };  // 4KB stack buffer
DWORD64 stackTop = (DWORD64)&FakeStack[0x1000];

// Align down to 16 bytes, then reserve 32 bytes of shadow space
// so the callee's home-slot writes stay inside the buffer
stackTop &= ~0xFULL;
stackTop -= 0x20;

// For each context, use the controlled stack:
// Place return address at stackTop - 8
*(DWORD64*)(stackTop - 8) = (DWORD64)retGadget;

// Set RSP to stackTop - 8 (simulating post-CALL state)
RopProtRW.Rsp = stackTop - 8;

// Shadow space at stackTop, stackTop+8, stackTop+16, stackTop+24
// is all zeroed - safe for callee to write

// Now:
// 1. RSP is properly aligned (16-byte aligned - 8)
// 2. Return address is a controlled gadget in ntdll
// 3. Shadow space is clean and writable
// 4. No corruption of the timer thread's real stack

Benefits of a Controlled Stack

Predictable return — Each function returns to a known gadget
No stack corruption — Shadow space writes go to a dedicated buffer
Stackable frames — Multiple frames can be constructed for a legitimate-looking backtrace
Debugger-safe — Stack unwinding produces controlled results

9. Call Stack Inspection Tools

Understanding what defenders look for helps build better stack handling. Key tools that inspect sleeping thread stacks:

Tool	Technique	What It Flags
Hunt-Sleeping-Beacons	Enumerates threads in wait state, walks call stack via `StackWalk64`	Return addresses in unbacked (private) memory, anomalous frame chains
BeaconEye	Scans process memory for Cobalt Strike configuration structures	CS config patterns (not stack-specific but complementary)
Patriot	Inspects sleeping thread contexts via `NtGetContextThread`	RIP pointing into unbacked memory while thread is sleeping
Moneta	Scans for anomalous memory regions with unusual permissions	RWX pages, unbacked executable memory (even if encrypted)

Measured against these tools, Ekko's PoC remains exposed on several fronts: it does not spoof the thread context (Patriot), it does not clean the call stack (Hunt-Sleeping-Beacons), it leaves configuration structures readable whenever the implant is awake (BeaconEye, pre-sleep), and it toggles permissions on memory regions that a scanner can flag as anomalous (Moneta). Advanced implementations must address each of these vectors.

Knowledge Check

Q1: What is the primary stack-related flaw in Ekko's proof-of-concept?

A) The stack is too small for the CONTEXT structures

B) RSP is not aligned to 16 bytes

C) The return address at [RSP] is not controlled, so RET after each API call jumps to an unpredictable location

D) The shadow space is not allocated

Q2: Why do detection tools inspect the call stack of sleeping threads?

A) To check if the thread is using too much CPU

B) Return addresses pointing into unbacked/private memory indicate the thread was executing injected code before sleeping

C) To verify that all DLLs are properly loaded

D) To measure the thread's stack depth for performance analysis

Q3: What would happen if Ekko did NOT perform the Rsp -= 8 adjustment?

A) Target functions would see a misaligned RSP, causing aligned SSE instructions such as MOVAPS to fault

B) The encryption key would be overwritten

C) NtContinue would refuse to load the context

D) The event would be signaled prematurely
