Difficulty: Intermediate

Module 6: Context Manipulation

Capturing, cloning, and weaponizing the CONTEXT structure for controlled execution.

Module Objective

Deep dive into how Ekko captures the timer thread's context with RtlCaptureContext, why each CONTEXT is cloned from this baseline, what the Rsp -= 8 adjustment does, and how RIP control turns a data structure into an execution primitive. This module covers the precise mechanics that make Ekko's context-oriented programming work.

1. RtlCaptureContext — Capturing Thread State

RtlCaptureContext is an ntdll function that fills a CONTEXT structure with the current thread's register state at the point of the call. It is defined as:

// RtlCaptureContext - captures current thread state
// Exported by ntdll.dll
//
// VOID RtlCaptureContext(
//     PCONTEXT ContextRecord  // Output: filled with current registers
// );
//
// This function captures:
//   - RIP (pointing to the instruction after the call)
//   - RSP (current stack pointer)
//   - All general-purpose registers
//   - RFLAGS
//   - Segment registers
//   - XMM registers

Ekko uses RtlCaptureContext as the callback for Timer 0, passing &CtxThread as the parameter. Since the timer fires on the timer thread (due to WT_EXECUTEINTIMERTHREAD), the captured context reflects the timer thread's register state — not the main thread's.

Why the Timer Thread's Context Matters

The captured context must come from the timer thread because all subsequent NtContinue calls execute on that same thread. If Ekko captured the main thread's context instead, the RSP would point to the main thread's stack, segment registers might differ, and NtContinue would corrupt the timer thread's state. By capturing from the timer thread itself, Ekko ensures all cloned contexts have a stack pointer, segment registers, and flags that are valid for the thread that will actually execute them.

2. What RtlCaptureContext Captures

The captured CONTEXT includes these key fields that Ekko depends on:

Register/Field | Captured Value | Ekko's Use
RIP | Address after RtlCaptureContext call | Overwritten with target API address
RSP | Timer thread's current stack pointer | Adjusted with Rsp -= 8 for alignment
SegCs | Code segment selector (0x33 for x64) | Kept as-is; must be valid for x64 execution
SegSs | Stack segment selector | Kept as-is
RFLAGS | Current processor flags | Kept as-is; must be reasonable for API execution
RCX, RDX, R8, R9 | Current values (unimportant) | Overwritten with API arguments
MxCsr | SSE control register | Kept as-is; prevents floating-point exceptions

The Capture Timing Issue

There is a subtle race condition in Ekko's PoC. After queueing Timer 0 (whose callback is RtlCaptureContext), the main thread waits 50 ms (WaitForSingleObject(hEvent, 0x32)) before reading the captured context. The 50 ms window is a heuristic: the capture does not signal hEvent, so the wait simply times out, and if the timer takes longer than 50 ms to fire and complete, the main thread may read a CONTEXT that is not fully written. A more robust approach would use a separate synchronization event to confirm that the capture completed.

3. Cloning the Baseline Context

After capturing the baseline, Ekko clones it into all six operational contexts:

memcpy( &RopProtRW, &CtxThread, sizeof(CONTEXT) );
memcpy( &RopMemEnc, &CtxThread, sizeof(CONTEXT) );
memcpy( &RopDelay,  &CtxThread, sizeof(CONTEXT) );
memcpy( &RopMemDec, &CtxThread, sizeof(CONTEXT) );
memcpy( &RopProtRX, &CtxThread, sizeof(CONTEXT) );
memcpy( &RopSetEvt, &CtxThread, sizeof(CONTEXT) );

Each memcpy copies all 1232 bytes of the CONTEXT structure. After cloning, each operational context is an exact copy of the timer thread's state. Ekko then selectively modifies only the registers it needs:

The Minimal Modification Principle

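Ekko modifies the fewest registers possible in each context: Rsp (the -8 adjustment), Rip (the target function address), and whichever of RCX, RDX, R8, and R9 the target function reads as arguments.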

All other registers (segment selectors, flags, XMM state, etc.) are inherited from the baseline capture. This minimizes the chance of an invalid state causing a crash or exception.
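The clone-then-patch pattern can be modeled without any Windows headers. The sketch below is illustrative only: MockContext and clone_and_patch are hypothetical names, and the struct carries just a handful of fields rather than the real 1232-byte CONTEXT layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CONTEXT: only a few fields are modeled
   so the sketch compiles on any platform. */
typedef struct MockContext {
    uint64_t Rip, Rsp;
    uint64_t Rcx, Rdx, R8, R9;
    uint16_t SegCs, SegSs;
    uint32_t EFlags, MxCsr;
} MockContext;

/* Clone the baseline, then patch only the stack pointer, the
   instruction pointer, and the argument registers the target reads. */
static MockContext clone_and_patch(const MockContext *baseline,
                                   uint64_t target,
                                   const uint64_t *args, size_t argc)
{
    MockContext ctx;

    assert(baseline != NULL && argc <= 4);
    assert(argc == 0 || args != NULL);

    memcpy(&ctx, baseline, sizeof ctx);  /* inherit every field */
    ctx.Rsp -= 8;                        /* return-address slot */
    ctx.Rip  = target;                   /* where execution resumes */

    if (argc > 0) ctx.Rcx = args[0];
    if (argc > 1) ctx.Rdx = args[1];
    if (argc > 2) ctx.R8  = args[2];
    if (argc > 3) ctx.R9  = args[3];
    return ctx;
}
```

Every field that is not explicitly patched (segment selectors, flags, MxCsr, unused argument registers) keeps the baseline value, which is the property the principle relies on.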

4. RSP Pivoting: The Rsp -= 8 Adjustment

Every operational context includes this adjustment:

RopProtRW.Rsp -= 8;
RopMemEnc.Rsp -= 8;
RopDelay.Rsp  -= 8;
RopMemDec.Rsp -= 8;
RopProtRX.Rsp -= 8;
RopSetEvt.Rsp -= 8;

This single line is critical and easy to overlook. To understand why it is necessary, we need to understand the x64 stack alignment requirement and the call instruction's behavior:

The x64 ABI Stack Alignment Rule

The Microsoft x64 calling convention requires that at the point of a CALL instruction, RSP must be 16-byte aligned. The CALL instruction itself pushes an 8-byte return address onto the stack, making RSP 16-byte-aligned-minus-8 at function entry. Functions expect this alignment and may use SSE instructions (like MOVAPS) that require 16-byte-aligned operands.

Stack Alignment During a Normal CALL

Before CALL: RSP = 0x...0 (16-byte aligned)
CALL pushes the return address: RSP -= 8
At function entry: RSP = 0x...8 (16-byte aligned minus 8)

Why Ekko Needs the Adjustment

When NtContinue restores a context, it sets RSP to whatever value is in the CONTEXT structure and sets RIP to the target function. But NtContinue does not execute a CALL instruction — it directly sets the registers. This means no return address is pushed onto the stack.

The captured RSP from RtlCaptureContext reflects the stack state at capture time. By subtracting 8, Ekko simulates the effect of a CALL instruction having pushed a return address. The target function sees RSP at the expected alignment (16-byte-aligned minus 8) and operates correctly:

Captured RSP:     0x00000010AA00F000  (16-byte aligned)
After Rsp -= 8:   0x00000010AA00EFF8  (16-byte aligned - 8)

This matches what a function expects after a CALL instruction.
Without the adjustment, MOVAPS and other aligned SSE instructions
inside VirtualProtect or SystemFunction032 could fault.
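
The alignment arithmetic can be checked with plain integer math. This is a minimal sketch with hypothetical helper names; it models only the pointer values and touches no real stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RSP as required immediately before a CALL: a multiple of 16. */
static bool aligned_for_call(uint64_t rsp)
{
    return (rsp & 0xF) == 0;
}

/* RSP as a callee sees it on entry: 16-byte aligned minus 8. */
static bool aligned_at_entry(uint64_t rsp)
{
    return (rsp & 0xF) == 8;
}

/* Model the 8-byte push that CALL would have performed. */
static uint64_t simulate_call_push(uint64_t captured_rsp)
{
    assert(aligned_for_call(captured_rsp));
    return captured_rsp - 8;
}
```

With the example value above, simulate_call_push(0x00000010AA00F000) yields 0x00000010AA00EFF8, which satisfies aligned_at_entry, while the unadjusted value does not.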

The 8-Byte Slot

The Rsp -= 8 also reserves space where a return address would normally be. When the target function executes RET, it pops 8 bytes from this location. The value at that address depends on whatever was on the timer thread's stack at that offset — this is one of Ekko's known imperfections. In the PoC, the return from each API call goes to whatever address happens to be at that stack location. The timer infrastructure handles recovering from this, but it is not a clean return path. Module 7 covers this in detail.

5. RIP Control — Directing Execution

Setting RIP in the CONTEXT is the most fundamental operation in Ekko's technique. It determines where execution goes after NtContinue restores the context:

// Each context's RIP points to a different Windows API:
RopProtRW.Rip = (DWORD64)VirtualProtect;      // kernel32.dll
RopMemEnc.Rip = (DWORD64)SysFunc032;          // advapi32.dll
RopDelay.Rip  = (DWORD64)WaitForSingleObject; // kernel32.dll
RopMemDec.Rip = (DWORD64)SysFunc032;          // advapi32.dll
RopProtRX.Rip = (DWORD64)VirtualProtect;      // kernel32.dll
RopSetEvt.Rip = (DWORD64)SetEvent;            // kernel32.dll

All target functions reside in system DLLs whose load addresses stay fixed for the lifetime of the process (ASLR randomizes system DLL bases per boot). Since these DLLs are not part of the implant's image, they remain functional even after the image is encrypted and marked non-executable. This is the key insight that makes the entire technique work.

6. Argument Setup via Registers

The x64 calling convention passes the first four integer/pointer arguments in RCX, RDX, R8, and R9. Ekko sets these registers in each CONTEXT to provide the correct arguments to each target function:

Timer | Function | RCX (arg1) | RDX (arg2) | R8 (arg3) | R9 (arg4)
1 | VirtualProtect | ImageBase | ImageSize | PAGE_READWRITE | &OldProtect
2 | SystemFunction032 | &Img | &Key | unused | unused
3 | WaitForSingleObject | NtCurrentProcess() | SleepTime | unused | unused
4 | SystemFunction032 | &Img | &Key | unused | unused
5 | VirtualProtect | ImageBase | ImageSize | PAGE_EXECUTE_READWRITE | &OldProtect
6 | SetEvent | hEvent | unused | unused | unused

Unused Registers

For functions that take fewer than four arguments (SystemFunction032 and WaitForSingleObject take 2, SetEvent takes 1), the unused argument registers (R8 and R9, plus RDX for SetEvent) retain their values from the baseline capture. These leftover values are ignored by the target function since it only reads the registers it needs.
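
Which registers are left holding baseline values follows directly from the argument count. A small sketch, with the hypothetical helper stale_arg_regs, makes the mapping explicit:

```c
#include <assert.h>
#include <stddef.h>

/* Integer/pointer argument registers in x64 calling-convention order. */
static const char *const ARG_REGS[4] = { "RCX", "RDX", "R8", "R9" };

/* Fill `out` with the argument registers a callee taking `argc`
   parameters never reads; return how many there are. */
static size_t stale_arg_regs(size_t argc, const char *out[4])
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = argc; i < 4; ++i)
        out[n++] = ARG_REGS[i];
    return n;
}
```

For a one-argument function such as SetEvent this reports RDX, R8, and R9; for a two-argument function it reports R8 and R9; for four or more arguments it reports none.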

7. Stack Frame Considerations

Beyond the RSP alignment, the x64 calling convention requires a 32-byte "shadow space" (also called "home space") on the stack above the return address. This space is reserved by the caller for the callee to optionally store parameters:

Stack layout expected by a function on entry:

RSP + 0x28    [5th argument, if any]
RSP + 0x20    [shadow space for R9]
RSP + 0x18    [shadow space for R8]
RSP + 0x10    [shadow space for RDX]
RSP + 0x08    [shadow space for RCX]
RSP + 0x00    [return address]     <-- RSP points here

Ekko's Rsp -= 8 accounts for the return address slot. The shadow space above it already exists on the timer thread's stack from the captured state. As long as the timer thread's stack has at least 32 bytes of usable space above the adjusted RSP, the target functions have valid shadow space. Since the timer thread has a full-sized stack, this is not a problem in practice.
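
The slot offsets in the layout above follow a simple formula: the slot for 0-based argument i sits at entry RSP + 0x08 + 8*i. The sketch below uses hypothetical helper names and works on plain integers only.

```c
#include <assert.h>
#include <stdint.h>

/* Offset from entry RSP of the stack slot for 0-based argument
   `index`: indices 0..3 are the home slots for RCX, RDX, R8, R9;
   index 4 and up are stack-passed arguments. */
static uint64_t arg_slot_offset(unsigned index)
{
    return 0x08u + 8u * (uint64_t)index;
}

/* Absolute address of that slot for a given entry RSP. */
static uint64_t arg_slot_address(uint64_t entry_rsp, unsigned index)
{
    assert((entry_rsp & 0xF) == 8);  /* entry alignment: 16n + 8 */
    return entry_rsp + arg_slot_offset(index);
}
```

Taking the adjusted example value 0x00000010AA00EFF8 as entry RSP, the four home slots span 0x00000010AA00F000 through 0x00000010AA00F01F, so the shadow space begins exactly at the originally captured RSP.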

8. Context Lifetime & Stack Variables

A critical detail is that all CONTEXT structures, USTRING descriptors, the key buffer, and OldProtect are local variables in EkkoObf. They live on the main thread's stack frame:

VOID EkkoObf( DWORD SleepTime )
{
    CONTEXT CtxThread   = { 0 };   // Stack variable
    CONTEXT RopProtRW   = { 0 };   // Stack variable
    // ... all contexts on main thread's stack

    CHAR    KeyBuf[16]  = { ... }; // Stack variable
    USTRING Key         = { 0 };   // Stack variable
    USTRING Img         = { 0 };   // Stack variable
    DWORD   OldProtect  = 0;       // Stack variable

    // ... setup and queue timers ...

    WaitForSingleObject( hEvent, INFINITE );  // BLOCKS HERE
    // While blocked, stack frame is preserved
    // All stack variables remain valid

    DeleteTimerQueue( hTimerQueue );
}  // Stack frame destroyed AFTER timers complete

Why This Works

The main thread blocks on WaitForSingleObject until Timer 6 signals the event. Because the main thread is blocked rather than returned, its stack frame is preserved. The CONTEXT structures, USTRING descriptors, and OldProtect variable remain at valid memory addresses for the entire duration of the timer chain. If EkkoObf returned before the timers completed, these stack variables would be destroyed, and the timer callbacks would read and write invalid memory: a use-after-return bug, the stack analogue of use-after-free.

9. Visualizing the Memory Layout

Memory Relationships During Timer Chain

Main thread stack: CtxThread, RopProtRW..., Key, Img, OldProtect (all valid while blocked)
  <- read by
Timer thread: NtContinue loads each CONTEXT from the main thread's stack
  -> calls
System DLLs: VirtualProtect, SysFunc032, SetEvent, WaitForSingleObject

The timer thread reads CONTEXT structures from the main thread's stack, then NtContinue redirects execution into system DLL functions. The system functions operate on the process image (encrypting/decrypting, changing permissions) and the event handle. All of these exist in process-global memory and are accessible from any thread.

Knowledge Check

Q1: Why does Ekko subtract 8 from RSP in each operational context?

A) To allocate space for local variables in the target function
B) To simulate the effect of a CALL instruction pushing a return address, maintaining the expected stack alignment
C) To store the RC4 encryption key on the stack
D) To prevent stack overflow during the timer chain

Q2: Why must the context be captured from the timer thread specifically?

A) Because NtContinue executes on the timer thread, so the context must have valid state (RSP, segments) for that thread
B) The main thread's context is encrypted and inaccessible
C) RtlCaptureContext only works on timer threads
D) The timer thread has more registers available than the main thread

Q3: What would happen if EkkoObf returned before the timer chain completed?

A) The timers would be automatically cancelled
B) The timers would continue working normally using cached copies
C) The timer callbacks would access destroyed stack variables, causing a use-after-free crash
D) Windows would block the return until timers complete