Module 6: Context Manipulation
Capturing, cloning, and weaponizing the CONTEXT structure for controlled execution.
Module Objective
Deep dive into how Ekko captures the timer thread's context with RtlCaptureContext, why each CONTEXT is cloned from this baseline, what the Rsp -= 8 adjustment does, and how RIP control turns a data structure into an execution primitive. This module covers the precise mechanics that make Ekko's context-oriented programming work.
1. RtlCaptureContext — Capturing Thread State
RtlCaptureContext is an ntdll function that fills a CONTEXT structure with the current thread's register state at the point of the call. It is defined as:
```c
// RtlCaptureContext - captures current thread state
// Exported by ntdll.dll
//
// VOID RtlCaptureContext(
//     PCONTEXT ContextRecord   // Output: filled with current registers
// );
//
// This function captures:
// - RIP (pointing to the instruction after the call)
// - RSP (current stack pointer)
// - All general-purpose registers
// - RFLAGS
// - Segment registers
// - XMM registers
```
Ekko uses RtlCaptureContext as the callback for Timer 0, passing &CtxThread as the parameter. Since the timer fires on the timer thread (due to WT_EXECUTEINTIMERTHREAD), the captured context reflects the timer thread's register state — not the main thread's.
Why the Timer Thread's Context Matters
The captured context must come from the timer thread because all subsequent NtContinue calls execute on that same thread. If Ekko captured the main thread's context instead, the RSP would point to the main thread's stack, segment registers might differ, and NtContinue would corrupt the timer thread's state. By capturing from the timer thread itself, Ekko ensures all cloned contexts have a stack pointer, segment registers, and flags that are valid for the thread that will actually execute them.
2. What RtlCaptureContext Captures
The captured CONTEXT includes these key fields that Ekko depends on:
| Register/Field | Captured Value | Ekko's Use |
|---|---|---|
| RIP | Address after RtlCaptureContext call | Overwritten with target API address |
| RSP | Timer thread's current stack pointer | Adjusted with Rsp -= 8 for alignment |
| SegCs | Code segment selector (0x33 for x64) | Kept as-is — must be valid for x64 execution |
| SegSs | Stack segment selector | Kept as-is |
| RFLAGS | Current processor flags | Kept as-is — must be reasonable for API execution |
| RCX, RDX, R8, R9 | Current values (unimportant) | Overwritten with API arguments |
| MxCsr | SSE control register | Kept as-is — prevents floating-point exceptions |
The Capture Timing Issue
There is a subtle race condition in Ekko's PoC. After queuing Timer 0 (whose callback is RtlCaptureContext), the main thread waits 50 ms (WaitForSingleObject(hEvent, 0x32)) before reading the captured context. Nothing signals hEvent at this stage, so the wait acts as a fixed delay rather than as confirmation that the capture finished. The 50 ms window is a heuristic: if the timer takes longer than 50 ms to fire and complete, the context may not be fully written when the main thread reads it. A more robust approach would use a separate synchronization event to confirm that the capture completed.
3. Cloning the Baseline Context
After capturing the baseline, Ekko clones it into all six operational contexts:
```c
memcpy( &RopProtRW, &CtxThread, sizeof(CONTEXT) );
memcpy( &RopMemEnc, &CtxThread, sizeof(CONTEXT) );
memcpy( &RopDelay,  &CtxThread, sizeof(CONTEXT) );
memcpy( &RopMemDec, &CtxThread, sizeof(CONTEXT) );
memcpy( &RopProtRX, &CtxThread, sizeof(CONTEXT) );
memcpy( &RopSetEvt, &CtxThread, sizeof(CONTEXT) );
```
Each memcpy copies all 1232 bytes of the x64 CONTEXT structure. After cloning, each operational context is an exact copy of the timer thread's state. Ekko then selectively modifies only the registers it needs.
The Minimal Modification Principle
Ekko modifies the fewest registers possible in each context:
- RIP — Set to the target function address
- RCX, RDX, R8, R9 — Set to the function arguments
- RSP — Decremented by 8 for alignment
All other registers (segment selectors, flags, XMM state, etc.) are inherited from the baseline capture. This minimizes the chance of an invalid state causing a crash or exception.
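The clone-then-patch pattern can be modeled in portable C. The sketch below is an illustration only: `MockContext` is a hypothetical, heavily reduced stand-in for the real Windows CONTEXT, and `make_call_context` and `inherited_fields_match` are names invented for this example. It shows that after the copy, only the patched fields differ from the baseline.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for CONTEXT (assumption for illustration;
 * the real structure is far larger and has a different layout). */
typedef struct {
    uint64_t Rip, Rsp;
    uint64_t Rcx, Rdx, R8, R9;   /* first four integer/pointer arguments */
    uint64_t Rbx, Rbp;           /* examples of registers left untouched */
    uint32_t EFlags;
    uint32_t MxCsr;
    uint16_t SegCs, SegSs;
} MockContext;

/* Clone the baseline, then patch only what a call to `target` needs:
 * the instruction pointer, the stack pointer, and the argument registers. */
MockContext make_call_context(const MockContext *baseline, uint64_t target,
                              uint64_t a1, uint64_t a2,
                              uint64_t a3, uint64_t a4)
{
    MockContext ctx;
    memcpy(&ctx, baseline, sizeof ctx);  /* exact copy of the baseline */
    ctx.Rip  = target;                   /* where execution resumes */
    ctx.Rsp -= 8;                        /* model the return-address slot */
    ctx.Rcx  = a1;
    ctx.Rdx  = a2;
    ctx.R8   = a3;
    ctx.R9   = a4;
    return ctx;
}

/* Returns 1 if every field that should be inherited is unchanged. */
int inherited_fields_match(const MockContext *a, const MockContext *b)
{
    return a->Rbx == b->Rbx && a->Rbp == b->Rbp &&
           a->EFlags == b->EFlags && a->MxCsr == b->MxCsr &&
           a->SegCs == b->SegCs && a->SegSs == b->SegSs;
}
```

The baseline is passed as a const pointer and never modified, which mirrors how CtxThread serves as a read-only template for all six operational contexts.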
4. RSP Pivoting: The Rsp -= 8 Adjustment
Every operational context includes this adjustment:
```c
RopProtRW.Rsp -= 8;
RopMemEnc.Rsp -= 8;
RopDelay.Rsp  -= 8;
RopMemDec.Rsp -= 8;
RopProtRX.Rsp -= 8;
RopSetEvt.Rsp -= 8;
```
This one-line adjustment per context is critical and easy to overlook. To understand why it is necessary, we need to look at the x64 stack alignment requirement and the behavior of the CALL instruction.
The x64 ABI Stack Alignment Rule
The Microsoft x64 calling convention requires that at the point of a CALL instruction, RSP must be 16-byte aligned. The CALL instruction itself pushes an 8-byte return address onto the stack, making RSP 16-byte-aligned-minus-8 at function entry. Functions expect this alignment and may use SSE instructions (like MOVAPS) that require 16-byte-aligned operands.
Stack Alignment During a Normal CALL
Before CALL: RSP = 0x...0 (16-aligned)
CALL pushes the return address: RSP -= 8
At function entry: RSP = 0x...8 (16-aligned - 8)
Why Ekko Needs the Adjustment
When NtContinue restores a context, it sets RSP to whatever value is in the CONTEXT structure and sets RIP to the target function. But NtContinue does not execute a CALL instruction — it directly sets the registers. This means no return address is pushed onto the stack.
The captured RSP from RtlCaptureContext reflects the stack state at capture time. By subtracting 8, Ekko simulates the effect of a CALL instruction having pushed a return address. The target function sees RSP at the expected alignment (16-byte-aligned minus 8) and operates correctly:
```text
Captured RSP:    0x00000010AA00F000 (16-byte aligned)
After Rsp -= 8:  0x00000010AA00EFF8 (16-byte aligned - 8)

This matches what a function expects after a CALL instruction.
Without the adjustment, MOVAPS and other aligned SSE instructions
inside VirtualProtect or SystemFunction032 could fault.
```
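The alignment arithmetic above can be checked with a few lines of portable C. This is a minimal sketch: the function names are invented for the example, and the code models only the stack-pointer arithmetic, not an actual CALL.

```c
#include <stdint.h>

/* Nonzero if rsp satisfies the alignment required at the point of a CALL. */
int aligned_at_call_site(uint64_t rsp)
{
    return (rsp & 0xF) == 0;
}

/* Nonzero if rsp has the alignment a callee sees on entry
 * (16-byte aligned minus 8). */
int aligned_at_function_entry(uint64_t rsp)
{
    return (rsp & 0xF) == 8;
}

/* Model the stack-pointer effect of CALL pushing an 8-byte return address. */
uint64_t rsp_after_call_push(uint64_t rsp)
{
    return rsp - 8;
}
```

Feeding the captured value 0x00000010AA00F000 through `rsp_after_call_push` yields 0x00000010AA00EFF8, which passes `aligned_at_function_entry` while the unadjusted value does not.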
The 8-Byte Slot
The Rsp -= 8 also reserves space where a return address would normally be. When the target function executes RET, it pops 8 bytes from this location. The value at that address depends on whatever was on the timer thread's stack at that offset — this is one of Ekko's known imperfections. In the PoC, the return from each API call goes to whatever address happens to be at that stack location. The timer infrastructure handles recovering from this, but it is not a clean return path. Module 7 covers this in detail.
5. RIP Control — Directing Execution
Setting RIP in the CONTEXT is the most fundamental operation in Ekko's technique. It determines where execution goes after NtContinue restores the context:
```c
// Each context's RIP points to a different Windows API:
RopProtRW.Rip = (DWORD64)VirtualProtect;        // kernel32.dll
RopMemEnc.Rip = (DWORD64)SysFunc032;            // advapi32.dll
RopDelay.Rip  = (DWORD64)WaitForSingleObject;   // kernel32.dll
RopMemDec.Rip = (DWORD64)SysFunc032;            // advapi32.dll
RopProtRX.Rip = (DWORD64)VirtualProtect;        // kernel32.dll
RopSetEvt.Rip = (DWORD64)SetEvent;              // kernel32.dll
```
All target functions reside in system DLLs. ASLR randomizes each DLL's base address once per boot, so within a boot session these functions sit at stable addresses. Because these DLLs are not part of the implant's image, they remain functional even after the image is encrypted and marked non-executable. This is the key insight that makes the entire technique work.
6. Argument Setup via Registers
The x64 calling convention passes the first four integer/pointer arguments in RCX, RDX, R8, and R9. Ekko sets these registers in each CONTEXT to provide the correct arguments to each target function:
| Timer | Function | RCX (arg1) | RDX (arg2) | R8 (arg3) | R9 (arg4) |
|---|---|---|---|---|---|
| 1 | VirtualProtect | ImageBase | ImageSize | PAGE_READWRITE | &OldProtect |
| 2 | SystemFunction032 | &Img | &Key | unused | unused |
| 3 | WaitForSingleObject | NtCurrentProcess() | SleepTime | unused | unused |
| 4 | SystemFunction032 | &Img | &Key | unused | unused |
| 5 | VirtualProtect | ImageBase | ImageSize | PAGE_EXECUTE_READWRITE | &OldProtect |
| 6 | SetEvent | hEvent | unused | unused | unused |
Unused Registers
For functions that take fewer than four arguments (SystemFunction032 and WaitForSingleObject take two, SetEvent takes one), the remaining argument registers (R8 and R9, plus RDX for SetEvent) retain their values from the baseline capture. The target function ignores these leftover values because it reads only the registers that correspond to its declared parameters.
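A small model makes the "leftover registers are ignored" point concrete. The sketch below is purely illustrative: `ArgRegs`, `arg_register_value`, and `observed_by_callee` are hypothetical names, and the "callee" is simulated by a loop that reads only as many registers as it has parameters.

```c
#include <stdint.h>

/* Hypothetical view of the four integer argument registers. */
typedef struct {
    uint64_t Rcx, Rdx, R8, R9;
} ArgRegs;

/* Value of the register carrying 0-based integer argument `index` under the
 * Microsoft x64 convention; returns 0 for an index outside 0..3. */
uint64_t arg_register_value(const ArgRegs *regs, int index)
{
    switch (index) {
    case 0:  return regs->Rcx;
    case 1:  return regs->Rdx;
    case 2:  return regs->R8;
    case 3:  return regs->R9;
    default: return 0;
    }
}

/* Simulate a callee with `arity` parameters: it observes only the first
 * `arity` registers. The result is a simple checksum of what it saw. */
uint64_t observed_by_callee(const ArgRegs *regs, int arity)
{
    uint64_t acc = 0;
    for (int i = 0; i < arity && i < 4; i++)
        acc = acc * 31 + arg_register_value(regs, i);
    return acc;
}
```

With an arity of two, a register set whose R8 and R9 hold stale values produces the same checksum as one where they are zero; with an arity of four the checksums differ.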
7. Stack Frame Considerations
Beyond the RSP alignment, the x64 calling convention requires a 32-byte "shadow space" (also called "home space") on the stack above the return address. This space is reserved by the caller for the callee to optionally store parameters:
```text
Stack layout expected by a function on entry:

RSP + 0x28   [5th argument, if any]
RSP + 0x20   [shadow space for R9]
RSP + 0x18   [shadow space for R8]
RSP + 0x10   [shadow space for RDX]
RSP + 0x08   [shadow space for RCX]
RSP + 0x00   [return address]         <-- RSP points here
```
Ekko's Rsp -= 8 accounts for the return address slot. The shadow space above it already exists on the timer thread's stack from the captured state. As long as the 32 bytes directly above the return address slot (adjusted RSP + 0x08 through RSP + 0x27) are usable stack memory, the target functions have valid shadow space. Since the timer thread has a full-sized stack and the captured RSP lies within it, this is not a problem in practice.
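The offsets in the layout above follow a simple rule: skip the 8-byte return address, then step 8 bytes per argument slot. A minimal sketch, with helper names invented for this example:

```c
#include <stdint.h>

enum {
    RETURN_ADDRESS_SIZE = 8,  /* slot that CALL would have filled */
    SLOT_SIZE           = 8,  /* each argument slot is 8 bytes */
    REGISTER_ARG_COUNT  = 4   /* RCX, RDX, R8, R9 */
};

/* Offset from RSP at function entry of the stack slot for 0-based argument
 * `index`. Indexes 0..3 are the home (shadow) slots for RCX, RDX, R8, R9;
 * index 4 and above are stack-passed arguments. */
uint64_t arg_slot_offset(unsigned index)
{
    return RETURN_ADDRESS_SIZE + (uint64_t)index * SLOT_SIZE;
}

/* Bytes that must be usable starting at the entry RSP:
 * the return address plus the 32-byte shadow space. */
uint64_t min_entry_frame_bytes(void)
{
    return RETURN_ADDRESS_SIZE + REGISTER_ARG_COUNT * SLOT_SIZE;
}
```

`arg_slot_offset(0)` gives 0x08 (the RCX home slot), `arg_slot_offset(3)` gives 0x20 (the R9 home slot), and `arg_slot_offset(4)` gives 0x28, the position of a fifth argument.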
8. Context Lifetime & Stack Variables
A critical detail is that all CONTEXT structures, USTRING descriptors, the key buffer, and OldProtect are local variables in EkkoObf. They live on the main thread's stack frame:
```c
VOID EkkoObf( DWORD SleepTime )
{
    CONTEXT CtxThread = { 0 };    // Stack variable
    CONTEXT RopProtRW = { 0 };    // Stack variable
    // ... all contexts on main thread's stack

    CHAR    KeyBuf[16] = { ... }; // Stack variable
    USTRING Key = { 0 };          // Stack variable
    USTRING Img = { 0 };          // Stack variable
    DWORD   OldProtect = 0;       // Stack variable

    // ... setup and queue timers ...

    WaitForSingleObject( hEvent, INFINITE ); // BLOCKS HERE
    // While blocked, stack frame is preserved
    // All stack variables remain valid

    DeleteTimerQueue( hTimerQueue );
} // Stack frame destroyed AFTER timers complete
```
Why This Works
The main thread blocks on WaitForSingleObject until Timer 6 signals the event. Because the main thread is blocked (not returned), its stack frame is preserved. The CONTEXT structures, USTRING descriptors, and OldProtect variable remain at valid memory addresses for the entire duration of the timer chain. If EkkoObf returned before the timers completed, these stack variables would be destroyed, and the timer callbacks would read/write invalid memory: a dangling stack pointer (use-after-return) bug, the stack analogue of use-after-free.
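The same lifetime rule applies to any code that hands a pointer to a stack variable to another thread. The sketch below uses POSIX threads as a portable stand-in and is not Ekko's code: `Job`, `worker`, and `run_job_blocking` are hypothetical names, and `pthread_join` plays the role that the blocking wait plays in EkkoObf.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical work item that lives in the caller's stack frame. */
typedef struct {
    uint64_t input;
    uint64_t output;
} Job;

/* Runs on a second thread and writes through a pointer into the
 * caller's stack frame. */
static void *worker(void *arg)
{
    Job *job = arg;
    job->output = job->input * 2;
    return NULL;
}

/* Returns the worker's result, or 0 if the thread could not be started. */
uint64_t run_job_blocking(uint64_t input)
{
    Job job = { input, 0 };        /* stack variable shared with the worker */
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, &job) != 0)
        return 0;

    /* Block until the worker finishes. While this function is blocked,
     * its frame (and therefore `job`) stays valid. Returning before the
     * join would leave the worker holding a dangling pointer. */
    pthread_join(tid, NULL);
    return job.output;
}
```

Depending on the toolchain, building this may require the `-pthread` flag.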
9. Visualizing the Memory Layout
Figure: Memory Relationships During Timer Chain
- Main thread's stack: CtxThread, RopProtRW, ...; Key, Img, OldProtect (all valid while blocked)
- Timer thread: NtContinue loads CONTEXT from main thread's stack
- System DLL functions: VirtualProtect, SysFunc032, SetEvent, WaitForSingleObject
The timer thread reads CONTEXT structures from the main thread's stack, then NtContinue redirects execution into system DLL functions. The system functions operate on the process image (encrypting/decrypting, changing permissions) and the event handle. All of these exist in process-global memory and are accessible from any thread.
Knowledge Check
Q1: Why does Ekko subtract 8 from RSP in each operational context?
Q2: Why must the context be captured from the timer thread specifically?
Q3: What would happen if EkkoObf returned before the timer chain completed?