Difficulty: Intermediate

Module 5: SystemFunction032 & Shellcode Mapping

How ShellGhost pre-processes shellcode into per-instruction encrypted chunks and uses Windows' built-in RC4 API for decryption.

Module Objective

Understand ShellGhost's core innovation: the shellcode mapping preprocessing step, and how it enables per-instruction encryption. You will learn how ShellGhost_mapping.py disassembles shellcode and generates CRYPT_BYTES_QUOTA structs, how SystemFunction032 from advapi32.dll provides RC4 encryption/decryption, and why each instruction is encrypted independently (no persistent RC4 state between handler calls). This preprocessing pipeline is the mechanism that makes per-instruction decryption possible.

1. The Shellcode Mapping Innovation

ShellGhost's most important innovation is not the VEH handler itself, but the preprocessing step that makes per-instruction handling possible. A Python script (ShellGhost_mapping.py) analyzes the raw shellcode offline and produces the data structures the VEH handler needs at runtime.

Shellcode Mapping Pipeline

  1. Raw Shellcode: binary bytes
  2. Disassemble: Capstone / disasm
  3. Map Instructions: RVA + byte count
  4. Encrypt Each Instr: RC4 independently
  5. Output C Arrays: compile into binary

Why Preprocessing Matters

Without the preprocessing step, the VEH handler would need to determine instruction boundaries at runtime — which is extremely difficult. x86/x64 instructions range from 1 to 15 bytes, and decoding them requires a full disassembler. By disassembling the shellcode offline, ShellGhost pre-computes each instruction's offset and length, storing them in a CRYPT_BYTES_QUOTA structure. At runtime, the handler simply looks up the current instruction index to know exactly how many bytes to decrypt.

2. The CRYPT_BYTES_QUOTA Structure

Each instruction in the shellcode is represented by a CRYPT_BYTES_QUOTA struct containing two fields:

// Per-instruction mapping structure
typedef struct _CRYPT_BYTES_QUOTA {
    DWORD rva;      // Offset of this instruction from shellcode base
    DWORD quota;    // Number of bytes in this instruction
} CRYPT_BYTES_QUOTA;

Example: Mapping a Simple Shellcode

Shellcode disassembly:
  Offset  Bytes           Instruction
  0x0000  48 89 E5        mov rbp, rsp
  0x0003  48 83 EC 20     sub rsp, 0x20
  0x0007  31 C9           xor ecx, ecx
  0x0009  FF D0           call rax
  0x000B  C3              ret

Generated CRYPT_BYTES_QUOTA array:
  { 0x0000, 3 },   // mov rbp, rsp      (3 bytes)
  { 0x0003, 4 },   // sub rsp, 0x20     (4 bytes)
  { 0x0007, 2 },   // xor ecx, ecx      (2 bytes)
  { 0x0009, 2 },   // call rax          (2 bytes)
  { 0x000B, 1 },   // ret               (1 byte)

The preprocessing script generates this array as C source code that is compiled directly into the ShellGhost binary. At runtime, instruction N's location and size are simply shellcode_map[N].rva and shellcode_map[N].quota.
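The offset and length bookkeeping behind this array can be sketched in a few lines of Python. This is a minimal illustration, not the actual ShellGhost_mapping.py: the `build_map` helper is a hypothetical name, and the instruction boundaries are hard-coded from the example listing instead of being produced by a disassembler.

```python
from typing import List, Tuple


def build_map(instructions: List[bytes]) -> List[Tuple[int, int]]:
    """Return one (rva, quota) pair per instruction.

    rva is the running offset from the start of the buffer and
    quota is the instruction's length in bytes.
    """
    entries = []
    offset = 0
    for insn in instructions:
        entries.append((offset, len(insn)))
        offset += len(insn)
    return entries


# Instruction bytes taken from the example disassembly (assumed boundaries).
EXAMPLE = [
    bytes.fromhex("4889e5"),    # mov rbp, rsp
    bytes.fromhex("4883ec20"),  # sub rsp, 0x20
    bytes.fromhex("31c9"),      # xor ecx, ecx
    bytes.fromhex("ffd0"),      # call rax
    bytes.fromhex("c3"),        # ret
]

shellcode_map = build_map(EXAMPLE)

# Print the entries in the same initializer style as the array above.
for rva, quota in shellcode_map:
    print(f"{{ 0x{rva:04X}, {quota} }},")
```

Each rva is the sum of the preceding quotas, so the table alone is enough to locate any instruction without decoding the bytes in front of it.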

3. Per-Instruction Independent Encryption

A critical design decision in ShellGhost: each instruction is encrypted independently. There is no persistent RC4 state carried between instructions. Each encryption/decryption operation is a fresh call to SystemFunction032 with the same key but operating on different data.

Why Independent Encryption?

If instructions were encrypted as a continuous stream (like traditional RC4 usage), decrypting instruction N would require processing all bytes of instructions 0 through N-1 first — making random access impossible. By encrypting each instruction independently, ShellGhost can decrypt any instruction at any time with just the key and the instruction's encrypted bytes. This also means that the encrypted bytes for each instruction are self-contained: knowing the key and the ciphertext is sufficient to recover the plaintext, with no dependency on previous instructions.

Approach                          State Between Instructions   Random Access            ShellGhost Uses
Continuous RC4 stream             S-box + i, j must persist    No (sequential only)     No
Independent per-instruction RC4   No state needed              Yes (any instruction)    Yes
AES-CTR with counter              Counter value                Yes (counter = offset)   No
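The difference between the first two rows can be shown with a textbook RC4 written in plain Python. This is a sketch under stated assumptions: the `rc4` function is a generic reference implementation (not SystemFunction032 itself), the key is a made-up demo value, and the instruction bytes come from the example listing.

```python
from typing import List


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: build the state from `key`, then XOR `data` with the keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:  # keystream generation and XOR
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


KEY = b"demo-key"  # hypothetical key, for illustration only

INSTRUCTIONS: List[bytes] = [
    bytes.fromhex("4889e5"),    # mov rbp, rsp
    bytes.fromhex("4883ec20"),  # sub rsp, 0x20
    bytes.fromhex("31c9"),      # xor ecx, ecx
    bytes.fromhex("ffd0"),      # call rax
    bytes.fromhex("c3"),        # ret
]

# Independent mode: every instruction starts from a fresh RC4 state.
independent = [rc4(KEY, insn) for insn in INSTRUCTIONS]

# Continuous mode: one RC4 state runs across the whole buffer.
continuous = rc4(KEY, b"".join(INSTRUCTIONS))

# Random access: instruction 3 decrypts on its own in independent mode.
assert rc4(KEY, independent[3]) == INSTRUCTIONS[3]

# Continuous mode needs a full sequential pass to recover the plaintext.
assert rc4(KEY, continuous) == b"".join(INSTRUCTIONS)

# Decrypting only the slice for instruction 3 applies keystream bytes 0-1
# where bytes 9-10 were used, so it is generally not the original instruction.
naive = rc4(KEY, continuous[9:11])
```

The two modes agree only on instruction 0, because that is the one place where both start at keystream position zero.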

4. SystemFunction032: Windows RC4 API

ShellGhost does not implement its own RC4 cipher. Instead, it uses SystemFunction032, an undocumented but well-known function exported by advapi32.dll that performs RC4 encryption/decryption:

// SystemFunction032 prototype (undocumented, exported by advapi32.dll)
// Performs RC4 encryption/decryption in place
#include <windows.h>
#include <winternl.h>   // NTSTATUS
#include <string.h>     // memcpy

// Note: the arguments are often described as UNICODE_STRING, but public
// reimplementations (e.g. Wine's advapi32) declare a descriptor with 32-bit
// length fields. The USHORT-based UNICODE_STRING does not match that layout,
// so a dedicated struct is used here.
typedef struct _USTRING {
    DWORD Length;           // Number of bytes in the buffer
    DWORD MaximumLength;    // Total buffer capacity
    PVOID Buffer;           // Pointer to the raw bytes (not actually a string)
} USTRING, *PUSTRING;

typedef NTSTATUS (WINAPI *_SystemFunction032)(
    PUSTRING Data,   // Buffer to encrypt/decrypt (in place)
    PUSTRING Key     // RC4 key
);

Why SystemFunction032?

5. Using SystemFunction032

// Resolve SystemFunction032 at runtime
// (both calls can fail; check for NULL before using the pointer)
HMODULE hAdvapi = LoadLibraryA("advapi32.dll");
_SystemFunction032 SystemFunction032 =
    (_SystemFunction032)GetProcAddress(hAdvapi, "SystemFunction032");

// Decrypt one instruction using SystemFunction032
void DecryptInstruction(PBYTE exec_buf, PBYTE enc_data,
                        CRYPT_BYTES_QUOTA *entry, PBYTE key, USHORT key_len) {
    // Copy encrypted bytes to execution buffer
    memcpy(exec_buf + entry->rva, enc_data + entry->rva, entry->quota);

    // Describe the instruction bytes to process
    USTRING data_str;
    data_str.Length = entry->quota;
    data_str.MaximumLength = entry->quota;
    data_str.Buffer = exec_buf + entry->rva;

    // Describe the key
    USTRING key_str;
    key_str.Length = key_len;
    key_str.MaximumLength = key_len;
    key_str.Buffer = key;

    // Decrypt in place (RC4 is symmetric: same operation for encrypt/decrypt)
    SystemFunction032(&data_str, &key_str);
}

// Re-encrypt is the same operation applied again
void ReEncryptInstruction(PBYTE exec_buf,
                          CRYPT_BYTES_QUOTA *entry, PBYTE key, USHORT key_len) {
    USTRING data_str;
    data_str.Length = entry->quota;
    data_str.MaximumLength = entry->quota;
    data_str.Buffer = exec_buf + entry->rva;

    USTRING key_str;
    key_str.Length = key_len;
    key_str.MaximumLength = key_len;
    key_str.Buffer = key;

    SystemFunction032(&data_str, &key_str);
    // After this, the bytes at exec_buf + entry->rva are encrypted again
}

6. The Preprocessing Script

The ShellGhost_mapping.py script performs these steps:

  1. Read the raw shellcode binary file
  2. Disassemble the shellcode using a disassembly engine (e.g., Capstone)
  3. Record each instruction's offset (RVA) and byte count into a CRYPT_BYTES_QUOTA array
  4. Encrypt each instruction's bytes independently using RC4 with the chosen key
  5. Output C source code containing:
    • The encrypted shellcode as a byte array
    • The CRYPT_BYTES_QUOTA mapping array
    • The total number of instructions

// Example output from ShellGhost_mapping.py (compiled into binary)

// Encrypted shellcode data (each instruction encrypted independently)
unsigned char encrypted_shellcode[] = {
    0xA3, 0x7F, 0x12,              // instruction 0 (3 bytes, encrypted)
    0xBB, 0x44, 0x91, 0xD7,        // instruction 1 (4 bytes, encrypted)
    0x5E, 0xC2,                    // instruction 2 (2 bytes, encrypted)
    0x1A, 0xF8,                    // instruction 3 (2 bytes, encrypted)
    0x9D,                          // instruction 4 (1 byte, encrypted)
};

// Instruction mapping
CRYPT_BYTES_QUOTA shellcode_map[] = {
    { 0x0000, 3 },
    { 0x0003, 4 },
    { 0x0007, 2 },
    { 0x0009, 2 },
    { 0x000B, 1 },
};

#define NUM_INSTRUCTIONS 5

7. RC4 Background: How It Works Internally

While ShellGhost uses SystemFunction032 rather than a custom implementation, understanding RC4's internals helps explain why it works for per-instruction encryption:

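RC4 in Brief

RC4 has two phases. The key-scheduling algorithm (KSA) initializes a 256-byte permutation S and shuffles it using the key. The pseudo-random generation algorithm (PRGA) then steps two indices, i and j, swapping entries of S and emitting one keystream byte per step. Each keystream byte is XORed with one data byte, so applying the same operation with the same key both encrypts and decrypts.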

Per-Instruction Independence

When SystemFunction032 is called, it runs the full key-scheduling algorithm (KSA) from the key, then generates enough keystream bytes to XOR with the data buffer. Because each instruction is encrypted in a separate call, each starts from a fresh RC4 state initialized from the same key, so every instruction is XORed with the same keystream prefix. Decryption works because each instruction's ciphertext was produced from that same fresh state. The trade-off is keystream reuse: XORing two ciphertexts cancels the keystream and yields the XOR of the two plaintexts, which makes this scheme cryptographically weaker than a continuous stream. For ShellGhost's purpose (defeating automated scanners), that weakness is acceptable.
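The cost of restarting RC4 for every instruction can be made concrete with a short sketch. It assumes a textbook RC4 in plain Python (standing in for SystemFunction032), a made-up demo key, and two instructions from the earlier example. The `xor` helper is a hypothetical name.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: build the state from `key`, then XOR `data` with the keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:  # keystream generation and XOR
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncated to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


KEY = b"demo-key"  # hypothetical key, for illustration only

p0 = bytes.fromhex("4889e5")    # mov rbp, rsp
p1 = bytes.fromhex("4883ec20")  # sub rsp, 0x20

# Each call restarts RC4, so both ciphertexts use the same keystream prefix.
c0 = rc4(KEY, p0)
c1 = rc4(KEY, p1)

# The shared keystream cancels: ciphertext XOR equals plaintext XOR.
assert xor(c0, c1) == xor(p0, p1)

# One known plaintext reveals the keystream prefix, which then decrypts
# the same number of leading bytes of any other instruction, without the key.
keystream_prefix = xor(c0, p0)
assert xor(c1, keystream_prefix) == p1[:3]
```

For an analyst, this means a single correctly guessed instruction exposes the leading bytes of every other encrypted instruction, which is why the scheme should be read as obfuscation, not confidentiality.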

8. Security Considerations

RC4 Is Cryptographically Broken

RC4 has known statistical biases in its keystream (particularly in the first 256 bytes), and RFC 7465 prohibits its use in TLS. For ShellGhost's use case, this does not matter: the goal is not to protect confidentiality against a dedicated cryptanalyst but to prevent automated memory scanners from recognizing shellcode patterns. Using the Windows-provided SystemFunction032 also means ShellGhost avoids embedding custom crypto code. An analyst who captures the key can decrypt the shellcode regardless of which cipher is used.

Knowledge Check

Q1: What does ShellGhost_mapping.py produce?

A) A DLL that performs encryption at runtime
B) C arrays of per-instruction encrypted data and CRYPT_BYTES_QUOTA mappings
C) An executable shellcode loader
D) A YARA rule for the shellcode

Q2: How does ShellGhost handle RC4 state between instruction decryptions?

A) It maintains a persistent S-box that advances with each instruction
B) It saves and restores RC4 state to disk between calls
C) There is no persistent state — each instruction is encrypted/decrypted independently with a fresh RC4 call
D) It uses AES instead of RC4 for state management

Q3: What Windows API does ShellGhost use for RC4 encryption/decryption?

A) SystemFunction032 from advapi32.dll
B) CryptEncrypt from advapi32.dll
C) BCryptEncrypt from bcrypt.dll
D) A custom RC4 implementation compiled into the binary