Difficulty: Beginner

Module 2: XOR Encryption for Memory Evasion

Why XOR is the ideal cipher for in-place memory toggling, how XOR32 works, and its performance characteristics.

Module Objective

Understand the properties of XOR that make it uniquely suited for in-memory encryption, the difference between single-byte and multi-byte XOR keys, how ShellcodeFluctuation generates and applies a 32-bit XOR key, and why performance matters for stealth.

1. Why XOR for In-Memory Encryption?

ShellcodeFluctuation uses XOR encryption rather than AES, RC4, or other ciphers. This is not laziness — XOR has specific properties that make it ideal for this use case:

Property	Why It Matters
Self-inverse (involution)	`A ^ K ^ K = A` — the same function encrypts and decrypts. No need for separate encrypt/decrypt code paths
In-place operation	XOR modifies data directly without requiring a separate output buffer. No additional memory allocation needed
No state	Each byte is processed independently — no initialization vectors, no cipher state, no block chaining
Single instruction	XOR compiles to a single x86 instruction per operation. Minimal CPU overhead
No imports	No cryptographic library dependencies. The encryption engine is a few lines of code
Zero expansion	Output is exactly the same size as input — no padding, no headers, no ciphertext expansion
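The first few properties in the table can be checked with a minimal sketch. It is written in portable C so it builds anywhere: uint8_t and uint32_t stand in for the Windows BYTE and DWORD types, and the helper names are hypothetical. xor_repeating modifies the buffer in place with a repeating key of any length, so the ciphertext occupies exactly the input's bytes. xor_is_self_inverse confirms that a second call with the same key restores the original.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Hypothetical helper: XOR buf in place with a repeating key.
 * No output buffer, no state carried between calls, no size change. */
void xor_repeating(uint8_t *buf, size_t len,
                   const uint8_t *key, size_t keyLen) {
    assert(keyLen > 0);
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key[i % keyLen];
    }
}

/* Returns 1 if applying the same key twice restores the input
 * (inputs up to 256 bytes; larger inputs return 0). */
int xor_is_self_inverse(const uint8_t *data, size_t len,
                        const uint8_t *key, size_t keyLen) {
    uint8_t copy[256];
    if (len > sizeof(copy)) return 0;
    memcpy(copy, data, len);
    xor_repeating(copy, len, key, keyLen);  /* encrypt */
    xor_repeating(copy, len, key, keyLen);  /* decrypt: identical call */
    return memcmp(copy, data, len) == 0;
}
```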

The Key Insight: Toggling

ShellcodeFluctuation needs to rapidly toggle memory between encrypted and decrypted states hundreds of times during an implant's lifetime. XOR's self-inverse property means the exact same code path handles both directions — calling the function once encrypts, calling it again decrypts. This simplicity reduces the code surface area and eliminates an entire class of bugs.

2. Single-Byte vs Multi-Byte XOR

The key length dramatically affects the security of XOR encryption. Understanding this distinction explains why ShellcodeFluctuation uses a 32-bit (4-byte) key.

2.1 Single-Byte XOR (8-bit key)

// Single-byte XOR - trivially breakable
void xor_single(BYTE* data, SIZE_T len, BYTE key) {
    for (SIZE_T i = 0; i < len; i++) {
        data[i] ^= key;
    }
}
// Only 256 possible keys - brute force in microseconds
// BeaconEye does exactly this to find Beacon configs

With only 256 possible keys, any tool can try all of them in microseconds. This is how BeaconEye defeats Cobalt Strike's built-in configuration obfuscation — it simply tries every single-byte XOR key against each memory region and checks if the result matches a Beacon config signature.
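The sketch below plays the scanner's role to show why 256 keys offer no protection. It decrypts a region with every possible single-byte key and searches the result for a known signature. This is a generic illustration of the brute-force idea, not BeaconEye's code. The function names, the 4 KB scratch limit, and the example signature are assumptions.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Returns 1 if needle occurs anywhere in hay (raw byte arrays). */
static int contains(const uint8_t *hay, size_t hayLen,
                    const uint8_t *needle, size_t needleLen) {
    if (needleLen == 0 || needleLen > hayLen) return 0;
    for (size_t i = 0; i + needleLen <= hayLen; i++) {
        if (memcmp(hay + i, needle, needleLen) == 0) return 1;
    }
    return 0;
}

/* Tries all 256 single-byte keys against a ciphertext region and
 * returns the first key whose decryption contains sig, or -1.
 * Decryption goes into a scratch copy; the scanned region is untouched. */
int brute_force_single_byte(const uint8_t *ct, size_t len,
                            const uint8_t *sig, size_t sigLen) {
    uint8_t scratch[4096];
    assert(len <= sizeof(scratch));
    for (int key = 0; key < 256; key++) {
        for (size_t i = 0; i < len; i++) {
            scratch[i] = ct[i] ^ (uint8_t)key;
        }
        if (contains(scratch, len, sig, sigLen)) return key;
    }
    return -1;
}
```

Each candidate key costs one pass over the region. Even 256 passes over a few kilobytes finish almost instantly.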

2.2 Multi-Byte XOR (XOR32)

// ShellcodeFluctuation uses XOR32 - a 4-byte key
// This is the actual approach from the repository
void xor32(BYTE* data, SIZE_T len, DWORD key) {
    // Process 4 bytes at a time for performance
    DWORD* ptr = (DWORD*)data;
    SIZE_T dwordCount = len / sizeof(DWORD);

    for (SIZE_T i = 0; i < dwordCount; i++) {
        ptr[i] ^= key;
    }

    // Handle remaining bytes (0-3 trailing bytes)
    BYTE* keyBytes = (BYTE*)&key;
    SIZE_T remainder = len % sizeof(DWORD);
    BYTE* tail = data + (dwordCount * sizeof(DWORD));

    for (SIZE_T i = 0; i < remainder; i++) {
        tail[i] ^= keyBytes[i];
    }
}

Key Size	Keyspace	Brute Force Time	Sufficient for Sleep Evasion?
1 byte (8-bit)	256	Microseconds	No — BeaconEye breaks it instantly
4 bytes (32-bit)	~4.3 billion	Minutes to hours	Yes — impractical for real-time scanning
8 bytes (64-bit)	~1.8 x 10^19	Years	Overkill for this use case

Security Context

XOR32 is not cryptographically secure — it is vulnerable to known-plaintext attacks if an attacker knows what part of the shellcode looks like (e.g., a known MZ header at offset 0). However, for the threat model of ShellcodeFluctuation, it does not need to be. The goal is to defeat automated real-time scanners, not withstand dedicated cryptanalysis. The shellcode is only encrypted during the sleep window, and the key changes each time the implant loads.
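Exploiting the known-plaintext weakness takes one XOR per key byte. Suppose four plaintext bytes are known at a DWORD-aligned offset, such as the MZ header at offset 0. Then the whole 32-bit key falls out of c ^ p. The minimal sketch below assumes the xor32 byte layout shown earlier; recover_xor32_key is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Recovers a repeating 32-bit XOR key from 4 plaintext bytes known to
 * sit at `offset` in the ciphertext. Since c = p ^ k, k = c ^ p.
 * `offset` must be a multiple of 4 so the key bytes line up with the
 * DWORD boundaries xor32 uses. */
uint32_t recover_xor32_key(const uint8_t *ct, size_t offset,
                           const uint8_t known[4]) {
    assert(offset % 4 == 0);
    uint8_t keyBytes[4];
    for (int i = 0; i < 4; i++) {
        keyBytes[i] = ct[offset + i] ^ known[i];
    }
    uint32_t key;
    memcpy(&key, keyBytes, sizeof(key));  /* same host byte order as xor32 */
    return key;
}
```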

3. Key Generation

ShellcodeFluctuation generates the XOR key at runtime, ensuring each execution uses a different key. This prevents static signatures based on known ciphertext:

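// Key generation approach
#include <windows.h>   // DWORD, GetTickCount
#include <stdlib.h>    // srand, rand

DWORD generateXorKey() {
    DWORD key = 0;

    // Use a random seed - could be from:
    // - GetTickCount() (milliseconds since boot; easy to guess)
    // - __rdtsc() (CPU timestamp counter)
    // - RtlRandomEx()
    // - CryptGenRandom() / BCryptGenRandom() (OS CSPRNG)

    // Simple but effective for this threat model.
    // MSVC's rand() returns only 15 bits (RAND_MAX == 0x7FFF), so
    // (rand() << 16) | rand() would leave bits 15 and 31 always zero;
    // three shifted calls cover all 32 bits instead.
    srand(GetTickCount());
    key = ((DWORD)rand() << 17) ^ ((DWORD)rand() << 2) ^ (DWORD)rand();

    // Avoid degenerate keys (XOR with 0 leaves the data unchanged)
    if (key == 0) key = 0xDEADBEEF;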

    return key;
}
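Where an extra import is acceptable, the key can be drawn from the operating system's CSPRNG instead of srand/rand. This removes the guessable GetTickCount seed and the 15-bit rand() limitation. The sketch below makes two assumptions. On Windows it uses BCryptGenRandom, one of the sources listed above. Elsewhere it uses Linux's getrandom() so the example also builds off-Windows. generate_xor_key_csprng is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#include <bcrypt.h>          /* link with bcrypt.lib */
#else
#include <sys/types.h>
#include <sys/random.h>      /* getrandom(), Linux / glibc 2.25+ */
#endif

/* Fills *outKey with a nonzero 32-bit key from the OS CSPRNG.
 * Returns 1 on success, 0 if the RNG call fails. */
int generate_xor_key_csprng(uint32_t *outKey) {
    assert(outKey != NULL);
    uint32_t key = 0;
    do {
#ifdef _WIN32
        if (!BCRYPT_SUCCESS(BCryptGenRandom(NULL, (PUCHAR)&key, sizeof(key),
                                            BCRYPT_USE_SYSTEM_PREFERRED_RNG)))
            return 0;
#else
        if (getrandom(&key, sizeof(key), 0) != (ssize_t)sizeof(key))
            return 0;
#endif
    } while (key == 0);      /* a zero key would leave the data in plaintext */
    *outKey = key;
    return 1;
}
```

The trade-off is that BCryptGenRandom adds a bcrypt.dll import. That gives up part of the zero-import advantage discussed in sections 5 and 8.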

Why Not a Hardcoded Key?

A hardcoded key would mean every execution produces the same ciphertext. A defender who reverses the tool once could compute the expected encrypted bytes and write a YARA rule matching them. Runtime-generated keys ensure the ciphertext is different every time, defeating static signature approaches.

4. The XOR Toggle Pattern

Because XOR is self-inverse, ShellcodeFluctuation uses the exact same function for both encryption and decryption. Here is the conceptual toggle pattern:

// The toggle pattern - same call encrypts OR decrypts
DWORD xorKey = generateXorKey();  // Generated once at startup

// Before sleep: encrypt (plaintext -> ciphertext)
xor32(shellcodeBase, shellcodeSize, xorKey);
// shellcode region now contains encrypted gibberish

// ... sleep occurs ...

// After wake: decrypt (ciphertext -> plaintext)
xor32(shellcodeBase, shellcodeSize, xorKey);
// shellcode region now contains executable code again

XOR Toggle Cycle

Plaintext (executable shellcode) → XOR(key) → Ciphertext (encrypted gibberish) → XOR(key) → Plaintext (executable shellcode)
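Using one function for both directions has a flip side: the cipher does not track state, so the code must remember which state the region is in. An accidental second "encrypt" call would decrypt. One way to keep the toggles balanced is a small state flag, sketched below. XorRegion, region_encrypt and region_decrypt are hypothetical names, not part of ShellcodeFluctuation.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>
#include <assert.h>

/* Hypothetical bookkeeping for one XOR-toggled memory region. */
typedef struct {
    uint8_t *base;
    size_t   size;
    uint32_t key;
    bool     encrypted;
} XorRegion;

/* XOR32 transform (DWORD bulk + 0-3 byte tail), memcpy-based so it
 * is well-defined for any alignment. */
static void xor32_apply(uint8_t *data, size_t len, uint32_t key) {
    uint8_t keyBytes[4];
    memcpy(keyBytes, &key, sizeof(keyBytes));
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t chunk;
        memcpy(&chunk, data + i, sizeof(chunk));
        chunk ^= key;
        memcpy(data + i, &chunk, sizeof(chunk));
    }
    for (; i < len; i++) data[i] ^= keyBytes[i % 4];
}

/* Each call applies XOR32 only if the region is in the opposite
 * state, so an unbalanced extra call cannot silently flip it back. */
void region_encrypt(XorRegion *r) {
    assert(r != NULL && r->base != NULL);
    if (!r->encrypted) {
        xor32_apply(r->base, r->size, r->key);
        r->encrypted = true;
    }
}

void region_decrypt(XorRegion *r) {
    assert(r != NULL && r->base != NULL);
    if (r->encrypted) {
        xor32_apply(r->base, r->size, r->key);
        r->encrypted = false;
    }
}
```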

5. Performance Considerations

Speed matters for stealth. The encryption/decryption operation happens on every sleep cycle — for a beacon sleeping every 60 seconds, that is once per minute. If the XOR operation is slow, it creates a measurable delay between when Sleep is called and when actual sleeping begins.

// Performance analysis for typical shellcode sizes
//
// Cobalt Strike Beacon:    ~300 KB shellcode
// Meterpreter stage:       ~200 KB shellcode
// Sliver implant:          ~10-15 MB (larger, Go-based)
//
// XOR32 throughput on modern x86-64:
//   - Processes 4 bytes per iteration
//   - ~1 cycle per DWORD XOR (pipelined)
//   - At 3 GHz: ~3 billion DWORDs/sec = ~12 GB/sec
//
// For 300 KB Beacon:
//   - 300,000 / 4 = 75,000 iterations
//   - 75,000 / 3,000,000,000 = 0.000025 seconds = 25 microseconds
//
// Conclusion: XOR32 encryption of typical shellcode takes
// ~25 microseconds - completely negligible

Shellcode Size	XOR32 Time (approx.)	Overhead per Sleep Cycle
100 KB	~8 µs	Negligible
300 KB (Beacon)	~25 µs	Negligible
1 MB	~83 µs	Negligible
10 MB (Sliver)	~830 µs	Still under 1 ms
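The estimates above can be checked empirically. The sketch below times repeated XOR32 passes over a heap buffer with the standard clock() function and reports the average time per pass. The buffer size, pass count, and function names are assumptions. The memcpy-based xor32 is a portable stand-in for the DWORD-cast version.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <assert.h>

/* XOR32 transform (DWORD bulk + 0-3 byte tail), alignment-agnostic. */
static void xor32_apply(uint8_t *data, size_t len, uint32_t key) {
    uint8_t keyBytes[4];
    memcpy(keyBytes, &key, sizeof(keyBytes));
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t chunk;
        memcpy(&chunk, data + i, sizeof(chunk));
        chunk ^= key;
        memcpy(data + i, &chunk, sizeof(chunk));
    }
    for (; i < len; i++) data[i] ^= keyBytes[i % 4];
}

/* Runs `passes` XOR32 passes over a `size`-byte buffer and returns the
 * average CPU time per pass in microseconds, or -1.0 on failure.
 * `passes` must be even so the buffer ends up back in plaintext,
 * which is verified before returning. */
double time_xor32_us(size_t size, uint32_t key, int passes) {
    assert(passes > 0 && passes % 2 == 0);
    uint8_t *buf = malloc(size);
    uint8_t *ref = malloc(size);
    if (!buf || !ref) { free(buf); free(ref); return -1.0; }
    for (size_t i = 0; i < size; i++) buf[i] = (uint8_t)(i * 31u);
    memcpy(ref, buf, size);

    clock_t start = clock();
    for (int p = 0; p < passes; p++) {
        xor32_apply(buf, size, key);   /* alternately encrypts / decrypts */
    }
    clock_t end = clock();

    int restored = memcmp(buf, ref, size) == 0;
    free(buf);
    free(ref);
    if (!restored) return -1.0;
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC / passes;
}
```

Build with optimizations enabled (for example -O2), because an unoptimized build will be noticeably slower. The result depends on CPU, cache, and memory bandwidth. Treat it as a sanity check on the order of magnitude rather than a fixed figure.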

Comparison with AES and RC4

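AES-256 in a table-based software implementation (without AES-NI) typically runs at roughly a hundred to a few hundred MB/sec per core. With AES-NI hardware acceleration, parallelizable modes such as CTR reach several GB/sec. CBC encryption is slower because each block depends on the previous one. RC4 (used by Ekko via SystemFunction032) is a byte-at-a-time stream cipher and typically runs in the hundreds of MB/sec. XOR32, at the ~12 GB/sec estimated above, is the fastest option and requires zero library imports. For the threat model of sleep-time encryption, the speed difference is irrelevant at these buffer sizes: even 300 KB at 100 MB/sec takes only about 3 ms. The zero-import advantage of XOR, however, is significant.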

6. Memory Alignment Considerations

The XOR32 implementation processes data in 4-byte (DWORD) chunks. Dereferencing a BYTE* cast to DWORD* is only well-defined when the pointer is DWORD-aligned. VirtualAlloc guarantees this, because it returns page-aligned addresses, which are always DWORD-aligned. Regardless of alignment, any 0-3 trailing bytes past the last full DWORD must be handled separately:

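// Alignment-safe XOR32 implementation
#include <windows.h>   // BYTE, DWORD, SIZE_T
#include <string.h>    // memcpy

void xor32Safe(BYTE* data, SIZE_T len, DWORD key) {
    BYTE keyBytes[4];
    memcpy(keyBytes, &key, 4);   // key bytes in memory (little-endian) order

    // Process bulk as DWORDs. VirtualAlloc returns page-aligned
    // (4K-aligned) addresses, but going through memcpy keeps this
    // well-defined for any pointer; optimizing compilers typically
    // emit a single 32-bit load/store for each memcpy here.
    SIZE_T i = 0;
    for (; i + 4 <= len; i += 4) {
        DWORD chunk;
        memcpy(&chunk, data + i, 4);
        chunk ^= key;
        memcpy(data + i, &chunk, 4);
    }

    // Handle 0-3 remaining bytes; i is a multiple of 4 here, so
    // i % 4 continues the key pattern where the DWORD loop stopped
    for (; i < len; i++) {
        data[i] ^= keyBytes[i % 4];
    }
}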

The trailing-byte handling ensures that shellcode regions whose size is not a multiple of 4 are still correctly encrypted and decrypted. Since XOR is applied byte-by-byte for the remainder using the corresponding key byte, the self-inverse property is preserved.
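The claim that tail handling preserves the self-inverse property can be checked mechanically. For every buffer length, the DWORD-plus-tail version should produce exactly the same bytes as a plain byte-at-a-time XOR with key byte i % 4. A second pass should then restore the input. The sketch below performs both checks; the helper names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Byte-at-a-time reference: byte i always uses key byte i % 4. */
static void xor32_bytewise(uint8_t *data, size_t len, uint32_t key) {
    uint8_t k[4];
    memcpy(k, &key, 4);
    for (size_t i = 0; i < len; i++) data[i] ^= k[i % 4];
}

/* DWORD bulk + 0-3 byte tail, as in xor32Safe. */
static void xor32_dword(uint8_t *data, size_t len, uint32_t key) {
    uint8_t k[4];
    memcpy(k, &key, 4);
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, data + i, 4);
        w ^= key;
        memcpy(data + i, &w, 4);
    }
    for (; i < len; i++) data[i] ^= k[i % 4];
}

/* Returns 1 if, for every length 0..maxLen (maxLen <= 64), the DWORD
 * version matches the bytewise reference and two passes restore the input. */
int xor32_tail_handling_ok(size_t maxLen, uint32_t key) {
    uint8_t a[64], b[64], orig[64];
    assert(maxLen <= sizeof(a));
    for (size_t len = 0; len <= maxLen; len++) {
        for (size_t i = 0; i < len; i++) orig[i] = (uint8_t)(0xF0 + i);
        memcpy(a, orig, len);
        memcpy(b, orig, len);
        xor32_dword(a, len, key);
        xor32_bytewise(b, len, key);
        if (memcmp(a, b, len) != 0) return 0;     /* same ciphertext */
        xor32_dword(a, len, key);
        if (memcmp(a, orig, len) != 0) return 0;  /* round trip */
    }
    return 1;
}
```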

7. Visualizing XOR on Shellcode

Before and After XOR32 Encryption

// Original shellcode (Beacon stub) - recognizable patterns:
FC 48 83 E4 F0 E8 C0 00  00 00 41 51 41 50 52 51
56 48 31 D2 65 48 8B 52  60 48 8B 52 18 48 8B 52

// After XOR32 with key 0xD5C9B3A7 (bytes A7 B3 C9 D5 in little-endian memory) - appears random:
5B FB 4A 31 57 5B 09 D5  A7 B3 88 84 E6 E3 9B 84
F1 FB F8 07 C2 FB 42 87  C7 FB 42 87 BF FB 42 87

// XOR again with same key - original restored:
FC 48 83 E4 F0 E8 C0 00  00 00 41 51 41 50 52 51
56 48 31 D2 65 48 8B 52  60 48 8B 52 18 48 8B 52

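The original bytes contain recognizable patterns (the FC 48 83 E4 F0 E8 sequence is a well-known Cobalt Strike shellcode prologue). After XOR32, the bytes appear random and no longer match byte signatures written for the original. However, wherever the plaintext contains 00 bytes, the ciphertext exposes the key bytes directly (D5 A7 B3 in the first row). This is a concrete example of the known-plaintext weakness described in the Security Context above.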
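The listing can be reproduced with a short program. On a little-endian x86 host, the DWORD key 0xD5C9B3A7 sits in memory as the bytes A7 B3 C9 D5, which is the byte sequence applied above. The sketch uses a portable memcpy-based xor32; show_xor32 and dump are hypothetical helper names.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* XOR32 transform (DWORD bulk + 0-3 byte tail), alignment-agnostic. */
static void xor32_apply(uint8_t *data, size_t len, uint32_t key) {
    uint8_t keyBytes[4];
    memcpy(keyBytes, &key, sizeof(keyBytes));
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t chunk;
        memcpy(&chunk, data + i, sizeof(chunk));
        chunk ^= key;
        memcpy(data + i, &chunk, sizeof(chunk));
    }
    for (; i < len; i++) data[i] ^= keyBytes[i % 4];
}

/* First 32 bytes of the stub shown above. */
static const uint8_t kStub[32] = {
    0xFC, 0x48, 0x83, 0xE4, 0xF0, 0xE8, 0xC0, 0x00,
    0x00, 0x00, 0x41, 0x51, 0x41, 0x50, 0x52, 0x51,
    0x56, 0x48, 0x31, 0xD2, 0x65, 0x48, 0x8B, 0x52,
    0x60, 0x48, 0x8B, 0x52, 0x18, 0x48, 0x8B, 0x52,
};

/* Prints bytes as hex, 16 per line with an extra gap after 8. */
static void dump(const uint8_t *p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        const char *sep = (i % 16 == 15) ? "\n" : (i % 8 == 7) ? "  " : " ";
        printf("%02X%s", (unsigned)p[i], sep);
    }
}

/* Copies the stub into out, prints it, XORs it with key, prints again. */
void show_xor32(uint32_t key, uint8_t out[32]) {
    assert(out != NULL);
    memcpy(out, kStub, sizeof(kStub));
    dump(out, sizeof(kStub));
    xor32_apply(out, sizeof(kStub), key);
    dump(out, sizeof(kStub));
}
```

Calling show_xor32(0xD5C9B3A7u, out) prints the two original rows followed by their XOR32 encryption. On a big-endian host, the same DWORD would be applied in the opposite byte order and produce different ciphertext. That is why example listings should state the key's byte order.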

8. Why Not AES or RC4?

While stronger ciphers exist, they introduce unnecessary complexity for this threat model:

Factor	XOR32	RC4 (SystemFunction032)	AES-256
Imports needed	None	`advapi32!SystemFunction032`	`bcrypt.dll` or `advapi32.dll`
Code size	~10 lines	~5 lines + DLL load	~50+ lines + DLL load
In-place operation	Native	Native (RC4 is a stream cipher)	Depends on mode (CTR preserves length; CBC needs padding)
Key/IV setup	None	Key schedule per call	Key expansion + IV management
Detectability of import	No import to detect	`SystemFunction032` in IAT is an IOC	Crypto API usage may be flagged
Sufficient for sleep evasion?	Yes	Yes (overkill)	Yes (massive overkill)
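The "Key/IV setup" row can be made concrete with a textbook RC4 for comparison. This is a generic implementation, not Windows' SystemFunction032 itself. Like XOR32, it works in place and is self-inverse for a given key. Unlike XOR32, it must rebuild a 256-byte permutation from the key on every call before touching the data.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Textbook RC4, in place: key-scheduling (KSA), then the keystream
 * generator (PRGA) XORed into the data. Running it again with the
 * same key regenerates the same keystream and restores the input. */
void rc4_inplace(uint8_t *data, size_t len,
                 const uint8_t *key, size_t keyLen) {
    assert(keyLen > 0);
    uint8_t S[256];
    for (int i = 0; i < 256; i++) S[i] = (uint8_t)i;

    /* Key-scheduling algorithm: permute S under the key */
    uint8_t j = 0;
    for (int i = 0; i < 256; i++) {
        j = (uint8_t)(j + S[i] + key[i % keyLen]);
        uint8_t t = S[i]; S[i] = S[j]; S[j] = t;
    }

    /* Pseudo-random generation: XOR one keystream byte per data byte */
    uint8_t a = 0, b = 0;
    for (size_t n = 0; n < len; n++) {
        a = (uint8_t)(a + 1);
        b = (uint8_t)(b + S[a]);
        uint8_t t = S[a]; S[a] = S[b]; S[b] = t;
        data[n] ^= S[(uint8_t)(S[a] + S[b])];
    }
}
```

Compared with the ten-line xor32, the extra work is the 256-entry state plus the 256-step key schedule that runs on every call. That is the "Key schedule per call" entry in the table.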

Ekko's Choice vs Fluctuation's Choice

Ekko uses RC4 via SystemFunction032 because it encrypts its own image. While that image is encrypted, none of Ekko's own code can run. Instead, it queues a chain of timer callbacks that invoke existing Windows functions, including SystemFunction032, to encrypt, sleep, and decrypt. ShellcodeFluctuation uses XOR32 because its MySleep hook handler runs outside the encrypted shellcode region, where it can directly call any custom function. The architectural difference drives the cipher choice.

Knowledge Check

Q1: Why is XOR's self-inverse property critical for ShellcodeFluctuation?

A) The same function and key encrypt and decrypt, eliminating separate code paths

B) It makes XOR cryptographically unbreakable

C) It allows XOR to work without a key

D) It enables parallel processing across multiple CPU cores

Q2: Why does ShellcodeFluctuation use a 32-bit XOR key instead of a single-byte key?

A) Single-byte XOR is slower than 32-bit XOR

B) 32-bit keys produce smaller ciphertext

C) A single-byte key has only 256 possibilities, which scanners like BeaconEye brute-force instantly

D) Windows requires 32-bit aligned encryption

Q3: How long does XOR32 encryption of a typical 300 KB Cobalt Strike Beacon take?

A) About 100 milliseconds

B) About 25 microseconds

C) About 5 seconds

D) About 1 millisecond
