Module 2: XOR Encryption for Memory Evasion
Why XOR is the ideal cipher for in-place memory toggling, how XOR32 works, and its performance characteristics.
Module Objective
Understand the properties of XOR that make it uniquely suited for in-memory encryption, the difference between single-byte and multi-byte XOR keys, how ShellcodeFluctuation generates and applies a 32-bit XOR key, and why performance matters for stealth.
1. Why XOR for In-Memory Encryption?
ShellcodeFluctuation uses XOR encryption rather than AES, RC4, or other ciphers. This is not laziness — XOR has specific properties that make it ideal for this use case:
| Property | Why It Matters |
|---|---|
| Self-inverse (involution) | A ^ K ^ K = A — the same function encrypts and decrypts. No need for separate encrypt/decrypt code paths |
| In-place operation | XOR modifies data directly without requiring a separate output buffer. No additional memory allocation needed |
| No state | Each byte is processed independently — no initialization vectors, no cipher state, no block chaining |
| Single instruction | XOR compiles to a single x86 instruction per operation. Minimal CPU overhead |
| No imports | No cryptographic library dependencies. The encryption engine is a few lines of code |
| Zero expansion | Output is exactly the same size as input — no padding, no headers, no ciphertext expansion |
The Key Insight: Toggling
ShellcodeFluctuation needs to rapidly toggle memory between encrypted and decrypted states hundreds of times during an implant's lifetime. XOR's self-inverse property means the exact same code path handles both directions — calling the function once encrypts, calling it again decrypts. This simplicity reduces the code surface area and eliminates an entire class of bugs.
2. Single-Byte vs Multi-Byte XOR
The key length dramatically affects the security of XOR encryption. Understanding this distinction explains why ShellcodeFluctuation uses a 32-bit (4-byte) key.
2.1 Single-Byte XOR (8-bit key)
// Single-byte XOR - trivially breakable
void xor_single(BYTE* data, SIZE_T len, BYTE key) {
    for (SIZE_T i = 0; i < len; i++) {
        data[i] ^= key;
    }
}
// Only 256 possible keys - brute force in microseconds
// Config parsers rely on this to decode Beacon's XOR-obfuscated configuration
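With only 256 possible keys, any tool can try all of them in microseconds. Cobalt Strike's embedded Beacon configuration is a well-known example: it is obfuscated with a single-byte XOR key, so configuration parsers simply try the known default keys, or all 256, and check whether the result matches a Beacon config signature. BeaconEye approaches the problem from the other side: it scans process heap memory for the configuration after Beacon has already decoded it, so it needs no key at all.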
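The 256-key search described above can be sketched in portable C. This is a minimal illustration under stated assumptions, not any particular tool's implementation: the function name `find_single_byte_key` and the idea of matching a short signature at the start of the buffer are hypothetical choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: try all 256 single-byte keys and return the first one
 * under which the buffer starts with the expected signature, or -1 if no key
 * produces a match. */
int find_single_byte_key(const uint8_t *data, size_t len,
                         const uint8_t *sig, size_t sig_len)
{
    if (sig_len == 0 || sig_len > len) {
        return -1;
    }
    for (int key = 0; key < 256; key++) {
        size_t i = 0;
        /* Decode only as many bytes as the signature needs. */
        while (i < sig_len && (uint8_t)(data[i] ^ (uint8_t)key) == sig[i]) {
            i++;
        }
        if (i == sig_len) {
            return key;
        }
    }
    return -1;
}
```

Because most wrong keys are rejected after the first byte, the whole search costs a few hundred comparisons, which is why an 8-bit key offers no protection against a scanner that knows what it is looking for.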
2.2 Multi-Byte XOR (XOR32)
// ShellcodeFluctuation uses XOR32 - a 4-byte key
// This is the actual approach from the repository
void xor32(BYTE* data, SIZE_T len, DWORD key) {
    // Process 4 bytes at a time for performance
    DWORD* ptr = (DWORD*)data;
    SIZE_T dwordCount = len / sizeof(DWORD);
    for (SIZE_T i = 0; i < dwordCount; i++) {
        ptr[i] ^= key;
    }
    // Handle remaining bytes (0-3 trailing bytes)
    BYTE* keyBytes = (BYTE*)&key;
    SIZE_T remainder = len % sizeof(DWORD);
    BYTE* tail = data + (dwordCount * sizeof(DWORD));
    for (SIZE_T i = 0; i < remainder; i++) {
        tail[i] ^= keyBytes[i];
    }
}
| Key Size | Keyspace | Naive Brute-Force Time (approx.) | Sufficient for Sleep Evasion? |
|---|---|---|---|
| 1 byte (8-bit) | 256 | Microseconds | No: trying every key is trivial |
| 4 bytes (32-bit) | ~4.3 billion | Seconds to minutes per memory region, even with only a short signature check per key | Yes, in practice: too costly to repeat for every region of every process in real time |
| 8 bytes (64-bit) | ~1.8 x 10^19 | Centuries | Overkill for this use case |
Security Context
XOR32 is not cryptographically secure. It is vulnerable to known-plaintext attacks: if an attacker knows any four consecutive plaintext bytes (e.g., a known MZ header at offset 0), XORing them with the ciphertext at the same offset yields the key directly, with no brute force required. Runs of zero bytes in the plaintext expose the key bytes verbatim. However, for the threat model of ShellcodeFluctuation, it does not need to be secure in that sense. The goal is to defeat automated real-time scanners, not withstand dedicated cryptanalysis. The shellcode is only encrypted during the sleep window, and the key changes each time the implant loads.
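The known-plaintext weakness can be made concrete with a short sketch. The helper name `recover_xor32_key` is hypothetical, and the code assumes the key is applied as a little-endian DWORD (as on x86), so key byte `j` covers every buffer offset `i` with `i % 4 == j`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: recover a repeating 4-byte XOR key from ciphertext
 * when 4 consecutive plaintext bytes at a known offset are available.
 * The caller must ensure cipher holds at least offset + 4 bytes. */
uint32_t recover_xor32_key(const uint8_t *cipher, const uint8_t *known_plain,
                           size_t offset)
{
    uint32_t key = 0;
    for (size_t n = 0; n < 4; n++) {
        /* ciphertext ^ plaintext = key byte, because C = P ^ K */
        uint8_t key_byte = (uint8_t)(cipher[offset + n] ^ known_plain[n]);
        size_t slot = (offset + n) % 4;            /* which key byte this is */
        key |= (uint32_t)key_byte << (8 * slot);   /* little-endian DWORD layout */
    }
    return key;
}
```

For example, ciphertext bytes 29 81 30 43 paired with the known plaintext FC 48 83 E4 at offset 0 give key bytes D5 C9 B3 A7, i.e. the DWORD 0xA7B3C9D5. The attack costs four XOR operations, which is why the key size alone says little about resistance to an analyst who knows part of the payload.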
3. Key Generation
ShellcodeFluctuation generates the XOR key at runtime, ensuring each execution uses a different key. This prevents static signatures based on known ciphertext:
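// Key generation approach
DWORD generateXorKey() {
    DWORD key = 0;
    // Use a random seed - could be from:
    // - GetTickCount()
    // - __rdtsc() (CPU timestamp counter)
    // - RtlRandomEx()
    // - CryptGenRandom() / BCryptGenRandom()
    // Simple but effective for this threat model:
    srand(GetTickCount());
    // rand() returns only 15 bits on MSVC (RAND_MAX = 0x7FFF), so
    // (rand() << 16) | rand() would leave bits 15 and 31 always zero.
    // Build the key one byte at a time to populate all 32 bits.
    for (int i = 0; i < 4; i++) {
        key = (key << 8) | (DWORD)(rand() & 0xFF);
    }
    // Avoid degenerate keys
    if (key == 0) key = 0xDEADBEEF;
    return key;
}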
Why Not a Hardcoded Key?
A hardcoded key would mean every execution produces the same ciphertext. A defender who reverses the tool once could compute the expected encrypted bytes and write a YARA rule matching them. Runtime-generated keys ensure the ciphertext is different every time, defeating static signature approaches.
4. The XOR Toggle Pattern
Because XOR is self-inverse, ShellcodeFluctuation uses the exact same function for both encryption and decryption. Here is the conceptual toggle pattern:
// The toggle pattern - same call encrypts OR decrypts
DWORD xorKey = generateXorKey(); // Generated once at startup
// Before sleep: encrypt (plaintext -> ciphertext)
xor32(shellcodeBase, shellcodeSize, xorKey);
// shellcode region now contains encrypted gibberish
// ... sleep occurs ...
// After wake: decrypt (ciphertext -> plaintext)
xor32(shellcodeBase, shellcodeSize, xorKey);
// shellcode region now contains executable code again
XOR Toggle Cycle: Executable shellcode -> XOR32 (before sleep) -> Encrypted gibberish -> XOR32 (after wake) -> Executable shellcode
5. Performance Considerations
Speed matters for stealth. The XOR pass runs twice on every sleep cycle, once to encrypt before sleeping and once to decrypt after waking; for a beacon sleeping every 60 seconds, that is two passes per minute. If the XOR operation is slow, it creates a measurable delay between when Sleep is called and when actual sleeping begins.
// Performance analysis for typical shellcode sizes
//
// Cobalt Strike Beacon: ~300 KB shellcode
// Meterpreter stage: ~200 KB shellcode
// Sliver implant: ~10-15 MB (larger, Go-based)
//
// XOR32 throughput on modern x86-64:
// - Processes 4 bytes per iteration
// - ~1 cycle per DWORD XOR (pipelined)
// - At 3 GHz: ~3 billion DWORDs/sec = ~12 GB/sec
//
// For 300 KB Beacon:
// - 300,000 / 4 = 75,000 iterations
// - 75,000 / 3,000,000,000 = 0.000025 seconds = 25 microseconds
//
// Conclusion: XOR32 encryption of typical shellcode takes
// ~25 microseconds - completely negligible
| Shellcode Size | XOR32 Time (approx.) | Overhead per Cycle |
|---|---|---|
| 100 KB | ~8 us | Negligible |
| 300 KB (Beacon) | ~25 us | Negligible |
| 1 MB | ~83 us | Negligible |
| 10 MB (Sliver) | ~830 us | Still under 1ms |
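The figures in the table follow from simple arithmetic, which the sketch below reproduces with integer math. The function name `estimate_xor32_ns` is hypothetical, and the model is the same simplification used above: one DWORD XOR retired per clock cycle, no memory stalls, and decimal units (300 KB = 300,000 bytes). Real throughput depends on cache behavior and compiler vectorization, so treat the results as order-of-magnitude estimates.

```c
#include <stdint.h>

/* Estimate the duration of one XOR32 pass in nanoseconds, assuming one
 * 4-byte XOR per clock cycle (an idealized model, not a measurement). */
uint64_t estimate_xor32_ns(uint64_t size_bytes, uint64_t clock_mhz)
{
    if (clock_mhz == 0) {
        return 0;                              /* avoid division by zero */
    }
    uint64_t dword_ops = size_bytes / 4;       /* one iteration per 4 bytes */
    /* cycles / MHz gives microseconds; multiply by 1000 first to get ns. */
    return (dword_ops * 1000u) / clock_mhz;
}
```

Under this model, `estimate_xor32_ns(300000, 3000)` evaluates to 25000 ns, matching the ~25 us entry for a 300 KB Beacon at 3 GHz.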
Comparison with AES and RC4
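Throughput figures for the alternatives are rough and hardware-dependent. Table-based software AES without AES-NI typically manages a few hundred MB/sec, while AES-NI raises that to several GB/sec in parallelizable modes (CBC encryption is inherently serial and therefore slower). RC4 (used by Ekko via SystemFunction032) typically runs at several hundred MB/sec up to roughly 1 GB/sec. XOR32, at the ~12 GB/sec estimated above, is the fastest option and requires zero library imports. For the threat model of sleep-time encryption, the speed difference between AES and XOR is irrelevant, since even the slowest of these processes a 300 KB Beacon in a few milliseconds or less, but the zero-import advantage of XOR is significant.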
6. Memory Alignment Considerations
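The XOR32 implementation processes data in 4-byte (DWORD) chunks. This works correctly when the data pointer is DWORD-aligned, which holds whenever the region starts at an address returned by VirtualAlloc (it returns page-aligned addresses, which are always DWORD-aligned). x86 tolerates unaligned DWORD accesses as well, but dereferencing a misaligned DWORD pointer is still undefined behavior in C and C++. Independently of alignment, the trailing bytes must be handled separately: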
// Alignment-safe XOR32 implementation
void xor32Safe(BYTE* data, SIZE_T len, DWORD key) {
    BYTE keyBytes[4];
    memcpy(keyBytes, &key, 4);
    // Process bulk as DWORDs. VirtualAlloc returns page-aligned
    // (4K-aligned) addresses, but going through memcpy keeps the
    // access well-defined for any pointer; compilers reduce it to
    // a plain 32-bit load and store.
    SIZE_T i = 0;
    for (; i + 3 < len; i += 4) {
        DWORD word;
        memcpy(&word, data + i, sizeof(word));
        word ^= key;
        memcpy(data + i, &word, sizeof(word));
    }
    // Handle 0-3 remaining bytes
    for (; i < len; i++) {
        data[i] ^= keyBytes[i % 4];
    }
}
The trailing-byte handling ensures that shellcode regions whose size is not a multiple of 4 are still correctly encrypted and decrypted. Since XOR is applied byte-by-byte for the remainder using the corresponding key byte, the self-inverse property is preserved.
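One way to check the tail handling is to round-trip every buffer length, including those that are not multiples of 4. The sketch below is a hedged, portable stand-in rather than the routine above: `xor32_portable` and `roundtrip_ok` are hypothetical names, and the key bytes are taken explicitly in little-endian order so the result matches what the DWORD-based code produces on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise equivalent of a little-endian XOR32 pass: the byte at offset i
 * is XORed with key byte i % 4. */
void xor32_portable(uint8_t *data, size_t len, uint32_t key)
{
    for (size_t i = 0; i < len; i++) {
        data[i] ^= (uint8_t)(key >> (8 * (i % 4)));
    }
}

/* Returns 1 if, for every length from 0 to max_len, one pass leaves bytes
 * past the length untouched and a second pass restores the original data.
 * Returns 0 on any mismatch or if max_len exceeds the 64-byte test buffer. */
int roundtrip_ok(uint32_t key, size_t max_len)
{
    uint8_t original[64], work[64];
    if (max_len > sizeof original) {
        return 0;
    }
    for (size_t i = 0; i < sizeof original; i++) {
        original[i] = (uint8_t)(i * 37u + 11u);    /* arbitrary test pattern */
    }
    for (size_t len = 0; len <= max_len; len++) {
        memcpy(work, original, sizeof work);
        xor32_portable(work, len, key);            /* encrypt */
        if (memcmp(work + len, original + len, sizeof work - len) != 0) {
            return 0;                              /* wrote past the region */
        }
        xor32_portable(work, len, key);            /* decrypt */
        if (memcmp(work, original, sizeof work) != 0) {
            return 0;                              /* round trip failed */
        }
    }
    return 1;
}
```

Calling `roundtrip_ok(key, 64)` exercises remainders of 0, 1, 2, and 3 bytes many times over. The same harness could be pointed at a DWORD-based implementation to confirm that its tail loop uses the matching key bytes.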
7. Visualizing XOR on Shellcode
Before and After XOR32 Encryption
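// Original shellcode (Beacon stub) - recognizable patterns:
FC 48 83 E4 F0 E8 C0 00 00 00 41 51 41 50 52 51
56 48 31 D2 65 48 8B 52 60 48 8B 52 18 48 8B 52
// After XOR32 with key 0xA7B3C9D5 (stored little-endian as D5 C9 B3 A7):
29 81 30 43 25 21 73 A7 D5 C9 F2 F6 94 99 E1 F6
83 81 82 75 B0 81 38 F5 B5 81 38 F5 CD 81 38 F5
// XOR again with same key - original restored:
FC 48 83 E4 F0 E8 C0 00 00 00 41 51 41 50 52 51
56 48 31 D2 65 48 8B 52 60 48 8B 52 18 48 8B 52
The original bytes contain recognizable patterns (the FC 48 83 E4 F0 E8 sequence is a well-known shellcode prologue seen in Metasploit and Cobalt Strike stagers). After XOR32, those sequences are gone and static signatures written against the plaintext no longer match. The output is not truly random, however: the three zero bytes at offsets 7-9 encrypt to raw key bytes (A7 D5 C9), and the repeated 48 8B 52 sequence shows up as a repeated 81 38 F5. This is the known-plaintext weakness in visible form.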
8. Why Not AES or RC4?
While stronger ciphers exist, they introduce unnecessary complexity for this threat model:
| Factor | XOR32 | RC4 (SystemFunction032) | AES-256 |
|---|---|---|---|
| Imports needed | None | advapi32!SystemFunction032 | bcrypt.dll or advapi32.dll |
| Code size | ~10 lines | ~5 lines + DLL load | ~50+ lines + DLL load |
| In-place operation | Native | Native (RC4 is stream cipher) | Requires mode (CTR/CBC) for in-place |
| Key/IV setup | None | Key schedule per call | Key expansion + IV management |
| Detectability of import | No import to detect | SystemFunction032 in IAT is an IOC | Crypto API usage may be flagged |
| Sufficient for sleep evasion? | Yes | Yes (overkill) | Yes (massive overkill) |
Ekko's Choice vs Fluctuation's Choice
Ekko uses RC4 via SystemFunction032 because it chains operations through timer callbacks and needs a Windows API function it can pass as a callback. ShellcodeFluctuation uses XOR32 because it runs its own code in the MySleep hook handler, where it can directly call any custom function. The architectural difference drives the cipher choice.
Knowledge Check
Q1: Why is XOR's self-inverse property critical for ShellcodeFluctuation?
Q2: Why does ShellcodeFluctuation use a 32-bit XOR key instead of a single-byte key?
Q3: How long does XOR32 encryption of a typical 300 KB Cobalt Strike Beacon take?