Module 5: The Encryption Layer
Deep dive into Shoggoth’s two-stage encryption: RC4 stream cipher and the random block cipher with chained arithmetic and bitwise operations.
Module Objective
Understand how Shoggoth’s encryption works at the byte level: the RC4 Key Scheduling Algorithm (KSA) and Pseudo-Random Generation Algorithm (PRGA), the random block cipher’s operation selection from the XOR/ADD/SUB/ROL/ROR/NOT/NEG/INC/DEC pool, multi-operation chaining on 8-byte blocks, and how random keys are generated for each run.
1. Stage 1: RC4 Stream Cipher
The first encryption stage uses RC4 (Rivest Cipher 4), a stream cipher that generates a pseudo-random keystream from a variable-length key. RC4 was chosen for Shoggoth for practical reasons: it produces output of exactly the same length as the input (no padding needed), its implementation is compact (important for the decoder stub size), and every payload byte is masked by its own key-dependent keystream byte.
1.1 Key Scheduling Algorithm (KSA)
The KSA initializes the internal state array S[256] using the encryption key. Every element of S is swapped based on the key bytes, creating a key-dependent permutation:
// RC4 Key Scheduling Algorithm
void rc4_ksa(uint8_t S[256], const uint8_t* key, size_t keyLen) {
    // Initialize identity permutation
    for (int i = 0; i < 256; i++) {
        S[i] = (uint8_t)i;
    }
    // Key-dependent permutation
    uint8_t j = 0;
    for (int i = 0; i < 256; i++) {
        j = j + S[i] + key[i % keyLen]; // mod wraps key
        // Swap S[i] and S[j]
        uint8_t temp = S[i];
        S[i] = S[j];
        S[j] = temp;
    }
}
1.2 Pseudo-Random Generation Algorithm (PRGA)
After KSA, the PRGA generates one keystream byte per iteration by further permuting S. Each keystream byte is XORed with a plaintext byte to produce ciphertext:
// RC4 PRGA - encrypt/decrypt (same operation)
void rc4_crypt(uint8_t S[256], uint8_t* data, size_t dataLen) {
    uint8_t i = 0, j = 0;
    for (size_t n = 0; n < dataLen; n++) {
        i = i + 1;
        j = j + S[i];
        // Swap S[i] and S[j]
        uint8_t temp = S[i];
        S[i] = S[j];
        S[j] = temp;
        // Generate keystream byte and XOR with data
        uint8_t k = S[(uint8_t)(S[i] + S[j])];
        data[n] ^= k;
    }
}
RC4 Is Symmetric
RC4 encryption and decryption are identical operations: XOR with the keystream. Because the PRGA modifies S as it runs, decryption must start from a fresh state: re-running the KSA with the same key and then the PRGA over the ciphertext recovers the plaintext. This is why the decoder stub does not need separate encrypt/decrypt logic for this stage; the same code works in both directions.
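The symmetry can be checked with a small round-trip sketch. This is a minimal illustration, not Shoggoth's actual code: the `rc4` wrapper and the `toBytes` helper are names introduced here for the example, and the wrapper simply combines the KSA and PRGA shown above so that each call starts from a fresh state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Runs KSA followed by PRGA over a copy of `data` and returns the result.
// The state is rebuilt from the key on every call, so the same function
// serves for both encryption and decryption.
std::vector<uint8_t> rc4(const std::vector<uint8_t>& key,
                         std::vector<uint8_t> data) {
    assert(!key.empty() && "RC4 needs a non-empty key");

    // KSA: identity permutation, then key-dependent swaps.
    uint8_t S[256];
    for (size_t i = 0; i < 256; i++) S[i] = static_cast<uint8_t>(i);
    uint8_t j = 0;
    for (size_t i = 0; i < 256; i++) {
        j = static_cast<uint8_t>(j + S[i] + key[i % key.size()]);
        std::swap(S[i], S[j]);
    }

    // PRGA: one keystream byte per data byte, XORed in place.
    uint8_t a = 0, b = 0;
    for (size_t n = 0; n < data.size(); n++) {
        a = static_cast<uint8_t>(a + 1);
        b = static_cast<uint8_t>(b + S[a]);
        std::swap(S[a], S[b]);
        data[n] ^= S[static_cast<uint8_t>(S[a] + S[b])];
    }
    return data;
}

// Hypothetical convenience helper for the example: string -> byte vector.
std::vector<uint8_t> toBytes(const std::string& s) {
    return std::vector<uint8_t>(s.begin(), s.end());
}
```

Calling `rc4(key, rc4(key, data))` returns the original `data`. With the widely published test input (key "Key", plaintext "Plaintext") the first call yields the bytes BB F3 16 E8 D9 40 AF 0A D3.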
1.3 Random Key Generation
By default, Shoggoth generates a random RC4 key for each encryption run using the C++ random number generator. The key length and content vary between runs, ensuring each output uses a different keystream. The key is embedded within the decoder stub so it is available at runtime for decryption.
// Conceptual: random RC4 key generation
std::mt19937 rng(seed); // seeded randomly or with --seed value
std::uniform_int_distribution<int> byteDist(0, 255);
std::uniform_int_distribution<int> lenDist(8, 32); // key length range
size_t keyLen = lenDist(rng);
std::vector<uint8_t> rc4Key(keyLen);
for (size_t i = 0; i < keyLen; i++) {
    rc4Key[i] = (uint8_t)byteDist(rng);
}
2. Stage 2: Random Block Cipher
The second encryption stage is where Shoggoth’s polymorphic nature is most evident in the encryption itself. Rather than using a fixed algorithm, Shoggoth randomly constructs a block cipher by selecting operations from a pool and chaining them together.
2.1 The Operation Pool
Shoggoth selects from the following operations, each operating on 8-byte (QWORD) blocks:
| Operation | Encoding Action | Decoding (Inverse) Action | Requires Key? |
|---|---|---|---|
| XOR | block ^= key | block ^= key | Yes (random 8-byte key) |
| ADD | block += key | block -= key | Yes (random 8-byte key) |
| SUB | block -= key | block += key | Yes (random 8-byte key) |
| ROL | block = rotl(block, n) | block = rotr(block, n) | Yes (rotation count 1–63) |
| ROR | block = rotr(block, n) | block = rotl(block, n) | Yes (rotation count 1–63) |
| NOT | block = ~block | block = ~block | No (self-inverse) |
| NEG | block = -block | block = -block | No (self-inverse) |
| INC | block += 1 | block -= 1 | No |
| DEC | block -= 1 | block += 1 | No |
Inverse Operations Are Critical
Every encryption operation must have a known inverse that the decoder applies. If encryption applies ADD with key K, the decoder must apply SUB with the same K. If encryption applies ROL by N bits, the decoder must apply ROR by N bits. The operations NOT and NEG are their own inverses. XOR is also self-inverse. Getting the inverse wrong means the payload will not decrypt correctly.
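The pairs in the table can be verified directly on a single 64-bit value. The following is a minimal sketch, assuming unsigned 64-bit arithmetic with wraparound; `rotl64`, `rotr64`, `neg64`, and `inversesHold` are helper names introduced for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Portable 64-bit rotations; n is expected to be in 1..63 as in the table.
uint64_t rotl64(uint64_t x, unsigned n) {
    assert(n >= 1 && n <= 63);
    return (x << n) | (x >> (64 - n));
}
uint64_t rotr64(uint64_t x, unsigned n) {
    assert(n >= 1 && n <= 63);
    return (x >> n) | (x << (64 - n));
}

// Two's-complement negation on an unsigned value (well-defined wraparound).
uint64_t neg64(uint64_t x) { return 0 - x; }

// Returns true if every encode/decode pair from the table restores `block`.
bool inversesHold(uint64_t block, uint64_t key, unsigned n) {
    bool ok = true;
    ok = ok && (((block ^ key) ^ key) == block);        // XOR / XOR
    ok = ok && (((block + key) - key) == block);        // ADD / SUB
    ok = ok && (((block - key) + key) == block);        // SUB / ADD
    ok = ok && (rotr64(rotl64(block, n), n) == block);  // ROL / ROR
    ok = ok && (rotl64(rotr64(block, n), n) == block);  // ROR / ROL
    ok = ok && (~~block == block);                      // NOT / NOT
    ok = ok && (neg64(neg64(block)) == block);          // NEG / NEG
    ok = ok && (((block + 1) - 1) == block);            // INC / DEC
    ok = ok && (((block - 1) + 1) == block);            // DEC / INC
    return ok;
}
```

Because unsigned addition and subtraction wrap modulo 2^64, ADD/SUB and INC/DEC stay exact inverses even when the intermediate value overflows.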
2.2 Chain Construction
For each encryption run, Shoggoth randomly selects a chain of operations. A chain might look like: XOR(k1) → ADD(k2) → ROL(n1) → NOT → SUB(k3). Each operation in the chain is applied sequentially to every 8-byte block of the payload:
// Conceptual: building a random operation chain
enum OpType { OP_XOR, OP_ADD, OP_SUB, OP_ROL, OP_ROR, OP_NOT, OP_NEG, OP_INC, OP_DEC };
struct Operation {
    OpType type;
    uint64_t key; // for XOR/ADD/SUB: 8-byte key; for ROL/ROR: rotation count
};
// Randomly select 3-5 operations for this chain
std::uniform_int_distribution<int> chainLen(3, 5);
std::uniform_int_distribution<int> opDist(0, 8); // 9 operation types
std::uniform_int_distribution<uint64_t> keyDist; // full 64-bit range
std::uniform_int_distribution<int> rotDist(1, 63); // rotation: 1-63 bits
int numOps = chainLen(rng);
std::vector<Operation> chain;
for (int i = 0; i < numOps; i++) {
    Operation op;
    op.type = (OpType)opDist(rng);
    if (op.type == OP_XOR || op.type == OP_ADD || op.type == OP_SUB) {
        op.key = keyDist(rng); // random 8-byte key
    } else if (op.type == OP_ROL || op.type == OP_ROR) {
        op.key = (uint64_t)rotDist(rng); // rotation count
    } else {
        op.key = 0; // NOT, NEG, INC, DEC need no key
    }
    chain.push_back(op);
}
2.3 Applying the Chain
The encryption processes the payload in 8-byte chunks. For each chunk, the entire operation chain is applied in order. If the payload length is not a multiple of 8, the final partial block needs separate handling: either only the available bytes are encrypted, or the payload is padded to a block boundary first. The sketch below processes whole blocks only.
// Conceptual: applying the encryption chain to 8-byte blocks
// Needs <cstring> for std::memcpy. _rotl64/_rotr64 are MSVC intrinsics;
// std::rotl/std::rotr from <bit> are the C++20 equivalents.
void encryptPayload(uint8_t* data, size_t len, const std::vector<Operation>& chain) {
    size_t numBlocks = len / 8; // any trailing len % 8 bytes are not touched here
    for (size_t b = 0; b < numBlocks; b++) {
        uint64_t block;
        std::memcpy(&block, data + b * 8, 8); // load without alignment/aliasing issues
        for (const auto& op : chain) {
            switch (op.type) {
                case OP_XOR: block ^= op.key; break;
                case OP_ADD: block += op.key; break;
                case OP_SUB: block -= op.key; break;
                case OP_ROL: block = _rotl64(block, (int)op.key); break;
                case OP_ROR: block = _rotr64(block, (int)op.key); break;
                case OP_NOT: block = ~block; break;
                case OP_NEG: block = 0 - block; break; // unsigned two's-complement negate
                case OP_INC: block += 1; break;
                case OP_DEC: block -= 1; break;
            }
        }
        std::memcpy(data + b * 8, &block, 8); // store the encrypted block
    }
}
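If the payload is padded rather than left with an unencrypted tail, the padding step could look roughly like the sketch below. This is an assumption for illustration: which approach Shoggoth actually takes is not stated here, and the helper name `padToBlockSize` and the default fill byte are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kBlockSize = 8;  // QWORD block size used by stage 2

// Returns a copy of `payload` extended with `fill` bytes up to the next
// multiple of 8. A payload that is already aligned is returned unchanged.
// The fill value is an illustrative choice, not taken from Shoggoth.
std::vector<uint8_t> padToBlockSize(std::vector<uint8_t> payload,
                                    uint8_t fill = 0x00) {
    size_t remainder = payload.size() % kBlockSize;
    if (remainder != 0) {
        payload.resize(payload.size() + (kBlockSize - remainder), fill);
    }
    assert(payload.size() % kBlockSize == 0);
    return payload;
}
```

With padding, every byte passes through the chain, at the cost of a slightly longer encrypted payload that the decoder must account for.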
3. Decryption: Reversing the Chain
The decoder stub must reverse the encryption by applying the inverse operations in reverse order. If the encryption chain was XOR(k1) → ADD(k2) → ROL(n1), the decryption chain is ROR(n1) → SUB(k2) → XOR(k1):
Encryption vs Decryption Chain Order
This reversal is computed at generation time. Shoggoth builds the encryption chain, applies it to the payload, then constructs the inverse chain and passes it to the asmjit stub generator. The decoder stub encodes these inverse operations as x86-64 instructions operating on QWORD values.
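The reversal can be sketched in a few lines. This is a minimal illustration rather than Shoggoth's implementation: it reuses the `OpType` and `Operation` definitions from section 2.2, while `inverseOf`, `invertChain`, and `applyChain` are names introduced here. Rotation counts are assumed to be in the range 1 to 63.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum OpType { OP_XOR, OP_ADD, OP_SUB, OP_ROL, OP_ROR, OP_NOT, OP_NEG, OP_INC, OP_DEC };

struct Operation {
    OpType type;
    uint64_t key;  // 8-byte key, rotation count (1..63), or unused
};

// Maps one operation to the operation that undoes it (same key/count).
Operation inverseOf(const Operation& op) {
    switch (op.type) {
        case OP_ADD: return {OP_SUB, op.key};
        case OP_SUB: return {OP_ADD, op.key};
        case OP_ROL: return {OP_ROR, op.key};
        case OP_ROR: return {OP_ROL, op.key};
        case OP_INC: return {OP_DEC, 0};
        case OP_DEC: return {OP_INC, 0};
        default:     return op;  // XOR, NOT, NEG are self-inverse
    }
}

// Inverse chain = inverse of each operation, visited last to first.
std::vector<Operation> invertChain(const std::vector<Operation>& chain) {
    std::vector<Operation> inverse;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        inverse.push_back(inverseOf(*it));
    }
    return inverse;
}

// Applies a chain, in order, to a single 64-bit block.
uint64_t applyChain(uint64_t block, const std::vector<Operation>& chain) {
    for (const Operation& op : chain) {
        switch (op.type) {
            case OP_XOR: block ^= op.key; break;
            case OP_ADD: block += op.key; break;
            case OP_SUB: block -= op.key; break;
            case OP_ROL:
                assert(op.key >= 1 && op.key <= 63);
                block = (block << op.key) | (block >> (64 - op.key));
                break;
            case OP_ROR:
                assert(op.key >= 1 && op.key <= 63);
                block = (block >> op.key) | (block << (64 - op.key));
                break;
            case OP_NOT: block = ~block; break;
            case OP_NEG: block = 0 - block; break;
            case OP_INC: block += 1; break;
            case OP_DEC: block -= 1; break;
        }
    }
    return block;
}
```

For the chain XOR(k1) → ADD(k2) → ROL(n1), `invertChain` produces ROR(n1) → SUB(k2) → XOR(k1), and `applyChain(applyChain(x, chain), invertChain(chain))` returns `x` for any block value.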
4. Why Two Stages?
Using two fundamentally different encryption stages provides defense in depth against analysis:
| Property | RC4 (Stage 1) | Block Cipher (Stage 2) | Combined Effect |
|---|---|---|---|
| Granularity | Byte-level stream cipher | 8-byte block operations | Byte-level and block-level transformations are layered |
| Key structure | Variable-length key, 256-byte S-box | Multiple independent keys, one per operation | Very large combined key space |
| Decoder complexity | Full KSA + PRGA loop (more instructions) | Simple per-block operations (fewer instructions) | Two different stub patterns to signature |
| Polymorphism | Different key, different registers | Different operations, different keys, different registers | Both the algorithm and the implementation vary |
Algorithm-Level Polymorphism
Stage 2 is uniquely polymorphic at the algorithm level, not just the implementation level. Two encryption runs may use completely different operation chains (e.g., XOR+ADD+ROL vs SUB+NOT+ROR+XOR). This means the decoder stubs perform genuinely different computations, not just the same computation with different registers. This is significantly harder to detect than simple register randomization.
5. Key Embedding
All encryption keys (RC4 key bytes, block cipher operation keys, rotation counts) must be available to the decoder at runtime. Shoggoth embeds them directly in the decoder stub as immediate values or as data blocks following the code:
; Example: RC4 key embedded as data block after decoder code
; The decoder uses RIP-relative addressing to locate it
decoder_start:
    lea r8, [rip + rc4_key_data] ; r8 = pointer to key
    mov r9d, KEY_LENGTH ; r9 = key length
    ; ... RC4 KSA and PRGA using r8/r9 ...
    jmp decrypted_payload
rc4_key_data:
    db 0xA3, 0x7F, 0x12, 0xBB, ... ; random key bytes

; Block cipher keys embedded as immediates:
    mov rax, 0xDEADBEEFCAFEBABE ; XOR key as immediate
    xor qword [rbx], rax ; apply XOR to block
Since the keys change every run, the embedded values are different in every output. Combined with register randomization and junk code, this means the data sections of the decoder are also variable, contributing to the overall polymorphism.
Knowledge Check
Q1: If the encryption chain is ADD(k1) → XOR(k2) → NOT, what is the correct decryption chain?
Q2: Why is RC4 convenient for a polymorphic encoder’s decoder stub?
Q3: What makes Stage 2 polymorphic at the algorithm level rather than just the implementation level?