Difficulty: Intermediate

Module 5: The Encryption Layer

Deep dive into Shoggoth’s two-stage encryption: RC4 stream cipher and the random block cipher with chained arithmetic and bitwise operations.

Module Objective

Understand how Shoggoth’s encryption works at the byte level: the RC4 Key Scheduling Algorithm (KSA) and Pseudo-Random Generation Algorithm (PRGA), the random block cipher’s operation selection from the XOR/ADD/SUB/ROL/ROR/NOT/NEG/INC/DEC pool, multi-operation chaining on 8-byte blocks, and how random keys are generated for each run.

1. Stage 1: RC4 Stream Cipher

The first encryption stage uses RC4 (Rivest Cipher 4), a stream cipher that generates a pseudo-random keystream from a variable-length key. RC4 was chosen for Shoggoth for practical reasons: its output is exactly the same length as its input (no padding needed), its implementation is compact (important for the decoder stub size), and XORing the payload with its keystream masks byte-level patterns in the plaintext.

1.1 Key Scheduling Algorithm (KSA)

The KSA initializes the internal state array S[256] using the encryption key. Every element of S is swapped based on the key bytes, creating a key-dependent permutation:

C++
#include <cstddef>
#include <cstdint>

// RC4 Key Scheduling Algorithm
void rc4_ksa(uint8_t S[256], const uint8_t* key, size_t keyLen) {
    // Initialize identity permutation
    for (int i = 0; i < 256; i++) {
        S[i] = (uint8_t)i;
    }

    // Key-dependent permutation
    uint8_t j = 0;
    for (int i = 0; i < 256; i++) {
        j = j + S[i] + key[i % keyLen];  // mod wraps key
        // Swap S[i] and S[j]
        uint8_t temp = S[i];
        S[i] = S[j];
        S[j] = temp;
    }
}

1.2 Pseudo-Random Generation Algorithm (PRGA)

After KSA, the PRGA generates one keystream byte per iteration by further permuting S. Each keystream byte is XORed with a plaintext byte to produce ciphertext:

C++
// RC4 PRGA - encrypt/decrypt (same operation)
void rc4_crypt(uint8_t S[256], uint8_t* data, size_t dataLen) {
    uint8_t i = 0, j = 0;
    for (size_t n = 0; n < dataLen; n++) {
        i = i + 1;
        j = j + S[i];
        // Swap S[i] and S[j]
        uint8_t temp = S[i];
        S[i] = S[j];
        S[j] = temp;
        // Generate keystream byte and XOR with data
        uint8_t k = S[(uint8_t)(S[i] + S[j])];
        data[n] ^= k;
    }
}

RC4 Is Symmetric

RC4 encryption and decryption are identical operations: XOR with the keystream. Because the PRGA modifies S as it runs, decryption must start from a freshly initialized state: re-running the KSA and then the PRGA with the same key on the ciphertext recovers the plaintext. This is why the decoder stub does not need separate encrypt/decrypt logic for this stage; the same code works in both directions.
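The round trip can be checked with a minimal, self-contained sketch. The function name `rc4` and the helper `bytes` are illustrative and not taken from Shoggoth's source; the sketch rebuilds the state from the key on every call, which is why one function serves both directions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Standard RC4: KSA followed by PRGA. A fresh state is built on every
// call, so the same function both encrypts and decrypts.
std::vector<uint8_t> rc4(const std::vector<uint8_t>& key,
                         std::vector<uint8_t> data) {
    if (key.empty()) return data;  // sketch-only choice: no key, no change

    // KSA: identity permutation, then key-dependent swaps
    uint8_t S[256];
    for (int i = 0; i < 256; i++) S[i] = static_cast<uint8_t>(i);
    uint8_t j = 0;
    for (int i = 0; i < 256; i++) {
        j = static_cast<uint8_t>(j + S[i] + key[i % key.size()]);
        std::swap(S[i], S[j]);
    }

    // PRGA: one keystream byte per data byte, XORed in place
    uint8_t x = 0, y = 0;
    for (auto& byte : data) {
        x = static_cast<uint8_t>(x + 1);
        y = static_cast<uint8_t>(y + S[x]);
        std::swap(S[x], S[y]);
        byte ^= S[static_cast<uint8_t>(S[x] + S[y])];
    }
    return data;
}

// Illustrative helper: turn an ASCII string into a byte vector
std::vector<uint8_t> bytes(const std::string& s) {
    return std::vector<uint8_t>(s.begin(), s.end());
}
```

With the key "Key" and the plaintext "Plaintext" (a widely published RC4 test vector), `rc4` returns the bytes BB F3 16 E8 D9 40 AF 0A D3, and calling `rc4` again with the same key on that output returns the original text.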

1.3 Random Key Generation

By default, Shoggoth generates a random RC4 key for each encryption run using the C++ random number generator. The key length and content vary between runs, ensuring each output uses a different keystream. The key is embedded within the decoder stub so it is available at runtime for decryption.

C++
// Conceptual: random RC4 key generation
std::mt19937 rng(seed);  // seeded randomly or with --seed value
std::uniform_int_distribution<int> byteDist(0, 255);
std::uniform_int_distribution<int> lenDist(8, 32);  // key length range

size_t keyLen = lenDist(rng);
std::vector<uint8_t> rc4Key(keyLen);
for (size_t i = 0; i < keyLen; i++) {
    rc4Key[i] = (uint8_t)byteDist(rng);
}
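The effect of a fixed seed can be sketched as follows. This is an illustration under assumptions, not Shoggoth's actual code: the helper names `makeRc4Key` and `freshSeed` are hypothetical, and the 8 to 32 byte length range is simply carried over from the conceptual snippet above.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: derive an RC4 key of random length (8..32 bytes)
// and random content from a 32-bit seed.
std::vector<uint8_t> makeRc4Key(uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> lenDist(8, 32);
    std::uniform_int_distribution<int> byteDist(0, 255);

    std::vector<uint8_t> key(static_cast<size_t>(lenDist(rng)));
    for (auto& b : key) {
        b = static_cast<uint8_t>(byteDist(rng));
    }
    return key;
}

// Hypothetical helper: pick a fresh seed when none is supplied by the user.
uint32_t freshSeed() {
    return static_cast<uint32_t>(std::random_device{}());
}
```

Calling `makeRc4Key` twice with the same seed yields the same key within one build of the program, which is what makes a seeded run reproducible; calling it with `freshSeed()` gives a different key on each run in practice.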

2. Stage 2: Random Block Cipher

The second encryption stage is where Shoggoth’s polymorphic nature is most evident in the encryption itself. Rather than using a fixed algorithm, Shoggoth randomly constructs a block cipher by selecting operations from a pool and chaining them together.

2.1 The Operation Pool

Shoggoth selects from the following operations, each operating on 8-byte (QWORD) blocks:

| Operation | Encoding Action | Decoding (Inverse) Action | Requires Key? |
| --- | --- | --- | --- |
| XOR | block ^= key | block ^= key | Yes (random 8-byte key) |
| ADD | block += key | block -= key | Yes (random 8-byte key) |
| SUB | block -= key | block += key | Yes (random 8-byte key) |
| ROL | block = rotl(block, n) | block = rotr(block, n) | Yes (rotation count 1–63) |
| ROR | block = rotr(block, n) | block = rotl(block, n) | Yes (rotation count 1–63) |
| NOT | block = ~block | block = ~block | No (self-inverse) |
| NEG | block = -block | block = -block | No (self-inverse) |
| INC | block += 1 | block -= 1 | No |
| DEC | block -= 1 | block += 1 | No |

Inverse Operations Are Critical

Every encryption operation must have a known inverse that the decoder applies. If encryption applies ADD with key K, the decoder must apply SUB with the same K. If encryption applies ROL by N bits, the decoder must apply ROR by N bits. The operations NOT and NEG are their own inverses. XOR is also self-inverse. Getting the inverse wrong means the payload will not decrypt correctly.
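The pairing of each operation with its inverse can be verified mechanically. The sketch below is illustrative: `applyOp`, `inverseOf`, `allOpsRoundTrip`, and the portable rotate helpers are names introduced here, not functions from Shoggoth.

```cpp
#include <cstdint>

enum OpType { OP_XOR, OP_ADD, OP_SUB, OP_ROL, OP_ROR, OP_NOT, OP_NEG, OP_INC, OP_DEC };

// Portable 64-bit rotates; n is reduced to 0..63 to avoid undefined shifts.
uint64_t rotl64(uint64_t v, unsigned n) { n &= 63; return n ? (v << n) | (v >> (64 - n)) : v; }
uint64_t rotr64(uint64_t v, unsigned n) { n &= 63; return n ? (v >> n) | (v << (64 - n)) : v; }

// Apply one operation to one 8-byte block (unsigned math wraps modulo 2^64).
uint64_t applyOp(OpType t, uint64_t block, uint64_t key) {
    switch (t) {
        case OP_XOR: return block ^ key;
        case OP_ADD: return block + key;
        case OP_SUB: return block - key;
        case OP_ROL: return rotl64(block, static_cast<unsigned>(key));
        case OP_ROR: return rotr64(block, static_cast<unsigned>(key));
        case OP_NOT: return ~block;
        case OP_NEG: return 0 - block;  // two's-complement negate
        case OP_INC: return block + 1;
        case OP_DEC: return block - 1;
    }
    return block;
}

// Map each operation to the operation that undoes it.
OpType inverseOf(OpType t) {
    switch (t) {
        case OP_ADD: return OP_SUB;
        case OP_SUB: return OP_ADD;
        case OP_ROL: return OP_ROR;
        case OP_ROR: return OP_ROL;
        case OP_INC: return OP_DEC;
        case OP_DEC: return OP_INC;
        default:     return t;  // XOR, NOT, NEG are self-inverse
    }
}

// True if every operation in the pool round-trips for this block.
bool allOpsRoundTrip(uint64_t block, uint64_t key, unsigned rot) {
    for (int i = OP_XOR; i <= OP_DEC; i++) {
        OpType t = static_cast<OpType>(i);
        uint64_t k = (t == OP_ROL || t == OP_ROR) ? rot : key;
        if (applyOp(inverseOf(t), applyOp(t, block, k), k) != block) return false;
    }
    return true;
}
```

Swapping any entry in `inverseOf` for the wrong operation (for example, returning OP_ADD for OP_ADD) makes `allOpsRoundTrip` fail, which mirrors the decryption failure described above.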

2.2 Chain Construction

For each encryption run, Shoggoth randomly selects a chain of operations. A chain might look like: XOR(k1) → ADD(k2) → ROL(n1) → NOT → SUB(k3). Each operation in the chain is applied sequentially to every 8-byte block of the payload:

C++
// Conceptual: building a random operation chain
enum OpType { OP_XOR, OP_ADD, OP_SUB, OP_ROL, OP_ROR, OP_NOT, OP_NEG, OP_INC, OP_DEC };

struct Operation {
    OpType type;
    uint64_t key;    // for XOR/ADD/SUB: 8-byte key; for ROL/ROR: rotation count
};

// Randomly select 3-5 operations for this chain
std::uniform_int_distribution<int> chainLen(3, 5);
std::uniform_int_distribution<int> opDist(0, 8);  // 9 operation types
int numOps = chainLen(rng);

std::vector<Operation> chain;
for (int i = 0; i < numOps; i++) {
    Operation op;
    op.type = (OpType)opDist(rng);
    if (op.type == OP_XOR || op.type == OP_ADD || op.type == OP_SUB) {
        op.key = std::uniform_int_distribution<uint64_t>()(rng);  // random 8-byte key
    } else if (op.type == OP_ROL || op.type == OP_ROR) {
        op.key = (rng() % 63) + 1;   // rotation: 1-63 bits
    } else {
        op.key = 0;  // NOT, NEG, INC, DEC need no key
    }
    chain.push_back(op);
}

2.3 Applying the Chain

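The encryption processes the payload in 8-byte chunks, applying the entire operation chain in order to each chunk. If the payload length is not a multiple of 8, the final partial block needs separate handling: either only the available bytes are encrypted, or the payload is padded to a multiple of 8. The conceptual code below processes whole blocks only and leaves any trailing bytes unchanged.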

C++
// Conceptual: applying the encryption chain to 8-byte blocks
// Needs <cstring> for std::memcpy. _rotl64/_rotr64 are MSVC-specific;
// C++20 offers std::rotl/std::rotr in <bit> as a portable alternative.
void encryptPayload(uint8_t* data, size_t len, const std::vector<Operation>& chain) {
    size_t numBlocks = len / 8;  // trailing bytes (len % 8) are not processed here

    for (size_t b = 0; b < numBlocks; b++) {
        // Copy the block out and back to avoid unaligned or aliasing access
        uint64_t block;
        std::memcpy(&block, data + b * 8, sizeof(block));
        for (const auto& op : chain) {
            switch (op.type) {
                case OP_XOR: block ^= op.key; break;
                case OP_ADD: block += op.key; break;
                case OP_SUB: block -= op.key; break;
                case OP_ROL: block = _rotl64(block, (int)op.key); break;
                case OP_ROR: block = _rotr64(block, (int)op.key); break;
                case OP_NOT: block = ~block; break;
                case OP_NEG: block = 0 - block; break;  // two's-complement negate
                case OP_INC: block += 1; break;
                case OP_DEC: block -= 1; break;
            }
        }
        std::memcpy(data + b * 8, &block, sizeof(block));
    }
}
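One way to handle a payload whose length is not a multiple of 8 is to pad it before the block stage. The sketch below shows that option only; the helper name `padToBlockSize` and the default fill byte are assumptions made for illustration, and the text does not say which strategy Shoggoth actually uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pad the payload up to the next multiple of 8 bytes.
// Returns the original length so a caller can strip the padding later.
size_t padToBlockSize(std::vector<uint8_t>& payload, uint8_t fill = 0x00) {
    const size_t originalLen = payload.size();
    const size_t remainder = originalLen % 8;
    if (remainder != 0) {
        payload.resize(originalLen + (8 - remainder), fill);
    }
    return originalLen;
}
```

A 13-byte payload grows to 16 bytes, while a payload that is already a multiple of 8 (including an empty one) is left unchanged.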

3. Decryption: Reversing the Chain

The decoder stub must reverse the encryption by applying the inverse operations in reverse order. If the encryption chain was XOR(k1) → ADD(k2) → ROL(n1), the decryption chain is ROR(n1) → SUB(k2) → XOR(k1):

Encryption vs Decryption Chain Order

Encrypt: XOR(k1) → ADD(k2) → ROL(5)
Decryption reverses both order and operations:
Decrypt: ROR(5) → SUB(k2) → XOR(k1)

This reversal is computed at generation time. Shoggoth builds the encryption chain, applies it to the payload, then constructs the inverse chain and passes it to the asmjit stub generator. The decoder stub encodes these inverse operations as x86-64 instructions operating on QWORD values.
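Constructing the inverse chain can be sketched as follows. This is a minimal illustration, not Shoggoth's implementation: `invertChain`, `applyChain`, `applyOp`, and the rotate helpers are hypothetical names, and the sketch works on a byte vector rather than emitting x86-64 instructions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum OpType { OP_XOR, OP_ADD, OP_SUB, OP_ROL, OP_ROR, OP_NOT, OP_NEG, OP_INC, OP_DEC };
struct Operation { OpType type; uint64_t key; };

uint64_t rotl64(uint64_t v, unsigned n) { n &= 63; return n ? (v << n) | (v >> (64 - n)) : v; }
uint64_t rotr64(uint64_t v, unsigned n) { n &= 63; return n ? (v >> n) | (v << (64 - n)) : v; }

uint64_t applyOp(const Operation& op, uint64_t block) {
    switch (op.type) {
        case OP_XOR: return block ^ op.key;
        case OP_ADD: return block + op.key;
        case OP_SUB: return block - op.key;
        case OP_ROL: return rotl64(block, static_cast<unsigned>(op.key));
        case OP_ROR: return rotr64(block, static_cast<unsigned>(op.key));
        case OP_NOT: return ~block;
        case OP_NEG: return 0 - block;
        case OP_INC: return block + 1;
        case OP_DEC: return block - 1;
    }
    return block;
}

// Walk the encryption chain backwards and swap each op for its inverse.
std::vector<Operation> invertChain(const std::vector<Operation>& chain) {
    std::vector<Operation> inverse;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        Operation op = *it;  // keys and rotation counts are reused unchanged
        switch (op.type) {
            case OP_ADD: op.type = OP_SUB; break;
            case OP_SUB: op.type = OP_ADD; break;
            case OP_ROL: op.type = OP_ROR; break;
            case OP_ROR: op.type = OP_ROL; break;
            case OP_INC: op.type = OP_DEC; break;
            case OP_DEC: op.type = OP_INC; break;
            default: break;  // XOR, NOT, NEG are self-inverse
        }
        inverse.push_back(op);
    }
    return inverse;
}

// Apply a chain to every whole 8-byte block; trailing bytes stay untouched.
void applyChain(std::vector<uint8_t>& data, const std::vector<Operation>& chain) {
    for (size_t off = 0; off + 8 <= data.size(); off += 8) {
        uint64_t block;
        std::memcpy(&block, &data[off], sizeof(block));
        for (const Operation& op : chain) block = applyOp(op, block);
        std::memcpy(&data[off], &block, sizeof(block));
    }
}
```

Applying `applyChain` with a chain and then with `invertChain(chain)` restores the original bytes. For the chain XOR(k1) → ADD(k2) → ROL(5), `invertChain` produces ROR(5) → SUB(k2) → XOR(k1), matching the order shown above.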

4. Why Two Stages?

Using two fundamentally different encryption stages provides defense in depth against analysis:

| Property | RC4 (Stage 1) | Block Cipher (Stage 2) | Combined Effect |
| --- | --- | --- | --- |
| Granularity | Byte-level stream cipher | 8-byte block operations | Payload is transformed at both byte and block granularity |
| Key structure | Variable-length key, 256-byte S-box | Multiple independent keys, one per keyed operation | Large combined key space (RC4 key plus the chain's keys and rotation counts) |
| Decoder complexity | Full KSA + PRGA loop (more instructions) | Simple per-block operations (fewer instructions) | Two different stub patterns to signature |
| Polymorphism | Different key, different registers | Different operations, different keys, different registers | Both the algorithm and the implementation vary |

Algorithm-Level Polymorphism

Stage 2 is uniquely polymorphic at the algorithm level, not just the implementation level. Two encryption runs may use completely different operation chains (e.g., XOR+ADD+ROL vs SUB+NOT+ROR+XOR). This means the decoder stubs perform genuinely different computations, not just the same computation with different registers. This is significantly harder to detect than simple register randomization.

5. Key Embedding

All encryption keys (RC4 key bytes, block cipher operation keys, rotation counts) must be available to the decoder at runtime. Shoggoth embeds them directly in the decoder stub as immediate values or as data blocks following the code:

ASM
; Example: RC4 key embedded as data block after decoder code
; The decoder uses RIP-relative addressing to locate it
decoder_start:
    lea  r8, [rip + rc4_key_data]   ; r8 = pointer to key
    mov  r9d, KEY_LENGTH            ; r9 = key length
    ; ... RC4 KSA and PRGA using r8/r9 ...
    jmp  decrypted_payload

rc4_key_data:
    db 0xA3, 0x7F, 0x12, 0xBB, ...  ; random key bytes

; Block cipher keys embedded as immediates:
    mov  rax, 0xDEADBEEFCAFEBABE    ; XOR key as immediate
    xor  qword [rbx], rax            ; apply XOR to block

Since the keys change every run, the embedded values are different in every output. Combined with register randomization and junk code, this means the data sections of the decoder are also variable, contributing to the overall polymorphism.
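To see why an embedded key changes the stub's bytes, the two block-cipher instructions from the listing above can be hand-encoded. This is only an illustration: per the text, Shoggoth generates its stub through asmjit rather than by writing raw bytes, and `emitXorBlock` is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

// Hand-encodes two x86-64 instructions:
//   mov rax, imm64        -> 48 B8 followed by the 8 key bytes (little-endian)
//   xor qword [rbx], rax  -> 48 31 03
std::vector<uint8_t> emitXorBlock(uint64_t key) {
    std::vector<uint8_t> code;
    code.push_back(0x48);  // REX.W prefix (64-bit operand size)
    code.push_back(0xB8);  // MOV rax, imm64
    for (int i = 0; i < 8; i++) {
        code.push_back(static_cast<uint8_t>(key >> (8 * i)));  // low byte first
    }
    code.push_back(0x48);  // REX.W prefix
    code.push_back(0x31);  // XOR r/m64, r64
    code.push_back(0x03);  // ModRM: mod=00, reg=rax, rm=rbx
    return code;
}
```

Eight of the thirteen bytes come straight from the key, so two runs with different keys produce different byte sequences even when the instruction selection and registers happen to match.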

Knowledge Check

Q1: If the encryption chain is ADD(k1) → XOR(k2) → NOT, what is the correct decryption chain?

Decryption reverses both the order and each operation. NOT is self-inverse, XOR is self-inverse, and ADD’s inverse is SUB. Working backwards: first undo the last encryption step (NOT → NOT), then undo XOR(k2) with XOR(k2), then undo ADD(k1) with SUB(k1). Result: NOT → XOR(k2) → SUB(k1).

Q2: Why is RC4 convenient for a polymorphic encoder’s decoder stub?

RC4 is a stream cipher where encryption and decryption are identical: generate the keystream using KSA+PRGA and XOR it with the data. This means the decoder stub uses the exact same algorithm as the encryptor — no separate “inverse” implementation is needed, keeping the decoder compact.

Q3: What makes Stage 2 polymorphic at the algorithm level rather than just the implementation level?

Implementation-level polymorphism changes how the same computation is expressed (different registers, different instruction encodings). Algorithm-level polymorphism changes what computation is performed. Stage 2 randomly selects which operations to chain, so two outputs may use entirely different mathematical transformations, not just the same transformation expressed differently.