Difficulty: Intermediate

Module 5: The Encryption Layer

Deep dive into Shoggoth’s two-stage encryption: RC4 stream cipher and the random block cipher with chained arithmetic and bitwise operations.

Module Objective

Understand how Shoggoth’s encryption works at the byte level: the RC4 Key Scheduling Algorithm (KSA) and Pseudo-Random Generation Algorithm (PRGA), the random block cipher’s operation selection from the XOR/ADD/SUB/ROL/ROR/NOT/NEG/INC/DEC pool, multi-operation chaining on 8-byte blocks, and how random keys are generated for each run.

1. Stage 1: RC4 Stream Cipher

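The first encryption stage uses RC4 (Rivest Cipher 4), a stream cipher that generates a pseudo-random keystream from a variable-length key. RC4 was chosen for Shoggoth for practical reasons: it produces output of exactly the same length as the input (no padding needed), its implementation is compact (important for the decoder stub size), and it masks every payload byte with a key-dependent keystream byte. As with any XOR-based stream cipher, each ciphertext byte depends only on the plaintext byte and keystream byte at the same position, so RC4 by itself provides no diffusion between bytes.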

1.1 Key Scheduling Algorithm (KSA)

The KSA initializes the internal state array S[256] using the encryption key. Every element of S is swapped based on the key bytes, creating a key-dependent permutation:

C++
#include <cstddef>
#include <cstdint>

// RC4 Key Scheduling Algorithm (keyLen must be non-zero)
void rc4_ksa(uint8_t S[256], const uint8_t* key, size_t keyLen) {
    // Initialize identity permutation
    for (int i = 0; i < 256; i++) {
        S[i] = (uint8_t)i;
    }

    // Key-dependent permutation
    uint8_t j = 0;
    for (int i = 0; i < 256; i++) {
        j = (uint8_t)(j + S[i] + key[i % keyLen]);  // j wraps mod 256; i % keyLen repeats the key
        // Swap S[i] and S[j]
        uint8_t temp = S[i];
        S[i] = S[j];
        S[j] = temp;
    }
}

1.2 Pseudo-Random Generation Algorithm (PRGA)

After KSA, the PRGA generates one keystream byte per iteration by further permuting S. Each keystream byte is XORed with a plaintext byte to produce ciphertext:

C++
// RC4 PRGA - encrypt/decrypt (same operation)
// S must be a state freshly initialized by rc4_ksa; it is modified in place
void rc4_crypt(uint8_t S[256], uint8_t* data, size_t dataLen) {
    uint8_t i = 0, j = 0;
    for (size_t n = 0; n < dataLen; n++) {
        i = i + 1;
        j = j + S[i];
        // Swap S[i] and S[j]
        uint8_t temp = S[i];
        S[i] = S[j];
        S[j] = temp;
        // Generate keystream byte and XOR with data
        uint8_t k = S[(uint8_t)(S[i] + S[j])];
        data[n] ^= k;
    }
}

RC4 Is Symmetric

RC4 encryption and decryption are identical operations: XOR with the keystream. Re-running the KSA with the same key (to rebuild the initial state S) and then the PRGA on the ciphertext recovers the plaintext. This is why the decoder stub does not need separate encrypt/decrypt logic for this stage: the same code works in both directions.
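The symmetry can be checked with a small round trip. The following is a minimal sketch, not Shoggoth's actual code: it repeats the KSA and PRGA from above so that it is self-contained, and `rc4_apply` is a hypothetical wrapper introduced here that always starts from a fresh state.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// KSA as in section 1.1: build the key-dependent permutation of 0..255.
void rc4_ksa(uint8_t S[256], const uint8_t* key, size_t keyLen) {
    for (int i = 0; i < 256; i++) S[i] = static_cast<uint8_t>(i);
    uint8_t j = 0;
    for (int i = 0; i < 256; i++) {
        j = static_cast<uint8_t>(j + S[i] + key[i % keyLen]);
        uint8_t t = S[i]; S[i] = S[j]; S[j] = t;
    }
}

// PRGA as in section 1.2: XOR the data with the keystream in place.
void rc4_crypt(uint8_t S[256], uint8_t* data, size_t dataLen) {
    uint8_t i = 0, j = 0;
    for (size_t n = 0; n < dataLen; n++) {
        i = static_cast<uint8_t>(i + 1);
        j = static_cast<uint8_t>(j + S[i]);
        uint8_t t = S[i]; S[i] = S[j]; S[j] = t;
        data[n] ^= S[static_cast<uint8_t>(S[i] + S[j])];
    }
}

// Hypothetical wrapper (not from the original text): run the KSA on a fresh
// state, then the PRGA. Assumes a non-empty key.
std::vector<uint8_t> rc4_apply(const std::vector<uint8_t>& key,
                               std::vector<uint8_t> data) {
    uint8_t S[256];
    rc4_ksa(S, key.data(), key.size());
    rc4_crypt(S, data.data(), data.size());
    return data;
}
```

Calling `rc4_apply(key, rc4_apply(key, plaintext))` returns the original plaintext. With the widely published RC4 test vector (key `Key`, plaintext `Plaintext`), the first call yields the bytes BB F3 16 E8 D9 40 AF 0A D3.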

1.3 Random Key Generation

By default, Shoggoth generates a random RC4 key for each encryption run using a C++ pseudo-random number generator (std::mt19937 in the conceptual code below), seeded either randomly or from the --seed option. The key length and content vary between runs, ensuring each output uses a different keystream. The key is embedded within the decoder stub so it is available at runtime for decryption.

C++
// Conceptual: random RC4 key generation (requires <cstdint>, <random>, <vector>)
std::mt19937 rng(seed);  // seed: chosen randomly or taken from the --seed value
std::uniform_int_distribution<int> byteDist(0, 255);
std::uniform_int_distribution<int> lenDist(8, 32);  // key length range

size_t keyLen = lenDist(rng);
std::vector<uint8_t> rc4Key(keyLen);
for (size_t i = 0; i < keyLen; i++) {
    rc4Key[i] = (uint8_t)byteDist(rng);
}
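The effect of a fixed seed can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: `generateRc4Key` and `freshSeed` are hypothetical names, and the 8 to 32 byte length range is taken from the conceptual code above.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: derive an RC4 key of 8-32 pseudo-random bytes from a seed.
std::vector<uint8_t> generateRc4Key(uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> lenDist(8, 32);   // key length range
    std::uniform_int_distribution<int> byteDist(0, 255); // one key byte
    std::vector<uint8_t> key(static_cast<size_t>(lenDist(rng)));
    for (auto& b : key) {
        b = static_cast<uint8_t>(byteDist(rng));
    }
    return key;
}

// Hypothetical helper: pick a seed from the platform's entropy source
// when no fixed seed was supplied.
uint32_t freshSeed() {
    std::random_device rd;
    return static_cast<uint32_t>(rd());
}
```

With the same seed and the same standard library, `generateRc4Key` returns the same key every time, which is what makes a seeded run reproducible. Note that std::mt19937 is a general-purpose generator, not a cryptographically secure one.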

2. Stage 2: Random Block Cipher

The second encryption stage is where Shoggoth’s polymorphic nature is most evident in the encryption itself. Rather than using a fixed algorithm, Shoggoth randomly constructs a block cipher by selecting operations from a pool and chaining them together.

2.1 The Operation Pool

Shoggoth selects from the following operations, each operating on 8-byte (QWORD) blocks:

Operation	Encoding Action	Decoding (Inverse) Action	Requires Key?
`XOR`	`block ^= key`	`block ^= key`	Yes (random 8-byte key)
`ADD`	`block += key`	`block -= key`	Yes (random 8-byte key)
`SUB`	`block -= key`	`block += key`	Yes (random 8-byte key)
`ROL`	`block = rotl(block, n)`	`block = rotr(block, n)`	Yes (rotation count 1–63)
`ROR`	`block = rotr(block, n)`	`block = rotl(block, n)`	Yes (rotation count 1–63)
`NOT`	`block = ~block`	`block = ~block`	No (self-inverse)
`NEG`	`block = -block`	`block = -block`	No (self-inverse)
`INC`	`block += 1`	`block -= 1`	No
`DEC`	`block -= 1`	`block += 1`	No

Inverse Operations Are Critical

Every encryption operation must have a known inverse that the decoder applies. If encryption applies ADD with key K, the decoder must apply SUB with the same K. If encryption applies ROL by N bits, the decoder must apply ROR by N bits. The operations NOT and NEG are their own inverses. XOR is also self-inverse. Getting the inverse wrong means the payload will not decrypt correctly.
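The inverse pairs in the table can be verified directly on 64-bit unsigned values, where addition and subtraction wrap modulo 2^64. The sketch below uses portable helper functions (`rotl64`, `rotr64`, `neg64`) that are names chosen for this example, not identifiers from Shoggoth.

```cpp
#include <cstdint>

// Portable 64-bit rotates; n must be in 1..63, matching the table above.
constexpr uint64_t rotl64(uint64_t x, unsigned n) { return (x << n) | (x >> (64 - n)); }
constexpr uint64_t rotr64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }

// Two's-complement negation on an unsigned value; wraps modulo 2^64.
constexpr uint64_t neg64(uint64_t x) { return uint64_t{0} - x; }

constexpr uint64_t kSample = 0x0123456789ABCDEFULL;
constexpr uint64_t kKey    = 0xFFFFFFFFFFFFFFF0ULL;

// ROL is undone by ROR with the same count.
static_assert(rotr64(rotl64(kSample, 13), 13) == kSample, "ROL/ROR are inverses");
// ADD is undone by SUB with the same key, even when the addition overflows.
static_assert((kSample + kKey) - kKey == kSample, "ADD/SUB are inverses");
// XOR, NOT and NEG undo themselves.
static_assert(((kSample ^ kKey) ^ kKey) == kSample, "XOR is self-inverse");
static_assert(~~kSample == kSample, "NOT is self-inverse");
static_assert(neg64(neg64(kSample)) == kSample, "NEG is self-inverse");
```

Because the checks are `static_assert`s, a wrong inverse would already fail at compile time.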

2.2 Chain Construction

For each encryption run, Shoggoth randomly selects a chain of operations. A chain might look like: XOR(k1) → ADD(k2) → ROL(n1) → NOT → SUB(k3). Each operation in the chain is applied sequentially to every 8-byte block of the payload:

C++
// Conceptual: building a random operation chain
// (requires <cstdint>, <random>, <vector>; rng is the generator from section 1.3)
enum OpType { OP_XOR, OP_ADD, OP_SUB, OP_ROL, OP_ROR, OP_NOT, OP_NEG, OP_INC, OP_DEC };

struct Operation {
    OpType type;
    uint64_t key;    // for XOR/ADD/SUB: 8-byte key; for ROL/ROR: rotation count
};

// Randomly select 3-5 operations for this chain
std::uniform_int_distribution<int> chainLen(3, 5);
std::uniform_int_distribution<int> opDist(0, 8);  // 9 operation types
int numOps = chainLen(rng);

std::vector<Operation> chain;
for (int i = 0; i < numOps; i++) {
    Operation op;
    op.type = (OpType)opDist(rng);
    if (op.type == OP_XOR || op.type == OP_ADD || op.type == OP_SUB) {
        op.key = randomU64(rng);      // random 8-byte key (randomU64: helper, not shown here)
    } else if (op.type == OP_ROL || op.type == OP_ROR) {
        op.key = (rng() % 63) + 1;   // rotation: 1-63 bits
    } else {
        op.key = 0;  // NOT, NEG, INC, DEC need no key
    }
    chain.push_back(op);
}
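The chain-building code relies on a `randomU64` helper whose definition is not given. One plausible way to write it, assuming `rng` is a `std::mt19937` as in section 1.3, is to combine two 32-bit draws:

```cpp
#include <cstdint>
#include <random>

// Hypothetical implementation of the randomU64 helper used above.
// std::mt19937 produces 32 random bits per call, so two draws are combined.
uint64_t randomU64(std::mt19937& rng) {
    const uint64_t hi = static_cast<uint64_t>(rng()) & 0xFFFFFFFFULL;
    const uint64_t lo = static_cast<uint64_t>(rng()) & 0xFFFFFFFFULL;
    return (hi << 32) | lo;
}
```

An alternative would be to use `std::mt19937_64`, which returns 64 bits per call directly. Which variant the real tool uses is not stated in this module.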

2.3 Applying the Chain

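The encryption processes the payload in 8-byte chunks. For each chunk, the entire operation chain is applied in order. If the payload length is not a multiple of 8, the final partial block needs separate handling: either only the available bytes are transformed, or the payload is padded to a multiple of 8 first. The conceptual code below processes full blocks only, so any trailing len % 8 bytes pass through this stage unchanged.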

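C++
// Conceptual: applying the encryption chain to 8-byte blocks
// Assumes an MSVC toolchain: _rotl64/_rotr64 are declared in <stdlib.h>
void encryptPayload(uint8_t* data, size_t len, const std::vector<Operation>& chain) {
    size_t numBlocks = len / 8;           // full blocks only; the last len % 8 bytes are not touched
    uint64_t* blocks = (uint64_t*)data;   // assumes data is 8-byte aligned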

    for (size_t b = 0; b < numBlocks; b++) {
        for (const auto& op : chain) {
            switch (op.type) {
                case OP_XOR: blocks[b] ^= op.key; break;
                case OP_ADD: blocks[b] += op.key; break;
                case OP_SUB: blocks[b] -= op.key; break;
                case OP_ROL: blocks[b] = _rotl64(blocks[b], (int)op.key); break;
                case OP_ROR: blocks[b] = _rotr64(blocks[b], (int)op.key); break;
                case OP_NOT: blocks[b] = ~blocks[b]; break;
                case OP_NEG: blocks[b] = (uint64_t)(-(int64_t)blocks[b]); break;
                case OP_INC: blocks[b] += 1; break;
                case OP_DEC: blocks[b] -= 1; break;
            }
        }
    }
}
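If padding is the chosen way to deal with a partial final block, it could look like the sketch below. This is only one possible approach: the text does not say how Shoggoth pads, and both the helper name `padToBlockSize` and the zero filler byte are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: round the payload size up to a multiple of 8 bytes
// by appending filler bytes, so every byte falls inside a full block.
std::vector<uint8_t> padToBlockSize(std::vector<uint8_t> payload,
                                    uint8_t filler = 0x00) {
    const size_t blockSize = 8;
    const size_t rem = payload.size() % blockSize;
    if (rem != 0) {
        payload.resize(payload.size() + (blockSize - rem), filler);
    }
    return payload;
}
```

A 13-byte payload grows to 16 bytes, while a payload that is already a multiple of 8 is returned unchanged. The decoder then has to decrypt the padded length, and the filler bytes follow the original payload in memory.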

3. Decryption: Reversing the Chain

The decoder stub must reverse the encryption by applying the inverse operations in reverse order. If the encryption chain was XOR(k1) → ADD(k2) → ROL(n1), the decryption chain is ROR(n1) → SUB(k2) → XOR(k1):

Encryption vs Decryption Chain Order

Encrypt: XOR(k1) → ADD(k2) → ROL(5)

Decryption reverses both order and operations:

Decrypt: ROR(5) → SUB(k2) → XOR(k1)

This reversal is computed at generation time. Shoggoth builds the encryption chain, applies it to the payload, then constructs the inverse chain and passes it to the asmjit stub generator. The decoder stub encodes these inverse operations as x86-64 instructions operating on QWORD values.
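The construction of the inverse chain can be sketched in plain C++. The code below is an illustration under assumptions rather than Shoggoth's generator: it reuses the `OpType` and `Operation` definitions from section 2.2, while `applyOp`, `invertChain`, and `applyChain` are hypothetical names. It uses portable rotates and `memcpy` instead of MSVC intrinsics and pointer casts, and it processes full 8-byte blocks only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum OpType { OP_XOR, OP_ADD, OP_SUB, OP_ROL, OP_ROR, OP_NOT, OP_NEG, OP_INC, OP_DEC };

struct Operation {
    OpType type;
    uint64_t key;  // 8-byte key for XOR/ADD/SUB, rotation count (1-63) for ROL/ROR
};

static uint64_t rotl64(uint64_t x, unsigned n) { return (x << n) | (x >> (64 - n)); }
static uint64_t rotr64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }

// Apply a single operation to one 8-byte block value.
static uint64_t applyOp(uint64_t v, const Operation& op) {
    switch (op.type) {
        case OP_XOR: return v ^ op.key;
        case OP_ADD: return v + op.key;
        case OP_SUB: return v - op.key;
        case OP_ROL: return rotl64(v, static_cast<unsigned>(op.key));
        case OP_ROR: return rotr64(v, static_cast<unsigned>(op.key));
        case OP_NOT: return ~v;
        case OP_NEG: return uint64_t{0} - v;
        case OP_INC: return v + 1;
        case OP_DEC: return v - 1;
    }
    return v;
}

// Build the decryption chain: inverse operations, in reverse order.
std::vector<Operation> invertChain(const std::vector<Operation>& chain) {
    std::vector<Operation> inv;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        Operation op = *it;
        switch (op.type) {
            case OP_ADD: op.type = OP_SUB; break;
            case OP_SUB: op.type = OP_ADD; break;
            case OP_ROL: op.type = OP_ROR; break;
            case OP_ROR: op.type = OP_ROL; break;
            case OP_INC: op.type = OP_DEC; break;
            case OP_DEC: op.type = OP_INC; break;
            default: break;  // XOR, NOT and NEG are their own inverses
        }
        inv.push_back(op);
    }
    return inv;
}

// Run a chain over every full 8-byte block. memcpy avoids alignment and
// aliasing problems; trailing len % 8 bytes are left unchanged.
void applyChain(uint8_t* data, size_t len, const std::vector<Operation>& chain) {
    for (size_t off = 0; off + 8 <= len; off += 8) {
        uint64_t v;
        std::memcpy(&v, data + off, 8);
        for (const auto& op : chain) v = applyOp(v, op);
        std::memcpy(data + off, &v, 8);
    }
}
```

Encrypting with `applyChain(data, len, chain)` and then calling `applyChain(data, len, invertChain(chain))` restores the original bytes. In the real tool the inverse chain is emitted as x86-64 instructions rather than interpreted like this.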

4. Why Two Stages?

Using two fundamentally different encryption stages provides defense in depth against analysis:

Property	RC4 (Stage 1)	Block Cipher (Stage 2)	Combined Effect
Granularity	Byte-level stream cipher	8-byte block operations	Byte-level keystream masking plus block-level mixing
Key structure	Variable-length key, 256-byte S-box	Multiple independent keys, one per operation	Very large combined key space (RC4 key, per-operation keys, and choice of chain)
Decoder complexity	Full KSA + PRGA loop (more instructions)	Simple per-block operations (fewer instructions)	Two different stub patterns to signature
Polymorphism	Different key, different registers	Different operations, different keys, different registers	Both the algorithm and the implementation vary

Algorithm-Level Polymorphism

Stage 2 is polymorphic at the algorithm level, not just the implementation level. Two encryption runs may use completely different operation chains (e.g., XOR+ADD+ROL vs SUB+NOT+ROR+XOR). This means the decoder stubs perform genuinely different computations, not just the same computation with different registers. A signature therefore cannot rely on a fixed instruction sequence for the block decoder, which makes detection harder than when only the register allocation is randomized.

5. Key Embedding

All encryption keys (RC4 key bytes, block cipher operation keys, rotation counts) must be available to the decoder at runtime. Shoggoth embeds them directly in the decoder stub as immediate values or as data blocks following the code:

ASM
; Example: RC4 key embedded as data block after decoder code
; The decoder uses RIP-relative addressing to locate it
decoder_start:
    lea  r8, [rip + rc4_key_data]   ; r8 = pointer to key
    mov  r9d, KEY_LENGTH            ; r9 = key length
    ; ... RC4 KSA and PRGA using r8/r9 ...
    jmp  decrypted_payload

rc4_key_data:
    db 0xA3, 0x7F, 0x12, 0xBB, ...  ; random key bytes

; Block cipher keys embedded as immediates:
    mov  rax, 0xDEADBEEFCAFEBABE    ; XOR key as immediate
    xor  qword [rbx], rax            ; apply XOR to block

Since the keys change every run, the embedded values are different in every output. Combined with register randomization and junk code, this means the data sections of the decoder are also variable, contributing to the overall polymorphism.
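Placing the key after the code means the generator has to compute a RIP-relative displacement, which is measured from the end of the referencing instruction to the target. According to the text, asmjit handles this in Shoggoth. The sketch below hand-encodes a single `lea r8, [rip + disp32]` (bytes `4C 8D 05` followed by a little-endian 32-bit displacement) purely to show the arithmetic. `ripRelativeDisp` and `emitLeaR8Key` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Displacement of a RIP-relative operand: target address minus the address
// of the next instruction (instruction offset plus instruction length).
int32_t ripRelativeDisp(size_t instrOffset, size_t instrLen, size_t targetOffset) {
    return static_cast<int32_t>(static_cast<int64_t>(targetOffset) -
                                static_cast<int64_t>(instrOffset + instrLen));
}

// Hypothetical layout helper producing:
//   [lea r8, [rip + disp32]] [remaining decoder code] [key bytes]
std::vector<uint8_t> emitLeaR8Key(const std::vector<uint8_t>& codeAfterLea,
                                  const std::vector<uint8_t>& key) {
    // 4C 8D 05 = REX.W+R, LEA opcode, ModRM selecting r8 with RIP-relative addressing.
    std::vector<uint8_t> out = {0x4C, 0x8D, 0x05, 0x00, 0x00, 0x00, 0x00};
    out.insert(out.end(), codeAfterLea.begin(), codeAfterLea.end());
    const size_t keyOffset = out.size();  // the key starts right after the code
    out.insert(out.end(), key.begin(), key.end());

    // Patch the displacement (little-endian) now that the key offset is known.
    const uint32_t disp = static_cast<uint32_t>(ripRelativeDisp(0, 7, keyOffset));
    for (int i = 0; i < 4; i++) {
        out[3 + i] = static_cast<uint8_t>((disp >> (8 * i)) & 0xFF);
    }
    return out;
}
```

Because the code length and the key bytes both change from run to run, the displacement and the trailing data differ in every generated stub.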
