Difficulty: Intermediate

Module 4: Shoggoth Architecture Overview

The end-to-end encoder pipeline: from raw input to polymorphic position-independent output, covering the three operational modes and the role of each component.

Module Objective

Understand Shoggoth’s complete architecture: how input payloads are processed, how the PIC loaders are merged for PE and COFF modes, the two encryption stages, the decoder stub generation pipeline, garbage code insertion points, and the final output assembly. By the end of this module, you will have a mental model of the entire data flow.

1. The Three Operational Modes

Shoggoth supports three input modes, each handling a different payload type. The mode determines how the input is pre-processed before encryption:

| Mode | Flag | Input | Pre-Processing | Output |
|------|------|-------|----------------|--------|
| Raw | --mode raw | Shellcode (.bin) | None (input is used as-is) | PIC blob (decoder stub + encrypted shellcode) |
| PE | --mode pe | x64 PE executable (.exe) | PIC PE loader prepended to the PE file | PIC blob (decoder stub + encrypted [loader + PE]) |
| COFF | --mode coff | x64 COFF/BOF (.o) | PIC COFF loader prepended to the COFF object | PIC blob (decoder stub + encrypted [loader + COFF]) |

In raw mode, the input shellcode is treated as an opaque byte sequence. Shoggoth encrypts it and generates a decoder stub that decrypts and jumps to it. No loader is needed because the input is already position-independent shellcode.

In PE mode, the input is a standard x64 PE executable. Since PE files require a loader to process imports, relocations, and sections, Shoggoth prepends a PIC PE loader — a self-contained piece of shellcode that can parse the PE headers, map sections, resolve imports by walking the PEB/LDR structures, apply relocations, and transfer control to the entry point.

In COFF mode, the input is a COFF object file (commonly used as Beacon Object Files / BOFs in Cobalt Strike). A PIC COFF loader is prepended that handles symbol resolution and section mapping for COFF objects.

2. PIC Loaders

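The PE and COFF loaders are the components that let Shoggoth handle non-shellcode inputs. They are compiled from C source code using MinGW under constraints that keep the output position-independent:

PIC Loader Constraints

  - Compiled with -nostdlib, so the loader does not depend on the C runtime
  - No global variables
  - RIP-relative addressing only
  - API addresses resolved at runtime by walking the PEB/LDR structures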

The loaders are pre-compiled and stored in the stub/ directory of the Shoggoth source tree. At encryption time, the appropriate loader binary is read, the input payload is appended to it, and the combined blob becomes the data that gets encrypted.

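```cpp
// Conceptual: PE mode payload assembly
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Illustrative helper (not taken from the Shoggoth source): read a whole file into memory.
std::vector<uint8_t> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                                std::istreambuf_iterator<char>());
}

std::vector<uint8_t> assemblePePayload(const std::string& inputPath) {
    // 1. Read the PIC PE loader stub
    std::vector<uint8_t> loader = readFile("stub/PELoader.bin");
    // 2. Read the input PE file
    std::vector<uint8_t> peFile = readFile(inputPath);
    // 3. Concatenate: loader + PE file = combined payload
    std::vector<uint8_t> payload;
    payload.insert(payload.end(), loader.begin(), loader.end());
    payload.insert(payload.end(), peFile.begin(), peFile.end());
    // 4. This combined payload is what gets encrypted. The decoder stub
    //    will decrypt it, then execution starts at the loader, which
    //    parses and maps the PE file.
    return payload;
}
```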

3. The Encryption Pipeline

After the payload is assembled (raw shellcode, or loader + PE/COFF), Shoggoth applies a two-stage encryption pipeline. Each stage uses a different algorithm with randomly generated parameters:

Shoggoth Encryption Pipeline

Input Payload (shellcode / loader+PE / loader+COFF)
  -> Stage 1: RC4 (random key, stream cipher)
  -> Stage 2: Block Cipher (random ops on 8-byte blocks)
  -> Encrypted Blob

3.1 Stage 1: RC4 Stream Cipher

The first encryption stage applies the RC4 stream cipher with a randomly generated key. RC4 was chosen for several reasons: its output is the same length as its input (no padding), its implementation is simple enough to express in a small number of x86 instructions, and every keystream byte depends on the whole key, so changing one key byte changes the entire output.
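The length-preserving and self-inverse properties described above can be seen in a textbook RC4 routine. This is a minimal sketch of the standard algorithm, not code from the Shoggoth source; the function name `rc4` is chosen here for illustration, and the key is assumed to be non-empty.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Textbook RC4. The same call encrypts and decrypts, because XOR with the
// keystream is its own inverse. Assumes key.size() > 0.
std::vector<uint8_t> rc4(const std::vector<uint8_t>& key,
                         const std::vector<uint8_t>& data) {
    uint8_t S[256];
    for (int i = 0; i < 256; ++i) S[i] = static_cast<uint8_t>(i);

    // Key-scheduling algorithm (KSA): permute S under control of the key.
    uint8_t j = 0;
    for (int i = 0; i < 256; ++i) {
        j = static_cast<uint8_t>(j + S[i] + key[i % key.size()]);
        std::swap(S[i], S[j]);
    }

    // Pseudo-random generation algorithm (PRGA): one keystream byte per input byte.
    std::vector<uint8_t> out(data.size());
    uint8_t a = 0, b = 0;
    for (std::size_t n = 0; n < data.size(); ++n) {
        a = static_cast<uint8_t>(a + 1);
        b = static_cast<uint8_t>(b + S[a]);
        std::swap(S[a], S[b]);
        out[n] = data[n] ^ S[static_cast<uint8_t>(S[a] + S[b])];
    }
    return out;
}
```

Calling `rc4(key, rc4(key, data))` returns the original `data`, and the output always has exactly `data.size()` bytes, which is why no padding logic is needed at this stage.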

3.2 Stage 2: Random Block Cipher

The second stage divides the data into 8-byte blocks and applies a randomly selected chain of arithmetic/bitwise operations. For each encryption run, Shoggoth randomly selects which operations to apply, and in what order, from the pool: ADD, SUB, XOR, NOT, NEG, INC, DEC, ROL, ROR. Operations that take an operand (ADD, SUB, XOR, ROL, ROR) use a randomly generated key or shift value.
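Every operation in that pool is invertible on a 64-bit value, which is what makes a random chain decodable. The sketch below models this on a single block. The type and function names (`Op`, `Step`, `encodeBlock`, `decodeBlock`) are hypothetical and chosen for illustration; the source does not describe Shoggoth's internal data structures, and this sketch works on plain integers rather than generated machine code.

```cpp
#include <cstdint>
#include <vector>

enum class Op { Add, Sub, Xor, Not, Neg, Inc, Dec, Rol, Ror };

// arg is the key for Add/Sub/Xor and the shift count for Rol/Ror; unused otherwise.
struct Step { Op op; uint64_t arg; };

inline uint64_t rol64(uint64_t v, unsigned r) { r &= 63; return r ? (v << r) | (v >> (64 - r)) : v; }
inline uint64_t ror64(uint64_t v, unsigned r) { r &= 63; return r ? (v >> r) | (v << (64 - r)) : v; }

uint64_t applyStep(uint64_t v, const Step& s) {
    switch (s.op) {
        case Op::Add: return v + s.arg;
        case Op::Sub: return v - s.arg;
        case Op::Xor: return v ^ s.arg;
        case Op::Not: return ~v;
        case Op::Neg: return 0 - v;
        case Op::Inc: return v + 1;
        case Op::Dec: return v - 1;
        case Op::Rol: return rol64(v, static_cast<unsigned>(s.arg));
        case Op::Ror: return ror64(v, static_cast<unsigned>(s.arg));
    }
    return v;
}

// Each operation's inverse: Xor, Not and Neg undo themselves; the rest pair up.
Step inverseOf(const Step& s) {
    switch (s.op) {
        case Op::Add: return {Op::Sub, s.arg};
        case Op::Sub: return {Op::Add, s.arg};
        case Op::Inc: return {Op::Dec, 0};
        case Op::Dec: return {Op::Inc, 0};
        case Op::Rol: return {Op::Ror, s.arg};
        case Op::Ror: return {Op::Rol, s.arg};
        default:      return s;
    }
}

// Encoding applies the chain front to back.
uint64_t encodeBlock(uint64_t v, const std::vector<Step>& chain) {
    for (const Step& s : chain) v = applyStep(v, s);
    return v;
}

// Decoding applies the inverse of each step, walking the chain back to front.
uint64_t decodeBlock(uint64_t v, const std::vector<Step>& chain) {
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) v = applyStep(v, inverseOf(*it));
    return v;
}
```

For the chain ADD 3 then XOR 1, the value 5 encodes to 9; decoding 9 applies XOR 1 (giving 8) and then SUB 3 (giving 5), which is the reverse-order rule the decoder relies on.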

Optional Stage Control

Shoggoth provides flags to skip either encryption stage: --dont-do-first-encryption skips RC4, and --dont-do-second-encryption skips the block cipher. A third flag, --encrypt-only-decryptor, applies the second stage only to the RC4 decryptor stub rather than to the entire payload. These options are useful for testing or when layering Shoggoth with other tools.

4. Decoder Stub Generation

For each encryption stage, Shoggoth uses asmjit to generate a corresponding decoder stub — position-independent x86-64 machine code that reverses the encryption at runtime. The decoder stubs are the polymorphic heart of the system:

| Stub | Decrypts | Algorithm | Polymorphic Properties |
|------|----------|-----------|------------------------|
| Block Cipher Decoder | Stage 2 encryption | Inverse operations on 8-byte blocks, in reverse order (if encrypted with ADD then XOR, the decoder does XOR then SUB) | Random registers, random junk code, random operation sequence (matches encryption) |
| RC4 Decoder | Stage 1 encryption | RC4 KSA + PRGA implementation in x86-64 assembly | Random registers, junk code insertion between RC4 steps |

Each decoder stub is generated fresh using asmjit, with random register assignments and junk code inserted at multiple points. The stubs use RIP-relative addressing to locate the encrypted data that follows them in memory.

5. Garbage Code Insertion Points

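Shoggoth inserts garbage (junk) instructions at multiple points in the pipeline to further break pattern matching. Junk code is inserted at points including:

  - Before the Stage 2 decoder (the junk preamble)
  - Between the Stage 2 and Stage 1 decoders (the junk interlude)
  - Inside the decoder stubs themselves, for example between RC4 steps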

The junk code generator recursively produces instructions that have no net effect on the decoder’s functional state. These include jump-over blocks (a short JMP that skips random bytes), operations that cancel out (push/pop pairs, XORing a register with a value and then XORing it with the same value again), and fake function call patterns. Module 7 covers this in detail.

6. Final Output Assembly

The final output is assembled by concatenating the generated components in execution order:

Final PIC Output Structure

Junk Preamble
  -> Stage 2 Decoder (block cipher)
  -> Junk Interlude
  -> Stage 1 Decoder (RC4)
  -> Encrypted Payload

When executed, the flow is:

  1. CPU executes junk preamble (no-ops effectively)
  2. Stage 2 decoder runs: decrypts the block cipher layer, revealing the RC4 decoder stub (and the RC4-encrypted payload, unless --encrypt-only-decryptor was used, in which case only the stub was block-encrypted)
  3. Junk interlude executes (more no-ops)
  4. Stage 1 decoder runs: decrypts the RC4 layer, revealing the cleartext payload
  5. Control transfers to the decrypted payload (shellcode, or the PIC PE/COFF loader)
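The reason the outer layer must be removed first can be shown with two toy layers. This sketch deliberately swaps in trivial stand-ins (a single-byte XOR for Stage 1 and a single-byte ADD for Stage 2) instead of RC4 and the block cipher; all function names are hypothetical and exist only to illustrate the ordering rule.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Toy stand-in for Stage 1 (self-inverse, like a stream cipher's XOR).
Bytes layerXor(Bytes d, uint8_t k) { for (auto& b : d) b ^= k; return d; }

// Toy stand-in for Stage 2 and its inverse.
Bytes layerAdd(Bytes d, uint8_t k) { for (auto& b : d) b = static_cast<uint8_t>(b + k); return d; }
Bytes layerSub(Bytes d, uint8_t k) { for (auto& b : d) b = static_cast<uint8_t>(b - k); return d; }

// Encryption order: Stage 1 first, then Stage 2 on top.
Bytes encryptLayers(const Bytes& plain, uint8_t k1, uint8_t k2) {
    return layerAdd(layerXor(plain, k1), k2);
}

// Correct decryption: peel Stage 2 (outer) first, then Stage 1 (inner).
Bytes decryptLayers(const Bytes& cipher, uint8_t k1, uint8_t k2) {
    return layerXor(layerSub(cipher, k2), k1);
}

// Wrong order: undoing Stage 1 before Stage 2 does not recover the input in general.
Bytes decryptWrongOrder(const Bytes& cipher, uint8_t k1, uint8_t k2) {
    return layerSub(layerXor(cipher, k1), k2);
}
```

With the byte 0x41 and keys 0x5A and 0x10, the correct order recovers 0x41 while the wrong order yields 0x61, which is why the Stage 2 decoder sits in front of the Stage 1 decoder in the output layout.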

7. Command-Line Interface

Shoggoth’s CLI exposes control over every stage of the pipeline:

```shell
# Basic usage: encrypt raw shellcode
Shoggoth.exe -i payload.bin -o encrypted.bin -m raw

# Encrypt a PE file with a specific seed for reproducibility
Shoggoth.exe -i implant.exe -o encrypted.bin -m pe -s 12345

# Encrypt a COFF/BOF with custom RC4 key
Shoggoth.exe -i beacon.o -o encrypted.bin -m coff -k AABBCCDD

# Skip RC4 stage, only use block cipher
Shoggoth.exe -i payload.bin -o encrypted.bin -m raw --dont-do-first-encryption

# Encrypt COFF with BOF arguments
Shoggoth.exe -i beacon.o -o encrypted.bin -m coff --coff-arg 0x00000001...
```

| Flag | Required | Description |
|------|----------|-------------|
| -i / --input | Yes | Path to input payload file |
| -o / --output | Yes | Path for encrypted output file |
| -m / --mode | Yes | Encryption mode: raw, pe, or coff |
| -s / --seed | No | RNG seed for deterministic output (useful for testing) |
| -k / --key | No | Custom RC4 key in hex (default: randomly generated) |
| --coff-arg | No | BOF arguments in beacon_generate.py format |
| --dont-do-first-encryption | No | Skip Stage 1 (RC4) |
| --dont-do-second-encryption | No | Skip Stage 2 (block cipher) |
| --encrypt-only-decryptor | No | Stage 2 encrypts only the RC4 decoder, not the full payload |
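As an illustration of what "key in hex" means for a value such as AABBCCDD, the sketch below converts a hex string into key bytes. It assumes two hex digits per byte with no prefix or separators; the helper `parseHexKey` is hypothetical and says nothing about how Shoggoth itself validates or parses the -k argument.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hexNibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: "AABBCCDD" -> {0xAA, 0xBB, 0xCC, 0xDD}.
// Throws std::invalid_argument on empty, odd-length, or non-hex input.
std::vector<uint8_t> parseHexKey(const std::string& hex) {
    if (hex.empty() || hex.size() % 2 != 0)
        throw std::invalid_argument("hex key needs a non-zero, even number of digits");

    std::vector<uint8_t> key;
    key.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = hexNibble(hex[i]);
        const int lo = hexNibble(hex[i + 1]);
        if (hi < 0 || lo < 0)
            throw std::invalid_argument("non-hex character in key");
        key.push_back(static_cast<uint8_t>((hi << 4) | lo));
    }
    return key;
}
```

Under this assumption, the eight-character string in the example above corresponds to a 4-byte key.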

8. Source Code Organization

The Shoggoth repository is organized into distinct directories, each handling a specific concern:

Repository Structure

| Directory | Contents | Role |
|-----------|----------|------|
| src/ | Main encryptor C++ source | Core engine: encryption, asmjit stub generation, junk code, CLI |
| PELoader/ | PIC PE loader C source | Compiled to position-independent shellcode that loads PE files from memory |
| COFFLoader/ | PIC COFF loader C source | Compiled to position-independent shellcode that loads COFF/BOF files |
| stub/ | Pre-compiled loader binaries | Ready-to-use .bin files for PE and COFF loaders |
| COFFArgGenerator/ | Python script | beacon_generate.py for formatting BOF arguments |

Knowledge Check

Q1: In PE mode, what does Shoggoth prepend to the PE file before encryption?

Shoggoth prepends a PIC PE loader — a self-contained shellcode stub compiled from C with -nostdlib. This loader resolves API addresses by walking the PEB/LDR structures, maps PE sections, applies relocations, and calls the PE entry point. It contains no global variables and uses only RIP-relative addressing.

Q2: What is the correct order of decryption when the output executes?

Encryption applies RC4 first (Stage 1), then the block cipher (Stage 2). At runtime, decryption must reverse this: the block cipher decoder (Stage 2) runs first to peel off the outer layer, then the RC4 decoder (Stage 1) runs to reveal the cleartext payload. This is the standard “last encrypted, first decrypted” pattern.

Q3: What does the --seed flag do?

The --seed flag initializes the C++ random number generator with a fixed value. Since all random decisions (key generation, register selection, junk code placement, operation selection) flow from this RNG, the same seed with the same input produces byte-identical output. This is useful for debugging and testing. Without the flag, the seed comes from system entropy, so output remains fully polymorphic by default.
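The determinism described in this answer is a general property of seeded pseudo-random generators. The sketch below uses `std::mt19937` as an assumed engine (the source only says "the C++ random number generator"), and the helper names `randomBytes` and `keyFromSeed` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw `count` bytes from an already-seeded engine.
// Every later "random" decision would draw from the same engine in the same way.
std::vector<uint8_t> randomBytes(std::mt19937& rng, std::size_t count) {
    std::vector<uint8_t> out(count);
    for (auto& b : out) b = static_cast<uint8_t>(rng() & 0xFF);
    return out;
}

// A fixed seed fully determines the sequence, so it fully determines the key.
// For non-reproducible output, the seed would instead come from an entropy
// source such as std::random_device.
std::vector<uint8_t> keyFromSeed(uint32_t seed, std::size_t length) {
    std::mt19937 rng(seed);
    return randomBytes(rng, length);
}
```

Two calls to `keyFromSeed(12345, 16)` return identical bytes, while a different seed gives a different sequence. Applied to every random choice in the pipeline, this is what makes seeded runs reproducible.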