Module 8: Full Chain, Output Formats & Detection
End-to-end execution walkthrough, PIC output structure, COFF/PE wrapping mechanics, entropy analysis, emulation-based detection, and comparison with other encoder frameworks.
Module Objective
Trace the complete execution chain from the moment Shoggoth output begins running, understand the PIC blob’s memory layout, see how PE and COFF loaders bridge the gap between shellcode and structured executables, learn the detection techniques that defenders use against polymorphic output (entropy analysis, emulation, behavioral heuristics), and compare Shoggoth with Veil, msfvenom, and other encoding frameworks.
1. Full Execution Chain Walkthrough
Let us trace what happens from the moment a Shoggoth-encrypted payload begins executing in memory. This walkthrough assumes both encryption stages are active (the default):
Runtime Execution Flow: Junk Preamble (executes harmlessly) → Block Cipher Decoder (decrypts Stage 2) → Junk Interlude (executes harmlessly) → RC4 Decoder (decrypts Stage 1) → Payload (begins execution)
Step-by-Step Execution
- Junk Preamble (variable ~50-500 bytes): The first bytes executed are garbage instructions — NOPs, self-canceling pairs, jump-over blocks. They execute harmlessly and fall through to the block cipher decoder.
- Block Cipher Decoder: Uses lea reg, [rip + offset] to locate the doubly-encrypted payload. Iterates over 8-byte blocks, applying the inverse operation chain (e.g., ROR → SUB → XOR). After this loop completes, the data is only RC4-encrypted.
- Junk Interlude (variable): More garbage instructions separate the two decoder stubs. This prevents a pattern like “block loop immediately followed by RC4 setup” from becoming a signature.
- RC4 Decoder: Allocates 256 bytes on the stack for the S-box. Runs the KSA using the embedded key to initialize the permutation. Then runs the PRGA, generating keystream bytes and XORing them with the remaining encrypted data. After completion, restores the stack.
- Control Transfer: Execution falls through (or jumps) to the now-decrypted payload. In raw mode, this is the original shellcode. In PE/COFF mode, this is the PIC loader followed by the PE/COFF file.
- PE/COFF Loader (if applicable): The PIC loader resolves API addresses by walking PEB → Ldr → InMemoryOrderModuleList, maps PE sections or COFF symbols, applies relocations, and transfers control to the original payload’s entry point.
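The ordering of the two stages can be checked with a small data-only model. The sketch below is an illustration under stated assumptions, not Shoggoth's implementation: it fixes one operation chain (the decoder's ROR → SUB → XOR from the example above, so the encoder applies XOR → ADD → ROL), assumes the payload length is a multiple of 8 bytes, and uses hypothetical names such as `two_stage_encode`. The real tool randomizes the chain and emits machine code rather than Python.

```python
import struct

MASK64 = (1 << 64) - 1

def rol64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64

def ror64(x: int, r: int) -> int:
    """Rotate a 64-bit value right by r bits."""
    r %= 64
    return ((x >> r) | (x << (64 - r))) & MASK64

def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: KSA builds the S-box, PRGA XORs keystream into data."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for b in data:                            # pseudo-random generation
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def block_encode(data: bytes, xor_key: int, add_key: int, rot: int) -> bytes:
    """Encoder side of the assumed chain: XOR, then ADD, then ROL per block."""
    if len(data) % 8:
        raise ValueError("sketch assumes a multiple of 8 bytes")
    out = bytearray()
    for (block,) in struct.iter_unpack("<Q", data):
        block ^= xor_key
        block = (block + add_key) & MASK64
        block = rol64(block, rot)
        out += struct.pack("<Q", block)
    return bytes(out)

def block_decode(data: bytes, xor_key: int, add_key: int, rot: int) -> bytes:
    """Decoder side: the inverse operations in reverse order (ROR, SUB, XOR)."""
    out = bytearray()
    for (block,) in struct.iter_unpack("<Q", data):
        block = ror64(block, rot)
        block = (block - add_key) & MASK64
        block ^= xor_key
        out += struct.pack("<Q", block)
    return bytes(out)

def two_stage_encode(payload, rc4_key, xor_key, add_key, rot):
    """Stage 1 (RC4) is applied first, Stage 2 (block chain) second."""
    return block_encode(rc4(rc4_key, payload), xor_key, add_key, rot)

def two_stage_decode(blob, rc4_key, xor_key, add_key, rot):
    """At run time the order flips: undo the block chain, then RC4."""
    return rc4(rc4_key, block_decode(blob, xor_key, add_key, rot))
```

Because RC4 is a pure XOR keystream, the same `rc4` function both encrypts and decrypts; only the block chain needs a separate inverse.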
2. PIC Output Memory Layout
The flat PIC blob that Shoggoth produces has no headers, no sections, no metadata — just raw executable machine code followed by encrypted data. This is intentional: any structural metadata would become a signature.
Layout
+--------------------------------------------------+
| Offset 0x0000: Junk Preamble | Variable size
| - Side-effect-free instructions | (~50-500 bytes)
| - Jump-over blocks, fake calls |
+--------------------------------------------------+
| Block Cipher Decoder Stub | Variable size
| - Register setup (random regs) | (~100-400 bytes)
| - LEA to locate encrypted data |
| - Decryption loop (inverse ops + junk) |
| - Keys embedded as immediates |
+--------------------------------------------------+
| Junk Interlude | Variable size
+--------------------------------------------------+
| RC4 Decoder Stub | Variable size
| - SUB RSP, 256 (S-box allocation) | (~200-600 bytes)
| - KSA loop (with junk) |
| - PRGA loop (with junk) |
| - ADD RSP, 256 (stack restore) |
+--------------------------------------------------+
| RC4 Key Data | 8-32 bytes
+--------------------------------------------------+
| Encrypted Payload | Same size as
| (doubly encrypted: RC4 then block cipher) | original input
+--------------------------------------------------+
No File Format Artifacts
The output has no MZ header, no PE sections, no ELF magic bytes, no relocations, no import tables. It is pure position-independent machine code that can be loaded at any address and jumped to. This makes it compatible with any shellcode injection technique: VirtualAlloc + memcpy + cast-to-function-pointer, CreateThread, APC injection, process hollowing, etc.
3. PE Mode: Reflective Loading
When Shoggoth wraps a PE file, the decrypted output is not a standalone executable — it is the PIC PE loader followed by the PE file bytes. The loader acts as a minimal reflective loader:
| Loader Step | What Happens | Why It’s Needed |
|---|---|---|
| 1. Find image base | Calculate pointer to the appended PE file using RIP-relative offset | The loader must know where the PE data starts in memory |
| 2. Parse PE headers | Read DOS header, NT headers, section table | Determine section layout, entry point, import/relocation directories |
| 3. Allocate memory | VirtualAlloc with SizeOfImage from optional header | Create a contiguous block to map sections into |
| 4. Map sections | Copy each section from raw data to virtual address offset | Sections must be at their correct RVAs for references to work |
| 5. Process relocations | Apply base relocation delta if loaded at non-preferred address | Absolute addresses in the PE must be fixed up |
| 6. Resolve imports | Walk PEB to find loaded DLLs, parse export tables, fill IAT | The PE needs function pointers for API calls |
| 7. Set permissions | VirtualProtect per section (RX for .text, RW for .data, etc.) | Proper memory permissions for security and correctness |
| 8. Call entry point | Jump to AddressOfEntryPoint + new base address | Begin executing the original PE |
All API addresses (VirtualAlloc, VirtualProtect, LoadLibraryA, GetProcAddress) are resolved at runtime by the loader through PEB walking — no import table in the PIC blob itself.
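Steps 2, 3, 5, and 8 in the table depend on only a handful of header fields. The following sketch reads those fields from a byte buffer and computes the relocation delta. It is a static parsing illustration, not the loader itself: it assumes a PE32+ (64-bit) image, and the function names are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1

def parse_pe64_headers(data: bytes) -> dict:
    """Extract the fields a loader reads before mapping a PE32+ image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)   # offset of NT headers
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER is 20 bytes and follows the 4-byte signature.
    machine, num_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    opt = e_lfanew + 24                                  # optional header start
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic != 0x20B:
        raise ValueError("sketch only handles PE32+ images")
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    return {
        "machine": machine,
        "num_sections": num_sections,
        "section_table_offset": opt + opt_size,
        "entry_rva": entry_rva,
        "image_base": image_base,
        "size_of_image": size_of_image,
    }

def relocation_delta(preferred_base: int, actual_base: int) -> int:
    """Delta added to every absolute address when the image is rebased."""
    return actual_base - preferred_base

def rebase_dir64(value: int, delta: int) -> int:
    """Fix up one 64-bit absolute address, wrapping like 64-bit arithmetic."""
    return (value + delta) & MASK64
```

When the delta is zero (the image landed at its preferred base), step 5 can be skipped entirely.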
4. COFF Mode: BOF Loading
COFF/BOF (Beacon Object File) mode handles .o object files, commonly used with Cobalt Strike’s inline-execute or similar BOF runners. The COFF loader performs a simplified version of what a linker does:
- Parse COFF headers: read section table, symbol table, string table
- Map sections: allocate memory and copy section data
- Process relocations: apply COFF relocations (IMAGE_REL_AMD64_ADDR64, IMAGE_REL_AMD64_REL32, etc.)
- Resolve external symbols: BOF external functions (like BeaconOutput, Windows APIs) are resolved through a function table or PEB walking
- Call entry point: execute the go function (standard BOF entry point) with optional arguments
The --coff-arg flag allows passing arguments to the BOF in the format expected by Cobalt Strike’s BeaconDataParse API, generated using the included beacon_generate.py script.
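The relocation step is plain address arithmetic, which a short sketch can make concrete. The code below applies the two relocation types named above to a byte buffer, using made-up virtual addresses. It follows the standard AMD64 COFF rules (the addend is stored in place; REL32 is relative to the end of the 4-byte field); `apply_relocation` and its parameters are hypothetical names, not part of Shoggoth.

```python
import struct

IMAGE_REL_AMD64_ADDR64 = 0x0001   # 64-bit absolute address
IMAGE_REL_AMD64_REL32 = 0x0004    # 32-bit displacement from the next instruction
MASK64 = (1 << 64) - 1

def apply_relocation(section: bytearray, offset: int, rel_type: int,
                     section_va: int, symbol_va: int) -> None:
    """Patch one relocation site inside `section` in place.

    section_va: address where the section was mapped (assumed for the sketch)
    symbol_va:  resolved address of the symbol the site refers to
    """
    if rel_type == IMAGE_REL_AMD64_ADDR64:
        (addend,) = struct.unpack_from("<Q", section, offset)
        struct.pack_into("<Q", section, offset, (symbol_va + addend) & MASK64)
    elif rel_type == IMAGE_REL_AMD64_REL32:
        (addend,) = struct.unpack_from("<i", section, offset)
        next_ip = section_va + offset + 4      # RIP after the 4-byte field
        disp = symbol_va + addend - next_ip
        if not -(1 << 31) <= disp < (1 << 31):
            raise OverflowError("target is out of REL32 range")
        struct.pack_into("<i", section, offset, disp)
    else:
        raise NotImplementedError(f"relocation type {rel_type:#x}")
```

The range check matters in practice: a REL32 site can only reach targets within about 2 GiB of the patched instruction, which is one reason loaders often route far external calls through a nearby function table.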
5. Detection: Entropy Analysis
One of the most effective detection techniques against encrypted payloads is entropy analysis. Shannon entropy measures the randomness of data on a scale from 0 (every byte identical, fully predictable) to 8 (all 256 byte values equally likely, the maximum for byte data). Encrypted data has characteristically high entropy:
| Data Type | Typical Entropy (bits/byte) | Pattern |
|---|---|---|
| English text | 3.5 – 5.0 | Low entropy, repetitive character distribution |
| Compiled x86 code | 5.5 – 6.5 | Moderate entropy, structured opcode patterns |
| Compressed data (zlib, etc.) | 7.5 – 8.0 | Near-maximum entropy |
| Encrypted data (AES, RC4, etc.) | 7.9 – 8.0 | Near-maximum entropy, indistinguishable from random |
| Shoggoth output | 6.0 – 7.8 | Mixed: moderate-entropy decoder stub + high-entropy encrypted payload |
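The values in the table come from the byte-level Shannon formula, H = -Σ p_i log2(p_i), where p_i is the observed frequency of each byte value. A minimal sketch follows; the function name is illustrative.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum -p * log2(p) over every byte value that actually occurs.
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```

A buffer of identical bytes scores 0, and a buffer containing each of the 256 byte values equally often scores 8. Short buffers score lower than the ranges in the table suggest, because a small sample cannot cover all byte values evenly.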
The Entropy Profile Is a Signature
A Shoggoth output has a distinctive entropy profile: a block of moderate-entropy code (the decoder stubs with their junk instructions) followed by a block of high-entropy data (the encrypted payload). This “step function” in entropy is itself detectable. Tools like binwalk --entropy or pestudio can visualize this pattern, revealing the boundary between code and encrypted data.
Defenders can use sliding-window entropy analysis to flag regions of a binary that transition sharply from normal code entropy (~6.0) to encrypted data entropy (~7.9). This does not reveal what the payload is, but it identifies that encryption is present — which is suspicious in shellcode.
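A sliding-window scan of this kind can be sketched in a few lines. The window size, step, and thresholds below are assumptions chosen for the illustration, not tuned values. The `high` threshold sits below 7.9 because a finite window of random bytes measures somewhat under 8. The test blob is synthetic (a limited-alphabet region followed by pseudo-random bytes), not real encoder output.

```python
import math
import random
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def entropy_profile(data: bytes, window: int = 1024, step: int = 256):
    """Return (offset, entropy) for every full window across the buffer."""
    return [(off, shannon_entropy(data[off:off + window]))
            for off in range(0, len(data) - window + 1, step)]

def find_entropy_step(profile, low: float = 6.5, high: float = 7.5):
    """Offset of the first window at or above `high` that follows at least
    one window at or below `low`; None if no such transition exists."""
    seen_low = False
    for off, h in profile:
        if h <= low:
            seen_low = True
        elif h >= high and seen_low:
            return off
    return None

def make_demo_blob(code_len: int = 4096, data_len: int = 4096, seed: int = 0):
    """Synthetic stand-in: limited-alphabet 'code', then random 'ciphertext'."""
    rng = random.Random(seed)
    alphabet = bytes(range(0x40, 0x68))          # 40 distinct byte values
    code = bytes(rng.choice(alphabet) for _ in range(code_len))
    data = bytes(rng.getrandbits(8) for _ in range(data_len))
    return code + data
```

Calling `find_entropy_step(entropy_profile(make_demo_blob()))` returns an offset near the 4096-byte boundary. The reported offset can precede the true boundary by up to one window, since a window that is mostly ciphertext already crosses the threshold.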
6. Detection: Emulation & Sandbox Analysis
Shoggoth’s junk code slows emulation but does not defeat it, and emulation-based unpacking remains a core part of advanced detection:
Emulation-Based Detection Approaches
| Technique | How It Works | Effectiveness Against Shoggoth |
|---|---|---|
| Instruction-count emulation | Run code in emulator for N instructions, scan memory for known patterns | Moderate — junk code consumes budget, but high-budget emulators may succeed |
| Write-then-execute detection | Monitor for memory regions that are written to then executed (self-modifying code pattern) | High — the decoder must write decrypted bytes then execute them; this is inherent to the design |
| API call monitoring | In PE/COFF mode, watch for VirtualAlloc, VirtualProtect, LoadLibrary calls from shellcode | High — the PIC loaders must call these APIs to function |
| Memory content scanning | Periodically scan writable memory for known malware signatures after each N instructions | High if budget is sufficient — eventually the payload is in cleartext |
The fundamental limitation of any encryption-based evasion (including Shoggoth) is that the payload must be decrypted in memory before it can execute. At that moment, it is vulnerable to memory scanning. Polymorphism protects the file on disk and the decoder stub from signature matching, but the decrypted payload is the original unmodified shellcode or PE file.
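The write-then-execute check from the table can be modeled as bookkeeping over an emulator's event stream. The sketch below assumes a hypothetical trace format of ("write", addr, size) and ("exec", addr) tuples and tracks written ranges at byte granularity. Real emulators and EDR sensors typically work at page granularity through memory permissions, so treat this as a model of the idea only.

```python
class WriteThenExecuteTracker:
    """Flags execution of any address that was previously written."""

    def __init__(self):
        self.written = []   # half-open (start, end) ranges that were written
        self.alerts = []    # executed addresses that fell inside a written range

    def on_write(self, addr: int, size: int) -> None:
        """Record a memory write of `size` bytes starting at `addr`."""
        self.written.append((addr, addr + size))

    def on_execute(self, addr: int) -> bool:
        """Return True (and record an alert) if `addr` was written earlier."""
        hit = any(start <= addr < end for start, end in self.written)
        if hit:
            self.alerts.append(addr)
        return hit

def replay(trace):
    """Feed a list of ("write", addr, size) / ("exec", addr) events through
    the tracker and return the addresses that triggered an alert."""
    tracker = WriteThenExecuteTracker()
    for event in trace:
        if event[0] == "write":
            tracker.on_write(event[1], event[2])
        elif event[0] == "exec":
            tracker.on_execute(event[1])
    return tracker.alerts
```

In this model, a decoder stub running at 0x1000 that writes decrypted bytes to 0x1100 and then transfers control there produces exactly one alert, at the first executed address inside the written range.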
7. Detection: Behavioral Heuristics
Beyond entropy and emulation, behavioral heuristics target the execution patterns common to polymorphic decoders:
- Loop-then-jump pattern: A tight loop that modifies a memory region followed by a jump into that region is strongly indicative of a decoder stub
- Stack-allocated S-box: Allocating exactly 256 bytes on the stack, filling with an identity permutation, then permuting — this is the RC4 KSA fingerprint
- RIP-relative self-reference: Code that computes its own address and then reads/writes data relative to itself suggests position-independent self-modifying code
- High junk-to-real instruction ratio: Code with an unusually high proportion of side-effect-free instructions suggests automated junk insertion
- Entropy transition in code flow: Execution transitioning from normal code into a high-entropy region that it just wrote to
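The stack-allocated S-box heuristic lends itself to a simple memory-scanning check: at every point during and after the KSA, the 256-byte state is a permutation of the values 0 to 255. The sketch below scans a memory snapshot for such regions. The function names are hypothetical, and the brute-force scan is meant for small snapshots only.

```python
def is_byte_permutation(buf: bytes) -> bool:
    """True if buf is exactly 256 bytes containing each value 0..255 once."""
    return len(buf) == 256 and len(set(buf)) == 256

def find_sbox_candidates(memory: bytes) -> list:
    """Return every offset where a 256-byte window is a byte permutation.

    Checks each window independently; a production scanner would keep
    running counts instead of rebuilding a set per offset.
    """
    return [off for off in range(len(memory) - 255)
            if is_byte_permutation(memory[off:off + 256])]

def is_identity_sbox(buf: bytes) -> bool:
    """True for the freshly initialized S-box (0, 1, 2, ..., 255)."""
    return bytes(buf) == bytes(range(256))
```

Random data almost never forms a full permutation in a 256-byte window, so a hit on stack memory is a strong hint that RC4-style state is live. It is not proof, since other algorithms also use byte-permutation tables.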
8. Comparison with Other Frameworks
How does Shoggoth compare with other commonly used payload encoding tools?
| Feature | Shoggoth | msfvenom (shikata_ga_nai) | Veil | Custom XOR encoder |
|---|---|---|---|---|
| Polymorphism level | Full — registers, operations, junk, keys all random | Moderate — random key, some register variation, FPU-based addressing | Language-level — generates unique source code per run | None — static decoder |
| Encryption stages | 2 (RC4 + random block cipher) | 1 (additive XOR feedback) | Varies by template | 1 (XOR) |
| Junk code | Recursive, multi-category (jumps, fake calls, opaque preds) | Limited (some NOP variations) | Source-level dead code | None |
| Output format | Flat PIC blob | Flat shellcode or wrapped in format | EXE, DLL, various languages | Flat shellcode |
| Input types | Shellcode, PE, COFF/BOF | Shellcode only (for encoding) | Shellcode (wrapped in payloads) | Shellcode only |
| Code generation | asmjit (runtime assembly) | Rex::Poly (Ruby polymorphic library) | Template rendering | Hardcoded bytes |
| Detection rate | Low for static signatures (novel output each time) | High (widely signatured after years of use) | Moderate (known templates) | Very high (trivially signatured) |
shikata_ga_nai Is Heavily Signatured
The msfvenom shikata_ga_nai encoder, while polymorphic, has been in wide use since the mid-2000s. AV vendors have developed numerous detection strategies specific to its output: the FPU FNSTENV/FSTENV instruction used for EIP recovery (the encoder targets 32-bit x86, which has no RIP-relative addressing), the additive feedback XOR pattern, and the characteristic loop structure. Shoggoth avoids these well-known patterns entirely by using different PIC addressing methods and a completely different encryption approach.
9. Limitations & Operational Considerations
Understanding Shoggoth’s limitations is as important as understanding its capabilities:
What Shoggoth Does Not Do
- No runtime evasion: Once the payload is decrypted and executing, Shoggoth provides no protection. Memory scans during execution will find the cleartext payload.
- No anti-debug: The decoder stubs do not include anti-debugging checks (no IsDebuggerPresent, no timing checks, no hardware breakpoint detection).
- No sleep obfuscation: Shoggoth is a file-level encoder, not a runtime protector. It does not encrypt the payload during sleep periods.
- Entropy is detectable: The encrypted payload region has near-random entropy that statistical analysis can flag.
- x64 only: The current implementation targets x86-64. The PIC loaders and asmjit code generation are 64-bit only.
- Emulation can defeat it: Given sufficient instruction budget, an emulator will eventually step through all junk code and reach the decrypted payload.
Shoggoth is best understood as one layer in a layered offensive strategy. It protects the payload at rest (on disk, in transit) and provides initial evasion when the payload first begins executing. For ongoing runtime protection, it should be combined with sleep obfuscation, in-memory encryption, and behavioral evasion techniques.
10. Course Summary
Course Complete
You have completed the Shoggoth Polymorphic Engine Masterclass. You now understand signature-based detection and its limitations, the principles of polymorphic and metamorphic code, how asmjit enables runtime code generation, Shoggoth’s two-stage encryption pipeline with RC4 and random block ciphers, how decoder stubs are dynamically generated with register randomization and instruction substitution, the role of junk code in anti-analysis and anti-emulation, and how defenders detect polymorphic output through entropy analysis, emulation, and behavioral heuristics.
Knowledge Check
Q1: What is the most fundamental limitation of any encryption-based evasion technique, including Shoggoth?
Q2: How does entropy analysis detect Shoggoth output?
Q3: What key advantage does Shoggoth have over msfvenom’s shikata_ga_nai encoder?