Difficulty: Advanced

Module 8: Full Chain, Output Formats & Detection

End-to-end execution walkthrough, PIC output structure, COFF/PE wrapping mechanics, entropy analysis, emulation-based detection, and comparison with other encoder frameworks.

Module Objective

Trace the complete execution chain from the moment Shoggoth output begins running; understand the PIC blob’s memory layout; see how the PE and COFF loaders bridge the gap between shellcode and structured executables; learn the techniques defenders use against polymorphic output (entropy analysis, emulation, behavioral heuristics); and compare Shoggoth with Veil, msfvenom, and other encoding frameworks.

1. Full Execution Chain Walkthrough

Let us trace what happens from the moment a Shoggoth-encrypted payload begins executing in memory. This walkthrough assumes both encryption stages are active (the default):

Runtime Execution Flow

Junk Preamble (executes harmlessly) → Block Cipher Decoder (decrypts Stage 2) → Junk Interlude (executes harmlessly) → RC4 Decoder (decrypts Stage 1) → Cleartext Payload (begins execution)

Step-by-Step Execution

Junk Preamble (variable ~50-500 bytes): The first bytes executed are garbage instructions — NOPs, self-canceling pairs, jump-over blocks. They execute harmlessly and fall through to the block cipher decoder.
Block Cipher Decoder: Uses lea reg, [rip + offset] to locate the doubly-encrypted payload. Iterates over 8-byte blocks, applying the inverse operation chain (e.g., ROR → SUB → XOR). After this loop completes, the data is now only RC4-encrypted.
Junk Interlude (variable): More garbage instructions separate the two decoder stubs. This prevents a pattern like “block loop immediately followed by RC4 setup” from becoming a signature.
RC4 Decoder: Allocates 256 bytes on the stack for the S-box. Runs the KSA using the embedded key to initialize the permutation. Then runs the PRGA, generating keystream bytes and XORing them with the remaining encrypted data. After completion, restores the stack.
Control Transfer: Execution falls through (or jumps) to the now-decrypted payload. In raw mode, this is the original shellcode. In PE/COFF mode, this is the PIC loader followed by the PE/COFF file.
PE/COFF Loader (if applicable): The PIC loader resolves API addresses by walking PEB → Ldr → InMemoryOrderModuleList, maps the PE or COFF sections into memory, applies relocations, resolves imports or external symbols, and transfers control to the original payload’s entry point.
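The layer ordering in steps 2 and 4 can be modeled offline in Python, which is useful when checking an analysis of a captured sample. This is a minimal sketch, not Shoggoth's code: the function names, the fixed XOR/ADD/ROL chain, the sample keys, and the requirement that the data length be a multiple of 8 are all assumptions for illustration. The real engine chooses its operations at random per build and emits them as x64 assembly.

```python
import struct

MASK64 = (1 << 64) - 1

def rol64(x: int, r: int) -> int:
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64

def ror64(x: int, r: int) -> int:
    r %= 64
    return ((x >> r) | (x << (64 - r))) & MASK64

def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: KSA builds the S-box, PRGA XORs keystream into data."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # KSA
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for b in data:                            # PRGA
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def block_encode(data: bytes, k_xor: int, k_add: int, rot: int) -> bytes:
    """Hypothetical encoder-side chain: XOR, then ADD, then ROL per 8-byte block."""
    out = bytearray()
    for (q,) in struct.iter_unpack("<Q", data):
        q = rol64(((q ^ k_xor) + k_add) & MASK64, rot)
        out += struct.pack("<Q", q)
    return bytes(out)

def block_decode(data: bytes, k_xor: int, k_add: int, rot: int) -> bytes:
    """Inverse chain in reverse order: ROR, then SUB, then XOR."""
    out = bytearray()
    for (q,) in struct.iter_unpack("<Q", data):
        q = ((ror64(q, rot) - k_add) & MASK64) ^ k_xor
        out += struct.pack("<Q", q)
    return bytes(out)

# Build side: RC4 first (Stage 1), block cipher second (Stage 2).
payload = bytes(range(32))
key = b"example-key"
stage1 = rc4(key, payload)
stage2 = block_encode(stage1, 0x1122334455667788, 0x0F0F0F0F0F0F0F0F, 13)

# Run side: undo the block cipher first, then RC4.
recovered = rc4(key, block_decode(stage2, 0x1122334455667788, 0x0F0F0F0F0F0F0F0F, 13))
```

Because the block layer is applied last at build time, it has to be removed first at run time, which is why the block cipher decoder precedes the RC4 decoder in the blob.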

2. PIC Output Memory Layout

The flat PIC blob that Shoggoth produces has no headers, no sections, no metadata — just raw executable machine code followed by encrypted data. This is intentional: any structural metadata would become a signature.

Layout

+--------------------------------------------------+
| Offset 0x0000: Junk Preamble                     |  Variable size
|   - Side-effect-free instructions                |  (~50-500 bytes)
|   - Jump-over blocks, fake calls                 |
+--------------------------------------------------+
| Block Cipher Decoder Stub                        |  Variable size
|   - Register setup (random regs)                 |  (~100-400 bytes)
|   - LEA to locate encrypted data                 |
|   - Decryption loop (inverse ops + junk)         |
|   - Keys embedded as immediates                  |
+--------------------------------------------------+
| Junk Interlude                                   |  Variable size
+--------------------------------------------------+
| RC4 Decoder Stub                                 |  Variable size
|   - SUB RSP, 256 (S-box allocation)              |  (~200-600 bytes)
|   - KSA loop (with junk)                         |
|   - PRGA loop (with junk)                        |
|   - ADD RSP, 256 (stack restore)                 |
+--------------------------------------------------+
| RC4 Key Data                                     |  8-32 bytes
+--------------------------------------------------+
| Encrypted Payload                                |  Same size as
|   (doubly encrypted: RC4 then block cipher)      |  original input
+--------------------------------------------------+

No File Format Artifacts

The output has no MZ header, no PE sections, no ELF magic bytes, no relocations, no import tables. It is pure position-independent machine code that can be loaded at any address and jumped to. This makes it compatible with any shellcode injection technique: VirtualAlloc + memcpy + cast-to-function-pointer, CreateThread, APC injection, process hollowing, etc.

3. PE Mode: Reflective Loading

When Shoggoth wraps a PE file, the decrypted output is not a standalone executable — it is the PIC PE loader followed by the PE file bytes. The loader acts as a minimal reflective loader:

Loader Step	What Happens	Why It’s Needed
1. Find image base	Calculate pointer to the appended PE file using RIP-relative offset	The loader must know where the PE data starts in memory
2. Parse PE headers	Read DOS header, NT headers, section table	Determine section layout, entry point, import/relocation directories
3. Allocate memory	`VirtualAlloc` with `SizeOfImage` from optional header	Create a contiguous block to map sections into
4. Map sections	Copy each section from raw data to virtual address offset	Sections must be at their correct RVAs for references to work
5. Process relocations	Apply base relocation delta if loaded at non-preferred address	Absolute addresses in the PE must be fixed up
6. Resolve imports	Walk PEB to find loaded DLLs, parse export tables, fill IAT	The PE needs function pointers for API calls
7. Set permissions	`VirtualProtect` per section (RX for .text, RW for .data, etc.)	Proper memory permissions for security and correctness
8. Call entry point	Jump to `AddressOfEntryPoint` + new base address	Begin executing the original PE

All API addresses (VirtualAlloc, VirtualProtect, LoadLibraryA, GetProcAddress) are resolved at runtime by the loader through PEB walking — no import table in the PIC blob itself.
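The header fields read in step 2 sit at fixed offsets defined by the PE format, so they can be inspected from Python when analyzing a decrypted blob or a memory dump. The sketch below is an analyst-side illustration only, assuming a 64-bit (PE32+) image; `parse_pe_headers` and the returned dictionary keys are hypothetical names, not part of Shoggoth.

```python
import struct

def parse_pe_headers(blob: bytes, base: int = 0) -> dict:
    """Read the PE32+ fields a loader needs, starting at offset `base` in blob."""
    if blob[base:base + 2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # e_lfanew (offset 0x3C in the DOS header) points at the NT headers.
    (e_lfanew,) = struct.unpack_from("<I", blob, base + 0x3C)
    nt = base + e_lfanew
    if blob[nt:nt + 4] != b"PE\0\0":
        raise ValueError("missing 'PE' signature")
    # COFF file header: 20 bytes immediately after the 4-byte signature.
    machine, n_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", blob, nt + 4)
    opt = nt + 24
    (magic,) = struct.unpack_from("<H", blob, opt)
    if magic != 0x20B:
        raise ValueError("not a PE32+ optional header")
    (entry_rva,) = struct.unpack_from("<I", blob, opt + 16)
    (image_base,) = struct.unpack_from("<Q", blob, opt + 24)
    (size_of_image,) = struct.unpack_from("<I", blob, opt + 56)
    return {
        "machine": machine,
        "sections": n_sections,
        "entry_rva": entry_rva,
        "image_base": image_base,
        "size_of_image": size_of_image,
        "section_table_offset": opt + opt_size,
    }

def relocation_delta(actual_base: int, preferred_base: int) -> int:
    """Step 5: the value added to every absolute address listed for fixup."""
    return actual_base - preferred_base
```

The same parser doubles as a detection aid: once a sandbox has the decrypted region, a valid MZ/PE header chain at a non-zero offset inside private executable memory is a strong hint that a reflective loader is present.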

4. COFF Mode: BOF Loading

COFF/BOF (Beacon Object File) mode handles .o object files, commonly used with Cobalt Strike’s inline-execute or similar BOF runners. The COFF loader performs a simplified version of what a linker does:

Parse COFF headers — read section table, symbol table, string table
Map sections — allocate memory and copy section data
Process relocations — apply COFF relocations (IMAGE_REL_AMD64_ADDR64, IMAGE_REL_AMD64_REL32, etc.)
Resolve external symbols — BOF external functions (like BeaconOutput, Windows APIs) are resolved through a function table or PEB walking
Call entry point — execute the go function (standard BOF entry point) with optional arguments
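The relocation step is plain arithmetic on the section bytes. As a hedged illustration, the sketch below models the two relocation types named above on a `bytearray`; `apply_relocation` and its parameters are hypothetical names, and the constants are the values the PE/COFF specification assigns to these types.

```python
import struct

IMAGE_REL_AMD64_ADDR64 = 0x0001  # 64-bit absolute virtual address
IMAGE_REL_AMD64_REL32 = 0x0004   # 32-bit offset relative to the byte after the field

MASK64 = (1 << 64) - 1

def apply_relocation(section: bytearray, section_va: int, offset: int,
                     rel_type: int, symbol_va: int) -> None:
    """Patch one relocation in place.

    section_va is where the section was mapped, offset is the fixup position
    inside it, and symbol_va is the resolved address of the target symbol.
    The value already stored at the fixup is treated as the addend.
    """
    if rel_type == IMAGE_REL_AMD64_ADDR64:
        (addend,) = struct.unpack_from("<Q", section, offset)
        struct.pack_into("<Q", section, offset, (symbol_va + addend) & MASK64)
    elif rel_type == IMAGE_REL_AMD64_REL32:
        (addend,) = struct.unpack_from("<i", section, offset)
        # Displacement is measured from the end of the 4-byte field.
        value = symbol_va + addend - (section_va + offset + 4)
        if not -(1 << 31) <= value < (1 << 31):
            raise OverflowError("target out of REL32 range")
        struct.pack_into("<i", section, offset, value)
    else:
        raise NotImplementedError(f"relocation type {rel_type:#x}")
```

The REL32 range check matters in practice: a loader that maps sections or resolves symbols more than 2 GiB apart cannot express the displacement in 32 bits.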

The --coff-arg flag allows passing arguments to the BOF in the format expected by Cobalt Strike’s BeaconDataParse API, generated using the included beacon_generate.py script.

5. Detection: Entropy Analysis

One of the most effective detection techniques against encrypted payloads is entropy analysis. For byte data, Shannon entropy measures randomness on a scale from 0 bits per byte (every byte identical, fully predictable) to 8 bits per byte (all 256 values equally likely). Encrypted data has characteristically high entropy:

Data Type	Typical Entropy (bits/byte)	Pattern
English text	3.5 – 5.0	Low entropy, repetitive character distribution
Compiled x86 code	5.5 – 6.5	Moderate entropy, structured opcode patterns
Compressed data (zlib, etc.)	7.5 – 8.0	Near-maximum entropy
Encrypted data (AES, RC4, etc.)	7.9 – 8.0	Near-maximum entropy, indistinguishable from random
Shoggoth output	6.0 – 7.8	Mixed: moderate-entropy decoder stub + high-entropy encrypted payload
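The figures in the table come from the standard formula H = Σ p(b) · log2(1 / p(b)) taken over the byte values b present in the data. A minimal Python sketch follows (the function name is an assumption):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # p * log2(1/p) summed over every byte value that occurs.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())
```

A buffer of identical bytes scores 0.0, a buffer that alternates between two values scores 1.0, and a buffer in which all 256 values occur equally often scores 8.0.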

The Entropy Profile Is a Signature

A Shoggoth output has a distinctive entropy profile: a block of moderate-entropy code (the decoder stubs with their junk instructions) followed by a block of high-entropy data (the encrypted payload). This “step function” in entropy is itself detectable. Tools like binwalk --entropy or pestudio can visualize this pattern, revealing the boundary between code and encrypted data.

Defenders can use sliding-window entropy analysis to flag regions of a binary that transition sharply from normal code entropy (~6.0) to encrypted data entropy (~7.9). This does not reveal what the payload is, but it identifies that encryption is present — which is suspicious in shellcode.
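A sliding-window scan of this kind might look like the sketch below. The window size, step, and both thresholds are assumptions that need tuning against real samples. Small windows also bias the measurement downward: 256 random bytes typically score only around 7.2 bits per byte, so the sketch uses a 1024-byte window and a "high" threshold below 7.9.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())

def entropy_profile(data: bytes, window: int = 1024, step: int = 256):
    """Return (offset, entropy) for each window position across the buffer."""
    last = max(len(data) - window, 0)
    return [(off, shannon_entropy(data[off:off + window]))
            for off in range(0, last + 1, step)]

def find_entropy_step(data: bytes, window: int = 1024, step: int = 256,
                      low: float = 6.5, high: float = 7.5):
    """Offset of the first high-entropy window that follows a low-entropy one.

    Returns None when no low-to-high transition is present.
    """
    seen_low = False
    for off, h in entropy_profile(data, window, step):
        if h <= low:
            seen_low = True
        elif h >= high and seen_low:
            return off
    return None
```

The returned offset is only accurate to within one window, and it marks the first window that is entirely high-entropy rather than the exact byte where the encrypted region starts.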

6. Detection: Emulation & Sandbox Analysis

Shoggoth’s junk code slows emulation but does not stop it, and emulation-based unpacking remains one of the strongest detection approaches:

Emulation-Based Detection Approaches

Technique	How It Works	Effectiveness Against Shoggoth
Instruction-count emulation	Run code in emulator for N instructions, scan memory for known patterns	Moderate — junk code consumes budget, but high-budget emulators may succeed
Write-then-execute detection	Monitor for memory regions that are written to then executed (self-modifying code pattern)	High — the decoder must write decrypted bytes then execute them; this is inherent to the design
API call monitoring	In PE/COFF mode, watch for VirtualAlloc, VirtualProtect, LoadLibrary calls from shellcode	High — the PIC loaders must call these APIs to function
Memory content scanning	Periodically scan writable memory for known malware signatures after each N instructions	High if budget is sufficient — eventually the payload is in cleartext

The fundamental limitation of any encryption-based evasion (including Shoggoth) is that the payload must be decrypted in memory before it can execute. At that moment, it is vulnerable to memory scanning. Polymorphism protects the file on disk and the decoder stub from signature matching, but the decrypted payload is the original unmodified shellcode or PE file.
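The write-then-execute check from the table reduces to bookkeeping over memory events. The sketch below assumes a hypothetical emulator or instrumentation layer that calls `on_write` and `on_execute` for each event. The class name, the method names, and the 16-byte tracking granularity are illustrative choices, not any product's API.

```python
class WriteThenExecuteTracker:
    """Flags execution of addresses that were written earlier in the same run."""

    def __init__(self, granularity: int = 16):
        # Coarser granularity uses less memory but produces more false hits,
        # for example when a decoder and its data share a tracked unit.
        self.granularity = granularity
        self.written_units = set()

    def on_write(self, addr: int, size: int = 1) -> None:
        first = addr // self.granularity
        last = (addr + size - 1) // self.granularity
        self.written_units.update(range(first, last + 1))

    def on_execute(self, addr: int) -> bool:
        """Return True if the instruction at addr lies in previously written memory."""
        return addr // self.granularity in self.written_units
```

A decoder stub running at one address that rewrites bytes further along and then falls through into them triggers the flag on the first instruction fetched from the rewritten range. That moment is also the natural point to trigger a memory scan, since the payload is in cleartext.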

7. Detection: Behavioral Heuristics

Beyond entropy and emulation, behavioral heuristics target the execution patterns common to polymorphic decoders:

Loop-then-jump pattern: A tight loop that modifies a memory region followed by a jump into that region is strongly indicative of a decoder stub
Stack-allocated S-box: Allocating exactly 256 bytes on the stack, filling with an identity permutation, then permuting — this is the RC4 KSA fingerprint
RIP-relative self-reference: Code that computes its own address and then reads/writes data relative to itself suggests position-independent self-modifying code
High junk-to-real instruction ratio: Code with an unusually high proportion of side-effect-free instructions suggests automated junk insertion
Entropy transition in code flow: Execution transitioning from normal code into a high-entropy region that it just wrote to
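As one concrete example, the stack-allocated S-box heuristic can be approximated by scanning a memory snapshot for any 256-byte window that is a permutation of the values 0 to 255, which holds for an RC4 S-box at every point during and after the KSA. The sketch below is an assumption-laden illustration (the function name and the idea of scanning a flat snapshot are mine), and it will also match other permutation tables such as the AES S-box, so a hit is a lead rather than a verdict.

```python
from typing import List

def find_sbox_candidates(memory: bytes) -> List[int]:
    """Return offsets of every 256-byte window holding each byte value exactly once."""
    hits = []
    counts = [0] * 256
    distinct = 0  # number of byte values present in the current window
    for i, b in enumerate(memory):
        counts[b] += 1
        if counts[b] == 1:
            distinct += 1
        if i >= 256:
            # Slide the window: drop the byte that just fell off the left edge.
            old = memory[i - 256]
            counts[old] -= 1
            if counts[old] == 0:
                distinct -= 1
        # 256 distinct values in a 256-byte window means a permutation.
        if i >= 255 and distinct == 256:
            hits.append(i - 255)
    return hits
```

The rolling count keeps the scan linear in the size of the snapshot, so it is cheap enough to run against stack pages at each emulation checkpoint.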

8. Comparison with Other Frameworks

How does Shoggoth compare with other commonly used payload encoding tools?

Feature	Shoggoth	msfvenom (shikata_ga_nai)	Veil	Custom XOR encoder
Polymorphism level	Full — registers, operations, junk, keys all random	Moderate — random key, some register variation, FPU-based addressing	Language-level — generates unique source code per run	None — static decoder
Encryption stages	2 (RC4 + random block cipher)	1 (additive XOR feedback)	Varies by template	1 (XOR)
Junk code	Recursive, multi-category (jumps, fake calls, opaque preds)	Limited (some NOP variations)	Source-level dead code	None
Output format	Flat PIC blob	Flat shellcode or wrapped in format	EXE, DLL, various languages	Flat shellcode
Input types	Shellcode, PE, COFF/BOF	Shellcode only (for encoding)	Shellcode (wrapped in payloads)	Shellcode only
Code generation	asmjit (runtime assembly)	Rex::Poly (Ruby polymorphic library)	Template rendering	Hardcoded bytes
Detection rate	Low against static signatures (novel output each run)	High (widely signatured after years of use)	Moderate (known templates)	Very high (trivially signatured)

shikata_ga_nai Is Heavily Signatured

The msfvenom shikata_ga_nai encoder, while polymorphic, has been in wide use since the mid-2000s. AV vendors have developed numerous detection strategies specific to its output: the FPU FNSTENV/FSTENV instruction used to recover EIP (shikata_ga_nai is a 32-bit x86 encoder, so this is a GetPC trick rather than RIP-relative addressing), the XOR additive feedback pattern, and the characteristic loop structure. Shoggoth avoids these well-known patterns entirely by using RIP-relative addressing for position independence and a completely different encryption approach.

9. Limitations & Operational Considerations

Understanding Shoggoth’s limitations is as important as understanding its capabilities:

What Shoggoth Does Not Do

No runtime evasion: Once the payload is decrypted and executing, Shoggoth provides no protection. Memory scans during execution will find the cleartext payload.
No anti-debug: The decoder stubs do not include anti-debugging checks (no IsDebuggerPresent, no timing checks, no hardware breakpoint detection).
No sleep obfuscation: Shoggoth is a file-level encoder, not a runtime protector. It does not encrypt the payload during sleep periods.
Entropy is detectable: The encrypted payload region has near-random entropy that statistical analysis can flag.
x64 only: The current implementation targets x86-64. The PIC loaders and asmjit code generation are 64-bit only.
Emulation can defeat it: Given sufficient instruction budget, an emulator will eventually step through all junk code and reach the decrypted payload.

Shoggoth is best understood as one layer in a layered evasion strategy. It protects the payload at rest (on disk, in transit) and through the decoding phase, when static signatures would otherwise match. For ongoing runtime protection, it must be combined with sleep obfuscation, in-memory encryption, and behavioral evasion techniques.

10. Course Summary

Course Complete

You have completed the Shoggoth Polymorphic Engine Masterclass. You now understand signature-based detection and its limitations, the principles of polymorphic and metamorphic code, how asmjit enables runtime code generation, Shoggoth’s two-stage encryption pipeline with RC4 and random block ciphers, how decoder stubs are dynamically generated with register randomization and instruction substitution, the role of junk code in anti-analysis and anti-emulation, and how defenders detect polymorphic output through entropy analysis, emulation, and behavioral heuristics.
