Difficulty: Advanced

Module 8: Full Chain, Output Formats & Detection

End-to-end execution walkthrough, PIC output structure, COFF/PE wrapping mechanics, entropy analysis, emulation-based detection, and comparison with other encoder frameworks.

Module Objective

By the end of this module you will be able to trace the complete execution chain from the moment Shoggoth output begins running, describe the PIC blob’s memory layout, and explain how the PE and COFF loaders bridge the gap between shellcode and structured executables. You will also learn the detection techniques that defenders use against polymorphic output (entropy analysis, emulation, behavioral heuristics) and see how Shoggoth compares with Veil, msfvenom, and other encoding frameworks.

1. Full Execution Chain Walkthrough

Let us trace what happens from the moment a Shoggoth-encrypted payload begins executing in memory. This walkthrough assumes both encryption stages are active (the default):

Runtime Execution Flow

Junk Preamble (executes harmlessly) → Block Cipher Decoder (decrypts Stage 2) → Junk Interlude (executes harmlessly) → RC4 Decoder (decrypts Stage 1) → Cleartext Payload (begins execution)

Step-by-Step Execution

  1. Junk Preamble (variable ~50-500 bytes): The first bytes executed are garbage instructions — NOPs, self-canceling pairs, jump-over blocks. They execute harmlessly and fall through to the block cipher decoder.
  2. Block Cipher Decoder: Uses lea reg, [rip + offset] to locate the doubly-encrypted payload. Iterates over 8-byte blocks, applying the inverse operation chain (e.g., ROR → SUB → XOR). After this loop completes, the data is now only RC4-encrypted.
  3. Junk Interlude (variable): More garbage instructions separate the two decoder stubs. This prevents a pattern like “block loop immediately followed by RC4 setup” from becoming a signature.
  4. RC4 Decoder: Allocates 256 bytes on the stack for the S-box. Runs the KSA using the embedded key to initialize the permutation. Then runs the PRGA, generating keystream bytes and XORing them with the remaining encrypted data. After completion, restores the stack.
  5. Control Transfer: Execution falls through (or jumps) to the now-decrypted payload. In raw mode, this is the original shellcode. In PE/COFF mode, this is the PIC loader followed by the PE/COFF file.
  6. PE/COFF Loader (if applicable): The PIC loader resolves API addresses by walking PEB → Ldr → InMemoryOrderModuleList, maps PE sections or COFF symbols, applies relocations, and transfers control to the original payload’s entry point.
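The ordering of the two stages can be illustrated with a short Python sketch. It is a model of the arithmetic only, not Shoggoth's generated code: the RC4 routine is the textbook algorithm, while the block stage assumes one particular operation chain (XOR, then ADD, then ROL on 64-bit little-endian words) and hypothetical key values, whereas the real tool picks operations and keys at random for every output. The point is that decoding must undo the layers in reverse order, and within the block stage must apply the inverse operations in reverse order (ROR → SUB → XOR).

```python
import struct

MASK64 = (1 << 64) - 1  # keep arithmetic within 64 bits, like a CPU register


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4. Encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):  # KSA: permute the 256-entry S-box using the key
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:  # PRGA: generate keystream bytes and XOR them in
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def rol64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK64


def ror64(x: int, r: int) -> int:
    return ((x >> r) | (x << (64 - r))) & MASK64


def block_encode(data: bytes, xor_key: int, add_key: int, rot: int) -> bytes:
    """Assumed forward chain per 8-byte block: XOR -> ADD -> ROL."""
    words = struct.unpack(f"<{len(data) // 8}Q", data)
    out = [rol64(((w ^ xor_key) + add_key) & MASK64, rot) for w in words]
    return struct.pack(f"<{len(out)}Q", *out)


def block_decode(data: bytes, xor_key: int, add_key: int, rot: int) -> bytes:
    """Inverse chain in reverse order: ROR -> SUB -> XOR."""
    words = struct.unpack(f"<{len(data) // 8}Q", data)
    out = [((ror64(w, rot) - add_key) & MASK64) ^ xor_key for w in words]
    return struct.pack(f"<{len(out)}Q", *out)


# Hypothetical demo values; the input length must be a multiple of 8 here.
data = bytes(range(32))
rc4_key = b"demo-rc4-key"
block_keys = (0x1122334455667788, 0x0123456789ABCDEF, 13)

stage1 = rc4(rc4_key, data)                 # applied first when encoding
stage2 = block_encode(stage1, *block_keys)  # applied second when encoding

# Decoding undoes the outer (block) layer first, then the RC4 layer.
recovered = rc4(rc4_key, block_decode(stage2, *block_keys))
print(recovered == data)
```

The sketch requires the input length to be a multiple of 8; how the real tool handles a trailing partial block is not covered here.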

2. PIC Output Memory Layout

The flat PIC blob that Shoggoth produces has no headers, no sections, no metadata — just raw executable machine code followed by encrypted data. This is intentional: any structural metadata would become a signature.

Layout

+--------------------------------------------------+
| Offset 0x0000: Junk Preamble                     |  Variable size
|   - Side-effect-free instructions                 |  (~50-500 bytes)
|   - Jump-over blocks, fake calls                  |
+--------------------------------------------------+
| Block Cipher Decoder Stub                         |  Variable size
|   - Register setup (random regs)                  |  (~100-400 bytes)
|   - LEA to locate encrypted data                  |
|   - Decryption loop (inverse ops + junk)          |
|   - Keys embedded as immediates                   |
+--------------------------------------------------+
| Junk Interlude                                    |  Variable size
+--------------------------------------------------+
| RC4 Decoder Stub                                  |  Variable size
|   - SUB RSP, 256 (S-box allocation)              |  (~200-600 bytes)
|   - KSA loop (with junk)                          |
|   - PRGA loop (with junk)                         |
|   - ADD RSP, 256 (stack restore)                  |
+--------------------------------------------------+
| RC4 Key Data                                      |  8-32 bytes
+--------------------------------------------------+
| Encrypted Payload                                 |  Same size as
|   (doubly encrypted: RC4 then block cipher)       |  original input
+--------------------------------------------------+

No File Format Artifacts

The output has no MZ header, no PE sections, no ELF magic bytes, no relocations, no import tables. It is pure position-independent machine code that can be loaded at any address and jumped to. This makes it compatible with any shellcode injection technique: VirtualAlloc + memcpy + cast-to-function-pointer, CreateThread, APC injection, process hollowing, etc.

3. PE Mode: Reflective Loading

When Shoggoth wraps a PE file, the decrypted output is not a standalone executable — it is the PIC PE loader followed by the PE file bytes. The loader acts as a minimal reflective loader:

| Loader Step | What Happens | Why It’s Needed |
|---|---|---|
| 1. Find image base | Calculate pointer to the appended PE file using RIP-relative offset | The loader must know where the PE data starts in memory |
| 2. Parse PE headers | Read DOS header, NT headers, section table | Determine section layout, entry point, import/relocation directories |
| 3. Allocate memory | VirtualAlloc with SizeOfImage from optional header | Create a contiguous block to map sections into |
| 4. Map sections | Copy each section from raw data to virtual address offset | Sections must be at their correct RVAs for references to work |
| 5. Process relocations | Apply base relocation delta if loaded at non-preferred address | Absolute addresses in the PE must be fixed up |
| 6. Resolve imports | Walk PEB to find loaded DLLs, parse export tables, fill IAT | The PE needs function pointers for API calls |
| 7. Set permissions | VirtualProtect per section (RX for .text, RW for .data, etc.) | Proper memory permissions for security and correctness |
| 8. Call entry point | Jump to AddressOfEntryPoint + new base address | Begin executing the original PE |

All API addresses (VirtualAlloc, VirtualProtect, LoadLibraryA, GetProcAddress) are resolved at runtime by the loader through PEB walking — no import table in the PIC blob itself.
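The header fields that steps 2, 3, 5, and 8 depend on can be read with a few fixed offsets. The Python sketch below is an analyst-side illustration, not the loader's code: it parses a synthetic header built in the script itself (all field values are hypothetical demo values) and shows the arithmetic behind the relocation delta and the final entry-point address. The offsets follow the published PE/COFF layout.

```python
import struct


def parse_pe_headers(image: bytes) -> dict:
    """Read the fields a loader needs from the DOS, COFF, and optional headers."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)  # offset of NT headers
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4            # COFF file header follows the signature
    machine, num_sections = struct.unpack_from("<HH", image, coff)
    opt = coff + 20                # optional header follows the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", image, opt)
    (entry_rva,) = struct.unpack_from("<I", image, opt + 16)
    if magic == 0x20B:             # PE32+ (64-bit): 8-byte ImageBase at +24
        (image_base,) = struct.unpack_from("<Q", image, opt + 24)
    elif magic == 0x10B:           # PE32 (32-bit): 4-byte ImageBase at +28
        (image_base,) = struct.unpack_from("<I", image, opt + 28)
    else:
        raise ValueError("unknown optional header magic")
    (size_of_image,) = struct.unpack_from("<I", image, opt + 56)
    return {
        "machine": machine,
        "num_sections": num_sections,
        "entry_rva": entry_rva,
        "image_base": image_base,
        "size_of_image": size_of_image,
    }


def build_demo_header() -> bytes:
    """Synthetic PE32+ header with made-up values, just enough for the parser."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)            # e_lfanew
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<HH", buf, 0x84, 0x8664, 3)      # x64 machine, 3 sections
    opt = 0x84 + 20
    struct.pack_into("<H", buf, opt, 0x20B)            # PE32+ magic
    struct.pack_into("<I", buf, opt + 16, 0x1000)      # AddressOfEntryPoint
    struct.pack_into("<Q", buf, opt + 24, 0x140000000) # preferred ImageBase
    struct.pack_into("<I", buf, opt + 56, 0x5000)      # SizeOfImage
    return bytes(buf)


info = parse_pe_headers(build_demo_header())

# Step 5: the relocation delta is the distance between the actual and the
# preferred base. The actual base below is a hypothetical allocation address.
actual_base = 0x7FF600000000
delta = actual_base - info["image_base"]

# Step 8: the entry point is the RVA added to the actual base.
entry_point = actual_base + info["entry_rva"]
print(hex(delta), hex(entry_point))
```

The same field reads are what a defender's tooling performs when it carves a PE out of decrypted memory, which is why a cleartext MZ/PE header pair in a private allocation is a useful hunting indicator.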

4. COFF Mode: BOF Loading

COFF/BOF (Beacon Object File) mode handles .o object files, commonly used with Cobalt Strike’s inline-execute or similar BOF runners. The COFF loader performs a simplified version of what a linker does: it maps the COFF symbols, applies relocations, and transfers control to the entry point.

The --coff-arg flag allows passing arguments to the BOF in the format expected by Cobalt Strike’s BeaconDataParse API, generated using the included beacon_generate.py script.

5. Detection: Entropy Analysis

One of the most effective detection techniques against encrypted payloads is entropy analysis. Shannon entropy measures the randomness of data; for byte data it ranges from 0 bits per byte (every byte identical, fully predictable) to 8 bits per byte (all 256 byte values equally likely). Encrypted data has characteristically high entropy:

| Data Type | Typical Entropy (bits/byte) | Pattern |
|---|---|---|
| English text | 3.5 – 5.0 | Low entropy, repetitive character distribution |
| Compiled x86 code | 5.5 – 6.5 | Moderate entropy, structured opcode patterns |
| Compressed data (zlib, etc.) | 7.5 – 8.0 | Near-maximum entropy |
| Encrypted data (AES, RC4, etc.) | 7.9 – 8.0 | Near-maximum entropy, indistinguishable from random |
| Shoggoth output | 6.0 – 7.8 | Mixed: low-entropy decoder stub + high-entropy encrypted payload |
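For reference, the quantity in the table is H = -Σ p(b) · log2 p(b), summed over the byte values b that occur in the data. A minimal Python sketch of that calculation follows; the function name and the sample inputs are illustrative choices, not part of any particular tool.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # p * log2(1/p) summed over every byte value that actually occurs
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


constant = b"\x00" * 1024        # one repeated value: fully predictable
uniform = bytes(range(256)) * 4  # every byte value equally frequent
text = b"the quick brown fox jumps over the lazy dog " * 20

print(shannon_entropy(constant))
print(shannon_entropy(uniform))
print(shannon_entropy(text))
```

The first two inputs hit the two ends of the scale (0 and 8 bits per byte); the repeated English sentence lands in between.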

The Entropy Profile Is a Signature

A Shoggoth output has a distinctive entropy profile: a block of moderate-entropy code (the decoder stubs with their junk instructions) followed by a block of high-entropy data (the encrypted payload). This “step function” in entropy is itself detectable. Tools like binwalk --entropy or pestudio can visualize this pattern, revealing the boundary between code and encrypted data.

Defenders can use sliding-window entropy analysis to flag regions of a binary that transition sharply from normal code entropy (~6.0) to encrypted data entropy (~7.9). This does not reveal what the payload is, but it identifies that encryption is present — which is suspicious in shellcode.
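A sliding-window scan of this kind can be sketched in a few lines of Python. Everything below is a hedged illustration: the blob is synthetic (a low-entropy region drawn from a 32-value alphabet standing in for a decoder stub, followed by pseudo-random bytes standing in for an encrypted payload), and the window size, step, and threshold are assumed values that would need tuning against real samples.

```python
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def entropy_profile(data: bytes, window: int = 1024, step: int = 256):
    """Return (offset, entropy) for each window position across the data."""
    return [
        (off, shannon_entropy(data[off:off + window]))
        for off in range(0, len(data) - window + 1, step)
    ]


def find_entropy_step(profile, threshold: float = 7.5):
    """Offset of the first window at or above the threshold that directly
    follows a window below it, or None if there is no such transition."""
    prev_low = False
    for off, h in profile:
        if h >= threshold and prev_low:
            return off
        prev_low = h < threshold
    return None


# Synthetic stand-in data, not real tool output.
rng = random.Random(0)
alphabet = bytes(range(0x40, 0x60))  # 32 distinct values, at most 5 bits/byte
stub = bytes(rng.choice(alphabet) for _ in range(2048))
payload = bytes(rng.getrandbits(8) for _ in range(4096))
blob = stub + payload

profile = entropy_profile(blob)
boundary = find_entropy_step(profile)
print(boundary)  # lands within one window of the true boundary at 2048
```

Window size matters: a window of only a few hundred bytes cannot reach 7.9 bits per byte even on perfectly random data, because too few samples are available to populate all 256 byte values evenly. Larger windows give cleaner readings at the cost of a blurrier boundary.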

6. Detection: Emulation & Sandbox Analysis

Shoggoth’s junk code slows emulation but does not defeat it; advanced detection still relies on emulation-based unpacking:

Emulation-Based Detection Approaches

| Technique | How It Works | Effectiveness Against Shoggoth |
|---|---|---|
| Instruction-count emulation | Run code in emulator for N instructions, scan memory for known patterns | Moderate: junk code consumes budget, but high-budget emulators may succeed |
| Write-then-execute detection | Monitor for memory regions that are written to then executed (self-modifying code pattern) | High: the decoder must write decrypted bytes then execute them; this is inherent to the design |
| API call monitoring | In PE/COFF mode, watch for VirtualAlloc, VirtualProtect, LoadLibrary calls from shellcode | High: the PIC loaders must call these APIs to function |
| Memory content scanning | Periodically scan writable memory for known malware signatures after each N instructions | High if budget is sufficient: eventually the payload is in cleartext |
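The write-then-execute row can be made concrete with a toy model. The Python sketch below assumes an emulator or instrumentation layer that emits a trace of ("write", address) and ("exec", address) events; that event format, the page granularity, and the sample addresses are all hypothetical choices for illustration and do not correspond to any specific product.

```python
from typing import Iterable, List, Tuple

PAGE_SIZE = 0x1000
Event = Tuple[str, int]  # ("write" | "exec", address)


def find_write_then_execute(events: Iterable[Event]) -> List[int]:
    """Return the pages that were executed after having been written to,
    in the order they were first flagged."""
    written = set()
    flagged: List[int] = []
    for kind, addr in events:
        page = addr & ~(PAGE_SIZE - 1)  # round down to the page base
        if kind == "write":
            written.add(page)
        elif kind == "exec" and page in written and page not in flagged:
            flagged.append(page)
    return flagged


# Hypothetical trace: a stub running at 0x10000 writes decoded bytes into
# the page at 0x11000, then control transfers into that page.
trace = [
    ("exec", 0x10000),
    ("write", 0x11008),
    ("write", 0x11010),
    ("exec", 0x10040),
    ("exec", 0x11008),
]

print([hex(p) for p in find_write_then_execute(trace)])
```

Page granularity is a simplification: when the stub and the data it decodes share a page, a tracker needs finer-grained (per-byte or per-range) bookkeeping to tell the two apart.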

The fundamental limitation of any encryption-based evasion (including Shoggoth) is that the payload must be decrypted in memory before it can execute. At that moment, it is vulnerable to memory scanning. Polymorphism protects the file on disk and the decoder stub from signature matching, but the decrypted payload is the original unmodified shellcode or PE file.
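The same point can be shown from the scanner's side. In the Python sketch below, a single-byte XOR stands in for the real encryption stages purely to keep the example short, and the marker string, signature name, and buffer contents are made up. A signature that finds nothing in the encoded bytes matches as soon as the buffer has been decoded.

```python
# Hypothetical signature database: name -> byte pattern.
SIGNATURES = {"demo-family": b"KNOWN-PAYLOAD-MARKER"}


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR, a deliberately trivial stand-in for real encryption."""
    return bytes(b ^ key for b in data)


def scan(memory: bytes, signatures: dict) -> list:
    """Return the names of all signatures whose pattern occurs in memory."""
    return [name for name, pattern in signatures.items() if pattern in memory]


payload = b"\x90" * 16 + b"KNOWN-PAYLOAD-MARKER" + b"\x00" * 16

at_rest = xor_bytes(payload, 0x5A)    # what a file or network scanner sees
in_memory = xor_bytes(at_rest, 0x5A)  # what exists after the decoder has run

print(scan(at_rest, SIGNATURES))
print(scan(in_memory, SIGNATURES))
```

What changes between the two scans is timing, not the signature: the scanner has to look at the right memory after decryption has finished.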

7. Detection: Behavioral Heuristics

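Beyond entropy and emulation, behavioral heuristics target the execution patterns common to polymorphic decoders, for example memory that is written and then executed, or VirtualAlloc and VirtualProtect calls issued from shellcode.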

8. Comparison with Other Frameworks

How does Shoggoth compare with other commonly used payload encoding tools?

| Feature | Shoggoth | msfvenom (shikata_ga_nai) | Veil | Custom XOR encoder |
|---|---|---|---|---|
| Polymorphism level | Full: registers, operations, junk, keys all random | Moderate: random key, some register variation, FPU-based addressing | Language-level: generates unique source code per run | None: static decoder |
| Encryption stages | 2 (RC4 + random block cipher) | 1 (additive XOR feedback) | Varies by template | 1 (XOR) |
| Junk code | Recursive, multi-category (jumps, fake calls, opaque preds) | Limited (some NOP variations) | Source-level dead code | None |
| Output format | Flat PIC blob | Flat shellcode or wrapped in format | EXE, DLL, various languages | Flat shellcode |
| Input types | Shellcode, PE, COFF/BOF | Shellcode only (for encoding) | Shellcode (wrapped in payloads) | Shellcode only |
| Code generation | asmjit (runtime assembly) | Rex::Poly (Ruby polymorphic library) | Template rendering | Hardcoded bytes |
| Detection rate | Low (novel output each time) | High (widely signatured after years of use) | Moderate (known templates) | Very high (trivially signatured) |

shikata_ga_nai Is Heavily Signatured

The msfvenom shikata_ga_nai encoder, while polymorphic, has been in wide use since the mid-2000s. AV vendors have developed numerous detection strategies specific to its output: the FPU FNSTENV/FSTENV instructions used to recover EIP (shikata_ga_nai is a 32-bit x86 encoder, so the register it recovers is EIP rather than RIP), the additive feedback XOR pattern, and the characteristic loop structure. Shoggoth avoids these well-known patterns entirely by using different PIC addressing methods and a completely different encryption approach.

9. Limitations & Operational Considerations

Understanding Shoggoth’s limitations is as important as understanding its capabilities:

What Shoggoth Does Not Do

Shoggoth is best understood as one layer in a layered offensive strategy. It protects the payload at rest (on disk, in transit) and provides initial evasion when the payload first begins executing. For ongoing runtime protection, it should be combined with sleep obfuscation, in-memory encryption, and behavioral evasion techniques.

10. Course Summary

Course Complete

You have completed the Shoggoth Polymorphic Engine Masterclass. You now understand signature-based detection and its limitations, the principles of polymorphic and metamorphic code, how asmjit enables runtime code generation, Shoggoth’s two-stage encryption pipeline with RC4 and random block ciphers, how decoder stubs are dynamically generated with register randomization and instruction substitution, the role of junk code in anti-analysis and anti-emulation, and how defenders detect polymorphic output through entropy analysis, emulation, and behavioral heuristics.

Knowledge Check

Q1: What is the most fundamental limitation of any encryption-based evasion technique, including Shoggoth?

Every encryption-based technique shares this fundamental constraint: to execute, the payload must exist in cleartext in memory. At that point, a memory scanner can detect it using the same signatures that would catch the unencrypted payload on disk. Polymorphism protects the decoder and the file at rest, but the ultimate payload must eventually be exposed.

Q2: How does entropy analysis detect Shoggoth output?

Shoggoth output has a characteristic entropy profile: the decoder stubs (with junk code) have moderate entropy (~6.0-6.5 bits/byte, typical of x86 code), while the encrypted payload has near-maximum entropy (~7.9 bits/byte). This sharp transition is detectable by sliding-window entropy analysis, even though the specific bytes vary between outputs.

Q3: What key advantage does Shoggoth have over msfvenom’s shikata_ga_nai encoder?

shikata_ga_nai has been in use since the mid-2000s and its patterns (FPU-based EIP recovery via FNSTENV/FSTENV, additive feedback XOR loop, characteristic register usage) are well documented and heavily signatured by AV vendors. Shoggoth uses modern RIP-relative addressing, a completely different two-stage encryption scheme (RC4 + random block cipher), and sophisticated junk code generation, making it a fresh target that existing shikata_ga_nai signatures do not cover.