Difficulty: Beginner

Module 1: Signature-Based Detection & Why Polymorphism

Understanding how static analysis catches malware and why simple encoding schemes fail to provide lasting evasion.

Module Objective

Learn the mechanics behind signature-based detection, understand how YARA rules and pattern-matching engines work, see why XOR encoding and single-key encryption are trivially defeated, and build the conceptual foundation for why polymorphic engines like Shoggoth exist.

1. How Signature-Based Detection Works

Antivirus and EDR products rely heavily on signature-based detection as a first line of defense. A signature is a sequence of bytes, a pattern, or a set of conditions that uniquely identifies a known piece of malicious code. When a file is scanned, the detection engine searches for these known patterns within the binary data.

At its core, signature matching is a string-search problem. The scanner maintains a database of thousands (sometimes millions) of signatures and compares every scanned file against it. If a match is found, the file is flagged as malicious. The key advantages of this approach are speed and a very low false-positive rate: a byte-exact match on a known malware sample is almost always a true positive.
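To make the string-search framing concrete, here is a minimal Python sketch of exact byte-signature scanning. The signature names and byte patterns in the hypothetical SIGNATURES dictionary are made up for illustration. The one-search-per-signature loop is also a simplification: production engines typically use multi-pattern algorithms such as Aho-Corasick to check all signatures in a single pass.

```python
# Minimal sketch: exact byte-signature scanning as a substring search.
# SIGNATURES is a hypothetical two-entry database, not real AV content.
SIGNATURES = {
    "Demo.XorStub.A": bytes.fromhex("5E 48 31 C9 B1"),
    "Demo.Prologue.A": bytes.fromhex("48 89 5C 24 08"),
}


def scan(data: bytes, signatures: dict) -> list:
    """Return the names of all signatures that occur anywhere in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


sample = bytes.fromhex("90 90 5E 48 31 C9 B1 10")
print(scan(sample, SIGNATURES))  # ['Demo.XorStub.A']

# Flipping one bit inside the signed region is enough to break the match.
mutated = bytearray(sample)
mutated[3] ^= 0x01  # 0x48 -> 0x49
print(scan(bytes(mutated), SIGNATURES))  # []
```

The second call shows the core weakness of pure byte signatures: a one-bit change inside the matched region makes the sample invisible to that signature.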

| Detection Method | How It Works | Strengths | Weaknesses |
|---|---|---|---|
| Byte Signatures | Exact byte sequences extracted from known malware samples | Extremely fast, near-zero false positives | Trivially defeated by changing a single byte |
| Wildcard Signatures | Patterns with wildcards (e.g., EB ?? 90 90 ?? FF) allowing variable bytes | Tolerates minor variations | Wider patterns increase scan time and false positives |
| Heuristic Signatures | Behavioral patterns like “allocates RWX memory then writes MZ header” | Catches unknown variants | Higher false-positive rate, computationally expensive |
| YARA Rules | Flexible pattern language combining byte patterns, strings, conditions, and metadata | Highly expressive, community-maintained rulesets | Rules must be written and maintained; evasion is possible |
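A wildcard signature such as the EB ?? 90 90 ?? FF example can be sketched by translating each hex token into a byte-level regular expression. This simplified illustration uses Python's re module and assumes only full-byte ?? wildcards. Real engines compile patterns far more efficiently, and YARA's hex strings also support nibble wildcards and jumps. compile_wildcard is a hypothetical helper name.

```python
import re


def compile_wildcard(signature: str) -> re.Pattern:
    """Compile a hex signature such as 'EB ?? 90' where '??' matches any byte."""
    parts = []
    for token in signature.split():
        if token == "??":
            parts.append(b".")  # any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL lets '.' match 0x0A, which is just another byte in binary data.
    return re.compile(b"".join(parts), re.DOTALL)


sig = compile_wildcard("EB ?? 90 90 ?? FF")
variant_a = bytes.fromhex("00 EB 10 90 90 1A FF 00")
variant_b = bytes.fromhex("00 EB 7F 90 90 0A FF 00")  # different variable bytes
broken = bytes.fromhex("00 EB 10 90 91 1A FF 00")     # a fixed byte changed
print([bool(sig.search(d)) for d in (variant_a, variant_b, broken)])
# [True, True, False]
```

Each ?? buys tolerance for one varying byte. The cost is that the pattern also matches more unrelated data, which is the false-positive trade-off in the table.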

2. YARA Rules in Practice

YARA is the de facto standard for writing malware signatures in the security industry. A YARA rule consists of strings to search for (hex patterns, text strings, or regular expressions) and a condition that determines when the rule matches. Understanding YARA is essential to understanding what polymorphic engines must defeat.

Consider a simple YARA rule that targets a hypothetical shellcode loader:

```yara
rule ShellcodeLoader_Generic {
    meta:
        author = "Analyst"
        description = "Detects generic shellcode loader pattern"
    strings:
        $api1 = "VirtualAlloc" ascii
        $api2 = "VirtualProtect" ascii
        $stub = { 48 89 5C 24 08 48 89 6C 24 10 48 89 74 24 18 }
        $xor_loop = { 80 34 ?? ?? 48 FF C? 48 3B ?? 75 }
    condition:
        uint16(0) == 0x5A4D and
        all of ($api*) and
        ($stub or $xor_loop)
}
```

This rule matches a file that begins with the MZ header (uint16(0) == 0x5A4D) and contains both API name strings. The file must also contain either a known function-prologue byte pattern or a characteristic XOR decryption loop. The rule is effective against static payloads but falls apart against polymorphic output, because the byte patterns, register choices, and instruction sequences change every time.
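The rule's condition can be mirrored in a few lines of Python to show exactly what has to line up for a match. This is a hypothetical re-implementation for illustration only, and the helper names hex_pattern and matches_shellcode_loader_generic are ours. Real YARA compiles rules into its own, far more efficient matching engine, and the yara-python bindings are the usual way to run the actual rule. Nibble wildcards such as C? are expanded here into the set of byte values they allow.

```python
import re


def hex_pattern(hex_string: str) -> re.Pattern:
    """Compile a YARA-style hex string with '??' and nibble wildcards ('C?', '?5')."""
    parts = []
    for token in hex_string.split():
        hi, lo = token[0], token[1]
        allowed = [v for v in range(256)
                   if (hi == "?" or v >> 4 == int(hi, 16))
                   and (lo == "?" or v & 0x0F == int(lo, 16))]
        if len(allowed) == 256:
            parts.append(b".")
        elif len(allowed) == 1:
            parts.append(re.escape(bytes(allowed)))
        else:
            parts.append(b"[" + b"".join(re.escape(bytes([v])) for v in allowed) + b"]")
    return re.compile(b"".join(parts), re.DOTALL)


STUB = hex_pattern("48 89 5C 24 08 48 89 6C 24 10 48 89 74 24 18")
XOR_LOOP = hex_pattern("80 34 ?? ?? 48 FF C? 48 3B ?? 75")


def matches_shellcode_loader_generic(data: bytes) -> bool:
    # uint16(0) == 0x5A4D: the first two bytes are 'MZ' read little-endian.
    is_mz = len(data) >= 2 and int.from_bytes(data[:2], "little") == 0x5A4D
    # all of ($api*): both ASCII strings must be present.
    all_apis = b"VirtualAlloc" in data and b"VirtualProtect" in data
    # ($stub or $xor_loop)
    code_hit = STUB.search(data) is not None or XOR_LOOP.search(data) is not None
    return is_mz and all_apis and code_hit


body = b"VirtualAlloc\x00VirtualProtect\x00" + bytes.fromhex("80 34 0E 41 48 FF C1 48 3B C2 75")
print(matches_shellcode_loader_generic(b"MZ" + body))  # True
print(matches_shellcode_loader_generic(b"XX" + body))  # False: no MZ header
# Changing 48 FF C1 to 48 FF D1 breaks the C? nibble constraint.
print(matches_shellcode_loader_generic(b"MZ" + body.replace(b"\xff\xc1", b"\xff\xd1")))  # False
```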

Key Insight

YARA rules target invariants — byte sequences that remain the same across samples. A polymorphic engine eliminates invariants by ensuring that no two outputs share the same byte sequences in their decoder stubs, encryption keys, or instruction layouts.
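A toy version of the analyst's side makes the point concrete. Collect samples, keep only the byte sequences that every sample shares, and those invariants become signature candidates. The function name common_ngrams and all sample bytes are hypothetical. A real signature-generation workflow would also need to discard sequences that are common in benign software.

```python
def common_ngrams(samples: list, n: int) -> set:
    """Return every n-byte sequence that occurs in all samples."""
    def ngrams(data: bytes) -> set:
        return {data[i:i + n] for i in range(len(data) - n + 1)}

    shared = ngrams(samples[0])
    for sample in samples[1:]:
        shared &= ngrams(sample)
    return shared


# Same static stub, three different encrypted payloads (hypothetical bytes).
stub = bytes.fromhex("5E 48 31 C9 B1")
static_samples = [stub + bytes.fromhex(p) for p in ("11 22 33", "AA BB CC", "01 02 03")]
print(sorted(x.hex() for x in common_ngrams(static_samples, 4)))
# ['4831c9b1', '5e4831c9']

# Outputs that share no stable bytes leave nothing to signature.
print(common_ngrams([bytes(range(0, 8)), bytes(range(100, 108))], 4))  # set()
```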

3. Why Simple XOR Encoding Fails

The most basic attempt at evading signatures is XOR encoding: take the payload, XOR every byte with a single key, and prepend a small decoder loop. This was sufficient in the 1990s but is trivially defeated today for several reasons:

3.1 The Decoder Stub Is Static

Even though the encrypted payload changes when the key changes, the decoder loop itself remains identical. A YARA rule can simply target the decoder stub rather than the payload:

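```asm
; Classic JMP/CALL/POP single-byte XOR decoder: always the same bytes
    jmp short call_decoder        ; EB 0F
decoder:
    pop rsi                       ; 5E  (rsi = address of the payload)
    xor rcx, rcx                  ; 48 31 C9
    mov cl, PAYLOAD_LEN           ; B1 XX  (payload length, 1-255)
decode_loop:
    xor byte [rsi + rcx - 1], KEY ; 80 74 0E FF XX
    loop decode_loop              ; E2 F9
    jmp rsi                       ; FF E6  (run the decoded payload)
call_decoder:
    call decoder                  ; E8 EC FF FF FF  (pushes the payload address)
payload:
    ; encrypted payload bytes follow
```

The fixed bytes EB 0F, 5E, 48 31 C9, B1, 80 74 0E FF, E2 F9, FF E6, and E8 EC FF FF FF form a reliable signature. Changing the XOR key changes only the immediate operand byte, and changing the payload length changes only the mov cl operand. The instruction opcodes never change, so a pattern such as EB 0F 5E 48 31 C9 B1 ?? 80 74 0E FF ?? E2 F9 matches every build.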
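Because only the operands move, an analyst can diff a handful of builds byte by byte and turn every varying position into a wildcard. The sketch below does this for three hypothetical builds of a decoder like the one above. The hex is our own encoding of that shape, with different lengths and keys, and derive_signature is an illustrative name, not a real tool.

```python
def derive_signature(samples: list) -> str:
    """Keep bytes identical across equal-length samples; wildcard the rest."""
    if len({len(s) for s in samples}) != 1:
        raise ValueError("samples must all have the same length")
    tokens = []
    for column in zip(*samples):
        tokens.append(f"{column[0]:02X}" if len(set(column)) == 1 else "??")
    return " ".join(tokens)


# Hypothetical builds: (payload length, XOR key) = (0x40, 0x41), (0x40, 0x7F), (0x90, 0x13).
builds = [
    bytes.fromhex("EB 0F 5E 48 31 C9 B1 40 80 74 0E FF 41 E2 F9 FF E6 E8 EC FF FF FF"),
    bytes.fromhex("EB 0F 5E 48 31 C9 B1 40 80 74 0E FF 7F E2 F9 FF E6 E8 EC FF FF FF"),
    bytes.fromhex("EB 0F 5E 48 31 C9 B1 90 80 74 0E FF 13 E2 F9 FF E6 E8 EC FF FF FF"),
]
print(derive_signature(builds))
# EB 0F 5E 48 31 C9 B1 ?? 80 74 0E FF ?? E2 F9 FF E6 E8 EC FF FF FF
```

Twenty of the twenty-two bytes survive as fixed anchors. That is why the decoder, not the payload, is the natural target.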

3.2 Statistical Analysis Breaks XOR

Single-byte XOR is vulnerable to frequency analysis. In most payloads, the null byte (0x00) appears frequently. When XORed with key K, null bytes become K. An analyst can find the most frequent byte in the encrypted blob, assume it corresponds to 0x00, and recover the key instantly.
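The frequency attack fits in a few lines. This sketch assumes, as the paragraph does, that 0x00 is the most common plaintext byte. The sample plaintext, the key 0x5A, and the helper names are made up for illustration.

```python
from collections import Counter


def recover_single_byte_key(blob: bytes) -> int:
    """Guess a single-byte XOR key, assuming 0x00 is the most frequent plaintext byte."""
    most_common_byte, _count = Counter(blob).most_common(1)[0]
    return most_common_byte  # 0x00 ^ K == K


def xor_bytes(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)


plaintext = bytes.fromhex("48 31 C9") + b"\x00" * 16 + b"hello\x00\x00world"
blob = xor_bytes(plaintext, 0x5A)
key = recover_single_byte_key(blob)
print(hex(key))                           # 0x5a
print(xor_bytes(blob, key) == plaintext)  # True
```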

Brute Force Is Trivial

A single-byte XOR key has only 256 possible values. Even without frequency analysis, an automated tool can try all 256 keys in microseconds, check whether the decrypted output contains known strings or valid instructions, and recover the payload. Multi-byte XOR with a short repeating key is only marginally better. Kasiski examination and index-of-coincidence analysis reveal the key length, after which each key byte can be recovered independently with the same single-byte attack.
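Brute force is just as short: try every key and keep those whose output contains a known crib. The crib here is the standard DOS-stub message found in most PE files. The blob and the key 0xC3 are hypothetical.

```python
def brute_force_xor(blob: bytes, crib: bytes) -> list:
    """Try all 256 single-byte XOR keys; return those whose output contains crib."""
    hits = []
    for key in range(256):
        candidate = bytes(b ^ key for b in blob)
        if crib in candidate:
            hits.append(key)
    return hits


plaintext = b"MZ\x90\x00" + b"This program cannot be run in DOS mode."
blob = bytes(b ^ 0xC3 for b in plaintext)
print([hex(k) for k in brute_force_xor(blob, b"This program")])  # ['0xc3']
```

For a repeating key of length L, the same loop can be run separately over each of the L interleaved byte streams once the key length is known.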

4. Why Multi-Layer Static Encryption Still Fails

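A natural progression from single-byte XOR is to layer multiple encryption operations: first XOR with key A, then ADD key B, then rotate left (ROL) by 3 bits. This makes recovering the keys by frequency analysis less direct. With single-byte keys, however, the whole chain is still equivalent to one fixed byte-for-byte substitution. More importantly, layering does not solve the fundamental problem. The decoder stub is still static, and at runtime it must rebuild the cleartext payload in memory, where an AV emulator can simply let it run and scan the result: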

Why Static Encryption Fails Against Emulation

Encrypted Payload → AV Emulator Runs Decoder → Decrypted Payload in Sandbox → Signature Match!
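The following Python sketch models a three-layer XOR/ADD/ROL chain with hypothetical parameters (KEY_A, KEY_B, ROT). It shows two things. First, because every layer works byte by byte, the whole chain collapses into a single 256-entry substitution table. Second, anything that simply runs the static decoder gets the cleartext back for an ordinary signature scan. emulate_decoder is only a stand-in: a real AV emulator executes the decoder's machine code rather than calling a Python function.

```python
KEY_A, KEY_B, ROT = 0x5A, 0x33, 3  # hypothetical layer parameters


def rol8(x: int, r: int) -> int:
    return ((x << r) | (x >> (8 - r))) & 0xFF


def ror8(x: int, r: int) -> int:
    return ((x >> r) | (x << (8 - r))) & 0xFF


def encode_byte(b: int) -> int:
    # Layer 1: XOR KEY_A, layer 2: ADD KEY_B (mod 256), layer 3: ROL by ROT bits.
    return rol8(((b ^ KEY_A) + KEY_B) & 0xFF, ROT)


def decode_byte(c: int) -> int:
    # The static decoder undoes the layers in reverse order.
    return ((ror8(c, ROT) - KEY_B) & 0xFF) ^ KEY_A


# Every layer is byte-wise, so the chain is one fixed permutation of 0..255.
TABLE = [encode_byte(b) for b in range(256)]
assert sorted(TABLE) == list(range(256))


def emulate_decoder(blob: bytes) -> bytes:
    """Stand-in for an emulator: run the decoder and hand back its output."""
    return bytes(decode_byte(c) for c in blob)


SIGNATURE = b"VirtualProtect"  # hypothetical cleartext signature
payload = b"\x00\x00" + SIGNATURE + b"\x00\x00"
blob = bytes(TABLE[b] for b in payload)
print(SIGNATURE in blob)                   # False: the encrypted bytes hide it
print(SIGNATURE in emulate_decoder(blob))  # True: the emulator's view reveals it
```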

5. The Case for Polymorphism

Polymorphism solves the fundamental weakness of static encoding by ensuring that the decoder stub itself changes with every generation. A polymorphic engine does not merely change the encryption key — it changes the instructions used to perform decryption, the registers allocated, the order of operations, and even inserts meaningless junk code between real instructions.

The goal is to eliminate all static invariants from the output. Two runs of a polymorphic engine with the same input payload produce outputs that differ in every property a signature could target:

| Property | Static Encoder | Polymorphic Engine |
|---|---|---|
| Decoder stub | Identical every time | Different instructions, registers, layout each time |
| Encryption | Same algorithm, different key | Different algorithm chain, different keys, different order |
| Junk code | None | Randomly generated NOP-equivalent instructions inserted throughout |
| Register usage | Hardcoded registers | Randomly selected from available pool |
| Signature resilience | Trivially signatured on decoder | No stable byte pattern to signature |
| Emulation resilience | None | Junk code + opaque predicates slow emulation; can exceed time budgets |

6. What Shoggoth Brings to the Table

Shoggoth by frkngksl is a modern polymorphic encryptor implemented in C++ using the asmjit library for runtime x86-64 code generation. It takes three input types (raw shellcode, PE files, COFF/BOF files) and produces position-independent encrypted output that self-decrypts at runtime.

What makes Shoggoth a true polymorphic engine rather than a simple encoder:

Shoggoth’s Polymorphic Properties

Over the next seven modules, we will dissect every component of this engine: the asmjit code generation library, the encryption pipeline, the decoder stub construction, the junk code strategies, and the final output packaging for each supported format.

Knowledge Check

Q1: Why does a single-byte XOR encoder fail against modern detection?

Single-byte XOR has a fixed 256-value key space that can be exhausted in microseconds. More importantly, the decoder loop uses identical opcodes every time, making it trivially signaturable by YARA rules that target the decoder rather than the encrypted payload.

Q2: What is the primary advantage of a polymorphic engine over a static encoder?

The defining characteristic of a polymorphic engine is that the decoder stub (not just the encrypted payload) is different each time. This eliminates the static byte patterns that signature-based detection relies on, forcing defenders to use more expensive heuristic or emulation-based approaches.

Q3: How does an AV emulator defeat static encryption?

AV emulators run the binary (or just the decoder stub) in a lightweight CPU emulator/sandbox. Once the decoder runs and produces the cleartext payload, the emulator scans that decrypted output against its signature database. This makes the encryption layer transparent — the emulator effectively undoes it.