Difficulty: Beginner

Module 3: asmjit — Runtime Code Generation

Understanding the JIT assembly library that powers Shoggoth’s dynamic instruction emission and why it is ideal for polymorphic stub generation.

Module Objective

Learn what asmjit is, how its core classes (CodeHolder, x86::Assembler, JitRuntime) work, understand the difference between the Assembler and Builder/Compiler interfaces, and see how Shoggoth leverages asmjit to emit randomized x86-64 machine code at runtime.

1. What Is asmjit?

asmjit is an open-source C++ library for runtime machine code generation. It allows programs to construct x86 and x86-64 assembly instructions programmatically and emit them as executable machine code in memory. Unlike a traditional assembler (NASM, MASM) that runs at build time, asmjit operates at runtime — your C++ program decides what instructions to generate while it is executing.

This runtime generation capability is exactly what a polymorphic engine needs. Instead of selecting from pre-built decoder stubs (which would be signaturable), the engine uses asmjit to construct a fresh decoder every time, choosing different registers, instructions, and layouts based on random decisions.

| Feature | Traditional Assembler (NASM) | asmjit |
| --- | --- | --- |
| When code is generated | Build time (compile/link) | Runtime (during program execution) |
| Output | Object files (.o/.obj) | Machine code in memory or byte buffer |
| Dynamic decisions | No (macro-level only) | Yes (any C++ logic can drive instruction selection) |
| Register selection | Hardcoded by the programmer | Can be parameterized: choose registers at runtime |
| Use case | Static code, OS kernels, drivers | JIT compilers, dynamic code gen, polymorphic engines |

2. Core Architecture

asmjit is organized around a few key classes that form a pipeline: you create a code container, attach an emitter to it, emit instructions, and then either extract the raw bytes or make the code executable.

asmjit Code Generation Pipeline

CodeHolder (code container and sections) -> x86::Assembler (instruction emitter) -> Machine code (raw bytes in CodeHolder) -> JitRuntime / export (execute or extract bytes)

2.1 CodeHolder

The CodeHolder is the central container that stores generated machine code, manages code sections (like .text), handles relocations, and tracks labels. You initialize it with an Environment that specifies the target architecture:

C++
#include <asmjit/asmjit.h>
using namespace asmjit;

// Create a CodeHolder targeting x86-64
CodeHolder code;
Environment env = Environment::host(); // or explicitly: Arch::kX64
code.init(env);

2.2 x86::Assembler

The x86::Assembler is the low-level instruction emitter. You attach it to a CodeHolder and call methods corresponding to x86 instructions. Each method call appends the encoded instruction bytes to the CodeHolder’s internal buffer:

C++
x86::Assembler a(&code);

// Emit instructions - these become raw machine code bytes
a.push(x86::rbp);
a.mov(x86::rbp, x86::rsp);
a.xor_(x86::rax, x86::rax);     // Note: xor_ because 'xor' is a C++ keyword
a.mov(x86::rax, 42);
a.pop(x86::rbp);
a.ret();

Note on Naming Conventions

asmjit appends an underscore to instruction names that conflict with C++ keywords: xor_, and_, or_, not_. The rest use their standard mnemonics: mov, add, sub, push, pop, rol, ror, inc, dec, jmp, je, jne, etc.

2.3 JitRuntime

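JitRuntime allocates executable memory through the operating system's virtual memory API (VirtualAlloc on Windows), copies the generated machine code into it, and returns a function pointer you can call directly. The pages are typically mapped read-write-execute, although the exact permissions depend on the asmjit version and the platform's hardening settings: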

C++
JitRuntime rt;

// Allocate executable memory and copy code
typedef int (*Func)();
Func fn;
Error err = rt.add(&fn, &code);
if (err) { /* handle error */ }

// Call the generated code!
int result = fn();  // result == 42

// Release when done
rt.release(fn);

3. Labels and Branches

Loops and conditional branches are essential for decoder stubs. asmjit provides a Label type for managing branch targets. Labels can be forward-referenced — you can jump to a label before binding it, and asmjit will patch the offset when the label is bound later:

C++
x86::Assembler a(&code);
Label loopStart = a.newLabel();
Label loopEnd = a.newLabel();
Label payload = a.newLabel();   // marks where the data begins (bound after the stub)

// Setup: rcx = count, rsi = data pointer (payloadSize is supplied by the caller)
a.mov(x86::rcx, payloadSize);
a.lea(x86::rsi, x86::ptr(payload));  // label operand: encoded RIP-relative in 64-bit mode (PIC)

a.bind(loopStart);              // loopStart:
a.xor_(x86::byte_ptr(x86::rsi), 0xAB);  // XOR decrypt byte
a.inc(x86::rsi);                // advance pointer
a.dec(x86::rcx);                // decrement counter
a.jnz(loopStart);              // loop if not zero

a.bind(loopEnd);                // loopEnd:
// ... transfer control to decrypted payload

a.bind(payload);                // payload: the data follows the stub

This label mechanism is critical for Shoggoth. The decoder stub contains a decryption loop whose body varies between generations, but the loop structure (branch back to start, exit when done) is managed through labels. asmjit computes the correct relative offsets automatically, even as junk instructions change the loop body size.
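To make the forward-reference mechanism concrete, here is a minimal sketch of how an emitter can patch a jump once its label is bound. ToyLabel and ToyEmitter are hypothetical names invented for this illustration; they are not asmjit classes and do not reflect asmjit's internal implementation. The sketch always uses the 5-byte jmp rel32 form, whereas a real assembler may pick a shorter encoding when the target is close.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of forward-reference patching (hypothetical; not asmjit's internals).
struct ToyLabel {
    bool bound = false;
    std::size_t offset = 0;            // valid once the label is bound
    std::vector<std::size_t> fixups;   // offsets of rel32 fields waiting for a target
};

struct ToyEmitter {
    std::vector<std::uint8_t> buf;

    // Write a little-endian rel32 at `field`. The displacement is measured
    // from the end of the 4-byte field, i.e. from the next instruction.
    void patchRel32(std::size_t field, std::size_t target) {
        std::int64_t rel = static_cast<std::int64_t>(target) -
                           static_cast<std::int64_t>(field + 4);
        std::uint32_t bits = static_cast<std::uint32_t>(static_cast<std::int32_t>(rel));
        for (int i = 0; i < 4; ++i)
            buf[field + i] = static_cast<std::uint8_t>(bits >> (8 * i));
    }

    // Emit `jmp rel32` (opcode E9) to a label that may not be bound yet.
    void jmp(ToyLabel& label) {
        buf.push_back(0xE9);
        std::size_t field = buf.size();
        buf.insert(buf.end(), 4, 0x00);          // placeholder displacement
        if (label.bound) patchRel32(field, label.offset);
        else label.fixups.push_back(field);      // resolve later, in bind()
    }

    // Single-byte filler instruction, standing in for a variable-size body.
    void nop() { buf.push_back(0x90); }

    // Bind the label to the current position and resolve pending jumps.
    void bind(ToyLabel& label) {
        label.bound = true;
        label.offset = buf.size();
        for (std::size_t field : label.fixups) patchRel32(field, label.offset);
        label.fixups.clear();
    }
};
```

Emitting jmp(done), three nop() calls, and then bind(done) leaves the displacement bytes as 03 00 00 00. Because the displacement is only computed at bind time, the number of instructions between the jump and the label can vary freely.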

4. Register Parameterization

One of the most powerful features for polymorphic generation is that asmjit register operands are values, not syntax. You can store registers in variables and use them interchangeably:

C++
// Register pool for randomization
x86::Gp availableRegs[] = {
    x86::rax, x86::rbx, x86::rcx, x86::rdx,
    x86::rsi, x86::rdi, x86::r8,  x86::r9,
    x86::r10, x86::r11, x86::r12, x86::r13,
    x86::r14, x86::r15
    // RSP excluded - must not be clobbered
};

// Randomly select registers for decoder roles
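// rng is any standard random engine, e.g. std::mt19937 (needs <algorithm>, <iterator>, <random>)
std::shuffle(std::begin(availableRegs), std::end(availableRegs), rng);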

x86::Gp regPointer = availableRegs[0];  // data pointer
x86::Gp regCounter = availableRegs[1];  // loop counter
x86::Gp regKey     = availableRegs[2];  // encryption key
x86::Gp regTemp    = availableRegs[3];  // scratch register

// Use them in code generation - different registers each time!
a.mov(regCounter, payloadSize);
a.lea(regPointer, x86::ptr(x86::rip, offset));
a.mov(regKey, encryptionKey);

Label loop = a.newLabel();
a.bind(loop);
a.xor_(x86::byte_ptr(regPointer), regKey.r8());  // low byte of the key register, to match the byte-sized operand
a.inc(regPointer);
a.dec(regCounter);
a.jnz(loop);

The same logical decoder (load counter, load pointer, decrypt byte, advance, loop) produces different machine code depending on which registers are selected. The mnemonics stay the same, but the encoded bytes differ because x86-64 encodes register numbers in the ModR/M byte and REX prefix (and, for some instructions, in the opcode byte itself).
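The effect of register choice on the encoded bytes can be worked out by hand. The sketch below encodes a 64-bit register-to-register mov using the MOV r/m64, r64 form (opcode 0x89). encodeMovRegReg is a hypothetical helper written for this illustration, not an asmjit function, and an assembler is free to choose the equivalent 0x8B form instead.

```cpp
#include <array>
#include <cstdint>

// Encode `mov dst, src` for two 64-bit general-purpose registers using the
// MOV r/m64, r64 form (opcode 0x89). Register IDs follow the hardware
// numbering: rax=0, rcx=1, rdx=2, rbx=3, ..., r8=8, ..., r15=15.
std::array<std::uint8_t, 3> encodeMovRegReg(unsigned dst, unsigned src) {
    std::uint8_t rex = 0x48;        // 0100WRXB with W=1 (64-bit operand size)
    if (src & 8) rex |= 0x04;       // REX.R extends the ModR/M reg field
    if (dst & 8) rex |= 0x01;       // REX.B extends the ModR/M rm field
    // mod=11 (register direct), reg=src low 3 bits, rm=dst low 3 bits
    std::uint8_t modrm =
        static_cast<std::uint8_t>(0xC0 | ((src & 7) << 3) | (dst & 7));
    return {rex, 0x89, modrm};
}
```

With this helper, mov rax, rbx (dst=0, src=3) encodes as 48 89 D8, while mov r8, r9 (dst=8, src=9) encodes as 4D 89 C8. The operation is identical, but two of the three bytes change.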

5. Memory Operands

asmjit provides a rich memory operand system through x86::Mem and the x86::ptr / x86::byte_ptr / x86::qword_ptr helpers. These are essential for building decoder stubs that read and write the encrypted payload:

C++
// Direct memory access with various addressing modes
a.mov(x86::rax, x86::qword_ptr(x86::rsi));           // [rsi]
a.xor_(x86::byte_ptr(x86::rsi, x86::rcx), 0x41);     // [rsi + rcx]
a.mov(x86::rax, x86::qword_ptr(x86::rbx, x86::rcx, 3, 16)); // [rbx + rcx*8 + 16]

// RIP-relative addressing (essential for PIC)
a.lea(x86::rax, x86::ptr(someLabel));                 // lea rax, [rip + offset] (label operands are RIP-relative in 64-bit mode)

Position-Independence Requirement

Since Shoggoth’s output must be position-independent (executable from any memory address), all data references in the decoder stub use RIP-relative addressing. In 64-bit mode, a label-based operand such as x86::ptr(label) is encoded as lea reg, [rip + offset], which works regardless of where the code is loaded. Absolute addresses are never used.
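The arithmetic behind RIP-relative addressing is short enough to sketch. At execution time RIP holds the address of the next instruction, so the stored displacement is the target offset minus the offset of the instruction's end. ripDisplacement and effectiveAddress below are hypothetical helpers for illustration only; the 7-byte length used in the usage note assumes the lea r64, [rip + disp32] form (REX.W, opcode, ModR/M, 4-byte displacement).

```cpp
#include <cstddef>
#include <cstdint>

// Displacement stored in a RIP-relative operand. RIP points at the *next*
// instruction, so the distance is measured from the instruction's end.
std::int32_t ripDisplacement(std::size_t instrOffset,
                             std::size_t instrLength,
                             std::size_t targetOffset) {
    return static_cast<std::int32_t>(
        static_cast<std::int64_t>(targetOffset) -
        static_cast<std::int64_t>(instrOffset + instrLength));
}

// Address the CPU computes at run time when the code is loaded at loadBase.
std::uint64_t effectiveAddress(std::uint64_t loadBase,
                               std::size_t instrOffset,
                               std::size_t instrLength,
                               std::int32_t disp) {
    return loadBase + instrOffset + instrLength + static_cast<std::int64_t>(disp);
}
```

For a 7-byte lea at offset 0 that targets offset 0x20, the displacement is 0x19. Loading the same bytes at base 0x1000 or at base 0x7FFF0000 yields 0x1020 and 0x7FFF0020 respectively: the target always sits 0x20 bytes past the load address, with no patching needed.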

6. Extracting Raw Machine Code

While JitRuntime is useful for testing (making generated code directly executable), Shoggoth needs to extract the raw bytes to write to an output file. The CodeHolder provides access to the generated code through its section buffer:

C++
// Set up the code container and emitter
CodeHolder code;
code.init(Environment::host());
x86::Assembler a(&code);

// ... emit instructions ...

// Access the generated code section
Section* textSection = code.textSection();
size_t codeSize = textSection->bufferSize();
const uint8_t* codeBytes = textSection->buffer().data();  // buffer() returns a CodeBuffer, not a raw pointer

// Copy to output buffer
std::vector<uint8_t> output(codeBytes, codeBytes + codeSize);

// Now 'output' contains the raw machine code bytes
// that can be written to a file or prepended to an encrypted payload

This is how Shoggoth captures the decoder stub: it generates the instructions using x86::Assembler, extracts the raw bytes from the CodeHolder, and concatenates them with the encrypted payload to form the final PIC output.

7. Why asmjit Is Perfect for Polymorphic Engines

Combining all these features, asmjit provides the ideal infrastructure for a polymorphic engine:

| Requirement | How asmjit Satisfies It |
| --- | --- |
| Dynamic instruction selection | C++ control flow chooses which instructions to emit at runtime |
| Register randomization | Registers are values, not syntax: store in variables, shuffle, and use |
| Correct encoding | Automatic REX prefix, ModR/M, SIB, displacement handling |
| Forward references | Labels resolve automatically, even with variable-size junk code between branches |
| PIC generation | RIP-relative addressing support for position-independent output |
| Raw byte extraction | CodeHolder section buffer provides direct access to machine code bytes |
| No runtime dependency | Generated code is self-contained: no asmjit library needed at execution time |

Build-Time vs Run-Time

asmjit is a dependency of Shoggoth itself (the encryptor tool, which links against it and calls it while generating a stub), not a dependency of the output. The final encrypted PIC blob contains only raw machine code: no C++ library code, no asmjit headers, no runtime. This is a critical distinction: asmjit helps generate the decoder stub, but the generated stub is pure standalone x86-64 machine code.

Knowledge Check

Q1: What is the role of CodeHolder in asmjit?

CodeHolder is the central container in asmjit. It holds the generated machine code bytes in sections (like .text), manages label-to-offset mappings for branch resolution, and handles relocations. JitRuntime (not CodeHolder) handles executable memory allocation.

Q2: Why is register parameterization valuable for a polymorphic engine?

In asmjit, registers are runtime values that can be stored in variables. By randomly selecting which registers serve as the loop counter, data pointer, key register, etc., the same logical decoder produces different byte sequences (since register numbers are encoded into the instruction bytes). This eliminates the register-specific byte patterns that signatures rely on.

Q3: Why does Shoggoth’s decoder stub use RIP-relative addressing?

Shoggoth generates position-independent code (PIC) that can be loaded and executed at any memory address. RIP-relative addressing (lea reg, [rip + offset]) calculates data addresses relative to the current instruction pointer, so the code works regardless of its absolute position in memory. Absolute addresses would break if the code is loaded at a different base address.