Module 3: asmjit — Runtime Code Generation
Understanding the JIT assembly library that powers Shoggoth’s dynamic instruction emission and why it is ideal for polymorphic stub generation.
Module Objective
Learn what asmjit is, how its core classes (CodeHolder, x86::Assembler, JitRuntime) work, understand the difference between the Assembler and Builder/Compiler interfaces, and see how Shoggoth leverages asmjit to emit randomized x86-64 machine code at runtime.
1. What Is asmjit?
asmjit is an open-source C++ library for runtime machine code generation. It allows programs to construct x86 and x86-64 assembly instructions programmatically and emit them as executable machine code in memory. Unlike a traditional assembler (NASM, MASM) that runs at build time, asmjit operates at runtime — your C++ program decides what instructions to generate while it is executing.
This runtime generation capability is exactly what a polymorphic engine needs. Instead of selecting from pre-built decoder stubs (which would be easy to signature), the engine uses asmjit to construct a fresh decoder every time, choosing different registers, instructions, and layouts based on random decisions.
| Feature | Traditional Assembler (NASM) | asmjit |
|---|---|---|
| When code is generated | Build time (compile/link) | Runtime (during program execution) |
| Output | Object files (.o/.obj) | Machine code in memory or byte buffer |
| Dynamic decisions | No (macro-level only) | Yes (any C++ logic can drive instruction selection) |
| Register selection | Hardcoded by the programmer | Can be parameterized — choose registers at runtime |
| Use case | Static code, OS kernels, drivers | JIT compilers, dynamic code gen, polymorphic engines |
2. Core Architecture
asmjit is organized around a few key classes that form a pipeline: you create a code container, attach an emitter to it, emit instructions, and then either extract the raw bytes or make the code executable.
Figure: asmjit code generation pipeline. Code container and sections → instruction emitter → raw bytes in CodeHolder → execute or extract bytes.
2.1 CodeHolder
The CodeHolder is the central container that stores generated machine code, manages code sections (like .text), handles relocations, and tracks labels. You initialize it with an Environment that specifies the target architecture:
#include <asmjit/asmjit.h>
using namespace asmjit;
// Create a CodeHolder targeting x86-64
CodeHolder code;
Environment env = Environment::host(); // or explicitly: Arch::kX64
code.init(env);
2.2 x86::Assembler
The x86::Assembler is the low-level instruction emitter. You attach it to a CodeHolder and call methods corresponding to x86 instructions. Each method call appends the encoded instruction bytes to the CodeHolder’s internal buffer:
x86::Assembler a(&code);
// Emit instructions - these become raw machine code bytes
a.push(x86::rbp);
a.mov(x86::rbp, x86::rsp);
a.xor_(x86::rax, x86::rax); // Note: xor_ because 'xor' is a C++ keyword
a.mov(x86::rax, 42);
a.pop(x86::rbp);
a.ret();
Note on Naming Conventions
asmjit appends an underscore to instruction names that conflict with C++ keywords: xor_, and_, or_, not_. The rest use their standard mnemonics: mov, add, sub, push, pop, rol, ror, inc, dec, jmp, je, jne, etc.
2.3 JitRuntime
JitRuntime allocates executable memory (on Windows through VirtualAlloc, typically with PAGE_EXECUTE_READWRITE permissions), copies the generated machine code into it, and returns a function pointer you can call directly:
JitRuntime rt;
// Allocate executable memory and copy code
typedef int (*Func)();
Func fn;
Error err = rt.add(&fn, &code);
if (err) { /* handle error */ }
// Call the generated code!
int result = fn(); // result == 42
// Release when done
rt.release(fn);
3. Labels and Branches
Loops and conditional branches are essential for decoder stubs. asmjit provides a Label type for managing branch targets. Labels can be forward-referenced — you can jump to a label before binding it, and asmjit will patch the offset when the label is bound later:
x86::Assembler a(&code);
Label loopStart = a.newLabel();
Label loopEnd = a.newLabel();
Label payload = a.newLabel(); // forward reference: bound after the stub
// Setup: rcx = count, rsi = data pointer
a.mov(x86::rcx, payloadSize);
a.lea(x86::rsi, x86::ptr(payload)); // a label base is encoded RIP-relative on x86-64 (PIC)
a.bind(loopStart); // loopStart:
a.xor_(x86::byte_ptr(x86::rsi), 0xAB); // XOR decrypt byte
a.inc(x86::rsi); // advance pointer
a.dec(x86::rcx); // decrement counter
a.jnz(loopStart); // loop if not zero
a.bind(loopEnd); // loopEnd:
// ... transfer control to decrypted payload
a.bind(payload); // payload: the encrypted bytes start here
This label mechanism is critical for Shoggoth. The decoder stub contains a decryption loop whose body varies between generations, but the loop structure (branch back to start, exit when done) is managed through labels. asmjit computes the correct relative offsets automatically, even as junk instructions change the loop body size.
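To make the offset computation concrete, here is a minimal sketch of what binding a label and patching a backward branch amounts to. It is hand-rolled for illustration and does not use asmjit; `emitLoop` is a hypothetical helper, and plain NOP bytes stand in for a loop body of varying size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an asmjit API): emits `bodyNops` NOP bytes,
// then `dec rcx` and a short `jnz` back to the start of the loop.
std::vector<uint8_t> emitLoop(std::size_t bodyNops) {
    std::vector<uint8_t> buf;
    const std::size_t loopStart = buf.size();      // "bind" the label here
    buf.insert(buf.end(), bodyNops, 0x90);         // variable-size loop body
    buf.insert(buf.end(), {0x48, 0xFF, 0xC9});     // dec rcx
    buf.push_back(0x75);                           // jnz rel8 opcode
    // rel8 is measured from the end of the jump instruction, which is
    // one byte past the displacement we are about to write.
    const std::ptrdiff_t rel = static_cast<std::ptrdiff_t>(loopStart) -
                               static_cast<std::ptrdiff_t>(buf.size() + 1);
    assert(rel >= -128 && rel <= 127);             // larger bodies need rel32
    buf.push_back(static_cast<uint8_t>(static_cast<int8_t>(rel)));
    return buf;
}
```

Changing `bodyNops` changes the displacement byte but not the structure of the loop, which is the bookkeeping a label-aware assembler does for you.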
4. Register Parameterization
One of the most powerful features for polymorphic generation is that asmjit register operands are values, not syntax. You can store registers in variables and use them interchangeably:
// Register pool for randomization
x86::Gp availableRegs[] = {
x86::rax, x86::rbx, x86::rcx, x86::rdx,
x86::rsi, x86::rdi, x86::r8, x86::r9,
x86::r10, x86::r11, x86::r12, x86::r13,
x86::r14, x86::r15
// RSP excluded - must not be clobbered
};
// Randomly select registers for decoder roles
std::shuffle(availableRegs, availableRegs + 14, rng);
x86::Gp regPointer = availableRegs[0]; // data pointer
x86::Gp regCounter = availableRegs[1]; // loop counter
x86::Gp regKey = availableRegs[2]; // encryption key
x86::Gp regTemp = availableRegs[3]; // scratch register
// Use them in code generation - different registers each time!
a.mov(regCounter, payloadSize);
a.lea(regPointer, x86::ptr(x86::rip, offset));
a.mov(regKey, encryptionKey);
Label loop = a.newLabel();
a.bind(loop);
a.xor_(x86::byte_ptr(regPointer), regKey.r8()); // r8() selects the low byte so the operand sizes match
a.inc(regPointer);
a.dec(regCounter);
a.jnz(loop);
The same logical decoder (load counter, load pointer, decrypt byte, advance, loop) produces different machine code depending on which registers are selected. The encoded bytes differ because x86-64 places register numbers in the ModR/M byte and the REX prefix, and in the opcode byte itself for short forms such as push reg and mov reg, imm.
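The effect of register choice on the encoding can be seen in a small hand-written encoder for a single instruction form. This is an illustrative sketch only: `encodeXor64` is a hypothetical helper, not part of asmjit, and it covers just the register-to-register `xor` with 64-bit operands.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an asmjit API): encodes `xor dst, src` for two
// 64-bit general-purpose registers given their hardware numbers
// (rax=0, rcx=1, rdx=2, rbx=3, ..., r8=8, ..., r15=15).
std::vector<uint8_t> encodeXor64(unsigned dst, unsigned src) {
    assert(dst < 16 && src < 16);
    // REX prefix layout is 0100WRXB. W=1 selects 64-bit operands; R and B
    // carry bit 3 of the ModR/M reg and rm fields respectively.
    uint8_t rex = 0x48 | ((src >> 3) << 2) | (dst >> 3);
    // Opcode 0x31 is XOR r/m64, r64: rm holds dst, reg holds src.
    uint8_t modrm = 0xC0 | ((src & 7) << 3) | (dst & 7);
    return {rex, 0x31, modrm};
}
```

For example, `encodeXor64(0, 0)` (xor rax, rax) and `encodeXor64(8, 8)` (xor r8, r8) share the opcode byte 0x31 and the same ModR/M byte but differ in the REX prefix, while `encodeXor64(3, 1)` (xor rbx, rcx) differs in the ModR/M byte.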
5. Memory Operands
asmjit provides a rich memory operand system through x86::Mem and the x86::ptr / x86::byte_ptr / x86::qword_ptr helpers. These are essential for building decoder stubs that read and write the encrypted payload:
// Direct memory access with various addressing modes
a.mov(x86::rax, x86::qword_ptr(x86::rsi)); // [rsi]
a.xor_(x86::byte_ptr(x86::rsi, x86::rcx), 0x41); // [rsi + rcx]
a.mov(x86::rax, x86::qword_ptr(x86::rbx, x86::rcx, 3, 16)); // [rbx + rcx*8 + 16]
// RIP-relative addressing (essential for PIC)
a.lea(x86::rax, x86::ptr(someLabel)); // lea rax, [rip + offset]; a label base is encoded RIP-relative on x86-64
Position-Independence Requirement
Since Shoggoth’s output must be position-independent (executable from any memory address), all data references in the decoder stub use RIP-relative addressing. The x86::ptr(label) pattern generates lea reg, [rip + offset] instructions that work regardless of where the code is loaded. Absolute addresses are never used.
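As a rough illustration of why this works at any load address, the sketch below encodes `lea reg, [rip + disp32]` by hand. It assumes both the instruction and its target live in the same buffer and are identified by buffer offsets; `encodeLeaRip` is a hypothetical helper, not an asmjit function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an asmjit API): encodes `lea reg, [rip + disp32]`
// for an instruction at `instrOffset` that should produce the address of
// `targetOffset`. Both offsets are relative to the start of the same buffer.
std::vector<uint8_t> encodeLeaRip(unsigned reg, int64_t instrOffset, int64_t targetOffset) {
    assert(reg < 16);
    const int64_t instrLen = 7;  // REX + opcode + ModR/M + 4-byte displacement
    // RIP points at the next instruction when the operand is evaluated,
    // so the displacement is measured from the end of this instruction.
    const int64_t disp = targetOffset - (instrOffset + instrLen);
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    const uint32_t d = static_cast<uint32_t>(static_cast<int32_t>(disp));
    uint8_t rex = 0x48 | ((reg >> 3) << 2);    // REX.W, plus REX.R for r8..r15
    uint8_t modrm = ((reg & 7) << 3) | 0x05;   // mod=00, rm=101 selects RIP-relative
    return {rex, 0x8D, modrm,
            static_cast<uint8_t>(d), static_cast<uint8_t>(d >> 8),
            static_cast<uint8_t>(d >> 16), static_cast<uint8_t>(d >> 24)};
}
```

The displacement depends only on the distance between the instruction and its target, never on the absolute address of either, so the bytes stay valid wherever the buffer is copied.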
6. Extracting Raw Machine Code
While JitRuntime is useful for testing (making generated code directly executable), Shoggoth needs to extract the raw bytes to write to an output file. The CodeHolder provides access to the generated code through its section buffer:
CodeHolder code;
code.init(Environment::host());
x86::Assembler a(&code);
// ... emit instructions ...
// After all instructions are emitted, access the generated code section
Section* textSection = code.textSection();
size_t codeSize = static_cast<size_t>(textSection->bufferSize());
const uint8_t* codeBytes = textSection->data(); // buffer() returns a CodeBuffer; data() returns the raw bytes
// Copy to output buffer
std::vector<uint8_t> output(codeBytes, codeBytes + codeSize);
// Now 'output' contains the raw machine code bytes
// that can be written to a file or prepended to an encrypted payload
This is how Shoggoth captures the decoder stub: it generates the instructions using x86::Assembler, extracts the raw bytes from the CodeHolder, and concatenates them with the encrypted payload to form the final PIC output.
7. Why asmjit Is Perfect for Polymorphic Engines
Taken together, these features make asmjit well suited to a polymorphic engine:
| Requirement | How asmjit Satisfies It |
|---|---|
| Dynamic instruction selection | C++ control flow chooses which instructions to emit at runtime |
| Register randomization | Registers are values, not syntax — store in variables, shuffle, and use |
| Correct encoding | Automatic REX prefix, ModR/M, SIB, displacement handling |
| Forward references | Labels resolve automatically, even with variable-size junk code between branches |
| PIC generation | RIP-relative addressing support for position-independent output |
| Raw byte extraction | CodeHolder section buffer provides direct access to machine code bytes |
| No runtime dependency | Generated code is self-contained — no asmjit library needed at execution time |
Build-Time vs Run-Time
asmjit is a dependency of Shoggoth itself (the encryptor tool links it in and calls it while generating output), not a dependency of the output. The final encrypted PIC blob contains only raw machine code: no C++ library code, no asmjit headers, no runtime. This is a critical distinction: asmjit helps generate the decoder stub, but the generated stub is pure standalone x86-64 machine code.
Knowledge Check
Q1: What is the role of CodeHolder in asmjit?
CodeHolder is the central container in asmjit. It holds the generated machine code bytes in sections (like .text), manages label-to-offset mappings for branch resolution, and handles relocations. JitRuntime (not CodeHolder) handles executable memory allocation.
Q2: Why is register parameterization valuable for a polymorphic engine?
Because asmjit registers are values, the engine can shuffle a register pool and assign decoder roles at runtime, so the same logical decoder is encoded as different machine code bytes on each generation.
Q3: Why does Shoggoth’s decoder stub use RIP-relative addressing?
The output must be position-independent. RIP-relative addressing (lea reg, [rip + offset]) calculates data addresses relative to the current instruction pointer, so the code works regardless of its absolute position in memory. Absolute addresses would break if the code is loaded at a different base address.