Difficulty: Beginner

Module 1: The Reflective Loader Problem

Why loading DLLs from memory is both essential and dangerously detectable.

Module Objective

This module builds the foundational understanding you need before diving into Crystal Palace and its ecosystem. You will learn why reflective loading exists, how the classic technique works, what detection vectors it creates, and where Crystal-Loaders fits in the evolution of Cobalt Strike's loader architecture. By the end, you will understand the exact problems Crystal-Loaders was designed to solve.

1. Why Reflective Loading Exists

Programs need to load DLLs into memory to use their exported functions. The standard Windows mechanism is LoadLibrary, provided by kernel32.dll. It works perfectly for legitimate software, but it creates a trail of telemetry that is fatal for offensive operations:

What LoadLibrary Does (Behind the Scenes)

Step	Action	Telemetry Created
1	Opens the DLL file on disk	Filesystem minifilter callbacks, ETW file I/O events
2	Creates a section object (SEC_IMAGE)	Kernel section object creation event
3	Maps the section into the process	`PsSetLoadImageNotifyRoutine` callback fires
4	Registers the module in the PEB	Module appears in `InMemoryOrderModuleList`
5	Resolves imports and processes relocations	Additional DLL loads may cascade
6	Calls `DllMain(DLL_PROCESS_ATTACH)`	Entry point execution is observable

Every one of these steps is visible to EDR products. The PsSetLoadImageNotifyRoutine kernel callback alone gives security products the image name, base address, and size of every DLL loaded into every process on the system. Loading a malicious DLL (like Cobalt Strike Beacon) through LoadLibrary is equivalent to announcing your presence to every defensive tool on the endpoint.

The Core Problem

Attackers need to load complex payloads (fully featured implants with many dependencies) into a target process without touching disk and without triggering image load callbacks. The payload exists only in memory, received over a network connection or embedded in a stager. This is the problem reflective loading solves.

In 2008, Stephen Fewer published Reflective DLL Injection, a technique that loads a DLL entirely from a memory buffer by reimplementing the Windows loader in user mode. This technique became the foundation for virtually every offensive DLL loader that followed, including Cobalt Strike's default loader, Metasploit's Meterpreter, and dozens of open-source projects.

2. How Traditional Reflective Loading Works

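Stephen Fewer's original technique works by embedding a small ReflectiveLoader function inside the DLL itself. In the original implementation, the injector locates the exported ReflectiveLoader function inside the raw DLL bytes and starts a thread at it. Frameworks such as Metasploit and Cobalt Strike instead patch a bootstrap shellcode stub into the DOS header, so that execution can begin at the first byte of the buffer and transfer control to ReflectiveLoader. Either way, the function then performs the same steps the Windows loader would, but entirely in user mode from a memory buffer.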

Classic ReflectiveLoader Flow

Bootstrap stub in DOS header → Find own image base → PEB walk, resolve APIs → VirtualAlloc RWX region → Copy PE sections → Fix relocs & imports → Call DllMain

Each step of the ReflectiveLoader deserves a closer look:

ReflectiveLoader Steps in Detail

#	Step	What Happens
1	Bootstrap shellcode stub	A small piece of position-independent assembly placed in the PE's DOS header stub. It calculates the address of the ReflectiveLoader function and jumps to it.
2	Find own image base	ReflectiveLoader walks backward in memory from its own address to find the MZ (0x4D5A) magic bytes marking the start of the PE.
3	Resolve APIs via PEB	Walks the PEB's `InMemoryOrderModuleList` to find `kernel32.dll`, then parses its export table to resolve `LoadLibraryA`, `GetProcAddress`, and `VirtualAlloc`.
4	Allocate RWX memory	Calls `VirtualAlloc` with `PAGE_EXECUTE_READWRITE` for a region of `SizeOfImage` bytes.
5	Copy PE sections	Iterates the section table and copies each section (`.text`, `.rdata`, `.data`, etc.) to its correct virtual address offset within the allocated region.
6	Process base relocations	Since the DLL is loaded at an arbitrary base address (not its preferred base), all absolute addresses in the code must be adjusted using the .reloc section's fixup entries.
7	Resolve Import Address Table	Walks the import directory, loads each dependency with `LoadLibraryA`, and resolves each imported function with `GetProcAddress`, writing the addresses into the IAT.
8	Call DllMain	Invokes the entry point with `DLL_PROCESS_ATTACH`, initializing the payload (e.g., starting Beacon's main loop).

Here is conceptual pseudocode showing what ReflectiveLoader does internally:

C (Pseudocode)
// Conceptual ReflectiveLoader pseudocode (simplified from Stephen Fewer's work)
DWORD ReflectiveLoader(VOID)
{
    // Step 2 (step 1, the bootstrap stub, has already run):
    // find our own image base by scanning backward for the MZ header
    ULONG_PTR imageBase = FindImageBase(&ReflectiveLoader);

    // Step 3a: Parse PEB to find kernel32.dll
    ULONG_PTR peb      = __readgsqword(0x60);    // x64: GS:[0x60]
    ULONG_PTR ldr      = *(ULONG_PTR*)(peb + 0x18);
    ULONG_PTR modList   = *(ULONG_PTR*)(ldr + 0x20); // InMemoryOrderModuleList

    // Walk module list, hash each name, find kernel32.dll
    ULONG_PTR k32Base  = FindModuleByHash(modList, KERNEL32_HASH);

    // Step 3b: Resolve needed APIs from kernel32's export table
    fnLoadLibraryA   pLoadLibraryA   = GetExport(k32Base, LOADLIBRARYA_HASH);
    fnGetProcAddress pGetProcAddress = GetExport(k32Base, GETPROCADDRESS_HASH);
    fnVirtualAlloc   pVirtualAlloc   = GetExport(k32Base, VIRTUALALLOC_HASH);

    // Step 4: Allocate memory for the full image
    PIMAGE_DOS_HEADER dosHdr = (PIMAGE_DOS_HEADER)imageBase;
    PIMAGE_NT_HEADERS ntHdrs = (PIMAGE_NT_HEADERS)(imageBase + dosHdr->e_lfanew);
    LPVOID newBase = pVirtualAlloc(
        NULL,
        ntHdrs->OptionalHeader.SizeOfImage,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE    // <-- RWX: Detection vector #3
    );

    // Step 5: Copy headers + each section to correct VA
    memcpy(newBase, imageBase, ntHdrs->OptionalHeader.SizeOfHeaders);
    for (each section in sectionTable)
        memcpy(newBase + section.VirtualAddress,
               imageBase + section.PointerToRawData,
               section.SizeOfRawData);

    // Step 6: Process base relocations (delta = newBase - preferredBase)
    ApplyRelocations(newBase, ntHdrs);

    // Step 7: Resolve IAT - load dependencies + resolve functions
    ResolveImports(newBase, ntHdrs, pLoadLibraryA, pGetProcAddress);

    // Step 8: Call DllMain
    fnDllMain entryPoint = newBase + ntHdrs->OptionalHeader.AddressOfEntryPoint;
    entryPoint((HINSTANCE)newBase, DLL_PROCESS_ATTACH, NULL);

    return 0;
}
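
The relocation step (step 6) is mostly arithmetic, and it is easier to see on a toy buffer than in prose. The following is a minimal Python sketch, not Fewer's code: it assumes a 64-bit image held in a bytearray, handles only IMAGE_REL_BASED_DIR64 entries, and uses made-up base addresses. The function name apply_relocations and the toy image layout are illustrative.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, no fixup applied
IMAGE_REL_BASED_DIR64 = 10     # 64-bit absolute address fixup


def apply_relocations(image: bytearray, reloc_rva: int, reloc_size: int,
                      preferred_base: int, actual_base: int) -> int:
    """Apply DIR64 fixups in place; return the number of patched addresses."""
    delta = actual_base - preferred_base
    patched = 0
    offset = reloc_rva
    end = reloc_rva + reloc_size
    while offset + 8 <= end:
        # Each block starts with: page RVA (DWORD), block size (DWORD).
        page_rva, block_size = struct.unpack_from("<II", image, offset)
        if block_size < 8:
            break  # malformed block; stop rather than loop forever
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", image, offset + 8 + 2 * i)
            kind, page_offset = entry >> 12, entry & 0x0FFF
            if kind == IMAGE_REL_BASED_DIR64:
                target = page_rva + page_offset
                (value,) = struct.unpack_from("<Q", image, target)
                struct.pack_into("<Q", image, target,
                                 (value + delta) & 0xFFFFFFFFFFFFFFFF)
                patched += 1
        offset += block_size
    return patched


# Toy image (hypothetical layout): one pointer at RVA 0x10,
# one 12-byte relocation block at RVA 0x20.
image = bytearray(0x40)
struct.pack_into("<Q", image, 0x10, 0x180001234)        # built for preferred base
struct.pack_into("<IIHH", image, 0x20, 0x0, 12,
                 (IMAGE_REL_BASED_DIR64 << 12) | 0x10,   # fix up offset 0x10
                 IMAGE_REL_BASED_ABSOLUTE)               # padding entry
count = apply_relocations(image, 0x20, 12, 0x180000000, 0x7FF600000000)
fixed = struct.unpack_from("<Q", image, 0x10)[0]
print(count, hex(fixed))
```

The pointer keeps its offset within the image (0x1234) and only the base portion changes, which is exactly what "delta = newBase - preferredBase" means in the pseudocode.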

3. Five Detection Vectors

While reflective loading avoids the telemetry generated by LoadLibrary, it introduces five new detection surfaces that modern security tools actively scan for. Understanding these is critical because they are the exact problems Crystal-Loaders was built to eliminate.

#	Detection Vector	What Scanners Look For	Tools That Detect It
1	Private-commit executable memory	Executable regions of type MEM_PRIVATE rather than MEM_IMAGE	Moneta, pe-sieve, MalMemDetect
2	MZ/PE headers in private memory	0x4D5A magic bytes at the start of private allocations	YARA rules, pe-sieve, BeaconHunter
3	RWX memory	Pages with PAGE_EXECUTE_READWRITE protection	Moneta, MalMemDetect, ETW-based monitors
4	Return address in unbacked memory	Call stack frames pointing outside any loaded module	pe-sieve (thread scan), EDR kernel callbacks
5	IAT / import artifacts	Resolved function pointers and import metadata patterns	pe-sieve (IAT scan), manual analysis

Vector 1: Private-Commit Executable Memory

When the Windows loader maps a DLL via LoadLibrary, it creates a section object of type SEC_IMAGE. The resulting memory pages are tagged as MEM_IMAGE in the VAD tree. Memory scanners rely on the fact that nearly all legitimate executable code resides in image-backed regions.

A reflectively loaded DLL is allocated with VirtualAlloc, which creates MEM_PRIVATE pages. When those pages are later marked executable (or allocated as RWX from the start), the combination of MEM_PRIVATE + PAGE_EXECUTE* is an immediate detection signal. Legitimate processes almost never have executable private memory outside of JIT compilers (like .NET CLR or JavaScript V8).

Memory Commit Type Comparison

Aspect	LoadLibrary (Legitimate)	Reflective Load (Detectable)
Allocation	CreateSection(SEC_IMAGE)	VirtualAlloc(MEM_COMMIT)
Region type	MapViewOfSection → MEM_IMAGE	Single allocation → MEM_PRIVATE
Page protections	.text: PAGE_EXECUTE_READ (RX); .rdata: PAGE_READONLY (R); .data: PAGE_READWRITE (RW)	All sections: PAGE_EXECUTE_READWRITE (RWX)

Reflective load: no file backing, no SEC_IMAGE, all in one RWX blob.
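
To make the scanner's view concrete, here is a minimal Python sketch of the check described above. It assumes you already have region records (base, size, type, protection) of the kind VirtualQueryEx reports. The Region class and classify function are hypothetical names, the constant values are the documented Windows ones, and real tools such as Moneta apply many more rules than this.

```python
from dataclasses import dataclass

# Memory type values as reported in MEMORY_BASIC_INFORMATION.Type
MEM_PRIVATE = 0x20000
MEM_MAPPED = 0x40000
MEM_IMAGE = 0x1000000

# Page protection values that include execute access
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
EXECUTABLE_MASK = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                   PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


@dataclass
class Region:
    base: int
    size: int
    mem_type: int
    protect: int


def classify(region: Region) -> list:
    """Return the names of the suspicious traits this region exhibits."""
    findings = []
    if region.protect & EXECUTABLE_MASK and region.mem_type == MEM_PRIVATE:
        findings.append("private-executable")   # Vector 1
    if region.protect & PAGE_EXECUTE_READWRITE:
        findings.append("rwx")                  # Vector 3
    return findings


# Hypothetical regions: a normally mapped .text section, a classic
# reflective load, and a loader that flipped RW to RX after writing.
mapped_text = Region(0x7FFA00001000, 0x1000, MEM_IMAGE, PAGE_EXECUTE_READ)
classic_load = Region(0x1F0000, 0x3000, MEM_PRIVATE, PAGE_EXECUTE_READWRITE)
rw_then_rx = Region(0x2A0000, 0x3000, MEM_PRIVATE, PAGE_EXECUTE_READ)
for r in (mapped_text, classic_load, rw_then_rx):
    print(hex(r.base), classify(r))
```

Note that the third region avoids the RWX finding but still trips the private-executable one, which is why fixing page permissions alone does not remove Vector 1.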

Vector 2: MZ/PE Headers in Private Memory

The first two bytes of every PE file are 0x4D 0x5A ("MZ"). After reflective loading, the PE headers are copied to the start of the allocated region. A trivial YARA rule can find them:

YARA
rule reflective_dll_in_memory {
    meta:
        description = "Detects PE header in private executable memory"
    condition:
        uint16(0) == 0x5A4D and         // MZ magic at start of region
        uint32(uint32(0x3C)) == 0x4550   // PE signature at e_lfanew offset
}

Some loaders attempt to erase the headers after loading by zeroing out the first page. However, partial artifacts often remain: the Rich header, section names like .text and .rdata, or the optional header's magic value. More advanced scanners look for these secondary indicators even when the MZ bytes are erased.
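
The same two checks can be sketched in Python over a dumped memory region. This is an illustrative sketch, not how pe-sieve or any specific scanner is implemented: has_pe_header mirrors the YARA condition, and secondary_indicators looks for null-padded section names as one example of the leftover artifacts mentioned above. The toy region layout is made up.

```python
import struct

SECTION_NAMES = (b".text", b".rdata", b".data", b".reloc")


def has_pe_header(region) -> bool:
    """MZ magic at offset 0 and 'PE\\0\\0' at the offset stored in e_lfanew."""
    if len(region) < 0x40 or region[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", region, 0x3C)
    if e_lfanew + 4 > len(region):
        return False
    return region[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def secondary_indicators(region) -> list:
    """Section names are stored as 8-byte, null-padded fields."""
    return [name.decode() for name in SECTION_NAMES
            if name.ljust(8, b"\x00") in region]


# Toy region: minimal header fields plus one section-table name.
region = bytearray(0x200)
region[0:2] = b"MZ"
struct.pack_into("<I", region, 0x3C, 0x80)      # e_lfanew
region[0x80:0x84] = b"PE\x00\x00"
region[0x188:0x190] = b".text\x00\x00\x00"

# Simulate a loader that wipes only the two signatures.
wiped = bytearray(region)
wiped[0:2] = b"\x00\x00"
wiped[0x80:0x84] = b"\x00" * 4

print(has_pe_header(region), has_pe_header(wiped), secondary_indicators(wiped))
```

The wiped copy no longer matches the header check, but the section name still gives the image away.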

Vector 3: RWX Memory

The classic ReflectiveLoader allocates the entire image as PAGE_EXECUTE_READWRITE. This single permission flag is the most straightforward detection signal. Legitimate compiled code is mapped with granular permissions: .text is RX, .rdata is R, .data is RW. The only common legitimate source of RWX memory is JIT compilation (CLR, V8), and those regions have specific, identifiable patterns.

Why Not Just Use RW Then VirtualProtect to RX?

Better loaders do exactly this: allocate as RW, write sections, then change to RX. But this still leaves you with MEM_PRIVATE + RX, which is Vector 1. And although the region is never RWX, the RW-to-RX protection change on private memory is itself an event that real-time monitoring can catch. Crystal-Loaders takes a different route: it produces PIC output that does not need to be mapped as a PE image at all.

Vector 4: Return Address / Call Stack Analysis

When a reflectively loaded DLL calls a Windows API, the thread's call stack contains return addresses that point into the privately-allocated memory region. EDR kernel callbacks (such as those registered via ObRegisterCallbacks) run in the context of the thread that triggered them, so the driver can walk that thread's stack using RtlWalkFrameChain and check whether each return address falls within a known, file-backed module.

If a return address points to MEM_PRIVATE memory with no file object in the VAD tree, the EDR knows code is executing from a dynamic allocation. Tools like pe-sieve perform this check in user mode by scanning threads and resolving their stack frames against loaded module ranges.
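
The core of that check is a range lookup. Below is a minimal Python sketch, assuming you already have the captured return addresses and a list of loaded modules as (base, size, name) tuples. The function name and all addresses are hypothetical, and real tools also have to unwind the stack correctly before they reach this step.

```python
from bisect import bisect_right


def unbacked_frames(return_addresses, modules):
    """Return the addresses that fall outside every (base, size, name) module."""
    ordered = sorted(modules)
    bases = [base for base, _, _ in ordered]
    flagged = []
    for addr in return_addresses:
        i = bisect_right(bases, addr) - 1     # last module starting at or below addr
        if i >= 0:
            base, size, _ = ordered[i]
            if addr < base + size:
                continue                      # inside a known module
        flagged.append(addr)
    return flagged


# Made-up module ranges and a made-up captured stack.
modules = [
    (0x7FFA10000000, 0x1F0000, "ntdll.dll"),
    (0x7FFA0F000000, 0x0C0000, "kernel32.dll"),
]
stack = [0x7FFA10012345, 0x7FFA0F001000, 0x1F0A2B3C]
suspicious = unbacked_frames(stack, modules)
print([hex(a) for a in suspicious])
```

Only the third frame is flagged: it sits in a low address range that no loaded module covers, which is what a frame inside a VirtualAlloc region looks like to this check.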

Vector 5: IAT Artifacts

After the ReflectiveLoader resolves the Import Address Table, the loaded DLL contains a fully populated IAT: an array of function pointers to APIs in ntdll.dll, kernel32.dll, and other system DLLs. This structure is recognizable. Even when the PE headers are erased, the pattern of pointers (consecutive addresses into the same DLL's export range) can be identified by heuristic scans.

Additionally, the import directory entries themselves (the IMAGE_IMPORT_DESCRIPTOR array) contain RVAs to DLL name strings and function name strings. These strings persist in memory after loading and provide clear evidence of a mapped PE image.
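
A pointer-run heuristic of the kind described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than pe-sieve's actual IAT scan: it treats the buffer as an array of aligned 64-bit values and reports the longest run of consecutive values that land inside one module's address range. The function name and addresses are made up.

```python
import struct


def longest_pointer_run(buf: bytes, module_base: int, module_size: int) -> int:
    """Length of the longest run of consecutive 8-byte values inside the module."""
    best = current = 0
    for offset in range(0, len(buf) - 7, 8):
        (value,) = struct.unpack_from("<Q", buf, offset)
        if module_base <= value < module_base + module_size:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Hypothetical kernel32 range, with four resolved pointers followed by the
# null entry that terminates each DLL's slice of the IAT.
k32_base, k32_size = 0x7FFA0F000000, 0xC0000
iat_like = struct.pack("<5Q", k32_base + 0x1A2B0, k32_base + 0x2C4D0,
                       k32_base + 0x1F00, k32_base + 0x8810, 0)
noise = struct.pack("<2Q", 0x1122334455667788, 0x10)
run = longest_pointer_run(noise + iat_like, k32_base, k32_size)
print(run)
```

A scanner would pick a threshold for the run length and repeat the check per loaded module. Ordinary data rarely contains several adjacent pointers into the same system DLL.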

4. Cobalt Strike's Loader Evolution

Cobalt Strike has progressively replaced and extended its loader architecture over many years. Understanding this timeline shows you why Crystal Palace exists and what gap it fills.

Cobalt Strike Loader Timeline

Default Reflective Loader — Stephen Fewer's technique, shipped with Cobalt Strike from the beginning. Functional but highly signatured.

Artifact Kit — Customizable stager templates. Changed how the payload is packaged and executed, but the reflective loader itself remained the same.

Sleep Mask Kit (CS 4.4) — Encrypt Beacon's memory during sleep. Addresses the problem of static signatures in idle Beacon memory, but does not change the loading process.

User-Defined Reflective Loader (CS 4.4) — The pivotal change. Operators can now replace the entire reflective loader with their own implementation. This opened the door to loaders like AceLdr, BokuLoader, and TitanLdr.

BeaconGate (CS 4.10) — Hook specific Win32 API calls made by Beacon. Allows operators to intercept and modify API calls (e.g., adding indirect syscalls) without modifying Beacon itself.

Crystal Palace (Tradecraft Garden, June 2025) — A standalone spec-driven PIC linker released as part of the Tradecraft Garden project. Produces position-independent code output with no PE headers, no MZ signature, and no traditional reflective loading. This is what Crystal-Loaders is built with.

Notable Community UDRLs

The UDRL interface (CS 4.4) enabled the security research community to build sophisticated loaders that address specific detection vectors:

Key UDRLs in the Ecosystem

UDRL	Author	Key Innovation
AceLdr	Kyle Avery	RC4 sleep encryption, private heap isolation, FOLIAGE-based APC sleep masking, return address spoofing. Evades Moneta, pe-sieve, BeaconEye, and Hunt-Sleeping-Beacons.
BokuLoader	Bobby Cooke	Uses DLL module stomping via LoadLibraryExA with DONT_RESOLVE_DLL_REFERENCES to load a sacrificial DLL from disk (mapped as MEM_IMAGE by the OS), then writes Beacon over that region. Also uses indirect NT syscalls (HellsGate/HalosGate) and reflective call stack spoofing.
TitanLdr	Austin Hudson (SecIdiot)	An early community UDRL that pioneered DNS-over-HTTPS (DoH) integration for Cobalt Strike DNS Beacons. Its primary innovation is IAT hooking: it hooks DnsQuery_A in Beacon's IAT to route DNS queries through HTTPS.

Each of these UDRLs addresses some of the five detection vectors, but they all still perform the fundamental reflective loading operation: take a PE DLL, parse its headers, map its sections, fix relocations, and resolve imports. The PE structure remains in memory in some form. Crystal Palace takes a fundamentally different approach.

5. What Crystal-Loaders Brings to the Table

Crystal-Loaders (a collection of PIC loaders built with Crystal Palace, designed for Cobalt Strike) does not patch or improve the reflective loading process. It eliminates it entirely. Instead of injecting a DLL and loading it in-memory, Crystal-Loaders uses a spec-driven build system to produce raw position-independent code at compile time.

The Six Key Innovations

#	Innovation	What It Solves
1	Spec-driven build system	Declarative, reproducible, composable builds. The loader behavior is defined by a specification file, not ad-hoc C code. Different specs produce different loaders without rewriting code.
2	PIC output	The final output is raw shellcode: no PE headers, no MZ signature, no section table. There is nothing for PE scanners to find. This eliminates detection vector 2 and reduces vector 1 (the memory is still private, but it no longer looks like a mapped image).
3	Modular libraries (LibTCG, LibGate)	LibTCG handles PE loading logic. LibGate handles indirect syscalls. They are independent and swappable. You can replace the syscall mechanism without touching the loader logic, or vice versa.
4	Dynamic Function Resolution (DFR)	Crystal Palace rewrites all API calls at link time. No static imports, no IAT, no import directory. Every API call is resolved dynamically at runtime through PEB walking and export table parsing. This eliminates detection vector 5.
5	XOR-encrypted embedded payload	The Beacon DLL is XOR-encrypted and embedded inside the PIC blob. It is decrypted at runtime just before loading. Static scanning of the PIC blob reveals no PE signatures.
6	Beacon User Data (BUD)	The loader passes pre-resolved syscall stubs and memory tracking information to Beacon via a structured data block. Beacon can then use these for its own sleep masking and evasion without re-resolving anything.
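
To see why XOR encryption (innovation 5) defeats static signature matching, consider this small Python sketch. The key, the key length, and the stand-in payload are all assumptions for illustration. The source does not describe the key scheme Crystal-Loaders actually uses.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Stand-in for a PE: the MZ magic followed by some zero padding.
payload = b"MZ\x90\x00" + b"\x00" * 12
key = b"\x5a\xc3\x17\x8e"          # hypothetical 4-byte key

encrypted = xor_bytes(payload, key)
decrypted = xor_bytes(encrypted, key)
print(encrypted[:2] == b"MZ", decrypted == payload)
```

The encrypted buffer no longer starts with MZ, so a signature anchored on the PE magic does not match until the payload is decrypted in memory. One caveat: wherever the plaintext is all zeros, the ciphertext is the key itself (here, encrypted[4:8] equals the key), so a short repeating key offers no real secrecy against an analyst.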

Crystal-Loaders vs Traditional Reflective Loading

Traditional UDRL

Loader shellcode + Beacon DLL (PE)

Runtime: Parse PE, alloc, map sections

Result: Full PE in MEM_PRIVATE (RWX)

IAT populated, headers present

Crystal-Loaders (PIC)

Spec-compiled PIC blob (no PE)

XOR-encrypted Beacon payload inside

DFR: All APIs resolved at runtime

BUD: Syscalls + tracking passed to Beacon

How the Detection Vectors Are Addressed

Vector	Traditional UDRL Status	Crystal-Loaders Status
1. Private-commit executable memory	Still present (MEM_PRIVATE + RX)	Reduced — PIC is smaller, less conspicuous, but memory is still private
2. MZ/PE headers in private memory	PE headers exist (even if partially erased)	Eliminated — no PE structure in the PIC blob
3. RWX memory	Often present during or after loading	Eliminated — LibGate syscalls avoid RWX allocations
4. Return address in unbacked memory	Present unless stack spoofing is added	Still present — requires additional stack spoofing
5. IAT artifacts	Full IAT populated after loading	Eliminated — DFR resolves all APIs dynamically

What Crystal-Loaders Does NOT Solve

Crystal-Loaders addresses the loader-side detection surface. It does not inherently solve call stack analysis (Vector 4) or behavioral detections (like command-and-control traffic patterns). A complete evasion strategy would combine Crystal-Loaders with stack spoofing (Draugr-style synthetic frames), sleep masking (FOLIAGE-style APC chains), and network obfuscation. The Beacon User Data (BUD) mechanism is specifically designed to enable this composability.

6. Module Summary

Key Takeaways
LoadLibrary creates filesystem, PEB, and kernel callback telemetry that makes it unusable for offensive DLL loading.
Reflective DLL Injection (Stephen Fewer, 2008) loads DLLs from memory, but creates five new detection vectors: private-commit executable memory, MZ/PE headers, RWX allocations, unbacked return addresses, and IAT artifacts.
Cobalt Strike's UDRL (CS 4.4) enabled community loaders like AceLdr, BokuLoader, and TitanLdr to address these vectors individually.
Crystal Palace (released as part of the Tradecraft Garden project in June 2025) and Crystal-Loaders (built with Crystal Palace for Cobalt Strike) take a fundamentally different approach: a spec-driven PIC linker that produces headerless shellcode with no PE structure, no static imports, and encrypted embedded payloads.
Crystal-Loaders eliminates vectors 2, 3, and 5. It reduces vector 1. Vector 4 (call stack analysis) requires additional techniques like synthetic frame construction.
