Difficulty: Beginner

Module 1: The Reflective Loader Problem

Why loading DLLs from memory is both essential and dangerously detectable.

Module Objective

This module builds the foundational understanding you need before diving into Crystal Palace and its ecosystem. You will learn why reflective loading exists, how the classic technique works, what detection vectors it creates, and where Crystal-Loaders fits in the evolution of Cobalt Strike's loader architecture. By the end, you will understand the exact problems Crystal-Loaders was designed to solve.

1. Why Reflective Loading Exists

Programs need to load DLLs into memory to use their exported functions. The standard Windows mechanism is LoadLibrary, provided by kernel32.dll. It works perfectly for legitimate software, but it creates a trail of telemetry that is fatal for offensive operations:

What LoadLibrary Does (Behind the Scenes)

Step | Action | Telemetry Created
1 | Opens the DLL file on disk | Filesystem minifilter callbacks, ETW file I/O events
2 | Creates a section object (SEC_IMAGE) | Kernel section object creation event
3 | Maps the section into the process | PsSetLoadImageNotifyRoutine callback fires
4 | Registers the module in the PEB | Module appears in InMemoryOrderModuleList
5 | Resolves imports and processes relocations | Additional DLL loads may cascade
6 | Calls DllMain(DLL_PROCESS_ATTACH) | Entry point execution is observable

Every one of these steps is visible to EDR products. The PsSetLoadImageNotifyRoutine kernel callback alone gives security products the image name, base address, and size of every DLL loaded into every process on the system. Loading a malicious DLL (like Cobalt Strike Beacon) through LoadLibrary is equivalent to announcing your presence to every defensive tool on the endpoint.

The Core Problem

Attackers need to load complex payloads (fully featured implants with many dependencies) into a target process without touching disk and without triggering image load callbacks. The payload exists only in memory, received over a network connection or embedded in a stager. This is the problem reflective loading solves.

In 2008, Stephen Fewer published Reflective DLL Injection, a technique that loads a DLL entirely from a memory buffer by reimplementing the Windows loader in user mode. This technique became the foundation for virtually every offensive DLL loader that followed, including Cobalt Strike's default loader, Metasploit's Meterpreter, and dozens of open-source projects.

2. How Traditional Reflective Loading Works

Stephen Fewer's technique works by embedding a small ReflectiveLoader function inside the DLL itself. In the original implementation, the injector locates the exported ReflectiveLoader in the raw DLL bytes and starts a thread at it. Later variants, including those used by Metasploit and Cobalt Strike, instead begin execution at a bootstrap shellcode stub patched into the DOS header, which transfers control to ReflectiveLoader. Either way, the function then performs the same steps the Windows loader would, but entirely in user mode from a memory buffer.

Classic ReflectiveLoader Flow

Bootstrap stub in DOS header → Find own image base → PEB walk, resolve APIs → VirtualAlloc RWX region → Copy PE sections → Fix relocs & imports → Call DllMain

Each step of the ReflectiveLoader deserves a closer look:

ReflectiveLoader Steps in Detail

# | Step | What Happens
1 | Bootstrap shellcode stub | A small piece of position-independent assembly placed in the PE's DOS header stub. It calculates the address of the ReflectiveLoader function and jumps to it.
2 | Find own image base | ReflectiveLoader walks backward in memory from its own address to find the MZ magic bytes (0x4D 0x5A) marking the start of the PE.
3 | Resolve APIs via PEB | Walks the PEB's InMemoryOrderModuleList to find kernel32.dll, then parses its export table to resolve LoadLibraryA, GetProcAddress, and VirtualAlloc.
4 | Allocate RWX memory | Calls VirtualAlloc with PAGE_EXECUTE_READWRITE for a region of SizeOfImage bytes.
5 | Copy PE sections | Iterates the section table and copies each section (`.text`, `.rdata`, `.data`, etc.) to its correct virtual address offset within the allocated region.
6 | Process base relocations | Since the DLL is loaded at an arbitrary base address (not its preferred base), all absolute addresses in the code must be adjusted using the .reloc section's fixup entries.
7 | Resolve Import Address Table | Walks the import directory, loads each dependency with LoadLibraryA, and resolves each imported function with GetProcAddress, writing the addresses into the IAT.
8 | Call DllMain | Invokes the entry point with DLL_PROCESS_ATTACH, initializing the payload (e.g., starting Beacon's main loop).

Here is conceptual pseudocode showing what ReflectiveLoader does internally:

C (Pseudocode)
// Conceptual ReflectiveLoader pseudocode (simplified from Stephen Fewer's work)
DWORD ReflectiveLoader(VOID)
{
    // Step 1: Find our own image base by scanning backward for MZ header
    ULONG_PTR imageBase = FindImageBase(&ReflectiveLoader);

    // Step 2: Parse PEB to find kernel32.dll
    ULONG_PTR peb      = __readgsqword(0x60);    // x64: GS:[0x60]
    ULONG_PTR ldr      = *(ULONG_PTR*)(peb + 0x18);
    ULONG_PTR modList   = *(ULONG_PTR*)(ldr + 0x20); // InMemoryOrderModuleList

    // Walk module list, hash each name, find kernel32.dll
    ULONG_PTR k32Base  = FindModuleByHash(modList, KERNEL32_HASH);

    // Step 3: Resolve needed APIs from kernel32's export table
    fnLoadLibraryA   pLoadLibraryA   = GetExport(k32Base, LOADLIBRARYA_HASH);
    fnGetProcAddress pGetProcAddress = GetExport(k32Base, GETPROCADDRESS_HASH);
    fnVirtualAlloc   pVirtualAlloc   = GetExport(k32Base, VIRTUALALLOC_HASH);

    // Step 4: Allocate memory for the full image
    PIMAGE_NT_HEADERS ntHdrs = (imageBase + dosHdr->e_lfanew);
    LPVOID newBase = pVirtualAlloc(
        NULL,
        ntHdrs->OptionalHeader.SizeOfImage,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE    // <-- RWX: Detection vector #3
    );

    // Step 5: Copy headers + each section to correct VA
    memcpy(newBase, imageBase, ntHdrs->OptionalHeader.SizeOfHeaders);
    for (each section in sectionTable)
        memcpy(newBase + section.VirtualAddress,
               imageBase + section.PointerToRawData,
               section.SizeOfRawData);

    // Step 6: Process base relocations (delta = newBase - preferredBase)
    ApplyRelocations(newBase, ntHdrs);

    // Step 7: Resolve IAT - load dependencies + resolve functions
    ResolveImports(newBase, ntHdrs, pLoadLibraryA, pGetProcAddress);

    // Step 8: Call DllMain
    fnDllMain entryPoint = newBase + ntHdrs->OptionalHeader.AddressOfEntryPoint;
    entryPoint((HINSTANCE)newBase, DLL_PROCESS_ATTACH, NULL);

    return 0;
}

3. Five Detection Vectors

While reflective loading avoids the telemetry generated by LoadLibrary, it introduces five new detection surfaces that modern security tools actively scan for. Understanding these is critical because they are the exact problems Crystal-Loaders was built to eliminate.

# | Detection Vector | What Scanners Look For | Tools That Detect It
1 | Private-commit executable memory | Executable regions backed by MEM_PRIVATE instead of SEC_IMAGE | Moneta, pe-sieve, MalMemDetect
2 | MZ/PE headers in private memory | The MZ magic bytes (0x4D 0x5A) at the start of private allocations | YARA rules, pe-sieve, BeaconHunter
3 | RWX memory | Pages with PAGE_EXECUTE_READWRITE protection | Moneta, MalMemDetect, ETW-based monitors
4 | Return address in unbacked memory | Call stack frames pointing outside any loaded module | pe-sieve (thread scan), EDR kernel callbacks
5 | IAT / import artifacts | Resolved function pointers and import metadata patterns | pe-sieve (IAT scan), manual analysis

Vector 1: Private-Commit Executable Memory

When the Windows loader maps a DLL via LoadLibrary, it creates a section object of type SEC_IMAGE. The resulting memory pages are tagged as MEM_IMAGE in the VAD tree. Memory scanners know that all legitimate executable code should reside in image-backed regions.

A reflectively loaded DLL is allocated with VirtualAlloc, which creates MEM_PRIVATE pages. When those pages are later marked executable (or allocated as RWX from the start), the combination of MEM_PRIVATE + PAGE_EXECUTE* is an immediate detection signal. Legitimate processes almost never have executable private memory outside of JIT compilers (like .NET CLR or JavaScript V8).

Memory Commit Type Comparison

LoadLibrary (Legitimate)

CreateSection(SEC_IMAGE)
MapViewOfSection → MEM_IMAGE
.text: PAGE_EXECUTE_READ (RX)
.rdata: PAGE_READONLY (R)
.data: PAGE_READWRITE (RW)

Reflective Load (Detectable)

VirtualAlloc(MEM_COMMIT)
Single allocation → MEM_PRIVATE
All sections: PAGE_EXECUTE_READWRITE (RWX)
No file backing. No SEC_IMAGE. All in one RWX blob.
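
From the scanner's side, the Vector 1 check reduces to a filter over region metadata. The following is a minimal sketch in Python, assuming the region records have already been collected (for example from VirtualQueryEx output). The `Region` record and the `flag_private_executable` helper are hypothetical names used only for illustration; the constant values are the documented Windows ones.

```python
from dataclasses import dataclass

# Memory type values as reported in MEMORY_BASIC_INFORMATION.Type
MEM_PRIVATE = 0x20000
MEM_MAPPED = 0x40000
MEM_IMAGE = 0x1000000

# Page protection values that include execute permission
PAGE_READWRITE = 0x04
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
EXECUTABLE_MASK = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                   PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


@dataclass
class Region:
    """Hypothetical record describing one queried memory region."""
    base: int
    size: int
    mem_type: int
    protect: int


def flag_private_executable(regions):
    """Return regions that are private (not image-backed) and executable."""
    return [r for r in regions
            if r.mem_type == MEM_PRIVATE and r.protect & EXECUTABLE_MASK]


# Synthetic region list: addresses and sizes are made up for the example.
regions = [
    Region(0x7FFA00001000, 0x5000, MEM_IMAGE, PAGE_EXECUTE_READ),         # .text of a loaded DLL
    Region(0x1F000000000, 0x40000, MEM_PRIVATE, PAGE_EXECUTE_READWRITE),  # reflective-style blob
    Region(0x1F100000000, 0x10000, MEM_PRIVATE, PAGE_READWRITE),          # ordinary heap
]
suspicious = flag_private_executable(regions)
print([hex(r.base) for r in suspicious])
```

A real scanner would also need an allowlist for JIT regions (CLR, V8), since those legitimately match the same filter.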

Vector 2: MZ/PE Headers in Private Memory

The first two bytes of every PE file are 0x4D 0x5A ("MZ"). After reflective loading, the PE headers are copied to the start of the allocated region. A trivial YARA rule can find them:

YARA
rule reflective_dll_in_memory {
    meta:
        description = "Detects PE header in private executable memory"
    condition:
        uint16(0) == 0x5A4D and         // MZ magic at start of region
        uint32(uint32(0x3C)) == 0x4550   // PE signature at e_lfanew offset
}

Some loaders attempt to erase the headers after loading by zeroing out the first page. However, partial artifacts often remain: the Rich header, section names like .text and .rdata, or the optional header's magic value. More advanced scanners look for these secondary indicators even when the MZ bytes are erased.
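
To make the header check and the secondary indicators concrete, here is a small Python sketch of both. It mirrors the YARA condition on a raw buffer and then looks for leftover section names when the signatures have been wiped. The helper names and the synthetic buffer layout (including the offsets where section names are placed) are assumptions made for the example, not the logic of any particular scanner.

```python
import struct

# Section names are stored NUL-padded to 8 bytes in the section table.
SECTION_NAMES = (b".text\x00", b".rdata\x00", b".data\x00", b".reloc\x00")


def has_pe_header(buf: bytes) -> bool:
    """Mirror the YARA condition: MZ at offset 0, PE signature at e_lfanew."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def secondary_indicators(buf: bytes, window: int = 0x1000):
    """List well-known section names still present in the first page."""
    head = bytes(buf[:window])
    return [n.rstrip(b"\x00").decode() for n in SECTION_NAMES if n in head]


def build_fake_header() -> bytearray:
    """Synthetic header page; offsets are illustrative, not a valid PE."""
    buf = bytearray(0x400)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)   # e_lfanew
    buf[0x80:0x84] = b"PE\x00\x00"
    buf[0x188:0x190] = b".text\x00\x00\x00"
    buf[0x1B0:0x1B8] = b".rdata\x00\x00"
    return buf


intact = build_fake_header()
wiped = build_fake_header()
wiped[0:2] = b"\x00\x00"          # loader erased the MZ bytes...
wiped[0x80:0x84] = bytes(4)       # ...and the PE signature

print(has_pe_header(bytes(intact)), has_pe_header(bytes(wiped)))
print(secondary_indicators(wiped))
```

The wiped buffer no longer satisfies the signature check, but the section names still give it away, which is why partial header erasure is not enough.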

Vector 3: RWX Memory

The classic ReflectiveLoader allocates the entire image as PAGE_EXECUTE_READWRITE. This single permission flag is the most straightforward detection signal. Legitimate compiled code is mapped with granular permissions: .text is RX, .rdata is R, .data is RW. The only common legitimate source of RWX memory is JIT compilation (CLR, V8), and those regions have specific, identifiable patterns.

Why Not Just Use RW Then VirtualProtect to RX?

Better loaders do exactly this: allocate as RW, write sections, then change to RX. But this still leaves you with MEM_PRIVATE + RX, which is Vector 1. And although the region is never RWX, the RW-to-RX protection change on private memory is itself an event that real-time monitoring can catch. Crystal-Loaders sidesteps the PE-mapping problem by producing PIC output that never needs to be mapped as a PE image at all, though, as section 5 shows, the resulting code still lives in private memory.

Vector 4: Return Address / Call Stack Analysis

When a reflectively loaded DLL calls a Windows API, the thread's call stack contains return addresses that point into the privately allocated memory region. EDR kernel callbacks (for example, those registered via ObRegisterCallbacks) can capture the thread's stack using RtlWalkFrameChain and check whether each return address falls within a known, file-backed module.

If a return address points to MEM_PRIVATE memory with no file object in the VAD tree, the EDR knows code is executing from a dynamic allocation. Tools like pe-sieve perform this check in user mode by scanning threads and resolving their stack frames against loaded module ranges.
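
The core of that check is a range lookup. Below is a minimal Python sketch, assuming the module list and the captured return addresses are already available as plain integers. The function names and the sample addresses are hypothetical, and a real implementation would take module ranges from the loader data or the VAD tree rather than a hard-coded list.

```python
from bisect import bisect_right


def build_index(modules):
    """modules: iterable of (name, base, size). Returns ranges sorted by base."""
    return sorted((base, base + size, name) for name, base, size in modules)


def backing_module(index, address):
    """Return the name of the module containing `address`, or None if unbacked."""
    pos = bisect_right(index, (address, float("inf"), "")) - 1
    if pos >= 0:
        start, end, name = index[pos]
        if start <= address < end:
            return name
    return None


def unbacked_frames(index, return_addresses):
    """Return the addresses that fall outside every known module range."""
    return [a for a in return_addresses if backing_module(index, a) is None]


# Synthetic data: bases, sizes, and return addresses are made up.
modules = [
    ("ntdll.dll", 0x7FFB10000000, 0x1F0000),
    ("kernel32.dll", 0x7FFB0F000000, 0xC0000),
]
index = build_index(modules)
stack = [0x7FFB1009D1A4, 0x7FFB0F01234C, 0x1F000001A2B]
flagged = unbacked_frames(index, stack)
print([hex(a) for a in flagged])
```

Only the last frame is flagged, because it resolves to no module at all. This is the signal that stack spoofing techniques try to remove.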

Vector 5: IAT Artifacts

After the ReflectiveLoader resolves the Import Address Table, the loaded DLL contains a fully populated IAT: an array of function pointers to APIs in ntdll.dll, kernel32.dll, and other system DLLs. This structure is recognizable. Even when the PE headers are erased, the pattern of pointers (consecutive addresses into the same DLL's export range) can be identified by heuristic scans.

Additionally, the import directory entries themselves (the IMAGE_IMPORT_DESCRIPTOR array) contain RVAs to DLL name strings and function name strings. These strings persist in memory after loading and provide clear evidence of a mapped PE image.
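
The pointer-pattern heuristic can be illustrated with a short Python sketch. It scans a buffer for runs of consecutive 8-byte values that all land inside one module's address range. The function name, the `min_run` threshold of 4, and the sample addresses are assumptions for the example; the sketch also assumes 8-byte alignment and a 64-bit process.

```python
import struct


def find_pointer_runs(buf: bytes, mod_start: int, mod_end: int, min_run: int = 4):
    """Return (offset, count) for each run of at least `min_run` consecutive
    8-byte little-endian values that fall inside [mod_start, mod_end)."""
    runs = []
    run_start, count = None, 0
    for off in range(0, len(buf) - 7, 8):
        (value,) = struct.unpack_from("<Q", buf, off)
        if mod_start <= value < mod_end:
            if count == 0:
                run_start = off
            count += 1
        else:
            if count >= min_run:
                runs.append((run_start, count))
            count = 0
    if count >= min_run:
        runs.append((run_start, count))
    return runs


# Synthetic buffer: five IAT-like pointers into a made-up kernel32 range,
# followed by one stray pointer that should not be reported on its own.
k32_start, k32_end = 0x7FFB0F000000, 0x7FFB0F0C0000
iat = struct.pack("<5Q", *(k32_start + rva
                           for rva in (0x1A2B0, 0x1F400, 0x20310, 0x24C80, 0x31000)))
buf = bytes(0x40) + iat + bytes(8) + struct.pack("<Q", k32_start + 0x100) + bytes(0x20)

runs = find_pointer_runs(buf, k32_start, k32_end)
print(runs)
```

The single stray pointer is ignored while the five-entry run is reported, which is how a heuristic of this kind keeps ordinary saved pointers from drowning out IAT-like tables.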

4. Cobalt Strike's Loader Evolution

Cobalt Strike has progressively replaced and extended its loader architecture over many years. Understanding this timeline shows you why Crystal Palace exists and what gap it fills.

Cobalt Strike Loader Timeline

Default Reflective Loader — Stephen Fewer's technique, shipped with Cobalt Strike from the beginning. Functional but highly signatured.
Artifact Kit — Customizable stager templates. Changed how the payload is packaged and executed, but the reflective loader itself remained the same.
Sleep Mask Kit (CS 4.4) — Encrypt Beacon's memory during sleep. Addresses the problem of static signatures in idle Beacon memory, but does not change the loading process.
User-Defined Reflective Loader (CS 4.4) — The pivotal change. Operators can now replace the entire reflective loader with their own implementation. This opened the door to loaders like AceLdr, BokuLoader, and TitanLdr.
BeaconGate (CS 4.10) — Hook specific Win32 API calls made by Beacon. Allows operators to intercept and modify API calls (e.g., adding indirect syscalls) without modifying Beacon itself.
Crystal Palace (Tradecraft Garden, June 2025) — A standalone spec-driven PIC linker released as part of the Tradecraft Garden project. Produces position-independent code output with no PE headers, no MZ signature, and no traditional reflective loading. This is what Crystal-Loaders is built with.

Notable Community UDRLs

The UDRL interface (CS 4.4) enabled the security research community to build sophisticated loaders that address specific detection vectors:

Key UDRLs in the Ecosystem

UDRL | Author | Key Innovation
AceLdr | Kyle Avery | RC4 sleep encryption, private heap isolation, FOLIAGE-based APC sleep masking, return address spoofing. Evades Moneta, pe-sieve, BeaconEye, and Hunt-Sleeping-Beacons.
BokuLoader | Bobby Cooke | Uses DLL module stomping via LoadLibraryExA with DONT_RESOLVE_DLL_REFERENCES to load a sacrificial DLL from disk (mapped as MEM_IMAGE by the OS), then writes Beacon over that region. Also uses indirect NT syscalls (HellsGate/HalosGate) and reflective call stack spoofing.
TitanLdr | Austin Hudson (SecIdiot) | The original community UDRL that pioneered DNS-over-HTTPS (DoH) integration for Cobalt Strike DNS Beacons. Its primary innovation is IAT hooking: it hooks DnsQuery_A in Beacon's IAT to route DNS queries through HTTPS.

Each of these UDRLs addresses some of the five detection vectors, but they all still perform the fundamental reflective loading operation: take a PE DLL, parse its headers, map its sections, fix relocations, and resolve imports. The PE structure remains in memory in some form. Crystal Palace takes a fundamentally different approach.

5. What Crystal-Loaders Brings to the Table

Crystal-Loaders (a collection of PIC loaders built with Crystal Palace, designed for Cobalt Strike) does not patch or improve the reflective loading process. It eliminates it entirely. Instead of injecting a DLL and loading it in-memory, Crystal-Loaders uses a spec-driven build system to produce raw position-independent code at compile time.

The Six Key Innovations

# | Innovation | What It Solves
1 | Spec-driven build system | Declarative, reproducible, composable builds. The loader behavior is defined by a specification file, not ad-hoc C code. Different specs produce different loaders without rewriting code.
2 | PIC output | The final output is raw shellcode: no PE headers, no MZ signature, no section table. There is nothing for PE scanners to find. This eliminates detection vector 2 and shrinks the footprint behind vector 1, although the memory itself is still private.
3 | Modular libraries (LibTCG, LibGate) | LibTCG handles PE loading logic. LibGate handles indirect syscalls. They are independent and swappable. You can replace the syscall mechanism without touching the loader logic, or vice versa.
4 | Dynamic Function Resolution (DFR) | Crystal Palace rewrites all API calls at link time. No static imports, no IAT, no import directory. Every API call is resolved dynamically at runtime through PEB walking and export table parsing. This eliminates detection vector 5.
5 | XOR-encrypted embedded payload | The Beacon DLL is XOR-encrypted and embedded inside the PIC blob. It is decrypted at runtime just before loading. Static scanning of the PIC blob reveals no PE signatures.
6 | Beacon User Data (BUD) | The loader passes pre-resolved syscall stubs and memory tracking information to Beacon via a structured data block. Beacon can then use these for its own sleep masking and evasion without re-resolving anything.

Crystal-Loaders vs Traditional Reflective Loading

Traditional UDRL

Loader shellcode + Beacon DLL (PE)
Runtime: Parse PE, alloc, map sections
Result: Full PE in MEM_PRIVATE (RWX)
IAT populated, headers present

Crystal-Loaders (PIC)

Spec-compiled PIC blob (no PE)
XOR-encrypted Beacon payload inside
DFR: All APIs resolved at runtime
BUD: Syscalls + tracking passed to Beacon

How the Detection Vectors Are Addressed

Vector | Traditional UDRL Status | Crystal-Loaders Status
1. Private-commit executable memory | Still present (MEM_PRIVATE + RX) | Reduced: PIC is smaller, less conspicuous, but memory is still private
2. MZ/PE headers in private memory | PE headers exist (even if partially erased) | Eliminated: no PE structure in the PIC blob
3. RWX memory | Often present during or after loading | Eliminated: LibGate syscalls avoid RWX allocations
4. Return address in unbacked memory | Present unless stack spoofing is added | Still present: requires additional stack spoofing
5. IAT artifacts | Full IAT populated after loading | Eliminated: DFR resolves all APIs dynamically

What Crystal-Loaders Does NOT Solve

Crystal-Loaders addresses the loader-side detection surface. It does not inherently solve call stack analysis (Vector 4) or behavioral detections (like command-and-control traffic patterns). A complete evasion strategy would combine Crystal-Loaders with stack spoofing (Draugr-style synthetic frames), sleep masking (FOLIAGE-style APC chains), and network obfuscation. The Beacon User Data (BUD) mechanism is specifically designed to enable this composability.

6. Module Summary

Key Takeaways

Module 1 Quiz: The Reflective Loader Problem

Q1: What memory type do the pages of a reflectively loaded DLL have?

Reflectively loaded DLLs are allocated with VirtualAlloc, which creates MEM_PRIVATE pages. Legitimate DLLs loaded via LoadLibrary use section objects (SEC_IMAGE) which produce MEM_IMAGE pages. The presence of executable code in MEM_PRIVATE memory is a primary detection vector used by tools like Moneta and pe-sieve.

Q2: Which Cobalt Strike release introduced the User-Defined Reflective Loader (UDRL) interface?

Cobalt Strike 4.4 introduced the User-Defined Reflective Loader (UDRL) interface, which allowed operators to replace the entire reflective loader with a custom implementation. This was the pivotal change that enabled community projects like AceLdr, BokuLoader, and TitanLdr. CS 4.10 later added BeaconGate, and Crystal Palace was released separately as part of the Tradecraft Garden project in 2025.

Q3: What problem does Crystal Palace's PIC output solve compared to traditional reflective loading?

Crystal Palace's PIC (position-independent code) output eliminates PE headers entirely. Traditional reflective loaders copy the full PE structure (including MZ/PE headers, section table, and import directory) into memory, making them detectable by PE scanners and YARA rules. Crystal-Loaders produces raw shellcode with no PE artifacts, eliminating detection vectors 2 and 5.