Difficulty: Advanced

Module 8: Extending Crystal-Loaders

The adaptation guide — module stomping, sleep mask integration, call stack spoofing, IAT hooking PICOs, guardrailed loaders, custom DFR resolvers, and porting the architecture beyond Cobalt Strike.

Module Objective

Crystal-Loaders is a proof of concept. As rasta-mouse stated, it demonstrates the BUD contract and Crystal Palace integration — not a production-ready implant. This module maps the extension points that Crystal-Loaders deliberately leaves open. By the end, you will understand seven concrete extension paths, the data structures each one touches, and the detection trade-offs each extension addresses.

1. What Crystal-Loaders Leaves on the Table

Crystal-Loaders is deliberately minimal. It proves the spec-driven PIC architecture works, but it ships without several capabilities that a production-grade loader would need. Each omission is an extension opportunity:

Capability | Current Status | Extension Opportunity
Module stomping / overloading | Uses VirtualAlloc → private commit | File-backed MEM_IMAGE memory
Sleep mask | BUD tracking is set up, but no sleep mask PIC is included | BUD-aware encrypt/decrypt cycle
Call stack spoofing | Not implemented | Draugr / ThreadStackSpoofer integration
IAT hooking PICO | No post-exploitation API interception | Hook loaded DLL IATs via a Crystal Palace PICO
Guardrailing | Environment-locked decryption not included | Payload only decrypts on the correct target
Custom DFR resolvers | Default ROR13 + string resolution | Alternative hash algorithms, LdrLoadDll fallback
ETW / AMSI bypass | Not addressed | Patch or unhook telemetry providers
PE header cleanup | Beacon PE headers persist after load | Zero headers post-load to evade memory scanners

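The rest of this module covers six of these extensions in detail: module stomping, sleep mask integration, call stack spoofing, IAT hooking, guardrailing, and custom DFR resolvers. It then adds a seventh extension that is not in the table, which is porting the architecture beyond Cobalt Strike. For each one, it shows which structures to modify, what code to change, and, where relevant, which detection vectors the extension addresses.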

2. Extension 1 — Module Stomping

The single biggest detection surface in Crystal-Loaders today is that Beacon lives in private-commit executable memory allocated via VirtualAlloc. Tools like Moneta and pe-sieve specifically scan for MEM_PRIVATE + PAGE_EXECUTE* regions. Module stomping (also called module overloading) eliminates this by placing Beacon in file-backed MEM_IMAGE memory.

The Concept

Module Stomping Flow
Find sacrificial DLL on disk → NtCreateSection (SEC_IMAGE flag) → NtMapViewOfSection (MEM_IMAGE) → Overwrite sections with Beacon → Beacon in file-backed memory

  1. Instead of VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE), open a legitimate DLL on disk that the process has not loaded.
  2. Use NtCreateSection with the SEC_IMAGE flag to create a file-backed section object.
  3. Map it into the process with NtMapViewOfSection.
  4. Overwrite the legitimate DLL's sections with Beacon's sections.
  5. The result: Beacon lives in MEM_IMAGE (file-backed) memory, defeating Moneta and pe-sieve's private-commit detection.

Code Changes to go()

C (Conceptual Diff)
// Before (current Crystal-Loaders)
PBYTE dst = KERNEL32$VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

// After (module stomping)
// 1. Find a suitable sacrificial DLL (not loaded, large enough)
// 2. NtCreateSection with SEC_IMAGE
// 3. NtMapViewOfSection
// 4. Copy sections over the mapped DLL

Updating ALLOCATED_MEMORY

The BUD ALLOCATED_MEMORY structure tracks how each region was allocated. When switching to module stomping, you must update the allocation method so the sleep mask and other BUD consumers know they are dealing with file-backed memory:

C
// Use METHOD_MODULESTOMP instead of METHOD_VIRTUALALLOC
region->CleanupInformation.AllocationMethod = METHOD_MODULESTOMP;
// or METHOD_NTMAPVIEW if using NtMapViewOfSection directly

The MODULESTOMP_INFO Structure

Beacon's beacon.h defines a structure for tracking module-stomped allocations:

C (beacon.h)
typedef struct _MODULESTOMP_INFO {
    HMODULE ModuleHandle;
} MODULESTOMP_INFO, *PMODULESTOMP_INFO;

This structure simply stores the module handle of the stomped DLL. BUD consumers (especially the sleep mask and cleanup routines) use the ModuleHandle to identify which loaded module was overwritten, enabling proper cleanup (e.g., unmapping the view) when the implant exits or the region is freed.

Sacrificial DLL Selection

The sacrificial DLL must meet two criteria: it must not already be loaded in the process (to avoid conflicts), and it must be large enough to contain the Beacon payload. Common choices include DLLs in C:\Windows\System32 that are large but rarely loaded, such as xpsservices.dll or dbghelp.dll. Choosing a DLL whose SizeOfImage closely matches Beacon's size reduces the amount of unused mapped memory, which itself can be a detection heuristic.

3. Extension 2 — Sleep Mask Integration

Crystal-Loaders populates the ALLOCATED_MEMORY structure with section-level tracking, but it does not include a sleep mask. A sleep mask PIC would use this BUD data to encrypt Beacon's memory sections during sleep and decrypt them before the next check-in.

The Sleep Mask Cycle

BUD-Aware Sleep Mask Flow

1. Read ALLOCATED_MEMORY from BUD — enumerate all tracked regions and sections
2. For each section where MaskSection == TRUE: change permissions to RW via NtProtectVirtualMemory
3. Encrypt section contents (XOR, RC4, or SystemFunction032)
4. Set up a sleep timer (CreateTimerQueueTimer or waitable timer)
5. After sleep: decrypt section contents
6. Restore original permissions (RX for code, R for read-only data)

Conceptual Implementation Using BUD

C (Sleep Mask - BUD Integration)
// Conceptual sleep mask flow using BUD's ALLOCATED_MEMORY
for (int r = 0; r < 6; r++) {
    ALLOCATED_MEMORY_REGION * region = &bud->allocatedMemory->AllocatedMemoryRegions[r];
    if (region->Purpose == PURPOSE_EMPTY) continue;

    for (int s = 0; s < 8; s++) {
        ALLOCATED_MEMORY_SECTION * sec = &region->Sections[s];
        if (!sec->MaskSection || sec->VirtualSize == 0) continue;

        // Change to RW so we can encrypt in place
        DWORD oldProtect;
        NtProtectVirtualMemory(
            NtCurrentProcess(),
            &sec->BaseAddress,
            &sec->VirtualSize,
            PAGE_READWRITE,
            &oldProtect
        );

        // Encrypt the section contents
        xor_encrypt(sec->BaseAddress, sec->VirtualSize, key);
    }
}

The Ekko-Style Timer Queue Pattern

The Ekko sleep obfuscation technique uses CreateTimerQueueTimer to queue ROP-style callbacks that change memory permissions, encrypt, sleep, decrypt, and restore — all without the Beacon thread being active during the sleep window. Adapting Ekko for BUD means replacing Ekko's hardcoded memory ranges with the ALLOCATED_MEMORY regions tracked by BUD.

Why BUD Makes This Easier

Without BUD, a sleep mask must discover Beacon's memory layout at runtime — scanning the VAD, searching for PE headers, or using hardcoded offsets. BUD provides the exact base address, size, and permissions of every section in a structured array. The sleep mask just iterates the array. This is the design intent behind ALLOCATED_MEMORY — it is the contract between the loader and the sleep mask.

4. Extension 3 — Call Stack Spoofing

Even with LibGate's indirect syscalls routing execution through ntdll, the return stack still reveals that the calling code lives in private memory. A full call stack walk shows:

Call Stack (Anomalous)
ntdll!NtAllocateVirtualMemory       (syscall;ret gadget)
  → returns to 0x00007FF6`1A3C0842   ← loader PIC blob (PRIVATE memory!)
  → returns to 0x00007FF6`1A3C0210   ← still in PIC blob
  → ...

The return addresses in private memory are the anomaly. EDR stack walkers flag any frame that falls outside a file-backed module.

The Draugr Approach

The Draugr technique (covered in a separate course) creates synthetic stack frames that make the call stack look like it originates from legitimate system code. It uses JMP [RBX] gadget chaining with fake RUNTIME_FUNCTION and UNWIND_INFO structures so that the Windows unwinder produces a clean, believable stack trace.

The ThreadStackSpoofer Approach

A simpler alternative: before entering sleep, overwrite the return addresses on the current thread's stack with pointers into legitimate modules. On wake, restore the real return addresses. This defeats point-in-time stack scans during the sleep window.

Integration via Crystal Palace PICO

Crystal Kit demonstrated call stack spoofing as a PICO merged via Crystal Palace. The integration approach:

Call Stack Spoofing PICO Integration

1. Build the call stack spoofing logic as a standalone PIC object (PICO)
2. Add it to the Crystal Palace spec via the mergelib directive
3. The PICO intercepts API calls and wraps them with synthetic stack frames
4. All outbound syscalls now show a legitimate call stack

Why This Is a PICO and Not Part of the Loader

Call stack spoofing is orthogonal to the loading process. It applies to every API call Beacon makes after loading, not just the loading phase. By packaging it as a PICO merged via mergelib, the spoofing logic is composable — you can include it or omit it per-spec without modifying the loader code itself.

5. Extension 4 — IAT Hooking PICO

Crystal Kit's most sophisticated extension was an IAT hooking PICO that intercepts specific API calls made by Beacon's loaded dependencies. This operates during and after the ProcessImports phase of LibTCG.

How It Works

  1. During LoadDLL / ProcessImports, the PICO hooks the loaded DLL's IAT entries by replacing function pointers with addresses inside the PIC blob.
  2. The PIC blob intercepts the call, adds evasion (call stack spoofing, argument sanitization), and then forwards to the real API.
  3. The hooking is invisible to Beacon — it calls APIs normally through its IAT, unaware that they are being intercepted.

Key Hook Targets

Target API | Beacon Commands Affected | Evasion Purpose
CreateProcessA/W | shell, run, powerpick | Wrap with call stack spoofing for child process creation
LoadLibraryA/W | CLR/PowerShell commands | Intercept loads of clr.dll, System.Management.Automation.dll

The LoadLibraryA/W hook is particularly important. When Beacon executes powerpick or execute-assembly, Windows loads the CLR and PowerShell automation DLLs. EDR products monitor for these loads (via PsSetLoadImageNotifyRoutine) as they strongly indicate offensive tooling. An IAT hook can intercept these loads and apply additional obfuscation or timing manipulation.

Complexity Warning

IAT hooking PICOs are the most complex extension on this list. The PIC blob must maintain its own trampolines for every hooked function, handle calling convention preservation, and ensure thread safety. Incorrect hooking can crash the Beacon process or produce unpredictable behavior. This is an advanced technique that requires deep understanding of x64 calling conventions and IAT structure.

6. Extension 5 — Guardrailed Loaders

Guardrailing ensures that a captured payload is useless outside the target environment. The Tradecraft Garden includes a simple_rdll_guardrail example demonstrating this concept, though Crystal-Loaders itself does not implement guardrailing. The full guardrail pattern works as follows:

The Guardrail Flow

Environment-Locked Payload Decryption

Link Time: Derive a key from the target's C: drive volume serial number (obtained during reconnaissance)
Link Time: Encrypt the Beacon DLL with RC4 using the derived key
Runtime: Re-derive the key from the current host's C: drive volume serial via GetVolumeInformationA("c:\\"), then decrypt with RC4 via SystemFunction033
Match: Serial matches → RC4 decryption succeeds, Beacon runs
Mismatch: Wrong host → wrong key → garbage output, no valid PE, no Beacon IOCs

Implementation (simple_rdll_guardrail pattern)

The simple_rdll_guardrail example in the Tradecraft Garden derives its key from the target machine's C: drive volume serial number and uses RC4 encryption via the undocumented SystemFunction033 export from advapi32.dll:

C (Link Time - Operator Workstation)
// Obtain the target's C: volume serial (from recon)
DWORD targetSerial = ...; // e.g., 0xABCD1234

// Encrypt the Beacon DLL with RC4 using the serial as key material
RC4_Encrypt(beacon, beaconSize, &targetSerial, sizeof(DWORD));

C (Runtime - Target Host)
// Re-derive the key from the current host's C: volume serial
DWORD volumeSerial = 0;
GetVolumeInformationA("c:\\", NULL, 0, &volumeSerial, NULL, NULL, NULL, 0);

// Decrypt via SystemFunction033 (RC4)
USTRING data = { beaconSize, beaconSize, encrypted };
USTRING key  = { sizeof(DWORD), sizeof(DWORD), (PBYTE)&volumeSerial };
SystemFunction033(&data, &key);

// If wrong host: volumeSerial differs → RC4 produces garbage
// No valid PE signature → ParseDLL fails → loader exits cleanly
// No crash, no IOCs, no Beacon artifacts for blue team to analyze

Blue Team Impact

The critical benefit of guardrailing is not that it prevents execution — it prevents analysis. When a blue team captures a guardrailed payload and attempts to detonate it in a sandbox or reverse-engineer it on an analyst workstation, they get garbage. The Beacon DLL never materializes. There are no strings to extract, no configuration to decode, no C2 addresses to block. The payload is forensically inert outside the target environment.

7. Extension 6 — Custom DFR Resolvers

Crystal Palace's Dynamic Function Resolution system supports multiple resolver strategies. The spec file controls which strategy applies to which modules. The default Crystal-Loaders configuration uses ROR13 hashing, but the DFR system is designed to be customizable.

The Dual-Resolver Pattern

From rasta-mouse's "Arranging the PIC Parterre" blog post, the recommended pattern uses two resolvers:

Crystal Palace Spec
dfr "resolve_explicit" "ror13" "KERNEL32, NTDLL"
dfr "resolve_default" "strings"
Resolver | Behavior | Use Case
resolve_explicit | Only walks already-loaded modules (EAT parsing). Never calls LoadLibrary. | KERNEL32 and NTDLL, which are guaranteed to be loaded in every process
resolve_default | Falls back to LoadLibraryA for modules not yet loaded. | All other DLLs that Beacon might need at runtime

Advanced Resolver Modifications

Resolver Security Trade-Off

The explicit resolver is stealthier because it never triggers a PsSetLoadImageNotifyRoutine callback — it only reads modules already in the PEB. The default (LoadLibrary-based) resolver is more flexible but generates image load events that EDRs monitor. A well-designed loader uses explicit resolution for everything it can and only falls back to default for genuinely unloaded dependencies.

8. Extension 7 — Porting Beyond Cobalt Strike

Crystal Palace and the PIC architecture are not Cobalt Strike-specific. The core value — spec-driven PIC generation, LibTCG PE loading, LibGate indirect syscalls, DFR — is C2-framework agnostic. The Cobalt Strike coupling comes from only a few interfaces.

What You Would Need to Change

CS-Specific Component | What to Remove / Replace
beacon.h BUD structures | Create equivalent data-passing structures for your C2, or remove BUD entirely if not needed
Three-call DllMain protocol | Remove the DLL_BEACON_USER_DATA, DLL_PROCESS_ATTACH, DLL_BEACON_START sequence
Aggressor .cna hooks | Replace with your C2's operator scripting interface (if any)
Post-Ex UDRL references | Adapt or remove the post-exploitation loader if your C2 does not use child payload loading

Potential Targets

A Minimal Generic Loader

Stripped of all Cobalt Strike specifics, a Crystal Palace loader reduces to this core:

C (Generic Loader)
void go(void)
{
    // Get the embedded DLL and XOR key from the PIC's resource section
    PRESOURCE dll = GETRESOURCE(_DLL_);
    PRESOURCE key = GETRESOURCE(_KEY_);

    // XOR unmask the embedded payload
    PBYTE buf = VirtualAlloc(NULL, dll->length, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    for (DWORD i = 0; i < dll->length; i++)
        buf[i] = dll->data[i] ^ key->data[i % key->length];

    // Parse and load the DLL using LibTCG
    DLLDATA dlldata;
    ParseDLL(buf, &dlldata);
    PBYTE dst = VirtualAlloc(NULL, SizeOfDLL(&dlldata), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    LoadDLL(&dlldata, dst);
    ProcessImports(&dlldata, dst);
    FixSectionPermissions(&dlldata, dst);

    // Simple: just call DllMain directly
    DLLMAIN_FUNC entry = (DLLMAIN_FUNC)EntryPoint(&dlldata, dst);
    VirtualFree(buf, 0, MEM_RELEASE);
    entry(dst, DLL_PROCESS_ATTACH, NULL);
}

This is the entire loading pipeline without BUD, without the three-call protocol, and without Aggressor integration. LibTCG, LibGate, and DFR all still function — only the Cobalt Strike handshake is removed. You can then add back whatever data-passing contract your C2 agent expects.

The Core Portable Components

Component | Function | CS Dependency
Crystal Palace linker | Compiles COFF → PIC shellcode | None
LibTCG | PE parsing, loading, imports, relocations | None
LibGate | Indirect syscall execution | None
DFR | Dynamic function resolution (wired in at link time, resolved at runtime) | None
mergelib / PICOs | Composable PIC module merging | None
BUD | Loader → agent data contract | CS-specific structures
Three-call DllMain | Agent initialization protocol | CS-specific protocol

9. Detection & OPSEC Considerations

This table summarizes the detection posture of Crystal-Loaders in its current PoC state versus what becomes possible with the extensions described in this module:

Detection Vector | Current Crystal-Loaders | With Extensions
Private-commit executable memory | Vulnerable: uses VirtualAlloc | Module stomping → file-backed MEM_IMAGE
PE headers in memory | Minimal: the PIC blob has no PE headers, but the loaded Beacon does | Header erasure post-load
RWX memory | Avoided: writes as RW, then applies per-section permissions (e.g., RX for code) | Same
Return address in private memory | LibGate indirect syscalls help (ntdll frame present) | Call stack spoofing hides anomalous frames
YARA on ROR13 constants | NTDLL_HASH (0x3CFA685D, from loader.c) visible in PIC | A custom hash algorithm changes the constants
ETW syscall telemetry | Kernel-level; cannot be bypassed from userland | N/A: kernel telemetry remains
Behavioral analysis | VirtualAlloc + VirtualProtect patterns | Module stomping replaces them with a different (section-mapping) pattern
Plaintext Beacon in memory during sleep | No sleep mask included | BUD-aware sleep mask encrypts sections during sleep
CLR / PowerShell image loads | Not addressed | IAT hooking PICO can intercept and obfuscate

No Silver Bullet

Even with every extension implemented, certain detection vectors remain. Kernel-level telemetry, such as the ETW Threat Intelligence provider, observes security-relevant operations like memory allocation and protection changes. It does so regardless of whether the syscall is invoked directly, indirectly, or through the normal API. Behavioral analysis can correlate process creation patterns, network traffic, and named pipe usage. Memory scanners are evolving to detect module stomping by comparing mapped sections against the on-disk DLL. The goal of these extensions is not permanent invisibility. It is to raise the cost of detection high enough that automated tools miss the implant, forcing defenders into expensive manual analysis.

10. Module Summary

Key Takeaways

Module 8 Quiz: Extending Crystal-Loaders

Q1: What allocation method should you use in ALLOCATED_MEMORY when implementing module stomping?

METHOD_MODULESTOMP is the correct allocation method to record in ALLOCATED_MEMORY when the loader uses module stomping. This tells BUD consumers (especially the sleep mask) that the memory region is file-backed via a stomped DLL rather than a private VirtualAlloc allocation. METHOD_NTMAPVIEW is also related but less precise — it indicates a generic NtMapViewOfSection mapping, whereas METHOD_MODULESTOMP specifically signals that a legitimate DLL's sections have been overwritten with payload data.

Q2: What is the primary benefit of guardrailed loaders?

Guardrailed loaders derive an encryption key from a target-specific property. For example, the simple_rdll_guardrail example reads the C: drive volume serial number at runtime via GetVolumeInformationA. It uses that serial as the RC4 key and decrypts the payload with SystemFunction033. If the payload is captured and detonated in a sandbox or on an analyst workstation, the derived key will not match, and RC4 decryption produces garbage. The Beacon DLL never materializes, leaving no strings, configuration, or C2 addresses for blue team analysis. The payload is forensically inert outside the intended target.

Q3: What makes the Crystal Palace PIC architecture portable beyond Cobalt Strike?

The core components of Crystal Palace — the PIC linker, Dynamic Function Resolution (DFR), LibTCG PE loading, LibGate indirect syscalls, and the mergelib/PICO system — have zero dependencies on Cobalt Strike. The only CS-specific pieces are the BUD data structures (beacon.h) and the three-call DllMain initialization protocol. Removing those interfaces lets you use the entire PIC architecture with any C2 framework or standalone shellcode runner.