Difficulty: Intermediate

Module 4: Reflective DLL Loading

Loading a DLL without the Windows loader ever seeing it.

What Makes It "Reflective"?

Normally, you call LoadLibrary("beacon.dll") and Windows handles everything: it maps the file, resolves imports, and applies relocations. That leaves traces: the module appears in the PEB module list, its memory is backed by a file on disk, and the load raises an image-load notification that security tooling can log. Reflective loading means the DLL is loaded from raw bytes already in memory by code that does the loader's job itself, bypassing the Windows loader entirely. No file on disk, no entry in the module list.

Why "Reflective"?

The term was coined by Stephen Fewer. The idea is that the loader is contained within the DLL itself - it "reflects" back to parse and load its own PE structure. In AceLdr's case, the loader (position-independent shellcode) is prepended to the Beacon DLL. When executed, it parses the attached PE, maps it into memory, and runs it. The entire process happens without touching the filesystem or the Windows loader.

AceLdr's Loading Sequence

The complete chain from shellcode injection to Beacon running, in 11 steps:

Full Reflective Loading Pipeline

1. Start() - ASM entry, aligns stack, calls Ace()
2. Ace() - Creates suspended thread, hijacks its RIP to Loader()
3. Loader() - resolveLoaderFunctions() from NTDLL via PEB
4. calculateRegions() - parse PE headers, compute memory layout
5. NtAllocateVirtualMemory() - allocate RW space for stub + beacon
6. copyStub() - copy AceLdr's hook code to allocated region
7. copyBeaconSections() - map PE sections to virtual addresses
8. RtlCreateHeap() - create private heap for Beacon
9. installHooks() - resolve IAT + overwrite 6 function pointers
10. NtProtectVirtualMemory() - change protection to RX
11. executeBeacon() - call Beacon's DllMain (reason=1, then reason=4)

Understanding Each Step

Steps 1-2 handle the initial execution and thread setup. Steps 3-5 prepare the runtime environment: resolving API functions, parsing the PE, and allocating memory. Steps 6-9 perform the actual loading: copying code, setting up the heap, and installing hooks. Step 10 switches the region from read-write (RW) to read-execute (RX), so it is never writable and executable at the same time. Step 11 finally hands control to Beacon.

Thread Hijacking via Ace()

AceLdr doesn't call Loader() directly. It creates a suspended thread and overwrites that thread's instruction pointer, which keeps the injecting thread's call stack off the thread that Beacon later runs on:

C - from ace.c

VOID Ace( VOID )
{
    API     Api;
    CONTEXT Ctx;
    HANDLE  Thread;

    // ... resolve NtGetContextThread, NtSetContextThread, etc.

    // Create a suspended thread at an innocent-looking start address
    // (RtlUserThreadStart + 0x21 is a ret instruction)
    PVOID StartAddress = Api.ntdll.RtlUserThreadStart + 0x21;
    Api.ntdll.RtlCreateUserThread(
        (HANDLE)-1,    // current process
        NULL, TRUE,    // suspended = TRUE
        0, 0, 0,
        StartAddress,  // innocent start address
        NULL, &Thread, NULL );

    // Hijack: overwrite RIP to point at our Loader function
    Ctx.ContextFlags = CONTEXT_CONTROL;
    Api.ntdll.NtGetContextThread( Thread, &Ctx );
    Ctx.Rip = (DWORD64) Loader;   // <-- The hijack
    Api.ntdll.NtSetContextThread( Thread, &Ctx );

    // Resume the thread - it now executes Loader() instead
    Api.ntdll.NtResumeThread( Thread, NULL );

    // Clean up evidence
    RtlSecureZeroMemory( &Api, sizeof(Api) );
    RtlSecureZeroMemory( &Ctx, sizeof(Ctx) );
}

Why Not Just Call Loader() Directly?

If AceLdr called Loader() directly, the call stack during Beacon execution would trace all the way back through the injection point. Memory scanners that inspect sleeping threads walk each thread's call stack, so those frames would stand out. A fresh thread created via RtlCreateUserThread with its RIP redirected to Loader() has a much more innocuous-looking stack, and the original thread (which ran the shellcode) can exit, so the injection frames never appear on the thread Beacon runs on.

The STUB Structure

AceLdr's STUB structure sits at the very beginning of the allocated region and acts as the control block for all hook functions:

C - from include.h

typedef struct __attribute__(( packed ))
{
    ULONG_PTR Region;  // Base address of entire allocation
    ULONG_PTR Size;    // Total size of allocation
    HANDLE    Heap;    // Handle to private heap
} STUB, *PSTUB;

This compact structure (24 bytes on x64, where each of the three fields is 8 bytes) contains everything AceLdr's hooks need at runtime:

Region: The base address, needed by the sleep hook to know what memory region to encrypt/decrypt
Size: Total allocation size, so the sleep hook knows how many bytes to encrypt
Heap: The private heap handle, returned by GetProcessHeap_Hook()
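The 24-byte figure can be checked at compile time. The sketch below is a portable stand-in, not AceLdr's header: STUB_SKETCH is a hypothetical name, and uint64_t replaces ULONG_PTR and HANDLE on the assumption of a 64-bit Windows target, where both are 8 bytes wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the STUB layout. uint64_t stands in for
 * ULONG_PTR and HANDLE, assuming a 64-bit target (8 bytes each). */
typedef struct __attribute__(( packed ))
{
    uint64_t Region;  /* base address of the entire allocation */
    uint64_t Size;    /* total size of the allocation          */
    uint64_t Heap;    /* private heap handle (pointer-sized)   */
} STUB_SKETCH;

/* Fail the build if the layout ever drifts from 3 x 8 bytes. */
_Static_assert( sizeof( STUB_SKETCH ) == 24, "STUB should be 24 bytes on x64" );
_Static_assert( offsetof( STUB_SKETCH, Region ) == 0,  "Region is first" );
_Static_assert( offsetof( STUB_SKETCH, Size )   == 8,  "Size follows Region" );
_Static_assert( offsetof( STUB_SKETCH, Heap )   == 16, "Heap follows Size" );
```

Because the three fields are already naturally aligned, the packed attribute does not change the size here; it only guarantees that the compiler never inserts padding if the structure is extended later.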

Memory Layout After Loading

Memory Layout After Reflective Load

STUB (Region, Size, Heap) - 24 bytes

AceLdr Hook Code (Sleep_Hook, Spoof, etc.)

Page alignment padding

Beacon .text (code)

Beacon .rdata (imports, strings)

Beacon .data (globals)

Beacon .reloc (relocations)
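The "page alignment padding" row implies that Beacon's image starts on a page boundary after the stub and hook code. A minimal sketch of that arithmetic follows, assuming 4 KiB pages. The function names and the sizes used in the note after the block are hypothetical and are not taken from AceLdr's calculateRegions().

```c
#include <stdint.h>

/* Assumed page size: 4 KiB, the usual x64 Windows value. */
#define PAGE_SIZE_SKETCH 0x1000u

/* Round n up to the next multiple of the page size.
 * Works because the page size is a power of two. */
static uint64_t page_align_up( uint64_t n )
{
    const uint64_t mask = (uint64_t)PAGE_SIZE_SKETCH - 1;
    return ( n + mask ) & ~mask;
}

/* Offset of Beacon's image from the start of the region:
 * the STUB plus the hook code, padded out to a page boundary. */
static uint64_t beacon_offset( uint64_t stub_size, uint64_t hook_code_size )
{
    return page_align_up( stub_size + hook_code_size );
}

/* Total bytes to allocate: padded stub area plus padded image. */
static uint64_t total_region_size( uint64_t stub_size,
                                   uint64_t hook_code_size,
                                   uint64_t image_size )
{
    return beacon_offset( stub_size, hook_code_size )
         + page_align_up( image_size );
}
```

With a 24-byte STUB and a made-up 0x500 bytes of hook code, beacon_offset() returns 0x1000, so Beacon's image would begin exactly one page into the region.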

Key Insight: Why the Stub Sits First

The hook functions need to know where the allocation starts (to encrypt it during sleep) and where the heap is (to redirect GetProcessHeap). By placing the STUB at the start and using position-independent addressing (OFFSET(Stub)), any hook function can find this data without global variables. The OFFSET macro (covered in Module 5) calculates the runtime address of the Stub relative to the current instruction pointer.

Executing Beacon: DllMain

After all loading steps are complete, AceLdr calls Beacon's DllMain entry point with two different reason values:

C - from ace.c executeBeacon()

// First call: Standard DLL initialization
DllMain( beaconBase, DLL_PROCESS_ATTACH, NULL );  // reason = 1

// Second call: Cobalt Strike-specific "start beacon" signal
DllMain( beaconBase, 4, NULL );  // reason = 4 (Beacon init)

DllMain Reason Values

Reason 1 (DLL_PROCESS_ATTACH) is the standard Windows notification that a DLL has been loaded into a process. Beacon uses this to perform its initial setup. Reason 4 is not a standard Windows reason code (Windows defines only 0 through 3); it is a Cobalt Strike-specific convention for User Defined Reflective Loaders (UDRL) that signals Beacon to begin its main C2 communication loop. This two-step initialization allows Beacon to differentiate between "I've been loaded" and "start operating."
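To make the two-step idea concrete, here is a toy entry point that dispatches on the reason value. It is only an illustration of the pattern: mock_dll_main, MODULE_STATE, and the REASON_* names are hypothetical, the code is plain portable C rather than a real DllMain, and it makes no claim about how Beacon handles these calls internally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REASON_PROCESS_ATTACH 1u  /* same value as DLL_PROCESS_ATTACH    */
#define REASON_START          4u  /* hypothetical name for custom value 4 */

/* Toy module state: set up first, start running second. */
typedef struct
{
    bool initialized;
    bool running;
} MODULE_STATE;

static MODULE_STATE g_state;

/* Mock entry point: returns 1 on success, 0 on failure. */
static int mock_dll_main( void *base, uint32_t reason, void *reserved )
{
    (void)base;
    (void)reserved;

    switch( reason )
    {
    case REASON_PROCESS_ATTACH:
        g_state.initialized = true;   /* "I've been loaded" */
        return 1;

    case REASON_START:
        if( !g_state.initialized )    /* this mock rejects start-before-attach */
            return 0;
        g_state.running = true;       /* "start operating" */
        return 1;

    default:
        return 1;                     /* ignore other reasons */
    }
}
```

Calling mock_dll_main with reason 1 and then reason 4 mirrors the order used by executeBeacon(); in this mock, calling reason 4 first fails because setup has not happened yet.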

Pop Quiz: Reflective Loading

Q1: Why does Ace() create a suspended thread instead of calling Loader() directly?

For multithreading performance
Because Loader() requires a separate thread to work
To avoid having the shellcode's call stack visible during Beacon execution
Windows requires DllMain to run on a new thread

If Loader() ran on the original shellcode injection thread, the call stack would trace back to the injected shellcode. By creating a new thread and hijacking its RIP, the Beacon runs on a clean thread with no suspicious stack frames.

Q2: executeBeacon() calls DllMain with two different "reason" values. What are they?

1 (DLL_PROCESS_ATTACH) then 4 (DLL_BEACON_INIT - Cobalt Strike specific)
0 (DLL_PROCESS_DETACH) then 1 (DLL_PROCESS_ATTACH)
1 (DLL_PROCESS_ATTACH) then 2 (DLL_THREAD_ATTACH)
Just 1 (DLL_PROCESS_ATTACH) twice

Reason 1 is standard DLL_PROCESS_ATTACH. Reason 4 is a special Cobalt Strike value that signals Beacon to begin its main C2 loop. This two-step initialization is a Cobalt Strike convention for UDRLs.
