Difficulty: Beginner

Module 2: Remote Function Hooking Concept

Redirect execution flow by patching the first bytes of a function — even in another process.

The Core Idea

Inline function hooking is a technique where you overwrite the beginning of a target function with a jump instruction that redirects execution to your own code. This has been used for decades in debugging, instrumentation, and game modding. ThreadlessInject applies this concept across process boundaries: you hook a function inside a remote process so that when an existing thread in that process calls the hooked function, it runs your shellcode before the original function continues.

What Is Inline Hooking?

At the machine code level, every function begins with a prologue — a sequence of instructions that sets up the stack frame. On x64 Windows, a typical function prologue looks like this:

x86-64 ASM
; Typical x64 function prologue
sub rsp, 0x28         ; 4 bytes - allocate stack space (shadow space + alignment)
mov [rsp+0x30], rcx   ; 5 bytes - save first parameter
mov [rsp+0x38], rdx   ; 5 bytes - save second parameter

; Or for functions that save nonvolatile registers:
mov [rsp+0x08], rbx   ; 5 bytes - save nonvolatile register
push rdi              ; 1 byte  - save nonvolatile register
sub rsp, 0x20         ; 4 bytes - allocate shadow space

Note: The mov edi, edi hot-patch padding sometimes referenced in hooking literature is a 32-bit Windows convention. On x64, this padding does not exist; hot-patching uses different mechanisms.

An inline hook overwrites these first bytes with a jump instruction that redirects execution to a different address. The original bytes that were overwritten are saved so they can be executed later (this is the "trampoline" pattern), allowing the original function to still work after the hook runs.

The Hook Jump: Overwriting the Prologue

On x64, the most common way to redirect execution is with an absolute indirect jump. You need to get to an arbitrary 64-bit address, which requires more than a simple relative JMP (which is limited to a 32-bit signed displacement, or roughly +/- 2GB). The standard approach is:

x86-64 ASM
; Absolute jump via RIP-relative addressing (14 bytes total)
; This is what gets written over the function prologue
jmp [rip+0]           ; FF 25 00 00 00 00  (6 bytes) - jump to address stored at [RIP+0]
dq targetAddress      ; 8 bytes - the 64-bit absolute address to jump to

; After execution of "jmp [rip+0]":
;   RIP is loaded from the 8 bytes immediately following the JMP instruction
;   This gives us a full 64-bit address range
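
The byte layout above can be reproduced in a few lines of C++. This is a minimal sketch that only fills a local 14-byte buffer; the function name `encodeAbsoluteJump` is an assumption made for illustration and does not come from any particular tool.

```cpp
#include <array>
#include <cstdint>

// Builds FF 25 00 00 00 00 followed by the 64-bit target, little-endian.
// encodeAbsoluteJump is a hypothetical helper name used for illustration.
std::array<std::uint8_t, 14> encodeAbsoluteJump(std::uint64_t target) {
    std::array<std::uint8_t, 14> patch{};  // zero-initialised
    patch[0] = 0xFF;                       // opcode
    patch[1] = 0x25;                       // ModRM: jmp qword ptr [rip+disp32]
    // patch[2..5] stay zero: disp32 = 0, so the pointer is read from the
    // 8 bytes that immediately follow this 6-byte instruction.
    for (int i = 0; i < 8; ++i) {
        patch[6 + i] = static_cast<std::uint8_t>(target >> (8 * i));
    }
    return patch;
}
```

For a target of 0x1122334455667788, the buffer holds FF 25 00 00 00 00 88 77 66 55 44 33 22 11.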

This 14-byte absolute JMP is a common approach for generic x64 inline hooking. However, the actual ThreadlessInject tool uses a different strategy: a 5-byte relative CALL instruction (opcode E8 + 4-byte signed offset), which only overwrites 5 bytes of the target function's prologue. To make this work, ThreadlessInject allocates its shellcode loader within +/- 2GB of the target function so the relative offset fits in 32 bits. The CALL also pushes the return address (the hook address plus 5) onto the stack, which the loader pops to find the hook point and restore the original bytes. This course uses the 14-byte absolute JMP approach for pedagogical clarity, because it works wherever the hook stub is allocated.
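
The arithmetic behind the 5-byte CALL can be sketched as follows. This is an illustration of the E8 encoding only, not ThreadlessInject's own code; both function names are hypothetical, and the sketch assumes the caller has already made sure the loader lies within rel32 range of the hook site.

```cpp
#include <array>
#include <cstdint>

// E8 rel32: the displacement is measured from the END of the 5-byte CALL.
// Assumes `loader` is within +/- 2GB of `hookSite + 5` (not checked here).
std::array<std::uint8_t, 5> encodeRelativeCall(std::uint64_t hookSite,
                                               std::uint64_t loader) {
    // Unsigned wrap-around keeps the low 32 bits, which is the two's
    // complement rel32 whenever the real distance fits in 32 bits.
    const std::uint32_t rel32 =
        static_cast<std::uint32_t>(loader - (hookSite + 5));
    return {0xE8,
            static_cast<std::uint8_t>(rel32),
            static_cast<std::uint8_t>(rel32 >> 8),
            static_cast<std::uint8_t>(rel32 >> 16),
            static_cast<std::uint8_t>(rel32 >> 24)};
}

// The CALL pushes the address of the next instruction, so subtracting the
// instruction length recovers the address where the patch was written.
std::uint64_t hookSiteFromReturnAddress(std::uint64_t returnAddress) {
    return returnAddress - 5;
}
```

With a hook site at 0x7FF800001000 and a loader at 0x7FF800000000, the displacement is -0x1005, which encodes as E8 FB EF FF FF.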

Why 14 Bytes?

A relative JMP rel32 (opcode E9) is only 5 bytes, but its signed 32-bit displacement is measured from the end of the instruction, so it can only reach addresses within roughly +/- 2GB of the hook site. Since shellcode could be allocated anywhere in the 64-bit address space, you need the full absolute jump. The FF 25 encoding with a following 8-byte address gives you complete coverage of the address space at the cost of overwriting 14 bytes of the target function's prologue.
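
The trade-off can be expressed as a small range check. The sketch below is illustrative: `fitsRel32` and `patchLength` are hypothetical names, and the addresses in the usage note are made-up values.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// True when `target` is reachable from a 5-byte E9/E8 instruction at `from`.
bool fitsRel32(std::uint64_t from, std::uint64_t target) {
    // Distance from the end of the 5-byte instruction, as a signed value.
    const std::int64_t delta = static_cast<std::int64_t>(target - (from + 5));
    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}

// Number of prologue bytes the patch occupies: 5 for a relative
// instruction, 14 for the FF 25 absolute jump plus its 8-byte address.
std::size_t patchLength(std::uint64_t from, std::uint64_t target) {
    return fitsRel32(from, target) ? 5 : 14;
}
```

A stub 4KB below the hook site gives a patch length of 5. A stub at 0x000001A000000000 for a function at 0x7FF800001000 is far outside the 2GB window and gives 14.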

The Trampoline Pattern

When you overwrite a function's first 14 bytes, you destroy those original instructions. If you simply redirect to your shellcode and never execute the original function, the target process will break — whatever functionality that function provided is now gone. The trampoline solves this by preserving the overwritten bytes and providing a way to execute them after your hook code runs.

Inline Hook with Trampoline Flow

Caller invokes TargetFunc() → JMP to Hook Stub → Execute Shellcode → Execute saved original bytes → JMP back to TargetFunc+14

The trampoline works in four stages:

Redirect: The caller enters TargetFunc() and immediately hits the JMP instruction that was patched in. Execution transfers to the hook stub.
Hook execution: The hook stub saves register state, calls the shellcode payload, then restores register state.
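Original bytes: The saved original prologue bytes are executed from their relocated copy. These are the instructions that were overwritten by the JMP, so the copy must contain whole instructions, and any RIP-relative instruction among them has to be adjusted for its new location.
Resume: A second JMP transfers execution back to TargetFunc+14 (the instruction immediately after the overwritten region), and the original function continues as if nothing happened. If offset 14 falls in the middle of an instruction, the saved region and the resume offset are extended to the next instruction boundary.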
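
The resume offset depends on where instructions end, which a short sketch makes concrete. The instruction lengths are assumed to come from a disassembler; `bytesToRelocate` and `resumeAddress` are hypothetical helper names, not part of any real library.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns how many prologue bytes must be saved so that at least
// `patchSize` bytes are covered without splitting an instruction.
// Returns std::nullopt if the decoded instructions do not cover the patch.
std::optional<std::size_t> bytesToRelocate(
        const std::vector<std::size_t>& instructionLengths,
        std::size_t patchSize = 14) {
    std::size_t total = 0;
    for (std::size_t len : instructionLengths) {
        if (total >= patchSize) break;  // patch already fully covered
        total += len;                   // take the whole instruction
    }
    if (total < patchSize) return std::nullopt;
    return total;
}

// Address the trampoline jumps back to after running the saved bytes.
std::uint64_t resumeAddress(std::uint64_t targetFunc, std::size_t relocated) {
    return targetFunc + relocated;
}
```

The first prologue shown earlier has instruction lengths 4, 5 and 5, so exactly 14 bytes are saved and execution resumes at TargetFunc+14. The second prologue (5, 1, 4) covers only 10 bytes, so the following instructions must be included as well; if those were, say, 3 and 7 bytes long (made-up lengths), 20 bytes would be saved and execution would resume at TargetFunc+20.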

Local vs. Remote Hooking

In a local hook (hooking a function in your own process), you have direct memory access. You can simply use memcpy to save original bytes, VirtualProtect to make the page writable, and write your jump. Remote hooking — hooking a function in another process — requires cross-process memory APIs:

Operation	Local (Same Process)	Remote (Cross-Process)
Read original bytes	`memcpy()`	`NtReadVirtualMemory()`
Allocate hook stub	`VirtualAlloc()`	`NtAllocateVirtualMemory()`
Write hook stub	`memcpy()`	`NtWriteVirtualMemory()`
Change page protection	`VirtualProtect()`	`NtProtectVirtualMemory()`
Write JMP (install hook)	`memcpy()`	`NtWriteVirtualMemory()`

ThreadlessInject uses the Nt* native API variants throughout. These are the lowest-level user-mode functions: each is a thin stub exported by ntdll.dll that issues the syscall into the kernel. Calling them directly skips the kernel32.dll wrappers and any userland hooks that EDR products have placed there, but not hooks placed on the ntdll.dll stubs themselves.
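
The table shows that the save-then-patch sequence is the same in both cases and only the memory primitive changes. One way to picture that is an interface with interchangeable backends. The sketch below is purely illustrative: `MemoryAccess`, `BufferMemory` and `swapBytes` are hypothetical names, and the only backend shown is a plain in-memory buffer standing in for a code region.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical abstraction: the hook logic only needs read and write.
struct MemoryAccess {
    virtual ~MemoryAccess() = default;
    virtual void read(std::size_t offset, std::uint8_t* out, std::size_t size) = 0;
    virtual void write(std::size_t offset, const std::uint8_t* data, std::size_t size) = 0;
};

// Stand-in backend over a local buffer (no real process memory involved).
struct BufferMemory : MemoryAccess {
    std::vector<std::uint8_t> bytes;
    explicit BufferMemory(std::vector<std::uint8_t> initial)
        : bytes(std::move(initial)) {}

    void check(std::size_t offset, std::size_t size) const {
        if (offset > bytes.size() || size > bytes.size() - offset)
            throw std::out_of_range("access outside buffer");
    }
    void read(std::size_t offset, std::uint8_t* out, std::size_t size) override {
        check(offset, size);
        std::memcpy(out, bytes.data() + offset, size);
    }
    void write(std::size_t offset, const std::uint8_t* data, std::size_t size) override {
        check(offset, size);
        std::memcpy(bytes.data() + offset, data, size);
    }
};

// Saves the bytes at `offset`, overwrites them with `patch`, and returns
// the saved copy so the same call can later put them back.
std::vector<std::uint8_t> swapBytes(MemoryAccess& mem, std::size_t offset,
                                    const std::vector<std::uint8_t>& patch) {
    std::vector<std::uint8_t> saved(patch.size());
    mem.read(offset, saved.data(), saved.size());
    mem.write(offset, patch.data(), patch.size());
    return saved;
}
```

Calling `swapBytes` with a patch returns the original bytes, and calling it again with that returned vector restores them, which mirrors the install and restore steps regardless of which backend performs the reads and writes.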

Why This Works for Injection

The insight that makes this an injection technique (rather than just a hooking technique) is straightforward: if you hook a function inside a remote process, you do not need to create a thread to execute your code. The target process's own existing threads will do it for you. Every time any thread in the target process calls the hooked function, your code runs. No CreateRemoteThread, no NtCreateThreadEx, no APC queuing. Kernel callbacks registered through PsSetCreateThreadNotifyRoutine never fire because no thread is created.

C++
// Conceptual overview of remote function hooking for injection:
//
// 1. Open target process
// 2. Find address of a frequently-called function (e.g., ntdll!RtlUserThreadStart)
// 3. Read the first 14 bytes of that function (save original prologue)
// 4. Allocate RWX memory in target process for shellcode + hook stub
// 5. Write shellcode + hook stub + saved original bytes into allocated memory
// 6. Overwrite target function's first 14 bytes with JMP to our hook stub
// 7. Wait... an existing thread calls the function and executes our shellcode
//
// Result: shellcode runs in target process with ZERO new threads created

Key Advantage: Execution on Existing Threads

Because your shellcode executes on an existing thread that was already running in the target process, the thread's call stack looks largely normal. The thread was already there, already running, already making function calls. It just happens to take a detour through your shellcode before continuing its normal work. This is fundamentally different from a new thread whose entire existence is attributable to the injection.

Limitations to Keep in Mind

Remote function hooking is not without challenges. The hook is not guaranteed to execute immediately: it runs only when a thread calls the hooked function. If you pick a function that is rarely called, your shellcode might not execute for a long time (or ever). There are also thread safety concerns: if a thread is executing inside the first 14 bytes of the function at the moment they are overwritten, it can end up running a mix of old and new bytes and crash. These challenges are addressed in subsequent modules.
