Difficulty: Beginner

Module 1: The Thread Creation Problem

Every classic injection technique has the same Achilles' heel: creating a thread in a remote process.

Why This Module?

Before understanding ThreadlessInject by CCob (EthicalChaos), you must understand the problem it solves. Traditional process injection techniques rely on creating a new thread in the target process to execute injected code. This single operation generates a cascade of telemetry that modern EDR products exploit ruthlessly. ThreadlessInject exists because, from an attacker's perspective, thread creation is one of the loudest things an injector can do.

The Classic Injection Pattern

Nearly every traditional process injection technique follows the same three-step pattern. First, you allocate memory in the target process. Second, you write your payload (shellcode, or the path of a DLL to be loaded) into that memory. Third, you trigger execution of that payload. This third step, triggering execution, has historically been the most detectable, because the most common approach is to create a new thread in the remote process.

The canonical implementation uses CreateRemoteThread, a Win32 API exported by kernel32.dll. This function asks the kernel to create a new thread in a specified process, starting execution at an address you control:

```cpp
// Classic injection: allocate, write, execute via new thread
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, targetPid);

// Step 1: Allocate memory in target
LPVOID remoteBuf = VirtualAllocEx(hProcess, NULL, shellcodeLen,
    MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

// Step 2: Write shellcode
WriteProcessMemory(hProcess, remoteBuf, shellcode, shellcodeLen, NULL);

// Step 3: Create a remote thread to execute it
HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0,
    (LPTHREAD_START_ROUTINE)remoteBuf, NULL, 0, NULL);
```

This pattern works, but it is trivially detectable. The call to CreateRemoteThread triggers a well-documented chain of kernel events that security products monitor.

Kernel-Level Thread Creation Callbacks

Windows provides a documented mechanism for kernel-mode drivers to receive notifications whenever a new thread is created: PsSetCreateThreadNotifyRoutine. Virtually every major EDR product registers a callback with this function. When your CreateRemoteThread call creates a new thread, the kernel walks its list of registered callbacks and invokes each one, passing the owning process ID, the new thread ID, and a boolean indicating creation (TRUE) or deletion (FALSE).

| Kernel Callback | What It Reports | Used By |
|---|---|---|
| PsSetCreateThreadNotifyRoutine | New thread creation: owning process PID, thread ID, create/delete flag | All major EDRs |
| PsSetCreateProcessNotifyRoutineEx | New process creation with full image path | All major EDRs |
| PsSetLoadImageNotifyRoutine | DLL/image loads into any process | All major EDRs |
| ObRegisterCallbacks | Handle operations (OpenProcess, OpenThread) | Most EDRs |

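The critical observation is this: on thread creation, the notify routine runs in the context of the thread that requested the new thread. When process A creates a thread in process B, the driver can therefore compare PsGetCurrentProcessId() (process A) with the ProcessId argument (process B) and see that the creating process differs from the owning process. A process creating threads in itself is normal. A thread created by an external process is immediately suspicious, and EDRs will typically flag it for further analysis or block it outright.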
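The comparison described above can be modeled in ordinary user-mode C++. This is a minimal sketch, not any vendor's logic: the ThreadCreateEvent structure, the classify function, and the initialThreadOfNewProcess flag are hypothetical. In a real driver, ownerPid would be the ProcessId argument of the notify routine and creatorPid would come from PsGetCurrentProcessId(). The extra flag stands for one legitimate exception a real rule has to tolerate: the initial thread of a newly created process is typically created from the parent's context, so creator and owner differ without any injection taking place.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of what a thread-notify callback can observe.
struct ThreadCreateEvent {
    std::uint32_t creatorPid;        // process whose thread requested the creation
    std::uint32_t ownerPid;          // process that will own the new thread
    bool create;                     // true on creation, false on deletion
    bool initialThreadOfNewProcess;  // assumption: tracked from process-notify data
};

enum class Verdict { Ignore, Benign, SuspiciousRemoteThread };

// Toy classification rule; not a real product's detection logic.
Verdict classify(const ThreadCreateEvent& e) {
    if (!e.create) {
        return Verdict::Ignore;              // thread exits are not scored here
    }
    if (e.creatorPid == e.ownerPid) {
        return Verdict::Benign;              // a process creating its own thread
    }
    if (e.initialThreadOfNewProcess) {
        return Verdict::Benign;              // parent creating a child's first thread
    }
    return Verdict::SuspiciousRemoteThread;  // creator PID != owner PID
}
```

A production rule would add allow-lists and combine this verdict with other telemetry rather than acting on it alone.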

Cross-Process Thread Creation = Immediate Alert

EDR products like CrowdStrike Falcon, Microsoft Defender for Endpoint, and SentinelOne all monitor PsSetCreateThreadNotifyRoutine. When they detect cross-process thread creation (creator PID != target PID), this alone is often sufficient to flag the operation as suspicious and trigger deeper behavioral analysis or termination of the offending process.

ETW Telemetry: Even More Visibility

Event Tracing for Windows (ETW) provides a second, independent stream of thread-creation telemetry. The Microsoft-Windows-Kernel-Process ETW provider, which is fed by the kernel and can be consumed from user mode, emits ThreadStart and ThreadStop events for threads system-wide when thread events are enabled. EDR agents subscribe to these events and correlate them with their kernel callback data.

The ThreadStart event includes the new thread's user-mode start address, which for a thread created with CreateRemoteThread is the lpStartAddress argument. If this start address points into a region that was recently allocated with VirtualAllocEx and has executable permissions, detection is near-certain. The sequence "allocate remote RWX memory, write to it, create a thread pointing to it" is one of the best-known attack patterns in the Windows ecosystem.

```cpp
// What the ETW event looks like to the EDR:
// Event: ThreadStart/Start
// Fields:
//   ProcessId:      target.exe (PID 1234)
//   ThreadId:       newly created thread
//   StartAddress:   0x00000213A0010000  <-- points to VirtualAllocEx'd memory
//   StackBase:      ...
//   StackLimit:     ...
//
// EDR correlation:
//   - StartAddress is in private, recently-allocated, RWX memory
//   - Creating process (attacker.exe) != target process (target.exe)
//   - VERDICT: Malicious remote thread injection
```
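The correlation step in the comment block above can be sketched as a small in-memory join between earlier allocation records and a later thread-start event. Everything here is hypothetical: the RemoteAllocation and ThreadStartEvent structures and the InjectionCorrelator class are illustrations, not an EDR's data model. To stay short, the sketch ignores timing (the "recently allocated" part of the rule) and never forgets freed regions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of a cross-process allocation observed earlier.
struct RemoteAllocation {
    std::uint32_t callerPid;  // process that performed the allocation
    std::uint32_t targetPid;  // process the memory lives in
    std::uint64_t base;       // start of the region
    std::uint64_t size;       // region size in bytes
    bool executable;          // protection includes execute rights
};

// Hypothetical subset of a ThreadStart event.
struct ThreadStartEvent {
    std::uint32_t creatorPid;
    std::uint32_t ownerPid;
    std::uint64_t startAddress;
};

class InjectionCorrelator {
public:
    void recordAllocation(const RemoteAllocation& a) { allocs_.push_back(a); }

    // True when a thread created by another process starts inside executable
    // memory that a different process allocated in the thread's owner.
    bool isLikelyInjection(const ThreadStartEvent& t) const {
        if (t.creatorPid == t.ownerPid) {
            return false;  // self-created threads are not scored here
        }
        for (const auto& a : allocs_) {
            bool inRegion = t.startAddress >= a.base &&
                            t.startAddress - a.base < a.size;
            if (a.targetPid == t.ownerPid && a.callerPid != a.targetPid &&
                a.executable && inRegion) {
                return true;
            }
        }
        return false;
    }

private:
    std::vector<RemoteAllocation> allocs_;
};
```

For example, after recording an executable 0x1000-byte allocation at 0x00000213A0010000 made by PID 4321 inside PID 1234, a remote thread started at 0x00000213A0010800 in PID 1234 is flagged, while one started at 0x00000213A0011000 (just past the region) is not.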

Alternatives to CreateRemoteThread (Still Loud)

Attackers have tried many variations to avoid CreateRemoteThread detection, but all share the fundamental problem of creating a new thread or scheduling code execution through monitored mechanisms:

Traditional Injection Execution Methods

CreateRemoteThread: direct remote thread, trivially detected
NtCreateThreadEx: native API, same kernel callback fires
RtlCreateUserThread: undocumented ntdll API, still creates a thread
QueueUserAPC: queues to an existing thread but requires an alertable state
SetThreadContext: hijacks an existing thread, but the cross-process context change is itself suspicious
NtQueueApcThread: native APC queueing, monitored by modern EDRs

Even when attackers use lower-level native API functions like NtCreateThreadEx, the kernel callback still fires, because the actual thread creation happens in the kernel. The PsSetCreateThreadNotifyRoutine callback is triggered by the kernel's internal thread creation path, not by the specific userland API that was called; CreateRemoteThread itself ends up calling NtCreateThreadEx. Switching from CreateRemoteThread to NtCreateThreadEx is therefore merely cosmetic from a detection standpoint.
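A toy model makes the point concrete: several user-mode entry points, one internal creation routine, one set of registered callbacks. The ToyKernel class and the toy* wrapper functions are hypothetical stand-ins named after the real APIs. They are not the real call chains; the only property they preserve is the one stated above, that every entry point reaches the same kernel thread-creation path.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callback shape, loosely mirroring (ProcessId, ThreadId, Create).
using ThreadNotify = std::function<void(int ownerPid, int threadId, bool create)>;

class ToyKernel {
public:
    void registerNotify(ThreadNotify cb) { callbacks_.push_back(std::move(cb)); }

    // Stands in for the kernel's internal thread-creation routine:
    // every registered callback fires, whoever the caller was.
    int createThreadInternal(int ownerPid) {
        int tid = nextTid_++;
        for (auto& cb : callbacks_) {
            cb(ownerPid, tid, true);
        }
        return tid;
    }

private:
    std::vector<ThreadNotify> callbacks_;
    int nextTid_ = 1000;
};

// Hypothetical wrappers named after the real APIs.
int toyNtCreateThreadEx(ToyKernel& k, int pid) { return k.createThreadInternal(pid); }

// The real CreateRemoteThread also ends up in NtCreateThreadEx.
int toyCreateRemoteThread(ToyKernel& k, int pid) { return toyNtCreateThreadEx(k, pid); }

// Modeled as reaching the same kernel path directly.
int toyRtlCreateUserThread(ToyKernel& k, int pid) { return k.createThreadInternal(pid); }
```

Registering one counting callback and calling each wrapper once produces three notifications: the callback cannot tell, and does not care, which entry point was used.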

APC-Based Injection: Better, But Still Detectable

Asynchronous Procedure Calls (APCs) represent a step forward because they execute code on an existing thread rather than creating a new one. However, APCs have their own problems. A queued user-mode APC only runs when the target thread enters an alertable wait state, that is, when it blocks in SleepEx, WaitForSingleObjectEx, or a similar function called with the bAlertable flag set to TRUE. Many threads never do, which makes APC injection unreliable. Additionally, modern EDR products now monitor NtQueueApcThread calls where the calling process differs from the target process.

```cpp
// APC injection: still requires finding an alertable thread
// and the cross-process QueueUserAPC call is monitored
HANDLE hThread = OpenThread(THREAD_SET_CONTEXT, FALSE, targetThreadId);
QueueUserAPC((PAPCFUNC)remoteShellcodeAddr, hThread, 0);
// Problem 1: thread must be in alertable wait state
// Problem 2: cross-process APC queuing is now monitored by EDRs
// Problem 3: if thread never enters alertable state, payload never runs
```
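The alertable-wait requirement can be illustrated with a portable toy model of a thread's APC queue. This is not the Windows implementation: ToyThread and its methods are hypothetical, and they only mimic the documented behavior that a queued user-mode APC stays queued during ordinary waits and runs once the thread enters an alertable wait.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy model of one thread's user-mode APC queue.
class ToyThread {
public:
    // Queuing only enqueues; nothing executes yet.
    void queueApc(std::function<void()> apc) { apcQueue_.push_back(std::move(apc)); }

    // Non-alertable wait (think Sleep or WaitForSingleObject): APCs stay queued.
    void nonAlertableWait() {}

    // Alertable wait (think SleepEx(..., TRUE)): run all pending APCs.
    int alertableWait() {
        int ran = 0;
        while (!apcQueue_.empty()) {
            auto apc = std::move(apcQueue_.front());
            apcQueue_.pop_front();
            apc();
            ++ran;
        }
        return ran;
    }

    std::size_t pending() const { return apcQueue_.size(); }

private:
    std::deque<std::function<void()>> apcQueue_;
};
```

If the modeled thread only ever calls nonAlertableWait, the queued APC sits in the queue forever, which is exactly Problem 3 in the snippet above.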

The ThreadlessInject Insight

What if you could make the target process execute your code without creating a thread and without queuing an APC? What if, instead of telling the target process to run something new, you modified something the target is already doing so that it runs your code as part of its normal operation? This is the core insight behind ThreadlessInject: hook a function that the target process already calls regularly, so that the next time an existing thread calls that function, your code runs.
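The idea of redirecting a call that an existing thread already makes can be shown, in a deliberately harmless form, with a function pointer inside a single process. This is a toy analogy, not ThreadlessInject's implementation, which modifies a function in a remote process rather than swapping a local pointer. Every name below is hypothetical, and restoring the original pointer after the first redirected call is a choice made for this model.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy "process": an existing worker keeps calling through a function pointer,
// loosely analogous to a thread that regularly calls some API.
using Routine = int (*)(int);

int originalRoutine(int x) { return x + 1; }

std::vector<std::string> g_log;       // records that the redirect ran
Routine g_saved = nullptr;            // original target, kept for restoring
Routine g_entry = originalRoutine;    // what the worker actually calls

// Hypothetical replacement: note the call, restore, then behave like the original.
int redirectedRoutine(int x) {
    g_log.push_back("redirected call observed");
    g_entry = g_saved;  // one-shot: later calls go straight to the original
    return g_saved(x);
}

void installRedirect() {
    g_saved = g_entry;
    g_entry = redirectedRoutine;
}

// The existing "thread" body: it never changes, it just calls g_entry.
int workerIteration(int x) { return g_entry(x); }
```

workerIteration itself is never modified and no new thread is involved: the existing caller simply picks up the redirected target on its next call, and its return value is unchanged.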

The Detection Surface Summary

To appreciate why ThreadlessInject is significant, consider the full detection surface of traditional injection:

| Detection Layer | What It Catches | ThreadlessInject Avoids? |
|---|---|---|
| Kernel thread callbacks | Cross-process thread creation | Yes: no new thread |
| ETW thread events | Thread start address in suspicious memory | Yes: no new thread |
| Handle monitoring | OpenProcess with suspicious access rights | Partial: still needs a process handle |
| Memory scanning | RWX memory regions, shellcode patterns | Partial: still allocates memory |
| API hooking (userland) | Calls to CreateRemoteThread, WriteProcessMemory | Partial: Nt* calls bypass kernel32 hooks, but EDRs commonly hook the Nt* stubs in ntdll.dll too |
| Behavioral analysis | Pattern: alloc + write + thread creation | Yes: no thread in the pattern |

ThreadlessInject eliminates the single most detectable component of the injection chain, the thread creation event, while accepting that some detection surface remains: cross-process memory allocation, cross-process writes, and the modification of the hooked function. The trade-off favors the attacker because thread creation has historically been one of the strongest signals available to defenders.

Pop Quiz: The Thread Creation Problem

Q1: Why does switching from CreateRemoteThread to NtCreateThreadEx not evade kernel-level detection?

Callbacks registered with PsSetCreateThreadNotifyRoutine are triggered by the kernel's internal thread creation logic (PspInsertThread), not by any specific userland API. Whether you call CreateRemoteThread, NtCreateThreadEx, or RtlCreateUserThread, the same kernel code path runs and the same callback fires.

Q2: What is the primary limitation of APC-based injection?

APCs only execute when the target thread enters an alertable wait state (by calling SleepEx, WaitForSingleObjectEx, MsgWaitForMultipleObjectsEx, etc. with bAlertable=TRUE). If the thread never enters this state, the queued APC will never run, making APC injection unreliable.

Q3: What is the core innovation of ThreadlessInject?

ThreadlessInject's key innovation is modifying (hooking) a function that the target process already calls regularly. When an existing thread in the target process next calls that hooked function, the hook redirects execution to the attacker's shellcode. No new thread is created, so kernel thread creation callbacks never fire.