Module 3: Thread Pool Internals
TppWorkerThread, work item lifecycle, how the kernel dispatches callbacks, and TP_POOL deep dive.
Module Objective
Understand the internal execution flow of a thread pool worker thread from the moment it is created by the worker factory, through its main loop waiting on the IOCP, to the point where it dispatches a callback. This understanding is critical for knowing exactly where PoolParty inserts its hooks into the dispatch chain.
1. TppWorkerThread — The Worker Main Loop
Every worker thread in a Windows thread pool begins execution at ntdll!TppWorkerThread. This function contains the main dispatch loop that all work items pass through:
C++ (Pseudocode)
// ntdll!TppWorkerThread - simplified pseudocode
VOID TppWorkerThread(PTP_POOL Pool)
{
    // Register this thread with the worker factory
    TppWorkerThreadInit(Pool);

    while (TRUE)
    {
        IO_STATUS_BLOCK iosb;
        PVOID completionKey;
        PVOID overlapped;

        // Wait on the pool's IOCP for the next work item.
        // Simplification: the real loop is believed to wait through the
        // worker factory (NtWaitForWorkViaWorkerFactory); NtRemoveIoCompletion
        // stands in for that call here.
        NTSTATUS status = NtRemoveIoCompletion(
            Pool->CompletionPort,
            &completionKey,
            &overlapped,
            &iosb,
            NULL // infinite wait
        );
        if (!NT_SUCCESS(status))
            break;

        // Determine work item type from the completion packet
        TP_TASK_TYPE type = DecodeTaskType(completionKey, overlapped);

        // The item pointer travels in the completion key, matching the
        // way TpPostWork posts it (see section 2.2)
        switch (type)
        {
        case TP_TASK_WORK:
            TppWorkpExecuteCallback((PTP_WORK)completionKey);
            break;
        case TP_TASK_TIMER:
            TppTimerpExecuteCallback((PTP_TIMER)completionKey);
            break;
        case TP_TASK_WAIT:
            TppWaitpExecuteCallback((PTP_WAIT)completionKey);
            break;
        case TP_TASK_IO:
            TppIopExecuteCallback((PTP_IO)completionKey);
            break;
        case TP_TASK_ALPC:
            TppAlpcpExecuteCallback((PTP_ALPC)completionKey);
            break;
        case TP_TASK_DIRECT:
            TppDirectpExecuteCallback((PTP_DIRECT)completionKey);
            break;
        }

        // Notify the worker factory this thread is available again
        TppWorkerThreadReady(Pool);
    }
}
Key Insight for PoolParty
The worker thread does not validate the origin of completion packets. If a completion packet arrives on the IOCP with a properly formatted work item, the worker thread will dispatch it. This is the fundamental trust assumption PoolParty exploits — completion packets are trusted implicitly.
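The trust assumption can be made concrete with a toy model. The sketch below is portable C++ and deliberately not Windows code: `ToyPool`, `ToyPacket`, `register_item`, `post`, and `drain` are hypothetical names invented for illustration, and the FIFO merely stands in for the IOCP. It shows only the shape of the loop (dequeue, decode, dispatch) and the fact that nothing in that shape records who posted a packet.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a completion packet: which item to run, plus an
// opaque value handed to its callback.
struct ToyPacket {
    std::size_t key;
    int context;
};

// Hypothetical stand-in for a pool: registered callbacks plus a FIFO "port".
class ToyPool {
public:
    // Registers a callback and returns the key that packets must carry.
    std::size_t register_item(std::function<void(int)> callback) {
        assert(callback && "callback must be callable");
        callbacks_.push_back(std::move(callback));
        return callbacks_.size() - 1;
    }

    // Any caller with access to the pool can post; the sender is not recorded.
    void post(ToyPacket packet) { port_.push_back(packet); }

    // Mirrors the worker loop: dequeue, decode, dispatch. Returns the number
    // of callbacks that ran. Packets with an unknown key are dropped.
    int drain() {
        int dispatched = 0;
        while (!port_.empty()) {
            ToyPacket packet = port_.front();
            port_.pop_front();
            if (packet.key >= callbacks_.size())
                continue;
            callbacks_[packet.key](packet.context);
            ++dispatched;
        }
        return dispatched;
    }

private:
    std::vector<std::function<void(int)>> callbacks_;
    std::deque<ToyPacket> port_;
};
```

Posting two packets that carry a registered key and one that carries an unknown key, then calling `drain()`, runs the callback twice. Dropping the unknown key is a choice made for this sketch, not a statement about how ntdll handles malformed packets.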
2. Work Item Lifecycle
A normal work item goes through these stages from creation to callback execution:
Work Item Lifecycle: Allocate TP_WORK → Queue to pool → Post to IOCP → Worker executes
2.1 Allocation (TpAllocWork)
TpAllocWork allocates a TP_WORK structure on the heap, initializes the callback pointer and context, and associates it with a TP_POOL:
C++
// Creating a work item (normal usage)
PTP_WORK workItem = NULL;
TpAllocWork(&workItem, MyCallback, myContext, NULL);
// workItem->Task.WorkCallback = MyCallback
// workItem->Task.Context = myContext
// workItem->Pool = default pool (or specified pool)
2.2 Submission (TpPostWork)
TpPostWork inserts the work item into the pool’s task queue and posts a completion packet to the IOCP to wake a worker thread:
C++ (Pseudocode)
VOID TpPostWork(PTP_WORK Work)
{
    // Insert into the pool's pending task queue
    TppWorkInsertQueue(Work->Pool, Work, TP_PRIORITY_NORMAL);

    // Post a completion packet to wake a worker thread
    NtSetIoCompletion(
        Work->Pool->CompletionPort,
        (PVOID)Work,    // KeyContext (completion key) = work item pointer
        NULL,           // ApcContext
        STATUS_SUCCESS, // IoStatus
        0               // IoStatusInformation
    );
}
2.3 Dispatch
A sleeping worker thread is woken by the IOCP, dequeues the completion packet, extracts the TP_WORK pointer, and calls the registered callback:
C++ (Pseudocode)
VOID TppWorkpExecuteCallback(PTP_WORK Work)
{
    PTP_CALLBACK_INSTANCE instance;
    TppInitCallbackInstance(&instance, Work);

    // Call the user's callback function
    Work->Task.WorkCallback(
        instance,
        Work->Task.Context,
        Work
    );

    TppCleanupCallbackInstance(&instance);
}
3. The Worker Factory Kernel Object
The worker factory is a kernel-mode object type (TpWorkerFactory) that manages the lifecycle of worker threads. Key properties:
| Property | Description | Relevant To |
|---|---|---|
| StartRoutine | Function pointer for new threads (TppWorkerThread) | Variant 1 |
| StartParameter | Parameter passed to StartRoutine (TP_POOL pointer) | Variant 1 |
| MinThreadCount | Minimum number of worker threads | Thread creation trigger |
| MaxThreadCount | Maximum number of worker threads | Scaling limits |
| CompletionPort | The IOCP that workers wait on | All IOCP variants |
| WorkerCount | Current number of active workers | Scaling decisions |
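One plausible reading of the MinThreadCount, MaxThreadCount, and WorkerCount rows is sketched below in portable C++. The policy is an assumption made for illustration (top up to the minimum, allow growth only below the maximum); the kernel's actual scaling logic is not described in this module, and `ToyFactoryState`, `threads_to_create`, and `can_grow` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical snapshot of three properties from the table above.
struct ToyFactoryState {
    unsigned min_threads;   // MinThreadCount
    unsigned max_threads;   // MaxThreadCount
    unsigned worker_count;  // WorkerCount
};

// Assumed policy: if the pool has fewer workers than the minimum, create
// exactly enough to reach it; otherwise create none.
inline unsigned threads_to_create(const ToyFactoryState& s) {
    assert(s.min_threads <= s.max_threads);
    return s.worker_count >= s.min_threads ? 0u
                                           : s.min_threads - s.worker_count;
}

// Assumed policy: demand-driven growth is allowed only below the maximum.
inline bool can_grow(const ToyFactoryState& s) {
    return s.worker_count < s.max_threads;
}
```

Under this model, raising `min_threads` above `worker_count` is the only change that makes `threads_to_create` return a non-zero value, which is why the table lists MinThreadCount as the thread creation trigger.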
3.1 Querying the Worker Factory
The NtQueryInformationWorkerFactory syscall retrieves information about a worker factory, including its start routine and thread counts:
C++
// Query worker factory information
// Illustrative, simplified layout: the real structure has more fields and
// has changed between Windows builds, so do not rely on this field order.
typedef struct _WORKER_FACTORY_BASIC_INFORMATION {
    LARGE_INTEGER Timeout;
    LARGE_INTEGER RetryDelay;
    ULONG IdleWorkerCount;
    ULONG TotalWorkerCount;
    ULONG ActiveWorkerCount;
    ULONG WaitingWorkerCount;
    ULONG PendingWorkerCount;
    PVOID StartRoutine;     // TppWorkerThread normally
    PVOID StartParameter;   // TP_POOL pointer
    HANDLE CompletionPort;  // Pool IOCP
    NTSTATUS LastThreadCreationStatus;
    HANDLE ProcessId;
} WORKER_FACTORY_BASIC_INFORMATION;

WORKER_FACTORY_BASIC_INFORMATION wfInfo;
NTSTATUS status = NtQueryInformationWorkerFactory(
    hWorkerFactory,
    WorkerFactoryBasicInformation,
    &wfInfo,
    sizeof(wfInfo),
    NULL // ReturnLength not needed
);
// wfInfo is only valid if NT_SUCCESS(status)
3.2 Modifying the Worker Factory
NtSetInformationWorkerFactory can modify worker factory properties. Variant 1 uses this API to raise the minimum thread count, which causes the factory to create a new worker thread that begins executing at the factory's StartRoutine:
C++
// Raise the minimum thread count (used by Variant 1)
NtSetInformationWorkerFactory(
    hWorkerFactory,
    WorkerFactoryThreadMinimum, // Information class
    &newMinimum,                // ULONG above the current worker count; triggers new thread creation
    sizeof(ULONG)
);
4. How the Kernel Dispatches Callbacks
The flow from kernel to user-mode callback involves several layers:
Kernel-to-User Callback Dispatch: Timer, I/O, signal → IoSetIoCompletion → NtRemoveIoCompletion → WorkCallback()
- Event occurs: a timer fires, I/O completes, or a wait object is signaled
- Kernel posts to IOCP: the kernel calls IoSetIoCompletion (internal) to post a completion packet to the pool’s IOCP
- Worker thread wakes: NtRemoveIoCompletion returns with the completion packet data
- Dispatch: TppWorkerThread decodes the packet type and calls the appropriate Tpp*ExecuteCallback function
- Callback runs: the user’s callback function executes in the context of the worker thread
5. Locating the Target’s TP_POOL
PoolParty needs to find the target process’s TP_POOL to extract handle values (IOCP, Worker Factory). The approach used by PoolParty involves:
5.1 Handle Enumeration
C++
// Enumerate handles in the target process using NtQueryInformationProcess
// or NtQuerySystemInformation(SystemHandleInformation)
typedef struct _SYSTEM_HANDLE_TABLE_ENTRY_INFO {
    USHORT UniqueProcessId;
    USHORT CreatorBackTraceIndex;
    UCHAR ObjectTypeIndex;
    UCHAR HandleAttributes;
    USHORT HandleValue;
    PVOID Object;
    ULONG GrantedAccess;
} SYSTEM_HANDLE_TABLE_ENTRY_INFO;

// Filter for IoCompletion (IOCP) and TpWorkerFactory object types
// Duplicate the handles into our process for inspection
// (entry is the SYSTEM_HANDLE_TABLE_ENTRY_INFO currently being examined)
HANDLE dupHandle;
DuplicateHandle(hTargetProcess, (HANDLE)(ULONG_PTR)entry.HandleValue,
                GetCurrentProcess(), &dupHandle,
                0, FALSE, DUPLICATE_SAME_ACCESS);
5.2 Identifying the Thread Pool IOCP
Not every IOCP in a process belongs to the thread pool. PoolParty identifies the correct one by querying worker factory objects and checking which IOCP they reference:
C++
// For each worker factory handle found:
WORKER_FACTORY_BASIC_INFORMATION wfbi;
NtQueryInformationWorkerFactory(dupWfHandle,
                                WorkerFactoryBasicInformation,
                                &wfbi, sizeof(wfbi), NULL);
// wfbi.StartRoutine should be ntdll!TppWorkerThread
// wfbi.CompletionPort is the thread pool IOCP
// wfbi.StartParameter is the TP_POOL pointer in the target
6. TP_POOL Deep Dive
The TP_POOL structure (reconstructed from reverse engineering) contains fields that each PoolParty variant targets:
| Field | Offset (x64) | Used By Variant | Purpose |
|---|---|---|---|
| TaskQueue[High] | +0x... | 2 | High-priority work item linked list |
| TaskQueue[Normal] | +0x... | 2 | Normal-priority work item linked list |
| TaskQueue[Low] | +0x... | 2 | Low-priority work item linked list |
| CompletionPort | +0x... | 4, 5, 7 | IOCP handle value |
| TimerQueue | +0x... | 8 | Timer item ordered list |
| WaitObjectList | +0x... | 3 | Wait item list |
Offsets Are Version-Dependent
The exact field offsets within TP_POOL and other internal structures vary between Windows versions. PoolParty uses pattern matching and heuristics rather than hardcoded offsets to maintain compatibility across Windows 10 and 11 builds.
7. The TP_DIRECT Fast Path
In addition to the standard callback types, there is a special TP_DIRECT structure that provides a fast path for execution. Unlike TP_WORK or TP_TIMER, a TP_DIRECT item is dispatched directly from the IOCP completion packet without going through the standard task queue:
C++ (Reconstructed)
struct TP_DIRECT {
    TP_TASK Task; // Callback and context
    // Minimal structure - no pool linkage needed
};
// TP_DIRECT dispatch in TppWorkerThread:
// If the completion packet has a specific signature,
// treat the completion key as a pointer to a TP_DIRECT
// and call its callback immediately.
Variant 7 exploits this fast path. We will cover it in detail in Module 6.
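The difference between the standard path and the fast path can be sketched with a toy model in portable C++. Nothing here is Windows code: `ToyTask`, `ToyDirect`, `ToyWakeup`, `ToyQueuePool`, `post_work`, `post_direct`, and `drain` are hypothetical names, and the two deques only stand in for the task queue and the IOCP. The sketch assumes that a standard item needs a queue insert plus a wake-up packet, while a direct item is reachable from the packet alone.

```cpp
#include <cassert>
#include <deque>

// Hypothetical minimal task: a callback plus its context.
struct ToyTask {
    void (*callback)(int* context);
    int* context;
};

// Hypothetical direct item: just the task, with no queue linkage.
struct ToyDirect {
    ToyTask task;
};

// A packet either points at a direct item, or is a bare wake-up
// (direct == nullptr) telling the worker to pull from the task queue.
struct ToyWakeup {
    ToyDirect* direct;
};

struct ToyQueuePool {
    std::deque<ToyTask> task_queue;  // standard path
    std::deque<ToyWakeup> port;      // stand-in for the IOCP
};

// Standard path: two steps, a queue insert and a wake-up packet.
inline void post_work(ToyQueuePool& pool, ToyTask task) {
    pool.task_queue.push_back(task);
    pool.port.push_back(ToyWakeup{nullptr});
}

// Fast path: one step, the packet itself points at the item.
inline void post_direct(ToyQueuePool& pool, ToyDirect* direct) {
    assert(direct != nullptr);
    pool.port.push_back(ToyWakeup{direct});
}

// Worker loop: returns the number of callbacks that ran.
inline int drain(ToyQueuePool& pool) {
    int ran = 0;
    while (!pool.port.empty()) {
        ToyWakeup packet = pool.port.front();
        pool.port.pop_front();
        if (packet.direct != nullptr) {
            // Fast path: run straight from the packet, queue untouched.
            packet.direct->task.callback(packet.direct->task.context);
        } else if (!pool.task_queue.empty()) {
            // Standard path: the packet was only a wake-up.
            ToyTask task = pool.task_queue.front();
            pool.task_queue.pop_front();
            task.callback(task.context);
        } else {
            continue;  // wake-up with nothing queued
        }
        ++ran;
    }
    return ran;
}
```

In this model a direct item never appears in `task_queue`, which mirrors the statement above that a TP_DIRECT item bypasses the standard task queue.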
Knowledge Check
Q1: What function do all worker threads begin executing at?
Q2: Why does the worker thread trust completion packets from the IOCP?
Q3: How does PoolParty identify which IOCP handle belongs to the thread pool?