Module 2: Windows Thread Pool Architecture
The user-mode thread pool API, its core data structures, and the I/O completion port that drives everything.
Module Objective
Understand the Windows Thread Pool API surface, the relationship between TP_POOL, TP_WORK, TP_TIMER, TP_WAIT, TP_IO, and TP_ALPC objects, how worker threads are managed, and how I/O completion ports (IOCP) serve as the central dispatch mechanism. This is essential groundwork for understanding all 8 PoolParty variants.
1. Why Thread Pools Exist
Creating and destroying threads is expensive. Each thread requires kernel stack allocation, context initialization, and scheduler registration. Thread pools solve this by maintaining a set of pre-created worker threads that wait for work items to process:
- Amortized cost — threads are created once and reused for many operations
- Automatic scaling — the pool grows and shrinks based on workload
- Unified dispatch — all async operations (work, timers, waits, I/O) use the same worker threads
- Default pool — every Windows process has a default thread pool created automatically
The fact that every Windows process has a thread pool is critical for PoolParty — it means every process has injectable infrastructure already running.
2. The Thread Pool API
Windows exposes the thread pool through both the public Win32 API and the internal ntdll.dll Tp* functions. The public API wraps the internal functions:
| Public API | Internal (ntdll) | Purpose |
|---|---|---|
| CreateThreadpoolWork | TpAllocWork | Create a work callback item |
| SubmitThreadpoolWork | TpPostWork | Submit work to the pool queue |
| CreateThreadpoolTimer | TpAllocTimer | Create a timer callback item |
| SetThreadpoolTimer | TpSetTimer | Arm the timer with a due time |
| CreateThreadpoolWait | TpAllocWait | Create a wait callback item |
| SetThreadpoolWait | TpSetWait | Associate a handle to wait on |
| CreateThreadpoolIo | TpAllocIoCompletion | Create an I/O callback item |
| StartThreadpoolIo | TpStartAsyncIoOperation | Begin async I/O tracking |
PoolParty works at the internal Tp* level, directly manipulating the structures these APIs create and manage.
3. Core Data Structures
3.1 TP_POOL
The central structure representing a thread pool instance. Every process has at least one default TP_POOL. Key fields include:
```cpp
// Simplified TP_POOL structure (undocumented, reverse-engineered)
struct TP_POOL {
    TP_TASK_CALLBACKS TaskCallbacks;
    LIST_ENTRY WorkerListHead;   // Linked list of worker threads
    ULONG NumWorkers;            // Current worker count
    ULONG MinWorkers;            // Minimum worker threads
    ULONG MaxWorkers;            // Maximum worker threads
    HANDLE CompletionPort;       // IOCP handle - THE dispatch mechanism
    TP_QUEUE TaskQueue[3];       // High, Normal, Low priority queues
    HANDLE WorkerFactory;        // Worker factory kernel object
    // ... additional fields
};
```
The IOCP Is Everything
The CompletionPort field is the I/O completion port that all worker threads wait on. Every work item, timer expiration, wait satisfaction, and I/O completion is dispatched through this single IOCP. If you can post a completion packet to this IOCP, you can trigger code execution on a worker thread. This is the insight behind several PoolParty variants.
3.2 TP_WORK
Represents a work callback — a function pointer and context to be executed by a worker thread:
```cpp
// Reconstructed TP_WORK and TP_TASK (simplified)
struct TP_WORK {
    TP_TASK Task;            // Contains the callback + context
    TP_POOL *Pool;           // Owning pool
    LIST_ENTRY ListEntry;    // Queue linkage
    ULONG Flags;
    // ...
};

struct TP_TASK {
    TP_CALLBACK_ENVIRON_V3 *CallbackEnviron;
    PTP_WORK_CALLBACK WorkCallback;   // Function to call
    PVOID Context;                    // Argument to callback
    // ...
};
```
3.3 TP_TIMER
A timer-based callback item. When the timer expires, the callback is dispatched through the IOCP:
```cpp
// Reconstructed TP_TIMER (simplified)
struct TP_TIMER {
    TP_TASK Task;                  // Callback + context
    TP_POOL *Pool;
    LARGE_INTEGER DueTime;         // When to fire
    ULONG Period;                  // Recurring interval (0 = one-shot)
    ULONG Window;                  // Coalescing window
    LIST_ENTRY WindowListEntry;    // Timer window list
    // ...
};
```
3.4 TP_WAIT
Waits on a kernel object (event, mutex, semaphore, process). When the object is signaled, the callback fires:
```cpp
// Reconstructed TP_WAIT (simplified)
struct TP_WAIT {
    TP_TASK Task;         // Callback + context
    TP_POOL *Pool;
    HANDLE WaitObject;    // Object to wait on
    HANDLE WaitHandle;    // Registered wait handle
    // ...
};
```
3.5 TP_IO
I/O completion callback. When an async I/O operation completes on a file handle bound to the thread pool, the callback fires:
```cpp
// Reconstructed TP_IO (simplified)
struct TP_IO {
    TP_TASK Task;          // Callback + context
    TP_POOL *Pool;
    HANDLE FileHandle;     // The I/O target
    ULONG PendingCount;    // Outstanding operations
    // ...
};
```
3.6 TP_ALPC
ALPC (Advanced Local Procedure Call) port callback. When a message arrives on a bound ALPC port, the callback fires. This is used internally by Windows for RPC dispatch:
```cpp
// Reconstructed TP_ALPC (simplified)
struct TP_ALPC {
    TP_TASK Task;       // Callback + context
    TP_POOL *Pool;
    HANDLE AlpcPort;    // ALPC port handle
    // ...
};
```
4. I/O Completion Ports (IOCP)
The IOCP is the backbone of the Windows thread pool. It is a kernel object that acts as a queue of completion packets and a thread wakeup mechanism:
IOCP as Central Dispatcher
Every callback source converges on the same queue — each event becomes a completion packet on the pool's IOCP, which a TppWorkerThread instance dequeues and dispatches:
- Work callback → completion packet → TppWorkerThread
- Timer expiry → completion packet → TppWorkerThread
- I/O completion → completion packet → TppWorkerThread
Worker threads call NtRemoveIoCompletion (the native equivalent of GetQueuedCompletionStatus) to block until a completion packet arrives. When one does, the worker thread extracts the work item from the packet and calls the registered callback.
```cpp
// How worker threads consume work (simplified)
for (;;) {
    PVOID completionKey;
    PVOID apcContext;
    IO_STATUS_BLOCK ioStatusBlock;

    // Block until a completion packet is available
    NtRemoveIoCompletion(pool->CompletionPort,
                         &completionKey,
                         &apcContext,
                         &ioStatusBlock,
                         NULL);   // infinite timeout

    // The completion key and APC context encode the work item and its type
    TP_TASK* task = DecodeWorkItem(completionKey, apcContext);
    task->WorkCallback(NULL /* instance */, task->Context);   // dispatch (signature simplified)
}
```
5. Worker Threads and the Worker Factory
Worker threads are not created with CreateThread. Instead, Windows uses a Worker Factory kernel object that manages thread creation and destruction:
| Component | Description |
|---|---|
| Worker Factory | Kernel object (TpWorkerFactory) that creates and manages worker threads for a pool |
| Start Routine | The function new worker threads begin executing — typically TppWorkerThread |
| Min/Max Threads | The factory automatically scales between min and max based on load |
| NtCreateWorkerFactory | Creates the factory, specifying the IOCP and start routine |
```cpp
// Worker factory creation (internal, during pool init)
NTSTATUS NtCreateWorkerFactory(
    PHANDLE WorkerFactoryHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    HANDLE CompletionPortHandle,   // The pool's IOCP
    HANDLE WorkerProcessHandle,    // Process the threads run in
    PVOID StartRoutine,            // TppWorkerThread
    PVOID StartParameter,          // Pool context
    ULONG MaxThreadCount,
    SIZE_T StackReserve,
    SIZE_T StackCommit
);
```
Variant 1 Preview
The Worker Factory’s StartRoutine determines what function new worker threads execute. PoolParty Variant 1 reads the StartRoutine address with NtQueryInformationWorkerFactory, overwrites the code at that address in the target process with shellcode, then uses NtSetInformationWorkerFactory to raise the factory’s minimum thread count — forcing a new worker thread to spawn and begin executing the shellcode directly.
6. The Default Thread Pool
Every Windows process automatically has a default thread pool. It is created lazily when any thread pool API is first called. Key characteristics:
- Location — the default pool object lives in ntdll’s private data; its handles (IOCP, Worker Factory) can be discovered by enumerating the target’s handle table via NtQueryInformationProcess
- Shared — all code in the process that calls CreateThreadpoolWork(NULL, ...) (without a custom pool) shares this pool
- Always present — any non-trivial Windows process (services, GUI apps, browsers) uses it
- Has an IOCP — the pool’s I/O completion port is a handle that PoolParty needs to locate
PoolParty’s first challenge in each variant is to locate the target process’s TP_POOL and extract the relevant handles (IOCP, Worker Factory) from it.
7. Object Relationships
How the Pieces Fit Together
| Object | Lives In | Connected To |
|---|---|---|
| TP_POOL | User-mode heap | IOCP handle, Worker Factory handle, task queues |
| TP_WORK | User-mode heap | Back-pointer to TP_POOL, callback + context |
| TP_TIMER | User-mode heap | Back-pointer to TP_POOL, timer parameters |
| TP_WAIT | User-mode heap | Back-pointer to TP_POOL, wait object handle |
| TP_IO | User-mode heap | Back-pointer to TP_POOL, file handle |
| TP_ALPC | User-mode heap | Back-pointer to TP_POOL, ALPC port handle |
| Worker Factory | Kernel | IOCP handle, StartRoutine, thread counts |
| IOCP | Kernel | Completion packet queue, waiting thread list |
The key takeaway: all roads lead to the IOCP. Whether you inject a work item, fire a timer, signal a wait, complete an I/O, or send an ALPC message, the callback dispatch goes through the I/O completion port. Understanding this central role is essential for understanding every PoolParty variant.
Knowledge Check
Q1: What kernel object serves as the central dispatch mechanism for all thread pool callbacks?
Q2: What internal ntdll function do worker threads call to wait for work items?
Q3: Why is the default thread pool significant for PoolParty?