Difficulty: Intermediate

Module 3: LibTCG — The Tradecraft Garden

The shared PE loading library that gives every Crystal Palace loader its foundation — PE parsing, section loading, import resolution, relocations, and PEB-based hash resolution.

Module Objective

LibTCG (Library for the Tradecraft Garden) is the beating heart of the Crystal-Loaders ecosystem. It ships as a zip archive of pre-compiled relocatable COFF objects that Crystal Palace merges into your loader via mergelib. Every primitive a reflective loader needs lives here: parsing PE headers, mapping sections, processing relocations, resolving imports, and walking the PEB for hash-based API resolution. By the end of this module, you will understand the key functions in LibTCG and how they compose into a complete PE loading pipeline. LibTCG also includes helper functions that follow the same patterns covered here, such as LoadSections (which handles per-section copying as a sub-step of the loading process) and GetDataDirectory (a convenience wrapper for accessing PE data directory entries).

1. What Is LibTCG?

LibTCG stands for Library for the Tradecraft Garden. It is a shared C library that provides the common PE loading and API resolution primitives needed by any Crystal Palace-based loader. Rather than writing manual mapping code from scratch each time, LibTCG encapsulates the entire process in a clean, reusable interface.

Key Characteristics

When Crystal Palace processes a mergelib directive, it extracts the COFF objects from the zip archive and links them directly into your PIC output. The functions become available as if they were defined in your own source files. This modular approach means you can write a loader that is only a few dozen lines of C, delegating all the heavy lifting to LibTCG.

2. Key Types and Macros

The tcg.h header defines several structures, typedefs, and macros that the entire LibTCG API depends on. Understanding these is essential before looking at any function implementation.

IMPORTFUNCS

This structure holds the two Win32 API function pointers that LibTCG needs for import resolution. These are resolved by the loader before calling any LibTCG PE loading functions:

C — tcg.h
typedef struct {
    __typeof__(LoadLibraryA)   * LoadLibraryA;     // Pointer to kernel32!LoadLibraryA
    __typeof__(GetProcAddress) * GetProcAddress;   // Pointer to kernel32!GetProcAddress
} IMPORTFUNCS;

Why Only Two Functions?

With LoadLibraryA and GetProcAddress, you can resolve any other Win32 API. LoadLibraryA loads a DLL by name and returns its base address. GetProcAddress takes a module handle and function name and returns the function pointer. Together, they bootstrap the entire IAT. The loader resolves these two from the PEB using findModuleByHash() and findFunctionByHash() before calling any LibTCG import processing.

DLLDATA

This structure is populated by ParseDLL() and carries parsed PE header pointers through the entire loading pipeline:

C — tcg.h
typedef struct {
    IMAGE_DOS_HEADER      * DosHeader;       // Pointer to the MZ header
    IMAGE_NT_HEADERS      * NtHeaders;       // Pointer to PE\0\0 + COFF + Optional
    IMAGE_OPTIONAL_HEADER * OptionalHeader;  // Pointer to the Optional Header
} DLLDATA;

Every subsequent LibTCG function takes a pointer to a DLLDATA struct. It acts as a parsed view into the raw PE bytes, avoiding redundant re-parsing at each stage.

Function Pointer Typedefs

C — tcg.h
// Standard DllMain signature for reflective DLL loading
typedef BOOL WINAPI (*DLLMAIN_FUNC)(HINSTANCE, DWORD, LPVOID);

// Entry point for PIC Object (PICO) payloads
typedef void (* PICOMAIN_FUNC)(char * arg);

Two Entry Point Models

DLLMAIN_FUNC is the classic DLL entry point called with DLL_PROCESS_ATTACH after a reflective load completes. PICOMAIN_FUNC is the simpler entry point used by PICO (Position Independent Code Object) payloads, which receive a single string argument. The loader decides which to call based on the payload type.

Utility Macros

C — tcg.h
// Pointer arithmetic: advance pointer x by y bytes
#define PTR_OFFSET(x, y)  ( (void *)(x) + (ULONG)(y) )

// Dereference a pointer-sized value at the given address
#define DEREF(name)        *(UINT_PTR *)(name)

// Get the caller's return address (MinGW built-in)
#define WIN_GET_CALLER()   __builtin_extract_return_addr(__builtin_return_address(0))

Macro Breakdown

Macro | Purpose | Usage
PTR_OFFSET(x, y) | Byte-level pointer arithmetic | Navigate from a base address to an RVA: PTR_OFFSET(moduleBase, rva)
DEREF(name) | Read a pointer-sized value from memory | Read IAT entries, function pointers from tables
WIN_GET_CALLER() | Retrieve the return address of the current function | Used for call stack introspection and validation
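
The first two macros can be tried without a Windows toolchain. The sketch below is a minimal, portable illustration, not the tcg.h code: PTR_OFFSET_P, DEREF_P, read_slot, and demo_read_back are hypothetical names introduced here. The original PTR_OFFSET relies on GNU C's arithmetic on void pointers, so this variant casts through unsigned char * instead, and uintptr_t stands in for UINT_PTR.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-ins for the tcg.h macros (assumed names). */
#define PTR_OFFSET_P(x, y) ((void *)((unsigned char *)(x) + (size_t)(y)))
#define DEREF_P(addr)      (*(uintptr_t *)(addr))

/* Read the pointer-sized slot located `rva` bytes past `base`. */
static uintptr_t read_slot(void *base, size_t rva)
{
    return DEREF_P(PTR_OFFSET_P(base, rva));
}

/* Build a tiny fake "image" whose second slot holds `value`,
 * then read it back through the two macros. */
static uintptr_t demo_read_back(uintptr_t value)
{
    uintptr_t image[4];
    memset(image, 0, sizeof image);
    image[1] = value;
    return read_slot(image, sizeof(uintptr_t));
}
```

Calling demo_read_back(0x1234) returns 0x1234: the offset is applied in bytes and the dereference reads one pointer-sized value, which is the same pattern used to read an IAT slot at moduleBase + rva.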

3. PE Loading Pipeline

LibTCG implements the full manual mapping pipeline as a series of composable functions. Each function handles one stage of the PE loading process. The loader calls them in sequence to transform raw PE bytes into an executable image in memory.

LibTCG PE Loading Pipeline

  1. ParseDLL(): validate and parse headers
  2. SizeOfDLL(): get virtual image size
  3. LoadDLL(): map headers + sections
  4. ProcessRelocations(): fix base delta
  5. ProcessImports(): resolve IAT
  6. EntryPoint(): calculate DllMain address

ParseDLL(src, &data)

The first step in the pipeline. ParseDLL takes a pointer to the raw PE bytes and populates the DLLDATA struct with pointers into the PE structure:

C
// ParseDLL validates the DOS header and locates the NT headers
void ParseDLL(char * src, DLLDATA * data)
{
    // 1. Cast the start of the buffer to a DOS header
    data->DosHeader = (IMAGE_DOS_HEADER *)src;

    // 2. Validate the MZ magic number (0x5A4D)
    //    If this check fails, the payload is not a valid PE

    // 3. Use e_lfanew to locate the NT headers
    //    e_lfanew is a LONG at offset 0x3C in the DOS header
    //    It contains the file offset to the PE signature
    data->NtHeaders = (IMAGE_NT_HEADERS *)PTR_OFFSET(src,
                          data->DosHeader->e_lfanew);

    // 4. Extract the Optional Header pointer
    //    The Optional Header immediately follows the COFF File Header
    data->OptionalHeader = &data->NtHeaders->OptionalHeader;
}

DOS Header Validation

The file starts with the bytes 0x4D 0x5A ("MZ"), which read as the little-endian WORD 0x5A4D (IMAGE_DOS_SIGNATURE) and identify it as a DOS executable. The e_lfanew field at offset 0x3C is a 4-byte file offset pointing to the PE signature (0x00004550 = "PE\0\0"). Every PE file, whether a DLL or EXE, begins with this same DOS header structure. If e_lfanew points outside the buffer or the PE signature doesn't match, the file is corrupt or not a PE.
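
The checks described above can be sketched as a small bounds-checked validator. This is an illustrative helper, not part of LibTCG: looks_like_pe, rd16, and rd32 are hypothetical names, and the function takes an explicit buffer length, which the ParseDLL signature shown earlier does not carry.

```c
#include <stddef.h>
#include <stdint.h>

#define DOS_MAGIC     0x5A4Du      /* "MZ" read as a little-endian WORD   */
#define PE_SIGNATURE  0x00004550u  /* "PE\0\0" read as a little-endian DWORD */
#define E_LFANEW_OFF  0x3Cu        /* offset of e_lfanew in the DOS header */

/* Little-endian reads built byte by byte, so no alignment or
 * host-endianness assumptions are needed. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 if buf looks like a PE file, else 0. */
static int looks_like_pe(const unsigned char *buf, size_t len)
{
    uint32_t lfanew;

    if (buf == NULL || len < E_LFANEW_OFF + 4)
        return 0;                       /* too short to hold e_lfanew */

    if (rd16(buf) != DOS_MAGIC)
        return 0;                       /* missing "MZ" */

    /* e_lfanew is a signed LONG; read as unsigned, a negative value
     * becomes very large and fails the bounds check below. */
    lfanew = rd32(buf + E_LFANEW_OFF);
    if (lfanew > len - 4)
        return 0;                       /* signature would lie outside buf */

    return rd32(buf + lfanew) == PE_SIGNATURE;
}
```

A loader that receives its payload from an untrusted or possibly truncated source might run a check like this before handing the buffer to the parsing stage.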

SizeOfDLL(&data)

Returns the total virtual size needed for the fully loaded image:

C
// SizeOfDLL returns the total memory required for the mapped image
DWORD SizeOfDLL(DLLDATA * data)
{
    // SizeOfImage from the Optional Header tells us the total virtual
    // size of the PE when loaded into memory, including all sections
    // aligned to SectionAlignment boundaries
    return data->OptionalHeader->SizeOfImage;
}

The caller uses this value to allocate a contiguous memory region (via VirtualAlloc or NtAllocateVirtualMemory) before calling LoadDLL. Because SizeOfImage accounts for section alignment padding, it is usually larger than the raw file size, although a file that carries a large overlay or other non-loaded data can be bigger on disk than in memory.

LoadDLL(&dll, src, dst)

Copies the PE headers and all sections from the source buffer to the destination buffer at their correct virtual offsets:

C
// LoadDLL maps the PE into the destination buffer
void LoadDLL(DLLDATA * dll, char * src, char * dst)
{
    // 1. Copy the PE headers (DOS + NT + Section Table)
    //    Size = OptionalHeader->SizeOfHeaders
    memcpy(dst, src,
           dll->OptionalHeader->SizeOfHeaders);

    // 2. Get the first section header
    PIMAGE_SECTION_HEADER section = IMAGE_FIRST_SECTION(dll->NtHeaders);
    WORD numSections = dll->NtHeaders->FileHeader.NumberOfSections;

    // 3. Iterate each section and copy raw data to virtual address
    for (WORD i = 0; i < numSections; i++) {
        if (section[i].SizeOfRawData > 0) {
            char * rawSrc  = src + section[i].PointerToRawData;
            char * rawDest = dst + section[i].VirtualAddress;
            memcpy(rawDest, rawSrc, section[i].SizeOfRawData);
        }
    }
}

Virtual Address vs Raw Data

On disk, section data is packed at PointerToRawData offsets aligned to FileAlignment (typically 0x200). In memory, sections sit at VirtualAddress offsets aligned to SectionAlignment (typically 0x1000). The gap between the raw and virtual layouts means sections have padding between them once mapped; that padding is zero only if the destination buffer started out zeroed, as freshly committed VirtualAlloc memory does. This is why SizeOfImage is typically larger than the file size.
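
The alignment arithmetic behind this can be worked through in a few lines. The sketch below is illustrative only: align_up, section_t, and image_size are hypothetical names, and the struct keeps just the two fields needed for the calculation. It assumes the usual layout in which SizeOfImage is the end of the last section rounded up to SectionAlignment.

```c
#include <stdint.h>

/* Round value up to the next multiple of align.
 * align must be a power of two, as FileAlignment and SectionAlignment are. */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical, trimmed-down section descriptor for illustration. */
typedef struct {
    uint32_t virtual_address;  /* RVA where the section is mapped */
    uint32_t virtual_size;     /* bytes the section occupies in memory */
} section_t;

/* Highest section end, rounded up to the section alignment. */
static uint32_t image_size(const section_t *secs, int count,
                           uint32_t section_align)
{
    uint32_t end = 0;
    for (int i = 0; i < count; i++) {
        uint32_t sec_end = secs[i].virtual_address + secs[i].virtual_size;
        if (sec_end > end)
            end = sec_end;
    }
    return align_up(end, section_align);
}
```

For example, a section of 0x1A30 bytes at RVA 0x1000 followed by a section of 0x120 bytes at RVA 0x3000 ends at 0x3120, which rounds up to an image size of 0x4000 with a 0x1000 alignment.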

ProcessRelocations(&dll, src, dst)

When a DLL is loaded at a base address different from its preferred ImageBase, all absolute addresses embedded in the code must be patched. This is what the .reloc section is for:

C
// ProcessRelocations patches absolute addresses for the new base
void ProcessRelocations(DLLDATA * dll, char * src, char * dst)
{
    // 1. Calculate the delta between actual and preferred base
    UINT_PTR delta = (UINT_PTR)dst -
                     dll->OptionalHeader->ImageBase;

    if (delta == 0) return;  // Loaded at preferred base, no fixups needed

    // 2. Locate the Base Relocation Directory
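    PIMAGE_DATA_DIRECTORY relocDir = &dll->OptionalHeader->
        DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];

    // No relocation data in this image: nothing to walk
    if (relocDir->VirtualAddress == 0 || relocDir->Size == 0) return;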

    PIMAGE_BASE_RELOCATION reloc = (PIMAGE_BASE_RELOCATION)
        PTR_OFFSET(dst, relocDir->VirtualAddress);

    // 3. Walk each relocation block
    while (reloc->VirtualAddress != 0) {
        DWORD numEntries = (reloc->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION))
                           / sizeof(WORD);
        PWORD entries = (PWORD)(reloc + 1);

        for (DWORD i = 0; i < numEntries; i++) {
            WORD type   = entries[i] >> 12;        // Top 4 bits = type
            WORD offset = entries[i] & 0x0FFF;     // Bottom 12 bits = offset

            if (type == IMAGE_REL_BASED_DIR64) {   // Type 10: 64-bit fixup
                PUINT_PTR patchAddr = (PUINT_PTR)PTR_OFFSET(
                    dst, reloc->VirtualAddress + offset);
                *patchAddr += delta;               // Apply the delta
            }
        }
        // Advance to next relocation block
        reloc = (PIMAGE_BASE_RELOCATION)PTR_OFFSET(reloc, reloc->SizeOfBlock);
    }
}

Relocation Block Structure

Field | Description
VirtualAddress | Base RVA for this block of relocations (page-aligned)
SizeOfBlock | Total size of this block including the header and all entries
Entries (WORD[]) | Each WORD: top 4 bits = relocation type, bottom 12 bits = offset within the page

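On x64, the only relocation type that needs patching in practice is IMAGE_REL_BASED_DIR64 (type 10), which fixes up a 64-bit absolute address by adding the base delta. Blocks may also contain IMAGE_REL_BASED_ABSOLUTE (type 0) entries; these are padding that keeps each block 4-byte aligned, and they are skipped.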
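
The block format can be exercised on a plain byte buffer. The following is a portable sketch under stated assumptions, not the LibTCG implementation: apply_relocs and the rd/wr helpers are hypothetical names, values are read in host byte order (little-endian on the x64 targets discussed here), and the walk is bounded by the directory size rather than by a zero VirtualAddress sentinel. Patched slots are not range-checked against the image size, which a hardened version would add.

```c
#include <stdint.h>
#include <string.h>

#define REL_ABSOLUTE 0   /* padding entry, skipped */
#define REL_DIR64    10  /* 64-bit absolute address fixup */

/* memcpy-based accessors avoid unaligned pointer casts. */
static uint16_t rd16(const unsigned char *p) { uint16_t v; memcpy(&v, p, 2); return v; }
static uint32_t rd32(const unsigned char *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static uint64_t rd64(const unsigned char *p) { uint64_t v; memcpy(&v, p, 8); return v; }
static void wr64(unsigned char *p, uint64_t v) { memcpy(p, &v, 8); }

/* Apply every DIR64 fixup found in the relocation data located at
 * image + reloc_rva, spanning reloc_size bytes.
 * Returns the number of slots patched. */
static int apply_relocs(unsigned char *image, uint32_t reloc_rva,
                        uint32_t reloc_size, uint64_t delta)
{
    int patched = 0;
    uint32_t pos = 0;

    while (pos + 8 <= reloc_size) {
        const unsigned char *block = image + reloc_rva + pos;
        uint32_t page_rva   = rd32(block);      /* VirtualAddress */
        uint32_t block_size = rd32(block + 4);  /* SizeOfBlock    */

        if (block_size < 8 || block_size > reloc_size - pos)
            break;                              /* malformed block: stop */

        /* Entries are WORDs that follow the 8-byte block header. */
        for (uint32_t off = 8; off + 2 <= block_size; off += 2) {
            uint16_t entry  = rd16(block + off);
            uint16_t type   = entry >> 12;      /* top 4 bits     */
            uint16_t offset = entry & 0x0FFF;   /* bottom 12 bits */

            if (type == REL_DIR64) {
                unsigned char *slot = image + page_rva + offset;
                wr64(slot, rd64(slot) + delta);
                patched++;
            }
            /* REL_ABSOLUTE and any other type are left untouched here. */
        }
        pos += block_size;
    }
    return patched;
}
```

With one block covering page 0x1000 that holds a DIR64 entry at offset 0x010 plus one padding entry, a delta of 0x10000 turns a stored 0x180001234 into 0x180011234 and reports one patched slot.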

ProcessImports(&funcs, &dll, dst)

Walks the Import Directory Table and resolves every imported function into the IAT:

C
// ProcessImports resolves all imported DLLs and functions
void ProcessImports(IMPORTFUNCS * funcs, DLLDATA * dll, char * dst)
{
    // 1. Locate the Import Directory
    PIMAGE_DATA_DIRECTORY importDir = &dll->OptionalHeader->
        DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];

    PIMAGE_IMPORT_DESCRIPTOR importDesc = (PIMAGE_IMPORT_DESCRIPTOR)
        PTR_OFFSET(dst, importDir->VirtualAddress);

    // 2. Walk each import descriptor (one per imported DLL)
    while (importDesc->Name != 0) {
        // Resolve the imported DLL by name
        char * dllName = (char *)PTR_OFFSET(dst, importDesc->Name);
        HMODULE hModule = funcs->LoadLibraryA(dllName);

        // 3. Walk the ILT (Import Lookup Table) and IAT simultaneously
        PUINT_PTR thunk    = (PUINT_PTR)PTR_OFFSET(dst,
                              importDesc->OriginalFirstThunk);
        PUINT_PTR iatEntry = (PUINT_PTR)PTR_OFFSET(dst,
                              importDesc->FirstThunk);

        while (*thunk != 0) {
            if (IMAGE_SNAP_BY_ORDINAL(*thunk)) {
                // Import by ordinal
                *iatEntry = (UINT_PTR)funcs->GetProcAddress(hModule,
                             (LPCSTR)IMAGE_ORDINAL(*thunk));
            } else {
                // Import by name
                PIMAGE_IMPORT_BY_NAME nameEntry = (PIMAGE_IMPORT_BY_NAME)
                    PTR_OFFSET(dst, *thunk);
                *iatEntry = (UINT_PTR)funcs->GetProcAddress(hModule, nameEntry->Name);
            }
            thunk++;
            iatEntry++;
        }
        importDesc++;
    }
}

The IAT After Resolution

After ProcessImports completes, every slot in the Import Address Table contains a live function pointer. When the loaded DLL's code calls an imported function like VirtualProtect, it reads the pointer from the IAT and jumps to it. The result matches what the Windows loader produces when it resolves imports. The key difference is that the manually mapped image itself is never registered with the loader: it gets no entry in the PEB module lists and triggers no LDR load notification. Its dependencies are still loaded through LoadLibraryA, so those DLLs go through the normal loader path.
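
The ordinal-versus-name branch inside the import walk comes down to one bit test on each 64-bit thunk. The sketch below shows that decoding in portable C. The function names are hypothetical; the constant has the same value as IMAGE_ORDINAL_FLAG64, and the 31-bit name RVA mask follows the PE format's description of the import lookup table.

```c
#include <stdint.h>

/* Same value as IMAGE_ORDINAL_FLAG64: the top bit marks an ordinal import. */
#define ORDINAL_FLAG64 0x8000000000000000ULL

/* 1 if the thunk imports by ordinal, 0 if it imports by name. */
static int thunk_is_ordinal(uint64_t thunk)
{
    return (thunk & ORDINAL_FLAG64) != 0;
}

/* Ordinal number, held in the low 16 bits of an ordinal thunk. */
static uint16_t thunk_ordinal(uint64_t thunk)
{
    return (uint16_t)(thunk & 0xFFFF);
}

/* RVA of the hint/name record, held in the low 31 bits of a by-name thunk. */
static uint32_t thunk_name_rva(uint64_t thunk)
{
    return (uint32_t)(thunk & 0x7FFFFFFF);
}
```

A thunk of 0x8000000000000010 therefore means "ordinal 16", while 0x0000000000012345 means "the name record at RVA 0x12345".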

EntryPoint(&dll, base)

Computes the address of the DLL's entry point in the mapped image and returns it as a DLLMAIN_FUNC pointer ready to call:

C
// EntryPoint calculates the entry point address
DLLMAIN_FUNC EntryPoint(DLLDATA * dll, void * base)
{
    // AddressOfEntryPoint is an RVA in the Optional Header
    // Add it to the base to get the absolute address
    return (DLLMAIN_FUNC)PTR_OFFSET(base,
                      dll->OptionalHeader->AddressOfEntryPoint);
}

The returned DLLMAIN_FUNC pointer is called directly with DLL_PROCESS_ATTACH. For a Cobalt Strike Beacon DLL, this is where execution begins after the reflective load completes.

4. PEB Walking — findModuleByHash()

Before the PE loading pipeline can run, the loader needs LoadLibraryA and GetProcAddress. But you cannot call GetProcAddress to find GetProcAddress — that is circular. The solution is to walk the Process Environment Block (PEB) to find loaded modules by hash, then walk their Export Address Table to find functions by hash.

C
char * findModuleByHash(DWORD moduleHash)
{
    // 1. Read the PEB from the GS segment register (x64)
    //    On x64, gs:[0x60] points to the PEB
    PPEB pPeb = (PPEB)__readgsqword(0x60);

    // 2. Access the Ldr (PEB_LDR_DATA) which tracks all loaded modules
    //    InMemoryOrderModuleList is a doubly-linked list of
    //    LDR_DATA_TABLE_ENTRY structures
    PLIST_ENTRY head  = &pPeb->Ldr->InMemoryOrderModuleList;
    PLIST_ENTRY entry = head->Flink;

    // 3. Walk the linked list
    while (entry != head) {
        // CONTAINING_RECORD macro: given a list entry pointer,
        // recover the enclosing LDR_DATA_TABLE_ENTRY
        PLDR_DATA_TABLE_ENTRY mod = CONTAINING_RECORD(
            entry, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);

        // 4. Hash the module's BaseDllName (Unicode, case-insensitive)
        //    using the ROR13 algorithm
        DWORD nameHash = ror13_unicode(mod->BaseDllName.Buffer,
                                       mod->BaseDllName.Length);

        // 5. Compare against the target hash
        if (nameHash == moduleHash)
            return mod->DllBase;  // Return the module's base address

        entry = entry->Flink;    // Advance to next module
    }
    return NULL;  // Module not found
}

PEB Structure Chain (x64)

TEB (gs:[0x60]) → PEB (ProcessEnvironmentBlock) → PEB_LDR_DATA (Ldr) → InMemoryOrderModuleList (LIST_ENTRY head) → LDR_DATA_TABLE_ENTRY (DllBase + BaseDllName)

The Module List Walk

Step | Structure | Access
1 | Thread Environment Block (TEB) | __readgsqword(0x60) on x64: the GS segment register points to the TEB, and offset 0x60 holds the PEB pointer
2 | Process Environment Block (PEB) | pPeb->Ldr: pointer to PEB_LDR_DATA, the loader data structure
3 | PEB_LDR_DATA | Ldr->InMemoryOrderModuleList: head of the doubly-linked list of loaded modules
4 | LDR_DATA_TABLE_ENTRY | Each entry has DllBase (base address), BaseDllName (Unicode name), FullDllName, and other metadata

Why InMemoryOrderModuleList?

The PEB_LDR_DATA has three module lists: InLoadOrderModuleList, InMemoryOrderModuleList, and InInitializationOrderModuleList. LibTCG uses InMemoryOrderModuleList because it is the most commonly used in shellcode and is well-documented. The order typically starts with the executable itself, then ntdll.dll, then kernel32.dll. This predictable order means the loader will find kernel32 within the first few iterations.
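
The list walk, and the CONTAINING_RECORD step in particular, can be tried outside Windows with stand-in structures. Everything in this sketch is hypothetical and simplified: list_entry and module_rec only mimic the shape of LIST_ENTRY and LDR_DATA_TABLE_ENTRY, RECORD_OF plays the role of CONTAINING_RECORD, and each record stores a precomputed hash where the real walk hashes BaseDllName.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for LIST_ENTRY: a circular doubly-linked list node. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Stand-in for LDR_DATA_TABLE_ENTRY (hypothetical layout). */
typedef struct {
    list_entry  load_links;    /* a different list threaded through the record */
    list_entry  memory_links;  /* the list this sketch walks */
    void       *base;
    uint32_t    name_hash;     /* precomputed for the demo */
} module_rec;

/* Same idea as CONTAINING_RECORD: step back from a member's address
 * to the start of the enclosing structure. */
#define RECORD_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Append node at the tail of the circular list rooted at head. */
static void list_append(list_entry *head, list_entry *node)
{
    node->flink = head;
    node->blink = head->blink;
    head->blink->flink = node;
    head->blink = node;
}

/* Walk from head->flink until the walk returns to head. */
static void *find_module(list_entry *head, uint32_t wanted)
{
    for (list_entry *e = head->flink; e != head; e = e->flink) {
        module_rec *m = RECORD_OF(e, module_rec, memory_links);
        if (m->name_hash == wanted)
            return m->base;
    }
    return NULL;
}
```

The subtraction in RECORD_OF matters because the list pointers refer to the memory_links member, which is not the first field of the record. Treating a list pointer as a pointer to the record itself would read every field at the wrong offset.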

5. EAT Walking — findFunctionByHash()

Once findModuleByHash() returns a module's base address, findFunctionByHash() searches that module's Export Address Table (EAT) for a function matching a given ROR13 hash:

C
void * findFunctionByHash(char * src, DWORD wantedFunction)
{
    // 1. Parse PE headers to find the Export Directory
    PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)src;
    PIMAGE_NT_HEADERS nt  = (PIMAGE_NT_HEADERS)
        PTR_OFFSET(src, dos->e_lfanew);

    PIMAGE_EXPORT_DIRECTORY exports = (PIMAGE_EXPORT_DIRECTORY)
        PTR_OFFSET(src,
            nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT]
            .VirtualAddress);

    // 2. Retrieve the three parallel arrays
    PDWORD names     = (PDWORD)PTR_OFFSET(src, exports->AddressOfNames);
    PDWORD functions = (PDWORD)PTR_OFFSET(src, exports->AddressOfFunctions);
    PWORD  ordinals  = (PWORD) PTR_OFFSET(src, exports->AddressOfNameOrdinals);

    // 3. Iterate through all named exports
    for (DWORD i = 0; i < exports->NumberOfNames; i++) {
        // Resolve the function name string
        char * name = (char *)PTR_OFFSET(src, names[i]);

        // Hash the name and compare
        if (ror13_ascii(name) == wantedFunction)
            return PTR_OFFSET(src, functions[ordinals[i]]);
    }
    return NULL;  // Function not found
}

The Three Parallel Arrays

The Export Directory uses three arrays that work together to map function names to their code addresses. Understanding how they link together is critical:

Export Address Table — Three Parallel Arrays

AddressOfNames

[0] RVA → "CreateFileW"
[1] RVA → "GetProcAddress"
[2] RVA → "LoadLibraryA"

AddressOfNameOrdinals

[0] ordinal: 42
[1] ordinal: 187
[2] ordinal: 215

AddressOfFunctions

[42] RVA → code
[187] RVA → code
[215] RVA → code

How the Lookup Works

  1. AddressOfNames[i] is an RVA pointing to a null-terminated ASCII string — the function's export name
  2. AddressOfNameOrdinals[i] is a WORD index that maps the name at position i to an entry in the functions array
  3. AddressOfFunctions[ordinals[i]] is an RVA pointing to the actual function code

The names array is sorted alphabetically (enabling binary search by the Windows loader), but the functions array is indexed by ordinal, not alphabetically. The ordinals array bridges the two. When LibTCG finds a matching hash at index i, it reads ordinals[i] and uses that as the index into the functions array.
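
The indirection through the ordinals array is easy to get wrong, so here is a minimal, portable model of it. The export_table type and lookup_export function are hypothetical, the table holds string pointers and plain numbers instead of RVAs into a mapped module, and strcmp stands in for the hash comparison to keep the example readable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory model of the three export arrays. */
typedef struct {
    const char    **names;      /* sorted by name, like AddressOfNames     */
    const uint16_t *ordinals;   /* parallel to names: index into functions */
    const uint32_t *functions;  /* indexed by ordinals[i], not by i        */
    uint32_t        name_count; /* like NumberOfNames                      */
} export_table;

/* Returns the function RVA for `wanted`, or 0 if the name is not exported. */
static uint32_t lookup_export(const export_table *t, const char *wanted)
{
    for (uint32_t i = 0; i < t->name_count; i++) {
        if (strcmp(t->names[i], wanted) == 0)
            return t->functions[t->ordinals[i]];  /* the bridging step */
    }
    return 0;
}
```

With names {"CreateFileW", "GetProcAddress", "LoadLibraryA"}, ordinals {2, 0, 1}, and functions {0x2000, 0x3000, 0x1000}, looking up "GetProcAddress" matches at i = 1, reads ordinals[1] = 0, and returns functions[0] = 0x2000. Indexing functions directly with i would return 0x3000, which belongs to a different export.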

Forwarded Exports

If an AddressOfFunctions entry points within the export directory's own memory range (between the directory's VirtualAddress and VirtualAddress + Size), it is a forwarded export — a string like "NTDLL.RtlAllocateHeap" instead of a code address. LibTCG's basic implementation does not handle forwarded exports. For Crystal-Loaders this is acceptable because the functions it resolves (LoadLibraryA, GetProcAddress) are not forwarded in kernel32.dll.
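
The range test that identifies a forwarder is a single comparison. The helper below is a hypothetical sketch of that check (is_forwarded_export is not a LibTCG function); a resolver that wanted to handle forwarders would run it on the RVA before treating the result as a code address.

```c
#include <stdint.h>

/* 1 if func_rva lies inside [dir_rva, dir_rva + dir_size), meaning the
 * entry points at a forwarder string rather than at code. */
static int is_forwarded_export(uint32_t func_rva, uint32_t dir_rva,
                               uint32_t dir_size)
{
    /* Written as a subtraction so dir_rva + dir_size cannot overflow. */
    return func_rva >= dir_rva && func_rva - dir_rva < dir_size;
}
```

Here dir_rva and dir_size correspond to the VirtualAddress and Size fields of the export data directory entry.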

6. The ROR13 Hash Algorithm

LibTCG uses the ROR13 (Rotate Right by 13) hash algorithm to identify modules and functions without embedding their plaintext names in the shellcode. This is the same algorithm used in Metasploit's block_api stager and has been a shellcode convention for over a decade.

C
// ROR13 hash for ASCII strings (function names)
DWORD ror13_ascii(const char * str)
{
    DWORD hash = 0;
    while (*str) {
        hash = (hash >> 13) | (hash << 19);  // Rotate right 13 bits
        hash += (DWORD)*str++;                // Add current character
    }
    return hash;
}

// ROR13 hash for Unicode strings (module names, case-insensitive)
DWORD ror13_unicode(const WCHAR * str, DWORD len)
{
    DWORD hash = 0;
    DWORD chars = len / sizeof(WCHAR);
    while (chars--) {
        WCHAR c = *str++;
        if (c >= L'A' && c <= L'Z')
            c += 0x20;  // Convert to lowercase for case-insensitive matching
        hash = (hash >> 13) | (hash << 19);
        hash += (DWORD)c;
    }
    return hash;
}
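
To check the rotate-and-add arithmetic by hand, the same two routines can be written with fixed-width types and run on any platform. This is a portable restatement for experimentation, with hypothetical names (ror13, hash_ascii, hash_utf16_nocase). It uses uint16_t in place of WCHAR, takes a character count instead of a byte length, and adds each character as an unsigned value.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by 13 bits. */
static uint32_t ror13(uint32_t v)
{
    return (v >> 13) | (v << 19);
}

/* Rotate, then add the next character, until the terminating NUL. */
static uint32_t hash_ascii(const char *s)
{
    uint32_t h = 0;
    while (*s) {
        h = ror13(h);
        h += (unsigned char)*s++;
    }
    return h;
}

/* Same scheme over UTF-16 code units, folding A-Z to lowercase first.
 * count is the number of characters, not bytes. */
static uint32_t hash_utf16_nocase(const uint16_t *s, uint32_t count)
{
    uint32_t h = 0;
    while (count--) {
        uint16_t c = *s++;
        if (c >= 'A' && c <= 'Z')
            c += 0x20;
        h = ror13(h);
        h += c;
    }
    return h;
}
```

As a worked example, hashing "AB" starts at 0, rotates to 0 and adds 0x41, then rotates 0x41 right by 13 to get 0x02080000 and adds 0x42, giving 0x02080042. The case folding means the UTF-16 strings "NTDLL" and "ntdll" produce the same hash.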

Why ROR13?

Pre-Computed Hashes in Crystal-Loaders

Crystal Palace's dfr (Dynamic Function Resolution) directive computes ROR13 hashes at build time and embeds them as DWORD immediates in the PIC output. For example, in loader.c:

C — loader.c
// The NTDLL hash used in Crystal-Loaders
#define NTDLL_HASH      0x3CFA685D

// At runtime, the loader calls:
char * ntdll = findModuleByHash(NTDLL_HASH);

// The dfr directive in the .spec file tells Crystal Palace
// to resolve function references the same way:
//   dfr kernel32.dll, LoadLibraryA
//   dfr kernel32.dll, GetProcAddress
// Crystal Palace pre-computes the hashes and generates
// the findModuleByHash + findFunctionByHash calls automatically

Hash Computation at Build Time vs Runtime

When you use dfr in a Crystal Palace spec file, the linker computes the ROR13 hash of both the module name and function name at build time. These hashes are baked into the shellcode as DWORD constants. At runtime, the PIC walks the PEB and EAT, computing hashes of live module/function names and comparing them to the embedded constants. This removes plaintext API strings from the final shellcode, leaving only 4-byte hash values.

7. PICO Functions

Beyond traditional PE/DLL loading, LibTCG also supports PICOs (Position Independent Code Objects). PICOs are a Crystal Kit concept for modular evasion components that differ from standard PIC in one critical way: they keep code and data in separate allocations.

PICO API Overview

Function | Purpose
PicoEntryPoint() | Returns the address of the PICO's entry function, the PICOMAIN_FUNC that receives a single char * argument
PicoCodeSize() | Returns the size of the PICO's code section (executable, read-only at rest)
PicoDataSize() | Returns the size of the PICO's data section (read-write, non-executable)
PicoLoad() | Loads a PICO into two separate memory regions: one for code, one for data

PICO vs PIC: The Split-Memory Model

Standard position-independent code (PIC) places code and data in a single contiguous allocation. This is simple but creates a security problem: the allocation must be both writable (for data) and executable (for code), resulting in RWX memory — a strong detection signal.

PICOs solve this by splitting into two allocations: one for code (executable, read-only at rest) and one for data (read-write, non-executable).

This split avoids RWX and looks far more legitimate to EDR memory scanners. The PICO's code references data via relative offsets that PicoLoad() patches at load time, similar to how relocations work for DLLs.

Crystal Kit uses PICOs for modular evasion primitives — sleep masks, call stack spoofers, and other components that can be composed and swapped independently. LibTCG provides the loading machinery; the PICO format and its tooling come from the broader Crystal Kit ecosystem.

Module 3 Quiz: LibTCG

Q1: What structure does ParseDLL populate with parsed PE header information?

ParseDLL takes a pointer to raw PE bytes and populates a DLLDATA structure containing pointers to the DOS header, NT headers, and Optional Header. This struct is then passed to every subsequent function in the LibTCG pipeline (SizeOfDLL, LoadDLL, ProcessRelocations, ProcessImports, EntryPoint).

Q2: How does findModuleByHash() locate ntdll.dll in a running process?

findModuleByHash() reads the PEB via the GS segment register (gs:[0x60] on x64), then walks the PEB_LDR_DATA's InMemoryOrderModuleList. For each loaded module, it computes the ROR13 hash of the BaseDllName (Unicode, case-insensitive) and compares it against the target hash. When a match is found, it returns the module's DllBase address.

Q3: What are the three parallel arrays in the Export Address Table?

The Export Directory uses three parallel arrays: AddressOfNames (RVAs to null-terminated function name strings), AddressOfFunctions (RVAs to the actual function code), and AddressOfNameOrdinals (WORD indices that map each name to its corresponding entry in the functions array). To look up a function by name, you find the matching name in AddressOfNames at index i, read the ordinal from AddressOfNameOrdinals[i], and use that ordinal as the index into AddressOfFunctions.