Difficulty: Intermediate

Module 5: The Dual-Handler Architecture

Two VEH handlers orchestrate a four-phase syscall redirect — one sets the trap, the other springs it.

The Big Picture

LayeredSyscall does not use a single exception handler. Instead, it registers two Vectored Exception Handlers that fire on completely different exception types and manage distinct responsibilities. Handler 1 (AddHwBp) reacts to ACCESS_VIOLATION and installs hardware breakpoints. Handler 2 (HandlerHwBp) reacts to SINGLE_STEP and orchestrates the entire multi-phase execution flow. This separation of concerns is what makes the technique both elegant and robust.

Architecture Overview

The two handlers cooperate across four distinct phases of execution. Each phase is triggered by a different CPU exception, creating a chain of events that transforms a simple function call into a fully stack-spoofed syscall.

Handler Roles

Handler | Exception Type | Purpose
AddHwBp (Handler 1) | EXCEPTION_ACCESS_VIOLATION | Installs hardware breakpoints on the syscall and ret instructions inside the target Nt* function
HandlerHwBp (Handler 2) | EXCEPTION_SINGLE_STEP | Manages the multi-phase execution flow: context save, trap-flag tracing, context swap, and clean return

Execution Flow Diagram

End-to-End Handler Flow

1. Wrapper: wrpNtXxx()
2. NULL deref: ACCESS_VIOLATION
3. AddHwBp: Install Dr0/Dr1
4. Resume: Call Nt*
5. Syscall BP: SINGLE_STEP
6. HandlerHwBp: Save & Redirect
7. Trace: Trap Flag
8. Execute: Real Syscall

Each box represents a phase transition. The NULL deref (ACCESS_VIOLATION) and Syscall BP (SINGLE_STEP) boxes are CPU exceptions that transfer control to a VEH handler. The final Execute box is the real syscall, issued with a genuine call stack.

The Wrapper Function Pattern

Every syscall that LayeredSyscall wraps follows the exact same four-step pattern. The wrapper is the entry point into the entire system. Here is a representative example for NtAllocateVirtualMemory:

C++
NTSTATUS wrpNtAllocateVirtualMemory(
    HANDLE   ProcessHandle,
    PVOID*   BaseAddress,
    ULONG_PTR ZeroBits,
    PSIZE_T  RegionSize,
    ULONG    AllocationType,
    ULONG    Protect)
{
    // Step 1: Resolve the real ntdll function address
    auto addr = GetProcAddress(
        GetModuleHandleA("ntdll.dll"),
        "NtAllocateVirtualMemory"
    );

    // Step 2: Resolve the System Service Number
    int ssn = GetSsnByName("NtAllocateVirtualMemory");

    // Step 3: Install hardware breakpoints via VEH
    // TRUE = ExtendedArgs (function has >4 arguments)
    SetHwBp((ULONG_PTR)addr, TRUE, ssn);

    // Step 4: Call the REAL function (hits the breakpoint)
    return ((NtAllocateVirtualMemory_t)addr)(
        ProcessHandle, BaseAddress, ZeroBits,
        RegionSize, AllocationType, Protect
    );
}

The Four Steps Explained

Step | Action | Why
1. Resolve Address | GetProcAddress for the Nt* function | We need the real address in ntdll to set breakpoints on the correct instructions
2. Get SSN | GetSsnByName via Exception Directory | We must know the System Service Number to place it in RAX before the syscall instruction
3. Install BPs | SetHwBp triggers the VEH chain | Hardware breakpoints are set on the syscall and ret instructions inside the Nt* stub
4. Call Real Function | Call through the resolved pointer | The real call enters ntdll (EDR hook may inspect it), but execution is intercepted at the syscall instruction

Why Call the Real Function?

The wrapper calls the actual Nt* function in ntdll — not a copy, not a trampoline. If an EDR has hooked that function, the hook executes. However, the hardware breakpoint fires at the syscall instruction past the hook. The EDR sees a legitimate function call; LayeredSyscall intercepts it at the last possible moment before the kernel transition.

SetHwBp and the ACCESS_VIOLATION Trigger

The SetHwBp function stores global state and then deliberately crashes the program with a null pointer dereference. This crash is the entry point into the VEH handler chain.

C++
// Forward declaration: SetHwBp calls _SetHwBp before its definition
void _SetHwBp(ULONG_PTR addr);

// Store global state for the VEH handlers
void SetHwBp(ULONG_PTR addr, BOOL extArgs, int ssn) {
    SyscallEntryAddr = addr;    // Target Nt* function address
    ExtendedArgs     = extArgs; // Does it have >4 arguments?
    SyscallNo        = ssn;     // System Service Number

    _SetHwBp(addr);  // Passes addr as RCX (first argument)
}

// This function deliberately crashes
void _SetHwBp(ULONG_PTR addr) {
    // TRIGGER_ACCESS_VIOLATION_EXCEPTION macro:
    int *a = 0;
    int b = *a;   // Null pointer dereference! Raises ACCESS_VIOLATION
}

Why a Separate _SetHwBp?

The split is deliberate. The _SetHwBp function receives addr as its first parameter, which on x64 means it is stored in the RCX register. When the ACCESS_VIOLATION fires and the VEH handler receives the CONTEXT structure, it can read RCX to recover the target address. This is a clever technique to pass data from normal code into the exception handler context without using additional global variables.

Data Flow Through the Exception

1. SetHwBp(): stores globals
2. _SetHwBp(addr): RCX = addr
3. *NULL = crash: ACCESS_VIOLATION
4. AddHwBp: reads RCX from CONTEXT

Handler 1: AddHwBp — Detailed Walkthrough

This is the first VEH handler. It fires on the deliberate null dereference, scans the target function to locate the syscall and ret instructions, and installs hardware breakpoints on both.

C++
LONG WINAPI AddHwBp(PEXCEPTION_POINTERS ExceptionInfo) {
    // Only handle ACCESS_VIOLATION exceptions
    if (ExceptionInfo->ExceptionRecord->ExceptionCode
            == EXCEPTION_ACCESS_VIOLATION)
    {
        // Read target address from RCX register
        ULONG_PTR funcAddr = ExceptionInfo->ContextRecord->Rcx;

        // Scan forward up to 25 bytes for the syscall opcode
        for (int i = 0; i < 25; i++) {
            // 0x050F = syscall in little-endian byte order
            if (*(USHORT*)((ULONG_PTR)funcAddr + i) == 0x050F) {
                OPCODE_SYSCALL_OFF     = i;
                OPCODE_SYSCALL_RET_OFF = i + 2; // ret is 2 bytes after
                break;
            }
        }

        // Dr0 = breakpoint on syscall instruction
        ExceptionInfo->ContextRecord->Dr0 =
            funcAddr + OPCODE_SYSCALL_OFF;

        // Dr1 = breakpoint on ret instruction (after syscall)
        ExceptionInfo->ContextRecord->Dr1 =
            funcAddr + OPCODE_SYSCALL_RET_OFF;

        // Enable both breakpoints in Dr7
        ExceptionInfo->ContextRecord->Dr7 |= (1 << 0) | (1 << 2);

        // Skip past the null dereference (2-byte instruction)
        ExceptionInfo->ContextRecord->Rip += OPCODE_SZ_ACC_VIO;

        return EXCEPTION_CONTINUE_EXECUTION;
    }

    return EXCEPTION_CONTINUE_SEARCH;
}

Step-by-Step Breakdown

1. Exception Filter

The handler checks that ExceptionCode == EXCEPTION_ACCESS_VIOLATION. Other exception types are passed to the next handler via EXCEPTION_CONTINUE_SEARCH.
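The filtering behavior described above can be illustrated with a small portable simulation. This is only a sketch: the numeric values match the Windows SDK constants, but Dispatch, FakeAddHwBp, and FakeHandlerHwBp are hypothetical stand-ins, and the real dispatcher passes an EXCEPTION_POINTERS structure rather than a bare code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values match the Windows SDK definitions. Everything else in this file is
// a portable simulation, not the real exception dispatcher.
constexpr std::uint32_t kAccessViolation = 0xC0000005; // EXCEPTION_ACCESS_VIOLATION
constexpr std::uint32_t kSingleStep      = 0x80000004; // EXCEPTION_SINGLE_STEP
constexpr long kContinueExecution = -1;                // EXCEPTION_CONTINUE_EXECUTION
constexpr long kContinueSearch    = 0;                 // EXCEPTION_CONTINUE_SEARCH

using Handler = long (*)(std::uint32_t code);

// Hypothetical stand-ins for AddHwBp and HandlerHwBp: each claims one code
// and declines everything else.
inline long FakeAddHwBp(std::uint32_t code) {
    return code == kAccessViolation ? kContinueExecution : kContinueSearch;
}
inline long FakeHandlerHwBp(std::uint32_t code) {
    return code == kSingleStep ? kContinueExecution : kContinueSearch;
}

// Walks the chain in order and returns the index of the handler that claimed
// the exception, or -1 if every handler returned "continue search".
inline int Dispatch(const std::vector<Handler>& chain, std::uint32_t code) {
    for (std::size_t i = 0; i < chain.size(); ++i) {
        assert(chain[i] != nullptr);
        if (chain[i](code) == kContinueExecution) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With a chain of {FakeAddHwBp, FakeHandlerHwBp}, an access violation is claimed by the first handler, a single-step exception falls through to the second, and any other code is declined by both.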

2. Recover the Target Address

The Nt* function address was passed as RCX to _SetHwBp. The CONTEXT structure captures all register values at the point of the exception, so ContextRecord->Rcx holds the target address.

3. Scan for the Syscall Opcode

The handler scans forward from the function address looking for 0x050F — the syscall instruction encoded in little-endian. In a standard ntdll stub, this is typically at offset 0x12 (18 bytes), but the scan allows for variation due to EDR hooks or OS version differences. The scan window is 25 bytes.

4. Record Offsets

OPCODE_SYSCALL_OFF stores the offset to the syscall instruction. OPCODE_SYSCALL_RET_OFF is always syscall_off + 2 because the syscall instruction is 2 bytes (0F 05) and the ret instruction immediately follows.
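The scan and offset bookkeeping can be tried on an ordinary byte buffer. The sketch below is not the project's code: FindSyscall, StubOffsets, and kSampleStub are hypothetical names, the sample bytes assume the stub layout commonly seen on 64-bit Windows 10, and the SSN 0x18 is a placeholder. Unlike the handler above, it compares bytes individually and reports a failed scan instead of leaving the offsets unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

struct StubOffsets {
    std::size_t syscall_off; // offset of the 0F 05 byte pair
    std::size_t ret_off;     // offset of the byte right after syscall
};

// Looks for the byte pair 0F 05 within the first `window` start positions.
// Returns std::nullopt when the pair is not found, so the caller never
// works with stale offsets.
inline std::optional<StubOffsets> FindSyscall(const std::uint8_t* code,
                                              std::size_t len,
                                              std::size_t window = 25) {
    assert(code != nullptr);
    for (std::size_t i = 0; i < window && i + 1 < len; ++i) {
        if (code[i] == 0x0F && code[i + 1] == 0x05) {
            return StubOffsets{i, i + 2};
        }
    }
    return std::nullopt;
}

// Assumed layout of an unhooked stub; the SSN value is a placeholder.
inline constexpr std::uint8_t kSampleStub[] = {
    0x4C, 0x8B, 0xD1,                               // mov r10, rcx
    0xB8, 0x18, 0x00, 0x00, 0x00,                   // mov eax, 0x18
    0xF6, 0x04, 0x25, 0x08, 0x03, 0xFE, 0x7F, 0x01, // test byte ptr [0x7FFE0308], 1
    0x75, 0x03,                                     // jne +3
    0x0F, 0x05,                                     // syscall
    0xC3,                                           // ret
};
```

Calling FindSyscall(kSampleStub, sizeof(kSampleStub)) locates the pair at offset 0x12, with the ret byte at 0x14. A plain byte-pattern scan can in principle match 0F 05 inside an immediate or a hook's jump displacement, so a stricter implementation might also check that the following byte is C3.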

5. Install Hardware Breakpoints

Debug registers Dr0 and Dr1 are set to the absolute addresses of the syscall and ret instructions. Dr7 is the control register — bit 0 enables Dr0, bit 2 enables Dr1. The expression (1 << 0) | (1 << 2) enables both breakpoints.
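The Dr7 arithmetic generalizes to all four debug registers: the local-enable bit for register n sits at bit 2n. The helper names below are hypothetical and the sketch only manipulates an integer, it does not touch real debug registers. It also assumes the R/W and LEN fields in the upper half of Dr7 stay zero, which selects a 1-byte execution breakpoint.

```cpp
#include <cassert>
#include <cstdint>

// Local-enable bit for debug register index n (0..3):
// L0 = bit 0, L1 = bit 2, L2 = bit 4, L3 = bit 6.
constexpr std::uint64_t LocalEnableBit(unsigned n) {
    assert(n < 4);
    return std::uint64_t{1} << (n * 2);
}

// Returns dr7 with the local-enable bits for registers a and b set.
constexpr std::uint64_t EnableBreakpoints(std::uint64_t dr7, unsigned a, unsigned b) {
    return dr7 | LocalEnableBit(a) | LocalEnableBit(b);
}

// Returns dr7 with the local-enable bit for register n cleared.
constexpr std::uint64_t DisableBreakpoint(std::uint64_t dr7, unsigned n) {
    return dr7 & ~LocalEnableBit(n);
}

// Same value as the handler's (1 << 0) | (1 << 2).
static_assert(EnableBreakpoints(0, 0, 1) == 0x5, "Dr0 and Dr1 enabled");
```

Clearing a bit with DisableBreakpoint is the inverse operation a cleanup step would need once the breakpoints are no longer wanted.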

6. Advance RIP Past the Crash

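The null dereference compiles to a 2-byte load, for example mov eax, [rax] (8B 00) with RAX = 0. By adding OPCODE_SZ_ACC_VIO (2) to RIP, execution resumes after the crash as if it never happened. The constant assumes that encoding, so a compiler that emits a longer instruction would need a different value. Control then returns to the wrapper, which proceeds to call the real Nt* function.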

Global State Variables

LayeredSyscall uses several global variables to pass state between the wrapper functions and the two VEH handlers. SetHwBp and AddHwBp set most of them, and HandlerHwBp consumes them:

Variable | Type | Set By | Used By | Purpose
SyscallEntryAddr | ULONG_PTR | SetHwBp | HandlerHwBp | Address of the target Nt* function in ntdll
ExtendedArgs | BOOL | SetHwBp | HandlerHwBp | TRUE if function has >4 arguments (need stack argument copying)
SyscallNo | int | SetHwBp | HandlerHwBp | System Service Number to place in RAX
IsSubRsp | int | HandlerHwBp | HandlerHwBp | State machine: 0 = searching, 1 = found sub rsp, 2 = found call
SavedContext | CONTEXT | HandlerHwBp | HandlerHwBp | Complete CPU state saved at the syscall breakpoint (preserves all arguments)
NtdllInfo | struct | Init | HandlerHwBp | Contains DllBaseAddress and DllEndAddress of ntdll.dll for range checks
OPCODE_SYSCALL_OFF | int | AddHwBp | HandlerHwBp | Byte offset from function start to the syscall instruction
OPCODE_SYSCALL_RET_OFF | int | AddHwBp | HandlerHwBp | Byte offset from function start to the ret instruction

Thread Safety Concern

These globals mean the current implementation is not thread-safe. If two threads call wrapped syscalls simultaneously, one thread's values would overwrite the other's. A production implementation would need thread-local storage or a mutex around the entire wrapper-to-return sequence.
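One way the thread-local option could look is sketched below. It is an assumption, not part of LayeredSyscall: SyscallState, g_state, StoreState, and SsnSeenByWorker are hypothetical names, and only three of the globals are modeled. The approach relies on a vectored handler running on the thread that raised the exception, so it would read that thread's own copy.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical per-thread replacement for SyscallEntryAddr, ExtendedArgs,
// and SyscallNo.
struct SyscallState {
    std::uintptr_t entry_addr = 0;
    bool extended_args = false;
    int ssn = 0;
};

// Each thread gets its own independent instance.
inline thread_local SyscallState g_state;

inline void StoreState(std::uintptr_t addr, bool ext, int ssn) {
    g_state = SyscallState{addr, ext, ssn};
}

// Stores a value on a second thread and returns the SSN that thread saw.
// The calling thread's g_state is left untouched.
inline int SsnSeenByWorker(int worker_ssn) {
    int seen = -1;
    std::thread worker([&seen, worker_ssn] {
        assert(g_state.ssn == 0); // a fresh thread starts with a zeroed copy
        StoreState(0x2000, true, worker_ssn);
        seen = g_state.ssn;
    });
    worker.join();
    return seen;
}
```

If the main thread stores SSN 0x18 and then calls SsnSeenByWorker(0x50), the worker observes 0x50 while the main thread still reads 0x18. A mutex would serialize callers instead, at the cost of holding the lock across the whole wrapper-to-return sequence.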

Why This Design?

The dual-handler architecture is not arbitrary. Every design decision serves a specific evasion or engineering purpose.

Deliberate ACCESS_VIOLATION

The null dereference is the entry point into the entire VEH-based system. It triggers the exception dispatcher, which calls AddHwBp, which installs hardware breakpoints. The deliberate crash lets the code set up the debug registers before the Nt* function executes by editing the CONTEXT record the handler receives, rather than by calling a context-setting API such as SetThreadContext.

Hardware Breakpoints Are Invisible

Unlike software breakpoints (which patch memory with 0xCC / INT 3), hardware breakpoints use CPU debug registers. They do not modify any memory, so integrity-checking scanners that compare ntdll in memory against ntdll on disk will find no discrepancies. The breakpoints exist only in the CPU state of the current thread.

Separation of Concerns

Handler 1 (AddHwBp) only cares about setup: finding the syscall opcode and installing breakpoints. Handler 2 (HandlerHwBp) only cares about execution management: saving context, redirecting execution, tracing, swapping, and cleaning up. This makes the code modular and each handler easy to reason about independently.

The EDR Hook Executes Normally

Because the wrapper calls the real Nt* function, any EDR inline hook at the function entry point runs normally. The breakpoint fires at the syscall instruction, which is past the hook. From the EDR's perspective, the function was called legitimately. The interception happens at the very last moment before the kernel transition.

Where Interception Happens vs. Where EDR Hooks

1. EDR JMP Hook: Function entry
2. mov r10, rcx: Stub setup
3. mov eax, SSN: Load SSN
4. syscall [Dr0]: BP fires here
5. ret [Dr1]: Clean return BP

Module 5 Quiz: Dual-Handler Architecture

Q1: In the wrapper function pattern, why does the wrapper call the real Nt* function instead of directly issuing a syscall?

The real Nt* function is called so the EDR inline hook runs normally. The hardware breakpoint intercepts execution at the syscall instruction, which is past the hook. The EDR sees a legitimate call; LayeredSyscall hijacks it at the kernel boundary.

Q2: Why does the code use a null pointer dereference to enter the VEH system?

The ACCESS_VIOLATION is the entry point into the VEH chain. When the handler receives the CONTEXT structure, it can modify debug registers (Dr0, Dr1, Dr7) and advance RIP past the crash. Returning EXCEPTION_CONTINUE_EXECUTION then applies the modified context, so the hardware breakpoints are set and execution resumes on the same thread in one step, with no separate call to a context-setting API.