Difficulty: Intermediate

Module 4: Hardware Breakpoints & Debug Registers

Using the CPU's own debug facilities to intercept execution — invisibly.

Module Objective

Hardware breakpoints are the mechanism LayeredSyscall uses to intercept execution at the syscall instruction inside ntdll. This module covers the x64 debug register architecture, how breakpoints are set from VEH handlers, the different breakpoint types, and why hardware breakpoints are far stealthier than software alternatives.

1. Software vs Hardware Breakpoints

There are two fundamentally different ways to set breakpoints on x64. Understanding the distinction is critical because one modifies memory (detectable) and one does not (stealthy).

Software Breakpoints (INT3)
  • Replaces the first byte of the target instruction with 0xCC (INT3)
  • Modifies code in memory — changes the actual bytes of ntdll
  • Detectable by comparing in-memory bytes against on-disk image
  • Detectable by code integrity checks (checksum of .text section)
  • Unlimited count — can set as many as needed
  • Triggers EXCEPTION_BREAKPOINT (0x80000003)
Hardware Breakpoints (Debug Registers)
  • Uses CPU debug registers (Dr0–Dr3) to monitor addresses
  • No memory modification — ntdll bytes remain clean
  • Invisible to memory scanners and integrity checks
  • Limited to 4 per thread (Dr0, Dr1, Dr2, Dr3)
  • Per-thread: only affects the thread that set them
  • Triggers EXCEPTION_SINGLE_STEP (0x80000004)
Comparison:

```
Software Breakpoint (INT3):
  BEFORE:  4C 8B D1 B8 18 00 00 00 0F 05 C3    ; clean stub (simplified)
  AFTER:   CC 8B D1 B8 18 00 00 00 0F 05 C3    ; 0xCC replaces first byte!
           ^^ DETECTABLE - byte changed in memory

Hardware Breakpoint (Dr0):
  MEMORY:  4C 8B D1 B8 18 00 00 00 0F 05 C3    ; unchanged! clean!
  Dr0:     points to address of 0F 05 (syscall)
  Dr7:     bit 0 enabled
  Result:  exception fires when RIP reaches Dr0's address, but ntdll bytes are untouched
```
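The detectability difference can be made concrete with a minimal, portable sketch of the byte comparison an integrity checker performs. `FindPatchedOffsets` is a hypothetical helper. Its reference buffer stands in for the bytes a real scanner would read from ntdll.dll on disk. This is not how PE-sieve or Moneta are implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns every offset at which the in-memory copy of a
// function differs from a trusted reference copy of the same bytes.
// An INT3 (0xCC) patch shows up here. A hardware breakpoint does not,
// because it lives in Dr0-Dr3 rather than in the code bytes.
std::vector<std::size_t> FindPatchedOffsets(const std::vector<std::uint8_t>& reference,
                                            const std::vector<std::uint8_t>& inMemory) {
    assert(reference.size() == inMemory.size());
    std::vector<std::size_t> patched;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (reference[i] != inMemory[i]) {
            patched.push_back(i);
        }
    }
    return patched;
}
```

On the simplified stub, an INT3 written over the first byte is reported at offset 0. In the hardware breakpoint case the bytes are identical, so the result is empty.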

2. x64 Debug Registers

The x64 architecture provides 8 debug registers (Dr0–Dr7). LayeredSyscall uses Dr0, Dr1, and Dr7:

Register | Purpose | LayeredSyscall Usage
Dr0 | Breakpoint address 0 | Address of the syscall instruction (0F 05) in the target Nt* stub
Dr1 | Breakpoint address 1 | Address of the ret instruction (C3) after syscall
Dr2 | Breakpoint address 2 | Not used (available for other purposes)
Dr3 | Breakpoint address 3 | Not used
Dr4–Dr5 | Reserved (aliases for Dr6/Dr7 when CR4.DE=0) | Not used
Dr6 | Debug status register | Set by the CPU to indicate which breakpoint fired (bits 0–3)
Dr7 | Debug control register | Enable/disable breakpoints, set conditions and length

Dr7: Debug Control Register Layout

Dr7 is the most complex debug register. It controls which breakpoints are active and what conditions trigger them:

Dr7 Bit Layout (Bits 0–7 and 16–31; bits 8–15 omitted)

Bits | Field | Description
0 | L0 | Local enable for Dr0 (set to 1 to activate the Dr0 breakpoint)
1 | G0 | Global enable for Dr0 (use L0 instead in user mode)
2 | L1 | Local enable for Dr1 (set to 1 to activate the Dr1 breakpoint)
3 | G1 | Global enable for Dr1
4 | L2 | Local enable for Dr2
5 | G2 | Global enable for Dr2
6 | L3 | Local enable for Dr3
7 | G3 | Global enable for Dr3
16–17 | R/W0 | Condition for Dr0 (00 = execute, 01 = write, 10 = I/O, 11 = read/write)
18–19 | LEN0 | Length for Dr0 (00 = 1 byte, 01 = 2, 11 = 4, 10 = 8 bytes; execution breakpoints must use 00)
20–21 | R/W1 | Condition for Dr1
22–23 | LEN1 | Length for Dr1
24–25 | R/W2 | Condition for Dr2
26–27 | LEN2 | Length for Dr2
28–29 | R/W3 | Condition for Dr3
30–31 | LEN3 | Length for Dr3
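The table maps directly onto shift arithmetic. Below is a minimal sketch that operates on a raw 64-bit Dr7 value. `ArmSlot`, `DisarmSlot`, `BpCondition`, and `BpLength` are hypothetical names for illustration, not Windows APIs.

```cpp
#include <cstdint>

// R/W condition encodings (bits 16-17 for Dr0, 20-21 for Dr1, ...).
enum class BpCondition : std::uint64_t { Execute = 0b00, Write = 0b01, Io = 0b10, ReadWrite = 0b11 };

// LEN encodings on x64: 00 = 1 byte, 01 = 2, 11 = 4, 10 = 8 bytes.
// Execution breakpoints must use One.
enum class BpLength : std::uint64_t { One = 0b00, Two = 0b01, Eight = 0b10, Four = 0b11 };

// Hypothetical helper: returns dr7 with slot `index` (0-3 for Dr0-Dr3) locally
// enabled and its R/W and LEN fields replaced with the requested values.
constexpr std::uint64_t ArmSlot(std::uint64_t dr7, unsigned index,
                                BpCondition cond, BpLength len) {
    const unsigned enableBit = index * 2;        // L0 = 0, L1 = 2, L2 = 4, L3 = 6
    const unsigned fieldShift = 16 + index * 4;  // R/Wn at 16 + 4n, LENn at 18 + 4n
    dr7 &= ~(std::uint64_t{0xF} << fieldShift);  // clear stale R/W and LEN bits
    dr7 |= static_cast<std::uint64_t>(cond) << fieldShift;
    dr7 |= static_cast<std::uint64_t>(len) << (fieldShift + 2);
    dr7 |= std::uint64_t{1} << enableBit;
    return dr7;
}

// Hypothetical helper: clears only the local enable bit for the slot. The
// address register and the R/W and LEN fields are left as they were.
constexpr std::uint64_t DisarmSlot(std::uint64_t dr7, unsigned index) {
    return dr7 & ~(std::uint64_t{1} << (index * 2));
}
```

A bare `Dr7 |= ...` leaves the slot's R/W and LEN fields as they were. `ArmSlot` clears them first, so a stale write or read/write condition cannot silently turn an intended execution breakpoint into a data breakpoint. For two execution breakpoints on Dr0 and Dr1, the result is 0x5: only bits 0 and 2 are set.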

For LayeredSyscall, the relevant operations on Dr7 are:

```cpp
// Enable Dr0 (local enable, bit 0)
Dr7 |= (1 << 0);    // Sets bit 0 = L0 = enable Dr0

// Enable Dr1 (local enable, bit 2)
Dr7 |= (1 << 2);    // Sets bit 2 = L1 = enable Dr1

// Condition bits for Dr0 at bits 16-17: 00 = execution breakpoint
// Condition bits for Dr1 at bits 20-21: 00 = execution breakpoint
// Execution breakpoints also require LEN = 00 (bits 18-19 and 22-23).
// These fields stay 00 unless something else has configured them,
// so no explicit set is needed for the execution type.

// Disable Dr0
Dr7 &= ~(1 << 0);   // Clears bit 0 = L0 = disable Dr0
```

3. Setting Hardware Breakpoints via VEH

Normally, modifying debug registers requires the SetThreadContext API, which EDRs monitor. Inside a VEH handler, however, you already have the thread's CONTEXT structure, including the debug registers. Any changes you make to it are applied when the exception dispatcher resumes the thread. This is how LayeredSyscall sets breakpoints without calling any suspicious APIs.

```cpp
// Simplified from LayeredSyscall's AddHwBp handler.
// pSyscallAddress, pRetAddress and INSTRUCTION_SIZE are assumed to be prepared
// before the fault is triggered (the syscall/ret addresses and the length of
// the deliberately faulting instruction).
LONG WINAPI AddHwBp(PEXCEPTION_POINTERS ExceptionInfo) {
    if (ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
        return EXCEPTION_CONTINUE_SEARCH;

    PCONTEXT ctx = ExceptionInfo->ContextRecord;

    // Set Dr0 = address of 'syscall' (0F 05) in the target Nt* function
    ctx->Dr0 = (DWORD64)pSyscallAddress;

    // Set Dr1 = address of 'ret' (C3) after the syscall instruction
    ctx->Dr1 = (DWORD64)pRetAddress;

    // Enable both breakpoints in Dr7
    ctx->Dr7 |= (1 << 0);  // Enable Dr0 (L0)
    ctx->Dr7 |= (1 << 2);  // Enable Dr1 (L1)

    // Advance RIP past the faulting instruction (null deref)
    ctx->Rip += INSTRUCTION_SIZE;

    // Set up the call to the legitimate API...
    return EXCEPTION_CONTINUE_EXECUTION;  // the modified CONTEXT is applied on resume
}
```

No API Calls Needed

Traditional methods of setting hardware breakpoints require:

Traditional approach (detectable):

```cpp
// This is what EDRs watch for!
CONTEXT ctx;
ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
GetThreadContext(hThread, &ctx);
ctx.Dr0 = targetAddr;
ctx.Dr7 |= 1;
SetThreadContext(hThread, &ctx);  // EDR hooks this!
```

LayeredSyscall avoids this entirely. By modifying Dr0, Dr1, and Dr7 within a VEH handler, it uses the exception dispatch mechanism itself as the context-modification channel. No calls to GetThreadContext or SetThreadContext are needed.

4. Breakpoint Types

The R/W (condition) bits in Dr7 determine what triggers the breakpoint. There are four types:

R/W Bits | Type | Trigger Condition | LayeredSyscall Use
00 | Execution | CPU attempts to execute the instruction at the address | Used for both Dr0 and Dr1
01 | Data Write | CPU writes data to the address | Not used
10 | I/O Access | CPU executes an I/O instruction for the port (requires CR4.DE=1) | Not used
11 | Data Read/Write | CPU reads or writes data at the address | Not used

LayeredSyscall exclusively uses execution breakpoints (R/W = 00, with LEN = 00). In a thread that has not configured other breakpoints, these fields are already zero. In that case only the local enable flags need to be set.
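A small decoder shows how the enable bits, R/W, and LEN fields combine for each slot. `DescribeSlot` is a hypothetical, portable helper that only interprets a Dr7 value. It does not read or write real debug registers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: describes how Dr7 configures slot `index` (0-3),
// e.g. "execute/1" or "write/4". It only interprets a value.
std::string DescribeSlot(std::uint64_t dr7, unsigned index) {
    assert(index < 4 && "Dr7 controls four slots");
    const bool localEnable = ((dr7 >> (index * 2)) & 1) != 0;
    const bool globalEnable = ((dr7 >> (index * 2 + 1)) & 1) != 0;
    if (!localEnable && !globalEnable) {
        return "disabled";
    }
    const auto rw = static_cast<unsigned>((dr7 >> (16 + index * 4)) & 0b11);
    const auto len = static_cast<unsigned>((dr7 >> (18 + index * 4)) & 0b11);
    static const char* const kConditions[] = {"execute", "write", "io", "read/write"};
    static const unsigned kLengthBytes[] = {1, 2, 8, 4};  // x64 LEN encodings
    return std::string(kConditions[rw]) + "/" + std::to_string(kLengthBytes[len]);
}
```

With the Dr7 value used in this module (0x5), both Dr0 and Dr1 decode as "execute/1", and Dr2 and Dr3 decode as "disabled".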

5. How LayeredSyscall Uses Hardware Breakpoints

LayeredSyscall sets two hardware breakpoints per syscall invocation, each serving a distinct purpose:

Dr0: Syscall Interception Point

Target: Address of the syscall instruction (0F 05) within the Nt* function stub.

Purpose: A legitimate API (e.g., WriteFile) calls its corresponding Nt* function. When execution reaches the syscall instruction, Dr0 fires. The VEH handler then replaces the SSN in EAX with the desired function's SSN and swaps in the desired arguments.

Dr1: Return Interception Point

Target: Address of the ret instruction (C3) immediately after the syscall.

Purpose: After the hijacked syscall returns from the kernel, Dr1 fires. The handler uses this opportunity to clean up. It keeps the NTSTATUS returned by the kernel, restores the saved stack state, clears the breakpoints, and returns control to the caller.

Finding the Syscall and Ret Addresses

The code scans forward from the beginning of the Nt* function, looking for the 0F 05 (syscall) byte sequence within the first 25 bytes:

```cpp
// From LayeredSyscall - finding syscall instruction offset
#include <windows.h>

BOOL GetSyscallAddresses(PVOID funcBase, PVOID* pSyscall, PVOID* pRet) {
    BYTE* p = (BYTE*)funcBase;

    for (DWORD i = 0; i < 25; i++) {
        // Look for 'syscall' (0F 05) immediately followed by 'ret' (C3)
        if (p[i] == 0x0F && p[i + 1] == 0x05 && p[i + 2] == 0xC3) {
            *pSyscall = (PVOID)&p[i];       // Address of 'syscall'
            *pRet     = (PVOID)&p[i + 2];   // Address of 'ret' (C3)
            return TRUE;
        }
    }
    return FALSE;  // syscall/ret pair not found within range
}

// These addresses become:
// OPCODE_SYSCALL_OFF     = offset of 0F 05 from function base (0x12 on Windows 10/11 x64)
// OPCODE_SYSCALL_RET_OFF = offset of C3 from function base (0x14 on Windows 10/11 x64)
```

Breakpoint Placement in the Syscall Stub

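Memory Layout (Windows 10/11 x64; the SSN varies by build):

```
NtAllocateVirtualMemory:
  +0x00: 4C 8B D1                  mov r10, rcx
  +0x03: B8 18 00 00 00            mov eax, 0x18
  +0x08: F6 04 25 08 03 FE 7F 01   test byte ptr [0x7FFE0308], 1
  +0x10: 75 03                     jne +0x15
  +0x12: 0F 05                     syscall          ← Dr0 breakpoint HERE
  +0x14: C3                        ret              ← Dr1 breakpoint HERE
  +0x15: CD 2E                     int 2Eh
  +0x17: C3                        ret
```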
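The offsets can be checked offline against a copy of the stub bytes with a portable, compile-time variant of the scan. `FindSyscallAndRet`, `SyscallOffsets`, and `kModernStub` are hypothetical names. The byte array assumes the Windows 10/11 x64 stub layout: mov r10, rcx; mov eax, SSN; a test/jne on SharedUserData; syscall; ret; int 2Eh; ret. A real implementation would read the bytes from the loaded ntdll instead.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Result of scanning a stub. Offsets are only meaningful when found is true.
struct SyscallOffsets {
    bool found;
    std::size_t syscallOffset;  // offset of 0F 05
    std::size_t retOffset;      // offset of the C3 that follows it
};

// Hypothetical helper: looks for 'syscall' (0F 05) immediately followed by
// 'ret' (C3) within the first maxScan bytes of a copied stub.
template <std::size_t N>
constexpr SyscallOffsets FindSyscallAndRet(const std::array<std::uint8_t, N>& stub,
                                           std::size_t maxScan = 25) {
    for (std::size_t i = 0; i < maxScan && i + 2 < N; ++i) {
        if (stub[i] == 0x0F && stub[i + 1] == 0x05 && stub[i + 2] == 0xC3) {
            return {true, i, i + 2};
        }
    }
    return {false, 0, 0};
}

// Assumed byte layout of a Windows 10/11 x64 Nt* stub (SSN 0x18 shown).
constexpr std::array<std::uint8_t, 24> kModernStub = {
    0x4C, 0x8B, 0xD1,                                // mov r10, rcx
    0xB8, 0x18, 0x00, 0x00, 0x00,                    // mov eax, 0x18
    0xF6, 0x04, 0x25, 0x08, 0x03, 0xFE, 0x7F, 0x01,  // test byte ptr [0x7FFE0308], 1
    0x75, 0x03,                                      // jne +3 (to int 2Eh)
    0x0F, 0x05,                                      // syscall
    0xC3,                                            // ret
    0xCD, 0x2E,                                      // int 2Eh
    0xC3,                                            // ret
};
```

The scan requires the C3 right after 0F 05. This keeps it from treating an unrelated 0F 05 byte pair as the syscall instruction.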

6. The EXCEPTION_SINGLE_STEP Exception

When execution hits an address stored in Dr0–Dr3 (with the corresponding enable bit set in Dr7), the CPU raises an exception with code 0x80000004 (EXCEPTION_SINGLE_STEP). This is the same exception code produced by:

Source | Exception Code | How to Differentiate
Hardware breakpoint (Dr0–Dr3) | 0x80000004 | Check if RIP matches a Dr0–Dr3 address
Trap Flag (TF) single-step | 0x80000004 | RIP does NOT match any Dr address; TF was set
Branch trace (BTF) | 0x80000004 | Rarely used in user mode

LayeredSyscall's HandlerHwBp differentiates by checking whether the exception address matches Dr0 or Dr1:

```cpp
LONG WINAPI HandlerHwBp(PEXCEPTION_POINTERS ExceptionInfo) {
    if (ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_SINGLE_STEP)
        return EXCEPTION_CONTINUE_SEARCH;

    PCONTEXT ctx = ExceptionInfo->ContextRecord;
    DWORD64 rip = ctx->Rip;

    if (rip == ctx->Dr0) {
        // Hit the syscall instruction breakpoint
        // Swap SSN and arguments...
    }
    else if (rip == ctx->Dr1) {
        // Hit the ret instruction breakpoint
        // Clean up and restore state...
    }
    else {
        // This is a trap flag single-step (used for call stack building)
        // Handle single-step logic...
    }

    // Execution breakpoints fire before the instruction runs. To resume without
    // re-triggering at the same address, the handler must either clear the
    // breakpoint's enable bit or set the Resume Flag (RF, EFLAGS bit 16):
    //     ctx->EFlags |= (1 << 16);
    return EXCEPTION_CONTINUE_EXECUTION;
}
```

Dr6: Which Breakpoint Fired?

The CPU sets bits in Dr6 to indicate which breakpoint triggered:

Dr6 Bit | Meaning
Bit 0 (B0) | Dr0 breakpoint was hit
Bit 1 (B1) | Dr1 breakpoint was hit
Bit 2 (B2) | Dr2 breakpoint was hit
Bit 3 (B3) | Dr3 breakpoint was hit
Bit 14 (BS) | Single-step (trap flag or BTF)

LayeredSyscall primarily checks RIP against the Dr0/Dr1 addresses rather than reading Dr6 bits. For execution breakpoints this is simpler and works just as well, because RIP equals the breakpoint address when the exception is raised.
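Reading Dr6 is the alternative to comparing RIP. The sketch below classifies a single-step exception from a Dr6 value alone, assuming the handler takes that value from the exception CONTEXT. `ClassifyDr6` and `StepSource` are hypothetical names. Per the Intel SDM, B0–B3 can be set for a matching address even when that breakpoint is not enabled in Dr7, and more than one bit can be set at once. A real handler should therefore cross-check the Dr7 enable bits rather than rely on this fixed priority order.

```cpp
#include <cstdint>

// Dr6 bit positions from the table above.
constexpr std::uint64_t kDr6B0 = 1ULL << 0;
constexpr std::uint64_t kDr6B1 = 1ULL << 1;
constexpr std::uint64_t kDr6B2 = 1ULL << 2;
constexpr std::uint64_t kDr6B3 = 1ULL << 3;
constexpr std::uint64_t kDr6BS = 1ULL << 14;  // single-step (TF or BTF)

enum class StepSource { Dr0, Dr1, Dr2, Dr3, SingleStep, Unknown };

// Hypothetical helper: classifies an EXCEPTION_SINGLE_STEP from a Dr6 value.
// Reserved bits (which often read as 1) are ignored; only B0-B3 and BS matter.
constexpr StepSource ClassifyDr6(std::uint64_t dr6) {
    if (dr6 & kDr6B0) return StepSource::Dr0;
    if (dr6 & kDr6B1) return StepSource::Dr1;
    if (dr6 & kDr6B2) return StepSource::Dr2;
    if (dr6 & kDr6B3) return StepSource::Dr3;
    if (dr6 & kDr6BS) return StepSource::SingleStep;
    return StepSource::Unknown;
}
```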

7. Advantages for Evasion

Why Hardware Breakpoints Are Ideal for Syscall Interception

Property | Benefit
No memory modification | ntdll bytes remain pristine. Memory scanners (PE-sieve, Moneta) see a clean DLL. No 0xCC patches, no JMP hooks.
Per-thread scope | Debug registers are part of the thread context. Setting Dr0 on thread A does not affect thread B, so the breakpoints fire only on the thread performing the syscall.
Dynamic set/clear | Breakpoints can be installed just before a syscall and cleared immediately after. They exist only for the duration of a single call, which minimizes the detection window.
No API calls | Set from within a VEH handler by modifying the CONTEXT structure. No calls to SetThreadContext or NtSetContextThread for the EDR to intercept.
Executes from ntdll | The syscall instruction still executes from its original address in ntdll. InstrumentationCallback sees a legitimate ntdll return address.

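Limitations to Be Aware Of

  • Only four breakpoint addresses exist per thread (Dr0–Dr3), and LayeredSyscall uses two of them for every hijacked call.
  • Debug registers are not hidden from introspection. GetThreadContext with CONTEXT_DEBUG_REGISTERS returns Dr0–Dr7 for the thread while the breakpoints are armed.
  • EXCEPTION_SINGLE_STEP is shared with trap-flag and branch-trace single-stepping, so the handler must distinguish the source of every exception it sees.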

Summary: The Complete Interception Chain

Combining what we learned in Modules 2–4, here is how the pieces fit together:

Hardware Breakpoint Interception Flow

  1. Resolve SSN (Exception Directory)
  2. Find syscall address (scan for 0F 05)
  3. Trigger VEH entry (null deref)
  4. Set Dr0/Dr1 (in VEH handler)
  5. Call legitimate API (e.g., WriteFile)
  6. Dr0 fires at syscall (SINGLE_STEP)
  7. Swap SSN + args (in VEH handler)
  8. Kernel executes desired function
  9. Kernel returns (NTSTATUS)
  10. Dr1 fires at ret (SINGLE_STEP)
  11. Clean up & restore (clear Dr0/Dr1)
  12. Return to caller with real result

Module 4 Quiz: Hardware Breakpoints

Q1: Which Dr7 bit must be set to enable a hardware breakpoint on Dr0?

Bit 0 of Dr7 is L0, the local enable flag for Dr0. Setting Dr7 |= (1 << 0) activates the breakpoint configured in Dr0. Architecturally, the processor clears local enable flags on hardware task switches. Windows does not use hardware task switching; it saves and restores the debug registers as part of each thread's context, so the breakpoint stays armed for that thread.

Q2: Why are hardware breakpoints stealthier than software breakpoints (INT3) for syscall interception?

Software breakpoints replace instruction bytes with 0xCC (INT3), which memory scanners can detect by comparing in-memory ntdll against its on-disk image. Hardware breakpoints use CPU debug registers — no bytes in memory are modified, so scanners like PE-sieve and Moneta see a completely clean ntdll. Note that EDRs can read debug registers via GetThreadContext, but this is uncommon and LayeredSyscall clears breakpoints immediately after use.