Difficulty: Intermediate

Module 4: Hardware Breakpoints & Debug Registers

Using the CPU's own debug facilities to intercept execution — invisibly.

Module Objective

Hardware breakpoints are the mechanism LayeredSyscall uses to intercept execution at the syscall instruction inside ntdll. This module covers the x64 debug register architecture, how breakpoints are set from VEH handlers, the different breakpoint types, and why hardware breakpoints are far stealthier than software alternatives.

1. Software vs Hardware Breakpoints

There are two fundamentally different ways to set breakpoints on x64. Understanding the distinction is critical because one modifies memory (detectable) and one does not (stealthy).

Software Breakpoints (INT3)

Replaces the first byte of the target instruction with 0xCC (INT3)
Modifies code in memory — changes the actual bytes of ntdll
Detectable by comparing in-memory bytes against on-disk image
Detectable by code integrity checks (checksum of .text section)
Unlimited count — can set as many as needed
Triggers EXCEPTION_BREAKPOINT (0x80000003)

Hardware Breakpoints (Debug Registers)

Uses CPU debug registers (Dr0–Dr3) to monitor addresses
No memory modification — ntdll bytes remain clean
Invisible to memory scanners and code integrity checks (the debug registers themselves can still be inspected)
Limited to 4 per thread (Dr0, Dr1, Dr2, Dr3)
Per-thread: only affects the thread that set them
Triggers EXCEPTION_SINGLE_STEP (0x80000004)

Comparison

Software Breakpoint (INT3):
  BEFORE:  4C 8B D1 B8 18 00 00 00 0F 05 C3    ; clean stub
  AFTER:   CC 8B D1 B8 18 00 00 00 0F 05 C3    ; 0xCC replaces first byte!
           ^^ DETECTABLE - byte changed in memory

Hardware Breakpoint (Dr0):
  MEMORY:  4C 8B D1 B8 18 00 00 00 0F 05 C3    ; unchanged! clean!
  Dr0:     points to address of 0F 05 (syscall)
  Dr7:     bit 0 enabled
  Result:  exception fires when RIP reaches Dr0's address, but ntdll bytes are untouched
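The integrity check described above can be sketched from the scanner's side. This is a minimal, portable illustration that assumes the scanner already holds a clean reference copy of the stub bytes (for example, read from the on-disk image); the helper name `find_patched_offsets` is hypothetical and not part of any tool named in this module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns every offset at which the in-memory bytes
// differ from a clean reference copy of the same length.
std::vector<std::size_t> find_patched_offsets(const std::uint8_t* reference,
                                              const std::uint8_t* in_memory,
                                              std::size_t len) {
    assert(reference != nullptr && in_memory != nullptr);
    std::vector<std::size_t> diffs;
    for (std::size_t i = 0; i < len; ++i) {
        if (reference[i] != in_memory[i]) {
            diffs.push_back(i);  // a 0xCC patch at offset 0 would be reported here
        }
    }
    return diffs;
}
```

Run against the two byte rows in the comparison, a check like this reports offset 0 for the INT3 case and nothing for the hardware breakpoint case, because the debug registers live outside the compared memory.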

2. x64 Debug Registers

The x64 architecture provides 8 debug registers (Dr0–Dr7). LayeredSyscall uses Dr0, Dr1, and Dr7:

Register	Purpose	LayeredSyscall Usage
Dr0	Breakpoint address 0	Address of the `syscall` instruction (0F 05) in the target Nt* stub
Dr1	Breakpoint address 1	Address of the `ret` instruction (C3) after syscall
Dr2	Breakpoint address 2	Not used (available for other purposes)
Dr3	Breakpoint address 3	Not used
Dr4–Dr5	Reserved (aliases for Dr6/Dr7 when CR4.DE=0)	Not used
Dr6	Debug status register	Set by the CPU to indicate which breakpoint fired (bits 0–3)
Dr7	Debug control register	Enable/disable breakpoints, set conditions and length

Dr7: Debug Control Register Layout

Dr7 is the most complex debug register. It controls which breakpoints are active and what conditions trigger them:

Dr7 Bit Layout (Low 32 Bits)

Bits	Field	Description
0	L0	Local enable for Dr0 — set to 1 to activate Dr0 breakpoint
1	G0	Global enable for Dr0 (use L0 instead in user mode)
2	L1	Local enable for Dr1 — set to 1 to activate Dr1 breakpoint
3	G1	Global enable for Dr1
4	L2	Local enable for Dr2
5	G2	Global enable for Dr2
6	L3	Local enable for Dr3
7	G3	Global enable for Dr3
16–17	R/W0	Condition for Dr0 (00=exec, 01=write, 10=I/O, 11=read/write)
18–19	LEN0	Length for Dr0 (00=1 byte for execution)
20–21	R/W1	Condition for Dr1
22–23	LEN1	Length for Dr1
24–25	R/W2	Condition for Dr2
26–27	LEN2	Length for Dr2
28–29	R/W3	Condition for Dr3
30–31	LEN3	Length for Dr3

For LayeredSyscall, the relevant operations on Dr7 are:

C++
// Enable Dr0 (local enable, bit 0)
Dr7 |= (1ULL << 0);    // Sets bit 0 = L0 = enable Dr0

// Enable Dr1 (local enable, bit 2)
Dr7 |= (1ULL << 2);    // Sets bit 2 = L1 = enable Dr1

// Condition bits for Dr0 at bits 16-17: 00 = execution breakpoint
// Condition bits for Dr1 at bits 20-21: 00 = execution breakpoint
// (these bits are 00 when Dr7 has not been configured before, so no explicit
//  set is needed; clear them first if Dr7 may hold an earlier configuration)

// Disable Dr0
Dr7 &= ~(1ULL << 0);   // Clears bit 0 = L0 = disable Dr0
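The bit positions in the layout table follow a regular pattern, which can be captured in a pair of pure helper functions. The sketch below only does arithmetic on a plain 64-bit value; the function names are hypothetical, and the choice to clear the R/W and LEN field explicitly (rather than rely on it already being 00) is an assumption made for clarity.

```cpp
#include <cassert>
#include <cstdint>

// Field positions for slot n (0..3), per the Dr7 layout table:
//   Ln = bit 2n, R/Wn = bits 16+4n..17+4n, LENn = bits 18+4n..19+4n.

// Hypothetical helper: returns dr7 with slot n configured as a locally
// enabled execution breakpoint (R/W = 00, LEN = 00).
std::uint64_t dr7_enable_exec(std::uint64_t dr7, unsigned slot) {
    assert(slot < 4);
    const unsigned field_shift = 16 + 4 * slot;
    dr7 &= ~(std::uint64_t{0xF} << field_shift);  // zero R/Wn and LENn
    dr7 |= std::uint64_t{1} << (2 * slot);        // set local enable Ln
    return dr7;
}

// Hypothetical helper: returns dr7 with the local enable bit for slot n cleared.
std::uint64_t dr7_disable(std::uint64_t dr7, unsigned slot) {
    assert(slot < 4);
    dr7 &= ~(std::uint64_t{1} << (2 * slot));
    return dr7;
}
```

Starting from zero, enabling slots 0 and 1 this way yields 0x5, which matches setting bits 0 and 2 by hand.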

3. Setting Hardware Breakpoints via VEH

Normally, modifying debug registers requires the SetThreadContext API, which EDRs monitor. But when you're inside a VEH handler, you have direct access to the CONTEXT structure — including the debug registers. This is how LayeredSyscall sets breakpoints without calling any suspicious APIs.

C++
// From LayeredSyscall's AddHwBp handler
LONG WINAPI AddHwBp(PEXCEPTION_POINTERS ExceptionInfo) {
    if (ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
        return EXCEPTION_CONTINUE_SEARCH;

    PCONTEXT ctx = ExceptionInfo->ContextRecord;

    // Set Dr0 = address of 'syscall' (0F 05) in the target Nt* function
    ctx->Dr0 = (DWORD64)pSyscallAddress;

    // Set Dr1 = address of 'ret' (C3) after the syscall instruction
    ctx->Dr1 = (DWORD64)pRetAddress;

    // Enable both breakpoints in Dr7
    ctx->Dr7 |= (1 << 0);  // Enable Dr0 (L0)
    ctx->Dr7 |= (1 << 2);  // Enable Dr1 (L1)

    // Advance RIP past the faulting instruction (null deref)
    ctx->Rip += INSTRUCTION_SIZE;

    // Set up the call to the legitimate API...
    return EXCEPTION_CONTINUE_EXECUTION;
}

No API Calls Needed

Traditional methods of setting hardware breakpoints require:

C++ (Traditional - Detectable)
// This is what EDRs watch for!
CONTEXT ctx = { 0 };                    // zero-initialize before use
ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
GetThreadContext(hThread, &ctx);
ctx.Dr0 = targetAddr;
ctx.Dr7 |= 1;
SetThreadContext(hThread, &ctx);  // EDR hooks this!

LayeredSyscall avoids this entirely. By modifying Dr0–Dr7 within a VEH handler, it uses the exception dispatch mechanism itself as the context-modification channel. No calls to GetThreadContext or SetThreadContext are needed.

4. Breakpoint Types

The R/W (condition) bits in Dr7 determine what triggers the breakpoint. There are four types:

R/W Bits	Type	Trigger Condition	LayeredSyscall Use
`00`	Execution	CPU attempts to execute instruction at the address	Used for both Dr0 and Dr1
`01`	Data Write	CPU writes data to the address	Not used
`10`	I/O Access	CPU executes I/O instruction for the port (requires CR4.DE=1)	Not used
`11`	Data Read/Write	CPU reads or writes data at the address	Not used

LayeredSyscall exclusively uses execution breakpoints (R/W = 00). Because 00 is what the condition bits hold when Dr7 has not been configured before, the code sets only the local enable flags and leaves the condition bits untouched.
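Reading the fields back out of a Dr7 value is the mirror image of setting them, and is what an inspection tool would do with a captured register snapshot. The following is a minimal sketch under that assumption; `BpType`, `dr7_type`, and `dr7_locally_enabled` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The four R/W encodings from the table above.
enum class BpType { Execute = 0, Write = 1, IoAccess = 2, ReadWrite = 3 };

// Hypothetical helper: extracts the 2-bit R/W field for slot n (bits 16+4n..17+4n).
BpType dr7_type(std::uint64_t dr7, unsigned slot) {
    assert(slot < 4);
    return static_cast<BpType>((dr7 >> (16 + 4 * slot)) & 0x3);
}

// Hypothetical helper: reports whether the local enable bit Ln (bit 2n) is set.
bool dr7_locally_enabled(std::uint64_t dr7, unsigned slot) {
    assert(slot < 4);
    return ((dr7 >> (2 * slot)) & 0x1) != 0;
}
```

For the value 0x5, both slot 0 and slot 1 decode as enabled execution breakpoints, while slots 2 and 3 decode as disabled.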

5. How LayeredSyscall Uses Hardware Breakpoints

LayeredSyscall sets two hardware breakpoints per syscall invocation, each serving a distinct purpose:

Dr0: Syscall Interception Point

Target: Address of the syscall instruction (0F 05) within the Nt* function stub.

Purpose: When a legitimate API (e.g., WriteFile) calls its corresponding Nt* function and execution reaches the syscall instruction, Dr0 fires. The VEH handler then swaps EAX (SSN) to the desired function's SSN and swaps the arguments.

Dr1: Return Interception Point

Target: Address of the ret instruction (C3) immediately after the syscall.

Purpose: After the hijacked syscall returns from the kernel, Dr1 fires. The handler uses this opportunity to clean up: restore the original return value, fix the stack, clear the breakpoints, and return control to the caller.

Finding the Syscall and Ret Addresses

The code scans forward from the beginning of the Nt* function, looking for the 0F 05 (syscall) byte sequence within the first 25 bytes:

C++
// From LayeredSyscall - finding syscall instruction offset
BOOL GetSyscallAddresses(PVOID funcBase, PVOID* pSyscall, PVOID* pRet) {
    BYTE* p = (BYTE*)funcBase;

    for (DWORD i = 0; i < 25; i++) {
        // Look for syscall opcode: 0F 05
        if (p[i] == 0x0F && p[i + 1] == 0x05) {
            *pSyscall = (PVOID)&p[i];       // Address of 'syscall'
            *pRet     = (PVOID)&p[i + 2];   // Address of 'ret' (C3)
            return TRUE;
        }
    }
    return FALSE;  // syscall not found within range
}

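// These addresses become:
// OPCODE_SYSCALL_OFF     = offset of 0F 05 from function base (+8 in the simplified stub
//                          shown below; +0x12 on Windows 10 and later, where a test/jne
//                          pair precedes the syscall instruction)
// OPCODE_SYSCALL_RET_OFF = offset of C3 from function base (OPCODE_SYSCALL_OFF + 2)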

Breakpoint Placement in the Syscall Stub

Memory Layout
NtAllocateVirtualMemory:
  +0:  4C 8B D1          mov r10, rcx
  +3:  B8 18 00 00 00    mov eax, 0x18
  +8:  0F 05             syscall          ← Dr0 breakpoint HERE
  +10: C3                ret              ← Dr1 breakpoint HERE
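The offsets in the layout above can be checked against a plain byte array. This sketch is a bounded variant of the scan that works on a local buffer and additionally requires the byte after 0F 05 to be C3; that extra check, the `StubOffsets` type, and the function name are assumptions for illustration and are not taken from the LayeredSyscall source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type: offsets relative to the start of the buffer.
struct StubOffsets {
    std::size_t syscall_off;
    std::size_t ret_off;
};

// Searches bytes[0..len) for the sequence 0F 05 C3 without reading past the
// end of the buffer. Returns true and fills `out` on the first match.
bool find_syscall_ret(const std::uint8_t* bytes, std::size_t len, StubOffsets& out) {
    assert(bytes != nullptr);
    for (std::size_t i = 0; i + 2 < len; ++i) {
        if (bytes[i] == 0x0F && bytes[i + 1] == 0x05 && bytes[i + 2] == 0xC3) {
            out.syscall_off = i;      // offset of 'syscall'
            out.ret_off = i + 2;      // offset of 'ret'
            return true;
        }
    }
    return false;
}
```

For the 11-byte stub shown in the layout, the function reports offsets 8 and 10. The loop condition `i + 2 < len` is the design point: it guarantees all three bytes being compared lie inside the buffer.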

6. The EXCEPTION_SINGLE_STEP Exception

When execution reaches an address stored in Dr0–Dr3 (with the corresponding enable bit set in Dr7), the CPU raises a debug exception (#DB), which Windows delivers to user mode as exception code 0x80000004 (EXCEPTION_SINGLE_STEP). The same exception code is produced by several sources:

Source	Exception Code	How to Differentiate
Hardware breakpoint (Dr0–Dr3)	`0x80000004`	Check if RIP matches a Dr0–Dr3 address
Trap Flag (TF) single-step	`0x80000004`	RIP does NOT match any Dr address; TF was set
Branch trace (BTF)	`0x80000004`	Rarely used in user mode

LayeredSyscall's HandlerHwBp differentiates by checking whether the exception address matches Dr0 or Dr1:

C++
LONG WINAPI HandlerHwBp(PEXCEPTION_POINTERS ExceptionInfo) {
    if (ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_SINGLE_STEP)
        return EXCEPTION_CONTINUE_SEARCH;

    PCONTEXT ctx = ExceptionInfo->ContextRecord;
    DWORD64 rip = ctx->Rip;

    if (rip == ctx->Dr0) {
        // Hit the syscall instruction breakpoint
        // Swap SSN and arguments...
    }
    else if (rip == ctx->Dr1) {
        // Hit the ret instruction breakpoint
        // Clean up and restore state...
    }
    else {
        // This is a trap flag single-step (used for call stack building)
        // Handle single-step logic...
    }

    return EXCEPTION_CONTINUE_EXECUTION;
}

Dr6: Which Breakpoint Fired?

The CPU sets bits in Dr6 to indicate which breakpoint triggered:

Dr6 Bit	Meaning
Bit 0	Dr0 breakpoint was hit
Bit 1	Dr1 breakpoint was hit
Bit 2	Dr2 breakpoint was hit
Bit 3	Dr3 breakpoint was hit
Bit 14	Single-step (trap flag or BTF)

LayeredSyscall primarily checks RIP against Dr0/Dr1 addresses rather than reading Dr6 bits, which is a simpler and equally reliable approach.
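For comparison, the Dr6-based alternative amounts to testing the bits listed in the table. The sketch below decodes a Dr6 value passed in as a plain integer; it assumes the value accurately reflects the hardware status register at the time of the exception, and the names `DebugCause`, `dr6_hit`, and `classify_dr6` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class DebugCause { None, Breakpoint0, Breakpoint1, Breakpoint2, Breakpoint3, SingleStep };

// Hypothetical helper: true if the status bit for slot n (bit n) is set.
bool dr6_hit(std::uint64_t dr6, unsigned slot) {
    assert(slot < 4);
    return (dr6 & (std::uint64_t{1} << slot)) != 0;
}

// Hypothetical helper: reports the lowest-numbered breakpoint that fired,
// falling back to the single-step bit (bit 14) when no breakpoint bit is set.
DebugCause classify_dr6(std::uint64_t dr6) {
    for (unsigned slot = 0; slot < 4; ++slot) {
        if (dr6_hit(dr6, slot)) {
            return static_cast<DebugCause>(
                static_cast<unsigned>(DebugCause::Breakpoint0) + slot);
        }
    }
    if (dr6 & (std::uint64_t{1} << 14)) {
        return DebugCause::SingleStep;
    }
    return DebugCause::None;
}
```

Preferring breakpoint bits over the single-step bit is an arbitrary choice in this sketch; a handler that needs to act on both conditions would test each bit separately.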

7. Advantages for Evasion

Why Hardware Breakpoints Are Ideal for Syscall Interception

Property	Benefit
No memory modification	ntdll bytes remain pristine. Memory scanners (PE-sieve, Moneta) see a clean DLL. No 0xCC patches, no JMP hooks.
Per-thread scope	Debug registers are part of the thread context. Setting Dr0 on thread A does not affect thread B. EDR monitoring of other threads sees nothing.
Dynamic set/clear	Breakpoints can be installed just before a syscall and cleared immediately after. They are active only for the duration of a single intercepted call, which keeps the detection window small.
No API calls	Set from within a VEH handler by modifying the CONTEXT structure. No calls to `SetThreadContext` or `NtSetContextThread` for the EDR to intercept.
Executes from ntdll	The `syscall` instruction still executes from its original address in ntdll. InstrumentationCallback sees a legitimate ntdll return address.

Limitations to Be Aware Of

Maximum 4 breakpoints per thread: Dr0–Dr3 only. LayeredSyscall uses 2 (Dr0 for syscall, Dr1 for ret), leaving Dr2–Dr3 available.
EDRs can check debug registers: Some EDRs periodically call GetThreadContext to inspect Dr0–Dr7. If non-zero debug registers are found outside a debugger context, it may be flagged. LayeredSyscall mitigates this by clearing breakpoints immediately after use.
Debugger interference: If a debugger is attached, it may use Dr0–Dr3 for its own breakpoints, conflicting with LayeredSyscall. The tool includes an IsDebuggerPresent check to avoid this.
Thread affinity: The breakpoints only exist on the thread that triggered the VEH handler. Multi-threaded applications need to ensure the syscall chain runs on the same thread that installed the breakpoints.

Summary: The Complete Interception Chain

Combining what we learned in Modules 2–4, here is how the pieces fit together:

Hardware Breakpoint Interception Flow

Resolve SSN (Exception Directory)
→ Find syscall addr (scan for 0F 05)
→ Trigger VEH entry (null deref)
→ Set Dr0/Dr1 (in VEH handler)
→ Call legitimate API (e.g., WriteFile)
→ Dr0 fires at syscall (SINGLE_STEP)
→ Swap SSN + args (in VEH handler)
→ Kernel executes desired function
→ Kernel returns (NTSTATUS)
→ Dr1 fires at ret (SINGLE_STEP)
→ Clean up & restore (clear Dr0/Dr1)
→ Return to caller with real result

Module 4 Quiz: Hardware Breakpoints

Q1: Which Dr7 bit must be set to enable a hardware breakpoint on Dr0?

Bit 7 (G3)
Bit 1 (G0)
Bit 0 (L0, local enable for Dr0)
Bit 16 (R/W0 condition)

Bit 0 of Dr7 is L0, the local enable flag for Dr0. Setting Dr7 |= (1 << 0) activates the breakpoint configured in Dr0. Architecturally, "local" means the processor clears the bit on a hardware task switch. x64 long mode does not use hardware task switching, and Windows saves and restores the debug registers as part of each thread's context, so the setting persists for the thread that owns it.

Q2: Why are hardware breakpoints stealthier than software breakpoints (INT3) for syscall interception?

Hardware breakpoints run in kernel mode and are invisible to user-mode scanners
They don't modify any bytes in memory, so ntdll remains clean and integrity checks pass
They are encrypted by the CPU and cannot be read by any software
EDRs cannot access the debug registers under any circumstances

Software breakpoints replace instruction bytes with 0xCC (INT3), which memory scanners can detect by comparing in-memory ntdll against its on-disk image. Hardware breakpoints use CPU debug registers — no bytes in memory are modified, so scanners like PE-sieve and Moneta see a completely clean ntdll. Note that EDRs can read debug registers via GetThreadContext, but this is uncommon and LayeredSyscall clears breakpoints immediately after use.
