Module 4: Hardware Breakpoints & Debug Registers
Using the CPU's own debug facilities to intercept execution — invisibly.
Module Objective
Hardware breakpoints are the mechanism LayeredSyscall uses to intercept execution at the syscall instruction inside ntdll. This module covers the x64 debug register architecture, how breakpoints are set from VEH handlers, the different breakpoint types, and why hardware breakpoints are far stealthier than software alternatives.
1. Software vs Hardware Breakpoints
There are two fundamentally different ways to set breakpoints on x64. Understanding the distinction is critical because one modifies memory (detectable) and one does not (stealthy).
Software Breakpoints (INT3)
- Replaces the first byte of the target instruction with 0xCC (INT3)
- Modifies code in memory: changes the actual bytes of ntdll
- Detectable by comparing in-memory bytes against the on-disk image
- Detectable by code integrity checks (checksum of the .text section)
- Unlimited count: can set as many as needed
- Triggers EXCEPTION_BREAKPOINT (0x80000003)
Hardware Breakpoints (Debug Registers)
- Uses CPU debug registers (Dr0–Dr3) to monitor addresses
- No memory modification: ntdll bytes remain clean
- Invisible to memory scanners and code integrity checks (the register values can still be read back through the thread context)
- Limited to 4 per thread (Dr0, Dr1, Dr2, Dr3)
- Per-thread: only affects the thread whose context holds them
- Triggers EXCEPTION_SINGLE_STEP (0x80000004)
Comparison
```text
Software Breakpoint (INT3):
BEFORE: 4C 8B D1 B8 18 00 00 00 0F 05 C3   ; clean stub (simplified layout)
AFTER:  CC 8B D1 B8 18 00 00 00 0F 05 C3   ; 0xCC replaces first byte!
        ^^ DETECTABLE - byte changed in memory

Hardware Breakpoint (Dr0):
MEMORY: 4C 8B D1 B8 18 00 00 00 0F 05 C3   ; unchanged! clean!
Dr0:    points to address of 0F 05 (syscall)
Dr7:    bit 0 enabled
Result: exception fires when RIP reaches Dr0's address, but ntdll bytes are untouched
```
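The detectability difference can be made concrete with a minimal sketch of the kind of check a memory scanner performs. It assumes the scanner already holds a trusted reference copy of the stub, for example bytes read from ntdll on disk. `CountInt3Patches` is a hypothetical helper, not part of LayeredSyscall or any scanner's API. Real tools compare whole sections and account for relocations.

```cpp
#include <cstddef>
#include <cstdint>

// Counts positions where the in-memory stub differs from the reference copy
// AND the in-memory byte is 0xCC (the INT3 opcode a software breakpoint
// writes). A hardware breakpoint leaves the bytes identical, so it yields 0.
constexpr std::size_t CountInt3Patches(const std::uint8_t* inMemory,
                                       const std::uint8_t* reference,
                                       std::size_t length) {
    std::size_t patches = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (inMemory[i] != reference[i] && inMemory[i] == 0xCC) {
            ++patches;
        }
    }
    return patches;
}
```

Run over the BEFORE/AFTER bytes in the comparison, it reports one patched byte. With a hardware breakpoint armed, the bytes still match and it reports none.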
2. x64 Debug Registers
The x64 architecture provides 8 debug registers (Dr0–Dr7). LayeredSyscall uses Dr0, Dr1, and Dr7:
| Register | Purpose | LayeredSyscall Usage |
|---|---|---|
| Dr0 | Breakpoint address 0 | Address of the syscall instruction (0F 05) in the target Nt* stub |
| Dr1 | Breakpoint address 1 | Address of the ret instruction (C3) after syscall |
| Dr2 | Breakpoint address 2 | Not used (available for other purposes) |
| Dr3 | Breakpoint address 3 | Not used |
| Dr4–Dr5 | Reserved (aliases for Dr6/Dr7 when CR4.DE=0) | Not used |
| Dr6 | Debug status register | Set by the CPU to indicate which breakpoint fired (bits 0–3) |
| Dr7 | Debug control register | Enable/disable breakpoints, set conditions and length |
Dr7: Debug Control Register Layout
Dr7 is the most complex debug register. It controls which breakpoints are active and what conditions trigger them:
Dr7 Bit Layout (Bits 0–31, Selected Fields)
| Bits | Field | Description |
|---|---|---|
| 0 | L0 | Local enable for Dr0 — set to 1 to activate Dr0 breakpoint |
| 1 | G0 | Global enable for Dr0 (use L0 instead in user mode) |
| 2 | L1 | Local enable for Dr1 — set to 1 to activate Dr1 breakpoint |
| 3 | G1 | Global enable for Dr1 |
| 4 | L2 | Local enable for Dr2 |
| 5 | G2 | Global enable for Dr2 |
| 6 | L3 | Local enable for Dr3 |
| 7 | G3 | Global enable for Dr3 |
| 16–17 | R/W0 | Condition for Dr0 (00=exec, 01=write, 10=I/O, 11=read/write) |
| 18–19 | LEN0 | Length for Dr0 (00=1 byte, 01=2 bytes, 10=8 bytes, 11=4 bytes; must be 00 for execution) |
| 20–21 | R/W1 | Condition for Dr1 |
| 22–23 | LEN1 | Length for Dr1 |
| 24–25 | R/W2 | Condition for Dr2 |
| 26–27 | LEN2 | Length for Dr2 |
| 28–29 | R/W3 | Condition for Dr3 |
| 30–31 | LEN3 | Length for Dr3 |
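It may help to read the table as code. The following is a minimal sketch that decodes one slot's fields from a raw Dr7 value. `Dr7Slot` and `DecodeDr7Slot` are hypothetical names. The value is treated as a plain 64-bit integer, since `CONTEXT::Dr7` is a `DWORD64`. The LEN mapping follows the Intel encoding, in which 10 means 8 bytes and 11 means 4 bytes.

```cpp
#include <cstdint>

// Decoded view of one Dr7 slot n (0..3): L/G enable bits at 2n and 2n+1,
// R/W at bits 16+4n..17+4n, LEN at bits 18+4n..19+4n.
struct Dr7Slot {
    bool localEnable;
    bool globalEnable;
    unsigned rw;        // 0 = exec, 1 = write, 2 = I/O, 3 = read/write
    unsigned lenBytes;  // LEN decoded: 00 -> 1, 01 -> 2, 10 -> 8, 11 -> 4
};

constexpr Dr7Slot DecodeDr7Slot(std::uint64_t dr7, unsigned n) {
    constexpr unsigned kLenToBytes[4] = {1, 2, 8, 4};
    const unsigned lenBits = static_cast<unsigned>((dr7 >> (18 + 4 * n)) & 0x3);
    return Dr7Slot{
        ((dr7 >> (2 * n)) & 1) != 0,
        ((dr7 >> (2 * n + 1)) & 1) != 0,
        static_cast<unsigned>((dr7 >> (16 + 4 * n)) & 0x3),
        kLenToBytes[lenBits],
    };
}
```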
For LayeredSyscall, the relevant operations on Dr7 are:
```cpp
// Condition/length fields: Dr0 uses bits 16-19, Dr1 uses bits 20-23.
// R/W = 00 (execution) and LEN = 00 (1 byte) are the reset values, but
// clear them explicitly in case an earlier breakpoint left them set.
Dr7 &= ~(0xFULL << 16);  // R/W0 = 00, LEN0 = 00
Dr7 &= ~(0xFULL << 20);  // R/W1 = 00, LEN1 = 00
// Enable Dr0 (local enable, bit 0)
Dr7 |= (1ULL << 0);      // Sets bit 0 = L0 = enable Dr0
// Enable Dr1 (local enable, bit 2)
Dr7 |= (1ULL << 2);      // Sets bit 2 = L1 = enable Dr1
// Disable Dr0
Dr7 &= ~(1ULL << 0);     // Clears bit 0 = L0 = disable Dr0
```
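These operations can also be wrapped as small helpers that return the updated value, which keeps the shift arithmetic in one place. This is a sketch with hypothetical names (`ArmExecBreakpoint`, `DisarmBreakpoint`) that operate on a plain `uint64_t`. Inside a handler, the value would be read from and written back to `ctx->Dr7`.

```cpp
#include <cstdint>

// Arms slot n (0..3) as a 1-byte execution breakpoint: clears R/W<n> and
// LEN<n> (both must be 00 for execution), then sets the local enable L<n>.
constexpr std::uint64_t ArmExecBreakpoint(std::uint64_t dr7, unsigned n) {
    dr7 &= ~(0xFull << (16 + 4 * n));  // R/W<n> = 00, LEN<n> = 00
    dr7 |= (1ull << (2 * n));          // L<n> = 1
    return dr7;
}

// Disarms slot n by clearing L<n>; the address register and the
// condition/length bits are left untouched.
constexpr std::uint64_t DisarmBreakpoint(std::uint64_t dr7, unsigned n) {
    return dr7 & ~(1ull << (2 * n));
}
```

For example, `ctx->Dr7 = ArmExecBreakpoint(ArmExecBreakpoint(ctx->Dr7, 0), 1);` would enable Dr0 and Dr1 as execution breakpoints even if a stale data condition was left in either slot.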
3. Setting Hardware Breakpoints via VEH
Normally, modifying debug registers requires the SetThreadContext API, which EDRs monitor. But when you're inside a VEH handler, you have direct access to the CONTEXT structure — including the debug registers. This is how LayeredSyscall sets breakpoints without calling any suspicious APIs.
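```cpp
// From LayeredSyscall's AddHwBp handler (simplified).
// pSyscallAddress, pRetAddress and INSTRUCTION_SIZE stand for values computed
// beforehand: the 0F 05 and C3 addresses in the target Nt* stub, and the
// length of the deliberately faulting instruction.
#include <windows.h>

LONG WINAPI AddHwBp(PEXCEPTION_POINTERS ExceptionInfo) {
    // Only handle the deliberate access violation used as the trigger
    if (ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
        return EXCEPTION_CONTINUE_SEARCH;

    PCONTEXT ctx = ExceptionInfo->ContextRecord;

    // Set Dr0 = address of 'syscall' (0F 05) in the target Nt* function
    ctx->Dr0 = (DWORD64)pSyscallAddress;
    // Set Dr1 = address of 'ret' (C3) after the syscall instruction
    ctx->Dr1 = (DWORD64)pRetAddress;

    // Enable both breakpoints in Dr7 (execution type, 1 byte)
    ctx->Dr7 |= (1 << 0); // Enable Dr0 (L0)
    ctx->Dr7 |= (1 << 2); // Enable Dr1 (L1)

    // Advance RIP past the faulting instruction (null deref)
    ctx->Rip += INSTRUCTION_SIZE;

    // Set up the call to the legitimate API...

    // The thread resumes with this modified CONTEXT, debug registers included
    return EXCEPTION_CONTINUE_EXECUTION;
}
```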
No API Calls Needed
Traditional methods of setting hardware breakpoints require:
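```cpp
// Traditional (detectable): this is what EDRs watch for!
// hThread must be another thread, suspended first: GetThreadContext on the
// calling thread succeeds but does not return a valid context.
CONTEXT ctx = { 0 };
ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
GetThreadContext(hThread, &ctx);
ctx.Dr0 = targetAddr;             // breakpoint address
ctx.Dr7 |= 1;                     // L0: locally enable Dr0
SetThreadContext(hThread, &ctx);  // EDR hooks this!
```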
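LayeredSyscall avoids this entirely. By modifying Dr0–Dr7 within a VEH handler, it uses the exception dispatch mechanism itself as the context-modification channel. When the handler returns EXCEPTION_CONTINUE_EXECUTION, Windows resumes the thread with the modified CONTEXT, and the new debug register values take effect. No calls to GetThreadContext or SetThreadContext are needed.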
4. Breakpoint Types
The R/W (condition) bits in Dr7 determine what triggers the breakpoint. There are four types:
| R/W Bits | Type | Trigger Condition | LayeredSyscall Use |
|---|---|---|---|
| 00 | Execution | CPU attempts to execute the instruction at the address | Used for both Dr0 and Dr1 |
| 01 | Data Write | CPU writes data to the address | Not used |
| 10 | I/O Access | CPU executes an I/O instruction for the port (requires CR4.DE=1) | Not used |
| 11 | Data Read/Write | CPU reads or writes data at the address | Not used |
LayeredSyscall exclusively uses execution breakpoints (R/W = 00, LEN = 00). These are the reset values of the fields. As long as nothing else has set them, enabling the local enable flags is the only configuration needed.
5. How LayeredSyscall Uses Hardware Breakpoints
LayeredSyscall sets two hardware breakpoints per syscall invocation, each serving a distinct purpose:
Dr0: Syscall Interception Point
Target: Address of the syscall instruction (0F 05) within the Nt* function stub.
Purpose: When a legitimate API (e.g., WriteFile) calls its corresponding Nt* function and execution reaches the syscall instruction, Dr0 fires. The VEH handler then swaps EAX (SSN) to the desired function's SSN and swaps the arguments.
Dr1: Return Interception Point
Target: Address of the ret instruction (C3) immediately after the syscall.
Purpose: After the hijacked syscall returns from the kernel, Dr1 fires. The handler uses this opportunity to clean up: restore the original return value, fix the stack, clear the breakpoints, and return control to the caller.
Finding the Syscall and Ret Addresses
The code scans forward from the beginning of the Nt* function, looking for the 0F 05 (syscall) byte sequence within the first 25 bytes:
```cpp
// From LayeredSyscall - finding syscall instruction offset
#include <windows.h>

BOOL GetSyscallAddresses(PVOID funcBase, PVOID* pSyscall, PVOID* pRet) {
    BYTE* p = (BYTE*)funcBase;
    for (DWORD i = 0; i < 25; i++) {
        // Look for syscall opcode 0F 05, immediately followed by ret (C3)
        if (p[i] == 0x0F && p[i + 1] == 0x05 && p[i + 2] == 0xC3) {
            *pSyscall = (PVOID)&p[i];     // Address of 'syscall'
            *pRet     = (PVOID)&p[i + 2]; // Address of 'ret' (C3)
            return TRUE;
        }
    }
    return FALSE; // syscall not found within range
}
// These addresses become:
// OPCODE_SYSCALL_OFF     = offset of 0F 05 from function base
//                          (+0x12 on Windows 10/11 stubs, +8 on older ones)
// OPCODE_SYSCALL_RET_OFF = offset of C3 from function base
//                          (+0x14 on Windows 10/11 stubs, +0xA on older ones)
```
Breakpoint Placement in the Syscall Stub
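Memory Layout
```text
NtAllocateVirtualMemory (Windows 10/11 x64):
+0x00: 4C 8B D1                  mov  r10, rcx
+0x03: B8 18 00 00 00            mov  eax, 0x18
+0x08: F6 04 25 08 03 FE 7F 01   test byte ptr [0x7FFE0308], 1
+0x10: 75 03                     jne  0x15 (int 2Eh path)
+0x12: 0F 05                     syscall   ← Dr0 breakpoint HERE
+0x14: C3                        ret       ← Dr1 breakpoint HERE
+0x15: CD 2E                     int  2Eh
+0x17: C3                        ret
```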
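To check the offsets, the scan can be run over byte copies of a stub instead of live memory. The following sketch assumes a hypothetical helper, `FindSyscallOffset`, that works on a plain byte array. Besides `0F 05`, it requires the next byte to be `C3`. This extra check guarantees that the derived Dr1 address points at a `ret`.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the offset of the first "0F 05 C3" (syscall; ret) that starts within
// the first `window` bytes of the stub, or -1 if none is found.
constexpr std::ptrdiff_t FindSyscallOffset(const std::uint8_t* stub,
                                           std::size_t size,
                                           std::size_t window = 25) {
    for (std::size_t i = 0; i + 2 < size && i < window; ++i) {
        if (stub[i] == 0x0F && stub[i + 1] == 0x05 && stub[i + 2] == 0xC3) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

For the Windows 10/11 stub bytes `4C 8B D1 B8 18 00 00 00 F6 04 25 08 03 FE 7F 01 75 03 0F 05 C3 CD 2E C3`, it returns `0x12`. For the older stub `4C 8B D1 B8 18 00 00 00 0F 05 C3`, it returns `8`.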
6. The EXCEPTION_SINGLE_STEP Exception
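When execution reaches an address stored in Dr0–Dr3 (with the corresponding enable bit set in Dr7), the CPU raises a debug exception. Windows reports it with code 0x80000004 (EXCEPTION_SINGLE_STEP). For execution breakpoints, the exception is raised before the instruction executes, so RIP equals the breakpoint address. The same exception code is also used for other debug events: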
| Source | Exception Code | How to Differentiate |
|---|---|---|
| Hardware breakpoint (Dr0–Dr3) | 0x80000004 | Check if RIP matches a Dr0–Dr3 address |
| Trap Flag (TF) single-step | 0x80000004 | RIP does NOT match any Dr address; TF was set |
| Branch trace (BTF) | 0x80000004 | Rarely used in user mode |
LayeredSyscall's HandlerHwBp differentiates by checking whether the exception address matches Dr0 or Dr1:
```cpp
#include <windows.h>

LONG WINAPI HandlerHwBp(PEXCEPTION_POINTERS ExceptionInfo) {
    if (ExceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_SINGLE_STEP)
        return EXCEPTION_CONTINUE_SEARCH;

    PCONTEXT ctx = ExceptionInfo->ContextRecord;
    DWORD64 rip = ctx->Rip;

    if (rip == ctx->Dr0) {
        // Hit the syscall instruction breakpoint
        // Swap SSN and arguments...
    }
    else if (rip == ctx->Dr1) {
        // Hit the ret instruction breakpoint
        // Clean up and restore state...
    }
    else {
        // This is a trap flag single-step (used for call stack building)
        // Handle single-step logic...
    }

    // Execution breakpoints fire before the instruction runs: resuming at the
    // same RIP re-triggers the breakpoint unless its enable bit is cleared or
    // the resume flag (EFLAGS.RF, bit 16) is set.
    return EXCEPTION_CONTINUE_EXECUTION;
}
```
Dr6: Which Breakpoint Fired?
The CPU sets bits in Dr6 to indicate which breakpoint triggered:
| Dr6 Bit | Meaning |
|---|---|
| Bit 0 | Dr0 breakpoint was hit |
| Bit 1 | Dr1 breakpoint was hit |
| Bit 2 | Dr2 breakpoint was hit |
| Bit 3 | Dr3 breakpoint was hit |
| Bit 14 | Single-step (trap flag or BTF) |
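LayeredSyscall primarily checks RIP against the Dr0/Dr1 addresses rather than reading Dr6 bits. For execution breakpoints, this is simpler and equally reliable, because the exception is reported with RIP at the breakpoint address. The approach would not work for data breakpoints, which fire after the accessing instruction completes.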
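Dr6 can also be consulted, for example to cross-check the RIP comparison. In that case, note that the B0–B3 bits can be set for a breakpoint whose condition matched even when it is not enabled. A careful decoder therefore masks them with the enable bits in Dr7. The following is a minimal sketch, where `FiredBreakpoint` is a hypothetical helper taking the raw `Dr6` and `Dr7` values from the CONTEXT.

```cpp
#include <cstdint>

// Returns the index (0..3) of an enabled breakpoint whose B<n> bit is set in
// Dr6. Otherwise returns -1 if BS (bit 14, single-step) is set, or -2 if
// neither applies.
constexpr int FiredBreakpoint(std::uint64_t dr6, std::uint64_t dr7) {
    for (int n = 0; n < 4; ++n) {
        const bool hit = (dr6 >> n) & 1;
        const bool enabled = (dr7 >> (2 * n)) & 0x3;  // L<n> or G<n>
        if (hit && enabled) return n;
    }
    return ((dr6 >> 14) & 1) ? -1 : -2;
}
```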
7. Advantages for Evasion
Why Hardware Breakpoints Are Ideal for Syscall Interception
| Property | Benefit |
|---|---|
| No memory modification | ntdll bytes remain pristine. Memory scanners (PE-sieve, Moneta) see a clean DLL. No 0xCC patches, no JMP hooks. |
| Per-thread scope | Debug registers are part of the thread context. Setting Dr0 on thread A does not affect thread B. EDR monitoring of other threads sees nothing. |
| Dynamic set/clear | Breakpoints can be installed just before a syscall and cleared immediately after. They exist only for the duration of a single hijacked syscall, minimizing the detection window. |
| No API calls | Set from within a VEH handler by modifying the CONTEXT structure. No calls to SetThreadContext or NtSetContextThread for the EDR to intercept. |
| Executes from ntdll | The syscall instruction still executes from its original address in ntdll. InstrumentationCallback sees a legitimate ntdll return address. |
Limitations to Be Aware Of
- Maximum 4 breakpoints per thread: Dr0–Dr3 only. LayeredSyscall uses 2 (Dr0 for syscall, Dr1 for ret), leaving Dr2–Dr3 available.
- EDRs can check debug registers: Some EDRs periodically call GetThreadContext to inspect Dr0–Dr7. If non-zero debug registers are found outside a debugger context, the thread may be flagged. LayeredSyscall mitigates this by clearing breakpoints immediately after use.
- Debugger interference: If a debugger is attached, it may use Dr0–Dr3 for its own breakpoints, conflicting with LayeredSyscall. The tool includes an IsDebuggerPresent check to avoid this.
- Thread affinity: The breakpoints only exist on the thread that triggered the VEH handler. Multi-threaded applications need to ensure the syscall chain runs on the same thread that installed the breakpoints.
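The four slots are shared with anything else that uses them, such as an attached debugger. Code that arms breakpoints may therefore want to check which slots are free in Dr7 first. The following is a minimal sketch that assumes a slot is free when both its L and G bits are clear. `FindFreeSlot` is a hypothetical helper; the source does not say LayeredSyscall performs this check.

```cpp
#include <cstdint>

// Returns the first Dr0-Dr3 slot whose local (L<n>) and global (G<n>) enable
// bits are both clear in Dr7, or -1 if all four slots are in use.
constexpr int FindFreeSlot(std::uint64_t dr7) {
    for (int n = 0; n < 4; ++n) {
        if (((dr7 >> (2 * n)) & 0x3) == 0) return n;
    }
    return -1;
}
```

With Dr0 and Dr1 armed (`Dr7 = 0x5`), the first free slot is Dr2. With all four local enables set (`0x55`), the function reports that no slot is available.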
Summary: The Complete Interception Chain
Combining what we learned in Modules 2–4, here is how the pieces fit together:
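Hardware Breakpoint Interception Flow
1. Look up the target Nt* function (Exception Directory)
2. Locate the syscall and ret addresses (scan for 0F 05)
3. Trigger a deliberate access violation (null deref)
4. Set Dr0/Dr1 and enable them in Dr7 (in VEH handler)
5. Call the legitimate API (e.g., WriteFile)
6. Dr0 fires at the syscall instruction: EXCEPTION_SINGLE_STEP
7. Swap SSN and arguments (in VEH handler)
8. Kernel executes the desired function and returns (NTSTATUS)
9. Dr1 fires at the ret instruction: EXCEPTION_SINGLE_STEP
10. Clean up (clear Dr0/Dr1)
11. Return to the caller with the real result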
Module 4 Quiz: Hardware Breakpoints
Q1: Which Dr7 bit must be set to enable a hardware breakpoint on Dr0?
Answer: Bit 0 (L0). Dr7 |= (1 << 0) activates the breakpoint configured in Dr0. The "local" designation refers to hardware task switches, on which the CPU clears L0–L3. Windows does not use hardware task switching; it saves and restores the debug registers as part of each thread's context.
Q2: Why are hardware breakpoints stealthier than software breakpoints (INT3) for syscall interception?