Module 8: Full Chain, Detection & Prior Art
Every step from wrapper to kernel and back — then how defenders can catch it all.
Putting It All Together
Over the past seven modules, we dissected each component of LayeredSyscall in isolation. Now we trace the complete execution path from a single wrapper call through every exception, breakpoint, redirect, and context swap — all the way to the kernel and back. Then we examine how defenders can detect each layer.
End-to-End Execution Flow
Here is the complete 12-step sequence for every wrapped syscall. Each step maps to concepts from previous modules:
Step 1: Resolve Address
wrpNtXxx() calls GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtXxx") to resolve the target function address in ntdll.dll. (Module 5)
Step 2: Resolve SSN
GetSsnByName("NtXxx") walks the ntdll Exception Directory to find the function's ordinal, then maps that ordinal to the System Service Number. (Module 2)
Step 3: Trigger ACCESS_VIOLATION
SetHwBp(addr, extArgs, ssn) stores global state, then _SetHwBp(addr) performs a null pointer dereference. The address is passed as RCX. (Module 5)
Step 4: AddHwBp Installs Breakpoints
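VEH Handler 1 catches EXCEPTION_ACCESS_VIOLATION and reads the target address from RCX. It scans forward for the syscall opcode (bytes 0F 05, read as the little-endian WORD 0x050F) and installs Dr0 on the syscall instruction and Dr1 on the ret that follows it. It then advances RIP past the faulting instruction so execution resumes. (Module 5)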
Step 5: Call Real Nt* Function
The wrapper calls the real ntdll function through the resolved pointer. Any EDR inline hook at the function entry point executes normally. (Module 5)
Step 6: Dr0 Fires at Syscall Instruction
Execution reaches the syscall instruction inside the Nt* stub. The Dr0 hardware breakpoint fires, generating EXCEPTION_SINGLE_STEP. (Module 4)
Step 7: Save Context & Redirect
HandlerHwBp (Phase 2) disables Dr0, saves the entire CONTEXT via memcpy, redirects RIP to demofunction() (MessageBoxW), and enables the Trap Flag (EFlags |= 0x100). (Module 6)
Step 8: Single-Step Trace
Execution flows through MessageBoxW → user32.dll internals → various Win32 layers. Each instruction triggers EXCEPTION_SINGLE_STEP. The handler re-enables TF each time, building genuine call stack frames. (Module 6)
Step 9: Three Conditions Met
Once execution is inside ntdll (the range check), the handler looks for a sub rsp, imm instruction with imm >= 0x58 (IsSubRsp = 1), followed by a call instruction (IsSubRsp = 2). At that point all three conditions are satisfied. (Module 6)
Step 10: Context Swap
The handler saves TempRsp (legitimate stack), restores SavedContext (real arguments), replaces RSP with TempRsp, emulates mov r10, rcx and mov eax, SSN, copies extended arguments if needed, clears the Trap Flag, and sets RIP to the syscall instruction. (Module 7)
Step 11: Syscall Executes
The syscall instruction executes from ntdll.dll memory with the correct SSN in RAX, real arguments in registers and on stack, and a genuine call stack showing MessageBoxW → user32 → ntdll. The kernel processes the request normally. (Module 7)
Step 12: Clean Return via Dr1
After the kernel returns, execution hits the ret instruction where Dr1 fires. The handler disables Dr1 and restores RSP to the original wrapper stack. The ret instruction returns to the wrapper with NTSTATUS in RAX. (Module 7)
Complete Flow Diagram
Full Execution Chain: GetProcAddress → Exception Dir → ACCESS_VIOLATION → Dr0 + Dr1 → EDR hook runs → SINGLE_STEP → MessageBoxW → ~1000s instrs → sub rsp + call → Args + Stack → Genuine stack → Restore RSP
What the EDR Sees
The entire point of this technique is to change what kernel telemetry and call stack analysis reveal. Here is a side-by-side comparison:
Call Stack Comparison
Without LayeredSyscall (Anomalous): red flag, the EXE jumps directly into ntdll.
With LayeredSyscall (Legitimate): a standard API chain leads into ntdll.
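The following minimal C++ sketch makes the comparison concrete. It implements a caller-origin heuristic that a defender might apply to a resolved call stack. The Frame struct, the IsAnomalousSyscallOrigin function, and the trusted-caller list are illustrative assumptions, not any EDR's actual logic.

The check flags a stack when the first frame above the innermost ntdll frames belongs to a module outside the list. That is exactly the situation when an EXE calls an Nt* stub directly. A LayeredSyscall stack, whose caller frames come from user32, passes.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One resolved call-stack frame, innermost first (illustrative shape).
struct Frame {
    std::string module;  // e.g. "ntdll.dll"
    std::string symbol;  // informational only; not used by the check
};

static std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns true when the code that called into ntdll (the first non-ntdll frame
// above the innermost ntdll frames) is not on the trusted-caller list.
// An EXE calling straight into an Nt* stub trips this check.
bool IsAnomalousSyscallOrigin(const std::vector<Frame>& frames,
                              const std::vector<std::string>& trustedCallers) {
    std::size_t i = 0;
    while (i < frames.size() && ToLower(frames[i].module) == "ntdll.dll") ++i;
    if (i == 0) return false;             // innermost frame is not ntdll: not a syscall stack
    if (i == frames.size()) return true;  // no caller frame at all: treat as suspicious
    const std::string caller = ToLower(frames[i].module);
    for (const auto& t : trustedCallers)
        if (ToLower(t) == caller) return false;
    return true;
}
```

The check only looks at which module called into ntdll. That is precisely the property LayeredSyscall controls by routing execution through a legitimate API first.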
Genuine, Not Fabricated
The frames above are not synthetic return addresses pushed onto the stack. Execution actually passed through those functions. If a defender walks the stack using unwind metadata or checks that each return address corresponds to a valid call site, every frame will pass validation. This is the key advantage over frame-fabrication approaches.
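The call-site validation mentioned above can be sketched as follows: given a return address, check that the bytes immediately before it decode as a call instruction. The hypothetical PrecededByCall helper works on a raw byte buffer and recognizes only a few common x64 encodings:

- E8 rel32
- FF 15 disp32
- FF D0 through FF D7 for register calls

A real validator would map the address to a loaded module and use a full disassembler. A fabricated frame may fail this test unless its return addresses were chosen to follow real call instructions. The frames LayeredSyscall leaves behind pass because a real call produced each one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified call-site check: does a recognizable x64 call instruction end
// exactly at `ret` (an offset into `code`)? Only a few common encodings are
// handled; a production validator would disassemble properly.
bool PrecededByCall(const std::vector<std::uint8_t>& code, std::size_t ret) {
    // Byte located `back` positions before the return address, or -1 if out of range.
    auto at = [&](std::size_t back) -> int {
        if (back > ret || ret - back >= code.size()) return -1;
        return code[ret - back];
    };
    // E8 rel32: near relative call, 5 bytes.
    if (at(5) == 0xE8) return true;
    // FF 15 disp32: call qword ptr [rip+disp32], 6 bytes.
    if (at(6) == 0xFF && at(5) == 0x15) return true;
    // FF D0..D7: call via register (rax..rdi, or r8..r15 with a 41 REX prefix).
    if (at(2) == 0xFF && at(1) >= 0xD0 && at(1) <= 0xD7) return true;
    return false;
}
```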
The Demo: Creating calc.exe
The repository includes demo.cpp, which demonstrates launching calc.exe as a child process using the fully wrapped NtCreateUserProcess syscall (11 arguments, ExtendedArgs = TRUE).
```cpp
// Initialize the VEH handlers
InitializeHooks();
// Build the process parameters
UNICODE_STRING NtImagePath;
RtlInitUnicodeString(&NtImagePath, L"\\??\\C:\\Windows\\System32\\calc.exe");
PRTL_USER_PROCESS_PARAMETERS ProcessParameters = NULL;
RtlCreateProcessParametersEx(
&ProcessParameters,
&NtImagePath,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
RTL_USER_PROCESS_PARAMETERS_NORMALIZED
);
```
```cpp
// Create the process using the wrapped syscall
PS_CREATE_INFO CreateInfo = { 0 };
CreateInfo.Size = sizeof(CreateInfo);
CreateInfo.State = PsCreateInitialState;
PPS_ATTRIBUTE_LIST AttributeList = /* ... setup ... */;
HANDLE hProcess = NULL, hThread = NULL;
NTSTATUS status = wrpNtCreateUserProcess(
&hProcess, // arg 1 (RCX)
&hThread, // arg 2 (RDX)
PROCESS_ALL_ACCESS, // arg 3 (R8)
THREAD_ALL_ACCESS, // arg 4 (R9)
NULL, // arg 5 (stack: 0x28)
NULL, // arg 6 (stack: 0x30)
0, // arg 7 (stack: 0x38)
0, // arg 8 (stack: 0x40)
ProcessParameters, // arg 9 (stack: 0x48)
&CreateInfo, // arg 10 (stack: 0x50)
AttributeList // arg 11 (stack: 0x58)
);
```
What Happens Under the Hood
This single call triggers the entire 12-step chain from above. Of the 11 arguments, the first four travel in registers and the remaining seven occupy stack slots up to ELEVENTH_ARGUMENT (offset 0x58). The EDR sees a legitimate-looking call stack originating from MessageBoxW, not from the offensive tool's main function, and Calculator launches successfully.
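To check those offsets, here is a small C++ sketch of the Microsoft x64 calling-convention rule that the wrapper and the context swap rely on. The ArgLocation helper is hypothetical and exists only for illustration.

At function entry, [RSP] holds the return address and the next 0x20 bytes are shadow space for the four register arguments. Argument n (n >= 5) therefore lives at [RSP + 8*n]: 0x28 for argument 5 and 0x58 for argument 11.

An Nt* stub pushes nothing before its syscall instruction, so the same offsets hold there. The stub copies RCX into R10 because the syscall instruction itself overwrites RCX with the return address.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Location of the n-th (1-based) integer/pointer argument at function entry under
// the Microsoft x64 calling convention: RCX, RDX, R8, R9, then the stack.
// [RSP] holds the return address and [RSP+0x08..0x27] is the 32-byte shadow space,
// so argument n >= 5 sits at [RSP + 8*n].
std::string ArgLocation(int n) {
    assert(n >= 1);
    static const char* const kRegs[] = {"RCX", "RDX", "R8", "R9"};
    if (n <= 4) return kRegs[n - 1];
    char buf[32];
    std::snprintf(buf, sizeof buf, "[RSP+0x%X]", 8u * static_cast<unsigned>(n));
    return buf;
}
```

Calling ArgLocation for 1 through 11 reproduces the comments in the wrpNtCreateUserProcess call above.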
Detection Surface Analysis
Despite the sophistication of the technique, several detection opportunities exist. Defenders should layer multiple signals for confidence.
VEH Registration Monitoring
Signal: Calls to AddVectoredExceptionHandler from non-standard modules. Legitimate use of VEH is rare outside debugging frameworks and language runtimes. An executable registering two VEH handlers early in its lifecycle is suspicious.
Detection: Hook or monitor RtlAddVectoredExceptionHandler in ntdll. Alert when the callback address is outside known legitimate modules.
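Here is a minimal sketch of that alerting rule. It assumes the monitor already has two inputs:

- a snapshot of loaded module ranges (on Windows, for example, from EnumProcessModules and GetModuleInformation);
- an allowlist of module names.

The ModuleRange struct and both function names are illustrative. A callback in memory that belongs to no module at all is also flagged.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Address range of one loaded module, as captured by the monitor (illustrative shape).
struct ModuleRange {
    std::string name;
    std::uint64_t base;
    std::uint64_t size;
};

// Name of the module containing `addr`, or nullopt if no module covers it.
std::optional<std::string> OwningModule(const std::vector<ModuleRange>& mods, std::uint64_t addr) {
    for (const auto& m : mods)
        if (addr >= m.base && addr - m.base < m.size) return m.name;
    return std::nullopt;
}

// Alert when a VEH callback lies outside every allowlisted module,
// including callbacks in memory not backed by any module.
bool ShouldAlertOnVehCallback(const std::vector<ModuleRange>& mods,
                              const std::vector<std::string>& allowlist,
                              std::uint64_t callback) {
    const auto owner = OwningModule(mods, callback);
    if (!owner) return true;
    for (const auto& allowed : allowlist)
        if (allowed == *owner) return false;
    return true;
}
```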
Hardware Breakpoint Detection
Signal: Debug registers (Dr0-Dr3) contain non-zero values outside of a debugger session. Ordinary applications rarely set hardware breakpoints.
Detection: Periodically call NtGetContextThread on process threads and inspect debug registers. Non-zero Dr0-Dr3 values outside a debugger are highly anomalous. Note: LayeredSyscall clears breakpoints after each syscall, so timing is important.
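The register check itself is simple once a context has been captured. This sketch assumes the values were obtained with GetThreadContext using ContextFlags = CONTEXT_DEBUG_REGISTERS, typically after suspending the target thread. The DebugRegs struct and helper names are hypothetical.

The code decodes the Dr7 enable bits as documented in the Intel SDM: bit 2i is the local enable Li and bit 2i+1 is the global enable Gi for Dr_i. It flags a thread when any slot is enabled or holds an address.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Snapshot of one thread's debug registers (illustrative shape).
struct DebugRegs {
    std::array<std::uint64_t, 4> dr{};  // Dr0-Dr3: breakpoint linear addresses
    std::uint64_t dr7 = 0;              // Dr7: per-slot enable bits and conditions
};

// Bitmask of enabled breakpoint slots (bit i set => Dr_i enabled via Li or Gi in Dr7).
unsigned EnabledSlots(const DebugRegs& r) {
    unsigned mask = 0;
    for (unsigned i = 0; i < 4; ++i)
        if ((r.dr7 >> (2 * i)) & 0x3) mask |= 1u << i;
    return mask;
}

// Flag a thread when any slot is enabled or any address register is non-zero.
bool LooksLikeHwBreakpoints(const DebugRegs& r) {
    if (EnabledSlots(r) != 0) return true;
    for (std::uint64_t addr : r.dr)
        if (addr != 0) return true;
    return false;
}
```

Because LayeredSyscall disables its breakpoints after each syscall, a single snapshot can easily miss them. Repeated sampling raises the odds of catching one.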
Trap Flag Abuse
Signal: The EFlags TF bit (0x100) is set repeatedly while no debugger is attached. Outside of debugger single-stepping, this bit is virtually never set.
Detection: ETW providers or kernel callbacks that observe RFLAGS can flag processes with frequent TF-related exceptions. The sheer volume of EXCEPTION_SINGLE_STEP events is a strong signal.
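Hardware breakpoint hits and trap-flag steps both surface as EXCEPTION_SINGLE_STEP (0x80000004). Instrumentation that wants to count TF activity specifically therefore has to tell them apart.

One way is to read the Dr6 status bits defined in the Intel SDM, sketched below under the assumption that the Dr6 value for the exception is available:

- B0 through B3 (bits 0 to 3) mark which breakpoint fired.
- BS (bit 14) marks a single-step trap.

The enum and function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kExceptionSingleStep = 0x80000004;  // EXCEPTION_SINGLE_STEP
constexpr std::uint32_t kEflagsTF = 0x100;                  // Trap Flag, EFlags bit 8

enum class StepCause { NotSingleStep, TrapFlag, HwBreakpoint, Unknown };

// Classify a single-step exception from its Dr6 value:
// BS (bit 14) => trap-flag step; B0-B3 (bits 0-3) => Dr0-Dr3 breakpoint hit.
StepCause ClassifyStep(std::uint32_t code, std::uint64_t dr6) {
    if (code != kExceptionSingleStep) return StepCause::NotSingleStep;
    if (dr6 & (1ULL << 14)) return StepCause::TrapFlag;
    if (dr6 & 0xF) return StepCause::HwBreakpoint;
    return StepCause::Unknown;
}

// True when an EFlags value captured from a thread context has TF set.
bool TrapFlagSet(std::uint32_t eflags) { return (eflags & kEflagsTF) != 0; }
```

Tallying TrapFlag results per process over time feeds the exception-frequency signal.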
Exception Frequency Analysis
Signal: Hundreds or thousands of EXCEPTION_SINGLE_STEP exceptions per syscall. Normal applications generate near-zero single-step exceptions. A process generating thousands per second is clearly doing something unusual.
Detection: Monitor exception dispatch rates via ETW or kernel instrumentation. Statistical anomaly detection on exception frequency per process.
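As a stand-in for a full statistical model, here is a fixed-threshold sliding-window counter in C++. The class name, the millisecond timestamps, and the window and threshold values are all assumptions for illustration. Real thresholds would have to be tuned against baseline telemetry. The counter expects each process's timestamps to arrive in non-decreasing order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>

// Fixed-threshold sliding-window counter of EXCEPTION_SINGLE_STEP events per process.
class SingleStepRateMonitor {
public:
    SingleStepRateMonitor(std::uint64_t windowMs, std::size_t threshold)
        : windowMs_(windowMs), threshold_(threshold) {
        assert(windowMs_ > 0);
    }

    // Records one single-step exception for `pid` at `nowMs`; returns true when the
    // number of events inside the trailing window exceeds the threshold.
    bool Record(std::uint32_t pid, std::uint64_t nowMs) {
        auto& window = events_[pid];
        window.push_back(nowMs);
        // Drop events that have fallen out of the trailing window.
        while (!window.empty() && nowMs - window.front() >= windowMs_) window.pop_front();
        return window.size() > threshold_;
    }

private:
    std::uint64_t windowMs_;
    std::size_t threshold_;
    std::unordered_map<std::uint32_t, std::deque<std::uint64_t>> events_;
};
```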
Heuristic: Phantom API Calls
Signal: MessageBoxW (or another demo function) is called but never displays. The API enters its execution path but is abandoned mid-execution when the trap-flag trace redirects to the real syscall.
Detection: Correlate API call entry events with completion events. A MessageBoxW that starts but never creates a window is suspicious. Note: the demo function is configurable, so this specific heuristic can be evaded by changing it.
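The entry/completion correlation can be sketched as a per-thread stack of open calls. The ApiEvent shape is a hypothetical sensor event, not a real ETW schema. UnreturnedCalls simply reports calls that were entered but never returned by the end of the trace.

On its own, this would also flag a legitimate MessageBoxW still waiting for the user. In practice it would be combined with the window-creation signal described above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical tracing event: entry into, or return from, a monitored API.
struct ApiEvent {
    std::uint32_t tid;
    std::string api;
    bool isEnter;
};

// APIs that were entered on some thread but never returned before the trace ended.
std::vector<std::string> UnreturnedCalls(const std::vector<ApiEvent>& trace) {
    std::unordered_map<std::uint32_t, std::vector<std::string>> open;  // per-thread open calls
    for (const auto& e : trace) {
        auto& stack = open[e.tid];
        if (e.isEnter) {
            stack.push_back(e.api);
        } else if (!stack.empty() && stack.back() == e.api) {
            stack.pop_back();  // matched return
        }
    }
    std::vector<std::string> result;
    for (const auto& entry : open)
        result.insert(result.end(), entry.second.begin(), entry.second.end());
    return result;
}
```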
| Detection Vector | Detection Difficulty | Reliability | Evasion Possible? |
|---|---|---|---|
| VEH Registration | Low | Medium | Could register from a DLL that looks legitimate |
| Hardware BP Detection | Medium | Low (timing-dependent) | BPs are cleared after each syscall |
| Trap Flag / TF Abuse | Medium | High | Difficult to avoid — fundamental to the technique |
| Exception Frequency | Low | High | Cannot be reduced without changing the technique |
| Phantom API Calls | High | Medium | Change demo function to something less conspicuous |
Prior Art Comparison
LayeredSyscall builds on years of syscall evasion research. Here is how it compares to the major techniques:
| Project | SSN Resolution | Syscall Location | Call Stack | Mechanism |
|---|---|---|---|---|
| Hell's Gate | Stub opcodes | Direct (in EXE memory) | Anomalous | Read SSN from ntdll stub bytes |
| SysWhispers3 | Zw* sort order | Indirect (jumps into ntdll) | Anomalous | Static jmp to syscall in ntdll |
| HWSyscalls | HalosGate | Indirect (in ntdll) | Synthetic trampoline | HW breakpoints + VEH |
| TamperingSyscalls | Various | Through hook | Spoofed arguments | HW breakpoints + VEH |
| WithSecure Spoofer | N/A (separate) | Various | Fabricated frames | UNWIND_CODE parsing |
| LayeredSyscall | Exception Directory | Indirect (in ntdll) | GENUINE frames | VEH + HW BPs + Trap Flag |
Key Differentiator: Genuine vs. Fabricated Stacks
Hell's Gate and SysWhispers3 make no attempt at stack spoofing, so their stacks are anomalous. HWSyscalls uses a synthetic trampoline (one frame). WithSecure fabricates multiple frames using UNWIND_CODE metadata, but execution never actually passes through those frames. Among these projects, only LayeredSyscall produces genuinely traversed call stack frames, by actually executing a legitimate API (MessageBoxW) and hijacking its execution context at the right moment.
Evolution of SSN Resolution
Each project uses a different approach to resolve System Service Numbers:
- Hell's Gate: Reads the mov eax, SSN instruction from the ntdll stub (fails if hooked)
- HalosGate: If the stub is hooked, scans neighboring stubs to infer the SSN
- SysWhispers3: Sorts Zw* function addresses; the sort order equals the SSN
- LayeredSyscall: Uses the ntdll Exception Directory to map function names to ordinals, then to SSNs. This is independent of stub byte patterns.
Limitations
LayeredSyscall is a proof of concept, not production-ready malware. Understanding its limitations is as important as understanding its capabilities.
Cannot Wrap NtSetContextThread
This function modifies thread context, including the debug registers (Dr0-Dr7) that LayeredSyscall relies on. Wrapping it would create a circular dependency: the hardware breakpoints need to persist to intercept the syscall, but the syscall itself would overwrite them. Any tool that needs to call NtSetContextThread must do so outside the LayeredSyscall framework.
Performance Overhead
Each wrapped syscall generates hundreds to thousands of EXCEPTION_SINGLE_STEP exceptions during the trap-flag trace. Each exception involves a round trip through the kernel (the CPU traps into kernel mode, and the kernel dispatches the exception back to user mode), VEH dispatch, and handler execution. For a one-off syscall like NtCreateUserProcess this is acceptable, but wrapping performance-sensitive syscalls in a tight loop would be impractical.
x64 Only
The technique relies on x64-specific features: the syscall instruction (not sysenter), the x64 register-based calling convention (RCX, RDX, R8, R9), and 64-bit CONTEXT structure layout. Porting to x86 (WoW64) would require fundamental redesign.
Default Demo Function Fingerprinting
The default demofunction() is MessageBoxW. If defenders look for processes that call MessageBoxW but never create a message box window, this is detectable. However, the demo function is configurable — any API that eventually reaches ntdll would work. Using something more common (like a file operation API) would be harder to fingerprint.
No Build System in Repository
The repository does not include Visual Studio project files, CMakeLists, or a Makefile. Users must manually integrate the source files into an existing MSVC project with the correct settings (x64, C++17 or later, Windows SDK headers).
References & Further Reading
Core Projects
- LayeredSyscall — The project this course is based on (GitHub: WKL-Sec/LayeredSyscall)
- Hell's Gate — Original dynamic SSN resolution (GitHub: am0nsec/HellsGate)
- Halo's Gate — Hooked-stub neighbor scanning, originally published by Sektor7 (extended in TartarusGate, GitHub: trickster0/TartarusGate)
- SysWhispers3 — Zw sort-order SSN resolution with indirect syscalls (GitHub: klezVirus/SysWhispers3)
- HWSyscalls — Hardware breakpoint-based syscalls (GitHub: ShorSec/HWSyscalls)
- TamperingSyscalls — Argument-spoofing syscalls via VEH (GitHub: rad9800/TamperingSyscalls)
Call Stack Spoofing Research
- WithSecure Call Stack Spoofer — UNWIND_CODE-based frame fabrication
- VulcanRaven — Return address spoofing via synthetic frames
- ThreadStackSpoofer — Sleep-time stack manipulation
Detection Resources
- Elastic Blog — Research on detecting direct/indirect syscalls via call stack analysis
- ETW Threat Intelligence — Microsoft's ETW provider for kernel callback telemetry
- Windows Internals, 7th Edition — Pavel Yosifovich et al. (comprehensive OS internals reference)
Course Complete
Congratulations!
You have completed the LayeredSyscall course. Over 8 modules, you learned:
- Module 1: How EDRs hook userland APIs and why syscall evasion matters
- Module 2: Syscall internals, the SSDT, and how LayeredSyscall resolves SSNs via the Exception Directory
- Module 3: Windows exception handling from SEH through VEH and how to weaponize it
- Module 4: Hardware breakpoints via debug registers (Dr0-Dr7) and their role in the technique
- Module 5: The dual-handler architecture: AddHwBp (setup) and HandlerHwBp (execution management)
- Module 6: Call stack construction via CPU Trap Flag tracing through a legitimate API
- Module 7: Argument marshalling, the context swap, and clean return via Dr1
- Module 8: The complete execution chain, detection surfaces, and comparison with prior art
Suggested Next Steps
- Read the source: Clone the LayeredSyscall repository and trace through HookModule.cpp line by line with this course as a guide
- Build and test: Set up a Windows VM with a debugger (x64dbg or WinDbg) and step through the exception chain
- Extend: Try wrapping additional Nt* functions not in the default set
- Detect: Write a detector for trap flag abuse or exception frequency anomalies
- Compare: Build HWSyscalls or TamperingSyscalls and compare the call stacks side by side
- Study prior art: Read the Hell's Gate and SysWhispers3 source code to understand the evolution
Final Exam: Full Chain & Detection
Q1: In the complete 12-step chain, what happens between "Call real Nt* function" (Step 5) and "Save context & redirect" (Step 7)?
Q2: Which detection vector is hardest for LayeredSyscall to evade?
Q3: How does LayeredSyscall's call stack differ from WithSecure's call stack spoofer?
Q4: Why can't LayeredSyscall wrap NtSetContextThread?