Difficulty: Advanced

Module 8: Full Chain, Detection & Prior Art

Every step from wrapper to kernel and back — then how defenders can catch it all.

Putting It All Together

Over the past seven modules, we dissected each component of LayeredSyscall in isolation. Now we trace the complete execution path from a single wrapper call through every exception, breakpoint, redirect, and context swap — all the way to the kernel and back. Then we examine how defenders can detect each layer.

End-to-End Execution Flow

Here is the complete 12-step sequence for every wrapped syscall. Each step maps to concepts from previous modules:

Step 1: Resolve Address

wrpNtXxx() calls GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtXxx") to resolve the target function address in ntdll.dll. (Module 5)

Step 2: Resolve SSN

GetSsnByName("NtXxx") walks the ntdll Exception Directory to find the function's position (its "ordinal") among the syscall stubs, then maps that position to the System Service Number. (Module 2)

Step 3: Trigger ACCESS_VIOLATION

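SetHwBp(addr, extArgs, ssn) stores global state, then _SetHwBp(addr) deliberately dereferences a null pointer. Because addr is the first argument, it is in RCX at the moment of the fault, where the exception handler can read it from the saved context. (Module 5)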

Step 4: AddHwBp Installs Breakpoints

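VEH Handler 1 catches EXCEPTION_ACCESS_VIOLATION, reads the target address from RCX, scans the stub for the syscall instruction (bytes 0F 05, which read as 0x050F when loaded as a little-endian 16-bit value), and installs Dr0 on the syscall instruction and Dr1 on the ret. It then advances RIP past the faulting instruction so execution continues. (Module 5)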

Step 5: Call Real Nt* Function

The wrapper calls the real ntdll function through the resolved pointer. Any EDR inline hook at the function entry point executes normally. (Module 5)

Step 6: Dr0 Fires at Syscall Instruction

Execution reaches the syscall instruction inside the Nt* stub. The Dr0 hardware breakpoint fires, generating EXCEPTION_SINGLE_STEP. (Module 4)

Step 7: Save Context & Redirect

HandlerHwBp (Phase 2) disables Dr0, saves the entire CONTEXT via memcpy, redirects RIP to demofunction() (MessageBoxW), and enables the Trap Flag (EFlags |= 0x100). (Module 6)

Step 8: Single-Step Trace

Execution flows through MessageBoxW → user32.dll internals → various Win32 layers. Each instruction triggers EXCEPTION_SINGLE_STEP. The handler re-enables TF each time, while the code being traced builds genuine call stack frames as it runs. (Module 6)

Step 9: Three Conditions Met

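Once RIP is inside ntdll's address range (condition 1), the handler looks for a sub rsp instruction that reserves at least 0x58 bytes (condition 2, IsSubRsp = 1), followed by a call instruction (condition 3, IsSubRsp = 2). At that point all three conditions are satisfied. (Module 6)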

Step 10: Context Swap

The handler saves TempRsp (legitimate stack), restores SavedContext (real arguments), replaces RSP with TempRsp, emulates mov r10, rcx and mov eax, SSN, copies extended arguments if needed, clears the Trap Flag, and sets RIP to the syscall instruction. (Module 7)

Step 11: Syscall Executes

The syscall instruction executes from ntdll.dll memory with the correct SSN in RAX, real arguments in registers and on stack, and a genuine call stack showing MessageBoxW → user32 → ntdll. The kernel processes the request normally. (Module 7)

Step 12: Clean Return via Dr1

After the kernel returns, execution hits the ret instruction where Dr1 fires. The handler disables Dr1 and restores RSP to the original wrapper stack. The ret instruction returns to the wrapper with NTSTATUS in RAX. (Module 7)

Complete Flow Diagram

Full Execution Chain

1. Resolve addr (GetProcAddress)
2. Get SSN (Exception Dir)
3. NULL deref (ACCESS_VIOLATION)
4. AddHwBp (Dr0 + Dr1)
5. Call Nt* (EDR hook runs)
6. Dr0 fires (SINGLE_STEP)
7. Save + Redirect (MessageBoxW)
8. TF Trace (hundreds to thousands of instructions)
9. 3 Conditions (sub rsp + call)
10. Ctx Swap (Args + Stack)
11. Syscall! (Genuine stack)
12. Dr1 Return (Restore RSP)

What the EDR Sees

The entire point of this technique is to change what kernel telemetry and call stack analysis reveal. Here is a side-by-side comparison:

Call Stack Comparison

Without LayeredSyscall (Anomalous)

myapp.exe!main + 0x55
myapp.exe!wrpNtCreate + 0x30
ntdll!NtCreateUserProcess + 0x14
syscall

Red flag: EXE jumps directly into ntdll

With LayeredSyscall (Legitimate)

user32!MessageBoxW + 0xAA
user32!InternalFunction + 0x42
ntdll!SomeNtdllFunc + 0x1B
ntdll!NtCreateUserProcess + 0x14
syscall

Legitimate: Standard API chain into ntdll

Genuine, Not Fabricated

The frames above are not synthetic return addresses pushed onto the stack. Execution actually passed through those functions. If a defender walks the stack using unwind metadata or checks that each return address corresponds to a valid call site, every frame will pass validation. This is the key advantage over frame-fabrication approaches.
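To make that validation concrete, here is a minimal sketch of the call-site check a stack walker might apply to each return address. It is a simplified heuristic, assuming the walker already has the module's code bytes in a buffer; it recognizes only three common x64 CALL encodings and can produce false positives when a 0xE8 or 0xFF byte happens to sit inside an unrelated instruction. The function name PrecededByCall is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Heuristic call-site check: do the bytes just before a return address
// end in a CALL instruction? Only three common x64 encodings are tried:
//   E8 rel32       (5 bytes)  call rel32
//   FF 15 disp32   (6 bytes)  call qword ptr [rip+disp32]
//   FF D0..D7      (2 bytes)  call rax..rdi (register form, no REX prefix)
// `code` holds the module's bytes; `retOffset` is the return address
// expressed as an offset into that buffer.
bool PrecededByCall(const std::vector<uint8_t>& code, std::size_t retOffset) {
    if (retOffset > code.size()) return false;
    if (retOffset >= 5 && code[retOffset - 5] == 0xE8) return true;
    if (retOffset >= 6 && code[retOffset - 6] == 0xFF && code[retOffset - 5] == 0x15)
        return true;
    if (retOffset >= 2 && code[retOffset - 2] == 0xFF &&
        (code[retOffset - 1] & 0xF8) == 0xD0)  // ModRM 11 010 rrr
        return true;
    return false;
}
```

LayeredSyscall's frames pass this check because each return address was pushed by a real call instruction, which is what separates them from fabricated frames.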

The Demo: Creating calc.exe

The repository includes demo.cpp, which demonstrates launching calc.exe as a child process using the fully wrapped NtCreateUserProcess syscall (11 arguments, ExtendedArgs = TRUE).

```cpp
// Initialize the VEH handlers
InitializeHooks();

// Build the process parameters
UNICODE_STRING NtImagePath;
RtlInitUnicodeString(&NtImagePath, L"\\??\\C:\\Windows\\System32\\calc.exe");

PRTL_USER_PROCESS_PARAMETERS ProcessParameters = NULL;
RtlCreateProcessParametersEx(
    &ProcessParameters,
    &NtImagePath,
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
    RTL_USER_PROCESS_PARAMETERS_NORMALIZED
);
```

```cpp
// Create the process using the wrapped syscall
PS_CREATE_INFO CreateInfo = { 0 };
CreateInfo.Size = sizeof(CreateInfo);
CreateInfo.State = PsCreateInitialState;

PPS_ATTRIBUTE_LIST AttributeList = /* ... setup ... */;

HANDLE hProcess = NULL, hThread = NULL;

NTSTATUS status = wrpNtCreateUserProcess(
    &hProcess,              // arg 1  (RCX)
    &hThread,               // arg 2  (RDX)
    PROCESS_ALL_ACCESS,     // arg 3  (R8)
    THREAD_ALL_ACCESS,      // arg 4  (R9)
    NULL,                   // arg 5  (stack: 0x28)
    NULL,                   // arg 6  (stack: 0x30)
    0,                      // arg 7  (stack: 0x38)
    0,                      // arg 8  (stack: 0x40)
    ProcessParameters,      // arg 9  (stack: 0x48)
    &CreateInfo,            // arg 10 (stack: 0x50)
    AttributeList           // arg 11 (stack: 0x58)
);
```

What Happens Under the Hood

This single call triggers the entire 12-step chain from above. The first four arguments travel in RCX, RDX, R8, and R9; arguments 5 through 11 occupy stack slots up to ELEVENTH_ARGUMENT (offset 0x58). The EDR sees a legitimate-looking call stack originating from MessageBoxW, not from the offensive tool's main function. Calculator launches successfully.
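The stack offsets in the demo's comments follow from the x64 Windows calling convention. This small sketch, with hypothetical helper names, reproduces that arithmetic: at the callee's first instruction [RSP] holds the return address, RSP+0x08 through RSP+0x27 is the 32-byte shadow space, and argument N (N >= 5) sits at RSP + 8*N.

```cpp
#include <cstdint>

// x64 Windows calling convention, viewed from the callee's first instruction:
//   RCX, RDX, R8, R9   arguments 1-4
//   [RSP]              return address
//   [RSP+0x08..0x27]   32-byte shadow (home) space for arguments 1-4
//   [RSP+8*N]          argument N, for N >= 5
constexpr bool IsRegisterArg(unsigned n) { return n >= 1 && n <= 4; }

// Stack offset of argument n (1-based) relative to RSP at function entry.
// Only meaningful for stack-resident arguments (n >= 5).
constexpr uint32_t StackArgOffset(unsigned n) { return 8u * n; }

// How many of `argc` arguments are passed on the stack.
constexpr unsigned StackArgCount(unsigned argc) { return argc > 4 ? argc - 4 : 0; }

static_assert(StackArgOffset(5) == 0x28, "arg 5 follows the shadow space");
static_assert(StackArgOffset(11) == 0x58, "arg 11 is the last NtCreateUserProcess argument");
static_assert(StackArgCount(11) == 7, "NtCreateUserProcess passes 7 arguments on the stack");
```

For NtCreateUserProcess, seven of the eleven arguments live on the stack rather than in registers, which is consistent with the demo passing ExtendedArgs = TRUE.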

Detection Surface Analysis

Despite the sophistication of the technique, several detection opportunities exist. Defenders should layer multiple signals for confidence.

VEH Registration Monitoring

Signal: Calls to AddVectoredExceptionHandler from non-standard modules. Legitimate use of VEH is rare outside debugging frameworks and language runtimes. An executable registering two VEH handlers early in its lifecycle is suspicious.

Detection: Hook or monitor RtlAddVectoredExceptionHandler in ntdll. Alert when the callback address is outside known legitimate modules.
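A minimal sketch of the allowlist decision, assuming the monitor already has a list of loaded modules (for example from a module enumeration) and the handler address captured when the registration was intercepted. The ModuleRange type, the function names, and the allowlist contents are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one loaded module.
struct ModuleRange {
    std::string name;
    uint64_t base;
    uint64_t size;
};

// Returns the module whose [base, base + size) range contains addr, or nullptr.
const ModuleRange* FindModule(const std::vector<ModuleRange>& mods, uint64_t addr) {
    for (const auto& m : mods)
        if (addr >= m.base && addr - m.base < m.size) return &m;
    return nullptr;
}

// Alert when the handler lives outside every mapped module (e.g. private
// or shellcode memory) or inside a module that is not allowlisted.
bool ShouldAlertOnVeh(const std::vector<ModuleRange>& mods,
                      const std::vector<std::string>& allowlist,
                      uint64_t handlerAddr) {
    const ModuleRange* m = FindModule(mods, handlerAddr);
    if (m == nullptr) return true;
    for (const auto& ok : allowlist)
        if (ok == m->name) return false;
    return true;
}
```

LayeredSyscall's handlers live inside the tool's own executable image, so an allowlist of known runtimes and debugging frameworks would flag them, whereas a check that only asks "is the handler inside any mapped image?" would not.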

Hardware Breakpoint Detection

Signal: Debug registers Dr0-Dr3 hold breakpoint addresses, with the matching enable bits set in Dr7, outside of a debugger session. Ordinary applications almost never set hardware breakpoints.

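Detection: Periodically call NtGetContextThread (or GetThreadContext) with CONTEXT_DEBUG_REGISTERS on the process's threads and inspect the debug registers. Armed hardware breakpoints outside a debugger are highly anomalous. Note: LayeredSyscall clears its breakpoints after each syscall, so a periodic scan only catches them if it samples a thread while a wrapped syscall is in flight.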
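A minimal sketch of the per-thread check, assuming the scanner has already copied the debug registers out of a CONTEXT captured with CONTEXT_DEBUG_REGISTERS. Checking the local/global enable bits in Dr7 (bits 2*i and 2*i+1 for breakpoint i), rather than only looking for non-zero Dr0-Dr3, avoids flagging stale addresses that are no longer armed. DebugRegs and the helper names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Debug register snapshot (field names are our own, not the CONTEXT layout).
struct DebugRegs {
    std::array<uint64_t, 4> dr{};  // Dr0-Dr3: breakpoint addresses
    uint64_t dr7 = 0;              // Dr7: control register
};

// Breakpoint i (0-3) is armed when its local (L) or global (G) enable bit
// in Dr7 is set: bits 2*i and 2*i + 1.
constexpr bool IsEnabled(uint64_t dr7, unsigned i) {
    return ((dr7 >> (2 * i)) & 0x3) != 0;
}

// Count breakpoints that are both enabled in Dr7 and point at a non-zero
// address.
unsigned ArmedBreakpoints(const DebugRegs& r) {
    unsigned n = 0;
    for (unsigned i = 0; i < 4; ++i)
        if (IsEnabled(r.dr7, i) && r.dr[i] != 0) ++n;
    return n;
}
```

A scanner would raise an alert when ArmedBreakpoints returns a non-zero count for a thread in a process that is not being debugged.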

Trap Flag Abuse

Signal: The EFlags trap flag (TF, bit 8, mask 0x100) is set while the process is not being debugged. Outside debugger single-stepping, this bit is virtually never set.

Detection: Where available, ETW providers or kernel callbacks that observe RFLAGS can flag processes with frequent TF-driven exceptions. Even without that visibility, the sheer volume of EXCEPTION_SINGLE_STEP events is a strong signal.

Exception Frequency Analysis

Signal: Hundreds or thousands of EXCEPTION_SINGLE_STEP exceptions per syscall. Normal applications generate near-zero single-step exceptions. A process generating thousands per second is clearly doing something unusual.

Detection: Monitor exception dispatch rates via ETW or kernel instrumentation. Statistical anomaly detection on exception frequency per process.
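A minimal sketch of per-process rate detection over a sliding window, assuming exception events arrive with non-decreasing millisecond timestamps from whatever telemetry source is available. The window length and threshold are tuning parameters, not values taken from LayeredSyscall; 0x80000004 is the value of EXCEPTION_SINGLE_STEP. The class name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

constexpr uint32_t kExceptionSingleStep = 0x80000004;  // EXCEPTION_SINGLE_STEP

// Counts single-step exceptions for one process over a sliding time window.
// Assumes timestamps (milliseconds) arrive in non-decreasing order.
class SingleStepRateDetector {
public:
    SingleStepRateDetector(uint64_t windowMs, std::size_t threshold)
        : windowMs_(windowMs), threshold_(threshold) {}

    // Feed one exception event; other exception codes are ignored.
    // Returns true when the number of single-step events inside the
    // window has reached the threshold.
    bool OnException(uint32_t code, uint64_t nowMs) {
        if (code != kExceptionSingleStep) return false;
        events_.push_back(nowMs);
        // Drop events that have slid out of the window.
        while (!events_.empty() && nowMs - events_.front() >= windowMs_)
            events_.pop_front();
        return events_.size() >= threshold_;
    }

private:
    uint64_t windowMs_;
    std::size_t threshold_;
    std::deque<uint64_t> events_;
};
```

Because normal applications outside a debugger raise almost no single-step exceptions, even a conservative threshold separates a trap-flag trace, which produces hundreds to thousands per syscall, from ordinary behavior.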

Heuristic: Phantom API Calls

Signal: MessageBoxW (or another demo function) is called but never displays. The API enters its execution path but is abandoned mid-execution when the trap-flag trace redirects to the real syscall.

Detection: Correlate API call entry events with completion events. A MessageBoxW that starts but never creates a window is suspicious. Note: the demo function is configurable, so this specific heuristic can be evaded by changing it.
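A minimal sketch of the correlation step, assuming telemetry that reports when a monitored API is entered and, separately, evidence that it made progress (for MessageBoxW, the creation of its window). It reports calls that produced no such evidence within a timeout. Correlating on completion instead would misfire on a message box that simply stays open waiting for the user. The tracker and its method names are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Tracks monitored API calls until telemetry shows evidence that the call
// made progress. Assumes non-decreasing millisecond timestamps.
class PhantomCallTracker {
public:
    explicit PhantomCallTracker(uint64_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // A monitored API call was entered.
    void OnEnter(uint64_t callId, const std::string& api, uint64_t nowMs) {
        open_[callId] = Pending{api, nowMs};
    }

    // Evidence of progress arrived for this call; stop tracking it.
    void OnEvidence(uint64_t callId) { open_.erase(callId); }

    // Report (and forget) calls still lacking evidence after the timeout.
    std::vector<std::string> Expire(uint64_t nowMs) {
        std::vector<std::string> phantoms;
        for (auto it = open_.begin(); it != open_.end();) {
            if (nowMs - it->second.enteredMs >= timeoutMs_) {
                phantoms.push_back(it->second.api);
                it = open_.erase(it);
            } else {
                ++it;
            }
        }
        return phantoms;
    }

private:
    struct Pending {
        std::string api;
        uint64_t enteredMs;
    };
    uint64_t timeoutMs_;
    std::unordered_map<uint64_t, Pending> open_;
};
```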

| Detection Vector | Difficulty to Implement | Reliability | Evasion Possible? |
|---|---|---|---|
| VEH Registration | Low | Medium | Could register from a DLL that looks legitimate |
| Hardware BP Detection | Medium | Low (timing-dependent) | BPs are cleared after each syscall |
| Trap Flag / TF Abuse | Medium | High | Difficult to avoid; fundamental to the technique |
| Exception Frequency | Low | High | Cannot be reduced without changing the technique |
| Phantom API Calls | High | Medium | Change demo function to something less conspicuous |

Prior Art Comparison

LayeredSyscall builds on years of syscall evasion research. Here is how it compares to the major techniques:

| Project | SSN Resolution | Syscall Location | Call Stack | Mechanism |
|---|---|---|---|---|
| Hell's Gate | Stub opcodes | Direct (in EXE memory) | Anomalous | Read SSN from ntdll stub bytes |
| SysWhispers3 | Zw* sort order | Indirect (jumps into ntdll) | Anomalous | Static jmp to syscall in ntdll |
| HWSyscalls | HalosGate | Indirect (in ntdll) | Synthetic trampoline | HW breakpoints + VEH |
| TamperingSyscalls | Various | Through hook | Spoofed arguments | HW breakpoints + VEH |
| WithSecure Spoofer | N/A (separate) | Various | Fabricated frames | UNWIND_CODE parsing |
| LayeredSyscall | Exception Directory | Indirect (in ntdll) | Genuine frames | VEH + HW BPs + Trap Flag |

Key Differentiator: Genuine vs. Fabricated Stacks

Hell's Gate and SysWhispers3 make no attempt at stack spoofing, so their stacks are anomalous. HWSyscalls uses a synthetic trampoline (one frame). WithSecure's spoofer fabricates multiple frames using UNWIND_CODE metadata, but execution never actually passes through those frames. Among the projects compared here, LayeredSyscall is the only one that produces genuinely traversed call stack frames: it actually executes a legitimate API (MessageBoxW) and hijacks its execution context at the right moment.

Evolution of SSN Resolution

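The SSN Resolution column above shows how the approaches differ: Hell's Gate reads the SSN straight from the ntdll stub bytes, SysWhispers3 derives it from the sort order of the Zw* exports, HWSyscalls uses HalosGate, and LayeredSyscall walks the ntdll Exception Directory.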

Limitations

LayeredSyscall is a proof of concept, not production-ready malware. Understanding its limitations is as important as understanding its capabilities.

Cannot Wrap NtSetContextThread

This function modifies thread context, including the debug registers (Dr0-Dr7) that LayeredSyscall relies on. Wrapping it would create a circular dependency: the hardware breakpoints need to persist to intercept the syscall, but the syscall itself would overwrite them. Any tool that needs to call NtSetContextThread must do so outside the LayeredSyscall framework.

Performance Overhead

Each wrapped syscall generates hundreds to thousands of EXCEPTION_SINGLE_STEP exceptions during the trap-flag trace. Each exception involves a round trip through the kernel (the trap itself, dispatch back to user mode, and resumption afterwards), VEH dispatch, and handler execution. For a one-off syscall like NtCreateUserProcess this is acceptable, but wrapping performance-sensitive syscalls in a tight loop would be impractical.

x64 Only

The technique relies on x64-specific features: the syscall instruction (not sysenter), the x64 register-based calling convention (RCX, RDX, R8, R9), and 64-bit CONTEXT structure layout. Porting to x86 (WoW64) would require fundamental redesign.

Default Demo Function Fingerprinting

The default demofunction() is MessageBoxW. If defenders look for processes that call MessageBoxW but never create a message box window, this is detectable. However, the demo function is configurable — any API that eventually reaches ntdll would work. Using something more common (like a file operation API) would be harder to fingerprint.

No Build System in Repository

The repository does not include Visual Studio project files, CMakeLists, or a Makefile. Users must manually integrate the source files into an existing MSVC project with the correct settings (x64, C++17 or later, Windows SDK headers).


Course Complete

Congratulations!

You have completed all 8 modules of the LayeredSyscall course.

Final Exam: Full Chain & Detection

Q1: In the complete 12-step chain, what happens between "Call real Nt* function" (Step 5) and "Save context & redirect" (Step 7)?

Step 6 is the Dr0 breakpoint firing when execution reaches the syscall instruction inside the Nt* stub. This generates EXCEPTION_SINGLE_STEP, which is caught by HandlerHwBp. The ntdll stub may have loaded RAX with the SSN, but the breakpoint fires at the syscall instruction itself, before it enters the kernel.

Q2: Which detection vector is hardest for LayeredSyscall to evade?

The massive number of EXCEPTION_SINGLE_STEP exceptions is fundamental to the technique and cannot be eliminated without changing it. VEH registration can be disguised, hardware breakpoints are cleared between calls, and the demo function can be changed, but the trap-flag trace necessarily generates hundreds or thousands of exceptions per syscall.

Q3: How does LayeredSyscall's call stack differ from WithSecure's call stack spoofer?

WithSecure's approach parses UNWIND_CODE structures to construct plausible-looking stack frames without actually executing the corresponding code. LayeredSyscall calls a real API (MessageBoxW), lets the OS build genuine stack frames, then hijacks the execution context. The frames are real because code actually executed through them.

Q4: Why can't LayeredSyscall wrap NtSetContextThread?

NtSetContextThread writes to the thread's CONTEXT, which includes debug registers Dr0-Dr7. LayeredSyscall needs those registers active (Dr0 on syscall, Dr1 on ret) throughout the execution chain. If the syscall itself modifies those registers, the hardware breakpoints would be corrupted mid-execution.