Difficulty: Intermediate

Module 7: The symbol<T> Template

Position-independent string access — how shellcode references data without absolute addresses.

Important Distinction

The symbol<T> template is for position-independent string access. It is not part of the API resolution system (which uses resolve::module, resolve::_api, and RESOLVE_IMPORT as covered in Module 6). symbol<T> solves a different problem: how does shellcode reference its own embedded string data when it doesn't know what address it was loaded at?

The String Problem

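When you write a string literal in C/C++, the compiler places it in the .rdata (read-only data) section and generates code that depends on where that data ends up in memory. On 32-bit x86, for example, the string's absolute address is encoded directly in the instruction. For a normal executable, this works fine: the loader maps the binary at its preferred base address (or applies base relocations when it cannot), and everything lines up.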

But shellcode has no loader. It gets injected at an arbitrary address, and nothing fixes up those absolute references. They now point to unrelated or unmapped memory, which usually ends in an access violation.

```cpp
// Normal code - compiler generates an absolute address reference
const char* msg = "Hello";
// On 32-bit x86 this compiles to something like: mov dword ptr [ebp-4], 0x00403000
// If shellcode is loaded at 0x200000 instead of the preferred base 0x400000... CRASH

// Shellcode needs: calculate the ACTUAL address of "Hello" at runtime
// regardless of where in memory the shellcode was loaded
```
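
To see why a baked-in address breaks while a layout-relative offset survives, the toy sketch below copies a small struct to a new location the way a loader-less injector copies a blob: raw bytes, no fix-ups. The `Blob` type and its helper functions are hypothetical illustrations rather than Stardust code, and copying a struct only approximates relocating real machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy "blob": embedded string data plus two ways of referring to it.
struct Blob {
    char text[8];             // the embedded string "Hello"
    const char* abs_ptr;      // absolute reference, valid only where it was created
    std::size_t rel_offset;   // offset of 'text' from the start of the blob
};

// Set up references that are correct for the blob's CURRENT location.
inline void init_blob(Blob& b) {
    std::memcpy(b.text, "Hello", 6);
    b.abs_ptr = b.text;                      // absolute: a baked-in address
    b.rel_offset = offsetof(Blob, text);     // relative: depends only on layout
}

// Resolve the string through the relative offset, using the blob's own runtime base.
inline const char* via_offset(const Blob& b) {
    return reinterpret_cast<const char*>(&b) + b.rel_offset;
}

// Usage check: "inject" the blob elsewhere and compare both strategies.
inline void check_move() {
    Blob original{};
    init_blob(original);
    Blob moved{};
    std::memcpy(&moved, &original, sizeof(Blob));  // raw byte copy, no fix-ups
    assert(moved.abs_ptr == original.text);        // absolute ref still targets the OLD copy
    assert(moved.abs_ptr != moved.text);           // ...not the string in the new copy
    assert(via_offset(moved) == moved.text);       // relative ref follows the NEW copy
}
```

In real shellcode the old copy would not exist at all, so the stale absolute pointer would fault instead of quietly reading old data.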

The symbol<T> Implementation

Stardust's solution lives in common.h. The symbol<T> struct uses a function called RipData() whose own address serves as a known reference point. Because RipData() is code inside the shellcode, its runtime address can be obtained with RIP-relative addressing, which gives the correct result at any load address. The compile-time distance between RipData() and the string data is baked into the binary and never changes, regardless of where the shellcode is loaded.

```cpp
// common.h - symbol<T> (simplified)
#include <cstdint>

template<typename T>
struct symbol {
    // 's' holds the compile-time distance from RipData to the string
    std::uintptr_t s;

    // RipData() returns its own runtime address
    // The address of RipData itself is the anchor point
    // (on x64, &RipData compiles to a RIP-relative lea)
    static auto RipData() -> std::uintptr_t {
        return reinterpret_cast<std::uintptr_t>(&RipData);
    }

    // To get the runtime string address:
    // runtime_string_addr = RipData_runtime_addr - compile_time_distance
    // Which is: RipData() - s
    auto get() const -> T {
        return reinterpret_cast<T>( RipData() - s );
    }
};
```

Step-by-Step: The Math

Here's what happens when Stardust accesses a string via symbol<T>:

  1. At compile/link time: The distance s = &RipData - &string_data is fixed by the linker. It depends only on the relative layout of code and data in the binary, not on any absolute address.
  2. At runtime: The shellcode is loaded at some unknown base address. RipData() uses RIP-relative addressing to return its own actual address.
  3. The calculation: RipData() - s gives the actual runtime address of the string.
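
The three steps can be simulated in ordinary C++, with a heap buffer standing in for the shellcode blob. This is a sketch under stated assumptions: the offsets 0x200 and 0x500 are made up, and `simulated_rip_data()` is a hypothetical stand-in that derives the anchor from the buffer's address, where real shellcode would get it from `RipData()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical link-time layout of a tiny blob (offsets from the blob start).
constexpr std::size_t kStringOffset = 0x200;  // where "Hello" lives
constexpr std::size_t kAnchorOffset = 0x500;  // where RipData() would live
constexpr std::size_t kBlobSize     = 0x600;

// Step 1 (compile/link time): the fixed distance s = anchor - string.
constexpr std::uintptr_t kS = kAnchorOffset - kStringOffset;  // 0x300

// Step 2 stand-in: real shellcode asks RipData() for its runtime address;
// here we derive it from wherever the buffer happens to be allocated.
inline std::uintptr_t simulated_rip_data(const std::vector<char>& blob) {
    return reinterpret_cast<std::uintptr_t>(blob.data()) + kAnchorOffset;
}

// Step 3: runtime string address = anchor - s.
inline const char* resolve_string(const std::vector<char>& blob) {
    return reinterpret_cast<const char*>(simulated_rip_data(blob) - kS);
}

inline std::vector<char> make_blob() {
    std::vector<char> blob(kBlobSize, 0);
    std::memcpy(blob.data() + kStringOffset, "Hello", 6);
    return blob;
}

// Usage check: the same formula works for two copies at different addresses.
inline void check_steps() {
    std::vector<char> first = make_blob();
    std::vector<char> second = first;  // the copy lands at a different address
    assert(std::strcmp(resolve_string(first), "Hello") == 0);
    assert(std::strcmp(resolve_string(second), "Hello") == 0);
    assert(resolve_string(second) == second.data() + kStringOffset);
}
```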

Compile-Time vs Runtime Address Translation

At Compile Time (base 0x1000)

string "Hello" at 0x1200
... other data ...
RipData() at 0x1500

s = 0x1500 - 0x1200 = 0x300

At Runtime (loaded at 0x7000)

string "Hello" at 0x7200
... other data ...
RipData() at 0x7500

RipData() - s = 0x7500 - 0x300 = 0x7200 ✔

The distance (0x300) stays constant — only the base address changes.
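
The worked example can be checked at compile time. The constants below are the illustrative addresses from the diagram, not real ones, and `runtime_anchor()` and `runtime_string()` are hypothetical helpers that model the whole image sliding to a new base.

```cpp
#include <cstdint>

// Illustrative link-time addresses from the example above.
constexpr std::uint64_t kLinkBase   = 0x1000;
constexpr std::uint64_t kLinkString = 0x1200;
constexpr std::uint64_t kLinkAnchor = 0x1500;  // RipData() at link time

// The fixed distance s.
constexpr std::uint64_t kDist = kLinkAnchor - kLinkString;
static_assert(kDist == 0x300, "distance is fixed by the layout");

// At runtime the whole image slides by (new_base - link_base).
constexpr std::uint64_t runtime_anchor(std::uint64_t new_base) {
    return kLinkAnchor - kLinkBase + new_base;
}

// RipData() - s
constexpr std::uint64_t runtime_string(std::uint64_t new_base) {
    return runtime_anchor(new_base) - kDist;
}

static_assert(runtime_anchor(0x7000) == 0x7500, "RipData() at 0x7500");
static_assert(runtime_string(0x7000) == 0x7200, "string at 0x7200");
// Any other base works too: the string always stays 0x200 past the base.
static_assert(runtime_string(0x200000) == 0x200200, "string at 0x200200");
```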

The G_SYM Macro

For convenience, Stardust provides a G_SYM macro that wraps common uses of symbol<T>. Instead of manually constructing a symbol<T> and calling .get(), you can use G_SYM as a shorthand to access global symbol data in a position-independent way.

```cpp
// Instead of manual symbol<T> usage (offset_value is the precomputed distance s):
auto str = symbol<const char*>{ offset_value }.get();

// G_SYM provides a cleaner interface:
auto str2 = G_SYM( my_string );
```
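
A macro like G_SYM mainly needs to compute the distance and call `get()` in a single expression. The sketch below assumes that shape. `Anchor()`, `SymbolSketch<T>`, and `G_SYM_SKETCH` are hypothetical names, and Stardust's actual definitions may differ. In an ordinary, unrelocated process the result simply equals the plain address, so the check only confirms the round trip. The payoff comes when the blob is copied elsewhere and the anchor function reports its new runtime address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical anchor, simplified like the RipData() above:
// on x64, &Anchor compiles to a RIP-relative lea.
inline std::uintptr_t Anchor() {
    return reinterpret_cast<std::uintptr_t>(&Anchor);
}

// Hypothetical equivalent of symbol<T>: stores the distance and resolves it.
template <typename T>
struct SymbolSketch {
    std::uintptr_t s;  // distance: &Anchor - &target
    T get() const { return reinterpret_cast<T>(Anchor() - s); }
};

// Hypothetical G_SYM-style macro: computes the distance and resolves it in one step.
#define G_SYM_SKETCH(x)                                             \
    (SymbolSketch<decltype(&(x)[0])>{                               \
        reinterpret_cast<std::uintptr_t>(&Anchor) -                 \
        reinterpret_cast<std::uintptr_t>(x) }.get())

static const char kGreeting[] = "Hello";

// Usage check: unrelocated, the result equals the plain address.
inline void check_g_sym() {
    const char* str = G_SYM_SKETCH(kGreeting);
    assert(str == kGreeting);
}
```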

Comparison: AceLdr's OFFSET Macro

AceLdr solves the same problem with its OFFSET macro. Despite the different syntax (C macro vs C++ template), the underlying principle is identical:

Stardust: symbol<T>
  • C++ template struct
  • Uses RipData() function address as anchor
  • Formula: &RipData - s
  • Type-safe via template parameter
AceLdr: OFFSET Macro
  • C preprocessor macro
  • Uses a known code label as anchor
  • Formula: known_addr - compile_time_distance
  • Cast manually by the caller

Both approaches rely on the same fundamental insight: the relative distance between two points in the binary is fixed at compile time. If you can determine the runtime address of one point (via RIP-relative addressing), you can calculate the runtime address of any other point by subtracting the known distance.
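
Because both designs reduce to the same subtraction, the general form can be written once. This is a sketch with hypothetical helper names (`link_distance`, `resolve`). It also shows a detail the formulas leave implicit: with unsigned arithmetic the subtraction wraps modulo 2^64, so the data may sit either before or after the anchor.

```cpp
#include <cstdint>

// distance = anchor_link - target_link       (fixed when the blob is built)
constexpr std::uint64_t link_distance(std::uint64_t anchor_link, std::uint64_t target_link) {
    return anchor_link - target_link;
}

// target_runtime = anchor_runtime - distance (computed after injection)
constexpr std::uint64_t resolve(std::uint64_t anchor_runtime, std::uint64_t distance) {
    return anchor_runtime - distance;
}

// Target before the anchor (as in the worked example).
static_assert(resolve(0x7500, link_distance(0x1500, 0x1200)) == 0x7200, "before anchor");
// Target after the anchor: the distance wraps around, and the result is still correct.
static_assert(resolve(0x7500, link_distance(0x1500, 0x1800)) == 0x7800, "after anchor");
```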

Key Takeaway

symbol<T> exists because shellcode cannot use absolute addresses for data access. It translates compile-time-known relative offsets into runtime-correct pointers using a simple anchor-point calculation. This is critical for accessing any embedded strings or data structures within the shellcode blob.

Knowledge Check

Q1: Why can't shellcode use string literals like normal programs?

A1: The compiler places string literals in .rdata and references them with absolute addresses based on the binary's preferred base address. When shellcode is injected at a different address, those absolute references point to invalid memory. symbol<T> solves this by computing string addresses relative to a known anchor point at runtime.

Q2: In the symbol<T> calculation, what value stays constant regardless of where the shellcode is loaded?

A2: The distance 's'. It is calculated at compile/link time from the relative positions of RipData() and the string within the binary. This relative layout is fixed in the binary and doesn't change when the shellcode is loaded at a different address, which is why the formula RipData() - s always yields the correct runtime address.