2026-04-21
Every function call creates a stack frame — a block of memory on the stack that holds local variables, saved registers, and the return address. Understanding how these frames are built and torn down is essential for debugging, writing assembly, and understanding exploits.
On x86-64 Linux (System V AMD64 ABI), the calling convention works like this:
The return address is pushed onto the stack by the CALL instruction.
A typical function prologue and epilogue in x86-64 assembly look like this:
Prologue:
push rbp: save the caller's base pointer (8 bytes)
mov rbp, rsp: establish the new frame base
sub rsp, N: allocate N bytes for locals
Epilogue:
mov rsp, rbp: deallocate locals
pop rbp: restore the caller's frame base
ret: pop the return address into RIP
Real-world example: when GDB shows you a backtrace with bt and frame pointers are in use, it can walk the chain of saved RBP values. Each saved RBP points to the previous frame's base, and the return address sits at [RBP+8]. If a buffer overflow corrupts the saved RBP or the return address, the backtrace breaks. Classic stack-smashing attacks exploit the same layout: the attacker overwrites the return address so that ret redirects execution.
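The prologue and epilogue steps above can be sketched as a toy simulation in C. This is not how a real CPU or debugger is implemented; the Machine struct, the 64-slot stack, and every function name here are assumptions made up for illustration. It only models the bookkeeping: CALL pushes a return address, the prologue saves RBP and reserves locals, and the epilogue undoes both.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of an x86-64 stack: 8-byte slots, growing toward lower
   offsets. All names are hypothetical and exist only for this sketch. */
enum { STACK_SLOTS = 64 };

typedef struct {
    uint64_t mem[STACK_SLOTS]; /* simulated stack memory */
    uint64_t rsp;              /* byte offset of the top of the stack */
    uint64_t rbp;              /* byte offset of the current frame base */
    uint64_t rip;              /* simulated instruction pointer */
} Machine;

void machine_init(Machine *m) {
    for (size_t i = 0; i < STACK_SLOTS; i++) m->mem[i] = 0;
    m->rsp = STACK_SLOTS * 8;  /* empty stack: RSP just past the end */
    m->rbp = 0;
    m->rip = 0;
}

void push64(Machine *m, uint64_t value) {
    m->rsp -= 8;
    m->mem[m->rsp / 8] = value;
}

uint64_t pop64(Machine *m) {
    uint64_t value = m->mem[m->rsp / 8];
    m->rsp += 8;
    return value;
}

/* CALL: push the return address, then jump to the target. */
void do_call(Machine *m, uint64_t target, uint64_t return_addr) {
    push64(m, return_addr);
    m->rip = target;
}

/* push rbp; mov rbp, rsp; sub rsp, N */
void prologue(Machine *m, uint64_t locals_bytes) {
    push64(m, m->rbp);
    m->rbp = m->rsp;
    m->rsp -= locals_bytes;
}

/* mov rsp, rbp; pop rbp; ret */
void epilogue(Machine *m) {
    m->rsp = m->rbp;
    m->rbp = pop64(m);
    m->rip = pop64(m);
}

/* Read the return address stored at [RBP+8] of the current frame. */
uint64_t return_address_of_frame(const Machine *m) {
    return m->mem[(m->rbp + 8) / 8];
}
```

Calling do_call and then prologue with 32 bytes of locals lowers the simulated RSP by 8 + 8 + 32 bytes. A matching epilogue restores RSP and RBP to their values before the call and loads the saved return address into the simulated RIP.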
Windows x64 uses a different convention: the first four integer arguments go in RCX, RDX, R8, R9, and the caller must always reserve 32 bytes of "shadow space" on the stack, even if fewer than four arguments are passed. This difference often trips people up when writing cross-platform assembly.
ARM64 (AArch64) passes the first eight arguments in X0–X7, uses X30 (LR) for the return address, and X29 (FP) as the frame pointer. The key difference: CALL on x86 pushes the return address onto the stack, while ARM's BL instruction stores it in a register. The callee must save LR if it makes further calls.
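To make the differences between the three conventions concrete, here is a small lookup sketch in C. The Windows x64 and AArch64 register lists come from the text above. The System V list (RDI, RSI, RDX, RCX, R8, R9) is taken from the System V AMD64 ABI rather than from this article. The enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum { ABI_SYSV_AMD64, ABI_WIN64, ABI_AARCH64 } Abi;

/* Register carrying the index-th (0-based) integer argument,
   or NULL when that argument is passed on the stack instead. */
const char *int_arg_register(Abi abi, size_t index) {
    static const char *const sysv[]  = {"RDI", "RSI", "RDX", "RCX", "R8", "R9"};
    static const char *const win64[] = {"RCX", "RDX", "R8", "R9"};
    static const char *const a64[]   = {"X0", "X1", "X2", "X3",
                                        "X4", "X5", "X6", "X7"};
    switch (abi) {
    case ABI_SYSV_AMD64: return index < 6 ? sysv[index]  : NULL;
    case ABI_WIN64:      return index < 4 ? win64[index] : NULL;
    case ABI_AARCH64:    return index < 8 ? a64[index]   : NULL;
    }
    return NULL;
}

/* Bytes the caller must reserve before a call: the 32-byte
   shadow space on Windows x64, nothing in the other two ABIs. */
size_t shadow_space_bytes(Abi abi) {
    return abi == ABI_WIN64 ? 32 : 0;
}

/* 1 if both ABIs pass the index-th integer argument in the same register. */
int same_register(Abi a, Abi b, size_t index) {
    const char *ra = int_arg_register(a, index);
    const char *rb = int_arg_register(b, index);
    return ra != NULL && rb != NULL && strcmp(ra, rb) == 0;
}
```

With these tables, same_register(ABI_SYSV_AMD64, ABI_WIN64, i) is 0 for every i from 0 to 3, which is why hand-written x86-64 assembly cannot be shared between Linux and Windows without an adaptation layer.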
Rule of thumb for stack frame size: Each frame costs at minimum 16 bytes (saved RBP + return address on x86-64). A recursive function called 1,000 times deep with 48 bytes of locals per frame consumes at least (48 + 16) × 1,000 = 64,000 bytes, or about 64 KB of stack. The default Linux stack limit is typically 8 MiB (8,388,608 bytes), giving you roughly 131,000 frames at that size before a stack overflow. Always do this back-of-envelope math for recursive designs.
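The back-of-envelope math can be written down as a few helper functions. This is a rough sketch: the function names are made up, and the 16-byte overhead is the minimum from the rule of thumb, so real frames with alignment padding or spilled registers will be larger.

```c
#include <stdint.h>

/* Minimum per-frame overhead on x86-64: saved RBP + return address. */
enum { FRAME_OVERHEAD_BYTES = 16 };

/* Lower bound on the size of one frame, in bytes. */
uint64_t frame_bytes(uint64_t locals_bytes) {
    return locals_bytes + FRAME_OVERHEAD_BYTES;
}

/* Lower bound on stack consumed by `depth` nested calls. */
uint64_t stack_used(uint64_t depth, uint64_t locals_bytes) {
    return depth * frame_bytes(locals_bytes);
}

/* Upper bound on recursion depth that fits in `stack_bytes`. */
uint64_t max_depth(uint64_t stack_bytes, uint64_t locals_bytes) {
    return stack_bytes / frame_bytes(locals_bytes);
}
```

For 48 bytes of locals, stack_used(1000, 48) is 64,000 bytes. The depth limit depends on how "8 MB" is read: max_depth(8 * 1024 * 1024, 48) is 131,072 for a binary 8 MiB stack, while max_depth(8000000, 48) is 125,000 for a decimal 8 MB.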
Compile with gcc -S -O0 to see unoptimized stack frames in the generated assembly. At -O1 and above, the compiler often omits the frame pointer (-fomit-frame-pointer), using RSP-relative addressing instead — faster, but harder to debug.
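A small function like the one below is a convenient target for this experiment. The file name frame_demo.c and the function are hypothetical, and the exact assembly will vary with compiler version and target, so treat the comments as what you would typically expect rather than guaranteed output.

```c
/* frame_demo.c (hypothetical file name)
 *
 * Generate assembly with:
 *   gcc -S -O0 frame_demo.c    (writes frame_demo.s)
 *
 * At -O0 on x86-64, GCC typically emits the push rbp / mov rbp, rsp
 * prologue and keeps `total` and `i` in stack slots addressed
 * relative to RBP. At higher optimization levels the locals usually
 * live in registers and the frame pointer setup may disappear.
 */
int sum_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i * i;
    }
    return total;
}
```

To keep the frame pointer in an optimized build, for example to get frame-pointer-based backtraces, GCC accepts -fno-omit-frame-pointer alongside the optimization flag.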
