2026-04-21
Every pointer you use in C is a virtual address. The CPU must translate it to a physical address before anything hits RAM. This translation happens through page tables — hierarchical lookup structures maintained by the OS and walked by hardware.
On x86-64, the standard is a 4-level page table (PML4 → PDPT → PD → PT). Each level is a 4KB table of 512 entries (each entry is 8 bytes, 512 × 8 = 4096), and a 9-bit index selects one entry. A 48-bit virtual address is thus sliced into four 9-bit indices (bits 47–39, 38–30, 29–21, 20–12), one per level, plus a 12-bit offset into the 4KB page (bits 11–0).
Rule of thumb: A single page table walk costs 4 memory accesses. At ~100ns per DRAM access, an uncached translation costs ~400ns — roughly 1000 CPU cycles on a 2.5GHz processor. That's catastrophic if it happens on every load/store.
This is why the TLB (Translation Lookaside Buffer) exists. It's a small, fast cache of recent virtual-to-physical mappings. A typical L1 dTLB holds 64 entries and hits in 1 cycle. L2 TLB might hold 1536 entries with ~7 cycle latency. A TLB miss triggers a hardware page walk.
Real-world consequence: Suppose you're iterating over a 256MB array. With 4KB pages, that's 65,536 pages — far exceeding TLB capacity. You'll suffer constant TLB misses. Switch to 2MB huge pages (via mmap with MAP_HUGETLB or transparent huge pages), and the same array needs only 128 entries — comfortably fitting in the L2 TLB. Database engines like PostgreSQL expose huge page configuration for exactly this reason.
Each page table entry isn't just an address. It contains permission bits: present, read/write, user/supervisor, no-execute (NX). This is how the OS enforces memory protection. Writing to a read-only page triggers a page fault (exception vector 14 on x86), which the kernel handles — either killing the process, performing copy-on-write, or loading a page from swap.
When the OS context-switches between processes, it loads a new PML4 base address into the CR3 register. Historically, this flushed the entire TLB. Modern CPUs support PCID (Process-Context Identifiers) — a 12-bit tag that lets TLB entries from different address spaces coexist, avoiding costly flushes. This became critical after the Meltdown mitigation (KPTI) doubled the frequency of CR3 switches.
You can observe TLB behavior directly with perf stat -e dTLB-loads,dTLB-load-misses,iTLB-load-misses on your binary. If dTLB misses are high relative to total loads, huge pages or restructuring your access patterns will likely buy you more than further algorithmic micro-optimization.
