2026-06-08
Every load and store your CPU executes uses a virtual address. The DRAM only understands physical addresses. Somewhere between the load instruction and the cache, hardware has to translate one to the other — and it has to do it in roughly one clock cycle, or your pipeline stalls on every memory access.
The translation lives in the page table, a tree structure in memory. On x86-64 with 4-level paging, walking that tree takes four memory accesses. If every load triggered a page walk, you'd spend 5× more memory bandwidth on translation than on actual data. The TLB is the cache that prevents this.
A TLB is a small, fully- or set-associative CAM-like structure indexed by virtual page number, storing the corresponding physical page number plus permission bits (read/write/execute, user/supervisor, dirty, accessed). On a hit, translation completes in 1 cycle. On a miss, a hardware page walker (a dedicated state machine) traverses the page table and refills the TLB — typically 20–100+ cycles, longer if the walk itself misses in the data cache.
Real example — the cost of a TLB miss: An Intel Skylake L1 dTLB has 64 entries. Each entry maps a 4KB page, so the L1 dTLB covers exactly 64 × 4KB = 256 KB of virtual address space. Walk linearly through a 1 GB array and you'll thrash the dTLB on every page boundary. With ~25-cycle walks happening every 4 KB, you've added ~6 cycles of overhead per byte — and that's before you account for cache misses.
Rule of thumb: If your working set fits in (L1 dTLB entries × page size), translations are free. Above that, switch to huge pages (2 MB on x86, configured via madvise(MADV_HUGEPAGE) or hugetlbfs) — a single 2 MB entry replaces 512 4 KB entries, expanding reach 512×.
This is why database engineers obsess over huge pages: a hash table with random access patterns can spend 30–50% of its cycles in page walks if it overflows the STLB. Context switches make it worse — most TLB entries aren't tagged with a process ID (or use a small ASID space), so a switch flushes most of the structure and the next process pays a wave of walks.
