2026-05-03
You already know how page tables and the TLB translate virtual addresses to physical ones. The standard page size on x86-64 is 4 KiB. This works fine for small programs, but becomes a bottleneck when your application maps gigabytes of memory — databases, JVMs, scientific computing, or any process with a large working set.
The problem is TLB capacity. A typical L1 dTLB holds 64 entries. At 4 KiB per page, that covers just 256 KiB of memory. If your working set is 8 GiB, you need 2 million page table entries, and the TLB can only cache a tiny fraction of them. Every TLB miss triggers a multi-level page table walk — potentially 4 memory accesses on x86-64 — which can cost 20–100 cycles depending on whether the page walk hits cache.
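Huge pages fix this by using larger page sizes. x86-64 supports 2 MiB pages (the Page Size bit in a Page Directory Entry ends the walk one level early, skipping the final page table level) and 1 GiB pages (the same bit one level higher skips two levels). With 2 MiB pages, those same 64 TLB entries now cover 128 MiB, a 512x improvement in TLB reach. Some CPUs provide fewer L1 dTLB entries for 2 MiB pages than for 4 KiB pages, so the gain on real hardware can be smaller.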
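To make the "bypassing a level" idea concrete, here is a minimal sketch of how a 48-bit virtual address splits into table indices under 4-level paging. The struct and function names (`va_parts`, `split_va`) are illustrative only, not kernel or libc APIs. With a larger page, fewer index fields are consumed and more low bits become the in-page offset.

```c
#include <stdint.h>

/* Fields of a 48-bit x86-64 virtual address under 4-level paging.
 * Names are hypothetical, chosen for this illustration. */
struct va_parts {
    unsigned pml4;   /* bits 47..39: index into the top-level table */
    unsigned pdpt;   /* bits 38..30: index into the PDPT */
    unsigned pd;     /* bits 29..21: index into the page directory */
    unsigned pt;     /* bits 20..12: index into the page table */
    uint64_t offset; /* remaining low bits: byte offset inside the page */
};

/* page_shift is 12 for 4 KiB, 21 for 2 MiB, 30 for 1 GiB pages.
 * Levels that the larger page size skips are reported as 0. */
struct va_parts split_va(uint64_t va, unsigned page_shift)
{
    struct va_parts p;
    p.pml4 = (unsigned)((va >> 39) & 0x1ff);
    p.pdpt = (unsigned)((va >> 30) & 0x1ff);
    p.pd   = page_shift <= 21 ? (unsigned)((va >> 21) & 0x1ff) : 0;
    p.pt   = page_shift <= 12 ? (unsigned)((va >> 12) & 0x1ff) : 0;
    p.offset = va & ((UINT64_C(1) << page_shift) - 1);
    return p;
}
```

Calling `split_va(va, 21)` for a 2 MiB page leaves `pt` unused and folds those 9 bits into a 21-bit offset, which is why the walk needs one fewer memory access.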
Rule of thumb: If your working set exceeds TLB entries × page size (typically 64 × 4 KiB = 256 KiB for L1 dTLB), you are taking TLB misses in your hot path. Switching to 2 MiB pages raises that threshold to 64 × 2 MiB = 128 MiB.
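The arithmetic behind this rule of thumb can be sketched in a few lines. The helper names below are made up for illustration, and the 64-entry figure is the article's example value rather than something read from the CPU.

```c
#include <stdint.h>

/* Bytes of address space the TLB can map without a miss. */
uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}

/* Number of pages (and leaf page table entries) needed to map a
 * working set, rounding up to whole pages. */
uint64_t pages_needed(uint64_t working_set, uint64_t page_size)
{
    return (working_set + page_size - 1) / page_size;
}

/* Returns 1 when the working set is larger than the TLB reach, meaning
 * TLB misses should be expected in the hot path. */
int working_set_exceeds_reach(uint64_t working_set, uint64_t entries,
                              uint64_t page_size)
{
    return working_set > tlb_reach_bytes(entries, page_size);
}
```

For an 8 GiB working set, `pages_needed` gives 2,097,152 pages at 4 KiB and 4,096 pages at 2 MiB, which matches the figures above.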
Linux gives you three ways to use huge pages:
Explicit huge pages: mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0). Requires pre-reserved pages via /proc/sys/vm/nr_hugepages.
Transparent Huge Pages (THP): controlled through /sys/kernel/mm/transparent_hugepage/enabled. Convenient but can cause latency spikes during compaction.
Real-world example: Redis, when handling a 50 GiB dataset, can spend 5–10% of CPU time on TLB misses. Enabling huge pages for its heap often yields a measurable throughput improvement. However, Redis specifically warns against THP: once the heap is backed by 2 MiB pages (whether allocated at fault time or collapsed later by khugepaged), copy-on-write is amplified during BGSAVE, because each write in the parent after fork() copies a 2 MiB page instead of a 4 KiB one. This is why explicit huge pages, where the application chooses exactly which mappings use them, are often preferred over transparent ones for latency-sensitive workloads.
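The explicit mmap route can be sketched as follows. This is a minimal example assuming Linux with 2 MiB huge pages; the helper names (`round_up_2m`, `alloc_prefer_huge`) and the fallback to ordinary pages are choices made for this illustration, not something the original call requires.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS and MAP_HUGETLB in glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#define HUGE_2M ((size_t)2 << 20)

/* Round len up to a whole number of 2 MiB pages, since a MAP_HUGETLB
 * mapping is made of whole huge pages. */
size_t round_up_2m(size_t len)
{
    return (len + HUGE_2M - 1) & ~(HUGE_2M - 1);
}

/* Try explicit huge pages first. If none are reserved (or the kernel
 * refuses), fall back to ordinary pages so the caller still gets memory.
 * *used_huge is set to 1 or 0 to report which path succeeded.
 * Returns NULL if both attempts fail. */
void *alloc_prefer_huge(size_t len, int *used_huge)
{
    size_t n = round_up_2m(len);
    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    *used_huge = (p != MAP_FAILED);
    if (p == MAP_FAILED)
        p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Release the region with `munmap(p, round_up_2m(len))`, passing the same rounded length. Whether the huge page attempt succeeds depends on how many pages were reserved through /proc/sys/vm/nr_hugepages, so logging `used_huge` at startup is a cheap way to confirm the configuration took effect.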
The tradeoff: Huge pages reduce TLB misses but increase internal fragmentation (an allocation can waste up to 2 MiB minus one byte) and require physically contiguous memory, which becomes scarce on long-running systems. They also interact poorly with copy-on-write and memory overcommit.
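The internal fragmentation cost is easy to quantify. The sketch below assumes each allocation is backed by its own run of 2 MiB pages. The function names are hypothetical.

```c
#include <stdint.h>

#define HUGE_PAGE_BYTES (UINT64_C(2) << 20)

/* Memory actually consumed when a request is backed by whole 2 MiB pages. */
uint64_t backed_bytes(uint64_t request)
{
    uint64_t pages = (request + HUGE_PAGE_BYTES - 1) / HUGE_PAGE_BYTES;
    return pages * HUGE_PAGE_BYTES;
}

/* Bytes lost to internal fragmentation for that request. */
uint64_t wasted_bytes(uint64_t request)
{
    return backed_bytes(request) - request;
}
```

The worst cases are a 1-byte request and a request one byte over a 2 MiB boundary: both waste 2,097,151 bytes. A 3 MiB request consumes 4 MiB and wastes 1 MiB.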
