Daily Low-Level Programming: Speculative Execution and Meltdown: Why CPUs Run Code That Shouldn't Run

2026-05-15

Modern CPUs don't wait for branches to resolve or permission checks to complete before executing instructions. They speculate: guess the outcome, run ahead, and roll back architectural state if wrong. The problem is that microarchitectural state — cache contents, branch predictor entries — doesn't roll back. That's the entire premise of Meltdown and Spectre.

Consider this Meltdown-style sequence on a pre-2018 Intel chip:

movzx rax, byte [kernel_addr]   ; load one byte from a forbidden kernel address
shl rax, 12                     ; multiply by 4096 (page size)
mov rbx, [user_array + rax]     ; index into a user-readable array

The first load should fault immediately. But on affected chips the fault is only raised when the load reaches retirement; in the meantime the out-of-order engine forwards the loaded value to dependent instructions, which are dispatched speculatively. By the time the fault is raised and the pipeline flushed, instructions 2 and 3 have already executed transiently, loading a specific cache line of user_array based on the secret byte. The fault discards register state, but the cache line stays warm.

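For this to work, the attacker flushes user_array from the cache beforehand (for example with clflush). After the transient sequence runs, the attacker catches the SIGSEGV, then times an access to each of the 256 candidate slots in user_array, one per 4096-byte stride. The fast one reveals the byte. Repeat for every kernel address you want.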
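The recovery step can be sketched in portable C. This is a minimal illustration that assumes the per-slot latencies have already been measured (on x86 that would typically be done with clflush and a cycle counter such as rdtscp); the names PROBE_SLOTS, PROBE_STRIDE, probe_offset, and fastest_slot are hypothetical and do not come from any real exploit.

```c
#include <stddef.h>
#include <stdint.h>

#define PROBE_SLOTS  256   /* one slot per possible byte value */
#define PROBE_STRIDE 4096  /* assumed stride: one 4 KiB page per slot */

/* Offset into the probe array that a given secret byte would touch,
 * mirroring the "shl rax, 12" step of the transient sequence. */
size_t probe_offset(uint8_t value) {
    return (size_t)value * PROBE_STRIDE;
}

/* Given one measured latency per slot, return the index of the fastest
 * slot. In a single clean round, that index is the leaked byte. */
int fastest_slot(const uint64_t cycles[PROBE_SLOTS]) {
    int best = 0;
    for (int i = 1; i < PROBE_SLOTS; i++) {
        if (cycles[i] < cycles[best]) {
            best = i;
        }
    }
    return best;
}
```

Spacing the slots a full page apart is a common choice because it keeps the hardware prefetcher from warming neighboring slots and blurring the signal.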

The rule of thumb: an L1 cache hit on modern hardware is ~4 cycles (~1 ns); a miss to DRAM is ~200–300 cycles (~80 ns). That gap of roughly 50–80x is more than enough to distinguish "this line was speculatively loaded" from "this line wasn't", even through noise, once measurements are repeated over a few thousand trials.
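One way to turn that latency gap into a decision across noisy trials is a threshold plus a vote, sketched below. The 100-cycle cutoff is an assumption picked to sit between the ~4-cycle hit and the ~200-cycle miss; real code would calibrate it per machine. All names here are illustrative.

```c
#include <stdint.h>

#define PROBE_SLOTS 256
#define HIT_THRESHOLD_CYCLES 100  /* assumed cutoff, calibrate per machine */

/* Classify one timed access as a cache hit (1) or miss (0). */
int is_cache_hit(uint64_t cycles) {
    return cycles < HIT_THRESHOLD_CYCLES;
}

/* Fold one trial's latencies into a running per-slot hit count. */
void tally_trial(const uint64_t cycles[PROBE_SLOTS], unsigned hits[PROBE_SLOTS]) {
    for (int i = 0; i < PROBE_SLOTS; i++) {
        if (is_cache_hit(cycles[i])) {
            hits[i]++;
        }
    }
}

/* After many trials, the slot with the most hits is the best guess. */
int most_frequent_hit(const unsigned hits[PROBE_SLOTS]) {
    int best = 0;
    for (int i = 1; i < PROBE_SLOTS; i++) {
        if (hits[i] > hits[best]) {
            best = i;
        }
    }
    return best;
}
```

Counting hits per slot rather than trusting a single round means an occasional stray hit on the wrong slot gets outvoted.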

The fix was KPTI (Kernel Page Table Isolation): unmap almost all kernel pages from the userspace page tables, so a speculative load of a kernel address finds no valid translation and has no data to forward. Cost: every syscall now switches page tables on entry and exit, which without PCID means flushing the TLB each time. On syscall-heavy workloads (databases, network servers), KPTI added 5–30% overhead. PCID (Process Context IDs) tags TLB entries so they can survive the switch, which reduced but didn't eliminate this.
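On Linux, the kernel reports its Meltdown status in sysfs at /sys/devices/system/cpu/vulnerabilities/meltdown, typically as a one-line string such as "Mitigation: PTI", "Not affected", or "Vulnerable". A minimal sketch of reading it follows; the helper names read_status and mentions_pti are hypothetical, and the exact strings can vary between kernel versions.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a status file into buf, without the newline.
 * Returns 0 on success, -1 if the file is missing or empty. */
int read_status(const char *path, char *buf, size_t len) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* True when the status line says page table isolation is active. */
int mentions_pti(const char *status) {
    return strstr(status, "PTI") != NULL;
}
```

A caller would pass the sysfs path above to read_status and then check mentions_pti on the result; a return of -1 usually just means the kernel predates this interface.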

Spectre v1 is worse because it exploits branch prediction, not a deferred permission check, so there is no single hardware shortcut to close. Train a predictor to take the "in-bounds" path, then pass an out-of-bounds index; the CPU speculatively executes the bounds-check-passing path with attacker-controlled data. No privilege boundary needed: the victim's own code leaks the victim's own memory. Mitigations cost real performance: lfence or index masking after bounds checks for v1, and retpolines for the indirect-branch variant (Spectre v2), which is why spectre_v2=off tempts benchmark-chasers.
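The bounds-check pattern is usually illustrated with a gadget along these lines. This is a sketch of the general shape only; array1, array2, PROBE_STRIDE, and victim are illustrative names, and the function behaves correctly at the architectural level.

```c
#include <stddef.h>
#include <stdint.h>

#define PROBE_STRIDE 4096  /* assumed: one probe slot per 4 KiB page */

static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
static size_t array1_size = sizeof(array1);
static uint8_t array2[256 * PROBE_STRIDE];

/* Architecturally, an out-of-bounds x never reaches the two loads and
 * the function returns -1. If the branch is mispredicted as taken,
 * both loads may still run transiently with that x, leaving one line
 * of array2 cached at an offset that depends on the out-of-bounds byte. */
int victim(size_t x) {
    if (x < array1_size) {
        return array2[(size_t)array1[x] * PROBE_STRIDE];
    }
    return -1;
}
```

Calling victim repeatedly with valid indices is what trains the predictor; the single out-of-bounds call that follows is the one that leaks.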

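Practical takeaway: when you see array_index_nospec() in kernel source, or unexplained lfence in security-sensitive code, that's speculation hardening. lfence is a barrier: it forces the CPU to wait, surrendering ILP for safety. array_index_nospec() instead clamps the index with a branchless mask, so even a mispredicted bounds check can only reach index 0.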
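The masking idea behind array_index_nospec() can be sketched in plain C, modeled on the kernel's generic fallback. This is a simplified illustration, not the kernel's code: index_mask, clamp_index, and load_checked are hypothetical names, the trick assumes both values are below LONG_MAX, and an optimizing compiler may fold a plain-C mask away, which is why real implementations rely on architecture-specific code.

```c
#include <limits.h>
#include <stddef.h>

/* All-ones when index < size, zero otherwise, computed without a branch.
 * Relies on arithmetic right shift of a negative long, which is
 * implementation-defined in C but is what GCC and Clang do. */
static unsigned long index_mask(unsigned long index, unsigned long size) {
    long sign = ~(long)(index | (size - 1UL - index));
    return (unsigned long)(sign >> (sizeof(long) * CHAR_BIT - 1));
}

/* In-bounds indices pass through unchanged; anything else becomes 0. */
unsigned long clamp_index(unsigned long index, unsigned long size) {
    return index & index_mask(index, size);
}

/* Bounds-checked load that also clamps the index as a data dependency,
 * so a mispredicted check cannot steer the load out of bounds. */
int load_checked(const unsigned char *table, unsigned long size, unsigned long index) {
    if (index >= size) {
        return -1;
    }
    index = clamp_index(index, size);
    return table[index];
}
```

Because the clamp is arithmetic on the index itself rather than another branch, there is nothing for the predictor to guess wrong.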

Key Takeaway: Speculative execution rolls back registers but not caches, turning timing differences into a side channel that leaks data across privilege boundaries the architecture promised to protect.
