Daily Hardware Architecture: Speculative Execution and Its Security Implications


2026-04-25

You already know CPUs speculate — they predict branches and execute ahead. But in January 2018, the world learned that speculation leaves observable side effects even when the CPU rewinds incorrect work. This is the foundation of Spectre and Meltdown, and it fundamentally changed how we think about hardware security.

The core problem: when the CPU discovers that it speculated incorrectly, it squashes the transient instructions and rolls back architectural state (registers, memory writes) perfectly. It does not roll back microarchitectural state, most notably cache contents. An attacker can time cache accesses to recover data that was touched transiently during misspeculation.

Meltdown (CVE-2017-5754) exploited the fact that on affected CPUs, most notably Intel's, a userspace load from kernel memory could forward its data to dependent instructions before the permission check raised a fault at retirement. A transient instruction could use the illegally read byte as an index into a probe array, encoding the secret into cache state. A subsequent Flush+Reload timing attack recovers the byte. One byte at a time, you dump kernel memory from userspace, at rates of up to roughly 500 KB/s on affected hardware.
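The encode-then-probe step is easier to follow in a toy model. The sketch below is a pure simulation, not an exploit: `ToyCache`, `transient_access`, `recover_byte`, and the 4 ns / 40 ns latencies are all illustrative assumptions, and no real timing or kernel memory is involved. It only shows why a single cached line among 256 candidates is enough to name a byte.

```python
# Toy model of the flush+reload channel. Simulation only: the latencies and
# all names here are illustrative assumptions, not measurements or a real API.
HIT_NS = 4      # assumed latency when the line is already cached
MISS_NS = 40    # assumed latency when the line must be fetched
THRESHOLD_NS = (HIT_NS + MISS_NS) // 2


class ToyCache:
    """Tracks which probe-array lines are cached, and nothing else."""

    def __init__(self):
        self.cached_lines = set()

    def flush(self):
        """Flush phase: evict every probe line."""
        self.cached_lines.clear()

    def access(self, line):
        """Return a simulated latency in ns, then mark the line as cached."""
        latency = HIT_NS if line in self.cached_lines else MISS_NS
        self.cached_lines.add(line)
        return latency


def transient_access(cache, secret_byte):
    """Stand-in for the squashed load: only its cache footprint survives."""
    cache.access(secret_byte)  # the secret selects which line gets cached


def recover_byte(cache):
    """Reload phase: time all 256 candidate lines and return the fast one."""
    timings = [cache.access(line) for line in range(256)]
    fast = [line for line, ns in enumerate(timings) if ns < THRESHOLD_NS]
    return fast[0] if len(fast) == 1 else None


def leak(secret):
    """Recover a byte string one byte at a time through the toy channel."""
    cache = ToyCache()
    recovered = bytearray()
    for byte in secret:
        cache.flush()
        transient_access(cache, byte)
        recovered.append(recover_byte(cache))
    return bytes(recovered)
```

Calling `leak(b"kernel")` returns the same bytes it was given, even though `recover_byte` never sees the secret directly; it only sees which of the 256 lines came back fast.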

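Spectre (CVE-2017-5753 and CVE-2017-5715) is subtler and harder to fix. Variant 1 (bounds check bypass, CVE-2017-5753) mistrains the branch predictor so the CPU speculatively executes past an array bounds check, reading out-of-bounds data and encoding it via a cache side channel. Variant 2 (branch target injection, CVE-2017-5715) poisons the prediction of indirect branch targets instead. Unlike Meltdown, Spectre works within a single privilege level: JavaScript compiled by your browser's JIT could read any memory in its own process, including data belonging to other sites rendered there.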
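Variant 1 can be sketched with a toy CPU as well. Everything below is a hypothetical model: `ToyPredictor` is a single 2-bit saturating counter (real predictors are far more elaborate), `ToyCpu.victim` stands in for code of the shape `if (index < array_len) probe[array[index]]`, and `touched_lines` plays the role of cache state that survives a squash.

```python
# Toy model of Spectre variant 1. Simulation under stated assumptions, not an
# exploit: the predictor, the CPU, and the memory layout are all hypothetical.
class ToyPredictor:
    """2-bit saturating counter; values 2 and 3 predict 'in bounds'."""

    def __init__(self):
        self.counter = 0

    def predict_in_bounds(self):
        return self.counter >= 2

    def update(self, was_in_bounds):
        if was_in_bounds:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


class ToyCpu:
    def __init__(self, memory, array_base, array_len):
        self.memory = memory              # flat, byte-addressed memory
        self.array_base = array_base      # start of the bounds-checked array
        self.array_len = array_len
        self.predictor = ToyPredictor()
        self.touched_lines = set()        # microarchitectural: survives a squash

    def victim(self, index):
        """Run the bounds-checked access; return the architectural result."""
        in_bounds = 0 <= index < self.array_len
        predicted = self.predictor.predict_in_bounds()
        self.predictor.update(in_bounds)
        value = None
        if predicted or in_bounds:
            # The body runs either for real or transiently on a misprediction.
            value = self.memory[self.array_base + index]
            self.touched_lines.add(value)  # cache footprint keyed by the value
        # A mispredicted out-of-bounds run is squashed: no visible result.
        return value if in_bounds else None


def run_attack(train_rounds=4):
    """Mistrain with in-bounds calls, then make one out-of-bounds call."""
    memory = bytes([1, 2, 3, 4]) + b"S"   # 4-byte array, then a secret byte
    cpu = ToyCpu(memory, array_base=0, array_len=4)
    for i in range(train_rounds):
        cpu.victim(i % 4)                 # teach the predictor 'in bounds'
    cpu.touched_lines.clear()             # attacker flushes the probe lines
    result = cpu.victim(4)                # index 4 is out of bounds
    return result, cpu.touched_lines
```

With the default training, `run_attack()` returns `None` as the architectural result while `touched_lines` contains 83, the value of the secret byte `S`. With `train_rounds=0` the predictor never guesses "in bounds" and nothing leaks.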

The mitigations reveal the performance cost of security:

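KPTI (Kernel Page Table Isolation): Mitigates Meltdown by unmapping almost all kernel pages from the userspace page tables. Every kernel entry and exit (syscalls, interrupts, exceptions) now switches page tables. Cost: 5–30% on syscall-heavy workloads (databases, I/O-bound apps), less on CPUs with PCID, which avoids a full TLB flush on each switch.
Retpolines: Mitigate Spectre variant 2 by replacing indirect branches with a return-based trampoline whose speculation is trapped in a harmless loop, so a mistrained predictor has nothing useful to steer. Cost: 5–10% on workloads with frequent indirect calls (C++ vtables, interpreters).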
Microcode updates: Intel added the IBRS, STIBP, and IBPB controls, exposed through model-specific registers rather than new instructions, to restrict or flush branch predictor state across privilege boundaries.
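On Linux, the kernel reports which of these mitigations are active through files under `/sys/devices/system/cpu/vulnerabilities/`. The helper below is a minimal sketch for reading them; the function name `read_mitigation_status` is made up for this example, and on systems without that directory it simply returns an empty dictionary.

```python
from pathlib import Path

# Linux exposes one status file per known vulnerability in this directory.
# Other operating systems will not have it.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_line}, or {} if the directory is absent."""
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            # Unreadable entries are recorded rather than aborting the scan.
            status[entry.name] = "<unreadable>"
    return status


if __name__ == "__main__":
    for name, line in read_mitigation_status().items():
        print(f"{name}: {line}")
```

Entries such as `meltdown`, `spectre_v1`, and `spectre_v2` typically read "Not affected", "Vulnerable", or a "Mitigation: ..." line; the exact wording depends on the kernel version and the CPU.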

Rule of thumb: a single cache-timing probe yields one bit of information, namely whether the line was cached or not. A hit costs a few nanoseconds at most and a miss costs tens of nanoseconds, so the two are easy to tell apart. Budgeting ~40 ns per probe and 256 probes per byte (one per possible value), an optimized Spectre gadget leaks roughly one byte per 10 microseconds (256 × 40 ns ≈ 10 µs, or about 100 KB/s): slow compared to DMA, devastating compared to "impossible."
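The arithmetic behind that rule of thumb is short enough to check directly. The sketch below assumes every probe costs the full 40 ns miss latency and ignores training, flushing, and retries for noise, so treat the result as an optimistic upper bound rather than a measurement.

```python
PROBES_PER_BYTE = 256   # one probe per possible byte value (from the text)
NS_PER_PROBE = 40       # assumption: each probe costs the ~40 ns miss latency


def leak_rate(probes_per_byte=PROBES_PER_BYTE, ns_per_probe=NS_PER_PROBE):
    """Return (nanoseconds per leaked byte, leaked bytes per second)."""
    ns_per_byte = probes_per_byte * ns_per_probe
    bytes_per_second = 1e9 / ns_per_byte
    return ns_per_byte, bytes_per_second


if __name__ == "__main__":
    ns_per_byte, rate = leak_rate()
    print(f"{ns_per_byte / 1000:.2f} us per byte, {rate / 1000:.1f} KB/s")
```

With the defaults this gives 10,240 ns per byte, or just under 100 KB/s. Halving the per-probe cost doubles the rate, which is why real attacks put so much effort into the probe loop.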

Modern CPUs (Intel Golden Cove, ARM Cortex-X3 and later) now include hardware mitigations: branch predictor state that is isolated by privilege level, a speculative store bypass disable control, and permission checks that stop Meltdown-class loads from forwarding data to dependent instructions. The performance tax is baked into silicon now: your CPU is permanently slower, perhaps by ~5%, than in an alternate timeline where nobody published these attacks.

See it in action: the Duo Tech Talk "Foreshadow: Breaking the Virtual Memory Abstraction with Speculative Execution" by Duo Security shows this theory applied to a related attack.

Key Takeaway: Speculative execution creates a gap between what the CPU should have done and what it did do transiently — and cache timing turns that gap into a data exfiltration channel, forcing permanent tradeoffs between performance and isolation.
