Daily Low-Level Programming: The Indirect Branch Predictor and Spectre v2: Why Function Pointers Are a Side Channel

2026-05-21

You already know the Branch Target Buffer caches where direct branches go. But what about indirect branches — call rax, virtual function dispatch, jump tables, function pointers? The CPU can't read the target from the instruction; it has to predict it from history. That predictor is the Indirect Branch Predictor (IBP), and it's the gun that fired Spectre v2.

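The IBP keys on the branch's address (plus global history) and stores predicted targets. Only some of those address bits feed the index, so branches at different addresses can land on the same entry. Crucially, on pre-2018 Intel hardware, the predictor table was shared across privilege levels and across hyperthread siblings. Two unrelated indirect branches that aliased to the same predictor entry would train each other, even when one ran in userspace and the other in the kernel.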
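To make the aliasing concrete, here is a minimal sketch of a toy predictor table in C. It is a model under stated assumptions, not a description of any real CPU: INDEX_BITS, the table layout, and the function names are all invented for illustration, and real predictors also mix in branch history and use undocumented index functions.

```c
#include <stdint.h>

/* Toy model only. INDEX_BITS is a made-up parameter; real hardware uses
   undocumented index/tag functions and folds in branch history. */
#define INDEX_BITS 8
#define TABLE_SIZE (1u << INDEX_BITS)

static uint64_t ibp_table[TABLE_SIZE];  /* predicted target per entry, 0 = none */

/* Index with the low address bits only; the high bits are ignored,
   so a user address and a kernel address can share an entry. */
unsigned ibp_index(uint64_t branch_addr) {
    return (unsigned)(branch_addr & (TABLE_SIZE - 1));
}

/* Record the target an indirect branch actually took. */
void ibp_train(uint64_t branch_addr, uint64_t target) {
    ibp_table[ibp_index(branch_addr)] = target;
}

/* Return the target the front end would speculate to. */
uint64_t ibp_predict(uint64_t branch_addr) {
    return ibp_table[ibp_index(branch_addr)];
}
```

Training at a userspace address whose low bits match a kernel branch makes ibp_predict return the attacker's chosen target for the kernel branch, which is the whole problem in miniature.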

The attack: An attacker in userspace finds an indirect branch in the kernel (say, a function pointer call in a syscall path). They:

1. Train the IBP from userspace by repeatedly executing an indirect branch at an address that aliases the kernel's branch, with the target pointing at a "gadget" inside the kernel: code like mov rax, [rdi]; mov rbx, [rax+rcx*8].
2. Make a syscall. The kernel hits its indirect branch, the IBP mispredicts to the gadget, and the CPU speculatively executes the gadget with kernel privileges before realizing the prediction was wrong.
3. The speculative loads leave footprints in the cache. The attacker measures cache timings to recover the leaked bytes.
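The recovery step can be sketched as a toy simulation in C. This is not a working exploit: the "cache" is just an array of flags, and cache_hot, flush_probe, speculative_gadget, and recover_byte are hypothetical names chosen for illustration. A real attack would replace the flag check with a timed memory access per line.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINES 256

static bool cache_hot[LINES];   /* toy cache: one flag per probe-array line */

/* Before the attack, the attacker evicts the whole probe array. */
void flush_probe(void) {
    memset(cache_hot, 0, sizeof cache_hot);
}

/* The mispredicted gadget runs speculatively. Its register results are
   thrown away, but the line selected by the secret stays cached. */
void speculative_gadget(const uint8_t *secret_ptr) {
    uint8_t secret = *secret_ptr;   /* first load reads the secret byte */
    cache_hot[secret] = true;       /* dependent load touches exactly one line */
}

/* The attacker probes every line; here a "fast" access is a set flag. */
int recover_byte(void) {
    for (int i = 0; i < LINES; i++)
        if (cache_hot[i])
            return i;
    return -1;                      /* nothing was cached */
}
```

The index of the one hot line equals the secret byte, so the cache state alone carries the data out of the discarded speculative path.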

The mitigation: retpolines. Instead of emitting call *%rax, the compiler emits a thunk that uses the Return Stack Buffer (which is per-thread and harder to poison) to redirect control:

  call set_up_target      # pushes the address of capture_spec as the return address
capture_spec:
  pause
  lfence
  jmp capture_spec        # speculation trap
set_up_target:
  mov %rax, (%rsp)        # overwrite return address with the real target
  ret                     # RSB predicts capture_spec; architecturally jumps to *%rax

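The CPU's speculative path goes nowhere useful (it spins in the pause; lfence; jmp trap), while the architectural path correctly returns to the address in rax. Hardware controls arrived alongside, through microcode updates and in newer CPUs: IBRS restricts how less-privileged code can influence predictions used at higher privilege, IBPB is a barrier that discards earlier predictor training (typically issued on context switches), and STIBP stops one hyperthread from steering its sibling's predictions.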
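To see the transformation on your own machine, a small C file with one function-pointer call is enough. The sketch below uses hypothetical names (handler_t, add_one, twice, dispatch). Compiling it to assembly with and without a retpoline flag, for example gcc -O2 -S versus gcc -O2 -mindirect-branch=thunk -S, or clang -O2 -mretpoline -S, should show the indirect branch replaced by a call or jump to a thunk. GCC names its thunks along the lines of __x86_indirect_thunk_<reg>; exact symbol names and output vary by compiler and version.

```c
typedef int (*handler_t)(int);

int add_one(int x) { return x + 1; }
int twice(int x)   { return x * 2; }

/* Without retpolines this typically compiles to an indirect branch such as
   `call *%rax` (or an indirect `jmp` when emitted as a tail call).
   With a retpoline flag the compiler routes it through a thunk instead. */
int dispatch(handler_t h, int x) {
    return h(x);
}
```

The C source is identical in both builds; only the generated code for dispatch changes, which is why retpolines could be rolled out with a recompile.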

Real-world cost: Linux's retpoline mitigation added roughly 5–25% overhead to syscall-heavy workloads in 2018. Network packet processing took the worst hit because every protocol dispatch is an indirect call. This is why CONFIG_RETPOLINE kernels were measurably slower, and why later CPUs moved the fix into hardware: Intel added eIBRS ("enhanced IBRS", enabled once and left on, at near-zero cost) starting with Cascade Lake, and AMD followed with the comparable Automatic IBRS in Zen 4. On those parts the kernel can rely on the hardware mode instead of retpolines.
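On Linux you can check which Spectre v2 mitigation the running kernel selected by reading its sysfs status file. The sketch below is a minimal reader in C; read_first_line and print_spectre_v2_status are hypothetical helper names, and the wording of the status line differs across kernel versions, so the code prints it rather than parsing it.

```c
#include <stdio.h>
#include <string.h>

/* Copy the first line of `path` into buf, without the trailing newline.
   Returns 0 on success, -1 if the file cannot be opened or is empty. */
int read_first_line(const char *path, char *buf, size_t len) {
    if (len == 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the kernel's Spectre v2 status, if this system exposes it. */
void print_spectre_v2_status(void) {
    char line[512];
    const char *path = "/sys/devices/system/cpu/vulnerabilities/spectre_v2";
    if (read_first_line(path, line, sizeof line) == 0)
        printf("spectre_v2: %s\n", line);
    else
        printf("spectre_v2: status file not available\n");
}
```

On systems without that file (non-Linux hosts, some containers) the helper simply reports that the status is unavailable.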

Rule of thumb: One indirect call costs ~1–2 cycles when predicted correctly, ~15–20 cycles on misprediction, and ~25–40 cycles when wrapped in a retpoline; exact figures vary by microarchitecture. C++ vtables, function-pointer dispatch tables, and JIT trampolines all pay this tax. Devirtualization (LTO, PGO, final classes) isn't just about inlining; it also removes predictor pressure.
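When the compiler cannot devirtualize for you, the same effect can be approximated by hand with a guarded direct call. The sketch below is one way to do it in C, assuming a single hot target is known in advance; op_t, fast_path, slow_path, call_indirect, and call_guarded are hypothetical names. Profile-guided builds can apply a similar transformation automatically, often called indirect call promotion.

```c
typedef int (*op_t)(int);

int fast_path(int x) { return x + 1; }   /* hypothetical hot target */
int slow_path(int x) { return x * 10; }  /* hypothetical rare target */

/* Plain dispatch: always an indirect call. */
int call_indirect(op_t op, int x) {
    return op(x);
}

/* Guarded dispatch: compare against the expected target first. On a match
   the call is direct (and inlinable), so the indirect predictor is not
   involved; otherwise fall back to the ordinary indirect call. */
int call_guarded(op_t op, int x) {
    if (op == fast_path)
        return fast_path(x);
    return op(x);
}
```

The guard trades an indirect branch for a conditional one, which is handled by a different predictor and, under retpolines, avoids the thunk entirely on the hot path. It only pays off when one target clearly dominates.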

Key Takeaway: The indirect branch predictor is shared mutable state that crosses privilege boundaries — Spectre v2 turned a performance optimization into a kernel memory disclosure primitive, and we still pay for that mistake on every function pointer call.
