Daily Hardware Architecture: Indirect Branch Target Predictors: How CPUs Guess Where Function Pointers Lead

2026-05-18

Conditional branches have it easy: there are only two possible outcomes, taken or not-taken, and the taken target is usually encoded in the instruction itself. Indirect branches (virtual method calls, function pointers, switch tables, interpreter dispatch loops) can jump to any address the program computes at runtime. The CPU still has to predict that address roughly 15 cycles before the load that produces the real target finishes; without a prediction, instruction fetch would stall for that whole window.

The naive solution is a Branch Target Buffer (BTB) that simply remembers each branch's last target: a small cache indexed by branch PC. This works fine for monomorphic call sites (which always call the same function) but collapses on polymorphic ones. Consider a bytecode interpreter's main dispatch:

goto *handlers[opcode] — one indirect branch, hundreds of possible targets
A simple BTB just remembers the last opcode handler, so it mispredicts whenever the next opcode differs from the previous one, which is nearly every dispatch
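
To make the failure mode concrete, here is a minimal sketch of a last-target predictor for a single indirect branch. It is a toy model, not a description of any real BTB: the function name `last_target_hit_rate` and the handler names are illustrative assumptions, and the model ignores capacity, aliasing, and everything else a hardware table has to deal with.

```python
def last_target_hit_rate(targets):
    """Fraction of dispatches a last-target predictor gets right.

    `targets` is the sequence of handlers one indirect branch jumps to.
    """
    last = None  # the single remembered target for this branch PC
    hits = 0
    for target in targets:
        if target == last:
            hits += 1
        last = target  # always overwrite with the most recent target
    return hits / len(targets)


# A monomorphic call site: the same target every time.
mono_rate = last_target_hit_rate(["ADD"] * 1000)

# An interpreter loop body where consecutive opcodes always differ.
loop_rate = last_target_hit_rate(["LOAD", "ADD", "STORE"] * 1000)

print(mono_rate)  # 0.999 (only the very first call misses)
print(loop_rate)  # 0.0 (the remembered target is always stale)
```

The second case is the dispatch-loop problem in miniature: the branch is perfectly predictable from context, but a predictor that looks only at the branch's own last target never sees that context.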

Modern high-performance CPUs go further with history-based indirect target predictors. The best-known published design is ITTAGE (Indirect Target TAgged GEometric history length): the same TAGE concept applied to targets instead of taken/not-taken. The predictor hashes the branch PC with the global history of recently executed branches, indexes several tagged tables that each use a different history length, and uses the matching entry with the longest history. This catches correlated patterns: if the previous five branches went a certain way, the call probably goes to handler X.

For the bytecode interpreter case, this is transformative. The pattern LOAD; ADD; STORE repeated millions of times produces a recognizable history signature, so ITTAGE learns "after LOAD+ADD, the next dispatch goes to the STORE handler." Andrei Frumusanu measured Apple's M1 hitting ~95% accuracy on interpreter dispatch, versus ~30% on older designs with just a BTB.
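
The longest-history-match idea can be sketched in a few lines. This is a deliberately simplified toy, not real ITTAGE: it assumes unbounded dictionaries instead of fixed-size hashed tables with partial tags, it uses the sequence of recent targets as its history rather than hashed branch-history bits, it updates every table on every branch, and it has no confidence or usefulness counters. The class name `ToyHistoryPredictor`, the history lengths, and the PC value `0x400` are all made up for illustration.

```python
from collections import deque


class ToyHistoryPredictor:
    """Toy longest-history-match target predictor (illustrative, not real ITTAGE)."""

    def __init__(self, history_lengths=(0, 1, 2, 4)):
        self.lengths = sorted(history_lengths)
        # One table per history length; plain dicts stand in for the
        # fixed-size, partially tagged tables of real hardware.
        self.tables = {h: {} for h in self.lengths}
        self.history = deque(maxlen=max(self.lengths))

    def _key(self, pc, h):
        # Key = branch PC plus the h most recent targets (empty for h == 0).
        recent = tuple(self.history)[-h:] if h else ()
        return (pc, recent)

    def predict(self, pc):
        # Prefer the table with the longest history that has an entry.
        for h in reversed(self.lengths):
            target = self.tables[h].get(self._key(pc, h))
            if target is not None:
                return target
        return None

    def update(self, pc, target):
        # Simplification: train every table, then shift the history.
        for h in self.lengths:
            self.tables[h][self._key(pc, h)] = target
        self.history.append(target)


def hit_rate(predictor, targets, pc=0x400):
    """Run one indirect branch at `pc` through `targets` and score the predictor."""
    hits = 0
    for target in targets:
        if predictor.predict(pc) == target:
            hits += 1
        predictor.update(pc, target)
    return hits / len(targets)


loop_rate = hit_rate(ToyHistoryPredictor(), ["LOAD", "ADD", "STORE"] * 1000)
print(f"{loop_rate:.3f}")
```

On the repeating LOAD; ADD; STORE stream the toy misses only during a short warm-up and then predicts every dispatch, because the recent-target history uniquely identifies what comes next. A predictor that sees only the branch's last target gets none of these right.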

Rule of thumb: An indirect branch misprediction costs roughly one full pipeline flush, about 15–20 cycles on modern x86 (the estimate here uses 18). If a virtual call site has N equally likely targets, chosen independently on each call, and uses a basic last-target BTB:

Misprediction rate ≈ (N−1)/N
Average cost per call ≈ misprediction_rate × 18 cycles
For N=4 targets: ~13.5 cycles wasted per call, on top of the call's actual work
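
The same arithmetic as a small helper, so other values of N are easy to try. The function name `expected_mispredict_cost` is illustrative, and the 18-cycle default is just the figure used above, not a measured constant for any particular CPU.

```python
def expected_mispredict_cost(n_targets, flush_cycles=18.0):
    """Expected wasted cycles per call for a last-target predictor.

    Assumes n_targets equally likely targets, chosen independently per call,
    so the remembered target is right with probability 1/n_targets.
    """
    miss_rate = (n_targets - 1) / n_targets
    return miss_rate * flush_cycles


for n in (1, 2, 4, 16):
    print(n, expected_mispredict_cost(n))
# 1 0.0
# 2 9.0
# 4 13.5
# 16 16.875
```

The cost approaches the full flush penalty quickly: by N=16 the site wastes almost 17 of the 18 cycles on an average call.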

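This is why devirtualization matters so much in C++ compilers and Java JITs: replacing a polymorphic call with an inline cache (check the type, then call the specific function directly) trades an indirect prediction for a much easier conditional prediction. It's also why -fno-plt and avoiding deep virtual hierarchies show up in hot-path optimization guides. -fno-plt replaces the call into a PLT stub, followed by the stub's own indirect jump, with a single indirect call through the GOT. A hierarchy with many overrides does not add branches to a virtual call, but it gives each call site more targets for the predictor to learn.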
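
The inline-cache idea can be sketched at the language level. This is a conceptual model only: the classes and the `InlineCache` helper are hypothetical, and Python itself does not compile this guard into a single conditional branch. In a JIT, the guard would become a compare-and-branch and the cached call would become a direct call.

```python
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.0 * self.radius * self.radius  # crude pi, enough for the demo


class InlineCache:
    """Monomorphic inline cache for one call site (hypothetical helper)."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_func = None
        self.hits = 0
        self.misses = 0

    def call(self, obj):
        # Fast path: a type check that is almost always true, which is
        # an easy conditional branch for the hardware to predict.
        if type(obj) is self.cached_type:
            self.hits += 1
            return self.cached_func(obj)
        # Slow path: full method lookup, then remember the result.
        self.misses += 1
        self.cached_type = type(obj)
        self.cached_func = getattr(type(obj), self.method_name)
        return self.cached_func(obj)


site = InlineCache("area")
shapes = [Square(2.0)] * 99 + [Circle(1.0)]
results = [site.call(shape) for shape in shapes]
print(site.hits, site.misses)  # 98 2
```

The design bet is that most call sites are monomorphic in practice, so the guard nearly always passes and the slow path is rare. When a site really does see many types, the cache thrashes and the indirect call may be the better choice.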

The defensive flip side: Spectre v2 (Branch Target Injection) attacks deliberately poison the indirect predictor (the BTB or ITTAGE-style tables) so that victim code speculatively jumps to attacker-chosen gadgets. IBRS and IBPB restrict or flush predictor state across privilege and context boundaries, while retpoline rewrites indirect branches so that they bypass this predictor entirely.

Key Takeaway: Indirect branch prediction turns "where will this function pointer go?" into a pattern-matching problem, and getting it wrong costs a full pipeline flush — which is why devirtualization, inline caching, and Spectre mitigations all revolve around this one predictor.
