Daily Hardware Architecture: Tournament Branch Predictors: How CPUs Combine Multiple Predictors to Beat Them All

Tournament Branch Predictors: How CPUs Combine Multiple Predictors to Beat Them All

2026-05-16

A single branch predictor is like a single weather forecaster — sometimes brilliant, sometimes wrong. A tournament predictor runs multiple predictors in parallel and learns which one to trust for each branch. The Alpha 21264 pioneered this in 1998, and modern CPUs still build on the idea.

The classic tournament design has three components:

Local predictor: tracks each branch's personal history (good for loops with stable patterns — for (i=0; i<10; i++) taken 9 times, not-taken once)
Global predictor: uses the global history of recent branches to predict (good for correlated branches — if (x>0) {...} if (x>0) {...})
Chooser (meta-predictor): a 2-bit saturating counter per branch that tracks which of the two predictors has been more accurate for this specific branch, and picks the winner

The genius is that different branches have different personalities. A loop counter loves the local predictor — its own history tells you everything. A branch like if (error_handler_needed) deep in a call chain correlates with earlier branches and loves the global predictor. The chooser learns this distinction per branch without anyone telling it.

Real example — Alpha 21264: 1024-entry local history table (10 bits each), feeding a 1024-entry local predictor (3-bit counters). A 4096-entry global predictor indexed by 12-bit global history. A 4096-entry chooser table. Total: ~30 Kbits of prediction state. Reported ~94% accuracy on SPEC95 — beating either predictor alone by 2-4 percentage points.

Rule of thumb: on a modern 15-stage pipeline with 4-wide issue, each misprediction costs ~60 instruction slots. Going from 95% → 97% accuracy on a workload with 15% branches means roughly:

Mispredict reduction: (5% − 3%) × 15% × 60 = 0.18 wasted slots per instruction saved — translating to roughly 5-10% IPC improvement on branch-heavy code. That's why Intel, AMD, and Apple pour transistor budget into predictors that look like miniature neural networks (TAGE, perceptron predictors) — modern descendants of the tournament idea, using many tagged tables instead of just two.

The chooser itself uses a clever update rule: only update it when the two predictors disagree. If both got it right or both got it wrong, the chooser learns nothing useful. This keeps the chooser focused on the branches where the choice actually matters.

See it in action: Check out am i top 1 ground? #gorillatag #gorillatagcomp by ItsDusted to see this theory applied.

Key Takeaway: Tournament predictors don't try to build one perfect predictor — they run specialists in parallel and learn per-branch which specialist to trust, exploiting the fact that different branches need fundamentally different prediction strategies.

All newsletters