2026-05-16
A single branch predictor is like a single weather forecaster — sometimes brilliant, sometimes wrong. A tournament predictor runs multiple predictors in parallel and learns which one to trust for each branch. The Alpha 21264 pioneered this in 1998, and modern CPUs still build on the idea.
The classic tournament design has three components:
for (i=0; i<10; i++) taken 9 times, not-taken once)if (x>0) {...} if (x>0) {...})The genius is that different branches have different personalities. A loop counter loves the local predictor — its own history tells you everything. A branch like if (error_handler_needed) deep in a call chain correlates with earlier branches and loves the global predictor. The chooser learns this distinction per branch without anyone telling it.
Real example — Alpha 21264: 1024-entry local history table (10 bits each), feeding a 1024-entry local predictor (3-bit counters). A 4096-entry global predictor indexed by 12-bit global history. A 4096-entry chooser table. Total: ~30 Kbits of prediction state. Reported ~94% accuracy on SPEC95 — beating either predictor alone by 2-4 percentage points.
Rule of thumb: on a modern 15-stage pipeline with 4-wide issue, each misprediction costs ~60 instruction slots. Going from 95% → 97% accuracy on a workload with 15% branches means roughly:
Mispredict reduction: (5% − 3%) × 15% × 60 = 0.18 wasted slots per instruction saved — translating to roughly 5-10% IPC improvement on branch-heavy code. That's why Intel, AMD, and Apple pour transistor budget into predictors that look like miniature neural networks (TAGE, perceptron predictors) — modern descendants of the tournament idea, using many tagged tables instead of just two.
The chooser itself uses a clever update rule: only update it when the two predictors disagree. If both got it right or both got it wrong, the chooser learns nothing useful. This keeps the chooser focused on the branches where the choice actually matters.
