2026-05-27
When a branch predictor guesses wrong, the CPU has typically already issued 100+ instructions down the wrong path. Those instructions have allocated physical registers, written rename map entries, and queued memory operations. The CPU must undo all of it, ideally restoring the rename state in roughly one cycle. The mechanism that makes this possible is the checkpoint.
A checkpoint is a saved snapshot of speculation-critical state, taken at branches likely to mispredict. The key insight: you don't need to save everything. You only need to save what can't be reconstructed quickly by walking the reorder buffer (ROB).
What gets checkpointed:
Two recovery strategies:
Concrete example: Intel Sunny Cove maintains ~16 checkpoints. At each predicted-taken branch with low confidence, it snapshots the rename map table (RMT). If the branch mispredicts, recovery takes 1 cycle to restore the map plus a few cycles to drain the pipeline. Total mispredict penalty: ~16 cycles. Without checkpoints, the same penalty would balloon to 30+ cycles because of the serial ROB unwind.
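The gap between restoring a snapshot and unwinding the ROB serially can be sketched with a toy model. This is a minimal illustration, not a description of any real core: the function names, the 8-register map, and the "one step per undone rename" cost are all assumptions made for the example, and freeing of wrong-path physical registers is left out.

```python
# Toy rename model. All names and sizes are hypothetical, chosen for illustration.

def rename(rmt, rob, free_list, arch_reg):
    """Rename one destination register and log the old mapping in the ROB."""
    new_phys = free_list.pop()
    rob.append((arch_reg, rmt[arch_reg], new_phys))  # (dest, old mapping, new mapping)
    rmt[arch_reg] = new_phys

def recover_by_checkpoint(rmt, snapshot):
    """Restore the whole map by copying the saved snapshot back."""
    rmt[:] = snapshot
    return 1  # modeled as a single step, independent of wrong-path length

def recover_by_rob_walk(rmt, rob, branch_pos):
    """Undo wrong-path renames youngest-first; cost grows with wrong-path length."""
    steps = 0
    while len(rob) > branch_pos:
        arch_reg, old_phys, _new_phys = rob.pop()
        rmt[arch_reg] = old_phys
        steps += 1
    return steps

def demo(wrong_path_len=20):
    rmt = list(range(8))              # 8 architectural registers, identity-mapped
    free_list = list(range(8, 64))    # remaining physical registers
    rob = []
    for reg in (1, 2, 3):             # correct-path renames before the branch
        rename(rmt, rob, free_list, reg)
    snapshot = list(rmt)              # checkpoint taken at the branch
    branch_pos = len(rob)
    for i in range(wrong_path_len):   # wrong-path renames after the branch
        rename(rmt, rob, free_list, i % 8)
    # Recover two independent copies of the polluted state, one per strategy.
    rmt_a, rmt_b, rob_b = list(rmt), list(rmt), list(rob)
    ckpt_steps = recover_by_checkpoint(rmt_a, snapshot)
    walk_steps = recover_by_rob_walk(rmt_b, rob_b, branch_pos)
    return ckpt_steps, walk_steps, rmt_a == rmt_b == snapshot

print(demo())
```

Both strategies end in the same map. The difference is that the walk's step count scales with the number of wrong-path renames, while the snapshot copy does not.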
Rule of thumb: the cost of a checkpoint is roughly (RMT entries × log₂(physical register file size)) bits per snapshot, because each map entry stores one physical register tag. For a 32-entry RMT with a 256-entry physical register file (PRF): 32 × 8 = 256 bits = 32 bytes per checkpoint. Sixteen checkpoints = 512 bytes, which is small compared to a 4 KB ROB.
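The rule of thumb is easy to check in a few lines. The helper below is a sketch with a hypothetical name. It assumes each RMT entry holds exactly one physical register tag and ignores any extra per-checkpoint state.

```python
def checkpoint_bits(rmt_entries, prf_size):
    """Bits per snapshot: one physical-register tag per RMT entry."""
    tag_bits = (prf_size - 1).bit_length()  # ceil(log2(prf_size)) for prf_size >= 1
    return rmt_entries * tag_bits

bits = checkpoint_bits(rmt_entries=32, prf_size=256)
bytes_per_checkpoint = bits // 8
total_bytes = 16 * bytes_per_checkpoint

print(bits, bytes_per_checkpoint, total_bytes)  # 256 32 512
```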
The hidden constraint: you only have so many checkpoints. When they're exhausted, the CPU stalls fetch at the next low-confidence branch until one frees up at retirement. This is why branch confidence estimation matters as much as branch prediction itself — checkpoints are a scarce resource you allocate to risky bets.
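The allocation policy described above can be sketched as a small pool. This is an assumed model for illustration only: the class and function names are hypothetical, and a real front end would track confidence and release timing in hardware-specific ways.

```python
class CheckpointPool:
    """Fixed-size pool of checkpoints, keyed by the branch that owns each one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = set()

    def try_allocate(self, branch_id):
        if len(self.in_use) >= self.capacity:
            return False          # pool exhausted
        self.in_use.add(branch_id)
        return True

    def release(self, branch_id):
        """Free the checkpoint, e.g. when its branch retires."""
        self.in_use.discard(branch_id)

def on_branch(pool, branch_id, low_confidence):
    """Decide what fetch does when it meets a branch."""
    if not low_confidence:
        return "no-checkpoint"    # confident prediction: don't spend a checkpoint
    if pool.try_allocate(branch_id):
        return "checkpointed"
    return "stall"                # wait until an older branch frees a checkpoint

pool = CheckpointPool(capacity=2)
for branch_id, low_conf in [(1, True), (2, False), (3, True), (4, True)]:
    print(branch_id, on_branch(pool, branch_id, low_conf))
pool.release(1)
print(4, on_branch(pool, 4, True))
```

With a capacity of 2, the third low-confidence branch stalls until branch 1 releases its checkpoint. The high-confidence branch never touches the pool, which is the point of pairing confidence estimation with checkpointing.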
