Daily Hardware Architecture: The Checkpoint and Recovery Mechanism: How CPUs Rewind Time After a Misprediction

2026-05-27

When a branch predictor guesses wrong, the CPU has typically already issued 100+ instructions down the wrong path. Those instructions have allocated physical registers, written rename map entries, and queued memory operations. The CPU must undo all of it, and to keep the penalty low it needs to restore the rename state in roughly one cycle. The mechanism that makes this possible is the checkpoint.

A checkpoint is a saved snapshot of speculation-critical state, taken at branches likely to mispredict. The key insight: you don't need to save everything. You only need to save the state that would otherwise have to be rebuilt slowly, chiefly by walking the reorder buffer (ROB).

What gets checkpointed:

Rename map table (RMT): the logical-to-physical register mapping. This is the big one. Without it, you can't tell which physical registers belong to which architectural names after the squash.
Free list head pointer: restoring it releases every register allocated by wrong-path instructions in one step.
Return address stack (RAS) top-of-stack: so call/return prediction doesn't drift permanently after a mispredict.
Global history register (GHR): the branch predictor's record of recent outcomes. It is updated speculatively at each prediction, so it must be rewound for later predictions to see the correct path's history.
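
To make the list concrete, here is a minimal Python sketch of a checkpoint holding those four pieces of state. The class names, the 16-bit history length, and the choice to snapshot the history before the branch's own prediction is shifted in are all assumptions for illustration, not a description of any shipping core.

```python
from dataclasses import dataclass

HISTORY_BITS = 16                       # assumed global-history length
HISTORY_MASK = (1 << HISTORY_BITS) - 1


@dataclass(frozen=True)
class Checkpoint:
    """Immutable snapshot of speculation-critical state (hypothetical layout)."""
    rmt: tuple            # logical -> physical register mapping
    free_list_head: int   # allocation pointer into a circular free list
    ras_top: int          # return address stack top-of-stack index
    ghr: int              # global history, before this branch's prediction


@dataclass
class SpeculativeState:
    rmt: list
    free_list_head: int = 0
    ras_top: int = 0
    ghr: int = 0

    def take_checkpoint(self) -> Checkpoint:
        # Copy the RMT; the other fields are small scalars copied by value.
        return Checkpoint(tuple(self.rmt), self.free_list_head, self.ras_top, self.ghr)

    def restore(self, cp: Checkpoint, actual_taken: bool) -> None:
        self.rmt = list(cp.rmt)
        self.free_list_head = cp.free_list_head  # releases all wrong-path registers at once
        self.ras_top = cp.ras_top
        # Rewind the history, then shift in the branch's real outcome.
        self.ghr = ((cp.ghr << 1) | int(actual_taken)) & HISTORY_MASK


state = SpeculativeState(rmt=[0, 1, 2, 3], free_list_head=4, ras_top=2, ghr=0b1011)
cp = state.take_checkpoint()
# Wrong path: branch predicted taken, two renames, one call.
state.ghr = ((state.ghr << 1) | 1) & HISTORY_MASK
state.rmt[1], state.rmt[3] = 4, 5
state.free_list_head = 6
state.ras_top = 3
# The branch resolves not-taken: roll everything back.
state.restore(cp, actual_taken=False)
print(state)
# SpeculativeState(rmt=[0, 1, 2, 3], free_list_head=4, ras_top=2, ghr=22)
```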

Two recovery strategies:

ROB walk recovery: on a mispredict, walk the ROB backward from the youngest instruction to the branch, undoing each instruction's rename. Cheap in hardware (no checkpoint storage), but slow: at a walk rate of a few entries per cycle, unwinding 100 entries can take 20+ cycles. Used in older designs.
Checkpoint recovery: restore the saved RMT in a single cycle. Costs storage (about 64 bytes per checkpoint for a 64-entry RMT with 8-bit physical register tags), but recovery is near-instant. Used by Intel since Nehalem and by AMD since Zen.
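
Both strategies should leave the rename map in exactly the same state; they differ only in how long it takes to get there. The toy renamer below is a hypothetical model (no retirement, a simple array free list consumed through a head pointer). Each ROB entry records the mapping it overwrote, so a backward walk can undo renames one at a time, while a checkpoint restores the map and the free-list head in one step.

```python
class ToyRenamer:
    """Toy register renamer (hypothetical model, not any real core's design)."""

    def __init__(self, num_logical: int, num_physical: int) -> None:
        # Logical register i starts out mapped to physical register i.
        self.rmt = list(range(num_logical))
        # Free registers are handed out in order via a head pointer (no retirement here).
        self.free_list = list(range(num_logical, num_physical))
        self.head = 0
        # Each ROB entry records (destination logical reg, its previous physical reg).
        self.rob = []

    def rename(self, dest: int) -> int:
        new_preg = self.free_list[self.head]
        self.head += 1
        self.rob.append((dest, self.rmt[dest]))
        self.rmt[dest] = new_preg
        return new_preg

    def take_checkpoint(self):
        # Snapshot the RMT, the free-list head, and the ROB position of the branch.
        return tuple(self.rmt), self.head, len(self.rob)

    def recover_by_walk(self, rob_index: int) -> int:
        """Undo wrong-path renames youngest-first; return how many entries were walked."""
        walked = 0
        while len(self.rob) > rob_index:
            dest, prev_preg = self.rob.pop()
            self.rmt[dest] = prev_preg
            self.head -= 1  # return that instruction's register to the free list
            walked += 1
        return walked

    def recover_from_checkpoint(self, cp) -> None:
        rmt, head, rob_index = cp
        self.rmt = list(rmt)       # one bulk copy of the map
        self.head = head           # frees all wrong-path registers at once
        del self.rob[rob_index:]   # models resetting the ROB tail pointer


def run(use_checkpoint: bool):
    r = ToyRenamer(num_logical=4, num_physical=16)
    r.rename(1)
    r.rename(2)                  # correct-path work before the branch
    cp = r.take_checkpoint()     # the branch gets a checkpoint here
    for dest in (1, 3, 1, 0):    # wrong-path instructions
        r.rename(dest)
    if use_checkpoint:
        r.recover_from_checkpoint(cp)
    else:
        r.recover_by_walk(cp[2])
    return r.rmt, r.head


print(run(use_checkpoint=True))   # ([0, 4, 5, 3], 2)
print(run(use_checkpoint=False))  # ([0, 4, 5, 3], 2)
```

In hardware, `recover_from_checkpoint` corresponds to a bulk copy plus two pointer resets, which is why it can fit in a cycle. By contrast, `recover_by_walk` does work proportional to the number of wrong-path instructions.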

Concrete example: Intel Sunny Cove maintains ~16 checkpoints. At each low-confidence branch, it snapshots the RMT. If the branch mispredicts, recovery takes 1 cycle to restore the map, plus the cycles needed to flush the wrong-path work and refill the pipeline from the correct target. Total mispredict penalty: ~16 cycles. Without checkpoints, the same penalty would balloon to 30+ cycles because of the serial ROB unwind.
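
The gap between ~16 and 30+ cycles can be reproduced with a simple additive model: rename-state recovery time plus front-end refill time. The numbers below (15 refill cycles, 100 wrong-path ROB entries, a walk width of 4 entries per cycle) are assumptions chosen to match the figures above, not measured Sunny Cove parameters. Real cores can also overlap part of a ROB walk with refetch, which this sketch ignores.

```python
import math
from typing import Optional


def mispredict_penalty(refill_cycles: int,
                       wrong_path_entries: int = 0,
                       walk_width: Optional[int] = None) -> int:
    """Toy additive model of branch mispredict penalty, in cycles.

    walk_width=None models checkpoint recovery (a 1-cycle RMT restore);
    otherwise the ROB is unwound walk_width entries per cycle.
    """
    if walk_width is None:
        recovery = 1
    else:
        recovery = math.ceil(wrong_path_entries / walk_width)
    return recovery + refill_cycles


print(mispredict_penalty(15))          # checkpoint: 1 + 15 = 16
print(mispredict_penalty(15, 100, 4))  # ROB walk: 25 + 15 = 40
```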

Rule of thumb: the cost of a checkpoint is roughly (RMT entries × ⌈log₂(PRF size)⌉) bits per snapshot, counting only the rename map. For a 32-entry RMT with a 256-entry physical register file: 32 × 8 = 256 bits = 32 bytes per checkpoint. Sixteen checkpoints come to 512 bytes, small compared to a 4KB ROB.
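
The rule of thumb is easy to check. This sketch computes the tag width as ⌈log₂(PRF size)⌉ with integer arithmetic and counts only RMT bits. The free-list pointer, RAS pointer, and history register that a real checkpoint also holds would add a few more bytes.

```python
def tag_bits(prf_size: int) -> int:
    # Bits needed to name any of prf_size physical registers: ceil(log2(prf_size)).
    return (prf_size - 1).bit_length()


def checkpoint_bits(rmt_entries: int, prf_size: int) -> int:
    # One physical-register tag per RMT entry.
    return rmt_entries * tag_bits(prf_size)


per_checkpoint_bytes = checkpoint_bits(32, 256) // 8   # 32 * 8 = 256 bits = 32 bytes
sixteen_checkpoints = 16 * per_checkpoint_bytes        # 512 bytes
print(per_checkpoint_bytes, sixteen_checkpoints)       # 32 512
```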

The hidden constraint: you only have so many checkpoints. When they're exhausted, the CPU stalls fetch at the next low-confidence branch until one frees up, when an older checkpointed branch retires (or, in some designs, as soon as it resolves correctly). This is why branch confidence estimation matters as much as branch prediction itself: checkpoints are a scarce resource you allocate to risky bets.
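
The allocation policy can be sketched as a small pool. In this hypothetical model only low-confidence branches consume a checkpoint. A confident branch proceeds without one; if it does mispredict, some slower recovery path would have to handle it, which the sketch leaves out. A low-confidence branch that finds the pool empty stalls until a slot is released.

```python
from typing import Optional


class CheckpointPool:
    """Toy fixed-size pool of checkpoint slots (hypothetical model)."""

    def __init__(self, num_slots: int) -> None:
        self.free_slots = list(range(num_slots))

    def allocate(self) -> Optional[int]:
        # Hand out a free slot, or None when every checkpoint is in use.
        return self.free_slots.pop() if self.free_slots else None

    def release(self, slot: int) -> None:
        # Called when the checkpointed branch retires (or resolves correctly).
        self.free_slots.append(slot)


def handle_branch(pool: CheckpointPool, low_confidence: bool) -> str:
    """Decide what the front end does when it reaches a branch."""
    if not low_confidence:
        return "proceed"                  # confident: spend no checkpoint
    slot = pool.allocate()
    if slot is None:
        return "stall"                    # pool exhausted: hold fetch here
    return f"proceed (checkpoint {slot})"


pool = CheckpointPool(num_slots=2)
actions = [handle_branch(pool, lc) for lc in (True, False, True, True)]
print(actions)
# ['proceed (checkpoint 1)', 'proceed', 'proceed (checkpoint 0)', 'stall']
pool.release(1)                           # an older risky branch retires
print(handle_branch(pool, True))          # proceed (checkpoint 1)
```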

Key Takeaway: Modern CPUs survive mispredictions in ~16 cycles instead of 30+ by snapshotting the rename map at risky branches and restoring it in a single cycle, but checkpoints are a scarce resource that gates how speculatively a CPU can run.
