The Shadow Register File: How CPUs Switch Contexts Without Spilling to Memory

2026-06-06

When an interrupt fires, the CPU faces a problem: the handler needs registers, but the interrupted program owns them all. The textbook answer is "push them to the stack." The hardware answer, on many architectures, is shadow registers — a second copy of the register file that the CPU swaps to in a single cycle.

The idea is brutally simple. Instead of one architectural register file, the core has two (or more) banks. A mode bit selects which bank the decoder reads and writes. On interrupt entry, the bit flips; the handler now sees a fresh set of registers while the interrupted program's values sit untouched in the other bank. On return from interrupt (RETI on ISAs such as the Z80 and 8051), the bit flips back. No memory traffic, no cache pollution, no save/restore prologue.
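A toy Python model makes the mechanism concrete. It is a sketch under simplifying assumptions, not any real core's design: the class name `BankedRegisterFile`, the eight-register bank size, and the method names are hypothetical, and only a single shadow bank is modelled.

```python
class BankedRegisterFile:
    """Toy register file with one normal bank and one shadow bank.

    A single mode bit selects which bank reads and writes go to, so
    interrupt entry and return are just flips of that bit.
    """

    def __init__(self, num_regs: int = 8) -> None:
        # banks[0] belongs to the interrupted program, banks[1] to the handler
        self.banks = [[0] * num_regs, [0] * num_regs]
        self.mode = 0  # bank-select bit: 0 = program, 1 = interrupt handler

    def read(self, reg: int) -> int:
        return self.banks[self.mode][reg]

    def write(self, reg: int, value: int) -> None:
        self.banks[self.mode][reg] = value

    def enter_interrupt(self) -> None:
        # Hardware flips the bank-select bit; nothing is written to memory.
        self.mode = 1

    def return_from_interrupt(self) -> None:
        # RETI flips the bit back; the program's bank was never touched.
        self.mode = 0


rf = BankedRegisterFile()
rf.write(0, 42)             # program holds a live value in r0
rf.enter_interrupt()
rf.write(0, 7)              # handler uses "its" r0 without saving anything
rf.return_from_interrupt()
print(rf.read(0))           # → 42
```

The model ignores nesting: with only one shadow bank, a second interrupt arriving inside the handler has no free bank to switch to, which is why real designs typically mask further interrupts or fall back to the stack.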

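Real-world examples: classic ARM cores (ARM7, ARM9, and their AArch32 successors) bank registers per processor mode. FIQ, the Fast Interrupt mode, swaps R8–R14 for a private bank. That's why FIQ is "fast": the handler immediately gets five scratch registers (R8–R12) plus its own SP and LR without saving the user-mode versions. Cortex-M, by contrast, has no banked general-purpose registers; it banks only the stack pointer (MSP/PSP) and pushes eight registers to the stack in hardware on exception entry. SPARC took the idea further with register windows: each SAVE instruction, normally executed at function entry, slides the window by 16 registers, giving the callee fresh locals and outs while the caller's outs become the callee's ins. Call/return is essentially free until you run out of windows and trap to spill. The Zilog Z80 had an entire shadow set (AF', BC', DE', HL'): EXX swaps BC, DE, and HL with their shadows, and EX AF,AF' swaps the accumulator and flags. It was a 1976 trick that interrupt handlers still loved a decade later.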
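The register-window behaviour, and why deep recursion eventually stops being free, can be sketched the same way. This is a simplified, hypothetical model: it assumes 8 physical windows (real SPARC implementations vary), keeps one window in reserve roughly the way SPARC's window invalid mask does, and tracks only which frames are resident or spilled, not register contents. `RegisterWindows` and `call_chain` are illustrative names.

```python
class RegisterWindows:
    """Toy model of SPARC-style register windows (simplified).

    With nwindows physical windows and one held in reserve, at most
    nwindows - 1 frames can be resident; deeper frames spill to memory.
    """

    def __init__(self, nwindows: int = 8) -> None:
        self.capacity = nwindows - 1
        self.resident = 1        # frames currently held in windows
        self.spilled = 0         # frames pushed out to the memory stack
        self.overflow_traps = 0
        self.underflow_traps = 0

    def save(self) -> None:
        """Function entry: slide the window to give the callee fresh registers."""
        if self.resident == self.capacity:
            # Window overflow trap: spill the oldest resident frame to memory.
            self.overflow_traps += 1
            self.resident -= 1
            self.spilled += 1
        self.resident += 1

    def restore(self) -> None:
        """Function return: slide the window back to the caller."""
        if self.resident == 1:
            if self.spilled == 0:
                raise RuntimeError("restore with no caller frame")
            # Window underflow trap: reload the caller's frame from memory.
            self.underflow_traps += 1
            self.spilled -= 1
            self.resident += 1
        self.resident -= 1


def call_chain(depth: int, nwindows: int = 8) -> tuple[int, int]:
    """Call `depth` levels deep, return all the way, and count traps."""
    rw = RegisterWindows(nwindows)
    for _ in range(depth):
        rw.save()
    for _ in range(depth):
        rw.restore()
    return rw.overflow_traps, rw.underflow_traps


print(call_chain(6))   # → (0, 0): the whole chain fits in the windows
print(call_chain(10))  # → (4, 4): four frames spill on the way down and refill on the way up
```

Each trap in a real machine means moving a window's worth of registers to or from memory, so the cost appears only once call depth exceeds the window count.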

The tradeoffs are real. Shadow registers cost die area linearly in the number of banks: a second bank doubles the storage cells and lengthens the wiring that connects them to the existing read/write ports. They also complicate register renaming on out-of-order cores, which is one reason high-end x86 and modern ARM application cores don't bank their general-purpose registers. Instead they rely on a large physical register file (200+ entries) and fast store buffers to make stack spills nearly free. Shadow registers are a microcontroller and DSP trick, used where every cycle of interrupt latency matters and a 200-entry PRF is out of the question.
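To put "linearly" in numbers, here is raw storage arithmetic under assumed sizes: 16 × 32-bit registers per bank for a microcontroller-class core, and a 200-entry, 64-bit physical register file for a large out-of-order core. It counts storage bits only, ignoring ports, decoders, and wiring, and the helper name is hypothetical.

```python
def register_storage_bits(num_regs: int, width_bits: int, banks: int = 1) -> int:
    """Raw storage bits only: ports, decoders, and wiring are ignored."""
    return num_regs * width_bits * banks


mcu_one_bank = register_storage_bits(16, 32)            # 512 bits
mcu_shadowed = register_storage_bits(16, 32, banks=2)   # 1,024 bits
big_core_prf = register_storage_bits(200, 64)           # 12,800 bits

print(mcu_shadowed - mcu_one_bank)   # cost of the shadow bank: 512 bits
print(big_core_prf // mcu_shadowed)  # PRF is 12x the whole shadowed file (integer ratio)
```

Under these assumptions, each extra bank adds the same 512 bits, which is cheap next to a single large PRF; the picture changes once you account for the rename and forwarding logic an out-of-order core would need around it.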

Rule of thumb: if your interrupt handler saves N registers to the stack on entry and restores them on exit, at roughly one cycle per store and one per load, shadow registers save you about 2N cycles per interrupt, plus any fixed entry overhead. Take a hypothetical 150 MHz Cortex-M-class part with a shadow bank that spares it 8 register saves (say, the callee-saved R4–R11) at a 100 kHz interrupt rate: that's 2 × 8 × 100,000 = 1.6 million cycles/sec reclaimed, about 1% of the core, entirely from not touching memory.
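The rule of thumb is easy to parameterize. This sketch assumes one cycle per store on entry and one per load on exit and ignores fixed entry overhead; the function name and parameters are hypothetical.

```python
def cycles_reclaimed(saved_regs: int, irq_rate_hz: float, core_hz: float,
                     cycles_per_store: float = 1.0,
                     cycles_per_load: float = 1.0) -> tuple[float, float]:
    """Estimate cycles/sec saved by not spilling registers on each interrupt.

    Returns (cycles saved per second, fraction of the core's cycles).
    """
    per_irq = saved_regs * (cycles_per_store + cycles_per_load)  # 2N at 1 cycle each
    per_sec = per_irq * irq_rate_hz
    return per_sec, per_sec / core_hz


per_sec, fraction = cycles_reclaimed(8, 100_000, 150_000_000)
print(f"{per_sec:,.0f} cycles/s, {fraction:.2%} of the core")
# → 1,600,000 cycles/s, 1.07% of the core
```

Raising the per-register cost (for example, to account for pipeline stalls around the stores) scales the result linearly.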

The deeper lesson: context-switch cost is a hardware design choice, not a law of nature. You can pay for it in area, in latency, or in software complexity — but somebody pays.

See it in action: Buffer Overflow by Aaron Yoo applies this theory.
Key Takeaway: Shadow registers trade die area for interrupt latency by giving the CPU a second register bank to swap into instantly, which is why microcontrollers use them and out-of-order superscalars don't.
