2026-06-05
A 6-wide superscalar CPU theoretically needs 12 read ports on its register file (two operands per instruction). But each port adds wires, multiplexers, and area that scale roughly as O(ports²). A monolithic 12-port SRAM would be slower than the ALUs it feeds. So designers cheat: they bank the register file into smaller arrays, each with fewer ports, and pray that simultaneous reads land in different banks.
The trick: split 128 physical registers into, say, 4 banks of 32 registers each, with the bank chosen by the low-order bits of the physical register number. Each bank has only 3 read ports instead of 12. Aggregate read bandwidth stays at 12 reads/cycle, as long as no more than 3 of the reads issued in a cycle target the same bank.
When reads do collide on a bank's ports, that's a bank conflict, and at least one of the instructions has to wait a cycle. The renamer tries to make this rare by tracking which bank each physical register lives in and steering new allocations to balance the banks.
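A minimal sketch of the mapping and the conflict check, using the 4-bank, 3-ports-per-bank numbers from the example. It assumes a conflict means "more reads target a bank than it has read ports"; the function names are hypothetical, not taken from any real design.

```python
from collections import Counter

NUM_BANKS = 4        # 128 physical registers split into 4 banks of 32
PORTS_PER_BANK = 3   # read ports per bank (assumed, from the example)


def bank_of(preg: int) -> int:
    """Bank index taken from the low-order bits of the physical register number."""
    return preg & (NUM_BANKS - 1)


def oversubscribed_banks(reads: list[int]) -> dict[int, int]:
    """Map each conflicting bank to the number of reads beyond its port count."""
    demand = Counter(bank_of(p) for p in reads)
    return {b: n - PORTS_PER_BANK for b, n in demand.items() if n > PORTS_PER_BANK}


# Six instructions with two source operands each: 12 reads in one cycle.
balanced = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]   # 3 reads land in each bank
skewed = [0, 4, 8, 12, 16, 1, 2, 3, 5, 6, 7, 9]     # 5 reads land in bank 0

print(oversubscribed_banks(balanced))  # {}
print(oversubscribed_banks(skewed))    # {0: 2}
```

In the skewed case two reads exceed bank 0's ports, so the instructions that own them would have to replay or wait a cycle, even though banks 2 and 3 still have a spare port each.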
Real example: The Alpha 21264 (1998) was famous for a close cousin of this idea, taken to an extreme. Rather than partitioning its integer register file, it duplicated the 80-entry file into two copies, one per execution cluster, each with 4 read ports. A result produced in one cluster took an extra cycle to become available in the other. The compiler and the issue logic worked together to keep dependent instructions in the same cluster. AMD's Zen cores use a similar banked-and-clustered approach for their FP register file, and bank conflicts on the FP side are a measurable performance counter event.
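Rule of thumb for port scaling: the area of an SRAM cell scales as roughly (read_ports + write_ports)². Going from a 6-port to a 12-port register file isn't 2× the area; it's closer to 4×, and the access time grows too. Banking into N banks with P/N ports each shrinks every cell by about N², and since the banks together still hold the same number of cells, array area falls from roughly P² to (P/N)² = P²/N². (N full-size copies, as in a duplicated file, would cost N × (P/N)² = P²/N instead.) In this idealized model a 4-bank split saves over 90% of the array area for the same aggregate bandwidth; real designs give some of that back to the wiring that routes operands from banks to execution units, and they pay in conflict stalls.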
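A quick numeric check of the rule of thumb, as a sketch under one stated assumption: array area is proportional to entries × ports². The function and its `partition_entries` flag are hypothetical names for illustration. The flag separates true banking (each bank holds 1/N of the registers) from full-size copies (each copy holds all of them, as in the duplicated file described above).

```python
def relative_area(total_ports: int, num_banks: int, partition_entries: bool = True) -> float:
    """Area of a split register file relative to a monolithic one.

    Assumed model: area is proportional to entries * ports**2.
    """
    ports_per_bank = total_ports / num_banks
    # Fraction of all registers stored in each bank.
    entries_fraction = 1 / num_banks if partition_entries else 1.0
    split = num_banks * entries_fraction * ports_per_bank ** 2
    monolithic = total_ports ** 2
    return split / monolithic


# Doubling the port count of a monolithic file: about 4x the area.
print((12 / 6) ** 2)                                   # 4.0

# 12 ports split across 4 banks of 3 ports each.
print(relative_area(12, 4))                            # 0.0625 (partitioned banks)
print(relative_area(12, 4, partition_entries=False))   # 0.25   (full-size copies)
```

The model ignores the operand-routing network and the arbitration logic, so real savings are smaller than either figure suggests.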
The downside is uneven utilization. If a hot loop happens to keep allocating physical registers that map to bank 0, reads of those registers keep conflicting even while banks 1–3 sit idle. Modern renamers include bank-aware allocation heuristics: when the choice is otherwise arbitrary, they prefer free physical registers in underused banks. It's the same idea as NUMA-aware memory placement, just at nanosecond scale inside a single core.
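One way such a heuristic could look, as a sketch rather than a description of any shipping renamer. It assumes one free list per bank and uses the count of live registers in a bank as a stand-in for "underused"; real hardware might track recent read pressure instead. The class and method names are hypothetical.

```python
NUM_BANKS = 4


def bank_of(preg: int) -> int:
    """Bank index from the low-order bits of the physical register number."""
    return preg % NUM_BANKS


class BankAwareFreeList:
    """Free list that hands out registers from the least-occupied bank."""

    def __init__(self, num_pregs: int = 128) -> None:
        # One free list per bank, plus a count of live (allocated) registers.
        self.free = [[p for p in range(num_pregs) if bank_of(p) == b]
                     for b in range(NUM_BANKS)]
        self.live = [0] * NUM_BANKS

    def allocate(self) -> int:
        candidates = [b for b in range(NUM_BANKS) if self.free[b]]
        if not candidates:
            raise RuntimeError("no free physical registers")
        # Prefer the bank with the fewest live registers (ties go to the lowest index).
        bank = min(candidates, key=lambda b: self.live[b])
        self.live[bank] += 1
        return self.free[bank].pop()

    def release(self, preg: int) -> None:
        bank = bank_of(preg)
        self.live[bank] -= 1
        self.free[bank].append(preg)


rename = BankAwareFreeList()
allocated = [rename.allocate() for _ in range(8)]
print([bank_of(p) for p in allocated])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Balancing live registers does not guarantee balanced reads, since some registers are read far more often than others, but it removes the pathological case where every new destination lands in the same bank.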
