2026-05-23
Every cycle, your CPU's scheduler picks several instructions to execute. Each one needs to read its source operands from the physical register file (PRF) before heading to an execution unit. Sounds simple — except the PRF is a giant SRAM array, and every simultaneous read requires its own dedicated read port: a separate set of wires, sense amps, and decoders running from the storage cells out to the bypass network.
The math is brutal. A modern wide core like Intel's Golden Cove can dispatch ~6 instructions per cycle. Most instructions need 2 source operands, so worst-case you need 6 × 2 = 12 read ports. Add 6 write ports for results and you have an 18-port register file. The area and power cost of an SRAM cell scales roughly with the square of port count: each port adds a wordline running one way across the cell and one or two bitlines running the other way, so the cell grows in both dimensions and doubling ports roughly quadruples its area.
Concrete example: a 6-port PRF cell might be ~3x the area of a 1-port cell. An 18-port cell? Closer to 30x, because tripling the port count multiplies area by about 3² = 9. That's why register files claim an outsized share of the floorplan in wide cores and burn surprising amounts of power: those bitlines toggle every cycle, and they're long.
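To make the arithmetic above concrete, here is a minimal Python sketch. It assumes a pure square-law model and one result per instruction; the function names port_counts and relative_cell_area are made up for illustration, and real cells have fixed overheads this ignores.

```python
def port_counts(dispatch_width, sources_per_op=2, results_per_op=1):
    """Worst-case port budget: every dispatched op reads all of its sources
    from the register file and writes one result back."""
    read_ports = dispatch_width * sources_per_op
    write_ports = dispatch_width * results_per_op
    return read_ports, write_ports, read_ports + write_ports


def relative_cell_area(ports, baseline_ports):
    """Cell area relative to a baseline cell, assuming area grows with the
    square of the port count (an idealized, wire-limited model)."""
    return (ports / baseline_ports) ** 2


reads, writes, total = port_counts(6)
print(f"6-wide dispatch: {reads} read + {writes} write = {total} ports")
print(f"18-port cell vs 6-port cell: {relative_cell_area(total, 6):.0f}x area")
```

Under these assumptions, going from 6 to 18 ports costs about 9x in cell area, which is how a ~3x cell ends up near 30x.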
So architects cheat in several ways:
Rule of thumb: if your CPU has N-wide dispatch with 2-source ops, you need ~2N read ports in the worst case. But the bypass network typically absorbs 40-60% of reads, so average demand is closer to 2N × 0.5 = N. That lets you build with ~N ports plus stall logic and still hit ~95% of peak throughput.
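A quick way to sanity-check the rule of thumb is to compute the average number of reads that still reach the register file. This is a rough sketch: average_prf_reads is a hypothetical helper, and it only models the mean, so it says nothing about bursty cycles or how often the stall logic fires.

```python
def average_prf_reads(dispatch_width, bypass_fraction, sources_per_op=2):
    """Mean register-file reads per cycle after bypassed operands are removed.

    bypass_fraction is the share of source operands served by the bypass
    network instead of the register file (0.4 to 0.6 in the rule of thumb).
    """
    peak_reads = dispatch_width * sources_per_op
    return peak_reads * (1.0 - bypass_fraction)


# Sweep the 40-60% range quoted above for a 6-wide core.
for fraction in (0.4, 0.5, 0.6):
    reads = average_prf_reads(6, fraction)
    print(f"bypass {fraction:.0%}: ~{reads:.1f} reads/cycle")
```

For a 6-wide core the average lands between roughly 4.8 and 7.2 reads per cycle, centered on N = 6, which is why ~N ports covers the typical cycle.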
This is why "just make it wider" isn't free: doubling dispatch width more than doubles register file cost. It's also why Apple's M-series cores invest heavily in PRF design — their 8-wide decode demands extreme port engineering that most x86 designs avoid.
