Daily Hardware Architecture: Instruction Encoding and Decoding: How CPUs Read Their Own Language

2026-04-28

Before a CPU can execute anything, it must work out what the bytes in memory actually mean. That is the job of the instruction decoder, one of the components whose complexity differs most between RISC and CISC architectures.

Fixed-width vs. variable-width encoding. ARM (AArch64) and base RISC-V use fixed-width 32-bit instructions. x86 uses variable-width instructions ranging from 1 to 15 bytes. This single difference has enormous downstream consequences. With fixed-width encoding, the decoder knows exactly where each instruction boundary is: instruction N starts at address base + 4N. With variable-width encoding, you cannot locate instruction N without first determining the lengths of instructions 0 through N−1. This is the instruction boundary problem, and it is a major reason x86 decoders are so power-hungry.
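The boundary problem is easy to see in a few lines of Python. This is only a sketch: the "variable-width" format below is a made-up toy encoding in which the first byte of each instruction is its length, not real x86, and the names fixed_width_start, variable_width_start, and toy_length are hypothetical.

```python
def fixed_width_start(base: int, n: int, width: int = 4) -> int:
    """Address of instruction n when every instruction is `width` bytes."""
    return base + width * n


def toy_length(code: bytes, offset: int) -> int:
    """Toy encoding (NOT x86): the first byte of an instruction is its length."""
    return code[offset]


def variable_width_start(code: bytes, n: int, length_of=toy_length) -> int:
    """Offset of instruction n when lengths must be discovered one by one."""
    offset = 0
    for _ in range(n):
        # Each step depends on the previous one: no way to jump ahead.
        offset += length_of(code, offset)
    return offset


# Four toy instructions with lengths 1, 3, 2, and 4 bytes.
toy_code = bytes([1, 3, 0, 0, 2, 0, 4, 0, 0, 0])

print(hex(fixed_width_start(0x1000, 3)))   # one multiply-add, no memory reads
print(variable_width_start(toy_code, 3))   # must walk instructions 0, 1, 2 first
```

The fixed-width lookup is constant time, while the variable-width lookup is a serial chain of dependent steps, which is exactly what a wide parallel decoder has to work around.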

How x86 solves it. Modern x86 chips (since the Pentium Pro, 1995) don't execute x86 instructions directly. The decoder translates each x86 instruction into one or more micro-ops (μops): fixed-width, RISC-like internal operations. An Intel Golden Cove core has six decoders, but only one "complex" decoder can handle instructions that produce more than one μop; the rest are "simple" decoders limited to single-μop instructions. There is also a μop cache (Intel calls it the DSB, for Decoded Stream Buffer) that stores roughly 4,096 already-decoded μops, letting the CPU skip decoding entirely on hot loops. AMD's Zen 4 has a similar op cache of about 6.75K entries.
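To get a feel for why the one-complex-decoder restriction matters, here is a deliberately simplified toy model in Python. It assumes that only decoder slot 0 accepts multi-μop instructions and that a multi-μop instruction arriving at any other slot ends the current decode group. Real steering logic is more involved and not fully documented, so treat decode_cycles as a hypothetical illustration rather than a description of Golden Cove.

```python
def decode_cycles(uops_per_instruction, n_decoders: int = 6) -> int:
    """Cycles needed to decode a sequence under a toy 1-complex + N-simple model.

    uops_per_instruction: list of μop counts, one per instruction, in program order.
    Assumption: only slot 0 (the "complex" decoder) may take an instruction
    that produces more than one μop.
    """
    cycles = 0
    i = 0
    n = len(uops_per_instruction)
    while i < n:
        slot = 0
        while i < n and slot < n_decoders:
            if uops_per_instruction[i] > 1 and slot != 0:
                break  # must wait for the complex decoder in the next cycle
            i += 1
            slot += 1
        cycles += 1
    return cycles


print(decode_cycles([1] * 12))                # all simple instructions
print(decode_cycles([2] * 4))                 # all multi-μop instructions
print(decode_cycles([1, 1, 2, 1, 1, 1, 1]))   # a multi-μop instruction mid-stream
```

In this model a stream of single-μop instructions fills all six slots every cycle, while a stream of multi-μop instructions decodes only one per cycle.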

Why this matters for performance. If your hot loop's decoded μops fit in the μop cache, you bypass the decoder bottleneck entirely and the front-end can deliver 6+ μops/cycle. Once the loop spills out of the cache, you are limited by decoder throughput. A practical rule of thumb: on x86, keep hot loops under roughly 1,500 instructions. That is well below the nominal ~4K-μop capacity, because cache lines are rarely filled completely, so the usable capacity is smaller than the headline figure. On Intel you can check this with perf stat -e idq.dsb_uops,idq.mite_uops: if μops from MITE (the legacy decode path) dominate, your loop is too big.
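Once perf has printed the two counters, the comparison is simple arithmetic. A minimal sketch, assuming you copy the idq.dsb_uops and idq.mite_uops counts by hand into a hypothetical helper called dsb_share; the numbers in the usage line are placeholders, not measurements.

```python
def dsb_share(dsb_uops: int, mite_uops: int) -> float:
    """Fraction of front-end μops delivered from the μop cache (DSB).

    Only the two counters named in the text are considered; μops from
    other sources (for example the microcode sequencer) are ignored.
    """
    total = dsb_uops + mite_uops
    if total == 0:
        raise ValueError("both counters are zero; nothing was measured")
    return dsb_uops / total


# Placeholder counts for illustration only.
share = dsb_share(dsb_uops=900_000, mite_uops=100_000)
print(f"{share:.0%} of μops came from the μop cache")
```

A share close to 100% suggests the loop lives in the μop cache; a low share points at the legacy decode path.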

RISC-V's compressed extension (RVC) offers an interesting middle ground. It mixes 16-bit and 32-bit instructions, but the two lowest bits of each instruction tell the decoder which of the two lengths it has. The decoder reads just those two bits to find the boundary, which is vastly simpler than x86's prefix-opcode-modrm-sib-displacement-immediate parsing state machine. This gives RISC-V code that is roughly 25–30% smaller than pure 32-bit encoding while keeping decoder hardware simple.
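The length check can be sketched in Python as follows. The sketch covers only the 16-bit and 32-bit cases described above; the RISC-V encoding scheme also reserves patterns for instructions longer than 32 bits, which this code rejects. The function names rv_instruction_length and split_instructions are made up for the example, and the input is assumed not to end in the middle of an instruction.

```python
def rv_instruction_length(first_halfword: int) -> int:
    """Length in bytes of a RISC-V instruction, from its first 16 bits."""
    if first_halfword & 0b11 != 0b11:
        return 2  # low bits 00, 01, or 10: compressed 16-bit instruction
    if (first_halfword >> 2) & 0b111 != 0b111:
        return 4  # low bits 11: standard 32-bit instruction
    raise ValueError("encoding longer than 32 bits; not handled in this sketch")


def split_instructions(code: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for a little-endian RV instruction stream."""
    boundaries = []
    offset = 0
    while offset + 2 <= len(code):
        halfword = int.from_bytes(code[offset:offset + 2], "little")
        length = rv_instruction_length(halfword)
        boundaries.append((offset, length))
        offset += length
    return boundaries


# c.nop (0x0001), addi x0, x0, 0 (0x00000013), c.jr ra (0x8082)
stream = bytes.fromhex("0100" "13000000" "8280")
print(split_instructions(stream))
```

Finding a boundary still requires walking the stream, but each step inspects only a couple of bits instead of parsing prefixes and addressing modes.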

Real-world impact: Apple's M-series chips benefit here. ARM's fixed-width encoding lets Apple build extremely wide decoders (8-wide decode on M1) with modest power. Intel needs substantially more transistors and energy to achieve comparable decode width on x86, which is one reason x86 mobile chips historically trail ARM in perf-per-watt.

See it in action: Check out The Fetch-Execute Cycle: What's Your Computer Actually Doing? by Tom Scott to see this theory applied.

Key Takeaway: Instruction encoding complexity directly determines decoder width, power cost, and front-end throughput — variable-length x86 pays a permanent silicon tax that fixed-width architectures avoid.
