Daily Digital Circuits: Barrel Shifters: How Hardware Shifts by N Bits in One Cycle

Barrel Shifters: How Hardware Shifts by N Bits in One Cycle

2026-05-05

You already know shift registers — they move data one bit per clock edge. But when your CPU executes x << 7, it doesn't burn seven cycles. It does the entire shift combinationally in one cycle. The circuit that pulls this off is the barrel shifter.

The naive approach — a giant N-to-1 mux per output bit — works but explodes in area. For a 32-bit shifter, each of 32 output bits would need a 32-input mux. That's roughly 1024 mux inputs. Painful.

The clever approach is the logarithmic barrel shifter. You decompose the shift amount into powers of two and build a stage per bit of the shift amount. For a 32-bit shifter, the shift amount is 5 bits: shamt[4:0]. You build five stages:

Stage 0: shift by 0 or 1, controlled by shamt[0]
Stage 1: shift by 0 or 2, controlled by shamt[1]
Stage 2: shift by 0 or 4, controlled by shamt[2]
Stage 3: shift by 0 or 8, controlled by shamt[3]
Stage 4: shift by 0 or 16, controlled by shamt[4]

Each stage is just a row of 2:1 muxes — one per bit. A shift of 13 (01101) cascades through stages: shift 1, no shift, shift 4, shift 8 — total 13. Five mux delays end-to-end.

Real-world example: ARM's classic three-operand instruction ADD R0, R1, R2, LSL #3 adds R1 to (R2 shifted left by 3) in a single cycle. The barrel shifter sits in series with the ALU input. This is why ARM assembly feels so expressive — the shifter is free in the datapath. The same block also handles rotates by feeding the shifted-out bits back to the empty positions.

Rule of thumb for cost: An N-bit logarithmic barrel shifter uses N × log₂(N) 2:1 muxes. For 32 bits: 32 × 5 = 160 muxes — versus ~1024 for the brute-force version. The delay is log₂(N) mux delays, so a 64-bit shifter only adds one more stage (six total) over a 32-bit one.

Watch out for two design subtleties: arithmetic right shifts must replicate the sign bit into vacated positions (extra mux input per stage), and funnel shifters — used in unaligned memory access and rotates — concatenate two registers before shifting, doubling the input width but keeping the same stage count.

See it in action: Check out Performing a logical shift by IGCSE Computer Science to see this theory applied.

Key Takeaway: A barrel shifter performs any shift amount in one cycle by cascading log₂(N) stages of 2:1 muxes, each shifting by a successive power of two.

All newsletters