Daily Digital Circuits: Banyan and Butterfly Networks: How Hardware Routes Packets Through Log(N) Stages Instead of N²

2026-05-17

A crossbar switch routes N inputs to N outputs with N² crosspoints — elegant but expensive. At N=64 that's 4,096 switches; at N=1024 it's a million. Multistage interconnection networks (MINs) trade non-blocking guarantees for dramatically less hardware: log₂(N) stages of small 2×2 switches, totaling (N/2)·log₂(N) switching elements.
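To make the hardware comparison concrete, here is a small sketch of the two counting formulas from the paragraph above. The function names are illustrative, and N is assumed to be a power of two.

```python
def crossbar_crosspoints(n: int) -> int:
    """Crosspoints in an N x N crossbar: one per (input, output) pair."""
    return n * n


def banyan_switches(n: int) -> int:
    """2x2 switching elements in a banyan: N/2 per stage, log2(N) stages."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1  # exact log2 for a power of two
    return (n // 2) * stages


for n in (64, 1024):
    print(n, crossbar_crosspoints(n), banyan_switches(n))
# prints:
# 64 4096 192
# 1024 1048576 5120
```

Note that the two columns count different things: a crosspoint is a single on/off connection, while each banyan element is a complete 2×2 switch.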

The banyan network is the canonical example. Each stage contains N/2 2×2 switches (a "bar" state passes inputs straight through; a "cross" state swaps them). A packet carries a destination address, and at stage k the switch routes on bit k of that address, typically counting from the most significant bit: 0 → upper output, 1 → lower output. After log₂(N) stages, the packet has been steered to its destination by self-routing, with no central scheduler and no routing tables. The hardware is the algorithm.
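The self-routing rule can be sketched in a few lines. Banyan variants differ in how the stages are wired together, so this sketch assumes one specific choice: a baseline-style wiring in which each stage pairs adjacent lines and then splits traffic into an upper and a lower half-size sub-network. It also assumes address bits are consumed most significant first. The name `route` is illustrative.

```python
def route(src: int, dst: int, n_bits: int) -> list[int]:
    """Return the line a packet occupies after each stage (baseline wiring)."""
    pos = src
    path = []
    for k in range(n_bits):
        width = n_bits - k               # address bits not yet resolved
        bit = (dst >> (width - 1)) & 1   # routing bit for stage k, MSB first
        mask = (1 << width) - 1
        prefix = pos & ~mask             # sub-network already selected
        local = pos & mask               # line within that sub-network
        # The switch index is local >> 1. Output 0 (upper) feeds the upper
        # half-size sub-network, output 1 (lower) feeds the lower one.
        pos = prefix | (bit << (width - 1)) | (local >> 1)
        path.append(pos)
    return path


print(route(5, 2, n_bits=3))  # [2, 3, 2]
```

Each switch looks only at one bit of the packet's own address, and the last entry of the path always equals the destination, whatever the source port was.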

The butterfly network is topologically equivalent but wired in the recursive crisscross pattern of the FFT's dataflow diagram, which is where its name comes from (the "perfect shuffle" wiring belongs to the closely related omega network). Same switch count, same self-routing property, different physical layout; butterflies often map better to 2D silicon because the wire crossings are more uniform.

The catch: banyans are blocking. Two packets headed to different destinations can still collide on a shared internal link. A classic example: at N=8, with a wiring in which ports 0 and 1 share a first-stage switch and routing starts from the most significant address bit, a packet from port 0 to output 4 (binary 100) and a packet from port 1 to output 5 (binary 101) both need that switch's lower output, and therefore the same internal wire. One packet wins, the other stalls or drops. Crossbars never have this problem; that's what you're paying for.
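Internal blocking can be checked mechanically by tracing each packet and looking for lines claimed twice. The sketch below assumes a baseline-style wiring (adjacent lines share a switch, each stage splits traffic into upper and lower half-size sub-networks) with most-significant-bit-first routing; other banyan wirings block on different port pairs. `route` and `find_conflicts` are illustrative names.

```python
from collections import defaultdict


def route(src, dst, n_bits):
    """Lines occupied after each stage, assuming a baseline-style wiring."""
    pos, path = src, []
    for width in range(n_bits, 0, -1):
        bit = (dst >> (width - 1)) & 1  # MSB-first routing bit
        mask = (1 << width) - 1
        pos = (pos & ~mask) | (bit << (width - 1)) | ((pos & mask) >> 1)
        path.append(pos)
    return path


def find_conflicts(packets, n_bits):
    """Return {(stage, line): [sources]} for every line claimed more than once."""
    claims = defaultdict(list)
    for src, dst in packets:
        for stage, line in enumerate(route(src, dst, n_bits)):
            claims[(stage, line)].append(src)
    return {key: srcs for key, srcs in claims.items() if len(srcs) > 1}


print(find_conflicts([(0, 4), (1, 5)], n_bits=3))
# {(0, 4): [0, 1], (1, 4): [0, 1]}
print(find_conflicts([(0, 4), (1, 1)], n_bits=3))
# {}
```

Under these assumptions the two packets contend for line 4 after stage 0 and again after stage 1, even though their final destinations differ. Changing the second packet's destination to 1 sends it into the upper sub-network and the conflict disappears.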

Fixes: A Beneš network nearly doubles the stage count (2·log₂(N) − 1) and becomes rearrangeably non-blocking: any permutation is routable with the right switch settings, but those settings must be computed globally by a scheduler rather than by self-routing. A Clos network uses three stages of larger crossbars and is strictly non-blocking if the middle stage is wide enough (classically, at least 2n − 1 middle switches when each input-stage switch has n inputs). Modern datacenter spine-leaf fabrics are essentially folded Clos networks.
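The sizing rules behind these two fixes are simple enough to sketch. The symbols are assumptions introduced here for illustration: `n_in` is the number of inputs on each first-stage Clos switch and `m_mid` is the number of middle-stage switches. The thresholds used are the classical ones: m ≥ 2n − 1 for strict-sense non-blocking and m ≥ n for rearrangeable non-blocking.

```python
def benes_stages(n: int) -> int:
    """Stages in an N x N Benes network built from 2x2 switches."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    return 2 * (n.bit_length() - 1) - 1


def clos_is_strictly_nonblocking(n_in: int, m_mid: int) -> bool:
    """Classical condition: m >= 2n - 1 middle switches."""
    return m_mid >= 2 * n_in - 1


def clos_is_rearrangeable(n_in: int, m_mid: int) -> bool:
    """Weaker condition: m >= n, but existing paths may need re-routing."""
    return m_mid >= n_in


print(benes_stages(8))                      # 5
print(clos_is_strictly_nonblocking(4, 7))   # True
print(clos_is_strictly_nonblocking(4, 4))   # False
print(clos_is_rearrangeable(4, 4))          # True
```

A Clos fabric with m = n sits in the same position as a Beneš network: every permutation fits, but admitting a new connection may require moving existing ones.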

Real-world example: NVIDIA's NVSwitch-based systems connect 8 or more GPUs all-to-all through a layer of switch chips, each an internally non-blocking crossbar, rather than through one giant crossbar. Ethernet datacenter fabrics (built on Arista switches or Broadcom Tomahawk ASICs, for example) use leaf-spine Clos to scale to thousands of ports. Inside CPUs, the L3 cache interconnect on AMD's Infinity Fabric uses MIN-style routing rather than a full crossbar at high core counts.

Rule of thumb: Crossbars win below N≈16; MINs win above N≈64; in between, the answer depends on whether you can tolerate blocking. For N=1024, a banyan needs 5,120 2×2 switches vs. a crossbar's 1,048,576 crosspoints: roughly 200× fewer elements (closer to 50× if each 2×2 switch is counted as four crosspoints), paid for in occasional packet collisions.

Key Takeaway: Multistage networks replace a crossbar's N² crosspoints with (N/2)·log₂(N) self-routing switches, trading guaranteed non-blocking behavior for silicon savings that grow with N.
