FPGA Architecture and LUTs: How Programmable Hardware Actually Works

2026-04-29

You've now seen flip-flops, gates, counters, state machines, and memory. Every one of those lessons described circuits that get permanently etched into silicon in an ASIC. But what if you want to change your design after fabrication? That's what an FPGA (Field-Programmable Gate Array) gives you — reconfigurable hardware.

The core trick is the Look-Up Table (LUT). A LUT is just a small SRAM that implements any Boolean function by brute force. A 4-input LUT has 16 SRAM bits — one for every possible combination of four inputs. Instead of building AND, OR, XOR gates from transistors, you simply program the truth table into memory. Any 4-input function — all 65,536 possible ones — fits in the same hardware.
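
To make the idea concrete, here is a minimal Python sketch that models a k-input LUT as a list of 2^k stored bits addressed by its inputs. The Lut class and the function_count helper are illustrative names invented for this example, not any vendor's API, and the choice to treat the first input as the most significant address bit is an assumption.

```python
class Lut:
    """Toy model of a k-input LUT: 2**k stored bits addressed by the inputs."""

    def __init__(self, k, bits):
        if len(bits) != 2 ** k:
            raise ValueError(f"a {k}-input LUT needs {2 ** k} bits")
        self.k = k
        self.bits = list(bits)

    def read(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        # Pack the inputs into an address (assumed order: first input is the MSB).
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.bits[address]


def function_count(k):
    """Number of distinct Boolean functions of k inputs: 2**(2**k)."""
    return 2 ** (2 ** k)


# "Program" a 4-input LUT as a 4-way AND: only address 0b1111 holds a 1.
and4 = Lut(4, [0] * 15 + [1])
print(and4.read(1, 1, 1, 1))  # 1
print(and4.read(1, 0, 1, 1))  # 0
print(function_count(4))      # 65536
```

Reprogramming the same object with a different list of 16 bits gives a different function, which is the whole point: the hardware stays fixed and only the stored table changes.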

Consider implementing F = (A AND B) OR (C XOR D). In an ASIC, you'd wire up specific gates. In an FPGA, you compute all 16 output values, store them in a LUT, and the inputs serve as the address lines. Input ABCD = 0110? Read SRAM address 6, which holds a 1. Done. Any function that fits in the LUT's inputs evaluates in one LUT delay, no matter how complicated its Boolean expression looks.
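
The worked example above can be sketched in Python as follows. This is a rough illustration, not a vendor tool flow: it assumes A is the most significant address bit and D the least, and it packs the table into a 16-bit word with bit i holding the output for address i. Real tools have their own conventions for input ordering, so treat init_word as illustrative.

```python
from itertools import product


def f(a, b, c, d):
    """The example function: F = (A AND B) OR (C XOR D)."""
    return (a & b) | (c ^ d)


# Enumerate all 16 input combinations in address order (A is the MSB, D the LSB).
truth_table = [f(a, b, c, d) for a, b, c, d in product((0, 1), repeat=4)]

# Pack the table into one 16-bit word: bit i is the output stored at address i.
init_word = sum(bit << i for i, bit in enumerate(truth_table))


def lut_read(a, b, c, d):
    """Evaluate F by lookup instead of by logic."""
    address = (a << 3) | (b << 2) | (c << 1) | d
    return truth_table[address]


print(truth_table)            # [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1]
print(hex(init_word))         # 0xf666
print(lut_read(0, 1, 1, 0))   # 1 (address 6)
```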

Modern FPGAs (Xilinx 7-series, Intel Cyclone V) use 6-input LUTs with 64 SRAM bits each. These are grouped into slices or logic elements, which pair the LUTs with flip-flops, carry logic, and multiplexers.

Slices tile into a grid connected by a programmable routing fabric — a mesh of wire segments and switch matrices, also controlled by SRAM bits. This routing typically consumes 80-90% of the FPGA's area and dominates your timing. A function might evaluate in a LUT in 0.3 ns, but the signal reaching the next LUT through routing can take 2-5 ns.
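
A back-of-the-envelope sketch shows why routing dominates timing. It uses the illustrative figures from the paragraph above (0.3 ns per LUT, and 2 ns per routed hop as the low end of the quoted range) and assumes every LUT is followed by exactly one routed hop. It ignores clock-to-out, setup time, and clock skew, so the numbers are only rough; real values come from the vendor's timing report.

```python
# Illustrative figures from the text, not measured values.
LUT_DELAY_NS = 0.3
ROUTE_DELAY_NS = 2.0  # low end of the quoted 2-5 ns range


def path_delay_ns(lut_levels, lut_delay_ns=LUT_DELAY_NS, route_delay_ns=ROUTE_DELAY_NS):
    """Delay of a path with `lut_levels` LUTs in series, one routed hop after each."""
    return lut_levels * (lut_delay_ns + route_delay_ns)


def max_clock_mhz(delay_ns):
    """Highest clock frequency whose period still covers the path delay."""
    return 1000.0 / delay_ns


def routing_share(lut_delay_ns=LUT_DELAY_NS, route_delay_ns=ROUTE_DELAY_NS):
    """Fraction of each LUT-plus-hop delay that is spent in routing."""
    return route_delay_ns / (lut_delay_ns + route_delay_ns)


for levels in (1, 2, 4):
    delay = path_delay_ns(levels)
    print(f"{levels} LUT level(s): {delay:.1f} ns, about {max_clock_mhz(delay):.0f} MHz")
print(f"routing share of each hop: {routing_share():.0%}")
```

Under these assumptions, cutting the number of LUT levels between registers (pipelining) raises the achievable clock far more than a faster LUT would.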

Rule of thumb: on a modern mid-range FPGA, budget roughly one LUT per gate-equivalent of logic, but expect only 50-70% utilization in practice. A Xilinx Artix-7 XC7A100T is listed with 101,440 logic cells, which corresponds to 63,400 6-input LUTs. That sounds like a lot, but a soft-core RISC-V CPU consumes 2,000-5,000 LUTs, and peripheral logic adds up fast.
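
The budgeting arithmetic can be sketched as below. The device size (50,000 LUTs) and the 10,000 LUTs reserved for peripherals are hypothetical round numbers chosen for illustration, not figures for any real part; the utilization and per-core costs come from the ranges quoted above.

```python
def usable_luts(total_luts, utilization_pct):
    """LUTs you can realistically fill, using integer percent to avoid float rounding."""
    return total_luts * utilization_pct // 100


def cores_that_fit(total_luts, utilization_pct, luts_per_core, reserved_luts=0):
    """How many soft cores fit after setting aside LUTs for other logic."""
    budget = usable_luts(total_luts, utilization_pct) - reserved_luts
    return max(budget // luts_per_core, 0)


TOTAL_LUTS = 50_000        # hypothetical device size
RESERVED_LUTS = 10_000     # hypothetical allowance for peripherals and glue logic

print(usable_luts(TOTAL_LUTS, 60))                           # 30000
print(cores_that_fit(TOTAL_LUTS, 60, 5_000, RESERVED_LUTS))  # 4
print(cores_that_fit(TOTAL_LUTS, 60, 2_000, RESERVED_LUTS))  # 10
```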

Beyond LUTs, FPGAs embed hard blocks — dedicated silicon for functions too expensive to build from LUTs: block RAM (36 Kbit chunks), DSP slices (multiply-accumulate units), PLLs, and sometimes PCIe or Ethernet MACs. Using these instead of LUT-based equivalents saves massive resources and runs far faster.
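
To get a feel for the savings, the sketch below compares storing a buffer in 36 Kbit block RAMs against storing it in 6-input LUTs at 64 bits each. It is an idealized count under stated assumptions: it treats every bit of a block as usable data and ignores width and depth configurations, address decoding, and output multiplexing, all of which make the LUT-based version look even worse in practice. The 64 KiB buffer size is an arbitrary example.

```python
BRAM_BITS = 36 * 1024   # one 36 Kbit block RAM, as quoted in the text
LUT6_BITS = 64          # storage bits in one 6-input LUT


def ceil_div(a, b):
    """Integer ceiling division without floating point."""
    return -(-a // b)


def block_rams_needed(memory_bits):
    return ceil_div(memory_bits, BRAM_BITS)


def luts_needed(memory_bits):
    return ceil_div(memory_bits, LUT6_BITS)


buffer_bits = 64 * 1024 * 8   # a 64 KiB buffer (arbitrary example size)

print(block_rams_needed(buffer_bits))  # 15
print(luts_needed(buffer_bits))        # 8192
```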

Real-world example: High-frequency trading firms use FPGAs to parse Ethernet frames and make trading decisions in under 1 microsecond, a latency that software on a general-purpose CPU struggles to match. The entire network stack and decision logic is mapped into LUTs and hard blocks, running at wire speed with deterministic latency.

See it in action: Check out EEVblog #496 - What Is An FPGA? to see this theory applied.

Key Takeaway: An FPGA implements arbitrary logic by storing truth tables in small SRAMs called LUTs, making hardware as reprogrammable as software, but with nanosecond-level, deterministic execution that no processor can replicate.
