Daily Digital Circuits: Arithmetic Circuits: How Hardware Adds Numbers

2026-04-24

Every time your CPU executes a + b, it's not running an algorithm — it's propagating signals through a physical circuit. Understanding how hardware adds numbers reveals one of the deepest tensions in digital design: speed vs. area.

The Half Adder and Full Adder. A half adder takes two 1-bit inputs (A, B) and produces a Sum and Carry. It's just two gates: Sum = A XOR B, Carry = A AND B. But real addition needs to chain — bit 3 depends on the carry from bit 2. A full adder adds a third input, Carry-in, giving us: Sum = A XOR B XOR Cin, Cout = (A AND B) OR (Cin AND (A XOR B)). That's your fundamental building block.
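The two equations above can be checked in a few lines of Python. This is a minimal sketch, not a hardware description: the function names half_adder and full_adder are illustrative, and bits are modeled as the integers 0 and 1.

```python
from itertools import product

def half_adder(a, b):
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Return (sum, cout) for two 1-bit inputs plus a carry-in."""
    p = a ^ b  # A XOR B is shared by the sum and the carry logic
    return p ^ cin, (a & b) | (cin & p)

# Exhaustively compare all 8 input combinations with integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The loop covers the entire truth table, so if it finishes without an assertion error the gate-level equations agree with ordinary addition for every input.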

Ripple Carry Adder: Simple but Slow. Chain 32 full adders together and you get a 32-bit ripple carry adder. Each stage waits for the carry from the previous stage. If each full adder contributes ~2 gate delays to the carry path, a 32-bit add takes about 64 gate delays in the worst case (e.g., adding 1 to 0xFFFFFFFF, where the carry generated at bit 0 ripples through every higher bit). With a ~0.1 ns gate delay, that's 6.4 ns, more than six times the 1 ns clock period at 1 GHz. This circuit cannot complete an add in a single cycle at 1 GHz.
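To see where the 64 comes from, here is a small simulation sketch. It assumes the simplified timing model used in the text (2 gate delays per stage on the carry path) and adds one further assumption of its own: a stage has to wait for its carry-in only when it propagates (A XOR B = 1). The function name and the constant are illustrative.

```python
GATE_DELAYS_PER_STAGE = 2  # assumption: carry-path delay of one full adder

def ripple_carry_add(a, b, width=32):
    """Add two width-bit integers bit by bit.

    Returns (sum, carry_out, worst_delay), where worst_delay is the latest
    carry arrival time, in gate delays, under the simplified model.
    """
    carry, t_carry, worst, total = 0, 0, 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p, g = ai ^ bi, ai & bi
        total |= (p ^ carry) << i
        # A propagating stage waits for its carry-in; otherwise the
        # carry-out is decided locally by this stage's own inputs.
        t_carry = (t_carry if p else 0) + GATE_DELAYS_PER_STAGE
        worst = max(worst, t_carry)
        carry = g | (p & carry)
    return total, carry, worst

print(ripple_carry_add(0xFFFFFFFF, 1))  # worst case: carry ripples through all 32 stages
print(ripple_carry_add(1, 1))           # carry is generated and absorbed locally
```

With these assumptions the first call reports a worst-case delay of 64 gate delays, while the second reports 2, which is why the delay of a ripple adder depends on the operands and must be budgeted for the longest chain.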

Carry Lookahead: Trading Area for Speed. The insight is that you can precompute carries in parallel. For each bit position, define two signals: Generate (G = A AND B, this bit definitely produces a carry) and Propagate (P = A XOR B, this bit passes an incoming carry through). Then:

C1 = G0 OR (P0 AND C0)
C2 = G1 OR (P1 AND G0) OR (P1 AND P0 AND C0)
C3 = G2 OR (P2 AND G1) OR (P2 AND P1 AND G0) OR (P2 AND P1 AND P0 AND C0)

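Each carry is now a two-level AND-OR function of the G, P, and C0 signals, so it settles in about 3 gate delays (one to form G and P, two for the AND-OR) no matter which bit it feeds. That holds only while gates with enough inputs are available: the number of product terms and the fan-in of the widest gate both grow with bit position, so a flat lookahead becomes impractical beyond a few bits. A 4-bit CLA group is practical; you then cascade groups hierarchically. A 32-bit CLA adder completes in roughly 8-10 gate delays instead of 64.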
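The carry equations for one 4-bit group can be written out directly and checked against integer addition. This is a sketch under stated assumptions: the names cla4_carries and cla4_add are illustrative, and the C4 expression is not given above but follows the same pattern as C1 to C3.

```python
from itertools import product

def cla4_carries(a, b, c0):
    """Compute C1..C4 of a 4-bit group directly from G, P and C0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # Generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # Propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    # C4 extends the same pattern by one more term (not listed in the text).
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def cla4_add(a, b, c0=0):
    """Return (4-bit sum, carry_out) using the lookahead carries."""
    carries = [c0] + cla4_carries(a, b, c0)
    total = 0
    for i in range(4):
        p = ((a >> i) ^ (b >> i)) & 1
        total |= (p ^ carries[i]) << i  # Sum_i = P_i XOR C_i
    return total, carries[4]

# Exhaustive check over all 4-bit operands and both carry-in values.
for a, b, c0 in product(range(16), range(16), (0, 1)):
    total, cout = cla4_add(a, b, c0)
    assert (cout << 4) | total == a + b + c0
```

Note that no carry expression refers to another carry: each one depends only on G, P, and C0, which is what lets the hardware evaluate them in parallel. The growing length of each expression is the area cost.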

Rule of thumb: A ripple carry adder is O(N) delay and O(N) area. A hierarchical carry lookahead adder is O(log N) delay, paid for with extra lookahead logic: its area is commonly quoted as O(N log N), though the exact growth depends on how the groups are organized. In modern FPGAs, a fast carry chain is hardwired into the silicon: Xilinx 7-series slices, for example, have dedicated carry logic (the CARRY4 primitive) that gives you near-CLA speeds with ripple-style simplicity.

Real-world impact: In a pipelined CPU, the adder in the ALU must complete within a single clock cycle. Intel's Pentium 4 is commonly cited as using a Kogge-Stone-style adder (a parallel-prefix adder) to hit aggressive clock targets. When your compiler emits an ADD instruction, this is the kind of physical circuit that resolves it in under a nanosecond.
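For a feel of how a parallel-prefix adder gets to O(log N) depth, here is a bit-parallel sketch of the Kogge-Stone prefix scheme. It models the textbook structure only and makes no claim about any particular CPU's circuit; the function name, the fixed carry-in of 0, and the returned level count are assumptions made for illustration.

```python
def kogge_stone_add(a, b, width=32):
    """Add two width-bit integers with a Kogge-Stone prefix network.

    Returns (sum, carry_out, levels), where levels is the number of
    prefix stages, which is ceil(log2(width)). Carry-in is assumed 0.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p = a ^ b           # per-bit propagate
    g = a & b           # per-bit generate
    big_g, big_p = g, p
    dist, levels = 1, 0
    while dist < width:
        # Merge each position's (G, P) with the pair `dist` bits below it:
        # (G, P) o (G', P') = (G | (P & G'), P & P')
        big_g = (big_g | (big_p & (big_g << dist))) & mask
        big_p = (big_p & (big_p << dist)) & mask
        dist *= 2
        levels += 1
    carries = (big_g << 1) & mask   # carry into bit i = generate of bits 0..i-1
    total = p ^ carries
    carry_out = (big_g >> (width - 1)) & 1
    return total, carry_out, levels

print(kogge_stone_add(0xFFFFFFFF, 1))
```

Every pass of the loop doubles the span of bits each position has summarized, so a 32-bit add needs 5 prefix levels rather than 32 sequential carry steps. The price is that every level operates on all bit positions at once, which in hardware means many more gates and wires.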

See it in action: Check out "How does an arithmetic hardware work?" by MMLAB-HKU to see this theory applied.

Key Takeaway: Hardware addition is fundamentally limited by carry propagation — every major adder architecture is a different answer to the question "how do we compute carries faster?"
