Content-Addressable Memory (CAM): How Hardware Searches in One Cycle

2026-05-06

Regular RAM answers "what's at address 0x42?": you give it a location, it gives you data. Content-Addressable Memory (CAM) answers the inverse: "which address holds the value 0xDEADBEEF?" You give it data, and it tells you where (or whether) that data lives. The magic: it compares the key against every entry in parallel, in a single clock cycle, so adding entries costs area and power rather than extra search steps.

A CAM cell is essentially an SRAM cell with extra comparison logic. In the common NOR-style design, four additional transistors per bit effectively XOR the stored bit against a pair of complementary search lines. Each row's match line is precharged high before a search, and any mismatching bit pulls it low, so a row signals "hit" only if every single bit matches. The match lines from all rows feed a priority encoder, which returns the index of the matching entry (the highest-priority one, typically the lowest address, when several rows hit).
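To make the match-line and priority-encoder behavior concrete, here is a minimal Python model of a binary CAM search. It is a sketch, not a hardware description: the loop visits rows one at a time, whereas the silicon evaluates them all in the same cycle. The function names are hypothetical, and the tie-break (lowest index wins) is an assumed priority-encoder convention.

```python
from typing import List, Optional


def match_lines(entries: List[int], key: int, width: int) -> List[bool]:
    """Model each row's match line: precharged high (True), then pulled
    low (False) if any stored bit differs from its search-line bit."""
    mask = (1 << width) - 1
    lines = []
    for stored in entries:
        # XOR leaves a 1 at every mismatching bit position
        mismatches = (stored ^ key) & mask
        lines.append(mismatches == 0)
    return lines


def priority_encode(lines: List[bool]) -> Optional[int]:
    """Return the lowest-indexed asserted match line, or None on a miss."""
    for index, hit in enumerate(lines):
        if hit:
            return index
    return None


def cam_search(entries: List[int], key: int, width: int = 32) -> Optional[int]:
    """One CAM search: compare every row, then encode the winner."""
    return priority_encode(match_lines(entries, key, width))


table = [0x00000000, 0xDEADBEEF, 0xCAFEBABE, 0xDEADBEEF]
print(cam_search(table, 0xDEADBEEF))  # 1 (row 3 also matches; row 1 wins)
print(cam_search(table, 0x12345678))  # None (no match line stays high)
```

Row 3 also holds 0xDEADBEEF, but the priority encoder reports only one winner, so duplicate entries waste space rather than add information.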

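Two flavors matter in practice:

Binary CAM (BCAM): each bit stores 0 or 1, and a row hits only on an exact match of the whole key. This suits fixed-length keys such as MAC addresses.

Ternary CAM (TCAM): each bit stores 0, 1, or X ("don't care"), so one row can match a whole family of keys. This suits IP prefixes, where only the leading bits matter.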

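Real-world example: Your home router's switch chip has a MAC address table. When a frame arrives with destination MAC aa:bb:cc:dd:ee:ff, the switch must figure out which port to forward it to within nanoseconds, and a BCAM with 8K entries does this lookup in one cycle. For Layer 3, a Cisco Catalyst's TCAM holds routes like 10.0.0.0/8, stored as the 8-bit pattern 00001010 (decimal 10) followed by 24 don't-care bits. The budget is tight: at 40 Gbps, minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap arrive at about 59.5 million per second, leaving roughly 17 ns per lookup. A tree walk with several dependent memory accesses per lookup is hard to fit into that window, while a TCAM resolves the longest matching prefix in a single search, which is why it is the classic choice for wire-speed routing.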
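The 10.0.0.0/8 example maps naturally onto a value/mask pair: mask bits set to 1 must match, and mask bits set to 0 are the don't-cares. The sketch below uses Python's standard ipaddress module to model a small TCAM route table. One assumption to flag: it gets longest-prefix match by sorting entries longest prefix first, so the first hit (the priority encoder's winner) is the most specific route. That ordering is a common way to use a first-match TCAM for routing, but it is not a description of how any particular Catalyst manages its TCAM; the function names and next-hop labels are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

# One TCAM row: (value, care_mask, next_hop). A 0 bit in care_mask is a
# don't-care: that bit position matches any search-key bit.
Entry = Tuple[int, int, str]


def build_tcam(routes: List[Tuple[str, str]]) -> List[Entry]:
    """Load routes longest prefix first, so the lowest-index hit
    is also the longest matching prefix."""
    nets = sorted(
        ((ipaddress.ip_network(cidr), hop) for cidr, hop in routes),
        key=lambda item: item[0].prefixlen,
        reverse=True,
    )
    return [(int(net.network_address), int(net.netmask), hop) for net, hop in nets]


def tcam_lookup(tcam: List[Entry], addr: str) -> Optional[str]:
    """Return the next hop of the first (highest-priority) matching row."""
    key = int(ipaddress.ip_address(addr))
    for value, care, next_hop in tcam:  # hardware compares all rows at once
        if (key & care) == (value & care):
            return next_hop
    return None


routes = [("0.0.0.0/0", "default"), ("10.0.0.0/8", "port1"), ("10.1.0.0/16", "port2")]
tcam = build_tcam(routes)
print(tcam_lookup(tcam, "10.1.2.3"))     # port2 (the /16 beats the /8)
print(tcam_lookup(tcam, "10.9.9.9"))     # port1
print(tcam_lookup(tcam, "192.168.1.1"))  # default (the /0 row is all don't-cares)
```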

The cost: CAM cells are big and power-hungry. A standard SRAM cell uses 6 transistors (6T); a TCAM cell uses 16 (two 6T SRAM cells to encode 0, 1, or X, plus four comparison transistors). Every search drives every search line and evaluates every match line simultaneously, so search power scales linearly with table size.
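A quick back-of-the-envelope count shows how the per-bit cost adds up for the 8K-entry MAC table from the switch example, assuming 48-bit keys. The 6T and 16T figures come from the text above; the 10T binary CAM cell is an assumption (a common NOR-style design with 6 storage plus 4 comparison transistors). The count covers the cell array only and ignores sense amplifiers, drivers, and the priority encoder; the helper name is hypothetical.

```python
# Transistors per stored bit. SRAM (6T) and TCAM (16T) follow the text;
# the 10T binary CAM cell is an assumed NOR-style design.
TRANSISTORS_PER_BIT = {"sram": 6, "bcam": 10, "tcam": 16}


def array_transistors(kind: str, entries: int, width_bits: int) -> int:
    """Transistor count of the cell array alone (no peripheral logic)."""
    return TRANSISTORS_PER_BIT[kind] * entries * width_bits


for kind in ("sram", "bcam", "tcam"):
    # 8K entries x 48-bit MAC addresses
    print(f"{kind}: {array_transistors(kind, 8192, 48):,}")
# sram: 2,359,296
# bcam: 3,932,160
# tcam: 6,291,456
```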

Rule of thumb for power: a TCAM lookup burns roughly 1–3 nJ per search per Mbit of table. So a 40-Mbit core-router TCAM doing 100M searches/sec dissipates 4–12 watts just for lookups. That is why TCAM is a significant line in a high-end router's power and cooling budget, and why CPU designers use set-associative structures rather than CAM for the larger TLB levels beyond L1.
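The power rule of thumb is just energy per search times search rate. The hypothetical helper below reproduces the 4–12 W figure; note that 1 nJ per search per Mbit is the same as 1 fJ per bit per search, the unit CAM papers usually quote.

```python
def tcam_search_power_watts(nj_per_search_per_mbit: float,
                            table_mbit: float,
                            searches_per_sec: float) -> float:
    """Lookup power = energy per search x search rate, where
    energy per search = (nJ per search per Mbit) x table size in Mbit."""
    joules_per_search = nj_per_search_per_mbit * 1e-9 * table_mbit
    return joules_per_search * searches_per_sec


low = tcam_search_power_watts(1.0, 40, 100e6)
high = tcam_search_power_watts(3.0, 40, 100e6)
print(f"{low:.1f} W to {high:.1f} W")  # 4.0 W to 12.0 W
```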

Where you'll meet CAM as a software engineer: CPU TLBs (fully associative L1 TLBs, where a design uses them, are tiny CAMs of roughly 64 entries or fewer), tag arrays in fully associative caches, and any time you read "wire-speed lookup" in a networking datasheet.
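A fully associative TLB is the CAM-plus-RAM pattern in miniature: the virtual page number is the search key, every entry compares its tag at once, and the winning match line selects the stored physical frame number. The Python sketch below assumes 4 KiB pages and omits what real TLBs also track (address-space IDs, permission bits, multiple page sizes); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class TlbEntry:
    valid: bool
    vpn: int  # virtual page number: the CAM tag compared on every lookup
    pfn: int  # physical frame number: the RAM payload read out on a hit


def translate(tlb: List[TlbEntry], vaddr: int) -> Optional[int]:
    """Return the physical address on a TLB hit, or None on a miss."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    # Fully associative: no index bits, so every valid entry is a candidate.
    # Hardware compares all tags in parallel; this loop only emulates that.
    for entry in tlb:
        if entry.valid and entry.vpn == vpn:
            return (entry.pfn << PAGE_SHIFT) | offset
    return None  # miss: a hardware walker or the OS consults the page table


tlb = [TlbEntry(True, 0x12345, 0xABC), TlbEntry(False, 0x00400, 0x001)]
print(hex(translate(tlb, 0x12345678)))  # 0xabc678
print(translate(tlb, 0x00400123))       # None (matching entry is invalid)
```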

See it in action: Check out RAM Vs. CAM by Nice Institutes to see this theory applied.

Key Takeaway: CAM trades transistor count and power for O(1) lookup latency. When you absolutely must find something in one cycle, you pay extra transistors on every bit (16 per bit for a TCAM) to make every row compare itself in parallel.
