Hardware Transactional Memory: When CPUs Pretend Multiple Things Happened at Once

2026-05-09

Hardware Transactional Memory (HTM) lets a thread mark a region of code as a transaction. The CPU executes it speculatively, tracking every cache line the region reads or writes. If no other core touches those lines in a conflicting way before commit, all of the transaction's writes become visible atomically. If another core does conflict, the CPU discards the speculative changes and control transfers to a software fallback path. It's optimistic concurrency baked into silicon.
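To make the execute, track, commit-or-abort cycle concrete, here is a toy software model in Python. It is only a sketch: the class ToyHTM and the helper increment are made-up names, per-line version counters stand in for cache-coherence tracking, and conflicts are checked lazily at commit, whereas real hardware typically aborts as soon as the conflicting access arrives.

```python
class ToyHTM:
    """Toy model of HTM. Version counters stand in for coherence tracking."""

    def __init__(self):
        self.data = {}      # line address -> committed value
        self.version = {}   # line address -> number of committed writes
        self.aborts = 0

    def begin(self):
        # "seen" records the version of every line in the read or write set.
        return {"seen": {}, "writes": {}}

    def read(self, tx, line):
        tx["seen"].setdefault(line, self.version.get(line, 0))
        if line in tx["writes"]:
            return tx["writes"][line]   # read our own speculative write
        return self.data.get(line, 0)

    def write(self, tx, line, value):
        tx["seen"].setdefault(line, self.version.get(line, 0))
        tx["writes"][line] = value      # buffered, not yet visible

    def commit(self, tx):
        # Abort if any tracked line was committed to by someone else.
        for line, seen in tx["seen"].items():
            if self.version.get(line, 0) != seen:
                self.aborts += 1
                return False            # buffered writes are simply dropped
        for line, value in tx["writes"].items():
            self.data[line] = value
            self.version[line] = self.version.get(line, 0) + 1
        return True


def increment(htm, line, interfere=None):
    """Add 1 to a line transactionally, using a plain update if it aborts."""
    tx = htm.begin()
    value = htm.read(tx, line)
    if interfere is not None:
        interfere()                     # simulate another core mid-transaction
    htm.write(tx, line, value + 1)
    if htm.commit(tx):
        return "committed"
    # Software fallback path (a real one would take a lock here).
    htm.data[line] = htm.data.get(line, 0) + 1
    htm.version[line] = htm.version.get(line, 0) + 1
    return "fallback"
```

Calling increment(htm, 0x40) on its own commits. Passing an interfere callback that increments the same line forces the abort and the fallback, while interference on a different line leaves the transaction untouched.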

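How the hardware actually does it: HTM piggybacks on the existing cache-coherence protocol to detect conflicts and holds speculative state in the L1 cache, which is what bounds transaction size.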

The two flavors Intel shipped (TSX): Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM).

Concrete example: A hash table with a single lock. Under HLE, ten threads inserting into different buckets all elide the lock and run in parallel, with no aborts as long as their read and write sets touch different cache lines. The moment two threads hit the same bucket, one aborts and re-executes while actually holding the lock. Glibc's optional lock elision for pthread_mutex on Haswell-era CPUs targeted exactly this pattern, although it was built on RTM rather than the HLE prefixes.
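A quick way to reason about which inserts would collide is to map each bucket to the cache line it lives on. The sketch below assumes 64-byte cache lines and a contiguous, line-aligned bucket array. The function names and the bucket sizes are illustrative, not taken from any real hash table.

```python
from itertools import combinations

CACHE_LINE_BYTES = 64  # assumption: typical x86 line size


def cache_line_of(bucket_index, bucket_bytes):
    """Index of the cache line holding the start of a bucket."""
    return (bucket_index * bucket_bytes) // CACHE_LINE_BYTES


def conflicting_pairs(buckets_by_thread, bucket_bytes):
    """Return thread pairs whose inserts would write the same cache line."""
    pairs = []
    items = sorted(buckets_by_thread.items())
    for (t1, b1), (t2, b2) in combinations(items, 2):
        if cache_line_of(b1, bucket_bytes) == cache_line_of(b2, bucket_bytes):
            pairs.append((t1, t2))
    return pairs
```

With buckets padded to 64 bytes, ten threads on ten different buckets give no conflicting pairs. With 8-byte bucket heads, buckets 0 and 7 share a line, so two threads can abort each other even though they are inserting into different buckets.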

Rule of thumb: Keep transactions under ~8KB of touched data and under ~10,000 cycles. Beyond that, abort rates from cache evictions and timer interrupts make the fallback path dominate, and you're slower than just taking the lock.
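The arithmetic behind that budget is easy to check. The helper below is a hypothetical sketch: the thresholds come straight from the rule of thumb above, while the 64-byte line size and the 3 GHz default clock are assumptions.

```python
CACHE_LINE_BYTES = 64      # assumption: typical x86 line size
MAX_BYTES = 8 * 1024       # rule-of-thumb data budget
MAX_CYCLES = 10_000        # rule-of-thumb cycle budget


def transaction_budget(bytes_touched, est_cycles, clock_ghz=3.0):
    """Summarize a transaction's footprint against the rule of thumb."""
    lines = -(-bytes_touched // CACHE_LINE_BYTES)  # ceiling division
    return {
        "cache_lines": lines,
        # cycles / (cycles per microsecond)
        "duration_us": est_cycles / (clock_ghz * 1000),
        "within_budget": bytes_touched <= MAX_BYTES
        and est_cycles <= MAX_CYCLES,
    }
```

At the limit, 8 KB is 128 cache lines and 10,000 cycles is roughly 3.3 microseconds at 3 GHz, so the rule effectively says a transaction should be a short burst of work over a small slice of L1.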

The cautionary tale: TSX was the substrate for the TAA (TSX Asynchronous Abort) side channel, disclosed in 2019, which leaked data across security boundaries during aborts. Starting in 2021, Intel microcode updates disabled TSX by default on many of its CPUs. IBM's z/Architecture still ships HTM; POWER offered it in POWER8 and POWER9 but dropped it in POWER10. The idea isn't dead, but x86's commercial run was cut short by its own speculation leaks.

Key Takeaway: HTM piggybacks on cache coherence to make optimistic critical sections nearly free under low contention — but transactions are best-effort, bounded by L1, and on x86 became collateral damage of the speculative-execution security era.
