SMT Resource Partitioning: How Hyperthreading Splits a Core Between Two Programs

2026-06-04

Simultaneous Multithreading (Intel brands it Hyper-Threading) lets one physical core present itself as two logical cores. The trick: a single thread rarely keeps all of a core's execution resources busy, so a second thread can use the slack. But "sharing" hides a brutal question: which structures get split, and how?

There are three partitioning strategies, and every structure in the core picks one:

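The front-end alternates fetch between threads cycle-by-cycle (or uses a policy such as ICOUNT, which picks whichever thread has fewer in-flight instructions in the front-end and issue queues). When both logical cores are active, each thread sees less fetch bandwidth, and competitively-shared caches get split between two working sets. This is why a single-threaded benchmark can run slightly slower on an SMT-enabled core than with SMT disabled: anything the OS schedules on the sibling thread, even a short background task, takes fetch slots and cache lines away from it. When the sibling is truly halted, most designs return the fetch slots and partitioned structures to the active thread, so the penalty comes from activity on the sibling rather than from SMT being switched on.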
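The two fetch policies described above can be compared with a toy model. This is a minimal sketch, not a description of any real front-end: the `ThreadState` class, the `simulate` function, the fetch width, and the per-thread drain rates are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int
    in_flight: int = 0  # toy counter: instructions fetched but not yet issued


def pick_round_robin(cycle: int, threads: list[ThreadState]) -> ThreadState:
    """Strict alternation: thread 0 on even cycles, thread 1 on odd cycles."""
    return threads[cycle % len(threads)]


def pick_icount(threads: list[ThreadState]) -> ThreadState:
    """ICOUNT-style choice: fetch for the thread with the fewest in-flight
    instructions (ties go to the lower thread id)."""
    return min(threads, key=lambda t: (t.in_flight, t.tid))


def simulate(policy: str, cycles: int, fetch_width: int,
             drain: dict[int, int]) -> dict[int, int]:
    """Return how many fetch cycles each thread was granted.

    drain[tid] is a hypothetical number of instructions that thread issues
    per cycle; a thread that stalls often has a low drain rate.
    """
    threads = [ThreadState(0), ThreadState(1)]
    granted = {0: 0, 1: 0}
    for cycle in range(cycles):
        if policy == "round_robin":
            chosen = pick_round_robin(cycle, threads)
        else:
            chosen = pick_icount(threads)
        chosen.in_flight += fetch_width
        granted[chosen.tid] += 1
        # Every thread issues up to its drain rate each cycle.
        for t in threads:
            t.in_flight = max(0, t.in_flight - drain[t.tid])
    return granted


# Thread 0 issues 3 instructions per cycle, thread 1 only 1 (assumed values).
rr = simulate("round_robin", cycles=100, fetch_width=4, drain={0: 3, 1: 1})
ic = simulate("icount", cycles=100, fetch_width=4, drain={0: 3, 1: 1})
print("round robin:", rr)
print("icount:     ", ic)
```

In this model round-robin gives both threads the same number of fetch cycles regardless of how fast they drain, while the ICOUNT-style policy steers most fetch cycles toward the thread that is actually making progress.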

Concrete example: Run two memory-bound threads on one SMT core with a 32 KB L1 data cache. Each thread is written as if it owns the whole L1, but the two share those 32 KB, so the effective per-thread capacity is roughly 16 KB. If each working set exceeds 16 KB but would fit in 32 KB on its own, the threads evict each other's lines, and SMT can make both slower than running them one after the other. This is one reason HPC sites often disable SMT: their codes are tuned to use the whole cache.
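The cache arithmetic above can be checked with a small simulation. This is a sketch under loud assumptions: the cache is modeled as fully associative with LRU replacement and 64-byte lines (real L1 caches are set-associative and their replacement policy is usually only approximately LRU), and each thread sweeps sequentially over a 24 KB working set. All function names are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 64             # assumed cache-line size
CACHE_BYTES = 32 * 1024     # the 32 KB shared L1 from the example


def hit_rate(accesses, capacity_lines):
    """Run a stream of line addresses through a fully associative LRU cache
    and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for line in accesses:
        total += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / total


def sweep(thread_id, working_set_bytes, passes):
    """Yield line addresses for repeated sequential sweeps over one
    thread's private working set."""
    lines = working_set_bytes // LINE_BYTES
    for _ in range(passes):
        for i in range(lines):
            yield (thread_id, i)


def interleave(a, b):
    """Alternate accesses from two threads, one access each per step."""
    for x, y in zip(a, b):
        yield x
        yield y


capacity = CACHE_BYTES // LINE_BYTES
ws = 24 * 1024  # larger than 16 KB, smaller than 32 KB

alone = hit_rate(sweep(0, ws, 10), capacity)
shared = hit_rate(interleave(sweep(0, ws, 10), sweep(1, ws, 10)), capacity)
print(f"one thread alone:   hit rate {alone:.2f}")
print(f"two threads sharing: hit rate {shared:.2f}")
```

Under these assumptions a thread running alone hits on every pass after the first, while the interleaved pair cycles through more lines than the cache holds and LRU evicts each line just before it is reused. Real workloads land somewhere between those extremes.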

Rule of thumb: SMT gives a 15–30% throughput boost when threads are diverse (one memory-bound, one compute-bound) and don't fight for the same execution ports. The gain drops to zero or goes negative when the threads are identical and already saturate one resource: two AVX-512 threads share the core's one or two 512-bit FMA units and contend for them every cycle.

The killer detail: a thread that misses in L3 holds its ROB slots for roughly 200 cycles while it waits on DRAM. Without SMT, the core stalls once the ROB fills up behind the miss. With SMT, the other thread keeps the back-end busy. SMT's real win isn't parallelism; it's latency hiding.
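A back-of-the-envelope model makes the latency-hiding argument concrete. This is a deliberately crude sketch: it assumes each thread alternates a fixed burst of useful work with a fixed-length miss stall, and that a stalled thread's cycles can be fully used by any other ready thread. It ignores the contention costs discussed earlier, and the function name and the 100-cycle work burst are hypothetical.

```python
def utilization(work_cycles: int, miss_cycles: int, threads: int) -> float:
    """Fraction of cycles the back-end does useful work in a toy stall model.

    Each thread runs `work_cycles` of useful execution, then stalls for
    `miss_cycles`. Stalls of one thread are assumed to overlap perfectly
    with the work of the others, so utilization scales with the thread
    count until it saturates at 1.0.
    """
    per_thread = work_cycles / (work_cycles + miss_cycles)
    return min(1.0, threads * per_thread)


# 100 cycles of work between misses (assumed), 200-cycle miss as in the text.
for n in (1, 2):
    print(f"{n} thread(s): {utilization(100, 200, n):.0%} busy")
```

With these numbers a single thread keeps the back-end busy a third of the time, and a second thread doubles that. The same model also shows the limit: once a thread already keeps the core busy more than half the time, a second thread cannot double throughput, because utilization saturates.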

See it in action: Check out "[2024] CPU Cores & Threads Explained in 6 Minutes" by Indigo Software to see this theory applied.

Key Takeaway: SMT is a bet that two threads' stalls will overlap with each other's work — when their resource demands collide instead, you pay the partitioning cost without gaining the latency-hiding benefit.