2026-05-05
When a CPU executes out-of-order, it desperately wants to issue loads early — they're the longest-latency operations and they gate everything downstream. But there's a problem: an older store sitting in the store queue with an unresolved address might write to the same location the load wants to read. Wait for every prior store address to resolve and you stall constantly. Issue the load anyway and you risk reading stale data. Memory disambiguation is the hardware that lets the CPU bet on the answer.
The naive approach — conservative disambiguation — blocks any load until all prior store addresses are known. Early out-of-order CPUs (P6, original Athlon) did this and paid for it. Modern designs use speculative disambiguation: predict that the load doesn't collide, issue it, and verify later.
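To make the trade-off concrete, here is a rough timing sketch of the two policies. It is a toy model, not a description of any real pipeline: the `Store` record, the exact-address match, and the flat `replay_penalty` of 20 cycles are all assumptions picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Store:
    addr_ready: int  # cycle at which the store's address becomes known
    addr: int        # the address it turns out to write


def conservative_issue(load_ready, older_stores):
    """Conservative policy: the load waits for every older store address."""
    return max([load_ready] + [s.addr_ready for s in older_stores])


def speculative_issue(load_ready, load_addr, older_stores, replay_penalty=20):
    """Speculative policy: the load issues as soon as its own address is ready.

    If an older store whose address was still unknown at that point turns out
    to write the same address, the load is replayed once that address is
    known, plus a flat penalty (an assumed number, not a measured one).
    """
    late_hits = [
        s.addr_ready
        for s in older_stores
        if s.addr_ready > load_ready and s.addr == load_addr
    ]
    if not late_hits:
        return load_ready
    return max(late_hits) + replay_penalty


stores = [Store(addr_ready=12, addr=0x1000)]
print(conservative_issue(2, stores))         # waits for the store address
print(speculative_issue(2, 0x2000, stores))  # no alias: issues immediately
print(speculative_issue(2, 0x1000, stores))  # alias: pays the replay penalty
```

The speculative policy wins whenever aliasing is rare, and loses badly when it is common, which is exactly why the bet needs a predictor behind it.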
The predictor. Intel's Memory Disambiguation Predictor (introduced with the Core microarchitecture, ~2006) is a small table indexed by load PC. Each entry tracks whether that load has historically aliased with prior stores. If the prediction says "safe," the load issues speculatively past stores with unknown addresses. When a store address finally resolves, hardware checks it against the younger loads that have already executed. If one of them actually collided, that load and every instruction after it are squashed and re-executed, and the predictor learns to hold that load back next time.
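A minimal sketch of such a table, assuming one saturating counter per entry. The class name, the 256-entry size, the counter limit of 15, and the reset-on-collision policy are hypothetical choices for illustration, not Intel's documented design.

```python
class DisambiguationPredictor:
    """Toy PC-indexed predictor: one saturating counter per table entry."""

    def __init__(self, entries=256, max_count=15):
        self.entries = entries
        self.max_count = max_count
        self.table = [0] * entries  # start pessimistic: nothing is "safe"

    def _index(self, load_pc):
        # Assumed indexing: low bits of the PC. Distinct loads can share an entry.
        return load_pc % self.entries

    def predict_safe(self, load_pc):
        # Only a fully saturated counter allows the load to bypass stores.
        return self.table[self._index(load_pc)] == self.max_count

    def update(self, load_pc, collided):
        i = self._index(load_pc)
        if collided:
            self.table[i] = 0  # one collision throws away all the trust
        else:
            self.table[i] = min(self.table[i] + 1, self.max_count)


predictor = DisambiguationPredictor()
pc = 0x401000
for _ in range(15):
    predictor.update(pc, collided=False)
print(predictor.predict_safe(pc))   # trusted after 15 clean executions
predictor.update(pc, collided=True)
print(predictor.predict_safe(pc))   # back to waiting after one collision
```

The asymmetry is deliberate in this sketch: a wrong "safe" guess costs a pipeline flush, while a wrong "unsafe" guess only costs a few cycles of waiting, so trust is slow to earn and quick to lose.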
Concrete example. Consider a memcpy loop where the store to [rdi+rcx] is followed in program order by the next iteration's load from [rsi+rcx]. The addresses come from different base registers and almost never alias, but the CPU can't prove that until an AGU (address generation unit) has computed the store's address. A predictor that learns "this load never collides" lets the load run on the order of 10 cycles earlier, which on a tight loop can be the difference between roughly 1 IPC and 3+ IPC.
Rule of thumb. Store-to-load forwarding works cleanly when the load is fully contained in a single prior store of equal or larger size and aligned the same way. Partial overlap (e.g., an 8-byte store followed by a misaligned 4-byte load that straddles the end of it) triggers a forwarding stall, typically 10–20 cycles on x86, because the load can't get all of its bytes from the store buffer and has to wait for the store to commit to the L1 cache before it can proceed. This is why perf counters like ld_blocks.store_forward matter when tuning hot paths.
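The containment rule is easy to check by hand. The helper below is a simplified sketch: `classify_forwarding` is a hypothetical name, and it tests byte-range containment only. Real cores add further restrictions (alignment, cache-line splits) that vary by microarchitecture.

```python
def classify_forwarding(store_addr, store_size, load_addr, load_size):
    """Classify a load against one older store by byte range (sizes in bytes)."""
    store_end = store_addr + store_size
    load_end = load_addr + load_size

    if load_end <= store_addr or store_end <= load_addr:
        return "no-overlap"  # disjoint ranges: the load reads from cache
    if store_addr <= load_addr and load_end <= store_end:
        return "forward"     # every load byte comes from this store
    return "stall"           # partial overlap: wait for the store to commit


print(classify_forwarding(0x100, 8, 0x104, 4))  # forward
print(classify_forwarding(0x100, 8, 0x106, 4))  # stall
print(classify_forwarding(0x100, 4, 0x100, 8))  # stall
print(classify_forwarding(0x100, 8, 0x108, 4))  # no-overlap
```

The third case is the one that tends to surprise people: writing two 4-byte halves and then reading them back as one 8-byte value is a partial overlap from the point of view of each store.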
The security angle. Memory disambiguation mispredictions were the basis of Spectre v4 (Speculative Store Bypass): a load speculates past a store that would have overwritten the secret, reads the stale value, and leaks it through a cache side channel before the squash kicks in.
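The sequence of events can be traced with a toy model. Everything in it is a stand-in: `run_gadget` is a hypothetical name, a dictionary plays the role of memory, a set plays the role of the cache footprint, and the store is written to memory as soon as its address resolves, which real hardware does not do. It shows the ordering of events only and is not an exploit.

```python
def run_gadget(predict_no_alias):
    """Toy trace of a load bypassing an older store to the same address."""
    memory = {0x40: "SECRET"}  # stale contents of the slot
    pending_store = {"addr": 0x40, "data": "public"}  # address not yet resolved
    cache_footprint = set()    # values whose dependent accesses touched the cache
    load_addr = 0x40

    if predict_no_alias:
        # Speculative phase: the load runs ahead of the store and sees stale data.
        transient = memory[load_addr]
        cache_footprint.add(transient)  # dependent access leaves a cache trace

    # The store address resolves and the collision is detected. The speculative
    # load is squashed: register state is rolled back, the cache is not.
    memory[pending_store["addr"]] = pending_store["data"]

    # Replay: the load now sees the store's data.
    architectural = memory[load_addr]
    cache_footprint.add(architectural)
    return architectural, cache_footprint


print(run_gadget(predict_no_alias=False))
print(run_gadget(predict_no_alias=True))
```

In both runs the architectural result is the store's data. The only difference is that the speculative run leaves a trace of the stale value in the footprint, and that trace is what the side channel reads.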
