Do any ARMv8 processors exhibit load buffering?

2026-05-15

Stack Overflow: View Question

Tags: arm, memory-model, armv8, relaxed-atomics

Score: 0 | Views: 74

The asker poses the classic load buffering (LB) litmus test: two threads each load one shared variable and then store 1 to the other, with both variables initially 0. Under sequential consistency, at least one thread's load must execute before the other thread's store, so reading tmp0 == tmp1 == 1 is impossible. The ARMv8 memory model architecturally permits this outcome (a store can be reordered ahead of an earlier load from a different address). The question is whether any real ARMv8 implementation actually exhibits LB in practice.
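The shape of the test can be sketched in C++ with relaxed atomics. This is a minimal illustration, not the asker's code: the names x, y, LbCounts, and run_lb_trials are hypothetical, and spawning a fresh pair of threads per iteration is an assumption made for simplicity. Because the compiler is free to reorder relaxed operations on different addresses, a both_one count from this harness would not by itself prove that the hardware reordered anything.

```cpp
#include <atomic>
#include <thread>

// Tally of the four possible (tmp0, tmp1) outcomes of the LB test.
struct LbCounts {
    long both_zero = 0;
    long only_tmp0 = 0;
    long only_tmp1 = 0;
    long both_one = 0;  // the load-buffering outcome
    long total() const { return both_zero + only_tmp0 + only_tmp1 + both_one; }
};

// Hypothetical harness: runs the LB shape `iterations` times and
// classifies each result. x and y are illustrative names.
LbCounts run_lb_trials(int iterations) {
    LbCounts counts;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        std::atomic<int> ready{0};
        int tmp0 = 0, tmp1 = 0;

        // Crude start barrier so both threads enter the test together.
        auto wait_for_peer = [&ready] {
            ready.fetch_add(1);
            while (ready.load() < 2) std::this_thread::yield();
        };

        std::thread t0([&] {
            wait_for_peer();
            tmp0 = x.load(std::memory_order_relaxed);  // load x ...
            y.store(1, std::memory_order_relaxed);     // ... then store y
        });
        std::thread t1([&] {
            wait_for_peer();
            tmp1 = y.load(std::memory_order_relaxed);  // load y ...
            x.store(1, std::memory_order_relaxed);     // ... then store x
        });
        t0.join();
        t1.join();

        if (tmp0 == 1 && tmp1 == 1) ++counts.both_one;
        else if (tmp0 == 1) ++counts.only_tmp0;
        else if (tmp1 == 1) ++counts.only_tmp1;
        else ++counts.both_zero;
    }
    return counts;
}
```

A caller would run many iterations and inspect both_one. Thread creation per iteration keeps the sketch short but makes tight interleavings rare, so a serious harness would reuse long-lived threads.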

Why this is interesting: ARMv8 tightened the architectural memory model relative to ARMv7. While loads can still be hoisted past earlier loads/stores in the abstract machine, the requirement to forbid "out-of-thin-air" values and the introduction of multi-copy atomicity (a write becomes visible to all other observers at the same time) constrain the implementation space significantly. Even where the architecture permits LB, vendors may not exploit that latitude.

A solution approach: The asker should look at empirical surveys of litmus-test results on real hardware rather than hope for a definitive vendor answer.

The empirical answer, last I saw it: plain LB (independent loads and stores, no dependencies) is occasionally observed on some out-of-order ARMv8 cores, while dependency-carrying variants (LB+data+data, LB+ctrl+ctrl) are not. The Apple M-series and large Cortex-X cores are the most likely candidates because their reorder windows are huge.

Gotchas: Don't conflate "ARMv8 forbids out-of-thin-air" with "LB is forbidden": OOTA refers to value fabrication through cyclic dependencies, not LB. Also, compiler reordering can produce LB even if the hardware wouldn't, so use inline assembly when running litmus tests. Finally, multi-copy atomicity (added in ARMv8) rules out IRIW once each reader's two loads are kept in order by a dependency or barrier; it does not rule out LB.
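One way to take the compiler out of the picture is to write each thread's load/store pair as inline assembly. The sketch below assumes GCC or Clang extended asm; load_then_store is a hypothetical helper name. On AArch64 it emits a plain ldr followed by a plain str with no barrier between them, which is the sequence the LB test needs. On other targets it falls back to relaxed builtins separated by a compiler-only barrier, purely so the helper still builds there. Using plain int locations across threads sits outside the C++ memory model, which is the usual trade-off when testing at the instruction level.

```cpp
// Hypothetical helper: load *from, then store 1 to *to, in program order
// as far as the compiler is concerned. Hardware may still reorder them.
inline int load_then_store(int* from, int* to) {
    int observed;
#if defined(__aarch64__)
    int one = 1;
    asm volatile(
        "ldr %w[obs], [%[from]]\n\t"   // plain load
        "str %w[one], [%[to]]\n\t"     // plain store, no barrier in between
        : [obs] "=&r"(observed)        // early-clobber: written before inputs are consumed
        : [from] "r"(from), [to] "r"(to), [one] "r"(one)
        : "memory");
#else
    // Fallback for non-AArch64 builds (assumes GCC/Clang builtins).
    observed = __atomic_load_n(from, __ATOMIC_RELAXED);
    asm volatile("" ::: "memory");     // compiler barrier only, no fence instruction
    __atomic_store_n(to, 1, __ATOMIC_RELAXED);
#endif
    return observed;
}
```

Each test thread would call this helper once per iteration with the two shared locations swapped, so the only remaining source of reordering is the core itself.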

The challenge: Bridging the gap between what the ARMv8 architecture permits and what real silicon actually exhibits requires empirical litmus testing across a fleet of microarchitectures — there's no clean "yes/no" answer in the spec.
