Two threads communicating across weak and strong memory models (non-cache-coherent PCIe), why a read barrier is needed?

2026-05-11

Stack Overflow

Tags: c, x86, volatile, memory-barriers, pci-e

Score: 10 | Views: 237

The asker has two threads coordinating through a shared memory-mapped flag: a RISC-V core embedded behind a PCIe link, and an x86-64 host. Each side writes 1 to signal readiness and spins reading the other's word. The puzzle is why the x86 side appears to need an explicit read barrier despite x86's famously strong TSO model, where loads are not reordered with other loads.
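Here is a minimal sketch of the x86 host side of that handshake. It assumes both flag words are already mapped into the process as 32-bit words; how the BAR gets mapped is not shown. `host_signal_and_wait` is a hypothetical name. The full fence sits between raising our own flag and polling the peer's, because that store-then-load ordering is exactly what TSO does not provide on its own. On non-x86 builds a C11 seq_cst fence stands in so the sketch still compiles.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h> /* _mm_mfence, _mm_pause */
#endif

/* Hypothetical helper: raise our ready flag, then spin until the peer raises its own. */
static void host_signal_and_wait(volatile uint32_t *my_flag,
                                 const volatile uint32_t *peer_flag)
{
    assert(my_flag != NULL && peer_flag != NULL);

    *my_flag = 1; /* may linger in the store buffer (or a WC buffer) */

#if defined(__x86_64__)
    /* Full fence: the store above becomes globally visible before the
       load below executes. TSO alone permits store->load reordering. */
    _mm_mfence();
#else
    /* Portable stand-in so the sketch builds on other targets. */
    atomic_thread_fence(memory_order_seq_cst);
#endif

    while (*peer_flag == 0) {
#if defined(__x86_64__)
        _mm_pause(); /* spin-wait hint only; not a barrier */
#endif
    }
}
```

When this runs against ordinary memory, as in a quick test, it only exercises the control flow. Whether the fence is sufficient on the real mapping still depends on that mapping's memory type.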

The interesting part is that the usual mental model ("x86 is strongly ordered, so I only need volatile for MMIO") quietly breaks down once you cross a PCIe boundary into a region that is not part of the coherence fabric. The CPU's memory-ordering guarantees describe how its own loads and stores are ordered as seen by other coherent agents. They say nothing about how a PCIe endpoint's writes travel through switches, posted-write buffers, and the root complex before they become visible to the host.

The direction toward a clean answer:

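  1. Check the MTRR/PAT memory type of the BAR mapping on the x86 side. UC (uncacheable) accesses are strongly ordered and never speculative. WC (write-combining) accesses are weakly ordered, and their stores can be buffered and combined. This dominates the discussion.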
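  2. On x86, put mfence (or a locked instruction) before the spin-loop load. TSO forbids load-load reordering, but it still allows a store to be reordered after a later load. The store can sit in the store buffer while the load executes, which is exactly the write-my-flag-then-poll-yours pattern. mfence drains the store buffer and the write-combining buffers, and it keeps the spin-load from being satisfied early with a stale value.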
  3. On RISC-V, the device side genuinely needs fence ow,ow (or similar) between the data writes and the flag write, because RVWMO will happily reorder them.
  4. Beware that volatile is a compiler contract only. It stops the compiler from eliding, merging, or reordering volatile accesses among themselves, but it emits no fence instruction, so neither side gets any cross-agent ordering from it.
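The RISC-V side of item 3 might look like the sketch below. It assumes the payload and flag are device-visible 32-bit words that the core writes in order. `device_publish` is a hypothetical name, and the fence is issued through GCC/Clang inline assembly. Off-target builds fall back to a C11 release fence purely so the sketch compiles.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device-side publish: write the payload, then raise the flag. */
static void device_publish(volatile uint32_t *payload, const uint32_t *src,
                           size_t n, volatile uint32_t *flag)
{
    assert(payload != NULL && flag != NULL && (src != NULL || n == 0));

    for (size_t i = 0; i < n; i++)
        payload[i] = src[i]; /* volatile: each store is emitted, in program order */

#if defined(__riscv)
    /* RVWMO may let the hardware make the flag store visible before the
       payload stores. This fence orders all earlier device-output (o) and
       memory (w) writes before all later ones. */
    __asm__ volatile("fence ow,ow" ::: "memory");
#else
    /* Stand-in so the sketch builds off-target; not a substitute for the fence. */
    atomic_thread_fence(memory_order_release);
#endif

    *flag = 1;
}
```

The fence operands follow the source's `fence ow,ow`. Whether `o`, `w`, or both are needed depends on whether the payload and flag addresses are classified as I/O or main memory on the device, which the question does not pin down.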

Gotcha worth flagging: even with correct barriers, a write from the device may sit in an intermediate switch's posted-write queue. The canonical "flush" trick is for the writer to read back from the same region after the flag write. The read is a non-posted transaction, and PCIe ordering rules do not let a read request pass an earlier posted write on the same path, so the read cannot complete until the prior posted write has drained.
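The read-back flush can be sketched as follows, assuming `flag` points into the same mapped region the writer just posted to. `write_flag_and_flush` is a hypothetical name. The returned value exists only so a caller can confirm that the read happened. Against plain RAM, as in a unit test, the read-back is just a load. The flushing effect exists only when the access actually crosses PCIe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: post a flag write, then read the same location back.
   A PCIe read request may not pass an earlier posted write on the same path,
   so by the time the read completes, the write has reached the target. */
static uint32_t write_flag_and_flush(volatile uint32_t *flag, uint32_t value)
{
    assert(flag != NULL);

    *flag = value; /* posted write: no completion is returned */
    return *flag;  /* non-posted read: stalls until its completion arrives */
}
```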

The challenge: x86's strong memory model is a per-core, cache-coherent guarantee — it doesn't extend across a PCIe boundary, so reasoning that works for SMP threads silently fails when one "thread" is actually a device behind a posted-write fabric.
