2026-06-05
You've seen MONITOR/MWAIT — the ring-0 instructions that let an idle core sleep until a cache line is written. But what if a user-space thread wants to wait for a memory location without burning a CPU with PAUSE loops? Before 2019, you couldn't: MWAIT faulted in ring 3. Intel's Tremont/Tiger Lake added UMONITOR, UMWAIT, and TPAUSE — the user-mode versions.
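How it works. UMONITOR rax arms an address-range monitor on the cache line containing [rax]. UMWAIT ecx then puts the logical core into a light sleep until one of three things happens: (1) a write touches the monitored line, (2) the TSC reaches the absolute deadline in edx:eax, or (3) an interrupt arrives. Wakeups can also be spurious, so always re-check the condition after UMWAIT returns. Bit 0 of the register operand (ecx here) picks the power-state hint: clear means C0.2, which the SDM describes as the slower-wakeup, larger-savings state and which lets the sibling hyperthread run faster; set means C0.1, the faster-wakeup, smaller-savings state. Exact wake latencies are model-specific, but both states are far shallower than the C-states the kernel's idle loop enters. TPAUSE is the same but without the monitor: just a timed nap.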
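The arm, check, wait sequence can be sketched in C with the compiler intrinsics. This is a minimal sketch, assuming GCC or Clang on x86-64; `wait_nonzero`, `umwait_once`, `has_waitpkg`, and the `WAIT_C01`/`WAIT_C02` names are hypothetical helpers invented for illustration, and the fallback to a PAUSE spin on CPUs without WAITPKG is a design choice of this sketch, not something the instructions require.

```c
#include <assert.h>
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* _umonitor, _umwait, __rdtsc, _mm_pause */

#define WAIT_C02 0u  /* control bit 0 clear: request C0.2 */
#define WAIT_C01 1u  /* control bit 0 set:   request C0.1 */

/* CPUID.(EAX=7,ECX=0):ECX bit 5 reports WAITPKG. */
static bool has_waitpkg(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ((ecx >> 5) & 1u) != 0;
}

/* One arm/check/wait round. The target attribute enables the WAITPKG
 * intrinsics for this function only, so it must be called only after
 * has_waitpkg() returned true. */
__attribute__((target("waitpkg")))
static void umwait_once(volatile uint32_t *flag, unsigned int hint,
                        uint64_t tsc_budget)
{
    _umonitor((void *)flag);   /* arm the monitor on flag's cache line */
    if (*flag == 0)            /* the write may have landed before arming */
        (void)_umwait(hint, __rdtsc() + tsc_budget); /* absolute TSC deadline */
}

/* Hypothetical helper: wait until *flag is nonzero and return its value.
 * The outer loop re-checks the flag after every wakeup, whatever caused it. */
uint32_t wait_nonzero(volatile uint32_t *flag, unsigned int hint,
                      uint64_t tsc_budget)
{
    assert(flag != NULL);
    bool use_umwait = has_waitpkg();
    uint32_t v;
    while ((v = *flag) == 0) {
        if (use_umwait)
            umwait_once(flag, hint, tsc_budget);
        else
            _mm_pause();       /* fallback: plain PAUSE spin */
    }
    return v;
}
```

A caller would use something like `wait_nonzero(&ready, WAIT_C01, 10000)`, where the last argument is a budget in TSC cycles per wait round rather than a total timeout.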
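The OS controls the ceiling. The IA32_UMWAIT_CONTROL MSR caps how long a single UMWAIT or TPAUSE can sleep. Linux defaults to 100,000 TSC cycles (about 50µs on a 2 GHz TSC) and exposes the value at /sys/devices/system/cpu/umwait_control/max_time. If the cap expires before your own deadline, the instruction returns early with CF=1. The same MSR lets the OS disallow C0.2, in which case C0.2 requests quietly fall back to C0.1. The cap keeps a thread from parking indefinitely on a core the scheduler wants back.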
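To make the cap concrete, here is a small sketch that decodes a raw IA32_UMWAIT_CONTROL value and converts the cap to wall-clock time. The field layout follows the SDM (bit 0 disallows C0.2, bit 1 is reserved, bits 31:2 hold the maximum time in TSC cycles with the low two bits treated as zero). The struct and function names are hypothetical, and the sketch only does the arithmetic: reading the MSR itself needs ring 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the low 32 bits of IA32_UMWAIT_CONTROL (MSR 0xE1). */
struct umwait_limits {
    bool     c02_allowed;     /* false: C0.2 requests fall back to C0.1 */
    uint32_t max_tsc_cycles;  /* 0 means "no cap" */
};

struct umwait_limits decode_umwait_control(uint32_t msr_low)
{
    struct umwait_limits l;
    l.c02_allowed    = (msr_low & 1u) == 0;  /* bit 0 set = C0.2 disallowed */
    l.max_tsc_cycles = msr_low & ~3u;        /* bits 31:2, low bits zero */
    return l;
}

/* Convert a TSC-cycle count to nanoseconds for a given TSC frequency.
 * Safe from overflow for any 32-bit cycle count. */
uint64_t tsc_cycles_to_ns(uint64_t cycles, uint64_t tsc_hz)
{
    assert(tsc_hz != 0);
    return cycles * 1000000000ull / tsc_hz;
}
```

With Linux's built-in default of 100,000 cycles, this gives 100µs at a 1 GHz TSC and 50µs at 2 GHz. In a wait loop, a return value of 1 from `_umwait` or `_tpause` (the CF flag) means this cap, not your own deadline, ended the wait.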
Real-world example: DPDK polling. A DPDK worker thread polls an RX ring descriptor for new packets. The classic loop is while (!desc->done) _mm_pause(); — which burns 100% CPU and shows as a fully loaded core in top. With UMWAIT:
1. umonitor on &desc->done
2. umwait with a 5µs deadline and C0.2 hint
Power draw drops 30-50% on idle cores, the sibling hyperthread regains ~15% throughput, and packet-arrival latency stays under 10µs. Intel's own measurements on Sapphire Rapids show C0.2 saves ~1.5W per core at idle versus a PAUSE spin.
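UMWAIT takes its deadline as an absolute TSC value, so a "5µs deadline" first has to be converted to TSC cycles. The sketch below derives the TSC frequency from CPUID leaf 15H (crystal frequency times numerator over denominator) and does the conversion. The function names are hypothetical, and leaf 15H reports zeros on some CPUs and in many VMs, in which case the frequency has to come from elsewhere (for example a one-time calibration).

```c
#include <assert.h>
#include <cpuid.h>   /* __get_cpuid_count (GCC/Clang) */
#include <stdint.h>

/* TSC frequency from the three CPUID.15H fields:
 * EAX = denominator, EBX = numerator, ECX = crystal clock in Hz.
 * Returns 0 if any field is not enumerated. */
uint64_t tsc_hz_from_leaf15(uint32_t denominator, uint32_t numerator,
                            uint32_t crystal_hz)
{
    if (denominator == 0 || numerator == 0 || crystal_hz == 0)
        return 0;
    return (uint64_t)crystal_hz * numerator / denominator;
}

/* Query the running CPU; 0 means "not available here, use another source". */
uint64_t query_tsc_hz(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(0x15, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return tsc_hz_from_leaf15(eax, ebx, ecx);
}

/* Convert microseconds to a TSC-cycle budget. */
uint64_t us_to_tsc_cycles(uint64_t us, uint64_t tsc_hz)
{
    assert(tsc_hz != 0);
    return us * tsc_hz / 1000000ull;
}
```

On a 3 GHz TSC, 5µs comes out as 15,000 cycles; the deadline passed to `_umwait` would then be `__rdtsc() + 15000`.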
Rule of thumb. If your spin-wait expects to wait longer than a cache miss but shorter than a syscall (roughly 100ns to 50µs), UMWAIT with C0.1 beats both PAUSE-spinning and futex(). Below 100ns, just PAUSE-spin — the C-state transition costs more than you save. Above 50µs, go to the kernel.
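Written as code, the rule of thumb is a three-way switch on the expected wait. This is only an illustrative sketch: the enum, the function name, and the treatment of the boundaries (100ns goes to UMWAIT, exactly 50µs still counts as UMWAIT) are assumptions, and the cutoffs are the rough figures above rather than measured values.

```c
#include <assert.h>
#include <stdint.h>

#define SPIN_CUTOFF_NS   100ull    /* below this, PAUSE-spin */
#define KERNEL_CUTOFF_NS 50000ull  /* above this, block in the kernel */

static_assert(SPIN_CUTOFF_NS < KERNEL_CUTOFF_NS,
              "thresholds must be ordered");

enum wait_strategy { WAIT_PAUSE_SPIN, WAIT_UMWAIT_C01, WAIT_KERNEL };

/* Pick a waiting strategy from the expected wait time in nanoseconds. */
enum wait_strategy pick_wait_strategy(uint64_t expected_wait_ns)
{
    if (expected_wait_ns < SPIN_CUTOFF_NS)
        return WAIT_PAUSE_SPIN;
    if (expected_wait_ns <= KERNEL_CUTOFF_NS)
        return WAIT_UMWAIT_C01;
    return WAIT_KERNEL;  /* e.g. futex() */
}
```

In practice the expected wait is rarely known up front, so a caller would more likely feed this from a running average of recent waits and re-tune the cutoffs per platform.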
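Check CPUID.(EAX=7,ECX=0):ECX[5] (WAITPKG) before using; the instructions raise #UD on CPUs that don't report it. AMD CPUs, Zen 5 included, do not appear to report WAITPKG; AMD's user-mode equivalent is the separate MONITORX/MWAITX pair, enumerated in CPUID 8000_0001h:ECX[29].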
