2026-05-28
When a device interrupts a guest VM, the classical path is brutal: the interrupt fires on a physical core and forces a VM exit, the hypervisor takes it, decides which guest it belongs to, and injects it on the next VM entry. That round trip out of guest mode and back costs 1,000–10,000 cycles. For a 100k IOPS NVMe device raising one interrupt per completion, that's 100 million to 1 billion cycles per second burned just shuffling interrupts.
Posted interrupts (Intel VT-d, ARM GICv4) eliminate this. The CPU and IOMMU collaborate to deliver an interrupt directly into a running guest without ever exiting to the hypervisor.
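The mechanism rests on two structures: the posted-interrupt descriptor, a 64-byte per-vCPU block in host memory that holds the 256-bit posted-interrupt request bitmap (PIR, one bit per vector), the outstanding-notification bit (ON), the suppress-notification bit (SN), and the notification vector and destination; and the IOMMU's interrupt-remapping table entry in posted format, which points at the target vCPU's descriptor and carries the virtual vector to post.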
When a device fires an MSI, the IOMMU intercepts it. If the target vCPU is currently running on a physical core, the IOMMU does an atomic OR into the PIR bitmap, sets ON if it was clear, and sends the notification vector to that exact physical core. The CPU, in guest mode, recognizes this special vector, clears ON, ORs PIR into the IRR of its virtual-APIC page while clearing PIR, and delivers the interrupt as if it came from a virtual LAPIC, with no VM exit.
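If the vCPU isn't running (descheduled), the hypervisor has marked notifications as suppressed, so the IOMMU still ORs into PIR but doesn't notify. When the scheduler later resumes the vCPU, possibly on a different core, the hypervisor repoints the notification destination, checks PIR for pending bits (ON may never have been set while notifications were suppressed), drains them, and the interrupt fires.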
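The two delivery paths above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not hypervisor or hardware code: the class and function names (PostedInterruptDescriptor, iommu_post, cpu_process_notification, hypervisor_resume) are hypothetical, the bitmaps are plain Python integers, and real hardware performs these updates atomically.

```python
from dataclasses import dataclass

VECTORS = 256  # x86 interrupt vectors 0..255


@dataclass
class PostedInterruptDescriptor:
    """Toy per-vCPU descriptor; field names are illustrative."""
    pir: int = 0       # pending bitmap, one bit per vector
    on: bool = False   # outstanding notification
    sn: bool = False   # suppress notification (set while descheduled)


@dataclass
class VCpu:
    pid: PostedInterruptDescriptor
    virr: int = 0      # virtual-APIC IRR, modelled as a 256-bit integer


def iommu_post(pid, vector):
    """IOMMU side: record the vector; return True if a notification is due."""
    if not 0 <= vector < VECTORS:
        raise ValueError("vector out of range")
    pid.pir |= 1 << vector          # an atomic OR in real hardware
    if pid.on or pid.sn:
        return False                # already notified, or suppressed
    pid.on = True
    return True


def cpu_process_notification(vcpu):
    """CPU side, in guest mode: clear ON, merge PIR into the IRR, clear PIR."""
    vcpu.pid.on = False
    vcpu.virr |= vcpu.pid.pir
    vcpu.pid.pir = 0


def hypervisor_resume(vcpu):
    """Scheduler side: lift suppression and pick up anything posted meanwhile."""
    vcpu.pid.sn = False
    if vcpu.pid.pir:
        vcpu.pid.on = True
        cpu_process_notification(vcpu)  # stands in for the drain on entry


def pending_vectors(bitmap):
    """List the vector numbers whose bits are set in a bitmap."""
    return [v for v in range(VECTORS) if (bitmap >> v) & 1]


# Running vCPU: the first post notifies, the second coalesces behind ON.
vcpu = VCpu(PostedInterruptDescriptor())
print(iommu_post(vcpu.pid, 0x41), iommu_post(vcpu.pid, 0x42))  # True False
cpu_process_notification(vcpu)
print(pending_vectors(vcpu.virr))  # [65, 66]
```

Note how ON coalesces notifications: a burst of device interrupts costs one notification IPI, not one per vector.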
Concrete win: SPDK with vfio-pci passthrough on a Xeon. Without posted interrupts, an NVMe drive doing 500k IOPS generates up to 500k VM exits/sec, one per completion interrupt. At ~3,000 cycles each, that's 1.5 billion cycles per second (about half of a 3 GHz core) lost to exits alone. With posted interrupts enabled, those exits disappear and the same workload completes without device interrupts forcing the guest out of non-root mode. Measured: ~40% lower CPU usage in the host, ~15% lower p99 guest latency.
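The arithmetic behind these figures is easy to rerun for other workloads. A minimal sketch, assuming one VM exit per interrupt and a fixed cost per exit (exit_overhead is a hypothetical helper name, and real exit costs vary with the hypervisor and CPU):

```python
def exit_overhead(interrupts_per_sec, cycles_per_exit, core_hz):
    """Return (cycles lost per second, fraction of one core) to interrupt exits.

    Assumes every interrupt causes exactly one VM exit of fixed cost.
    """
    cycles = interrupts_per_sec * cycles_per_exit
    return cycles, cycles / core_hz


# The 500k IOPS example: ~3,000 cycles per exit on a 3 GHz core.
cycles, fraction = exit_overhead(500_000, 3_000, 3_000_000_000)
print(f"{cycles:,} cycles/s = {fraction:.0%} of one core")
# prints: 1,500,000,000 cycles/s = 50% of one core

# The 100k IOPS device at both ends of the 1,000-10,000 cycle range.
low, _ = exit_overhead(100_000, 1_000, 3_000_000_000)
high, _ = exit_overhead(100_000, 10_000, 3_000_000_000)
print(f"{low:,} to {high:,} cycles/s")
# prints: 100,000,000 to 1,000,000,000 cycles/s
```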
Rule of thumb: any virtualized workload with >50k interrupts/sec per vCPU is likely exit-bound without posted interrupts. Check dmesg | grep "Posted-interrupts" on the host; verify with perf kvm stat, looking for the EXTERNAL_INTERRUPT exit count to drop sharply (host timer ticks and IPIs still cause some).
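As a supplementary signal, x86 Linux hosts with KVM list posted-interrupt counters in /proc/interrupts under the labels PIN (notification event), NPI (nested) and PIW (wakeup event). The sketch below sums them across CPUs; irq_totals is a hypothetical helper and the sample text is made up for illustration. These counters only cover notifications handled by the host, not interrupts delivered silently while the guest is running, so treat them as a hint that the machinery is active rather than as a delivery count.

```python
def irq_totals(proc_interrupts_text, labels=("PIN", "NPI", "PIW")):
    """Sum per-CPU counts for the given row labels of /proc/interrupts text."""
    totals = {}
    for line in proc_interrupts_text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        if not sep or name not in labels:
            continue
        counts = []
        for token in rest.split():
            if not token.isdigit():
                break               # the description text starts here
            counts.append(int(token))
        totals[name] = sum(counts)
    return totals


# Made-up sample in the layout of /proc/interrupts on a 2-CPU host.
SAMPLE = """\
           CPU0       CPU1
 PIN:         12          3   Posted-interrupt notification event
 NPI:          0          0   Nested posted-interrupt event
 PIW:        840        795   Posted-interrupt wakeup event
"""

print(irq_totals(SAMPLE))  # {'PIN': 15, 'NPI': 0, 'PIW': 1635}
```

On a real host, pass in the contents of /proc/interrupts, sample twice a few seconds apart, and compare the totals.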
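The catch: PIR updates are atomic but unordered with respect to other guest memory. The IOMMU also only fills PIR; the CPU side has to drain it. While the guest runs, the hardware does that when the notification vector arrives. At VM entry, the hypervisor either merges pending PIR bits into the virtual IRR itself or sends a self-IPI with the notification vector so the CPU processes it right after entering guest mode. Both depend on the CPU's virtual-APIC support, which is why posted interrupts require both VT-d and APICv (virtual APIC).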
