Daily Low-Level Programming: The IOMMU: Virtual Memory for Devices

2026-05-23

You already know the MMU translates virtual addresses to physical ones for your CPU. The IOMMU (Intel VT-d, AMD-Vi, ARM SMMU) does the same thing for devices. Without it, a PCIe device performing DMA puts raw physical addresses on the bus: whatever address the driver programmed into its descriptor ring is the address the device hits directly.

That's a problem for three reasons:

Security: A malicious or buggy device (or one whose firmware was compromised over Thunderbolt) can DMA-read your kernel's memory or overwrite credentials in RAM. On laptops without an IOMMU, an attacker with physical access could mount such "DMA attacks" over FireWire or Thunderbolt in seconds.
Isolation: Passing a GPU to a VM requires guaranteeing the guest's driver can't program the GPU to DMA into the host's memory.
Addressing: A 32-bit legacy device can only emit 32-bit addresses. Without translation, it can't reach buffers above 4 GiB, so the kernel has to copy the data through bounce buffers in low memory (SWIOTLB).

The IOMMU sits between the device and memory, walking page tables selected by the device's BDF (Bus:Device.Function) identifier, which travels with every PCIe request as the requester ID. Each device, or each IOMMU group of devices that share a translation context, gets its own page table. When the NIC issues a DMA write to "address 0x1000", the IOMMU translates that I/O virtual address (IOVA) to a real physical page, exactly like the MMU does for CPU loads and stores.
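To make the per-device lookup concrete, here is a toy model in Python. It is a sketch only: real IOMMUs walk multi-level tables in hardware, whereas this uses a flat dictionary per device, and the names ToyIommu, IommuFault, map_page and translate are invented for illustration. The 16-bit requester ID layout (8-bit bus, 5-bit device, 3-bit function) is the standard PCI encoding.

```python
PAGE_SHIFT = 12                       # assume 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class IommuFault(Exception):
    """Raised when a device touches an IOVA it has no mapping for."""


def requester_id(bus: int, device: int, function: int) -> int:
    """Pack a PCI Bus:Device.Function into its 16-bit requester ID."""
    assert 0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
    return (bus << 8) | (device << 3) | function


class ToyIommu:
    """Hypothetical model: one flat IOVA -> physical page table per device."""

    def __init__(self):
        # requester ID -> {IOVA page number -> physical page number}
        self.tables = {}

    def map_page(self, rid: int, iova: int, phys: int) -> None:
        self.tables.setdefault(rid, {})[iova >> PAGE_SHIFT] = phys >> PAGE_SHIFT

    def translate(self, rid: int, iova: int) -> int:
        # Pick the table by requester ID, then look up the page number.
        pfn = self.tables.get(rid, {}).get(iova >> PAGE_SHIFT)
        if pfn is None:
            raise IommuFault(f"rid {rid:#06x}: no mapping for IOVA {iova:#x}")
        # Keep the offset within the page, swap in the physical page number.
        return (pfn << PAGE_SHIFT) | (iova & PAGE_MASK)


iommu = ToyIommu()
nic = requester_id(0x03, 0x00, 0)     # device at 03:00.0 (made-up address)
gpu = requester_id(0x65, 0x00, 0)     # device at 65:00.0 (made-up address)

# The same IOVA means different physical pages for different devices.
iommu.map_page(nic, iova=0x1000, phys=0x7F3A2000)
iommu.map_page(gpu, iova=0x1000, phys=0x12345000)

nic_hit = iommu.translate(nic, 0x1010)
gpu_hit = iommu.translate(gpu, 0x1010)
print(hex(nic_hit), hex(gpu_hit))
```

Calling iommu.translate(nic, 0x2000) raises IommuFault because that page was never mapped for the NIC; on real hardware the DMA would be blocked and a fault reported to the OS instead.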

Real-world example: VFIO passthrough. When you assign a GPU to a QEMU guest with vfio-pci, the kernel programs the IOMMU so the GPU's view of memory is the guest's physical address space. The guest driver writes guest-physical addresses into the GPU's command ring; the IOMMU translates those into host-physical pages backing the guest's RAM. The GPU literally cannot touch anything outside that mapping — a hardware-enforced sandbox.

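The cost: Every DMA now requires an IOTLB lookup, and a miss triggers a page-table walk (typically 4 levels), just as a TLB miss does on the CPU. A 100Gbps NIC can push tens of millions of packets per second (roughly 148.8M at minimum-size 64-byte frames), and if each packet touches a different page, the IOTLB thrashes. Mitigations: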

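Rule of thumb: map DMA buffers in large, persistent regions, not per-packet. A pool of buffers that is mapped once and then recycled keeps its translations stable, so each page costs roughly one IOTLB miss amortized over thousands of packets, and you avoid the IOTLB invalidation that every unmap triggers. In Linux terms: keep the mapping and call dma_sync_single_for_cpu()/dma_sync_single_for_device() between uses instead of dma_map_single()/dma_unmap_single() for every packet.
Use hugepages for DMA buffers: one 2 MiB IOTLB entry replaces 512 4 KiB entries.
iommu=pt (passthrough mode, set on the Linux kernel command line) gives host-owned devices an identity mapping, effectively bypassing translation for trusted host devices while keeping it for assigned ones. It is common on performance-sensitive servers; the trade-off is that host devices are no longer restricted in what they can DMA to.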
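The arithmetic behind these mitigations is easy to check. The sketch below computes how many IOTLB entries a buffer pool needs at a given page size, and the packet rate of an Ethernet link at a given frame size. The 64 MiB pool is an arbitrary example, the helper names are made up, and the 20 bytes of per-frame overhead assume standard Ethernet preamble, start-of-frame delimiter and inter-frame gap.

```python
KiB = 1024
MiB = 1024 * KiB


def iotlb_entries_needed(pool_bytes: int, page_bytes: int) -> int:
    """Number of distinct translations needed to cover the whole pool."""
    return -(-pool_bytes // page_bytes)   # integer ceiling division


def line_rate_pps(link_bps: float, frame_bytes: int, overhead_bytes: int = 20) -> float:
    """Packets per second at line rate; overhead = preamble + SFD + inter-frame gap."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


pool = 64 * MiB                            # hypothetical RX buffer pool
small = iotlb_entries_needed(pool, 4 * KiB)
huge = iotlb_entries_needed(pool, 2 * MiB)

print(f"4 KiB pages: {small} entries")
print(f"2 MiB pages: {huge} entries")
print(f"100 Gbps, 64-byte frames: {line_rate_pps(100e9, 64) / 1e6:.1f} Mpps")
```

For this pool the count drops from 16384 translations to 32, which is the same factor of 512 as above. How many entries a real IOTLB holds varies by hardware and is usually not published, so treat the entry counts as relative pressure, not a fit/no-fit prediction.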

Check your groups with find /sys/kernel/iommu_groups/ -type l. Devices in the same group must be passed through together: the IOMMU cannot isolate them from one another, because they share a translation context or can DMA to each other without the traffic ever reaching the IOMMU.
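If you would rather see the groups as a table than as a list of symlinks, the same sysfs tree can be walked from Python. This is a minimal sketch that assumes the usual Linux layout of /sys/kernel/iommu_groups/<group number>/devices/<PCI address>; the function name iommu_groups is made up, and the root argument exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Return {group number: sorted list of device names} from sysfs."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # No IOMMU enabled, or not running on Linux.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        print(f"group {number:3d}: {' '.join(devices)}")
```

Any group that lists more than one device is a unit for passthrough purposes. An empty result usually means the IOMMU is disabled in firmware or on the kernel command line.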

See it in action: Check out DMA Controller: How Peripheral Devices Transfer Data to RAM by BitLemon to see this theory applied.

Key Takeaway: The IOMMU gives every device its own virtual address space, turning DMA from a "trust the device" gamble into hardware-enforced isolation — at the cost of an IOTLB that you must design your buffer strategy around.
