XSAVE and Extended Processor State: How the Kernel Saves Your AVX Registers Without Knowing They Exist

2026-06-07

Every context switch must save and restore CPU state. In 1985 that meant 8 general-purpose registers and the FPU. Today it means GPRs, x87, SSE (128b), AVX (256b), AVX-512 (512b × 32), MPX bounds, PKRU keys, and whatever Intel ships next quarter. The kernel can't be patched every time a new register set arrives — so x86 invented XSAVE, a self-describing save mechanism the OS uses without knowing what it's saving.

How it works. The CPU exposes its extended state as numbered components: bit 0 = x87, bit 1 = SSE, bit 2 = AVX, bits 5–7 = AVX-512, bit 9 = PKRU, etc. CPUID.0Dh reports which components this CPU supports; XGETBV reads XCR0, which records the subset the OS has enabled. The kernel executes XSAVE [mem] with a requested-feature mask in EDX:EAX, and the CPU saves the components that are set in both the mask and XCR0. It writes each one into a layout it documents via CPUID.0Dh, then sets a header bitmap (XSTATE_BV) recording what's there. XRSTOR reads the bitmap and reconstructs state, including components the kernel has never heard of.
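The bitmap logic is easy to model outside the kernel. The following is a minimal Python sketch, not kernel code: the component table uses the bit numbers from the paragraph above (plus bits 3 and 4 for MPX), the labels and function names are made up for illustration, and the sample XCR0 value is hypothetical. XCR0 here is the register XGETBV reads.

```python
# Component numbers as used in the text; labels are informal.
XSTATE_COMPONENTS = {
    0: "x87",
    1: "SSE",
    2: "AVX",
    3: "MPX BNDREGS",
    4: "MPX BNDCSR",
    5: "AVX-512 opmask",
    6: "AVX-512 ZMM_Hi256",
    7: "AVX-512 Hi16_ZMM",
    9: "PKRU",
}


def decode_components(mask):
    """Return a label for every set bit; unknown bits get a generic label."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            names.append(XSTATE_COMPONENTS.get(bit, f"component {bit}"))
        bit += 1
    return names


def saved_components(xcr0, requested):
    """XSAVE saves only components that are both enabled and requested."""
    return xcr0 & requested


# Hypothetical XCR0: x87, SSE, AVX, AVX-512 and PKRU enabled.
xcr0 = 0x2E7
# The caller asks for SSE, AVX and (not enabled here) MPX BNDREGS.
requested = 0b1110
print(decode_components(saved_components(xcr0, requested)))
```

Note that a bit the table does not know about still decodes to a generic label, which mirrors the point of the design: the mask is meaningful even when the names are not.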

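The optimization that makes it usable. Saving 2.5 KB of AVX-512 state on every context switch would be ruinous. So XSAVE has variants: XSAVEOPT skips components that are still in their initial state or unmodified since the last XRSTOR, XSAVEC writes a compacted layout that reserves no space for components left out of the save, and XSAVES combines both and also covers supervisor state.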

Real-world example. A thread that never touches AVX-512 still pays for it once the kernel enables the feature: every task gets an XSAVE area of roughly 2.5 KB. Linux avoids repeating that cost for AMX tile data, a component of about 8 KB, with dynamically enabled XSTATE features (kernel 5.16+): a process requests permission with arch_prctl(ARCH_REQ_XCOMP_PERM), and its first tile instruction raises #NM, at which point the kernel reallocates a larger per-task buffer. AVX-512 has no such opt-in; its space is always allocated. An Intel paper measured ~9 µs added to fork() on Sapphire Rapids purely from zeroing the AVX-512 area.

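Rule of thumb. XSAVE area size ≈ 576 + Σ(enabled component sizes), where the 576 bytes are the 512-byte legacy x87/SSE region plus the 64-byte header, and the sum runs over the enabled components from AVX (bit 2) upward.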
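The rule of thumb can be worked through with a few lines of Python. This is a sketch under stated assumptions: the per-component sizes are the values CPUID.0Dh sub-leaves typically report on current parts, hard-coded here rather than read from hardware, and the function name is hypothetical. It ignores alignment padding and the fixed offsets of the non-compacted layout, so treat the result as an estimate.

```python
LEGACY_AND_HEADER = 512 + 64  # legacy x87/SSE region plus the XSAVE header

# Assumed typical sizes in bytes, keyed by component bit number.
COMPONENT_SIZES = {
    2: 256,   # AVX: upper halves of YMM0-15
    3: 64,    # MPX BNDREGS
    4: 64,    # MPX BNDCSR
    5: 64,    # AVX-512 opmask registers
    6: 512,   # AVX-512 upper halves of ZMM0-15
    7: 1024,  # AVX-512 ZMM16-31
    9: 8,     # PKRU
}


def estimate_xsave_size(mask):
    """Apply 576 + sum(enabled component sizes); padding is ignored."""
    total = LEGACY_AND_HEADER
    for bit, size in COMPONENT_SIZES.items():
        if mask & (1 << bit):
            total += size
    return total


print(estimate_xsave_size(0x007))  # x87 + SSE + AVX            -> 832
print(estimate_xsave_size(0x0E7))  # ... plus AVX-512           -> 2432
print(estimate_xsave_size(0x2E7))  # ... plus PKRU              -> 2440
```

The jump from 832 to 2432 bytes is the AVX-512 cost discussed above: 64 + 512 + 1024 = 1600 bytes for the three AVX-512 components.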

Read the actual size from the kernel boot log with dmesg | grep x86/fpu, or from CPUID.(EAX=0Dh,ECX=0):EBX at runtime, which reports the size needed for the components currently enabled in XCR0.

A subtle gotcha: between XSAVE and XRSTOR, the kernel may have modified the in-memory header bitmap. If a bit has been cleared, XRSTOR ignores the saved data for that component and loads its architectural init value instead, which doubles as a clever way to zero registers without writing them.
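A toy model makes the header-bit behavior concrete. This Python sketch is an illustration only: the dictionary layout, the INIT placeholder, and the function name are assumptions made for the example, and real XRSTOR has more cases (compacted format, faults on malformed headers) that are left out.

```python
INIT = "init"  # stand-in for a component's architectural init value


def xrstor(area, requested, xcr0):
    """Toy XRSTOR: return {component bit: value loaded into registers}."""
    regs = {}
    rfbm = requested & xcr0  # only enabled and requested components are touched
    bit = 0
    while rfbm >> bit:
        if (rfbm >> bit) & 1:
            if (area["xstate_bv"] >> bit) & 1:
                regs[bit] = area["data"][bit]  # header says data is present
            else:
                regs[bit] = INIT               # header bit clear: load init value
        bit += 1
    return regs


# A saved area holding x87, SSE and AVX state.
area = {
    "xstate_bv": 0b111,
    "data": {0: "x87 state", 1: "xmm state", 2: "ymm-high state"},
}

# Clear the AVX bit in the header; the saved bytes stay in memory untouched.
area["xstate_bv"] &= ~(1 << 2)

regs = xrstor(area, requested=0b111, xcr0=0b111)
print(regs[1])  # xmm state
print(regs[2])  # init
```

The saved AVX bytes are still sitting in the buffer after the bit is cleared; they are simply never read.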

Key Takeaway: XSAVE is a forward-compatible, self-describing state save mechanism so the OS can preserve registers across context switches without ever being patched to know they exist.
