2026-06-07
CPUID is the x86 instruction that lets software interrogate the CPU about itself: vendor, family, supported features, cache topology, hypervisor presence. Every libc, every JIT, every cryptography library calls it at startup. It's also one of the slowest "simple" instructions you can execute.
The mechanics. You load a leaf number into EAX (and, for leaves that have them, a subleaf into ECX), execute CPUID, and the CPU writes results into EAX/EBX/ECX/EDX. Leaf 0 returns the maximum supported basic leaf in EAX and the vendor string ("GenuineIntel" or "AuthenticAMD") packed across EBX/EDX/ECX, in that order. Leaf 1 gives family/model/stepping in EAX plus the famous EDX feature bits (SSE2, MMX, etc.), with newer features in ECX. Leaf 7 (with subleaves) reports AVX2, the AVX-512 variants, BMI, and the SHA extensions. Leaf 0x80000008 reports physical address bits, which is critical for kernel page-table setup.
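As a minimal sketch of the steps above, the following C reads the vendor string from leaf 0 and the SSE2 bit from leaf 1. It assumes GCC or Clang on x86, whose <cpuid.h> header provides __get_cpuid; the function names read_vendor and has_sse2 are made up for this illustration.

```c
#include <cpuid.h>   /* GCC/Clang helper header, x86 only */
#include <string.h>

/* Hypothetical helper: copy the 12-byte vendor string from leaf 0 into
   `out` and return the highest supported basic leaf (0 on failure). */
unsigned read_vendor(char out[13])
{
    unsigned eax, ebx, ecx, edx;

    out[0] = '\0';
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;

    /* The vendor bytes are stored in EBX, then EDX, then ECX. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return eax;
}

/* Hypothetical helper: 1 if leaf 1 reports SSE2 (EDX bit 26), else 0. */
int has_sse2(void)
{
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 26) & 1u);
}
```

Calling read_vendor on an Intel part should yield "GenuineIntel"; inside a VM the string is whatever the hypervisor chooses to report.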
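Why it's slow. CPUID is a serializing instruction: it drains the entire pipeline, retires every in-flight instruction, and prevents any speculation past it before completing. On modern Intel/AMD it typically costs a few hundred cycles (200–600 is a reasonable working range, depending on leaf and microarchitecture), comparable to a cache miss to DRAM. That cost is one reason RDTSCP exists: it waits for all earlier instructions to execute before reading the counter, so you no longer need the CPUID/RDTSC pair people used to write for serialized timing. RDTSCP is not itself fully serializing, though, so later instructions can still start before the read completes.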
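To get a feel for the cost on a particular machine, a rough measurement can be sketched as below. This assumes GCC or Clang on x86-64 with RDTSCP available; min_cpuid_ticks is a hypothetical name, and the result is in TSC ticks, which are not the same thing as core clock cycles and include the overhead of the timing instructions themselves.

```c
#include <cpuid.h>       /* __cpuid macro (GCC/Clang) */
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp, _mm_lfence */

/* Hypothetical helper: smallest TSC delta observed around a single
   CPUID (leaf 0) over `iters` attempts. Taking the minimum filters
   out interrupts and other noise. */
uint64_t min_cpuid_ticks(int iters)
{
    uint64_t best = UINT64_MAX;
    unsigned aux, eax, ebx, ecx, edx;

    for (int i = 0; i < iters; i++) {
        uint64_t start = __rdtscp(&aux);
        _mm_lfence();                     /* keep CPUID from starting early */
        __cpuid(0, eax, ebx, ecx, edx);   /* the instruction under test */
        uint64_t end = __rdtscp(&aux);    /* waits for CPUID to execute */
        _mm_lfence();

        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

Running min_cpuid_ticks(1000) on bare metal and again inside a VM is a quick way to see the gap between the two environments.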
Real-world example. When glibc picks which memcpy implementation to use (the IFUNC mechanism), it runs CPUID at process startup, checks for AVX2 or AVX-512, and the dynamic loader stores the address of the chosen variant in memcpy's GOT entry. After that one-time resolution, every memcpy call goes through the usual indirect jump with no further feature checks. OpenSSL does the same kind of startup probing for AES-NI, SHA-NI, and VAES. If you've ever wondered why a single binary runs on every x86-64 box despite containing AVX-512 code paths, this is why: runtime dispatch via CPUID.
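The same idea can be sketched by hand with a function pointer that is resolved once. This is an illustration of runtime dispatch, not glibc's actual IFUNC machinery: the names sum_scalar, sum_avx2, sum_init, and sum_u32 are hypothetical, and the code assumes GCC or Clang on x86, using their __builtin_cpu_supports builtin and the target function attribute.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback that runs on any x86 CPU. */
static uint32_t sum_scalar(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* AVX2 variant; the attribute enables AVX2 for this function only,
   so the rest of the file still targets baseline x86-64. */
__attribute__((target("avx2")))
static uint32_t sum_avx2(const uint32_t *v, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    uint32_t lanes[8];
    uint32_t s = 0;
    size_t i = 0;

    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_epi32(acc,
                               _mm256_loadu_si256((const __m256i *)(v + i)));
    _mm256_storeu_si256((__m256i *)lanes, acc);
    for (int k = 0; k < 8; k++)
        s += lanes[k];
    for (; i < n; i++)          /* leftover elements */
        s += v[i];
    return s;
}

typedef uint32_t (*sum_fn)(const uint32_t *, size_t);
static sum_fn sum_impl = sum_scalar;   /* safe default until sum_init runs */

/* Resolve the implementation once, at startup. */
void sum_init(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        sum_impl = sum_avx2;
}

/* Hot-path entry point: one indirect call, no CPUID. */
uint32_t sum_u32(const uint32_t *v, size_t n)
{
    return sum_impl(v, n);
}
```

Both variants compute the same wrapping 32-bit sum, so callers cannot tell which one was selected except by speed.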
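The hypervisor twist. Under virtualization, CPUID traps to the hypervisor: on Intel VT-x it unconditionally causes a VM exit, and on AMD-V the intercept is configurable but hypervisors enable it in practice. KVM/VMware intercept it to lie about feature support, for example pinning a VM to "Haswell" features even on a Skylake host so live migration works. This means that inside a VM a single CPUID pays for a full exit and re-entry and can cost 3,000+ cycles rather than a few hundred. Cache the results.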
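Whether code is running under a hypervisor can itself be read from CPUID. The sketch below checks the hypervisor-present bit (leaf 1, ECX bit 31) and, if it is set, reads the vendor signature from leaf 0x40000000, a software convention followed by KVM, VMware, Hyper-V, and others. It assumes GCC or Clang on x86; hypervisor_vendor is a hypothetical name. The raw __cpuid macro is used for the second query because __get_cpuid rejects leaves above the maximum basic leaf.

```c
#include <cpuid.h>
#include <string.h>

/* Hypothetical helper: returns 1 and fills `vendor` with the 12-byte
   hypervisor signature if the hypervisor-present bit is set; returns 0
   and leaves `vendor` empty otherwise. */
int hypervisor_vendor(char vendor[13])
{
    unsigned eax, ebx, ecx, edx;

    vendor[0] = '\0';
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (((ecx >> 31) & 1u) == 0)
        return 0;               /* bit clear: no hypervisor reported */

    /* Signature bytes are in EBX, ECX, EDX (note: not the leaf-0 order). */
    __cpuid(0x40000000, eax, ebx, ecx, edx);
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &ecx, 4);
    memcpy(vendor + 8, &edx, 4);
    vendor[12] = '\0';
    return 1;
}
```

Since the hypervisor controls every CPUID result, a guest can be configured to hide this bit, so a 0 return is a hint rather than proof of bare metal.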
Rule of thumb. Query each CPUID leaf you need exactly once, at startup, and cache every bit you'll ever check into a global. If your hot path branches on cpuid_has_avx2() and that function re-executes CPUID, you've turned a 1-cycle branch into a 500-cycle pipeline drain (and into a VM exit under a hypervisor).
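One way to follow this rule is sketched below, assuming GCC or Clang on x86 and their __get_cpuid_count helper. The struct layout and cpu_features_init are hypothetical; cpuid_has_avx2 mirrors the function named above. The sketch records raw CPUID bits only: production code would also confirm that the OS has enabled the wider register state (OSXSAVE plus XGETBV) before using AVX2, and would guard initialization if multiple threads can race on it.

```c
#include <cpuid.h>
#include <stdbool.h>

/* Hypothetical cache of the feature bits this program cares about. */
struct cpu_features {
    bool initialized;
    bool avx2;   /* leaf 7, subleaf 0, EBX bit 5  */
    bool bmi2;   /* leaf 7, subleaf 0, EBX bit 8  */
    bool sha;    /* leaf 7, subleaf 0, EBX bit 29 */
};

static struct cpu_features g_cpu;

/* Execute CPUID once and stash the results; later calls are no-ops. */
void cpu_features_init(void)
{
    unsigned eax, ebx, ecx, edx;

    if (g_cpu.initialized)
        return;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        g_cpu.avx2 = (ebx >> 5) & 1u;
        g_cpu.bmi2 = (ebx >> 8) & 1u;
        g_cpu.sha  = (ebx >> 29) & 1u;
    }
    g_cpu.initialized = true;
}

/* Hot-path check: a plain load from the cached global, no CPUID. */
static inline bool cpuid_has_avx2(void)
{
    return g_cpu.avx2;
}
```

Call cpu_features_init() early in main (or from a constructor), and everything after that reads the cached flags.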
The cpuid Linux userspace tool dumps every leaf (cpuid -1 limits the output to a single CPU instead of repeating it per core), which is handy when debugging why your AVX-512 code path isn't getting picked on a new CPU.
