Daily Low-Level Programming: The CPUID Instruction: How Your Software Discovers What CPU It's Running On

2026-06-07

CPUID is the x86 instruction that lets software interrogate the CPU about itself: vendor, family, supported features, cache topology, hypervisor presence. Nearly every libc, JIT, and cryptography library executes it at startup. It's also one of the slowest "simple" instructions you can execute.

The mechanics. You load a leaf number into EAX (and, for leaves that have them, a subleaf into ECX), execute CPUID, and the CPU writes results into EAX/EBX/ECX/EDX. Leaf 0 returns the maximum supported basic leaf in EAX and the 12-byte vendor string ("GenuineIntel" or "AuthenticAMD") packed across EBX, EDX, ECX, in that order. Leaf 1 gives family/model/stepping in EAX plus the classic feature bits in EDX (MMX, SSE, SSE2) and ECX (SSE3, AES-NI, AVX). Leaf 7 (with subleaves) reports AVX2, the AVX-512 variants, BMI1/BMI2, and the SHA extensions. Leaf 0x80000008 reports the physical and linear address widths in EAX, which the kernel needs for page-table setup.
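The register layout described above can be sketched in C. This is a minimal sketch, assuming GCC or Clang on x86 with the compiler-provided <cpuid.h> header; the helper names (pack_vendor, decode_signature, read_vendor, phys_addr_bits) and the cpu_signature struct are made up for illustration and do not come from any library.

```c
#include <cpuid.h>    /* GCC/Clang header providing __get_cpuid() */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble the 12-byte vendor string from the
 * leaf-0 registers. The order is EBX, EDX, ECX (not EBX, ECX, EDX). */
void pack_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}

/* Hypothetical struct holding the decoded leaf-1 EAX signature. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode leaf-1 EAX. The extended family is added only when the base
 * family is 0xF; the extended model applies when it is 6 or 0xF. */
struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature s;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    unsigned ext_model   = (eax >> 16) & 0xF;

    s.stepping = eax & 0xF;
    s.family = base_family;
    if (base_family == 0xF)
        s.family += ext_family;
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        s.model |= ext_model << 4;
    return s;
}

/* Read the vendor string from the running CPU. Returns the maximum
 * basic leaf, or 0 (with an empty string) if CPUID is unavailable. */
unsigned read_vendor(char out[13])
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        out[0] = '\0';
        return 0;
    }
    pack_vendor(ebx, edx, ecx, out);
    return eax;
}

/* Physical address width from leaf 0x80000008, EAX[7:0].
 * Returns 0 if the CPU does not support that leaf. */
unsigned phys_addr_bits(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return 0;
    return eax & 0xFF;
}
```

The two decoding helpers are pure functions, so they can be checked against known register values without touching the hardware; read_vendor and phys_addr_bits are the only parts that actually execute CPUID.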

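Why it's slow. CPUID is a serializing instruction: every earlier instruction must retire and buffered writes must drain before it executes, and nothing after it is fetched or speculated until it completes. On modern Intel/AMD parts it typically costs anywhere from roughly 100 to several hundred cycles depending on the leaf and microarchitecture, the same order of magnitude as a cache miss to DRAM. That cost is one reason timing code moved away from the old CPUID/RDTSC pair. RDTSCP waits for earlier instructions to execute before it reads the counter, and LFENCE followed by RDTSC gives a similar ordering, both far more cheaply than CPUID. Neither is fully serializing, though, so later instructions can still start before the counter is read unless you add a fence after it.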
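To see the cost on your own machine, the measurement can be sketched with the fence-based timing pattern just described. This is a rough sketch, assuming GCC or Clang on x86-64 with <cpuid.h> and <x86intrin.h>; measure_cpuid_ticks is a hypothetical name. It uses the raw __cpuid macro rather than __get_cpuid, because __get_cpuid first queries the maximum leaf and therefore executes CPUID twice.

```c
#include <cpuid.h>      /* __cpuid() macro */
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(), __rdtscp(), _mm_lfence() */

/* Hypothetical helper: minimum observed cost of one CPUID (leaf 0),
 * in TSC ticks, over `iters` runs. TSC ticks are reference cycles,
 * not core clock cycles, so treat the result as an estimate.
 * Returns UINT64_MAX when iters <= 0. */
uint64_t measure_cpuid_ticks(int iters)
{
    uint64_t best = UINT64_MAX;
    volatile unsigned sink = 0;   /* keeps the CPUID results live */

    for (int i = 0; i < iters; i++) {
        unsigned eax, ebx, ecx, edx, aux;

        _mm_lfence();                  /* let earlier work finish */
        uint64_t t0 = __rdtsc();
        _mm_lfence();                  /* CPUID must not start early */

        __cpuid(0, eax, ebx, ecx, edx);

        uint64_t t1 = __rdtscp(&aux);  /* waits for CPUID to execute */
        _mm_lfence();                  /* later work must not move up */

        sink = sink ^ eax ^ ebx ^ ecx ^ edx;
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    (void)sink;
    return best;
}
```

Taking the minimum over many iterations filters out interrupts and other noise. The figure includes the overhead of the timing instructions themselves, so it is an upper bound on the cost of CPUID alone.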

Real-world example. When glibc's dynamic loader picks which memcpy implementation to use (the IFUNC mechanism), it consults CPU features gathered via CPUID at process startup, checks for AVX2 or AVX-512, and writes the address of the chosen variant into memcpy's GOT slot. From then on every memcpy call goes through the same PLT/GOT indirection as any other dynamically resolved function, with no per-call feature check. OpenSSL does the same kind of startup probing for AES-NI, SHA-NI, and VAES. If you've ever wondered how a binary (even a statically linked one) can contain AVX-512 code and still run on every x86-64 box, this is why: runtime dispatch via CPUID, with a baseline fallback for CPUs that lack the feature.
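The same idea can be sketched by hand with a function pointer. This is not glibc's IFUNC machinery, only a hand-rolled analogue of it, assuming GCC or Clang on x86-64. It relies on the compiler builtins __builtin_cpu_init and __builtin_cpu_supports, which run CPUID internally; every function name here (sum_i32, sum_scalar, sum_avx2, sum_first_call, sum_is_resolved) is hypothetical.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Baseline version: runs on any x86-64 CPU. */
static long sum_scalar(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Same loop, but the compiler may emit AVX2 instructions inside this
 * one function, so it must never run on a CPU without AVX2. */
__attribute__((target("avx2")))
static long sum_avx2(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

static long sum_first_call(const int *a, size_t n);

/* Starts out pointing at the resolver; rebound on first use. */
static sum_fn sum_impl = sum_first_call;

/* Resolver: probes the CPU once, binds the pointer, forwards the call. */
static long sum_first_call(const int *a, size_t n)
{
    __builtin_cpu_init();
    sum_impl = __builtin_cpu_supports("avx2") ? sum_avx2 : sum_scalar;
    return sum_impl(a, n);
}

/* Public entry point: one indirect call, no feature check per call. */
long sum_i32(const int *a, size_t n)
{
    return sum_impl(a, n);
}

/* Returns 1 once an implementation has been bound, 0 before that. */
int sum_is_resolved(void)
{
    return sum_impl != sum_first_call;
}
```

The first call to sum_i32 pays for the probe; every later call is a single indirect jump to the bound variant. The unsynchronized pointer write is a simplification: a multithreaded program would resolve before starting threads or use an atomic store.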

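The hypervisor twist. Under virtualization, CPUID traps to the hypervisor: on Intel VT-x it unconditionally causes a VM exit, and on AMD SVM the intercept is optional but hypervisors enable it in practice. KVM and VMware intercept it so they can control what the guest sees, for example pinning a VM to "Haswell" features on a Skylake host so live migration works. Each trapped CPUID pays for a full guest-to-host round trip, so inside a VM it can cost 3,000 cycles or more. Cache the results.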
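A guest can ask whether it is virtualized through CPUID itself. The sketch below assumes GCC or Clang on x86 and uses two widely followed conventions: leaf 1 ECX bit 31 is the "hypervisor present" flag, and leaf 0x40000000 returns a vendor signature such as "KVMKVMKVM", "VMwareVMware", or "Microsoft Hv". The function names are hypothetical.

```c
#include <cpuid.h>
#include <stdint.h>
#include <string.h>

/* Leaf 1, ECX bit 31: zero on bare metal, set by hypervisors. */
int hv_bit_set(uint32_t leaf1_ecx)
{
    return (leaf1_ecx >> 31) & 1;
}

/* Leaf 0x40000000 returns the signature in EBX, ECX, EDX order
 * (unlike leaf 0, which uses EBX, EDX, ECX). */
void pack_hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                       char out[13])
{
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &ecx, 4);
    memcpy(out + 8, &edx, 4);
    out[12] = '\0';
}

/* Returns 1 and fills `out` with the hypervisor signature when the
 * hypervisor bit is set; returns 0 and an empty string otherwise. */
int hypervisor_signature(char out[13])
{
    unsigned eax, ebx, ecx, edx;

    out[0] = '\0';
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !hv_bit_set(ecx))
        return 0;

    /* __get_cpuid() would reject this leaf because it lies above the
     * maximum basic leaf, so use the raw __cpuid() macro instead. */
    __cpuid(0x40000000u, eax, ebx, ecx, edx);
    pack_hv_signature(ebx, ecx, edx, out);
    return 1;
}
```

Because the hypervisor controls every value CPUID returns, this check is advisory: a hypervisor that wants to hide can clear the bit.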

Rule of thumb. Execute CPUID once per leaf you need, at startup, and cache every bit you'll ever check in a global. If your hot path branches on cpuid_has_avx2() and that function re-executes CPUID, you've turned a 1-cycle branch into a pipeline drain that costs hundreds of cycles.
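One way to follow this rule is a small feature cache filled once at startup. The sketch below assumes GCC or Clang on x86-64; the struct, the global g_cpu, and the function names are hypothetical. It also includes a step the rule implies but does not spell out: an AVX2 bit in CPUID is only usable if the OS saves YMM state, which is checked by reading XCR0 with XGETBV.

```c
#include <cpuid.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical process-wide cache, filled once at startup. */
struct cpu_features {
    bool initialized;
    bool sse2;
    bool aesni;
    bool avx2;   /* CPU supports it AND the OS saves YMM state */
    bool sha;
};

struct cpu_features g_cpu;

/* Read XCR0. Only legal when leaf 1 ECX bit 27 (OSXSAVE) is set. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Executes CPUID for leaves 1 and 7 and caches the decoded bits.
 * Later calls return immediately. */
void cpu_features_init(void)
{
    unsigned eax, ebx, ecx, edx;
    bool ymm_enabled = false;

    if (g_cpu.initialized)
        return;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        bool osxsave = (ecx >> 27) & 1;
        bool avx     = (ecx >> 28) & 1;
        g_cpu.sse2  = (edx >> 26) & 1;
        g_cpu.aesni = (ecx >> 25) & 1;
        /* XCR0 bits 1 and 2: the OS preserves XMM and YMM state. */
        ymm_enabled = osxsave && avx && (read_xcr0() & 0x6) == 0x6;
    }
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        g_cpu.avx2 = ymm_enabled && ((ebx >> 5) & 1);
        g_cpu.sha  = (ebx >> 29) & 1;
    }
    g_cpu.initialized = true;
}

/* Hot-path checks: a plain load and branch, no CPUID. */
bool cpu_has_avx2(void) { return g_cpu.avx2; }
bool cpu_has_sse2(void) { return g_cpu.sse2; }
```

Call cpu_features_init() early in main (or from a constructor) before any thread starts; after that, cpu_has_avx2() costs a single byte load.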

The Linux userspace tool cpuid dumps every leaf in decoded form (cpuid -1 limits the output to a single CPU), which is handy when debugging why your AVX-512 code path isn't getting picked on a new CPU.


Key Takeaway: CPUID is a full pipeline-serializing instruction costing hundreds of cycles (thousands under virtualization), so software calls it once at startup, caches the results, and uses runtime dispatch (IFUNC) to bind hot paths to CPU-specific implementations without per-call overhead.
