2026-04-22
Sometimes the compiler isn't enough. You need a specific CPU instruction (a hardware CRC, a timestamp-counter read, a cache flush) and standard C doesn't expose it. You have two tools: inline assembly and compiler intrinsics. Knowing when to use each is what separates controlled precision from self-inflicted debugging nightmares.
Inline assembly embeds raw machine instructions inside C/C++ code. GCC and Clang use extended asm syntax:
uint64_t rdtsc_value;
asm volatile ("rdtsc; shlq $32, %%rdx; orq %%rdx, %%rax"
: "=a" (rdtsc_value) /* output: result in RAX */
: /* no inputs */
: "rdx" /* clobbers RDX */
);
This reads the CPU timestamp counter — something with no C equivalent. The critical parts of the syntax are:
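- Output constraints (=a): tell the compiler where results land. a = RAX, b = RBX, r = any general-purpose register.
- volatile: prevents the compiler from eliminating the block or hoisting it out of a loop. It does not by itself stop the compiler from moving the block relative to surrounding code; a "memory" clobber is the usual tool when ordering against memory accesses matters.

Get the clobber list wrong and your asm will silently overwrite a register the compiler believes is still live. This is the single most common inline asm bug. Rule of thumb: if your inline asm block touches more than 3 registers, you're probably better off writing a standalone .S file.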
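One way to shrink the risk of a wrong clobber list is to declare every register the instruction writes as an output operand, so the only remaining clobber is the flags. The sketch below assumes GCC or Clang on x86-64 and uses mulq to get the high half of a 64x64-bit multiply. The function names (mul_high, mul_high_portable, mul_high_self_check) are made up for illustration, and other targets fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference: high 64 bits of x * y, built from 32-bit halves. */
uint64_t mul_high_portable(uint64_t x, uint64_t y)
{
    uint64_t x_lo = x & 0xFFFFFFFFu, x_hi = x >> 32;
    uint64_t y_lo = y & 0xFFFFFFFFu, y_hi = y >> 32;

    uint64_t lo_lo = x_lo * y_lo;
    uint64_t hi_lo = x_hi * y_lo;
    uint64_t lo_hi = x_lo * y_hi;
    uint64_t hi_hi = x_hi * y_hi;

    /* Sum the middle terms; this cannot overflow 64 bits. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Same result via the x86-64 mulq instruction where available. */
uint64_t mul_high(uint64_t x, uint64_t y)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t lo, hi;
    /* mulq multiplies RAX by its operand and writes the product to RDX:RAX.
     * Both written registers are declared as outputs, so no register
     * clobber is needed; "cc" records that the flags change. */
    __asm__ ("mulq %[factor]"
             : "=a" (lo), "=d" (hi)          /* outputs: RAX, RDX      */
             : "a" (x), [factor] "r" (y)     /* inputs: x in RAX, y anywhere */
             : "cc");
    (void)lo;   /* low half is unused in this sketch */
    return hi;
#else
    return mul_high_portable(x, y);
#endif
}

/* Usage sketch: the asm path and the portable path should agree. */
void mul_high_self_check(void)
{
    assert(mul_high(UINT64_MAX, UINT64_MAX) == 0xFFFFFFFFFFFFFFFEull);
    assert(mul_high(1ull << 32, 1ull << 32) == 1);
    assert(mul_high(123456789, 987654321) ==
           mul_high_portable(123456789, 987654321));
}
```

No volatile is used here because the block is a pure function of its inputs, so the compiler is free to drop or merge calls whose results are unused.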
Compiler intrinsics are the pragmatic alternative. They look like function calls but compile to specific instructions, and the compiler still understands the data flow — it can allocate registers, reorder safely, and optimize around them:
// x86 popcount via intrinsic (build with -mpopcnt)
#include <immintrin.h>
int count = (int)_mm_popcnt_u64(bitmask);
// ARM CRC-32C over one byte (build with -march=armv8-a+crc)
#include <arm_acle.h>
uint32_t crc = __crc32cb(init, byte);
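For a version that builds without target-specific flags, the sketch below swaps in the GCC/Clang builtin __builtin_popcountll instead of _mm_popcnt_u64. With popcnt enabled on x86 (for example via -mpopcnt) the builtin typically compiles to the single instruction; without it the compiler falls back to a software sequence. The function names are made up for illustration, and the loop version serves only as a reference to check against.

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference: clear the lowest set bit until none remain. */
int popcount_loop(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* drops the lowest set bit */
        n++;
    }
    return n;
}

/* Builtin path: the compiler chooses the best instruction for the target. */
int popcount_builtin(uint64_t x)
{
#if defined(__GNUC__)
    return __builtin_popcountll(x);
#else
    return popcount_loop(x);   /* assumption: no builtin on other compilers */
#endif
}

/* Usage sketch: both versions should agree on any input. */
void popcount_self_check(void)
{
    assert(popcount_builtin(0) == 0);
    assert(popcount_builtin(0xFF) == 8);
    assert(popcount_builtin(UINT64_MAX) == 64);
    assert(popcount_builtin(0x8000000000000001ull) == popcount_loop(0x8000000000000001ull));
}
```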
Real-world example: Linux's arch/x86/include/asm/bitops.h implements ffs() (find first set bit) using the bsf instruction via inline asm. Doing this in pure C requires a loop or a lookup table, and the single instruction is several times faster on hot paths like scheduler bitmask scanning. As a back-of-envelope estimate at 1 GHz, bsf takes ~3 cycles (3 ns) versus a naive loop averaging 16 iterations at 1 cycle each (16 ns), roughly a 5x gap. Branch mispredictions at the loop exit can widen that gap further.
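To make the comparison concrete, here is a minimal sketch of both approaches in user-space C. It is not the kernel's code: ffs_loop is the naive bit-by-bit scan, and ffs_builtin assumes GCC or Clang and uses __builtin_ffsll, which the compiler can lower to a bit-scan instruction on targets that have one. Both follow the ffs() convention of 1-based positions, with 0 meaning no bit is set.

```c
#include <assert.h>
#include <stdint.h>

/* Naive scan: shift right until the lowest bit is set. */
int ffs_loop(uint64_t x)
{
    if (x == 0)
        return 0;          /* no bits set */
    int pos = 1;
    while ((x & 1) == 0) {
        x >>= 1;
        pos++;
    }
    return pos;
}

/* Builtin path: returns 1 + index of the lowest set bit, or 0 for x == 0. */
int ffs_builtin(uint64_t x)
{
#if defined(__GNUC__)
    return __builtin_ffsll((long long)x);
#else
    return ffs_loop(x);    /* assumption: no builtin on other compilers */
#endif
}

/* Usage sketch: the two versions should agree for every single-bit mask. */
void ffs_self_check(void)
{
    assert(ffs_builtin(0) == 0);
    for (int bit = 0; bit < 64; bit++) {
        uint64_t mask = 1ull << bit;
        assert(ffs_loop(mask) == bit + 1);
        assert(ffs_builtin(mask) == bit + 1);
    }
}
```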
When to choose which:
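- Intrinsics: the default whenever one exists for the instruction you need, since the compiler still understands the data flow.
- Inline asm: for privileged instructions (wrmsr, invlpg), for precise instruction ordering the compiler must not touch, or when no intrinsic exists.

Always compile with -S and inspect the output. Verify the compiler emitted exactly what you intended. Trust but verify.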
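As a small illustration of that inspection step, the sketch below is a one-function file meant to be compiled to assembly rather than run. The file name swap_demo.c and the function name are hypothetical, and the builtin assumes GCC or Clang. At -O2 the body should typically reduce to a single byte-swap instruction (bswap on x86-64, rev on AArch64), which is easy to spot in the .s output.

```c
/* swap_demo.c (hypothetical file name)
 *
 * Emit assembly instead of an object file, then read it:
 *   cc -O2 -S swap_demo.c -o swap_demo.s
 *
 * In swap_demo.s, look inside to_big_endian_demo for the instruction
 * you expect (bswap on x86-64, rev on AArch64). If you see a chain of
 * shifts and masks instead, the compiler did not emit what you intended.
 */
#include <stdint.h>

/* Reverses the byte order of a 32-bit value via the GCC/Clang builtin. */
uint32_t to_big_endian_demo(uint32_t value)
{
    return __builtin_bswap32(value);
}
```

The same workflow applies to inline asm blocks: the instructions you wrote appear verbatim in the output, so the thing to check is the code the compiler placed around them.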
