The PLT and Lazy Binding: Why Your First Call to printf() Is Slower Than Your Second

2026-05-29

When your binary calls printf() from libc, the static linker can't know where printf will live, because libc could be loaded at any address at run time. The Global Offset Table (GOT) holds the resolved address, but who fills it in, and when? The answer is the Procedure Linkage Table (PLT), and the trick is lazy binding: the address is resolved only the first time you call the function.

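Each imported function gets a tiny PLT stub of three instructions. Your code calls printf@plt rather than printf itself, and on x86-64 the classic lazy-binding stub does:

1. jmp *GOT[printf]: jump through printf's GOT slot. Before the first call, that slot points back at the next instruction in the stub, so the jump simply falls through.
2. push $index: push printf's relocation index so the resolver knows which symbol to look up.
3. jmp PLT0: jump to the shared PLT header, which pushes a pointer identifying the calling object and enters the dynamic linker's resolver (glibc's _dl_runtime_resolve family).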

The dynamic linker walks the symbol tables, finds printf in libc, patches the GOT slot with the real address, then tail-jumps to it. Every subsequent call follows the patched GOT pointer directly: two instructions (the call and the indirect jmp), no resolver.
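The following minimal C sketch makes the flow concrete by modeling lazy binding with ordinary function pointers. It is a toy and does not reflect how ld.so works internally. The array got, the name-keyed symtab, and functions such as lib_add, resolve_and_call, and add_plt are all hypothetical stand-ins. Each GOT slot starts out pointing at a trampoline that carries its slot index, which is the role push $index; jmp PLT0 plays. The first call looks the symbol up, patches the slot, and forwards the call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical "library" functions standing in for libc exports. */
static int lib_add(int a, int b) { return a + b; }
static int lib_mul(int a, int b) { return a * b; }

typedef int (*binop_fn)(int, int);

/* Toy symbol table that the "dynamic linker" searches by name. */
struct symbol { const char *name; binop_fn addr; };
static const struct symbol symtab[] = {
    { "add", lib_add },
    { "mul", lib_mul },
};

/* Toy GOT: one writable function-pointer slot per imported function. */
enum { SLOT_ADD, SLOT_MUL, NSLOTS };
static const char *const slot_names[NSLOTS] = { "add", "mul" };
static binop_fn got[NSLOTS];
static int resolver_calls;  /* how many times the slow path ran */

static binop_fn lookup(const char *name) {
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    return NULL;
}

/* Slow path: look the symbol up, patch the GOT slot, then forward the call
   (the real resolver tail-jumps; a plain C call is the closest analogue). */
static int resolve_and_call(int slot, int a, int b) {
    resolver_calls++;
    binop_fn target = lookup(slot_names[slot]);
    if (target == NULL)
        abort();            /* ld.so would report an undefined symbol here */
    got[slot] = target;     /* later calls skip the resolver entirely */
    return target(a, b);
}

/* Initial slot contents: per-slot trampolines that carry the slot index,
   playing the role of "push $index; jmp PLT0". */
static int unresolved_add(int a, int b) { return resolve_and_call(SLOT_ADD, a, b); }
static int unresolved_mul(int a, int b) { return resolve_and_call(SLOT_MUL, a, b); }

/* Put every slot back into the "not yet bound" state. */
static void reset_got(void) {
    got[SLOT_ADD] = unresolved_add;
    got[SLOT_MUL] = unresolved_mul;
    resolver_calls = 0;
}

/* "PLT entries": every call is an indirect jump through the GOT slot. */
static int add_plt(int a, int b) { return got[SLOT_ADD](a, b); }
static int mul_plt(int a, int b) { return got[SLOT_MUL](a, b); }
```

After reset_got(), the first add_plt(2, 3) returns 5 and bumps resolver_calls to 1. A second add_plt call leaves the counter at 1, because the slot now points straight at lib_add.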

For example, profile a program that calls getenv() in a tight loop. The first iteration takes ~5,000 cycles (symbol lookup, hash-table walk, GOT patch), while iterations 2 through N take ~20 cycles. That's why microbenchmarks warm up before timing: on a cold run, the first call measures ld.so, not your code.
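You can observe the warm-up effect yourself with a small timing harness. This sketch assumes Linux with glibc and a lazily bound executable (not linked with -z now). It times each of the first few getenv() calls with clock_gettime(CLOCK_MONOTONIC). The helper names now_ns and time_getenv_calls are hypothetical. Treat sample 0 with care: besides the resolver, it also pays for cold caches and TLB misses, so it overstates the pure binding cost.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Record how long each of the first n getenv() calls takes. Call this before
   anything else in the executable has called getenv(); otherwise its PLT slot
   is already resolved and sample 0 is no longer "cold". */
static void time_getenv_calls(uint64_t *elapsed_ns, size_t n) {
    (void)now_ns();  /* warm up clock_gettime's own PLT slot first */
    for (size_t i = 0; i < n; i++) {
        uint64_t t0 = now_ns();
        char *volatile sink = getenv("HOME");  /* volatile: keep the call */
        (void)sink;
        elapsed_ns[i] = now_ns() - t0;
    }
}
```

Call time_getenv_calls(t, 4) at the top of main and compare t[0] with the later samples. Rerunning with LD_BIND_NOW=1, which tells glibc's ld.so to bind everything at startup, should shrink the gap. Whatever gap remains is cache warm-up rather than symbol resolution. Absolute numbers depend heavily on the machine.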

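Rule of thumb: the first resolution of a PLT slot costs ~1–10 μs, depending on library size. With ~500 imported symbols across glibc and friends, the cold-start binding tax is roughly 500 × 2 μs ≈ 1 ms if all of them get bound. That is invisible for daemons but painful for short-lived CLI tools. Lazy binding pays this only for the functions you actually call, spread across the run. Eager binding pays for all of them up front. Run LD_DEBUG=bindings ./yourprog to watch every resolution happen in real time.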
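To put a number on your own process, this rough sketch (Linux/glibc assumed) walks every loaded object with dl_iterate_phdr. For each object, it divides DT_PLTRELSZ by the relocation entry size to get the number of PLT slots. That count is an upper bound. Lazy binding only pays for slots that actually get called, while -z now pays for all of them at startup, and the count may also include IRELATIVE entries that are not name lookups. The 2 μs per-symbol figure is the rule of thumb above, not a measurement. The names count_plt_relocs, total_plt_relocs, and estimated_binding_us are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* dl_iterate_phdr callback: add this object's PLT relocation count
   (DT_PLTRELSZ / entry size) to the running total in *data. */
static int count_plt_relocs(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    size_t *total = data;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_DYNAMIC)
            continue;
        const ElfW(Dyn) *d = (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
        size_t relsz = 0, pltrel = DT_RELA;
        for (; d->d_tag != DT_NULL; d++) {
            if (d->d_tag == DT_PLTRELSZ)
                relsz = d->d_un.d_val;
            else if (d->d_tag == DT_PLTREL)
                pltrel = d->d_un.d_val;  /* DT_RELA (x86-64) or DT_REL (i386) */
        }
        size_t entsz = pltrel == (size_t)DT_RELA ? sizeof(ElfW(Rela))
                                                 : sizeof(ElfW(Rel));
        *total += relsz / entsz;
    }
    return 0;  /* 0 = keep visiting the remaining loaded objects */
}

/* Total PLT relocations across every object currently loaded. */
static size_t total_plt_relocs(void) {
    size_t total = 0;
    dl_iterate_phdr(count_plt_relocs, &total);
    return total;
}

/* Back-of-the-envelope binding cost, given an assumed per-symbol cost in μs. */
static double estimated_binding_us(size_t nsyms, double us_per_symbol) {
    return (double)nsyms * us_per_symbol;
}
```

estimated_binding_us(total_plt_relocs(), 2.0) reproduces the back-of-the-envelope math above for your own process. For instance, estimated_binding_us(500, 2.0) returns 1000.0 μs, the same ≈1 ms.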

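The security wrinkle: a writable GOT is an attacker's dream. Overwrite the printf slot and you get arbitrary code execution on the next call. Full RELRO (-Wl,-z,relro,-z,now), which many distributions now enable in their default hardened build flags, closes the hole: the dynamic linker resolves every symbol at load time, then remaps the GOT read-only. Partial RELRO (-z relro alone) protects the rest of the GOT but leaves the lazily bound PLT slots in .got.plt writable. With full RELRO you lose lazy binding (slower startup), but the GOT stops being a writable target.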
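You can imitate the protection step in user space. In this toy sketch (Linux/POSIX assumed, names hypothetical), an anonymous page stands in for the GOT. The page is filled eagerly and sealed with mprotect(PROT_READ), much as ld.so seals the PT_GNU_RELRO region after relocation. The sketch then attempts an attacker-style overwrite. A SIGSEGV (or SIGBUS) handler jumps back out with siglongjmp, so you can observe the fault instead of crashing.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*fn_t)(void);

static int real_printf_stand_in(void) { return 42; }  /* hypothetical */
static int attacker_payload(void) { return -1; }      /* hypothetical */

static sigjmp_buf fault_env;

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_env, 1);  /* escape the faulting write */
}

/* Toy full RELRO: fill a GOT-like page eagerly, seal it read-only, then try
   an attacker-style overwrite. Returns true if the write faulted; *result
   receives the return value of whatever the slot points at afterwards. */
static bool sealed_got_blocks_overwrite(int *result) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    fn_t *got = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED)
        return false;

    got[0] = real_printf_stand_in;              /* eager binding, like -z now */
    if (mprotect(got, page, PROT_READ) != 0) {  /* seal, like RELRO */
        munmap(got, page);
        return false;
    }

    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);  /* some platforms report SIGBUS */

    bool faulted = false;
    if (sigsetjmp(fault_env, 1) == 0)            /* 1: restore signal mask */
        *(fn_t volatile *)&got[0] = attacker_payload;  /* the overwrite */
    else
        faulted = true;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);

    fn_t target = *(fn_t volatile *)&got[0];  /* re-read the sealed slot */
    *result = target();
    munmap(got, page);
    return faulted;
}
```

sealed_got_blocks_overwrite(&r) should return true with r == 42: the write faulted and the slot still points at the legitimate function. Jumping out of a fault handler like this is acceptable for a demo, but it is not a pattern for production code.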

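Check it: readelf -d ./prog | grep BIND_NOW shows whether lazy binding is disabled. A FLAGS line containing BIND_NOW means the binary was linked with -z now, and readelf -l ./prog | grep GNU_RELRO confirms that the RELRO segment is present. Production binaries should pass both checks.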
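The same check can be run from inside a program. This sketch assumes Linux with glibc and reads the main executable's own program headers via dl_iterate_phdr, which visits the main program first. A PT_GNU_RELRO segment means -z relro. DF_BIND_NOW in DT_FLAGS, or DF_1_NOW in DT_FLAGS_1, means -z now. struct relro_status and the helper names are hypothetical. The sketch reports link-time flags only, so eager binding forced at run time with LD_BIND_NOW=1 will not show up.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdbool.h>
#include <stddef.h>

struct relro_status {
    bool gnu_relro;  /* PT_GNU_RELRO segment present (-z relro) */
    bool bind_now;   /* DF_BIND_NOW or DF_1_NOW set (-z now) */
};

/* dl_iterate_phdr visits the main program first, so inspect it and stop. */
static int inspect_main_program(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    struct relro_status *st = data;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_GNU_RELRO)
            st->gnu_relro = true;
        if (ph->p_type != PT_DYNAMIC)
            continue;
        const ElfW(Dyn) *d = (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
        for (; d->d_tag != DT_NULL; d++) {
            if (d->d_tag == DT_FLAGS && (d->d_un.d_val & DF_BIND_NOW))
                st->bind_now = true;
            if (d->d_tag == DT_FLAGS_1 && (d->d_un.d_val & DF_1_NOW))
                st->bind_now = true;
        }
    }
    return 1;  /* nonzero stops the iteration after the first object */
}

static struct relro_status main_program_relro(void) {
    struct relro_status st = { false, false };
    dl_iterate_phdr(inspect_main_program, &st);
    return st;
}

static bool is_full_relro(struct relro_status st) {
    return st.gnu_relro && st.bind_now;
}

static const char *relro_label(struct relro_status st) {
    if (is_full_relro(st)) return "full RELRO";
    if (st.gnu_relro)      return "partial RELRO";
    if (st.bind_now)       return "no RELRO, eager binding";
    return "no RELRO";
}
```

Printing relro_label(main_program_relro()) yields "full RELRO" only when both the segment and the bind-now flag are present, matching what readelf reports for the same binary.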

Key Takeaway: The PLT turns every imported function call into an indirect jump through the GOT, with the dynamic linker patching the GOT slot on first use — so first calls are thousands of cycles slower than subsequent ones, unless full RELRO forces eager resolution at startup.
