Daily Low-Level Programming: Position-Independent Code and the Global Offset Table

2026-04-24

Nearly every shared library on your Linux system is compiled with -fPIC. This flag produces position-independent code (PIC): machine code that works correctly regardless of where in virtual memory it's loaded. Without PIC, the loader would have to patch every absolute address reference at load time (these patches are called text relocations). Each patched page of .text becomes a private, per-process copy, so the code can no longer be shared between processes, which defeats the main point of shared libraries.

PIC solves this with two key data structures: the Global Offset Table (GOT) and the Procedure Linkage Table (PLT).

The GOT is an array of pointers in the .got section. When PIC code needs to access a global variable, it doesn't embed the variable's absolute address. Instead, it loads the address from the GOT. On x86-64, this uses RIP-relative addressing:

mov rax, [rip + got_entry_offset]   ; load the address of my_global from the GOT
mov rbx, [rax]                      ; dereference to get the actual value
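
To see the same pattern from the C side, here is a minimal sketch, assuming GCC and binutils on x86-64 Linux. The file name got_demo.c and the function read_my_global are hypothetical names chosen for illustration, and the assembly in the comment is what GCC typically emits, not a guaranteed output.

```c
/* got_demo.c (hypothetical file name for this sketch).
 * One way to inspect the generated code, assuming GCC and binutils:
 *   gcc -O2 -fPIC -shared got_demo.c -o libgot_demo.so
 *   objdump -d -M intel libgot_demo.so
 */

/* Default (exported) visibility: another module could provide its own
 * my_global, so PIC code cannot assume where the variable ends up. */
int my_global = 42;

int read_my_global(void)
{
    /* With -fPIC, GCC typically emits something like:
     *   mov rax, QWORD PTR my_global@GOTPCREL[rip]   ; address from the GOT
     *   mov eax, DWORD PTR [rax]                     ; the value itself
     */
    return my_global;
}
```

Building the same file without -fPIC and comparing the disassembly is a quick way to see the extra load appear and disappear.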

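The GOT itself lives in a writable data segment at a fixed offset from the code, which is why a RIP-relative displacement can always reach it. The dynamic linker fills in the real addresses at load time. Since only the GOT gets patched (not the code), the .text pages stay clean, read-only, and shared between processes. Only the small GOT pages become private, per-process copies (via copy-on-write).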
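
The split between unchanging code and a patchable table can be modeled in a few lines of C. This is a toy sketch, not how the real GOT is implemented: toy_got, toy_loader_fill, and toy_read_my_global are hypothetical names, and the "loader" here is just an ordinary function.

```c
/* Toy model only: the real GOT is laid out by the static linker and
 * filled in by the dynamic linker, not by application code. */

enum { GOT_MY_GLOBAL, GOT_SLOT_COUNT };

/* Stand-in for the .got section: writable, one pointer per symbol. */
static int *toy_got[GOT_SLOT_COUNT];

/* Stand-in for the dynamic linker: records where a symbol ended up. */
void toy_loader_fill(int slot, int *address)
{
    toy_got[slot] = address;
}

/* Stand-in for PIC code: it never contains the variable's address.
 * It knows only which slot to read, so it never needs patching. */
int toy_read_my_global(void)
{
    return *toy_got[GOT_MY_GLOBAL];
}
```

Calling toy_loader_fill with two different addresses changes what toy_read_my_global returns without touching the function itself, which is the property that keeps real .text pages shareable.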

The PLT handles function calls with lazy binding. The first call to printf goes through a PLT stub that jumps to a GOT slot. Initially, that slot points back into the PLT, which calls the dynamic linker (_dl_runtime_resolve) to find the real printf address, patches the GOT, and jumps there. Subsequent calls go directly through the now-resolved GOT entry — one extra indirection, zero resolver overhead.
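
Lazy binding is easier to follow as a simulation with function pointers. The sketch below is a toy model under stated assumptions: real_square stands in for the real library function, lazy_stub stands in for the PLT fallback plus the resolver, and every name here is hypothetical. The real mechanism lives in the PLT stubs and the dynamic linker.

```c
typedef int (*square_fn)(int);

/* Counts how often the slow path (the "resolver") runs. */
static int resolver_calls = 0;

/* Stand-in for the real library function (printf in the text above). */
static int real_square(int x)
{
    return x * x;
}

static int lazy_stub(int x);

/* Toy GOT slot: initially points back at the stub, not the real function. */
static square_fn got_square = lazy_stub;

/* Stand-in for the PLT fallback plus the resolver: find the real
 * address, patch the slot, then continue into the real function. */
static int lazy_stub(int x)
{
    resolver_calls++;
    got_square = real_square;
    return got_square(x);
}

/* Toy PLT entry: always one indirect call through the slot. */
int call_square(int x)
{
    return got_square(x);
}

int toy_resolver_calls(void)
{
    return resolver_calls;
}
```

The first call_square goes through lazy_stub and bumps the counter to 1. Every later call reads the patched slot and reaches real_square directly, so the counter stays at 1.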

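Real-world example: Run readelf -r /usr/lib/x86_64-linux-gnu/libc.so.6 | head -20 to see relocation entries (that path is the Debian/Ubuntu location; other distros may differ). Each R_X86_64_GLOB_DAT fills a GOT entry with a symbol's address; each R_X86_64_JUMP_SLOT fills the GOT slot paired with a PLT stub (readelf may print the name truncated as R_X86_64_JUMP_SLO). The first lines of output are often R_X86_64_RELATIVE entries, so you may need to drop the head -20 or filter with grep -E 'GLOB_DAT|JUMP_SLO' to find them. Exact counts vary by library and build.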
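
The relocation type that readelf prints is packed into the low 32 bits of each entry's r_info field. Here is a small sketch of decoding it, assuming a Linux system with glibc's <elf.h>; the enum reloc_kind and the function classify_reloc are hypothetical names for this example.

```c
#include <elf.h>  /* Elf64_Rela, ELF64_R_TYPE, R_X86_64_* (Linux/glibc header) */

enum reloc_kind { RELOC_OTHER, RELOC_GOT_DATA, RELOC_PLT_SLOT };

/* Classify one x86-64 relocation entry the way the text above does. */
enum reloc_kind classify_reloc(const Elf64_Rela *rela)
{
    switch (ELF64_R_TYPE(rela->r_info)) {
    case R_X86_64_GLOB_DAT:
        return RELOC_GOT_DATA;   /* GOT entry holding a symbol's address */
    case R_X86_64_JUMP_SLOT:
        return RELOC_PLT_SLOT;   /* GOT slot used by a PLT stub */
    default:
        return RELOC_OTHER;      /* e.g. R_X86_64_RELATIVE */
    }
}
```

A fuller tool would read the .rela.dyn and .rela.plt sections from the file and call this on each entry; that part is left out to keep the sketch short.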

Performance rule of thumb: Each PIC global variable access that goes through the GOT costs one extra load, roughly 1-4 cycles if the GOT entry is in L1 cache. For function calls, after the first lazy resolution, the PLT adds one indirect jump, about 1-2 cycles when it is predicted correctly. On hot paths with tight loops, this can matter. That's why executables compiled as non-PIE (the norm before hardened defaults) could be measurably faster for global-heavy workloads. Modern distros build PIE executables anyway because the security benefit of ASLR outweighs the roughly 1-2% overhead.

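You can disable lazy binding by setting LD_BIND_NOW=1 or by linking with -z now. This resolves all function symbols at load time: startup is slower, but the first call to each function no longer pays for the resolver. Calls still go through the PLT stub, so the indirect jump itself remains. Eager binding also helps harden against GOT-overwrite attacks: combined with -z relro (full RELRO), the GOT can be marked read-only once every entry has been resolved.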

See it in action: Check out "How Global Offset Table (GOT) & Procedure Linkage Table (PLT) Work — Dynamic Linking Explained" by SystemDR - Scalable System Design to see this theory applied.

Key Takeaway: PIC trades one level of pointer indirection (through the GOT/PLT) for the ability to share library code pages across processes and load libraries at arbitrary addresses — the foundation that makes shared libraries and ASLR possible.
