2026-04-28
You know the ELF format and how the linker resolves symbols. But what actually happens between your shell calling execve() and your main() running? The kernel and the dynamic linker (ld-linux.so) do substantial work that's invisible unless you look.
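Step 1: The kernel's job. When execve() fires, the kernel reads a small fixed-size buffer from the start of your binary (128 bytes on older kernels, 256 on recent ones). It checks the ELF magic (the byte 0x7f followed by the ASCII characters "ELF"), then parses the program headers, specifically the PT_LOAD segments. Each PT_LOAD tells the kernel: map this range of the file at this virtual address with these permissions. The classic layout has two: one for code such as .text (r-x) and one for .data/.bss (rw-). Binaries built with recent toolchains often have more, for example a separate read-only segment for constant data. The kernel maps each segment with the same internal machinery that backs mmap().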
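To make the PT_LOAD walk concrete, here is a minimal Python sketch that checks the magic and lists the loadable segments. It is an illustration only, not the kernel's code: it assumes a 64-bit little-endian ELF, and the helper names (load_segments, perms, synthetic_image) and the two-segment test image are made up for this example.

```python
import struct

ELF_MAGIC = b"\x7fELF"
PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")           # ELF64 program header, 56 bytes


def perms(flags):
    """Render p_flags in the r-x / rw- style used in the text."""
    return (("r" if flags & PF_R else "-")
            + ("w" if flags & PF_W else "-")
            + ("x" if flags & PF_X else "-"))


def load_segments(image):
    """Return (offset, vaddr, filesz, memsz, perms) for each PT_LOAD entry."""
    if image[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    fields = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz, p_memsz, perms(p_flags)))
    return segments


def synthetic_image():
    """Build a made-up ELF64 header plus two PT_LOAD entries (not a runnable binary)."""
    ident = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9)   # 64-bit, little-endian, version 1
    ehdr = EHDR.pack(ident, 2, 62, 1, 0x401000, 64, 0, 0, 64, 56, 2, 0, 0, 0)
    text = PHDR.pack(PT_LOAD, PF_R | PF_X, 0x0, 0x400000, 0x400000, 0x1000, 0x1000, 0x1000)
    data = PHDR.pack(PT_LOAD, PF_R | PF_W, 0x1000, 0x601000, 0x601000, 0x200, 0x800, 0x1000)
    return ehdr + text + data


for seg in load_segments(synthetic_image()):
    print("offset=%#x vaddr=%#x filesz=%#x memsz=%#x %s" % seg)
```

On a Linux machine, load_segments(open("/bin/ls", "rb").read()) should list the real segments. In the writable segment, memsz is typically larger than filesz: the difference is .bss, which occupies memory but no file space.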
Step 2: The interpreter. If the ELF has a PT_INTERP segment (almost all dynamically-linked binaries do), the kernel doesn't jump to your binary's entry point. Instead, it loads the dynamic linker specified there (usually /lib64/ld-linux-x86-64.so.2) and jumps to its entry point. You can see this yourself:
readelf -l /bin/ls | grep interpreter    # shows the interpreter path
LD_DEBUG=all /bin/ls 2>&1 | head -80    # watch the loader work in real time
Step 3: Dynamic linking at load time. The dynamic linker walks the DT_NEEDED entries in your binary's .dynamic section, performing a breadth-first load of shared libraries. For each library, it maps the PT_LOAD segments, then processes relocations, patching GOT entries and applying R_X86_64_GLOB_DAT, R_X86_64_JUMP_SLOT, and others. With lazy binding (glibc's default unless the binary was linked with -z now or LD_BIND_NOW is set), each JUMP_SLOT GOT entry initially points back into its PLT stub, which calls into the resolver on first use. The resolver then patches the entry so later calls go straight to the target.
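The breadth-first walk over DT_NEEDED entries can be sketched in a few lines of Python. The dependency table below is hypothetical (libfoo.so, libbar.so and libbaz.so are invented names); a real loader reads each list from the object's .dynamic section, and this sketch ignores search paths, sonames, and preloaded objects.

```python
from collections import deque

# Hypothetical DT_NEEDED lists, keyed by object name.
NEEDED = {
    "app": ["libfoo.so", "libbar.so"],
    "libfoo.so": ["libbaz.so", "libc.so.6"],
    "libbar.so": ["libc.so.6"],
    "libbaz.so": ["libc.so.6"],
    "libc.so.6": [],
}


def load_order(root, needed):
    """Breadth-first traversal: each object is loaded once, level by level."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for dep in needed.get(obj, []):
            if dep not in seen:          # already-loaded objects are not loaded twice
                seen.add(dep)
                queue.append(dep)
    return order


print(load_order("app", NEEDED))
```

Note that libc.so.6 is queued when libfoo.so first names it, and the later references from libbar.so and libbaz.so are skipped.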
Step 4: Initialization. Before main(), .init and .init_array functions run in dependency order: leaves first, your binary last. This is where C++ global constructors execute and where __attribute__((constructor)) functions run. With glibc, the loader runs the shared libraries' initializers and then jumps to your binary's entry point (_start), which calls __libc_start_main. That function runs the executable's own initializers and finally calls main().
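The "leaves first, your binary last" ordering is a dependency-first (post-order) traversal. The Python sketch below shows the idea on a hypothetical dependency table with invented library names; glibc's real sorting is more involved, so treat this as a model of the ordering rather than the loader's algorithm.

```python
# Hypothetical DT_NEEDED lists, keyed by object name.
NEEDED = {
    "app": ["libfoo.so", "libbar.so"],
    "libfoo.so": ["libbaz.so", "libc.so.6"],
    "libbar.so": ["libc.so.6"],
    "libbaz.so": ["libc.so.6"],
    "libc.so.6": [],
}


def init_order(root, needed):
    """Return objects so that every dependency precedes its dependents."""
    order, done = [], set()

    def visit(obj):
        if obj in done:
            return
        done.add(obj)                    # mark first so dependency cycles terminate
        for dep in needed.get(obj, []):
            visit(dep)
        order.append(obj)                # appended only after all its dependencies

    visit(root)
    return order


print(init_order("app", NEEDED))
```

An initializer in libfoo.so can therefore rely on libc.so.6 and libbaz.so already being initialized, which is why the order matters for C++ global constructors.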
Real-world example: Preloading with LD_PRELOAD=./mymalloc.so ./app works because the loader places your library ahead of every other shared library in the symbol search order during Step 3 (only the executable itself comes first). Your malloc shadows glibc's because the loader resolves each symbol by searching objects in that order and taking the first definition it finds.
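The shadowing effect is a first-match lookup over an ordered list of objects. Here is a small Python model of it, assuming the search order is executable, then preloaded libraries, then the remaining libraries in load order. The export tables and the helper names (search_order, resolve) are hypothetical, and symbol versioning and visibility are ignored.

```python
# Hypothetical exported-symbol tables for each loaded object.
EXPORTS = {
    "app": {"main"},
    "mymalloc.so": {"malloc", "free"},
    "libc.so.6": {"malloc", "free", "printf"},
}


def search_order(executable, preload, needed):
    """Assumed global scope: executable, then LD_PRELOAD objects, then the rest."""
    return [executable, *preload, *needed]


def resolve(symbol, scope, exports):
    """Return the first object in scope that defines symbol, or None."""
    for obj in scope:
        if symbol in exports.get(obj, ()):
            return obj
    return None


with_preload = search_order("app", ["mymalloc.so"], ["libc.so.6"])
without_preload = search_order("app", [], ["libc.so.6"])
print(resolve("malloc", with_preload, EXPORTS))
print(resolve("malloc", without_preload, EXPORTS))
```

Symbols the preloaded library does not define, such as printf here, still resolve to libc.so.6, so an interposer only needs to provide the functions it wants to replace.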
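Rule of thumb: Every shared library adds roughly 1–2ms of startup time on a cold cache (mapping + relocations), so a binary linking 50 libraries pays ~50–100ms before main() executes. Treat this as an order-of-magnitude estimate, since the real cost depends on library size, relocation count, and storage speed. To measure it, compare perf stat -e task-clock -- /bin/ls against a statically-linked equivalent. Note that task-clock counts CPU time, so look at the elapsed time perf stat also reports when cold-cache I/O is what you care about. LD_DEBUG=statistics /bin/ls prints the loader's own timing breakdown.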
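As a rough companion to the rule of thumb, the Python sketch below scales the per-library figure and times a whole process from the outside. Both helper names are invented for this example. The timing is wall-clock for the entire run, not just the part before main(), and repeated runs hit a warm cache, so it is only a coarse cross-check.

```python
import subprocess
import sys
import time


def estimated_overhead_ms(num_libs, per_lib_ms=(1.0, 2.0)):
    """Scale the per-library rule of thumb to a (low, high) range in ms."""
    low, high = per_lib_ms
    return (num_libs * low, num_libs * high)


def wall_clock_ms(argv, runs=5):
    """Best-of-N wall-clock time for running argv to completion, in ms."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


print(estimated_overhead_ms(50))
# Times the Python interpreter itself as a stand-in; substitute your own binary.
print("%.1f ms" % wall_clock_ms([sys.executable, "-c", "pass"]))
```

Timing a dynamically linked build and a statically linked build of the same trivial program this way gives a first approximation of the loader's share of startup.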
Between execve() and main(), the kernel maps ELF segments into memory, the dynamic linker loads and relocates shared libraries, and initialization functions run. It is a multi-stage pipeline you can observe and influence with LD_DEBUG, LD_PRELOAD, and readelf.
