2026-05-02
You've used strace. You've run strace -p 1234 and watched syscalls scroll by like the Matrix. Then you got bored, hit Ctrl-C, and went back to adding print statements. That's because nobody showed you the options that make strace actually useful.
Trick 1: Find out why a program is slow. The -c flag gives you a syscall profiling summary — a histogram of where time is actually going:
$ strace -c -S time ls /var/log/
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.14    0.000234          11        20           getdents64
 18.03    0.000068           3        22           openat
  8.49    0.000032           1        24           close
...
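That -S time spells out the sort order: time is already the default, and -S calls would sort by call count instead. By default -c counts system CPU time; add -w to summarize wall-clock time, which is what you want when the program is blocked waiting rather than burning CPU. Here the kernel-side time is dominated by 20 getdents64 calls, the syscall that reads directory entries, so directory listing is the first place to look. Userspace compute never shows up in this table, so compare the total against the program's overall runtime before blaming syscalls.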
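If you save the summary with -o, you can rank it with ordinary text tools. The following is a minimal sketch, not part of strace: top_syscalls is a hypothetical helper, and the sample rows are the ones shown above.

```shell
#!/bin/sh
# Sketch: pull the most expensive rows out of a saved `strace -c` summary.
# Sample rows copied from the text; a real file would come from something like
#   strace -c -w -o summary.txt <command>

summary='% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.14    0.000234          11        20           getdents64
 18.03    0.000068           3        22           openat
  8.49    0.000032           1        24           close'

# top_syscalls N (hypothetical helper): print "seconds syscall" for the N
# costliest rows. Data rows start with a number; the syscall name is always
# the last field, whether or not the errors column is filled in.
top_syscalls() {
  awk '$1 ~ /^[0-9.]+$/ && $NF != "total" { print $2, $NF }' |
    LC_ALL=C sort -k1,1nr |
    head -n "$1"
}

printf '%s\n' "$summary" | top_syscalls 2
```

Keying on the last field rather than a fixed column number keeps the helper working for rows that do have an errors count.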
Trick 2: Trace only what matters. Filtering by syscall category with -e trace=%net is far more useful than filtering by individual syscall name:
$ strace -e trace=%net curl -s https://example.com > /dev/null
socket(AF_INET6, SOCK_DGRAM, ...) = 3
socket(AF_INET, SOCK_STREAM, ...) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(443), ...}) = -1 EINPROGRESS
The most commonly useful category filters are: %file (anything that takes a path), %net (sockets), %process (fork/exec/exit), %signal, and %memory (mmap/brk); %desc and %ipc cover file descriptors and System V IPC. The %-prefixed spelling has existed since strace 4.17 and almost nobody uses it.
Trick 3: See what files a program actually opens. Not what it might open. What it did open, successfully:
$ strace -e trace=%file -z -o /tmp/files.log python3 my_script.py
$ grep openat /tmp/files.log
openat(AT_FDCWD, "/home/shaun/.config/myapp/settings.json", O_RDONLY) = 3
The -z flag (added in strace 5.2) suppresses syscalls that fail. No more wading through 400 lines of ENOENT from Python trying every possible module path.
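To turn that log into a plain list of files, something like the sketch below may help. opened_paths is a hypothetical helper, and it assumes the usual openat(..., "path", ...) = fd line shape; it also drops failed calls itself, so it works on logs recorded without -z.

```shell
#!/bin/sh
# Sketch: list the unique paths a program opened successfully.
# opened_paths is a hypothetical helper; it reads strace output on stdin.
opened_paths() {
  # Keep only openat lines whose return value is a file descriptor
  # (a bare number), and print the quoted path argument.
  sed -n 's/^.*openat([^"]*"\([^"]*\)".*) = [0-9][0-9]*$/\1/p' | sort -u
}

# Sample log: the first line is from the text, the rest are made up.
opened_paths <<'EOF'
openat(AT_FDCWD, "/home/shaun/.config/myapp/settings.json", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/hypothetical.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/shaun/.config/myapp/settings.json", O_RDONLY) = 4
EOF
```

In real use you would feed it the log directly, for example opened_paths < /tmp/files.log.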
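Trick 4: Attach to all threads of a running process. Modern software is multithreaded, and plain -p attaches only to the one thread or process ID you give it. Add -f to attach to every thread of that process and to follow any children it forks afterwards:
$ strace -f -p "$(pidof nginx)" -e trace=write -s 256 2>&1 | grep 'write(4,'
The quotes matter: pidof can print several PIDs (nginx runs a master plus workers), and strace accepts a quoted, whitespace-separated list as a single -p argument.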
The -s 256 increases the string truncation limit from the pathetic default of 32 bytes so you can actually read the data being written.
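Once several threads are in the trace, it helps to know which one is doing the work. Here is a rough sketch that counts trace lines per thread. It assumes the log was written with -f -o FILE, where each line starts with the thread's numeric ID; calls_per_pid is a hypothetical helper and the sample lines are made up.

```shell
#!/bin/sh
# calls_per_pid (hypothetical helper): count trace lines per thread ID.
# Assumes the "-f -o FILE" log format, where each line starts with the ID.
calls_per_pid() {
  awk '$1 ~ /^[0-9]+$/ { n[$1]++ }
       END { for (p in n) print p, n[p] }' |
    LC_ALL=C sort -k2,2nr -k1,1n
}

# Made-up sample log with two threads writing to fd 4.
calls_per_pid <<'EOF'
4172 write(4, "first", 5) = 5
4173 write(4, "second", 6) = 6
4173 write(4, "third", 5) = 5
EOF
```

The busiest thread comes first, which tells you which ID to grep for in the full log.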
Trick 5: Inject faults to test error handling. This is the one that makes people's jaws drop. Since strace 4.15, you can make syscalls fail on purpose with -e fault= (4.16 added the more general -e inject=):
$ strace -e fault=openat:when=3+:error=ENOSPC df -h
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY) = 3
openat(AT_FDCWD, "/proc/self/mountinfo", ...) = 3
openat(AT_FDCWD, "/proc/self/mountstats", ...) = -1 ENOSPC (No space left on device) (INJECTED)
That when=3+ means "let the first two openat calls succeed, then fail every one after with ENOSPC." You're doing chaos engineering from userspace with zero code changes. Test how your app handles disk-full conditions, permission errors, or network timeouts without actually creating those conditions.
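To sweep through several error codes in one go, a small wrapper is enough. This is a sketch under stated assumptions: inject_errors, STRACE, and ERRNOS are names invented for this example, not strace features, and it relies on strace exiting with the traced command's exit status.

```shell
#!/bin/sh
# Sketch: run one command under several injected errors and report how it exits.
# STRACE and ERRNOS are hypothetical knobs for this example, not strace features.
STRACE="${STRACE:-strace}"
ERRNOS="${ERRNOS:-ENOSPC EACCES EIO}"

# inject_errors SYSCALL WHEN COMMAND [ARGS...]
# e.g. inject_errors openat 3+ df -h
inject_errors() {
  syscall="$1"; when="$2"; shift 2
  for err in $ERRNOS; do
    # -o /dev/null discards the trace; only the command's own output remains.
    "$STRACE" -o /dev/null -e "fault=$syscall:error=$err:when=$when" "$@"
    echo "$err: exit status $?"
  done
}

# Only run when called with arguments, so sourcing the file has no side effects.
if [ "$#" -ge 3 ]; then
  inject_errors "$@"
fi
```

A command that exits 0 under every injected error is either handling failures gracefully or ignoring them, and that is worth finding out.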
Trick 6: Timestamps that actually help. Use -r for relative timestamps (time since previous syscall) or -T for per-call duration:
$ strace -T -e trace=connect curl -s https://example.com > /dev/null
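connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("93.184.216.34")}, 16) = 0 <0.024183>
That <0.024183> at the end is wall-clock time spent inside the connect syscall. On a blocking socket that is the TCP handshake itself, 24ms here, no tcpdump required. On a non-blocking socket, connect returns EINPROGRESS almost immediately (as in Trick 2) and the wait shows up in the poll or epoll_wait call that follows, so trace those too.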
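On a long trace you usually only care about the slow calls. The sketch below filters -T output by duration. slow_calls is a hypothetical helper that assumes each line ends in the <seconds> suffix that -T appends; the connect line is based on the text and the close line is made up.

```shell
#!/bin/sh
# slow_calls MIN_SECONDS (hypothetical helper): keep only the lines of
# `strace -T` output whose trailing <duration> is at least MIN_SECONDS.
slow_calls() {
  awk -v min="$1" '
    match($0, /<[0-9]+\.[0-9]+>$/) {
      # Strip the angle brackets to get the duration in seconds.
      d = substr($0, RSTART + 1, RLENGTH - 2)
      if (d + 0 >= min + 0) print
    }'
}

# Sample: the connect line is based on the text; the close line is made up.
slow_calls 0.01 <<'EOF'
connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("93.184.216.34")}, 16) = 0 <0.024183>
close(5) = 0 <0.000012>
EOF
```

With a 10ms threshold, only the connect line survives.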
With -c for profiling, %net/%file category filters, -z to hide failures, and fault injection via -e fault=, strace is a Swiss Army knife for understanding and testing any program's relationship with the kernel.
