Tool Nobody Knows: flock: Mutex Locks That Survive in Shell Scripts

2026-05-05

You write a backup script. You drop it in cron. Six months later, the database is so big the script takes 90 minutes, but cron fires it every hour. Now you have two — then three — copies racing each other, corrupting the tarball, and OOMing the box at 3am.

Junior fix: a PID file with echo $$ > /tmp/lock and a trap. It mostly works. It also leaks stale locks when a process is kill -9'd, races on creation, and has subtle bugs you'll discover in production.

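The right answer has shipped in util-linux for well over a decade: flock(1). It wraps the flock(2) system call, whose advisory locks the kernel releases automatically when the holding process dies. Advisory means the lock only excludes other processes that also ask for it, which is exactly what cooperating copies of one script do. No stale locks. No races. Roughly 8KB of C.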

The cron pattern

# Fire hourly, but run at most one instance. If the previous run is still going, exit silently.
0 * * * * flock -n /var/lock/backup.lock /usr/local/bin/backup.sh

-n means non-blocking: if the lock is held, exit immediately with status 1. No queue buildup, no log spam.
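
To see the -n behavior without waiting for cron, here is a minimal sketch that holds a lock from the current shell and then tries to take it again. The lock path from mktemp and the choice of fd 9 are assumptions made for the demo, not part of the cron setup above.

```shell
#!/bin/sh
# Demo only: the lock path is a throwaway file from mktemp.
lock=$(mktemp)

# Hold an exclusive lock from this shell on fd 9 (any free fd would do).
exec 9>"$lock"
flock -x 9

# A second, independent attempt with -n gives up immediately.
contended_status=0
flock -n "$lock" true || contended_status=$?
echo "while held: exit $contended_status"

# Closing fd 9 releases the lock; the same command now succeeds.
exec 9>&-
free_status=0
flock -n "$lock" true || free_status=$?
echo "after release: exit $free_status"

rm -f "$lock"
```

With util-linux flock this should report exit 1 while the lock is held and exit 0 once it is released.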

Inside a script (subshell form)

#!/bin/bash
(
  # Wait up to 10 seconds for an exclusive lock on fd 200, then give up.
  flock -x -w 10 200 || { echo "locked, giving up" >&2; exit 1; }
  # critical section: only one process can be here
  rsync -a /data/ /backup/
) 200>/var/lock/sync.lock

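File descriptor 200 is opened by the redirection; flock locks that fd. When the subshell exits, the fd closes and the kernel drops the lock, even on kill -9, panic, or power loss. The lock file itself stays on disk, which is harmless: an unlocked file blocks nobody. One caveat: a child that inherits fd 200 and outlives the subshell, such as a backgrounded job, keeps the lock held until it exits too.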
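
The kill -9 claim is easy to check on a scratch file. The sketch below is an illustration under assumptions: the lock file comes from mktemp, fd 9 stands in for fd 200, and the holder does an exec into sleep so that exactly one process owns the descriptor.

```shell
#!/bin/sh
# Demo only: throwaway lock file; fd 9 stands in for the article's fd 200.
lock=$(mktemp)

# Holder: take the lock, then become a long sleep so one process owns the fd.
( flock -x 9; exec sleep 60 ) 9>"$lock" &
holder=$!

# Poll (up to about 5 seconds) until the holder actually has the lock.
tries=0
while flock -n "$lock" true && [ "$tries" -lt 50 ]; do
  tries=$((tries + 1))
  sleep 0.1
done

# The lock is now busy for everyone else.
held_status=0
flock -n "$lock" true || held_status=$?

# Kill the holder the hard way: no trap and no cleanup code gets to run.
kill -9 "$holder"
wait "$holder" 2>/dev/null || true

# The kernel released the lock when the process died.
after_kill_status=0
flock -n "$lock" true || after_kill_status=$?
echo "held: $held_status, after kill -9: $after_kill_status"

rm -f "$lock"
```

The lock file is still on disk after the kill; only the lock on it is gone, so the final flock -n succeeds.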

Self-locking script idiom

#!/bin/bash
# Re-exec self under flock if not already locked
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :
# ... rest of script runs with $0 locked ...

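The script uses itself as the lockfile. No /var/lock hygiene. No cleanup. Here -e is a synonym for -x (exclusive) and -n makes a second copy exit at once instead of queueing. The FLOCKER env var prevents infinite re-exec.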

Shared vs exclusive locks

Most people forget flock supports reader/writer semantics:

flock -s lockfile cat data.db    # shared (multiple readers OK)
flock -x lockfile rebuild-index  # exclusive (blocks readers)

Add -w 30 for a 30-second timeout, or -E 75 to set the exit code flock returns when -n cannot acquire the lock or -w times out (the default is 1). That lets your monitoring tell "lock held" apart from a real failure of the command.
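
A small sketch of how these options interact, using a throwaway lock file from mktemp and fd 8 as an arbitrary free descriptor (both are assumptions for the demo). It assumes the util-linux implementation of flock; stripped-down variants such as BusyBox may not support -w or -E.

```shell
#!/bin/sh
# Demo only: throwaway lock file; fd 8 is an arbitrary free descriptor.
lock=$(mktemp)

# Take a shared lock from this shell.
exec 8<"$lock"
flock -s 8

# Another reader gets in; a writer is refused.
reader_status=0
flock -n -s "$lock" true || reader_status=$?
writer_status=0
flock -n -x "$lock" true || writer_status=$?

# With -E, "lock busy" gets its own exit code, for -n and for a -w timeout.
busy_status=0
flock -n -E 75 "$lock" true || busy_status=$?
timeout_status=0
flock -w 1 -E 75 "$lock" true || timeout_status=$?

# Release the shared lock; the command now runs and its own status comes back.
exec 8<&-
command_status=0
flock -n -E 75 "$lock" sh -c 'exit 3' || command_status=$?

echo "$reader_status $writer_status $busy_status $timeout_status $command_status"
rm -f "$lock"
```

The last case is the point of -E: a status of 75 can only mean the lock was busy, while any other nonzero status came from the command itself.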

Why this beats the alternatives

vs PID files: No stale-lock recovery code. The kernel handles death.
vs mkdir as mutex: mkdir works but leaks on kill -9. flock can't.
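vs lockfile(1) from procmail: flock is in util-linux, ships with virtually every Linux distribution, simpler semantics.
vs systemd timers with OnUnitActiveSec: works on any Linux box with plain cron, no systemd required. (The AIX relic in the corner is another story: flock(1) comes from util-linux, so non-Linux systems may not have it.)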

One subtle gotcha

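flock locks belong to the open file description, not to the file or the process. If your script opens the same lockfile twice in two redirections, the two descriptors are independent: locking the second one blocks (or fails with -n) against the lock your own script holds through the first. NFS is a separate caveat, since behavior depends on your client and server; for cross-host coordination, use a real lock manager. For everything on a single box, flock is the answer.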
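
The descriptor behavior can be shown in a few lines. In this sketch the lock file is a throwaway from mktemp and the fd numbers 7, 8, and 9 are arbitrary choices for the demo. Two separate opens of the same file conflict with each other, while a duplicated descriptor shares the lock with its original.

```shell
#!/bin/sh
# Demo only: throwaway lock file; fds 7, 8 and 9 are arbitrary.
lock=$(mktemp)

exec 8>"$lock"   # first open of the lock file
exec 9>"$lock"   # second, independent open of the same file
exec 7>&8        # duplicate of fd 8: same open file description

# Lock through fd 8: succeeds.
first_status=0
flock -n 8 || first_status=$?

# Lock through fd 9: refused, even though this same shell holds the lock.
second_status=0
flock -n 9 || second_status=$?

# Lock through fd 7: succeeds, because it shares the lock already held via fd 8.
dup_status=0
flock -n 7 || dup_status=$?

echo "fd 8: $first_status, fd 9: $second_status, fd 7 (dup of 8): $dup_status"

exec 7>&- 8>&- 9>&-
rm -f "$lock"
```

Without -n, the attempt on fd 9 would block forever on a lock the script itself holds, which is why one redirection per lockfile is the safer habit.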

Check it: flock --help. It's already installed. You just never read the man page.

Key Takeaway: Stop reinventing PID-file locking — flock gives you race-free, kernel-cleaned mutex locks in one line of shell.
