2026-06-01
You get a Slack message at 9am: "the box was hammered at 3am, what happened?" You SSH in, run top, see normal load, and stare blankly. The moment is gone. Unless you installed atop — in which case the moment is sitting in /var/log/atop/, waiting for you to replay it.
atop is a top-like process monitor with one trick that changes everything: it runs as a daemon and, by default, writes a compact binary snapshot of CPU, memory, disk, network, and every process to disk every 10 minutes. Months of history fit in a few hundred MB. You can scrub through time like a video.
Install and enable the recorder:
sudo apt install atop
sudo systemctl enable --now atop # the recorder
sudo systemctl enable --now atopacct # process accounting: catches short-lived procs
Replay yesterday's logfile:
atop -r /var/log/atop/atop_20260531
# t step forward one interval
# T step backward
# b jump to specific time (e.g. b 03:00)
# m d n c switch to memory / disk / network / commandline view
# C sort by CPU; M by memory; D by disk (p groups processes by program name)
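The dated filename in the replay command can be derived rather than typed. This is a minimal sketch, assuming GNU date and the /var/log/atop/atop_YYYYMMDD layout shown above; atop_log_for and ATOP_LOGDIR are hypothetical names introduced here, not part of atop.

```shell
# Hypothetical helper: print the atop logfile path for a given day.
# Assumes GNU date (for -d) and the atop_YYYYMMDD naming used above.
atop_log_for() {
    # $1: any date string GNU date understands (default: yesterday)
    # ATOP_LOGDIR is an override invented for this sketch; atop does not read it.
    day=$(date -d "${1:-yesterday}" +%Y%m%d) || return 1
    printf '%s/atop_%s\n' "${ATOP_LOGDIR:-/var/log/atop}" "$day"
}

# Usage: atop -r "$(atop_log_for yesterday)"
```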
The killer feature: atop retains processes that died during the interval. When a runaway Python script gets OOM-killed at 3:07am and vanishes, normal monitoring loses it. With process accounting active, atop still lists it: the name appears in angle brackets, alongside its exit code or fatal signal number and the resources it used.
Need a window around the incident?
atop -r /var/log/atop/atop_20260531 -b 02:55 -e 03:15
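If you only know the incident time, the -b/-e pair can be computed from it. A rough sketch in plain POSIX shell arithmetic; atop_window is a hypothetical helper name, and it assumes a two-digit HH:MM input and a window that stays inside a single day's logfile.

```shell
# Hypothetical helper: print "-b HH:MM -e HH:MM" around an incident time.
atop_window() {
    # $1: incident time as HH:MM, $2: minutes of padding on each side (default 10)
    hh=${1%%:*}; mm=${1##*:}; pad=${2:-10}
    # Strip one leading zero so the shell does not read 08 or 09 as bad octal.
    center=$(( ${hh#0} * 60 + ${mm#0} ))
    start=$(( center - pad )); end=$(( center + pad ))
    # Clamp to the day covered by one logfile.
    [ "$start" -lt 0 ] && start=0
    [ "$end" -gt 1439 ] && end=1439
    printf '%s %02d:%02d %s %02d:%02d\n' \
        -b $(( start / 60 )) $(( start % 60 )) \
        -e $(( end / 60 )) $(( end % 60 ))
}

# Usage (left unquoted on purpose so the flags split into words):
#   atop -r /var/log/atop/atop_20260531 $(atop_window 03:07 12)
```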
The parseable mode is where scripting gets fun. Each category emits fixed-field records, so awk just works:
# Top memory hogs between 2-3am yesterday (field 8 = name, field 12 = resident size in KB)
atop -r /var/log/atop/atop_20260531 -b 02:00 -e 03:00 -P PRM \
| awk '$1=="PRM" && $12 > 500000 {print $5, $8, $12}' \
| sort -k3,3nr | head
# Sectors written per process during incident (field 8 = name, field 15 = sectors written)
atop -r /var/log/atop/atop_20260531 -b 03:00 -e 03:15 -P PRD \
| awk '$1=="PRD" {print $8, $15}' \
| sort | datamash -W -g 1 sum 2 | sort -k2,2nr
Categories you can request with -P: CPU MEM DSK NET for the system, PRC PRM PRD PRN for processes, and more. Combine them with commas: -P PRC,PRM,PRD.
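The per-process totals can also be computed without datamash. The sketch below sums one column per process name in awk alone. It assumes the record layout documented in the atop man page (label, host, epoch, date, time, interval, then PID and the name in parentheses), which puts the name in field 8. It also assumes names contain no spaces. atop_sum_by_name is a hypothetical helper, so check the field numbers against man atop for your version.

```shell
# Hypothetical helper: sum one numeric column of `atop -P` output per process name.
# Reads parseable records on stdin, prints "name<TAB>total", largest first.
atop_sum_by_name() {
    # $1: record label to keep (e.g. PRD)
    # $2: field number to sum (e.g. 15, sectors written in PRD records)
    awk -v label="$1" -v col="$2" '
        $1 == label { total[$8] += $col }   # SEP/RESET lines never match the label
        END { for (name in total) printf "%s\t%.0f\n", name, total[name] }
    ' | sort -t "$(printf '\t')" -k2,2nr
}

# Usage:
#   atop -r /var/log/atop/atop_20260531 -b 03:00 -e 03:15 -P PRD | atop_sum_by_name PRD 15
```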
For sar-style aggregate reports built from the same logfiles, there's atopsar:
atopsar -A -r /var/log/atop/atop_20260531 -b 02:00 -e 04:00
atopsar -c -r /var/log/atop/atop_20260531 # cpu only
atopsar -m -r /var/log/atop/atop_20260531 # memory only
Why this beats the alternatives for single-host postmortems:
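Tune /etc/default/atop: set LOGINTERVAL=60 for one snapshot per minute on critical boxes (older packages call the variable INTERVAL), then restart the atop service. Logs rotate daily, and LOGGENERATIONS controls how many days are kept. Disk cost: trivial. Forensic value when something breaks at 3am: enormous.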
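To get a feel for what a shorter interval costs, count snapshots rather than guess. A back-of-the-envelope sketch: atop_snapshots is a hypothetical helper, and the 28-day default retention is an assumption based on common packaging defaults, so substitute whatever your configuration actually keeps.

```shell
# Hypothetical helper: how many snapshots does a given interval produce?
atop_snapshots() {
    # $1: snapshot interval in seconds
    # $2: days of logs kept (default 28 is an assumed packaging default)
    per_day=$(( 86400 / $1 ))
    printf '%d per day, %d retained\n' "$per_day" $(( per_day * ${2:-28} ))
}

# Usage: atop_snapshots 600    # the 10-minute default
#        atop_snapshots 60     # one snapshot per minute
```

Going from 600 to 60 seconds multiplies the snapshot count, and roughly the log size, by ten.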
