hyperfine: Benchmarking CLI Commands Without Lying to Yourself

2026-04-29

You've done it. I've done it. We've all done it: time my_command, run it three times, squint at the numbers, and declare victory. This is not benchmarking. This is astrology with a terminal.

hyperfine is a command-line benchmarking tool written in Rust by David Peter (the same person who gave us fd and bat). It does what time does, except it also handles warmup runs, statistical outlier detection, multiple runs, shell spawning overhead calibration, and side-by-side comparison of commands. It turns vibes-based performance claims into actual data.

Install it:

# Ubuntu/Debian
apt install hyperfine

# macOS
brew install hyperfine

# Cargo
cargo install hyperfine

The simplest use — benchmark a single command with automatic run count:

$ hyperfine 'find . -name "*.py"'
Benchmark 1: find . -name "*.py"
  Time (mean ± σ):     124.3 ms ±   4.2 ms    [User: 38.1 ms, System: 85.9 ms]
  Range (min … max):   118.1 ms … 134.7 ms    23 runs

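By default, hyperfine performs at least 10 runs and keeps measuring for at least 3 seconds, so fast commands get more runs; use --runs, --min-runs, or --max-runs to override that. But the real power is head-to-head comparisons: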

$ hyperfine 'find . -name "*.py"' 'fd -e py'
Benchmark 1: find . -name "*.py"
  Time (mean ± σ):     124.3 ms ±   4.2 ms
Benchmark 2: fd -e py
  Time (mean ± σ):      18.7 ms ±   1.1 ms

Summary
  'fd -e py' ran 6.65 ± 0.43 times faster than 'find . -name "*.py"'

That "6.65 ± 0.43 times faster" line is what you paste in the pull request. No more hand-waving.
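
If you want to sanity-check that summary line, here is a minimal sketch of the arithmetic, assuming the two measurements are independent and using standard first-order error propagation for a quotient. The function name relative_speed is made up for illustration; this is not hyperfine's source code.

```python
import math


def relative_speed(mean_slow, sd_slow, mean_fast, sd_fast):
    """Return (ratio, uncertainty) for mean_slow / mean_fast.

    Assumes independent measurements, so the relative errors
    of the two means add in quadrature.
    """
    ratio = mean_slow / mean_fast
    rel_err = math.sqrt((sd_slow / mean_slow) ** 2 + (sd_fast / mean_fast) ** 2)
    return ratio, ratio * rel_err


# Inputs are the rounded values printed in the example output above.
ratio, err = relative_speed(124.3, 4.2, 18.7, 1.1)
print(f"{ratio:.2f} ± {err:.2f}")
```

With the rounded numbers from the example this prints 6.65 ± 0.45. A small gap from the summary line is expected, because the printed means and deviations are themselves rounded.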

Warmup runs matter when you're benchmarking anything that touches disk. The first run populates the page cache; subsequent runs hit warm cache. hyperfine lets you be explicit about how many untimed warmup runs come first:

$ hyperfine --warmup 3 'grep -r TODO src/'

Or if you want to benchmark cold cache performance, you can run a setup command before each timed run:

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
    'grep -r TODO src/'

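That --prepare flag runs before every single timed iteration, and its own run time is not counted. The drop_caches write is Linux-specific and needs root, hence the sudo. This is how you get honest cold-cache numbers instead of accidentally benchmarking your RAM.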

Parameterized benchmarks are where things get surgical. Want to test how your program scales with thread count?

$ hyperfine --parameter-scan threads 1 8 \
    'my_program --threads {threads} input.dat'

This runs the benchmark once for each integer value of {threads} from 1 to 8, inclusive. You can also use --parameter-list for non-numeric values:

$ hyperfine --parameter-list compiler gcc,clang \
    '{compiler} -O2 -o /dev/null main.c'
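
To turn a thread scan into a scaling table, one option is to add --export-json to the scan and post-process the file. The sketch below assumes each entry in the export carries a "parameters" map next to its "mean", and that parameter values may arrive as strings (hence the float() cast). The file name scan.json and both function names are hypothetical.

```python
import json


def scaling_table(doc, parameter="threads"):
    """Return [(value, mean_seconds, speedup_vs_smallest_value), ...].

    Rows are sorted numerically by the parameter value; casting to
    float avoids "10" sorting before "2" if values are strings.
    """
    rows = sorted(
        (float(r["parameters"][parameter]), r["mean"]) for r in doc["results"]
    )
    baseline = rows[0][1]  # mean time at the smallest parameter value
    return [(value, mean, baseline / mean) for value, mean in rows]


def load_scaling(path, parameter="threads"):
    """Read a hyperfine JSON export from disk and build the table."""
    with open(path) as f:
        return scaling_table(json.load(f), parameter)
```

Calling load_scaling("scan.json") then gives one row per thread count. A speedup that flattens out well below the thread count is usually the interesting part.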

Export results for downstream analysis or pretty charts:

$ hyperfine --export-json results.json \
    --export-markdown results.md \
    'sort bigfile.txt' 'sort --parallel=4 bigfile.txt'

The JSON export includes every individual run time, so you can feed it into your own plotting scripts. The Markdown export gives you a table ready to paste into GitHub issues.
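
As a starting point for such a script, here is a minimal sketch that recomputes basic statistics from the raw run times. It assumes the export has a top-level "results" list whose entries carry "command" and "times" (seconds per run). The function name summarize is made up for this example.

```python
import statistics


def summarize(doc):
    """Map each command to run count, median, mean and sample stdev,
    all computed from its raw per-run times (in seconds)."""
    summary = {}
    for result in doc["results"]:
        times = result["times"]  # one wall-clock duration per timed run
        summary[result["command"]] = {
            "runs": len(times),
            "median": statistics.median(times),
            "mean": statistics.fmean(times),
            # stdev needs at least two data points
            "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        }
    return summary
```

Load results.json with json.load and pass the parsed object in. The recomputed mean and stdev should land close to the values hyperfine printed, and the median is a handy extra when a few slow runs skew the mean.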

One subtle feature: hyperfine calibrates shell startup overhead by measuring the time to spawn an empty shell, then subtracts it. This means when you're benchmarking a 5ms command, you're not accidentally including 3ms of bash startup in your numbers. You can also bypass the shell entirely with --shell=none if your command needs no shell features.
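
The idea is easy to reproduce by hand. The sketch below is not hyperfine's implementation, just an illustration of the same principle under simple assumptions: a POSIX sh is on the PATH, and the mean is a good enough estimator. Both function names are hypothetical.

```python
import statistics
import subprocess
import time


def time_runs(argv, runs=10):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv, check=True,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def shell_corrected_mean(command, shell="sh", runs=10):
    """Return (corrected_mean, shell_overhead) for a shell command.

    The overhead is the mean time to spawn the shell with an empty
    command; it is subtracted from the raw mean and clamped at zero.
    """
    overhead = statistics.fmean(time_runs([shell, "-c", ""], runs))
    raw = statistics.fmean(time_runs([shell, "-c", command], runs))
    return max(raw - overhead, 0.0), overhead
```

For a command that takes seconds the correction is negligible. For one that takes a few milliseconds it can be a large share of the raw number, which is why skipping the shell altogether is often the cleaner choice.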

I use it constantly: comparing JSON parsers, testing whether a new index actually speeds up a query, verifying that a "performance optimization" actually optimizes performance. The number of times hyperfine has told me my "improvement" was within noise is humbling — and exactly the point.

Key Takeaway: Stop running time three times and eyeballing it — hyperfine gives you statistically sound benchmarks with warmup, cache control, and automatic comparison in a single command.
