parallel: GNU parallel's Tricks That Aren't Just xargs -P

2026-05-22

Everyone discovers xargs -P and thinks they've solved parallelism. Then they hit a job where they need per-input output isolation, resume-from-failure, or remote execution across a cluster of boxes, and xargs starts looking like a toy. GNU parallel has been quietly doing all of this since 2010, and most people use maybe 5% of it.

The killer feature nobody talks about is --joblog combined with --resume-failed. Run a batch of 10,000 jobs, half fail because S3 throttled you, fix the throttling, and re-run exactly the failures:

parallel --joblog jobs.log --resume-failed -j 16 \
  './upload.sh {}' ::: files/*.tar.gz

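The joblog records each job's sequence number, host, start time, runtime, exit code, signal, and full command line. --resume skips every job already in the joblog, successful or not; --resume-failed re-runs the ones that exited non-zero, plus any that never started. Both match jobs by sequence number, so keep the command and the input identical between runs. xargs has nothing like this: you'd be writing your own state file.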
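To see which jobs --resume-failed will pick up, you can read the joblog yourself. Here's a minimal sketch in sh and awk. It assumes the usual tab-separated joblog columns (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command) and commands without embedded tabs. failed_jobs is a hypothetical helper, not part of parallel.

```sh
#!/bin/sh
# failed_jobs: hypothetical helper that prints the command of every failed job
# in a GNU parallel joblog. Assumes the tab-separated layout
#   Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
# and that no command contains a literal tab.
failed_jobs() {
    # NR > 1 skips the header row; a job failed if it exited non-zero ($7)
    # or was killed by a signal ($8)
    awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $9 }' "$1"
}
```

Run failed_jobs jobs.log before and after the retry to check that the list actually shrank.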

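By default, output from different jobs never interleaves. Run noisy commands in parallel without garbled stdout:

# --line-buffer streams whole lines as they arrive: lines from different jobs can mix, but never mid-line
parallel --line-buffer 'curl -s {} | grep -i error' :::: urls.txt
# Or fully grouped (the default): each job's output prints in one piece when that job finishes
parallel 'pytest tests/{}' ::: unit integration e2e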
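Grouping works by buffering: parallel holds each job's output (in temporary files, which is why it has a --tmpdir option) and prints it once the job is done. The sketch below imitates that idea in plain sh. It assumes the per-job command is a single command or shell function that takes one argument, and run_grouped is a hypothetical name. Unlike parallel, it prints everything in input order after all jobs finish, which is closer to parallel -k.

```sh
#!/bin/sh
# run_grouped CMD ARG...: hypothetical helper that runs CMD once per ARG in the
# background, buffers each job's stdout+stderr in its own temp file, and prints
# every buffer whole, in input order, after all jobs have finished.
run_grouped() {
    cmd=$1
    shift
    dir=$(mktemp -d)
    n=0
    for arg in "$@"; do
        n=$((n + 1))
        # Each job writes only to its own file, so jobs cannot garble each other
        "$cmd" "$arg" > "$dir/$n" 2>&1 &
    done
    wait
    i=1
    while [ "$i" -le "$n" ]; do
        cat "$dir/$i"    # one job's complete output, printed in one piece
        i=$((i + 1))
    done
    rm -rf "$dir"
}
```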

The ::: input syntax composes like a Cartesian product, which xargs cannot do at all:

# Try every combination of compiler × optimization × file
parallel '{1} -{2} {3} -o /tmp/{1}-{2}-{3/.}' \
  ::: gcc-11 gcc-12 clang-15 \
  ::: O0 O2 O3 \
  ::: src/*.c

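That's a 3 × 3 × N matrix expansion in one line. The replacement strings are surprisingly rich: {.} strips the extension, {/} strips the directory, {//} keeps only the directory, {#} is the job sequence number, and {%} is the job slot number, running from 1 to the -j value. Each also has a positional form, like {3/.} for the basename of the third input without its extension. Because no two running jobs share a slot, {%} is handy for pinning each job to its own GPU; subtract 1, since CUDA numbers devices from 0:

parallel -j 4 'CUDA_VISIBLE_DEVICES=$(({%} - 1)) python train.py {}' ::: configs/*.yaml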
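If the path replacement strings feel opaque, it can help to line them up against plain POSIX parameter expansion. This sketch assumes a path that contains a / and whose file name has an extension; parallel itself handles edge cases (no directory, dots in directory names) more carefully than these one-liners. show_replacements is a hypothetical helper.

```sh
#!/bin/sh
# show_replacements PATH: hypothetical helper printing plain-shell equivalents
# of GNU parallel's path replacement strings. Assumes PATH contains a / and the
# file name has an extension.
show_replacements() {
    f=$1
    printf '{}   %s\n' "$f"
    printf '{.}  %s\n' "${f%.*}"    # drop the last extension
    printf '{/}  %s\n' "${f##*/}"   # basename: drop everything up to the last /
    printf '{//} %s\n' "${f%/*}"    # dirname: keep everything before the last /
    b=${f##*/}
    printf '{/.} %s\n' "${b%.*}"    # basename without extension
}
```

For example, show_replacements photos/2024/IMG_01.jpg maps {.} to photos/2024/IMG_01 and {/.} to IMG_01.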

Remote execution is built in. If you have SSH access to a few machines, parallel will distribute work for you:

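parallel -S host1,host2,host3 --basefile process.sh --transferfile {} --return {}.out --cleanup \
  './process.sh {}' ::: data/*.bin

It rsyncs the script and each input to a remote host, runs the command there, copies {}.out back (this assumes process.sh writes its result to that name), and --cleanup deletes the transferred files on the remote afterwards. No Ansible, no Kubernetes: just a handful of flags.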

Progress and ETA you can actually read:

parallel --bar --eta 'convert {} {.}.webp' ::: photos/*.jpg
# stderr shows percent done, jobs done:left, and the estimated time remaining, roughly:
# 53% 663:584=15m32s
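A quick way to sanity-check an ETA: once jobs run at a steady rate, the time left is about jobs remaining × average seconds per job ÷ job slots. The sketch below is that back-of-envelope estimate, not necessarily the exact formula parallel uses internally; eta_seconds is a hypothetical helper.

```sh
#!/bin/sh
# eta_seconds LEFT AVG SLOTS: hypothetical helper giving a rough steady-state
# estimate of remaining seconds = LEFT * AVG / SLOTS, rounded to whole seconds.
eta_seconds() {
    left=$1 avg=$2 slots=$3
    # awk handles a fractional average; adding 0.5 before %d rounds to nearest
    awk -v l="$left" -v a="$avg" -v s="$slots" 'BEGIN { printf "%d\n", l * a / s + 0.5 }'
}
```

For example, 200 jobs left at 6 seconds each on 4 slots works out to about 300 seconds.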

The pitfall everyone hits: parallel buffers output to keep it clean, which means a stuck job has invisible output. Use --line-buffer when you want streaming progress, --tag to prefix each line with the input so you can tell jobs apart in interleaved mode, and --halt now,fail=1 to bail on first failure instead of grinding through 9,999 doomed jobs.

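One more trick: --pipe splits stdin into chunks and feeds each chunk to a separate job:

# Count ERROR lines in a giant log with 8 workers and 1 MB chunks, then sum the per-chunk counts
cat huge.log | parallel --pipe --block 1M -j 8 'grep -c ERROR' | awk '{s += $1} END {print s}'
# Chunks end on line boundaries, so no line is split or counted twice. This pays off when the
# per-chunk work is CPU-heavy; for a plain file, --pipepart -a huge.log skips the single pipe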

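It's the closest thing Unix has to map-reduce in a single tool. GNU parallel is one Perl script, so it's a package install (or a file copy) away on most boxes you ssh into; check parallel --version first, because moreutils ships an unrelated command with the same name.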
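To see what --pipe saves you from, here is the same map-reduce shape written by hand in sh: split the file into line-aligned chunks, run grep -c on every chunk concurrently, then sum. It's a sketch under loose assumptions: chunks are sized in lines (split -l) rather than bytes, every chunk starts at once instead of being capped like -j 8, and count_matches is a hypothetical name.

```sh
#!/bin/sh
# count_matches PATTERN FILE [LINES]: hypothetical helper.
# Map: split FILE into chunks of LINES lines and grep -c each chunk in the background.
# Reduce: sum the per-chunk counts. Unlike parallel -j, all chunks run at once.
count_matches() {
    pattern=$1
    file=$2
    lines=${3:-100000}
    dir=$(mktemp -d)
    split -l "$lines" "$file" "$dir/chunk."
    for chunk in "$dir"/chunk.*; do
        [ -e "$chunk" ] || continue    # empty input: split produced no chunks
        # grep -c prints 0 (and exits 1) for a chunk with no matches; the count is still valid
        grep -c -- "$pattern" "$chunk" > "$chunk.count" &
    done
    wait
    cat "$dir"/*.count 2>/dev/null | awk '{ s += $1 } END { print s + 0 }'
    rm -rf "$dir"
}
```

All that bookkeeping (temp dir, chunk files, wait, cleanup) is what --pipe folds into a single flag.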

Key Takeaway: GNU parallel is xargs with a joblog, Cartesian inputs, remote execution, and stdin chunking. Learn --joblog, :::, -S, and --pipe and you've replaced a dozen ad-hoc shell scripts.
