The Retry Pattern: Exponential Backoff and Jitter

2026-05-03

When a network call fails, your first instinct is to retry it immediately. This is almost always wrong. If a service is struggling under load and 500 clients all retry instantly, you've just doubled the traffic hitting an already-sick system. This is called a retry storm, and it can turn a minor blip into a full outage.

The fix is exponential backoff: each retry waits longer than the last. A common formula is delay = base * 2^attempt, with attempt counted from 0. With a 1-second base, the successive waits are 1s, 2s, 4s, 8s, 16s. This gives the downstream service breathing room to recover.
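To make the formula concrete, here is a minimal Python sketch. The function name backoff_delay and its parameters are illustrative, not taken from any library, and attempts are assumed to be counted from 0.

```python
def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Seconds to wait before retry number `attempt` (0-indexed)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # delay = base * 2^attempt
    return base * (2 ** attempt)


# Waits for the first five retries with a 1-second base.
delays = [backoff_delay(n) for n in range(5)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0]
```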

But there's a subtlety. If 500 clients all start at the same moment, they'll all retry after 1s, then all again 2s later, and they stay synchronized. This is where jitter comes in. You randomize each delay so clients spread their retries across time. The two common approaches:
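Two widely used variants are usually called full jitter and equal jitter; the sketch below assumes those are the approaches meant here. Full jitter picks a wait uniformly between zero and the exponential ceiling. Equal jitter keeps half of the ceiling fixed and randomizes the other half. The function names, the 60-second cap, and the rng parameter are illustrative choices.

```python
import random


def capped_backoff(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential ceiling for this attempt, never larger than `cap`."""
    return min(cap, base * (2 ** attempt))


def full_jitter(attempt, base=1.0, cap=60.0, rng=random):
    """Wait anywhere between 0 and the exponential ceiling."""
    return rng.uniform(0.0, capped_backoff(attempt, base, cap))


def equal_jitter(attempt, base=1.0, cap=60.0, rng=random):
    """Wait at least half the ceiling, plus a random share of the other half."""
    half = capped_backoff(attempt, base, cap) / 2.0
    return half + rng.uniform(0.0, half)


# A seeded generator keeps the demo repeatable; real clients would not seed.
rng = random.Random(42)
for attempt in range(5):
    print(attempt, round(full_jitter(attempt, rng=rng), 2),
          round(equal_jitter(attempt, rng=rng), 2))
```

Full jitter spreads clients out the most but can produce very short waits. Equal jitter guarantees a minimum pause at the cost of a narrower spread.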

Here's a real-world example. Say your payment service calls a bank API that occasionally returns 503. Without backoff, a 200ms blip causes a retry storm that extends the outage to 30 seconds. With exponential backoff plus jitter, each client independently backs off, the bank recovers in under a second, and users barely notice.
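A client-side loop for a scenario like this might look roughly as follows. It is a sketch under stated assumptions: TransientError is a hypothetical stand-in for a retryable failure such as a 503, flaky_bank_call is a fake used only for the demo, and each wait is drawn uniformly between zero and the exponential ceiling (a scheme often called full jitter). The sleep function is injectable so the demo runs without actually pausing.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an HTTP 503."""


def call_with_retries(operation, max_retries=3, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random):
    """Call `operation`, retrying on TransientError with jittered backoff."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the failure to the caller
            ceiling = min(cap, base * (2 ** attempt))
            sleep(rng.uniform(0.0, ceiling))


# Demo: a fake call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_bank_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("503 Service Unavailable")
    return "ok"


waits = []  # record the waits instead of sleeping
result = call_with_retries(flaky_bank_call, sleep=waits.append)
print(result, len(waits))  # ok 2
```

Only errors known to be transient are retried here; anything else propagates immediately.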

Rule of thumb for max retries: calculate the total worst-case delay before giving up. With base=1s, 5 retries add up to a total wait of 1+2+4+8+16 = 31 seconds. For user-facing requests, 3 retries (about 7s in total) is usually the limit before you should fail and show an error. For async background jobs, 5-8 retries with a cap of 60s between attempts is reasonable.
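The arithmetic is easy to check in code. This sketch sums the un-jittered waits, optionally applying a per-attempt cap; worst_case_total is an illustrative name, and the figure counts only time spent waiting, not the time the requests themselves take.

```python
from typing import Optional


def worst_case_total(retries: int, base: float = 1.0,
                     cap: Optional[float] = None) -> float:
    """Sum of the waits before giving up, ignoring jitter and request time."""
    total = 0.0
    for attempt in range(retries):
        delay = base * (2 ** attempt)
        if cap is not None:
            delay = min(delay, cap)  # no single wait exceeds the cap
        total += delay
    return total


print(worst_case_total(5))          # 31.0
print(worst_case_total(3))          # 7.0
print(worst_case_total(8, cap=60))  # 1+2+4+8+16+32+60+60 = 183.0
```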

Critical details people forget:

Most HTTP client libraries and cloud SDKs have built-in retry with backoff — AWS SDK, gRPC, and Axios retry plugins all support it. Don't hand-roll this unless you have a reason to. Configure what's already there.

See it in action: Check out "Understanding Exponential Backoff: A Comprehensive Guide to Efficient Retries" by Muhammad Daif to see this theory applied.

Key Takeaway: Always pair retries with exponential backoff and jitter. Immediate retries on failure feel intuitive but amplify outages instead of recovering from them.
