2026-04-22
When your service calls another service, what happens when that dependency is down? Without protection, every request blocks for 30 seconds waiting for a timeout, your thread pool fills up, and your entire system crashes. One failing dependency takes down everything. This is called cascading failure, and the circuit breaker pattern exists to prevent it.
The pattern works much like an electrical circuit breaker. It has three states:

- **CLOSED** — normal operation. Requests pass through to the dependency, and failures are counted.
- **OPEN** — the breaker has tripped. Requests fail immediately without calling the dependency at all.
- **HALF_OPEN** — after a cooldown period, a single probe request is allowed through to test whether the dependency has recovered.
Here's a minimal implementation in pseudocode:
```
state = CLOSED, failureCount = 0, lastFailureTime = null

on each call:
    if state is OPEN:
        if now - lastFailureTime < cooldown: fail fast
        else: state = HALF_OPEN          # cooldown elapsed: let one probe through
    attempt the call
    on success: failureCount = 0, state = CLOSED
    on failure: failureCount += 1
                if failureCount >= threshold or state is HALF_OPEN:
                    state = OPEN, lastFailureTime = now
```
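The pseudocode above can be sketched as a small Python class. This is an illustrative single-threaded sketch, not any particular library's API; the names `CircuitBreaker`, `call`, and the injectable `clock` parameter are my own, and a production version would need locking around state transitions.

```python
import time

CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"

class CircuitBreaker:
    def __init__(self, threshold=5, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold   # consecutive failures before tripping
        self.cooldown = cooldown     # seconds to stay OPEN before probing
        self.clock = clock           # injectable clock, handy for testing
        self.failure_count = 0
        self.state = CLOSED
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == OPEN:
            if self.clock() - self.opened_at < self.cooldown:
                # Fail fast: don't touch the dependency while cooling down.
                raise RuntimeError("circuit open: failing fast")
            # Cooldown elapsed: allow one probe request through.
            self.state = HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.failure_count = 0
        self.state = CLOSED

    def _on_failure(self):
        self.failure_count += 1
        # A failed probe re-opens immediately; otherwise trip at the threshold.
        if self.state == HALF_OPEN or self.failure_count >= self.threshold:
            self.state = OPEN
            self.opened_at = self.clock()
```

With a fake clock you can walk the breaker through all three states: trip it with repeated failures, watch it fail fast while OPEN, then advance time past the cooldown and see a successful probe close it again.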
Real-world example: Your checkout service calls a payment gateway. The gateway starts returning 500 errors. Without a circuit breaker, every checkout attempt hangs for your full timeout (say 10 seconds), your users stare at a spinner, and your request queue backs up. With a circuit breaker configured at 5 failures in 60 seconds, after 5 failed payments the breaker opens. Subsequent checkout attempts instantly get a "Payment temporarily unavailable, please try again shortly" message. Your service stays responsive. After 30 seconds of cooldown, one request probes the gateway — if it's back, traffic resumes.
Rule of thumb for tuning: Set your failure threshold to timeout_seconds × normal_requests_per_second × 0.5. If your timeout is 5 seconds and you normally send 10 req/s, trip after roughly 25 failures. This prevents tripping on isolated errors while catching real outages within a few seconds. Start conservative (trip early) and loosen from there.
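The rule of thumb above is easy to encode. The function name `failure_threshold` is mine, not from any library; the `max(1, ...)` floor is an assumption so that very low-traffic services still get a sane value.

```python
def failure_threshold(timeout_seconds: float, requests_per_second: float) -> int:
    """Rule of thumb: trip after roughly half the requests that could be
    in flight during one timeout window have failed."""
    return max(1, round(timeout_seconds * requests_per_second * 0.5))
```

For the worked example in the text (5-second timeout, 10 req/s), this yields a threshold of 25 failures.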
A final practical note: most languages have battle-tested libraries, such as resilience4j (Java), Polly (.NET), opossum (Node.js), and pybreaker (Python). Use one instead of hand-rolling; they handle edge cases like concurrent state transitions and sliding-window failure counters.
