Daily Software Engineering: The Refresh-Ahead Cache Pattern: Reload Before You're Asked

2026-05-21

Most caches wait for a miss to do work. A user requests a key, the cache shrugs, and the application pays the full database round-trip while the user watches a spinner. Refresh-ahead flips that timing: predict that a hot key will be requested again soon, and reload it from the source before its TTL expires. The next read still hits a warm cache.

The mechanism is simple. When you fetch a cached value, check how close it is to expiry. If the remaining TTL is below some threshold — say 20% of the original TTL — kick off an asynchronous refresh in the background and return the still-valid cached value immediately. The user gets a fast response; the next user gets a freshly loaded entry.
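Here is a minimal sketch of that read path, assuming a small in-memory cache. The class name RefreshAheadCache, the loader callback, and the injectable clock are hypothetical names chosen for illustration, not any particular library's API.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class RefreshAheadCache:
    """Illustrative in-memory cache (hypothetical, not a real library's API)."""

    def __init__(self, loader: Callable[[str], Any], ttl: float,
                 threshold: float = 0.2,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader        # fetches a value from the source of truth
        self._ttl = ttl              # seconds an entry stays valid
        self._threshold = threshold  # fraction of the TTL that triggers a refresh
        self._clock = clock          # injectable so the timing can be tested
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._lock = threading.Lock()

    def _load_and_store(self, key: str) -> Any:
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def get(self, key: str) -> Any:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
        if entry is None or entry[1] <= now:
            # Miss or expired: plain cache-aside, the caller waits for the load.
            return self._load_and_store(key)
        value, expires_at = entry
        if expires_at - now < self._threshold * self._ttl:
            # Near expiry: reload in the background, serve the cached value now.
            threading.Thread(target=self._load_and_store, args=(key,),
                             daemon=True).start()
        return value
```

With ttl=300 and threshold=0.2, a read at an entry age of 250 seconds returns the cached value and starts a background reload. This sketch makes no attempt to deduplicate concurrent refreshes for the same key.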

Concrete example: A product catalog service caches pricing data with a 5-minute TTL. Under cache-aside, every 5 minutes the first unlucky request waits ~200ms for the database. With refresh-ahead and a 20% threshold (1 minute), once the entry's age crosses 4 minutes, the next read triggers an async reload. As long as a hot key is read at least once in that final minute, P99 latency stays flat because no user pays for the miss.

Rule of thumb for the refresh threshold: set it to at least source_latency_p99 / TTL, so a reload that starts at the threshold can finish before the entry expires, then round up to a sensible percentage. If your backing store's P99 is 300ms and your TTL is 60s, that's 0.5%, which leaves too narrow a window for any read to land in and trigger the refresh, so floor it at 10–20%. If the source is slow (say 3s) and the TTL is 30s, you need 10% just to have time to refresh before expiry.
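The rule of thumb can be written as a small helper. This is a sketch under stated assumptions: the function name refresh_threshold and the default floor of 10% are illustrative choices taken from the numbers above, and rounding to a "sensible percentage" is left to the caller.

```python
def refresh_threshold(source_p99_s: float, ttl_s: float, floor: float = 0.10) -> float:
    """Return the fraction of the TTL at which a refresh should start.

    Hypothetical helper: the floor default is an assumption, tune it per workload.
    """
    if ttl_s <= 0:
        raise ValueError("ttl_s must be positive")
    # Smallest window that still lets a P99 reload finish before expiry.
    minimum = source_p99_s / ttl_s
    # Never go below the floor, so reads have a realistic chance to land in the window.
    return max(minimum, floor)


print(refresh_threshold(0.3, 60))  # 0.005 is below the floor, so this prints 0.1
print(refresh_threshold(3, 30))    # prints 0.1
print(refresh_threshold(6, 30))    # prints 0.2
```

The floor matters more than the formula for fast sources; the formula only starts to dominate once the source's P99 is a sizeable fraction of the TTL.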

Where it shines:

Predictable hot keys: homepage data, leaderboards, feature flag configs, currency exchange rates.
Expensive recomputation: aggregations, ML model outputs, joins across multiple services.
Latency-sensitive paths: anywhere a single cache miss would breach your SLO.

Where it hurts:

Cold or sparse keys: refresh-ahead does nothing for a key nobody reads. You still pay the first miss.
Large keyspaces: refreshing every accessed key wastes backend capacity. Combine with access-frequency tracking so only genuinely hot keys refresh.
Stampedes: if 100 concurrent requests hit the threshold simultaneously, they may all trigger refreshes. Use a single-flight lock (one in-flight refresh per key) to deduplicate.
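The last two mitigations, access-frequency tracking and single-flight refreshes, can be sketched together as a gate that a cache consults before starting a background reload. RefreshGate, min_hits, and refresh_if_hot are hypothetical names, and resetting the hit count after each refresh is one possible policy among several.

```python
import threading
from collections import Counter
from typing import Callable, Set


class RefreshGate:
    """Decides whether a read may start a background refresh (illustrative only)."""

    def __init__(self, min_hits: int = 3):
        self._min_hits = min_hits          # reads needed before a key counts as hot
        self._hits: Counter = Counter()    # per-key read counts since the last refresh
        self._in_flight: Set[str] = set()  # keys with a refresh currently running
        self._lock = threading.Lock()

    def record_hit(self, key: str) -> None:
        with self._lock:
            self._hits[key] += 1

    def try_begin(self, key: str) -> bool:
        """Return True for at most one caller per key while a refresh is in flight."""
        with self._lock:
            if self._hits[key] < self._min_hits or key in self._in_flight:
                return False
            self._in_flight.add(key)
            return True

    def finish(self, key: str) -> None:
        with self._lock:
            self._in_flight.discard(key)
            self._hits[key] = 0  # assumption: the key must earn its next refresh


def refresh_if_hot(gate: RefreshGate, key: str, reload: Callable[[str], None]) -> bool:
    """Start one background reload if the key is hot and none is already running."""
    if not gate.try_begin(key):
        return False

    def run() -> None:
        try:
            reload(key)
        finally:
            gate.finish(key)  # release the slot even if the reload raises

    threading.Thread(target=run, daemon=True).start()
    return True
```

A read path would call record_hit on every cache hit and refresh_if_hot once the remaining TTL drops below the threshold. If 100 requests arrive together, only the caller that wins try_begin starts a reload; the rest return the cached value.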

The pattern complements — doesn't replace — cache-aside. You still need miss handling for cold keys and TTL as the safety net for staleness. Refresh-ahead is the optimization layered on top, paying small background costs to eliminate user-visible misses on the keys that matter most.

Key Takeaway: Refresh-ahead trades a little background work for predictable latency by reloading hot keys before they expire, so reads of those keys rarely wait on a miss.
