2026-05-21
Most caches wait for a miss to do work. A user requests a key, the cache shrugs, and the application pays the full database round-trip while the user watches a spinner. Refresh-ahead flips that timing: predict that a hot key will be requested again soon, and reload it from the source before its TTL expires. The next read still hits a warm cache.
The mechanism is simple. When you fetch a cached value, check how close it is to expiry. If the remaining TTL is below some threshold — say 20% of the original TTL — kick off an asynchronous refresh in the background and return the still-valid cached value immediately. The user gets a fast response; the next user gets a freshly loaded entry.
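The read path described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the RefreshAheadCache class, its loader callback, and the injectable clock parameter are hypothetical names introduced here, and the cache is a plain in-process dictionary rather than any particular caching library.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


class RefreshAheadCache:
    """Hypothetical in-process cache illustrating refresh-ahead on read."""

    def __init__(self, loader: Callable[[str], Any], ttl: float,
                 threshold: float = 0.2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader        # assumed: fetches the value from the source
        self._ttl = ttl              # entry lifetime in seconds
        self._threshold = threshold  # fraction of the TTL that triggers a refresh
        self._clock = clock          # injectable so the timing can be tested
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._refreshing: Set[str] = set()                # keys with a reload in flight
        self._lock = threading.Lock()

    def get(self, key: str) -> Any:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
        if entry is None or entry[1] <= now:
            # Cold or expired key: ordinary cache-aside miss, loaded synchronously.
            return self._load(key)
        value, expires_at = entry
        if expires_at - now < self._threshold * self._ttl:
            # Close to expiry: reload in the background, serve the cached value now.
            self._refresh_async(key)
        return value

    def _load(self, key: str) -> Any:
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def _refresh_async(self, key: str) -> None:
        with self._lock:
            if key in self._refreshing:
                return  # a refresh for this key is already running
            self._refreshing.add(key)

        def run() -> None:
            try:
                self._load(key)
            finally:
                with self._lock:
                    self._refreshing.discard(key)

        threading.Thread(target=run, daemon=True).start()
```

The _refreshing set keeps a burst of reads near expiry from starting more than one background reload per key. Synchronous misses are not deduplicated in this sketch, so concurrent requests for a cold key would each call the loader.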
Concrete example: A product catalog service caches pricing data with a 5-minute TTL. Under cache-aside, every 5 minutes the first unlucky request waits ~200ms for the database. With refresh-ahead and a 20% threshold (1 minute), once the entry's age crosses 4 minutes, the next read triggers an async reload. P99 latency stays flat because, as long as a hot key is read at least once during that final minute, no user pays for the miss.
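Rule of thumb for the refresh threshold: set it to at least source_latency_p99 / TTL, so that a refresh starting at the threshold can finish before the entry expires, then round up to a sensible percentage. If your backing store's P99 is 300ms and your TTL is 60s, that's 0.5%, which is too tight to be useful because a read has to land inside a 300ms window to trigger the refresh, so floor it at 10–20%. If the source is slow (say 3s) and TTL is 30s, you need at least 10% just to have time to refresh before expiry.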
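The rule of thumb is simple enough to write down as a helper. The function below is a sketch: refresh_threshold and its floor parameter are hypothetical names, and the 10% default is just the low end of the 10–20% range suggested above.

```python
def refresh_threshold(source_p99_s: float, ttl_s: float, floor: float = 0.10) -> float:
    """Return the refresh threshold as a fraction of the TTL.

    The raw value is the share of the TTL that one slow (P99) reload consumes.
    The floor (an assumed default of 10%) keeps the refresh window wide enough
    for reads to actually land in it.
    """
    if ttl_s <= 0:
        raise ValueError("ttl_s must be positive")
    raw = source_p99_s / ttl_s
    return max(raw, floor)


# 300ms source, 60s TTL: the raw value is 0.5%, so the floor applies.
fast_source = refresh_threshold(0.3, 60)

# 3s source, 30s TTL: the raw value is already 10%.
slow_source = refresh_threshold(3, 30)
```

Both calls come out at 0.10 for different reasons: the first is lifted by the floor, the second is the minimum needed to finish a reload before expiry. A result at or above 1.0 would mean the source is too slow for this TTL to be refreshed ahead at all.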
Where it shines:
Where it hurts:
The pattern complements — doesn't replace — cache-aside. You still need miss handling for cold keys and TTL as the safety net for staleness. Refresh-ahead is the optimization layered on top, paying small background costs to eliminate user-visible misses on the keys that matter most.
