Caching: Strategies, Layers & Invalidation
The full caching story: browser → CDN → app → DB. Cache-aside, write-through, write-behind. Redis vs Memcached. Cache stampede and thundering herd. How to design a cache that actually works in production.
What you'll learn
- Caching exists because the latency difference between RAM (100ns) and database (1-10ms) is 10,000-100,000x. At any meaningful scale, caching is structural, not optional.
- Use cache-aside as your default read pattern. Write event-driven DEL on every write, with TTL as a safety fallback. This gives near-zero stale windows without write-through complexity.
- "Cache invalidation is one of the two hard problems in CS" — because race conditions, missed invalidation paths, and stampedes make it genuinely hard. Always use DEL (not UPDATE) on writes to avoid race conditions.
- Cache stampede (thundering herd) occurs when a hot cache key expires and many requests hit the DB simultaneously. Prevent with: mutex locks for high-traffic keys, probabilistic early expiration, and cache warming for predictable traffic events.
- Choose Redis over Memcached for 95% of use cases: rich data structures, optional persistence, pub/sub for event-driven invalidation, and native cluster support. Add an in-process cache (Guava/Caffeine) as an L1 cache in front of Redis for the hottest < 1% of data.
Why Caching Exists: The Latency Problem
Caching exists because of a fundamental law of physics: accessing data from memory is orders of magnitude faster than accessing it from disk or over a network. The latency table that every FAANG engineer has memorized:
| Storage Level | Latency | Relative Speed | Technology Examples |
|---|---|---|---|
| CPU L1 cache | 1 nanosecond | 1x (baseline) | On-core CPU cache (SRAM) |
| CPU L2 cache | 4 nanoseconds | 4x slower | On-die CPU cache |
| RAM (in-process memory) | 100 nanoseconds | 100x slower | Application in-memory cache (Guava, Caffeine) |
| NVMe SSD | 100 microseconds | 100,000x slower | Local SSD storage |
| Redis (same datacenter) | 500 microseconds | 500,000x slower | Remote in-memory store |
| HDD (mechanical disk) | 10 milliseconds | 10,000,000x slower | Traditional spinning disk |
| Database query (with index) | 1-10 milliseconds | 1-10M slower | PostgreSQL, MySQL |
| Database query (no index) | 100-1000 milliseconds | 100M-1B slower | Full table scan |
| Cross-datacenter (US → EU) | 150 milliseconds | 150M slower | Multi-region replication |
The implication is striking: a database query (1ms) is 10,000 times slower than a RAM access (100ns). Redis (0.5ms) is still 5,000 times slower than in-process memory. This means putting data at the right cache level can reduce latency by 1,000-100,000x.
The Practical Impact of the Latency Table
A service doing 10,000 DB queries/sec at 5ms each uses ~50 server-seconds of wait time. If 99% of those queries are cacheable and you add Redis (0.5ms), you reduce wait time from 50 server-seconds to 5.45 server-seconds — a 9x improvement. This means you can serve the same load with 9x fewer servers, or serve 9x more traffic on the same infrastructure. Caching is the single highest-ROI performance optimization in most web systems.
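The wait-time arithmetic can be checked directly. This is a quick sanity-check script (hit rate and latencies as stated in the scenario), not production code:

```python
# Check the wait-time math from the scenario above (latencies as stated).
qps = 10_000
db_ms, cache_ms = 5.0, 0.5
hit_rate = 0.99                    # fraction of queries served from cache

baseline = qps * db_ms / 1000      # server-seconds of wait per wall second
cached = qps * (hit_rate * cache_ms + (1 - hit_rate) * db_ms) / 1000

print(f"{baseline} -> {cached:.2f} server-seconds ({baseline / cached:.1f}x)")
```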
But caching introduces new problems: stale data, cache invalidation complexity, cold start (the "thundering herd"), and memory constraints. Understanding when each problem occurs — and how to prevent it — is what separates a naive cache implementation from a production-grade one.
The 4 Core Problems Caching Solves (and Introduces)
- Latency reduction — Cache serves data at memory speed (< 1ms) vs. disk/DB speed (1-100ms). Most visible benefit. Directly improves user experience.
- Throughput amplification — One DB handles 50K QPS. With 95% cache hit rate, that DB sees only 2,500 QPS — serving 20x the traffic with the same DB. This is the scaling superpower of caching.
- Compute cost reduction — Some queries are expensive to compute (aggregations, ML ranking, complex JOINs). Caching the result means computing once, serving many times.
- INTRODUCED: Stale data problem — Cached data is a snapshot at point-in-time. If the underlying data changes, the cache is wrong until invalidated. The cache invalidation problem is why Phil Karlton famously called it "one of two hard problems in CS."
The Caching Hierarchy: Browser → CDN → App → DB
There isn't one cache — there's a hierarchy of caches, each progressively closer to the user. Data flows from the source of truth (database) through multiple cache layers before reaching the user. Each layer has different characteristics: location, capacity, latency, and freshness.
graph LR
U[User Browser] -->|Cache-Control header
max-age=3600| BC[Browser Cache
Local storage
0ms, stalest]
BC -->|Cache miss| CDN[CDN Edge Node
1-50ms from user
GB capacity]
CDN -->|Cache miss| RP[Reverse Proxy Cache
Nginx/Varnish
at datacenter]
RP -->|Cache miss| AC[App Cache
Redis/Memcached
0.5ms, TB capacity]
AC -->|Cache miss| DB[(Database
Source of Truth
1-10ms, any size, freshest)]
style U fill:#4dabf7
style BC fill:#74c0fc
style CDN fill:#a5d8ff
style RP fill:#d0ebff
style AC fill:#e7f5ff
style DB fill:#fff
The caching hierarchy: closer to user = faster but less fresh. Further from user = slower but more fresh.
| Cache Layer | Location | Latency | Capacity | Who Controls Invalidation | TTL Range |
|---|---|---|---|---|---|
| Browser cache | User's device | 0ms (local) | 50-500MB | HTTP Cache-Control headers | Minutes to weeks |
| CDN (CloudFront, Akamai) | Edge PoP, 10-50ms from user | 5-50ms | Effectively unlimited | Cache-Control, CDN API invalidation | Minutes to years |
| Reverse proxy (Nginx, Varnish) | Datacenter edge | < 1ms | 10-100GB per node | Cache-Control, explicit purge | Seconds to minutes |
| Application cache (Redis) | Same datacenter as app | 0.5-2ms | 25 GB - 100 TB (cluster) | Application code explicitly | Seconds to hours |
| DB buffer pool | Database server RAM | < 0.1ms | % of DB instance RAM | Automatic (LRU by DB) | LRU-managed, no TTL |
HTTP Cache-Control: The Most Underused Cache Tool
Browser and CDN caches are controlled by HTTP headers. The most important: Cache-Control: max-age=3600, public — tells both browser AND CDN to cache for 1 hour. Cache-Control: max-age=0, private — tells CDN not to cache (user-specific), browser can cache. Cache-Control: no-store — nothing anywhere should cache this response. ETag: "abc123" — version identifier; browser sends If-None-Match header on next request; if same, server returns 304 Not Modified (no body — saves bandwidth). Most engineers don't set these headers, leaving cache behavior to defaults. Setting them correctly eliminates 50-80% of CDN origin requests for static assets.
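The ETag/304 flow can be sketched in a few lines. This is illustrative only: the function name and the hashing scheme are assumptions, but the status-code behavior matches the revalidation flow described above.

```python
import hashlib

def respond(body, if_none_match=None):
    """Sketch of ETag revalidation (function name and hashing are illustrative):
    return 304 with no body when the client's cached version is still current."""
    etag = '"%s"' % hashlib.md5(body).hexdigest()[:8]
    if if_none_match == etag:
        return 304, etag, b""        # Not Modified: no body, saves bandwidth
    return 200, etag, body           # full response plus the version tag

status, etag, _ = respond(b"<html>hello</html>")          # first request: 200
status2, _, body2 = respond(b"<html>hello</html>", etag)  # revalidation: 304
```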
The key insight about the hierarchy: when a request arrives at the browser, it checks its local cache first. Cache hit: sub-millisecond, doesn't even reach the network. Cache miss: request goes to CDN. CDN cache hit: 5-50ms, doesn't reach your servers. CDN cache miss: request reaches your origin, hits your reverse proxy, then Redis, then DB. Every layer hit saves the latency and compute of all subsequent layers. A CDN cache hit at 20ms saves you 0.5ms Redis + 5ms DB + network latency + server CPU — a total savings of > 50ms.
Cache-Aside (Lazy Loading) Pattern
Cache-aside is the most common caching pattern and should be your default. The application code explicitly manages the cache: before reading data, check the cache. On a hit, return from cache. On a miss, read from database, populate the cache, then return the data.
Cache-Aside Pattern: Read Path and Write Path
1. Read path (GET /user/123): Check Redis: GET user:123. If cache hit (< 1ms): return cached data immediately. If cache miss: query database: SELECT * FROM users WHERE id = 123 (5ms). Write result to Redis: SET user:123 {JSON} EX 3600 (1 hour TTL). Return data to client.
2. Write path (PUT /user/123): Update database: UPDATE users SET name=... WHERE id = 123. Delete cache entry: DEL user:123. (Alternative: update cache entry immediately, but delete is safer — avoids stale write race condition). Next read will populate the cache with fresh data.
3. Cache miss handling: On first request after cache eviction or cold start, the cache miss causes a database read. This is the cost of freshness. The TTL controls how often this happens: a 1-hour TTL means at most one DB read per cache key per hour.
4. Error handling: What if Redis is down? Cache-aside degrades gracefully: if the Redis GET fails, fall through to the database. This is the key advantage — the cache is not required for correctness, only for performance. Your system slows down when the cache fails, but doesn't break.
Cache-Aside Pseudocode
function getUser(userId):
    cacheKey = "user:" + userId
    value = redis.get(cacheKey)
    if value:
        return JSON.parse(value)                           // cache hit
    user = db.query("SELECT * FROM users WHERE id = ?", userId)
    if user:
        redis.setex(cacheKey, 3600, JSON.stringify(user))  // cache for 1 hour
    return user
// Result: DB only hit on cache miss (1/3600 of requests if TTL=1hr and each user is accessed roughly every hour)
sequenceDiagram
participant C as Client
participant App as App Server
participant Redis as Redis Cache
participant DB as Database
C->>App: GET /user/123
App->>Redis: GET user:123
alt Cache Hit
Redis-->>App: {name: "Alice", ...} (0.5ms)
App-->>C: 200 OK (total: ~2ms)
else Cache Miss
Redis-->>App: nil
App->>DB: SELECT * FROM users WHERE id=123 (5ms)
DB-->>App: {name: "Alice", ...}
App->>Redis: SET user:123 {JSON} EX 3600
App-->>C: 200 OK (total: ~8ms)
end
Cache-aside: cache hit serves in ~2ms. Cache miss falls through to DB at ~8ms and populates cache.
Cache-Aside: Pros and Cons
- Pro: Resilient to cache failure — Cache is on the hot path but not required. If Redis goes down, requests hit the database. Slow, but not broken. No data loss.
- Pro: Only caches requested data — Lazy loading means only data that's actually requested gets cached. The cache fills organically with hot data, not all data.
- Con: Cache miss penalty — First request after cache eviction is slow (DB read + Redis write). Under cold start or after a cache clear, all requests hit the DB simultaneously — the thundering herd problem.
- Con: Potential stale data window — Between a DB update and cache invalidation, readers may see stale data. The window is: time between update and DEL key (typically milliseconds in well-written code, but can be longer if DEL fails).
Write-Through and Write-Behind Caching Patterns
Cache-aside handles reads. But what about writes? There are three write strategies, each with different consistency and performance trade-offs.
The Three Write Caching Patterns at a Glance
Write-through: write to cache AND database synchronously. Always fresh, slower writes. Write-behind (write-back): write to cache first, async flush to database. Fast writes, risk of data loss. Write-around: skip cache entirely on write, write directly to DB. Cache remains clean, but reads after writes will be cache misses.
| Pattern | Write Path | Read Path | Consistency | Write Latency | Data Loss Risk |
|---|---|---|---|---|---|
| Write-Through | Write cache + DB synchronously | Cache always has latest data | Strong | High (both writes must complete) | None (DB always has data) |
| Write-Behind | Write cache, async DB flush | Cache has latest, DB may lag | Eventual | Low (only cache write) | Risk if cache fails before flush |
| Write-Around | Write directly to DB, skip cache | First read = DB (cache miss), subsequent = cache | Strong (DB = truth) | Medium (DB write) | None |
| Cache-Aside + Delete | Write DB, then delete cache key | Next read repopulates cache | Slightly eventual (brief stale window) | Low (delete is fast) | None |
When to Use Each Pattern
- Cache-aside + delete (most common) — Use for most CRUD applications. Write to DB, delete the cache key. Simple, resilient, eventually consistent. Perfect for user profiles, product info, any data that's read far more than written.
- Write-through — Use when readers expect immediate consistency: "I just updated my profile — I should see the change immediately." Write-through ensures the cache is always fresh. Cost: every write hits both DB and cache, so write latency increases.
- Write-behind — Use for high-write-throughput workloads where some data loss is acceptable: analytics events, clickstream data, metrics aggregation. Buffer writes in Redis, flush to DB in batches. 1,000 DB writes become 1 bulk insert. Risk: if Redis fails before flush, you lose the buffered writes.
- Write-around — Use for data that is written once and rarely read back immediately: log files, archive data, batch-imported records. Prevents cache pollution with data that users won't re-read shortly after writing.
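The write-behind buffering described above can be sketched without a server. Here a plain list stands in for the Redis buffer and db_bulk_insert is a hypothetical DB helper; a real implementation would flush asynchronously and handle failures:

```python
# Write-behind sketch: a plain list stands in for a Redis list (the buffer),
# and db_bulk_insert is a hypothetical DB helper. Illustrative, not durable.
flushed_batches = []

def db_bulk_insert(batch):
    flushed_batches.append(batch)    # one bulk INSERT instead of many writes

buffer = []

def record_event(event, flush_threshold=1000):
    buffer.append(event)             # fast path: memory write only
    if len(buffer) >= flush_threshold:
        db_bulk_insert(list(buffer)) # would be an async flush in a real system
        buffer.clear()

for i in range(2500):
    record_event({"click": i})
# 2500 events -> two bulk inserts of 1000; 500 still buffered (lost if the
# buffer dies before the next flush — the write-behind trade-off)
```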
The Write Race Condition: Why Order Matters
Dangerous pattern: write DB, then UPDATE cache (not delete). Two concurrent writers can cause: Writer A: DB update → Writer B: DB update → Writer B: cache update (with B's value) → Writer A: cache update (with A's old value, OVERWRITING B's correct value). Cache now has stale data indefinitely. Fix: always DELETE cache on write, never UPDATE. The next read will populate with fresh data. This "delete on write" pattern avoids race conditions entirely.
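The safe ordering can be shown in a few lines. Dicts stand in for the database and for Redis, and update_user is an illustrative name; the point is only the order of operations:

```python
# Safe write path sketch: write the DB first, then DELETE the cache key
# (never SET it). Dicts stand in for the database and for Redis.
db = {123: {"name": "Alice"}}
cache = {"user:123": {"name": "Alice"}}

def update_user(user_id, fields):
    db[user_id].update(fields)            # 1. database write first
    cache.pop(f"user:{user_id}", None)    # 2. DEL the key; the next read
                                          #    repopulates with fresh data

update_user(123, {"name": "Bob"})
```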
Cache Invalidation Strategies: The Hard Problem
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. This quote is famous because cache invalidation is genuinely hard. Getting it wrong means users see stale data or, worse, incorrect data.
Cache invalidation is the process of removing or updating cached data when the underlying source data changes. There are four strategies, each appropriate for different scenarios.
The 4 Cache Invalidation Strategies
- TTL (Time-To-Live) Expiration — Set a maximum lifetime on cached data. After TTL expires, the next read fetches fresh data and refreshes the cache. Pros: simple, automatic, no explicit invalidation code. Cons: data is always somewhat stale (up to TTL duration). Best for: product catalog, static pages, anything where "recent" is good enough.
- Event-Driven Invalidation — When data changes, explicitly delete the cache key. Code: after UPDATE users SET ..., do Redis.DEL("user:" + user_id). Pros: near-zero stale window (only between DB write and DEL). Cons: every write path must include cache invalidation. Easy to miss one update path. Best for: user profiles, anything where freshness matters.
- Cache Tags / Surrogate Keys — Tag cache entries with related identifiers. Invalidate all entries with a given tag when data changes. Example: product page cached with tags ["product:123", "category:electronics"]. When product 123 changes: invalidate all entries tagged "product:123". Varnish and Fastly CDN support this natively. Pros: efficient bulk invalidation. Cons: complex to implement correctly.
- Version-Based Invalidation — Include a version number in the cache key: "user:123:v5". When data changes, increment the version. Old key naturally expires by TTL. Pros: atomic — no invalidation race conditions. Cons: cache usage is higher (old versions linger until TTL). Best for: immutable-ish data (product images, CSS files — cache busting).
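Version-based invalidation can be sketched as follows. The version lives in its own key and the data key embeds it; a dict stands in for Redis and the key layout is illustrative:

```python
# Version-based invalidation sketch: bump the version on write, and readers
# derive the data key from it. A dict stands in for Redis.
cache = {}

def data_key(entity, entity_id):
    v = cache.get(f"{entity}:{entity_id}:version", 1)
    return f"{entity}:{entity_id}:v{v}"

def bump_version(entity, entity_id):
    key = f"{entity}:{entity_id}:version"
    cache[key] = cache.get(key, 1) + 1    # in Redis: INCR, which is atomic

cache[data_key("user", 123)] = {"name": "Alice"}   # stored under user:123:v1
bump_version("user", 123)                          # readers now ask for v2;
                                                   # v1 lingers until its TTL
```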
The Cache Invalidation Race Condition
Here's a subtle bug that causes stale data even with event-driven invalidation: T=0: Reader misses the cache and reads the DB (getting the old value). T=1: Writer updates the DB. T=2: Writer deletes the cache key. T=3: Reader writes the OLD value (read at T=0, before the update) into the cache. Now the cache holds stale data until TTL expires. Solution: use the "cache-aside + delete-on-write" pattern consistently. The window where stale data is cached is bounded by the time between your DB read and cache write (typically milliseconds). For critical data where this matters, use write-through or distributed locking.
sequenceDiagram
participant W as Writer
participant DB as Database
participant R as Redis
participant C as Reader
W->>DB: UPDATE users SET name="Bob" WHERE id=123
DB-->>W: OK
W->>R: DEL user:123
R-->>W: OK (key deleted)
C->>R: GET user:123
R-->>C: nil (key was deleted)
C->>DB: SELECT * FROM users WHERE id=123
DB-->>C: {name: "Bob"} ← fresh data
C->>R: SET user:123 {name:"Bob"} EX 3600
Note over W,C: Event-driven invalidation: stale window is
only the time between DB write and DEL
(typically < 1ms in practice)
Event-driven invalidation: delete cache key on write, next read fetches fresh data from DB
| Strategy | Freshness | Complexity | Best TTL | Use Case |
|---|---|---|---|---|
| TTL only | Stale by up to TTL | Low | 1 min - 1 hour | Product catalog, news articles, any data where "recent" is OK |
| Event-driven delete | Near-instant (< 1ms stale window) | Medium | 24 hours (fallback) | User profiles, account settings, anything user-visible |
| Cache tags | Instant (bulk invalidation) | High | Unlimited (tag-controlled) | CDN content, pages with complex dependencies |
| Version keys | Instant (old key abandoned) | Medium | Short (prevent bloat) | Static assets, CSS/JS files (cache busting) |
Redis vs. Memcached: The Deep Dive
Redis and Memcached are both in-memory key-value stores, but they have very different capabilities. Choosing the wrong one can limit your architecture or add unnecessary complexity.
| Dimension | Redis | Memcached |
|---|---|---|
| Data structures | Strings, Hashes, Lists, Sets, Sorted Sets, HyperLogLog, Streams, Bitmaps, Geospatial | Strings only (arbitrary byte arrays) |
| Persistence | RDB snapshots + AOF (append-only file). Survives restarts. | Memory only. All data lost on restart. |
| Replication | Built-in master-replica replication. Redis Cluster (automatic sharding). | No built-in replication. Third-party tools needed (mcrouter). |
| Pub/Sub messaging | Native PUBLISH/SUBSCRIBE commands. Used by Socket.io, chat systems. | Not supported. |
| Atomic operations | INCR, LPUSH, SADD, ZADD, etc. — all atomic. MULTI/EXEC transactions. | Only CAS (compare-and-swap). Limited atomic ops. |
| Lua scripting | Run Lua scripts server-side (atomic execution) | Not supported. |
| Memory efficiency | Higher overhead per key (data structure metadata) | Lower overhead per key (simpler storage format) |
| Throughput | ~1M ops/sec single node | ~1.5M ops/sec single node (simpler = faster) |
| Horizontal scaling | Redis Cluster: native sharding up to 1000 nodes | Client-side hashing (consistent hashing via client libraries) |
When to Choose Redis vs. Memcached
Choose Redis when: you need data persistence (cache that survives restarts), you need pub/sub messaging, you need sorted sets for leaderboards or rate limiting, you need atomic operations beyond simple CAS, or you want a single technology for both caching AND session storage AND pub/sub. Choose Memcached when: you only need a simple key-value cache with no persistence, you're optimizing for raw throughput (1.5M ops/sec vs 1M), you have a team already expert in Memcached, or you need multi-threading (Memcached is multi-threaded, Redis single-threaded per shard). In practice: Redis is the default choice for 90%+ of use cases. Memcached is typically only chosen for legacy reasons or extreme simplicity requirements.
Redis Data Structures and When to Use Each
- String (GET/SET) — Simple caching: user profiles, API responses, HTML fragments. SET user:123 {JSON} EX 3600. The most basic and most-used type.
- Hash (HSET/HGET) — Structured objects where you need partial updates. Instead of serializing an entire user object, store each field separately: HSET user:123 name "Alice" email "alice@..." Updated fields don't require re-reading the entire object.
- Sorted Set (ZADD/ZRANGE) — Leaderboards, rate limiting (score = timestamp), priority queues. ZADD leaderboard 9500 "user_A". ZRANGE leaderboard 0 9 WITHSCORES (top 10 players). Critical for any ranked list.
- List (LPUSH/RPOP) — Message queues (LPUSH + BRPOP = blocking queue), activity feeds (most recent N items). LPUSH user:123:feed {tweet_JSON}. LRANGE user:123:feed 0 99 (latest 100 feed items).
- Set (SADD/SMEMBERS) — Unique visitors, "who liked this post", tag management. SADD post:456:likes user_789. SCARD post:456:likes (count of likes). Set operations: SINTER (intersection), SUNION (union) for "friends who follow this artist."
- HyperLogLog — Approximate unique count with 0.81% error. Uses 12KB regardless of data size. PFADD daily-visitors user_123. PFCOUNT daily-visitors. Perfect for analytics: unique page visitors, unique searches.
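The sorted-set leaderboard semantics above can be sketched without a Redis server. A dict of scores stands in for the sorted set, with the equivalent Redis commands shown in comments:

```python
# Leaderboard semantics sketch: a dict of scores stands in for a Redis
# sorted set; the real commands are shown in comments.
scores = {}

def zadd(member, score):
    scores[member] = score        # Redis: ZADD leaderboard <score> <member>

def top_n(n):
    # Redis: ZREVRANGE leaderboard 0 n-1 WITHSCORES
    return sorted(scores.items(), key=lambda kv: -kv[1])[:n]

zadd("user_A", 9500)
zadd("user_B", 8700)
zadd("user_C", 9900)
```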
CDN Caching: Edge Nodes, Cache-Control Headers, and Cost Math
A CDN (Content Delivery Network) is a globally distributed cache. Requests are routed to the nearest "Point of Presence" (PoP) instead of your origin server. If the CDN has the content cached, it serves it immediately from 5-50ms away. If not, it fetches from origin, caches, and serves future requests from the edge.
Major CDN Providers and Their Strengths
Cloudflare: 285+ PoPs worldwide, DDoS protection built-in, WAF, free tier available. AWS CloudFront: tight AWS integration, S3 origin, Lambda@Edge for edge computing, pricing: $0.0085-$0.0120/GB depending on region. Fastly: real-time cache invalidation (< 150ms global purge), Varnish-based (highly configurable). Akamai: oldest CDN, enterprise-focused, 4,000+ PoPs, used by major media companies. For most applications on AWS: CloudFront. For edge computing requirements: Cloudflare Workers.
CDN Cache-Control Strategy for Different Content Types
1. Static assets (JS, CSS, images): Cache-Control: public, max-age=31536000, immutable. One year TTL. Use a content-based hash in the filename for cache busting (style.abc123.css). Never changes, so cache forever. The immutable hint tells the browser not to revalidate even on refresh.
2. API responses (public data): Cache-Control: public, max-age=60, s-maxage=300. Browser caches 60 seconds; CDN caches 5 minutes. Acceptable staleness for product catalog, news feeds. s-maxage is CDN-specific; max-age is browser-specific.
3. User-specific API responses: Cache-Control: private, max-age=0. Browser may cache for back/forward navigation; CDN must NOT cache. Never cache user-specific data (account info, personalized recommendations) at CDN level — the wrong user would see the wrong data.
4. Dynamic HTML pages: Cache-Control: no-cache (must revalidate with origin). Or: Cache-Control: public, max-age=60 with ETag. If the ETag is unchanged, the CDN returns 304 Not Modified (no body). Reduces bandwidth while ensuring freshness.
5. Streaming content: Cache-Control: no-store. Live video, real-time data, WebSockets — never cacheable. Bypass the CDN entirely for these paths.
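The per-content-type policy above reduces to a small lookup table. The policy strings are the ones discussed; the function name and category keys are illustrative:

```python
# Cache-Control policy lookup. The policy strings are the ones discussed
# above; the function name and category keys are illustrative.
POLICIES = {
    "static_asset": "public, max-age=31536000, immutable",
    "public_api":   "public, max-age=60, s-maxage=300",
    "user_api":     "private, max-age=0",
    "dynamic_html": "no-cache",
    "streaming":    "no-store",
}

def cache_control_for(content_type):
    # Fail closed: anything unclassified is not cached anywhere.
    return POLICIES.get(content_type, "no-store")
```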
CDN Cost Math: Why Getting Cache-Control Right Saves Real Money
Scenario: 10 million page views/day. Each page loads 50 assets (JS, CSS, images) averaging 100KB each. Without CDN: 10M × 50 × 100KB = 50TB/day of origin egress. At AWS egress pricing of $0.09/GB: $4,608/day = $1.68M/year. With CDN at 95% cache hit rate: only 5% hits origin = 2.5TB/day of origin egress = $230/day = $84,000/year. Savings: $1.6M/year. CDN cost at $0.01/GB × 50TB/day at edge: $500/day = $182,500/year. Net savings: $1.6M - $182.5K = $1.42M/year. Getting Cache-Control headers right (to maximize hit rate) is worth significant engineering investment.
Cache Stampede & Thundering Herd: The Cold Start Problem
The cache stampede (also called thundering herd) is one of the most dangerous failure modes in cached systems. It happens when many requests arrive simultaneously for data that is not in the cache — and all of them go to the database at once.
Scenario: a popular product page is cached. At 11:00 PM, the TTL expires. In the next 100ms, 10,000 requests arrive for this page (it's a flash sale countdown). All 10,000 miss the cache. All 10,000 hit the database simultaneously. The database, which normally handles 1,000 QPS, suddenly receives 100,000 QPS. It falls over. The website is down during the flash sale.
Twitter 2012: Cache Stampede on Trending Topics (Real Incident)
Twitter's trending topics caused a cache stampede in 2012. When a topic started trending, the cache key for that topic expired (or didn't exist yet) simultaneously across millions of users scrolling their timelines. All those users' requests queried the database for the same trending data at once — far more load than the DB could handle. The cascade: DB overloaded → query timeouts → timeline load failures → 503 errors → users retrying → more DB load. A 3-hour partial outage resulted. Post-mortem fix: probabilistic early expiration + prewarming. Twitter now regenerates trending cache before TTL expires, and uses mutex locks to prevent stampedes.
Three Solutions to Cache Stampede
- Mutex / Cache Lock — First request to find a cache miss acquires a distributed lock (Redis SET NX EX). Other requests wait. Lock holder queries DB, populates cache, releases lock. Other requests then see a cache hit. Downside: requests that arrive while the lock is held must wait, adding latency. Use for: highly popular data with expensive DB queries. Avoid for: queries that must return in < 10ms.
- Probabilistic Early Expiration (PER) — Don't wait for TTL to expire. Occasionally regenerate the cache early with some probability that increases as expiry approaches. Formula: probability_to_refresh = exp(-remaining_ttl / delta) where delta controls aggressiveness. When probability triggers, fetch fresh data and update cache proactively. Downside: occasional "early" DB reads even when cache is still valid. Use for: very high-traffic keys where any stampede is catastrophic.
- Cache Warming / Pre-Generation — Before cache expires (or before an expected traffic spike), proactively regenerate the cache entry. Background job runs every 50 minutes to refresh 1-hour TTL entries. For events: run cache warmer for expected popular pages 30 minutes before go-live. Netflix warms its recommendation cache before major show releases. Downside: requires a job scheduler and knowledge of what to pre-warm. Use for: predictable high-traffic events, page-level caching.
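The probabilistic early expiration formula above can be sketched directly. The delta value and behavior at the boundaries are illustrative; production variants (e.g. XFetch) also weight by recomputation cost:

```python
import math, random

def should_refresh_early(remaining_ttl, delta):
    """Probabilistic early expiration using the formula above:
    refresh probability = exp(-remaining_ttl / delta). delta tunes how
    aggressively refreshes happen before expiry."""
    return random.random() < math.exp(-remaining_ttl / delta)

# At expiry the probability reaches 1; far from expiry it is effectively 0.
# e.g. with delta=60: 5s remaining -> exp(-5/60) ~ 0.92 chance of refreshing.
```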
sequenceDiagram
participant R1 as Request 1
participant R2 as Request 2 (concurrent)
participant Redis as Redis
participant DB as Database
R1->>Redis: GET product:123
Redis-->>R1: nil (cache miss)
R1->>Redis: SET product:123:lock 1 NX EX 5
Redis-->>R1: OK (lock acquired)
R2->>Redis: GET product:123
Redis-->>R2: nil (cache miss)
R2->>Redis: SET product:123:lock 1 NX EX 5
Redis-->>R2: FAIL (lock already held)
R2->>R2: Wait 100ms, retry...
R1->>DB: SELECT * FROM products WHERE id=123
DB-->>R1: {name: "Widget", price: 29.99}
R1->>Redis: SET product:123 {JSON} EX 3600
R1->>Redis: DEL product:123:lock
R1-->>R1: Return to client
R2->>Redis: GET product:123
Redis-->>R2: {name: "Widget"} (cache hit!)
R2-->>R2: Return to client
Mutex pattern: first request acquires lock, populates cache. Others wait, then see cache hit.
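The lock flow in the diagram can be sketched with a dict and a set standing in for Redis (the real SET NX EX and DEL commands are shown in comments); retries and sleeps are simplified:

```python
# Mutex-per-key sketch: only the lock holder queries the DB on a miss.
cache, locks = {}, set()
db_calls = []

def load_product_from_db():
    db_calls.append(1)                       # the expensive query happens here
    return {"name": "Widget", "price": 29.99}

def get_with_lock(key, load, retries=3):
    for _ in range(retries + 1):
        if key in cache:                     # GET key
            return cache[key]
        lock = key + ":lock"
        if lock not in locks:                # SET key:lock 1 NX EX 5
            locks.add(lock)
            try:
                cache[key] = load()          # only the lock holder hits the DB
                return cache[key]
            finally:
                locks.discard(lock)          # DEL key:lock
        # lock held elsewhere: wait ~100ms and retry (sleep omitted here)
    return load()                            # gave up waiting: fall through

first = get_with_lock("product:123", load_product_from_db)
second = get_with_lock("product:123", load_product_from_db)   # cache hit
```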
Cache Design in Interviews: Sizing, Keys, Eviction, Monitoring
When cache design comes up in a FAANG interview, interviewers expect you to address four specific topics: cache sizing (how big?), key design (how do you structure keys?), eviction policy (what happens when cache is full?), and monitoring (how do you know it's working?). A complete answer covers all four.
The 4 Cache Design Dimensions
1. Cache sizing: Calculate working set = total dataset × hot_fraction (typically 20%). Multiply by serialization overhead (2.5x for Redis key + value + metadata). Divide by node capacity (25-50 GB per Redis node): nodes_needed = (working_set × 2.5) ÷ 30. Double for replication (one replica per primary). Example: 500K users × 2KB profile × 20% hot = 200 MB working set. 200MB × 2.5 / 25GB = 0.02 nodes → 1 node easily. At 500M users: 200 GB working set → 200 × 2.5 / 25 = 20 primaries (40 nodes with replication).
2. Cache key design: Keys must be unique, predictable, and invalidatable. Use namespaced keys: "entity:id:attribute". Examples: user:123:profile, product:456:details, feed:789:page:1. Include version if using version-based invalidation: user:123:profile:v7. Keep keys short — every byte matters in Redis memory. Avoid hot keys: if one key gets 1M QPS (Redis single-thread bottleneck), spread it across 10 keys with modulo.
3. Eviction policy: What happens when Redis runs out of memory? Policies: allkeys-lru (evict least recently used — best for caching). volatile-lru (only evict keys with TTL — if no TTL set, OOM error). allkeys-random (random eviction — bad). noeviction (return error when full — fine for persistent data, terrible for a cache). Default recommendation: allkeys-lru. This automatically evicts cold data to make room for hot data.
4. Monitoring metrics: Cache hit rate (hits / (hits + misses)) — target > 80% for most use cases, > 95% for feed/timeline caches. Eviction rate — if high, the cache is undersized. Connection count — if maxed out, you need connection pooling. Memory fragmentation ratio — Redis INFO memory shows mem_fragmentation_ratio; > 1.5 means fragmentation waste, restart or defrag. Latency percentiles — Redis should be < 1ms p99. Anything higher indicates memory swapping or CPU saturation.
The Cache Hit Rate Formula and What It Tells You
Hit rate = redis_cache_hits / (redis_cache_hits + redis_cache_misses). Redis CLI: redis-cli INFO stats | grep keyspace. Calculate: (keyspace_hits / (keyspace_hits + keyspace_misses)) × 100. Target hit rates by use case: User profile cache: > 70% (users re-requested frequently). Product catalog: > 90% (few products, many requests per product). Timeline/feed: > 95% (heavily read per user per session). Search autocomplete: > 80%. If hit rate is low: (1) TTL is too short — data expires before it's re-requested. (2) Key cardinality is too high — too many unique keys, each cached once. (3) Working set is too large for cache size — increase Redis memory.
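The formula is trivial to compute from the two INFO counters (a sketch; in production the inputs come from the `keyspace_hits` and `keyspace_misses` fields of `redis-cli INFO stats`):

```python
def hit_rate_pct(keyspace_hits: int, keyspace_misses: int) -> float:
    """Cache hit rate as a percentage; 0.0 when there is no traffic yet."""
    total = keyspace_hits + keyspace_misses
    return 100.0 * keyspace_hits / total if total else 0.0

print(hit_rate_pct(9_000_000, 1_000_000))  # 90.0
```

Guarding against a zero total matters in practice: a freshly restarted Redis reports zero hits and zero misses.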
graph TB
    Size[Cache Sizing] --> WS[Working Set = total data × 20%]
    WS --> Nodes[Redis Nodes = WS × 2.5 ÷ 25GB, ×2 for replication]
    Keys[Key Design] --> NS[Namespace:ID:field, e.g. user:123:profile]
    Keys --> Short[Keep short: every byte costs memory]
    Keys --> HotKey[Avoid hot keys: shard if > 100K QPS on one key]
    Evict[Eviction Policy] --> LRU[allkeys-lru: best for caches]
    Evict --> noev[noeviction: only for persistent data]
    Monitor[Monitoring] --> HitRate[Hit rate > 80%]
    Monitor --> EvictRate[Eviction rate < 1%]
    Monitor --> Latency[p99 latency < 1ms]

The 4 cache design dimensions: sizing, key design, eviction policy, monitoring
How this might come up in interviews
Caching is the most commonly discussed topic in system design interviews because it's required in every scaled system. Interviewers specifically test: (1) Can you explain the caching hierarchy? (2) Do you know cache-aside vs. write-through? (3) Can you explain the cache invalidation problem? (4) Do you know what a cache stampede is and how to prevent it? (5) Can you size a cache (working set × 2.5 / node capacity)? Strong candidates also mention monitoring (hit rate), eviction policy (allkeys-lru), and TTL design.
Common questions:
- L4: "What's the difference between cache-aside and write-through caching?" [Tests: basic pattern knowledge and ability to explain trade-offs]
- L4-L5: "You're caching user profiles with a 1-hour TTL. A user updates their profile. What problem occurs and how do you fix it?" [Tests: cache invalidation problem identification and event-driven solution]
- L5: "What is a cache stampede and when does it occur? How do you prevent it?" [Tests: thundering herd knowledge, mutex lock or probabilistic early expiration solutions]
- L5-L6: "Design the caching layer for Twitter's home timeline. How do you cache 300M users' personalized feeds?" [Tests: cache key design, fan-out vs pull, Redis sorted sets, working set calculation]
- L6: "What's the difference between Redis and Memcached? When would you choose each?" [Tests: depth of knowledge on data structures, persistence, pub/sub, cluster models]
- L6-L7: "Design a caching system that needs to handle 10 million cache keys with sub-millisecond invalidation globally across 3 regions." [Tests: CDN surrogate keys, Redis replication lag, eventual consistency of cache invalidation, global CDN purge APIs]
Key takeaways
- Caching exists because the latency difference between RAM (100ns) and database (1-10ms) is 10,000-100,000x. At any meaningful scale, caching is structural, not optional.
- Use cache-aside as your default read pattern. Write event-driven DEL on every write, with TTL as a safety fallback. This gives near-zero stale windows without write-through complexity.
- "Cache invalidation is one of the two hard problems in CS" — because race conditions, missed invalidation paths, and stampedes make it genuinely hard. Always use DEL (not UPDATE) on writes to avoid race conditions.
- Cache stampede (thundering herd) occurs when a hot cache key expires and many requests hit the DB simultaneously. Prevent with: mutex locks for high-traffic keys, probabilistic early expiration, and cache warming for predictable traffic events.
- Choose Redis over Memcached for 95% of use cases: rich data structures, optional persistence, pub/sub for event-driven invalidation, and native cluster support. Add an in-process cache (Guava/Caffeine) as an L1 cache in front of Redis for the hottest < 1% of data.
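The mutex-lock stampede prevention from the takeaways can be demonstrated with a per-key lock and a double-check after acquiring it (a self-contained simulation; `load_from_db` stands in for the real query):

```python
import threading
import time

cache: dict = {}
key_locks: dict = {}
registry_lock = threading.Lock()
db_loads = 0  # how many times the "database" was actually hit

def load_from_db(key: str) -> str:
    global db_loads
    db_loads += 1
    time.sleep(0.05)  # simulate a slow query
    return f"value-for-{key}"

def get_with_mutex(key: str) -> str:
    if key in cache:
        return cache[key]
    with registry_lock:  # one lock object per key
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:
        if key in cache:  # double-check: another thread may have rebuilt it
            return cache[key]
        cache[key] = load_from_db(key)
        return cache[key]

# 50 concurrent misses on the same hot key -> exactly one DB query
threads = [threading.Thread(target=get_with_mutex, args=("user:123:profile",))
           for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(db_loads)  # 1
```

Without the lock, all 50 threads would miss simultaneously and issue 50 identical DB queries; this is exactly the stampede scenario. In distributed setups the same idea uses a Redis `SET key value NX PX ttl` lock instead of an in-process mutex.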
Before you move on: can you answer these?
You're caching user profile data in Redis with a 1-hour TTL. A user updates their name. For up to 1 hour, other requests might see the old name. What are your options to fix this, and what are the trade-offs?
Option 1: Event-driven invalidation — on profile update, immediately DEL the cache key. Next request fetches fresh data. Pros: near-zero stale window (milliseconds). Cons: every write path must include the DEL; if DEL fails (Redis connection issue), stale data persists until TTL. Option 2: Write-through — on profile update, write the new data to both DB and Redis simultaneously. Pros: cache always has fresh data, no staleness. Cons: every write requires a Redis call (higher write latency, Redis becomes critical path). Option 3: Reduce TTL — use 5 minutes instead of 1 hour. Pros: simpler, no code changes. Cons: more DB load (12x more DB reads per cache entry). Option 4: Real-time notification + push invalidation — publish an event when profile updates; subscribers delete the cache key. Pros: decoupled. Cons: complex architecture. For most production apps: use option 1 (event-driven DEL) + keep the 1-hour TTL as a safety net for missed invalidations.
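Option 1 (cache-aside reads, event-driven DEL on writes, TTL as a safety net) fits in a few lines (a sketch with dicts standing in for Redis and the database):

```python
import time

class CacheAside:
    def __init__(self, db: dict, ttl_seconds: float = 3600):
        self.db = db
        self.ttl = ttl_seconds
        self._cache: dict = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                      # fresh cache hit
        value = self.db[key]                     # miss: read through to the DB
        self._cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def update(self, key, value):
        self.db[key] = value
        self._cache.pop(key, None)  # event-driven DEL: next read refetches

db = {"user:123:name": "Alice"}
cache = CacheAside(db)
cache.get("user:123:name")               # warms the cache
cache.update("user:123:name", "Alicia")  # write + invalidate
print(cache.get("user:123:name"))        # Alicia (no stale window)
```

Note the write path deletes rather than updates the cache entry, matching the DEL-not-UPDATE rule: a concurrent reader can never overwrite fresh data with a stale value it read before the write committed.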
Your interview question is: design a caching strategy for Twitter's Home Timeline. The feed contains the last 800 tweets from accounts the user follows. It changes constantly as new tweets are posted. How do you design the cache?
The Twitter home timeline is one of the most studied caching problems. Key requirements: (1) Each feed is personalized — user_id must be part of the cache key. (2) Feeds change constantly — new tweets are written every second. (3) Most users have < 2,000 follows — manageable fan-out. (4) Celebrities (10M+ followers) can't use naive fan-out. Design: Cache key: feed:user_id (e.g., feed:12345). Cache value: ordered list of the last 800 tweet IDs (just IDs, not full tweet data). When a user posts a tweet: fan-out to all followers' feed caches using LPUSH (add to front) + LTRIM (keep only 800 items). Tweet content is cached separately: tweet:tweet_id → full tweet JSON. On feed read: LRANGE feed:user_id 0 99 (get 100 tweet IDs), then pipeline GET tweet:tweet_id for each ID. TTL: Feed cache: 7 days (with event-driven LPUSH — no need to expire since it's kept fresh). Tweet cache: 24 hours (tweets rarely change after posting). Celebrity exception: for accounts with > 1M followers, skip fan-out. At read time, merge their recent tweets with the pre-computed fan-out feed. Cache hit rate target: > 99% for active users. Hit rate for 30-day-inactive users: irrelevant (let it expire).
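The fan-out-on-write design above can be simulated with bounded deques standing in for Redis lists (a sketch; `appendleft` on a `deque(maxlen=800)` plays the role of LPUSH + LTRIM):

```python
from collections import deque

FEED_LEN = 800
feeds: dict = {}   # user_id -> deque of tweet IDs, newest first (feed:user_id)
tweets: dict = {}  # tweet_id -> tweet body (tweet:tweet_id)

def post_tweet(tweet_id: int, body: str, follower_ids: list) -> None:
    tweets[tweet_id] = body
    for uid in follower_ids:  # fan-out: LPUSH + LTRIM per follower
        feeds.setdefault(uid, deque(maxlen=FEED_LEN)).appendleft(tweet_id)

def read_feed(user_id: int, page_size: int = 100) -> list:
    ids = list(feeds.get(user_id, ()))[:page_size]  # LRANGE feed:user_id 0 99
    return [tweets[i] for i in ids]                 # pipelined GETs in real Redis

post_tweet(1, "first!", follower_ids=[42])
post_tweet(2, "second", follower_ids=[42])
print(read_feed(42))  # ['second', 'first!'], newest first
```

The celebrity exception lives outside this sketch: accounts above the follower threshold skip `post_tweet`'s fan-out loop entirely, and their recent tweets are merged in at read time instead.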
Your Redis cache has a 90% hit rate, but during peak traffic, DB latency spikes. Explain why this might happen and how you diagnose and fix it.
The 10% cache miss rate at peak traffic is the culprit. At 100K QPS and 90% hit rate: 10K QPS hits the DB during normal traffic. If peak is 3x normal (300K QPS): 30K QPS hits the DB. If the DB's comfortable limit is 20K QPS, it saturates during peak. The math reveals: improving hit rate from 90% to 99% reduces DB load from 30K to 3K peak QPS — a 10x improvement without adding any DB infrastructure. Diagnosis steps: (1) redis-cli INFO stats | grep keyspace to confirm hit rate. (2) CloudWatch DB CPU and connection count to correlate DB saturation with traffic peaks. (3) Find the specific cache keys with low hit rates (Redis monitoring with CloudWatch Enhanced Monitoring for ElastiCache or keyspace notifications + sampling). Fixes by root cause: if hot data with short TTL → increase TTL. If popular queries not being cached → add caching to those code paths. If stampede during TTL expiry → add jitter. If DB itself is slow → add DB read replicas to handle the miss traffic. The goal: design for 99%+ hit rate on the hottest 20% of data.
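The arithmetic in that answer generalizes to a one-liner worth keeping handy (a sketch):

```python
def db_qps(traffic_qps: float, hit_rate: float) -> float:
    """Traffic that misses the cache is exactly what the database absorbs."""
    return traffic_qps * (1 - hit_rate)

print(round(db_qps(300_000, 0.90)))  # 30000, saturating a 20K-QPS database
print(round(db_qps(300_000, 0.99)))  # 3000, 10x headroom on the same DB
```

The asymmetry is what makes hit rate the highest-leverage metric: every additional "nine" of hit rate cuts DB load by roughly 10x, while adding DB replicas only scales it linearly.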
💡 Analogy
Caching is like keeping document copies on your desk instead of walking to the file room every time. Documents on your desk (in-process memory cache) = sub-millisecond. Documents in the filing cabinet in the same office (Redis, same datacenter) = half a second walk. Documents in the basement archive (database) = elevator ride, 5 minutes. The trick is knowing which documents to keep on your desk (hot data), how long a copy stays valid before it goes stale (TTL), and when to run back to the file room to get a fresh copy (cache invalidation). The documents on your desk have expiry dates — some go stale in an hour (news articles), some in a year (product images), some never go stale (static files).
⚡ Core Idea
Cache works by trading freshness for speed. The closer to the user, the faster (and staler) the cache. Every cache layer answers the question: "Is the performance gain worth the staleness risk for this data?" Cache-aside with TTL is the universal default; event-driven invalidation for freshness-sensitive data; write-through for critical data. The hard part is cache invalidation — use event-driven DEL on writes plus a TTL as a fallback to handle missed invalidations.
🎯 Why It Matters
At any meaningful scale, caching is not optional — it's structural. A database with 50K QPS capacity behind a service receiving 500K QPS will be destroyed without a cache. The cache hit rate determines your entire DB infrastructure cost: at 90% hit rate, your DB sees 50K QPS. At 99% hit rate, your DB sees 5K QPS — 10x less, meaning 10x cheaper DB infrastructure. That nine-percentage-point difference in hit rate determines whether you need a $500/month DB instance or a $5,000/month one.