System Design: Distributed Cache
Caching is the single most effective performance optimization in distributed systems. A well-placed cache can reduce database load by 90%, cut response times from 200ms to 5ms, and save thousands of dollars in infrastructure costs.
But distributed caching introduces its own class of problems: cache invalidation (famously one of the two hard problems in computer science), thundering herds, hot key issues, and consistency guarantees. Let's design a distributed cache that handles all of them.
requirements
Functional:
- Key-value storage with TTL (time-to-live) support
- Support for multiple data types (strings, hashes, lists, sorted sets)
- Cache-aside, write-through, and write-behind patterns
- Cluster mode with automatic sharding
- Pub/Sub for cache invalidation events
Non-functional:
- Sub-millisecond read latency (p99 < 1ms)
- Support 1M+ operations per second
- High availability with automatic failover
- Linear horizontal scalability
- Memory-efficient storage
caching patterns
Before designing the cache infrastructure, understand how applications interact with it.
Cache-Aside (Lazy Loading):
┌────────┐ 1. GET key ┌───────┐
│ App │──────────────▶│ Cache │
│ Server │◀──────────────│ │
│ │ 2. miss └───────┘
│ │
│ │ 3. query ┌───────┐
│ │──────────────▶│ DB │
│ │◀──────────────│ │
│ │ 4. result └───────┘
│ │
│ │ 5. SET key ┌───────┐
│ │──────────────▶│ Cache │
└────────┘ └───────┘
Write-Through:
App ──write──▶ Cache ──write──▶ DB (synchronous)
Write-Behind (Write-Back):
App ──write──▶ Cache ──async──▶ DB (batched, asynchronous)
Cache-aside is the most common pattern. The application checks the cache first, falls through to the database on miss, and populates the cache for next time. Simple, but the first request for every key hits the database.
Write-through keeps the cache and database in sync by writing to both on every update. Slower writes but the cache is always fresh.
Write-behind writes to cache immediately and flushes to the database asynchronously in batches. Fastest writes but you can lose data if the cache crashes before flushing.
For most applications, cache-aside with TTL is the right default.
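The cache-aside flow above fits in a few lines. A minimal in-process sketch: the dict stands in for Redis, and `db_fetch` and the `user:{id}` key format are illustrative assumptions, not a real schema.

```python
import time

cache = {}  # key -> (value, expires_at); in-memory stand-in for Redis

def db_fetch(user_id):
    # Placeholder for the real database query (assumption)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id, ttl=60):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]                           # steps 1-2: cache hit
    value = db_fetch(user_id)                     # steps 3-4: miss, query DB
    cache[key] = (value, time.monotonic() + ttl)  # step 5: populate for next time
    return value
```

The first call for a key pays the database round trip; every call within the TTL is served from the cache.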
consistent hashing
With multiple cache nodes, you need to decide which node stores which key. Naive hashing (hash(key) % N) breaks catastrophically when you add or remove a node — every key remaps, causing a thundering herd of cache misses.
Consistent hashing solves this. Map both nodes and keys onto a ring. Each key is stored on the first node clockwise from its position.
Consistent Hash Ring:
Node A (pos 0)
╱╲
╱ ╲
╱ ╲
Node D ╱ ╲ Node B
(pos 270) (pos 90)
╲ ╱
╲ ╱
╲ ╱
╲╱
Node C (pos 180)
Key "user:123" hashes to position 45
→ Stored on Node B (first node clockwise)
Key "session:456" hashes to position 200
→ Stored on Node D (first node clockwise)
Add Node E at position 135:
→ Only keys between 90-135 move from C to E
→ All other keys stay put
When you add a node, only K/N keys need to move (K = total keys, N = total nodes). Without consistent hashing, all K keys would need to move.
Virtual nodes: Real consistent hashing uses virtual nodes — each physical node gets 100-200 positions on the ring. This ensures even distribution. Without virtual nodes, the ring can be severely unbalanced.
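A minimal consistent hash ring with virtual nodes might look like this. The MD5 hash, 150 vnodes, and the `node#i` vnode naming are illustrative choices, not a standard.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=150):
        self.vnodes = vnodes
        self.ring = []                      # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def _pos(self, key):
        # Any stable hash works; MD5 is just an easy 128-bit choice
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Each physical node gets `vnodes` positions for even distribution
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._pos(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def get(self, key):
        # First virtual node clockwise from the key's position (wraps around)
        i = bisect.bisect(self.ring, (self._pos(key),)) % len(self.ring)
        return self.ring[i][1]
```

Adding a node only reassigns the keys that land in its new vnode ranges; everything else keeps mapping to the same node.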
eviction policies
When the cache is full, something has to go. The eviction policy determines what.
Eviction Policies:
┌─────────────┬──────────────────────────────────────┐
│ Policy │ How it works │
├─────────────┼──────────────────────────────────────┤
│ LRU │ Evict least recently used │
│ │ Good default for most workloads │
├─────────────┼──────────────────────────────────────┤
│ LFU │ Evict least frequently used │
│ │ Better for skewed access patterns │
├─────────────┼──────────────────────────────────────┤
│ TTL-based │ Evict expired keys first │
│ │ Predictable, application-controlled │
├─────────────┼──────────────────────────────────────┤
│ Random │ Evict random key │
│ │ Surprisingly good in practice │
├─────────────┼──────────────────────────────────────┤
│ allkeys-lru │ LRU across ALL keys                  │
│ │ Best when everything is cacheable │
└─────────────┴──────────────────────────────────────┘
Redis implements approximate LRU — it samples 5 random keys and evicts the least recently used among them. This is faster than true LRU (which requires a linked list) and nearly as accurate. Redis 4.0+ also supports LFU, which works better when some keys are accessed in bursts but shouldn't be evicted between bursts.
My recommendation: use allkeys-lru as the default (note that Redis ships with noeviction, so you must set maxmemory-policy explicitly). Set TTLs on everything. If you have keys that must never be evicted, use volatile-lru (only evict keys with a TTL set) and keep your permanent keys TTL-free.
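For intuition, exact LRU is a few lines with an ordered map. This sketch tracks true recency order; Redis approximates the same behavior by sampling instead, to avoid the bookkeeping.

```python
from collections import OrderedDict

class LRUCache:
    """Exact LRU; Redis approximates this by sampling rather than tracking order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```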
cache stampede prevention
The classic thundering herd: a popular cache key expires, 1,000 concurrent requests all miss the cache simultaneously, and all 1,000 hit the database at once.
Cache Stampede:
Time T: Key "popular-page" expires
│
┌───────────┼───────────┐
│ │ │
Req 1 Req 2 Req 3 (1000 requests)
│ │ │
▼ ▼ ▼
Cache Cache Cache
MISS MISS MISS
│ │ │
▼ ▼ ▼
┌─────────────────────────┐
│ DATABASE OVERLOADED │ ← 1000 identical queries
└─────────────────────────┘
Three solutions:
- Locking (mutex). When the first request gets a cache miss, it acquires a distributed lock (Redis SETNX). Other requests wait or return stale data. Only one request hits the database.
Cache miss → try SETNX lock_key → success?
YES → query DB → set cache → release lock
NO → wait 50ms → retry cache lookup
- Early expiration (probabilistic). Before a key expires, some requests proactively refresh it. Each request checks: if TTL < 30 seconds AND random() < 0.1, refresh the cache in the background. This spreads the refresh load.
- Never expire (background refresh). Set TTL to infinity. A background job refreshes keys before they go stale. This is the best approach for keys you can identify in advance (homepage, popular product pages).
I'd use locking for most cases and background refresh for the known hot keys.
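The locking approach can be sketched as an in-process single-flight guard. This uses `threading.Lock` as a stand-in for a distributed Redis SETNX lock, so it only illustrates the re-check-after-acquire pattern, not real cross-server coordination.

```python
import threading
import time

cache = {}                       # key -> (value, expires_at)
locks = {}                       # per-key locks; stand-in for distributed SETNX
locks_guard = threading.Lock()   # protects the locks dict itself

def get_or_compute(key, compute, ttl=60):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another caller may have filled the cache while we waited
        entry = cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        value = compute()                             # only one caller hits the DB
        cache[key] = (value, time.monotonic() + ttl)
        return value
```

Under concurrent misses, exactly one caller computes; the rest block briefly, then find the value on the re-check.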
hot key problem
Even with consistent hashing, some keys receive disproportionate traffic. A viral tweet, a flash sale product, a celebrity's profile — one key can saturate a single cache node.
Hot Key Mitigation:
Normal: Key → Node A (100% traffic)
With replicas:
Key_r1 → Node A (33% traffic)
Key_r2 → Node B (33% traffic)
Key_r3 → Node C (33% traffic)
Client reads from random replica.
Solution 1: Client-side local cache. Each app server caches the hottest keys in memory (L1 cache). TTL of 1-5 seconds. This absorbs the majority of reads before they hit the distributed cache (L2).
Solution 2: Key replication. Append a random suffix (1-3) to the key. Store 3 copies on different nodes. Read from a random copy. This spreads the load across 3 nodes instead of 1.
Solution 3: Read replicas. Redis supports read replicas. Route hot key reads to replicas, writes to the primary.
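Solution 2 (key replication) is a thin wrapper around suffixed keys. A sketch assuming a dict-like cache client; the replica count and `:rN` suffix format are illustrative choices.

```python
import random

REPLICAS = 3

def write_hot_key(cache, key, value):
    # Write every suffixed copy; each hashes to a different node in a cluster
    for i in range(1, REPLICAS + 1):
        cache[f"{key}:r{i}"] = value

def read_hot_key(cache, key):
    # Each read picks a random copy, spreading the load across nodes
    return cache.get(f"{key}:r{random.randint(1, REPLICAS)}")
```

The trade-off: writes fan out to all copies, and a failed partial write can leave replicas briefly inconsistent, so this is best reserved for read-heavy hot keys.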
Redis vs Memcached
The two most common distributed cache implementations. Here's when to use each.
┌──────────────┬───────────────────┬───────────────────┐
│ Feature │ Redis │ Memcached │
├──────────────┼───────────────────┼───────────────────┤
│ Data types │ Strings, hashes, │ Strings only │
│ │ lists, sets, │ │
│ │ sorted sets, │ │
│ │ streams, HyperLL │ │
├──────────────┼───────────────────┼───────────────────┤
│ Persistence │ RDB + AOF │ None │
├──────────────┼───────────────────┼───────────────────┤
│ Replication │ Primary-replica │ None (client-side) │
├──────────────┼───────────────────┼───────────────────┤
│ Clustering │ Built-in (Redis │ Client-side only │
│ │ Cluster) │ │
├──────────────┼───────────────────┼───────────────────┤
│ Threading │ Single-threaded │ Multi-threaded │
│ │ (6.0+ I/O threads)│ │
├──────────────┼───────────────────┼───────────────────┤
│ Memory │ Higher overhead │ Lower overhead │
│ efficiency │ per key │ (slab allocator) │
├──────────────┼───────────────────┼───────────────────┤
│ Best for     │ Complex data,     │ Simple key-value, │
│              │ pub/sub, queues,  │ max throughput,   │
│              │ persistence needs │ large working sets│
└──────────────┴───────────────────┴───────────────────┘
Choose Redis when you need data structures (sorted sets for leaderboards, pub/sub for invalidation, streams for event processing) or persistence. Choose Memcached when you need pure key-value caching with maximum memory efficiency and your data fits the string-only model.
In practice, Redis wins 90% of the time because the richer data types are too useful to give up. Memcached still dominates at Facebook-scale where memory efficiency on simple key-value workloads saves millions in infrastructure.
cache invalidation strategies
The hardest part. When the source data changes, how do you ensure the cache reflects it?
TTL-based: Set a TTL and accept stale data for the duration. Simple, works for most cases. A 60-second TTL means data is at most 60 seconds stale.
Event-driven: When data changes, publish an invalidation event. Cache subscribers delete the stale key. This is more complex but gives you near-zero staleness.
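The event-driven flow can be illustrated with an in-process stand-in for the pub/sub channel; a real system would use Redis PUBLISH/SUBSCRIBE, and the handler shape here is an assumption.

```python
subscribers = []        # stand-in for a Redis pub/sub channel (assumption)

def subscribe(handler):
    subscribers.append(handler)

def publish_invalidation(key):
    # In Redis this would be PUBLISH; every subscriber gets the key to drop
    for handler in subscribers:
        handler(key)

# Each app server keeps a local cache and subscribes to invalidation events
local_cache = {"product:123": {"price": 10}}
subscribe(lambda key: local_cache.pop(key, None))

publish_invalidation("product:123")   # source data changed upstream
```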
Versioned keys: Include a version number in the cache key. When data changes, increment the version. Old keys naturally expire via TTL. No explicit invalidation needed.
Versioned Key Strategy:
v1: cache:product:123:v1 → {...old data...}
↓ product updated, version incremented
v2: cache:product:123:v2 → {...new data...}
↓ v1 expires via TTL (never explicitly deleted)
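The versioned-key strategy needs only two small helpers. A sketch: the in-memory `versions` dict stands in for wherever the current version actually lives (typically a column on the database row).

```python
versions = {}   # stand-in for the authoritative version store (assumption)

def cache_key(entity, entity_id):
    # Readers always build the key from the current version
    v = versions.get((entity, entity_id), 1)
    return f"cache:{entity}:{entity_id}:v{v}"

def bump_version(entity, entity_id):
    # Called on every write; stale keys age out via TTL, never deleted
    versions[(entity, entity_id)] = versions.get((entity, entity_id), 1) + 1
```

After a bump, readers immediately look up the new key, miss, and repopulate; the old key is simply never read again.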
My preference: TTL for most things (good enough), event-driven invalidation for data where staleness has business impact (pricing, inventory, permissions).
what I'd build at each scale
1k ops/sec: A single Redis instance. No clustering needed. Enable persistence (RDB snapshots). Done.
100k ops/sec: Redis Cluster with 6 nodes (3 primary, 3 replica). Sharding via Redis Cluster's 16,384 hash slots, its built-in alternative to a consistent hash ring. Client-side connection pooling.
1M+ ops/sec: Redis Cluster + client-side L1 cache (in-memory, 5-second TTL for hot keys). Key replication for known hot keys. Multi-region with independent clusters per region.
Start simple. Add complexity only when monitoring proves you need it.