System Design: Distributed Cache
Caching is the single most effective performance optimization in distributed systems. A well-placed cache can reduce database load by 90%, cut response times from 200ms to 5ms, and save thousands of dollars in infrastructure costs.
But distributed caching introduces its own class of problems: cache invalidation (famously one of the two hard problems in computer science), thundering herds, hot key issues, and consistency guarantees. Let's design a distributed cache that handles all of them.
requirements
Functional:
- Key-value storage with TTL (time-to-live) support
- Support for multiple data types (strings, hashes, lists, sorted sets)
- Cache-aside, write-through, and write-behind patterns
- Cluster mode with automatic sharding
- Pub/Sub for cache invalidation events
Non-functional:
- Sub-millisecond read latency (p99 < 1ms)
- Support 1M+ operations per second
- High availability with automatic failover
- Linear horizontal scalability
- Memory-efficient storage
caching patterns
Before designing the cache infrastructure, understand how applications interact with it.
Cache-Aside (Lazy Loading):
┌────────┐ 1. GET key ┌───────┐
│ App │──────────────▶│ Cache │
│ Server │◀──────────────│ │
│ │ 2. miss └───────┘
│ │
│ │ 3. query ┌───────┐
│ │──────────────▶│ DB │
│ │◀──────────────│ │
│ │ 4. result └───────┘
│ │
│ │ 5. SET key ┌───────┐
│ │──────────────▶│ Cache │
└────────┘ └───────┘
Write-Through:
App ──write──▶ Cache ──write──▶ DB (synchronous)
Write-Behind (Write-Back):
App ──write──▶ Cache ──async──▶ DB (batched, asynchronous)
Cache-aside is the most common pattern. The application checks the cache first, falls through to the database on miss, and populates the cache for next time. Simple, but the first request for every key hits the database.
Write-through keeps the cache and database in sync by writing to both on every update. Slower writes but the cache is always fresh.
Write-behind writes to cache immediately and flushes to the database asynchronously in batches. Fastest writes but you can lose data if the cache crashes before flushing.
For most applications, cache-aside with TTL is the right default.
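The cache-aside flow above fits in a few lines. A minimal in-process sketch: the dict stands in for Redis, and `db_fetch` and the `user:{id}` key format are illustrative assumptions, not a real schema.

```python
import time

cache = {}  # key -> (value, expires_at); in-memory stand-in for Redis

def db_fetch(user_id):
    # Placeholder for the real database query (assumption)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id, ttl=60):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]                           # steps 1-2: cache hit
    value = db_fetch(user_id)                     # steps 3-4: miss, query DB
    cache[key] = (value, time.monotonic() + ttl)  # step 5: populate for next time
    return value
```

The first call for a key pays the database round trip; every call within the TTL is served from the cache.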
consistent hashing
With multiple cache nodes, you need to decide which node stores which key. Naive hashing (hash(key) % N) breaks catastrophically when you add or remove a node — every key remaps, causing a thundering herd of cache misses.
Consistent hashing solves this. Map both nodes and keys onto a ring. Each key is stored on the first node clockwise from its position.
Consistent Hash Ring:
Node A (pos 0)
╱╲
╱ ╲
╱ ╲
Node D ╱ ╲ Node B
(pos 270) (pos 90)
╲ ╱
╲ ╱
╲ ╱
╲╱
Node C (pos 180)
Key "user:123" hashes to position 45
→ Stored on Node B (first node clockwise)
Key "session:456" hashes to position 200
→ Stored on Node D (first node clockwise)
Add Node E at position 135:
→ Only keys between 90-135 move from C to E
→ All other keys stay put
When you add a node, only K/N keys need to move (K = total keys, N = total nodes). Without consistent hashing, all K keys would need to move.
Virtual nodes: Real consistent hashing uses virtual nodes — each physical node gets 100-200 positions on the ring. This ensures even distribution. Without virtual nodes, the ring can be severely unbalanced.
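A minimal consistent hash ring with virtual nodes might look like this. The MD5 hash, 150 vnodes, and the `node#i` vnode naming are illustrative choices, not a standard.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=150):
        self.vnodes = vnodes
        self.ring = []                      # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def _pos(self, key):
        # Any stable hash works; MD5 is just an easy 128-bit choice
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Each physical node gets `vnodes` positions for even distribution
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._pos(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def get(self, key):
        # First virtual node clockwise from the key's position (wraps around)
        i = bisect.bisect(self.ring, (self._pos(key),)) % len(self.ring)
        return self.ring[i][1]
```

Adding a node only reassigns the keys that land in its new vnode ranges; everything else keeps mapping to the same node.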
eviction policies
When the cache is full, something has to go. The eviction policy determines what.
Eviction Policies:
┌─────────────┬──────────────────────────────────────┐
│ Policy │ How it works │
├─────────────┼──────────────────────────────────────┤
│ LRU │ Evict least recently used │
│ │ Good default for most workloads │
├─────────────┼──────────────────────────────────────┤
│ LFU │ Evict least frequently used │
│ │ Better for skewed access patterns │
├─────────────┼──────────────────────────────────────┤
│ TTL-based │ Evict expired keys first │
│ │ Predictable, application-controlled │
├─────────────┼──────────────────────────────────────┤
│ Random │ Evict random key │
│ │ Surprisingly good in practice │
├─────────────┼──────────────────────────────────────┤
│ allkeys-lru │ LRU across ALL keys                  │
│ │ Best when everything is cacheable │
└─────────────┴──────────────────────────────────────┘
Redis implements approximate LRU — it samples 5 random keys and evicts the least recently used among them. This is faster than true LRU (which requires a linked list) and nearly as accurate. Redis 4.0+ also supports LFU, which works better when some keys are accessed in bursts but shouldn't be evicted between bursts.
My recommendation: use allkeys-lru as the default (note that Redis ships with noeviction, so you must set maxmemory-policy explicitly). Set TTLs on everything. If you have keys that must never be evicted, use volatile-lru (only evict keys with a TTL set) and keep your permanent keys TTL-free.
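For intuition, exact LRU is a few lines with an ordered map. This sketch tracks true recency order; Redis approximates the same behavior by sampling instead, to avoid the bookkeeping.

```python
from collections import OrderedDict

class LRUCache:
    """Exact LRU; Redis approximates this by sampling rather than tracking order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```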
cache stampede prevention
The classic thundering herd: a popular cache key expires, 1,000 concurrent requests all miss the cache simultaneously, and all 1,000 hit the database at once.
Cache Stampede:
Time T: Key "popular-page" expires
│
┌───────────┼───────────┐
│ │ │
Req 1 Req 2 Req 3 (1000 requests)
│ │ │
▼ ▼ ▼
Cache Cache Cache
MISS MISS MISS
│ │ │
▼ ▼ ▼
┌─────────────────────────┐
│ DATABASE OVERLOADED │ ← 1000 identical queries
└─────────────────────────┘
Three solutions:
- Locking (mutex). When the first request gets a cache miss, it acquires a distributed lock (Redis SETNX). Other requests wait or return stale data. Only one request hits the database.
Cache miss → try SETNX lock_key → success?
YES → query DB → set cache → release lock
NO → wait 50ms → retry cache lookup
- Early expiration (probabilistic). Before a key expires, some requests proactively refresh it. Each request checks: if TTL < 30 seconds AND random() < 0.1, refresh the cache in the background. This spreads the refresh load.
- Never expire (background refresh). Set TTL to infinity. A background job refreshes keys before they go stale. This is the best approach for keys you can identify in advance (homepage, popular product pages).
I'd use locking for most cases and background refresh for the known hot keys.
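The locking approach can be sketched as an in-process single-flight guard. This uses `threading.Lock` as a stand-in for a distributed Redis SETNX lock, so it only illustrates the re-check-after-acquire pattern, not real cross-server coordination.

```python
import threading
import time

cache = {}                       # key -> (value, expires_at)
locks = {}                       # per-key locks; stand-in for distributed SETNX
locks_guard = threading.Lock()   # protects the locks dict itself

def get_or_compute(key, compute, ttl=60):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another caller may have filled the cache while we waited
        entry = cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        value = compute()                             # only one caller hits the DB
        cache[key] = (value, time.monotonic() + ttl)
        return value
```

Under concurrent misses, exactly one caller computes; the rest block briefly, then find the value on the re-check.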
hot key problem
Even with consistent hashing, some keys receive disproportionate traffic. A viral tweet, a flash sale product, a celebrity's profile — one key can saturate a single cache node.
Hot Key Mitigation:
Normal: Key → Node A (100% traffic)
With replicas:
Key_r1 → Node A (33% traffic)
Key_r2 → Node B (33% traffic)
Key_r3 → Node C (33% traffic)
Client reads from random replica.
Solution 1: Client-side local cache. Each app server caches the hottest keys in memory (L1 cache). TTL of 1-5 seconds. This absorbs the majority of reads before they hit the distributed cache (L2).
Solution 2: Key replication. Append a random suffix (1-3) to the key. Store 3 copies on different nodes. Read from a random copy. This spreads the load across 3 nodes instead of 1.
Solution 3: Read replicas. Redis supports read replicas. Route hot key reads to replicas, writes to the primary.
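Solution 2 (key replication) is a thin wrapper around suffixed keys. A sketch assuming a dict-like cache client; the replica count and `:rN` suffix format are illustrative choices.

```python
import random

REPLICAS = 3

def write_hot_key(cache, key, value):
    # Write every suffixed copy; each hashes to a different node in a cluster
    for i in range(1, REPLICAS + 1):
        cache[f"{key}:r{i}"] = value

def read_hot_key(cache, key):
    # Each read picks a random copy, spreading the load across nodes
    return cache.get(f"{key}:r{random.randint(1, REPLICAS)}")
```

The trade-off: writes fan out to all copies, and a failed partial write can leave replicas briefly inconsistent, so this is best reserved for read-heavy hot keys.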
Redis vs Memcached
The two most common distributed cache implementations. Here's when to use each.
┌──────────────┬───────────────────┬───────────────────┐
│ Feature │ Redis │ Memcached │
├──────────────┼───────────────────┼───────────────────┤
│ Data types │ Strings, hashes, │ Strings only │
│ │ lists, sets, │ │
│ │ sorted sets, │ │
│ │ streams, HyperLL │ │
├──────────────┼───────────────────┼───────────────────┤
│ Persistence │ RDB + AOF │ None │
├──────────────┼───────────────────┼───────────────────┤
│ Replication │ Primary-replica │ None (client-side) │
├──────────────┼───────────────────┼───────────────────┤
│ Clustering │ Built-in (Redis │ Client-side only │
│ │ Cluster) │ │
├──────────────┼───────────────────┼───────────────────┤
│ Threading │ Single-threaded │ Multi-threaded │
│ │ (6.0+ I/O threads)│ │
├──────────────┼───────────────────┼───────────────────┤
│ Memory │ Higher overhead │ Lower overhead │
│ efficiency │ per key │ (slab allocator) │
├──────────────┼───────────────────┼───────────────────┤
│ Best for     │ Complex data,     │ Simple key-value, │
│              │ pub/sub, queues,  │ max throughput,   │
│              │ persistence needs │ large working sets│
└──────────────┴───────────────────┴───────────────────┘
Choose Redis when you need data structures (sorted sets for leaderboards, pub/sub for invalidation, streams for event processing) or persistence. Choose Memcached when you need pure key-value caching with maximum memory efficiency and your data fits the string-only model.
In practice, Redis wins 90% of the time because the richer data types are too useful to give up. Memcached still dominates at Facebook-scale where memory efficiency on simple key-value workloads saves millions in infrastructure.
cache invalidation strategies
The hardest part. When the source data changes, how do you ensure the cache reflects it?
TTL-based: Set a TTL and accept stale data for the duration. Simple, works for most cases. A 60-second TTL means data is at most 60 seconds stale.
Event-driven: When data changes, publish an invalidation event. Cache subscribers delete the stale key. This is more complex but gives you near-zero staleness.
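The event-driven flow can be illustrated with an in-process stand-in for the pub/sub channel; a real system would use Redis PUBLISH/SUBSCRIBE, and the handler shape here is an assumption.

```python
subscribers = []        # stand-in for a Redis pub/sub channel (assumption)

def subscribe(handler):
    subscribers.append(handler)

def publish_invalidation(key):
    # In Redis this would be PUBLISH; every subscriber gets the key to drop
    for handler in subscribers:
        handler(key)

# Each app server keeps a local cache and subscribes to invalidation events
local_cache = {"product:123": {"price": 10}}
subscribe(lambda key: local_cache.pop(key, None))

publish_invalidation("product:123")   # source data changed upstream
```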
Versioned keys: Include a version number in the cache key. When data changes, increment the version. Old keys naturally expire via TTL. No explicit invalidation needed.
Versioned Key Strategy:
v1: cache:product:123:v1 → {...old data...}
↓ product updated, version incremented
v2: cache:product:123:v2 → {...new data...}
↓ v1 expires via TTL (never explicitly deleted)
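The versioned-key strategy needs only two small helpers. A sketch: the in-memory `versions` dict stands in for wherever the current version actually lives (typically a column on the database row).

```python
versions = {}   # stand-in for the authoritative version store (assumption)

def cache_key(entity, entity_id):
    # Readers always build the key from the current version
    v = versions.get((entity, entity_id), 1)
    return f"cache:{entity}:{entity_id}:v{v}"

def bump_version(entity, entity_id):
    # Called on every write; stale keys age out via TTL, never deleted
    versions[(entity, entity_id)] = versions.get((entity, entity_id), 1) + 1
```

After a bump, readers immediately look up the new key, miss, and repopulate; the old key is simply never read again.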
My preference: TTL for most things (good enough), event-driven invalidation for data where staleness has business impact (pricing, inventory, permissions).
what I'd build at each scale
1k ops/sec: A single Redis instance. No clustering needed. Enable persistence (RDB snapshots). Done.
100k ops/sec: Redis Cluster with 6 nodes (3 primary, 3 replica). Sharding via Redis Cluster's 16,384 hash slots, its built-in alternative to a consistent hash ring. Client-side connection pooling.
1M+ ops/sec: Redis Cluster + client-side L1 cache (in-memory, 5-second TTL for hot keys). Key replication for known hot keys. Multi-region with independent clusters per region.
Start simple. Add complexity only when monitoring proves you need it.