I failed three system design interviews in a row.
Not because I didn't know the concepts. I knew them. Caching, sharding, consistent hashing, CAP theorem, message queues — I could define every one.
What I couldn't do: answer what came next.
"What happens when the cache gets stale?"
"Why are you sharding this?" *
*"So you'd ignore partition tolerance?"
Every time, I had the surface answer. Every time, the follow-up question exposed that I'd never thought one level deeper.
These are the 5 places that Gap cost me.
1. Everyone Adds a Cache. Almost Nobody Thinks About What Comes Next.
I used to say "add Redis" like it was a complete answer.
Performance problem? Redis. Slow API? Redis. The interviewer asks about the read load? Redis.
Then one interviewer asked, "What happens when the user updates their profile?"*
I stared. I knew how to add a cache. I hadn't thought once about what happens when the data behind it changes.
That's the trap. A cache is your phone saving images from apps so it doesn't re-download them every time—simple, fast, and invisible. In backend systems, Redis stores hot data entirely in RAM instead of hitting a database on every request. Think of it as a database that never sleeps. Responses in under a millisecond.
Most read traffic is for the same hot data—the same user profiles, the same popular posts. A cache catches all of that before the database ever has to wake up.
Here's how the cache-aside pattern looks in code—the most common one you'll see:
def get_user(user_id: str):
# Step 1: Check cache first
cached = redis.get(f"user:{user_id}")
if cached:
return json.loads(cached) # Cache hit ✓
# Step 2: Cache miss — go to the database
user = db.find_one({"_id": user_id})
# Step 3: Store result for next time (expires in 1 hour)
redis.setex(f"user:{user_id}", ttl=3600, value=json.dumps(user))
return user
# ⚠️ The hidden danger: what if the user updates their profile?
# If you forget: redis.delete(f"user:{user_id}")
# ...they'll see stale data for up to an hour.

Three strategies—each is right in one situation and quietly catastrophic in the wrong one.
I once watched a team spend three days debugging incorrect pricing at checkout. The cache populated correctly on product creation — but silently failed to invalidate when the price changed. Wrong prices were served for six weeks. Code looked fine. Tests passed.
A cache without an invalidation strategy isn't a performance win. It's a time bomb with a clean interface.
💬 Interviewer follow-up you must answer: "How does the cache know when to invalidate?" Have that answer ready before they ask.
2. Sharding: Impressive on a Whiteboard, Painful in Production
I once watched a candidate spend 25 minutes designing a sharding strategy for a system serving 3,000 daily users.
Custom shard keys. Cross-shard routing. Resharding logic. The interviewer stopped him mid-sentence.
"Why are you sharding this?"
He didn't have an answer. He didn't get the job.
Over-engineering isn't ambition. It's anxiety wearing the mask of thoroughness.
Sharding splits your database across multiple servers—each one owns a slice of the data. It's genuinely powerful and genuinely hard: joins across shards become painful, transactions need distributed coordination, and debugging requires knowing which shard holds your data.
The right order before you even think about sharding:

Exhaust every step before moving to the next. Most systems never need to go past step 2.
# ❌ Bad shard key — range-based timestamp sharding creates a hot shard
def get_shard(created_at: datetime) -> int:
# All writes in the current month go to the same shard
if created_at >= CURRENT_MONTH_START:
return NUM_SHARDS - 1 # every new write piles here
return hash(f"{created_at.year}-{created_at.month}") % (NUM_SHARDS - 1)
# Older shards go cold. The latest shard gets hammered.
# ✅ Good shard key — user_id distributes evenly
def get_shard(user_id: str) -> int:
return hash(user_id) % NUM_SHARDS
# Load spreads evenly regardless of when writes happen.
Instagram ran on a single Postgres instance far longer than most people realize. They only sharded when simpler options genuinely couldn't keep up.
💬 The answer that lands: Not "here's my sharding strategy." But "here's why I'd exhaust vertical scaling, read replicas, and caching first — and here's the signal that would tell me it's time."
3. Why Adding One Server Can Break Your Entire Cache
The short answer: naive hashing routes cache keys by key % N. Change N, and almost every key routes to a different server. Your cache goes cold instantly.
# ❌ Naive modulo — breaks the moment you add a server
servers = ["server_1", "server_2", "server_3"]
def get_server(key: str) -> str:
return servers[hash(key) % len(servers)]
get_server("user_123") # → "server_2"
# You add a 4th server to handle more load...
servers.append("server_4")
get_server("user_123") # → "server_1" ← different server!
# Almost every key now maps somewhere new.
# Your entire cache just went cold. Enjoy the database stampede.
Adding one server to a cache cluster can invalidate most of your cached data at once—causing every request to hit the database simultaneously. That's not a scaling win. That's an outage.
Consistent hashing is the fix. Instead of a circular ring, both servers and keys are mapped onto a circular ring. Each key belongs to the nearest server clockwise.
Add a server? It takes only the keys between itself and its neighbor—roughly 1/N of data. Everything else stays put.

Adding one node displaces ~1/N keys. With naive modulo hashing, you'd be moving almost everything.
Akamai was built on this. Their founders wrote the original paper in 1997, designing consistent hashing specifically to solve the server-addition problem at CDN scale. Redis Cluster and Cassandra use the same principle today.
💬 Why this matters in an interview: Most candidates skip this. Knowing why consistent hashing exists — not just what it is — signals you've thought seriously about distributed systems.
So far we've been talking about scaling reads and distributing data. But distributed systems fail in a second, harder way — not "how do you store more?" but "what happens when the parts of your system stop agreeing with each other?"
4. The CAP Theorem Is Taught Wrong. Here's What It Actually Means.
I once confidently explained the CAP theorem in an interview. The interviewer looked up and asked:
"So you'd consider building a system that ignores partition tolerance?"
I had nothing. The conversation got uncomfortable fast.
Here's the problem: CAP is almost always taught as "pick any two." That's misleading.
Partition tolerance isn't optional. Networks fail — servers lose the ability to talk to each other. It will happen to your system. So the real choice is
When a network partition occurs, do you prioritize consistency or availability?
| CP (Consistency first) | AP (Availability first) | |
|---|---|---|
| Behaviour | Refuses requests until partition heals | Keeps responding, may return stale data |
| Risk | Downtime during failures | Stale reads |
| Use when | Payments, inventory, anything financial | Social feeds, recommendations, analytics |
| Examples | CockroachDB, ZooKeeper, Postgres (sync replication) | Cassandra, DynamoDB, CouchDB |
Facebook chose AP for their social graph—a slightly stale follower count beats an app that won't load. Systems that prioritize CP—like CockroachDB or Postgres with synchronous replication—refuse writes during a partition rather than risk an inconsistent state. You'd rather reject a transaction than double-charge someone.
💬 The move: Don't just define CAP — apply it. "This is a payments system, so I'd use Postgres with synchronous replication. I'd rather reject a write during a network failure than risk charging someone twice."
5. Message Queues Don't Guarantee What You Think They Guarantee
Picture a restaurant on a Friday night. Orders are flying in faster than the kitchen can handle. If every waiter walked directly to a chef and demanded immediate attention, the kitchen collapses.
Instead, orders go on a ticket rail. Chefs work through them steadily. The kitchen stays calm no matter how busy the front gets.
That's a message queue.
Uber's trip events flow through Kafka. Netflix triggers encoding jobs through queues. Slack's notification pipeline is async.
Most candidates in interviews draw a queue, say, "This decouples the services," and move on. That's not wrong. But here's what nobody mentions until they're paged at 3am:
In most queue systems, design as if any message can arrive more than once.
Retries, consumer crashes, and network timeouts can all cause duplicate processing. A user gets charged twice, an email is sent twice, and a report generates twice.
def process_payment(message: dict):
payment_id = message["payment_id"]
# ✅ Idempotency check — safe to receive this twice
if db.payment_already_exists(payment_id):
print(f"Already processed {payment_id}. Skipping.")
return
# Process only if we haven't seen this before
stripe.charge(message["amount"], message["card_token"])
db.mark_payment_complete(payment_id)
# Without this: duplicate charge on retry.
# With this: second delivery is a no-op. ✓

The queue delivers to one. Pub/Sub delivers to all. Get this wrong and you'll either starve consumers or duplicate work across all of them.
There's a word for this: idempotency. It means the second call does nothing if the first one already worked. Stripe built it into their payment API from day one. Every queue-related incident I've seen—every duplicate charge, every double email—came down to idempotency missing somewhere in the pipeline.
💬 The line that stands out: "Consumers will check for duplicate message IDs before processing — queues guarantee at-least-once delivery, not exactly-once."
Quick Reference: Surface Answer vs Strong Answer
| Concept | What most candidates say | What lands in interviews |
|---|---|---|
| Caching | "Add Redis" | TTL strategy, invalidation on write, cache-aside vs write-through |
| Sharding | "Split the DB" | Last resort — exhaust vertical scaling, replicas, caching first |
| Consistent hashing | "Distributes keys evenly" | Why adding servers shouldn't remap everything |
| CAP theorem | "Pick any two" | CP vs AP during network partitions—with a real example applied |
| Message queues | "Decouples services" | At-least-once delivery, duplicate handling, idempotency |
The Pattern Behind All 5 Mistakes
The mistake was never "I didn't know Redis" or "I didn't know Kafka."
The mistake was treating components as answers.
Every component in a distributed system creates a new problem:
- Caches create invalidation
- Shards create routing
- Queues create retries
- Replicas create consistency trade-offs
- More servers create key remapping
The interview isn't testing whether you know the component. It's testing whether you know the consequence.
That's the only pattern worth memorising.
Found this useful? I write about system design, engineering interviews, and real production systems. Follow along—more coming.








![[System Design] GraphHopper Distance Matrix: Self-Host OSRM vs Haversine for Route Optimization](https://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fyoo5j6x1kubw2zbx6ayz.png)




