You can't have it always-on and always-right. Pick one.
Every distributed system makes you choose, and most teams choose by accident — then act genuinely surprised when the tradeoff shows up at 2 a.m. as a customer screenshot of a wrong number. This post is the choice, stripped of the jargon that usually buries it, with real examples you can run.
The moment the choice gets forced
Here's the setup. Your data lives on more than one machine — a primary and a replica, two nodes, a cluster, whatever. You did that on purpose, for durability and scale. Most of the time those copies agree and nobody thinks about it.
Then the network between them hiccups. Not "goes down forever" — just a few seconds where node A can't reach node B. This is called a network partition, and here's the uncomfortable truth: it is not a question of if, it's when. Networks are not your friend. Packets drop, links saturate, a switch reboots.
In that moment — copies can't talk to each other, and a read request comes in — your system has exactly two options, and it cannot have both:
- Refuse to answer until it's sure it has the latest, agreed-upon truth. (It stays correct, but it's briefly unavailable.)
- Answer immediately with whatever this node has, and admit the value might be a few seconds stale. (It stays available, but it's briefly inconsistent.)
That's it. That's the whole famous "CAP theorem" in one sentence: when the network partitions (P), you choose between Consistency (C) and Availability (A). You don't get to keep both during the partition. Anyone selling you "always-on AND always-right" is hiding the moment the network forces the choice.
The two answers, in plain language
Strong consistency means every read sees the latest write. The instant a write succeeds, every subsequent read anywhere returns the new value. To guarantee that, the system sometimes has to say "I can't answer right now" — because the only honest alternative during a partition would be to risk handing you a stale value.
Eventual consistency means the system always answers, immediately, from whatever node you reached — and the copies reconcile "eventually," usually in milliseconds to seconds. The price: for that brief window, a read might return yesterday's value. Given a little time and no new writes, all copies converge to the same answer. Hence "eventual."
Neither is "correct." They're bets about which kind of pain hurts less for this specific data.
Make it concrete: the same read, two ways
DynamoDB lets you pick per-read, which makes the tradeoff wonderfully literal:
import boto3
table = boto3.resource("dynamodb").Table("accounts")
# Eventually consistent read (the DEFAULT): fast, cheap, might be slightly stale
table.get_item(Key={"id": "user-123"})
# Strongly consistent read: guaranteed latest, costs more, can fail during a partition
table.get_item(Key={"id": "user-123"}, ConsistentRead=True)
One boolean. ConsistentRead=True says "I need the truth, and I'll accept that this read might be slower or unavailable to get it." The default says "answer me fast, stale-by-a-moment is fine."
On the write side, relational databases expose the same dial through replication. In Postgres:
-- synchronous_commit = on : the primary waits for the replica to confirm
-- the write before telling the client "done."
-- Stronger consistency, higher latency.
-- synchronous_commit = off : the primary confirms immediately and ships
-- to the replica in the background.
-- Lower latency, a window where the replica is behind.
SET synchronous_commit = on;
Same fundamental choice, opposite end of the pipe: wait for agreement (consistent, slower), or confirm now and reconcile later (available, faster).
The "read your own write" trap
Here's the bug eventual consistency hands you, and almost everyone hits it once. A user updates their profile, the page reloads, and their change isn't there — because the reload hit a replica that hadn't caught up yet.
update_profile(user_id, {"name": "New Name"}) # writes to the primary
profile = read_profile(user_id) # reads from a lagging replica
# profile["name"] is still the OLD name. The user thinks the save failed.
The data wasn't lost. The replica just hadn't received it yet. But your user doesn't know that — they retype it, hit save again, and now you've got a support ticket about a "broken" feature that's working exactly as designed. The fix is to route reads-after-writes to the primary, or use a session-consistency guarantee — but you only know to do that if you understood you chose eventual consistency in the first place.
How to actually pick (per piece of data, not per company)
The mistake is picking one religion for your whole system. You choose per data type, by asking what a stale or refused answer costs:
→ Account balance, inventory count, anything money or correctness-critical → lean strong. A stale balance is a double-spend, an overdraft, a lawsuit. Here you'd rather the system say "one moment" than confidently lie. Refuse and retry beats fast and wrong.
→ Like counts, view counts, feed ordering, "last seen" timestamps → lean eventual. Nobody is harmed if the like counter is off by three for two seconds. Availability wins easily; just answer.
→ Quorum systems give you a middle dial. In systems like Cassandra you tune it per query: require how many nodes must agree on a read (R) and a write (W) out of N copies. If R + W > N, any read is guaranteed to see the latest write — you've bought strong consistency by making reads or writes wait for more nodes. Lower the numbers and you trade that guarantee for speed and availability. The knob is right there in the query.
-- Cassandra: consistency is a per-statement choice
CONSISTENCY QUORUM; -- wait for a majority — stronger, slower
CONSISTENCY ONE; -- first node to answer wins — faster, possibly stale
The principle: every "always-on and always-right" architecture is hiding the exact moment it will be forced to break one of those two promises. Your job is to find that moment, decide on purpose which promise you break and for which data — and find out before your users do, not after.
So here's the real question: the system you ship today — which did it actually choose, strong or eventual? And did anyone choose it on purpose, or did it just default its way into a decision you'll meet at 2 a.m.?
I'm Vinicius Fagundes — principal data engineer, independent, and an MBA lecturer in São Paulo. I help teams make these tradeoffs deliberately instead of accidentally. That work lives at vf-insights.com.











