Most uptime monitors work the same way: one probe somewhere checks your site, and if that probe can't reach it, you get paged. I ran tools like that for years. Maybe half the "down" alerts I got at 3am were the probe's own network having a bad minute, not my site.
A check from a single location is one machine's opinion at one moment. A local routing hiccup, a flaky peering link, a momentary DNS blip on the probe's side: from one vantage point they all look exactly like "your site is down."
So when I built SonarOps, I made the confirmation cascade across regions. Checks run every 60 seconds. When one probe sees an outage, a second probe in a different region (the regions are EU and USA) re-checks before any alert fires. Second probe also can't reach you? It's real, and you get paged. It can? The first probe just had a bad moment, and you stay asleep.
It isn't a vote or a quorum. It's a cascade: one probe raises a hand, a second one somewhere else confirms before I bother you.
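Here is a rough sketch of that cascade in Python. It is not SonarOps's actual code: `should_page`, the `Probe` type, and the region keys are names I made up for illustration, and a "probe" here is just any function that returns True when the site is reachable.

```python
from typing import Callable, Dict

# Hypothetical probe type: takes a URL, returns True if the site was reachable.
Probe = Callable[[str], bool]


def should_page(url: str, first_region: str, probes: Dict[str, Probe]) -> bool:
    """Page only if the first probe AND one probe in another region both fail."""
    other_regions = [region for region in probes if region != first_region]
    if not other_regions:
        raise ValueError("need at least two regions to confirm an outage")

    # Step 1: the regular check from the first region.
    if probes[first_region](url):
        return False  # site is reachable, nothing to confirm

    # Step 2: one re-check from a different region. Not a vote, just a second opinion.
    confirming_region = other_regions[0]
    return not probes[confirming_region](url)
```

With two regions, this pages only when both probes fail. If the first probe fails and the second succeeds, the failure is treated as the first probe's bad moment.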
The tradeoff is honest: that step costs a second or two of detection delay. I'll take it. I'd rather hear about a real outage two seconds later than get woken for a phantom one that fixes itself before I've opened the laptop.
Three things I learned building this:
- Retrying on the same probe isn't enough. If the probe's local network is the problem, a retry from the same place just confirms its bad minute.
- Geography beats count. Two probes in one datacenter aren't independent. Two in different regions are.
- Most false pages are network, not server. Once I cross-checked across regions, the 3am noise basically went away.
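To put rough numbers on the "geography beats count" point, here is a back-of-the-envelope sketch. The probabilities are invented for illustration, not measurements from SonarOps, and `false_page_rate` is a hypothetical helper. The idea: a false page needs the first probe to fail while the site is up, and then the second probe to fail too. How often that second failure follows the first depends on how much network the two probes share.

```python
def false_page_rate(p_first_false_alarm: float, p_second_fails_given_first: float) -> float:
    """Chance that both probes fail while the site is actually up."""
    return p_first_false_alarm * p_second_fails_given_first


# Made-up numbers, for illustration only.
P_FALSE_ALARM = 0.01  # assumed chance a single probe fails for local network reasons

# Same datacenter: if one probe's network is having a bad minute, the other
# probably is too, so assume the second fails 90% of the time the first does.
same_datacenter = false_page_rate(P_FALSE_ALARM, 0.9)

# Different regions: assume the failures are independent, so the second probe
# fails at its own base rate regardless of what the first one saw.
different_regions = false_page_rate(P_FALSE_ALARM, P_FALSE_ALARM)

print(f"same datacenter:   {same_datacenter:.4%}")
print(f"different regions: {different_regions:.4%}")
```

Under these assumptions, the second probe in the same datacenter barely reduces false pages, while the one in another region cuts them by roughly two orders of magnitude.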
Full version, covering how cross-region monitoring works and where single-probe checks break: Multi-location monitoring guide.
And if you just want to play with the numbers, I keep a few free calculators (no signup) for uptime SLA, downtime cost, and error budget: sonarops.it/tools.