Alert Fatigue Is Real — Here's How to Fight It
Your on-call engineer has received 47 alerts this week. 44 of them resolved themselves. 2 were false positives. 1 was real. Do you think they responded to all 47 with the same urgency?
When everything is critical, nothing is. Learn how to tune your alert thresholds, reduce noise, and make sure your team actually responds when something important breaks.
What causes alert fatigue
- Thresholds too sensitive. Alerting on any single HTTP error when your service handles 10,000 requests/minute generates constant noise.
- No symptom-based alerting. Alerting on CPU > 80% rarely corresponds to user impact.
- Alert duplication. Three separate monitors firing for the same underlying issue.
- Chatty recovery notifications. A service that flaps generates alerts in both directions.
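One way to tame the duplication problem is to group alerts from the same service inside a short window, so three monitors firing for one root cause page you once. A minimal Python sketch (the `service` and `ts` fields are illustrative, not any particular tool's schema):

```python
def dedupe(alerts, window_s=300):
    """Collapse alerts from the same service that arrive within `window_s`
    seconds of the last one we forwarded, keeping only the first."""
    last_sent = {}  # service -> timestamp of the last alert we let through
    unique = []
    for a in sorted(alerts, key=lambda a: a["ts"]):
        prev = last_sent.get(a["service"])
        if prev is None or a["ts"] - prev > window_s:
            unique.append(a)
            last_sent[a["service"]] = a["ts"]
    return unique

burst = [
    {"service": "api", "check": "http", "ts": 0},
    {"service": "api", "check": "latency", "ts": 30},
    {"service": "api", "check": "error-rate", "ts": 60},
]
print(len(dedupe(burst)))  # 1 — three monitors, one page
```

Real alerting tools fingerprint on richer fields, but service-plus-window grouping already removes most duplicate pages.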
Threshold tuning
A good starting point for an alert threshold is 3–4 standard deviations above your normal baseline.
For response time: if your p95 is normally 200ms, alerting at 500ms is appropriate. Alerting at 250ms will wake you up for every minor traffic spike.
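You can derive that starting point from your own history. A minimal sketch, assuming you have a list of recent p95 samples in milliseconds (`suggest_threshold` is a hypothetical helper, not a library function); on a very stable baseline the statistical value comes out tight, so treat it as a floor and round up generously:

```python
import statistics

def suggest_threshold(samples_ms, sigmas=3):
    """Suggest an alert threshold `sigmas` standard deviations
    above the mean of the baseline samples."""
    mean = statistics.mean(samples_ms)
    stdev = statistics.stdev(samples_ms)
    return mean + sigmas * stdev

# A week of p95 response-time samples hovering around 200ms
baseline = [190, 205, 198, 210, 202, 195, 208]
print(round(suggest_threshold(baseline)))
```

Re-run this monthly so the threshold tracks your real baseline instead of the traffic you had when you first set it.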
For uptime: don't alert on a single failed check. Require 2 consecutive failures from at least 2 regions.
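That rule is easy to encode. A sketch, assuming each region's check history is a list of booleans (True = up), newest last:

```python
def should_alert(checks_by_region, min_consecutive=2, min_regions=2):
    """Fire only when at least `min_regions` regions each report
    `min_consecutive` consecutive failed checks."""
    failing_regions = 0
    for region, results in checks_by_region.items():
        recent = results[-min_consecutive:]
        if len(recent) == min_consecutive and not any(recent):
            failing_regions += 1
    return failing_regions >= min_regions

checks = {
    "us-east": [True, False, False],
    "eu-west": [True, False, False],
    "ap-south": [True, True, True],
}
print(should_alert(checks))  # True — two regions, two straight failures each
```

A single blip from one probe stays quiet; a real outage, visible from two vantage points, pages immediately.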
Symptom-based vs cause-based alerting
Alert on symptoms (user impact), not causes (system metrics).
✗ Cause-based: CPU > 90%, memory > 85%
✓ Symptom-based: API error rate > 5%, checkout flow failing
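In code, the symptom-based version watches the thing users actually feel. A minimal sketch of a sliding-window error-rate check (the class name is made up; the 5% threshold matches the example above):

```python
from collections import deque

class ErrorRateMonitor:
    """Track a sliding window of request outcomes and alert on the
    symptom users feel (error rate), not a machine-level cause."""

    def __init__(self, window=1000, threshold=0.05):
        self.outcomes = deque(maxlen=window)  # True = request failed
        self.threshold = threshold

    def record(self, status_code):
        self.outcomes.append(status_code >= 500)

    def should_alert(self):
        if not self.outcomes:
            return False
        return sum(self.outcomes) / len(self.outcomes) > self.threshold
```

A box can sit at 95% CPU all day without tripping this; conversely, a half-idle box returning 500s pages you right away.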
Routing alerts to the right channel
- Slack/Discord — SEV2 and below.
- Email — daily digests, non-urgent notifications.
- Webhook — PagerDuty/OpsGenie integration for SEV1.
- SMS — only for SEV1 with an explicit on-call rotation.
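That routing table maps naturally onto a small function. A sketch, with illustrative channel names and the simplifying assumption that everything below SEV2 goes to the email digest:

```python
def route_alert(severity, on_call_rotation=True):
    """Map alert severity to notification channels."""
    if severity == "SEV1":
        channels = ["webhook"]      # PagerDuty/OpsGenie integration
        if on_call_rotation:
            channels.append("sms")  # only with an explicit rotation
        return channels
    if severity == "SEV2":
        return ["slack"]
    return ["email"]                # daily digest for everything else
```

Keeping the routing in one place makes the monthly review easier too: demoting a noisy alert is a one-line severity change, not a hunt through channel configs.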
The monthly alert review
1. Which alerts fired most frequently?
2. What percentage were actionable?
3. Which alerts were always followed by a recovery notification (flapping)?
4. Did any real incidents go undetected?
Delete or mute any alert with an actionable rate below 50%.
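The review itself is just arithmetic. A sketch, assuming you log `fired` and `actionable` counts per alert name:

```python
def review_alerts(alert_log):
    """Given {alert_name: {"fired": n, "actionable": m}}, return the names
    whose actionable rate is below 50% — candidates to delete or mute."""
    noisy = []
    for name, stats in alert_log.items():
        if stats["fired"] and stats["actionable"] / stats["fired"] < 0.5:
            noisy.append(name)
    return sorted(noisy)

log = {
    "cpu-high": {"fired": 40, "actionable": 2},
    "checkout-errors": {"fired": 5, "actionable": 5},
}
print(review_alerts(log))  # ['cpu-high']
```

Run it against last month's incident tracker and the mute list usually writes itself.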