The On-Call Runbook Every Small Team Needs
The 3am incident is not the time to figure out your process. A runbook is a pre-written guide that tells your on-call engineer exactly what to do when they're groggy, stressed, and staring at a red dashboard.
You don't need a team of 50 to have a solid incident response process. Here's a lightweight runbook template that works for teams of 2–10 engineers.
What a runbook is (and is not)
A runbook is not a troubleshooting guide for every possible failure. It's a checklist for the first 30 minutes of an incident — the period where teams most often make mistakes.
The 5-step incident response framework
1. Acknowledge (0–2 min) — Claim the incident. Post in your team channel.
2. Assess (2–5 min) — What's actually broken? Check your monitoring dashboard.
3. Communicate (5 min) — Update your status page with 'Investigating'.
4. Mitigate (5–30 min) — Get to a working state as fast as possible.
5. Document (post-incident) — Write a blameless postmortem within 48 hours.
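The five steps above can be sketched as a tiny shell helper that stamps each phase with a UTC timestamp, so the postmortem timeline writes itself. This is a minimal sketch: `incident.log` and the phase labels are placeholders, not a prescribed format.

```shell
#!/usr/bin/env sh
# incident.sh - log each phase of the 5-step framework with a UTC timestamp.
# The log file name and phase labels are illustrative; adapt to your team.

log_phase() {
  phase="$1"
  printf '%s  %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$phase" >> incident.log
}

log_phase "ACKNOWLEDGE: claimed by $(whoami)"
log_phase "ASSESS: checking dashboards"
log_phase "COMMUNICATE: status page -> Investigating"
# ... mitigation work happens here ...
log_phase "MITIGATE: rollback complete"
```

Appending to a flat file is deliberately boring: at 3am you want zero dependencies, and the timestamps feed directly into the 48-hour postmortem.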
Severity levels for small teams
SEV1 — Production is down or core functionality is unusable. Wake the on-call engineer, update the status page, and escalate immediately.
SEV2 — Degraded performance or a broken non-critical feature. Handle during business hours.
SEV3 — Minor or cosmetic issue. File a ticket and schedule a fix.
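The severity table maps cleanly onto a small dispatch function, useful if you route alerts through a script. A sketch under assumed actions; swap the echoed strings for your own paging and ticketing commands.

```shell
# sev_action: map a severity level to the immediate response, per the table above.
# The action strings are placeholders for real paging/ticketing commands.
sev_action() {
  case "$1" in
    SEV1) echo "page on-call; update status page; escalate" ;;
    SEV2) echo "handle during business hours" ;;
    SEV3) echo "file ticket; schedule fix" ;;
    *)    echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```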
Who to wake up and when
- On-call engineer: first responder for all SEV1/SEV2
- Engineering lead: escalate if not resolved in 30 minutes
- CTO/VP: escalate for extended SEV1 or data breach
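The 30-minute escalation threshold is easy to miss mid-incident, so it can be encoded rather than remembered. A minimal sketch, assuming you pass the incident start time as epoch seconds; the output strings are placeholders for a real page to the lead.

```shell
# escalate_check: given the incident start time (epoch seconds), say who
# should be engaged. The 30-minute threshold mirrors the policy above.
escalate_check() {
  start="$1"
  now="$(date +%s)"
  elapsed=$(( (now - start) / 60 ))
  if [ "$elapsed" -ge 30 ]; then
    echo "escalate: engineering lead"
  else
    echo "on-call engineer owns the incident (${elapsed}m elapsed)"
  fi
}
```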
Tools and quick commands
```shell
# Check the most recent deploys
git log --oneline -10 origin/main

# Roll back the last commit
git revert HEAD && git push

# Restart the API service
docker compose restart api

# Check recent error logs
docker compose logs --tail=100 api | grep ERROR
```