How to Write a Runbook That People Actually Use
Writing a runbook is easy. Writing a runbook that people actually use during incidents? That’s harder. Understanding why runbooks fail is the first step to writing better ones.
Most runbooks fail not because they’re wrong, but because they’re unusable under pressure. Here’s how to write a runbook that works when it matters.
How to Write a Runbook: The Fundamentals
Start with the End State
Before writing steps, define what success looks like:
# Restart API Service
## Success Criteria
- All API pods running and healthy
- Response time < 500ms
- No errors in logs for 5 minutes
When someone follows your runbook, they need to know when they’re done.
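A criterion like "no errors in logs" is easier to confirm when it maps to a check that returns a number. The helper below is a minimal sketch, not a definitive check: `count_error_lines` is a hypothetical name, and it assumes an error line can be recognized by the word "error", which may not match your log format.

```bash
# Counts the lines on stdin that contain "error" (case-insensitive).
# Assumption: your logs mark errors with that word; adjust the pattern if not.
count_error_lines() {
  # grep -c prints the count but exits non-zero when the count is 0,
  # so "|| true" keeps a clean log from looking like a failed command.
  grep -ci 'error' || true
}
```

Piping `kubectl logs deployment/api -n production --since=5m` into `count_error_lines` and getting `0` back would cover the last criterion in the example above.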
Write for Your Worst Day
You won’t use this runbook on a calm Tuesday afternoon. You’ll use it at 3 AM when something is broken and people are waiting.
Bad runbook writing:
“First, you’ll want to check the pods. You can use kubectl for this. The command would be something like kubectl get pods, though you may need to specify the namespace.”
Good runbook writing:
Check pod status:
```bash
kubectl get pods -n production -l app=api
```
Expected: All pods show `Running` status
One Action Per Step
Each step should be one command or one decision:
## ❌ Bad: Multiple actions in one step
### Step 1
Check the pods, look at the logs, and restart if needed.
## ✅ Good: One action per step
### Step 1: Check Pod Status
```bash
kubectl get pods -n production -l app=api
```
### Step 2: Review Logs
```bash
kubectl logs deployment/api -n production --tail=100
```
### Step 3: Restart if Needed
```bash
kubectl rollout restart deployment/api -n production
```
How to Write a Runbook: Structure
Use a Consistent Template
Every runbook should follow the same structure. For a detailed breakdown, see our runbook template guide; the basic skeleton looks like this:
# [Title]
## Overview
What this runbook does and when to use it.
## Prerequisites
What you need before starting.
## Procedure
Step-by-step instructions.
## Verification
How to confirm success.
## Rollback
What to do if something goes wrong.
## Escalation
Who to contact if this doesn't work.
Include Copy-Pasteable Commands
Nobody should have to type commands from memory:
## ❌ Bad
Run the kubectl command to restart the deployment in production.
## ✅ Good
```bash
kubectl rollout restart deployment/api -n production
```
Show Expected Output
Tell people what they should see:
```bash
kubectl get pods -n production -l app=api
```
Expected output:
```
NAME                  READY   STATUS    RESTARTS   AGE
api-5d4b9c6f7-abc12   1/1     Running   0          5m
api-5d4b9c6f7-def34   1/1     Running   0          5m
api-5d4b9c6f7-ghi56   1/1     Running   0          5m
```
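Expected output can also be checked by a script instead of by eye. The following is a rough sketch under a few assumptions: `all_pods_running` is a hypothetical helper name, it expects the column layout shown above with the header removed (for example via `--no-headers`), and it looks only at the STATUS column, not at READY.

```bash
# Reads pod listing lines (no header row) on stdin.
# Succeeds only if at least one pod is listed and every pod's
# STATUS column (third field) is "Running".
all_pods_running() {
  awk '
    NF == 0 { next }               # ignore blank lines
    { seen = 1 }                   # at least one pod was listed
    $3 != "Running" { bad = 1 }    # any other status fails the check
    END { if (seen && !bad) exit 0; exit 1 }
  '
}
```

A runbook step could then read `kubectl get pods -n production -l app=api --no-headers | all_pods_running && echo "OK"`, which gives the person on call a single yes-or-no answer.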
How to Write a Runbook: Common Mistakes
Mistake 1: Assuming Context
Your runbook will be used by someone who doesn’t have your mental context.
## ❌ Bad
Connect to the server and check the logs.
## ✅ Good
SSH to the API server:
```bash
ssh admin@api-prod-1.example.com
```
Check application logs:
```bash
tail -f /var/log/api/application.log
```
Mistake 2: Skipping Edge Cases
Document what to do when things don’t go as expected:
### Step 3: Verify Deployment
```bash
kubectl rollout status deployment/api -n production
```
**If this times out:**
1. Check events: `kubectl get events -n production --sort-by='.lastTimestamp'`
2. Check pod details: `kubectl describe pod -l app=api -n production`
3. If pods are in CrashLoopBackOff, see [Crash Loop Runbook]
Mistake 3: Forgetting Prerequisites
List everything needed upfront:
## Prerequisites
**Access Required:**
- [ ] VPN connected
- [ ] kubectl configured for production
- [ ] AWS credentials valid
**Verify Access:**
```bash
kubectl config current-context   # Should show: production
aws sts get-caller-identity      # Should show your account
```
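"Should show: production" relies on someone reading the output carefully at 3 AM. A small guard can fail loudly instead. This is only a sketch: `require_context` is a hypothetical helper, and the context name `production` is taken from the example above.

```bash
# Fails with a clear message unless the actual context matches the expected one.
# $1 = expected context name, $2 = actual context name
require_context() {
  local expected="$1" actual="$2"
  if [ "$actual" != "$expected" ]; then
    echo "Wrong kubectl context: '$actual' (expected '$expected')" >&2
    return 1
  fi
  echo "kubectl context OK: $actual"
}
```

In a runbook this might be called as `require_context production "$(kubectl config current-context)"` before any step that changes state.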
Mistake 4: No Rollback Plan
Every runbook needs an escape hatch:
## Rollback
If the deployment causes issues:
```bash
kubectl rollout undo deployment/api -n production
```
If rollback also fails, immediately:
1. Page the platform team
2. Post in #incidents with details
3. Do not attempt further changes
How to Write a Runbook: Maintenance
Date Your Runbooks
Add last-updated dates and review schedules:
**Last Updated:** 2025-11-10
**Last Tested:** 2025-10-15
**Next Review:** 2025-12-10
**Owner:** @platform-team
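Dates are only useful if something notices when they pass. Here is a minimal sketch of an overdue check, assuming the metadata uses exactly the `**Next Review:** YYYY-MM-DD` line format shown above; `next_review_date` and `review_overdue` are hypothetical helper names.

```bash
# Prints the "Next Review" date (YYYY-MM-DD) found in runbook text on stdin.
next_review_date() {
  sed -n 's/^\*\*Next Review:\*\* *\([0-9-]*\).*/\1/p' | head -n 1
}

# Succeeds if the review date ($1) is earlier than the given day ($2).
# Both are YYYY-MM-DD; removing the dashes yields integers that sort
# in calendar order.
review_overdue() {
  local review today
  review=$(printf '%s' "$1" | tr -d '-')
  today=$(printf '%s' "$2" | tr -d '-')
  [ -n "$review" ] && [ -n "$today" ] && [ "$review" -lt "$today" ]
}
```

A scheduled job could run `review_overdue "$(next_review_date < restart-api.md)" "$(date +%Y-%m-%d)"` for each runbook (the file name here is illustrative) and notify the owner when it succeeds.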
Test Regularly
A runbook that hasn’t been tested is a runbook that might not work:
## Testing Schedule
- [ ] Test in staging: Monthly
- [ ] Full dry-run with on-call: Quarterly
- [ ] Update after any related changes: Immediately
Make Updates Easy
Store runbooks in version control. Review changes like code.
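Once runbooks live in version control, reviews can include an automated check, much like a linter for code. The sketch below assumes runbooks are markdown files that use the six section headings from the template earlier in this article; `missing_sections` is a hypothetical helper, not part of any existing tool.

```bash
# Prints each required "## Section" heading that the runbook file ($1) lacks.
# Prints nothing and succeeds when every section is present.
missing_sections() {
  local file="$1" section status=0
  for section in Overview Prerequisites Procedure Verification Rollback Escalation; do
    # Match the heading on its own line, allowing trailing whitespace.
    if ! grep -q "^## ${section}[[:space:]]*\$" "$file"; then
      echo "missing: ## ${section}"
      status=1
    fi
  done
  return "$status"
}
```

Running this against every changed runbook in a pre-merge check would flag, for example, a new runbook that was written without a rollback section.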
From Document to Execution
You now know how to write a runbook. But even the best-written runbook is just a document until someone executes it. For practical examples, check out our runbook examples for DevOps teams. Not sure about the difference between runbooks and playbooks? See our runbook vs playbook comparison.
Stew turns your runbooks into interactive procedures. Each command runs with a click. Each step tracks automatically. Your well-written runbook becomes a reliable execution tool.