How to Write a Runbook That People Actually Use

Writing a runbook is easy. Writing a runbook that people actually use during incidents? That’s harder. Understanding why runbooks fail is the first step to writing better ones.

Most runbooks fail not because they’re wrong, but because they’re unusable under pressure. Here’s how to write a runbook that works when it matters.

How to Write a Runbook: The Fundamentals

Start with the End State

Before writing steps, define what success looks like:

# Restart API Service

## Success Criteria
- All API pods running and healthy
- Response time < 500ms
- No errors in logs for 5 minutes

When someone follows your runbook, they need to know when they’re done.
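Criteria like these are easier to trust when they can be checked mechanically. As a minimal sketch, the response-time criterion could be tested with a small helper. The function name `latency_ok` and the health-check URL in the usage comment are hypothetical, not part of the runbook above.

```bash
# Hypothetical helper: succeeds if a measured response time (in seconds)
# is below the given threshold (in milliseconds).
latency_ok() {
  local seconds="$1" threshold_ms="$2"
  # awk handles the floating-point comparison; exit status 0 means "under threshold".
  awk -v s="$seconds" -v t="$threshold_ms" 'BEGIN { exit !(s * 1000 < t) }'
}

# Example usage (the URL is a placeholder):
# elapsed=$(curl -s -o /dev/null -w '%{time_total}' https://api.example.com/health)
# latency_ok "$elapsed" 500 && echo "latency OK" || echo "latency too high"
```

Keeping the threshold as an argument means the same check can back whatever number your success criteria state.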

Write for Your Worst Day

You won’t use this runbook on a calm Tuesday afternoon. You’ll use it at 3 AM when something is broken and people are waiting.

Bad runbook writing:

“First, you’ll want to check the pods. You can use kubectl for this. The command would be something like kubectl get pods, though you may need to specify the namespace.”

Good runbook writing:

Check pod status:
```bash
kubectl get pods -n production -l app=api
```
Expected: All pods show `Running` status
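If you want the "all pods Running" expectation to be checkable at a glance, one option is a small filter over the command's output. This is a sketch under assumptions: the function name `all_pods_running` is made up for illustration, and it expects the default column layout of `kubectl get pods` with headers suppressed.

```bash
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin.
# Prints every pod whose STATUS column (3rd field) is not "Running".
# Fails if any such pod exists, or if no pods were listed at all.
all_pods_running() {
  awk '$3 != "Running" { print $1 " is " $3; bad = 1 }
       END { exit (bad || NR == 0) }'
}

# Example usage:
# kubectl get pods -n production -l app=api --no-headers | all_pods_running \
#   && echo "all pods Running"
```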

One Action Per Step

Each step should be one command or one decision:

## ❌ Bad: Multiple actions in one step

### Step 1
Check the pods, look at the logs, and restart if needed.

## ✅ Good: One action per step

### Step 1: Check Pod Status
```bash
kubectl get pods -n production -l app=api
```

### Step 2: Review Logs
```bash
kubectl logs deployment/api -n production --tail=100
```

### Step 3: Restart if Needed
```bash
kubectl rollout restart deployment/api -n production
```

How to Write a Runbook: Structure

Use a Consistent Template

Every runbook should follow the same structure, such as the skeleton below. For a detailed breakdown, see our runbook template guide.

# [Title]

## Overview
What this runbook does and when to use it.

## Prerequisites
What you need before starting.

## Procedure
Step-by-step instructions.

## Verification
How to confirm success.

## Rollback
What to do if something goes wrong.

## Escalation
Who to contact if this doesn't work.
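A consistent template is easier to keep consistent when something checks for it. The sketch below assumes your runbooks are markdown files that use exactly the six headings above; the function name `check_runbook_sections` is hypothetical.

```bash
# Hypothetical lint: prints each required heading missing from a runbook file.
# Returns non-zero if any heading is missing.
check_runbook_sections() {
  local file="$1" missing=0 section
  for section in Overview Prerequisites Procedure Verification Rollback Escalation; do
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "## $section" "$file"; then
      echo "missing: ## $section"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (the path is a placeholder):
# check_runbook_sections runbooks/restart-api.md
```

A check like this could run wherever you already review runbook changes, so an incomplete runbook gets flagged before an incident rather than during one.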

Include Copy-Pasteable Commands

Nobody should have to type commands from memory:

## ❌ Bad
Run the kubectl command to restart the deployment in production.

## ✅ Good
```bash
kubectl rollout restart deployment/api -n production
```

Show Expected Output

Tell people what they should see:

```bash
kubectl get pods -n production -l app=api
```

Expected output:
```
NAME                   READY   STATUS    RESTARTS   AGE
api-5d4b9c6f7-abc12    1/1     Running   0          5m
api-5d4b9c6f7-def34    1/1     Running   0          5m
api-5d4b9c6f7-ghi56    1/1     Running   0          5m
```

How to Write a Runbook: Common Mistakes

Mistake 1: Assuming Context

Your runbook will be used by someone who doesn’t have your mental context.

## ❌ Bad
Connect to the server and check the logs.

## ✅ Good
SSH to the API server:
```bash
ssh admin@api-prod-1.example.com
```

Check application logs:
```bash
tail -f /var/log/api/application.log
```

Mistake 2: Skipping Edge Cases

Document what to do when things don’t go as expected:

### Step 3: Verify Deployment

```bash
kubectl rollout status deployment/api -n production
```

**If this times out:**
1. Check events: `kubectl get events -n production --sort-by='.lastTimestamp'`
2. Check pod details: `kubectl describe pod -l app=api -n production`
3. If pods are in CrashLoopBackOff, see [Crash Loop Runbook]

Mistake 3: Forgetting Prerequisites

List everything needed upfront:

## Prerequisites

**Access Required:**
- [ ] VPN connected
- [ ] kubectl configured for production
- [ ] AWS credentials valid

**Verify Access:**
```bash
kubectl config current-context  # Should show: production
aws sts get-caller-identity     # Should show your account
```
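The "should show" comments rely on the reader comparing values by eye. A minimal sketch of a fail-fast version follows; the helper name `require_context` is an assumption for illustration, and it takes the actual context as an argument so it is easy to try out without a cluster.

```bash
# Hypothetical guard: fails with a message unless the active context
# matches the one the runbook expects.
require_context() {
  local expected="$1" actual="$2"
  if [ "$actual" != "$expected" ]; then
    echo "wrong context: expected '$expected', got '$actual'" >&2
    return 1
  fi
}

# Example usage at the top of a runbook script:
# require_context production "$(kubectl config current-context)" || exit 1
```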

Mistake 4: No Rollback Plan

Every runbook needs an escape hatch:

## Rollback

If the deployment causes issues:

```bash
kubectl rollout undo deployment/api -n production
```

If rollback also fails, immediately:
1. Page the platform team
2. Post in #incidents with details
3. Do not attempt further changes
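For teams that script the restart, the rollback can be wired into the same function so it is never forgotten. This is only a sketch: the function name `restart_with_rollback` and the 120-second default timeout are assumptions, and whether an automatic undo is appropriate depends on your deployment. It returns non-zero whenever the rollout did not complete, so a human still decides what happens next.

```bash
# Hypothetical wrapper: restart a deployment, wait for the rollout,
# and undo it if the rollout does not complete within the timeout.
restart_with_rollback() {
  local deployment="$1" namespace="$2" timeout="${3:-120s}"

  kubectl rollout restart "deployment/$deployment" -n "$namespace" || return 1

  if ! kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout="$timeout"; then
    echo "rollout did not complete; rolling back" >&2
    kubectl rollout undo "deployment/$deployment" -n "$namespace"
    return 1
  fi
}

# Example usage:
# restart_with_rollback api production || echo "escalate: see the steps above"
```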

How to Write a Runbook: Maintenance

Date Your Runbooks

Add last-updated dates and review schedules:

**Last Updated:** 2025-11-10
**Last Tested:** 2025-10-15
**Next Review:** 2025-12-10
**Owner:** @platform-team
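Dates in this format are easy to check automatically. As a rough sketch, a helper could flag runbooks whose review date has passed. The name `review_overdue` is hypothetical, and the approach assumes dates are written as YYYY-MM-DD, as in the example above.

```bash
# Hypothetical check: succeeds if the review date is earlier than today.
# ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings,
# so a string comparison is enough. Requires bash for [[ ... < ... ]].
review_overdue() {
  local next_review="$1" today="${2:-$(date +%F)}"
  [[ "$next_review" < "$today" ]]
}

# Example usage (the file name is a placeholder):
# next=$(grep -m1 'Next Review:' runbook.md | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')
# review_overdue "$next" && echo "runbook.md is overdue for review"
```

The optional second argument lets you pass a fixed "today", which makes the check easy to test.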

Test Regularly

A runbook that hasn’t been tested is a runbook that might not work:

## Testing Schedule

- [ ] Test in staging: Monthly
- [ ] Full dry-run with on-call: Quarterly
- [ ] Update after any related changes: Immediately

Make Updates Easy

Store runbooks in version control. Review changes like code.

From Document to Execution

You now know how to write a runbook. But even the best-written runbook is just a document until someone executes it. For practical examples, check out our runbook examples for DevOps teams. Not sure about the difference between runbooks and playbooks? See our runbook vs playbook comparison.

Stew turns your runbooks into interactive procedures. Each command runs with a click. Each step tracks automatically. Your well-written runbook becomes a reliable execution tool.
