
Kubernetes Runbook Templates: Ready to Use

· 5 min read · Stew Team

Kubernetes operations require specialized runbooks. Generic templates don’t account for namespaces, pod lifecycles, or the declarative nature of K8s.

This guide provides Kubernetes-specific DevOps runbook templates you can use immediately.

Why Kubernetes Needs Specialized Runbook Templates

Kubernetes introduces unique operational challenges:

  • Multiple resource types: Deployments, Pods, Services, and ConfigMaps all interact
  • Namespace isolation: Every command needs an explicit namespace
  • Declarative state: Debugging means comparing desired state against actual state
  • Dynamic infrastructure: Pods are created and destroyed constantly, so pod names change

A DevOps runbook template for Kubernetes must account for these realities.
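
The templates in this guide use placeholders such as `$NAMESPACE`, `$DEPLOYMENT`, and `$APP_LABEL`. As a minimal sketch, a guard like the one below could refuse to continue when a placeholder is unset, so a command never runs with an empty namespace by accident. The name `require_vars` is a hypothetical helper, not part of kubectl.

```bash
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  local missing=0 name value
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage before running any runbook step:
# require_vars NAMESPACE DEPLOYMENT APP_LABEL || exit 1
```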

DevOps Runbook Template: Kubernetes Deployment

# Kubernetes Deployment Runbook

## Metadata
- **Owner:** @platform-team
- **Last Updated:** 2025-12-01
- **Cluster:** production-us-east-1

## Prerequisites
- [ ] kubectl configured for target cluster
- [ ] Verify cluster context: `kubectl config current-context`
- [ ] Image available in registry
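
The context check above can be turned into a hard stop. This is a sketch only: `assert_context` is a hypothetical helper, and it assumes your kubeconfig context is named after the cluster (`production-us-east-1` here), which may not be true in your setup.

```bash
# Hypothetical helper: abort unless kubectl points at the expected context.
assert_context() {
  local expected="$1" actual
  actual="$(kubectl config current-context)" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "wrong context: got '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Example usage (context name is an assumption):
# assert_context production-us-east-1 || exit 1
```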

## Pre-Deployment Checks

### Verify Current State
​```bash
kubectl get deployment $DEPLOYMENT -n $NAMESPACE
kubectl get pods -l app=$APP_LABEL -n $NAMESPACE
​```

### Check Available Resources
​```bash
kubectl describe nodes | grep -A 5 "Allocated resources"
​```

## Deployment Procedure

### Step 1: Update Image
​```bash
kubectl set image deployment/$DEPLOYMENT $CONTAINER=$NEW_IMAGE -n $NAMESPACE
​```

### Step 2: Monitor Rollout
​```bash
kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE --timeout=300s
​```

### Step 3: Verify Pods
​```bash
kubectl get pods -l app=$APP_LABEL -n $NAMESPACE
​```

**Expected:** All pods show STATUS Running and READY n/n (1/1 for single-container pods)
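
Rather than reading the table by eye, the same expectation can be checked with a small filter. The sketch below assumes the default `kubectl get pods` column order (NAME, READY, STATUS); `unready_pods` is a hypothetical helper name.

```bash
# Hypothetical helper: print pods that are not Running or not fully ready.
unready_pods() {
  kubectl get pods -l "app=$APP_LABEL" -n "$NAMESPACE" --no-headers |
    awk '{
      # $2 looks like "1/1": ready containers / total containers.
      split($2, r, "/")
      if ($3 != "Running" || r[1] != r[2]) print $1
    }'
}

# Example usage: empty output means every pod matched the expectation.
# unready_pods
```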

## Verification

### Health Check
​```bash
# Assumes curl is installed in the container image and the app serves /health on port 8080
kubectl exec deploy/$DEPLOYMENT -n $NAMESPACE -- curl -sf localhost:8080/health
​```

### Check Logs
​```bash
kubectl logs deployment/$DEPLOYMENT -n $NAMESPACE --tail=50
​```

## Rollback
​```bash
kubectl rollout undo deployment/$DEPLOYMENT -n $NAMESPACE
kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE
​```
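
The deployment and rollback steps can be chained so that a failed rollout triggers the rollback automatically. This is a sketch under the same placeholder variables used in this runbook; `deploy_with_rollback` is a hypothetical wrapper, not a kubectl feature.

```bash
# Hypothetical wrapper: update the image, wait, and undo if the rollout fails.
deploy_with_rollback() {
  kubectl set image "deployment/$DEPLOYMENT" "$CONTAINER=$NEW_IMAGE" -n "$NAMESPACE" || return 1
  if ! kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" --timeout=300s; then
    echo "rollout failed, rolling back" >&2
    kubectl rollout undo "deployment/$DEPLOYMENT" -n "$NAMESPACE"
    kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE"
    # Report failure even if the rollback itself succeeded.
    return 1
  fi
}
```

The function returns non-zero whenever the new image did not roll out, so a calling script can stop and page someone.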

DevOps Runbook Template: Kubernetes Scaling

# Kubernetes Scaling Runbook

## Metadata
- **Owner:** @platform-team
- **Trigger:** High CPU/Memory alerts, latency spikes

## When to Scale

- CPU utilization > 80% for 5 minutes
- Memory utilization > 85%
- Response latency p99 > 2 seconds
- Queue depth > 1000 messages
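
The thresholds above can be expressed as a single check. The sketch below assumes you already have the four numbers as integers from your monitoring system (it does not query anything itself), and it cannot verify the "for 5 minutes" condition from one sample. `should_scale` is a hypothetical helper.

```bash
# Hypothetical helper: succeed (exit 0) if any scaling threshold is exceeded.
# Arguments: CPU %, memory %, p99 latency in ms, queue depth (all integers).
should_scale() {
  local cpu_pct="$1" mem_pct="$2" p99_ms="$3" queue_depth="$4"
  [ "$cpu_pct" -gt 80 ] && return 0
  [ "$mem_pct" -gt 85 ] && return 0
  [ "$p99_ms" -gt 2000 ] && return 0
  [ "$queue_depth" -gt 1000 ] && return 0
  return 1
}

# Example usage:
# should_scale 91 60 450 12 && echo "scale up"
```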

## Current State Check

### Resource Utilization
​```bash
kubectl top pods -n $NAMESPACE -l app=$APP_LABEL
​```

### Current Replica Count
​```bash
kubectl get deployment $DEPLOYMENT -n $NAMESPACE -o jsonpath='{.spec.replicas}'
​```

### HPA Status (if configured)
​```bash
kubectl get hpa -n $NAMESPACE
​```

## Scale Up Procedure

### Manual Scale
​```bash
kubectl scale deployment/$DEPLOYMENT --replicas=$NEW_COUNT -n $NAMESPACE
​```

### Verify New Pods
​```bash
kubectl get pods -l app=$APP_LABEL -n $NAMESPACE -w
​```

**Wait for:** All new pods to show Running and Ready
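
As an alternative to watching the pod list, readiness can be checked by comparing the Deployment's desired and ready replica counts. This is a sketch; `replicas_ready` is a hypothetical helper built on the same `jsonpath` output used earlier in this runbook.

```bash
# Hypothetical helper: succeed when ready replicas equal desired replicas.
replicas_ready() {
  local desired ready
  desired="$(kubectl get deployment "$DEPLOYMENT" -n "$NAMESPACE" \
    -o jsonpath='{.spec.replicas}')" || return 1
  ready="$(kubectl get deployment "$DEPLOYMENT" -n "$NAMESPACE" \
    -o jsonpath='{.status.readyReplicas}')" || return 1
  # readyReplicas is omitted from status when it is zero, so default to 0.
  [ "${ready:-0}" -eq "$desired" ]
}

# Example usage: poll until the scale-up has settled.
# until replicas_ready; do sleep 5; done
```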

### Verify Load Distribution
​```bash
kubectl top pods -n $NAMESPACE -l app=$APP_LABEL
​```

## Scale Down Procedure

### Gradual Scale Down
​```bash
# Scale down incrementally: lower $TARGET a few replicas at a time and re-run, so the remaining pods are not overloaded
kubectl scale deployment/$DEPLOYMENT --replicas=$TARGET -n $NAMESPACE
​```
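
One way to make the scale-down incremental is a loop that removes a few replicas per step and waits in between. This is a sketch: `scale_down_gradually`, the step size, and the pause length are all assumptions to tune for your workload.

```bash
# Hypothetical helper: step from CURRENT down to TARGET, STEP replicas at a time.
# Arguments: current count, target count, step (default 1), pause seconds (default 30).
scale_down_gradually() {
  local current="$1" target="$2" step="${3:-1}" pause="${4:-30}"
  while [ "$current" -gt "$target" ]; do
    current=$((current - step))
    # Never overshoot below the target.
    if [ "$current" -lt "$target" ]; then current="$target"; fi
    kubectl scale "deployment/$DEPLOYMENT" --replicas="$current" -n "$NAMESPACE" || return 1
    kubectl rollout status "deployment/$DEPLOYMENT" -n "$NAMESPACE" --timeout=120s || return 1
    # Give traffic time to rebalance before the next step.
    if [ "$current" -gt "$target" ]; then sleep "$pause"; fi
  done
  return 0
}

# Example usage: go from 10 replicas to 4, two at a time, pausing 60s per step.
# scale_down_gradually 10 4 2 60
```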

### Monitor During Scale Down
​```bash
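# Note: `kubectl logs deployment/...` streams from a single pod picked by kubectl, not all replicas
kubectl logs -f deployment/$DEPLOYMENT -n $NAMESPACE | grep -i error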
​```

## Rollback
​```bash
kubectl scale deployment/$DEPLOYMENT --replicas=$ORIGINAL_COUNT -n $NAMESPACE
​```

DevOps Runbook Template: Kubernetes Pod Debugging

# Kubernetes Pod Debugging Runbook

## Metadata
- **Owner:** @platform-team
- **Use When:** Pods in CrashLoopBackOff, ImagePullBackOff, or not Ready

## Initial Assessment

### Pod Status Overview
​```bash
kubectl get pods -n $NAMESPACE -l app=$APP_LABEL
​```

### Identify Problem Pods
​```bash
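# Note: pods in CrashLoopBackOff usually still report phase Running, so also check the STATUS column above
kubectl get pods -n $NAMESPACE --field-selector=status.phase!=Running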
​```

## Debugging by Symptom

### CrashLoopBackOff

#### Check Recent Logs
​```bash
kubectl logs $POD_NAME -n $NAMESPACE --previous
​```

#### Check Container Exit Code
​```bash
kubectl describe pod $POD_NAME -n $NAMESPACE | grep -A 10 "Last State"
​```

#### Common Causes
- Application error on startup
- Missing environment variables
- Failed health checks

### ImagePullBackOff

#### Check Image Details
​```bash
kubectl describe pod $POD_NAME -n $NAMESPACE | grep -A 5 "Image"
​```

#### Verify Image Exists
​```bash
docker manifest inspect $IMAGE_NAME
​```

#### Check Image Pull Secrets
​```bash
kubectl get secrets -n $NAMESPACE | grep docker
​```

### Pending State

#### Check Events
​```bash
kubectl describe pod $POD_NAME -n $NAMESPACE | grep -A 20 "Events"
​```

#### Check Node Resources
​```bash
kubectl describe nodes | grep -A 10 "Allocated resources"
​```

#### Common Causes
- Insufficient CPU/memory
- Node selector mismatch
- PVC not bound

## Interactive Debugging

### Exec into Running Container
​```bash
kubectl exec -it $POD_NAME -n $NAMESPACE -- /bin/sh
​```

### Debug Container (K8s 1.25+)
​```bash
kubectl debug $POD_NAME -n $NAMESPACE --image=busybox -it
​```

## Resolution Actions

### Force Delete Stuck Pod
​```bash
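# Last resort: skips graceful termination, so the container may keep running on the node for a while
kubectl delete pod $POD_NAME -n $NAMESPACE --grace-period=0 --force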
​```

### Restart Deployment
​```bash
kubectl rollout restart deployment/$DEPLOYMENT -n $NAMESPACE
​```

DevOps Runbook Template: Kubernetes Incident Response

# Kubernetes Incident Response Runbook

## Metadata
- **Owner:** @platform-team
- **Severity:** P1
- **Use When:** Service degradation or outage

## Immediate Actions (First 5 Minutes)

### Acknowledge and Communicate
Post in #incidents:
​```
INCIDENT: [Service] experiencing issues
Impact: [User-facing impact]
Status: Investigating
Commander: @your-name
​```

### Quick Health Check
​```bash
# Overall cluster health
kubectl get nodes
kubectl get pods -A | grep -Ev 'Running|Completed'

# Specific service
kubectl get pods -n $NAMESPACE -l app=$APP_LABEL
kubectl get events -n $NAMESPACE --sort-by='.lastTimestamp' | tail -20
​```

## Triage

### Check Recent Changes
​```bash
kubectl rollout history deployment/$DEPLOYMENT -n $NAMESPACE
​```

### Check Resource Pressure
​```bash
kubectl top nodes
kubectl top pods -n $NAMESPACE --sort-by=memory
​```

### Check Service Endpoints
​```bash
kubectl get endpoints $SERVICE -n $NAMESPACE
​```
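
An empty endpoints list means the Service has no ready pods behind it, which is easy to miss in the raw output during an incident. The sketch below turns that into a yes/no check; `has_ready_endpoints` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if the Service has at least one ready address.
has_ready_endpoints() {
  local ips
  ips="$(kubectl get endpoints "$SERVICE" -n "$NAMESPACE" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')" || return 1
  # Only ready pods appear under "addresses"; unready ones are listed separately.
  [ -n "$ips" ]
}

# Example usage:
# has_ready_endpoints || echo "no ready backends for $SERVICE"
```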

## Common Fixes

### Rollback Recent Deployment
​```bash
kubectl rollout undo deployment/$DEPLOYMENT -n $NAMESPACE
​```

### Restart Pods
​```bash
kubectl rollout restart deployment/$DEPLOYMENT -n $NAMESPACE
​```

### Scale Up (if resource constrained)
​```bash
kubectl scale deployment/$DEPLOYMENT --replicas=$NEW_COUNT -n $NAMESPACE
​```

## Post-Incident

- [ ] Update incident channel with resolution
- [ ] Create post-incident review ticket
- [ ] Document in runbook if new scenario

Making Your Kubernetes Runbooks Executable

These DevOps runbook templates for Kubernetes give you a starting point. For more general guidance, see our DevOps runbook template guide and runbook examples.

Stew makes your Kubernetes runbooks executable. Run kubectl commands with a click. Track which step you’re on during incidents. Share proven procedures with your team.

Join the waitlist and transform your Kubernetes operations.