Kubernetes Runbook Templates: Ready to Use
5 min read · Stew Team
Kubernetes operations require specialized runbooks. Generic templates don’t account for namespaces, pod lifecycles, or the declarative nature of K8s.
This guide provides Kubernetes-specific DevOps runbook templates you can use immediately.
Why Kubernetes Needs Specialized Runbook Templates
Kubernetes introduces unique operational challenges:
- Multiple resource types: Deployments, Pods, Services, ConfigMaps
- Namespace isolation: Commands need namespace context
- Declarative state: Debugging means comparing desired state with actual state
- Dynamic infrastructure: Pods come and go constantly
A DevOps runbook template for Kubernetes must account for these realities.
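One way to handle the namespace-context point is a small wrapper that always passes the namespace explicitly. This is a minimal sketch: the function name `k` is hypothetical, and it assumes the `NAMESPACE` variable used throughout the templates below.

```bash
# k: hypothetical wrapper that runs kubectl against $NAMESPACE.
# It refuses to run when NAMESPACE is unset, so a command never
# lands in the context's default namespace by accident.
k() {
  if [ -z "${NAMESPACE:-}" ]; then
    echo "NAMESPACE is not set" >&2
    return 1
  fi
  kubectl -n "$NAMESPACE" "$@"
}
```

With `NAMESPACE=payments` (an example value), `k get pods` runs `kubectl -n payments get pods`.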
DevOps Runbook Template: Kubernetes Deployment
# Kubernetes Deployment Runbook
## Metadata
- **Owner:** @platform-team
- **Last Updated:** 2025-12-01
- **Cluster:** production-us-east-1
## Prerequisites
- [ ] kubectl configured for target cluster
- [ ] Verify cluster context: `kubectl config current-context`
- [ ] Image available in registry
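The context check above can be turned into a guard that stops the runbook when kubectl points at the wrong cluster. A minimal sketch: the function name `check_context` is hypothetical, and it assumes the kubeconfig context is named after the cluster in the Metadata section.

```bash
# check_context: succeed only if the active kubectl context
# matches the expected one. Argument: expected context name.
check_context() {
  expected="$1"
  current="$(kubectl config current-context)" || return 1
  if [ "$current" != "$expected" ]; then
    echo "wrong context: $current (expected $expected)" >&2
    return 1
  fi
  echo "context ok: $current"
}
```

For example, `check_context production-us-east-1 || exit 1` at the top of a script aborts before any change is made.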
## Pre-Deployment Checks
### Verify Current State
```bash
kubectl get deployment $DEPLOYMENT -n $NAMESPACE
kubectl get pods -l app=$APP_LABEL -n $NAMESPACE
```
### Check Available Resources
```bash
kubectl describe nodes | grep -A 5 "Allocated resources"
```
## Deployment Procedure
### Step 1: Update Image
```bash
kubectl set image deployment/$DEPLOYMENT $CONTAINER=$NEW_IMAGE -n $NAMESPACE
```
### Step 2: Monitor Rollout
```bash
kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE --timeout=300s
```
### Step 3: Verify Pods
```bash
kubectl get pods -l app=$APP_LABEL -n $NAMESPACE
```
**Expected:** All pods show STATUS Running and READY n/n (1/1 for single-container pods)
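Instead of checking the pod list by eye, `kubectl wait` can block until the pods report Ready. A sketch under stated assumptions: the helper name `wait_for_pods` and the 120s default timeout are illustrative, and it is meant to run after the rollout has finished so that old pods are no longer matched by the label.

```bash
# wait_for_pods: block until every pod matching the label is Ready,
# or fail after the timeout.
# Arguments: app label, namespace, optional timeout (default 120s).
wait_for_pods() {
  kubectl wait --for=condition=Ready pod \
    -l "app=$1" -n "$2" --timeout="${3:-120s}"
}
```

Usage: `wait_for_pods "$APP_LABEL" "$NAMESPACE"`.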
## Verification
### Health Check
```bash
kubectl exec deploy/$DEPLOYMENT -n $NAMESPACE -- curl -sf localhost:8080/health
```
### Check Logs
```bash
kubectl logs deployment/$DEPLOYMENT -n $NAMESPACE --tail=50
```
## Rollback
```bash
kubectl rollout undo deployment/$DEPLOYMENT -n $NAMESPACE
kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE
```
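The rollout check from Step 2 and the rollback commands above can be chained so that a failed rollout triggers the undo automatically. This is a sketch, not a drop-in tool: `rollout_or_undo` is a hypothetical name, and the 300s timeout is carried over from Step 2.

```bash
# rollout_or_undo: wait for the rollout and, if it does not complete
# within the timeout, roll back to the previous revision.
# Arguments: deployment name, namespace.
rollout_or_undo() {
  if kubectl rollout status "deployment/$1" -n "$2" --timeout=300s; then
    echo "rollout succeeded"
    return 0
  fi
  echo "rollout failed, rolling back" >&2
  kubectl rollout undo "deployment/$1" -n "$2"
  kubectl rollout status "deployment/$1" -n "$2"
  return 1
}
```

The function returns non-zero whenever a rollback happened, so a calling script can still treat the deployment as failed.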
DevOps Runbook Template: Kubernetes Scaling
# Kubernetes Scaling Runbook
## Metadata
- **Owner:** @platform-team
- **Trigger:** High CPU/Memory alerts, latency spikes
## When to Scale
- CPU utilization > 80% for 5 minutes
- Memory utilization > 85%
- Response latency p99 > 2 seconds
- Queue depth > 1000 messages
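The thresholds above can be written as a single check. A minimal sketch with several assumptions: the function name `should_scale` is hypothetical, inputs are integers you have already collected from your monitoring system, and the 5-minute CPU window is not modeled.

```bash
# should_scale: succeed (exit 0) when any threshold from the list is crossed.
# Arguments (integers): CPU %, memory %, p99 latency in ms, queue depth.
should_scale() {
  [ "$1" -gt 80 ] || [ "$2" -gt 85 ] || [ "$3" -gt 2000 ] || [ "$4" -gt 1000 ]
}
```

For example, `should_scale 91 40 350 12 && echo "scale up"` prints `scale up` because CPU is above 80%.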
## Current State Check
### Resource Utilization
```bash
kubectl top pods -n $NAMESPACE -l app=$APP_LABEL
```
### Current Replica Count
```bash
kubectl get deployment $DEPLOYMENT -n $NAMESPACE -o jsonpath='{.spec.replicas}'
```
### HPA Status (if configured)
```bash
kubectl get hpa -n $NAMESPACE
```
## Scale Up Procedure
### Manual Scale
```bash
kubectl scale deployment/$DEPLOYMENT --replicas=$NEW_COUNT -n $NAMESPACE
```
### Verify New Pods
```bash
kubectl get pods -l app=$APP_LABEL -n $NAMESPACE -w
```
**Wait for:** All new pods to show Running and Ready
### Verify Load Distribution
```bash
kubectl top pods -n $NAMESPACE -l app=$APP_LABEL
```
## Scale Down Procedure
### Gradual Scale Down
```bash
# Scale down in small steps so the remaining pods are not overloaded
kubectl scale deployment/$DEPLOYMENT --replicas=$TARGET -n $NAMESPACE
```
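The incremental approach can be sketched as a loop that lowers the replica count one step at a time. The helper name `scale_down_gradually`, the step size, and the pause length are all assumptions to adapt to your service.

```bash
# scale_down_gradually: step replicas down from CURRENT to TARGET,
# STEP at a time, pausing between steps.
# Arguments: deployment, namespace, current, target, step, pause seconds.
scale_down_gradually() {
  sd_current=$3
  if [ "$5" -le 0 ]; then
    echo "step must be positive" >&2
    return 1
  fi
  while [ "$sd_current" -gt "$4" ]; do
    sd_current=$((sd_current - $5))
    if [ "$sd_current" -lt "$4" ]; then
      sd_current=$4   # never go below the target
    fi
    kubectl scale "deployment/$1" --replicas="$sd_current" -n "$2" || return 1
    sleep "$6"
  done
}
```

Usage: `scale_down_gradually "$DEPLOYMENT" "$NAMESPACE" 10 4 3 60` scales to 7, waits a minute, then scales to 4.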
### Monitor During Scale Down
```bash
kubectl logs -f deployment/$DEPLOYMENT -n $NAMESPACE | grep -i error
```
## Rollback
```bash
kubectl scale deployment/$DEPLOYMENT --replicas=$ORIGINAL_COUNT -n $NAMESPACE
```
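The rollback relies on `$ORIGINAL_COUNT`, which has to be captured before the scale command runs. A minimal sketch, assuming a hypothetical helper named `scale_and_remember`:

```bash
# scale_and_remember: store the current replica count in ORIGINAL_COUNT,
# then scale to the requested count.
# Arguments: deployment, namespace, new replica count.
scale_and_remember() {
  ORIGINAL_COUNT="$(kubectl get deployment "$1" -n "$2" \
    -o jsonpath='{.spec.replicas}')" || return 1
  echo "original replicas: $ORIGINAL_COUNT"
  kubectl scale "deployment/$1" --replicas="$3" -n "$2"
}
```

Call it directly rather than inside `$(...)`, since a subshell would discard the `ORIGINAL_COUNT` assignment.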
DevOps Runbook Template: Kubernetes Pod Debugging
# Kubernetes Pod Debugging Runbook
## Metadata
- **Owner:** @platform-team
- **Use When:** Pods in CrashLoopBackOff, ImagePullBackOff, or not Ready
## Initial Assessment
### Pod Status Overview
```bash
kubectl get pods -n $NAMESPACE -l app=$APP_LABEL
```
### Identify Problem Pods
```bash
# Note: CrashLoopBackOff pods usually still report phase Running, so this filter can miss them
kubectl get pods -n $NAMESPACE --field-selector=status.phase!=Running
```
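A phase filter alone can miss pods that are restarting or not ready, because their phase is typically still Running. One alternative is to filter the plain pod listing on the READY and STATUS columns. This is a sketch that assumes the default column layout of `kubectl get pods --no-headers`; `problem_pods` is a hypothetical name.

```bash
# problem_pods: read `kubectl get pods --no-headers` output on stdin and
# print the name and status of every pod that is not fully ready.
# Completed pods (for example finished Jobs) are skipped.
problem_pods() {
  awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $3
    }
  '
}
```

Usage: `kubectl get pods -n "$NAMESPACE" --no-headers | problem_pods`.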
## Debugging by Symptom
### CrashLoopBackOff
#### Check Recent Logs
```bash
kubectl logs $POD_NAME -n $NAMESPACE --previous
```
#### Check Container Exit Code
```bash
kubectl describe pod $POD_NAME -n $NAMESPACE | grep -A 10 "Last State"
```
#### Common Causes
- Application error on startup
- Missing environment variables
- Failed health checks
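The exit code from "Last State" often narrows down which cause applies. The mapping below is a rough sketch based on common shell and signal conventions (128 plus the signal number); `explain_exit_code` is a hypothetical helper, and application-specific codes will differ.

```bash
# explain_exit_code: map a container exit code to a likely cause.
# Argument: numeric exit code.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; check why the process stops" ;;
    1)   echo "application error" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL (128+9), often an out-of-memory kill" ;;
    139) echo "segmentation fault (128+11)" ;;
    143) echo "terminated by SIGTERM (128+15)" ;;
    *)   echo "application-specific exit code" ;;
  esac
}
```

The code itself can be read with `kubectl get pod "$POD_NAME" -n "$NAMESPACE" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'`, assuming the first container is the one that crashed.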
### ImagePullBackOff
#### Check Image Details
```bash
kubectl describe pod $POD_NAME -n $NAMESPACE | grep -A 5 "Image"
```
#### Verify Image Exists
```bash
docker manifest inspect $IMAGE_NAME
```
#### Check Image Pull Secrets
```bash
kubectl get secrets -n $NAMESPACE | grep docker
```
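Listing secrets by name does not show whether the pod actually references one that exists. A sketch of a more targeted check, assuming a hypothetical helper named `check_pull_secrets`:

```bash
# check_pull_secrets: list the imagePullSecrets a pod references and
# report whether each one exists in the namespace.
# Arguments: pod name, namespace.
check_pull_secrets() {
  ps_names="$(kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}')" || return 1
  if [ -z "$ps_names" ]; then
    echo "pod references no imagePullSecrets"
    return 0
  fi
  ps_status=0
  for ps_name in $ps_names; do
    if kubectl get secret "$ps_name" -n "$2" >/dev/null 2>&1; then
      echo "ok: $ps_name"
    else
      echo "missing: $ps_name"
      ps_status=1
    fi
  done
  return "$ps_status"
}
```

A secret that exists can still hold expired or wrong credentials, so an `ok` here only rules out the missing-secret case.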
### Pending State
#### Check Events
```bash
kubectl describe pod $POD_NAME -n $NAMESPACE | grep -A 20 "Events"
```
#### Check Node Resources
```bash
kubectl describe nodes | grep -A 10 "Allocated resources"
```
#### Common Causes
- Insufficient CPU/memory
- Node selector mismatch
- PVC not bound
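The scheduler's `FailedScheduling` event message usually indicates which of these causes applies. The classifier below is a rough sketch: `classify_pending` is a hypothetical name, and the matched phrases are assumptions based on typical scheduler messages, whose wording varies across Kubernetes versions.

```bash
# classify_pending: read a FailedScheduling event message on stdin and
# print which of the common causes it most likely points to.
classify_pending() {
  cp_msg="$(cat)"
  case "$cp_msg" in
    *"Insufficient cpu"*|*"Insufficient memory"*)
      echo "insufficient CPU/memory" ;;
    *"node affinity/selector"*|*"node selector"*)
      echo "node selector mismatch" ;;
    *"PersistentVolumeClaim"*)
      echo "PVC not bound" ;;
    *)
      echo "unclassified" ;;
  esac
}
```

Messages can be pulled with `kubectl get events -n "$NAMESPACE" --field-selector involvedObject.name="$POD_NAME",reason=FailedScheduling -o jsonpath='{.items[*].message}'` and piped into the function.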
## Interactive Debugging
### Exec into Running Container
```bash
kubectl exec -it $POD_NAME -n $NAMESPACE -- /bin/sh
```
### Debug Container (K8s 1.25+)
```bash
kubectl debug $POD_NAME -n $NAMESPACE --image=busybox -it
```
## Resolution Actions
### Force Delete Stuck Pod
```bash
# Caution: force deletion does not wait for the kubelet to confirm the container has stopped
kubectl delete pod $POD_NAME -n $NAMESPACE --grace-period=0 --force
```
### Restart Deployment
```bash
kubectl rollout restart deployment/$DEPLOYMENT -n $NAMESPACE
```
DevOps Runbook Template: Kubernetes Incident Response
# Kubernetes Incident Response Runbook
## Metadata
- **Owner:** @platform-team
- **Severity:** P1
- **Use When:** Service degradation or outage
## Immediate Actions (First 5 Minutes)
### Acknowledge and Communicate
Post in #incidents:
```
INCIDENT: [Service] experiencing issues
Impact: [User-facing impact]
Status: Investigating
Commander: @your-name
```
### Quick Health Check
```bash
# Overall cluster health
kubectl get nodes
kubectl get pods -A | grep -Ev 'Running|Completed'
# Specific service
kubectl get pods -n $NAMESPACE -l app=$APP_LABEL
kubectl get events -n $NAMESPACE --sort-by='.lastTimestamp' | tail -20
```
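During an incident it helps to reduce the node listing to just the unhealthy nodes. A minimal sketch, assuming the default column layout of `kubectl get nodes --no-headers`; `not_ready_nodes` is a hypothetical name.

```bash
# not_ready_nodes: read `kubectl get nodes --no-headers` on stdin and
# print the names of nodes whose STATUS does not start with "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are treated as healthy.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

Usage: `kubectl get nodes --no-headers | not_ready_nodes`. Empty output means every node reports Ready.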
## Triage
### Check Recent Changes
```bash
kubectl rollout history deployment/$DEPLOYMENT -n $NAMESPACE
```
### Check Resource Pressure
```bash
kubectl top nodes
kubectl top pods -n $NAMESPACE --sort-by=memory
```
### Check Service Endpoints
```bash
kubectl get endpoints $SERVICE -n $NAMESPACE
```
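An empty endpoints list means the Service has no ready backends, even when pods exist. The check can be scripted as below; `has_endpoints` is a hypothetical helper, and the jsonpath assumes the classic Endpoints object shown above.

```bash
# has_endpoints: succeed when the Service has at least one ready
# backend address, printing the addresses found.
# Arguments: service name, namespace.
has_endpoints() {
  he_ips="$(kubectl get endpoints "$1" -n "$2" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')" || return 1
  if [ -z "$he_ips" ]; then
    echo "no ready endpoints for $1" >&2
    return 1
  fi
  echo "$he_ips"
}
```

If this fails while pods are Running, compare the Service selector with the pod labels and check readiness probes.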
## Common Fixes
### Rollback Recent Deployment
```bash
kubectl rollout undo deployment/$DEPLOYMENT -n $NAMESPACE
```
### Restart Pods
```bash
kubectl rollout restart deployment/$DEPLOYMENT -n $NAMESPACE
```
### Scale Up (if resource constrained)
```bash
kubectl scale deployment/$DEPLOYMENT --replicas=$NEW_COUNT -n $NAMESPACE
```
## Post-Incident
- [ ] Update incident channel with resolution
- [ ] Create post-incident review ticket
- [ ] Document in runbook if new scenario
Making Your Kubernetes Runbooks Executable
These DevOps runbook templates for Kubernetes give you a starting point. For more general guidance, see our DevOps runbook template guide and runbook examples.
Stew makes your Kubernetes runbooks executable. Run kubectl commands with a click. Track which step you’re on during incidents. Share proven procedures with your team.
Join the waitlist and transform your Kubernetes operations.