Using the Bash REPL for Incident Response
When production is down, the bash REPL is your primary tool. How you use it during incidents can mean the difference between quick resolution and extended outages.
This guide covers bash REPL techniques for incident response. For structured incident procedures, see our incident response runbook guide.
The REPL During Incidents
Incidents are chaotic. The bash REPL gives you:
- Immediate feedback: See results instantly
- Flexibility: Adapt commands to what you discover
- Power: Full access to system tools
- History: Track what you’ve tried
Setting Up for Incidents
Before incidents happen, configure your bash REPL.
History Configuration
Never lose a command during an incident:
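# ~/.bashrc
HISTSIZE=50000                # commands kept in memory per session
HISTFILESIZE=100000           # lines kept in the history file
HISTCONTROL=ignoredups        # skip consecutive duplicate commands
HISTTIMEFORMAT="%F %T "       # show a date and time next to each entry
shopt -s histappend           # append to the history file instead of overwriting it
PROMPT_COMMAND="history -a"   # write each command to disk as soon as it has run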
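Assigning PROMPT_COMMAND directly replaces whatever was already stored there, and some distributions and tools do set it. The following is a minimal sketch of appending instead, assuming PROMPT_COMMAND is a plain string (newer bash versions also accept an array). The helper name `add_prompt_command` is hypothetical.

```bash
# Hypothetical helper: append a command to PROMPT_COMMAND unless it is already present.
# Assumes PROMPT_COMMAND is a plain string, not an array.
add_prompt_command() {
  local cmd=$1
  case "${PROMPT_COMMAND:-}" in
    *"$cmd"*) ;;  # already there, nothing to do
    *) PROMPT_COMMAND="${PROMPT_COMMAND:+$PROMPT_COMMAND;}$cmd" ;;
  esac
}

add_prompt_command 'history -a'
```

Calling the helper twice leaves a single copy of the command, so it is safe to re-source ~/.bashrc mid-incident.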
Prompt with Context
Know where you are:
# ~/.bashrc
PS1='[\t] \u@\h:\w\$ '
# Shows: [14:32:15] user@production-1:/var/log$
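One way to make the host stand out is to color it. This sketch assumes, purely for illustration, that production hostnames start with "prod"; `host_color` is a hypothetical helper, and the color codes are standard ANSI values.

```bash
# Hypothetical helper: pick an ANSI color code from the hostname.
# Assumption: production hosts are named prod*, everything else is lower risk.
host_color() {
  case "$1" in
    prod*) printf '31' ;;  # red
    *)     printf '32' ;;  # green
  esac
}

# Same layout as the prompt above, with the hostname wrapped in the chosen color.
PS1="[\t] \u@\[\e[$(host_color "${HOSTNAME:-}")m\]\h\[\e[0m\]:\w\\$ "
```

The `\[` and `\]` markers tell bash that the escape sequences take no width, which keeps line editing aligned.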
Essential Aliases
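# Quick access during incidents
alias pods='kubectl get pods -A | grep -v Running'   # pods in any namespace that are not Running
alias logs='journalctl -u'                           # usage: logs <unit>
alias conns='ss -tuln'                               # listening TCP/UDP sockets, numeric
alias procs='ps aux --sort=-%cpu | head -20'         # top CPU consumers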
REPL Techniques for Debugging
Check Before Acting
Always verify before making changes:
# See which pods the selector matches
kubectl get pods -l app=api
# Then act on exactly that selector
kubectl delete pods -l app=api
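The verify-then-act habit can also be enforced by the shell itself. Below is a minimal sketch of a confirmation wrapper; `confirm_run` is a hypothetical name, not part of kubectl or bash.

```bash
# Hypothetical helper: show a command, then run it only after an explicit "y".
confirm_run() {
  printf 'About to run: %s\n' "$*" >&2
  local answer
  read -r -p 'Proceed? [y/N] ' answer
  if [[ $answer == [yY] ]]; then
    "$@"
  else
    echo 'Skipped.' >&2
    return 1
  fi
}

# Usage (not executed here):
#   confirm_run kubectl delete pods -l app=api
```

Anything other than a single `y` or `Y` skips the command, so an accidental Enter is harmless.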
Build Commands Incrementally
Don’t write complex one-liners from scratch:
# Step 1: Get raw data
kubectl get events --sort-by='.lastTimestamp'
# Step 2: Filter
kubectl get events --sort-by='.lastTimestamp' | grep -i error
# Step 3: Focus on recent
kubectl get events --sort-by='.lastTimestamp' | grep -i error | tail -20
# Step 4: Extract useful info
kubectl get events --sort-by='.lastTimestamp' -o json | jq '.items[] | select(.type=="Warning") | {time: .lastTimestamp, message: .message}'
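Each step above queries the cluster again. When the source command is slow, one option is to capture its output once and refine against the saved copy. The sketch below uses a stand-in function with made-up sample lines in place of `kubectl get events`, so it runs without a cluster.

```bash
# Stand-in for a slow command such as `kubectl get events`; the lines are hypothetical.
fetch_events() {
  printf '%s\n' \
    '10:01 Normal  Scheduled pod/api-1' \
    '10:02 Warning BackOff error pulling image' \
    '10:03 Warning Unhealthy readiness probe error' \
    '10:04 Normal  Started pod/api-2'
}

snapshot=$(mktemp)
fetch_events > "$snapshot"             # Step 1: capture the raw data once

grep -ci error "$snapshot"             # Step 2: how many lines match?
grep -i error "$snapshot" | tail -n 1  # Step 3: narrow to the most recent match
```

The saved file also doubles as evidence for the postmortem, since it records what the system reported at that moment.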
Use Variables for Repetition
# Set once
POD=$(kubectl get pods -l app=api -o jsonpath='{.items[0].metadata.name}')
# Use repeatedly
kubectl logs "$POD"
kubectl describe pod "$POD"
kubectl exec -it "$POD" -- /bin/sh
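If the selector matches no pods, the lookup fails and POD ends up empty, so the commands that follow run without a pod name. A small guard can catch this early; `require_var` is a hypothetical helper.

```bash
# Hypothetical guard: fail loudly when a named variable is unset or empty.
require_var() {
  local name=$1
  if [[ -z ${!name:-} ]]; then
    echo "error: $name is empty; re-run the lookup" >&2
    return 1
  fi
}

# Usage (not executed here):
#   require_var POD && kubectl logs "$POD"
```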
Safe Destructive Commands
# Preview
find /var/log -name "*.log" -mtime +30 -print
# Execute only after confirming the list
find /var/log -name "*.log" -mtime +30 -delete
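Files can change between the preview and the delete. One hedged variation is to save the preview list and delete only the paths in that list, so what gets removed is exactly what was reviewed. The sketch runs in a scratch directory rather than /var/log and assumes GNU `touch` and `xargs`.

```bash
sandbox=$(mktemp -d)                       # scratch directory standing in for /var/log
touch -d '40 days ago' "$sandbox/old.log"  # GNU touch: a file older than 30 days
touch "$sandbox/new.log"                   # a fresh file that must survive

list=$(mktemp)
find "$sandbox" -name '*.log' -mtime +30 -print0 > "$list"  # save the candidates
tr '\0' '\n' < "$list"                     # review exactly what will be removed
xargs -0 -r rm -- < "$list"                # delete only the reviewed paths
```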
Real-Time Monitoring
The bash REPL excels at real-time observation.
Watch Command
# Update every 2 seconds
watch -n 2 'kubectl get pods'
# Highlight changes
watch -d 'free -h'
# Exit on change
watch -g 'cat /var/run/service.pid'
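`watch -g` exits when the output changes. To wait until a check succeeds instead, a polling loop with a timeout is one option. `wait_for` below is a hypothetical helper, sketched with bash's built-in SECONDS counter.

```bash
# Hypothetical helper: retry a command every INTERVAL seconds until it
# succeeds, giving up once TIMEOUT seconds have passed.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    (( SECONDS >= deadline )) && return 1
    sleep "$interval"
  done
}

# Usage (not executed here):
#   wait_for 120 5 curl -fsS http://service-1/health
```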
Tail Multiple Logs
# Multiple files
tail -f /var/log/app/*.log
# With timestamps
tail -f /var/log/app.log | while IFS= read -r line; do echo "$(date): $line"; done
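Calling `date` in a command substitution starts a new process for every log line. A minimal sketch, assuming bash 4.2 or newer, uses the `%(...)T` format of the `printf` builtin instead. The function name `stamp` is hypothetical.

```bash
# Hypothetical filter: prefix each input line with the current date and time.
# Requires bash 4.2+ for printf's %(...)T format; -1 means "now".
stamp() {
  local line
  while IFS= read -r line; do
    printf '%(%F %T)T %s\n' -1 "$line"
  done
}

# Usage (not executed here):
#   tail -f /var/log/app.log | stamp
```

`IFS=` and `-r` keep leading whitespace and backslashes in the log line intact.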
Live Filtering
# Stream and filter
kubectl logs -f deployment/api | grep -i error
# With context
tail -f /var/log/syslog | grep -B2 -A2 "error"
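When grep writes to another pipe instead of the terminal, it buffers output in blocks, so matches can show up late. The `--line-buffered` option (available in GNU and BSD grep) flushes after every line. The sketch below chains a second stage after grep; `filter_errors` is a hypothetical name.

```bash
# Hypothetical filter: keep lines containing "error" (any case) and number them.
# --line-buffered makes grep pass each match on immediately.
filter_errors() {
  grep --line-buffered -i error | awk '{ print NR ": " $0; fflush() }'
}

# Usage (not executed here):
#   kubectl logs -f deployment/api | filter_errors
```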
Parallel Operations
Speed up investigations with parallel commands.
Background Jobs
# Run health checks in parallel
curl http://service-1/health &
curl http://service-2/health &
curl http://service-3/health &
wait
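A bare `wait` does not tell you which job failed. Waiting on each PID returns that job's exit status, which the following sketch uses to report failures. `run_parallel` is a hypothetical helper that takes each command as one quoted string.

```bash
# Hypothetical helper: run each command string in the background,
# then report the ones that exited non-zero.
run_parallel() {
  local -a cmds=("$@") pids=()
  local i failed=0
  for i in "${!cmds[@]}"; do
    bash -c "${cmds[$i]}" &        # start each check as a background job
    pids[$i]=$!
  done
  for i in "${!cmds[@]}"; do
    if ! wait "${pids[$i]}"; then  # wait PID returns that job's exit status
      echo "FAILED: ${cmds[$i]}" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}

# Usage (not executed here):
#   run_parallel 'curl -fsS http://service-1/health' 'curl -fsS http://service-2/health'
```

The return value is the number of failed commands, so `run_parallel ... && echo "all healthy"` works as expected.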
Using xargs
# Check all nodes
kubectl get nodes -o name | xargs -I{} -P5 kubectl describe {}
# Parallel SSH
printf '%s\n' node1 node2 node3 | xargs -P3 -I{} ssh {} uptime
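With `-I{}`, xargs substitutes one whole input line per invocation, so hosts need to arrive one per line. The sketch below shows the pattern with `echo` standing in for `ssh`, so it runs anywhere; `check_hosts` and the host names are hypothetical.

```bash
# Hypothetical helper: print the command that would run for each host.
# printf emits one host per line, which is what -I{} expects.
check_hosts() {
  printf '%s\n' "$@" | xargs -P3 -I{} echo "would run: ssh {} uptime"
}

check_hosts node1 node2 node3
```

Because the jobs run in parallel, the output order can differ from the input order.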
Recording Your Session
Capture what you do for postmortems.
Script Command
# Start recording
script incident-2024-01-15.log
# Work normally...
kubectl get pods
tail -f /var/log/app.log
# Stop recording
exit
Asciinema
# Record with timing
asciinema rec incident.cast
# Play back
asciinema play incident.cast
History Export
# After incident, save recent history
history | tail -100 > incident-commands.txt
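With HISTTIMEFORMAT="%F %T " set, each history entry carries a timestamp, so the export can be limited to the incident window rather than the last 100 commands. This is a sketch under that assumption; `history_between` is a hypothetical helper, and multi-line commands are not handled.

```bash
# Hypothetical filter: read `history` output on stdin and keep entries whose
# timestamp lies between START and END (inclusive).
# Assumes HISTTIMEFORMAT="%F %T ", so fields 2 and 3 are the date and time.
# Timestamps in this format sort correctly as plain strings.
history_between() {
  awk -v start="$1" -v end="$2" '{ ts = $2 " " $3 } ts >= start && ts <= end'
}

# Usage (not executed here):
#   history | history_between '2024-01-15 14:00:00' '2024-01-15 15:30:00' > incident-commands.txt
```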
Common REPL Patterns
Quick Service Check
# Health check across services
for svc in api worker scheduler; do
  echo -n "$svc: "
  curl -s "http://$svc/health" | jq -r '.status'
done
Resource Usage Snapshot
# CPU, memory, and disk at a glance
echo "=== CPU ===" && top -bn1 | head -5
echo "=== Memory ===" && free -h
echo "=== Disk ===" && df -h | grep -v tmpfs
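The header-plus-command pattern can be wrapped in a small function so a snapshot keeps going when one tool is missing or fails. `section` is a hypothetical helper, sketched here for simple commands without pipes.

```bash
# Hypothetical helper: print a header, run a command, and note a failure
# instead of aborting the rest of the snapshot.
section() {
  local title=$1
  shift
  printf '=== %s ===\n' "$title"
  "$@" 2>&1 || printf '(failed: %s)\n' "$*"
}

# Usage (not executed here):
#   { section Memory free -h; section Disk df -h; } | tee "snapshot-$(date +%F-%H%M%S).txt"
```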
Connection Investigation
# What's connected to this port?
ss -tnp | grep :8080 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn
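Splitting on the first colon breaks for IPv6 peers, whose addresses contain colons themselves. A hedged alternative strips only the trailing port. `count_peers` is a hypothetical helper that expects `ss -tn` style lines, where the fifth column is the peer address and port.

```bash
# Hypothetical filter: count connections per peer address.
# Removes only the final ":port", so bracketed IPv6 addresses stay intact.
count_peers() {
  awk '{ peer = $5; sub(/:[0-9]+$/, "", peer); print peer }' | sort | uniq -c | sort -rn
}

# Usage (not executed here):
#   ss -tn | grep ':8080 ' | count_peers
```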
The REPL Postmortem Problem
After an incident, you need to:
- Document what happened
- Share what commands worked
- Create runbooks for next time
But REPL sessions are messy. Commands are interleaved with output. Dead ends are mixed with solutions.
From REPL to Runbook
Extract the working commands into a runbook:
# API Service Recovery
## Identify failing pods
```bash
kubectl get pods -l app=api | grep -v Running
```
## Check recent events
```bash
kubectl get events --sort-by='.lastTimestamp' | grep api | tail -10
```
## Restart deployment
```bash
kubectl rollout restart deployment/api
```
## Verify recovery
```bash
kubectl rollout status deployment/api
```
This becomes your starting point for the next incident.
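Once the working commands are picked out, the skeleton of such a runbook can be generated rather than typed. The sketch below turns a list of commands, one per line, into the same heading-and-fence layout; `to_runbook` is a hypothetical helper and the step titles are placeholders to fill in by hand.

```bash
# Hypothetical helper: read commands on stdin (one per line) and print a
# Markdown runbook skeleton with one fenced block per command.
to_runbook() {
  local title=$1 cmd n=0 fence='```'
  printf '# %s\n' "$title"
  while IFS= read -r cmd; do
    [[ -z $cmd ]] && continue  # skip blank lines
    n=$((n + 1))
    printf '## Step %d: TODO describe\n%sbash\n%s\n%s\n' "$n" "$fence" "$cmd" "$fence"
  done
}

# Usage (not executed here):
#   to_runbook 'API Service Recovery' < working-commands.txt > api-recovery.md
```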
Stew: REPL-Style Execution, Runbook Persistence
Stew combines REPL interactivity with documentation structure. Run commands from your runbooks with a click. See output inline. Build on what works.
Your incident REPL sessions become executable team knowledge.
Join the waitlist and transform how you handle incidents.