Using the Bash REPL for Incident Response

When production is down, the bash REPL is your primary tool. How you use it during an incident can mean the difference between a quick resolution and an extended outage.

This guide covers bash REPL techniques for incident response. For structured incident procedures, see our incident response runbook guide.

The REPL During Incidents

Incidents are chaotic. The bash REPL gives you:

Immediate feedback: See results instantly
Flexibility: Adapt commands to what you discover
Power: Full access to system tools
History: Track what you’ve tried

Setting Up for Incidents

Before incidents happen, configure your bash REPL.

History Configuration

Never lose a command during an incident:

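# ~/.bashrc
HISTSIZE=50000                 # commands kept in memory
HISTFILESIZE=100000            # lines kept in the history file
HISTCONTROL=ignoredups         # skip consecutive duplicate commands
HISTTIMEFORMAT="%F %T "        # timestamp each entry, e.g. 2024-01-15 14:32:15
shopt -s histappend            # append to the history file instead of overwriting it
# Write each command to disk as soon as it runs, keeping any existing PROMPT_COMMAND
PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"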

Prompt with Context

Know where you are:

# ~/.bashrc
PS1='[\t] \u@\h:\w\$ '
# Shows: [14:32:15] user@production-1:/var/log$

Essential Aliases

# Quick access during incidents
alias pods='kubectl get pods -A | grep -v Running'
alias logs='journalctl -u'
alias conns='ss -tuln'
alias procs='ps aux --sort=-%cpu | head -20'
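
One limitation of the pods alias is that grep -v Running matches anywhere on the line and still shows Completed job pods. As a rough sketch, a small function can filter on the STATUS column instead. The name not_running is hypothetical, and the code assumes the default kubectl get pods table layout with a STATUS header.

```bash
# not_running: hypothetical filter, not part of kubectl.
# Reads `kubectl get pods` table output on stdin, finds the STATUS column
# from the header row, and prints the header plus every pod whose status
# is neither Running nor Completed.
not_running() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i
      print
      next
    }
    col && $col != "Running" && $col != "Completed"
  '
}
```

Usage would look like kubectl get pods -A | not_running. Because the column is located from the header, the same function should work with or without -A.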

REPL Techniques for Debugging

Check Before Acting

Always verify before making changes:

# See which pods the selector matches
kubectl get pods -l app=api

# Then take action
kubectl delete pods -l app=api
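
Under pressure it is easy to skip the verification step. One way to build it into the shell is a small wrapper that echoes the command and asks before running it. This is a minimal sketch; confirm_run is a hypothetical helper name, not a standard tool.

```bash
# confirm_run: hypothetical helper.
# Prints the command it is about to run, asks for confirmation on stdin,
# and runs the command only if the answer is y or yes.
confirm_run() {
  local reply
  printf 'About to run: %s\n' "$*" >&2
  printf 'Proceed? [y/N] ' >&2
  read -r reply
  case "$reply" in
    y|Y|yes|YES) "$@" ;;
    *) echo "Skipped." >&2; return 1 ;;
  esac
}
```

For example, confirm_run kubectl delete pods -l app=api shows the full command and waits for a y before deleting anything. It only wraps a single command, so pipelines would need to be wrapped in a function or script first.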

Build Commands Incrementally

Don’t write complex one-liners from scratch:

# Step 1: Get raw data
kubectl get events --sort-by='.lastTimestamp'

# Step 2: Filter
kubectl get events --sort-by='.lastTimestamp' | grep -i error

# Step 3: Focus on recent
kubectl get events --sort-by='.lastTimestamp' | grep -i error | tail -20

# Step 4: Extract useful info
kubectl get events --sort-by='.lastTimestamp' -o json | jq '.items[] | select(.type=="Warning") | {time: .lastTimestamp, message: .message}'

Use Variables for Repetition

# Set once
POD=$(kubectl get pods -l app=api -o jsonpath='{.items[0].metadata.name}')

# Use repeatedly
kubectl logs "$POD"
kubectl describe pod "$POD"
kubectl exec -it "$POD" -- /bin/sh
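
If the selector matches nothing, POD ends up empty and the follow-up commands can fail or act on more than you intended. A guard like the following sketch refuses to run in that case. The name with_pod is hypothetical, and it assumes the pod name belongs at the end of the command.

```bash
# with_pod: hypothetical wrapper.
# Appends "$POD" as the last argument of the given command, but refuses
# to run when POD is empty or unset.
with_pod() {
  if [ -z "${POD:-}" ]; then
    echo "POD is empty: did the selector match any pods?" >&2
    return 1
  fi
  "$@" "$POD"
}
```

With this in place, with_pod kubectl logs and with_pod kubectl describe pod behave like the commands above, but stop early when there is no pod to target.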

Safe Destructive Commands

# Preview
find /var/log -name "*.log" -mtime +30 -print

# Execute only after confirming the list
find /var/log -name "*.log" -mtime +30 -delete
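
Running find twice leaves a small gap: the second run may match files that were not in the preview. A possible variation is to save the previewed list and delete exactly that list. The function names below are hypothetical, and the sketch assumes a find and xargs that support NUL-delimited output (-print0 and -0).

```bash
# All three function names are hypothetical helpers.

# snapshot_old_logs DIR DAYS LISTFILE
# Saves a NUL-delimited list of *.log files older than DAYS days.
snapshot_old_logs() {
  find "$1" -name '*.log' -mtime +"$2" -print0 > "$3"
}

# show_snapshot LISTFILE
# Prints the saved list, one path per line, for review.
show_snapshot() {
  tr '\0' '\n' < "$1"
}

# delete_snapshot LISTFILE
# Deletes exactly the files named in the saved list.
delete_snapshot() {
  [ -s "$1" ] || { echo "Nothing to delete." >&2; return 0; }
  xargs -0 rm -f -- < "$1"
}
```

A session would run snapshot_old_logs /var/log 30 /tmp/old-logs.list, then show_snapshot /tmp/old-logs.list to review, and finally delete_snapshot /tmp/old-logs.list once the list looks right.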

Real-Time Monitoring

The bash REPL excels at real-time observation.

Watch Command

# Update every 2 seconds
watch -n 2 'kubectl get pods'

# Highlight changes
watch -d 'free -h'

# Exit on change
watch -g 'cat /var/run/service.pid'
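
watch needs someone looking at the terminal. When the question is simply whether something has recovered yet, a polling loop with a timeout can answer it and return an exit status you can chain on. This is a sketch only; wait_until is a hypothetical helper name.

```bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND...
# Hypothetical helper: re-runs COMMAND (output discarded) until it succeeds
# or roughly TIMEOUT_SECONDS have passed. Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout_s=$1 interval_s=$2 elapsed=0
  shift 2
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "Timed out after ${elapsed}s: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  echo "OK after ${elapsed}s: $*"
}
```

For instance, wait_until 120 5 curl -fsS http://service-1/health polls the health endpoint every 5 seconds for up to about two minutes. The elapsed time counts only the sleeps, so slow commands stretch the real timeout.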

Tail Multiple Logs

# Multiple files
tail -f /var/log/app/*.log

# With timestamps
tail -f /var/log/app.log | while IFS= read -r line; do printf '%s %s\n' "$(date '+%F %T')" "$line"; done

Live Filtering

# Stream and filter
kubectl logs -f deployment/api | grep -i error

# With context
tail -f /var/log/syslog | grep -B2 -A2 "error"

Parallel Operations

Speed up investigations with parallel commands.

Background Jobs

# Run health checks in parallel
curl http://service-1/health &
curl http://service-2/health &
curl http://service-3/health &
wait
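
A bare wait does not report which background jobs failed. The sketch below records each job's PID and waits on them one by one so failures can be counted. run_all is a hypothetical helper name, and each argument is assumed to be one complete shell command string.

```bash
# run_all 'COMMAND STRING'...
# Hypothetical helper: starts every command in the background, waits for
# each one by PID, and reports how many exited with a non-zero status.
run_all() {
  local pids="" failures=0 cmd pid
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures of $# checks failed"
  [ "$failures" -eq 0 ]
}
```

With the services above, this might be called as run_all "curl -fsS http://service-1/health" "curl -fsS http://service-2/health" "curl -fsS http://service-3/health". The -f flag makes curl exit non-zero on HTTP error responses, which is what lets the failure count mean something.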

Using xargs

# Check all nodes
kubectl get nodes -o name | xargs -I{} -P5 kubectl describe {}

# Parallel SSH
# -I reads one host per input line and cannot be combined with -n
printf '%s\n' node1 node2 node3 | xargs -P3 -I{} ssh {} uptime

Recording Your Session

Capture what you do for postmortems.

Script Command

# Start recording
script incident-2024-01-15.log

# Work normally...
kubectl get pods
tail -f /var/log/app.log

# Stop recording
exit

Asciinema

# Record with timing
asciinema rec incident.cast

# Play back
asciinema play incident.cast

History Export

# After incident, save recent history
history | tail -100 > incident-commands.txt
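
The last 100 commands may not line up with the incident. If HISTTIMEFORMAT is set to "%F %T " as in the History Configuration section, the history output can be filtered by time window instead. The following is a sketch under that assumption; history_between is a hypothetical name, and multi-line history entries are not handled.

```bash
# history_between 'START' 'END'
# Hypothetical filter: reads `history` output on stdin, where each line looks
# like "  102  2024-01-15 14:05:10 kubectl get pods", and prints the entries
# whose timestamp falls between START and END (inclusive), without the
# history index. Timestamps are compared as strings, which works because
# the "%F %T" format sorts chronologically.
history_between() {
  awk -v start="$1" -v end="$2" '
    { ts = $2 " " $3 }
    ts >= start && ts <= end {
      sub(/^[ \t]*[0-9]+\*?[ \t]+/, "")
      print
    }
  '
}
```

Example: history | history_between '2024-01-15 14:00:00' '2024-01-15 15:00:00' > incident-commands.txt.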

Common REPL Patterns

Quick Service Check

# Quick health check across services
for svc in api worker scheduler; do
  printf '%s: ' "$svc"
  curl -s "http://$svc/health" | jq -r '.status'
done

Resource Usage Snapshot

# CPU, memory, and disk at a glance
echo "=== CPU ===" && top -bn1 | head -5
echo "=== Memory ===" && free -h
echo "=== Disk ===" && df -h | grep -v tmpfs

Connection Investigation

# What's connected to this port?
ss -tnp | grep :8080 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn
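
The pipeline above has two rough edges: grep :8080 matches the port on either side of the connection, and cut -d: -f1 truncates IPv6 peers such as [2001:db8::7]:40000. A possible awk-based variation, sketched here, matches only the local port and strips just the trailing port from the peer. peer_counts is a hypothetical name, and the column positions assume the usual ss -tn layout (local address in column 4, peer in column 5).

```bash
# peer_counts PORT
# Hypothetical filter: reads `ss -tn` output on stdin and prints
# "COUNT PEER" for every peer connected to local port PORT,
# busiest peer first.
peer_counts() {
  awk -v port="$1" '
    $4 ~ (":" port "$") {
      peer = $5
      sub(/:[0-9]+$/, "", peer)   # drop only the trailing :port
      count[peer]++
    }
    END { for (p in count) print count[p], p }
  ' | sort -rn
}
```

Usage: ss -tn | peer_counts 8080.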

The REPL Postmortem Problem

After an incident, you need to:

Document what happened
Share what commands worked
Create runbooks for next time

But REPL sessions are messy. Commands are interleaved with output. Dead ends are mixed with solutions.

From REPL to Runbook

Extract the working commands into a runbook:

# API Service Recovery

## Identify failing pods
```bash
kubectl get pods -l app=api | grep -v Running
```

## Check recent events
```bash
kubectl get events --sort-by='.lastTimestamp' | grep api | tail -10
```

## Restart deployment
```bash
kubectl rollout restart deployment/api
```

## Verify recovery
```bash
kubectl rollout status deployment/api
```

This becomes your starting point for the next incident.

Stew: REPL-Style Execution, Runbook Persistence

Stew combines REPL interactivity with documentation structure. Run commands from your runbooks with a click. See output inline. Build on what works.

Your incident REPL sessions become executable team knowledge.
