CLI Troubleshooting Commands: The Essential Guide for DevOps
When something breaks at 2am, your CLI troubleshooting commands are your first line of defense. Knowing which commands to run—and in what order—can mean the difference between a 5-minute fix and a 2-hour outage.
This guide covers the essential CLI troubleshooting commands that every DevOps engineer should have ready. For turning these into executable procedures, see how to write a runbook.
Why CLI Troubleshooting Commands Matter
GUIs are great for exploration. But when you’re debugging production issues, CLI commands give you:
- Speed: No clicking through menus
- Scriptability: Chain commands together
- Remote access: Work over SSH
- Precision: Get exactly the data you need
System Health Commands
Start every troubleshooting session with a quick system health check.
Check System Load
uptime
Output shows load averages for 1, 5, and 15 minutes. If the 1-minute load is significantly higher than the 15-minute load, something recently changed.
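A load average only means something relative to the number of CPU cores. The sketch below is one way to make that comparison. The function names load_per_core and load_status are hypothetical, and treating "load above core count" as high is a rule of thumb assumed here, not a hard limit.

```bash
# load_per_core LOAD CORES: prints the load average divided by the core count.
load_per_core() {
    awk -v la="$1" -v n="$2" 'BEGIN { printf "%.2f\n", la / n }'
}

# load_status LOAD CORES: prints "high" when the load exceeds the core count
# (an assumed rule of thumb), otherwise "ok".
load_status() {
    awk -v la="$1" -v n="$2" 'BEGIN { if (la > n) print "high"; else print "ok" }'
}

# Usage on Linux (assumes /proc/loadavg and the coreutils nproc command):
#   load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

On Linux the load average also counts tasks blocked on I/O, so a high value does not always mean the CPU is the bottleneck.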
Memory Overview
free -h
The -h flag gives human-readable output. Read the available column rather than free, since Linux uses spare memory for cache. High swap usage usually means memory pressure.
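To put a number on swap usage, a minimal sketch can read the same counters that free reports. It assumes a Linux system with /proc/meminfo, and swap_used_pct is a hypothetical helper name.

```bash
# swap_used_pct: reads /proc/meminfo-style text on stdin and prints the
# percentage of swap in use (0 when no swap is configured).
swap_used_pct() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END {
            if (total > 0) printf "%d\n", (total - free) * 100 / total
            else print 0
        }'
}

# Usage on Linux:
#   swap_used_pct < /proc/meminfo
```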
Disk Space
df -h
Full disks cause surprising failures. Check this early in any troubleshooting session.
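When a host has many mounts, it can help to print only the filesystems that are close to full. The sketch below filters the POSIX-format output of df -P. The helper name full_filesystems and the 90 percent threshold in the usage line are assumptions for illustration, and mount points containing spaces are not handled.

```bash
# full_filesystems THRESHOLD: reads `df -P` output on stdin and prints
# "mountpoint percent" for each filesystem at or above THRESHOLD percent used.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)          # strip the trailing percent sign
        if (pct + 0 >= limit + 0) print $6, pct "%"
    }'
}

# Usage:
#   df -P | full_filesystems 90
```

If df -h shows free space but writes still fail, df -i reports inode usage, which can run out independently of disk space.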
Process Overview
top -bn1 | head -20
Batch mode (-b) with one iteration (-n1) gives a snapshot. Look for processes consuming excessive CPU or memory.
Network Troubleshooting Commands
Network issues are among the most common production problems.
Check Connectivity
ping -c 4 your-service.internal
Basic but essential. Confirms a network path exists. A failed ping is not conclusive on its own, since many networks block ICMP.
DNS Resolution
dig your-service.internal +short
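DNS problems often masquerade as application failures. Always verify resolution. Note that dig queries DNS servers directly and ignores /etc/hosts, so getent hosts is a closer match for what most applications see.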
Port Connectivity
nc -zv hostname 443
Netcat checks if a specific port is reachable. Essential for debugging connection refused errors.
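Checking several ports in one pass is a common next step. This is a minimal sketch, assuming a netcat variant that supports -z and -w (the OpenBSD and traditional versions do). The function name check_ports is hypothetical.

```bash
# check_ports HOST PORT...: prints "PORT open" or "PORT closed" for each port.
# -z probes without sending data; -w 3 gives up after 3 seconds.
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
}

# Usage:
#   check_ports hostname 80 443 5432
```

A port reported as closed may be refused by the host or dropped by a firewall. Running nc -zv by hand against that port shows which.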
Active Connections
ss -tuln
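Shows listening TCP and UDP sockets with numeric addresses and ports. The -l flag restricts output to listening sockets, so drop it (ss -tun) to see established connections. Faster than netstat.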
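To see how connections are distributed across TCP states, a small sketch can summarize the output of ss -tan (all TCP sockets, numeric). It assumes the state is the first column, which holds when only -t is selected. The name count_states is hypothetical.

```bash
# count_states: reads `ss -tan` output on stdin and prints "count state"
# for each TCP state, most frequent first.
count_states() {
    awk 'NR > 1 { print $1 }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage:
#   ss -tan | count_states
```

A large number of sockets in a single state, such as TIME-WAIT or CLOSE-WAIT, is often a useful lead when connections are failing.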
Log Analysis Commands
Logs tell you what actually happened.
Tail Recent Logs
tail -f /var/log/syslog
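Follow mode (-f) streams new entries in real time. The path /var/log/syslog is the Debian and Ubuntu default; RHEL-family systems use /var/log/messages.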
Search Logs for Errors
grep -i error /var/log/application.log | tail -50
Case-insensitive search for errors, limited to the 50 most recent matching lines.
Count Error Occurrences
grep -c "connection refused" /var/log/app.log
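Quantify the problem. Is it happening once or thousands of times? Note that grep -c counts matching lines, not individual occurrences, and is case-sensitive unless you add -i.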
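A total count does not show when the errors started. One way to see that is to bucket matches by minute. The sketch below assumes every log line begins with an ISO-style timestamp such as 2024-05-01T02:13:45, which is an assumption about the log format, not something every application produces. The name errors_per_minute is hypothetical.

```bash
# errors_per_minute PATTERN: reads log lines on stdin, keeps those matching
# PATTERN (case-insensitive), and prints "count minute" in time order.
# Assumes each line starts with a timestamp like 2024-05-01T02:13:45.
errors_per_minute() {
    grep -i -- "$1" | cut -c1-16 | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage:
#   errors_per_minute "connection refused" < /var/log/app.log
```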
Time-based Log Search
journalctl --since "10 minutes ago" -u your-service
For systemd services, journalctl provides powerful filtering. Add -p err to show only entries at error priority or worse.
Process Troubleshooting Commands
When services misbehave, dig into processes.
Find Process by Name
pgrep -a nginx
Shows PIDs and full command lines for matching processes.
Check Process Details
ps aux | grep '[n]ginx'
The bracket trick avoids matching the grep command itself. Quote the pattern so the shell cannot expand it as a filename glob.
Trace System Calls
strace -p PID -f
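See exactly which system calls a process is making. The -f flag follows child processes and threads. Attaching usually requires root, and tracing slows the target process, so detach promptly on production systems.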
Open Files
lsof -p PID
Shows files, sockets, and connections held by a process.
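A frequent reason to inspect open files is a suspected file descriptor leak. The sketch below compares a process's open descriptor count with its soft limit. It assumes a Linux /proc filesystem and permission to read the target process's entries. The name fd_usage is hypothetical.

```bash
# fd_usage PID: prints "open_fds soft_limit" for a process, using Linux /proc.
fd_usage() {
    open=$(ls "/proc/$1/fd" 2>/dev/null | wc -l)
    # In /proc/PID/limits the fourth field of this row is the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$1/limits")
    echo "$open $limit"
}

# Usage:
#   fd_usage "$(pgrep -o nginx)"
```

If the first number is close to the second, the process is likely to start failing with "too many open files" errors.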
Building Troubleshooting Runbooks
Individual commands are useful. Structured troubleshooting procedures are better. Combine these CLI troubleshooting commands into runbooks your team can execute consistently.
# Service Health Check Runbook
## Step 1: System Resources
```bash
uptime && free -h && df -h
```
## Step 2: Service Status
```bash
systemctl status your-service
```
## Step 3: Recent Errors
```bash
journalctl -u your-service --since "5 minutes ago" | grep -i error
```
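The same runbook can also be sketched as a small script, so that one failing step does not hide the output of the others. The helper name run_step is hypothetical, and your-service is the same placeholder used in the runbook above.

```bash
# run_step TITLE COMMAND...: prints a header, runs the command, and reports
# a failure without aborting the remaining steps.
run_step() {
    title=$1
    shift
    echo "== $title =="
    "$@" || echo "step failed: $title (exit $?)"
}

# Usage:
#   run_step "System Resources" sh -c 'uptime && free -h && df -h'
#   run_step "Service Status" systemctl status your-service
#   run_step "Recent Errors" sh -c 'journalctl -u your-service --since "5 minutes ago" | grep -i error'
```

In the last step grep exits non-zero when it finds no matches, so a "step failed" line there means no recent errors were logged.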
Making Commands Executable
Typing commands from documentation works, but it’s slow and error-prone. See our runbook examples for templates you can use directly.
Stew turns your CLI troubleshooting commands into executable runbooks. Every command runs with a click. Output appears inline. Your team spends less time copying and pasting, more time solving problems.
Join the waitlist and make your troubleshooting procedures executable.