CLI Troubleshooting Commands: The Essential Guide for DevOps

When something breaks at 2am, your CLI troubleshooting commands are your first line of defense. Knowing which commands to run—and in what order—can mean the difference between a 5-minute fix and a 2-hour outage.

This guide covers the essential CLI troubleshooting commands that every DevOps engineer should have ready. For turning these into executable procedures, see how to write a runbook.

Why CLI Troubleshooting Commands Matter

GUIs are great for exploration. But when you’re debugging production issues, CLI commands give you:

Speed: No clicking through menus
Scriptability: Chain commands together
Remote access: Work over SSH
Precision: Get exactly the data you need

System Health Commands

Start every troubleshooting session with a quick system health check.

Check System Load

uptime

The output ends with load averages for the last 1, 5, and 15 minutes. If the 1-minute figure is well above the 15-minute figure, load is rising and something changed recently. If it is well below, a spike is already subsiding.
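
Load averages are easier to judge relative to the number of CPU cores. Here is a minimal sketch, assuming a Linux host where /proc/loadavg and nproc are available. The name load_per_core is a hypothetical helper for illustration, not a standard utility.

```bash
#!/usr/bin/env bash
# load_per_core AVG CORES
# Hypothetical helper: prints AVG divided by CORES, to two decimals.
load_per_core() {
  awk -v avg="$1" -v n="$2" 'BEGIN { printf "%.2f\n", avg / n }'
}

# Assumes Linux: /proc/loadavg starts with the 1-, 5-, and 15-minute averages.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  read -r one five fifteen _ < /proc/loadavg
  echo "1-min load per core:  $(load_per_core "$one" "$(nproc)")"
  echo "15-min load per core: $(load_per_core "$fifteen" "$(nproc)")"
fi
```

As a rough rule of thumb, a sustained value above 1.00 per core suggests more runnable work than the CPUs can handle. On Linux the load average also counts tasks blocked on I/O, so a high number does not always mean the CPU is the bottleneck.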

Memory Overview

free -h

The -h flag prints sizes in human-readable units. Watch the available column and the swap line: low available memory combined with growing swap usage usually means memory pressure.
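
If you want a single number to compare across hosts, the same data can be read from /proc/meminfo. This is a sketch assuming a Linux kernel recent enough to report MemAvailable. The name mem_available_pct is a hypothetical helper.

```bash
# mem_available_pct [FILE]
# Hypothetical helper: prints MemAvailable as a whole-number percentage of
# MemTotal, reading meminfo-style input (defaults to /proc/meminfo).
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { if (total > 0) printf "%d\n", avail * 100 / total }
  ' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  echo "Memory available: $(mem_available_pct)%"
fi
```

What counts as "too low" depends on the workload, so treat any threshold you pick as a starting point rather than a rule.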

Disk Space

df -h

Full disks cause surprising failures. Check this early in any troubleshooting session.
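
On a host with many mounts, it can help to print only the filesystems that are nearly full. The sketch below assumes GNU df, where -P gives stable POSIX columns and -i reports inode usage, and mount points without spaces. The name filesystems_over and the 90 percent threshold are illustrative choices.

```bash
# filesystems_over PCT
# Hypothetical helper: reads `df -P` style output on stdin and prints the
# mount point and usage of every filesystem at or above PCT percent.
filesystems_over() {
  awk -v limit="$1" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit) print $6, $5
    }
  '
}

df -P  2>/dev/null | filesystems_over 90   # disk space
df -Pi 2>/dev/null | filesystems_over 90   # inodes (GNU df)
```

The inode check is worth including because a filesystem can run out of inodes while df -h still shows free space, and the resulting errors look the same as a full disk.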

Process Overview

top -bn1 | head -20

Batch mode (-b) with one iteration (-n1) gives a snapshot. Look for processes consuming excessive CPU or memory.

Network Troubleshooting Commands

Network issues are among the most common production problems.

Check Connectivity

ping -c 4 your-service.internal

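Basic but essential. A reply confirms that the name resolves and the host is reachable over ICMP. A timeout is less conclusive, because many firewalls drop ICMP while still allowing application traffic.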

DNS Resolution

dig your-service.internal +short

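DNS problems often masquerade as application failures, so always verify resolution. Empty output from +short means the query returned no records. Keep in mind that dig queries DNS directly and ignores /etc/hosts; getent hosts your-service.internal shows what the system resolver returns.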

Port Connectivity

nc -zv hostname 443

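Netcat checks whether a specific TCP port is reachable: -z connects without sending data and -v reports the result. This is essential for debugging connection errors. "Connection refused" means the host answered but nothing is listening on that port, while a timeout usually points to a firewall or routing problem.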
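
On minimal hosts where nc is not installed, bash itself can make the same check. This is a sketch assuming bash was built with /dev/tcp support (the default on most distributions) and that timeout from coreutils is available. The name port_open is a hypothetical helper.

```bash
#!/usr/bin/env bash
# port_open HOST PORT [SECONDS]
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT opens within
# SECONDS (default 3). Relies on bash's /dev/tcp redirection instead of nc.
port_open() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

if port_open 127.0.0.1 443; then
  echo "127.0.0.1:443 is reachable"
else
  echo "127.0.0.1:443 is closed, filtered, or timed out"
fi
```

Unlike nc -zv, this version does not tell you whether the failure was a refusal or a timeout, so fall back to nc when you need that distinction.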

Active Connections

ss -tuln

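Lists listening (-l) TCP (-t) and UDP (-u) sockets with numeric addresses and ports (-n). Because of -l, established connections are not included; run ss -tun to see those. ss is the modern replacement for netstat and is usually faster.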

Log Analysis Commands

Logs tell you what actually happened.

Tail Recent Logs

tail -f /var/log/syslog

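Follow mode (-f) streams new entries in real time. The path shown is the Debian and Ubuntu default; Red Hat-based systems write to /var/log/messages instead.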

Search Logs for Errors

grep -i error /var/log/application.log | tail -50

Case-insensitive search for lines containing "error", showing only the 50 most recent matches.

Count Error Occurrences

grep -c "connection refused" /var/log/app.log

Quantify the problem: is it happening once or thousands of times? Note that -c counts matching lines, and the match is case-sensitive unless you add -i.
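
A total does not show whether the errors are a short burst or still ongoing. One way to see that is to bucket matches by minute. The sketch below assumes every log line starts with an ISO 8601 timestamp such as 2024-05-01T02:13:05Z, which is an assumption about your log format. The name errors_per_minute is a hypothetical helper.

```bash
# errors_per_minute PATTERN FILE
# Hypothetical helper: prints "<minute> <count>" for lines matching PATTERN
# (case-insensitive). Assumes each line begins with an ISO 8601 timestamp,
# so the first 16 characters identify the minute.
errors_per_minute() {
  grep -i -- "$1" "$2" | cut -c1-16 | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example call (path taken from the command above):
#   errors_per_minute "connection refused" /var/log/app.log
```

If your logs use a different timestamp layout, adjust the cut range so it covers everything up to the minute.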

Time-based Log Search

journalctl --since "10 minutes ago" -u your-service

For systemd services, journalctl provides powerful filtering.

Process Troubleshooting Commands

When services misbehave, dig into processes.

Find Process by Name

pgrep -a nginx

Shows PIDs and full command lines for matching processes.

Check Process Details

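ps aux | grep '[n]ginx'

The bracket trick avoids matching the grep command itself: the pattern [n]ginx matches "nginx" but not the literal text "[n]ginx" in grep's own command line. The quotes stop the shell from expanding the brackets as a filename glob.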

Trace System Calls

strace -p PID -f

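See exactly which system calls a running process is making. Replace PID with the process ID; the -f flag also follows child processes and threads. Attaching usually requires root or ownership of the process, and strace slows the target noticeably, so detach with Ctrl-C as soon as you have what you need.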

Open Files

lsof -p PID

Shows files, sockets, and connections held by a process.
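
When the symptom is "Too many open files", the useful comparison is the descriptor count against the process limit. lsof output also includes entries such as the working directory and memory-mapped files, so counting its lines overstates the number. Here is a sketch that reads both values from /proc, assuming Linux and permission to inspect the process. The name fd_usage is a hypothetical helper.

```bash
# fd_usage PID
# Hypothetical helper: prints the number of open file descriptors and the
# soft "Max open files" limit for PID, both read from /proc (Linux only).
fd_usage() {
  printf 'open=%s soft_limit=%s\n' \
    "$(ls "/proc/$1/fd" | wc -l | tr -d ' ')" \
    "$(awk '/^Max open files/ { print $4 }' "/proc/$1/limits")"
}

fd_usage "$$"   # inspect the current shell as a demonstration
```

A count that keeps climbing toward the soft limit between runs is a common sign of a descriptor leak.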

Building Troubleshooting Runbooks

Individual commands are useful. Structured troubleshooting procedures are better. Combine these CLI troubleshooting commands into runbooks your team can execute consistently.

# Service Health Check Runbook

## Step 1: System Resources
```bash
uptime && free -h && df -h
```

## Step 2: Service Status
```bash
systemctl status your-service
```

## Step 3: Recent Errors
```bash
journalctl -u your-service --since "5 minutes ago" | grep -i error
```
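
The three steps above could also be collapsed into a single script, so that one command captures everything in order. This is a sketch, not a finished tool: run_step is a hypothetical helper, and your-service is the placeholder name used throughout this guide.

```bash
#!/usr/bin/env bash
# Sketch of the runbook above as one script.
# Usage: ./health-check.sh [service-name]   (the script name is illustrative)
SERVICE="${1:-your-service}"

# run_step TITLE COMMAND [ARGS...]
# Hypothetical helper: prints a header, runs the command, and reports a
# non-zero exit status without aborting the remaining steps.
run_step() {
  title=$1; shift
  printf '\n== %s ==\n' "$title"
  "$@" || printf '(step exited with status %s)\n' "$?"
}

# Step 1: System Resources
run_step "System load" uptime
run_step "Memory"      free -h
run_step "Disk space"  df -h

# Step 2: Service Status
run_step "Service status" systemctl status "$SERVICE" --no-pager

# Step 3: Recent Errors (grep exits 1 when nothing matches, which is good news)
run_step "Recent errors" sh -c \
  'journalctl -u "$0" --since "5 minutes ago" --no-pager | grep -i error' "$SERVICE"
```

Each step runs independently here, whereas the && chain in Step 1 stops at the first failing command. During an incident you usually want the remaining output even when one check fails.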

Making Commands Executable

Typing commands from documentation works, but it’s slow and error-prone. See our runbook examples for templates you can use directly.

Stew turns your CLI troubleshooting commands into executable runbooks. Every command runs with a click. Output appears inline. Your team spends less time copying and pasting, more time solving problems.

Join the waitlist and make your troubleshooting procedures executable.