CLI Troubleshooting Commands: The Essential Guide for DevOps

· 4 min read · Stew Team
Tags: cli, troubleshooting, devops, commands

When something breaks at 2am, your CLI troubleshooting commands are your first line of defense. Knowing which commands to run—and in what order—can mean the difference between a 5-minute fix and a 2-hour outage.

This guide covers the essential CLI troubleshooting commands that every DevOps engineer should have ready. For turning these into executable procedures, see how to write a runbook.

Why CLI Troubleshooting Commands Matter

GUIs are great for exploration. But when you’re debugging production issues, CLI commands give you:

  • Speed: No clicking through menus
  • Scriptability: Chain commands together
  • Remote access: Work over SSH
  • Precision: Get exactly the data you need

System Health Commands

Start every troubleshooting session with a quick system health check.

Check System Load

uptime

Output shows load averages for 1, 5, and 15 minutes. If the 1-minute load is significantly higher than the 15-minute load, something recently changed.
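Load averages are easier to judge relative to the number of CPUs. The sketch below is a rough rule of thumb rather than a definitive health check: it divides the 1-minute load by the core count, assuming a Linux host with /proc/loadavg and nproc available. The helper name load_per_core is made up for this example.

```shell
# load_per_core LOAD CORES: print the load average divided by the core count.
# (Hypothetical helper, not a standard tool.)
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Assumption: Linux, where the first field of /proc/loadavg is the 1-minute load.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  read -r one_min _ < /proc/loadavg
  echo "1-minute load per core: $(load_per_core "$one_min" "$(nproc)")"
fi
```

A sustained value well above 1.00 suggests more runnable (or I/O-blocked) tasks than the CPUs can serve at once.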

Memory Overview

free -h

The -h flag gives human-readable output. Watch for high swap usage—it usually means memory pressure.
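To put a number on swap usage, one option is to read the SwapTotal and SwapFree fields from /proc/meminfo. This is a minimal sketch that assumes a Linux host; swap_used_pct is a hypothetical helper name, and what counts as "high" depends on your workload.

```shell
# swap_used_pct: read /proc/meminfo-style text on stdin and print the
# percentage of swap in use as a whole number (0 if no swap is configured).
swap_used_pct() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END {
         if (total > 0) printf "%d\n", (total - free) * 100 / total
         else print 0
       }'
}

# Assumption: Linux exposes these fields in /proc/meminfo.
if [ -r /proc/meminfo ]; then
  echo "swap used: $(swap_used_pct < /proc/meminfo)%"
fi
```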

Disk Space

df -h

Full disks cause surprising failures. Check this early in any troubleshooting session.
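When a host has many mounts, it can help to print only the filesystems above a threshold. The sketch below filters the POSIX-format output of df -P; the helper name full_filesystems and the 90% cutoff are illustrative choices, and mount points containing spaces are not handled.

```shell
# full_filesystems THRESHOLD: read `df -P` output on stdin and print the
# mount point and usage of every filesystem at or above THRESHOLD percent.
full_filesystems() {
  awk -v max="$1" 'NR > 1 {
    pct = $5                 # Capacity column, e.g. "95%"
    sub(/%/, "", pct)
    if (pct + 0 >= max) print $6, $5
  }'
}

# Usage:
#   df -P | full_filesystems 90
```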

Process Overview

top -bn1 | head -20

Batch mode (-b) with one iteration (-n1) gives a snapshot. Look for processes consuming excessive CPU or memory.
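If you only want the heaviest CPU consumers, a sorted ps listing is a possible alternative to reading top output. This is a sketch, not a replacement for top: top_cpu is a hypothetical helper that expects a header line followed by rows with %CPU in the second column.

```shell
# top_cpu N: read a process table on stdin (header line first, %CPU in
# column 2) and print the header plus the N rows with the highest %CPU.
top_cpu() {
  IFS= read -r header || return 0
  printf '%s\n' "$header"
  sort -k2,2 -nr | head -n "$1"
}

# Usage:
#   ps -eo pid,pcpu,comm | top_cpu 5
```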

Network Troubleshooting Commands

Network issues are among the most common production problems.

Check Connectivity

ping -c 4 your-service.internal

Basic but essential. A reply confirms that a network path to the host exists; a timeout is less conclusive, since some networks block ICMP.

DNS Resolution

dig your-service.internal +short

DNS problems masquerade as application failures. Always verify resolution.

Port Connectivity

nc -zv hostname 443

Netcat checks if a specific port is reachable. Essential for debugging connection refused errors.
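Checking several ports in one pass is a small extension of the same command. The sketch below assumes your nc build supports -z and -w (the common OpenBSD and Nmap variants do); check_ports is a made-up helper name, and a missing nc binary will also be reported as "closed or filtered".

```shell
# check_ports HOST PORT...: try a TCP connection to each port with a
# 2-second timeout and report the result, one line per port.
check_ports() {
  host="$1"
  shift
  for port in "$@"; do
    if nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
      echo "$host:$port open"
    else
      echo "$host:$port closed or filtered"
    fi
  done
}

# Usage (hostname is a placeholder):
#   check_ports your-service.internal 80 443 5432
```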

Active Connections

ss -tuln

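With -l, ss shows only listening sockets: TCP (-t) and UDP (-u), with numeric addresses and ports (-n). Run ss -tun without -l to see established connections instead. ss is generally faster than netstat.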

Log Analysis Commands

Logs tell you what actually happened.

Tail Recent Logs

tail -f /var/log/syslog

Follow mode (-f) streams new entries in real time. Use -F instead if the file may be rotated while you watch.

Search Logs for Errors

grep -i error /var/log/application.log | tail -50

Case-insensitive search for "error", limited to the 50 most recent matching lines.

Count Error Occurrences

grep -c "connection refused" /var/log/app.log

Quantify the problem. Is it happening once or thousands of times?
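A total count does not show when the errors started. One way to see the trend is to bucket matches per minute. The sketch below assumes each log line begins with a timestamp in the form YYYY-MM-DD HH:MM:SS, which may not match your log format; errors_per_minute is a hypothetical helper name.

```shell
# errors_per_minute: read log lines on stdin and print "DATE HH:MM COUNT"
# for every minute that contains a case-insensitive "error" match.
# Assumption: lines start with "YYYY-MM-DD HH:MM:SS".
errors_per_minute() {
  grep -i 'error' | cut -c1-16 | sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Usage:
#   errors_per_minute < /var/log/app.log
```

A sudden jump between adjacent minutes is often a better clue to the trigger than the overall total.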

Filter the systemd Journal

journalctl --since "10 minutes ago" -u your-service

For systemd services, journalctl can filter by unit (-u), time window (--since, --until), and priority (-p).

Process Troubleshooting Commands

When services misbehave, dig into processes.

Find Process by Name

pgrep -a nginx

Shows PIDs and full command lines for matching processes.

Check Process Details

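ps aux | grep '[n]ginx'

The bracket trick avoids matching the grep command itself: the pattern [n]ginx matches the text "nginx", but grep's own command line contains the literal "[n]ginx", which the pattern does not match. Quote the pattern so the shell cannot expand the brackets as a filename glob.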
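The effect is easy to verify without touching real processes. This small demonstration feeds fake command lines to grep; matches is a helper invented for the example and prints how many lines matched.

```shell
# matches PATTERN TEXT: print the number of lines in TEXT that match PATTERN.
matches() {
  printf '%s\n' "$2" | grep -c -- "$1" || true
}

matches 'nginx'   'grep nginx'             # 1: a plain pattern matches its own command line
matches '[n]ginx' 'grep [n]ginx'           # 0: the bracketed pattern does not
matches '[n]ginx' 'nginx: master process'  # 1: but it still matches the real process
```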

Trace System Calls

strace -p PID -f

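See exactly which system calls a process is making. The -f flag also follows child processes and threads. Attaching usually requires root, and tracing slows the target noticeably, so use it sparingly in production.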

Open Files

lsof -p PID

Shows files, sockets, and connections held by a process.

Building Troubleshooting Runbooks

Individual commands are useful. Structured troubleshooting procedures are better. Combine these CLI troubleshooting commands into runbooks your team can execute consistently.

# Service Health Check Runbook

## Step 1: System Resources
```bash
uptime && free -h && df -h
```

## Step 2: Service Status
```bash
systemctl status your-service
```

## Step 3: Recent Errors
```bash
journalctl -u your-service --since "5 minutes ago" | grep -i error
```

Making Commands Executable

Typing commands from documentation works, but it’s slow and error-prone. See our runbook examples for templates you can use directly.
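As a low-tech starting point, the three runbook steps above can also be wrapped in a small shell script. This is a sketch under stated assumptions, not a finished tool: run_step and health_check are hypothetical names, the host is assumed to run systemd, and your-service is a placeholder unit name.

```shell
#!/bin/sh
# run_step TITLE CMD [ARGS...]: print a heading, run the command, and report
# a failure without aborting the remaining steps.
run_step() {
  title="$1"
  shift
  printf '== %s ==\n' "$title"
  "$@" || printf '(step failed with exit status %s)\n' "$?"
}

# health_check SERVICE: the three runbook steps for one systemd unit.
health_check() {
  service="$1"
  run_step "System resources" sh -c 'uptime; free -h; df -h'
  run_step "Service status" systemctl status "$service" --no-pager
  run_step "Recent errors" sh -c \
    'journalctl -u "$1" --since "5 minutes ago" --no-pager | grep -i error' \
    sh "$service"
}

# Usage:
#   health_check your-service
```

The steps are separated with ; rather than && so that one failing command does not hide the output of the others. Note that "Recent errors" reports a failure when grep finds no matches, which in this case is good news.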

Stew turns your CLI troubleshooting commands into executable runbooks. Every command runs with a click. Output appears inline. Your team spends less time copying and pasting, more time solving problems.

Join the waitlist and make your troubleshooting procedures executable.