Linux CLI Troubleshooting Commands Every SRE Should Know
Linux systems power most production infrastructure. When they have problems, you need the right CLI troubleshooting commands to diagnose issues quickly.
This reference covers Linux-specific commands for system troubleshooting. For a broader overview, see our essential CLI troubleshooting guide.
CPU Troubleshooting Commands
High CPU usage is one of the most common alerts you’ll encounter.
Real-time CPU Monitoring
htop
Interactive process viewer that is easier to read than top. Press F6 and choose PERCENT_CPU to sort by CPU usage, or press P as a shortcut.
CPU Usage by Process
ps -eo pid,ppid,cmd,%cpu,%mem --sort=-%cpu | head -20
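Non-interactive snapshot of top CPU consumers. Note that ps reports %cpu as CPU time divided by the time the process has been running, so it is a lifetime average rather than the current load.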
Per-Core Usage
mpstat -P ALL 1 5
Shows utilization for each CPU core. Essential for diagnosing unbalanced workloads.
CPU Steal Time
vmstat 1 5
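The st column shows steal time: CPU time taken from this virtual machine by the hypervisor. It is critical for detecting noisy-neighbor issues in virtualized environments. The first row reports averages since boot, so judge by the rows that follow.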
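To turn the st column into a single number for a runbook step, one option is to average it over the sampling window. This is a minimal sketch that assumes procps-style vmstat output. The helper name avg_vmstat_column is hypothetical. It finds the column by its header label rather than by position, and it skips the first data row because that row holds averages since boot.

```bash
# Hypothetical helper: average one named vmstat column over the sampled rows.
# Reads vmstat output on stdin; the column label (e.g. st, si, so) is $1.
avg_vmstat_column() {
  awk -v col="$1" '
    # Header rows start with a non-numeric field; use them to locate the column.
    $1 !~ /^[0-9]+$/ { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    !idx             { next }               # no matching header seen yet
    !skipped         { skipped = 1; next }  # first data row = averages since boot
    { sum += $idx; n++ }
    END { if (!n) exit 1; printf "%.1f\n", sum / n }
  '
}

# Usage: vmstat 1 5 | avg_vmstat_column st
```

The same helper works for other columns, so `vmstat 1 5 | avg_vmstat_column so` gives an average swap-out rate.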
Memory Troubleshooting Commands
Memory issues cause OOM kills, swapping, and performance degradation.
Detailed Memory Breakdown
cat /proc/meminfo
Complete memory statistics. Look for MemAvailable for actual usable memory.
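For a quick threshold check, MemAvailable can be expressed as a percentage of MemTotal. The sketch below is one way to do that. The function name mem_available_pct is hypothetical, and the optional file argument exists only so the parser can be pointed at a saved copy of meminfo.

```bash
# Hypothetical helper: print MemAvailable as a whole-number percentage of MemTotal.
# $1 is the meminfo file to read (defaults to /proc/meminfo).
mem_available_pct() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total <= 0 || avail == "") exit 1   # field missing or unreadable
      printf "%d\n", avail * 100 / total
    }
  ' "$meminfo"
}

# Usage: mem_available_pct
```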
Top Memory Consumers
ps -eo pid,ppid,cmd,%mem --sort=-%mem | head -20
Find which processes are using the most RAM.
Cache and Buffer Usage
free -h
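The buff/cache column shows memory used for buffers and the page cache. The kernel can reclaim most of it when applications need memory, which is why the available column is a better guide than free.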
Swap Activity
vmstat 1 5
Watch the si (swap in) and so (swap out) columns. Sustained non-zero values indicate memory pressure. Ignore the first row, which reports averages since boot.
OOM Killer History
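dmesg -T | grep -i -E "out of memory|oom-kill"
Check whether the OOM killer has terminated processes recently. The -T flag prints human-readable timestamps. dmesg only covers the kernel ring buffer, so older events may have rotated out. On systemd hosts, journalctl -k reads the same kernel messages from the journal.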
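When the kernel log is noisy, it can help to reduce OOM events to just the victims. This sketch assumes the kernel's usual "Killed process PID (name)" wording. The function name oom_victims is hypothetical, and it reads log text on stdin so it works with either dmesg or journalctl -k.

```bash
# Hypothetical helper: print "PID name" for each process the OOM killer terminated.
# Reads kernel log lines on stdin.
oom_victims() {
  grep -oE 'Killed process [0-9]+ \([^)]*\)' |
    sed -E 's/^Killed process ([0-9]+) \((.*)\)$/\1 \2/'
}

# Usage: dmesg | oom_victims
```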
Disk Troubleshooting Commands
Disk issues range from full filesystems to I/O bottlenecks.
Filesystem Usage
df -h
Human-readable disk space usage. Check for partitions above 90%.
Inode Usage
df -i
Running out of inodes prevents file creation even with free space.
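To check the 90% rule automatically, the usage column can be filtered with awk. The sketch below assumes the POSIX output format from df -P (usage in the fifth column, mount point in the sixth) and mount points without spaces. The function name filesystems_over is hypothetical. The same column layout appears to hold for df -Pi, which would make it usable for inodes too, but treat that as an assumption to verify on your system.

```bash
# Hypothetical helper: list mount points whose usage exceeds a percentage.
# Reads `df -P` output on stdin; the threshold (e.g. 90) is $1.
filesystems_over() {
  awk -v limit="$1" '
    NR > 1 {                      # skip the header row
      pct = $5
      sub(/%$/, "", pct)          # "95%" -> "95"
      if (pct + 0 > limit + 0) print $6, $5
    }
  '
}

# Usage: df -P | filesystems_over 90
```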
Directory Size
du -sh /var/log/*
Find which directories are consuming space.
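With many entries, sorting the du output makes the largest ones obvious. This is a small sketch of that idea. The function name largest_entries is hypothetical, and sizes are printed in KiB so that a plain numeric sort works.

```bash
# Hypothetical helper: show the largest entries directly under a directory.
# $1 is the directory, $2 the number of entries to show (default 10).
largest_entries() {
  local dir="$1" count="${2:-10}"
  # -s: one total per entry, -k: sizes in KiB so sort -n can order them
  du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Usage: largest_entries /var/log 5
```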
Find Large Files
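find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null
Locate files larger than 100 MiB. The -xdev flag keeps find on the root filesystem, skipping /proc, /sys, and other mounts, and {} + batches files into fewer ls calls. Run it again with another mount point as the starting path to search that filesystem.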
Disk I/O Statistics
iostat -xz 1 5
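Watch %util for disk saturation and await for latency (newer sysstat versions split await into r_await and w_await). On SSDs and NVMe devices that serve requests in parallel, %util can read near 100% well before the device is saturated, so weigh it against latency and queue size.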
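To pull out only the busy devices from a longer iostat run, the %util column can be filtered by name. This sketch assumes extended iostat output with a header row that starts with "Device" and includes a %util column. The function name busy_devices is hypothetical. It looks the column up by label because column order differs between sysstat versions.

```bash
# Hypothetical helper: print devices whose %util is at or above a threshold.
# Reads `iostat -x` output on stdin; the threshold (e.g. 80) is $1.
busy_devices() {
  awk -v limit="$1" '
    # Each device table starts with a header row naming the columns.
    $1 ~ /^Device/ { idx = 0; for (i = 1; i <= NF; i++) if ($i == "%util") idx = i; next }
    NF == 0        { idx = 0; next }   # a blank line ends the device table
    idx && $idx + 0 >= limit + 0 { print $1, $idx }
  '
}

# Usage: LC_ALL=C iostat -xz 1 5 | busy_devices 80
```

The first report from iostat covers the time since boot, so a device listed only once may reflect history rather than current load. LC_ALL=C is there to keep decimal points in the output on systems with a decimal-comma locale.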
I/O by Process
iotop -o
Real-time view of which processes are performing I/O. The -o flag hides processes that are idle, and iotop usually needs root privileges.
Process Troubleshooting Commands
Understanding process behavior is key to debugging applications.
Process Tree
pstree -p
Visualize parent-child relationships between processes.
Process State
ps aux | awk '$8 ~ /D/ {print}'
Find processes in uninterruptible sleep (D state)—often waiting on I/O.
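Before hunting for individual D-state processes, a count per state shows whether there is a pile-up at all. This is a sketch of that summary. The function name count_process_states is hypothetical, and it keeps only the first character of each ps state code, dropping modifiers such as s or +.

```bash
# Hypothetical helper: count processes by their main state letter (R, S, D, Z, ...).
# Reads one ps state code per line on stdin.
count_process_states() {
  awk '{ count[substr($1, 1, 1)]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage: ps -eo stat= | count_process_states
```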
Open File Descriptors
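ls /proc/PID/fd | wc -l
Count open file descriptors for a process (replace PID with the process ID), then compare the count against its limit. Avoid ls -la here: its total line and the . and .. entries inflate the count by three.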
File Descriptor Limits
cat /proc/PID/limits | grep "open files"
Check if a process is approaching its file descriptor limit.
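The count and the limit are easier to compare side by side. The sketch below prints both for one process. The function name fd_usage is hypothetical, and the optional second argument (an alternative proc root) exists only so the function can be exercised against a test directory. Inspecting another user's process normally requires root.

```bash
# Hypothetical helper: print "<open fds> <soft limit>" for a process.
# $1 is the PID; $2 is the proc root (defaults to /proc).
fd_usage() {
  local pid="$1" proc="${2:-/proc}" open soft
  [ -d "$proc/$pid/fd" ] || return 1
  open=$(ls -A "$proc/$pid/fd" | wc -l)
  # The "Max open files" row lists the soft limit, then the hard limit.
  soft=$(awk '/^Max open files/ { print $4 }' "$proc/$pid/limits")
  echo "$open $soft"
}

# Usage: fd_usage 1234
```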
Process Environment
cat /proc/PID/environ | tr '\0' '\n'
View the environment variables a running process was started with. Reading another user's process requires root.
Network Troubleshooting Commands
On current Linux distributions, the iproute2 tools ip and ss cover most network troubleshooting.
Interface Statistics
ip -s link
Shows packet counts, errors, and drops per interface.
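The same error and drop counters are exposed as plain files under /sys/class/net, which is convenient for scripts. This is a minimal sketch that reads them. The function name iface_error_counters is hypothetical, and the optional root argument is there only so the function can be run against a test directory.

```bash
# Hypothetical helper: print error and drop counters for each network interface.
# $1 is the sysfs network directory (defaults to /sys/class/net).
iface_error_counters() {
  local root="${1:-/sys/class/net}" dir stats
  for dir in "$root"/*/; do
    stats="${dir}statistics"
    [ -r "$stats/rx_errors" ] || continue   # skip entries without counters
    printf '%s rx_errors=%s tx_errors=%s rx_dropped=%s tx_dropped=%s\n' \
      "$(basename "$dir")" \
      "$(cat "$stats/rx_errors")" "$(cat "$stats/tx_errors")" \
      "$(cat "$stats/rx_dropped")" "$(cat "$stats/tx_dropped")"
  done
}

# Usage: iface_error_counters
```

The counters are cumulative, so run it twice a few seconds apart and compare to see whether errors are still increasing.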
Routing Table
ip route
Verify traffic is taking the expected path.
Connection States
ss -s
Summary of socket states. A high timewait count may indicate connection churn, such as many short-lived connections that are not being reused.
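For a per-state breakdown of TCP sockets, the first column of ss -tan can be tallied. This is one possible sketch. The function name count_tcp_states is hypothetical, and it assumes the first line of input is the ss header row.

```bash
# Hypothetical helper: count TCP sockets per state, most common first.
# Reads `ss -tan` output on stdin.
count_tcp_states() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' | sort -rn
}

# Usage: ss -tan | count_tcp_states
```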
Listening Ports
ss -tlnp
Shows which processes are listening on which TCP ports. Run it as root to see process names for sockets owned by other users.
TCP Connection Details
ss -tn state established
List established connections.
Systemd Service Commands
Most modern Linux distributions use systemd.
Service Status
systemctl status your-service
Shows running state, recent logs, and resource usage.
Service Logs
journalctl -u your-service -f
Stream logs for a specific service.
Failed Services
systemctl --failed
List all services in a failed state.
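To act on the failed units in a script, their names can be extracted from the listing. The sketch below assumes the output of systemctl --failed with --no-legend (and ideally --plain, which drops the leading bullet). The function name failed_unit_names is hypothetical. It prints the first field on each line that looks like a unit name, meaning it ends in a dot followed by a lowercase unit type.

```bash
# Hypothetical helper: print just the unit names from a failed-units listing.
# Reads `systemctl --failed --no-legend` output on stdin.
failed_unit_names() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /\.[a-z]+$/) { print $i; break }   # e.g. nginx.service
  }'
}

# Usage: systemctl --failed --no-legend --plain | failed_unit_names
```

Each name can then be passed to journalctl -u to read that unit's recent logs.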
Service Dependencies
systemctl list-dependencies your-service
Show what a service depends on.
Creating Linux Troubleshooting Runbooks
Combine these CLI troubleshooting commands into structured procedures:
# Linux System Health Check
## Quick Overview
```bash
uptime && free -h && df -h
```
## Top Resource Consumers
```bash
ps -eo pid,cmd,%cpu,%mem --sort=-%cpu | head -10
```
## Recent System Errors
```bash
dmesg | tail -50 | grep -i -E "(error|fail|warn)"
```
## Disk I/O Check
```bash
iostat -xz 1 3
```
For more runbook structures, see our DevOps runbook templates.
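If you would rather run the whole check as one script, a small wrapper can print each section title and keep going when a step fails. This is a sketch, not a prescribed structure. The function name run_step is hypothetical, and the commands in the usage comment are the ones from the runbook above.

```bash
# Hypothetical helper: run one runbook step under a heading, without
# aborting the rest of the runbook if the command fails.
# $1 is the step title; the remaining arguments are the command to run.
run_step() {
  local title="$1" status=0
  shift
  printf '## %s\n' "$title"
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    printf '(step exited with status %s)\n' "$status"
  fi
  return 0
}

# Usage:
#   run_step "Quick Overview" uptime
#   run_step "Memory" free -h
#   run_step "Disk I/O Check" iostat -xz 1 3
```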
Making Troubleshooting Executable
Static documentation gets outdated. Stew keeps your Linux CLI troubleshooting commands executable—run them directly from your runbooks with output captured inline.