
Linux CLI Troubleshooting Commands Every SRE Should Know

· 4 min read · Stew Team
cli · troubleshooting · linux · sre

Linux systems power most production infrastructure. When they have problems, you need the right CLI troubleshooting commands to diagnose issues quickly.

This reference covers Linux-specific commands for system troubleshooting. For a broader overview, see our essential CLI troubleshooting guide.

CPU Troubleshooting Commands

High CPU usage is one of the most common alerts you’ll encounter.

Real-time CPU Monitoring

htop

Interactive process viewer with better visuals than top. Press F6 to choose a sort column, or P to sort by CPU usage.

CPU Usage by Process

ps -eo pid,ppid,cmd,%cpu,%mem --sort=-%cpu | head -20

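Non-interactive snapshot of top CPU consumers. Note that ps reports %cpu as CPU time divided by how long the process has been running, so it is a lifetime average rather than the current load.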

Per-Core Usage

mpstat -P ALL 1 5

Shows utilization for each CPU core. Essential for diagnosing unbalanced workloads.

CPU Steal Time

vmstat 1 5

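The st column shows steal time: the share of CPU time the hypervisor gave to other guests while this VM was ready to run. Sustained non-zero values point to noisy neighbors in virtualized environments. The first row is an average since boot, so read the later rows.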
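To avoid eyeballing columns, one option is to filter the vmstat output with awk. The sketch below is a hypothetical helper (steal_over is not a standard tool, and the default threshold of 5 is an arbitrary assumption). It locates the st column by its header name, since column positions differ between vmstat versions.

```bash
# Hypothetical helper: print vmstat samples whose steal time exceeds a threshold.
# Usage: vmstat 1 5 | steal_over 10
steal_over() {
  awk -v t="${1:-5}" '
    # The header row names the columns; find "st" by name, not by position.
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "st") col = i; next }
    # Data rows start with a number; compare the st value to the threshold.
    col && $1 ~ /^[0-9]+$/ && $col + 0 > t + 0 { print "steal " $col "% exceeds " t "%" }
  '
}
```

Because the first vmstat row is a since-boot average, a hit on that row alone is not necessarily a current problem.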

Memory Troubleshooting Commands

Memory issues cause OOM kills, swapping, and performance degradation.

Detailed Memory Breakdown

cat /proc/meminfo

Complete memory statistics. MemAvailable is the kernel's estimate of how much memory new workloads can use without swapping, which makes it a better signal than MemFree.
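For alerting or quick checks, MemAvailable is easier to reason about as a percentage of MemTotal. The function below is a minimal sketch (mem_available_pct is a made-up name); it reads /proc/meminfo by default and accepts another path as an optional argument.

```bash
# Hypothetical helper: print MemAvailable as a percentage of MemTotal.
# Usage: mem_available_pct            (reads /proc/meminfo)
#        mem_available_pct some-file  (reads a saved copy)
mem_available_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%.1f\n", avail * 100 / total }
  ' "${1:-/proc/meminfo}"
}
```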

Top Memory Consumers

ps -eo pid,ppid,cmd,%mem --sort=-%mem | head -20

Find which processes are using the most RAM.

Cache and Buffer Usage

free -h

The buff/cache column shows memory used for buffers and disk caching. The kernel reclaims most of it when applications need memory.

Swap Activity

vmstat 1 5

Watch the si (swap in) and so (swap out) columns. Sustained non-zero values indicate memory pressure. Ignore the first row, which reports averages since boot.

OOM Killer History

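dmesg -T | grep -i -E "out of memory|oom-kill"

Check if the OOM killer has terminated processes recently. The -T flag prints human-readable timestamps, and reading the kernel log may require root on some systems.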
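When the kernel log is noisy, it can help to reduce OOM events to just the PID and process name. The sketch below assumes the kernel prints a line containing "Killed process PID (name)", which is the wording recent kernels use; the exact message varies by kernel version, so treat the pattern as an assumption. The name oom_victims is hypothetical.

```bash
# Hypothetical helper: extract "PID name" pairs from OOM kill messages on stdin.
# Usage: dmesg | oom_victims
oom_victims() {
  # -n suppresses default output; the p flag prints only lines where the substitution matched.
  sed -n -E 's/.*Killed process ([0-9]+) \(([^)]*)\).*/\1 \2/p'
}
```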

Disk Troubleshooting Commands

Disk issues range from full filesystems to I/O bottlenecks.

Filesystem Usage

df -h

Human-readable disk space usage. Check for partitions above 90%.

Inode Usage

df -i

Running out of inodes prevents file creation even with free space.
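Both checks can be scripted by filtering df output on the percentage column. The function below is a sketch under a few assumptions: it expects POSIX-format output (df -P), it uses a default threshold of 90, and it does not handle mount points that contain spaces. The name full_filesystems is made up for this example.

```bash
# Hypothetical helper: list mount points whose usage exceeds a threshold.
# Usage: df -P  | full_filesystems 90   (disk space)
#        df -Pi | full_filesystems 90   (inodes, GNU df)
full_filesystems() {
  awk -v t="${1:-90}" '
    NR > 1 {                      # skip the header row
      pct = $5
      sub(/%/, "", pct)           # "95%" -> "95"
      if (pct + 0 > t + 0) print $6 " " $5
    }
  '
}
```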

Directory Size

du -sh /var/log/*

Find which directories are consuming space.

Find Large Files

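find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null

Locate files larger than 100 MiB. The -xdev flag keeps find on the root filesystem, so it skips /proc, /sys, and other mounts; run it again on each mount point you want to inspect.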
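A sorted list is usually more useful than raw find output. The sketch below assumes GNU find (for -printf) and wraps the search in a hypothetical function, largest_files, whose default directory and size are arbitrary choices.

```bash
# Hypothetical helper: list the largest files under a directory, biggest first.
# Usage: largest_files /var 100M
largest_files() {
  local dir="${1:-/var}"
  local min="${2:-100M}"
  # %s is the size in bytes and %p the path (GNU find); sort numerically, descending.
  find "$dir" -xdev -type f -size +"$min" -printf '%s %p\n' 2>/dev/null \
    | sort -rn | head -n 20
}
```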

Disk I/O Statistics

iostat -xz 1 5

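Watch %util for device saturation and the await columns (r_await and w_await in recent sysstat versions) for latency. On SSDs and RAID arrays that serve requests in parallel, %util can approach 100% well before the device reaches its limit.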
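To pull out only the busy devices, the iostat output can be filtered on %util. This is a rough sketch: busy_disks is a hypothetical name, the default threshold of 80 is an assumption, and the column is located by header name because the iostat layout changes between sysstat versions.

```bash
# Hypothetical helper: print devices whose %util exceeds a threshold.
# Usage: iostat -xz 1 5 | busy_disks 80
busy_disks() {
  awk -v t="${1:-80}" '
    # Each report repeats the header; find the %util column by name.
    $1 == "Device" || $1 == "Device:" {
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    # Shorter lines (such as the avg-cpu block) lack that column and are skipped.
    col && NF >= col && $col + 0 > t + 0 { print $1, $col }
  '
}
```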

I/O by Process

iotop -o

Real-time view of which processes are performing I/O. The -o flag hides processes that are idle, and iotop usually needs root.

Process Troubleshooting Commands

Understanding process behavior is key to debugging applications.

Process Tree

pstree -p

Visualize parent-child relationships between processes.

Process State

ps aux | awk '$8 ~ /D/ {print}'

Find processes in uninterruptible sleep (D state)—often waiting on I/O.

Open File Descriptors

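ls /proc/PID/fd | wc -l

Count open file descriptors (replace PID with the process ID). Plain ls is used here because ls -la adds a total line plus the . and .. entries, which inflates the count by three. Compare the result against the process limits.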

File Descriptor Limits

grep "open files" /proc/PID/limits

Check if a process is approaching its file descriptor limit.
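The count and the limit can be read in one step. The function below is a minimal sketch (fd_usage is a made-up name) that prints the number of open descriptors followed by the soft limit. It assumes the usual /proc/PID/limits layout, where the soft limit is the fourth field of the "Max open files" line.

```bash
# Hypothetical helper: print "<open fds> <soft limit>" for a PID.
# Usage: fd_usage 1234
fd_usage() {
  local pid="$1"
  local count limit
  count=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  # Line format: "Max open files  <soft>  <hard>  files"
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "$count $limit"
}
```

Reading another user's process generally requires root.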

Process Environment

cat /proc/PID/environ | tr '\0' '\n'

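View the environment variables a running process was started with. Changes the process makes to its own environment later are generally not reflected here.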

Network Troubleshooting Commands

On modern Linux, the iproute2 tools ip and ss replace the older ifconfig, route, and netstat.

Interface Statistics

ip -s link

Shows packet counts, errors, and drops per interface.

Routing Table

ip route

Verify traffic is taking the expected path.

Connection States

ss -s

Summary of socket states. High TIME-WAIT counts may indicate connection churn.
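The summary gives totals only. For a per-state breakdown, one option is to tally the first column of ss -tan. The helper name count_tcp_states is hypothetical, and the sketch assumes the state is the first field on every row after the header.

```bash
# Hypothetical helper: count TCP sockets per state, most common first.
# Usage: ss -tan | count_tcp_states
count_tcp_states() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' \
    | sort -rn
}
```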

Listening Ports

ss -tlnp

Shows which processes are listening on which TCP ports. Run it as root to see process names for sockets owned by other users.

TCP Connection Details

ss -tn state established

List established connections.

Systemd Service Commands

Most modern Linux distributions use systemd.

Service Status

systemctl status your-service

Shows running state, recent logs, and resource usage.

Service Logs

journalctl -u your-service -f

Stream logs for a specific service.

Failed Services

systemctl --failed

List all services in a failed state.

Service Dependencies

systemctl list-dependencies your-service

Show what a service depends on.

Creating Linux Troubleshooting Runbooks

Combine these CLI troubleshooting commands into structured procedures:

# Linux System Health Check

## Quick Overview
```bash
uptime && free -h && df -h
```

## Top Resource Consumers
```bash
ps -eo pid,cmd,%cpu,%mem --sort=-%cpu | head -10
```

## Recent System Errors
```bash
dmesg | tail -50 | grep -i -E "(error|fail|warn)"
```

## Disk I/O Check
```bash
iostat -xz 1 3
```

For more runbook structures, see our DevOps runbook templates.
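The health check above could also be driven from a single script. The sketch below shows one possible shape: run_step is a hypothetical wrapper that prints a heading and skips any tool that is not installed (iostat, for example, ships in the sysstat package and is often missing on minimal hosts).

```bash
# Hypothetical wrapper: print a heading, then run the command if it exists.
run_step() {
  local title="$1"
  shift
  echo "== $title =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "skipped: $1 not installed"
  fi
}

# Example usage, mirroring the runbook sections:
#   run_step "Quick Overview" uptime
#   run_step "Memory"         free -h
#   run_step "Disk I/O Check" iostat -xz 1 3
```

Steps that rely on pipes would need to be wrapped in their own function before being passed to run_step.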

Making Troubleshooting Executable

Static documentation gets outdated. Stew keeps your Linux CLI troubleshooting commands executable—run them directly from your runbooks with output captured inline.

Join the waitlist and transform your Linux troubleshooting docs.