Troubleshooting a Unix system means systematically narrowing down the root cause of a problem. The steps depend on whether the issue is performance-related, a crash, a service outage, or a hardware failure. Here’s a structured approach you can use:
Run basic commands:
uptime # load average, uptime, logged-in users
top # or htop, if installed: CPU, memory, running processes
free -m # memory usage
df -h # disk usage
du -sh /path/* # find large directories
iostat -x 1 5 # per-device I/O utilization (sysstat package)
vmstat 1 5 # run queue, swap activity, I/O wait
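Look for: a load average well above the number of CPUs, exhausted memory or heavy swap use, filesystems close to 100% full, and high I/O wait.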
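The disk-usage check above can be turned into a quick filter. This is a minimal sketch, assuming POSIX-format `df -P` output; the function name `flag_full_filesystems` and the 90% default threshold are illustrative choices, not standard tools, and mount points containing spaces are not handled.

```shell
# Print every filesystem whose usage is at or above a threshold (percent).
# Reads `df -P` style output on stdin so it can be fed canned input.
# flag_full_filesystems is a hypothetical helper name.
flag_full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR == 1 { next }                      # skip the header row
        {
            use = $5                          # "Capacity" column, e.g. "93%"
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) {
                printf "%s %s%%\n", $6, use   # mount point and usage
            }
        }'
}
```

Typical use would be `df -P | flag_full_filesystems 85`, which prints one line per mount point at or above 85%.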
Find misbehaving processes:
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head
Restart or kill problematic processes:
kill <pid> # SIGTERM: lets the process clean up
kill -9 <pid> # SIGKILL: last resort if SIGTERM is ignored
systemctl restart <service>
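Killing a process is usually safer when it gets a chance to exit cleanly first. The sketch below illustrates that escalation; `stop_process` is a hypothetical helper name and the 10-second default is an arbitrary assumption.

```shell
# Send SIGTERM, wait up to N seconds, then fall back to SIGKILL.
# Returns 0 if the process exited on its own, 1 if it had to be forced,
# 2 if it could not be signalled at all (no such PID, or no permission).
stop_process() {
    pid="$1"
    timeout="${2:-10}"

    kill -TERM "$pid" 2>/dev/null || return 2

    waited=0
    while kill -0 "$pid" 2>/dev/null; do     # kill -0 only tests for existence
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For services managed by systemd, `systemctl restart <service>` remains the better choice, since it applies the unit's own stop logic.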
Check open ports and network services:
netstat -tulnp
ss -tulnp # modern replacement for netstat, same flags
System logs:
less /var/log/syslog # Debian/Ubuntu
less /var/log/messages # RHEL/CentOS
journalctl -xe
Application/service logs (e.g., Apache: /var/log/httpd/ on RHEL/CentOS, /var/log/apache2/ on Debian/Ubuntu).
Look for errors, warnings, crashes.
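To get a first impression of a large log before reading it, a keyword count can help. This is a rough sketch: `summarize_log` is a hypothetical helper, and the keyword list (`error`, `warn`, `fail`, `panic`) is an assumption that will need adjusting per application.

```shell
# Count lines per severity keyword in a log file (case-insensitive).
# A line that matches several keywords is counted once per keyword.
summarize_log() {
    logfile="$1"
    for level in error warn fail panic; do
        # grep -c prints 0 and exits non-zero when nothing matches
        count=$(grep -ci -- "$level" "$logfile" || true)
        printf '%s %s\n' "$level" "$count"
    done
}
```

For example, `summarize_log /var/log/syslog`. On systemd hosts, `journalctl -p err -b` gives a comparable view by listing messages of priority `err` and above from the current boot.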
Verify connectivity:
ping 8.8.8.8
curl -I http://example.com
Check interface status:
ip addr
ip route
DNS issues:
dig example.com
nslookup example.com
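The network checks above work best in layers: raw IP reachability first, then name resolution, then the application protocol. The sketch below encodes that order. Both function names are hypothetical, `getent hosts` is swapped in for `dig` because its exit status reflects whether resolution succeeded, and the `ping -W 2` timeout assumes Linux iputils.

```shell
# Map the exit statuses of three layered checks to a likely fault domain.
# Arguments: IP check status, DNS check status, HTTP check status (0 = success).
classify_connectivity() {
    if [ "$1" -ne 0 ]; then
        echo "no IP connectivity: check ip addr and ip route"
    elif [ "$2" -ne 0 ]; then
        echo "DNS resolution failing: check /etc/resolv.conf"
    elif [ "$3" -ne 0 ]; then
        echo "HTTP failing: check the remote service, proxy, or firewall"
    else
        echo "all checks passed"
    fi
}

# Run the real checks and classify the result.
check_connectivity() {
    ip_rc=0;   ping -c 1 -W 2 8.8.8.8 >/dev/null 2>&1 || ip_rc=$?
    dns_rc=0;  getent hosts example.com >/dev/null 2>&1 || dns_rc=$?
    http_rc=0; curl -sI --max-time 5 http://example.com >/dev/null 2>&1 || http_rc=$?
    classify_connectivity "$ip_rc" "$dns_rc" "$http_rc"
}
```

Running `check_connectivity` prints a single line naming the first layer that failed, which indicates where to look next.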
Disk health:
smartctl -a /dev/sda
dmesg | grep -i error
Memory test (requires reboot):
memtest86+ # boot it from the bootloader menu or from bootable media; it does not run from a live shell
CPU temperature / sensors:
sensors
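Filesystem check (unmount the filesystem first, or run from rescue media):
fsck /dev/sda1
Rebuild the initramfs or reinstall GRUB if the system fails to boot.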
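Running `fsck` on a mounted filesystem can corrupt it, so a guard before the check is worthwhile. This is a minimal sketch for Linux: `is_mounted` and `safe_fsck` are hypothetical helpers, and the comparison is a plain string match against the first column of `/proc/mounts`, so a device mounted under another name (a `/dev/mapper/...` alias, for instance) would be missed.

```shell
# Return 0 if the device appears as a mount source in a mounts table.
# The table defaults to /proc/mounts; a second argument overrides it.
is_mounted() {
    device="$1"
    table="${2:-/proc/mounts}"
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Refuse to check a mounted device; otherwise run a report-only pass.
safe_fsck() {
    if is_mounted "$1"; then
        echo "refusing to check $1: it is mounted" >&2
        return 1
    fi
    # -n is passed through to the filesystem-specific checker;
    # e2fsck treats it as "open read-only, answer no to every prompt".
    fsck -n "$1"
}
```

A report-only pass such as `safe_fsck /dev/sda1` shows whether repairs are needed before committing to a run that modifies the disk.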
Look for suspicious logins:
last
w
Check for unusual processes:
ps aux | grep -v '^root' # processes not owned by root
lsof -i
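One way to spot an unexpected listener is to compare listening TCP ports against a list of ports you expect. This sketch assumes the usual `ss -ltn` layout, with a header row and the local `address:port` in the fourth column; `unexpected_ports` is a hypothetical helper and the allow-list is whatever fits your host.

```shell
# Read `ss -ltn` output on stdin and print listening ports that are
# not in the space-separated allow-list given as the first argument.
unexpected_ports() {
    allowed=" $1 "
    awk 'NR > 1 { n = split($4, part, ":"); print part[n] }' |   # port is the text after the last ":"
        sort -un |
        while read -r port; do
            case "$allowed" in
                *" $port "*) ;;            # expected, stay quiet
                *) echo "$port" ;;
            esac
        done
}
```

For example, `ss -ltn | unexpected_ports "22 80 443"` prints any other listening port, which can then be traced to a process with `ss -ltnp` or `lsof -i`.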
Verify permissions and firewall rules:
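ls -ld /path # ownership and mode of a suspect file or directory
iptables -L -n -v # on nftables-based systems: nft list ruleset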
✅ Start broad (system load, health, logs), then drill down (processes, services, hardware).