Troubleshooting a Unix System

Troubleshooting a Unix system is about systematically narrowing down the root cause of a problem. The steps depend on whether the issue is performance-related, a crash, a service outage, or a hardware failure. Most commands below assume a Linux system (systemctl, journalctl, ip, ss); other Unix variants have equivalents. Here’s a structured approach you can use:

1. Identify the Problem

Ask: What’s wrong? (slow, unresponsive, network down, process not working, boot failure, etc.)
Get error messages (logs, user reports, console output).
Note when it started and any recent changes (updates, config edits, hardware swaps).
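
To make "recent changes" concrete, one option is to list configuration files modified in the last few days. This is a minimal sketch, not a complete change audit: recent_changes is a hypothetical helper name, and the /etc directory and 2-day window are assumed defaults.

```shell
# List regular files under a directory modified within the last N days.
# recent_changes is a hypothetical helper; /etc and 2 days are assumed defaults.
recent_changes() {
    dir="${1:-/etc}"
    days="${2:-2}"
    # -mtime -N matches files modified less than N*24 hours ago
    find "$dir" -type f -mtime "-$days" 2>/dev/null | sort
}

# Typical use on a live system:
#   recent_changes /etc 2
```

Package managers keep their own history as well, which may be a better source for recently installed updates.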

2. Check System Health

Run basic commands:

uptime          # load average, uptime, logged-in users
top             # CPU, memory, running processes
htop            # interactive alternative to top, if installed
free -m         # memory usage
df -h           # disk usage
du -sh /path/*  # find large directories
iostat -x 1 5   # per-device I/O utilization (sysstat package)
vmstat 1 5      # run queue, swap in/out, I/O wait

Look for:

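Load average persistently above the number of CPU cores, or a single process with very high CPU or memory usage
Disk at or near 100% in df, or high I/O wait (the wa column in top and vmstat)
Swapping: nonzero si/so columns in vmstat point to memory exhaustion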
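
The "disk full" check can be scripted. The sketch below filters df output for filesystems at or above a usage threshold. disk_over_threshold is a hypothetical helper name and the 90% default is an assumed value; it reads `df -P` output on stdin and assumes mount points contain no spaces.

```shell
# Print mount points whose usage is at or above a threshold (percent).
# Reads `df -P` style output on stdin; the default of 90 is an assumed value.
disk_over_threshold() {
    limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)              # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6 " " use "%"
    }'
}

# Typical use on a live system:
#   df -P | disk_over_threshold 90
```

Reading from stdin rather than calling df inside the function keeps the filter easy to try against saved output from another machine.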

3. Check Processes & Services

Find misbehaving processes:

ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head

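Restart or stop problematic processes. Try a normal termination first and use kill -9 only if the process ignores it, because SIGKILL gives the process no chance to clean up:

kill <pid>                   # SIGTERM, lets the process shut down cleanly
kill -9 <pid>                # SIGKILL, last resort
systemctl restart <service>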
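
When stopping a process by hand, the usual escalation is to send SIGTERM, wait briefly, and fall back to SIGKILL (kill -9) only if the process is still alive. A minimal sketch of that pattern follows; stop_gracefully is a hypothetical helper name and the 10-second default timeout is an assumption.

```shell
# Ask a process to exit with SIGTERM, wait up to a timeout, then send SIGKILL.
# stop_gracefully is a hypothetical helper; the 10 s default is an assumption.
stop_gracefully() {
    pid="$1"
    timeout="${2:-10}"
    # Fails if there is no such process or we lack permission to signal it
    kill -TERM "$pid" 2>/dev/null || return 1
    i=0
    while [ "$i" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
        i=$((i + 1))
    done
    kill -KILL "$pid" 2>/dev/null                # last resort
}

# Typical use:
#   stop_gracefully 12345 10
```

For services managed by systemd, systemctl stop or restart already performs a similar escalation, so this helper is mainly relevant for unmanaged processes.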

Check open ports and network services:

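ss -tulnp        # listening TCP and UDP sockets with owning processes (run as root to see all)
netstat -tulnp   # older equivalent from net-tools, if ss is unavailable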

4. Check Logs

System logs:

less /var/log/syslog    # Debian/Ubuntu
less /var/log/messages  # RHEL/CentOS
journalctl -xe          # systemd journal, most recent entries with explanations

Application/service logs (e.g., Apache: /var/log/httpd/ on RHEL/CentOS, /var/log/apache2/ on Debian/Ubuntu).

Look for errors, warnings, crashes.
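
A quick way to get a first impression of a large log is to count lines per keyword before reading it in detail. The sketch below does that with grep. summarize_log is a hypothetical helper name and the keyword list is an assumption; adjust it to the messages your applications actually emit.

```shell
# Count matching lines per keyword in a log file (case-insensitive).
# summarize_log is a hypothetical helper; the keyword list is an assumption.
summarize_log() {
    file="$1"
    for word in error warn fail segfault; do
        # grep -c prints 0 and exits non-zero when nothing matches
        n=$(grep -ci -- "$word" "$file" || true)
        printf '%s %s\n' "$word" "$n"
    done
}

# Typical use on a live system:
#   summarize_log /var/log/syslog
```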

5. Check Network

Verify connectivity:

ping -c 4 8.8.8.8            # raw IP connectivity, bypasses DNS
curl -I http://example.com   # HTTP reachability, also exercises DNS

Check interface status:

ip addr
ip route

DNS issues:

dig example.com
nslookup example.com
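
The network checks above are most useful when read in order, from IP connectivity up to DNS and then the application. The sketch below turns three exit statuses into a verdict that names the first failing layer. net_verdict is a hypothetical helper name and the messages are illustrative, not an exhaustive diagnosis.

```shell
# Report the first failing layer given three exit statuses (0 means success).
# net_verdict is a hypothetical helper; the messages are illustrative.
net_verdict() {
    ip_ok="$1"; dns_ok="$2"; http_ok="$3"
    if [ "$ip_ok" -ne 0 ]; then
        echo "no IP connectivity: check interface, route, gateway"
    elif [ "$dns_ok" -ne 0 ]; then
        echo "IP works but DNS fails: check resolver configuration"
    elif [ "$http_ok" -ne 0 ]; then
        echo "DNS works but HTTP fails: check remote service, proxy, firewall"
    else
        echo "network path looks healthy"
    fi
}

# Typical use on a live system (commented out so the block has no side effects):
#   ping -c 3 8.8.8.8 >/dev/null 2>&1; a=$?
#   dig +short example.com | grep -q .; b=$?     # dig exits 0 even without an answer
#   curl -sI --max-time 5 http://example.com >/dev/null 2>&1; c=$?
#   net_verdict "$a" "$b" "$c"
```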

6. Hardware Checks

Disk health:

smartctl -a /dev/sda
dmesg | grep -i error

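Memory test (requires a reboot): memtest86+ is not run from the shell. Boot it from the boot menu or from bootable media and let it complete at least one full pass.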

CPU temperature / sensors (lm-sensors package):

sensors

7. Boot & Filesystem Problems

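If the system won’t boot, use single-user mode or rescue mode.
Check the filesystem, but only while it is unmounted (or mounted read-only); running fsck on a filesystem mounted read-write can corrupt it:

fsck /dev/sda1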

Rebuild the initramfs or reinstall GRUB if necessary.
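
Because fsck should not run against a mounted filesystem, it can help to check the mount table first. The sketch below is one way to do that. is_mounted is a hypothetical helper name, and the /proc/mounts default is a Linux-specific assumption; a device can also appear there under a different name (for example a /dev/mapper path), so treat a negative result with care.

```shell
# Succeed if the device appears as a mount source in a mounts table.
# is_mounted is a hypothetical helper; /proc/mounts is a Linux-specific default.
is_mounted() {
    dev="$1"
    table="${2:-/proc/mounts}"
    awk -v dev="$dev" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Typical use (commented out so the block has no side effects):
#   if is_mounted /dev/sda1; then
#       echo "/dev/sda1 is mounted; unmount it or boot into rescue mode first"
#   else
#       fsck /dev/sda1
#   fi
```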

8. Security Checks

Look for suspicious logins:

last
w

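Check for unusual processes and unexpected network connections. Filtering out root only shortens the list; review root-owned processes as well, since a compromised service often runs as root:

ps aux | grep -v root   # non-root processes (also drops any line containing "root")
lsof -i                 # open network sockets and their owning processes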

Verify permissions and firewall rules:

iptables -L -n -v   # iptables rules with packet counters
nft list ruleset    # on systems that use nftables instead

9. If Still Stuck

Isolate: Is it hardware, OS, or application?
Roll back recent changes (packages, configs).
Search online for the exact error messages found in the logs.
Check vendor/OS documentation.

Rule of Thumb

✅ Start broad (system load, health, logs), then drill down (processes, services, hardware).