DevOps Day 12: The Port Conflict Detective Story¶
Today's task was the most realistic production troubleshooting scenario I've faced so far. It wasn't a simple, one-step fix; it was a layered problem that required a methodical approach to diagnose and resolve. An Apache service was unreachable, and I had to put on my detective hat to figure out why.
This journey took me from identifying a failed service, to discovering a port conflict with a completely unexpected application, and finally to configuring a firewall I hadn't anticipated. It was a fantastic lesson in not jumping to conclusions and using the system's own diagnostic tools to follow the evidence.
Table of Contents¶
- The Task
- My Troubleshooting Journey: A Step-by-Step Solution
- Why Did I Do This? (The "What & Why")
- Deep Dive: The Sysadmin's Method - A Layered Approach
- Common Pitfalls
- Exploring the Commands Used
The Task¶
My objective was to diagnose and fix an issue on App Server 1, where the Apache web server was unreachable on its designated port (e.g., 6400 or 5003). I had to ensure the service was running correctly and was accessible from the jump host.
My Troubleshooting Journey: A Step-by-Step Solution¶
My approach was to systematically investigate each layer of the potential problem, from the application itself to the network firewall.
Step 1: Confirming the Failure¶
First, from the jump host, I confirmed the issue using curl.
curl http://stapp01:6400
# Output: curl: (7) Failed to connect to stapp01 port 6400: No route to host
Step 2: The First Clue - A Failed Service¶
I logged into App Server 1 and checked the Apache (httpd) service status.
ssh tony@stapp01
sudo systemctl status httpd
failed state. This was my first big clue. The problem wasn't just a blocked port; the application wasn't even running.
Step 3: The Second Clue - The Root Cause¶
The systemctl status output gave me the most important piece of evidence:
(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:6400
Step 4: The Third Clue - Identifying the Culprit¶
To find the "squatter" on port 6400, I used netstat.
sudo netstat -tulpn | grep 6400
tcp 0 0 127.0.0.1:6400 0.0.0.0:* LISTEN 445/sendmail: accep
The sendmail service, which handles email, was incorrectly configured and had stolen Apache's port.
Step 5: The First Fix - Resolving the Port Conflict¶
With the culprit identified, the fix was clear. I stopped sendmail to free up the port and then started httpd.
# Stop the conflicting service
sudo systemctl stop sendmail
sudo systemctl disable sendmail # To prevent it from starting on reboot
# Start the correct service
sudo systemctl start httpd
sudo systemctl status httpd confirmed that Apache was now active (running).
Step 6: The Final Hurdle - The Firewall¶
I went back to the jump host and ran curl again. It still failed with "No route to host." This proved that even though the service was running, there was a second problem: the server's firewall was blocking external connections.
I initially assumed the server used firewalld.
sudo firewall-cmd --permanent --add-port=6400/tcp
# Output: sudo: firewall-cmd: command not found
iptables firewall system.
Step 7: The Final Fix - Configuring iptables¶
With the correct tool identified, I added the rule to allow traffic on the port and, crucially, saved the new configuration.
# Add a rule to accept TCP traffic on port 6400
sudo iptables -I INPUT -p tcp --dport 6400 -j ACCEPT
# Save the new rules so they persist after a reboot
sudo service iptables save
Step 8: Final Verification¶
One last time, I switched back to the jump host and ran the test.
curl http://stapp01:6400
Why Did I Do This? (The "What & Why")¶
- Troubleshooting Methodology: This task wasn't about knowing one command; it was about having a process. By checking the service, then the logs, then the ports, and finally the firewall, I could logically narrow down the problem without guessing.
- Port Binding: A core networking concept. Only one application can "bind" to or "listen" on a specific IP address and port combination at a time. The "Address already in use" error is a classic symptom of a port conflict.
- Firewalls (firewalld vs. iptables): A server firewall is a security layer that controls what network traffic is allowed in and out.
- firewalld is the modern, dynamic firewall manager on most RHEL-based systems.
- iptables is the older, classic Linux firewall utility. It's powerful but more complex. Encountering the command not found error for firewall-cmd was the key that told me I needed to use iptables instead. This is a common situation in environments with servers of different ages.
Deep Dive: The Sysadmin's Method - A Layered Approach¶
The most valuable lesson from this task was reinforcing a systematic troubleshooting method. When a service is unreachable, I now follow this mental checklist:
-
Layer 1: Is the Service Running? (
systemctl status)- If it's
failedordead, the problem is with the service itself. I need to check the logs (journalctl). - If it's
running, the service is likely fine, and the problem is with its configuration or the network.
- If it's
-
Layer 2: Is it Listening on the Correct Port? (
netstat -tulpn)- This checks the service's configuration. Is it listening on the port I expect? Is it listening on the right network interface (e.g.,
0.0.0.0for all, or127.0.0.1for local only)?
- This checks the service's configuration. Is it listening on the port I expect? Is it listening on the right network interface (e.g.,
-
Layer 3: Is the Firewall Blocking the Port? (
firewall-cmdoriptables)- Even if the service is running and listening correctly, the server's firewall might be dropping all incoming connections. This is the final gatekeeper.
By methodically peeling back these layers, I can find the root cause of almost any "service unreachable" issue.
Common Pitfalls¶
- Stopping at the First Fix: The biggest trap would have been stopping after I got Apache running. I might have assumed the problem was solved, but the firewall was a second, independent issue. Always verify from the client's perspective!
- Assuming the Firewall Type: As I saw, not all servers use the same firewall. Being prepared to switch from firewalld to iptables (or vice-versa) is a key skill.
- Forgetting to Save iptables Rules: A classic mistake is to add a rule with the iptables command but forget to run service iptables save. The rule would work temporarily but would be lost after the next server reboot, causing the problem to mysteriously reappear.
Exploring the Commands Used¶
- curl http://[host]:[port]: My primary tool for testing connectivity from a client's perspective.
- sudo systemctl status [service]: Checks the current status of a service.
- sudo systemctl start/stop/disable [service]: The basic commands for controlling a service.
- sudo journalctl -xeu [service]: Shows the detailed logs for a specific service, essential for finding error messages.
- sudo netstat -tulpn: A powerful command to see all listening TCP/UDP ports, the programs using them, and their PIDs.
- sudo iptables -I INPUT -p tcp --dport [port] -j ACCEPT: Inserts a rule at the top of the input chain to accept TCP traffic on a specific port.
- sudo service iptables save: Saves the current iptables rules so they persist after a reboot.