Find out if a service has gone down on a remote machine

Q I have a remote server thousands of miles away. Unfortunately, all I have is the bandwidth and the hardware. Whenever things go wrong, I either have to pay extra or fix things myself. I am also running a software firewall using iptables. When a service is unreachable, what's the best way to find out where the breakage is occurring? Secondly, can you recommend a good way of applying firewall rules while making sure my SSH session doesn't get dropped?

A I'll start with your first question. Let's check for the most obvious cause: whether there's a process listening on the port we're trying to connect to: port 25, say.

# netstat -vatnpu | grep 25
tcp       0       0 127.0.0.1:25       0.0.0.0:*       LISTEN       3971/master

That shows Postfix is running but is bound to the loopback interface. Loopback is unlike regular network interfaces in that anything bound to it is not accessible to the outside world, but is limited to the same machine. So that might be a point of failure.
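
If you want to script this check, here is a minimal sketch. classify_bind is a hypothetical helper name of my own, and it assumes the address:port layout that netstat prints in its Local Address column. It simply says whether an address is loopback-only, a wildcard, or one specific address.

```shell
# classify_bind ADDRESS:PORT
# Hypothetical helper: classifies the "Local Address" column of a netstat line.
classify_bind() {
    addr=${1%:*}    # strip the trailing :port, leaving only the address
    case "$addr" in
        127.*|::1)      echo "loopback-only" ;;
        0.0.0.0|::|\*)  echo "all-interfaces" ;;
        *)              echo "specific-address" ;;
    esac
}

classify_bind 127.0.0.1:25
classify_bind 0.0.0.0:25

# To feed it real data, something like this should work. It matches on the
# local address column only, so unlike "grep 25" it will not pick up PIDs or
# other ports that merely contain the digits 25:
#   netstat -tln | awk '$4 ~ /:25$/ {print $4}'
```

Anything reported as loopback-only can be reached from the machine itself but not from the outside world.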

But what if the output suggested everything was OK on that front?

# netstat -vatnpu | grep 25
tcp       0     0 0.0.0.0:25       0.0.0.0:*       LISTEN       3971/master

That indicates the daemon is listening on all local addresses and accepting connections from any remote address. So let's check whether the daemon is actually running healthily. We do this by initiating a Telnet connection from the same machine to the public IP of the external interface. Let's pretend it's 1.2.3.4.

$ telnet 1.2.3.4 25
Trying 1.2.3.4...
Connected to 1.2.3.4 (1.2.3.4).
Escape character is '^]'.

The exact output depends on the daemon's configuration. So now we know that the process is alive and kicking and that the daemon is listening on the correct address or addresses. One last thing we can do on the local machine is to sniff the interface that the daemon is supposed to be listening on (eth0, say). You should look for packets in both directions:

# tcpdump -vni eth0 tcp port 25
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
21:53:16.627942 IP (tos 0x10, ttl 64, id 4623, offset 0, flags [DF], proto 6, length: 60) 1.2.3.5.52056 > 1.2.3.4.25: S [tcp sum ok] 2918495501:2918495501(0) win 32767 <mss 16396,sackOK,timestamp 34318082 0,nop,wscale 2>
21:53:16.628093 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 6, length: 60) 1.2.3.4.25 > 1.2.3.5.52056: S [tcp sum ok] 2929251633:2929251633(0) ack 2918495502 win 32767 <mss 16396,sackOK,timestamp 34318082 34318082,nop,wscale 2>

These two packets are a SYN and a SYN/ACK message to and from the daemon on port 25 respectively. We could consider two more outputs. The first is where the daemon seems to reply with an address that is different from the destination address in the first packet, like this:

# tcpdump -vni eth0 tcp port 25
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
21:53:16.627942 IP (tos 0x10, ttl 64, id 4623, offset 0, flags [DF], proto 6, length: 60) 1.2.3.5.52056 > 1.2.3.4.25: S [tcp sum ok] 2918495501:2918495501(0) win 32767 <mss 16396,sackOK,timestamp 34318082 0,nop,wscale 2>
21:53:16.628093 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 6, length: 60) 5.6.7.8.25 > 1.2.3.5.52056: S [tcp sum ok] 2929251633:2929251633(0) ack 2918495502 win 32767 <mss 16396,sackOK,timestamp 34318082 34318082,nop,wscale 2>
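
Spotting a mismatched reply by eye gets tedious in a long capture, so here is a rough sketch of the comparison. reply_matches is a hypothetical helper of my own. It assumes you have already pulled the "source > destination" part out of each tcpdump line, in the address.port form that tcpdump -n prints.

```shell
# reply_matches "REQ_SRC > REQ_DST" "REPLY_SRC > REPLY_DST"
# Hypothetical helper: succeeds only if the reply is the mirror image of the
# request, i.e. it comes from the address and port that were contacted.
reply_matches() {
    req_src=${1%% > *}      # text before " > " in the request
    req_dst=${1##* > }      # text after " > " in the request
    rep_src=${2%% > *}
    rep_dst=${2##* > }
    [ "$req_src" = "$rep_dst" ] && [ "$req_dst" = "$rep_src" ]
}

# The healthy exchange: the reply comes from 1.2.3.4.25
if reply_matches "1.2.3.5.52056 > 1.2.3.4.25" "1.2.3.4.25 > 1.2.3.5.52056"; then
    echo "reply source matches"
fi

# The suspicious exchange: the reply comes from 5.6.7.8.25
if ! reply_matches "1.2.3.5.52056 > 1.2.3.4.25" "5.6.7.8.25 > 1.2.3.5.52056"; then
    echo "reply source differs: something is rewriting the source address"
fi
```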

You may ask why the daemon would send a packet back from a different address than the one it was contacted on. This can happen when source NAT is configured incorrectly. If you look through the output of iptables -t nat -L -vn | grep -E 'MASQ|NAT' you will probably find the culprit.

The last possible output you might come across is where you can't see anything in tcpdump at all. That happens when your hosting provider, or something else upstream of the machine, is blocking access to that port: tcpdump sees incoming packets before iptables gets a chance to drop them, so a local rule would not hide them. I assumed the client and server machine have adequate connectivity and that the server is reachable from the client machine, otherwise the answer could fill a book!

To answer your second question, I've found that it's not uncommon to get locked out of a machine due to a hasty firewall command or a wrong sequence of commands. There are some precautions you can take. Where I'm implementing a firewall for the first time and need to set INPUT's policy to DROP, I schedule a service iptables restart to run a few minutes later, in case I'm locked out just after adding all the ACCEPT rules. Run the command inside screen so that you can detach, log out and log back in without killing it. Reconnecting is the real test: your existing session may stay healthy only because its packets match a rule with ESTABLISHED, whereas a new connection has to get through the new ruleset. The command is as follows:

# iptables -P INPUT DROP && sleep 10m && service iptables restart

Press Ctrl+A Ctrl+D to detach from the screen session, then log out and reconnect. If you can get back in, reattach the screen session by typing screen -r and press Ctrl+C in the shell; this kills sleep, so service iptables restart is never issued. If you are locked out, the restart runs once the ten minutes are up.

We get the same outcome on a generic Linux install by using the iptables-save and iptables-restore commands supplied in the iptables package. iptables-save is a utility that dumps the in-kernel iptables setup, to STDOUT by default, in a format that iptables-restore understands. As you might have guessed, before running the script or setting the DROP policy, we save the current in-kernel config to a file using regular redirection. Do this by typing

# iptables-save > ~/iptables-dump

The same config can be loaded back into the kernel by typing

# iptables-restore < ~/iptables-dump
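
Since the dump is your way back in, it's worth a quick check that it was written properly before you rely on it. The following is a minimal sketch. dump_looks_complete is a hypothetical helper of my own, and it only checks the general shape of iptables-save output (a line starting with * that names a table, and a COMMIT line), not whether the rules themselves are sensible.

```shell
# dump_looks_complete FILE
# Hypothetical helper: succeeds if FILE is non-empty and contains at least one
# "*table" line and one "COMMIT" line, as iptables-save output does.
dump_looks_complete() {
    [ -s "$1" ] &&
    grep -q '^\*' "$1" &&
    grep -q '^COMMIT$' "$1"
}

# Typical use, as root (left commented out so the sketch runs anywhere):
#   iptables-save > ~/iptables-dump
#   dump_looks_complete ~/iptables-dump || echo "dump looks truncated"
#
# iptables-restore can also parse a file without committing it:
#   iptables-restore --test < ~/iptables-dump
```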

The previous process of issuing the iptables commands in a safe way could be repeated by typing

# iptables-save > ~/iptables-dump && iptables -P INPUT DROP && sleep 10m && iptables-restore < ~/iptables-dump

Or where we're running a script:

# iptables-save > ~/iptables-dump && /path/to/firewall/script && sleep 10m && iptables-restore < ~/iptables-dump
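
One weakness of chaining everything with && is that if the firewall script exits with an error halfway through, the restore never runs. As a rough sketch of one way around that, the hypothetical apply_with_rollback helper below rolls back both when the apply step fails and when nobody confirms the new rules in time. The helper name, the confirmation file and the ~/firewall-ok path are all my own assumptions, not part of iptables.

```shell
# apply_with_rollback APPLY_CMD ROLLBACK_CMD DELAY CONFIRM_FILE
# Hypothetical helper: runs APPLY_CMD, waits DELAY (any argument sleep accepts),
# then runs ROLLBACK_CMD unless CONFIRM_FILE has appeared in the meantime.
apply_with_rollback() {
    apply=$1
    rollback=$2
    delay=$3
    confirm=$4

    rm -f "$confirm"                  # start without a stale confirmation
    if ! sh -c "$apply"; then
        sh -c "$rollback"             # a half-applied ruleset is rolled back too
        echo "apply failed, rolled back"
        return 1
    fi

    sleep "$delay"
    if [ -e "$confirm" ]; then
        echo "confirmed, keeping new rules"
    else
        sh -c "$rollback"
        echo "no confirmation, rolled back"
    fi
}

# Typical use, as root and inside screen (commented out so the sketch is safe to run):
#   iptables-save > ~/iptables-dump
#   apply_with_rollback '/path/to/firewall/script' \
#       'iptables-restore < ~/iptables-dump' 10m ~/firewall-ok
# Then, from a NEW SSH session, confirm that you can still get in:
#   touch ~/firewall-ok
```

Confirming from a fresh login matters for the same reason as before: an existing session may survive on an ESTABLISHED rule even though new connections are blocked.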
