Troubleshooting via a maze of network devices

04.12.2006

The portal vendor provided a list of ports that needed to be opened for proper operation. Because the problem was intermittent, I felt sure that the necessary ports were not blocked: had any of them been blocked, there would have been no access at all from off-site. For completeness, I checked the configuration of all intermediate devices, including firewalls, routers and traffic managers, and verified that none of the ports was blocked.
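A configuration review like this can be complemented by an active test from an off-site machine. The following is a minimal sketch, not the procedure used here: it attempts a TCP connection to each port on the vendor's list and reports which ones answer. The function name `check_ports` and the host name and ports in the usage comment are hypothetical, since the actual port list is not reproduced in this article.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Return a dict mapping each port to True if a TCP connect succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection performs the full TCP handshake; the
            # context manager closes the socket again immediately.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[port] = False
    return results


# Hypothetical usage (host name and ports are placeholders):
#   for port, ok in check_ports("portal.example.com", [80, 443]).items():
#       print(port, "open" if ok else "blocked or closed")
```

A failed connection does not say which device dropped the packet, only that the path as a whole did not deliver it, so the per-device configuration check is still needed to locate a block.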

At this point, I was fairly convinced that it was either an application issue or that another port, one the vendor had omitted or was not aware of, needed to be opened. I used IPTraf to determine which ports were being accessed on the portal server from within the network (in other words, on a known working connection). The ports observed matched the vendor's list, so the list was correct.
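The comparison itself is a simple set difference. As an illustration only, assuming the ports seen in the monitoring tool have been noted down by hand, a sketch might look like this. The function name `compare_ports` and the port numbers in the usage comment are hypothetical.

```python
def compare_ports(vendor_ports, observed_ports):
    """Compare the vendor's documented ports with ports seen on the wire.

    Returns a dict with two sorted lists:
      "undocumented": ports observed in use but missing from the vendor list
      "unseen":       ports on the vendor list that were not observed
    """
    vendor = set(vendor_ports)
    observed = set(observed_ports)
    return {
        "undocumented": sorted(observed - vendor),
        "unseen": sorted(vendor - observed),
    }


# Hypothetical usage (port numbers are placeholders):
#   compare_ports([80, 443], [80, 443, 8080])
#   reports 8080 as undocumented and nothing as unseen.
```

An empty "undocumented" list corresponds to the result found here: nothing was in use that the vendor had failed to list.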

A traffic manager was used to manage the company's Internet bandwidth. To see whether it was misclassifying an application, I created a rule on the traffic manager to classify all packets sent to the portal server. After a short time, the applications were registering correctly as HTTP and SSL. I removed the classification rule and turned my attention to the firewall.

Sometimes, firewalls have connection timeout rules that can block applications after a period of inactivity. However, the logs of the firewalls involved revealed no such blocks.

I then used tcpdump at the firewall to analyze traffic at the network edge. From that trace, it was apparent that some clients were not sending the PUSH packet (a TCP segment with the PSH flag set) needed to open the portal application. This pointed to a client issue.
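The kind of check made on the trace can be sketched as follows. This is an illustrative reconstruction, not the analysis actually performed: it assumes the capture has already been reduced to pairs of source address and TCP flags byte, and it lists clients that opened a connection (SYN) but never sent a segment with PSH set. The function names and the addresses in the usage comment are hypothetical; the flag bit values are those of the standard TCP header.

```python
from collections import defaultdict

# Bit positions within the TCP flags byte (byte offset 13 of the TCP header).
SYN = 0x02
PSH = 0x08


def flags_from_tcp_header(header):
    """Return the flags byte from a raw TCP header given as bytes."""
    return header[13]


def clients_without_push(packets):
    """Return sorted sources that sent a SYN but never a PSH segment.

    packets: iterable of (source_ip, tcp_flags_byte) pairs.
    """
    seen = defaultdict(int)
    for src, flags in packets:
        # Accumulate every flag bit ever seen from this source.
        seen[src] |= flags
    return sorted(
        src for src, flags in seen.items()
        if flags & SYN and not flags & PSH
    )


# Hypothetical usage (addresses are documentation placeholders):
#   clients_without_push([
#       ("192.0.2.10", 0x02), ("192.0.2.10", 0x18),   # SYN, then PSH+ACK
#       ("192.0.2.20", 0x02), ("192.0.2.20", 0x10),   # SYN, then ACK only
#   ])
#   flags only 192.0.2.20
```

Grouping by source address alone is a simplification; a fuller analysis would track each connection separately by address and port pair.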