21 September 2009

When nslookup and ping work, but server still looks down

Nslookup works, ping does not

Many network issues can be isolated quickly: when pinging by IP address works but nslookup does not, the problem is name resolution. But what about when:
  • pinging by IP works
  • nslookup works
  • pinging by name does NOT work!

Background: The secure network has access to the corporate network, but the corporate network does not have access to the secure network. This connection is managed by a standard firewall maintaining state tables.

The problem: Users on the secure network report that servers on the corporate network disappear and reappear at random during the day.

So make a guess:
a) the firewall is occasionally getting overwhelmed
b) Windows DNS on the secure network is having issues
c) Windows DNS on the corporate network is having issues
d) Bad switch or other physical issue
e) Secure active directory or corporate active directory is uncooperative
f) None of the above

The corporate mail server, which is always up, was the system people most often noticed disappearing and reappearing during the day. So when Outlook lost connectivity, the troubleshooting began.

Curiously, when doing nslookup, everything would look okay:


C:\Windows\system32>nslookup
Default Server: dns1.secure.net
Address: 10.1.1.1

> mail.yourdomain.com
Server: dns1.secure.net
Address: 10.1.1.1

Name: mail.yourdomain.com
Address: 10.0.6.1


But doing a ping would result in this:

C:\Windows\system32>ping mail.yourdomain.com

Ping request could not find host mail.yourdomain.com. Please check the name and try again.


Yet more curious:
C:\Windows\system32>ping 10.0.6.1
Pinging mail.yourdomain.com [10.0.6.1] with 32 bytes of data:
Reply from 10.0.6.1: bytes=32 time=376ms TTL=124
Reply from 10.0.6.1: bytes=32 time=182ms TTL=124
Reply from 10.0.6.1: bytes=32 time=484ms TTL=124
Reply from 10.0.6.1: bytes=32 time=340ms TTL=124

Ping statistics for 10.0.6.1:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 182ms, Maximum = 484ms, Average = 345ms


This would seem to eliminate an overworked firewall. Pinging by IP worked, so the connectivity was there. This also would seem to eliminate a physical layer issue.

But also, nslookup worked, so name resolution was working as well. What the heck?

The error logs of dns1.secure.net did not show any errors. Its forwarders for the corporate network were set to dns12.yourdomain.com and dns14.yourdomain.com, so perhaps the issue was with one of those systems. The idea was that dns1.secure.net would forward the request, but one of them would fail to reply for whatever reason.

Unfortunately, neither dns12 nor dns14 showed any issues. And furthermore, no one was reporting connection issues on the corporate network.

After a while, I gave up and tried to run a backup of my laptop to backup.publicdomain.com, which has an intranet address and an external address for those not on the internal network. But I could not connect.

Again, nslookup looked fine:

C:\Windows\system32>nslookup
Default Server: dns1.secure.net
Address: 10.1.1.1

> backup.publicdomain.com
Server: dns1.secure.net
Address: 10.1.1.1

Name: backup.publicdomain.com
Address: 10.8.1.4


But ping gave me a different answer this time:

C:\Windows\system32>ping backup.publicdomain.com

Pinging backup.publicdomain.com [public IP] with 32 bytes of data
Request timed out.
Request timed out.


I realized it was doing a recursive lookup, but from where? That is when I remembered that there are two DNS servers on the secure network:

C:\Windows\system32>nslookup
Default Server: dns1.secure.net
Address: 10.1.1.1

> server 10.1.1.2
Default Server: dns2.secure.net
Address: 10.1.1.2

> backup.publicdomain.com
Server: dns2.secure.net
Address: 10.1.1.2

Non-authoritative answer:
Name: backup.publicdomain.com
Address: PUBLIC IP


A recursive lookup! Meaning dns2 did NOT have an internal record for the company's external domain and had to go out to public DNS for the answer. How about the internal domain?

> mail.yourdomain.com
Server: dns2.secure.net
Address: 10.1.1.2

*** dns2.secure.net can't find mail.yourdomain.com: Non-existent domain
> owa.yourdomain.com
Server: dns2.secure.net
Address: 10.1.1.2

*** dns2.secure.net can't find owa.yourdomain.com: Non-existent domain


In other words, 10.1.1.2 did not have the forwarders set up. And at some point, the client TCP/IP stack had switched to dns2 for DNS, EVEN THOUGH NSLOOKUP WENT TO dns1!!!
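The check that exposed the problem, asking each DNS server the same question directly and comparing that with what the operating system's resolver returns, can be sketched in Python. This is an illustration only, not part of the original troubleshooting. The helper names (build_query, parse_response, ask, resolve_via_os, compare) are made up for this sketch, and it assumes plain UDP queries for A records with no retries and no handling of truncated replies.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a DNS query packet asking for the A record of `name`."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(packet, offset):
    """Return the offset just past a (possibly compressed) DNS name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer, always 2 bytes
            return offset + 2
        offset += 1 + length


def parse_response(packet):
    """Return (rcode, list of IPv4 addresses) from a DNS response.

    rcode 0 means success, rcode 3 is "Non-existent domain" (NXDOMAIN).
    """
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    rcode = flags & 0x000F
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:  # A record, class IN
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength  # non-A records (e.g. CNAME) are skipped
    return rcode, addresses


def ask(server, name, timeout=3.0):
    """Send one A query for `name` straight to `server` over UDP port 53."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        packet, _ = sock.recvfrom(4096)
    return parse_response(packet)


def resolve_via_os(name):
    """Resolve `name` through the operating system's own resolver."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def compare(name, servers):
    """Print what each DNS server and the OS resolver say about `name`."""
    for server in servers:
        try:
            rcode, addresses = ask(server, name)
            print(f"{server}: rcode={rcode} addresses={addresses}")
        except OSError as exc:  # includes timeouts
            print(f"{server}: no answer ({exc})")
    print(f"OS resolver: {resolve_via_os(name)}")
```

Calling compare("mail.yourdomain.com", ["10.1.1.1", "10.1.1.2"]) from a client on the secure network would query both servers in one pass. Unlike nslookup, which only talks to the server it is pointed at, the last line goes through the operating system's resolver, which should be the same path ping by name takes.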

I have since added all of the forwarders so they match. Since then, I have not had any connectivity issues. Problem solved. And why were the forwarders not set up in the first place? The DNS server had recently been rebuilt from scratch, and during the rebuild the forwarders were forgotten.
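A drift check like the one below might catch this sort of mismatch after the next rebuild. It is a minimal sketch in Python: missing_forwarders is a hypothetical helper, and it assumes the forwarder list for each DNS server has already been collected by hand into a dictionary.

```python
def missing_forwarders(config):
    """For each server, list the forwarders that another server has but it lacks.

    `config` maps a DNS server name to its list of forwarders.
    Servers that already have every forwarder are left out of the result.
    """
    all_forwarders = set()
    for forwarders in config.values():
        all_forwarders.update(forwarders)

    gaps = {}
    for server, forwarders in config.items():
        missing = sorted(all_forwarders - set(forwarders))
        if missing:
            gaps[server] = missing
    return gaps


# Illustrative data matching the situation described above.
config = {
    "dns1.secure.net": ["dns12.yourdomain.com", "dns14.yourdomain.com"],
    "dns2.secure.net": [],  # rebuilt from scratch, forwarders forgotten
}
print(missing_forwarders(config))
```

An empty result means every server forwards to the same set. Anything else names the server that needs attention and the forwarders it is missing.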


Answer: b) Windows DNS on the secure network is having issues
