Slow DNS Resolution on Ubuntu Linux Server 14.05 LTS

Posted by & filed under Linux, Server Admin.

This all started with WordPress timeouts. I was trying to activate some premium plugins, and the license activation was timing out. I started doing some digging and found they use the WordPress core library WP_http which in turn uses curl to make the request. I wrote my own code to use WP_Http and it failed in the same way with a timeout. I added a timeout parameter to the wp_remote_get() call, and it was able to complete without a timeout. I then used a IP address in place of the domain name and it worked without the need for the timeout parameter.

<?php
$response = wp_remote_get('http://google.com');
print_r($response);
echo wp_remote_retrieve_body( $response );
?>

With that info in hand, I decided it must be on the server. I started doing some tests:

web@web:~$ time curl "http://google.com"
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

real    0m5.565s
user    0m0.007s
sys     0m0.000s

I then did the same test from another server that uses the same DNS servers in resolv.conf:

dev@web1 [~]# time curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

real    0m0.121s
user    0m0.000s

After much googling, I found a few number of suggested solutions:

  • Disable IPv6
  • Ensure /etc/nsswitch.conf is set correctly (hosts: files dns)

Neither of these worked for me. Finally, I added the following directive into my resolv.conf and it fixed the issue!

options single-request

Apparently, this is actually somewhat related to ipv6 — from the resolv.conf manpage:

single-request (since glibc 2.10)
                     Sets RES_SNGLKUP in _res.options.  By default, glibc
                     performs IPv4 and IPv6 lookups in parallel since
                     version 2.9.  Some appliance DNS servers cannot handle
                     these queries properly and make the requests time out.
                     This option disables the behavior and makes glibc
                     perform the IPv6 and IPv4 requests sequentially (at the
                     cost of some slowdown of the resolving process).

Now, I get good response times when I curl:

web@web:~$ time curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

real    0m0.170s
user    0m0.007s
sys     0m0.000s

Looks like the resolver sends parallel requests, fails to see the IPv6 response, waits 5 sec and sends sequential requests because it thinks the nameserver is broken. By adding the options single-request, glibc makes the requests sequentially be default and does not timeout.

I found some good info and hints on this issue here: https://bbs.archlinux.org/viewtopic.php?id=75770

Lastly, to bring this whole thing full circle, the WprdPress plugins now are able to get out and communicate successfully. Woohoo!