The Web Developer's Guide to DNS

Between the zillions of JavaScript bundles, cat videos, and banner ads that are the web, it’s easy to forget that something–in fact, several somethings–has to glue it all together. One of those somethings is the Domain Name System, DNS, which bears the inglorious responsibility of turning a hostname like pets.com into a machine-friendly IP address.

Here’s what it looks like through dig.


$ dig pets.com
; <<>> DiG 9.10.3-P4-Ubuntu <<>> pets.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17431
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;pets.com.                      IN      A

;; ANSWER SECTION:
pets.com.               9708    IN      A       72.52.10.14

;; Query time: 14 msec
;; SERVER: 127.0.1.1#53(127.0.1.1)
;; WHEN: Mon Apr 08 21:03:43 PDT 2019
;; MSG SIZE  rcvd: 53

Skip down to the QUESTION and ANSWER: we wanted pets.com, and we got back 72.52.10.14. That’s DNS in action, and most of the time we can and do take it for granted. Still, issues do arise. Most of our conscious interactions with DNS start with the word NXDOMAIN or a too-generous TTL. When something goes wrong, having a cursory understanding of what’s happening under the hood can be helpful in diagnosing, fixing, and (better yet) preempting issues in the firmament of the web.

Which is as good a place as any to start.

Layers on layers

You’re probably already on good terms with the Hypertext Transfer Protocol, HTTP, which packages up webpage content in a way that browsers (“user agents”, in the vernacular) can understand. HTTP doesn’t specify how the browser connects to a server, but another protocol–the Transmission Control Protocol, TCP–sure does. Then there’s the Internet Protocol (IP), which specifies how both client and server should be addressed, and beneath that a link layer to sort out the actual hardware.

For HTTP, the whole stack comes out something like this:

Layer        Protocol  Notes
Application  HTTP      Format data request and response
Transport    TCP       Deliver data between client and server
Internet     IP        Address client and server
Link         Ethernet  Map request to physical network

Layering of an HTTP request

There you have it–the protocol stack where a web developer will spend 95% of her working life. TCP/IP is the gold standard in connection management, and except for those times when full-duplex communication is worth the trouble or low latency is more important than, say, reliable delivery–more on that in a moment–it’s where the web developer’s web begins and ends.

Connections are overrated

DNS can use a similar TCP/IP stack, but as small parts of a simple system, most DNS operations can also travel the wire on the Internet’s favorite roulette wheel: the User Datagram Protocol, UDP.

On a good day, UDP is fast, simple, and stripped bare of unnecessary niceties like delivery guarantees and congestion management. But a UDP message may also never be delivered, or it may be delivered twice. It may never get a response, which makes for fun client design–particularly coming from the relatively safe and well-adjusted world of HTTP. With TCP, you get an established connection and all kinds of accommodations when Things Inevitably Go Wrong. UDP? “Best effort” delivery. Which mostly means a quick prayer for soft landings before your packet gets tossed over the fence.
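To make "tossed over the fence" concrete, here is a minimal sketch of a DNS lookup over UDP using only Python's standard library. The function names, the retry count, and the timeout are assumptions chosen for illustration; the default server address is simply the one that appears in the dig output above and will likely differ on your machine.

```python
import random
import socket
import struct


def build_query(hostname, query_id=None):
    """Encode a minimal DNS query for the A record of `hostname`."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def query_udp(hostname, server="127.0.1.1", attempts=3, timeout=2.0):
    """Send the query over UDP, retrying when no reply arrives in time."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(attempts):
            sock.sendto(query, (server, 53))
            try:
                reply, _addr = sock.recvfrom(512)
            except socket.timeout:
                continue  # lost somewhere; UDP will not tell us where
            if reply[:2] == query[:2]:  # the reply id must match our query id
                return reply
    return None  # every attempt went unanswered
```

Note that all of the reliability lives in the client: the timeout, the retry loop, and the id check are things TCP would otherwise have handled for us.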

There and back again, a DNS request

Let’s get down to DNS. The usual story plays out something like this:

  1. You type "pets.com" into lynx (or whatever Chrome alternative the kids are using these days).
  2. lynx asks a DNS “resolver” to identify the server containing "pets.com".
  3. The resolver doesn’t know firsthand, but it can forward your request to a friendly neighborhood DNS nameserver.
  4. If the nameserver doesn’t know either, it can at least supply the address of another nameserver that might.
  5. When that nameserver doesn’t know, it may throw in the towel and ask one of the Internet’s root servers to kindly please direct it to the nameserver responsible for the portion of the domain space beneath ".com".
  6. The ".com" nameserver can identify an authoritative nameserver responsible for "pets.com", which can in turn provide an IP address for "pets.com".
  7. Any upstream resolvers may cache the result for future reference.
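The referral-chasing in the steps above can be sketched without touching the network. This toy model starts at the root and leaves out caching. The server names and the delegation table are hypothetical, made up for illustration; only the pets.com address comes from the dig output earlier.

```python
# Hypothetical delegation data: the server names below are invented for this
# sketch. Real nameservers hold far more, and answer over the network.
SERVERS = {
    "root": {"referrals": {"com.": "com-nameserver"}},
    "com-nameserver": {"referrals": {"pets.com.": "pets-nameserver"}},
    "pets-nameserver": {"answers": {"pets.com.": "72.52.10.14"}},
}


def resolve(name, start="root", max_hops=10):
    """Follow referrals from `start` until some server answers for `name`."""
    if not name.endswith("."):
        name += "."
    server = start
    path = [server]
    for _ in range(max_hops):
        data = SERVERS[server]
        if name in data.get("answers", {}):
            return data["answers"][name], path
        # Candidate referrals are zones that are a suffix of the name.
        zones = [
            zone for zone in data.get("referrals", {})
            if ("." + name).endswith("." + zone)
        ]
        if not zones:
            return None, path  # nobody here knows: the NXDOMAIN case
        # Prefer the most specific (longest) matching zone.
        server = data["referrals"][max(zones, key=len)]
        path.append(server)
    return None, path


address, path = resolve("pets.com")
print(address)  # 72.52.10.14
print(path)     # ['root', 'com-nameserver', 'pets-nameserver']
```

Each hop either answers, refers the question to a more specific zone, or gives up, which is the whole protocol in miniature.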

The same process can also turn a host address back into the corresponding domain. This involves a cute little trick with a special domain (in-addr.arpa) and a timely inversion. Here’s a clue: 4.4.8.8.in-addr.arpa is the name you look up to find the hostname of the public DNS server at 8.8.4.4.

See it?

With the IP reversed, the label just under in-addr.arpa maps to an entire network: 8.in-addr.arpa sits immediately beneath in-addr.arpa and corresponds to the 8.0.0.0/8 block of IPv4 addresses.
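The inversion is easy to reproduce. The sketch below builds the reverse-lookup name for an IPv4 address and the zone covering its /8 network; both function names are made up for this example.

```python
import ipaddress


def reverse_name(ip):
    """Return the in-addr.arpa name to query for an IPv4 address."""
    # IPv4Address validates the input before we take it apart.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def reverse_zone_for_slash8(ip):
    """Return the in-addr.arpa zone covering the address's /8 network."""
    first_octet = str(ipaddress.IPv4Address(ip)).split(".")[0]
    return first_octet + ".in-addr.arpa"


print(reverse_name("8.8.4.4"))             # 4.4.8.8.in-addr.arpa
print(reverse_zone_for_slash8("8.8.4.4"))  # 8.in-addr.arpa
```

Reversing the octets puts the most general part of the address on the right, which is the same place the most general part of a domain name lives.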

An even more interesting feature of DNS is its assumption that at any given hop a domain will just as often not be known. This is where UDP suddenly seems like a better fit, both mechanically–datagrams being relatively lightweight travelers through an overworked network stack–and philosophically.

Couldn’t resolve a host? Well, your request probably never arrived, either. Better luck next time.

In the zone

Say the datagram did arrive, however, and it’s time to serve a request. When a query reaches an adequately capable nameserver, that server will understand its place in the great domain hierarchy through a “zone” that looks something like this:

$TTL 86400  ;1d
$ORIGIN pets.com.
@             IN      SOA   ns1.pets.com. ns2.pets.com. (
                        2019040700 ; se = serial number
                        43200      ; ref = refresh (12h)
                        900        ; ret = update retry (15m)
                        1209600    ; ex = expiry (2w)
                        3600       ; nx = nxdomain ttl (1h)
                        )
              IN      NS      ns1.pets.com.
              IN      MX  10  mail.pets.com.
www           IN      CNAME   @

If you’ve adjusted CNAME or TXT records in your domain registrar’s web interface, what you were actually editing were the resource records (“RR”s) in the underlying zone. When you hit “save”, the serial number (se) was incremented to reflect the change. As secondary nameservers picked up the new serial and resolvers everywhere evicted their cached copies of the old records, your change bubbled out across the internet, and some indeterminate time later it finished going “live”.
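As a rough illustration of that increment, the sketch below assumes the serial follows the common YYYYMMDDnn convention, which the example zone’s 2019040700 happens to fit. Nothing in DNS requires this format, and the function name is hypothetical.

```python
from datetime import date


def next_serial(current, today):
    """Return the next YYYYMMDDnn serial after `current`."""
    # The first change on a given day starts at nn = 00.
    base = int(today.strftime("%Y%m%d")) * 100
    # Later changes on the same day count up from the current serial.
    # Either way, the new serial must be larger than the old one.
    return max(base, current + 1)


print(next_serial(2019040700, date(2019, 4, 7)))  # 2019040701
print(next_serial(2019040700, date(2019, 4, 8)))  # 2019040800
```

The only property that matters downstream is that the number goes up, since a larger serial is the signal that the zone has changed.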

We’ll gloss over most of the details (see: diminishing returns), but this caching business is important. Every record in DNS land contains a TTL (“time to live”) indicating how long it may be cached by a client before it needs to be refreshed from a trustworthy server again. Where the TTL isn’t explicitly set, the default $TTL is used instead.

This caching thing is such serious business that even NXDOMAIN (“NX” as in, “non-existent”, as in, “try again later”) errors within the zone still have a lifetime. The general goal is to avoid repeating DNS requests for as long as reasonably possible.
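Here is a minimal sketch of what honoring those lifetimes might look like on the client side, including the negative (NXDOMAIN) case. The class and its method names are assumptions for illustration and not how any particular resolver is implemented; the TTL values are borrowed from the dig output and the example zone.

```python
import time


class DnsCache:
    """A toy record cache that honors TTLs, including negative answers."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}   # name -> (expires_at, address or None)

    def put(self, name, address, ttl):
        """Store an answer; address=None records an NXDOMAIN."""
        self._entries[name] = (self._clock() + ttl, address)

    def get(self, name):
        """Return (hit, address); a hit with address None is a cached NXDOMAIN."""
        entry = self._entries.get(name)
        if entry is None:
            return False, None
        expires_at, address = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # TTL ran out: time to ask upstream again
            return False, None
        return True, address


cache = DnsCache()
cache.put("pets.com.", "72.52.10.14", ttl=9708)  # TTL from the dig answer
cache.put("nope.pets.com.", None, ttl=3600)      # the zone's nxdomain ttl
print(cache.get("pets.com."))       # (True, '72.52.10.14')
print(cache.get("nope.pets.com."))  # (True, None)
```

Caching the "no" alongside the "yes" is what keeps a typo in a config file from hammering a nameserver with the same doomed question.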

Other applications

As our little tour has ventured forth from client to server and back, we haven’t once authorized a request. Assuming you know what to ask for, DNS is open to whoever comes a-knocking. This makes sense–it’s the Internet, after all–but it also has an interesting implication. Intentionally or otherwise, DNS has wound up with all the trappings of a lumbering, indispensable, distributed database. While we can use DNS to address friends and neighbors, we can also use it to establish trust (see DKIM and SPF), ownership, and the location of other interesting systems.

In web development, it’s easy to leave DNS as something to muddle through when absolutely needed. But just a bit of time invested in studying it and actually learning it shines some light on a fundamental, enduring part of the Internet’s plumbing. It’s worth a peek! And I’d love to know what you find.
