The Documentation You've Been Looking For

Back in the dark ages, machines and their human operators communicated by flipping bits–a slow, cumbersome exercise whose necessity was thankfully short-lived. To share information more efficiently, the operators began speaking with one another in ever higher levels of abstraction. Pushes and pops became subroutines and classes, all in an attempt to describe the machines’ operation in more accessible terms.

But it’s called “code” for a reason. Even the clearest software turns utterly opaque when handed to a reader without explanation or context, and that assumes the software is accessible at all. If we’re shipping binaries or a closed-source API, the program on its own is about as useful as a Greek phrasebook in Bolivia.

So, context. Context, context, context. And that’s why we have documentation.

The last thing out

For all its value, documentation is often the furthest thing from a developer’s mind. Even in a waterfall project with a first-pass spec available up front, the code will inevitably change before it ships. New information becomes available. New customer-driven feature requests are fulfilled. Documentation written before the project is loaded onto the truck is documentation that will change again.

To write documentation preemptively is to resign ourselves to rework. Nobody likes rework.

The self-documenting myth

That’s why we have self-documenting code, right? Code as crystal-clear as a mountain spring–the kind of code that others can effortlessly consume with no direction beyond the source file itself.

Let’s set that notion aside for a moment. Conventions, clear code, and simple interfaces are goals we should be striving for anyway, but aspects of even immaculate code can only be read by inference. Why did the author define a particular method? What trade-offs were considered in the design of this class? How have they evolved over time? And what problem is the unit actually intended to solve?

One answer is external, prose documentation. No matter how deep we dig through the midden, we can only be so expressive with our source code alone. On the other hand, documentation outside our code lies out of sight. Out of mind. Without a conscious effort to maintain it, it falls immediately out of date with changes to the software itself.

For the interface to a proprietary project, that might be an acceptable–or unavoidable–price. For a fast-moving startup’s dev team, though, it’s absolutely avoidable. The last thing a company driven by immediate value wants is an inward-facing project vying for engineers’ attention.

So here’s the trade-off. At one end of the spectrum, the self-documenting ideal is easily maintained: in the perfect case, “self-documenting” simply means employing the same best practices that should define the code anyway. Zero overhead, but it can only provide context as sophisticated as the structure and organization of the code will allow. At the other end of the spectrum, prose comes with all the benefits and hindrances one would expect: while it is accessible and easily authored, its distance from the source code requires a separate effort to maintain.

The middle way

If we don’t want prose, and we can’t tell our complete story with code alone, it’s time to turn to supplemental documentation. We won’t pour cycles into external documentation, but neither will we avoid it. We’ll simply employ good practices in the tools we already use to maintain references into the “why” of our code. The result will be less accessible than prose but easier to maintain; unlike purely self-documenting code, the context will actually be available when we need it.

The story begins with three supplemental sources that already factor into our day-to-day routine:

  • source control
  • automated test scenarios
  • inline comments

For each, we’ll consider what information we can capture and how to use it effectively. By the end, we’ll understand how the pieces support each other and fit into the larger documentation puzzle.

Source control

First up, source control. It’s our history: a log of incremental changes that we can use to reconstruct a thorough picture of how the project arrived at its current state. Using git as an example (though the specific tool is a bit of a detail), consider the potential trove of information that could be buried in any commit:

Adds 'X-Frame-Options' header

Global header prevents rendering inside iframes (clickjacking) for users
with supported browsers. Additional embed pages may be whitelisted later.

Curious about the rationale behind a particular piece of code? So long as the hashes of previous changes can be tracked down, all the details of the commit can help explain it.
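
When a hash isn’t known up front, git can often dig it up for us: the -S “pickaxe” option searches history for commits that added or removed a given string. A quick sketch (the hash shown here is made up):

$ git log -S 'X-Frame-Options' --pretty=oneline
c41eb9d4f368978bbcc92663786047c6d3d45db9 Adds 'X-Frame-Options' header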

Of course, history is only as useful as what is written down: effective use of source control means practicing good hygiene.

For git, commits consist of a one-line summary followed by any additional detail about the change. That summary is critical: besides showing up at the top of each message in git log, it’s also the first information presented in many front ends.

$ git log --pretty=oneline
c862b494d7c2f88fa1a5f369a5848be8e5e7133d Improves tablet layouts
48e030df5cfa6a4c7e90a978fc9f52b0c5d77534 Adds compass watch task

The detail in a longer description comes in handy during initial review, when scanning changelog entries, or when using git blame to track down the commit behind a particular design decision.

$ git blame --date=short <file>
121bdd59 (rjz 2014-09-16 73)   if (start > 0) {
cbf3c4a3 (rjz 2014-09-16 74)     prev = (-1 + Math.max(0, start - limit));
cd079d6f (rjz 2014-09-14 75)   }

$ git show 121bdd59 --summary
commit 121bdd59f01696fd3c3dd5440a7326aeba5dd5b3
Author: rjz <rj+nospam@rjzaworski.com>
Date:   Tue Sep 16 19:52:13 2014 -0700

    Sets pagination bounds (Fixes #53)

    Checks that `prev` and `next` values refer to accessible pages before
    attempting to return them. If inaccessible, return placeholders to
    minimize upstream exceptions

If there’s ever a temptation to push a commit labeled ‘WIP’ or ‘more changes’, think of the future.
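
And if a few ‘WIP’ commits have already piled up locally, git’s interactive rebase can squash them into something meaningful before they’re shared (assuming, of course, that the commits haven’t been pushed yet):

$ git rebase -i HEAD~3
# in the editor, mark the 'WIP' commits as 'squash'
# and write a summary and description worth keeping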

Test coverage

Tests are a critical piece of the software development cycle, but–besides ensuring that code works–well-written test scenarios describe everything from the high-level problems motivating a feature to the expected behavior at each step along the way.

Effective testing is a broad problem on its own, but for the sake of documentation we need two things:

  1. Clear annotations throughout the test scenarios
  2. Accessible mappings from the tests to the code they cover

Annotations can take a variety of forms. In BDD-style testing, the scenario description is simply a part of the test:

Scenario: Registering a new account
  Given a registration request
  When a valid username and password are supplied
  Then a new account is created

Filenames and inline comments can also help hint at the scenario’s goal and fill in details, but–so long as the rationale and expected outcome of the test are clear–even the test code alone can provide useful context for its subject.
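
The annotations needn’t live in a separate Gherkin layer, either. Here’s a minimal sketch of the same scenario in ScalaTest’s BDD-flavored AnyFeatureSpec (assuming ScalaTest 3.x; the Registration object is a made-up stub standing in for a real service):

import org.scalatest.GivenWhenThen
import org.scalatest.featurespec.AnyFeatureSpec

import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for the application's registration service
object Registration {
  case class Request(username: String, password: String)

  def create(request: Request): Try[String] =
    if (request.username.nonEmpty && request.password.nonEmpty)
      Success(request.username)
    else
      Failure(new IllegalArgumentException("invalid credentials"))
}

class RegistrationSpec extends AnyFeatureSpec with GivenWhenThen {
  Feature("Registering a new account") {
    Scenario("registering with valid credentials") {
      Given("a registration request")
      val request = Registration.Request("amelia", "s3cret")

      When("a valid username and password are supplied")
      val result = Registration.create(request)

      Then("a new account is created")
      assert(result.isSuccess)
    }
  }
}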

It’s little surprise, then, that another aspect of effective tests is a mapping back to the tested code. While the example above describes the intent of something (“Registering a new account”), we should always try to describe which part of the code it addresses. Filenames and convention go a long way:

$ find routes -name '*.rb'
routes/registration.rb
routes/spec/registration_spec.rb

If a test corresponds to a particular module or method within a file, however, we can be even more specific by incorporating the subject name into the test itself.

Scenario: registration#create() Registering a new account
  Given a registration request
  When a valid username and password are supplied
  Then a new account is created

Switching between code and its spec is now trivial. Simply match the keyword (if our editor supports it), or–in the very worst case–grep for occurrences of registration.create elsewhere in the project.
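
In the grep case, the search might look something like this (conveniently, the pattern’s dot matches the ‘#’ in the scenario name as well as a literal period):

$ grep -r 'registration.create' routes/
routes/spec/registration_spec.rb:Scenario: registration#create() Registering a new account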

Named linkage is obviously more applicable to unit testing than to higher-level tests, but keyword-happy names provide context at every level. We group functional tests around a particular project component or feature group; so long as the name is familiar and reflected in namespaces, folders, or labels in our documentation, another developer can easily map the code to the usages documented in the tests.
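
A hypothetical layout that groups functional tests by feature, for instance, keeps that mapping obvious:

spec/
  features/
    registration/
      create_account_spec.rb
      password_reset_spec.rb
    billing/
      invoice_spec.rb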

Inline comments

The final piece in our brief survey is the ubiquitous inline comment. In a self-documenting world, it’s an unwanted smell–excessive detail tacked on to code the developer couldn’t make speak for itself. This criticism isn’t unfounded; many times an inline comment may simply be replaced by a clearer method name or an intermediate variable.

For instance, a comment like this one:

def atIndex (items: Vector[String], index: Int): Option[String] =
  // `index` is a one-based index
  if (Range(0, items.size).isDefinedAt(index - 1))
    Some(items(index - 1))
  else
    None

…can be conveyed just as clearly by renaming a parameter:

def atIndex (items: Vector[String], index: Int): Option[String] = {
  val zeroBasedIndex = index - 1
  if (Range(0, items.size).isDefinedAt(zeroBasedIndex))
    Some(items(zeroBasedIndex))
  else
    None
}
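
As it happens, Scala can tighten this particular example even further: every Seq is a partial function from index to element, and lift performs the same bounds-checked lookup in a single call. Not a knock on the refactoring above, just a reminder that the clearest code is often less code:

def atIndex (items: Vector[String], index: Int): Option[String] =
  items.lift(index - 1) // translate the one-based `index` before lookup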

At times, though, code can’t speak for itself. Perhaps it covers a legacy use case, or a business edge case not captured in the underlying data model. Whatever the reason, comments are far more locally apparent than either test cases or notes committed to far-away source control.

// Expects a human-friendly one-based `index`
def atIndex (items: Vector[String], index: Int): Option[String] = {
  // ...
}

While they help reduce context switching during development, comments are neither a primary source of information (exception: autodoc annotations) nor a replacement for more thorough documentation elsewhere. And, local as they are, they still must be maintained: out-of-date comments range from confusing to downright damaging if the referenced code has changed beneath them.
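
The autodoc exception is worth a note: tools like Scaladoc promote well-formed comments into primary, published documentation. A quick sketch for the example above:

/** Retrieves an item using a human-friendly, one-based `index`.
  *
  *  @param items the items to search
  *  @param index one-based position of the desired item
  *  @return the item, or None when `index` is out of bounds
  */
def atIndex (items: Vector[String], index: Int): Option[String] = {
  // ...
}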

Effective comments capture hints that aren’t readily apparent from the code. They do so succinctly, with clear focus and purpose. They can be read at a glance. They don’t require a focus group to understand. They convey the rationale for neighboring code, and–if additional details are available–cite external references (bug reports, tickets, or issue numbers) for the benefit of the reader.
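
Applied to the pagination snippet from the git blame example, such a comment might read:

// Out-of-range pages get placeholder bounds to minimize
// upstream exceptions (see #53)
if (start > 0) {
  prev = (-1 + Math.max(0, start - limit));
}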

Conclusion

Our code’s story is all around us, but unless we’re shipping binaries we don’t need to write books to tell it. The tools we already use can provide the context we need–why a decision is what it is; the intended behavior of this feature or that–and for internal documentation, at least, we need not burden ourselves with much more. By maintaining good hygiene, by dropping ample references, and by being conscious of the role our history, tests, and comments play, we can leave the prose for the user manual–and get back to work on the implementation beneath it.

Hey, I'm RJ! For more learnings about software and management, find me @rjzaworski or sign up for my semi-regular newsletter.
