Thinking in Types

CREDIT: nietjuh via pexels (CC/2.0)

Types are an unavoidable detail of computer programming. Whether they’re set explicitly or not, data types represent the assumptions we’ve made about a program–its inputs and outputs, of course, but also the structure and behavior of internal state. Even when a legacy code base or institutional challenges prevent us from fully embracing them, thinking in terms of types will yield software that’s clearer and easier to maintain.

First, some background.

Four types

Programming languages approach typing along two axes. One axis ranges from “weak” to “strong” typing and distinguishes type systems that will not allow unexpected datatypes (strong typing) from those that will make every possible accommodation before throwing a type error (weak typing).

The other axis separates “static” and “dynamic” type-checking: static checks happen at compile time, dynamic checks at runtime. Dynamic languages still have type errors. They just aren’t raised until the offending code actually runs.
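To make that last point concrete, here is a small sketch in TypeScript that uses `any` to switch off static checking and imitate a dynamic language. The names `notAFunction` and `callIt` are made up for illustration.

```typescript
// `any` opts out of static checking, so this behaves like dynamic code.
const notAFunction: any = 42;

// Hypothetical helper: the faulty call sits inside a function body,
// so nothing goes wrong until the function is actually invoked.
function callIt(): unknown {
  return notAFunction(); // throws a TypeError when this line runs
}

let caught: unknown = null;
try {
  callIt(); // the type error surfaces here, and only here
} catch (err) {
  caught = err;
}

const raisedTypeError = caught instanceof TypeError; // true
```

Had `callIt` never been called, the mistake would have gone unnoticed.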

When we’re thinking in types, we’re usually thinking in terms of the strong, static sort. Many of the languages that fall into this quadrant are sometimes criticized by dynamic-language aficionados for the paperwork they add, but–as we’ll see shortly–the same rules of the type system still exist whether we set them intentionally or not.

Weak types

Imagine you’re the interpreter of a generic, weakly-typed language. You’re lying low in memory, just minding your own business, when along comes a developer and does what developers do. “Hey! Interpreter! Run me this thing.”

function double (a) {
  return 2 * a;
}

In our hypothetical language (which just happens to look rather like JavaScript) “this thing” is double, a function that will multiply its lone argument by two. We’ll gamely run double for any type of input we can think of, though the result may not always be what the developer expects:

double(2)     // 4
double('2')   // 4
double(['2']) // 4
double('two') // NaN

Duck, duck, NaN–a little mystery, but utterly self-inflicted. In a language with static typing (that just happens to look rather like TypeScript), the type signature of double would look something like this:

declare function double (a: any): any;

In other words, the function definition gives us negligible information about the input or return type of double. When arbitrary data is passed in, it’s all an interpreter can do to make an educated guess (within certain pre-existing rules) about how it should be handled.

What we do know, however, is that double should return the product of a Number (2) and the input a. This provides a useful clue. JavaScript represents * in infix notation, but consider how multiplication would appear if written as a function:

declare function * (a: Number, b: Number): Number;

Since the * operation will always return a Number, the interpreter can now fill in the missing return type in the original declaration.

declare function double (a: any): Number;

At this point, all that’s left is the small matter of deciphering a. Here, too, we have a clue: just like its return value, both operands of * must be numbers. If we provide an a that is not a number, the interpreter will attempt to satisfy this constraint by “fixing” our mistake.

2 * 2     // (a: Number, b: Number)
2 * '2'   // b: coercion (String => Number, `b = 2`)
2 * ['2'] // b: coercion (Array => String => Number, `b = 2`)
2 * 'two' // b: coercion fails (String => Number, `b => NaN`)
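The coercion chain above can be spelled out by hand. This is a rough sketch in TypeScript, which would reject `2 * '2'` outright, so the conversions are written with the built-in `Number` and `String` functions. The variable names are made up for illustration, and a real interpreter's steps are more involved than this.

```typescript
// Explicit conversions that roughly mirror the interpreter's implicit steps.
const fromString = Number('2');          // String => Number
const arrayAsString = String(['2']);     // Array => String
const fromArray = Number(arrayAsString); // String => Number
const fromWord = Number('two');          // conversion fails, yields NaN

const results = [2 * fromString, 2 * fromArray, 2 * fromWord];
// results: [4, 4, NaN]
```

Written this way, each guess the interpreter made on our behalf becomes a decision we can see and question.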

All of this is to say that a deep enough dive under the hood will eventually reveal an operation where we have to know a type. Processors know nothing of String, Array, or Number; to add we need two numbers of fixed size. Explicit or not, types are necessary details. Weak typing simply leaves it to the interpreter to deduce (or guess) how the puzzle-pieces ought to fit together. If we want double to take numbers, we have to give it numbers. If we don’t know what to give double, we need to be prepared for whatever mysteries fall out. Even when they’re implicit, clear types make for easier reasoning and fewer errors.

Strengthening weak types

But maybe we want to guarantee types anyway. In JavaScript, for example, developers will sometimes deploy type guards to bound the range of data types they need to worry about. If we only want to handle numbers inside double, for instance, there’s an easy way to enforce it:

function double (a) {
  if (typeof a !== 'number') {
    throw new TypeError('a is not a number!');
  }
  return 2 * a;
}

In effect, we’ve lifted one tiny section of weakly-typed code into something resembling strong typing. Heavy-handed, perhaps, but we can now trust a to be numeric.
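For comparison, here is how the same kind of guard might look in TypeScript, where an assertion function lets the compiler learn from the check. This is only a sketch: `assertNumber` and `guardedDouble` are hypothetical names, and the `asserts` signature assumes TypeScript 3.7 or later.

```typescript
// Hypothetical guard: throws unless `value` is a number, and tells the
// compiler that `value` is a number from this point on.
function assertNumber(value: unknown): asserts value is number {
  if (typeof value !== 'number') {
    throw new TypeError('value is not a number!');
  }
}

function guardedDouble(a: unknown): number {
  assertNumber(a);
  return 2 * a; // `a` is narrowed to number here
}

const ok = guardedDouble(2); // 4

let rejected = false;
try {
  guardedDouble('2'); // a string no longer slips through
} catch (err) {
  rejected = err instanceof TypeError;
}
```

The check still runs at runtime, so the cost described next still applies; the difference is only that the compiler can follow along.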

While it’s possible to shore up weak typing, doing it ourselves throughout a non-trivial codebase tends to be non-performant, exhausting, and error-prone. “Don’t fight the language,” the old axiom goes, and user-defined type checking comes perilously close.

If we want types and the context allows, maybe there’s an easier way.

Types

Contrast the mix of guesswork and defensive coding we employed in our weakly-typed language with the type declaration we would give up front in a strong, statically-typed variation:

declare function double (a: number): number;

We’ll no longer be able to double('2') without explicitly casting the string, of course, but with this definition we can now trust that double will only receive numeric arguments. As an added bonus, TypeScript is statically-typed, meaning that all types must check out at compile time (JavaScript, a dynamic language, will perform its sporadic checks only at runtime).
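A minimal sketch of what that looks like in practice, assuming a body that matches the declaration. The function is named `typedDouble` here purely for illustration, and the `@ts-expect-error` comment assumes TypeScript 3.9 or later.

```typescript
function typedDouble(a: number): number {
  return 2 * a;
}

const fromNumber = typedDouble(2);         // 4
const fromCast = typedDouble(Number('2')); // 4, after an explicit conversion

// @ts-expect-error: a string argument is rejected at compile time
typedDouble('2');
```

Without the `@ts-expect-error` comment, the last line would stop the build before the program ever ran.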

While weak types are extremely flexible–we can double string arguments without casting them!–even in a toy example they can lead us to unpleasant surprises. With static typing we’re forced to declare intent, rather than burying it away for others to stumble across.

Type systems taken to their logical extreme can add considerable complexity with only marginal return. But applying them pragmatically–or at least conceding their necessity–can lessen guesswork and make software easier to maintain. That doesn’t mean aggressively guarding the types in every. single. path of a dynamic language, but it does mean characterizing overloaded functions, frequent existence checks, and type coercion beneath the input layer as smells to be addressed. Variables should be consistent; functions should be clear; and data inside the program should rarely concern itself with null or undefined.
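One way to keep coercion at the input layer is to convert raw input once, at the edge, and hand only well-typed values to the rest of the program. The sketch below is one possible shape for that, not a prescription; `parseQuantity` and `doubleQuantity` are hypothetical names.

```typescript
// Hypothetical boundary function: the only place raw input is converted.
function parseQuantity(raw: string): number {
  const parsed = Number(raw);
  // Number('') is 0, so empty input has to be rejected separately.
  if (raw.trim() === '' || !isFinite(parsed)) {
    throw new TypeError(`not a number: "${raw}"`);
  }
  return parsed;
}

// Internal code takes a number and never re-checks or coerces it.
function doubleQuantity(quantity: number): number {
  return 2 * quantity;
}

const doubled = doubleQuantity(parseQuantity('21')); // 42
```

Everything past `parseQuantity` can assume a finite number, so existence checks and coercion have no reason to show up deeper in the code.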

We’ll never save ourselves from bad assumptions, of course, but we can always strive to produce simpler, safer, more reliable code. Even in dynamic, weakly-typed languages, thinking in types is a good place to start!

Hey, I'm RJ: digital entomologist and intermittent micropoet, writing from the beautiful Rose City.