That mouldering stew of legacy features, less-than-clear tests, hacks, patches, and outdated dependencies that garnishes the pasta-platter of evolved software? The only things certain in life are Death and Technical Debt.
Left unmaintained, the sonorous tones of the best-designed applications eventually become discordant. Velocity slows to a crawl. Long-serving team members grow less productive, and new hires are onboarded through a festering swamp of apologies and residual guilt.
No one wants that.
But “want” is different from “want enough to pay for.” Weighed against the day-to-day race to ship features and support customer needs, refactoring is easy to leave unattended. If you haven’t tried to sell the business case for refactoring, here’s a line to practice:
We need to put [incredibly compelling and/or lucrative feature] on hold while we refactor, because…
Marketing looks skeptical. Sales balks. And out come the story points, velocity graphs, and senior members of the team to try to salvage the case. Their testimony adds up to something like this:
That’s the time it takes to deliver a feature, charted from now until a few minutes before the inevitable heat death of the Universe. Weeks pass. Complexity rises. Velocity drops. More and more energy is consumed by longstanding problems with legacy code, leaving less available to create real value to the business.
We can’t remove complexity entirely, nor should we: cleanup isn’t free, time and energy are finite, and investments in system hygiene must be weighed against all the other development requirements of a system. What’s more, the nuances of a sophisticated system require complexity far beyond whatever naïve solution came before.
Still, any additional complexity–necessary though it may be–brings with it a pernicious increase in the entropy of the overall system.
This second sort of complexity is the one we’re after. It doesn’t improve customers’ lives; it isn’t contributing to the health of the system; and a safari into its tangle sounds about as appetizing as an hour on the beltway at 5pm. Productivity drops. Development slows. It’s time to talk.
Let’s redraw our hypothetical velocity chart, this time with regular cleanup built in:
This is the goal: heading off the rate of decline to keep features flowing smoothly through the pipe.
Technical debt is a natural byproduct of forward progress–difficult to anticipate and even harder to plan for. Inside the tech team, the warning signs are subtle and easily missed; outside, scheduling cleanup preemptively can be next to impossible. To address it effectively, we need a framework that will encourage developer initiative without bringing feature development to a grinding halt.
Let’s approach technical debt the same way we do testing, grouping work items by scope and establishing team expectations for when and how each should be taken on. Just as we test behavior at “local” (unit), “regional” (functional), and “global” (end-to-end) scale, cleanup tasks range from a single module up to widely distributed systems. Let’s break refactoring tasks up accordingly, and consider when–and how–issues at each level can be addressed.
Local cleanups may be as small as an individual function with cumbersome or unclear parameters. Instead of planning to fix them explicitly, our usual strategy will be to promote training and peer review that minimize them in the first place. When they do slip through, subsequent “drive-by” refactoring–where a developer leverages her immediate context to address technical debt near other in-progress work–can be an efficient tool to keep them from spreading further.
Here’s an example: we’ve just rushed an analytics client out to production that will help measure the effectiveness of a new marketing campaign. As a last-minute add, its software interface was cobbled together “as soon as possible”. The developer would normally have spent more time designing it–in fact, the presumptive design was challenged by a teammate in peer review–but with only our registration system relying on it and a deadline setting in we agreed that a fix could follow after.
Maybe a month passes, and we’re gearing up to add an additional OAuth signup to the registration flow. With drive-by refactoring, any new feature like this means a modest investment in fixing existing issues nearby. This gives the developer working on the OAuth update an implicit green light to look for and address velocity-killing issues–the dubious interface on our slapped-together analytics client, say–before they can metastasize to other parts of the project.
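As a rough illustration of what such a drive-by fix might look like, here is a minimal Python sketch. The `AnalyticsClient` class, its `track` and `track_event` methods, and every parameter name are hypothetical; nothing here describes a real client. The idea is simply to introduce a clearer, keyword-only signature while keeping the rushed positional call alive as a thin wrapper until the remaining callers have moved over.

```python
import warnings
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnalyticsClient:
    """Hypothetical analytics client; events are kept in memory for illustration."""

    sent: list = field(default_factory=list)

    def track_event(self, *, user_id: str, name: str,
                    campaign: Optional[str] = None) -> dict:
        """Clearer interface: keyword-only arguments with descriptive names."""
        event = {"user_id": user_id, "name": name, "campaign": campaign}
        self.sent.append(event)
        return event

    def track(self, u, n, c=None):
        """Original rushed interface, kept as a thin wrapper during migration."""
        warnings.warn(
            "track() is deprecated; use track_event()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.track_event(user_id=u, name=n, campaign=c)


# The OAuth work can use the new call right away; older callers keep working.
client = AnalyticsClient()
client.track_event(user_id="u-42", name="oauth_signup", campaign="spring")
```

Keeping the old method as a wrapper is one way to hold the change to "drive-by" size: the registration flow keeps working, and the wrapper can be deleted once nothing calls it.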
That’s great news not only for long-term project health, but also for the stakeholders involved. Managers are able to address technical debt preemptively, with minimal accounting. Developers have a chance to improve their own condition with very little context-switching or process overhead.
The biggest challenge of drive-by refactoring lies in building a team culture that supports it while keeping each cleanup confined to the code in question. Code review and pair programming are useful tools for recognizing and addressing local issues, and they open the door to regular check-ins on scope: feedback from a peer or pair can help developers navigate away from deeper rabbit holes.
Now, the bad news. Another deadline set in and the timeline for the OAuth integration collapsed. The drive-by refactor is mired in the tar pit of “just didn’t happen,” and–fast-forward to the present day–our neglected analytics client is complicating user engagement projects across our application.
What started as a local problem has gone regional: not only do the client and registration workflow need a good Augean scrub, but all the additional consumers (and their consumers, and the tests securing them) now need attention as well. Our debt is no longer trivial, and addressing it will strain development resources and raise the risk to other projects’ timelines. With more time on the line, drive-by refactoring is no longer a viable option.
In addition to the time involved in addressing them, regional debts left untended also pose a much greater risk to nearby project timelines. In the best case, vigilant developers recognize issues early, allowing managers to consider, prioritize, and schedule their cleanup alongside the customer-facing features on the company roadmap.
The cleanup itself varies significantly. Some teams choose to ticket larger issues as they arise but wait for regularly scheduled “cleaning days” to address as many as possible. Others tackle them in their ordinary flow, with managers or product owners helping to maintain a healthy balance of features, bugfixes, and larger cleanup items in the development team’s queue.
This steady, deliberate investment is the easiest kind to shelve during planning conversations, but teams that prioritize velocity-sustaining cleanup alongside short-term delivery are able to avoid the deep, involuntary disruption of issues that simply can’t wait any longer.
A local debt left untended may expand to a regional scale, but we would be surprised to see it go global. Where smaller debts are a natural by-product of development and regional debts a function of local debt left untended, the system-sized problems tend to arise from changing assumptions within the application architecture itself.
Everyone has an ORM story. ORMs aren’t inherently bad, but the convenience they lend to new projects tends to fall away in more mature, domain-specific applications. Specialized queries take up much of the database load, and–just like that–an outgrown ORM starts to impede the entire enterprise. The ORM wasn’t wrong at the time, but the assumptions have changed.
Paying a debt at this scale means a significant outlay in planning units. Weeks, months, story points, or a very large T-Shirt–we’ll burn them. Any estimates we are able to make should come with a healthy dose of salt–the incidence of “unknown unknowns” on an application-wide project is far higher than in any local jurisdiction. But since refactoring at this scale may block the delivery pipeline for quite some time, decomposition and effective project management are crucial to keeping bugfixes and critical features flowing.
We also face a very serious conversation about rebooting the project and starting over. If our business logic is tightly coupled to the particulars of our data layer, for instance, the challenge of separating them–and then replacing the ORM with something better-matched to our needs–may seem more daunting than rewriting from scratch. It may even be worth it, but–as we’ve discussed in the past–the realities of greenfield development aren’t always as sunny as they seem.
So we push through. We establish patterns for refactoring smaller units of code–smother critical behavior in tests; add new implementation; remove the old–rinse, repeat, and gradually work our way through the system. And later–and maybe much later–we emerge blinking into the dawn of a new day.
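The loop described above (pin behavior with tests, add the new implementation, remove the old) might look something like the following minimal Python sketch. Everything in it is hypothetical: `legacy_active_emails` and `new_active_emails` merely stand in for an outgrown query and its replacement, and the sample data is invented for illustration.

```python
# Hypothetical in-memory "table" standing in for real user records.
USERS = [
    {"email": "Ada@Example.com", "active": True},
    {"email": "bob@example.com", "active": False},
    {"email": "cy@example.com", "active": True},
]


def legacy_active_emails(users):
    """Old implementation: the behavior we want to preserve."""
    result = []
    for user in users:
        if user["active"]:
            result.append(user["email"].lower())
    result.sort()
    return result


def new_active_emails(users):
    """Replacement implementation, added alongside the old one."""
    return sorted(user["email"].lower() for user in users if user["active"])


def check_behavior(implementation):
    """Step 1: characterization checks that pin the current behavior."""
    assert implementation(USERS) == ["ada@example.com", "cy@example.com"]
    assert implementation([]) == []


# Step 2: both implementations must pass the same checks before the swap.
check_behavior(legacy_active_emails)
check_behavior(new_active_emails)

# Step 3: point callers at the new code; the legacy function can then be deleted.
active_emails = new_active_emails
```

Running the same checks against both versions is what makes the final deletion low-risk: if the replacement drifts from the pinned behavior, the swap fails before any caller notices.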
Technical debt. Can’t outrun it, can’t run without it.
If we’re efficient, cleanup happens proactively. We encourage developers to address local issues while they’re easy and cheap, nipping future problems in the bud before they can spread throughout the code. For those that do reach a regional level, we strive to recognize them early and incorporate fixes into our schedules and plans. We’ll trade a few days of internal betterment to avoid a future fire, clinging dearly to proaction for as long as our situation will allow.
And those times when a bad assumption raises debt to the global scale, well, we draw up our plans and hope for the best.