Whatever the tools and approach, software testing demonstrates that software does what it claims to do. Tests help developers eliminate defects, build confidence, practice good design, and ideally all three. They also take time to write, run, and update: time that’s no longer available for other development tasks.
High-value testing seeks to maximize the return on that investment. Like much of software development, it’s as much art as science. But a few practical principles can help keep things pointed in the right direction.
The first and most important rule is that of the non-test. The aim of software testing—the most fundamental reason to bother—is to secure the value the software creates. If a test doesn’t secure that value, it’s better to have no test at all.
The relative benefit of additional testing tapers off as coverage nears 100%. In a safety-critical application the conversation may begin at 100%, but in less-risky settings, there are pragmatic arguments against such exhaustive testing.
The familiar challenge of balancing quality, scope, and time (pick two) applies to testing, too. As low-quality tests are worse than useless and software has no value until it ships, the compromise usually comes down to scope.
If there’s only budget for a single test, the highest return will come from verifying that the positive “happy path” is working as intended. From there, tests can expand to cover the negative cases as risk requires and budget allows.
- Do tests inspire an appropriate level of confidence in our application?
- Is testing slowing upfront development?
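As a minimal sketch of the happy-path-first idea, consider a hypothetical `transfer` function (the names here are illustrative, not drawn from any particular codebase):

```python
def transfer(balance, amount):
    """Deduct amount from balance, rejecting overdrafts."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

# The single highest-value test: the positive, happy path.
def test_transfer_happy_path():
    assert transfer(100, 30) == 70

# Negative cases, added as risk requires and budget allows.
def test_transfer_rejects_overdraft():
    try:
        transfer(100, 130)
    except ValueError:
        pass  # expected: overdrafts are refused
    else:
        raise AssertionError("expected ValueError for overdraft")
```

If the happy-path test is the only one the budget allows, it still secures the behavior most customers will actually exercise.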
All the outside world knows about a function, class, or module is its public interface. Tests should begin and end there.
As software evolves, its internal implementation should be free to follow. Testing takes time. Testing details that don’t impact outward behavior is inefficient; worse, it can confuse and discourage future changes. Internal logic that can be extracted into a generally-useful unit may deserve a test of its own—but if not, and if the interface above it is working as intended, don’t lose sleep over what’s happening beneath.
Testing is also an opportunity both to show how the interface is used and to actually use it. Every test is an extra chance to smooth out a rough edge before it can snag a customer. Afterwards, it’s living documentation.
- Do tests reference internal methods or configuration settings that wouldn’t be seen in normal operations?
- Do tests reflect how the interface should be used?
- Do tests document both the interface’s use cases and error states?
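A small sketch of interface-level testing, using a hypothetical `SlugGenerator` class (the names and behavior are invented for illustration):

```python
import re

class SlugGenerator:
    def slugify(self, title):
        """Public interface: turn a title into a URL-safe slug."""
        return self._collapse(re.sub(r"[^a-z0-9]+", "-", title.lower()))

    def _collapse(self, text):
        # Internal detail: free to change as long as slugify() still works.
        return text.strip("-")

# Good: asserts only on outward behavior, through the public interface.
def test_slugify_produces_url_safe_slug():
    assert SlugGenerator().slugify("Hello, World!") == "hello-world"
```

A test that reached for `_collapse` directly would pin an internal detail, confusing and discouraging the very refactors the interface is meant to permit.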
No matter the slope of the testing pyramid, separating the zippy, independent tests that verify local behavior from the hefty specs securing the assembled system will improve the responsiveness and value of both. The goal is a reliable test suite, yes, but also one that encourages a fast feedback cycle. Nobody wants to wait through a treacle-slow end-to-end run just to confirm that a unit test passes, nor should that end-to-end run spend much time repeating unit test assertions.
- Is it easy to distinguish between lightweight unit tests and more-expensive functional or end-to-end tests?
- Are functional or end-to-end tests repeating details that should be captured by unit tests?
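One standard-library way to draw that line is to gate the expensive tier behind an explicit switch (the class names and the `RUN_E2E` variable here are illustrative conventions, not prescriptions):

```python
import os
import unittest

class TestDiscountMath(unittest.TestCase):
    """Zippy unit test: no I/O, cheap enough to run constantly."""
    def test_discount(self):
        self.assertEqual(round(100 * 0.85, 2), 85.0)

@unittest.skipUnless(os.environ.get("RUN_E2E") == "1",
                     "end-to-end tests disabled; set RUN_E2E=1 to run them")
class TestCheckoutFlow(unittest.TestCase):
    """Hefty spec: exercises the assembled system, so it only runs
    when the full pipeline asks for it."""
    def test_full_checkout(self):
        ...
```

Developers get a fast feedback cycle from the default run, while the continuous integration pipeline opts in to the full, slower suite.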
Any open-source library worth its salt will ship with its own community-maintained test suite. It shouldn’t require additional testing in userland. Trusting the community’s experience (and contributing patches upstream if that faith proves misplaced) means more time to test business logic and raise the value of the test suite as a whole.
- Are dependencies in widespread use by the community?
- Are tests spending time re-asserting “known-good” behavior in dependencies?
- Can gaps in a dependency’s test suite be contributed upstream?
Most of the parts within a reasonably complex software application are both dependent on and dependencies of other parts of the same application. It’s common for tests to supply mocks—test-specific implementations of external interfaces—to help isolate unit behavior. But see the problem? If the “real” implementation changes, an out-of-date mock will still allow tests to pass. Even a correct mock represents new code to write, verify, and maintain. Who will be testing that?
If a mock isn’t needed, don’t write it. Don’t be shy about refactoring to make it unnecessary, either. Mocks often crop up when logic and I/O are tightly coupled; in many cases decoupling them will lead to more reusable design (and simpler testing) than supplying internal behavior via a test mock.
If a mock is needed, though:
- Keep it simple (most testing utilities provide canonical tools to help with this).
- Keep it clean (don’t use mocks to store state, and don’t share mocks between tests).
- Can logic be tested independent of external behavior?
- Are mocks implemented using tools provided by a test utility or the language’s standard library?
- What assumptions are represented (and upheld) by each mock?
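A sketch of both halves of that advice, under invented names (`apply_markup`, `PriceClient`, `quote` are hypothetical): decouple the logic so it needs no mock at all, and where a mock is still unavoidable, lean on the standard library’s canonical tooling.

```python
from unittest import mock

def apply_markup(price, rate):
    """Pure logic: testable with no mock at all."""
    return round(price * (1 + rate), 2)

class PriceClient:
    def fetch_price(self, sku):
        raise NotImplementedError("talks to the network in production")

def quote(client, sku, rate):
    """Thin seam where logic meets I/O."""
    return apply_markup(client.fetch_price(sku), rate)

# Logic tested independent of external behavior: no mock needed.
assert apply_markup(10.00, 0.25) == 12.50

# Where I/O is unavoidable: a simple, stateless, per-test mock.
client = mock.create_autospec(PriceClient, instance=True)
client.fetch_price.return_value = 10.00
assert quote(client, "SKU-1", 0.25) == 12.50
client.fetch_price.assert_called_once_with("SKU-1")
```

`create_autospec` also addresses the staleness problem: because the mock mirrors the real class’s signatures, a change to `fetch_price` breaks the test instead of silently passing.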
Tests that can’t be trusted are worse than no tests at all. Energy invested in both their reliability and depiction of reality will always, always pay off.
“It works on my machine.”
A test suite should reach the same conclusions no matter who’s running it. For unit tests this means avoiding dependencies on network, I/O, the local timezone, and other tests. Deeper, end-to-end tests should proceed from a consistent, isolated state—shared datastores or service dependencies are a recipe for heartache when multiple test runs are proceeding in parallel in a continuous integration environment.
Tests should isolate a single variable, poke it, and see how it responds. If other things are poking it at the same time, it takes some seriously fancy filters and statistical magic (or, more likely, a sad, sad playthrough of the run-it-until-it-passes game) to get back to the way things were working before.
- Do tests interact with a database, the filesystem, or the network, or perform any other (potentially-fraught) I/O?
- Do tests depend on or otherwise interact with other tests?
- Will a time-dependent test produce the same output when run in another timezone?
- Will the same test pass if it’s run on another operating system? (I have been known to say unkind things to developers who let case-sensitivity issues escape their OS X or Windows development environment for a production server running Linux)
- Does logic depend on random number-generators or any other non-deterministic state?
Non-deterministic tests clog development workflows and breed mistrust in the test suite generally. Fix them promptly. Better yet, avoid writing them in the first place.
Production is the one place in the Universe that really, truly quacks like production. There’s something to be said for testing there, particularly around performance-related issues, but it’s also a very expensive place to get things wrong.
Given the hefty operational risk of testing in production, it’s usually best to substitute a reasonable facsimile instead. The application configuration and operating environment should be as “prod-like” as possible, even if system parameters are beefed up or the underlying dataset is slimmed down in the name of developer experience.
- Where does the test environment deviate from production?
- What production-specific issues aren’t (or can’t be) adequately covered in the test suite?
A comprehensive test suite is a worthy aspiration with diminishing returns. Squeezing the most from it requires a careful balance between the value it secures and its upkeep cost. A good test suite isn’t merely complete: it’s also relatively straightforward to modify or run. Tests that are easy to write, verify, and adjust encourage quick iteration and deliver more value. And tests that no one wants to touch…
One lens for assessing upkeep is the time needed to modify an existing test.
If an assertion fails, it shouldn’t take an expedition to explain where or why. An easy first step towards clear testing (itself an art form) is to make sure the runner or test description highlights the test’s subject and motivation.
For unit tests, identify the test’s subject and what the test is meant to prove. This might be a module, class, or method—it might even be assigned by convention in the test harness. Next, explain the motivation. What is the test out to achieve?
For higher-level integration or end-to-end testing, it may be more appropriate to describe tests in terms of the actors involved and their progress through a predefined test script. Again, any failures should make it clear what the actor was trying to accomplish and where their expectations weren’t met.
- Do failed tests clearly indicate what was being tested?
- Do failed tests clearly indicate what failed?
- Do test descriptions follow a predictable format throughout the test suite?
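One such convention: name the subject at the class level and the motivation at the method level, so a failure reads as a sentence. The `invoice_total` function and its behavior are invented for illustration:

```python
def invoice_total(items, tax):
    """items: (name, price, taxable) tuples; tax applied only to taxable items."""
    return round(sum(p * (1 + tax if taxable else 1)
                     for _, p, taxable in items), 2)

class TestInvoiceTotal:                              # the subject under test
    def test_applies_tax_to_taxable_items(self):     # the motivation
        assert invoice_total([("book", 10.00, True)], tax=0.1) == 11.00

    def test_skips_tax_on_exempt_items(self):
        assert invoice_total([("milk", 4.00, False)], tax=0.1) == 4.00
```

A failure reported as `TestInvoiceTotal::test_applies_tax_to_taxable_items` already names what was being tested and what went wrong, before anyone opens the file.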
Dogmatic test-driven developers test first. Less committed developers may test later, or not at all. But whether starting with the tests or adding them after the fact, there’s no reason to hold onto tests that have outlived their usefulness. As logic changes, tests may become irrelevant, redundant, or outright misleading. It’s tricky—often dangerous—to throw away legacy code that isn’t well understood. But if a test doesn’t make sense even after a modest investigation, the world will be a better place with one less mystery in it.
A certain sort of ruthlessness goes into keeping up a high-value test suite, and not only in testing. Tests create drag, case closed. A thorough test suite takes longer to run; builds are slower, and changes take longer to make. That’s a good thing when something needs to behave in a very specific way, but in less-critical code it can simply slow down other, necessary development.
Focus on value. Test accordingly. If the value isn’t there, or if the opportunity costs exceed the perceived benefits, it’s time for tests to go. They can always come back later, but odds are that you won’t even miss them.