Good tests don't change

A common complaint about unit tests is the overhead of writing and maintaining them. I think the core reason for this is that people write tests that they keep needing to rewrite.

There are two specific causes I want to write about:

testing implementation details, and
tests not being abstract enough.

Good tests don't change*, even if data structures change.

* unless behavior does

Implementation details

To help talk about implementation details, let's consider a fictitious HTTP API, Demeter, that searches and analyses a database of foods. Demeter is pretty typical in that it has several abstraction layers:

a web layer handling status codes, serialization, HTTP stuff;
a logic layer for actual... logic; and
a storage layer that handles talking to a database.

Implementation details are parts of the program that could change while the program still solves the problem. Demeter stores its data in a PostgreSQL database, but it could have been MySQL, a graph database, or MongoDB. Picking is important, but there are usually many suitable choices.
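To make the idea concrete, here is a minimal sketch of Demeter's three layers. Everything in it is hypothetical: the function names, the `findByName` method, the `caloriesPer100g` field, and the in-memory store standing in for PostgreSQL. The point is only that the logic layer depends on what the store does, not on which database sits behind it.

```javascript
// Storage layer: any object with a findByName(name) method will do.
// This hypothetical in-memory store could be swapped for a PostgreSQL or
// MongoDB store exposing the same method without the layers above noticing.
function createMemoryStore(foods) {
  const byName = new Map(foods.map((food) => [food.name, food]));
  return {
    findByName(name) {
      return byName.get(name) ?? null;
    },
  };
}

// Logic layer: the actual rules, written against the store's interface.
function createLogic(store) {
  return {
    caloriesOf(name) {
      const food = store.findByName(name);
      return food === null ? null : food.caloriesPer100g;
    },
  };
}

// Web layer: status codes and serialization, nothing else.
function handleGetCalories(logic, name) {
  const calories = logic.caloriesOf(name);
  if (calories === null) {
    return { status: 404, body: JSON.stringify({ error: "unknown food" }) };
  }
  return { status: 200, body: JSON.stringify({ name, calories }) };
}

const store = createMemoryStore([{ name: "banana", caloriesPer100g: 89 }]);
const logic = createLogic(store);
console.log(handleGetCalories(logic, "banana"));
// { status: 200, body: '{"name":"banana","calories":89}' }
```

Replacing `createMemoryStore` with a database-backed store changes nothing for `createLogic` or `handleGetCalories`, which is what makes the storage choice an implementation detail.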

[Diagram: Demeter's three layers (web, logic, storage), with a line between the web and logic layers marking where implementation details begin.]

You can take this to extremes and argue that running over HTTP is also an implementation detail: it could be gRPC, or it could be raw TCP. For the concept to be useful we need to draw a line somewhere, so I'm going to assume that the public API is not an implementation detail.

So what do you unit test? The HTTP API, right?

No! Implementation details are contextual. To your users the public API may not be an implementation detail, but to our tests it is. Let's say one of our tests wants to verify that we can ask for the calories of a banana. In that context, anything other than asking for the calories of a banana is an implementation detail¹.

Abstracting details

We are used to making abstractions in the code we write to solve problems, like the three layers of Demeter. We should also be making abstractions for our tests.

Most frameworks for making HTTP APIs let you create a test server to send requests to, without starting a real server bound to a port. Examples I'm familiar with would be:

This lets us abstract away all the details except our public API. This is a good start, but we can go a lot further.
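Frameworks differ in how they expose this, so rather than guess at any particular one, here is a framework-free sketch of the idea. `createTestApp`, `makeResponse`, and the `/foods/banana/calories` route are hypothetical; what matters is that tests call `fetch(path, init)` and get back a status and a JSON body, with no socket involved.

```javascript
// Hypothetical in-process test server: each route is an async handler,
// and fetch() calls it directly, so no port or socket is involved.
function createTestApp(routes) {
  return {
    async fetch(path, { method = "GET", body } = {}) {
      const handler = routes[`${method} ${path}`];
      if (!handler) {
        return makeResponse(404, { error: `no route for ${method} ${path}` });
      }
      const { status, json } = await handler(body);
      return makeResponse(status, json);
    },
  };
}

// Minimal stand-in for a real Response. The body goes through JSON text,
// as it would over the wire, so tests see plain data rather than shared objects.
function makeResponse(status, json) {
  const text = JSON.stringify(json);
  return {
    status,
    async json() {
      return JSON.parse(text);
    },
  };
}

// One hypothetical route, matched by exact method and path.
const test_app = createTestApp({
  "GET /foods/banana/calories": async () => ({
    status: 200,
    json: { name: "banana", calories: 89 },
  }),
});

(async () => {
  const response = await test_app.fetch("/foods/banana/calories");
  console.log(response.status, await response.json());
  // 200 { name: 'banana', calories: 89 }
})();
```

Real test servers also run middleware, routing with parameters, and serialization exactly as production does, which is why they are worth using over a hand-rolled fake like this one.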

Abstract away how

The test server works great, so we go ahead and write hundreds of tests for Demeter's API. We have URLs, query parameters, and JSON strewn across all of these tests, because that's the public interface.

One of these tests might look like this:

describe("calorie API", () => {
  it("should add a kiwi", async () => {
    const response = await test_app.fetch("/foods", {
      method: "POST",
      body: { name: "kiwi" /* other fields */ },
    });
    expect(response.status).toBe(204);

    // get the food back
    const getResponse = await test_app.fetch("/foods/kiwi");
    expect(await getResponse.json()).toEqual({
      /* stuff */
    });
  });
});

A requirement changes, or a design problem is exposed. It turns out it takes a long time to add a new food to Demeter because a lot of work needs to be done updating everything connected to that food. It takes so long that users' requests are timing out.

We decide to move to a task system that lets users submit tasks, and then poll for the results later. All our tests are now failing! We have to manually go and change several hundred tests that didn't care how we got the result in the first place!

To prevent this kind of problem, you can abstract away the details of how you make requests. The solution here is to have a function or object that makes the kind of request you have a hundred tests for. Our test might now look like this:

describe("calorie API", () => {
    it("should add a kiwi", async () => {
        const response = await add_food(test_app, {
            info: { name: "kiwi" },
            /* other fields */
        });
        expect(response.status).toBe(204);

        // get the food back
        const food = await get_food(test_app, "kiwi");
        expect(food).toEqual({ /* stuff */ });
    });
});

// Submits the food, then polls the resulting task until it finishes.
async function add_food(test_app, food) {
    const response = await test_app.fetch("/foods", {
        method: "POST",
        body: food,
    });

    const { task_id } = await response.json();

    for (let attempts = 0; attempts < 10; attempts++) {
        const poll = await test_app.fetch(`/foods/${task_id}`);
        // check poll, return it if the task succeeded,
        // otherwise maybe sleep before trying again.
    }

    throw new Error(`task ${task_id} did not finish in time`);
}

async function get_food(test_app, name) { /* details */ }

Before the change, the test used the testing library's raw request functionality directly (i.e. test_app.fetch). After the change, the add_food function makes the request and handles the polling for us, and all of our tests pass again. Had this function been in place from the start, the whole migration would have meant changing add_food alone.
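The polling loop in add_food above leaves checking the result as a comment. Here is one way to fill it in, as a self-contained sketch. The fake app, the convention of answering `202` while a task is pending and `204` once it is done, the `task_id` field, the `/foods/${task_id}` polling path, and the retry budget (10 attempts, 10 ms apart) are all assumptions for illustration, not part of Demeter.

```javascript
// Hypothetical fake of the task-based API. POST /foods starts a task and
// returns its id; GET /foods/<task_id> answers 202 while the task is
// pending and 204 once it has been polled `pollsUntilDone` times.
function createFakeTaskApp(pollsUntilDone = 3) {
  const polls = new Map(); // task_id -> number of polls so far
  let nextId = 1;
  return {
    async fetch(path, { method = "GET" } = {}) {
      if (method === "POST" && path === "/foods") {
        const task_id = `task-${nextId++}`;
        polls.set(task_id, 0);
        return { status: 202, json: async () => ({ task_id }) };
      }
      const pollId = path.replace("/foods/", "");
      if (method === "GET" && polls.has(pollId)) {
        const seen = polls.get(pollId) + 1;
        polls.set(pollId, seen);
        return { status: seen >= pollsUntilDone ? 204 : 202, json: async () => null };
      }
      return { status: 404, json: async () => null };
    },
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// The helper tests call: submit the food, then poll until the task is
// no longer pending. Only this function knows that tasks exist.
async function add_food(test_app, food, { attempts = 10, delayMs = 10 } = {}) {
  const response = await test_app.fetch("/foods", { method: "POST", body: food });
  const { task_id } = await response.json();

  for (let attempt = 0; attempt < attempts; attempt++) {
    const poll = await test_app.fetch(`/foods/${task_id}`);
    // Anything other than "still pending" is handed back for the test to check.
    if (poll.status !== 202) return poll;
    await sleep(delayMs);
  }
  throw new Error(`task ${task_id} still pending after ${attempts} polls`);
}

(async () => {
  const response = await add_food(createFakeTaskApp(), { info: { name: "kiwi" } });
  console.log(response.status); // 204
})();
```

Returning the final response, rather than asserting inside the helper, keeps the assertion in the test itself.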

This is exactly what you aim for if you write tests using Cucumber or Robot Framework². You're trying to describe the test at a higher level, closer to how a person would describe it.
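You don't need a dedicated framework to get that higher-level description. As a sketch of the "object" flavour of the helper, here is a hypothetical `DemeterDriver` class whose methods read like steps a person would describe. The class, its method names, the `calories` payload field, and the in-memory fake app are all assumptions for illustration; a real driver would wrap whatever `test_app` your framework provides.

```javascript
// Hypothetical test driver: tests talk to Demeter through these methods,
// so URLs and payload shapes live in exactly one place.
class DemeterDriver {
  constructor(test_app) {
    this.test_app = test_app;
  }

  // Arrange-level step: fails loudly if setup did not work.
  async addFood(name, calories) {
    const response = await this.test_app.fetch("/foods", {
      method: "POST",
      body: { info: { name }, calories }, // assumed payload shape
    });
    if (response.status !== 204) throw new Error(`adding ${name} failed: ${response.status}`);
  }

  // Returns the value; asserting on it is left to the test.
  async caloriesOf(name) {
    const response = await this.test_app.fetch(`/foods/${name}`);
    if (response.status !== 200) throw new Error(`fetching ${name} failed: ${response.status}`);
    const food = await response.json();
    return food.calories;
  }
}

// A minimal in-memory stand-in for the app, just enough to run the driver.
function createFakeApp() {
  const foods = new Map();
  return {
    async fetch(path, { method = "GET", body } = {}) {
      if (method === "POST" && path === "/foods") {
        foods.set(body.info.name, body);
        return { status: 204, json: async () => null };
      }
      const food = foods.get(path.replace("/foods/", ""));
      return food
        ? { status: 200, json: async () => food }
        : { status: 404, json: async () => null };
    },
  };
}

// Reads like the test description: add a banana, ask for its calories.
(async () => {
  const demeter = new DemeterDriver(createFakeApp());
  await demeter.addFood("banana", 89);
  console.log(await demeter.caloriesOf("banana")); // 89
})();
```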

It's impossible to abstract things away perfectly, or to be prepared for every possible change. But a small amount of abstraction goes a long way toward making tests resilient to changes in implementation details, and even to changes in the public interface. That makes for a more pleasant developer experience³.

Abstract away what

It is easy to end up with tests full of data: data showing how you expect a certain request's JSON to look, and data you need as input. This data is almost always an implementation detail of what you are testing.

One way to avoid this is to use builders⁴. These are classes or functions that build up this data for you, using a higher-level language than the raw data. For example, let's say our endpoint for adding a food to Demeter required a payload like this:

const kiwi = {
  guid: "e86ca0c5-5e0a-4ac3-9d6d-2e6329f86a59",
  info: { name: "kiwi" },
};
const result = await add_food(test_app, kiwi);
expect(result.status).toBe(204);

The guid field is not important to our test. Neither is the info structure. We can make a builder to create this data:

const kiwi = new FoodBuilder().name("kiwi").build();
const result = await add_food(test_app, kiwi);
expect(result.status).toBe(204);

Now our test no longer includes details about the guid or the info structure; the builder handles them. This is far more flexible. We could:

change field names,
add mandatory fields,
remove fields,
change the overall structure,
even change from JSON to another format.

We would only need to change our builder (and maybe add_food) to cope, not the many tests that might be using it.

As a bonus, the builder conveys the relevant details, and hides the irrelevant.
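The builder itself isn't shown above, so here is a minimal sketch of what `FoodBuilder` could look like, assuming the payload shape from the example (a `guid` plus an `info` object). The `calories` field, its default, and the deterministic `nextGuid` helper are hypothetical; a real suite might generate random UUIDs instead.

```javascript
// Hypothetical builder for Demeter's "add food" payload. Tests state what
// matters to them (here, the name); the builder fills in everything else.
let guidCounter = 0;

// Deterministic, UUID-shaped ids keep test data reproducible.
function nextGuid() {
  guidCounter += 1;
  return `00000000-0000-4000-8000-${String(guidCounter).padStart(12, "0")}`;
}

class FoodBuilder {
  constructor() {
    // Assumed defaults, chosen only so that a bare build() is valid.
    this.fields = { name: "placeholder food", calories: 100 };
  }

  name(name) {
    this.fields.name = name;
    return this; // returning `this` allows chaining
  }

  calories(calories) {
    this.fields.calories = calories;
    return this;
  }

  // The only place that knows the payload's actual shape.
  build() {
    return {
      guid: nextGuid(),
      info: { name: this.fields.name },
      calories: this.fields.calories, // hypothetical extra field
    };
  }
}

const kiwi = new FoodBuilder().name("kiwi").build();
console.log(kiwi.info.name, kiwi.guid);
// kiwi 00000000-0000-4000-8000-000000000001
```

If the payload later renames `info` or gains a mandatory field, only `build()` changes; every `new FoodBuilder().name("kiwi").build()` in the tests stays as it is.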

Stuff from elsewhere

There is a lot more to say about testing, but I want to keep this reasonably short. I have found the techniques above incredibly useful across all manner of projects, and I hope you get some value out of them too.

Some related stuff you might find interesting:

How to test by matklad. This article hits so many of the points I wanted to get across here, and more.
How to test (above) links to Testing at the boundaries, which talks about these ideas with respect to a compiler.
UI testing has PageObjects, which are similar to these concepts.

This was posted to Reddit, and I've made some edits since then to improve clarity.

¹ As with anything, you can take this too far. If a component of your system is very complicated, it can be far more practical to test that unit in isolation. I would suggest applying these same principles to that unit. For example, if you wrote a parser for a data format, you could test it in isolation, but avoid testing the implementation details of the parser itself!
² I'm personally not a fan of the 'human language' testing frameworks. I believe most of the value comes from abstracting away the details, which is more than attainable in your preferred language. The promise that 'even the customer can write them!' rarely materializes.
³ Care needs to be taken to abstract the right things. If you abstract away any of the "Arrange, Act, Assert" parts, unit tests can become hard to read. You should be able to tell what's being tested just by looking at the test.
⁴ This is close enough to the Gang of Four (GoF) builder pattern that I think the name is appropriate. But a key part of the GoF pattern is that the same construction process can create different representations, which is not done here.