Owen Gage

Unit test your public API

The correct level to unit test your system is the public API. Not the function level. Not the class level. Unit testing at this level of abstraction amplifies its main benefit: the ability to change and refactor with confidence.

As a more concrete example, if you have a service that is exposed as an HTTP API, your unit tests should be making HTTP requests. The framework you're using likely has some mechanism to do this without an HTTP server being bound to a port. If your service is based on events, your unit tests should be creating and receiving events.
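
In Rust with Axum, for example, tower's ServiceExt::oneshot drives a request through a Router with no server bound. A minimal sketch (the /health route here is made up for illustration):

use axum::{body::Body, http::{Request, StatusCode}, routing::get, Router};
use tower::ServiceExt; // for `oneshot`

#[tokio::test]
async fn health_returns_ok() {
    // Build the app exactly as production would, minus the TCP listener.
    let app = Router::new().route("/health", get(|| async { "ok" }));

    // Drive the request through the router directly; no port is bound.
    let resp = app
        .oneshot(Request::builder().uri("/health").body(Body::empty()).unwrap())
        .await
        .unwrap();

    assert_eq!(StatusCode::OK, resp.status());
}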

How do we do this in practice, and how do we maximize the benefit of being able to refactor our code?

Vicious mockery

You need to be able to run your entire service within a unit test. For real systems this can be difficult. You may have one or more databases you connect to, other APIs that you depend on, or have timeouts that would make tests take too long.

The solution to this is mocking. I'm going to use this term broadly to mean creating a mimic; I don't find distinguishing between spies, mocks, fakes and so on useful.

The only strong use case for mocks is testing failures. For example, a mock makes it possible to know how your service behaves if your database fails, or your network connection drops. There isn't always a good way to test your application with real dependencies, so sometimes we have to mock.

In order of desperation, I follow these techniques:

  1. Don't mock, use the real version.
  2. Use a substitute, for example, an in-memory database.
  3. Use a mock of something you control.

Don't mock

If you can use the real version for testing, you should. If you depend on a JSON parser, just use the JSON parser in the tests. You don't need to mock something that has no side effects. You may need some tests that do mock dependencies like this to trigger unlikely failure conditions, but those should be few.

Things like SQLite's in-memory database can also be used in unit testing. This means your unit tests actually exercise your SQL statements, mistakes in which are often only caught at the system-test level.

The concept of 'mocking everything but the code under test' is damaging to our goal of being able to refactor. Any refactoring will involve updating mocks, increasing the amount of work and mental effort.

Excessive mocking reduces the efficacy of tests. Mocks encode behaviour, and typically no obvious error occurs if a mock's behaviour doesn't match the real implementation. The long-term effect of this kind of mocking is that tests will pass even if the system doesn't actually function, which undermines everyone's trust in the tests. All mocking has this effect and we avoid it by mocking as little as possible.

Use a substitute

It would be easy to argue that a substitute is a mock, given my broad use of the word. I agree, but the distinguishing factor for me is whether you have to implement the behaviour yourself. With a substitute, you can mostly just drop a different implementation in place without much effort.

Like the previously mentioned SQLite in-memory database, we can swap a real database persisted to disk for an in-memory one. The line between using the real thing and a substitute can be blurry, but in any case I haven't had to implement a SQL engine.

Another example is using a hashmap rather than a key-value store like Redis, though if you depend on advanced features you may end up implementing some behaviour yourself.
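
A minimal sketch of the idea, assuming the service already accesses the store through a small interface of its own (the KeyValue trait and both types are illustrative, not a real Redis client API):

use std::collections::HashMap;

// Hypothetical interface the service uses for its key-value store.
// Production would implement this over a real Redis client.
trait KeyValue {
    fn get(&self, key: &str) -> Option<String>;
    fn set(&mut self, key: &str, value: String);
}

// The substitute: a plain hashmap standing in for Redis during tests.
struct InMemoryKeyValue(HashMap<String, String>);

impl KeyValue for InMemoryKeyValue {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
    fn set(&mut self, key: &str, value: String) {
        self.0.insert(key.to_owned(), value);
    }
}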

Aim for substitutes that are fast, easy to set up, and mimic production dependencies closely.

Use a mock of something you control

Okay, so there is no substitute. You need to get your hands dirty and implement something yourself. For most tests, you want something that acts like the real thing for all intents and purposes, and you can skimp on everything else.

Much as unit testing at the right level is important, so is mocking at the right level. It can be tempting to implement something exactly like the thing you are mocking, but you shouldn't. Mock at your abstraction level, not your dependency's.

The most obvious example of this is if you have to mock a SQL database. You're not going to write a SQL parser and implement an entire mock SQL engine just to mock out your database. Instead, you create some form of higher-level interface (such as CRUD operations on users) for the operations you perform on your database, and mock that.
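
For instance, the service might depend on a narrow, self-owned interface like the following (a hypothetical sketch; all names are illustrative):

// The high-level interface the service owns. Handlers depend on this,
// not on SQL or a particular database crate.
#[derive(Clone)]
struct User {
    username: String,
    password_hash: String,
}

trait UserStore {
    fn create_user(&mut self, user: User);
    fn find_user(&self, username: &str) -> Option<User>;
}

// A mock with just enough behaviour to act like the real thing.
struct MockUserStore {
    users: Vec<User>,
}

impl UserStore for MockUserStore {
    fn create_user(&mut self, user: User) {
        self.users.push(user);
    }
    fn find_user(&self, username: &str) -> Option<User> {
        self.users.iter().find(|u| u.username == username).cloned()
    }
}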

Not only does this make dependencies easier to mock, it generally makes them easier to replace. The less unique behaviour you encode into your mocks, the more likely it is that another implementation can be slotted in.

'Something you control' means you should not mock things you don't have control of. This ties nicely into high-level mocks. Languages often have some way to mock libraries directly: Node's proxyquire allows overriding imports, for example, while Python's patch does something similar. If you've used these you'll know they can be finicky, with subtle behavioural differences between languages.

These methods allow you to mock things you don't control. You don't own the public APIs of your dependencies; if they change, every test that uses these mocks will likely need updating. If you instead write high-level mocks only of APIs you've defined yourself, then only those mocks need updating. See my previous article, Good tests don't change.

Instead of using these different monkey-patching methods, we can use simple dependency injection the same way in practically every language.

Dependency injection

The key to making it easy to unit test your public API is good dependency injection. I like to keep my usage simple, with no fancy frameworks. Just object constructors and interfaces organised in whatever form of 'main' your language has.

This avoids the need for the language-specific monkey-patching and, I would argue, leads to clearer code and less knowledge burden.

Here is an example of main from a simplified login service in Rust and Axum:

#[tokio::main]
async fn main() {
    // Configuration comes from the process environment; must_env fails
    // fast if a variable is missing.
    let cookie_domain = must_env("COOKIE_DOMAIN");
    let external_bind = must_env("EXTERNAL_BIND");
    let conn_str = must_env("CONNECTION_STRING");
    let origins = must_env("ALLOW_ORIGINS");

    // The real dependencies are constructed here, and only here.
    let notifier = Arc::new(Mutex::new(notifier_from_env().await));
    let pool = sqlx::SqlitePool::connect(&conn_str).await.unwrap();

    let state = AppStateInner {
        cookie_domain,
        pool,
        notifier,
    };

    let log_layer = common_rs::default_log_layer();
    let external_router =
        ext::make_external_router(state.clone(), &origins).layer(log_layer);

    info!("Binding external API to {external_bind}");
    let external = TcpListener::bind(external_bind).await.unwrap();

    // Shut down when the process receives SIGTERM or SIGINT.
    let mut sigterm = signal(SignalKind::terminate()).unwrap();
    let mut sigint = signal(SignalKind::interrupt()).unwrap();

    let ext_serve = axum::serve(external, external_router.into_make_service());

    tokio::select!(
        _ = ext_serve.into_future() => {},
        _ = sigterm.recv() => {},
        _ = sigint.recv() => {},
    );
}

This shows main taking responsibility for a few things, such as:

  - reading configuration from the environment,
  - constructing the real dependencies (the notifier and the database pool),
  - assembling the application state and the router,
  - binding to the network and serving,
  - handling shutdown signals.

The ext::make_external_router function contains pretty much all the logic of the service, and can be thoroughly unit tested. We can create a harness for the vast majority of our tests like so:

async fn harness() -> Harness {
    // A mock of our own Notifier abstraction, wrapped the same way main
    // wraps the real one.
    let notifier = Arc::new(Mutex::new(MockNotifier::new()));

    // Substitute: an in-memory SQLite database with the real migrations
    // applied, so our actual SQL statements are exercised.
    let pool = SqlitePool::connect(":memory:").await.unwrap();
    sqlx::migrate!("./migrations").run(&pool).await.unwrap();

    let state = AppStateInner {
        cookie_domain: "cookie_domain".to_owned(),
        pool,
        notifier,
    };

    let ext = make_external_router(state.clone(), &["http://localhost:80"]);

    Harness { ext }
}

We're using an in-memory SQLite database and a mock version of the Notifier, and supplying suitable values for any configuration rather than pulling it from the process environment.
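
The MockNotifier itself isn't shown above. Mocking at our own abstraction level, it might look something like this (a hypothetical sketch: the shape of the Notifier trait is assumed, and the real one is likely async):

// Assumed shape of the service's own Notifier trait.
trait Notifier {
    fn notify(&mut self, recipient: &str, message: &str);
}

// The mock records what would have been sent so tests can assert on it.
struct MockNotifier {
    sent: Vec<(String, String)>, // (recipient, message)
}

impl MockNotifier {
    fn new() -> Self {
        Self { sent: Vec::new() }
    }
}

impl Notifier for MockNotifier {
    fn notify(&mut self, recipient: &str, message: &str) {
        // The real implementation would send an email or a push message.
        self.sent.push((recipient.to_owned(), message.to_owned()));
    }
}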

We can now write unit tests like the following:

#[tokio::test]
async fn unauth_random_cookie() {
    let h = harness().await;
    let resp = get_req("/api/profile")
        .header(COOKIE, "__Secure-session=123")
        .send(&h.ext)
        .await;

    assert_eq!(StatusCode::UNAUTHORIZED, resp.status());
}

This test has no implementation details present, meaning it will only change if the external behaviour changes.
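
The get_req and send helpers aren't shown; presumably they are thin wrappers that build a request and push it through the router in-process. A hypothetical sketch of the idea, again using tower's oneshot:

use axum::{body::Body, http::Request, response::Response, Router};
use tower::ServiceExt;

// Hypothetical equivalent of `get_req(uri).send(&h.ext)`: build a GET
// request and drive it straight through the router, no server involved.
async fn get_request(router: &Router, uri: &str) -> Response {
    let req = Request::builder().uri(uri).body(Body::empty()).unwrap();
    router.clone().oneshot(req).await.unwrap()
}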

We can expand this to more complicated unit tests by making sure we factor out any high-level details, such as creating a test user with make_test_user here:

#[tokio::test]
async fn happy_login_flow_with_redirect() {
    let h = harness().await;
    make_test_user(&h.ext, "test", "pass").await;

    let resp = post_req("/api/login")
        .header(COOKIE, "__Host-authenticity=12345")
        .form(&[
            ("username", "test"),
            ("password", "pass"),
            ("authenticity_key", "12345"),
            ("redirect", "http://somewhere/abc"),
        ])
        .send(&h.ext)
        .await;

    assert_eq!(StatusCode::SEE_OTHER, resp.status());
    assert_eq!(
        "http://somewhere/abc",
        resp.headers().get(LOCATION).unwrap()
    );
}
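
make_test_user isn't shown either. In the same spirit, it would go through the public API, so it only breaks when external behaviour changes. A hypothetical sketch reusing the same request helpers (the /api/register endpoint and its form fields are invented for illustration):

// Hypothetical: register a user through the public API rather than by
// inserting rows directly, so the helper survives internal refactors.
async fn make_test_user(router: &Router, username: &str, password: &str) {
    let resp = post_req("/api/register")
        .form(&[("username", username), ("password", password)])
        .send(router)
        .await;
    assert!(resp.status().is_success() || resp.status().is_redirection());
}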

These unit tests would survive changing the database, the web framework, the cryptographic techniques and more. They are far less likely to change when we refactor our code, leaving us free to reorganise internals without the burden of updating tests and mocks.

Key points

To quickly summarise the article:

  - Unit test at the level of your public API, not individual functions or classes.
  - Prefer the real dependency; failing that, a substitute; failing that, a mock of something you control.
  - Mock at your own abstraction level, never your dependency's.
  - Use plain dependency injection from main rather than monkey-patching.
  - Keep implementation details out of tests so they survive refactoring.
