continuous integration – Service-specific releases vs Releasing all services on every change

The following applies to a platform/company at an early stage in its life, moving quickly and cheaply with a small engineering team.

Imagine a platform with the following components:

  • Cloud infrastructure (e.g. Terraform)
  • Client-facing app
  • Internal-facing admin app
  • Third-party API
  • Shared relational DB, with frequent schema changes

For years, if I were building this from scratch, I’d have isolated these components (let’s just call them all services for simplicity) as much as possible from the start: each in its own repo, with its own build pipeline and its own releases, since they serve different use cases despite their interdependencies (the DB and infra in particular). If a change only touches the internal admin tool, there is no reason to trigger a release of the client-facing app at the same time. This clearly reduces some risk, with the more minor benefit of quicker, leaner releases.

However, again and again in my teams I’ve seen how much complexity, overhead and, frankly, risk get added by coordinating individual service releases. A new product feature that requires a schema change, plus one or more changes in other services, takes a lot of brainpower to release correctly. It starts with the PRs: reviewers can only see part of the overall change, which adds risk because they never see the big picture. Then there is releasing these independent updates to an environment, making sure the builds run in the right order and that the correct branches are all being used. This gets even crazier when you’ve got automation or end-to-end testing that depends on seeded data, or more granular, versioned shared packages.
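
To make the ordering problem concrete, here is a minimal sketch of what every coordinated release has to work out, by hand or by script. The service names and the dependency map are assumptions made up for illustration, not a description of any real platform.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be released
# before it. The names and edges are illustrative assumptions only.
DEPENDS_ON = {
    "infra": set(),
    "db_schema": {"infra"},
    "third_party_api": {"db_schema"},
    "client_app": {"db_schema"},
    "admin_app": {"db_schema"},
}


def release_order(changed):
    """Return the changed services in an order that respects DEPENDS_ON."""
    # static_order() yields every node after all of its dependencies.
    full_order = TopologicalSorter(DEPENDS_ON).static_order()
    return [name for name in full_order if name in changed]
```

For example, `release_order({"admin_app", "db_schema"})` returns `["db_schema", "admin_app"]`. With separate repos and pipelines, nothing enforces that order unless someone remembers it.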

Is all this headache really worth it at an early stage? Is it so wrong to throw everything in a monorepo, so that any little change anywhere rebuilds the entire platform, right down to the infra (granted, Terraform at least would do a diff and only apply what changed)? If the build pipeline is robust and well-integrated with one of the main cloud providers, re-releasing a service that didn’t actually change shouldn’t cause any issues. And you get the benefit of seeing an entire cross-platform change at once, without juggling multiple dependent releases: a single build that is hardwired to release in the correct order, with holistic cross-platform smoke testing. Yes, builds will be slower, possibly with a lot of unnecessary steps, but let’s assume there’s also a pretty good local environment setup, so that deploys to a cloud test environment aren’t too frequent.
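
As a rough sketch of what I mean by a single hardwired build: one fixed list of steps, always run in the same order, stopping at the first failure, with smoke tests last. The step names and the placeholder actions are assumptions for illustration; in a real pipeline each action would call the actual tooling.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run every step in the given order and stop at the first failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(
                f"release step failed: {name} (completed: {completed})"
            )
        completed.append(name)
    return completed


# Hypothetical fixed order for the whole platform. Each lambda is a
# placeholder standing in for real work (apply infra, migrate, deploy, test).
PLATFORM_STEPS: List[Step] = [
    ("infra", lambda: True),
    ("db_migrations", lambda: True),
    ("third_party_api", lambda: True),
    ("client_app", lambda: True),
    ("admin_app", lambda: True),
    ("smoke_tests", lambda: True),
]
```

Because the order lives in one place, a cross-platform change cannot be released out of sequence. The cost is that every run walks the full list, even when only one service changed.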

Thoughts? As a reminder, please view your opinions through the lens of a small, scrappy team. If you have a big org with delivery managers and all kinds of orchestration tooling, then maybe the problem isn’t as severe.