microservices – Should you expose API endpoints on an application that is under heavy load or delegate it to another application?

It's a question of ease of programming and deployment.

Suppose you have services A and B. When you run your application/solution as a whole, A is under heavy load and B is under light load.

To prevent A from becoming a bottleneck, you want to add more CPU resources to A, but not to B, where they would just go unused.

If you are using powerful multicore boxes, you might as well put both A and B on all the boxes, and if you are good at multiprocessor programming, even have them in the same application. The OS will divide up the available processing power according to the needs of each application. Some B's will never get any load, but it won't matter, as the overhead of running an idle B is insignificant.

Your deployment is pretty simple, because you just put all your microservices on every box and scale the number of boxes.
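As a rough illustration of why the idle B does not matter on big boxes, here is a back-of-the-envelope sketch. The function name, the load figures, and the per-service idle overhead are all made-up assumptions, and the model ignores everything except CPU cores.

```python
import math

def colocated_boxes(load_a, load_b, box_cores, idle_overhead):
    """Boxes needed when every box runs both A and B.

    All quantities are in CPU cores. idle_overhead is a hypothetical fixed
    cost that each service pays on each box just by running.
    """
    usable = box_cores - 2 * idle_overhead  # both services run on every box
    if usable <= 0:
        raise ValueError("box too small to run both services")
    return math.ceil((load_a + load_b) / usable)

# Hypothetical numbers: A needs 40 cores of work, B needs 2, boxes have 16 cores.
with_overhead = colocated_boxes(40, 2, box_cores=16, idle_overhead=0.1)
without_overhead = colocated_boxes(40, 2, box_cores=16, idle_overhead=0.0)
print(with_overhead, without_overhead)  # prints "3 3"
```

With these numbers the overhead of running both services everywhere does not change the box count at all.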

If, however, you are using tiny containers that can barely handle a single A, you might want to consider the overhead of running B when it's not going to get any traffic.

Or maybe each instance has only a single processor, which has to switch between working on A and B, causing delays.

In this case you might find it better to separate A and B into different programs. Then you can have boxes dedicated to A or B and scale them independently.

Your deployment is more complex, but your code is simplified and it's arguably a more efficient use of resources.
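To put hypothetical numbers on the small-container case, the sketch below compares one shared pool (every container runs A and B) with two dedicated pools. The loads, container size, and idle overhead are invented for illustration, and the model only counts CPU cores; it does not capture context-switching delays or memory.

```python
import math

def pool_size(load, container_cores, idle_overhead, services_per_container=1):
    """Containers needed to serve `load` (in CPU cores).

    Each service hosted in a container costs a hypothetical fixed
    idle_overhead, whether or not it receives traffic.
    """
    usable = container_cores - services_per_container * idle_overhead
    if usable <= 0:
        raise ValueError("container too small for the services it hosts")
    return math.ceil(load / usable)

# Assumed figures: A needs 40 cores of work, B needs 2, containers have 1 core.
LOAD_A, LOAD_B = 40.0, 2.0
CONTAINER_CORES = 1.0
IDLE_OVERHEAD = 0.1

# Option 1: every container runs both A and B.
shared = pool_size(LOAD_A + LOAD_B, CONTAINER_CORES, IDLE_OVERHEAD,
                   services_per_container=2)

# Option 2: containers are dedicated to A or to B and scaled independently.
dedicated_a = pool_size(LOAD_A, CONTAINER_CORES, IDLE_OVERHEAD)
dedicated_b = pool_size(LOAD_B, CONTAINER_CORES, IDLE_OVERHEAD)

print(shared, dedicated_a + dedicated_b)  # prints "53 48"
```

Under these assumptions the dedicated pools need fewer containers in total, and you can add capacity to A without paying for more idle copies of B.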