architecture – Best practices for calling from one microservice to another in a loop

Let’s assume we have a use case where ServiceA needs to make several calls to ServiceB. I know it would be best to consolidate the calls into a single request, but let’s say that’s just not possible for this use case.

My question is: who should be responsible for not overloading ServiceB? Should ServiceA trust that ServiceB has appropriate rate limiting in place and can deal with a surge of requests? Or should ServiceA limit how many requests it sends to ServiceB at one time? For example, it could make 3 requests to ServiceB and only make the next 3 once those have resolved.
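To make the second option concrete, here is a minimal sketch of the kind of limiting I mean on ServiceA's side. It assumes Python with `asyncio`; `call_service_b`, `call_in_batches`, and `BATCH_SIZE` are hypothetical names I made up for illustration, and the `sleep` stands in for the real network call.

```python
import asyncio

BATCH_SIZE = 3  # hypothetical: ServiceA's guess at what ServiceB can handle


async def call_service_b(item: int) -> int:
    """Hypothetical stand-in for a single request to ServiceB."""
    await asyncio.sleep(0.01)  # simulate network latency
    return item * 2


async def call_in_batches(items: list[int], batch_size: int = BATCH_SIZE) -> list[int]:
    """Send requests in groups of batch_size, waiting for each group to finish."""
    results: list[int] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Only move on to the next batch once every request in this one has resolved.
        batch_results = await asyncio.gather(*(call_service_b(i) for i in batch))
        results.extend(batch_results)
    return results


if __name__ == "__main__":
    print(asyncio.run(call_in_batches(list(range(7)))))
```

The batch size of 3 here is exactly the kind of number ServiceA would have to pick without knowing ServiceB's real capacity.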

It seems to me that ServiceA would only be guessing at the capacity of ServiceB, and that any limiting by A would not even be possible for asynchronous requests to B.