I am designing a workflow and am trying to avoid parallel deployments of the same service, so I am looking to have one service that handles both interactive and batch traffic. My main concern is how to ensure that the service can horizontally scale fast enough for large batch runs without interfering with the interactive traffic. Are there any design patterns for this? We are primarily using AWS technologies, Kubernetes, and JVM-deployed languages. It is also worth knowing that we will have two endpoints, with traffic going through /service/interactive and /service/batch.

We could use a few different mechanisms to throttle, but I think it's a bad experience for our batch users to have to retry if we throw a 429. We could also use something like a reply-to queue or a two-way queue for the batch traffic, but how would we scale up the service to handle more traffic if we have defined a fixed dequeue rate? Can we set the dequeue rate at the queue level instead of at each instance of the service, and can that number change dynamically?
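To make the dequeue-rate question concrete, here is a minimal sketch of the reply-to-queue idea, using an in-memory `BlockingQueue` as a stand-in for something like SQS; all class, queue, and message names are hypothetical. The point it illustrates: if each consumer simply polls as fast as it can process, there is no fixed per-instance dequeue rate to configure. The aggregate dequeue rate is just (number of consumers) × (per-consumer throughput), so it changes automatically as instances are added or removed.

```java
import java.util.concurrent.*;

// Hypothetical sketch: each worker polls the request queue at its own pace
// and publishes results to a reply-to queue. Aggregate throughput scales
// with the number of workers/instances; no queue-level rate is set anywhere.
public class BatchConsumerSketch {
    private final BlockingQueue<String> requestQueue;
    private final BlockingQueue<String> replyQueue; // reply-to queue for batch results
    private final ExecutorService workers;

    public BatchConsumerSketch(BlockingQueue<String> requestQueue,
                               BlockingQueue<String> replyQueue,
                               int workerCount) {
        this.requestQueue = requestQueue;
        this.replyQueue = replyQueue;
        this.workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(this::pollLoop);
        }
    }

    private void pollLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Poll with a timeout so the loop can notice interruption/shutdown.
                String msg = requestQueue.poll(100, TimeUnit.MILLISECONDS);
                if (msg == null) continue;
                // Placeholder for real work, then publish to the reply-to queue.
                replyQueue.put("processed:" + msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public void shutdown() {
        workers.shutdownNow();
    }
}
```

With SQS specifically, the same shape applies: `ReceiveMessage` is pull-based, so the queue itself imposes no dequeue rate and the effective rate floats with the consumer count.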
Really, I am just looking for any patterns to handle both batch and interactive traffic in one service. Even if we have to have a parallel implementation for interactive and batch traffic, how do we scale batch, since it all arrives at once? The batches could come at different times throughout the day, so time-based scaling is not an option; besides, I have never been a fan of time-based scaling, as it is brittle.
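Since time-based scaling is off the table, one alternative worth knowing about is scaling the batch consumers on queue depth rather than on a schedule, e.g. with KEDA on Kubernetes. A rough sketch of what that looks like for an SQS-backed queue is below; the Deployment name, queue URL, and thresholds are all hypothetical placeholders, not values from my setup.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: batch-consumer-scaler
spec:
  scaleTargetRef:
    name: batch-consumer            # hypothetical Deployment running the batch consumers
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        # hypothetical queue URL
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/batch-requests
        queueLength: "100"          # target backlog per replica
        awsRegion: us-east-1
```

Because replicas are added when the backlog grows and removed when it drains, this reacts to batches whenever they arrive, which seems to fit the "batches at unpredictable times" constraint better than a schedule would.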
Thanks in advance!