aws – Kubernetes auto scaling

We are currently in the process of doing an infrastructure overhaul.

Some background on our business model:

We are an aggregator of bill and payment services for businesses: utilities, mobile, and much more. We make it easier for companies to connect to these services by connecting only to our APIs. Each of these bill and payment services has its own custom script, because each one exposes a different API. Our middleware sits in between: it processes the incoming request, triggers the correct script, waits for the response, and then formats and returns that response.
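To make the flow concrete, here is a minimal sketch of what the middleware does. Our real scripts are PHP; this is Python only for illustration, and every name in it (`SCRIPTS`, `handle_request`, the two example services and their payload fields) is hypothetical.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the per-service scripts (the real ones are PHP).
def pay_utility_bill(payload: dict) -> dict:
    return {"status": "paid", "account": payload["account"]}

def top_up_mobile(payload: dict) -> dict:
    return {"status": "topped_up", "msisdn": payload["msisdn"]}

# Registry mapping a service name to the script that talks to that provider.
SCRIPTS: Dict[str, Callable[[dict], dict]] = {
    "utility": pay_utility_bill,
    "mobile": top_up_mobile,
}

def handle_request(service: str, payload: dict) -> dict:
    """Pick the script for `service`, run it synchronously, and wrap the result."""
    script = SCRIPTS.get(service)
    if script is None:
        return {"ok": False, "error": f"unknown service: {service}"}
    result = script(payload)  # blocks until the script returns
    return {"ok": True, "service": service, "result": result}
```

The important property is that `handle_request` blocks for as long as the script runs, so whatever hosts the script has to stay alive for the whole call.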

Initially, our middleware and scripts were on two different servers. We used Elastic Beanstalk to host our scripts. However, we faced a lot of issues, because Elastic Beanstalk auto-scales depending on the incoming traffic. Here is an example of what went wrong:

  1. Three instances are running.
  2. Low traffic is detected, and one instance starts shutting down.
  3. Suddenly 100 requests come in, and 20 of them are routed to the instance that is shutting down.
  4. The middleware, which is waiting on those 20 requests, receives empty responses.
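
The sequence above can be modelled roughly as follows. This is a toy simulation, not our actual setup: the round-robin routing and the 60-request window during which the load balancer still sends traffic to the terminating instance are assumptions chosen to reproduce the 20-out-of-100 figure, and the `Instance` and `simulate` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Instance:
    name: str
    terminating: bool = False

    def handle(self, request_id: int) -> str:
        # A terminating instance drops the request; the caller sees an empty body.
        return "" if self.terminating else f"ok:{request_id}"

def simulate(request_count: int = 100, stale_window: int = 60) -> List[str]:
    """Route requests round-robin across three instances, one of which is shutting down.

    For the first `stale_window` requests the load balancer has not yet
    noticed the shutdown (an assumption), so the terminating instance
    stays in rotation. After that it is removed from the pool.
    """
    instances = [Instance("a"), Instance("b"), Instance("c", terminating=True)]
    healthy = [i for i in instances if not i.terminating]
    responses = []
    for request_id in range(request_count):
        pool = instances if request_id < stale_window else healthy
        target = pool[request_id % len(pool)]
        responses.append(target.handle(request_id))
    return responses

responses = simulate()
empty = responses.count("")
print(f"{empty} of {len(responses)} responses were empty")
```

Under these assumptions every third request in the stale window lands on the terminating instance, which is where the empty responses come from.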

We then transitioned to a single large EC2 instance that hosts our middleware and our scripts together. Requests come in, and the middleware calls the API (our PHP scripts) stored locally, runs it, and returns the result to our users.

This worked out great. However, after two years with this approach we had a hardware failure that brought down our whole server for 2 hours. Now we are trying to mitigate this and make sure the scripts have 100% uptime.

It is also not scalable, since code changes are made by connecting to the EC2 instance over RDP and editing the code in place.

We are now trying to move to EKS / Kubernetes instead.

Since Kubernetes also auto-scales nodes, I’m afraid the problem we faced with Elastic Beanstalk might happen again.

To summarize: will Kubernetes auto-scaling affect requests that are in the midst of processing? Some of our scripts can take more than 5 minutes, since they are synchronous.
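
To make the requirement concrete, here is a minimal sketch of the behaviour we would want from whatever hosts the scripts: once shutdown starts, refuse new requests but let in-flight ones finish. As far as I understand, Kubernetes sends SIGTERM to a pod's containers and waits `terminationGracePeriodSeconds` (30 seconds by default) before sending SIGKILL, so that period would have to be longer than our slowest script. The `InFlightTracker` class and its method names are hypothetical, and this is Python only for illustration.

```python
import threading

class InFlightTracker:
    """Counts requests in progress so shutdown can wait for them to finish."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._count = 0
        self._draining = False

    def try_begin(self) -> bool:
        """Register a new request; returns False once draining has started."""
        with self._lock:
            if self._draining:
                return False
            self._count += 1
            return True

    def end(self) -> None:
        """Mark a request as finished and wake any thread waiting in drain()."""
        with self._idle:
            self._count -= 1
            if self._count == 0:
                self._idle.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop accepting requests and wait up to `timeout` seconds.

        Returns True if every in-flight request finished in time.
        """
        with self._idle:
            self._draining = True
            return self._idle.wait_for(lambda: self._count == 0, timeout=timeout)
```

In a real worker, a SIGTERM handler would call `drain()` with a timeout just under the grace period, and each request handler would wrap its work in `try_begin()` / `end()`.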

Any recommendations on how to improve this architecture are also welcome.