I'm working on a containerized API where the user interacts with it in two steps (one asynchronous, one synchronous) through a front-end service; the second step operates on the output generated by the first.
I wasn't sure whether to post this here or on SO, but since it's more about design than implementation, I decided to post it here. Let me know if you think it's a better fit for SO.
The flow looks like this:
1. The user has already run an asynchronous job, for which they were given a unique identifier. That job produces a model the user now wants to test.
2. They send a POST request to a front-end service with the unique identifier and the user-defined data (JSON) with which they want to test the model.
3. The front-end service starts a Kubernetes Job (a minimal creation sketch follows this list).
4. The Job's init container fetches the model.
5. The main container loads the model.
6. The front-end service somehow sends a compute request, carrying the user-supplied JSON, to the main container.
7. The answer is forwarded back to the user.
8. The pod stays up for a while, so that subsequent requests for the same model don't have to boot a new pod every time.
9. After a while, the pod is shut down and the Job ends.
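To make steps 3–5 concrete, here is roughly what the front-end could do with the official Kubernetes Python client. Everything specific is an assumption on my part: the image names, the port, and the `model-id` label (which I reuse below to find the pod again):

```python
from kubernetes import client, config

config.load_incluster_config()  # the front-end runs inside the cluster

def start_model_job(model_id: str) -> None:
    labels = {"model-id": model_id}  # hypothetical label, used later to find the pod
    pod_spec = client.V1PodSpec(
        restart_policy="Never",
        init_containers=[client.V1Container(
            name="fetch-model",                      # step 4: init container fetches the model
            image="registry.example/model-fetcher",  # assumed image
            args=[model_id],
        )],
        containers=[client.V1Container(
            name="model-server",                     # step 5: main container loads and serves it
            image="registry.example/model-server",   # assumed image
            ports=[client.V1ContainerPort(container_port=8080)],
        )],
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=f"model-{model_id}", labels=labels),
        spec=client.V1JobSpec(
            ttl_seconds_after_finished=600,  # clean the finished Job up (step 9)
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=pod_spec,
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```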
I'm having trouble with step 6 (and, more broadly, with step 8). As far as I know, pods created by a Job cannot be exposed through a Service. And even if they can, multiple requests for different models may be in flight at the same time, so the service would have to distinguish the pods dynamically.
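For illustration, the most direct thing I can imagine for step 6 is to skip Services entirely: pod IPs are routable inside the cluster, so the front-end could look the Job's pod up by label and POST to it directly. A sketch under the same assumptions as above (the `model-id` label, port 8080, and a `/compute` endpoint are all invented, and in practice you'd also wait for readiness before calling); I'm not sure this is good practice, which is part of my question:

```python
import requests
from kubernetes import client, config

config.load_incluster_config()
core = client.CoreV1Api()

def call_model_pod(model_id: str, payload: dict) -> dict:
    # Find the pod the Job created for this model via the label we set on it.
    pods = core.list_namespaced_pod(
        namespace="default",
        label_selector=f"model-id={model_id}",
        field_selector="status.phase=Running",
    )
    if not pods.items:
        raise RuntimeError(f"no running pod for model {model_id}")
    # Pod IPs are reachable from inside the cluster without any Service.
    ip = pods.items[0].status.pod_ip
    resp = requests.post(f"http://{ip}:8080/compute", json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()
```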
The first iteration of this project had the back-end container load new models dynamically. After review, that was deemed undesirable; to load a new model, the container must now be restarted so that the init container retrieves the correct data.
My first thought was to have the back-end job send a request to retrieve the data, but that leads to several problems:
1. The front-end service has to store the JSON request in a database, even though it is read only once, because the back-end's request can land on a different front-end pod than the one that received it (see the sketch after this list).
2. How would the job know to request new data? (Step 8)
3. How are the results sent to the user?
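For problem 1, the best mitigation I can think of is to keep the payload in a shared store with a TTL and delete it on read, so the once-read data at least doesn't linger. A sketch assuming Redis (the key scheme is invented; `GETDEL` needs Redis >= 6.2):

```python
import json
import redis

r = redis.Redis(host="redis", port=6379)  # shared by all front-end replicas (assumed)

def stash_payload(job_id: str, payload: dict, ttl_s: int = 3600) -> None:
    # Store the user's JSON so *any* front-end pod can hand it to the back-end job.
    r.set(f"payload:{job_id}", json.dumps(payload), ex=ttl_s)

def pop_payload(job_id: str) -> dict | None:
    # Read and delete atomically, so the once-read data does not stay behind.
    raw = r.getdel(f"payload:{job_id}")
    return json.loads(raw) if raw is not None else None
```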
My second thought was to skip steps 8 and 9 and let the job run to completion: the front end polls the job status and reads the logs when it is finished. That is, at least, how the Job documentation does it. However, this would mean that the job's logs have to be reserved for output, which seems like bad design.
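For reference, that run-to-completion variant looks roughly like this with the Python client; the `job-name` label used to find the pod is one the Job controller sets itself:

```python
import time
from kubernetes import client, config

config.load_incluster_config()
batch = client.BatchV1Api()
core = client.CoreV1Api()

def wait_and_read_logs(job_name: str, namespace: str = "default") -> str:
    # Poll until the Job controller reports success or failure.
    while True:
        status = batch.read_namespaced_job_status(job_name, namespace).status
        if status.succeeded:
            break
        if status.failed:
            raise RuntimeError(f"job {job_name} failed")
        time.sleep(2)
    # The Job controller labels its pods with job-name=<name of the Job>.
    pods = core.list_namespaced_pod(namespace, label_selector=f"job-name={job_name}")
    return core.read_namespaced_pod_log(pods.items[0].metadata.name, namespace)
```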
We can build on that, though, and write to a database instead of to the logs. This shares problem 1 of my first idea, in the sense that the database holds data that is read only once, but so far this seems to be the only viable solution.
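If writing to a database is the way to go, I imagine a queue-shaped structure would make the read-once semantics explicit rather than awkward: the job pushes its result, the front-end blocks on the pop, and popping consumes the value. Again a sketch assuming Redis with an invented key scheme:

```python
import json
import redis

r = redis.Redis(host="redis", port=6379)

# In the job's main container, after computing the answer:
def publish_result(job_id: str, result: dict) -> None:
    r.rpush(f"result:{job_id}", json.dumps(result))
    r.expire(f"result:{job_id}", 3600)  # safety net if the front-end never reads it

# In the front-end, while the user's HTTP request is held open:
def await_result(job_id: str, timeout_s: int = 300) -> dict | None:
    popped = r.blpop(f"result:{job_id}", timeout=timeout_s)  # (key, value) or None
    # BLPOP removes the value as it reads it, so nothing once-read stays in the DB.
    return json.loads(popped[1]) if popped else None
```

That would also answer problem 3, since the front-end can hold the user's HTTP request open until the result arrives.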
What are your thoughts? Is this the right approach, or do you see a completely different way to achieve this behavior?