Suppose several servers each run an application that needs to asynchronously process rows from the same database table, and no row may be processed more than once. What would be the ideal pattern for implementing this?
I currently see two possibilities:
- Implement a separate external process that selects rows from the table and publishes them as messages to a queue for the distributed applications to consume.
- Run the distributed applications completely independently, without any shared queue, and have each one select its rows inside a transaction and update a column in the DB to mark them as claimed, so that no other instance processes the same rows again.
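To make option two concrete, here is a minimal sketch of what I have in mind. It uses SQLite from the Python standard library purely as a stand-in for the real database, and the `jobs` table, the `claimed_by` column, and the worker names are all hypothetical. Each worker selects a batch of unclaimed rows and marks them as its own inside a single transaction, so another worker cannot claim the same rows.

```python
import os
import sqlite3
import tempfile


def connect(path):
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN/COMMIT statements below are issued explicitly by this code.
    return sqlite3.connect(path, isolation_level=None, timeout=5.0)


def create_jobs(conn, count):
    # Hypothetical schema: claimed_by is NULL until some worker takes the row.
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, claimed_by TEXT)"
    )
    conn.executemany(
        "INSERT INTO jobs (payload) VALUES (?)",
        [(f"row-{i}",) for i in range(count)],
    )


def claim_batch(conn, worker_id, batch_size):
    """Select up to batch_size unclaimed rows and mark them for worker_id."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so no other
    # connection can claim rows between the SELECT and the UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE claimed_by IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        conn.executemany(
            "UPDATE jobs SET claimed_by = ? WHERE id = ?",
            [(worker_id, row_id) for row_id, _payload in rows],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [row_id for row_id, _payload in rows]


def demo():
    # Two connections to the same file stand in for two separate servers.
    path = os.path.join(tempfile.mkdtemp(), "jobs.db")
    setup = connect(path)
    create_jobs(setup, 10)
    setup.close()

    worker_a = connect(path)
    worker_b = connect(path)
    claimed_a = claim_batch(worker_a, "worker-a", 3)
    claimed_b = claim_batch(worker_b, "worker-b", 3)
    worker_a.close()
    worker_b.close()
    return claimed_a, claimed_b


if __name__ == "__main__":
    print(demo())
```

On a server database the locking would look different. As far as I know, PostgreSQL and MySQL 8 offer `SELECT ... FOR UPDATE SKIP LOCKED` for this kind of row claiming, which would take the place of the `BEGIN IMMEDIATE` lock used in this sketch.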
I suspect option one performs better, but I am wondering whether there are other patterns for solving this problem that I have not come across yet.