multitenancy – Design question for handling large volumes of messages in multi-tenant queue

I have a system with two applications that interact via a message queue. Let’s call them Producer and Consumer. Some key context is that this is a multi-tenant scenario.

Producer produces events based on various inputs (user interactions, API calls, etc.) and Consumer does downstream processing on them. One of our key constraints is that Consumer can only process one event at a time per tenant.

Our current solution (a bit naive) is that multiple worker threads pull from the queue and process events; if a worker picks up an event for a tenant that already has an event in progress, that worker just waits. This has been fine for a couple of years given our thread pool sizes and typical event production patterns. Recently, though, Producer generated thousands of events for a single tenant, and all of Consumer’s worker threads except one ended up stuck waiting. Consumer was therefore effectively processing events from the queue one at a time, and our “eventual consistency” lag time grew for every tenant queued behind that burst.
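To make the current behaviour concrete, here is a rough sketch of the pattern, assuming one lock per tenant and a shared thread pool. All names (`handle`, `drain`, the `(tenant_id, payload)` event shape) are hypothetical and simplified; this is not our production code.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: one lock per tenant, created lazily.
_tenant_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()  # protects the defaultdict itself


def _lock_for(tenant_id):
    with _registry_guard:
        return _tenant_locks[tenant_id]


def handle(event, processed):
    tenant_id, payload = event
    # A worker that picks up an event for a busy tenant blocks here,
    # holding its thread until the tenant's earlier event finishes.
    with _lock_for(tenant_id):
        processed.append((tenant_id, payload))


def drain(events, workers=4):
    """Process all events with a fixed pool; returns them in completion order."""
    processed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for event in events:
            pool.submit(handle, event, processed)
    return processed
```

With a burst such as `[("t1", i) for i in range(1000)]` at the head of the queue, every worker ends up inside `handle` for `t1`, and only the one holding the lock makes progress.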

We’ve got some candidate ideas for managing this:

  1. Load balancing across queues – new messages go to the emptiest queue, but each tenant is pinned to a single queue (exactly how we achieve this is TBD)
  2. Create a “slow lane” queue – if, while processing an event, we find the tenant is already busy, move that event to the “slow lane”. This would drain the primary queue quickly, but it has implications for event-processing idempotency that I’m not sure are acceptable in our scenario.
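For option 1, this is roughly what I have in mind: a minimal sketch, assuming an in-memory router where a tenant is assigned to the shortest queue the first time it is seen and stays there afterwards. `StickyRouter` and `route` are hypothetical names; in a real deployment the assignment map would presumably have to live somewhere shared, and each queue would need exactly one consumer to keep the one-at-a-time-per-tenant guarantee.

```python
from collections import deque


class StickyRouter:
    """Hypothetical router: least-loaded placement on first sight, sticky after."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]
        self.assignment = {}  # tenant_id -> queue index

    def route(self, tenant_id, payload):
        idx = self.assignment.get(tenant_id)
        if idx is None:
            # First event for this tenant: pick the currently shortest queue
            # (ties go to the lowest index).
            idx = min(range(len(self.queues)), key=lambda i: len(self.queues[i]))
            self.assignment[tenant_id] = idx
        self.queues[idx].append((tenant_id, payload))
        return idx


router = StickyRouter(num_queues=2)
router.route("a", 1)  # tenant "a" lands on queue 0
router.route("a", 2)  # stays on queue 0
router.route("b", 1)  # queue 1 is shorter, so "b" is pinned there
```

A noisy tenant would still back up its own queue, and any tenants already pinned to that queue with it, but the other queues keep draining. This sketch never releases or rebalances an assignment, which is part of the "TBD".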

Before we start digging into these options and looking for others, I’m curious whether anyone here has experience with patterns for dealing with this type of situation.

Appreciate any info/advice/guidance. Thanks!!