Even if not all accounts are equal, would you say that a random bucket-sized group of accounts generates a roughly equal amount of work? If so (and the stream of work is smooth enough), you can still use partitioning.
To address scaling the nodes: this is exactly what Kafka is good for. It guarantees that work belonging to the same account will always go to the same node (the same thread, even). It gracefully handles node failures, downtimes, and new nodes for you. You can even listen for rebalancing events explicitly, but you probably won't have to if you're doing it right.
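The guarantee comes from keying each record by account ID: Kafka's default partitioner hashes the key (murmur2 modulo the partition count) so every record for an account lands on the same partition, and each partition is consumed by exactly one consumer in the group. A minimal sketch of that idea, using md5 as a stand-in hash and a hypothetical partition count:

```python
import hashlib

NUM_PARTITIONS = 12  # hypothetical partition count for the topic


def partition_for(account_id: str) -> int:
    """Deterministically map an account ID to a partition.

    Kafka's default partitioner does the equivalent with murmur2 on
    the record key; md5 here is just a stand-in to show the idea.
    """
    digest = hashlib.md5(account_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


# The same account always hashes to the same partition, so all of its
# work is processed by whichever consumer owns that partition.
assert partition_for("account-42") == partition_for("account-42")
assert 0 <= partition_for("account-42") < NUM_PARTITIONS
```

In practice you just set the account ID as the record key when producing; the partitioner and consumer-group protocol do the rest.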
With Kafka Streams you can basically forget about all of that, and you can even create stateful processing nodes, with Kafka doing everything for you, including transferring the state over to another node if one of your nodes dies.
If you can’t use Kafka (you’re on Windows, or it’s just too complicated for the project), you can try rolling your own rebalancing. If your use case is simple enough, you might get it working in a couple of days. You’ll probably need something like rendezvous hashing.
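Rendezvous (highest-random-weight) hashing is small enough to sketch: every node scores each key with a hash, the highest score owns the key, and when a node disappears only the keys it owned get reassigned. A minimal illustration (md5 as the hash, hypothetical node names):

```python
import hashlib


def _score(node: str, key: str) -> int:
    """Pseudo-random weight of the (node, key) pair; md5 keeps it
    deterministic across processes, unlike Python's built-in hash()."""
    digest = hashlib.md5(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(nodes: list[str], key: str) -> str:
    """Rendezvous hashing: the node with the highest score wins."""
    return max(nodes, key=lambda n: _score(n, key))


nodes = ["node-a", "node-b", "node-c"]

# If a node dies, only the accounts it owned move; everyone else's
# scores are unchanged, so their assignments stay put.
survivors = [n for n in nodes if n != "node-c"]
for i in range(20):
    key = f"account-{i}"
    if owner(nodes, key) != "node-c":
        assert owner(survivors, key) == owner(nodes, key)
```

That minimal-reshuffling property is what makes it a reasonable basis for hand-rolled rebalancing; the hard part you still have to build yourself is detecting node failures and agreeing on the current membership list.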