Distributed Computing – DynamoDB Event Store in AWS


I'm designing an event store on AWS and chose DynamoDB because it seemed like the best option. The design looks pretty solid to me, but there are a few problems I can't solve.

Events

Events are uniquely identified by the pair (StreamId, EventId):

  • StreamId: the same as the AggregateId, which means one event stream per aggregate.
  • EventId: an incremental number that preserves the order of events within the same stream.

Events are persisted in DynamoDB. Each event maps to a single record in a table. The mandatory fields are StreamId, EventId, EventName, and Payload (more fields can be added easily).
The partitionKey is the StreamId, the sortKey is the EventId.
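
For concreteness, here is a minimal sketch of that table layout with boto3 (the table name EventStore, on-demand billing, and the enabled stream are my assumptions, not part of the design above):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Table layout as described: StreamId is the partition key, EventId the
    # sort key; EventName and Payload are plain item attributes.
    dynamodb.create_table(
        TableName="EventStore",  # assumed name
        KeySchema=[
            {"AttributeName": "StreamId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "EventId", "KeyType": "RANGE"},   # sort key
        ],
        AttributeDefinitions=[
            {"AttributeName": "StreamId", "AttributeType": "S"},
            {"AttributeName": "EventId", "AttributeType": "N"},
        ],
        BillingMode="PAY_PER_REQUEST",
        # Needed later for the DynamoDB Streams + Lambda publication path.
        StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_IMAGE"},
    )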

Optimistic locking is used when writing an event to a stream, implemented with DynamoDB conditional writes. If an event with the same key (StreamId, EventId) already exists, I have to rebuild the aggregate, re-check the business conditions, and retry the write if those conditions still hold.
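
A sketch of that conditional write, assuming the EventStore table from the previous snippet (the ConcurrencyError type is mine):

    import json

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("EventStore")

    class ConcurrencyError(Exception):
        """Raised when another writer appended the same EventId first."""

    def append_event(stream_id, next_event_id, event_name, payload):
        try:
            table.put_item(
                Item={
                    "StreamId": stream_id,
                    "EventId": next_event_id,
                    "EventName": event_name,
                    "Payload": json.dumps(payload),
                },
                # Succeeds only if no item with this exact (StreamId, EventId)
                # key exists yet; otherwise DynamoDB rejects the write.
                ConditionExpression=(
                    "attribute_not_exists(StreamId) AND attribute_not_exists(EventId)"
                ),
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                # A concurrent writer won: reload the stream, rebuild the
                # aggregate, re-check the conditions, and retry if they hold.
                raise ConcurrencyError(stream_id, next_event_id) from e
            raise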

Event streams

Each event stream is identified by its partitionKey. Querying a stream for all of its events amounts to a query with partitionKey = ${streamId} and sortKey between 0 and MAX_INT.
Each event stream identifies one and only one aggregate. This makes it possible to handle concurrent writes to the same aggregate using the optimistic locking explained earlier. It also gives excellent performance when rebuilding an aggregate.
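
Reading a whole stream is then a single paginated query; here is a sketch, reusing the table resource from above (querying by partition key alone already returns every sort key in order, so the explicit 0..MAX_INT range can be dropped):

    from boto3.dynamodb.conditions import Key

    def load_stream(stream_id):
        """Return all events of one stream, ordered by EventId."""
        events, start_key = [], None
        while True:
            kwargs = {"KeyConditionExpression": Key("StreamId").eq(stream_id)}
            if start_key:
                kwargs["ExclusiveStartKey"] = start_key
            page = table.query(**kwargs)  # ascending by sort key by default
            events.extend(page["Items"])
            start_key = page.get("LastEvaluatedKey")
            if not start_key:  # pagination covers streams beyond 1 MB per page
                return events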

Publishing events

Events will be published using a combination of DynamoDB Streams and Lambda.
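
A sketch of the Lambda side, assuming the table's stream (NEW_IMAGE view) as the trigger and an SNS topic as the downstream bus; the topic ARN is a placeholder:

    import json

    import boto3

    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:REGION:ACCOUNT:events"  # hypothetical topic

    def handler(event, context):
        for record in event["Records"]:
            if record["eventName"] != "INSERT":
                continue  # only newly appended events are published
            image = record["dynamodb"]["NewImage"]  # attribute-value map
            sns.publish(
                TopicArn=TOPIC_ARN,
                Message=json.dumps({
                    "streamId": image["StreamId"]["S"],
                    "eventId": int(image["EventId"]["N"]),
                    "eventName": image["EventName"]["S"],
                    "payload": image["Payload"]["S"],
                }),
            )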

Replaying events

This is where the problems begin. Since each event stream is tied to exactly one aggregate (which results in a very large number of event streams), there is no easy way to discover which event streams exist, i.e. which streams I have to query when replaying all events.

I thought about keeping an extra record somewhere in DynamoDB that stores all StreamIds in an array. I could then read that record and start querying the streams for their events. However, if a new stream is created during the replay, I miss it.
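
To make the race concrete, here is a sketch of that registry idea (the StreamRegistry key and its layout are assumptions; table and load_stream come from the earlier snippets):

    def replay_all(apply_event):
        # Single registry item holding every known StreamId in one list.
        registry = table.get_item(
            Key={"StreamId": "StreamRegistry", "EventId": 0}
        )
        stream_ids = registry["Item"]["StreamIds"]  # snapshot taken *now*
        for stream_id in stream_ids:
            for event in load_stream(stream_id):
                apply_event(event)
        # Any stream registered after the snapshot above is silently missed.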

Am I missing something? Or is my design just wrong?