We have about two to three dozen microservices serving our customers.
These services run in a Kubernetes cluster.
They are exposed to the outside world through only 3 or 4 API gateways.
We've found that sometimes two or more microservices need the same data.
We've looked at a few strategies to solve the problem and have partially implemented them.
As with any design, we are not 100% sure this is the right approach, or whether we are missing pitfalls in the design.
Comments, suggestions, or thoughts from people who have dealt with this would be helpful.
When a service of lesser business importance (say, ServiceL) requires data from a service of higher business importance (say, ServiceH), ServiceL calls ServiceH directly to retrieve the required data.
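A minimal sketch of that direct call, assuming ServiceH exposes an HTTP API (the host name and endpoint path here are hypothetical, and the pluggable `opener` just makes the helper easy to stub):

```python
import json
from urllib.request import urlopen

def fetch_record(record_id, opener=urlopen, base_url="http://serviceh.internal"):
    # ServiceL blocks on ServiceH's HTTP API until the data arrives;
    # the timeout keeps ServiceL responsive when ServiceH is slow.
    # (Endpoint path and host are hypothetical.)
    with opener(f"{base_url}/records/{record_id}", timeout=2.0) as resp:
        return json.load(resp)
```

Because the call is synchronous, ServiceL's availability now depends on ServiceH's, which is one reason we reserve this pattern for the lesser-to-more-important direction.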
When a service of lesser business importance (say, ServiceL) needs data from several more important services (e.g., ServiceH1, ServiceH2, etc.):
ServiceH1, ServiceH2, etc. publish messages to RabbitMQ.
Publishing is done through a non-blocking, fire-and-forget mechanism (so that these services are never held up by it).
ServiceL consumes these messages and stores the data in its own data store.
The delay in data availability is acceptable for ServiceL.
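The fire-and-forget publish might look like the sketch below. It assumes a pika-style channel object; the exchange name is hypothetical. The key point is that no confirm is awaited and a broker failure is logged rather than raised, so the important service is never blocked by the event path:

```python
import json
import logging

log = logging.getLogger("domain-events")

def publish_fire_and_forget(channel, event_type, payload):
    # Fire-and-forget: no publisher confirms are awaited, and a broker
    # failure is logged rather than raised, so the publishing service
    # (ServiceH1, ServiceH2, ...) is never stopped by the event path.
    # `channel` is assumed to expose a pika-style basic_publish();
    # the exchange name is hypothetical.
    try:
        channel.basic_publish(
            exchange="domain-events",
            routing_key=event_type,
            body=json.dumps(payload),
        )
    except Exception:
        log.exception("could not publish %s; continuing", event_type)
```

The trade-off, of course, is that a swallowed failure is exactly how the destination drifts out of sync, which is what the daily reconciliation job later compensates for.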
When a business-critical service (say, ServiceH) needs data from a less important service (say, ServiceL):
ServiceL publishes messages to RabbitMQ, either through the fire-and-forget mechanism or through a blocking publish, depending on the urgency of synchronizing the data.
ServiceH consumes the messages and stores the data in its own data store.
Often, ServiceH needs this data only for reports and summaries, and we are fine if the summary is not up to date.
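On the consuming side, in both directions, the handler writes the event into the service's own store. A sketch, assuming each event is a JSON object carrying its own "id" (the class and event shape are hypothetical): upserting by id keeps the handler idempotent, which matters because RabbitMQ delivers at-least-once and may redeliver the same message.

```python
import json

class LocalCopy:
    # Stand-in for the consuming service's own data store. Upserting by
    # id keeps the handler idempotent under RabbitMQ's at-least-once
    # delivery, where the same message may arrive more than once.
    def __init__(self):
        self.rows = {}

    def upsert(self, row_id, row):
        self.rows[row_id] = row

def handle_message(store, body):
    # Assumed event shape: a JSON object carrying its own "id".
    event = json.loads(body)
    store.upsert(event["id"], event)
```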
When data is needed by two services and both of them not only read it but also modify it, we believe the domain boundaries were drawn incorrectly, and we revisit the design in that case. (Often such microservices end up tightly coupled to each other.)
If we rely only on a messaging framework like RabbitMQ to synchronize data between services, the data will drift out of sync over time.
When that happens, we could inspect RabbitMQ's statistics and replay messages, but we believe this brings unnecessary complexity.
So instead, we run jobs once a day that synchronize the data from the source service to the destination service.
(The synchronization jobs access the data through the services' APIs, not directly from the data stores.)
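The daily job can be sketched as a simple diff-and-repair pass. The three callables here are hypothetical wrappers around the two services' API clients; the point is that both sides are read through the services, never the data stores:

```python
def reconcile(fetch_source, fetch_dest, upsert_dest):
    # Daily repair job. Both sides are read through the services' APIs
    # (never the data stores directly); anything missing or stale at
    # the destination is re-written. fetch_source/fetch_dest return
    # {record_id: record}; upsert_dest writes one record. All three
    # callables are hypothetical wrappers around the services' clients.
    source = fetch_source()
    dest = fetch_dest()
    repaired = 0
    for rec_id, record in source.items():
        if dest.get(rec_id) != record:
            upsert_dest(record)
            repaired += 1
    return repaired
```

Note this sketch only propagates additions and updates; deletions at the source would need an extra pass over ids present in `dest` but not in `source`, and a full fetch of both sides may not scale, so a real job would likely page through the APIs or filter by a modified-since timestamp.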
Is this a good way to synchronize such data? Any pitfalls?