architecture – Duplication of data vs loose coupling

Currently, I’m working on a health management application; let’s call it application A. It’s a partly prebuilt, generic application that we are extending. The main entities are journals, events, and persons (patients, medical staff, external contacts).
Users work with the application through a React app, where they can see medical journals with the most necessary data about a person’s medical condition and historical medical-related events. It’s also possible to create new events in the medical record. Several data grids show information about the main entities in the application, for example, name, age, address, event data, event category, journal name/no, etc.

In the same corporation, we also have an internal API that exposes person-related data to several applications. This API sits on top of a large data register containing data for several million persons, addresses, etc. Most of these persons are not related to application A, since they don’t have a medical journal and probably never will. On rare occasions, the register receives updates from national registers, for example, when a person’s address changes. Occasionally, columns also change for all persons when the data is represented in a new way. The API’s responsibility is to expose the latest person-related data to several applications.

Currently, all persons are copied from the API into application A, even if they don’t have a medical journal. Several scheduled jobs then fetch all new data every half hour. This is problematic because the jobs can take a very long time to process, and keeping the two data sets in sync is a pain. There is also some confusion about which application is the authority for the person data.

We have come up with two proposals to mitigate the problem.

The first proposal is to store only the ID of the person in application A. Every time we show a page with the medical condition, we have to call the API to get the name, because the name is displayed on that page. We also have to develop a way to join data from both sources for the data grids. Since most tables use server-side filtering with GraphQL, this is problematic when the data comes from multiple sources. And if the API goes down, application A will not work. The good thing about this solution is that the person-related data is stored in only one place.
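To make the data grid problem concrete, here is a minimal sketch of the application-side join the first proposal would need. The shapes `JournalRow` and `Person`, their field names, and the function `joinAndFilter` are hypothetical and not taken from application A; the sketch assumes the person records have already been fetched from the API, for example in one batched call by ID.

```typescript
// Hypothetical shapes; field names are assumptions for illustration only.
interface JournalRow { journalNo: string; personId: string; category: string; }
interface Person { id: string; name: string; address: string; }
interface GridRow extends JournalRow { name: string | null; }

// Join journal rows (from application A's database) with person records
// (from the person API), then filter on the person's name.
// The name filter has to run here, in application code, because
// application A's database no longer holds names to filter on.
function joinAndFilter(
  journals: JournalRow[],
  persons: Person[],
  nameContains: string,
): GridRow[] {
  // Index persons by ID so each journal row is resolved in constant time.
  const byId: { [id: string]: Person | undefined } = {};
  for (const p of persons) {
    byId[p.id] = p;
  }
  const needle = nameContains.toLowerCase();
  return journals
    .map((j): GridRow => ({ ...j, name: byId[j.personId]?.name ?? null }))
    // Rows whose person could not be resolved are dropped from the grid.
    .filter((row) => row.name !== null && row.name.toLowerCase().indexOf(needle) !== -1);
}
```

Note that filtering after the join means paging and sorting on person fields can no longer be pushed down to a single database query, which is the part I expect to be hard with GraphQL.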

The second proposal is to keep person data in application A only for persons who are connected to a medical journal. The data is then checked asynchronously for updates when the user accesses personal information on specific pages. If there is new data, the user is notified, and the updates are transferred to and stored in application A’s database. With this solution, the application still works if the API is down for some reason. The data grids are also no longer a problem, since all the relevant data that should be shown in a data grid is stored in application A’s database. The downside is that some data is duplicated.
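The update check in the second proposal could look roughly like the sketch below. It assumes a hypothetical `PersonRecord` shape with only `name` and `address`; the real fields, and how the remote record is fetched and the user notified, are left out.

```typescript
// Hypothetical person shape; the real fields in application A and the API will differ.
interface PersonRecord { id: string; name: string; address: string; }

type PersonField = "name" | "address";
interface FieldChange { field: PersonField; local: string; remote: string; }

// Compare the locally stored copy with the latest record from the person API
// and list the fields that differ. An empty result means the copy is up to date.
function diffPerson(local: PersonRecord, remote: PersonRecord): FieldChange[] {
  const fields: PersonField[] = ["name", "address"];
  const changes: FieldChange[] = [];
  for (const field of fields) {
    if (local[field] !== remote[field]) {
      changes.push({ field, local: local[field], remote: remote[field] });
    }
  }
  return changes;
}

// Return a new local record with the remote values applied; the input is not mutated.
function applyChanges(local: PersonRecord, changes: FieldChange[]): PersonRecord {
  const updated: PersonRecord = { ...local };
  for (const change of changes) {
    updated[change.field] = change.remote;
  }
  return updated;
}
```

The idea would be to call `diffPerson` in the background when a page is opened, show the notification only if the result is non-empty, and persist the output of `applyChanges` to application A’s database.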

As far as I have read, some consider data duplication in, for example, a microservice architecture to be wrong, while others say it can be a tool for decoupling the different services (for example, Microsoft’s microservice guide). I lean towards the second solution: I think the services are too tightly coupled in the first solution and the communication can become too chatty, and even though it is the “same” entity, it lives in different contexts. But I’m still unsure.

Does anyone have advice on how to deal with this? Or is there another way to solve problems like this?