In a microservices architecture, each microservice manages only data related to its bounded context. The entire domain data is spread across multiple databases and sometimes across multiple storage technologies — relational databases and different NoSQL variants.
Handling data requirements that cross microservice boundaries is not an easy task. There are several approaches you can take — getting data from multiple sources together when required, making the data available where required, or moving the data.
Aggregate Data When Required
If the page or API design allows it, have the client or an aggregator service query all sources and combine the data. This is a good solution when at least one of the result sets being combined is small and the combination logic is straightforward.
For instance, prices from one microservice and stock levels from another can be combined to show a table with both prices and stock levels. However, if you want to sort out-of-stock items by price, you cannot limit your stock level results to one page. Your client or aggregator has to retrieve all out-of-stock items and then fetch their prices to produce a sorted list. You can avoid this problem by sacrificing functionality — allow sorting and filtering only on data from one of the sources.
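As a sketch of this, a hypothetical aggregator might merge the two result sets like so. The fetch functions here are placeholders standing in for calls to the price and stock microservices, not real endpoints:

```python
# Hypothetical aggregator combining price and stock result sets by item id.
# fetch_prices / fetch_stock stand in for calls to the two microservices.

def fetch_prices():
    return {"sku-1": 9.99, "sku-2": 4.50, "sku-3": 12.00}

def fetch_stock():
    return {"sku-1": 0, "sku-2": 17, "sku-3": 3}

def merged_table():
    prices, stock = fetch_prices(), fetch_stock()
    return [
        {"sku": sku, "price": prices[sku], "stock": stock.get(sku, 0)}
        for sku in prices
    ]

def out_of_stock_by_price():
    # Sorting across sources forces us to pull ALL out-of-stock items
    # first; we cannot page the stock results and still sort correctly.
    rows = [r for r in merged_table() if r["stock"] == 0]
    return sorted(rows, key=lambda r: r["price"])
```

Note that `out_of_stock_by_price` has to materialise the full out-of-stock set before sorting — exactly the pagination problem described above.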
Complex aggregations that use values calculated from different result sets are best done in an aggregator service. Simpler ones, like looking up a small set of values, can be done on the client side.
Letting the client aggregate is architecturally simpler but introduces additional complexity in the client code. This will not be a good experience if you are exposing your API for external use. On the other hand, aggregator services take on the availability requirements of all the microservices they combine. So, if the same service aggregates user details and pending orders in addition to prices and stock levels, it will need to handle a simultaneous peak in load for both.
A Properly Managed Cache
If your application can live with data that is slightly out of date, caching data in a separate service or within one of the services is a good solution. You can use caching as a mechanism to bring data closer to where it's needed, optimise retrieval, pre-compute values, or a combination of these.
Caching may seem like a poor fit for many use cases. But a well thought out page can leverage caching without degrading user experience. For instance, on an auction site with a watch list of items, leading bids for items expiring later can be served from a server-side cache, while bids for items expiring soon are retrieved directly from the bid service.
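That split might look like this in outline. All the names here are illustrative — `cached_bids` is a stale-tolerant snapshot and `fetch_live_bid` stands in for a call to the bid service:

```python
from datetime import datetime, timedelta

# Items closing within this window get a live lookup; everything else
# can tolerate a slightly stale cached value. The threshold is arbitrary.
SOON = timedelta(minutes=5)

def leading_bid(item, cached_bids, fetch_live_bid, now):
    if item["closes_at"] - now <= SOON:
        # Near the close, staleness matters: go to the bid service.
        return fetch_live_bid(item["id"])
    # Otherwise the cached snapshot is acceptable.
    return cached_bids[item["id"]]
```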
Caching has its own challenges — keeping the cache up to date being one of them. The simplest way is to reload the entire cache periodically. This may not be feasible with large data volumes since most caches are not optimised for writes. Updating only what changed fixes this problem but is a bit more complex to implement.
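A minimal sketch of the two update strategies, assuming the source publishes simple change events — the event shape and function names are illustrative, not any particular library's API:

```python
# In-memory cache updated either wholesale or incrementally.
cache = {}

def apply_change(event):
    # Incremental update driven by a change event from the source,
    # e.g. {"type": "upsert", "key": "a", "value": 1}.
    if event["type"] == "upsert":
        cache[event["key"]] = event["value"]
    elif event["type"] == "delete":
        cache.pop(event["key"], None)

def full_reload(source_rows):
    # The simple alternative: rebuild everything periodically. Fine for
    # small data sets, expensive for large ones since most caches
    # favour reads over writes.
    cache.clear()
    cache.update(source_rows)
```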
Caches also need to be factored into your disaster recovery strategy. If your source is restored to an earlier point in time, your cache is suddenly “invalid”. If your source is a facade for the actual source of truth, this may not really be a problem. For instance, if we lose the last hour's worth of work in a data entry system, the cache will be just ahead of the intermediate source; the actual source of truth is the paper trail. Ways to actually solve the problem include keeping track of changes made to the cache within the recovery window so they can be rolled back, or restoring the cache to an earlier point in time as well.
Is Data Where It's Supposed to Be?
If you have too many cross-cutting data requirements, check for incorrectly applied microservice boundaries or incomplete domain models. Look over the data involved to see whether it must be moved, duplicated, or whether the sources must be combined.
Data can end up in the wrong microservice — a result of incorrect design choices, an incomplete migration when a larger microservice was broken up, or an evolving understanding of the domain. The solution is straightforward, though sometimes tedious — move the data to the microservice it belongs to.
More often, similar data belongs in multiple microservices. Incorporating data into a bounded context is different from caching, where we merely duplicate the data. The difference may be subtle but determines when and how the data is updated. Cache updates are periodic or driven by updates in the source. Data incorporated into a bounded context is updated by commands on the aggregate.
For instance, in a financial reporting solution, if you hit the account microservice to get the account contacts every time you generate a report, you should have the contacts in the reporting microservice too — as a dated list of contacts.
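One way to model that dated list, sketched with hypothetical names, is an append-only contact history inside the reporting bounded context, updated by a command rather than refreshed from the account service:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class ContactVersion:
    effective_from: date
    contacts: tuple  # e.g. ("alice@example.com",)

@dataclass
class ReportingAccount:
    account_id: str
    contact_history: list = field(default_factory=list)

    def record_contacts(self, effective_from: date, contacts: tuple):
        # Command on the aggregate: append, never overwrite, so old
        # reports can be reproduced with the contacts of that day.
        self.contact_history.append(ContactVersion(effective_from, contacts))

    def contacts_on(self, as_of: date) -> tuple:
        versions = [v for v in self.contact_history if v.effective_from <= as_of]
        if not versions:
            return ()
        return max(versions, key=lambda v: v.effective_from).contacts
```

The dated history is what distinguishes this from a cache: the reporting service owns the data's lifecycle rather than mirroring the account service's current state.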
Sometimes you can be too enthusiastic in breaking up a monolith and end up with microservices that are too small. For instance, if you have bank accounts and bank transactions in separate microservices, you can find yourself hitting accounts every time you need to change a transaction. Combining them into one, with the account as the aggregate, will be a better design.
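A sketch of the combined design, with the account as the aggregate root. The names and the overdraft rule are illustrative; the point is that every change to a transaction goes through the account, so invariants live in one place:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transaction:
    tx_id: str
    amount: int  # cents; positive = credit, negative = debit

@dataclass
class Account:
    account_id: str
    transactions: list = field(default_factory=list)

    def add_transaction(self, tx: Transaction):
        # Example invariant enforced by the aggregate root: no change
        # to the transaction list may take the balance below zero.
        if self.balance() + tx.amount < 0:
            raise ValueError("insufficient funds")
        self.transactions.append(tx)

    def balance(self) -> int:
        return sum(t.amount for t in self.transactions)
```

With transactions in a separate microservice, that invariant would need a cross-service call on every change — exactly the chattiness described above.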
There are many ways to retrieve data across microservice boundaries, and they may seem complex and varied. But you will find that the main thing that changes is how you think about where data belongs.
Microservices are not silver bullets. They solve many problems but throw up other challenging ones to ponder over and solve.