The Problem - Entities and Coupling
When setting out on the journey of moving to a Microservice-based architecture, there is a gravitational pull towards going as fine-grained as possible. It is all too easy to conclude that the way to go is services that are each responsible for a single entity only. After all, we often think in terms of our entities, and when we ask “What is this component responsible for?”, a single word is a simple answer: “The Product! The User!” Those then become the boundaries along which we cut our Microservices. This is considered an anti-pattern today. Why is that?
In the end, users get value from customer journeys, not from entities, so we need to assemble such journeys. Since none of the entity services care how they are used, they have to be orchestrated by some other component. This can be done from various places within your architecture; for the sake of argument, we assume that the Gateway¹ bears that responsibility.
Assuming we want to process a customer’s order, the Gateway now has to manage the interaction between the user, the product and the inventory. It’s easy to see that this will soon make the Gateway grow in size and complexity, as all business logic that can’t clearly be assigned to a single entity finds its way into it.
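To make the orchestration concrete, here is a minimal Python sketch in which in-memory classes stand in for the user, product and inventory services. All class names, method names and fields (`has_valid_card`, `reserve`, ...) are hypothetical, and `remote_calls` merely counts what would be network hops in a real deployment. Note where the payment and stock rules end up: in the Gateway, because they belong to neither entity.

```python
class UserService:
    """Hypothetical entity service: knows users, nothing about orders."""

    def __init__(self, users: dict[str, dict]):
        self._users = users

    def get(self, user_id: str) -> dict:
        return self._users[user_id]


class ProductService:
    """Hypothetical entity service: knows prices."""

    def __init__(self, prices: dict[str, int]):
        self._prices = prices

    def get_price(self, product_id: str) -> int:
        return self._prices[product_id]


class InventoryService:
    """Hypothetical entity service: knows stock levels."""

    def __init__(self, stock: dict[str, int]):
        self._stock = stock

    def reserve(self, product_id: str, qty: int) -> bool:
        if self._stock.get(product_id, 0) < qty:
            return False
        self._stock[product_id] -= qty
        return True


class Gateway:
    """Orchestrator: every order-processing rule ends up living here."""

    def __init__(self, users, products, inventory):
        self.users = users
        self.products = products
        self.inventory = inventory
        self.remote_calls = 0  # each call would be a network hop in production

    def _call(self, fn, *args):
        self.remote_calls += 1
        return fn(*args)

    def place_order(self, user_id: str, product_id: str, qty: int) -> dict:
        user = self._call(self.users.get, user_id)
        # Payment rule: neither a "user" concern nor a "product" concern.
        if not user["has_valid_card"]:
            return {"status": "rejected", "reason": "no valid card"}
        price = self._call(self.products.get_price, product_id)
        if not self._call(self.inventory.reserve, product_id, qty):
            return {"status": "rejected", "reason": "out of stock"}
        return {"status": "accepted", "total": price * qty}
```

With one user holding a valid card, a product priced at 10 and five items in stock, `place_order("u1", "p1", 2)` returns `{"status": "accepted", "total": 20}` after three remote calls. Any further rule in the order flow would be added to `place_order` as well.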
Those of you familiar with the topic might point to Choreography as an alternative to Orchestration. With a Choreography, we don’t put the logic for the interactions into a central component; instead, we put it into the services that interact with each other, which widens the scope of each individual service’s responsibilities.
This essentially splits the shared business logic and distributes it across the services themselves.
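A choreographed version of the same order flow could look like the following sketch, assuming a minimal in-process publish/subscribe bus in place of a real messaging platform. The event names (`OrderPlaced`, `StockReserved`, `OrderConfirmed`, `OrderRejected`) and the service classes are hypothetical. No single component sees the whole flow: the stock rule lives in `InventoryService`, the payment rule in `UserService`.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal in-process publish/subscribe bus standing in for a broker."""

    def __init__(self):
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers[event_type]:
            handler(payload)


class InventoryService:
    """Reacts to OrderPlaced; its share of the order logic lives here."""

    def __init__(self, bus: EventBus, stock: dict[str, int]):
        self.bus, self.stock = bus, stock
        bus.subscribe("OrderPlaced", self.on_order_placed)

    def on_order_placed(self, order: dict) -> None:
        if self.stock.get(order["product_id"], 0) >= order["qty"]:
            self.stock[order["product_id"]] -= order["qty"]
            self.bus.publish("StockReserved", order)
        else:
            self.bus.publish("OrderRejected", {**order, "reason": "out of stock"})


class UserService:
    """Reacts to StockReserved; the payment rule lives here."""

    def __init__(self, bus: EventBus, cards: dict[str, bool]):
        self.bus, self.cards = bus, cards
        bus.subscribe("StockReserved", self.on_stock_reserved)

    def on_stock_reserved(self, order: dict) -> None:
        if self.cards.get(order["user_id"], False):
            self.bus.publish("OrderConfirmed", order)
        else:
            self.bus.publish("OrderRejected", {**order, "reason": "no valid card"})
```

If the payment check rejects an order, the reserved stock is not released in this sketch; handling that would take yet another `OrderRejected` handler in `InventoryService`, one more piece of the order logic spread across services.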
While both approaches have their pros and cons, the heart of the matter can’t be addressed by choosing one over the other: entities need to interact to provide business value, so the services are strongly coupled to each other!
We believe that by building entity-services, you don’t use Microservices to their full potential. Going down that path leaves you facing the following issues:
- Teams can’t work independently due to strong coupling. A change in the business logic of order-processing would likely touch multiple services.
- The Microservices are very chatty. This makes the system harder to understand, debug and profile, and it incurs additional performance overhead due to extra marshalling, parsing and, most importantly, network hops.
- The Microservices will exhibit much the same availability and resilience as if you had stayed with a Monolith: if one service is down, the whole system doesn’t work. Just imagine the user-service being down! No orders would go through.
- Last but not least, if you were to change something in the flow depicted above, chances are high that you will have to adapt, retest and redeploy all three services.
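To put a rough number on the availability point: if an order needs several services to answer synchronously and their failures are independent, the availability of the order path is the product of their individual availabilities, so every extra service on that path lowers it. The 99% figures below are illustrative assumptions, not measurements.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a request that needs every listed service to be up.

    Assumes failures are independent; the inputs are illustrative only.
    """
    return prod(availabilities)


# user-, product- and inventory-service, each assumed to be up 99% of the time
print(round(serial_availability([0.99, 0.99, 0.99]), 4))  # 0.9703
```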
Improvements - Customer Journeys
“Okay, so we established that this isn’t the best way to cut your services. What else can we do? We could cut them around the often-mentioned user-journey. That’d make sense, right?”
In fact, it does. If we shift our focus from what a system knows (entities) to what a system does (fulfil various value-adding customer journeys), we get a different viewpoint which should help us arrive at more independently scalable, resilient, deployable and changeable services.
Say that, instead of a user-service and a product-service, we now have an architecture with an account-management-service and an order-service. In a consumer-facing business, few customer journeys can make do without, well, the customer. What if the order fails because the user has no valid credit card? How would the order-service be able to make that call and, more importantly, wouldn’t we also want to update the user based on this information, i.e. from the order-service? Unfortunately, a one-to-one mapping of entities to customer journeys will in all likelihood be impossible; reality just isn’t that simple. So several of these services will reference the same entity.
Since we don’t want an entity to be owned by multiple services (lest we run into all kinds of synchronisation issues), one of the services whose customer journeys touch, say, the user entity will be the leading service. It is the only one that can update or delete user entities; in our case, this could be the account-management-service.
In this case, the order-service lets the account-management-service know that the user’s payment failed, and the account-management-service makes the necessary adjustments.
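A minimal sketch of this ownership rule, with plain Python objects standing in for the two services and a direct method call standing in for whatever synchronous API they would expose. `record_payment_failure` and the `payment_blocked` flag are hypothetical names. The order-service only tells the owner what happened; the owner alone mutates the user.

```python
class AccountManagementService:
    """Leading service for the user entity: the only place users are changed."""

    def __init__(self):
        self._users: dict[str, dict] = {}

    def register(self, user_id: str, name: str) -> None:
        self._users[user_id] = {"name": name, "payment_blocked": False}

    def record_payment_failure(self, user_id: str) -> None:
        # The owner decides what "the necessary adjustments" are for its entity.
        self._users[user_id]["payment_blocked"] = True

    def get_user(self, user_id: str) -> dict:
        # Return a copy so callers cannot mutate state they don't own.
        return dict(self._users[user_id])


class OrderService:
    """Owns the order journey, but never writes user data itself."""

    def __init__(self, accounts: AccountManagementService):
        self.accounts = accounts
        self.orders: list[str] = []

    def place_order(self, user_id: str, card_valid: bool) -> str:
        if not card_valid:
            # Tell the owner instead of mutating the user directly.
            self.accounts.record_payment_failure(user_id)
            return "payment_failed"
        self.orders.append(user_id)
        return "placed"
```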
While this is better, if we just migrate to a customer-journey-driven service cut without any other changes, we will still have many of the problems we had with entity services.
As you can see, we moved the orchestration out of the Gateway¹, a shared component, and into a service itself. If we change the order-service and need neither additional information from the account-management-service nor new mutations on the user entity it owns, we only have to re-test and re-deploy the order-service!
Note that this is different from transitioning to a choreography with entity services as described in the first chapter. Here, only one service owns the whole customer journey: if that journey needs changing, only the service that owns it has to be adapted, rather than several services.
The issues we haven’t solved yet are independent scaling and fault tolerance. If the account-management-service is down, the order-service won’t have access to the data it needs to perform its duties. If it needed the user’s address, for example, the required GET request would fail, and the order process could not be completed.
Since most of your online shop likely does something related to users, this means that if the account-management-service, which is responsible for the user, is down, none of your services that need user information can function.
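The runtime dependency can be sketched as follows. A hypothetical client stands in for the GET request to the account-management-service, and an `up` flag simulates an outage; the class and method names are assumptions for illustration. As soon as the flag is off, the order-service cannot complete any order, however healthy it is itself.

```python
class ServiceUnavailable(Exception):
    """Raised when a downstream service cannot be reached."""


class AccountManagementClient:
    """Stand-in for a synchronous GET against the account-management-service."""

    def __init__(self, addresses: dict[str, str]):
        self.addresses = addresses
        self.up = True  # set to False to simulate an outage

    def get_address(self, user_id: str) -> str:
        if not self.up:
            raise ServiceUnavailable("account-management-service is down")
        return self.addresses[user_id]


class OrderService:
    """Needs the user's address at request time, so it inherits the outage."""

    def __init__(self, accounts: AccountManagementClient):
        self.accounts = accounts

    def place_order(self, user_id: str, item: str) -> dict:
        try:
            address = self.accounts.get_address(user_id)  # runtime dependency
        except ServiceUnavailable:
            return {"status": "failed", "reason": "user data unavailable"}
        return {"status": "placed", "item": item, "ship_to": address}
```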
The Solution - Customer Journeys with Co-located Data
How can we solve, or at least ease, the scalability and fault-tolerance problems? A proven approach is co-locating data using an event-driven architecture. Here, we place all data that a customer journey needs read access to directly into that journey’s own persistence. The service continuously listens to the changes each service publishes on the event stream and builds its own replica containing only the data relevant to it. It can therefore keep functioning even when the other services are down. Not only that, but since it stores only what is relevant to itself, it can keep its hydrated model very simple and thereby reduce complexity.
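A minimal sketch of such a replica inside the order-service, assuming a plain Python list stands in for the event stream and the stored offset mimics how a consumer tracks its position in a log-based stream. The event names (`UserRegistered`, `UserAddressChanged`, `UserDeleted`, ...) are hypothetical. The replica keeps only user id to shipping address and ignores every other field.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    type: str
    user_id: str
    data: dict = field(default_factory=dict)


class UserAddressReplica:
    """The order-service's local read model: user_id -> shipping address only."""

    def __init__(self):
        self.addresses: dict[str, str] = {}
        self.offset = 0  # how far into the stream events have been applied

    def apply(self, event: Event) -> None:
        if event.type in ("UserRegistered", "UserAddressChanged"):
            self.addresses[event.user_id] = event.data["address"]
        elif event.type == "UserDeleted":
            self.addresses.pop(event.user_id, None)
        # Anything else (birthday, purchase history, ...) is simply ignored.

    def catch_up(self, stream: list[Event]) -> None:
        """Apply every event published since the last catch-up."""
        for event in stream[self.offset:]:
            self.apply(event)
        self.offset = len(stream)
```

Reading an address is now a local lookup in `addresses`, so it keeps working while the account-management-service is down; the price is that the replica is only as fresh as its last `catch_up`.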
To address the silent introduction of an Event-Stream: we don’t necessarily NEED one. The replicas could also be built by the signup-service actively pushing changes to the order-service with POST requests. However, we can decouple the services further by making the SENDING and RECEIVING of messages independent of each other, i.e. by going asynchronous. A messaging or eventing platform is a natural fit for that.
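The difference between pushing and going asynchronous can be sketched in a few lines. `push_sync` stands in for a POST that only succeeds if the receiver is up at send time; `Channel` is a hypothetical, in-memory stand-in for a messaging platform that accepts messages regardless and delivers them once the receiver is reachable. A real platform adds durability, retries and much more; this only shows the decoupling of sending from receiving.

```python
from collections import deque


class OrderServiceInbox:
    """Receiving side; `up` simulates whether the order-service is running."""

    def __init__(self):
        self.up = True
        self.received: list[dict] = []

    def handle(self, message: dict) -> None:
        if not self.up:
            raise ConnectionError("order-service is down")
        self.received.append(message)


def push_sync(receiver: OrderServiceInbox, message: dict) -> None:
    """POST-style push: succeeds only if the receiver is up at send time."""
    receiver.handle(message)


class Channel:
    """Asynchronous channel: sending never waits for the receiver."""

    def __init__(self):
        self._pending: deque[dict] = deque()

    def send(self, message: dict) -> None:
        self._pending.append(message)

    def deliver_to(self, receiver: OrderServiceInbox) -> None:
        while self._pending:
            receiver.handle(self._pending[0])  # raises if the receiver is still down
            self._pending.popleft()  # drop only after successful handling

    def __len__(self) -> int:
        return len(self._pending)
```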
If I am the order-service and don’t care about the user’s birthday or purchase history, I don’t have to store that data.
“But wait, what about the case where the user’s payment fails and the order-service needs to notify the account-management-service that the user doesn’t have any valid credit cards? Doesn’t that mean this customer journey still breaks if the account-management-service is down?”
Not necessarily. If your service has to wait for a mutation to an entity owned by another service, you won’t be able to proceed from a business perspective: you’re hitting the hard limit that the underlying business processes place on how decoupled your services can be. In most cases, though, you don’t really have to wait. In our case, for example, the order-service could simply publish an event stating that the payment failed, and the account-management-service can pick it up once it is available again.
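A minimal sketch of that flow, assuming an append-only in-memory log in place of a real event stream and a `PaymentFailed` event whose name and shape are hypothetical. The order-service finishes its step without calling the account-management-service; "down" is modelled simply as the account-management-service not having polled yet.

```python
class EventLog:
    """Minimal append-only log standing in for the event stream."""

    def __init__(self):
        self.events: list[dict] = []

    def publish(self, event: dict) -> None:
        self.events.append(event)


class OrderService:
    def __init__(self, log: EventLog):
        self.log = log

    def checkout(self, user_id: str, card_valid: bool) -> str:
        if not card_valid:
            # Record the fact and finish; no synchronous call to account-management.
            self.log.publish({"type": "PaymentFailed", "user_id": user_id})
            return "payment_failed"
        return "placed"


class AccountManagementService:
    """Owner of the user entity; applies PaymentFailed whenever it next polls."""

    def __init__(self, log: EventLog):
        self.log = log
        self.offset = 0  # position of the next unread event
        self.users_without_valid_card: set[str] = set()

    def poll(self) -> None:
        new_events = self.log.events[self.offset:]
        for event in new_events:
            if event["type"] == "PaymentFailed":
                self.users_without_valid_card.add(event["user_id"])
        self.offset += len(new_events)
```

Until `poll` runs, the user entity is briefly out of date; this is the eventual consistency that the approach accepts in exchange for decoupling.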
Many organisations have been burnt by deciding to go with entity-services, so we recommend that you think twice before going down that route. Cutting our services’ (and, with that, our teams’) responsibilities along customer journeys with co-located data brings many benefits, such as:
- high team-independence and therefore agility
- a high degree of independence from the remainder of the system during execution, allowing for pin-point scaling
- much smaller deployment footprint of new and updated features
- low chattiness during customer journeys, leading to higher performance and reliability
Of course, this increase in resilience and scalability doesn’t come entirely for free; as with other architectures, there are downsides. In this case, we have to deal with eventual consistency and data duplication in exchange for the increase in decoupling. Experience has shown us that the trade-off is worth it in a good deal of cases.
If you’d like to investigate whether such approaches could help alleviate pains in your architecture, don’t hesitate to get in touch!
1 Here, we use the Gateway as a stand-in for other orchestration units, such as BFFs.