Distributed architectures like microservices have been trending for quite a while. At the same time, there are still plenty of monolithic systems in the wild. Monoliths are not inherently bad. Architectural styles such as the microkernel architecture help organize complex monolithic codebases. But confining many developers to a single deployable artifact creates challenges. It is very hard to optimize a monolithic codebase for heterogeneous features. Additionally, developers remain tightly coupled, both during development and during deployment.
Regardless of their architecture, some monoliths become legacy systems. These systems typically provide important business capabilities, but everyone who has to deal with them is dissatisfied:
- Developers are slowed down by highly coupled and complex code, outdated frameworks and untested behavior.
- Ops administrators worry about frequent production failures.
- Managers are frustrated by rising costs, slow time-to-market or poor performance.
When the collective pain becomes too great, it is time to migrate these legacy monoliths. This article summarizes some of the most useful migration patterns for the logic and data of a monolith.
The most important part of every migration is agreeing on clear goals. The why matters more than the how. Only a shared understanding of the goals makes these long journeys successful. Typical goals include:
- Improve scalability to handle peak loads
- Maximize developer efficiency to free up IT budget
- Shorten time-to-market to be faster than the competitors
Migrating the Monolith
Effective migrations need to take more than just the code into account. Selecting the right migration patterns for business data is just as important.
Migrating features out of the Monolith with Strangler Fig
The idea behind this migration pattern is to gradually stop using features of the legacy system and switch to migrated features instead. Originally proposed by Martin Fowler, the pattern is named after the strangler fig, a plant that slowly encircles (strangles) a host tree until it is strong enough to stand on its own [1].
At the outset, the outside world accesses legacy features through APIs. As a preparatory step, a reverse proxy is placed between the outside world and the monolith.
Next, a new application is created that iteratively implements individual features of the API.
How the new application is built depends on the internal coupling of the legacy monolith. Existing features can either be re-implemented from scratch, or legacy code can be reused through copy-and-own or even through shared libraries.
Once a migrated feature is finalized, the reverse proxy routes its API calls to the new application, effectively switching off the feature in the monolith.
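The routing decision of the reverse proxy can be sketched in a few lines of Python. This is a minimal sketch assuming path-prefix routing; the backend URLs, the `/invoices` prefix and the `StranglerRouter` class are hypothetical. In practice, this switch would usually live in the configuration of an existing reverse proxy rather than in hand-written code.

```python
from dataclasses import dataclass, field

# Hypothetical backend addresses used for illustration only.
LEGACY_BACKEND = "http://legacy-monolith.internal"
NEW_BACKEND = "http://new-app.internal"


@dataclass
class StranglerRouter:
    """Decides per request whether the monolith or the new app answers."""

    # Path prefixes of features that have been switched to the new application.
    migrated_prefixes: set[str] = field(default_factory=set)

    def migrate(self, prefix: str) -> None:
        """Route a finalized feature to the new application."""
        self.migrated_prefixes.add(prefix)

    def roll_back(self, prefix: str) -> None:
        """Re-activate the untouched legacy feature, e.g. after finding a bug."""
        self.migrated_prefixes.discard(prefix)

    def backend_for(self, path: str) -> str:
        """Return the full upstream URL for an incoming request path."""
        for prefix in self.migrated_prefixes:
            # Match "/invoices" and "/invoices/42", but not "/invoices-archive".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return NEW_BACKEND + path
        return LEGACY_BACKEND + path


router = StranglerRouter()
router.migrate("/invoices")
print(router.backend_for("/invoices/42"))  # http://new-app.internal/invoices/42
print(router.backend_for("/customers/7"))  # http://legacy-monolith.internal/customers/7
```

Switching a feature over or back is a single set operation, which keeps the cut-over itself small and reversible.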
The legacy feature is not removed immediately. In fact, changes to the legacy system are avoided altogether. This makes it possible to compare the API output of the legacy feature with that of the migrated feature (e.g., through A/B testing), while also providing a safety net: if bugs are found in the migrated feature, the legacy feature can be re-activated at any time. The legacy code itself can be removed whenever it is convenient.
The major benefit is that migrated features can be developed iteratively until they are production-ready. The switch-over itself is then swift and, if necessary, reversible.
The greatest challenge for this pattern is changing behavior during the migration. It is very tempting to perform in-depth optimizations while migrating features. Unfortunately, this undermines comparability with the legacy monolith. It is recommended to perform feature-changing optimizations only after a successful migration.
If the monolith is too tightly coupled to identify a sufficiently separable feature, it may be necessary to restructure the monolith internally first. The following pattern describes how this can be done.
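One way to compare outputs is a parallel run, in which each request is sent to both implementations and differences are recorded. The sketch below assumes side-effect-free (read-only) requests, because write requests would otherwise be executed twice. The `ParallelRun` class and the handler signatures are hypothetical stand-ins for HTTP calls to the two backends.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Request = dict[str, Any]
Response = dict[str, Any]
Handler = Callable[[Request], Response]

log = logging.getLogger("strangler.parallel_run")


@dataclass
class ParallelRun:
    legacy: Handler    # hypothetical call into the monolith
    migrated: Handler  # hypothetical call into the new application
    serve_migrated: bool = False  # flip once the outputs agree
    mismatches: list[tuple[Request, Response, Optional[Response]]] = field(
        default_factory=list
    )

    def handle(self, request: Request) -> Response:
        legacy_response = self.legacy(request)
        try:
            migrated_response = self.migrated(request)
        except Exception:
            # A crash in the new feature is recorded, and the legacy
            # response is served as the safety net.
            log.exception("migrated feature failed for %r", request)
            self.mismatches.append((request, legacy_response, None))
            return legacy_response
        if migrated_response != legacy_response:
            log.warning("output differs for %r: legacy=%r migrated=%r",
                        request, legacy_response, migrated_response)
            self.mismatches.append((request, legacy_response, migrated_response))
        return migrated_response if self.serve_migrated else legacy_response
```

Serving the legacy response until no more mismatches are recorded keeps users on proven behavior while the migrated feature matures.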
Organizing the Monolith internally through Branch by Abstraction
Sometimes no feature is sufficiently decoupled to be migrated out of the monolith. In these cases, the legacy monolith must be refactored first. The Branch by Abstraction pattern helps with this.
The identified feature is first wrapped in an abstraction (e.g., an interface). All dependent features are then changed to use the abstraction instead of the concrete implementation.
Once the feature is wrapped in an abstraction, a new and improved implementation can be added. This new implementation does not even have to reside within the monolith.
As with Strangler Fig, the legacy implementation itself is not changed (apart from adding the abstraction). The legacy feature remains as a safety net, and the behavior of the new implementation can be compared with its legacy counterpart.
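A minimal Python sketch of these steps, using an abstract base class as the abstraction. The tax feature, the 19 % rate and all class and function names are hypothetical; `legacy_compute_tax` stands in for untouched legacy code.

```python
from abc import ABC, abstractmethod


def legacy_compute_tax(amount_cents):
    """Stands in for untouched legacy code deep inside the monolith."""
    return amount_cents * 19 // 100


# Step 1: wrap the feature in an abstraction.
class TaxCalculator(ABC):
    @abstractmethod
    def tax_for(self, net_cents: int) -> int:
        """Return the tax in cents for a net amount in cents."""


class LegacyTaxCalculator(TaxCalculator):
    """Adapter that delegates to the unchanged legacy code."""

    def tax_for(self, net_cents: int) -> int:
        return legacy_compute_tax(net_cents)


# Step 2: dependent features only know the abstraction.
def invoice_total(net_cents: int, calculator: TaxCalculator) -> int:
    return net_cents + calculator.tax_for(net_cents)


# Step 3: add the new implementation (it could also call a service
# outside the monolith) and select it via a toggle.
class NewTaxCalculator(TaxCalculator):
    def __init__(self, rate_percent: int = 19):
        self.rate_percent = rate_percent

    def tax_for(self, net_cents: int) -> int:
        return net_cents * self.rate_percent // 100


def make_tax_calculator(use_new_implementation: bool) -> TaxCalculator:
    """Switching back to the legacy implementation only needs the flag."""
    if use_new_implementation:
        return NewTaxCalculator()
    return LegacyTaxCalculator()


print(invoice_total(1000, make_tax_calculator(False)))  # 1190
print(invoice_total(1000, make_tax_calculator(True)))   # 1190
```

Because both implementations coexist behind the same interface, their results can be compared for the same inputs before the toggle is flipped permanently.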
While this may seem like a lot of effort to extract a single feature, it is one of the few ways to prepare a legacy monolith for migration iteratively. The only alternative to refactoring a highly coupled legacy monolith is a risky big-bang re-implementation of the entire monolith. Or, as Fowler puts it: "The only thing a Big Bang rewrite guarantees is a Big Bang!"
Do not forget the Data
Business logic usually depends on the availability of business data. When migrating features, the related data quickly becomes a roadblock. It is tempting to allow shared access to the legacy database, but this short-term fix is usually the worst possible solution. It severely limits the implementation options for a migrated feature, tightly couples the legacy system to its migration and, in the case of write operations, poses a high risk of consistency issues.
A migration requires clear separation:
- of the data structure
- of write operations (i.e., data ownership)
- of the impact of structural changes
Synchronizing the Data
When applying this pattern, data is extracted from the legacy database and loaded into the datastore of the migrated feature. During loading, structural changes can be applied to suit the migrated feature. The target datastore can be anything from a new schema or a dedicated database to an entirely different database type (e.g., a document-oriented database).
Many frameworks support data-intensive synchronization, for example Kafka Connect [2] or StreamSets [3]. The synchronization consists of two phases. First, the bulk of the data is transferred. After that, only incremental changes in the legacy database are sent. Incremental changes can be collected push-based (e.g., with Change Data Capture) or pull-based (e.g., with periodic polling).
Migrated features access data in their own datastore, which scales much better than querying data over the network. Coupling to the legacy data structure can be avoided, and data ownership can be transferred when needed: as long as the synchronization is active, ownership lies with the legacy system. After a feature is fully migrated, the synchronization can be discarded, which transfers ownership to the migrated datastore.
On the downside, setting up a synchronization can be resource-intensive. The data itself may be complex and require a lot of business logic to be interpreted correctly. Unlike with Strangler Fig, there is no safety net, because the synchronization only runs in one direction: once ownership has been transferred, going back is complicated. If the interpretation of legacy data cannot easily be replicated, the following pattern may provide an alternative.
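A pull-based variant of the two phases can be sketched with Python's built-in `sqlite3` module. The sketch assumes a hypothetical legacy table `customer` whose `row_version` column increases on every write, uses a plain dict as the migrated feature's document store and ignores deletions. A real setup would more likely rely on one of the frameworks mentioned above or on Change Data Capture.

```python
import sqlite3

COLUMNS = "id, first_name, last_name, street, city, row_version"


def to_document(row: tuple) -> dict:
    """Structural change applied during loading: flat legacy row -> nested document."""
    cust_id, first, last, street, city, _version = row
    return {
        "id": cust_id,
        "name": f"{first} {last}",
        "address": {"street": street, "city": city},
    }


class CustomerSync:
    """One-way synchronization; the legacy database remains the owner."""

    def __init__(self, legacy: sqlite3.Connection, target: dict):
        self.legacy = legacy
        self.target = target  # stands in for the migrated feature's datastore
        self.watermark = 0    # highest row_version copied so far

    def _copy(self, rows) -> None:
        for row in rows:
            document = to_document(row)
            self.target[document["id"]] = document
            self.watermark = max(self.watermark, row[-1])

    def initial_load(self) -> None:
        """Phase 1: transfer the bulk of the data."""
        self._copy(self.legacy.execute(f"SELECT {COLUMNS} FROM customer"))

    def poll_changes(self) -> None:
        """Phase 2: pull only rows changed since the last run (call periodically)."""
        self._copy(self.legacy.execute(
            f"SELECT {COLUMNS} FROM customer WHERE row_version > ? ORDER BY row_version",
            (self.watermark,),
        ))
```

Polling only sees rows that still exist, so detecting deletions usually needs Change Data Capture or soft-delete flags in the legacy schema.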
Using the Monolith as a Data Access Layer
The business logic to interpret legacy data already exists within the legacy monolith. Depending on its complexity, it can be simpler to extend the monolith to provide tailored data access. This gradually transforms the monolith into a data access layer.
This pattern leverages existing logic within the legacy monolith. Ownership remains with the legacy system, and all state changes are still performed by the monolith.
While this may seem like a balanced solution, in most cases it can only serve as an intermediate migration step. Migrated features remain dependent on the monolith, and accessing data over the network clearly limits performance and scalability.
There are other patterns for data migration, all with trade-offs. Like the patterns described here, they either replicate data (e.g., Primary and Replica) or make it available from the source (e.g., Database View).
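The sketch below illustrates the idea with Python's standard `http.server` and `urllib` modules. The endpoint path, the cryptic `STAT_CD` column and the status mapping are hypothetical. The point is that the interpretation logic stays in the monolith, which gains a tailored read endpoint, while the migrated feature only reads through it.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# ---- inside the legacy monolith ----------------------------------------
# Hypothetical legacy data with cryptic column names and codes.
LEGACY_ROWS = {42: {"KND_NR": 42, "STAT_CD": "A3"}}


def legacy_status_text(code: str) -> str:
    """Existing business logic that already knows how to interpret the codes."""
    return {"A1": "prospect", "A3": "active", "X9": "blocked"}.get(code, "unknown")


class DataAccessHandler(BaseHTTPRequestHandler):
    """New, tailored read endpoint added to the monolith."""

    PREFIX = "/api/customer-status/"

    def do_GET(self) -> None:
        if not self.path.startswith(self.PREFIX):
            self.send_error(404)
            return
        try:
            customer_id = int(self.path[len(self.PREFIX):])
            row = LEGACY_ROWS[customer_id]
        except (KeyError, ValueError):
            self.send_error(404)
            return
        body = json.dumps(
            {"id": customer_id, "status": legacy_status_text(row["STAT_CD"])}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the example quiet


def start_monolith(port: int = 0) -> HTTPServer:
    """Serve the endpoint in a background thread (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), DataAccessHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# ---- inside the migrated feature ----------------------------------------
def fetch_customer_status(base_url: str, customer_id: int) -> Optional[str]:
    """Read-only access over the network; all writes stay in the monolith."""
    url = f"{base_url}/api/customer-status/{customer_id}"
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return json.load(response)["status"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Every call crosses the network, which is exactly the performance and coupling cost that makes this pattern an intermediate step.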
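As an illustration of the Database View pattern, the following sketch uses SQLite. Table, column and view names are hypothetical. The view exposes only the columns the migrated feature needs, under its own names, while the legacy table and its ownership stay with the monolith. In SQLite, plain views are read-only; other databases would typically enforce this through permissions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE legacy_order (        -- owned and written by the monolith
    ord_no        INTEGER PRIMARY KEY,
    cust_no       INTEGER,
    amt_cents     INTEGER,
    stat          TEXT,            -- 'O' = open, 'C' = closed
    internal_note TEXT             -- not meant for other consumers
);

-- Tailored projection for the migrated feature:
-- renamed columns, no internal fields, only open orders.
CREATE VIEW open_orders_v AS
SELECT ord_no            AS order_id,
       cust_no           AS customer_id,
       amt_cents / 100.0 AS amount
FROM legacy_order
WHERE stat = 'O';
"""


def open_orders_for(conn: sqlite3.Connection, customer_id: int) -> list:
    """Query issued by the migrated feature; it only knows the view."""
    return conn.execute(
        "SELECT order_id, amount FROM open_orders_v "
        "WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    ).fetchall()
```

The legacy schema can still change behind the view, as long as the view definition is updated to keep its contract stable.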
As is often the case, the right migration patterns depend on the given situation. This article outlines some of the more widely used migration patterns.
Strangler Fig is one of the most popular patterns for monoliths because it allows comparing the legacy system with the new system and provides a built-in safety net. Depending on the internal coupling, preparations inside the legacy monolith may be necessary before features can be migrated.
The related business data must be handled with its own migration pattern. There is no single go-to pattern for data migration: data migration is always complex, and each pattern has significant trade-offs.
Last piece of advice
Analyzing the codebase and the surrounding company context helps when planning migrations. But perfect information will never be available. It is important to start a migration in small increments in order to learn from challenges and adapt the chosen patterns.
- [1] Strangler Fig by Martin Fowler: https://martinfowler.com/bliki/StranglerFigApplication.html