Breaking Down Migration to Microservice Databases

As companies feel a growing need to digitally transform, much emphasis has been placed on the microservices architecture, with its improved scalability and distributed design.  While those qualities matter for some systems, adopting microservices is equally about the universal DevOps goals of improving lead times and reducing the batch size of releases, ultimately leading to more flexible and frequent production deployments of higher-quality software.

Unfortunately, microservices is an advanced architecture, and it can be difficult to achieve the desired ideals of high cohesion and low coupling from scratch, in part because success relies heavily on understanding the domain boundaries within your system up front: knowing which parts will change and deploy together, and which separately.  This is actually good news for owners of a monolithic system wanting to transform, as the most successful adoptions of microservices come from decomposing and slowly evolving monolithic applications where the domain is already understood.

In his recently released book, "Migrating to Microservice Databases", Edson Yanaga, Red Hat’s Director of Developer Experience, takes an in-depth look at evolving relational databases to meet the challenges of zero-downtime deployments and decentralized data management.  Generically characterized, each microservice should be built as an independent product with a mindset for automation, evolution and failure tolerance, and, in the simplified formulation, each microservice should have its own database.  Yanaga’s book examines several strategies for reshaping an existing relational database to meet these needs.

A challenging aspect of zero-downtime deployments is that, throughout an upgrade, your database needs to successfully respond to multiple versions of the same service.  During a blue/green style deployment, the current “blue” deployment runs concurrently with the newer “green” version until there is a single switchover to the newer version.  In a canary deployment, the newer version is gradually rolled out to the production nodes, allowing for a quick rollback if the canary dies.  In either scenario, the process requires that a “normal” database change be broken up into non-destructive steps for greater flexibility and rollback capability.

For example, take this relatively simple SQL statement to fix an incorrectly named column:

ALTER TABLE [customers] RENAME COLUMN [wrong] TO [correct];

Even this straightforward statement needs to be broken up and implemented over several different deployments to support multiple software versions simultaneously.

First add the correctly named column:

ALTER TABLE [customers] ADD COLUMN [correct] VARCHAR(20);

Then slowly copy over the values, a small “shard” at a time, for better performance and to avoid holding database locks for long periods:

UPDATE [customers] SET [correct] = [wrong] WHERE [id] BETWEEN 1 AND 100;

UPDATE [customers] SET [correct] = [wrong] WHERE [id] BETWEEN 101 AND 200;

...
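
While the copy runs and both service versions remain live, new writes also need to land in both columns so that neither version reads stale data.  Yanaga details the exact sequencing for this in the book; purely as an illustration, and assuming a PostgreSQL-style database, a trigger along the lines of the hypothetical sketch below could mirror changes between the two columns (the function and trigger names are invented for the example):

CREATE OR REPLACE FUNCTION sync_customer_name_columns() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    -- On insert, populate whichever column the writing version left empty
    IF NEW.correct IS NULL THEN
      NEW.correct := NEW.wrong;
    ELSE
      NEW.wrong := NEW.correct;
    END IF;
  ELSE
    -- On update, propagate whichever column actually changed
    IF NEW.correct IS DISTINCT FROM OLD.correct THEN
      NEW.wrong := NEW.correct;
    ELSIF NEW.wrong IS DISTINCT FROM OLD.wrong THEN
      NEW.correct := NEW.wrong;
    END IF;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_sync_name_columns
BEFORE INSERT OR UPDATE ON customers
FOR EACH ROW EXECUTE FUNCTION sync_customer_name_columns();

In this sketch, the trigger itself would be dropped in the same step that removes the old column.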

Finally remove the incorrectly named column, but only when you are sure it is no longer in use anywhere:

ALTER TABLE [customers] DROP COLUMN [wrong];

Yanaga details how to sequence this, and other nuanced database migrations, along with the corresponding code releases, to achieve zero downtime.

In a later chapter, Yanaga discusses various strategies for integrating data between decentralized services.  Typically a monolithic application and its database are tightly coupled, and the database tables are organically entangled.  Following the guideline that each microservice should have its own database, yet recognizing that a monolith database already exists, the database again needs to be slowly evolved so that each table is separated out to its owning service.  But services rarely operate in isolation; they often depend on data held in the other tables.  So these data integrations, which were unintentionally occurring in the persistence layer, are gradually migrated to well-defined service interactions.  In essence, the data is pulled apart only to be systematically brought back together again.  Yanaga evaluates traditional relational database technologies, such as views, triggers and ETL tools, plus the more advanced techniques of data virtualization and event sourcing, to meet the needs of distributed data integration.
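
As a simple illustration of the first of those techniques, a database view can expose just the slice of monolith-owned data that another service needs, keeping the new service away from the underlying tables while ownership boundaries are being worked out.  The view, column and role names below are hypothetical, written in the same PostgreSQL-flavored syntax as the trigger sketch above:

-- Hypothetical sketch: the monolith still owns the customers table, but a
-- separate billing service reads contact data only through this narrow view
CREATE VIEW billing_customer_contacts AS
SELECT id, name, email
FROM customers;

-- Grant the billing service's account read access to the view rather than the base table
GRANT SELECT ON billing_customer_contacts TO billing_service;

Exposing a deliberately narrow view like this limits how much of the monolith’s schema leaks into the new service, which keeps the later move to a real service interface smaller.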