
Towards a New Modular Replication Architecture

Over the past week, I’ve been refactoring the way that the Drizzle kernel communicates with plugin modules that wish to implement functionality related to replication. There are many, many potholes in the current way that row-based replication works in Drizzle, and my refactoring efforts were solely focused on three things:

  • Make an interface for replicating actions which occur inside a server that is clear and simple to understand for the caller of the interface.
  • Make an interface that uses only documented data structures and standardized containers.
  • Completely remove the notion that logging is tightly-coupled with replication.

Let me expand on these three goals, and why they are critical to the success of a replication architecture.

Simple, Clear Interfaces Designed for the Interface Caller

I have a very strong belief that interfaces should always be designed from the perspective of the caller of the interface. By focusing on the caller, you produce interfaces that are inherently more stable and simpler than if you design from the perspective of the consumer of the interface’s objects, or from the perspective of the internal implementation. If you design an interface from the perspective of how the internals of a system work, then you invariably end up with interfaces that reflect the internal implementation of something. Interfaces should be generalized, not specific to an implementation. I’ll expand on this with an example in the next section.

Interfaces should be as simple and clear as possible. Why? Because people won’t implement the darn interfaces if they can’t understand them. Simple as that. Drizzle wants a vibrant ecosystem of plugins and modules. If developers have a difficult time understanding an interface, that means we get fewer plugins, and that just won’t do.

Interfaces Must Use Documented Data Structures and Standard Containers

This may seem like a no-brainer, but if you look at the interfaces in MySQL, you will notice that many of them pass pointers to internal MySQL objects (THD, st_table, TABLE_SHARE, etc). You will also notice that in order to use the interface, implementors must become intimately familiar with the custom containers and iterators (List<Item>, List_iterator<>, List_iterator_fast<>, etc) before they can really implement their plugins. Many of these custom objects and containers are poorly documented or not documented at all. This makes the plugin developer’s life harder. This is why there are so very few plugins written for MySQL that have not been written by MySQL engineers who are familiar with the internal implementation of the server. The harder life is for the plugin developer, the smaller the ecosystem of plugins will be. Period.

Now, why do I say “only standard containers”? Well, how many of you readers realized that List<Item> was not actually a list of Items; that in fact it is a list of Item pointers? Probably very few. By using the STL and its very-well-documented, standardized container classes in the interface, we spare the plugin developer from having to learn Yet-Another-Custom template implementation. Easier for the plugin developer, so faster time-to-market for plugins.
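To make the List<Item> point concrete, here is a minimal sketch of how a standard container spells out its element type. The Item struct and the nameAt() helper are hypothetical stand-ins for illustration only, not MySQL or Drizzle code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an internal server object.
struct Item {
  std::string name;
};

// With a standard container, the element type is exactly what the
// declaration says: values in one case, pointers in the other.
typedef std::vector<Item> ItemValues;
typedef std::vector<Item *> ItemPointers;

static_assert(std::is_same<ItemValues::value_type, Item>::value,
              "a vector of Item holds Item values");
static_assert(std::is_same<ItemPointers::value_type, Item *>::value,
              "a vector of Item* holds pointers");

// Hypothetical helper: the pointer dereference is visible in the
// signature and in the body, so nothing about ownership is hidden.
const std::string &nameAt(const ItemPointers &items, std::size_t index) {
  assert(index < items.size() && items[index] != nullptr);
  return items[index]->name;
}
```

A reader of an interface that takes std::vector<Item *> knows from the declaration alone that the elements are pointers, which is exactly the information List<Item> hides.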

Laying Waste to Tight-Coupling of Replication with “Other Stuff” (Like Logging)

It’s no secret that MySQL’s internal subsystems are very tightly-coupled. Back in 2005, when I wrote the system internals chapter in Pro MySQL, I noted that the internal subsystems of MySQL were interwoven, tightly-coupled, and a bit of a bowl of spaghetti. Very little substantial progress has been made since then on refactoring those systems into separate modules. Replication is no different. In MySQL, the replication subsystem is tightly coupled to a logging subsystem that most readers would be familiar with as “The Binlog”.

Here’s the problem with that: it ties the implementation (the binlog) to the replication interface. Why is this bad? Well, it means that if you want to modularize the replication system, you can’t do so without fiddling with the binlog. And likewise, if you want to improve the binlog, you invariably will be affecting the replication system.

Tight-coupling of implementation to interfaces is disastrous from an architectural point of view. Touching either one runs the risk of breaking the other. When these types of architectural faux-pas litter the code, it means that development and debugging life-cycles are extended well past what they should be, because developers working on one system tend to affect other subsystems that shouldn’t really be affected by their changes.

And so, one of the big goals of this refactoring work was to produce interfaces which did not interfere with each other. Changing an implementation shouldn’t break the whole system.

A New, Modular Replication Architecture

OK, enough about the goals of the new replication architecture. Let’s see some diagrams and some code, eh?

In Drizzle, we strive to have a clear separation between the core drizzled kernel and modules which implement functionality. The module communicates with the kernel through an API. The goal of this API is to shield the internal implementation of the core kernel from plugin/module developers. All plugin developers need to know is what will be passed to them from the kernel (documented data structures) and what to hand back to the kernel as output (a return code, for instance). The kernel doesn’t care how the plugin implements something. All it cares about is that the plugin provides methods which match the API’s base class interfaces.

There are a number of documented data structures which are used in the API’s calling interface. These data structures are all classes derived from the Google Protocol Buffers (GPB) Message class. Because they are all GPB-derived message classes, they all work in the same way: via the Protocol Buffers Message API, which is well-documented and handles versioning when changes to these message classes are needed. The current list of Message classes is:

  • EventList — describes a list of Events
  • Event — describes a single Event
  • StartTrxEvent — describes an Event which delineates the start of a transaction
  • EndTrxEvent — describes an Event which delineates the end of a transaction
  • InsertRecordEvent — describes a specialized type of Event which occurs when a new record in a table is inserted
  • DeleteRecordEvent — describes a specialized type of Event which occurs when a record is removed from a table
  • UpdateRecordEvent — describes a specialized type of Event which occurs when a record in a table is modified

More Event specialty classes will be added as needed, but they will all follow the same interface as each other.

In addition to the above data classes, for a replication architecture, what are the basic “worker classes” that plugins may implement? For the first phase of this new replication system, I have three basic class interfaces:

  • EventReplicator — Responsible for replicating an EventList to an EventApplier
    • The main interface method is bool EventReplicator::replicateEvents(EventApplier *, const EventList&)
  • EventApplier — Responsible for consuming an EventList and applying it to a target
    • The main interface method is bool EventApplier::applyEvents(const EventList&)
  • EventReader — Responsible for reading an EventList from some source and passing it to an EventApplier
    • The main interface method is bool EventReader::getEvents(EventList *)
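The three interfaces above could be sketched as abstract base classes along the following lines. This is only a sketch: the Event and EventList structs are simplified stand-ins for the generated GPB message classes, their fields are assumptions, and CountingApplier and PassThroughReplicator are hypothetical implementations added purely to show a caller using the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for the GPB message classes (fields are assumed).
struct Event {
  enum Type { START_TRX, INSERT_RECORD, UPDATE_RECORD, DELETE_RECORD, END_TRX };
  Type type;
  std::string payload;
};

struct EventList {
  std::vector<Event> events;
};

// Consumes an EventList and applies it to some target.
class EventApplier {
public:
  virtual ~EventApplier() {}
  virtual bool applyEvents(const EventList &list) = 0;
};

// Replicates an EventList to an EventApplier.
class EventReplicator {
public:
  virtual ~EventReplicator() {}
  virtual bool replicateEvents(EventApplier *applier, const EventList &list) = 0;
};

// Reads an EventList from some source.
class EventReader {
public:
  virtual ~EventReader() {}
  virtual bool getEvents(EventList *list) = 0;
};

// Hypothetical applier that only counts the events it is handed.
class CountingApplier : public EventApplier {
public:
  bool applyEvents(const EventList &list) override {
    applied += list.events.size();
    return true;
  }
  std::size_t applied = 0;
};

// Hypothetical replicator that forwards the list straight to the applier.
class PassThroughReplicator : public EventReplicator {
public:
  bool replicateEvents(EventApplier *applier, const EventList &list) override {
    assert(applier != nullptr);  // precondition: caller supplies an applier
    return applier->applyEvents(list);
  }
};
```

Note that nothing in these signatures mentions a log, a file, or a network connection; those choices belong to whoever implements the classes.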

But you might be thinking… Jay, where’s the Binlog?

There isn’t one in the interface. That’s the whole point. A “binlog” is an implementation, not an interface. Need a more concrete vision of how a module would use such an interface? OK, let’s imagine an example module, which we’ll call “RecoveryModule”. The purpose of this module would be to provide backup and restore capabilities to Drizzle. Let’s see how the interface to the Drizzle kernel would be implemented in the module…

The RecoveryModule Implementation Overview

As noted, the purpose of this module is clear: to provide backup and restore functionality for Drizzle. We’ll implement this functionality using a serialized transaction log. Here are the classes which we will map out for the module:

  • SynchronousReplicator — extends EventReplicator and is responsible for replicating event lists passed from the kernel to any registered EventAppliers in a synchronous, transactional way.
  • TrxLogWriter — extends EventApplier and is responsible for writing events to a serial transaction log.
  • TrxLogReader — extends EventReader and is responsible for reading events from the transaction log.
  • TrxLogApplier — extends EventApplier and is responsible for applying events it reads (via the EventReader) from the transaction log.

As you can see, each of the classes above extends a base interface class (more precisely, each inherits from a publicly defined interface class in /drizzled/plugin/). There is a clear separation of duties. Replicators replicate events. Appliers apply events. Readers read events. No mixing of one into the other. This way, one developer can work on writing records to a transaction log (the TrxLogWriter) while another works on applying those records during a recovery phase (the TrxLogApplier). The two developers don’t need to know anything about how the other class is implemented. They simply implement the interface of their class. In fact, the TrxLogWriter implementation might be in an entirely different module! The developer of the TrxLogApplier doesn’t care, and isn’t affected by that. All she needs to know is the interface of an EventReader and the EventApplier, and the details of the Event data structures (PODs). That’s it. Clean separation.
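As a rough sketch of that separation, the writer, reader, and applier might look something like the following. Everything here is an assumption made for illustration: Event and EventList are simplified stand-ins for the GPB messages, the "transaction log" is any standard stream with one line per event, and the applier's target is just a vector of strings.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-ins for the GPB message classes (fields are assumed).
struct Event { std::string description; };
struct EventList { std::vector<Event> events; };

class EventApplier {
public:
  virtual ~EventApplier() {}
  virtual bool applyEvents(const EventList &list) = 0;
};

class EventReader {
public:
  virtual ~EventReader() {}
  virtual bool getEvents(EventList *list) = 0;
};

// Appends one line per event to a serial log (any std::ostream).
class TrxLogWriter : public EventApplier {
public:
  explicit TrxLogWriter(std::ostream &log) : log_(log) {}
  bool applyEvents(const EventList &list) override {
    for (const Event &event : list.events)
      log_ << event.description << '\n';
    return log_.good();
  }
private:
  std::ostream &log_;
};

// Reads every remaining line of the log back into an EventList.
class TrxLogReader : public EventReader {
public:
  explicit TrxLogReader(std::istream &log) : log_(log) {}
  bool getEvents(EventList *list) override {
    assert(list != nullptr);
    std::string line;
    while (std::getline(log_, line))
      list->events.push_back(Event{line});
    return !list->events.empty();
  }
private:
  std::istream &log_;
};

// Applies recovered events to a target; here the target is a plain vector.
class TrxLogApplier : public EventApplier {
public:
  bool applyEvents(const EventList &list) override {
    for (const Event &event : list.events)
      recovered.push_back(event.description);
    return true;
  }
  std::vector<std::string> recovered;
};
```

The writer and the applier never refer to each other. They share only the base interfaces and the event data structures, so either one could be moved to a different module without touching the other.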

Here is a graphical overview of how such a module might look, with a separation of the components of the module, the plugin interface API, and the drizzled kernel.

You will note that I also have put in a couple other classes in the module, called SqlLogWriter, which inherits from EventApplier, and SqlLogReader, which inherits from EventReader. This is to show that the module could implement the MySQL General Log easily using the existing API. A general log is merely an implementation detail. The module developer can create a general log implementation just by specializing the EventApplier class to write, for instance, raw SQL records…

Also note the AsynchReplicator class… Mark Callaghan, you could implement a SemiSynchReplicator if you wanted. 🙂

Flow of Events from Drizzle Kernel to Module

So, how would the flow of events happen from the kernel to a module and back? I’ve put together a quick diagram of how the API would be used to communicate between the kernel and the module. You can see some API method calls, such as replicateEvents() or registerApplier(). These method names are of course still up for debate, but I think they adequately communicate what the API would look like and the purpose of the interface calls…
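In code, the flow might be sketched roughly as below. The method names replicateEvents() and registerApplier() come from the discussion above, but where registerApplier() lives is an assumption: the ReplicationHub class, its pushEvents() method, and the RecordingApplier are hypothetical names invented for this sketch, and Event and EventList are again simplified stand-ins for the GPB messages.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for the GPB message classes (fields are assumed).
struct Event { std::string description; };
struct EventList { std::vector<Event> events; };

class EventApplier {
public:
  virtual ~EventApplier() {}
  virtual bool applyEvents(const EventList &list) = 0;
};

class EventReplicator {
public:
  virtual ~EventReplicator() {}
  virtual bool replicateEvents(EventApplier *applier, const EventList &list) = 0;
};

// Hands the list to the applier and waits for its answer before returning.
class SynchronousReplicator : public EventReplicator {
public:
  bool replicateEvents(EventApplier *applier, const EventList &list) override {
    return applier != nullptr && applier->applyEvents(list);
  }
};

// Hypothetical kernel-side registry: modules register appliers, and the
// kernel pushes each EventList through the replicator to every one of them.
class ReplicationHub {
public:
  explicit ReplicationHub(EventReplicator *replicator) : replicator_(replicator) {
    assert(replicator_ != nullptr);
  }
  void registerApplier(EventApplier *applier) { appliers_.push_back(applier); }
  bool pushEvents(const EventList &list) {
    bool ok = true;
    for (EventApplier *applier : appliers_)
      ok = replicator_->replicateEvents(applier, list) && ok;
    return ok;
  }
private:
  EventReplicator *replicator_;
  std::vector<EventApplier *> appliers_;
};

// Hypothetical applier that records how many events it has seen.
class RecordingApplier : public EventApplier {
public:
  bool applyEvents(const EventList &list) override {
    seen += list.events.size();
    return true;
  }
  std::size_t seen = 0;
};
```

In this sketch the kernel only ever talks to the base classes, so swapping the SynchronousReplicator for an asynchronous one would not change the calling code.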

OK, so in my next blog post I’ll show some example code from my local branch which implements a module similar to the above. I’ll end this one now because for the next blog post I need to get some code highlighting done to make it easier to read…

Now Is the Time to Influence the Shape of the API

Robert Hodges, I’m lookin’ at you, kid! 😉

In all seriousness, this is the time when we are defining the interfaces between the kernel and the modules/plugins. If you have suggestions, want something included in the API, or in general want to tell me I’m full of shit, then join us on the Drizzle Discussion mailing list and let your thoughts be heard! 🙂