Articles – joinfu

Project Mulligan’s Architecture

jaypipes — Fri, 10 Aug 2018 16:56:14 +0000

This is the second and final part of series describing Project Mulligan, the OpenStack Redo. If you missed the first part go read it. It is about changes I’d make to the community and mission of OpenStack had I my way with a theoretical reboot of the project.

In this part, I’ll be discussing what I’d change about the architecture, APIs and technology choices of OpenStack in the new world of Project Mulligan.

A word on programming language

For those of you frothing like hyenas waiting to get into a religious battle over programming languages, you will need to put your tongues back into your skulls. I’m afraid you’re going to be supremely disappointed by this post, since I quite deliberately did not want to get into a Golang vs. Python vs. Rust vs. Brainfuck debate.

Might as well get used to that feeling of disappointment now before going further. Or, whatever, just stop reading if that’s what you were hoping for.

For the record, shitty software can be written in any programming language. Likewise, excellent software can be written in most programming languages. I personally have a love/hate relationship with all three programming languages that I code with on a weekly basis (Python, Golang, C++).

Regardless of which programming language might be chosen for Project Mulligan, I doubt this love/hate relationship would change. At the end of the day, what is most important is the communication mechanisms between components in a distributed system and how various types of data are persisted. Programming language matters in neither of those things. There are bindings in every programming language for communicating over HTTP and for persisting and retrieving data from various purpose-build data stores.

Redoing the architecture

There are four areas of system design that I’d like to discuss in relation to OpenStack and Project Mulligan:

component layout: the topology of the system’s components and how components communicate with each other
dependencies: the technology choices a project makes regarding dependencies and implementation
pluggability: the degree to which a project tolerates flexibility of underlying implementation
extensibility: the degree to which a project enables being used in ways that the project was not originally intended

Before getting to Project Mulligan’s architecture, let’s first discuss the architecture of OpenStack v1.

OpenStack’s architecture

OpenStack has no coherent architecture. Period.

Within OpenStack, different projects are designed in different ways, with the individual contributors on a project team making decisions about how that project is structured, what technologies should be dependencies, and how opinionated the implementation should be (or conversely, how pluggable it should be made)

Some projects, like Swift, are highly opinionated in their implementation and design. Other projects, like Neutron, go out of their way to enable extensibility and avoid making any sort of choice when it comes to underlying implementation or hardware support.

Swift

Taking a further look at Swift, we see it is designed using a router-like topology with a top-level Proxy server routing incoming client requests along to various stand-alone daemon components fulfilling the needs of different parts of the request (object storage/retrieval, container and account metadata lookup, reaper and auditor workers, etc).

From a technology dependency point of view, Swift has very few.

There’s no message queue. Instead, the (minimal) communication between certain Swift internal service daemons is done via HTTP calls, and most Swift service daemons push incoming work requests on to internal simple in-memory queues for processing.

There is no centralized database either. Swift replicates SQLite database files from one container/account server to another. These SQLite database files are replicated between other nodes in the Swift system via shellout calls to the rsync command-line tool,

Finally, the object servers require filesystems with xattr support in order to do their work. While Swift can work with the OpenStack Identity service (Keystone), it has no interdependency with any OpenStack service nor does it utilitize any shared OpenStack library code (the OpenStack Oslo project).

Swift’s authors built some minor pluggability into how some of their Python backend classes were written, however pluggability is mostly not a priority in Swift. I’m not really aware of anyone implementing out-of-tree implementations for any of the Swift server code. Perhaps the Swift authors might comment on this blog entry and let me know if that is incorrect or outdated information.

Swift is not extensible in the sense that the core Swift software does not enable the scope of Swift’s API to extend beyond its core mission of being a highly available distributed object storage system.

Swift’s API is a pure data plane API. It is not a control plane API, meaning its API is not intended to perform execution of actions against some controllable resources. Instead, Swift’s API is all about writing and reading data from one or more objects and defining/managing the containers/accounts associated with those objects.

Neutron

Looking at Neutron, we see the opposite of Swift. From a component layout perspective, we see a top-level API server that issues RPC calls to a set of agent workers via a traditional brokered message bus.

There is a centralized database that stores object model information, however much of Neutron’s design is predicated on a plugin system that does the actual work of wiring up an L2 network. Layer 3 networking in Neutron always felt like it was bolted on and not really part of Neutron’s natural worldview. ¹

Neutron’s list of dependencies is broad and is influenced by the hardware and vendor technology a deployer chooses for actually configuring networks and ports. Its use of the common OpenStack Oslo Python libraries is extensive, as its dependency on a raft of other Python libraries. It communicates directly (and therefore has a direct relationship) with the OpenStack Nova and Designate projects, and it depends on the OpenStack Keystone project for identity, authentication and authorization information.

Nearly everything about Neutron is both pluggable and extensible. Everything seems to be a driver or plugin or API extension ². While there is a top-level API server, in many deployments it does little more than forward requests on to a proprietary vendor driver that does “the real work”, with the driver or plugin (hopefully ³) saving some information about the work it did in Neutron’s database.

The “modular L2 plugin” (ML2) system is a framework for allowing mechanism drivers that live outside of Neutron’s source tree to perform the work of creating and plugging layer-2 ports, networks, and subnets. Within the Neutron source tree, there are some base mechanism drivers that enable software-defined networking using various technologies like Linux bridges or OpenVSwitch.

This means that every vendor offering software-defined networking functionality has its own ML2 plugin (or more than one plugin) along with separate drivers for its own proprietary technology that essentially translate the Neutron worldview into the vendor’s proprietary system’s worldview (and back again). And example of this is the Cisco Neutron ML2 plugin which has mechanism drivers that speak the various Cisco-flavored networking.

One nice relatively recent development in Neutron is the separation of many “common” API constructs and machinery into the neutron-lib repository. This at least goes part way towards reducing some duplicative code and allowing out of tree source repositories to import a much smaller footprint than the entire Neutron source tree.

On the topic of extensibility in Neutron’s API, I’d like to point to Neutron’s own documentation on its API extensions, which states the following:

The purpose of Networking API v2.0 extensions is to:
– Introduce new features in the API without requiring a version change.
– Introduce vendor-specific niche functionality.
– Act as a proving ground for experimental functionalities that might be included in a future version of the API.

I’ll discuss a bit more in the section below on Project Mulligan’s API, but I’m not a fan of API extensibility as seen in Neutron. It essentially encourages a Wild West mentality where there is no consistency between API resources, no coherent connection between various resources exposed by the API, and a proliferation of vendor-centric implementation details leaking out of the API itself. Neutron’s API is not the only OpenStack project API to succumb to these problems, though. Not by a long shot.

Nova

Still other projects, like the OpenStack Nova project, have dependencies on traditional databases and brokered message queues, a component layout that was designed to address a specific scale problem but causes many other problems and a confusing blend of highly extensible, sometimes extensible, and extensible-in-name-only approaches to underlying technology choices.

Nova’s component layout features a top-level API server, similar to Neutron and Swift. From there, however, there’s virtually nothing the same about Nova. Nova uses a system of “cells” which are designed as scaling domains for the technology underpinning Nova’s component communications: a brokered message queue for communicating between various system services and a traditional relational database system for storing and retrieving state.

Take old skool RPC with all the operational pitfalls and headaches of using RabbitMQ and AMQP. Then tack on eight years of abusing the database for more than relational data and horrible database schema inefficiencies. Finally, overcomplicate both database connectivity and data migration because of a couple operators’ poor choices and inexperienced developers early in the development of Nova. And you’ve got the ball of spaghetti that Nova’s component layout and technology choice currently entails.

As for Nova’s extensibility, it varies. There is a virt driver interface that allows different hypervisors (and Ironic for baremetal) to perform the on-compute-node actions needed to start, stop, pause and terminate a VM instance. There are some out-of-tree virt drivers that ostensibly try to keep up with the virt driver interface, but it isn’t technically public so it’s mostly a “use with caution and good luck with that” affair. The scheduler component in Nova used to be ludicrously extensible, with support for all sorts of out-of-tree filters, weighers, even whole replacement scheduler drivers.

That sucked, since there was no way to change anything without breaking the world. So now, we’ve removed a good deal of the extensibility in the scheduler in order to return some level of sanity there.

Similarly, we used to give operators the ability to whole-hog replace entire subsystems like the networking driver with an out-of-tree driver of their own making. We no longer allow this kind of madness. Neutron is the only supported networking driver at this time. Same for volume management. Cinder is the one and only supported volume manager.

OK, so what does Project Mulligan look like?

For Project Mulligan, we’ll be throwing out pretty much everything and starting over. So, out with the chaos and inconsistency. In with sensibility, simplicity and far fewer plug and extension points.

Now that Project Mulligan’s scope has been healthily trimmed, we can focus on only the components and requirements for a simple machine provisioning system.

Project Mulligan will be composed of a number of components with well-defined scopes:

A smart client (mulligan-client) that can itself communicate with all Mulligan service daemons without needing to go through any central API or proxy service
An account service (mulligan-iam) providing identity and account management functionality
A metadata service (mulligan-meta) provides lookup and translation service for common object names, tags and key/value items
An inventory management service (mulligan-resource) is responsible for tracking the resources discovered in an environment and providing a simple placement and reservation engine
A control service (mulligan-control) that can take an incoming request to provision or decommission a machine and send a work request to a task queue
executors (mulligan-executor) will read task requests from a queue and performing that task, such as booting a machine or configuring a machine’s networking

What about the technologies that Project Mulligan will be dependent upon?

I’m envisioning the following dependencies:

etcd for storing various pieces of information and being the notification/watch engine for all services in Project Mulligan.
A relational database server for storing highly relational data that needs to be efficiently queried in aggregate
NATS for a simple task based queue service that the executors will pop tasks from. Or maybe Gearman instead of NATS, since it’s been around longer and Monty won’t blow his shit about yet another not-invented-here simple queueing application being hosted by the CNCF…

From a birdseye view, the topology of Project Mulligan and its components and dependent low-level services looks like this:

All Project Mulligan services and workers will be entirely stateless. All state will be persisted to either etcd or a RDBMS.

The account service

In order to provide multi-tenancy from the start, Project Mulligan clearly needs a service that will provide account and identity management support. If OpenStack Keystone were a separate, stand-alone service that had a gRPC API interface, we’d use that. But since it’s not, we’ll have to write our own from scratch, potentially with an adapter that understands the useful part of the OpenStack Keystone REST API — you know, users, projects and roles (assignments). The rest of the Keystone API, including the hand-rolled token API, would be cut out. Application credentials would stay but would just be a type of account instead of a separate thing that lived under the user resource.

We’ll use JSON Web Tokens (JWT) as the payload format for any authorization tokens that need to be passed between Project Mulligan service endpoints (or outside to external endpoints).

The metadata service

All objects in the Project Mulligan system will be identified by a UUID. However, as we all know, remembering UUIDs and passing them around manually is a pain in the ass. So, we need a way of associating a unique name and URI-clean slug with an object in the system. The metadata service will provide a simple UUID -> name or slug lookup (and reverse lookup) service. Data stores for individual services, whether those data stores be SQL databases or not, will not store any name or slug information. Only UUID keys for objects that the service controls. The metadata service will act as a cache for this kind of name/slug identifying information. It will be backed by an etcd data store.

The second purpose of the metadata service will be to store and retrieve two other kinds of information decorating objects in the system:

tags: these are simple strings that are settable on any object in the system and are not validated or protected in any way
metadefs: these are specific named attributes that may be defined by the system or an end user, protected against changes and given a validation schema

If you’re familiar with the OpenStack Glance Metadefs concept, that’s pretty much what the second item is.

The resource service

How resources are tracked, claimed and consumed in a software system is a critical concept to get right, and get right from the start. If there’s one thing I’ve learned working on the scheduler and placement services in OpenStack, it’s that the consistency and accuracy of the data that goes into an inventory management system dictates the quality of all systems built on top of that data, including capacity management, billing, reservations, scheduling/placement and quota management.

You need to accurately and efficiently represent both the structure of the resources and providers of resources within the system as well as the process by which those resources are consumed by users of the system.

You cannot build a quality resource management system on top of a free-for-all land-grab where each vendor essentially redefines its notion of what a resource is. It was tried in OpenStack Nova. It failed and is still causing headaches today. Vendors have shifted their focus from the scorched Earth they’ve left behind in OpenStack for the new fertile hype-hunting grounds of Kubernetes, with Intel and NVIDIA pushing for more and more extensibility in how they track resource inventory and consumption. With that extensibility comes a complete lack of interoperability and, unless the Kubernetes community is very careful, a quagmire of unmaintainable code.

Project Mulligan’s resource management service will be responsible for storing inventory information for resource providers in the system. The structure and relationship of providers to each other will also be the purview of the resource management service. In other words, the resource management service will understand groups of providers, trees of providers, relative distances between providers, etc.

The process by which consumers of resources request those resources is called a resource claim. I’ve come to realize over the last four years or so that the system that handles the consumption and claiming of resources must be able to understand the temporal aspects of the resource request as well as the quantitative aspects of the request. What this means is that I believe the system that doles out resources to consumers of those resources needs to implement a resource reservation system itself. After all, a resource reservation system is simply a resource allocation system with an extra temporal dimension. Storing this temporal data along with inventory and usage information makes the most natural sense from a systems design perspective.

So, good news for the OpenStack Blazar project, your functionality is going to be subsumed by the Project Mulligan resource management service.

The control service and executors

The control service will be nothing more than a place to send command requests. These command requests will be validated by the control service and then packaged up into tasks that get pushed onto a simple queue managed by NATS.

The executors will pull a task off the task queue and execute it, sending the results of the task back on to a separate results queue. The results queue will be processed by an executor whose job will be to simply save the results state to some data store.

Redoing the API

A project’s API is its primary user interface. As such, its API is critically important to both the success of the project as well as the project’s perceived
quality and ease of use.

There is no REST API in Project Mulligan. Only gRPC APIs are supported.

The reason for this choice is that gRPC is versioned from the get-go with a sane set of clear rules for describing the evolution of the request and response payloads.

No more inane and endless debates about “proper” REST-ness or HATEOS or which HTTP code thought up in the 1990s is more appropriate for describing a particular application failure.

No more trying to shoehorn a control plane API into a data plane API or vice versa.

Astute readers will note that there is no top-level API or proxy server in Project Mulligan.

However, this doesn’t mean that there isn’t a Project Mulligan public API. The Project Mulligan API is simply the set of gRPC API methods exposed by Project Mulligan’s gRPC service components. What is not part of Project Mulligan’s public API are the internal task payload formats that get sent from the mulligan-control service to the NATS queue(s) for eventual processing by a mulligan-executor worker.

The key to making this work is developing a smart client program that contains much of the catalog, service mesh and route-dispatching functionality that a traditional top-level API or proxy server would contain.

The problem with embedding a lot of logic into client programs, though, is that you need to duplicate that logic for each programming language binding you create. I recognize this is an issue. To address it, Project Mulligan will automatically generate this core routing logic code and generate smart clients in a variety of programming languages as part of its automated build process.

Call me crazy, I know… but gRPC, in combination with Google Protocol Buffers’
protoc compiler does this exact thing: it generates a client and server binding for whatever programming language you want after looking at the .proto files that describe an interface.

What about extensibility?

I’m not interested in having Project Mulligan become a generic framework for constructing cloud applications. I’m not interested in allowing Project Mulligan’s scope and purpose to be extended or redefined by adding API abstraction machinery ala Kubernetes custom resource definitions (CRDs) to Project Mulligan.

Project Mulligan is what it is: a simple machine provisioning system. It’s not designed to be extensible. It should do one thing and do that thing cleanly, simply and efficiently.

If you want a framework for creating something that isn’t machine provisioning, feel free to go use Kubernetes’ CRDs. Note that you’ll still need Kubernetes to be installed on some machines somewhere. After all, code needs to run on a machine. And Project Mulligan is all about demystifying that process of provisioning machines.

Important things to get right, from the beginning

Here are a few things that I think need to be done correctly, right from the start of Project Mulligan.

Installation and configuration
In-place updates and re-configuration
Multi-tenancy and isolation
Partitioning and failure domains

Install and configuration

Project Mulligan will be able to be installed via traditional software package managers like apt or yum on Linux.

In addition to traditional software packages, we’ll build Docker images for the various service components.

If an organization has already deployed a container orchestration system like Kubernetes and wants to deploy Project Mulligan in a highly-available manner, they can consume these Docker images and build their own Helm charts for deploying Project Mulligan.

Configuration of Project Mulligan should be as easy as possible. There should be no need for hundreds of configuration options and a simple bootstrapping command from the mulligan-client should be enough to get a working system up and running.

In-place updates and re-configuration

Project Mulligan’s components should be able to be updated in-place and separately from each other, with no ordering dependencies.

Each component should respond to a SIGTERM with a graceful shutdown of the process, draining any queued work properly.

When the component’s package is upgraded and the component restarted, the gRPC server contained within the component will likely contain a newer version ofgRPC API request payloads. Because of the way gRPC method versioning works, other components (or the smart clients) that communicate with the component will continue to be sending request payloads formatted for an older version of the component’s API method. The upgraded server component will respond to the older clients with the message format the older client expects. This will facilitate Project Mulligan’s components being upgraded without any ordering
dependency.

I really like OpenStack Swift’s approach to upgrades. There are no separate administrative commands for performing database schema migrations. No separate utility for performing data migrations. Everything is simply handled within the controllers themselves, with the controllers executing SQL commands to migrate the schema forward as needed.

Much more operator-friendly than having a separate nova-manage db sync && nova-manage api_db sync && nova-manage db online_data_migrations --max-count=X way of performing simple forward-only schema and data migrations.

Re-configuring a running Project Mulligan component should be possible in order to tinker with runtime settings like log output verbosity. In order to facilitate this runtime re-configuration, each Project Mulligan service component will have a gRPC API method called $ServiceConfigure() that accepts a set of configuration options that are tunable at runtime.

Multi-tenancy and isolation

From the beginning, nearly all ⁴ OpenStack components were designed for multi-tenancy. This means the systems are designed for multiple groups of one or more users to simultaneously utilize the service without impact to each other.

The actions those users take should not impact other users of the system, nor should the resources owned by one user have any access to resources owned by another user. In other words, by default, strong isolation exists between tenants of the system.

Project Mulligan will likewise be built from the ground up with multi-tenancy as a design tenet. Resources should be wholly owned by a tenant (group of users) and isolated from each other.

Partitioning and failure domains

Project Mulligan should have clear and concise definitions of the ways in which the system itself may be divided — both for reasons of scale as well as for limiting a failure’s effects.

This implies that all objects in the system should have an attribute that describes the shard or partition that the object “lives in”.

Kubernetes refers to a collection of controllers and nodes as a “cluster”, which is the closest concept it has ⁵ to a system partition or shard. There is no geographic nor failure domain connotation to a Kubernetes cluster.

OpenStack unfortunately has no consistent terminology nor model of partitioning. Some services have the concept of an “availability zone”. However, as I’ve noted on the openstack-dev mailing list before, the OpenStack term of “availability zone” is a complete pile of vestigial poo that should be flushed down the sewer pipes.

In Project Mulligan, we’ll call this partitioning concept a “region”. Regions may be hierarchical in nature, and there need not be a single root region. Regions may indicate via attributes what users should expect with regards to failure of various systems — power, network, etc — and their impact on other regions. Regions will have attributes that indicate their visibility to users, allowing both internal and external partitions of the system to be managed jointly but advertised separately.

Conclusion

In case it hasn’t become obvious by now, Project Mulligan isn’t OpenStack v2. It’s a complete change of direction, an entirely trimmed and reworked vision of a very small chunk of what today’s OpenStack services are.

Is this better than slow, incremental backwards-compatible changes to OpenStack v1, as some have suggested? I have no idea. But I’m damn sure it will be more fun to work on. And isn’t that what life is really all about?

Thanks for reading.

-jay

Footnotes

This makes perfect sense considering the origins of the Neutron project, which was founded by folks from Nicira which ended up as VMWare’s NSX technology — an L2-centric software defined networking technology. FYI, some of those same people are the driving force behind the Cilium project that is now showing promise in the container world. Isn’t it great to be able to walk away from your original ideas (granted, after a nice hiatus) and totally re-start without any of the baggage of the original stuff you wrote? I agree, which is why Project Mulligan will be a resounding success.
Even the comically-named “core extensions“. sigh.
The base Neutron plugin actually allows the plugin to not use Neutron’s database for state persistence, which is basically the Neutron authors relenting to vendor pressure to just have Neutron be a very thin shim over some proprietary network administration technology — like Juniper Contrail.
One the largest, most glaring exceptions to this multi-tenancy rule, unfortunately is OpenStack Ironic, the baremetal provisioning system.
Kubernetes actually doesn’t really define a “cluster” to be something specific, unlike many other terms and concepts that Kubernetes does define — like Pod, Deployment, etc. The term “cluster” seems to be just something the Kubernetes community landed on to refer to a collection of nodes comprising a single control plane…

Project Mulligan – OpenStack Redo

jaypipes — Fri, 10 Aug 2018 16:56:08 +0000

Jess Frazelle’s tweet recently got me thinking. ¹

What if I could go back and undo basically the last eight years and remake OpenStack (whatever “OpenStack” has come to entail)? What if we could have a big do-over?

In this two-part blog post, I will be describing “Project Mulligan”, the OpenStack Redo.

This first post is about mission, scope, community, and governance.

The second part describes the architecture and technology choices that Project Mulligan will employ.

This is obviously a highly opinionated reflection on what I personally would change about the world I’ve lived in for nearly a decade.

I’m bound to offend lots of people along the way in both the OpenStack and Kubernetes communities. Sorry in advance. Try not to take things personally — in many cases I’m referring as much to myself as anyone else and so feel free to join me on this self-deprecating journey.

Background

I’ve been involved in the OpenStack community for more than eight years now. I’ve worked for five different companies on OpenStack and cloud-related projects, with a focus on compute infrastructure (as opposed to network or storage infrastructure). I’ve been on the OpenStack Technical Committee, served as a Project Team Lead (PTL) and am on a number of core reviewer teams.

When it comes to technical knowledge, I consider myself a journeyman. I’m never the smartest person in the room, but I’m not afraid to share an opinion that comes from a couple decades of programming experience.

I’ve also managed community relations for an open source company, given and listened to lots of talks at conferences, and met a whole bunch of really smart and talented individuals in my time in the community.

All this to say that I feel I do have the required background and knowledge to at least put forth a coherent vision for Project Mulligan and that I am as much responsible as anyone else for the mess that OpenStack has become.

Redoing the mission

When OpenStack began, we dreamt big. The mission of OpenStack was big, bold and screamed of self-confidence. We wanted to create an open source cloud operating system.

The #1 goal in those days was expansion. Specifically, expansion of user footprint and industry mindshare. It was all about quantity versus quality. Get as much of the pie as possible.

As time rolled on, the mission got wordier, but remained as massive and vague as “cloud operating system” ever was. In 2013, the mission looked like this:

to produce the ubiquitous Open Source Cloud Computing platform that will meet the needs of public and private clouds regardless of size, by being simple to implement and massively scalable.

See the word “ubiquitous” in there? That pretty much sums up what OpenStack’s mission has been since the beginning: get installed in as many places as possible.

While “simple to implement” and “massively scalable” were aspirational, neither were realistic and both were subject to interpretation (though I think it is safe to say OpenStack has never been “simple to implement”).

Today, the mission continues to be ludicrously broad, vague, and open-ended, to the point that it’s impossible to tell what OpenStack is by reading the mission:

to produce a ubiquitous Open Source Cloud Computing platform that is easy to use, simple to implement, interoperable between deployments, works well at all scales, and meets the needs of users and operators of both public and private clouds.

“Meets the needs of users and operators of both public and private clouds” is about as immeasurable of a thing as I can think of. Again, it’s aspirational, but so broad as to be meaningless outside anything but the most abstract discussions.

Project Mulligan is getting a new mission in life:

demystify the process of provisioning compute infrastructure

It’s aspirational but not open-ended; singularly focused on the compute provisioning process.

Why “demystify”?

Despite “easy to use” and “simple to implement” being in OpenStack’s current mission, I believe OpenStack v1 has utterly failed to simplify a complex and often burdensome process. In contrast, OpenStack v1 has made a complex process (of provisioning infrastructure pieces) even more convoluted and error-prone.

If you ask me why I think OpenStack v1 has failed to deliver on these aspects of its mission, my response is that OpenStack v1 doesn’t know what it wants to be.

It has no identity other than being open and welcoming to anyone and everyone that wants to jump on the Great Cloud Bandwagon in the Sky. ²

And because of this identity crisis, there is zero focus on any one particular thing.

Well, that ends with Project Mulligan.

Project Mulligan isn’t trying to be a “cloud operating system”. Heck, it doesn’t even care what “cloud” is. Or isn’t. Or might be in the future for a DevOpsSysAdminUserator.

“OK, Jay, but what really IS ‘compute infrastructure’?”

I’m glad you asked, because that’s a perfect segue into a discussion about the scope of Project Mulligan.

Redoing the scope

Defining the scope of OpenStack is like attempting to bathe a mud-soaked cat in a bubble bath — a slippery affair that only ends up getting the bather muddy and angering the cat.

The scope of OpenStack escapes definition due to the sheer expanse of OpenStack’s mission.

Now that we’ve slashed Project Mulligan’s mission like Freddy Krueger on holiday in a paper factory, defining the scope of Project Mulligan is a much easier task.

We’re going to start with a relatively tiny scope (compared to OpenStack v1’s), and if the demand is there, we’ll expand it later. Maybe. If I’m offered enough chocolate chip cookies.

The scope of Project Mulligan is:

singular baremetal and virtual machine resource provisioning

I’ve chosen each word in the above scope carefully.

singular

“singular” was chosen to make it clear that Project Mulligan doesn’t attempt to provision multiple identical things in the same operation.

baremetal and virtual machine

“baremetal and virtual machine” was selected to disambiguate Project Mulligan’s target deployment unit. It’s not containers. It’s not applications. It’s not lambda functions. It’s not unikernels or ACIs or OCIs or OVFs or debs or RPMs or Helm Charts or any other type of package.

Project Mulligan’s target deployment unit is a machine — either baremetal or virtual.

A machine is what is required to run some code on. Containers, cgroups, namespaces, applications, packages, and yes, serverless/lambda functions require a machine to run on. That’s what Project Mulligan targets: the machine.

resource

The word “resource” was used for good reason: a resource is something that is used or consumed by some other system. How those systems describe, request, claim and ultimately consume resources is such a core concept in any software system that extreme care must be taken to ensure that the mechanics of resource management are done right, and done in a way that doesn’t hinder the creation of higher-level systems and platforms that utilize resource and usage information (like quota management and reservation systems, for example).

I go into a lot of detail below in the second part of this blog post on “Redoing the architecture” about resource management and why it’s important to be part of Project Mulligan.

provisioning

At its core, the purpose of Project Mulligan is to demystify the arcane and hideously complex process inherent in provisioning machines. Provisioning involves the setup and activation of the machine. It does not involve operational support of the machine, nor does it involve moving the machine, restarting it, pausing it, or throwing it a birthday party.

The only things that are important to be in Project Mulligan’s scope are the items that enable its mission and that cannot be fulfilled by other existing libraries or systems in a coherent way.

From an interface perspective, this means the scope of Project Mulligan is the following:

Disovering and managing hardware inventory (these are the resources that will be provided to consumers of Project Mulligan’s APIs)
Requesting, claiming and ultimately consuming those machine resources

That’s pretty much it.

Who is Project Mulligan’s target audience?

I got some early feedback on some of these ideas from OpenStack’s bull in a China shop, Monty Taylor.

Suffice to say, Monty didn’t like Project Mulligan, and for very good reason.

Users of “cloud APIs” are not Project Mulligan’s audience.

Cloud APIs are those things that allow a user to manage their applications that run on cloud (i.e. someone else’s) infrastructure. Cloud APIs allow lifecycle management of virtual or baremetal machines and containers. Cloud APIs allow a user to create some storage buckets and place objects in those buckets. Cloud APIs enable the construction of private networks that applications running in that cloud can use to communicate with each other and with the outside world.

Cloud APIs encourage users to think in terms of automated use and re-use of some technology infrastructure.

Cloud APIs are awesome!

But that’s not what Project Mulligan is for. Project Mulligan’s target audience is the providers of machines, not the users of those machines.

I see a clear line between these audiences. And I’d like Project Mulligan to focus on the operator/provider audience, not the cloud API user audience.

Does the cloud ecosystem need One True Cloud API ³? Perhaps it does. But Project Mulligan won’t be focused on making that. Besides, Monty will need something to occupy his time when not tweeting.

But what about everything else?!

I imagine at this point, I’ve offended more than three quarters of the universe by not including in Project Mulligan’s scope any of the following:

Object storage
Network provisioning and management
Containers
Security
Monitoring and metrics
Orchestration
Filesystems
Deployment automation
Configuration management
AmigaOS

Are these things important? Yep. Well, OK, maybe not AmigaOS. Do I want them in Project Mulligan’s mission statement or scope? No. No, I don’t.

For the record, I believe there are certain projects inside the OpenStack ecosystem that can and should live entirely outside of OpenStack.

The Gnocchi project has already led the way in this regard by decoupling itself from OpenStack’s community and tooling and standing on its own two feet out there in the big beautiful world. Good for Gnocchi. Seriously, good for Gnocchi.

I believe Swift, Keystone, Designate and Cinder should do the same. Yes, that’s right folks. I’m formally proposing that projects that can stand alone and provide a well-defined, small scope of service live outside of the OpenStack community. I believe that by doing so, these projects will encourage re-use outside of the OpenStack echo-chamber and be a greater benefit to the broad cloud software ecosystem.

Some projects, like Swift, are already ready for the umbilical cord to be cut. I’d actually go so far as to say Swift was born into the OpenStack world as a full-fledged adult to begin with. It uses no shared OpenStack Oslo Python libraries, has no inter-service dependencies to speak of, does not rely on a coordinated release cycle, and its API is rock-solid and (yay!) does one thing and one thing well.

Cinder, Designate and Keystone are more traditional OpenStack projects with the requisite heavy dependency on a number of common OpenStack Oslo Python libraries. And their APIs are not nearly as clear-cut as Swift’s. But, that said, the services themselves are well-scoped and would benefit the larger cloud software ecosystem by living outside the OpenStack bubble. Cinder already has made inroads into the Kubernetes community by being a provider of persistent volumes there. Keystone and Designate, if run as separate projects outside the OpenStack community, would likely be forced to maintain a small scope of service.

What benefit does the OpenStack echo-chamber give to these projects, anyway? Well, let’s take a look at that topic in the next section about redoing the community.

Redoing the community

I have a number of thoughts about matters of community. My views on the subject have changed — and continue to change. These views vary depending on the day of the week, whether I’m currently taking anti-depressants, and whether I’ve happened to upset more than three people in the last 24 hours while reviewing their code.

After witnessing and participating in eight years of OpenStack’s governance style, I’m eager for a change. Well, actually, a number of changes, presented here in no particular order.

Less talk, more do

I know that some colleagues in the community adore lengthy conversations on IRC and mailing list posts that meander around like a bored teenager on Quaaludes.

However, I’m tired of talking. I want to see some action taken that really kicks Project Mulligan into high gear.

Code talks volumes compared to endless specifications. I’d much rather see a pile of proof-of-concept code than spend six months endlessly debating the minutae of an upgrade path on a specification.

But unless we have leaders with a “walk, don’t talk” mentality — people who aren’t afraid to break things — I’m afraid nothing will ever change.

We need no-bullshit, this is how it’s gonna be, types of people willing to say “oh fuck no” when needed ⁴. But also people who say “alright, show me in code” and “I accept the fact that we won’t be able to see the future and accurately predict every tiny detail of an upgrade path or side-effect of this feature, but let’s just do it and if we need to fix things later, we will.”

These leaders will always need to bear the brunt of inevitable criticism that will pour forth from those who seek to commandeer Project Mulligan for their own devices.

Speaking of that, let’s talk a bit about vendors.

Vendors should wear helmets

Vendors should be required to wear helmets at all times when visiting Project Mulligan. And not some silly London bobby helmet. I’m talking about those bright yellow construction helmets that both announce intentions as well as protect brain matter.

Nothing gets my goat more than a vendor that isn’t up front about their vendory-ness.

Is it too much for me to want to hear, just once, “hey, I’m aware this feature is a pet project of mine and really only helps me out”.

Or better yet, “yeah, I understand that this project is made up entirely of my own internal engineers and really its just a way for us to protect our intellectual property by being first to market and (ab)using the open source ecosystem for marketing and hiring advantages”.

Unfortunately, all too often, vendors pretend like there’s nothing to see here; move along…

I am sure that as people read this, some folks are thinking, “Hmm, am I a vendor?”

Well, you might be a vendor if…

You only ask for features that further your own company’s objectives
You cannot articulate why anyone other than your company would want a feature
You don’t review code outside of your own company
You only work on code for a driver to enable your own company’s technology
If you provide any documentation links at all, you can only provide links to internal documentation that first needs to go through legal review in order for that link to be made public

“But Jay, 95% of contributors to OpenStack work at some company that pays them to write code or work on deployment stuff for OpenStack. Don’t shit where you eat, my friend.”

Yes, you’re absolutely correct about that, my friend. And thank you for being concerned about my alimentary canal.

That said, being “a vendor” is exhibiting a certain set of behaviours when participating (or not) in an open source community. Being “a vendor” doesn’t mean “you work for a company”.

I’m afraid to be the bearer of bad news, but Project Mulligan, unlike OpenStack, isn’t going to cater to the vendors.

Project Mulligan won’t kick vendors out, but at the same time, we’re not going to go out of our way to deify “Platinum Members” or any such silliness. Vendors are free to take Project Mulligan’s source code and use it for their own products, but the Project Mulligan community won’t be spending its time promoting those products, developing “certification” policies for vendors to work towards, or attempting to cozy up to vendor-centric trade organizations that aren’t particularly related to the core mission of Project Mulligan.

Which brings me to something that bothers me about Kubernetes, from a community perspective.

Many questions on the Kubernetes user mailing list seem to be for specific vendor products — i.e. Google Cloud Platform, Google Kubernetes Engine, Red Hat OpenShift ⁵, etc — instead of being about Kubernetes itself. This is indicative of the tight coupling between the Kubernetes project and the vendors that host a Kubernetes SaaS offering.

While there are occasionally questions on the OpenStack mailing lists about a particular distribution of OpenStack — Mirantis OpenStack/Fuel, Red Hat OpenStack Platform or RDO, etc — not only are these questions rare, but they are often answered with pointers to the vendor’s support organization or bug tracker. In addition, you don’t see questions about getting support for one of the public clouds that run OpenStack.

This doesn’t seem to be the case for Kubernetes, where the vendored SaaS offerings don’t seem to be distinguishable from the open source project itself. Or at least, they don’t seem to be distinguishable for a great number of Kubernetes users. And the engineers working at the primary Kubernetes vendors don’t seem to have much of a problem with this equating of a product with the open source project that underpins those products.

Releases and time-based release cycles

It seems to me that coordinated releases are merely designed to coincide with the OpenStack Summit marketing events. I see no benefit whatsoever to time-based release cycles other than for marketing coordination purposes.

The time-based release cycles set artificial deadlines and freeze dates for no good reason and just end up placing a bunch of bureaucratic red tape and slowing to a crawl a process that frankly should be a simple thing that can be done (in a fully automated fashion) whenever necessary.

OpenStack Swift has been doing non-time-based releases this way for more than 8 years, eschewing the time-based release mechanics that most of the rest of OpenStack projects follow in favor of just tagging their source repository with a version tag when they feel that a certain fix or feature warrants a new release tag. And it works perfectly fine for Swift. The Swift project’s sensible decoupling and reliance on far fewer inter-project dependencies makes this possible.

I don’t see operators clamouring for a six-month release cycle — jeez, most operators have so much internal integration and legacy junk that they move at the speed of molasses with regards to migrating to more modern OpenStack releases. All operators care about is can they get from their old-ass version to some less old-ass version of their deployed software in a way that doesn’t result in too much downtime for their users.

I also don’t care about long-term support (LTS) releases. If distributions want to support those, they should feel free to do so. Many distributions make lots of money selling FUD about software that changes too quickly for their legacy customers to “handle in the way they’ve become accustomed”. So, it seems fair to me to have those vendors fully own their own LTS release processes and spend the money to maintain that code. I personally don’t feel it should be something an upstream development community should be involved in.

Therefore, Project Mulligan won’t be following a time-based release cycle. There will be no “spec freeze dates”, no “code freeze dates”, no “dependent library freezes”, no “milestone releases”. None of that.

When a feature or set of bug fixes warrants a “release”, then the source repository will get a new tag which will trigger the automated build of release artifacts. No red tape, no freeze dates.

Who gets to decide what “warrants a release”? Anyone should be able to propose that the source repository be tagged for release. If there are no objections from anyone in the contributor community, one of the core committers can trigger the release by pushing a tag. Simple is that.

Conferences and foundations

There are plenty people who like the large OpenStack Summit conferences.

I am not one of those people.

I have not attended the last two OpenStack summits; instead I’ve chosen to stick to the OpenStack Project Team Gathering events that are strictly for developers to discuss implementation proposals and brainstorm system design.

Perhaps this is because I remember the very first OpenStack Design Summit in Austin, Texas. There were about 150 engineers there, if I recall correctly. We discussed implementation possibilities, design choices, how to test Python software properly, how to set up an automated testing infrastructure system, and how to organize developers who were just beginning to come together around infrastructure software.

That first get-together, and the developer events that followed for the next couple years or so (until the OpenStack Foundation became a thing), were modeled after the successful Ubuntu Design Summit (UDS) concept where contributors to the Ubuntu Linux distribution would get together every six months to decide what would be the focus of the next release, what were the implementation proposals for various system pieces, and what decisions needed to be made to further the distribution.

The entire purpose of the original design summits was to be a working event. There were no presentations, no vendor displays, no sales people mixing with engineers. It was an event designed to get actual work done.

The need for OpenStack to become ubiquitous in the cloud world, however, meant that those halcyon days of productive, technical meetings slowly evolved into a more traditional IT conference, with keynotes, big-money sponsors, lots of marketing and sales people, along with the inevitable buzzword-ification of the whole affair.

Now, if you look at the session schedules for the OpenStack summits, all you see is buzzword soup, with never-ending mentions of “hyperscale”, “hyper-converged infrastructure”, “edge”, “NFV”, “carrier-grade”, “containers”, “MEC”, etc.

This obsession with following and promoting the hype of the day has led to a ludicrous lineup of vendor-driven sessions that tend to drive people who are looking for solid technical content screaming for the hills.

Here’s a smattering of sessions from the most recent OpenStack Summit in Vancouver that are perfect examples of the type of artisanal, farm-to-table buzzword vendor bullshit that has pervaded these events.

That last one is basically just an advertisement for Intel storage technology.

And then there are the sessions (keynotes even!) that are devoted to completely non-existent vaporware — such as Akraino — because the companies behind the vaporware idea (in the case of Akraino, that would be AT&T and, yes again, Intel) are a powerful vendor lobby inside the OpenStack Foundation since they are platinum foundation members.

These companies have successfully played the OpenStack Foundation and the Linux Foundation against each other to see which foundation will buy into their particular flavor of vaporware du jour. Intel and subsidiary Windriver Systems did this with the StarlingX project as well.

This kind of vendor-driven political bullshit is the reason that Project Mulligan will be quite different when it comes to community management.

Project Mulligan won’t have a foundation, sorry. I personally like quite a few of the employees at the OpenStack Foundation. I even (mostly) support recent transition to becoming an Open Infrastructure Foundation. That said, I can’t stand the hype machining that the foundation encourages along with the Linux Foundation ⁶ and its Cloud Native Computing Foundation (CNCF) offshoot and OPNFV project. ⁷

Nor will Project Mulligan have marketing conferences. For get-togethers, there will be a return to the original design summit idea, crossed with a “hack days” type of flavor. We’ll call them “Design Days”.

All developers as well as operators of Project Mulligan will be welcome at these design days. Developers of Project Mulligan means anyone who has committed code or anyone who wants to contribute code to Project Mulligan and learn about it.

The events will be self-funded and bootstrap-organized, likely at university venues that donate space over a long weekend. I used to organize these types of events in the MySQL community and they were very successful in bringing community contributors in the ecosystem together for a fun and productive few days.

On to the architecture and technology fun

Are you now craving more fast and loose opinions on how OpenStack lost its way with its mission? Well, then, proceed apace to part two for some tasty blather about blowing up the OpenStack technology and system architecture and starting over in Project Mulligan.

Footnotes

Yes, I’m fully aware Jess Frazelle wasn’t actually asking the OpenStack community what the next version of OpenStack would look like. Rather, she was opining that some negative aspects of the OpenStack ecosystem and approach have snuck into Kubernetes. Still, I think it’s an interesting question to ponder, regardless.
Numerous folks have pointed to the “Big Tent” initiative from 2014-2015 as being the reason that OpenStack “lost its focus”. I’ve repeatedly called bullshit on this assertion, and will again do so here. The Big Tent initiative did not redefine OpenStack’s mission. It was a restructuring of how the OpenStack Technical Committee evaluated new project applicants to the OpenStack ecosystem. This is why the Big Tent was officially called “Project Structure Reform“. It changed governance procedures so that there was no more “Supreme Court of OpenStack” that had to be groveled to each time a new project came around. It absolutely did not broaden what OpenStack’s scope or mission was. Despite this, many people, even people knowledgeable of OpenStack governance internals, continue to equate the overly broad mission of OpenStack (which, again, has barely changed since 2012) with the “Big Tent” initiative.
As opposed to the existing environment that cloud users find themselves in, which is that every cloud vendor (AWS, GCP, Azure, DigitalOcean, etc) have their own APIs for performing actions against the resources exposed by their particular underlying infrastructure. You buy into that vendor’s API when you buy into the service they are providing. This is called lock-in.
OpenStack (as a whole as well as individual OpenStack projects) has suffered greatly from the inability of the maintainer community to say “no” (or even “fuck no”). Some people think it’s hard to say “no” to a feature request. Personally, I have no problem whatsoever saying “no” to virtually everything (saying “no” is basically my default answer to everyone other than my wife). Jess Frazelle’s article entitled “The Art of Closing” should be required reading for any contributor submitting a feature request and any maintainer looking for ways to not crush tender contributor feelings on a feature request (if that’s the sort of thing that keeps you up at night).
The current incarnation of OpenShift as of July 2018. They keep changing the damn thing’s purpose, rewriting it in different languages, and gradually updating the website so that you never quite know what http://openshift.com will lead to in any given month.
Tell me again why an open source foundation needs a Chief Revenue Officer?
My prediction is that eventually the Linux Foundation will end up subsuming the OpenStack (neÃ© Open Infrastructure) Foundation and turning it into one of its projects like OPNFV. At which point, there will be a huge backlash from the CNCF folks who despise all things OpenStack that they consider to be vendor-driven, legacy (in CNCF world, legacy == The World Before Docker, so anything before 2013.) and over-architected while forgetting that many of the same people and companies that initially developed those over-architected OpenStack solutions are now, gasp!, working on CNCF projects. At which point the Apache Foundation will put out an announcement saying how they could see all of this coming years ago. And we’ll all have a big come to Jesus (come to Jess?) moment and realize that, holy shit, we’re all actually working on the same kinds of problems and mostly we’ve just been letting our biases about programming languages, SQL vs. NoSQL, and Slack vs. IRC drive wedges in between what should be fairly rock-solid relationships.

Understanding the OpenStack CI System

jaypipes — Thu, 23 Jan 2014 18:05:51 +0000

This post describes in detail the upstream OpenStack continuous integration platform. In the process, I’ll be describing the code flow in the upstream system — from the time the contributor submits a patch to Gerrit, all the way through the creation of a devstack environment in a virtual machine, the running of the Tempest test suite against the devstack installation, and finally the reporting of test results and archival of test artifacts. Hopefully, with a good understanding of how the upstream tooling works, setting up your own linked external testing platform will be easier.

[toc]

Some History and Concepts

Over the past four years, there has been a steady evolution in the way that the source code of OpenStack projects is tested and reviewed. I remember when we used Bazaar for source control and Launchpad merge proposals for code review. There was no automated or continuous testing to speak of in those early days, which put pressure on core reviewers to do testing of proposed patches locally. There was also no standardized integration test suite, so often a change in one project would inadvertantly break another project.

Thanks to the work of many contributors, particularly those patient souls in the OpenStack Infrastructure team, today there is a robust platform supporting continuous integration testing for OpenStack and Stackforge projects. At the center of this platform are the Jenkins CI servers, the Gerrit git and patch review server, and the Zuul gating system.

The Code Review System

When a contributor submits a patch to one of the OpenStack projects, one pushes their code to the git server managed by Gerrit running on review.openstack.org. Typically, contributors use the git-review Git plugin, which simplifies submitting to a git server managed by Gerrit. Gerrit controls which users or groups are allowed to propose code, merge code, and administer code repositories under its management. When a contributor pushes code to review.openstack.org, Gerrit creates a Changeset representing the proposed code. The original submitter and any other contributors can push additional amendments to that Changeset, and Gerrit collects all of the changes into the Changeset record. Here is a shot of a Changeset under review. You can see a number of patches (changes) listed in the review screen. Each of those patches was an amendment to the original commit.

Individual patches amend the changeset

For each patch in Gerrit, there are three sets of “labels” that may be applied to the patch. Anyone can comment on a Changeset and/or review the code. A review is shown on the patch in the Code-Review column in the patch “labels matrix”:

The “label matrix” on a Gerrit patch

Non-core team members may give the patch a Code-Review label of +1 (Looks good to me), 0 (No strong opinion), or -1 (I would prefer you didn’t merge this). Core team members can give any of those values, plus +2 (Looks good to me, approved) and -2 (Do not submit).

The other columns in the label matrix are Verified and Approved. Only non-interactive users of Gerrit, such as Jenkins, are allowed to add a Verified label to a patch. The external testing platform you will set up is one of these non-interactive users. The value of the Verified label will be +1 (check pipeline tests passed), -1 (check pipeline tests failed), +2 (gate pipeline tests passed), or -2 (gate pipeline tests failed).

Only members of the OpenStack project’s core team can add an Approved label to a patch. It is either a +1 (Approved) value or not, appearing as a check mark in the Approved column of the label matrix:

An approved patch.

Continuous Integration Testing

Continuous integration (CI) testing is the act of running tests that validate a full application environment on a continual basis — i.e. when any change is proposed to the application. Typically, when talking about CI, we are referring to tests that are run against a full, real-world installation of the project. This type of testing, called integration testing, ensures that proposed changes to one component do not cause failures in other components. This is especially important for complex multi-project systems like OpenStack, with non-trivial dependencies between subsystems.

When code is pushed to Gerrit, a series of jobs are triggered that run a series of tests against the proposed code. Jenkins is the server that executes and manages these jobs. It is a Java application with an extensible architecture that supports plugins that add functionality to the base server.

Each job in Jenkins is configured separately. Behind the scenes, Jenkins stores this configuration information in an XML file in its data directory. You may manually edit a Jenkins job as an administrator in Jenkins. However, in a testing platform as large as the upstream OpenStack CI system, doing so manually would be virtually impossible and fraught with errors. Luckily, there is a helper tool called Jenkins Job Builder (JJB) that constructs these XML configuration files after reading a set of YAML files and job templating rules. We will describe JJB later in the article.

The “Gate”

When we talk about “the gate”, we are talking about the process by which code is kept out of a set of source code branches if certain conditions are not met.

OpenStack projects use a method of controlling merges into certain branches of their source trees called the Non-Human Gatekeeper model ^[1]. Gerrit (the non-human) is configured to allow merges by users in a group called “Non-Interactive Users” to the master and stable branches of git repositories under its control. The upstream main Jenkins CI server, as well as Jenkins CI systems running at third party locations, are the users in this group.

So, how do these non-interactive users actually decide whether to merge a proposed patch into the target branch? Well, there is a set of tests (different for each project) — unit, functional, integration, upgrade, style/linting — that is marked as “gating” that particular project’s source trees. For most of the OpenStack projects, there are unit tests (run in a variety of different supported versions of Python) and style checker tests for HACKING and PEP8 compliance. These unit and style tests are run in Python virtualenvs managed by the tox testing utility.

In addition to the Python unit and style tests, there are a number of integration tests that are executed against full installations of OpenStack. The integration tests are simply subsets of the Tempest integration test suite. Finally, many projects also include upgrade and schema migration tests in their gate tests.

How Upstream Testing Works

Graphically, the upstream continuous integration gate testing system works like this:

We step through this event flow in detail below, referencing the numbered steps in bold.

The Gerrit Event Stream and Zuul

After a contributor has pushed (1a) a new patch to a changeset or a core team member has reviewed the patch and added an Approved +1 label (1b), Gerrit pushes out a notification event to its event stream (2). This event stream can have a number of subscribers, including the Gerrit Jenkins plugin and Zuul. Zuul was developed to manage the many complex graphs of interdependent branch merge proposals in the upstream system. It monitors in-progress jobs for a set of related patches and will pre-emptively cancel any dependent test jobs that would not succeed due to a failure in a dependent patch ^[2].

In addition to this dependency monitoring, Zuul is responsible for constructing the pipelines of jobs that should be executed on various events. One of these pipelines is called the “gate” pipeline, appropriately named for the set of jobs that must succeed in order for a proposed patch to be merged into a target branch.

Zuul’s pipelines are configured in a single file called layout.yaml in the OpenStack-Infra config project. Here’s a snippet from that file that constructs the gate pipeline:

  - name: gate
    description: Changes that have been approved by core developers...
    failure-message: Build failed. For information on how to proceed...
    manager: DependentPipelineManager
    precedence: low
    trigger:
      gerrit:
        - event: comment-added
          approval:
            - approved: 1
        - event: comment-added
          comment_filter: (?i)^\s*reverify( (?:bug|lp)[\s#:]*(\d+))\s*$
    start:
      gerrit:
        verified: 0
    success:
      gerrit:
        verified: 2
        submit: true
    failure:
      gerrit:
        verified: -2

Zuul listens to the Gerrit event stream (3), and matches the type of event to one or more pipelines (4). The matching conditions for the gate pipeline are configured in the trigger:gerrit: section of the YAML snippet above:

    trigger:
      gerrit:
        - event: comment-added
          approval:
            - approved: 1
        - event: comment-added
          comment_filter: (?i)^\s*reverify( (?:bug|lp)[\s#:]*(\d+))\s*$

The above indicates that Zuul should fire the gate pipeline when it sees reviews with an Approved +1 label, and any comment to the review that contains “reverify” with or without a bug identifier. Note that there is a similar pipeline that is fired when a new patchset is created or when a review comment is made with the word “recheck”. This pipeline is called the check pipeline. Look in the layout.yaml file for the configuration of the check pipeline.

Once the appropriate pipeline is matched, Zuul executes (5) that particular pipeline for the project that had a patch proposed.

“But wait, hold up…“, you may be asking yourself, “how does Zuul know which Jenkins jobs to execute for a particular project and pipeline?“. Great question!

Also in the layout.yaml file, there is a section that configures which Jenkins jobs should be run for each project. Let’s take a look at the configuration of the gate pipeline for the Cinder project:

  - name: openstack/cinder
    template:
      - name: python-jobs
...snip...
    gate:
      - gate-cinder-requirements
      - gate-tempest-dsvm-full
      - gate-tempest-dsvm-postgres-full
      - gate-tempest-dsvm-neutron
      - gate-tempest-dsvm-large-ops
      - gate-tempest-dsvm-neutron-large-ops
      - gate-grenade-dsvm

Each of the lines in the gate: section indicate a specific Jenkins job that should be run in the gate pipeline for Cinder. In addition, there is the python-jobs item in the template: section. Project templates are a way that Zuul consolidates configuration of many similar jobs into a simple template configuration. The project template definition for python-jobs looks like this (still in layout.yaml:

project-templates:
  - name: python-jobs
...snip...
    gate:
      - 'gate-{name}-docs'
      - 'gate-{name}-pep8'
      - 'gate-{name}-python26'
      - 'gate-{name}-python27'

So, on determing which Jenkins jobs should be executed for a particular pipeline, Zuul sees the python-jobs project template in the Cinder configuration and expands that to execute the following Jenkins jobs:

gate-cinder-docs
gate-cinder-pep8
gate-cinder-python26
gate-cinder-python27

Jenkins Job Creation and Configuration

I previously mentioned that the configuration of an individual Jenkins job is stored in a config.xml file in the Jenkins data directory. Now, at last count, the upstream OpenStack Jenkins CI system has just shy of 2,000 jobs. It would be virtually impossible to manage the configuration of so many jobs using human-based processes. To solve this dilemma, the Jenkins Job Builder (JJB) python tool was created. JJB consumes YAML files that describe both individual Jenkins jobs as well as templates for parameterized Jenkins jobs, and writes the config.xml files for all Jenkins jobs that are produced from those templates. Important: Note that Zuul does not construct Jenkins jobs. JJB does that. Zuul simply configures which Jenkins jobs should run for a project and a pipeline.

There is a master projects.yaml file in the same directory that lists the “top-level” definitions of jobs for all projects, and it is in this file that many of the variables that are used in job template instantiation are defined (including the {name} variable, which corresponds to the name of the project.

When JJB constructs the set of all Jenkins jobs, it reads the projects.yaml file, and for each project, it sees the “name” attribute of the project, and substitutes that name attribute value wherever it sees {name} in any of the jobs that are defined for that project. Let’s take a look at the Cinder project’s definition in the projects.yaml file here:

- project:
    name: cinder
    github-org: openstack
    node: bare-precise
    tarball-site: tarballs.openstack.org
    doc-publisher-site: docs.openstack.org

    jobs:
      - python-jobs
      - python-grizzly-bitrot-jobs
      - python-havana-bitrot-jobs
      - openstack-publish-jobs
      - gate-{name}-pylint
      - translation-jobs

You will note one of the items in the jobs section is called python-jobs. This is actually not a single Jenkins job, but actually a job group. A job group definition is merely a list of jobs or job templates. Let’s take a look at the definition of the python-jobs job group:

- job-group:
    name: python-jobs
    jobs:
      - '{name}-coverage'
      - 'gate-{name}-pep8'
      - 'gate-{name}-python26'
      - 'gate-{name}-python27'
      - 'gate-{name}-python33'
      - 'gate-{name}-pypy'
      - 'gate-{name}-docs'
      - 'gate-{name}-requirements'
      - '{name}-tarball'
      - '{name}-branch-tarball'

Each of the items listed in the jobs section of the python-jobs job group definition above is a job template. Job templates are expanded in the same way as Zuul project templates and JJB job groups are expanded. Let’s take a look at one such job template in the list above, called gate-{name}-python27.

(Hint: all Jenkins jobs for any OpenStack or Stackforge project are described in the OpenStack-Infra Config project’s modules/openstack_projects/files/jenkins_jobs/config/ directory).

The python-jobs.yaml file in the modules/openstack_project/files/jenkins_job_builder/config directory contains the definition of common Python project Jenkins job templates. One of those job templates is gate-{name}-python27:

- job-template:
    name: 'gate-{name}-python27'
... snip ...
    builders:
      - gerrit-git-prep
      - python27:
          github-org: '{github-org}'
          project: '{name}'
      - assert-no-extra-files

    publishers:
      - test-results
      - console-log

    node: '{node}'

Looking through the above job template definition, you will see a section called “builders“. The builders section of a job template lists (in sequential order of expected execution) the executable sections or scripts of the Jenkins job. The first executable section in the gate-{name}-python27 job template is called “gerrit-git-prep“. This executable section is defined in macros.yaml, which contains a number of commonly-run scriptlets. Here’s the entire gerrit-git-prep macro definition:

- builder:
    name: gerrit-git-prep
    builders:
      - shell: "/usr/local/jenkins/slave_scripts/gerrit-git-prep.sh https://review.openstack.org http://zuul.openstack.org git://git.openstack.org"

So, gerrit-git-prep is simply executing a Bash script called “gerrit-git-prep.sh” that is stored in the /usr/local/jenkins/slave_scripts/ directory. Let’s take a look at that file. You can find it in the /modules/jenkins/files/slave_scripts/ ^[3]
directory in the same OpenStack Infra Config project:

#!/bin/bash -e

GERRIT_SITE=$1
ZUUL_SITE=$2
GIT_ORIGIN=$3

# ... snip ...

set -x
if [[ ! -e .git ]]
then
    ls -a
    rm -fr .[^.]* *
    if [ -d /opt/git/$ZUUL_PROJECT/.git ]
    then
        git clone file:///opt/git/$ZUUL_PROJECT .
    else
        git clone $GIT_ORIGIN/$ZUUL_PROJECT .
    fi
fi
git remote set-url origin $GIT_ORIGIN/$ZUUL_PROJECT

# attempt to work around bugs 925790 and 1229352
if ! git remote update
then
    echo "The remote update failed, so garbage collecting before trying again."
    git gc
    git remote update
fi

git reset --hard
if ! git clean -x -f -d -q ; then
    sleep 1
    git clean -x -f -d -q
fi

if [ -z "$ZUUL_NEWREV" ]
then
    git fetch $ZUUL_SITE/p/$ZUUL_PROJECT $ZUUL_REF
    git checkout FETCH_HEAD
    git reset --hard FETCH_HEAD
    if ! git clean -x -f -d -q ; then
        sleep 1
        git clean -x -f -d -q
    fi
else
    git checkout $ZUUL_NEWREV
    git reset --hard $ZUUL_NEWREV
    if ! git clean -x -f -d -q ; then
        sleep 1
        git clean -x -f -d -q
    fi
fi

if [ -f .gitmodules ]
then
    git submodule init
    git submodule sync
    git submodule update --init
fi

The purpose of the script above is simple: Check out the source code of the proposed Gerrit changeset and ensure that the source tree is clean of any cruft from a previous run of a Jenkins job that may have run in the same Jenkins workspace. The concept of a workspace is important. When Jenkins runs a job, it must execute that job from within a workspace. The workspace is really just an isolated shell environment and filesystem directory that has a set of shell variables export’d inside it that indicate a variety of important identifiers, such as the Jenkins job ID, the name of the source code project that has triggered a job, the SHA1 git commit ID of the particular proposed changeset that is being tested, etc ^[4].

The next builder in the job template is the “python27” builder, which has two variables injected into itself:

      - python27:
          github-org: '{github-org}'
          project: '{name}'

The github-org variable is a string of the already existing {github-org} variable value. The project variable is populated with the value of the {name} variable. Here’s how the python27 builder is defined (in macros.yaml:

- builder:
    name: python27
    builders:
      - shell: "/usr/local/jenkins/slave_scripts/run-unittests.sh 27 {github-org} {project}"

Again, just a wrapper around another Bash script, called run-unittests.sh in the /usr/local/jenkins/slave_scripts directory. Here’s what that script looks like:

version=$1
org=$2
project=$3

# ... snip ...

venv=py$version

# ... snip ...

source /usr/local/jenkins/slave_scripts/select-mirror.sh $org $project

tox -e$venv
result=$?

echo "Begin pip freeze output from test virtualenv:"
echo "======================================================================"
.tox/$venv/bin/pip freeze
echo "======================================================================"

if [ -d ".testrepository" ] ; then
# ... snip ...
    .tox/$venv/bin/python /usr/local/jenkins/slave_scripts/subunit2html.py ./subunit_log.txt testr_results.html
    gzip -9 ./subunit_log.txt
    gzip -9 ./testr_results.html
# ... snip ...
fi

# ... snip ...

In short, for the Python 2.7 builder, the above runs the command tox -epy27 and then runs a prettifying script and gzips up the results of the unit test run. And that’s really the meat of the Jenkins job. We will discuss the publishing of the job artifacts a little later in this article, but if you’ve gotten this far, you have delved deep into the mines of the OpenStack CI system. Congratulations!

Devstack-Gate and Running Tempest Against a Real Environment

OK, so unit tests running in a simple Jenkins slave workspace are one thing. But what about Jenkins jobs that run integration tests against a full set of OpenStack endpoints, interacting with real database and message queue services? For these types of Jenkins jobs, things are more complicated. Yes, I know. You probably think things have been complicated up until this point, and you’re right! But the simple unit test jobs above are just the tip of the proverbial iceberg when it comes to the OpenStack CI platform.

For these complex Jenkins jobs, an additional set of tools are added to the mix:

Nodepool — Provides virtual machine instances to Jenkins masters for running complex, isolation-sensitive Jenkins jobs
Devstack-Gate — Scripts that create an OpenStack environment with Devstack, run tests against that environment, and archive logs and results

Assignment of a Node to Run a Job

Different Jenkins jobs require different workspaces, or environments, in which to run. For basic unit or style-checking test jobs, like the gate-{name}-python27 job template we dug into above, not much more is needed than a tox-managed virtualenv running in a source checkout of the project with a proposed change. However, for Jenkins jobs that run a series of integration tests against a full OpenStack installation, a workspace with significantly more resources and isolation is necessary. For these latter types of jobs, the upstream CI platform uses a pool of virtual machine instances. This pool of virtual machine instances is managed by a tool called nodepool. The virtual machines run in both HP Cloud and Rackspace Cloud, who graciously donate these instances for the upstream CI system to use. You can see the configuration of the Nodepool-managed set of instances here.

Instances that are created by Nodepool run Jenkins slave software, so that they can communicate with the upstream Jenkins CI master servers. A script called prepare_node.sh runs on each Nodepool instance. This script just git clones the OpenStack Infra config project to the node, installs Puppet, and runs a Puppet manifest that sets up the node based on the type of node it is. There are bare nodes, nodes that are meant to run Devstack to install OpenStack, and nodes specific to the Triple-O project. The node type that we will focus on here is the node that is meant to run Devstack. The script that runs to prepare one of these nodes is prepare_devstack_node.sh, which in turn calls prepare_devstack.sh. This script caches all of the repositories needed by Devstack, along with Devstack itself, in a workspace cache on the node. This workspace cache is used to enable fast reset of the workspace that is used during the running of a Jenkins job that uses Devstack to construct an OpenStack environment.

Devstack-Gate

The Devstack-Gate project is a set of scripts that are executed by certain Jenkins jobs that need to run integration or upgrade tests against a realistic OpenStack environment. Going back to the Cinder project configuration in the Zuul layout.yaml file:

  - name: openstack/cinder
    template:
      - name: python-jobs
... snip ...
    gate:
      - gate-cinder-requirements
      - gate-tempest-dsvm-full
      - gate-tempest-dsvm-postgres-full
      - gate-tempest-dsvm-neutron
      - gate-tempest-dsvm-large-ops
      - gate-tempest-dsvm-neutron-large-ops
      - gate-grenade-dsvm
... snip ...

Note the highlighted line. That Jenkins job template is one such job that needs an isolated workspace that has a full OpenStack environment running on it. Note that “dsvm” stands for “Devstack virtual machine”.

Let’s take a look at the JJB configuration of the gate-tempest-dsvm-full job:

- job-template:
    name: '{pipeline}-tempest-dsvm-full{branch-designator}'
    node: '{node}'
... snip ...
    builders:
      - devstack-checkout
      - shell: |
          #!/bin/bash -xe
          export PYTHONUNBUFFERED=true
          export DEVSTACK_GATE_TIMEOUT=180
          export DEVSTACK_GATE_TEMPEST=1
          export DEVSTACK_GATE_TEMPEST_FULL=1
          export BRANCH_OVERRIDE={branch-override}
          if [ "$BRANCH_OVERRIDE" != "default" ] ; then
              export OVERRIDE_ZUUL_BRANCH=$BRANCH_OVERRIDE
          fi
          cp devstack-gate/devstack-vm-gate-wrap.sh ./safe-devstack-vm-gate-wrap.sh
          ./safe-devstack-vm-gate-wrap.sh
      - link-logs

    publishers:
      - devstack-logs
      - console-log

The devstack-checkout builder is simply a Bash script macro that looks like this:

- builder:
    name: devstack-checkout
    builders:
      - shell: |
          #!/bin/bash -xe
          if [[ ! -e devstack-gate ]]; then
              git clone git://git.openstack.org/openstack-infra/devstack-gate
          else
              cd devstack-gate
              git remote set-url origin git://git.openstack.org/openstack-infra/devstack-gate
              git remote update
              git reset --hard
              if ! git clean -x -f ; then
                  sleep 1
                  git clean -x -f
              fi
              git checkout master
              git reset --hard remotes/origin/master
              if ! git clean -x -f ; then
                  sleep 1
                  git clean -x -f
              fi
              cd ..
          fi

All the above is doing is git clone’ing the devstack-gate repository into the Jenkins workspace, and if the devstack-gate repository already exists, checks out the latest from the master branch.

Returning to our gate-tempest-dsvm-full JJB job template, we see the remaining part of the builder is a Bash scriptlet like so:

          #!/bin/bash -xe
          export PYTHONUNBUFFERED=true
          export DEVSTACK_GATE_TIMEOUT=180
          export DEVSTACK_GATE_TEMPEST=1
          export DEVSTACK_GATE_TEMPEST_FULL=1
          export BRANCH_OVERRIDE={branch-override}
          if [ "$BRANCH_OVERRIDE" != "default" ] ; then
              export OVERRIDE_ZUUL_BRANCH=$BRANCH_OVERRIDE
          fi
          cp devstack-gate/devstack-vm-gate-wrap.sh ./safe-devstack-vm-gate-wrap.sh
          ./safe-devstack-vm-gate-wrap.sh

Not all that complicated. It exports some environment variables and copies the devstack-vm-gate-wrap.sh script out of the devstack-gate repo that was clone’d in the devstack-checkout macro to the work directory and then runs that script.

The devstack-vm-gate-wrap.sh script is responsible for setting even more environment variables and then calling the devstack-vm-gate.sh script, which is where the real magic happens.

Construction of OpenStack Environment with Devstack

The devstack-vm-gate.sh script is responsible for constructing a full OpenStack environment and running integration tests against that environment. To construct this OpenStack environment, it uses the excellent Devstack project. Devstack is an elaborate series of Bash scripts and functions that clones each OpenStack project source code into /opt/stack/new/$project ^[5]— , runs python setup.py install in each project checkout, and starts each relevant OpenStack service (e.g. nova-compute, nova-scheduler, etc) in a separate Linux screen session.

Devstack’s creation script (stack.sh) is called from the script after creating the localrc file that stack.sh uses when constructing the Devstack environment.

Execution of Integration Tests Against an OpenStack Environment

Once the OpenStack environment is constructed, the devstack-vm-gate.sh script continue on to run a series of integration tests:

    cd $BASE/new/tempest
    if [[ "$DEVSTACK_GATE_TEMPEST_ALL" -eq "1" ]]; then
        echo "Running tempest all test suite"
        sudo -H -u tempest tox -eall -- --concurrency=$TEMPEST_CONCURRENCY
        res=$?
    elif [[ "$DEVSTACK_GATE_TEMPEST_FULL" -eq "1" ]]; then
        echo "Running tempest full test suite"
        sudo -H -u tempest tox -efull -- --concurrency=$TEMPEST_CONCURRENCY
        res=$?

You will note that the $DEVSTACK_GATE_TEMPEST_FULL Bash environment variable was set to “1” in the gate-tempest-dsvm-full Jenkins job builder scriptlet.

sudo -H -u tempest tox -efull triggers the execution of Tempest’s integration test suite. Tempest is the collection of canonical OpenStack integration tests that are used to validate that OpenStack APIs work according to spec and that patches to one OpenStack service do not inadvertently cause failures in another service.

If you are curious what actual commands are run, you can check out the tox.ini file in Tempest:

[testenv:full]
# The regex below is used to select which tests to run and exclude the slow tag:
# See the testrepostiory bug: https://bugs.launchpad.net/testrepository/+bug/1208610
commands =
  bash tools/pretty_tox.sh '(?!.*\[.*\bslow\b.*\])(^tempest\.(api|scenario|thirdparty|cli)) {posargs}'

In short, the above runs the Tempest API, scenario, CLI, and thirdparty tests.

Archival of Test Artifacts

The final piece of the puzzle is archiving all of the artifacts from the Jenkins job execution. These artifacts include log files from each individual OpenStack service running in Devstack’s screen sessions, the results of the Tempest test suite runs, as well as echo’d output from the devstack-vm-gate* scripts themselves.

These artifacts are gathered together by the devstack-logs and console-log JJB publisher macros:

- publisher:
    name: console-log
    publishers:
      - scp:
          site: 'static.openstack.org'
          files:
            - target: 'logs/$LOG_PATH'
              copy-console: true
              copy-after-failure: true


- publisher:
    name: devstack-logs
    publishers:
      - scp:
          site: 'static.openstack.org'
          files:
            - target: 'logs/$LOG_PATH'
              source: 'logs/**'
              keep-hierarchy: true
              copy-after-failure: true

Conclusion

I hope this article has helped you understand a bit more how the OpenStack continuous integration platform works. We’ve stepped through the flow through the various components of the platform, including which events trigger what actions in each components. You should now have a good idea how the various parts of the upstream CI infrastructure are configured and where to go look in the source code for more information.

The next article in this series discusses how to construct your own external testing platform that is linked with the upstream OpenStack CI platform. Hopefully, this article will provide you most of the background information you need to understand the steps and tools involved in that external testing platform construction.

^[1]— The link describes and illustrates the non-human gatekeeper model with Bazaar, but the same concept is applicable to Git. See the OpenStack GitWorkflow pages for an illustration of the OpenStack specific model.
^[2]— Zuul really is a pretty awesome bit of code kit. Jim Blair, the author, does an excellent job of explaining the merge proposal dependency graph and how Zuul can “trim” dead-end branches of the dependency graph in the Zuul documentation.
^[3]— Looking for where a lot of the “magic” in the upstream gate happens? Take an afternoon to investigate the scripts in this directory.
^[4]— Gerrit Jenkins plugin and Zuul export a variety of workspace environment variables into the Jenkins jobs that they trigger. If you are curious what these variables are, check out the Zuul documentation on parameters.
^[5]— The reason the projects are installed into /opt/stack/new/$project is because the current HEAD of the target git branch for the project is installed into /opt/stack/old/$project. This is to allow an upgrade test tool called Grenade to test upgrade paths.

Character Sets, Collations and the JÃ¶rmungandr

jaypipes — Thu, 02 Oct 2008 16:51:00 +0000

One of the (many) ongoing discussions in the Drizzle developer community is the level of support the database server kernel should provide for non-Unicode character set encodings. Actually, when I say non-Unicode, I actually mean non-UTF8, since we’ve stripped out all other character sets and “standardized” on 4-byte UTF8. I’ll come back to why exactly I put standardized in quotes in just a bit…but to sum up, in childish terms, my thoughts after spending 4 hours tonight reading about character sets and collations, here is an exchange between Toru and myself on Freenode #drizzle:

 tmaesaka: how do you write "I wish everyone would just speak English" in Japanese? ;)
 ã¿ã‚“ãªè‹±èªžä½¿ã†ã¨ã„ã„ã®ã«ã€‚

A Little Background

For those of you new to the world of character sets and collations, I’ll briefly summarize the concepts and terms I’ll talk about in this article. Incidentally, I consider myself to be in this crowd, since I’ve never really had to deal with anything more than a cursory knowledge of them in reference to how they work in MySQL (not the internals).

Character Sets and Encodings

A character set, or character encoding scheme, is a system for matching characters — such as “A” or “ã¿” or “ÃŸ” — with a machine-readable code for the character. This machine-readable code can be represented simply as a decimal number, or in more complex character sets, a hexidecimal number. The “encoding” of the character set is the protocol, or instructions, that the character set uses in order to enable the computer to understand a series of byte sequences and interpret the sequence as a specific character.

Character sets such as ASCII are very simple, and consist of a single 8-bit byte for each character contained in the character set — with only 7 bits actually used for the character code. ASCII consists of the English-language alphabetic characters, including their captalized forms, the numbers 0 through 9, a variety of punctuation and “common” symbols like “$” and “;”, and a series of non-printable “control” characters. This encoding scheme works wonderfully for those of us in the U.S., but it is utterly lacking when it comes to representing the myriad characters and symbols in other languages.

Other more-complex character encodings are localized for a specific language, or writing system. For instance, the Shift_JIS character encoding scheme encodes, in 2 bytes, the ASCII character set (with 2 exceptions), the “half-width Katakana” characters, and the JIS X 0208 set of kanji symbols. Sound complicated? It is. And it gets even more complicated the further down the rabbit-hole one goes.

Which leads me to Unicode…

What the Heck is Unicode and UTF?

Many folks think that Unicode is merely another character set or encoding scheme. It’s not. It’s actually more than that. It’s an entire system which endeavours to standardize the way that computers can read, sort, and transform characters encoded in various character sets.

Actually, The Unicode standard according to Wikipedia

…consists of a repertoire of more than 100,000 characters, a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering and bidirectional display order…

Got all that?

So, Unicode is a set of standards for dealing with lots of varying languages and characters, and transcoding character codes from one encoding scheme to another. What, then, is UTF[8|16|32]?

UTF stands for Unicode Transformation Format, and is a set of mapping methods for translating one of Unicode’s 1,114,112 code points (characters or control sequences) to a hexadecimal number. UTF8 is a variably-sized mapping method, which uses between one and four bytes to represent one of the code points. ASCII and most Western character sets take up 1 byte of storage, whilst CJK (Chinese/Japanese/Korean) characters typically consume 3 bytes of space per character. It is important to note that this 3 bytes is one more byte per character than encoding schemes like Shift_JIS, which use either 1 or 2 bytes for characters. Yoshinori Matsunobu published a short article today on these storage space differences.

UTF16 is a variable-width mapping scheme which uses the first 16 bits of the hexadecimal number to represent what “category” or “plane” of characters the code point belongs to. UTF16 generally uses a little bit less storage space for CJK characters versus UTF8. However, when analyzing actual CJK text, which includes spaces and other ASCII characters, the storage difference seems to be negligible. UTF32 is a fixed-length mapping method which uses 4 bytes to store each code point.

UTF8 is dominant in the web space, with all modern browsers able to understand and encode for UTF8.

OK, So What is a Collation?

So, if a character encoding scheme, such as UTF8, is used to identify a set of characters and symbols as a machine-readable sequence of bytes, then what exactly is a collation, and why are they important?

Glad you asked. A collation , or collating sequence, refers to the order in which different characters in a character set should appear when sorted in a list. The alphabetic collating sequence is the one some of us, in our little English-only world, are familiar with. But in various regions of the world, the same set of characters may be ordered differently when appearing in a list of characters. And therefore, even with a character encoding scheme like UTF8, one must also specify a collation when listing textual results in a specific order.

In MySQL, as well as Drizzle, the method for ordering results by a specific collation is fairly simple: one merely specifies the collation in the ORDER BY clause, like the example below shows:

mysql> CREATE TABLE utf8_tests (
    ->   my_text VARCHAR(100) NOT NULL
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO utf8_tests VALUES ('comb'),('cukor'),('csak'),('folyik'),('folyÃ³'),('folyosÃ³'),('fÅ‘'),('fÃ¶dÃ©m');
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM utf8_tests ORDER BY my_text COLLATE utf8_general_ci;
+----------+
| my_text  |
+----------+
| comb     | 
| csak     | 
| cukor    | 
| fÃ¶dÃ©m  | 
| fÅ‘      | 
| folyÃ³   | 
| folyik   | 
| folyosÃ³ | 
+----------+
8 rows in set (0.00 sec)

mysql> SELECT * FROM utf8_tests ORDER BY my_text COLLATE utf8_hungarian_ci;
+----------+
| my_text  |
+----------+
| comb     | 
| csak     | 
| cukor    | 
| fÅ‘      | 
| fÃ¶dÃ©m  | 
| folyÃ³   | 
| folyik   | 
| folyosÃ³ | 
+----------+
8 rows in set (0.00 sec)

You’ll notice that the words “fÅ‘” and “fÃ¶dÃ©m” are reversed depending on the collation used in the ORDER BY clause.

Any Hungarians reading this article? If there are, you’ll likely have already spotted the problem with the above output. The problem is that it’s wrong. “csak” should appear after cukor, since “cs” is a digraph (two-characters interpreted as one) which comes after “c” in the Hungarian alphabet.

The above behaviour is known bug in MySQL since August 2005, over three years. The above bug is something I noticed while reading up on collations and comparing what’s going on in MySQL/Drizzle to what the standard expects. The ICU project has a set of HTML pages where you can type in a list of words in a language and sort by various collations, and it will show you the correct sort order. I ran into the bug above, as well as a new bug in the German collation I found today.

Where Drizzle Is Right Now

Currently, all but the UTF8 character set have been removed from Drizzle. Furthermore, the UTF8 implementation in Drizzle is full 4-byte UTF8, which differs from the 3-byte variety used in MySQL <= 5.1. There are two major benefits that this decision and subsequent removal has given Drizzle:

Reductions in the size and complexity of the Drizzle parser — removal of some CONVERT() stuff, introducers, and more
Easier to understand (and potentially refactor) code surrounding character sets and collations

So, it seems that although we’ve stripped out a lot of complexity by moving to only UTF8 and its collations, we’ve inherited a system that, frankly, was never designed to handle complex collations. Instead, it is designed to be fast, not entirely accurate. So, what is a project to do?

We have a number of options, all of which we’ve been debating over on the mailing lists:

Use libICU for all character set and collation services — libICU is a full-featured library written by experts in the languages and transcoding fields, why not take advantage of that expertise?
Use GLib’s locale facilities — this has mostly been ruled out because of performance concerns over non-reentrant code dependent on setlocale()
Write our own — this is essentially where we are right now
Use C++’s facilities, as Monty demonstrated on the mailing lists — actually, I also have O’Reilly’s C++ Cookbook, so I know where that code originated…

libICU is, frankly, quite a large library, and it’s not certain that the performance of it would be satisfactory. However, I can certainly envision taking libICU’s test case suite and converting it to the Drizzle test suite format. This would certainly poke holes in our current character set handling that need to be discovered.

Although Yoshinori-san’s objections about UTF8 storage requirements versus localized Japanese character sets are valid, I don’t think at this point that we’ll re-introduce non-UTF8 character sets into the server at this time. If there is a huge uproar over this, in the future, pluggable character sets are a possibility, after changes to the plugin API to enable it. Pluggable collations too…

This last option is the one which interests me the most, and I find most appealing. In fact, I compiled a small test program based on the C++ facilities which actually produces the correct collation order for the bug demonstrated above:

#include 
#include 
#include 
#include 
#include 
 
using namespace std;
 
bool localeLessThan(const string& s1, const string& s2) {
 
  locale locale1("hu_HU.utf8");
 
  const collate& col= use_facet >(locale1);
 
  const char* pb1= s1.data();
  const char* pb2= s2.data();
 
  return (col.compare(pb1, pb1 + s1.size(),
                      pb2, pb2 + s2.size()) < 0);
}
 
int main(int argc, char** argv) {
 
  string s1 = "comb";
  string s2 = "csak";
  string s3 = "cukor";
 
  vector all_the_strings;
 
  all_the_strings.push_back(s1);
  all_the_strings.push_back(s2);
  all_the_strings.push_back(s3);
 
  sort(all_the_strings.begin(), all_the_strings.end(), localeLessThan);
  for (vector::const_iterator p= all_the_strings.begin();
       p != all_the_strings.end();
       ++p)
    cout << *p << endl;
 
}

Compiling and running the program shows the correct sorted order for the words:

[518][jpipes@serialcoder: /home/jpipes/repos/drizzle/test-hun]$ g++ test.cc 
[519][jpipes@serialcoder: /home/jpipes/repos/drizzle/test-hun]$ ./a.out 
comb
cukor
csak

I’m thinking that the refactoring work that still needs to be completed around CHARSET_INFO and MY_CHARSET_HANDLER should experiment with the technique above and verify any performance regression (or improvement) that may occur. Accuracy, in my opinion, and the ability to let a library not written by Drizzle developers do the heavy lifting, is more important than a small performance increase.

The Edwin Strikes Back

As he is sometimes prone to do, my dear colleague, Edwin DeSouza, shot me an email with a link he thought I might find interesting. He saw I had been chatting about character sets and had discovered a forum thread over on the Ruby Forum which went into some detail about the difficulties surrounding encoding conversion, localization, and internationalization.

Suffice it to say, “some detail” would be the understatement of the year in this case. The forum thread is longer than JÃ¶rmungandr, the mythical Norse sea serpent.

Notable voices on the thread include Matz, creator of Ruby and a core influencer in its direction, and Tim Bray, of our own Sun Microsystems and XML fame. The original poster, one Michael Selig, began the thread, entitled Character encodings – a radical suggestion, with an ostensibly simple suggestion:

Remove internal support for non-ASCII encodings completely, and when
reading/writing UTF-16 (and UTF-32) files automatically transcode
to/from UTF-8.

Unfortunately for Michael, this small suggestion was the online equivalent of stepping in a pile of elephant dung.

Until reading the above-mentioned forum thread, I really had no idea about the complexities involved in character set handling, especially in the Asian countries. If you are interested in character sets, collations, and Unicode vs. local encodings, reading through the forum thread will truly enlighten you as to the various arguments for and against UTF8. It’s highly recommended reading, but be warned, it may leave you gasping for breath at some points…enjoy.

Enabling and Fixing Drizzle Test Cases

jaypipes — Wed, 10 Sep 2008 18:56:00 +0000

When Brian began the work on refactoring the MySQL 6.0 Server source code into what has now become the Drizzle Project, a number of code pieces were removed, including some major MySQL functionality such as stored procedures, server-side prepared statements, SQL Mode, some legacy code, and a variety of data types. The goal, of course, was to reduce the server code base down to a more streamlined and eventually modular kernel.

Of course, that vision is great, but it’s got some side effects! One of those side effects is a dramatic reduction in the number of test cases that pass the test suite in their current form, and an increase in the number of tests that have been disabled. I re-enabled and fixed a few tests yesterday, but as of this writing, there are only 54 of 408 tests currently passing in the test suite.

This is to be expected. You can’t just go and strip a huge chunk of the parser and functionality out of the server and expect the original test suite to run without problems So, Brian disabled some tests while removing code sections anticipating that these tests would eventually be re-enabled and any regressions fixed. Well, we are now at that point. With Brian’s work this week to remove the last vestiges of non-UTF8 character set support, fixing and re-enabling disabled tests in the test suite is now a high priority project. Luckily, this is a project which almost anyone — even non-coders — can get involved in and make a difference.

This article will explain the process of running the Drizzle test suite and identifying test cases which can be re-enabled or should be fixed. We’ll focus on stuff that you can help with as a contributor who wants to start getting involved in Drizzle and making an impact without having C/C++ coding experience. If you haven’t caught my previous articles on using Launchpad.net for code management, I’d suggest reading those now. In addition, although we won’t be doing C/C++ coding, you’ll need a build environment established in order to properly run the test suite. So, I’d also suggest reading my article on setting up a C/C++ development environment for Drizzle.

The Test Suite Basics

NOTE: This section describes the Drizzle test suite. However, if you are contributing to the MySQL Server project, the instructions in this section are exactly the same if you are working with the MySQL Server. Just change dtr to mtr.

The Drizzle test suite is a composite of a main Perl script — tests/test-run.pl — and a couple other tools. After you have built Drizzle with the standard build process, you will see a program in the /tests source directory called dtr. This is the test suite runner. When you issue the command:

make test

This test runner is called with some command-line options and a list of tests to run. You can verify this behaviour by looking at /tests/Makefile.am and seeing the actual command for the test make target.

The general form for running a test case is the following:

cd tests ./dtr $testname1 $testname2 ... $testnameN

There are a number of command-line options that the test suite runner accepts, and I’ll cover a smattering of them in this article.

Running a Test

So, how do you know what the names of the tests are? Good question! In the /tests directory, you will find a /t directory which contains all the test cases contained in the main test suite. The main test suite are tests not specific to a functionality of the server like replication. For instance, the /t/select.test is the test case in the main test group corresponding to testing of the SELECT syntax and functionality. Other test cases for specific functional pieces can be found in /tests/suite. Running this t/select.test test case through the runner, we’d do this:

./dtr select

Note that you do not need to add the .test suffix. You should see results similar to the following:

[505][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk/tests]$ ./dtr select
Logging: ./dtr select

MySQL Version 7.0.0
Using dynamic switching of binlog format
Using MTR_BUILD_THREAD      = 0
Using MASTER_MYPORT         = 9306
Using MASTER_MYPORT1        = 9307
Using SLAVE_MYPORT          = 9308
Using SLAVE_MYPORT1         = 9309
Using SLAVE_MYPORT2         = 9310
Killing Possible Leftover Processes
Removing Stale Files
Creating Directories
Saving snapshot of installed databases
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.select                    [ pass ]           9673
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.
The servers were restarted 1 times
Spent 9.673 of 21 seconds executing testcases

As you can see, the test suite fires up a Drizzle server, loads the test file and performs the tests contained in the file. The tests in the file generally consist of SQL statements that are executed against one or more servers, but they can also be commands such as creating a new connection, logging output, and other things. For this article, we’ll be focusing on the SQL command tests. In a followup article, I may highlight some of the other test-case commands available to you.

Failing Test Cases

Well, it’s all fine and dandy if a test case succeeds like in the example above, but like I mentioned in the introduction of this article, we’re focused on the test cases that aren’t succeeding and getting these test cases to succeed! So, how do we find those tests which are failing? One method is to look at the Drizzle Build Farm and track down failures occurring in the test runs. Another way is to simply run a series of tests and see what fails. For simplicity’s sake, I’ve done a little research already and know a number of tests that are failing. So, we’ll go ahead and take a look at a test case that I know needs some TLC.

The test case I’ve chosen is the func_math test from the main test suite. It’s small and provides a good example of how we can work to fix up the failures. Here is what I get when running this test:

[505][jpipes@serialcoder: tests]$ ./dtr func_math
Logging: ./dtr func_math

MySQL Version 7.0.0
Using dynamic switching of binlog format
Using MTR_BUILD_THREAD      = 0
Using MASTER_MYPORT         = 9306
Using MASTER_MYPORT1        = 9307
Using SLAVE_MYPORT          = 9308
Using SLAVE_MYPORT1         = 9309
Using SLAVE_MYPORT2         = 9310
Killing Possible Leftover Processes
Removing Stale Files
Creating Directories
Saving snapshot of installed databases
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.func_math                 [ fail ]

drizzletest: At line 134: query 'create table t1 (a varchar(90), ts datetime not null, index (a)) engine=innodb 
default charset=utf8' failed: 1064: You have an error in your SQL syntax; check the manual that corresponds 
to your MySQL server version for the right syntax to use near 'charset=utf8' at line 1

The result from queries just before the failure was:
< snip >
656	405
122	405
645	405
INSERT INTO t1 VALUES (3);
SELECT CAST(RAND(2) * 1000 AS UNSIGNED), CAST(RAND(a) * 1000 AS UNSIGNED) 
FROM t1;
CAST(RAND(2) * 1000 AS UNSIGNED)	CAST(RAND(a) * 1000 AS UNSIGNED)
656	405
122	405
645	405
858	656
354	906
SELECT CAST(RAND(2) * 1000 AS UNSIGNED), CAST(RAND(a) * 1000 AS UNSIGNED) 
FROM t1 WHERE a = 1;
CAST(RAND(2) * 1000 AS UNSIGNED)	CAST(RAND(a) * 1000 AS UNSIGNED)
656	405
122	405
645	405
DROP TABLE t1;
create table t1 (a varchar(90), ts datetime not null, index (a)) engine=innodb default charset=utf8;

More results from queries before failure can be found in 
/home/jpipes/repos/drizzle/trunk/tests/var/log/func_math.log

Stopping All Servers
Restoring snapshot of databases
Resuming Tests

-------------------------------------------------------
Stopping All Servers
Failed 1/1 tests, 0.00% were successful.

The log files in var/log may give you some hint
of what went wrong.
If you want to report this error, please read first the documentation at
http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html
The servers were restarted 1 times
Spent 0.000 of 7 seconds executing testcases

mysql-test-run in default mode: *** Failing the test(s): main.func_math
mysql-test-run: *** ERROR: there were failing test cases

As you can see, the test fails and outputs the source of the failure.

Fixing a Broken Test

Now that we’ve identified a failing test, we need to follow a process in order to fix it. The process you should follow is this:

Make a change to the test case file
Re-run the test through dtr using the --record option
If any failure occurs, go back to #1
Once the test succeeds under the --record option, a test result file will be written to the /tests/r/ directory. We’ll need to bzr commit the changes to the test and the result file and push to a branch on Launchpad.
Edit tests/Makefile.am and ensure the newly-passing test is included in the make test target

In this case, the failure is due to a mere syntax issue. We’ve removed character set support and standardized entirely on UTF8, and so the support in the parser syntax for the phrase DEFAULT CHARSET=utf8 is gone. To fix this test, we need to remove the pieces of the old MySQL syntax which are no longer supported in Drizzle.

So, we pop open our favorite editor and open up the /tests/t/func_math.test file. Go ahead and remove all instances of default charset=utf8. And then re-run the test with the --record. You should see the following:

[508][jpipes@serialcoder: tests]$ ./dtr --record func_math

=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.func_math                 [ fail ]

drizzletest: At line 160: query 'create table t1
(f1 varchar(32) not null,
f2 smallint(5) unsigned not null,
f3 int(10) unsigned not null default '0')
engine=myisam' failed: 1064: You have an error in your SQL syntax; check the manual 
that corresponds to your MySQL server version for the right syntax to use near '(5) unsigned not null,
f3 int(10) unsigned not null default '0')
engine=myisam' at line 2

Again, it looks like we’ve run into another syntax problem. Above, the test case contains the old ZEROFILL syntax, which allows you to specify a number in parentheses after an integer data type. This functionality, a legacy from Unireg times, is not supported in Drizzle. So, we must remove it. After removing the (XX) ZEROFILL syntax from the CREATE TABLE definitions in the test case file, I re-run the test:


=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.func_math                 [ fail ]

drizzletest: At line 230: query 'CREATE TABLE t1(a SET('a','b','c'))' failed: 1064: 
You have an error in your SQL syntax; check the manual that corresponds to your 
MySQL server version for the right syntax to use near 'SET('a','b','c'))' at line 1

Once again, we’ve run into a failure. This time, it’s because of the SET data type. This data type has been removed from Drizzle. So, we must remove it from the test case here. After doing so, I re-run the test case, and finally we see a success:

[509][jpipes@serialcoder: tests]$ ./dtr --record func_math
Logging: ./dtr --record func_math

=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.func_math                 [ pass ]            107
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.
The servers were restarted 1 times
Spent 0.107 of 8 seconds executing testcases

Cool. Looks good. Now we edit tests/Makefile.am and add the newly successful test to the make test target.

cd tests vim Makefile.am

Here is what the section in the Makefile.am looks like, with the bolded line being the line I add for our newly successful test:

test-drizzle:
	  $(PERL) -I$(top_srcdir)/tests/lib \
		$(top_srcdir)/tests/test-run.pl --fast --reorder --force \
        1st \
        alter_table \
        bench_count_distinct \
	bulk_replace \
	comment_column2 \
	comments \
	consistent_snapshot \
        count_distinct \
        count_distinct2 \
        count_distinct3 \
	create_select_tmp \
        ctype_filename \
        delete \
        distinct \
	drizzleslap \
        endspace \
        flush2 \
        func_equal \
        func_group_innodb \
        func_isnull \
        func_like \
	func_math \
        greedy_optimizer \
        group_min_max_innodb \
        heap_auto_increment \

Alright, cool. OK, now we simply need to verify our test case and result file changes, edit our make test target, and commit our changes. First, verification:

[511][jpipes@serialcoder: tests]$ bzr status
modified:
  tests/Makefile.am
  tests/r/func_math.result
  tests/t/func_math.test

Looks good. The final step is committing our work and then pushing to a code branch on Launchpad.net. Below, I am pushing to the branch lp:~drizzle-developers/drizzle/enable-tests, which is a team branch used to push code for the various test cleanups.

[514][jpipes@serialcoder: tests]$ bzr commit Makefile.am t/func_math.test \
> r/func_math.result -m "Fixed syntax errors in func_math test and re-enable \
> the test in the make test target"
Committing to: /home/jpipes/repos/drizzle/trunk/
modified tests/Makefile.am
modified tests/r/func_math.result
modified tests/t/func_math.test
Committed revision 405.                                                                                                                  
[515][jpipes@serialcoder: tests]$ bzr push lp:~drizzle-developers/drizzle/enable-tests
Pushed up to revision 405.

And that’s that! Test fixed, case, result and Makefile.am edited, and changes committed.

A Contributor’s Guide to Launchpad.net – Part 2 – Code Management

jaypipes — Thu, 28 Aug 2008 23:07:00 +0000

In this second part of my Launchpad guidebook series, I’ll be covering the code management and repository features of Launchpad.net. If you missed the first part of my series, go check it out and get established on Launchpad.net. Then pop back to this article to dive into the magic of http://code.launchpad.net. In this article, we’ll cover the following aspects of the code management pieces of Launchpad:

The Structure of Project Source Code on Launchpad.net
Pulling Code into a Local Repository
Creating a Local Working Branch for Bug Fixing
Pushing Code to Launchpad
Notifying a Merge Captain of Your Code Pushes
Keeping Trunk Up To Date
Merging Local Code with a Trunk Branch

For the following article, we’ll be acting as if you are contributing to the MySQL Server project and wish to create a patch to fix a bug in the MySQL server. We’ll be working through all the steps to do so. If you are looking to contribute to a different project, or your own project, simply replace the names and URLs in the article with ones for your particular project.

The Structure of Project Source Code on Launchpad.net

Projects hosted on Launchpad.net are organized using the terminology branches, repositories and series. A branch is simply a Bazaar branch of the project’s source code. A repository is a collection of a project’s Bazaar branches. A series is a named branch which represents something special for the project — usually a tagged release or a development branch.

Because Launchpad uses Bazaar, a distributed version control system, sometimes it takes a little while to get used to the fact that there is not a “central” source tree to which you check code into and out of. Instead, what you do is “pull” a named branch from a Bazaar server to your local workstation and work on that local branch in peace and quiet until you want to “push” your code into another branch. The branch to which you “push” may be the “trunk”, or “active development” branch, but normally you will not push to the trunk branch. Instead, you’ll push your local branch containing your source code changes to a personal branch on the Launchpad.net server and propose that your branch be “merged into trunk”. The image to the right, from the Bazaar user guide, show the general flow of code using this type of process, called a “decentralized system with a human gatekeeper”.

This pull, code, push and merge process is the recommended way to manage code changes for a project. It allows a core set of “merge captains” to review and check your code before merging your code into the active development branch. It is this process which I will be demonstrating in this article.

Pulling Code into a Local Repository

When you work on a project in Launchpad, you work on code in a local branch of the project. To get rolling, you will first want to set up a local repository if you haven’t already done so. To do so, we use the bzr init-repo command:

NOTE: You will need Bazaar installed to do so. Don’t have Bazaar installed? See my previous article on Getting a C/C++ Development Environment Established. Not coding in C/C++? Don’t worry, just read the section of that article on installing Bazaar.

cd ~/repos # Change this to the folder in which you plan to have bzr repositories... bzr init-repo mysql-server-5.1

At this point, a shared repository will be created. What the heck is a shared repository? Well, it’s basically a special folder that Bazaar knows contains information about source code branches. It facilitates speedier branching and merging and is especially useful for larger source trees and changeset histories like the MySQL server. You can verify that Bazaar knows something about your newly created repository by checking for a .bzr hidden folder in your repository folder:

[504][jpipes@serialcoder: /home/jpipes/repos]$ ls -la mysql-server-5.1/
total 12
drwxr-xr-x 3 jpipes jpipes 4096 2008-08-26 15:00 .
drwxr-xr-x 7 jpipes jpipes 4096 2008-08-26 15:00 ..
drwxr-xr-x 4 jpipes jpipes 4096 2008-08-26 15:00 .bzr

The next step is pulling the active development series of your particular project. In our example, we’ll pull the active development branch of the “5.1” series of the MySQL Server. To do so, we use the bzr branch command:

cd mysql-server-5.1/ bzr branch lp:mysql-server/5.1 trunk

WARNING: When doing the initial pull of the first branch in a shared repository, the branch command can take quite some time to execute, especially when pulling a branch of a project like the MySQL Server which has a huge history of changesets to it. Be prepared to wait a while, and if Bazaar looks like it’s stuck doing stuff, just leave it alone. It typically will take about 80-90 minutes to complete the first time!

In the bzr branch command above, lp: designates we are looking for a branch residing on the Launchpad.net Bazaar servers. The colon is followed by the name of the project, in this case mysql-server, followed by a slash and the name of the series, in this case 5.1. You can always check the names of a project’s series by going to the main code area of a project. For MySQL Server, that address would be http://code.launchpad.net/mysql-server

When finished the initial branch, you’ll see something like the following, shown with the time command to illustrate the amount of time you should expect for the MySQL Server initial branch:

[511][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1]$ time bzr branch lp:mysql-server/5.1 trunk 
Server is too old for streaming pull, reconnecting.  (Upgrade the server to Bazaar 1.2 to avoid this)
Branched 2719 revision(s).

real	91m30.337s
user	14m6.825s
sys	6m7.355s

The total amount of “stuff” that is downloaded for the 5.1 server is around 600MB, so it shouldn’t be surprising that it takes some time to do the initial branch…

You can ignore the message about upgrading the server to Bazaar 1.2; it’s because I’m not using the very recent Bazaar 1.6 client. The performance of this first branch is currently being investigated by John Arbash Meinel, one of Bazaar’s developers, whom I spent some “quality time” with on IRC today. He mentions that the developers are working on something called “shallow trees” which should significantly speed up initial branching for large projects.

Creating a Local Working Branch for Bug Fixing

OK, if you’ve gotten this far, then you will have a local shared repository that contains a single branch which contains the source code and changeset history for the 5.1 series of the MySQL Server. What we want to do now is fix a bug in the MySQL Server 5.1 locally on our workstation. This will be a fictitious bug called Bug#99999 – “authors.h file doesn’t contain MY NAME!”

Certainly, we could just start hacking up the code in the trunk branch we just pulled. But, that’s not the most practical way of doing structured development on a local workstation. Instead, you should create a branch from trunk which will house only the changes specific to what you are working on: in this case, our fictitious bug#99999. Why is this a better practice than simply making the changes in the trunk branch? Well, a couple reasons:

It keeps the trunk branch free of any of your changes. This is very important if you want to keep trunk up-to-date with changes from the rest of the project’s developers and not have to constantly resolve merge conflicts (more on that later). Having a clean trunk branch local to your workstation means you can quickly and easily make local working branches from trunk for anything you are working on.

It creates an easy-to-understand naming structure for your local repository. After a while, you might have a repository with the following structure:

~/repos
  /mysql-server-5.1
    /trunk # perhaps a cron job keeps this in synch every morning...?
    /bugXXXX-fix-invalid-pointer-in-binlog
    /bugXXXX-wrong-error-message-on-create-table
    /my-fancy-storage-engine

OK, so hopefully I’ve convinced you to follow the advice of creating separate local working branches for actually changing source code. Let’s create a local working branch to do the fix for our Bug#99999. We use the bzr branch command, as before, but this time we’ll be branching from our local trunk, and not the Launchpad trunk branch for the 5.1 series. You will notice that the branching will be significantly faster than before. This is due both to doing things locally, and to the fact that you now have a shared repository set up. Enter the following:

bzr branch trunk bug99999-fix-authors-file

Once completed, you should see something like the following:

[571][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1]$ bzr branch trunk bug99999-fix-authors-file
Branched 2719 revision(s).

We’re now ready to start fixing our bug. Hop into the newly created working branch and open up the sql/authors.h file in your editor of choice (here, I’ll use Vim)

cd bug99999-fix-authors-file

vim sql/authors.h

The “fix” for this bug is simply adding your name to the sql/authors.h file of course. Go ahead and add your name to the list in that file and save and close the file. Now, that has to be the easiest bug fix ever. No wait, I take that back.

Before we commit anything, let’s first check to see what changes we have made in the local branch. To do so, we use the bzr status command, like so:

bzr status

If you’ve done everything up until now, you should see something very similar to the below:

[503][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1/bug99999-fix-authors-file]$ bzr status
modified:
  sql/authors.h

The next thing we’ll need to do is commit our changes to the local branch. For those of you used to CVS or Subversion, this step will look familiar, however remember that with Bazaar you are committing to the local branch, not a central repository. (This isn’t always the case, but for now, assume it is…)

Like in other source control systems, we’ll use the commit command. There are a number of command options that you can use with the commit command, and I will outline two of them here. The most important is the -m option, which allows you to enter a string which will be the comment for the set of changes in this commit. This is an extremely useful option for smaller changesets. For larger ones, leave off the -m option and your environment’s editor will pop up after hitting enter to allow you to enter in larger comments.

TIP: Remember, a best practice whenever you commit source code to a revision control system is to make the changeset comments as descriptive as possible, so other developers can clearly tell what you were doing.

The second option I’ll tell you about is a nifty one which integrates Bazaar with the Launchpad.net bug tracking system. If you have a Bug report that is managed by Launchpad.net, you can supply the bug number to the --fixes option and automatically close a bug report with your bzr commit. Pretty cool, eh?

Below, I show how to use these two options with the bzr commit command:

bzr commit -m "Add my name to authors.h" --fixes lp:99999

You should see something like the following appear:

Committing to: /home/jpipes/repos/mysql-server/bug99999-fix-authors-file/
modified sql/authors.h
Committed revision 2720.

Did you know that Bazaar supports more bug tracking systems than just Launchpad.net? Check out the Bazaar user guide section on Bug Tracking integration for more information.

Pushing Code to Launchpad

Now that you’ve made your code changes, it’s time to push those changes up to a branch on Launchpad.net. Why do we want to push the local branch changes to Launchpad.net, instead of sending the changes to another team member (using the bzr send or bzr export command)? Well, first of all, pushing the changes to Launchpad.net allows anyone to see and review your code changes, making the Launchpad.net website an easy and centralized place to do that. Secondly, having the branch on Launchpad.net allows you to get more out of the Launchpad.net platform and integrate your branch and code with other features of the platform, such as Bug Tracking and the Blueprints task management and milestone system.

To get your local branch to Launchpad.net, you use the bzr push command. The push command takes as an argument the address of the branch to which you wish to push your local changes. If you are pushing changes to a new branch on Launchpad.net, the system will create that new branch for you automatically. If you are pushing to an existing branch, the system simply uploads your changesets and applies them to that branch.

On Launchpad, there are a number of locations where we can push Bazaar branches. Each Launchpad user gets their own http://code.launchpad.net/~username area in which to put branches. Whenever you are a member of a project team, you also can push code into the code.launchpad.net/~username/projectname/ location. Also, each Launchpad user has a “Junk” area (code.launchpad.net/~username/+junk/ that they can post any old branch to. After either of these locations, you put the name of the branch you are pushing to (or creating).

To see an example, let’s use our “Junk” area for right now, and push our local bug99999-fix-authors-file branch to Launchpad.net. We want to push our local bug-fix branch to a general branch in our junk folder which will contain all of our bug fixing efforts. Why? Well, there’s no need to create separate branches for each bug fix on Launchpad.net just so people can see the code in the single bug fix. We can push all our code changes to a general branch and then point reviewers to the specific revision we worked on. This saves a whole lot of time when pushing branches. So, here is how we push:

bzr push lp:~jaypipes/+junk/mysql-server # Of course, replace jaypipes with your own username!

After a while, you’ll see the following:

Created new branch.

You can now go view your branch and it’s associated code changes by visiting the branch’s code URL, which will be http://code.launchpad.net/~yourusername/+junk/mysql-server if you’ve been following the steps in this article.

Belonging to a Project Team

Having a branch in your junk folder is fine, but Launchpad is all about belonging to a community of developers! When you belong to a project team, you are automatically able to push your branches to the project’s code area. Importantly, if you belong to a project team, then you can use Launchpad’s ability to propose a branch merge — something you cannot do if you push to your “Junk” folder. To push one of your local branches to a project, you would do:

cd ~/repos/projectname/branchname>

bzr push lp:~yourusername/projectname/branchname

Once pushed, the branch will be visible to anyone when they look at the project’s code branch listing, which is always at http://code.launchpad.net/projectname. If you click on your branch, you will go to the branch’s main page. You’ll notice links for a number of actions that you can take, including one called “Propose for merging into another branch”. I’ll be talking a lot about these options in future articles in this series, since many of them relate to the other parts of the Launchpad platform.

Notifying a Merge Captain of Your Code Pushes

OK, now that your code is up in a branch, the next step is to ask for a review of your code to be merged into the development branch of the project. In this case, I’m pretty sure our Bug#99999 fix isn’t going be passing any code reviews to get into the MySQL Server, but I’ll explain the process anyway for reference.

A merge captain is someone who chaperons a project’s main development branch by being a gatekeeper for new code being merged into it. This is a critical role that some folks think is a fun or coveted job. It’s not. As a merge captain for the Drizzle project, I can tell you that in fact it’s mainly grunt work comprising tedious pulling, merging, building the code, and running of test suites. So, be nice to your merge captains! The best way to be nice to your merge captain is to follow these two simple rules:

Always make comments for your commits that are descriptive and explain clearly what your code is doing and what tasks or bugs the work addresses
Follow a standard, agreed upon process for notifying a merge captain of merge requests

In the case of the second rule, the standard process uses the Launchpad.net platform to notify the merge captain of a request — instead of, for instance, bugging the crap out of the captain on IRC to merge your branch.

To notify the merge captain about your branch, navigate to the main code page of the branch you pushed. Remember that to do this step, you must be a member of the project team! On the branch’s main page, look for the link “Propose merging into another branch” and click it. You will be taken to the Propose Branch for Merging page. Select the target branch. The default will be the active development branch for the project in question. You may also provide a branch name. In the “whiteboard” text area, you can provide a brief description of the changes contained in your branch. Typically, you will want to keep the option “Needs Review” selected, and then click the Register button.

At this point, an email will be fired off to any subscribers for the project’s trunk branch commits. The merge captain will, of course, be one of the ones which receives this email and will initiate the review of your code and the merging of it into trunk (or whichever target you specified).

Keeping Trunk Up to Date

The final two parts of this article looks at how to keep your own local branches up to date with the development branch of your project and also shows you how to merge someone else’s branch with your own.

To pull all the latest changes from a remote branch into a local branch, the bzr pull command is used. You navigate to the local branch you wish to pull changes for and then issue the pull command.

This will bring in all the merges and changesets of the branch you originally branched from into the local branch. In our case, the local “trunk” branch was branched originally from the branch at lp:mysql-server/5.1 and so if we do:

cd ~/repos/mysql-server-5.1/trunk

bzr pull

We will update our local trunk branch with the changes in the active development branch on Launchpad.

Merging Local Code with a Trunk Branch

A merge, as alluded to above, is simply when you want to combine the code changes of one branch with the code in another. We use the bzr merge command to do so. The merge command takes a single argument: the location of the branch you wish to merge into the current one.

To demonstrate how the merge command is used, I’ll just paste the work I just did for the Drizzle project in merging in a contributor’s fixes into trunk:

[509][jpipes@serialcoder: /home/jpipes/repos/drizzle]$ cd trunk/
[510][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk]$ bzr merge ../grant-bug261687/
 M  drizzled/sql_derived.cc                                            
All changes applied successfully.
[511][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk]$ bzr commit -m \
"Merged Grant's fixes for sql_derived.  Fixes bug#261687" --fixes lp:261687
Committing to: /home/jpipes/repos/drizzle/trunk/
modified drizzled/sql_derived.cc
Committed revision 373.

The above was executed after a pulled Grant’s Launchpad.net branch called bug261687, built the code, and ran the test suite to ensure no failures. I change to my local branch of the drizzle project development branch (called “trunk” locally). I then merge Grant’s branch into trunk locally with the bzr merge ../grant-bug261687 command. After merging, I must commit the trunk code. I do so, making a comment that this is a merge of Grant’s work, and noting the bug # which the changeset fixes. Once I do this, I am free to bzr push lp:drizzle and push the local merge to Launchpad…

Conclusion

As you can see, Launchpad and Bazaar is a feature-full code management system with a lot of bells and whistles. Hopefully, this article can get you started in your adventures in contributing to projects hosted on Bazaar/Launchpad.net. See you on Launchpad! Up next in this series: how to do task management with Launchpad’s Blueprints system.

A Contributor’s Guide to Launchpad.net – Part 1 – Getting Started

jaypipes — Fri, 22 Aug 2008 13:41:14 +0000

This post is the first in a series of articles which serves to highlight the services of the Launchpad platform which hosts the MySQL Server, MySQL Forge, MySQL Sandbox and Drizzle Server projects. I will be walking you through the various pieces of the platform and provide examples of using each of the services. I will cover in depth the source code management services which all three projects now rely upon. The code management services are the critical piece of the development platform. In addition, I will show you how to use the Blueprints, Bugs, Answers and Translations services that many MySQL ecosystem projects, including the MySQL Sandbox and MySQL Forge, use.

In this first article, I will walk through the critical first step of establishing a Launchpad.net account and setting up the OpenSSH and OpenPGP keys for your account. In follow-up posts, I cover the code management system, blueprints system, and more.

Creating Your Account on Launchpad.net

The first thing to do is obviously create your account on Launchpad.net. Doing so is trivially easy. Go to the registration page and enter in your email address. Launchpad will then email (subject line: “Finish your Launchpad registration”) that address with a link to start the registration process. Click on the link in that email. You’ll be asked to provide a Display Name and a password. You can choose to have your email address hidden from other users or not. After filling in the information, click the Continue button and you’ll end up in your account profile area.

Once in the profile area, go ahead and fill in any of the information you want to be public about yourself by clicking the “Change Details” link in the right of the page. You can upload an avatar image for yourself and fill in a little “Bio” section. In a sub-navigation area at the top of the page, you will see links to edit your Email Settings, SSH Keys, GPG Keys, and Passwords. We’ll cover the SSH and GPG Keys in a bit. For now, explore this area and set your preferences the way you like.

Also, at the very bottom of the Change Details page, you also have some other links to edit your IRC nicks and other stuff.

Your OpenSSH Keys

One important thing to do when setting up your account is to upload your public SSH key to Launchpad.net. This helps the code management system by facilitating the bzr+ssh protocol and allowing you to push a bazaar branch to the Launchpad.net supermirror (more on that later). If you have already generated your public key, go ahead and skip the next subsection and proceed to “Upload your public key“.

To generate your SSH key, you will need to have OpenSSH installed. For Ubuntu users, simply do sudo apt-get install openssh. For other Linux users, use your package manager of choice. Windows users can use the PuTTY key generator, and should follow instructions on the excellent Launchpad.net Help wiki. Once you have OpenSSH installed, it’s time to generate your key. Do so with the following in a terminal:

ssh-keygen

When prompted, to accept the default key file names (~/.ssh/id_rsa.pub and ~/.ssh/id_rsa, and then a password for protecting the key file.

Upload your public key

You can output your public key using the following:

cat ~/.ssh/id_rsa.pub

Go ahead and copy the public key as-is; you’ll need it soon.

Now that you’ve generated your SSH keys, go ahead and click on the “SSH Keys” in the sub-navigation bar in your profile. Simply copy your public key into the text area marked “Add an SSH Key” and then click “Import Public Key”. OK, all set, let’s tackle the GPG Keys next.

GPG Keys (Optional, but Recommended)

Before we generate a GPG key and upload one to Launchpad.net, you will first want to ensure that you have a mail reader capable of decrypting PGP-encrypted emails. Personally, I use Thunderbird and the excellent Enigmail plugin for this, but you will want to use your own preferred MUA. Use this help article for assistance in setting up PGP for your mail client of choice.

OK, next up we’ll go ahead generate GnuPG keys for use with email security and the Launchpad.net mailing lists. To generate a GnuPG key pair, issue the following in a terminal:

gpg --gen-key

This will start a series of questions for you to answer, including which email address the key is for, your name, a passphrase and how many bits to make your key file (I chose 2048). Here is what the series will look like in a terminal:

[510][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --gen-key
gpg (GnuPG) 1.4.6; Copyright (C) 2006 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.

Please select what kind of key you want:
   (1) DSA and Elgamal (default)
   (2) DSA (sign only)
   (5) RSA (sign only)
Your selection? 1
DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
        = key expires in n days
      w = key expires in n weeks
      m = key expires in n months
      y = key expires in n years
Key is valid for? (0) 
Key does not expire at all
Is this correct? (y/N) y

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) "

Real name: Jay Pipes
Email address: xxxxxxxxx
Comment: 
You selected this USER-ID:
    "Jay Pipes "

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
...
gpg: key 9C5804A8 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   1024D/9C5804A8 2008-08-22
      Key fingerprint = 16C5 50D4 7061 03A1 48CD  3826 0CAD 7BD9 9C58 04A8
uid                  Jay Pipes 
sub   2048g/F7F4A925 2008-08-22

OK, once generated, verify that everything worked as expected using the following:

gpg --list-keys

which should produce output similar to the following:

[512][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --list-keys
/home/jpipes/.gnupg/pubring.gpg
-------------------------------
pub   1024D/9C5804A8 2008-08-22
uid                  Jay Pipes 
sub   2048g/F7F4A925 2008-08-22

The next step is to push your public key to the Ubuntu key server — technically, you could push it to any old key server, but to keep things simple, just use the ubuntu key server. You’ll need to take note of your public key ID, which is the 8-character hex number following 1024D in the list-keys output. In my case, that public key ID is “9C5804A8”. Issue the following command, substituting your public key ID:

gpg --send-keys --keyserver keyserver.ubuntu.com @Your_Public_Key

The final step is to make Launchpad.net aware of your new GPG key. To do so, you need to send your GPG fingerprint to Launchpad. To grab your fingerprint, issue the following command:

gpg --fingerprint

which should produce something very similar to:

[514][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --fingerprint
/home/jpipes/.gnupg/pubring.gpg
-------------------------------
pub   1024D/9C5804A8 2008-08-22
      Key fingerprint = 16C5 50D4 7061 03A1 48CD  3826 0CAD 7BD9 9C58 04A8
uid                  Jay Pipes 
sub   2048g/F7F4A925 2008-08-22

Copy the key fingerprint and paste it into the text area in your Open GPG keys page in your profile, then click the “Import Key” button. Launchpad.net will email the address of the key with a confirmation message. Click the link in the email under “Please go here to finish adding the key to your Launchpad account:”. And you are all done.

Now you have all the pieces set up to begin working with Launchpad.net effectively. In the next few posts in this series, I’ll walk you through how to best use the platform to be a productive contributor to the MySQL ecosystem.

Getting a Working C/C++ Development Environment for Developing Drizzle

jaypipes — Thu, 21 Aug 2008 20:34:00 +0000

This article explains how to set up a properly functioning C/C++ development environment on Linux. The article is aimed at developers interested in contributing to the Drizzle server project, but the vast majority of the content applies equally well to developers wishing to contribute to the MySQL server or any other open source project written in C/C++

IMPORTANT: This article doesn’t get into any religious battles over IDEs or particular editors. IDEs and editors are what you use to edit code. What this article covers is the surrounding libraries, toolchain, and dependencies needed to get into the development or contirbution process. That said, go Vim.

The examples shown use the Debian/Ubuntu methods of obtaining code packages and installing them â€” specifically, using apt-get install to install packages. If you are running on a RPM-based distro, simply change the commands to use your package manager of preference, for instance yum install. Solaris users should get as far as they can in the installation/setup process and hop over to the Freenode #drizzle channel for help from one of Solaris experts. The Drizzle wiki has more information on specific packages needed for different distributions.

OK, let’s get started, shall we?

Installing Bazaar and Some Bzr Goodies

So, what is this Bazaar thing and why do you need it? Good question! Bazaar is the revision control system that the Drizzle development team (as well as the MySQL engineering team) uses for source code control. The good folks over at Canonical maintain and enhance the excellent Bazaar-NG system and have an online platform called Launchpad.net which is tightly integrated with Bazaar. Launchpad.net is kind of like SourceForge.net, only focused around Bazaar as the revision control system, and includes a number of nifty little features that make it easier to manage and maintain teams of developers working on the code base. The Drizzle Server project is hosted on Launchpad.net at http://launchpad.net/drizzle.

To install Bazaar, issue the following:

sudo apt-get install bzr

Some Optional Goodies

Once installed, you might want to install a few more things that will make your bzr life easier. The bzrtools package is a collection of command-line and graphical utilities for bzr. meld is a graphical merge conflict resolution utility that I have found invaluable at times. PoEdit is an easy way to work with the GetText translation utilities. To install these tools, do:

sudo apt-get install bzrtools meld poedit

OK, that’s it for Bazaar for now. Let’s move on to getting your development toolchain installed.

The Required Toolchain Packages and Library Dependencies

In order to compile the Drizzle server, you will need a working GNU Toolchain with the C++ development tools.

The Easy Way

The best and most consistent way to get these build dependencies if you are working on Debian/Ubuntu, is to add the Drizzle Developers Personal Package Archive to your apt path and install the Drizzle dependencies.

This way to do this via the terminal:

# For Ubuntu >= 9.10 (Karmic):
sudo add-apt-repository ppa:drizzle-developers/ppa
sudo apt-get update
sudo apt-get install drizzle-dev

For versions of Ubuntu prior to 9.10, simply head over to the Drizzle Developers PPA and follow the instructions on that page.

The Hard Way

The following packages are required for building Drizzle.

binutils (includes ld and as)
gcc/g++ (4.2+)
autoconf (2.59+)
automake (1.10+)
libtool (1.5.24+)
m4 (1.4.8+)
bison/yacc (2.3)
pkg-config (0.22+)

To install the above toolchain, do the following:

sudo apt-get install libc-dev gcc g++ bison binutils automake autoconf m4 pkg-config libtool

NOTE: Thanks to Tom Hanlon…older versions of Ubuntu may need to specify libc6-dev, not libc-dev!

Once apt-get finishes installing the above, you’ll have a system capable of compiling C/C++ programs, for the most part. The Drizzle server needs some additional libraries and header files in order to compile. I list them here along with a brief description of the library or file.

libpcre3 – A standard PCRE regular expression library
libpam0g – A pluggable authentication modules library (a horrible package name, no?…)
libncurses – A library for displays of terminal applications (used by the drizzle client)
libprotobuf – Google Proto Buffers library, used by the server in message communication
gettext – i10n and l10n services
libevent – socket event handling
libz – compression
libreadline – Command-line editing utilities
uuid-dev – UUID headers
libboost-dev – BOOST C++ headers and library
libboost-program-options-dev – BOOST program_options library headers
libdrizzle-dev – libdrizzle headers
gperf – GNU hash function generator
libgcrypt11-dev – GNU Crypto library

The following command should install the required libraries with the exception of Google Proto Buffers, which is described in the following section.

sudo apt-get install libpcre3-dev libpam0g libncurses5-dev libpam0g-dev gettext libevent-dev libz-dev libreadline-dev uuid-dev  libgcrypt11-dev libboost-dev libboost-program-options-dev libdrizzle-dev gperf

Installing Google Proto Buffers

After installing the libraries and toolchain, you’ll need to install the Google Proto Buffers library. Again, the easiest way to do so is via the Drizzle Developers PPA, which you should have added to your repositories above. If you did, simply do:

sudo apt-get install libprotobuf-dev protobuf-compiler

Setting Up a Local Bazaar Repository for Drizzle

Now that you’ve installed all the required toolchain and dependencies, it’s time to use Bazaar to pull the development branch of Drizzle and compile the Drizzle server. The first step to do is to set up that local bzr repository. Myself, I have all my bzr repositories in a directory called ~/repos, and that is what the below examples show, but you are of course welcome to put your repos wherever you prefer. To set up a directory and a drizzle repo under your home directory, do the following:

cd ~
mkdir repos
cd repos
bzr init-repo drizzle
cd drizzle

At this point, you have a local bzr repository. Let’s now create a local branch of the development source code trunk that we can play with. To do so, we use the bzr branch command, like so:

bzr branch lp:drizzle trunk

This tells bzr to go grab the main development branch of the “drizzle” project that resides on the Launchpad.net servers (thus, the lp: prefix), and create a local branch called “trunk”. The branch operation may take a little while to complete when you do it for the first time. Subsequent branch and merge operations are much, much quicker than the first branch into a repository. When the branch succeeds, go ahead and look at the files that have been downloaded into your “trunk” branch:

cd trunk
ls -la

You should see something like the following:

jpipes@serialcoder:~/repos/drizzle/trunk$ ls -la
total 264
drwxr-xr-x 13 jpipes jpipes  4096 2010-02-25 12:39 .
drwxr-xr-x 16 jpipes jpipes  4096 2010-03-10 19:44 ..
-rw-r--r--  1 jpipes jpipes 76502 2010-02-24 23:30 ABOUT-NLS
-rw-r--r--  1 jpipes jpipes   377 2010-02-24 23:30 AUTHORS
drwxr-xr-x  5 jpipes jpipes  4096 2010-02-24 23:30 .bzr
-rw-r--r--  1 jpipes jpipes  5983 2010-02-25 12:39 .bzrignore
drwxr-xr-x  2 jpipes jpipes  4096 2010-03-10 15:27 client
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 config
-rw-r--r--  1 jpipes jpipes  5350 2010-02-24 23:30 configure.ac
-rw-r--r--  1 jpipes jpipes 19071 2010-02-24 23:30 COPYING
-rw-r--r--  1 jpipes jpipes 56574 2010-02-24 23:30 Doxyfile
drwxr-xr-x 16 jpipes jpipes 12288 2010-03-10 15:27 drizzled
-rw-r--r--  1 jpipes jpipes  5962 2010-02-24 23:30 DRIZZLE.FAQ
-rw-r--r--  1 jpipes jpipes  5139 2010-02-24 23:30 EXCEPTIONS-CLIENT
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 extra
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 gnulib
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 m4
-rw-r--r--  1 jpipes jpipes  4545 2010-02-24 23:30 Makefile.am
-rw-r--r--  1 jpipes jpipes    41 2010-02-24 23:30 NEWS
drwxr-xr-x 55 jpipes jpipes  4096 2010-03-05 13:11 plugin
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 po
-rw-r--r--  1 jpipes jpipes   821 2010-02-24 23:30 README
drwxr-xr-x  3 jpipes jpipes  4096 2010-02-24 23:30 support-files
drwxr-xr-x  8 jpipes jpipes  4096 2010-03-10 15:27 tests

Compiling Drizzle

OK, you are now ready to compile the server and client tools contained in your branch. The way to do so is the following:

./config/autorun.sh
./configure --with-debug
make
make test

If all goes well, drizzle will compile and build, get installed, and output the (hopefully!) passing test results. The output at the end should be similar to the following:

...
collation_dictionary.slap      [ pass ]             54
blackhole.blackhole            [ pass ]             13
archive.archive                [ pass ]            509
archive.archive_aio_posix      [ pass ]            487
archive.archive_basic          [ pass ]              2
archive.archive_discover       [ pass ]              2
-------------------------------------------------------
Stopping All Servers
All 290 tests were successful.
The servers were restarted 36 times
Spent 206.430 of 290 seconds executing testcases

And that’s it.

OK, so that’s it for this first article. I hope you’ve found it helpful in getting a development environment set up so that you can feel comfortable contributing to the Drizzle project. Join me on Freenode’s #drizzle IRC channel to get some help. We’re always available.

MySQL Connection Management in PHP – How (Not) To Do Things

jaypipes — Mon, 07 Aug 2006 19:31:00 +0000

I’ll warn you right now, this is going to be a long article. More than likely, I’ll put a version of this up on the MySQL developer zone and PHP zone. This article is intended to highlight various basic topics concerning proper methods of handling connections to MySQL databases in PHP, guidelines for caching dynamic content, and a technique called “lazy loading”. Hopefully by the end of the article you’ll have learned how to combat a very widespread and potentially devastating scalability problem seen in an enormous number of PHP web applications.

An introduction to the problem

Before I start the discussion on connecting to MySQL servers via PHP, it’s worth pointing out that the relative cost of connecting to a MySQL database, versus connecting to a PostgreSQL or Oracle installation, is very, very low. However, the fact that connecting to a MySQL resource is inexpensive does not mean that connection resources can be abused. Anyone who has ever seen the dreaded “Too many connections” error which occurs when MySQL attempts to service more connections than the number of concurrent connections afforded by the max_connections configuration variable, knows what I am talking about.

Connecting to a MySQL resource, while inexpensive, requires PHP to issue a call to the mysql client API via mysql_connect(). Passed as variables to this function are the connection arguments for the database host, user, password, etc. Once the MySQL database server has received the connection information, it will either return a success value, or a failure. Upon failure, the mysql_error() function can be used to determine what went wrong during the connection attempt (typically user credential problems or the max connections issue).

So, where, you ask, is the problem? Well, the issue that I commonly see is that connections are made to MySQL resources when they do not need to be made. But, you say, almost all web applications serve dynamic content, so therefore doesn’t dynamic content virtually require a connection to a database be made?

Well, not really in many, many cases

Let’s take as an example a very, very popular PHP web application installed on hundreds of thousands of servers worldwide: the blogging application WordPress. Now, before I go any further, I want to say that the reader should not think that I am attacking WordPress in this article, or deliberately trying to point out shortcomings in their software. By contrast, I picked WordPress to demonstrate that the problem described in this article is widespread among PHP web applications. At the very end of the article, I’ll present a patch to the current WordPress source code which fixes the issue identified in this article. I will post the patch to the wp-testers mailing list after I complete the article. Promise.

Now, a blogging application is indeed a data-driven web application. Typical blog software involves the posting of articles, the management of comments, the display of such articles and comments, and, of course, pages which serve to provide the various RSS and Atom feeds for the blog. So, one might argue that blogging software is highly dynamic, and therefore would necessarily issue calls to connect to the database upon every visit to the blogsite.

This, however, is not particularly true. Even on extremely busy blogs, content doesn’t change on a continual basis. This point is even more relevant when you consider that after a certain number of days after an article is posted, content becomes almost entirely static. Keep this point in mind as you follow through the next sections, which walk through the page invocation process which WordPress executes upon every hit to a blog page (including feed pages).

Investigating WordPress page invocation

On all PHP pages in the WordPress main directory — which include index.php, wp-atom.php, wp-rss, etc. — each file begins with the following include:

require(./wp-blog-header.php);

The wp-blog-header.php page does a couple things but mostly serves to include the following:

require_once( dirname(__FILE__) . '/wp-config.php');

OK, so we’re heading over to wp-config.php… and we find some good stuff, such as the defines for the database connection parameters, and:

define('ABSPATH', dirname(__FILE__).'/');
require_once(ABSPATH.'wp-settings.php');

OK, so a quick :find wp-settings.php later, we open up the first meaty file of our page invocation. The first 72 code lines of wp-settings.php do some housekeeping stuff, like checking if the PHP version is adequate and if the MySQL extension is installed (tangent: is it really necessary to do this on every web page invocation?!). After that, we see the following include:

define('WPINC', 'wp-includes');
require_once (ABSPATH . WPINC . '/wp-db.php');

Into the heart of the beast

OK, so thus far there’s been nothing spectacular or extraordinary about the code. Just a few includes to make file maintenance orderly, but nothing unusual. However, the wp-includes/wp-db.php file contains perhaps the most common PHP/MySQL gotcha seen in today’s web applications.

The file starts out with some defines and then the class definition of the WordPress database abstraction object, called wpdb. The wpdb class contains the very typical methods commonly seen in a DB abstraction layer: get_col(), get_row(), get_results(), etc, which allow a query string to be passed in and executed against a MySQL database. However, there is one major problem in the design, which manifests itself in the last line of the file:

$wpdb = new wpdb(DB_USER, DB_PASSWORD, DB_NAME, DB_HOST);

Well, what’s so wrong with that, you say? Well, what does the new operator do? It creates an object of a class type specified and, during the creation of the object, calls the class constructor, which is the class method with a name identical to the class name — in this case, the method called wpdb, shown here:

	// ==================================================================
	//	DB Constructor - connects to the server and selects a database

	function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
		$this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
		if (!$this->dbh) {
			$this->bail("... content snipped ... ");
		}

		$this->select($dbname);
	}

Can anyone tell what is happening on every page invocation to a WordPress blog? Yup. A connection is being made to the underlying MySQL database. On a heavily hit blog site, this code can easily lead to the dreaded too many connections error because a connection to the database is being made even for mostly static content. There are a couple ways to combat this problem: Lazy loading and Content Caching. These two techniques can be used together to eliminate a huge portion of the database queries and connections in typical web applications.

Lazy Loading

The code in the wpdb class isn’t fundamentally wrong. It just needs some tweaking to ensure that a connection to the database is only made if a query is executed against the database. A technique called lazy loading essentially delays the connection to the database until the last minute, instead of upon creation of the database abstraction object.

The MySQL Forge database abstraction layer uses lazy loading in just this way. The name of the class is SqlConnection, and it has an empty constructor. Instead of connection logic embedded in the constructor, the object has a Connect() method, which looks like the following. The code has been modified only to remove the logic which automatically handles master/slave replication switching:

    /**
     * Attempt to connect the resource based on supplied parameters. 
     * 
     * @return  boolean 
     * @access  public
     *  
     * @param   string  (optional) Host name (Server name)
     * @param   string  (optional) User Name
     * @param   string  (optional) User Password
     * @param   string  (optional) Database Name
     */  
    function Connect() {
        if (func_num_args() == 4) {
            
            // A different database has been requested other than the 
            // standard global config settings
            $host = func_get_arg(1);
            $user = func_get_arg(2);
            $pass = func_get_arg(3);
            $dbname = func_get_arg(4);
            
        }
        else {
            $host = SQL_HOST;
            $user = SQL_USER;
            $pass = SQL_PASS;
            $dbname = SQL_DB_NAME;
        }

        /**
         * Short circuit out when already
         * connected.  To reconnect, pass
         * args again
         */
        if (is_resource($this->_Cnn) && func_num_args() != 4) {return true;}
            
        if (! $this->_Cnn = mysql_connect($host, $user, $pass)) {
            trigger_error(get_class($this) . 
                          "::Connect() Could not connect to server: " . 
                          mysql_error(), E_USER_ERROR);
            return false;
        }
        else {
            if (! mysql_select_db($dbname, $this->_Cnn)) {
                trigger_error(get_class($this) . 
                              "::Connect() Could not connect to specified database on server: " . 
                              mysql_error(), E_USER_ERROR);
                return false;
            }
            else {
                return true;
            }            
        }
    }

The _Cnn member variable is a reference to a MySQL database resource that is returned upon a successful call to mysql_connect(). Notice that the Connect() method has logic which ensures that if an existing connection has already been made during page execution, then the Connect() method simply returns true.

You may be surprised to find out that, just like WordPress, the MySQL Forge software creates a database abstraction object upon each call to the MySQL Forge website. The following code is included in all page invocations:

/**
 * Fine to establish a global connection
 * here, since connect doesn't occur until
 * SQL execution.
 */
require_once(DIR_LIB . 'class/SqlConnection.php');
$GLOBALS['Db'] =& new SqlConnection();

The difference is that a connection to the database is not made in the SqlConnection class constructor, so having the object instantiated doesn’t consume database resources, unlike the wpdb class. So, if the constructor doesn’t call Connect(), then when exactly will mysql_connect() be called? Here, we see the Execute() method of SqlConnection:

    /**
     * Executes the supplied SQL statement and returns
     * the result of the call.
     * 
     * @return  bool   
     * @access  public
     *  
     * @param   string  SQL to execute
     */  
    function Execute( $Sql ) {
        
        /* Auto-connect to database */
        if (! $this->_Cnn) {
            $this->Connect();
        }
        
        if (!$this->_Res = mysql_query($Sql, $this->_Cnn)) {
            trigger_error(get_class($this) . 
                          "::Execute() Could not execute: " . 
                          mysql_error() . 
                          " (SQL: " . $Sql . ")", E_USER_ERROR);
            return false;
        }
        else {
            return true;
        }
    }

In the Execute() method, you can see that if the _Cnn member variable is not set (meaning, a previous connection has not been made to the database), then the SqlConnection connects, otherwise, it simply executes the supplied string against that connection via the mysql_query() function and stores the returned result resource in the _Res member variable for use by other methods in the class.

Other methods of SqlConnection simply wrap the Execute() call and provide result sets in various forms. What this means is that on page evocations to MySQL Forge, unless dynamic data is actually needed, no connections to the database are actually created. Which leads us nicely to the other technique for handling semi-dynamic content web requests: Content Caching.

Content Caching

Caching is perhaps the most fundamental concept discussed in the field of computer sciences when it comes to performance of both hardware and software. A cache, simply defined, is a storage area for data that has been parsed, retrieved, calculated, or otherwise generated in an expensive operation. The cache functions to alleviate the need for various resources to regenerate the cached data upon every request for the data.

Caches exist everywhere in both hardware and software. For instance, on a hardware level, modern CPUs usually have at least two levels of hardware caches (usually called the L1 and L2 caches). These CPU-connected fast-access caches exist so that the CPU does not need to call a kernel-level RAM memory page access call, which is a relatively expensive operation since the speed of access to a RAM page is much slower than the access speed to the locally connected Lx caches. When speaking about caches, it’s important to recognize that everything is relative to something else. Accessing a hard disk is much more expensive than accessing a page of RAM, which is much more expensive that accessing a line of bytes stored in the L1 cache. Likewise, in application-specific caches (which we’ll be talking about next), the relative cost of accessing cached data is lower than retrieving the same information from the MySQL database.

So, let’s talk a bit about basic content caching for a PHP web application. Although these examples use PHP, the discussion of application caching applies to all languages. Every web scripting language provides similar functionality to implement caching.

Application content caching occurs when a standard call to the database is replaced with a call to an application content cache. In these examples, we’ll implement a simple file-based cache; other solutions are, of course, available, including using memcached or a static content web server proxy to serve web content. WordPress actually implements its own caching mechanism, called ObjectCache. You can take a look at the implementation in the wp-includes/cache.php. However, this implementation has a couple design limitations that make it unsuitable for a discussion on general caching. It uses a tight coupling with other WordPress functions and objects, which makes the caching mechanism unfriendly for general re-use.

A Simple File Cache Engine

Before we get into the implementation of the CacheEngine class that MySQL Forge uses, let’s first take a look at some code from the /lib/class/ProjectMemberFinder.php class that handles requests to retrieve information about the members involved in a project listed in the MySQL Forge project directory:

    /**
     * Return project members based on project ID value
     * 
     * @return  array
     * @param   int     project ID
     */
    function &GetByProjectId($Project) {
    
        /**
         * ProjectMembers don't change that often, 
         * so cache the output of these calls.
         */
        $cache_id = 'project_members-' . $Project;
        
        if ($cache = $GLOBALS['CEngine']->GetFromCache($cache_id, $Seconds=0, $IsObject=true)) {
            return $cache;
        }
        
        $sql = "SELECT 
                    pm.project
                  , pm.member
                  , fu.display_name
                  , pmr.description as role
                  , pm.can_write
                  , pm.can_read
                  , pm.joined_on
                  , pm.last_source_login
                  , pm.last_source_commit
              FROM " . $GLOBALS['SqlTables']['ProjectMember'] . " pm 
                  INNER JOIN " . $GLOBALS['SqlTables']['ForgeUser'] . " fu
                      ON pm.member = fu.user_id 
                  INNER JOIN " . $GLOBALS['SqlTables']['ProjectMemberRole'] . " pmr
                       ON pm.role = pmr.project_member_role_id
              WHERE pm.project = " . (int) $Project;

        $results = $GLOBALS['Db']->GetRecords($sql);

        $GLOBALS['CEngine']->WriteToCache($cache_id, $results);

        return $results;
    }

OK, so the first thing you will notice is that there’s a comment saying basically, “look, this information really doesn’t change all that much. Let’s go ahead and cache the results of the database query for later re-use”. We first ask the global CacheEngine object ($GLOBALS[‘CEngine’]) if we have a cached version of the supplied Project’s project members list:

if ($cache = $GLOBALS['CEngine']->GetFromCache($cache_id, $Seconds=0, $IsObject=true)) {
    return $cache;
}

The GetFromCache() method of the CacheEngine class returns the requested data, or FALSE. So, in the above code, we simply return the cached data if it is available in the cache. The $Seconds argument to the GetFromCache() method is simply the number of seconds that the cached data should be considered valid. Passing a zero as this argument means we always consider the data valid. The $IsObject argument tells the CacheEngine to return the cached data as an array or an object. We’ll see how this is implemented in a little bit.

OK, so if the cached data does not exist in the cache, the ProjectMemberFinder::GetByProjectId() method continues on to request the data from the underlying database. The global Db abstraction layer object (described earlier) has its GetRecords() method called, with a SQL string passed as a parameter:

$results = $GLOBALS['Db']->GetRecords($sql);

It is the next line of code that facilitates the caching of this data in our content cache:

$GLOBALS['CEngine']->WriteToCache($cache_id, $results);

So, upon the first invocation of the GetByProjectId() method of ProjectMemberFinder, for each unique supplied Project ID value, we issue a request to the database and then cache the results for each subsequent call to the function. This saves us an enormous amount of database interaction, increasing the overall scalability of the system since the software can handle more concurrent requests, since the database connection will no longer be a bottleneck to the system.

There are a couple cases that we need to handle when processing cache requests, including how to invalidate data in the cache. We’ll get to these cases in a minute. First, let’s take a look at the CacheEngine class’ two main methods: WriteToCache() and GetFromCache().

The GetFromCache() Method

As you saw above, the GetFromCache() method takes three arguments and returns either FALSE, or the cached data. Let’s take a closer look at the CacheEngine::GetFromCache() method.

	/**
	 * Retrieves a Cache file and returns either an object
	 * or a string
	 *
	 * @return	mixed
	 * @param	string	Name of File in Cache
	 * @param	int		Number of Seconds until File is considered old
	 * @param	bool	Return an object from Cache?
	 * @access	public
	 */
	function GetFromCache( $FileName , $Seconds = 0, $IsObject = false) {

		$this->_BuildFileName($FileName);
		
		$return = false;
		if ($Seconds == 0) {
			if (file_exists($this->_CacheFilePath)) {
				$return = $this->_ReadFromCache();
			}
			else {
				return false;
			}
		} 
		else {
		        $refresh_time = time() - (int) $Seconds;
			if (filemtime($this->_CacheFilePath) > $refresh_time) {
				$return = $this->_ReadFromCache();
			}
			else {
                                /** Cached data not valid, remove it */
                                $this->RemoveFromCache($FileName);
				return false;
			}
		}
		if ($IsObject) {
		    $return = unserialize($return);
		}
		return $return;
	}

The GetFromCache() function should be fairly easy to follow. The meat of the function lies in either checking that the file exists (if there is no time limit on the cached entry) with the PHP file_exists() function or checking the modification time of the file using the filemtime() function otherwise. The $IsObject flag argument simply runs an unserialize() on the data coming back from the internal _ReadFromCache() method, which we will look at next:

	/**
	 * Reads the local file from the cache directory
	 *
	 * @return	mixed
	 * @access	private
	 */
	function _ReadFromCache() {
	    $mq_setting = get_magic_quotes_runtime();
	    set_magic_quotes_runtime(0);
	    if (!$return_data = @ file_get_contents($file)) {
	    	trigger_error(get_class() . 
				'::_ReadFromCache(): Unable to read file contents'
				, E_USER_ERROR);
	    }
	    set_magic_quotes_runtime($mq_setting);
	    return $return_data;
	}

This function handles reading the cached data from a cache file. The magic quotes runtime (perhaps the most annoying PHP feature ever) is turned off before reading the file to prevent automatic escaping of certain characters in the data, and then turned back to its original setting immediately after.

So, the reading of a cache file is fairly simple. Let’s take a look at the write mechanism of the CacheEngine class. This code is adapted from a technique which George Schlossnagle details in his excellent read, “Advanced PHP Programming” (Developer’s Library, 2004) called file swapping. The technique facilitates lock-less writing of a cache file while allowing for simultaneous read requests of the cache file. Let’s take a look:

	/**
	 * Writes data to the cache
	 *
	 * @return	mixed
	 * @param	string	File Name (may be encoded)
	 * @param	mixed	Data to write
	 * @access	public
	 */
	function WriteToCache( $FileName, $Data ) {
	    if (is_array($Data) || is_object($Data)) {
		$Data = serialize($Data);
	    }
	    $this->_BuildFileName($FileName); 
	    /** 
	     * Use a file swap technique to avoid need
	     * for file locks
	     */
	    if (!$file = fopen($this->_CacheFilePath . getmypid(), "wb")) {
		trigger_error(get_class($this) . 
				'::WriteToCache(): Could not open file for writing.'
				, E_USER_ERROR);
		return false;
	    }
	    $len_data = strlen($Data);
            fwrite($file, $Data, $len_data);
            fclose($file);
	    /** Handle file swap */
	    rename($this->_CacheFilePath . getmypid(), $this->_CacheFilePath);	    
	    return true;
	}

The code above opens a temporary file for writing. Notice that the actual cache file name is appended with the getmypid() function output, essentially making the filename a temporary, unique filename. Then, given the fopen() call was able to open the file for writing, the data is then written to the file, and the file closed. Finally, the rename() function is used to change the temporary filename to the actual cache filename. Because the rename() function simply acts on the file inode (a structure which stores information about the file, not the file contents itself, the rename() operation is a) very quick, and b) allows other processes to read from any existing cache file with the existing name without blocking the rename operation.

Cache invalidation

OK, so our CacheEngine class now has most of the functionality needed to effectively cache data from the database. However, we still need a method of removing old cache data files. Hence, the very simple RemoveFromCache() method:

	/**
	 * Removes a cache file from the cache directory
	 *
	 * @return	mixed
	 * @param	string	File Name to remove (will be encoded)
	 * @access	public
	 */
	function RemoveFromCache( $file_name ) {
	    $this->_BuildFileName($file_name); 
	    if (!file_exists($this->_CacheFilePath)) {
		return true;
       	    }
	    else {
        	if (!unlink($this->_CacheFilePath)) {
		    trigger_error(get_class($this) . 
				'Unable to remove from cache file'
				, E_USER_ERROR);   
		    return false;
		}
		else {
		    clearstatcache();
		    return true;
		}
	    }
	}

All the above function does is remove the cached file if it exists. So, when would we call this function? Well, if you were paying attention earlier, you wold have already seen one occasion. In the GetFromCache() method, the RemoveFromCache() method is called when the modification time of the file exceeds the expiration time supplied to that function. Additionally, the RemoveFromCache() method would be called when we manually want to remove a cache file, for instance, if the list of project members changes, as this snippet from the /lib/class/ProjectMember.php class illustrates. The snippet comes from the ApproveMembershipRequest() method of that class:

...
            /* Remove from review queue */
            $sql = "UPDATE ... "
            if ($GLOBALS['Db']->Execute($sql)) {
                $GLOBALS['Db']->Commit();
                $GLOBALS['Cache']->RemoveFromCache('project_members-' . $Project);
                return true;
            }
...

As you can see, when an approval is made, the database is updated and the cache for the project members is removed, so that upon the next request for this project’s members, the cache file will be regenerated, ensuring valid, up-to-date data.

Summary

So, this article has been an exploration into some simple steps that you can take in order to increase the scalability of your web applications by lazy loading and content caching techniques. Below are the full class files for the CacheEngine class and the SqlConnection class used in the MySQL Forge application. Feel free to use as you wish. Additionally, a patch to the WordPress source code to enable lazy loading is included below.

As always, comments and suggestions for this article are more than encouraged and appreciated.

Managing Many to Many Relationships in MySQL – Part 1

jaypipes — Fri, 09 Dec 2005 01:32:00 +0000

Flexible, Scalable Key Mapping Solutions

In working to answer questions on the MySQL forums, I’ve noticed
a few questions that repeatedly come up on a number of the forum areas. One of these particular questions deals with how to manage — construct, query, and maintain — many to many relationships in your schema. I decided to put together a two-part article series detailing some of the common dilemmas which inevitably arise when tackling the issue of relating two entities where one entity can be related to many instances of another, and vice versa.

Hopefully, this article will shed some light on how to structure your schema effectively to produce fast, efficient queries, and also will illustrate how key map tables can be queried for a variety of different purposes. I’ll predominantly be using standard SQL, so although I’m using MySQL as the database of choice here, the code you see is for the most part not limited to running on just MySQL. In this first part, we’ll review the concepts involved in many-to-many relationships, and discuss the most common methods for use in storing the data composing the relationship. In the second part of the article, which I should complete in about another week, we’ll look at some solid performance numbers regarding the various approaches. Also, I’ll show you how to use MySQL 5 stored procedures and views in order to most effectively manage many-to-many relationships.

A Review of Relational Concepts

For those of you unaware of what a many-to-many relationship is, let’s first briefly discuss some basic
definitions I’ll be using in the article. First, an entity, in the database world, is simply a singular
object or concept. An entity, just like a real world object, may have one or more attributes which describe different aspects of the entity. A table in a properly normalized database contains records which pertain to a single entity. These records represent instances of the entity.

To illustrate, let’s assume we’re building a website that specializes in used-car sales. We need to design a schema which will handle the searching and storage of a variety of different auto makes and models, and allow customers to filter results based on a set of features they require in the automobile.

figure A – Auto Entity

The primary entity in our schema could be considered the Auto entity. It might have some attributes such as a manufacturer, model, model year, and so forth. Figure A shows a depiction of this simple Auto entity, with the primary key attribute (auto_id) in bold and above all other descriptive attributes like manufacturer, model, etc.

One to Many Relationships

Entities in the database can relate to each other in a number of ways. The most common type of relationship is called a one-to-many relationship. Here, one instance of an entity can relate, or be attached to, many instances of another entity. In our sample schema, the Auto entity can have only one auto manufacturer. However, an auto manufacturer can produce many automobiles. Therefore, we say that the relationship from Auto Manufacturer to Auto is a one-to-many relationship. Figure B depicts a one-to-many relationship between the Auto Manufacturer entity and the Auto entity. In the figure, the line between the two entities represents the relationship. This is a common way to represent relationships, with the “one” side of the relationship having a single line and the “many” side of the relationship having a set of three lines and a circle.

figure B – One to Many Relationship

Relationships are implemented via common attributes in each Entity called key attributes. In a one-to-many relationship, these key attributes take the form of a parent key and a child, or foreign, key. The relationship is maintained by “connecting” the two attributes together via these key attributes. In the case of our Auto Manufacturer to Auto relationship, the key attributes are AutoManufacturer.auto_manufacturer_id and Auto.auto_manufacturer_id. If we wanted to list the Auto’s manufacturer name (not Manufacturer ID), we would use an INNER JOIN from the Auto table to the AutoManufacturer table along this key relationship, as shown below.

SELECT am.name
FROM Auto a
INNER JOIN AutoManufacturer am
ON a.auto_manufacturer_id = am.auto_manufacturer_id
WHERE a.auto_id = 12345;

This simple example shows how a one-to-many relationship is implemented using a simple INNER JOIN to find the intersection of two tables where the key attributes AutoManufacturer.auto_manufacturer_id and Auto.auto_manufacturer_id contain matching entries.

Many to Many Relationships

A many to many relationship is realized between two entities when either entity may be associated with more than one instance of the other entity. For example, imagine the relationship between an Auto (as in car) entity and an AutoFeature entity representing any of the myriad options a car may come with.

figure C – Many to Many Relationship

In this case, we know that any particular automobile can have many features. Likewise we know that a specific automobile feature, say power windows, may be in any number of automobiles. There’s no way for us to visually represent the association between the two entities without using a third entity, which stores the mapping of the relationship between the Auto entity and the AutoFeature entity. In this case, I use the Auto2AutoFeature entity. For mapping tables, I tend to use this Something2Something naming scheme to clearly show that it is a table which primarily serves to map the relationship from one thing to another, but of course that is merely a stylistic convention, nothing more.

Schema Representations for Many to Many Relationships

There are a few common ways for representing many-to-many relationships within structured SQL data, all of which will be detailed below:

Using multiple fields of an on/off data type to store many “keys” in a single table
Using the INT, BIGINT, or SET data type (or equivalent in other RDBMS) to store a fixed number of key flags in a single table field
Using a CHAR string having one byte of storage per key needed, with the string acting as one long imploded array in a single table field
Using a relationship, or mapping, table, like the one in figure C, to store one or more keys related to an entity

All of these methods has distinct advantages and disadvantages, in both ease of use and performance. We’ll look at each here, along with some sample code to show how common queries are performed across each storage schema.

The Multiple Field Method

In this method of defining a many-to-many relationship, the concepts of normalization are thrown away in favor of what some consider to be a simpler and more rational schema. Multiple fields, each representing an on/off value for a foreign key, are used in only a single table in order to achieve the results desired. Any data type representing an on/off value may be used to represent the key fields — CHAR(1) with ‘T’ or ‘F’, ‘Y’ or ‘N’, or a TINYINT UNSIGNED with 0 and 1 values, or an ENUM(‘Y’,’N’) etc. Below, you’ll see what a sample Auto table using this method might look like with CHAR(1) data types used for the auto option key fields:

CREATE TABLE Auto (
  auto_id INT UNSIGNED NOT NULL AUTO_INCREMENT
  , auto_manufacturer_id SMALLINT UNSIGNED NOT NULL
  , auto_model VARCHAR(20) NOT NULL
  , model_year SMALLINT UNSIGNED NOT NULL
  , asking_price DECIMAL(12,2) UNSIGNED NOT NULL
  , has_air_conditioning CHAR(1) NOT NULL DEFAULT 'N'
  , has_power_windows CHAR(1) NOT NULL DEFAULT 'N'
  , has_power_steering CHAR(1) NOT NULL DEFAULT 'N'
  , has_moonroof CHAR(1) NOT NULL DEFAULT 'N'
  , has_disk_brakes CHAR(1) NOT NULL DEFAULT 'N'
  , has_power_seats CHAR(1) NOT NULL DEFAULT 'N'
  , has_leather CHAR(1) NOT NULL DEFAULT 'N'
  , PRIMARY KEY pk_Auto (auto_id)
);

There are a couple advantages to this type of approach:

Given a simple SELECT * FROM the table, it is fairly easy to understand (immediately) what options the car has.
Only one table is involved in determining options for the car.

This last point is often the reason why many novice developers choose to use this approach; it’s easier and more straightforward to filter a resultset based on a single option:

SELECT *
FROM Auto a
WHERE a.has_leather = 'Y';

or more complicated request:

SELECT *
FROM Auto a
WHERE (a.has_leather = 'Y' OR a.has_power_windows = 'Y')
AND a.has_power_steering = 'Y';

The code is easy to understand for most anyone looking at it, and there’s no special syntax or SQL tricks needed in order to query on an OR condition or multiple various filters. Unfortunately, this approach has a number of downsides. Most important among them are:

If you need to add an auto feature, you’ve got to ALTER TABLE Auto ADD COLUMN …
This disadvantage is often overlooked in early design phases by novice programmers or design teams because everyone always assumes that they know every option that the customer might want to use. However, this is rarely the reality. Customers change their minds and will inevitably ask to add or remove Auto Features from the list. With this method, that requires a fundamental change in the schema: removing or adding columns. When doing so, especially on large tables, deadlocks can easily occur while read requests wait for an exclusive write lock to finish while the table is rebuilt to the new schema. (That’s a bad thing.)
There isn’t any useful place for an index on any of these data fields.
“Wait a minute!” you say, “You can place an index on any of these fields!” Sure, that is correct. Of course you can place an index on any of these fields. In fact, you can place an index on all of them if you really wanted to. But none of those indexes is likely to be used effectively by MySQL. Why? Well, that’s simple. In the schema above, there are only two values possible for each field — a ‘Y’ or ‘N’ (or 0 and 1, ‘T’ and ‘F’, etc). Let’s assume a table of 500K auto records. If the only two values in the field are ‘Y’ and ‘N’, what use would an index be? For a common auto feature, say “has_leather”. probably half of the records would contain a ‘Y’ and half an ‘N’. What use would this be to an index? None. In fact, an index would slow MySQL down, as so many index records would have to be read, with a lookup operation to the data record from each index record. The selectivity, or distribution, of key values is extremely low (see my previous article for more explanation on this) and therefore the index has limited use to MySQL.
There is very little flexibility offered by this method.
Imagine if you were to give the customer the ability to add and remove auto features at will. You would have to GRANT the customer (or the web-based or application user) the ability to ALTER TABLE. This isn’t generally considered to be the most secure or effective method of managing change…

Also, Let’s say you came up with a list of 300 auto features. Are you going to add 300 fields to the table? Ever tried doing a SELECT * from a table with 300 fields? Even if you use the /G option in the mysql client, you’d still have a mess!

So, for all the reasons outlined above, I recommend against this approach for all but the simplest and non-production environments. It isn’t flexible enough to withstand change, and the limitations of performance far outweight any advantage to ease of use.

Wait a Minute!

“But wait!” you say, “MySQL itself uses this strategy for its own mysql database! The mysql.user table has fields like “Select_priv”, “Insert_priv”, and “Delete_priv”. Don’t you think MySQL knows what they’re doing!?”

Yes, of course MySQL knows what they’re doing. But, you have to remember one thing about the mysql system schema tables. They’re always in memory. There is no performance degradation related to the mysql.user table’s multiple field definitions for privileges because upon startup, the MySQL server actually loads all user information (privileges) into a hash table of structs containing privilege information. When a request is received to perform some operation, this hash table of user privileges is checked using very fast logical OR and AND operations. So, there aren’t any performance issues associated with the mysql tables. This is, by the way, why you must issue a FLUSH PRIVILEGES when manually changing any of the mysql tables. The FLUSH PRIVILEGES reloads this in-memory hash table.

So, for the mysql schema, the tables are designed for ease of use and simplicity, so this method was used to represent privileges. Do I favor it? Not particularly, but I can see why it was done…

The INT, BIGINT, or SET Bitmap Method

With this next method, the INT, BIGINT, or MySQL SET data type is used to store from zero to 64 separate key flags within a single field. Again, with this method, we denormalize the schema by reducing this many-to-many relationship down into a single field in a single table. Below, you’ll see an example of our Auto table converted to use the SET data type instead of the 7 distinct CHAR(1) fields from Method #1:

CREATE TABLE Auto (
  auto_id INT UNSIGNED NOT NULL AUTO_INCREMENT
  , auto_manufacturer_id SMALLINT UNSIGNED NOT NULL
  , auto_model VARCHAR(20) NOT NULL
  , model_year SMALLINT UNSIGNED NOT NULL
  , asking_price DECIMAL(12,2) UNSIGNED NOT NULL
  , auto_options SET('Air Conditioning'
    ,   'Power Windows'
    ,   'Power Steering'
    ,   'Moonroof'
    ,   'Disk Brakes'
    ,   'Power Seats'
    ,   'Leather') NOT NULL
  , PRIMARY KEY pk_Auto (auto_id)
);

Although the SET (and ENUM, SET’s one-to-many column type cousin) are listed in the MySQL manual as string column types, they are internally represented as integers. Below, when we cover bitwise operations, you’ll see how you can use the SET data type the same way as you would an INT or BIGINT, though there are advantages to simply sticking with the SET data type for simplicity’s sake. As with the multiple-field method, there are advantages to the SET method. They include:

You can store a large (64 elements) number of key values in a small storage unit.
For the SET data type, 1 byte of storage is used for up to 8 elements, 2 bytes for up to 16 elements, 3 bytes for up to 24 elements, 4 bytes of storage for up to 32 elements, and 8 bytes of storage for 33-64 possible elements. Compared with method #1 (and method #4 below), this does save storage space in the table
MySQL automatically handles showing the descriptive value of the SET column value
With the INT and BIGINT data types, MySQL will simply show the numeric representation of the field value. With the SET data type, by default, MySQL shows the descriptive string value of the column, instead of it’s numeric value (more below). This can be a handy feature.
MySQL provides the handy FIND_IN_SET() function to quickly filter for a single key value.
The FIND_IN_SET() function can be used to query for rows in which a specific key value is turned on for the particular row. If you want to find whether more than one key values are turned on, you can use an AND expression in the WHERE clause. For instance, let’s say we wanted to find all records in which the automobile had both the leather and power windows options. We could use the following:
```
SELECT *
FROM Auto a
WHERE FIND_IN_SET('Leather', auto_options)>0
AND FIND_IN_SET('Power Windows', auto_options)>0;
```
Although there are some performance issues with SET fields (see below, on lack of indexing), the FIND_IN_SET() function is highly optimized to work specifically with the integer data housed beneath the surface. Bitwise operations are used to determine whether the row matches the specific key flag queried for. These bitwise operations are generally very fast, and we’ll cover them below.

Besides FIND_IN_SET(), any bitwise operator that you would normally use on integer data can be used on SET, INT and BIGINT columns.

Bitwise operations are performed on the actual binary representation of the data. Bits (each representing a key, are turned on or off by placing the corresponding bit in the binary number to 1. The table below shows the binary and decimal representations of the first byte (8 bits) of an integer bitmap. As you can see, each “on” bit simply turns on the appropriate power of 2.

Binary	Decimal
0000 0001	1
0000 0010	2
0000 0100	4
0000 1000	8
0001 0000	16
0010 0000	32
0100 0000	64
1000 0000	128

Clearly, the larger the storage unit, the more bits can be used to represent the key flags. A BIGINT can store up to 64 unique values, an INT 32 unique values, and so on. One advantage to using the SET type is that it automatically chooses the smallest storage type needed to store the required number of set elements.

To have more than one key flag turned on within the bitmap field, the flag bits are added together. Therefore, if the third (2³) and fourth (2⁴) flags (0000 0100 and 0000 1000) are turned on, the bitmap field would contain 0000 1100, or the number 12 in decimal. If ever you want to see the decimal or binary representation of a SET field column, you can use +0 and the BIN() function, as shown below:

SELECT
  auto_id
, auto_options
, auto_options+0 as 'dec'
, LPAD(BIN(auto_options+0),8,'0') as 'bin'
FROM Auto a;

+---------+-----------------------------------------------+-----+----------+
| auto_id | auto_options                                  | dec | bin      |
+---------+-----------------------------------------------+-----+----------+
|       1 | Power Windows,Power Steering,Moonroof         |  14 | 00001110 |
|       2 | Power Windows,Power Steering,Moonroof,Leather |  78 | 01001110 |
|       3 | Power Windows,Moonroof,Leather                |  74 | 01001010 |
|       4 | Power Windows,Moonroof                        |  10 | 00001010 |
+---------+-----------------------------------------------+-----+----------+
4 rows in set (0.00 sec)

Use the LPAD() function to pretty up the output to a pre-determined width, with leading zeroes.

This means that you can do more complex querying using the numeric values of the SET elements (key values). For instance, suppose we wanted to find all those records which did not have Leather or Power Steering, but did have Power Windows. From the output above, we can easily see that the auto with ID#4 is the only record which will meet our criteria.

But, how do we structure our WHERE expression in SQL? Looking back at our table schema, we know that the numeric position (starting from the number 1) of the key values in our WHERE clause will be:

Key Value	Numeric Position in SET
Leather	7
Power Steering	3
Power Windows	2

Using the MySQL bitwise functions, we can issue our query like so:

SELECT auto_id, auto_options
FROM Auto a
WHERE auto_options & (POW(2,6)+POW(2,2)) = 0
AND auto_options & POW(2,1) > 0;

+---------+------------------------+
| auto_id | auto_options           |
+---------+------------------------+
|       4 | Power Windows,Moonroof |
+---------+------------------------+
1 row in set (0.00 sec)

In the code above, the POW() function is used to get the correct bit set for each desired element in the query. We substract 1 from the number of the element, because the “on” bits are counted from the right side of the byte structure and determined as a power of 2. For the first part of the WHERE expression:

WHERE auto_options & (POW(2,6)+POW(2,2)) = 0

we ensure that the result of the bitwise & operator (which returns a 1 for each bit where both sides of the equation have the specified bit or bits turned on) results in 0. This ensures that only rows which do not have the Power Steering and Leather options are returned. The second part of the WHERE expression uses the bitwise & operator against the Power Windows option, and filters where the result is greater than zero, so that the resulting rows are known to contain the Power Windows option.

Besides violating normalization rules (and I won’t get into an idealistic debate about that here…there’s more than enough discussion online about that), there are some concrete reasons why not to use the SET data type for handling many-to-many relationships. Although I began writing this article quite a while before Sheeri Kritzer posted about the ENUM/SET type, some points are worth repeating. Foremost among these are the following:

Using the SET data type for many-to-many relationships imposes a 64-element limit on the number of keys available to relate to the main entity.
Using the SET data type with the built-in SET functions or any of the bitwise operators prohibits the MySQL optimizer from using indexes on the column.
This is a major performance drawback to scalability and the primary reason I choose not to use this data type for anything but the smallest projects or for fields containing 3 or fewer elements. Why 3 or fewer? Well, because even if an index ould be used against the SET field, the chances of the index selectivity being large enough to filter an adequate amount of rows to make the index useful to the optimizer is already very small.
Again, working with the SET data type is not particularly flexible.

Changing elements of a SET data type once data is already in the table can be a real pain in the behind! I won’t go into the details, as the MySQL site covers many of the main points, and Beat Vontobel’s blog post on SETs covers what the MySQL site doesn’t (nice work, Beat!)

The Long String Method

A third common approach to many-to-many relationships is to use a single long string field to store essentially a concatenated version of Method #1’s multiple fields. An example of this method would be the schema below:

CREATE TABLE Auto (
  auto_id INT UNSIGNED NOT NULL AUTO_INCREMENT
  , auto_manufacturer_id SMALLINT UNSIGNED NOT NULL
  , auto_model VARCHAR(20) NOT NULL
  , model_year SMALLINT UNSIGNED NOT NULL
  , asking_price DECIMAL(12,2) UNSIGNED NOT NULL
  , auto_options CHAR(7) NOT NULL DEFAULT 'NNNNNNN'
  , PRIMARY KEY pk_Auto (auto_id)
);

There are few advantages to this approach, either performance or maintenance-wise, but it can come in handy in at least one particular circumstance: when application code either relies, or is made much more simple, by the use of imploded arrays.

Sometimes, particularly if you’ve inherited some legacy code which is simply too much of a nuisance to change, it can make sense to work with what you’ve got. If the application code relies heavily on returned field values being concatenated string values, this method might work well. The application returns a long list of either Y or N values representing whether keys are on or off. For instance, if we had an automobile with the Leather and Power Windows options, and we used the same order as the SET example above, the returned auto_options field would be: “NYNNNNNY”.

Perhaps the application was a PHP program which walked the string using something like the following:

In this case, the PHP program has the entire key table in memory (the $all_options array). All the program needs to do is slice up the long string; the position of the character corresponds to the auto feature in the $all_options array. If there was little complex querying in the application, yet there were hundreds or thousands of these key values for each row, this might be a decent method.

Except for this scenario, however, this method is generally not desirable, for the following reasons:

Too little flexibility
Again, flexibility arises. In this method, what happens if you want to remove an Auto Feature from the list? Doing so in the PHP code would be fairly simple; just remove the element from the $all_options array. But, unforunately, once you did that, the offsets into the character string would be skewed. Likewise, if you wanted to add an Auto Feature, you’d have to change the table schema — not a very flexible plan. Making matters worse, you could only add an Auto Feature to the end of the character string. Adding one in the middle (say, if you wanted to keep some sort of alphabetical ordering in the PHP code) would again cause the offsets to be skewed.
Performance degrades dramatically because indexes can rarely be used

This disadvantage becomes even more noticeable as queries become more complex. Sure, an index might be used if you were querying on the first one (or more ordered) Auto Features. For instance, assuming an index on the auto_options field, the following query would indeed use an index:
```
SELECT *
 FROM Auto a
WHERE a.auto_options LIKE 'Y%';
```
But, unfortunately, that’s about the limit of an indexes’ usefulness for this method. In the example above, we could use an index to find all records having Air Conditioning. But, what happens when we want to, say, find all automobiles which have Power Seats and Power Windows (the sixth and second keys)? Now we’re looking at the following SQL statement:
```
SELECT *
FROM Auto a
WHERE SUBSTRING(a.auto_options, 6,1) = 'Y'
AND SUBSTRING(a.auto_options, 2,1) = 'Y';
```
Besides being a mess, this code will never be able to use an index because of the use of the SUBSTRING() function on the left side of the WHERE equations. And, because indexes won’t be used, the performance of this schema will not scale well at all.

The Mapping Table Method

Alright, the final (and my preferred) method for managing many-to-many relationships is to use a key mapping table (sometimes called a relationship table). In the beginning of the article, in figure C, you saw an E-R diagram showing a key mapping table relating the AutoFeature entity to the Auto entity. Note that this method is the only truly normalized method of managing many-to-many relationships. Up until now in the article, the schema organization has been in a single table. Now, through the use of three distinct tables, we are able to normalize the schema into the following DDL:

CREATE TABLE Auto (
  auto_id INT UNSIGNED NOT NULL AUTO_INCREMENT
  , auto_manufacturer_id SMALLINT UNSIGNED NOT NULL
  , auto_model VARCHAR(20) NOT NULL
  , model_year SMALLINT UNSIGNED NOT NULL
  , asking_price DECIMAL(12,2) UNSIGNED NOT NULL
  , PRIMARY KEY pk_Auto (auto_id)
);

CREATE TABLE AutoFeature (
  auto_feature_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
  , feature_name VARCHAR(80) NOT NULL
  , PRIMARY KEY pk_AutoFeature (auto_feature_id)
);

CREATE TABLE Auto2AutoFeature (
  auto_id INT UNSIGNED NOT NULL
  , auto_feature_id SMALLINT UNSIGNED NOT NULL
  , PRIMARY KEY pk_Auto2AutoFeature (auto_id, auto_feature_id)
);

Before we discuss the advantages to this method, it’s worth pointing out a couple drawbacks to storing many-to-many relationships in this normalized fashion.

Generally, the key mapping table method will use the most overall storage of any of these described approaches.
As with all matters in the database design world, everything comes with a tradeoff. The key mapping table is no exception. The biggest tradeoff by far is the storage space needed to store the many-to-many relationship. Instead of a single field, or multiple small fields, in a single table, we now must store the foreign keys of each entity multiple times, with each unique combination occupying a single row in the relationship table.

Additionally, because indexes can be effectively used against the key mapping table, we now need space to store the index records as well as the data records. As with any index, performance of INSERT and UPDATE operations (especially on high-volume mixed OLTP/OLAP applications) can suffer. However, any performance impact on INSERT and UPDATE operations is usually far outweighed by the performance benefits for SELECT operations. As always, however, benchmarking and testing is always a good idea; not only in the beginning of the project, but also at timed intervals as the database grows and matures.
There are now one or two more tables to maintain for the schema
Whilst having an extra table or two will provide us with the most flexibility, that flexibility comes at the cost of extra tables which must be maintained by both the application and the database administrator. For just a few seldom-changing keys, the hassle of maintaining extra tables and relationships may not be worth the added flexibility.

Now, on to the benefits of this approach, along with some examples of how to retrieve result sets using the key mapping table.

Robust Indexing possibilities are now supported.
Because both sides of the many-to-many relationship are separate fields in distinct rows of the key mapping table, our queries can support various indexing strategies. It’s best to see this in action, so I’ll demonstrate with two simple query examples.

Let’s assume that we wish to find all the options available, in a list, for a specific automobile. This is a fairly simple query, but a good starter:
```
SELECT af.feature_name
FROM AutoFeature af
INNER JOIN Auto2AutoFeature a2af
ON af.auto_feature_id = a2af.auto_feature_id
WHERE a2af.auto_id = 7;
```
```
+------------------+
| feature_name     |
+------------------+
| Air Conditioning |
| Disk Brakes      |
+------------------+
2 rows in set (0.00 sec)
```
Pretty simple query. Just an INNER JOIN from the AutoFeature table into our key mapping table on the auto_feature_id column, then a WHERE condition specifying the needed vehicle’s ID. This kind of output isn’t possible with the previous 3 methods without a lot of headache, but let’s face it: this is generally the format that an application needs data, correct? In a list, or an array. Now, with a simple query like this, that kind of output is easy.

But, are our indexes in operation for the above query? Let’s find out:
```
mysql> EXPLAIN SELECT af.feature_name
    -> FROM AutoFeature af
    -> INNER JOIN Auto2AutoFeature a2af
    -> ON af.auto_feature_id = a2af.auto_feature_id
    -> WHERE a2af.auto_id = 7 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a2af
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 2
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: af
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: test.a2af.auto_feature_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)
```
Indeed, they are. In the EXPLAIN output above, you’ll see that the optimizer is using a constant (ref: const) on the PRIMARY KEY to filter records. In the “Extra” column, you’ll note that MySQL helpfully tells us that it’s “Using index”. Many novice (or even intermediate) database developers new to MySQL will assume that if they see “Using index” in the Extra column, that they have an optimal query plan in place for the SQL code. This is actually a bit of a misnomer. When you see “Using index” in the Extra column, it means that MySQL is able to use the index records (as opposed to the data records which the index is attached to) in order to complete the query. In other words, MySQL doesn’t have to even go to the data records; everything it needs for the query is covered by the index records. This situation is called a covering index, and is something that goes hand-in-hand with key mapping tables. Why? Because, frankly, the entire table IS the index!

For this reason, and others, which we’ll get to below, key mapping tables, when properly structured, are ideal for joining operations. In EXPLAIN plans for queries which use the key mapping table, you will often see the use of a covering index, because all of the joined columns are available in the index records. Practically speaking, this means that for MyISAM tables, the key_buffer will contain all the information that the query needs already in RAM; there is no need to access the data in the .MYD file, saving disk accesses and making the query performance lightning fast. For InnoDB tables, the access is just as quick. The PRIMARY KEY will be housed in the InnoDB data page buffer pool (since it is a clustered index and is the actual data records). So, likewise, queries will be lightning fast as the records are in memory…

So, what about other types of queries; do they also benefit from the PRIMARY KEY index? Let’s find out. Here’s another fairly simple query which attempts to find all automobiles having the Leather option:
```
SELECT a.*
FROM Auto a
INNER JOIN Auto2AutoFeature a2af
ON a.auto_id = a2af.auto_id
WHERE a2af.auto_feature_id = 7;
```
```
+---------+----------------------+------------+------------+--------------+
| auto_id | auto_manufacturer_id | auto_model | model_year | asking_price |
+---------+----------------------+------------+------------+--------------+
|       3 |                    1 | 1          |       2003 |      5000.00 |
|       4 |                    1 | 2          |       2005 |     38000.00 |
|       5 |                    1 | 2          |       2004 |     31000.00 |
|       6 |                    1 | 3          |       2003 |     10000.00 |
|       8 |                    2 | 1          |       2004 |     12000.00 |
|      10 |                    2 | 2          |       2005 |     32000.00 |
|      11 |                    2 | 2          |       2005 |     42000.00 |
+---------+----------------------+------------+------------+--------------+
7 rows in set (0.00 sec)
```
Let’s use an EXPLAIN to see if we’ve indeed got an ideal execution plan:
```
mysql> EXPLAIN SELECT a.*
    -> FROM Auto a
    -> INNER JOIN Auto2AutoFeature a2af
    -> ON a.auto_id = a2af.auto_id
    -> WHERE a2af.auto_feature_id = 7 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 16
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a2af
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: test.a.auto_id,const
         rows: 1
        Extra: Using index
2 rows in set (0.00 sec)
```
Uh oh! What happened? We’ve encountered the dreaded ALL access type! As you can see, MySQL chooses to do a table scan of the Auto table, and for each auto_id, do a lookup into the Auto2AutoFeature table along the auto_id key, filtering on the Auto2AutoFeature PRIMARY KEY’s auto_feature_id column (Look for “ref: test.a.auto_id,const”). But, you ask, why didn’t MySQL just find the auto_id records in Auto2AutoFeature that had the Leather option, and then join that smaller resultset to the Auto table?

This is where the “robust indexing strategies” I spoke about earlier come into play, and where many novices get tripped up when dealing with key mapping tables.

The reason that MySQL chose the access plan above is because the auto_feature_id column (on which we are supplying a constant filter value of 7) is on the right side of the PRIMARY KEY index. In order for MySQL to use an index effectively, it must be able to apply a constant or range filter value to the columns of an index, from LEFT to RIGHT. This is why, although you do see the term “Using index” in the Extra column of the second EXPLAIN, MySQL actually has chosen a sub-optimal plan. The solution? We add another index to the key mapping table which allows the reverse join direction to be followed.
```
CREATE UNIQUE INDEX ix_ReversePK ON  Auto2AutoFeature (auto_feature_id, auto_id);
```
Now, let’s take another stap at the same EXPLAIN from above:
```
mysql> EXPLAIN SELECT a.*
    -> FROM Auto a
    -> INNER JOIN Auto2AutoFeature a2af
    -> ON a.auto_id = a2af.auto_id
    -> WHERE a2af.auto_feature_id = 7 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a2af
         type: ref
possible_keys: PRIMARY,ix_ReversePK
          key: ix_ReversePK
      key_len: 2
          ref: const
         rows: 7
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.a2af.auto_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)
```
See the difference?! Now, the new reverse direction index we just added is first used to filter the Auto2AutoFeature records, and THEN the Auto table is joined. This is the optimal query plan for this type of query.

So, to sum up this advantage, it should be said that with key mapping tables, you’re given much more ability to tune and optimize your queries, but at the same time, you’ve got to know what you’re looking for, and got to be willing to put in the time to analyze your queries. But, hey! If you’ve read all the way through this article so far, I’m willing to bet you’ll do just that.
Flexibility to add and remove elements easily, and without schema changes.
One fatal flaw of the previously covered methods is their inability to deal easily with change. Let’s face it, change happens constantly in the business world. If you’re designing applications for use in this world, you had better make them able to deal with constant change. Otherwise, you will be on the phone 24/7 fulfilling request after request to make small changes to this type of data. Do yourself the favor, and give yourself back that time from the start.

With key mapping tables, adding new elements to the many-to-many relationship is simple. You just add a record to the master table:
```
INSERT INTO AutoFeature (feature_name) VALUES  ('GPS Navigation');
```
All done. If you want to remove one, simply issue a DELETE with a join into the key mapping table. This will remove the parent and all child relationships:
```
DELETE  AutoFeature, Auto2AutoFeature
FROM AutoFeature
INNER JOIN Auto2AutoFeature
ON AutoFeature.auto_feature_id = Auto2AutoFeature.auto_feature_id
WHERE AutoFeature.auto_feature_id = 4;
```
```
Query OK, 5 rows affected (0.00 sec)
```

That wraps up this part’s coverage of the key mapping technique. It was just a brief overview. Next time, we’ll cover how to get the most out of your key mapping tables, including how to structure complex joins for various query filters against the many-to-many keys.

Conclusion

In this first part of the series, I’ve tried to thoroughly explain four common approaches for managing many-to-many relationships. Each approach has it’s distinct benefits and drawbacks, and I’ve tried to represent those aspects clearly and fairly. Please use the comments section below to let me know if you think I skimmed over certain methods or aspects, or didn’t give enough weight to one or the other. Also, feel free to point out any methods I may have left out completely; I’m always looking to round out my material in a balanced and thoughful way, and welcome any constructive criticism!

In the second part of the series, I’ll be examining ways we can use stored procedures, views, and the INFORMATION_SCHEMA in MySQL 5 to construct an easy management system for key mapping tables, and I’ll present some more advanced SQL that enables you to have full control in querying across these key mapping tables, allowing you to drill down accurately into the wealth of information stored there. We’ll also explore methods of increasing the performance of key mapping tables through the use of alternate indexes.

I’ll also be covering some benchmarks I’ll construct this coming week which show the relative performance and scalability of these methods, as well as the storage needs of each. Hopefully, that part of the article series will give some qualitative numbers for those looking for that kind of thing! Till then, Cheers.