Removing Barriers to the Community – MySQL Moves to Bazaar

I’ve been at MySQL for two and a half years now as a community manager. Through this time, I’ve learned a number of valuable lessons about what it means to be a community manager, what it means to belong to a community of technologists, and what it means to be an open source advocate. I think back now on the attitude and preconceptions I brought with me when I joined MySQL, and reflect on how my own attitude has changed.

One of the biggest “aha” moments I have had in the past three years is the following:

It is the role of a community manager to remove the barriers, both technical and ideological, between the user/developer community and the company or group of individuals that produces the open source software.

Some barriers are small. Sometimes these barriers can be overcome by a simple email to an annoyed community member who has misunderstood a poorly communicated company objective.

And, sometimes barriers are large, and require a concerted effort of many people to overcome. One of these large barriers — the usage of the closed-source BitKeeper source control system — was recently removed and is a shining example of what can happen when many advocates within a company come together for the benefit of the community as a whole.


Yesterday, Kaj Arnö announced that MySQL has officially moved away from BitKeeper and has shifted development to the free and open source Bazaar revision control system. This is an immensely important move for the MySQL community and the MySQL engineering team, and its importance cannot be overstated. While BitKeeper is technically a fast and feature-rich revision control system, it enforced a significant barrier to the community: the free BK client was essentially a read-only client, without even the basic ability to do diffs on a source code branch. This inhibited contributions from the external community by funneling them into closed BK branches with no commit access for external contributors.

With the move to MySQL hosting its source code on the Launchpad.net service in Bazaar trees, we have removed this barrier to open source development. On Launchpad, there is now an overarching mysql super-project which contains all MySQL-related projects. Under this super-project, there are two projects which contain branches of the MySQL Server. One project, “mysql-server”, contains the official MySQL server code branches. The other, “sakila-server”, contains branches of the server codebase that are being worked on by external community members.

Why did we set up two separate projects, one called “mysql-server” and one called “sakila-server”? There are a number of reasons.

Showcase Community Work

First, and perhaps most importantly, the “mysql-server” project contains all official branches and MySQL engineering team trees. Consequently, the code area of Launchpad for this project is beginning to fill up with lots of different branches, and it is a little difficult to tell which branches are written by community members. We wanted a separate area where community members could showcase their work, and not have their work “swamped” by all the different team trees.

Use of Launchpad’s Bugs and Blueprints Services

Secondly, one of the issues we ran into was how community-written projects would track roadmap tasks or bugs in their distribution. MySQL has its own Bug Tracking system and its own Worklog system which it uses to track “roadmap” type tasks. Unfortunately, there is no way for community-developed projects to use these systems, as they are heavily customized for MySQL’s internal procedures. Furthermore, Launchpad.net already offers services for tracking bugs and assigning roadmap tasks to projects. So, with a separate sub-project (sakila-server), community developers now have built-in bug and blueprint (roadmap) services ready to use for their code branches.

Full Community Control of the Project

Finally, within the “mysql-server” sub-project, community members wouldn’t have the rights to “brand” their projects the way they wanted to. As MySQL is a trademarked and copyrighted product, community members would be restricted in how they could administer and brand their own branches. With their branches under a separate sakila-server sub-project, those restrictions go away and they have full administrator rights over everything, including branding…

I think this move is fantastic, and we welcome your input and feedback about ways we can continue to open up, be transparent, and work better with our community. We’re all ears. Let a hundred dolphins swim.

Please be sure to check out both server code project areas and investigate the various branches that community members are working on!

It’s About the Product, Silly

Today there was a flurry of blog posts, starting with Charles Babcock’s interview of Jonathan Schwartz about Sun’s strategy of targeting Web 2.0 developers. This brought to light an interesting topic about open source development communities, the perceived insularity of Sun towards the external OpenSolaris developer community, and why Linux will apparently always be more popular and technically stronger than OpenSolaris.

The initial interview led Amanda McPherson of the Linux Foundation to take issue, and long comments on those posts from supporters and objectors shed light on a rift between OpenSolaris insiders, Linux community developers, and Sun’s overall marketing approach around open source.

Amid various snits about Sun’s “fluffy marketing” practices[1] towards OpenSolaris, there was a defection of a high-level OpenSolaris developer, Roy Fielding. The defection ostensibly was due to Roy’s frustration that Sun was not living up to its promises of a truly open development model and community, a frustration that Stephen O’Grady finds uninspiring.

Unfortunately, these conversations are really just a sideshow of personal and anti-corporate rhetoric that misses the underlying truth about open source (and indeed closed source) products/projects: it’s not about the marketing, it’s about the product, silly.

I’m sure my colleagues of the marketing persuasion may take issue with that statement, so let me explain why I believe this to be true, and why I think that OpenSolaris has a chance of becoming a true competitor to Linux in the future, a chance that has absolutely nothing to do with Sun’s OpenSolaris marketing team (sorry, folks).

It’s All About the Product (Quality)

As much as developer communities[2] like to think of themselves as the purveyors of a grand vision of truth towards the user community, all developer communities are dwarfed by their corresponding user communities. Members of a user community are unlikely to be aware of discussions within the developer community, or to care about the debates within that community.

User communities use the product. It is the product which counts, and nothing more. And the user community will continue to use a product as long as other competing products are inferior in three basic ways:

  • Ease of use
  • Performance
  • Stability/reliability

Before you say “Whoa, doesn’t licensing count, too?!”: sure, it does, but I group licensing in with “Ease of Use”. For some groups, a BSD-style license is “easier to use” because of a perceived viral effect of the GPL on their own code (especially for embedded products). For other groups, the GPL-style licenses are easier to use because they provide certain safeguards and benefits for their own products or environments.

So, my point is that it is not the marketing of a product that counts — indeed, false marketing statements or tactics can backfire quickly. What counts is the product itself. Marketing should be the function of promoting the innovative advantages of a product in the above three categories.

Likewise, developer communities should be about improving a product in these three ways. And, NO, “features” are not a differentiator that the user community (or customers) use in picking one competing product over another. Features that deliver better ease of use, faster performance, or more stability are what can drive differentiation. Features that do none of the above are a waste of space.

Why Open Development Models Improve Product Quality

An open development model and community is simply a method of attaining better product quality. Sure, happy developer communities translate to happy user communities — happy developers make better code and better code is enjoyed by users. But simply having a “community”, as the OpenSolaris marketing team would have you believe, does not magically make a good product. A community can take many shapes. It is my belief that a more open development community does make a better product. Why?

Increased Modularization of the Core Product

In my estimation, there are many reasons for open communities producing better products than closed communities. First and foremost amongst these reasons is the first evolutionary process which occurs when development of a product is opened up in any way: increased modularization and pluggability of the core product.

The vast majority of open source projects which have a truly open development community exhibit much higher architectural modularization than projects with closed development communities. Apache, Eclipse, Linux, PHP — these are all projects with open development communities and they each exhibit extremely high levels of modularity. Why? Because modularization enables development to continue/begin on a specific feature or part of the codebase without affecting or hindering the development of the core of the product. This enables all sorts of things from easier versioning and release practices to an increase in external development of needed features.

Contrast the architecture of these projects with the architecture of a closed development project such as MySQL. The architectural differences are sometimes stunning, especially in early versions of MySQL. When I wrote the System Internals chapter in Pro MySQL in 2005, I wrote:

MySQL’s architecture consists of a web of interrelated function sets, which work together to
fulfill the various needs of the database server. A number of authors have implied that these
function sets are indeed components, or entirely encapsulated packages; however, there is
little evidence in the source code that this is the case.

I wrote that chapter against the MySQL 5.0.2 code base. Admittedly, lots of improvements have made it into MySQL since those days, but much of the same interweaving of subsystems still exists, and complicates the code base substantially. I believe that if an open development model had been instituted at MySQL years ago, we would have a very different and much more modular core database kernel than we have today. Heck, the embedded library might even work correctly…

Modularization Leads Directly To Innovation

So, besides enabling cleaner release management and increased development of external features, why is a modular product better than one that isn’t? This all goes back to the “Big Three Basics”: Ease of use, performance, and stability. Let’s tackle the last one first.

Increased Stability

Why would modularity lead to an increase in stability? This can be answered quite easily. When a product’s architecture undergoes the refactoring needed to modularize, two things happen. First, a “core kernel” or runtime begins to take shape. This kernel takes shape because developers in an open development community want to be able to develop against a firm set of application programming interfaces (APIs). Programming against an API enables a developer of an extending feature to form a contract with other developers in the community: “I will write my feature to connect to other stuff in a predefined and contractual way”.

Secondly, this method, programming against an API, allows the developer to develop her feature in isolation from ongoing development in other areas. Developers in other areas can count on other developers’ code not interfering with their own code. In addition, work done on one feature or module can be released irrespective of the hangups or problems occurring in other parts of the code.

So why does this lead to greater product stability and reliability? Because of the need for APIs (so that other developers can code modules outside of the interference of other modules) a set of APIs begins to emerge that outline the “black box” of the core product. This standardization process of creating APIs for connecting to the core kernel acts as a stabilizing force for the product in general. The stability of the APIs, and their consistency in implementation, can be directly correlated to the stability of the underlying product as a whole. The reason for the correlation is that when changes to an API are more difficult to effect, because of their corresponding effects on other code, stability increases due to fewer changes in the way that components interact with each other.
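To make the “contract” idea concrete, here is a minimal sketch in Python of what programming against an API looks like. Everything in it is hypothetical and invented for illustration: the names StorageEngine, MemoryEngine, LogEngine and Core are mine, and this is not MySQL’s actual storage engine API. The point is only that the core talks to the interface, so two engines written by different people can be developed in isolation from each other and from the core.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StorageEngine(ABC):
    """Hypothetical contract that every engine module agrees to honour."""

    @abstractmethod
    def put(self, key: str, value: str) -> None:
        """Store a value under a key."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the latest value for a key, or None if it is unknown."""


class MemoryEngine(StorageEngine):
    """One engine: keeps rows in a plain dictionary."""

    def __init__(self) -> None:
        self._rows = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class LogEngine(StorageEngine):
    """A second, independently written engine: an append-only log."""

    def __init__(self) -> None:
        self._log = []

    def put(self, key: str, value: str) -> None:
        self._log.append((key, value))

    def get(self, key: str) -> Optional[str]:
        # Scan from the newest entry backwards so the latest write wins.
        for logged_key, logged_value in reversed(self._log):
            if logged_key == key:
                return logged_value
        return None


class Core:
    """The 'core kernel': it knows the interface, never a concrete engine."""

    def __init__(self) -> None:
        self._engines = {}

    def register(self, name: str, engine: StorageEngine) -> None:
        # Refuse anything that does not implement the agreed contract.
        if not isinstance(engine, StorageEngine):
            raise TypeError("engine must implement StorageEngine")
        self._engines[name] = engine

    def store(self, engine_name: str, key: str, value: str) -> None:
        self._engines[engine_name].put(key, value)

    def fetch(self, engine_name: str, key: str) -> Optional[str]:
        return self._engines[engine_name].get(key)
```

Notice that adding a third engine would not touch Core at all, while changing the signature of put or get would force every engine author to change their code. That asymmetry is the stabilizing force I am describing.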

Case in point: the “pluggable” storage engine API in MySQL was introduced in early MySQL 5.0 versions. This API has changed numerous times over the past two years as different vendors (internal and external to MySQL) needed or wanted additional functionality. The API has started to stabilize now, but I see additional changes in the future. What has happened to the stability of the underlying code base of MySQL? I will leave that answer to the reader’s intuition.

Increased Performance

The reason I believe modularization leads to better performing software is that the act of modularizing identifies pieces of the software which can and should be “pulled from the core”. Pieces not central to the functioning of the core kernel are pulled out of the core product and placed, rightfully, into modules which provide additional, value-added-for-a-certain-group-which-needs-it features. By placing extra features outside of the realm of the “necessary”, the core kernel begins to take the shape of a lithe marathon runner and not the bloated monolith it once was.

You may wonder why this removal of extra features can lead to better performance of the product as a whole. By removing non-essential parts of the software into modules, the code for the core part of the software is simplified and shortened. Now, simple, shorter code isn’t necessarily faster just for being shorter (though often it can be). However, the ease of maintenance on a simplified core software kernel allows developers to focus on performance-related tasks involving that simplified kernel. Improving performance of essential runtime functionality is much easier and faster when the code isn’t littered with calls and code paths that correspond to the functionality that belongs in an external module. If performance issues in the core kernel are easier and faster to fix, then the product as a whole becomes better performing.

Secondly, the modularization of a code base means that users who do not need an array of features can use a smaller, streamlined product that fits only their needs. Often, when users can make use of a streamlined product that doesn’t contain code they don’t need or want, the resulting binary is smaller and faster.

Increased Ease of Use

How modularization increases ease of use is related to how it increases performance: by enabling developers to only focus on the module at hand, or the core itself, instead of a mix of both, the developer is freed to focus on fixing ease of use bugs and addressing usability issues. The more time a developer can spend actually doing the things that comprise her outstanding bug and feature list, and not on “how will my code affect ten other developers’ code”, the better the chances of a product’s usability becoming better.

Finally, Increased Competition Fosters Innovation

I think it’s fairly obvious that increases in stability, performance and ease of use translate into increases in innovation for the product as a whole.

My final thought on why modularization leads to increased innovation is that modularizing a product leads to competition in the developer marketplace for a specific feature. This competition spurs the developers of a competing module to do it better and faster. Case in point: the pluggable storage engine API “modularization” effort at MySQL spurred the creation of numerous competitive engines — from Paul McCullagh’s PBXT engine to SolidDB’s transactional engine to the new Falcon and Maria engines. Each engine demonstrates different characteristics, benefits and disadvantages. Without each other, the wellspring of innovation would be much drier.

Innovation Leads to Market Share

So, assuming you’ve come with me in my theory that open development communities foster modularization of a product, and that this modularization leads to more innovation for the product as a whole, then I think you will be able to make this final step easily: innovation in a product leads to greater market share.

It doesn’t matter whether the product is the result of a group of loosely-affiliated individuals or a company like Sun or MySQL: each producer of a product wants their product to have the most market share compared to competing products. It’s human nature; we just want to be popular!

Jonathan Schwartz is Right About Community, But Wrong About Why He’s Right

So, in Babcock’s interview, Jonathan Schwartz says,

Everything begins with the development of a community

Jonathan is right about everything beginning with the development of a community. However, I believe Jonathan, and a number of folks I’ve met personally at Sun, feel that simply having a community automatically makes a good and popular product. This simply isn’t true. It’s about the product, first. Then the community. A vibrant community (both developer and user) springs from the fountain of enthusiasm about a product. If that enthusiasm wanes, because of anything from the closing of the development model to the inability to get their voices heard regarding a product roadmap or architecture, then the community begins to die along with the product. It is a delicate thing, this product and community balance sheet.

This well is poisoned; the company has consumed its own future
and any pretense that the projects will ever govern themselves
(as opposed to being governed by whatever pointy-haired boss
is hiding behind the scenes) is now a joke. Sun should move on,
dissolve the charter that it currently ignores, and adopt the
governing style of MySQL. That company doesn’t pretend to let
their community participate in decisions, and yet they still
manage to satisfy most of their users. Let everyone else go
back to writing code/documentation for hire.



There’s nothing particularly wrong with that choice — it is
a perfectly valid open source model for corporations that
don’t need active community participation. IMO, the resulting
code tends to suck a lot more than community-driven projects,
but it is still open source.

The above is a quote from Roy Fielding’s email to the OpenSolaris developers’ mailing list in which, at the end, he resigns from the OpenSolaris community. I think it is telling about how delicate the balance between product and community really is. He points out that the broken promises about a truly open development model were the deciding factor in his resignation. I suspect that frustration at not having a voice in the architectural decisions surrounding OpenSolaris was also a big factor. Pointedly, he states that MySQL has never pretended to let our community participate in decisions, but MySQL still enjoys a large and vibrant user community.

I theorize that the MySQL user community is in danger of becoming fragmented and is increasingly fragile because of the “we decide” nature of MySQL’s closed development model. I believe that MySQL achieved ubiquity and a huge user community because of this:

  1. In the beginning, it “just worked”, was fast, reliable, and easy to use.
  2. It became ubiquitous in both language packaging and Linux distributions very early on and only because of #1 above.

In other words, it was about the product itself.

If the product loses touch with its roots and, because of a closed development model, moves further away from being easy to use, fast, and stable, the user community will move away to competitors, and MySQL’s ubiquity will go with it.

Again, an open development community is a method to achieve greater innovation in a product. I believe it is silly for either MySQL or Sun to take any steps that do not open up their development activity to the wider community. If openness spurs innovation of the product, closing off of development only means that the company is stifling innovation and decreasing its own market share. If a method is there to increase innovation, why disregard it?

Why OpenSolaris Could Be a Real Competitor to Linux

It’s about Innovation in the Performance, Stability and Ease of Use of a product. As Ian Murdock wrote on the OpenSolaris blog:

My basic observation, as someone
who came into the OpenSolaris community from the outside – even perhaps
from the competition – and who represents the target market this
community needs to reach was this: That the packaging and presentation
of OpenSolaris as it stands today represents a barrier to adoption and,
thus, an obstacle to growing the OpenSolaris community and bringing in
new users. To lower these barriers, OpenSolaris needs to be more than
just the code base. It needs to be a binary that users can easily
download and install to get easy access to OpenSolaris technology. Put
another way, as I said in a blog post in June, we need to have a
better answer to the question, “Where do I download OpenSolaris?”

Ian understands that a product gets adoption when innovations in ease of use, performance, and stability are demonstrated. In the above quote, he talks about the problems in ease of use that are barriers to adoption of OpenSolaris.

Similarly, there are a number of performance-related and stability-related innovations that are contained within OpenSolaris. But the product will not achieve true competitive status with Linux until the development community is opened up and innovation from the external community — in the form of a voice and control over architectural issues and design — is truly embraced at Sun.

If Sun embraces an open development model and embraces its developer community in discussions on roadmap and architectural decisions, I think OpenSolaris has an excellent chance of competing head-to-head with Linux on the innovation front. If the product is more innovative on a variety of levels, it will gain market share.

What This Means for MySQL

As Brian Aker rightly points out, MySQL has never had an open development model. But steps have been taken to open it up. The resignation of Fielding suggests that it was not the lack of openness which frustrated him, but Sun’s unkept promises of openness. This means that MySQL must be very careful not to over-promise on something it cannot or will not deliver regarding openness.

In addition, if we are to reverse the current course of a community-in-flux, we must embrace the fear we have of opening up our development process. We must get out of the cathedral and put up shop in the bazaar. We won’t be jumping out of the cathedral’s top window; more likely we’ll rappel down Rapunzel’s hair slowly and carefully. I see very good prospects for opening up in the future and I am excited. It will be happening in the nick of time.


Take everything I write with a grain of salt. After all, I’m just a lowly community relations manager.

Footnotes

[1] Amanda McPherson writes:

We may not have fancy Linux analyst days and mountains of spin, but it is all about the development community. Literally.

to which Mike Dolan responded in comments:

Yes, yes, and yes. This Sun nonsense needs to be called out; it’s great to see someone else actually looking into the critical details behind Sun’s fluffy marketing.

[2] I define the “developer community” as the developers who develop the product, as distinct from the “user community”, which a) develops products on top of or for the product or b) simply uses the product.

Just Chill…Chilll Out, OK? There Ain’t No Devil in PDOv2

A number of people have emailed me wondering why I haven’t blogged about the Sun/MySQL deal. Well, I’m still working out my thoughts on that, so I’ll leave it to another day. Besides, haven’t there been enough blog posts about it already?! ;)

As a PHP community member and a person who has been participating on MySQL’s behalf in the much-maligned PDOv2 working group, there is a more important and pressing topic of conversation that I’d like to comment on. Namely, the recent events surrounding the publication of the FAQ about PDOv2. There are many different topics being bandied around the PHP community schoolyard — some on-topic, some wildly off-topic and tangential. These are the issues I think represent what the majority of conversations have been about:

  • WTF? Who is this private, clandestine, devil-worshipping PDOv2 working group anyway and why wasn’t the PHP community involved in their discussions?!
  • WTF? What is this proposed Contributor License Agreement and why do we have to go over this all over again?!
  • WTF? This is only going to fracture the community into two diabolically and diametrically opposed groups!
  • WTF? This is just the database vendors trying to market their own agendas and push PHP in their own directions!

A number of emotional responses to the issues have already been made, but I write here to make a plea for calm, mature discussion about the topics at hand. It serves little purpose to be dogmatic or reactionary about this stuff, and I hope in this entry to make the case for rational, open discussion from here forward.

About the Devil-Worshipping Working Group

There have been a number of comments about how the working group, composed of representatives from the database vendors, Zend and developers of PDOv1, has met in this clandestine, non-public way, and how this secretive meeting is against the principles of the PHP community, and worse, open source in general. Here is my response to these comments: Get over it.

As developers and users of PHP, we live in a world of both individuals and corporations. As such, both sides must recognize the different needs and desires of each group. The needs of large corporations and of individuals sometimes differ, and the ways in which those needs are met are often different.

One example of how those needs differ is the process by which legal discussions can take place. Large corporations, and their legal teams, need a different venue in which to discuss their common legal concerns, one that in large part must be free of the distraction of long public mailing lists. Often, the legal issues discussed in such venues are private and confidential to the companies. For instance, lawyers from various companies must be free to discuss common concerns that affect them and not the wider public audience before their stance on an issue is made public.

The discussions of the working group members up until this point have been of this nature: discussions of common concerns and agreements regarding legal ramifications of their contributions to the PDO project. It made no sense to open up these discussions on a wider mailing list. Now that such discussions are over, the working group has put together an FAQ which attempts to explain the group’s collective thoughts. The working group as it is now will cease to be the only party discussing PDOv2 and will be just another voice in the PHP community as we address the issues surrounding CLAs, licenses, and such.

So, in short, if you are angry about not being included in the working group discussions to date: get over it and join the ongoing discussion in a useful way.

About the Need for a New PDO

Although Pierre and others have issued an emotional, albeit non-substantive, plea to ignore PDOv2, there are numerous reasons why I believe development of PDOv2 is needed, and why open discussions about the issues should proceed in a mature way.

Lack of Contributions to PDOv1

Although some in the community seem to think that community members other than contributors from the database vendors would do a better job at maintaining and enhancing PDO, the evidence does not support this claim. I call out Pierre’s comment on Antony’s blog which says:

However you are definitively right, PDO needs love but will it be enough to convince their original authors to bring it at another level? to something we really need? :-)

In a perfect world, there would be many contributors in the PHP development community who would be both willing and able to “show PDO some love”. But in the real world, such people have not come out of the woodwork. In fact, the developers who understand the various database vendor APIs work for the database vendors. So, if we want an improved, standardized PDO that is test-covered, the most pragmatic solution is to find a way to have employees at the database vendors able to contribute. We must begin to live in the real world, and not in a fantasy land.

PHP community members should be excited about the database vendors’ desire to contribute to an improved PDO, not lambasting them. Another quote from Pierre:

The biggest mistake in PDO (and the biggest mistake is being repeated again by the same persons and some newcomers) was to think they know better. They have a good understandings of DBs but sadly not of what many of us were actually looking for or about our needs.

If we could return to reality and get past strict ideology, I think you will find that the database vendors are begging for the chance to have the PHP community tell us precisely what it wants. Sure, we must get over this initial hurdle regarding the details of if, when, and where a CLA should be used, but once that is past us, we, the database vendors, are eagerly anticipating the ensuing enhancements to be made to both the PDO core and our own drivers!

About the Need for Standards in PDOv2

Another reason we must press on and solve the philosophical and legal issues at hand is the burning need for database vendors to work with the PHP community on standardizing the currently messy way in which each vendor driver exposes, transfers, and writes data to and from its database. If we continue down the path of ideological crusade, we simply prolong the standardization work.

About the Need for a Metadata Interface

Let’s face it. Compared to JDBC/ODBC, the ability of PHP developers to retrieve database metadata — be it schematic, columnar, or tabular — is really poor. The reasons for this are mostly to do with the underlying drivers for each vendor. If the database vendors are locked out of participating in the development of PDOv2, expect to see little change in this regard. We need the participation of all parties to resolve differences and hammer out the kinks in the underlying libraries.

It Ain’t About Open Source, Silly

As tends to happen with ideologues, valid concerns are often stretched beyond their limits. The current concern that the Contributor License Agreement will abolish open source within PHP is a perfect example of this. Richard Thomas writes on Mike Willbanks’ blog:

who cares if a CLA is worthless if it makes vendors feel warm inside and gets them to contribute?

to which Larry Garfield responded:

I care, because with a CLA it’s not Free Software anymore. I’m not allowed to look at the code unless I sign an extra agreement that I won’t do… something. Sorry, not Free Software. I’m not interested.

This is simply wrong and misses the concept of a contributor license agreement. Luckily, Mike responded with a pointed reminder not to stretch an argument beyond its natural limits:

To clarify a few things. A CLA does not mean that it would be closed source and you would have to sign a CLA before hand. Take a look at the Zend Framework for instance. You have to sign a CLA before you contribute but that certainly doesn’t make it so you can not view the source code.



Further, if the source code was not available then how would we build from the source? I think you are looking at this from the wrong viewpoint without actually reading into it fully.

Lukas Smith responded with some more common sense:

Indeed, the concern with CLA is mostly about the ability to openly participate and not about being able to read the source.

In Summary

Let us, the PHP community, the database vendors, the core developers of PHP, stay on track. The potential of PDOv2 is worth more than ideological bickering: it benefits the entire substrate of the PHP user community that we improve and enhance PDO. Let us march forward towards that goal. Yes, there are issues to be resolved — perhaps most importantly, the code and spec boundaries to which a CLA will bind — but the future is bright for data access through PDO. Let us not darken the skies with rhetoric and ideology. Let us come together and make something that is greater than the sum of its parts.

YouTube Scaling Video – Cuong Do


Alex forwarded me a link to an awesome scaling presentation by Cuong Do from YouTube about their architecture for scaling. He describes problems with massive scale-out, moving static content serving from Apache to lighttpd, how they use Python (including ways they speed it up) for the main application, certain custom C extensions for encryption, and their long road to MySQL partitioning.

A very interesting part of the presentation is Cuong’s discussion of their difficulties dealing with thumbnail images for the videos. Each video has 4 thumbnails attached to it. They went through a whole series of issues in trying to scale the thumbnail serving, including hitting the ext3 files per directory limit one fateful day…

The section on scaling MySQL is a must-see for anyone working on a Web 2.0 platform that could see an enormous increase in growth. I’ve seen many presentations that describe similar problems with large-scale sites running MySQL, and YouTube/Google have got the formula right: using sharding (partitioning) with replicas in each shard. There is a very interesting question at the end from an attendee about how YouTube decides which users go into which shard of the cluster. Probably not as simple as you might expect.
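To make the "shards with replicas" idea concrete, here is a minimal sketch in Python. It is not YouTube's actual scheme (the talk suggests theirs is more involved); it assumes a plain hash-modulo mapping from user ID to shard, and the server names, the `SHARDS` table, and both function names are hypothetical.

```python
import hashlib

# Hypothetical topology: every shard has one master (writes) and replicas (reads).
SHARDS = {
    0: {"master": "db0-master", "replicas": ["db0-r1", "db0-r2"]},
    1: {"master": "db1-master", "replicas": ["db1-r1", "db1-r2"]},
    2: {"master": "db2-master", "replicas": ["db2-r1", "db2-r2"]},
}


def shard_for_user(user_id, num_shards):
    """Map a user ID to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def pick_server(user_id, write, shards=SHARDS):
    """Send writes to the shard's master and spread reads over its replicas."""
    shard = shards[shard_for_user(user_id, len(shards))]
    if write:
        return shard["master"]
    replicas = shard["replicas"]
    return replicas[user_id % len(replicas)]


if __name__ == "__main__":
    print(pick_server(42, write=True))
    print(pick_server(42, write=False))
```

The obvious weakness of a modulo mapping is that adding a shard moves most users to a different shard, which is one reason real deployments often keep an explicit user-to-shard lookup table instead.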

One particularly funny segment of the presentation is when Cuong describes his decision to delete the swap partition from the master DB server (because the 2.4.x kernel was claiming RAM for the paging cache instead of giving it to MySQL…) It was a shot in the dark…

As a final word, I will note that I was out drinking in Portland last Thursday with the YouTuber (who will remain unnamed) who is asleep in the server room on the fifth slide. :)

Congrats to the PostgreSQL/EnterpriseDB/Sun Team for Benchmarks

I wanted to write a quick shout-out to congratulate the PostgreSQL development team, the folks at Sun who work with Josh Berkus, and the folks at EnterpriseDB, all of whom contributed to the excellent benchmark results for this quarter’s SPECjAppServer2004 benchmark suite. I’m looking forward to seeing Josh at OSCON in a couple weeks and meeting a few more of the PostgreSQL developers than I did last year.

I know that the PostgreSQL developer team has spent a considerable amount of time and effort improving performance bottlenecks and streamlining code for the PostgreSQL 8.2 release, and the benchmarks show the results of that hard work. It’s great to see the pressure put on Oracle and the “big guys” from open source databases. In the end, the competition created in the industry by the constant improvement delivered by open source development teams will only serve to benefit the end user with better performing, easier to use, and more reliable software.

Cheers to the PG dev team — job well done.

Giving Tutorial at OSCON, July 23rd, Portland, Oregon


I will be giving a 3-hour tutorial on July 23rd at this year’s OSCON entitled “Target Practice: A Workshop in Tuning MySQL Queries”. If you attended my tutorial last year, this one is quite different. It’s much more of a workshop-type tutorial than last year’s lecture-style tutorial, so there are loads of demos and code examples, and I’ll have lots of goodies to pass out (books, shirts, etc.) for folks who answer questions correctly or shout out interesting questions…

Here’s a quick overview of the tutorial:

This tutorial is for all those database developer gun-slingers who want to rid their applications of poorly performing queries and their outlaw cousin, the inefficient schema.

Take aim at poor application performance by learning how to read and understand MySQL query execution plans and tailor your SQL code and schema for optimal performance and scalability. The tutorial is designed as a hands-on workshop that demonstrates how query and index tuning can dramatically decrease application response time and make your all-important customer happy.

All of the following core concepts will be covered in detail:

  • The ins and outs of the MySQL query optimization process
  • The EXPLAIN command, in-depth
  • Profiling queries using the SHOW PROFILE command new in MySQL 5.0.37 Community Server

Hands-on demonstrations of:

  • How simple changes to indexing or schema design have dramatic effects on performance
  • How to set up a simple, but effective, benchmarking framework on your local development machine to harness techniques you learn during the tutorial

While there are some rather advanced topics covered in the tutorial, audience members with light to intermediate experience using MySQL should be able to follow along with the material. Laptops aren’t required, but for those who do have one, the tutorial is designed so attendees can follow along with the demonstrations on their laptops using a local instance of MySQL.
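As a small taste of the "simple changes to indexing" point above, here is a sketch of reading a query plan before and after adding an index. It is an illustration under loud assumptions: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in so that it runs anywhere, not MySQL's EXPLAIN, and the `orders` table, the `idx_orders_customer` index, and the `query_plan` helper are all hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the textual description of the step.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id the engine has to scan the whole table.
plan_before = query_plan(conn, sql, (7,))

# One schema change: index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (7,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies by SQLite version, but the first plan should report a table scan and the second a search using the new index. MySQL's EXPLAIN output is laid out differently, though the habit of checking the plan before and after a schema change carries over.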

MySQL Conference Speaker Spotlight: Episode 6 – Matt Casters

Matt Casters has been around the open source business intelligence, ETL, and data warehousing scene for quite some time. He is the original developer of Kettle, an open source Java tool now packaged by Pentaho as Pentaho Data Integration. He’s also doing a couple of absolutely dynamite sessions at the MySQL conference in April, which I recently asked him to comment on.

Our own Roland Bouman is a huge Pentaho fan, and is also doing a session called BI4DBAs, which features HOWTOs in using Pentaho, BIRT (Business Intelligence Reporting Tool), and Eclipse for business intelligence dashboard creation. These three sessions promise to deliver some of the best “holy crap, that’s really cool” looks from session attendees.

The first and longer 90-minute session is entitled “Exploiting MySQL 5.1 for Advanced Business Intelligence Applications”. It focuses on how to harness the power of MySQL 5.1’s newest features to create the next generation of business intelligence applications. But, nicely, Matt will start the session with an overview of what makes up the business intelligence environment: what does it mean? what are dashboards? what is data mining versus data warehousing? what the heck is predictive analysis? After that, he’ll move on to use cases showing real world examples of how to set up BI applications using Pentaho and MySQL. I asked Matt what he wanted attendees of this first session to get out of the talk…

Covering all the latest cool features in 5.1 would take too much time so we
are going to focus on what’s really interesting for the people that want to
handle massive amounts of data and/or want to speed up their BI environment.
Obviously we will touch upon the table partitioning features as those are by
far the nicest thing to happen to MySQL for BI in a long time. We’ll
explain what it means for your data warehouse and BI applications. Going
beyond that we will show the attendees how you can not only partition data
in a single table, we will also explain the concept of database partitioning
that we introduced in the latest GA version. Finally, we will show you some
typical use-cases for all that data and structure so you can see the
possible results as far as OLAP, dashboards, reporting, mining and so forth
is concerned.


You can expect me to go hands on for a large part of the time and I will try
to make the examples as real world as possible ;-)

The second of Matt’s two sessions is called “Addressing Data Chaos: Using MySQL and Kettle to Deliver World-class Data Warehouses”. It is a 60-minute session on Wednesday, April 25th, from 10:45am to 11:45am. I asked Matt to describe what attendees should expect from the talk, and what he plans to cover…

Relational databases such as MySQL are causing a change in the way people
and organisations treat their data. Storing data in an RDBMS used to be a
big thing, but now anyone can afford to set up a MySQL database and to slam
data into it. However, this is also causing a few pains for organisations
as that data is scattered, inconsistent and hard to cross reference.
What I will explain in detail is what we touched upon during the MySQL
Webinar on Kettle we did last year: how do you create that consistent view
using a MySQL data warehouse. Although I obviously need to explain the
architecture of a typical data warehouse, we will also show a lot of
examples and explain solutions for a lot of typical problems.

Finally, I asked Matt what he thought about the other sessions at the conference, and whether he was particularly excited about meeting someone in particular, or attending a specific tutorial or session…

I’m a sucker for big data volumes, so I’ll probably be joining the “Logging
Terabytes of Hits with MySQL” talk and probably also “MySQL Server Settings
Tuning”.


Right after my talk I’m hoping to have a chance to go see “A Storage Engine
for Amazon S3” as S3 is something I’ve been working with myself to test the
database partitioning stuff on that’s in Kettle. There’s actually so much
to see I’ll probably have to make some tough choices later on ;-)

Thanks much, Matt, and I look forward to meeting you at the conference!

Book your registration for the conference now and save a lot of money with the early bird registration!

MySQL Conference Speaker Spotlight: Episode 5 – Zak Greant

Despite Jeremy Cole’s stinginess in his replies to my last speaker spotlight entry ( ;) ) I’m continuing my efforts this round with Zak Greant, who will be leading a session for MySQL newbies at the conference called “MySQL Sandalcamp: A Relaxed Introduction to MySQL”. This hour-long session is aimed at beginners who want to learn the basics of creating and querying tables, setting up a user, etc.

One of the coolest blokes I’ve had the privilege of meeting, Zak is one of those personalities who leave an instant impression on you (if you meet him, check out his wedding “ring” … it’s actually a tattoo…). Anyway, nowadays, he runs a small consulting business called Foo Associates and always seems to be a busy, busy guy.

As program chair of the conference this year, I noted that there were a lot of submissions for very advanced topics or high-end performance tuning, etc. One thing that I liked about Zak’s submission was its light tone and his acknowledgment that not all users of MySQL a) want or b) care to do really advanced things with MySQL. It’s simply not in the job description (or they just don’t want to be a geek :) ). Think of IT managers, marketing folks, or sales folks who simply want to learn the MySQL terminology so that when we say “Oh, just throw a LIMIT clause on it”, they’ll know what we mean… I asked Zak what the inspiration behind his session was.

I’ve attended each MySQL Users Conference, since chairing the first one in 2003. Each year, there is a good amount of content for developers and a good amount of content for IT decision makers, but there isn’t much content for the sales and marketing staff who are attending. I thought that it might be a good idea to give these people a simple and friendly introduction to MySQL, so that they can better connect to attendees and so that they get the benefit of using a great tool for their own work.

I also heard from Arjen that Zak had been out at linux.conf.au recently, and wondered what he’d been doing there, and whether he’d attended a cool session or run into a fellow geek he hadn’t seen in a while (besides Arjen, of course).

I visited linux.conf.au on behalf of the Mozilla Foundation and was there mostly to talk to Australian geeks about Mozilla. While I was there, I caught a good and practical session from Joe “Zonker” Brockmeier on how to promote FLOSS projects and how to relate well with the technology press. (http://lca2007.linux.org.au/talk/213)

So, finally, I asked Zak if he’d taken a look at the conference schedule for the 2007 MySQL Conference and wondered if he’d identified a particular session or tutorial that he thought would be particularly interesting…

I can’t just choose one talk. I’m curious about Falcon and will try to catch the sessions on it. I’m really curious to see how proprietary company Solid Information Technology presents their solidDB storage engine.
I’d also like to see the Amazon S3 storage engine session.

Cheers, Zak! Thanks a bunch and see you soon!

Interesting GPL Development Shaping Up

Read a NewsForge article today about how the two lead developers of the GPU Gnutella client have amended the GPL to include a provision that bans the software’s use by military organizations. Specifically, the provision states:

the program and its derivative work will neither be modified or executed to harm any human being nor through inaction permit any human being to be harmed.

Interestingly, Richard Stallman, founder of the Free Software movement, doesn’t think distributors have the right to restrict the software user’s activities by restricting the software’s use in this way, though he said “Nonetheless, I don’t think the requirement is entirely vacuous, so we cannot disregard it as legally void.” It will be fascinating to see how this plays out, as it has further-ranging consequences than just this limited example. For instance, what about a clause that stated the program or its derivative work cannot be executed to spy on the privacy of citizens? … Hmmm.