
O’Gara Cloud Computing Article Off Base

Maureen O’Gara, self-described as “the most read technology reporter for the past 20 years”, has written an article about Drizzle at Rackspace for one of Sys-con’s online zines called Cloud Computing Journal, of which she is an editor.

I tried commenting on Maureen’s article on their website, but the login system is apparently borked, at least for registered users who use OpenID: the site still wants a separate user ID and login. Note to sys-con.com: OpenID is designed so that users don’t have to remember yet another login for your website.

Besides my having little patience for content-sparse websites that simply provide an online haven for dozens of Flash advertisements per web page, the article had some serious problems, not the least of which was using large chunks of my Happiness is a Warm Cloud article without citation. Very professional.

OK, to start with, let’s take this quote from the article:

Drizzle runs the risk of not being as stable as MySQL, because the Drizzle team is taking things out and putting other stuff in. Of course it may be successful in trying to create a product that’s more stable than MySQL. But creating a stable DBMS engine is something that has always taken years and years.

This is just about the most naïve explanation for whether a product will or will not be stable that I’ve ever read. If Maureen had bothered to email or call any one of the core Drizzle developers, they’d have been happy to tell her what is and is not stable about Drizzle, and why. Drizzle has not changed the underlying storage engines, so the InnoDB storage engine in Drizzle is the same plugin as available in MySQL (version 1.0.6).

The pieces of MySQL which were removed from Drizzle happen to be the parts of MySQL which have had the most stability issues — namely the additional features added to MySQL 5.0: stored procedures, views, triggers, stored functions, the INFORMATION_SCHEMA implementation, and server-side cursors and prepared statements. In addition to these removed features of MySQL, Drizzle also has no built-in Query Cache, does not support anything other than UTF-8 character sets, and has removed the MySQL replication system and binary logging — moving a rewrite of these pieces out into the plugin ecosystem.

The pieces that were added to Drizzle have mostly been done by adding plugins that provide functionality. Maureen, the reason this was done was precisely to allow for greater stability of the kernel by segregating new features and functionality into the plugin ecosystem, where they can be properly versioned and quarantined, therefore increasing kernel stability. It’s pretty much the biggest principle of Drizzle’s design…

The core developers of Drizzle (and much of the Drizzle community) would also have been happy to tell Maureen how the Drizzle team defines “stability”: when the community says Drizzle is stable — simple as that.

OK, so the next thing I took objection to is the following line:

Half of Rackspace’s customers are on MySQL so there’ll be some donkey-style nosing to get them to migrate.

I think my Rackspace colleagues might have quite a bit to say about the above. I haven’t seen any Rackers talking about mass migration from MySQL to Drizzle. As far as I have seen, the plan is to provide Drizzle as an additional service to Rackspace customers.

Rackspace evidently wants its new boys, who were not the core pillars of the MySQL engineering team, to hitch MySQL, er, Drizzle to Cassandra

MySQL != Drizzle. Implying that the two are equal does a disservice to both, as they have very different target markets and developer audiences.

The smart money is betting that even if a good number of high-volume web sites go down this route, an even higher number such as Facebook and Google will continue with relational databases, primarily MySQL.

Again, probably best to do your homework on this one, too. Facebook runs an amalgamation of a custom MySQL version and storage engines, distributed key-value stores, and Memcached servers. I would think that Facebook moving to Drizzle would be one tough migration. Thousands (tens of thousands?) of MySQL servers all running custom software and integrated into their caching layers is a huge barrier to entry, and not one I would expect a large site like Facebook to casually undertake. But, the same could be said about a move to SQL Server or Oracle, for that matter, and has little to do with Drizzle.

Google is moving away from using MySQL entirely. Mark Callaghan, previously at Google, has moved over to Facebook (possibly because of this trend at Google to get rid of MySQL), and Anthony Curtis, formerly of MySQL, then Google, left Google partially because of this reason.

OK, so the next quote got me really fired up because it demonstrates a complete lack of understanding (maybe not Maureen’s, but the unnamed source it’s from at least):

Somebody – sorry we forget who exactly – claimed that as GPL 2 code Drizzle “severely limits revenue opportunities. For Rackspace, the opportunity to have some key Drizzle developers on its payrolls basically comes down to a promotional benefit, trying to position Rackspace as particularly Drizzle-savvy in the eyes of the community and currying favor for its seemingly generous contributions. What’s unclear is whether they may develop some Drizzle-related functionality that they will then not release as open source and just rent out to Rackspace hosting customers…that would be a way for them to differentiate themselves from competitors and GPLv2 would in principle allow this.”

A few points to make about the above quote.

First, name your source. I find it difficult to believe that the most-read technology writer would not write down a source. Is it the same person you deliberately left out of a quote from my Happiness article? (why did you do that, btw?).

Second, the MySQL server source code is licensed under the GPL 2, and so is Drizzle’s kernel, because it is a derivative work of the MySQL server.

Let me be clear: Developers who contribute code to Drizzle do so under the GPLv2 if that contribution is in the Drizzle kernel. If the code contribution is a plugin, the contributor is free to pick whatever license they choose.

Third, licensing has little if anything to do with revenue at all. The license is beside the point. There are two things which dictate the company’s revenue derivation from software:

  1. Copyright ownership
  2. Principles of the Company

Drizzle, Rackspace, or any company a Drizzle contributor works for, does not have the copyright ownership of the MySQL source code, from which Drizzle’s kernel is derived. Oracle does. Therefore, companies do not have any right to re-sell Drizzle (under any license) without explicit permission from Oracle. Period. Has nothing to do with the GPLv2.

That said, contributors do have the right to make money on plugins built for the Drizzle server, and Rackspace, while not having expressed any interest to yours truly in doing so, has the right like any other Drizzle contributor, to make money on plugins its contributors create for Drizzle.

It is my understanding (after actually having talked to Rackspace managers and decision makers) that Rackspace is not interested in getting into the business of selling commercial Drizzle plugins. Their core direction is to create value for their customers, and I fail to see how getting into the commercial software sales business meets that goal.

Next time, please feel free to contact me or any other Drizzle contributor to get the low-down on Drizzle-related stuff. We’ll be nice. I promise.

Describing Drizzle’s Development Process

Yesterday, I was working on a survey that Selena Deckelmann put together for open source databases. She will be presenting the results at Linux.conf.au this month.

One of the questions on the survey was this:

How would you describe your development process?

followed by these answer choices:

  • Individuals request features
  • Large/small group empowered to make decisions
  • Benevolent dictator
  • Other, please specify:____________

I thought a bit about the question and then answered the following in the “Other, please specify:” area:

Bit of a mix between all three above.

The more I think about it, the more I really do feel that Drizzle’s development process is indeed a mixture of individuals, groups, and a Benevolent dictator. And I think it works pretty well. :) Here are some of the reasons why I believe our development process is effective in enabling contributions by being a mix of the above three styles.

Who’s the Benevolent Dictator of Drizzle?

First, let me get the BDFL question out of the way. We’ve made a big deal in the Drizzle community and mailing lists that anyone and everyone is encouraged to participate in the development process — so why would I say that Drizzle has a benevolent dictator?

Well, although he would probably disagree with the title of BDFL, Brian Aker does have some dictator-like abilities with regards to the development process, and rightfully so. Brian came up with many of the concepts that Drizzle aspires to be, and Brian has more experience working on the code base than any other contributor.

After having worked closely with Brian now for 18 months or so, I can definitively say that Brian’s brain works in a very, well, interesting way. Those of us who work with him understand that sometimes his brain works so fast, his typing fingers struggle to keep up, resulting in something I call “Krowspeak”. It’s kinda funny sometimes trying to translate :)

With this wonderfully unique noodle, Brian tends to knock out large chunks of code at a time, and often he wants to push these chunks of code into our build and regression system and into trunk to see the results of his work quickly. Sometimes, this can cause other branches to get out of sync and get merge conflicts, and Brian will inform branch owners of the conflicts and work with them to resolve them.

So, regarding dictator-like development processes, I suppose we have Brian acting as the merge dictator because he’s got a lot of experience and understands best how both his code and others’ code integrate. We tried a little while back having myself and Monty Taylor be merge captains, but that distribution of merge work actually created a number of other problems and we’ve since gone back to Brian being the merge captain by himself, with Lee, Monty, and myself improving our automated build and regression system to help Brian with the repetitive work.

That said, what Brian does not do is make decisions in a dictator-like way. Decisions about the code style, reviews, features, syntax changes, etc. are made on the mailing list by consensus vote. If a consensus is not reached, generally, no change is made which would depend on the decision. Brian does not influence the direction of the software or the source code style any more than anyone else on the mailing list who expresses an opinion about an issue; and for this, I greatly respect his wisdom to seek consensus in an open and community-oriented way.

Groups Empowered to Make Decisions

I’m assuming that what Selena’s “large/small group empowered to make decisions” answer meant was what is sometimes called “Cabal Leadership” of a project. In other words, there is some group which steers the project and makes decisions about the project which affect the rest of the project’s contributors.

Drizzle has at least one such group, the Sun Microsystems Drizzle Team, which is composed of Brian, Monty Taylor, Lee Bieber, Eric Day, Stewart Smith, and myself. One might call us the core committers for Drizzle.

However, while the Sun Drizzle team certainly is empowered to guide development, it is no different than any other group of developers that choose to contribute to Drizzle. There isn’t a “what the Sun Drizzle team decides” rule in effect. Our “power” in the development process is no greater or less than any other group of contributors. We act merely as a team of individuals who work on the Drizzle code and advocate for the project’s goals.

Individuals Empowered to Make Decisions

One thing I’ve been impressed with in the past 18 months is how the Drizzle community has embraced the opinions and work of individual contributors. I believe Toru Maesaka, Andrew Hutchings, Diego Medina and Padraig O’Sullivan were among the first individuals to begin actively contributing to Drizzle. Since then, dozens of others have joined the developer and advocate community, with each individual carving out a piece of the source code or community activities that they want to work on.

I have learned much from all these individuals over the last year or so, and I’ve tried my best to share knowledge and encourage others to do the same. Our IRC channel and mailing list are active places of discussion. Our code reviews are always completely open to the public for comments and discussed transparently on Launchpad, and this code review process has been a great mixing bowl of opinion, discussion, learning and debate. I love it.

More and more we have developers showing up and taking ownership of a bug, a blueprint, or just a part of the code that interests them. And nobody stands in their way and says “Oh, no, you shouldn’t work on that because <insert another contributor’s name> owns that code.” Instead, what you will more likely see on the lists or on IRC is a response like “hey, that’s awesome! be sure to chat with <insert another contributor’s name>. They are interested in that code, too, and you should share ideas!” This is incredibly refreshing to see.

In short, the Drizzle developer process is a nice mix of empowered individuals and groups, and a dash of dictatorship just to keep things moving efficiently. It’s open, transparent, and fun to work on Drizzle. Come join us :)

A Laptop for Developers without paying The Windows Tax

I find it amazing that the U.S. Department of Justice can continue to cover its eyes and ears while Microsoft is allowed to exert its monopolistic power over all hardware manufacturers.

About 20 months ago, I was able to purchase a Lenovo Thinkpad T61 from the lenovo.com website without an operating system installed. Today, I went to purchase a new Lenovo Thinkpad laptop, again without having to pay the Windows Tax. Turns out Lenovo has stopped offering this option. What a complete PILE OF SHIT. Somebody in Microsoft’s “Business Development” or “Partners” team must have told Lenovo to stop offering its customers a simple choice of not having to pay the OEM license fees for Windows. And there’s nothing anyone can do about it. Microsoft is just too big and too pervasive for anybody to have a damn effect on them.

Frankly, it’s anti-choice, anti-competition, anti-innovation behaviour from Microsoft.

And it’s ridiculous.

Does anyone out there know how to get a decent laptop any more without having to fork over my money to a software giant that continues to bully all competition out of the market? Your suggestions are most welcome.

P.S. Mac is not an option for me. Sorry.

P.P.S. The only thing this post has to do with MySQL is the general discussion on the acquisition of Sun by Oracle, and the pending investigation into possible monopoly concerns by the EC…but of course I can’t comment on that directly…grr.

UPDATE:

Seems DELL offers laptops with Ubuntu installed instead of Windows, at least according to search results from their website. Yeah! \o/ Of course, now I have to just figure out how to get to that customization option. When I’ve gone through the customization screens, no option other than Windows is available. :(

UPDATE 2:

The DELL representative on their online chat program was quite helpful and offered this link to laptops they offer with no Windows Tax.

If Ever There Was a Sure-fire Tenant…

My wife and I have a double that we rent out to two couples. Luckily, one of these couples has been in one side of the double for a couple years now. They are quite stable, and are excellent renters, at least as much as a landlord likes. Stability == good for landlords.

I was reviewing some code today, and a thought crossed my mind that sparked my landlord brain. I was staring at the copyright and license header in a Drizzle plugin and it struck me…

If ever there was a steady, rock-solid tenant, I would guess that it would be the occupant of this address:

51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Now, I don’t know if the Free Software Foundation owns the address above, or if they rent. But, it occurred to me that if they do rent the fifth floor of that building, that the owner of that building must have one of the most trustworthy and reliable tenants ever.

I mean, we all know how much of a pain in the ass it is to move households. You’ve got to notify everyone about the new address, find friends who will help move furniture for the price of a six-pack, etc. But think of the giant problem the FSF would have if it ever decided to move. Think of the tens (hundreds?) of thousands of source code files which would suddenly have an erroneous address. I wonder if the owner of that Franklin Street building has thought of this, and has smiled, knowing just how much of a pain changing addresses would be for the FSF. Hmmm, food for thought. :)

Removing Barriers to the Community – MySQL Moves to Bazaar

I’ve been at MySQL for two and a half years now as a community manager. Through this time, I’ve learned a number of valuable lessons about what it means to be a community manager, what it means to belong to a community of technologists, and what it means to be an open source advocate. I think back now on the attitude and preconceptions I brought with me when I joined MySQL, and reflect about the changes in my own attitude that have happened.

One of the biggest “aha” moments I have had in the past three years is the following:

It is the role of a community manager to remove the barriers — both technical and ideological — between the user/developer community and the company or group of individuals which produces the open source software

Some barriers are small. Sometimes these barriers can be overcome by a simple email to an annoyed community member who has misunderstood a poorly communicated company objective.

And, sometimes barriers are large, and require a concerted effort of many people to overcome. One of these large barriers — the usage of the closed-source BitKeeper source control system — was recently removed and is a shining example of what can happen when many advocates within a company come together for the benefit of the community as a whole.


Yesterday, Kaj Arnö announced that MySQL has officially moved away from BitKeeper and has shifted development to use the free and open source Bazaar revision control system. This is an immensely important move for the MySQL community and the MySQL engineering team and its importance cannot be overstated. While BitKeeper is technically a fast and full-featured revision control system, there was a significant barrier to the community that it enforced: the free BK client was essentially a read-only client, without even basic abilities to do diffs on a source code branch. This inhibited contributions from the external community by pipelining contributions into closed BK branches with no commit access for external contributors.

With the move to MySQL hosting its source code on the Launchpad.net service in Bazaar trees, we have removed this barrier to open source development. On Launchpad, there is now an overarching mysql super-project which contains all MySQL-related projects. Under this super-project, there are two projects which contain branches of the MySQL Server. One project, “mysql-server”, contains the official MySQL server code branches. The other, “sakila-server”, contains branches of the server codebase that are being worked on by external community members.

Why did we set up two separate projects, one called “mysql-server” and one called “sakila-server”? There are a number of reasons.

Showcase Community Work

First, and perhaps most importantly, the “mysql-server” project contains all official branches and MySQL engineering team trees. Consequently, the code area of Launchpad for this project is beginning to fill up with lots of different branches, and it is a little difficult to differentiate which branches are written by community members. We wanted a separate area where community members could showcase their work, and not have their work “swamped” by all the different team trees.

Use of Launchpad’s Bugs and Blueprints Services

Secondly, one of the issues we ran into was how community-written projects would track roadmap tasks or bugs in their distribution. MySQL has its own Bug Tracking system and its own Worklog system which it uses to track “roadmap” type tasks. Unfortunately, there is no way for community-developed projects to use these systems, as they are heavily customized for MySQL’s internal procedures. Furthermore, Launchpad.net already offers services for tracking bugs and assigning roadmap tasks to projects. So, with a separate sub-project (sakila-server), community developers now have built-in bug and blueprint (roadmap) services ready to use for their code branches.

Full Community Control of the Project

Finally, within the “mysql-server” sub-project, community members wouldn’t have the rights to “brand” their projects the way they wanted to. As MySQL is a trademarked and copyrighted product, community members would be restricted in how they could administer and brand their own branches. Having their branches under a separate sakila-server sub-project, those restrictions go away and they have full administrator rights over everything, including branding…

I think this move is fantastic, and we welcome your input and feedback about ways we can continue to open up, be transparent, and work better with our community. We’re all ears. Let a hundred dolphins swim.

Please be sure to check out both server code project areas and investigate the various branches that community members are working on!

It’s About the Product, Silly

Today there was a flurry of blog posts, starting with Charles Babcock’s interview of Jonathan Schwartz about Sun’s strategy of targeting Web 2.0 developers. This brought to light an interesting topic about open source development communities, the perceived insularity of Sun towards the external OpenSolaris developer community, and why Linux will apparently always be more popular and technically stronger than OpenSolaris.

The initial interview led Amanda McPherson of the Linux Foundation to take issue, and long comments on those posts from supporters and objectors shed light on a rift between OpenSolaris insiders, Linux community developers, and Sun’s overall marketing approach around open source.

Amid various snits about Sun’s “fluffy marketing” practices[1] towards OpenSolaris, there was a defection of a high-level OpenSolaris developer, Roy Fielding. The defection ostensibly was due to Roy’s frustration that Sun was not living up to its promises of a truly open development model and community, a frustration that Stephen O’Grady finds uninspiring.

Unfortunately, these conversations are really just a sideshow of personal and anti-corporate rhetoric that misses the underlying truth about open source (and indeed closed source) products/projects: it’s not about the marketing, it’s about the product, silly.

I’m sure my colleagues of the marketing persuasion may take issue with that statement, so let me elaborate why I believe this to be true, and why I think that OpenSolaris has a chance of becoming a true competitor to Linux in the future — and that chance has absolutely nothing to do with Sun’s OpenSolaris marketing team (sorry, folks.).

It’s All About the Product (Quality)

As much as developer communities[2] like to think of themselves as the purveyors of a grand vision of truth towards the user community, all developer communities are dwarfed by their corresponding user communities. Members of a user community are unlikely to be aware of discussions within the developer community nor care about the debates within that community.

User communities use the product. It is the product which counts, and nothing more. And the user community will continue to use a product as long as other competing products are inferior in three basic ways:

  • Ease of use
  • Performance
  • Stability/reliability

Before you say “whoa, doesn’t licensing count, too?!”: sure, it does, but I group licensing in with “Ease of Use”. For some groups, a BSD-style license is “easier to use” because of a perceived viral effect of the GPL on their own code (especially for embedded products). For other groups, the GPL-style licenses are easier to use because they provide certain safeguards and benefits for their own products or environments.

So, my point is that it is not the marketing of a product that counts — indeed, false marketing statements or tactics can backfire quickly. What counts is the product itself. Marketing should be the function of promoting the innovative advantages of a product in the above three categories.

Likewise, developer communities should be about improving a product in these three ways. And, NO, “features” is not a differentiation that the user community (or customers) use in picking one competing product over another. Features that deliver better ease of use, faster performance, or more stability are what can drive differentiation. Features that do none of the above are a waste of space.

Why Open Development Models Improve Product Quality

An open development model and community is simply a method of attaining better product quality. Sure, happy developer communities translate to happy user communities — happy developers make better code and better code is enjoyed by users. But simply having a “community”, as the OpenSolaris marketing team would have you believe, does not magically make a good product. A community can take many shapes. It is my belief that a more open development community does make a better product. Why?

Increased Modularization of the Core Product

In my estimation, there are many reasons for open communities producing better products than closed communities. First and foremost amongst these reasons is the first evolutionary process which occurs when development of a product is opened up in any way: increased modularization and pluggability of the core product.

The vast majority of open source projects which have a truly open development community exhibit much higher architectural modularization than projects with closed development communities. Apache, Eclipse, Linux, PHP — these are all projects with open development communities and they each exhibit extremely high levels of modularity. Why? Because modularization enables development to continue/begin on a specific feature or part of the codebase without affecting or hindering the development of the core of the product. This enables all sorts of things from easier versioning and release practices to an increase in external development of needed features.

Contrast the architecture of these projects with the architecture of a closed development project such as MySQL. The architectural differences are sometimes stunning, especially in early versions of MySQL. When I wrote the System Internals chapter in Pro MySQL in 2005, I wrote:

MySQL’s architecture consists of a web of interrelated function sets, which work together to fulfill the various needs of the database server. A number of authors have implied that these function sets are indeed components, or entirely encapsulated packages; however, there is little evidence in the source code that this is the case.

I wrote that chapter against the MySQL 5.0.2 code base. Admittedly, lots of improvements have made it into MySQL since those days, but much of the same interweaving of subsystems still exists, and complicates the code base substantially. I believe that if an open development model had been instituted at MySQL years ago, we would have a very different and much more modular core database kernel than we have today. Heck, the embedded library might even work correctly…

Modularization Leads Directly To Innovation

So, besides enabling cleaner release management and increased development of external features, why is a modular product better than one that isn’t? This all goes back to the “Big Three Basics”: Ease of use, performance, and stability. Let’s tackle the last one first.

Increased Stability

Why would modularity lead to an increase in stability? This can be answered quite easily. When a product’s architecture undergoes the refactoring needed to modularize, two things happen. First, a “core kernel” or runtime begins to take shape. This kernel takes shape because developers in an open development community want to be able to develop against a firm set of application programming interfaces (APIs). Programming against an API enables a developer of an extending feature to form a contract with other developers in the community: “I will write my feature to connect to other stuff in a predefined and contractual way”.

Secondly, this method of programming against an API allows the developer to develop her feature in isolation from ongoing development in other areas. Developers working in other areas can, in turn, count on her code not interfering with their own. In addition, work done on one feature or module can be released irrespective of the hangups or problems occurring in other parts of the code.

So why does this lead to greater product stability and reliability? Because of the need for APIs (so that other developers can code modules outside of the interference of other modules) a set of APIs begins to emerge that outline the “black box” of the core product. This standardization process of creating APIs for connecting to the core kernel acts as a stabilizing force for the product in general. The stability of the APIs, and their consistency in implementation, can be directly correlated to the stability of the underlying product as a whole. The reason for the correlation is that when changes to an API are more difficult to effect (because of their corresponding effects on other code), stability increases due to fewer changes in the way that components interact with each other.
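To make the “contract” idea a bit more concrete, here is a minimal sketch of a kernel that only ever talks to its modules through a fixed interface. This is purely illustrative and is not MySQL or Drizzle code: the names StorageEngine, Kernel and MemoryEngine, and the put/get methods, are all hypothetical and invented for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageEngine(ABC):
    """Hypothetical API contract: the only surface the kernel relies on."""

    @abstractmethod
    def put(self, key: str, value: str) -> None:
        """Store a value under a key."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        """Return the stored value, or None if the key is unknown."""


class Kernel:
    """Toy core kernel that knows nothing about engine internals."""

    def __init__(self) -> None:
        self._engines: Dict[str, StorageEngine] = {}

    def register(self, name: str, engine: StorageEngine) -> None:
        # Refuse anything that does not honor the contract.
        if not isinstance(engine, StorageEngine):
            raise TypeError(f"{name} does not implement the StorageEngine API")
        self._engines[name] = engine

    def store(self, engine_name: str, key: str, value: str) -> None:
        self._engines[engine_name].put(key, value)

    def fetch(self, engine_name: str, key: str) -> Optional[str]:
        return self._engines[engine_name].get(key)


class MemoryEngine(StorageEngine):
    """One possible module, developed in isolation from the kernel."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


kernel = Kernel()
kernel.register("memory", MemoryEngine())
kernel.store("memory", "answer", "42")
```

As long as put and get keep their signatures, the author of MemoryEngine can rewrite everything behind them without the kernel (or any other module) noticing. Changing the interface itself, on the other hand, touches every engine at once, which is exactly the friction that tends to keep an API stable.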

Case in point: the “pluggable” storage engine API in MySQL was introduced in early MySQL 5.0 versions. This API has changed numerous times over the past two years as different vendors (internal and external to MySQL) needed or wanted additional functionality. The API has started to stabilize now, but I see additional changes in the future. What has happened to the stability of the underlying code base of MySQL? I will leave that answer to the reader’s intuition.

Increased Performance

The reason I believe modularization leads to better performing software is because the act of modularizing identifies pieces of the software which can and should be “pulled from the core”. Pieces not central to the functioning of the core kernel are pulled out of the core product and placed, rightfully, into modules which provide additional, value-added-for-a-certain-group-which-needs-it features. By placing extra features outside of the realm of the “necessary”, the core kernel begins to take the shape of a lithe marathon runner and not the bloated monolith it once was.

You may wonder why this removal of extra features can lead to better performance of the product as a whole. By removing non-essential parts of the software into modules, the code for the core part of the software is simplified and shortened. Now, simple, shorter code isn’t necessarily faster just for being shorter (though often it can be). However, the ease of maintenance on a simplified core software kernel allows developers to focus on performance-related tasks involving that simplified kernel. Improving performance of essential runtime functionality is much easier and faster when the code isn’t littered with calls and code paths that correspond to the functionality that belongs in an external module. If performance issues in the core kernel are easier and faster to fix, then the product as a whole becomes better performing.

Secondly, the modularization of a code base means that users who do not need an array of features can use a smaller, streamlined product that fits only their needs. Often, when users can make use of a streamlined product that doesn’t contain code they don’t need or want, the resulting binary is smaller and faster.

Increased Ease of Use

How modularization increases ease of use is related to how it increases performance: by enabling developers to focus only on the module at hand, or the core itself, instead of a mix of both, the developer is freed to focus on fixing ease of use bugs and addressing usability issues. The more time a developer can spend actually doing the things that comprise her outstanding bug and feature list, and not on “how will my code affect ten other developers’ code”, the better the chances of a product’s usability becoming better.

Finally, Increased Competition Fosters Innovation

I think it’s fairly obvious that increases in stability, performance and ease of use translate into increases in innovation for the product as a whole.

My final thought on why modularization leads to increased innovation is that modularizing a product leads to competition in the developer marketplace for a specific feature. This competition spurs the developers of a competing module to do it better and faster. Case in point: the pluggable storage engine API “modularization” effort at MySQL spurred the creation of numerous competitive engines — from Paul McCullagh’s PBXT engine to SolidDB’s transactional engine to the new Falcon and Maria engines. Each engine demonstrates different characteristics, benefits and disadvantages. Without each other, the wellspring of innovation would be much drier.

Innovation Leads to Market Share

So, assuming you’ve come with me in my theory that open development communities foster modularization of a product, and that this modularization leads to more innovation for the product as a whole, then I think you will be able to make this final step easily: innovation in a product leads to greater market share.

It doesn’t matter whether the product is the result of a group of loosely-affiliated individuals or a company like Sun or MySQL; each producer of a product wants their product to have the most market share compared to competing products. It’s human nature; we just want to be popular!

Jonathan Schwartz is Right About Community, But Wrong About Why He’s Right

So, in Babcock’s interview, Jonathan Schwartz says,

Everything begins with the development of a community

Jonathan is right about everything beginning with the development of a community. However, I believe Jonathan, and a number of folks I’ve met personally at Sun, feel that simply having a community automatically makes a good and popular product. This simply isn’t true. It’s about the product, first. Then the community. A vibrant community (both developer and user) springs from the fountain of enthusiasm about a product. If that enthusiasm wanes, because of anything from the closing of the development model to the inability to get their voices heard regarding a product roadmap or architecture, then the community begins to die along with the product. It is a delicate thing, this product and community balance sheet.

This well is poisoned; the company has consumed its own future
and any pretense that the projects will ever govern themselves
(as opposed to being governed by whatever pointy-haired boss
is hiding behind the scenes) is now a joke. Sun should move on,
dissolve the charter that it currently ignores, and adopt the
governing style of MySQL. That company doesn’t pretend to let
their community participate in decisions, and yet they still
manage to satisfy most of their users. Let everyone else go
back to writing code/documentation for hire.



There’s nothing particularly wrong with that choice — it is
a perfectly valid open source model for corporations that
don’t need active community participation. IMO, the resulting
code tends to suck a lot more than community-driven projects,
but it is still open source.

The above is a quote from Roy Fielding’s email to the OpenSolaris developers’ mailing list in which, at the end, he resigns from the OpenSolaris community. I think it is telling about how delicate the balance between product and community really is. He points out that the broken promises about a truly open development model were the deciding factor in his resignation. I suspect that frustration at not having a voice in the architectural decisions surrounding OpenSolaris was a big factor, too. Pointedly, he states that MySQL has never pretended to let our community participate in decisions, but MySQL still enjoys a large and vibrant user community.

I theorize that the MySQL user community is in danger of becoming fragmented and is increasingly fragile because of the “we decide” nature of MySQL’s closed development model. I believe that MySQL achieved ubiquity and a huge user community because of this:

  1. In the beginning, it “just worked”, was fast, reliable, and easy to use.
  2. It became ubiquitous in both language packaging and Linux distributions very early on and only because of #1 above

In other words, it was about the product itself.

If the product loses touch with its roots and, because of a closed development model, moves further away from being easy to use, fast, and stable, then the user community, and with it MySQL’s ubiquity, will move away to competitors.

Again, an open development community is a method to achieve greater innovation in a product. I believe it is silly for either MySQL or Sun to take any steps that do not open up its development activity to the wider community. If openness spurs innovation of the product, closing off of development only means that the company is stifling innovation and decreasing its own market share. If a method is there to increase innovation, why disregard it?

Why OpenSolaris Could Be a Real Competitor to Linux

It’s about Innovation in the Performance, Stability and Ease of Use of a product. As Ian Murdock wrote on the OpenSolaris blog:

My basic observation, as someone
who came into the OpenSolaris community from the outside – even perhaps
from the competition – and who represents the target market this
community needs to reach was this: That the packaging and presentation
of OpenSolaris as it stands today represents a barrier to adoption and,
thus, an obstacle to growing the OpenSolaris community and bringing in
new users. To lower these barriers, OpenSolaris needs to be more than
just the code base. It needs to be a binary that users can easily
download and install to get easy access to OpenSolaris technology. Put
another way, as I said in a blog post in June, we need to have a
better answer to the question, “Where do I download OpenSolaris?”

Ian understands that a product gets adoption when innovations in ease of use, performance, and stability are demonstrated. In the above quote, he talks about the problems in ease of use that are barriers to adoption of OpenSolaris.

Similarly, there are a number of performance-related and stability-related innovations that are contained within OpenSolaris. But the product will not achieve true competitive status with Linux until the development community is opened up and innovation from the external community — in the form of a voice and control over architectural issues and design — is truly embraced at Sun.

If Sun embraces an open development model and embraces its developer community in discussions on roadmap and architectural decisions, I think OpenSolaris has an excellent chance of competing head-to-head with Linux on the innovation front. If the product is more innovative on a variety of levels, it will gain market share.

What This Means for MySQL

As Brian Aker rightly points out, MySQL has never had an open development model. But steps have been taken to open it up. Fielding’s resignation suggests that it was not the lack of openness which frustrated him, but Sun’s unkept promises of openness. This means that MySQL must be very careful not to over-promise on something it cannot or will not deliver regarding openness.

In addition, if we are to reverse the current course of a community-in-flux, we must embrace the fear we have of opening up our development process. We must get out of the cathedral and put up shop in the bazaar. We won’t be jumping out of the cathedral’s top window; more likely we’ll rappel down Rapunzel’s hair slowly and carefully. I see very good prospects for opening up in the future and I am excited. It will be happening in the nick of time.


Take everything I write with a grain of salt. After all, I’m just a lowly community relations manager.

Footnotes

[1] Amanda McPherson writes:

We may not have fancy Linux analyst days and mountains of spin, but it is all about the development community. Literally.

to which Mike Dolan responded in comments:

Yes, yes, and yes. This Sun nonsense needs to be called out; it’s great to see someone else actually looking into the critical details behind Sun’s fluffy marketing.

[2] I differentiate the “developer community”, meaning the developers who develop the product, from the “user community”, which a) develops products on top of or for the product or b) simply uses the product.

Just Chill…Chilll Out, OK? There Ain’t No Devil in PDOv2

A number of people have emailed me wondering why I haven’t blogged about the Sun/MySQL deal. Well, I’m still working out my thoughts on that, so I’ll leave it to another day. Besides, haven’t there been enough blog posts about it already?! ;)

As a PHP community member and a person who has been participating on MySQL’s behalf in the much-maligned PDOv2 working group, there is a more important and pressing topic of conversation that I’d like to comment on. Namely, the recent events surrounding the publication of the FAQ about PDOv2. There are many different topics being bandied around the PHP community schoolyard — some on-topic, some wildly off-topic and tangential. These are the issues I think represent what the majority of conversations have been about:

  • WTF? Who is this private, clandestine, devil-worshipping PDOv2 working group anyway and why wasn’t the PHP community involved in their discussions?!
  • WTF? What is this proposed Contributor License Agreement and why do we have to go over this all over again?!
  • WTF? This is only going to fracture the community into two diabolically and diametrically opposed groups!
  • WTF? This is just the database vendors trying to market their own agendas and push PHP in their own directions!

A number of emotional responses to the issues have already been made, but I write here to make a plea for calm, mature discussion about the topics at hand. It serves little purpose to be dogmatic or reactionary about this stuff, and I hope in this entry to make the case for rational, open discussion from here forward.

About the Devil-Worshipping Working Group

There have been a number of comments about how the working group, composed of representatives from the database vendors, Zend and developers of PDOv1, has met in this clandestine, non-public way, and how this secretive meeting is against the principles of the PHP community, and worse, open source in general. Here is my response to these comments: Get over it.

As developers and users of PHP, we live in a world of both individuals and corporations. As such, both sides must recognize the different needs and desires of each group. The needs of large corporations and of individuals sometimes differ, and the ways in which those needs are met are often different.

One example of how those varying needs are different is the process by which legal discussions can take place. Large corporations, and their legal teams, need a different venue in which to discuss their common legal concerns — a venue which in large part must be free of the distraction of long public mailing lists. Often, the legal issues discussed in such venues are private and confidential to the companies. For instance, lawyers from various companies must be free to discuss common concerns that affect them and not the wider public audience before their stance on an issue is made public.

The discussions of the working group members up until this point have been of this nature — discussions of common concerns and agreements regarding legal ramifications of their contributions to the PDO project. It made no sense to open up these discussions on a wider mailing list. Now that such discussions are over, the working group has put together an FAQ which attempts to explain the group’s collective thoughts. The working group as it is now will cease to be the only ones discussing PDOv2 and will be just another voice in the PHP community as we address the issues surrounding CLAs, licenses, and such.

So, in short, if you are angry about not being included in the working group discussions to date: get over it and join the ongoing discussion in a useful way.

About the Need for a New PDO

Although Pierre and others have issued an emotional, albeit non-substantive, plea to ignore PDOv2, there are numerous reasons why I believe development of PDOv2 is needed, and why open discussions about the issues should proceed in a mature way.

Lack of Contributions to PDOv1

Although some in the community seem to think that community members outside the database vendors would do a better job of maintaining and enhancing PDO, the evidence does not support this claim. I call out Pierre’s comment on Antony’s blog which says:

However you are definitively right, PDO needs love but will it be enough to convince their original authors to bring it at another level? to something we really need? :-)

In a perfect world, there would be many contributors in the PHP development community who would be both willing and able to “show PDO some love”. But in the real world, such people have not come out of the woodwork. In fact, the developers who understand the various database vendor APIs work for the database vendors. So, if we want an improved, standardized PDO that is test-covered, the most pragmatic solution is to find a way to have employees at the database vendors able to contribute. We must begin to live in the real world, and not in a fantasy land.

PHP community members should be excited about the database vendors’ desire to contribute to an improved PDO, not lambasting them. Another quote from Pierre:

The biggest mistake in PDO (and the biggest mistake is being repeated again by the same persons and some newcomers) was to think they know better. They have a good understandings of DBs but sadly not of what many of us were actually looking for or about our needs.

If we could return to reality and get past strict ideology, I think you will find that the database vendors are begging for the chance to have the PHP community tell us precisely what it wants. Sure, we must get over this initial hurdle regarding the details of if, when, and where a CLA should be used, but once that is past us, we, the database vendors, are eagerly anticipating the ensuing enhancements to be made to both the PDO core and our own drivers!

About the Need for Standards in PDOv2

Another reason we must press on and solve the philosophical and legal issues at hand is the burning need for database vendors to work with the PHP community on standardizing the currently messy way in which each vendor driver exposes, transfers, and writes data to and from its database. If we continue down the path of ideological crusade, we simply prolong the standardization work.

About the Need for a Metadata Interface

Let’s face it. Compared to JDBC/ODBC, the ability of PHP developers to retrieve database metadata — be it schematic, columnar, or tabular — is really poor. The reasons for this are mostly to do with the underlying drivers for each vendor. If the database vendors are locked out of participating in the development of PDOv2, expect to see little change in this regard. We need the participation of all parties to resolve differences and hammer out the kinks in the underlying libraries.
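To show what I mean by tabular, columnar, and result-set metadata, here is a small self-contained sketch. It uses Python’s built-in sqlite3 module rather than PHP and PDO purely because it runs anywhere without a server; the videos table is made up for the example, and none of this is a proposal for what a PDOv2 metadata API should look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, views INTEGER)"
)

# Tabular metadata: which tables exist? (sqlite_master is SQLite-specific.)
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Columnar metadata: name, declared type, and NOT NULL flag per column.
# PRAGMA table_info is also SQLite-specific; each row is
# (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, col_type, bool(notnull))
    for _cid, name, col_type, notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(videos)"
    )
]

# Result-set metadata: the DB-API exposes column names on the cursor.
cursor = conn.execute("SELECT id, title FROM videos")
result_columns = [d[0] for d in cursor.description]

print(tables)
print(columns)
print(result_columns)
```

Note that two of the three lookups lean on SQLite-only mechanisms. Every other database has its own equivalent, and papering over those differences is precisely the work that needs the people who know each vendor’s driver.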

It Ain’t About Open Source, Silly

As ideologues tend to do, some people stretch valid concerns beyond their limits. The current concern that the Contributor License Agreement will abolish open source within PHP is a perfect example of this. Richard Thomas writes on Mike Willbanks’ blog:

who cares if a CLA is worthless if it makes vendors feel warm inside and gets them to contribute?

to which Larry Garfield responded:

I care, because with a CLA it’s not Free Software anymore. I’m not allowed to look at the code unless I sign an extra agreement that I won’t do… something. Sorry, not Free Software. I’m not interested.

This simply is wrong and misses the concept of a contributor license agreement. Luckily, Mike responded with a poignant reminder to not stretch the boundaries of an argument beyond their natural conclusions:

To clarify a few things. A CLA does not mean that it would be closed source and you would have to sign a CLA before hand. Take a look at the Zend Framework for instance. You have to sign a CLA before you contribute but that certainly doesn’t make it so you can not view the source code.



Further, if the source code was not available then how would we build from the source? I think you are looking at this from the wrong viewpoint without actually reading into it fully.

Lukas Smith responded with some more common sense:

Indeed, the concern with CLA is mostly about the ability to openly participate and not about being able to read the source.

In Summary

Let us, the PHP community, the database vendors, the core developers of PHP, stay on track. The potential of PDOv2 is worth more than ideological bickering: it benefits the entire substrate of the PHP user community that we improve and enhance PDO. Let us march forward towards that goal. Yes, there are issues to be resolved — perhaps most importantly, the code and spec boundaries to which a CLA will bind — but the future is bright for data access through PDO. Let us not darken the skies with rhetoric and ideology. Let us come together and make something that is greater than the sum of its parts.

YouTube Scaling Video – Cuong Do


Alex forwarded me a link to an awesome scaling presentation by Cuong Do from YouTube about their architecture for scaling. He describes problems with massive scale-out, moving static content serving from Apache to lighttpd, how they use Python (including ways they speed it up) for the main application, certain custom C extensions for encryption, and their long road to MySQL partitioning.

A very interesting part of the presentation is Cuong’s discussion of their difficulties dealing with thumbnail images for the videos. Each video has 4 thumbnails attached to it. They went through a whole series of issues in trying to scale the thumbnail serving, including hitting the ext3 files per directory limit one fateful day…

The section on scaling MySQL is a must see for anyone working on a Web 2.0 platform that can possibly see an enormous increase in growth. I’ve seen many presentations that describe similar problems with large-scale sites running MySQL and YouTube/Google have got the formula right: using sharding (partitioning) with replicas in each shard. There is a very interesting question at the end from an attendee about how YouTube decides which users go into which shard of the cluster. Probably not as simple as what you might expect.
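For anyone wondering why the “which user goes in which shard” question is harder than it sounds, here is a rough sketch in Python of the two simplest placement schemes I can think of. To be clear, this is not YouTube’s scheme and I am not reproducing the answer from the talk; the function and class names (shard_by_hash, ShardDirectory) and the shard counts are invented for the example.

```python
import hashlib


def shard_by_hash(user_id, num_shards):
    """Naive placement: hash the user id, then take it modulo the shard count."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardDirectory:
    """Explicit placement: a lookup table records where each user lives."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self._placement = {}

    def shard_for(self, user_id):
        # New users get a hash-based default; existing users never move
        # unless someone moves them on purpose.
        if user_id not in self._placement:
            self._placement[user_id] = shard_by_hash(user_id, self.num_shards)
        return self._placement[user_id]

    def move(self, user_id, shard):
        self._placement[user_id] = shard


# How many users land on a different shard if we grow from 4 to 5 shards
# under the naive scheme?
users = range(1000)
before = {u: shard_by_hash(u, 4) for u in users}
after = {u: shard_by_hash(u, 5) for u in users}
moved = sum(1 for u in users if before[u] != after[u])
print(f"{moved} of {len(before)} users change shard going from 4 to 5 shards")
```

With plain modulo hashing, adding a single shard reshuffles the large majority of users, which means moving the large majority of your data. A directory costs an extra lookup per request but lets you add capacity, or relocate one unusually busy user, without touching anyone else.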

One particularly funny segment of the presentation is when Cuong describes his decision to delete the swap partition from the master DB server (because the 2.4.x kernel was claiming RAM for the paging cache instead of giving it to MySQL…) It was a shot in the dark…

As a final word, I will note that I was out drinking in Portland last Thursday with the YouTuber (who will remain unnamed) who is asleep in the server room on the fifth slide. :)

Interview with me on HowSoftwareIsBuilt.com

A few weeks ago, I had a chance to speak with Scott Swigart about MySQL, open source, development and community challenges, and other stuff. He sent me a link to the published interview, available on HowSoftwareIsBuilt.com. It was very interesting reading the comments of some of the other interviewees, like Stormy Peters, from OpenLogic, and Patrick Hogan, from NASA.

Congrats to the PostgreSQL/EnterpriseDB/Sun Team for Benchmarks

I wanted to write a quick shout-out to congratulate the PostgreSQL development team, the folks at Sun who work with Josh Berkus, and the folks at EnterpriseDB, all of whom contributed to the excellent benchmark results for this quarter’s SPECjAppServer2004 benchmark suite. I’m looking forward to seeing Josh at OSCON in a couple weeks and meeting a few more of the PostgreSQL developers than I did last year.

I know that the PostgreSQL developer team has spent a considerable amount of time and effort removing performance bottlenecks and streamlining code for the PostgreSQL 8.2 release, and the benchmarks show the results of that hard work. It’s great to see the pressure put on Oracle and the “big guys” from open source databases. In the end, the competition created in the industry by the constant improvement delivered by open source development teams will only serve to benefit the end user with better performing, easier to use, and more reliable software.

Cheers to the PG dev team — job well done.