Lumen software decided to choose (or re-choose) PostgreSQL over MySQL. On any given day, this news would not have caused so much as a stir in my throat. There are definitely some great reasons to choose PostgreSQL over MySQL or, for that matter, any RDBMS over another. But, one thing that really bothers me is when a decision is made to chose one over another with few valid or factually correct reasons.
No piece of software fits every use case. Ask a guy who is 6 foot 6 inches tall whether “one size fits all” really does. It doesn’t. And, likewise, database management systems will never fit all needs. And I don’t think they ever should.
How Did I Hear About This?
I’m less concerned with Lumen’s final vote than with the process that led up to it: open source enables a truly competitive market where IT buyers can first be IT trialers. Lumen didn’t need to make a decision based on marketing gimmicks, a sophisticated demo, etc. It chose its database based on actual, hands-on experience.
Inside my mind, I let out a “Nice! I could not agree more.” So, encouraged by Matt’s words, I was actually looking forward to reading the Lumen blog entry, as I figured it would contain a number of the really good points for choosing PostgreSQL over MySQL for certain applications.
Contrary to what you might think with me being a MySQL employee, I actually value good, accurate information about other RDBMS, and I’m certainly not the kind of person that would recommend MySQL over another RDBMS just because I work for MySQL. In fact, I was consulted recently about a data warehousing system and recommended both MySQL and PostgreSQL for different parts of the system based on the strengths of each of those platforms in relation to the needs of the application.
But what I can’t stand is when decisions are made based on non-facts, on myths or old information, or decisions that are clearly the result of biases toward one platform over another. These types of decisions are not decisions at all, but the justification for a pre-determined outcome.
So, let’s take a closer look at the reasons that were stated in the Lumen decision.
Record locking in MySQL can only be set for the whole table
Yes, here we go again! 🙂 Another ancient myth about MySQL dating back to before 2001. Here is the full reason given by Lumens:
Record locking in MySQL can only be set for the whole table. PostgreSQL however is able to record lock at the record level. This is a crucial factor in commercial, heavy traffic transaction based applications. For example, if a user accesses a record for update, in MySQL all other users will wait until that user has updated their information.
This is flatly wrong. A simple ENGINE=InnoDB is all that is needed to obtain row-level locking. Or, of course, you could use PBXT which also has fine grained locking.
More telling of the overall decision-making process was the statement “This is a crucial factor in commercial, heavy traffic transaction based applications.” This statement is really kind of silly. Is Lumen’s team saying that all commercial applications are “heavy traffic and transaction based”? Are they saying that all transaction-based applications are commercial and heavy traffic? The terms “commercial” and “heavy traffic” are meaningless without context — what is “commercial”? what does “heavy traffic” mean to Lumen? Transaction-based, of course, is contextual, but MySQL fully supports transaction-based systems with InnoDB and PBXT.
The statement above actually tells you more about the way Lumen has designed their application than it does about one RDBMS’ capabilities versus another’s. Row-level locking is only “crucial” for applications that rely heavily on UPDATE statements. In MySQL and other RDBMS systems, applications that rely on INSERT and SELECT statements, even with thousands of INSERTs and SELECTs per second, do not need row-level locking. In fact, using AUTO_INCREMENT in MySQL instead of a SEQUENCE allows certain applications to obtain very high throughput without record-locking at all. The statement above just shows a lack of understanding of how MySQL functions and is optimized.
PostgreSQL has Schemas, Triggers and Checks and MySQL does not
Here is the full thought train from Lumen:
PostgreSQL has Schemas, Triggers and Checks and MySQL does not. These allow us to create logic at the database level which guarantees data integrity.
The above statement is 2/3 factually incorrect and 1/3 irrelevant.
While it is true that MySQL does not support check constraints, MySQL has supported triggers and the INFORMATION_SCHEMA virtual database since October 2005 in MySQL 5. So, the first sentence of the above statement is 66% wrong.
I have to assume that Lumen meant INFORMATION_SCHEMA when they say “Schema”, since really a “Schema” is just a database, which all RDBMS obviously support… Unless, of course, Lumen meant the “CREATE SCHEMA” ability that PostgreSQL added in November 2002 to support cross-database queries, which MySQL already had the ability to do…
Regardless, the INFORMATION_SCHEMA or the CREATE SCHEMA have nothing to do with the ability of a developer to “to create logic at the database level which guarantees data integrity.” So, thus, the statement is 1/3 irrelevant to the argument it proposes.
What This Statement Really Says
But, again, this statement is most interesting for what it tells the reader about the design choices that Lumen makes with regards to its application. It is clear from the statement that Lumen would like a database that allows them to put logic in the database server to control data integrity. This is a valid desire, of course, and PostgreSQL certainly has many more database-level features for this kind of application design than does MySQL — CHECK CONSTRAINTs, UDDTs (more on them later), and all kinds of fancy scriptable procedures and such.
IMO, applications that rely on this kind of functionality in the database are setting the stage for an application deployment environment that is centered around a beefy, expensive database server, and in doing so, a central point of failure for the application. There are many in the technology field, of course, that feel everything should be in the database server — for reasons of security, maintenance, consistency, etc. I am not one of those people. To me, a scalable system is one which is designed so that the reliance on a single node in the application architecture is reduced as much as possible, and that logic and state are spread amongst as many nodes as needed for throughput, with the database server simply doing what it does best: reading and writing data. But, of course, this is just my opinion, and there is nothing particularly wrong with the stance that Lumen takes. I just wish they would be factually accurate with their statement.
PostgreSQL permits the creation of User Types … as arrays, arrays of arrays, etc.
PostgreSQL permits the creation of User Types such INT, REAL, etc., as arrays, arrays of arrays, etc. It also uses it as the Type for a Database Field. Developers are able to store and work in these database objects without conversion to regular data types. This allows developers to work with objects on an application level, read and write directly to the database. This provides better performance and data integrity.
In addition to the above point from Lumen is a related point:
MySQL provides no Inheritance of tables. This allows us to use an Object Oriented Method while developing a Database Structure. This is used when creating and maintenance databases structures.
I have bolded the parts of the statements I find problematic. While it is certainly true that MySQL does not provide any array data type (though it does provide the ENUM and SET types), it is faulty to claim that having arrays as built in types “provides better performance.” I see no benchmark results claiming the truth of this fact. Furthermore, an alternate theory suggests that if your application servers handle array conversion, and not the database server, scalability will increase as the database server is not bogged down doing work that a commodity farm server can…
On to the notion of object-oriented construction of databases…
Certainly it is true that MySQL does not (and AFAIK never will) support Table Inheritance. However, I highly doubt the follow up statements “This allows us to use an Object Oriented Method while developing a Database Structure. This is used when creating and maintenance databases structures.” Using an object-oriented methodology has very little to do with the underlying schema. In fact, many efficient schemas break the logical model for performance reasons, relying on the ORM or other object-oriented model, to tie the strings together. Again, this statement says less about whether MySQL or PostgreSQL is a better database for designing object-oriented applications and more about Lumen’s personal choice in having the database the center of the application. Again, nothing wrong with this choice, but it doesn’t say that one is better than the other, just that it fits Lumen’s particular architectural choice. So, I would have much preferred they keep the mentions of “performance” and “This is used when … structures” out of the statement and just stuck with:
Certain features of PostgreSQL, such as Table Inheritance and array types, more closely matched our internal coding style and preferences for keeping code inside the database server.
There are no sequences in MySQL, only auto_increment.
The full quote:
There are no sequences in MySQL, only auto_increment. PostgreSQL supports sequences, and this is a very useful tool if you need to store Â«Master-DetailÂ» joined data. This is also useful if you need to obtain a unique sequence of numbers and guarantee they are unique.
I’m not sure I get the logic here, and probably need Lumen to explain it further, but there’s nothing that prevents a master-detail relationship from working with an AUTO_INCREMENT primary key field. In fact, I would argue that working with SEQUENCEs in PostgreSQL is much, much harder than using the AUTO_INCREMENT functionality in MySQL. As far as obtaining unique sequences, I don’t see much of a point of doing this except when working in a replicated environment, in which case MySQL has the auto_increment_increment and auto_increment_offset variables to guarantee uniqueness of auto_increment sequences from multiple servers…
Cost was also a factor.
Cost was also a factor. MySQL is not free when used commercially. PostgreSQL is under the BSD license, so it may be distributed freely.
In my opinion, this is the root reason my PostgreSQL was chosen in this case, and of course, it is a very valid reason! No arguments there, except to point out that MySQL is free to use and embed when the source code of your application is also released under the GPL. If, like Lumen, you only want to use open source software, and not release your own as open source and GPL, then yes, licensing becomes an issue.
I have no qualms at all with Lumen choosing PostgreSQL for their application development. I only wish that the blog entry in question which was titled “PostgreSQL More Functional for Commercial Open Source SaaS” would have actually been more factually correct and more honest with the fact that Lumen’s team simply had a development style match with the functionality offered by PostgreSQL.