<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>join-fu! &#187; Articles</title>
	<atom:link href="http://www.joinfu.com/category/articles/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.joinfu.com</link>
	<description>the art of sql</description>
	<lastBuildDate>Mon, 23 Jan 2012 20:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Character Sets, Collations and the Jörmungandr</title>
		<link>http://www.joinfu.com/2008/10/character-sets,-collations-and-the-jrmungandr/</link>
		<comments>http://www.joinfu.com/2008/10/character-sets,-collations-and-the-jrmungandr/#comments</comments>
		<pubDate>Thu, 02 Oct 2008 16:51:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2008/10/character-sets,-collations-and-the-jrmungandr</guid>
		<description><![CDATA[One of the (many) ongoing discussions in the Drizzle developer community is the level of support the database server kernel should provide for non-Unicode character set encodings. Actually, when I say non-Unicode, I really mean non-UTF8, since we&#8217;ve stripped out all other character sets and &#8220;standardized&#8221; on 4-byte UTF8. I&#8217;ll come back to why exactly [...]]]></description>
			<content:encoded><![CDATA[<p>
One of the (many) <a href="https://lists.launchpad.net/drizzle-discuss/msg01554.html" >ongoing discussions</a> in the <a href="http://launchpad.net/drizzle"  title="Drizzle Project">Drizzle</a> developer <a href="https://lists.launchpad.net/drizzle-discuss/"  title="Drizzle Mailing Lists">community</a> is the level of support the database server kernel should provide for non-Unicode character set encodings.  Actually, when I say non-Unicode, I really mean non-UTF8, since we&#8217;ve stripped out all other character sets and &#8220;standardized&#8221; on 4-byte UTF8.  I&#8217;ll come back to why exactly I put <em>standardized</em> in quotes in just a bit&#8230; But first, to sum up in childish terms my thoughts after spending 4 hours tonight reading about character sets and collations, here is an exchange between <a href="http://torum.net/"  title="Toru Maesaka">Toru</a> and myself on Freenode #drizzle:
</p>
<pre>
&lt;jaypipes&gt; tmaesaka: how do you write "I wish everyone would just speak English" in Japanese? <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />
&lt;tmaesaka&gt; みんな英語使うといいのに。
</pre>
<h2>A Little Background</h2>
<p>
For those of you new to the world of character sets and collations, I&#8217;ll briefly summarize the concepts and terms I&#8217;ll talk about in this article.  Incidentally, I consider myself to be in this crowd, since I&#8217;ve never needed more than a cursory knowledge of how they work in MySQL (not the internals).
</p>
<h3>Character Sets and Encodings</h3>
<p>
A character set, or character encoding scheme, is a system for matching characters (such as &#8220;A&#8221;, &#8220;み&#8221; or &#8220;ß&#8221;) with a machine-readable code for the character.  This code is simply a number, which may be written in decimal or, by convention, in hexadecimal notation.  The &#8220;encoding&#8221; of the character set is the protocol, or set of instructions, that the character set uses so that the computer can read a sequence of bytes and interpret it as a specific character.
</p>
<p>
<img src="http://upload.wikimedia.org/wikipedia/commons/8/85/ASCII_Code_Chart-Quick_ref_card.jpg" width="400" height="300" style="float: right; margin: 0px 0px 10px 10px;" /><br />
Character sets such as <a href="http://en.wikipedia.org/wiki/ASCII"  title="ASCII">ASCII</a> are very simple: each character is stored in a single 8-bit byte, with only 7 bits actually used for the character code (128 possible codes).  ASCII consists of the English-language alphabetic characters, including their capitalized forms, the digits 0 through 9, a variety of punctuation and &#8220;common&#8221; symbols like &#8220;$&#8221; and &#8220;;&#8221;, and a series of non-printable &#8220;control&#8221; characters.  This encoding scheme works wonderfully for those of us in the U.S., but it is utterly lacking when it comes to representing the myriad characters and symbols in other languages.
</p>
<p>
Other, more complex character encodings are localized for a specific language or writing system.  For instance, the <a href="http://en.wikipedia.org/wiki/Shift-JIS"  title="Shift_JIS Character Encoding Scheme">Shift_JIS</a> character encoding scheme encodes the ASCII character set (with 2 exceptions) and the &#8220;half-width Katakana&#8221; characters in a single byte each, and the <a href="http://en.wikipedia.org/wiki/JIS_X_0208"  title="JIS X 0208">JIS X 0208</a> set of <em>kanji</em> symbols in 2 bytes each.  Sound complicated?  It is.  And it gets even more complicated the further down the <a href="http://en.wikipedia.org/wiki/Rabbit_hole" >rabbit-hole</a> one goes.
</p>
<p>
Which leads me to Unicode&#8230;
</p>
<h3>What the Heck is Unicode and UTF?</h3>
<p>
Many folks think that <a href="http://en.wikipedia.org/wiki/Unicode"  title="Unicode">Unicode</a> is merely another character set or encoding scheme.  It&#8217;s not.  It&#8217;s actually more than that.  It&#8217;s an entire system which endeavours to standardize the way that computers can read, sort, and transform characters encoded in various character sets.
</p>
<p>
Actually, according to Wikipedia, the Unicode standard
</p>
<blockquote><p>
&#8230;consists of a repertoire of more than 100,000 characters, a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering and bidirectional display order&#8230;
</p></blockquote>
<p>
Got all that?
</p>
<p>
So, Unicode is a set of standards for dealing with lots of varying languages and characters, and <em>transcoding</em> character codes from one encoding scheme to another.  What, then, is UTF[8|16|32]?
</p>
<p>
UTF stands for <em>Unicode Transformation Format</em>, and is a set of mapping methods for translating each of Unicode&#8217;s 1,114,112 <em>code points</em> (the numeric values, from U+0000 to U+10FFFF, that Unicode assigns to characters) into a sequence of bytes.  <a href="http://en.wikipedia.org/wiki/UTF-8"  title="UTF8">UTF8</a> is a variable-width mapping method, which uses between one and four bytes to represent a code point.  ASCII characters take up 1 byte of storage, most other Latin, Greek and Cyrillic characters (such as &#8220;ß&#8221; or &#8220;ő&#8221;) take 2 bytes, and CJK (Chinese/Japanese/Korean) characters typically consume 3 bytes per character.  <strong>It is important to note that these 3 bytes are one more byte per character than encoding schemes like Shift_JIS, which use either 1 or 2 bytes per character</strong>.  <a href="http://blogs.mysql.com/yoshinori/"  title="Yoshinori Matsunobu">Yoshinori Matsunobu</a> published a short article today on these <a href="http://blogs.mysql.com/yoshinori/2008/10/02/unicode-is-not-a-silver-bullet/" >storage space differences</a>.
</p>
<p>
UTF16 is a variable-width mapping scheme which encodes each code point in the Basic Multilingual Plane (U+0000 to U+FFFF, which includes nearly all commonly used CJK characters) as a single 16-bit unit, and each code point in the 16 supplementary &#8220;planes&#8221; as a pair of 16-bit &#8220;surrogate&#8221; units (4 bytes).  UTF16 therefore uses 2 bytes instead of UTF8&#8217;s 3 for most CJK characters.  However, when analyzing actual CJK text, which includes spaces and other ASCII characters that take 2 bytes in UTF16 but only 1 in UTF8, <a href="https://lists.launchpad.net/drizzle-discuss/msg01610.html" >the storage difference seems to be negligible</a>.  UTF32 is a <em>fixed-length</em> mapping method which uses 4 bytes to store each code point.
</p>
<p>
UTF8 is dominant in the web space, with all modern browsers able to decode and encode UTF8.
</p>
<h3>OK, So What is a Collation?</h3>
<p>
So, if a character encoding scheme, such as UTF8, is used to identify a set of characters and symbols as a machine-readable sequence of bytes, then what exactly is a <em>collation</em>, and why are they important?
</p>
<p>
Glad you asked.  A collation, or <a href="http://en.wikipedia.org/wiki/Collating_sequence"  title="collating sequence">collating sequence</a>, refers to the order in which different characters in a character set should appear when sorted in a list.  The alphabetic collating sequence is the one some of us, in our little English-only world, are familiar with.  But in various regions of the world, the same set of characters may be ordered differently in a sorted list.  Therefore, even with a character encoding scheme like UTF8, one must also specify a collation when listing textual results in a specific order.
</p>
<p>
In MySQL, as well as Drizzle, the method for ordering results by a specific collation is fairly simple: one merely <a href="http://dev.mysql.com/doc/refman/5.0/en/charset-collate.html"  title="Collation MySQL">specifies the collation</a> in the <tt>ORDER BY</tt> clause, as the example below shows:
</p>
<pre>
mysql> CREATE TABLE utf8_tests (
    ->   my_text VARCHAR(100) NOT NULL
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO utf8_tests VALUES ('comb'),('cukor'),('csak'),('folyik'),('folyó'),('folyosó'),('fő'),('födém');
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM utf8_tests ORDER BY my_text COLLATE utf8_general_ci;
+----------+
| my_text  |
+----------+
| comb     |
| csak     |
| cukor    |
| födém    |
| fő       |
| folyó    |
| folyik   |
| folyosó  |
+----------+
8 rows in set (0.00 sec)

mysql> SELECT * FROM utf8_tests ORDER BY my_text COLLATE utf8_hungarian_ci;
+----------+
| my_text  |
+----------+
| comb     |
| csak     |
| cukor    |
| fő       |
| födém    |
| folyó    |
| folyik   |
| folyosó  |
+----------+
8 rows in set (0.00 sec)
</pre>
<p>
You&#8217;ll notice that the words &#8220;fő&#8221; and &#8220;födém&#8221; are reversed depending on the collation used in the <tt>ORDER BY</tt> clause.
</p>
<p>
Any Hungarians reading this article?  If so, you&#8217;ve likely already spotted the problem with the above output: it&#8217;s wrong.  &#8220;csak&#8221; should appear <strong>after</strong> &#8220;cukor&#8221;, since &#8220;cs&#8221; is a digraph (two characters treated as a single letter) which comes after &#8220;c&#8221; in the Hungarian alphabet.
</p>
<p>
This behaviour has been a <a href="http://bugs.mysql.com/bug.php?id=12519"  title="Incorrect Hungarian collation order">known bug in MySQL</a> since August 2005, over three years now.  I noticed it while reading up on collations and comparing what&#8217;s going on in MySQL/Drizzle to what the standard expects.  The ICU project has a <a href="http://demo.icu-project.org/icu-bin/locexp?d_=en&#038;x=col&#038;_=root" >set of HTML pages</a> where you can type in a list of words in a language, sort them by various collations, and see the correct sort order.  I ran into the bug above, as well as a new <a href="http://bugs.mysql.com/bug.php?id=39816"  title="German collation incorrect Bug MySQL">bug in the German collation</a>, which I found today.
</p>
<h2>Where Drizzle Is <em>Right Now</em></h2>
<p>
Currently, all character sets except UTF8 have been removed from Drizzle.  Furthermore, Drizzle&#8217;s UTF8 implementation is full 4-byte UTF8, which differs from the 3-byte variety used in MySQL &lt;= 5.1.  This decision, and the subsequent removal of the other character sets, has given Drizzle two major benefits:
</p>
<ol>
<li>Reductions in the size and complexity of the Drizzle parser: removal of some CONVERT() handling, introducers, and more</li>
<li>Easier to understand (and potentially refactor) code surrounding character sets and collations</li>
</ol>
<p>
So, although we&#8217;ve stripped out a lot of complexity by moving to only UTF8 and its collations, it seems we&#8217;ve inherited a system that, frankly, was never designed to handle complex collations.  It was designed to be fast, not entirely accurate.  So, what is a project to do? <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<p>
We have a number of options, all of which we&#8217;ve been debating over on the mailing lists:
</p>
<ul>
<li>Use <a href="http://www.icu-project.org/"  title="ICU Project">libICU</a> for all character set and collation services &mdash; libICU is a full-featured library written by experts in the languages and transcoding fields, why not take advantage of that expertise?</li>
<li>Use GLib&#8217;s locale facilities &mdash; this has mostly been ruled out because of performance concerns over non-reentrant code dependent on setlocale()</li>
<li>Write our own &mdash; this is essentially where we are right now</li>
<li>Use C++&#8217;s <tt>&lt;locale&gt;</tt> facilities, as <a href="https://lists.launchpad.net/drizzle-discuss/msg01587.html" >Monty demonstrated on the mailing lists</a> &mdash; actually, I also have O&#8217;Reilly&#8217;s C++ Cookbook, so I know where that code originated&#8230; <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </li>
</ul>
<p>
libICU is, frankly, quite a large library, and it&#8217;s not clear that its performance would be satisfactory.  However, I can certainly envision taking libICU&#8217;s test case suite and converting it to the Drizzle test suite format.  This would likely expose holes in our current character set handling that we need to discover.
</p>
<p>
Although Yoshinori-san&#8217;s objections about UTF8 storage requirements versus localized Japanese character sets are valid, I don&#8217;t think we&#8217;ll re-introduce non-UTF8 character sets into the server at this time.  If there is a huge uproar over this, pluggable character sets are a possibility in the future, once the plugin API has been changed to enable them.  Pluggable collations too&#8230;
</p>
<p>
Of the options listed above, the last one, C++&#8217;s <tt>&lt;locale&gt;</tt> facilities, is the one which interests me the most and which I find most appealing.  In fact, I compiled a small test program based on those facilities which actually produces the correct collation order for the bug demonstrated above:
</p>
<pre class="cpp">
#include &lt;algorithm&gt;
#include &lt;iostream&gt;
#include &lt;locale&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

using namespace std;

// Returns true if s1 sorts before s2 under the collation rules of the
// hu_HU.utf8 locale. That locale must be installed on the system;
// otherwise the locale constructor throws std::runtime_error.
bool localeLessThan(const string&amp; s1, const string&amp; s2) {
  // Build the locale once rather than on every comparison.
  static const locale locale1("hu_HU.utf8");
  const collate&lt;char&gt;&amp; col = use_facet&lt;collate&lt;char&gt; &gt;(locale1);

  const char* pb1 = s1.data();
  const char* pb2 = s2.data();

  // compare() returns -1, 0 or 1, much like strcmp().
  return col.compare(pb1, pb1 + s1.size(),
                     pb2, pb2 + s2.size()) &lt; 0;
}

int main() {
  vector&lt;string&gt; all_the_strings;
  all_the_strings.push_back("comb");
  all_the_strings.push_back("csak");
  all_the_strings.push_back("cukor");

  sort(all_the_strings.begin(), all_the_strings.end(), localeLessThan);

  for (vector&lt;string&gt;::const_iterator p = all_the_strings.begin();
       p != all_the_strings.end();
       ++p)
    cout &lt;&lt; *p &lt;&lt; endl;

  return 0;
}
</pre>
<p>
Compiling and running the program shows the correct sorted order for the words:
</p>
<pre>
[518][jpipes@serialcoder: /home/jpipes/repos/drizzle/test-hun]$ g++ test.cc
[519][jpipes@serialcoder: /home/jpipes/repos/drizzle/test-hun]$ ./a.out
comb
cukor
csak
</pre>
<p>
I&#8217;m thinking that the refactoring work that still needs to be completed around CHARSET_INFO and MY_CHARSET_HANDLER should experiment with the technique above and measure any performance regression (or improvement) that may occur.  In my opinion, accuracy, and the ability to let a library <strong>not written by Drizzle developers</strong> do the heavy lifting, are more important than a small performance gain.
</p>
<h2>The Edwin Strikes Back</h2>
<p>
<img src="http://joinfu.com/img/darth-edwin.png" style="float: left; margin: 0px 30px 10px 0px;" /><br />
As he is sometimes prone to do, my dear colleague, <a href="http://www.linkedin.com/pub/7/1a4/385"  title="Edwin DeSouza">Edwin DeSouza</a>, shot me an email with a link he thought I might find interesting.  He had seen me chatting about character sets, and he had come across a forum thread over on the Ruby Forum which went into some detail about the difficulties surrounding encoding conversion, localization, and internationalization.
</p>
<p>
<img src="http://joinfu.com/img/Jormungandr-small.png" style="float: right; margin: 0px 10px 0px 10px;" /><br />
Suffice it to say, &#8220;some detail&#8221; would be the understatement of the year in this case.  The forum thread is longer than <a href="http://en.wikipedia.org/wiki/J%C3%B6rmungandr" >Jörmungandr</a>, the mythical Norse sea serpent.
</p>
<p>
Notable voices on the thread include <a href="http://en.wikipedia.org/wiki/Yukihiro_Matsumoto" >Matz</a>, creator of Ruby and a core influence on its direction, and <a href="http://en.wikipedia.org/wiki/Tim_Bray"  title="Tim Bray">Tim Bray</a>, of our own Sun Microsystems and of XML fame.  One Michael Selig began the thread, entitled <em><a href="http://www.ruby-forum.com/topic/165927"  title="Character Encodings - A Radical Suggestion">Character encodings &#8211; a radical suggestion</a></em>, with an ostensibly simple suggestion:
</p>
<blockquote><p>
Remove internal support for non-ASCII encodings completely, and when<br />
reading/writing UTF-16 (and UTF-32) files automatically transcode<br />
to/from UTF-8.
</p></blockquote>
<p>
Unfortunately for Michael, this small suggestion was the online equivalent of stepping in a pile of elephant dung.
</p>
<p>
Until reading the above-mentioned forum thread, I really had no idea about the complexities involved in character set handling, especially in the Asian countries.  If you are interested in character sets, collations, and Unicode vs. local encodings, reading through the forum thread will truly enlighten you as to the various arguments for and against UTF8.  It&#8217;s highly recommended reading, but be warned, it may leave you gasping for breath at some points&#8230;enjoy. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2008/10/character-sets,-collations-and-the-jrmungandr/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Enabling and Fixing Drizzle Test Cases</title>
		<link>http://www.joinfu.com/2008/09/enabling-and-fixing-drizzle-test-cases/</link>
		<comments>http://www.joinfu.com/2008/09/enabling-and-fixing-drizzle-test-cases/#comments</comments>
		<pubDate>Wed, 10 Sep 2008 18:56:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Bazaar]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Launchpad]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2008/09/enabling-and-fixing-drizzle-test-cases</guid>
		<description><![CDATA[When Brian began the work on refactoring the MySQL 6.0 Server source code into what has now become the Drizzle Project, a number of code pieces were removed, including some major MySQL functionality such as stored procedures, server-side prepared statements, SQL Mode, some legacy code, and a variety of data types. The goal, of course, [...]]]></description>
			<content:encoded><![CDATA[<p>
When <a href="http://krow.livejournal.com/"  title="Brian Aker">Brian</a> began the work on refactoring the MySQL 6.0 Server source code into what has now become the <a href="http://launchpad.net/drizzle"  title="Drizzle">Drizzle Project</a>, a number of code pieces were removed, including some major MySQL functionality such as stored procedures, server-side prepared statements, SQL Mode, some legacy code, and a variety of data types.  The goal, of course, was to reduce the server code base down to a more streamlined and eventually modular kernel.
</p>
<p>
Of course, that vision is great, but it&#8217;s got some side effects!  One of those side effects is a dramatic reduction in the number of test cases that pass the test suite in their current form, and an increase in the number of tests that have been disabled.  I re-enabled and fixed a few tests yesterday, but as of this writing, there are only 54 of 408 tests currently passing in the test suite.
</p>
<p>
This is to be expected.  You can&#8217;t just go and strip a huge chunk of the parser and functionality out of the server and expect the original test suite to run without problems <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   So, Brian disabled some tests while removing code sections, anticipating that these tests would eventually be re-enabled and any regressions fixed.  Well, we are now at that point.  With Brian&#8217;s work this week to remove the last vestiges of non-UTF8 character set support, fixing and re-enabling disabled tests in the test suite is now a high-priority project.  Luckily, this is a project in which almost anyone, even non-coders, can get involved and make a difference.
</p>
<p>
This article will explain the process of running the Drizzle test suite and identifying test cases which can be re-enabled or should be fixed.  We&#8217;ll focus on stuff that you can help with as a contributor who wants to start getting involved in Drizzle and making an impact without having C/C++ coding experience.  If you haven&#8217;t caught my previous articles on <a href="http://www.joinfu.com/index.php?/archives/249-A-Contributors-Guide-to-Launchpad.net-Part-1-Getting-Started.html" >using Launchpad.net</a> for <a href="http://www.joinfu.com/index.php?/archives/250-A-Contributors-Guide-to-Launchpad.net-Part-2-Code-Management.html"  title="Using Launchpad for Code Management">code management</a>, I&#8217;d suggest reading those now.  In addition, although we won&#8217;t be doing C/C++ coding, you&#8217;ll need a build environment established in order to properly run the test suite.  So, I&#8217;d also suggest reading my article on <a href="http://www.joinfu.com/index.php?/archives/248-Getting-a-Working-CC++-Development-Environment-for-Developing-Drizzle.html" >setting up a C/C++ development environment for Drizzle</a>.
</p>
<h2>The Test Suite Basics</h2>
<p style="border: solid 1px #888; background-color: #f7f7f7; padding: 6px 30px;">
<strong style="color: red;">NOTE:</strong> This section describes the Drizzle test suite.  However, if you are contributing to the MySQL Server project instead, the instructions in this section are exactly the same; just change <tt>dtr</tt> to <tt>mtr</tt>. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<p>
The Drizzle test suite is made up of a main Perl script (<tt>tests/test-run.pl</tt>) and a couple of other tools.  After you have built Drizzle with the standard build process, you will see a program called <tt>dtr</tt> in the <tt>/tests</tt> source directory.  This is the test suite runner.  When you issue the command:
</p>
<p><code><br />
make test<br />
</code></p>
<p>
This test runner is called with some command-line options and a list of tests to run.  You can verify this behaviour by looking at <tt>/tests/Makefile.am</tt> and seeing the actual command for the <tt>test</tt> make target.
</p>
<p>
The general form for running a test case is the following:
</p>
<p><code><br />
cd tests<br/><br />
./dtr $testname1 $testname2 ... $testnameN<br />
</code></p>
<p>
There are a number of command-line options that the test suite runner accepts, and I&#8217;ll cover a smattering of them in this article.
</p>
<h2>Running a Test</h2>
<p>
So, how do you know what the names of the tests are?  Good question! <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   In the <tt>/tests</tt> directory, you will find a <tt>/t</tt> directory which contains all the test cases in the main test suite.  The main test suite contains the tests that are not specific to a particular piece of server functionality such as replication.  For instance, <tt>t/select.test</tt> is the test case in the main suite that covers the SELECT syntax and functionality.  Test cases for specific functional pieces can be found in <tt>/tests/suite</tt>.  To run the <tt>t/select.test</tt> test case through the runner, we&#8217;d do this:
</p>
<p><code><br />
./dtr select<br />
</code></p>
<p>
Note that you do not need to add the <tt>.test</tt> suffix.  You should see results similar to the following:
</p>
<pre>
[505][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk/tests]$ ./dtr select
Logging: ./dtr select
&lt;snip&gt;
MySQL Version 7.0.0
Using dynamic switching of binlog format
Using MTR_BUILD_THREAD      = 0
Using MASTER_MYPORT         = 9306
Using MASTER_MYPORT1        = 9307
Using SLAVE_MYPORT          = 9308
Using SLAVE_MYPORT1         = 9309
Using SLAVE_MYPORT2         = 9310
Killing Possible Leftover Processes
Removing Stale Files
Creating Directories
Saving snapshot of installed databases
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.select                    [ pass ]           9673
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.
The servers were restarted 1 times
Spent 9.673 of 21 seconds executing testcases
</pre>
<p>
As you can see, the test suite fires up a Drizzle server, loads the test file and performs the tests contained in the file.  The tests in the file generally consist of SQL statements that are executed against one or more servers, but they can also be commands such as creating a new connection, logging output, and other things.  For this article, we&#8217;ll be focusing on the SQL command tests.  In a followup article, I may highlight some of the other test-case commands available to you.
</p>
<h2>Failing Test Cases</h2>
<p>
Well, it&#8217;s all fine and dandy if a test case succeeds like in the example above, but like I mentioned in the introduction of this article, we&#8217;re focused on the test cases that <em>aren&#8217;t succeeding</em> and getting these test cases to succeed!  So, how do we find those tests which are failing?  One method is to look at the <a href="http://drizzlebuild.42sql.com/"  title="Drizzle Build Farm">Drizzle Build Farm</a> and track down failures occurring in the test runs.  Another way is to simply run a series of tests and see what fails.  For simplicity&#8217;s sake, I&#8217;ve done a little research already and know a number of tests that are failing.  So, we&#8217;ll go ahead and take a look at a test case that I know needs some <a href="http://en.wiktionary.org/wiki/tender_loving_care" >TLC</a>.
</p>
<p>
The test case I&#8217;ve chosen is the <tt>func_math</tt> test from the main test suite.  It&#8217;s small and provides a good example of how we can work to fix up the failures.  Here is what I get when running this test:
</p>
<pre>
[505][jpipes@serialcoder: tests]$ ./dtr func_math
Logging: ./dtr func_math
&lt;snip&gt;
MySQL Version 7.0.0
Using dynamic switching of binlog format
Using MTR_BUILD_THREAD      = 0
Using MASTER_MYPORT         = 9306
Using MASTER_MYPORT1        = 9307
Using SLAVE_MYPORT          = 9308
Using SLAVE_MYPORT1         = 9309
Using SLAVE_MYPORT2         = 9310
Killing Possible Leftover Processes
Removing Stale Files
Creating Directories
Saving snapshot of installed databases
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.func_math                 [ fail ]

drizzletest: At line 134: query 'create table t1 (a varchar(90), ts datetime not null, index (a)) engine=innodb
default charset=utf8' failed: 1064: You have an error in your SQL syntax; check the manual that corresponds
to your MySQL server version for the right syntax to use near 'charset=utf8' at line 1

The result from queries just before the failure was:
&lt;snip&gt;
656	405
122	405
645	405
INSERT INTO t1 VALUES (3);
SELECT CAST(RAND(2) * 1000 AS UNSIGNED), CAST(RAND(a) * 1000 AS UNSIGNED)
FROM t1;
CAST(RAND(2) * 1000 AS UNSIGNED)	CAST(RAND(a) * 1000 AS UNSIGNED)
656	405
122	405
645	405
858	656
354	906
SELECT CAST(RAND(2) * 1000 AS UNSIGNED), CAST(RAND(a) * 1000 AS UNSIGNED)
FROM t1 WHERE a = 1;
CAST(RAND(2) * 1000 AS UNSIGNED)	CAST(RAND(a) * 1000 AS UNSIGNED)
656	405
122	405
645	405
DROP TABLE t1;
create table t1 (a varchar(90), ts datetime not null, index (a)) engine=innodb default charset=utf8;

More results from queries before failure can be found in
/home/jpipes/repos/drizzle/trunk/tests/var/log/func_math.log

Stopping All Servers
Restoring snapshot of databases
Resuming Tests

-------------------------------------------------------
Stopping All Servers
Failed 1/1 tests, 0.00% were successful.

The log files in var/log may give you some hint
of what went wrong.
If you want to report this error, please read first the documentation at

http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

The servers were restarted 1 times
Spent 0.000 of 7 seconds executing testcases

mysql-test-run in default mode: *** Failing the test(s): main.func_math
mysql-test-run: *** ERROR: there were failing test cases
</pre>
<p>
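As you can see, the test fails, and <tt>dtr</tt> reports where and why.  The <tt>CREATE TABLE</tt> statement at line 134 of the test file was rejected with error 1064, a syntax error near <tt>charset=utf8</tt>.  The results of the queries that ran just before it are printed for context.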
</p>
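<p>
Before editing anything, it helps to look at the failing statement in context.  The shell function below is a small sketch and is not part of <tt>dtr</tt>.  Its name, <tt>show_test_line</tt>, is hypothetical, and it assumes the number in the <tt>drizzletest: At line N</tt> message is a line number in the <tt>.test</tt> file.  It prints that line, marked with <tt>&gt;</tt>, plus a couple of lines on either side.
</p>
<pre>
# show_test_line FILE LINE [CONTEXT]
# Hypothetical helper: print LINE of FILE with CONTEXT lines (default 2)
# on either side.  Each line is numbered, and LINE itself is marked '>'.
show_test_line() {
    local file="$1" line="$2" context="${3:-2}"
    local start=$((line - context))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    awk -v s="$start" -v e="$((line + context))" -v n="$line" '
        NR >= s {
            if (NR > e) exit
            printf "%s%6d  %s\n", (NR == n ? ">" : " "), NR, $0
        }' "$file"
}
</pre>
<p>
For the failure above, you would run <tt>show_test_line t/func_math.test 134</tt> from the <tt>tests</tt> directory.
</p>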
<h2>Fixing a Broken Test</h2>
<p>
Now that we&#8217;ve identified a failing test, we need to fix it.  The process to follow is:
</p>
<ol>
<li>Make a change to the test case file.</li>
<li>Re-run the test through <tt>dtr</tt> using the <tt>--record</tt> option.</li>
<li>If any failure occurs, go back to step 1.</li>
<li>Once the test succeeds under the <tt>--record</tt> option, a test result file will be written to the <tt>tests/r/</tt> directory.</li>
<li>Edit <tt>tests/Makefile.am</tt> and ensure the newly passing test is included in the <tt>make test</tt> target.</li>
<li><tt>bzr commit</tt> the changes to the test case, the result file and <tt>Makefile.am</tt>, and push them to a branch on Launchpad.</li>
</ol>
<p>
In this case, the failure is due to a mere syntax issue.  We&#8217;ve removed character set support and standardized entirely on UTF8, so the parser no longer accepts the <tt>DEFAULT CHARSET=utf8</tt> clause.  To fix this test, we need to remove the pieces of old MySQL syntax that Drizzle no longer supports.
</p>
<p>
So, pop open your favorite editor, open up the <tt>tests/t/func_math.test</tt> file and remove all instances of <tt>default charset=utf8</tt>.  Then re-run the test with the <tt>--record</tt> option.  You should see the following:
</p>
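<p>
If a test file has many such clauses, a one-line <tt>sed</tt> edit saves some typing.  This is a sketch that assumes GNU sed, for in-place <tt>-i</tt> editing and the case-insensitive <tt>I</tt> flag.  The function name <tt>strip_default_charset</tt> is hypothetical.  Review the result with <tt>bzr diff</tt> before re-recording.
</p>
<pre>
# strip_default_charset FILE
# Hypothetical helper: delete every "default charset=utf8" clause
# (any letter case, optional spaces around "=") from FILE, in place.
# Relies on GNU sed for -i without a suffix and for the I flag.
strip_default_charset() {
    sed -E -i 's/[[:space:]]*default[[:space:]]+charset[[:space:]]*=[[:space:]]*utf8//gI' "$1"
}
</pre>
<p>
From the <tt>tests</tt> directory, <tt>strip_default_charset t/func_math.test</tt> followed by <tt>./dtr --record func_math</tt> repeats the step above.
</p>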
<pre>
[508][jpipes@serialcoder: tests]$ ./dtr --record func_math
&lt;snip&gt;
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.func_math                 [ fail ]

drizzletest: At line 160: query 'create table t1
(f1 varchar(32) not null,
f2 smallint(5) unsigned not null,
f3 int(10) unsigned not null default '0')
engine=myisam' failed: 1064: You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right syntax to use near '(5) unsigned not null,
f3 int(10) unsigned not null default '0')
engine=myisam' at line 2
&lt;snip&gt;
</pre>
<p>
Again, it looks like we&#8217;ve run into a syntax problem.  This time, the test case uses the old integer display-width syntax: a number in parentheses after an integer data type, as in <tt>smallint(5)</tt> or <tt>int(10)</tt>, which MySQL uses together with ZEROFILL.  This functionality, a legacy from Unireg times, is not supported in Drizzle, so we must remove it.  After removing the <tt>(XX)</tt> display widths from the CREATE TABLE definitions in the test case file, I re-run the test:
</p>
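<p>
A <tt>sed</tt> substitution can strip the display widths in one pass.  This is a sketch that assumes GNU sed, which provides the <tt>\b</tt> word boundary, <tt>-i</tt> and the <tt>I</tt> flag.  The name <tt>strip_int_display_width</tt> is hypothetical.  It only touches the integer types, so lengths on types such as <tt>varchar(32)</tt> are left alone.
</p>
<pre>
# strip_int_display_width FILE
# Hypothetical helper: rewrite integer column types such as int(10) or
# smallint(5) to plain int / smallint, in place.  Other types that take a
# length, such as varchar(32), are left untouched.
# Assumes GNU sed (\b word boundary, -i, and the I flag).
strip_int_display_width() {
    sed -E -i 's/\b(tinyint|smallint|mediumint|bigint|int)\([0-9]+\)/\1/gI' "$1"
}
</pre>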
<pre>
&lt;snip&gt;
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.func_math                 [ fail ]

drizzletest: At line 230: query 'CREATE TABLE t1(a SET('a','b','c'))' failed: 1064:
You have an error in your SQL syntax; check the manual that corresponds to your
MySQL server version for the right syntax to use near 'SET('a','b','c'))' at line 1
&lt;snip&gt;
</pre>
<p>
Once again, we&#8217;ve run into a failure.  This time it&#8217;s the <tt>SET</tt> data type, which has been removed from Drizzle, so we must remove it from the test case as well.  After doing so, I re-run the test case, and finally we see a success:
</p>
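<p>
Rather than discovering each removed construct through a separate failing run, you can scan a test file for all three up front.  The hypothetical <tt>find_removed_syntax</tt> function below is a sketch that assumes GNU grep, for <tt>\b</tt>.  It only knows about the three constructs hit in this walkthrough, so a clean result does not guarantee the test will pass.
</p>
<pre>
# find_removed_syntax FILE
# Hypothetical helper: list, with line numbers, the lines in FILE that
# still use syntax this walkthrough had to remove:
#   - a DEFAULT CHARSET clause
#   - an integer display width such as int(10) or smallint(5)
#   - a SET(...) column type
# Case-insensitive; uses GNU grep's \b word boundary, so names such as
# FIND_IN_SET( are not reported.
find_removed_syntax() {
    grep -n -i -E 'default[[:space:]]+charset|\b(tiny|small|medium|big)?int\([0-9]+\)|\bset\(' "$1"
}
</pre>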
<pre>
[509][jpipes@serialcoder: tests]$ ./dtr --record func_math
Logging: ./dtr --record func_math
&lt;snip&gt;
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.func_math                 [ pass ]            107
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.
The servers were restarted 1 times
Spent 0.107 of 8 seconds executing testcases
</pre>
<p>
Cool.  Looks good.  Now we edit <tt>tests/Makefile.am</tt> and add the newly successful test to the <tt>make test</tt> target.
</p>
<p><code><br />
cd tests<br />
vim Makefile.am<br />
</code></p>
<p>
Here is the relevant section of <tt>Makefile.am</tt>, with the bolded line being the one I added for our newly successful test:
</p>
<pre>
test-drizzle:
	  $(PERL) -I$(top_srcdir)/tests/lib \
		$(top_srcdir)/tests/test-run.pl --fast --reorder --force \
        1st \
        alter_table \
        bench_count_distinct \
	bulk_replace \
	comment_column2 \
	comments \
	consistent_snapshot \
        count_distinct \
        count_distinct2 \
        count_distinct3 \
	create_select_tmp \
        ctype_filename \
        delete \
        distinct \
	drizzleslap \
        endspace \
        flush2 \
        func_equal \
        func_group_innodb \
        func_isnull \
        func_like \
	<strong>func_math \</strong>
        greedy_optimizer \
        group_min_max_innodb \
        heap_auto_increment \
</pre>
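<p>
To double-check that a test is actually listed, you can grep for it as a whole word.  The helper name <tt>test_is_enabled</tt> is hypothetical, and it assumes the test list lives in <tt>tests/Makefile.am</tt> as shown above.
</p>
<pre>
# test_is_enabled NAME [MAKEFILE]
# Hypothetical helper: succeed if NAME appears as a whole word in MAKEFILE
# (default tests/Makefile.am), e.g. in the test-drizzle list.
# -w keeps func_math from matching a longer name such as func_math2.
test_is_enabled() {
    grep -q -w -F -- "$1" "${2:-tests/Makefile.am}"
}
</pre>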
<p>
Alright, cool.  Now we simply need to verify the changes to our test case, result file and <tt>Makefile.am</tt>, and then commit them.  First, verification:
</p>
<pre>
[511][jpipes@serialcoder: tests]$ bzr status
modified:
  tests/Makefile.am
  tests/r/func_math.result
  tests/t/func_math.test
</pre>
<p>
Looks good.  The final step is committing our work and then pushing to a code branch on Launchpad.net.  Below, I am pushing to the branch <a href="http://code.launchpad.net/~drizzle-developers/drizzle/enable-tests" >lp:~drizzle-developers/drizzle/enable-tests</a>, which is a team branch used to push code for the various test cleanups.
</p>
<pre>
[514][jpipes@serialcoder: tests]$ bzr commit Makefile.am t/func_math.test \
> r/func_math.result -m "Fixed syntax errors in func_math test and re-enable \
> the test in the make test target"
Committing to: /home/jpipes/repos/drizzle/trunk/
modified tests/Makefile.am
modified tests/r/func_math.result
modified tests/t/func_math.test
Committed revision 405.
[515][jpipes@serialcoder: tests]$ bzr push lp:~drizzle-developers/drizzle/enable-tests
Pushed up to revision 405.
</pre>
<p>
And that&#8217;s that!  The test is fixed, the test case, result file and <tt>Makefile.am</tt> are updated, and the changes are committed and pushed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2008/09/enabling-and-fixing-drizzle-test-cases/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Contributor&#8217;s Guide to Launchpad.net &#8211; Part 2 &#8211; Code Management</title>
		<link>http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-2-code-management/</link>
		<comments>http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-2-code-management/#comments</comments>
		<pubDate>Thu, 28 Aug 2008 23:07:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Bazaar]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Launchpad]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2008/08/a-contributor&#039;s-guide-to-launchpadnet--part-2--code-management</guid>
		<description><![CDATA[In this second part of my Launchpad guidebook series, I&#8217;ll be covering the code management and repository features of Launchpad.net. If you missed the first part of my series, go check it out and get established on Launchpad.net. Then pop back to this article to dive into the magic of http://code.launchpad.net. In this article, we&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p>
In this second part of my Launchpad guidebook series, I&#8217;ll be covering the code management and repository features of Launchpad.net.  If you missed the first part of my series, go <a href="http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-1-getting-started/"  title="Getting Started on Launchpad.net">check it out and get established on Launchpad.net</a>.  Then pop back to this article to dive into the magic of <a href="http://code.launchpad.net" >http://code.launchpad.net</a>.  In this article, we&#8217;ll cover the following aspects of the code management pieces of Launchpad:
</p>
<ul>
<li>The Structure of Project Source Code on Launchpad.net</li>
<li>Pulling Code into a Local Repository</li>
<li>Creating a Local Working Branch for Bug Fixing</li>
<li>Pushing Code to Launchpad</li>
<li>Notifying a Merge Captain of Your Code Pushes</li>
<li>Keeping Trunk Up To Date</li>
<li>Merging Local Code with a Trunk Branch</li>
</ul>
<p>
For the following article, we&#8217;ll be acting as if you are contributing to the <a href="http://launchpad.net/mysql-server"  title="MySQL Server on Launchpad.net">MySQL Server project</a> and wish to create a patch to fix a bug in the MySQL server.  We&#8217;ll be working through all the steps to do so.  If you are looking to contribute to a different project, or your own project, simply replace the names and URLs in the article with ones for your particular project. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<h2>The Structure of Project Source Code on Launchpad.net</h2>
<p>
Projects hosted on Launchpad.net are organized using the terminology <a href="http://doc.bazaar-vcs.org/bzr.dev/en/user-reference/bzr_man.html#branches"  title="Bazaar branches">branches</a>, <a href="http://doc.bazaar-vcs.org/bzr.dev/en/user-reference/bzr_man.html#repositories"  title="Bazaar repositories">repositories</a> and <a href="https://help.launchpad.net/FeatureHighlights/ProductSeries"  title="Launchpad.net series">series</a>.  A branch is simply a Bazaar branch of the project&#8217;s source code.  A repository is a collection of a project&#8217;s Bazaar branches.  A series is a named branch which represents something special for the project &mdash; usually a tagged release or a development branch.
</p>
<p>
<img src="http://doc.bazaar-vcs.org/bzr.dev/en/user-guide/images/workflows_gatekeeper.png" style="float: right; margin: 10px 0px 20px 20px;" /><br />
Because Launchpad uses <a href="http://bazaar-vcs.org"  title="Bazaar VCS">Bazaar</a>, a <a href="http://en.wikipedia.org/wiki/Distributed_revision_control"  title="Distributed Version Control">distributed version control system</a>, it sometimes takes a little while to get used to the fact that there is no &#8220;central&#8221; source tree that you check code into and out of.  Instead, you &#8220;pull&#8221; a named branch from a Bazaar server to your local workstation and work on that local branch in peace and quiet until you want to &#8220;push&#8221; your code into another branch.  The branch to which you &#8220;push&#8221; may be the &#8220;trunk&#8221;, or &#8220;active development&#8221; branch, but normally you will not push to the trunk branch.  Instead, you&#8217;ll push your local branch containing your source code changes to a personal branch on the Launchpad.net server and propose that your branch be &#8220;merged into trunk&#8221;.  The image to the right, from the <a href="http://doc.bazaar-vcs.org/bzr.dev/en/user-guide/index.html"  title="Bazaar User Guide">Bazaar user guide</a>, shows the general flow of code using this type of process, called a &#8220;decentralized system with a human gatekeeper&#8221;.
</p>
<p>
This pull, code, push and merge process is the recommended way to manage code changes for a project.  It allows a core set of &#8220;merge captains&#8221; to review and check your code before merging your code into the active development branch.  It is this process which I will be demonstrating in this article.
</p>
<h2>Pulling Code into a Local Repository</h2>
<p>
When you work on a project in Launchpad, you work on code in a local branch of the project.  To get rolling, you will first want to set up a local repository if you haven&#8217;t already done so.  For this, we use the <tt>bzr init-repo</tt> command:
</p>
<p style="border: solid 1px #888; background-color: #f7f7f7; padding: 6px 30px;">
<strong style="color: red;">NOTE:</strong> You will need Bazaar installed to do so.  Don&#8217;t have Bazaar installed?  See my previous article on <a href="http://www.joinfu.com/2008/08/getting-a-working-c-c-plusplus-development-environment-for-developing-drizzle/"  title="Getting a C/C++ Development Environment Established">Getting a C/C++ Development Environment Established</a>.  Not coding in C/C++?  Don&#8217;t worry, just read the section of that article on installing Bazaar.
</p>
<p><code><br />
cd ~/repos <span style="font-variant: italic; color: green;"># Change this to the folder in which you plan to have bzr repositories...</span><br />
bzr init-repo mysql-server-5.1<br />
</code></p>
<p>
At this point, a <a href="http://bazaar-vcs.org/SharedRepositoryTutorial"  title="Shared Repository Bazaar">shared repository</a> will be created.  What the heck is a shared repository?  Well, it&#8217;s basically a special folder that Bazaar knows contains information about source code branches.  It facilitates speedier branching and merging and is especially useful for larger source trees and changeset histories like the MySQL server.  You can verify that Bazaar knows something about your newly created repository by checking for a <tt>.bzr</tt> hidden folder in your repository folder:
</p>
<pre>
[504][jpipes@serialcoder: /home/jpipes/repos]$ ls -la mysql-server-5.1/
total 12
drwxr-xr-x 3 jpipes jpipes 4096 2008-08-26 15:00 .
drwxr-xr-x 7 jpipes jpipes 4096 2008-08-26 15:00 ..
drwxr-xr-x 4 jpipes jpipes 4096 2008-08-26 15:00 .bzr
</pre>
<p>
The next step is pulling the active development series of your particular project.  In our example, we&#8217;ll pull the active development branch of the &#8220;5.1&#8221; series of the MySQL Server.  To do so, we use the <tt>bzr branch</tt> command:
</p>
<p><code><br />
cd mysql-server-5.1/<br />
bzr branch lp:mysql-server/5.1 trunk<br />
</code></p>
<p style="border: solid 1px #888; background-color: #f7f7f7; padding: 6px 30px;">
<strong style="color: red;">WARNING:</strong> When doing the initial pull of the first branch in a shared repository, the branch command can take quite some time to execute, especially when pulling a branch of a project like the MySQL Server which has a huge history of changesets to it.  Be prepared to wait a while, and if Bazaar looks like it&#8217;s stuck doing stuff, just leave it alone. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   It typically will take about <strong>80-90 minutes</strong> to complete the first time!
</p>
<p>
In the <tt>bzr branch</tt> command above, <strong>lp:</strong> designates we are looking for a branch residing on the Launchpad.net Bazaar servers.  The colon is followed by the name of the project, in this case <strong>mysql-server</strong>, followed by a slash and the name of the series, in this case <strong>5.1</strong>.  You can always check the names of a project&#8217;s series by going to the main code area of a project.  For MySQL Server, that address would be <a href="http://code.launchpad.net/mysql-server" >http://code.launchpad.net/mysql-server</a>.
</p>
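<p>
The same address form works for any project and series.  As a sketch, the hypothetical <tt>branch_series</tt> function below wraps the command.  Run it from inside your shared repository folder.
</p>
<pre>
# branch_series PROJECT SERIES [LOCAL_DIR]
# Hypothetical helper: branch a project's series from Launchpad into
# LOCAL_DIR (default: trunk), using the lp:PROJECT/SERIES address form.
branch_series() {
    bzr branch "lp:$1/$2" "${3:-trunk}"
}
</pre>
<p>
Here, <tt>branch_series mysql-server 5.1</tt> runs the same <tt>bzr branch lp:mysql-server/5.1 trunk</tt> command used above.
</p>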
<p>
When the initial branch finishes, you&#8217;ll see something like the following.  It is shown with the <tt>time</tt> command to illustrate how long you should expect the initial MySQL Server branch to take:
</p>
<pre>
[511][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1]$ time bzr branch lp:mysql-server/5.1 trunk
Server is too old for streaming pull, reconnecting.  (Upgrade the server to Bazaar 1.2 to avoid this)
Branched 2719 revision(s).

real	91m30.337s
user	14m6.825s
sys	6m7.355s
</pre>
<p>
The total amount of &#8220;stuff&#8221; that is downloaded for the 5.1 server is around <strong>600MB</strong>, so it shouldn&#8217;t be surprising that it takes some time to do the initial branch&#8230;
</p>
<p>
You can ignore the message about upgrading the server to Bazaar 1.2; it&#8217;s because I&#8217;m not using the very recent Bazaar 1.6 client.  The performance of this first branch is currently being investigated by <a href="http://jam-bazaar.blogspot.com/"  title="John Arbash Meinel">John Arbash Meinel</a>, one of Bazaar&#8217;s developers, whom I spent some &#8220;quality time&#8221; with on IRC today. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   He mentions that the developers are working on something called &#8220;shallow trees&#8221; which should significantly speed up initial branching for large projects.
</p>
<h2>Creating a Local Working Branch for Bug Fixing</h2>
<p>
OK, if you&#8217;ve gotten this far, then you will have a local shared repository that contains a single branch which contains the source code and changeset history for the 5.1 series of the MySQL Server.  What we want to do now is fix a bug in the MySQL Server 5.1 locally on our workstation.  This will be a fictitious bug called <strong>Bug#99999 &#8211; &#8220;authors.h file doesn&#8217;t contain <em>MY NAME!</em>&#8221;</strong> <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<p>
Certainly, we <em>could</em> just start hacking up the code in the trunk branch we just pulled.  <em>But</em>, that&#8217;s not the most practical way of doing structured development on a local workstation.  Instead, you should create a branch from trunk which will house only the changes specific to what you are working on: in this case, our fictitious bug#99999.  Why is this a better practice than simply making the changes in the trunk branch?  Well, for a couple of reasons:
</p>
<ul>
<li>It keeps the trunk branch free of any of your changes.  This is very important if you want to keep trunk up-to-date with changes from the rest of the project&#8217;s developers and not have to constantly resolve merge conflicts (more on that later).  Having a clean trunk branch local to your workstation means you can quickly and easily make local working branches from trunk for anything you are working on.</li>
<li>It creates an easy-to-understand naming structure for your local repository.  After a while, you might have a repository with the following structure:
<pre>
~/repos
  /mysql-server-5.1
    /trunk # perhaps a cron job keeps this in sync every morning...?
    /bugXXXX-fix-invalid-pointer-in-binlog
    /bugXXXX-wrong-error-message-on-create-table
    /my-fancy-storage-engine
</pre>
</li>
</ul>
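<p>
For the cron job hinted at in the listing above, a crontab entry along these lines would refresh trunk every morning.  The schedule, paths and log file name are assumptions.  It also assumes <tt>bzr pull</tt> can run unattended, with no SSH passphrase prompt.
</p>
<pre>
# m h dom mon dow  command
# Hypothetical crontab entry: refresh the local trunk branch at 07:00 every
# day, appending bzr's output to a log file instead of having cron mail it.
0 7 * * * cd "$HOME/repos/mysql-server-5.1/trunk" && bzr pull >> "$HOME/bzr-trunk-pull.log" 2>&1
</pre>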
<p>
OK, so hopefully I&#8217;ve convinced you to follow the advice of creating separate local working branches for actually changing source code.  <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   Let&#8217;s create a local working branch to do the fix for our Bug#99999.  We use the <tt>bzr branch</tt> command, as before, but this time we&#8217;ll be branching from our local trunk, and not the Launchpad trunk branch for the 5.1 series.  You will notice that the branching will be <strong>significantly</strong> faster than before.  This is due both to doing things locally, and to the fact that you now have a shared repository set up.  Enter the following:
</p>
<pre>
bzr branch trunk bug99999-fix-authors-file
</pre>
<p>
Once completed, you should see something like the following:
</p>
<pre>
[571][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1]$ bzr branch trunk bug99999-fix-authors-file
Branched 2719 revision(s).
</pre>
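<p>
If you create working branches often, a tiny wrapper keeps the names consistent with the <tt>bugNNNN-description</tt> scheme above.  The function name <tt>start_bug_branch</tt> is hypothetical.  Run it from the shared repository folder.
</p>
<pre>
# start_bug_branch BUG_NUMBER SHORT-DESCRIPTION
# Hypothetical helper: branch the local trunk into bugNNNN-description.
# Must be run from the shared repository folder (e.g. ~/repos/mysql-server-5.1).
start_bug_branch() {
    if [ $# -ne 2 ]; then
        echo "usage: start_bug_branch BUG_NUMBER SHORT-DESCRIPTION" >&2
        return 2
    fi
    bzr branch trunk "bug$1-$2"
}
</pre>
<p>
For this article&#8217;s example, <tt>start_bug_branch 99999 fix-authors-file</tt> creates the same <tt>bug99999-fix-authors-file</tt> branch.
</p>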
<p>
We&#8217;re now ready to start fixing our bug.  Hop into the newly created working branch and open up the <tt>sql/authors.h</tt> file in your editor of choice (here, I&#8217;ll use <a href="http://vim.org"  title="Vim">Vim</a>):
</p>
<pre>
cd bug99999-fix-authors-file
vim sql/authors.h
</pre>
<p>
The &#8220;fix&#8221; for this bug is simply adding your name to the <tt>sql/authors.h</tt> file of course.  Go ahead and add your name to the list in that file and save and close the file.  Now, that has to be the easiest bug fix <em>ever</em>.  No wait, I <a href="http://www.flamingspork.com/blog/2006/06/29/smallest-patch-ever/" >take that back</a>.
</p>
<p>
Before we commit anything, let&#8217;s first check to see what changes we have made in the local branch.  To do so, we use the <tt>bzr status</tt> command, like so:
</p>
<pre>
bzr status
</pre>
<p>
If you&#8217;ve done everything up until now, you should see something very similar to the below:
</p>
<pre>
[503][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1/bug99999-fix-authors-file]$ bzr status
modified:
  sql/authors.h
</pre>
<p>
The next thing we&#8217;ll need to do is commit our changes to the local branch.  For those of you used to CVS or Subversion, this step will <em>look</em> familiar; however, remember that with Bazaar you are committing to the <strong>local branch</strong>, not a central repository.  (This isn&#8217;t always the case, but for now, assume it is&#8230;)
</p>
<p>
Like in other source control systems, we&#8217;ll use the <tt>commit</tt> command.  There are a number of options that you can use with the commit command, and I will outline two of them here.  The most important is the <tt>-m</tt> option, which lets you supply a string that becomes the comment for the set of changes in this commit.  This is an extremely useful option for smaller changesets.  For larger ones, leave off the <tt>-m</tt> option.  After you hit Enter, your environment&#8217;s editor will pop up so you can enter a longer comment.
</p>
<p style="border: solid 1px #888; background-color: #f7f7f7; padding: 6px 30px;">
<strong>TIP:</strong> Remember, a best practice whenever you commit source code to a revision control system is to make the changeset comments as descriptive as possible, so other developers can clearly tell what you were doing.
</p>
<p>
The second option I&#8217;ll tell you about is a nifty one which integrates Bazaar with the Launchpad.net bug tracking system.  If you have a bug report that is managed by Launchpad.net, you can supply the bug number to the <tt>--fixes</tt> option, which records in the commit that it fixes that bug.  Once you push the branch, Launchpad.net can use that information to link the bug report to your branch.  Pretty cool, eh? <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<p>
Below, I show how to use these two options with the <tt>bzr commit</tt> command:
</p>
<pre>
bzr commit -m "Add my name to authors.h" --fixes lp:99999
</pre>
<p>
You should see something like the following appear:
</p>
<pre>
Committing to: /home/jpipes/repos/mysql-server/bug99999-fix-authors-file/
modified sql/authors.h
Committed revision 2720.
</pre>
<p>
Did you know that Bazaar supports more bug tracking systems than just Launchpad.net?  Check out the <a href="http://doc.bazaar-vcs.org/bzr.dev/en/user-reference/bzr_man.html#bug-tracker-settings" >Bazaar user guide section on Bug Tracking integration</a> for more information.
</p>
<h2>Pushing Code to Launchpad</h2>
<p>
Now that you&#8217;ve made your code changes, it&#8217;s time to push those changes up to a branch on Launchpad.net.  Why do we want to push the local branch changes to Launchpad.net, instead of sending the changes to another team member (using the <tt>bzr send</tt> or <tt>bzr export</tt> command)?  Well, first of all, pushing the changes to Launchpad.net allows anyone to see and review your code changes, making the Launchpad.net website an easy and centralized place to do that.  Secondly, having the branch on Launchpad.net allows you to get more out of the Launchpad.net platform and integrate your branch and code with other features of the platform, such as Bug Tracking and the Blueprints task management and milestone system.
</p>
<p>
To get your local branch to Launchpad.net, you use the <tt>bzr push</tt> command.  The push command takes as an argument the address of the branch to which you wish to push your local changes.  If you are pushing changes to a <em>new</em> branch on Launchpad.net, the system will create that new branch for you automatically.  If you are pushing to an existing branch, the system simply uploads your changesets and applies them to that branch.
</p>
<p>
On Launchpad, there are a number of locations where we can push Bazaar branches.  Each Launchpad user gets their own <tt>http://code.launchpad.net/~username</tt> area in which to put branches.  Whenever you are a member of a project team, you can also push code into the <tt>code.launchpad.net/~username/projectname/</tt> location.  Also, each Launchpad user has a &#8220;Junk&#8221; area (<tt>code.launchpad.net/~username/+junk/</tt>) to which they can post any old branch.  After any of these locations, you put the name of the branch you are pushing to (or creating).
</p>
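<p>
Those locations map directly onto <tt>lp:</tt> addresses for <tt>bzr push</tt> and onto branch pages on the web.  The two hypothetical helpers below are a sketch that only builds the strings; nothing is contacted.
</p>
<pre>
# Hypothetical helpers for the personal and Junk areas described above.
# PROJECT is a project name, or +junk for the Junk area.
lp_branch() {      # lp_branch USER PROJECT BRANCH -> address for bzr push
    printf 'lp:~%s/%s/%s\n' "$1" "$2" "$3"
}
lp_branch_web() {  # lp_branch_web USER PROJECT BRANCH -> branch web page
    printf 'http://code.launchpad.net/~%s/%s/%s\n' "$1" "$2" "$3"
}
</pre>
<p>
For example, <tt>bzr push "$(lp_branch yourusername +junk mysql-server)"</tt> pushes to your Junk area.
</p>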
<p>
To see an example, let&#8217;s use our &#8220;Junk&#8221; area for right now, and push our local bug99999-fix-authors-file branch to Launchpad.net.  We want to push our local bug-fix branch to a general branch in our junk folder which will contain <strong>all of our bug fixing efforts</strong>.  Why?  Well, there&#8217;s no need to create separate branches for each bug fix on Launchpad.net just so people can see the code in the single bug fix.  We can push all our code changes to a general branch and then point reviewers to the specific revision we worked on.  This saves a whole lot of time when pushing branches.  So, here is how we push:
</p>
<pre>
bzr push lp:~jaypipes/+junk/mysql-server <span style="color: red;"># Of course, replace jaypipes with your own username!</span>
</pre>
<p>
After a while, you&#8217;ll see the following:
</p>
<pre>
Created new branch.
</pre>
<p>
You can now view your branch and its associated code changes by visiting the branch&#8217;s code URL, which will be http://code.launchpad.net/~yourusername/+junk/mysql-server if you&#8217;ve been following the steps in this article.
</p>
<h3>Belonging to a Project Team</h3>
<p>
Having a branch in your junk folder is fine, but Launchpad is all about belonging to a community of developers!  When you belong to a project team, you are automatically able to push your branches to the project&#8217;s code area.  Importantly, if you belong to a project team, then you can use Launchpad&#8217;s ability to propose a branch merge &mdash; something you cannot do if you push to your &#8220;Junk&#8221; folder.  To push one of your local branches to a project, you would do:
</p>
<pre>
cd ~/repos/projectname/branchname
bzr push lp:~yourusername/projectname/branchname
</pre>
<p>
Once pushed, the branch will be visible to anyone when they look at the project&#8217;s code branch listing, which is always at http://code.launchpad.net/projectname.  If you click on your branch, you will go to the branch&#8217;s main page. You&#8217;ll notice links for a number of actions that you can take, including one called &#8220;Propose for merging into another branch&#8221;.  I&#8217;ll be talking a lot about these options in future articles in this series, since many of them relate to the other parts of the Launchpad platform.
</p>
<h2>Notifying a Merge Captain of Your Code Pushes</h2>
<p>
OK, now that your code is up in a branch, the next step is to ask for a review of your code so it can be merged into the development branch of the project.  In this case, I&#8217;m pretty sure our Bug#99999 fix isn&#8217;t going to pass any code reviews to get into the MySQL Server, but I&#8217;ll explain the process anyway for reference. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<p>
A merge captain is someone who chaperons a project&#8217;s main development branch by being a gatekeeper for new code being merged into it.  This is a critical role that some folks think is a fun or coveted job.  It&#8217;s not.  As a merge captain for the <a href="http://launchpad.net/drizzle"  title="Drizzle">Drizzle project</a>, I can tell you that in fact it&#8217;s mainly grunt work comprising tedious pulling, merging, building the code, and running of test suites.  So, be nice to your merge captains!  The best way to be nice to your merge captain is to follow these two simple rules:
</p>
<ol>
<li>Always <strong>make comments for your commits</strong> that are descriptive and explain clearly what your code is doing and what tasks or bugs the work addresses</li>
<li><strong>Follow a standard</strong>, agreed upon <strong>process</strong> for notifying a merge captain of merge requests</li>
</ol>
<p>
In the case of the second rule, the standard process uses the Launchpad.net platform to notify the merge captain of a request &mdash; instead of, for instance, bugging the crap out of the captain on IRC to merge your branch. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />
</p>
<p>
To notify the merge captain about your branch, navigate to the main code page of the branch you pushed.  Remember that to do this step, you must be a member of the project team!  On the branch&#8217;s main page, look for the link &#8220;Propose merging into another branch&#8221; and click it.  You will be taken to the Propose Branch for Merging page.  Select the <strong>target branch</strong>.  The default will be the active development branch for the project in question.  You may also provide a branch name.  In the &#8220;whiteboard&#8221; text area, you can provide a brief description of the changes contained in your branch.  Typically, you will want to keep the option &#8220;Needs Review&#8221; selected, and then click the <strong>Register</strong> button.
</p>
<p>
At this point, an email is sent to everyone subscribed to commits on the project&#8217;s trunk branch.  The merge captain will, of course, be one of the recipients, and will start reviewing your code and merging it into trunk (or whichever target you specified).
</p>
<h2>Keeping Trunk Up to Date</h2>
<p>
The final two parts of this article look at how to keep your own local branches up to date with your project&#8217;s development branch, and how to merge someone else&#8217;s branch with your own.
</p>
<p>
To pull all the latest changes from a remote branch into a local branch, use the <tt>bzr pull</tt> command: navigate to the local branch you wish to update and issue the pull command.
</p>
<p>
This brings all the merges and changesets from the branch you <em>originally branched from</em> into the local branch.  In our case, the local &#8220;trunk&#8221; branch was originally branched from lp:mysql-server/5.1, so if we do:
</p>
<pre>
cd ~/repos/mysql-server-5.1/trunk
bzr pull
</pre>
<p>
We will update our local trunk branch with the changes in the active development branch on Launchpad.
</p>
<h2>Merging Local Code with a Trunk Branch</h2>
<p>
A merge, as alluded to above, simply combines the code changes of one branch with the code in another.  We use the <tt>bzr merge</tt> command to do so.  The merge command takes a single argument: the location of the branch you wish to merge into the current one.
</p>
<p>
To demonstrate how the merge command is used, here is the work I just did for the Drizzle project when merging a contributor&#8217;s fixes into trunk:
</p>
<pre>
[509][jpipes@serialcoder: /home/jpipes/repos/drizzle]$ cd trunk/
[510][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk]$ bzr merge ../grant-bug261687/
 M  drizzled/sql_derived.cc
All changes applied successfully.
[511][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk]$ bzr commit -m \
"Merged Grant's fixes for sql_derived.  Fixes bug#261687" --fixes lp:261687
Committing to: /home/jpipes/repos/drizzle/trunk/
modified drizzled/sql_derived.cc
Committed revision 373.
</pre>
<p>
I ran the commands above after pulling Grant&#8217;s Launchpad.net branch called bug261687, building the code, and running the test suite to make sure nothing failed.  First, I changed to my local copy of the Drizzle development branch (called &#8220;trunk&#8221; locally).  I then <em>merged</em> Grant&#8217;s branch into trunk locally with the <tt>bzr merge ../grant-bug261687</tt> command.  After merging, the trunk code has to be committed; I did so with a message noting that this is a merge of Grant&#8217;s work, and used <tt>--fixes lp:261687</tt> to record the bug that the changeset fixes.  Once that is done, I am free to <tt>bzr push lp:drizzle</tt> to push the local merge to Launchpad&#8230;
</p>
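<p>
The steps in the transcript above (refresh trunk, merge the contributor&#8217;s branch, commit with a bug link, push) can be wrapped in a small shell function. This is only a sketch under some assumptions: it expects to be run from inside your local trunk branch, <tt>merge_fix</tt> and <tt>run</tt> are hypothetical helper names, and <tt>DRY_RUN</tt> is a convenience of this sketch (not a bzr feature) that prints the commands instead of executing them.
</p>
<pre>
# Run a command, or only print it when DRY_RUN=1 (a convention of this sketch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# merge_fix BRANCH BUG MESSAGE [TARGET]
# Hypothetical helper, run from inside the local trunk branch.
# Stops at the first step that fails, so nothing is pushed after an error.
merge_fix() {
    contrib=$1                 # e.g. ../grant-bug261687
    bug=$2                     # Launchpad bug number, e.g. 261687
    msg=$3                     # commit message
    target=${4:-lp:drizzle}    # where the merged trunk is pushed

    run bzr pull || return 1                                 # update trunk first
    run bzr merge "$contrib" || return 1                     # fold in the contributor's work
    run bzr commit -m "$msg" --fixes "lp:$bug" || return 1   # link the commit to the bug
    run bzr push "$target"                                   # publish the merged trunk
}

# Example (preview with DRY_RUN=1 first, then drop it to run for real):
#   cd ~/repos/drizzle/trunk
#   DRY_RUN=1 merge_fix ../grant-bug261687 261687 "Merged Grant's fixes for sql_derived."
</pre>
<p>
Build the contributor&#8217;s branch and run its test suite <em>before</em> calling a helper like this; the sketch deliberately leaves that judgment to you.
</p>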
<h2>Conclusion</h2>
<p>
As you can see, Launchpad and Bazaar together make a full-featured code management system with a lot of bells and whistles. Hopefully, this article helps you get started contributing to projects hosted on Bazaar/Launchpad.net.  See you on Launchpad!  Up next in this series: how to do task management with Launchpad&#8217;s Blueprints system.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-2-code-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Contributor&#8217;s Guide to Launchpad.net &#8211; Part 1 &#8211; Getting Started</title>
		<link>http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-1-getting-started/</link>
		<comments>http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-1-getting-started/#comments</comments>
		<pubDate>Fri, 22 Aug 2008 13:41:14 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Bazaar]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Launchpad]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[MySQL Forge]]></category>

		<guid isPermaLink="false">http://joinfu.com/2008/08/a-contributor&#039;s-guide-to-launchpadnet--part-1--getting-started</guid>
		<description><![CDATA[This post is the first in a series of articles which serves to highlight the services of the Launchpad platform which hosts the MySQL Server, MySQL Forge, MySQL Sandbox and Drizzle Server projects. I will be walking you through the various pieces of the platform and provide examples of using each of the services. I [...]]]></description>
			<content:encoded><![CDATA[<p>
This post is the first in a series of articles highlighting the services of the <a href="http://launchpad.net"  title="Launchpad">Launchpad</a> platform, which hosts the <a href="http://launchpad.net/mysql-server"  title="MySQL Server">MySQL Server</a>, <a href="http://launchpad.net/mysql-forge"  title="MySQL Forge Launchpad">MySQL Forge</a>, <a href="http://launchpad.net/mysql-sandbox"  title="MySQL Sandbox on Launchpad.net">MySQL Sandbox</a> and <a href="http://launchpad.net/drizzle"  title="Drizzle Server">Drizzle Server</a> projects.  I will walk you through the various pieces of the platform and provide examples of using each of the services.  I will cover in depth the source code management services, which all of these projects now rely upon; they are the critical piece of the development platform.  In addition, I will show you how to use the Blueprints, Bugs, Answers and Translations services that many MySQL ecosystem projects, including the MySQL Sandbox and MySQL Forge, use.
</p>
<p>
In this first article, I will walk through the critical first step of establishing a Launchpad.net account and setting up the OpenSSH and OpenPGP keys for your account.  Follow-up posts will cover the code management system, the Blueprints system, and more.
</p>
<h2>Creating Your Account on Launchpad.net</h2>
<p>
The first thing to do, obviously, is to create your account on Launchpad.net, which is trivially easy.  Go to the <a href="https://launchpad.net/+login"  title="Launchpad Login">registration page</a> and enter your email address.  Launchpad will then send that address an email (subject line: &#8220;Finish your Launchpad registration&#8221;) containing a link to start the registration process.  Click the link in that email.  You&#8217;ll be asked to provide a Display Name and a password, and you can choose whether to hide your email address from other users.  After filling in the information, click the Continue button and you&#8217;ll end up in your account profile area.
</p>
<p>
Once in the profile area, go ahead and fill in any information you want to be public about yourself by clicking the &#8220;Change Details&#8221; link on the right side of the page.  You can upload an avatar image for yourself and fill in a little &#8220;Bio&#8221; section.  In a sub-navigation area at the top of the page, you will see links to edit your Email Settings, SSH Keys, GPG Keys, and Passwords.  We&#8217;ll cover the SSH and GPG keys in a bit.  For now, explore this area and set your preferences the way you like.</p>
<p>
At the very bottom of the Change Details page, there are also links for editing your IRC nicknames and other details.
</p>
<h2>Your OpenSSH Keys</h2>
<p>
One important thing to do when setting up your account is to upload your public SSH key to Launchpad.net.  The key enables the bzr+ssh protocol, which the code management system uses when you push a Bazaar branch to the Launchpad.net supermirror (more on that later).  If you have already generated your public key, skip the next subsection and proceed to &#8220;<a href="#upload-ssh-key" >Upload your public key</a>&#8221;.
</p>
<p>
To generate your SSH key, you will need to have OpenSSH installed.  For Ubuntu users, simply do <tt>sudo apt-get install openssh-client</tt>.  For other Linux users, use your package manager of choice.  Windows users can use the PuTTY key generator, and should follow <a href="https://help.launchpad.net/YourAccount/CreatingAnSSHKeyPair" >instructions on the excellent Launchpad.net Help wiki</a>.  Once you have OpenSSH installed, it&#8217;s time to generate your key.  Do so with the following in a terminal:
</p>
<pre>
ssh-keygen
</pre>
<p>
When prompted, accept the default key file names (~/.ssh/id_rsa for the private key and ~/.ssh/id_rsa.pub for the public key), and then enter a passphrase to protect the key file.
</p>
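<p>
If you might already have a key, or want to script this step, a small guard avoids regenerating one by accident. This is a sketch: <tt>ensure_ssh_key</tt> is a hypothetical helper name, and the <tt>-t rsa -b 2048</tt> choice is an assumption of this sketch rather than a stated Launchpad requirement.
</p>
<pre>
# ensure_ssh_key KEYFILE
# Hypothetical helper: generate an SSH key pair at KEYFILE (private key) and
# KEYFILE.pub (public key), unless a public key already exists there.
ensure_ssh_key() {
    keyfile=$1
    if [ -f "$keyfile.pub" ]; then
        echo "Using existing key $keyfile.pub"
        return 0
    fi
    # Create the key directory (e.g. ~/.ssh) if needed; with -p, mode 700
    # applies only to the last directory, and only if mkdir creates it.
    mkdir -p -m 700 "$(dirname "$keyfile")"
    # Interactive: ssh-keygen prompts for a passphrase.
    ssh-keygen -t rsa -b 2048 -f "$keyfile"
}

# Example:
#   ensure_ssh_key "$HOME/.ssh/id_rsa"
</pre>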
<h3 id="upload-ssh-key">Upload your public key</h3>
<p>
You can output your public key using the following:
</p>
<pre>
cat ~/.ssh/id_rsa.pub
</pre>
<p>
Go ahead and copy the public key as-is; you&#8217;ll need it soon.
</p>
<p>
Now that you&#8217;ve generated your SSH keys, click the &#8220;<a href="https://launchpad.net/people/+me/+editsshkeys"  title="Edit your Launchpad SSH keys">SSH Keys</a>&#8221; link in the sub-navigation bar of your profile.  Paste your public key into the text area marked &#8220;Add an SSH Key&#8221; and then click &#8220;Import Public Key&#8221;.  OK, all set; let&#8217;s tackle the GPG keys next.
</p>
<h2>GPG Keys (Optional, but Recommended)</h2>
<p>
Before we generate a GPG key and upload one to Launchpad.net, you will first want to ensure that you have a mail reader capable of decrypting PGP-encrypted emails.  Personally, I use Thunderbird and the excellent Enigmail plugin for this, but you will want to use your own preferred MUA.  Use this <a href="https://help.launchpad.net/ReadingOpenPgpMail" >help article for assistance in setting up PGP</a> for your mail client of choice.
</p>
<p>
OK, next we&#8217;ll go ahead and generate GnuPG keys for use with email security and the Launchpad.net mailing lists.  To generate a GnuPG key pair, issue the following in a terminal:
</p>
<pre>
gpg --gen-key
</pre>
<p>
This starts a series of questions for you to answer, including which email address the key is for, your name, a passphrase, and the key size in bits (I chose 2048).  Here is what the session looks like in a terminal:
</p>
<pre>
[510][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --gen-key
gpg (GnuPG) 1.4.6; Copyright (C) 2006 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.

Please select what kind of key you want:
   (1) DSA and Elgamal (default)
   (2) DSA (sign only)
   (5) RSA (sign only)
Your selection? 1
DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048)
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Jay Pipes
Email address: xxxxxxxxx
Comment:
You selected this USER-ID:
    "Jay Pipes <xxxxxxxxx>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
...
gpg: key 9C5804A8 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   1024D/9C5804A8 2008-08-22
      Key fingerprint = 16C5 50D4 7061 03A1 48CD  3826 0CAD 7BD9 9C58 04A8
uid                  Jay Pipes <xxxxxxxxxx>
sub   2048g/F7F4A925 2008-08-22
</pre>
<p>
OK, once generated, verify that everything worked as expected using the following:
</p>
<pre>
gpg --list-keys
</pre>
<p>
which should produce output similar to the following:
</p>
<pre>
[512][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --list-keys
/home/jpipes/.gnupg/pubring.gpg
-------------------------------
pub   1024D/9C5804A8 2008-08-22
uid                  Jay Pipes <xxxxxxxxxx>
sub   2048g/F7F4A925 2008-08-22
</pre>
<p>
The next step is to push your public key to the Ubuntu key server.  Technically, you could push it to any public key server, but to keep things simple, just use the Ubuntu one.  You&#8217;ll need to take note of your <em>public key ID</em>, which is the 8-character hex number following <tt>1024D/</tt> in the list-keys output.  In my case, that public key ID is &#8220;9C5804A8&#8221;.  Issue the following command, substituting your own public key ID:
</p>
<pre>
gpg --keyserver keyserver.ubuntu.com --send-keys YOUR_KEY_ID
</pre>
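<p>
If you would rather not copy the key ID by hand, it can be pulled out of the <tt>gpg --list-keys</tt> output. A minimal sketch, assuming the GnuPG 1.4-style listing shown above (a <tt>pub</tt> line containing <tt>1024D/9C5804A8</tt>); newer GnuPG releases format this listing differently, and for anything beyond a quick helper the machine-readable <tt>--with-colons</tt> output is the sturdier thing to parse. <tt>gpg_key_id</tt> is a hypothetical helper name, and <tt>you@example.com</tt> is a placeholder.
</p>
<pre>
# gpg_key_id: read `gpg --list-keys` output on stdin and print the short key ID
# from the first "pub" line, e.g. "pub   1024D/9C5804A8 2008-08-22" gives 9C5804A8.
# Assumes the GnuPG 1.4-style layout; other versions print this differently.
gpg_key_id() {
    awk '$1 == "pub" { split($2, parts, "/"); print parts[2]; exit }'
}

# Example (restrict the listing to one user ID if you have several keys):
#   KEY_ID=$(gpg --list-keys you@example.com | gpg_key_id)
#   gpg --keyserver keyserver.ubuntu.com --send-keys "$KEY_ID"
</pre>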
<p>
The final step is to make Launchpad.net aware of your new GPG key.  To do so, you need to send your GPG <em>fingerprint</em> to Launchpad.  To grab your fingerprint, issue the following command:
</p>
<pre>
gpg --fingerprint
</pre>
<p>
which should produce something very similar to:
</p>
<pre>
[514][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --fingerprint
/home/jpipes/.gnupg/pubring.gpg
-------------------------------
pub   1024D/9C5804A8 2008-08-22
      Key fingerprint = 16C5 50D4 7061 03A1 48CD  3826 0CAD 7BD9 9C58 04A8
uid                  Jay Pipes <xxxxxxxxxx>
sub   2048g/F7F4A925 2008-08-22
</pre>
<p>
Copy the key fingerprint, paste it into the text area on the <a href="https://launchpad.net/people/+me/+editpgpkeys" >OpenPGP keys page in your profile</a>, and then click the &#8220;Import Key&#8221; button.  Launchpad.net will send an encrypted confirmation message to the key&#8217;s email address (this is why you need a PGP-capable mail reader).  Decrypt it and click the link under &#8220;Please go here to finish adding the key to your Launchpad account:&#8221;.  And you are all done.
</p>
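<p>
To avoid selecting the fingerprint by hand, the <tt>Key fingerprint =</tt> line can be picked out of the <tt>gpg --fingerprint</tt> output. This is a sketch that assumes the GnuPG 1.4-style layout shown in this article (other GnuPG versions may print the fingerprint without that label), and <tt>gpg_fingerprint</tt> is a hypothetical helper name. It prints the fingerprint exactly as gpg shows it, spaces included, so the result matches what you would otherwise copy.
</p>
<pre>
# gpg_fingerprint: read `gpg --fingerprint` output on stdin and print the
# first fingerprint found, exactly as gpg formats it.
gpg_fingerprint() {
    sed -n 's/^ *Key fingerprint = //p' | head -n 1
}

# Example (you@example.com is a placeholder for your key's user ID):
#   gpg --fingerprint you@example.com | gpg_fingerprint
</pre>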
<p>
Now you have all the pieces set up to begin working with Launchpad.net effectively.  In the next few posts in this series, I&#8217;ll walk you through how to best use the platform to be a productive contributor to the MySQL ecosystem. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-1-getting-started/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Getting a Working C/C++ Development Environment for Developing Drizzle</title>
		<link>http://www.joinfu.com/2008/08/getting-a-working-c-c-plusplus-development-environment-for-developing-drizzle/</link>
		<comments>http://www.joinfu.com/2008/08/getting-a-working-c-c-plusplus-development-environment-for-developing-drizzle/#comments</comments>
		<pubDate>Thu, 21 Aug 2008 20:34:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2008/08/getting-a-working-c/c++-development-environment-for-developing-drizzle</guid>
		<description><![CDATA[This article explains how to set up a properly functioning C/C++ development environment on Linux. The article is aimed at developers interested in contributing to the Drizzle server project, but the vast majority of the content applies equally well to developers wishing to contribute to the MySQL server or any other open source project written [...]]]></description>
			<content:encoded><![CDATA[<p>This article explains how to set up a properly functioning C/C++ development environment on Linux.  The article is aimed at developers interested in contributing to the <a title="Drizzle Server" href="http://launchpad.net/drizzle">Drizzle server project</a>, but the vast majority of the content applies equally well to developers wishing to contribute to the <a title="MySQL Server Code" href="http://launchpad.net/mysql-server">MySQL server</a> or any other open source project written in C/C++</p>
<p><strong>IMPORTANT</strong>: This article doesn&#8217;t get into any religious battles over IDEs or particular editors.  IDEs and editors are what you <em>use</em> to edit code.  What this article covers is the surrounding libraries, toolchain, and dependencies needed to get into the development or contribution process.  That said, <a title="Vim" href="http://www.vim.org/">go Vim</a>. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>The examples shown use the Debian/<a href="http://ubuntu.com">Ubuntu</a> methods of obtaining and installing packages, specifically <tt>apt-get install</tt>.  If you are running an RPM-based distro, simply change the commands to use your package manager of preference, for instance <tt>yum install</tt>.  Solaris users should get as far as they can in the installation/setup process and then hop over to the Freenode #drizzle channel for help from one of the Solaris experts.  The Drizzle wiki has <a href="http://drizzle.org/wiki/Compiling">more information</a> on the specific packages needed for different distributions.</p>
<p>OK, let&#8217;s get started, shall we?</p>
<h2>Installing Bazaar and Some Bzr Goodies</h2>
<p>So, what is this Bazaar thing and why do you need it?  Good question!  <a title="Bazaar NG" href="http://bazaar-vcs.org">Bazaar</a> is the revision control system that the Drizzle development team (as well as the MySQL engineering team) uses for source code control.  The good folks over at <a title="Canonical" href="http://canonical.com">Canonical</a> maintain and enhance the excellent Bazaar-NG system and have an online platform called <a title="Launchpad.net" href="http://launchpad.net">Launchpad.net</a> which is tightly integrated with Bazaar.  Launchpad.net is kind of like SourceForge.net, but built around Bazaar as the revision control system, and it includes a number of nifty features that make it easier to manage and maintain teams of developers working on the code base.  The Drizzle Server project is hosted on Launchpad.net at <a href="http://launchpad.net/drizzle">http://launchpad.net/drizzle</a>.</p>
<p>To install Bazaar, issue the following:</p>
<pre>sudo apt-get install bzr
</pre>
<h3>Some Optional Goodies</h3>
<p>Once Bazaar is installed, you might want to install a few more things that will make your <tt>bzr</tt> life easier.  The <tt>bzrtools</tt> package is a collection of command-line and graphical utilities for <tt>bzr</tt>.  <tt>meld</tt> is a graphical merge conflict resolution utility that I have found invaluable at times.  <tt>PoEdit</tt> is an easy way to work with the gettext translation utilities.  To install these tools, do:</p>
<pre>sudo apt-get install bzrtools meld poedit
</pre>
<p>OK, that&#8217;s it for Bazaar for now.  Let&#8217;s move on to getting your development toolchain installed.</p>
<h2>The Required Toolchain Packages and Library Dependencies</h2>
<p>In order to compile the Drizzle server, you will need a working <a title="GNU Toolchain" href="http://en.wikipedia.org/wiki/GNU_toolchain">GNU Toolchain</a> with the C++ development tools.</p>
<h3>The Easy Way</h3>
<p>If you are working on Debian/Ubuntu, the best and most consistent way to get these build dependencies is to add the Drizzle Developers Personal Package Archive (PPA) to your apt sources and install the Drizzle dependencies from there.</p>
<p>To do this from the terminal:</p>
<pre># For Ubuntu &gt;= 9.10 (Karmic):
sudo add-apt-repository ppa:drizzle-developers/ppa
sudo apt-get update
sudo apt-get install drizzle-dev
</pre>
<p>For versions of Ubuntu prior to 9.10, simply head over to the Drizzle Developers PPA and follow the instructions on that page.</p>
<h3>The Hard Way</h3>
<p>The following packages are required for building Drizzle.</p>
<ul>
<li>binutils (includes <tt>ld</tt> and <tt>as</tt>)</li>
<li><tt>gcc</tt>/g++ (4.2+)</li>
<li><tt>autoconf</tt> (2.59+)</li>
<li><tt>automake</tt> (1.10+)</li>
<li><tt>libtool</tt> (1.5.24+)</li>
<li><tt>m4</tt> (1.4.8+)</li>
<li><tt>bison</tt>/yacc (2.3)</li>
<li><tt>pkg-config</tt> (0.22+)</li>
</ul>
<p>To install the above toolchain, do the following:</p>
<pre>sudo apt-get install libc-dev gcc g++ bison binutils automake autoconf m4 pkg-config libtool
</pre>
<div style="background-color: #f7f7f7; color: red; padding: 8px 30px; margin: 5px 0px;"><strong>NOTE:</strong> Thanks to Tom Hanlon&#8230;older versions of Ubuntu may need to specify <strong>libc6-dev</strong>, not libc-dev!</div>
<p>Once apt-get finishes installing the above, you&#8217;ll have a system capable of compiling C/C++ programs, for the most part.  The Drizzle server needs some additional libraries and header files in order to compile.  I list them here along with a brief description of the library or file.</p>
<ul>
<li><tt>libpcre3</tt> &#8211; A standard PCRE regular expression library</li>
<li><tt>libpam0g</tt> &#8211; A pluggable authentication modules library (a <strong>horrible package name, no?&#8230;</strong>)</li>
<li><tt>libncurses</tt> &#8211; A library for displays of terminal applications (used by the drizzle client)</li>
<li><tt>libprotobuf</tt> &#8211; <a title="Google Proto Buffers" href="http://code.google.com/apis/protocolbuffers/docs/overview.html">Google Proto Buffers</a> library, used by the server in message communication</li>
<li><tt>gettext</tt> &#8211; i18n and l10n services</li>
<li><tt>libevent</tt> &#8211; socket event handling</li>
<li><tt>libz</tt> &#8211; compression</li>
<li><tt>libreadline</tt> &#8211; Command-line editing utilities</li>
<li><tt>uuid-dev</tt> &#8211; UUID headers</li>
<li><tt>libboost-dev</tt> &#8211; Boost C++ headers and library</li>
<li><tt>libboost-program-options-dev</tt> &#8211; Boost program_options library headers</li>
<li><tt>libdrizzle-dev</tt> &#8211; libdrizzle headers</li>
<li><tt>gperf</tt> &#8211; GNU hash function generator</li>
<li><tt>libgcrypt11-dev</tt> &#8211; GNU Crypto library</li>
</ul>
<p>The following command should install the required libraries with the exception of Google Proto Buffers, which is described in the following section.</p>
<pre>sudo apt-get install libpcre3-dev libpam0g libncurses5-dev libpam0g-dev gettext libevent-dev libz-dev libreadline-dev uuid-dev  libgcrypt11-dev libboost-dev libboost-program-options-dev libdrizzle-dev gperf
</pre>
<h2>Installing Google Proto Buffers</h2>
<p>After installing the libraries and toolchain, you&#8217;ll need to install the <a title="google proto buffers" href="http://code.google.com/apis/protocolbuffers/docs/overview.html">Google Proto Buffers library</a>.  Again, the easiest way to do so is via the Drizzle Developers PPA, which you should have added to your repositories above.  If you did, simply do:</p>
<pre>sudo apt-get install libprotobuf-dev protobuf-compiler
</pre>
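<p>Before moving on, it can be worth confirming that the toolchain programs listed above actually ended up on your <tt>PATH</tt>.  The sketch below is one way to do it; <tt>missing_tools</tt> is a hypothetical helper name, and the program names are an assumption about which binaries the packages install (for example, <tt>libtoolize</tt> from the libtool package and <tt>protoc</tt> from protobuf-compiler).  It checks programs only, not library headers.</p>
<pre># missing_tools PROGRAM...: print each program that cannot be found on PATH.
# Prints nothing when everything is present.
missing_tools() {
    for tool in "$@"; do
        if [ -z "$(command -v "$tool")" ]; then
            echo "$tool"
        fi
    done
}

missing=$(missing_tools gcc g++ as ld autoconf automake libtoolize m4 bison pkg-config protoc)
if [ -n "$missing" ]; then
    echo "Still missing:" $missing
else
    echo "All build tools found."
fi
</pre>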
<h2>Setting Up a Local Bazaar Repository for Drizzle</h2>
<p>Now that you&#8217;ve installed all the required toolchain and dependencies, it&#8217;s time to use Bazaar to pull the development branch of Drizzle and compile the Drizzle server.  The first step is to set up a local bzr repository.  I keep all my bzr repositories in a directory called <tt>~/repos</tt>, and that is what the examples below show, but you are of course welcome to put your repos wherever you prefer.  To set up that directory and a drizzle repo under your home directory, do the following:</p>
<pre>cd ~
mkdir repos
cd repos
bzr init-repo drizzle
cd drizzle
</pre>
<p>At this point, you have a local bzr repository.  Let&#8217;s now create a local branch of the development source code trunk that we can play with.  To do so, we use the <tt>bzr branch</tt> command, like so:</p>
<pre>bzr branch lp:drizzle trunk
</pre>
<p>This tells bzr to go grab the main development branch of the &#8220;drizzle&#8221; project that resides on the Launchpad.net servers (thus, the lp: prefix), and create a local branch called &#8220;trunk&#8221;.  The branch operation may take a little while to complete when you do it for the first time.  Subsequent branch and merge operations are much, much quicker than the first branch into a repository.  When the branch succeeds, go ahead and look at the files that have been downloaded into your &#8220;trunk&#8221; branch:</p>
<pre>cd trunk
ls -la
</pre>
<p>You should see something like the following:</p>
<pre>jpipes@serialcoder:~/repos/drizzle/trunk$ ls -la
total 264
drwxr-xr-x 13 jpipes jpipes  4096 2010-02-25 12:39 .
drwxr-xr-x 16 jpipes jpipes  4096 2010-03-10 19:44 ..
-rw-r--r--  1 jpipes jpipes 76502 2010-02-24 23:30 ABOUT-NLS
-rw-r--r--  1 jpipes jpipes   377 2010-02-24 23:30 AUTHORS
drwxr-xr-x  5 jpipes jpipes  4096 2010-02-24 23:30 .bzr
-rw-r--r--  1 jpipes jpipes  5983 2010-02-25 12:39 .bzrignore
drwxr-xr-x  2 jpipes jpipes  4096 2010-03-10 15:27 client
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 config
-rw-r--r--  1 jpipes jpipes  5350 2010-02-24 23:30 configure.ac
-rw-r--r--  1 jpipes jpipes 19071 2010-02-24 23:30 COPYING
-rw-r--r--  1 jpipes jpipes 56574 2010-02-24 23:30 Doxyfile
drwxr-xr-x 16 jpipes jpipes 12288 2010-03-10 15:27 drizzled
-rw-r--r--  1 jpipes jpipes  5962 2010-02-24 23:30 DRIZZLE.FAQ
-rw-r--r--  1 jpipes jpipes  5139 2010-02-24 23:30 EXCEPTIONS-CLIENT
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 extra
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 gnulib
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 m4
-rw-r--r--  1 jpipes jpipes  4545 2010-02-24 23:30 Makefile.am
-rw-r--r--  1 jpipes jpipes    41 2010-02-24 23:30 NEWS
drwxr-xr-x 55 jpipes jpipes  4096 2010-03-05 13:11 plugin
drwxr-xr-x  2 jpipes jpipes  4096 2010-02-24 23:30 po
-rw-r--r--  1 jpipes jpipes   821 2010-02-24 23:30 README
drwxr-xr-x  3 jpipes jpipes  4096 2010-02-24 23:30 support-files
drwxr-xr-x  8 jpipes jpipes  4096 2010-03-10 15:27 tests
</pre>
<h2>Compiling Drizzle</h2>
<p>OK, you are now ready to compile the server and client tools contained in your branch.  The way to do so is the following:</p>
<pre>./config/autorun.sh
./configure --with-debug
make
make test
</pre>
<p>If all goes well, Drizzle will configure and build, and the test suite will run and output (hopefully!) passing results.  The output at the end should be similar to the following:</p>
<pre>...
collation_dictionary.slap      [ pass ]             54
blackhole.blackhole            [ pass ]             13
archive.archive                [ pass ]            509
archive.archive_aio_posix      [ pass ]            487
archive.archive_basic          [ pass ]              2
archive.archive_discover       [ pass ]              2
-------------------------------------------------------
Stopping All Servers
All 290 tests were successful.
The servers were restarted 36 times
Spent 206.430 of 290 seconds executing testcases
</pre>
<p>And that&#8217;s it. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
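<p>When the test output scrolls by quickly, a small filter can summarize it.  This sketch counts lines marked <tt>[ pass ]</tt>, as in the output above, and lines marked <tt>[ fail ]</tt>; the failure marker is an assumption based on the same test-runner format, and <tt>count_results</tt> is a hypothetical helper name.</p>
<pre># count_results [LOGFILE...]: summarize test-run output read from the given
# files (or stdin when no file is given), listing each failed test by name.
count_results() {
    awk '/\[ pass \]/ { pass++ }
         /\[ fail \]/ { fail++; print "FAILED: " $1 }
         END { printf "%d passed, %d failed\n", pass, fail }' "$@"
}

# Example:
#   make test | tee test.log
#   count_results test.log
</pre>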
<p>OK, so that&#8217;s it for this first article.  I hope you&#8217;ve found it helpful in getting a development environment set up so that you can feel comfortable contributing to the Drizzle project.  Join me on Freenode&#8217;s #drizzle IRC channel to get some help.  We&#8217;re always available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2008/08/getting-a-working-c-c-plusplus-development-environment-for-developing-drizzle/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>MySQL Connection Management in PHP &#8211; How (Not) To Do Things</title>
		<link>http://www.joinfu.com/2006/08/mysql-connection-management-in-php--how-not-to-do-things/</link>
		<comments>http://www.joinfu.com/2006/08/mysql-connection-management-in-php--how-not-to-do-things/#comments</comments>
		<pubDate>Mon, 07 Aug 2006 19:31:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://joinfu.com/2006/08/mysql-connection-management-in-php--how-not-to-do-things</guid>
		<description><![CDATA[I&#8217;ll warn you right now, this is going to be a long article. More than likely, I&#8217;ll put a version of this up on the MySQL developer zone and PHP zone. This article is intended to highlight various basic topics concerning proper methods of handling connections to MySQL databases in PHP, guidelines for caching dynamic [...]]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;ll warn you right now, this is going to be a long article.  More than likely, I&#8217;ll put a version of this up on the MySQL developer zone and PHP zone.  This article is intended to highlight various basic topics concerning proper methods of handling connections to MySQL databases in PHP, guidelines for caching dynamic content, and a technique called &#8220;lazy loading&#8221;.  Hopefully by the end of the article you&#8217;ll have learned how to combat a very widespread and potentially devastating scalability problem seen in an enormous number of PHP web applications.
</p>
<h3>An introduction to the problem</h3>
<p>
Before I start the discussion on connecting to MySQL servers via PHP, it&#8217;s worth pointing out that the cost of connecting to a MySQL database is very, very low relative to connecting to a PostgreSQL or Oracle installation.  However, the fact that connecting to a MySQL resource is inexpensive <em>does not</em> mean that connection resources can be abused.  Anyone who has ever seen the dreaded &#8220;<a href="http://dev.mysql.com/doc/refman/5.0/en/too-many-connections.html"  title="Too Many Connections error">Too many connections</a>&#8221; error knows what I am talking about: it occurs when clients attempt to open more concurrent connections than the <tt>max_connections</tt> configuration variable allows.
</p>
<p>
Connecting to a MySQL resource, while inexpensive, requires PHP to call the MySQL client API via <a href="http://www.php.net/mysql_connect"  title="mysql_connect PHP function">mysql_connect</a>().  The connection arguments (database host, user, password, etc.) are passed to this function, which returns a MySQL link identifier if the connection succeeds, or <tt>FALSE</tt> if it fails.  Upon failure, the <a href="http://www.php.net/mysql_error"  title="mysql_error PHP function">mysql_error</a>() function can be used to determine what went wrong during the connection attempt (typically user credential problems or the max connections issue).
</p>
<p>
So, where, you ask, is the problem?  Well, the issue that I commonly see is that connections are made to MySQL resources when they do not need to be made.  But, you say, almost <em>all</em> web applications serve <em>dynamic</em> content, so therefore doesn&#8217;t dynamic content virtually <em>require</em> a connection to a database be made?
</p>
<h3>Well, not really in many, many cases</h3>
<p>
Let&#8217;s take as an example a very, very popular PHP web application installed on <em>hundreds of thousands</em> of servers worldwide: the blogging application <a href="http://www.wordpress.org"  title="Wordpress">WordPress</a>.  Now, before I go any further, I want to make clear that I am not attacking WordPress in this article or deliberately trying to point out shortcomings in its software.  Rather, I picked WordPress to demonstrate how widespread the problem described in this article is among PHP web applications.  At the very end of the article, I&#8217;ll present a patch to the current WordPress source code that fixes the issue identified in this article.  I will post the patch to the wp-testers mailing list after I complete the article.  Promise. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<p>
Now, a blogging application is indeed a data-driven web application.  Typical blog software involves the posting of articles, the management of comments, the display of such articles and comments, and, of course, pages which serve to provide the various RSS and Atom feeds for the blog.  So, one might argue that blogging software is highly dynamic, and therefore would necessarily issue calls to connect to the database upon every visit to the blogsite.
</p>
<p>
This, however, is not entirely true.  Even on extremely busy blogs, content doesn&#8217;t change continuously.  This point is even more relevant when you consider that, a few days after an article is posted, its content becomes almost entirely static.  Keep this point in mind as you follow the next sections, which walk through the page invocation process that WordPress executes on <strong>every</strong> hit to a blog page (including feed pages).
</p>
<h3>Investigating WordPress page invocation</h3>
<p>
All PHP pages in the WordPress main directory (index.php, wp-atom.php, wp-rss.php, etc.) begin with the following include:</p>
<pre>
require('./wp-blog-header.php');
</pre>
<p>The wp-blog-header.php page does a couple things but mostly serves to include the following:</p>
<pre>
require_once( dirname(__FILE__) . '/wp-config.php');
</pre>
<p>OK, so we&#8217;re heading over to wp-config.php&#8230; and we find some good stuff, such as the defines for the database connection parameters, and:</p>
<pre>
define('ABSPATH', dirname(__FILE__).'/');
require_once(ABSPATH.'wp-settings.php');
</pre>
<p>OK, so a quick :find wp-settings.php later, we open up the first meaty file of our page invocation.  The first 72 code lines of wp-settings.php do some housekeeping stuff, like checking if the PHP version is adequate and if the MySQL extension is installed (tangent: is it <em>really</em> necessary to do this on <em>every</em> web page invocation?!).  After that, we see the following include:</p>
<pre>
define('WPINC', 'wp-includes');
require_once (ABSPATH . WPINC . '/wp-db.php');
</pre>
</p>
<h3>Into the heart of the beast</h3>
<p>
OK, so thus far there&#8217;s been nothing spectacular or extraordinary about the code.  Just a few includes to make file maintenance orderly, but nothing unusual.  However, the wp-includes/wp-db.php file contains perhaps the most common PHP/MySQL gotcha seen in today&#8217;s web applications.
</p>
<p>
The file starts out with some defines and then the class definition of the WordPress database abstraction object, called wpdb.  The wpdb class contains the very typical methods commonly seen in a DB abstraction layer: get_col(), get_row(), get_results(), etc., which allow a query string to be passed in and executed against a MySQL database.  However, there is one major problem in the design, which manifests itself in the last line of the file:</p>
<pre>
$wpdb = new wpdb(DB_USER, DB_PASSWORD, DB_NAME, DB_HOST);
</pre>
</p>
<p>
So, what&#8217;s wrong with that, you say?  Well, what does the new operator do?  It creates an object of the specified class and, during the creation of the object, calls the class <em>constructor</em>.  In this PHP 4-style code, the constructor is the class method with a name identical to the class name, in this case the method called wpdb, shown here:</p>
<pre>
	// ==================================================================
	//	DB Constructor - connects to the server and selects a database

	function wpdb($dbuser, $dbpassword, $dbname, $dbhost) {
		$this->dbh = @mysql_connect($dbhost, $dbuser, $dbpassword);
		if (!$this->dbh) {
			$this->bail("... content snipped ... ");
		}

		$this->select($dbname);
	}
</pre>
</p>
<p>
Can anyone tell what is happening on every page invocation to a WordPress blog?  Yup.  A connection is being made to the underlying MySQL database.  On a heavily hit blog site, this code can easily lead to the dreaded &#8220;Too many connections&#8221; error, because a connection to the database is being made even for mostly static content.  There are a couple ways to combat this problem: <em>Lazy loading</em> and <em>Content Caching</em>.  These two techniques can be used together to eliminate a huge portion of the database queries and connections in typical web applications.
</p>
<h3>Lazy Loading</h3>
<p>
The code in the wpdb class isn&#8217;t fundamentally wrong.  It just needs some tweaking to ensure that a connection to the database is only made <em>if a query is executed against the database</em>.  A technique called lazy loading delays the connection to the database until a query actually needs it, instead of connecting upon creation of the database abstraction object.
</p>
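<p>
To make the cost difference concrete, here is a small simulation.  The article&#8217;s code is PHP, but this sketch uses Python purely for illustration.  The class names EagerDb and LazyDb, the connect() stand-in, and the assumption that one hit in ten needs the database are all hypothetical.  Each simulated page hit creates a database object, but only some hits actually run a query:
</p>
<pre>
class ConnectionCounter:
    """Stand-in for mysql_connect(): it only counts how often it is called."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        return object()  # placeholder for a real connection handle


class EagerDb:
    """Connects in the constructor, like the wpdb class."""

    def __init__(self, counter):
        self.conn = counter.connect()

    def query(self, sql):
        return ("result", sql)


class LazyDb:
    """Delays connecting until the first query is issued."""

    def __init__(self, counter):
        self.counter = counter
        self.conn = None

    def query(self, sql):
        if self.conn is None:
            self.conn = self.counter.connect()
        return ("result", sql)


def serve(db_class, hits):
    """Simulate one page hit per entry in hits; return connections opened."""
    counter = ConnectionCounter()
    for needs_db in hits:
        db = db_class(counter)  # every page creates the DB object
        if needs_db:
            db.query("SELECT 1")
    return counter.opened


# 1000 hits, of which only every tenth needs fresh data
hits = [i % 10 == 0 for i in range(1000)]
print(serve(EagerDb, hits))  # 1000
print(serve(LazyDb, hits))   # 100
</pre>
<p>
With the eager design every hit opens a connection, while the lazy design opens one only for the hits that actually query the database.
</p>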
<p>
The MySQL Forge database abstraction layer uses lazy loading in just this way.  The name of the class is SqlConnection, and it has an empty constructor.  Instead of connection logic embedded in the constructor, the object has a Connect() method, which looks like the following.  The code has been modified only to remove the logic which automatically handles master/slave replication switching:</p>
<pre>
    /**
     * Attempt to connect the resource based on supplied parameters.
     *
     * @return  boolean
     * @access  public
     *
     * @param   string  (optional) Host name (Server name)
     * @param   string  (optional) User Name
     * @param   string  (optional) User Password
     * @param   string  (optional) Database Name
     */
    function Connect() {
        if (func_num_args() == 4) {

            // A different database has been requested other than the
            // standard global config settings
            $host = func_get_arg(0);
            $user = func_get_arg(1);
            $pass = func_get_arg(2);
            $dbname = func_get_arg(3);

        }
        else {
            $host = SQL_HOST;
            $user = SQL_USER;
            $pass = SQL_PASS;
            $dbname = SQL_DB_NAME;
        }

        /**
         * Short circuit out when already
         * connected.  To reconnect, pass
         * args again
         */
        if (is_resource($this->_Cnn) &#038;&#038; func_num_args() != 4) {return true;}

        if (! $this->_Cnn = mysql_connect($host, $user, $pass)) {
            trigger_error(get_class($this) .
                          "::Connect() Could not connect to server: " .
                          mysql_error(), E_USER_ERROR);
            return false;
        }
        else {
            if (! mysql_select_db($dbname, $this->_Cnn)) {
                trigger_error(get_class($this) .
                              "::Connect() Could not connect to specified database on server: " .
                              mysql_error(), E_USER_ERROR);
                return false;
            }
            else {
                return true;
            }
        }
    }
</pre>
</p>
<p>
The _Cnn member variable holds the MySQL link resource returned by a successful call to mysql_connect().  Notice that Connect() short-circuits: if a connection has already been made during this page execution and no new connection parameters were passed, it simply returns true without reconnecting.
</p>
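<p>
The short-circuit logic in Connect() can be sketched in Python as follows.  This is an illustrative translation, not the MySQL Forge code.  The DEFAULTS dictionary stands in for the SQL_HOST, SQL_USER, SQL_PASS and SQL_DB_NAME constants, and connect_fn is a hypothetical injected replacement for mysql_connect().  Unlike the PHP version, which requires all four arguments, this sketch lets callers override any subset of the defaults:
</p>
<pre>
# Hypothetical defaults standing in for SQL_HOST, SQL_USER, SQL_PASS, SQL_DB_NAME
DEFAULTS = {"host": "localhost", "user": "forge", "password": "secret", "dbname": "forge"}


class ConnectSketch:
    def __init__(self, connect_fn):
        self._connect_fn = connect_fn  # injected stand-in for mysql_connect()
        self._cnn = None               # like the _Cnn member variable

    def connect(self, **override):
        # Short circuit: already connected and no new parameters supplied
        if self._cnn is not None and not override:
            return True
        params = dict(DEFAULTS, **override)
        self._cnn = self._connect_fn(params)
        return self._cnn is not None


calls = []


def fake_connect(params):
    """Records which database was requested and returns a dummy handle."""
    calls.append(params["dbname"])
    return object()


db = ConnectSketch(fake_connect)
db.connect()                  # first call: connects to the default database
db.connect()                  # already connected: returns True, no new connection
db.connect(dbname="archive")  # explicit parameters force a reconnect
print(calls)  # ['forge', 'archive']
</pre>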
<p>
You may be surprised to find out that, just like WordPress, the MySQL Forge software creates a database abstraction object upon each call to the MySQL Forge website.  The following code is included in all page invocations:</p>
<pre>
/**
 * Fine to establish a global connection
 * here, since connect doesn't occur until
 * SQL execution.
 */
require_once(DIR_LIB . 'class/SqlConnection.php');
$GLOBALS['Db'] =&#038; new SqlConnection();
</pre>
</p>
<p>
The difference is that a connection to the database is not made in the SqlConnection class constructor, so having the object instantiated doesn&#8217;t consume database resources, unlike the wpdb class.  So, if the constructor doesn&#8217;t call Connect(), then when exactly will mysql_connect() be called?  Here, we see the Execute() method of SqlConnection:</p>
<pre>
    /**
     * Executes the supplied SQL statement and returns
     * the result of the call.
     *
     * @return  bool
     * @access  public
     *
     * @param   string  SQL to execute
     */
    function Execute( $Sql ) {

        /* Auto-connect to database */
        if (! $this->_Cnn) {
            $this->Connect();
        }

        if (!$this->_Res = mysql_query($Sql, $this->_Cnn)) {
            trigger_error(get_class($this) .
                          "::Execute() Could not execute: " .
                          mysql_error() .
                          " (SQL: " . $Sql . ")", E_USER_ERROR);
            return false;
        }
        else {
            return true;
        }
    }
</pre>
</p>
<p>
In the Execute() method, you can see that if the _Cnn member variable is not set (meaning a connection has not yet been made to the database), the SqlConnection first connects.  In either case, it then executes the supplied string against that connection via the <a href="http://www.php.net/mysql_query"  title="mysql_query PHP function">mysql_query</a>() function and stores the returned result resource in the _Res member variable for use by other methods in the class.
</p>
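<p>
The Connect()/Execute() split carries over naturally to other languages.  The following Python sketch uses the standard sqlite3 module as a stand-in for MySQL, purely so the example runs without a server.  The method name get_records() only mirrors GetRecords() and is hypothetical.  Constructing the object opens nothing, and the first query triggers the connection:
</p>
<pre>
import sqlite3


class LazySqlConnection:
    """Sketch of the SqlConnection idea: no connection until the first query."""

    def __init__(self, path=":memory:"):
        self.path = path
        self._cnn = None  # like _Cnn: nothing opened in the constructor
        self._res = None  # like _Res: the last result cursor

    def connect(self):
        if self._cnn is None:  # short circuit when already connected
            self._cnn = sqlite3.connect(self.path)
        return True

    def execute(self, sql, params=()):
        if self._cnn is None:  # auto-connect on first use
            self.connect()
        self._res = self._cnn.execute(sql, params)  # errors raise sqlite3.Error
        return True

    def get_records(self, sql, params=()):
        self.execute(sql, params)
        return self._res.fetchall()


db = LazySqlConnection()
print(db._cnn is None)                 # True: constructing the object opened nothing
print(db.get_records("SELECT 1 + 1"))  # [(2,)]
print(db._cnn is None)                 # False: the first query connected
</pre>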
<p>
Other methods of SqlConnection simply wrap the Execute() call and provide result sets in various forms.  What this means is that on page invocations to MySQL Forge, unless dynamic data is actually needed, no connections to the database are created.  Which leads us nicely to the other technique for handling semi-dynamic content web requests: <em>Content Caching</em>.
</p>
<h3>Content Caching</h3>
<p>
<a href="http://en.wikipedia.org/wiki/Caching"  title="Caching article on Wikipedia">Caching</a> is perhaps the most fundamental concept in computer science when it comes to the performance of both hardware and software.  A cache, simply defined, is a storage area for data that has been parsed, retrieved, calculated, or otherwise generated in an expensive operation.  The cache spares the system from regenerating that data upon every request for it.
</p>
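<p>
Caching an expensive computation needs no special infrastructure.  As a small illustration (in Python rather than PHP), the standard library decorator functools.lru_cache keeps results in process memory.  In the sketch, word_count() is a hypothetical stand-in for an expensive operation, and a counter records how often it really runs:
</p>
<pre>
from functools import lru_cache

calls = 0


@lru_cache(maxsize=128)
def word_count(text):
    """Stand-in for an expensive computation; counts how often it really runs."""
    global calls
    calls += 1
    return len(text.split())


for _ in range(5):
    word_count("the art of sql")

print(calls)                         # 1: computed once, then served from the cache
print(word_count.cache_info().hits)  # 4
</pre>
<p>
An in-process cache like this lasts only as long as the process.  A PHP page request starts from scratch each time, which is one reason a PHP application needs a cache that lives outside the request, such as files or memcached.
</p>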
<p>
Caches exist everywhere in both hardware and software.  For instance, at the hardware level, modern CPUs usually have at least two levels of cache (usually called the L1 and L2 caches).  These small, fast caches sit close to the CPU so that it does not have to fetch data from main memory (RAM) on every access.  A RAM access is <em>relatively</em> expensive because its latency is much higher than that of the on-chip Lx caches.  When speaking about caches, it&#8217;s important to recognize that everything is relative to something else.  Accessing a hard disk is much more expensive than accessing a page of RAM, which is much more expensive than accessing a line of bytes stored in the L1 cache.  Likewise, in application-specific caches (which we&#8217;ll be talking about next), accessing cached data is cheaper than retrieving the same information from the MySQL database.
</p>
<p>
So, let&#8217;s talk a bit about basic content caching for a PHP web application.  Although these examples use PHP, the discussion of application caching applies to all languages.  Every web scripting language provides similar functionality to implement caching.
</p>
<p>
Application content caching occurs when a standard call to the database is replaced with a call to an application content cache.  In these examples, we&#8217;ll implement a simple file-based cache.  Other solutions are, of course, available, including memcached or a static content web server proxy that serves web content.  WordPress actually implements its own caching mechanism, called ObjectCache; you can take a look at the implementation in wp-includes/cache.php.  However, this implementation has a couple of design limitations that make it unsuitable for a discussion of general caching: it is tightly coupled to other WordPress functions and objects, which makes the caching mechanism unfriendly for <a href="http://en.wikipedia.org/wiki/Loose_coupling"  title="Loose Coupling">general re-use</a>.
</p>
<h4>A Simple File Cache Engine</h4>
<p>
Before we get into the implementation of the CacheEngine class that MySQL Forge uses, let&#8217;s first take a look at some code from the /lib/class/ProjectMemberFinder.php class that handles requests to retrieve information about the members involved in a project listed in the <a href="http://forge.mysql.com/projects/"  title="MySQL Forge Project Directory">MySQL Forge project directory</a>:
</p>
<pre>
    /**
     * Return project members based on project ID value
     *
     * @return  array
     * @param   int     project ID
     */
    function &#038;GetByProjectId($Project) {

        /**
         * ProjectMembers don't change that often,
         * so cache the output of these calls.
         */
        $cache_id = 'project_members-' . $Project;

        if ($cache = $GLOBALS['CEngine']->GetFromCache($cache_id, $Seconds=0, $IsObject=true)) {
            return $cache;
        }

        $sql = "SELECT
                    pm.project
                  , pm.member
                  , fu.display_name
                  , pmr.description as role
                  , pm.can_write
                  , pm.can_read
                  , pm.joined_on
                  , pm.last_source_login
                  , pm.last_source_commit
              FROM " . $GLOBALS['SqlTables']['ProjectMember'] . " pm
                  INNER JOIN " . $GLOBALS['SqlTables']['ForgeUser'] . " fu
                      ON pm.member = fu.user_id
                  INNER JOIN " . $GLOBALS['SqlTables']['ProjectMemberRole'] . " pmr
                       ON pm.role = pmr.project_member_role_id
              WHERE pm.project = " . (int) $Project;

        $results = $GLOBALS['Db']->GetRecords($sql);

        $GLOBALS['CEngine']->WriteToCache($cache_id, $results);

        return $results;
    }
</pre>
<p>
OK, so the first thing you will notice is that there&#8217;s a comment saying basically, &#8220;look, this information really doesn&#8217;t change all that much.  Let&#8217;s go ahead and cache the results of the database query for later re-use&#8221;.  We first ask the global CacheEngine object ($GLOBALS['CEngine']) if we have a cached version of the supplied Project&#8217;s project members list:
</p>
<pre>
if ($cache = $GLOBALS['CEngine']->GetFromCache($cache_id, $Seconds=0, $IsObject=true)) {
    return $cache;
}
</pre>
<p>
The GetFromCache() method of the CacheEngine class returns the requested data, or FALSE if it is not available.  So, in the above code, we simply return the cached data if it exists in the cache.  The $Seconds argument is the number of seconds for which the cached data should be considered valid; passing a zero means the data never expires.  The $IsObject argument tells the CacheEngine whether to unserialize the cached string back into a PHP array or object before returning it.  We&#8217;ll see how this is implemented in a little bit.
</p>
<p>
OK, so if the cached data does <em>not</em> exist in the cache, the ProjectMemberFinder::GetByProjectId() method continues on to request the data from the underlying database.  The global Db abstraction layer object (described earlier) has its GetRecords() method called, with a SQL string passed as a parameter:
</p>
<pre>
$results = $GLOBALS['Db']->GetRecords($sql);
</pre>
<p>
It is the next line of code that facilitates the caching of this data in our content cache:
</p>
<pre>
$GLOBALS['CEngine']->WriteToCache($cache_id, $results);
</pre>
<p>
So, upon the first invocation of the GetByProjectId() method of ProjectMemberFinder for each unique Project ID value, we issue a request to the database and then cache the results for each subsequent call to the function.  This saves a great deal of database interaction and increases the overall scalability of the system: the software can handle more concurrent requests because the database connection is no longer a bottleneck.
</p>
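<p>
The same read-through flow can be sketched outside PHP.  This Python sketch is illustrative only.  An in-memory sqlite3 database stands in for MySQL, a plain dictionary stands in for the file-based CacheEngine, and the table and member names are made up.  A counter shows that repeated calls for the same project reach the database only once:
</p>
<pre>
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE project_member (project INTEGER, member TEXT);
    INSERT INTO project_member VALUES (1, 'alice'), (1, 'bob'), (2, 'carol');
""")

cache = {}       # stands in for the file-based CacheEngine
db_queries = 0   # counts real trips to the database


def get_by_project_id(project):
    global db_queries
    cache_id = "project_members-%d" % int(project)
    if cache_id in cache:   # cache hit: no database work at all
        return cache[cache_id]
    db_queries += 1         # cache miss: ask the database...
    rows = db.execute(
        "SELECT member FROM project_member WHERE project = ? ORDER BY member",
        (int(project),),
    ).fetchall()
    cache[cache_id] = rows  # ...and remember the answer
    return rows


for _ in range(3):
    get_by_project_id(1)
print(get_by_project_id(1))  # [('alice',), ('bob',)]
print(db_queries)            # 1
</pre>
<p>
One subtlety: the PHP version tests the cached value for truthiness, so a project with no members (an empty array) is treated as a cache miss every time.  Checking for the key, as the sketch does, avoids that.
</p>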
<p>
There are a couple cases that we need to handle when processing cache requests, including how to <em>invalidate</em> data in the cache.  We&#8217;ll get to these cases in a minute.  First, let&#8217;s take a look at the CacheEngine class&#8217; two main methods: WriteToCache() and GetFromCache().
</p>
<h4>The GetFromCache() Method</h4>
<p>
As you saw above, the GetFromCache() method takes three arguments and returns either FALSE, or the cached data.  Let&#8217;s take a closer look at the CacheEngine::GetFromCache() method.
</p>
<pre>
	/**
	 * Retrieves a Cache file and returns either an object
	 * or a string
	 *
	 * @return	mixed
	 * @param	string	Name of File in Cache
	 * @param	int		Number of Seconds until File is considered old
	 * @param	bool	Return an object from Cache?
	 * @access	public
	 */
	function GetFromCache( $FileName , $Seconds = 0, $IsObject = false) {

		$this->_BuildFileName($FileName);

		$return = false;
		if ($Seconds == 0) {
			if (file_exists($this->_CacheFilePath)) {
				$return = $this->_ReadFromCache();
			}
			else {
				return false;
			}
		}
		else {
			$refresh_time = time() - (int) $Seconds;
			if (file_exists($this->_CacheFilePath) &#038;&#038;
			    filemtime($this->_CacheFilePath) > $refresh_time) {
				$return = $this->_ReadFromCache();
			}
			else {
				/** Cached data missing or not valid, remove it */
				$this->RemoveFromCache($FileName);
				return false;
			}
		}
		if ($IsObject) {
		    $return = unserialize($return);
		}
		return $return;
	}
</pre>
<p>
The GetFromCache() function should be fairly easy to follow.  The meat of the function lies in either checking that the file exists with the PHP <a href="http://www.php.net/file_exists"  title="file_exists PHP function">file_exists</a>() function (if there is no time limit on the cached entry) or, otherwise, checking the modification time of the file using the <a href="http://www.php.net/filemtime"  title="filemtime PHP function">filemtime</a>() function.  When the $IsObject flag is set, the data coming back from the internal _ReadFromCache() method is passed through <a href="http://www.php.net/unserialize"  title="unserialize PHP function">unserialize</a>().  We will look at _ReadFromCache() next:
</p>
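<p>
For comparison, here is a Python sketch of the same expiry rule.  It is not the CacheEngine code.  It uses pickle in place of PHP serialize(), returns None instead of FALSE, and stores files in a temporary directory.  The names cache_path() and get_from_cache() are hypothetical.  A zero seconds value means the entry never expires.  Otherwise the entry is stale once its modification time is at least that many seconds old, and stale files are removed:
</p>
<pre>
import os
import pickle
import tempfile
import time

CACHE_DIR = tempfile.mkdtemp()  # assumption: any private, writable directory


def cache_path(name):
    return os.path.join(CACHE_DIR, name)


def get_from_cache(name, seconds=0, is_object=False):
    """Return cached bytes (or the unpickled object), or None if missing or stale."""
    path = cache_path(name)
    if not os.path.exists(path):
        return None
    if seconds and time.time() - os.path.getmtime(path) >= seconds:
        os.remove(path)  # like RemoveFromCache(): discard stale data
        return None
    with open(path, "rb") as f:
        data = f.read()
    # Only unpickle files this application wrote itself; pickle is unsafe for untrusted data
    return pickle.loads(data) if is_object else data


# Demo: write an entry, then backdate its modification time
with open(cache_path("members"), "wb") as f:
    f.write(pickle.dumps(["alice", "bob"]))

print(get_from_cache("members", is_object=True))  # ['alice', 'bob'] (never expires)
old = time.time() - 120
os.utime(cache_path("members"), (old, old))       # pretend it is two minutes old
print(get_from_cache("members", seconds=60))      # None: stale, so it was removed
print(os.path.exists(cache_path("members")))      # False
</pre>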
<pre>
	/**
	 * Reads the local file from the cache directory
	 *
	 * @return	mixed
	 * @access	private
	 */
	function _ReadFromCache() {
	    $mq_setting = get_magic_quotes_runtime();
	    set_magic_quotes_runtime(0);
	    if (!$return_data = @ file_get_contents($this->_CacheFilePath)) {
	    	trigger_error(get_class($this) .
				'::_ReadFromCache(): Unable to read file contents'
				, E_USER_ERROR);
	    }
	    set_magic_quotes_runtime($mq_setting);
	    return $return_data;
	}
</pre>
<p>
This function handles reading the cached data from a cache file.  The magic quotes runtime (perhaps the most annoying PHP feature ever) is turned off before reading the file to prevent automatic escaping of certain characters in the data, and then turned back to its original setting immediately after.
</p>
<p>
So, the reading of a cache file is fairly simple.  Let&#8217;s take a look at the write mechanism of the CacheEngine class.  This code is adapted from a technique called <em>file swapping</em>, which George Schlossnagle details in his excellent book, &#8220;Advanced PHP Programming&#8221; (Developer&#8217;s Library, 2004).  The technique allows a cache file to be written without locks while other requests continue to read the existing cache file.  Let&#8217;s take a look:
</p>
<p><pre>
	/**
	 * Writes data to the cache
	 *
	 * @return	mixed
	 * @param	string	File Name (may be encoded)
	 * @param	mixed	Data to write
	 * @access	public
	 */
	function WriteToCache( $FileName, $Data ) {
	    if (is_array($Data) || is_object($Data)) {
		$Data = serialize($Data);
	    }
	    $this->_BuildFileName($FileName);
	    /**
	     * Use a file swap technique to avoid need
	     * for file locks
	     */
	    if (!$file = fopen($this->_CacheFilePath . getmypid(), "wb")) {
		trigger_error(get_class($this) .
				'::WriteToCache(): Could not open file for writing.'
				, E_USER_ERROR);
		return false;
	    }
	    $len_data = strlen($Data);
	    fwrite($file, $Data, $len_data);
	    fclose($file);
	    /** Handle file swap */
	    rename($this->_CacheFilePath . getmypid(), $this->_CacheFilePath);
	    return true;
	}
</pre>
</p>
<p>
The code above opens a <em>temporary file</em> for writing.  Notice that the output of the <a href="http://www.php.net/getmypid"  title="getmypid PHP function">getmypid</a>() function is appended to the actual cache file name, making the filename unique to the writing process.  Provided the <a href="http://www.php.net/fopen"  title="fopen PHP function">fopen</a>() call succeeds, the data is written to the file and the file is closed.  Finally, the <a href="http://www.php.net/rename"  title="rename PHP function">rename</a>() function changes the temporary filename to the actual cache filename.  The rename() call only updates a directory entry to point at the new file&#8217;s inode (the structure which stores information about a file, not the file contents themselves).  As a result, the rename() operation a) is very quick, and atomic on POSIX systems when both names are on the same filesystem, and b) does not block other processes that are reading the existing cache file.  A reader that already opened the old file keeps reading the old contents, and subsequent readers see the new file.
</p>
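<p>
The same write-then-rename idea in Python might look like the sketch below, which is illustrative rather than a port of CacheEngine.  One deliberate change: instead of appending getmypid() to the filename, it uses tempfile.mkstemp() to create a uniquely named temporary file in the same directory, since two writers in the same process (for example, threads) would share a PID.  The os.replace() call performs the rename and overwrites any existing cache file.  Atomicity, and the ability of a reader to keep reading the old file, are POSIX behaviors, and the demo assumes a POSIX system:
</p>
<pre>
import os
import tempfile


def write_to_cache(path, data):
    """Write bytes to path without locks: fill a temp file, then rename it over path."""
    directory = os.path.dirname(path) or "."
    # Same directory, hence same filesystem, so the rename can be atomic
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".swap-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # the "file swap"
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave partial temp files behind
        raise
    return True


target = os.path.join(tempfile.mkdtemp(), "project_members-42")
write_to_cache(target, b"first version")

with open(target, "rb") as reader:      # a reader that opened the old file...
    write_to_cache(target, b"second version")
    old_seen = reader.read()            # ...still reads the old contents (POSIX)

with open(target, "rb") as f:
    new_seen = f.read()

print(old_seen)  # b'first version'
print(new_seen)  # b'second version'
</pre>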
<h4>Cache invalidation</h4>
<p>
OK, so our CacheEngine class now has most of the functionality needed to effectively cache data from the database.  However, we still need a method of removing old cache data files.  Hence, the very simple RemoveFromCache() method:
</p>
<pre>
	/**
	 * Removes a cache file from the cache directory
	 *
	 * @return	mixed
	 * @param	string	File Name to remove (will be encoded)
	 * @access	public
	 */
	function RemoveFromCache( $file_name ) {
	    $this->_BuildFileName($file_name);
	    if (!file_exists($this->_CacheFilePath)) {
		return true;
	    }
	    else {
		if (!unlink($this->_CacheFilePath)) {
		    trigger_error(get_class($this) .
				'::RemoveFromCache(): Unable to remove cache file'
				, E_USER_ERROR);
		    return false;
		}
		else {
		    clearstatcache();
		    return true;
		}
	    }
	}
</pre>
<p>
All the above function does is remove the cached file if it exists.  So, when would we call this function?  Well, if you were paying attention earlier, you would have already seen one occasion: in the GetFromCache() method, RemoveFromCache() is called when the cache file is older than the number of seconds supplied to that function.  Additionally, RemoveFromCache() is called when we want to remove a cache file explicitly, for instance, when the list of project members changes.  The following snippet from the ApproveMembershipRequest() method of the /lib/class/ProjectMember.php class illustrates this:
</p>
<pre>
...
            /* Remove from review queue */
            $sql = "UPDATE ... ";
            if ($GLOBALS['Db']->Execute($sql)) {
                $GLOBALS['Db']->Commit();
                $GLOBALS['CEngine']->RemoveFromCache('project_members-' . $Project);
                return true;
            }
...
</pre>
<p>
As you can see, when an approval is made, the database is updated and the cache for the project members is removed, so that upon the next request for this project&#8217;s members, the cache file will be regenerated, ensuring valid, up-to-date data.
</p>
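<p>
The ordering matters: update and commit first, then drop the cache entry, so that the next read rebuilds it from committed data.  Here is a Python sketch of that sequence.  It uses sqlite3 as a stand-in for MySQL and a dictionary for the CacheEngine, and the approved column and the function names are hypothetical:
</p>
<pre>
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE project_member (project INTEGER, member TEXT, approved INTEGER);
    INSERT INTO project_member VALUES (7, 'alice', 1), (7, 'bob', 0);
""")
cache = {}  # stands in for the file-based CacheEngine


def members(project):
    """Read-through: serve approved members from the cache, filling it on a miss."""
    key = "project_members-%d" % project
    if key not in cache:
        rows = db.execute(
            "SELECT member FROM project_member "
            "WHERE project = ? AND approved = 1 ORDER BY member", (project,))
        cache[key] = [row[0] for row in rows]
    return cache[key]


def approve(project, member):
    with db:  # sqlite3: commits on success, rolls back on an exception
        db.execute(
            "UPDATE project_member SET approved = 1 "
            "WHERE project = ? AND member = ?", (project, member))
    # Invalidate only after the commit, so the next read sees committed data
    cache.pop("project_members-%d" % project, None)


print(members(7))  # ['alice']
approve(7, "bob")
print(members(7))  # ['alice', 'bob']
</pre>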
<h4>Summary</h4>
<p>
So, this article has explored some simple steps that you can take to increase the scalability of your web applications using lazy loading and content caching.  Below are the full class files for the CacheEngine class and the SqlConnection class used in the MySQL Forge application.  Feel free to use them as you wish.  Additionally, a patch to the WordPress source code to enable lazy loading is included below.
</p>
<ul>
<li><a href="http://joinfu.com/source/CacheEngine.txt"  title="CacheEngine class PHP">CacheEngine class</a></li>
<li><a href="http://joinfu.com/source/SqlConnection.txt"  title="SqlConnection database abstraction class">SqlConnection database abstraction class</a></li>
<li><a href="http://joinfu.com/source/wordpress-lazy-load.patch"  title="Wordpress lazy loading patch">Patch to WordPress to support lazy loading</a></li>
</ul>
<p>As always, comments and suggestions for this article are more than encouraged and appreciated.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2006/08/mysql-connection-management-in-php--how-not-to-do-things/feed/</wfw:commentRss>
		<slash:comments>39</slash:comments>
		</item>
		<item>
		<title>Managing Many to Many Relationships in MySQL &#8211; Part 1</title>
		<link>http://www.joinfu.com/2005/12/managing-many-to-many-relationships-in-mysql-part-1/</link>
		<comments>http://www.joinfu.com/2005/12/managing-many-to-many-relationships-in-mysql-part-1/#comments</comments>
		<pubDate>Fri, 09 Dec 2005 01:32:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://joinfu.com/2005/12/managing-many-to-many-relationships-in-mysql--part-1</guid>
		<description><![CDATA[Flexible, Scalable Key Mapping Solutions In working to answer questions on the MySQL forums, I&#8217;ve noticed a few questions that repeatedly come up on a number of the forum areas. One of these particular questions deals with how to manage &#8212; construct, query, and maintain &#8212; many to many relationships in your schema. I decided [...]]]></description>
			<content:encoded><![CDATA[<h2><em>Flexible, Scalable Key Mapping Solutions</em></h2>
<p>
In working to answer questions on the <a href="http://forums.mysql.com/">MySQL forums</a>, I&#8217;ve noticed a few questions that repeatedly come up in a number of the forum areas.  One of these particular questions deals with how to manage &mdash; construct, query, and maintain &mdash; many-to-many relationships in your schema.  I decided to put together a two-part article series detailing some of the common dilemmas which inevitably arise when tackling the issue of relating two entities where one entity can be related to many instances of another, and vice versa.
</p>
<p>
Hopefully, this article will shed some light on how to structure your schema effectively to produce fast, efficient queries, and also will illustrate how key map tables can be queried for a variety of different purposes.  I&#8217;ll predominantly be using standard SQL, so although I&#8217;m using MySQL as the database of choice here, the code you see is for the most part not limited to running on just MySQL.  In this first part, we&#8217;ll review the concepts involved in many-to-many relationships, and discuss the most common methods for use in storing the data composing the relationship.  In the second part of the article, which I should complete in about another week, we&#8217;ll look at some solid performance numbers regarding the various approaches.  Also, I&#8217;ll show you how to use MySQL 5 stored procedures and views in order to most effectively manage many-to-many relationships.
</p>
<h2>A Review of Relational Concepts</h2>
<p>
For those of you unaware of what a many-to-many relationship is, let&#8217;s first briefly discuss some basic definitions I&#8217;ll be using in the article.  First, an <em>entity</em>, in the database world, is simply a singular object or concept.  An entity, just like a real world object, may have one or more <em>attributes</em> which describe different aspects of the entity.  A table in a properly normalized database contains records which pertain to a single entity.  These records represent <em>instances</em> of the entity.
</p>
<p>
To illustrate, let&#8217;s assume we&#8217;re building a website that specializes in used-car sales.  We need to design a schema which will handle the searching and storage of a variety of different auto makes and models, and allow customers to filter results based on a set of features they require in the automobile.
</p>
<p><div  style="font: 10px verdana; color: #999; text-align: center; float: left; margin: 10px 25px 15px 10px;"><img src="http://www.joinfu.com/img/autoentity.gif" /></p>
<p>figure A &#8211; Auto Entity</p></div>
</p>
<p>
The primary entity in our schema could be considered the Auto entity.  It might have some attributes such as a manufacturer, model, model year, and so forth.  <a href="http://www.joinfu.com/img/autoentity.gif"  title="Auto Entity">Figure A</a> shows a depiction of this simple Auto entity, with the primary key attribute (auto_id) in bold and above all other <em>descriptive attributes</em> like manufacturer, model, etc.
</p>
<h3>One to Many Relationships</h3>
<p>
Entities in the database can <em>relate</em> to each other in a number of ways.  The most common type of relationship is called a <em>one-to-many</em> relationship.  Here, <em>one instance</em> of an entity can relate, or be attached, to <em>many instances</em> of another entity.  In our sample schema, the Auto entity can have only <em>one</em> auto manufacturer.  However, an auto manufacturer can produce many automobiles.  Therefore, we say that the relationship from Auto Manufacturer to Auto is a <em>one-to-many relationship</em>.  <a href="http://www.joinfu.com/img/onemany.gif"  title="One to Many Relationship">Figure B</a> depicts a one-to-many relationship between the Auto Manufacturer entity and the Auto entity.  In the figure, the line between the two entities represents the relationship.  This is a common way to represent relationships, with the &#8220;one&#8221; side of the relationship having a single line and the &#8220;many&#8221; side of the relationship having a set of three lines and a circle.
</p>
<p><div  style="font: 10px verdana; color: #999; text-align: center; float: right; margin: 10px 15px 15px 25px;"><img src="http://www.joinfu.com/img/onemany.gif" /></p>
<p>figure B &#8211; One to Many Relationship</p></div>
</p>
<p>Relationships are implemented via common attributes in each Entity called <em>key attributes</em>.  In a one-to-many relationship, these key attributes take the form of a <em>parent</em> key and a <em>child</em>, or <em>foreign</em>, key. The relationship is maintained by &#8220;connecting&#8221; the two entities via these key attributes.  In the case of our Auto Manufacturer to Auto relationship, the key attributes are <strong>AutoManufacturer.auto_manufacturer_id</strong> and <strong>Auto.auto_manufacturer_id</strong>.  If we wanted to list the Auto&#8217;s manufacturer <em>name</em> (not Manufacturer ID), we would use an INNER JOIN from the Auto table to the AutoManufacturer table along this key relationship, as shown below.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> am<span style="color: #66cc66;">.</span>name
<span style="color: #993333; font-weight: bold;">FROM</span> Auto a
<span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> AutoManufacturer am
<span style="color: #993333; font-weight: bold;">ON</span> a<span style="color: #66cc66;">.</span>auto_manufacturer_id <span style="color: #66cc66;">=</span> am<span style="color: #66cc66;">.</span>auto_manufacturer_id
<span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">.</span>auto_id <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">12345</span>;</pre></div></div>

<p>
This simple example shows how a one-to-many relationship is implemented using a simple <strong>INNER JOIN</strong> to find the intersection of two tables where the key attributes <strong>AutoManufacturer.auto_manufacturer_id</strong> and <strong>Auto.auto_manufacturer_id</strong> contain matching entries.
</p>
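<p>
For reference, here is a minimal sketch of how the key attributes in figure B might be declared so that MySQL itself enforces the parent/child relationship.  Only the two <strong>auto_manufacturer_id</strong> columns come from this article.  The <strong>name</strong> column length, the index name, and the use of InnoDB (MySQL only enforces FOREIGN KEY constraints on engines that support them, such as InnoDB) are assumptions for illustration.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Parent entity: each manufacturer appears exactly once
CREATE TABLE AutoManufacturer (
  auto_manufacturer_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
, name VARCHAR(80) NOT NULL                          -- hypothetical length
, PRIMARY KEY (auto_manufacturer_id)
) ENGINE=InnoDB;

-- Child entity: many autos may reference the same manufacturer
CREATE TABLE Auto (
  auto_id INT UNSIGNED NOT NULL AUTO_INCREMENT
, auto_manufacturer_id SMALLINT UNSIGNED NOT NULL   -- child (foreign) key
, PRIMARY KEY (auto_id)
, INDEX ix_auto_manufacturer (auto_manufacturer_id)  -- hypothetical index name
, FOREIGN KEY (auto_manufacturer_id)
    REFERENCES AutoManufacturer (auto_manufacturer_id)
) ENGINE=InnoDB;</pre></div></div>

<p>
Note that the child column has the same integer size and sign (SMALLINT UNSIGNED) as the parent column it references, which MySQL requires for the constraint to be created.
</p>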
<h3>Many to Many Relationships</h3>
<p>
A many-to-many relationship exists between two entities when each entity may be associated with more than one instance of the other.  For example, imagine the relationship between an Auto (as in car) entity and an AutoFeature entity representing any of the myriad options a car may come with.
</p>
<div  style="font: 10px verdana; color: #999; text-align: center; float: right; margin: 10px 15px 15px 25px;"><img src="http://www.joinfu.com/img/manymany.gif" />
<p>figure C &#8211; Many to Many Relationship</p></div>
<p>
In this case, we know that any particular automobile can have many features.  Likewise, we know that a specific automobile feature, say power windows, may be found in any number of automobiles.  In a normalized schema, there&#8217;s no way to represent the association between the two entities without a third entity, which stores the <em>mapping</em> of the relationship between the Auto entity and the AutoFeature entity.  Here, I use the Auto2AutoFeature entity.  For mapping tables, I tend to use this Something2Something naming scheme to show clearly that the table primarily serves to map the relationship from one thing <strong>to</strong> another, but that is merely a stylistic convention, nothing more.
</p>
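<p>
A rough sketch of figure C as tables might look like the following.  The AutoFeature columns, the secondary index, and the sample auto_id are assumptions; the composite primary key on the mapping table simply ensures that each auto/feature pair is stored only once.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Hypothetical lookup table of features
CREATE TABLE AutoFeature (
  auto_feature_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
, feature_name VARCHAR(80) NOT NULL
, PRIMARY KEY (auto_feature_id)
);

-- Mapping table: one row per (auto, feature) pair
CREATE TABLE Auto2AutoFeature (
  auto_id INT UNSIGNED NOT NULL
, auto_feature_id SMALLINT UNSIGNED NOT NULL
, PRIMARY KEY (auto_id, auto_feature_id)            -- features of one auto
, INDEX ix_feature_auto (auto_feature_id, auto_id)  -- autos having one feature
);

-- List the features of a single auto
SELECT af.feature_name
FROM Auto2AutoFeature a2af
INNER JOIN AutoFeature af
ON a2af.auto_feature_id = af.auto_feature_id
WHERE a2af.auto_id = 12345;</pre></div></div>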
<h3>Schema Representations for Many to Many Relationships</h3>
<p>
There are a few common ways of representing many-to-many relationships within structured SQL data, all of which are detailed below:</p>
<ol>
<li>Using <em>multiple fields</em> of an on/off data type to store many &#8220;keys&#8221; in a single table</li>
<li>Using the INT, BIGINT, or SET data type (or equivalent in other RDBMS) to store a fixed number of key flags in a <em>single</em> table field</li>
<li>Using a CHAR string having one byte of storage per key needed, with the string acting as one long imploded array in a <em>single</em> table field</li>
<li>Using a <em>relationship, or mapping, table</em>, like the one in figure C, to store one or more keys related to an entity</li>
</ol>
<p>All of these methods have distinct advantages and disadvantages, in both ease of use and performance.  We&#8217;ll look at each here, along with some sample code showing how common queries are performed against each storage schema.
</p>
<h4>The Multiple Field Method</h4>
<p>
In this method of defining a many-to-many relationship, the concepts of normalization are thrown away in favor of what some consider a simpler and more rational schema.  Multiple fields, each representing an on/off value for a foreign key, are placed in a single table.  Any data type representing an on/off value may be used for the key fields: CHAR(1) with &#8216;T&#8217; or &#8216;F&#8217; (or &#8216;Y&#8217; or &#8216;N&#8217;), a TINYINT UNSIGNED with 0 and 1 values, an ENUM('Y','N'), etc.  Below, you&#8217;ll see what a sample Auto table using this method might look like, with CHAR(1) data types used for the auto option key fields:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> Auto <span style="color: #66cc66;">&#40;</span>
  auto_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span>
  <span style="color: #66cc66;">,</span> auto_manufacturer_id <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> auto_model <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">20</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> model_year <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> asking_price <span style="color: #993333; font-weight: bold;">DECIMAL</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">12</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> has_air_conditioning <span style="color: #993333; font-weight: bold;">CHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">'N'</span>
  <span style="color: #66cc66;">,</span> has_power_windows <span style="color: #993333; font-weight: bold;">CHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">'N'</span>
  <span style="color: #66cc66;">,</span> has_power_steering <span style="color: #993333; font-weight: bold;">CHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">'N'</span>
  <span style="color: #66cc66;">,</span> has_moonroof <span style="color: #993333; font-weight: bold;">CHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">'N'</span>
  <span style="color: #66cc66;">,</span> has_disk_brakes <span style="color: #993333; font-weight: bold;">CHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">'N'</span>
  <span style="color: #66cc66;">,</span> has_power_seats <span style="color: #993333; font-weight: bold;">CHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">'N'</span>
  <span style="color: #66cc66;">,</span> has_leather <span style="color: #993333; font-weight: bold;">CHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">'N'</span>
  <span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> pk_Auto <span style="color: #66cc66;">&#40;</span>auto_id<span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>
There are a couple of advantages to this type of approach:</p>
<ol>
<li>A simple SELECT * FROM the table makes it easy to see, at a glance, which options each car has.</li>
<li>Only one table is involved in determining options for the car.</li>
</ol>
<p>This last point is often why many novice developers choose this approach; it&#8217;s easy and straightforward to filter a result set on a single option:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span>
<span style="color: #993333; font-weight: bold;">FROM</span> Auto a
<span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">.</span>has_leather <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'Y'</span>;</pre></div></div>

<p>
or on a more complicated combination of options:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span>
<span style="color: #993333; font-weight: bold;">FROM</span> Auto a
<span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #66cc66;">&#40;</span>a<span style="color: #66cc66;">.</span>has_leather <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'Y'</span> <span style="color: #993333; font-weight: bold;">OR</span> a<span style="color: #66cc66;">.</span>has_power_windows <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'Y'</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">AND</span> a<span style="color: #66cc66;">.</span>has_power_steering <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'Y'</span>;</pre></div></div>

<p>
The code is easy to understand for almost anyone looking at it, and no special syntax or SQL tricks are needed to query on an OR condition or on several filters at once.  Unfortunately, this approach has a number of downsides.  The most important among them are:</p>
<ol>
<li>If you need to add an auto feature, you&#8217;ve got to ALTER TABLE Auto ADD COLUMN &#8230;
<p>
This disadvantage is often overlooked in early design phases by novice programmers or design teams, because everyone assumes they know every option the customer might want to use.  However, this is rarely the reality.  Customers change their minds and will inevitably ask to add Auto Features to the list or remove them from it.  With this method, that requires a fundamental change in the schema: removing or adding columns.  When doing so, especially on large tables, queries can easily pile up in long lock waits while the table is rebuilt to the new schema.  (That&#8217;s a <strong>bad</strong> thing.)
</p>
</li>
<li>None of these fields is a useful place for an index.
<p>
&#8220;Wait a minute!&#8221; you say, &#8220;You can place an index on any of these fields!&#8221;  Sure, that is correct.  Of course you <em>can</em> place an index on any of these fields.  In fact, you can place an index on <em>all</em> of them if you really want to.  But none of those indexes is likely to be used effectively by MySQL.  Why?  Well, that&#8217;s simple.  In the schema above, there are only two possible values for each field: a &#8216;Y&#8217; or an &#8216;N&#8217; (or 0 and 1, &#8216;T&#8217; and &#8216;F&#8217;, etc).  Let&#8217;s assume a table of 500K auto records.  For a common auto feature, say &#8220;has_leather&#8221;, probably half of the records would contain a &#8216;Y&#8217; and half an &#8216;N&#8217;.  What use would an index be here?  None.  In fact, using the index would slow MySQL down, because so many index records would have to be read, each followed by a lookup of the corresponding data record.  The <em>selectivity</em>, or <em>distribution</em>, of key values is extremely low (see <a href="http://www.jpipes.com/index.php?/archives/30-MySQL-5-Stored-Procedures-and-INFORMATION_SCHEMA.html"  title="MySQL Stored Procedures and INFORMATION_SCHEMA">my previous article</a> for more explanation on this), and therefore the index is of limited use to MySQL.
</p>
</li>
<li>This method offers very little flexibility.
<p>
Imagine if you were to give the customer the ability to add and remove auto features at will.  You would have to GRANT the customer (or the web-based or application user) the ALTER privilege on the table.  This isn&#8217;t generally considered the most secure or effective way of managing change&#8230;
</p>
<p>
Also, let&#8217;s say you came up with a list of 300 auto features.  Are you going to add 300 fields to the table?  Ever tried doing a SELECT * from a table with 300 fields?  Even if you end the query with \G in the mysql client to get vertical output, you&#8217;d still have a mess!
</p>
</li>
</ol>
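<p>
To make the first two drawbacks concrete, here is a sketch against the CHAR(1) Auto table above.  The has_cruise_control column is a hypothetical new option, and the GROUP BY query is simply one way to see how few distinct values (and therefore how little selectivity) a flag column has.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- A new option means a schema change; older MySQL versions
-- typically rebuild the whole table to do this
ALTER TABLE Auto
  ADD COLUMN has_cruise_control CHAR(1) NOT NULL DEFAULT 'N';

-- At most two distinct values, so each value tends to match
-- a large share of the rows in the table
SELECT has_leather, COUNT(*) AS num_autos
FROM Auto
GROUP BY has_leather;</pre></div></div>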
<p>
So, for all the reasons outlined above, I recommend against this approach for all but the simplest, non-production environments.  It isn&#8217;t flexible enough to withstand change, and its performance limitations far outweigh any advantage in ease of use.
</p>
<h5><em>Wait a Minute!</em></h5>
<p>
&#8220;But wait!&#8221; you say, &#8220;MySQL <em>itself</em> uses this strategy for its <em>own</em> mysql database!  The mysql.user table has fields like &#8216;Select_priv&#8217;, &#8216;Insert_priv&#8217;, and &#8216;Delete_priv&#8217;.  Don&#8217;t you think MySQL knows what they&#8217;re doing!?&#8221;
</p>
<p>
Yes, of course MySQL knows what they&#8217;re doing.  But you have to remember one thing about the mysql system schema tables: their contents are always held <strong>in memory</strong>.  There is no performance degradation related to the mysql.user table&#8217;s multiple field definitions for privileges because, upon startup, the MySQL server loads all user information (privileges) into a hash table of structs containing privilege information.  When a request is received to perform some operation, these in-memory user privileges are checked using very fast bitwise AND and OR operations.  So, there aren&#8217;t any performance issues associated with the mysql tables.  This is, by the way, why you must issue a FLUSH PRIVILEGES when manually modifying any of the mysql grant tables: FLUSH PRIVILEGES reloads this in-memory hash table.
</p>
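<p>
Here is what that looks like in practice, as a sketch.  The privilege columns below do exist in mysql.user, but the <em>app_user</em> account is hypothetical.  Normally, GRANT and REVOKE are the better route, since they update the in-memory tables themselves; FLUSH PRIVILEGES is only needed after editing the grant tables directly, as done here.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- The multiple field method, as used by the mysql.user grant table
SELECT User, Host, Select_priv, Insert_priv, Delete_priv
FROM mysql.user;

-- Editing a grant table directly changes only the stored row...
UPDATE mysql.user
SET Select_priv = 'Y'
WHERE User = 'app_user' AND Host = 'localhost';

-- ...so the server's in-memory copy must be reloaded
FLUSH PRIVILEGES;</pre></div></div>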
<p>
So the mysql schema tables were designed for ease of use and simplicity, and this method was chosen to represent privileges.  Do I favor it?  Not particularly, but I can see why it was done&#8230; <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<h4>The INT, BIGINT, or SET Bitmap Method</h4>
<p>
With this next method, the INT, BIGINT, or <a href="http://dev.mysql.com/doc/refman/5.0/en/set.html"  title="MySQL SET data type">MySQL SET data type</a> is used to store up to 64 separate key flags (up to 32 for an INT) within a single field.  Again, with this method, we denormalize the schema, reducing the many-to-many relationship to a single field in a single table.  Below, you&#8217;ll see our Auto table converted to use the SET data type instead of the 7 distinct CHAR(1) fields from Method #1:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> Auto <span style="color: #66cc66;">&#40;</span>
  auto_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span>
  <span style="color: #66cc66;">,</span> auto_manufacturer_id <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> auto_model <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">20</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> model_year <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> asking_price <span style="color: #993333; font-weight: bold;">DECIMAL</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">12</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> auto_options <span style="color: #993333; font-weight: bold;">SET</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Air Conditioning'</span>
    <span style="color: #66cc66;">,</span>   <span style="color: #ff0000;">'Power Windows'</span>
    <span style="color: #66cc66;">,</span>   <span style="color: #ff0000;">'Power Steering'</span>
    <span style="color: #66cc66;">,</span>   <span style="color: #ff0000;">'Moonroof'</span>
    <span style="color: #66cc66;">,</span>   <span style="color: #ff0000;">'Disk Brakes'</span>
    <span style="color: #66cc66;">,</span>   <span style="color: #ff0000;">'Power Seats'</span>
    <span style="color: #66cc66;">,</span>   <span style="color: #ff0000;">'Leather'</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> pk_Auto <span style="color: #66cc66;">&#40;</span>auto_id<span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>
Although SET (and ENUM, SET&#8217;s one-to-many column type cousin) are listed in the MySQL manual as <a href="http://dev.mysql.com/doc/refman/5.0/en/string-type-overview.html"  title="MySQL String Column Types">string column types</a>, they are internally represented as integers.  Below, when we cover bitwise operations, you&#8217;ll see how you can use the SET data type the same way you would an INT or BIGINT, though there are advantages to sticking with the SET data type itself, for simplicity&#8217;s sake.  As with the multiple-field method, the SET method has its advantages.  They include:</p>
<ol>
<li>You can store a fairly large number of key values (up to 64) in a small storage unit.
<p>
For the SET data type, 1 byte of storage is used for up to 8 elements, 2 bytes for up to 16 elements, 3 bytes for up to 24 elements, 4 bytes for up to 32 elements, and 8 bytes for 33-64 elements.  Compared with Method #1 (and Method #4 below), this does save storage space in the table.
</p>
</li>
<li>MySQL automatically displays the descriptive value of a SET column
<p>
With the INT and BIGINT data types, MySQL simply shows the numeric value of the field.  With the SET data type, MySQL by default shows the descriptive string value of the column instead of its numeric value (more below).  This can be a handy feature.
</p>
</li>
<li>MySQL provides the handy FIND_IN_SET() function to quickly filter for a single key value.
<p>
The FIND_IN_SET() function can be used to query for rows in which a <em>specific</em> key value is turned on.  If you want to find whether <em>more than one</em> key value is turned on, you can combine several FIND_IN_SET() calls with AND in the WHERE clause.  For instance, let&#8217;s say we wanted to find all records in which the automobile had <strong>both</strong> the leather and power windows options.  We could use the following:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span>
<span style="color: #993333; font-weight: bold;">FROM</span> Auto a
<span style="color: #993333; font-weight: bold;">WHERE</span> FIND_IN_SET<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Leather'</span><span style="color: #66cc66;">,</span> auto_options<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&gt;</span><span style="color: #cc66cc;">0</span>
<span style="color: #993333; font-weight: bold;">AND</span> FIND_IN_SET<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Power Windows'</span><span style="color: #66cc66;">,</span> auto_options<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&gt;</span><span style="color: #cc66cc;">0</span>;</pre></div></div>

<p>
Although there are some performance issues with SET fields (see below, on lack of indexing), the FIND_IN_SET() function is highly optimized to work specifically with the integer data housed beneath the surface.  Bitwise operations are used to determine whether the row matches the specific key flag queried for.  These bitwise operations are generally very fast, and we&#8217;ll cover them below.
</p>
</li>
</ol>
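<p>
For comparison with the multiple field method, the more complicated Method #1 query shown earlier (Leather or Power Windows, plus Power Steering) might be written against the SET-based Auto table like this.  It is a sketch that assumes the member names from the CREATE TABLE above:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Leather or Power Windows (or both), and Power Steering
SELECT *
FROM Auto a
WHERE (FIND_IN_SET('Leather', auto_options) &gt; 0
       OR FIND_IN_SET('Power Windows', auto_options) &gt; 0)
AND FIND_IN_SET('Power Steering', auto_options) &gt; 0;</pre></div></div>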
<p>
Besides FIND_IN_SET(), any bitwise operator that you would normally use on integer data can be used on SET, INT and BIGINT columns.  </p>
<p>
Bitwise operations are performed on the actual binary representation of the data.  Each bit represents one key: it is turned on by setting that bit of the binary number to 1, and off by setting it to 0.  The table below shows the binary and decimal representations of the first byte (8 bits) of an integer bitmap.  As you can see, each &#8220;on&#8221; bit contributes the corresponding power of 2 to the decimal value.
</p>
<table align="center" border="0" cellspacing="0" cellpadding="4" style="background-color: #f7f7f7; border: solid 1px #aaa; font: 10px Courier;" width="320">
<tr>
<th>Binary</th>
<th>Decimal</th>
</tr>
<tr>
<td>0000 0001</td>
<td align="center">1</td>
</tr>
<tr>
<td>0000 0010</td>
<td align="center">2</td>
</tr>
<tr>
<td>0000 0100</td>
<td align="center">4</td>
</tr>
<tr>
<td>0000 1000</td>
<td align="center">8</td>
</tr>
<tr>
<td>0001 0000</td>
<td align="center">16</td>
</tr>
<tr>
<td>0010 0000</td>
<td align="center">32</td>
</tr>
<tr>
<td>0100 0000</td>
<td align="center">64</td>
</tr>
<tr>
<td>1000 0000</td>
<td align="center">128</td>
</tr>
</table>
<p>
Clearly, the larger the storage unit, the more bits can be used to represent key flags.  A BIGINT can store up to 64 flags, an INT up to 32, and so on.  One advantage of using the SET type is that it automatically chooses the smallest storage size needed for the required number of set elements.
</p>
<p>
To have more than one key flag turned on within the bitmap field, the flag bits are added together.  Therefore, if the third (2<sup>2</sup>) and fourth (2<sup>3</sup>) flags (0000 0100 and 0000 1000) are turned on, the bitmap field would contain 0000 1100, or the number 12 in decimal.  If you ever want to see the decimal or binary representation of a SET column, you can use +0 and the BIN() function, as shown below:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span>
  auto_id
<span style="color: #66cc66;">,</span> auto_options
<span style="color: #66cc66;">,</span> auto_options<span style="color: #66cc66;">+</span><span style="color: #cc66cc;">0</span> <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">'dec'</span>
<span style="color: #66cc66;">,</span> LPAD<span style="color: #66cc66;">&#40;</span>BIN<span style="color: #66cc66;">&#40;</span>auto_options<span style="color: #66cc66;">+</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">8</span><span style="color: #66cc66;">,</span><span style="color: #ff0000;">'0'</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">'bin'</span>
<span style="color: #993333; font-weight: bold;">FROM</span> Auto a;</pre></div></div>

<pre>
+---------+-----------------------------------------------+-----+----------+
| auto_id | auto_options                                  | dec | bin      |
+---------+-----------------------------------------------+-----+----------+
|       1 | Power Windows,Power Steering,Moonroof         |  14 | 00001110 |
|       2 | Power Windows,Power Steering,Moonroof,Leather |  78 | 01001110 |
|       3 | Power Windows,Moonroof,Leather                |  74 | 01001010 |
|       4 | Power Windows,Moonroof                        |  10 | 00001010 |
+---------+-----------------------------------------------+-----+----------+
4 rows in set (0.00 sec)
</pre>
<p>
The LPAD() function pads the binary output to a fixed width (8 digits here) with leading zeroes, which makes it easier to read.
</p>
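<p>
The bit arithmetic behind that output can also be checked directly, without touching any table.  A small sketch: adding flags only works because each flag occupies its own bit, so the sum equals a bitwise OR; adding a flag that is already on, however, produces a different bitmap, while OR leaves it unchanged.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Third flag: 2^2 = 4; fourth flag: 2^3 = 8
SELECT 4 | 8                     AS or_dec    -- 12
     , 4 + 8                     AS sum_dec   -- 12
     , LPAD(BIN(4 | 8), 8, '0')  AS bin_str;  -- '00001100'

-- Turning the third flag "on" again when it is already on
SELECT 12 + 4 AS sum_again   -- 16 (0001 0000): a different flag entirely
     , 12 | 4 AS or_again;   -- 12: unchanged</pre></div></div>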
<p>
This means that you can do more complex querying using the numeric values of the SET elements (key values).  For instance, suppose we wanted to find all records that had <em>neither</em> Leather nor Power Steering, but <em>did</em> have Power Windows.  From the output above, we can easily see that the auto with ID #4 is the only record that meets our criteria. </p>
<p>
But how do we structure our WHERE expression in SQL?  Looking back at our table schema, we know that the numeric positions (counting from 1) of the key values used in our WHERE clause are:
</p>
<table align="center" border="0" cellspacing="0" cellpadding="4" style="background-color: #f7f7f7; border: solid 1px #aaa;" width="320">
<tr>
<th>Key Value</th>
<th>Numeric Position in SET</th>
</tr>
<tr>
<td>Leather</td>
<td align="center">7</td>
</tr>
<tr>
<td>Power Steering</td>
<td align="center">3</td>
</tr>
<tr>
<td>Power Windows</td>
<td align="center">2</td>
</tr>
</table>
<p>
Using the <a href="http://dev.mysql.com/doc/refman/5.0/en/bit-functions.html"  title="MySQL Bitwise Functions">MySQL bitwise functions</a>, we can issue our query like so:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> auto_id<span style="color: #66cc66;">,</span> auto_options
<span style="color: #993333; font-weight: bold;">FROM</span> Auto a
<span style="color: #993333; font-weight: bold;">WHERE</span> auto_options &amp; <span style="color: #66cc66;">&#40;</span>POW<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">6</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">+</span>POW<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span>
<span style="color: #993333; font-weight: bold;">AND</span> auto_options &amp; POW<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #cc66cc;">0</span>;</pre></div></div>

<pre>
+---------+------------------------+
| auto_id | auto_options           |
+---------+------------------------+
|       4 | Power Windows,Moonroof |
+---------+------------------------+
1 row in set (0.00 sec)
</pre>
<p>
In the code above, the POW() function is used to get the correct bit for each desired element in the query.  We subtract 1 from the element&#8217;s position because bits are counted from the right side of the byte starting at 0, so the element in position <em>n</em> corresponds to 2<sup><em>n</em>-1</sup>.  For the first part of the WHERE expression:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">WHERE</span> auto_options &amp; <span style="color: #66cc66;">&#40;</span>POW<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">6</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">+</span>POW<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span></pre></div></div>

<p>
we ensure that the result of the bitwise &#038; operator (which returns a 1 in each bit position where both operands have that bit turned on) is 0.  This ensures that only rows having <em>neither</em> the Power Steering nor the Leather option are returned.  The second part of the WHERE expression applies the bitwise &#038; operator to the Power Windows bit and keeps rows where the result is <em>greater than</em> zero, so the resulting rows are known to contain the Power Windows option.
</p>
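<p>
POW() returns a floating-point value, which MySQL converts back to an integer before applying the bitwise operators.  Here is a sketch of the same query written with integer shift operators instead, plus an equivalent FIND_IN_SET() version for comparison.  Against the sample data above, both should return only auto #4, just like the POW() version.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- SET element n maps to the bit value 1 &lt;&lt; (n - 1):
--   Leather (7)        = 1 &lt;&lt; 6 = 64
--   Power Steering (3) = 1 &lt;&lt; 2 = 4
--   Power Windows (2)  = 1 &lt;&lt; 1 = 2
SELECT auto_id, auto_options
FROM Auto a
WHERE auto_options &amp; ((1 &lt;&lt; 6) | (1 &lt;&lt; 2)) = 0   -- neither Leather nor Power Steering
AND auto_options &amp; (1 &lt;&lt; 1) &lt;&gt; 0;                -- has Power Windows

-- The same filter, using FIND_IN_SET() with member names
SELECT auto_id, auto_options
FROM Auto a
WHERE FIND_IN_SET('Leather', auto_options) = 0
AND FIND_IN_SET('Power Steering', auto_options) = 0
AND FIND_IN_SET('Power Windows', auto_options) &gt; 0;</pre></div></div>

<p>
The shift form keeps the masks as exact integers and makes the &#8220;position minus one&#8221; rule visible in the query itself; the FIND_IN_SET() form trades that for readability.
</p>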
<p>
Besides violating normalization rules (and I won&#8217;t get into an idealistic debate about that here&#8230;there&#8217;s more than enough discussion online about that), there are some concrete reasons why <strong><em>not</em></strong> to use the SET data type for handling many-to-many relationships.  Although I began writing this article quite a while before <a href="http://www.sheeri.com/archives/13"  title="ENUM/SET debate">Sheeri Kritzer posted about the ENUM/SET type</a>, some points are worth repeating.  Foremost among these are the following:</p>
<ol>
<li>Using the SET data type for many-to-many relationships imposes a 64-element limit on the number of keys available to relate to the main entity.</li>
<li>Using the SET data type with the built-in SET functions <em>or</em> any of the bitwise operators prevents the MySQL optimizer from using indexes on the column.
<p>
This is a major drawback for scalability, and it is the primary reason I choose not to use this data type for anything but the smallest projects or for fields containing 3 or fewer elements.  Why 3 or fewer?  Even if an index could be used against the SET field, the chances of it being selective enough to filter out an adequate number of rows, and therefore being useful to the optimizer, are already very small.
</p>
</li>
<li>
Again, working with the SET data type is not particularly flexible.
<p>
Changing elements of a SET data type <em>once data is already in the table</em> can be a real pain in the behind!  I won&#8217;t go into the details, as the <a href="http://dev.mysql.com/doc/refman/5.0/en/set.html"  title="SET Data type">MySQL site</a> covers many of the main points, and <a href="http://www.futhark.ch/mysql/109.html"  title="Manipulating the MySQL SET Data Type">Beat Vontobel&#8217;s blog post on SETs</a> covers what the MySQL site doesn&#8217;t (nice work, Beat!)
</p>
</li>
</ol>
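<p>
You can check the index claim for yourself by running EXPLAIN on a query that filters on the SET column.  The sketch below assumes the SET-based Auto table from earlier in this article.  It adds a hypothetical secondary index named ix_auto_options purely for the test:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Hypothetical index, added only to see whether the optimizer will use it
CREATE INDEX ix_auto_options ON Auto (auto_options);

-- Both filters wrap the column in a function or operator, so no ref or
-- range lookup on ix_auto_options is possible. Expect type ALL, or at
-- best type index (a full scan of the index) if the index covers the query.
EXPLAIN SELECT auto_id FROM Auto WHERE FIND_IN_SET('Leather', auto_options) &gt; 0;
EXPLAIN SELECT auto_id FROM Auto WHERE auto_options &amp; POW(2,6) &gt; 0;</pre></div></div>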
<h4>The Long String Method</h4>
<p>
A third common approach to many-to-many relationships is to use a single long string field to store essentially a concatenated version of Method #1&#8217;s multiple fields.  An example of this method would be the schema below:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> Auto <span style="color: #66cc66;">&#40;</span>
  auto_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span>
  <span style="color: #66cc66;">,</span> auto_manufacturer_id <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> auto_model <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">20</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> model_year <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> asking_price <span style="color: #993333; font-weight: bold;">DECIMAL</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">12</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> auto_options <span style="color: #993333; font-weight: bold;">CHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">7</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">DEFAULT</span> <span style="color: #ff0000;">'NNNNNNN'</span>
  <span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> pk_Auto <span style="color: #66cc66;">&#40;</span>auto_id<span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>
There are few advantages to this approach, either performance- or maintenance-wise, but it can come in handy in at least one particular circumstance: when the application code relies on, or is made much simpler by, the use of imploded arrays.</p>
<p>
Sometimes, particularly if you&#8217;ve inherited some legacy code which is simply too much of a nuisance to change, it makes sense to work with what you&#8217;ve got.  If the application code relies heavily on field values being returned as concatenated strings, this method might work well.  The auto_options field stores a string of Y or N characters, one per key, indicating whether that key is on or off.  For instance, if we had an automobile with the Leather and Power Windows options, and we used the same order as the SET example above, the returned auto_options field would be: &#8220;NYNNNNY&#8221;.
</p>
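<p>
Because each option lives at a fixed character position, changing a single option means rewriting one character of the string.  As a sketch, assuming a hypothetical automobile with auto_id 42, MySQL&#8217;s INSERT(str, pos, len, newstr) string function can switch on Leather (position 7) without touching the other positions:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Turn on Leather (7th character) for a hypothetical car, auto_id = 42
UPDATE Auto
SET auto_options = INSERT(auto_options, 7, 1, 'Y')
WHERE auto_id = 42;

-- What the function does to a value that already has Power Windows on:
SELECT INSERT('NYNNNNN', 7, 1, 'Y');  -- returns 'NYNNNNY'</pre></div></div>

<p>
Note that the position number is hard-coded in the SQL.  This is the same coupling between the string layout and the application code that the PHP example below relies on.
</p>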
<p>
Perhaps the application was a PHP program which walked the string using something like the following:
</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
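<span style="color: #666666; font-style: italic;">/* $auto_options normally comes from the database; a sample value is used here */</span>
<span style="color: #000088;">$auto_options</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'NYNNNNY'</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// Power Windows + Leather</span>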
<span style="color: #000088;">$all_options</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
<span style="color: #0000ff;">'Air Conditioning'</span>
<span style="color: #339933;">,</span> <span style="color: #0000ff;">'Power Windows'</span>
<span style="color: #339933;">,</span> <span style="color: #0000ff;">'Power Steering'</span>
<span style="color: #339933;">,</span> <span style="color: #0000ff;">'Moonroof'</span>
<span style="color: #339933;">,</span> <span style="color: #0000ff;">'Disk Brakes'</span>
<span style="color: #339933;">,</span> <span style="color: #0000ff;">'Power Seats'</span>
<span style="color: #339933;">,</span> <span style="color: #0000ff;">'Leather'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$num_options</span> <span style="color: #339933;">=</span> <span style="color: #990000;">strlen</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$auto_options</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span><span style="color: #000088;">$i</span><span style="color: #339933;">&lt;</span><span style="color: #000088;">$num_options</span><span style="color: #339933;">;++</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$auto_options</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000ff;">'Y'</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #990000;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;The car has <span style="color: #009933; font-weight: bold;">%s</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> <span style="color: #000088;">$all_options</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p>
In this case, the PHP program has the entire key table in memory (the $all_options array).  All the program needs to do is slice up the long string; the position of each character corresponds to an auto feature in the $all_options array.  If the application did little complex querying, yet each row had hundreds or thousands of these key values, this might be a decent method.
</p>
<p>
Except for this scenario, however, this method is generally not desirable, for the following reasons:</p>
<ol>
<li>Too little flexibility
<p>
Again, the issue of flexibility arises.  With this method, what happens if you want to remove an Auto Feature from the list?  Doing so in the PHP code would be fairly simple: just remove the element from the $all_options array.  Unfortunately, once you did that, the offsets into the character string would be skewed.  Likewise, if you wanted to add an Auto Feature, you&#8217;d have to change the table schema, which is not a very flexible plan.  Making matters worse, you could only add an Auto Feature to the <em>end</em> of the character string.  Adding one in the middle (say, if you wanted to keep some sort of alphabetical ordering in the PHP code) would again skew the offsets.
</p>
</li>
<li>Performance degrades dramatically because indexes can rarely be used
<p>
This disadvantage becomes even more noticeable as queries become more complex.  Sure, an index might be used if you were querying on the first Auto Feature (or on a run of leading Auto Features).  For instance, assuming an index on the auto_options field, the following query would indeed use an index:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span>
 <span style="color: #993333; font-weight: bold;">FROM</span> Auto a
<span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">.</span>auto_options <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'Y%'</span>;</pre></div></div>

<p>
But, unfortunately, that&#8217;s about the limit of an index&#8217;s usefulness for this method.  In the example above, we could use an index to find all records having Air Conditioning.  But what happens when we want to, say, find all automobiles which have Power Seats and Power Windows (the sixth and second keys)?  Now we&#8217;re looking at the following SQL statement:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span>
<span style="color: #993333; font-weight: bold;">FROM</span> Auto a
<span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>a<span style="color: #66cc66;">.</span>auto_options<span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">6</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'Y'</span>
<span style="color: #993333; font-weight: bold;">AND</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>a<span style="color: #66cc66;">.</span>auto_options<span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'Y'</span>;</pre></div></div>

<p>
Besides being a mess, this query will never be able to use an index, because the SUBSTRING() function is applied to the column on the left side of each WHERE comparison.  And because indexes won&#8217;t be used, the performance of this schema will not scale well at all.
</p>
</li>
</ol>
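<p>
To see why adding an Auto Feature means a schema change, here is a sketch of appending a hypothetical eighth option (say, GPS Navigation) to the long string layout.  It assumes the CHAR(7) table shown above.  RPAD() and TRIM() are standard MySQL string functions, and the TRIM() call simply guards against any CHAR padding spaces:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- 1. Widen the column and its default to 8 positions
ALTER TABLE Auto
  MODIFY auto_options CHAR(8) NOT NULL DEFAULT 'NNNNNNNN';

-- 2. Existing 7-character values are not filled with 'N' automatically,
--    so append an explicit 'N' (option off) for the new 8th position
UPDATE Auto
SET auto_options = RPAD(TRIM(auto_options), 8, 'N');

-- Step 2 applied to a single value: 'NYNNNNY' becomes 'NYNNNNYN'
SELECT RPAD(TRIM('NYNNNNY'), 8, 'N');</pre></div></div>

<p>
Every application that reads the string must be updated in step with this change.
</p>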
<h4>The Mapping Table Method</h4>
<p>
Alright, the final (and my preferred) method for managing many-to-many relationships is to use a key mapping table (sometimes called a relationship table).  In the beginning of the article, in figure C, you saw an E-R diagram showing a key mapping table relating the AutoFeature entity to the Auto entity.  Note that this method is the only truly normalized method of managing many-to-many relationships.  Up until now in the article, the schema organization has been in a single table.  Now, through the use of three distinct tables, we are able to normalize the schema into the following DDL:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> Auto <span style="color: #66cc66;">&#40;</span>
  auto_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span>
  <span style="color: #66cc66;">,</span> auto_manufacturer_id <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> auto_model <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">20</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> model_year <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> asking_price <span style="color: #993333; font-weight: bold;">DECIMAL</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">12</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> pk_Auto <span style="color: #66cc66;">&#40;</span>auto_id<span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> AutoFeature <span style="color: #66cc66;">&#40;</span>
  auto_feature_id <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span>
  <span style="color: #66cc66;">,</span> feature_name <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">80</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> pk_AutoFeature <span style="color: #66cc66;">&#40;</span>auto_feature_id<span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> Auto2AutoFeature <span style="color: #66cc66;">&#40;</span>
  auto_id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> auto_feature_id <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> pk_Auto2AutoFeature <span style="color: #66cc66;">&#40;</span>auto_id<span style="color: #66cc66;">,</span> auto_feature_id<span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>
Before we discuss the advantages of this method, it&#8217;s worth pointing out a couple of drawbacks to storing many-to-many relationships in this normalized fashion.</p>
<ol>
<li>Generally, the key mapping table method will use the most overall storage of any of these described approaches.
<p>As with all matters in the database design world, everything comes with a tradeoff.  The key mapping table is no exception.  The biggest tradeoff by far is the storage space needed to store the many-to-many relationship.  Instead of a single field, or multiple small fields, in a single table, we now must store the foreign keys of each entity multiple times, with each unique combination occupying a single row in the relationship table.
</p>
<p>
Additionally, because indexes can be used effectively against the key mapping table, we now need space to store the index records <em>as well as the data records</em>.  As with any index, the performance of INSERT and UPDATE operations (especially on high-volume mixed OLTP/OLAP applications) can suffer.  However, any performance impact on INSERT and UPDATE operations is usually far outweighed by the performance benefits for SELECT operations.  As always, benchmarking and testing are a good idea, not only at the beginning of the project but also at regular intervals as the database grows and matures.
</p>
</li>
<li>There are now one or two more tables to maintain for the schema
<p>
Whilst having an extra table or two will provide us with the most flexibility, that flexibility comes at the cost of extra tables which must be maintained by both the application and the database administrator.  For just a few seldom-changing keys, the hassle of maintaining extra tables and relationships may not be worth the added flexibility.
</p>
</li>
</ol>
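<p>
To put rough numbers on the storage tradeoff, consider the key columns alone.  Each Auto2AutoFeature row carries an INT UNSIGNED (4 bytes) and a SMALLINT UNSIGNED (2 bytes), which matches the key_len of 6 in the EXPLAIN output later in this article.  A car with three options therefore needs 3 &#215; 6 = 18 bytes of key data in the mapping table.  That figure excludes per-row and per-index overhead, and it is paid again for any additional index.  By comparison, a SET of up to 8 members takes 1 byte, and the CHAR(7) string takes 7 bytes in a single-byte character set.  Rather than estimating, you can ask MySQL.  This sketch assumes the tables live in the test schema seen in that EXPLAIN output:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Approximate on-disk size of each table and its indexes
SELECT TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'test'
  AND TABLE_NAME IN ('Auto', 'AutoFeature', 'Auto2AutoFeature');</pre></div></div>

<p>
For InnoDB tables, TABLE_ROWS is only an estimate.  DATA_LENGTH includes the clustered PRIMARY KEY, while INDEX_LENGTH counts secondary indexes only.
</p>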
<p>
Now, on to the benefits of this approach, along with some examples of how to retrieve result sets using the key mapping table.</p>
<ol>
<li>Robust Indexing possibilities are now supported.
<p>
Because each side of the many-to-many relationship is stored in its own column, with one row per relationship in the key mapping table, our queries can support various indexing strategies.  It&#8217;s best to see this in action, so I&#8217;ll demonstrate with two simple query examples.
</p>
<p>
Let&#8217;s assume that we wish to find all the options available, in a list, for a specific automobile.  This is a fairly simple query, but a good starter:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> af<span style="color: #66cc66;">.</span>feature_name
<span style="color: #993333; font-weight: bold;">FROM</span> AutoFeature af
<span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> Auto2AutoFeature a2af
<span style="color: #993333; font-weight: bold;">ON</span> af<span style="color: #66cc66;">.</span>auto_feature_id <span style="color: #66cc66;">=</span> a2af<span style="color: #66cc66;">.</span>auto_feature_id
<span style="color: #993333; font-weight: bold;">WHERE</span> a2af<span style="color: #66cc66;">.</span>auto_id <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">7</span>;</pre></div></div>

<pre>
+------------------+
| feature_name     |
+------------------+
| Air Conditioning |
| Disk Brakes      |
+------------------+
2 rows in set (0.00 sec)
</pre>
<p>
Pretty simple query.  Just an INNER JOIN from the AutoFeature table into our key mapping table on the auto_feature_id column, then a WHERE condition specifying the needed vehicle&#8217;s ID.  This kind of output isn&#8217;t possible with the previous 3 methods without a lot of headache, but let&#8217;s face it: this is generally the format in which an application needs its data, correct?  In a list, or an array.  With a simple query like this, that kind of output is easy.
</p>
<p>But, are our indexes in operation for the above query?  Let&#8217;s find out:</p>
<pre>
mysql> EXPLAIN SELECT af.feature_name
    -> FROM AutoFeature af
    -> INNER JOIN Auto2AutoFeature a2af
    -> ON af.auto_feature_id = a2af.auto_feature_id
    -> WHERE a2af.auto_id = 7 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a2af
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 2
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: af
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: test.a2af.auto_feature_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)
</pre>
<p>
Indeed, they are.  In the EXPLAIN output above, you&#8217;ll see that the optimizer is using a constant (ref: const) on the PRIMARY KEY to filter records.  In the &#8220;Extra&#8221; column, you&#8217;ll note that MySQL helpfully tells us that it&#8217;s &#8220;Using index&#8221;.  Many novice (or even intermediate) database developers new to MySQL assume that if they see &#8220;Using index&#8221; in the Extra column, they have an optimal query plan in place for the SQL code.  This is actually a misconception.  &#8220;Using index&#8221; in the Extra column means that MySQL is able to use the index records (as opposed to the data records to which the index is attached) to complete the query.  In other words, MySQL doesn&#8217;t even have to go to the data records; everything it needs for the query is <em>covered</em> by the index records.  This situation is called a <em>covering index</em>, and it goes hand-in-hand with key mapping tables.  Why?  Because, frankly, the entire table <em><strong>IS</strong></em> the index!</p>
<p>
For this reason, and others which we&#8217;ll get to below, key mapping tables, when properly structured, are ideal for join operations.  In EXPLAIN plans for queries which use the key mapping table, you will often see a covering index in use, because all of the joined columns are available in the index records.  Practically speaking, this means that for MyISAM tables, once the index blocks are cached in the key_buffer, everything the query needs is already in RAM.  There is no need to read the data in the .MYD file, which saves disk accesses and makes query performance lightning fast.  For InnoDB tables, access is just as quick.  The PRIMARY KEY is a clustered index (its leaf pages <em>are</em> the data records), and those pages are cached in the InnoDB buffer pool.  So, likewise, queries will be lightning fast whenever those pages are in memory&#8230;
</p>
<p>
So, what about other types of queries; do they also benefit from the PRIMARY KEY index?  Let&#8217;s find out.  Here&#8217;s another fairly simple query which attempts to find all automobiles having the Leather option:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> a<span style="color: #66cc66;">.*</span>
<span style="color: #993333; font-weight: bold;">FROM</span> Auto a
<span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> Auto2AutoFeature a2af
<span style="color: #993333; font-weight: bold;">ON</span> a<span style="color: #66cc66;">.</span>auto_id <span style="color: #66cc66;">=</span> a2af<span style="color: #66cc66;">.</span>auto_id
<span style="color: #993333; font-weight: bold;">WHERE</span> a2af<span style="color: #66cc66;">.</span>auto_feature_id <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">7</span>;</pre></div></div>

<pre>
+---------+----------------------+------------+------------+--------------+
| auto_id | auto_manufacturer_id | auto_model | model_year | asking_price |
+---------+----------------------+------------+------------+--------------+
|       3 |                    1 | 1          |       2003 |      5000.00 |
|       4 |                    1 | 2          |       2005 |     38000.00 |
|       5 |                    1 | 2          |       2004 |     31000.00 |
|       6 |                    1 | 3          |       2003 |     10000.00 |
|       8 |                    2 | 1          |       2004 |     12000.00 |
|      10 |                    2 | 2          |       2005 |     32000.00 |
|      11 |                    2 | 2          |       2005 |     42000.00 |
+---------+----------------------+------------+------------+--------------+
7 rows in set (0.00 sec)
</pre>
<p>
Let&#8217;s use an EXPLAIN to see if we&#8217;ve indeed got an ideal execution plan:
</p>
<pre>
mysql> EXPLAIN SELECT a.*
    -> FROM Auto a
    -> INNER JOIN Auto2AutoFeature a2af
    -> ON a.auto_id = a2af.auto_id
    -> WHERE a2af.auto_feature_id = 7 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 16
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a2af
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: test.a.auto_id,const
         rows: 1
        Extra: Using index
2 rows in set (0.00 sec)
</pre>
<p>
Uh oh!  What happened?  We&#8217;ve encountered the dreaded ALL access type!  As you can see, MySQL chooses to do a table scan of the Auto table, and for each auto_id, do a lookup into the Auto2AutoFeature table along the auto_id key, filtering on the Auto2AutoFeature PRIMARY KEY&#8217;s auto_feature_id column (Look for &#8220;ref: test.a.auto_id,const&#8221;).  But, you ask, why didn&#8217;t MySQL just find the auto_id records in Auto2AutoFeature that had the Leather option, <em>and then</em> join that smaller resultset to the Auto table?
</p>
<p>
This is where the &#8220;robust indexing strategies&#8221; I spoke about earlier come into play, and where many novices get tripped up when dealing with key mapping tables.
</p>
<p>
MySQL chose the access plan above because the auto_feature_id column (on which we are supplying a constant filter value of 7) is on the <strong><em>right side</em></strong> of the PRIMARY KEY index.  To use an index effectively, MySQL must be able to apply constant or range filter values to the index&#8217;s columns <em><strong>from LEFT to RIGHT</strong></em>, starting with the leftmost column.  This is why MySQL has actually chosen a sub-optimal plan, even though you do see the term &#8220;Using index&#8221; in the Extra column of the second EXPLAIN.  The solution?  We add another index to the key mapping table which allows the <em>reverse join direction</em> to be followed.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">UNIQUE</span> <span style="color: #993333; font-weight: bold;">INDEX</span> ix_ReversePK <span style="color: #993333; font-weight: bold;">ON</span>  Auto2AutoFeature <span style="color: #66cc66;">&#40;</span>auto_feature_id<span style="color: #66cc66;">,</span> auto_id<span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>
Now, let&#8217;s take another stab at the same EXPLAIN from above:
</p>
<pre>
mysql> EXPLAIN SELECT a.*
    -> FROM Auto a
    -> INNER JOIN Auto2AutoFeature a2af
    -> ON a.auto_id = a2af.auto_id
    -> WHERE a2af.auto_feature_id = 7 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a2af
         type: ref
possible_keys: PRIMARY,ix_ReversePK
          key: ix_ReversePK
      key_len: 2
          ref: const
         rows: 7
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.a2af.auto_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)
</pre>
<p>
See the difference?!  Now, the new reverse direction index we just added is first used to filter the Auto2AutoFeature records, and THEN the Auto table is joined.  This is the optimal query plan for this type of query.
</p>
<p>
So, to sum up this advantage, it should be said that with key mapping tables, you&#8217;re given much more ability to tune and optimize your queries, but at the same time, you&#8217;ve got to know what you&#8217;re looking for, and got to be willing to put in the time to analyze your queries.  But, hey!  If you&#8217;ve read all the way through this article so far, I&#8217;m willing to bet you&#8217;ll do just that.
</p>
</li>
<li>Flexibility to add and remove elements easily, and without schema changes.
<p>
One fatal flaw of the previously covered methods is their inability to deal easily with change.  Let&#8217;s face it, change happens constantly in the business world.  If you&#8217;re designing applications for use in this world, you had better make them able to deal with constant change.  Otherwise, you will be on the phone 24/7 fulfilling request after request to make small changes to this type of data.  Do yourself a favor, and give yourself back that time from the start.
</p>
<p>
With key mapping tables, adding new elements to the many-to-many relationship is simple.  You just add a record to the master table:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> AutoFeature <span style="color: #66cc66;">&#40;</span>feature_name<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">VALUES</span>  <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'GPS Navigation'</span><span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>
All done.  If you want to remove one, simply issue a multi-table DELETE that joins into the key mapping table.  This will remove the parent record and all of its child relationships.  A LEFT JOIN is used so that the AutoFeature row is still deleted when no automobile references it.  With an INNER JOIN, such a feature would match no rows, and nothing would be deleted:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">DELETE</span>  AutoFeature<span style="color: #66cc66;">,</span> Auto2AutoFeature
<span style="color: #993333; font-weight: bold;">FROM</span> AutoFeature
<span style="color: #993333; font-weight: bold;">LEFT</span> <span style="color: #993333; font-weight: bold;">JOIN</span> Auto2AutoFeature
<span style="color: #993333; font-weight: bold;">ON</span> AutoFeature<span style="color: #66cc66;">.</span>auto_feature_id <span style="color: #66cc66;">=</span> Auto2AutoFeature<span style="color: #66cc66;">.</span>auto_feature_id
<span style="color: #993333; font-weight: bold;">WHERE</span> AutoFeature<span style="color: #66cc66;">.</span>auto_feature_id <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">4</span>;</pre></div></div>

<pre>
Query OK, 5 rows affected (0.00 sec)
</pre>
</li>
</ol>
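<p>
Adding the GPS Navigation record doesn&#8217;t yet attach it to any vehicle, but attaching it is also just a data change.  Here is a minimal sketch, assuming the Auto table&#8217;s key column is named auto_id and that Auto2AutoFeature holds (auto_id, auto_feature_id) pairs; the vehicle IDs 1 and 2 are hypothetical.  Joining on feature_name means we never have to look up the generated auto_feature_id by hand:
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Assumed columns: Auto.auto_id, Auto2AutoFeature(auto_id, auto_feature_id).
-- Link the new feature to two hypothetical autos (IDs 1 and 2).
-- The CROSS JOIN with Auto means only autos that actually exist get linked.
INSERT INTO Auto2AutoFeature (auto_id, auto_feature_id)
SELECT Auto.auto_id, AutoFeature.auto_feature_id
FROM Auto
CROSS JOIN AutoFeature
WHERE Auto.auto_id IN (1, 2)
  AND AutoFeature.feature_name = 'GPS Navigation';

-- Read the new relationships back through the key mapping table.
SELECT Auto2AutoFeature.auto_id, AutoFeature.feature_name
FROM AutoFeature
INNER JOIN Auto2AutoFeature
ON AutoFeature.auto_feature_id = Auto2AutoFeature.auto_feature_id
WHERE AutoFeature.feature_name = 'GPS Navigation';</pre></div></div>

<p>
Because the feature is matched by name, this sketch assumes feature names are unique; if they aren&#8217;t, match on auto_feature_id instead.  Either way, no ALTER TABLE is involved.
</p>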
<p>
That wraps up this part&#8217;s coverage of the key mapping technique.  It was just a brief overview.  Next time, we&#8217;ll cover how to get the most out of your key mapping tables, including how to structure complex joins for various query filters against the many-to-many keys.
</p>
<h3>Conclusion</h3>
<p>
In this first part of the series, I&#8217;ve tried to thoroughly explain four common approaches for managing many-to-many relationships.  Each approach has its distinct benefits and drawbacks, and I&#8217;ve tried to represent those aspects clearly and fairly.  Please use the comments section below to let me know if you think I skimmed over certain methods or aspects, or didn&#8217;t give enough weight to one or the other.  Also, feel free to point out any methods I may have left out completely; I&#8217;m always looking to round out my material in a balanced and thoughtful way, and welcome any constructive criticism!
</p>
<p>
In the second part of the series, I&#8217;ll be examining ways we can use stored procedures, views, and the INFORMATION_SCHEMA in MySQL 5 to construct an easy management system for key mapping tables, and I&#8217;ll present some more advanced SQL that enables you to have full control in querying across these key mapping tables, allowing you to drill down accurately into the wealth of information stored there.  We&#8217;ll also explore methods of increasing the performance of key mapping tables through the use of alternate indexes.
</p>
<p>
I&#8217;ll also be covering some benchmarks I&#8217;ll construct this coming week that show the relative performance and scalability of these methods, as well as the storage needs of each.  Hopefully, that part of the article series will give some hard numbers to those looking for that kind of thing! <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Till then, cheers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2005/12/managing-many-to-many-relationships-in-mysql-part-1/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>MySQL 5 Stored Procedures and INFORMATION_SCHEMA</title>
		<link>http://www.joinfu.com/2005/06/mysql-5-stored-procedures-and-information_schema/</link>
		<comments>http://www.joinfu.com/2005/06/mysql-5-stored-procedures-and-information_schema/#comments</comments>
		<pubDate>Fri, 17 Jun 2005 22:29:58 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[Articles]]></category>

		<guid isPermaLink="false">http://joinfu.com/2005/06/mysql-5-stored-procedures-and-information_schema</guid>
		<description><![CDATA[A Handy One-Two Punch for Administrators In writing Pro MySQL, I&#8217;ve become extremely excited about the new features debuting in MySQL 5&#8212;features that have already started to get thoroughly tested by the development community and have leveled MySQL with Oracle, SQL Server, and PostgreSQL on functional equivalencies long sought-after by developers and administrators. I figured, [...]]]></description>
			<content:encoded><![CDATA[<h3><em>A Handy One-Two Punch for Administrators</em></h3>
<p>
In writing <a href="http://www.apress.com/book/bookDisplay.html?bID=433">Pro MySQL</a>, I&#8217;ve become extremely excited about the new features debuting in MySQL 5&#8212;features that the development community has already started testing thoroughly, and that bring MySQL to functional parity with Oracle, SQL Server, and PostgreSQL in areas long sought after by developers and administrators.
</p>
<p>
I figured, partly in response to <a href="http://www.mysql.com/news-and-events/news/article_The%20MySQL%205.0%20Beta%20Challenge.html">Arjen Lentz&#8217; call to action</a>, I&#8217;d write about two of these functional areas in this article: stored procedures and the INFORMATION_SCHEMA virtual database.  Both features are detailed thoroughly in Pro MySQL, Chapters 9 and 21, respectively, but I wanted to do a quick article combining these two features into a practical example of MySQL 5 functionality.
</p>
<p>
When I worked for RadioShack, a Microsoft shop, we used stored procedures and MS SQL Server&#8217;s INFORMATION_SCHEMA views <em><strong>all the time</strong></em>.  Since making the move to MySQL, I&#8217;ve missed these two features sorely.  Well, now the wait&#8217;s over.</p>
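<p>
As a small taste of how the two features combine, here is a minimal sketch (a hypothetical procedure, not an example taken from Pro MySQL) that wraps a query against the INFORMATION_SCHEMA.TABLES view in a stored procedure, so an administrator can get a per-table size report for any database with a single CALL.  It assumes MySQL 5.0 or later; the procedure name show_table_sizes and its p_schema parameter are made up for illustration.
</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Hypothetical helper: rows and on-disk size for every base table
-- in the schema passed in.  A single-statement body needs no BEGIN/END,
-- so no DELIMITER change is required in the mysql client.
DROP PROCEDURE IF EXISTS show_table_sizes;
CREATE PROCEDURE show_table_sizes(IN p_schema VARCHAR(64))
  SELECT TABLE_NAME,
         ENGINE,
         TABLE_ROWS,  -- exact for MyISAM, only an estimate for InnoDB
         DATA_LENGTH + INDEX_LENGTH AS total_bytes
  FROM INFORMATION_SCHEMA.TABLES
  WHERE TABLE_SCHEMA = p_schema
    AND TABLE_TYPE = 'BASE TABLE'  -- skip views, which have no storage
  ORDER BY total_bytes DESC;

-- Usage: report on whichever database is currently selected.
CALL show_table_sizes(DATABASE());</pre></div></div>

<p>
Filtering on TABLE_TYPE keeps views out of the report, since their ENGINE and size columns are NULL.  Treat the TABLE_ROWS figures for InnoDB tables as a rough guide rather than an exact count.
</p>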
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2005/06/mysql-5-stored-procedures-and-information_schema/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

