<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>join-fu! &#187; C/C++</title>
	<atom:link href="http://www.joinfu.com/category/c-c-plus-plus/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.joinfu.com</link>
	<description>the art of sql</description>
	<lastBuildDate>Mon, 23 Jan 2012 20:21:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Signup for Drizzle Contributor Tutorial Webinar &#8211; May 15th</title>
		<link>http://www.joinfu.com/2010/05/signup-for-drizzle-contributor-tutorial-webinar-may-15th/</link>
		<comments>http://www.joinfu.com/2010/05/signup-for-drizzle-contributor-tutorial-webinar-may-15th/#comments</comments>
		<pubDate>Tue, 11 May 2010 15:48:57 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.joinfu.com/?p=367</guid>
		<description><![CDATA[Hi all! I&#8217;ll be giving an online webinar for Drizzle contributors on Saturday, May 15th @ 1am GMT (In the U.S. this is Friday, May 14th @ 9pm EDT, 6pm PDT). Note that the DimDim widget below shows the time as May 14th @ 8pm. The widget is wrong, since DimDim does not account for daylight [...]]]></description>
			<content:encoded><![CDATA[<p>Hi all!</p>
<p>I&#8217;ll be giving an online webinar for <a href="http://drizzle.org">Drizzle</a> contributors on Saturday, May 15th @ 1am GMT (in the U.S. this is Friday, May 14th @ 9pm EDT, 6pm PDT).</p>
<p><strong>Note that the DimDim widget below shows the time as May 14th @ 8pm.  The widget is wrong, since DimDim does not account for daylight saving time.</strong></p>
<p>Space is strictly limited to 20 people and this will be done via DimDim.com.  Please register for the webinar by entering your email address in the widget below and clicking &#8220;Sign Up&#8221;.</p>
<p><script language='javascript' type='text/javascript' src='https://my.dimdim.com/static/js/common_support.js'></script><object type='application/x-shockwave-flash' id='flash_dimdim_widget' data='https://my.dimdim.com/static/dimdimWebinar2.swf?widgetParams=mid/4c1b7c5c-3c08-4b11-acd9-67af7b6cdaa6/furl/aHR0cHM6Ly9teS5kaW1kaW0uY29tLw==/op/saas:dimdim:all:jaypipes:default:dimdim:default:en_US/rec/0/tim/0/tra/0/reg/0/' width='250' height='310'><param name='movie' value='https://my.dimdim.com/static/dimdimWebinar2.swf?widgetParams=mid/4c1b7c5c-3c08-4b11-acd9-67af7b6cdaa6/furl/aHR0cHM6Ly9teS5kaW1kaW0uY29tLw==/op/saas:dimdim:all:jaypipes:default:dimdim:default:en_US/rec/0/tim/0/tra/0/reg/0/' /><param name='wmode' value='transparent' /><param name='allowNetworking' value='all' /><param name='allowFullScreen' value='false' /><param name='allowscriptaccess' value='always'></param></object></p>
<p>The agenda for this 2-3 hour tutorial will be:</p>
<ol>
<li>First Steps
<ul>
<li>Getting registered as a contributor for Drizzle on Launchpad</li>
<li>Registering your SSH keys with Launchpad</li>
<li>Picking up and creating blueprints</li>
<li>Basics of Bazaar</li>
<li>Setting up a local code repository for Drizzle</li>
<li>Committing your work to a Bazaar branch</li>
<li>Pushing your code to Launchpad</li>
<li>Requesting a code review and merge into trunk</li>
<li><strong>One slide</strong> explaining the license your contributions may be submitted under</li>
</ul>
</li>
<li>The Drizzle Source Code
<ul>
<li>Our coding standards</li>
<li>Our build system</li>
<li>Walkthrough of major directories in Drizzle</li>
<li>Understanding the plugin system</li>
<li>Understanding what the kernel is responsible for</li>
<li>Where the Dragons live &mdash; and how to avoid them</li>
</ul>
</li>
<li>
Walkthrough of a SELECT statement
<ul>
<li>Client communication with server</li>
<li>The role of the session scheduler plugin</li>
<li>How, when and where authentication and authorization plugins are called</li>
<li>How the <tt>drizzled::statement::Statement</tt> subclasses work</li>
<li>Dive into <tt>drizzled::statement::Select::execute()</tt></li>
<li>Walkthrough of how a table&#8217;s definition (metadata) is read from a protobuffer file or an engine</li>
<li>Dive into mysql_lock_tables()</li>
<li>How does <tt>drizzled::plugin::StorageEngine::startStatement()</tt> work?</li>
<li>How does <tt>drizzled::plugin::TransactionalStorageEngine::startTransaction()</tt> work?</li>
<li>Inside the join optimizer and <tt>Join::optimize()</tt></li>
<li>How does the nested loops algorithm get executed and how does <tt>READ_RECORD</tt> work?</li>
<li>How does <tt>drizzled::Cursor</tt> perform table and index scans and seeks?</li>
<li>How are result sets packaged up and sent to clients?</li>
</ul>
</li>
<li>
Plugin Development Tutorial
<ul>
<li>What plugin classes are even available?</li>
<li>Creating your basic plugin</li>
<li>The plugin.ini file</li>
<li>The module initialization and configuration file</li>
<li>Registering your plugin with the kernel with <tt>plugin::Context::add()</tt></li>
<li>Publishing your plugin's information using table functions</li>
<li>Providing users control over your plugin with user-defined functions</li>
</ul>
</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2010/05/signup-for-drizzle-contributor-tutorial-webinar-may-15th/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Holy Google Summer of Code, Batman</title>
		<link>http://www.joinfu.com/2010/03/holy-google-summer-of-code-batman/</link>
		<comments>http://www.joinfu.com/2010/03/holy-google-summer-of-code-batman/#comments</comments>
		<pubDate>Fri, 26 Mar 2010 16:06:02 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.joinfu.com/?p=356</guid>
		<description><![CDATA[So, last year, Drizzle participated in the Google Summer of Code under the MySQL project organization. We had four excellent student submissions and myself, Monty Taylor, Eric Day and Stewart Smith all mentored students for the summer. It was my second year mentoring, and I really enjoyed it, so I was looking forward to this [...]]]></description>
			<content:encoded><![CDATA[<p>So, last year, <a href="http://launchpad.net/drizzle">Drizzle</a> <a href="http://socghop.appspot.com/gsoc/program/list_projects/google/gsoc2009">participated</a> in the Google Summer of Code under the MySQL project organization.  We had four excellent student submissions and myself, <a href="http://inaugust.com">Monty Taylor</a>, <a href="http://oddments.org">Eric Day</a> and <a href="http://flamingspork.com">Stewart Smith</a> all mentored students for the summer.  It was my second year mentoring, and I really enjoyed it, so I was looking forward to this year&#8217;s summer of code.</p>
<p>This year, <a href="http://posulliv.com">Padraig O&#8217;Sullivan</a>, a GSoC student last year, is working at <a href="http://www.akiban.com/">Akiban Technologies</a>, partly on Drizzle.  He is the GSoC Administrator and also a mentor for Drizzle this year, and <em>Drizzle is its own <a href="http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010">sponsored project organization</a> this year</em>.  Thank you, Padraig!</p>
<p>I have been absolutely floored by the flood of potential students who have shown up on the <a href="https://lists.launchpad.net/drizzle-discuss/">mailing list</a> and the #drizzle IRC channel.  I have been even more impressed with those students&#8217; ambition, sense of community, and willingness to ask questions and help other students as they show up.  A couple of students have even gotten code contributed to the source trees before submitting their official applications to GSoC.  See, I told you they were ambitious! <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>This year, Drizzle has a <a href="http://drizzle.org/wiki/Soc">listing of 16 potential projects</a> for students to work on.  The projects are for students interested in developing in C++, Python, or Perl.</p>
<p>If you are interested in participating, please do check out Drizzle!  For those new to Launchpad, Bazaar, and C++ development with Drizzle, feel free to check out these blog articles which cover those topics:</p>
<ul>
<li><a href="http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-1-getting-started/">A Contributor&#8217;s Guide to Launchpad and Bazaar &#8211; Part 1 &#8211; Getting Started</a></li>
<li><a href="http://www.joinfu.com/2008/08/a-contributors-guide-to-launchpadnet-part-2-code-management/">A Contributor&#8217;s Guide to Launchpad and Bazaar &#8211; Part 2 &#8211; Code Management</a></li>
<li><a href="http://www.joinfu.com/2008/08/getting-a-working-c-c-plusplus-development-environment-for-developing-drizzle/">Getting a C++ Development Environment Established</a></li>
</ul>
<p>And, in other news, <a href="http://www.ncaa.com/brackets/basketball/men/">Go Buckeyes</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2010/03/holy-google-summer-of-code-batman/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Understanding Drizzle&#8217;s Transaction Log</title>
		<link>http://www.joinfu.com/2010/03/understanding-drizzles-transaction-log/</link>
		<comments>http://www.joinfu.com/2010/03/understanding-drizzles-transaction-log/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 21:03:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.joinfu.com/?p=355</guid>
		<description><![CDATA[Today I pushed up the initial patch which adds XA support to Drizzle&#8217;s transaction log. So, to give myself a bit of a rest from coding, I&#8217;m going to blog a bit about the transaction log and show off some of its features. WARNING: Please keep in mind that the transaction log module in Drizzle [...]]]></description>
			<content:encoded><![CDATA[<p>Today I pushed up the <a href="http://bazaar.launchpad.net/~jaypipes/drizzle/xa-transaction-log/revision/1312">initial patch</a> which adds XA support to Drizzle&#8217;s transaction log.  So, to give myself a bit of a rest from coding, I&#8217;m going to blog a bit about the transaction log and show off some of its features.</p>
<div style="padding: 15px 50px; background-color: #f7f7f7; border: solid 1px #666; color: red;">
<strong>WARNING</strong>: Please keep in mind that the transaction log module in Drizzle is under heavy development and should not be used in production environments.  That said, I&#8217;d love to get as much feedback as possible on it, and if you feel like throwing some heavy data at it, that would be awesome <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />
</div>
<h2>What is the Transaction Log?</h2>
<p>Simply put, the transaction log is a record of every modification to the state of the server&#8217;s data.  It is similar to MySQL&#8217;s binlog, with some substantial differences:</p>
<ul>
<li>The transaction log is composed of <a href="http://code.google.com/apis/protocolbuffers/docs/overview.html">Google Protocol Buffer</a> messages.  Because of this, it is possible to read the log using a variety of programming languages, as <a href="http://developian.blogspot.com/">Marcus Eriksson</a>&#8217;s <a href="https://launchpad.net/rabbitreplication">RabbitReplication</a> project demonstrates.</li>
<li>The transaction log is a plugin<sup>[1]</sup>.  It lives entirely outside of the Drizzle kernel.  The advantage of this is that development of the transaction log does not need to be linked with development in the kernel and versioning of the transaction log can happen independently of the kernel.</li>
<li>Currently, there is only a single log file.  MySQL&#8217;s binlog can be split into multiple files.  This may or may not change in the future. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li>Drizzle&#8217;s transaction log is indexed.  Among other things, this means that you can query the transaction log directly from within a Drizzle client via DATA_DICTIONARY views.  I will demonstrate this feature below.</li>
</ul>
<p>It is important to also point out that Drizzle&#8217;s transaction log is not required for Drizzle replication.  This probably sounds very weird to folks who are accustomed to MySQL replication, which depends on the MySQL binlog.  In Drizzle, the replication API is different.  Although the transaction log <em><strong>can be used</strong></em> in Drizzle&#8217;s replication system, it&#8217;s not required.  I&#8217;ll write more on this in later blog posts which demonstrate how the replication system is not dependent on the transaction log, but in this article I just want to highlight the transaction log module. </p>
<h2>How Do I Enable the Transaction Log?</h2>
<p>First things first, let&#8217;s see how we can enable the Transaction Log.  If you&#8217;ve built Drizzle from source or have installed Drizzle locally, you will be familiar with the process of starting up a Drizzle server.  To review, here is how you do so:</p>
<pre>
cd $basedir
./drizzled [options] &#038;
</pre>
<p>Where <tt>$basedir</tt> is the directory in which you built or installed Drizzle.  For the <tt>[options]</tt>, you will typically need at the very least a <tt>--datadir=$DATADIR</tt> and a <tt>--mysql-protocol-port=$PORT</tt> value.  For an explanation of the <tt>--mysql-protocol-port</tt> option, see <a href="http://oddments.org">Eric Day</a>&#8217;s <a href="http://oddments.org/?p=307">recent article</a>.</p>
<p>To demonstrate, I&#8217;ve built a Drizzle server in a local directory of mine, and I&#8217;ll use the <tt>/tests/var/</tt> directory as my <tt>$datadir</tt>:</p>
<pre>
cd /home/jpipes/repos/drizzle/xa-transaction-log/drizzled/
./drizzled --datadir=/home/jpipes/repos/drizzle/xa-transaction-log/tests/var/ --mysql-protocol-port=9306 &#038;
</pre>
<p>You should see output similar to this:</p>
<pre>
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ./drizzled --datadir=/home/jpipes/repos/drizzle/xa-transaction-log/tests/var/ --mysql-protocol-port=9306 &#038;
[1] 31499
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins.
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
100317 15:41:51  InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
100317 15:41:52  InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 5 MB
InnoDB: Database physically writes the file full: wait...
100317 15:41:52  InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 5 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
100317 15:41:53 InnoDB Plugin 1.0.4 started; log sequence number 0
Listening on 0.0.0.0:9306
Listening on :::9306
Listening on 0.0.0.0:4427
Listening on :::4427
./drizzled: Forcing close of thread 0 user: ''
./drizzled: ready for connections.
Version: '2010.03.1314' Source distribution (xa-transaction-log)
</pre>
<p>To connect to the above server, I then do:</p>
<pre>
../client/drizzle --port=9306
</pre>
<p>If all went well, you should be at a drizzle client prompt:</p>
<pre>
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ../client/drizzle --port=9306
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 7 Source distribution (xa-transaction-log)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle>
</pre>
<p>You can check whether the transaction log is enabled by querying the <tt>DATA_DICTIONARY.GLOBAL_VARIABLES</tt> table.  The transaction log is not on by default:</p>
<pre>
drizzle> use data_dictionary
Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

Database changed
drizzle> SELECT * FROM GLOBAL_VARIABLES WHERE VARIABLE_NAME LIKE 'transaction_log%';
+---------------------------------+-----------------+
| VARIABLE_NAME                   | VARIABLE_VALUE  |
+---------------------------------+-----------------+
| transaction_log_enable          | OFF             |
| transaction_log_enable_checksum | OFF             |
| transaction_log_enable_xa       | OFF             |
| transaction_log_log_file        | transaction.log |
| transaction_log_sync_method     | 0               |
| transaction_log_truncate_debug  | OFF             |
| transaction_log_xa_num_slots    | 8               |
+---------------------------------+-----------------+
7 rows in set (0 sec)
</pre>
<p>OK, let&#8217;s restart the server, this time with the transaction log enabled.  To shut down Drizzle, there is no need to use a tool like mysqladmin.  You can shut down the server via the client:</p>
<pre>
drizzle> exit
Bye
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ../client/drizzle --port=9306 --shutdown
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ./drizzled: Normal shutdown
100317 15:53:48  InnoDB: Starting shutdown...
100317 15:53:49  InnoDB: Shutdown completed; log sequence number 44244
...
</pre>
<p>Now let&#8217;s start up the server, this time passing the <tt>--transaction-log-enable</tt> and the <tt>--default-replicator-enable</tt> options.  The <tt>--default-replicator-enable</tt> option is needed when the transaction log is not in XA mode (more on that later):</p>
<pre>
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ./drizzled --datadir=/home/jpipes/repos/drizzle/xa-transaction-log/tests/var/ --mysql-protocol-port=9306 --transaction-log-enable --default-replicator-enable &#038;
[2] 31582
[1]   Done                    ./drizzled --datadir=/home/jpipes/repos/drizzle/xa-transaction-log/tests/var/ --mysql-protocol-port=9306
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ InnoDB: The InnoDB memory heap is disabled
...
./drizzled: ready for connections.
</pre>
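<p>If you find yourself restarting the server with different option sets, the command line can be assembled by a small script.  The following is only a minimal Python sketch: the helper name <tt>drizzled_command</tt> is hypothetical, and it uses nothing but the options already shown in this post.</p>

```python
def drizzled_command(datadir, port, transaction_log=False):
    """Build the argv list for the drizzled invocations shown in this post.

    Hypothetical helper: it only strings together the options used above.
    """
    argv = [
        "./drizzled",
        f"--datadir={datadir}",
        f"--mysql-protocol-port={port}",
    ]
    if transaction_log:
        # Outside of XA mode, the transaction log also needs the
        # default replicator to be enabled.
        argv += ["--transaction-log-enable", "--default-replicator-enable"]
    return argv


datadir = "/home/jpipes/repos/drizzle/xa-transaction-log/tests/var/"
print(" ".join(drizzled_command(datadir, 9306, transaction_log=True)))
```

<p>The returned list could be handed to <tt>subprocess.Popen()</tt> from the directory containing <tt>drizzled</tt>; the sketch just prints the command so it can be compared with the one above.</p>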
<p>Again, connect to the server and check the transaction log variables:</p>
<pre>
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ../client/drizzle --port=9306
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 7 Source distribution (xa-transaction-log)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> use data_dictionary
Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

Database changed
drizzle> SELECT * FROM GLOBAL_VARIABLES WHERE VARIABLE_NAME LIKE 'transaction_log%';
+---------------------------------+-----------------+
| VARIABLE_NAME                   | VARIABLE_VALUE  |
+---------------------------------+-----------------+
| transaction_log_enable          | ON              |
| transaction_log_enable_checksum | OFF             |
| transaction_log_enable_xa       | OFF             |
| transaction_log_log_file        | transaction.log |
| transaction_log_sync_method     | 0               |
| transaction_log_truncate_debug  | OFF             |
| transaction_log_xa_num_slots    | 8               |
+---------------------------------+-----------------+
7 rows in set (0 sec)

drizzle>
</pre>
<p>OK.  So, if you check the <tt>$datadir</tt>, you should see a file called <tt>transaction.log</tt>, with a size of 0:</p>
<pre>
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ls -lha ../tests/var/
total 21M
drwxr-xr-x  6 jpipes jpipes 4.0K 2010-03-17 15:54 .
drwxr-xr-x 11 jpipes jpipes 4.0K 2010-03-17 14:57 ..
-rw-rw----  1 jpipes jpipes  10M 2010-03-17 15:54 ibdata1
-rw-rw----  1 jpipes jpipes 5.0M 2010-03-17 15:54 ib_logfile0
-rw-rw----  1 jpipes jpipes 5.0M 2010-03-17 15:41 ib_logfile1
-rwxr-----  1 jpipes jpipes    6 2010-03-17 15:54 serialcoder.pid
-rwx------  1 jpipes jpipes    0 2010-03-17 15:54 transaction.log
</pre>
<p>Back in the drizzle client, let&#8217;s go ahead and create a new schema, a new table, and add a single row to that table.  This will add some entries to the transaction log that we&#8217;ll be able to view:</p>
<pre>
drizzle> CREATE SCHEMA lebowski;
Query OK, 1 rows affected (0.06 sec)
drizzle> USE lebowski
Database changed
drizzle> CREATE TABLE characters (name VARCHAR(20) NOT NULL PRIMARY KEY,
    -> hobby VARCHAR(10) NOT NULL) ENGINE=InnoDB;
Query OK, 0 rows affected (0.06 sec)

drizzle> INSERT INTO characters VALUES ('the dude','bowling');
Query OK, 1 row affected (0.05 sec)
</pre>
<p>Checking in on our transaction log file, we see it now has some size to it:</p>
<pre>
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ls -lha ../tests/var/
total 21M
drwxr-xr-x  7 jpipes jpipes 4.0K 2010-03-17 16:11 .
drwxr-xr-x 11 jpipes jpipes 4.0K 2010-03-17 14:57 ..
-rw-rw----  1 jpipes jpipes  10M 2010-03-17 16:11 ibdata1
-rw-rw----  1 jpipes jpipes 5.0M 2010-03-17 16:11 ib_logfile0
-rw-rw----  1 jpipes jpipes 5.0M 2010-03-17 16:11 ib_logfile1
drwxrwx--x  2 jpipes jpipes 4.0K 2010-03-17 16:11 lebowski
-rwxr-----  1 jpipes jpipes    6 2010-03-17 16:11 serialcoder.pid
-rwx------  1 jpipes jpipes  444 2010-03-17 16:11 transaction.log
</pre>
<h2>Finding Out What&#8217;s In the Transaction Log</h2>
<p>OK, so now for the really cool part of this little demonstration. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   Let&#8217;s take a look at what is now contained in the transaction log, all via the Drizzle client and the <tt>DATA_DICTIONARY</tt> views.</p>
<p>There are currently three <tt>DATA_DICTIONARY</tt> views which show information about the transaction log and its contents:</p>
<ul>
<li><tt>DATA_DICTIONARY.TRANSACTION_LOG</tt></li>
<li><tt>DATA_DICTIONARY.TRANSACTION_LOG_ENTRIES</tt></li>
<li><tt>DATA_DICTIONARY.TRANSACTION_LOG_TRANSACTIONS</tt></li>
</ul>
<p>To see what each view contains, simply do a <tt>DESC</tt> on them:</p>
<pre>
drizzle> use data_dictionary
Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

Database changed
drizzle> DESC TRANSACTION_LOG;
+---------------------+---------+-------+---------+-----------------+-----------+
| Field               | Type    | Null  | Default | Default_is_NULL | On_Update |
+---------------------+---------+-------+---------+-----------------+-----------+
| FILE_NAME           | VARCHAR | FALSE |         | FALSE           |           |
| FILE_LENGTH         | BIGINT  | FALSE |         | FALSE           |           |
| NUM_LOG_ENTRIES     | BIGINT  | FALSE |         | FALSE           |           |
| NUM_TRANSACTIONS    | BIGINT  | FALSE |         | FALSE           |           |
| MIN_TRANSACTION_ID  | BIGINT  | FALSE |         | FALSE           |           |
| MAX_TRANSACTION_ID  | BIGINT  | FALSE |         | FALSE           |           |
| MIN_END_TIMESTAMP   | BIGINT  | FALSE |         | FALSE           |           |
| MAX_END_TIMESTAMP   | BIGINT  | FALSE |         | FALSE           |           |
| INDEX_SIZE_IN_BYTES | BIGINT  | FALSE |         | FALSE           |           |
+---------------------+---------+-------+---------+-----------------+-----------+
9 rows in set (0 sec)

drizzle> DESC TRANSACTION_LOG_ENTRIES;
+--------------+---------+-------+---------+-----------------+-----------+
| Field        | Type    | Null  | Default | Default_is_NULL | On_Update |
+--------------+---------+-------+---------+-----------------+-----------+
| ENTRY_OFFSET | BIGINT  | FALSE |         | FALSE           |           |
| ENTRY_TYPE   | VARCHAR | FALSE |         | FALSE           |           |
| ENTRY_LENGTH | BIGINT  | FALSE |         | FALSE           |           |
+--------------+---------+-------+---------+-----------------+-----------+
3 rows in set (0 sec)

drizzle> DESC TRANSACTION_LOG_TRANSACTIONS;
+-----------------+--------+-------+---------+-----------------+-----------+
| Field           | Type   | Null  | Default | Default_is_NULL | On_Update |
+-----------------+--------+-------+---------+-----------------+-----------+
| ENTRY_OFFSET    | BIGINT | FALSE |         | FALSE           |           |
| TRANSACTION_ID  | BIGINT | FALSE |         | FALSE           |           |
| SERVER_ID       | BIGINT | FALSE |         | FALSE           |           |
| START_TIMESTAMP | BIGINT | FALSE |         | FALSE           |           |
| END_TIMESTAMP   | BIGINT | FALSE |         | FALSE           |           |
| NUM_STATEMENTS  | BIGINT | FALSE |         | FALSE           |           |
| CHECKSUM        | BIGINT | FALSE |         | FALSE           |           |
+-----------------+--------+-------+---------+-----------------+-----------+
7 rows in set (0 sec)
</pre>
<p>Let&#8217;s see what each of the views tells us about what is in the transaction log.  Remember, we&#8217;ve executed a <tt>CREATE SCHEMA</tt>, a <tt>CREATE TABLE</tt>, and a single <tt>INSERT</tt>.  Here is what the <tt>TRANSACTION_LOG</tt> view shows:</p>
<pre>
drizzle> SELECT * FROM TRANSACTION_LOG\G
*************************** 1. row ***************************
          FILE_NAME: transaction.log
        FILE_LENGTH: 444
    NUM_LOG_ENTRIES: 3
   NUM_TRANSACTIONS: 3
 MIN_TRANSACTION_ID: 1
 MAX_TRANSACTION_ID: 3
  MIN_END_TIMESTAMP: 1268856698672620
  MAX_END_TIMESTAMP: 1268856707093000
INDEX_SIZE_IN_BYTES: 73736
</pre>
<p>The column names should be self-explanatory.  The <tt>FILE_LENGTH</tt> shows the size in bytes of the log (which matches the output we had from our <tt>ls -lha</tt> above).  The <tt>INDEX_SIZE_IN_BYTES</tt> is the total amount of memory allocated for the transaction log index.</p>
<p>The <tt>TRANSACTION_LOG_ENTRIES</tt> view isn&#8217;t that interesting at first glance:</p>
<pre>
drizzle> SELECT * FROM TRANSACTION_LOG_ENTRIES;
+--------------+-------------+--------------+
| ENTRY_OFFSET | ENTRY_TYPE  | ENTRY_LENGTH |
+--------------+-------------+--------------+
|            0 | TRANSACTION |           89 |
|           89 | TRANSACTION |          223 |
|          312 | TRANSACTION |          132 |
+--------------+-------------+--------------+
</pre>
<p>You might be tempted to ask what the purpose of the <tt>TRANSACTION_LOG_ENTRIES</tt> view is.  It is a bit of a bridge table that allows one to see the type of entry at each offset.  Currently, the only types of entries in the transaction log are <tt>TRANSACTION</tt> entries, each basically a serialized Google Protocol Buffer (GPB) message, and <tt>BLOB</tt> entries, which store large blob data.</p>
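<p>One thing the offsets and lengths do show is that, in this example, the entries sit back to back in the file: each entry starts where the previous one ends, and the last one ends at the 444-byte file length.  The short Python sketch below checks that arithmetic.  The numbers are copied from the output above; the function name is made up for illustration, and whether the log is always this densely packed is an assumption, not something this output proves.</p>

```python
# (ENTRY_OFFSET, ENTRY_LENGTH) pairs copied from TRANSACTION_LOG_ENTRIES above.
ENTRIES = [(0, 89), (89, 223), (312, 132)]

# FILE_LENGTH as reported by the TRANSACTION_LOG view.
FILE_LENGTH = 444


def is_contiguous(entries, file_length):
    """Return True if the entries are packed back to back from offset 0
    and the last entry ends exactly at file_length."""
    expected_offset = 0
    for offset, length in entries:
        if offset != expected_offset:
            return False
        expected_offset = offset + length
    return expected_offset == file_length


print(is_contiguous(ENTRIES, FILE_LENGTH))
```

<p>If that packing holds, the next entry written to this log would be expected to start at offset 444.</p>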
<p>The <tt>TRANSACTION_LOG_TRANSACTIONS</tt> view shows all the transaction log entries which are of type <tt>TRANSACTION</tt>:</p>
<pre>
drizzle> SELECT * FROM TRANSACTION_LOG_TRANSACTIONS;
+--------------+----------------+-----------+------------------+------------------+----------------+----------+
| ENTRY_OFFSET | TRANSACTION_ID | SERVER_ID | START_TIMESTAMP  | END_TIMESTAMP    | NUM_STATEMENTS | CHECKSUM |
+--------------+----------------+-----------+------------------+------------------+----------------+----------+
|            0 |              1 |         1 | 1268856698672606 | 1268856698672620 |              1 |        0 |
|           89 |              2 |         1 | 1268856702792284 | 1268856702792331 |              1 |        0 |
|          312 |              3 |         1 | 1268856707025455 | 1268856707093000 |              1 |        0 |
+--------------+----------------+-----------+------------------+------------------+----------------+----------+
3 rows in set (0 sec)
</pre>
<p>As you can see, there is some basic information about each transaction entry in the log, including the offset in the transaction log, the start and end timestamp of the transaction, its transaction identifier, the number of statements involved in the transaction, and an optional checksum for the message (more on checksums below).</p>
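<p>The timestamp columns are plain integers.  Read as microseconds since the Unix epoch, they land on March 17, 2010, shortly after 20:11 UTC, which would line up with the 16:11 file times in the <tt>ls</tt> output if that machine was on U.S. Eastern Daylight Time.  That reading is an inference from the numbers, not something the views state.  Under that assumption, here is a small Python sketch (function names are illustrative) that converts the values and computes how long each transaction took:</p>

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# (TRANSACTION_ID, START_TIMESTAMP, END_TIMESTAMP) copied from the view above.
ROWS = [
    (1, 1268856698672606, 1268856698672620),
    (2, 1268856702792284, 1268856702792331),
    (3, 1268856707025455, 1268856707093000),
]


def to_utc(timestamp_micros):
    """Convert a timestamp to a UTC datetime.

    Assumes the value is microseconds since the Unix epoch.
    """
    return EPOCH + timedelta(microseconds=timestamp_micros)


def duration_micros(start, end):
    """Elapsed time between start and end, in microseconds."""
    return end - start


for txn_id, start, end in ROWS:
    print(txn_id, to_utc(end).isoformat(), duration_micros(start, end))
```

<p>Integer arithmetic with <tt>timedelta</tt> is used instead of dividing by one million and calling <tt>datetime.fromtimestamp()</tt>, so that no microsecond precision is lost to floating point rounding.</p>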
<h2>Viewing the Transaction Content</h2>
<p>While the above view output may be nice, what we&#8217;d really like to be able to do is see what <em>precisely</em> were the changes a Transaction effected.  To see this, we can use the <tt>PRINT_TRANSACTION_MESSAGE(<em>log_file</em>, <em>offset</em>)</tt> UDF.  Below, I&#8217;ve added two more rows to the <tt>lebowski.characters</tt> table within an explicit transaction.  I then query the <tt>DATA_DICTIONARY</tt> views using the <tt>PRINT_TRANSACTION_MESSAGE()</tt> function to show the changes logged to the transaction log:</p>
<pre>
drizzle> use lebowski
Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

Database changed
drizzle> START TRANSACTION;
Query OK, 0 rows affected (0 sec)

drizzle> INSERT INTO characters VALUES ('walter','bowling');
Query OK, 1 row affected (0 sec)

drizzle> INSERT INTO characters VALUES ('donny','bowling');
Query OK, 1 row affected (0 sec)

drizzle> COMMIT;
Query OK, 0 rows affected (0.09 sec)
</pre>
<p>We now see an additional Transaction Log entry and can see that this transaction contains the two individual <tt>INSERT</tt> statements just executed:</p>
<pre>
drizzle> SELECT * FROM TRANSACTION_LOG_TRANSACTIONS;
+--------------+----------------+-----------+------------------+------------------+----------------+----------+
| ENTRY_OFFSET | TRANSACTION_ID | SERVER_ID | START_TIMESTAMP  | END_TIMESTAMP    | NUM_STATEMENTS | CHECKSUM |
+--------------+----------------+-----------+------------------+------------------+----------------+----------+
|            0 |              1 |         1 | 1268856698672606 | 1268856698672620 |              1 |        0 |
|           89 |              2 |         1 | 1268856702792284 | 1268856702792331 |              1 |        0 |
|          312 |              3 |         1 | 1268856707025455 | 1268856707093000 |              1 |        0 |
|          444 |              4 |         1 | 1268857926482600 | 1268857938514312 |              1 |        0 |
+--------------+----------------+-----------+------------------+------------------+----------------+----------+
...
drizzle> SELECT PRINT_TRANSACTION_MESSAGE('transaction.log', ENTRY_OFFSET) as info
    -> FROM TRANSACTION_LOG_TRANSACTIONS WHERE ENTRY_OFFSET = 444\G
*************************** 1. row ***************************
info: transaction_context {
  server_id: 1
  transaction_id: 4
  start_timestamp: 1268857926482600
  end_timestamp: 1268857938514312
}
statement {
  type: INSERT
  start_timestamp: 1268857926482605
  end_timestamp: 1268857938514310
  insert_header {
    table_metadata {
      schema_name: "lebowski"
      table_name: "characters"
    }
    field_metadata {
      type: VARCHAR
      name: "name"
    }
    field_metadata {
      type: VARCHAR
      name: "hobby"
    }
  }
  insert_data {
    segment_id: 1
    end_segment: true
    record {
      insert_value: "walter"
      insert_value: "bowling"
    }
    record {
      insert_value: "donny"
      insert_value: "bowling"
    }
  }
}

1 row in set (0.01 sec)
</pre>
<p>You may notice that <tt>NUM_STATEMENTS</tt> is equal to 1 even though there were 2 <tt>INSERT</tt> statements issued.  This is because the kernel packages both the <tt>INSERT</tt>s into a single <tt>message::Statement::InsertData</tt> package for more efficient storage.  If there had been an <tt>INSERT</tt> and an <tt>UPDATE</tt>, <tt>NUM_STATEMENTS</tt> would be 2.</p>
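<p>The packaging behaviour described above can be sketched roughly as follows.  This is only an illustration: the <tt>Statement</tt> and <tt>Transaction</tt> structs, <tt>StatementType</tt>, <tt>addRecord()</tt> and <tt>numStatements()</tt> are hypothetical, simplified stand-ins and not the real <tt>message::</tt> classes or kernel code.  The sketch assumes a row change is folded into the previous statement whenever it has the same type and targets the same table.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real message classes.
enum class StatementType { Insert, Update, Delete };

struct Statement {
  StatementType type;
  std::string schema_name;
  std::string table_name;
  std::vector<std::vector<std::string>> records;  // one entry per row
};

struct Transaction {
  std::vector<Statement> statements;

  // Append a row change.  If the most recent statement has the same
  // type and targets the same table, the row is added to it instead
  // of opening a new statement (assumed coalescing rule).
  void addRecord(StatementType type,
                 const std::string &schema_name,
                 const std::string &table_name,
                 const std::vector<std::string> &record)
  {
    assert(!record.empty());
    if (!statements.empty()) {
      Statement &last = statements.back();
      if (last.type == type &&
          last.schema_name == schema_name &&
          last.table_name == table_name) {
        last.records.push_back(record);
        return;
      }
    }
    Statement fresh{type, schema_name, table_name, {}};
    fresh.records.push_back(record);
    statements.push_back(fresh);
  }

  std::size_t numStatements() const { return statements.size(); }
};
```

<p>With this sketch, two consecutive inserts into <tt>lebowski.characters</tt> leave <tt>numStatements()</tt> at 1 with two records, while a following update on the same table bumps it to 2.</p>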
<h2>Enable Automatic Checksumming</h2>
<p>One final feature I&#8217;ll highlight in this blog post is an option to automatically store a checksum of each transaction message when writing entries to the transaction log.  To enable this feature, simply use the <tt>--transaction-log-enable-checksum</tt> command line option.  You can view the checksums of entries in the <tt>TRANSACTION_LOG_TRANSACTIONS</tt> view, as demonstrated below:</p>
<pre>
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ./drizzled --datadir=/home/jpipes/repos/drizzle/xa-transaction-log/tests/var/ --mysql-protocol-port=9306 --transaction-log-enable --default-replicator-enable --transaction-log-enable-checksum &#038;
[5] 32042
[4]   Done                    ./drizzled --datadir=/home/jpipes/repos/drizzle/xa-transaction-log/tests/var/ --mysql-protocol-port=9306 --transaction-log-enable --default-replicator-enable
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins.
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
100317 16:47:07  InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
100317 16:47:07  InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 5 MB
InnoDB: Database physically writes the file full: wait...
100317 16:47:08  InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 5 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
100317 16:47:08 InnoDB Plugin 1.0.4 started; log sequence number 0
Listening on 0.0.0.0:9306
Listening on :::9306
Listening on 0.0.0.0:4427
Listening on :::4427
./drizzled: Forcing close of thread 0 user: ''
./drizzled: ready for connections.
Version: '2010.03.1314' Source distribution (xa-transaction-log)
...
jpipes@serialcoder:~/repos/drizzle/xa-transaction-log/drizzled$ ../client/drizzle --port=9306
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 7 Source distribution (xa-transaction-log)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> CREATE SCHEMA lebowski;
Query OK, 1 row affected (0.05 sec)

drizzle> CREATE TABLE characters (name VARCHAR(20) NOT NULL PRIMARY KEY, hobby VARCHAR(10) NOT NULL) ENGINE=InnoDB;
ERROR 1046 (3D000): No database selected
drizzle> use lebowski
Database changed
drizzle> CREATE TABLE characters (name VARCHAR(20) NOT NULL PRIMARY KEY, hobby VARCHAR(10) NOT NULL) ENGINE=InnoDB;
Query OK, 0 rows affected (0.11 sec)

drizzle> INSERT INTO characters VALUES ('the dude','bowling');
Query OK, 1 row affected (0.1 sec)

drizzle> use data_dictionary
Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

Database changed
drizzle> SELECT ENTRY_OFFSET, TRANSACTION_ID, CHECKSUM FROM TRANSACTION_LOG_TRANSACTIONS;
+--------------+----------------+------------+
| ENTRY_OFFSET | TRANSACTION_ID | CHECKSUM   |
+--------------+----------------+------------+
|            0 |              2 |  143866125 |
|           89 |              8 | 1466831622 |
|          312 |              9 |  460824986 |
+--------------+----------------+------------+
3 rows in set (0 sec)
</pre>
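<p>To give a feel for what a stored checksum buys you, here is a minimal sketch of verifying a log entry against its checksum.  The post does not say which checksum algorithm the transaction log uses, so the standard CRC-32 below is an assumption, and <tt>computeCrc32()</tt> and <tt>entryIsIntact()</tt> are hypothetical helper names rather than Drizzle functions.</p>

```cpp
#include <cstdint>
#include <string>

// Bitwise CRC-32 using the common reflected polynomial 0xEDB88320.
// Assumption: used here purely for illustration; the algorithm the
// transaction log actually uses is not stated in the post.
std::uint32_t computeCrc32(const std::string &bytes)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : bytes) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right one bit; apply the polynomial if the low bit was set.
      const std::uint32_t poly = (crc & 1u) ? 0xEDB88320u : 0u;
      crc = (crc >> 1) ^ poly;
    }
  }
  return ~crc;
}

// Hypothetical check a log reader could run on each entry: recompute
// the checksum over the serialized message and compare it with the
// value stored alongside the entry.
bool entryIsIntact(const std::string &serialized_message,
                   std::uint32_t stored_checksum)
{
  return computeCrc32(serialized_message) == stored_checksum;
}
```

<p>A reader that finds a mismatch knows the entry was damaged on disk or in transit and can refuse to apply it.</p>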
<h2>DDL is not Statement-based Replication</h2>
<p>As a final note, I&#8217;d like to point out that even DDL in Drizzle is replicated as row-based transaction messages, and not as raw SQL statements like in MySQL.  You can see, for instance, the <tt>message::Statement::CreateTableStatement</tt> inside the transaction message which contains all the metadata about the table you just created. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<pre>
drizzle> SELECT PRINT_TRANSACTION_MESSAGE('transaction.log', ENTRY_OFFSET)
    -> FROM TRANSACTION_LOG_TRANSACTIONS WHERE ENTRY_OFFSET = 89\G
*************************** 1. row ***************************
PRINT_TRANSACTION_MESSAGE('transaction.log', ENTRY_OFFSET): transaction_context {
  server_id: 1
  transaction_id: 2
  start_timestamp: 1268858897017396
  end_timestamp: 1268858897017447
}
statement {
  type: CREATE_TABLE
  start_timestamp: 1268858897017402
  end_timestamp: 1268858897017445
  create_table_statement {
    table {
      name: "characters"
      engine {
        name: "InnoDB"
      }
      field {
        name: "name"
        type: VARCHAR
        format: DefaultFormat
        constraints {
          is_nullable: false
        }
        string_options {
          length: 20
          collation_id: 45
          collation: "utf8_general_ci"
        }
      }
      field {
        name: "hobby"
        type: VARCHAR
        format: DefaultFormat
        constraints {
          is_nullable: false
        }
        string_options {
          length: 10
          collation_id: 45
          collation: "utf8_general_ci"
        }
      }
      indexes {
        name: "PRIMARY"
        is_primary: true
        is_unique: true
        type: UNKNOWN_INDEX
        key_length: 80
        index_part {
          fieldnr: 0
          compare_length: 80
          key_type: 0
        }
        options {
          binary_pack_key: true
          var_length_key: true
        }
      }
      type: STANDARD
      options {
        collation: "utf8_general_ci"
        collation_id: 45
      }
    }
  }
}

1 row in set (0 sec)
</pre>
<p>If you like or don&#8217;t like what you see, please do get in touch with me or fire off a wishlist to the <a href="https://launchpad.net/~drizzle-discuss">Drizzle Discuss mailing list</a>.  We&#8217;d love to hear from ya!</p>
<p><sup>[1]</sup> Actually, the transaction log module is a set of plugins.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2010/03/understanding-drizzles-transaction-log/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Recent Work on Improving Drizzle&#8217;s Storage Engine API</title>
		<link>http://www.joinfu.com/2010/03/recent-work-on-improving-drizzles-storage-engine-api/</link>
		<comments>http://www.joinfu.com/2010/03/recent-work-on-improving-drizzles-storage-engine-api/#comments</comments>
		<pubDate>Sat, 13 Mar 2010 07:07:59 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[storage engine API]]></category>

		<guid isPermaLink="false">http://www.joinfu.com/?p=343</guid>
		<description><![CDATA[Over the past six weeks or so, I have been working on cleaning up the pluggable storage engine API in Drizzle.  I&#8217;d like to describe some of this work and talk a bit about the next steps I&#8217;m taking in the coming months as we roll towards implementing Log Shipping in Drizzle. First, how did [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past six weeks or so, I have been working on cleaning up the pluggable storage engine API in Drizzle.  I&#8217;d like to describe some of this work and talk a bit about the next steps I&#8217;m taking in the coming months as we roll towards implementing <a href="https://blueprints.launchpad.net/drizzle/+spec/replication-log-shipping">Log Shipping in Drizzle</a>.</p>
<p>First, how did it come about that I started working on the storage engine API?</p>
<h3>From Commands to Transactions</h3>
<p>Well, it really goes back to my work on Drizzle&#8217;s replication system.  I had implemented a simple, fast, and extensible log which stored records of the data changes made to a server.  Originally, the log was called the Command Log, because the Google Protobuffer messages it contained were called <tt><a href="http://www.joinfu.com/2009/08/drizzle-replication-the-command-message/">message::Command</a></tt>s.  The API for implementing replication plugins was very simple, and within a month or so of its debut quite a few replication plugins had been built, including one replicating to Memcached, a prototype replicating to Gearman, and a filtering replicator plugin.</p>
<p>In addition, <a href="http://developian.blogspot.com">Marcus Eriksson</a> had created the <a href="http://www.rabbitreplication.org/">RabbitReplication</a> project which could replicate from Drizzle to other data stores, including Cassandra and Project Voldemort.  However, Marcus did not actually implement any C/C++ plugins using the Drizzle replication API.  Instead, RabbitReplication simply read the new Command Log, which, being just a file full of Google Protobuffer messages, was quick and easy to read into memory from a variety of programming languages.  RabbitReplication is written in Java, and it was great to see other programming languages be able to read Drizzle&#8217;s replication log so easily.  Marcus <a href="http://developian.blogspot.com/2010/01/replicating-transactions-directly-to.html" alt="Replicate Drizzle to RabbitMQ">later coded up</a> a C++ TransactionApplier plugin which replaces the Drizzle replication log and instead replicates the GPB messages directly to RabbitMQ.</p>
<p>And there, you&#8217;ll note that one of the plugins involved in Drizzle&#8217;s replication system is called <em>Transaction</em>Applier.  It used to be called CommandApplier. That was because the GPB Command messages were individual row change events for the most part.  However, I made a series of changes to the replication API and now the GPB messages sent through the APIs are of class <tt><a href="http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/">message::Transaction</a></tt>.  <tt>message::Transaction</tt> objects contain a transaction context, with information about the transaction&#8217;s start and end time and its transaction identifier, along with a series of <tt>message::Statement</tt> objects, each of which represents a part of the data changes that the SQL transaction made.</p>
<p>Thus, the Command Log became the Transaction Log, and the term Command was replaced everywhere with the terms Transaction and Statement (depending on whether you were talking about the entire Transaction or a piece of it).  Log entries were now written at COMMIT to the Transaction Log and were not written if no COMMIT occurred<sup>1</sup>.</p>
<p>After finishing this work to make the transaction log write Transaction messages at commit time, I was keen to begin coding up the publisher and subscriber plugins which represent a node in the replication environment.  However, Brian had asked me to delay working on other replication features and ensure that the replication API could support fully distributed transactions via the X/Open XA distributed transaction protocol.  XA support had been removed from Drizzle when the MySQL binlog and original replication system were ripped out, and it needed some TLC.  Fair enough, I said.  So, off I went to work on XA.</p>
<h3>If Only It Were Simple&#8230;</h3>
<p>As anyone who has worked on the MySQL source code or developed storage engines for MySQL knows, working with the MySQL pluggable storage engine API is sometimes not the easiest or most straightforward thing.  I think the biggest problem with the MySQL storage engine API is that, due to understandable historical reasons, it&#8217;s an API that was designed with the MyISAM and HEAP storage engines in mind.  Much of the transactional pieces of the API seem to be a bolted-on afterthought and can be very confusing to work with.</p>
<p>As an example, <a href="http://pbxt.blogspot.com/">Paul McCullagh</a>, developer of the transactional storage engine <a href="http://www.primebase.org/">PBXT</a>, recently <a href="http://lists.mysql.com/internals/37662">emailed</a> the mysql internals mailing list asking how the storage engine could tell when a SQL statement started and ended.  You would think that such a seemingly basic functionality would have a simple answer.  You&#8217;d be wrong.  <a href="http://monty-says.blogspot.com">Monty Widenius</a> <a href="http://lists.mysql.com/internals/37675">answered</a> like this:</p>
<blockquote><p>
Why not simply have a counter in your transaction object for how start_stmt &#8211; reset();  When this is 0 then you know stmnt ended.</p>
<p>In Maria we count number of calls to external_lock() and when the sum goes to 0 we know the transaction has ended.
</p></blockquote>
<p>To this, <a href="http://mysqlha.blogspot.com/">Mark Callaghan</a> responded:</p>
<blockquote><p>
Why does the solution need to be so obscure?
</p></blockquote>
<p>Monty <a href="http://lists.mysql.com/internals/37689">answered</a> (emphasis mine):</p>
<blockquote><p>
Historic reasons.</p>
<p>MySQL never kept a count of which handlers are used by a transaction, only which tables.</p>
<p>So the original logic was that external_lock(lock/unlock) is called for each usage of the table, which is normally more than enough information for a handler to know when a statement starts/ends.</p>
<p>The one case this didn&#8217;t work was in the case someone does lock tables as then external_lock is not called per statement. It was to satisfy this case that we added a call to start_stmt() for each table.</p>
<p><strong>It&#8217;s of course possible to change things so that start_stmt() / end_stmt() would be called once per used handler, but this would be yet another overhead for the upper level to do which the current handlers that tracks call to external_lock() doesn&#8217;t need.</strong>
</p></blockquote>
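<p>For readers who have not had to do this, the counting workaround described in the quotes above can be sketched roughly as follows.  <tt>LockCountTracker</tt> and its method names are hypothetical; a real MySQL handler would update such a counter from inside its <tt>external_lock()</tt> implementation, and the sketch ignores the <tt>LOCK TABLES</tt> case that Monty mentions.</p>

```cpp
#include <cassert>

// Hypothetical per-transaction bookkeeping kept by a storage engine
// that has to infer statement boundaries from lock/unlock calls.
class LockCountTracker {
public:
  // Called once for each table the server locks for the statement.
  // Returns true if this call marks the start of a statement.
  bool onTableLocked() { return ++locked_tables_ == 1; }

  // Called once for each table the server unlocks.  Returns true
  // when the count drops back to zero, i.e. the statement has ended.
  bool onTableUnlocked()
  {
    assert(locked_tables_ > 0);
    return --locked_tables_ == 0;
  }

  int lockedTables() const { return locked_tables_; }

private:
  int locked_tables_ = 0;
};
```

<p>Every engine has to carry bookkeeping like this just to answer &#8220;did a statement just start or end?&#8221;, which is exactly the obscurity Mark was asking about.</p>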
<p>Well, in Drizzle-land, we aren&#8217;t beholden to &#8220;historic reasons&#8221; <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   So, after looking through the in-need-of-attention transaction processing code in the kernel, I decided that I would clean up the API so that storage engines did not have to jump through hoops to notify the kernel they participate in a transaction or just to figure out when a statement and a transaction started and ended.</p>
<p>The resulting changes to the API are quite dramatic I think, but I&#8217;ll leave it to the storage engine developers to tell me if the changes are good or not.  The following is a summary of the changes to the storage engine API that I committed in the last few weeks.</p>
<h3><tt>plugin::StorageEngine</tt> Split Into Subclasses</h3>
<p>The very first thing I did was to split the enormous base plugin class for a storage engine, <tt>plugin::StorageEngine</tt>, into two other subclasses containing transactional elements.  <tt>plugin::TransactionalStorageEngine</tt> is now the base class for all storage engines which implement SQL transactions:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #ff0000; font-style: italic;">/**
 * A type of storage engine which supports SQL transactions.
 *
 * This class adds the SQL transactional API to the regular
 * storage engine.  In other words, it adds support for the
 * following SQL statements:
 *
 * START TRANSACTION;
 * COMMIT;
 * ROLLBACK;
 * ROLLBACK TO SAVEPOINT;
 * SET SAVEPOINT;
 * RELEASE SAVEPOINT;
 */</span>
<span style="color: #0000ff;">class</span> TransactionalStorageEngine <span style="color: #008080;">:</span><span style="color: #0000ff;">public</span> StorageEngine
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
  TransactionalStorageEngine<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> name_arg,
                             <span style="color: #0000ff;">const</span> std<span style="color: #008080;">::</span><span style="color: #007788;">bitset</span><span style="color: #000080;">&lt;</span>HTON_BIT_SIZE<span style="color: #000080;">&gt;</span> <span style="color: #000040;">&amp;</span>flags_arg<span style="color: #000080;">=</span> HTON_NO_FLAGS<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #0000ff;">virtual</span> ~TransactionalStorageEngine<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
...
<span style="color: #0000ff;">private</span><span style="color: #008080;">:</span>
  <span style="color: #0000ff;">void</span> setTransactionReadWrite<span style="color: #008000;">&#40;</span>Session<span style="color: #000040;">&amp;</span> session<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/*
   * Indicates to a storage engine the start of a
   * new SQL transaction.  This is called ONLY in the following
   * scenarios:
   *
   * 1) An explicit BEGIN WORK/START TRANSACTION is called
   * 2) After an explicit COMMIT AND CHAIN is called
   * 3) After an explicit ROLLBACK AND RELEASE is called
   * 4) When in AUTOCOMMIT mode and directly before a new
   *    SQL statement is started.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doStartTransaction<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, start_transaction_option_t options<span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">void</span><span style="color: #008000;">&#41;</span> session<span style="color: #008080;">;</span>
    <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">void</span><span style="color: #008000;">&#41;</span> options<span style="color: #008080;">;</span>
    <span style="color: #0000ff;">return</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/**
   * Implementing classes should override these to provide savepoint
   * functionality.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doSetSavepoint<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, NamedSavepoint <span style="color: #000040;">&amp;</span>savepoint<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doRollbackToSavepoint<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, NamedSavepoint <span style="color: #000040;">&amp;</span>savepoint<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doReleaseSavepoint<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, NamedSavepoint <span style="color: #000040;">&amp;</span>savepoint<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/**
   * Commits either the &quot;statement transaction&quot; or the &quot;normal transaction&quot;.
   *
   * @param[in] The Session
   * @param[in] true if it's a real commit, that makes persistent changes
   *            false if it's not in fact a commit but an end of the
   *            statement that is part of the transaction.
   * @note
   *
   * 'normal_transaction' is also false in auto-commit mode where 'end of statement'
   * and 'real commit' mean the same event.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doCommit<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, <span style="color: #0000ff;">bool</span> normal_transaction<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/**
   * Rolls back either the &quot;statement transaction&quot; or the &quot;normal transaction&quot;.
   *
   * @param[in] The Session
   * @param[in] true if it's a real commit, that makes persistent changes
   *            false if it's not in fact a commit but an end of the
   *            statement that is part of the transaction.
   * @note
   *
   * 'normal_transaction' is also false in auto-commit mode where 'end of statement'
   * and 'real commit' mean the same event.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doRollback<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, <span style="color: #0000ff;">bool</span> normal_transaction<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doReleaseTemporaryLatches<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session<span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">void</span><span style="color: #008000;">&#41;</span> session<span style="color: #008080;">;</span>
    <span style="color: #0000ff;">return</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doStartConsistentSnapshot<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session<span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">void</span><span style="color: #008000;">&#41;</span> session<span style="color: #008080;">;</span>
    <span style="color: #0000ff;">return</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></div></div>

<p>As you can see, <tt>plugin::TransactionalStorageEngine</tt> inherits from <tt>plugin::StorageEngine</tt> and extends it with a series of private pure virtual methods that implement the SQL transaction parts of a query &mdash; doCommit(), doRollback(), etc.  Implementing classes simply inherit from <tt>plugin::TransactionalStorageEngine</tt> and implement their internal transaction processing in these private methods.</p>
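<p>As a rough illustration of that pattern, here is a self-contained sketch of an engine that overrides the private <tt>do*()</tt> hooks.  The <tt>Session</tt>, <tt>StorageEngine</tt> and <tt>TransactionalStorageEngine</tt> classes below are heavily simplified stand-ins rather than the real Drizzle headers, the public <tt>commit()</tt>/<tt>rollback()</tt> wrapper signatures are assumptions, and <tt>ToyEngine</tt> is entirely hypothetical.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Session {};  // simplified stand-in for the real Session class

class StorageEngine {
public:
  explicit StorageEngine(const std::string &name_arg) : name(name_arg) {}
  virtual ~StorageEngine() {}
  const std::string &getName() const { return name; }
private:
  std::string name;
};

class TransactionalStorageEngine : public StorageEngine {
public:
  explicit TransactionalStorageEngine(const std::string &name_arg)
    : StorageEngine(name_arg) {}

  // Public entry points the kernel calls (assumed signatures); they
  // forward to the private virtual hooks implemented by each engine.
  int commit(Session *session, bool normal_transaction)
  {
    assert(session != nullptr);
    return doCommit(session, normal_transaction);
  }
  int rollback(Session *session, bool normal_transaction)
  {
    assert(session != nullptr);
    return doRollback(session, normal_transaction);
  }

private:
  virtual int doCommit(Session *session, bool normal_transaction) = 0;
  virtual int doRollback(Session *session, bool normal_transaction) = 0;
};

// Hypothetical toy engine: buffers rows until commit, drops them on rollback.
class ToyEngine : public TransactionalStorageEngine {
public:
  ToyEngine() : TransactionalStorageEngine("toy") {}
  void write(const std::string &row) { pending.push_back(row); }
  const std::vector<std::string> &rows() const { return committed; }

private:
  int doCommit(Session *, bool) override
  {
    committed.insert(committed.end(), pending.begin(), pending.end());
    pending.clear();
    return 0;
  }
  int doRollback(Session *, bool) override
  {
    pending.clear();
    return 0;
  }

  std::vector<std::string> pending;
  std::vector<std::string> committed;
};
```

<p>The engine never exposes its own commit logic directly; the kernel always goes through the base class wrappers, which gives the kernel one place to add checks or bookkeeping around every engine call.</p>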
<p>In addition to the SQL transaction, however, there is the concept of an XA transaction, which is for distributed transaction coordination.  The <a href="http://en.wikipedia.org/wiki/X/Open_XA">XA protocol</a> is a <a href="http://en.wikipedia.org/wiki/Two-phase_commit">two-phase commit protocol</a> because it implements a PREPARE step before a COMMIT occurs.  This XA API is exposed via two other classes, <tt>plugin::XaResourceManager</tt> and <tt>plugin::XaStorageEngine</tt>.  <tt>plugin::XaResourceManager</tt> derived classes implement the resource manager API of the XA protocol.  <tt>plugin::XaStorageEngine</tt> is a storage engine subclass which implements XA transactions in addition to SQL transactions.</p>
<p>Here is the <tt>plugin::XaResourceManager</tt> class:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #ff0000; font-style: italic;">/**
 * An abstract interface class which exposes the participation
 * of implementing classes in distributed transactions in the XA protocol.
 */</span>
<span style="color: #0000ff;">class</span> XaResourceManager
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
  XaResourceManager<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span><span style="color: #008000;">&#125;</span>
  <span style="color: #0000ff;">virtual</span> ~XaResourceManager<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span><span style="color: #008000;">&#125;</span>
...
<span style="color: #0000ff;">private</span><span style="color: #008080;">:</span>
  <span style="color: #ff0000; font-style: italic;">/**
   * Does the COMMIT stage of the two-phase commit.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doXaCommit<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, <span style="color: #0000ff;">bool</span> normal_transaction<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #ff0000; font-style: italic;">/**
   * Does the ROLLBACK stage of the two-phase commit.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doXaRollback<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, <span style="color: #0000ff;">bool</span> normal_transaction<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #ff0000; font-style: italic;">/**
   * Does the PREPARE stage of the two-phase commit.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doXaPrepare<span style="color: #008000;">&#40;</span>Session <span style="color: #000040;">*</span>session, <span style="color: #0000ff;">bool</span> normal_transaction<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #ff0000; font-style: italic;">/**
   * Rolls back a transaction identified by a XID.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doXaRollbackXid<span style="color: #008000;">&#40;</span>XID <span style="color: #000040;">*</span>xid<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #ff0000; font-style: italic;">/**
   * Commits a transaction identified by a XID.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doXaCommitXid<span style="color: #008000;">&#40;</span>XID <span style="color: #000040;">*</span>xid<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #ff0000; font-style: italic;">/**
   * Notifies the transaction manager of any transactions
   * which had been marked prepared but not committed at
   * crash time or that have been heuristically completed
   * by the storage engine.
   *
   * @param[out] Reference to a vector of XIDs to add to
   *
   * @retval
   *  Returns the number of transactions left to recover
   *  for this engine.
   */</span>
  <span style="color: #0000ff;">virtual</span> <span style="color: #0000ff;">int</span> doXaRecover<span style="color: #008000;">&#40;</span>XID <span style="color: #000040;">*</span> append_to, <span style="color: #0000ff;">size_t</span> len<span style="color: #008000;">&#41;</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></div></div>

<p>and here is the <tt>plugin::XaStorageEngine</tt> class:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #ff0000; font-style: italic;">/**
 * A type of storage engine which supports distributed
 * transactions in the XA protocol.
 */</span>
<span style="color: #0000ff;">class</span> XaStorageEngine <span style="color: #008080;">:</span><span style="color: #0000ff;">public</span> TransactionalStorageEngine,
                       <span style="color: #0000ff;">public</span> XaResourceManager
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
  XaStorageEngine<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> name_arg,
                  <span style="color: #0000ff;">const</span> std<span style="color: #008080;">::</span><span style="color: #007788;">bitset</span><span style="color: #000080;">&lt;</span>HTON_BIT_SIZE<span style="color: #000080;">&gt;</span> <span style="color: #000040;">&amp;</span>flags_arg<span style="color: #000080;">=</span> HTON_NO_FLAGS<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #0000ff;">virtual</span> ~XaStorageEngine<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  ...
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></div></div>

<p>Pretty clear.  A <tt>plugin::XaStorageEngine</tt> inherits from both <tt>plugin::TransactionalStorageEngine</tt> and <tt>plugin::XaResourceManager</tt> because it implements both SQL transactions and XA transactions.  The <tt>InnobaseEngine</tt> is a plugin which inherits from <tt>plugin::XaStorageEngine</tt> because InnoDB supports SQL transactions as well as XA.</p>
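<p>To make the two-phase idea concrete, here is a minimal sketch of how a coordinator might drive a set of resource managers through PREPARE and then COMMIT or ROLLBACK.  It is not Drizzle&#8217;s transaction manager: <tt>ResourceManager</tt>, <tt>twoPhaseCommit()</tt> and <tt>ToyResource</tt> are hypothetical names, and the interface is reduced to three calls with no XIDs, sessions, or crash recovery.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified resource manager interface.
class ResourceManager {
public:
  virtual ~ResourceManager() {}
  virtual bool prepare() = 0;   // phase one: vote yes (true) or no (false)
  virtual void commit() = 0;    // phase two, if every vote was yes
  virtual void rollback() = 0;  // phase two, if any vote was no
};

// Returns true if the transaction was committed on every participant.
bool twoPhaseCommit(const std::vector<ResourceManager *> &participants)
{
  // Phase one: ask each participant to prepare; stop at the first refusal.
  std::size_t prepared = 0;
  for (; prepared < participants.size(); ++prepared) {
    assert(participants[prepared] != nullptr);
    if (!participants[prepared]->prepare())
      break;
  }

  // Phase two: commit everywhere only if everyone prepared successfully.
  if (prepared == participants.size()) {
    for (ResourceManager *rm : participants)
      rm->commit();
    return true;
  }

  // Otherwise roll back every participant, prepared or not.
  for (ResourceManager *rm : participants)
    rm->rollback();
  return false;
}

// Toy participant used only to exercise the coordinator.
class ToyResource : public ResourceManager {
public:
  enum State { ACTIVE, COMMITTED, ROLLED_BACK };

  explicit ToyResource(bool can_prepare) : can_prepare_(can_prepare) {}
  bool prepare() override { return can_prepare_; }
  void commit() override { state = COMMITTED; }
  void rollback() override { state = ROLLED_BACK; }

  State state = ACTIVE;

private:
  bool can_prepare_;
};
```

<p>The point of the PREPARE round is that no participant makes its changes permanent until all of them have promised that they can.</p>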
<h3>Explicit Statement and Transaction Boundaries</h3>
<p>The second major change I made addressed the problem that Mark Callaghan noted in asking why finding out when a statement starts and ends was so obscure.  I added two new methods to <tt>plugin::StorageEngine</tt> called <tt>doStartStatement()</tt> and <tt>doEndStatement()</tt>.  The kernel now explicitly tells storage engines when a SQL statement starts and ends.  This happens before any call to <tt>Cursor::external_lock()</tt> is made, and there are no exception cases.  In addition, the kernel now always tells transactional storage engines when a new SQL transaction is starting.  It does this via an explicit call to <tt>plugin::TransactionalStorageEngine::doStartTransaction()</tt>.  No exceptions, and yes, even for DDL operations.</p>
<p>What this means is that for a transactional storage engine, it no longer needs to &#8220;count the calls to Cursor::external_lock()&#8221; in order to know when a statement or transaction starts and ends.  For a SQL transaction, this means that there is a clear code call path and there is no need for the storage engine to track whether the session is in AUTOCOMMIT mode or not.  The kernel does all that work for the storage engine.  Imagine a Session executes a single INSERT statement against an InnoDB table while in AUTOCOMMIT mode.  This is what the call path looks like:</p>
<pre>
 drizzled::Statement::Insert::execute()
 |
 -> drizzled::mysql_lock_tables()
    |
    -> drizzled::TransactionServices::registerResourceForTransaction()
       |
       -> drizzled::plugin::TransactionalStorageEngine::startTransaction()
          |
          -> InnobaseEngine::doStartTransaction()
       |
       -> drizzled::plugin::StorageEngine::startStatement()
          |
          -> InnobaseEngine::doStartStatement()
       |
       -> drizzled::plugin::StorageEngine::getCursor()
          |
          -> drizzled::Cursor::write_row()
             |
             -> InnobaseCursor::write_row()
       |
       -> drizzled::TransactionServices::autocommitOrRollback()
          |
          -> drizzled::plugin::TransactionalStorageEngine::commit()
             |
             -> InnobaseEngine::doCommit()
</pre>
<p>I think this will come as a welcome change to storage engine developers working with Drizzle.</p>
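<p>To make the call path above concrete, here is a minimal, self-contained sketch of a kernel driving explicit boundaries on an engine.  Only the <tt>doStartTransaction()</tt>, <tt>doStartStatement()</tt>, <tt>doEndStatement()</tt> and <tt>doCommit()</tt> hook names come from this post; <tt>SketchTransactionalEngine</tt>, <tt>RecordingEngine</tt>, <tt>SketchSession</tt> and <tt>executeStatement()</tt> are hypothetical names invented for illustration and are not the real Drizzle classes.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for plugin::TransactionalStorageEngine.
// Only the do*() hook names are taken from the post.
class SketchTransactionalEngine {
public:
  virtual ~SketchTransactionalEngine() = default;

  // Entry points the kernel calls; each one forwards to an engine hook.
  void startTransaction() { doStartTransaction(); }
  void startStatement() { doStartStatement(); }
  void endStatement() { doEndStatement(); }
  void commit() { doCommit(); }

protected:
  virtual void doStartTransaction() = 0;
  virtual void doStartStatement() = 0;
  virtual void doEndStatement() = 0;
  virtual void doCommit() = 0;
};

// Toy engine that simply records every boundary the kernel reports.
class RecordingEngine : public SketchTransactionalEngine {
public:
  std::vector<std::string> events;

protected:
  void doStartTransaction() override { events.push_back("start transaction"); }
  void doStartStatement() override { events.push_back("start statement"); }
  void doEndStatement() override { events.push_back("end statement"); }
  void doCommit() override { events.push_back("commit"); }
};

// Hypothetical per-session state that the kernel, not the engine, tracks.
struct SketchSession {
  bool autocommit = true;
  bool in_transaction = false;
};

// Kernel-side driver for one statement.  The engine never has to work out
// whether AUTOCOMMIT is on; it only reacts to the explicit calls.
void executeStatement(SketchSession &session, SketchTransactionalEngine &engine) {
  if (!session.in_transaction) {
    engine.startTransaction();
    session.in_transaction = true;
  }
  engine.startStatement();
  // ... row operations such as Cursor::write_row() would happen here ...
  engine.endStatement();
  if (session.autocommit) {
    assert(session.in_transaction);  // a commit always closes an open transaction
    engine.commit();
    session.in_transaction = false;
  }
}
```

<p>Running <tt>executeStatement()</tt> once with the default session leaves four events in the recording engine: start transaction, start statement, end statement, commit.  With <tt>autocommit</tt> set to false, the commit is skipped and later statements reuse the open transaction.</p>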
<h3>No More Need for Engine to Call <tt>trans_register_ha()</tt></h3>
<p>There was an interesting comment in the <a href="http://bazaar.launchpad.net/~mysql/mysql-server/mysql-next-mr/annotate/head%3A/sql/handler.cc">original documentation</a> for the transaction processing code.  It read:</p>
<blockquote><p>
  Roles and responsibilities<br />
  &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</p>
<p>  The server has no way to know that an engine participates in<br />
  the statement and a transaction has been started<br />
  in it unless the engine says so. Thus, in order to be<br />
  a part of a transaction, the engine must &#8220;register&#8221; itself.<br />
  This is done by invoking trans_register_ha() server call.<br />
  Normally the engine registers itself whenever handler::external_lock()<br />
  is called. trans_register_ha() can be invoked many times: if<br />
  an engine is already registered, the call does nothing.<br />
  In case autocommit is not set, the engine must register itself<br />
  twice &#8212; both in the statement list and in the normal transaction<br />
  list.
</p></blockquote>
<p>That comment, and I&#8217;ve read it dozens of times, always seemed strange to me.  I mean, does the server <em>really not know</em> that an engine participates in a statement or transaction unless the engine tells it?  Of course not.</p>
<p>So, I removed the need for a storage engine to &#8220;register itself&#8221; with the kernel.  Now, the transaction manager inside the Drizzle kernel (implemented in the TransactionServices component) automatically monitors which engines are participating in an SQL transaction and the engine doesn&#8217;t need to do anything to register itself.</p>
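<p>As a rough illustration of kernel-side tracking, the sketch below keeps the set of participating engines inside the transaction manager.  <tt>SketchTransactionServices</tt> and its methods are hypothetical; the real TransactionServices component is certainly more involved, and this only shows the idea that the engine itself makes no registration call.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for the bookkeeping inside TransactionServices.
class SketchTransactionServices {
public:
  // Called by the kernel when it touches a table belonging to an engine.
  // Returns true only the first time an engine joins the current transaction,
  // so repeated use of the same engine is harmless.
  bool noteEngineUsed(const std::string &engine_name) {
    assert(!engine_name.empty());
    return participants_.insert(engine_name).second;
  }

  std::size_t participantCount() const { return participants_.size(); }

  // Called by the kernel after commit or rollback.
  void endTransaction() { participants_.clear(); }

private:
  std::set<std::string> participants_;
};
```

<p>The design point is that the participant list lives with the code that already knows which tables a statement touches, so there is nothing for an engine to forget.</p>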
<p>In addition, due to the break-up of the <tt>plugin::StorageEngine</tt> class and the XA API into <tt>plugin::XaResourceManager</tt>, Drizzle&#8217;s transaction manager can now coordinate XA transactions from <em>plugins other than storage engines</em>.  Yep, that&#8217;s right.  Any plugin which implements <tt>plugin::XaResourceManager</tt> can participate in an XA transaction and Drizzle will act as the transaction manager.  What&#8217;s the first plugin that will do this?  Drizzle&#8217;s transaction log.  The transaction log isn&#8217;t a storage engine, but it <em>is</em> able to participate in an XA transaction, so it will implement <tt>plugin::XaResourceManager</tt> but not <tt>plugin::StorageEngine</tt>.</p>
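<p>A small sketch may help show why splitting the XA API out matters.  Below, a participant that is not a storage engine takes part in a simplified two-phase commit.  The interface and its method names (<tt>xaPrepare()</tt>, <tt>xaCommit()</tt>, <tt>xaRollback()</tt>), along with <tt>SketchTransactionLog</tt> and <tt>commitAll()</tt>, are assumptions made for illustration and are not copied from the Drizzle source.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical, trimmed-down version of plugin::XaResourceManager.
class SketchXaResourceManager {
public:
  virtual ~SketchXaResourceManager() = default;
  virtual bool xaPrepare() = 0;   // phase one: vote on whether commit is possible
  virtual void xaCommit() = 0;    // phase two: make the changes durable
  virtual void xaRollback() = 0;  // abandon the changes
};

// A participant that is not a storage engine, in the spirit of the
// transaction log: it implements only the XA interface.
class SketchTransactionLog : public SketchXaResourceManager {
public:
  bool can_prepare = true;
  bool committed = false;
  bool rolled_back = false;

  bool xaPrepare() override { return can_prepare; }
  void xaCommit() override { committed = true; }
  void xaRollback() override { rolled_back = true; }
};

// Simplified two-phase commit: commit only if every participant prepares,
// otherwise roll everyone back.
bool commitAll(const std::vector<SketchXaResourceManager *> &participants) {
  for (SketchXaResourceManager *participant : participants) {
    assert(participant != nullptr);
    if (!participant->xaPrepare()) {
      for (SketchXaResourceManager *other : participants) {
        other->xaRollback();
      }
      return false;
    }
  }
  for (SketchXaResourceManager *participant : participants) {
    participant->xaCommit();
  }
  return true;
}
```

<p>Because <tt>commitAll()</tt> only sees the XA interface, it does not care whether a participant is InnoDB, a log, or anything else.</p>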
<h3>Performance Impact of Code Changes</h3>
<p>So, that &#8220;yet another overhead&#8221; Monty talked about in the quote above?  There wasn&#8217;t any noticeable impact in performance or scalability at all.  So much for optimize-first coding.</p>
<h3>What&#8217;s Next?</h3>
<p>The next thing I&#8217;m working on is removing the notion of the &#8220;statement transaction&#8221;, which is also a historical by-product, this time because of BerkeleyDB.  Gee, I&#8217;ve got a lot of work ahead of me&#8230;</p>
<p><sup>[1]</sup> Actually, there is a way that a transaction that was rolled back can get written to the transaction log.  For bulk operations, the server can cut a Transaction message into multiple segments, and if the SQL transaction is rolled back, a special RollbackStatement message is written to the transaction log.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2010/03/recent-work-on-improving-drizzles-storage-engine-api/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Happiness is a Warm Cloud</title>
		<link>http://www.joinfu.com/2010/03/happiness-is-a-warm-cloud/</link>
		<comments>http://www.joinfu.com/2010/03/happiness-is-a-warm-cloud/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 14:31:34 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[Launchpad]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2010/03/happiness-is-a-warm-cloud</guid>
		<description><![CDATA[Although a few folks knew about where I and many of the Sun Drizzle team had ended up, we&#8217;ve waited until today to &#8220;officially&#8221; tell folks what&#8217;s up. We &#8212; Monty Taylor, Eric Day, Stewart Smith, Lee Bieber, and myself &#8212; are all now &#8220;Rackers&#8221;, working at Rackspace Cloud. And yep, we&#8217;re still workin&#8217; on [...]]]></description>
			<content:encoded><![CDATA[<p>
Although a few folks knew about where I and many of the Sun Drizzle team had ended up, we&#8217;ve waited until today to &#8220;officially&#8221; tell folks what&#8217;s up.  We &mdash; <a href="http://inaugust.com"  title="Monty Taylor">Monty Taylor</a>, <a href="http://oddments.org"  title="Eric Day">Eric Day</a>, <a href="http://flamingspork.com"  title="Stewart Smith">Stewart Smith</a>, Lee Bieber, and myself &mdash; are all now &#8220;Rackers&#8221;, working at <a href="http://www.rackspacecloud.com/"  title="Rackspace Cloud">Rackspace Cloud</a>.  And yep, we&#8217;re still workin&#8217; on Drizzle.  That&#8217;s the short story.  Read on for the longer one <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<h3>An Interesting Almost 3 Years at MySQL</h3>
<p>
I left my previous position of Community Relations Manager at MySQL to begin working on <a href="http://krow.livejournal.com"  title="Brian Aker">Brian Aker</a>&#8217;s newfangled Drizzle project in October 2008.
</p>
<p>
Many people at MySQL still think that I abandoned MySQL when I did so.  I did not.  I merely had gotten frustrated with the slow pace of change in the MySQL engineering department and its resistance to transparency.  Sure, over the 3 years I was at MySQL, the engineering department opened up a bit, but it was far from the ideal level of transparency I had hoped to inspire when I joined MySQL.
</p>
<p>
For almost 3 years, I had sent numerous emails to the MySQL internal email discussion lists asking the engineering and marketing departments, both headed by Zack Urlocker, to recognize the importance and necessity of major refactoring of the MySQL kernel, and the need to modularize the kernel or risk having more modular databases overtake MySQL as the key web infrastructure database.  The focus was always on the short term, on keeping up with the Joneses as far as features went, and I railed against this kind of roadmap, instead pushing the idea of breaking up the server into modules that could be blackboxed and developed independently of the kernel.  My ideas were met with mostly kind responses, but nothing ever materialized as far as major refactoring efforts were concerned.
</p>
<p>
I remember Jim Winstead casually responding to one of my emails, <em>&#8220;Congratulations, you&#8217;ve just reinvented Apache 2.0&#8221;</em>.  And, yes, Jim, that was kind of the point&#8230;
</p>
<p>
The MySQL source code base had gotten increasingly unmaintainable over the years, and key engineers were extremely resistant to changing the internals of MySQL and modernizing it.  There were some good reasons for being resistant, and some poor reasons (such as &#8220;this is the way we&#8217;ve always done it&#8221;).  Overall, it&#8217;s tough to question the strategy that Zack, Marten Mickos, and others had regarding the short term gains.  After all, they managed to maneuver MySQL into a winning position that Sun Microsystems thought was worth one billion dollars.  Because of this, it&#8217;s tough to argue with them. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_neutral.gif' alt=':|' class='wp-smiley' />
</p>
<h3>Working on Drizzle since October 2008 (officially)</h3>
<p>
I&#8217;m not the kind of person who likes to wait for years to see change, and so the Drizzle project interested me because it was not concerned with backwards compatibility with MySQL, it wasn&#8217;t concerned with having a roadmap that was dependent on the whims of a few big customers, and it was very much interested in challenging the assumptions built into a 20-year-old code base.  This is a project I could sink my teeth into.  And I did.
</p>
<p>
Many folks have said that the only reason Drizzle is still around is because Sun continued to pay for a number of engineers to work on Drizzle as &#8220;an experiment of sorts&#8221; and that Drizzle has no customers and therefore nothing to lose and everything to gain.  This was true, no doubt about it.  At Sun CTO Labs, the few of us did have the ability to code on Drizzle without the pressure-cooker of product marketing and sales demands.  <strong><em>We were lucky.</em></strong>
</p>
<h3><strike>4</strike> <strike>6</strike> <strike>9</strike> 10 Months in Purgatory</h3>
<p>
So, around rolls April 2009.  The stock market and worldwide economy had collapsed and recession was in the air.  There&#8217;s one thing that is absolutely certain in recession economies: <em>companies that have poor leadership and direction and <a href="http://www.siliconbeat.com/2008/10/22/patience-of-suns-largest-shareholders-seems-to-be-wearing-thin/" >are beholden to the interests of a large stockholder</a> will seek an end to their misery through acquisition by a larger, stronger firm</em>.
</p>
<p>
And Sun Microsystems was no different.  JAVA stock plummeted to two dollars a share, and Jonathan Schwartz and the Sun board began shopping Sun around to the highest bidder.  IBM was courted along with other tech giants.  So was Oracle.
</p>
<p>
And it was with a bit of a hangover that I awoke at the MySQL conference in April 2009 to the news that Oracle had purchased Sun Microsystems.  Joy.  We&#8217;d just gone through 14 months of ongoing integration with Sun Microsystems and now it was going to start all over again.
</p>
<p>
Anyone who follows <a href="http://planetmysql.org"  title="Planet MySQL">PlanetMySQL</a> knows about the ensuing battle in the European Commission&#8217;s court regarding monopoly of Oracle in the database market with its acquisition of MySQL.  Monty Widenius, Eben Moglen, even Richard Stallman, weighed in on the pros and cons of Oracle&#8217;s impending control over MySQL.
</p>
<p>
All the while, we Sun Microsystems employees had to hold our tongues and try to keep our jobs as Sun laid off thousands more workers while the EC battle ensued.  Not fun.  It was the employment equivalent of purgatory.  And the time just dragged on, with many employees, including myself and the Sun Drizzle team, not having a clue as to what would happen to us.  Management was completely silent about future plans.  Oracle made zero attempts to outline its future strategy regarding software, and thus most software employees simply kept on doing their work not knowing if the pink slip was arriving tomorrow or not.  Lots of fun that was.
</p>
<h3>Oracle Doesn&#8217;t Need Our Services &mdash; Larry Don&#8217;t Need No Stinkin&#8217; Cloud</h3>
<p>
The acquisition finally closed and very shortly afterwards, I got a call from my boss, Lee Bieber, that Oracle wouldn&#8217;t be needing our services.  Monty, Eric, and Stewart had already resigned; none of them had any desire to work for Oracle.  Lee and I had decided to see what they had in mind for us.  Apparently, not much.
</p>
<p>
Larry Ellison has gone on record that the whole &#8220;cloud thing&#8221; is faddish.  I don&#8217;t know whether Larry understands that cloud computing and infrastructure-as-a-service, platform-as-a-service, and database-as-a-service will eventually put his beloved Oracle cash cow in its place or not.  I don&#8217;t know whether Oracle is planning on embracing the cloud environments which will continue to eat up the market share of more traditional in-house environments upon which their revenue streams depend.  I really don&#8217;t.
</p>
<p>
But what I <em>do</em> know is that Rackspace is betting that providing these services is what the future of technology will be about.
</p>
<h3>Happiness is a Warm Cloud</h3>
<p>
Our team has landed at Rackspace Cloud.  I&#8217;ve now been down to San Antonio twice to meet with key individuals with whom we&#8217;ll be working closely.  Rackspace is not shy about why they wanted to acquire our team.  They see Drizzle as a database that will provide them an infrastructure piece that will be modular and scalable enough to meet the needs of their very diverse Cloud customers, of which there are many tens of thousands.
</p>
<p>
Rackspace recognizes that the pain points they feel with traditional MySQL cannot be solved with simple hacks and workarounds, and that to service the needs of so many customers, they will need a database server that thinks of itself as a friendly piece of their infrastructure and not the driver of its applications.  Drizzle&#8217;s core principles of flexibility and focus on scalability align with the goals Rackspace Cloud has for its platform&#8217;s future.
</p>
<p>
Rackspace is also heavily invested in <a href="http://incubator.apache.org/cassandra/"  title="Cassandra Project">Cassandra</a>, and sees integration of Drizzle and Cassandra as being a key way to add value to its platforms and therefore for its customers.
</p>
<p>
Rackspace is all about the customers, and this is a really cool thing to experience.  It&#8217;s typical for companies to claim they are all about the customer &mdash; in fact, every company I&#8217;ve ever worked for has claimed this.  Rackspace is the first company I&#8217;ve worked for where you actually feel this spirit, though.  You can see the fanaticism of Rackers and how they view what they do always in terms of service to the customer.  It&#8217;s infectious, and I&#8217;m pretty psyched to be on their team.
</p>
<p>
Anyway, that&#8217;s my story and I&#8217;m stickin&#8217; to it.  See y&#8217;all on the nets.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2010/03/happiness-is-a-warm-cloud/feed/</wfw:commentRss>
		<slash:comments>34</slash:comments>
		</item>
		<item>
		<title>Sneak Peek &#8211; Drizzle Transaction Log and INFORMATION_SCHEMA</title>
		<link>http://www.joinfu.com/2009/11/sneak-peek-drizzle-transaction-log-and-information-schema/</link>
		<comments>http://www.joinfu.com/2009/11/sneak-peek-drizzle-transaction-log-and-information-schema/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 20:30:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2009/11/sneak-peek-drizzle-transaction-log-and-information-schema</guid>
		<description><![CDATA[I&#8217;ve been coding up a storm in the last couple days and have just about completed coding on three new INFORMATION_SCHEMA views which allow anyone to query the new Drizzle transaction log for information about its contents. I&#8217;ve also finished a new UDF for Drizzle called PRINT_TRANSACTION_MESSAGE() that prints out the Transaction message&#8217;s contents in [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been coding up a storm in the last couple days and have just about completed coding on three new INFORMATION_SCHEMA views which allow anyone to query the new Drizzle transaction log for information about its contents.  I&#8217;ve also finished a new UDF for Drizzle called PRINT_TRANSACTION_MESSAGE() that prints out the <a title="Drizzle transaction message" href="http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/">Transaction message</a>&#8217;s contents in an easy-to-read format.</p>
<p>I don&#8217;t have time for a full walk-through blog entry about it, so I&#8217;ll just paste some output below and let y&#8217;all take a looksie.  A later blog entry will feature lots of source code explaining how you, too, can easily add INFORMATION_SCHEMA views to your Drizzle plugins.</p>
<p>Below are the results of the following sequence of actions:</p>
<ul>
<li>Start up a Drizzle server with the transaction log enabled, checksumming enabled, and the default replicator enabled.</li>
<li>Open a Drizzle client</li>
<li>Create a sample table, insert some data into it, do an update to that table, then drop the table</li>
<li>Query the INFORMATION_SCHEMA views and take a look at the transaction messages and information the transaction log now contains</li>
</ul>
<p>Enjoy! <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<pre>
jpipes@serialcoder:~/repos/drizzle/replication-group-commit/tests$ ./dtr --mysqld="--default-replicator-enable"\
 --mysqld="--transaction-log-enable"\
 --mysqld="--transaction-log-enable-checksum"\
 --start-and-exit
...
Servers started, exiting
jpipes@serialcoder:~/repos/drizzle/replication-group-commit/tests$ ../client/drizzle --port=9306
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 2009.11.1181 Source distribution (replication-group-commit)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> use test
Database changed
drizzle> CREATE TABLE t1 (   id INT NOT NULL PRIMARY KEY , padding VARCHAR(200) NOT NULL );
Query OK, 0 rows affected (0.01 sec)

drizzle> INSERT INTO t1 VALUES (1, "I love testing.");
Query OK, 1 row affected (0.01 sec)

drizzle> INSERT INTO t1 VALUES (2, "I hate testing.");
Query OK, 1 row affected (0.01 sec)

drizzle> UPDATE t1 SET padding="I love it when a plan comes together" WHERE id = 2;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

drizzle> DROP TABLE t1;
Query OK, 0 rows affected (0.17 sec)

drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG\G
*************************** 1. row ***************************
         FILE_NAME: transaction.log
       FILE_LENGTH: 639
   NUM_LOG_ENTRIES: 5
  NUM_TRANSACTIONS: 5
MIN_TRANSACTION_ID: 0
MAX_TRANSACTION_ID: 9
 MIN_END_TIMESTAMP: 1257888458463696
 MAX_END_TIMESTAMP: 1257888473929116
1 row in set (0 sec)

drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG_ENTRIES;
+--------------+-------------+--------------+
| ENTRY_OFFSET | ENTRY_TYPE  | ENTRY_LENGTH |
+--------------+-------------+--------------+
|            0 | TRANSACTION |          141 |
|          141 | TRANSACTION |          121 |
|          262 | TRANSACTION |          121 |
|          383 | TRANSACTION |          181 |
|          564 | TRANSACTION |           75 |
+--------------+-------------+--------------+
5 rows in set (0 sec)

drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG_TRANSACTIONS;
+--------------+----------------+-----------+------------------+------------------+----------------+------------+
| ENTRY_OFFSET | TRANSACTION_ID | SERVER_ID | START_TIMESTAMP  | END_TIMESTAMP    | NUM_STATEMENTS | CHECKSUM   |
+--------------+----------------+-----------+------------------+------------------+----------------+------------+
|            0 |              0 |         1 | 1257888458463668 | 1257888458463696 |              1 | 3275955647 |
|          141 |              7 |         1 | 1257888462222183 | 1257888462226990 |              1 |  407829420 |
|          262 |              8 |         1 | 1257888465371330 | 1257888465378423 |              1 | 4073072174 |
|          383 |              9 |         1 | 1257888470209443 | 1257888470215165 |              1 |   92884681 |
|          564 |              9 |         1 | 1257888473929111 | 1257888473929116 |              1 | 2850269133 |
+--------------+----------------+-----------+------------------+------------------+----------------+------------+
5 rows in set (0 sec)

drizzle> SELECT PRINT_TRANSACTION_MESSAGE("transaction.log", ENTRY_OFFSET) as trx
       > FROM INFORMATION_SCHEMA.TRANSACTION_LOG_ENTRIES\G
*************************** 1. row ***************************
trx: transaction_context {
  server_id: 1
  transaction_id: 0
  start_timestamp: 1257888458463668
  end_timestamp: 1257888458463696
}
statement {
  type: RAW_SQL
  start_timestamp: 1257888458463676
  end_timestamp: 1257888458463694
  sql: "CREATE TABLE t1 (   id INT NOT NULL PRIMARY KEY , padding VARCHAR(200) NOT NULL )"
}

*************************** 2. row ***************************
trx: transaction_context {
  server_id: 1
  transaction_id: 7
  start_timestamp: 1257888462222183
  end_timestamp: 1257888462226990
}
statement {
  type: INSERT
  start_timestamp: 1257888462222185
  end_timestamp: 1257888462226989
  insert_header {
    table_metadata {
      schema_name: "test"
      table_name: "t1"
    }
    field_metadata {
      type: INTEGER
      name: "id"
    }
    field_metadata {
      type: VARCHAR
      name: "padding"
    }
  }
  insert_data {
    segment_id: 1
    end_segment: true
    record {
      insert_value: "1"
      insert_value: "I love testing."
    }
  }
}

*************************** 3. row ***************************
trx: transaction_context {
  server_id: 1
  transaction_id: 8
  start_timestamp: 1257888465371330
  end_timestamp: 1257888465378423
}
statement {
  type: INSERT
  start_timestamp: 1257888465371332
  end_timestamp: 1257888465378422
  insert_header {
    table_metadata {
      schema_name: "test"
      table_name: "t1"
    }
    field_metadata {
      type: INTEGER
      name: "id"
    }
    field_metadata {
      type: VARCHAR
      name: "padding"
    }
  }
  insert_data {
    segment_id: 1
    end_segment: true
    record {
      insert_value: "2"
      insert_value: "I hate testing."
    }
  }
}

*************************** 4. row ***************************
trx: transaction_context {
  server_id: 1
  transaction_id: 9
  start_timestamp: 1257888470209443
  end_timestamp: 1257888470215165
}
statement {
  type: UPDATE
  start_timestamp: 1257888470209446
  end_timestamp: 1257888470215163
  update_header {
    table_metadata {
      schema_name: "test"
      table_name: "t1"
    }
    key_field_metadata {
      type: INTEGER
      name: "id"
    }
    set_field_metadata {
      type: VARCHAR
      name: "padding"
    }
  }
  update_data {
    segment_id: 1
    end_segment: true
    record {
      key_value: "2"
      key_value: "I love it when a plan comes together"
      after_value: "I love it when a plan comes together"
    }
  }
}

*************************** 5. row ***************************
trx: transaction_context {
  server_id: 1
  transaction_id: 9
  start_timestamp: 1257888473929111
  end_timestamp: 1257888473929116
}
statement {
  type: RAW_SQL
  start_timestamp: 1257888473929113
  end_timestamp: 1257888473929115
  sql: "DROP TABLE `t1`"
}

5 rows in set (0.06 sec)
</pre>
<p>FYI, if you look closely, you&#8217;ll see some odd things — namely that there is a transaction with an ID of zero.  I&#8217;m aware of this and am working on fixing it <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   Like I said, I&#8217;m <em>almost</em> done coding&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2009/11/sneak-peek-drizzle-transaction-log-and-information-schema/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Great Escape</title>
		<link>http://www.joinfu.com/2009/11/the-great-escape/</link>
		<comments>http://www.joinfu.com/2009/11/the-great-escape/#comments</comments>
		<pubDate>Wed, 04 Nov 2009 14:11:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2009/11/the-great-escape</guid>
		<description><![CDATA[This week, I am working on putting together test cases which validate the Drizzle transaction log&#8217;s handling of BLOB columns. I ran into an interesting set of problems and am wondering how to go about handling them. Perhaps the LazyWeb will have some solutions. The problem, in short, is inconsistency in the way that the [...]]]></description>
			<content:encoded><![CDATA[<p>
This week, I am working on putting together test cases which validate the <a href="http://www.joinfu.com/index.php?/archives/299-Drizzle-Replication-The-Transaction-Log.html"  title="Drizzle transaction log">Drizzle transaction log</a>&#8217;s handling of BLOB columns.
</p>
<p>
I ran into an interesting set of problems and am wondering how to go about handling them.  Perhaps the LazyWeb will have some solutions. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<p>
The problem, in short, is inconsistency in the way that the NUL character is escaped (or not escaped) in both the MySQL/Drizzle protocol and the MySQL/Drizzle client tools.  And, by client tools, I mean not only everyone&#8217;s favourite little mysql command-line client, but also the mysqltest client, which provides infrastructure and runtime services for the MySQL and Drizzle test suites.
</p>
<p>
Even within the server and client protocol, there appears to be some inconsistency in how and when things are escaped.  Take a look at this interesting output from the drizzle client program (FYI, output is identical for mysql client, I checked&#8230;)
</p>
<pre>
drizzle> select 'test\0me';
+---------+
| test    |
+---------+
| test me |
+---------+
1 row in set (0 sec)
</pre>
<p>
You&#8217;ll notice that in the first <tt>SELECT</tt> statement, the column header is cut off &mdash; i.e. the column header is not escaping the <tt>\0</tt> NUL character in the string <tt>'test\0me'</tt>.  However, the result data <strong><em>does not</em></strong> truncate the string but <em>replaces</em> the NUL character with a space character.  So, I came to the conclusion that the drizzle client does not escape column headers but does do some sort of escaping for the result data. Given this conclusion, you will understand my raised eyebrow when the following <tt>SELECT</tt> statement was displayed:
</p>
<pre>
drizzle> select 'test\0me' = 'test me';
+------------------------+
| 'test\0me' = 'test me' |
+------------------------+
|                      0 |
+------------------------+
1 row in set (0 sec)
</pre>
<p>
Hmmm&#8230;so maybe column headers <em>are</em> being escaped by the MySQL/Drizzle client?  Clearly, the NUL character was escaped as the characters &#8216;\\&#8217; followed by the character &#8216;0&#8217; in the column header above.  Indeed, quite puzzling.
</p>
<p>
OK, so the above anomaly needs to be investigated.  However, a similar issue exists for the mysqltest/drizzletest client program.  To see the problem, check the following out.  I create a simple test case with the following in it:
</p>
<pre>
--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings

SELECT 'test\0me';

CREATE TABLE t1 (fld BLOB NULL);
INSERT INTO t1 VALUES ('test\0me');
SELECT COUNT(*) FROM t1;
DROP TABLE t1;
</pre>
<p>
Now, what you <strong>would expect to see</strong> for the output of the above &mdash; at least if you expect results similar to the MySQL/Drizzle client output &mdash; is the following:
</p>
<pre>
DROP TABLE IF EXISTS t1;
SELECT 'test\0me';
test
test me
CREATE TABLE t1 (fld BLOB NULL);
INSERT INTO t1 VALUES ('test\0me');
SELECT COUNT(*) FROM t1;
COUNT(*)
1
DROP TABLE t1;
</pre>
<p>
That is what you would <em>expect</em> to see in the output of course&#8230; Here is what you <em>actually</em> get in the output:
</p>
<pre>
DROP TABLE IF EXISTS t1;
SELECT 'test\0me';
test
test
</pre>
<p>
So, the mysqltest/drizzletest client apparently does not escape the NUL character for the <strong>result data</strong> at all.  It looks like it does do some escaping/replacing for the NUL character in the column header, though, otherwise the second &#8220;test&#8221; line would not appear.  This leads to the result file being essentially truncated as soon as a NUL character is included in any output to the mysqltest/drizzletest client.  This essentially makes the mysqltest/drizzletest client useless for testing and validating BLOB data.
</p>
<h2>Possible Solutions?</h2>
<p>
I think the cleanest solution would be to create a shared library of code that would be responsible for uniformly and consistently escaping data, then link the various clients (and server) against this library and remove all of the various escaping functions currently in the server.  This would, of course, take some time, but would be the most future-proof solution.  Anyone else have ideas on solving the problem of being able to test and validate binary data via the test suite?  Cheers!
</p>
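<p>As a starting point for discussion, here is a minimal sketch of what one function pair in such a shared library might look like.  The names <tt>escapeBinary()</tt> and <tt>unescapeBinary()</tt> and the exact escaping rule (NUL becomes a backslash followed by 0, a backslash is doubled) are my own assumptions for illustration; they are not existing MySQL or Drizzle functions.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shared helper: one escaping rule used by every client and the
// server.  The output never contains a raw NUL byte, so it survives code
// paths that treat data as C strings.
std::string escapeBinary(const std::string &raw) {
  std::string out;
  out.reserve(raw.size());
  for (char c : raw) {
    if (c == '\0') {
      out += "\\0";   // NUL becomes the two characters '\' and '0'
    } else if (c == '\\') {
      out += "\\\\";  // a literal backslash is doubled so decoding is unambiguous
    } else {
      out += c;
    }
  }
  return out;
}

// Inverse of escapeBinary(); expects input produced by that function.
std::string unescapeBinary(const std::string &escaped) {
  std::string out;
  out.reserve(escaped.size());
  for (std::size_t i = 0; i < escaped.size(); ++i) {
    if (escaped[i] == '\\') {
      assert(i + 1 < escaped.size());  // well-formed input never ends mid-escape
      ++i;
      out += (escaped[i] == '0') ? '\0' : escaped[i];
    } else {
      out += escaped[i];
    }
  }
  return out;
}
```

<p>With a rule like this, the seven-byte value from the test case above would always be rendered as the eight visible characters <tt>test\0me</tt>, in column headers and result data alike, and a result file could be decoded back to the original bytes.</p>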
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2009/11/the-great-escape/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Drizzle Replication &#8211; The Transaction Log</title>
		<link>http://www.joinfu.com/2009/10/drizzle-replication-the-transaction-log/</link>
		<comments>http://www.joinfu.com/2009/10/drizzle-replication-the-transaction-log/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 16:08:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2009/10/drizzle-replication-the-transaction-log</guid>
		<description><![CDATA[In this installment of my Drizzle Replication blog series, I&#8217;ll be talking about the Transaction Log. Before reading this entry, you may want to first read up on the Transaction Message, which is a central concept to this blog entry. The transaction log is just one component of Drizzle&#8217;s default replication services, but it also [...]]]></description>
			<content:encoded><![CDATA[<p>In this installment of my Drizzle Replication blog series, I&#8217;ll be talking about <strong>the Transaction Log</strong>.  Before reading this entry, you may want to first read up on the <a title="Drizzle Replication - The Transaction Message" href="http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/">Transaction Message</a>, which is a central concept to this blog entry.</p>
<p>The transaction log is just one component of Drizzle&#8217;s default replication services, but it also serves as a generalized log of atomic data changes to a particular server.  In this way, it is only partially related to replication.  The transaction log is used by components of the replication services to store changes made to a server&#8217;s data.  However, nothing mandates that Drizzle replication systems use this particular transaction log.  For instance, Eric Lambert is currently working on a Gearman-based replication service which, while following the same APIs, does not require the transaction log to function.  Furthermore, other, non-replication-related modules may use the transaction log themselves.  For instance, a future Recovery and/or Backup module may just as easily use the transaction log for its own purposes.</p>
<p>Before we get into the details, it&#8217;s worth noting the general goals we&#8217;ve had for the transaction log, as these goals may help explain some of the design choices made.  In short, the goals for the transaction log are:</p>
<ul>
<li>Introduce <strong>no global contention points (mutexes/locks)</strong></li>
<li>Once written, the transaction log <strong>may not be modified</strong></li>
<li>The transaction log should be <strong>easily readable in multiple programming languages</strong></li>
</ul>
<h2>Overview of the Transaction Log Structure</h2>
<p><img style="float: left; margin: 0px 100px 0px 0px;" src="http://joinfu.com/img/transaction_log_format.png" alt="" /><br />
<img style="float: left; margin: 0px 100px 0px 0px;" src="http://joinfu.com/img/transaction_log_entry_format.png" alt="" /><br />
<img style="float: left; margin: 0px 100px 20px 0px;" src="http://joinfu.com/img/transaction_message_entry_format.png" alt="" /><br />
The format of the transaction log is simple and straightforward.  It is a single file that contains log entries, one after another.  These log entries have a type associated with them.  Currently, there are only two types of entries that can go in the transaction log: a Transaction message entry and a BLOB entry.  We will only cover the Transaction message entry in this article, as I&#8217;ll leave how to deal with BLOBs for a separate article entirely.</p>
<p>Each entry in the transaction log is preceded by 4 bytes containing an integer code identifying the type of entry to follow.  The bytes which follow this type header are interpreted based on the type of entry.  For entries of type Transaction message, the graphics here show the layout of the entry in the log.  First, a 4-byte length header is written, then the serialized Transaction message, then a 4-byte checksum of the serialized Transaction message.</p>
<p><br style="clear: both;" /></p>
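<p>To make the layout concrete, here is a minimal sketch (not the Drizzle code itself) that packs a single entry into a byte vector: type code, length, payload, then checksum.  The fixed-width fields are written little-endian, which is what the <tt>apply()</tt> method does.  The names <tt>put_le32</tt>, <tt>get_le32</tt>, <tt>encode_entry</tt> and the type code value <tt>kTransactionType</tt> are hypothetical, and the checksum is simply handed in by the caller rather than computed.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical type code for illustration only; the real value is
// defined by ReplicationServices in the Drizzle source.
static const uint32_t kTransactionType = 1;

// Append a 32-bit value to the buffer in little-endian byte order.
static void put_le32(std::vector<uint8_t> &out, uint32_t value)
{
  for (int shift = 0; shift < 32; shift += 8)
    out.push_back(static_cast<uint8_t>((value >> shift) & 0xFF));
}

// Read a 32-bit little-endian value starting at data[pos].
static uint32_t get_le32(const std::vector<uint8_t> &data, size_t pos)
{
  assert(pos + 4 <= data.size());
  return static_cast<uint32_t>(data[pos]) |
         (static_cast<uint32_t>(data[pos + 1]) << 8) |
         (static_cast<uint32_t>(data[pos + 2]) << 16) |
         (static_cast<uint32_t>(data[pos + 3]) << 24);
}

// Build one log entry:
//   4 bytes type | 4 bytes length | N bytes payload | 4 bytes checksum
// The payload stands in for the serialized Transaction message.
static std::vector<uint8_t> encode_entry(uint32_t type,
                                         const std::string &payload,
                                         uint32_t checksum)
{
  std::vector<uint8_t> entry;
  entry.reserve(12 + payload.size());
  put_le32(entry, type);
  put_le32(entry, static_cast<uint32_t>(payload.size()));
  entry.insert(entry.end(), payload.begin(), payload.end());
  put_le32(entry, checksum);
  return entry;
}
```

<p>With a 3-byte payload, <tt>encode_entry()</tt> produces a 15-byte entry: the type code sits at offset 0, the length at offset 4, the payload at offset 8 and the checksum at offset 11.</p>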
<h2>Details of the <tt>TransactionLog::apply()</tt> Method</h2>
<p>For those interested in how the transaction log is written to, I&#8217;m going to detail the <tt>apply()</tt> method of the <tt>TransactionLog</tt> class in <tt>/plugin/transaction_log/transaction_log.cc</tt>.  The <tt>TransactionLog</tt> class is simply a subclass of <tt>plugin::TransactionApplier</tt> and therefore must implement the single pure virtual <tt>apply</tt> method of that class interface.</p>
<p>The TransactionLog class has a private <tt>drizzled::atomic&lt;off_t&gt;</tt> called <tt>log_offset</tt> which is an offset into the transaction log file that is incremented with each atomic write to the log file.  You will notice in the code below that this atomic off_t is stored locally, then incremented by the total length of the log entry to be written.  A buffer is then written to the log file using <tt><a title="pwrite POSIX call" href="http://opengroup.org/onlinepubs/007908799/xsh/pwrite.html">pwrite()</a></tt> at the original offset.  In this way, we completely avoid calling <tt><a title="pthread_mutex_lock POSIX call" href="http://opengroup.org/onlinepubs/007908775/xsh/pthread_mutex_lock.html">pthread_mutex_lock()</a></tt> or similar when writing to the log file, which should increase scalability of the transaction log.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">void</span> TransactionLog<span style="color: #008080;">::</span><span style="color: #007788;">apply</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> message<span style="color: #008080;">::</span><span style="color: #007788;">Transaction</span> <span style="color: #000040;">&amp;</span>to_apply<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
  <span style="color: #0000ff;">uint8_t</span> <span style="color: #000040;">*</span>buffer<span style="color: #008080;">;</span> <span style="color: #ff0000; font-style: italic;">/* Buffer we will write serialized header, message and trailing checksum to */</span>
  <span style="color: #0000ff;">uint8_t</span> <span style="color: #000040;">*</span>orig_buffer<span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #0000ff;">int</span> error_code<span style="color: #008080;">;</span>
  <span style="color: #0000ff;">size_t</span> message_byte_length<span style="color: #000080;">=</span> to_apply.<span style="color: #007788;">ByteSize</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  ssize_t written<span style="color: #008080;">;</span>
  off_t cur_offset<span style="color: #008080;">;</span>
  <span style="color: #0000ff;">size_t</span> total_envelope_length<span style="color: #000080;">=</span> HEADER_TRAILER_BYTES <span style="color: #000040;">+</span> message_byte_length<span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/*
   * Attempt allocation of raw memory buffer for the header,
   * message and trailing checksum bytes.
   */</span>
  buffer<span style="color: #000080;">=</span> <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint8_t</span> <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #0000dd;">malloc</span><span style="color: #008000;">&#40;</span>total_envelope_length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>buffer <span style="color: #000080;">==</span> <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    errmsg_printf<span style="color: #008000;">&#40;</span>ERRMSG_LVL_ERROR,
      _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Failed to allocate enough memory to buffer header, transaction message, &quot;</span>
        <span style="color: #FF0000;">&quot;and trailing checksum bytes. Tried to allocate %&quot;</span> PRId64
        <span style="color: #FF0000;">&quot; bytes.  Error: %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>,
    <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint64_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>total_envelope_length<span style="color: #008000;">&#41;</span>,
    <span style="color: #0000dd;">strerror</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">errno</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    state<span style="color: #000080;">=</span> CRASHED<span style="color: #008080;">;</span>
    deactivate<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">return</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
  <span style="color: #0000ff;">else</span>
    orig_buffer<span style="color: #000080;">=</span> buffer<span style="color: #008080;">;</span> <span style="color: #ff0000; font-style: italic;">/* We will free() orig_buffer, as buffer is moved during write */</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/*
   * Do an atomic increment on the offset of the log file position
   */</span>
  cur_offset<span style="color: #000080;">=</span> log_offset.<span style="color: #007788;">fetch_and_add</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span>off_t<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>total_envelope_length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/*
   * We adjust cur_offset back to the original log_offset before
   * the increment above…
   */</span>
  cur_offset<span style="color: #000040;">-</span><span style="color: #000080;">=</span> <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span>off_t<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span>total_envelope_length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/*
   * Write the header information, which is the message type and
   * the length of the transaction message into the buffer
   */</span>
  buffer<span style="color: #000080;">=</span> protobuf<span style="color: #008080;">::</span><span style="color: #007788;">io</span><span style="color: #008080;">::</span><span style="color: #007788;">CodedOutputStream</span><span style="color: #008080;">::</span><span style="color: #007788;">WriteLittleEndian32ToArray</span><span style="color: #008000;">&#40;</span>
    <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint32_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>ReplicationServices<span style="color: #008080;">::</span><span style="color: #007788;">TRANSACTION</span><span style="color: #008000;">&#41;</span>, buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  buffer<span style="color: #000080;">=</span> protobuf<span style="color: #008080;">::</span><span style="color: #007788;">io</span><span style="color: #008080;">::</span><span style="color: #007788;">CodedOutputStream</span><span style="color: #008080;">::</span><span style="color: #007788;">WriteLittleEndian32ToArray</span><span style="color: #008000;">&#40;</span>
    <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint32_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>message_byte_length<span style="color: #008000;">&#41;</span>, buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/*
   * Now write the serialized transaction message, followed
   * by the optional checksum into the buffer.
   */</span>
  buffer<span style="color: #000080;">=</span> to_apply.<span style="color: #007788;">SerializeWithCachedSizesToArray</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">uint32_t</span> checksum<span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>do_checksum<span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    checksum<span style="color: #000080;">=</span> drizzled<span style="color: #008080;">::</span><span style="color: #007788;">hash</span><span style="color: #008080;">::</span><span style="color: #007788;">crc32</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">reinterpret_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span> <span style="color: #000040;">-</span>
                     message_byte_length, message_byte_length<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/* We always write the checksum in little-endian byte order */</span>
  buffer<span style="color: #000080;">=</span> protobuf<span style="color: #008080;">::</span><span style="color: #007788;">io</span><span style="color: #008080;">::</span><span style="color: #007788;">CodedOutputStream</span><span style="color: #008080;">::</span><span style="color: #007788;">WriteLittleEndian32ToArray</span><span style="color: #008000;">&#40;</span>checksum, buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #ff0000; font-style: italic;">/*
   * Quick safety…if an error occurs above in another writer, the log
   * file will be in a crashed state.
   */</span>
  <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>unlikely<span style="color: #008000;">&#40;</span>state <span style="color: #000080;">==</span> CRASHED<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    <span style="color: #ff0000; font-style: italic;">/*
     * Reset the log’s offset in case we want to produce a decent error message including
     * the original offset where an error occurred.
     */</span>
    log_offset<span style="color: #000080;">=</span> cur_offset<span style="color: #008080;">;</span>
    <span style="color: #0000dd;">free</span><span style="color: #008000;">&#40;</span>orig_buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">return</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/* Write the full buffer in one swoop */</span>
  <span style="color: #0000ff;">do</span>
  <span style="color: #008000;">&#123;</span>
    written<span style="color: #000080;">=</span> pwrite<span style="color: #008000;">&#40;</span>log_file, orig_buffer, total_envelope_length, cur_offset<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
  <span style="color: #0000ff;">while</span> <span style="color: #008000;">&#40;</span>written <span style="color: #000080;">==</span> <span style="color: #000040;">-</span><span style="color: #0000dd;">1</span> <span style="color: #000040;">&amp;&amp;</span> <span style="color: #0000ff;">errno</span> <span style="color: #000080;">==</span> EINTR<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #ff0000; font-style: italic;">/* Just retry the write when interrupted by a signal… */</span>
&nbsp;
  <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>unlikely<span style="color: #008000;">&#40;</span>written <span style="color: #000040;">!</span><span style="color: #000080;">=</span> <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span>ssize_t<span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>total_envelope_length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    errmsg_printf<span style="color: #008000;">&#40;</span>ERRMSG_LVL_ERROR,
     _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Failed to write full size of transaction.  Tried to write %&quot;</span> PRId64
        <span style="color: #FF0000;">&quot; bytes at offset %&quot;</span> PRId64 <span style="color: #FF0000;">&quot;, but only wrote %&quot;</span> PRId64 <span style="color: #FF0000;">&quot; bytes.  Error: %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>,
        <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint64_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>total_envelope_length<span style="color: #008000;">&#41;</span>,
        <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint64_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>cur_offset<span style="color: #008000;">&#41;</span>,
        <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint64_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>written<span style="color: #008000;">&#41;</span>,
        <span style="color: #0000dd;">strerror</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">errno</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
      state<span style="color: #000080;">=</span> CRASHED<span style="color: #008080;">;</span>
      <span style="color: #ff0000; font-style: italic;">/*
       * Reset the log’s offset in case we want to produce a decent error message including
       * the original offset where an error occurred.
       */</span>
      log_offset<span style="color: #000080;">=</span> cur_offset<span style="color: #008080;">;</span>
      deactivate<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
  <span style="color: #0000dd;">free</span><span style="color: #008000;">&#40;</span>orig_buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  error_code<span style="color: #000080;">=</span> my_sync<span style="color: #008000;">&#40;</span>log_file, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>unlikely<span style="color: #008000;">&#40;</span>error_code <span style="color: #000040;">!</span><span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    errmsg_printf<span style="color: #008000;">&#40;</span>ERRMSG_LVL_ERROR,
      _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Failed to sync log file. Got error: %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>,
      <span style="color: #0000dd;">strerror</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">errno</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>
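<p>The reserve-then-write idea can also be sketched outside of Drizzle using <tt>std::atomic</tt> and <tt>pwrite()</tt>.  This is only an illustration under a few assumptions: <tt>OffsetReserver</tt> and <tt>write_at</tt> are hypothetical names, not part of the Drizzle source.  <tt>std::atomic::fetch_add</tt> returns the value held <em>before</em> the addition, so no subtraction is needed afterwards, unlike the listing above.  Also, where <tt>apply()</tt> treats a short write as a failure, this sketch keeps writing until every byte is on its way to the file.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for the log's shared write position.
class OffsetReserver
{
public:
  explicit OffsetReserver(off_t start) : next_offset(start) {}

  // Atomically claim `length` bytes and return the start of the
  // claimed region. Each caller gets a disjoint range without
  // taking a mutex.
  off_t reserve(size_t length)
  {
    return next_offset.fetch_add(static_cast<off_t>(length));
  }

  // Offset at which the next reservation would begin.
  off_t current() const { return next_offset.load(); }

private:
  std::atomic<off_t> next_offset;
};

// Write the whole buffer at a fixed file offset, retrying when
// interrupted by a signal and continuing after short writes.
// Returns true once every byte has been written.
static bool write_at(int fd, const void *data, size_t length, off_t offset)
{
  assert(data != nullptr);
  const char *cursor = static_cast<const char *>(data);
  while (length > 0)
  {
    ssize_t written = pwrite(fd, cursor, length, offset);
    if (written == -1)
    {
      if (errno == EINTR)
        continue; // interrupted before writing anything: retry
      return false;
    }
    if (written == 0)
      return false; // no progress; give up rather than spin
    cursor += written;
    offset += written;
    length -= static_cast<size_t>(written);
  }
  return true;
}
```

<p>A writer would call <tt>reserve()</tt> with the full envelope length, then pass the returned offset to <tt>write_at()</tt>.  Because each writer owns its own byte range, concurrent writers never overlap, although a failed write does leave a hole that the log has to account for.</p>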

<h2>Reading the Transaction Log</h2>
<p>OK, so the above code shows how the transaction log is written.  What about reading the log file?  Well, it&#8217;s pretty simple.  There is an example program in <tt>/drizzle/message/transaction_reader.cc</tt> which has code showing how to do this.  Here&#8217;s a snippet from that program:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> argc, <span style="color: #0000ff;">char</span><span style="color: #000040;">*</span> argv<span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
  …
  message<span style="color: #008080;">::</span><span style="color: #007788;">Transaction</span> transaction<span style="color: #008080;">;</span>
&nbsp;
  file<span style="color: #000080;">=</span> open<span style="color: #008000;">&#40;</span>argv<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span>, O_RDONLY<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>file <span style="color: #000080;">==</span> <span style="color: #000040;">-</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
    <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Cannot open file: %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>, argv<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">return</span> <span style="color: #000040;">-</span><span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
      …
  protobuf<span style="color: #008080;">::</span><span style="color: #007788;">io</span><span style="color: #008080;">::</span><span style="color: #007788;">ZeroCopyInputStream</span> <span style="color: #000040;">*</span>raw_input<span style="color: #000080;">=</span>
    <span style="color: #0000dd;">new</span> protobuf<span style="color: #008080;">::</span><span style="color: #007788;">io</span><span style="color: #008080;">::</span><span style="color: #007788;">FileInputStream</span><span style="color: #008000;">&#40;</span>file<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  protobuf<span style="color: #008080;">::</span><span style="color: #007788;">io</span><span style="color: #008080;">::</span><span style="color: #007788;">CodedInputStream</span> <span style="color: #000040;">*</span>coded_input<span style="color: #000080;">=</span>
    <span style="color: #0000dd;">new</span> protobuf<span style="color: #008080;">::</span><span style="color: #007788;">io</span><span style="color: #008080;">::</span><span style="color: #007788;">CodedInputStream</span><span style="color: #008000;">&#40;</span>raw_input<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>buffer<span style="color: #000080;">=</span> <span style="color: #0000ff;">NULL</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span>temp_buffer<span style="color: #000080;">=</span> <span style="color: #0000ff;">NULL</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">uint32_t</span> length<span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">uint32_t</span> previous_length<span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">uint32_t</span> checksum<span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">bool</span> result<span style="color: #000080;">=</span> <span style="color: #0000ff;">true</span><span style="color: #008080;">;</span>
  <span style="color: #0000ff;">uint32_t</span> message_type<span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
&nbsp;
  <span style="color: #ff0000; font-style: italic;">/* Read in the length of the command */</span>
  <span style="color: #0000ff;">while</span> <span style="color: #008000;">&#40;</span>result <span style="color: #000080;">==</span> <span style="color: #0000ff;">true</span> <span style="color: #000040;">&amp;&amp;</span>
           coded_input<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>ReadLittleEndian32<span style="color: #008000;">&#40;</span><span style="color: #000040;">&amp;</span>message_type<span style="color: #008000;">&#41;</span> <span style="color: #000080;">==</span> <span style="color: #0000ff;">true</span> <span style="color: #000040;">&amp;&amp;</span>
           coded_input<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>ReadLittleEndian32<span style="color: #008000;">&#40;</span><span style="color: #000040;">&amp;</span>length<span style="color: #008000;">&#41;</span> <span style="color: #000080;">==</span> <span style="color: #0000ff;">true</span><span style="color: #008000;">&#41;</span>
  <span style="color: #008000;">&#123;</span>
      <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>message_type <span style="color: #000040;">!</span><span style="color: #000080;">=</span> ReplicationServices<span style="color: #008080;">::</span><span style="color: #007788;">TRANSACTION</span><span style="color: #008000;">&#41;</span>
      <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Found a non-transaction message &quot;</span>
                            <span style="color: #FF0000;">&quot;in log.  Currently, not supported.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000dd;">exit</span><span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
      <span style="color: #008000;">&#125;</span>
&nbsp;
      <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>length <span style="color: #000080;">&gt;</span> <span style="color: #0000ff;">INT_MAX</span><span style="color: #008000;">&#41;</span>
      <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Attempted to read record bigger than INT_MAX<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000dd;">exit</span><span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
      <span style="color: #008000;">&#125;</span>
&nbsp;
      <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>buffer <span style="color: #000080;">==</span> <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span>
      <span style="color: #008000;">&#123;</span>
        temp_buffer<span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span> <span style="color: #0000dd;">malloc</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">size_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
      <span style="color: #008000;">&#125;</span>
      <span style="color: #ff0000; font-style: italic;">/* No need to allocate if we have a buffer big enough… */</span>
      <span style="color: #0000ff;">else</span> <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>length <span style="color: #000080;">&gt;</span> previous_length<span style="color: #008000;">&#41;</span>
      <span style="color: #008000;">&#123;</span>
        temp_buffer<span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">char</span> <span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span> <span style="color: #0000dd;">realloc</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">size_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
      <span style="color: #008000;">&#125;</span>
&nbsp;
      <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>temp_buffer <span style="color: #000080;">==</span> <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span>
      <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Memory allocation failure trying to &quot;</span>
                            <span style="color: #FF0000;">&quot;allocate %&quot;</span> PRIu64 <span style="color: #FF0000;">&quot; bytes.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>,
                 <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">uint64_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000ff;">break</span><span style="color: #008080;">;</span>
      <span style="color: #008000;">&#125;</span>
      <span style="color: #0000ff;">else</span>
        buffer<span style="color: #000080;">=</span> temp_buffer<span style="color: #008080;">;</span>
&nbsp;
      <span style="color: #ff0000; font-style: italic;">/* Read the Command */</span>
      result<span style="color: #000080;">=</span> coded_input<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>ReadRaw<span style="color: #008000;">&#40;</span>buffer, <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span><span style="color: #008000;">&#41;</span> length<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
      <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>result <span style="color: #000080;">==</span> <span style="color: #0000ff;">false</span><span style="color: #008000;">&#41;</span>
      <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Could not read transaction message.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;GPB ERROR: %s.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>, <span style="color: #0000dd;">strerror</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">errno</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Raw buffer read: %s.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>, buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000ff;">break</span><span style="color: #008080;">;</span>
      <span style="color: #008000;">&#125;</span>
&nbsp;
      result<span style="color: #000080;">=</span> transaction.<span style="color: #007788;">ParseFromArray</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">int32_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
      <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>result <span style="color: #000080;">==</span> <span style="color: #0000ff;">false</span><span style="color: #008000;">&#41;</span>
      <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span>&quot;Unable to parse command. <span style="color: #007788;">Got</span> error<span style="color: #008080;">:</span> <span style="color: #000040;">%</span>s.\n&quot;<span style="color: #008000;">&#41;</span>,
                 transaction.<span style="color: #007788;">InitializationErrorString</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">c_str</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>buffer <span style="color: #000040;">!</span><span style="color: #000080;">=</span> <span style="color: #0000ff;">NULL</span><span style="color: #008000;">&#41;</span>
          <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span>&quot;BUFFER<span style="color: #008080;">:</span> <span style="color: #000040;">%</span>s\n&quot;<span style="color: #008000;">&#41;</span>, buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000ff;">break</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
    <span style="color: #ff0000; font-style: italic;">/* Print the transaction */</span>
    printTransaction<span style="color: #008000;">&#40;</span>transaction<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #ff0000; font-style: italic;">/* Skip 4 byte checksum */</span>
    coded_input<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>ReadLittleEndian32<span style="color: #008000;">&#40;</span><span style="color: #000040;">&amp;</span>checksum<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>do_checksum<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
      <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>checksum <span style="color: #000040;">!</span><span style="color: #000080;">=</span> drizzled<span style="color: #008080;">::</span><span style="color: #007788;">hash</span><span style="color: #008080;">::</span><span style="color: #007788;">crc32</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">size_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
      <span style="color: #008000;">&#123;</span>
        <span style="color: #0000dd;">fprintf</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">stderr</span>, _<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Checksum failed. Wanted %&quot;</span> PRIu32
                              <span style="color: #FF0000;">&quot; got %&quot;</span> PRIu32 <span style="color: #FF0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>,
                 checksum,
                 drizzled<span style="color: #008080;">::</span><span style="color: #007788;">hash</span><span style="color: #008080;">::</span><span style="color: #007788;">crc32</span><span style="color: #008000;">&#40;</span>buffer, <span style="color: #0000ff;">static_cast</span><span style="color: #000080;">&lt;</span><span style="color: #0000ff;">size_t</span><span style="color: #000080;">&gt;</span><span style="color: #008000;">&#40;</span>length<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
      <span style="color: #008000;">&#125;</span>
    <span style="color: #008000;">&#125;</span>
    previous_length<span style="color: #000080;">=</span> length<span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
&nbsp;
  <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span>
    <span style="color: #0000dd;">free</span><span style="color: #008000;">&#40;</span>buffer<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #0000dd;">delete</span> coded_input<span style="color: #008080;">;</span>
  <span style="color: #0000dd;">delete</span> raw_input<span style="color: #008080;">;</span>
  <span style="color: #0000ff;">return</span> <span style="color: #008000;">&#40;</span>result <span style="color: #000080;">==</span> <span style="color: #0000ff;">true</span> <span style="color: #008080;">?</span> <span style="color: #0000dd;">0</span> <span style="color: #008080;">:</span> <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<h2>Shortcomings of the Transaction Log</h2>
<p>So far, we&#8217;ve generally focused on a scalable design for the transaction log and have not spent too much time on performance tuning the code — and yes, performance != scalability.  There are a number of problems with the current code which we will address in future versions of the transaction log.  Namely:</p>
<ul>
<li>Reduce calls to <tt>malloc()</tt>.  Currently, each write of a transaction message to the log file incurs a call to <tt>malloc()</tt> to allocate enough memory to store the serialized log entry.  Clearly, this is not optimal.  We&#8217;ve considered a number of alternative approaches to calling <tt>malloc()</tt>, including a scoreboard approach where a vector of memory slabs is used in a round-robin fashion.  This would introduce some locking, however.  Also, I&#8217;ve thought about using a hazard pointer list on the Session object so that previously-allocated memory on the Session object could be reused for something like this.  But these ideas must be hashed out further.</li>
<li>There is <strong>no index into the transaction log</strong>.  This is not a problem for writing the transaction log, of course, but for readers of the transaction log.  I&#8217;m in the process of creating classes and a library for building indexes for a transaction log and, in addition, creating archived snapshots to enable log shipping for Drizzle replication.  I&#8217;ll be pushing code for this to Launchpad later this week and will write a new article about log shipping and snapshot creation.</li>
<li>Each call to <tt>apply()</tt> calls <tt>fdatasync()/fsync()</tt> on the transaction log.  Certain environments may consider this to be too strict a sync requirement, since the storage engine may already keep a transaction log file of its own that is also synced. For instance, InnoDB has a transaction log that, depending on the setting of InnoDB configuration variables, may call fdatasync() upon every transaction commit.  It would be best to have the syncing behaviour be user-adjustable — for instance, a setting to allow the transaction log to be synced every X number of seconds&#8230;</li>
</ul>
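<p>To make the last point a little more concrete, here is a minimal sketch of what a user-adjustable sync setting might look like.  Everything in it is an assumption made for illustration: the <tt>SyncPolicy</tt> class, its <tt>shouldSync()</tt> method and the interval parameter do not exist in Drizzle.  The caller passes in the current time, and the policy only answers the question of whether this write should be followed by a sync.</p>

```cpp
#include <cstdint>

/* Hypothetical helper, not part of Drizzle: decides whether a write to the
   transaction log should be followed by fdatasync()/fsync(). */
class SyncPolicy
{
public:
  /* An interval of 0 means "sync after every write", which is what the
     current code does. */
  explicit SyncPolicy(uint64_t interval_seconds) :
    interval_(interval_seconds),
    last_sync_(0),
    has_synced_(false)
  {}

  /* now_seconds is supplied by the caller (for example from time(NULL)),
     which keeps the policy free of any clock dependency. */
  bool shouldSync(uint64_t now_seconds)
  {
    if (interval_ == 0 || ! has_synced_ || now_seconds - last_sync_ >= interval_)
    {
      last_sync_= now_seconds;
      has_synced_= true;
      return true;
    }
    return false;
  }

private:
  uint64_t interval_;   /* minimum number of seconds between syncs */
  uint64_t last_sync_;  /* time of the most recent sync */
  bool has_synced_;     /* false until the first sync has happened */
};
```

<p>With something like this, <tt>apply()</tt> could ask the policy before syncing instead of syncing unconditionally.  The trade-off is the usual one: a longer interval means fewer syncs, but a larger window of log entries that could be lost in a crash.</p>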
<h2>Summary and Request for Comments</h2>
<p>That&#8217;s it for the discussion about the transaction log.  I&#8217;ll post some more code examples from the replication plugins which utilize the transaction log in a later blog entry.</p>
<p>What do you think of the design of the transaction log?  What would you change?  Comments are always welcome! Cheers. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2009/10/drizzle-replication-the-transaction-log/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Drizzle Replication &#8211; Changes in API to support Group Commit</title>
		<link>http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/</link>
		<comments>http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 14:00:00 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit</guid>
		<description><![CDATA[Hi all. It&#8217;s been quite some time since my last article on the new replication system in Drizzle. My apologies for the delay in publishing the next article in the replication series. The delay has been due to a reworking of the replication system to fully support &#8220;group commit&#8221; behaviour and to support fully transactional [...]]]></description>
			<content:encoded><![CDATA[<p>
Hi all.  It&#8217;s been quite some time since <a href="http://www.joinfu.com/index.php?/archives/300-Drizzle-Replication-The-CommandReplicator-and-CommandApplier-Plugin-API.html"  title="Drizzle replication - Part II - CommandReplicator and CommandApplier">my last article</a> on the new replication system in <a href="http://drizzle.org"  title="Drizzle">Drizzle</a>.  My apologies for the delay in publishing the next article in the replication series.
</p>
<p>
The delay has been due to a reworking of the replication system to fully support &#8220;group commit&#8221; behaviour and to support fully transactional replication.  The changes allow replicator and applier plugins to understand much more about the actual changes which occurred on the server, and to understand the transactional container properly.
</p>
<p>
The goals of <a href="http://drizzle.org"  title="Drizzle">Drizzle</a>&#8216;s replication system are as follows:
</p>
<ul>
<li>Make replication <strong>modular</strong> and not dependent on one particular implementation</li>
<li>Make it <strong>simple and fun to develop</strong> plugins for Drizzle replication</li>
<li><strong>Encapsulate</strong> all transmitted information in an efficient, portable, and standard format</li>
</ul>
<p>
This article serves to build on the last article and explain the changes to the Google Protobuffer message definitions used in the replication API.  The actual replication API described in the last article remains almost the same.  However, instead of being named CommandApplier and CommandReplicator, those plugin base classes are now named TransactionApplier and TransactionReplicator respectively.  And, instead of consuming a Command message, they consume Transaction messages.
</p>
<p>
<img src="http://joinfu.com/img/transaction_message.png" style="float: right; margin: 0px 0px 50px 50px;"/><br />
For my friend <a href="http://www.linkedin.com/pub/edwin-desouza/7/1a4/385"  title="Edwin DeSouza">Edwin</a>&#8216;s benefit, I&#8217;ll be including lots of pretty graphics. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   For my developer readers, I&#8217;m including lots of example C++ code to help you best understand how to read and manipulate the Transaction and Statement messages in the new replication system.
</p>
<h2>New Message Definitions</h2>
<p>
As I mentioned above, the Command message <a href="http://www.joinfu.com/index.php?/archives/298-Drizzle-Replication-The-Command-Message.html"  title="Drizzle Replication - Command Message">previously discussed in the first replication article</a> has been changed in favour of a more space-efficient and transactional message format.  The proto file is now called <tt>/drizzled/message/transaction.proto</tt>.  You can look at the <a href="http://bazaar.launchpad.net/~jaypipes/drizzle/replication-group-commit/annotate/head%3A/drizzled/message/transaction.proto"  title="transaction.proto">proto file online</a>.
</p>
<p>
The Command Message has become the Statement message, and a new Transaction message serves as a container for multiple Statement messages representing (for most cases) an <strong>atomic change in the state of the database server</strong>.  I&#8217;ll discuss later in the article those specific cases where a Transaction message&#8217;s contents may contain only a partial atomic change to the server.
</p>
<p>
The image to the right depicts the Transaction message container.  As you can see, the Transaction message contains two things: a TransactionContext message and an array of one or more Statement messages.
</p>
<h2>The TransactionContext Message</h2>
<p>
Each Transaction message contains a single TransactionContext message.  The TransactionContext message contains information about the entire transaction.  The data members of the TransactionContext are as follows:
</p>
<p><img src="http://joinfu.com/img/transaction_context_message.png" style="float: left; margin: 0px 50px 50px 0px;" /></p>
<ul>
<li><tt>server_id</tt> &#8211; (uint32_t) A numeric identifier for the server which executed this transaction</li>
<li><tt>transaction_id</tt> &#8211; (uint64_t) A globally-unique transaction identifier</li>
<li><tt>start_timestamp</tt> &#8211; (uint64_t) A nanosecond-precision timestamp of when the transaction began.</li>
<li><tt>end_timestamp</tt> &#8211; (uint64_t) A nanosecond-precision timestamp of when the transaction completed.</li>
</ul>
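<p>As a quick illustration of what the two timestamps are good for, here is a small sketch which computes how long a transaction took.  The <tt>ContextFields</tt> struct is a plain stand-in I made up so the snippet compiles on its own.  It mirrors the four data members listed above, but it is not the class generated from the proto file.</p>

```cpp
#include <cstdint>

/* Stand-in for the generated TransactionContext class: same four data
   members as described in the article, but as a plain struct. */
struct ContextFields
{
  uint32_t server_id;
  uint64_t transaction_id;
  uint64_t start_timestamp; /* nanoseconds */
  uint64_t end_timestamp;   /* nanoseconds */
};

/* Returns the elapsed time of the transaction in nanoseconds, or 0 if the
   timestamps are out of order (guards against unsigned wrap-around). */
uint64_t elapsedNanoseconds(const ContextFields &ctx)
{
  if (ctx.end_timestamp < ctx.start_timestamp)
    return 0;
  return ctx.end_timestamp - ctx.start_timestamp;
}
```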
<p>
Since TransactionContext is simply a <a href="http://code.google.com/apis/protocolbuffers/docs/overview.html"  title="Google Protobuffers">Google Protobuffer</a> <a href="http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.message.html"  title="Google Protobuffer Message class">message</a>, accessing data members is simple and straightforward.  If you&#8217;re writing a replicator or applier, a reference to a const Transaction message will be supplied to you via the standard API.  For instance, let&#8217;s assume we&#8217;re writing a replicator and we want to filter out all messages that come from the server with a server_id of 100.  Kind of a silly example, but nevertheless, it allows us to see some example code.
</p>
<p>
As you may remember, the API for a replicator is dirt simple.  There is a <tt>replicate()</tt> pure virtual method which accepts two parameters: a pointer to the Applier which will &#8220;apply&#8221; the message to some target, and the GPB message itself.  The new function signature is the same as the last one, with the term &#8220;Command&#8221; replaced with the term &#8220;Transaction&#8221;:
</p>
<div class="syntax">
<div class="cpp" style="font-family: monospace;">
<ol>
<li class="li1">
<div class="de1"><span class="kw2">virtual</span> <span class="kw4">void</span> replicate<span class="br0">&#40;</span>TransactionApplier *in_applier, </div>
</li>
<li class="li2">
<div class="de2">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;message::<span class="me2">Transaction</span> &amp;to_replicate<span class="br0">&#41;</span>= <span class="nu0">0</span>; </div>
</li>
</ol>
</div>
</div>
<p>
Suppose our replicator class is called MyReplicator.  Here is how to query the transaction context of the Transaction message and filter out transactions coming from server #100. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<div class="syntax">
<div class="cpp" style="font-family: monospace;">
<ol>
<li class="li1">
<div class="de1"><span class="kw4">void</span> MyReplicator::<span class="me2">replicate</span><span class="br0">&#40;</span>TransactionApplier *in_applier, </div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; message::<span class="me2">Transaction</span> &amp;to_replicate<span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; <span class="kw4">const</span> message::<span class="me2">TransactionContext</span> &amp;ctx= to_replicate.<span class="me1">transaction_context</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>ctx.<span class="me1">server_id</span><span class="br0">&#40;</span><span class="br0">&#41;</span> != <span class="nu0">100</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; in_applier-&gt;apply<span class="br0">&#40;</span>to_replicate<span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span> </div>
</li>
</ol>
</div>
</div>
<p>
See? Pretty darn simple. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   OK, on to the Statement message, which is slightly more complicated.
</p>
<h2>The Statement Message</h2>
<p>
As noted above, the Transaction message contains an array of Statement messages.  In Protobuffer terminology, the Transaction message contains a &#8220;repeated&#8221; Statement data member.  The Statement message is an envelope containing the following information:
</p>
<p><img src="http://joinfu.com/img/statement_message.png" style="float: right; margin: 0px 0px 50px 50px;"/></p>
<ul>
<li><tt>type</tt> &#8211; (enum Type) The type of Statement this message represents.  Currently, the possible values of the type are as follows:
<ul>
<li>ROLLBACK</li>
<li>INSERT</li>
<li>UPDATE</li>
<li>DELETE</li>
<li>TRUNCATE_TABLE</li>
<li>CREATE_SCHEMA</li>
<li>ALTER_SCHEMA</li>
<li>DROP_SCHEMA</li>
<li>CREATE_TABLE</li>
<li>ALTER_TABLE</li>
<li>DROP_TABLE</li>
<li>SET_VARIABLE</li>
<li>RAW_SQL</li>
</ul>
</li>
<li><tt>start_timestamp</tt> &#8211; (uint64_t) A nanosecond-precision timestamp of when the statement began.</li>
<li><tt>end_timestamp</tt> &#8211; (uint64_t) A nanosecond-precision timestamp of when the statement completed.</li>
<li><tt>sql</tt> &#8211; (string) Optionally stores the exact original SQL string producing this message.</li>
<li>For certain types of Statement messages, there will also be a specialized header and data message (see below).</li>
</ul>
<p>
To access the Statement messages in a Transaction, use something like the following code, which loops over the Transaction message&#8217;s vector of Statement messages:
</p>
<div class="syntax">
<div class="cpp" style="font-family: monospace;">
<ol>
<li class="li1">
<div class="de1"><span class="kw4">void</span> MyReplicator::<span class="me2">replicate</span><span class="br0">&#40;</span>TransactionApplier *in_applier, </div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; message::<span class="me2">Transaction</span> &amp;to_replicate<span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&#8230;</div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">/* Grab the number of statements in the Transaction message */</span></div>
</li>
<li class="li2">
<div class="de2"><span class="kw4">size_t</span> x;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw4">size_t</span> num_statements= to_replicate.<span class="me1">statement_size</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="coMULTI">/* Do something with each statement&#8230; */</span></div>
</li>
<li class="li2">
<div class="de2"><span class="kw1">for</span> <span class="br0">&#40;</span>x= <span class="nu0">0</span>; x &lt; num_statements; ++x<span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; <span class="kw4">const</span> message::<span class="me2">Statement</span> &amp;stmt= to_replicate.<span class="me1">statement</span><span class="br0">&#40;</span>x<span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="coMULTI">/* processStatement() does something with the statement&#8230; */</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; processStatement<span class="br0">&#40;</span>stmt<span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&#8230;</div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span> </div>
</li>
</ol>
</div>
</div>
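<p>If you want to play with this loop without building Drizzle, here is a self-contained sketch.  The <tt>message::Statement</tt> and <tt>message::Transaction</tt> classes below are hand-written stand-ins, not the generated ones.  They model only the accessors used here, and the <tt>add()</tt> and <tt>collectSql()</tt> helpers are made up for the example.  The index is an <tt>int</tt> because, for a repeated field, the C++ code generated by Protobuffers returns an <tt>int</tt> from the <tt>_size()</tt> accessor and takes an <tt>int</tt> index.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

/* Hand-written stand-ins for the protobuf-generated classes. */
namespace message
{
class Statement
{
public:
  Statement() : has_sql_(false) {}
  explicit Statement(const std::string &sql) : has_sql_(true), sql_(sql) {}
  /* The sql member is optional, so callers check has_sql() first. */
  bool has_sql() const { return has_sql_; }
  const std::string &sql() const { return sql_; }
private:
  bool has_sql_;
  std::string sql_;
};

class Transaction
{
public:
  /* Hypothetical helper used only to populate the stand-in. */
  void add(const Statement &stmt) { statements_.push_back(stmt); }
  int statement_size() const { return static_cast<int>(statements_.size()); }
  const Statement &statement(int index) const
  {
    return statements_[static_cast<std::size_t>(index)];
  }
private:
  std::vector<Statement> statements_;
};
}

/* Collects the original SQL strings of all statements which carry one. */
std::vector<std::string> collectSql(const message::Transaction &txn)
{
  std::vector<std::string> result;
  const int num_statements= txn.statement_size();
  for (int x= 0; x < num_statements; ++x)
  {
    const message::Statement &stmt= txn.statement(x);
    if (stmt.has_sql())
      result.push_back(stmt.sql());
  }
  return result;
}
```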
<h2>Serialized Polymorphism with the <tt>type</tt> Member</h2>
<p>
The <tt>type</tt> data member is of critical importance to the Statement message, as it allows us to have a sort of polymorphism serialized within the Statement message itself.  This polymorphism allows the generic Statement message to contain specialized submessages depending on what type of event occurred on the server.
</p>
<p>
The above paragraph probably sounds overly complicated, but in reality things are pretty simple.  As usual, it&#8217;s easiest to see what&#8217;s going on by looking at an example in code.  For our example, let&#8217;s build out our fictional <tt>processStatement()</tt> method from the snippet above.
</p>
<p>
The <tt>processStatement()</tt> method is basically a giant switch statement on the <tt>type</tt> data member of the supplied Statement message.  Here is the outline of the <tt>processStatement()</tt> method, showing only the switch statement and some comments, which should give you an idea of how we deal with specific types of Statements:
</p>
<div class="syntax">
<div class="cpp" style="font-family: monospace;">
<ol>
<li class="li1">
<div class="de1"><span class="kw4">void</span> processStatement<span class="br0">&#40;</span><span class="kw4">const</span> message::<span class="me2">Statement</span> &amp;stmt<span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2"><span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">switch</span> <span class="br0">&#40;</span>stmt.<span class="me1">type</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">case</span> message::<span class="me2">Statement</span>::<span class="me2">INSERT</span>:</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="coMULTI">/* Handle statements which insert new data&#8230; */</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw2">break</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; <span class="kw1">case</span> message::<span class="me2">Statement</span>::<span class="me2">UPDATE</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="coMULTI">/* Handle statements which update existing data&#8230; */</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="kw2">break</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">case</span> message::<span class="me2">Statement</span>::<span class="kw3">DELETE</span>:</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="coMULTI">/* Handle statements which delete existing data&#8230; */</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw2">break</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &#8230;&nbsp; &nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2"><span class="br0">&#125;</span> </div>
</li>
</ol>
</div>
</div>
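<p>One thing the outline leaves out is what to do with types you do not handle.  The sketch below shows one possible shape: a switch that groups several types together and has a <tt>default</tt> branch for the rest.  The enum is a stand-in which lists the type names from this article, its numeric values are not the ones in <tt>transaction.proto</tt>, and the three category strings are my own invention for the example.</p>

```cpp
#include <string>

namespace sketch
{
/* Stand-in for the Statement type enum; the numeric values here are
   arbitrary and do not match transaction.proto. */
enum StatementType
{
  ROLLBACK, INSERT, UPDATE, DELETE, TRUNCATE_TABLE,
  CREATE_SCHEMA, ALTER_SCHEMA, DROP_SCHEMA,
  CREATE_TABLE, ALTER_TABLE, DROP_TABLE,
  SET_VARIABLE, RAW_SQL
};

/* Maps a statement type to a coarse, made-up category. */
std::string statementKind(StatementType type)
{
  switch (type)
  {
  case INSERT:
  case UPDATE:
  case DELETE:
    return "row change";
  case CREATE_SCHEMA:
  case ALTER_SCHEMA:
  case DROP_SCHEMA:
  case CREATE_TABLE:
  case ALTER_TABLE:
  case DROP_TABLE:
    return "definition change";
  default:
    /* Everything not handled explicitly ends up here, including any
       types added to the enum later. */
    return "other";
  }
}
}
```

<p>Having a <tt>default</tt> branch means a replicator or applier keeps working, in whatever limited way you choose, when new Statement types appear.</p>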
<p>
<img src="http://joinfu.com/img/insert_header_message.png" style="float: right; margin: 0px 0px 0px 100px;"/><br />
<img src="http://joinfu.com/img/insert_data_message.png" style="float: right; margin: 0px 0px 40px 100px;"/><br />
Let&#8217;s go ahead and &#8220;fill out&#8221; one of the case blocks in the switch statement above.  We will handle the case where the Statement <tt>type</tt> is <tt>INSERT</tt>.  Note that this <strong>does not necessarily mean a <em>SQL</em> INSERT statement was executed</strong>.  All this means is that an SQL statement was executed which resulted in a new record being added to a table on the server.  This means that the actual SQL statement could have been any of INSERT, INSERT ... SELECT, REPLACE INTO, or LOAD DATA INFILE.
</p>
<p>
The <tt>/drizzled/message/transaction.proto</tt> file will always contain lots of documentation explaining how each of the specific submessages in the Statement message class are handled.  To the right is a graphic depicting the <tt>InsertHeader</tt> and <tt>InsertData</tt> message classes which compose the "meat" of Statements that inserted new records into the database.  Whenever the Statement message's <tt>type</tt> is <tt>INSERT</tt>, the Statement message will contain two submessages, one called <tt>insert_header</tt> and another called <tt>insert_data</tt> which will be populated with the <tt>InsertHeader</tt> and <tt>InsertData</tt> messages.  The header message will contain information about the table and fields affected, while the data message will contain the values to be inserted into the table.
</p>
<p>
Here is some example code which queries the header and data messages and constructs an SQL string from them:
</p>
<div class="syntax">
<div class="cpp" style="font-family: monospace;">
<ol>
<li class="li1">
<div class="de1"><span class="kw4">void</span> processStatement<span class="br0">&#40;</span><span class="kw4">const</span> message::<span class="me2">Statement</span> &amp;stmt<span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2"><span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">switch</span> <span class="br0">&#40;</span>stmt.<span class="me1">type</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="kw1">case</span> message::<span class="me2">Statement</span>::<span class="me2">INSERT</span>:</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="coMULTI">/* Handle statements which insert new data... */</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="kw4">const</span> message::<span class="me2">InsertHeader</span> &amp;header= stmt.<span class="me1">insert_header</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw4">const</span> message::<span class="me2">InsertData</span> &amp;data= stmt.<span class="me1">insert_data</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; string destination;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw4">char</span> quoted_identifier= <span class="st0">'`'</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; destination-&gt;assign<span class="br0">&#40;</span><span class="st0">&quot;INSERT INTO &quot;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span>quoted_identifier<span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; destination-&gt;append<span class="br0">&#40;</span>header.<span class="me1">table_metadata</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">schema_name</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span>quoted_identifier<span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span><span class="st0">'.'</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span>quoted_identifier<span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; destination-&gt;append<span class="br0">&#40;</span>header.<span class="me1">table_metadata</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">table_name</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span>quoted_identifier<span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; destination-&gt;append<span class="br0">&#40;</span><span class="st0">&quot; (&quot;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="coMULTI">/* Add field list to SQL string... */</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="kw4">size_t</span> num_fields= header.<span class="me1">field_metadata_size</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw4">size_t</span> x;</div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>x= <span class="nu0">0</span>; x &lt; num_fields; ++x<span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="kw4">const</span> message::<span class="me2">FieldMetadata</span> &amp;field_metadata= header.<span class="me1">field_metadata</span><span class="br0">&#40;</span>x<span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>x != <span class="nu0">0</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span><span class="st0">','</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; </div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span>quoted_identifier<span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; destination-&gt;append<span class="br0">&#40;</span>field_metadata.<span class="me1">name</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span>quoted_identifier<span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; destination-&gt;append<span class="br0">&#40;</span><span class="st0">&quot;) VALUES (&quot;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="coMULTI">/* Add insert values */</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw4">size_t</span> num_records= data.<span class="me1">record_size</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="kw4">size_t</span> y;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; </div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>x= <span class="nu0">0</span>; x &lt; num_records; ++x<span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#123;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>x != <span class="nu0">0</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; destination-&gt;append<span class="br0">&#40;</span><span class="st0">&quot;),(&quot;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="br0">&#40;</span>y= <span class="nu0">0</span>; y &lt; num_fields; ++y<span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; <span class="br0">&#123;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>y != <span class="nu0">0</span><span class="br0">&#41;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span><span class="st0">','</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span><span class="st0">'<span class="es0">\'</span>'</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; destination-&gt;append<span class="br0">&#40;</span>data.<span class="me1">record</span><span class="br0">&#40;</span>x<span class="br0">&#41;</span>.<span class="me1">insert_value</span><span class="br0">&#40;</span>y<span class="br0">&#41;</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; &nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span><span class="st0">'<span class="es0">\'</span>'</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; destination-&gt;push_back<span class="br0">&#40;</span><span class="st0">')'</span><span class="br0">&#41;</span>;</div>
</li>
<li class="li2">
<div class="de2">&nbsp;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li2">
<div class="de2">&nbsp; &nbsp; <span class="kw2">break</span>;</div>
</li>
<li class="li1">
<div class="de1">&nbsp; ...&nbsp; &nbsp;</div>
</li>
<li class="li2">
<div class="de2">&nbsp; <span class="br0">&#125;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="br0">&#125;</span> </div>
</li>
</ol>
</div>
</div>
<p>
The example code above is far from production-ready, of course.  I don't take different field types into account; I simply enclose every value in single quotes.  I also don't handle errors or escape strings.  The point isn't to be perfect, but to show you the general way to get information out of the Statement message.
</p>
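<p>
As a rough illustration of the string escaping that the example skips, here is a minimal sketch.  The helper name <tt>appendQuotedValue</tt> is hypothetical and not part of Drizzle, and doubling embedded single quotes and backslashes is an assumption about what the receiving SQL dialect expects.
</p>

```cpp
#include <string>

// Hypothetical helper, not part of Drizzle: appends `value` to `destination`
// as a single-quoted SQL literal. Embedded single quotes and backslashes are
// doubled, which is an assumption about the target SQL dialect.
void appendQuotedValue(std::string *destination, const std::string &value)
{
  destination->push_back('\'');
  for (std::string::const_iterator it= value.begin(); it != value.end(); ++it)
  {
    if (*it == '\'' || *it == '\\')
      destination->push_back(*it); // emit the special character twice
    destination->push_back(*it);
  }
  destination->push_back('\'');
}
```

<p>
With a helper like this, the two <tt>push_back('\'')</tt> calls and the <tt>append()</tt> between them in the inner loop could collapse into a single call taking <tt>data.record(x).insert_value(y)</tt>.
</p>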
<h2>Partial Atomic Transactions</h2>
<p>
Above, I stated that the Transaction messages sent to Replicators and Appliers represent an atomic change to the state of a server.  This is true, most of the time. <img src='http://www.joinfu.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   There are specific situations when a Transaction message will not represent an atomic change, and you should be aware of these scenarios if you plan to write plugins which implement a replication scheme.
</p>
<p>
There are times when it is simply inefficient or impossible to create a Transaction message that represents the actual atomic change on a server.  For instance, imagine a table having 100 million records.  Now, imagine issuing an <tt>UPDATE</tt> against that table that potentially affects every row in the table.
</p>
<p>
In order to transmit to replicas the atomic change to the server, one gigantic Transaction message would need to be constructed on the master server.  Not only is there a distinct chance that the master would run out of memory constructing such a large message object, but it's safe to say that the master server would suffer from performance degradation during this construction.  There must, therefore, be a way to start streaming the changes made to the master server before the actual final commit has happened on the master.
</p>
<p>
You may have noticed two data members of the <tt>InsertData</tt> message above named <tt>segment_id</tt> and <tt>end_segment</tt>.  The first is of type <tt>uint32_t</tt> and the second is a <tt>bool</tt>.  Together, these two data members fulfill the need to transmit transaction messages that are part of a <em><strong>bulk data modification</strong></em>.  When a reader of a Transaction message sees that the <tt>end_segment</tt> data member is false, the reader knows that another data segment will follow the current one and will contain more inserts, updates, or deletes for the current transaction.
</p>
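<p>
To make the reader's side of this concrete, here is a minimal sketch of collecting segments until the final one arrives.  The <tt>DataSegment</tt> struct is only a stand-in for the Protobuf-generated message class, and <tt>SegmentAssembler</tt> is a hypothetical helper; neither is taken from the Drizzle source.
</p>

```cpp
#include <stdint.h>
#include <string>
#include <vector>

// Stand-in for the two data members discussed above. The real message is a
// Protobuf-generated class; this struct exists only for illustration.
struct DataSegment
{
  uint32_t segment_id;
  bool end_segment;
  std::vector<std::string> records;
};

// Hypothetical reader-side helper: buffers records from each segment and
// reports when the segment marked end_segment == true has been seen.
class SegmentAssembler
{
public:
  SegmentAssembler() : complete_(false) {}

  // Returns true once the final segment has been added.
  bool add(const DataSegment &segment)
  {
    pending_.insert(pending_.end(),
                    segment.records.begin(),
                    segment.records.end());
    complete_= segment.end_segment;
    return complete_;
  }

  bool complete() const { return complete_; }
  const std::vector<std::string> &records() const { return pending_; }

private:
  bool complete_;
  std::vector<std::string> pending_;
};
```

<p>
An Applier built along these lines would hold back the buffered records until <tt>add()</tt> returns true, and only then treat them as one complete change.
</p>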
<h2>Summary and Request for Comments</h2>
<p>
Hopefully, I've explained the changes to Drizzle's replication system well enough above.  I understand that the changes to the message definitions are substantial, and I am available at any time to discuss them and to help people with their code.  You can find me on IRC in Freenode's #drizzle channel, via the <a href="https://lists.launchpad.net/drizzle-discuss/"  title="drizzle-discuss">Drizzle discussion mailing list</a>, or via email at joinfu@sun.com.  I very much welcome comments.  The new replication system is just finishing up the valgrind regression tests and should hit trunk later today.
</p>
<p>
The next article covers the new Transaction Log, which is a serialized log of the Transaction messages used in the replication system.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Yet Another Post on REPLACE</title>
		<link>http://www.joinfu.com/2009/10/yet-another-post-on-replace/</link>
		<comments>http://www.joinfu.com/2009/10/yet-another-post-on-replace/#comments</comments>
		<pubDate>Fri, 23 Oct 2009 14:11:13 +0000</pubDate>
		<dc:creator>jaypipes</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Drizzle]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://joinfu.com/2009/10/yet-another-post-on-replace</guid>
		<description><![CDATA[Sometimes, as Sergei rightly mentioned, I can be, well, &#8220;righteously indignant&#8221; about what I perceive to be a hack. In this case, after Sergei repeatedly tried to set me straight about what was going on &#8220;under the covers&#8221; during a REPLACE operation, I was still arguing that he was incorrect. Doh. I then realized that [...]]]></description>
			<content:encoded><![CDATA[<p>
Sometimes, as Sergei <a href="http://www.joinfu.com/index.php?/archives/303-The-Deal-with-REPLACE-..-Or-Is-It-UPDATE.html#c189236" >rightly mentioned</a>, I can be, well, &#8220;righteously indignant&#8221; about what I perceive to be a hack.
</p>
<p>
In this case, after Sergei repeatedly tried to set me straight about what was going on &#8220;under the covers&#8221; during a <tt>REPLACE</tt> operation, I was still arguing that he was incorrect.
</p>
<p>
Doh.
</p>
<p>
I then realized that <a href="http://www.joinfu.com/index.php?/archives/301-Pop-Quiz-What-Does-REPLACE-Do.html#c189128" >Sarah Sproehnle&#8217;s original comment</a> about my test table not having a primary key explained the behaviour that I had been seeing.
</p>
<p>
My original test case was failing because it expected to see a <tt>DELETE</tt> + an <tt>INSERT</tt> when a <tt>REPLACE INTO</tt> was issued against a table.  When I placed a <tt>PRIMARY KEY</tt> on the table in my test case and re-ran it, it still failed because the <tt>DELETE</tt> still was not in the transaction log.  Well, it turns out that the reason was that <tt>ha_update_row()</tt> was actually called and not <tt>ha_delete_row()</tt> + <tt>ha_write_row()</tt>.  And, going by the documentation for the <tt>REPLACE</tt> command, I wasn&#8217;t checking whether <tt>ha_update_row()</tt> had been called, since I didn&#8217;t realize a <tt>REPLACE</tt> could actually do an <tt>UPDATE</tt>.
</p>
<p>
Anyway, I wanted to post to say that most of this whole <a href="http://www.urbandictionary.com/define.php?term=kerfuffle"  title="Kerfuffle">kerfuffle</a> was my fault.  Though I think that both the online and code documentation should reflect the fact that a <tt>REPLACE</tt> can do an <tt>UPDATE</tt>, the source of the failure was not what I originally wrote.  In fact, <tt>ha_write_row()</tt> does indeed return <tt>HA_ERR_FOUND_DUPP_KEY</tt> appropriately during a <tt>REPLACE</tt> call.
</p>
<p>
Mmmmm, that piece of humble pie was delicious.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.joinfu.com/2009/10/yet-another-post-on-replace/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

