Small but steady progress in improving Drizzle performance

We’re making steady progress in removing bottlenecks in the Drizzle code base. So far, we have removed a number of mutexes and begun replacing a number of contention points with atomic instructions, which remove the need for a lock structure on platforms that support atomic fetch and store instructions.

I’m pretty positive about the direction we are going so far. We’re seeing the right trends in our scaling graphs, with very little performance drop-off in read-only workloads up to 4X the number of cores on the machine, and little performance drop-off in read-write workloads up to 2X the number of cores, as you can see from the graphs below.

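It’s a little difficult to see, but we’ve made a small but steady improvement from r950 to r968, with throughput increasing by up to about 2% at most concurrency levels (the runs at a concurrency of 4 are the exception and came in lower). You can see the raw numbers here, where c is the concurrency level, tps is transactions per second, and rwrps is read/write requests per second: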

+--------------------------------+-------+-----+---------+----------+
| config_name                    | revno | c   | tps     | rwrps    |
+--------------------------------+-------+-----+---------+----------+
| drizzle_innodb_readonly_1000K  |   950 |   2 |  710.67 |  9949.34 | 
| drizzle_innodb_readonly_1000K  |   950 |   4 | 1163.77 | 16292.81 | 
| drizzle_innodb_readonly_1000K  |   950 |   8 | 1692.59 | 23696.29 | 
| drizzle_innodb_readonly_1000K  |   950 |  16 | 2470.31 | 34584.39 | 
| drizzle_innodb_readonly_1000K  |   950 |  32 | 3104.98 | 43469.73 | 
| drizzle_innodb_readonly_1000K  |   950 |  64 | 3376.98 | 47277.73 | 
| drizzle_innodb_readonly_1000K  |   950 | 128 | 2986.91 | 41816.74 | 
| drizzle_innodb_readonly_1000K  |   950 | 256 | 2657.54 | 37205.47 | 
| drizzle_innodb_readonly_1000K  |   968 |   2 |  712.73 |  9978.25 | 
| drizzle_innodb_readonly_1000K  |   968 |   4 | 1081.72 | 15144.10 | 
| drizzle_innodb_readonly_1000K  |   968 |   8 | 1714.77 | 24006.77 | 
| drizzle_innodb_readonly_1000K  |   968 |  16 | 2480.48 | 34726.77 | 
| drizzle_innodb_readonly_1000K  |   968 |  32 | 3140.16 | 43962.29 | 
| drizzle_innodb_readonly_1000K  |   968 |  64 | 3394.03 | 47516.32 | 
| drizzle_innodb_readonly_1000K  |   968 | 128 | 3008.74 | 42122.30 | 
| drizzle_innodb_readonly_1000K  |   968 | 256 | 2676.62 | 37472.65 | 
| drizzle_innodb_readwrite_1000K |   950 |   2 |  438.04 |  8322.77 | 
| drizzle_innodb_readwrite_1000K |   950 |   4 |  720.68 | 13692.98 | 
| drizzle_innodb_readwrite_1000K |   950 |   8 | 1068.65 | 20304.39 | 
| drizzle_innodb_readwrite_1000K |   950 |  16 | 1454.71 | 27639.47 | 
| drizzle_innodb_readwrite_1000K |   950 |  32 | 1699.74 | 32295.02 | 
| drizzle_innodb_readwrite_1000K |   950 |  64 | 1506.04 | 28614.71 | 
| drizzle_innodb_readwrite_1000K |   950 | 128 | 1341.46 | 25487.69 | 
| drizzle_innodb_readwrite_1000K |   950 | 256 | 1157.95 | 22001.16 | 
| drizzle_innodb_readwrite_1000K |   968 |   2 |  444.10 |  8437.81 | 
| drizzle_innodb_readwrite_1000K |   968 |   4 |  700.45 | 13308.53 | 
| drizzle_innodb_readwrite_1000K |   968 |   8 | 1075.59 | 20436.14 | 
| drizzle_innodb_readwrite_1000K |   968 |  16 | 1457.83 | 27698.76 | 
| drizzle_innodb_readwrite_1000K |   968 |  32 | 1732.04 | 32908.82 | 
| drizzle_innodb_readwrite_1000K |   968 |  64 | 1506.98 | 28632.61 | 
| drizzle_innodb_readwrite_1000K |   968 | 128 | 1355.17 | 25748.31 | 
| drizzle_innodb_readwrite_1000K |   968 | 256 | 1157.29 | 21988.59 | 
+--------------------------------+-------+-----+---------+----------+
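
To make the change between the two revisions easier to read off, the tps column can be turned into a percentage difference per concurrency level. The short Python sketch below does that for the read-only rows of the table; the names `pct_change` and `compare` and the dictionary layout are illustrative choices and are not taken from drizzle-automation.

```python
# tps values copied from the read-only rows of the table above,
# keyed first by revision number and then by concurrency level.
READONLY_TPS = {
    950: {2: 710.67, 4: 1163.77, 8: 1692.59, 16: 2470.31,
          32: 3104.98, 64: 3376.98, 128: 2986.91, 256: 2657.54},
    968: {2: 712.73, 4: 1081.72, 8: 1714.77, 16: 2480.48,
          32: 3140.16, 64: 3394.03, 128: 3008.74, 256: 2676.62},
}


def pct_change(old, new):
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def compare(tps_by_rev, base_rev, new_rev):
    """Map each concurrency level to the percent change in tps."""
    base, new = tps_by_rev[base_rev], tps_by_rev[new_rev]
    return {c: pct_change(base[c], new[c]) for c in sorted(base)}


if __name__ == "__main__":
    for c, delta in compare(READONLY_TPS, 950, 968).items():
        print(f"c={c:>3}  {delta:+.2f}%")
```

Swapping in the read-write rows gives the same comparison for the second workload.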

Drizzle is all about being open and transparent, so if you want to know how these numbers are generated, feel free to download the drizzle-automation project, a Python benchmarking and code coverage utility I’ve written over the past couple of weeks. We use standard sysbench, modified to use our client driver, with the following configuration in our /etc/drizzle-automation/bench.cnf file:

[defaults]

# Number of iterations the benchmark process should do for each level
# of concurrency
iterations= 5

# Comma-separated list of concurrency levels to run
concurrency_levels= 2,4,8,16,32,64,128,256

# Options passed as-is to make when building server
make_options= -j32

# Options passed as-is to configure when building server
configure_options= 

[drizzle_innodb_readonly_1000K]

# The program used to run the benchmark.  If you use paths, they
# will be relative to the sandbox directory in which the run is
# being processed
bench_cmd= sysbench

# Options given to the benchmark program on every iteration
bench_options= --max-time=60 --max-requests=0 --test=oltp \
--drizzle-db=test --drizzle-port=4427 --drizzle-host=127.0.0.1 \
--drizzle-user=root --db-ps-mode=disable --db-driver=drizzleclient \
--drizzle-table-engine=innodb --oltp-read-only=on --oltp-table-size=1000000

# The program used to start the server instance.  If you use paths, they
# will be relative to the sandbox directory in which the run is
# being processed
server_cmd= ./drizzled/drizzled

# Options passed to the server on startup
server_options= --port=4427 --datadir=/tmp --innodb-buffer-pool=4G \
--key-buffer-size=64M --scheduler=multi_thread --innodb_log_buffer_size=512M \
 --innodb_additional_mem_pool_size=120M --table_open_cache=4096 &

[drizzle_innodb_readwrite_1000K]

# The program used to run the benchmark.  If you use paths, they
# will be relative to the sandbox directory in which the run is
# being processed
bench_cmd= sysbench

# Options given to the benchmark program on every iteration
bench_options= --max-time=60 --max-requests=0 --test=oltp --drizzle-db=test \
--drizzle-port=4427 --drizzle-host=127.0.0.1 --drizzle-user=root --db-ps-mode=disable \
--db-driver=drizzleclient --drizzle-table-engine=innodb --oltp-read-only=off \
--oltp-table-size=1000000

# The program used to start the server instance.  If you use paths, they
# will be relative to the sandbox directory in which the run is
# being processed
server_cmd= ./drizzled/drizzled

# Options passed to the server on startup
server_options= --port=4427 --datadir=/tmp --innodb-buffer-pool=4G \
--key-buffer-size=64M --scheduler=multi_thread --innodb_log_buffer_size=512M \
--innodb_additional_mem_pool_size=120M --table_open_cache=4096 &
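
For readers who want a feel for how a file like this can drive a run, here is a minimal Python sketch that reads the format and expands one section into a list of sysbench command lines, one per iteration at each concurrency level. It is not the drizzle-automation code. The helper names `join_continuations` and `bench_commands` are hypothetical, the sample text is a shortened copy of the configuration above, and passing the concurrency level through sysbench's `--num-threads` option followed by the `run` command is an assumption about how the tool invokes sysbench.

```python
import configparser
import shlex

# Shortened copy of the configuration above (fewer levels and options).
SAMPLE = r"""
[defaults]
iterations= 5
concurrency_levels= 2,4,8

[drizzle_innodb_readonly_1000K]
bench_cmd= sysbench
bench_options= --max-time=60 --max-requests=0 --test=oltp \
--oltp-read-only=on --oltp-table-size=1000000
"""


def join_continuations(text):
    """Fold each line ending in a backslash into the line that follows it."""
    out, pending = [], ""
    for line in text.splitlines():
        if line.rstrip().endswith("\\"):
            # Drop the trailing backslash and keep accumulating.
            pending += line.rstrip()[:-1].strip() + " "
        else:
            out.append(pending + line.strip() if pending else line)
            pending = ""
    if pending:
        out.append(pending.rstrip())
    return "\n".join(out)


def bench_commands(text, section):
    """Yield one argument list per iteration per concurrency level."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(join_continuations(text))
    levels = [int(c) for c in
              parser["defaults"]["concurrency_levels"].split(",")]
    iterations = parser.getint("defaults", "iterations")
    base = [parser[section]["bench_cmd"]]
    base += shlex.split(parser[section]["bench_options"])
    for level in levels:
        for _ in range(iterations):
            # Assumption: concurrency is passed via --num-threads.
            yield base + [f"--num-threads={level}", "run"]


if __name__ == "__main__":
    for cmd in bench_commands(SAMPLE, "drizzle_innodb_readonly_1000K"):
        print(" ".join(cmd))
```

The continuation lines are joined before parsing because Python's `configparser` only treats indented lines as continuations, while the file above uses trailing backslashes.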

The machine we run the benchmarks on is a quad-core Intel Xeon, and all tests fit into the available RAM on the box.

These benchmarks, which we can run against a single revision or a range of revisions, are invaluable in allowing us to pinpoint the specific cause of a performance regression. Feel free to check out the automation work, contribute to the project, or suggest improvements to it on the Drizzle Discuss mailing list.
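
As a rough illustration of how results across a range of revisions can point at a regression, the Python sketch below walks consecutive revisions at each concurrency level and reports any drop in tps beyond a threshold. The function name `find_regressions` and the 2% default threshold are hypothetical choices for this example, not something drizzle-automation is known to implement; the rows are a subset of the read-write results above.

```python
from collections import defaultdict

# (revno, concurrency, tps) rows copied from the read-write results above.
ROWS = [
    (950, 4, 720.68), (950, 32, 1699.74), (950, 256, 1157.95),
    (968, 4, 700.45), (968, 32, 1732.04), (968, 256, 1157.29),
]


def find_regressions(rows, threshold_pct=2.0):
    """Return (concurrency, prev_rev, rev, pct) for each tps drop
    of at least threshold_pct between consecutive revisions."""
    by_level = defaultdict(list)
    for revno, level, tps in rows:
        by_level[level].append((revno, tps))
    hits = []
    for level in sorted(by_level):
        series = sorted(by_level[level])  # order by revision number
        for (prev_rev, prev_tps), (rev, tps) in zip(series, series[1:]):
            pct = (tps - prev_tps) / prev_tps * 100.0
            if pct <= -threshold_pct:
                hits.append((level, prev_rev, rev, round(pct, 2)))
    return hits


if __name__ == "__main__":
    for level, prev_rev, rev, pct in find_regressions(ROWS):
        print(f"c={level}: r{prev_rev} -> r{rev} changed by {pct}%")
```

With more revisions in the input, the same loop narrows a drop down to the pair of adjacent revisions where it first appears.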