Yesterday and today I gave a couple webinars on basic performance tuning of SQL queries and schema/index strategies. I’ve given this presentation quite a few times now, and a number of good questions always seem to come up. I figured I would summarize them and their answers in a blog post.
Q: Which will be faster out of these two queries?
SELECT … WHERE some_col IN (1,2,3)
SELECT … WHERE some_col = 1 OR some_col = 2 OR some_col = 3
A:
Neither. The optimizer rewrites an IN() operator to a series of OR conditions, so there will be no performance diffference. Use IN() as it makes the code shorter and more readable.
Q: Where does MyISAM cache table records?
A:
Nowhere. MyISAM does not cache table records like InnoDB does in it’s innodb_buffer_pool. Instead, MyISAM relies on the operating system buffering to buffer table records as it reads them from the .MYD file. The MyISAM key_buffer only stores index blocks, not data records.
Q: Which will be faster out these two queries:
SELECT … FROM t1, t2 WHERE t1.id = t2.id
SELECT … FROM t1 INNER JOIN t2 ON t1.id = t2.id
A:
Neither. The optimizer actually will rewrite the bottom query into the top form. The SQL style you use is, of course, entirely up to you, however I recommend using the bottom style (known as ANSI syntax) over the top style (known as Theta syntax) for a couple reasons:
- MySQL only supports the inner and cross join for the Theta syntax. However, MySQL supports INNER, CROSS, LEFT and RIGHT outer joins for the ANSI syntax. Mixing and matching both styles can lead to hard-to-read SQL code.
- It is supremely easy to miss a join condition with Theta style, especially when joining many tables together. Leaving off a join condition by accident in the WHERE clause will lead to a cartesian product (not a good thing!). ANSI syntax is more explicit, and it is much harder to forget a join condition
Q: Is InnoDB faster/better than MyISAM
A:
It completely depends. Nobody ever really likes the answer to this question, but it is completely true. There are strengths and disadvantages to each storage engine. MyISAM has very good read performance, bulk load performance, and has a small footprint. But, InnoDB is great for heavy UPDATE environments, transaction needs, referential integrity, and fast single key lookups. You need to pick an engine based on what your application needs, and not some general “only use XXX engine” dictate.
Q: Is CHAR faster than VARCHAR?
A:
No, not really. If there is any performance difference, it is negligible. Pick CHAR if you know that the data has a specific number of characters (like a social security number, for instance) and VARCHAR otherwise.
Q: Is VARCHAR(80) faster than VARCHAR(255)
A:
Yes, but probably not in the way you think. AFAIK, there’s no difference (at least in recent versions of MySQL and InnoDB) between the speed at which VARCHAR columns of different lengths are retrieved from disk or memory. However, there is a big difference when either of the following scenarios occurs:
- A temporary table is implicitly created to handle a GROUP BY or ORDER BY clause and a VARCHAR column is in the SELECT statement
- A temporary table is created explicitly which contains a VARCHAR column
In these cases, the length of the VARCHAR columns does come into play. Why? Because temporary tables in memory are actually just tables of the MEMORY storage engine. The MEMORY storage engine, for some reason, treats all VARCHAR(X) columns as CHAR(X) columns.
This means that if you define two fields, one as VARCHAR(255) and another as VARCHAR(128), the latter will consume half as much space when allocated in a temporary table. The more records can fit into the max_heap_table_size, the fewer cases of swapping to disk tables (look for SHOW STATUS LIKE ‘Created_tmp_disk_’;) will occur, resulting in better overall performance.
Q: Are there performance issues when joining tables from different storage engines
A:
No. Issuing a SELECT against multiple storage engines is fine. It’s when you mix and match transactional and non-transactional engines within a transaction that modifies data that you will get unpredictable results.
Q: If I change a derived table to a view, will performance increase?
A:
No. A view is simply a derived table behind the scenes (at least when created using the TEMPTABLE algorithm). This means there is no performance difference between a regular derived table and a view. The view will simply make the code more readable and more “componentized”.
Q: If I see “Using temporary; Using filesort” in the Extra column of EXPLAIN output, does that mean a temporary table is created on disk?
A:
No. A disk-based table will only occur in the following situations:
- When the size of the implicitly created temporary table (from a GROUP BY or ORDER BY on a non-indexed column) is greater than both tmp_table_size and max_heap_table_size
- When there are any BLOB or TEXT fields in the SELECT expression
- When a full table scan occurs that exceeds the read_buffer_size variable (configured per connection thread)
These are the scenarios (off the top of my head) I can think of which cause disk-based temporary table creation. There could be a few more. By the way, the disk-based temporary table created is a MyISAM table.
Q: Is it possible to do a FULL OUTER JOIN in MySQL?
A:
Yes. Use both LEFT and RIGHT JOIN on the same join condition in the same query, like so:
SELECT * FROM A LEFT JOIN B ON A.id = B.id UNION ALL SELECT * FROM A RIGHT JOIN B ON A.id = B.id WHERE A.id IS NULL
Got any more questions? Feel free to comment! Cheers.
Go