MySQL Minutiae: Table Engines
Most people know that MySQL supports different table engines, and could probably name the two most common: InnoDB (now the default) and MyISAM.
We’ve all seen the engine specified in a `CREATE TABLE` statement, and most of us simply pick one of these two and happily move on.
But MySQL actually ships with seven engines enabled by default, and another which is not.
All of these engines have specific traits which make them applicable (and more importantly: the best tool) for specific tasks.
Because your MySQL could be compiled differently, you can learn which engines your MySQL server supports, and each engine’s current status, using the `SHOW ENGINES` query. You may see more or fewer than those listed below.
Engine | Description |
---|---|
`InnoDB` | ACID compliant, supports transactions, row-level locking. _Default **since** MySQL 5.5._ |
`MyISAM` | Fast, supports FULLTEXT indexes, table-level locking. _Default **before** MySQL 5.5._ |
`CSV` | Stores data as CSV text files. |
`MERGE` | Allows you to aggregate multiple tables with the same schema. _(Think: `UNION`.)_ |
`ARCHIVE` | Compressed storage, supporting `INSERT` and `SELECT` __only__. No support for indexes. |
`MEMORY` | Stores data in memory, similar to a limited `MyISAM` when it comes to feature-set. |
`BLACKHOLE` | Data goes to `/dev/null`. |
`FEDERATED` | __Not enabled by default__. Data is stored on a remote MySQL instance. (__Note:__ Not as cool as it sounds!) |
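To check what your own server offers, both of the following are useful (the schema name in the second query is a placeholder):

```sql
-- List every engine this server build knows about; the Support column
-- shows YES, NO, DEFAULT or DISABLED for each one.
SHOW ENGINES;

-- You can also check which engine your existing tables use
-- ('mydb' is a placeholder for your own schema name).
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb';
```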
## Which Engine, When?
Just like when you make the choice between `InnoDB` or `MyISAM`, each of these engines has different purposes, strengths, weaknesses and abilities.
### InnoDB
With the release of MySQL 5.5, InnoDB is now the default table engine used. It features ACID compliance, transaction support and row-level locking.
Row-level locking in particular is important as it allows for great concurrency during write operations.
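As a rough sketch of what this buys you (the table and amounts here are invented for illustration), a transfer between two rows can be made all-or-nothing, with only the touched rows locked while it runs:

```sql
-- Illustrative only: an atomic transfer between two rows.
-- With row-level locking, only the rows matched by the WHERE clauses
-- are locked, so other sessions can keep writing elsewhere in the table.
CREATE TABLE account_balances (
    id      INT UNSIGNED NOT NULL PRIMARY KEY,
    balance DECIMAL(10, 2) NOT NULL
) ENGINE = InnoDB;

START TRANSACTION;
UPDATE account_balances SET balance = balance - 100.00 WHERE id = 1;
UPDATE account_balances SET balance = balance + 100.00 WHERE id = 2;
COMMIT; -- or ROLLBACK; and neither update is applied
```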
Engine Yard will be making the change to have InnoDB as the default for new MySQL instances in the near future, and highly recommends the use of InnoDB.

### MyISAM

The MyISAM storage engine was the default prior to MySQL 5.5. Its main advantage is speed; however, this comes at the expense of ACID compliance, transactions and row-level locking: all the features of InnoDB.
MyISAM uses table-level locking, which means that during certain operations the entire table is locked against writes, causing potential concurrency issues.
One feature that MyISAM provides that no other (default) table engine supports is `FULLTEXT` indexes.
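A minimal sketch of what that looks like (the `articles` table is invented for illustration):

```sql
-- Illustrative: a MyISAM table with a FULLTEXT index and a
-- natural-language search against it.
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    body  TEXT NOT NULL,
    FULLTEXT (title, body)
) ENGINE = MyISAM;

SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('storage engines');
```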
### CSV

The `CSV` table engine started as (and largely remains) an example table engine implementation. However, because it stores its data as plain CSV files on disk (using Unix newlines, which Windows users should keep in mind), you can easily pull CSV data into and out of MySQL with it.
```sql
CREATE TABLE user_export (
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(80) NOT NULL,
    email VARCHAR(100) NOT NULL
) ENGINE = CSV;

INSERT INTO user_export
SELECT first_name, last_name, email
FROM accounts
WHERE is_active = 1;
```
This will result in a `user_export.CSV` file in your MySQL data directory that contains the result of the `INSERT ... SELECT`.
You can easily flip this and `INSERT INTO ... SELECT` from the `CSV` table to pull in existing CSV data.
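For example, assuming the `user_export.CSV` file in the data directory now holds the rows you want to load, the import is simply the earlier query reversed (a sketch, reusing the column names from the example above):

```sql
-- Pull rows from the CSV-backed table into a regular table.
INSERT INTO accounts (first_name, last_name, email)
SELECT first_name, last_name, email
FROM user_export;
```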
`CSV` is great for maintaining portability and interoperability, since it is essentially just a CSV file.
### MERGE

The `MERGE` table engine allows you to create a table that represents a `UNION` between two or more __identical__ MyISAM tables.
`MERGE` can allow you to split large datasets across multiple disks, which can help alleviate performance issues when doing schema updates (e.g. adding or removing an index) or when doing table repairs.
However, not all features are available (notably, FULLTEXT indexes cannot be used, although the underlying tables can have them) and index reads are slower.
Merge tables lend themselves to read-heavy data-sets, such as log storage (simple consecutive inserts, many different reads), but the `ARCHIVE` engine (or even a NoSQL solution like MongoDB) might be a better option for your specific use case.
Using `MERGE` is very simple: first define your initial `MyISAM` table, then create your `MERGE` table:
```sql
CREATE TABLE logs_201209 (
    name VARCHAR(50) NOT NULL,
    server VARCHAR(100) NOT NULL,
    line TEXT NOT NULL,
    `date` DATETIME NOT NULL,
    INDEX (name, server)
) ENGINE = MyISAM;

CREATE TABLE logs (
    name VARCHAR(50) NOT NULL,
    server VARCHAR(100) NOT NULL,
    line TEXT NOT NULL,
    `date` DATETIME NOT NULL,
    INDEX (name, server)
) ENGINE = MERGE UNION=(logs_201209) INSERT_METHOD=LAST;
```
Now, any `INSERT` into `logs` will actually go into `logs_201209`:
```
INSERT INTO logs VALUES ('httpd', '192.168.1.2', 'File not found /randompage', NOW());

SELECT * FROM logs_201209;
+-------+-------------+----------------------------+---------------------+
| name  | server      | line                       | date                |
+-------+-------------+----------------------------+---------------------+
| httpd | 192.168.1.2 | File not found /randompage | 2012-10-30 22:58:18 |
+-------+-------------+----------------------------+---------------------+
```
You can then, based on some criteria (say: monthly), add more tables to the `MERGE` using an `ALTER` statement:
```sql
CREATE TABLE logs_201210 (
    name VARCHAR(50) NOT NULL,
    server VARCHAR(100) NOT NULL,
    line TEXT NOT NULL,
    `date` DATETIME NOT NULL,
    INDEX (name, server)
) ENGINE = MyISAM;

ALTER TABLE logs UNION=(logs_201209, logs_201210);
```
If we perform the same `INSERT` then `SELECT` as above, we will no longer see data being added to the `logs_201209` table; instead it will be added to the `logs_201210` table. This is determined by the `INSERT_METHOD` option in the `MERGE` table’s `CREATE TABLE` statement. Possible values for `INSERT_METHOD` are:
- `LAST` — Insert into the last table listed in the `UNION`
- `FIRST` — Insert into the first table listed in the `UNION`
- `NO` or no value (*default*) — Inserts are not permitted and result in an error
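If you later need to change where new rows land, `INSERT_METHOD` is an ordinary table option, so it can be altered in place; a quick sketch using the `logs` table above:

```sql
-- Send new rows to the first table in the UNION instead of the last...
ALTER TABLE logs INSERT_METHOD=FIRST;

-- ...or reject direct inserts into the MERGE table altogether.
ALTER TABLE logs INSERT_METHOD=NO;
```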
In addition to being able to query the individual tables, you can also pull the merged dataset from the `logs` table directly:
```
SELECT * FROM logs;
+-------+-------------+----------------------------+---------------------+
| name  | server      | line                       | date                |
+-------+-------------+----------------------------+---------------------+
| httpd | 192.168.1.2 | File not found /randompage | 2012-10-30 22:58:18 |
| httpd | 192.168.1.2 | File not found /randompage | 2012-10-30 23:00:06 |
+-------+-------------+----------------------------+---------------------+
```
As far as any MySQL client is concerned, this is a single table producing a standard result set, despite the fact that it is actually the result of multiple merged tables.
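If you need to confirm which underlying tables a `MERGE` table currently covers, `SHOW CREATE TABLE` includes the `UNION` list (output abbreviated and illustrative here):

```sql
SHOW CREATE TABLE logs;
-- The output ends with something like:
-- ENGINE=MRG_MyISAM UNION=(`logs_201209`,`logs_201210`) INSERT_METHOD=LAST
```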
### ARCHIVE

The `ARCHIVE` table engine only supports `SELECT` and `INSERT` queries; you cannot `UPDATE` or `DELETE` data from an `ARCHIVE` table. In trade, however, you get compressed storage.
This makes `ARCHIVE` tables great for storing permanent data, especially large data sets that you intend to keep around perpetually, such as analytics, logs, statistics and historical data.
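A quick sketch of the pattern (the table is invented for illustration):

```sql
-- An append-only, compressed table for data you only ever add to and read.
-- Note: no indexes are defined, since ARCHIVE does not support them.
CREATE TABLE request_stats (
    requested_at DATETIME NOT NULL,
    path         VARCHAR(255) NOT NULL,
    status       SMALLINT UNSIGNED NOT NULL
) ENGINE = ARCHIVE;

INSERT INTO request_stats VALUES (NOW(), '/home', 200);

-- SELECTs work (as full scans); UPDATE and DELETE statements will fail.
SELECT COUNT(*) FROM request_stats;
```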
### MEMORY

The `MEMORY` table type is intended for very fast access to volatile data. `MEMORY` is a great option for caching complex datasets derived from other tables.
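For example, a cache table rebuilt on demand from the durable `accounts` table used earlier might look like this (a sketch; note the `VARCHAR` rather than `TEXT` columns, per the limitations listed further down):

```sql
-- An in-memory cache of active users, rebuilt from the durable accounts table.
CREATE TABLE active_user_cache (
    email      VARCHAR(100) NOT NULL PRIMARY KEY,
    first_name VARCHAR(50)  NOT NULL,
    last_name  VARCHAR(80)  NOT NULL
) ENGINE = MEMORY;

INSERT INTO active_user_cache
SELECT email, first_name, last_name
FROM accounts
WHERE is_active = 1;
```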
`MEMORY` tables should not be used blindly just because they promise fast access: on memory-starved servers, `InnoDB` can be faster, so you should benchmark.
Additionally, because of its volatile nature, all data is lost if the server is restarted. While this might be obvious, it also means that if the table is replicated, all slaves will continue to contain the previous data set until the master has been able to [automatically] replicate a `DELETE` statement.
`MEMORY` tables have several limitations:
- Does not support `BLOB` or `TEXT` columns
- Can suck up memory due to fixed-block allocation
- Limited by `max_heap_table_size` setting, which is also used for temporary tables
### BLACKHOLE

`BLACKHOLE` seems like a bit of an oddity; after all, what’s the point of a storage engine that intentionally doesn’t store anything?

However, `BLACKHOLE` has two great use cases.
First, it’s important to realize that doing an `INSERT` into a `BLACKHOLE` table is essentially a no-op. This means that you can use it for a number of checks where the end result doesn’t matter. These include:
- Testing the validity of a schema dump
- Performance testing, by eradicating the overhead of the table engine
- Performance testing of binary logging, by comparing with and without binlogs enabled
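A quick sketch of that behaviour (the table is invented for illustration):

```sql
-- Every row written here is accepted, validated and then discarded,
-- but the statement still goes through the binary log if it is enabled.
CREATE TABLE audit_sink (
    happened_at DATETIME NOT NULL,
    message     VARCHAR(255) NOT NULL
) ENGINE = BLACKHOLE;

INSERT INTO audit_sink VALUES (NOW(), 'accepted, then dropped');

SELECT COUNT(*) FROM audit_sink; -- always 0
```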
This last one hints at the other purpose of `BLACKHOLE`, and for me it is what makes the engine pretty cool: binary logging and replication still take place.
Being able to replicate data without storing it means that you can offload disk I/O to a slave machine without the fear of losing data if network connectivity is severed; all the standard rules for replication apply. The only I/O that occurs is the writing of the binlog, which is minimal (especially for `STATEMENT`-based replication). This makes `BLACKHOLE` perfect for daisy-chained replication scenarios.
For example, I have seen `BLACKHOLE` used to allow huge volumes of writes to a server that would normally be unable to handle that much traffic, which would then be replicated out to multiple slaves for redundant storage. Now, in this case the slaves themselves would lag when writing, but because the data didn’t need to be readily accessible this was not an issue (millions of records that were aggregated monthly for analyses).
Additionally, you can replicate from a master with non-`BLACKHOLE` tables to a `BLACKHOLE` slave that filters the data using `replicate-do-*` and/or `replicate-ignore-*` options, which then produces a second, filtered binlog that is replicated out.
Because the use of `BLACKHOLE` means that no data is stored, you can run this first slave on the same instance as the primary with little extra overhead, and therefore replicate out only a specific data set, for example to avoid sensitive data being sent to non-secure locations.
### FEDERATED

`FEDERATED` tables can be thought of as a proxy to a table on a different MySQL server. However, they have a number of limitations and, due to network latency, are rarely used. The `FEDERATED` engine, while included, is not enabled by default, is no longer maintained by Oracle, and its use is discouraged.
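For completeness, a sketch of what a `FEDERATED` definition looks like; the host, credentials and names are placeholders, and the engine must first be enabled (e.g. by starting the server with the federated option):

```sql
-- A local table whose rows actually live on a remote MySQL server.
-- The local column definition must match the remote table.
CREATE TABLE remote_accounts (
    first_name VARCHAR(50) NOT NULL,
    last_name  VARCHAR(80) NOT NULL,
    email      VARCHAR(100) NOT NULL
) ENGINE = FEDERATED
  CONNECTION = 'mysql://app_user:secret@db2.example.com:3306/app/accounts';
```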
There is an open source fork, called `FEDERATEDX`, which aims to continue development and overcome some of the limitations of the original implementation. More details can be found here.
## What’s Next
Hopefully you’re now aware that MySQL has a lot more to offer than just its standard two table engines; but as with any tool, you must evaluate it for a given task and ensure that it meets your requirements, and that you can live with the trade-offs.
For more information, the best resource is the Storage Engine section of the MySQL manual.
### Additional Table Engines
These eight table engines that ship with MySQL are just a starting point. In a followup to this post, we will be looking at third-party table engines. Look for that in the coming weeks.
Share your thoughts with @engineyard on Twitter