MySQL Minutiae: Table Engines
Most people know that MySQL supports different table engines, and could probably name the two most common: InnoDB (now the default) and MyISAM.
We’ve all seen the engine specified in a `CREATE TABLE` statement, and most of us simply pick one of these two and happily move on.
But MySQL actually ships with seven engines enabled by default, and another which is not.
All of these engines have specific traits which make them applicable (and more importantly: the best tool) for specific tasks.
Because your MySQL could be compiled differently, you can learn which engines your MySQL server supports, and each engine’s current status, using the `SHOW ENGINES` query. You may see more or fewer than those listed below.
Engine | Description |
---|---|
`InnoDB` | ACID compliant, supports transactions, row-level locking. _Default **since** MySQL 5.5._ |
`MyISAM` | Fast, supports FULLTEXT indexes, table-level locking. _Default **before** MySQL 5.5._ |
`CSV` | Stores data as CSV text files. |
`MERGE` | Allows you to aggregate multiple tables with the same schema. _(Think: `UNION`.)_ |
`ARCHIVE` | Compressed storage, supporting `INSERT` and `SELECT` __only__. No support for indexes. |
`MEMORY` | Stores data in memory, similar to a limited `MyISAM` when it comes to feature-set. |
`BLACKHOLE` | Data goes to `/dev/null`. |
`FEDERATED` | __Not enabled by default__. Data is stored on a remote MySQL instance. (__Note:__ Not as cool as it sounds!) |
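To check what your own server offers, both of the following are useful (the schema name in the second query is a placeholder):

```sql
-- List every engine this server build knows about; the Support column
-- shows YES, NO, DEFAULT or DISABLED for each one.
SHOW ENGINES;

-- You can also check which engine your existing tables use
-- ('mydb' is a placeholder for your own schema name).
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb';
```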
## Which Engine, When?
Just like when you make the choice between `InnoDB` or `MyISAM`, each of these engines has different purposes, strengths, weaknesses and abilities.
### InnoDB
With the release of MySQL 5.5, InnoDB is now the default table engine used. It features ACID compliance, transaction support and row-level locking.
Row-level locking in particular is important as it allows for great concurrency during write operations.
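As a rough sketch of what this buys you (the table and amounts here are invented for illustration), a transfer between two rows can be made all-or-nothing, with only the touched rows locked while it runs:

```sql
-- Illustrative only: an atomic transfer between two rows.
-- With row-level locking, only the rows matched by the WHERE clauses
-- are locked, so other sessions can keep writing elsewhere in the table.
CREATE TABLE account_balances (
    id      INT UNSIGNED NOT NULL PRIMARY KEY,
    balance DECIMAL(10, 2) NOT NULL
) ENGINE = InnoDB;

START TRANSACTION;
UPDATE account_balances SET balance = balance - 100.00 WHERE id = 1;
UPDATE account_balances SET balance = balance + 100.00 WHERE id = 2;
COMMIT; -- or ROLLBACK; and neither update is applied
```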
Engine Yard will be making the change to have InnoDB as the default for new MySQL instances in the near future, and highly recommends the use of InnoDB.

### MyISAM

The MyISAM storage engine was the default prior to MySQL 5.5. Its main advantage is speed; however, this comes at the expense of ACID compliance, transactions and row-level locking: all the features of InnoDB.
MyISAM uses table-level locking, which means that during certain operations the entire table is locked against writes, causing potential concurrency issues.
One feature that MyISAM provides that no other (default) table engine supports is `FULLTEXT` indexes.
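A minimal sketch of what that looks like (the `articles` table is invented for illustration):

```sql
-- Illustrative: a MyISAM table with a FULLTEXT index and a
-- natural-language search against it.
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    body  TEXT NOT NULL,
    FULLTEXT (title, body)
) ENGINE = MyISAM;

SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('storage engines');
```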
### CSV

The `CSV` table engine started as (and largely remains) an example table engine implementation. However, because it stores its data as plain CSV files on disk (using Unix newlines, which Windows users should keep in mind), you can easily pull CSV data into and out of MySQL with it.
```sql
CREATE TABLE user_export (
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(80) NOT NULL,
    email VARCHAR(100) NOT NULL
) ENGINE = CSV;

INSERT INTO user_export
SELECT first_name, last_name, email
FROM accounts
WHERE is_active = 1;
```
This will result in a `user_export.CSV` file in your MySQL data directory that contains the result of the `INSERT ... SELECT`.
You can easily flip this and `INSERT INTO ... SELECT` from the `CSV` table to pull in existing CSV data.
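For example, assuming the `user_export.CSV` file in the data directory now holds the rows you want to load, the import is simply the earlier query reversed (a sketch, reusing the column names from the example above):

```sql
-- Pull rows from the CSV-backed table into a regular table.
INSERT INTO accounts (first_name, last_name, email)
SELECT first_name, last_name, email
FROM user_export;
```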
`CSV` is great for maintaining portability and interoperability, since it is essentially just a CSV file.
### MERGE

The `MERGE` table engine allows you to create a table that represents a `UNION` between two or more __identical__ MyISAM tables.
`MERGE` can allow you to split large datasets across multiple disks, which can help alleviate performance issues when doing schema updates (e.g. adding or removing an index) or when doing table repairs.
However, not all features are available (notably, FULLTEXT indexes cannot be used, although the underlying tables can have them) and index reads are slower.
Merge tables lend themselves to read-heavy data-sets, such as log storage (simple consecutive inserts, many different reads), but the `ARCHIVE` engine (or even a NoSQL solution like MongoDB) might be a better option for your specific use case.
Using `MERGE` is very simple: first define your initial `MyISAM` table, then create your `MERGE` table:
```sql
CREATE TABLE logs_201209 (
    name VARCHAR(50) NOT NULL,
    server VARCHAR(100) NOT NULL,
    line TEXT NOT NULL,
    `date` DATETIME NOT NULL,
    INDEX (name, server)
) ENGINE = MyISAM;

CREATE TABLE logs (
    name VARCHAR(50) NOT NULL,
    server VARCHAR(100) NOT NULL,
    line TEXT NOT NULL,
    `date` DATETIME NOT NULL,
    INDEX (name, server)
) ENGINE = MERGE UNION=(logs_201209) INSERT_METHOD=LAST;
```
Now, any `INSERT` into `logs` will actually go into `logs_201209`:
```
INSERT INTO logs VALUES ('httpd', '192.168.1.2', 'File not found /randompage', NOW());

SELECT * FROM logs_201209;
+-------+-------------+----------------------------+---------------------+
| name  | server      | line                       | date                |
+-------+-------------+----------------------------+---------------------+
| httpd | 192.168.1.2 | File not found /randompage | 2012-10-30 22:58:18 |
+-------+-------------+----------------------------+---------------------+
```
You can then, based on some criteria (say: monthly), add more tables to the `MERGE` using an `ALTER` statement:
```sql
CREATE TABLE logs_201210 (
    name VARCHAR(50) NOT NULL,
    server VARCHAR(100) NOT NULL,
    line TEXT NOT NULL,
    `date` DATETIME NOT NULL,
    INDEX (name, server)
) ENGINE = MyISAM;

ALTER TABLE logs UNION=(logs_201209, logs_201210);
```
If we perform the same `INSERT` then `SELECT` as above, we will no longer see data being added to the `logs_201209` table; instead it will be added to the `logs_201210` table. This is determined by the `INSERT_METHOD` option in the `MERGE` table’s `CREATE TABLE` statement. Possible values for `INSERT_METHOD` are:
- `LAST` — Insert into the last table listed in the `UNION`
- `FIRST` — Insert into the first table listed in the `UNION`
- `NO` or no value (*default*) — Inserts are not permitted and result in an error
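If you later need to change where new rows land, `INSERT_METHOD` is an ordinary table option, so it can be altered in place; a quick sketch using the `logs` table above:

```sql
-- Send new rows to the first table in the UNION instead of the last...
ALTER TABLE logs INSERT_METHOD=FIRST;

-- ...or reject direct inserts into the MERGE table altogether.
ALTER TABLE logs INSERT_METHOD=NO;
```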
In addition to being able to query the individual tables, you can also pull the merged dataset from the `logs` table directly:
```
SELECT * FROM logs;
+-------+-------------+----------------------------+---------------------+
| name  | server      | line                       | date                |
+-------+-------------+----------------------------+---------------------+
| httpd | 192.168.1.2 | File not found /randompage | 2012-10-30 22:58:18 |
| httpd | 192.168.1.2 | File not found /randompage | 2012-10-30 23:00:06 |
+-------+-------------+----------------------------+---------------------+
```
As far as any MySQL client is concerned, this is a single table producing a standard result set, despite the fact that it is actually the result of multiple merged tables.
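If you need to confirm which underlying tables a `MERGE` table currently covers, `SHOW CREATE TABLE` includes the `UNION` list (output abbreviated and illustrative here):

```sql
SHOW CREATE TABLE logs;
-- The output ends with something like:
-- ENGINE=MRG_MyISAM UNION=(`logs_201209`,`logs_201210`) INSERT_METHOD=LAST
```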
### ARCHIVE

The `ARCHIVE` table engine only supports `SELECT` and `INSERT` queries; you cannot `UPDATE` or `DELETE` data from an `ARCHIVE` table. In trade, however, you get compressed storage.
This makes `ARCHIVE` tables great for storing permanent data, especially large data sets that you intend to keep around perpetually, such as analytics, logs, statistics and historical data.
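A quick sketch of the pattern (the table is invented for illustration):

```sql
-- An append-only, compressed table for data you only ever add to and read.
-- Note: no indexes are defined, since ARCHIVE does not support them.
CREATE TABLE request_stats (
    requested_at DATETIME NOT NULL,
    path         VARCHAR(255) NOT NULL,
    status       SMALLINT UNSIGNED NOT NULL
) ENGINE = ARCHIVE;

INSERT INTO request_stats VALUES (NOW(), '/home', 200);

-- SELECTs work (as full scans); UPDATE and DELETE statements will fail.
SELECT COUNT(*) FROM request_stats;
```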
### MEMORY

The `MEMORY` table type is intended for very fast access to volatile data. `MEMORY` is a great option for caching complex datasets derived from other tables.
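For example, a cache table rebuilt on demand from the durable `accounts` table used earlier might look like this (a sketch; note the `VARCHAR` rather than `TEXT` columns, per the limitations listed further down):

```sql
-- An in-memory cache of active users, rebuilt from the durable accounts table.
CREATE TABLE active_user_cache (
    email      VARCHAR(100) NOT NULL PRIMARY KEY,
    first_name VARCHAR(50)  NOT NULL,
    last_name  VARCHAR(80)  NOT NULL
) ENGINE = MEMORY;

INSERT INTO active_user_cache
SELECT email, first_name, last_name
FROM accounts
WHERE is_active = 1;
```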
`MEMORY` tables should not be used blindly just because they promise fast access: on memory-starved servers, `InnoDB` can be faster, so you should benchmark.
Additionally, because of its volatile nature, all data is lost if the server is restarted. While this might be obvious, it also means that if the table is replicated, all slaves will continue to contain the previous data set until the master has been able to [automatically] replicate a `DELETE` statement.
`MEMORY` tables have several limitations:
- Does not support `BLOB` or `TEXT` columns
- Can suck up memory due to fixed-block allocation
- Limited by `max_heap_table_size` setting, which is also used for temporary tables
### BLACKHOLE

`BLACKHOLE` seems like a bit of an oddity; after all, what’s the point of a storage engine that intentionally doesn’t store anything?

However, `BLACKHOLE` has two great use cases.
First, it’s important to realize that doing an `INSERT` into a `BLACKHOLE` table is essentially a no-op. This means that you can use it for a number of checks where the end result doesn’t matter. These include:
- Testing the validity of a schema dump
- Performance testing, by eradicating the overhead of the table engine
- Performance testing of binary logging, by comparing with and without binlogs enabled
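A quick sketch of that behaviour (the table is invented for illustration):

```sql
-- Every row written here is accepted, validated and then discarded,
-- but the statement still goes through the binary log if it is enabled.
CREATE TABLE audit_sink (
    happened_at DATETIME NOT NULL,
    message     VARCHAR(255) NOT NULL
) ENGINE = BLACKHOLE;

INSERT INTO audit_sink VALUES (NOW(), 'accepted, then dropped');

SELECT COUNT(*) FROM audit_sink; -- always 0
```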
This last one hints at the other purpose of `BLACKHOLE`, and for me it is what makes the engine pretty cool: binary logging and replication still take place.
Being able to replicate data without storing it means that you can offload disk I/O to a slave machine without the fear of losing data if network connectivity is severed; all the standard rules for replication apply. The only I/O that occurs is the writing of the binlog, which is minimal (especially for `STATEMENT`-based replication). This makes `BLACKHOLE` perfect for daisy-chained replication scenarios.
For example, I have seen `BLACKHOLE` used to allow huge volumes of writes to a server that would normally be unable to handle that much traffic, which would then be replicated out to multiple slaves for redundant storage. Now, in this case the slaves themselves would lag when writing, but because the data didn’t need to be readily accessible this was not an issue (millions of records that were aggregated monthly for analyses).
Additionally, you can replicate from a master with non-`BLACKHOLE` tables to a `BLACKHOLE` slave that filters the data using `replicate-do-*` and/or `replicate-ignore-*` options, which then produces a second, filtered binlog that is replicated out.
Because the use of `BLACKHOLE` means that no data is stored, you can run this first slave on the same instance as the primary with little extra overhead, and therefore replicate out only a specific data set, for example to avoid sensitive data being sent to non-secure locations.
### FEDERATED

`FEDERATED` tables can be thought of as a proxy to a table on a different MySQL server. However, they have a number of limitations and, due to network latency, are rarely used. The `FEDERATED` engine, while included, is not enabled by default, is no longer maintained by Oracle, and its use is discouraged.
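For completeness, a sketch of what a `FEDERATED` definition looks like; the host, credentials and names are placeholders, and the engine must first be enabled (e.g. by starting the server with the federated option):

```sql
-- A local table whose rows actually live on a remote MySQL server.
-- The local column definition must match the remote table.
CREATE TABLE remote_accounts (
    first_name VARCHAR(50) NOT NULL,
    last_name  VARCHAR(80) NOT NULL,
    email      VARCHAR(100) NOT NULL
) ENGINE = FEDERATED
  CONNECTION = 'mysql://app_user:secret@db2.example.com:3306/app/accounts';
```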
There is an open source fork, called `FEDERATEDX`, which aims to continue development and overcome some of the limitations of the original implementation. More details can be found here.
## What’s Next
Hopefully you’re now aware that MySQL has a lot more to offer than just its standard two table engines; but as with any tool, you must evaluate it for a given task and ensure that it meets your requirements, and that you can live with the trade-offs.
For more information, the best resource is the Storage Engine section of the MySQL manual.
### Additional Table Engines
These eight table engines that ship with MySQL are just a starting point. In a followup to this post, we will be looking at third-party table engines. Look for that in the coming weeks.
Share your thoughts with @engineyard on Twitter