MySQL InnoDB: performance gains as well as some pitfalls

In the vast majority of cases, web sites will be setup with the MyISAM engine for all the Drupal tables.

MyISAM Engine

MyISAM is the default engine on most MySQL installations, and is oriented towards read heavy workload. MyISAM started life for datawarehousing and analytics, where transactions are rarely needed. Its has no support for things that relational databases take for granted nowadays, such as lack of referential integrity, row level locking, ...etc.

InnoDB

For transactional workload, MySQL relies so far on InnoDB from InnoBase. InnoDB has all the features of a "real" relational database engine, including row level locking, referential integrity, ...etc..

Drupal and locking

Drupal, by default, caters for MyISAM, since this is how it started, and this is what most people have on their hosting machines. So, Drupal does table level locking, as per this code in database.mysql.inc:

function db_next_id($name) {
$name = db_prefix_tables($name);
db_query('LOCK TABLES {sequences} WRITE');
$id = db_result(db_query("SELECT id FROM {sequences} WHERE name = '%s'", $name)) + 1;
db_query("REPLACE INTO {sequences} VALUES ('%s', %d)", $name, $id);
db_query('UNLOCK TABLES');

return $id;
}

This code is used every time we assign a new id for an object, for example, a user, a node, a comment, a category term.

There is also this function:

function db_lock_table($table) {
db_query('LOCK TABLES {'. db_escape_table($table) .'} WRITE');
}

And this one to go with it:

function db_unlock_tables() {
db_query('UNLOCK TABLES');
}

These last functions are called in Drupal core (5.x) in the following functions:

./includes/bootstrap.inc:387:  db_lock_table('variable');
./includes/cache.inc:94: db_lock_table($table);
./modules/block/block.module:190: db_lock_table('blocks');

The first one in bootstrap.inc is when setting a variable, which is usually not a frequent operation. The second one in cache.inc is in cache_set(), which can be a frequent operation for the filter cache, and page cache, and can cause contention.

There has been some work on eliminating locking in Drupal, and it should be available in Drupal 6.x.

How InnoDB can help with locking

InnoDB can help with locking because its locks are on the row level, not on the table level.

So, if a busy site has lots of people adding comments, and browsing, the node and comment modules call check_markup(), which in turn calls cache_get() and cache_set().

On a large site, I commented out the filter cache code altogether so as to eliminate the contention for that table.

Similarly, if the page cache is turned on by the site administrator, there could be contention on the table if there are lots of adds on it.

Auto Increment and Locking in InnoDB

Note that InnoDB still locks the entire table when there is and auto increment column. This is to avoid two inserts at the same time giving duplicate values for the auto increment column.

So, watch out for that caveat too. 

InnoDB Pitfalls

However, all is not rosy with InnoDB. Because of its transactional nature, it has bottlenecks of its own. On MyISAM, doing a query that does SELECT COUNT(*) FROM {some_table}, is very fast, since MyISAM keeps the information in the index.

On InnoDB, this info is not stored in an index, and even the index and the data are kept in the same file. So, doing the same query on a table can incur a significant performance penalty.

To check what overhead this has, I wrote a simple test benchmark code. I duplicated a client node table that has 20,243 rows from MyISAM to InnoDB.

On a quiescent AMD 64 machine with MySQL server 5.0.24, doing a SELECT COUNT(*) FROM node takes 0.835 milliseconds on MyISAM, while on InnoDB it takes 12.292 milliseconds!

Running the query a second time causes comparable numbers, with MyISAM taking 0.331 ms, and InnoDB taking 0.488 ms. However, this is due to the query cache having the results in there.

In practice, this affects Drupal in some blocks, like the "popular content" block for example. A client had to turn that block off to save some 1200 milliseconds per page load after they converted to InnoDB.

Resources

Contents: 

Comments

InnoDB with Drupal

I work on a site which uses InnoDB for the row-locking and we had to change a lot of the core functions to get the proper performance. I lay this squarely at the feet of Drupal, whose "database abstraction" isn't really all that abstract -- it basically assumes MySQL+MyISAM. There's also no abstraction for creating/modifying tables, so that the burden of creating tables for multiple databases is placed on the shoulders of the module developers. I'd say 70% of third-party modules install correctly only for MySQL and often explicitly set the table type to MyISAM.

With InnoDB MySQL provides a way of calculating the found rows in a given query: the SQL_CALC_FOUND_ROWS keyword. For example,
SELECT SQL_CALC_FOUND_ROWS u.name, u.uid FROM users

To get the row count one then executes SELECT FOUND_ROWS() to get the previous row count. We replaced the logic in pager_query to do this rather than utilizing the "count SQL" parameter.

Abstractions has its limits

Abstraction of table creation and modification has been discussed, and it seems that the general feeling is leaning towards it.

One has to note that abstraction has its limits, specially when it comes to highly optimized environments.

To squeeze out performance, it is necessary to have custom code or custom SQL, which may be overkill for small to medium sites. The very example you cite would not work on PostgreSQL, and other code is necessary.

Rather than write on blogs, why don't you share your modifications in the form of patches attached to issues on drupal.org so everyone can see/use, and they may be even accepted into core?
--
2bits -- Drupal consulting

what about creating a MySQL+InnoDB abstaction layer...

From what I've read, drupal allows for addition of module-like abstraction layers. Why not create such an abstraction layer, instead of modifying core/current abstraction layer? Sure, you may then need to modify the .install file of the modules you are using, to ensure they are setup properly for the new InnoDB abstraction layer, but that would be quite trivial considering they are likely already set up for MySQL.

InnoDB count(*) limitation

> On InnoDB, this info is not stored in an index, and even the index and the data are kept in the same file. So, doing the same
> query on a table can incur a significant performance penalty.

I don't think having the index and data in separate files has much to do with the limitation. InnoDB keeps multiple versions of each record at any one time in order to boost concurrency[1]. Since in any given transaction the "number of rows" in {some_table} is going to be different, it's harder to maintain this sort of information.

Some applications can get around the SELECT count(*) FROM table issue by replacing it with a SELECT MAX(primary_key) FROM table. This is only going to be correct if your table has a primary key that increments one value at a time, and there are no deleted records.

[1] http://en.wikipedia.org/wiki/Multi-Version_Concurrency_Control

We converted our site to

We converted our site to innodb because of the locking issues, but I think we are having problems with the accesslog from the statistics module. Sadly we need to have it enabled to generate some content (tipical most viewed content and such).

I found some references to putting the table on memcache and then committing it to database every X minutes. What do you think about that? wouldn't that be a problem when the accumulated queries hit the db server?

We need to keep the statistics on, but we definitely need new ideas on how to fight them ;)