Gold Apple Software Web Development by Geoff Appleby

Reducing Drupal's On-Disk Temporary Table Usage

MySQL frequently creates temporary tables to store intermediary query data, such as when sorting a result set.  These tables are stored in memory until they grow past the tmp_table_size or max_heap_table_size configuration variables, unless they contain a BLOB or TEXT column (among some other factors), in which case they will be placed directly on disk.  This can have a significant performance impact due to the extra IO operations, which are particularly important to avoid in a virtualized environment if possible.

This is a challenge with Drupal's default schemas, as any text column expected to be greater than 255 characters in length will be placed in a TEXT column rather than a VARCHAR column.  CCK also stores text fields as TEXT columns if their maximum length is configured to a value greater than 255, and some custom CCK fields such as FileField and Link utilize TEXT columns to store serialized field information.

If you're running a version of MySQL greater than 5.0.3, however, CHAR and VARCHAR fields are no longer limited to 255 characters, and instead you can use them to store up to 64KiB. Do note that the maximum row size is still only 64KiB as well though, so you may only store one full-sized VARCHAR column in a table and adding any other columns means you have to subtract from the space available for your VARCHAR column.  Also, if you use a multi-byte character set it must be possible to allocate the full byte length for each character; for Drupal's default setting of UTF-8 this requires three bytes for every character, even though most latin characters will only use one byte, and so your character limit will be one third the number of bytes available (resulting in a maximum of 21,844 characters).  In many cases no more than a few thousand characters is likely to be stored, so column and row size limits would not be an issue.

Some candidates for this change are:

  • CCK Text fields with a maximum length greater than 255 characters
  • Data column for filefield/imagefield instances
  • Attributes column for Link intstances
  • Description field for taxonomy terms

There are many other TEXT columns in the database[1], but the biggest impact will be on columns that appear frequently, and in queries that will have large result sets. The first three listed will be included in any views query that includes the field they belong to, and I found some queries on taxomony terms included the description column even when it wasn't needed.

For CCK text columns you can set the column size to match the maximum size of the field, but for other columns you should make sure to leave extra space for future content; having a description unexplicably cut short would be confusing for a user, and cutting off a string of serialized PHP data could have significant consequences.  To find the lengths of values already in the column you can run the query SELECT CHAR_LENGTH(ColumnName) as cl, COUNT(*) FROM TableName GROUP BY cl DESC and then use the values as a guideline for how much slack space would be needed.

The following graphs show the result of this change:

Graph - Yearly temporary table usage Graph - Monthly temporary table usage

The yearly graph initially shows how the on-disk temporary table usage grew in proportion as the server usage increased; approximately 35%.  Once the changes were made to the schema of the largest website on the server, on-disk tables dropped and remain relatively constant despite fluctations in total temporary tables. The server currently reports 10% of all temporary tables go to disk, but this could likely be improved by another small amount by converting the schema of the other smaller websites on the server.

The periodic, large spikes in the monthly graph are due to the nightly database backup, showing that large result sets are causing significantly more on-disk tables than regular usage and are likely the biggest factor in the average shown on the yearly graph.  Unfortunately I don't have a historical graph to show a similar time frame before the changes were made.


[1] To find all the TEXT columns in the database, you can run SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME FROM `information_schema`.`COLUMNS` WHERE TABLE_SCHEMA = 'DatabaseName' AND DATA_TYPE in ('text', 'smalltext', 'mediumtext', 'longtext') ORDER BY TABLE_NAME

Comments

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <br> <p>
  • Lines and paragraphs break automatically.
  • Each email address will be obfuscated in a human readable fashion or (if JavaScript is enabled) replaced with a spamproof clickable link.

More information about formatting options

By submitting this form, you accept the Mollom privacy policy.