XML Sitemap 6.x-2.x: How Drupal modules can overload a site during cron, with solutions
One of the most useful features in Drupal is its cron hook. It allows modules to run tasks at regular intervals whenever cron.php is invoked.
Modules use it for many purposes, such as indexing content newly added to the site and cleaning up old watchdog entries.
In many cases, though, cron hooks implemented by modules can place an added burden on a site. This is especially true if you run cron too frequently or too infrequently.
A recent example came up with a client. They are using xmlsitemap, as many sites do.
This module has had scalability issues in the past, as we described for an older version in "XML Sitemap module performance issues and how to avoid them".
We were thrilled to see that finally the module has a new maintainer who is focusing on performance and scalability. See Dave Reid's 6.x-2.x-dev series.
The site in question is not particularly large or high traffic: it gets only about 70,000 page views a day and has about 7,500 nodes and 24,500 users. It runs cron three times an hour, though (at 0, 20 and 40 minutes past the hour).
When we first enabled the module, we saw a surge in database load whenever cron ran, as evidenced in this graph:
The solution was really simple: in xmlsitemap's settings, change the "Minimum sitemap lifetime" to a higher value. In our case, we set it to 3 hours, which should not affect the module's functionality or its submission of updates to search engines.
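To see why a longer minimum lifetime helps, here is a minimal Python sketch of the throttling idea. It is not the module's actual code: the names (SitemapState, cron_run, simulate_day) are hypothetical, and it assumes the worst case, where content changes before every cron run. Cron fires every 20 minutes, as on the client's site, and the sitemap is only rebuilt when the previous copy is at least as old as the minimum lifetime.

```python
from dataclasses import dataclass
from typing import Optional

CRON_INTERVAL = 20 * 60      # cron runs at 0, 20 and 40 minutes past the hour
SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class SitemapState:
    """Hypothetical stand-in for the settings and counters a sitemap module keeps."""
    minimum_lifetime: int                 # seconds; 0 means no throttling
    generated_last: Optional[int] = None  # time of the last rebuild, if any
    regenerations: int = 0


def cron_run(state: SitemapState, now: int, links_changed: bool = True) -> bool:
    """Rebuild the sitemap only if something changed and the old copy is old enough."""
    if not links_changed:
        return False
    if (state.generated_last is not None
            and now - state.generated_last < state.minimum_lifetime):
        return False  # previous sitemap is still within its minimum lifetime
    state.generated_last = now
    state.regenerations += 1
    return True


def simulate_day(minimum_lifetime: int) -> int:
    """Count rebuilds over 24 hours, assuming content changes before every run."""
    state = SitemapState(minimum_lifetime)
    for now in range(0, SECONDS_PER_DAY, CRON_INTERVAL):
        cron_run(state, now)
    return state.regenerations


if __name__ == "__main__":
    print("no minimum lifetime:", simulate_day(0))
    print("3 hour minimum lifetime:", simulate_day(3 * 60 * 60))
```

Under these assumptions, the sitemap is rebuilt on every one of the 72 daily cron runs when there is no minimum lifetime, but only 8 times a day with a 3 hour minimum, while cron itself keeps running on its usual schedule for every other module.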