XML Sitemap 6.x-2.x: How Drupal modules can overload a site during cron, with solutions

One of the most useful features in Drupal is its cron hook. It allows modules to run tasks at regular intervals, whenever cron.php is run.

Modules use it for many things, such as indexing new content that was added to the site and cleaning up old watchdog entries.

In many cases, though, cron hooks implemented by modules can add load to a site. This is especially true if you run cron too frequently or too infrequently.

A recent example came up with one of our clients. They use xmlsitemap, as many sites do.

This module has had scalability issues in the past, as we wrote about an older version in "XML Sitemap module performance issues and how to avoid them".

We were thrilled to see that the module finally has a new maintainer who is focusing on performance and scalability. See Dave Reid's 6.x-2.x-dev series.

The site in question is not particularly large or high traffic: it gets only about 70,000 page views a day and has about 7,500 nodes and 24,500 users. It runs cron three times an hour, though (at 0, 20 and 40 minutes after the hour).

When we first enabled the module, we saw a surge in database load whenever cron was running, as evidenced in this graph:

The solution was really simple: in xmlsitemap's settings, change the "Minimum sitemap lifetime" to a higher value. In our case, we set it to 3 hours, which should not affect the module's functionality or its submission of updates to search engines.
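
To see why this helps, here is a rough model of the idea in Python. It is not the module's actual code (xmlsitemap is written in PHP); it only assumes that a sitemap is rebuilt on a cron run when at least the minimum lifetime has passed since the last rebuild. The function names and the 24-hour window are made up for illustration.

```python
# Simplified model (an assumption, not xmlsitemap's real logic): the sitemap
# is rebuilt during a cron run only if it is at least `minimum_lifetime`
# seconds old.

MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR


def should_regenerate(now, last_regenerated, minimum_lifetime):
    """Return True when the sitemap has reached its minimum lifetime."""
    return now - last_regenerated >= minimum_lifetime


def count_regenerations(cron_interval, minimum_lifetime, duration):
    """Simulate cron runs over `duration` seconds and count sitemap rebuilds."""
    last_regenerated = None
    rebuilds = 0
    for now in range(0, duration, cron_interval):
        # The very first run always builds the sitemap.
        if last_regenerated is None or should_regenerate(
            now, last_regenerated, minimum_lifetime
        ):
            rebuilds += 1
            last_regenerated = now
    return rebuilds


# Cron every 20 minutes, as on the client's site.
every_run = count_regenerations(20 * MINUTE, 0, DAY)         # no minimum lifetime
throttled = count_regenerations(20 * MINUTE, 3 * HOUR, DAY)  # 3-hour minimum

print(every_run, throttled)
```

Under these assumptions, a day of cron runs every 20 minutes triggers 72 rebuilds with no minimum lifetime, but only 8 with a 3-hour minimum, which is the kind of reduction in work that would explain the drop in database load.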


Comments

Thanks for the mention!

Glad to know the new version is working for you as expected! The default settings for the minimum lifetime probably should be one day or 12 hours, but I was getting too many support requests from people wondering why their sitemap wasn't updating. :)

I'll probably write up a section in the README.txt on some default recommendations/settings for high-capacity sites.

Graph?

Where is the graph? It appears to be missing.

Shows up now

Shows up now. Darn input formats!