XML sitemap module performance issues and how to avoid them

The XML Sitemap (formerly gsitemap) module provides useful SEO features that let Google and other search engines know what content a site has.

The module does, however, have scalability issues if you are using an old version or have its options set incorrectly.

Disable sitemap submission upon update

On relatively small sites, with a small number of nodes and comments, scalability issues are not observed. However, on large sites, there can be a hidden performance issue when the "Submit site map when updated" option is enabled.

With this option enabled, hook_exit() checks for new content (nodes, comments, etc.), and if there is any, the XML sitemap module tries to submit the site map to several search engines. It does this by generating the map, then opening an HTTP connection to each search engine and sending the map over that connection. This is a very time-consuming process.

On a site with 15,000 nodes and 119,000 comments, this caused every comment submission to take 30 to 60 seconds.

Disabling this option made comment submission go back to being instantaneous. Make sure that you have cron enabled.
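The difference between the two settings can be sketched with a toy model. This is only an illustration, not the module's actual code: the class, method names, and search engine list are hypothetical, and it assumes that with the option disabled the submission work is left to cron.

```python
import time

# Hypothetical placeholders; the module's real target list is not shown here.
SEARCH_ENGINES = ["engine-a", "engine-b", "engine-c"]


class SitemapSubmitter:
    """Toy model of the two strategies (not the module's real code)."""

    def __init__(self, submit_on_update, cost_per_engine=0.0):
        self.submit_on_update = submit_on_update
        self.cost_per_engine = cost_per_engine  # seconds per submission
        self.dirty = False      # True when content changed since last submit
        self.submissions = 0    # total HTTP submissions performed

    def _submit_everywhere(self):
        for _engine in SEARCH_ENGINES:
            # Stand-in for generating the map and sending it over HTTP.
            time.sleep(self.cost_per_engine)
            self.submissions += 1
        self.dirty = False

    def on_content_saved(self):
        """Runs at the end of a page request that added a node or comment."""
        self.dirty = True
        if self.submit_on_update:
            # The visitor waits for every search engine on every save.
            self._submit_everywhere()

    def on_cron(self):
        """Runs in the background, outside any visitor's request."""
        if self.dirty:
            self._submit_everywhere()


eager = SitemapSubmitter(submit_on_update=True)
deferred = SitemapSubmitter(submit_on_update=False)
for _ in range(5):          # five comments posted
    eager.on_content_saved()
    deferred.on_content_saved()
deferred.on_cron()          # one batched submission later
```

In this model, five comments cost the eager setup fifteen submissions inside visitor requests, while the deferred setup performs three, all during the cron run.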

Here is a screenshot of how the options should be.

Make sure you are using a current version

Versions prior to May 3, 2007 (5.x-1.10) suffer from severe performance issues if your site has a few thousand nodes. The reason is a superfluous LEFT JOIN with a CONCAT in the join condition.
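To see why such a join can be dropped safely, here is a minimal sketch using an in-memory SQLite database. The table layout and both queries are assumptions made for illustration; the module's actual query is not reproduced here. SQLite spells concatenation as `||` rather than CONCAT. The sketch also assumes each node has at most one alias row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified tables: nodes plus path aliases keyed by "node/<nid>".
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, changed INTEGER);
    CREATE TABLE url_alias (src TEXT, dst TEXT);
""")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(nid, 1000 + nid) for nid in range(1, 6)],
)
conn.executemany(
    "INSERT INTO url_alias VALUES (?, ?)",
    [("node/1", "about"), ("node/3", "contact")],
)

# The join condition is built by string concatenation for every node row,
# yet no column of the joined table is selected.
WITH_JOIN = """
    SELECT n.nid, n.changed
    FROM node n
    LEFT JOIN url_alias a ON a.src = 'node/' || n.nid
    ORDER BY n.nid
"""

# Same result set without the join.
WITHOUT_JOIN = """
    SELECT n.nid, n.changed
    FROM node n
    ORDER BY n.nid
"""

rows_with_join = conn.execute(WITH_JOIN).fetchall()
rows_without_join = conn.execute(WITHOUT_JOIN).fetchall()
```

Both queries return the same rows in this setup, so the join only adds work.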

I submitted a patch for this bug in issue #124325, and thankfully, it got fixed. Make sure you use that version or, if you are still on 4.7, backport the simple patch.

On a site with 15,000 nodes

They are called pages, not nodes: 15,000 pages (speak English, not Drupal).

There is a difference

No, in this context, node makes perfect sense.

A node is a specific Drupal "object" with a specific representation in the database.

A page can be from a node, or from other aspects of a Drupal site, such as node lists, categories, users and many other things.

So, the use of "node" instead of "page" is intentional.
--
2bits -- Drupal consulting