XML sitemap module performance issues and how to avoid them

The XML Sitemap (formerly gsitemap) module provides useful SEO features that let Google and other search engines know what content a site has.

However, the module has scalability issues if you are using an old version or have its options set incorrectly.

Disable sitemap submission upon update

On relatively small sites, with a small number of nodes and comments, scalability issues are not observed. However, on large sites, there can be a hidden performance issue when the "Submit site map when updated" option is enabled.

When this option is enabled and there is new content (nodes, comments, etc.), the XML Sitemap module tries to submit the site map to several search engines in hook_exit(). It does this by generating the map, then opening an HTTP connection to each search engine and sending the map over that connection. This is a very time-consuming process.

On a site with 15,000 nodes and 119,000 comments, this caused every comment submission to take 30 to 60 seconds.

Disabling this option made comment submission go back to being instantaneous. Make sure that you have cron enabled.
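To make the difference concrete, here is a minimal sketch in Python (not Drupal code) of moving the submission out of the request path. It assumes that, with the option disabled, the module submits the site map during cron runs instead, which is what the advice to keep cron enabled suggests. The class, the method names and the engine names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SitemapNotifier:
    """Toy model of deferring sitemap submission from page requests to cron."""

    # Hypothetical callable that would contact one search engine (slow in real life).
    submit: Callable[[str], None]
    # Hypothetical search engine names, for illustration only.
    engines: List[str] = field(default_factory=lambda: ["engine-a", "engine-b"])
    dirty: bool = False

    def on_content_saved(self) -> None:
        # Request path: only record that the sitemap is stale. No network I/O.
        self.dirty = True

    def on_cron(self) -> int:
        # Cron path: submit once per run, however many saves happened since.
        if not self.dirty:
            return 0
        for engine in self.engines:
            self.submit(engine)
        self.dirty = False
        return len(self.engines)


calls: List[str] = []
notifier = SitemapNotifier(submit=calls.append)

for _ in range(100):          # 100 comments posted between two cron runs
    notifier.on_content_saved()

sent = notifier.on_cron()     # one batch of submissions
idle = notifier.on_cron()     # nothing new, so nothing is sent
print(sent, idle, calls)      # 2 0 ['engine-a', 'engine-b']
```

In this model, a hundred comment submissions cost two outbound requests in total, and none of them delays the user who posted the comment.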

Here is a screenshot of how the options should be.

Make sure you are using a current version

Versions prior to May 3, 2007 (5.x-1.10) suffer from severe performance issues if your site has a few thousand nodes. The reason is a superfluous LEFT JOIN with a CONCAT.

I submitted a patch for this bug in issue #124325, and thankfully, it got fixed. Make sure you use a version that includes the fix. If you are still on 4.7, you can backport the simple patch.
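As an illustration of what "superfluous" means here, the following sketch uses Python's sqlite3 module with a made-up schema. The table and column names are assumptions and this is not the module's actual query. SQLite's `||` operator stands in for MySQL's CONCAT. When no column from the joined table is used and each node matches at most one row, dropping the join leaves the result unchanged, so the database does the extra work for nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified tables for illustration only.
    CREATE TABLE node (nid INTEGER PRIMARY KEY, changed INTEGER);
    CREATE TABLE url_alias (src TEXT UNIQUE, dst TEXT);
    """
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?)", [(i, 1000 + i) for i in range(1, 6)]
)
conn.executemany(
    "INSERT INTO url_alias VALUES (?, ?)",
    [("node/1", "about"), ("node/3", "contact")],
)

# Join on a string built for every row; no url_alias column is selected.
with_join = conn.execute(
    """
    SELECT n.nid, n.changed
    FROM node n
    LEFT JOIN url_alias ua ON ua.src = 'node/' || n.nid
    ORDER BY n.nid
    """
).fetchall()

# Same query with the join removed.
without_join = conn.execute(
    "SELECT n.nid, n.changed FROM node n ORDER BY n.nid"
).fetchall()

print(with_join == without_join)  # True
```

On five rows the cost is invisible. The article's point is that on a site with thousands of nodes, the per-row string concatenation and lookup add up.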


Comments

On a site with 15,000

On a site with 15,000 nodes

They are called pages, not nodes: 15,000 pages (speak English, not Drupal).

There is a difference

No, in this context, node makes perfect sense.

A node is a specific Drupal "object" with a specific representation in the database.

A page can be from a node, or from other aspects of a Drupal site, such as node lists, categories, users and many other things.

So, the use of "nodes" instead of "pages" is intentional.
--
2bits -- Drupal consulting

Nodes are different from pages

For Drupal, nodes are different from pages; "page" is a generic term that can also mean a settings page.

A Drupal site can have 15,000 nodes and still have more than 15,000 pages, because its pages include every page output by a module that does not correspond to node content.

In the specific context of the article, the problem is the number of nodes the site has, not the number of pages shown by all the modules installed on the site.