Google Crawler hitting your site too aggressively?

If your Drupal site suffers occasional slowdowns or outages, check whether crawlers are hitting your site too hard.

We've seen several clients complain about this, and upon investigation we found that the culprit was Google's own crawler.

The telltale sign is that you will see lots of queries executing with high offsets in their LIMIT clauses. Depending on your site's specifics, these queries are often slow queries as well.

This means that crawlers are accessing very old content, paging hundreds of pages back through your listings.
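One quick way to confirm this (a rough sketch; the credentials, log path, and the sample query below are illustrative, not taken from any particular site) is to look at the live MySQL process list, or the slow query log if it is enabled, for queries with large LIMIT offsets:

    # Show the full text of currently running queries and keep those with a
    # LIMIT clause. A deep-pagination offender looks something like:
    #   SELECT ... FROM node n ... ORDER BY n.created DESC LIMIT 4950, 25
    mysqladmin --verbose -u root -p processlist | grep -i 'LIMIT'

    # If the slow query log is enabled, the same pattern shows up there too
    # (the path below is a common Debian/Ubuntu default and may differ on your server).
    grep -i 'LIMIT' /var/log/mysql/mysql-slow.log | tail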

Here is an example from a recent client:

How Google and Bing crawlers were confused by Quick Tabs

Quick Tabs is a widely used Drupal module. Site builders like it because it improves usability in some cases by reducing clutter.

Incidentally, the way this module works has caused us to run into performance issues with certain uses. See our previous articles: Quick Tabs can sure use more caching, and a case study involving Quick Tabs.

Identifying aggressive crawlers using GoAccess

Aggressive crawlers that hit your web site a lot can cause performance problems.

There are many ways to identify aggressive crawlers, including writing custom scripts that analyze your web server logs.
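For example, a one-liner along these lines (assuming an Apache or Nginx combined-format access log at /var/log/apache2/access.log; adjust the path for your server) ranks user agents by request count:

    # In the combined log format the user-agent string is the sixth
    # double-quote-delimited field; count requests per agent, top 10 first.
    awk -F'"' '{print $6}' /var/log/apache2/access.log \
      | sort | uniq -c | sort -rn | head -10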

One tool that we found useful for analyzing which crawlers hit the site the most today or yesterday is GoAccess.
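A typical invocation looks like the following (again assuming the Apache log path above; depending on the GoAccess version you may also be prompted for, or need to configure, the log and date formats):

    # Interactive report for the whole log:
    goaccess -f /var/log/apache2/access.log

    # Only yesterday's traffic: filter by date into a temporary file first.
    grep "$(date --date=yesterday +%d/%b/%Y)" /var/log/apache2/access.log > /tmp/yesterday.log
    goaccess -f /tmp/yesterday.log

The user-agent breakdown in the resulting report shows which crawlers account for the most requests.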

Getting GoAccess

GoAccess is packaged for Ubuntu starting with Natty Narwhal (11.04), but not for earlier LTS releases.
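On 11.04 or later it installs straight from the repositories; on earlier releases you can build it from source. The following is a sketch, assuming the usual autotools build and the ncurses development headers as a build dependency (package names and other dependencies may vary by version):

    # Ubuntu 11.04 (Natty) or later:
    sudo apt-get install goaccess

    # Earlier releases: install build dependencies, then download and unpack
    # the GoAccess source tarball from its website and build it.
    sudo apt-get install build-essential libncurses5-dev
    cd goaccess-*/
    ./configure && make && sudo make install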
