Reducing server resource utilization for busy sites by implementing fast 404s in Drupal

A default Drupal installation handles 404s for static files within Drupal itself. In other words, a 404 for a .css or .jpg file causes a full Drupal bootstrap. This is unnecessary and wastes resources on a site that gets lots of 404s: a lot of code is executed and many database queries are performed, only to return a 404 for a static file.

Many of these 404s are invisible. They are normally caused by elements of a page that does exist, such as a .css file, a small .png, or similar items. In effect, no human ever sees these 404s; only browsers do, internally.

For Drupal 5.x and 6.x, the following code snippet added to the bottom of your settings.php will reduce resource usage considerably on a busy site.

We measured this on one site and found that the time taken to process and send a 404 dropped from 24 milliseconds to 6 milliseconds, and even to 2 milliseconds in some cases.

On a low traffic site, this will not make much of a difference. However, on a large site with millions of page views per day, this will add up into real savings in terms of CPU, memory and disk I/O.
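To put rough numbers on that, the sketch below multiplies the per-request saving measured above (24 ms down to 6 ms) by a daily 404 count. The figure of one million 404s per day is purely a hypothetical input, not a measurement from any site, and the function name is made up for illustration.

```php
<?php
// Hypothetical helper: hours of request-processing time saved per day.
function fast_404_hours_saved($before_ms, $after_ms, $count_per_day) {
  // Saving per request, multiplied by the number of 404s served.
  $saved_ms = ($before_ms - $after_ms) * $count_per_day;
  // Convert milliseconds to hours.
  return $saved_ms / 1000 / 3600;
}

// 24 ms and 6 ms come from the measurement above;
// 1,000,000 is an assumed daily 404 count.
$hours = fast_404_hours_saved(24, 6, 1000000);
printf("%.1f hours saved per day\n", $hours);
```

With these assumed inputs the saving works out to 5 hours of processing time per day; a site with fewer 404s scales down proportionally.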

Here is the snippet that we use in settings.php.

// List of extensions for static files
$exts = 'txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp|xml';

// It is not an imagecache path, which we allow to go through Drupal
if (!strpos($_SERVER['QUERY_STRING'], 'imagecache')) {
  // It is not our main feed page
  if ($_SERVER['QUERY_STRING'] != 'rss.xml') {
    // Is it a static file?
    if (preg_match('/\.(' . $exts . ')$/', $_SERVER['QUERY_STRING'])) {
      // Just send a 404 right now ...
      header('HTTP/1.0 404 Not Found');
      print '<html>';
      print '<head><title>404 Not Found</title></head>';
      print '<body><h1>Not Found</h1>';
      print '<p>The requested URL was not found on this server.</p>';
      print '</body></html>';
      exit();
    }
  }
}

The ultimate solution though is to get this in Drupal core. Issue #76824, which I submitted more than 4 years ago, aims at doing just that.

Let us try to get this in for Drupal 7.

Comments

Could work too

We tried that early on in the issue linked to above.

Drupal still needs to handle imagecache 404s so that it can generate the presets when a 404 occurs. There are also private file downloads, and .htaccess is not enough in that case. And if you use an .htaccess-based solution, it will be overwritten when you upgrade to a newer version of Drupal.

By putting this in settings.php, it is safe from being overwritten, and early enough in the bootstrap to be very low overhead.

We use it on all high traffic client sites.

Just thought .htaccess based

Just thought an .htaccess-based solution (if possible) would be even better for high traffic sites, since no Drupal bootstrap is involved. Like Boost does. For imagecache, you can just look for the imagecache path in the request.

Now that I think more about it, this would be a great feature request for the Boost module. It already has rules to put into .htaccess, and there just needs to be an additional rule for managing static files, right?

Excellent idea. My server

Excellent idea. My server doesn't really need this, as load is always low, but I'll implement it anyway. Not doing so would feel like leaving the TV on while going away for the weekend. Why use all that energy if no one is watching the produce?

Looks like it's in..

Congratulations, the wait was worth it: http://drupal.org/node/76824#comment-3073456

Not yet

No, it is not in yet. It is RTBC only.

Let us hope Dries or webchick commit it soon.

Carefully select some of the extensions

The extensions of robots.txt and sitemap.xml (paths created by the robotstxt and xmlsitemap modules), and any extensions used in specific path aliases such as index.asp or mypage.shtml, should be left out of the extension list; otherwise those paths return a 404 error. On our Pressflow sites, sitemap.xml and robots.txt were returning the 404 error page created above. I set $exts = 'png|gif|jpe?g|css|js|ico|swf|flv';

Yes, tailor it to your site ...

Yes, tailor it to your site for sure. Each site has a different set of files that can be excluded/included.

For robots.txt, this code should not affect it in any way, since it gets triggered only if the static file does not exist. If it exists, it will be served directly by the web server itself.

You can easily adjust the code to allow sitemap.xml and rss.xml to go through Drupal, but return 404s for other .xml files.
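One way to make that adjustment is an explicit allow-list checked before the extension test. The sketch below is an assumption about how it could look, not code from the original post: the function name `fast_404_wants_404` and the `$allowed` list are hypothetical, and it assumes the requested path is passed in as a plain string.

```php
<?php
// Hypothetical helper, not part of Drupal. $path is the requested path,
// e.g. 'sitemap.xml' or 'sites/default/files/missing.png'.
function fast_404_wants_404($path, $exts, $allowed) {
  // Paths that Drupal modules generate are passed through untouched.
  if (in_array($path, $allowed, TRUE)) {
    return FALSE;
  }
  // Everything else with a static file extension gets the fast 404.
  return preg_match('/\.(' . $exts . ')$/', $path) === 1;
}

// Illustrative extension list that still includes xml.
$exts = 'png|gif|jpe?g|css|js|ico|xml';
// Assumed list of Drupal-generated paths; adjust to the modules you run.
$allowed = array('rss.xml', 'sitemap.xml');
```

Here `fast_404_wants_404('sitemap.xml', $exts, $allowed)` is false, so the request reaches Drupal, while any other path ending in .xml is rejected early.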

good resources

My website was developed in Drupal 6. I tried adding this code, but an error is showing. I couldn't work out the solution. How can I solve it?

When running cron via drush,

When running cron via drush, I was getting several PHP notices from settings.php:
Undefined index:  QUERY_STRING settings.php:295 [0.08 sec, 5.96MB [notice]

You can avoid these by making the following tweak to the initial code:

Original: if (!strpos($_SERVER['QUERY_STRING'], 'imagecache')) {
Fixed: if (isset($_SERVER['QUERY_STRING']) && !strpos($_SERVER['QUERY_STRING'], 'imagecache')) {
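The guard above can also be folded into a small helper, together with a stricter `strpos()` comparison. This is only a sketch: the function name `fast_404_is_missing_static_file` is hypothetical and not part of Drupal, and the shortened extension list is chosen for illustration. Note that `strpos()` returns 0, which PHP treats as false, when the needle sits at the very start of the string, so comparing against `FALSE` with `===` or `!==` is the safer test.

```php
<?php
// Hypothetical helper, not part of Drupal: decides whether a request
// should get the fast 404. $server is normally $_SERVER.
function fast_404_is_missing_static_file($server, $exts) {
  // QUERY_STRING is absent on the command line (e.g. drush cron).
  if (!isset($server['QUERY_STRING'])) {
    return FALSE;
  }
  $query = $server['QUERY_STRING'];

  // Let imagecache paths and the main feed go through Drupal.
  // strpos() returns 0 for a match at the start, so compare strictly.
  if (strpos($query, 'imagecache') !== FALSE || $query == 'rss.xml') {
    return FALSE;
  }

  // Anything else ending in a listed extension is treated as a static file.
  return preg_match('/\.(' . $exts . ')$/', $query) === 1;
}

// Illustrative extension list; tailor it to your site.
$exts = 'png|gif|jpe?g|css|js|ico';
```

In settings.php, the 404 headers and body would then be sent only when `fast_404_is_missing_static_file($_SERVER, $exts)` returns true.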