Microsoft-WebDAV-MiniRedir + Drupal singlesignon = An aggressive crawler

When tuning sites for clients, we often see the usual symptoms and causes of why a site is slow, and occasionally we find an unusual reason.

For one client, after we diagnosed the main reason the site was slow (a bottleneck between the database server and the web server), they continued to experience slowdowns.

The slowdowns happened several times a day, and their customers were complaining about them daily.

We started to investigate again, and found that Apache was being hit as many as 8 times per second by the same IP address.
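A per-second hit rate like this can be read off the access log. The following is a minimal sketch, assuming Apache's common or combined log format (client IP in the first field, bracketed timestamp in the fourth); the function name busiest_seconds and the log path in the usage comment are illustrative and not part of the original setup.

```shell
#!/bin/sh
# busiest_seconds: find the (IP, second) pairs with the most requests.
# Reads an Apache access log on stdin and prints "count ip timestamp"
# lines, highest count first. $1 = how many lines to show (default 10).
busiest_seconds() {
    # $1 is the client IP, $4 looks like "[17/Mar/2009:03:06:15";
    # substr() drops the leading bracket.
    awk '{ print $1, substr($4, 2) }' \
        | sort | uniq -c | sort -rn \
        | head -n "${1:-10}" \
        | awk '{ print $1, $2, $3 }'
}

# Example (the log path is an assumption; adjust for your server):
#   busiest_seconds 5 < /var/log/apache2/access.log
```

A count of 8 next to a single IP and timestamp would correspond to the 8 hits per second described above.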

So, we thought "crawlers!" and started blocking IPs.

Attempt 1: Blocking IP addresses

Although we can block IP addresses at the Drupal level, we think this is too inefficient: by the time a request reaches that stage in Drupal, it has already consumed significant resources. Blocking at the network layer in the kernel is far more efficient, although it has the drawback of having to be done from the command line, via the command:

iptables -I INPUT  -s x.x.x.x -j DROP

That worked and the load on the site was instantly reduced. We were doing less than 1 request per second sustained.

The problem was that we had to continually monitor Apache's access log, see which IP addresses were aggressive, and then block them manually.

A bigger problem was that we found we were blocking IP addresses that belonged to legitimate users of the site! Not good.
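The manual log check can be partly scripted. This is a minimal sketch, assuming the access log is in Apache's common or combined format with the client IP in the first field; the function name top_ips and the log path in the usage comment are illustrative, not something from the original setup.

```shell
#!/bin/sh
# top_ips: rank client IPs by number of requests.
# Reads an Apache access log on stdin and prints "count ip" lines,
# busiest first. $1 = how many IPs to show (default 10).
top_ips() {
    awk '{ print $1 }' \
        | sort | uniq -c | sort -rn \
        | head -n "${1:-10}" \
        | awk '{ print $1, $2 }'
}

# Example (the log path is an assumption; adjust for your server):
#   tail -n 5000 /var/log/apache2/access.log | top_ips 5
```

This only lists candidates. It does not distinguish a crawler from a busy legitimate user, so the output still needs a human look before anything is passed to iptables.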

Attempt 2: Apache Ignoring WebDAV OPTIONS

We then looked at the incoming requests more closely and found that all of them were WebDAV HTTP requests using the OPTIONS HTTP method.
So, we thought that blocking those at the Apache level would do the trick.
We used this snippet in the virtual host:

RewriteCond %{REQUEST_METHOD} OPTIONS
RewriteRule ^.*$ http://example.com/

What we were doing here is redirecting all OPTIONS requests to an outside server.

That still did not work: the clients just kept hammering the site, and other WebDAV methods, such as PROPFIND, were being used as well.

Attempt 3: Patching Drupal

We then found that all these requests were coming from a specific user agent called Microsoft-WebDAV-MiniRedir, for example:

99.99.99.99 - - [17/Mar/2009:03:06:15 +0000] "OPTIONS /singlesignon/initial_check?slave_session=xxxxxxxx&singlesignon_dest=http%3A%2F%2Fwww.example.com%2F HTTP/1.1" 302 - "-" "Microsoft-WebDAV-MiniRedir/5.1.2600"
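A breakdown of which HTTP methods a given user agent is sending can be pulled from the same log. The following is a rough sketch, assuming the combined log format shown in the line above (request line, referer and user agent each in double quotes); the function name methods_for_agent and the log path in the usage comment are illustrative.

```shell
#!/bin/sh
# methods_for_agent: count requests per HTTP method for one user agent.
# Reads a combined-format access log on stdin and prints "count METHOD"
# lines, most frequent first. $1 = substring to look for in the
# User-Agent field.
methods_for_agent() {
    # Splitting on double quotes: field 2 is the request line
    # ("OPTIONS /path HTTP/1.1"), field 6 is the user agent.
    awk -F'"' -v ua="$1" '
        index($6, ua) { split($2, req, " "); print req[1] }
    ' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Example (the log path is an assumption; adjust for your server):
#   methods_for_agent 'Microsoft-WebDAV-MiniRedir' < /var/log/apache2/access.log
```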

We also found that this particular piece of software does not understand redirects.

The root cause was a combination of the Drupal Shared Sign On / Single Sign On module, and that MiniRedir user agent.

Searching Google for MiniRedir did not help, other than turning up reports that it hits sites hard.

We found that singlesignon works by setting session variables and then redirecting more than once. And because MiniRedir is dumb and does not understand a 302 redirect, it just keeps hitting the site again, which causes a redirect, so MiniRedir hits it again, ad infinitum.

So, we opted for a Drupal patch that goes in settings.php and ends the request with a 404 if the method is anything other than GET or POST, like so:

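switch ($_SERVER['REQUEST_METHOD']) {
  case 'GET':
  case 'POST':
    // Normal page requests: let Drupal handle them.
    break;
  default:
    // Any other method, including OPTIONS, HEAD and WebDAV
    // methods such as PROPFIND: end the request with a 404.
    header('HTTP/1.1 404 Not Found');
    exit();
}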

Luckily, MiniRedir takes the hint when a 404 is sent and stops hammering the site.

This works well, and so far has no side effects on that particular site. The load went down and the site is usable again.

If your site uses WebDAV methods, though, you have to modify the above snippet to also check $_SERVER['HTTP_USER_AGENT'], so that only requests from MiniRedir are rejected.


Comments

Yes, that is better

Yes, that is better, because it gives out a better reason on why the request returns a non-success status code.

No, I did not try it. From what MiniRedir was doing (does not understand a redirect), I have doubts that 501 would work.

Mod Security

Did you consider ModSecurity? http://www.modsecurity.org/

It's very easy to block all kinds of evil and less evil requests with ModSecurity, and it doesn't require this little Drupal hack. I guess it should be faster too...

Not in this case

Did not consider it in this case, but yes, it should be faster. On the other hand, loading this module can consume some extra memory in Apache.

If you have a rule or snippet for this use case, it would be helpful if you post it here, so everyone can see it.

Would this be it?

# Deny all methods
SecRule REQUEST_METHOD "^.*$" deny
# Allow only GET and PUT
SecRule REQUEST_METHOD "^(get|put)$" allow

Thanks in advance.

alternative to single signon

There's an alternative to the single signon module called Multisite Login:
http://drupal.org/project/multisite_login
Which does its work without extra requests and redirects on every page.
