We previously wrote in detail about how botnets hammering a web site can cause outages.

Here is another case that emerged in the past month or so.

Again, it is a distributed attempt from many IP addresses all over the world, most probably from PCs infected with malware.

Their main goal seems to be adding content to a Drupal web site, falling back to registering a new user when that attempt is denied by the site's permissions.

The pattern is like the following excerpt from the web server's access log.

Note the POSTs, as well as the node/add in the referer. Also note the hard-coded port 80 in the referer of the later requests:

173.0.59.46 - - [10/Mar/2014:00:00:04 -0400] "POST /user/register HTTP/1.1" 200 12759 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
173.0.59.46 - - [10/Mar/2014:00:00:06 -0400] "POST /user/register HTTP/1.1" 200 12776 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
107.161.81.55 - - [10/Mar/2014:00:00:10 -0400] "GET /user/register HTTP/1.1" 200 12628 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
107.161.81.55 - - [10/Mar/2014:00:00:16 -0400] "GET /user/register HTTP/1.1" 200 12642 "http://example.com/user/login?destination=node/add" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
202.75.16.18 - - [10/Mar/2014:00:00:17 -0400] "POST /user/register HTTP/1.1" 200 12752 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1"
5.255.90.89 - - [10/Mar/2014:00:00:18 -0400] "GET /user/register HTTP/1.1" 200 12627 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
107.161.81.55 - - [10/Mar/2014:00:00:24 -0400] "GET /user/register HTTP/1.1" 200 12644 "http://example.com/user/login?destination=node/add" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
...
128.117.43.92 - - [11/Mar/2014:10:13:30 -0400] "POST /user/register HTTP/1.1" 200 12752 "http://example.com:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
128.117.43.92 - - [11/Mar/2014:10:13:30 -0400] "POST /user/register HTTP/1.1" 200 12752 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
128.117.43.92 - - [11/Mar/2014:10:13:30 -0400] "POST /user/register HTTP/1.1" 200 12752 "http://example.com:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"

In the above case, the web site has a CAPTCHA on the user registration page, which causes a session to be created, and hence a full Drupal bootstrap (i.e. no page caching). When lots of bots do this simultaneously, it takes a toll on the server's resources.

Botnet Statistics

We gleaned these statistics by analyzing a week's worth of the web server's access log, prior to putting in the fix below.

Out of 2.3 million requests, 3.9% were to /user/register, 5.6% had http://example.com:80/ in the referer (with the real domain instead of example.com), and 2.4% had "destination=node/add" in the referer.

For the same period, but limiting the analysis to accesses to /user/register only: 54.6% have "/user/login?destination=node/add" in the referer, and over 91% pose as coming from a computer running Mac OS X Lion 10.7.5 (released October 2012). 45% claim to be on Firefox, 33% pretend to be on Chrome, and 19.7% pose as Safari.
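Percentages like the above can be gleaned with a simple grep/awk pipeline over the access log. Here is a sketch; the log path and combined-log format are assumptions (adjust for your server), and a tiny sample log is built inline so the pipeline is self-contained:

```shell
# Build a small sample access log so the pipeline below is self-contained.
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [10/Mar/2014:00:00:04 -0400] "POST /user/register HTTP/1.1" 200 12759 "http://example.com/user/register" "UA"
5.6.7.8 - - [10/Mar/2014:00:00:10 -0400] "GET /about HTTP/1.1" 200 4321 "-" "UA"
9.8.7.6 - - [10/Mar/2014:00:00:16 -0400] "GET /user/register HTTP/1.1" 200 12642 "http://example.com/user/login?destination=node/add" "UA"
EOF

total=$(wc -l < /tmp/sample_access.log)
# Requests to /user/register (any method)
reg=$(grep -c ' /user/register ' /tmp/sample_access.log)
# Requests with destination=node/add in the referer
dest=$(grep -c 'destination=node/add' /tmp/sample_access.log)

awk -v t="$total" -v r="$reg" -v d="$dest" 'BEGIN {
  printf "/user/register:       %.1f%%\n", 100 * r / t
  printf "destination=node/add: %.1f%%\n", 100 * d / t
}'
```

Pointing the same greps at the real log (e.g. /var/log/apache2/access.log) gives the figures quoted above.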

Workaround

As usual with botnets, blocking individual IP addresses is futile, since there are so many of them. CloudFlare, which fronts the site, neither detected nor blocked these attempts.

To solve this problem, we put in a fix that aborts the Drupal bootstrap when the bot is detected, by adding the following to settings.php. Don't forget to replace example.com with the domain/subdomain you see in your own access log.

// The referer header may be absent entirely, so default it to an
// empty string to avoid an undefined index notice.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if ($referer == 'http://example.com/user/login?destination=node/add') {
  if ($_SERVER['REQUEST_URI'] == '/user/register') {
    header("HTTP/1.0 418 I'm a teapot");
    exit();
  }
}

// This is for the POST variant, with either port 80 in
// the referer, or an empty referer.
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
  if ($_SERVER['REQUEST_URI'] == '/user/register') {
    switch ($referer) {
      case 'http://example.com:80/':
      case '':
        header("HTTP/1.0 418 I'm a teapot");
        exit();
    }
  }
}
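The matching rules can be exercised offline before deploying them. Here is a small shell sketch, a hypothetical helper that is not part of the site, which applies the same checks to a method/URI/referer triple pulled from a log line:

```shell
# Hypothetical helper mirroring the settings.php checks above, so the
# matching rules can be tested against access-log entries offline.
is_bot() {
  method=$1; uri=$2; referer=$3
  if [ "$uri" = "/user/register" ]; then
    # GET or POST with the node/add destination referer
    if [ "$referer" = "http://example.com/user/login?destination=node/add" ]; then
      return 0
    fi
    # POST variant: port 80 in the referer, or an empty ("-") referer
    if [ "$method" = "POST" ]; then
      case "$referer" in
        "http://example.com:80/"|""|"-") return 0 ;;
      esac
    fi
  fi
  return 1
}

is_bot POST /user/register "http://example.com:80/" && echo "blocked"
is_bot GET  /user/register "http://example.com/user/login?destination=node/add" && echo "blocked"
is_bot GET  /user/register "http://example.com/some-page" || echo "allowed"
```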

Comments

For me, the main determining factor for these bots is that they do not behave like normal human visitors would. I have noticed that they almost *never* request additional resources, like CSS, PNG, or JS files.

I block them in my .htaccess. First, I set a cookie with mod_rewrite for anybody requesting resources.

# If they are requesting resources, then they're probably not bots.
  RewriteCond %{REQUEST_FILENAME} (mytheme\.css|\.png)$ [NC]
  RewriteRule .* - [L,co=dude:abides:%{HTTP:Host}:86400]

Second, I check incoming POSTs to see if they have that cookie set; if not, I assume they are most likely bots. This skips index.php, because the POST URLs get redirected there.

  # Check if this is a POST method;
  # if so, the human cookie must be set.
  # If the dudes don't abide, they get a 403 for their POST.
  RewriteCond %{REQUEST_METHOD} =POST
  RewriteCond %{REQUEST_URI} !=/index.php  [NC]
  RewriteCond %{HTTP_COOKIE} !^.*dude.*$ [NC]
  RewriteRule .* - [F]

This shuts down almost all of our bot traffic, leaving only those human manned clickfarms that are able to get through.

Now, if you really want to get tricky, you can leverage elements of the browser that automated scripts aren't going to bother implementing. You can embed an href to a resource inside an SVG file, so you can put a link to a cookie-generating file inside that SVG. A regular user on a typical browser will download the SVG as well as the resources embedded in it, get the cookie, and have permission to POST.
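For illustration, such an SVG might look like the fragment below. The file name and path are hypothetical; the referenced image would be the cookie-setting resource:

```xml
<!-- Hypothetical example: a browser rendering this SVG also fetches
     the embedded cookie.png (which triggers the cookie-setting rule);
     most bots fetch neither. The path and file names are made up. -->
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="1" height="1">
  <image xlink:href="/sites/default/files/cookie.png" width="1" height="1"/>
</svg>
```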

One more thing to add: we're also using CloudFlare, so whatever resource we want the end user to receive that sets the cookie needs to be set in CloudFlare to never cache. That's why I have that .png file in there - it's super small, so we can afford for it to not get cached.

Wed, 2014/03/12 - 16:30

I rather like what you've done in your .htaccess, but would it also take out search engine crawlers? I don't imagine they save cookies?

Wed, 2014/03/12 - 20:35

Search engine crawlers also tend not to POST to your site. At least I haven't gotten any comments on my site from googlebot.

Thu, 2014/03/13 - 19:53

DOH!

Wed, 2014/03/12 - 18:35

Apart from CloudFlare, what about Varnish caching static files? Wouldn't that deny legit users access to the site?

Wed, 2014/03/12 - 20:34

I'm selecting one particular file to *not* cache, typically some small png/jpg. In that instance I would configure Varnish to pass that particular item through, and let Apache handle it. Basically, whatever caching system you have set up, you need to make an exception for that one file.

Fri, 2014/06/06 - 04:35

Sweet! Thanks mate.

Based on your idea (regex cleanup + logic tweaks):

# If they are requesting resources, then they're probably not bots.
# set cookie
RewriteCond %{HTTP_COOKIE} !realbrowser
RewriteCond %{THE_REQUEST} system\.base\.css
RewriteRule .* - [L,co=realbrowser:getscookies:%{HTTP:Host}:86400]

# Check if this goes to register user or add content,
# If so, the human cookie must be set.
# If the dudes don't abide, they get a 403.
RewriteCond %{HTTP_COOKIE} !realbrowser
RewriteCond %{THE_REQUEST} (user\/register|node\/add)
RewriteRule .* - [F]

# drop the bot's first hit (which checks for Drupal)
RewriteCond %{HTTP_COOKIE} !realbrowser
RewriteCond %{REQUEST_URI} (node\/1|node)$
RewriteCond %{HTTP_REFERER} (site\.domain\.com\/)$
RewriteRule .* - [F]

(replace "site\.domain\.com\/" with your host)

The new generation of bots is slightly more evolved:
1. checks for Drupal first by getting "node/1" with the home page as the referer (without, of course, visiting the home page and getting the cookie)
2. attempts to GET "user/register" and "node/add" before POSTing (which still eats up resources, even if Drupal ultimately 403s the bastards).

The above is adjusted to cover these two points as well.

Fri, 2014/06/20 - 08:50

Great trick, but I am struggling to implement it.
Does the line: RewriteCond %{THE_REQUEST} system\.base\.css need to be adapted as well to point to an existing file?

Whatever I do, my access to node/add or user/register is rejected, even once logged in.

Fri, 2014/06/27 - 05:57

I saw the same thing, so changed that line to just require my theme's logo.png file, which did the job.

Clever!
