Another botnet spamming Drupal web sites, causing performance issues

We previously wrote in detail about how botnets hammering a web site can cause outages.

Here is another case that emerged in the past month or so.

Again, it is a distributed attempt from many IP addresses all over the world, most probably from PCs infected with malware.

Their main goal seems to be to add content to the Drupal web site, falling back to registering a new user when that attempt is denied because of site permissions.

The pattern is like the following excerpt from the web server's access log.

Note the POSTs, as well as node/add in the referer. Also note the hard-coded port number 80:

173.0.59.46 - - [10/Mar/2014:00:00:04 -0400] "POST /user/register HTTP/1.1" 200 12759 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
173.0.59.46 - - [10/Mar/2014:00:00:06 -0400] "POST /user/register HTTP/1.1" 200 12776 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
107.161.81.55 - - [10/Mar/2014:00:00:10 -0400] "GET /user/register HTTP/1.1" 200 12628 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
107.161.81.55 - - [10/Mar/2014:00:00:16 -0400] "GET /user/register HTTP/1.1" 200 12642 "http://example.com/user/login?destination=node/add" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
202.75.16.18 - - [10/Mar/2014:00:00:17 -0400] "POST /user/register HTTP/1.1" 200 12752 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1"
5.255.90.89 - - [10/Mar/2014:00:00:18 -0400] "GET /user/register HTTP/1.1" 200 12627 "http://example.com/user/register" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
107.161.81.55 - - [10/Mar/2014:00:00:24 -0400] "GET /user/register HTTP/1.1" 200 12644 "http://example.com/user/login?destination=node/add" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36"
...
128.117.43.92 - - [11/Mar/2014:10:13:30 -0400] "POST /user/register HTTP/1.1" 200 12752 "http://example.com:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
128.117.43.92 - - [11/Mar/2014:10:13:30 -0400] "POST /user/register HTTP/1.1" 200 12752 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"
128.117.43.92 - - [11/Mar/2014:10:13:30 -0400] "POST /user/register HTTP/1.1" 200 12752 "http://example.com:80/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0a2) Gecko/20110613 Firefox/6.0a2"

In the above case, the web site has a CAPTCHA on the user login and registration pages, which causes a session to be created, and hence a full Drupal bootstrap (i.e. no page caching). When lots of bots do this simultaneously, it takes its toll on the server's resources.

Botnet Statistics

We gleaned these statistics by analyzing a week of the web server's access log, prior to putting in the fix below.

Out of 2.3 million requests, 3.9% were to /user/register, 5.6% had http://example.com:80/ in the referer (with the real site's domain instead of example.com), and 2.4% had "destination=node/add" in the referer.

For the same period, but limiting the analysis to accesses to /user/register only, 54.6% had "/user/login?destination=node/add" in the referer. Over 91% posed as coming from a computer running Mac OS X Lion 10.7.5 (released October 2012). 45% claimed to be on the Firefox browser, 33% pretended to be on Chrome, and 19.7% posed as Safari.
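
If you want to reproduce numbers like these, a short script over the access log is enough. Here is a rough sketch (the log path is an assumption, and the substring matches are deliberately crude):

// Rough sketch: tally /user/register hits and suspicious referers in a
// combined-format access log. Adjust the path to your own log file.
$total = 0;
$register = 0;
$port80 = 0;
$node_add = 0;

$log = fopen('/var/log/apache2/access.log', 'r');
while (($line = fgets($log)) !== FALSE) {
  $total++;
  if (strpos($line, '/user/register HTTP/') !== FALSE) {
    $register++;
  }
  if (strpos($line, 'http://example.com:80/') !== FALSE) {
    $port80++;
  }
  if (strpos($line, 'destination=node/add') !== FALSE) {
    $node_add++;
  }
}
fclose($log);

printf("/user/register:       %.1f%%\n", 100 * $register / $total);
printf("port 80 referer:      %.1f%%\n", 100 * $port80 / $total);
printf("destination=node/add: %.1f%%\n", 100 * $node_add / $total);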

Workaround

As usual with botnets, blocking individual IP addresses is futile, since there are so many of them. CloudFlare, which fronts the site, neither detected nor blocked these attempts.

To solve this problem, we put in a fix that aborts the Drupal bootstrap when the bot is detected, by adding the following to settings.php. Don't forget to replace example.com with the domain/subdomain you see in your own access log.

// HTTP_REFERER may not be set at all, so default it to an empty string
// to avoid PHP notices.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if ($_SERVER['REQUEST_URI'] == '/user/register') {
  // The GET variant, with the login page and destination=node/add
  // in the referer.
  if ($referer == 'http://example.com/user/login?destination=node/add') {
    header("HTTP/1.0 418 I'm a teapot");
    exit();
  }

  // The POST variant, with either port 80 in the referer,
  // or an empty referer.
  if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    switch ($referer) {
      case 'http://example.com:80/':
      case '':
        header("HTTP/1.0 418 I'm a teapot");
        exit();
    }
  }
}

Comments

Make the bots ask for permission to POST.

For me, the main determining factor for these bots is that they do not behave like normal human visitors would. I have noticed that they almost *never* request additional resources, like CSS, PNG, or JS files.

I block them in my .htaccess. First, I set a cookie with mod_rewrite for anybody requesting resources.

  # If they are requesting resources, then they're probably not bots.
  RewriteCond %{REQUEST_FILENAME} (mytheme\.css|\.png)$ [NC]
  RewriteRule .* - [L,co=dude:abides:%{HTTP:Host}:86400]

Second, I check incoming POSTs to see if they have that cookie set; if not, I can assume that they are most likely bots. This skips index.php, because the POST URLs get redirected there.

  # Check if this is a post method,
  # If so, the human cookie must be set.
  # If the dudes don't abide, they get a 403 for their POST.
  RewriteCond %{REQUEST_METHOD} =POST
  RewriteCond %{REQUEST_URI} !=/index.php  [NC]
  RewriteCond %{HTTP_COOKIE} !^.*dude.*$ [NC]
  RewriteRule .* - [F]

This shuts down almost all of our bot traffic, leaving only those human-manned click farms that are able to get through.

Now, if you really want to get tricky, you can leverage elements of the browser that automated scripts aren't going to bother implementing. You can embed an href to a resource inside an SVG file, so you can put a link to a cookie-generating file inside that SVG. A regular user on a typical browser is going to download the SVG, as well as the embedded resources in the file, get the cookie, and have permission to POST.
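
For instance, a bare-bones SVG along these lines (the pixel path here is purely illustrative) will pull in the cookie-setting resource for any real browser that renders it:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1" height="1">
  <image xlink:href="/sites/default/files/cookie-pixel.png" width="1" height="1"/>
</svg>

Since the rule above already sets the cookie for .png requests, the embedded image does the rest.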

One more thing to add: we're also using CloudFlare, so whatever resource we want the end user to receive that sets the cookie needs to be set in CloudFlare to never cache. That's why I have that .png file in there - it's super small, so we can afford for it to not get cached.

SEO impact

I rather like what you've done in your .htaccess, but would it also take out search engine crawlers? I don't imagine they save cookies?

Search engine crawlers also

Search engine crawlers also tend not to POST to your site... at least I haven't gotten any comments on my site from Googlebot.

DOH!

DOH!

Varnish?

Apart from CloudFlare, what about Varnish caching static files? Wouldn't that deny legit users access to the site?

I'm selecting one particular

I'm selecting one particular file to *not* cache, typically some small png/jpg. In that instance I would configure Varnish to pass through that particular item, and let Apache handle it. Basically, whatever caching system you have set up, you need to make an exception for that one file.

SWEET!

Sweet! Thanks mate.

Based on your idea (regex cleanup + logic tweaks):

# If they are requesting resources, then they're probably not bots.
# set cookie
RewriteCond %{HTTP_COOKIE} !realbrowser
RewriteCond %{THE_REQUEST} system\.base\.css
RewriteRule .* - [L,co=realbrowser:getscookies:%{HTTP:Host}:86400]

# Check if this goes to register user or add content,
# If so, the human cookie must be set.
# If the dudes don't abide, they get a 403.
RewriteCond %{HTTP_COOKIE} !realbrowser
RewriteCond %{THE_REQUEST} (user\/register|node\/add)
RewriteRule .* - [F]

# drop the bot's first hit (which checks for Drupal)
RewriteCond %{HTTP_COOKIE} !realbrowser
RewriteCond %{REQUEST_URI} (node\/1|node)$
RewriteCond %{HTTP_REFERER} (site\.domain\.com\/)$
RewriteRule .* - [F]

(replace "site\.domain\.com\/" with your host)

The new generation of bots is slightly more evolved:
1. checks for Drupal first by getting "node/1" with the home page as the referer (without, of course, visiting the home page and getting the cookie)
2. attempts to GET "user/register" and "node/add" before POSTing (which still eats up resources, even if Drupal ultimately 403s the bastards).

The above is adjusted to cover these two points as well.

Stupid question

Great trick, but I am struggling to implement it.
Does the line: RewriteCond %{THE_REQUEST} system\.base\.css need to be adapted as well to point to an existing file?

Whatever I do, my access to node/add or node/register is rejected, even once logged in.

I saw the same thing, so

I saw the same thing, so changed that line to just require my theme's logo.png file, which did the job.
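
In other words, something like this, with the filename adjusted to one your theme actually serves:

RewriteCond %{THE_REQUEST} logo\.png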

Clever!

Newbie try

I was also having difficulties implementing the above rules. Hence I adapted them as below:

  # If they are requesting resources, then they're probably not bots.
  # set cookie
  RewriteCond %{HTTP_COOKIE} !^.*dude.*$ [NC]
  RewriteCond %{REQUEST_FILENAME} (mytheme\.css|\.png)$ [NC]
  RewriteRule .* - [L,co=dude:abides:%{HTTP:Host}:86400]

  # Check if this goes to register user or add content,
  # If so, the human cookie must be set.
  # If the dudes don't abide, they get a 403.
  RewriteCond %{HTTP_COOKIE} !^.*dude.*$ [NC]
  RewriteCond %{THE_REQUEST} (user\/register|node\/add)
  RewriteRule .* - [F]

  # drop the bot's first hit (which checks for Drupal)
  RewriteCond %{HTTP_COOKIE} !^.*dude.*$ [NC]
  RewriteCond %{REQUEST_URI} (node\/1|node)$
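  # The next condition glues HTTP_HOST and HTTP_REFERER together with @@;
  # the back-reference \1 matches only when the referer comes from this
  # same host, and the leading ! inverts that.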
  RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1/.*
  RewriteRule .* - [F]

Tested using curl in the terminal, and it seems to be working as expected. Nevertheless, I am still not sure that this is the correct solution for everyone. I hope you find it useful.

Cheers.

We can also think of using

We can also think of changing the URL structure at regular intervals by using https://drupal.org/project/rename_admin_paths, so bots will not be able to find those paths.

Would not help ...

That would not help, since they are also attacking regular web pages and posting to things like mailing list registration forms, etc. They seem to scan the site and post something specific to it.

Just another botnet attack on a Drupal site

Hi;

I've been fighting a similar attack for the last ten days or so. It looks like a mix of the cases described here:
- thousands of IPs from any continent
- just one request, always the same: "POST / HTTP/1.1" 200 15303
- no referrer
- ancient user agent.

Firewalling the site had no positive effect whatsoever. Support suggested blocking POST but that would cripple Drupal.

I think I'll try the solution you suggested for a previous case.

Thanks

Max

captcha will help identify bots, too

We experienced the exact same problem, although I took a more explicit approach. I like Ryan Aslett's approach of setting cookies and blocking POST operations based on that. We have been indexing our watchdog and access logs in Apache Solr so we can quickly identify problem areas on the site and traffic patterns. Regardless, if you use Mollom and have your forms protected (I hope you do), 100% of the time on our side the bots are failing the CAPTCHA. Look for the corresponding event in the watchdog (it's something like "Mollom Failed CAPTCHA"). If you aggregate on that, you can get a list of client IPs to block in .htaccess (or you can block at the subnet level, like we did). But this is a maintenance item, so I will be trying Ryan's recipe.
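
For example, a rough sketch like this against the watchdog table (assuming Drupal 7's database API, and that Mollom logs its failures under the 'mollom' type) surfaces the worst offenders:

// Rough sketch: count watchdog entries per client IP to find the worst
// CAPTCHA offenders. Assumes Drupal 7, and that Mollom logs failures
// under the 'mollom' type.
$result = db_query("SELECT hostname, COUNT(*) AS fails
                    FROM {watchdog}
                    WHERE type = 'mollom'
                    GROUP BY hostname
                    ORDER BY fails DESC");
foreach ($result as $row) {
  print $row->hostname . ': ' . $row->fails . "\n";
}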

Thanks for sharing!

Resource hogging, not spam

The problem is not that they are spamming. They are indeed trying to, but failing. The problem is that they are using resources (CPU, memory, database, etc.). So Mollom, or any other CAPTCHA, which the site does indeed have, does not help on the user login and registration pages.

We can't block IPs, since that is a whack-a-mole game: there are tens of thousands of unique IP addresses, a botnet composed of infected home PCs on broadband connections around the world.

Right, they are not trying to

Right, they are not trying to spam, but why not leverage the fact that Mollom (or some other CAPTCHA service) knows they are repeatedly (in our case) failing the CAPTCHA? That's valuable information that can be integrated into an anti-DDoS remedy. I was hoping to do some work in that area very soon via patches, but the general idea is taking that information and incorporating it into a throttling mechanism (ideally in the fast path of the Drupal bootstrap).
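
As a very rough sketch of that idea (the flat file of offending IPs is hypothetical, maintained by whatever aggregates the CAPTCHA failures), settings.php could bail out before Drupal does any real work:

// Rough sketch for settings.php: deny known offenders before the
// expensive parts of the bootstrap run. /var/run/bot_ips.txt is a
// hypothetical file, one IP address per line.
$bot_ips = @file('/var/run/bot_ips.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($bot_ips && in_array($_SERVER['REMOTE_ADDR'], $bot_ips)) {
  header("HTTP/1.0 403 Forbidden");
  exit();
}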

After all, CAPTCHA does mean "Completely Automated Public Turing test to tell Computers and Humans Apart". :)

Great post, Khalid! Picked up a few great tips here, especially using GoAccess.

Blocking mobile browsers

I implemented this, and users on mobile browsers can't register; they get a 403 error.