Increasing Drupal's speed via the Squid caching reverse proxy

Dynamic web applications like Drupal offer a lot of benefits for a web site. However, when it comes to scalability, there can be challenges with any dynamic web site, including those built with Drupal.

One strategy that helps a web site scale is setting up a caching reverse proxy, such as Squid or Varnish. These become very relevant when the site receives a lot of requests from users who are not logged in (anonymous users).

How much of a performance boost does a caching reverse proxy provide? How do I set one up? Do I need Drupal changes to set it up?

Read below for the answers to these questions.

What is a caching reverse proxy?

The idea behind a caching reverse proxy is that it acts as a front end to the web server: it retrieves responses from the server and serves them to users, while keeping copies in its own cache for future requests. Users who access the site can then be served directly from the pages cached in the proxy, without going back to the web server, executing PHP, running database queries, and so on. Since the cached copy is only HTML, it is served either from disk or from memory, which makes pages feel much faster to users (20 ms or so) than content generated dynamically by Drupal (hundreds of ms).
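The behavior described above can be sketched in a few lines of Python. This is a toy model only, not how Squid is implemented: the class name CachingProxy, the slow_origin function, and the 300 second lifetime are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy model of a caching reverse proxy sitting in front of an origin server."""

    def __init__(self, origin: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.origin = origin            # stands in for Apache + PHP + database
        self.ttl_seconds = ttl_seconds  # how long a stored copy stays fresh
        self.clock = clock
        self.cache: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.cache.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]             # served from the cache; origin not contacted
        self.misses += 1
        body = self.origin(url)         # expensive: full dynamic page generation
        self.cache[url] = (now, body)   # keep a copy for future requests
        return body


def slow_origin(url: str) -> str:
    # Hypothetical origin: pretend this runs PHP and database queries.
    return "<html>page for %s</html>" % url


proxy = CachingProxy(slow_origin)
proxy.get("/node/1")   # miss: fetched from the origin and stored
proxy.get("/node/1")   # hit: served from the proxy's cache
print(proxy.hits, proxy.misses)  # prints "1 1"
```

Only the first request for a URL reaches the origin; repeats within the lifetime are answered from the stored copy, which is where the speedup comes from.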

See diagrams of What is Reverse Proxy Cache, and How Reverse proxy caches work. (Ignore the configuration on that page since it does not apply to Squid 2.6).

Why is a Drupal patch needed?

With Drupal, the HTTP headers that are sent insist that content not be cached and that it always be retrieved from the origin web server. These headers are documented in RFC 2616.

So, the patch discussed below changes these headers to be caching reverse proxy friendly. This eases the load on the Drupal server by letting the proxy serve pages to anonymous users, and makes the site feel faster for them.
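To make the role of the headers concrete, here is a small Python sketch of how a shared cache might decide whether it can serve a response from its cache, based on the Cache-Control directives defined in RFC 2616. It is a simplification, not Squid's actual logic, and the two example header sets are assumed values chosen for illustration, not copied from Drupal or from the patch.

```python
def is_proxy_cacheable(headers: dict) -> bool:
    """Simplified RFC 2616 check: may a shared cache reuse this response?"""
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip()

    # In this sketch, any of these means the proxy will not reuse the response.
    if "no-store" in directives or "no-cache" in directives or "private" in directives:
        return False

    # s-maxage applies to shared caches and takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return int(directives[name]) > 0
            except ValueError:
                return False

    return False  # no explicit lifetime: this sketch stays conservative


# Illustrative header sets (assumed values).
uncacheable = {"Cache-Control": "no-store, no-cache, must-revalidate"}
proxy_friendly = {"Cache-Control": "public, max-age=300"}

print(is_proxy_cacheable(uncacheable))     # False
print(is_proxy_cacheable(proxy_friendly))  # True
```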

Setting up Squid as a reverse proxy

Squid can be many things. One of them is a caching reverse proxy. So, the first order of business is to set up Squid as a reverse proxy.

This setup assumes that Squid's version is 2.6 and that it will run on its own server (tuna.example.com) and will be proxying for the Drupal server at head.example.com.

Here is the /etc/squid/squid.conf file, annotated for easier understanding. Note the required Drupal part at the end.

# Basic parameters
visible_hostname localhost
# This line indicates the server we will be proxying for
http_port 80 accel defaultsite=head.example.com
# And the IP Address for it
cache_peer 192.168.0.222 parent 80 0 no-query originserver
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
# Where the cache files will be, memory and such 
cache_dir ufs /var/spool/squid 10000 16 256
cache_mem 256 MB
maximum_object_size_in_memory 32 KB
# Log locations and format 
logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st %Ss:%Sh
logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st \
  "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log
hosts_file /etc/hosts
# Basic ACLs
acl all src 0.0.0.0/0.0.0.0
acl mydomain dstdomain .example.com
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl Safe_ports port 80
acl purge method PURGE
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access allow localhost
http_access allow all
http_access allow mydomain
http_access deny all
http_reply_access allow all
icp_access allow all
cache_effective_group proxy
coredump_dir /var/spool/squid
forwarded_for on
emulate_httpd_log on
redirect_rewrites_host_header off
buffered_logs on
# Drupal specific stuff, assumes the patch in #147310
acl cookie_logged_in_set rep_header Set-Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in_set
acl cookie_logged_in_out rep_header Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in_out
acl cookie_logged_in     req_header Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in 
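The effect of the three Drupal-specific "cache deny" rules at the end of the configuration can be sketched in Python as follows. This only mirrors the decision the ACLs express (any DRUPAL_LOGGED_IN=Y marker in the listed headers disables caching for that exchange); it is not Squid code, and the function name proxy_may_cache and the sample cookie values are assumptions.

```python
import re

# Same pattern as the ACLs: a regular expression searched in the header value.
LOGGED_IN = re.compile(r"DRUPAL_LOGGED_IN=Y")


def proxy_may_cache(request_headers: dict, reply_headers: dict) -> bool:
    """Return False if any of the three 'cache deny' ACLs would match."""
    checked_values = (
        reply_headers.get("Set-Cookie", ""),  # acl cookie_logged_in_set (rep_header)
        reply_headers.get("Cookie", ""),      # acl cookie_logged_in_out (rep_header)
        request_headers.get("Cookie", ""),    # acl cookie_logged_in     (req_header)
    )
    return not any(LOGGED_IN.search(value) for value in checked_values)


# Anonymous visitor: no marker cookie, so the page may be cached.
print(proxy_may_cache({"Cookie": "has_js=1"}, {}))                    # True
# Logged-in visitor sends the marker cookie: never cached.
print(proxy_may_cache({"Cookie": "DRUPAL_LOGGED_IN=Y"}, {}))          # False
# Login response that sets the marker cookie: never cached.
print(proxy_may_cache({}, {"Set-Cookie": "DRUPAL_LOGGED_IN=Y; path=/"}))  # False
```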

Designing the tests

In order to assess the impact of Squid on a running server, we had to design and execute a testing strategy for it.

For these tests we used a CVS checkout of Drupal 7.x. We also developed a patch for the correct HTTP headers (see below).

We used the devel module to set up the site with the following content:

514 nodes of types page and article, each one with its own path alias; 502 users; 5,111 comments; 11 taxonomy vocabularies; and 544 taxonomy terms.

We then selected 51 unique URLs, using their path alias, as well as the home page. These same URLs are exercised in every test below.

We set up the stress tests to hammer the site for 2 full minutes as fast as possible, from 10 concurrent users. Each test was run twice. All the tests were run through the proxy.
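For readers who want to picture the test loop, here is a minimal Python sketch of this kind of stress test: a fixed number of concurrent users requesting a fixed list of URLs as fast as possible for a fixed duration, then reporting the same three figures as the tables below. It is not the tool used for the measurements in this article. The stress function, its parameters, and the fake_fetch stand-in (which sleeps instead of making an HTTP request) are all assumptions.

```python
import itertools
import threading
import time
from typing import Callable, List


def stress(fetch: Callable[[str], None], urls: List[str],
           concurrency: int = 10, duration_seconds: float = 120.0) -> dict:
    """Call `fetch` from `concurrency` threads until `duration_seconds` has passed."""
    deadline = time.monotonic() + duration_seconds
    lock = threading.Lock()
    totals = {"requests": 0, "busy_seconds": 0.0}

    def worker(offset: int) -> None:
        # Each simulated user cycles through the same URLs, starting at a different one.
        for url in itertools.islice(itertools.cycle(urls), offset, None):
            if time.monotonic() >= deadline:
                return
            started = time.monotonic()
            fetch(url)
            elapsed = time.monotonic() - started
            with lock:
                totals["requests"] += 1
                totals["busy_seconds"] += elapsed

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(concurrency)]
    wall_start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    wall = time.monotonic() - wall_start

    count = totals["requests"]
    return {
        "requests": count,
        "response_time": totals["busy_seconds"] / count if count else 0.0,
        "transaction_rate": count / wall if wall > 0 else 0.0,
    }


def fake_fetch(url: str) -> None:
    # Stand-in for an HTTP GET through the proxy; replace with a real client.
    time.sleep(0.001)


result = stress(fake_fetch, ["/", "/node/1", "/node/2"],
                concurrency=10, duration_seconds=0.2)
print(result)
```

For a real run, fake_fetch would be replaced by a function that performs an HTTP GET against the proxy, and duration_seconds would be left at 120 to match the 2 minute runs described above.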

Drupal parameter changes

In order for Drupal to run correctly behind a reverse proxy, several parameters need to be set:

In settings.php, make sure that the base url and cookie domain correspond to the external server name, and not the local machine name. So, in this case, we used the external name for the proxy server (tuna.example.com) and not the Drupal server (head.example.com).

$base_url = 'http://tuna.example.com';
$cookie_domain = 'tuna.example.com'; 

Moreover, you have to set the reverse proxy parameter to TRUE and list the IP address of the reverse proxy.

$conf = array(
'reverse_proxy' => TRUE,
'reverse_proxy_addresses' => array('192.168.0.111',),
); 

These parameters are available in Drupal 6.x and 7.x.
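The reason the proxy address has to be listed is that, behind a proxy, every request appears to come from the proxy's own IP address, and the visitor's real address is only available in the X-Forwarded-For header (which Squid adds because of the "forwarded_for on" line in the configuration above). The following Python sketch shows the general idea of trusting that header only when the request really comes from a known proxy. It is an illustration of the principle, not Drupal's actual code, and the function name client_ip and the sample addresses other than 192.168.0.111 are assumptions.

```python
from typing import Iterable, Optional


def client_ip(remote_addr: str, forwarded_for: Optional[str],
              reverse_proxy: bool, proxy_addresses: Iterable[str]) -> str:
    """Pick the visitor's address, trusting X-Forwarded-For only from known proxies."""
    if reverse_proxy and forwarded_for and remote_addr in set(proxy_addresses):
        # The right-most entry is the one appended by our own trusted proxy.
        return forwarded_for.split(",")[-1].strip()
    # Direct connection, or an untrusted sender: the header could be forged.
    return remote_addr


trusted = ["192.168.0.111"]

# Request relayed by the trusted proxy: use the forwarded address.
print(client_ip("192.168.0.111", "203.0.113.7", True, trusted))   # 203.0.113.7
# Request from an unknown machine claiming a forwarded address: ignore the header.
print(client_ip("198.51.100.9", "203.0.113.7", True, trusted))    # 198.51.100.9
```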

Test 1 No caching in Drupal

In this test, we run Drupal as it ships by default. The page cache is disabled.

The results on our test setup are as follows:

Number of requests   Response time (seconds)   Transaction rate (requests/second)
3,233                0.37                      26.87
3,282                0.37                      27.2

As you can see, the site can only do about 27 requests per second at most.

Test 2 Only block caching enabled

Out of curiosity, we tested the block cache, with only two "light" blocks on the site, one on each sidebar.

Number of requests   Response time (seconds)   Transaction rate (requests/second)
3,044                0.39                      25.31
3,094                0.39                      25.71

A slight slowdown was observed. This is unexplained and warrants more investigation on a site with a lot of blocks.

Test 3 Page caching, 5 minutes lifetime, without patch

We then enabled Drupal's page caching in normal mode, with the minimum cache life time set to 5 minutes.

Number of requests   Response time (seconds)   Transaction rate (requests/second)
18,820               0.06                      156.82
20,860               0.06                      174.44

A noticeable improvement was observed, as the site is able to do over 150 requests per second.

Test 4 Page caching, 5 minutes lifetime, with patch

We applied the patch we developed specifically for reverse proxies, which is in comment 39 of issue #147310. This patch is for Drupal 7.x. Attached to this article, you will also find a Drupal 6.x version of the patch in case you want to use it.

Number of requests   Response time (seconds)   Transaction rate (requests/second)
61,610               0.02                      512.05
61,640               0.02                      514.91

That was impressive! From 170 requests per second to 512 requests per second!
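To put the four tests side by side, the short Python snippet below averages the two runs of each test and expresses each average relative to the uncached baseline of Test 1. The transaction rates are the ones reported in the tables above; the variable names are only illustrative.

```python
# Transaction rates (requests/second) from the two runs of each test.
results = {
    "Test 1 (no caching)": [26.87, 27.2],
    "Test 2 (block cache)": [25.31, 25.71],
    "Test 3 (page cache)": [156.82, 174.44],
    "Test 4 (page cache + patch)": [512.05, 514.91],
}

averages = {name: sum(rates) / len(rates) for name, rates in results.items()}
baseline = averages["Test 1 (no caching)"]

for name, average in averages.items():
    print("%-28s %7.2f req/s  %5.1fx baseline" % (name, average, average / baseline))

# Gain attributable to the patch alone: Test 4 compared with Test 3.
patch_gain = averages["Test 4 (page cache + patch)"] / averages["Test 3 (page cache)"]
print("Test 4 vs Test 3: %.1fx" % patch_gain)
```

On these averages, the patched setup handles roughly 19 times the requests of the uncached site, and about 3 times what Drupal's page cache achieves on its own.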

Conclusion

For a heavily trafficked site that has a large portion of its visitors not logged in, Squid can be a very beneficial tool. The patching needed in Drupal for Squid is relatively minor. We hope to push for this patch to become part of the standard Drupal 7.x release. Testing of this patch behind corporate proxies is encouraged. Please report your results in the issue, as well as in the comments below.


Comments

Thanks 2bits for providing

Thanks 2bits for providing this solution. Can you explain a little bit of what is happening when implementing this patch - for those of us who are not familiar with this type of caching?

Regards

combine with CSS filtering?

Maybe this question is way off target, but please help me out - I've been struggling with the general impression that Drupal ain't for dialups, not because of server load, but just because of the huge amount of unused data that gets transferred, compared to the 'minified' data transfer set that would be needed to get the identical output on the browser.

Mainly, the ginormous amount of unused CSS. A first pass at some calculations showed that on 56K dial-up you could save 2-3 full seconds per page load just by filtering out the unused CSS identifiers. There are some handy apps to find the unused identifiers, and the ratios of used-to-total are pretty appalling if you look around various Drupal sites. 50%, 20%, even 10% or less on large production Drupal sites seems pretty common.

 So, wouldn't it be handy if some type of proxy scheme could first render the page from drupal, then second trim unused css and javascript, then third send that trimmed package over the pipe to the client?

 For reference please see the questions at http://drupal.org/node/338102

 Any thoughts?  Can squid be combined with CSS-pruning and/or javascript-pruning??  This would be a huge win and would put drupal back into the realm of usability for the rest of the world that's not ultra-connected... and if you're running a nonprofit web page where most clients have dial-up, it's a big deal.

 Or is this way off base??

 Thanks in advance

This is off topic, but I will

This is off topic, but I will answer it only this once. We can continue in the issue you mentioned if you want.

Drupal is certainly for dialup (and broadband and mobile phones and PDAs, and everything ...)

1. CSS is loaded only on the first access from any modern browser. This means the first access will take a few seconds, but subsequent ones will not load the CSS, because they are stored in the browser's cache.

2. You can go to admin/settings/performance and compress the CSS, so it is only one file and not many. This will cut the time needed for page loads.

3. You can use the following code in template.php to remove any CSS files you like, then take the few pieces you need from them and roll those into the style.css of your theme.

function _phptemplate_variables($hook, $vars) {
  $css = drupal_add_css();

  // System CSS
  unset($css['all']['module']['modules/system/system.css']);
  unset($css['all']['module']['modules/system/defaults.css']);

  // Module CSS
  $rm = drupal_get_path('module','help').'/help.css';
  unset($css['all']['module'][$rm]);

  $vars['styles'] = drupal_get_css($css);
  return $vars;
}

4. I am not sure if an automated approach would work. It would be a lot of work, but if such logic is developed (the hard part), writing a proxy to strip down the CSS would be the easy part.

css

Thanks for the reply.  Will put more thought into it here.

Pre-packaged derivative of Drupal with built-in Squid support

Four Kitchens maintains Pressflow, a free, open-source derivative of Drupal featuring extensive performance improvements while maintaining compatibility with contributed modules.

We currently maintain Pressflow 5 and 6, which are derivatives of Drupal 5 and 6. Both have built-in support for Squid and other reverse proxy caches. We used Khalid's patch above for Pressflow 6, and we also adapted his patch for Pressflow 5, which required backporting some related proxy support code from Drupal 6.

If you'd like more information about how you can use Pressflow to scale your Drupal projects, please contact Four Kitchens:

http://fourkitchens.com/contact

Modified for Drupal 5?

Could the patch for Drupal 6 be modified for the Drupal 5 environment?

Yes, but requires work

It can be changed to work with Drupal 5.x, but it is not simple, since Drupal 6.x has the reverse proxy settings.

If you are interested in 2bits.com working on this for you, then please click on Contact on the top right of this page.

Patch does not work as of drupal 6.6

It seems that the exit(); in bootstrap.inc (function drupal_page_cache_header, line 643) has changed to return;

node access

Sometimes I access via one node of the webservers. I have two apache2 servers feeding the site through a reverse proxy. Although, making the changes to settings.php corrected issues I was having with sitemaps for google delivering the local IP instead of the published URL.

Now I cannot directly access a single node out of the pair of web servers.

Is there a way to map through this so that a single node can be addressed? I have mappings through the reverse proxy for them with their own url and then the main site is load balanced between them with session tracking.

I read on your post that you

I read on your post that you need to set the variables

$base_url = 'http://tuna.example.com';
$cookie_domain = 'tuna.example.com';

on your settings.php

So... you have to put the address of the machine running squid in there? At least that's what it looks like on the example: squid server is 'tuna' and www server is 'head'.

I ask because we have a couple of web servers and a couple of squid servers running with this setup and I always thought you had to put the website's real url in there, not the cache one...

I read your articles several

I read your articles several times and I like them.

I've tried to use a reverse proxy for my Drupal site. I used your Squid configuration and it works well. But my website is a multisite setup with many subdomains.

The problem is that the content is the same on every subdomain.

I used Varnish before Squid and the content was correct, but after testing I think Squid is faster.

Thanks and best regards,

patch didn't work with me :(

Hi,
i have drupal 6.2 and when i run the patch i got this:
$patch <147310-42-d6.patch

--------------------------
|Index: includes/bootstrap.inc
|===================================================================
|RCS file: /cvs/drupal/drupal/includes/bootstrap.inc,v
|retrieving revision 1.206.2.4
|diff -u -F^f -r1.206.2.4 bootstrap.inc
|--- includes/bootstrap.inc 18 Aug 2008 18:56:30 -0000 1.206.2.4
|+++ includes/bootstrap.inc 11 Sep 2008 15:48:05 -0000
--------------------------
File to patch: includes/bootstrap.inc
patching file includes/bootstrap.inc
Hunk #1 succeeded at 723 (offset 135 lines).
Hunk #2 FAILED at 752.
1 out of 2 hunks FAILED -- saving rejects to file includes/bootstrap.inc.rej
can't find file to patch at input line 82
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------

how can i fix this issue??
Thanks

Reverse Proxy Server

Thank you so much for your nice tutorial.

Recently I setup a Reverse Proxy Server with Squid (server accelerator) and wrote a full detailed tutorial that you can find in:

http://cosmolinux.no-ip.org/raconetlinux/html/17-squid.html

where I explain how to configure Squid (version 3.x) as a reverse Proxy Server (server accelerator), providing examples about how to do it using two computers (one as a Proxy server and another as a Web Server) or just by using one single computer.

I also describe how to format the Squid's logs and how to send the logs to a remote computer.

Also, you can find an explanation of how to deny access to certain files and how to get correct logs in Apache Web Server.

I hope it is useful to someone.