Dynamic web applications like Drupal offer a lot of benefits for a web site. However, when it comes to scalability, any dynamic web site can pose challenges, including those built with Drupal.

One strategy that helps is to put a caching reverse proxy, such as Squid or Varnish, in front of the site. This becomes very relevant when the site receives a lot of requests from users who are not logged in (anonymous users).

How much of a performance boost does a caching reverse proxy provide? How do I set one up? Do I need Drupal changes to set it up?

Read below for the answers to these questions.

What is a caching reverse proxy?

The idea behind a caching reverse proxy is for it to act as a front end to the web server: it forwards requests to the server, retrieves the responses, and serves them to users, while keeping copies in its own cache for future requests. Users who access the site afterwards can then be served directly from the pages cached in the proxy, without needing to go back to the web server, execute PHP, run database queries, and so on. Since the cached copy is only HTML, it is served either from disk or from memory, which makes pages feel much faster to users (20 ms or so) as opposed to generating the content dynamically from Drupal (100s of ms).
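The hit-or-miss behaviour described above can be sketched in a few lines of Python. This is only a toy model under our own assumptions: the class name, the fixed 300 second lifetime, and the fake origin function are all hypothetical, and a real proxy such as Squid decides lifetimes from the HTTP headers rather than from a fixed value.

```python
import time


class TinyReverseProxyCache:
    """Toy model of a caching reverse proxy: serve from cache, else ask the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds=300, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> body
        self.ttl_seconds = ttl_seconds              # hypothetical fixed lifetime
        self.clock = clock
        self.store = {}                             # url -> (expires_at, body)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            # Fresh copy in the cache: the origin server is never contacted.
            return entry[1], "HIT"
        # No copy, or the copy expired: go back to the origin and remember the result.
        body = self.fetch_from_origin(url)
        self.store[url] = (now + self.ttl_seconds, body)
        return body, "MISS"


origin_calls = []


def fake_origin(url):
    """Stand-in for the web server, PHP and database work behind the proxy."""
    origin_calls.append(url)
    return "<html>page for %s</html>" % url


proxy = TinyReverseProxyCache(fake_origin)
first = proxy.get("/node/1")
second = proxy.get("/node/1")
print(first[1], second[1], len(origin_calls))
```

The second request for the same URL is answered from the dictionary, so the stand-in origin is called only once.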

See diagrams of What is Reverse Proxy Cache, and How Reverse proxy caches work. (Ignore the configuration on that page since it does not apply to Squid 2.6).

Why is a Drupal patch needed?

With Drupal, the HTTP headers sent insist that content not be cached and that it always be retrieved from the origin web server. These headers are documented in RFC 2616.
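To make the effect of such headers concrete, here is a simplified Python sketch of how a shared cache might read a Cache-Control value. The function names and the two sample header values are our own illustrations, not copied from Drupal or from the patch, and the check ignores much of RFC 2616 (Expires, Vary, heuristic freshness, revalidation).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip().strip('"')
    return directives


def shared_cache_may_reuse(cache_control):
    """Simplified check: may a shared cache answer from its stored copy
    without going back to the origin server?"""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d or "private" in d:
        return False
    # s-maxage applies to shared caches and takes precedence over max-age.
    lifetime = d.get("s-maxage", d.get("max-age", ""))
    return lifetime.isdigit() and int(lifetime) > 0


# Illustrative values only (assumptions, not the exact headers Drupal sends).
print(shared_cache_may_reuse("no-cache, must-revalidate"))
print(shared_cache_may_reuse("public, max-age=300"))
```

A header of the first kind forces every anonymous page view back to the origin server, while one of the second kind lets the proxy answer on its own for five minutes.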

So, the patch discussed below changes these headers to be friendly to caching reverse proxies. It eases the load on the Drupal server by letting the proxy serve pages to anonymous users, and it makes the site feel faster for them.

Setting up Squid as a reverse proxy

Squid can be many things. One of them is a caching reverse proxy. So, the first order of business is to set up Squid as a reverse proxy.

This setup assumes that Squid's version is 2.6 and that it will run on its own server (tuna.example.com) and will be proxying for the Drupal server at head.example.com.

Here is the /etc/squid/squid.conf file, annotated for easier understanding. Note the required Drupal part at the end.

# Basic parameters
visible_hostname localhost
# This line indicates the server we will be proxying for
http_port 80 accel defaultsite=head.example.com
# And the IP Address for it
cache_peer 192.168.0.222 parent 80 0 no-query originserver
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
# Where the cache files will be, memory and such 
cache_dir ufs /var/spool/squid 10000 16 256
cache_mem 256 MB
maximum_object_size_in_memory 32 KB
# Log locations and format 
logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st %Ss:%Sh
logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st \
  "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log
hosts_file /etc/hosts
# Basic ACLs
acl all src 0.0.0.0/0.0.0.0
acl mydomain dstdomain .example.com
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl Safe_ports port 80
acl purge method PURGE
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access allow localhost
http_access allow all
http_access allow mydomain
http_access deny all
http_reply_access allow all
icp_access allow all
cache_effective_group proxy
coredump_dir /var/spool/squid
forwarded_for on
emulate_httpd_log on
redirect_rewrites_host_header off
buffered_logs on
# Drupal specific stuff, assumes the patch in #147310
acl cookie_logged_in_set rep_header Set-Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in_set
acl cookie_logged_in_out rep_header Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in_out
acl cookie_logged_in     req_header Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in 
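The Drupal specific rules at the end of the configuration boil down to one decision: if the request or the reply carries the DRUPAL_LOGGED_IN=Y cookie, the cache is bypassed. The Python sketch below mirrors that decision as we read it. The function name is hypothetical, headers are modelled as plain dictionaries with exact-case keys, and this is not how Squid is implemented internally.

```python
import re

# Same pattern the three ACLs match against header values.
LOGGED_IN = re.compile(r"DRUPAL_LOGGED_IN=Y")


def cache_allowed(request_headers, response_headers):
    """Return False when any of the three 'cache deny' rules would match."""
    checked_values = (
        response_headers.get("Set-Cookie", ""),  # acl cookie_logged_in_set
        response_headers.get("Cookie", ""),      # acl cookie_logged_in_out
        request_headers.get("Cookie", ""),       # acl cookie_logged_in
    )
    return not any(LOGGED_IN.search(value) for value in checked_values)


anonymous = cache_allowed({}, {"Content-Type": "text/html"})
logging_in = cache_allowed({}, {"Set-Cookie": "DRUPAL_LOGGED_IN=Y; path=/"})
logged_in = cache_allowed({"Cookie": "SESSabc=123; DRUPAL_LOGGED_IN=Y"}, {})
print(anonymous, logging_in, logged_in)
```

Only the anonymous request is eligible for the cache; the reply that sets the cookie and every later request that sends it go to the Drupal server.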

Designing the tests

In order to assess the impact of Squid on a running server, we had to design and execute a testing strategy for it.

For these tests we used a CVS checkout of Drupal 7.x. We also developed a patch for the correct HTTP headers (see below).

We used the devel module to set up the site with the following content:

514 Nodes of type: page and article, each one with its own path alias, 502 users, 5111 comments, 11 Taxonomy vocabularies, and 544 Taxonomy terms.

We then selected 51 unique URLs, using their path alias, as well as the home page. These same URLs are exercised in every test below.

We set up the stress tests to hammer the site for 2 full minutes as fast as possible, from 10 concurrent users. Each test was run twice. All the tests were run through the proxy.
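For readers who want to reproduce this kind of test, the sketch below shows the general shape of such a run in Python: a fixed number of concurrent workers request URLs back to back until a deadline passes, and the totals are turned into a response time and a transaction rate. It is not the tool used for the numbers in this article. The function name, the fake fetch function, and the shortened duration are assumptions made so the example runs on its own.

```python
import threading
import time


def run_load_test(fetch, urls, concurrency=10, duration_seconds=120.0):
    """Hammer fetch(url) from several threads until the deadline, then summarize."""
    deadline = time.monotonic() + duration_seconds
    counts = [0] * concurrency        # one slot per worker, so no lock is needed
    latencies = [0.0] * concurrency

    def worker(index):
        position = index
        while time.monotonic() < deadline:
            began = time.monotonic()
            fetch(urls[position % len(urls)])
            latencies[index] += time.monotonic() - began
            counts[index] += 1
            position += 1

    started = time.monotonic()
    threads = [threading.Thread(target=worker, args=(n,)) for n in range(concurrency)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.monotonic() - started

    total = sum(counts)
    return {
        "requests": total,
        "response_time": sum(latencies) / total if total else 0.0,
        "rate": total / elapsed,
    }


def fake_fetch(url):
    """Stand-in for an HTTP GET through the proxy; sleeps instead of using the network."""
    time.sleep(0.01)


result = run_load_test(fake_fetch, ["/", "/node/1", "/node/2"],
                       concurrency=3, duration_seconds=0.2)
print(result["requests"], round(result["rate"], 1))
```

With a real HTTP client in place of fake_fetch, and the defaults of 10 workers and 120 seconds, the three values returned correspond to the three columns reported for each test.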

Drupal parameter changes

In order for Drupal to run correctly behind a reverse proxy, several parameters need to be set:

In settings.php, make sure that the base url and cookie domain correspond to the external server name, and not the local machine name. So, in this case, we used the external name for the proxy server (tuna.example.com) and not the Drupal server (head.example.com).

$base_url = 'http://tuna.example.com';
$cookie_domain = 'tuna.example.com'; 

Moreover, you have to set the reverse proxy parameter to TRUE and list the IP address of the reverse proxy.

$conf = array(
  'reverse_proxy' => TRUE,
  'reverse_proxy_addresses' => array('192.168.0.111'),
);

These parameters are available in Drupal 6.x and 7.x.
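Why does Drupal need to know the proxy's address? Behind a proxy, every connection appears to come from the proxy itself, and the visitor's real address arrives in the X-Forwarded-For header, which should only be believed when a known proxy sent it. The Python sketch below illustrates that idea as we understand it. It is not Drupal's code, and the function name and sample addresses are hypothetical.

```python
def client_ip(remote_addr, forwarded_for, reverse_proxy, reverse_proxy_addresses):
    """Pick the address to treat as the visitor's IP.

    remote_addr: address of the machine that opened the connection.
    forwarded_for: value of the X-Forwarded-For header, or None if absent.
    """
    trusted = reverse_proxy and remote_addr in reverse_proxy_addresses
    if trusted and forwarded_for:
        # The last entry is the one appended by the proxy closest to us.
        return forwarded_for.split(",")[-1].strip()
    # Not behind a known proxy: the header could be forged, so ignore it.
    return remote_addr


proxies = ["192.168.0.111"]
via_proxy = client_ip("192.168.0.111", "203.0.113.7", True, proxies)
direct = client_ip("198.51.100.9", "203.0.113.7", True, proxies)
print(via_proxy, direct)
```

In the first call the connection comes from the listed proxy, so the forwarded address is used. In the second call the connection comes from an unknown machine, so the header is ignored.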

Test 1 No caching in Drupal

In this test, we run Drupal as it ships by default. The page cache is disabled.

The results on our test setup are as follows:

Number of requests  Response time (seconds)  Transaction rate (Requests/second)
3,233               0.37                     26.87
3,282               0.37                     27.2
As you can see, the site can only do about 27 transactions per second at most.

Test 2 Only block caching enabled

Out of curiosity, we tested the block cache, with only two "light" blocks on the site, one on each sidebar.

Number of requests  Response time (seconds)  Transaction rate (Requests/second)
3,044               0.39                     25.31
3,094               0.39                     25.71

A slight slowdown was observed. It is unexplained and warrants more investigation on a site with a lot of blocks.

Test 3 Page caching, 5 minutes lifetime, without patch

We then enabled Drupal's page caching in normal mode, with the minimum cache life time set to 5 minutes.

Number of requests  Response time (seconds)  Transaction rate (Requests/second)
18,820              0.06                     156.82
20,860              0.06                     174.44

A noticeable improvement was observed, as the site is able to do over 150 requests per second.

Test 4 Page caching, 5 minutes lifetime, with patch

We applied the patch we developed specifically for reverse proxies, which is in comment 39 of issue #147310. This patch is for Drupal 7.x. Attached to this article, you will also find a Drupal 6.x version of the patch in case you want to use it.

Number of requests  Response time (seconds)  Transaction rate (Requests/second)
61,610              0.02                     512.05
61,640              0.02                     514.91

That was impressive! From roughly 157 to 174 requests per second up to over 512 requests per second!
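To put the gain in proportion, the short Python calculation below averages the two runs of each test and compares them. The rates are the ones reported in the tables above; the dictionary keys and variable names are only labels chosen for this example.

```python
# Transaction rates (requests/second) from the tests above, two runs each.
rates = {
    "no caching": (26.87, 27.2),
    "page cache, no patch": (156.82, 174.44),
    "page cache, with patch": (512.05, 514.91),
}


def mean(values):
    return sum(values) / len(values)


averages = {name: mean(runs) for name, runs in rates.items()}

speedup_vs_page_cache = averages["page cache, with patch"] / averages["page cache, no patch"]
speedup_vs_no_cache = averages["page cache, with patch"] / averages["no caching"]

print("vs page cache only: %.1fx" % speedup_vs_page_cache)
print("vs no caching: %.1fx" % speedup_vs_no_cache)
```

On these figures, the patched setup behind Squid handles roughly 3 times the requests of Drupal's page cache alone, and roughly 19 times those of Drupal with no caching.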

Conclusion

For a heavily trafficked site that has a large portion of its visitors not logged in, Squid can be a very beneficial tool. The patching needed in Drupal for Squid is relatively minor. We hope to push for this patch to become part of the standard Drupal 7.x release. Testing of this patch behind corporate proxies is encouraged. Please report your results in the issue, as well as in the comments below.

Comments

Thu, 2008/09/11 - 03:36

Thanks 2bits for providing this solution. Can you explain a little bit of what is happening when implementing this patch - for those of us who are not familiar with this type of caching?

Regards

Sat, 2008/12/13 - 12:11

Maybe this question is way off target, but please help me out - I've been struggling with the general impression that Drupal ain't for dialups, not because of server load, but just because of the huge amount of unused data that gets transferred, compared to the 'minified' data transfer set that would be needed to get the identical output on the browser.

Mainly, the ginormous amount of unused CSS. A first pass at some calculations showed that on 56K dial-up you could save 2-3 full seconds per page load just by filtering out the unused CSS identifiers. There are some handy apps to find the unused identifiers, and the ratios of used-to-total are pretty appalling if you look around various drupal sites. 50%, 20%, even 10% or less on large production drupal sites seems pretty common.

 So, wouldn't it be handy if some type of proxy scheme could first render the page from drupal, then second trim unused css and javascript, then third send that trimmed package over the pipe to the client?

 For reference please see the questions at http://drupal.org/node/338102

 Any thoughts?  Can squid be combined with CSS-pruning and/or javascript-pruning??  This would be a huge win and would put drupal back into the realm of usability for the rest of the world that's not ultra-connected... and if you're running a nonprofit web page where most clients have dial-up, it's a big deal.

 Or is this way off base??

 Thanks in advance

Sat, 2008/12/13 - 22:42

This is off topic, but I will answer it only this once. We can continue in the issue you mentioned if you want.

Drupal is certainly for dialup (and broadband and mobile phones and PDAs, and everything ...)

1. CSS is loaded only on the first access from any modern browser. This means the first access will take a few seconds, but subsequent ones will not load the CSS, because it is stored in the browser's cache.

2. You can go to admin/settings/performance and compress the CSS, so it is only one file and not many. This will cut the time needed for page loads.

3. You can use the following code in template.php to remove any CSS files you like, then take the few pieces you need from them and roll those into the style.css of your theme.

function _phptemplate_variables($hook, $vars) {
  $css = drupal_add_css();

  // System CSS
  unset($css['all']['module']['modules/system/system.css']);
  unset($css['all']['module']['modules/system/defaults.css']);

  // Module CSS
  $rm = drupal_get_path('module','help').'/help.css';
  unset($css['all']['module'][$rm]);

  $vars['styles'] = drupal_get_css($css);
  return $vars;
}

4. I am not sure if an automated approach would work. It would be a lot of work, but if such logic is developed (the hard part), writing a proxy to strip down the CSS would be the easy part.

Sat, 2008/12/13 - 23:01

Thanks for the reply.  Will put more thought into it here.

Four Kitchens maintains Pressflow, a free, open-source derivative of Drupal featuring extensive performance improvements while maintaining compatibility with contributed modules.

We currently maintain Pressflow 5 and 6, which are derivatives of Drupal 5 and 6. Both have built-in support for Squid and other reverse proxy caches. We used Khalid's patch above for Pressflow 6, and we also adapted his patch for Pressflow 5, which required backporting some related proxy support code from Drupal 6.

If you'd like more information about how you can use Pressflow to scale your Drupal projects, please contact Four Kitchens:

http://fourkitchens.com/contact

Thu, 2009/01/15 - 11:49

Could the patch for Drupal 6 be modified for the Drupal 5 environment?

Thu, 2009/01/15 - 12:06

It can be changed to work with Drupal 5.x, but it is not simple, since Drupal 6.x has the reverse proxy settings.

If you are interested in 2bits working on this for you, then please click on Contact on the top right of this page.
