Dynamic web applications like Drupal offer a lot of benefits for a web site. However, when it comes to scalability, there can be challenges with any dynamic web site, including those built with Drupal.

One strategy that helps a web site scale is setting up a caching reverse proxy, such as Squid or Varnish. These become very relevant when the site receives a lot of requests from users who are not logged in (anonymous users).

How much of a performance boost does a caching reverse proxy provide? How do I set one up? Do I need Drupal changes to set it up?

Read below for the answers to these questions.

What is a caching reverse proxy?

The idea behind a caching reverse proxy is for it to act as a front end to the web server, retrieving responses from it and serving them to users, while keeping copies in its own cache for future requests. Users who access the site can then be served directly from the pages cached in the proxy, without needing to go back to the web server, execute PHP, run database queries, and so on. Since the cached copy is only HTML, it is served either from disk or from memory, which makes pages feel much faster to users (20 ms or so) as opposed to dynamically generating content from Drupal (100s of ms).
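
To make the idea concrete, here is a toy model of a caching reverse proxy in Python. It is only a sketch of the concept, not how Squid works internally: the class `CachingProxy`, the function `slow_origin`, and the 300 second lifetime are all made up for this illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy model of a caching reverse proxy (not how Squid is implemented)."""

    def __init__(self, origin: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.origin = origin            # stands in for web server + PHP + database
        self.ttl_seconds = ttl_seconds  # how long a stored copy stays fresh
        self.clock = clock
        self.cache: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.cache.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]             # served from the proxy's own copy
        self.misses += 1
        body = self.origin(url)         # go back to the origin server
        self.cache[url] = (now, body)   # keep a copy for future requests
        return body


def slow_origin(url: str) -> str:
    # Hypothetical origin: pretend this runs PHP and database queries.
    return f"<html>page for {url}</html>"


proxy = CachingProxy(slow_origin, ttl_seconds=300)
proxy.get("/node/1")   # miss: fetched from the origin
proxy.get("/node/1")   # hit: served from the cache
print(proxy.hits, proxy.misses)
```

The second request for the same URL never reaches the origin, which is where the savings come from.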

See the diagrams in "What is Reverse Proxy Cache" and "How Reverse proxy caches work". (Ignore the configuration on that page, since it does not apply to Squid 2.6.)

Why is a Drupal patch needed?

With Drupal, the HTTP headers sent insist that content not be cached and always be retrieved from the origin web server. These headers are documented in RFC 2616.

So, the patch discussed below changes these headers to be friendly to caching reverse proxies. This eases the load on the Drupal server by letting the proxy serve pages to anonymous users, and makes the site feel faster for them.
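
As a rough illustration of why the headers matter, the Python sketch below applies a very simplified reading of the RFC 2616 Cache-Control rules for a shared cache. The function names and both example header values are assumptions made up for this example; they are not the exact headers that Drupal or the patch sends.

```python
from typing import Dict


def parse_cache_control(value: str) -> Dict[str, str]:
    """Split a Cache-Control header into {directive: argument} pairs."""
    directives: Dict[str, str] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def may_serve_from_cache(headers: Dict[str, str]) -> bool:
    """Simplified check: may a shared cache reuse this response
    without going back to the origin server first?"""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc or "private" in cc:
        return False
    # s-maxage applies to shared caches only and wins over max-age.
    for name in ("s-maxage", "max-age"):
        if name in cc:
            return cc[name].isdigit() and int(cc[name]) > 0
    return "public" in cc


# Illustrative header values only (assumptions, not Drupal's exact output).
uncacheable = {"Cache-Control": "no-cache, must-revalidate"}
proxy_friendly = {"Cache-Control": "public, max-age=300"}

print(may_serve_from_cache(uncacheable))
print(may_serve_from_cache(proxy_friendly))
```

A real proxy also looks at Expires, Vary, validators, and more, so treat this as a reading aid rather than a complete rule set.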

Setting up Squid as a reverse proxy

Squid can be many things. One of them is a caching reverse proxy. So, the first order of business is to set up Squid as a reverse proxy.

This setup assumes that Squid's version is 2.6 and that it will run on its own server (tuna.example.com) and will be proxying for the Drupal server at head.example.com.

Here is the /etc/squid/squid.conf file, annotated for easier understanding. Note the required Drupal part at the end.

# Basic parameters
visible_hostname localhost
# This line indicates the server we will be proxying for
http_port 80 accel defaultsite=head.example.com
# And the IP Address for it
cache_peer 192.168.0.222 parent 80 0 no-query originserver
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
# Where the cache files will be, memory and such 
cache_dir ufs /var/spool/squid 10000 16 256
cache_mem 256 MB
maximum_object_size_in_memory 32 KB
# Log locations and format 
logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st %Ss:%Sh
logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st \
  "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log
hosts_file /etc/hosts
# Basic ACLs
acl all src 0.0.0.0/0.0.0.0
acl mydomain dstdomain .example.com
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl Safe_ports port 80
acl purge method PURGE
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access allow localhost
http_access allow all
http_access allow mydomain
http_access deny all
http_reply_access allow all
icp_access allow all
cache_effective_group proxy
coredump_dir /var/spool/squid
forwarded_for on
emulate_httpd_log on
redirect_rewrites_host_header off
buffered_logs on
# Drupal specific stuff, assumes the patch in #147310
acl cookie_logged_in_set rep_header Set-Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in_set
acl cookie_logged_in_out rep_header Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in_out
acl cookie_logged_in     req_header Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in 
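
The three Drupal-specific rules at the end of the configuration amount to the following decision, sketched here in Python. This only illustrates the logic and is not how Squid evaluates ACLs internally; the function name `proxy_may_cache` is made up for this example.

```python
import re
from typing import Mapping

# Same pattern that the three Squid ACLs match against header values.
LOGGED_IN = re.compile(r"DRUPAL_LOGGED_IN=Y")


def proxy_may_cache(request_headers: Mapping[str, str],
                    response_headers: Mapping[str, str]) -> bool:
    """Return False when any of the three 'cache deny' rules would match."""
    # acl cookie_logged_in req_header Cookie ...
    if LOGGED_IN.search(request_headers.get("Cookie", "")):
        return False
    # acl cookie_logged_in_set rep_header Set-Cookie ...
    if LOGGED_IN.search(response_headers.get("Set-Cookie", "")):
        return False
    # acl cookie_logged_in_out rep_header Cookie ...
    if LOGGED_IN.search(response_headers.get("Cookie", "")):
        return False
    return True


anonymous = proxy_may_cache({}, {"Content-Type": "text/html"})
logged_in = proxy_may_cache({"Cookie": "DRUPAL_LOGGED_IN=Y"}, {})
logging_in = proxy_may_cache({}, {"Set-Cookie": "DRUPAL_LOGGED_IN=Y; path=/"})
print(anonymous, logged_in, logging_in)
```

In other words, only traffic that carries no logged-in cookie in either direction is eligible for the cache.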

Designing the tests

In order to assess the impact of Squid on a running server, we had to design and execute a testing strategy for it.

For these tests we used a CVS checkout of Drupal 7.x. We also developed a patch for the correct HTTP headers (see below).

We used the devel module to set up the site with the following content:

514 nodes of types page and article, each one with its own path alias, 502 users, 5111 comments, 11 taxonomy vocabularies, and 544 taxonomy terms.

We then selected 51 unique URLs, using their path alias, as well as the home page. These same URLs are exercised in every test below.

We set up the stress tests to hammer the site for 2 full minutes as fast as possible, from 10 concurrent users. Each test was run twice. All the tests were run through the proxy.
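
The shape of such a test can be sketched in Python as below. This is not the tool used for the numbers in this article; `run_load_test` and `fake_fetch` are hypothetical names, and the fake fetch only sleeps so that the example runs without a network. A real run would pass a function that actually requests the URL through the proxy.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_load_test(fetch: Callable[[str], None], urls: Sequence[str],
                  concurrency: int, duration_seconds: float) -> dict:
    """Request the URLs round-robin from `concurrency` workers until time is up."""
    deadline = time.monotonic() + duration_seconds
    lock = threading.Lock()
    totals = {"requests": 0, "busy_seconds": 0.0}

    def worker(offset: int) -> None:
        i = offset
        while time.monotonic() < deadline:
            started = time.monotonic()
            fetch(urls[i % len(urls)])
            elapsed = time.monotonic() - started
            with lock:
                totals["requests"] += 1
                totals["busy_seconds"] += elapsed
            i += 1

    wall_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(worker, range(concurrency)))  # wait for all workers
    wall = time.monotonic() - wall_start

    n = totals["requests"]
    return {
        "requests": n,
        "response_time": totals["busy_seconds"] / n if n else 0.0,
        "transaction_rate": n / wall if wall > 0 else 0.0,
    }


def fake_fetch(url: str) -> None:
    # Stand-in for an HTTP request; sleeps about 1 ms instead of using the network.
    time.sleep(0.001)


# Short demo run: 2 workers for 0.2 seconds instead of 10 users for 2 minutes.
result = run_load_test(fake_fetch, ["/", "/about"], concurrency=2,
                       duration_seconds=0.2)
print(result)
```

The three values returned correspond to the three columns reported for each test: number of requests, average response time, and transaction rate.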

Drupal parameter changes

In order for Drupal to run correctly behind a reverse proxy, several parameters need to be set:

In settings.php, make sure that the base url and cookie domain correspond to the external server name, and not the local machine name. So, in this case, we used the external name for the proxy server (tuna.example.com) and not the Drupal server (head.example.com).

$base_url = 'http://tuna.example.com';
$cookie_domain = 'tuna.example.com'; 

Moreover, you have to set the reverse proxy parameter to true, and list the IP address of the reverse proxy.

$conf = array(
'reverse_proxy' => TRUE,
'reverse_proxy_addresses' => array('192.168.0.111',),
); 

These parameters are available in Drupal 6.x and 7.x.
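
The reason for listing the proxy's address is that, behind a proxy, every request reaches the web server from the proxy's IP, and the visitor's address is only available in the X-Forwarded-For header (Squid adds it when forwarded_for is on). That header should only be trusted when the request really comes from a known proxy. The Python sketch below illustrates this idea under simplified assumptions; it is not Drupal's actual implementation, and the function name `client_ip` is made up.

```python
from typing import Iterable, Mapping


def client_ip(remote_addr: str, headers: Mapping[str, str],
              reverse_proxy: bool, trusted_proxies: Iterable[str]) -> str:
    """Pick the visitor's IP, trusting X-Forwarded-For only from known proxies."""
    forwarded = headers.get("X-Forwarded-For", "")
    if reverse_proxy and forwarded and remote_addr in set(trusted_proxies):
        # Take the last address in the list: the one appended by our own proxy.
        return forwarded.split(",")[-1].strip()
    # Direct request, or a sender we do not trust: ignore the header.
    return remote_addr


trusted = ["192.168.0.111"]  # the reverse proxy address from settings.php

via_proxy = client_ip("192.168.0.111", {"X-Forwarded-For": "203.0.113.7"},
                      True, trusted)
spoofed = client_ip("198.51.100.9", {"X-Forwarded-For": "203.0.113.7"},
                    True, trusted)
print(via_proxy, spoofed)
```

Without the list of trusted addresses, any client could claim an arbitrary IP simply by sending its own X-Forwarded-For header.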

Test 1 No caching in Drupal

In this test, we run Drupal as it ships by default. The page cache is disabled.

The results on our test setup are as follows:

Number of requests    Response time (seconds)    Transaction rate (requests/second)
3,233                 0.37                       26.87
3,282                 0.37                       27.2
As you can see, the site can only do about 27 transactions per second at most.

Test 2 Only block caching enabled

Out of curiosity, we tested the block cache, with only two "light" blocks on the site, one on each sidebar.

Number of requests    Response time (seconds)    Transaction rate (requests/second)
3,044                 0.39                       25.31
3,094                 0.39                       25.71

A slight slowdown was observed. This is unexplained and warrants more investigation on a site with a lot of blocks on it.

Test 3 Page caching, 5 minutes lifetime, without patch

We then enabled Drupal's page caching in normal mode, with the minimum cache lifetime set to 5 minutes.

Number of requests    Response time (seconds)    Transaction rate (requests/second)
18,820                0.06                       156.82
20,860                0.06                       174.44

A noticeable improvement was observed, as the site is able to do over 150 requests per second.

Test 4 Page caching, 5 minutes lifetime, with patch

We applied the patch we developed specifically for reverse proxies, which is in comment 39 of issue #147310. This patch is for Drupal 7.x. Attached to this article, you will also find a Drupal 6.x version of the patch in case you want to use it.

Number of requests    Response time (seconds)    Transaction rate (requests/second)
61,610                0.02                       512.05
61,640                0.02                       514.91

That was impressive! From 170 requests per second to 512 requests per second!
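
To put the four tests side by side, the short Python snippet below averages the two runs of each test and expresses them relative to the uncached baseline. The transaction rates are copied from the tables above; the variable names are just for this illustration.

```python
# Transaction rates (requests/second) copied from the result tables, two runs each.
rates = {
    "Test 1: no caching": (26.87, 27.2),
    "Test 2: block caching only": (25.31, 25.71),
    "Test 3: page caching, without patch": (156.82, 174.44),
    "Test 4: page caching, with patch": (512.05, 514.91),
}

averages = {name: sum(runs) / len(runs) for name, runs in rates.items()}
baseline = averages["Test 1: no caching"]

for name, avg in averages.items():
    print(f"{name}: {avg:.2f} req/s, {avg / baseline:.2f}x the uncached rate")

patch_gain = (averages["Test 4: page caching, with patch"]
              / averages["Test 3: page caching, without patch"])
print(f"Patch gain over Drupal page caching alone: {patch_gain:.2f}x")
```

Averaged over both runs, the patched setup behind Squid handles about 19 times the uncached rate, and about 3.1 times the rate of Drupal's page cache alone.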

Conclusion

For a heavily trafficked site that has a large portion of its visitors not logged in, Squid can be a very beneficial tool. The patching needed in Drupal for Squid is relatively minor. We hope to push for this patch to become part of the standard Drupal 7.x release. Testing of this patch behind corporate proxies is encouraged. Please report your results in the issue, as well as in the comments below.

Comments

Tue, 2009/03/17 - 10:42

Sometimes I access the site via a single node of the web servers. I have two apache2 servers feeding the site through a reverse proxy. Making the changes to settings.php corrected issues I was having with sitemaps for Google delivering the local IP instead of the published URL.

However, now I cannot directly access a single node out of the pair of web servers.

Is there a way to map through this so that a single node can be addressed? I have mappings through the reverse proxy for them with their own url and then the main site is load balanced between them with session tracking.

Sun, 2009/10/25 - 18:32

I read on your post that you need to set the variables

$base_url = 'http://tuna.example.com';
$cookie_domain = 'tuna.example.com';

in your settings.php

So... you have to put the address of the machine running squid in there? At least that's what it looks like on the example: squid server is 'tuna' and www server is 'head'.

I ask because we have a couple of web servers and a couple of squid servers running with this setup and I always thought you had to put the website's real url in there, not the cache one...

Sun, 2010/04/25 - 05:56

I read your articles several times and I like them.

I've tried to use a reverse proxy for my Drupal site. I used your Squid configuration and it works well. But my website is a multisite with many subdomains.

The problem is that the content is the same on every subdomain.

I used Varnish before Squid and the content was correct, but after testing I think Squid is faster.

Thanks and best regards,

Thu, 2011/03/24 - 08:07

Hi,
I have Drupal 6.2 and when I run the patch I get this:
$patch <147310-42-d6.patch

--------------------------
|Index: includes/bootstrap.inc
|===================================================================
|RCS file: /cvs/drupal/drupal/includes/bootstrap.inc,v
|retrieving revision 1.206.2.4
|diff -u -F^f -r1.206.2.4 bootstrap.inc
|--- includes/bootstrap.inc 18 Aug 2008 18:56:30 -0000 1.206.2.4
|+++ includes/bootstrap.inc 11 Sep 2008 15:48:05 -0000
--------------------------
File to patch: includes/bootstrap.inc
patching file includes/bootstrap.inc
Hunk #1 succeeded at 723 (offset 135 lines).
Hunk #2 FAILED at 752.
1 out of 2 hunks FAILED -- saving rejects to file includes/bootstrap.inc.rej
can't find file to patch at input line 82
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------

How can I fix this issue?
Thanks

Wed, 2013/09/18 - 10:46

Thank you so much for your nice tutorial.

Recently I set up a reverse proxy server with Squid (server accelerator) and wrote a full detailed tutorial that you can find at:

http://cosmolinux.no-ip.org/raconetlinux/html/17-squid.html

where I explain how to configure Squid (version 3.x) as a reverse Proxy Server (server accelerator), providing examples about how to do it using two computers (one as a Proxy server and another as a Web Server) or just by using one single computer.

I also describe how to format Squid's logs and how to send the logs to a remote computer.

Also, you can find an explanation of how to deny access to certain files and how to get correct logs in Apache Web Server.

I hope it is useful to someone.
