Increasing Drupal's speed via the Squid caching reverse proxy

Dynamic web applications like Drupal offer a lot of benefits for a web site. However, any dynamic web site can face scalability challenges, including those built with Drupal.

One strategy that helps is to put a caching reverse proxy, such as Squid or Varnish, in front of the web site. This is most effective when the site receives a lot of requests from users who are not logged in (anonymous users).

How much of a performance boost does a caching reverse proxy provide? How do I set one up? Do I need Drupal changes to set it up?

Read below for the answers to these questions.

What is a caching reverse proxy?

The idea behind a caching reverse proxy is for it to act as a front end to the web server: it retrieves pages from the web server and serves them to users, while keeping copies in its own cache for future requests. Users accessing the site can then be served directly from the pages cached in the proxy, without needing to go back to the web server, execute PHP, run database queries, and so on. Since the cached copy is only HTML, it is served either from disk or from memory, which makes pages feel much faster to users (20 ms or so) than content generated dynamically by Drupal (100s of ms).
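The hit-or-miss behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how Squid is implemented: the class name, the `fake_drupal` function, and the 300 second lifetime are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TinyReverseProxyCache:
    """Toy model of a caching reverse proxy (hypothetical, not Squid's design)."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_seconds: float = 300.0):
        self.fetch_from_origin = fetch_from_origin  # stands in for Apache + PHP + database
        self.ttl_seconds = ttl_seconds
        self.store: Dict[str, Tuple[float, str]] = {}  # url -> (expiry time, html)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = time.monotonic()
        cached = self.store.get(url)
        if cached is not None and cached[0] > now:
            self.hits += 1
            return cached[1]  # fast path: the origin server is not contacted
        self.misses += 1
        html = self.fetch_from_origin(url)  # slow path: the page is generated dynamically
        self.store[url] = (now + self.ttl_seconds, html)
        return html


origin_calls = []


def fake_drupal(url: str) -> str:
    """Hypothetical origin server that records every request it receives."""
    origin_calls.append(url)
    return "<html>page for %s</html>" % url


proxy = TinyReverseProxyCache(fake_drupal)
first = proxy.get("/node/1")   # miss: goes to the origin
second = proxy.get("/node/1")  # hit: served from the cache
print(proxy.hits, proxy.misses, len(origin_calls))
```

The second request never reaches `fake_drupal`, which is the whole point: the origin only pays the cost of generating a page once per cache lifetime.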

See diagrams of What is Reverse Proxy Cache, and How Reverse proxy caches work. (Ignore the configuration on that page since it does not apply to Squid 2.6).

Why is a Drupal patch needed?

With Drupal, the HTTP headers sent insist that content not be cached and that it always be retrieved from the origin web server. These headers are documented in RFC 2616.

So, the patch discussed below changes these headers to be friendly to caching reverse proxies. This eases the load on the Drupal server by letting the proxy serve pages to anonymous users, and makes the site feel faster for them.
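To make the role of the headers concrete, here is a rough Python sketch of the kind of decision a shared cache makes from Cache-Control. It is deliberately simplified (real proxies also consider Expires, Vary, the request method, and more), and the two example header values are assumptions for illustration, not the exact headers that Drupal or the patch sends.

```python
def servable_from_proxy_cache(headers):
    """Return True if a shared cache may answer later requests from its stored copy.

    Simplified on purpose: only Cache-Control is inspected, and any of
    no-store, no-cache or private is treated as "always go to the origin".
    A response with no Cache-Control header is treated as servable here.
    """
    cache_control = headers.get("Cache-Control", "")
    directives = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return not (directives & {"no-store", "no-cache", "private"})


# Illustrative values only (assumptions, not the literal Drupal output).
uncacheable = {"Cache-Control": "no-cache, must-revalidate"}
proxy_friendly = {"Cache-Control": "public, max-age=300"}

print(servable_from_proxy_cache(uncacheable))
print(servable_from_proxy_cache(proxy_friendly))
```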

Setting up Squid as a reverse proxy

Squid can be many things. One of them is a caching reverse proxy. So, the first order of business is to set up Squid as a reverse proxy.

This setup assumes that Squid's version is 2.6 and that it will run on its own server (tuna.example.com) and will be proxying for the Drupal server at head.example.com.

Here is the /etc/squid/squid.conf file, annotated for easier understanding. Note the required Drupal part at the end.

# Basic parameters
visible_hostname localhost
# This line indicates the server we will be proxying for
http_port 80 accel defaultsite=head.example.com
# And the IP Address for it
cache_peer 192.168.0.222 parent 80 0 no-query originserver
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
# Where the cache files will be, memory and such 
cache_dir ufs /var/spool/squid 10000 16 256
cache_mem 256 MB
maximum_object_size_in_memory 32 KB
# Log locations and format 
logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st %Ss:%Sh
logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log
hosts_file /etc/hosts
# Basic ACLs
acl all src 0.0.0.0/0.0.0.0
acl mydomain dstdomain .example.com
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl Safe_ports port 80
acl purge method PURGE
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access allow localhost
http_access allow all
http_access allow mydomain
http_access deny all
http_reply_access allow all
icp_access allow all
cache_effective_group proxy
coredump_dir /var/spool/squid
forwarded_for on
emulate_httpd_log on
redirect_rewrites_host_header off
buffered_logs on
# Drupal specific stuff, assumes the patch in #147310
acl cookie_logged_in_set rep_header Set-Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in_set
acl cookie_logged_in_out rep_header Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in_out
acl cookie_logged_in     req_header Cookie DRUPAL_LOGGED_IN=Y
cache deny cookie_logged_in 
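
The three Drupal rules at the end of the configuration all look for the same DRUPAL_LOGGED_IN=Y marker, in the reply's Set-Cookie header, in a Cookie header on the reply, and in the request's Cookie header. As a rough illustration of that logic only (this is Python, not something Squid runs, and the `SESSabc` cookie name is hypothetical), the combined effect is:

```python
import re

# Same pattern that the three ACLs in squid.conf match against.
LOGGED_IN = re.compile(r"DRUPAL_LOGGED_IN=Y")


def squid_may_cache(request_headers, reply_headers):
    """Mirror the three 'cache deny' rules: any match means the cache is bypassed."""
    checks = (
        reply_headers.get("Set-Cookie", ""),  # acl cookie_logged_in_set
        reply_headers.get("Cookie", ""),      # acl cookie_logged_in_out
        request_headers.get("Cookie", ""),    # acl cookie_logged_in
    )
    return not any(LOGGED_IN.search(value) for value in checks)


# Anonymous visitor: no marker anywhere, so the page may be cached.
anonymous = squid_may_cache({}, {})

# Reply that logs the user in: it sets the marker cookie.
logging_in = squid_may_cache({}, {"Set-Cookie": "DRUPAL_LOGGED_IN=Y; path=/"})

# Later request from a logged in user: the browser sends the marker back.
logged_in = squid_may_cache({"Cookie": "SESSabc=123; DRUPAL_LOGGED_IN=Y"}, {})

print(anonymous, logging_in, logged_in)
```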

Designing the tests

In order to assess the impact of Squid on a running server, we had to design and execute a testing strategy for it.

For these tests we used a CVS checkout of Drupal 7.x. We also developed a patch for the correct HTTP headers (see below).

We used the devel module to set up the site with the following content:

514 nodes of type page and article, each one with its own path alias, 502 users, 5,111 comments, 11 taxonomy vocabularies, and 544 taxonomy terms.

We then selected 51 unique URLs, using their path alias, as well as the home page. These same URLs are exercised in every test below.

We set up the stress tests to hammer the site for 2 full minutes as fast as possible, from 10 concurrent users. Each test was run twice. All the tests were run through the proxy.
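
The article does not name the load testing tool, so the following Python sketch is only an assumption about the shape of such a test: a fixed number of concurrent workers loop over the same URL list until a deadline, and the three figures reported in the result tables are derived from the counts. The `run_load_test` function and the `fake_fetch` stand-in are hypothetical; the demo runs for a fraction of a second against a fake fetcher rather than a real site.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def run_load_test(fetch, urls, concurrency=10, duration_seconds=120.0):
    """Hit `urls` in a loop from `concurrency` workers until the deadline passes."""
    deadline = time.monotonic() + duration_seconds

    def worker(worker_urls):
        count = 0
        busy = 0.0
        for url in cycle(worker_urls):
            if time.monotonic() >= deadline:
                break
            start = time.monotonic()
            fetch(url)
            busy += time.monotonic() - start
            count += 1
        return count, busy

    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(worker, [urls] * concurrency))
    elapsed = time.monotonic() - started

    requests = sum(count for count, _ in results)
    busy = sum(seconds for _, seconds in results)
    return {
        "requests": requests,
        "response_time": busy / requests if requests else 0.0,
        "transaction_rate": requests / elapsed if elapsed else 0.0,
    }


def fake_fetch(url):
    """Stand-in for an HTTP GET through the proxy; sleeps instead of using the network."""
    time.sleep(0.001)


report = run_load_test(fake_fetch, ["/", "/node/1", "/node/2"], concurrency=10, duration_seconds=0.2)
print(report)
```

To point this at a real site, `fake_fetch` would be replaced by a function that requests the URL through the proxy, and the duration raised to 120 seconds to match the tests here.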

Drupal parameter changes

In order for Drupal to run correctly behind a reverse proxy, several parameters need to be set:

In settings.php, make sure that the base url and cookie domain correspond to the external server name, and not the local machine name. So, in this case, we used the external name for the proxy server (tuna.example.com) and not the Drupal server (head.example.com).

$base_url = 'http://tuna.example.com';
$cookie_domain = 'tuna.example.com'; 

Moreover, you have to set the reverse proxy parameter to TRUE and list the IP address of the reverse proxy.

$conf = array(
'reverse_proxy' => TRUE,
'reverse_proxy_addresses' => array('192.168.0.111',),
); 

These parameters are available in Drupal 6.x and 7.x.
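
The reason Drupal needs to know the proxy's address is that, behind Squid, every request appears to come from the proxy itself; the real visitor address travels in the X-Forwarded-For header (which the squid.conf above enables with forwarded_for on). The following Python sketch shows the general idea under the assumption that the header is trusted only when the connection comes from a listed proxy. It is not Drupal's actual code, and the `client_ip` function name is hypothetical.

```python
def client_ip(remote_addr, forwarded_for, reverse_proxy, reverse_proxy_addresses):
    """Pick the visitor's address, trusting X-Forwarded-For only from known proxies."""
    if reverse_proxy and forwarded_for and remote_addr in reverse_proxy_addresses:
        # The last entry is the one appended by the proxy that connected to us.
        return forwarded_for.split(",")[-1].strip()
    return remote_addr


proxies = ["192.168.0.111"]

# Request relayed by the trusted proxy: use the forwarded address.
via_proxy = client_ip("192.168.0.111", "203.0.113.7", True, proxies)

# Request from an unlisted address: ignore the (possibly forged) header.
direct = client_ip("198.51.100.9", "10.0.0.1", True, proxies)

print(via_proxy, direct)
```

Without this, features that depend on the visitor's address, such as logging or flood control, would see only the proxy.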

Test 1 No caching in Drupal

In this test, we run Drupal as it ships by default. The page cache is disabled.

The results on our test setup are as follows:

Number of requests   Response time (seconds)   Transaction rate (requests/second)
3,233                0.37                      26.87
3,282                0.37                      27.2

As you can see, the site can only do about 27 requests per second at most.

Test 2 Only block caching enabled

Out of curiosity, we tested the block cache, with only two "light" blocks on the site, one on each sidebar.

Number of requests   Response time (seconds)   Transaction rate (requests/second)
3,044                0.39                      25.31
3,094                0.39                      25.71

A slight slowdown was observed. This is unexplained and warrants more investigation on a site with a lot of blocks.

Test 3 Page caching, 5 minutes lifetime, without patch

We then enabled Drupal's page caching in normal mode, with the minimum cache life time set to 5 minutes.

Number of requests   Response time (seconds)   Transaction rate (requests/second)
18,820               0.06                      156.82
20,860               0.06                      174.44

A noticeable improvement was observed, as the site is able to do over 150 requests per second.

Test 4 Page caching, 5 minutes lifetime, with patch

We applied the patch we developed specifically for reverse proxies, which is in comment 39 of issue #147310. This patch is for Drupal 7.x. Attached to this article, you will also find a Drupal 6.x version of the patch in case you want to use it.

Number of requests   Response time (seconds)   Transaction rate (requests/second)
61,610               0.02                      512.05
61,640               0.02                      514.91

That was impressive! From 170 requests per second to 512 requests per second!
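
To put the four tests side by side, the short Python sketch below averages the two runs of each test and expresses the result relative to the uncached baseline. The figures are copied from the result tables; the averaging and the test labels are our own convenience, not part of the original measurements.

```python
# Transaction rates (requests/second) copied from the result tables, two runs per test.
rates = {
    "Test 1: no caching": (26.87, 27.2),
    "Test 2: block caching": (25.31, 25.71),
    "Test 3: page caching": (156.82, 174.44),
    "Test 4: page caching + patch": (512.05, 514.91),
}

# Average the two runs, then compare each test against Test 1.
averages = {name: sum(runs) / len(runs) for name, runs in rates.items()}
baseline = averages["Test 1: no caching"]
speedups = {name: avg / baseline for name, avg in averages.items()}

for name in rates:
    print("%-30s %8.2f req/s  %5.1fx" % (name, averages[name], speedups[name]))
```

On these numbers, the patched setup behind Squid handles roughly 19 times the request rate of the uncached site, and about 3 times the rate of Drupal's page cache alone.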

Conclusion

For a heavily trafficked site that has a large portion of its visitors not logged in, Squid can be a very beneficial tool. The patching needed in Drupal for Squid is relatively minor. We hope to push for this patch to become part of the standard Drupal 7.x release. Testing of this patch behind corporate proxies is encouraged. Please report your results in the issue, as well as in the comments below.

Attachment                                                 Size
Drupal 6.x patch for Squid Reverse Proxy (issue #147310)   5.17 KB

Thanks 2bits for providing

Thanks 2bits for providing this solution. Can you explain a little bit of what is happening when implementing this patch - for those of us who are not familiar with this type of caching?

Regards

Sure ...

Sure.

I added a section on this to the article.
--
2bits -- Drupal consulting

Wrong URL for patch?

It doesn't look like http://drupal.org/node/14730 is the right issue

Fixed

It is 147310. Thanks.
--
2bits -- Drupal consulting