Dynamic web applications like Drupal offer a lot of benefits for a web site. However, when it comes to scalability, there can be challenges with any dynamic web site, including those built with Drupal.
One strategy that helps a web site scale is setting up a caching reverse proxy, such as Squid or Varnish. This becomes especially relevant when the site receives a lot of requests from users who are not logged in (anonymous users).
How much of a performance boost does a caching reverse proxy provide? How do I set one up? Do I need Drupal changes to set it up?
Read below for the answers to these questions.
What is a caching reverse proxy?
The idea behind a caching reverse proxy is for it to act as a front end to the web server: it retrieves responses from the web server and serves them to users, while keeping copies in its own cache for future requests. Users accessing the site can then be served directly from the pages cached in the proxy, without needing to go back to the web server and execute PHP, database queries, and so on. Since the cached copy is only HTML, it is served either from disk or from memory, which makes pages feel much faster to users (20 ms or so) compared with generating content dynamically from Drupal (hundreds of ms).
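To make the idea concrete, here is a minimal sketch of the cache-in-front-of-origin logic in Python. It is a toy model and not how Squid is implemented: the `CachingProxy` class, the `slow_origin` function, and the 300 second lifetime are all assumptions made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy in-memory reverse proxy cache keyed by URL (illustration only)."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],
        ttl_seconds: float = 300.0,  # arbitrary lifetime chosen for the sketch
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh copy in the cache: the origin server is not contacted.
            self.hits += 1
            return entry[1]
        # Missing or expired: go back to the origin and keep a copy.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body


def slow_origin(url: str) -> str:
    """Hypothetical stand-in for the web server running PHP and database queries."""
    return f"<html>rendered {url}</html>"


if __name__ == "__main__":
    proxy = CachingProxy(slow_origin)
    proxy.get("/node/1")  # miss: fetched from the origin
    proxy.get("/node/1")  # hit: served from the proxy's cache
    print(proxy.hits, proxy.misses)
```

Only the first request for a URL reaches the origin; repeat requests within the lifetime are answered from the stored copy, which is where the time savings come from.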
See diagrams of What is Reverse Proxy Cache, and How Reverse proxy caches work. (Ignore the configuration on that page since it does not apply to Squid 2.6).
Why is a Drupal patch needed?
With Drupal, the HTTP headers sent insist that content not be cached and that it always be retrieved from the origin web server. These headers are documented in RFC 2616.
So, the patch discussed below changes these headers to be caching reverse proxy friendly. This eases the load on the Drupal server by letting the proxy serve pages to anonymous users, and makes the site feel faster for them.
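The effect of such headers on a proxy can be sketched with a small Python function that inspects `Cache-Control`. This is a simplified reading of the RFC 2616 directives, not Squid's actual decision logic: the function name is hypothetical, the header values in the usage note are illustrative rather than the exact headers Drupal or the patch emits, and other inputs such as `Expires` and `Vary` are ignored.

```python
from typing import Dict, Mapping


def proxy_may_serve_from_cache(response_headers: Mapping[str, str]) -> bool:
    """Return True if Cache-Control lets a shared cache reuse the response
    without contacting the origin. Simplified: only Cache-Control is examined."""
    cache_control = ""
    for name, value in response_headers.items():
        if name.lower() == "cache-control":  # header names are case-insensitive
            cache_control = value

    # Parse "a, b=1, c" into {"a": "", "b": "1", "c": ""}.
    directives: Dict[str, str] = {}
    for part in cache_control.split(","):
        part = part.strip().lower()
        if not part:
            continue
        key, _, val = part.partition("=")
        directives[key.strip()] = val.strip()

    # Any of these forbids a shared cache from reusing the response as-is.
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return False

    # s-maxage applies to shared caches and takes precedence over max-age.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            value = directives[key]
            return value.isdigit() and int(value) > 0

    # No explicit lifetime: treat only explicitly public responses as reusable.
    return "public" in directives
```

For example, `proxy_may_serve_from_cache({"Cache-Control": "no-cache, must-revalidate"})` is `False`, while a proxy-friendly value such as `"public, max-age=300"` yields `True`.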
Setting up Squid as a reverse proxy
Squid can be many things. One of them is a caching reverse proxy. So, the first order of business is to set up Squid as a reverse proxy.
This setup assumes that Squid's version is 2.6 and that it will run on its own server (tuna.example.com) and will be proxying for the Drupal server at head.example.com.
Here is the /etc/squid/squid.conf file, annotated for easier understanding. Note the required Drupal part at the end.
    # Basic parameters
    visible_hostname localhost

    # This line indicates the server we will be proxying for
    http_port 80 accel defaultsite=head.example.com

    # And the IP Address for it
    cache_peer 192.168.0.222 parent 80 0 no-query originserver

    acl apache rep_header Server ^Apache
    broken_vary_encoding allow apache

    # Where the cache files will be, memory and such
    cache_dir ufs /var/spool/squid 10000 16 256
    cache_mem 256 MB
    maximum_object_size_in_memory 32 KB

    # Log locations and format
    logformat common %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st %Ss:%Sh
    logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st \
    "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
    access_log /var/log/squid/access.log squid
    cache_log /var/log/squid/cache.log
    cache_store_log /var/log/squid/store.log
    hosts_file /etc/hosts

    # Basic ACLs
    acl all src 0.0.0.0/0.0.0.0
    acl mydomain dstdomain .example.com
    acl manager proto cache_object
    acl localhost src 127.0.0.1/255.255.255.255
    acl to_localhost dst 127.0.0.0/8
    acl Safe_ports port 80
    acl purge method PURGE
    acl CONNECT method CONNECT

    http_access allow manager localhost
    http_access deny manager
    http_access allow purge localhost
    http_access deny purge
    http_access deny !Safe_ports
    http_access allow localhost
    http_access allow all
    http_access allow mydomain
    http_access deny all
    http_reply_access allow all
    icp_access allow all

    cache_effective_group proxy
    coredump_dir /var/spool/squid
    forwarded_for on
    emulate_httpd_log on
    redirect_rewrites_host_header off
    buffered_logs on

    # Drupal specific stuff, assumes the patch in #147310
    acl cookie_logged_in_set rep_header Set-Cookie DRUPAL_LOGGED_IN=Y
    cache deny cookie_logged_in_set
    acl cookie_logged_in_out rep_header Cookie DRUPAL_LOGGED_IN=Y
    cache deny cookie_logged_in_out
    acl cookie_logged_in req_header Cookie DRUPAL_LOGGED_IN=Y
    cache deny cookie_logged_in
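The three Drupal-specific rules at the end of the configuration can be read as a single decision: do not cache anything that carries the logged-in cookie, in either direction. A rough Python model of that decision follows. It only mirrors the ACLs for illustration; the function names are hypothetical and this is not Squid code.

```python
import re
from typing import Mapping, Optional

# Same pattern the ACLs match on (Squid ACL regexes are case-sensitive by default).
LOGGED_IN = re.compile(r"DRUPAL_LOGGED_IN=Y")


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def cache_denied(request_headers: Mapping[str, str],
                 response_headers: Mapping[str, str]) -> bool:
    """Return True if any of the three 'cache deny' rules would match."""
    rules = (
        (response_headers, "Set-Cookie"),  # acl cookie_logged_in_set
        (response_headers, "Cookie"),      # acl cookie_logged_in_out
        (request_headers, "Cookie"),       # acl cookie_logged_in
    )
    for headers, name in rules:
        value = _header(headers, name)
        if value is not None and LOGGED_IN.search(value):
            return True
    return False
```

A request from an anonymous visitor, which carries no `DRUPAL_LOGGED_IN=Y` cookie, passes all three checks and remains eligible for caching; a request or response for a logged-in user does not.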
Designing the tests
In order to assess the impact of Squid on a running server, we had to design and execute a testing strategy for it.
For these tests we used a CVS checkout of Drupal 7.x. We also developed a patch for the correct HTTP headers (see below).
We used the devel module to set up the site with the following content:
514 Nodes of type: page and article, each one with its own path alias, 502 users, 5111 comments, 11 Taxonomy vocabularies, and 544 Taxonomy terms.
We then selected 51 unique URLs, using their path alias, as well as the home page. These same URLs are exercised in every test below.
We set up the stress tests to hammer the site for 2 full minutes as fast as possible, from 10 concurrent users. Each test was run twice. All the tests were run through the proxy.
Drupal parameter changes
In order for Drupal to run correctly behind a reverse proxy, several parameters need to be set:
In settings.php, make sure that the base url and cookie domain correspond to the external server name, and not the local machine name. So, in this case, we used the external name for the proxy server (tuna.example.com) and not the Drupal server (head.example.com).
    $base_url = 'http://tuna.example.com';
    $cookie_domain = 'tuna.example.com';
Moreover, you have to set the reverse proxy parameter to TRUE and specify the IP address of the reverse proxy.
    $conf = array(
      'reverse_proxy' => TRUE,
      'reverse_proxy_addresses' => array('192.168.0.111',),
    );
These parameters are available in Drupal 6.x and 7.x.
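The reason the proxy's address has to be listed is that, behind a reverse proxy, every request appears to come from the proxy itself. With `forwarded_for on`, Squid passes the real client address in the `X-Forwarded-For` header, and the backend should believe that header only when the connection comes from a known proxy. The Python sketch below illustrates the general idea; it is not Drupal's implementation, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def client_ip(remote_addr: str,
              x_forwarded_for: Optional[str],
              trusted_proxies: Iterable[str]) -> str:
    """Pick the client address, trusting X-Forwarded-For only from known proxies."""
    trusted = set(trusted_proxies)
    # A direct connection, or one from an unknown host, is taken at face value:
    # anyone can send a forged X-Forwarded-For header.
    if remote_addr not in trusted or not x_forwarded_for:
        return remote_addr
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    # Walk from the nearest hop outward, skipping our own proxies.
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    return remote_addr


if __name__ == "__main__":
    # 192.168.0.111 is the proxy address from the settings above;
    # 203.0.113.7 is a made-up visitor address.
    print(client_ip("192.168.0.111", "203.0.113.7", ["192.168.0.111"]))
```

Without the trusted list, the application would either log the proxy's address for every visitor or accept whatever address a client chose to claim.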
Test 1 No caching in Drupal
In this test, we run Drupal as it ships by default. The page cache is disabled.
The results on our test setup are as follows:
Number of requests | Response time (seconds) | Transaction rate (Requests/second) |
---|---|---|
3,233 | 0.37 | 26.87 |
3,282 | 0.37 | 27.2 |
As you can see, the site can only do about 27 requests per second at most.
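As a sanity check, these figures are consistent with each other. With a fixed number of concurrent users each issuing requests back to back, Little's law gives the throughput as the number of users divided by the response time. The short Python sketch below applies it to Test 1; the function name is made up, and it assumes no pause between a user's requests.

```python
def expected_rate(concurrent_users: int, response_time_seconds: float) -> float:
    """Little's law for a closed loop with no think time:
    throughput = users in the system / time each request spends in it."""
    return concurrent_users / response_time_seconds


if __name__ == "__main__":
    # 10 concurrent users and a 0.37 s response time, as reported for Test 1.
    print(round(expected_rate(10, 0.37), 2))
```

The result is about 27 requests per second, in line with the measured 26.87 and 27.2.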
Test 2 Only block caching enabled
Out of curiosity, we tested the block cache, with only two "light" blocks on the site, one on each sidebar.
Number of requests | Response time (seconds) | Transaction rate (Requests/second) |
---|---|---|
3,044 | 0.39 | 25.31 |
3,094 | 0.39 | 25.71 |
A slight slowdown was observed. This is unexplained and warrants more investigation on a site with a lot of blocks.
Test 3 Page caching, 5 minutes lifetime, without patch
We then enabled Drupal's page caching in normal mode, with the minimum cache lifetime set to 5 minutes.
Number of requests | Response time (seconds) | Transaction rate (Requests/second) |
---|---|---|
18,820 | 0.06 | 156.82 |
20,860 | 0.06 | 174.44 |
A noticeable improvement was observed, as the site is able to do over 150 requests per second.
Test 4 Page caching, 5 minutes lifetime, with patch
We applied the patch we developed specifically for reverse proxies, which is in comment 39 of issue #147310. This patch is for Drupal 7.x. Attached to this article, you will also find a Drupal 6.x version of the patch in case you want to use it.
Number of requests | Response time (seconds) | Transaction rate (Requests/second) |
---|---|---|
61,610 | 0.02 | 512.05 |
61,640 | 0.02 | 514.91 |
That was impressive! From 170 requests per second to 512 requests per second!
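To put the four tests side by side, the following Python sketch averages the two runs of each test and expresses the result relative to the uncached baseline. The numbers are the transaction rates from the tables above; the dictionary labels and the function name are just illustrative.

```python
from statistics import mean
from typing import Dict, List

# Transaction rates (requests/second) from the two runs of each test.
RATES: Dict[str, List[float]] = {
    "no caching": [26.87, 27.2],
    "block cache only": [25.31, 25.71],
    "page cache, no patch": [156.82, 174.44],
    "page cache, with patch": [512.05, 514.91],
}


def speedups(results: Dict[str, List[float]], baseline: str) -> Dict[str, float]:
    """Average each test's runs and divide by the baseline's average."""
    base = mean(results[baseline])
    return {name: mean(values) / base for name, values in results.items()}


if __name__ == "__main__":
    for name, factor in speedups(RATES, "no caching").items():
        print(f"{name}: {factor:.1f}x")
```

By this measure, the patched setup handles roughly 19 times the request rate of uncached Drupal, and roughly 3 times the rate of page caching without the patch.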
Conclusion
For a heavily trafficked site that has a large portion of its visitors not logged in, Squid can be a very beneficial tool. The patching needed in Drupal for Squid is relatively minor. We hope to push for this patch to become part of the standard Drupal 7.x release. Testing of this patch behind corporate proxies is encouraged. Please report your results in the issue, as well as in the comments below.
Attachment | Size |
---|---|
Drupal 6.x patch for Squid Reverse Proxy (issue #147310) | 5.17 KB |
Comments
Visitor (not verified)
Patch does not work as of drupal 6.6
Thu, 2009/01/29 - 08:38
It seems that the exit(); in bootstrap.inc (function drupal_page_cache_header, line 643) has changed to return;
Visitor (not verified)
node access
Tue, 2009/03/17 - 10:42
Sometimes I access via one node of the webservers. I have two apache2 servers feeding the site through a reverse proxy. Although, making the changes to settings.php corrected issues I was having with sitemaps for google delivering the local IP instead of the published URL.
Now I cannot directly access a single node out of the pair of web servers.
Is there a way to map through this so that a single node can be addressed? I have mappings through the reverse proxy for them with their own url and then the main site is load balanced between them with session tracking.
Visitor (not verified)
I read on your post that you
Sun, 2009/10/25 - 18:32
I read on your post that you need to set the variables
$base_url = 'http://tuna.example.com';
$cookie_domain = 'tuna.example.com';
on your settings.php
So... you have to put the address of the machine running squid in there? At least that's what it looks like on the example: squid server is 'tuna' and www server is 'head'.
I ask because we have a couple of web servers and a couple of squid servers running with this setup and I always thought you had to put the website's real url in there, not the cache one...
Visitor (not verified)
notice that it works if drupal cache is enabled !
Wed, 2010/02/17 - 17:43
Hello, friends, great job! In my case it only works if the Drupal cache is enabled. I spent a couple of hours trying to get it to work! Thanks
Visitor (not verified)
I read your articles several
Sun, 2010/04/25 - 05:56
I read your articles several times and I like them.
I've tried to use a reverse proxy for my Drupal site. I used your Squid configuration and it works well. But my website is a multi-site setup with many subdomains. The problem is that the content is the same on each subdomain.
I used Varnish before Squid and the content was correct, but after testing I think Squid is faster.
Thanks and best regards,
Visitor (not verified)
Thanks for how to config
Mon, 2010/04/26 - 00:04
Thanks for how to config squid.
Alaa Alomari (not verified)
patch didn't work with me :(
Thu, 2011/03/24 - 08:07
Hi,
i have drupal 6.2 and when i run the patch i got this:
$patch <147310-42-d6.patch
--------------------------
|Index: includes/bootstrap.inc
|===================================================================
|RCS file: /cvs/drupal/drupal/includes/bootstrap.inc,v
|retrieving revision 1.206.2.4
|diff -u -F^f -r1.206.2.4 bootstrap.inc
|--- includes/bootstrap.inc 18 Aug 2008 18:56:30 -0000 1.206.2.4
|+++ includes/bootstrap.inc 11 Sep 2008 15:48:05 -0000
--------------------------
File to patch: includes/bootstrap.inc
patching file includes/bootstrap.inc
Hunk #1 succeeded at 723 (offset 135 lines).
Hunk #2 FAILED at 752.
1 out of 2 hunks FAILED -- saving rejects to file includes/bootstrap.inc.rej
can't find file to patch at input line 82
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
how can i fix this issue??
Thanks
Visitor (not verified)
Reverse Proxy Server
Wed, 2013/09/18 - 10:46
Thank you so much for your nice tutorial.
Recently I setup a Reverse Proxy Server with Squid (server accelerator) and wrote a full detailed tutorial that you can find in:
http://cosmolinux.no-ip.org/raconetlinux/html/17-squid.html
where I explain how to configure Squid (version 3.x) as a reverse Proxy Server (server accelerator), providing examples about how to do it using two computers (one as a Proxy server and another as a Web Server) or just by using one single computer.
I also describe how to format the Squid's logs and how to send the logs to a remote computer.
Also, you can find an explanation of how to deny access to certain files and how to get correct logs in Apache Web Server.
I hope it is useful to someone.