Using a reverse proxy and/or a Content Delivery Network (CDN) has become common practice for Drupal and other content management systems.
One inconvenient side effect is that neither your web server nor your application sees the visitor's real IP address any more. Instead, they see the IP address of the machine that the reverse proxy runs on.
Drupal core contains code that tries to work around this by reading the client's IP address from the X-Forwarded-For HTTP header (which PHP exposes as HTTP_X_FORWARDED_FOR), or from a custom header that you can configure.
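The general technique for recovering the client address behind a proxy can be sketched as follows. This is an illustrative sketch, not Drupal's code: the function name, the `trusted_proxies` set and the case-sensitive header lookup are assumptions. The key point is that the header is only honoured when the request actually arrives from a proxy you control, because any client can send a forged X-Forwarded-For header.

```python
def client_ip(remote_addr, headers, trusted_proxies, header="X-Forwarded-For"):
    """Best guess at the real client IP behind one or more reverse proxies.

    remote_addr: address of the peer that connected to the web server.
    headers: dict of request headers (assumed case-sensitive keys here).
    trusted_proxies: hypothetical set of proxy addresses you control.
    """
    # Ignore the header unless the request came from a trusted proxy;
    # otherwise any visitor could spoof their address.
    if remote_addr not in trusted_proxies or header not in headers:
        return remote_addr
    # Each proxy appends the address it received the request from, so the
    # header reads "client, proxy1, proxy2"; remote_addr is the final hop.
    chain = [part.strip() for part in headers[header].split(",") if part.strip()]
    chain.append(remote_addr)
    # Walk from the right, skipping our own proxies; the first untrusted
    # address is the client as seen by the outermost trusted proxy.
    for addr in reversed(chain):
        if addr not in trusted_proxies:
            return addr
    # Every hop was a trusted proxy: fall back to the leftmost entry.
    return chain[0]
```

Drupal 7's settings.php exposes comparable settings (`reverse_proxy`, `reverse_proxy_addresses` and `reverse_proxy_header`), which play the roles of the trusted list and header name above.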
We have mentioned before that both Pressflow 6.x and Drupal 7.x (but not core Drupal 6.x) disable page caching when a session is created for an anonymous user.
An extreme case of this happened recently, because of a perfect storm.
The client sends a newsletter to site users, whether they have accounts on the site or simply entered their email address to receive the newsletter.
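The session rule mentioned above can be illustrated with the kind of check a caching layer in front of Drupal might apply. This is a minimal sketch, assuming Drupal 7-style session cookie names (`SESS…`, or `SSESS…` over HTTPS); the function name is hypothetical. Once an anonymous visitor holds a session cookie, every page they request has to be built by PHP and MySQL.

```python
def is_cacheable(method, cookie_header):
    """Decide whether a page cache may answer this request (illustrative only)."""
    # Only plain page views are candidates for caching.
    if method not in ("GET", "HEAD"):
        return False
    # Parse "name=value; name2=value2" into a list of cookie names.
    names = [c.split("=", 1)[0].strip() for c in cookie_header.split(";") if c.strip()]
    # Assumed Drupal 7 naming: SESS<hash>, or SSESS<hash> on HTTPS.
    # A session cookie means the visitor gets a freshly built page.
    return not any(name.startswith(("SESS", "SSESS")) for name in names)
```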
One suboptimal technique that developers often use is a query that retrieves the entire contents of a table, without any conditions or filters:
SELECT * FROM table_name ORDER BY column_name;
This is acceptable if the table does not have many rows and the function containing the query is called only once per page view.
However, things quickly get out of control when developers do not take into account how often these calls are made.
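The cost of repeating such a query can be sketched with an in-memory SQLite database. The `items` table, the class and the call counts are hypothetical; the cached variant uses a simple per-request static cache, similar in spirit to Drupal 7's `drupal_static()`, which is one common mitigation rather than necessarily the fix from the original post.

```python
import sqlite3


class Repo:
    def __init__(self, conn):
        self.conn = conn
        self.queries = 0        # how many times the full-table query ran
        self.rows_from_db = 0   # total rows pulled from the database
        self._cache = None

    def _fetch_all(self):
        self.queries += 1
        rows = self.conn.execute("SELECT * FROM items ORDER BY name").fetchall()
        self.rows_from_db += len(rows)
        return rows

    def all_rows_uncached(self):
        # Runs the full-table query on every call: cost is rows x calls.
        return self._fetch_all()

    def all_rows_cached(self):
        # Static cache: read the table once, reuse the result afterwards.
        if self._cache is None:
            self._cache = self._fetch_all()
        return self._cache


def demo(calls, cached, rows=1000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO items (name) VALUES (?)",
                     [(f"item{i}",) for i in range(rows)])
    repo = Repo(conn)
    fetch = repo.all_rows_cached if cached else repo.all_rows_uncached
    for _ in range(calls):
        fetch()
    return repo.queries, repo.rows_from_db
```

With 50 calls in one page view, the uncached version runs 50 queries and reads 50,000 rows, while the cached version runs one query and reads 1,000.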
Here is an example to illustrate the problem:
The bulk of the Drupal hosting that we deal with for clients is on virtual servers, whether or not they are marketed as "cloud". Many sites eventually have to move to dedicated servers, either because of increased traffic or because continually added features increase complexity and bloat.
However, there are common issues that we see repeatedly, and their solutions can prolong the life of your site's current infrastructure.
We assume that your staff, or your hosting provider, have full access to the virtual servers, as well as to the physical servers they run on.
Ubuntu Server 12.04 LTS finally provides a stable, long-term support server distribution that has a recent version of Varnish in its repositories.
The trouble is that the Varnish package provided in the repository has some issues. Specifically, the command-line tools, such as varnishhist and varnishstat, do not report anything. Therefore, one cannot see the hit/miss ratio, hits per second, or other useful information. Monitoring these statistics with Munin does not work either.
There are two ways to overcome this; both are described below.
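Once varnishstat reports data again, the hit ratio and hits per second can be derived from two snapshots of its counters. This sketch assumes the Varnish 3.x counter names `cache_hit` and `cache_miss`, read for example from `varnishstat -1`; the function itself is hypothetical.

```python
def varnish_rates(before, after, seconds):
    """Hit ratio and hits/second between two varnishstat counter snapshots.

    before/after: dicts of counter values (assumed keys: cache_hit, cache_miss).
    seconds: time elapsed between the two snapshots.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    hits = after["cache_hit"] - before["cache_hit"]
    misses = after["cache_miss"] - before["cache_miss"]
    lookups = hits + misses
    # Avoid dividing by zero on an idle server.
    hit_ratio = hits / lookups if lookups else 0.0
    return hit_ratio, hits / seconds
```

For example, 900 hits and 100 misses over 60 seconds give a 90% hit ratio and 15 hits per second.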
Today, Khalid gave a presentation on Drupal Performance and Scalability for members of the London (Ontario) Drupal Users Group.
The slides from the presentation are attached below.
For sites that have lots of slow queries, disk access is often the bottleneck. For these slow queries, MySQL writes temporary tables to disk, populates them with intermediate results, then queries them again to produce the final result.
We all know that the disk is the slowest part of a computer, because it is limited by being mechanical rather than electronic. One way to mitigate this is to tell MySQL to use memory rather than disk for temporary tables.
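Before changing anything, it helps to measure how often MySQL actually spills temporary tables to disk. MySQL's `SHOW GLOBAL STATUS` reports `Created_tmp_tables` (all internal temporary tables) and `Created_tmp_disk_tables` (those created on disk). The sketch below assumes the status was captured as tab-separated text, for example with `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%'"`; the helper names are hypothetical.

```python
def parse_status(text):
    """Parse tab-separated 'Variable_name<TAB>Value' lines into a dict of ints."""
    status = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t", 1)
        status[name.strip()] = int(value.strip())
    return status


def disk_tmp_table_pct(status):
    """Percentage of internal temporary tables that MySQL created on disk."""
    total = status.get("Created_tmp_tables", 0)
    on_disk = status.get("Created_tmp_disk_tables", 0)
    return 100.0 * on_disk / total if total else 0.0
```

A high percentage suggests that slow queries are indeed writing temporary tables to disk, which is the situation that moving temporary tables to memory is meant to address.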
Over the past few years, we have been called in to help clients whose sites were performing poorly. Many of them were using Pressflow, because it is "faster" and "more scalable" than Drupal 6.x.
However, some of these clients actually hurt their site's performance by using Pressflow rather than plain Drupal, often because they misconfigured or misused it in one way or another.
Setting cache to "external" without having a caching reverse proxy
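Why this misconfiguration hurts can be shown with a small simulation. It assumes, as in Pressflow, that the "external" cache mode stops Drupal from storing pages in its own page cache and only marks responses as cacheable for an upstream proxy such as Varnish. The classes are hypothetical stand-ins, and expiry, cookies and headers are ignored.

```python
class Backend:
    """Stands in for Drupal: counts how many pages it has to build."""

    def __init__(self):
        self.builds = 0

    def render(self, path):
        self.builds += 1
        return f"<html>{path}</html>"


class CachingProxy:
    """Minimal stand-in for a caching reverse proxy, keyed by path."""

    def __init__(self, backend):
        self.backend = backend
        self.store = {}

    def render(self, path):
        if path not in self.store:
            self.store[path] = self.backend.render(path)
        return self.store[path]


def builds_for(requests, with_proxy):
    # In "external" mode Drupal keeps no copy itself, so without a proxy
    # every anonymous request is built from scratch by PHP and MySQL.
    backend = Backend()
    front = CachingProxy(backend) if with_proxy else backend
    for path in requests:
        front.render(path)
    return backend.builds
```

For 150 anonymous requests to two pages, the backend builds two pages when a proxy is present and all 150 when it is not.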
The Boost module is often a great help in speeding up small to medium-sized web sites and/or sites hosted on shared hosts.
It works by writing each cached page to a file on disk, and having the web server serve that file directly, bypassing PHP and MySQL entirely.
This works well in most cases, but we have observed a few cases where Boost itself becomes a bottleneck.
One example was when 2bits.com was called in to investigate and solve a problem on a Fortune 500 company's Drupal web site.
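The file-based page caching that Boost relies on can be sketched as follows. This is not Boost's code: the class, the directory layout and the path mapping are hypothetical, and paths are not sanitised, so it is for illustration only. On a hit the stored file is returned without any page building; on a miss the page is built once and written to disk.

```python
import os
import tempfile


class FilePageCache:
    """Boost-like idea: full HTML pages stored as files under cache_dir."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.builds = 0  # how many times the expensive path ran

    def _file_for(self, path):
        # Hypothetical layout: "/node/1" -> "<cache_dir>/node/1.html",
        # "/" -> "<cache_dir>/index.html". No sanitising of ".." etc.
        rel = path.strip("/") or "index"
        return os.path.join(self.cache_dir, rel + ".html")

    def build_page(self, path):
        # Placeholder for the PHP + MySQL work of rendering a page.
        self.builds += 1
        return f"<html><body>{path}</body></html>"

    def serve(self, path):
        target = self._file_for(path)
        # Hit: the web server would send this file without starting PHP.
        if os.path.exists(target):
            with open(target, encoding="utf-8") as f:
                return f.read()
        # Miss: build the page once, then write it out for later requests.
        html = self.build_page(path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as f:
            f.write(html)
        return html


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = FilePageCache(d)
        cache.serve("/node/1")
        cache.serve("/node/1")
        print(cache.builds)  # the second request is served from the file
```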
In the Drupal community, we always recommend using the Drupal API, and best practices for development, management and deployment. This is for many reasons, including modularity, security and maintainability.
But sticking to these guidelines, refined over many years by so many in the community, also matters for performance.