The bulk of Drupal hosting for clients that we deal with is on virtual servers, whether they are marketed as "cloud" or not. Many eventually have to move to dedicated servers because of increased traffic, or because continually adding features increases complexity and bloat.
However, we repeatedly see common issues whose solutions can prolong the life of your site's current infrastructure.
We assume that your staff, or your hosting provider, have full access to the virtual servers, as well as to the physical servers they run on.
Disks cannot be virtualized
Even on dedicated servers, the disk(s) are often the bottleneck for the overall system, because they are its slowest part. This is definitely true for mechanical hard disks with rotating platters, and even Solid State Drives (SSDs) are much slower than the CPU or memory.
For these reasons, disks cannot be fully virtualized. Yes, you do get a storage allocation that is yours alone. But no one can guarantee you a share of the I/O throughput, which is always a precious resource on servers.
As a result, other virtual servers on the same physical server as yours will contend for disk I/O if your site (or theirs) is busy or not optimally configured.
In a virtual server environment, you cannot tell how many virtual servers share your physical server, nor whether they are busy. You only deal with the effects (see below).
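Although you cannot see your neighbours, you can measure their effect on your disk. The sketch below assumes a Linux guest that exposes /proc/diskstats (Xen guests usually do; OpenVZ containers may not). It takes two snapshots and computes, per device, the percentage of time the disk was busy and the average wait per I/O, the same quantities iostat reports as %util and await. The function names and the 5-second interval are illustrative choices, not an existing tool.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads, read_ms, writes, write_ms, busy_ms)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip blank or unexpected lines
        # f[3] reads completed, f[6] ms spent reading, f[7] writes completed,
        # f[10] ms spent writing, f[12] ms the device had I/O in progress
        stats[f[2]] = (int(f[3]), int(f[6]), int(f[7]), int(f[10]), int(f[12]))
    return stats


def disk_load(before, after, interval_s):
    """Return {device: (util_percent, await_ms)} between two snapshots."""
    result = {}
    for dev, (r1, rms1, w1, wms1, busy1) in after.items():
        if dev not in before:
            continue
        r0, rms0, w0, wms0, busy0 = before[dev]
        ios = (r1 - r0) + (w1 - w0)
        util = 100.0 * (busy1 - busy0) / (interval_s * 1000.0)
        await_ms = ((rms1 - rms0) + (wms1 - wms0)) / ios if ios else 0.0
        result[dev] = (min(util, 100.0), await_ms)
    return result


def snapshot():
    """Read /proc/diskstats once (Linux only)."""
    with open("/proc/diskstats") as fh:
        return parse_diskstats(fh.read())


def sample(interval_s=5.0):
    """Measure disk load over interval_s seconds."""
    before = snapshot()
    time.sleep(interval_s)
    return disk_load(before, snapshot(), interval_s)
```

For example, `sample()["xvda"]` on a Xen guest whose virtual disk is named xvda (the device name depends on your host). A high await while your own site is quiet is a hint, not proof, that another tenant is busy on the same disks.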
For a Drupal site, the following are some of the most common causes for high disk I/O activity:
- MySQL, with either a considerable number of slow queries that use filesorts and temporary tables, or lots of INSERT/UPDATE/DELETE statements
- Lots of logging activity, such as a module that keeps reporting a warning or a notice many times per page access
- Boost cache expiry, e.g. when a comment is posted
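To check whether MySQL is the culprit, the server's cumulative status counters are a cheap first look. The sketch below parses the tab-separated output of `mysql -B -N -e "SHOW GLOBAL STATUS"` (passed in as a string) and reports the share of temporary tables that went to disk, the share of write statements, and the raw Sort_merge_passes and Slow_queries counters. These counters accumulate from server start, so compare two readings taken over a busy period. The function names are hypothetical.

```python
def parse_status(text):
    """Parse 'Variable_name<TAB>Value' lines into {name: int}, skipping non-numeric values."""
    status = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip().isdigit():
            status[parts[0].strip()] = int(parts[1])
    return status


def io_indicators(status):
    """Summarize counters that point at disk-heavy MySQL activity."""
    def get(name):
        return status.get(name, 0)

    tmp_tables = get("Created_tmp_tables")
    writes = get("Com_insert") + get("Com_update") + get("Com_delete")
    statements = writes + get("Com_select")
    return {
        # share of implicit temporary tables that had to be created on disk
        "tmp_disk_pct": 100.0 * get("Created_tmp_disk_tables") / tmp_tables if tmp_tables else 0.0,
        # share of INSERT/UPDATE/DELETE among these four statement types
        "write_pct": 100.0 * writes / statements if statements else 0.0,
        # merge passes done by the sort algorithm; grows with large filesorts
        "sort_merge_passes": get("Sort_merge_passes"),
        "slow_queries": get("Slow_queries"),
    }
```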
Xen-based virtualization vs. Virtuozzo or OpenVZ
Hosting providers use virtualization much like airlines overbook flights, on the assumption that some passengers will not show up.
Similarly, not all virtual hosting customers will use all the resources allocated to them, so there is often plenty of unused capacity.
However, not all virtualization technologies are equal when it comes to resource allocation.
Virtuozzo and its free variant, OpenVZ, offer what they call "burst memory": when applications on one instance demand more memory, they are allocated unused memory from other instances, or even swap space. However, this can bring a server to its knees if swap usage causes thrashing.
Moreover, some Virtuozzo/OpenVZ hosts use vzfs, a virtualized file system, which is slow for Drupal when it holds things such as the entire web root, logs, and database files.
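On Virtuozzo/OpenVZ, a container that keeps hitting its memory limits leaves a trace in /proc/user_beancounters, whose failcnt column counts how often an allocation was refused. A minimal sketch, assuming the standard layout (uid, resource, held, maxheld, barrier, limit, failcnt) and that the file is read from inside the container, lists the resources with a non-zero failcnt:

```python
def beancounter_failures(text):
    """Return {resource: failcnt} for every resource whose failcnt > 0."""
    failures = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # skip the version line and the column header
        if fields[0].endswith(":"):
            fields = fields[1:]  # drop the "101:" container id prefix
        if len(fields) != 6:
            continue  # resource, held, maxheld, barrier, limit, failcnt
        name, failcnt = fields[0], fields[5]
        if failcnt.isdigit() and int(failcnt) > 0:
            failures[name] = int(failcnt)
    return failures
```

Usage: `with open("/proc/user_beancounters") as fh: print(beancounter_failures(fh.read()))`; reading the file usually requires root. A growing failcnt on privvmpages, for example, means the container asked for more memory than its limit allowed.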
Xen does not suffer from either of these problems. It guarantees that the memory and CPU allocated to one virtual instance stay dedicated to that instance.
However, since physical disk I/O cannot be virtualized, it remains the main bottleneck with Xen.
One issue that Amazon AWS EC2 users face is that the reasonably priced instances, namely Small and Medium, are underpowered for many Drupal sites.
These instances can work for sites with a low number of new nodes/comments per day and mostly anonymous traffic, since such sites lend themselves well to Varnish caching with pages set to expire only after many hours.
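Whether Varnish can actually keep pages for hours depends largely on the Cache-Control header the site sends. The function below is a simplified model, assuming behaviour along the lines of recent Varnish versions' built-in VCL: s-maxage takes precedence over max-age, responses marked no-cache, no-store, or private are not cached, and otherwise the default_ttl parameter (120 seconds out of the box) applies. It ignores the Expires header, Set-Cookie, and any custom VCL, so treat it as a quick sanity check rather than a prediction.

```python
def effective_ttl(cache_control, default_ttl=120):
    """Rough TTL (seconds) a shared cache like Varnish would give a response.

    Simplified: s-maxage wins over max-age; no-cache, no-store or private
    means "do not cache"; otherwise fall back to default_ttl.
    Ignores Expires, Set-Cookie and any custom VCL.
    """
    directives = {}
    for part in cache_control.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')
    if any(d in directives for d in ("no-cache", "no-store", "private")):
        return 0
    for name in ("s-maxage", "max-age"):
        if directives.get(name, "").isdigit():
            return int(directives[name])
    return default_ttl
```

Under these assumptions, a page sent with `Cache-Control: public, max-age=21600` would be kept for six hours.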
Sites that rely on a large number of simultaneous logged-in users, with many enabled modules and short cache expiry times, do not work well on these underpowered instances. Such sites require the Extra Large instances, and often the High-CPU ones too.
Of course, this all adds to the total costs of hosting.
Expensive As You Grow
Needless to say, if your site keeps growing then there will be added hosting costs to cope with this growth.
With the cloud providers, these costs often grow faster than with dedicated servers, as you add more instances, and so on.
Some companies choose to self-manage physical servers colocated at a datacenter and virtualize them themselves.
This is often a good option, but it can also be a pitfall, because sometimes the servers are badly misconfigured. In one case we saw, the physical server was segmented into 12 VMware virtual servers for no good reason. Moreover, all of them were accessing a single RAID array. On top of that, Boost was used on a busy, popular forum. Whenever a comment was posted, Boost expired pages, tying up the RAID array so that it could not do anything useful for other visitors to the site.
Variability in Performance
With cloud and virtual servers, everything often runs without noticeable issues, then suddenly variability creeps in.
An analogy ...
It is like having bad housemates who flush the toilet while you are in the shower, except that you do not know who those housemates are and cannot ask them directly. The only symptom is sudden cold water over your body. Your only recourse is to ask the landlord if someone flushed the toilet!
Here is a case in point: a Drupal site on a VPS with a popular cloud provider worked fine for several years. Then the host upgraded to a newer version and asked all customers to move their sites.
After the move, the site was fine most of the time but extremely slow at other times, with no predictable pattern.
For example, while serving a page from the cache to anonymous visitors usually takes a few tens of milliseconds at most, on some occasions it took much longer: in one case 13,879 milliseconds, with a total page load time of 17,423 milliseconds.
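To catch this kind of intermittent slowness, it helps to sample the same cached page at regular intervals and keep the timings, rather than relying on a single test. A minimal sketch: `time_request` fetches a URL with the standard library and returns the elapsed milliseconds, and `summarize` reports the median, the maximum, and the samples slower than ten times the median (the factor of 10 is an arbitrary threshold, not a standard).

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=60):
    """Fetch url completely and return the elapsed wall-clock time in ms."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return (time.monotonic() - start) * 1000.0


def summarize(samples_ms, slow_factor=10):
    """Median, maximum, and samples slower than slow_factor x the median."""
    median = statistics.median(samples_ms)
    outliers = [s for s in samples_ms if s > slow_factor * median]
    return {"median": median, "max": max(samples_ms), "outliers": outliers}
```

For example, `summarize([40, 35, 50, 45, 13879])` returns `{'median': 45, 'max': 13879, 'outliers': [13879]}`; the first four values are made up and the last is the slow case above. Real samples could come from `time_request("http://www.example.com/")` (a placeholder URL) run from cron or a loop spread over several hours.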
Here is a sample of devel's output:
Executed 55 queries in 12.51 milliseconds. Page execution time was 118.61 ms.
Executed 55 queries in 7.56 milliseconds. Page execution time was 93.48 ms.
Most of the time is spent in retrieving cached items:
ms where query
0.61 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'menu:1:en'
0.42 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_87_[redacted]'
0.36 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_54_[redacted]'
0.19 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'filter:3:0b81537031336685af6f2b0e3a0624b0'
0.18 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_88_[redacted]'
0.18 block_list SELECT * FROM blocks WHERE theme = '[redacted]' AND status = 1 ORDER BY region, weight, module
Then suddenly, same site, same server, and you get:
Executed 55 queries in 2237.67 milliseconds. Page execution time was 2323.59 ms.
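Comparing these devel lines makes the point: the number of queries is the same, but in the slow case almost all of the page time goes to the database. A small sketch that extracts the figures from a devel summary line with a regular expression (the pattern assumes exactly the wording shown above) and computes the share of page time spent in queries:

```python
import re

# Matches devel's summary line, e.g.
# "Executed 55 queries in 7.56 milliseconds. Page execution time was 93.48 ms."
DEVEL_LINE = re.compile(
    r"Executed (\d+) queries in ([\d.]+) milliseconds\.\s*"
    r"Page execution time was ([\d.]+) ms\."
)


def query_share(line):
    """Return (queries, query_ms, page_ms, % of page time in queries) or None."""
    m = DEVEL_LINE.search(line)
    if m is None:
        return None
    queries, query_ms, page_ms = int(m.group(1)), float(m.group(2)), float(m.group(3))
    return queries, query_ms, page_ms, round(100.0 * query_ms / page_ms, 1)
```

Applied to the lines above, it reports about 8.1% of page time in queries for the 7.56 ms sample and 96.3% for the slow one.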
This was a Virtuozzo host, and the slowdown was a sign of disk contention. Since this is a virtual server, we could not tell whether the cause was something inside the virtual server itself or some other tenant on the same physical server flushing the toilet...
The solution is in the following point.
Move your VPS to another physical server
When you encounter variable or poor performance, before spending time on troubleshooting that may not lead anywhere, it is worthwhile to contact your host and ask for your VPS to be moved to a different physical server.
Doing so will most likely solve the issue, since you effectively get a different set of housemates.
- Drupal on Dedicated vs. Amazon AWS EC2, and how we halved the cost of Amazon by going dedicated.
- Hosting Virtualization: Virtuozzo/OpenVZ vs. Xen which is best? covers real-life issues from sites we've helped.
- When Boost slows down your Drupal site, discusses boost cache expiry and how disk contention can be a real issue.