Most of the Drupal hosting we deal with for our clients is on virtual servers, whether or not they are marketed as "cloud". Many sites eventually have to move to dedicated servers, because of increased traffic or because continually added features increase complexity and bloat.

However, there are common issues that we see repeatedly, and their solutions can prolong the life of your site's current infrastructure.

We assume that your staff, or your hosting provider, have full access to the virtual servers, as well as to the physical servers they run on.

Disks cannot be virtualized

Even on dedicated servers, the disks are often the bottleneck for the overall system, because they are its slowest component. This is especially true for mechanical hard disks with rotating platters, but even Solid State Disks (SSDs) are much slower than the CPU or memory.

Moreover, disks cannot be fully virtualized. Yes, you do get a storage allocation that is yours alone, and no one else can use it. But you are not guaranteed a share of the I/O throughput, which is always a precious resource on servers.

So, the other virtual servers on the same physical server as yours will contend for disk I/O when your site (or theirs) is busy or not optimally configured.

In a virtual server environment, you cannot tell how many virtual servers share the same physical server, nor whether they are busy. You only see the effects (see below).
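Since you cannot see the other tenants, the practical approach is to measure the effects from inside your own server. The following is a minimal sketch in Python (assuming Python 3.9+; the function names are our own, hypothetical ones) that times small writes, each forced to disk with `os.fsync()`. A single run proves little; what matters is comparing runs taken at different times, where a long tail of slow samples is one hint of disk contention.

```python
import os
import statistics
import sys
import tempfile
import time


def fsync_latencies_ms(directory: str, samples: int = 50, block: int = 4096) -> list[float]:
    """Time `samples` write+fsync cycles in `directory`, in milliseconds each."""
    payload = b"\0" * block
    latencies = []
    # The scratch file is deleted automatically when the with-block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as scratch:
        for _ in range(samples):
            start = time.perf_counter()
            scratch.write(payload)
            scratch.flush()             # hand Python's buffer to the OS
            os.fsync(scratch.fileno())  # block until the OS has flushed the data to storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies: list[float]) -> dict[str, float]:
    """Median, rough 95th percentile and worst case; contention shows up in the tail."""
    ordered = sorted(latencies)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return {"median": statistics.median(ordered), "p95": p95, "max": ordered[-1]}


if __name__ == "__main__":
    # Placeholder default: pass a directory on the file system you want to test,
    # e.g. the one holding the database files or the web root.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(summarize(fsync_latencies_ms(target)))
```

Avoid pointing it at a memory-backed file system such as a tmpfs mount, since that would measure RAM rather than the shared disk.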

For a Drupal site, the following are some of the most common causes of high disk I/O activity:

  • MySQL, with either a considerable number of slow queries that use filesorts and temporary tables, or lots of INSERT/UPDATE/DELETE statements
  • Lots of logging activity, such as a module that keeps reporting a warning or a notice many times per page request
  • Boost cache expiry, e.g. when a comment is posted
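To see whether the MySQL causes in the list above apply to your site, the server's global status counters are a reasonable starting point. The sketch below assumes you saved the tab-separated output of `mysql -N -B -e "SHOW GLOBAL STATUS"` to a file. The helper names and the choice of counters are our own, and the exact meaning of some counters varies between MySQL versions. These counters are cumulative since the server started, so comparing two snapshots taken some minutes apart tells you more than a single reading.

```python
import sys


def parse_status(text: str) -> dict[str, int]:
    """Parse 'Variable_name<TAB>Value' lines, keeping only integer values."""
    status = {}
    for line in text.splitlines():
        name, _, value = line.partition("\t")
        if value.strip().isdigit():
            status[name.strip()] = int(value)
    return status


def disk_pressure(status: dict[str, int]) -> dict[str, float]:
    """A few counters related to the MySQL disk activity described above."""
    tmp_tables = status.get("Created_tmp_tables", 0)
    tmp_on_disk = status.get("Created_tmp_disk_tables", 0)
    writes = sum(status.get(k, 0) for k in ("Com_insert", "Com_update", "Com_delete"))
    return {
        # Share of internal temporary tables that had to be created on disk.
        "tmp_disk_ratio": tmp_on_disk / tmp_tables if tmp_tables else 0.0,
        # Merge passes occur when sorts do not fit in sort_buffer_size.
        "sort_merge_passes": float(status.get("Sort_merge_passes", 0)),
        "slow_queries": float(status.get("Slow_queries", 0)),
        "write_statements": float(writes),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print(disk_pressure(parse_status(fh.read())))
```

A high share of on-disk temporary tables, or steadily climbing merge passes, points at the first bullet; a large volume of write statements points at the second half of it.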

Xen-based virtualization vs. Virtuozzo or OpenVZ

Hosting providers use virtualization much like airlines overbook flights, on the assumption that some passengers will not show up.

Similarly, not all virtual hosting customers will use all the resources allocated to them, so there is often plenty of unused capacity.
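To put numbers on the analogy, here is a small worked example with made-up figures: a hypothetical host with 64 GB of RAM, sold as 40 plans of 2 GB each. The figures are illustrative only and are not taken from any provider.

```python
def overcommit_ratio(physical_gb: float, allocations_gb: list[float]) -> float:
    """Memory promised to all tenants divided by the memory the host really has."""
    return sum(allocations_gb) / physical_gb


def shortfall_gb(physical_gb: float, allocations_gb: list[float], busy_fraction: float) -> float:
    """Memory the host lacks if `busy_fraction` of the promised total is in use at once."""
    demand = sum(allocations_gb) * busy_fraction
    return max(0.0, demand - physical_gb)


# Hypothetical host: 64 GB of RAM sold as 40 VPS plans of 2 GB each (80 GB promised).
tenants = [2.0] * 40
print(overcommit_ratio(64.0, tenants))   # 1.25
print(shortfall_gb(64.0, tenants, 0.7))  # 0.0: fits while only 70% is in use
print(shortfall_gb(64.0, tenants, 1.0))  # 16.0: everyone busy at once
```

As long as tenants stay mostly idle, the host never notices the overbooking; the trouble starts when enough of them get busy at the same time.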

However, not all virtualization technologies are equal when it comes to resource allocation.

Virtuozzo and its free variant, OpenVZ, use what they call "burst memory": when applications on one instance demand more memory, they are given unused memory from other instances, or even swap space. However, this can bring a server to its knees if swap usage causes thrashing.
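On a Linux guest, one way to tell whether swapping is under way is to compare the kernel's cumulative swap counters, `pswpin` and `pswpout` in `/proc/vmstat`, a few seconds apart. This is a minimal sketch with hypothetical helper names; inside an OpenVZ/Virtuozzo container, `/proc/vmstat` may not reflect what the host node is doing, so treat the numbers as a hint rather than proof.

```python
import os
import time

VMSTAT = "/proc/vmstat"


def swap_counters(vmstat_text: str) -> tuple[int, int]:
    """Return cumulative (pswpin, pswpout): pages swapped in and out since boot."""
    counters = {"pswpin": 0, "pswpout": 0}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in counters:
            counters[name] = int(value)
    return counters["pswpin"], counters["pswpout"]


def swap_rates(before: tuple[int, int], after: tuple[int, int], seconds: float) -> tuple[float, float]:
    """Pages per second swapped in and out between two snapshots."""
    return ((after[0] - before[0]) / seconds, (after[1] - before[1]) / seconds)


def sample_swap_rates(seconds: float = 5.0) -> tuple[float, float]:
    """Read /proc/vmstat twice, `seconds` apart, and return the swap rates."""
    with open(VMSTAT) as fh:
        before = swap_counters(fh.read())
    time.sleep(seconds)
    with open(VMSTAT) as fh:
        after = swap_counters(fh.read())
    return swap_rates(before, after, seconds)


if __name__ == "__main__" and os.path.exists(VMSTAT):
    pages_in, pages_out = sample_swap_rates()
    # Sustained, simultaneous swap-in and swap-out is the classic thrashing pattern.
    print(f"swap-in: {pages_in:.1f} pages/s, swap-out: {pages_out:.1f} pages/s")
```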

Moreover, some Virtuozzo/OpenVZ hosts use vzfs, a virtualized file system, which is slow for Drupal when it holds certain things, such as the entire web root, logs, and database files.

Xen does not suffer from these problems. It guarantees that the memory and CPU allocated to a virtual instance stay dedicated to that instance.

However, since physical disk I/O cannot be fully virtualized, it remains the one shared bottleneck with Xen.

Underpowered Instances

One issue that Amazon AWS EC2 users face is that the reasonably priced instances, namely the Small and Medium ones, are often underpowered for most Drupal sites.

Sites with a low number of new nodes/comments per day, and with mostly anonymous traffic, can work well on these instances, provided Varnish caching is enabled with an expiry time of several hours.
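Whether Varnish can keep pages for hours depends largely on the lifetime the site advertises. A quick check is to read the `Cache-Control` header of an anonymous page. The sketch below assumes the site sends such a header; the helper names are hypothetical, and the parsing is a simplification of HTTP caching rules, not Varnish's exact logic (its VCL can override the TTL). As a conservative simplification, `no-store`, `no-cache` and `private` are all treated as "not cacheable".

```python
from typing import Optional
from urllib.request import Request, urlopen


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Seconds a shared cache may keep the response, derived from Cache-Control.

    s-maxage takes precedence over max-age for shared caches. Returns 0 for
    responses treated as uncacheable, None if no lifetime is given at all.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip().strip('"')
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return 0
    for key in ("s-maxage", "max-age"):
        if directives.get(key, "").isdigit():
            return int(directives[key])
    return None


def fetch_cache_control(url: str) -> str:
    """HEAD request against a page, returning its Cache-Control header (or "")."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return response.headers.get("Cache-Control", "")
```

For example, `shared_cache_ttl(fetch_cache_control("https://example.com/"))` returning 21600 would mean the page may be cached for six hours; a result of 0 or None means Varnish will likely fall back to the backend far more often.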

Sites with a large number of simultaneous logged-in users, many enabled modules, and short cache expiry times do not work well on these underpowered instances. Such sites require the Extra Large instances, and often the High-CPU ones as well.

Of course, this all adds to the total costs of hosting.

Expensive As You Grow

Needless to say, if your site keeps growing, your hosting costs will grow to cope with it.

With cloud providers, these costs often grow faster than with dedicated servers, as you add more and larger instances.

Misconfigured Self-Virtualization

Some companies choose to self-manage physical servers colocated at a datacenter and virtualize them themselves.

This is often a good option, but it can also be a pitfall, because sometimes the servers are badly misconfigured. We saw one case where the physical server was split into 12 VMware virtual servers for no good reason. Moreover, all of them shared a single RAID array. On top of that, Boost was used on a busy, popular forum. Whenever a comment was posted, Boost expired cached pages, and that kept the RAID array too busy to do anything useful for the site's other visitors.

Variability in Performance

With cloud and virtual servers, things often run fine for a while, and then variability suddenly creeps in.

An analogy ...

It is like having bad housemates who flush the toilet while you are in the shower, except that you do not know who those housemates are, and cannot ask them directly. The only symptom is the sudden rush of cold water over your body. Your only recourse is to ask the landlord whether someone flushed the toilet!

Here is a case in point: a Drupal site on a VPS with a popular cloud provider worked fine for several years. Then the host upgraded to a newer version of its platform, and asked all customers to move their sites.

After the move, the site was fine most of the time, but extremely slow at other times, with no predictable pattern.

For example, retrieving a cached page for anonymous visitors usually takes a few tens of milliseconds at most, but on some occasions it took much longer: in one case 13,879 milliseconds, with a total page load time of 17,423 milliseconds.

Here is a sample of the Devel module's output under normal conditions:

Executed 55 queries in 12.51 milliseconds. Page execution time was 118.61 ms.

Executed 55 queries in 7.56 milliseconds. Page execution time was 93.48 ms.

Most of the time is spent retrieving cached items.

ms where query
0.61 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'menu:1:en'
0.42 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_87_[redacted]'
0.36 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_54_[redacted]'
0.19 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'filter:3:0b81537031336685af6f2b0e3a0624b0'
0.18 cache_get SELECT data, created, headers, expire FROM cache WHERE cid = 'bc_88_[redacted]'
0.18 block_list SELECT * FROM blocks WHERE theme = '[redacted]' AND status = 1 ORDER BY region, weight, module

Then suddenly, on the same site and the same server, you get:

Executed 55 queries in 2237.67 milliseconds. Page execution time was 2323.59 ms.

This was a Virtuozzo host, and the slowdown was a sign of disk contention. Since this is a virtual server, we could not tell whether the cause was inside the virtual server itself or another tenant on the same physical server flushing the toilet...
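Spotting such outliers by eye is tedious across many page loads. The following sketch collects Devel's summary lines (copied from the page output into a text file, for instance) and flags any page far slower than the median. The regular expression assumes the exact wording shown above, which may differ between Devel versions, and the factor of 5 is an arbitrary threshold of our choosing; the helper names are hypothetical.

```python
import re
import statistics

# Matches Devel's per-page summary line, e.g.
# "Executed 55 queries in 12.51 milliseconds. Page execution time was 118.61 ms."
SUMMARY = re.compile(
    r"Executed (\d+) queries in ([\d.]+) milliseconds\.\s*"
    r"Page execution time was ([\d.]+) ms\."
)


def parse_summaries(text: str) -> list[tuple[int, float, float]]:
    """Return (query_count, query_ms, page_ms) for every summary line found."""
    return [(int(n), float(q), float(p)) for n, q, p in SUMMARY.findall(text)]


def flag_outliers(samples: list[tuple[int, float, float]], factor: float = 5.0):
    """Samples whose page time exceeds `factor` times the median page time."""
    median_page = statistics.median([page for _, _, page in samples])
    return [s for s in samples if s[2] > factor * median_page]


log = """
Executed 55 queries in 12.51 milliseconds. Page execution time was 118.61 ms.
Executed 55 queries in 7.56 milliseconds. Page execution time was 93.48 ms.
Executed 55 queries in 2237.67 milliseconds. Page execution time was 2323.59 ms.
"""

for count, query_ms, page_ms in flag_outliers(parse_summaries(log)):
    # The query count is unchanged, so the extra time is not from running more queries.
    print(f"{count} queries took {query_ms} ms ({query_ms / page_ms:.0%} of the page)")
```

In the slow sample, almost the entire page time is spent in the same 55 queries that normally take a few milliseconds, which is consistent with waiting on the disk rather than doing more work.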

The solution is described in the next section.

Move your VPS to another physical server

When you encounter variable or poor performance, and before spending time on troubleshooting that may lead nowhere, it is worthwhile to contact your host and ask for your VPS to be moved to a different physical server.

Doing so will most likely solve the issue, since you effectively get a different set of housemates.


Comments

Thu, 2013/04/18 - 17:38

I click on every article I see by 2bits - always very insightful, real-world, down-and-dirty performance information.

I have a small non-profit client on AWS EC2 experiencing these very issues right now intermittently, this article is very helpful in helping me understand what is actually happening.

Thanks!

Tue, 2013/06/04 - 20:33

I have read that on EC2, if you simply stop the instance and then restart it, it comes up on different hardware. The advice being that if it starts acting strange, just stop and start the instance, and it might even out again (new housemates). So, I'd agree, rather than spend days trying to figure out what is wrong with the plumbing, just move to a new house. I even wrote a batch file to do that very thing. My instance restarts in about 5 minutes.

Thu, 2013/06/20 - 09:47

You sure can virtualize storage, perhaps not on the hypervisors you mentioned, but it's possible on ESXi. You can limit each VM instance to a certain number of IOPS.

The problem is not the technology here, it's hosting companies.

One way to overcome disk contention, at least in my case, was to move Boost cache to memory.

Thu, 2013/09/26 - 20:16

I would suggest to anyone that you go for VPS. It seems to be the more reliable solution. I tried a lot of those VPS and cloud solutions and didn't find proper one. Eventually found Limy VPS and their free trial offer, tested it and decided to subscribe for Advanced Plan. It is the best option for my small online business and very glad I found reliable company. I would recommend to everyone to visit Limy VPS and sign up for 7 days free trial.

Mon, 2016/03/28 - 04:17

If AWS instance is expensive for you, then why not try out DigitalOcean for your Drupal website. If you find installing Drupal on cloud difficult, then try out managed hosting platforms, like Cloudways. Managed platforms offer 1-click hosting option. There is no need to use command line.

Mon, 2016/03/28 - 08:42

There is no need to use command line.

Wow!

Indeed there is no need. The newbies prefer clicking, and more often than not do not understand the underlying concepts.

That is fine.

But most of us experienced sysadmins and devs prefer the command line. We do not see it as a hindrance, but rather as a powerful toolset that we like using.
