When boost slows down your Drupal site ...

The Boost module is often a great help in speeding up small to medium sized web sites, particularly those hosted on shared hosts.

It works by writing the entire cached page to a disk file, and serving subsequent requests for it directly from the web server, bypassing PHP and MySQL entirely.
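A minimal sketch of that pattern in Python (Boost itself implements this in PHP plus web-server rewrite rules; the cache path and function names here are illustrative, not Boost's actual API):

```python
import os

CACHE_ROOT = "cache/normal"  # illustrative; Boost's real cache path is configurable

def cache_path(url_path):
    # Map a request path to a cached HTML file on disk.
    return os.path.join(CACHE_ROOT, url_path.strip("/") or "index", "index.html")

def serve(url_path, render_page):
    """Serve from the disk cache if present; otherwise render and cache."""
    path = cache_path(url_path)
    if os.path.exists(path):
        # Cache hit: the web server can return this file directly,
        # without touching PHP or MySQL.
        with open(path) as f:
            return f.read()
    # Cache miss: do the expensive render, then persist it for next time.
    html = render_page(url_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(html)
    return html
```

In the real module, the web server's rewrite rules check for the cached file before PHP is ever invoked, which is where the speedup comes from.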

This works well in most cases, but we have observed a few cases where boost itself becomes a bottleneck.

One example was when 2bits.com was called in to investigate and solve a problem for a Fortune 500 company's Drupal web site.

The site was configured to run on 12 web servers, each being a virtual instance on VMWare, but all of them sharing a single RAID-5 pool for disk storage.

The main problem was that when someone posted a comment, the site took up to 20 seconds to respond, and all the web instances were effectively hung.

We investigated and found that boost's expiry logic was kicking in: it tries to intelligently delete the cached HTML for the node, the front page, ...etc. All this happens while the site is busy serving pages from boost's cache on the same disk, as well as other static files.

This disk contention from deleting files caused the bottleneck observed.
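The expiry step behind that contention can be sketched like this (hypothetical paths and function names; the real module computes the affected pages from the node's URL aliases):

```python
import os

def expire_node(cache_root, node_path, listing_paths=("", "node")):
    """When content changes, delete every cached page it could appear on.

    Each delete is a synchronous unlink() on the shared disk, so on a
    busy RAID-5 pool a burst of these competes with every cache read.
    """
    stale = [node_path] + list(listing_paths)  # node page, front page, listings
    removed = []
    for rel in stale:
        path = os.path.join(cache_root, rel.strip("/") or "index", "index.html")
        if os.path.exists(path):
            os.unlink(path)
            removed.append(path)
    return removed
```

With 12 web instances all triggering expiry against one shared RAID-5 pool, these small writes stack up behind the reads serving live traffic.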

By disabling boost, and using memcache instead, we were able to bring down the time from 20 seconds to just 8 seconds.

Further improvement could be achieved by using Varnish as the front tier for caching, reducing contention.




Yes, due to several reasons ...

Yes, 8 seconds is kind of long. It was due to several reasons:

- They did not have APC installed for PHP, so no op-code caching.

- They did not have memcache as a data cache.

- They had 50 comments displayed per page, which added up.

We were able to get the time down to 1.6 seconds.

But the "boost only" difference was 12 seconds on its own.

The biggest problem that I

The biggest problem that I saw when reading this was RAID 5. No wonder boost was dog slow: RAID 5 cannot deliver the required IOPS for write operations. RAID 5 carries a huge IOPS write penalty, and it's just a disaster waiting to happen; calculating parity adds extra latency to every disk write. In contrast, RAID 10 can deliver twice as many write IOPS with the same number of disks, and suffers no such latency issues. Anybody with a bit of sense would not put RAID 5 into production. And to jump ahead on the storage space question: disks don't cost $300 for 40GB anymore. It's relatively cheap to build a RAID 10 system that delivers the same capacity as RAID 5, and the additional performance and reliability (RAID 5 can easily fail without any disks failing, so there's no reliability there at all) easily justify the cost. Anyway, when virtualizing, nobody looks at $ per GB, but rather at $ per IOPS.
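The twice-the-IOPS claim follows from the standard write-penalty figures: a random write costs 4 back-end I/Os on RAID 5 (read data, read parity, write data, write parity) but only 2 on RAID 10 (one write per mirror side). A quick calculation, assuming a hypothetical 12-disk array of 15k drives at roughly 250 IOPS each:

```python
def effective_write_iops(disks, iops_per_disk, write_penalty):
    # Every front-end random write costs `write_penalty` back-end I/Os,
    # so the array's raw IOPS budget is divided by that penalty.
    return disks * iops_per_disk / write_penalty

raid5  = effective_write_iops(12, 250, 4)  # RAID 5: 4 back-end I/Os per write
raid10 = effective_write_iops(12, 250, 2)  # RAID 10: 2 back-end I/Os per write
```

Same spindles, half the penalty, twice the usable write IOPS.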

Well, not quite cut and dry ...

Well, it is not as cut and dried as you make it sound.

If the system was being set up for a new project, then your advice would apply. But the client already has this infrastructure, for whatever reasons (historical, restrictions, budget, not knowing better, ...etc.).

RAID 10 requires more spindles, which means more cost: not only the initial outlay, but also room in the chassis, controllers, power, ...etc.

We were able to improve performance within the infrastructure limitations that they had, so that the existing RAID 5 setup was no longer the bottleneck it had been.

Remember that most hosting setups can't even have RAID 5, and would at best have RAID 1, due to the limitation on the number of physical spindles they can cram into a chassis.

RAID 10 only requires more

RAID 10 only requires more spindles IF your goal is storage space. But as I said before, IOPS and reliability are far more important, and with disks now hitting 4TB, storage capacity should never be a concern for a Drupal site (unless you're trying to reproduce Wikipedia or a similar-caliber site in Drupal).

I always thought RAID

I always thought RAID required spindles for speed in addition to storage. For example, an 8TB RAID 10 made from eight 2TB disks would be faster than an array made of four 4TB disks.

But since we are talking servers and not desktops, disk space is at a premium. A typical 2U HP server may only have room for 8 SFF drives and at $1,000.00 per 1.2TB drive, that eats up a lot of budget.

$350.00 gets you 600GB drives; 1/2 the space for 1/3 the cost, which makes the bean counters very happy.

Faced with budgetary constraints, which sometimes override application needs, the sysadmin has to make some choices. One of those is uptime, which means leaving one of the 8 drives unused as a hot spare, further reducing the available space and spindle count but increasing the available uptime (and the admin's sleep time).

So I can easily understand where RAID 5 becomes attractive. Especially when money is an object.

You're right, but we're not

You're right, but we're not talking about the number of disks in a RAID 10 array, but rather the number of disks in RAID 5 vs RAID 10 arrays. With the same number of disks in both arrays, RAID 10 will deliver twice as many IOPS as RAID 5.

In the configuration mentioned in the article, I doubt they were running local storage. With big setups like that, you implement NAS or SAN boxes, and these can easily be extended with additional expansion frames if more storage or performance is required.

If uptime is your concern, and that implies reliability, any form of parity RAID, even with hot spares, doesn't come close to RAID 10. And in the case of disk failure, parity-based arrays have extremely long rebuild times, and the chance of an additional disk failing during the rebuild grows with array size, approaching 100% around 12TB for SATA disks.

Also, the majority of storage vendors have published articles advising against using RAID 5 in production.

I also don't see how RAID 5 can be attractive: if you consider all the risks and the performance penalty, RAID 10 is clearly the more cost-effective option. Remember, the storage system is most likely the main bottleneck we face when deploying applications or virtualizing, so it doesn't make much sense to opt for more storage over more IOPS. Let me give you an example: the SharePoint 2010 Search crawl database requires 3,500 to 7,000 IOPS, let's say an average of 5,000 (numbers taken from MS TechNet). A single 3.5" 15k SAS disk delivers about 250 IOPS, so you would need 40 disks in RAID 10 to meet that requirement, but 80 disks in RAID 5 to deliver the same performance! And that doesn't even take latency into consideration. Now, how is RAID 5 more cost effective and more attractive here?
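The arithmetic behind those disk counts, using the standard write-penalty figures (a sketch that assumes a 100% random-write workload, which is the worst case for RAID 5):

```python
import math

def disks_needed(target_iops, iops_per_disk, write_penalty):
    # Back-end I/Os required = front-end writes * penalty;
    # divide by per-disk IOPS and round up to whole spindles.
    return math.ceil(target_iops * write_penalty / iops_per_disk)

raid10_disks = disks_needed(5000, 250, 2)  # -> 40 disks
raid5_disks  = disks_needed(5000, 250, 4)  # -> 80 disks
```

At $350 per 600GB drive, that's a difference of 40 extra spindles just to make up for the parity penalty.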