On many occasions, we see website performance suffering due to misconfiguration or oversight of system resources. Here is an example where RAM and disk I/O severely impacted website performance, and how we fixed both.
A recent project for a client with poor site performance uncovered issues within the application itself, i.e. how the Drupal site was put together. Overcoming those issues alone, however, was not enough to achieve the required scalability with several hundred logged-in users on the site at the same time.
First, regarding memory: the site was configured with too many PHP-FPM processes, which left no room in memory for the filesystem buffers and cache that greatly reduce disk I/O load.
Here is a partial display from monitoring the server before the fix:
As you can see, the buffers, cache, and free memory together amount to less than 1 GB, while used RAM is over 7 GB.
used | buffers | cache | free |
---|---|---|---|
7112M | 8892k | 746M | 119M |
7087M | 9204k | 738M | 151M |
7081M | 9256k | 770M | 125M |
7076M | 4436k | 768M | 136M |
7087M | 4556k | 760M | 133M |
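Figures like these can be read straight from `/proc/meminfo`. Here is a minimal sketch, assuming a Linux host; the `awk` expression and the kB-to-MB conversion are ours, not output from any particular monitoring tool:

```shell
# Sum Buffers + Cached from /proc/meminfo (values are reported in kB)
# and print the total in MB. The ^ anchors keep SwapCached out of the sum.
awk '/^Buffers:|^Cached:/ { sum += $2 }
     END { printf "%d MB in buffers + cache\n", sum / 1024 }' /proc/meminfo
```

On the server above, that sum was well under 1 GB.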
We calculated how much RAM is really needed by watching the main components on the server. In this case the calculation was:

Memcache + MySQL + (Apache2 × number of instances) + (PHP-FPM × number of instances)

We then adjusted the number of PHP-FPM processes down to a reasonable figure, so that total application RAM would not exceed 70% of total memory.
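The sizing arithmetic can be sketched as a small script. All numbers below are hypothetical placeholders, not the client's actual figures; measure per-process sizes on your own server (e.g. from `ps` or `top`) before using them:

```shell
#!/bin/sh
# Hypothetical per-component RAM figures, in MB -- substitute your own.
TOTAL_RAM_MB=8192
MEMCACHE_MB=512
MYSQL_MB=1024
APACHE_MB_PER_PROC=20
APACHE_PROCS=50
FPM_MB_PER_PROC=60

# Cap application RAM at 70% of total, leaving the rest for buffers/cache.
BUDGET_MB=$(( TOTAL_RAM_MB * 70 / 100 ))
FIXED_MB=$(( MEMCACHE_MB + MYSQL_MB + APACHE_MB_PER_PROC * APACHE_PROCS ))
MAX_CHILDREN=$(( (BUDGET_MB - FIXED_MB) / FPM_MB_PER_PROC ))
echo "pm.max_children <= $MAX_CHILDREN"
```

The resulting figure is what goes into `pm.max_children` in the PHP-FPM pool configuration.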
The result is as follows. As you can see, used memory is now 1.8 GB instead of 7 GB. The free memory will gradually be taken up by cache and buffers, making I/O operations much faster.
used | buffers | cache | free |
---|---|---|---|
1858M | 50.9M | 1793M | 4283M |
1880M | 51.2M | 1795M | 4258M |
1840M | 52.1M | 1815M | 4278M |
1813M | 52.4M | 1815M | 4304M |
Another issue with the server, caused partly by the above lack of cache and buffers and partly by forgotten settings, was a severe bottleneck in disk I/O performance. The disk was so tied up that everything had to wait: I/O wait was 30%, as seen in top and htop. This is very high; it should usually be no more than 1 or 2% at most.
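I/O wait can also be sampled without top. A minimal sketch that reads `/proc/stat` twice, assuming a Linux host; the one-second window is our choice:

```shell
#!/bin/sh
# On the "cpu" line of /proc/stat, columns 2-8 are cumulative ticks for
# user, nice, system, idle, iowait, irq, softirq; column 6 is iowait.
cpu_sample() { awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8, $6 }' /proc/stat; }

set -- $(cpu_sample); total1=$1; wait1=$2
sleep 1
set -- $(cpu_sample); total2=$1; wait2=$2

# Percentage of CPU time spent waiting on I/O over the interval
# (the +1 guards against division by zero on a fully idle interval).
echo "iowait: $(( 100 * (wait2 - wait1) / (total2 - total1 + 1) ))%"
```

On the affected server, top reported this figure at around 30%.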
We also observed excessive disk reads and writes, as follows:
disk read | disk write | i/o read | i/o write |
---|---|---|---|
5199k | 1269k | 196 | 59.9 |
1731k | 1045k | 80 | 50.7 |
7013k | 1106k | 286 | 55.2 |
23M | 1168k | 607 | 58.4 |
9121k | 1369k | 358 | 59.7 |
Upon investigating, we found that the rules_debug_log setting was on. The site had 130 enabled rules, and the syslog module was enabled. We found a file under /var/log/ growing by over a gigabyte per day. Writing this rules debugging output on every page load tied up the disk when a few hundred users were on the site.
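Runaway logs like this are easy to spot by listing the largest files under /var/log. A generic sketch; the path and the five-entry cutoff are our choices:

```shell
# List the five largest files under /var/log, in KB, largest first.
du -ak /var/log 2>/dev/null | sort -rn | head -5
```

Anything growing by a gigabyte a day will sit at the top of this list within hours.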
After disabling the rules debug log setting, I/O wait went down to 1.3%, a significant improvement.
Here are the disk I/O figures after the fix:
disk read | disk write | i/o read | i/o write |
---|---|---|---|
192k | 429k | 10.1 | 27.7 |
292k | 334k | 16.0 | 26.3 |
2336k | 429k | 83.6 | 30.7 |
85k | 742k | 4.53 | 30.8 |
Now, the site has response times of 1 second or less, instead of the previous 3 to 4 seconds.