Most of us have heard about the 80/20 rule a.k.a. The Pareto Principle. Simply put: 80% of something have just 20% of the reasons. In other words if you address the 20% of the causes, 80% of the symptoms will go away because of that.
Since we do a lot of performance optimization for Drupal sites, we noticed a while back that this rule is true for optimization: there will be a few underlying causes that account for the bulk of the symptoms, whether they are slowness, inability to handle many concurrent users, or poor server resource utilization.
One such case is for a client that we are working with currently. They get over 415,000 page views per day and 100,000 visits (both via Google Analytics). They did not have any other server resource measurement tools, nor web server log analysis tools.
So in addition to our Drupal performance assessment, tuning and optimization service, we recommended that we also do Drupal server installtion, configuration and tuning for them. They accepted and we proceeded to configure the new server with an optimized stack at the same hosting company they were using.
After we installed our recommended set of software tools, we found that Awstats says that page views are as much as 950,000 per day. We have not seen such high variance from Google Analytics before. Normally that variance is around 20% more for Awstats over Analytics.
Those tasks helped a lot and we were able to get better stability and performance on a single server.
But there was still some variance that we are seeing in the server resource utilization, such as load average occasionally jumping up, and disk I/O as well.
After some investigation, we narrowed down the main reason to MySQL doing a lot of slow queries that use temporary tables and file sort. We saw that on Munin's graph under MySQL slow queries as well as in the I/O stat graph as well.
The query in question was a voting API "Current user" view, which causes this query which takes 6 to 14 seconds:
SELECT node.nid AS nid,
node.title AS node_title,
votingapi_vote_node_vote_curuser.value_type AS votingapi_vote_node_vote_curuser_value_type,
votingapi_vote_node_vote_curuser.value AS votingapi_vote_node_vote_curuser_value,
votingapi_vote_node_vote_curuser.timestamp AS votingapi_vote_node_vote_curuser_timestamp
FROM node node
LEFT JOIN votingapi_vote votingapi_vote_node_vote_curuser
ON node.nid = votingapi_vote_node_vote_curuser.content_id
AND (votingapi_vote_node_vote_curuser.content_type = 'node'
AND votingapi_vote_node_vote_curuser.tag = 'vote'
AND votingapi_vote_node_vote_curuser.uid = '0')
WHERE (node.status <> 0)
AND (votingapi_vote_node_vote_curuser.value IS NOT NULL)
ORDER BY votingapi_vote_node_vote_curuser_timestamp DESC
LIMIT 231300, 50;
An EXPLAIN on that statement showed "using temporary, using filesort", so no wonder it was really slow.
As soon as we disabled that view, things were far more stable, and it shows ...
Look at 5 pm in the following graphs, and compare before and after.
MySQL slow queries drop sharply.
See how the CPU graph shows barely any system time (green) after 5pm, and less I/O wait time (magenta)?
The I/O stat graph shows barely any activity after the change on /dev/sda (the root disk, which has the /tmp directory that MySQL uses for temporary tables).
The load average falls visibily as a result.
And the PHP CGI processes show much less variance.
In many cases of slow performance, there are "low hanging fruit" that would give measurable performance enhancement for relatively little effort. The hard part is finding which are the ones that apply to your specific site, and addressing them. This is why we recommend starting with a performance assessment.