Recently, we were reviewing the performance of a large site that has a significant portion of its traffic from logged in users. The site was suffering from a high load average during peak times.

We enabled slow query logging on the site for a entire week, using the following in my.cnf:

log_slow_queries               = 1
slow_query_log                 = 1
slow_query_log_file            = /var/log/mysql/slow-query.log
log-queries-not-using-indexes  = 1
long_query_time                = 0.100

Note that the parameter long_query_time can be a fraction of a second only on more recent versions on MySQL.

You should not set this value too low, otherwise the server's disk could be tied up in logging the queries. Nor should it be too high so as to miss most slow queries.

We then analyzed the logged queries after a week.

We found that the slow queries, on aggregate, examined a total of 150,180 trillion rows, and returned 838,930 million rows.

Out of the total types of queries analyzed, the top two had a disproportionate share of the total.

So these two queries combined were 63.7% of the total slow queries! That is very high, and if we were able to improve these two queries, it would be a huge win for performance and server resources.

Voting API Slow Query

The first query had to do with Voting API and Userpoints.

It was:

SELECT votingapi_vote.*
FROM votingapi_vote
WHERE  value_type = 'points'
AND tag = 'userpoints_karma'
AND uid = '75979'
AND value = '-1'
AND timestamp > '1464077478'

It hogged 45.3% of the total slow queries, and was called 367,531 times per week. It scanned over 213,000 rows every time it ran!

The query took an aggregate time for execution of 90,766, with an average of 247 milliseconds per execution.

The solution was simple: create an index on the uid column:

CREATE INDEX votingapi_vote_uid ON votingapi_vote (uid);

After that was done, the query used the index and scanned only one row, and returned instantly.

Private Messaging Slow Query

The second query had to do with Privatemsg. It is:

SELECT COUNT(pmi.recipient) AS sent_count
FROM pm_message pm
INNER JOIN pm_index pmi ON pm.mid = pmi.mid
WHERE  pm.author = '394106'
AND pm.timestamp > '1463976037'
AND pm.author <> pmi.recipient

This query accounted for 18.4% of the total slow queries, and was called 32,318 times per week. It scanned over 1,350,000 rows on each execution!

The query took an aggregate time for execution of 36,842, with an average of 1.14 seconds (yes, seconds!) per execution.

Again, the solution was simple: create an index on the author column.

CREATE INDEX pm_message_author ON pm_message (author);

Just like the first query, after creating the index, the query used the index and scanned only 10 rows and over a million! It returned instantly.

Results After Tuning

As with any analysis, comparison of the before and after data is crucial.

After letting the tuned top two offending queries run for another week, the results were extremely pleasing:

Before After
Total rows examined 150.18 T 34.93 T
Total rows returned 838.93 M 500.65 M

A marked improvement!

Conclusion

With performance, the 80/20 rule applies. There are often low hanging fruit that can easily be tuned.

Do not try to tune because of something you read somewhere, that may not apply to your site (including this and other articles on our site!)

Rather, you should do proper analysis, and reach a diagnosis based on facts and measurements, as to the cause(s) of the slowness. After that, tuning them will provide good results.

Comments

Tue, 2016/06/14 - 09:08

Have you written patches for these? The database schema API in Drupal 7 supports adding indexes via db_add_index() and the hook_schema() definition. Have you written patches to these modules to add the indexes?

Tue, 2016/06/14 - 09:33

No, I have not had the time to do so.

Moreover, these queries, although from commonly used modules, are very specific to this site, and not general queries that are found in the module out of the box.

Thu, 2016/06/16 - 05:05

Does adding indexes that modules may add in future updates isn't a little dangerous ?
It can blocked on the index name. If you tell index for "author" is named "author" and module make same thing in update, it crash.
In this case, I suggest to name the index with a prefix of your choice. No blocking on update and easy find on removal on duplicates.

Thu, 2016/06/16 - 08:42

No, it will neither block, nor crash. It will happily create another index on the same column with another index name.

Try it yourself. Create an index on a column, then create another index on the same column with a different index name.

CREATE INDEX index1 on table1 (column1);

CREATE INDEX index2 on table1 (column1);

MySQL will create both indexes without any errors.

That means that all you need is monitor new releases for those modules, and when a new index is created, delete yours and that is it.

Thu, 2016/06/16 - 08:49

My (problematic ?) example is rather like :
CREATE INDEX author on table1 (author);

And the module does :
db_add_index('table1', 'author', array('author'));

Which can make an error I think.

Indexes with differents name should not be a problem, that's why I was talking about prefixing these to prevent errors (and remove them if module do it on its way someday).

Thu, 2016/06/16 - 08:53

It will return an error, but it will not block.

ERROR 1061 (42000): Duplicate key name 'author'

This will be an update.php task, which should not affect any other tasks before or after it. Only this one will fail.

And since the index exists, there is nothing more to do. Just ignore the error.

Is your Drupal or Backdrop CMS site slow?
Is it suffering from server resources shortages?
Is it experiencing outages?
Contact us for Drupal or Backdrop CMS Performance Optimization and Tuning Consulting