As we have mentioned before, both Pressflow 6.x and Drupal 7.x (but not core Drupal 6.x) disable page caching when a session is created for an anonymous user.
An extreme case of this happened recently, because of a perfect storm.
Symptoms
The client sends a newsletter to site users, both those who have accounts on the site and others who only entered their email address to get the newsletter.
After a recent code change, sending a newsletter suddenly caused a very high load average and very high CPU usage. Because we also plot the number of logged in and anonymous users, we could see around 800 anonymous users on the site, in addition to the peak of 400 logged in users!
Since this is Pressflow, anonymous users are mostly served by Varnish, and they are not supposed to have sessions.
Investigation
So, we started to investigate those anonymous sessions in the database, in the sessions table.
Indeed, there were lots of anonymous sessions.
SELECT COUNT(*) FROM sessions WHERE uid = 0;
+----------+
| count(*) |
+----------+
|     6664 |
+----------+
And upon closer investigation, most of those sessions had a message in them saying "Comment field is required".
SELECT COUNT(*) FROM sessions
WHERE uid = 0 AND session LIKE '%Comment field is required%';
+----------+
| count(*) |
+----------+
|     5176 |
+----------+
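The two counts above can also be gathered in a single pass. The following is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the site's MySQL database, and the rows are hypothetical placeholders, not data from the real sessions table.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL sessions table (assumption:
# only the uid and session columns matter for this check).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (uid INTEGER, session TEXT)")

# Hypothetical sample rows; the session text is a simplified placeholder
# for the serialized session data, not the real storage format.
rows = [
    (0, "messages: Comment field is required"),
    (0, "messages: Comment field is required"),
    (0, ""),    # anonymous session without the message
    (42, ""),   # logged in user, ignored by the uid = 0 filter
]
conn.executemany("INSERT INTO sessions VALUES (?, ?)", rows)

# One pass over anonymous sessions: the total, and how many carry the message.
# LIKE evaluates to 0 or 1 in SQLite, so SUM counts the matching rows.
total, with_message = conn.execute(
    """
    SELECT COUNT(*), SUM(session LIKE ?)
    FROM sessions
    WHERE uid = 0
    """,
    ("%Comment field is required%",),
).fetchone()

print(total, with_message)
```

With the figures reported above (5176 out of 6664), roughly 78% of the anonymous sessions carried the message.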
To compare the day the newsletter was sent with other days, we counted anonymous sessions per day, and confirmed that this day had many times more sessions than any other.
In fact, it had more than 5X the sessions of the highest prior day, and roughly 55X those of a more typical day.
SELECT DATE(FROM_UNIXTIME(timestamp)) AS date, COUNT(*)
FROM sessions
WHERE uid = 0
GROUP BY date;
+------------+----------+
| date       | count(*) |
+------------+----------+
| .......... |       .. |
| 2013-04-19 |       55 |
| 2013-04-20 |       81 |
| 2013-04-21 |       66 |
| 2013-04-22 |      115 |
| 2013-04-23 |       99 |
| 2013-04-24 |      848 |
| 2013-04-25 |       72 |
| 2013-04-26 |     4524 |
| .......... |       .. |
+------------+----------+
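The ratios quoted above can be checked against the rows that are shown. This is a small sketch using only the visible rows (the elided ones are left out), and it assumes that the median of the other days is a fair reading of a "typical" day.

```python
from statistics import median

# Anonymous session counts per day, copied from the visible rows above.
counts = {
    "2013-04-19": 55,
    "2013-04-20": 81,
    "2013-04-21": 66,
    "2013-04-22": 115,
    "2013-04-23": 99,
    "2013-04-24": 848,
    "2013-04-25": 72,
    "2013-04-26": 4524,  # the day the newsletter was sent
}

newsletter_day = "2013-04-26"
peak = counts[newsletter_day]
other_days = [c for d, c in counts.items() if d != newsletter_day]

highest_prior = max(other_days)    # busiest day before the newsletter
typical_day = median(other_days)   # assumption: median stands for "typical"

ratio_vs_highest = peak / highest_prior
ratio_vs_typical = peak / typical_day

print(f"{ratio_vs_highest:.1f}X the highest prior day")
print(f"{ratio_vs_typical:.1f}X a typical day")
```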
Graphs show the magnitude of the problem
Look at the graphs, to the very right of each one, after Friday noon.
You can see how the load shot up after the newsletter was sent:
The number of anonymous sessions shot up from only a handful to around 800!
The number of logged in users spiked to 400, up from the usual 300 or so.
The number of SQL queries also shot up.
And so did the number of MySQL threads.
And the CPU usage was very high, with the server trying to serve around 1200 users with no caching for them.
Root Cause Analysis
So, it turns out that the recent code change was done to encourage more people to sign up for an account on the site. The change alters the comment form and adds extra fields, including the email address, to prod the visitor to register for an account. Another form above the node also captures the email address.
If people clicked on the button to add their email, Pressflow complained about the missing comment field. And since any message, be it for a logged in user or an anonymous one, is stored in a session, all users who tried to register for an account were treated like logged in users, in that they bypassed the Pressflow page cache. This effectively tripled the number of uncached users (from 400 to 1200), all of whom had to execute PHP and MySQL queries instead of being served from Varnish.
Hence the high load and high CPU usage.
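The chain of events can be modelled in a few lines. This is a toy sketch and not Pressflow's or Varnish's actual code: the `Visitor` class and the `view_page` and `submit_form` functions are hypothetical names, and the model assumes the front cache passes any request that carries a session cookie to the backend.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    """Hypothetical anonymous visitor; tracks only whether a session exists."""
    has_session: bool = False


def view_page(visitor: Visitor) -> str:
    """Return which layer answers a page view in this toy model."""
    # Assumption: a request carrying a session cookie is passed to the
    # backend, the same way requests from logged in users are.
    return "backend" if visitor.has_session else "cache"


def submit_form(visitor: Visitor, comment: str) -> list:
    """Submit the altered comment form and return the queued messages."""
    messages = []
    if not comment.strip():
        messages.append("Comment field is required")
    # Messages are kept in the session, so queuing one starts a session.
    if messages:
        visitor.has_session = True
    return messages


visitor = Visitor()
before = view_page(visitor)                 # anonymous, no session yet
errors = submit_form(visitor, comment="")   # email entered, comment left empty
after = view_page(visitor)                  # same visitor, now with a session

# Figures from the text: 400 logged in users plus around 800 anonymous
# visitors holding sessions all reach the backend.
uncached_users = 400 + 800

print(before, errors, after, uncached_users)
```

In this model a single failed form submission is enough to move a visitor from the cached path to the uncached one for every later page view, which is why the effect piled up so quickly after the newsletter went out.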
Solution
The fix was to revoke the post comment permission for anonymous users, and thereby remove the comment form from the bottom of every node.
After that, the newsletter was sent without increasing the load on the server at all.
Although this problem was on Pressflow 6.x, it should apply to Drupal 7.x as well, since it also disables page caching for anonymous users who have a session.
Comments
Clarence Larkins (not verified)
Thanks for sharing this
Fri, 2013/06/07 - 00:18
Thanks for sharing this informative post. If people clicked on the button to add their email, Pressflow complained about the missing comment field, then this is a root Cause Analysis