One client of ours was facing severe issues with their relatively new well equipped server: the server stopped responding to web requests, and was rebooted, only to stop responding again.
Upon investigation, we found out that pages were taking a lot of time to load.
This only happened when viewing a node in full page view, not when the
nodes were in lists (just as in views, node edit form ...etc.)
Devel was showing this:
Page execution time was 189895.68 ms. Executed 863 queries in 619.69 milliseconds. Page execution time was 309544.34 ms. Executed 682 queries in 125.29 milliseconds. Page execution time was 309567.13 ms. Executed 533 queries in 123.34 milliseconds.
As you can see, it is more than 5 minutes for the page to load, and it was not because of database queries.
At one point we found the server using 100% of all 8 CPUs, with 350+ processes, compared to a normal of 95 to 150 processes under normal load. More curiously, no one process was hogging the CPU. All the time was system not user time.
1 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] Tasks: 352 total, 1 running 2 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] Load average: 69.28 123.59 112.93 3 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] Uptime: 02:16:10 4 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 5 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 6 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 7 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] 8 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.0%] Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||7417/8002MB] Swp[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||1729/2000MB] PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 15573 www-data 20 0 257M 31840 2540 S 8.8 0.4 0:00.87 /usr/sbin/apache2 -k start 15016 www-data 20 0 265M 39632 2684 S 2.5 0.5 0:03.80 /usr/sbin/apache2 -k start 16099 root 20 0 19528 1508 996 R 1.3 0.0 0:00.67 htop 5762 mysql 20 0 322M 79324 2504 S 0.6 1.0 16:46.69 /usr/sbin/mysqld --basedir=/usr 15536 www-data 20 0 243M 18152 2836 S 0.6 0.2 0:59.75 /usr/sbin/apache2 -k start 14937 www-data 20 0 264M 38608 2604 D 0.6 0.5 0:02.23 /usr/sbin/apache2 -k start 15112 www-data 20 0 234M 9208 2748 S 0.6 0.1 0:04.44 /usr/sbin/apache2 -k start
We suspected a network issue, and using netstat, we were able to confirm that it is the case, since we saw the following:
tcp 0 212 10.0.0.1:50803 64.191.203.26:80 ESTABLISHED 7110/apache2 tcp 0 240 10.0.0.1:45267 64.191.203.26:80 ESTABLISHED 7071/apache2 tcp 0 217 10.0.0.1:45287 64.191.203.26:80 ESTABLISHED 7095/apache2 tcp 0 246 10.0.0.1:50812 64.191.203.26:80 ESTABLISHED 6451/apache2 tcp 0 207 10.0.0.1:50844 64.191.203.26:80 ESTABLISHED 6487/apache2 tcp 0 203 10.0.0.1:45202 64.191.203.26:80 ESTABLISHED 6460/apache2 tcp 0 206 10.0.0.1:50810 64.191.203.26:80 ESTABLISHED 7534/apache2 tcp 0 206 10.0.0.1:50842 64.191.203.26:80 ESTABLISHED 7060/apache2 tcp 0 206 10.0.0.1:50756 64.191.203.26:80 ESTABLISHED 7545/apache2 tcp 0 191 10.0.0.1:45276 64.191.203.26:80 ESTABLISHED 5762/apache2 tcp 0 214 10.0.0.1:50795 64.191.203.26:80 ESTABLISHED 7073/apache2 tcp 0 219 10.0.0.1:50801 64.191.203.26:80 ESTABLISHED 7081/apache2 tcp 0 192 10.0.0.1:45207 64.191.203.26:80 ESTABLISHED 6448/apache2 tcp 0 217 10.0.0.1:50832 64.191.203.26:80 ESTABLISHED 6449/apache2 tcp 0 207 10.0.0.1:45240 64.191.203.26:80 ESTABLISHED 6443/apache2 tcp 0 211 10.0.0.1:50814 64.191.203.26:80 ESTABLISHED 7046/apache2 tcp 0 206 10.0.0.1:50799 64.191.203.26:80 ESTABLISHED 7112/apache2 tcp 0 188 10.0.0.1:50735 64.191.203.26:80 ESTABLISHED 6979/apache2 tcp 0 212 10.0.0.1:45220 64.191.203.26:80 ESTABLISHED 6442/apache2 tcp 0 194 10.0.0.1:45256 64.191.203.26:80 ESTABLISHED 6552/apache2 tcp 0 210 10.0.0.1:50742 64.191.203.26:80 ESTABLISHED 6532/apache2 tcp 0 212 10.0.0.1:50798 64.191.203.26:80 ESTABLISHED 6551/apache2 tcp 0 243 10.0.0.1:45225 64.191.203.26:80 ESTABLISHED 7036/apache2 tcp 0 207 10.0.0.1:45229 64.191.203.26:80 ESTABLISHED 6560/apache2 tcp 0 191 10.0.0.1:50830 64.191.203.26:80 ESTABLISHED 5813/apache2 tcp 0 192 10.0.0.1:45270 64.191.203.26:80 ESTABLISHED 6438/apache2 tcp 0 194 10.0.0.1:50794 64.191.203.26:80 ESTABLISHED 6578/apache2 tcp 0 217 10.0.0.1:50759 64.191.203.26:80 ESTABLISHED 7121/apache2 tcp 0 207 10.0.0.1:50819 64.191.203.26:80 ESTABLISHED 7048/apache2 tcp 0 202 10.0.0.1:50805 64.191.203.26:80 ESTABLISHED 6465/apache2 tcp 0 212 10.0.0.1:45222 64.191.203.26:80 ESTABLISHED 7108/apache2 tcp 0 259 10.0.0.1:45242 64.191.203.26:80 ESTABLISHED 6437/apache2 tcp 0 225 10.0.0.1:45238 64.191.203.26:80 ESTABLISHED 7024/apache2 tcp 0 232 10.0.0.1:50749 64.191.203.26:80 ESTABLISHED 6479/apache2 tcp 0 243 10.0.0.1:45218 64.191.203.26:80 ESTABLISHED 5526/apache2
With DNS resolving the names, we saw this:
tcp 0 192 example.com:45209 services.digg.com:www ESTABLISHED 6564/apache2 tcp 0 207 example.com:45213 services.digg.com:www ESTABLISHED 8323/apache2 tcp 0 212 example.com:50734 services.digg.com:www ESTABLISHED 7118/apache2 tcp 0 199 example.com:50821 services.digg.com:www ESTABLISHED 7119/apache2 tcp 0 196 example.com:45258 services.digg.com:www ESTABLISHED 6476/apache2 tcp 0 222 example.com:45261 services.digg.com:www ESTABLISHED 6523/apache2 tcp 0 212 example.com:45286 services.digg.com:www ESTABLISHED 6559/apache2 tcp 0 224 example.com:45249 services.digg.com:www ESTABLISHED 5703/apache2 tcp 0 202 example.com:50755 services.digg.com:www ESTABLISHED 6562/apache2 tcp 0 227 example.com:45244 services.digg.com:www ESTABLISHED 6457/apache2 tcp 0 205 example.com:50736 services.digg.com:www ESTABLISHED 7062/apache2 tcp 0 236 example.com:50745 services.digg.com:www ESTABLISHED 7109/apache2 tcp 0 236 example.com:45277 services.digg.com:www ESTABLISHED 7013/apache2 tcp 0 204 example.com:50757 services.digg.com:www ESTABLISHED 6563/apache2 tcp 0 256 example.com:45295 services.digg.com:www ESTABLISHED 6960/apache2 tcp 0 185 example.com:50818 services.digg.com:www ESTABLISHED 6489/apache2 tcp 0 206 example.com:45221 services.digg.com:www ESTABLISHED 6475/apache2 tcp 0 207 example.com:45268 services.digg.com:www ESTABLISHED 7922/apache2
Now, we know that it was connections to services.digg.com. How many? 169 connection at one point.
This means 169 Apache processes tied up doing nothing, occuping memory, waiting for the other end to respond. Moreover, with more incoming requests, more and more memory was consumed causing the server to start swapping and going into thrashing and "freezing up".
Digging into the site's module, we found that it was the diggthis that was the culprit. More accurately, it was a combination of using module, with a problem with a remote server accessed from the backend.
Disabling that module solved the problem.
So, the morale of the story is to avoid having server-side code that does calls to third party servers outside of your control. If that server is down, or slow, your site will be down or slow too. At least have a reasonable timeout value if you must do so.
Comments
Khalid
Here is a patch for timeout
Fri, 2008/10/10 - 23:45A patch in #156582 adds a timeout value to drupal_http_request().
--
2bits -- Drupal and Backdrop CMS consulting
Visitor (not verified)
Signs of the widget bubble
Sat, 2008/10/11 - 03:35I'll go you one further: these 3rd party "widgets" in drupal should use cron.php to update their own cache occasionally, in addition to having a short temper when it comes to outside servers locking them up in some perverted "keep-alive" ruse.
Visitor (not verified)
death to blog widgets
Sat, 2008/10/11 - 18:27Yes, blog widgets that rely on real time updates are the real problem, in my opinion.
I wrote about this almost a year ago and found some data for it when it first really hit me.
Death to Blog Widgets
Khalid
You are right ...
Sat, 2008/10/11 - 21:41You are right. Your blog post is mainly about front end widgets. They do slow down the user's experience, but they have no effect on the back end of the server.
This case of diggthis module, doing server to server connections over the network killed the server totally, not only made the site feel slow to users browsing it.
--
2bits -- Drupal and Backdrop CMS consulting
Visitor (not verified)
Great find!
Sat, 2008/10/11 - 08:24Great find!
Visitor (not verified)
The Web 2.0 Myth
Sun, 2008/10/12 - 02:04Thanks again for bringing attention to something that everyone should know.
There is no such thing as a free lunch.
The truth off web 2.0 is that you can mash anything together... the myth of web 2.0 is that you can do it in real time. If you are building for performance and scaling DO NOT use web services in real time.
As this example shows a dependency on an a service outside of your control can ruin your day.
andre
Visitor (not verified)
One more vote to back up the others
Tue, 2009/01/06 - 21:41Indeed, I've become very disillusioned about most third-party services. In the better cases they seem like they slow things down 'a little bit', at worse they're totally crippling to the performance and even the availability of a site, and in all in all cases it puts a part of your code base out of your hands and makes it un-testable (unit testing-wise). Stay away from them as much as possibly can, and only use them if you absolutely have_to_have_to, and *never, ever* use one on for mission critical functionality. (says someone who has used a third-party service in user registration workflow)
Nikhil (not verified)
Dependence on Twitter/Facebook
Wed, 2010/10/13 - 04:27Indeed, many sites now want to be social-network compatible and adding FB like/share and Twitter tweet buttons has become the norm. I shudder to think the repercussions of such sites slowing down..