Performance Case Study: slow database queries from web to database server

A client contacted us for help with a Drupal multisite installation that was experiencing very slow page load times, causing many complaints from their user base.

We started by checking the hosting environment, which consisted of two Xen-based VPSs: one with 2 GB of memory acting as the web server, and the other with 1 GB of memory acting as the database server.

The site's home page took a total of 10,055 milliseconds to generate, and ran 1,060 queries taking 9,327 milliseconds.

The number of queries is on the high side, but what was more troubling was the 9.3 seconds that the queries took to execute.

We copied the site to our testing server, and found that the same page took a total of 1,177 milliseconds to generate, with the same number of queries taking only 227 milliseconds.
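To put the two sets of numbers side by side, the arithmetic can be sketched as follows. It uses only the figures quoted above. The one assumption is that each page total includes the query time, so that subtracting the two gives the time spent outside the database.

```python
# Figures quoted in the case study (all times in milliseconds).
QUERY_COUNT = 1060

VPS_TOTAL_MS = 10055
VPS_QUERY_MS = 9327

TEST_TOTAL_MS = 1177
TEST_QUERY_MS = 227


def per_query_ms(total_query_ms: float, query_count: int) -> float:
    """Average time spent per query, in milliseconds."""
    return total_query_ms / query_count


# Average cost of a single query on each environment.
vps_avg = per_query_ms(VPS_QUERY_MS, QUERY_COUNT)
test_avg = per_query_ms(TEST_QUERY_MS, QUERY_COUNT)

# How many times slower the queries are on the VPSs.
slowdown = VPS_QUERY_MS / TEST_QUERY_MS

# Assumption: the page total includes query time, so the remainder
# is the time spent outside the database (PHP execution and so on).
vps_non_query = VPS_TOTAL_MS - VPS_QUERY_MS
test_non_query = TEST_TOTAL_MS - TEST_QUERY_MS

print(f"VPS:  {vps_avg:.2f} ms per query, {vps_non_query} ms outside queries")
print(f"Test: {test_avg:.2f} ms per query, {test_non_query} ms outside queries")
print(f"Queries are about {slowdown:.0f}x slower on the VPSs")
```

Under that assumption, the time spent outside the queries is of the same order on both setups (728 ms versus 950 ms), while the average query is roughly 41 times slower on the VPSs. That points away from the application code and toward whatever sits between the web server and the database.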

On the VPSs, we timed a query that took 15 ms when executed locally on the database server, but 350 ms when executed from the web server.
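A comparison like this can be reproduced with a small timing helper run once on each machine. The sketch below is not the tool we used: `time_call_ms` and `network_overhead_ms` are hypothetical names, and `run_query` stands for whatever callable executes the query with your database client of choice.

```python
import statistics
import time
from typing import Callable


def time_call_ms(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of run_query() in milliseconds.

    run_query is a placeholder: any zero-argument callable that executes
    the query and fetches the full result set.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def network_overhead_ms(remote_ms: float, local_ms: float) -> float:
    """Time attributable to the path between the two servers."""
    return remote_ms - local_ms


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere; replace the lambda
    # with a real query call on each server.
    elapsed = time_call_ms(lambda: sum(range(10_000)))
    print(f"stand-in call: {elapsed:.3f} ms")

    # Using the two timings from the case study (350 ms remote, 15 ms local).
    print(f"overhead between servers: {network_overhead_ms(350, 15)} ms")
```

Running the same callable on the database server and on the web server, then subtracting, isolates the cost of the link: 335 ms for the query above.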

The two servers were only one hop apart when pinged. Being Xen VPSs, we could not glean any info on the underlying NICs, their speed and such.

But we realized that there was an important issue: each of the two servers had only one virtual NIC configured, which is not a typical setup for a multi-server system. It is common practice to have two NICs, one for the outside world, where the web server communicates with browsers, and the other on a private LAN that carries only local traffic.

Another warning sign was that the hosting company was sending "bandwidth exceeded" email reminders, which suggested that traffic between the web server and the database server was being counted against the client's monthly bandwidth cap.

After the hosting company reconfigured the servers to have two NICs each, the bandwidth warnings went away, but performance was still constricted.

After more emails to the hosting company convincing them that the constricted bandwidth was the main issue with the application, they agreed to up the capacity on the private LAN from the normal 10 Mbps to 30 Mbps, and eventually to 60 Mbps.
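To see why the link speed matters so much, here is a rough estimate of how long it takes just to push a result set across the private LAN at each of the three speeds. The 400,000-byte payload is a hypothetical figure chosen only for illustration, since we did not record result sizes, and the calculation ignores latency and protocol overhead.

```python
def transfer_ms(payload_bytes: int, link_mbps: float) -> float:
    """Time in milliseconds to send payload_bytes over a link_mbps link.

    Ignores latency and protocol overhead; 1 Mbps is taken as
    1,000,000 bits per second.
    """
    bits = payload_bytes * 8
    return bits / (link_mbps * 1_000_000) * 1000.0


# Hypothetical result-set size, for illustration only.
PAYLOAD_BYTES = 400_000

for mbps in (10, 30, 60):
    print(f"{mbps:>2} Mbps: {transfer_ms(PAYLOAD_BYTES, mbps):.1f} ms")
```

With that assumed payload, the transfer alone takes about 320 ms at 10 Mbps, about 107 ms at 30 Mbps, and about 53 ms at 60 Mbps, so a slow link can dwarf a query that executes in 15 ms on the database server.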

Now, the front page takes 1,012 ms, instead of the 10 seconds that it took before!

There is still room for further tuning and improvement, but the major issue was diagnosed and solved without any application changes.



Great work! I just LOVE troubleshooting problems like that. Even though the host was at fault, you and your client were fortunate that the host was willing to make the necessary changes. Some hosts will ignore requests like that, and you end up needing to move a site because of it.

Just goes to show, you can never be too careful with your hosting selection... AND, it PAYS to have a bright consultant help you with your problems!