WordPress Configuration Part 5: Performance Tuning

WordPress in the Cloud

This is the last post in a five-part series that covers the basics of getting WordPress up and running on Engine Yard. In the first post, we covered how to get a basic install deployed. In the second post, we introduced a way to make modifications locally and freeze them so that they can be deployed to a cluster of cattle servers. In the third post, we looked at two approaches to installing plugins locally, and got an S3 bucket set up to serve images so that we’re not relying on the filesystem, which is ephemeral. Then, in the fourth post, we set up New Relic for application monitoring, and CloudFlare for site speedup, analytics, and security protection.

Phew! That’s quite a lot!

In this post, we’re going to bring it all together and test the performance of our setup to see if there’s anything we can do to improve.

Enter Blitz

Engine Yard offers Blitz as an add-on, meaning that account creation and billing are handled for you through your Engine Yard account. Blitz is a cloud-based load-testing tool that’s a complete snap to use. Just point it at your website, tell it how many users you want it to send, and watch as it graphs out your site performance under load in real time. There are plenty of other features too, like integration into your continuous deployment setup, but we won’t be covering those in this post.

To get started, head on over to the Blitz add-on page (making sure to log in) and then click SET IT UP. Once that’s done, click Blitz Dashboard and you’re in. Welcome to your new Blitz account!

Before we continue, you need to authorise the domains you’re going to test. This is a security requirement to make sure you’re the operator of the site you want to load-test. Visit the Blitz authorization page, and grab the filename that is displayed to you. It should look something like this:

/mu-87572b4e-9058f6a2-2b9bca35-16445c32.txt

Open a terminal, navigate to your site’s Git checkout, and run:

$ echo 42 > mu-87572b4e-9058f6a2-2b9bca35-16445c32.txt

You will need to change to the public directory (or similar) if that’s where WordPress is being served out of. Add this change to Git, commit, and push. Now log into your Engine Yard dashboard and find your app. Click the Deploy button and wait for the deployment to complete. Now click on the HTTP link on your application master instance, and add /mu-87572b4e-9058f6a2-2b9bca35-16445c32.txt to the end of the URL to check that the file is being served correctly.

The full URL you’re checking should look something like this:

http://ec2-184-73-140-47.compute-1.amazonaws.com/mu-87572b4e-9058f6a2-2b9bca35-16445c32.txt
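If you’d rather check this from a script than a browser, something like the following sketch could work. The hostname and filename are the example values from above, the helper names `verification_url` and `is_served` are made up for illustration, and the check assumes the file contains the `42` we echoed into it.

```python
from urllib.request import urlopen


def verification_url(host, filename):
    """Build the URL to check (hypothetical helper, not part of Blitz)."""
    return "http://{}/{}".format(host, filename.lstrip("/"))


def is_served(url, expected="42", timeout=10.0):
    """Return True if the URL answers 200 and the body matches `expected`."""
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace").strip()
            return response.status == 200 and body == expected
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


# Example usage (needs network access, so it's left commented out):
# url = verification_url(
#     "ec2-184-73-140-47.compute-1.amazonaws.com",
#     "mu-87572b4e-9058f6a2-2b9bca35-16445c32.txt",
# )
# print(is_served(url))
```

Swap in your own instance hostname and the filename Blitz gave you.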

If that works (it should be a white page with the number 42) go to your Blitz dashboard and find the part that invites you to Add a Domain. We’re going to be testing both of your application instances, so grab the HTTP link for one, plug it in, verify that it’s authorised, and then do the same for the other application instance.

Before we continue, you should log in to your New Relic account, as we’ll be using New Relic to monitor CPU usage throughout our tests. Head on over to the New Relic add-on page and click New Relic Dashboard. Leave this window open in a separate tab.

Testing A Single Instance

When we first set up our blog we opted to use the production configuration for our environment, meaning we have one application master, and one application instance. To start our testing, we’re going to disable the application instance by removing it from the load-balancing pool, i.e. HTTP requests won’t be balanced to it, meaning that we’re testing the ability of our master to handle all requests.

To do this, go to your Engine Yard dashboard, find the instance that says Application instead of Application Master, and then click on the SSH link. This should open up a terminal for you and log you in. If it doesn’t, you’ll need to set up SSH keys, add them to your account, and then install them in your environment.

Once you’re in, run:

$ sudo /etc/init.d/nginx stop

This stops the nginx webserver, and takes the instance out of the HAProxy pool, meaning requests won’t be sent to it.

Next, grab the HTTP link for your Application Master instance, and head on over to Blitz’s play screen. Plug in the URL for your application master, and set the following parameters:

  • Type: Rush
  • Timeout: 30000
  • Users (start): 1
  • Users (end): 200
  • Duration: 5m

What this means is that Blitz is going to take 5 minutes gradually building up traffic from one visitor to 200 simultaneous visitors. This is called a rush test. We’ve set the timeout to 30 seconds (the 30000 above is in milliseconds) because we want to see how long responses take under high load, no matter how slow the site gets.
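To get a feel for the ramp, here’s a rough sketch of how many users are active at a given moment in the test. It assumes the ramp is a straight line from the start value to the end value, which matches the description above but isn’t taken from Blitz’s documentation. `users_at` is a hypothetical helper, not a Blitz API.

```python
def users_at(elapsed_s, start=1, end=200, duration_s=300.0):
    """Concurrent users after `elapsed_s` seconds of a linear ramp.

    Defaults mirror the rush settings above: 1 to 200 users over 5 minutes.
    The straight-line ramp is an assumption for illustration.
    """
    # Clamp so times outside the test window map to the endpoints.
    fraction = min(max(elapsed_s / duration_s, 0.0), 1.0)
    return start + (end - start) * fraction


print(users_at(0))    # start of the rush
print(users_at(150))  # halfway through
print(users_at(300))  # end of the rush
```

Under that assumption, the halfway point of the rush is already at roughly 100 simultaneous users.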

When you’re ready, press the big play button.

Here are the results I got:

What we can see here is that our application master is able to handle the load with no impact on response latency right up until the point of having to serve around 25 simultaneous users. At that point, as we increase the number of simultaneous users, the response times start to slow down. As we can see from the graph, the increase in response latency is linear with the increase in users, which is great. The server isn’t grinding to a halt, but it is slowing down in proportion to the amount of work you’re throwing at it.

Switch back to your New Relic tab, and navigate to Servers and then find the server that corresponds with your application master. Select that and you’ll be able to see an overview of the CPU and load.

Here’s what mine looked like:

Nothing unexpected here. A large spike in CPU usage that drops off after the experiment finishes. We can see the load spiked at 5.

By the end of the experiment, with 200 simultaneous users, responses are taking 10 seconds to complete. If this were a real site, most users would think the site was down and would probably leave.

Testing A Cluster

So that’s what a single instance looks like. But let’s bring the other application instance into the pool and see what the performance looks like when we’re sharing the workload between two instances.

SSH back into your application instance (not the master) and run:

$ sudo /etc/init.d/nginx start

This starts the nginx webserver and brings the instance back into the HAProxy pool, meaning requests will be load balanced to it.

Go back to Blitz and do another rush, keeping all of the same settings.

Here’s what I got:

Okay, so what we’re seeing here is that the application cluster handled the increasing load without any degradation in performance up until somewhere just under the 50 concurrent users mark. This makes sense. Our previous test told us that our application master can handle about 25 concurrent users without performance degradation. Now we’re sharing that workload between two machines, and the number approximately doubles.
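The back-of-the-envelope reasoning here can be written down as a tiny sketch. It assumes capacity scales in proportion to the number of instances, which our two data points (about 25 users on one instance, just under 50 on two) suggest but don’t prove. Both function names are hypothetical.

```python
import math


def cluster_capacity(per_instance_users, instances):
    """Estimated concurrent users before degradation, assuming linear scaling."""
    return per_instance_users * instances


def instances_needed(target_users, per_instance_users):
    """Smallest instance count whose estimated capacity covers the target."""
    return math.ceil(target_users / per_instance_users)


print(cluster_capacity(25, 2))    # our two-instance cluster
print(instances_needed(200, 25))  # uncached instances for the full rush
```

Treat the second number as an extrapolation only. We never tested more than two instances, and other bottlenecks could show up well before that point.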

Again, the relationship between load and performance is linear. For our target range of concurrent users, we actually want performance to be constant, no matter what the load is. But if we’re pushing our app performance out of bounds, a linear performance degradation is better than a non-linear one.

Here’s the CPU for the application master:

Again, nothing surprising here. We’re maxing out the CPU again. And the graph looked about the same for the other application instance. (Note that in both cases, the database instance hardly breaks a sweat. Most of the work being done here is in assembling the page in PHP.)

So what does this mean for our blog? Well, in its current state, we can handle about 50 concurrent users before performance starts to suffer. That’s quite a lot of users. Way more than most blogs will need on a day-to-day basis. But we might still struggle if a post gets popular on social media or in the news.

Cache All the Things

So, if all that CPU time is being spent inside PHP assembling the page, can we speed that up? Perhaps eliminate it entirely? Yup, we sure can.

W3 Total Cache is a WordPress plugin that does just this. The plugin itself has quite a lot of features, but we’ll only be making use of a few of them for this experiment. We’re interested in the functionality that uses APC to store the results of expensive operations inside PHP.

Install the plugin using one of the “freeze and deploy” methods outlined previously. Once you’ve checked in the changed files and redeployed, open up your blog admin. Find the plugin, activate it, and then go to the settings page.

Turn on the Page Cache and select APC. What this does is store the results of the computationally expensive page generation. When you visit a page that has been cached, it is pulled directly out of RAM. If you wanted to get fancy, you could configure W3 Total Cache to use memcached here. You’d need to spin up a memcached instance in your environment, but doing so would allow all of your application instances to share a cache. For our blog, this isn’t very important, so we’re going to skip doing that.
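Conceptually, a page cache works something like the sketch below. This is not W3 Total Cache’s code, and it’s Python rather than PHP. The dictionary stands in for APC’s in-memory store, and `render_page` is a made-up placeholder for the expensive work WordPress does to assemble a page.

```python
class PageCache:
    """Keep rendered pages in memory, keyed by URL path."""

    def __init__(self, render):
        self._render = render  # the expensive page builder
        self._store = {}       # stand-in for APC's in-memory store
        self.misses = 0        # how many times we had to render

    def get(self, path):
        if path not in self._store:
            # Cache miss: do the expensive work once and remember the result.
            self.misses += 1
            self._store[path] = self._render(path)
        return self._store[path]


def render_page(path):
    """Hypothetical placeholder for WordPress assembling a page in PHP."""
    return "<html><body>Post at {}</body></html>".format(path)


cache = PageCache(render_page)
first = cache.get("/hello-world/")   # rendered, then stored
second = cache.get("/hello-world/")  # served straight from memory
print(cache.misses)
```

Because the store lives inside one process on one machine, each application instance builds its own copy of the cache. That’s the limitation a shared memcached instance would remove.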

Turn on the Object Cache and select APC. This is less important than the page cache, and really only speeds up responses when PHP computation is needed. But we may as well turn it on. The more caching, the merrier!

Leave the rest of these config settings for now. Find and click any of the Save All Settings buttons. If you get some notices that pop up, it’s probably safe to dismiss them. I got one about file permissions which wasn’t important.

Go back to Blitz and do another rush, keeping all of the same settings.

Here’s what I got:

Impressive, huh?

Let’s take a look at what’s going on here. The first thing to notice is the scale on the y-axis. Our response time range has gone from 1-10 seconds to 10-70ms. No response took more than 1/10th of a second. In fact, we can see that at the start of the experiment, the response times are the highest, but as the caching starts to kick in (both at the W3 Total Cache level and the operating system level) response times actually drop quite considerably, eventually levelling out to 20ms. That’s blazingly fast.

Here’s the CPU data from New Relic:

Our application master hardly broke a sweat. It peaks at a load of 0.46.

So it’s safe to say that in this setup, our blog can handle up to 200 concurrent users with absolutely no performance degradation whatsoever. What does that mean in terms of daily hits? Well, our experiment averaged 98 hits per second, which equates to about 8,500,000 hits per day. And we could push our app further, by the looks of things. But how much further?
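The daily figure is just the average rate multiplied by the number of seconds in a day. Here’s the arithmetic as a quick sketch, assuming the rate stays constant around the clock, which real traffic won’t. `hits_per_day` is a name made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def hits_per_day(hits_per_second):
    """Scale a sustained per-second rate up to a full day."""
    return hits_per_second * SECONDS_PER_DAY


print(hits_per_day(98))  # 8467200, which rounds to about 8,500,000
```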

To The Limit

Let’s crank up the dial on our Blitz rush and see if we can cause some performance degradation, to see where our limits are.

This time, set the end users target at 1,000.

This is what I saw:

Okay, so this is a familiar looking graph. It seems like we found our application’s performance limit. Somewhere around the 400 concurrent users mark, our app starts to slow down a little. And when I say a little, I mean a little. Whereas in our first experiment we saw response times climb to 10 seconds, here we’re seeing a max response time of 1.5 seconds. Which is slow, but probably not slow enough to turn away all your users. And this is under the load of 1,000 concurrent users. Not bad!

Here’s the CPU graph:

What’s interesting here is that CPU is being maxed out again, but we’re still seeing very low response times. That’s because with this config, all that CPU is being spent fetching items from the cache. There’s still a lot of work to do with so many concurrent users, but each individual request is a lot easier (from a computational perspective) to respond to.

Conclusion

Blitz is an easy tool for load-testing your apps in the cloud. We tested a single application instance and then tested an application cluster. This gave us an understanding of our application’s performance range, which is important when you’re thinking about how to prepare your site for expected amounts of traffic. By installing a plugin that caches expensive operations for us, we were able to dramatically increase our performance range, going from a max of 50 concurrent users to a max of 400 concurrent users.

What about CloudFlare? In the previous post we installed CloudFlare to help improve our performance. But we didn’t include it here for a reason. You can’t load-test CloudFlare’s proxy servers, because doing so looks like a DoS attack and will trip up the CloudFlare security safeguards! Which is great. That’s exactly what we want for our production site. And as CloudFlare point out themselves, load-testing is useful for figuring out your app’s real performance range. CloudFlare then increases performance on top of that, with its caching and its global CDN.

P.S. What do you think? Have you had experience scaling out WordPress? Does this method make sense to you? Throw us a comment below.

About Noah Slater

Noah Slater is a Briton in Berlin who’s been involved with open source since 1999. They’ve contributed to Debian, GNU, and the Free Software Foundation. They currently serve as a member of the Apache Software Foundation. Their principal project is Apache CouchDB, the document database that kicked off the NoSQL movement. They also help out in the Apache Incubator, where they mentor new projects in the ways of community and open source.