Share Nothing, Scale Everything
In the previous post in this series, we explained how the shared-nothing architecture places additional constraints on cloud app developers. We also explained how embracing these constraints enables apps to have high scalability and high availability.
In this post, we explain how to adapt an app for the cloud by removing its dependency on the local file system, making it compatible with a shared-nothing architecture.
Replacing the File System
If you’re deploying an existing app to the cloud, whether it’s an internal app or an off-the-shelf app, you may find that there are some points of friction.
The most common problem we have found is that apps designed for traditional hosting environments expect the file system to behave like a database. That is, they write out a file, and then expect that file to still exist at some point in the future.
This is especially common in languages like PHP, where many of the off-the-shelf apps have existed since long before the cloud was popular. These apps generally assume that you are only using one server, and that the master copy of your site lives on that server.
Unfortunately, this model breaks down when you want to scale across multiple servers, or when you want to keep the master copy of your site in a revision control system like Git.
Let’s take one example: WordPress. The default WordPress configuration requires write access to the wp-content directory on the local file system. If you log into the WordPress administration console and make some changes, WordPress may update a file on the local file system. But if you have multiple servers, none of the other servers will have the updated configuration.
If, on the other hand, you re-deploy from Git, your configuration changes will be overwritten! You could try to use something like gitdocs to automatically propagate changes, but what happens when you have a merge conflict, or your local state becomes particularly byzantine? Your app could fail instantly, or it could suffer hidden corruption and fail later in a way that is difficult to debug.
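The multi-server problem described above can be illustrated with a small simulation. This is only a sketch: two temporary directories stand in for the local disks of two app servers, and the helper names (save_setting, load_setting, simulate_drift) and the theme.txt file are hypothetical, not anything WordPress itself uses.

```python
import tempfile
from pathlib import Path


def save_setting(server_root: Path, name: str, value: str) -> None:
    """Write a setting to this server's local disk, as a traditional app would."""
    config_dir = server_root / "wp-content"
    config_dir.mkdir(parents=True, exist_ok=True)
    (config_dir / name).write_text(value)


def load_setting(server_root: Path, name: str, default: str) -> str:
    """Read a setting back from this server's local disk."""
    path = server_root / "wp-content" / name
    return path.read_text() if path.exists() else default


def simulate_drift() -> tuple[str, str]:
    """Return the setting as seen by server A and by server B."""
    # Two temporary directories stand in for the disks of two app servers.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        server_a, server_b = Path(a), Path(b)
        # An admin request happens to land on server A and changes a setting.
        save_setting(server_a, "theme.txt", "dark")
        # Later requests may land on either server.
        return (
            load_setting(server_a, "theme.txt", "default"),
            load_setting(server_b, "theme.txt", "default"),
        )


if __name__ == "__main__":
    seen_by_a, seen_by_b = simulate_drift()
    print(f"server A sees: {seen_by_a}, server B sees: {seen_by_b}")
```

Server A returns the new value while server B still returns the default, so what a visitor sees depends on which server happens to handle the request.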
So what’s the solution?
There are two primary approaches you can take.
The Stateful Approach
Allow write access to the file system on the application server.
Positives
This requires no modification to your app code or environment configuration. You can take a stock WordPress release (for instance) and get it running in a matter of minutes.
Negatives
You are now dealing with a pet server.
You can only run a single server. Since the local file system can change, if you attempt to run your app across multiple servers, they will quickly get out of sync.
And since a server can disappear without notice, you run the risk of downtime. If your server runs into difficulty, you will need to rebuild it from your snapshots.
In addition, deploying new code will revert any local changes you have made.
The Stateless Approach
Disallow write access on the application servers.
There are a number of ways to achieve this, depending on the app you are deploying. It might be as simple as uploading files to S3 instead of writing them to the local file system, though this will also mean converting your code to use an S3 library for manipulating files. Or with something like WordPress, you might install one of the existing plugins that do this for you, which is something that we’ll cover in more detail in a later post.
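As a rough sketch of what converting your code to use an S3 library can look like, the example below hides storage behind a small interface so the app no longer writes to local disk directly. The names FileStore, S3FileStore, InMemoryFileStore, and store_upload are hypothetical. The S3 variant assumes an injected client that behaves like a boto3 S3 client (put_object and get_object), and the in-memory variant exists only for local testing.

```python
from typing import Protocol


class FileStore(Protocol):
    """The only storage operations the app is allowed to rely on."""

    def save(self, key: str, data: bytes) -> None: ...

    def load(self, key: str) -> bytes: ...


class S3FileStore:
    """Stores files in an S3 bucket through an injected client.

    Assumption: the client offers put_object(Bucket=, Key=, Body=) and
    get_object(Bucket=, Key=), as a boto3 S3 client does.
    """

    def __init__(self, client, bucket: str) -> None:
        self._client = client
        self._bucket = bucket

    def save(self, key: str, data: bytes) -> None:
        self._client.put_object(Bucket=self._bucket, Key=key, Body=data)

    def load(self, key: str) -> bytes:
        response = self._client.get_object(Bucket=self._bucket, Key=key)
        return response["Body"].read()


class InMemoryFileStore:
    """Stand-in for local development and tests; nothing touches the disk."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def save(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def load(self, key: str) -> bytes:
        return self._objects[key]


def store_upload(store: FileStore, filename: str, data: bytes) -> str:
    """Save an uploaded file and return the key it was stored under."""
    key = f"uploads/{filename}"
    store.save(key, data)
    return key


if __name__ == "__main__":
    store = InMemoryFileStore()
    key = store_upload(store, "logo.png", b"example-bytes")
    print(key, len(store.load(key)))
```

In production you might construct the store as S3FileStore(boto3.client("s3"), "example-bucket"), where the bucket name is a placeholder. Because every app server talks to the same bucket, it no longer matters which server handled the upload.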
Positives
You can treat your servers as cattle, and run as many or as few of them as you need. You can increase or decrease the number of servers at will. If any of your servers run into difficulties, you can replace them without any noticeable downtime for your end-users.
Negatives
Configuration changes that require write access to the file system cannot be made on your production servers. One solution is to deploy your app locally, make configuration changes there, and then commit any resulting changes to your source tree.
Deploying locally to make configuration changes before deploying to production may seem unusual, but doing so keeps all of your configuration changes in Git (where they can be seen, reverted, etc.) and allows you to cluster (and hence scale) your application.
An Alternative
There is another alternative: use a distributed file system. This way, you can have multiple app instances all sharing a single file system. Unfortunately, in practice this is very complex. Because your file system is now distributed, you need to think about the CAP theorem.
Consistency, availability, and partition tolerance. Pick two.
Sacrificing any of those is going to be hard when you’re dealing with a file system and an application that expects that file system to always be there and always be consistent. You’ll also have to start thinking about ACID and file system level locking.
In short, it’s a mess.
Distributed file systems with a POSIX interface cannot be scaled up in a decentralised way without sacrificing either function or performance. (Or more likely: both.) If you do attempt to scale a POSIX file system like this, you will run into problems. Avoid this approach, or be prepared to work around the oddities of a distributed file system in your application.
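To make the consistency trade-off concrete, here is a deliberately simplified toy model. It is not how any real distributed file system is implemented, and the class and method names are hypothetical. Writes land on one replica and are copied to the others only when replicate() runs, which is roughly what favouring availability over consistency looks like.

```python
from typing import Optional


class Replica:
    """One node's copy of the file system: a mapping of path to content."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: dict[str, str] = {}


class ReplicatedFileSystem:
    """Toy model: a write hits one replica now and the others later."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas
        self.pending: list[tuple[str, str]] = []

    def write(self, replica_index: int, path: str, content: str) -> None:
        # The write is acknowledged as soon as one replica has it.
        self.replicas[replica_index].files[path] = content
        self.pending.append((path, content))

    def read(self, replica_index: int, path: str) -> Optional[str]:
        # Reads are served from whichever replica the app is attached to.
        return self.replicas[replica_index].files.get(path)

    def replicate(self) -> None:
        # Copy every pending write to all replicas, in the order written.
        for path, content in self.pending:
            for replica in self.replicas:
                replica.files[path] = content
        self.pending.clear()


if __name__ == "__main__":
    fs = ReplicatedFileSystem([Replica("a"), Replica("b")])
    fs.write(0, "config.php", "v2")
    print("before replication:", fs.read(1, "config.php"))
    fs.replicate()
    print("after replication:", fs.read(1, "config.php"))
```

Until replication catches up, an app reading from the second replica does not see the file at all. An application that assumes a write is immediately visible everywhere will misbehave in exactly that window.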
Conclusion
The primary obstacle when adapting an app to be compatible with the cloud is handling the file system. We covered three approaches to dealing with this reliance, but only one of them, the stateless model, is both easy to set up and works well with cloud architecture.
Sometimes you have the luxury of writing an app from scratch. Other times, you have to make do with what you’ve got. Luckily, there is a standard way to deploy stateful applications into a stateless environment. We’ll cover this in the next, and final, post in this series, and wrap up everything we’ve covered so far.
Have you had difficulty dealing with the file system for a cloud app? What approach did you take? Do you have any war stories to share about trying to use distributed file systems? Share your thoughts in the comments below.