Functions, Threads, and Processes. What’s Next? Cows

cows-fork

In the previous post in this series, we discovered that setting up a server before you even boot it is not only possible, but gives enormous productivity rewards in a cloud architecture. In this post, we look at the four essential principles of designing your app for the cloud.

Introduction

Using a cloud architecture places extra constraints on app developers. These constraints cannot be worked around, and therefore app developers must get to grips with them before they start designing their apps.

Because the constraints are often unusual and unintuitive, it is worth also understanding the context behind the constraints. Why are they necessary? How do they improve the reliability of applications? Are there no alternatives? With these questions answered, an app developer can plan more confidently, knowing that they are taking the optimum path.

All of the core constraints emerge from one fundamental principle: cloud computing must account for random failures. To cope with this you will need multiple servers, and when you have multiple servers you also need a strategy for setting up and maintaining these servers with as little effort and as few subsequent interventions as possible. Such strategies are said to belong to the cattle model, because they contrast with the usual non-cloud method of dealing with servers as something like pets.

From an app developer’s point of view, having multiple servers means thinking about an app as being distributed much like in a multi-threaded or multi-process environment. But this is the next step up from threads or processes. This is a multi-server environment, which is usually called a cluster. As if that were not enough of a conceptual leap to have to take, these multiple servers must be identical. This means that instead of having lots of different pieces of the system all distributed and talking to one another through something like a thread-safe or IPC channel, we use something more akin to the fork() model. There is only one app, but there are many clones of it, and each clone must run on a server which can be terminated at any point.

The majority of the ramifications of this strange new conceptual world for the app developer lie in storage. Because the app is effectively forked across servers, any storage which is local to a particular server cannot be shared between apps. This is called a shared-nothing design, because nothing is shared. If you had an app forked on a single server, for example, you could use a file system lock to make sure that any fork is only editing one particular file at a time, using the file system as a kind of IPC method for persistent storage. Or, if more complex data management were required, a database with atomic commits could be used. In the cloud model, nothing on the present machine can be used in this way. The hierarchy is something like the following:

Distribution Environment Shared Access Methods
(Includes any that come below the present method)
None Program Any
Threading Thread Thread safe memory
Multiprocess Process Any IPC, e.g. shared pipes, file system, or database
Cluster Server instance Protocols, e.g. HTTP, SSH, FTP, or database protocols

Though this means that certain IPC methods like shared pipes are ruled out, in general this means that the primary shared access method which is ruled out for a cloud app running on a cluster of server instances is the file system. In any cloud app, you can no longer use the file system for any global communication or data storage. This is a significant change which must be taken into account when writing any cloud app. Any use of the file system, including databases on the file system, must be replaced by remote counterparts. They must also be truly remote: for example, it is not enough to run MySQL on the same server instance as the app and communicate with that. MySQL must be run on a separate instance, outside of the server cluster entirely.

Principles

Every File System is Ephemeral

Servers can go away at any moment. And when a server is misbehaving, you should treat it like cattle, and replace it with a new one. This is the quickest and easiest way to recover from failure, but only works when each file system is identical.

The most important part of your application is the code and the environment configuration. The individual servers running your code are a commodity that can be created or destroyed at will. This also makes scaling your app trivial. Just add more servers to the cluster.

Fortunately, Engine Yard provides an advanced set of tools for managing server clusters that fully embraces the cattle model of system administration.

For example, if your app master runs into difficulties, we will automatically replace it. A slave will be promoted to master, and your misbehaving master will be removed from the cluster. A new app server is then booted to replace the promoted slave. From a user’s perspective, your app suffered no downtime. However, any data that was written to the file system on your app master would become unavailable to your app and can only be recovered manually.

There Are Multiple File Systems

A server cluster allows you to scale out your app by adding servers to the cluster. Conversely, you can scale in by removing servers from the cluster. This is known as horizontal scaling, and each new server means a new file system.

Every File System is Isolated

Each server is given a writable file system, but this file system is not shared with any other servers. So if you upload a file to one server, you cannot access it from another server unless you manually transfer the file between servers.

Because of this, your app should only write files that do not need to be shared, e.g. temporary files and caches. Files that need to be shared between servers should be written to an external data store. For file-backed sessions, this might mean moving to database-backed sessions. And for user-uploaded content, this might mean using something like Amazon S3.

Every File System is (or Should Be) Identical

When a new server is added to the cluster, it is created from a default server image. We then deploy your code and configure the server. This process is repeated every time a server is added, meaning that every file system should start out (more or less) identical.

Any modifications made locally to the file system will not show up when you boot a new server. For this reason, it is important that you configure your servers with Chef recipes and not manually. When Chef recipes are used, configuration is repeatable and automated.

Eliminating State

One of the important characteristics of an ex-ante server configuration is that we make the state of a server predictable. Instead of having an arbitrary server state at any given time, we make it so the state of the server is predetermined, and, barring hardware and other transient failures, replicable across several server instances.

The elimination of state is not limited to server configuration. We’ve seen how the more that you distribute a system, the further “out” you have to base any state. But eliminating the state entirely is another way to achieve the desired result with less overhead. This is already possible in certain kinds of application programming. Languages such as Haskell and Clojure, for example, are designed to eliminate mutable state, and make it easy for the programmer to declaratively specify state-like environments using techniques such as monads and STM.

Conclusion

Learning how to write apps for the cloud is a conceptual leap, shifting us up the hierarchy that started with threads and led to server clusters. By understanding that apps on the cloud are forked clones of one another that must not rely on the local environment, we realised that any persistent state that we use in an app must be moved outside of the cluster or removed entirely. In return, we gain the advantages of high scalability and high availability.

In the next post in this series, we’ll be explaining in more specific detail how to adapt existing apps for cloud architectures.

How has learning about the principles of cloud app development been for you? Are you excited about building distributed shared-nothing apps, or are the conceptual leaps involved presenting a challenge to you? We’d love to hear from you, so please do leave a comment.

About Noah Slater

Noah Slater is a Briton in Berlin who’s been involved with open source since 1999. They’ve contributed to Debian, GNU, and the Free Software Foundation. They currently serve as a member of the Apache Software Foundation. Their principal project is Apache CouchDB, the document database that kicked off the NoSQL movement. They also help out in the Apache Incubator, where they mentor new projects in the ways of community and open source.