Zero Downtime Deploys with Migrations
I gave a talk at RubyConf that went over how we automate our development and deploy process here at Engine Yard. The part of the talk that got the most questions and spawned the most discussion was the way we deploy with zero downtime, even with migrations. Here is a bit more detail on the process.
A Caveat
At Engine Yard, we almost exclusively use DataMapper for our ORM. This means that we specify all of the properties for our models in the model class; there is no database introspection to generate accessors for the properties. I have not tried this technique with ActiveRecord, but the concepts should be the same.
Example: Adding a column
This is the simplest example, but also the most common migration that we do.
First, before merging any part of the feature to master that needs a new column, create a migration on master that adds the column.
migration 1, :"add published_at to posts" do
  up do
    modify_table :posts do
      add_column :published_at, Time
    end
  end
end
Then we simply test the migration, and ship it all the way to production. Now the column is created in the database—ready for us to use when we merge the code that requires the column.
This also allows us to always run migrations last in the deploy, after we have restarted the app servers. This means that new code is running on the app servers earlier in the deploy process.
Renaming a Column
Adding a column is pretty straightforward but there are some migrations that are much trickier to do with zero downtime.
Let's say you have a Post model, and you want to rename the body field to content.
We have a solution for this as well.
Step 1: Add the new column
Same as above, write and deploy the migration to add the new column to the database.
migration 1, :"add content to posts" do
  up do
    modify_table :posts do
      add_column :content, Text
    end
  end
end
Step 2: Make the code aware of both columns
We usually do this by writing the code as if we were solely using the new column, then having a few accessors and hooks on the model to sync the data over.
class Post
  include DataMapper::Resource

  property :id,      Serial
  property :body,    Text  # legacy column, still written by old code paths
  property :content, Text  # new column added in Step 1

  # Reading content first pulls over any value the old code wrote to body.
  def content
    update_content
    attribute_get(:content)
  end

  # Writing content keeps the legacy column in step with it.
  def content=(new_content)
    self.body = new_content
    attribute_set(:content, new_content)
  end

  # Old code still writes body directly, so sync it over on every save.
  before :save, :update_content

  def update_content
    self.content = body
  end
end
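To see the invariant these accessors maintain, here is a framework-free stand-in (plain Ruby; a Hash plays the role of DataMapper's attribute storage, and the class name is illustrative). Reads through the new accessor pick up legacy writes, and writes through it keep the legacy column in sync:

```ruby
# Plain-Ruby simulation of the dual-write Post above; @attributes
# stands in for DataMapper's attribute_get/attribute_set storage.
class SyncedPost
  def initialize
    @attributes = { body: nil, content: nil }
  end

  def body
    @attributes[:body]
  end

  def body=(value)
    @attributes[:body] = value
  end

  # Reading content first syncs it from the legacy column.
  def content
    update_content
    @attributes[:content]
  end

  # Writing content updates both columns.
  def content=(new_content)
    self.body = new_content
    @attributes[:content] = new_content
  end

  def update_content
    self.content = body
  end
end

post = SyncedPost.new
post.body = "written by old code"   # legacy write path
post.content                        # => "written by old code"
post.content = "written by new code"
post.body                           # => "written by new code"
```

Either code path can run during the deploy window and the two columns never drift apart, which is what makes the later backfill safe to run at any time.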
Step 3: Migrate the data
Now that your code is handling the migration of data as it runs, we need to migrate all existing rows to the new schema.
We can either use a migration for this or just write and run a rake task to copy all data from the body column to the content column.
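As a sketch of what that task does (in-memory hashes standing in for rows of the posts table; a real rake task would iterate over Post records, ideally in batches), the backfill is a one-way copy that skips rows the syncing code has already handled:

```ruby
# Hypothetical backfill sketch: each hash stands in for a row of the
# posts table. Rows whose content is still nil get the legacy value
# copied over; rows the dual-write accessors already touched are left alone.
rows = [
  { body: "first post",  content: nil },            # written by old code
  { body: "second post", content: "second post" },  # already synced
]

rows.each do |row|
  row[:content] = row[:body] if row[:content].nil?
end

rows.all? { |row| row[:content] == row[:body] }  # => true
```

Because the accessors from Step 2 keep new writes in sync, this copy can run while the app is serving traffic; any row it misses mid-deploy is corrected on the row's next save.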
Once this step is complete, the content column should be canonical and remain that way due to the way the code is written.
Step 4: Remove the temporary syncing code
We can now remove all references to the body column, including the code that syncs the data between the two columns.
At this point, we are almost done. The code no longer references the old column; it remains only in the database.
Step 5: Drop the column
Finally, all we need to do is write a migration to drop the column from the database and deploy it to production. The code does not reference the column, so we can deploy it at our leisure.
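For example, assuming dm-migrations supports drop_column inside modify_table the same way it supports add_column (the migration number here is illustrative), the final migration could look something like:

```ruby
migration 2, :"drop body from posts" do
  up do
    modify_table :posts do
      drop_column :body
    end
  end
end
```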
YMMV
This technique has worked very well for us at Engine Yard. It can make an individual change more involved, but overall it leads to less friction in the deploy process. When you know that no step of your work requires downtime, you can simply write code and ship it to production when you're ready, rather than being blocked waiting for a maintenance window.
Adding a column to the database is by far the most common kind of migration that we do, and using this technique still keeps it very simple. Overall it has not added a significant amount of overhead to getting things done, and the lack of friction is a huge win.
Doesn’t always work
All this said, there are a small number of cases where you really cannot use this technique: migrations that take a long time while locking tables, or that cause heavy load on the database server. In the past year, these have been the only migrations we have had to take downtime for. The most common example was adding an index to a very large table. And if your database is heavily loaded, many ALTER TABLE statements may take an unacceptably long time.