Offsite Backups with fog

Note: Today’s post is from Phil Ripperger, an Engine Yard Support alumnus and active fog user.

Engine Yard takes database backups seriously. By default, Engine Yard AppCloud comes with two forms of MySQL backup: recurring AWS volume snapshots of your /db volume, and a MySQL dump of your database that is stored as a file on S3. Combined, these two backup methods have saved many folks from numerous ‘close calls’.

Despite the default backup measures, quite a few people inquire about how to do offsite backups on AppCloud. Engine Yard typically advises customers to manually download a backup of their database via the dashboard - look under the ‘More Options’ tab for the ‘Database Backups’ link.

Aside from MySQL data, consider the scenario where you store user-uploaded files on your /data volume. These files are also backed up via AWS volume snapshots, but if they are vital to your business you may want to store a copy offsite as well.

And what if you want to automate those offsite backups? Amazon’s services are usually stable and reliable, but what happens if S3 is down and you cannot access your data or your backups?

The best answer to the above questions is to store a copy of your data on a service other than Amazon. Rackspace’s Cloud Files service immediately comes to mind. Built as a competitor to S3, Cloud Files serves the same purpose - a cheap place to store large amounts of data. And because Rackspace maintains its own infrastructure, separate from Amazon’s, it is a good candidate for offsite backups.

How do we automate storing files on the Cloud Files service? A quick check of the Cloud Files API shows that it only supports a web service interface - so rsync is out of the picture.

Enter fog, an excellent Ruby cloud services library by Engine Yard’s own Wesley Beary. Engine Yard makes heavy use of fog behind the scenes. It’s a simple but powerful library, and it’s perfect for an offsite backup scenario.
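
fog is distributed as a Ruby gem, so if you don’t already have it, installation is a one-liner:

gem install fog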

First, we’re going to take a quick tour of fog with Cloud Files. Before we start, make sure you have an API key and username from Rackspace for your Cloud Files account. Once you have them, you’ll want to place the values in a file called .fog in your home directory -

#######################################################
# Fog Credentials File
#
# Key-value pairs should look like:
# :aws_access_key_id:                 022QF06E7MXBSAMPLE
:console:
  :aws_access_key_id:
  :aws_secret_access_key:
  :bluebox_api_key:
  :bluebox_customer_id:
  :brightbox_client_id:
  :brightbox_secret:
  :go_grid_api_key:
  :go_grid_shared_secret:
  :google_storage_access_key_id:
  :google_storage_secret_access_key:
  :linode_api_key:
  :local_root:
  :new_servers_password:
  :new_servers_username:
  :public_key_path:
  :private_key_path:
  :rackspace_api_key:                 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  :rackspace_username:                xxxxxxxx
  :slicehost_password:
  :terremark_username:
  :terremark_password:
  :zerigo_email:
  :zerigo_token:
#
# End of Fog Credentials File
#######################################################

Next, using the web interface provided by Rackspace, create a folder in your Cloud Files account called ‘test’ - we’ll use this going forward.
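
If you’d rather skip the web interface, the same container can be created from Ruby. Here’s a minimal sketch - it assumes the Rackspace credentials from your .fog file, with the placeholder values swapped for your own:

#!/usr/bin/env ruby

require 'rubygems'
require 'fog'

# connect to Cloud Files with your Rackspace credentials
storage = Fog::Storage.new(:provider => 'Rackspace', :rackspace_api_key => 'xxxxxxxxxxxxx', :rackspace_username => 'xxxxxx')

# create the 'test' container used throughout this post
storage.directories.create(:key => 'test')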

Once your .fog file is ready and you have a ‘test’ folder created, you can load the fog console, which automatically connects to any provider you’ve supplied credentials for -

➔ fog console
  Welcome to fog interactive!
  :console provides AWS, Bluebox, Brightbox, GoGrid, Google, Linode, Local, NewServers, Rackspace, Slicehost and Zerigo
>> Rackspace[:storage].collections
[:directories, :files]
>> Rackspace[:storage].directories.table
  +------------+-------+----------------+
  | bytes      | count | key            |
  +------------+-------+----------------+
  | 141114     | 2     | test           |
  +------------+-------+----------------+
nil
>> Rackspace[:storage].directories.get("test")
  <Fog::Rackspace::Storage::Directory
    key="test",
    bytes="0",
    count="0"
  >
>> Rackspace[:storage].directories.get("test").files.table
  +--+
  |  |
  +--+
nil
>>

The above illustrates some basic commands in the fog console. First I launch the console, which connects using the credentials in my .fog file; then I list the directories in my Rackspace storage account. Next I ask for a list of files within the ‘test’ directory, which is currently empty.

So how do I upload files to the remote directory? It’s actually pretty easy. Say I have a directory of .jpg images and want to copy all of them to the ‘test’ directory on Cloud Files. Here’s a simple example of how to do so -

>> Dir.glob('*.jpg').each {|file| Rackspace[:storage].directories.get("test").files.create(:key => file, :body => File.open(file))}
["test.jpg", "test2.jpg"]
>> Rackspace[:storage].directories.get("test").files.table
  +----------------+--------------+----------------------------------+-----------+--------------------------------+
  | content_length | content_type | etag                             | key       | last_modified                  |
  +----------------+--------------+----------------------------------+-----------+--------------------------------+
  | 72156          | image/jpeg   | eb668566602b148a77bfc4fdaf08c84b | test.jpg  | Wed Feb 16 23:21:53 -0800 2011 |
  +----------------+--------------+----------------------------------+-----------+--------------------------------+
  | 68958          | image/jpeg   | 63f8f53019f2dc03e0d19da374ee0d9d | test2.jpg | Wed Feb 16 23:21:54 -0800 2011 |
  +----------------+--------------+----------------------------------+-----------+--------------------------------+
nil
>>
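
Uploads are only half the story - a backup is only useful if you can restore from it. Pulling a file back down follows the same pattern; here’s a minimal sketch, assuming the test.jpg uploaded above:

#!/usr/bin/env ruby

require 'rubygems'
require 'fog'

storage = Fog::Storage.new(:provider => 'Rackspace', :rackspace_api_key => 'xxxxxxxxxxxxx', :rackspace_username => 'xxxxxx')

# fetch the remote copy of test.jpg and write it to the local disk
remote_file = storage.directories.get('test').files.get('test.jpg')
File.open('test.jpg', 'wb') {|f| f.write(remote_file.body)}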

Putting everything together, here’s a simple Ruby script that will back up all .jpg files in a directory to your Cloud Files account -

#!/usr/bin/env ruby

require 'rubygems'
require 'fog'

storage = Fog::Storage.new(:provider => 'Rackspace', :rackspace_api_key => 'xxxxxxxxxxxxx', :rackspace_username => 'xxxxxx')

# get the remote directory
directory = storage.directories.get('test')

# clear remote directory
directory.files.each {|file| file.destroy}

# send files to Rackspace
Dir.glob('*.jpg').each {|file| directory.files.create(:key => file, :body => File.open(file))}

# list remote directory contents
directory.files.table

It’s easy to see how powerful fog becomes when added to a Ruby script. The above script could easily be called via cron, and suddenly you have dependable offsite backups for your data. With slight modifications you could copy only new files instead of everything, and adding your own ‘mysqldump’ command gives you offsite backups for your database as well - a sketch of both tweaks follows below. Want to store your data in two offsite locations? Just add another provider, such as Google Storage.
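
Here’s one way those modifications might look. This is a sketch, not a drop-in script - the mysqldump flags and the ‘mydb’ database name are placeholders for your own setup, and it relies on Cloud Files etags being MD5 checksums of the file contents, so unchanged files can be skipped:

#!/usr/bin/env ruby

require 'rubygems'
require 'digest/md5'
require 'fog'

storage = Fog::Storage.new(:provider => 'Rackspace', :rackspace_api_key => 'xxxxxxxxxxxxx', :rackspace_username => 'xxxxxx')
directory = storage.directories.get('test')

# dump the database first; adjust the credentials and database name ('mydb') for your setup
system("mysqldump -u root mydb > mydb.sql")

# upload .jpg files and the dump, skipping any file whose local MD5 already matches the remote etag
Dir.glob('*.{jpg,sql}').each do |file|
  remote = directory.files.get(file)
  next if remote && remote.etag == Digest::MD5.file(file).hexdigest
  directory.files.create(:key => file, :body => File.open(file))
end

Mirroring to a second offsite location is just another Fog::Storage.new connection - for example :provider => 'Google' with the google_storage_* keys from your .fog file - plus a second create call inside the loop.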

Of course, each application and environment has different needs. You know your application’s data best, so customize your fog scripts accordingly.