Faster Database Transfers

Sometimes it becomes necessary to move your database from one environment to another. Common reasons for this include:

  • Updating a Testing or Development environment with Production data
  • Migrating from one stack version to another (e.g. Stable-v4 -> Stable-v5)
  • Upgrading to a newer major version of your database (e.g. MySQL 5.5 -> MySQL 5.6 or Postgres 9.2 -> Postgres 9.4)
  • Upgrading storage to Encrypted EBS
  • Upgrading the filesystem to EXT4

The traditional method involves three database related steps:

  1. Creating an on-demand backup in the source environment.
  2. Downloading the most recent backup in the source environment.
  3. Transferring and Restoring the backup from the source to the target environment.

This results in some time and processing inefficiency since using eybackup uploads the backup to s3 and then it needs to be downloaded as a separate operation. For a small database this works just fine; unfortunately, for large databases this extra step could add hours to your migration plan. A better way to approach this would be to use the database-native backup tools to backup the source database to a file, and then transfer that file to the destination for restore.

Before you dive in on creating a backup you’ll need to make sure your source environment can connect to your target database host. This can be done with an SSH agent with key forwarding, or by creating a special use SSH key.

SSH Agent Forwarding

Warning: Agent forwarding does carry its own risks and should only be used for trusted connections. If your organization has security policies that prohibit its use you’ll want to create a special use SSH key instead.

Using an ssh agent with key forwarding is typically the easiest approach to allowing connections between environments. Most end-users already have a key agent setup so they don’t need to enter their passphrase every time they connect to a remote server. For *nix and OSX the agent will usually be ssh-agent, for Windows this will depend on your client (Pageant is common for PuTTY). Adding keys to your agent will be agent specific so we recommend following the documentation of your specific tool for that step.

Once you’ve added your keys to your agent you’ll want to test that you have everything configured correctly. To enable agent key forwarding you just need to include the -A option to the ssh command. This can also be accomplished through the config file for ssh; however, since this makes network admins cry, I won’t discuss it further.

  • Source host: ec2-123-45-678-901.compute-1.amazonaws.com
  • Target host: ec2-987-65-432-109.compute-1.amazonaws.com
# ssh -A [email protected]
deploy@ec2-123-45-678-901 ~ > ssh [email protected]
... # you will probably encounter several lines of output and a host key check prompt
deploy@ec2-987-65-432-109 ~ > exit
deploy@ec2-123-45-678-901 ~ > exit
#

Special Use SSH Key

Generating a special SSH key specifically for your data migration is a good option if you can’t use a forwarding agent. To generate a key you’ll log into the source environment and type something like:

deploy@ec2-123-45-678-901 ~ > ssh-keygen -t rsa -f ~/.ssh/ey_dbmigrate_id_rsa -C 'Key for migrating databases between environments'
cat ~/.ssh/ey_dbmigrate_id_rsa.pub

Optionally you can include -N '' as an option to ssh-keygen if you don’t need the key-pair to have a passphrase. If you do set a passphrase you’ll be prompted for this every time you connect unless you set up an ssh agent on your Engine Yard instance with ssh-add.

The public key is printed to the console by the commands above and will start with ssh-rsa and end with the comment you included in the -C option. Copy this and add this SSH key in the cloud dashboard. Once this key is added to your environment configuration you can attach it to the destination environment. After the key has been added you must run a dashboard “Apply” for that environment to install this key on your instances.

Download a copy of the private key at /home/deploy/.ssh/ey_dbmigrate_id_rsa to your local system for safe keeping; in the event you need to replace an instance in the source environment you will be able to re-upload the private key to restore access. If you would like to install it on all instances in the source environment we would recommend using a custom chef cookbook that secures this key with Encrypted Data Bags.

You should add this key to the default ssh search path so that it is considered automatically when you attempt to connect to a remote host through SSH:

cat <<EOT >> ~/.ssh/config
Host        *
 IdentityFile ~/.ssh/ey_dbmigrate_id_rsa

EOT

Finally, verify that you can connect from the source to the target environment.

  • Source host: ec2-123-45-678-901.compute-1.amazonaws.com
  • Target host: ec2-987-65-432-109.compute-1.amazonaws.com
# ssh [email protected]
deploy@ec2-123-45-678-901 ~ > ssh [email protected]
... # you will probably encounter several lines of output and a host key check prompt
deploy@ec2-987-65-432-109 ~ > exit
deploy@ec2-123-45-678-901 ~ > exit
#

Backup and Restore MySQL

Now we are ready to backup and transfer your MySQL database to your destination host. Connect to your source host and run the following commands; be sure to set the database and destination_host variables to match your database information:

# configure these variables for your needs
database='[dbname]'
destination_host='[public_hostname]'

# set a variable if we can use a transactional instead of locking backup
st_option=$(mysql -NBe "select case when count(*) < 1 then '--single-transaction' else '' end from information_schema.tables where table_schema='${database}' and engine='MyISAM' limit 1")

# Backup and compress your data writing it to a file directly on the destination host
mysqldump ${st_option} ${database} | gzip | ssh ${destination_host} "sudo su root -c 'cat - > /db/${database}_migrate.sql.gz'"

Note: The mysqldump could have piped a restore directly to the target database. This can be dangerous if the wrong host is specified, or could cost more time if a network interupt interferes with your connection. We do not recommend a direct restore.

Now that your data has been transferred you can connect to the destination environment and perform the restore:

database='[dbname]'
gunzip < /db/${database}_migrate.sql.gz | mysql ${database}

Backup and Restore PostgreSQL

Now we are ready to backup and transfer your PostgreSQL database to your destination host. Connect to your source host and run the following commands; be sure to set the database and destination_host variables to match your database information:

# configure these variables for your needs
database='[dbname]'
destination_host='[public_hostname]'

# Backup and compress your data writing it to a file directly on the destination host
pg_dump -Fc ${database} | ssh ${destination_host} "sudo su root -c 'cat - > /db/${database}_migrate.pgz'"

Note: The pg_dump could have piped a restore directly to the target database. This can be dangerous if the wrong host is specified, or could cost more time if a network interupt interferes with your connection. We do not recommend a direct restore.

Now that your data has been transferred you can connect to the destination environment and perform the restore.

If Restoring to an Engine Yard database

Postgres Extensions at Engine Yard are managed by chef cookbooks and cannot be restored from backup by the default application user. These produce harmless errors using pg_restore, but to avoid these we provide the reload script load_foreign_postgres_db.sh which reloads the database as the admin user and then updates the permissions as needed.

Warning: This script drops and recreates the database named ${database} and reassigns all ownership to the default application user. If there is data or permissions you don’t want to lose the alternate restore approach might be a better fit.

database='[dbname]'
sudo /engineyard/bin/load_foreign_postgres_db.sh /db/${database}_migrate.pgz ${database}

Alternate Approach

When using this approach to restore your database you may encounter errors about being unable to remove and create various extensions such as plpgsql. If you are using an Engine Yard database you should use Chef to install these extensions for you. If you are using a local database you’ll need to install these extensions as a privileged user. On modern versions of Postgres the plpgsql extension is created automatically when creating a database and can be skipped.

database='[dbname]'
pg_restore -d ${database} /db/${database}_migrate.pgz

The “Complete” Plan

So now that you have all the pieces, your maintenance cycle would look something like this:

In Advance

  1. Lower the TTL for your DNS records to 5 minutes or investigate an Elastic IP migration
  2. Make sure you can connect via SSH from the source to target environment

Migration

  1. Stop background processing (depending on your jobs this may take 10 or 15 minutes)
  2. Apply your application maintenance page
  3. Complete the backup and transfer operation relevant to your database
  4. Complete the restore operation in the target environment
  5. Migrate your DNS records or Elastic IP (this can be started while the restore is running)
  6. Remove the maintenance page in the target environment
  7. Perform any application validations in the target environment
  8. Restart background processing

Its a good idea to do a walkthrough of this process prior to completing this type of change for a Production environment. We’d recommend doing this with a Staging environment, or with Production leaving the application running and using a replica database as the source.

Additional Resources

  • For long running commands like backups and restores the screen utility can be valuable as it will allow you to disconnect and reconnect to your session as needed.
  • EYDR: a custom cookbook that helps with the setup and management of database replication for disaster recovery environments. Note this cannot be used for major database upgrades.

Assisted major database upgrades and stack migrations are included as part of the Engine Yard Premium Support plan and higher, or through Professional Services.

For additional details or if you have any questions please open a ticket with our support team. If you’d like to hear more about related topics or have other suggestions we’d love to hear from you.