All About Cloud Storage

With the rise of social apps like Facebook, Instagram, YouTube and more, managing user-generated content has become a growing challenge. Amazon S3, Google Storage, Rackspace Cloud Files, and similar services have sprung up to help application developers solve a common problem: scalable asset storage. And of course, they all utilize “the cloud”!

The Problem

Popular social, scientific and media-generating applications can produce massive amounts of data in a short amount of time. Here are just a few examples:

  • 72 hours of video are uploaded every minute by YouTube users. ([source](http://www.youtube.com/t/press_statistics))
  • 20 million photos are uploaded to Snapchat every day. ([source](http://techcrunch.com/2012/10/29/billion-snapchats/))
  • Pinterest has stored over 8 billion objects and 410 terabytes of data since its launch in 2009. ([source](http://aws.amazon.com/solutions/case-studies/))
  • Twitter generates roughly 12 terabytes of data per day. ([source](http://www.linuxnews.co/2012/06/psychsoftpc-offers-hadoop-cluster-solution/))
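To put those figures on a common scale, here is a quick back-of-the-envelope conversion. It only re-expresses the numbers quoted above and assumes each one is a steady rate around the clock, which real traffic is not; it adds no new data.

```javascript
// Back-of-the-envelope conversions of the figures quoted above.
// Assumption: each rate is constant around the clock, which real traffic is not.
const MINUTES_PER_DAY = 24 * 60;      // 1,440
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// YouTube: 72 hours of video uploaded per minute, expressed per day.
const youtubeHoursPerDay = 72 * MINUTES_PER_DAY;

// Snapchat: 20 million photos per day, expressed per second.
const snapchatPhotosPerSecond = 20e6 / SECONDS_PER_DAY;

// Twitter: 12 TB per day, expressed per 365-day year in decimal petabytes.
const twitterPetabytesPerYear = (12 * 365) / 1000;

console.log(`YouTube: ${youtubeHoursPerDay.toLocaleString("en-US")} hours of video per day`);
console.log(`Snapchat: ~${Math.round(snapchatPhotosPerSecond)} photos per second`);
console.log(`Twitter: ~${twitterPetabytesPerYear} PB per year`);
```

That works out to 103,680 hours of new video a day, roughly 231 photos a second, and about 4.38 PB of new data a year for Twitter alone.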

When your application begins to store massive amounts of user-generated data, your team will inevitably need to decide where to spend its engineering effort in relation to that data. If your application stores assets on your own hardware and infrastructure, your team will spend considerable time and money storing and serving those assets efficiently. Alternatively, your application can store assets with a cloud storage provider. Choosing this route lets application content scale almost without limit, while you pay only for the storage space and resources you actually use to store and serve the assets. In effect, cloud storage frees up your team's engineering time to focus on unique application features, rather than reinventing the file storage wheel once scalability becomes an issue. When should you consider using cloud-based storage for your application?

  • __When user-generated assets are part of your application.__ Does your application accept uploads from your users? Does it generate files server-side? If your application is going to accept uploads from users or generate content stored on the file system, you will likely want to consider a cloud storage provider sooner rather than later.
  • __When your application is likely to grow beyond a single server.__ If your application is small enough to run on a single dedicated server or web host, and you don't expect it to grow beyond that server, it doesn't make sense to use cloud storage for your assets. Simply store them on the local file system and call it a day. If, however, you expect enough growth that you will need more than one application server, you will immediately reap the benefits of storing your assets in the cloud. Because your assets are stored centrally with a cloud service, you can scale horizontally to as many front-end application servers as your heart desires without replicating file system assets to each new server, and the assets stay reachable at a single hostname no matter how many application servers you run.
  • __When it's more cost-effective for your team to focus on business-critical application features than to engineer a scalable file storage system.__ If you are short on time or money and you expect your application to grow, cloud storage is a safe bet. It gives you the flexibility to get up and running quickly, scale your storage with the growing needs of your application, and pay only for the storage and resources you use. This, in turn, lets you spend less effort on the hardware, operations and configuration needed to store assets and, more importantly, more time developing your business.
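The horizontal-scaling point can be sketched in a few lines: if every application server builds asset URLs from the same central host and object key, it never matters which server handled the upload or which one renders the page. The host `assets.example.com`, the key layout and the `assetUrl` helper below are all hypothetical; with Amazon S3 the host would typically be your bucket's endpoint or a CDN in front of it.

```javascript
// Hypothetical helper: build a public URL for an object key on a central asset host.
// Every application server runs the same function, so they all agree on the URL.
function assetUrl(host, key) {
  // Encode each path segment separately so the "/" separators in the key survive.
  const path = key.split("/").map(encodeURIComponent).join("/");
  // Strip any trailing slashes from the configured host before joining.
  return `${host.replace(/\/+$/, "")}/${path}`;
}

// Simulate two front-end servers rendering the same user's avatar.
const ASSET_HOST = "https://assets.example.com"; // assumption: shared by all servers
const key = "uploads/user/avatar/42/my photo.png"; // hypothetical object key

const fromServerA = assetUrl(ASSET_HOST, key);
// Server B's config happens to include a trailing slash; the helper normalizes it.
const fromServerB = assetUrl(ASSET_HOST + "/", key);

console.log(fromServerA); // https://assets.example.com/uploads/user/avatar/42/my%20photo.png
console.log(fromServerA === fromServerB); // true
```

No server needs a local copy of the file: the URL depends only on shared configuration and the key stored alongside the record.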

Integration & Access

Most of the leading cloud storage providers offer API access to their platforms, allowing developers to integrate web-scale asset storage and file access into their applications. Below we’ll look at some code examples that use an SDK or library to store assets on Amazon S3. Many of these libraries and SDKs make configuring the storage provider a breeze, so you can deploy file storage on many of the popular providers.
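What those libraries share is an abstraction layer: application code talks to a small storage interface, and the provider behind it becomes a configuration detail. The sketch below illustrates the idea with a hypothetical `put`/`get`/`del` interface and an in-memory backend standing in for S3, Cloud Files or Google Storage; it is not the API of any particular library.

```javascript
// Hypothetical storage interface: any backend exposing put/get/del can be swapped in.
// MemoryStore stands in for a real cloud provider so the example runs offline.
class MemoryStore {
  constructor() {
    this.objects = new Map(); // key -> Buffer
  }
  put(key, data) {
    this.objects.set(key, Buffer.from(data));
  }
  get(key) {
    if (!this.objects.has(key)) throw new Error(`NoSuchKey: ${key}`);
    return this.objects.get(key);
  }
  del(key) {
    this.objects.delete(key);
  }
}

// Application code depends only on the interface, never on a specific provider.
function saveAvatar(store, userId, bytes) {
  const key = `avatars/${userId}.png`; // hypothetical key layout
  store.put(key, bytes);
  return key;
}

// Swapping providers means constructing a different store object here.
const store = new MemoryStore();
const savedKey = saveAvatar(store, 42, "fake-png-bytes");
console.log(savedKey);                       // avatars/42.png
console.log(store.get(savedKey).toString()); // fake-png-bytes
store.del(savedKey);
console.log(store.objects.size);             // 0
```

Libraries such as Fog (used by CarrierWave below) apply the same principle at a much larger scale, covering authentication, regions and provider-specific options.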

Ruby & CarrierWave

The code examples below have been adapted from the CarrierWave GitHub repository.

  • Install CarrierWave: `gem install carrierwave`, or add `gem 'carrierwave'` to your Gemfile.
  • Install Fog: `gem install fog`, or add `gem "fog", "~> 1.3.1"` to your Gemfile.
  • In an initialization file, add the following:

        CarrierWave.configure do |config|
          config.fog_credentials = {
            :provider              => 'AWS',       # required
            :aws_access_key_id     => 'xxx',       # required
            :aws_secret_access_key => 'yyy',       # required
            :region                => 'eu-west-1'  # optional, defaults to 'us-east-1'
          }
          config.fog_directory  = 'name_of_directory'  # required; the name of your S3 bucket
          config.fog_public     = false                # files are not publicly readable
          config.fog_attributes = {'Cache-Control' => 'max-age=315576000'}
          config.asset_host     = 'https://assets.example.com'
        end

  • Create your uploader class:

        class AvatarUploader < CarrierWave::Uploader::Base
          storage :fog
        end

  • Use your uploader directly:

        uploader = AvatarUploader.new
        uploader.store!(my_file)
        uploader.retrieve_from_store!('my_file.png')

  • Use your uploader with ActiveRecord:
    • Add a string column to your database table in a migration, and require CarrierWave's ActiveRecord integration in your model file:

          # in a database migration file
          add_column :users, :avatar, :string

          # in your model file
          require 'carrierwave/orm/activerecord'

    • Mount your uploader on your model:

          class User < ActiveRecord::Base
            mount_uploader :avatar, AvatarUploader
          end

    • Work with your model and files:

          u = User.new
          u.avatar = params[:file]           # assign an uploaded file...
          u.avatar = File.open('somewhere')  # ...or a local file
          u.save!
          u.avatar.url           # => '/url/to/file.png'
          u.avatar.current_path  # => 'path/to/file.png'
          u.avatar.identifier    # => 'file.png'
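The `fog_attributes` line in the initializer sets a `Cache-Control` header on every stored file. Its `max-age` value is in seconds, and a quick calculation shows that `315576000` is exactly ten years of 365.25 days. The `cacheControlForYears` helper below is a hypothetical way to derive such a value rather than hard-coding it.

```javascript
// Hypothetical helper: build a Cache-Control value from a lifetime in years.
// max-age is expressed in seconds; a year is taken as 365.25 days here.
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60; // 31,557,600

function cacheControlForYears(years) {
  return `max-age=${Math.round(years * SECONDS_PER_YEAR)}`;
}

console.log(cacheControlForYears(10));     // max-age=315576000
console.log(315576000 / SECONDS_PER_YEAR); // 10
```

A lifetime that long works well only if a changed file is stored under a new name or key, since browsers and caches will otherwise keep serving the old copy.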

The CarrierWave documentation also includes examples for uploading to Amazon S3, Rackspace Cloud Files and Google Storage, and there are companion gems for using CarrierWave with other ORMs such as DataMapper, Mongoid and Sequel.

PHP & AWS SDK

Amazon provides a PHP SDK for working with AWS APIs and services. For this example, we’ll use the instructions and sample code straight from the SDK repository's README.

  • Copy the contents of [config-sample.inc.php](https://github.com/amazonwebservices/aws-sdk-for-php/raw/master/config-sample.inc.php) and add your credentials as instructed in the file.
  • Save the file as `~/.aws/sdk/config.inc.php`.
  • Make sure that `getenv('HOME')` points to your user directory. If it doesn't, you'll need to set it with `putenv('HOME=<your-user-directory>')`.
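<?php
// Load the AWS SDK for PHP (adjust the path to where you installed the SDK)
require_once 'sdk.class.php';

// Instantiate the AmazonS3 class (credentials come from ~/.aws/sdk/config.inc.php)
$s3 = new AmazonS3();

// Create a bucket to upload to. Bucket names must be lowercase (DNS-compatible),
// so the access key is lowercased before it is appended.
$bucket = 'your-bucket-name-' . strtolower($s3->key);
if (!$s3->if_bucket_exists($bucket))
{
	$response = $s3->create_bucket($bucket, AmazonS3::REGION_US_E1);
	if (!$response->isOK()) die('Could not create `' . $bucket . '`.');
}

// Download a public object to a local file.
$response = $s3->get_object('aws-sdk-for-php', 'some/path-to-file.ext', array(
	'fileDownload' => './local/path-to-file.ext'
));

// Upload a local file as an object.
$response = $s3->create_object($bucket, 'some/path-to-file.ext', array(
	'fileUpload' => './local/path-to-file.ext'
));
if (!$response->isOK()) die('Could not upload to `' . $bucket . '`.');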
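The sample lowercases the access key before appending it to the bucket name, presumably because S3 bucket names must be DNS-compatible. Below is a simplified sketch of a pre-flight check covering the main rules in Amazon's bucket naming documentation: 3 to 63 characters; only lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no adjacent periods; not formatted like an IP address. The `isValidBucketName` helper is hypothetical and not exhaustive, so treat the official rules as the final word.

```javascript
// Hypothetical, simplified (non-exhaustive) check of Amazon S3 bucket naming rules.
function isValidBucketName(name) {
  // Length must be between 3 and 63 characters.
  if (name.length < 3 || name.length > 63) return false;
  // Only lowercase letters, digits, dots and hyphens; start and end with a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]*[a-z0-9]$/.test(name)) return false;
  // No two adjacent periods.
  if (name.includes("..")) return false;
  // Must not look like an IPv4 address, e.g. 192.168.5.4.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) return false;
  return true;
}

console.log(isValidBucketName("your-bucket-name-akiaexample")); // true
console.log(isValidBucketName("YOUR-BUCKET-NAME"));             // false (uppercase)
console.log(isValidBucketName("my..bucket"));                   // false (adjacent periods)
console.log(isValidBucketName("192.168.5.4"));                  // false (IP address)
```

Running a check like this before `create_bucket` turns an opaque API error into an immediate, local one.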

Node & Knox

For Node.js, I have adapted the example code from Knox, an Amazon S3 client for Node available on GitHub (install it with `npm install knox`).

var knox = require('knox');

// Configure the client (bucket names must be lowercase)
var client = knox.createClient({
	key: '<api-key-here>',
	secret: '<secret-here>',
	bucket: 'bucket-name'
});
// Putting a file on S3: upload a local file to a key in the configured bucket
client.putFile('./local/path-to-file.ext', '/some/path-to-file.ext', function(err, res){
	if (err) return console.error(err);
	console.log(res.statusCode);
	// Always do something with the response, or at least call res.resume()
	res.resume();
});
// Getting a file from S3
client.get('/some/path-to-file.ext').on('response', function(res){
	console.log(res.statusCode);
	console.log(res.headers);
	res.setEncoding('utf8');
	res.on('data', function(chunk){
		console.log(chunk);
	});
}).end();
// Deleting a file on S3
client.del('/some/path-to-file.ext').on('response', function(res){
	console.log(res.statusCode);
	console.log(res.headers);
}).end();
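The Knox callbacks above only log `res.statusCode`; in practice you would check it before treating an operation as successful. The sketch below assumes the status codes S3 typically returns for simple requests: 200 OK for a successful PUT or GET, and 204 No Content for a DELETE (returned even when the key did not exist). The `assertS3Success` helper is hypothetical and not part of Knox.

```javascript
// Hypothetical helper: decide whether an S3 response status means success.
// Assumed codes: 200 for PUT/GET, 204 (No Content) for DELETE.
const EXPECTED_STATUS = { put: 200, get: 200, del: 204 };

function assertS3Success(operation, statusCode) {
  const expected = EXPECTED_STATUS[operation];
  if (expected === undefined) throw new Error(`Unknown operation: ${operation}`);
  if (statusCode !== expected) {
    // e.g. 403 (access denied) or 404 (GET on a missing key)
    throw new Error(`S3 ${operation} failed with HTTP ${statusCode}`);
  }
  return true;
}

console.log(assertS3Success("del", 204)); // true
try {
  assertS3Success("get", 404);
} catch (err) {
  console.log(err.message); // S3 get failed with HTTP 404
}
```

Inside a Knox `response` handler, you might call `assertS3Success('get', res.statusCode)` before reading the body; range or conditional requests (206, 304) would need their own entries.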

Conclusion

As you can see from the code examples above, working with the Amazon S3 APIs is very straightforward, and there are plenty of libraries readily available for most languages. I definitely recommend taking a hard look at a cloud storage provider for your next project. You’ll save time by not reinventing file storage, reap the benefits of focusing on developing your application, and have practically unlimited storage scalability as you need it.