August 23rd 2010

By Gerlando Piro

TAGS:
Partners
Technology
Ruby

Pragmatic Polyglot Persistence with Rails

This post comes from guest community contributor Kent Fenwick. Kent is the tech co-founder of Viewpointr, a personalized Q&A service that aims to provide an easy way to get and give help. When he isn’t programming, he spends time with his family and friends in Toronto. Kent writes here and can be followed on Twitter at @kentf.

It’s getting more and more difficult to pick a persistence layer for your web application. When I started in Rails four years ago, there was really only one option, MySQL. Now, there are many more, each with their own pros and cons. Some are new and some are old, some are tested, and others, not so much. What’s clear is that when you are building a business around data, you want to make good decisions. That being said, often only the future will tell if you’ve made the right ones. I want to share with you my persistence story about how I ended up getting the best of both worlds.

The Problem

There are too many choices, and each choice has a loud evangelist of its own. When designing Viewpointr I went back and forth daily between MongoDB, MySQL, PostgreSQL and Cassandra. Viewpointr is essentially Twitter with a focus on helping people. Therefore, we have some common data elements: a user-specific timeline, a user-specific list of people they are helping, and a user-specific list of people helping them. Because I am ambitious, I would find myself asking questions like:

“Hmm… but will MySQL scale to 1,000,000 records?”

Looking back on these internal conversations I find them funny; programmers always tend to think big. However, these are real concerns that developers and teams think about. While planning I would constantly consult the blogosphere for help, and to see what others were doing. Kirk Haines of Engine Yard wrote a great series of NoSQL posts highlighting and comparing different key-value stores and explaining their pros and cons. Since then, there has been a flurry of articles each week outlining different NoSQL datastores, NoSQL vs. MySQL debates and flamewars etc.

The Opportunity

Data is not created equal, and this is a good thing. When programming, we do not use an array for every “list” type problem, because sometimes hashes or linked lists better suit the needs of the problem. We need to start thinking about data the same way. This was the best decision we made at Viewpointr and it allowed us to move forward at a great pace.

I looked at our application and broke it down into components. Viewpointr has many typical CRUD features similar to all Rails apps. These are well suited to MySQL and a relational database. Being able to pull a list of answers based on a given question using simple and optimized SQL that I understand is a big win. However, there are some things that a relational database doesn’t model well.

Friendships. The simplest way to model friendship using a relational database is to create a relation that refers to the same table with two different names. Let’s say you have a users table and you want to model Twitter-like friendship where User:1 can befriend User:2 without User:2’s permission. It’s easy enough.

class Friend < ActiveRecord::Base

  belongs_to :user
  belongs_to :contact, :class_name => "User", :foreign_key => "contact_id"

  # user befriends contact
  def self.befriend(user, contact)
    # Only create the row if this relationship does not already exist.
    relationship = find_by_user_id_and_contact_id(user.id, contact.id)
    if relationship.nil?
      transaction do
        Friend.create(:user => user, :contact => contact)
      end
    end
  end

end

class User < ActiveRecord::Base

  has_many :friends, :dependent => :destroy
  has_many :contacts, :through => :friends, :order => "created_at DESC", :dependent => :destroy

end

However, I have always felt that it’s clumsy. What I really want to say is:

“Each user has a list of IDs that represent the people that they are friends with.”

Sounds like a denormalized list, right?

The Solution

Enter Redis. Redis is a key-value store similar to memcached but more flexible since lists, sets, ordered sets and strings can all be used as values. Thanks to its simple API, the problem I described is essentially an atomic operation in Redis. Redis has a great “set” implementation and allows you to do all of the things you would imagine a set to do: addition, subtraction, unique insertion, deletion, union, intersection, etc.
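To make the set semantics concrete before touching Redis, here is a minimal in-memory sketch using Ruby's standard Set class. FriendGraph and its method names are made up for illustration and are not part of redis-rb; the sketch only shows the shape of the data (a user ID mapping to a set of user IDs) and the behaviour that Redis commands such as SADD, SMEMBERS and SINTER provide on the server.

```ruby
require "set"

# In-memory stand-in for the two keyspaces: each key is a user ID and each
# value is a set of user IDs. FriendGraph is a hypothetical name; none of
# this is the redis-rb API.
class FriendGraph
  def initialize
    @helpers = Hash.new { |hash, key| hash[key] = Set.new }
    @helping = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Record that user_id is now helping contact_id, in both directions.
  # Set insertion is unique, so repeating the call changes nothing.
  def befriend(user_id, contact_id)
    @helping[user_id] << contact_id
    @helpers[contact_id] << user_id
    self
  end

  # IDs of the people this user is helping (empty set if none).
  def helping(user_id)
    @helping.fetch(user_id, Set.new)
  end

  # IDs of the people helping this user (empty set if none).
  def helpers(user_id)
    @helpers.fetch(user_id, Set.new)
  end

  # Intersection: people that both users are helping.
  def helping_in_common(first_id, second_id)
    helping(first_id) & helping(second_id)
  end
end

graph = FriendGraph.new
graph.befriend(1, 2).befriend(1, 2).befriend(1, 3).befriend(4, 2)
graph.helping(1).to_a              # => [2, 3]
graph.helpers(2).to_a              # => [1, 4]
graph.helping_in_common(1, 4).to_a # => [2]
```

Note that the duplicate befriend(1, 2) call leaves the sets unchanged, which is exactly why no "does this relationship already exist?" lookup is needed with a set.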

The operation will ultimately look like this:

SET = Redis.new
SET.set_add key, value

However, since we are working inside a Rails app, we need to make sure we have the right plumbing set up.

Create a redis.rb in your initializers folder.
Create a new Redis database for each of your needs.

In our case, we want to have a dataset that keeps track of a User’s helpers (other users who are helping them) and a list of a User’s friends (other users that the user is helping). Since we are going to be using these Redis objects throughout the codebase, I like to declare them as global variables in the redis.rb initializer file.

HELPERS = Redis.new(:db => 0)
HELPING = Redis.new(:db => 1)

Notice that I pass in the :db key so that HELPERS and HELPING point at two different Redis databases, which keeps their keys from colliding. You can use the redis-namespace gem if you want, but I find the default syntax from the redis-rb gem works well enough for my purposes.
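If you would rather keep everything in a single Redis database, the usual alternative is to prefix each key, which is the idea behind namespacing. The sketch below is a hand-rolled illustration with made-up names (RedisKeys, helpers, helping); it is not the redis-namespace API, just a way to see what prefixed keys look like.

```ruby
# Hypothetical key builder: two kinds of sets can share one Redis database
# because the prefix keeps "helpers" keys and "helping" keys apart.
module RedisKeys
  # Key for the set of people helping this user.
  def self.helpers(user_id)
    "helpers:#{user_id}"
  end

  # Key for the set of people this user is helping.
  def self.helping(user_id)
    "helping:#{user_id}"
  end
end

RedisKeys.helpers(42) # => "helpers:42"
RedisKeys.helping(42) # => "helping:42"
```

With this approach the generated string would be passed wherever the examples in this post pass a bare user ID as the key.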

Now that we have these global Redis objects at our disposal throughout the application, we can start using them in our Friend.befriend method.

class Friend < ActiveRecord::Base

 belongs_to :user
 belongs_to :contact, :class_name => "User", :foreign_key => "contact_id"

 # user befriends contact
 def self.befriend(user,contact)
    begin
     HELPERS.set_add contact.id, user.id
     HELPING.set_add user.id, contact.id
    rescue
     RedisLogger.info "Redis Exception"
    end
 end

end

class User < ActiveRecord::Base

  has_many :friends, :dependent => :destroy
  has_many :contacts, :through => :friends, :order => "created_at DESC", :dependent => :destroy

end

However, this isn’t the best solution right out of the gate. Using a NoSQL datastore has some drawbacks that aren’t apparent in development mode but rear their ugly heads in production. If you are not careful, a simple restart of your Redis server can cause you to lose all your data. Managing your Redis data in production deserves its own post (coming soon), but for now, let’s create a safer solution that you can gradually roll out as you become more comfortable with storing, backing up and using Redis datafiles.

class Friend < ActiveRecord::Base

  belongs_to :user
  belongs_to :contact, :class_name => "User", :foreign_key => "contact_id"

  # user befriends contact
  def self.befriend(user, contact)
    relationship = find_by_user_id_and_contact_id(user.id, contact.id)
    if relationship.nil?
      transaction do
        Friend.create(:user => user, :contact => contact)
      end
      # Mirror the new relationship in Redis once the row exists.
      add_to_denormalized_list(user, contact)
    end
  end

  def self.add_to_denormalized_list(user, contact)
    begin
      HELPERS.set_add contact.id, user.id
      HELPING.set_add user.id, contact.id
    rescue => e
      # A Redis failure is logged, not raised: MySQL remains the source of truth.
      RedisLogger.info "Redis Exception: #{e.message}"
    end
  end

end

class User < ActiveRecord::Base

  has_many :friends, :dependent => :destroy
  has_many :contacts, :through => :friends, :order => "created_at DESC", :dependent => :destroy

end

The strategy is simple: mirror the MySQL data in Redis. By adding a call to add_to_denormalized_list, we mirror the ActiveRecord call using the simple and elegant Redis set syntax discussed above. As you and your team get more practice and become more comfortable using Redis in production, you can start writing more to the denormalized list, eventually moving this part of your application away from ActiveRecord and MySQL to Redis. You could do this manually, or you can use James Golick’s recently released gem called Rollout, which uses, you guessed it, Redis to programmatically roll out features to users.
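If you go the manual route, one simple way to move gradually is to gate reads from the Redis mirror by a percentage of users. The sketch below is a hand-rolled stand-in with a made-up name (MirrorGate); it is not the Rollout gem's API, and it assumes user IDs are non-negative integers.

```ruby
# Hypothetical percentage gate for deciding which users read from the
# Redis mirror instead of MySQL. Not the Rollout gem's API.
class MirrorGate
  attr_reader :percentage

  def initialize(percentage = 0)
    self.percentage = percentage
  end

  # Clamp to 0..100 so a typo cannot produce a nonsensical setting.
  def percentage=(value)
    @percentage = [[value.to_i, 0].max, 100].min
  end

  # Stable per user: the same user stays in (or out) for as long as the
  # percentage does not change.
  def active?(user_id)
    user_id % 100 < @percentage
  end
end

gate = MirrorGate.new(20)
gate.active?(5)   # => true  (5 % 100 = 5, below 20)
gate.active?(120) # => false (120 % 100 = 20, not below 20)
```

Starting at 0 and raising the percentage as confidence grows keeps MySQL as the fallback for everyone outside the gate.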

Like anything else you code, testing and benchmarking this process in production is crucial to make sure you are saving time and cycles. It might seem like a waste to duplicate your data in Redis, but you are a pragmatic polyglot persistence developer, right? You want to explore the NoSQL space while making sure that a little mistake or misunderstanding doesn’t sink your ship. Give something like this a try; it doesn’t get any more pragmatic. When you do try it or come up with something new, let me and everyone else know about it.
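Because the relational rows stay authoritative while you mirror, one cheap test is to rebuild the expected sets from those rows and compare them with what the mirror holds. The sketch below works on plain Ruby data so it can run anywhere; the function names are made up, `rows` stands in for the [user_id, contact_id] pairs a query over the friends table would return, and `mirror` stands in for the sets read back from Redis.

```ruby
require "set"

# Rebuild the denormalized "helping" sets from relational rows.
# rows: array of [user_id, contact_id] pairs (a stand-in for query results).
def build_helping_sets(rows)
  rows.each_with_object(Hash.new { |hash, key| hash[key] = Set.new }) do |(user_id, contact_id), sets|
    sets[user_id] << contact_id
  end
end

# Return the user IDs whose mirrored set differs from the rebuilt one.
# mirror: hash of user_id => Set of contact IDs (a stand-in for Redis data).
def drifted_users(rows, mirror)
  expected = build_helping_sets(rows)
  (expected.keys | mirror.keys).reject do |user_id|
    expected.fetch(user_id, Set.new) == mirror.fetch(user_id, Set.new)
  end
end

rows   = [[1, 2], [1, 3], [4, 2]]
mirror = { 1 => Set[2, 3], 4 => Set[] }
drifted_users(rows, mirror) # => [4]
```

An empty result means the mirror matches the rows; any IDs returned can be repaired by writing the rebuilt set back, which is also how you would recover after losing the Redis data.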

Thanks for reading.
