Using DataMapper and Rails with Legacy Schemas

Today’s guest post hails from community contributor Piotr Solnica of Code Benders, an agile software collective based in Bend, Oregon. In his free time Piotr supports OSS, is a DataMapper](http://datamapper.org/) core contributor, an author of DataMapper plugins and adapters and one of the developers of rubytime.org and coderack.org.

Ruby on Rails makes it easy to bootstrap a greenfield project. Unfortunately not all projects are greenfield, so sometimes you need to work with a legacy database schema. ActiveRecord is a great choice when you have a full control over your database from the very beginning, but what should you do if you need to connect to a database with a schema that isn’t in-line with Rails conventions? In this post I will explain how to setup a Rails application that uses DataMapper as the ORM. I’ll also guide you through configuring your models to work with a legacy schema. If you are not familiar with DataMapper I recommend reading the “Getting Started” section of the official documentation. ###A legacy schema For the sake of this post let’s use an example schema of a blog application using MySQL:

CREATE TABLE `post` (
 `postId` varchar(45) NOT NULL,
 `postAuthorId` int(11) DEFAULT NULL,
 `postTitle` varchar(45) DEFAULT NULL,
 `postContent` text,
 `postPublishTime` varchar(45) DEFAULT NULL,
 PRIMARY KEY (`postId`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `user` (
 `userId` int(11) NOT NULL AUTO_INCREMENT,
 `userName` varchar(45) DEFAULT NULL,
 `userEmail` varchar(45) DEFAULT NULL,
 PRIMARY KEY (`userId`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Here’s what’s “wrong” with it:

  • table names are singularized
  • column names use camel-case naming convention prefixed with a table name
  • post primary key is an md5 sum of concatenated values of “author” and “publishTime”
  • post => user foreign key has a custom name “author”
  • “postPublishTime” is stored as a string and we would like to use DateTime object

###Creating a rails application with DataMapper as ORM This is actually a straight-forward task. All you need to do is to use a template provided by DataMapper when generating an application skeleton as follows: rails new blog-legacy -m http://datamapper.org/templates/rails.rb

By default the skeleton uses sqlite as the database. To change that, simply edit Gemfile and change ‘dm-sqlite-adapter’ to ‘dm-mysql-adapter’ and run bundle install. To generate empty models run:

script/rails g model user
script/rails g model post

###Naming conventions DataMapper gives you two ways of defining table and column names. You can either set it manually in every model or you can configure global naming conventions for all models.Here’s how to set table and field names explicitly in the model:

class User
  include DataMapper::Resource

  storage_names[:default] = "user"

  property :id,	Serial, :field => "userId"
  property :name,  String, :field => "userName"
  property :email, String, :field => "userEmail"
end

With many properties it would add clutter to our models, that is why it is better to set global naming conventions for resources and fields. Those conventions will be applied to every model and will eliminate the need to define storage names and set :field options for every property.

To set global naming conventions, we can create datamapper.rb initializer like this:

# config/initializers/datamapper.rb
Rails::DataMapper.configuration.resource_naming_convention[:default] = lambda do |name|
  name.downcase.singularize
end

# note: this will be available in dm-rails 1.1.1
Rails::DataMapper.configuration.field_naming_convention[:default] = lambda do |property|
  "#{property.model.name.downcase}#{property.name.to_s.camelize}"
end

Now we can clean up our User model like this:

class User
  include DataMapper::Resource

  property :id,	Serial
  property :name,  String
  property :email, String
end

Much better! ###Dealing with natural primary keys In our example schema, a post’s primary key is an MD5 sum of concatenated fields “postAuthorId” and “postPublishTime”. If we want to have the value generated in Ruby we can use :default options in property declaration:

class Post
  include DataMapper::Resource

  property :id, String, :key => true, :default => lambda { |resource, property|
	Digest::MD5.hexdigest("#{author}#{publish_time}") }

  # ...
end

Now whenever you create a post and don’t provide :id it will be generated automatically based on the values of :author and :publish_time properties.

As you can see this doesn’t look very nice in the model. So, we could create a custom property with a generalized functionality to clean this up! This way we can re-use this code in different scenarios whenever we want an MD5 sum of concatenated fields. You might be surprised how easy this is to do:

 module ::DataMapper
   class Property
     class MD5 < String
   	key	true
   	length 32

   	accept_options :fields

   	default lambda { |resource, property|
     	  Digest::MD5.hexdigest(property.options[:fields].join) }
     end
   end
 end

Now we can define post :id like this:

class Post
 include DataMapper::Resource

 property :id, MD5, :fields => [ :author, :publish_time ]
end

How easy was that?!? ###Associations with custom foreign keys Our database has a basic 1:M relationship between User and Post models. The only caveat is that foreign key on post table pointing to user id has a name that doesn’t follow our global naming conventions. To configure the association we must override our defaults:

class User
  include DataMapper::Resource

  property :id,	Serial
  property :name,  String
  property :email, String

  # We need to set child_key otherwise DataMapper would use :user_id
  has n, :posts, :child_key => :author_id
end

class Post
  include DataMapper::Resource

  property :id,	MD5, :fields => [ :author, :publish_time ]
  property :title,  String
  property :content, Text

  # This creates a property called :author_id for you. We also set “User” as the parent model
  belongs_to :author, “User”
end

Easy enough. ###Fine tuning - typecasting properties Post :publish_time is a String property as values are stored in a VARCHAR column. Our goal is to work with DateTime objects, not strings. To achieve that we could simply override property accessor and do typecasting like this:

class Post
  # …

  def publish_time
    DateTime.parse(super)
  end
end

While this approach works, it is not a recommended approach. DataMapper is based on the concept that a model’s logic should be decoupled from the database schema. A much better and cleaner way is to abstract away that logic in to a custom property object. An extra benefit comes with that too - you can re-use that code in many applications as it’s not tied to a specific model. Here’s another approach:

module DataMapper
  class Property
    class DateTimeString < String
      def load(value)
        ::DateTime.parse(value)
      end

      def dump(value)
        value.to_s
      end
    end
  end
end

Better? ###Wrap-up All right! We’ve got all we need to work with our legacy schema. Let’s take a look at the final version of the models:

class User
  include DataMapper::Resource

  property :id,            Serial
  property :name,  String
  property :email, String

  has n, :posts, :child_key => :author_id
end

class Post
  include DataMapper::Resource

  property :id, MD5, :fields => [ :author, :publish_time ]  

  property :publish_time, DateTimeString
  property :title,    	 String
  property :content,  Text

  belongs_to :author, "User"
end

At first sight it is hard to tell that the underlying database has a funky schema, isn’t it? These models look almost as if you were starting a fresh project.

Let’s take a look at how we could work with them:

user = User.create(:name => "John Doe", :email => "[email protected]")
=> #<User @id=2 @name="John Doe" @email="[email protected]">

This creates a following insert statement:

INSERT INTO `user` (`userName`, `userEmail`) VALUES ('John Doe', '[email protected]')

Let’s create some posts too:

user.posts.create(:title => "Hello World!", :content => "DataMapper is awesome :)", :publish_time => DateTime.now)
=>  #<Post @id="4a585935e1d2dd21da146f6d3bcfc41b"  @publish_time="2011-05-12T13:39:04+02:00" @title="Hello World!"  @content="DataMapper is awesome :)" @author_id=2>

And SQL:

INSERT  INTO `post` (`postId`, `postPublishTime`, `postTitle`, `postContent`,  `postAuthorId`) VALUES ('4a585935e1d2dd21da146f6d3bcfc41b',  '2011-05-12T13:39:04+02:00', 'Hello World!', 'DataMapper is awesome :)',  2)

###Further reading and resources This post explains basics as well as some more advanced things you can do with DataMapper. If you would like to learn more, please take a look at the documentation, subscribe to the Mailing List or jump into the #datamapper IRC channel.

Other useful links:

Author Bio

Piotr Solnica is a team member atCode Benders, an agile software collective based in Bend, Oregon. When Piotr’s not writing great code, he’s writing awesome code. He has been working as a web developer for 6+ years and has solid experience with Ruby-based technologies and frameworks. His proficiencies include a range of JavaScript libraries such as Prototype, jQuery, YUI and MooTools. Piotr is an unobtrusive JavaScript evangelist, an HTML/CSS purist and a linux geek. In his free time he supports OSS, is a DataMapper core contributor, an author of DataMapper plugins and adapters and one of the developers of rubytime.org and coderack.org.