To Redis or Not To Redis? (Key-Value Stores Part 4)
Welcome to another post in our key-value series! This week: Redis. Redis is a persistent in-memory key-value store written in C by Salvatore Sanfilippo. It’s currently at version 1.0. So let’s get down to it: “To Redis or not to Redis?” That is the question.
So, let’s say you have a situation where…
- You want a key-value store that’s blazingly fast
- Your data set is small enough that it can fit in available RAM
- It’s OK if some recently updated records are lost in a catastrophic failure
- Your life would be a lot easier if it was cheap and easy to do set and list operations atomically
If this describes your situation, you should take a serious look at Redis. It provides a very fast store in part because it keeps the data set in memory. It handles persistence by asynchronously writing changes after a configurable number of seconds or number of updates have occurred, which means that if the Redis server goes down unexpectedly, it is possible to lose some records. (Redis does offer a master-slave replication mode which mitigates this risk, though). Finally, Redis provides storage for data structures other than strings.
With Redis, a value can also be a list or a set, and Redis provides atomic operations for manipulating those values. This feature eliminates the need for a lot of potentially troublesome locking antics if you need to maintain consistent lists or sets that are manipulated by multiple clients at the same time.
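Going back to the persistence trade-off mentioned above, here is a rough model of the "save after some seconds and some updates" rule. It is only a sketch in plain Ruby, not Redis code: SnapshotPolicy and due? are hypothetical names, the rule values are picked for illustration, and it assumes that both conditions of a rule must hold before a save is triggered, which is how the save directive in redis.conf reads to me.

```ruby
# Hypothetical model of the snapshot rule; this is not Redis source code.
# Each rule is [seconds, changes]: a save is due once at least `changes`
# writes have accumulated AND at least `seconds` have passed since the
# last save. Any single matching rule is enough.
class SnapshotPolicy
  def initialize(rules)
    @rules = rules
  end

  def due?(seconds_since_save, changes_since_save)
    @rules.any? do |seconds, changes|
      seconds_since_save >= seconds && changes_since_save >= changes
    end
  end
end

# Illustrative values only.
policy = SnapshotPolicy.new([[900, 1], [300, 10], [60, 10_000]])

puts policy.due?(30, 50_000)  # false: too soon, however many writes
puts policy.due?(61, 10_000)  # true:  busy server, one minute elapsed
puts policy.due?(400, 3)      # false: quiet server, not enough writes yet
puts policy.due?(901, 1)      # true:  a single write is saved eventually
```

Under these assumptions, anything written between two saves is what you stand to lose if the server dies, so tighter rules shrink that window at the cost of more frequent disk writes.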
Furthermore, while Redis doesn’t inherently support a sharded, horizontally scalable architecture like Cassandra does, some Redis clients, including the Ruby one (by our own Ezra Zygmuntowicz), support consistent hashing and distribute data across multiple servers. So, at least with a client library that supports it, Redis offers a compelling combination of performance and scalability.
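If consistent hashing is new to you, the idea can be sketched in a few lines of plain Ruby. This is not how redis-rb implements it; HashRing, add, and server_for are hypothetical names, and the server addresses are made up. Each server gets many points on a ring, and a key belongs to the first server point at or after the key's own hash.

```ruby
require 'digest/md5'

# Minimal consistent-hash ring. The class and method names are
# hypothetical; they are not taken from redis-rb.
class HashRing
  def initialize(servers, replicas = 100)
    @replicas = replicas
    @ring = {}           # point on the ring => server name
    @sorted_points = []
    servers.each { |server| add(server) }
  end

  # Place several virtual points per server so keys spread evenly.
  def add(server)
    @replicas.times { |i| @ring[point_for("#{server}:#{i}")] = server }
    @sorted_points = @ring.keys.sort
  end

  # Find the first point at or after the key's hash, wrapping around
  # to the start of the ring when the hash is past the last point.
  def server_for(key)
    hash = point_for(key)
    point = @sorted_points.bsearch { |p| p >= hash } || @sorted_points.first
    @ring[point]
  end

  private

  # Use the first 32 bits of an MD5 digest as the position on the ring.
  def point_for(string)
    Digest::MD5.hexdigest(string)[0, 8].to_i(16)
  end
end

servers = ['127.0.0.1:6379', '127.0.0.1:6380', '127.0.0.1:6381']
ring = HashRing.new(servers)
keys = (1..1000).map { |i| "user:#{i}" }
before = keys.map { |k| ring.server_for(k) }

ring.add('127.0.0.1:6382')
after = keys.map { |k| ring.server_for(k) }
moved = keys.each_index.count { |i| before[i] != after[i] }
puts "#{moved} of #{keys.size} keys moved after adding a fourth server"
```

The point of the ring is visible in that last line: adding a server relocates only the keys that now fall to the new server, rather than reshuffling everything the way a simple hash-modulo-server-count scheme would.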
After you’ve installed Redis and started up an instance of redis-server, you’re ready to use it. If you haven’t already, grab Ezra’s redis-rb library and install it.
>> require 'rubygems'; require 'redis'
=> true
>> redis = Redis.new
=> #<Redis:0x2a98943500 @sock=#<TCPSocket:0x2a98943348>,
@host="127.0.0.1", @logger=nil, @password=nil,
@timeout=5, @db=0, @port=6379>
>> redis['key'] = 'value'
=> "value"
>> redis['key']
=> "value"
Functionally, this is a lot closer to what you’re probably used to when thinking about a key-value store than what you saw with Cassandra’s data storage model. Redis does have a concept of multiple databases, where each database is a separate key-value namespace, but it keeps things simple: databases are numbered starting with 0, and if you don’t tell Redis which database you want to use, it assumes database 0.
>> another_db = Redis.new(:db => 2)
=> #<Redis:0x2a988bbc68 @sock=#<TCPSocket:0x2a988bb920>,
@host="127.0.0.1", @logger=nil, @password=nil,
@timeout=5, @db=2, @port=6379>
>> puts another_db
Redis Client connected to 127.0.0.1:6379 against DB 2
=> nil
>> another_db['key'] = 'Altoids FTW!'
=> "Altoids FTW!"
>> redis['key']
=> "value"
>> another_db['key']
=> "Altoids FTW!"
Redis supports several atomic operations on the data in a database, including moving data from one database to another, as well as incrementing and decrementing values.
>> redis['hits'] = 1
=> 1
>> redis['hits']
=> "1"
>> redis.incr('hits')
=> 2
>> redis.incr('misses')
=> 1
Notice a few things in the above example. First of all, Redis value data types are either strings, lists, or sets. So, when a numeric 1 was assigned as the value for a key, the client actually stored the to_s version of that value, "1".
Second, notice that you don’t need to initialize a counter before using it. If an increment or decrement operation references a key that doesn’t exist, the key is automatically vivified for you. Finally, as mentioned just a moment ago, numbers don’t appear anywhere in the list of Redis data types, so the increment/decrement operations work on a simple principle: try to interpret the value as a long, and then work with whatever you get. So, be careful not to increment or decrement a key that holds non-numeric data. Rather than throwing some sort of exception, Redis will happily attempt to do what you are asking, and clobber your data along the way.
>> redis['not_a_counter'] = 'There be kittens!'
=> "There be kittens!"
>> redis.incr('not_a_counter')
=> 1
>> redis['not_a_counter']
=> "1"
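If that clobbering worries you, one option is to check the value in your own code before incrementing. The sketch below runs without a Redis server: FakeStore is a hypothetical in-memory stand-in that only mimics the counter behavior shown above, and guarded_incr is a made-up helper, not part of redis-rb.

```ruby
# FakeStore is a hypothetical in-memory stand-in that mimics the counter
# behavior shown in the transcript above; it is not the redis-rb client.
class FakeStore
  INTEGER = /\A-?\d+\z/

  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value.to_s  # values are stored as strings
  end

  # Missing keys and non-numeric strings both count as 0 here.
  def incr(key)
    current = (@data[key].to_s =~ INTEGER) ? @data[key].to_i : 0
    @data[key] = (current + 1).to_s
    current + 1
  end
end

# Hypothetical helper: refuse to increment anything that is not an integer.
# The check and the increment are two separate steps, so unlike a bare
# incr this is NOT atomic when several clients touch the same key.
def guarded_incr(store, key)
  value = store[key]
  unless value.nil? || value =~ FakeStore::INTEGER
    raise ArgumentError, "#{key} holds non-numeric data: #{value.inspect}"
  end
  store.incr(key)
end

store = FakeStore.new
store['not_a_counter'] = 'There be kittens!'

puts guarded_incr(store, 'hits')  # 1, missing keys still start from zero
begin
  guarded_incr(store, 'not_a_counter')
rescue ArgumentError => e
  puts e.message
end
puts store['not_a_counter']       # There be kittens!
```

The trade-off is worth repeating: the guard gives up the atomicity that makes incr attractive in the first place, so it suits cases where a wrong key name is the likelier danger than two clients racing.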
Dealing with a really fast, straightforward key-value store with atomic increment/decrement is pretty useful in itself, but Redis really starts to shine when you look at what can be done with list and set operations. Let’s say that you want to keep an audit log of client sessions in your application. You might start with something like this, saved as audit_log.rb:
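class AuditLog
  def initialize(args)
    @db = args[:db]
    @id = "audit_log_#{args[:id]}"
  end

  # Append a timestamped message to the tail of the list.
  def >>(msg)
    @db.push_tail @id, "#{Time.now.to_s}: #{msg}"
  end

  # Return every entry in the list, oldest first.
  def to_a
    @db.list_range(@id, 0, -1)
  end

  # Forward any other call to Redis, passing this log's key first.
  def method_missing(meth, *args)
    @db.send(meth, @id, *args)
  end
end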
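The to_a method leans on how list_range reads its two indices: both are zero-based and inclusive, and -1 stands for the last element. As a rough illustration in plain Ruby, with no Redis involved, the hypothetical range_of helper below mimics that indexing on an ordinary array for the common cases.

```ruby
# Hypothetical helper that mimics inclusive, zero-based range indexing
# on a plain Ruby array. Negative indices count from the tail, and a
# range that starts past the end yields an empty array.
def range_of(list, first, last)
  list[first..last] || []
end

entries = ['opened account', 'saved preferences', 'logout']

p range_of(entries, 0, -1)   # ["opened account", "saved preferences", "logout"]
p range_of(entries, 1, 2)    # ["saved preferences", "logout"]
p range_of(entries, -1, -1)  # ["logout"]
p range_of(entries, 5, 9)    # []
```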
>> require 'rubygems'; require 'redis'; require 'audit_log'
=> true
>> redis = Redis.new
=> #<Redis:0x2a9880acd8 @logger=nil, @host="127.0.0.1",
@timeout=5, @password=nil, @port=6379, @db=0,
@sock=#<TCPSocket:0x2a9880abe8>>
>> log = AuditLog.new(:db => redis, :id => 'customer_x')
=> #<AuditLog:0x2a987bd9b0 @id="audit_log_customer_x",
@db=#<Redis:0x2a9880acd8 @logger=nil, @host="127.0.0.1",
@timeout=5, @password=nil, @port=6379, @db=0,
@sock=#<TCPSocket:0x2a9880abe8>>>
>> log >> "opened account"
=> "OK"
>> log >> "saved preferences"
=> "OK"
>> log >> "logout"
=> "OK"
>> log.to_a
=> ["Wed Sep 09 07:59:12 -0500 2009: opened account",
"Wed Sep 09 07:59:36 -0500 2009: saved preferences",
"Wed Sep 09 08:00:16 -0500 2009: logout"]
>> log.list_range(1,2)
=> ["Wed Sep 09 07:59:36 -0500 2009: saved preferences",
"Wed Sep 09 08:00:16 -0500 2009: logout"]
Sets in Redis are also easy to work with. Just like everything else, there’s no special preparation necessary. You just open up the Redis database and start using them. Imagine that you are creating the world’s next great dating site. You allow people to enter lists of keywords to describe themselves, and then you use the intersection of these keywords to help determine how well two people match each other. The following goes in date_keywords.rb:
class DateKeywords
  attr_reader :keyword_id

  def initialize(args)
    @db = args[:db]
    @keyword_id = "keywords_#{args[:id]}"
  end

  def insert_keyword_set(keywords)
    keywords.each { |word| add_keyword word }
  end

  def add_keyword(keyword)
    @db.set_add @keyword_id, keyword
  end

  def find_commonalities(potential_date)
    @db.set_intersect @keyword_id, potential_date.keyword_id
  end
end
>> require 'rubygems'; require 'redis'; require 'date_keywords'
=> true
>> redis = Redis.new
=> #<Redis:0x2a9893ae50 @sock=#<TCPSocket:0x2a9893ad60>,
@host="127.0.0.1", @logger=nil, @password=nil, @timeout=5,
@db=0, @port=6379>
>> gal_words = DateKeywords.new(:db => redis, :id => 'gal')
=> #<DateKeywords:0x2a988ce200 @keyword_id="keywords_gal",
@db=#<Redis:0x2a9893ae50 @sock=#<TCPSocket:0x2a9893ad60>,
@host="127.0.0.1", @logger=nil, @password=nil, @timeout=5,
@db=0, @port=6379>>
>> guy_words = DateKeywords.new(:db => redis, :id => 'guy')
=> #<DateKeywords:0x2a98894758 @keyword_id="keywords_guy",
@db=#<Redis:0x2a9893ae50 @sock=#<TCPSocket:0x2a9893ad60>,
@host="127.0.0.1", @logger=nil, @password=nil, @timeout=5,
@db=0, @port=6379>>
>> gal_words.insert_keyword_set(['adventurous','affectionate',
'camping','church','cooking','country','dancing','faith',
'farm','laughter','loyal','morals','movies','music',
'outdoors','ranch','respect','sunsets walking'])
=> ["adventurous", "affectionate", "camping", "church",
"cooking", "country", "dancing", "faith", "farm",
"laughter", "loyal", "morals", "movies", "music",
"outdoors", "ranch", "respect", "sunsets walking"]
>> guy_words.insert_keyword_set(['architecture','beach',
'camping','carpenter','considerate','creative','family',
'funny','genuine','giving','happy','historicalhouses',
'ireland','italy','kids','kind','laughter','loyal',
'music','roadtrip','smile','travel','trust'])
=> ["architecture", "beach", "camping", "carpenter",
"considerate", "creative", "family", "funny", "genuine",
"giving", "happy", "historicalhouses", "ireland", "italy",
"kids", "kind", "laughter", "loyal", "music", "roadtrip",
"smile", "travel", "trust"]
>> guy_words.find_commonalities(gal_words)
=> ["laughter", "camping", "loyal", "music"]
It works like a charm. Given two long lists of values in the sets, Redis intersected them for us. Going by that algorithm, though, it doesn’t look like those two people have a lot in common. Unless opposites really do attract, maybe they should keep looking. Because this was all atomic, it works in the typical web scenario where there can be multiple processes simultaneously inserting, removing, and intersecting data. No need to worry about locking.
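To put a number on "not a lot in common", you could turn the intersection into a score. The sketch below uses Ruby's own Set class in a single process, so it says nothing about Redis or atomicity. It only reproduces the intersection above and adds one possible scoring rule, the shared keywords divided by all distinct keywords. That rule and the match_score name are my assumptions, not something Redis provides.

```ruby
require 'set'

# Hypothetical scoring rule: shared keywords divided by all distinct
# keywords across both people. 0.0 means nothing in common, 1.0 means
# identical keyword sets.
def match_score(a, b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

gal = Set.new(%w[adventurous affectionate camping church cooking country
                 dancing faith farm laughter loyal morals movies music
                 outdoors ranch respect] + ['sunsets walking'])
guy = Set.new(%w[architecture beach camping carpenter considerate creative
                 family funny genuine giving happy historicalhouses ireland
                 italy kids kind laughter loyal music roadtrip smile travel
                 trust])

common = gal & guy
puts common.sort.join(', ')                  # camping, laughter, loyal, music
puts format('%.2f', match_score(gal, guy))   # 0.11
```

Four shared keywords out of 37 distinct ones gives a score of about 0.11, which backs up the hunch that these two should keep looking.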
The joy of using Redis is that it’s simple to use, but there’s considerable depth to the API. It’s likely that any string, list, or set operation you can think of is there already. Keys can have TTL values, so that they time out of the data store. You can get and set in one operation. You can increment or decrement by more than one. You can pull random keys, rename keys, or get the db size. You can push, pop, and get ranges from lists, and do any set operation imaginable on sets. Complex sorting is supported, and there’s a lot more besides.
Given all of this, we come back to the title of this article: To Redis or Not to Redis. As an alternative, Tokyo Cabinet is very fast for a synchronous key-value store, and it supports some features that Redis does not, such as tables. Redis permits a master/slave setup, which can alleviate fears of data loss from failure, but that is not as safe as something like Tokyo Cabinet, which writes the data as soon as it gets it. On the other hand, Redis is blazingly fast, incredibly easy to use, and will support just about anything you can think of doing with your data.
If you have a large data set that cannot comfortably fit into RAM, Redis is not the key-value store for you, but if you have smaller data sets and can live with the asynchronous write behavior, then, for me, the answer is definitely “to Redis.”